JIT: improve AArch64 code generation #119726
Thanks for organizing our thoughts on this. Okay if I assign you, since you expressed interest in working on it?
Interesting! Mind elaborating on this a bit more? I get that it saves memory, but I'm curious if it's expected to be faster too.
I'd break this up into a couple of phases:
Also worth mentioning: we'll want to move to short jumps with trampolines on all platforms, not just AArch64 (AArch64 just sort of forces our hand right now, since it only lets us use short jumps). So this work should benefit other platforms too, which is nice.
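For reference, an AArch64 `B` or `BL` encodes a signed 26-bit word offset, so a direct jump only reaches about ±128 MiB from the branch; anything farther has to go through a trampoline. The Python sketch below checks that range and encodes the branch. The helper names are hypothetical and not taken from CPython; only the instruction encodings are architectural.

```python
# Hypothetical helpers; the opcodes and the imm26 layout are the A64 encodings.
B_OPCODE = 0x14000000   # B  <label>: imm26 in bits [25:0]
BL_OPCODE = 0x94000000  # BL <label>: same layout, also writes x30
IMM26_MASK = 0x03FFFFFF


def fits_short_branch(source: int, target: int) -> bool:
    """Return True if a B/BL at `source` can reach `target` directly.

    imm26 is signed and scaled by 4, so the reachable window is
    [-128 MiB, +128 MiB - 4] relative to the branch instruction.
    """
    delta = target - source
    return delta % 4 == 0 and -(1 << 27) <= delta < (1 << 27)


def encode_branch(source: int, target: int, link: bool = False) -> int:
    """Encode B (or BL when `link` is True) from `source` to `target`."""
    if not fits_short_branch(source, target):
        raise ValueError("target out of range: route it through a trampoline")
    # Arithmetic shift keeps the sign; the mask yields 26-bit two's complement.
    imm26 = ((target - source) >> 2) & IMM26_MASK
    return (BL_OPCODE if link else B_OPCODE) | imm26
```

A caller that gets `False` from `fits_short_branch` would emit a trampoline and branch to that instead; on other architectures the same decision just has a different range.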
I've updated the original comment to say that it saves 8 bytes. As for speed, I think we need to measure it, but I would expect it to be about the same. The other saving is that we will do only one relocation instead of four. The code will be something like this:
Of course :)
When emitting AArch64 trampolines at the end of every data stencil, reuse existing ones for the same symbol. Fix the disassembly to reflect the "bl" instruction without the relocation.
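A minimal Python sketch of per-symbol trampoline reuse: the first request for a symbol appends a trampoline and records one relocation, and later requests return the cached offset without emitting anything. The class, the 16-byte placeholder template, and the position of the patched address (`TARGET_OFFSET`) are all hypothetical, chosen only to make the bookkeeping concrete.

```python
from dataclasses import dataclass, field

# Hypothetical layout: a fixed-size trampoline whose 64-bit target sits at a
# fixed offset. The bytes are placeholders, not real instructions.
TRAMPOLINE_TEMPLATE = bytes(16)
TARGET_OFFSET = 8


@dataclass
class StencilData:
    """Hypothetical model of a stencil's data section holding trampolines."""
    body: bytearray = field(default_factory=bytearray)
    trampolines: dict[str, int] = field(default_factory=dict)  # symbol -> offset
    relocations: list[tuple[int, str]] = field(default_factory=list)

    def trampoline_for(self, symbol: str) -> int:
        """Return the offset of the trampoline for `symbol`, emitting it once."""
        if symbol in self.trampolines:
            # Reuse: no new bytes and no new relocation for this symbol.
            return self.trampolines[symbol]
        offset = len(self.body)
        self.body += TRAMPOLINE_TEMPLATE
        self.relocations.append((offset + TARGET_OFFSET, symbol))
        self.trampolines[symbol] = offset
        return offset
```

Every `bl` to the same symbol can then be pointed at the one shared trampoline, so a stencil that calls the same helper several times pays for a single trampoline and a single relocation.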
Replace AArch64 trampolines with an LDR of a PC-relative literal. This saves 8 bytes of code per trampoline and reduces the number of patch functions per stencil from 4 to 1. It also shrinks the generated stencil header file by 17%.
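A minimal Python sketch contrasting the two trampoline shapes. It assumes the older form built the 64-bit address with one `MOVZ` plus three `MOVK` instructions (one relocation per 16-bit chunk) and that `x8` is free to clobber; the function names are hypothetical, not CPython's. In the LDR form the whole address sits in a single 8-byte literal right after the `BR`, so only one slot needs patching.

```python
import struct

SCRATCH = 8                 # assumption: x8 is a free scratch register
LDR_X_LITERAL = 0x58000000  # LDR Xt, <label>: imm19 in bits [23:5], Rt in [4:0]
BR = 0xD61F0000             # BR Xn: Rn in bits [9:5]
MOVZ_X = 0xD2800000         # MOVZ Xd, #imm16, LSL #(16*hw)
MOVK_X = 0xF2800000         # MOVK Xd, #imm16, LSL #(16*hw)


def ldr_literal(rt: int, offset: int) -> int:
    """Encode LDR Xt, [PC + offset]; imm19 is scaled by 4 (range ±1 MiB)."""
    assert offset % 4 == 0 and -(1 << 20) <= offset < (1 << 20)
    return LDR_X_LITERAL | (((offset >> 2) & 0x7FFFF) << 5) | rt


def br(rn: int) -> int:
    """Encode BR Xn."""
    return BR | (rn << 5)


def ldr_trampoline(target: int) -> bytes:
    """ldr x8, 8 ; br x8 ; .quad target -- one 64-bit slot to patch."""
    return struct.pack("<IIQ", ldr_literal(SCRATCH, 8), br(SCRATCH), target)


def movw_trampoline(target: int) -> bytes:
    """movz + 3x movk into x8, then br x8 -- four 16-bit fields to patch."""
    words = [MOVZ_X | ((target & 0xFFFF) << 5) | SCRATCH]
    for hw in range(1, 4):
        chunk = (target >> (16 * hw)) & 0xFFFF
        words.append(MOVK_X | (hw << 21) | (chunk << 5) | SCRATCH)
    words.append(br(SCRATCH))
    return struct.pack("<5I", *words)
```

At JIT time the LDR form needs one 64-bit store into the literal, while the MOVZ/MOVK form needs four read-modify-write patches, one per instruction immediate.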
Emit AArch64 trampolines in the data section (instead of the code) of the stencil. In many cases this allows the branch to the next micro-op at the end of the stencil to be replaced with a fall-through NOP.
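A minimal Python sketch of the fall-through rewrite, assuming stencils are laid out back to back so the next micro-op starts exactly at `len(code)`. It only fires when the branch is the last instruction of the code, which is why trampolines appended to the code section would block it. The function name and its parameters are hypothetical.

```python
import struct

B_MASK = 0xFC000000
B_OPCODE = 0x14000000  # unconditional B imm26
NOP = 0xD503201F


def elide_tail_branch(code: bytearray, branch_offset: int, target: int) -> bool:
    """Replace a final `b target` with NOP when target is the next stencil.

    `target` is the offset of the next micro-op relative to the start of
    `code`. Returns True if the branch was rewritten.
    """
    is_last = branch_offset == len(code) - 4
    (insn,) = struct.unpack_from("<I", code, branch_offset)
    if is_last and (insn & B_MASK) == B_OPCODE and target == len(code):
        # Execution now falls straight through into the next stencil.
        struct.pack_into("<I", code, branch_offset, NOP)
        return True
    return False
```

Writing a NOP rather than truncating the code keeps the stencil size and any recorded offsets unchanged in this sketch.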
Feature or enhancement
Proposal:
This is a follow-up to #115802, focused more specifically on improving the AArch64 code generated by the JIT.
This has been discussed with @brandtbucher during PyCon 2024.
There are several incremental improvements we could make when generating AArch64 code:
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
This was discussed broadly, in person, at PyCon 2024.
Linked PRs