Inverting a binary tree using x64 assembly

While having a investigate cross-test twitter, I got here across this tweet:

Downside popular
Some ground rules
The task is to invert a binary tree. A binary tree node is defined as follows:
struct TreeNode{ int val; struct TreeNode * left; struct Treenode * right;};
This definition is from Leetcode 226. After we’re accomplished, we are capable of submit our code there to search if it basically works.
I will seemingly be using x64 assembly with the AT&T syntax because it’s miles objectively superior than the Intel syntax.
Technically the tweet asks to invert a merkle tree, nonetheless you would possibly per chance be ready to without problems prolong the final code to recompute the hash at each and each node after its two subtrees are inverted.
Pseudocode
The aim we desire to implement is as follows:
struct TreeNode * invertTree(struct TreeNode * root);
We can initiating by writing some pseudocode
|
|
This pseudocode is intentionally verbose so we are capable of without problems observe all local variables now we like to make and retailer on the stack.We would be translating this pseudocode because it’s miles to assembly.
General setup
|
|
Lines 1 and 2 repeat the assembler that we’re defining an invert_tree
image of kind aim. The globl
modifier enables our aim to be called from various source files.The .text
portion is the Code section where we are capable of be writing our aim.
Feature prologue and epilogue
To stipulate this aim in assembly, we first like to fancy the x86-64 calling convention.That is documented within the Machine V ABI on hand here.
|
|
First, we push the present fee of %rbp
on the stack and copy %rsp
to %rbp
. We then subtract 32 bytes
from the stack pointer to compose dwelling to retailer our local variables.
Doing this units up a stack frame for this aim, with %rbp
as the “execrable”. We can now use offsets from %rbp
to gain admission to data on the stack.
There would possibly maybe be an enter
instruction which units up the stack frame for us, but this would per chance nicely be very slack. On standard processors enter
decodes to a couple 10 to 20 µops, whereas the three instruction sequence is about 4 to 6, searching on the structure. Offer
The leave
instruction does exactly the different of traces 3
and 4
. It restores %rsp
and pops the fee from stack into %rbp
.
Stack alignment
In our pseudocode, there are 3 local variables local_root
, local_left
and local_right
. Each and each of these is a pointer kind, thus they require an total of 24 bytes to be kept on the stack (A pointer kind has dimension 8 bytes in x64).
Nonetheless, we instead compose dwelling for 32 bytes
to compose sure that our stack is “aligned” accurately. From the Machine V ABI doc:
“The stack must be 16 byte aligned straight earlier than any call instruction is completed.”
Defective case
The aim accepts a single pointer as input, thus in retaining with the calling convention, we are capable of be receiving it as a 64 bit fee within the register %rdi
.
To come relieve a pointer from our aim, now we like to retailer it within the register %rax
earlier than we ret
from our aim.
We evaluation if the root is NULL
and whether it’s miles, we return NULL
.
|
|
In x64, NULL
is the fee 0
. We compare %rdi
with 0
and if it succeeds, we put 0
in %rax
and return from our aim.To enact this, using xor
is faster than mov
or sub
.
Struct alignment
On most ISA’s each and each data kind in a struct has an alignment requirement. What this suggests is that the address of a kind must initiating on an address which is a just a few of their dimension. (There are some exceptions, but this is the everyday rule)
So int
’s must initiating on an address which is a just a few of 4, and pointers must initiating on an address which is a just a few of 8.
Given this, our normal struct is specified by memory something appreciate so:
struct TreeNode{ int val; char pad[4]; // 4 extra bytes to align the next pointer struct TreeNode * left; struct Treenode * right;};
This wastes 4 bytes for every struct in memory. We can set up this dwelling by declaring the pointers first, nonetheless we are capable of proceed to make use of the authentic struct declaration as we wish to submit this to Leetcode.
Storing local variables on the stack
As soon as we know the address offsets in our struct, we are capable of then retailer them on the stack. We can retailer local_root, left and proper pointers on the stack. Each and each of them has a dimension of 8 bytes. We retailer them as follows
Variable | space initiating | space stop |
---|---|---|
local_root | %rbp – 8 | %rbp |
local_left | %rbp – 16 | %rbp – 8 |
local_right | %rbp – 24 | %rbp – 16 |
Also, to compose relating to them less difficult, we are capable of stipulate these offsets as constants using the equ
assembler directive.
|
|
We can then use them in our code.
|
|
Hang the recursive calls
At this level, having a investigate cross-test at the pseudocode, we simply like to love 2 recursive calls.
To enact this, we retailer the supreme parameter to %rdi
and use the call
instruction.
|
|
To set up the outcomes of the recursive calls, we use the %rdx
register to compute the supreme addresses of root.left
and root.right
.Most of this is a straight translation of the pseudocode.
Submitting to leetcode
We cannot submit assembly to Leetcode, so we are capable of use C instead. Leetcode makes use of gcc to bring collectively our code, so we are capable of use the __asm__
directive to add our assembly.
We wish to additionally repeat our aim with extern
earlier than we are capable of call it.
|
|
And we’re accomplished!

The leetcode performance and memory numbers are very unreliable, so we don’t basically understand how well-known sooner our program is when when put next with various submissions, but I believe it’s miles sort of fast.

Conclusion
This became a fun excursion to brush up on my assembly skills. It wasn’t refined, but positively became slack (That is frequently how I likeabout most leetcode style questions).

Now not lower than now I will pronounce to extra memes on r/ProgrammerHumor

HN Front net page update
Since just a few of us from HN and reddit are here, time for a shameless stir:
I am actively having a investigate cross-test for a python/rust/quant job.Right here’s my Github and Email
Thanks for reading. Joyful Thanksgiving!