General

Inverting a binary tree using x64 assembly

While having a investigate cross-test twitter, I got here across this tweet:

Tweet asking you to invert merkle tree

Downside popular

Some ground rules

The task is to invert a binary tree. A binary tree node is defined as follows:

struct TreeNode{     int val;    struct TreeNode * left;    struct Treenode * right;};

This definition is from Leetcode 226. After we’re accomplished, we are capable of submit our code there to search if it basically works.

I will seemingly be using x64 assembly with the AT&T syntax because it’s miles objectively superior than the Intel syntax.

Technically the tweet asks to invert a merkle tree, nonetheless you would possibly per chance be ready to without problems prolong the final code to recompute the hash at each and each node after its two subtrees are inverted.

Pseudocode

The aim we desire to implement is as follows:

struct TreeNode * invertTree(struct TreeNode * root);

We can initiating by writing some pseudocode

 1 2 3 4 5 6 7 8 91011121314
function invert(root) {    let local_root = root;    if local_root is null {        return null    }    let local_left = local_root.left;    let local_right = local_root.right;    root.right = invert(local_left);    root.left = invert(local_right);    return local_root;}

This pseudocode is intentionally verbose so we are capable of without problems observe all local variables now we like to make and retailer on the stack.We would be translating this pseudocode because it’s miles to assembly.

General setup

12345
.globl invert_tree.type invert_tree, @function.section .textinvert_tree:    # Our function goes here

Lines 1 and 2 repeat the assembler that we’re defining an invert_tree image of kind aim. The globlmodifier enables our aim to be called from various source files.The .text portion is the Code section where we are capable of be writing our aim.

Feature prologue and epilogue

To stipulate this aim in assembly, we first like to fancy the x86-64 calling convention.That is documented within the Machine V ABI on hand here.

123456789
invert_tree:    # Function prologue    push %rbp    mov %rsp, %rbp    sub $32, %rsp    # Function epilogue    leave    ret

First, we push the present fee of %rbp on the stack and copy %rsp to %rbp. We then subtract 32 bytes from the stack pointer to compose dwelling to retailer our local variables.

Doing this units up a stack frame for this aim, with %rbp as the “execrable”. We can now use offsets from %rbp to gain admission to data on the stack.

There would possibly maybe be an enter instruction which units up the stack frame for us, but this would per chance nicely be very slack. On standard processors enter decodes to a couple 10 to 20 µops, whereas the three instruction sequence is about 4 to 6, searching on the structure. Offer

The leave instruction does exactly the different of traces 3 and 4. It restores %rsp and pops the fee from stack into %rbp.

Stack alignment

In our pseudocode, there are 3 local variables local_root, local_left and local_right. Each and each of these is a pointer kind, thus they require an total of 24 bytes to be kept on the stack (A pointer kind has dimension 8 bytes in x64).

Nonetheless, we instead compose dwelling for 32 bytes to compose sure that our stack is “aligned” accurately. From the Machine V ABI doc:

“The stack must be 16 byte aligned straight earlier than any call instruction is completed.”

Defective case

The aim accepts a single pointer as input, thus in retaining with the calling convention, we are capable of be receiving it as a 64 bit fee within the register %rdi.

To come relieve a pointer from our aim, now we like to retailer it within the register %rax earlier than we ret from our aim.

We evaluation if the root is NULLand whether it’s miles, we return NULL.

 1 2 3 4 5 6 7 8 91011121314151617181920
invert_tree:    # Function prologue    push %rbp    mov %rsp, %rbp    sub $32, %rsp    # --------- Base case ---------    # %rdi contains the input parameter (8 byte pointer to the root)    cmp $0, %rdi    # Root not NULL, continue with execution    jne continue    # Root is NULL, store NULL in %rax and return    xor %rax, %rax    jmp donecontinue:done:    # Function epilogue    leave    ret

In x64, NULL is the fee 0. We compare %rdi with 0 and if it succeeds, we put 0 in %rax and return from our aim.To enact this, using xor is faster than mov or sub.

Struct alignment

On most ISA’s each and each data kind in a struct has an alignment requirement. What this suggests is that the address of a kind must initiating on an address which is a just a few of their dimension. (There are some exceptions, but this is the everyday rule)

So int’s must initiating on an address which is a just a few of 4, and pointers must initiating on an address which is a just a few of 8.

Given this, our normal struct is specified by memory something appreciate so:

struct TreeNode{     int val;    char pad[4];  // 4 extra bytes to align the next pointer    struct TreeNode * left;    struct Treenode * right;};

This wastes 4 bytes for every struct in memory. We can set up this dwelling by declaring the pointers first, nonetheless we are capable of proceed to make use of the authentic struct declaration as we wish to submit this to Leetcode.

Storing local variables on the stack

As soon as we know the address offsets in our struct, we are capable of then retailer them on the stack. We can retailer local_root, left and proper pointers on the stack. Each and each of them has a dimension of 8 bytes. We retailer them as follows

Variable space initiating space stop
local_root %rbp – 8 %rbp
local_left %rbp – 16 %rbp – 8
local_right %rbp – 24 %rbp – 16

Also, to compose relating to them less difficult, we are capable of stipulate these offsets as constants using the equ assembler directive.

12345678
# ....section .text.equ local_root, -8.equ local_left, -16.equ local_right, -24invert_tree:# ... function here

We can then use them in our code.

 1 2 3 4 5 6 7 8 910111213141516171819
invert_tree:    # ...continue:    # --------- Save local variables to stack ---------    # Save the root parameter as local_root on stack    movq %rdi, local_root(%rbp)    # Advance 8 bytes from root, to get left pointer in struct.    movq 8(%rdi), %rdx    # Store this as local_left to stack    movq %rdx, local_left(%rbp)    # Advance 16 bytes from root, to get to right pointer in struct.    movq 16(%rdi), %rdx    # Store this as local_right to stack    movq %rdx, local_right(%rbp)done:    # ...

Hang the recursive calls

At this level, having a investigate cross-test at the pseudocode, we simply like to love 2 recursive calls.

To enact this, we retailer the supreme parameter to %rdi and use the call instruction.

 1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
invert_tree:    # Function prologue    push %rbp    mov %rsp, %rbp    sub $32, %rsp    # --------- Base case ---------    # %rdi contains the input parameter (8 byte pointer to the root)    cmp $0, %rdi    # Root not NULL, continue with execution    jne continue    # Root is NULL, store NULL in %rax and return    xor %rax, %rax    jmp donecontinue:    # --------- Save local variables to stack ---------    # Save the root parameter as local_root on stack    movq %rdi, local_root(%rbp)    # Advance 8 bytes from root, to get left pointer in struct.    movq 8(%rdi), %rdx    # Store this as local_left to stack    movq %rdx, local_left(%rbp)    # Advance 16 bytes from root, to get to right pointer in struct.    movq 16(%rdi), %rdx    # Store this as local_right to stack    movq %rdx, local_right(%rbp)    # --------- Recursive calls ---------    # Recursively call invert_tree on local_right    movq local_right(%rbp), %rdi    call invert_tree    # Calculate address of root.left    movq local_root(%rbp), %rdx    addq $8, %rdx    # Assign root.left to value returned from recursive call    movq %rax, (%rdx)        # Recursively call invert_tree on local_left    movq local_left(%rbp), %rdi    call invert_tree    # Calculate address of root.right    movq local_root(%rbp), %rdx    addq $16, %rdx    # Assign root.right to value returned from recursive call    movq %rax, (%rdx)    # Return root    movq local_root(%rbp), %raxdone:    # Function epilogue    leave    ret

To set up the outcomes of the recursive calls, we use the %rdx register to compute the supreme addresses of root.left and root.right.Most of this is a straight translation of the pseudocode.

Submitting to leetcode

We cannot submit assembly to Leetcode, so we are capable of use C instead. Leetcode makes use of gcc to bring collectively our code, so we are capable of use the __asm__ directive to add our assembly.

We wish to additionally repeat our aim with extern earlier than we are capable of call it.

 1 2 3 4 5 6 7 8 910111213
typedef struct TreeNode TreeNode;extern TreeNode* invert_tree(TreeNode*);__asm__(".global invert_tree  nt"".type invert_tree, @function nt"".section .text               nt"// Rest of our code here...);// Leetcode funcTreeNode* invertTree(TreeNode* root){    return invert_tree(root);}

And we’re accomplished!

The leetcode performance and memory numbers are very unreliable, so we don’t basically understand how well-known sooner our program is when when put next with various submissions, but I believe it’s miles sort of fast.

Conclusion

This became a fun excursion to brush up on my assembly skills. It wasn’t refined, but positively became slack (That is frequently how I likeabout most leetcode style questions).

Max Howell rejected from Google

Now not lower than now I will pronounce to extra memes on r/ProgrammerHumor

Max Howell rejected from Google

HN Front net page update

Since just a few of us from HN and reddit are here, time for a shameless stir:

I am actively having a investigate cross-test for a python/rust/quant job.Right here’s my Github and Email

Thanks for reading. Joyful Thanksgiving!

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button