Most people, when first starting assembly, still carry over a lot of high level constructs in their assembly programs. A common pattern is to multiply and divide when a bit shift would suffice.
For example, a lot of people would write a program to write out the binary representation of an integer using the divide and modulo operations. This is rather inefficient compared to using shifts. For example, the divide by 2 can be replaced with a right shift by 1, and modulo 2 can be replaced by a bitwise AND with 1.
Aside: interestingly, taking any number modulo a power of two
m is equivalent to doing a bitwise AND with
m-1. The proof of this is left as an exercise for the reader.
This post will address the basics you need to know about shifts to get up to speed on writing good assembly.
The Types of Shifts
There are three types of shifts in most processors. These are:
- Left shift: shifting all the bits in an integer to the left, dropping the highest-ordered bits, and fill the lowest-order bits with
- Logical right shift: shifting all the bits in an integer to the right, dropping the lowest-ordered bits, and fill the highest-order bits with
- Arithmetic right shift: like a logical right shift, but the highest-order bits are filled with copies of the most significant bit, or the sign bit.
In general, a left shift by x is equivalent to multiplication by 2x, regardless of sign. Right shifts are equivalent to floor division by 2x. You would use logical right shift to process unsigned integers, and arithmetic right shifts to process signed integers.
In x86 assembly, the instructions
sar performs left shift, logical right shift, and arithmetic right shift, respectively.
Consider the bit pattern
1101 1110 1010 1101 1011 1110 1110 1111 stored in the register
shl eax, 4 # turns eax into 1110 1010 1101 1011 1110 1110 1111 0000 shr eax, 4 # turns eax into 0000 1110 1010 1101 1011 1110 1110 1111 sar eax, 4 # turns eax into 1111 1110 1010 1101 1011 1110 1110 1111
Bit shifts manifest itself in many places. For example, in memory addressing instructions. Many processors provide some instruction to index into an array where the elements are a power-of-two in size, with a base address, and perhaps also letting you add a constant offset.
The x86 instruction looks like
mov eax, [ebx+4*ecx]. While this looks like a multiplication, the fact that you can only multiply by powers-of-two is a dead giveaway: it is actually a shift. The multiplication by 4 is encoded as a left shift of 2.
The ARM processor is much less subtle. The equivalent instruction would be:
LDR R0, [R1, R2, LSL #2]. There is no abstraction to hide the shift in a power-of-two multiplication on this platform.
As an example, consider the following C code:
a = b[c] is equivalent to
a = *(b + c). And assuming
sizeof(int) == 4, then this effectively does
a = *(int*)((char*)b + 4 * c). In assembly, pointer arithmetic is always done on by byte level, hence we need to consider casting to
char* for an effective demonstration.
Now, consider that
a is mapped to
R0 for the ARM example),
b is mapped to
a = b[c] would be compiled to
mov eax, [ebx+4*ecx] (or
LDR R0, [R1, R2, LSL #2]).
On the x86, there is a very interesting instruction called
lea. It has a similar syntax to the aforementioned version of the
mov instruction, which adds two registers with an optional left shift on one of them, and a constant offset. However, instead of loading the value at that memory address into the destination register, it instead loads the address itself into the destination register.
More interestingly, the address that is loaded doesn’t even have to be valid. So, in effect, this is a fancy arithmetic instruction.
Consider this code:
Using the register assignment
a -> eax, b -> ebx, c -> ecx, then this can be done in one single instruction:
lea a, [b + 8*c + 10].
The takeaway is: effective use of bitwise shifts and instructions based on bitwise shifts can help you optimize your handwritten assembly code dramatically. Use them if possible.