Effective Assembly: Bitwise Shifts

Most people, when first starting assembly, still carry over a lot of high level constructs in their assembly programs. A common pattern is to multiply and divide when a bit shift would suffice.

For example, a lot of people would write a program to write out the binary representation of an integer using the divide and modulo operations. This is rather inefficient compared to using shifts. For example, the divide by 2 can be replaced with a right shift by 1, and modulo 2 can be replaced by a bitwise AND with 1.

Aside: interestingly, taking any number modulo a power of two m is equivalent to doing a bitwise AND with m-1. The proof of this is left as an exercise for the reader.

This post will address the basics you need to know about shifts to get up to speed on writing good assembly.

The Types of Shifts

There are three types of shifts in most processors. These are:

Left shift: shifting all the bits in an integer to the left, dropping the highest-ordered bits, and fill the lowest-order bits with 0.
Logical right shift: shifting all the bits in an integer to the right, dropping the lowest-ordered bits, and fill the highest-order bits with 0.
Arithmetic right shift: like a logical right shift, but the highest-order bits are filled with copies of the most significant bit, or the sign bit.

In general, a left shift by x is equivalent to multiplication by 2^x, regardless of sign. Right shifts are equivalent to floor division by 2^x. You would use logical right shift to process unsigned integers, and arithmetic right shifts to process signed integers.

Example

In x86 assembly, the instructions shl, shr, sar performs left shift, logical right shift, and arithmetic right shift, respectively.

Consider the bit pattern 1101 1110 1010 1101 1011 1110 1110 1111 stored in the register eax.

shl eax, 4 # turns eax into 1110 1010 1101 1011 1110 1110 1111 0000
shr eax, 4 # turns eax into 0000 1110 1010 1101 1011 1110 1110 1111
sar eax, 4 # turns eax into 1111 1110 1010 1101 1011 1110 1110 1111

Applications

Bit shifts manifest itself in many places. For example, in memory addressing instructions. Many processors provide some instruction to index into an array where the elements are a power-of-two in size, with a base address, and perhaps also letting you add a constant offset.

The x86 instruction looks like mov eax, [ebx+4*ecx]. While this looks like a multiplication, the fact that you can only multiply by powers-of-two is a dead giveaway: it is actually a shift. The multiplication by 4 is encoded as a left shift of 2.

The ARM processor is much less subtle. The equivalent instruction would be: LDR R0, [R1, R2, LSL #2]. There is no abstraction to hide the shift in a power-of-two multiplication on this platform.

As an example, consider the following C code:

int a, c = <some index>;
int *b = <some array>;
a = b[c];

a = b[c] is equivalent to a = *(b + c). And assuming sizeof(int) == 4, then this effectively does a = *(int*)((char*)b + 4 * c). In assembly, pointer arithmetic is always done on by byte level, hence we need to consider casting to char* for an effective demonstration.

Now, consider that a is mapped to eax (or R0 for the ARM example), b is mapped to ebx (R1), and c to ecx (R2). Then, a = b[c] would be compiled to mov eax, [ebx+4*ecx] (or LDR R0, [R1, R2, LSL #2]).

Fancy Arithmetic

On the x86, there is a very interesting instruction called lea. It has a similar syntax to the aforementioned version of the mov instruction, which adds two registers with an optional left shift on one of them, and a constant offset. However, instead of loading the value at that memory address into the destination register, it instead loads the address itself into the destination register.

More interestingly, the address that is loaded doesn’t even have to be valid. So, in effect, this is a fancy arithmetic instruction.

Consider this code:

int a, b = <some value>, c = <some value>;
a = b + 8 * c + 10;

Using the register assignment a -> eax, b -> ebx, c -> ecx, then this can be done in one single instruction: lea a, [b + 8*c + 10].

The takeaway is: effective use of bitwise shifts and instructions based on bitwise shifts can help you optimize your handwritten assembly code dramatically. Use them if possible.