gcc optimisation: __builtin_expect

__builtin_expect is a gcc build-in function that enables programmers to give a hint to gcc about branch prediction.

__builtin_expect is a gcc build-in function that enables programmers to give a hint to gcc about branch prediction. The declaration is:

long __builtin_expect (long exp, long c)

where exp is an expression and c is expected value of exp. The value of c has to be known to compiler at the compile time. Lets take a really simple example:

#include <stdio.h>;
#include <stdlib.h>;
 
int main(void)
{
int x;
 
//'fool' gcc not to optimize code too much
x =  rand();
if (__builtin_expect((x==38),1)) {
  x += 10;
}  else {
  x=0;
  printf("false branch\n");
}
//execute  some more code afterwards
x++;
return x;
}

In this case we predict (for some reason) that value of comparison x == 38 is going to be 1 (true). That means that first (if) branch should be executed more often than the second one (else branch). Assembly code looks like this:

zabuchy% gcc -O2 -S likely2.c
zabuchy% cat likely2.s
.file "likely2.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "false branch"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB26:
pushq %rbx
.LCFI0:
call rand
cmpl $38, %eax       ;comparing x to 38
movl %eax, %ebx
jne .L2                 ;if x!=38 than go to .L2
movb $48, %bl         ;x = 48 (38+10)
incl %ebx               ;x++
movl %ebx, %eax
popq %rbx
ret                       ;finish program
.L2:
xorl %ebx, %ebx     ;x=0
movl $.LC0, %edi
incl %ebx              ;x++
call puts               ;print 'false branch' text
movl %ebx, %eax
popq %rbx
ret                       ;finish program
[...]

Processors can execute sequential code faster, so our most likely branch was put just below comparison. In case we were wrong and x is not equal 38, processor will have to jump to label .L2, which means execution will be slower. Let’s change our expected value to tell gcc that ‘else’ branch is more probable:

[...]
if (__builtin_expect((x==38),0)) {
x += 10;
} else {
x=0;
[...]

if and else branched are swapped now:

[...]
.LCFI0:
call rand
cmpl $38, %eax
movl %eax, %ebx
je .L5
xorl %ebx, %ebx
movl $.LC0, %edi
incl %ebx               ;doubled
call puts             ;print 'false branch' text
movl %ebx, %eax     ;doubled
popq %rbx               ;doubled
ret
.L5:
movb $48, %bl         ;x = 48 (38+10)
incl %ebx              ;doubled
movl %ebx, %eax     ;doubled
popq %rbx               ;doubled
ret
[...]

As you can see last 3 instructions (commented as doubled) that are executed in both cases (x=38 and x!=38) exist in both branches:

incl %ebx
movl %ebx, %eax
popq %rbx

The reason for this is exactly the same – avoiding jumps in the code (even if compiler needs to duplicate some code).

__builtin_expect function is widely used in Linux development. In Linux source code you won’t see direct use of __builtin_expect but two macros, defined as:

#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)

GNU gcc manual recommends usage of profilers for branch prediction optimisation, as “programmers are notoriously bad at predicting how their programs actually perform”.

  1. A quick way to take advantage of __builtin_expect is to use it in asserts macros, which their conditions are almost always false. Depending on the codebase and platform, this can be a huge perf gain on builds with asserts enabled.

Leave a Comment


*


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">