Introduction to Computer Systems
Loop Unrolling

# Loop Unrolling

## Data Dependencies

To understand how a CPU will execute a series of instructions, you need to understand implicit ordering constraints within a set of instructions:

```
int f(int a, int b, int c)
{
int t1 = a * b;
int t2 = b / c;
int t3 = t1 & t2;
int t4 = t1 | t2;
int t5 = t3 - t4;
return t5;
}
```

```
f:
;# a = %rdi
;# b = %rsi
;# c = %rdx
movl %esi, %eax
movl %edx, %ecx
imull %esi, %edi
cltd
idivl %ecx
movl %edi, %edx
andl %eax, %edx
orl %eax, %edi
movl %edx, %eax
subl %edi, %eax
ret
```

## Intel Core i7 Functional Units

- Fast integer arithmetic, most vector arithmetic, floating point fused multiply and add, integer and floating point division, and branches
- All integer arithmetic, most vector arithmetic, floating point fused multiply and add, and load effective address
- Load and address computation
- Load and address computation
- Store Data
- Fast integer arithmetic, some vector arithmetic, and fast load effective address
- Fast integer arithmetic
- Store address computation

## Characteristics of CPU instructions

**Latency**- the amount of time between the start of an individual instruction and its completion (data-dependent)**Issue**- (AKA Throughput) the amount of time between the start of an individual instruction and the start of the next instruction within the same functional unit**Capacity**- the amount of functional units that can concurrently accept a specific instruction

## Data Dependencies With Loops

```
int A[N];
int sum = 0;
for (int i = 0; i < N i++)
{
sum += A[i];
}
```

```
movl $0, %eax
;# %rcx points to end of A
.L3:
addl (%rdx), %eax
addq $4, %rdx ;# adding 4 bits w/ quad = adding one
compq %rcx, %rdx
jne .L3
rep ret
```

- Only uses 4 of the functional units

##Unrolling The Loop By 4

- same code unrolled by 4:

```
int A[N];
int sum = 0;
for (int i = 0; i < N; i += 4)
{
sum += A[i];
sum += A[i + 1];
sum += A[i + 2];
sum += A[i + 3];
}
```

```
movl $0, %edx
movl $0, %eax
;# %edi points to the end of A
.L8:
...
```

### Fixing Potential Errors

…

### Removing Unneeded Dependencies

…