ASM (x86-64)

Learn about the assembly language understood by our home computers

Easy

Instructions and Registers

x86-64 refers to the 64-bit version of the x86 CPU architecture, which is implemented by many CPU manufacturers. An assembly program is a sequence of instructions that the CPU executes to operate on memory and registers.

What are instructions?

An instruction is the lowest-level unit of execution that the CPU interprets and executes. x86-64 has a wide variety of instructions, and it would be impractical to cover all of them.

The CPU executes instructions in order, from lower addresses to higher addresses, using a special register called RIP to keep track of the address of the next instruction to execute. This may sound confusing, but it is very similar to how humans read recipes: from the start of the recipe (low address) to the end of the recipe (high address), doing one step (instruction) at a time.

Let's take a look at an assembly code example from the previous lesson.

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], edi
mov     DWORD PTR [rbp-8], esi
mov     edx, DWORD PTR [rbp-4]
mov     eax, DWORD PTR [rbp-8]
add     eax, edx
pop     rbp
ret

When the CPU executes this code, it starts from the first instruction, push rbp, and keeps executing one instruction at a time until it reaches the last instruction, ret. As each instruction is executed, the special RIP register is advanced to point to the next instruction. If this sounds confusing, it will be explained shortly.

Baby's first instruction

While you may have seen a variety of instructions in the past few examples, the first instruction we will go through explicitly is nop!

The nop instruction serves an important function: it does nothing!

Here is a sample assembly code snippet showcasing what nop does:

nop
nop
nop

After this code runs, nothing has changed except RIP, which has moved past the three instructions.
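
To make the change in RIP concrete, here is the same snippet annotated with addresses. This is only a sketch: the load address 0x400000 is an assumption chosen for illustration, while the single byte 90 is the standard encoding of nop.

```asm
nop        ; at 0x400000, encoded as the single byte 90
nop        ; at 0x400001
nop        ; at 0x400002
           ; rip = 0x400003 once all three have executed
```

Since each nop occupies one byte, RIP moves forward by exactly one per instruction here.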

What are registers?

Before we introduce more instructions, it is important to go through the concept of registers. If you have programmed before, you will have come across variables: temporary constructs that store information for intermediate calculations and operations. Registers behave very similarly, but they can only hold certain kinds of data, and they store that information inside the CPU itself!

In the previous lesson's example of Bob the Chef, registers are like the bowls!

In x86-64 CPUs, there are a few groups of registers: general-purpose, segment, and EFLAGS registers. On 64-bit systems like x86-64, most registers store and represent values using 64 bits of storage.

Here is a table of the register names.

Register Type      Register Names
General-purpose    RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8-R15
Segment            ES, CS, SS, DS, FS, GS
*EFLAGS            CF, PF, AF, ZF, SF ...

*The EFLAGS "registers" listed here are individual bit flags within a single register.

General-purpose registers are used to store intermediate values, to perform simple integer calculations, and to act as memory pointers. The RSP register has some unique properties that will be discussed later, and it generally isn't used in the same way as the other general-purpose registers.
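
The three uses named above can be sketched in a few lines. The values are arbitrary and chosen only for illustration; mov is explained in the next section, and add appears in the example at the start of this lesson.

```asm
mov rax, 40                 ; storage: rax holds an intermediate value
add rax, 2                  ; integer calculation: rax is now 42
mov rbx, rsp                ; rbx now holds a memory address (a copy of rsp)
mov rcx, QWORD PTR [rbx]    ; pointer: load the 8 bytes stored at that address into rcx
```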

Segment registers are a more advanced concept that will not be discussed for now. Most simple assembly programs will not need to deal with these.

EFLAGS is a special register used for control flow that will be discussed in a future lesson. Generally, this register is not operated on directly, but is instead modified as a side effect of other instructions.

Using registers

Let's dive right in and start looking at some assembly code! The next instruction we'll cover is mov.

The syntax for mov is as follows:

mov <dst>, <src>

The mov instruction is quite straightforward: it moves the value from the <src> operand into the <dst> operand. For now, the <dst> operand is the name of the register you intend to write to, and the <src> operand is either the name of the register you intend to read from or an immediate numerical value.

mov rax, 123
result:
rax: 123

You can also move values between registers.

mov rax, 123
mov rbx, rax
result:
rax: 123
rbx: 123

Although the instruction is named "move", do not be confused: the operation is really more like a copy, and the source operand keeps its value afterwards.
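
A minimal sketch of this copy behaviour, using arbitrary values: once the value has been copied, the two registers are independent, so overwriting rax later does not affect rbx.

```asm
mov rax, 123    ; rax = 123
mov rbx, rax    ; rbx = 123, and rax still holds 123
mov rax, 456    ; rax = 456, rbx is unaffected
; result:
; rax: 456
; rbx: 123
```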

Sub-registers
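
Sub-registers are names for the lower portions of a general-purpose register. For rax, these are eax (the low 32 bits), ax (the low 16 bits) and al (the low 8 bits), plus ah for bits 8 to 15. This is why the earlier example could use names like eax and edi. The sketch below uses arbitrary values picked only to make each portion easy to spot.

```asm
mov rax, 0x1122334455667788
; eax = 0x55667788, ax = 0x7788, ah = 0x77, al = 0x88

mov al, 0xff     ; rax = 0x11223344556677ff (only the low 8 bits change)
mov ax, 0        ; rax = 0x1122334455660000 (only the low 16 bits change)
mov eax, 1       ; rax = 0x0000000000000001 (a 32-bit write also zeroes the upper 32 bits)
```

Note the asymmetry in the last line: writing to an 8-bit or 16-bit sub-register leaves the rest of the register alone, but writing to a 32-bit sub-register clears the upper half of the 64-bit register.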

Instruction Pointer (RIP)

As mentioned earlier, there is a special register called RIP that acts as a pointer to the next instruction to be executed.

It acts as a sort of cursor marking where execution currently is. The value of this register is never modified directly. Rather, as the program progresses, RIP is updated as a side effect of each instruction. This is why you will not see instructions that mov values into rip, for example.

We can observe this effect in the example below. Take note of the value of the RIP register as you step through each line of assembly code.

[Interactive example: a code listing loaded at address 0x400000, shown next to a register panel (rax, rbx, rcx, rdx, rdi, rsi, r8-r15, rbp, rsp, rip) that updates as you step through each line.]

In general, the side effect of each instruction is to advance RIP so that it refers to the instruction immediately after it. Intuitively, you can think of it as a teacher pointing to the next line in a storybook every time the current line has been read.
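
As a rough sketch, here is the function from the start of this lesson annotated with addresses. The load address 0x400000 is an assumption, and the byte encodings are the ones GNU as typically emits for these instructions (another assembler may pick a different but equivalent encoding). Because instructions have different lengths, RIP advances by a different amount at each step.

```asm
push rbp                        ; 0x400000: 55        -> rip = 0x400001
mov  rbp, rsp                   ; 0x400001: 48 89 e5  -> rip = 0x400004
mov  DWORD PTR [rbp-4], edi     ; 0x400004: 89 7d fc  -> rip = 0x400007
mov  DWORD PTR [rbp-8], esi     ; 0x400007: 89 75 f8  -> rip = 0x40000a
mov  edx, DWORD PTR [rbp-4]     ; 0x40000a: 8b 55 fc  -> rip = 0x40000d
mov  eax, DWORD PTR [rbp-8]     ; 0x40000d: 8b 45 f8  -> rip = 0x400010
add  eax, edx                   ; 0x400010: 01 d0     -> rip = 0x400012
pop  rbp                        ; 0x400012: 5d        -> rip = 0x400013
ret                             ; 0x400013: c3        -> rip = return address taken from the stack
```

The last line is an exception to the "next instruction" pattern: ret sends RIP back to the caller instead of to the following address.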

In later lessons, we will cover other instructions that may modify RIP in a very different manner. But in general, it is a safe guess that RIP will automatically point to the neighbouring instruction at every step.

Quiz

Which register is not a general-purpose register?

Consider the following assembly code.

    mov rax, 1
    mov rbx, 2
    mov rcx, rax
    mov rcx, rbx
        
What will be the value of rcx after the code has finished running?