Basic Assembly Tutorial, part 1

February 2016, Cremona, Italy

Assembly and Architectures

When programming in assembly you deal directly with the hardware, so you must be aware of the CPU architecture you're working on. The most basic example is the choice of registers (small memory areas integrated in the CPU): every architecture has its own number of registers, with their own names. In this tutorial I'll mainly refer to the x86 architecture, since it's very common and since its evolution x86-64 (aka AMD64) is backward compatible with it. Make sure to assemble the code with the right output format flag in NASM! For example, I'm working on an x86-64 machine (it's an Intel Core i5), so I must pass the flag -f elf64 to NASM.
Unsure about your architecture? Run:
lscpu


For very basic assembly (basic instructions etc.) the only thing that changes is the name and number of registers. For example, the x86 architecture has 8 general purpose registers (GPRs), each 32 bits wide, named "eax, ebx, ecx, edx, ebp, esp, esi, edi". In x86-64 the number of GPRs is increased to 16, each 64 bits wide, named "rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15".

Programs structure

In assembly there are no variables in the sense of high level programming languages: all that exists is memory, and you can save data into it and then refer to that data, use it etc. An assembly program is made of three sections: the .data section, in which you can set initialized values in memory such as buffer sizes, numbers, characters or whatever else; the .bss section, which is used to reserve some memory for later use; and the .text section, which contains the actual code.
section .data
    ; constants here

section .bss
    ; variables here

section .text
    global _start    ; must be declared for the linker
_start:
    ; actual code starts here

The directive "global _start" must be specified for the linker: it exports the label "_start:", which marks the point where the actual code begins (the entry point of the program).
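To make the three sections more concrete, here is a minimal sketch of a program that defines some data and then exits right away. The label names (letter, answer, buffer) are just made up for illustration, and the three instructions at the end are the sys_exit system call that is explained later in this lesson.

```nasm
section .data
    letter  db  'A'         ; one initialized byte (hypothetical name)
    answer  dd  42          ; one initialized 32-bit value (hypothetical name)

section .bss
    buffer  resb 64         ; 64 reserved bytes, no initial value given

section .text
    global _start           ; export the entry point for the linker
_start:
    mov eax, 1              ; 1 = sys_exit (covered below)
    mov ebx, 0              ; exit status = 0
    int 0x80                ; ask the kernel to terminate the program
```

It does nothing visible, but it should assemble and link with the same nasm and ld commands used in the rest of this tutorial.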

Registers, system calls and the Hello World!

Assembly is much closer to the hardware than high level languages, thus many more basic instructions must be specified. Have fun: we're controlling the hardware now!
As you know, the CPU performs arithmetic operations on data that it finds in small areas of special memory built inside it, called "registers". Registers are paramount for all kinds of operations. For example, sending a message to the screen (stdout) in FORTRAN is pretty simple:
PRINT *, "ciao"
... but what is really going on is that you are performing the system call named "sys_write". WHAT IS A SYSTEM CALL?! A system call is a call from user space (where you are working) to the kernel (which has direct access to the hardware). Some system calls are "sys_write", "sys_exit", "sys_read", "sys_time", ... A system call is chosen by putting its identifying number in the appropriate register and its parameters in other registers (an explanation follows later).

Hello World!

Let's see a Hello World example.
section .data
    msg db "hello, world!"    ; defining the message

section .text
    global _start             ; this is for the linker
_start:
    mov rax, 4                ; Select system call: 4 = sys_write
    mov rbx, 1                ; First argument: 1 = stdout
    mov rcx, msg              ; Second argument: pointer to message
    mov rdx, 13               ; Third argument: number of bytes to be written
    int 0x80                  ; perform the chosen system call (pass variables
                              ; inside registers to the kernel and it will do
                              ; the rest)

    mov rax, 1                ; 1 = sys_exit
    mov rbx, 0                ; exit status = 0
    int 0x80                  ; again, perform system call, this time sys_exit


"Compile" with the "NASM" assembler:
nasm -f elf64 -o hello.o hello.asm
then link with the "ld" linker:
ld -o hello hello.o
and run.

Helloooooooooooooo!!!! :-)
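In the example above the length 13 is counted by hand, so it breaks as soon as the message changes. Here is a small variation, as a sketch: it lets NASM compute the length with "equ $ - msg" ($ is the current address, so the difference is the number of bytes defined since msg) and it appends a newline (ASCII 10) so the shell prompt starts on a fresh line. The name msglen is my own choice.

```nasm
section .data
    msg     db  "hello, world!", 10   ; the message plus a newline (ASCII 10)
    msglen  equ $ - msg               ; assemble-time constant: 14 bytes

section .text
    global _start
_start:
    mov rax, 4          ; 4 = sys_write
    mov rbx, 1          ; 1 = stdout
    mov rcx, msg        ; pointer to the message
    mov rdx, msglen     ; length computed by the assembler
    int 0x80

    mov rax, 1          ; 1 = sys_exit
    mov rbx, 0          ; exit status = 0
    int 0x80
```

Since msglen is defined with equ it takes no memory: the assembler simply replaces it with the number 14.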


Let's take a closer look at the system call. The signature of sys_write is (in C, since the Linux kernel is written in C):
ssize_t sys_write(unsigned int fd, const char * buf, size_t count)
As you can see, the arguments are:
  1. fd: the file descriptor to write to (1 = stdout)
  2. buf: a pointer to the message
  3. count: the number of bytes to be written
So the way to perform a generic system call is:
  1. Put the system call identifying number in the RAX register
  2. Put the first argument in the RBX register with the mov instruction
  3. Put the second argument in the RCX register
  4. Put the third argument in the ... and so on
  5. Then call "int 0x80" to perform the call
On the 32-bit x86 architecture just replace the mentioned registers with EAX, EBX, etc.
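A word of caution: "int 0x80", together with the numbers and registers listed above, is the 32-bit (i386) Linux system call interface. On an x86-64 machine it only works through the kernel's 32-bit compatibility layer. For comparison, here is a sketch of the same Hello World using the native x86-64 interface instead. It assumes 64-bit Linux, where the call number still goes in RAX but the arguments go in RDI, RSI and RDX, the instruction is "syscall", and the numbers are different (1 = write, 60 = exit).

```nasm
section .data
    msg db "hello, world!"    ; same 13-byte message as before

section .text
    global _start
_start:
    mov rax, 1          ; 1 = write in the x86-64 system call table
    mov rdi, 1          ; first argument: 1 = stdout
    mov rsi, msg        ; second argument: pointer to message
    mov rdx, 13         ; third argument: number of bytes
    syscall             ; native 64-bit system call instruction

    mov rax, 60         ; 60 = exit in the x86-64 system call table
    mov rdi, 0          ; exit status = 0
    syscall
```

It is assembled and linked exactly like the first version (nasm -f elf64, then ld). The rest of this lesson keeps using int 0x80.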

Here's a table of a few system calls:
EAX Syscall
1 sys_exit
2 sys_fork
3 sys_read
4 sys_write
5 sys_open
6 sys_close
Here you can find many more i386 system calls with syntax: http://asm.sourceforge.net/syscall.html

Read Keyboard Input

Let's now see a program that reads data from the keyboard, saves it somewhere in memory and then prints it. I've used the 32-bit "E--" names for the registers here (remember, thanks to backward compatibility that's OK).
The syntax for sys_read is pretty much the same as sys_write, except that buf is now the memory location where the data read will be stored (char * buf means a memory location, a pointer!).
ssize_t sys_read(unsigned int fd, char * buf, size_t count)

Also, I've reserved some space in memory using the directive "RESB", which reserves some bytes (I've chosen 4 bytes). This memory location has the label "myvariable".
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .data               ; Already initialized data
    insertmesg db 'Insert a number: '
    outputmesg db 'You have chosen: '

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .bss                ; Uninitialized (reserved) data
    myvariable resb 4       ; reserve 4 bytes to put the pressed
                            ; key. "myvariable" is a label
                            ; representing memory location

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .text
    global _start           ; for the linker
_start:
    ; STEP 1: writing the message
    mov eax, 4              ; sys_write
    mov ebx, 1              ; file descriptor: stdout
    mov ecx, insertmesg     ; message to be printed
    mov edx, 17             ; message length
    int 80h                 ; perform system call
                            ; (same as int 0x80)

    ; STEP 2: read from keyboard
    mov eax, 3              ; sys_read
    mov ebx, 0              ; file descriptor: stdin (0, not 2: 2 is stderr)
    mov ecx, myvariable     ; destination (memory address)
    mov edx, 4              ; size of the memory location
                            ; in bytes
    int 80h                 ; perform system call

    ; STEP 3: print the number
    ; STEP 3.1: print "You have chosen: "
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, outputmesg
    mov edx, 17             ; message length
    int 80h

    ; STEP 3.2: print the value inside variable
    mov eax, 4
    mov ebx, 1
    mov ecx, myvariable
    mov edx, 4
    int 80h

    ; STEP 4: exit with status 0
    mov eax, 1
    mov ebx, 0
    int 80h
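One detail the program above ignores: sys_read returns in EAX the number of bytes it actually read (or a negative value on error), and that is often less than the size of the buffer. Here is a sketch that echoes back exactly the bytes that were typed. Note that it uses two instructions not covered in this lesson, cmp (compare) and jle (jump if less or equal), and that the names buffer and done are my own.

```nasm
section .bss
    buffer resb 16          ; room for up to 16 bytes of input

section .text
    global _start
_start:
    mov eax, 3              ; sys_read
    mov ebx, 0              ; file descriptor: stdin
    mov ecx, buffer         ; destination (memory address)
    mov edx, 16             ; read at most 16 bytes
    int 80h                 ; on return, eax = number of bytes read

    cmp eax, 0              ; zero (end of input) or negative (error)?
    jle done                ; then there is nothing to print

    mov edx, eax            ; count = bytes actually read
    mov eax, 4              ; sys_write
    mov ebx, 1              ; file descriptor: stdout
    mov ecx, buffer         ; the bytes we have just read
    int 80h

done:
    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status = 0
    int 80h
```

The count must be copied into EDX before EAX is overwritten with the number of the next system call.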

The MOV instruction

So far we've made abundant use of the MOV instruction. What it does is straightforward: "mov destination, source" copies the source into the destination. You can move an immediate value or the content of a register into a register or a memory location, and you can load a register from memory, but you cannot move directly from one memory location to another. Note that moving data between registers is very quick, while any move that involves memory is slower. A general rule for writing fast code is "keep data in the registers as much as you can".
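Here is a short sketch of the most common forms of MOV in NASM syntax. The labels value and copy are made up for the example. Square brackets mean "the content of the memory at this address", while a bare label is the address itself.

```nasm
section .data
    value dd 7                  ; an initialized 32-bit value (hypothetical name)

section .bss
    copy  resd 1                ; one reserved 32-bit slot (hypothetical name)

section .text
    global _start
_start:
    mov eax, 5                  ; immediate -> register
    mov ecx, eax                ; register  -> register
    mov ecx, [value]            ; memory    -> register (ecx = 7)
    mov [copy], ecx             ; register  -> memory
    mov dword [copy], 9         ; immediate -> memory (the size must be stated)
    mov edx, value              ; the ADDRESS of value, not its content
    ; mov [copy], [value]       ; not allowed: memory -> memory

    mov ebx, [copy]             ; use the stored value as the exit status
    mov eax, 1                  ; sys_exit
    int 80h
```

After running the program, "echo $?" in the shell should print 9, the last value stored in copy.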