Basic Assembly Tutorial, part 1
February 2016, Cremona, Italy
Assembly and Architectures
When programming in assembly you deal with hardware, thus you must be aware of the CPU
architecture you're working on.
The most basic example is the choice of registers (memory areas integrated in the CPU):
every architecture has its own number of registers, with their own names.
In this tutorial I'll mainly refer to the x86 architecture, since it's very common and since its
evolution x86-64 (aka AMD64) is backward compatible with it.
Make sure to assemble the code with the right flag in NASM!
For example, I'm working on an x86-64 machine (an Intel Core i5), so I must pass the flag -f elf64
to NASM.
Unsure about your architecture?
Run:
lscpu
For very basic assembly (basic instructions etc.) the only things that change are the names and the
number of registers.
For example, the x86 architecture has 8 general purpose registers (GPRs) of 32 bits each,
named "eax, ebx, ecx, edx, ebp, esp, esi, edi".
x86-64 has 16 GPRs of 64 bits each, named "rax, rbx, rcx, rdx, rbp, rsp,
rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15".
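To make the relationship between the two register sets concrete, here is a minimal sketch. It assumes a 64-bit Linux machine and NASM with -f elf64; the file name regs.asm is only illustrative. It shows that EAX is simply the lower half of RAX, which is why code written with the 32-bit names still assembles for x86-64. The exit sequence at the end relies on the system call mechanism explained later in this tutorial.

```nasm
; regs.asm (hypothetical file name)
; assemble: nasm -f elf64 -o regs.o regs.asm
; link:     ld -o regs regs.o
section .text
global _start
_start:
mov rax, 0x1122334455667788 ; fill all 64 bits of RAX
mov eax, 7 ; write only the lower 32 bits (EAX);
; the CPU clears the upper 32 bits,
; so RAX now holds exactly 7
mov rbx, rax ; copy RAX into RBX, so EBX = 7
mov eax, 1 ; 1 = sys_exit
int 0x80 ; exit; the exit status is taken from EBX
```

If everything behaves as described, running the program and then typing "echo $?" in the shell should report the exit status 7.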
Program structure
In assembly there are no variables in the sense of high-level programming languages:
all that exists is memory; you can store data in it and then refer to that data, use it etc.
An assembly program is made of three sections: the .data section, in which you set initialized values
in memory such as buffer sizes, numbers, characters or whatever else; the .bss section,
which is used to reserve uninitialized memory for later use; and the .text section, which contains
the actual code.
section .data
; initialized data here
section .bss
; reserved (uninitialized) memory here
section .text
global _start ; must be declared for the linker
_start:
; actual code starts here
The directive "global _start" exports the _start label so that the linker can find it:
"_start:" marks the program's entry point, i.e. the place where the actual code begins.
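As a slightly more concrete sketch of the same skeleton, the following fills each section with one illustrative declaration. The names greeting, bufsize and buffer are made up for this example; db, equ and resb are standard NASM directives.

```nasm
section .data
greeting db "ciao", 10 ; four characters plus a newline (byte 10)
bufsize equ 64 ; assemble-time constant, occupies no memory

section .bss
buffer resb bufsize ; reserve 64 uninitialized bytes

section .text
global _start ; must be declared for the linker
_start:
mov eax, 1 ; 1 = sys_exit (system calls are explained below)
mov ebx, 0 ; exit status = 0
int 0x80 ; ask the kernel to terminate the program
```

The program does nothing visible: it only declares the data and exits cleanly, which is enough to check that the three sections assemble and link.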
Registers, system calls and the Hello World!
Assembly is much closer to the hardware than high level languages, thus many more basic
instructions must be specified.
Have fun: we're controlling the hardware now!
As you know, the CPU performs arithmetic operations on data held in small areas of
special memory built into it, called "registers".
Registers are paramount for all kinds of operations.
For example, sending a message to the screen (stdout) in FORTRAN is pretty simple:
PRINT *, "ciao"
... but what is really going on is that you are performing the system call named "sys_write".
WHAT IS A SYSTEM CALL?! A system call is a request from user space (where your program runs)
to the kernel (which has direct access to the hardware).
Some system calls are "sys_write", "sys_exit", "sys_read",
"sys_time", ...
System calls are chosen by putting their identifying number in the appropriate register and
their parameters in other registers (an explanation follows later).
Hello World!
Let's see a Hello World example.
section .data
msg db "hello, world!" ; defining the message
section .text
global _start ; this is for the linker
_start:
mov rax, 4 ; Select system call: 4 = sys_write
mov rbx, 1 ; First argument: 1 = stdout
mov rcx, msg ; Second argument: pointer to message
mov rdx, 13 ; Third argument: number of bytes to be written
int 0x80 ; perform the chosen system call (pass variables
; inside registers to the kernel and it will do
; the rest)
mov rax, 1 ; 1 = sys_exit
mov rbx, 0 ; exit status = 0
int 0x80 ; again, perform system call, this time sys_exit
"Compile" with the "NASM" assembler:
nasm -f elf64 -o hello.o hello.asm
then link with the "ld" linker:
ld -o hello hello.o
and run.
Helloooooooooooooo!!!! :-)
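Counting the bytes of the message by hand is error prone. A common NASM idiom, sketched below, lets the assembler compute the length: $ stands for the current address, so "$ - msg" is the number of bytes emitted since the label msg. The name len is an arbitrary choice for this example, and the trailing 10 (a newline) is an addition that is not in the program above.

```nasm
section .data
msg db "hello, world!", 10 ; the message followed by a newline
len equ $ - msg ; length in bytes, computed by NASM (14 here)

section .text
global _start ; this is for the linker
_start:
mov rax, 4 ; 4 = sys_write
mov rbx, 1 ; 1 = stdout
mov rcx, msg ; pointer to message
mov rdx, len ; number of bytes, no manual counting
int 0x80 ; perform the system call
mov rax, 1 ; 1 = sys_exit
mov rbx, 0 ; exit status = 0
int 0x80
```

With this idiom the message can be edited freely without touching the code that prints it.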
Let's take a closer look at the system call.
The signature of sys_write is (in C, since the Linux kernel is written in C):
ssize_t sys_write(unsigned int fd, const char * buf, size_t count)
As you can see, the arguments are:
- file descriptor (1 = stdout)
- pointer to the message (its memory address)
- message length in bytes
So the way to perform a generic system call is:
- Put the system call identifying number in the RAX register
- Put the first argument in the RBX register with the mov instruction
- Put the second argument in the RCX register
- Put the third argument in the ... and so on
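- Then execute "int 0x80" to perform the call
Note that "int 0x80" is the 32-bit Linux system call interface: even in a 64-bit program the kernel
looks only at the lower 32 bits of these registers (EAX, EBX, ...), so it works here only because the
values and addresses involved fit in 32 bits, and it requires a kernel with 32-bit emulation enabled.
On the x86 architecture just replace the mentioned registers with EAX, EBX etc.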
Here's a table of a few system calls:
EAX | Syscall
1 | sys_exit
2 | sys_fork
3 | sys_read
4 | sys_write
5 | sys_open
6 | sys_close
Here you can find many more i386 system calls with syntax:
http://asm.sourceforge.net/syscall.html
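The examples in this tutorial use "int 0x80", which is the 32-bit Linux system call interface. For comparison, here is a sketch of the same Hello World written for the native 64-bit interface of x86-64 Linux. That interface uses the "syscall" instruction, different registers (RDI, RSI, RDX for the first three arguments) and different system call numbers (1 = sys_write, 60 = sys_exit). It is offered as an alternative to compare with, not as a replacement for the program above; the name len and the trailing newline are choices made for this example.

```nasm
section .data
msg db "hello, world!", 10 ; the message followed by a newline
len equ $ - msg ; length in bytes, computed by NASM

section .text
global _start ; this is for the linker
_start:
mov rax, 1 ; 1 = sys_write in the 64-bit table
mov rdi, 1 ; first argument: 1 = stdout
mov rsi, msg ; second argument: pointer to message
mov rdx, len ; third argument: number of bytes
syscall ; enter the kernel (overwrites RCX and R11)
mov rax, 60 ; 60 = sys_exit in the 64-bit table
mov rdi, 0 ; exit status = 0
syscall
```

It is assembled and linked exactly like the other examples (nasm -f elf64, then ld). Note that the numbers in the table above apply only to "int 0x80".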
Read Keyboard Input
Let's now see a program that reads data from the keyboard, saves it somewhere in memory and then prints it.
I've used the "E--" names for registers here (remember, thanks to backward compatibility it's OK).
The signature of sys_read is pretty much the same as that of sys_write, except that the buffer is
the memory location where the data read will be written (char * buf means a memory location, a pointer!).
ssize_t sys_read(unsigned int fd, char * buf, size_t count)
Also, I've reserved some space in memory using the directive "RESB", which reserves a given number of
bytes (I've chosen 4 bytes).
This memory location has the label "myvariable".
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .data ; Already initialized data
insertmesg db 'Insert a number: '
outputmesg db 'You have entered: '
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .bss ; Uninitialized data (reserved space)
myvariable resb 4 ; reserve 4 bytes to put the pressed
; key. "myvariable" is a label
; representing memory location
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
section .text
global _start ; for the linker
_start:
; STEP 1: writing the message
mov eax, 4 ; sys_write
mov ebx, 1 ; file descriptor: stdout
mov ecx, insertmesg ; message to be printed
mov edx, 17 ; message length
int 80h ; perform system call
; (same as int 0x80)
; STEP 2: read from keyboard
mov eax, 3 ; sys_read
mov ebx, 0 ; file descriptor: stdin (0 = stdin, 2 = stderr)
mov ecx, myvariable ; destination (memory address)
mov edx, 4 ; size of the memory location
; in bytes
int 80h ; perform system call
; STEP 3: print the number
; STEP 3.1: print the output message
mov eax, 4 ; sys_write
mov ebx, 1 ; std_out
mov ecx, outputmesg
mov edx, 18 ; message length (18 bytes, not 19)
int 80h
; STEP 3.2: print the value inside variable
mov eax, 4
mov ebx, 1
mov ecx, myvariable
mov edx, 4
int 80h
; STEP 4: exit with status 0 (success)
mov eax, 1
mov ebx, 0
int 80h
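One refinement worth sketching: sys_read returns in EAX the number of bytes it actually read, so that value can be passed to sys_write instead of a fixed size. The label buf, the size 16 and the label done are choices made for this example; cmp and jle (compare, then jump if less or equal) are instructions not yet covered in this lesson.

```nasm
section .bss
buf resb 16 ; room for up to 16 bytes of input

section .text
global _start ; for the linker
_start:
mov eax, 3 ; sys_read
mov ebx, 0 ; file descriptor: stdin
mov ecx, buf ; destination (memory address)
mov edx, 16 ; read at most 16 bytes
int 80h ; on return, EAX = bytes read (negative on error)
cmp eax, 0 ; nothing read, or an error?
jle done ; then skip the write
mov edx, eax ; number of bytes to write = bytes read
mov eax, 4 ; sys_write
mov ebx, 1 ; file descriptor: stdout
mov ecx, buf ; same buffer
int 80h
done:
mov eax, 1 ; sys_exit
mov ebx, 0 ; exit status = 0
int 80h
```

This version echoes back whatever was typed, including the newline produced by the Enter key, without printing leftover bytes from the buffer.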
The MOV instruction
So far we've made abundant use of the MOV instruction.
What it does is straightforward: "mov destination, source" copies the source into the destination.
You can:
- mov register, constant
- mov register, memory
- mov ... various combinations ...
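The combinations above can be sketched as follows. The labels value and copy are invented for this example. In NASM a bare label is an address, while square brackets mean "the contents of memory at that address"; x86 has no form of mov that takes two memory operands, so a memory-to-memory copy has to go through a register.

```nasm
section .data
value dd 42 ; a 32-bit number stored in memory

section .bss
copy resd 1 ; room for one 32-bit number

section .text
global _start ; for the linker
_start:
mov eax, 5 ; register <- constant
mov ebx, eax ; register <- register
mov ecx, value ; register <- address of "value" (a pointer)
mov edx, [value] ; register <- contents of memory, EDX = 42
mov [copy], edx ; memory <- register
mov dword [copy], 7 ; memory <- constant (the size must be stated)
; mov [copy], [value] ; NOT allowed: two memory operands
mov eax, 1 ; sys_exit
mov ebx, 0 ; exit status = 0
int 80h
```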
Note that moving data between registers is very fast, while
operations that involve memory are slower.
A general rule of thumb for writing fast code is "keep data in registers
as much as you can".