Basic Assembly Tutorial, part 2
February 2016, Cremona, Italy
Welcome to part II of this simple tutorial.
Variables do not exist, data does exist
OK, the very first thing to keep in mind is that in assembly, variables are *not* containers
of something (as in high-level languages); they are just labels referring to a location in
memory (RAM).
When assembling the code, the assembler replaces the mnemonic name you gave to the variable
(such as my_var) with a memory address (such as 0x12345).
In fact, defining a variable can be seen this way:
mnemonic name    size (1 byte)    value
my_var           db               2
This declaration fills 1 byte of memory, starting at a certain address (known as my_var),
with the value 2.
If you define another variable right after my_var, it will simply be placed 1 byte after the
position my_var.
Got it?
If my_var were defined with "dw", it would fill 2 bytes of memory, and the next variable
would be stored 2 bytes after the position referred to as "my_var".
PLEASE REMEMBER: it is better to talk about "labels" than "variables".
The section .data allows one to fill memory with some data and provide a name to the starting
point of the data block.
The section .bss allows one to reserve some memory for future usage and provide a name to the
starting point of that data block.
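To see that labels are nothing more than addresses, here is a minimal sketch. It assumes the same 32-bit Linux setup and int 0x80 system calls as the Hello World program; the labels first, second and third are made up for this illustration. It asks the assembler for the distance between two labels and returns it as the exit status.

```nasm
section .data
    first   db 1            ; 1 byte, at some address called "first"
    second  dw 2            ; 2 bytes, placed right after first
    third   db 3            ; 1 byte, placed right after second

section .text
    global _start

_start:
    mov ebx, third - first  ; distance in bytes, computed by the assembler: 1 + 2 = 3
    mov eax, 1              ; sys_exit
    int 0x80                ; exit status = 3
```

If you run the program and then type "echo $?" in the shell, you should see 3: the label third sits exactly 3 bytes after first, because a db and a dw lie in between.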
Initialized data
In the Hello World program we used the "db" declaration, which stands for
"Define Byte".
There are other declarations; see the table below.
Note that since a byte is made of 8 bits, it can represent 2^8 = 256 different values, so
you can represent the numbers from 0 to 255 with 1 byte.
Two bytes are 16 bits, so they can represent 2^16 = 65536 values, and so on.
declaration | meaning           | size     | decimal values range
db          | define byte       | 1 byte   | 0 - 255
dw          | define word       | 2 bytes  | 0 - 65535
dd          | define doubleword | 4 bytes  | 0 - 4294967295
dq          | define quadword   | 8 bytes  | pretty big
dt          | define ten bytes  | 10 bytes | pretty huge
Note that in x86 (and x86-64) architectures, 1 byte is the smallest addressable datum!
You cannot address a single bit; the smallest thing you can address is a byte.
EXAMPLE
section .data
    myvar db 'k'            ; Initialized data

section .text
    global _start           ; needed by linker

_start:
    ; Print current value at location myvar (and following)
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, myvar          ; memory location to find stuff
    mov edx, 1              ; length of data to print is 1 byte
    int 0x80

    ; Now let's modify the value 'k' to 'p'
    ; Here we first put the value inside a register,
    ; then we copy it into memory
    mov eax, 'p'            ; move the character 'p' into eax (this is
                            ; not good, but for now it's ok. See more
                            ; about registers!)
    mov [myvar], eax        ; move the content of eax to location myvar

    ; Print the new value at memory location myvar
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, myvar          ; memory location to find stuff
    mov edx, 1              ; length of data to print is 1 byte
    int 0x80

    ; Exit
    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
A couple of things to note in the previous example:
- initialized data can be modified
- here we modified data in memory by going through a register (x86 can also store a constant
directly, but then the size must be stated explicitly, e.g. "mov byte [myvar], 'p'")
- myvar is a location in memory: written alone it is the address, while enclosed in square
brackets ([myvar]) it means the content of memory at that address.
- WE HAVE USED THE EAX REGISTER TO TEMPORARILY HOLD THE DATA. THIS WILL PROVE TO BE WRONG!
SEE SECTION "Registers part 2"!
ASCII and sys_write
Consider the following code:
;;;;; I want to print myvar ;;;;;
section .data
    myvar db 'y'            ; saves in myvar the ASCII code for 'y'
[....]
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, myvar
    mov edx, 1
    int 0x80                ; runs system call
Since you've written 'y' within single quotes, the assembler takes the ASCII value of the character 'y'
(that is 0x79, or 01111001 in binary, or 121 in decimal) and saves it at the location labeled 'myvar'.
A few lines later, sys_write receives the address myvar, finds the byte 0x79 there and prints the
corresponding ASCII character.
Remember the syntax of sys_write?
It expects characters (ASCII encoded)!
Let's now say you wish to print the number 2: the following code will produce an unexpected output!
;;;;; I want to print the number 2 ;;;;;
section .data
    myvar db 2              ; WRONG CODE SNIPPET!
[....]
When passed to sys_write, this prints the character with ASCII code 2, which isn't
even a printable character (it's a control character).
The right choice is to tell the assembler that "2" is a character:
;;;;; I want to print the number 2 ;;;;;
section .data
    myvar db '2'            ; THIS TIME THE RESULT WILL BE OK
[....]
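When the digit starts out as a number rather than a character, it can be turned into its ASCII code by adding the code of '0' (0x30). The following is only a sketch under the same assumptions as the examples above (32-bit Linux, int 0x80); here the addition is done by the assembler itself, inside the db declaration.

```nasm
section .data
    myvar db 2 + '0'        ; 2 + 0x30 = 0x32, the ASCII code for '2'

section .text
    global _start

_start:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, myvar          ; address of the byte to print
    mov edx, 1              ; print 1 byte
    int 0x80                ; prints the character 2

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

This trick only works for single digits (0 to 9), because each of them maps to exactly one ASCII character.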
Negative numbers
I haven't mentioned negative numbers so far.
Let's go back to how numbers are represented.
Machines do not represent numbers in the decimal system (in fact they work with zeros and ones); they use
the binary system (and we often write hexadecimal, since converting it from/to binary is straightforward).
Now, with one byte (8 bits) you can represent 2^8 = 256 different numbers, for instance 0 to 255,
or alternatively the numbers from -128 to +127.
In the second (signed) interpretation, a number is positive if its highest-order bit (the leftmost one) is 0
and negative if it's 1.
How positive numbers are converted to their negative counterparts seems a little weird in the beginning;
the scheme is called "two's complement".
You take the binary form of a number, invert each bit and finally add 1.
The result is MINUS the starting number.
Here's an example:
- Take the binary form of a number, say 5: 00000101
- Apply a logical NOT operation to each bit, result: 11111010
- Add 1 to the previous result: 11111011. This is -5
If you apply the two's complement again, you get back +5.
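The same steps can be tried on the CPU. This is a minimal sketch under the usual assumptions (32-bit Linux, int 0x80); it uses al, the lowest 8 bits of eax, which is introduced in the "Registers again" section. It negates 5 by hand, adds 5 back, and returns the result as the exit status.

```nasm
section .text
    global _start

_start:
    mov al, 5               ; al = 00000101b (+5)
    not al                  ; al = 11111010b (every bit inverted)
    inc al                  ; al = 11111011b (0xFB, that is -5)
    add al, 5               ; -5 + 5: the 8-bit result wraps around to 00000000b
    movzx ebx, al           ; exit status = al, zero-extended to 32 bits
    mov eax, 1              ; sys_exit
    int 0x80                ; exit status = 0
```

Typing "echo $?" after running it should show 0, which confirms that 11111011 really behaves as -5. The x86 instruction "neg" performs the invert-and-add-1 operation in a single step.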
Uninitialized variables
Reserving memory for future use and giving it a label is done in the .bss
section.
It's done almost like initialized data, using the REServe Byte directive or its
counterparts:
declaration | meaning            | size     | decimal values range
resb        | reserve byte       | 1 byte   | 0 - 255
resw        | reserve word       | 2 bytes  | 0 - 65535
resd        | reserve doubleword | 4 bytes  | 0 - 4294967295
resq        | reserve quadword   | 8 bytes  | pretty big
rest        | reserve ten bytes  | 10 bytes | pretty huge
Here's an example from the NASM manual:
buffer:     resb 64         ; reserve 64 bytes
wordvar:    resw 1          ; reserve a word
realarray   resq 10         ; array of ten reals
ymmval:     resy 1          ; one YMM register
zmmvals:    resz 32         ; 32 ZMM registers
You might notice that some location names are followed by a colon (unlike in the previous examples);
in NASM the colon after a label is optional.
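Reserved memory becomes useful once the program writes something into it. The sketch below makes the usual assumptions (32-bit Linux, int 0x80); the two-byte buffer is chosen just for this illustration. It fills the buffer at run time and prints it, going through al, the lowest 8 bits of eax (see the "Registers again" section).

```nasm
section .bss
    buffer: resb 2          ; 2 reserved bytes, no initial value given

section .text
    global _start

_start:
    mov al, 'o'
    mov [buffer], al        ; first reserved byte = 'o'
    mov al, 'k'
    mov [buffer + 1], al    ; second reserved byte = 'k'

    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, buffer         ; address of the reserved bytes
    mov edx, 2              ; print 2 bytes
    int 0x80                ; prints "ok"

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Note that buffer + 1 is simply the address one byte after the label, which again shows that labels are just positions in memory.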
Registers again
Before proceeding with some examples we must talk a bit more about registers.
We've already said that registers are small storage areas built into the CPU that can hold
some data.
First of all, let's take one byte of memory: it's composed of 8 bits.
A byte can thus be split in half: the first half (the one storing the most significant bits)
is called the "Higher" one, while the second half is the "Lower" one.
Half a byte (4 bits) is called a "Nibble", so one byte is composed of the higher nibble
and the lower nibble.
In pretty much the same fashion, two bytes (16 bits) can be split into the higher byte and the
lower byte.
Old architectures had 16-bit general purpose registers, called AX, BX, CX, DX.
Having a 16-bit register means that you can process 16 bits of data at a time, but the handy
thing is that assembly provides a way to access half a register if one wants to: just 8 bits
(1 byte).
Thus, the AX register can be accessed directly as "AX", or the programmer can access its
higher part AH or its lower part AL.
The same holds for the BX, CX and DX registers.
What's the point in doing this?
Imagine you have a 1-byte value stored in memory (with label "my_value").
In order to change that value you have to access that byte (only that byte!), not the following
memory areas, or you may change some other stored value.
Using a register of the appropriate size is thus mandatory to do things correctly!
Code 1: wrong, since we are writing out of bounds!
section .data
    mydata1 db 'y'          ; This data is 1 byte long
    mydata2 db 'j'          ; Some more data, stored after mydata1
[...]
    ; Now changing the byte at the mydata1 label
    mov ax, 'n'             ; we want to change the value 'y' to 'n'..
                            ; but we're making a mistake since ax is
                            ; 2 bytes long (16 bit)!
    mov [mydata1], ax       ; 2 bytes are written: data spills into mydata2
Code 2: correct
section .data
    mydata1 db 'y'          ; This data is 1 byte long
    mydata2 db 'j'          ; Some more data, stored after mydata1
[...]
    ; Now changing the byte at the mydata1 label
    mov ah, 'n'             ; we want to change the value 'y' to 'n'..
                            ; this is ok since ah is 1 byte long,
                            ; that is half of ax.
    mov [mydata1], ah       ; only 1 byte is written: mydata2 is untouched
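For reference, here is a complete program that stores a single byte through an 8-bit register and then prints both variables, so the effect can be checked on screen. It is only a sketch under the usual assumptions (32-bit Linux, int 0x80); the newline byte is an addition made for readability and is not part of the snippets above.

```nasm
section .data
    mydata1 db 'y'          ; 1 byte
    mydata2 db 'j'          ; 1 byte, stored right after mydata1
    newline db 10           ; line feed (added for this illustration)

section .text
    global _start

_start:
    mov al, 'n'             ; al is 1 byte long
    mov [mydata1], al       ; exactly 1 byte is written

    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, mydata1        ; start printing at mydata1
    mov edx, 3              ; mydata1 + mydata2 + newline
    int 0x80                ; prints "nj"

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

If the two instructions at the top are replaced with the 16-bit pair "mov ax, 'n'" and "mov [mydata1], ax", the 'j' is overwritten by a zero byte, because the higher byte of ax lands on mydata2.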
The same concept holds for 32-bit registers:
the *lower* part of the "EAX" register is called "AX" and is 16 bits long.
"AX" is further divided into "AH" and "AL".
The same holds for EBX, ECX, EDX etc.
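The nesting of EAX, AX, AH and AL can be observed with a recognizable value. The following is a minimal sketch under the usual assumptions (32-bit Linux, int 0x80); the constant 0x11223344 is arbitrary and chosen only so that every byte is different.

```nasm
section .text
    global _start

_start:
    mov eax, 0x11223344     ; fill all 32 bits of eax
                            ; ax = 0x3344 (lower 16 bits of eax)
                            ; ah = 0x33   (higher byte of ax)
                            ; al = 0x44   (lower byte of ax)
    movzx ebx, ah           ; ebx = 0x33 = 51
    mov eax, 1              ; sys_exit
    int 0x80                ; exit status = 51
```

Typing "echo $?" after running it should show 51, the decimal value of 0x33. The upper 16 bits of eax (0x1122 here) have no name of their own.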