Assembly Tutorial part 2

Basic Assembly Tutorial, part 2

February 2016, Cremona, Italy

Welcome to part II of this simple tutorial.

Variables do not exist, data does exist

OK, the very first thing you must keep in mind is that in assembly, variables are *not* containers of something (as in high-level languages); they are just labels referring to a location in memory. When assembling the code, the assembler replaces the mnemonic name you gave to the variable (such as my_var) with a memory address (such as 0x12345). In fact, defining a variable can be seen this way:

mnemonic name	size (1 byte)	value
my_var	db	2

This declaration fills 1 byte of memory, starting at a certain address (known as my_var), with the value 2. If you define another variable right after my_var, it will simply be placed 1 byte after the position my_var. Got it? If my_var were defined with "dw", it would fill 2 bytes of memory, and the next variable would be stored 2 bytes after the position referred to as "my_var".
PLEASE REMEMBER: it is better to talk about "labels" than "variables".
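
To see that labels are nothing more than addresses, here is a minimal sketch (not part of the original lesson). The names next_var, word_var and last_var are made up for illustration, and the build commands in the comments assume a 32-bit Linux target with NASM and GNU ld. The assembler itself computes the distance between two labels, and the program returns it as its exit status.

```nasm
; Assumed build commands (32-bit Linux):
;   nasm -f elf32 labels.asm -o labels.o
;   ld -m elf_i386 labels.o -o labels
section .data
    my_var      db 2        ; 1 byte at the address called my_var
    next_var    db 7        ; hypothetical label, 1 byte after my_var
    word_var    dw 300      ; hypothetical label, 2 bytes long, 2 bytes after my_var
    last_var    db 9        ; hypothetical label, 4 bytes after my_var

section .text
    global _start

_start:
    mov ebx, last_var - my_var  ; distance in bytes (1 + 1 + 2 = 4), computed by the assembler
    mov eax, 1                  ; sys_exit
    int 0x80                    ; the exit status is that distance
```

Running the program and then typing `echo $?` in the shell should show 4.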

The section .data allows one to fill memory with some data and provide a name to the starting point of the data block.
The section .bss allows one to reserve some memory for future usage and provide a name to the starting point of that data block.

Initialized data

In the Hello World program we've been using the "db" declaration, standing for "Define Byte". There are other declarations, see the table below. Since a byte is made of 8 bits, it can take 2⁸ = 256 different values, so 1 byte can represent the numbers from 0 to 255. Two bytes are 16 bits, so they can take 2¹⁶ different values, and so on.

declaration	meaning	size	decimal values range
db	define byte	1 byte	0 - 255
dw	define word	2 bytes	0 - 65535
dd	define doubleword	4 bytes	0 - 4294967295
dq	define quadword	8 bytes	0 - 18446744073709551615
dt	define ten bytes	10 bytes	2⁸⁰ bit patterns (in NASM, dt takes floating-point constants)

Note that (as far as I know) in x86 (and x86-64) architectures, 1 byte is the smallest addressable unit! You cannot address a single bit; the smallest thing you can address is a byte.
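
The ranges in the table are hard limits: a value that does not fit simply loses its upper bits. The sketch below is illustrative only. The label small is made up, and it uses add, movzx and the byte size keyword, which this lesson has not introduced. It adds 3 to a byte holding 254 and returns the stored result as the exit status.

```nasm
section .data
    small db 254                ; hypothetical label: 1 byte, so only 0 - 255 fits

section .text
    global _start

_start:
    add byte [small], 3         ; 254 + 3 = 257 does not fit in 8 bits, the byte wraps to 1
    movzx ebx, byte [small]     ; copy that byte into ebx, upper bits set to zero
    mov eax, 1                  ; sys_exit
    int 0x80                    ; exit status = 1
```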

EXAMPLE

    section .data
        myvar db 'k'            ; Initialized data

    section .text
        global _start           ; needed by linker

    _start:
        ; Print current value at location myvar (and following)
        mov eax, 4              ; sys_write
        mov ebx, 1              ; stdout
        mov ecx, myvar          ; memory location to find stuff
        mov edx, 1              ; length of data to print is 1 byte
        int 0x80

        ; Now let's modify the value 'k' to 'p'
        ; Here we first put the value inside a register,
        ; then we store it into memory
        mov eax, 'p'            ; mov the character 'p' into eax (this is
                                ; not good, but for now it's ok. See more
                                ; about registers!)
        mov [myvar], eax        ; store the content of eax at location myvar

        ; Print the new value at memory location myvar
        mov eax, 4              ; sys_write
        mov ebx, 1              ; stdout
        mov ecx, myvar          ; memory location to find stuff
        mov edx, 1              ; length of data to print is 1 byte
        int 0x80

        ; Exit
        mov eax, 1
        mov ebx, 0
        int 0x80

A couple of things you must note in the previous example:

initialized data can be modified
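to modify data in memory, this example goes through a register (x86 cannot move data from memory to memory with a single mov; storing a constant directly also works if you state the size, as in mov byte [myvar], 'p')
myvar is an address in memory; to refer to the content stored at that address, you enclose it in square brackets: [myvar].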
WE HAVE USED THE EAX REGISTER TO TEMPORARILY PUT DATA. THIS WILL PROVE TO BE WRONG! SEE SECTION "Registers part 2"!

ASCII and sys_write

Consider the following code:

    ;;;;; I want to print myvar ;;;;;
    section .data
        myvar db 'y'        ; saves in myvar the ASCII code for 'y'
    [....]
        mov eax, 4          ; sys_write
        mov ebx, 1          ; stdout
        mov ecx, myvar
        mov edx, 1
        int 0x80            ; runs system call

Since you've written 'y' in single quotes, the assembler takes the ASCII value of the character 'y' (that is 0x79, or 01111001 in binary, or 121 in decimal) and stores it at the location labelled 'myvar'. A few lines later, sys_write reads the byte 0x79 from that location and prints the corresponding ASCII character. Remember the syntax of sys_write? It expects characters (ASCII encoded)!
Let's now say you wish to print the number 2: the following code will produce an unexpected output!

    ;;;;; I want to print the number 2 ;;;;;
    section .data
        myvar db 2          ; WRONG CODE SNIPPET!!!!!!
    [....]

When passed to sys_write, it will print the character whose ASCII code is 2, which is not a printable character (it's a control character). The right choice is to tell the assembler that "2" is a character:

    ;;;;; I want to print the number 2 ;;;;;
    section .data
        myvar db '2'        ; THIS TIME RESULT WILL BE OK
    [....]
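
If the value really is a number and you want to print it as a digit, a common trick is to add the ASCII code of '0' (0x30) to it. This is a minimal sketch, not part of the original lesson. It only works for a single digit from 0 to 9, the labels digit and newline are made up, and it uses the add instruction and the byte size keyword, which are not covered here.

```nasm
section .data
    digit   db 2                ; hypothetical label: the numeric value 2, not the character '2'
    newline db 10               ; hypothetical label: ASCII line feed, stored right after digit

section .text
    global _start

_start:
    add byte [digit], '0'       ; 2 + 0x30 = 0x32, the ASCII code of '2'

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, digit              ; start printing at digit
    mov edx, 2                  ; 2 bytes: the digit and the newline that follows it
    int 0x80

    mov eax, 1                  ; sys_exit
    mov ebx, 0
    int 0x80
```

Because newline is placed 1 byte after digit, a single sys_write with length 2 prints both.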

Negative numbers

I've never mentioned negative numbers so far. Let's go back to how numbers are represented. Machines do not represent numbers in the decimal system; they work with zeros and ones, that is, the binary system (and we often write hexadecimal, since converting to and from binary is straightforward). Now, with one byte (8 bits) you can represent 2⁸ = 256 different numbers: either 0 to 255 or, if we want negative numbers too, -128 to +127.
In the signed interpretation, a number is positive (or zero) if its highest-order bit (the leftmost one) is 0 and negative if it's 1. How positive numbers are converted to their negative counterparts seems a little weird in the beginning and is called the "two's complement system". One has to take the binary form of a number, invert each bit and finally add 1. This results in MINUS the starting number.
Here's an example:

Take the binary form of a number, say 5, written on 8 bits: 00000101
Apply a logical NOT operation to each bit, result: 11111010
Add 1 to the previous result: 11111011. This is -5

If you apply the two's complement procedure again to -5, you get back +5.
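
The same steps can be tried on the CPU. The sketch below is illustrative: it uses the 8-bit register al and the instructions not, add and movzx, which this lesson has not covered. It builds -5 in one byte and returns that byte as the exit status. The shell reads the status as an unsigned byte, so it shows the same bit pattern as a number between 0 and 255.

```nasm
section .text
    global _start

_start:
    mov al, 5           ; al = 00000101 (+5)
    not al              ; invert each bit: al = 11111010
    add al, 1           ; add 1: al = 11111011 (0xFB), that is -5 as a signed byte
    movzx ebx, al       ; copy the byte into ebx before eax is reused
    mov eax, 1          ; sys_exit
    int 0x80            ; exit status = 251, the unsigned reading of 11111011
```

Note that 251 - 256 = -5: the bits are the same, only the interpretation changes.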

Uninitialized variables

Reserving memory for future use and giving a label to it is done in the .bss section. It's done almost like initialized data, using the REServe Byte directive or its counterparts, each followed by the number of units to reserve:

declaration	meaning	size	decimal values range
resb	reserve byte	1 byte	0 - 255
resw	reserve word	2 bytes	0 - 65535
resd	reserve doubleword	4 bytes	0 - 4294967295
resq	reserve quadword	8 bytes	0 - 18446744073709551615
rest	reserve ten bytes	10 bytes	2⁸⁰ bit patterns

Here's an example from the NASM manual:

    buffer:     resb 64     ; reserve 64 bytes
    wordvar:    resw 1      ; reserve a word
    realarray   resq 10     ; array of ten reals
    ymmval:     resy 1      ; one YMM register
    zmmvals:    resz 32     ; 32 ZMM registers

You might notice that most location names here are followed by a colon, unlike in the previous examples. In NASM the colon after a label is optional (realarray above has none).
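
Here is a minimal sketch of how reserved memory might be used. The label buffer follows the manual's naming, but this program is not from the NASM manual. It reserves two bytes in .bss, fills them at run time through the 8-bit register al, and prints them.

```nasm
section .bss
    buffer resb 2           ; reserve 2 bytes, no initial value given

section .text
    global _start

_start:
    mov al, 'A'
    mov [buffer], al        ; first reserved byte = 'A'
    mov al, 10              ; ASCII code of the newline character
    mov [buffer + 1], al    ; second reserved byte = newline

    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, buffer         ; address of the reserved memory
    mov edx, 2              ; print 2 bytes
    int 0x80

    mov eax, 1              ; sys_exit
    mov ebx, 0
    int 0x80
```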

Registers again

Before proceeding with some examples we must talk a bit more about registers. We've already said that registers are small storage areas built into the CPU that can hold some data.
First of all, let's take one byte of memory: it's composed of 8 bits. A byte can thus be split in half: the first half (the one storing the most significant bits) is called the "higher" one, while the second half is the "lower" one. Half a byte (4 bits) is called a "nibble", so one byte is composed of the higher nibble and the lower nibble.

In pretty much the same fashion, two bytes (16 bits) can be split into the higher byte and the lower byte.
Old architectures had 16-bit general purpose registers, called AX, BX, CX, DX. Having a 16-bit register means that you can process 16 bits of data at a time, but the handy thing is that assembly provides a way to access half a register if one wants to: just 8 bits (1 byte). Thus, the AX register can be accessed as a whole as "AX", or the programmer can access its higher part AH or its lower part AL. The same holds for the BX, CX and DX registers.
What's the point in doing this?
Imagine you have a 1 byte long value stored in memory (with label "my_value"). In order to change that value you have to access that byte (only that byte!) not the following memory areas or you may change some other stored value. Using a register of the appropriate size is thus mandatory to do things correctly!
Code 1: wrong, since we are writing out of bounds!

    section .data
        mydata1 db 'y'      ; This data is 1 byte long
        mydata2 db 'j'      ; Some more data, stored after mydata1
    [...]
        ; Now changing the byte from mydata1 label on
        mov ax, 'n'         ; we want to change value 'y' to 'n'..
                            ; but we're making a mistake since ax is
                            ; 2 bytes long (16 bit)!
        mov [mydata1], ax   ; Data will spill out into mydata2

Code 2: correct

    section .data
        mydata1 db 'y'      ; This data is 1 byte long
        mydata2 db 'j'      ; Some more data, stored after mydata1
    [...]
        ; Now changing the byte from mydata1 label on
        mov ah, 'n'         ; we want to change value 'y' to 'n'..
                            ; this is ok since ah is 1 byte long
                            ; that is half of ax.
        mov [mydata1], ah   ; Only 1 byte is written, mydata2 is untouched
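
To check that a one-byte store leaves the neighbouring data alone, here is a minimal complete sketch. It uses the 8-bit register al (the lower half of ax) and the movzx instruction, which this lesson has not introduced. The exit status carries the value of mydata2 after the write.

```nasm
section .data
    mydata1 db 'y'              ; This data is 1 byte long
    mydata2 db 'j'              ; Some more data, stored after mydata1

section .text
    global _start

_start:
    mov al, 'n'                 ; al is 1 byte long
    mov [mydata1], al           ; writes exactly 1 byte: mydata1 becomes 'n'

    movzx ebx, byte [mydata2]   ; read the neighbouring byte back into ebx
    mov eax, 1                  ; sys_exit
    int 0x80                    ; exit status = value of mydata2
```

The exit status should be 106, the ASCII code of 'j'. With the two instructions of Code 1 instead (mov ax, 'n' followed by mov [mydata1], ax), it becomes 0, because the upper byte of ax, which is zero, overwrites mydata2.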

The same concept holds for 32-bit registers: the *lower* part of the "EAX" register is called "AX" and is 16 bits long. "AX" is further divided into "AH" and "AL". The same holds for EBX, ECX and EDX.
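
The nesting of EAX, AX, AH and AL can be checked with a short sketch. The constant 0x12345678 is an arbitrary value chosen for illustration, and movzx is an instruction this lesson has not covered.

```nasm
section .text
    global _start

_start:
    mov eax, 0x12345678     ; fill all 32 bits of eax
                            ; ax = 0x5678 (lower 16 bits of eax)
                            ; ah = 0x56   (higher byte of ax)
                            ; al = 0x78   (lower byte of ax)
    movzx ebx, ah           ; copy ah into ebx, upper bits set to zero
    mov eax, 1              ; sys_exit
    int 0x80                ; exit status = 0x56 = 86
```

There is no name for the upper 16 bits of eax (0x1234 here); they can only be reached through the full 32-bit register.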