Hello World Operating Systems
After pressing the “ON” button on your computer, the BIOS of the computer reads
512 bytes from the boot devices and, if it detects a two-byte “magic number” at the
end of those 512 bytes, loads the data from these 512 bytes as code and runs it.
This kind of code is called a “boot loader” (or “boot sector”) and we’re writing a tiny
bit of assembly code to make a virtual machine run our code and display “Hello
world” for the fun of it.
Boot loaders are also the very first stage of starting any operating system.
When the computer starts, the BIOS reads the first 512 bytes from the boot devices and checks whether the last two of these 512 bytes contain a magic number (0x55AA). If they do, the BIOS copies the 512 bytes to the memory address 0x7c00 and treats whatever is at the beginning of those 512 bytes as code: the so-called boot loader. In this article we will write such a piece of code, have it print the text "Hello World!" and then go into an infinite loop.
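To make that check concrete, here is a small Python model of what the BIOS does with a candidate sector. It is a sketch, not real firmware. Memory is a plain bytearray standing in for the first MiB of RAM, and the names is_boot_sector and load_boot_sector are made up for illustration.

```python
SECTOR_SIZE = 512
LOAD_ADDRESS = 0x7C00  # where the BIOS places the boot sector in memory


def is_boot_sector(sector: bytes) -> bool:
    """True if the sector is 512 bytes long and ends in the bytes 0x55, 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def load_boot_sector(memory: bytearray, sector: bytes) -> int:
    """Copy a bootable sector to LOAD_ADDRESS and return where execution would start."""
    if not is_boot_sector(sector):
        raise ValueError("missing boot signature 0x55, 0xAA")
    memory[LOAD_ADDRESS:LOAD_ADDRESS + SECTOR_SIZE] = sector
    return LOAD_ADDRESS


memory = bytearray(1024 * 1024)  # stand-in for the 1 MiB real-mode address space
sector = bytes(510) + b"\x55\xaa"  # 510 zero bytes followed by the signature
print(hex(load_boot_sector(memory, sector)))  # 0x7c00
```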
Real bootloaders usually load the actual operating system code into memory, change the CPU into the so-called protected mode and run the actual operating system code.
So in short:
1. The power supply gives steady, low-voltage power to all necessary components.
2. The CPU starts executing code stored in a small read-only memory in your computer, called the BIOS.
3. The BIOS reads 512 bytes from each device that is configured to be bootable.
4. Once it finds 512 bytes that end in two specific bytes (the magic number), it considers them to be a boot loader and runs them as a program.
5. This boot loader then usually loads the actual operating system and runs it, amongst a few other administrative things.
In this article, we’ll write our own boot loader which will print “hello world” and
then stop executing. Not really a billion-dollar program, but the start of what could
become your own operating system!
For those of you who are not familiar with x86 assembly language and/or the GNU assembler, I created this description that explains just enough assembly to get you up to speed for the rest of this article. The assembly code within this article will also be commented, so you should be able to skim the code snippets without knowing much about the details of assembly.
Keep in mind that our program has to fit into those 512 bytes and carry the magic number at its end. It's also worth mentioning that no matter whether you have a 32- or 64-bit x86 processor, at boot time the processor runs in 16-bit real mode, so our program needs to deal with that.
Let's create our boot.s file for our assembly source code and tell the GNU assembler that we'll use 16 bits:
.code16 # use 16 bits
Ah, this is going great! Next up, we need a starting point for our program, and we need to make it available to the linker (more on that in a few moments):
.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: # this is the beginning of our binary later
Note: You can call your label whatever you wish. The standard would be _start, but I chose init to illustrate that you can call it anything, really.
Next, we put an instruction after the label:
jmp init # jump to "init"
Nice, now we even got an infinite loop, because we keep jumping to the label, then jump to the label again…
Time to turn our code into some binary by running the GNU assembler (as) and see what we got:
$ as -o boot.o boot.s
$ ls -lh .
784 boot.o
152 boot.s
Woah, hold on! Our output is already 784 bytes? But we only have 512 bytes for our
bootloader!
Well, most of the time developers are interested in creating an executable file for the operating system they are targeting, e.g. an EXE (Windows) or ELF (Unix) file. These files have a header (read: additional, preceding bytes) and usually load a few system libraries to access operating system functionality.
Our case is different: we want none of that, just our code in binary for the BIOS to execute upon boot.
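These headers are easy to spot because they start with fixed "magic" bytes of their own. ELF files begin with 0x7F followed by "ELF", and Windows EXE files begin with "MZ". The Python sketch below uses that to tell the formats apart. The function name describe is made up, and the sketch only looks at the first bytes; it is not a real file-format parser.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file
MZ_MAGIC = b"MZ"        # first two bytes of Windows .exe files (DOS header)


def describe(data: bytes) -> str:
    """Rough guess at what kind of file a blob of bytes is, based on its first bytes."""
    if data.startswith(ELF_MAGIC):
        return "ELF"
    if data.startswith(MZ_MAGIC):
        return "EXE"
    return "no known header (possibly plain machine code)"


# Usage, assuming boot.o from the previous step exists in the current directory:
# with open("boot.o", "rb") as f:
#     print(describe(f.read()))
print(describe(b"\x7fELF\x02\x01\x01"))  # ELF
```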
The assembler, however, produces an object file (on Linux, an ELF file) that wraps our code in exactly this kind of additional data. So we need one additional step that strips the unwanted data. We can use the linker (GNU's linker is called ld) for this step.
The linker is normally used to combine various libraries and the object files produced by other tools, such as compilers or assemblers, into one final file. In our case we want to produce a "plain binary file", so we will pass --oformat binary to ld when we run it. We also want to specify where our program starts, so we tell the linker to use the starting label (I called it init) in our code as the program's entry point by using the -e init flag.
$ as -o boot.o boot.s
$ ld -o boot.bin --oformat binary -e init boot.o
$ ls -lh .
3 boot.bin
784 boot.o
152 boot.s
Okay, three bytes sounds much better, but this won't boot up, because it is missing the magic number 0x55AA at bytes 511 and 512 of our binary...
Making it bootable
Luckily, we can just fill our binary with a bunch of zeroes and add the magic number as data at the end.
Let's start by adding zeroes until our binary file is 510 bytes long (because the last two bytes will be the magic number).
We can use the .fill directive from as to do that. The syntax is .fill count, size, value: it emits count copies of value, each size bytes long, wherever we write this directive into our assembly code in boot.s.
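To see what that produces byte by byte, here is a rough Python model of .fill. It assumes little-endian x86 and only handles sizes 1, 2 and 4. GNU as has additional rules for other sizes that this sketch deliberately leaves out.

```python
def fill(count: int, size: int, value: int) -> bytes:
    """Rough model of GNU as '.fill count, size, value' for little-endian x86.

    Emits `count` copies of `value`, each stored in `size` bytes, low byte first.
    Only sizes 1, 2 and 4 are modelled here.
    """
    if size not in (1, 2, 4):
        raise ValueError("this sketch only models sizes 1, 2 and 4")
    mask = (1 << (8 * size)) - 1  # keep only as many bytes as fit into `size`
    one = (value & mask).to_bytes(size, "little")
    return one * count


print(fill(4, 1, 0).hex())       # 00000000
print(fill(2, 2, 0xAA55).hex())  # 55aa55aa
```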
But how do we know how many bytes we need to fill in? Conveniently, the assembler helps us again. We need a total of 510 bytes, so we will fill 510 - (byte size of our code) bytes with zeroes. But what is the "byte size of our code"? Luckily, as has a helper that tells us the current byte position within the generated binary, the symbol ".", and we can get the positions of labels, too. So our code size is the current position . after our code minus the position of the first statement in our code (which is the position of init). So .-init returns the number of bytes our code generates in the final binary file...
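The arithmetic behind 510 - (. - init) can be checked with a few lines of Python. The three code bytes below are an illustrative assumption that matches the 3-byte boot.bin from earlier. 0xE9 0xFD 0xFF encodes a 16-bit near jmp with displacement -3, which jumps back to its own start. padding_needed is a hypothetical helper name.

```python
SIGNATURE_OFFSET = 510  # bytes 510 and 511 are reserved for the magic number


def padding_needed(code_size: int) -> int:
    """How many zero bytes have to follow `code_size` bytes of code to reach 510 bytes."""
    if code_size > SIGNATURE_OFFSET:
        raise ValueError(f"{code_size} bytes of code do not fit before the signature")
    return SIGNATURE_OFFSET - code_size


code = bytes([0xE9, 0xFD, 0xFF])  # assumed: 16-bit near jmp to itself
image = code + bytes(padding_needed(len(code)))
print(padding_needed(len(code)), len(image))  # 507 510
```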
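.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: jmp init # jump to "init", forever
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long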
$ as -o boot.o boot.s
$ ld -o boot.bin --oformat binary -e init boot.o
$ ls -lh .
510 boot.bin
1.3k boot.o
176 boot.s
We’re getting there — still missing the final two bytes of our magic word:
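.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: jmp init # jump to "init", forever
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable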
Oh wait… if the magic bytes are 0x55aa, why are we swapping them here?
That is because x86 is little endian. .word stores the low byte of 0xaa55 (0x55) first and the high byte (0xaa) second, so the bytes end up in the binary as 0x55, 0xaa.
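Python's struct module can show the byte order directly. The format "<H" means a little-endian unsigned 16-bit value, which is how .word lays out 0xaa55 on x86.

```python
import struct

# ".word 0xaa55" stores a 16-bit value little-endian: low byte first, then high byte.
signature = struct.pack("<H", 0xAA55)
print(signature.hex())  # 55aa -> byte 510 is 0x55, byte 511 is 0xaa

# Reading those two bytes back as a little-endian word gives the original value.
(word,) = struct.unpack("<H", signature)
print(hex(word))  # 0xaa55

# Writing 0x55aa instead would put the bytes in the wrong order for the BIOS.
print(struct.pack("<H", 0x55AA).hex())  # aa55
```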
You could theoretically write this binary into the first 512 bytes of a USB drive, a floppy disk or whatever else your computer is happy booting from, but let's use a simple x86 emulator (it's like a virtual machine) instead.
I will use QEmu with an x86 system architecture for this:
qemu-system-x86_64 boot.bin
The fact that QEmu stops looking for bootable devices means that our bootloader
worked — but it doesn’t do anything yet!
To prove that, we can cause a reboot loop instead of an infinite loop that does
nothing by changing our assembly code to this:
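.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: ljmpw $0xFFFF, $0 # jumps to the "reset vector", doing a warm reboot
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable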
This new instruction, ljmpw $0xFFFF, $0, jumps to the so-called reset vector.
This effectively re-executes the very first instruction the system runs after booting, without actually power-cycling the machine. It's sometimes referred to as a "warm reboot".
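In 16-bit real mode, a segment:offset pair like $0xFFFF, $0 becomes a physical address by multiplying the segment by 16 and adding the offset. The Python sketch below applies that rule. physical_address is a made-up helper name, and the sketch ignores the A20 wrap-around that real hardware may apply above 1 MiB. The result, 0xFFFF0, is where the original 8086 started executing after a reset.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# ljmpw $0xFFFF, $0 lands 16 bytes below the 1 MiB mark.
print(hex(physical_address(0xFFFF, 0x0000)))  # 0xffff0

# Different segment:offset pairs can name the same byte, e.g. our load address 0x7c00:
print(physical_address(0x0000, 0x7C00) == physical_address(0x07C0, 0x0000))  # True
```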
Luckily, we have the BIOS still around and reachable, so we can make use of its
functions. These functions (along with a bunch of functions that different
hardware provides) are available to us via the so-called interrupts.
In Ralf Brown’s interrupt list we can find the video interrupt 0x10.
A single interrupt can carry out many different functions, which are usually selected by the value in the AH register (the upper byte of AX). In our case the function "Teletype" sounds like a good match: it prints the character given in al and automatically advances the cursor. Nifty! We can select that function by setting ah to 0xe, putting the ASCII code we want to print into al and then calling int 0x10:
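Because AH is the high byte of AX and AL is the low byte, a single 16-bit load into ax can set both at once. The Python sketch below models this. "A" (0x41) is only an example character, and join_ax and split_ax are made-up helper names.

```python
from __future__ import annotations


def join_ax(ah: int, al: int) -> int:
    """Build the 16-bit AX value from its high byte (AH) and low byte (AL)."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


def split_ax(ax: int) -> tuple[int, int]:
    """Split a 16-bit AX value into (AH, AL)."""
    return (ax >> 8) & 0xFF, ax & 0xFF


TELETYPE = 0x0E  # int 0x10 function number, selected via ah
ax = join_ax(TELETYPE, ord("A"))
print(hex(ax))       # 0xe41
print(split_ax(ax))  # (14, 65)
```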
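.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: mov $0x0e41, %ax # sets ah to 0xe (teletype) and al to 0x41 (ASCII "A")
  int $0x10 # call the function in ah from interrupt 0x10
  hlt # stop executing
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable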
Now we're loading the necessary value into the ax register, calling interrupt 0x10 and halting the execution (using hlt).
When we run as and ld to get our updated bootloader, QEmu prints our character on the screen.
We can even see that the cursor blinks at the next position, so this function should
be easy to use with longer messages, right?
To get a full message to display, we will need a way to store this information in our binary. We can do that similarly to how we store the magic word at the end of our binary, but we'll use a different directive than .word, as we wanna store a full string.
as luckily comes with .ascii and .asciz for strings. The difference between them is that .asciz automatically adds another byte that is set to zero. This will come in handy in a moment, so we choose .asciz for our data.
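The difference between the two directives is easy to see in bytes. The Python sketch below models what each one emits for a plain ASCII string. The function names are hypothetical, and escape sequences such as \n are not modelled.

```python
def ascii_bytes(text: str) -> bytes:
    """What '.ascii "text"' emits: only the characters."""
    return text.encode("ascii")


def asciz_bytes(text: str) -> bytes:
    """What '.asciz "text"' emits: the characters followed by one zero byte."""
    return text.encode("ascii") + b"\x00"


print(len(ascii_bytes("Hello world!")), len(asciz_bytes("Hello world!")))  # 12 13
print(asciz_bytes("Hi"))  # b'Hi\x00'
```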
Also, we will use a label to give us access to the address:
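.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init: mov $msg, %bx # loads the address of msg into bx
  mov (%bx), %al # loads the byte stored at the address in bx into al
  mov $0xe, %ah # selects the teletype function of int 0x10
  int $0x10 # prints the character in al
  hlt # stop executing
msg: .asciz "Hello world!" # stores the string (plus a byte with value 0) and gives us access via $msg
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable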
We have one new feature in there:
The first line loads the address of the first byte of msg into the register bx (we use the entire 16-bit register because addresses are 16 bits long).
The second line then loads the value that is stored at the address in bx into al. Because bx points to the start of the message, the first character of the message ends up in al.
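The difference between the address in bx and the byte stored at that address can be modelled in Python, using a bytearray as memory. The address 0x7C10 for msg is hypothetical; the real one depends on how many bytes of code come before the label.

```python
memory = bytearray(0x10000)      # one 64 KiB segment is plenty for this sketch
MSG_ADDRESS = 0x7C10             # hypothetical address of the msg label
message = b"Hello world!\x00"    # what .asciz "Hello world!" emits
memory[MSG_ADDRESS:MSG_ADDRESS + len(message)] = message

bx = MSG_ADDRESS  # first line: bx now holds an address, not a character
al = memory[bx]   # second line: al gets the byte stored at that address
print(hex(bx), chr(al))  # 0x7c10 H
```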
$ as -o boot.o boot.s
$ ld -o boot.bin --oformat binary -e init boot.o
Well, ld refuses to produce our binary. It turns out that the address ld assigns to msg doesn't fit into our 16-bit address space. We can fix that by telling ld where our program memory should start. The BIOS will load our code at address 0x7c00, so we will make that our starting address by specifying -Ttext 0x7c00 when we call the linker:
$ as -o boot.o boot.s
$ ld -o boot.bin --oformat binary -e init -Ttext 0x7c00 boot.o
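Why does -Ttext 0x7c00 help? The linker computes msg's absolute address as the start address of the code plus msg's offset within it. A 16-bit register can only hold values up to 0xFFFF.

The Python check below illustrates this under two assumptions:
- msg sits at a hypothetical offset of 0x0A.
- 0x400000 stands in for a typical x86-64 ELF start address. The exact default ld uses may differ.

```python
def fits_in_16_bits(address: int) -> bool:
    """True if the address fits into a 16-bit register like bx."""
    return 0 <= address <= 0xFFFF


MSG_OFFSET = 0x0A  # hypothetical offset of msg from the start of our code

for start in (0x400000, 0x7C00):  # illustrative ELF-style start vs. -Ttext 0x7c00
    msg_address = start + MSG_OFFSET
    print(hex(msg_address), fits_in_16_bits(msg_address))
# prints:
# 0x40000a False
# 0x7c0a True
```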
QEmu will now print “H”, the first character of our message text.
Printing one character is nice, but we want the whole message. Conveniently, x86 has a special register and a bunch of special instructions to deal with strings.
In order to use these instructions, we will load the address of our string (msg) into the special register si. This lets us use the convenient lodsb instruction, which loads a byte from the address that si points to into al and increments the address in si at the same time.
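.code16 # use 16 bits
.global init # makes our label "init" available to the outside
init:
  mov $msg, %si # loads the address of msg into si
  mov $0xe, %ah # loads 0xe (function number for int 0x10) into ah
print_char:
  lodsb # loads the byte from the address in si into al and increments si
  cmp $0, %al # compares content in AL with zero
  je done # if al == 0, go to "done"
  int $0x10 # prints the character in al to screen
  jmp print_char # repeat with next byte
done:
  hlt # stop execution
  jmp done # hlt can be woken up by interrupts, so halt again
msg: .asciz "Hello world!" # our message, terminated by a zero byte
.fill 510-(.-init), 1, 0 # add zeroes to make it 510 bytes long
.word 0xaa55 # magic bytes that tell BIOS that this is bootable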
Let's run this new code in QEmu. Yay! Our message shows up on the screen.
It prints our message by looping from print_char to jmp print_char until lodsb loads the zero byte (which comes right after the last character of our message) into al. Once we find the zero byte, we jump to done and halt execution.
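The loop can be mirrored in Python to double-check its logic. lodsb becomes "read memory[si], then increment si", and the cmp/je pair becomes a check for zero. The address 0x7C20 is hypothetical, print_string is a made-up name, and the sketch assumes the direction flag is clear so that lodsb counts upwards.

```python
def print_string(memory: bytes, si: int) -> str:
    """Mirror of the print_char loop; returns the text that would appear on screen."""
    screen = []
    while True:
        al = memory[si]  # lodsb: load the byte at si into al...
        si += 1          # ...and advance si to the next byte
        if al == 0:      # cmp $0, %al / je done
            break
        screen.append(chr(al))  # int $0x10 with ah = 0xe prints al
    return "".join(screen)


memory = bytearray(0x10000)
memory[0x7C20:0x7C20 + 13] = b"Hello world!\x00"  # hypothetical address of msg
print(print_string(memory, 0x7C20))  # Hello world!
```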
The same boot loader can also be written for nasm instead of as. First things first: nasm can produce a raw binary by itself, and it uses the Intel syntax, where instructions are written as
operation target, source. I remember the order with "W,T,F": "What, To, From" ;-)
times 510-($-$$) db 0 ; fill the output file with zeroes until 510 bytes are full
boot.asm .
Note that the order of arguments for cmp is the opposite of the order that as uses, and that [org] in nasm and .org in as are not the same thing!
nasm does not take the extra step via an ELF file (boot.o), so it won't move our msg to an address that doesn't fit into 16 bits.
Yet, if we forget to set the start address of our code to 0x7c00 (with [org 0x7c00]), the address that the binary uses for msg will still be wrong, because nasm assumes a start address of 0 by default. When we explicitly set it to 0x7c00 (where the BIOS loads our code), the addresses are calculated correctly in the binary and the code works just like the other version does.
Written by Martin Splitt