L02 OS Structures


38 - Introduction

As you've seen previously, an operating system has to protect the integrity of the hardware resources it manages
while providing the services to the applications. Thus it has many responsibilities and a variety of functional
components that provide these services.

But how should all these pieces fit together?


At least some of the components of the operating system will have to run in a privileged mode of the processor
architecture that allows them access to hardware. But must the whole operating system have this privilege?
Also, if an application would benefit from having certain services (for example, memory management) handled
in a particular way, can we personalize the services to suit the needs of the application? In other words, can we
make the operating system flexible in terms of the policies it implements for the services it offers? Does this
flexibility have to come at the price of performance and/or safety of the operating system?
These are some of the questions we will try to answer in this course module. In this course module, we will
learn the basics of operating system structuring issues.
We will use SPIN and Exokernel as case studies of two closely-related approaches to providing extensibility
of operating system services. We will then study a microkernel-based approach as well, using the L3 microkernel.
39 - OS System Services Question

This is a free-form quiz. What I want you to do is name as many system services as you can that you expect
from an operating system.

40 - OS System Services Solution

You may have hit many of the services that I've identified here, and perhaps even more. Even if you did not get
some of the things that I've listed here, that's okay. This is just to refresh your memory as to what system services
one can expect from an operating system.
41 - OS Structure

What do we mean by operating system structure? What we mean by this term is the way the operating system
software is organized with respect to the applications that it serves and the underlying hardware that it manages.
It's sort of like the burger between the buns. The applications are at the top, the technology or hardware is at the
bottom, and the system software of the operating system is what connects the applications to the underlying hardware.

42 - Importance of OS Structure Question

The question is why do you think the structure of the operating system is important?

43 - Importance of OS Structure Solution

If you checked off all the boxes you're right on. All of these issues are important issues to worry about in the
structure of an Operating System.
44 - Goals of OS Structure

Let's now elaborate on the goals of operating system structure.

• The first goal is protection. By protection what we mean is protecting the user from the system and the
system from the user, and also users from one another. Also protecting an individual user from his or her
own mistakes.
• An operating system, of course, provides services and one of the key determinants of a good operating
system structure is how good the performance of the operating system is. That is, what is the time taken
to perform services on behalf of the application. You've heard me say this even before in the previous
lecture. A good operating system is one that provides the service that is needed by the application very
quickly and gets out of the way.
• Flexibility, sometimes also called extensibility, meaning that a service that is provided by the operating
system is not one size fits all, but the service is something that is adaptable to the requirements of the
application.
• Another important goal is to ensure that the performance of the operating system goes up as you add more
hardware resources to the system. This is sort of an intuitive understanding, but you want to make sure
that the operating system delivers on this intuitive understanding: when you increase the hardware
resources, the performance also goes up. That's what is meant by scalability.
• It turns out that both the needs of the application may change over the lifetime of an application and also
the resources that are available for the operating system to manage and give to the application may change
over time. Agility of the operating system refers to how quickly the operating system adapts itself to
changes either in the application needs or the resource availability from the underlying hardware.
• Another worthwhile goal of operating system structure would be responsiveness. That is, how quickly
the operating system reacts to external events, and this is particularly important for applications that are
interactive in nature. Imagine you are playing a video game. In that case, what you want to see is when
you do something like clicking the mouse to shoot at a target, you want to see action immediately on the
screen. So that is responsiveness, how quickly the operating system is reacting to external events.
45 - Commercial OS

Are all the goals simultaneously achievable in a given operating system? At first glance it would seem that some
of the goals conflict with one another. For example, it might seem that to achieve performance, we may have to
sacrifice protection and/or flexibility.
Let's explore how researchers have used their ingenuity to have the cake and eat it too. You're probably
wondering how the commercial operating systems that you and I use on an everyday basis meet many of these
goals that I identified.
The short answer to that question is they don't meet all the goals.
We will return to the influence of research leading up to the current state of the art in operating system structure
towards the end of this course module.
46 - Monolithic Structure

Now let's talk about different approaches to operating system structuring.

The first structure that I will introduce to you is what we will call a monolithic structure.
• You have the hardware at the bottom which is managed by the operating system and hardware includes
the CPU, memory, peripheral devices such as the network and storage and so on.
• There are applications at the top. Each of these applications is in its own hardware address space. What
that means is that every application is protected from one another because the hardware ensures that the
address space occupied by one application is different from the other applications and that is the first level
of protection that you get between the applications themselves.
• All the services that applications expect from the operating system are contained in this blob and that
might include file system and network access, scheduling these applications on the available CPU, virtual
memory management, and access to other peripheral devices.
• The OS itself is a program providing entry points for the applications to the OS services. The code and
the data structure of the OS is contained in its own hardware address space. What that means is that the
OS is protected from the applications and vice-versa.
• So even if an application were to misbehave either maliciously or unintentionally, it won’t affect the
integrity of the OS because they are in their own address spaces.
• When an application needs any system service, we switch from the application address space to the OS
address space and execute the system code that provides the service. For example, accessing the file from
the hard disk, or dynamic allocation of more memory that an application may want, or sending a
message on the network. All of these things are done within the confinement of the address space of the
operating system itself.
Note that all of the services expected of the operating system, file system, memory management, CPU
scheduling, network and so on, are all contained in this one big blob. That is the reason it's also sometimes referred
to as the monolithic structure of an operating system.
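To make the two border crossings concrete, here is a minimal sketch (Linux-specific, using the real syscall(2) wrapper) of an application requesting a service from a monolithic kernel: a single trap instruction enters the kernel's address space, the service runs in privileged mode, and a single return comes back.

```c
/* Minimal sketch (Linux): a system call is one trap into the kernel's
 * address space and one return back -- two border crossings in total. */
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const char msg[] = "hello from user space\n";
    /* syscall() executes the trap instruction; the kernel runs the write
     * service in privileged mode, then returns control to the application. */
    syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
    return 0;
}
```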
47 - DOS-like Structure

Some of you may remember Microsoft's first entry in the world of PCs, with their operating system called DOS,
or Disk Operating System. The structure of DOS looks as shown here. And at first glance, at least visually, you
might think that this structure is very similar to what I showed you as a monolithic structure before.

48 - DOS-like Structure Pros and Cons Question

49 - DOS-like Structure Pros and Cons Solution

You may have noticed visually that the key difference is that the red line was replaced by a dotted line separating
the application from the operating system.
• What you get out of that is performance. Access to system services is going to be like a procedure call.
• What is lost in the DOS-like structure is the fact that you don't have protection of the operating system
from the application. An errant application can corrupt the operating system. We'll elaborate on this in the
next few panels.
50 - DOS-like Structure (cont)

So in the DOS-like structure, the main difference from the monolithic structure is that the red line separating the
application from the operating system is now replaced by a dotted line.

What that means is that there is no hard separation between the address space of the application
and the address space of the OS.
• The good news is that an application can access all the OS services very quickly, just as it would execute any
procedure within its own code. At memory speed, an application can make calls into the
operating system and get system services.
• The bad news is that there is no protection of the OS from an errant application. So, the integrity of the
OS can be compromised by a runaway application, either maliciously or unintentionally corrupting the
data structures in the operating system.
Now, you may wonder why DOS chose this particular structure. In the early days of the PC, it was thought that a
personal computer, as the name suggests, is a platform for a single user and, more importantly, the vision was
that there would be exactly one app running at a time, not even multitasking. So performance and simplicity were
key, and protection was not a primary concern in the DOS-like structure.
The operating system is not living in its own address space. The application and the operating system are in
the same address space. Therefore, making a system call by an application is going to happen as quickly as the
application would call a procedure which the application developer wrote himself or herself.
51 - Loss of Protection in DOS like Structure

This loss of protection with the DOS-like structure is simply unacceptable for a general purpose OS today.

On the other hand, the monolithic structure gives the protection that is so important.
• At the same time, it strives to reduce the potential performance loss by consolidating all the
services in one big monolithic structure. That is, even though an application has to go from its address
space into the OS address space, it is usually the case that the OS has several components that have
to talk to one another in order to provide the service that an application wants.
• Think about the file system, for instance. You make a call to the file system to open a file and the file
system then may have to call the storage module in order to find out where exactly a file is residing. It
may have to contact the memory manager module to see where it wants to bring in the file that you want
to open.
So in this sense there's interaction that's going to go on under the covers inside the operating system between
OS components, in order to satisfy a single service call from an application.
But what is lost in the monolithic structure is the ability to customize the OS service for different applications.
This one-size-fits-all model of system services in the monolithic structure loses the opportunity to customize
the OS services for the needs of different applications.
Now, you may wonder why do we need to customize the OS service for different applications? Why not one
size fits all? Why is there an issue? If you look at a couple of examples, the need for customization will become
fairly obvious. For example, consider an interactive video game versus an application that computes all the prime
numbers. You can immediately see that the OS service needs for these two classes of applications are perhaps
very different. On the one hand, for the little kid who is playing a video game, the key determinant of a good OS
would be responsiveness. How quickly the OS is responding to his nifty moves when he plays his video game.
On the other hand, for the programmer that wrote this prime number computing application, the key determinant
of performance is going to be sustained CPU time that's available for crunching his application.
52 - Opportunities for Customization

Let's explore the opportunities for customization with a very specific example of memory management, in
particular how an OS handles page faults.

Let's say that this thread executing on the processor incurs a page fault.
• The first thing that the OS has to do in order to service this page fault will be to find a free page frame to
host the missing page for this particular thread.
• Once it allocates a free page frame, then the OS is going to initiate the disk I/O to move the page from
disk into the free page frame.
• Once the I/O is completed and the missing page for this thread has been brought from storage into the free
page frame, the OS is going to update the page table for this thread or process, establishing the mapping
between the missing virtual page and the page frame that had been allocated for hosting that missing page.
• Once the page table has been updated, we can resume the process.
• Another thing that happens every so often, independent of any particular page fault, is that the OS will run
a page replacement algorithm to free up some page frames, in preparation for page faults that processes
may incur.
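Here is a toy C simulation of those servicing steps. All names are made up for illustration; a real OS does this in privileged mode with real disk I/O and hardware page tables.

```c
/* Toy simulation of the page-fault servicing steps listed above.
 * All names are hypothetical; real kernels do this in privileged mode. */
#include <stdio.h>

#define NFRAMES 4
#define NPAGES  16
#define INVALID (-1)

static int page_table[NPAGES];   /* VPN -> PFN, or INVALID */
static int frame_free[NFRAMES];  /* 1 if the frame is unallocated */

static int find_free_frame(void) {
    for (int f = 0; f < NFRAMES; f++)
        if (frame_free[f]) { frame_free[f] = 0; return f; }
    return INVALID;  /* here the page replacement algorithm would run */
}

static void service_page_fault(int vpn) {
    int pfn = find_free_frame();                  /* 1. find a free frame */
    printf("disk I/O: load page %d into frame %d\n", vpn, pfn); /* 2. I/O */
    page_table[vpn] = pfn;                        /* 3. update page table */
    /* 4. resume the faulting process */
}

int main(void) {
    for (int p = 0; p < NPAGES; p++) page_table[p] = INVALID;
    for (int f = 0; f < NFRAMES; f++) frame_free[f] = 1;
    service_page_fault(7);
    printf("VPN 7 now maps to PFN %d\n", page_table[7]);
    return 0;
}
```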
Just as an airline overbooks its seats in the hope that some passengers won't show up, the OS is also
overcommitting its available physical memory, hoping that not all of the pages in the memory footprint of a
particular process will actually be needed during its execution.
But how does the operating system know what the memory access pattern of a particular process is going to
be in making this decision? The short answer is it does not.
So whatever the operating system chooses as an algorithm to implement page replacement, it may not always
be the most appropriate one for some class of applications. So here is an opportunity for customization depending
on the nature of the application. Knowing some details about the nature of the application, it might be possible to
customize the way the page replacement algorithm is handled by the operating system. Similar opportunities for
customization exist in the way the operating system schedules processes on the processor and reacts to external
events such as interrupts and so on.
53 - Microkernel based OS Structure

The need for customization and the opportunity for customization is what spurred OS designers to think of
a structure of the operating system that would allow customization of the services, and gave birth to the idea of
the microkernel-based OS.

As before, each of the applications is in its own hardware address space, the microkernel runs in a privileged
mode of the architecture, and provides simple abstractions such as threads, address space, and inter-process
communication. In other words, a small number of mechanisms are supported by the microkernel.
The keyword is mechanisms: there are no policies ingrained in the microkernel, only mechanisms for accessing
hardware resources. The OS services, such as virtual memory management, CPU scheduling, file system, and so
on, are all implemented as servers on top of the microkernel.
In other words, these system services execute with the same privilege as the applications themselves. Each of
the system services is in its own address space, protected from the others and from the applications. The
microkernel, being below this red line, runs in privileged mode and is protected from all
of the applications and other system services.
In principle, there is no distinction between regular applications and the system services that are executing as
server processes on top of the microkernel. Thus, we have very strong protection among the applications, system
services and the microkernel. Now, the structure entails that you need the microkernel to provide inter-process
communication so that the applications can request system services by contacting the servers and the servers need
to talk to one another as well.
So we have gained extensibility: because these OS services are implemented as server processes, we can have
server processes of the same functionality but with different characteristics. For instance, different applications may
choose to use different file systems. It is no longer one size fits all; it is easy to extend the services provided
by the operating system to customize them depending on the needs of the application. This all sounds
good, but is there a catch?
54 - Downside to Microkernel

Is there a downside to the microkernel based approach? Well, there is.

There is a potential for performance loss.


• In the monolithic structure, let's say this application makes a call to the file system to open a file. We slip
through this red line into the OS address space and run in privileged mode. The hardware architecture of the
CPU usually provides a privileged mode for execution of OS code. So the app is now inside the OS
in privileged mode, usually via one instruction, called a trap instruction. For example, a system call
results in a trap into the OS. And once inside the operating system, all the work that needs to be done in
order to satisfy the file system call (e.g. contact the storage manager, contact the memory manager and so
on) are available as components within this blob. All those components can be accessed at the speed of
normal procedure call in order to handle the original request from this application.
• On the other hand, if you look at a microkernel based structure, the application has to make an IPC call in
order to contact the service, in this case, a file system service. This means that the application has to go
through the microkernel when making the IPC call. The call goes up to the file system, the file system does the
work, and it makes another IPC call in order to deliver the results of that system service back to the application.
So the minimum traversal, as you can see, is going from the application to the microkernel, from the microkernel
to the file system, and back into the microkernel and back up to the application.
Potentially, there may be many more calls that may happen among servers that are sitting above the
microkernel. Because the file system may have to contact the storage manager and the file system may have to
contact the memory manager. All of those are server processes living above the microkernel and all of them
require IPC to talk to one another. So there is a potential that we may have to switch between the address spaces
of the application and many services that are living on top of the microkernel.
In the case of the monolithic structure that I showed you here, there are only two address space switches: one to
go from the application into the OS, and the other to return back to the application. Whereas in a microkernel
based design, there could potentially be several address space switches depending on the number of servers that
need to be contacted in order to satisfy one system call that may be emanating from the application.
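A back-of-the-envelope sketch of that comparison, counting crossings for the file-open scenario above (each IPC leg goes source to microkernel to destination, i.e. two address-space switches); the scenario and counts are illustrative, not measured:

```c
/* Counting address-space crossings for one file system call.
 * The scenario follows the discussion above; numbers are illustrative. */
#include <stdio.h>

int main(void) {
    int monolithic = 2;  /* app -> kernel, kernel -> app */

    /* Microkernel scenario: app->FS, FS->storage, storage->FS,
     * FS->memory, memory->FS, FS->app = 6 IPC legs, each of which
     * passes through the microkernel (2 switches per leg). */
    int ipc_legs = 6;
    int microkernel = ipc_legs * 2;

    printf("monolithic : %d address-space switches\n", monolithic);
    printf("microkernel: %d address-space switches\n", microkernel);
    return 0;
}
```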
55 - Why Performance Loss

Why do we have this potential for performance loss with the microkernel based design?
Mainly because of the border crossings. That is, going across hardware address spaces, can be quite expensive.

• First there is this explicit cost of switching the address space, from one hardware address space to another
hardware address space. That is the explicit cost. (PCB store/read, TLB update, scheduling, etc)
• In addition to the explicit cost of going across address spaces, there are implicit costs involved in this
border crossing. That comes about because of change in locality. We're going from one hardware address
space to a different address space, and that changes the locality of execution of the processor. That means
that memory hierarchy, in particular the caches close to the processor may not have the contents that are
needed for executing the code and accessing the data structures of a particular server. A change in locality
is another important determinant of performance and it can adversely affect the performance. (cold cache)
• Also, when we are going across address spaces, to ensure the integrity of the system (either the
microkernel or the server that is living on top of the microkernel), there may be a need to copy data from
user space to system space. This kind of copying of data from the application's memory space into the
microkernel and back out to a server process can also hurt the performance of the operating
system. (communication cost)
By contrast, in a monolithic structure, since all the OS components are contained within the same address
space, it is much easier to share data without copying. Such copying is one of the biggest potential sources of
performance loss in the microkernel-based structure.

56 - Features of Various OS Question


57 - Features of Various OS Solution

• A monolithic structure definitely gives you protection. We also discussed that it is performant because
border crossings are minimized, loss of locality is minimized, and the copying
overhead is also minimized. All of that adds up to good performance for the monolithic structure. But it's
not very extensible. Any change to the OS would require rebuilding the monolithic structure. So, one size
fits all is what you get with a monolithic structure.

• A DOS-like structure is performant because there is no separation between the application and the
operating system. Therefore, an application can execute system services at the same speed as a normal
procedure call. It's also easily extensible because you can build new versions of system service to cater to
the needs of specific applications. But on the other hand, it fails on the safety attribute because there is no
boundary separating the kernel from the user space.

• A microkernel OS pays attention to protection because it makes sure that the applications and the servers
are in distinct hardware address spaces separated from the microkernel itself. It is also easily extensible
because you can have different servers that provide the same service but with different characteristics to
cater to the needs of the application. But it may have performance flaws because of the need for so many
border crossings between applications and the server processes.
Having said that, I want to give a note of caution: on the surface it may appear that the microkernel-based
approach may not be performant because of the potential for frequent border crossings.
I'll have a surprise for you on this aspect when we discuss the L3 microkernel later on in this course module,
where it is shown that a microkernel can be made performant by careful implementation; that's the key.

58 - What do we Want

Here's another way to visualize the relationship between these different attributes that I mentioned of performance,
extensibility and protection or safety.
• A DOS-like structure does not pay attention to safety or protection, but meets the two attributes of
performance and extensibility.

• A microkernel-based approach achieves protection and extensibility, but may have issues with respect to
performance.

• A monolithic structure may yield good performance and has protection, but it is not easily extensible.

Now what do we want? Of course we want all three of these characteristics in an operating system structure.
But can we have all of these three characteristics in an operating system?
In other words, we would like the operating system structure to be such that we get to the center of the
triangle, catering to all three attributes: performance, extensibility, and protection. The research ideas that we
will study in this course module look at ways to get to the center of the triangle so that all three attributes
can be present in the structure of the operating system. We will continue the course module with the research
approaches that have been proposed to help us get to the middle of the triangle.
59 - Introduction

So now we have set the stage for discussing the SPIN and Exokernel approaches to achieving extensibility of the
operating system without losing out on protection or performance. Both of these approaches start with two premises.
• The first premise is that micro-kernel based design compromises on performance due to frequent border
crossings.
• The second premise is that monolithic design does not lend itself to extensibility.
Because of these starting premises, SPIN and Exokernel have certain commonality in what
they strive to do, although the paths taken by the two approaches are very different.
60 - What are we Shooting for in OS Structure

So, let's revisit what we are shooting for in the structure of an operating system.

• We want the operating system structure to be thin, like a microkernel. That is, only mechanisms
should be in the kernel, and no policies should be ingrained in the kernel itself.

• The structure should allow fine-grained access to system resources without border crossing, as much as
possible. That is, it should be like the DOS-like structure.

• It should be flexible, meaning resource management should be easily morphed to suit the needs of the
application without sacrificing protection and performance. So, the flexibility part of it should be similar
to what we can get from microkernel based approach, but at the same time we want the protection and the
performance we can get with the monolithic approach.
So in other words, in a nutshell, what we want in the operating system structure is performance, protection, and flexibility.
61 - Approaches to Extensibility

Historically, I should mention that there was interest in extensibility at least as far back as 1981, with a system
developed at CMU called the Hydra operating system.
• The Hydra operating system provided kernel mechanisms for resource allocation. The key word is only
mechanisms, not policies.

• It had a way of providing access to resources, using a capability based approach. The notion of a
capability has a special connotation in the operating system literature. It is an entity that can be passed
from one entity to another. It cannot be forged. It can be verified. All of the things that you want in order to
make sure that the system's integrity is not compromised are enshrined in this abstract notion of a capability.

• As originally envisioned, a capability was a heavyweight mechanism in terms of implementing it
efficiently in an operating system. Because the capability is a heavyweight mechanism, the Hydra OS resource
managers were built as coarse-grained objects, in order to reduce the border crossing overhead.

• Border crossing in the Hydra system means that you have to pass a capability from one object to another
and validate the capability for entering a particular object. For that reason, Hydra used coarse-grained
objects to implement resource managers. That way, they could reduce the border crossing overhead.

• Implementing resource managers as coarse-grained objects also limits the opportunities for
customization and extensibility. In other words, the coarser you make these objects, the less opportunity you
have for customizing the services, which is exactly the strike against the monolithic kernel.
So in principle, Hydra had all the right ideas: providing minimal mechanisms in the kernel and having the
resource managers implement policies. But because the fundamental mechanism for accessing resources was
the capability, which is a heavyweight abstraction to implement efficiently, in practice Hydra did
not fully achieve its goal of extensibility.
One of the most well-known extensible operating systems of the early 90s was the Mach operating system from
CMU.
• It was microkernel-based, providing very limited mechanisms in the microkernel.

• All the services that you expect from an operating system were implemented as server processes that run as
normal, user-level processes above the kernel. Clearly, with this microkernel-based approach, Mach
achieved its goal of extensibility. So it focused on extensibility and portability. The keyword is portability.

• Therein lies the rub: performance took a backseat, because Mach was very much focused on making the
operating system portable across different architectures, in addition to paying attention to extensibility.

• Since operating systems are generally so focused on performance, this design choice in Mach of
supporting portability gave microkernel-based design bad press.

• Later on, when we look at L3 approach to microkernel-based design, we will revisit the right way to build
a microkernel-based design.

In this lesson, let's focus on the SPIN approach to extensibility.


The key idea in SPIN is to co-locate a minimal kernel with its extensions in the same hardware address
space, avoiding border crossings between the OS kernel and these extension components.
Does it compromise on protection? Wasn't that the strike against the DOS-like structure that we talked about
earlier? The approach that SPIN took was to rely on the characteristics of a strongly typed programming
language, so that the compiler can enforce the modularity that we need in order to give guarantees about
protection. By using a strongly typed language (Modula-3), the kernel is able to provide well-defined
interfaces.
• All of you may be quite familiar with declaring function prototypes in a header file and having the actual
implementation of the procedures in other files in a large software project. This is the same idea that is
now taken to the design of the operating system itself.

• After all, an operating system is also a piece of software, a complex piece of software. Why not use a
strongly typed language as the basis for building the operating system? That's the idea in the SPIN
approach.
• Now, what you get when you use a strongly typed language is that you cannot cheat.

• For instance, in a language like C, you can type cast pointers so that a given data structure can be viewed
completely differently, depending on what you need to get done.

• That's not possible with a strongly typed language. Data abstractions provided by the programming
language such as an object serve as containers for logical protection domains. That is, we are no longer
reliant on hardware address spaces to provide the protection between different services.
The kernel provides only the interfaces and these logical protection domains actually implement the functionality
that is enshrined in those interface functions. There can be several implementations of the interface functions and
that's where the flexibility comes in.
Applications can dynamically bind different implementations of the same interface functions. That's how we
get different instances of specific system components, getting you the flexibility that you want in constructing an
operating system.
• Because we have co-located the kernel and the extension in the same hardware address space, we are
making the extensions as cheap as a procedure call.

• So in a nutshell, what we've accomplished with the SPIN approach to extensibility is that we are riding on the
characteristics of a strongly typed programming language, which enforces strong typing and therefore allows
the operating system designer to implement logical protection domains instead of relying on hardware
address spaces.

• And consequently we're making extensions as cheap as procedure calls.


62 - Logical Protection Domains

Modula-3 is a strongly typed language with built-in safety and encapsulation mechanisms.
• It does automatic management of memory so there are no memory leaks.
• Modula-3 supports a data abstraction called an object with well-defined entry points.
• Only the entry points are known outside the object, not the implementation nor the data structures inside
the object.
• Therefore there's no cheating possible as you can do with a language like C.
• Modula-3 allows exposing the externally visible methods inside an object using generic interfaces.
• It also supports the notion of threads that execute in the context of the object, and it allows raising
exceptions, for example, when there is a memory access violation.
All of the features allow implementing system services as an object with well-defined entry points.
Modula-3 allows the creation of logical protection domains.
What you can do from outside the object is what the entry point methods will let you do and no more.
In other words, we are getting the safety property of a monolithic kernel without having to put system code in a
separate hardware address space.
In other words the logical protection domains give you both protection and performance.
Now, what about flexibility? The generic interface mechanism allows you to have multiple instances of the same
service. A given application may be able to exploit the different instances of services that are available through the
same generic interface, and that's the way you can get flexibility as well.
Objects that implement specific services can be of whatever granularity the system designer desires. They can be
fine-grained, or they can be collections.
• You can think of individual hardware resources as fine-grained objects. For example, a page frame and
what you can do with a particular page frame.
• You can have interfaces that provide a certain functionality; such an interface can be an object. For example,
a page allocation module can be an object.
• You can also make a collection of interfaces into an object. For example, an entire virtual memory subsystem
can be an object that is hierarchically composed of a page allocation module, within which you may have
hardware resources defined as objects as well.
All of these objects, whether at the coarse level of a collection of interfaces, or an individual interface that is a
component of such a collection, or specific hardware resources, are accessible via capabilities.
• Now, the word capability may give you jitters, because I just said that capabilities traditionally
signify a heavyweight mechanism in operating system parlance.
• But because we are dealing with a strongly typed language, capabilities to objects can be supported as
pointers.
• In other words, the programming language supported pointers can serve as capabilities to the objects.
• So now, with this idea, access to the resources (that is, the entry point functions within a resource object) is
provided via capabilities that are simply language-supported pointers.
• Because they are language-supported pointers, the capabilities that we are talking about here are much
cheaper compared to the real capabilities used in the Hydra operating system.
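The idea can be approximated in C with an opaque interface, shown in the sketch below; the names are invented. The crucial difference is that C cannot stop a caller from casting the pointer, whereas the Modula-3 compiler can, which is exactly what makes such pointers trustworthy as capabilities.

```c
/* page_frame.h -- sketch of a capability as a typed pointer, approximated
 * with an opaque C type. Clients see only the entry points, never the
 * layout. (Unlike Modula-3, C cannot prevent a caller from casting, so
 * this illustrates the interface discipline, not the enforcement.) */
typedef struct page_frame page_frame_t;   /* opaque: internals are hidden */

page_frame_t *pf_allocate(void);                  /* the entry points are the */
void          pf_free(page_frame_t *pf);          /* only legal operations on */
int           pf_number(const page_frame_t *pf);  /* a page_frame_t           */
```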

63 - Pointers Question

64 - Pointers Solution

The right answer is Modula-3 pointers are type-specific.


That is, pointers in Modula-3 cannot be forged, there's no way to subvert the protection mechanism that is built
into the language.
You cannot take a data structure and cast it to appear like something else in Modula-3.
That is what allows us to implement logical protection domains in Modula-3 using objects and capabilities to
objects as pointers supported by the programming language.
65 - Spin Mechanisms for Protection Domains

There are three mechanisms in SPIN to create protection domains and use them.
The first one is the CREATE() call that allows creating a logical protection domain.
• This mechanism in SPIN allows instantiating an object file with its contents and exporting the names that are
contained as entry point methods inside the object, making them visible outside.
• For example, if I'm creating a memory management service, I can write the entry point functions in my
memory management service and export the names using this create mechanism that's available in SPIN.
The second mechanism in SPIN is RESOLVE().
• If one protection domain wants to use the names that are in another protection domain, the way we can
accomplish that is by using this resolve primitive that's available in SPIN.
• Resolve is very similar to linking two separately compiled files together that a compiler does routinely.
• So, you may be very familiar with the compilation process where you may separately compile files and
once you have done the separate compilation of the files, then you go through a link phase of the compiler
where the linker resolves the names that are being used by one object file with the names that are defined
in another object file.
• RESOLVE() resolves the names that are being used in the source, which is a logical protection domain, against
the target, which is another logical protection domain. As the result of this resolve step, the source logical
protection domain and the target logical protection domain are dynamically linked, or bound, together. Once
bound together, accessing methods that are inside the target protection domain happens at memory
speeds, meaning it is as efficient as a procedure call.
To reduce the proliferation of small logical protection domains and create an aggregate larger protection domain,
SPIN provides the COMBINE() mechanism.
• Once the names in a source and target protection domain have been resolved, they can be combined to
create an aggregate domain.
• The aggregate logical protection domain will have entry points, which is the union of the entry points that
were exported as names from the source and the target or any number of such domains that have been
combined together to create an aggregate domain.
• So this COMBINE() primitive in SPIN is mainly useful as a software engineering management tool to combat
the proliferation of many small domains.
So, once again, the road map for creating services is: write your code as a Modula-3 program with well-defined
entry points.
• Using the SPIN mechanism CREATE(), you can instantiate a service and export the names that are
available in that service.
• If another logical protection domain wants to use the names that are exported, it can do so by using the
SPIN mechanism RESOLVE() that causes the dynamic binding of the source and target logical protection
domains.
• Finally, the COMBINE() primitive allows aggregation of logical protection domains to create an
aggregate domain, that's the union of all the entry points that are available in the component logical
protection domains.
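Here is a hypothetical C rendering of that road map. SPIN's real primitives are Modula-3 interfaces, so the signatures below are invented, and the bodies are stubs that merely trace the flow:

```c
/* Hypothetical sketch of SPIN's CREATE/RESOLVE/COMBINE road map.
 * Real SPIN interfaces are Modula-3; these signatures are invented. */
#include <stdio.h>

typedef struct { const char *name; } domain_t;  /* a logical protection domain */

domain_t domain_create(const char *object_file) {            /* CREATE()  */
    printf("create: instantiate %s and export its names\n", object_file);
    return (domain_t){ object_file };
}

void domain_resolve(domain_t *src, domain_t *tgt) {          /* RESOLVE() */
    printf("resolve: bind %s's imports to %s's exports\n", src->name, tgt->name);
}

domain_t domain_combine(domain_t *a, domain_t *b) {          /* COMBINE() */
    printf("combine: union of %s and %s entry points\n", a->name, b->name);
    return (domain_t){ "aggregate" };
}

int main(void) {
    domain_t mm = domain_create("memory_manager.o");
    domain_t fs = domain_create("file_system.o");
    domain_resolve(&fs, &mm);  /* fs now calls mm at procedure-call speed */
    domain_combine(&fs, &mm);
    return 0;
}
```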
This is the secret sauce in SPIN to get protection and performance while allowing flexibility.
Everything hinges on the strongly-typed nature of the programming language that is being used for
implementing the operating system.
The language allows compile time checking, and run time enforcement of the logical protection domains.
That's the key to the success of this approach to providing flexibility, protection, and performance, all in one bag.
66 - Customized OS With Spin

So the upshot of the logical protection domain is the ability to extend SPIN to include OS services and make that
all part of the same hardware address space, so no border crossing between the services or the mechanisms
provided by SPIN.

So here is one example (green) where all these system services are implemented as protection domains; using
create, resolve, and combine, we've created all these services as logical extensions of SPIN.
Here is another extension (purple) living on top of the same hardware, concurrently with the first extension.
As you see, each of these boxes represents a completely different operating system.
Each of these boxes may have its own subsystems for the same functionality.
For instance, this process uses memory manager two, and this process uses memory manager one. Both of them
implement the same functionality, but very differently, to cater to the needs of the applications that need those
services.
But they may also have common subsystems. For example, the network protocol stack may be shared by both
extensions that live on top of the same hardware framework.
67 - Example Extensions

• Here is a concrete example of an extension. It's a fairly standard implementation, let's say, of the Unix
operating system, but it is implemented as an extension on top of SPIN.
• Here is a more fun example: a client-server application that is implemented directly on top of SPIN as an
extension. In other words, there is no operating system.
A display client uses an extension interface to implement the functionality for displaying video that is going to
be sent by a video server. So both the client and server extensions sit on top of basic SPIN and provide exactly
the functionality that is needed for the video application.
The bounding box here shows SPIN and the extensions thereof.
Similarly, the bounding box here shows SPIN and the extension thereof.
• In the Unix example, it is an entire operating system on top of SPIN.
• In the video server + client case, it is just the video service itself as an extension on top of SPIN.
68 - Border Crossings Question

Now it's time for a question. The question to you is: which of the above structures will result in the least
number of border crossings? Is it the monolithic structure, the microkernel structure, the SPIN structure,
or either SPIN or monolithic?

69 - Border Crossings Solution

The right answer, either SPIN or monolithic, will result in the least number of border crossings.

Why?
• In the microkernel-based structure, we're assuming that each one of these services is available as a server
process in its own hardware address space. Therefore any system service that an application needs
may have to go through multiple border crossings.

• By border crossing we, of course, mean going across different address spaces and the attendant loss of
locality that it entails.

• Whereas, in the case of monolithic OS, you have only two border crossings: one into the monolithic kernel
and one out of the kernel.

• In SPIN also, by construction, we are taking SPIN and extending it with the services, so all the services
are contained in the same hardware address space.
• Even though we need to go through protection domains, those protection domains are all logical
protection domains. This does not involve border crossings that entail a change of locality and loss of
performance.
70 - Spin Mechanisms for Events

An operating system has to field external events. External interrupts may come in when a process is executing.
The process itself may incur some exceptions, such as a page fault, or it may make system calls. All of these are
events that need to be fielded by the operating system, and SPIN has to support such external events.
SPIN supports external events using an EVENT based communication model.
Services can register what are called event handlers with the SPIN event dispatcher.
SPIN supports several types of mapping. It supports a one-to-one mapping between an event and a handler. It
also supports a one-to-many mapping between an event and handlers. And it also supports a many-to-one mapping
(many events being mapped to the same handler).
This picture shows a typical protocol stack. You may have several different interfaces available on your machine;
therefore, a network packet may arrive through one of several interfaces, e.g. an Ethernet interface or an ATM
interface.
• A network packet may arrive on the Ethernet port or an ATM port. Those are events, and both of those
may be IP packets, in which case there is an IP handler that needs to see the events. So, here is an example
of many events mapping to the same handler. Ethernet packet arrival is an event, and ATM packet arrival is
an event; they are different events, but they map to the same IP handler. That's an example of many-to-one mapping.
• The processing of the packet by this IP handler results in an IP packet arrival event. And there will be
several clients of the IP layer of the protocol stack. There will be UDP transport, there will be TCP
transport, and there will be an ICMP layer, all sitting on top of this IP network layer. So when
an IP packet arrival event is raised by this handler, there are multiple clients for that event, and this is an
example of a one-to-many mapping.
• The SPIN dispatcher allows any number of handlers to be registered as handling a particular event type.
When that event type is raised, all the handlers associated with that event type will get scheduled
by the SPIN dispatcher. The order in which they get scheduled is not something that the designer can
count on, because SPIN has freedom in the order in which these event handlers get scheduled when
a particular event arises. But all the handlers that are associated with an event will get triggered when that
event is raised. That's another example of one-to-many mapping.
• Finally, here is an example of a one to one mapping. If it's an ICMP packet, then it raises an event that an
ICMP packet has arrived. Maybe there is only one client for that particular event, and that may be the ping
program. So that's one-to-one mapping.
Event handlers may also be specified with guards, for finer-grain handler execution. For example, this handler
could specify that it should be executed only when IP packets arrive. That's a guard the IP packet handler
could specify so that, even though different kinds of packets may arrive on these interfaces, this handler will only
get triggered when the packet that arrived on one of the interfaces is an IP packet.
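The dispatch model can be sketched as below: one event with several registered handlers, a guard predicate filtering which arrivals a handler sees, and two different arrival events funneling into the same handler. All names are invented for illustration.

```c
/* Toy sketch of SPIN-style event dispatch with guards.
 * Names and representations are invented for illustration. */
#include <stdio.h>

typedef struct { int is_ip; const char *iface; } packet_t;
typedef int  (*guard_fn)(const packet_t *);
typedef void (*handler_fn)(const packet_t *);

static int  guard_ip(const packet_t *p)    { return p->is_ip; }
static void ip_handler(const packet_t *p)  { printf("IP handler: packet from %s\n", p->iface); }
static void log_handler(const packet_t *p) { printf("logger: arrival on %s\n", p->iface); }

struct binding { guard_fn guard; handler_fn handler; };

/* one event type ("packet arrival"), many handlers: one-to-many */
static struct binding pkt_arrival[] = {
    { guard_ip, ip_handler },   /* guard: run only for IP packets  */
    { NULL,     log_handler },  /* unguarded: runs on every arrival */
};

static void raise_event(const packet_t *p) {
    /* the dispatcher chooses the order; handlers cannot rely on it */
    for (unsigned i = 0; i < sizeof pkt_arrival / sizeof pkt_arrival[0]; i++)
        if (!pkt_arrival[i].guard || pkt_arrival[i].guard(p))
            pkt_arrival[i].handler(p);
}

int main(void) {
    packet_t eth = { 1, "ethernet" }, atm = { 1, "atm" };
    raise_event(&eth);  /* two distinct arrival events mapping to */
    raise_event(&atm);  /* the same IP handler: many-to-one       */
    return 0;
}
```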
71 - Default Core Services in Spin - Memory Management

SPIN provides the toolbox needed for building an operating system; one can build each of the essential OS services
we talked about early on from scratch, as extensions to SPIN (memory management, CPU scheduling, threads,
file system, network protocol stack, and so on).
Memory management and CPU scheduling are core services that any operating system should provide. But an
extensible operating system should not dictate how these services should be implemented. SPIN provides
interface procedures for implementing these services.
Note:
The macro-allocation of a chunk of physical memory to an extension is outside the scope of this discussion.
Assume that the allocation of a chunk of physical memory happens when an extension starts up. The
discussion here has to do with the management of that pre-allocated physical memory by the
extension.

A native OS such as Linux or Windows manages physical memory by itself. SPIN wants to allow
extensions to manage the physical memory pre-allocated to them in whatever fashion they choose.
The interface functions that I'm showing you here for memory management are simply header files provided
by SPIN.
• For example, allocating a page frame. Deallocating a page frame. Reclaiming a page frame.
• Similarly, allocating a virtual page or deallocating a virtual page which might be used for dynamic
memory allocation.
• Translation has to do with creating and destroying address spaces, and adding or removing mappings between
virtual pages and physical frames.
• Because of what we said earlier about overcommitment of memory (airline ticket analogy), not all of a
process's address space will be able to fit in physical memory. So there are event handlers that are
provided as part of the core services of SPIN for handling page faults and access faults. If you had a page that is
write protected and if a process tries to write to it, that's an access violation. Or, if a process is trying to
access a region of memory that it doesn't have access to, generating a bad address exception.
All of these are interface functions that are defined as core services for memory management in the SPIN
operating system.
It is not saying anything about how these services are implemented, but it is giving you just a header file. The
implementer of an extension has to write the actual code for these header functions and create a logical protection
domain that corresponds to physical address management, virtual address management, translation management,
and the handler functions for dealing with these different types of events.
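A hypothetical C header mirroring the interface just described might look like the sketch below. SPIN's real interfaces are Modula-3, so these names and signatures are guesses for illustration only:

```c
/* Hypothetical header sketching SPIN's memory-management interface.
 * The real interfaces are Modula-3; names and signatures are invented. */
typedef struct frame  frame_t;    /* physical page frame */
typedef struct vpage  vpage_t;    /* virtual page        */
typedef struct aspace aspace_t;   /* address space       */

/* physical address management */
frame_t *frame_allocate(void);
void     frame_deallocate(frame_t *f);
void     frame_reclaim(frame_t *f);

/* virtual address management */
vpage_t *vpage_allocate(aspace_t *as);
void     vpage_deallocate(vpage_t *vp);

/* translation: address spaces and VPN-to-PFN mappings */
aspace_t *aspace_create(void);
void      aspace_destroy(aspace_t *as);
void      map_add(aspace_t *as, vpage_t *vp, frame_t *f);
void      map_remove(aspace_t *as, vpage_t *vp);

/* event handlers the extension implements; invoked on hardware events */
void on_page_fault(aspace_t *as, vpage_t *vp);
void on_access_fault(aspace_t *as, vpage_t *vp); /* e.g. write to a read-only page */
void on_bad_address(aspace_t *as, void *addr);
```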
Once the logical protection domain is dynamically instantiated, it becomes an extension of SPIN, and after
that there's no border crossing between a particular service that has been so instantiated and SPIN itself. And all
of these functions are invoked automatically when the hardware events occur, corresponding to a page fault or
access violation fault and so on.

72 - Default Core Service in Spin - CPU

SPIN arbitrates another precious resource, the CPU.

SPIN only decides at a macro level, the amount of time that is given to a particular extension. That's done through
the SPIN global scheduler.
The global scheduler interacts with the application's threads package. Application is a loose term here; it is
actually the extension that is living on top of SPIN, which may be an entire operating system or may be just an
application.
• For example, let's say, we are running Linux and Vista as two extensions on top of SPIN. Each maybe
given a particular time slice, say of x milliseconds. How each extension uses the time that has been given
to it for scheduling user-level processes running inside the operating system is entirely up to those
extensions.
To support the concept of threads in the OS and the management of time, SPIN provides an abstraction called a strand.
The actual operating systems that extend SPIN will have their threads mapped to strands.
• A strand is the unit of scheduling that SPIN's global scheduler uses, but the semantics of a strand are
entirely decided by the extension.
• For instance, if I'm implementing pthreads, I will define the semantics of the strand to be the semantics of
the pthreads scheduler. And there are event handlers that help with the scheduling that needs to happen in
the extensions.
The kind of events that SPIN provides for this core service of CPU scheduling are block, unblock, checkpoint,
and resume. The extension's event handlers have to give the semantic meaning of what needs to happen when
these event handlers are called, because these are only interface functions.
• What needs to happen when this interface function is called is up to the extension. For example, a disk
interrupt handler may result in an unblock event being raised for a particular strand that was waiting for
the disk I/O completion.
• Similarly, if an application were to make a system call that is a blocking system call, then the service that
provides that facility to the application will raise this block event, which will result in the extension taking
the appropriate action of saving the state of the currently running process and putting it in the appropriate
queues that it has, to wait for that system call completion.
So in a nutshell, what SPIN provides are exactly the kind of primitives that may be needed by an extension that
wants to provide the service of CPU scheduling. SPIN only provides the interface function definitions. The
semantics of how exactly the scheduling is effected are entirely up to the extension.
All that SPIN does is to ensure that the extension gets time on the CPU through this global scheduler that SPIN
has for allocating time to different extensions that may be concurrently living on top of SPIN.
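A toy sketch of the strand idea: SPIN raises block/unblock events, and the extension's handlers decide what they mean. Everything here is invented for illustration.

```c
/* Toy sketch: an extension gives meaning to SPIN's strand events.
 * All names are invented for illustration. */
#include <stdio.h>

typedef enum { RUNNING, BLOCKED, READY } strand_state_t;
typedef struct { const char *name; strand_state_t state; } strand_t;

/* handlers a threads package (say, pthreads) registers with SPIN */
static void on_block(strand_t *s)   { s->state = BLOCKED; printf("%s blocked\n", s->name); }
static void on_unblock(strand_t *s) { s->state = READY;   printf("%s ready\n",   s->name); }

int main(void) {
    strand_t t = { "worker", RUNNING };
    on_block(&t);    /* e.g. the strand made a blocking disk-read call     */
    on_unblock(&t);  /* e.g. the disk interrupt handler raised the unblock */
    return 0;
}
```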

73 - Conclusion

There are some deep implications that may not be readily obvious.
Core services are trusted services, since they provide access to hardware mechanisms.
Why? The services may need to step outside the language-enforced protection model to control the hardware
resources.
In other words, the applications that run on top of an extension have to trust the extension.
Extensions to core services affect only the applications that use that extension, so that it is not catastrophic and
does not affect other applications that do not rely on this particular extension.
74 - Exokernel Approach to Extensibility

Having seen SPIN's approach to extensibility, now we will look at Exokernel's approach to operating system
extensibility. The name, exokernel, itself comes from the fact that the kernel exposes hardware explicitly to the
operating system extensions living above it.
The basic idea in Exokernel, is to decouple authorization of the hardware from its actual use.
Let's say you want to do research in my lab. I may interview you, and once we're on the same page, I'll give you
a key to the lab and the resources you need to work in the lab (such as laptop, servers and so on). Then I get out
of the way when you actually use the resources.

That's the same idea in Exokernel.


A library operating system asks for a resource.
Exokernel validates the request for the resource from the library and binds the request to the specific hardware
resource.
In other words, Exokernel exposes the hardware that was requested by the library OS by creating a secure
binding between the request and the actual hardware resource.
Once Exokernel has established this binding, it creates an encrypted key for the resource and gives it to the
requesting library operating system.
After that, the semantics of how the resource is going to be used by the library are entirely up to the library,
within the norms of accepted use. There are certain accepted norms for the use of a resource that Exokernel
may have imposed; as long as the library operating system stays within those norms, the semantics
of how a particular hardware resource is used are entirely up to the library operating system.
Once a library OS has asked for a resource and Exokernel has created the binding for that resource to the
requesting library operating system, the library operating system is ready to use the resource.
Now, how does it use the resource? Basically, what the library operating system will do is present the encrypted
key that it received, which authenticates this library's use of the resource, to the Exokernel. In other words,
Exokernel will be able to validate whether the key presented to it is the key that was given to this particular
library operating system.
In other words, the key cannot be forged and cannot be passed around.
If I gave a key to this library operating system, then when that key is presented to the Exokernel by this library
operating system, it is a valid key.
Even with a valid key, if the presenter is not the operating system to which Exokernel gave the key, the request
will be denied.
So any time the library operating system presents a valid key to the Exokernel, Exokernel will validate it, and
the library operating system is then free to use the resource for which it has this valid key.
This is sort of like a doorman in an apartment building, checking when a resident comes in, whether the resident
is a bona fide occupant of the residence.
Once inside his apartment, what the resident does is not something that the doorman cares about.
Exactly the same thing is being done by Exokernel as a doorman for using the hardware resource for which a
valid key exists with a library operating system.
So, establishing the secure binding is a heavy duty operation. Once such a secure binding has been established,
the actual use of the hardware is going to be much cheaper.
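The bind-once, validate-cheaply split can be modeled as in the sketch below. The "encrypted key" here is a made-up token check standing in for a real cryptographic binding; everything is invented for illustration.

```c
/* Toy model of exokernel's secure binding: bind once (expensive),
 * validate on each use (cheap). The token math is a stand-in for
 * a real encrypted key. */
#include <stdio.h>

typedef struct { int resource_id; int owner_id; unsigned token; } exo_key_t;

static unsigned mint(int res, int owner) {
    return (unsigned)res * 2654435761u ^ (unsigned)owner;
}

/* done once: validate the request and bind the resource to the owner */
static exo_key_t exo_bind(int resource_id, int owner_id) {
    return (exo_key_t){ resource_id, owner_id, mint(resource_id, owner_id) };
}

/* done per use: the "doorman" check that key and presenter match */
static int exo_validate(const exo_key_t *k, int presenter_id) {
    return k->owner_id == presenter_id
        && k->token == mint(k->resource_id, k->owner_id);
}

int main(void) {
    exo_key_t k = exo_bind(/*resource*/ 42, /*library OS*/ 1);
    printf("library OS 1 uses resource: %s\n", exo_validate(&k, 1) ? "ok" : "denied");
    printf("library OS 2 replays key:   %s\n", exo_validate(&k, 2) ? "ok" : "denied");
    return 0;
}
```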
75 - Examples of Candidate Resources

You may be thinking: wow, this sounds tedious and not performance-conscious if exokernel has to validate the key
every time the library uses the resource.
Well, it depends on what we mean by a resource. Let's look at some examples.

Here is an example of a candidate resource, a TLB entry.


• A TLB entry is a record of the mapping between a virtual page number (VPN) and a physical frame number
(PFN), and the mapping is done by the library OS.
• Once the mapping has been done by the library OS, it presents the mapping to the exokernel along
with the capability key for a particular TLB entry.
• Exokernel validates it and puts this mapping into the specific TLB entry of the hardware TLB.
• Putting an entry into the hardware TLB is a privileged operation. The library OS cannot do it by itself,
because it doesn't have the same privilege as exokernel. Therefore, once that capability in the form of the
encrypted key for this TLB entry is presented to exokernel, exokernel, on behalf of the library OS, will
put that mapping into the specific TLB entry of the hardware TLB.
• Once this entry has been put into the TLB, the process that is going to be using that virtual page, when it
is running, can use this mapping multiple times without exokernel intervention.
So even though putting the entry into the hardware TLB requires the intervention of exokernel (because we are
messing with hardware), once that entry has been put in, it is there on behalf of this library OS. Processes of that
library OS, when they are running on the CPU, can access the TLB and do the translation any number of times,
because all of that happens under hardware control; exokernel is not in the middle of any of it.
This gives you an idea of how, even though the library OS needs exokernel's help to do certain things in the
hardware, the normal use of a hardware resource is in no way affected by the fact that exokernel sits between the
hardware and the library operating systems.
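As an illustration of the privileged install step, here is a hypothetical sketch in C. ek_validate_key and
hw_tlb_write stand in for the capability check and the privileged TLB write; neither name comes from the
paper.

```c
#include <stdint.h>

/* Hypothetical helpers: the capability check (as in the secure-binding
 * sketch earlier) and the privileged hardware TLB write. */
extern int  ek_validate_key(uint32_t lib_os_id, uint32_t resource_id,
                            const uint8_t *key);
extern void hw_tlb_write(int slot, uint32_t vpn, uint32_t pfn);

/* Privileged path: exokernel installs a library-OS-supplied VPN->PFN
 * mapping into one hardware TLB entry after checking the capability. */
int ek_install_tlb_entry(uint32_t lib_os_id, int slot,
                         uint32_t vpn, uint32_t pfn, const uint8_t *key) {
    if (!ek_validate_key(lib_os_id, (uint32_t)slot, key))
        return -1;                 /* wrong key or wrong owner: denied  */
    hw_tlb_write(slot, vpn, pfn);  /* the one privileged operation      */
    return 0;                      /* later translations run at         */
}                                  /* hardware speed, no exokernel      */
```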
Here is another example of a candidate resource.
• Let's say that the operating system wants to install a packet filter that needs to be executed every time a
network packet arrives on behalf of a library OS.
• Predicates for looking at this incoming packet are loaded into the kernel by the library OS.
• Now, this is a heavy-duty operation, because you're doing it with the help of exokernel.
• But once those predicates have been loaded into exokernel by the library OS, on every packet arrival
exokernel will automatically check the packet using those predicates.
So those are examples of candidate resources that tell you that establishing the binding may be expensive.
But once established, using the binding does not incur intervention by exokernel, and it can therefore happen
at hardware speeds.
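Here is a small sketch of the packet-filter idea in C. The offset/value predicate form and all the names are
illustrative assumptions; real exokernel filters are more sophisticated, but the load-once/check-every-packet
split is the same.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    size_t  offset;     /* byte offset into the packet header          */
    uint8_t value;      /* byte that must match at that offset         */
} predicate_t;

typedef struct {
    uint32_t    lib_os_id;     /* who receives matching packets        */
    predicate_t preds[8];
    int         n_preds;
} filter_t;

static filter_t filters[16];
static int      n_filters;

/* Heavy-duty, one-time operation: load a filter into the kernel.     */
void ek_install_filter(const filter_t *f) { filters[n_filters++] = *f; }

/* Fast path: run on every packet arrival, inside the exokernel.      */
int ek_demux(const uint8_t *pkt, size_t len) {
    for (int i = 0; i < n_filters; i++) {
        int match = 1;
        for (int j = 0; j < filters[i].n_preds; j++) {
            const predicate_t *p = &filters[i].preds[j];
            if (p->offset >= len || pkt[p->offset] != p->value) {
                match = 0;
                break;
            }
        }
        if (match)
            return (int)filters[i].lib_os_id;  /* deliver to this OS   */
    }
    return -1;   /* no library OS claimed the packet                   */
}
```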
76 - Implementing Secure Bindings

Now let's talk about the mechanisms that are there in Exokernel for implementing these secure bindings.

There are three methods.


• The first method is hardware mechanisms. One example is the TLB entry example shown above. Other
examples of hardware mechanisms include getting a physical page frame from exokernel, or a portion of
the frame buffer that is being used by the display. These are specific hardware resources that can be
requested by the library OS and bound to that library OS by exokernel. The exokernel will export
an encrypted key to the library OS. Once the library OS has the encrypted key for that resource, it can use
that resource any time it wants.

• The second mechanism is software caching on behalf of each library OS. Specifically, the shadow TLB,
i.e., caching the hardware TLB in a software cache for each library OS, is there to avoid the context switch
penalty when exokernel switches from one library OS to another. Basically, at the point of a context
switch, exokernel will dump the hardware TLB into a software TLB data structure associated with the
specific library OS, and similarly load the software TLB of the library OS to which it is switching into the
hardware TLB.

• The third mechanism is downloading code into the kernel. This is simply to avoid border crossings by
inserting specific code that a library OS wants executed into the kernel. I gave you the example of the
packet filter earlier. If you think about it, this idea is very similar to SPIN's idea of extending the kernel
with logical protection domains that are created and dynamically linked in.
77 - Exokernel vs Spin Question

Time for a question. In exokernel we have this mechanism of downloading code into the kernel. SPIN has a similar
functionality, which is extending the kernel with logical protection domains.
The question to you is: which one of these two mechanisms compromises protection more?

78 - Exokernel vs Spin Solution

As long as SPIN's logical protection domains follow Modula-3's language-enforced compile-time checking
and run-time verification, there is no violation of protection in SPIN.
But we cannot say the same about Exokernel, because arbitrary code is being downloaded into the
kernel by a library OS; that is the reason Exokernel may end up compromising protection more than the
SPIN mechanism.
But having said that, I should mention that it's not always possible to live within Modula-3-enforced protection
domains, even in SPIN. We've seen that even in SPIN, in order to do certain things in the hardware, SPIN
may have to step outside the protection boundaries of Modula-3.
In other words, a reality of real hardware is that it's not always possible to do everything within the
confines of language-enforced protection domains.
But if you think in terms of the logical protection domains as defined by SPIN as Modula-3 objects, those
have strong guarantees of protection compared to arbitrary code that can be downloaded into Exokernel.
79 - Default Core Services in Exokernel - Memory Management

When we discussed SPIN, I mentioned that memory management and CPU management are core services that any
operating system has to provide. We discussed how SPIN had its own way of dealing with those core services. We
will do the same analysis for exokernel: how does it do memory management and CPU scheduling?

First, memory management. Specifically, let's see how exokernel handles a page fault incurred by a library OS; a sketch of the full path follows the list below.
• In this picture we have an application thread that belongs to a specific library OS. As long as this
application thread is doing normal memory accesses, where all its virtual addresses have been mapped to
physical page frames, the thread is executing at hardware speeds.
• When this thread incurs a page fault, the page fault is first fielded by Exokernel.
• Exokernel knows which library OS is currently executing on the CPU. It can kick it up to the library OS
through a registered page fault handler.
• Servicing the page fault involves requesting a page frame from the Exokernel to host the missing page.
• The Exokernel creates a binding for a page frame and returns an encrypted key to the library OS.
• Now the library OS has a page frame, and it will establish a mapping between the virtual page that was
missing and the page frame returned by the Exokernel.
• Then the library OS will try to update the TLB, but it cannot write into the TLB directly.
• Updating the hardware TLB is a privileged operation, meaning it can be done only in the kernel mode
of the processor. That's the reason there is a red line between the library OS, which runs at the non-
privileged level, and Exokernel, which runs at the privileged level to do certain operations such as
installing an entry into the TLB.
• So it presents the mapping to Exokernel along with the encrypted key that establishes the library
OS's capability to a specific TLB entry. Exokernel will validate the encrypted key and install the
mapping in the hardware TLB.
• Once the entry is installed in the TLB, if the library operating system is once again scheduled on the
processor and the same process runs and generates the same virtual address, it will find a valid mapping
and life will be good.
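Condensing the list above into code, here is a hypothetical sketch of the page-fault path. The function names,
the hardcoded library OS id, and the single-slot install are assumptions; the point is the division of labor
between exokernel and the library OS.

```c
#include <stdint.h>

/* Hypothetical exokernel services: bind a frame (returning its PFN and
 * an encrypted key) and the privileged TLB install from the earlier
 * sketch. */
extern uint32_t ek_alloc_frame(uint8_t key_out[16]);
extern int      ek_install_tlb_entry(uint32_t lib_os_id, int slot,
                                     uint32_t vpn, uint32_t pfn,
                                     const uint8_t *key);

void libos_page_fault_handler(uint32_t vpn);   /* registered entry point */

/* 1. Exokernel fields the fault and kicks it up to the running
 *    library OS through its registered handler. */
void ek_on_page_fault(uint32_t current_lib_os, uint32_t faulting_vpn) {
    (void)current_lib_os;            /* dispatch detail elided          */
    libos_page_fault_handler(faulting_vpn);
}

/* 2. The library OS services the fault at user level...              */
void libos_page_fault_handler(uint32_t vpn) {
    uint8_t  key[16];
    uint32_t pfn = ek_alloc_frame(key);   /* frame + encrypted key     */
    /* ...records vpn->pfn in its own page tables here, and then...    */
    /* 3. ...asks exokernel to do the one privileged step.             */
    ek_install_tlb_entry(/*lib_os_id=*/1, /*slot=*/0, vpn, pfn, key);
}
```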
80 - Secure Binding

Downloading code into the kernel as a form of secure binding: how is this secure?
This is a bit dicey.
Here, the library operating system is given an ability to drop code into the kernel.
The rationale is purely a performance one, namely, to avoid border crossings.
But obviously it can be a serious security loophole. Even in SPIN we noticed that a core service may have
to step outside the language-enforced protection mechanism in order to control hardware resources.
The bottom line is that, while both SPIN and Exokernel started out with the idea of allowing extensibility, they
may necessarily have to restrict who is allowed to make such extensions.
Not any arbitrary user. It has to be a trusted set of users.
81 - Memory Management Using S-TLB

I mentioned software caching as a mechanism available in exokernel for establishing secure bindings. The
software TLB is one specific example of using the software caching idea.

• When we have a context switch, one of the biggest causes of performance loss is that the newly scheduled
process loses locality.

• Since the address spaces occupied by different library OSes could be completely different,

• we may have to flush the entire TLB.

• Then, when we run the other library OS, it will incur a lot of TLB misses. That's a huge source of overhead.
In order to mitigate that overhead, exokernel has this mechanism called software-TLB.
The software TLB is sort of a snapshot of the hardware TLB for each library OS. One such data structure inside
the exokernel represents the mappings for library OS-1; a similar data structure represents the mappings for
library OS-2. (A sketch of the switch follows the list below.)
• Say we are running library OS-1, so the TLB entries correspond to valid mappings for library OS-1.

• Say exokernel decides to switch from this library OS-1 to OS-2.

• The exokernel will dump the TLB into the software-TLB data structure that it holds on behalf of OS-1. (In
fact, only some subset of the TLB mappings will be dumped into this data structure.)

• Meanwhile, the exokernel will pre-load the hardware TLB with the software-TLB data structure associated
with library OS-2.

• As a result, when library OS-2 starts running on the CPU, it will find some of its mappings already present
in the hardware TLB. That's how exokernel uses S-TLB to mitigate the loss of locality during context
switch, in terms of address translations.
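Here is a minimal sketch of the dump-and-preload step, assuming a fixed-size hardware TLB and hypothetical
privileged helpers hw_tlb_read and hw_tlb_write; as noted above, a real implementation would snapshot only a
subset of entries.

```c
#include <stdint.h>

#define HW_TLB_ENTRIES 64

typedef struct { uint32_t vpn, pfn; int valid; } tlb_entry_t;

typedef struct {
    tlb_entry_t entries[HW_TLB_ENTRIES];   /* per-library-OS snapshot  */
} soft_tlb_t;

/* Hypothetical privileged TLB accessors. */
extern tlb_entry_t hw_tlb_read(int slot);
extern void        hw_tlb_write_entry(int slot, tlb_entry_t e);

/* On a library-OS switch: snapshot the hardware TLB for the outgoing
 * OS, and warm-start the incoming OS from its software TLB. */
void ek_switch_libos(soft_tlb_t *outgoing, const soft_tlb_t *incoming) {
    for (int i = 0; i < HW_TLB_ENTRIES; i++) {
        outgoing->entries[i] = hw_tlb_read(i);        /* dump          */
        hw_tlb_write_entry(i, incoming->entries[i]);  /* pre-load      */
    }
}
```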
82 - Default Core Services in Exokernel - CPU scheduling

The second core service is CPU scheduling.

Exokernel maintains a linear vector of time slots.


Time is divided into epochs T1, T2, T3, and so on, and every time quantum has a begin and an end. These
time quanta represent the time allocated to the library OSes living on top of exokernel.
Each time quantum is bounded by begin and end markers for its library OS.
Each library OS gets to mark its time quantum in the linear vector at startup.
So CPU scheduling in exokernel is essentially looking at this linear vector of time slots and asking the question,
which library OS should be running in a particular time quantum of the processor?
Let's say OS-1 is now running on the CPU. When the timer interrupt goes off at the end marker, control is
transferred by exokernel to the library OS to do any saving of context that it has to do. But OS-1 has only a
limited time to do so.
If an OS misbehaves, say OS-1 takes more time than it is allowed at this context switch point, Exokernel will
remember that OS-1 misbehaved and will take time off of OS-1 the next time it is scheduled, because there is a
penalty associated with exceeding the time quantum.
The time quantum is bounded.
During this time quantum, OS-1 has complete control of the processor (unless something like a page fault
occurs).
At the end of the time quantum, the timer interrupt goes off. Exokernel fields it and kicks it up to the library
operating system, telling it to clean up its act and save any context it wants, so that the CPU can be reallocated
to the next library OS.
That is where the time is bounded: how much time the library operating system can take in order to do that
saving of the context. A sketch of this time-slot mechanism follows.
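The following sketch shows one way such a time-slot vector and overrun penalty might look; slot_owner,
SAVE_BUDGET, and the penalty bookkeeping are illustrative assumptions, not exokernel's actual scheduler.

```c
#include <stdint.h>

#define SLOTS       32
#define SAVE_BUDGET 1000ULL   /* assumed cycle budget for the upcall   */

static int      slot_owner[SLOTS];  /* library OS id per time quantum  */
static uint64_t penalty[16];        /* cycles owed by misbehaving OSes */

/* Hypothetical helpers: bounded upcall into the library OS, and a
 * cycle counter. */
extern void     upcall_save_context(int os);
extern uint64_t cycles_now(void);

/* Timer interrupt at the end marker of a time quantum. */
void ek_timer_interrupt(int slot) {
    int      os = slot_owner[slot];
    uint64_t t0 = cycles_now();
    upcall_save_context(os);           /* OS saves what it must        */
    uint64_t taken = cycles_now() - t0;
    if (taken > SAVE_BUDGET)           /* overran its save window:     */
        penalty[os] += taken - SAVE_BUDGET;  /* docked next schedule   */
    /* next: dispatch slot_owner[(slot + 1) % SLOTS]                   */
}
```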
83 - Revocation of Resources

Exokernel has mechanisms for securely giving resources to the library OS. The resources may be a memory
resource, a time resource, or a specific hardware resource like an area of the graphics display, and so on.

Still, Exokernel needs a way of revoking or reclaiming resources from a library OS.
• Exokernel keeps a scoreboard of what resources have been allocated to different library OSes. Therefore,
Exokernel can revoke resources from a library OS at any time.

• Exokernel has a revoke mechanism for this purpose. The revoke call is an upcall into the library OS, which
gives this library OS a repossession vector, saying these are the resources I'm taking away from you.

• When the Exokernel gives this repossession vector to the library OS, it is the responsibility of the library
OS to do what it needs to do in order to clean up.

• In other words, the library takes corrective action commensurate with the repossession vector that has
been presented to it by the exokernel.

• For example, if the exokernel tells this library OS that I'm going to take away page frame number 20 and
page frame number 25 from you, then the library OS will say, oh I have to stash away the contents of
those page frames into the disk. That's the corrective action that the library OS will have to take when it
is informed by exokernel that the exokernel will reclaim some hardware resource.
To make the life of the library OS easier, exokernel also allows a library to seed it with autosave options for
resources that exokernel wants to revoke.
• In other words, if exokernel decides to revoke some page frames from the library OS, the library OS could
have seeded the exokernel ahead of time that any time you want to take away these page frames, dump it
into the disk for me.
• Seeding allows the exokernel to do the work on behalf of the library OS, so that at the point of revocation,
the amount of work the library OS has to do is minimal. A sketch of this revocation path follows.
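Here is a hypothetical sketch of the revocation path, with a repossession vector of page frames and the autosave
option seeded in advance; all names are assumptions, not Exokernel's actual API.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t pfns[8];     /* e.g., frames 20 and 25 being reclaimed    */
    size_t   count;
} repossession_vector_t;

/* Hypothetical helpers: was an autosave disposition seeded for this
 * frame, the disk writeback, and the upcall into the library OS. */
extern int  autosave_requested(uint32_t lib_os, uint32_t pfn);
extern void ek_write_frame_to_disk(uint32_t pfn);
extern void libos_revoke_upcall(uint32_t lib_os,
                                const repossession_vector_t *rv);

/* Exokernel reclaims resources from a library OS. */
void ek_revoke(uint32_t lib_os, const repossession_vector_t *rv) {
    for (size_t i = 0; i < rv->count; i++) {
        /* Seeded frames: exokernel does the save on the OS's behalf.  */
        if (autosave_requested(lib_os, rv->pfns[i]))
            ek_write_frame_to_disk(rv->pfns[i]);
    }
    /* Upcall with the repossession vector; the library OS takes any
     * remaining corrective action itself. */
    libos_revoke_upcall(lib_os, rv);
}
```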
84 - Code Usage by Exokernel Question

85 - Code Usage by Exokernel Solution

Here are a couple of examples. I'm sure that you may have thought of other examples as well.
The packet filter is one thing I mentioned already. This may be a performance-critical component for any
operating system; therefore, a library OS might install a packet filter for demultiplexing incoming network
packets, so that exokernel can hand packets intended for a particular library OS to that OS by running this code
on its behalf.
A second example would be things a library OS would like exokernel to do on its behalf even when it is not
currently scheduled. For instance, a garbage-collection mechanism for an application is something a library
OS may want the exokernel to run on its behalf. That can be installed as code downloaded into exokernel and
executed on behalf of that library operating system.
So these are all examples; you may have thought of others as well.
86 - Putting it all Together

Let's now put it all together: the mechanisms exokernel offers, and how library OSes can live on top of this red
line, the protection boundary of exokernel, meaningfully execute the applications that belong to them on the
hardware resources, and avoid interfering with one another. That is, how to achieve extensibility, protection,
and performance.

Exokernel has been the broker in giving capabilities for some hardware resources to specific library OSes
(e.g., memory access management, address translation, TLB update, etc.).
Exokernel has also been the broker for downloading code specific to a library OS into the exokernel code
base itself (e.g., the packet filter).
There can be discontinuities when the user application runs, such as a system call, a page fault, an exception
(e.g., divide by zero), or an external interrupt.
All of these cause a discontinuity in the execution of the process on the CPU and result in the CPU incurring
a fault or a trap, and the trap is fielded by exokernel.
Exokernel knows which library OS is currently running on the CPU, based on the linear vector of time slots it holds.
So the exokernel will pass the program discontinuity to the appropriate library OS that is living on top of it.
To facilitate a finer-grain association between these different kinds of discontinuities and the specific handlers
in the library OS, the exokernel maintains state for each currently existing library OS. We will discuss the state
maintained by exokernel on behalf of every library operating system next.
87 - Exokernel Data Structures

To facilitate the bookkeeping needed for the different types of program discontinuities, exokernel maintains the
PE data structure on behalf of each library OS.
• The PE data structure contains the entry points in the library OS for dealing with the different kinds of
program discontinuities. For example, the exception handler (EXC) has a specific entry point in the library
OS. Similarly, external interrupts (INT), system calls (SYS), and page faults (MMAP) all have specific
handlers in the library OS.
• So in a nutshell, this PE data structure that is unique for every library OS, contains the handler entry points
for different types of events that are fielded by exokernel.
In this sense, the PE mechanism is very similar to the event handler mechanism that we discussed in the Spin
operating system.
• We already mentioned the software TLB that exokernel maintains on behalf of every library OS; these are
all guaranteed mappings. There is an entry for the S-TLB in the PE data structure as well, which tells the
exokernel the set of guaranteed mappings that a particular library OS wants exokernel to maintain on its
behalf. (A sketch of the PE structure follows this list.)
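As one possible rendering, here is a guess at the shape of the PE structure in C. The field names mirror the
abbreviations above, but the layout is illustrative, not Exokernel's actual definition.

```c
#include <stdint.h>

typedef struct soft_tlb soft_tlb_t;   /* per-OS guaranteed mappings    */
typedef void (*handler_t)(void);      /* entry point in the library OS */

typedef struct {
    handler_t   exc;    /* EXC:  exceptions handler                    */
    handler_t   intr;   /* INT:  external interrupts                   */
    handler_t   sys;    /* SYS:  system calls                          */
    handler_t   mmap;   /* MMAP: page faults                           */
    soft_tlb_t *stlb;   /* guaranteed mappings for this library OS     */
} pe_t;

static pe_t pe_table[16];   /* one PE per currently existing library OS */

/* Exokernel fields a trap and dispatches through the PE of the
 * currently running library OS. */
void ek_dispatch_syscall(int current_lib_os) {
    pe_table[current_lib_os].sys();
}
```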
I mentioned an external interrupt is a source of discontinuity for the currently executing process.
• It is possible that an external interrupt may not always be for the currently scheduled library OS. It may
be for some other library OS.
• For example, maybe OS-1 scheduled a disk I/O and the disk I/O got completed when OS-2 is running.
• Exokernel needs a way of associating the interrupt with the correct library OS that it is intended for.
Downloading code into the kernel, which I mentioned as one of the mechanisms exokernel provides, allows
first-level interrupt handling on behalf of a library operating system.
88 - Performance Results of Spin and Exokernel

Systems research is 1% inspiration and 99% perspiration. The only way to convince someone of your ideas is to
build a system and evaluate it.

Now, how do you go about evaluating a system such as Spin or Exokernel?


You can qualitatively argue that you're achieving extensibility without sacrificing safety. Simply by the way the
system is constructed. But you also have to show that you're not losing out on performance, due to the extensibility
hooks.
For SPIN and exokernel, the competition at the time was UNIX as a monolithic example, and Mach as a
microkernel example. The performance questions always center around space and time.
• For example, how much better, time-wise, is the extended kernel (SPIN or exokernel) compared to a
microkernel-based design? Since an extended kernel may incur loss of locality, border-crossing overhead,
and so on compared to a monolithic kernel, another question we may want to ask is: is the extended
kernel at least as good as a monolithic kernel?
• What is the code size of implementing a standard operating system, say UNIX, as a monolithic operating
system, a microkernel-based operating system, or an extended-kernel operating system? That's a space
question.
I encourage you to read the performance results in the papers.
The key takeaway you will see when you read the performance results reported by both SPIN and exokernel is
that they do much better than the Mach microkernel for protected procedure calls.
That is, when you go from one protection domain to another, how well are you doing? For that, both SPIN and
exokernel exceed the performance of Mach. You will also see that both SPIN and exokernel handle system calls
as well as a monolithic kernel does.
89 - Introduction

Both SPIN and Exokernel started out with the assumption that a microkernel-based operating system structure is
inherently poised for poor performance. Why did they start with such an assumption? They used a popular
microkernel of the time called Mach, developed at CMU, as their example of microkernel-based operating
system structure.

But Mach had portability as an important goal. If we keep performance as the primary goal, can we achieve it
with a microkernel-based approach? In this lesson, we will look at L3, a microkernel-based operating system
design that provides a contrarian viewpoint to the SPIN and exokernel assumption.
90 - Microkernel-Based OS Structure

Just to refresh your memory about microkernel-based OS structure.


The idea of a microkernel is to provide a small number of simple abstractions, such as address spaces and
inter-process communication.
All the system services that you normally expect from an operating system, such as the file system, memory
manager, CPU scheduling, and so on are implemented as processes above microkernel.
In other words, these operating system services run at the same privilege level as user-level applications in
their own individual address spaces.
Only the microkernel runs at the privileged level of the processor architecture.
Since all the operating system services are implemented as server processes on top of the microkernel, they
may have to cooperate with one another in order to satisfy a particular user request. In that case, they talk to
one another using the IPC provided by the microkernel to accomplish what is needed by a particular system
call emanating from an application. A sketch of this flow follows.
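To make the flow concrete, here is a hypothetical sketch of a "system call" in such a structure: the application
marshals its request into an IPC message and sends it through the microkernel to the user-level file-system
server. The message layout, opcodes, and names are assumptions, not any particular microkernel's API.

```c
#include <stdint.h>

typedef struct {
    uint32_t op;            /* e.g., OPEN, READ, WRITE                 */
    char     payload[56];   /* arguments marshalled into the message   */
} ipc_msg_t;

/* The primitive the microkernel itself provides. */
extern void ipc_send(int dest_server, const ipc_msg_t *m);

#define FILE_SERVER 1       /* assumed server id for the file system   */
#define OP_READ     2u

/* User side: a "system call" becomes an IPC to a user-level server.   */
void sys_read(uint32_t fd) {
    ipc_msg_t m = { .op = OP_READ, .payload = {0} };
    m.payload[0] = (char)fd;
    ipc_send(FILE_SERVER, &m);   /* border crossing #1: into kernel;   */
    /* the kernel relays to the file server (border crossing #2), and  */
    /* the file server may IPC the storage module in turn.             */
}
```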
91 - Potentials for Performance Loss

What are the potentials for performance loss when you have a microkernel-based operating system design? The
main source of potential performance loss is border crossings, as we have seen before.
Border crossings have both an explicit cost and an implicit cost associated with them.

The explicit cost is the fact that from an application which is at a particular protection level, namely the user
level protection level of the processor architecture, you are slipping into the microkernel, which is at a different
privilege level. That is the explicit cost in border crossing.
In a microkernel design, the processes that actually provide the system services sit above the microkernel.
Therefore, to accomplish a particular service that an application has requested, there are border crossings
involved in going from the application to the microkernel to the particular service.
For example, a system service like the file system may have to consult other services, such as a storage module
or a memory-management module, in order to complete the requested service for the application. There are going
to be protected procedure calls executed between services across different address spaces, so they will be more
expensive than normal procedure calls.
Typically, protected procedure calls can be as expensive as 100 times normal procedure calls. This stems from
the fact that each of these services in a microkernel-based design is implemented in its own address space to
protect the integrity of each system service.
Now why are protected procedure calls that much more expensive than normal procedure calls?
• This is where the implicit cost of border crossings comes in.
• The implicit cost is that we're losing locality, both in terms of address translations contained in the
TLB, as well as the contents of the cache that the processor uses in order to access memory.
• All of those add up and make protected procedure calls between user space and kernel space much more
expensive.
92 - L3 Microkernel

The key word when I describe the performance loss in microkernel-based operating system structure is the
potential for performance loss.
What L3 microkernel does is that, by proof of construction they show that they can debunk the myths about
microkernel-based OS structure.

L3 microkernel, being a microkernel, has a minimal set of abstractions: address spaces, threads, inter-process
communication, and a service for providing unique IDs for subsystems that live on top of the microkernel. L3
argues that these are fundamental abstractions that any subsystem living on top of the microkernel needs, and
that a microkernel OS should provide them as the minimal set of abstractions.
The microkernel system services have to be in distinct protection domains. We also have the hard line between
the applications, the servers providing the services, and the microkernel.
This is the structure of a microkernel-based operating system. The key is that each of these OS services needs
to be in its own protection domain, but not necessarily in a distinct hardware address space.
The point L3 establishes by proof of construction, is that there are ways to construct a microkernel based OS
to provide these services efficiently, knowing the features of the hardware platform.
93 - Strikes Against Microkernel

What are the strikes against a microkernel based design?

1. The first strike is the border crossing cost of going between the kernel and the user and vice versa, which
you incur every time a user-level process makes a system call: you have to go through the kernel. That
is the first explicit cost, and it can be a strike against the microkernel if it happens too often (i.e., how
many instructions/cycles are needed to perform the crossing).

2. The second strike against a microkernel-based design is address space switches. When an application needs
a system service, that may involve the servers living above the microkernel having to talk to one another.
Protected procedure call is the basis for cross protection domain communication. So here's the protection
domain, a file system. Here is another protection domain, the storage module. If the file system has to get
some service out of the storage module in order to serve the original request from the application process,
that communication is implemented as a protected procedure call. And going across hardware address
spaces minimally involves flushing the TLB of the processor in order to make room for the TLB entries
of the domain that we're entering. But the key point is, there's an explicit cost involved in going from one
address space to another address space, and that is the second strike against a microkernel based design.

3. The third strike against a microkernel-based design is the cost of thread switches, which have to be
mediated by the kernel. If the file system needs to make a protected procedure call to the storage module
in order to complete an application-level request for an operating system service, that call has to be
mediated through the microkernel in order to execute some functionality in the storage module. That
involves a thread switch and IPC. In other words, the basis for a protected procedure call is thread
switches and inter-process communication, mediated through the kernel, and this kernel-mediated thread
switching and IPC can be expensive.

4. In addition to all of these explicit costs, there could be a fourth cost, which is the implicit cost. This is
due to the memory subsystem and the loss of locality that can happen when we are going across address
spaces, i.e. loss of locality in TLB and cache.
94 - Debunking User Kernel Border Crossing Myth

The first myth about performance loss in a microkernel design concerns border crossing, that is, user-space to
kernel-space switching.

By proof of construction, L3 accomplishes this border crossing in 123 processor cycles. This includes the TLB
misses that may be incurred because of the fact that we're going from user space to kernel space as well as cache
misses that may be incurred because we are starting to execute some code in the microkernel.
L3 also calculated the minimum cost involved in doing this border crossing on that particular processor
architecture, and showed that the minimal number of processor cycles needed is about 107.
So L3 was able to achieve border crossing at a cost that is close to what it would take minimally in the processor.
So that is proof that a microkernel-based design need not incur any extra overhead for border crossing. By
contrast, CMU's Mach operating system, on the same hardware, takes 900 cycles to complete the border
crossing.
SPIN and exokernel used Mach as the basis for decrying microkernel-based design, saying that border crossing
in a microkernel-based design is expensive because it takes Mach 900 cycles.
But what L3 has shown is that it need not take that much time; it can be done in a much shorter amount of time,
close to the best the hardware can actually do.
95 - Cycles Question

96 - Cycles Solution

The right choice is the design priorities of Mach.


First of all, I already mentioned that the number I showed you for Mach's user-to-kernel switch was measured
on the same processor hardware as L3, so there is no cheating here.
The L3 microkernel may be smaller, but that's not the reason either.
Microkernels are not slow by definition; rather, the design priorities of Mach were different. Mach's design
priority was not just extensibility of an operating system, but also portability.
At this point I just want you to understand that the reason Mach took significantly more time for this border
crossing compared to the L3 microkernel has nothing to do with the structure of the microkernel; it is only the
design priorities.
97 - Address Space Switches

The second myth concerns going across protection domains: especially if the protection domains are
implemented as distinct hardware address spaces, the myth is that crossing them using a hardware address-space
switch is expensive.
We are now talking about the explicit cost of crossing protection domains implemented as distinct hardware
address spaces.

A virtual address, for TLB lookup purposes, consists of two parts: an index and a tag. The index is used to look
up the TLB, and the tag is used for the match check. If the tags match, we have a hit: this particular virtual
address has a valid entry in the TLB, and you can get its corresponding PFN directly.
On a context switch going from one address space to another address space, the virtual address to physical address
mapping will change for the new process.
Say the TLB contains the translations for a particular process now. After the context switch, do we have to flush
the TLB?
The answer is: it depends. It depends on whether the TLB supports process ID tags.
Some architectures support address-space tags in the TLB, where each TLB entry uses some bits for the process
ID. The MIPS architecture uses address-space tagged TLBs. However, other architectures, such as the Intel 486
and the Intel Pentium, do not.
So, on an Intel architecture, at the point of the context switch, you have to throw away all the entries that are
in the TLB on behalf of the process. In the Intel architecture, the TLB is actually split into two parts, a user part
and a kernel part.
• The kernel part is common regardless of which process is running and you don't have to flush the kernel
portion of the TLB.
• But the user portion of the TLB has to be flushed on a context switch, because the virtual-to-physical
address mapping is going to be different for the new process that starts to run on the processor.
98 - Address Space Switches With As Tagged TLB

In an architecture that supports address space tags in the TLB, when we make an entry into the TLB, we will store
the “tag” as well as the “PID” in the TLB entry.

So how does address translation work in this address space tagged TLB?
• Similar to a normal TLB, we're going to take this virtual address and split it in two parts, the index part
and the tag part.

• The index part is used to look up a particular entry in the TLB.

• For the tag part, we have two tags.

• One tag is the address-space tag, signifying which process created this particular entry. We compare the
PID of the process currently generating this virtual address against the tag contained in that entry; the
match is checked in hardware. If it is not a match, this entry does not correspond to the virtual address
we are trying to translate.

• If it does match, the entry belongs to this process, and we then match the actual tag bits of the virtual
address. Only if both the process ID and the tag match do we have a hit in the TLB.
Therefore, when we do a context switch from one process to another, there is no need to flush the TLB on the
context switch, because the TLB may already contain some translations for the new process, if it has executed
before.
The hardware disambiguates these entries by checking the process ID for a match. A sketch of this lookup follows.
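Here is a small sketch of that double-tag check, with illustrative field widths and a direct-mapped TLB for
simplicity; real TLBs are associative and implement this comparison in hardware.

```c
#include <stdint.h>

#define TLB_SETS 64

typedef struct {
    uint32_t tag;     /* high bits of the virtual page number          */
    uint32_t pid;     /* address-space tag: who created this entry     */
    uint32_t pfn;
    int      valid;
} as_tlb_entry_t;

static as_tlb_entry_t tlb[TLB_SETS];

/* Returns the PFN on a hit, -1 on a miss. */
int64_t tlb_lookup(uint32_t vpn, uint32_t current_pid) {
    uint32_t index = vpn % TLB_SETS;          /* index part of the VPN */
    uint32_t tag   = vpn / TLB_SETS;          /* tag part of the VPN   */
    const as_tlb_entry_t *e = &tlb[index];
    if (e->valid &&
        e->pid == current_pid &&   /* address-space tag must match     */
        e->tag == tag)             /* ...and so must the address tag   */
        return e->pfn;             /* hit: no flush needed on switch   */
    return -1;                     /* miss: walk the page table        */
}
```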

But what do we do if the memory management hardware does not support address-space tags?
99 - Liedtke’s Suggestions for Avoiding TLB Flush

Liedtke, the author of the L3 microkernel, suggests tricks for exploiting the hardware to avoid TLB flushes
even when the TLB is not address-space tagged, by taking advantage of whatever the architecture offers you.
For example, architectures like x86 and PowerPC offer segment registers.

The segment registers give the OS a way to specify the range of virtual addresses that can be legally accessed
by a particular process.
The linear address space provided by the hardware runs from zero to a maximum (e.g., on a 32-bit architecture,
2^32 - 1).
If the architecture (e.g., PowerPC) offers segment registers to bound the range of virtual addresses that can be
legally generated by a running process, then we can use segment registers to define a protection domain.
For example, a process running in protection domain S1 can only use virtual addresses in its assigned range;
any other virtual address from this process is illegal, and this is ensured by the hardware.
The segment registers are a hardware-provided facility for bounding the range of legal virtual addresses that
can be generated by a protection domain.
Similarly, we can have another protection domain S2, and so on.
Once we confine each process to a distinct protection domain, there is no need to flush the TLB on a context
switch, because different processes use different virtual address ranges, i.e., each TLB entry is exclusive to one
particular process.
The segment bounds that we're talking about are hardware-enforced.
This works really well if the protection domains are small; a toy sketch follows.
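Here is a runnable toy sketch of the idea: two small protection domains packed into one linear address space,
with a software bounds check standing in for what the segment registers enforce in hardware. The sizes are
arbitrary assumptions.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t base;    /* first legal virtual address for the domain    */
    uint32_t limit;   /* size of the domain's window                   */
} segment_t;

/* Two small protection domains packed into one linear address space.  */
static const segment_t S1 = { 0x00000000u, 0x00400000u };  /* first 4 MB */
static const segment_t S2 = { 0x00400000u, 0x00400000u };  /* next 4 MB  */

/* In real hardware the segment registers perform this check; no TLB
 * flush is needed on a switch, since the domains' addresses never
 * overlap. */
int access_ok(const segment_t *s, uint32_t vaddr) {
    return vaddr >= s->base && vaddr - s->base < s->limit;
}

int main(void) {
    printf("%d\n", access_ok(&S1, 0x00100000u));  /* 1: inside S1      */
    printf("%d\n", access_ok(&S1, 0x00500000u));  /* 0: S2's range is  */
    return 0;                                     /*    illegal for S1 */
}
```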
100 - Large Protection Domains

You may ask, what if the protection domain is so large that it needs all of the hardware address space?
• Maybe the file system code base is so big that it needs the entire hardware address space, or the code base
for the storage module is so big that there is no room in the address space for a second protection
domain.
• In such cases, at a context switch, we have no choice but to do a TLB flush, because the memory footprint
of each of these services is as big as the hardware address space available on the processor.
• Or in other words, the segments overlap.

The cost that we've been talking about so far is the explicit cost, that is, the cost that is incurred for flushing the
TLB at the point of a context switch.
• For small protection domains that we want to go across, we want to avoid that cost and we can do that by
packing them into different segments in the linear address space provided by the hardware.

• But if the memory footprint of the service is so big that it occupies the hardware address space of the
architecture completely, the explicit cost cannot be avoided, if the TLB is not address space tagged.

Even so, that explicit cost is insignificant compared to the implicit cost that is going to be incurred.
What do we mean by implicit cost? We mean cache effects, i.e., the loss of locality in going from one service to
another: the cache is not going to contain the working set of the new service process.
That impact is much greater than the explicit cost. For example, Liedtke shows that on the specific architecture
on which L3 was implemented (a Pentium, with 32 kernel TLB entries and 64 user TLB entries), flushing all the
entries in the TLB takes only 864 cycles. But the loss of locality in going from one service to another, in terms
of cache effects, is going to be much more significant, because with a large address space you expect to do more
work in the subsystem, and the implicit costs are going to dominate.
101 - Upshot for Address Space Switching

So the upshot of address space switching is you have to really ask the question, are we switching between small
protection domains or large protection domains?

If you are switching between small protection domains, then by taking advantage of whatever the hardware gives
you, you can make the switching between these small protection domains efficient by careful construction of
the services.
On the other hand, if the switch is from one large protection domain to another, the explicit cost of switching
from one hardware address space to another is not that significant, because of the loss of locality, both in terms
of TLB misses and cache effects. The implicit cost is much more significant than the switching cost itself.
So this is the way the address-space switching myth is debunked by the L3 microkernel, by construction.
102 - Thread Switches and IPC

The third myth about microkernel based design is that the thread switches and inter process communication can
be very expensive.

By thread switch we mean, if you are executing one thread in a particular protection domain and going to execute
another thread in a different protection domain, how much time does it take for this thread switch to be completed
by the microkernel?
Here again, we are only talking about the explicit cost of the thread switch.
The explicit cost is saving all the volatile state of the thread running on the processor, i.e., the CPU registers it
has modified.
All of that has to be stashed away in the thread control block before we can schedule the second thread to run
on the processor.
Once again by proof of construction, L3 shows that its thread switch time is competitive with SPIN and
exokernel. This debunks the myth that a microkernel-based OS structure is more expensive for thread switching
than SPIN, exokernel, or even a monolithic operating system. A sketch of what the explicit cost covers follows.
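Here is a sketch of what that explicit cost covers, with an illustrative register set; save_cpu_state and
load_cpu_state would be assembly in a real kernel, and the structure layout is an assumption.

```c
#include <stdint.h>

typedef struct {
    uint64_t gpr[16];          /* general-purpose registers            */
    uint64_t pc, sp, flags;    /* program counter, stack ptr, flags    */
} volatile_state_t;

typedef struct {
    volatile_state_t ctx;      /* stashed volatile state               */
    int              domain;   /* protection domain this thread is in  */
} tcb_t;                       /* thread control block                 */

/* Hypothetical register save/restore primitives (assembly in reality). */
extern void save_cpu_state(volatile_state_t *out);
extern void load_cpu_state(const volatile_state_t *in);

/* The explicit cost of a thread switch: stash the outgoing thread's
 * volatile state in its TCB, then load the incoming thread's.
 * (Any address-space switch is handled separately, as discussed in
 * the address-space switching sections above.) */
void thread_switch(tcb_t *from, tcb_t *to) {
    save_cpu_state(&from->ctx);
    load_cpu_state(&to->ctx);
}
```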
103 - Memory Effects

The next myth regards memory effects; it asserts that the loss of locality in a microkernel-based design is much
greater than in a monolithic structure, or in SPIN and Exokernel.

• In the processor architecture, you have the CPU, the TLB, several levels of caches, and main memory.
The caches are typically physically tagged.

• The entire hardware address space may not even be resident in physical memory, because in a
demand-paged system, the pages a process needs are brought in from the disk into physical memory
as it executes.

• When the CPU is executing instructions, the instructions and data contained in physical memory move
into the memory hierarchy close to the CPU, i.e. caches, so that the CPU can access the working set of the
currently running thread faster.

• That's the hope in this memory hierarchy.


What do we mean by memory effects? When we context switch between protection domains, how warm are the
caches?
Now, suppose we have small protection domains P1, P2, P3, P4, small enough that we can use segment registers.
With small protection domains, the caches are going to be pretty warm even when you context switch from one
process to another: there is a good chance that the cache hierarchy contains the working set of the newly
scheduled small protection domain. In other words, the memory effects can be mitigated by carefully structuring
the protection domains in the hardware address space.
Debunking the second myth, about address-space switching, also helps reduce the ill effects of the implicit costs
associated with it, because small protection domains have a small memory footprint and therefore a small
footprint in the caches as well. When we go across small protection domains, there is a good chance that the
locality of the newly scheduled domain is contained in the cache hierarchy. So the caches are probably going to
be warm across context switches, so long as the protection domains are small.
We already mentioned that if the protection domains are large, you cannot avoid these costs.
Whether it is a monolithic kernel, Exokernel, or SPIN-style extensibility, if the memory footprint of a system
service is big, it is going to pollute the cache when that service is invoked.
Even in a monolithic kernel with subsystems occupying a significant portion of the hardware address space,
although no context switch is involved, the ill effects of implicit costs in the memory hierarchy will be felt.
The cache is, after all, physically tagged; therefore, when we go across large protection domains, or large
subsystems in the context of a monolithic kernel, we have to incur the cache pollution.
This ill effect is unavoidable for large protection domains.
So the only place where a monolithic, SPIN, or exokernel design can win is with small protection domains. But
we just showed that a microkernel can also win with small protection domains, by packing multiple small
protection domains into the same hardware address space.
So this begs the question: Why was it so bad in Mach?
104 - Reasons for Mach’s Expensive Border Crossing

The reason for Mach's expensive border crossing is because Mach is focused on portability.

What that means is that the Mach microkernel is architecture independent, to allow it to run on several different
processor architectures.
So there is code bloat in the Mach microkernel, because it has to have the personality of any architecture it may
need to run on. In particular, the Mach microkernel contains an architecture-independent part and an
architecture-specific part. The two together result in significant code bloat, which means a large memory
footprint and less locality.
This is the reason why you have a longer latency for border crossing in Mach, as opposed to the theoretically
smallest number of processor cycles in order to go from user space to kernel space.
As I mentioned earlier, Liedtke did a back-of-the-envelope calculation on the same processor hardware and said
that it should take about 107 cycles to go from user space to kernel space. He then implemented the L3
microkernel and showed that it took only 123 processor cycles to do that.
The reason for Mach's expensive border crossing is the code bloat, which results in more cache misses and
therefore longer latency for border crossing.
In other words, the Mach kernel's big memory footprint is the culprit for the expensive border crossing.
It is not the philosophy, or the principle, of microkernel based operating system design, because by proof of
construction, L3 has shown that you can have very minimal border crossing overhead by careful construction.
Another way of saying it is that portability, which was one of the main concerns in Mach's design, and
performance are at loggerheads with each other.
If you focus on portability, you may sacrifice performance, and that is what we saw when we looked at the
difference between Mach's performance and L3's performance on the same processor hardware.
105 - Thesis of L3 for OS Structuring

So, L3 Microkernel serves to debunk the myth about Microkernel-based OS structure. It goes beyond that, and it
has a thesis for how an OS should be structured.

The first principle advocated by L3 is that the microkernel should have minimal abstractions, which include
support for:
• address spaces
• threads
• inter-process communication
• generating unique IDs.
Why do we need these abstractions in the microkernel? The argument is that these four abstractions are needed
by any subsystem that provides a functionality for end users in an OS. Therefore, the principle of optimizing the
common case suggests that these abstractions should be part of any microkernel.
The second thesis coming out of the L3 microkernel experience is that microkernels are processor-specific in
implementation. In other words, if you want an efficient implementation of a microkernel, you have to take
advantage of whatever the hardware offers, which suggests that microkernels are by their very nature non-
portable if you want to focus on performance. Making a microkernel processor-independent essentially
sacrifices performance.
So if you put these two principles together, what L3 is advocating is the right set of kernel abstractions and a
processor-specific implementation. If you do that, then you can build efficient processor-independent
abstractions at the upper layers. All of the services that we associate with a monolithic kernel like the UNIX
operating system, such as the file system, network protocols, scheduling, and memory management, can be
implemented in a processor-independent way on top of a microkernel that provides the right set of abstractions
and exploits whatever capabilities the hardware offers to get an efficient processor-specific implementation.
That is the thesis of the L3 microkernel: a processor-specific kernel and processor-independent abstractions at
the higher layers of the operating system stack.
106 - Conclusion

Research on the structure of operating systems in the mid-'90s, as exemplified by the three research papers we
studied in this course module, led to fascinating innovations in operating system structure.
It is important to note that all three systems: Spin, Exokernel and L3 microkernel were done contemporaneously.
In this sense, they mutually informed each other and laid the basis for several innovations in operating system
structuring.
For example, many modern operating systems have internally adopted a microkernel-based design.
Similarly, technologies such as dynamic loading of device drivers into the kernel have come out of the thinking
that went into extensibility of operating system services.
The next lesson we're going to look at is a logical progression of the idea of extensibility, namely virtualization.
