The Windows Operating System

Used to support MIPS, PowerPC, and Alpha
Currently supports x86, ia64, and amd64
Multiple vendors build hardware

POSIX, OS/2, and Win32 subsystems
  OS/2 is dead
  POSIX is still supported, as a separate product
  Lots of Win32 software out there in the world

High performance
Anticipated PC speeds approaching minicomputers and mainframes
Async IO model is standard
Support for large physical memories
SMP was an early design goal
Designed to support multi-threaded processes
Kernel has to be reentrant

Process Model
Threads and processes are distinct
Process:
  Address space
  Handle table (handles are the analogue of file descriptors)
  Process default security token

Thread:
  Execution context
  Optional thread-specific security token

Security token: who you are, as a list of identities
Each identity is a SID

Also contains Privileges

Shutdown, Load drivers, Backup, Debug

Can be passed through LPC ports and named pipe requests

Server side can use this to selectively impersonate the client.

Object Manager
Uniform interface to kernel mode objects.
Handles are 32-bit opaque integers
Per-process handle table maps handles to objects and permissions on the objects
Implements reference-counted garbage collection
  Pointer count: total number of references
  Handle count: number of open handles
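
The two counts can be sketched in a few lines of C. This is only an illustration: the structure and function names are hypothetical, not the real kernel object header, and it assumes that every open handle also counts as one pointer reference, so the object goes away only when the pointer count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object header; not the real kernel layout. */
typedef struct object_header {
    long pointer_count;  /* total references: handles plus raw pointers */
    long handle_count;   /* open handles only */
    bool destroyed;      /* set once the last reference is dropped */
} object_header;

static void obj_init(object_header *o) {
    o->pointer_count = 0;
    o->handle_count = 0;
    o->destroyed = false;
}

/* Take a raw pointer reference (e.g. kernel code holding the object). */
static void obj_reference(object_header *o) {
    assert(!o->destroyed);
    o->pointer_count++;
}

/* Drop a reference; the object is reclaimed when none remain. */
static void obj_dereference(object_header *o) {
    assert(o->pointer_count > 0);
    if (--o->pointer_count == 0) {
        o->destroyed = true;
    }
}

/* Opening a handle bumps both counts (assumption of this sketch). */
static void obj_open_handle(object_header *o) {
    obj_reference(o);
    o->handle_count++;
}

/* Closing a handle drops both counts. */
static void obj_close_handle(object_header *o) {
    assert(o->handle_count > 0);
    o->handle_count--;
    obj_dereference(o);
}
```

With one open handle and one extra pointer reference, closing the handle leaves the object alive; it is reclaimed only when the remaining pointer reference is dropped.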

Implements an object namespace
Win32 objects are under \BaseNamedObjects
Devices are under \Device
This includes filesystems

Drive letters are symbolic links

\??\C: => the appropriate filesystem device
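
A symbolic link lookup of this kind might be sketched as follows. The link table, the target name \Device\HarddiskVolume1, and the function resolve_path are all hypothetical; the real object manager is case-insensitive by default and can follow chains of links, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *link;    /* name of the symbolic link object */
    const char *target;  /* object path it points at */
} symlink_entry;

/* Hypothetical link table; the target name is made up for illustration. */
static const symlink_entry links[] = {
    { "\\??\\C:", "\\Device\\HarddiskVolume1" },
};

/* Copies path into out, replacing a leading symbolic link with its target.
   Returns 1 if a link was substituted, 0 if the path was copied unchanged,
   and -1 if out is too small. */
static int resolve_path(const char *path, char *out, size_t out_size) {
    assert(path != NULL && out != NULL && out_size > 0);
    for (size_t i = 0; i < sizeof links / sizeof links[0]; i++) {
        size_t n = strlen(links[i].link);
        /* The link must match a whole path component. */
        if (strncmp(path, links[i].link, n) == 0 &&
            (path[n] == '\\' || path[n] == '\0')) {
            int w = snprintf(out, out_size, "%s%s", links[i].target, path + n);
            return (w >= 0 && (size_t)w < out_size) ? 1 : -1;
        }
    }
    int w = snprintf(out, out_size, "%s", path);
    return (w >= 0 && (size_t)w < out_size) ? 0 : -1;
}
```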

Some things have other names

Processes and threads are opened by specifying a CID: (Process.Thread)

Standard operations on handles

CloseHandle()
DuplicateHandle()
  Takes source and destination process
  Very useful for servers

WaitForSingleObject(), WaitForMultipleObjects()
  Wait for something to happen
  Can wait on up to 64 handles at once

Security Descriptors
Each object has a Security Descriptor
Discretionary Access Control List (DACL)
  List of SIDs and granted or denied access rights

System Access Control List (SACL)
  List of SIDs and access rights to be audited

Access Rights
typedef struct _ACCESS_MASK {
    USHORT SpecificRights;
    UCHAR  StandardRights;
    UCHAR  AccessSystemAcl : 1;
    UCHAR  Reserved : 3;
    UCHAR  GenericAll : 1;
    UCHAR  GenericExecute : 1;
    UCHAR  GenericWrite : 1;
    UCHAR  GenericRead : 1;
} ACCESS_MASK;

Security Use
Objects are referred to via handles
Security checks occur when an object is opened
  Open requests contain a mask of requested access rights
  If granted to the token by the DACL, the handle contains those access rights

Access rights are checked on use
  Just a bit test: very fast

Object Open
evt = OpenEvent(EVENT_MODIFY_STATE, FALSE, "SomeName");
  Finds the event object by name
  Walks the DACL, looking for token SIDs
  Keeps looking until all requested permissions are granted
  If access is granted, inserts a handle to the object into the process's handle table, with EVENT_MODIFY_STATE access
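
The DACL walk can be sketched as a loop over access control entries. This is a simplified model, not the kernel's implementation: SIDs are reduced to small integers, the names ace and access_check are hypothetical, and the handling of deny entries (a matching deny entry that covers a still-needed right fails the open) reflects the usual in-order evaluation rather than anything stated on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t access_mask;
typedef int sid_t;  /* stand-in for a real SID */

typedef struct {
    bool allow;        /* true: access-allowed entry; false: access-denied */
    sid_t sid;         /* identity the entry applies to */
    access_mask mask;  /* rights granted or denied */
} ace;

static bool token_has_sid(const sid_t *sids, size_t n, sid_t sid) {
    for (size_t i = 0; i < n; i++) {
        if (sids[i] == sid) return true;
    }
    return false;
}

/* Walks the DACL in order until every requested right has been granted. */
static bool access_check(const sid_t *token_sids, size_t n_sids,
                         const ace *dacl, size_t n_aces,
                         access_mask desired) {
    assert(dacl != NULL || n_aces == 0);
    access_mask remaining = desired;  /* rights not yet granted */
    for (size_t i = 0; i < n_aces && remaining != 0; i++) {
        if (!token_has_sid(token_sids, n_sids, dacl[i].sid)) continue;
        if (dacl[i].allow) {
            remaining &= ~dacl[i].mask;
        } else if (dacl[i].mask & remaining) {
            return false;  /* a still-needed right is explicitly denied */
        }
    }
    return remaining == 0;
}
```

The result of a successful check is what gets stored in the handle, so the walk happens once at open time rather than on every use.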

Object Use
SetEvent() requires EVENT_MODIFY_STATE access, and an event object.
  The kernel looks up the handle in the process's handle table.
  Checks to make sure that it maps to an event object, and that the granted access bits contain the EVENT_MODIFY_STATE bit.
  If all is good, the event is set.

WaitForSingleObject() requires a synchronization object (like an event) and SYNCHRONIZE access.
  evt maps to an event object, but SYNCHRONIZE access was not requested when the handle was inserted.
  Even if the DACL permits it, the wait fails.
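
The check on use can be modeled as a table lookup followed by a bit test. In this sketch the table layout, the status names, and reference_by_handle are hypothetical; only the two access-right values match the Win32 headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_MODIFY_STATE 0x0002u      /* value from the Win32 headers */
#define SYNCHRONIZE        0x00100000u  /* value from the Win32 headers */

typedef enum { OBJ_NONE = 0, OBJ_EVENT, OBJ_MUTEX } obj_type;

typedef struct {
    obj_type type;     /* OBJ_NONE marks a free slot */
    uint32_t granted;  /* access rights recorded when the handle was opened */
} handle_entry;

#define TABLE_SIZE 16
typedef struct { handle_entry slots[TABLE_SIZE]; } handle_table;

typedef enum {
    SK_OK, SK_INVALID_HANDLE, SK_TYPE_MISMATCH, SK_ACCESS_DENIED
} sk_status;

/* Validates a handle for one operation: slot in use, right object type,
   and every needed right present in the granted mask. */
static sk_status reference_by_handle(const handle_table *t, int h,
                                     obj_type want, uint32_t need) {
    assert(t != NULL);
    if (h < 0 || h >= TABLE_SIZE || t->slots[h].type == OBJ_NONE)
        return SK_INVALID_HANDLE;
    if (t->slots[h].type != want)
        return SK_TYPE_MISMATCH;
    if ((t->slots[h].granted & need) != need)
        return SK_ACCESS_DENIED;  /* the DACL is not consulted again */
    return SK_OK;
}
```

A handle opened with only EVENT_MODIFY_STATE passes the check for a set operation and fails it for a wait, which needs SYNCHRONIZE.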

Types of Objects
Events: state is set or clear. Can clear automatically when a wait completes (auto-reset).

Mutexes: can be acquired by a single thread at a time. Automatically released when the owner exits.

Semaphores: maintain a count. Waits decrement the count.

More objects
Threads, Processes, Timers: like events
Registry Keys
  Manipulate data in the registry, a centralized store of system configuration info.

LPC Ports
Fast local RPC
Security tokens can transfer over LPC calls


Files & IO
File objects maintain a current offset, and a pointer to the underlying stream.
Default internal model is asynchronous
  Synchronous IO just waits for the IO to complete
  Async IO can set an event, or run a callback in the thread which queued the IO, or post a message to an IO completion port.

Each request is an IRP

IRPs maintain the state of IO requests, independent of the thread working on the IO
IRPs are handed off through the device stack to their destinations
  Threads process IRPs
  Initiating thread processes the IRP until a device returns STATUS_PENDING
  Subsequent processing can be done in kernel worker threads

IRQL: Interrupt Request Level
0 => PASSIVE_LEVEL
  Processor is running threads
  All user-mode code is at IRQL 0

1 => APC_LEVEL: running threads, APCs disabled
2 => DISPATCH_LEVEL
  Running as the processor: can't stop!
  Can't take a page fault
  Only locks available are KSPIN_LOCKs

3-26 => Device Interrupt Service Routines
  Device interrupts are mapped to an IRQL and an interrupt service routine; the ISR is called at that IRQL

27 => PROFILE_LEVEL: profiling
28 => CLOCK2_LEVEL: clock interrupt
29 => IPI_LEVEL: interprocessor interrupt
  Requests another processor to do something

30 => POWER_LEVEL: power failure
31 => HIGH_LEVEL: interrupts disabled
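
The level numbers above can be written down as an enum with a few rules attached. This is a sketch: PASSIVE_LEVEL is the usual name for level 0, and the helper functions are hypothetical. They encode only what the list states (no page faults at DISPATCH_LEVEL or above, device ISRs at 3 through 26) plus the general rule that an interrupt is taken only when its level is above the processor's current level.

```c
#include <assert.h>
#include <stdbool.h>

/* Level numbers as listed above. */
typedef enum {
    PASSIVE_LEVEL  = 0,   /* ordinary thread execution, all user-mode code */
    APC_LEVEL      = 1,
    DISPATCH_LEVEL = 2,
    /* 3..26: device interrupt service routines */
    PROFILE_LEVEL  = 27,
    CLOCK2_LEVEL   = 28,
    IPI_LEVEL      = 29,
    POWER_LEVEL    = 30,
    HIGH_LEVEL     = 31
} irql_t;

/* Page faults can only be serviced below DISPATCH_LEVEL. */
static bool can_take_page_fault(int irql) {
    assert(irql >= PASSIVE_LEVEL && irql <= HIGH_LEVEL);
    return irql < DISPATCH_LEVEL;
}

/* True if the level belongs to the device ISR range. */
static bool is_device_irql(int irql) {
    assert(irql >= PASSIVE_LEVEL && irql <= HIGH_LEVEL);
    return irql > DISPATCH_LEVEL && irql < PROFILE_LEVEL;
}

/* An interrupt is taken only if its level is above the current one. */
static bool interrupt_preempts(int incoming, int current) {
    assert(incoming >= PASSIVE_LEVEL && incoming <= HIGH_LEVEL);
    assert(current >= PASSIVE_LEVEL && current <= HIGH_LEVEL);
    return incoming > current;
}
```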

Interrupts
Hardware signals an interrupt
ISR runs at device IRQL
  Has to be fast; get off the processor and allow other ISRs to run
  Typically queues a DPC, acknowledges the interrupt, and returns

DPC: Deferred Procedure Call
  Further processing at DISPATCH_LEVEL
  Queues work to kernel worker threads

IO Completion
Driver calls IO Manager to complete the IRP
IO Manager queues a kernel mode APC to the initiating thread
APC: Asynchronous Procedure Call
  Kernel mode APC preempts thread execution
  Writes data back to user mode in the context of the thread which initiated the IO
  Signals completion of the IO

IO Cache
Classic: block cache
Page mappings translate directly to blocks on the underlying partition.

Windows: stream cache

Page mappings are offsets within a stream.
IO Cache Manager uses the same mappings.
All cache management (trimming) is centralized in the memory manager
All modifications show up in mapped views.

Virtual Memory
Sections: another object type
  Can be created to map a file
  Can also be created off the pagefile
  Optionally named, for shared memory

Reserved: range of VA which will not be handed out for some other purpose

Committed: VA which actually maps to something

Aside: CreateProcess
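Just a user mode Win32 API
{
    NtCreateFile(&file, szImage);     /* open the image file */
    NtCreateSection(&sec, file);      /* create a section backed by the image */
    NtCreateProcess(&proc, sec);      /* create a process mapping that section */
    NtCreateThread(&thrd, proc);      /* create the initial thread */
}
WaitForSingleObject(proc);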

Virtual Memory
Memory Manager maintains processor-specific page table entry mappings.
  Some parts of the address space are shared between processes: for instance, the kernel's address space and the per-session space.

On a page fault, the memory manager reads in the data
Pages can be mapped without the appropriate access: what to do?

With threads, signals don't work very well.
Some software designs expect to touch inaccessible memory.
  Large structured files
  Concurrent garbage collection
  SLists

Single global handler has to somehow know about all possible situations.

Structured Exception Handling

Exceptions unwind the stack
Almost like C++!
  C++ matches against a type hierarchy
  SEH calls exception filter code; filters are Turing-complete.

Two ways to deal with exceptions:

try/finally
try/except

res = AllocateSomeResource();
try {
    SomeOperation(res);
} finally {
    /* Free the resource only if the try block exited abnormally. */
    if (AbnormalTermination()) {
        FreeSomeResource(res);
    }
}
return res;

try {
    SomeOperationWhichMayAV();
} except (Filter(GetExceptionCode(), GetExceptionInformation())) {
    DoSomethingElse();
}

GetExceptionCode(): a code indicating the cause of the exception

GetExceptionInformation():
  Additional code-specific info
  The full processor context

Filter decides what to do


Structured Exception Handling

Registration records are auto (stack) structs, pointing to handler code
  Pushed by function prolog
  Popped by function epilog

On exception, RtlDispatchException() walks the list.

  Runs the filters to figure out what to do
  Calls handler functions
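
The list walk can be modeled with a linked list of records and filter callbacks. This is a simplified sketch, not RtlDispatchException() itself: the registration struct, the sample filters, and dispatch_exception are hypothetical, and unwinding is left out. The three filter results mirror the values of EXCEPTION_CONTINUE_EXECUTION, EXCEPTION_CONTINUE_SEARCH, and EXCEPTION_EXECUTE_HANDLER.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FILTER_CONTINUE_EXECUTION = -1,  /* resume at the faulting instruction */
    FILTER_CONTINUE_SEARCH    = 0,   /* not mine; ask the enclosing frame */
    FILTER_EXECUTE_HANDLER    = 1    /* run this frame's handler */
};

typedef int (*filter_fn)(unsigned long code);

/* Hypothetical registration record; the innermost frame is the list head. */
typedef struct registration {
    struct registration *next;  /* record of the enclosing frame */
    filter_fn filter;
} registration;

/* Sample filter: handles access violations only (0xC0000005). */
static int filter_av_only(unsigned long code) {
    return code == 0xC0000005UL ? FILTER_EXECUTE_HANDLER
                                : FILTER_CONTINUE_SEARCH;
}

/* Sample filter: handles everything. */
static int filter_catch_all(unsigned long code) {
    (void)code;
    return FILTER_EXECUTE_HANDLER;
}

/* Walks the list from the innermost record outward. Returns the record
   whose filter chose to handle the exception, or NULL if execution was
   resumed (*resumed set to 1) or no record accepted it. */
static const registration *dispatch_exception(const registration *head,
                                              unsigned long code,
                                              int *resumed) {
    assert(resumed != NULL);
    *resumed = 0;
    for (const registration *r = head; r != NULL; r = r->next) {
        int decision = r->filter(code);
        if (decision == FILTER_EXECUTE_HANDLER) return r;
        if (decision == FILTER_CONTINUE_EXECUTION) {
            *resumed = 1;
            return NULL;
        }
    }
    return NULL;
}
```

With filter_av_only on the inner frame and filter_catch_all on the outer one, an access violation stops at the inner record, while any other code falls through to the outer record.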

Structured Exception Handling

On x86, there's some overhead with pushing and popping the registration record
On ia64, there is no overhead
  Stack traces are reliable
  It's always possible to look up the handler

Exception handling is very slow

Especially on ia64

Used only for truly exceptional conditions

Structured Exception Handling

Used in kernel mode too!
  Most user mode access will just work
  Still need to validate address ranges & data
  Works great for SMP, when another thread might be in the middle of modifying the address space
  Expected read exceptions are returned as status codes from system calls
  Expected writes are returned as SUCCESS
  Unexpected => buggy kernel => blue screen

Top-level Exception Filter

Top frame on each thread defines a catchall exception filter
Top-level exception filter:
  Notifies the debugger (if being debugged)
  Launches a just-in-time debugger (if set up)
  Loads faultrep.dll to report the failure

faultrep.dll offers to report the failure back to Microsoft
We analyze the failures
  A significant number are recognized instantly; we can tell the user what happened and how to fix it.
  The others go through the standard triage process; developers analyze the dumps and figure out what happened.

67 million machines running XP
Tens of thousands of drivers
Over 100 drivers on any given machine
One bug in one driver => crash
A significant number of crashes come from third-party drivers (some of which ship on the CD)
Lots of different problems, though

Driver Verifier
Controlled by verifier.exe
Special pool allocations
  Detects allocation overruns & use after free

Validates some behaviors

  IRQL: touching paged memory?
  DMA buffers

Can inject failures: useful for testing behavior under sub-optimal conditions

Every night, a couple hundred machines run stress on the latest build
Stress exercises filesystems, memory, GUI, scheduler, &c, trying to uncover low-memory handling problems and race conditions
Every morning, the stress test team triages failed machines
Developers debug the failures


