Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

US006163812A

United States Patent [19] [11] Patent Number: 6,163,812


Gopal et al. [45] Date of Patent: Dec. 19, 2000

[54] ADAPTIVE FAST PATH ARCHITECTURE P. Cao et al., “Implementation and Performance of Inte
FOR COMMERCIAL OPERATING SYSTEMS grated Application—Controlled File Caching, Prefetching
AND INFORMATION SERVER and Disk Scheduling”, Nov., 1994.
APPLICATIONS C. Pu et al., “Optimistic Incremental Specialization: Stream
lining a Commercial Operating System”, 1995.
[75] Inventors: Ajei Gopal, Fort Lee, N.J.; Richard B. Bershad et al., “Extensibility, Safety and Performance in
Neves, Tarrytown, NY; Suvas the SPIN Operating System”, 1995.
Vajracharya, Boulder, Colo.
W. Richard Stevens, “UNIX® Network Programming”
[73] Assignee: International Business Machines Prentice Hall PTR, EngleWood Cliffs, NJ 07632, pp.
Corporation, Armonk, NY. 206—212.

Primary Examiner—Kenneth R. Coulter


[21] Appl. No.: 08/954,710 Attorney, Agent, or Firm—Wayne L. Ellenbogen
[22] Filed: Oct. 20, 1997
[57] ABSTRACT
[51] Int. Cl.7 ................................................. .. G06F 15/163
[52] US. Cl. ............................................................ .. 709/310
A general, event/handler kernel extension system is imple
mented. NetWork server extension architecture isolates and
[58] Field of Search ................................... .. 709/300, 303,
exploits the ability to derive responses on the same interrupt
709/305, 235, 310, 330; 710/260; 370/355 the original request Was received on using non-paged
memory. TCP netWork server extensions are implemented. A
[56] References Cited
technique is de?ned for facilitating immediate completion of
U.S. PATENT DOCUMENTS connection requests using pre-allocated connection end
points and describes an approach to recycling these connec
5,210,832 5/1993 Maier et a1. .......................... .. 712/227
tion endpoints. A hybrid HTTP extension implemented
5,561,799 10/1996 Khalidi et a1. 707/200
5,724,514 3/1998 Arias ........... .. 709/235
partially in user space and partially in kernel space is de?ned
5,758,168 5/1998 Mealey et a1. 710/260 that provides explicit or transparent implementation of the
5,808,911 9/1998 Tucker et a1. 709/303 user space component and shared logging between user and
5,903,559 3/1999 Acharya et a1. ...................... .. 370/355 kernel space. A technique is de?ned for prefetching
responses to HTTP GET requests using earlier GET
OTHER PUBLICATIONS responses. Classifying of handler extensions according to
“Real—Time Local and Remote Mach IPC: Architecture and latency in deriving a response to a netWork request is
Design” Version 1.0 by Franco Travostino et al, Apr. 1994. de?ned. Tight integration With the ?le system cache is
Wallach et al., SIGCOMM, Aug., 1996, “ASHs: Applica available for sending non-paged responses from the ?le
tion—Speci?c Handlers for High—Performance Messaging”, system cache to the remote client. A complete caching
pp. 1 through 13. scheme is de?ned for protocols serving ?le contents to
Engler et al., “Exokernel: An Operating System Architecture remote clients.
for Application—Level Resource Management”, pp. 1
through 16, 1995. 29 Claims, 8 Drawing Sheets

440
\_ HTTP EXTENSION (USER SPACE)

USER 1
KERNEL \336

130
\2 AFpA AFPA‘S _ EXTENSION'S
DeriveCuchedResponse / DenvecqchedResponse
330 A 406 J
410 332

-/404
1 2O PORTABILHY EXTENSION ‘S
\ DeriveUncochedResponse
ARCHITECTURE TCP/1P
RECEIVE HANDLER 33:“ 420
212
V
#402 EXTENSIONS
1 10 DeriveRemoteResponse
\ NATIVE KERNEL
ENVIRONMENT 430
NA'ITVE EVENT
112

Q HTTP EXTENSION (KERNEL)


U.S. Patent Dec. 19,2000 Sheet 3 0f8 6,163,812
U.S. Patent Dec. 19,2000 Sheet 5 0f8 6,163,812

own
I
\

zocm Il /?n I,
NomEod
\
\
HEQ\m \od zo?w
I 0mm \
\
mam on?
I

mWS>smo [anr05 .
O
(OR3m0 10
6% 2E\\\QHEZH .QE
m
56AI8E12lO0%8:
9m2%52 8 M5EA8“O56wl0E2%
8mz:Ez o mAva Jamwmz\m VEsOmzEo ma
U.S. Patent Dec. 19,2000 Sheet 6 0f8 6,163,812

.OE
w
U.S. Patent Dec. 19,2000 Sheet 7 0f8 6,163,812

Q.5a838:
WwmIHE>.HzE_6tomQ<2;Z:oI QR

ME2QHEM35z40<:

m
oi

HE2g2EE<.5r_.9o<mgQ

on on“

2mPHE2HS@§o1.mk_<o50~w:am>E
0KE.9m3z.o2m-0w Hm52I“wzak0otmw<;é:o

Qm.53Mz02am<5:
own
6,163,812
1 2
ADAPTIVE FAST PATH ARCHITECTURE extension techniques is a softWare engineering perspective.
FOR COMMERCIAL OPERATING SYSTEMS Library operating systems provide a “take it or leave it”
AND INFORMATION SERVER extensible environment for applications of a certain type
APPLICATIONS While event/handler extensibility provides loW level “build
ing block” style extensibility for individual applications or
FIELD OF THE INVENTION extensions. An example combining the tWo can be found in
The present invention relates to operating systems and D. A. Wallach et al., Application-Speci?c Handlers for
more particularly to an operating system kernel environ High-Performance Message, Proceedings from the Sympo
ment. Speci?cally, a method and apparatus are described for sium of Communication Architectures and Protocols
extending operating system function. 10 (SIGCOMM ’96), 1996, Where ASHes (Application Speci?c
Handlers) augment the library operating system approach.
BACKGROUND OF THE INVENTION
SUMMARY OF THE INVENTION
Extensibility is found in most modern operating systems,
but it is often limited to supporting neW devices, not An extension system executes code in response to oper
15 ating system events and services in a kernel environment.
specialiZing the usage of system resources. One technique
exposes operating system policies to applications via event/ Several handler routines are each executable by respective
handler extensions. This is described in B. N. Bershad et al., kernel events. Each handler exports a kernel event and
Extensibility, Safety and Performance in the Operating services to an extension architecture situated externally to
System, Proceedings of the Fifteenth Symposium on Oper the kernel environment.
ating System Principles, 1995. Asecond technique provides BRIEF DESCRIPTION OF THE DRAWINGS
adaptive in-kernel policies that manage resources based on
application usage patterns. This is described in C. Pu et al., FIG. 1 is a block diagram illustrating the relationship
Optimistic Incremental SpecialiZation: Streamlining a Com betWeen support modules and extension modules in accor
mercial Operating System, Proceedings of the Fifteenth dance With an exemplary embodiment of the present inven
25
Symposium on Operating System Principles, 1995; and P. tion.
Cao et al., Implementation and Performance of Application FIG. 2 is a block diagram Which illustrates netWork server
Controlled File Caching, Proceedings of the First Sympo portability architecture in accordance With an exemplary
sium on Operating Systems Design and Implementation embodiment of the present invention.
(OSDI ’94), pp. 165—177, November 1994. A third tech FIG. 3 is a block diagram Which illustrates netWork server
nique replaces underlying components of the operating extension architecture in accordance With an exemplary
system With neW components referred to as library operating embodiment of the present invention.
systems. This is described in D. R. Engler et al., Exokernel:
FIG. 4 is a block diagram Which illustrates control ?oW of
An Operating System Architecture for Application-Level
a remote client request in accordance With an exemplary
Resource Management, Proceedings of the Fifteenth Sym
35 embodiment of the present invention.
posium on Operating System Principles, 1995.
FIG. 5 is a block diagram Which illustrates connection
AIX is an example of an operating system that alloWs
limited extensibility in the form of neW device drivers, management for TCP/IP.
system calls, and autonomous user-installed kernel pro FIG. 6 is a block diagram Which illustrates request man
cesses. Another example, Microsoft’s WindoWs NT, pro agement for TCP.
vides the same extension abilities Without the ability to add FIG. 7 is a graphical diagram Which illustrates control
neW system calls. paths relating to cache management for ?le oriented netWork
servers.
Event/handler extensions refer to the extensibility
achieved by alloWing applications to install handler code in FIG. 8 is a block diagram Which illustrates control How
the kernel. This handler code is triggered by normal oper 45
scenarios for requesting routing in accordance With an
ating system events such as processing incoming or outgo exemplary embodiment of the present invention.
ing netWork packets. The ability to install event-triggered
DETAILED DESCRIPTION OF THE
handler code in the kernel reduces overhead due to kernel/
INVENTION
user interactions such as system calls, memory space copies
and maintenance of kernel-supplied user abstractions such The present invention relates to a kernel run-time envi
as sockets. This is especially useful When several operating ronment for authoring high performance netWork servers.
system components are used together. One aspect of the present invention relates to an Adaptive
Adaptive operating system mechanisms are useful Where Fast Path Architecture (AFPA). AFPA makes it possible for
the performance of an application might be improved by multiple concurrent servers to extend the operating system
specialiZing management of the applications resources 55 for high performance. AFPA is designed to alloW such
dynamically and transparently. An example of this is spe extensions to be portable relative to kernel environments
cialiZing the open( )/read( )/Write( )/close( ) application Where AFPA has been implemented.
programmer interfaces (APIs) on the ?y such that the Before proceeding, it is useful to de?ne several terms:
implementations are optimiZed differently for short-lived Event—This term refers to the execution of discrete code
random accesses than for long lived sequential accesses. in response to control How in a softWare system. The
Another example of adaptive extensibility is found in ?le code is typically executed as a result of a hardWare
buffer management. Instead of using the same replacement state, such as a hardWare interrupt. Examples of net
policies for all ?le buffers, the ?le buffers for a particular Work kernel events are TCP packet arrival, TCP con
application can be specialiZed for large, small, sequential, or nection establishment, and TCP disconnection. Net
random access patterns. 65 Work events are initiated by hardWare interrupt to the
Library operating systems are functionally equivalent to CPU by a netWork adapter. These hardWare interrupts
event/handler extensibility. The difference betWeen the two result in TCP protocol events.
6,163,812
3 4
Handler—A pointer to discrete code. Handlers are some according to the operating system operating Within native
times referred to as callback functions. That is, func kernel environment 110. Thus, portability handlers 122 are
tions that are speci?ed and called at a later time When operating system dependent.
desirable. Operating systems often require program As previously stated, the arroWs in FIG. 1 represent the
mers to implement callback functions. For example, a How of event occurrence and handler invocation. Thus,
signal handler executes functions Written by a program native kernel environment 110 exposes native events 112 to
mer When a signal occurs. In short, a handler is just a the portability handlers 122 Within netWork portability archi
piece of code Written a priori by a programmer and tecture 120. Arc 118 illustrates the event occurrence and
executed later on When a speci?c event occurs on handler invocation relationship betWeen native kernel events
behalf of the programmer. For another example, an 112 and portability handlers 122.
event may be a netWork packet arrival. A handler for As a result of event occurrence and handler invocation
that event is a function. The handler function Would be 118, native events 112 Which Were initially exposed Within
code that is executed When the event occurs. The native kernel environment 110 are noW exposed by netWork
handler code Would perform Whatever steps it is server portability architecture 120.
instructed to perform With that packet. 15 NetWork server extension architecture 130 implements
Expose—An event is exposed by de?ning interfaces handlers for events Which are exposed by netWork server
desirable for another softWare system to implement a portability architecture 120. NetWork server extension archi
handler for the event. Exposing the event means pro tecture 130 includes runtime support Which is typically
viding the interface as desirable for another softWare found in netWork server extensions. Run time support may
system to register the handler to be executed When the be, for example, a library of routines providing services.
event occurs. Thus, run time support 134 is included in netWork server
TWo softWare systems participate in de?ning events and extension architecture 130. The library of routines Which are
handlers: 1) The system exposing the event and 2) the included in run time support unit 134 are described beloW.
system implementing the handler to be executed When the NetWork server extension architecture 130 includes han
event occurs. 25 dlers 132. Portability handlers 122 execute AFPA handlers
The present invention relates to an event/handler exten 132. The execution of AFPA handlers 132 by portability
sion system Which is specialiZed for TCP netWork server handlers 122 is illustrated by arc 128.
applications. There are three architectures Which are desir NetWork server extension importation mechanism 140
ably used With this system: A netWork server portability implements handlers for events exposed by netWork server
architecture (Which provides a specialiZed portability extension architecture 130. NetWork server extension impor
architecture), a netWork server extension architecture (Which tation mechanism 140 provides interfaces for registering
provides a specialiZed extension architecture), and a net (creating neW event instances and con?guring event
Work server extension importation mechanism (Which pro handling) and deregistering for extensions associated With
vides a specialiZed extension importation mechanism). A netWork server extension architecture 130.
netWork server extension is also implemented by a program 35 Simply put, netWork server extension importation mecha
mer to directly interface With programmer Written code. nism 140 behaves like a table mapping the invocation of
The relationship betWeen the above described support extension handlers 152 from AFPA handlers 132. Thus, one
modules and the above described extension module are of the extension handlers 152 is executed in response to the
shoWn in FIG. 1. As depicted, arroWs represent a How of execution of one of the AF PA handlers 132. In addition, after
event occurrence and handler invocation. an extension registers With NetWork Server Extension Archi
As shoWn in FIG. 1, the native kernel environment of the tecture 130 (through the interfaces provided by NetWork
operating system is identi?ed as item 110. Events Which Server Extension Importation Mechanism 140), NetWork
occur in the native kernel environment 110 are identi?ed as Sever Extension Architecture 130 separates the resource
native events 112. Native kernel environment 110 includes management for that extension from the resource manage
address space, execution context, security attributes, etc. of 45 ment for other extensions. By separating resource
an operating system Which are typically hidden from the management, handler invocation on behalf of one server
user level applications. The kernel environment implements extension does not use the resources of another server
services exposed to applications in user space. The kernel extension. Separation is made possible by registration. At
environment provides services for Writing kernel extensions. registration time, a netWork server extension speci?es cer
The term native is used as a modi?er to “kernel environ tain events by specifying the port those events are associated
ment” to refer to a particular kernel environment. AIX, With.
OS/2, and NT all have distinct native kernel environments. A port is a number. Remote clients send packets to a
They are distinct because they each have a speci?c set of server. Remote clients specify the port number in the packets
application program interfaces (API) for Writing subsystems they send to a server. Aserver application “listens” for these
(such as netWork adapter drivers, video drivers, or kernel 55 packets. When a packet arrives With the port number the
extensions). server is listening for, the server establishes a connection
NT provides hundreds of APIs Within its native kernel With the remote client. Sockets are a knoWn means for a
environment. AIX and OS/2 each have different APIs. server application to use TCP/IP. One socket function is
Furthermore, the environments among these operating sys called “listen”. One of the parameters for “listen” is the port
tems differ in hoW the ?le system and TCP/IP subsystems number. When a server calls listen( ) With a port number, for
interact. example port 80, the server is listening for packets associ
As stated above, native events (e.g., TCP kernel events) ated With port 80 from a remote client. The use of ports is
112 are exposed Within native kernel environment 110. A described, for example, in Stevens, W. R., Unix NetWork
plurality of portability handlers 122 (or functions) are called Programming, Prentice Hall, 1990.
When native events 112 occur. These portability handlers 65 In an exemplary embodiment of the present invention, an
122 are implemented Within the netWork server portability HTTP server registers its handler functions and a TCP/IP
architecture 120. Portability handlers 122 are de?ned port (such as port 80) similarly to the manner in Which
6,163,812
5 6
“listen” speci?es a port to listen to. In so doing, events server, network server extension 150 manages (for example)
occurring on port 80 cause the invocation of the handler HTTP GET requests while deferring HTTP CGI requests to
functions speci?ed by the HTTP server and not the handler network server extension 160. The programmer designs the
functions associated with another server extension. server such that performance critical pieces reside in net
Furthermore, resource usage is dictated by the port that work server extension 150 and less critical pieces requiring
triggered the event. user space process protection reside in network server
Resource management and allocation for events occurring extension 160.
within Network Server Extension Architecture 130 on port As an example, a programmer writing an HTTP server
80 use a context pointer as a reference to the resources kernel component servicing GET requests may implement
associated with port 80 and the server extension which 10
the logic for handling such requests in extension handlers
152. Extension handlers are registered with the extension
registered port 80. Thus, resources such as memory and importation mechanism 140. Thus, extension handlers 152
connection end points are allocated and used separately on are mapped to AFPA Handlers 132 such that the extension
behalf of network server extension 150 by Network Server handlers are executed when the AFPA handlers are executed.
Extension Architecture 130. These same handler invocation Arcs 138 and 148 illustrate the mapping between AFPA
and extension registration capabilities are found in device 15 handlers and extension handlers.
driver frameworks on modern operating systems. One such In an exemplary embodiment of the present invention,
example is NT where a device driver framework referred to HTTP GET requests may be prefeteched while deriving
as the “port”/“miniport” model is used extensively. In this responses to a previous HTTP GET request. This is accom
framework, a “port” driver (not to be confused with a plished as follows. A remote client sends an HTTP GET
network port) allows multiple device drivers called request to the server implemented by Network Server Exten
“miniport” drivers to register themselves. Each time a new sion 150, 160. When a request is received that is not in the
registration takes place, a new device context is created. On response cache 136, GetUncachedResponse 152 is called by
NT, the device context associates resource usage and man AFPA handlers 132 since GetCachedResponse failed in
agement with that particular miniport driver. NT provides ?nding the response in the response cache 136. Thus, a
two port architectures, the speci?cation of which can be 25 cache miss has occurred. For the HTTP GET protocol,
adapted to the needs of this invention. The ?rst is the SCSI responses consist of ?les (e.g. image ?les) that must be read
port architecture. The second is the NDIS port architecture. out of the ?le system and placed in the cache 136. It is
Network server extension 150 and network server exten conceivable to search for embedded requests in a given
sion 160 together make a server (e.g., a web server). response each time a response is sent to the remote client.
Network server extension 150 is a kernel component. Net This, however, may incur a great deal of overhead in
work server extension 160 is a user component. The kernel searching a response byte by byte each time a response is
component could be (for example) the “GET” engine of the sent out. This overhead may be avoided by searching for
HTTP protocol. The user component could be “everything embedded requests when the response is ?rst loaded into the
else” that is part of the HTTP protocol. It is useful to separate
cache. Thus, on a cache miss, an HTML response ?le is read
the server extension between kernel and user pieces. This is 35 from the ?le system and searched for directives indicating an
referred to as a hybrid user/kernel server. The approach embedded requests once. These future requests are
allows the extension writer to pick and choose what pieces described in the HTML (hypertext markup language) stan
of the server need to be in user space and what pieces need dard as special tags specifying ?les such as image ?les. The
to be in the kernel. remote browser is expected to make these requests soon
Operation of network server extension 150 depends on the after receiving the initial response (which was just
type of extension (or server or protocol) that is being searched). In anticipation of these requests, references to the
implemented. Network server extension 150 handlers are requests are placed in the cache entry associated with the
called by network server extension architecture 130. Net original response. Thus, when any future browser requests
work server extension 150 (i.e. the server extension) gets this same response, some or all of the embedded requests are
requests from a remote web browser via these handlers. That 45 readily available in the cache. When this happens, some or
is, AFPA handlers 132 invoke the GetCachedResponse pro all of the future requests are loaded into the cache if they
vided by network server extension 150. GetCachedResponse have not been already. This technique may shorten the time
checks the contents of response cache 136 in AF PA run-time to deliver a web page with several embedded requests.
support 134 to determine if particular responses are cached. Network server portability architecture 120 is described
For cached responses, Network Server Extension 150 with reference to FIG. 2. Examples of portability handlers
returns the cached response and returns from the call to 122 and portability services 126 are given. The portability
GetCachedResponse. Based on a return code from handlers provided in the network server portability archi
GetCachedResponse, AFPA handlers 132 send the response tecture are the TCP connect handler 210, TCP receive
immediately to the remote client. If Network Server Exten handler 212, and the TCP disconnect handler 214. TCP
sion 150 is unable to ?nd responses in cache 136, several 55 connect handler 210 is executed when TCP connections are
possibilities exist and are described via a return code when established with a remote client. TCP receive handler 212 is
returning from GetCachedResponse. Three exemplary pos executed when TCP packets arrive from a remote client.
sibilities are described with reference to FIG. 8 as TCP disconnect handler 214 is executed when TCP discon
DeriveUncachedResponse, DeriveRemoteResponse, and nect requests arrive from a remote client. Windows NT
defer to user space. provides a means to execute such handlers. The Windows
A programmer writing a network server extension writes NT native kernel environment provides events 112 that map
a user space component network server extension 160 and a directly to the portability handlers 210, 212, 214. Windows
kernel component network server extension 150. Network NT de?nes these kernel events as transport driver interface
server extension 160 executes code requiring a user space (TDI). The portability handlers may be de?ned to work with
process context. Network server extension 150 executes 65 TDI.
code meant to execute in the context of AFPA handlers 132. Network server portability architecture 120 also provides
An example of this scenario is an HTTP server. In such a network services required by the other kernel software
6,163,812
7 8
modules. These services include the TCP service for sending derive ?le responses at a different interrupt level. Separating
data 220, the TCP service for initiating a TCP disconnect the call to DeriveCachedResponse from the call to Derive
222, ?le system services for modifying the cache and UncachedResponse permits the AFPA runtime to queue the
performing ?le I/O (224 and 226, respectively), multithread request such that threads in a different interrupt level can
ing services 228, and user space interface creation 230. Each execute DeriveUncachedResponse. These same interrupt
of these services are standard native kernel environment requirements can be found on UNIX operating systems as
services in most modem kernel environments. Network Well. This control How is illustrated in item 850 (discussed
server portability architecture 120 provides a common Wrap beloW). Once executed, DeriveUncachedResponse performs
per for these services such that NetWork Server Extension the necessary I/O to derive a response. For an HTTP server,
Architecture 130, NetWork Server Extension Importation 10 this means opening a ?le handle, updating the cache 136,
Mechanism 140, and NetWork Server Extension 150 can be and returning to the caller indicating that the neWly derived
implemented across platforms using the same code. Win response should be sent to the remote client. Thus, the
doWs NT contains hundreds of such services, only some of response is noW in cache 136 and may be serviced by
Which are useful for implementing netWork servers. The DeriveCachedResponse Without calling DerivingUncached
netWork server portability architecture 122 implements 15 response in the future.
Wrappers for portability services 126 since these are most If execute DeriveRemoteResponse 334, is speci?ed by
commonly needed for netWork servers. The netWork server DeriveCachedResponse, the extension’s DeriveRemoteR
portability architecture does not Wrap every native kernel esponse function is called and passed the request by an
environment interface. AFPA handler 132. DeriveRemoteResponse is called When
NetWork server extension architecture 130 is described responses are only possible using netWork communication.
With reference to FIG. 3. In an exemplary embodiment of the DeriveRemoteResponse is an example of an additional
present invention, netWork server extension architecture 130 callback that might be useful for implementing different
is referred to as the adaptive fast path architecture 130 or types of servers. An example of Where DeriveRemoteR
AFPA 130. AFPA 130 has tWo components: AFPA handlers esponse is useful is in the context of HTTP proxy servers. In
132 and AFPA run-time support 134. AFPA 130 provides 25 such servers, requests must be forWarded to another server
four handlers for executing server extension code. These and the response and forWarded back to the remote client.
four handlers are DeriveCachedResponse 330, DeriveUn DeriveRemoteResponse returns a response obtained from
cachedResponse 332, DeriveRemoteResponse 334, and netWork communication back to the caller (AFPA 132).
deferral 336 to the user space component of kernel extension If defer to user space 336 is speci?ed by
160. These four functions are a callback interface for author DeriveCachedResponse, the extensions user space compo
ing netWork servers as a kernel component 150. These nent is given the request for processing. This is accom
functions are noW described in the context of Writing (i.e. plished by producing the request to a standard FIFO queue
TCP) netWork server extensions 150/160. that is consumed by the user space component via custom
DeriveCachedResponse 330 is called by an AF PA handler kernel interfaces. These custom interfaces are conventional
132 via the handler mapping 142 and is passed via param 35 interfaces that request data from a shared queue in the kernel
eters the unprocessed data from the TCP stack. Thus, environment. On AIX, such interfaces are actual system
DeriveCachedResponse is called When a TCP packet arrives calls that are added to the operating system. On NT, such
from the remote host. The programmer implements Derive interfaces are IOCTL interfaces to an AFPA driver exten
CachedResponse to parse this packet and extract a request sion. Once the user space component has derived a response
from it. The parsed request is referred to as the request ID using Whatever means are convenient, the response is sent
and is used to look up a response in cache 136. For example, via custom kernel interfaces. These custom interfaces are
a request ID for an HTTP request Would be a ?le name. conventional interfaces that pass data to a kernel component
DeriveCachedResponse uses the request ID to look up the (in this case AFPA).
response (?le) in the cache 136. If the response is in the The above discussion describes four possible handler
cache, DeriveCachedResponse indicates this via a return 45 interfaces to a server extension implemented in 150. One of
code. The caller (an AFPA handler in 132) sends the ordinary skill Will recogniZe that there are interfaces that are
response to the remote client as instructed by the return code variations of these and input and output to these interfaces
from DeriveCachedResponse. In addition to parsing raW that differs from the discussion above. In many cases, native
request data and determining if a response is cached, Derive kernel events are augmented by application speci?c code.
CachedResponse uses the raW packet it is passed to deter One of ordinary skill Will also recogniZe that there are
mine if enough data has been sent to constitute a complete variations on the types of responses stored in the AFPA
request. cache.
If the response is not in cache 136, DeriveCached In an alternative embodiment of the present invention,
Response indicates one of three possible actions that are to cache 136 could contain much more than ?le handles or ?le
be taken by the caller (an AF PA handler in 132). Again, these 55 data. For example, HTTP responses consist of multiple types
actions are speci?ed via a return code from DeriveCached of header data and a ?le. It is conceivable that the cached
Response. Three possible actions are: 1) Execute Derive response is a combination of ?le and meta data such as this.
UncachedResponse 332, 2) Execute DeriveRemoteR AF PA 130 implements run-time support 134 for a number of
esponse 334, and 3) defer 336 to user space component 160. items. These items are connection management 310, request
If execute DeriveUncachedResponse 332, is speci?ed by management 312, cache management 314, thread manage
DeriveCachedResponse the extension’s DeriveUncached ment 316, and queue management 318. Connection man
Response function is called and passed the request. Derive agement is discussed With reference to FIG. 5. Request
CachedResponse does not call DeriveUncachedResponse management is discussed With reference to FIG. 6. Cache
directly because DeriveCachedResponse is typically in a management is discussed With reference to FIG. 7. Queue
restricted interrupt context. On WindoWs NT DeriveCache 65 management means providing an logical queue abstraction
dResponse executes at “dispatch” level. Because ?le I/O With interfaces and multiprocessor safe modi?cation. Unless
cannot occur at “dispatch” level on NT, it is necessary to otherWise speci?ed, queues in this invention are FIFO (?rst
6,163,812
10
in ?rst out) and are conventional. Thread management HTTP request and checks the AFPA cache for the desired
means creating and destroying threads. Thread management response. Three exemplary outcomes 332, 334, 336 are
also means consuming resources from queues Without race possible resulting in three exemplary extension handlers
conditions. Thread management may be implemented in 420, 430, and 436 being executed. This is described in more
accordance With conventional techniques. detail With reference to FIG. 8.
Connection, requests, and messages are noW de?ned. FIG. 5 is an illustration of connection management for
Connections are data structures de?ned by the native kernel TCP/IP. The remote client portion of the ?gure shoWs the
environment 110, but allocated and deallocated via AFPA actions occurring from the perspective of the remote client
run-time support 134. An example of a connection is a TCP computer. LikeWise, the server side shoWs the actions from
connection endpoint created on the server for tracking
the perspective of an exemplary embodiment of the present
connections made on a unique address/port pair.
invention.
A request has four components: connection, request data, AFPA 130 maintains a pool of unused connection data
request ID, and send data. A server handling multiple
requests over the same connection (such as FTP) generates structures in connection pool 550. As connections are
requests With the same connection. A server handling one established, their data structures are allocated from connec
request per connection (such as HTTP) has a unique con 15 tion pool 550. As disconnections occur, connection data
nection for each request. structures are returned to connection pool 550. This reduces
Request data is a linked list of non-paged memory frag the computation necessary to establish connections.
ments referred to as messages. The linked list data structure When the remote client issues a connect request 510, the
is used to alloW multiple messages to be associated With a server computer receives a TCP/IP SYN packet 520. This
single request. This is necessary for client/server protocols triggers a native kernel event that executes connect handler
Where the request data siZe exceeds the maximum transfer 210. Connect handler 210 executes connection management
unit (MTU) of the underlying netWork. code Within AFPA 134. This may include, for example,
NetWork server extension 150 provides a reproducible connection management code 540, 590, and 593. Upon
request ID for every request data in every request. The receiving the TCP/IP SYN packet, the server computer
request ID is used to associate responses in the response 25 allocates a neW connection 540 from the connection pool
cache With neWly arrived request data. For HTTP server 550. On WindoWs NT a connection is referred to as a
extensions, a request ID may be a ?le name. This request ID connection endpoint On UNIX operating systems, a con
is derived from the request sent by the remote client (such nection is typically referred to as a socket. In a preferred
as an HTTP GET request). In FIG. 8, a request ID is embodiment of the present invention listen and socket
produced to one of three queues depending on the action queues are excluded. When a SYN packet arrives it is
speci?ed by DeriveCachedResponse on a cache miss. In all immediately allocated a connection 540 instead of being
three cases (850, 890, 894), the request ID is produced to a queued. Completing a TCP connection may be indicated by
queue that is consumed by one or more threads executing the arrival of a TCP SYN packet. This may be accomplished
DeriveUncachedResponse, DeriveRemoteResponse, or in by sending a TCP SYN\ACK packet at the same time an
the user space component. 35 interrupt or received for the TCP SYN packet.
The send data is the non-cacheable data associated With a Aconnection is destroyed one of tWo Ways: 1) The remote
request that is desirably included in the corresponding client closes the connection FIN packet 570 and disconnect
response. Since such data cannot be cached (since it changes handler 214. 2) The server initiates a disconnect 593 and
for each response), it is necessary to associate this data With 220. In either case, the connection is returned to the con
each request. For example, HTTP GET responses require nection pool 590 or 593 for use in establishing a neW
data speci?c to the original HTTP request and the contents connection 540.
of a particular ?le. For example, a date stamp is required for FIG. 6 illustrates an example of request management for
such responses. The later is cacheable While the former is TCP. The remote client portion of the ?gure shoWs the
not. To preserve the non-cacheable data that must be sent actions occurring from the perspective of the remote client.
With the response, individual requests contain this“send 45 LikeWise, the server side shoWs the actions from the per
data”. spective of the invention embodiment (netWork server
FIG. 4 illustrates an example the control How of a remote extension system).
client request in the AFPA 130 for an HTTP extension 150. When a receive event occurs 610, a data packet 620
Control ?oW begins in native kernel environment 110 When arrives. Receive handler 212 is executed Which in turn
a native kernel event 112 occurs indicating the arrival of executes DeriveCachedResponse 332 of AFPA 130. Derive
TCP data. The portability architecture 120 provides a TCP/ CachedResponse 332 parses the request, determining if it is
IP receive handler 212 Which is executed in response to the complete. To accommodate the neWly arrived data, AFPA
native kernel event. As a result of the portability architecture 130 removes a request data structure 640 from a pool of
executing its TCP/IP receive handler, DeriveCached preallocated request data structures 660. AFPA 130 also
Response handler 330 in AFPA 130 is executed. 55 preallocates message fragments from the message pool 670.
A priori to the above events, HTTP kernel component For each neW data packet, a neW message fragment is
extension 150 has registered each of its handlers With AFPA allocated from the message fragment pool and placed on the
130. Registration With AFPA 130 means AFPA 130 is message fragment list 680 Which is part of the request data
con?gured to execute handlers Within kernel extension com structure 640. DeriveCachedResponse 332 then copies the
ponent 150 When the corresponding handlers are executed neWly arrived data into the message fragment. When enough
by AFPA 130. For example, DeriveCachedResponse 330 data has arrived on the connection to constitute a full
executes DeriveCachedResponse 410 Within kernel compo request, DeriveCachedResponse 332 parses the request and
nent extension 150. Registration With AFPA 130 also means queries the cache for the response.
a port has been speci?ed by the kernel extension component An exemplary embodiment of the present invention
150. 65 addresses netWork servers relying on the ?le system for
After data from the remote client arrives, DeriveCache deriving responses. Such servers typically require integra
dResponse 410 Within extension 150 parses the incoming tion With the native kernel environment’s cache. The dis
6,163,812
11 12
tinction between such servers is made When the network data in the ?le system cache. Pinning ?le data is not
server extension registers With network extension importa successful because the ?le Was only partially cached in the
tion mechanisms. ?le system cache. The caller (AFPA 132) performs neces
DeriveCachedResponse and DeriveUncachedResponse in sary ?le I/O to load contents of ?le into ?le system cache.
FIG. 4 query and update cache 136 in performing the above Another attempt is made to pin the ?le data and is successful.
described actions. For ?le oriented servers these queries and Another response entry’s pinned page frame list is unpinned
updates result in additional queries and updates to the ?le and freed. This response entry is updated With the neWly
system cache. FIG. 7 shoWs the control How for obtaining pinned page frame list.
?le system data to derive a response to a client request. DeriveCachedResponse computes the request ID and que
FIG. 7 demonstrates 5 different control paths in deriving ries the response cache for the response cache entry. The
a response using data from the ?le system and a response response cache entry is found. DeriveCachedResponse
cache entry. This example assumes ?le I/O must be handled returns the response cache entry to the caller (AFPA 132).
on a different thread than the handler for the netWork receive The caller (AF PA 132) does not ?nd a pinned page frame list
event because such a handler executes in a restrictive in the response cache entry. The caller (AFPA 132) sWitches
interrupt level. This example assumes (for clarity not 15 to a less restrictive interrupt level Where ?le U0 is possible.
necessity) that every request ID has a response cache entry The caller (AFPA 132) does not ?nd the ?le handle in the
in the AFPA response cache 136. Aresponse cache entry can response cache entry. The caller (AFPA 132) obtains a ?le
be created and paths 730, 740, 750, 770 executed as desir handle by opening the ?le. Another response cache entry’s
able. This example also assumes that the ?le system cache ?le handle is invalidated. This response cache entry is
is backed by pageable memory (as on WindoWs NT or AIX updated With the neW ?le handle. The caller (AFPA 132)
operating systems). Lastly, the example assumes that ?le attempts to pin the ?le’s data in the ?le system cache.
data can be pinned and unpinned in one action. For large Pinning ?le data is successful. The caller (AFPA 132)
?les this may not be possible. obtains the page frame list for ?le data in the ?le system
Handling ?le I/ O on a different thread is desirable in three cache. Another response entry’s pinned page frame list is
cases: 1) The ?le has not been read yet (and is not in the ?le 25 unpinned and freed. This response entry is updated With the
system cache). 2) The ?le has been read, but is only partially pinned page frame list.
cached. 3) The ?le has been read and is in the cache, but DeriveCachedResponse computes the request ID and que
requires a page fault before the data is in paged in. The 5 ries the response cache for the response cache entry. The
control paths are noW described: response cache entry is found. DeriveCachedResponse
The DeriveCachedResponse routine computes the request returns the response cache entry to the caller (AFPA 132).
ID and queries the response cache. The response cache entry The caller (AF PA 132) does not ?nd a pinned page frame list
is found. DeriveCachedResponse returns the response cache in the response cache entry. The caller (AFPA 132) sWitches
entry to the caller (AFPA 132). The caller (AFPA 132) ?nds to a less restrictive interrupt level Where ?le U0 is possible.
the pinned page frame list for the response and sends it The caller (AFPA 132) does not ?nd the ?le handle in the
immediately using the portability services 126. This is the 35 response cache entry. The caller (AFPA 132) obtains a ?le
best case and yields maximum performance since the handle by opening the ?le. Another response cache entry’s
response is derived immediately upon receiving the request ?le handle is invalidated. This response cache entry is
using non-pageable memory on the same interrupt the updated With the neW ?le handle. The caller (AFPA 132)
receive event occurred on. attempts to pin the ?le’s data in the ?le system cache.
DeriveCachedResponse computes the request ID and que Pinning ?le data is not successful because the ?le Was only
ries response cache 136 for the response cache entry. The partially cached in the ?le system cache. The caller (AFPA
response cache entry is found. DeriveCachedResponse 132) performs necessary ?le I/O to load contents of ?le into
returns the response cache entry to the caller (AFPA 132). ?le system cache. Another attempt is made to pin the ?le
The caller (AF PA 132) does not ?nd a pinned page frame list data and is successful. Another response entry’s pinned page
in the response cache entry. The caller (AFPA 132) sWitches 45 frame list is unpinned and freed. This response entry is
to a less restrictive interrupt level Where ?le U0 is possible. updated With the neWly pinned page frame list.
The caller (AFPA 132) ?nds the ?le handle in the response FIG. 8 illustrates the possible control How scenarios for
cache entry. The caller (AFPA 132) attempts to pin the ?le’s request routing using the AF PA netWork extension architec
data in the ?le system cache. Pinning ?le data is successful. ture. AFPA provides four handlers for executing server
The caller (AFPA 132) obtains the page frame list for ?le extension code. These four handlers are DeriveCached
data in the ?le system cache. Another response entry’s Response 330, DeriveUncachedResponse 332, DeriveR
pinned page frame list is unpinned and freed. This response emoteResponse 334, and deferral to the user space compo
entry is updated With the pinned page frame list. nent 160 of kernel extension.
In accordance With the description above, AFPA caches FIG. 8 shoWs possible routes a request can take in
references to ?le data and lets the native ?le system cache 55 executing AFPA handlers When cache misses and cache hits
perform its prescribed job of caching ?le contents. By occur. Receive handler 212 triggers DeriveCachedResponse
avoiding the double buffering of tWo caches, greater ef? to be executed repeatedly until the entire request has been
ciency is realiZed. received. This is illustrated in FIG. 6. If DeriveCached
DeriveCachedResponse computes the request ID and que Response determines the response is in the response cache,
ries the response cache for the response cache entry. The this is considered a cache hit 840 and the portability layer’s
response cache entry is found. DeriveCachedResponse TCP/IP send service 220 is used to send the response to the
returns the response cache entry to the caller (AFPA 132). remote client directly from the cache.
The caller (AF PA 132) does not ?nd a pinned page frame list DeriveCachedResponse executes at the same restrictive
in the response cache entry. The caller (AFPA 132) sWitches interrupt level that TCP processing for that request Was
to a less restrictive interrupt level Where ?le U0 is possible. 65 executed on. When the response is in cache and stored in
The caller (AFPA 132) ?nds the ?le handle in the response non-pageable (i.e. pinned) memory, AFPA achieves optimal
cache entry. The caller (AFPA 132) attempts to pin the ?le’s performance.
6,163,812
13 14
However, any responses requiring ?le I/O must be 9. The system of claim 1, Wherein said one or more
deferred to later processing in the context chosen by AFPA domain speci?c applications are implemented partially in
for DeriveUncachedResponse 880. This context might be a user space and partially in said kernel environment as said
kernel thread running at a less restructure interrupt level. supplied handler routines.
Refer to FIG. 7 to determine Whether the cache hit handler 10. The system of claim 9 Wherein said domain speci?c
can complete Without ?le I/O. Any control How in FIG. 7 application is a netWork server application.
requiring the execution of item 770 Will require ?le I/O. In 11. The system of claim 10 Wherein the user space
these cases, a less restrictive interrupt level is desirable. application component obtains TCP/IP delivered requests
When a response cache entry cannot be found, the AFPA and TCP/IP delivered responses using standard TCP/IP
caller 134 is unable to deliver a response. In these cases, the 10 socket interfaces, said user space application is not modi?ed
cache query is a miss (850 or 890 in FIG. 8) and the request to operate in conjunction With class speci?c architecture
is queued to one of the remaining three AFPA handlers 860, component.
891. 12. The system in claim 10 Wherein said extension
The DeriveUncachedResponse miss case 880 attempts to architecture is a netWork server extension architecture and
retrieve the ?le via ?le U0. The DeriveRemoteResponse 892 15 Wherein the user space application component receives
miss case is given control When the response must be TCP/IP delivered client requests explicitly using a user
derived via additional netWork communication. request queue and sends server responses using response
While preferred embodiments of the invention have been services (add to ?g/spec), said user request queue couples
shoWn and described herein, it Will be understood that such the user space application component and the netWork server
embodiments are provided by Way of example only. Numer extension architecture, said user request queue comprising:
ous variations, changes, and substitutions Will occur to those
skilled in the art Without departing from the spirit of the a) a producer/consumer queue Wherein a producer is the
netWork server extension architecture and a consumer
invention. Accordingly, it is intended that the appended
is a user space application component, said netWork
claims cover all such variations as fall Within the spirit and
server extension architecture adapted to receive request
scope of the invention. 25 data fragments and produce said data fragments as a
What is claimed:
single request to said producer/consumer queue; and
1. In a operating system kernel environment, an event/
handler extension system for executing application code for b) request services for the user space application compo
a speci?c class of applications in response to operating nent to consume said client requests from the request
system kernel events and services, said system comprising: queue.
a) an extension importation mechanism for specifying 13. The system of claim 1 Wherein said kernel events are
handler routines on behalf of one or more classes of netWork kernel events.
applications, Wherein said handler routines execute in a 14. The system of claim 13 Wherein said netWork kernel
events are TCP/IP netWork kernel events comprising one of:
kernel environment, in response to said kernel events;
b) one or more class speci?c extension architectures, 35 a) connection establishment (TCP/IP SYN packet arrival);
coupled to said importation mechanism, for mapping b) disconnection (TCP/IP FIN packet arrival);
supplied handler routines to said kernel events and c) TCP/IP data packet arrival.
implementing class speci?c functionality necessary in 15. The system in claim 13 Wherein said netWork kernel
executing said handler routines in response to said events are UDP/IP kernel events.
discrete operating system kernel events; and 16. The system of claim 13, further comprising: one or
c) one or more operating system dependent portability more extension architecture handlers, Wherein said supplied
architectures, coupled to said kernel environment, for handler routines are indirectly coupled to said netWork
exporting said kernel events and said services to said kernel events via said one or more extension architecture
one or more extension architectures, Wherein said one handlers Which execute said supplied handler routines; said
or more portability architectures separate each exten 45 one or more extension architecture handlers are executed
sion architecture from said kernel environment. based on method to derive response; said one or more
2. The system of claim 1 Wherein a class speci?c appli extension architecture handlers including one of: a Derive
cation code is a netWork server kernel extension. CachedResponse handler Wherein a netWork server response
3. The system of claim 2 Wherein said netWork server is derivable from cached non-paged memory; a DeriveUn
kernel extension derives responses from non-pageable cachedResponse handler Wherein a netWork server response
memory on a same interrupt or thread of execution as the is derivable from persistent media local to a computer
processing required to receive the original request. running a netWork server; a DeriveRemoteResponse handler
4. The system of claim 2 Wherein said netWork server Wherein a netWork server response is derivable from addi
kernel extension uses the TCP/IP transport protocol. tional netWork communication; and a defer to user space
5. The system of claim 4 Where completing a TCP 55 kernel extension component handler Wherein a netWork
connection as indicated by the arrival of a TCP SYN packet server response is derivable from a user space class speci?c
is accomplished by sending a TCP SYN/ACK packet on a application component.
same interrupt or thread of execution as the processing 17. The system of claim 14 Wherein said extension
required to receive the said TCP SYN packet. architecture is a netWork extension architecture Which reuses
6. The system of claim 4 Wherein said netWork server TCP/IP connection endpoints after these connection end
extension implements the HTTP protocol. points are disconnected, Wherein said connection endpoints
7. The system of claim 6 Where the GET portion of the are data structures associated With individual TCP/IP con
HTTP protocol is implemented as said handler routines nections in a TCP/IP implementation native to an operating
executed in response to said kernel events. system.
8. The system of claim 7 Wherein future HTTP GET 65 18. The system of claim 1 Wherein the extension archi
responses are prefetched While deriving responses to a tecture is a netWork server extension architecture, said
previous HTTP GET requests. netWork server extension architecture comprising one or
6,163,812
15 16
more of: a connection endpoint management means for 25. The system in claim 23 Where access to ?le data in ?le
allocating and garbage collecting data structures associated system cache is facilitated using cached ?le handles and
With a netWork transport layer connection endpoints; a kernel interfaces to access cached pageable ?le data asso
request management logic means for managing request ciated With said ?le handles Without copying ?le data.
fragments on behalf of said supplied handler routines; a 26. The system in claim 23 Where access to ?le data in ?le
cache management logic for deriving responses from non
paged memory; and a thread management logic means for
system cache is facilitated using cached ?le handles and
deriving responses on high latency requests. kernel interfaces pin and unpin cached ?le system memory.
19. The system of claim 18 Where said netWork server 27. The system in claim 1 Wherein the portability archi
extension architecture derives responses based on cached 10 tecture is a netWork server portability architecture compris
information. ing generic interfaces to the native operating system envi
20. The system of claim 19 Where said cached information ronment for one or more of:
is derived by prefetching future responses based on data
a) exposing netWork kernel events;
from previous requests.
21. The system in claim 19 Wherein said netWork server 15 b) accessing the ?le system cache;
extension architecture derives responses based on ?le data. c) managing operating system ?le handles; and
22. The system in claim 21 Where ?le data is cached in
kernel memory separate from the ?le system cache.
d) managing operating system threads.
23. The system in claim 21 Where the ?le data is obtained 28. The system in claim 27 Wherein said netWork kernel
from the ?le system cache using ?le system interfaces Within events are TCP/IP netWork kernel events.
the kernel environment. 29. The system in claim 27 Wherein said netWork kernel
24. The system in claim 23 Where access to ?le data in ?le events are UDP/IP netWork kernel events.
system cache is facilitated using cached ?le handles to copy
data from cache to distinct kernel memory.

You might also like