Extensible Storage Engine
ESE is for use in applications that require fast and/or light structured data
storage, where raw file access or the registry does not support the application's
indexing or data size requirements.
ESE scales across a wide range of database sizes: it is used by applications that
never store more than 1 megabyte of data, commonly by applications with databases
over 50 gigabytes, and in extreme cases by applications with databases in excess
of 1 terabyte.
This documentation is intended for developers who are familiar with C and C++,
and basic database concepts such as tables, columns, indexes, recovery, and
transactions. The only access method for ESE is the C API that is described in this
documentation.
ESE provides a user-mode storage engine that manages data inside of flat, binary
files that are accessible through the Windows APIs. ESE is accessed through a
DLL that is loaded directly into the application's process; no remote access
methods are required of or provided by the database engine itself. Though ESE
has no remote or inter-process access method, the data files it uses can be
provided remotely by using server message block (SMB) through the Windows
APIs, but this is not recommended.
Note Windows XP 64-Bit Edition is the same as Windows Server 2003 for the
purpose of determining the ESE feature set that is supported.
Notes
ESE was formerly known as Joint Engine Technology (JET) Blue, and so
frequently the term "JET Blue" or "JET" is used interchangeably with the term
ESE outside this documentation. However, there are in fact two completely
separate implementations of the JET API, called JET Blue and JET Red. The term
"JET" is frequently also used to refer to JET Red, which is the database engine
that is used with Microsoft Office Access. The two JET implementations are
completely different, are separately maintained, have vastly different feature
sets, and are not interchangeable. Within the ESE documentation, "JET" refers to
ESE or to the JET API as ESE implements it. Any reference to JET Red is
always explicitly labeled "JET Red".
Backup and restore: An application can make consistent copies of the data
state while it is on-line and actively modifying data state.
Cursor Navigation: The application can navigate with the cursor to access
data either sequentially or by using indices.
Database: A collection of tables that are backed up and restored as a single
unit.
Logging and crash recovery: The ESE API ensures that application-
defined data consistency is honored even in the event of a system crash.
Tables: The fundamental structure of the ESE database that is used to store
data.
Transaction: The ESE database engine provides atomic, consistent,
isolated, and durable (ACID) transactions that allow applications to retrieve
data only from reliable data states and that maintain data consistency in the
event of an unexpected process termination or system shutdown.
Scalable: The application can create databases as large as 100 GB or as
small as 1 MB.
Database Overview
The ESE database is an indexed sequential access method (ISAM) for storing and
retrieving data. An ESE database is stored in a single file and consists of one or
more user-defined tables. Data is organized in records in the table with one or
more user-defined columns. Indexes that are created provide different
organization for the entire set or a subset of records in the table. Using the ESE
API, applications can create cursors that navigate records in the database in
different sequential orders. The elements of the table are defined below:
Column: The column is a field in the table that stores a specific type of
information. Columns can be fixed or variable length, depending on the
data type stored in them. Some columns, such as tagged columns, take no
space when NULL or set to the default value, and can contain multiple
values.
Records: A record is a collection of column values that have a unique
identity as defined by the primary key.
Index: The index is a collection of key columns that define a stored
ordering of records in the table. The clustered, or primary, index defines
the order in which the records are stored within the table. Multiple indices
can be defined in order to specify different orderings of traversal through
records in the table. An index may also limit the set of records visible
based on simple criteria such as the presence or absence of a particular
key column value in the record.
Cursor: The cursor indicates the current record in the table and navigates
to records in the table using the current index. The cursor also contains
information on the state of the currently prepared update.
Columns and indices may be added to or removed from the table at any time.
Although multiple indices may be defined, the data in the table is physically
stored and logically clustered according to the primary index definition in a B+
tree. Each secondary index is stored in a separate B+ tree that contains only
logical pointers to the actual data that is stored in the primary table. If no index is
defined, the records in the table are stored in a B+ tree in the order of insertion
and are referred to as the sequential index.
The diagram here is an example of how the data for the table is stored in a B+
tree according to the primary index. The primary index is for Name and ID, and
a secondary index is created for the employee's office number. The entries for the
secondary index are stored in a separate B+ tree that contains only pointers to the
records stored in the primary table. For example, the office number 12348 in the
secondary table is related to record 3 in the primary table. Record 3 contains the
column values for the employee in office 12348. For more information, see the
Indexing in the Table topic.
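The relationship between the two trees can be sketched in plain C. This is only an illustration, not ESE code: sorted arrays stand in for the B+ trees, and the employee names, IDs, and every office number other than 12348 are made up for the example. As in the diagram, the secondary entry carries a record number; ESE itself points to the record by its primary key.

```c
#include <stdlib.h>

/* A record in the primary table. The array below is kept in primary-index
   order (Name, then ID). All sample values are hypothetical. */
struct employee {
    const char *name;
    int id;
    int office;
};

/* A secondary-index entry: the indexed key plus a logical pointer
   (here, a 1-based record number) back into the primary table. */
struct office_entry {
    int office;
    int record;
};

static const struct employee primary_table[] = {
    { "Adams",  17, 12350 },   /* record 1 */
    { "Baker",  42, 12345 },   /* record 2 */
    { "Chen",    8, 12348 },   /* record 3 */
    { "Dvorak", 23, 12346 },   /* record 4 */
};

/* Secondary index over the office number, sorted by office number. */
static const struct office_entry office_index[] = {
    { 12345, 2 }, { 12346, 4 }, { 12348, 3 }, { 12350, 1 },
};

static int compare_office(const void *key, const void *elem)
{
    int office = *(const int *)key;
    const struct office_entry *entry = elem;
    return (office > entry->office) - (office < entry->office);
}

/* Look up an employee by office number: search the secondary index,
   then follow its pointer into the primary table. Returns NULL when
   no employee has that office. */
const struct employee *find_by_office(int office)
{
    const struct office_entry *hit = bsearch(
        &office, office_index,
        sizeof office_index / sizeof office_index[0],
        sizeof office_index[0], compare_office);
    return hit ? &primary_table[hit->record - 1] : NULL;
}
```

Calling find_by_office(12348) searches only the small secondary array and then reads record 3 from the primary array, which mirrors the two-step lookup described above.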
ESE Handles
ESE handles are used to create sessions and access databases. They are
maintained in a hierarchy, which means that the output from one level is used to
access resources at the next level.
The diagram here shows the hierarchy of handles and the corresponding
functions that create the handles. Also indicated along with the functions are the
handles that are used in the call to create the new handle.
As shown in the diagram, the instance is the root handle and the level at which
the database is recovered in the event of an unexpected process termination or
system shutdown. The instance handle, JET_INSTANCE, is created by
JetCreateInstance and JetCreateInstance2. The next level, the session level, is the
transaction context of the database engine and the level under which all database
operations are performed. The session handle, JET_SESID, is created in the
context of the instance in the call to JetBeginSession. The session ID is used in all
subsequent calls to access tables and databases. If the instance is being used by
more than one thread at a time then each thread must use its own session handle.
The database handle is created using the session handle and primarily used to
manage the schema of the database, but it can also be used to manage tables
inside the database. Database handles can only be used in the session under
which they are created. The handle to the database is created in the call to
JetCreateDatabase or JetOpenDatabase. Tables are associated with the database
ID under which they are created.
Defragmentation/compaction
Columns
Indexing in the Table
Creating Databases
Transactions
ESE Errors
ESE Files
Your program source files should include the esent.h header file to access
function prototypes and structure definitions for the Extensible Storage Engine
API. Developers can use the esent.lib library file to build applications that use the
Extensible Storage Engine API. At runtime, applications link to the esent.dll.
Columns
A table can be created either with an initial set of columns by calling
JetCreateTableColumnIndex or without an initial set of columns by calling
JetCreateTable. Tables in ESE can contain up to 127 fixed-length columns, 128
variable-length columns, and 64,993 tagged columns. Columns are identified by
their name and ID and can be dynamically added to the table with
JetAddColumn. Columns are created with a specific data type and an optional
set of attributes, such as whether the column is fixed-length or whether it can be
NULL or not.
The type of a column determines the data that may be stored in the column and
many of the properties of the column, including its order for indexing. ESE
supports a wide range of column types, ranging in size from 1 bit to 2 GB
(2147483647 ASCII characters or 1073741823 Unicode characters). For a complete
list of the column data types supported by ESE, see the JET_COLTYP topic. The
topics below discuss a few of the column types supported by ESE:
Indexing in the Table
Although multiple indices can be defined, the records are physically stored in B+
trees in the order specified by the primary index. The primary index is always a
clustered index, and must also be unique. The primary index must be declared
before the first table update to preserve the index ordering. When no primary
index is defined by the application, the data is stored in the order in which
records are added to the table. This special index is referred to as a sequential
index.
Separate B+ trees are used to order records according to the secondary index.
Index entries in the secondary index contain pointers to the data stored
according to the primary index. The index entries for records in the primary
index must be unique because the secondary index points to the record using the
primary key of the record. Secondary indices may or may not have a uniqueness
constraint. For more information, see the Database Overview topic.
Creating Databases
The ESE database comprises one or more tables that organize data by columns
and rows. The ESE database is identified by name and database ID. An ESE
database looks like a single file to the Microsoft Windows operating system;
however, internally the database is stored as a collection of pages. These pages
contain metadata that describe the data in the database, the data itself, and one or
more indices that store different orders of the data. The database may contain up
to 2^31 pages, or 16 Terabytes of data for a database with 8 KB pages.
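The size limit follows directly from the page count and the page size. The sketch below only reproduces that arithmetic; the function name is hypothetical, terabytes are taken as binary units (2^40 bytes), and page sizes other than 8 KB are passed purely for illustration.

```c
#include <stdint.h>

/* Upper bound on database size in bytes for a given page size,
   using the 2^31-page limit stated above. */
uint64_t max_database_bytes(uint32_t page_size_bytes)
{
    const uint64_t max_pages = UINT64_C(1) << 31;
    return max_pages * page_size_bytes;
}
```

With 8192-byte pages this gives 2^31 x 2^13 = 2^44 bytes, which is the 16 terabytes quoted above.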
If the database already exists, step 5 above is replaced by the following two
steps:
A database can be detached from one ESE instance using JetDetachDatabase and
later attached to another instance with JetAttachDatabase. When the database is
detached, it can be copied as a single file using standard Windows utilities.
However, when the database is attached to an ESE instance it cannot be copied
since ESE opens database files exclusively. Also, if the instance crashes then the
database file cannot be copied alone because it needs the transaction log files
associated with it to recover from that crash.
Transactions
ESE transactions are logical units of processing that control how an application
sees and manipulates rows in the database. Your application can use transaction
save points to determine whether to keep or discard a particular set of changes to
the database. All transactions in ESE are atomic, consistent, isolated, and durable
(ACID) as described below:
Atomic: All the updates in the transaction either appear in the database or
they are discarded.
Consistent: The database will always start in a legal state and will always
end in another legal state. For ESE applications, the database engine will
control some simple constraints, for example uniqueness of a unique
index, but the application itself will define almost all other aspects of what
it means for the database to be in a legal state.
Isolated: Transactions are isolated from updates by other sessions. A
transaction will never see a partial set of changes made by another
transaction.
Durable: After the database engine acknowledges that a transaction has
been committed, its changes are persistent in the database. The durability
of a transaction may be optionally waived for performance reasons.
This procedure shows how to start and commit a transaction that reads and
updates data in a database.
The way in which an ESE database engine implements snapshot isolation has
some important differences from traditional relational database isolation and
locking models. When a transaction reads a row, it can always access the row
without failing or waiting for other sessions to release a lock. When a transaction
attempts to update a row, it will succeed if it is the first session to update that
row; that is, the first writer wins. If the session is not the first writer, it will
immediately fail with a write conflict error. The session must then abort its
transaction, wait (usually for a random delay) for the other transaction to
commit its changes, and then retry the transaction. The database engine will not
automatically cause that session to wait until the other transaction has finished
its update. Usually, a transaction will test if it can update a row inside of the
JetUpdate call. If it cannot lock the row for update then JetUpdate will fail with
JET_errWriteConflict.
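The abort, wait, and retry pattern described above might be structured as follows. This is a sketch under stated assumptions rather than ESE code: the callback types, run_with_retry, and the two stand-in functions at the end are hypothetical, and the error values are copied from esent.h so that the example compiles without the Windows SDK. In a real application the transaction callback would call JetBeginTransaction, perform its updates, and then call JetCommitTransaction, or JetRollback on failure.

```c
#include <stdlib.h>

/* Values as defined in esent.h, repeated so the sketch is self-contained. */
#define JET_errSuccess        0
#define JET_errWriteConflict  (-1102)

/* Hypothetical callback types. run_once is expected to begin a transaction,
   do its updates, and commit; on any failure it must roll the transaction
   back before returning the error. */
typedef long (*txn_fn)(void *context);
typedef void (*sleep_fn)(unsigned milliseconds);

/* Retry a transaction that lost a first-writer-wins race. A random delay
   between attempts keeps competing sessions from retrying in lockstep.
   Returns the last error; with max_attempts < 1 nothing is run. */
long run_with_retry(txn_fn run_once, void *context,
                    sleep_fn sleep_ms, int max_attempts)
{
    long err = JET_errWriteConflict;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        err = run_once(context);
        if (err != JET_errWriteConflict)
            return err;                 /* success or an unrelated error */
        if (attempt < max_attempts)
            sleep_ms(1u + (unsigned)(rand() % 50));  /* 1..50 ms, arbitrary */
    }
    return err;                         /* still conflicting: give up */
}

/* Stand-ins for trying the loop without a database: a "transaction" that
   conflicts a fixed number of times, and a sleep that does nothing. */
long conflict_n_times(void *context)
{
    int *remaining = context;
    if (*remaining > 0) {
        --*remaining;
        return JET_errWriteConflict;
    }
    return JET_errSuccess;
}

void no_sleep(unsigned milliseconds)
{
    (void)milliseconds;
}
```

Bounding the number of attempts is a design choice of this sketch; the engine itself neither waits nor retries on behalf of the session.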
Sessions are limited to one thread from the time the transaction starts to the end
of the transaction. It is recommended that all update and retrieve operations be
performed in a transaction. ESE also supports schema modifications such as
creating tables and adding columns inside the transaction. Both update and
schema modifications can be performed in the same transaction. After the
transaction completes with JetCommitTransaction, the update is logged in the
transaction log file. These files can be used to maintain a logically consistent state
in the event of an unexpected process termination or system shutdown.
ESE Errors
Throughout the ESE API documentation, only the most important errors are
documented. These errors typically represent API usage errors or very important
error conditions. Be aware that any of these ESE APIs can also return other errors
that are not documented for each API. In these cases, the caller should simply
handle the error as they would any other error that is returned by the API. The
specific error value may then be used for diagnostic purposes such as tracing.
When ESE encounters some of the more serious errors, it creates an event log
entry that contains details about the errors. The level of logging can be controlled
by Event Log Parameters.
Some applications require the ability to return JET_ERRs as HRESULTs. The
following C++ example shows how to make that conversion:
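#ifndef FACILITY_JET_ERR
/* Facility code for ESE errors; 0xE5E is assumed here. */
#define FACILITY_JET_ERR 0xE5E
#endif

#ifndef HRESULT_FROM_JET_ERR
#define HRESULT_FROM_JET_ERR( __err )                   \
    ( ( __err ) == JET_errSuccess ?                     \
        S_OK :                                          \
        ( __err ) == JET_errOutOfMemory ?               \
            E_OUTOFMEMORY :                             \
            MAKE_HRESULT(                               \
                ( ( __err ) < 0 ?                       \
                    SEVERITY_ERROR :                    \
                    SEVERITY_SUCCESS                    \
                ),                                      \
                FACILITY_JET_ERR,                       \
                ( ( __err ) < 0 ?                       \
                    -( __err ) :                        \
                    ( __err )                           \
                ) & 0xFFFF                              \
            )                                           \
    )
#endif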
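To make the mapping easier to follow, the same conversion can be written as an ordinary function that does not depend on the Windows headers. This is a sketch, not part of the ESE API: the function name is hypothetical, the JET error values and the E_OUTOFMEMORY value are copied from the Windows headers, and the facility code 0xE5E is an assumption.

```c
#include <stdint.h>

/* Values copied from the Windows headers so the sketch is portable. */
#define JET_errSuccess            0
#define JET_errOutOfMemory        (-1011)
#define S_OK_VALUE                UINT32_C(0x00000000)
#define E_OUTOFMEMORY_VALUE       UINT32_C(0x8007000E)
#define FACILITY_JET_ERR_ASSUMED  0xE5Eu   /* assumed facility code */

/* Convert a JET_ERR (a signed long) into an HRESULT bit pattern:
   severity in bit 31, facility in bits 16-26, and the magnitude of the
   error, truncated to 16 bits, in the low word. */
uint32_t hresult_from_jet_err(long err)
{
    if (err == JET_errSuccess)
        return S_OK_VALUE;
    if (err == JET_errOutOfMemory)
        return E_OUTOFMEMORY_VALUE;

    uint32_t severity = err < 0 ? 1u : 0u;   /* errors are negative */
    uint32_t code = (uint32_t)(err < 0 ? -err : err) & 0xFFFFu;
    return (severity << 31) | (FACILITY_JET_ERR_ASSUMED << 16) | code;
}
```

Under these assumptions JET_errWriteConflict (-1102) maps to 0x8E5E044E: the severity bit is set, the facility is 0xE5E, and the low word holds 1102 (0x044E). Positive values, which ESE uses for warnings, keep the severity bit clear.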
For information about configuring system parameters for error handling, see
Error Handling Parameters.
See Also
Error Handling Parameters
Extensible Storage Engine Error Codes
JET_ERR
ESE Files
This table contains an overview of the data file names that are managed by ESE. For
Windows Vista and later, the JET_paramLegacyFileNames setting affects the file names
that are used.
                                  Windows Server 2003   Windows Vista           Windows Vista
                                  and earlier           and later               and later
JET_paramLegacyFileNames setting  N/A                   None                    JET_bitESE98FileNames
Current Log                       <inst>.log            <inst>.jtx              <inst>.log
Pre-Init Log                      <inst>tmp.log         <inst>tmp.jtx           <inst>tmp.log
Rotated Logs                      <inst>XXXXX.log       <inst>XXXXX.jtx;        <inst>XXXXX.log;
                                                        after FFFFF switch to   after FFFFF switch to
                                                        <inst>XXXXXXXX.jtx      <inst>XXXXXXXX.log
Checkpoint File                   <inst>.chk            <inst>.jcp              <inst>.jcp
Temporary Database                <temp db file name>   <temp db file name>     <temp db file name>
                                  Default: tmp.edb      Default: tmp.edb        Default: tmp.edb
Reserved Transaction Log File     res1.log & res2.log   <inst>RESXXXXX.jrs      <inst>RESXXXXX.jrs
Database File                     <db file name>        <db file name>          <db file name>
The names of the log files are dependent on a three-letter base name, which can
be set with JET_paramBaseName. The examples below use a base name of "edb",
because that is the default base name. The extension for the transaction log files
will be either .log or .jtx depending upon whether the JET_bitESE98FileNames is
set in the JET_paramLegacyFileNames parameter. For more information, see
Extensible Storage Engine System Parameters.
Although transaction log files have the .LOG extension commonly associated
with text files, transaction log files are in a binary format and should never be
edited by a user.
Database operations are written to the log first. The data can be written to the
database file later; possibly immediately, potentially much later. In the event of
unexpected process or system termination, the operations are still present in the
log files, and incomplete transactions can be rolled back. The act of replaying
transaction log files is called soft recovery, and it is done automatically when
JetInit or JetInit2 is called. Soft recovery can also be performed manually with
the "-r" option of the Esentutl.exe program. The act of replaying transaction log
files on a database that is restored from a backup is called hard recovery.
Log files are of a fixed size, customizable with JET_paramLogFileSize. When the
current log file (that is, edb.log) fills up, it is renamed to
<base><generation-number>.log, and a new current log file is created in the
transaction log stream.
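The naming rule for rotated log files can be sketched as a small formatting helper. The function name is hypothetical, and the sketch assumes that the generation number is written in upper-case hexadecimal, using five digits up to FFFFF and eight digits after that, as the file name table suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the name of a rotated (filled) transaction log file.
   base: three-letter base name, for example "edb".
   ext:  "log" or "jtx", depending on the legacy file name setting.
   Returns the length of the name, or -1 if the base name is not three
   letters long or the buffer is too small. */
int rotated_log_name(char *buf, size_t size, const char *base,
                     unsigned long generation, const char *ext)
{
    assert(buf != NULL && base != NULL && ext != NULL);
    if (strlen(base) != 3)
        return -1;

    /* Assumed rule: 5 hex digits up to 0xFFFFF, then 8 hex digits. */
    int width = generation <= 0xFFFFFUL ? 5 : 8;
    int n = snprintf(buf, size, "%s%0*lX.%s", base, width, generation, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, base name "edb", generation 0x1A, and extension "log" produce edb0001A.log under these assumptions.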
Each database instance has a single log file sequence associated with it.
Windows XP introduced JetCreateInstance, allowing multiple transaction log
file sequences to be used by a single process. Multiple transaction log file
sequences cannot exist in the same directory, however.
Transaction log files will be deleted by the database engine during a full backup
(see JetBackup, JetTruncateLog, JetTruncateLogInstance), or during normal
operations, if circular logging is enabled.
After a transaction log file is filled up, the database engine needs to create a new
log file. Circular logging is a means by which log files can be automatically
cleaned up by the database engine when they are no longer required for crash
recovery. This process is an alternative to removing log files as a by-product of
performing a backup. Circular logging can be controlled with the
JET_paramCircularLog system parameter. Transaction log files should not be
deleted using any other method.
While a new log file is created and its size extended, it will be called
<base>tmp.log. Creating a new file can be a potentially costly operation, so ESE
will create the next log file proactively as a background task.
Because the temporary transaction log file is created in anticipation of need for a
new transaction log file, it does not contain any useful information.
Windows Vista: In Windows Vista and later, the reserved transaction log files
are named <base>RESXXXXX.jrs.
Windows Server 2003: In Windows Server 2003 and earlier, the reserved
transaction log files are named res1.log and res2.log.
When the database engine runs out of disk space it cannot create a new log file.
The safest thing to do is to shut down cleanly, but some operations (such as
rollback operations) must still be logged. Most database operations will fail
during this stage.
Because the reserved transaction log files are created in anticipation of need for
transaction log files in an out-of-disk scenario, they do not contain any useful
information.
Checkpoint Files
The checkpoint file stores the checkpoint for a particular transaction log file
sequence. The checkpoint file is named <base>.chk or <base>.jcp, depending
upon whether the JET_bitESE98FileNames is set in the
JET_paramLegacyFileNames parameter, and its location is given by
JET_paramSystemPath.
Database operations are first written to the log files and then cached in memory.
At some later point, the operations get written to the database file, but for
performance reasons, the order in which operations are written to the database
file might not match the order in which they were originally logged. Operations
written to the transaction log file will be in one of two states:
Many database operations can be stored in a single transaction log file. A given
log file can consist of the following items:
The checkpoint refers to the point in time in the transaction log stream where all
operations prior to the checkpoint have been written to the database file. There is
no guarantee about the operations that occur after the checkpoint; some might be
in memory, and some might be written to the database.
Since all the operations in the log files prior to the checkpoint are represented in
the database file, only the transaction log files after the checkpoint are needed for
soft recovery to bring a particular database into a clean state.
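The effect of the checkpoint on recovery can be illustrated with a short filter over log generation numbers. This is a simplified model, not how ESE reads its checkpoint file: the function name is hypothetical, the checkpoint is reduced to a single generation number, and the log file that contains the checkpoint is assumed to be needed along with every later one.

```c
#include <assert.h>
#include <stddef.h>

/* Copy into `needed` the generations from `available` that soft recovery
   would have to replay, assuming everything before the checkpoint
   generation is already reflected in the database file.
   `needed` must have room for `count` entries. Returns how many were kept. */
size_t logs_needed_for_recovery(const unsigned long *available, size_t count,
                                unsigned long checkpoint_generation,
                                unsigned long *needed)
{
    assert(available != NULL && needed != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (available[i] >= checkpoint_generation)
            needed[kept++] = available[i];
    }
    return kept;
}
```

With generations 3 through 7 on disk and the checkpoint in generation 5, only generations 5, 6, and 7 are selected in this model.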
Database Files
The database file contains the schema for all of the tables in the database, the
records for all of the tables in the database, and the indexes over the tables. Its
location is given using JetCreateDatabase, JetCreateDatabase2,
JetAttachDatabase, or JetAttachDatabase2.
The Esentutl.exe program can detect whether a database is shut down cleanly
with the "-mh" option. For example, "esentutl.exe -mh sample.edb" will read the
database header of a database named sample.edb, and print out the state of
sample.edb. It may print out "State: Clean Shutdown" or "State: Dirty Shutdown".
A database that has not been cleanly shut down is in a dirty shutdown state.
Prior to Windows XP, this state was called inconsistent. A dirty (inconsistent)
database can be brought to a clean state with soft recovery. A corrupt database is
not the same as a dirty ("inconsistent") database.
Only cleanly shut down databases can be safely moved or renamed. If a
database was not cleanly shut down, it cannot safely be moved or renamed.
Multiple databases can be associated with a single transaction log file sequence.
Temporary Databases
The temporary database is used as a backing store for temptables and it is also
used when creating indices.
JET_CALLBACK
JET_PFNREALLOC
JET_PFNSTATUS
http://msdn2.microsoft.com/en-us/library/ms683060(VS.85).aspx
Extensible Storage Engine Constants
The Extensible Storage Engine Constants section contains the following sections:
JET_CBTYP
JET_COLTYP
JET_OBJTYP
JET_SNP
JET_SNT
Error Handling Constants
Event Logging Constants
Extensible Storage Engine System Parameters
Invalid Handle Constants
Maximum Settings Constants
Obsolete Constants