Delphi Internal Data Structures


J.R.
2012

Abstract
This document describes the types of executable file generated by the Embarcadero Delphi programming
environment, the specific data structures used in such executables, and how to parse these to allow proper
analysis of the code in a Delphi executable.

Contents
1 Introduction
1.1 History of Delphi
1.2 Existing Documentation and Sources

2 Classes of Executable Generated by Delphi
2.1 Identifying Delphi Executables

3 Structure of a Delphi Executable
3.1 Basic Data Types
3.2 Alignment

4 Code Section Data Structures
4.1 Locating Data Structures
4.2 Runtime Type Information (RTTI)
4.2.1 0x00 - tkUnknown
4.2.2 0x01 - tkInteger
4.2.3 0x02 - tkChar
4.2.4 0x03 - tkEnumeration
4.2.5 0x04 - tkFloat
4.2.6 0x05 - tkString
4.2.7 0x06 - tkSet
4.2.8 0x07 - tkClass
4.2.9 0x08 - tkMethod
4.2.10 0x09 - tkWChar
4.2.11 0x0A - tkLString
4.2.12 0x0B - tkWString
4.2.13 0x0C - tkVariant
4.2.14 0x0D - tkArray
4.2.15 0x0E - tkRecord
4.2.16 0x0F - tkInterface
4.2.17 0x10 - tkInt64
4.2.18 0x11 - tkDynArray
4.2.19 0x12 - tkUString
4.2.20 0x13 - tkClassRef
4.2.21 0x14 - tkPointer
4.2.22 0x15 - tkProcedure
4.3 Jump Tables
4.4 Virtual Method Tables (VMTs) and associated structures
4.4.1 Interface Table
4.4.2 Automation Table
4.4.3 Init Table
4.4.4 Typeinfo Table
4.4.5 Field Table
4.4.6 Method Table
4.4.7 Dynamic Table
4.5 The Package Info Table
4.6 Strings

5 Resource Section Data Structures
5.1 DVCLAL
5.2 PACKAGEINFO
5.2.1 Data Structures
5.2.2 Flags
5.3 Forms
5.3.1 Flags
5.3.2 Aggregate Types (vaList and vaCollection)

6 Locating User Code

7 Code Samples
7.1 Brute Force Search for Virtual Method Table Structures
7.2 Explanation of Functions not Defined in the Listing

1 Introduction
Delphi is a popular Windows-based programming environment. Executables created by Delphi are compliant
with the PE specification but are organised internally in a specific way and contain several features not found
in executables created by other tools. Delphi was originally designed for “Rapid Application Development” at
a time when compilation speed was an issue. As a result, it makes excessive use of precompiled library code.
The Delphi linker always places the code from any library functions into the executable first, with the object
code produced from compiling the user’s source going in at the end.
The challenges of disassembling and running heuristics on a Delphi executable are as follows: firstly, there
is generally a large amount of uninteresting library code preceding the interesting bits, and it is easy to use
up all the available time or space limitations simply analysing such library code. Code signatures help with
this, but there are so many libraries and versions of functions that building a complete set of such signatures
would be a Sisyphean task. Secondly, Delphi places large amounts of data in the code section, interleaving it
between functions. A basic linear-sweep disassembler, encountering such data, will disassemble it into garbage
and become desynchronised. Even more intelligent disassemblers such as IDA Pro occasionally have problems
recognising Delphi’s embedded data structures.

1.1 History of Delphi


Delphi was originally developed by Borland, was later sold to CodeGear along with Borland’s other programming
tools and is now owned by Embarcadero. Fifteen versions of Delphi have been produced since its inception in
1995 (fourteen if Kylix is not included). The first version of Delphi (1.0) targeted 16-bit Windows only. Delphi
2 was the first version that allowed 32-bit Windows development: versions 4 through 7 added extra libraries
and components and support for newer versions of Windows, while version 8 (the first post-Borland version)
was designed to target the Microsoft .NET framework and could not produce native applications at all. This
function was added back to subsequent versions (2005 onwards), though they retained the ability to produce
.NET applications until Delphi 2009, when .NET development was spun off into a separate product. Delphi
2009 also introduced fully Unicode-compliant versions of the VCL and the Delphi runtime library (RTL). The
latest version (as of 2012) is Delphi XE 2, in which 64-bit support has finally been added (however, 64-bit
Delphi executables are outside the scope of this document).
Although every version of Delphi has introduced new features, Delphi executables produced by the later
versions are relatively similar to those produced by the earlier ones (as shown later, the executables may be
classified into three ‘eras’ based on the version of Delphi that produced them). There are a number of structures
in the file that are only present in Delphi executables. This provides not only an excellent method of identifying
programs built with Delphi, but, if the structures are parsed, they provide a wealth of information about the
executable, much of which may be useful. Moreover, parsing this data provides essential information to refine
the accuracy of ‘standard’ executable detection techniques.

1.2 Existing Documentation and Sources


The main sources of information are the Delphi runtime library source code (which is available when a copy
of Delphi is purchased), and the official Embarcadero documentation (available online), although these will not
give the full picture until they are combined with an examination of actual Delphi samples in a hex editor and
a disassembler with function and structure recognition capability (e.g. IDA Pro). Also it is necessary to know
how the Delphi compiler translates Pascal data structures into bytes – this is generally not documented but
must be learned by experiment.[1] In many cases, a specific structure may be defined in the RTL source code, but
unless its name is already known, this information is not very useful. The official documentation for the latest
versions of Delphi is available from Embarcadero as a wiki.[2] For information about older versions, Free Pascal
(an open-source Pascal development environment heavily inspired by Delphi) has a runtime library which is
very closely related to the Delphi RTL, and certain elements of the Free Pascal runtime documentation[3] are
applicable to older versions of Delphi also (with care). Delphi 2 was not distributed with complete RTL source
code, but with header files only, and these omit the definitions of many crucial data structures, so these have
to be deduced by analogy with their counterparts in later versions.
There are a number of third-party dumper utilities designed to parse various data out of a Delphi executable,
of which the most full-featured are “DeDe”[4] and “revendepro”[5]. Source code is available for these programs,
but as they are themselves written in Delphi, they often make use of Delphi internal functions for parsing, and
therefore their source code is not of much greater use than the RTL source code. Furthermore, it should be
noted that these utilities sometimes execute the target program and are therefore not suitable for
malware analysis unless this occurs in an infection environment where malware may be allowed to execute.
Lastly, there are also some “anti-dumping” utilities which are designed to obfuscate a Delphi executable and
prevent the use of dumping programs like those described in the previous paragraph. The most interesting one
is probably Pythia by Sebastian Porst,[6] not because of the program itself but because it is accompanied by a
write-up giving details of the Delphi internal data structures which it parses in order to perform its obfuscation.
Although marred by omissions and occasional errors, this document contains a fair description of the Delphi
form resource data format and most of the principal structures associated with Virtual Method Tables (VMTs).

2 Classes of Executable Generated by Delphi


The 32-bit PE files generated by the different versions of Delphi may be divided into three classes or eras, which
will be termed “Ancient”, “Old”, and “New”. Ancient-era executables were created by Delphi version 2, old-era
executables by Delphi versions 3 through to 7, and new-era executables by all versions of Delphi after Delphi 7.
Delphi version 1 could create only 16-bit executables, which are not considered here. There are more differences
between old and new executables than between ancient and old. The exact differences are summarized as follows
(meanings of terms such as “init function” and “RTTI” will be explained later):
Ancient Era (Delphi 2)
• code section is called ‘CODE’
• EP function has an individual CALL instruction for each library init function
• RTTI records not preceded by pointers
• Virtual Method Tables (VMTs) referenced by a pointer to the first function entry, i.e. after all the prefix
fields
[1] For instance, the construct ‘set of’ when applied to an enumerated type is equivalent to a bitfield in C.
[2] http://docwiki.embarcadero.com/RADStudio/en/
[3] http://www.freepascal.org/docs-html/rtl
[4] http://sourceforge.net/projects/dede/develop
[5] http://www.ggoossen.net/revendepro
[6] http://www.the-interweb.com/serendipity/index.php?archives/3-Protecting-the-Oracle-A-proof-of-concept-for-a-Delphi-obfuscat
html

Old Era (Delphi 3 to Delphi 7)
• code section is called ‘CODE’
• init and finalize function addresses are placed in a package info table located before the EP function
• RTTI records are more elaborate and always preceded by a pointer
• DVCLAL and PACKAGEINFO resources introduced
• Virtual Method Tables get a vmtSelf pointer as the first entry, and references to a VMT are pointers to
this entry, which occurs before the VMT prefix fields
New Era (Delphi 2007 onwards)
• code section is called ‘.text’, and other section names conform to Microsoft convention
• init functions and the entry point are in a separate ‘.itext’ section, although the package info table itself
is still in ‘.text’
• RTTI has a lot of extra data, and the RTTI record section at the beginning of ‘.text’ is much bigger
• InitTable has extra fields, including a list of unit names like those in the PACKAGEINFO resource
• Virtual Method Tables and their associated sub-tables have many extra and extended fields
Additionally, the definitions of some internal structures (such as the Virtual Method Table) were altered
in New-era executables. Embarcadero appear to be continuing to modify various structures which had been
virtually unaltered since Delphi 3: more modifications have been made in Delphi XE 2.

2.1 Identifying Delphi Executables


All Delphi executables have certain characteristic features which can be used to identify them. As specified in
the previous section, features which differ between versions of Delphi may also be used to identify the version
of Delphi that generated the executable. One trivial characteristic which serves to distinguish executables
produced by tools of Borland heritage from others is the string displayed by the DOS stub, which reads “This
program must be run under Win32” as opposed to “This program requires Microsoft Windows”. This is of
course not specific to Delphi: executables produced with Borland C++ and C++ Builder contain identical
DOS stub code.
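The DOS stub check described above can be sketched in a few lines. This is a minimal illustration only: the function name `has_borland_dos_stub` is hypothetical, and the sketch assumes the message text is stored verbatim between the MZ header and the PE header, which is located through the standard `e_lfanew` field at offset 0x3C.

```python
BORLAND_STUB = b"This program must be run under Win32"

def has_borland_dos_stub(image: bytes) -> bool:
    """Return True if the Borland-style DOS stub message precedes the PE header."""
    # A valid DOS header is at least 0x40 bytes long and starts with 'MZ'.
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    # e_lfanew (offset 0x3C) holds the file offset of the PE header.
    pe_offset = int.from_bytes(image[0x3C:0x40], "little")
    # Only the region before the PE header is searched (the DOS header and stub).
    return BORLAND_STUB in image[:pe_offset]
```

A positive result only indicates Borland heritage in general, since Borland C++ and C++ Builder produce the same stub, so it is best combined with the other indicators listed in this section.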
Compiler start code signaturing may be of some use in identifying Delphi executables which use default
start code, but this is only a small proportion of such executables, since non-form-based applications tend to
have a lot of their ‘functional’ code in the entry point function.
So, the best ways to identify a Delphi executable are these:
1. The characteristic section layout, as illustrated in the examples below. Delphi executables tend to have
more sections than executables generated with Microsoft compilers or other ex-Borland compilers.
2. The presence of Runtime Type Information (RTTI) data structures at the beginning of the main code
section.
3. The Entry Point function occurring either at the very end of the main code section (Ancient and Old) or
at the very end of the ‘.itext’ code section (New).
4. The presence of a package info table structure containing the addresses of each unit’s initialisation and
finalisation functions (Old and New, not Ancient).
5. The presence of specific RCDATA-type resources in the ‘.rsrc’ section:
• a DVCLAL 16-byte licence string
• a PACKAGEINFO structure containing the names of all the Delphi units used to create the executable
• one form data resource per form, containing form description data stored in the Delphi format
(beginning with the magic ‘TPF0’)
Here are examples of how the section layout differs between old and new-style Delphi executables. Table
1 shows the sections which can be expected to occur in Ancient and Old-style Delphi executables. In these
executables, the names of the first three sections (CODE, DATA, BSS) are always capitalized and do not have
the initial full stop character.
Name    From        To          Size      VAddr       VSize     Characteristics
CODE    0x00000400  0x00E19000  14781440  0x00401000  14781028  CNT_CODE MEM_EXECUTE MEM_READ
DATA    0x00E19000  0x00E27E00  00060928  0x0121A000  00060520  CNT_INITIALIZED_DATA MEM_READ MEM_WRITE
BSS     0x00E27E00  0x00E27E00  00000000  0x01229000  00020365  MEM_READ MEM_WRITE
.idata  0x00E27E00  0x00E2B600  00014336  0x0122E000  00014014  CNT_INITIALIZED_DATA MEM_READ MEM_WRITE
.tls    0x00E2B600  0x00E2B600  00000000  0x01232000  00000024  MEM_READ MEM_WRITE
.rdata  0x00E2B600  0x00E2B800  00000512  0x01233000  00000024  CNT_INITIALIZED_DATA MEM_SHARED MEM_READ
.reloc  0x00E2B800  0x00F22600  01011200  0x01234000  01011012  CNT_INITIALIZED_DATA MEM_SHARED MEM_READ
.rsrc   0x00F22600  0x0165DC00  07583232  0x0132B000  07583232  CNT_INITIALIZED_DATA MEM_SHARED MEM_READ

Table 1: Section Table as seen in Ancient and Old Era Executables

Table 2 shows the sections of a New executable. The first three sections are now named according to the
standard Microsoft convention, and two additional sections, ‘.itext’ and ‘.didata’, have been added.
Name     From        To          Size      VAddr       VSize     Characteristics
.text    0x00000400  0x00158C00  01411072  0x00401000  01411048  CNT_CODE MEM_EXECUTE MEM_READ
.itext   0x00158C00  0x0015A200  00005632  0x0055A000  00005244  CNT_CODE MEM_EXECUTE MEM_READ
.data    0x0015A200  0x0015F200  00020480  0x0055C000  00020328  CNT_INITIALIZED_DATA MEM_READ MEM_WRITE
.bss     0x0015F200  0x0015F200  00000000  0x00561000  00021428  MEM_READ MEM_WRITE
.idata   0x0015F200  0x00162C00  00014848  0x00567000  00014502  CNT_INITIALIZED_DATA MEM_READ MEM_WRITE
.didata  0x00162C00  0x00163600  00002560  0x0056B000  00002272  CNT_INITIALIZED_DATA MEM_READ MEM_WRITE
.tls     0x00163600  0x00163600  00000000  0x0056C000  00000076  MEM_READ MEM_WRITE
.rdata   0x00163600  0x00163800  00000512  0x0056D000  00000024  CNT_INITIALIZED_DATA MEM_READ
.reloc   0x00163800  0x00180E00  00120320  0x0056E000  00120100  CNT_INITIALIZED_DATA MEM_DISCARDABLE MEM_READ
.rsrc    0x00180E00  0x001B0200  00193536  0x0058C000  00193536  CNT_INITIALIZED_DATA MEM_READ

Table 2: Section Table as seen in New Era Executables
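The section-name differences between Tables 1 and 2 lend themselves to a simple first-pass classifier. The following is a rough sketch under stated assumptions: the function name `guess_delphi_era` and its return strings are hypothetical, the caller is assumed to have already extracted the section names from the PE section table, and section names alone cannot separate Ancient from Old executables.

```python
def guess_delphi_era(section_names):
    """Guess the Delphi era from a list of PE section names.

    Returns "ancient-or-old", "new", or None if neither layout matches.
    """
    names = set(section_names)
    # Ancient and Old executables: capitalised names without a leading full stop.
    if {"CODE", "DATA", "BSS"} <= names:
        return "ancient-or-old"
    # New executables: Microsoft-style names plus the extra '.itext' code section.
    if {".text", ".itext"} <= names:
        return "new"
    return None
```

Because packers and post-processing tools can rename sections, a result from this check should be treated as a hint and confirmed against the other indicators (RTTI records, the package info table, and the RCDATA resources).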

The differences in the entry point function of executables created by different versions of Delphi may be
observed in IDA Pro. Ancient executables have no package info table and call all the init functions individually
from the entry point, as can be seen in Table 3.
Old executables have a package info table immediately preceding the entry point (the very end of it can be
seen in Table 4). The virtual address of the package info table is passed to the InitExe function. The
package info table’s ‘unit entry table’ contains the addresses of the initialize and finalize functions for each unit.
If a unit has no initialize or finalize function, the corresponding unit entry table location contains zero. All the
initialize functions are called by InitExe(). A file infector which targeted Delphi executables could therefore
conceal itself by placing a pointer to the virus code in a spare unit entry table slot.
New executable entry points look similar to those of Old executables, but the package info table no longer
precedes the entry point function, which is in a separate section called .itext (Table 5).

CODE:0044395C public start
CODE:0044395C start proc near
CODE:0044395C push ebp
CODE:0044395D mov ebp, esp
CODE:0044395F add esp, -0Ch
CODE:00443962 call sub_403188
CODE:00443967 call sub_404470
CODE:0044396C call sub_4077B4
CODE:00443971 call sub_40E360
CODE:00443976 call sub_40E720
CODE:0044397B call sub_4157BC
CODE:00443980 call sub_41D02C
CODE:00443985 call sub_41EF88
CODE:0044398A call sub_430020
CODE:0044398F call sub_431DF0
CODE:00443994 call sub_4356A4
CODE:00443999 call sub_436774
CODE:0044399E call sub_437448
CODE:004439A3 mov eax, ds:dword_44556C
CODE:004439A8 add eax, 30h
CODE:004439AB mov edx, (offset aImagedit_hlp+4)
CODE:004439B0 call sub_403278
CODE:004439B5 mov ecx, offset unk_4456E0 ; Ancient-style CreateForm call
CODE:004439BA mov edx, offset VMT_TMainForm
CODE:004439BF mov eax, ds:dword_44556C
CODE:004439C4 call sub_414E34
CODE:004439C9 mov eax, ds:dword_44556C
CODE:004439CE call sub_414EC4
CODE:004439D3 call sub_403FB3
CODE:004439D8 mov esp, ebp
CODE:004439DA pop ebp
CODE:004439DB retn
CODE:004439DB start endp

Table 3: Ancient-era Delphi entry point function

CODE:00479C00 dd offset final_47804C
CODE:00479C04 dd offset init_479280
CODE:00479C08 dd offset final_479238
CODE:00479C0C dd offset init_479914
CODE:00479C10 dd offset final_4798E4
CODE:00479C14 align 8
CODE:00479C18 dd offset unknown_libname_950
CODE:00479C1C ; ---------------------------------------------------------------------------
CODE:00479C1C
CODE:00479C1C public start
CODE:00479C1C start:
CODE:00479C1C push ebp
CODE:00479C1D mov ebp, esp
CODE:00479C1F add esp, 0FFFFFFF0h
CODE:00479C22 mov eax, offset InitTable
CODE:00479C27 call @Sysinit@@InitExe$qqrpv
CODE:00479C2C mov eax, ds:ptr_to_value_used_in_EP
CODE:00479C31 mov eax, [eax]
CODE:00479C33 call unknown_called_from_EP
CODE:00479C38 mov ecx, ds:passed_to_CreateForm_in_ecx
CODE:00479C3E mov eax, ds:ptr_to_value_used_in_EP
CODE:00479C43 mov eax, [eax]
CODE:00479C45 mov edx, FormTable
CODE:00479C4B call @Forms@TApplication@CreateForm$qqrp17System@TMetaClasspv
CODE:00479C50 mov eax, ds:ptr_to_value_used_in_EP
CODE:00479C55 mov eax, [eax]
CODE:00479C57 call @Forms@TApplication@Run$qqrv
CODE:00479C5C call @System@@Halt0$qqrv
CODE:00479C5C ; ---------------------------------------------------------------------------
CODE:00479C61 db 8Dh, 40h, 0
CODE:00479C64 dd 67h dup(0)
CODE:00479E00 dd 80h dup(?)
CODE:00479E00 CODE ends

Table 4: Old era Delphi entry point function


.itext:00705018 public start
.itext:00705018 start proc near
.itext:00705018
.itext:00705018 var_2C = dword ptr -2Ch
.itext:00705018 var_28 = dword ptr -28h
.itext:00705018 var_24 = dword ptr -24h
.itext:00705018 var_20 = dword ptr -20h
.itext:00705018 var_1C = dword ptr -1Ch
.itext:00705018 var_18 = dword ptr -18h
.itext:00705018 var_14 = dword ptr -14h
.itext:00705018
.itext:00705018 push ebp
.itext:00705019 mov ebp, esp
.itext:0070501B mov ecx, 5
.itext:00705020
.itext:00705020 loc_705020:
.itext:00705020 push 0
.itext:00705022 push 0
.itext:00705024 dec ecx
.itext:00705025 jnz short loc_705020
.itext:00705027 push ecx
.itext:00705028 push ebx
.itext:00705029 push esi
.itext:0070502A mov eax, offset InitTable
.itext:0070502F call @Sysinit@@InitExe$qqrpv
.itext:00705034 xor eax, eax
.itext:00705036 push ebp
.itext:00705037 push offset loc_705245
.itext:0070503C push dword ptr fs:[eax]
.itext:0070503F mov fs:[eax], esp
.itext:00705042 mov eax, off_70ED30
.itext:00705047 mov byte ptr [eax], 0
.itext:0070504A call @System@ParamCount$qqrv
.itext:0070504F mov ebx, eax
.itext:00705051 test ebx, ebx
.itext:00705053 jle short loc_70508A
.itext:00705055 mov esi, 1

Table 5: New era Delphi entry point function

3 Structure of a Delphi Executable
The Delphi compilation environment takes Pascal code written by the user and compiles it into one or more
‘units’. Then, the user code units are combined with code from a number of standard or custom library units
to make the final executable. Classically, library units are statically linked into the executable, but dynamic
linking is also available, where the library units’ code is in an external dynamic library file (Borland did not
use standard DLLs for this but created their own proprietary ‘BPL’). Dynamically-linked Delphi executables
appear to be quite rare, so this document will only consider statically-linked types.
As a general rule, library code always precedes user code in the final executable file, with the entry point
function usually being the last function in its section. Each unit may define an initialisation and finalisation
function, with the initialisation function being called when the unit is loaded and the finalisation function when
it is unloaded. The addresses of all such functions are stored in the package info table (where present). New-
style Delphi executables have a separate section (‘.itext’) just for initialisation functions and the entry point
function, but finalisation functions and the package info table data structure itself are still located in the main
.text section.
Compiled units (libraries and compiled user code) have the extension ‘.dcu’ and use a proprietary file format
which frequently changed between versions of Delphi (originally a deliberate measure on the part of Borland to
prevent third-party tools from being able to read the file format). There are a few utilities online that claim to
be able to parse some versions of this format, but none work consistently well on more than a few versions.

3.1 Basic Data Types


For simplicity, the various Delphi intrinsic data type names will not be used in this document. Instead, the
stdint.h type names will be used for integers (e.g. uint8_t, int32_t). Structures will be described in C-style
pseudocode (pseudo because typedefs will be omitted and structures will be permitted to contain variable-length
fields). Many Delphi structures have fields which only contain any data if a preceding ‘count’ field is greater
than zero, and such ‘optional’ fields will be clearly indicated as such.
Some Delphi data types have no direct equivalent in C. For example, Delphi has a construct “set of” which
appears to take an enumerated type and create a packed bitfield from it. The individual values of the enumerated
type thereby become bit flags instead. All such data types referred to in this document are one byte in length.
Also falling into this category are Delphi’s string types, which are shown in Table 6. Confusingly, C-style
null-terminated strings are also used in Delphi in certain places, and these will be denoted as “C_STRING”.

struct PASCAL_STRING
{
    uint8_t length;
    uint8_t data[length];
};

struct LONG_PASCAL_STRING
{
    uint32_t length;
    uint8_t data[length];
};

struct UNICODE_PASCAL_STRING
{
    uint32_t length_in_unicode_chars;
    uint16_t data[length_in_unicode_chars];
};

Table 6: Delphi String Types
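The three layouts in Table 6 can be read with a few small helpers. This is an illustrative sketch, not code from any Delphi tool: the function names are hypothetical, little-endian byte order is assumed (x86), and the choice of Latin-1 for 8-bit strings and UTF-16LE for Unicode strings is an assumption about the encoding rather than something stated in the tables. Each helper returns the decoded text together with the offset of the first byte after the string.

```python
import struct

def read_pascal_string(buf, offset):
    """PASCAL_STRING: one length byte followed by that many data bytes."""
    length = buf[offset]
    start, end = offset + 1, offset + 1 + length
    if end > len(buf):
        raise ValueError("PASCAL_STRING runs past end of buffer")
    return buf[start:end].decode("latin-1"), end

def read_long_pascal_string(buf, offset):
    """LONG_PASCAL_STRING: 32-bit length followed by that many data bytes."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start, end = offset + 4, offset + 4 + length
    if end > len(buf):
        raise ValueError("LONG_PASCAL_STRING runs past end of buffer")
    return buf[start:end].decode("latin-1"), end

def read_unicode_pascal_string(buf, offset):
    """UNICODE_PASCAL_STRING: 32-bit length in 16-bit characters, then the data."""
    (chars,) = struct.unpack_from("<I", buf, offset)
    start, end = offset + 4, offset + 4 + 2 * chars
    if end > len(buf):
        raise ValueError("UNICODE_PASCAL_STRING runs past end of buffer")
    return buf[start:end].decode("utf-16-le"), end
```

Returning the end offset makes it straightforward to chain reads when a structure contains several consecutive strings.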

The TAttrData structure (Table 7) is also used in several places in new-style Delphi executables, usually
occurring as a field at the end of other structures.
When the first field of a Delphi structure is called some variation on ‘length’, it generally contains the
total size of the structure in bytes, including the field itself. This is true of TAttrData, and it is therefore
possible to skip over TAttrData without parsing every field (a Len value of 2 means the
structure is empty, and such empty TAttrData structures are very common).

struct TAttrData
{
    uint16_t Len;
    TAttrEntry AttrEntry[];
};

struct TAttrEntry
{
    uint32_t AttrType;
    uint32_t AttrCtor;
    uint16_t ArgLen;
    uint8_t ArgData[ArgLen];
};

Table 7: The TAttrData structure
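Because Len covers the whole structure, TAttrData can either be skipped in one step or walked entry by entry. The sketch below shows both; the function names are hypothetical, little-endian byte order is assumed, and the entry layout is taken directly from Table 7 with no padding between fields (an assumption).

```python
import struct

def skip_attr_data(buf, offset):
    """Return the offset just past a TAttrData structure without parsing it."""
    (length,) = struct.unpack_from("<H", buf, offset)
    if length < 2:
        raise ValueError("TAttrData.Len must be at least 2 (it includes itself)")
    return offset + length

def parse_attr_data(buf, offset):
    """Parse TAttrData into a list of (AttrType, AttrCtor, ArgData) tuples."""
    end = skip_attr_data(buf, offset)
    pos = offset + 2  # first entry follows the 16-bit Len field
    entries = []
    while pos < end:
        attr_type, attr_ctor, arg_len = struct.unpack_from("<IIH", buf, pos)
        pos += 10  # 4 + 4 + 2 bytes of fixed fields
        entries.append((attr_type, attr_ctor, bytes(buf[pos:pos + arg_len])))
        pos += arg_len
    if pos != end:
        raise ValueError("TAttrData entries do not fill the declared length")
    return entries, end
```

For the common empty case (Len equal to 2) the parser returns an empty list and advances by two bytes.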

3.2 Alignment
All Delphi data structures are aligned: the starting virtual address of each structure must be divisible by four.[7]
Padding bytes are used to ensure alignment, and it is therefore necessary to take these into account when
parsing a structure. Until recently, Delphi used set padding consisting of one, two, and three-byte ‘no-op’ x86
instruction sequences, as shown in Table 8. Recent versions of Delphi have started using zeroes as padding
instead.

Size    Value     Instruction Sequence
1-byte  90        nop (xchg eax, eax)
2-byte  8B C0     mov eax, eax
3-byte  8D 40 00  lea eax, [eax]

Table 8: Delphi x86 Padding Sequences
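A parser can use the alignment rule to step over padding before reading the next structure. The following sketch is one possible approach, not a documented algorithm: the function name `skip_padding` is hypothetical, and it assumes that a gap of n bytes is filled either with the single n-byte sequence from Table 8 or with n zero bytes.

```python
PADDING_SEQUENCES = {
    1: b"\x90",          # nop
    2: b"\x8B\xC0",      # mov eax, eax
    3: b"\x8D\x40\x00",  # lea eax, [eax]
}

def skip_padding(buf, offset, virtual_address):
    """Advance past padding so that the next structure is 4-byte aligned.

    `virtual_address` is the virtual address corresponding to buf[offset].
    Returns the new offset; raises ValueError if the gap holds unexpected bytes.
    """
    gap = (-virtual_address) % 4  # bytes needed to reach the next multiple of four
    if gap == 0:
        return offset
    pad = bytes(buf[offset:offset + gap])
    if pad == PADDING_SEQUENCES[gap] or pad == b"\x00" * gap:
        return offset + gap
    raise ValueError("unexpected bytes where padding was expected")
```

Raising an error on unexpected bytes is a deliberate choice here: it gives an early signal that the parser has lost synchronisation with the data.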

[7] NB: this is not always true of structures which are embedded inside a larger structure.
4 Code Section Data Structures
The main code section of a Delphi executable contains many data structures in addition to executable code.
These structures may be inserted virtually anywhere in the code, making it very difficult for a linear sweep
disassembler without knowledge of their location to avoid disassembling data into garbage. If, on the other
hand, the location and size of these structures can be determined in advance, the disassembler can skip over
them: but because most of the structures are complex and can vary in length, it is usually necessary to parse
them before this information is available.

4.1 Locating Data Structures


There are four ways to find data structures in the main code section:

1. If their virtual addresses are used as an argument to a function called from the entry point, e.g. CreateForm,
InitExe, the addresses may be obtained from here.
2. If they are referred to by a pointer in another structure, for example the VMT typeinfo subtable has
pointers to RTTI records, and VMTs themselves contain a pointer to their parent VMT.
3. If the structures always occur in the same place (for instance, the RTTI records at the beginning of the
main code section).
4. Via a brute-force search. This is not terribly efficient but may sometimes be the only possibility.

4.2 Runtime Type Information (RTTI)


Most Delphi executables contain RTTI records at the beginning of their main code section. Each record begins
with a byte which identifies the record type, followed by a Pascal string giving the name. The rest of the
data is variable and depends on the record type. In Delphi 2, there were only thirteen different RTTI record
types (tkUnknown, tkInteger, tkChar, tkEnumeration, tkFloat, tkString, tkSet, tkClass, tkMethod, tkWChar,
tkLString, tkLWString, tkVariant). Later versions not only added more types, bringing the total to 22, but also
added new fields to the existing type definitions. New-era versions of Delphi make much more extensive use
of RTTI than do previous versions, and the RTTI record section is generally much bigger in such executables.
Also, a new TAttrData structure has been added at the end of each RTTI record.
It should be noted that RTTI records are not designed to be parsed in a linear fashion, but on a per-record
basis on following a pointer from some other structure (for instance, a TypeInfo field in a VMT subtable).
Linear parsing is possible to a certain extent, but the RTTI records are often interspersed with non-RTTI data
and unless this has been parsed first and can be skipped, linear parsing will generally fail at some point.
In the following, fields marked with an asterisk in a comment occur only in New-era executables.
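The common prefix of every RTTI record (a type byte followed by a Pascal string name) can be read as sketched below. The function name `read_rtti_header` is hypothetical, the type names are those listed in the contents of this document (0x00 to 0x15), and decoding the name as Latin-1 is an assumption. The function is meant to be called on an offset obtained by following a pointer, in line with the per-record parsing approach described above.

```python
TYPE_KINDS = (
    "tkUnknown", "tkInteger", "tkChar", "tkEnumeration", "tkFloat",
    "tkString", "tkSet", "tkClass", "tkMethod", "tkWChar", "tkLString",
    "tkWString", "tkVariant", "tkArray", "tkRecord", "tkInterface",
    "tkInt64", "tkDynArray", "tkUString", "tkClassRef", "tkPointer",
    "tkProcedure",
)

def read_rtti_header(buf, offset):
    """Read the type byte and name of an RTTI record.

    Returns (kind_name, type_name, body_offset), where body_offset is the
    offset of the type-specific data that follows the name.
    """
    kind = buf[offset]
    if kind >= len(TYPE_KINDS):
        raise ValueError("not a known RTTI type byte: 0x%02X" % kind)
    name_len = buf[offset + 1]
    name_start = offset + 2
    name_end = name_start + name_len
    if name_end > len(buf):
        raise ValueError("RTTI name runs past end of buffer")
    return TYPE_KINDS[kind], buf[name_start:name_end].decode("latin-1"), name_end
```

Rejecting unknown type bytes gives a cheap sanity check when a candidate pointer turns out not to reference an RTTI record.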

4.2.1 0x00 - tkUnknown


This is a placeholder type. It is not clear what if any data is associated with it, since it has never been observed
in a live sample.

4.2.2 0x01 - tkInteger


This is an Ordinal type. All Ordinal types (being tkInteger, tkChar, tkEnumeration, tkSet, tkWChar) have the
same data structure, consisting of an 8-bit type field, two signed 32-bit fields giving the minimum and maximum
value the type can hold and some AttrData. The OrdType enum is the same for every other Ordinal type.
Types tkEnumeration and tkSet have extra fields.

enum TOrdType
{
    otSByte,
    otUByte,
    otSWord,
    otUWord,
    otSLong,
    otULong
};

struct tkInteger
{
    TOrdType OrdType;
    int32_t MinValue;
    int32_t MaxValue;
    TAttrData AttrData; // *
};

4.2.3 0x02 - tkChar


Another ordinal type - fields same as tkInteger.
struct tkChar
{
    TOrdType  OrdType;
    int32_t   MinValue;
    int32_t   MaxValue;
    TAttrData AttrData;    // *
};

4.2.4 0x03 - tkEnumeration


The type tkEnumeration is an Ordinal type with extra fields.
struct tkEnumeration
{
    TOrdType      OrdType;
    int32_t       MinValue;
    int32_t       MaxValue;
    uint32_t      BaseType;
    PASCAL_STRING NameList[];
    PASCAL_STRING EnumUnitName;    // *
    TAttrData     EnumAttrData;    // *
};

4.2.5 0x04 - tkFloat


Records of type tkFloat contain a single byte float type field (plus TAttrData if in a New-era executable).
enum TFloatType
{
    ftSingle,
    ftDouble,
    ftExtended,
    ftComp,
    ftCurr
};

struct tkFloat
{
    TFloatType FloatType;
    TAttrData  FloatAttrData;    // *
};

4.2.6 0x05 - tkString


Records of type tkString contain only an 8-bit integer giving the maximum string length (plus TAttrData if in
a New-era executable).
struct tkString
{
    uint8_t   MaxLength;
    TAttrData StrAttrData;    // *
};

4.2.7 0x06 - tkSet
Type tkSet, in addition to its OrdType, contains a 32-bit integer pointing to the RTTI record for the type it
contains (e.g. a set of integers would contain a pointer to the tkInteger RTTI record).
struct tkSet
{
    TOrdType  OrdType;
    uint32_t  CompType;
    TAttrData SetAttrData;    // *
};

4.2.8 0x07 - tkClass


Type tkClass is a relatively complex record. The ClassType field is a pointer to the VMT for the class, whereas
the ParentInfo field is a pointer to the RTTI record for the class’s parent. The PropCount field specifies the
number of properties the class has, and PropData and PropDataEx are subrecords containing information about
the properties. PropDataEx is specific to New-era executables.
struct tkClass
{
    uint32_t      ClassType;        // pointer to the VMT for the class
    uint32_t      ParentInfo;       // pointer to the RTTI record of the parent
    uint16_t      PropCount;
    PASCAL_STRING UnitName;
    TPropData     PropData;
    TPropDataEx   PropDataEx;       // *
    TAttrData     ClassAttrData;    // *
};

struct TPropData
{
    uint16_t  PropCount;
    TPropInfo PropList[PropCount];
};

struct TPropInfo
{
    uint32_t      PropType;    // pointer to RTTI record
    uint32_t      GetProc;
    uint32_t      SetProc;
    uint32_t      StoredProc;
    uint32_t      Index;
    uint32_t      Default;
    uint16_t      NameIndex;
    PASCAL_STRING Name;
};

struct TPropDataEx
{
    uint16_t    PropCount;
    TPropInfoEx PropList[PropCount];
};

struct TPropInfoEx
{
    uint8_t   Flags;
    uint32_t  Info;        // pointer to pointer to RTTI record
    TAttrData AttrData;
};

4.2.9 0x08 - tkMethod


This type describes a method and stores data such as whether the method is a procedure or a function, how
many arguments it has, what calling convention it uses, and so forth. In Ancient-era executables there are only
two method kinds (mkProcedure and mkFunction).

enum TMethodKind
{
    mkProcedure,
    mkFunction,
    mkConstructor,
    mkDestructor,
    mkClassProcedure,
    mkClassFunction,
    mkClassConstructor,
    mkClassDestructor,
    mkOperatorOverload,
    mkSafeProcedure,
    mkSafeFunction
};

enum TCallConv
{
    ccReg,
    ccCdecl,
    ccPascal,
    ccStdCall,
    ccSafeCall
};

struct ParamList
{
    TParamFlags   Flags;        // byte-sized field
    DELPHI_STRING ParamName;
    DELPHI_STRING TypeName;
};

struct TProcedureParam
{
    uint8_t       Flags;
    uint32_t      ParamType;
    PASCAL_STRING Name;
    TAttrData     ParamAttrData;
};

struct TProcedureSignature
{
    uint8_t         Flags;
    TCallConv       CC;
    uint32_t        ResultType;    // PPTypeInfo
    uint8_t         ParamCount;
    TProcedureParam Params[ParamCount];
};

struct tkMethod
{
    TMethodKind   MethodKind;
    uint8_t       ParamCount;
    ParamList     Params[ParamCount];
    PASCAL_STRING ResultType;                   // only present if MethodKind = mkFunction
    uint32_t      PPTypeInfo;                   // * only present if MethodKind = mkFunction
    TCallConv     CC;                           // *
    uint32_t      ParamTypeRefs[ParamCount];    // *
    uint32_t      MethSig;                      // * pointer to a TProcedureSignature
    TAttrData     MethAttrData;
};

4.2.10 0x09 - tkWChar


Another Ordinal type, with the same fields as the others.
struct tkWChar
{
    TOrdType  OrdType;
    int32_t   MinValue;
    int32_t   MaxValue;
    TAttrData AttrData;    // *
};

4.2.11 0x0A - tkLString


Ancient and Old-era executables contain no data in this record other than the identifier byte and type name.
New-era executables contain a 16-bit CodePage field and a TAttrData structure.
struct tkLString
{
    uint16_t  CodePage;
    TAttrData LStrAttrData;
};

4.2.12 0x0B - tkWString


Ancient and Old-era executables contain no data in this record other than the identifier byte and type name.
New-era executables contain TAttrData.
struct tkWString
{
TAttrData LStrAttrData ;
};

4.2.13 0x0C - tkVariant


Ancient and Old-era executables contain no data in this record other than the identifier byte and type name.
New-era executables contain TAttrData.
struct tkVariant
{
    TAttrData LStrAttrData;
};

4.2.14 0x0D - tkArray

struct TArrayTypeData
{
    uint32_t Size;
    uint32_t ElCount;
    uint32_t ElType;      // pointer to an RTTI record
    uint8_t  DimCount;
    uint32_t Dims[DimCount];
};

struct tkArray
{
    TArrayTypeData ArrayData;
    TAttrData      ArrayAttrData;
};

4.2.15 0x0E - tkRecord

struct TManagedField
{
    uint32_t TypeRef;        // pointer to an RTTI record
    uint32_t FieldOffset;
};

struct TRecordTypeField
{
    uint32_t      TypeRef;
    uint32_t      FldOffset;
    uint8_t       Flags;
    PASCAL_STRING FieldName;
    TAttrData     AttrData;
};

struct tkRecord
{
    uint32_t         RecSize;
    uint32_t         ManagedFieldCount;
    TManagedField    ManagedFields[ManagedFieldCount];    // only present if ManagedFieldCount > 0
    uint8_t          NumOps;
    uint32_t         RecOps[NumOps];                      // only present if NumOps > 0
    uint32_t         RecFldCnt;
    TRecordTypeField RecFields[RecFldCnt];
    TAttrData        RecAttrData;
};

4.2.16 0x0F - tkInterface

// IntfFlag bit values
#define ifHasGuid       0x1
#define ifDispInterface 0x2
#define ifDispatch      0x4

struct TGUID
{
    uint32_t D1;
    uint16_t D2;
    uint16_t D3;
    uint8_t  D4[8];
};

struct tkInterface
{
    uint32_t         IntfParent;
    uint8_t          IntfFlags;       // bit field
    TGUID            Guid;
    PASCAL_STRING    IntfUnit;
    TIntfMethodTable IntfMethods;
    TAttrData        IntfAttrData;
};

4.2.17 0x10 - tkInt64

struct tkInt64
{
    int64_t   MinInt64Value;
    int64_t   MaxInt64Value;
    TAttrData Int64AttrData;
};

4.2.18 0x11 - tkDynArray

struct tkDynArray
{
    uint32_t      elSize;
    uint32_t      elType;          // pointer to RTTI
    uint32_t      varType;
    uint32_t      elType2;
    PASCAL_STRING DynUnitName;
    uint32_t      DynArrElType;
    TAttrData     DynArrAttrData;
};

4.2.19 0x12 - tkUString

struct tkUString
{
    TAttrData AttrData;    // *
};

4.2.20 0x13 - tkClassRef

struct tkClassRef
{
    uint32_t  PPInstanceType;
    TAttrData ClassRefAttrData;
};

4.2.21 0x14 - tkPointer

struct tkPointer
{
    uint32_t  RefType;
    TAttrData PtrAttrData;
};

4.2.22 0x15 - tkProcedure

struct tkProcedure
{
    TProcedureSignature ProcSig;
    TAttrData           ProcAttrData;
};

struct TProcedureSignature
{
    uint8_t         Flags;
    TCallConv       CC;
    uint32_t        PPTypeInfo;
    uint8_t         ParamCount;
    TProcedureParam Params[ParamCount];
};

TCallConv = enum(ccReg, ccCdecl, ccPascal, ccStdCall, ccSafeCall);

struct TProcedureParam
{
    uint8_t       Flags;
    uint32_t      PPTypeInfo;
    PASCAL_STRING Name;
    TAttrData     Attr;
};

4.3 Jump Tables


Windows API functions in Delphi executables are generally called via jump tables rather than via direct far
calls to the import section. A function which wishes to call an API contains a near (0xE8) call to the jump
table entry, which in turn contains an absolute indirect (0xFF 0x25) jump through the relevant Import Table
entry. For example, suppose a function needs to call FreeLibrary(), whose import table entry is at 0x40C0D4
(in the .idata section). The call instruction in the function is at 0x401408, and looks like this:

CODE:00401408 E8 13 FC FF FF        call 0x401020

0x401020 is a jump table entry, and looks like this:

CODE:00401020 FF 25 D4 C0 40 00     jmp ds:0x40C0D4

There is usually such a jump table located at the beginning of the code section immediately following the
RTTI records (if these exist). Locating this jump table is a useful way to determine the maximum size of the
RTTI record area, since once the jump table has been located, the RTTI records have finished, and it may not
always be possible to parse all the RTTI records in a linear fashion to determine their exact size.
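The two instruction encodings in the example above can be decoded with a few lines of C. The sketch below assumes 32-bit x86 code and uses hypothetical helper names (rd32, near_call_target, jump_table_slot). A near call stores a 32-bit displacement relative to the address of the next instruction, so the target of the example call is 0x401408 + 5 + 0xFFFFFC13 = 0x401020 modulo 2^32.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* If code[0..] is a near call (E8 rel32) located at virtual address va,
   store its destination in *target and return 1; otherwise return 0. */
static int near_call_target(const uint8_t *code, size_t len, uint32_t va,
                            uint32_t *target)
{
    if (len < 5 || code[0] != 0xE8) return 0;
    *target = va + 5 + rd32(code + 1);    /* unsigned wrap-around handles negative displacements */
    return 1;
}

/* If code[0..] is an indirect jump (FF 25 addr32), store the address of the
   import table slot it jumps through in *slot and return 1. */
static int jump_table_slot(const uint8_t *code, size_t len, uint32_t *slot)
{
    if (len < 6 || code[0] != 0xFF || code[1] != 0x25) return 0;
    *slot = rd32(code + 2);
    return 1;
}
```

Chaining the two functions maps a call site to an import table slot, which can then be resolved to an API name through the import directory.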

4.4 Virtual Method Tables (VMTs) and associated structures


Every Delphi object class has a Virtual Method Table to store the addresses of its method functions.
Additionally, a set of ‘prefix fields’ store pointers to the class name and method and field (subobject) data. The
most interesting VMTs in a Delphi executable are those associated with forms: each form object has a VMT and the
VMT must be used to map method names from the form’s resource data (see Section 5.3) onto the addresses
of the corresponding functions.
Every Delphi object is derived from the base class ‘TObject’, and it is possible to locate the VMT for the
TObject class by following the vmtParent pointer in any VMT until you get to a VMT whose vmtParent field
is zero; this is always the VMT for TObject (see footnote 8).
The prefix fields are referred to in Delphi documentation via negative offsets, and in Ancient-era executables
this is the only way to access them, as the vmtSelf pointer is not present. In the following listing, the negative
offsets are shown in comments.
struct VirtualMethodTablePrefix
{
    uint32_t vmtSelf;                  // -0x4C (-76)
    uint32_t vmtIntfTable;             // -0x48 (-72)
    uint32_t vmtAutoTable;             // -0x44 (-68)
    uint32_t vmtInitTable;             // -0x40 (-64)
    uint32_t vmtTypeInfo;              // -0x3C (-60)
    uint32_t vmtFieldTable;            // -0x38 (-56)
    uint32_t vmtMethodTable;           // -0x34 (-52)
    uint32_t vmtDynamicTable;          // -0x30 (-48)
    uint32_t vmtClassName;             // -0x2C (-44)
    uint32_t vmtInstanceSize;          // -0x28 (-40)
    uint32_t vmtParent;                // -0x24 (-36)
    uint32_t vmtSafeCallException;     // -0x20 (-32)
    uint32_t vmtAfterConstruction;     // -0x1C (-28)
    uint32_t vmtBeforeConstruction;    // -0x18 (-24)
    uint32_t vmtDispatch;              // -0x14 (-20)
    uint32_t vmtDefaultHandler;        // -0x10 (-16)
    uint32_t vmtNewInstance;           // -0x0C (-12)
    uint32_t vmtFreeInstance;          // -0x08 (-8)
    uint32_t vmtDestroy;               // -0x04 (-4)
};

The Ancient-style VMT does not have the vmtSelf or vmtIntfTable entries, and is shown in the next listing.
struct OldVirtualMethodTablePrefix
{
    uint32_t vmtAutoTable;             // -0x34 (-52)
    uint32_t vmtInitTable;             // -0x30 (-48)
    uint32_t vmtTypeInfo;              // -0x2C (-44)
    uint32_t vmtFieldTable;            // -0x28 (-40)
    uint32_t vmtMethodTable;           // -0x24 (-36)
    uint32_t vmtDynamicTable;          // -0x20 (-32)
    uint32_t vmtClassName;             // -0x1C (-28)
    uint32_t vmtInstanceSize;          // -0x18 (-24)
    uint32_t vmtParent;                // -0x14 (-20)
    uint32_t vmtSafeCallException;     // -0x10 (-16)
    uint32_t vmtAfterConstruction;     // -0x0C (-12)
    uint32_t vmtBeforeConstruction;    // -0x08 (-8)
    uint32_t vmtDispatch;              // -0x04 (-4)
};

8 For a brief explanation of the VMT fields, see http://pages.cs.wisc.edu/~rkennedy/vmt.

4.4.1 Interface Table


This table is not present in Ancient-era executables, although the Automation Table performs a similar function.

struct TInterfaceEntry
{
    TGUID    Guid;
    uint32_t VTable;
    uint32_t IOffset;
    uint32_t ImplGetter;
};

struct TInterfaceTable
{
    uint32_t        EntryCount;
    TInterfaceEntry Entries[EntryCount];
    uint32_t        Intfs[EntryCount];
};

4.4.2 Automation Table


In all Delphi executables other than Ancient-era, this field is always empty, since the Automation Table was
only used in Delphi 2. The Interface Table has taken over some of the Automation Table’s functionality.

4.4.3 Init Table


This contains information about fields which need to be explicitly deallocated or ‘cleaned up’ when the object
is destroyed.

struct TFieldInfo
{
    uint32_t TypeInfo;
    uint32_t Offset;
};

struct TFieldTable    // do not confuse with TVmtFieldTable!
{
    uint16_t   X;
    uint32_t   Size;
    uint32_t   Count;
    TFieldInfo Fields[];
};

4.4.4 Typeinfo Table


This pointer points to an RTTI record for the current class. This may be parsed in the same way as any other
RTTI record of type tkClass.

4.4.5 Field Table


The Field Table contains information about the subobjects or fields belonging to a class. Extended field entries
are found only in New-era executables.

struct TVmtFieldExEntry
{
    uint8_t       Flags;
    uint32_t      TypeRef;     // pointer to an RTTI record
    uint32_t      Offset;
    PASCAL_STRING Name;
    TAttrData     AttrData;
};

struct TVmtFieldEntry
{
    uint32_t      FieldOffset;
    uint16_t      TypeIndex;   // index into TVmtFieldClassTab
    PASCAL_STRING Name;
};

struct TVmtFieldClassTab
{
    uint16_t Count;
    uint32_t ClassRef[Count];
};

struct TVmtFieldTable
{
    uint16_t         Count;
    uint32_t         ClassTab;           // pointer to a TVmtFieldClassTab structure
    TVmtFieldEntry   Entry[Count];
    uint16_t         ExCount;            // *
    TVmtFieldExEntry ExEntry[ExCount];   // *
};

4.4.6 Method Table


This is probably the most important table locatable from the VMT as it enables the form method names found
in the form resources to be mapped to actual function addresses. Extended method entries are found only in
New-era executables.

struct TVmtMethodParam
{
    uint8_t       Flags;
    uint32_t      ParamType;   // pointer to an RTTI record
    uint8_t       ParOff;      // specifies whether parameter is passed via a register
                               // or on the stack
    PASCAL_STRING Name;
    TAttrData     AttrData;
};

struct TVmtMethodEntryTail
{
    uint8_t         Version;     // usually '3'
    TCallConv       CC;
    uint32_t        ResultType;  // pointer to an RTTI record
    uint16_t        ParOff;      // amount of stack space needed for parameters
    uint8_t         ParamCount;
    TVmtMethodParam Params[ParamCount];
};

struct TVmtMethodEntry
{
    uint16_t            Len;          // specifies entire length of entry in bytes
    uint32_t            CodeAddress;  // address of the actual function
    PASCAL_STRING       MethodName;
    TVmtMethodEntryTail Tail;         // present only if Len > 6 + sizeof(MethodName)
};

struct TVmtMethodExEntry
{
    uint32_t Entry;          // pointer to a TVmtMethodEntry
    uint16_t Flags;
    int16_t  VirtualIndex;
};

struct TVmtMethodTable
{
    uint16_t          Count;
    TVmtMethodEntry   Entry[Count];
    uint16_t          ExCount;            // *
    TVmtMethodExEntry ExEntry[ExCount];   // *
};
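Mapping a method name to its code address only needs the first part of each TVmtMethodEntry, because the Len field allows any optional tail to be skipped. The following is a minimal sketch under that assumption; find_method_address and the rd16/rd32 helpers are hypothetical names, and the table is assumed to be available as a flat byte buffer starting at the Count field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the basic entries of a method table and return the CodeAddress of the
   entry whose MethodName equals `name`, or 0 if it is not found or the table
   looks malformed. */
static uint32_t find_method_address(const uint8_t *tbl, size_t len,
                                    const char *name)
{
    if (len < 2) return 0;
    uint16_t count = rd16(tbl);
    size_t want = strlen(name);
    size_t pos = 2;
    for (uint16_t i = 0; i < count; i++) {
        if (pos + 7 > len) return 0;             /* Len + CodeAddress + name length byte */
        uint16_t entry_len = rd16(tbl + pos);
        uint8_t  name_len  = tbl[pos + 6];
        if (entry_len < 7u + name_len || pos + entry_len > len) return 0;
        if (name_len == want && memcmp(tbl + pos + 7, name, want) == 0)
            return rd32(tbl + pos + 2);
        pos += entry_len;                        /* also skips any TVmtMethodEntryTail */
    }
    return 0;
}
```

A form parser would call this with the handler names found in the form resource, such as the value of an OnCreate property.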

4.4.7 Dynamic Table


This is a table listing the addresses of any dynamic functions owned by the object. They are accessed via
a 16-bit ‘selector’ value. Because the structure consists of one variable-length array followed by another, the
process of looking up an address is a little involved (see FindDynaMethod() in System.pas for details of how
Delphi does this internally).
struct TDynaMethodTable
{
    uint16_t Count;
    uint16_t Selectors[Count];
    uint32_t Addrs[Count];
};

A selector is looked up by locating its value in the Selectors array. The required address is at the corre-
sponding index in the Addrs array.
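The lookup described above can be sketched for a single table as follows. find_dynamic_method is a hypothetical name, and the sketch deliberately covers one table only; it does not attempt to reproduce everything FindDynaMethod() does in System.pas.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the address stored for `selector` in one TDynaMethodTable,
   or 0 if the selector is absent or the table does not fit in the buffer. */
static uint32_t find_dynamic_method(const uint8_t *tbl, size_t len,
                                    uint16_t selector)
{
    if (len < 2) return 0;
    size_t count = rd16(tbl);
    if (2 + count * 6 > len) return 0;             /* 2 bytes per selector + 4 per address */
    const uint8_t *selectors = tbl + 2;
    const uint8_t *addrs = selectors + count * 2;  /* Addrs follows the whole Selectors array */
    for (size_t i = 0; i < count; i++)
        if (rd16(selectors + i * 2) == selector)
            return rd32(addrs + i * 4);
    return 0;
}
```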

4.5 The Package Info Table


The package info table (not to be confused with the PACKAGEINFO resource) is used by the InitExe library
function in Old and New executables, as has previously been shown. The structure of the package info table
is slightly different (New executables introduce the TPackageTypeInfo subtable), but in practice parsing code
can easily account for both by checking for the presence of the extra fields (indicated by asterisks in Table 9):

struct UnitEntryTable
{
    PackageUnitEntry entries[NumberOfUnits];
};

struct PackageUnitEntry
{
    uint32_t Init;     // pointer to unit's initialisation function
    uint32_t FInit;    // pointer to unit's finalisation function
};

struct TPackageTypeInfo
{
    uint32_t TypeCount;
    uint32_t PTypeTable;    // pointer to type table (array of pointers)
    uint32_t UnitCount;
    uint32_t UnitNames;     // pointer to a concatenation of PASCAL_STRING objects
};

struct PackageInfoTable
{
    uint32_t         UnitCount;
    uint32_t         UnitInfo;    // pointer to a UnitEntryTable
    TPackageTypeInfo TypeInfo;    // *
};

Table 9: The Package Info Table

The type table is an array of pointers to RTTI type records. The reason for this table has not been
ascertained (the Embarcadero documentation just says “for internal use”) but it is likely that the table contains
a pointer to every RTTI record in the executable. If so, this would be extremely useful as it would remove the
need for linear parsing of the RTTI structures at the beginning of the main code section, but only in New-era
executables as this structure does not exist in previous versions.

4.6 Strings
Delphi has a tendency to place small amounts of data in the code section adjacent to the function that uses the
data. This includes string data. Because these strings are usually Pascal strings, they may not be detected by
standard string-search techniques, or if they are detected the boundaries may not be detected properly. Such a
string is shown in Table 10 (note the function prologue sequence occurring immediately afterwards):

|001a3a40 22 0d 0a 00 ff ff ff ff-4a 00 00 00 54 68 65 20 |"?? ????J The |
|001a3a50 66 69 6c 65 20 63 61 6e-20 6e 6f 74 20 62 65 20 |file can not be |
|001a3a60 65 78 65 63 75 74 65 64-20 6f 72 20 68 61 73 20 |executed or has |
|001a3a70 65 78 69 74 65 64 20 69-6d 6d 65 64 69 61 74 65 |exited immediate|
|001a3a80 6c 79 20 61 66 74 65 72-20 62 65 69 6e 67 20 73 |ly after being s|
|001a3a90 74 61 72 74 65 64 00 00-55 8b ec 83 c4 f4 89 55 |tarted U??????U|
|001a3aa0 f8 89 45 fc a1 cc db 5a-00 80 38 00 74 1c 8b 55 |??E????Z ?8 t??U|

Table 10: String embedded in a Delphi code section

An alternative technique for brute-force searching for Delphi strings is possible because such strings, where
they occur as standalone data items not part of another structure, are usually preceded by the integer value
0xFFFFFFFF. Therefore it is quite simple to search for all the occurrences of 0xFFFFFFFF on an aligned
boundary in the code section and then determine whether the data following them looks as if it might be a
Pascal string in any of the three formats that Delphi uses (short length prefix, long length prefix, or Unicode).
Such a scan not only ensures that all strings are located, but also allows the space occupied by them to be
marked as data and skipped during code analysis. Examples of strings extracted via this method from a Delphi
code section are shown in Table 11.

0x005A1E14|0x001A1214(SHORTSTR 0036):"Dont<20>Know<20>How<20>to<20>Handle<20>Data<20>Type<20>0x"
0x005A2080|0x001A1480(SHORTSTR 0012):"Parser<20>Error"
0x005A23D8|0x001A17D8(SHORTSTR 0012):"H<85><C0><7C>f@<89>E<E8><C7>E<EC>"
0x005A2C60|0x001A2060(SHORTSTR 0012):"clientheight"
0x005A2C88|0x001A2088(SHORTSTR 0011):"clientwidth"
0x005A2F98|0x001A2398(SHORTSTR 0019):"can<20>not<20>read<20>memory"
0x005A4040|0x001A3440(SHORTSTR 0012):"<25>s<20><28>ver<2E><20><25>s<29>"
0x005A464C|0x001A3A4C(SHORTSTR 0074):"The<20>file<20>can<20>not<20>be<20>executed<20>or<20>has<20>exited<20>immediately (...)

Table 11: Brute Force String Extraction from a Delphi Code Section
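The 0xFFFFFFFF scan can be sketched as below. To keep the example short it handles only the layout visible in Table 10 (marker, 32-bit length, then single-byte characters) and accepts a candidate only if every character is printable; both restrictions are assumptions of this sketch, and scan_marked_strings is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan `code` on 4-byte boundaries for 0xFFFFFFFF followed by a 32-bit
   length and that many printable bytes. Offsets of the string data of the
   first `max` candidates are stored in offsets[]; the return value is the
   total number of candidates found. */
static size_t scan_marked_strings(const uint8_t *code, size_t len,
                                  size_t *offsets, size_t max)
{
    size_t found = 0;
    for (size_t off = 0; off + 8 <= len; off += 4) {
        if (rd32(code + off) != 0xFFFFFFFFu) continue;
        uint32_t n = rd32(code + off + 4);
        if (n == 0 || n > len - off - 8) continue;   /* length must fit in the buffer */
        size_t i = 0;
        while (i < n && isprint(code[off + 8 + i])) i++;
        if (i != n) continue;                        /* non-printable data: probably not a string */
        if (found < max) offsets[found] = off + 8;
        found++;
    }
    return found;
}
```

Each hit gives a data range (marker, length and characters) that can be marked as data and skipped during code analysis.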

5 Resource Section Data Structures


5.1 DVCLAL
This stands for “Delphi Visual Component Access License” and is a 16-byte string. It is not found in the earliest
versions of Delphi. Its purpose is to provide a measure of DRM for third-party precompiled library units, some
of which are only licensed to be used with certain versions of Delphi. The Personal, Professional and Enterprise
versions of Delphi each embed a different DVCLAL value in executables they create. Library code can check this
and refuse to run if, say, a unit or component only authorised to be used with Delphi Enterprise is included in an
executable created by Delphi Personal. Unless this functionality has been used, executables make no reference
to the DVCLAL at runtime, and it may be deleted from the executable without impairing its functionality in
any way.
The number of possible DVCLAL strings appears to be limited to three, relating to the Enterprise,
Professional and Personal ‘editions’ of Delphi (see Table 12). It is not known how they are calculated: they may
simply be arbitrary digit strings or perhaps MD5 hashes. Experiments have shown that when the Delphi
compiler creates a new executable, it simply embeds the DVCLAL value from its own resource section into the
resource section of the new executable: no validation is performed by the compiler.

Enterprise 26 3D 4F 38 C2 82 37 B8 F3 24 42 03 17 9B 3A 83
Professional A2 8C DF 98 7B 3C 3A 79 26 71 3F 09 0F 2A 25 17
Personal 23 78 5D 23 B6 A5 F3 19 43 F3 40 02 26 D1 11 C7

Table 12: DVCLAL values
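Since only the three values in Table 12 have been observed, identifying the edition is a matter of a 16-byte comparison. The sketch below uses hypothetical names (dvclal_edition and the three constant arrays) and reports anything else as unknown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three DVCLAL values from Table 12. */
static const uint8_t DVCLAL_ENTERPRISE[16] = {
    0x26, 0x3D, 0x4F, 0x38, 0xC2, 0x82, 0x37, 0xB8,
    0xF3, 0x24, 0x42, 0x03, 0x17, 0x9B, 0x3A, 0x83 };
static const uint8_t DVCLAL_PROFESSIONAL[16] = {
    0xA2, 0x8C, 0xDF, 0x98, 0x7B, 0x3C, 0x3A, 0x79,
    0x26, 0x71, 0x3F, 0x09, 0x0F, 0x2A, 0x25, 0x17 };
static const uint8_t DVCLAL_PERSONAL[16] = {
    0x23, 0x78, 0x5D, 0x23, 0xB6, 0xA5, 0xF3, 0x19,
    0x43, 0xF3, 0x40, 0x02, 0x26, 0xD1, 0x11, 0xC7 };

/* Map the raw bytes of a DVCLAL resource to an edition name. */
static const char *dvclal_edition(const uint8_t *res, size_t len)
{
    if (len != 16) return "Unknown";
    if (memcmp(res, DVCLAL_ENTERPRISE, 16) == 0)   return "Enterprise";
    if (memcmp(res, DVCLAL_PROFESSIONAL, 16) == 0) return "Professional";
    if (memcmp(res, DVCLAL_PERSONAL, 16) == 0)     return "Personal";
    return "Unknown";
}
```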

5.2 PACKAGEINFO
The PACKAGEINFO resource is a simple data structure containing some information about the executable (its
type and which tool was used to create it) as well as a list of the names of all the units in the executable (both
the library units and the user code units). Unusually for Delphi, this data structure uses C-style zero-terminated
strings rather than Pascal-style strings with a length prefix.

5.2.1 Data Structures


The structures which make up the PACKAGEINFO resource are defined in SysUtils.pas and are shown in Table
13.

struct TPkgName
{
    uint8_t  HashCode;
    C_STRING Name;
};

struct TUnitName
{
    uint8_t  Flags;
    uint8_t  HashCode;
    C_STRING Name;
};

struct TPackageInfoHeader
{
    uint32_t  Flags;
    uint32_t  RequiresCount;
    TPkgName  Requires[RequiresCount];
    uint32_t  ContainsCount;
    TUnitName Contains[ContainsCount];
};

Table 13: PACKAGEINFO structures

5.2.2 Flags
The main package flags are a 32-bit value interpreted as follows:

Bit Meaning
0 Build flag (Never build if 1, always build if 0)
1 Design-time only (Yes if 1, No if 0, must be opposite value to bit 2)
2 Run-time only (Yes if 1, No if 0, must be opposite value to bit 1)
3 Check for duplicates (Do not check if 1, Do check if 0)
4 - 25 Reserved
26 - 27 Producer: 0 = pre-V4 VCL, 1 = undefined, 2 = C++, 3 = Pascal (Delphi)
28 - 29 Reserved
30 - 31 Package Type: 0 = Executable, 1 = Package DLL, 2 = Library DLL, 3 = undefined

Table 14: Package Flags
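The bit layout in Table 14 translates directly into shifts and masks. The sketch below uses hypothetical names (PackageFlags, decode_package_flags); the field meanings are taken from the table.

```c
#include <stdint.h>

typedef struct {
    int      never_build;     /* bit 0: 1 = never build, 0 = always build */
    int      design_only;     /* bit 1 */
    int      runtime_only;    /* bit 2 */
    int      skip_dup_check;  /* bit 3: 1 = do not check for duplicates */
    unsigned producer;        /* bits 26-27: 0 = pre-V4 VCL, 2 = C++, 3 = Pascal (Delphi) */
    unsigned package_type;    /* bits 30-31: 0 = Executable, 1 = Package DLL, 2 = Library DLL */
} PackageFlags;

/* Split the 32-bit package Flags value into its documented fields. */
static PackageFlags decode_package_flags(uint32_t flags)
{
    PackageFlags f;
    f.never_build    = (int)((flags >> 0) & 1u);
    f.design_only    = (int)((flags >> 1) & 1u);
    f.runtime_only   = (int)((flags >> 2) & 1u);
    f.skip_dup_check = (int)((flags >> 3) & 1u);
    f.producer       = (unsigned)((flags >> 26) & 3u);
    f.package_type   = (unsigned)((flags >> 30) & 3u);
    return f;
}
```

For the value 0xCC000001 this yields a producer of 3 (Delphi), a package type of 3 (undefined) and the never-build bit set.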

Each entry in the TUnitName array also contains an 8-bit Flags value. Only the lower 5 bits are used, with
the upper 3 bits being reserved. A “weak package” has a special definition (see footnote 9).

Bit Meaning
0 Set if this unit is the main unit
1 Set if this unit is a package unit
2 Set if this unit is a ‘weak package’ unit
3 Set if this unit is the original container for the weak packaged unit
4 Set if this unit was implicitly imported
5-7 Reserved

Table 15: Unit Flags

An example of raw PACKAGEINFO data may be seen in the hex dump in Table 16. The PACKAGEINFO
structure begins at offset 0x1D719C. The parsed data equivalent is shown in Table 17.

|001d7190 c2 82 37 b8 f3 24 42 03-17 9b 3a 83 01 00 00 cc |??7??$B???:?? ?|
|001d71a0 00 00 00 00 d2 00 00 00-01 aa 44 65 44 65 00 10 | ? ??DeDe ?|
|001d71b0 bb 52 78 4d 65 6d 44 53-00 10 43 56 61 72 69 61 |?RxMemDS ?CVaria|
|001d71c0 6e 74 73 00 10 9d 53 79-73 43 6f 6e 73 74 00 00 |nts ??SysConst |
|001d71d0 c7 53 79 73 74 65 6d 00-00 81 53 79 73 49 6e 69 |?System ?SysIni|
|001d71e0 74 00 00 02 53 79 73 55-74 69 6c 73 00 0c 4b 57 |t ?SysUtils ?KW|
|001d71f0 69 6e 64 6f 77 73 00 10-55 54 79 70 65 73 00 10 |indows ?UTypes ?|
|001d7200 24 56 61 72 55 74 69 6c-73 00 10 46 43 6f 6d 4f |$VarUtils ?FComO|
|001d7210 62 6a 00 10 71 43 6f 6d-43 6f 6e 73 74 00 10 73 |bj ?qComConst ?s|
|001d7220 41 63 74 69 76 65 58 00-1c 33 4d 65 73 73 61 67 |ActiveX ?3Messag|
|001d7230 65 73 00 10 77 44 42 43-6f 6e 73 74 73 00 00 b3 |es ?wDBConsts ?|

Table 16: Unparsed PACKAGEINFO resource data: excerpt
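Raw data of this kind can be walked with a short routine that follows the TPackageInfoHeader layout. This is a sketch only: UnitEntry and parse_package_info are hypothetical names, the resource is assumed to be little-endian, and the routine stops quietly at the first truncated entry so that it also works on excerpts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t     flags;   /* unit flags byte */
    uint8_t     hash;    /* HashCode */
    const char *name;    /* points at the zero-terminated name inside the buffer */
} UnitEntry;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a PACKAGEINFO resource. Stores the package flags and ContainsCount,
   fills units[] with up to max_units entries and returns how many complete
   entries were read. */
static size_t parse_package_info(const uint8_t *res, size_t len,
                                 uint32_t *flags, uint32_t *contains_count,
                                 UnitEntry *units, size_t max_units)
{
    size_t pos = 8, n = 0;
    *flags = 0;
    *contains_count = 0;
    if (len < 8) return 0;
    *flags = rd32(res);
    uint32_t requires_count = rd32(res + 4);
    for (uint32_t i = 0; i < requires_count; i++) {   /* skip TPkgName entries */
        if (pos + 1 >= len) return 0;
        const uint8_t *end = memchr(res + pos + 1, 0, len - pos - 1);
        if (end == NULL) return 0;
        pos = (size_t)(end - res) + 1;
    }
    if (pos + 4 > len) return 0;
    *contains_count = rd32(res + pos);
    pos += 4;
    for (uint32_t i = 0; i < *contains_count && n < max_units; i++) {
        if (pos + 2 >= len) break;
        const uint8_t *end = memchr(res + pos + 2, 0, len - pos - 2);
        if (end == NULL) break;                       /* name runs past the buffer */
        units[n].flags = res[pos];
        units[n].hash  = res[pos + 1];
        units[n].name  = (const char *)(res + pos + 2);
        n++;
        pos = (size_t)(end - res) + 1;
    }
    return n;
}
```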

5.3 Forms
In Delphi, windows and dialog boxes are referred to as forms. Information describing the forms used by a
Delphi program is stored in the executable’s resource section as RCDATA. These are not stored in the usual
Windows resource format, but in a proprietary format specific to Delphi and C++ Builder (Delphi executables,
particularly New executables, may often contain standard Windows dialog box resources as well, but these are
not used for user-generated forms). Forms are designed by the programmer using the GUI, and have many
properties which may be changed or left at default values (the form properties are displayed in the window on
the left in Figure 1).
Delphi form resources start with the magic bytes “TPF0”, which are followed by two Pascal strings giving
the class name and the form name. These are followed immediately by the form data. Sub-objects of a given
form are defined in the same way. It appears that objects can have any number of nested sub-objects, which
complicates processing. Normally each form has its own TPF0 resource, which describes all the objects and
sub-objects on that form.
The form data as set at design time is stored as a set of key-value pairs. The key name is a Pascal string
and is followed by a byte giving the data type, then the value data itself. The length and format of the value
data depends on the data type. Data types are defined in the Delphi runtime source file “Classes.pas” and are
given in Table 18. The most interesting data stored in the form resource is the name of the method (function)
associated with the form’s actions or with a subobject, such as a button. Such records are strings, and their
key names have the prefix ‘On’, examples being ‘OnCreate’, ‘OnClick’, ‘OnDoubleClick’ and ‘OnClose’. The
value associated with such a key is the name given to the method by the user. Delphi assigns default names
to many methods (for example, if you have a button called ‘Button1’ on a form, the OnClick method for that
button will by default be called ‘Button1Click’), but the user may change them. In order to locate the actual
address of the method’s function, it is necessary to parse the form’s Virtual Method Table using the techniques
described previously.
9 http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/devcommon/compdirsweakpackaging_xml.html
Package Type: Undefined
Producer: Delphi
PackageFlags.Build: Never
PackageFlags.Design-time only: No
PackageFlags.Run-time only: No
PackageFlags.Check duplicate units: Yes
RequiresCount: 0
ContainsCount: 210

#     Hash  UnitName  Flags
----------------------------------------------------------------------------------------
0001: 0xAA  DeDe      Main Unit
0002: 0xBB  RxMemDS   ImplicitlyImported
0003: 0x43  Variants  ImplicitlyImported
0004: 0x9D  SysConst  ImplicitlyImported
0005: 0xC7  System
0006: 0x81  SysInit
0007: 0x02  SysUtils
0008: 0x4B  Windows   WeakPackage OriginalContainer
0009: 0x55  Types     ImplicitlyImported
0010: 0x24  VarUtils  ImplicitlyImported
0011: 0x46  ComObj    ImplicitlyImported
0012: 0x71  ComConst  ImplicitlyImported
0013: 0x73  ActiveX   ImplicitlyImported
0014: 0x33  Messages  WeakPackage OriginalContainer ImplicitlyImported
0015: 0x77  DBConsts  ImplicitlyImported
Table 17: Parsed PACKAGEINFO resource data: excerpt

Figure 1: Delphi Form in Design View showing Form Properties

Type Name Description
0x00 vaNull ‘Null’ type, no extra data needed
0x01 vaList Zero-terminated list of values
0x02 vaInt8 Signed 8-bit integer (Delphi ‘shortint’)
0x03 vaInt16 16-bit signed little-endian integer
0x04 vaInt32 32-bit signed little-endian integer
0x05 vaExtended Borland proprietary 10-byte floating point
0x06 vaString Pascal string (byte prefix gives length)
0x07 vaIdent Alias for vaString
0x08 vaFalse Boolean false, no extra data needed
0x09 vaTrue Boolean true, no extra data needed
0x0A vaBinary Arbitrary binary data, 32-bit length prefix
0x0B vaSet List of strings
0x0C vaLString Long Pascal string (32-bit length prefix)
0x0D vaNil Delphi ‘Nil’, no extra data needed
0x0E vaCollection Nested set of values of unrelated type
0x0F vaSingle Single-precision floating point
0x10 vaCurrency Borland proprietary currency type
0x11 vaDate Borland proprietary date type
0x12 vaWString Unicode string
0x13 vaInt64 Signed 64-bit little-endian integer
0x14 vaUTF8String UTF8 string (treated as equivalent to vaLString)
0x15 vaDouble Double-precision floating point

Table 18: Form Resource Data Types
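To illustrate the key-value layout, the sketch below looks up a string-valued property (for example an ‘On...’ handler name) at the top level of a TPF0 resource. It makes several simplifying assumptions that are labelled in the code: no flag fields are present, only the fixed-size scalar types and vaString/vaIdent from Table 18 are handled, and nested sub-objects are not visited. form_find_string_prop is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find the top-level property `key` in a TPF0 resource and copy its value to
   `out` if it is a vaString (0x06) or vaIdent (0x07). Returns 1 on success,
   0 if the key is absent or an unhandled value type is met first.
   Assumption: the object header carries no flag fields. */
static int form_find_string_prop(const uint8_t *res, size_t len,
                                 const char *key, char *out, size_t outsize)
{
    size_t pos = 4, keylen = strlen(key);
    if (len < 4 || memcmp(res, "TPF0", 4) != 0) return 0;
    for (int i = 0; i < 2; i++) {              /* skip class name and object name */
        if (pos >= len) return 0;
        pos += 1u + res[pos];
    }
    while (pos < len && res[pos] != 0) {       /* a zero byte ends the property list */
        uint8_t klen = res[pos];
        const uint8_t *k = res + pos + 1;
        pos += 1u + klen;
        if (pos >= len) return 0;
        uint8_t type = res[pos++];
        size_t vlen;
        switch (type) {
        case 0x00: case 0x08: case 0x09: case 0x0D: vlen = 0; break; /* vaNull, vaFalse, vaTrue, vaNil */
        case 0x02: vlen = 1; break;            /* vaInt8 */
        case 0x03: vlen = 2; break;            /* vaInt16 */
        case 0x04: vlen = 4; break;            /* vaInt32 */
        case 0x06: case 0x07:                  /* vaString, vaIdent */
            if (pos >= len) return 0;
            vlen = 1u + res[pos];
            break;
        default: return 0;                     /* type not handled in this sketch */
        }
        if (vlen > len - pos) return 0;
        if ((type == 0x06 || type == 0x07) &&
            klen == keylen && memcmp(k, key, keylen) == 0) {
            size_t n = res[pos];
            if (n + 1 > outsize) return 0;
            memcpy(out, res + pos + 1, n);
            out[n] = '\0';
            return 1;
        }
        pos += vlen;
    }
    return 0;
}
```

The returned handler name is what would then be looked up in the form's Method Table to obtain the function address.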

5.3.1 Flags
Objects and subobjects may contain flag fields after the class and object name. These flags are related to the
way Delphi serialises objects. Most are one byte in length, but they must be accounted for when parsing as some
flags (e.g. ffChildPos) store further data after the flag byte. The following three flags are defined, using bits 0-2
of the flag byte.

Bit  Flag Name    Description                                     Suffix Data
0    ffInherited  Indicates an inherited object                   -
1    ffChildPos   Stores position of child object on the parent   Child position value (vaInt8, vaInt16 or vaInt32)
2    ffInline     ‘used for frames’ (footnote 10)                 -

Table 19: TFilerFlags as found in Delphi form resources

5.3.2 Aggregate Types (vaList and vaCollection)


A list type is a list of objects.
A collection stores a number of objects of the same class, although the class name may not be explicitly
specified. This means every item in a collection will have the same set of attributes. Attributes may have the
full range of data types, including collections, so parsing needs to be recursive. The beginning and end of a
collection item are indicated with marker bytes, with the value 01 used as the beginning marker and 00 as the
end marker. A final 00 marks the end of the collection.
One or more zero bytes signify the end of the current object’s data (this is the most unclear aspect
of parsing resources, since various numbers of zero bytes are encountered and it is not obvious what each
represents). Furthermore, corrupted fragments of form data records are sometimes present after the end of the
form data, for unknown reasons. The end of the TPF0 block is taken as the end of the RCDATA resource.
Example parsed form data is shown in Table 20.

OBJECT: TDialogDelayForm:DialogDelayForm
--->Left: 460
--->Top: 434
--->BorderIcons: (biSystemMenu)
--->BorderStyle: bsDialog
--->Caption: DialogDelayForm
--->ClientHeight: 333
--->ClientWidth: 450
--->Color: clBtnFace
--->Font.Charset: DEFAULT_CHARSET
--->Font.Color: clWindowText
--->Font.Height: 245
--->Font.Name: MS Sans Serif
--->Font.Style: [Empty]
--->OldCreateOrder: False
--->Scaled: False
--->OnCreate: FormCreate
--->PixelsPerInch: 96
--->TextHeight: 13
OBJECT: TImage:bgImage
--->Left: 253
--->Top: 0
--->Width: 450
--->Height: 333
--->Picture.Data: [30124 bytes of binary data]

Table 20: Parsed Form Data

6 Locating User Code


In situations where automated analysis cannot analyse the entire code section owing to time or space limitations,
it is useful to be able to identify and exclude library code from analysis. This can be done using function
signatures, but these have their own problems and Delphi library functions change so much between Delphi
versions that maintaining a complete set is extremely difficult.
An alternative method can be used, which takes advantage of the fact that library code always precedes user
code in the executable, and that the VMTs for Delphi standard objects and base classes (e.g. TObject) must
always be located in library code. Since all objects are derived from TObject, by following the Parent pointers
in the VMT it is possible to locate the complete list of ancestors for any given object. If such an object is a
standard object, any functions it has in its VMT may be assumed to be library functions. Thus, a list of such
functions may be compiled, and the highest address in the list can be considered to be the address of the last
definitively-identifiable library function.
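
The ancestry walk described above can be sketched in C as follows. This is only an illustration under stated assumptions: ParsedVmt is a hypothetical, already-parsed view of a VMT (not the on-disk layout), the list of standard class names is supplied by the caller, MAX_ANCESTRY_DEPTH is an arbitrary guard against corrupt Parent chains, and the ancestors of a standard class are assumed to be library classes as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ANCESTRY_DEPTH 64   /* hypothetical guard against cyclic Parent chains */

/* Hypothetical parsed view of one VMT; field names are illustrative only. */
typedef struct ParsedVmt {
    const char *class_name;          /* taken from vmtClassName */
    const struct ParsedVmt *parent;  /* Parent pointer, NULL for TObject */
    const uint32_t *methods;         /* virtual addresses of the VMT's functions */
    size_t method_count;
} ParsedVmt;

/* Returns 1 if name appears in the caller-supplied list of standard classes. */
static int is_standard_name(const char *name,
                            const char *const *standard, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, standard[i]) == 0)
            return 1;
    return 0;
}

/* Returns the highest function address found in the VMTs of standard
   classes and their ancestors, or 0 if no standard class is present. */
uint32_t last_library_function(const ParsedVmt *vmts, size_t vmt_count,
                               const char *const *standard,
                               size_t standard_count)
{
    uint32_t highest = 0;
    for (size_t i = 0; i < vmt_count; i++) {
        if (!is_standard_name(vmts[i].class_name, standard, standard_count))
            continue;
        /* Walk the class itself, then each ancestor via the Parent pointer. */
        const ParsedVmt *v = &vmts[i];
        for (int depth = 0; v != NULL && depth < MAX_ANCESTRY_DEPTH; depth++) {
            for (size_t m = 0; m < v->method_count; m++)
                if (v->methods[m] > highest)
                    highest = v->methods[m];
            v = v->parent;
        }
    }
    return highest;
}
```

Any function at or below the returned address can then be skipped as probable library code; a return value of 0 means no standard class was recognised and nothing should be excluded.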

7 Code Samples
7.1 Brute Force Search for Virtual Method Table Structures
int ScanCodeSectionForPotentialVMTs(const unsigned int CodeSectionOffset,
                                    const unsigned int CodeSectionSize)
{
    unsigned char *FileBase = (unsigned char *)GetMappedFileBase();
    unsigned int index, test_addr, test_addr_addr;
    PVirtualMethodTablePrefix pvmt;

    // need to go backwards from the end of the code section
    index = CodeSectionSize - sizeof(VirtualMethodTablePrefix);

    // check it's a multiple of 4, and if not subtract the remainder so it becomes one
    if ((index % 4) != 0) index -= (index % 4);

    while (index > 0)
    {
        test_addr = convert_bytes_to_unsigned_32bit(FileBase + CodeSectionOffset + index);
        test_addr_addr = FileOffsetToVirtualAddress(CodeSectionOffset + index);
        if (test_addr_addr == 0)
        {
            printf("ScanCodeForPotentialVMTs: found bad offset\n");
            return 0;
        }

        // first test if potential vmtSelf is valid and in the main code section
        if (IsVirtualAddressValid(test_addr)
            && IsVirtualAddressInMainCodeSection(test_addr)
            && test_addr > (test_addr_addr + 0x28))
        {
            pvmt = (PVirtualMethodTablePrefix)(FileBase + CodeSectionOffset + index);
            // huge 'if' statement
            if (
                ((pvmt->vmtIntfTable == 0) || (IsVirtualAddressValid(pvmt->vmtIntfTable))) &&
                ((pvmt->vmtAutoTable == 0) || (IsVirtualAddressValid(pvmt->vmtAutoTable))) &&
                ((pvmt->vmtInitTable == 0) || (IsVirtualAddressValid(pvmt->vmtInitTable))) &&
                ((pvmt->vmtTypeInfo == 0) || (IsVirtualAddressValid(pvmt->vmtTypeInfo))) &&
                ((pvmt->vmtFieldTable == 0) || (IsVirtualAddressValid(pvmt->vmtFieldTable))) &&
                ((pvmt->vmtMethodTable == 0) || (IsVirtualAddressValid(pvmt->vmtMethodTable))) &&
                ((pvmt->vmtDynamicTable == 0) || (IsVirtualAddressValid(pvmt->vmtDynamicTable))) &&
                ((pvmt->vmtClassName == 0) || (IsVirtualAddressValid(pvmt->vmtClassName))) &&
                (!(IsVirtualAddressValid(pvmt->vmtInstanceSize))) &&
                DoesVirtualAddressPointToPascalString(pvmt->vmtClassName)
               )
            {
                printf("Found potential candidate at offset 0x%08X, VA 0x%08X\n",
                       CodeSectionOffset + index, FileOffsetToVirtualAddress(CodeSectionOffset + index));
                return FileOffsetToVirtualAddress(CodeSectionOffset + index);
            }
        }
        index -= 4;
    }
    printf("No VMT objects found\n");
    return 0;
}

This is part of an experimental C program which has been successfully used to locate VMT structures in
Delphi executables where there are no calls to CreateForm() in the entry point function and therefore no easy
way to find any VMTs which may be present. It is given the offset and size of a chunk of file (the code section,
but not necessarily), and then scans backwards through it for structures which look like a VMT, stopping once
a valid VMT is found. Validity is determined by checking that the VMT’s fields are either zero or valid virtual
addresses, and that the vmtClassName field points to a Pascal string. This is a relatively simple test, but seems
to produce good results.
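
As an illustration of the Pascal string part of this test, the sketch below checks whether a buffer position holds a length-prefixed short string containing a plausible class name. It is not the omitted DoesVirtualAddressPointToPascalString() function: the name looks_like_pascal_identifier is hypothetical, it works on a buffer offset rather than a virtual address, and the restriction to ASCII identifier characters is an assumption that may be too strict for some Delphi versions.

```c
#include <stddef.h>

/* Returns 1 if buf[offset] starts a Pascal short string (one length byte
   followed by that many characters) whose contents form an ASCII identifier.
   The identifier rule is an assumption made for this sketch. */
int looks_like_pascal_identifier(const unsigned char *buf, size_t buf_size,
                                 size_t offset)
{
    if (offset >= buf_size)
        return 0;

    size_t len = buf[offset];
    /* Reject empty strings and strings that would run past the buffer. */
    if (len == 0 || len > buf_size - offset - 1)
        return 0;

    for (size_t i = 1; i <= len; i++) {
        unsigned char c = buf[offset + i];
        int is_alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
        int is_digit = (c >= '0' && c <= '9');
        /* Digits are accepted anywhere except the first character. */
        if (!is_alpha && !(is_digit && i > 1))
            return 0;
    }
    return 1;
}
```

A caller would first translate the vmtClassName virtual address into a file offset and then pass that offset together with the mapped file and its size.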

7.2 Explanation of Functions not Defined in the Listing
The definitions of the following functions have been omitted from the listing for space reasons. The function
GetMappedFileBase() returns a pointer to the memory-mapped PE file. The function IsVirtualAddressValid()
checks that the specified virtual address is located within the PE image, and likewise
IsVirtualAddressInMainCodeSection() determines whether the virtual address is within the code section. The
function DoesVirtualAddressPointToPascalString() determines whether a given virtual address points to a
Pascal string object. VirtualAddressToFileOffset() and FileOffsetToVirtualAddress() convert between file
offsets and virtual addresses using previously-stored data from the section table (the virtual address validation
functions rely on the same data).
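
The conversion and validation helpers might be sketched as below. The SectionInfo record and the function names are hypothetical stand-ins for the omitted originals, and the sketch assumes that each section's start address has already had the image base added. Returning 0 for an unmapped file offset matches the way the listing treats a zero result from FileOffsetToVirtualAddress() as a bad offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the fields kept from each PE section header. */
typedef struct SectionInfo {
    uint32_t virtual_address;  /* section start, image base already added */
    uint32_t virtual_size;     /* size of the section in memory */
    uint32_t raw_offset;       /* file offset of the section's data */
    uint32_t raw_size;         /* size of the section's data in the file */
} SectionInfo;

/* Maps a file offset to a virtual address; returns 0 if no section holds it. */
uint32_t file_offset_to_va(const SectionInfo *s, size_t n, uint32_t offset)
{
    for (size_t i = 0; i < n; i++)
        if (offset >= s[i].raw_offset && offset - s[i].raw_offset < s[i].raw_size)
            return s[i].virtual_address + (offset - s[i].raw_offset);
    return 0;
}

/* Maps a virtual address to a file offset; returns 1 on success, 0 if the
   address is outside every section or has no backing data in the file. */
int va_to_file_offset(const SectionInfo *s, size_t n, uint32_t va,
                      uint32_t *offset_out)
{
    for (size_t i = 0; i < n; i++) {
        if (va < s[i].virtual_address)
            continue;
        uint32_t delta = va - s[i].virtual_address;
        if (delta < s[i].virtual_size && delta < s[i].raw_size) {
            *offset_out = s[i].raw_offset + delta;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the virtual address lies inside any section of the image. */
int is_va_valid(const SectionInfo *s, size_t n, uint32_t va)
{
    for (size_t i = 0; i < n; i++)
        if (va >= s[i].virtual_address &&
            va - s[i].virtual_address < s[i].virtual_size)
            return 1;
    return 0;
}
```

A check restricted to the main code section would be the same range test applied to a single SectionInfo entry.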
