
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/272486831

The use of Hyperspectral Analysis for Ink Identification in Handwritten Documents

Conference Paper · September 2014


DOI: 10.1109/CCST.2014.6986980


All content following this page was uploaded by Aythami Morales on 13 April 2015.



The use of Hyperspectral Analysis for Ink
Identification in Handwritten Documents
Aythami Morales¹, Miguel A. Ferrer¹, Moises Diaz-Cabrera¹, Cristina Carmona¹, Gordon L. Thomas²
¹Instituto Universitario para el Desarrollo Tecnológico y la Innovación en Comunicaciones (IDeTIC), Universidad de Las Palmas de Gran Canaria, Campus de Tafira s/n, E35017, Las Palmas de Gran Canaria, Spain
amorales@idetic.eu, mferrer@idetic.eu, mdiaz@idetic.eu, ccarmona@idetic.eu
²Independent consultant and author, UK
gordonl.thomas@virgin.net

Abstract - Hyperspectral analysis is employed in many different areas, such as medicine, environmental studies, security and forensics. Focusing on law enforcement, ink discrimination has become an important factor in the detection of fraudulent documents. This paper proposes an approach for ink analysis in handwritten documents and pen verification using hyperspectral analysis and Least Squares SVM classification. The proposed method obtains immediate results in a non-contact way from the document or test sample. The first step is to determine the best possible lighting conditions. Then a detailed study is made of the components and properties of the ink and pens used. This paper proposes a classification method based on the hyperspectral characteristics of the ink derived from its physical properties. Furthermore, a database of hyperspectral curves of several types of inks is built, which is used to obtain the characteristics of different inks. The proposed method for automated ink type identification is tested using 25 different pens and more than 1000 samples. The achieved discrimination between types of ink was 87.5%. The experimental protocol includes three different scenarios.

Keywords: ink identification, pen verifier, hyperspectral analysis, handwritten document analysis, forensics.

I. INTRODUCTION

The analysis of inks, particularly in Document Examination, is of great importance. The ink type and temporal factors can be important evidence in criminal prosecutions [1].

There are many document analysis techniques, perhaps as many as there are techniques for forging documents. Documentoscopy is the area of knowledge that aims to determine the authenticity of a document, its authorship, structure and content [2], a document being defined as any medium capable of hosting graphical content, either printed or handwritten.

According to the Spanish Directorate General of Police, in the Document Examination section of the Forensic Science Department, the following material is available to study a document's authorship:

1. A binocular microscope or magnification loupe for examining finer details of the documents. The system usually incorporates a photographic camera.
2. An infrared microscope for the spatial analysis of inks. This allows the optical removal of certain pigmented inks, therefore permitting the visualization of traces produced by others.
3. A profile projector for precision measurements.
4. A videospectral comparator for the optical analysis of the ink reflection under different lighting conditions and wavelengths (UV, IR, and green and blue filters of different wavelengths).
5. Fluotest, for luminescence observation under UV of different wavelengths.

As can be seen, all the methods used are non-contact so as not to interfere with the evidence.

There are other non-contact techniques, such as colour analysis, absorption spectrum analysis, examination under ultraviolet radiation, infrared radiation detection or infrared absorption. The contact techniques are chromatography (either thin-layer or high-performance liquid chromatography) and the use of test chemicals [3][4].

There are also more technologically advanced techniques that require more complex instrumentation, for instance, specialised spectroscopic techniques for studying the interaction between electromagnetic radiation and the ink. These techniques include FTIR (Fourier Transform Infra-Red), Raman spectroscopy, electrophoresis and mass spectrometry.

This paper is focused on the application of optical spectrometry. As is well known, each natural element has properties of absorption and reflection depending on its atomic structure. When an ink is irradiated with white light, it absorbs some wavelengths and reflects others. As the white light contains energy at several wavelengths, the spectral response of the deposited ink can be used to characterise it and extract useful information (e.g. determine if two different samples have been made with the same ink). So as not to bias the measurements, the light source must contain the same amount of energy in all the radiated wavelengths. Thus, it is possible to infer the composition of an ink given the dispersion of the light reflected by it.

In recent decades, the industry has commercialized devices for spectroscopy analysis (e.g. Spectrum FORAM 685-2 by Foster and Freeman or HSI Examiner 100 QD by ChemImage). In general, these devices provide detailed microscopic analysis alongside a spectral curve which can be used to compare inks, papers, holograms and other forms of image. The main drawback of such devices is their high price, which is not affordable for some small forensic offices.
The Spectrum FORAM 685-2 works as follows: white
light is used to illuminate the questioned document while a
narrow bandpass optical filter is placed in front of a camera in
order to analyze the ink sample at the selected wavelength.
This is a multispectral device, i.e., tens of bands can be
analyzed. The device allows visualization of the strokes at
different microscopic magnifications. This way it is possible
to analyze how the ink is deposited on the paper and how it
fills the interstices between the fibres. This can give revealing
clues about the fluidity and viscosity of the ink, which,
supplemented with other tests, can help to reach a deeper
knowledge of the document authorship. Moreover, if the
spectral responses of the inks are significantly different, it
could mean they are different or at least a sufficient time
interval has elapsed between the imprints to justify
discrepancies.
Another, more recent system used for document analysis is the HSI Examiner 100 QD produced by ChemImage [5] [6], a company that provides hardware and software for many chemical and biological applications such as pathology, forensic studies, pharmaceutical studies and threat detection.

The HSI Examiner 100 QD is a hyperspectral imaging system and software package specifically designed for forensic document examination. This platform provides, according to its manufacturer, the most sensitive commercially available device for ink discrimination purposes. Again, the price is its major drawback.

The aim of the work reported here is to develop an automatic ink classifier based on optical properties of the ink. The spectral range is from 400 nm to 1100 nm, which includes the near infrared. The proposed algorithm runs in real time, giving a probability of the same line being written with the same pen, and providing additional evidence of a document being fraudulent or not.

The block diagram of the system can be summarized as:

1. Acquisition, which includes the hyperspectral camera, the light and the "box" for document acquisition.
2. Hyperspectral image processing to reduce the noise and obtain the ink hyperspectral curve at different locations.
3. Characterization of ink hyperspectral reflection and classifier design.
4. Database building and tests performed to validate the final system.

II. ACQUISITION DEVICE

To obtain the hyperspectral image we use a spectrograph in conjunction with a CCD camera. The system scans a line image, obtaining the spectral response at each point of the line. The gain and exposure parameters are set up to increase the contrast between bands.

The spectrograph used in this paper is the ImSpector V10E, and the camera is the model TM-1327GE, which has a resolution of 1392×1024 pixels. The vertical axis represents the wavelength, so we have 1024 spectral bands between 400 and 1100 nm. The spatial resolution depends on the angle of view of the camera lens and the focus distance. It is possible to acquire up to 30 frames per second.

For illumination, after testing different lamps (fluorescent, halogen, CFL, LEDs, etc.) looking for white light emission as uniform as possible between 400 and 1100 nm, we chose the Philips EcoClassic bulb and the OSRAM bulb, each emitting at 100 W. Figure 1 shows the spectral radiation of the chosen bulbs alongside the other two tested bulbs, when their light is projected over a white sheet.

Figure 1. Spectral response of different bulbs tested.

The data acquisition is made under controlled conditions inside a box 30 cm wide, 30 cm deep and 40 cm high. The interior of the box is painted white with a paint that also reflects in the infrared. There are two apertures in the box, one for the camera lens and another for the document. Each is covered with a black felt curtain to avoid external light interference. Figure 2 shows the box, the camera and the curtain configuration. The camera gain and exposure time were experimentally fixed to 1.96 and 0 respectively.

III. HYPERSPECTRAL IMAGE PROCESSING

After introducing a paper with ink lines drawn upon it, the reflection of the lines is broken down into different wavelengths, which are subsequently projected onto the CCD detector, allowing the creation of a two-dimensional image of the reflection, as shown in Figure 2, where one axis represents the spatial information and the other the spectral information. In this grayscale image, the white tones indicate high levels of reflection while the dark shades indicate less reflection.
Figure 2. Procedure to obtain the ink hyperspectral curve along a line in a document. a) sample document; b) hyperspectral image; c) hyperspectral image processed; and d) hyperspectral curves. [Panel labels: a) sheet with 3 ink samples and the scanned line; b) hyperspectral image, axes angle and wavelength; c) background removal; d) hyperspectral curves of inks 1, 2 and 3 versus wavelength (nm).]
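As a small illustration of the spectral axis in Figure 2.b, the sketch below maps sensor rows to wavelengths, given that the camera provides 1024 spectral bands between 400 and 1100 nm. The linear spacing of the bands and the helper names `wavelength_axis` and `band_index` are assumptions made for illustration; the paper does not give the calibration of the spectrograph.

```python
import numpy as np

N_BANDS = 1024                    # spectral rows of the sensor (from the text)
WL_MIN, WL_MAX = 400.0, 1100.0    # spectral range in nm (from the text)


def wavelength_axis(n_bands=N_BANDS, wl_min=WL_MIN, wl_max=WL_MAX):
    """Assumed linear mapping from sensor row index to wavelength in nm."""
    return np.linspace(wl_min, wl_max, n_bands)


def band_index(wavelength_nm, axis):
    """Row whose assumed centre wavelength is closest to the requested one."""
    return int(np.argmin(np.abs(axis - wavelength_nm)))


axis = wavelength_axis()
band_step = axis[1] - axis[0]       # spacing between adjacent bands, in nm
row_800 = band_index(800.0, axis)   # row used later to locate the ink pixels
```

Under this assumption adjacent bands are a little under 0.7 nm apart, so a real system would still need a proper wavelength calibration before individual rows are trusted.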

The image processing is performed in the following steps:

1. The sample document is introduced into the closed enclosure and optimally illuminated. Focus, gain values and camera exposure are set. Inside the box there is a mark which shows the line analyzed by the hyperspectral camera, see Figure 2.
2. The hyperspectral image of the analyzed line is acquired by the hyperspectral device, giving the image in Figure 2.b.
3. The hyperspectral image is processed in order to remove the background noise. This is conducted by subtracting the hyperspectral image of a white sheet from the document hyperspectral image. This is to equalize the effect of non-flat spectral illumination. We thus obtain Figure 2.c.
4. The hyperspectral curves of ink pixels are extracted as follows:
   a. The line corresponding to a wavelength of 800 nm, which is a maximum for ink reflection, is extracted and differentiated (it corresponds to a row of the image matrix).
   b. The largest negative peaks of the differentiated line give the position (angle) of the ink pixels.
   c. At each pixel position, the hyperspectral curve is obtained (the column at the angle position of the ink).
5. The hyperspectral curve is smoothed by a moving average filter of length 21 pixels, thus obtaining hyperspectral curves as shown in Figure 2.d.

The curves characterize the ink composition and need to be parameterized in order to identify the ink.

IV. DATABASE

For the database, we have used 25 different pens of different ink types as follows: 7 different pens of viscous ink, 4 different pens of liquid ink, 7 different pens of gel ink and 7 different marker pens [7].

Two different databases have been built, the first for system design and the second for evaluation.

A. Database for system design

We start with lines drawn on paper with all the pens. We have used the same kind of sheet for all the documents: business paper of 80 g/m².

With each of the pens described above, we draw 50 lines and just after drawing (minimum time lapse) we work out the hyperspectral curve of the central pixel of each line. So, we obtained 50×25=1250 hyperspectral curves. An example of a document belonging to this database being placed in the box can be seen in Figure 3 (upper).

After a week, with the ink dried, the lines were scanned again, thus obtaining another 1250 hyperspectral curves. In total, 2500 hyperspectral curves comprise the design corpus.

B. Database for validation

The database for validation consists of 30 bank checks. Ten of them were written with just one pen, fictitiously, of course, because there was no intention to use them for bank transactions. Another 10 were written with a specific ink and afterwards forged with a different ink. The remaining 10 were written with one pen and fraudulently altered with a different pen with the same ink type. An example can be seen in Figure 3 (lower), where the amount 900 is altered to 90,000 with another pen.
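The image-processing steps of Section III (background removal, location of the ink pixels at 800 nm and smoothing) can be sketched as follows in Python with NumPy. This is only an illustrative sketch on synthetic data, not the authors' implementation: the function names, the linear row-to-wavelength mapping, the simple peak picking and the edge padding of the moving average are all assumptions.

```python
import numpy as np


def remove_background(doc_img, white_img):
    """Step 3: subtract the white-sheet image from the document image."""
    return doc_img.astype(float) - white_img.astype(float)


def find_ink_columns(img, row, n_lines):
    """Steps 4a-4b: differentiate one spectral row and keep the n_lines
    strongest negative peaks of the derivative as ink positions."""
    d = np.diff(img[row, :])
    inner = d[1:-1]
    # A negative peak is a negative local minimum of the derivative.
    is_peak = (inner < d[:-2]) & (inner <= d[2:]) & (inner < 0)
    idx = np.flatnonzero(is_peak) + 1            # indices into d
    strongest = idx[np.argsort(d[idx])[:n_lines]]
    return np.sort(strongest + 1)                # first pixel after each falling edge


def smooth(curve, length=21):
    """Step 5: moving average of the given length.
    The ends are padded by repeating the edge values (an assumption)."""
    half = length // 2
    padded = np.pad(curve, half, mode="edge")
    return np.convolve(padded, np.ones(length) / length, mode="valid")


# Synthetic example: 1024 bands x 200 spatial positions, two ink lines.
n_bands, n_cols = 1024, 200
wavelengths = np.linspace(400.0, 1100.0, n_bands)   # assumed linear axis
row_800 = int(np.argmin(np.abs(wavelengths - 800.0)))

white = np.full((n_bands, n_cols), 200.0)           # white-sheet image
doc = white.copy()
doc[:, 50:55] -= 60.0                               # first ink line
doc[:, 120:125] -= 60.0                             # second ink line

clean = remove_background(doc, white)               # step 3
ink_cols = find_ink_columns(clean, row_800, n_lines=2)   # steps 4a-4b
curves = [smooth(clean[:, c]) for c in ink_cols]    # steps 4c and 5
```

On real images the edge of a stroke is blurred over several pixels, so one would probably step a few pixels into the stroke (the paper uses the central pixel of each line for the design database) rather than take the pixel right after the falling edge.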
Figure 3. Upper: sample of document belonging to the database for system design being introduced into the box. Lower: example of an altered check belonging to the validation database.

V. INK HYPERSPECTRAL CURVE PARAMETRIZATION

We represent each hyperspectral curve by several features in order to enter it into the ink recognizer. Two kinds of hyperspectral curve parameters have been developed: the first based on area and the second based on curve slope [8].

Prior to working out the parameters, the hyperspectral curve is divided uniformly into sections of Δ nm from 400 nm to 1100 nm, and features based on area and slope are obtained from each section.

The area of each section is numerically calculated with the trapezoidal rule using:

\[ A_n = \frac{\Delta}{2}\,\bigl[\, C(400 + \Delta \cdot n) + C(400 + \Delta \cdot (n+1)) \,\bigr] \tag{1} \]

$C(\lambda)$ being the value of the hyperspectral curve at $\lambda$ nm.

For slope parameters, the derivative of each section is approximated as:

\[ S_n = C(400 + \Delta \cdot (n+1)) - C(400 + \Delta \cdot n) \tag{2} \]

The area and slope based features of the hyperspectral curve are obtained by concatenating the parameters of all the sections as follows:

\[ A = \{\, A_n \mid n \ge 0,\; 400 + \Delta \cdot n < 1100 \,\} \tag{3} \]

\[ S = \{\, S_n \mid n \ge 0,\; 400 + \Delta \cdot n < 1100 \,\} \tag{4} \]

The ink feature vector is obtained by concatenating both characteristics, $[A, S]$.

VI. CLASSIFIER

The model we use to discriminate one ink from another is built using a Least Squares Support Vector Machine (LS-SVM). Support Vector Machines (SVMs) are frequently used within the context of statistical learning theory and structural risk minimization. Least Squares Support Vector Machines are reformulations of standard SVMs that lead to solving indefinite linear systems. Robustness, sparseness, and weightings can be imposed on LS-SVMs where needed. We apply a Bayesian framework with three levels of inference [9].

The meta-parameters of the LS-SVM model are the width σ of the Gaussian kernels and the regularization factor, which are trained with parameter vectors from the modelled ink as positive samples and other inks as negative samples. The regularization factor is taken as 30 and the Gaussian width σ is optimized as follows: the training sequence is randomly partitioned into two equal subsets (indexed 1 and 2). The LS-SVM is trained T = 30 times with the first subset and Gaussian widths $\sigma_n$, $1 \le n \le T$, equal to T logarithmically equally spaced values between two powers of 10 [exponents illegible in the source]. Each one of the T LS-SVM models is tested with the second subset so as to obtain T Equal Error Rate measures $EER_n$, $1 \le n \le T$. The Gaussian width σ of the model is obtained as $\sigma = \sigma_j$, where $j = \arg\min_n EER_n$. Finally, the ink model is obtained by training the LS-SVM with the complete training sequence.

This training procedure is employed to work out an LS-SVM model per ink or pen, depending on the experiment, using its own training samples as positive vectors and training samples of other inks as negative vectors. To verify that a questioned ink vector corresponds to a given ink model, the score of the questioned ink is worked out with the LS-SVM model of the given ink. If the score is greater than the threshold, it is accepted as the same ink.

VII. EXPERIMENTS

Several sets of experiments have been performed. The first were aimed at determining the ability of the device to distinguish between inks and between pens. The last was designed to validate the scheme with the bank check database.

Ink Classification – no time lapse: The first experimental session was aimed at working out the ink verification rate, i.e., the ability to distinguish among viscous, gel, liquid and marker ink. With the 1250 samples of the design database collected just after writing (with the ink fresh), the classifier was trained with 30% of the samples and tested with the remaining 70%. Table I shows the results. It can be seen that the viscous and liquid inks are the least stable while the gel ink is the most stable. The marker is very different from the other pens, so it is not difficult to discriminate.

Ink Classification – one week time lapse: With the trained ink models, we tested the samples acquired in the second session, i.e., the dried ink. The results can also be seen in Table I. As expected, the performance is reduced except in the case of the viscous ink. This is because the viscous ink dries very quickly and there is no real difference between the first and second scanning.
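To make Sections V and VI more concrete, the following Python sketch computes the area and slope features per section and selects the Gaussian width of an LS-SVM by minimum Equal Error Rate on a held-out half of the training data. It is a minimal sketch under stated assumptions, not the authors' code: it uses the textbook LS-SVM classifier linear system with a Gaussian kernel and no Bayesian inference, the σ grid range is assumed because the exponents are not legible in the source, all function names are hypothetical, and the demonstration data are two toy clusters rather than real ink features.

```python
import numpy as np


def area_slope_features(curve, wl, delta):
    """Trapezoid area and end-point difference per section of delta nm."""
    edges = np.arange(wl[0], wl[-1] + 1e-9, delta)
    c = np.interp(edges, wl, curve)              # curve at the section limits
    areas = delta * (c[:-1] + c[1:]) / 2.0
    slopes = c[1:] - c[:-1]
    return np.concatenate([areas, slopes])


def rbf_kernel(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_train(x, y, sigma, gamma=30.0):
    """Solve the LS-SVM classifier linear system for (alpha, b); y is +1/-1."""
    n = len(y)
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = y
    a[1:, 0] = y
    a[1:, 1:] = np.outer(y, y) * rbf_kernel(x, x, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate([[0.0], np.ones(n)]))
    return sol[1:], sol[0]


def lssvm_score(x_train, y_train, alpha, b, sigma, x_test):
    return rbf_kernel(x_test, x_train, sigma) @ (alpha * y_train) + b


def equal_error_rate(pos, neg):
    """Approximate EER: threshold where false accepts and rejects are closest."""
    thresholds = np.sort(np.concatenate([pos, neg]))
    frr = np.array([(pos < t).mean() for t in thresholds])
    far = np.array([(neg >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0


def select_sigma(x, y, sigmas, gamma=30.0, seed=0):
    """Train on one random half, test on the other, keep the min-EER sigma."""
    perm = np.random.default_rng(seed).permutation(len(y))
    tr, te = perm[: len(y) // 2], perm[len(y) // 2:]
    eers = []
    for s in sigmas:
        alpha, b = lssvm_train(x[tr], y[tr], s, gamma)
        sc = lssvm_score(x[tr], y[tr], alpha, b, s, x[te])
        eers.append(equal_error_rate(sc[y[te] > 0], sc[y[te] < 0]))
    return sigmas[int(np.argmin(eers))]


# Feature extraction on a synthetic curve (linear axis assumed).
wl = np.linspace(400.0, 1100.0, 1024)
features = area_slope_features(wl.copy(), wl, delta=100.0)

# Toy two-class data standing in for ink feature vectors.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])
sigmas = np.logspace(-1, 1, 30)      # T = 30 values; the range is assumed
sigma = select_sigma(x, y, sigmas)
alpha, b = lssvm_train(x, y, sigma)  # final model on the complete training set
scores = lssvm_score(x, y, alpha, b, sigma, x)
```

A questioned sample would then be accepted as the modelled ink when its score exceeds a chosen threshold, as described in Section VI.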
TABLE I. HIT RATIO FOR INK IDENTIFICATION (FRESH INK AND INK DRIED FOR ONE WEEK)

Session   Ink       Hit ratio (%)
Fresh     Viscous   85
Fresh     Gel       95
Fresh     Liquid    75
Fresh     Marker    95
Dried     Viscous   85
Dried     Gel       90
Dried     Liquid    70
Dried     Marker    85

Pen Classification – Same ink: The third experimental set investigates the ability of the scheme to discriminate between pens using the same ink. For the gel ink, we have 7 classes (the 7 pens) and 50×7=350 freshly inked samples. Again, we trained with 30% of the samples and tested with the remaining 70%. The results can be seen in Table II, which shows the difficulty the scheme has in distinguishing among different pens. Again, the gel ink achieves the best performance.

Pen Classification – Same ink after one week: In the fourth experimental set, with the trained models of experiment 3, we work out the hit ratio to distinguish between the pens after the ink had dried. The results are also given in Table II.

TABLE II. HIT RATIO FOR PEN IDENTIFICATION WITHIN THE SAME INK TYPE (FRESH INK AND INK DRIED FOR ONE WEEK)

Session   Ink       Hit ratio (%)
Fresh     Viscous   63
Fresh     Gel       75
Fresh     Liquid    65
Fresh     Marker    73
Dried     Viscous   63
Dried     Gel       65
Dried     Liquid    53
Dried     Marker    66

Forgery Detection: The last set of experiments is used to validate the scheme with the bank check database. The 20 altered checks are presented to the system. The written amounts are scanned. The task is to determine whether all the numbers were written with the same pen. This is conducted by training the classifier with samples of the first digit and testing with samples of the remaining digits. In all cases, the checks were scanned a week later, with the ink dried.

The results are given in Table III. It can be seen that when the ink is different, all forgeries were detected. When the same ink is used in a different pen, 80% of the alterations are found. No false alarms were raised by our scheme in our database.

TABLE III. HIT RATIO OF THE VALIDATION TEST WITH THE CHECKS

Checks forged with           Forgery detection rate (%)
Different ink                100
Same ink, different pen      80
Not forged                   0

VIII. CONCLUSIONS

This paper proposes a methodology to detect forgeries in handwritten documents. The proposal includes the device design. This is meant to decrease the cost of the scheme, since the commercially available systems are generally expensive. The proposed scheme is based on the physics of hyperspectral ink reflection, parameterizing the hyperspectral curve of the ink pixels. The hyperspectral ink curve is modeled with an LS-SVM classifier. The validation experiments were performed with a database of altered bank checks to detect forgeries. The results are encouraging: all alterations made with a different ink and 80% of those made with a different pen of the same ink type were detected, with no false alarms.

ACKNOWLEDGMENT

This study was funded by the Spanish Government's MCINN TEC2012-38630-C04-02 research project.

REFERENCES

[1] Comisaría General de Policía Científica, Departamento de Documentoscopía, Spain. [Online]. Available: http://www.policia.es/org_central/cientifica/servicios/tp_docum_copia.html. Accessed Sep. 22, 2014.
[2] Tony Roig, "Documentoscopía: Discriminación de tintas", Blog El Investigador 2.0, Spain, Sep. 2009. [Online]. Available: http://policiasenlared.blogspot.com.es/2009/09/documentoscopia-discriminacion-de.html. Accessed Sep. 22, 2014.
[3] Headwall Photonics, Forensics applications, Fitchburg, Massachusetts, USA. [Online]. Available: http://www.headwallphotonics.com/applications#forensics. Accessed Sep. 22, 2014.
[4] ForensicXP: The next generation in questioned documents examination, Global Marketing & Research Inc, New York, USA. [Online]. Available: http://arxmar.com/index-1.html. Accessed Sep. 22, 2014.
[5] ChemImage Corporation website, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/. Accessed Sep. 22, 2014.
[6] ChemImage Corporation, The HSI Examiner 100 QD, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/products/instrumentation/examiner/100.aspx. Accessed Sep. 22, 2014.
[7] K. Franke, O. Bünnemeyer, and T. Sy, "Writer identification using ink texture analysis", in Proc. 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 268–273, Canada, 2002.
[8] Miguel A. Ferrer, Aythami Morales and Alba Díaz, "An approach to SWIR Hyperspectral Hand Biometrics", Information Sciences, vol. 268, 2014, pp. 3–19.
[9] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, vol. 2 (2), 1998, pp. 121–167.
