2021 6th International Conference for Convergence in Technology (I2CT)

Pune, India. Apr 02-04, 2021

Degraded Document Image Binarization using Novel Background Estimation Technique

Harshit Jindal∗, Manoj Kumar†, Akhil Tomar‡ and Ayush Malik§
Department of Computer Science Engineering, Delhi Technological University
New Delhi, India
Email: ∗harshitjindal2000@gmail.com, †mkumarg@dce.ac.in, ‡akhiltomar098@gmail.com, §ayushmalik03@gmail.com

2021 6th International Conference for Convergence in Technology (I2CT) | 978-1-7281-8876-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/I2CT51068.2021.9418084

Abstract—Over the past few decades, the use of scanned historical document images has increased dramatically, especially with the emergence of online libraries and standard benchmark datasets like DIBCO. Historical documents are usually in very poor condition, containing noise like large ink stains, bleed-through, liquid spills, uneven background, spots, faded ink, and weak/thin text, which makes the task of binarization very difficult. In this paper, we propose an effective degraded document image binarization algorithm that performs accurate text segmentation. Our method first estimates the background utilizing information from neighboring pixels and filter smoothening. The next step is background subtraction, which helps compensate for background distortions. The document is segmented using Otsu thresholding, and then we process the image to remove the remaining noise and maximize text content using labelled connected components. Our method outperforms several existing and widely used binarization algorithms on F-measure, PSNR, DRD, and pseudo F-measure when evaluated on the H-DIBCO 2016 and H-DIBCO 2018 datasets, and can very effectively detect faint characters from a document image.

Index Terms—Document Image Processing, Degraded Document Image Binarization, Thresholding, Background Estimation, Noise Removal, Otsu Thresholding, Bilateral Filtering

I. INTRODUCTION

Image binarization is the pre-processing step required for digital image processing [1] and analysis tasks like moving object detection or finding the region of interest in an image, such as the text in a degraded document [2], [3]. Binary images take less storage memory and allow faster computations for specific applications like document layout analysis and document skew detection. The most common way of binarizing an image is to use a thresholding algorithm that segments the image into background and foreground based on a globally or locally computed threshold value for pixel intensities. Ancient historical documents and manuscripts are stored over the years in archives and libraries. Over time, their quality is affected by several environmental factors like ageing of the paper, humidity, mishandling by humans, and dust. The presence of degradations makes the task of binarizing documents and preserving them digitally a tough job. With increasing interest in historical document analysis, there has been rapid development in document image binarization. Historical documents have various degradations like faded ink or faint characters, bleed-through, uneven illumination, contrast variation, and smear. New techniques keep coming out regularly because there is no single algorithm that can handle all these degradations, and there is always room for improvement in the quality of binarization.

According to Sezgin and Sankur [4], there are six kinds of thresholding methods: histogram-shape based methods, spatial methods, clustering-based methods, entropy-based methods, object-attribute based methods, and local methods. We can also describe thresholding algorithms as global or local, depending on the method used to compute the threshold value. In global methods [5], a single threshold value is computed for the image based on entropy, a histogram, or a clustering algorithm that segments the image into two classes: foreground (text) and background. Local thresholding algorithms [6], in contrast, divide the image into subparts and use the information from neighboring pixels in a small fixed-sized window of the image to determine the threshold locally for one subpart at a time. Global thresholding methods efficiently extract text from high-quality documents, but their performance starts to lag when degradations are present, and adaptive thresholding algorithms that make a local estimate of the threshold for each pixel produce better results.

Over time, several algorithms have been proposed to convert ancient images into their binary form: global thresholding algorithms like Otsu [7] and Kittler [8], and local thresholding algorithms like Niblack [9] and Sauvola [10]. Otsu's method performs well with images having a bimodal histogram but cannot handle degradations like uneven illumination, bleed-through, or fading text. Kittler's algorithm assumes a Gaussian distribution of pixel values for each intensity level to find the segmentation cut-off and works well for high-quality documents. Niblack's method calculates the mean and standard deviation in a fixed-sized window for each pixel and hence computes a local threshold. It effectively recognizes text but introduces tons of noise. Sauvola tried to modify Niblack's method to reduce noise, which gave better results in some cases but also resulted in a lower text detection rate. Many methods have incorporated these algorithms and other techniques like contrast normalization, background estimation, and stroke-width detection to form hybrid algorithms that perform much better than the individual algorithms. Before moving to our approach, we discuss the development of such binarization algorithms. Kim et al. [14] consider the image as a 3D terrain on which water is poured to fill valleys representing text. The water-filled image is subtracted from the original image, followed by Otsu's method to form the final binarization


Authorized licensed use limited to: Universiti Teknikal Malaysia Melaka-UTEM. Downloaded on November 17,2022 at 08:49:37 UTC from IEEE Xplore. Restrictions apply.
result, and we get a nature-inspired, unique hybrid binarization algorithm. Adaptive local thresholding algorithms also utilize document-specific domain knowledge. Gatos et al. [15] make use of the Wiener filter to pre-process the input image and then Sauvola's algorithm to estimate the background. The difference between the pre-processed image and the estimated background gives an intermediate result, which is finally passed through post-processing to produce the final result. Although this algorithm is better than Sauvola, it is still unable to process bleed-through or recover faint characters from the image. After this came Lu's [16] iterative polynomial smoothing-based background estimation algorithm. It makes use of a normalized image thresholded by Otsu to detect the text-stroke width. The final result is produced by local thresholding based on the detected stroke edges and the mean intensity. This method won the first DIBCO competition (DIBCO 2009) [11], but it is still not perfect: it is based on local contrast, and hence a high-contrast background is sometimes difficult to handle. Su et al. [17] used the minimum and maximum intensity in the local window to form a contrast image and then binarized it to detect text edges, and were able to produce better results than Lu. Unlike Lu's, this method performed well on documents with bleed-through, but it also suffers from faint text.

Nina's binarization algorithm [18] was a six-staged background estimation based algorithm that used median filtering for background estimation, a bilateral filter, recursive Otsu, contrast compensation, and despeckling. This algorithm can effectively capture all the text edges. Singh et al. [19] proposed an adaptive four-step method that includes contrast stretching, contrast analysis, thresholding, and noise removal from the document image. It works very well with most degradations but fails when the image suffers from bleed-through. Howe's algorithm [20] was one of the first complex binarization algorithms based on automatic parameter tuning. It used graph-cut computation to minimize an energy function based on the Laplacian of pixel intensities. It is a very efficient method that performs well and autotunes itself on different kinds of images. But this method, like others, performs poorly on faint text characters and sometimes even introduces background noise. A recent method that won DIBCO 2018 [13], by Xiong et al. [21], makes use of a morphological black-hat transformation to compensate the document background, using a disc-shaped structuring element of a size determined by the stroke width transform. Howe's binarization algorithm is then applied to form a binary image, which is further enhanced by post-processing that reduces noise and also preserves the text stroke-width.

Fig. 1: Degraded handwritten document images from the DIBCO datasets [11], [12], [13] illustrating various degradations: contrast variation in (a), smear in (b), faint ink in (c), and bleed-through in (d).

Our motivation came from considering the amount of ink wasted in photocopying due to poor binarization, and from the challenging tasks of detecting faint characters and removing bleed-through text from handwritten document images. The proposed algorithm combines several steps: background estimation, smoothening filters, thresholding (Otsu), and post-processing. It first estimates the background utilizing the information from neighboring pixels to get an estimate of the

current pixel value, and moves similarly for the rest of the pixels. The estimated background image is then smoothened using a bilateral filter that reduces random noise from the estimated background. Then we extract the text using a global thresholding method (Otsu), and finally the image is processed further to get rid of the remaining noise. This paper's major contribution is the algorithm's ability to detect bold and faint text characters from handwritten documents very effectively just by adjusting the parameters of the background estimation, while still being able to process bleed-through, smear, and other degradations.

The rest of the paper is arranged as follows: in Section II, we present our proposed algorithm in detail; Section III demonstrates and describes the experimental results; finally, we conclude in Section IV.

II. PROPOSED METHOD

Our method utilizes existing techniques and combines them with our own to form a novel document image binarization method. It includes the use of a novel iterative sliding-window based background estimation method, a bilateral filter [22], Otsu thresholding, and effective post-processing to remove the remaining noise from the image. Our proposed binarization algorithm consists of the following steps:
1) Conversion of the Image to Grayscale
2) Background Estimation and Removal
3) Gaussian Smoothing
4) Otsu Thresholding
5) Spotting Removal

Fig. 2: Image H03 from the DIBCO 2009 dataset with comparison: (a) Original Image, (b) Ground Truth Image, (c) Nina, (d) Otsu, (e) Niblack, (f) Sauvola, (g) Lu, (h) Proposed Algorithm.

A. Conversion of the Image to Grayscale

Historical documents usually do not have a wide dynamic color range, and hence it is not a good idea to process them in colored form with millions of color intensities. Different documents have different colors, and converting all of them to the same color representation ensures that all documents are processed in the same manner. Hence, we convert the image to grayscale, where there are only 2^8 = 256 unique intensities.

B. Background Estimation and Removal

After we have converted the image to grayscale, we first try to remove as much background noise as possible by forming an accurate estimation of the background. Our background estimation method is based on the assumption that the text color varies significantly from the background color: usually, either the text is dark and the background is light, or vice-versa. The neighboring pixels thus can provide a good approximation of the local background of a pixel. We utilize this information, i.e. the pixels with minimum and maximum intensity in the local neighbourhood, to form an iterative sliding-window background estimation algorithm based on the following equations:

Imax(x, y) = max(I(xc, yc))   (1)
Imin(x, y) = min(I(xc, yc))   (2)
where xc ∈ (x − w : x + w), yc ∈ (y − w : y + w)

θ = Imin(x, y) / Imax(x, y)   (3)

I(x, y) = Imin(x, y) + Imax(x, y)(1 − θ)^γ   (4)

Equations (1) and (2) represent the calculation of the maximum- and minimum-intensity pixels in a (2w+1) × (2w+1) sized window around the current pixel with coordinates (x, y). Equation (3) represents the calculation of the intensity ratio factor θ. Once we calculate Imax, Imin, and θ inside the window, the background estimate for the current pixel is calculated using Equation (4), where γ is a constant. This process is

repeated iteratively for each pixel for n iterations. The reason for doing multiple iterations is that bold/thick text does not fade away within one iteration, and hence hampers the text when performing background subtraction, resulting in poor performance.

Algorithm:
1) I(x,y): Original Image
2) w: (window size)/2
3) n: number of iterations
4) C(x,y): Copy of Original Image
5) BG(x,y): Estimated Background Image
6) For k = 1 to n:
7)    Slide the window over each pixel and compute:
8)    Imin = minimum intensity pixel (window)
9)    Imax = maximum intensity pixel (window)
10)   θ = Imin / Imax
11)   BG[i][j] = Imin + Imax(1 − θ)^γ
12)   Store the background image as the new copy of the image.
13) Apply a bilateral filter to the estimated background image.

Typically, the value of the constant γ can vary from 1.3 to 2.5. We have used γ = 1.5 so that our algorithm works well for dark as well as faint text; this also helps to reduce the number of parameters that need to be tuned. We suggest using a small window size like 5×5 for images with bold and dark visible text, and large window sizes of around 50×50 for faint text characters. The number of iterations is subject to the difference between text and background intensity: a higher difference means more iterations for complete estimation.

To remove the remaining noise and pixel-level irregularities inside the estimated background, we use a bilateral filter over the estimated background image BG(x,y). We set σColor = 30 and σSpace = 30. These parameters have been tuned on H-DIBCO 2018 and perform well for H-DIBCO 2016 [23] too. The filtering effect is not very visible on the background image itself but has a profound effect on the binarization results. Now that we have the final estimated background image BG, we move forward to the step of background removal. We remove the background by subtracting the original image from it. Before moving on to Gaussian smoothing, we invert the image for further processing. The background removal process can be summarised as:

Inew(x, y) = 255 − [BG(x, y) − I(x, y)]   (5)

C. Gaussian Smoothing

The background removal process introduces random pixel noise in the newly formed image, disrupting the final output. So, before proceeding with thresholding, the image is passed through a Gaussian filter to remove these isolated black and white pixels. We have used a 5×5 Gaussian kernel to filter the noise. This greatly improves the binarization results.

Fig. 3: Overview of the binarization algorithm: (a) Step 1: Original Image (Grayscale); (b) Step 2.1: Background Estimation; (c) Step 2.2: Background Removal; (d) Step 2.3: Inversion of Output from Step 2.2; (e) Step 3: Gaussian Smoothening; (f) Step 4: Otsu Thresholding; (g) Step 5: Spotting Removal and Inversion. Fig. 3(a) shows the input image, which is already in grayscale. Figs. 3(b)-3(d) represent the background estimation and removal step. Fig. 3(e) shows the image after the removal of noise using Gaussian smoothing. Otsu thresholding is then used to create an initial binarization in 3(f). Finally, we use spotting removal to generate the final binarization in 3(g).
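The background estimation of Equations (1)-(4) and the background removal of Equation (5) can be sketched in NumPy as follows. This is a minimal, unoptimized sketch under our own naming (the function names are not from the paper), using a direct per-pixel loop; the bilateral-filter smoothing applied to BG(x, y) and the Gaussian smoothing step are omitted here.

```python
import numpy as np

def estimate_background(img, w=2, n_iter=1, gamma=1.5):
    """Iterative sliding-window background estimation, Equations (1)-(4).

    img: 2-D grayscale array; w: half window size, so the window is
    (2w+1) x (2w+1); gamma: contrast constant (the paper uses 1.5).
    """
    bg = img.astype(np.float64)
    for _ in range(n_iter):
        padded = np.pad(bg, w, mode="edge")   # replicate borders
        out = np.empty_like(bg)
        for y in range(bg.shape[0]):
            for x in range(bg.shape[1]):
                win = padded[y:y + 2 * w + 1, x:x + 2 * w + 1]
                i_min, i_max = win.min(), win.max()
                # Eq. (3); guard against an all-black window (our assumption)
                theta = i_min / i_max if i_max > 0 else 0.0
                # Eq. (4): dark text pixels get absorbed into the background
                out[y, x] = i_min + i_max * (1.0 - theta) ** gamma
        bg = out
    return bg

def remove_background(img, bg):
    """Background subtraction followed by inversion, Equation (5)."""
    diff = np.clip(bg - img.astype(np.float64), 0.0, 255.0)
    return 255.0 - diff
```

On a flat 100-intensity patch with a single dark pixel, one iteration with w = 1 already replaces the dark pixel by the surrounding background value, which is the fading behaviour the iterations rely on.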

D. Otsu Thresholding

We use Otsu thresholding for the binarization of our Gaussian-smoothened image. Otsu thresholding is a global thresholding method used to separate text from the background. It maximizes the between-class (interclass) variance of the foreground and background classes, or, equivalently, minimizes the within-class (intraclass) variance:

σB²(T) = w1(T) w2(T) [μ1(T) − μ2(T)]²   (6)

Here σB² represents the between-class variance; we compute it for each threshold T from 0 to 255 and select the one that gives the maximum result. wi represents the proportion of pixels in the background/foreground class, and μi represents the mean intensity of each class. Otsu thresholding helps us form a pretty good approximation of the text and background, but there can still be remaining noise that needs to be removed.

E. Spotting Removal

Spotting removal is the post-processing technique we use to remove the remaining noise from our binary image. We do this by labelling all the connected black (text) components inside the binary image and then removing all the small components that are introduced while binarizing the image. Removing these small components may also remove some textual parts, like a dot over a letter, but ultimately it helps in making the binarized image clean. We select the size to discard manually, according to the text size in an image, because there is no fixed text size in handwritten documents: the dot over a character can be more than 300 pixels in one image and less than 20 pixels in another, and we want to remove the maximum amount of noise possible from our binarized document. Lastly, we also take care of very large degradations that are falsely detected as text and are much larger than the text components present inside the image. These can be any kind of degradation, like a thick border at the corner of the document, a large ink/coffee stain, or random large spots that are often dark and hence misclassified as text; this size cut-off can likewise be selected according to the text size. After this, we invert the image back to make the text dark and the background white. Fig. 3(g) shows the final binarization result after this post-processing (spotting removal) step. It shows how post-processing has taken care of most of the thresholding misclassification errors.

III. EXPERIMENTAL RESULTS

DIBCO datasets have images that suffer from various types of degradations and also provide ground truth images to compare different binarization methods. These datasets consist of both degraded handwritten and printed documents. We used images from the H-DIBCO 2018 dataset to develop the algorithm, as they cover a vast variety of degradations, and then tested on the images from the H-DIBCO 2016 as well as the H-DIBCO 2018 dataset. We first present the qualitative results and then the quantitative comparison of our method with many state-of-the-art algorithms on the H-DIBCO 2016 and H-DIBCO 2018 datasets.

A. Qualitative Results

In Fig. 2 we compare the results of our method with Sauvola's, Nina's, Lu's, Niblack's, and Otsu's methods. Lu's method, Nina's method, and our proposed method produce good binarization results, but Lu's method has made the text bold and Nina's method suffers from weak stroke detection, whereas our method can produce an image close to the ground truth image. Fig. 4 and Fig. 5 show results of our algorithm on images from the H-DIBCO 2016 and 2018 datasets along with the ground truth images. The images contain different kinds of degradations, like large ink spots, thick borders, smear, severe bleed-through, faint text characters, folded pages, torn pages, uneven illumination, smudge, shadows, low contrast, water spills, etc. As evident from the results, our algorithm handles all these degradations very well and can produce clean binary images with efficiently extracted text.

Before moving on to the quantitative results, we first describe the evaluation measures used from the DIBCO reports. DIBCO uses various benchmark evaluation methods that use mathematical calculations to measure the efficiency and accuracy of an approach against the ground truth images. The DIBCO 2016 and 2018 competitions compute 8 statistical evaluation measures: F-measure, pseudo F-measure, PSNR (Peak Signal to Noise Ratio) [24], DRD (Distance Reciprocal Distortion) [25], Precision, Recall, Pseudo Precision, and Pseudo Recall. Four of them are used to decide the winner.

F-measure is calculated using two components, the precision and recall of the image binarization, which are obtained by counting the pixels that are classified black and white correctly/incorrectly while binarizing, against the ground truth images. F-measure is calculated as the harmonic mean of recall and precision:

F1 = 2 × (precision × recall) / (precision + recall)   (7)

where Recall = TP/(TP + FN) and Precision = TP/(TP + FP). TP, TN, FP, and FN denote the True Positive, True Negative, False Positive, and False Negative counts, respectively.

pF-measure, or pseudo F-measure, is a way of measuring performance similar to F-measure, the only difference being that it uses pseudo recall and pseudo precision instead of the normal recall and precision functions. These pseudo functions use a weighted distance between the boundaries of characters in the output image and the boundaries of characters in the ground truth image, and also take into account the local stroke width of the characters.

PSNR is a measure of the amount of signal in an image with respect to the amount of noise present. A higher value of PSNR implies a better signal in comparison to the noise. It is used to measure the degree of reconstruction from bad lossy compression and is really useful in the case of binarization, as it can be used to measure the quality of the binarization against the initial image: it helps to quantify the similarity of two images.

PSNR = 10 log(C² / MSE)   (8)
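The thresholding and spotting-removal steps of Sections II-D and II-E can be sketched as follows. This is a minimal NumPy/stdlib sketch of our own: `otsu_threshold` scans all 256 candidate thresholds for the maximum between-class variance of Equation (6), and `remove_spots` labels 8-connected text components with a breadth-first search; the function names and the BFS labelling are our choices, not the paper's implementation.

```python
import numpy as np
from collections import deque

def otsu_threshold(img):
    """Otsu's method: pick T maximizing the between-class variance, Eq. (6)."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(256):
        w1 = hist[:t + 1].sum() / total          # class-1 (<= t) proportion
        w2 = 1.0 - w1                            # class-2 (> t) proportion
        if w1 == 0.0 or w2 == 0.0:
            continue                             # one class is empty
        mu1 = (np.arange(t + 1) * hist[:t + 1]).sum() / (w1 * total)
        mu2 = (np.arange(t + 1, 256) * hist[t + 1:]).sum() / (w2 * total)
        var = w1 * w2 * (mu1 - mu2) ** 2         # Eq. (6)
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def remove_spots(binary, min_size, max_size):
    """Drop 8-connected text components outside [min_size, max_size] pixels."""
    h, w = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    out = binary.copy()
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while q:                         # BFS over one component
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
                if not (min_size <= len(comp) <= max_size):
                    for y, x in comp:            # erase too-small/too-large blobs
                        out[y, x] = False
    return out
```

The size bounds play the role of the manually chosen lower and upper component-size limits described above; on a light page with a dark text blob and an isolated noise pixel, the blob survives and the speck is erased.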

Fig. 4: Images from the DIBCO 2016 dataset along with ground truth images and their binarized results: (a) Image 5 from the DIBCO 2016 dataset, (b) Ground Truth Image, (c) Binarized Result; (d) Image 2 from the DIBCO 2016 dataset, (e) Ground Truth Image, (f) Binarized Result.

Fig. 5: Images from the DIBCO 2018 dataset along with ground truth images and their binarized results: (a) Image 1 from the DIBCO 2018 dataset, (b) Ground Truth Image, (c) Binarized Result; (d) Image 5 from the DIBCO 2018 dataset, (e) Ground Truth Image, (f) Binarized Result.
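Two of the pixel-level measures defined earlier, F-measure in Equation (7) and PSNR in Equation (8), can be sketched as follows. This is our own minimal NumPy sketch, not the DIBCO evaluation tool; it assumes boolean arrays where True marks text pixels, and takes the constant C of Equation (8) to be 1 as stated in the paper.

```python
import numpy as np

def f_measure(pred, gt):
    """F-measure (Eq. (7)) for binary images; True marks text pixels."""
    tp = np.logical_and(pred, gt).sum()          # correctly detected text
    fp = np.logical_and(pred, ~gt).sum()         # background flagged as text
    fn = np.logical_and(~pred, gt).sum()         # missed text
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def psnr(pred, gt, c=1.0):
    """PSNR (Eq. (8)); for {0,1} images, MSE is just the pixel-flip rate."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return np.inf if mse == 0.0 else 10.0 * np.log10(c ** 2 / mse)
```

With a 4-pixel ground-truth blob, a prediction that hits 3 of them and adds 1 false pixel gives precision = recall = 0.75 and hence F-measure 0.75, while the 2 flipped pixels out of 16 give MSE = 0.125.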

Where MSE is the Mean Square Error and C is a constant with a value equal to 1.

DRD is mainly used to measure distortions for binary document images. It is based on the reciprocal distance. The pixels modified during binarization can have their distortions quantified using this method. DRD is calculated as follows:

DRD = (Σ_{k=1}^{N} DRDk) / NUBN   (9)

DRDk represents the distortion of the kth flipped pixel, which is calculated using a weighted matrix of size 5×5. Its value is given by the following formula:

DRDk = Σ_{i=−2}^{2} Σ_{j=−2}^{2} |GTk(i, j) − Bk(x, y)| × WNm(i, j)   (10)

NUBN is the total number of non-uniform 8 × 8 blocks inside the ground truth image. All the other metrics just quantitatively tell us how good our method is by checking the pixel classification, but DRD tries to capture human visual perception by measuring the perceptual distortion of the binary image. Unlike the other metrics, a lower value of DRD indicates a better binarization algorithm. All these metrics have been calculated using the DIBCO evaluation tool.

B. Quantitative Results

The H-DIBCO 2016 and H-DIBCO 2018 datasets each consist of 10 handwritten document images along with their ground truth images. These images cover a wide variety of degradations. In Tables I and II, we present the results of our algorithm for all four metrics, individually and averaged across all 10 handwritten document images from the H-DIBCO 2016 and H-DIBCO 2018 datasets respectively. Our method achieves an F-measure and pseudo F-measure of around 90, a PSNR value greater than 18, and a DRD value of less than 4 for both datasets. In addition to the published results from DIBCO, we also compare our results with several traditional and state-of-the-art methods.

TABLE I: Results on DIBCO 2016 dataset
Image    FM     p-FM   PSNR   DRD
1        91.35  91.11  19.04  5.73
2        89.04  89.17  23.3   3.6
3        95.31  95.51  23.35  1.59
4        89.76  91.28  19.38  4.13
5        95.9   96.58  22.45  1.4
6        87.15  91.61  18.17  5.36
7        91.11  90.47  17.04  2.49
8        85.86  88.75  13.99  6.1
9        89.23  84.42  15.97  2.34
10       86.99  84.18  14.1   2.95
Average  90.17  90.31  18.68  3.57

TABLE II: Results on DIBCO 2018 dataset
Image    FM     p-FM   PSNR   DRD
1        92.23  94.74  21.01  2.62
2        89.72  90.13  20     3.09
3        94.89  96.54  17.8   1.75
4        82.07  87.12  19.32  4.05
5        87.01  86.45  17.29  6.29
6        93     96.53  18.88  3.33
7        88.77  90.39  20.9   3.25
8        83.13  84.89  14.17  5.2
9        92.95  93.9   19.59  3.24
10       86.66  90.47  14.12  6.41
Average  89.04  91.12  18.31  3.92

In Table III, we compare the averaged results of our method from Table I with state-of-the-art methods like Otsu, Sauvola, Lu, Su, and Howe, and with the winner and runner-up from the DIBCO 2016 contest. Our method achieves an FM of 90.17, p-FM of 90.31, PSNR of 18.68, and DRD of 3.57, and outperforms both traditional and state-of-the-art methods on 3 metrics (FM, PSNR, DRD); only its pseudo F-measure is lower than the winner's and runner-up's algorithms. In Table IV, we compare our results on the DIBCO 2018 dataset with Otsu, Sauvola, and the winner and runner-up from the contest. Our method achieves an FM of 89.04, p-FM of 91.12, PSNR of 18.31, and DRD of 3.92. For this dataset as well, our method outperforms all the listed methods on 3 metrics (FM, p-FM, DRD) and has only a lower PSNR value than the winner's method. A higher FM and lower DRD on both datasets mean higher precision and recall, better pixel classification accuracy, and thus better binarization results.

TABLE III: Comparison of performance of different methods with our method against H-DIBCO 2016 dataset
Method           FM     p-FM   PSNR   DRD
Lu [16]          84.44  92.04  17.33  5.12
Su [17]          84.75  88.94  17.64  5.64
Howe [20]        87.47  92.28  18.05  5.35
Otsu [7]         86.61  88.67  17.8   5.56
Sauvola [10]     82.52  86.85  16.42  7.49
Winner's Method  87.61  91.28  18.11  5.21
Proposed Method  90.17  90.31  18.68  3.57
Lelore [26]      87.21  88.48  17.36  5.27
Runner Up        88.72  91.84  18.45  3.86

TABLE IV: Comparison of performance of different methods with our method against H-DIBCO 2018 dataset
Method                FM     p-FM   PSNR   DRD
Otsu [7]              51.45  53.05  9.74   59.07
Sauvola [10]          67.81  74.08  13.78  17.69
Winner's Method [21]  88.34  90.24  19.11  4.92
Proposed Method       89.04  91.12  18.31  3.92
Runner Up             73.45  75.94  14.62  26.24

The qualitative and quantitative results on both datasets illustrate that our method significantly outperforms traditional and state-of-the-art algorithms. Figs. 4(c), 4(f), and 5(c) show that our method deals very well with bleed-through, ink spots, and spill-through. It works well even for images with faint text and preserves stroke connectivity, as evident from Fig. 5(f). Despite all these advantages, like all methods, our method also has several limitations. Firstly, it struggles on Image 8 from the DIBCO 2016 dataset and Image 10 from the DIBCO 2018 dataset, which have one thing in common: the background and

text have very similar colors, and hence a lot of background degradations are classified as text, resulting in poorly binarized images. Secondly, it also suffers on images that have black/dark colored degradations, or where the color of the degradation is similar to the text.

IV. CONCLUSION

In this paper, we have presented a novel method for binarization of degraded historical documents. The main idea is to form an approximation of the background using the information from neighboring pixels in an iterative sliding window algorithm. We use this estimated background to compensate for all the background degradations, perform binarization using Otsu, and then do effective post-processing to further clean and create a final binary image. The performance of our algorithm is greatly dependent on the selection of parameters. One is required to tune four parameters: the window size for the background estimation, the number of iterations required for background estimation, and the valid component-size range (lower and upper limit) to remove maximum noise and keep maximum text. Bold text usually means more iterations for an accurate approximation, whereas faint text requires a larger window size to filter out the text from the background. The range for text size can be selected according to the thickness and pixel size of the font present inside the image. The experimental results show that our method outperforms several traditional and state-of-the-art algorithms. But like all other methods, this method is also not perfect: it suffers with images whose text color shade is similar to the background, where binarization performance is affected. Our method also does not take the stroke width into account and in some places suffers in maintaining stroke-width connectivity. We would like to take up this challenge in future work to further improve our method, and to explore applications of this background estimation technique in other fields.

REFERENCES

[1] S. Akram, M.-U.-D. Dar, and A. Quyoum, "Document Image Processing - A Review," International Journal of Computer Applications, 2010.
[2] P. K. More and D. D. Dighe, "A Review on Document Image Binarization Technique for Degraded Document Images," International Research Journal of Engineering and Technology, 2016.
[3] S. Milyaev, O. Barinova, T. Novikova, P. Kohli, and V. Lempitsky, "Image binarization for end-to-end text understanding in natural images," in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2013.
[4] B. Sankur and M. Sezgin, "Image thresholding techniques: A survey over categories," Pattern Recognition, 2001.
[5] P. K. Sahoo, S. Soltani, and A. K. Wong, "A survey of thresholding techniques," 1988.
[6] P. Roy, S. Dutta, N. Dey, G. Dey, S. Chakraborty, and R. Ray, "Adaptive thresholding: A comparative study," in 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies, ICCICCT 2014, 2014.
[7] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, 1979.
[8] J. Kittler and J. Illingworth, "Minimum error thresholding," Pattern Recognition, 1986.
[9] W. Niblack, An Introduction to Digital Image Processing, 1986.
[10] J. Sauvola and M. Pietikäinen, "Adaptive document image binarization," Pattern Recognition, 2000.
[11] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 Document Image Binarization Contest (DIBCO 2009)," in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2009.
[12] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 Document Image Binarization Contest (DIBCO 2011)," in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2011.
[13] I. Pratikakis, K. Zagoris, P. Kaddas, and B. Gatos, "ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018)," in Proceedings of the International Conference on Frontiers in Handwriting Recognition, ICFHR, 2018.
[14] I. K. Kim, D. W. Jung, and R. H. Park, "Document image binarization based on topographic analysis using a water flow model," Pattern Recognition, 2002.
[15] B. Gatos, I. Pratikakis, and S. J. Perantonis, "Adaptive degraded document image binarization," Pattern Recognition, 2006.
[16] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation and stroke edges," International Journal on Document Analysis and Recognition, vol. 13, no. 4, pp. 303-314, 2010.
[17] B. Su, S. Lu, and C. L. Tan, "Binarization of historical document images using the local maximum and minimum," in ACM International Conference Proceeding Series, 2010.
[18] O. Nina, B. Morse, and W. Barrett, "A recursive Otsu thresholding method for scanned document binarization," in 2011 IEEE Workshop on Applications of Computer Vision, WACV 2011, 2011.
[19] O. I. Singh and O. James, "Local contrast and mean based thresholding technique in image binarization," International Journal of Computer Applications, 2012.
[20] N. R. Howe, "Document binarization with automatic parameter tuning," International Journal on Document Analysis and Recognition, 2013.
[21] W. Xiong, X. Jia, J. Xu, Z. Xiong, M. Liu, and J. Wang, "Historical document image binarization using background estimation and energy minimization," in Proceedings - International Conference on Pattern Recognition, 2018.
[22] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proceedings of the IEEE International Conference on Computer Vision, 1998.
[23] I. Pratikakis, K. Zagoris, G. Barlas, and B. Gatos, "ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016)," in Proceedings of the International Conference on Frontiers in Handwriting Recognition, ICFHR, 2016.
[24] A. Horé and D. Ziou, "Image quality metrics: PSNR vs. SSIM," in Proceedings - International Conference on Pattern Recognition, 2010.
[25] H. Lu, A. C. Kot, and Y. Q. Shi, "Distance-reciprocal distortion measure for binary document images," IEEE Signal Processing Letters, 2004.
[26] T. Lelore and F. Bouchara, "Document image binarisation using Markov field model," in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2009.

