
Perpustakaan Nasional Indonesia:

1.197
Library in Universitas Indonesia:
2394 Reksa Pustaka:
Radya pustaka: 500
many

Yayasan Sastra Lestari: million Sana Pustaka: 750

Pura Pakualaman:
251
Sonobudaya: 1255 Widya Budaya:
Artati's Library in Sanata Dharma University:
750
225

: 7.000 enaugh manuscripts

Data Ref: Marsono (2010), Behrend (1990), Behrend and Pudjiastuti (1997), Saktimulya (2005), Budiarti et al. (2007)
Map Ref: http://dokterpenulis.files.wordpress.com/2008/04/javamap.jpg, accessed 6 June 2012
WSEAS - KOS Island Greece July 14-17, 2012
There are 200 manuscripts of Old Javanese texts, in the form of manuscripts, book
notes, and transcripts, together with at least 6,000 books from the collection of
Pater Prof. Zoetmulder. They were left as a legacy to Sanata Dharma University,
Yogyakarta, and are stored in the library in the Artati Building.

a. A written or printed paper that bears the original,
official, or legal form of something and can be
used to furnish decisive evidence or information.
b. Something, such as a recording or a photograph,
that can be used to furnish evidence or
information.
c. A writing that contains information.
d. Computer Science. A piece of work created with
an application, as by a word processor.
e. Computer Science. A computer file that is not an
executable file and contains data for use by
applications.
• A document image is a visual representation of a paper document such as a journal, facsimile output, office correspondence, a form, and so on (Srihari et al., 1986).
• Document image recognition is the effort to turn a document image into a semantic representation (Srihari et al., 1986).
Figure: envelope with labeled regions: Meter Mark; Digital; Sender's Address; Endorsement; Post Mark; Linear Code; "In Case of Undeliverable as Addressed Return to Sender"; Delivery Address; Personal DL.
Document page: 300 dpi, 8.5x11 in; 2,550 x 3,300 pixels; 255 gray x 3 color
→ Data capture
10^7 pixels
→ Pixel-level processing
7,500 character boxes, 15x20 pixels each
500 line and curve segments, 20 to 20,000 pixels each
10 filled regions, 20x20 to 200x200 pixels each
→ Feature-level processing
7500x10 character features
500x5 line and curve features
10x5 region features
→ Text analysis & recognition: 1,500 words, 10 paragraphs, 1 title, 2 subtitles, etc.
→ Graphics analysis & recognition: 2 line diagrams, 1 company logo, etc.
→ Document Description
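The counts in the diagram above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes only the numbers given on the slide (300 dpi, 8.5x11 in, and the item-times-feature counts). The variable names are illustrative and do not come from the source.

```python
# Page geometry taken from the slide: 8.5 x 11 in scanned at 300 dpi.
DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11

width_px = round(WIDTH_IN * DPI)     # 2,550 pixels across
height_px = round(HEIGHT_IN * DPI)   # 3,300 pixels down
total_pixels = width_px * height_px  # 8,415,000, i.e. on the order of 10^7

# Feature counts after feature-level processing:
# (number of items) x (features per item), as listed on the slide.
feature_counts = {
    "character": 7500 * 10,
    "line_and_curve": 500 * 5,
    "region": 10 * 5,
}
total_features = sum(feature_counts.values())

print(width_px, height_px, total_pixels)
print(feature_counts, total_features)
print(f"pixels per feature value: about {total_pixels / total_features:.0f}")
```

Each stage reduces the amount of data by a large factor: roughly ten million pixel values become tens of thousands of feature values before recognition starts.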
Data capture is the stage in which data is read from a paper document using an optical scanner and the result is stored as a file in pixel form.
• Yayasan Sastra Lestari → document images in Javanese script
• UCLA
• Pixel-Level Processing
  - The pixel-level processing stage aims to prepare the document image and to create intermediate features that help recognize the image.
• Feature-Level Analysis
  - Feature-level analysis is the stage that processes the output of pixel-level processing into information that is easier for humans to understand.
• Text Analysis
  - Two types of analysis can be applied to the text in a document. The first is character recognition, which recognizes characters and words from a bit-based image. The second is page layout analysis, which determines the format of the text and its meaning in relation to the position and function of the text.
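As a rough illustration of how pixel-level processing can produce intermediate features for the later stages, the sketch below binarizes a tiny grayscale image and extracts bounding boxes of connected ink regions, comparable to character boxes. The slides do not name specific algorithms. The fixed threshold of 128, the 4-connectivity rule, the toy image, and the function names `binarize` and `connected_components` are all assumptions made for this example.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Return 1 (ink) where a pixel is darker than the threshold, else 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def connected_components(binary):
    """Find 4-connected ink regions; return bounding boxes (top, left, bottom, right)."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy 4x6 grayscale image: light background (250) with two dark blobs.
gray = [
    [250, 250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250, 250],
    [250,  25, 250, 250,  10, 250],
    [250, 250, 250, 250,  15, 250],
]
binary = binarize(gray)
boxes = connected_components(binary)
print(boxes)
```

A character recognizer would then work on the pixels inside each box. Real scans usually need an adaptive threshold and noise removal before this step.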
Document Image Analysis
Processing | Text | Graphics
Pixels | Preprocessing: representation, noise removal, binarization, skew, script id, font id | Preprocessing: representation, noise removal, binarization, thinning, vectorization
Primitives | Glyph Recognition: connected components, strokes, punctuations, words | Primitive Recognition: straight lines, curve segments, junctions, nodes, loops, characters
Structures | Text Recognition: word segmentation, text line reconstruction, table analysis, linguistics | Structure Recognition: text fields, legends, labels, dimensions, graphics symbols
Documents | Page Layout Analysis: text versus non-text, physical component analysis, logical component analysis, functional component analysis, compression | Interpretation: component recognition, connectivity analysis, CAD layer separation, database attribute extraction, compression
Corpus | Information Retrieval: document classification, indexing, search, security, authentication, privacy | Database, CAD: validation, search, update
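Locating text lines is a first step toward the text line reconstruction listed in the text column above. One common way to do it is a horizontal projection profile: count the ink pixels in each row and treat each run of non-empty rows as one line. The sketch below assumes a binarized, deskewed page where 1 means ink. The function name `text_line_spans` and the toy page are hypothetical, and the slides do not say which method is used.

```python
def text_line_spans(binary):
    """Return (first_row, last_row) for each run of rows that contain ink."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    spans, start = [], None
    for r, count in enumerate(profile):
        if count > 0 and start is None:
            start = r                     # a text line begins
        elif count == 0 and start is not None:
            spans.append((start, r - 1))  # the text line ended on the previous row
            start = None
    if start is not None:                 # a line running to the bottom edge
        spans.append((start, len(profile) - 1))
    return spans

# Toy binary page: three "text lines" separated by blank rows.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
]
spans = text_line_spans(page)
print(spans)
```

This only works when lines are close to horizontal, which is why skew correction appears at the pixel level of the table.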
Type | Example | DIA Task | Ancillary Data
Plain text narrative | Moby Dick | Extract word order | English lexicon
Newspaper, magazine | NY Times, Vogue | Separate and reassemble articles, pointers to illustrations | Publication specific format
Scholarly, technical text | IEEE PAMI | Index, author, title, page, figs, table, footnotes, equations | Abbreviations, acronyms, units
Formal text | Program listing, chess, bridge, recipe | Extract executable form | Program, chess, bridge syntax
Letter, Envelope | Recommendation | Sender, date, subject, routing info | Directories
Directory | Telephone book | Extract name phone pairs | Previous edition
Structured List | Table of Contents | Recover hierarchy, cross-refs | Previous edition
Business Forms | Order, invoice | Convert to XML, link to Database | Database form
Engineering Drawing | Part drawing, isometric view | Convert to CAD format | Part list, drawing standards
Schematic Diag | Circuits | Convert to CAD format | Constraints
Map | Street map | Convert to GIS format | GIS, other maps
Music score | Moonlight Sonata | Recover MIDI representation | Music syntax
Table | Stock quotes | Construct model; header-entries | Stock abbreviations
