This document summarizes an academic paper that proposes an automated workflow for extracting bioclimatic time series data from PDF tables. It discusses how PDF format makes it difficult to extract structured data like tables and the challenges involved. It then surveys existing tools for automated table detection and extraction, emphasizing open source programs. As a case study, it presents a workflow using R and AWK scripts to extract temperature and precipitation data from official PDF documents of the Puglia region of Italy. The document concludes by discussing lessons learned and generalizing the approach.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/371349178

Automated Extraction of Bioclimatic Time Series from PDF Tables

Conference Paper · June 2023

DOI: 10.5194/egusphere-egu23-15072


3 authors:

Sabino Maggi, Italian National Research Council
Silvana Fuina, Università degli Studi di Bari Aldo Moro
Saverio Vicario, Italian National Research Council

All content following this page was uploaded by Sabino Maggi on 07 June 2023.


EGU23-15072
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Automated Extraction of Bioclimatic Time Series from PDF Tables

Sabino Maggi, Silvana Fuina, and Saverio Vicario
CNR National Research Council, Institute of Atmospheric Research, Bari, Italy (sabino.maggi@cnr.it)

Since the development of the original specifications in the '90s, the PDF document format has
become the de facto standard for the distribution and archival of documents in electronic form
because of its ability to preserve the original layout of the documents, independently of the
hardware, operating system and application software used to visualize them.

Unfortunately, the PDF format does not contain explicit structural and semantic information,
making it very difficult to extract structured information from PDF documents, in particular
data presented in tabular form.
The automatic extraction of tabular data is challenging because tables can have extremely
different formats and layouts. It involves several complex steps, from the proper
recognition and conversion of printed text into machine-encoded characters, to the identification
of logically coherent table constructs (headers, columns, rows, spanning elements), and to the
breaking down of the data constructs into elemental objects.
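
As an illustration of the step that identifies columns and rows, the following is a minimal sketch in Python. It is not one of the tools surveyed in this work. It assumes that the table has already been converted to plain text lines in which columns are aligned with spaces, and that columns are separated by at least two blank character positions. The function names and the sample lines are hypothetical.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of the columns in aligned text lines.

    A character position counts as occupied if any line has a non-space
    character there. Runs of occupied positions separated by fewer than
    min_gap blank positions are merged into one column (assumed rule).
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    occupied = [any(row[i] != " " for row in padded) for i in range(width)]

    spans = []
    i = 0
    while i < width:
        if not occupied[i]:
            i += 1
            continue
        start = i
        while i < width and occupied[i]:
            i += 1
        if spans and start - spans[-1][1] < min_gap:
            # Gap too narrow to be a column separator: extend the previous span.
            spans[-1] = (spans[-1][0], i)
        else:
            spans.append((start, i))
    return spans


def split_rows(lines, min_gap=2):
    """Cut every line at the detected column spans and strip the cells."""
    spans = find_column_spans(lines, min_gap)
    width = max(len(line) for line in lines)
    return [[line.ljust(width)[a:b].strip() for a, b in spans] for line in lines]


# Hypothetical sample of a space-aligned table.
sample = [
    "Day  Tmin  Tmax  Rain",
    "1     3.2  11.5   0.0",
    "2     4.1  12.0   2.4",
]
table = split_rows(sample)
```

This simple rule fails on tables with spanning elements or with cells that contain wide internal gaps, which is one reason why real documents usually need format-specific adjustments.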

Several tools have been developed to support the extraction process. In this work we survey the
most interesting tools for the automatic detection and extraction of tabular data, analyzing their
respective advantages and limitations. Particular emphasis is placed on programmable open
source tools because of their flexibility and long-term availability, together with the possibility
of easily adapting them to the specific needs of the problem at hand.

As a practical application, we also present a workflow based on a set of R and AWK scripts that can
automatically extract daily temperature and precipitation data from the official PDF documents
made available each year by Regione Puglia, in Italy. The lessons learned from the development of
this workflow and the possibility of generalizing the approach to different kinds of PDF documents
are also discussed.
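
To give an idea of the final step of such a workflow, the sketch below turns already-extracted table rows into a daily time series. It is written in Python for illustration only and is not the R and AWK code described above. The row layout (day, minimum temperature, maximum temperature, precipitation), the use of a decimal comma, and the markers for missing values are all assumptions, since the actual layout of the source documents is not described here.

```python
from datetime import date

# Hypothetical placeholders for a missing measurement.
MISSING = {"", "-"}


def parse_value(cell):
    """Convert a text cell to float, accepting a decimal comma; None if missing."""
    cell = cell.strip()
    if cell in MISSING:
        return None
    return float(cell.replace(",", "."))


def rows_to_series(rows, year, month):
    """Build (date, tmin, tmax, rain) tuples from rows of four text cells.

    Rows with a different number of cells, header rows, and impossible
    dates (for example day 31 in February) are skipped.
    """
    series = []
    for row in rows:
        if len(row) != 4:
            continue
        day, tmin, tmax, rain = row
        try:
            when = date(year, month, int(day))
        except ValueError:
            continue  # not a day number, or not a valid calendar date
        series.append((when, parse_value(tmin), parse_value(tmax), parse_value(rain)))
    return series


# Hypothetical extracted rows for January 2020.
rows = [
    ["Day", "Tmin", "Tmax", "Rain"],
    ["1", "3,2", "11,5", "0,0"],
    ["2", "-", "12,0", "2,4"],
]
series = rows_to_series(rows, 2020, 1)
```

Keeping missing values as explicit `None` entries, rather than dropping the rows, preserves the regular daily spacing of the series.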
