Data Capture and Preparation

• A GIS is a computer-based system that provides the
following four sets of capabilities to handle
georeferenced data:
1. Data capture and preparation
2. Data storage and management
3. Data manipulation and analysis
4. Data presentation/Output
Data capture
• It is the costliest element of any project and of most organizational
implementations (more than 80% of the cost).
• Creating a GIS database is a daunting task
• Involves data capture, verification, and structuring
• Raw geographical data is available in many different analogue and
digital forms, such as maps, aerial photographs, and satellite images.
Major Data Sources for GIS
• Primary data
- Data that is captured directly from the environment,
- e.g. through:
1) Ground-based field surveys.
- questionnaires
- total stations
2) Using remote sensors to obtain
- satellite imagery and aerial photos.
Plane surveying systems
Images: derived from optical and digital remote sensing systems mounted on aircraft and satellites
Factors that may not favor direct data acquisition
• Time: there may not be enough time to collect the
data, or the weather may be unfavorable when the
acquisition technique is weather-dependent.
• Cost: data collection is capital-intensive, so a
limited budget may not favor direct data
acquisition.
• Secondary data
- Data that is not captured directly from the environment:
1) Available digital data (downloads)
2) Reports
3) Data derived from existing paper maps through
scanning and digitizing.
- In recent years there has been a significant
increase in the use of secondary data because of the
availability and sharing of digital (geospatial) data.
Digitizing and scanning
• Digitizing is the conversion of analogue data into digital form.
• A traditional method of obtaining spatial data from
existing paper maps.
• There are two forms of digitizing:
1) On-tablet digitizing: the original map is fitted on a
special surface (the tablet) and the operator traces the
geographic features with a mouse device.
Digitizing Tablet for Vector Data Input
2) On-screen digitizing: a scanned image of the map is shown on the
computer screen, and the operator follows the map features with a mouse
device, thereby tracing the geographic features.
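On-screen digitizing produces positions in image (column, row) space, which must then be converted to map coordinates. The sketch below illustrates that conversion under simplifying assumptions that are not from the slides: a north-up scanned map with square pixels, a known map coordinate for the top-left corner, and hypothetical names (GeoReference, pixel_to_map, digitize_line) and sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoReference:
    """Hypothetical georeference for a north-up scanned map (assumption)."""
    origin_x: float    # map X of the image's top-left corner
    origin_y: float    # map Y of the image's top-left corner
    pixel_size: float  # ground units per pixel; square pixels assumed

    def pixel_to_map(self, col, row):
        # Columns increase eastwards; rows increase downwards (southwards).
        x = self.origin_x + col * self.pixel_size
        y = self.origin_y - row * self.pixel_size
        return (x, y)


def digitize_line(clicks, georef):
    """Convert the operator's mouse clicks (col, row) into map coordinates."""
    return [georef.pixel_to_map(col, row) for col, row in clicks]


# Illustrative values only: a 2 m pixel and an arbitrary origin.
georef = GeoReference(origin_x=500000.0, origin_y=9900000.0, pixel_size=2.0)
road = digitize_line([(0, 0), (10, 5), (25, 5)], georef)
print(road)
```

A rotated or distorted scan would need a full affine or higher-order transformation fitted to control points; this sketch covers only the simplest case.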
Data quality in GIS
• Data Quality refers to how good the data are
• When using a GIS there is sometimes a tendency to assume
that all data are accurate, but this is never the case.
• While some steps can be taken to reduce the impact of
certain types of error, errors can never be completely
eliminated.
• The greater the degree of error in the data, the less reliable
are the results of analyses based upon that data.
• This is sometimes referred to as GIGO (Garbage In, Garbage
Out).


Data Quality
Sources of Errors
• Spatial inaccuracies arise if the coordinates used to
identify the location of an entity or a data point are
recorded incorrectly.
• Attribute errors arise if the attribute data for objects
are measured or recorded incorrectly.
• Attribute data values and characteristics are likely to
change over time, so it is good practice to record
the date and time when the data were collected.
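As a minimal sketch of the points above, the following code checks a geographic coordinate pair for gross spatial errors and stamps each record with its collection time. The function names, field names, and sample values are assumptions made for illustration; the coordinates are assumed to be longitude and latitude in decimal degrees.

```python
from datetime import datetime, timezone


def validate_point(lon, lat):
    """Return a list of problems found in a longitude/latitude pair."""
    problems = []
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude {lon} outside [-180, 180]")
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude {lat} outside [-90, 90]")
    return problems


def make_record(lon, lat, attributes, collected_at=None):
    """Bundle a point with its attributes and the time it was collected."""
    if collected_at is None:
        collected_at = datetime.now(timezone.utc)
    return {
        "lon": lon,
        "lat": lat,
        "attributes": dict(attributes),
        "collected_at": collected_at.isoformat(),
        "problems": validate_point(lon, lat),
    }


# Illustrative values only; the second record contains a typing error.
good = make_record(36.82, -1.29, {"land_use": "residential"})
bad = make_record(36.82, -129.0, {"land_use": "residential"})
print(good["problems"])
print(bad["problems"])
```

Range checks like these catch only gross errors. A swapped longitude and latitude, or a point placed a few hundred metres off, can still pass.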
Stages at which errors may be introduced.
1. Data input
• Primary data acquisition errors occur during data capture or
measurement, e.g. problems with measurement instruments,
sample bias, or misinterpretation of images. Some measurement
methods (e.g. surveying) are more accurate than others
(e.g. interpretation of an air photo).
• Secondary data acquisition errors, e.g. digitizing errors and
typing errors.
Typical Digitizing Errors
Stages at which errors may be introduced
2. Data processing: further errors may be introduced during data
processing, e.g. when converting data from raster to vector, through
rounding-off errors, or by using inappropriate tools for a particular
type of analysis.
3. Data display: e.g. using the wrong scale.
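To give a feel for the rounding-off error mentioned under data processing, the sketch below estimates the largest positional error introduced when a latitude in decimal degrees is rounded to a given number of decimal places. It assumes roughly 111,320 m per degree of latitude, an approximation that is not from the slides, and the function name is hypothetical.

```python
# Approximate length of one degree of latitude in metres (assumption).
METERS_PER_DEGREE_LAT = 111_320.0


def max_rounding_error_m(decimals):
    """Worst-case error, in metres, from rounding a latitude to `decimals` places."""
    # Rounding moves a value by at most half a unit in the last kept place.
    half_step_deg = 0.5 * 10 ** (-decimals)
    return half_step_deg * METERS_PER_DEGREE_LAT


for decimals in (1, 3, 5):
    print(decimals, "decimal places:", round(max_rounding_error_m(decimals), 2), "m")
```

Under this approximation, keeping three decimal places can shift a point by up to about 56 m, while five decimal places keeps the shift below 1 m.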
Data preparation
• Aims at making the acquired spatial data fit for use.
• Satellite imagery may require enhancements.
• Data collected may require conversion to either raster
or vector format.
• Once digitized, vector data may contain errors which
require editing and correction, e.g. overshoots,
duplicate lines, and errors in attribute data.
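One of the listed digitizing errors, duplicate lines, can be flagged automatically before manual editing. The following is a minimal sketch rather than a full cleaning routine: it assumes each line is a single straight segment given as a pair of (x, y) points, treats a segment and its reverse as the same line, and compares coordinates after rounding to a hypothetical tolerance (ndigits).

```python
def find_duplicate_segments(segments, ndigits=3):
    """Return (first_index, duplicate_index) pairs for repeated segments."""
    seen = {}
    duplicates = []
    for index, (start, end) in enumerate(segments):
        # Round so that tiny digitizing jitter does not hide a duplicate.
        a = (round(start[0], ndigits), round(start[1], ndigits))
        b = (round(end[0], ndigits), round(end[1], ndigits))
        # Order the endpoints so direction of digitizing does not matter.
        key = (a, b) if a <= b else (b, a)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Illustrative data: the third segment retraces the first in reverse.
segments = [
    ((0.0, 0.0), (10.0, 0.0)),
    ((10.0, 0.0), (10.0, 5.0)),
    ((10.0, 0.0), (0.0, 0.0)),
]
dupes = find_duplicate_segments(segments)
print(dupes)
```

Overshoots and undershoots need geometric tests (for example, checking dangling endpoints against nearby lines) and are usually handled by the topology tools of the GIS package in use.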
Correction of digitized data
• Given that errors can never be completely eliminated,
good practice entails providing metadata.
• Metadata is data about data. In the GIS context, each set
of data should be accompanied by metadata explaining
not only what it contains but how and when it was
collected, and details relating to its quality.
• The emphasis is not to make the data free of errors but
simply to provide potential users with the information
they require to make an informed decision about the
adequacy of the data for a particular purpose
The table below indicates the type of information the
metadata might include.
Why use and create metadata
1. To help organize and maintain an organization's
spatial data
- Employees may come and go, but metadata can
catalogue the changes and updates made to each
spatial data set and how each employee implemented them.
2. To provide information to other organizations and
clearinghouses to facilitate data sharing and transfer
- It makes sense to share existing data sets rather
than producing new ones if they are already available
3. To document the history of a spatial data set
- Metadata documents what changes have been
made to each data set, such as changes in geographic
projection, adding or deleting attributes, editing line
intersections, or changing file formats. All of these
could have an effect on data quality.
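The ideas above can be sketched as a small metadata record that travels with a data set and keeps a history of changes. All class names, field names, and sample values here are hypothetical and chosen for illustration; real projects would normally follow an established metadata standard rather than an ad hoc structure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeEntry:
    when: date  # date the change was made
    who: str    # person who made the change
    what: str   # description, e.g. a reprojection or an attribute edit


@dataclass
class DatasetMetadata:
    title: str
    source: str          # how the data were obtained
    collected_on: date   # when the data were collected
    projection: str
    quality_notes: str   # known limitations, accuracy statements
    history: list = field(default_factory=list)

    def log_change(self, when, who, what):
        """Append one entry to the data set's change history."""
        self.history.append(ChangeEntry(when, who, what))


# Illustrative values only.
meta = DatasetMetadata(
    title="Road centrelines",
    source="On-screen digitizing of a scanned paper map",
    collected_on=date(2020, 3, 14),
    projection="UTM zone 37S",
    quality_notes="Overshoots corrected; attributes not field-checked",
)
meta.log_change(date(2021, 6, 1), "analyst A", "Changed file format")
meta.log_change(date(2022, 1, 10), "analyst B", "Deleted unused attribute")
for entry in meta.history:
    print(entry.when.isoformat(), entry.who, entry.what)
```

Because each change is logged with a date and a name, a later user can judge whether an edit such as a reprojection might have affected data quality.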

