Raw Data, Also Known As Primary Data, Is


Raw data, also known as primary data, is data (e.g., numbers, instrument readings, figures,
etc.) collected from a source. If a scientist sets up a computerized thermometer which records the
temperature of a chemical mixture in a test tube every minute, the list of temperature readings for
every minute, as printed out on a spreadsheet or viewed on a computer screen, is "raw data". Raw
data has not been subjected to processing, to "cleaning" by researchers to remove outliers, obvious
instrument reading errors or data entry errors, or to any analysis (e.g., determining central tendency
measures such as the average or median result). Nor has raw data been subject to any other
manipulation by a software program or a human researcher, analyst or technician. Raw data is a
relative term (see data), because even once raw data
has been "cleaned" and processed by one team of researchers, another team may consider this
processed data to be "raw data" for another stage of research. Raw data can be input to a
computer program or used in manual procedures such as analyzing statistics from a survey. The
term "raw data" can also refer to the binary data on electronic storage devices, such as hard disk
drives (also referred to as "low-level data").
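
The thermometer example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the error code -999.0, and the plausible range of 0 to 50 degrees Celsius are all invented for the sketch and do not come from any real instrument.

```python
from statistics import mean, median

# Hypothetical raw readings in degrees Celsius, one per minute.
# -999.0 stands in for an instrument error code (an assumption of this sketch),
# and 85.0 plays the role of an obvious outlier.
raw_readings = [21.4, 21.6, 21.5, -999.0, 21.9, 22.1, 85.0, 22.0]

def clean(readings, low=0.0, high=50.0):
    """Keep only readings inside an assumed plausible range [low, high]."""
    return [r for r in readings if low <= r <= high]

# The raw list is left untouched; cleaning produces a new, processed list.
cleaned = clean(raw_readings)

print("raw count:", len(raw_readings), "cleaned count:", len(cleaned))
print("mean:", mean(cleaned), "median:", median(cleaned))
```

Keeping `raw_readings` unchanged and deriving `cleaned` from it mirrors the point that another team may later want to treat either list as its own "raw data".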


Generating data
Data can be created or generated in two ways. The first is what is called 'captured data',[1] which
is found through purposeful investigation or analysis. The second is called 'exhaust data',[1] which is
usually gathered by machines or terminals as a secondary function. For example, cash registers,
smartphones, and speedometers serve a main function but may collect data as a secondary task.
Exhaust data is usually too large or of too little use to process, so it becomes 'transient'[1] and is thrown
away. However, 'derived'[1] data is useful enough in nature to be further processed for use.
Examples include smartphone data, traffic data, and hospital data.

Examples
In computing, raw data may have the following attributes: it may contain human,
machine, or instrument errors; it may not be validated; it might be in different (colloquial)
formats; it may be uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring
confirmation or citation. For example, a data input sheet might contain dates as raw data in many
forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this
raw data may be processed and stored in a normalized format, perhaps a Julian date, to make it easier
for computers and humans to interpret during later processing. Raw data (sometimes colloquially
called "sourcey" data or "eggy" data, the latter a reference to the data being "uncooked", that is,
"unprocessed", like a raw egg) is the data input to processing. A distinction is made between
data and information, to the effect that information is the end product of data processing. Raw
data that has undergone processing is sometimes referred to as "cooked" data in a colloquial
sense. Although raw data has the potential to be transformed into "information",
extraction, organization, analysis and formatting for presentation are required before it
can be transformed into usable information.
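
A minimal sketch of the date normalization described above follows. It makes several assumptions that are not in the text: the normalized format is an ISO 8601 string rather than a Julian date, dates are written day-first, month names are in English, and a missing year or the word "today" is resolved against the day the sheet was filled in. The function name `normalize_date` and the variable `entry_day` are hypothetical.

```python
import re
from datetime import date, datetime

# Formats that carry an explicit year, tried in order (day-first is assumed).
FORMATS_WITH_YEAR = ("%d %B %Y", "%d/%m/%Y", "%d/%m/%y")

def normalize_date(raw, today=None):
    """Return an ISO 8601 date string, or None if the entry cannot be parsed."""
    today = today or date.today()
    text = raw.strip()
    if text.lower() == "today":
        return today.isoformat()
    # Drop ordinal suffixes so that "31st" becomes "31".
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)
    for fmt in FORMATS_WITH_YEAR:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Entries such as "31 Jan" omit the year; assume the year of entry.
    try:
        parsed = datetime.strptime(f"{text} {today.year}", "%d %b %Y")
        return parsed.date().isoformat()
    except ValueError:
        return None  # a "suspect" entry, left for a human to confirm

entry_day = date(1999, 1, 31)  # assumed date on which the sheet was filled in
for raw in ["31st January 1999", "31/01/1999", "31/1/99", "31 Jan", "today"]:
    print(repr(raw), "->", normalize_date(raw, today=entry_day))
```

Returning `None` instead of guessing keeps unparseable entries visible, so they can be confirmed by hand rather than silently converted.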

For example, a point-of-sale terminal (POS terminal, a computerized cash register) in a busy
supermarket collects huge volumes of raw data each day about customers' purchases. However,
this list of grocery items and their prices and the time and date of purchase does not yield much
information until it is processed. Once processed and analyzed by a software program or even by
a researcher using a pen and paper and a calculator, this raw data may indicate the particular
items that each customer buys, when they buy them, and at what price. In addition, an analyst or
manager could calculate the average total sales per customer or the average expenditure per day
of the week by hour. This processed and analyzed data provides information
that the manager could then use to determine, for example, how many cashiers to hire
and at what times. Such information could then become data for further processing, for example
as part of a predictive marketing campaign. As a result of processing, raw data is sometimes
stored in a database, which makes it accessible for further
processing and analysis in any number of different ways.
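
The point-of-sale example can be illustrated with a short Python sketch. The records, customer identifiers, and field layout below are invented for the illustration; a real terminal would log far more rows in its own format. The sketch totals spending per customer, averages those totals, and sums spending by weekday and hour.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical raw records: (customer_id, item, price, ISO timestamp).
raw_sales = [
    ("c1", "milk", 1.50, "2024-03-04T09:15:00"),
    ("c1", "bread", 2.50, "2024-03-04T09:15:00"),
    ("c2", "eggs", 3.00, "2024-03-04T09:40:00"),
    ("c3", "coffee", 6.00, "2024-03-05T17:05:00"),
]

def total_per_customer(records):
    """Sum the prices of all items bought by each customer."""
    totals = defaultdict(float)
    for customer, _item, price, _ts in records:
        totals[customer] += price
    return dict(totals)

def spend_by_weekday_hour(records):
    """Sum spending for each (weekday name, hour of day) slot."""
    slots = defaultdict(float)
    for _customer, _item, price, ts in records:
        when = datetime.fromisoformat(ts)
        slots[(DAYS[when.weekday()], when.hour)] += price
    return dict(slots)

totals = total_per_customer(raw_sales)
average_per_customer = sum(totals.values()) / len(totals)
by_slot = spend_by_weekday_hour(raw_sales)

print("average total per customer:", round(average_per_customer, 2))
print("spend by weekday and hour:", by_slot)
```

A table like `by_slot` is the kind of processed result a manager could read when deciding how many cashiers to schedule at a given time.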

Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important for
society. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to
action is "Raw Data Now", meaning that everyone should demand that governments and
businesses share the data they collect as raw data. He points out that "data drives a huge amount
of what happens in our lives because somebody takes the data and does something with it." To
Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge.
Advocates of open data argue that once citizens and civil society organizations have access to
data from businesses and governments, it will enable citizens and NGOs to do their own analysis
of the data, which can empower people and civil society. For example, a government may claim
that its policies are reducing the unemployment rate, but a poverty advocacy group may be able
to have its staff econometricians do their own analysis of the raw data, which may lead this
group to draw different conclusions about the data set.
