Data Processing Task
When data is collected and translated into usable information, data processing occurs. Data processing is usually performed
by a data scientist or team of data scientists, and it must be done correctly so as not to
negatively affect the end product, or data output.
Data processing starts with data in its raw form and converts it into a more readable format (graphs, documents,
etc.), giving it the form and context necessary to be interpreted by computers and utilized by employees
throughout an organization.
6 STAGES OF DATA PROCESSING
1. Data collection
Collecting data is the first step in data processing. It is the stage of gathering raw facts
from the environment and preparing them for the input process.
2. Data preparation
- The picture shows the steps in data preparation. The first step
of a data preparation pipeline is to gather data from various
sources and locations. Before any processing is done, we want
to discover what the data is about: at this stage, we understand
the data within the context of business goals, and visualizing
the data is also helpful. The next stage is to cleanse the
data of missing and invalid values and to reformat it into
standard forms. Next, we transform the data for a specific
outcome or audience. We can enrich the data by merging different
datasets to enable richer insights. Finally, we store the data or
send it directly for analytics.
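The preparation steps above can be sketched as a chain of small functions. This is a minimal sketch, not the method behind the original picture: it assumes the gathered data arrives as Python dictionaries, and the datasets (`SURVEY`, `COUNTRIES`), field names and function names are all hypothetical.

```python
import json

# Hypothetical raw records gathered from two sources.
SURVEY = [
    {"id": 1, "age": "34", "region": " North"},
    {"id": 2, "age": "", "region": "south"},      # missing age
    {"id": 3, "age": "-5", "region": "NORTH"},    # invalid age
    {"id": 4, "age": "51", "region": "south "},
]
COUNTRIES = {1: "KE", 4: "UG"}  # second hypothetical dataset, keyed by id


def discover(records):
    """Profile the data before processing: how many records and which fields."""
    fields = sorted({key for record in records for key in record})
    return {"count": len(records), "fields": fields}


def cleanse(records):
    """Drop missing/invalid ages and reformat region names to a standard form."""
    clean = []
    for record in records:
        age = record["age"].strip()
        if not age.isdigit():  # "" and "-5" both fail this check
            continue
        clean.append({"id": record["id"], "age": int(age),
                      "region": record["region"].strip().lower()})
    return clean


def transform(records):
    """Shape the data for a specific audience: add an age band."""
    return [{**r, "age_band": "under 40" if r["age"] < 40 else "40+"} for r in records]


def enrich(records, countries):
    """Merge in a second dataset by id to enable richer insights."""
    return [{**r, "country": countries.get(r["id"], "unknown")} for r in records]


def store(records, path):
    """Persist the prepared data as JSON (it could instead be sent to analytics)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


prepared = enrich(transform(cleanse(SURVEY)), COUNTRIES)
```

Keeping each step as a separate function makes it easy to inspect the data between stages, which matches the "discover before processing" advice.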
3. Data input
It refers to the supply of data for processing. It is the stage where verified data is coded or converted into
machine-readable form so that it can be processed through an application. Most data need to follow a
formal and strict syntax, since a great deal of processing power is required to break down complex data
at this stage.
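As an illustration of converting verified data into machine-readable form under a strict syntax, here is a minimal sketch. The input syntax (an ISO date, a comma, and an amount with two decimals) and the function names are assumptions chosen for the example; lines that break the syntax are rejected rather than guessed at.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical strict input syntax: YYYY-MM-DD,amount (two decimal places).
LINE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2}),(\d+\.\d{2})")


def parse_line(line):
    """Convert one text line into typed values, or raise ValueError if it breaks the syntax."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"invalid input line: {line!r}")
    year, month, day, amount = match.groups()
    # date() also raises ValueError for impossible dates such as February 30.
    return date(int(year), int(month), int(day)), Decimal(amount)


def parse_input(lines):
    """Split raw lines into accepted (typed) records and rejected lines."""
    accepted, rejected = [], []
    for line in lines:
        try:
            accepted.append(parse_line(line))
        except ValueError:
            rejected.append(line)
    return accepted, rejected


accepted, rejected = parse_input(["2024-03-01,19.99", "1/3/2024,19.99", "2024-02-30,5.00"])
```

Here only the first line is accepted: the second uses a different date syntax and the third names a day that does not exist.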
4. Data processing
- It is the stage in which the data is subjected to various powerful technical
manipulations, using techniques such as machine learning and artificial intelligence algorithms, to generate an output or
interpretation of the data. Depending on the type of data, the process may be made up of multiple threads of execution that
execute instructions simultaneously.
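The idea of several threads working on the data at once can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch: a simple word count stands in for the heavier machine learning processing described above, and the chunk size and worker count are arbitrary assumptions. In CPython, threads mainly speed up I/O-bound work; for CPU-bound work, `ProcessPoolExecutor` is the usual alternative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Process one chunk of text lines: count word occurrences."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def process_in_parallel(lines, workers=4, chunk_size=2):
    """Split the data into chunks, process them on several threads, and merge the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # Counter.update adds counts together
    return total


DATA = ["data in", "data out", "Data processing", "output"]  # hypothetical input
result = process_in_parallel(DATA)
```

Because each chunk is processed independently and the partial results are merged at the end, the order in which threads finish does not affect the output.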
1. DOCUMENTS
Documents can come in a variety of forms, whether
they are public records (handbooks, policy outlines,
plans, curricula), personal documents (calendars,
emails, logs), or physical artifacts (handbooks, flyers,
posters, agendas).
2. CASE STUDIES
Case studies are detailed investigations of a particular person or
group of people.
3. AUDIO RECORDING
Audio recordings can be derived from recordings of in-depth
interviews, focus groups, or anything recorded during
observational studies. They can also be content such as
podcasts, newscasts, speeches, or other recorded content.
4. PHOTOGRAPHS
Photographs are any images captured by a camera.
These can be photographs taken in the field, photos
of a research subject or of their work or living space, or
photos of any other artifacts related to the subject of your research.
5. VIDEO RECORDING
Video recordings can include footage taken from in-depth
interviews, focus groups, or observational studies. They can
also be derived from online video content such as YouTube
videos, films, news reports, or videos of events.
ACCURACY
Accuracy is a crucial data quality characteristic because inaccurate information can cause significant
problems with severe consequences.
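Example: In US databases, dates typically follow the MM/DD/YYYY format, whereas in EU databases and
many other countries they follow DD/MM/YYYY. If records from both are combined without converting the format,
a date such as 03/04/2021 can be read as either March 4 or 3 April, and the stored value becomes inaccurate.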
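The date-format problem can be shown with Python's standard `datetime.strptime`. This is a minimal sketch; the constant and function names are illustrative assumptions. The same text yields two different dates depending on which database's format is assumed.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y"  # MM/DD/YYYY
EU_FORMAT = "%d/%m/%Y"  # DD/MM/YYYY


def read_date(text, fmt):
    """Parse a date string using the format of the database it came from."""
    return datetime.strptime(text, fmt).date()


raw = "03/04/2021"
as_us = read_date(raw, US_FORMAT)  # 4 March 2021
as_eu = read_date(raw, EU_FORMAT)  # 3 April 2021
```

Recording the source format for each record and storing dates in an unambiguous form such as ISO 8601 (`as_us.isoformat()` gives `2021-03-04`) avoids this kind of inaccuracy.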
COMPLETENESS
“Completeness” refers to how comprehensive the information is. When looking at data completeness, think about
whether all of the data you need is available; you might need a customer’s first and last name, but the middle initial may
be optional.
Example: a customer's first name and last name are mandatory, but the middle name is optional, so a record can be
considered complete even if a middle name is not available.
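The completeness rule in this example can be written as a simple check. This is a minimal sketch, assuming customer records are dictionaries; the field names and sample customers are hypothetical.

```python
# Mandatory fields; middle_name is optional, so it is deliberately not checked.
REQUIRED_FIELDS = ("first_name", "last_name")


def is_complete(record):
    """A record is complete when every required field is present and non-empty."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def completeness_rate(records):
    """Share of records that are complete, from 0.0 to 1.0."""
    if not records:
        return 0.0
    return sum(is_complete(r) for r in records) / len(records)


customers = [
    {"first_name": "Ann", "last_name": "Lee"},                        # no middle name: still complete
    {"first_name": "Ben", "middle_name": "J", "last_name": "Okafor"},
    {"first_name": "Cara", "last_name": ""},                          # empty last name: incomplete
]
```

Two of the three sample records pass, giving a completeness rate of about 0.67.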
RELIABILITY
Reliability is a vital data quality characteristic. When pieces of information contradict each other, you can’t trust
the data, and a mistake based on it could cost your firm money and damage its reputation.
Example from the healthcare field: if a patient’s birthday is January 1, 1970 in one system, yet it’s June 13, 1973 in
another, the information is unreliable.
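Contradictions like this can be found by comparing the same field across systems. This is a minimal sketch, assuming each system can export birth dates keyed by a shared patient id; the ids and system names are hypothetical.

```python
from datetime import date

# Hypothetical extracts from two systems, keyed by patient id.
SYSTEM_A = {"P001": date(1970, 1, 1), "P002": date(1985, 5, 20)}
SYSTEM_B = {"P001": date(1973, 6, 13), "P002": date(1985, 5, 20)}


def find_conflicts(a, b):
    """Return ids whose birth dates disagree between the two systems."""
    shared = a.keys() & b.keys()  # only patients present in both systems
    return {pid: (a[pid], b[pid]) for pid in sorted(shared) if a[pid] != b[pid]}


conflicts = find_conflicts(SYSTEM_A, SYSTEM_B)
```

Only `P001` is flagged; its record needs to be checked against a trusted source before either value is used.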
RELEVANCE
When you’re looking at data quality characteristics, relevance comes into play because there has to be a good
reason for collecting this information in the first place. You must consider whether you really need this
information, or whether you’re collecting it just for the sake of it.
Example: Your smartphone features a number of sensors: GPS, a compass (magnetometer), an
accelerometer, a microphone, a light sensor and maybe even a fingerprint scanner. An app should collect only the
sensor data relevant to its purpose; a navigation app needs location from GPS, for instance, but not fingerprint scans.
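One way to enforce relevance is to keep only the fields a stated purpose needs. This is a minimal sketch; the sensor reading, the purposes, and the mapping from purpose to sensors are all hypothetical assumptions, not a standard list.

```python
# Hypothetical sensor reading collected by a phone app (placeholder values).
reading = {
    "gps": (51.5, -0.12),
    "compass": 270.0,
    "accelerometer": (0.1, 9.8, 0.2),
    "microphone": b"\x00\x01",
    "light": 320,
    "fingerprint": b"\x02\x03",
}

# Hypothetical mapping from a purpose to the sensors it actually needs.
NEEDED_FOR = {
    "step_counter": {"accelerometer"},
    "navigation": {"gps", "compass"},
}


def keep_relevant(reading, purpose):
    """Drop every field the stated purpose does not need."""
    needed = NEEDED_FOR[purpose]
    return {name: value for name, value in reading.items() if name in needed}
```

Writing the mapping down explicitly forces the "do we really need this?" question to be answered for every field.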
TIMELINESS
Timeliness, as the name implies, refers to how up-to-date information is. If it was gathered in the past hour,
then it’s timely, unless new information has come in that renders the previous information useless.
Example: timeliness can also be measured as the degree to which data delivery from a source system conforms to its
delivery schedule. In large data assets, data is made available only once processing is complete.
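Both views of timeliness can be expressed as simple checks. This is a minimal sketch: the one-hour limit follows the example above, while the 15-minute grace period for deliveries and the function names are assumptions. It does not cover the case where newer information supersedes older data.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=1)  # data gathered within the past hour counts as timely


def is_timely(collected_at, now, max_age=MAX_AGE):
    """True if the data was gathered within max_age of now."""
    return now - collected_at <= max_age


def delivered_on_schedule(expected_at, delivered_at, grace=timedelta(minutes=15)):
    """True if a source system delivered no later than the scheduled time plus an assumed grace period."""
    return delivered_at <= expected_at + grace


now = datetime(2024, 3, 1, 12, 0)
fresh = is_timely(datetime(2024, 3, 1, 11, 30), now)  # gathered 30 minutes ago
stale = is_timely(datetime(2024, 3, 1, 10, 0), now)   # gathered 2 hours ago
```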
THANK YOU