
INTERNAL

ASSIGNMENT-1

BDA

NAME: V.Nantha kumar
Roll No: 19BCA52
Course Code: BCA3642

DATA ANALYTICS LIFECYCLE
The Data Analytics Life Cycle defines how data is carried through various phases by the professionals working on a project. It is a step-by-step procedure arranged in a circular structure, and each phase has its own characteristics and importance.

Phase 1: Data Discovery

This initial phase defines the project's purpose and how the data analytics life cycle will be completed. The team first identifies all the critical objectives and builds an understanding of the business domain, then accumulates the resources by analyzing the models that are intended to be developed and evaluating the data sources needed.
Phase 2: Data Preparation
This phase involves collecting, processing, and cleansing the data prior to modeling and analysis. One of the main concerns is ensuring data availability for processing: the team identifies the various data sources and analyzes how much data can be accumulated within the time frame. Data collection methods used in this phase are:

 Data acquisition: collecting data from external sources.
 Data entry: preparing data points through manual entry or digital systems.
 Signal reception: accumulating data from digital devices such as IoT devices and control systems.

Common tools used are — Hadoop, Spark, OpenRefine, etc.
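The cleansing step of this phase can be sketched in a few lines of Python. The record layout and the rules (drop duplicates, discard rows with no id, coerce values to numbers) are hypothetical illustrations, not taken from any particular tool:

```python
# Minimal data-cleansing sketch for the preparation phase.
# The records and cleansing rules below are hypothetical.

def cleanse(records):
    """Drop duplicates, discard rows missing an id, normalize value types."""
    seen = set()
    cleaned = []
    for rec in records:
        key = rec.get("id")
        if key is None or key in seen:   # missing or duplicate id
            continue
        seen.add(key)
        cleaned.append({
            "id": key,
            "value": float(rec.get("value", 0) or 0),  # coerce, default 0.0
        })
    return cleaned

raw = [
    {"id": 1, "value": "3.5"},
    {"id": 1, "value": "3.5"},   # duplicate row
    {"id": None, "value": "9"},  # missing id
    {"id": 2},                   # missing value -> 0.0
]
print(cleanse(raw))  # [{'id': 1, 'value': 3.5}, {'id': 2, 'value': 0.0}]
```

In practice, tools like OpenRefine or Spark perform this kind of work at scale, but the logic is the same.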

Phase 3: Model Planning


In this phase, the team will analyze the quality of the data and find
an appropriate model for the project. An analytic sandbox is used to work
with the data and to perform analytics throughout the project duration.
Data can be loaded into the sandbox in three ways:

 Extract, Transform, Load (ETL) — the data is transformed based on a set of business rules and then loaded into the sandbox.
 Extract, Load, Transform (ELT) — the data is loaded into the sandbox and then transformed according to a set of business rules.
 Extract, Transform, Load, Transform (ETLT) — a combination of ETL and ELT with two transformation levels.

After cleaning the data, the team determines the techniques, methods, and workflow for building a model in the next phase. The team first explores the data, identifies the relations between data points to select the key variables, and eventually devises a suitable model.

Common tools used are — R, SAS/ACCESS, SQL Analysis Services, etc.
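The difference between ETL and ELT described above is purely one of ordering; a minimal sketch, where the extract step, the business rule (trim, lowercase, deduplicate), and the data are all hypothetical:

```python
# ETL vs ELT: the same steps applied in a different order.
# The extract source and transformation rule are hypothetical.

def extract():
    return [" Alice ", " bob ", "ALICE "]

def transform(rows):
    # Business rule: trim whitespace, lowercase, deduplicate, sort.
    return sorted({r.strip().lower() for r in rows})

def etl(sandbox):
    sandbox.extend(transform(extract()))   # transform BEFORE loading
    return sandbox

def elt(sandbox):
    sandbox.extend(extract())              # load the raw data first...
    sandbox[:] = transform(sandbox)        # ...then transform inside the sandbox
    return sandbox

print(etl([]))  # ['alice', 'bob']
print(elt([]))  # ['alice', 'bob']
```

Both paths end with the same cleaned data in the sandbox; ELT simply defers the transformation until after loading, and ETLT would apply `transform` both before and after the load.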

Phase 4: Model Building

The team develops training, testing, and production datasets in this phase. Once this is done, it builds and executes the models: various statistical models, such as regression and decision trees, are run against the test data to determine how well they fit the datasets. Although the modeling techniques and logic required to develop models can be highly complex, the actual duration of this phase can be short compared to the time spent preparing the data and defining the approaches. Once the data science team can articulate whether the model is sufficiently robust to solve the problem or whether it has failed, it can move to the next phase.
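As a toy illustration of one of the statistical models mentioned, here is simple linear regression fit on a training set and checked on a held-out test set. The data points and split are made up for the sketch:

```python
# Simple linear regression (y = a + b*x) fit by least squares on a
# training set, then evaluated on unseen test points. Data is made up.

def fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Training data follows y = 2x + 1 exactly.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
a, b = fit(train_x, train_y)

# Evaluate on held-out test points.
test_x = [5, 6]
preds = [a + b * x for x in test_x]
print(a, b)   # 1.0 2.0
print(preds)  # [11.0, 13.0]
```

Real projects would use R, scikit-learn, or SAS for this, but the train/evaluate separation is the essential idea of the phase.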

Phase 5: Communicate Results

This phase determines whether the results are a success or a failure. The data analysis results are evaluated, and the team considers how best to present the findings and outcomes to the various team members and stakeholders, taking caveats and assumptions into account. The team identifies the key findings, quantifies the business value, and develops a narrative to summarize and convey the findings to stakeholders; it also makes recommendations for future work or improvements to existing processes. Stakeholders must understand how the model affects their processes.

Phase 6: Operationalize

In the final phase, the team presents the full in-depth report, with the briefings, code, key findings, and all the technical documents and papers, to the stakeholders. In this process, the data is moved from the sandbox and run in a live environment. This approach helps the team learn about the performance and constraints of the model in a live environment on a small scale and make the necessary adjustments before full deployment. The results are closely monitored to ensure they match the expected goals. If the findings fit the objective, the report can be finalized; the model can then be deployed and integrated into the business.
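The small-scale check described above can be sketched as a simple gate that compares the live run's metrics against the expected goals before full deployment. The metric names and thresholds here are hypothetical:

```python
# Hypothetical pre-deployment gate: promote the model only if its
# metrics from the small-scale live run meet the expected goals.

def ready_to_deploy(live_metrics, goals):
    """Every goal metric must meet or exceed its threshold."""
    return all(live_metrics.get(name, 0.0) >= threshold
               for name, threshold in goals.items())

goals = {"accuracy": 0.90, "recall": 0.80}

print(ready_to_deploy({"accuracy": 0.93, "recall": 0.85}, goals))  # True
print(ready_to_deploy({"accuracy": 0.93, "recall": 0.70}, goals))  # False
```

A missing metric counts as a failure here, which mirrors the idea that results must be closely monitored against every stated goal before the report is finalized.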
