Professional Documents
Culture Documents
topper world Data-science-lifecycle-fnl
topper world Data-science-lifecycle-fnl
Topperworld.in
⚫ Data Science Lifecycle revolves around the use of machine learning and
different analytical strategies to produce insights and predictions from
information in order to acquire a commercial enterprise objective.
⚫ The complete method includes a number of steps like data cleaning,
preparation, modelling, model evaluation, etc.
⚫ It is a lengthy procedure and may additionally take quite a few months to
complete. So, it is very essential to have a generic structure to observe for
each and every hassle at hand.
⚫ The globally mentioned structure in fixing any analytical problem is
referred to as a Cross Industry Standard Process for Data Mining or CRISP-
DM framework.
©Topperworld
Data Science
The following are some primary motives for the use of Data science
technology:
©Topperworld
Data Science
1. Clearly state the problem that requires solutions and why it should
be resolved at once
2. Define the potential value of the business project
3. Find risks, including ethical aspects involved in the project
4. Build and communicate a highly integrated, flexible project plan
2.Data collection:
⚫ Data collection is the next stage in the data science lifecycle to gather raw
data from relevant sources. The data captured can be either in structured
or unstructured form.
⚫ The methods of collecting the data might come from – logs from websites,
social media data, data from online repositories, and even data streamed
©Topperworld
Data Science
from online sources via APIs, web scraping or data that could be present in
excel or any other source.
⚫ The person performing the task should know the difference between
various data sets available and the data investment strategy of an
organization.
⚫ A major challenge faced by professionals in this step is tracking where each
data comes from and whether it is up-to-date.
⚫ It is important to keep track of this information throughout the entire
lifecycle of a data science project as it might help test hypotheses or run
any other updated experiments.
3.Data processing:
⚫ In this phase, data scientists analyse the data collected for biases, patterns,
ranges, and distribution of values.
⚫ It is done to determine the sustainability of the databases and predicts their
usage in regression, machine learning and deep learning algorithms.
⚫ The phase also involves the introspection of different types of data,
including nominal, numerical, and categorical data.
⚫ Data visualisation is also done to highlight the critical trends and patterns
of data, comprehended by simple bars and line charts.
⚫ Simply put, data processing might be the most time-consuming but
arguably the most critical phase in the entire life cycle of data analytics. The
goodness of the model depends on this data processing stage.
4.Data analysis:
©Topperworld
Data Science
5.Data modelling:
⚫ Modelling Data is one of the major phases of data processes and is often
mentioned as the heart of data analysis.
⚫ A model should use prepared and analysed data to provide the desired
output.
⚫ The environment needed for executing the data model will be decided and
created before meeting the specific requirements.
⚫ In this phase, the team works together to develop datasets for training and
testing the model for production purposes.
⚫ It also involves various tasks such as choosing the appropriate mode type
and learning whether the problem is a classification, regression, or
clustering problem.
©Topperworld
Data Science
⚫ After analysing the model family, you must choose the algorithms to
implement them.
⚫ It has to be done carefully since extracting necessary insights from the
prepared data is extremely important.
6.Model deployment:
⚫ Now, we are at the final stage of the lifecycle of data science. After a
rigorous evaluation process, the model is finally prepared to be deployed
in the desired format and preferred channel.
⚫ Remember, there is no value for the machine learning model until it’s
deployed to production. Hence machine learning models have to be
recorded before the deployment process. In general, these models are
integrated and coupled with products and applications.
⚫ The stage of Model deployment involves the creation of a delivery
mechanism required to get the mode out in the market among the users or
to another system.
⚫ Machine learning models are also deployed on devices and gaining
adoption and popularity in the field of computing. From simple model
output in a Tableau Dashboard to a complex as scaling it to cloud in front
of millions of users, this step is distinct for different projects.
©Topperworld