Predictive Analysis Using Python: A Summer Training Report


Predictive analysis using Python

A SUMMER TRAINING REPORT

Submitted by
Vishal Kanwar

UID: 19BCS1406

in partial fulfilment of the requirements of summer training for the award of the degree of

BACHELOR OF ENGINEERING IN COMPUTER SCIENCE

APEX INSTITUTE OF TECHNOLOGY


CHANDIGARH UNIVERSITY, GHARUAN

JULY 2021

CHANDIGARH UNIVERSITY, GHARUAN, MOHALI


CANDIDATE'S DECLARATION

I, Vishal Kanwar, hereby declare that I have undertaken six weeks of industrial
training at Coursera during the period from June to July 2021, in partial
fulfilment of the requirements for the award of the degree of B.E. (COMPUTER
SCIENCE & ENGINEERING) at CHANDIGARH UNIVERSITY, GHARUAN, MOHALI. The work
being presented in the training report submitted to the Department of Computer
Science & Engineering at CHANDIGARH UNIVERSITY, GHARUAN, MOHALI is an
authentic record of training work.

Signature of the Student

Vishal Kanwar
CERTIFICATE


ACKNOWLEDGEMENT

The summer training opportunity I had with COURSERA was a great chance
for learning and professional development. I therefore consider myself a
very lucky individual, as I was provided with an opportunity to be part of
it. I am also grateful for the chance to meet so many wonderful people and
professionals who led me through this internship period.

I perceive this opportunity as a big milestone in my career development. I
will strive to use the skills and knowledge I have gained in the best
possible way, and I will continue to work on improving them in order to
attain my desired career objectives. I hope to continue our cooperation in
the future.
Sincerely,

Vishal Kanwar

July 20th 2021


ABSTRACT

Background
Our understanding of the etiology, pathophysiology, phenotypic diversity, and
progression of Parkinson’s disease has stagnated. Consequently, patients do not
receive the best care, leading to unnecessary disability, and to mounting costs for
society. The Personalized Parkinson Project (PPP) proposes an unbiased approach
to biomarker development with multiple biomarkers measured longitudinally. Our
main aims are: (a) to perform a set of hypothesis-driven analyses on the
comprehensive dataset, correlating established and novel biomarkers to the rate of
disease progression and to treatment response; and (b) to create a widely accessible
dataset for discovery of novel biomarkers and new targets for therapeutic
interventions in Parkinson’s disease.

Methods/design
This is a prospective, longitudinal, single-center cohort study. The cohort will
comprise 650 persons with Parkinson’s disease. The inclusion criteria are purposely
broad: age ≥ 18 years; and disease duration ≤5 years. Participants are followed for 2
years, with three annual assessments at the study center. Outcomes include a clinical
assessment (including motor and neuro-psychological tests), collection of
biospecimens (stool, whole blood, and cerebrospinal fluid), magnetic resonance
imaging (both structural and functional), and ECG recordings (both 12-lead and
Holter). Additionally, collection of physiological and environmental data in daily life
over 2 years will be enabled through the Verily Study Watch. All data are stored with
polymorphic encryptions and pseudonyms, to guarantee the participants’ privacy on
the one hand, and to enable data sharing on the other. The data and biospecimens
will become available for scientists to address Parkinson’s disease-related research
questions.

Discussion
The PPP has several distinguishing elements: all assessments are done in a single
center; inclusion of “real life” subjects; deep and repeated multi-dimensional
phenotyping; and continuous monitoring with a wearable device for 2 years. Also,
the PPP is powered by privacy and security by design, allowing for data sharing with
scientists worldwide respecting participants’ privacy. The data are expected to open
the way for important new insights, including identification of biomarkers to predict
differences in prognosis and treatment response between patients. Our long-term
aim is to improve existing treatments, develop new therapeutic approaches, and
offer Parkinson’s disease patients a more personalized disease management
approach.

Trial registration
ClinicalTrials.gov NCT03364894. Registered December 6, 2017 (retrospectively
registered).
CONTENTS
Title Page i
About the company ii
Certificate iii
Acknowledgement iv

ABSTRACT
CONTENTS 1
CHAPTER 1 INTRODUCTION 2
1.1 PREDICTIVE ANALYSIS 2
CHAPTER 2 THEORY 4
2.1 PYTHON 4
2.2 MACHINE LEARNING 5
2.3 DATA SET 7
CHAPTER 3 METHODOLOGY ADOPTED 14
CHAPTER 4 CONCLUSIONS AND FUTURE SCOPE OF STUDY 15
REFERENCES 16

CHAPTER: 1 INTRODUCTION
1.1 PREDICTIVE ANALYSIS

Making future predictions about unknown events with the help of techniques
from data mining, statistics, machine learning, mathematical modelling, and
artificial intelligence is known as predictive analytics. It makes predictions
with the help of past data. We use predictive analytics in our day-to-day
lives without giving it much thought, for example, when predicting the sales
of an item (say, flowers) in a market for a particular day. If it is
Valentine's Day, the sales of roses will be high! We can easily say that the
sales of flowers will be higher on festive days than on regular days.

In predictive analytics, we find the factors responsible, gather data, and
apply techniques from machine learning, data mining, predictive modelling,
and other analytical disciplines to predict the future. The insights drawn
from the data include patterns and relationships among different factors
that may previously have been unknown. Unravelling those hidden insights is
worth more than you might think. Businesses use predictive analytics to
enhance their processes and to achieve their targets. Insights obtained from
both structured and unstructured data can be used for predictive analytics.

How do data insights help?


In recent years, organizations have opted to collect vast amounts of data,
assuming that if they harvest enough of it, it will eventually give rise to
relevant business insights; even Instagram and Facebook provide insights to
business accounts. But data in its raw form is not useful, no matter how
large it is. The more data there is to wade through, the more difficult it
is to separate valuable business information from the irrelevant. A data
insights strategy is built on the premise that to realize the true potential
of data, you first need to determine why you are using it and what business
value you hope to glean from it. Here is how to obtain insights from data
and make use of them.

1. Defining the problem statement/business goal.

Define the project outcomes, deliverables, scope of the effort, and business
objectives, and prepare a questionnaire for the data to be obtained based on
the business goal.

2. Collection of data based on the answers to the questions created from the
problem statement.

Based on the questionnaire, collect answers in the form of datasets.

3. Integrate the data obtained from various sources.

Data mining for predictive analytics prepares data from multiple sources for
analysis. This provides a complete view of customer interactions.

4. Analysis of data with analytics tools/software.

We can visualize the data to observe patterns and relationships among
various factors. Data analysis is the process of inspecting, cleansing,
transforming, and modelling data with the objective of discovering useful
information to arrive at a conclusion.

5. Validate assumptions and hypotheses and test them using statistical models.

Statistical analysis enables the validation of the assumptions and
hypotheses and tests them using statistical models. The assumptions are
based on the problem statement and are formed during exploratory data
analysis (EDA).

6. Model generation.

A model is generated with algorithms to automate the process as new data is
combined with existing data. Multiple models can also be combined to obtain
better results.

7. Deploying the model to generate predictions and monitoring them for
accuracy.

Predictive model deployment provides the option to deploy the analytical
results into the everyday decision-making process to get results, reports,
and output by automating decisions based on the model, as sketched below.
CHAPTER: 2

THEORY

1) PYTHON
Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for
Rapid Application Development, as well as for use as a scripting or glue
language to connect existing components together. Python's simple, easy-to-
learn syntax emphasizes readability and therefore reduces the cost of
program maintenance. Python supports modules and packages, which
encourages program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased
productivity it provides. Since there is no compilation step, the edit-
test-debug cycle is incredibly fast. Debugging Python programs is easy: a
bug or bad input will never cause a segmentation fault. Instead, when the
interpreter discovers an error, it raises an exception. When the program
doesn't catch the exception, the interpreter prints a stack trace. A source-
level debugger allows inspection of local and global variables, evaluation
of arbitrary expressions, setting breakpoints, stepping through the code a
line at a time, and so on. The debugger is written in Python itself, testifying
to Python's introspective power. On the other hand, often the quickest way
to debug a program is to add a few print statements to the source: the fast
edit-test-debug cycle makes this simple approach very effective.
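As a small illustration of the exception behaviour described above, here is a
sketch with a hypothetical parse_age helper: bad input raises an exception the
program can catch, rather than crashing the interpreter.

def parse_age(text):
    # int() raises ValueError on non-numeric input instead of crashing.
    return int(text)

try:
    parse_age("twenty")
except ValueError as err:
    print(f"Could not parse input: {err}")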

2) Machine learning

Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without
being explicitly programmed. Machine learning focuses on the development of
computer programs that can access data and use it to learn for themselves. The
process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim
is to allow computers to learn automatically, without human intervention or
assistance, and adjust their actions accordingly.

Using the classic algorithms of machine learning, text is considered as a
sequence of keywords; an approach based on semantic analysis, by contrast,
mimics the human ability to understand the meaning of a text.

Some Machine Learning Methods

Machine learning algorithms are often categorized as supervised or unsupervised,
though semi-supervised and reinforcement methods also exist.

• Supervised machine learning algorithms can apply what has been
learned in the past to new data, using labelled examples to predict future
events. Starting from the analysis of a known training dataset, the learning
algorithm produces an inferred function to make predictions about the
output values. The system is able to provide targets for any new input after
sufficient training. The learning algorithm can also compare its output with
the correct, intended output and find errors in order to modify the model
accordingly.
• In contrast, unsupervised machine learning algorithms are used when
the information used to train is neither classified nor labelled. Unsupervised
learning studies how systems can infer a function to describe a hidden
structure from unlabelled data. The system doesn't figure out the right
output, but it explores the data and can draw inferences from datasets to
describe hidden structures in unlabelled data.
• Semi-supervised machine learning algorithms fall somewhere in
between supervised and unsupervised learning, since they use both labelled
and unlabelled data for training, typically a small amount of labelled data
and a large amount of unlabelled data. Systems that use this method are
able to considerably improve learning accuracy. Usually, semi-supervised
learning is chosen when the acquired labelled data requires skilled and
relevant resources in order to train on it or learn from it, whereas acquiring
unlabelled data generally doesn't require additional resources.
• Reinforcement machine learning is a method in which a learning agent
interacts with its environment by producing actions and discovering errors or
rewards. Trial-and-error search and delayed reward are the most relevant
characteristics of reinforcement learning. This method allows machines and
software agents to automatically determine the ideal behaviour within a
specific context in order to maximize performance. Simple reward feedback
is required for the agent to learn which action is best; this is known as the
reinforcement signal.
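To make the supervised/unsupervised distinction concrete, here is a minimal
sketch using scikit-learn's built-in Iris dataset; the classifier and
clustering choices are illustrative, not prescribed by this report.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: training uses the labels y, then the model predicts labels.
classifier = KNeighborsClassifier()
classifier.fit(X, y)
print(classifier.predict(X[:3]))        # predicted classes

# Unsupervised: no labels are given; clusters are inferred from X alone.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
print(clusterer.fit_predict(X)[:3])     # cluster assignments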
Machine learning enables analysis of massive quantities of data. While it
generally delivers faster, more accurate results in order to identify profitable
opportunities or dangerous risks, it may also require additional time and resources
to train it properly. Combining machine learning with AI and cognitive
technologies can make it even more effective in processing large volumes of
information.
3) Data set
Machine learning methods learn from examples. It is important to have a
good grasp of the input data and of the various terminology used when
describing data. In this section, you will learn the terminology used in
machine learning when referring to data.

When I think of data, I think of rows and columns, like a database table
or an Excel spreadsheet. This is a traditional structure for data and is
what is common in the field of machine learning. Other data, such as
images, videos, and text (so-called unstructured data), is not considered
at this time.

Table of Data Showing an Instance, Feature, and Train-Test Datasets

• Instance: A single row of data is called an instance. It is an
observation from the domain.
• Feature: A single column of data is called a feature. It is a
component of an observation and is also called an attribute of a data
instance. Some features may be inputs to a model (the predictors)
and others may be outputs or the features to be predicted.
• Data Type: Features have a data type. They may be real or
integer-valued, or may have a categorical or ordinal value. You can
have strings, dates, times, and more complex types, but typically
they are reduced to real or categorical values when working with
traditional machine learning methods.
• Datasets: A collection of instances is a dataset and when working
with machine learning methods we typically need a few datasets for
different purposes.
• Training Dataset: A dataset that we feed into our machine learning
algorithm to train our model.
• Testing Dataset: A dataset that we use to validate the accuracy of
our model but is not used to train the model. It may be called the
validation dataset.
• We may have to collect instances to form our datasets or we may be
given a finite dataset that we must split into sub-datasets.

In this project there is one dependent variable (label) and three independent
variables (id, title, text), as shown in the figure:

Figure: news.data (preview of the dataset)
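A minimal sketch of how such a dataset might be loaded and split into
training and testing datasets with pandas and scikit-learn; the file name
news.csv is an assumption, while the column names match the variables
described above.

import pandas as pd
from sklearn.model_selection import train_test_split

news = pd.read_csv("news.csv")      # hypothetical file: id, title, text, label
X = news[["id", "title", "text"]]   # independent variables (features)
y = news["label"]                   # dependent variable (the label to predict)

# Hold out 20% of the instances as the testing dataset.
# (A real model would typically learn from the text, not the id column.)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)
print(X_train.shape, X_test.shape)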

CHAPTER: 3

METHODOLOGY ADOPTED
If you decide to commission a machine learning model from a data analysis company,
you will have to accept the fact that you will sometimes be in contact with
developers. This means communicating with people from another area of work, delving
into the work process, and controlling and testing their results.

IT companies do not all work on a single principle; there are different approaches
to working on projects, and here we describe software development methodologies for
web applications. The differences lie in how the workflow is built, which tools are
used, and which principles and rules are emphasized.

Recently, some of these methodologies have found application in other areas of
business, so by the time you finish reading this chapter you will know something
new and useful for your business too.

CHAPTER: 4

CONCLUSION:

In a nutshell, this internship has been an excellent and rewarding experience. I can
conclude that there has been a lot I have learnt from my work at the training &
research centre. Needless to say, the technical aspects of the work I have done are
not flawless and could be improved given enough time. As someone with no prior
experience in Python whatsoever, I believe my time spent in training and discovering
new languages was well worth it and contributed to finding an acceptable solution to
an important aspect of web design and development. Two main things whose importance
I have learned are time-management skills and self-motivation. Although I have often
stumbled upon these problems at university, they had to be approached differently in
a working environment. Working with machine learning languages has increased my
interest in them, hence prompting me to move further into predictive analysis.
