Predictive Analysis Using Python: A Summer Training Report
Submitted by
Vishal Kanwar
UID: 19BCS1406
JULY 2021
I, Vishal Kanwar, hereby declare that I have undertaken six weeks of industrial training at Coursera during the period from June to July 2021, in partial fulfillment of the requirements.

Vishal Kanwar
CERTIFICATE
ACKNOWLEDGEMENT
The summer training opportunity I had with Coursera was a great chance for learning and professional development. I therefore consider myself very lucky to have been given the opportunity to be a part of it. I am also grateful for the chance to meet so many wonderful people and professionals who guided me through this internship period.
Vishal Kanwar
Background
Our understanding of the etiology, pathophysiology, phenotypic diversity, and
progression of Parkinson’s disease has stagnated. Consequently, patients do not
receive the best care, leading to unnecessary disability, and to mounting costs for
society. The Personalized Parkinson Project (PPP) proposes an unbiased approach
to biomarker development with multiple biomarkers measured longitudinally. Our
main aims are: (a) to perform a set of hypothesis-driven analyses on the
comprehensive dataset, correlating established and novel biomarkers to the rate of
disease progression and to treatment response; and (b) to create a widely accessible
dataset for discovery of novel biomarkers and new targets for therapeutic
interventions in Parkinson’s disease.
Methods/design
This is a prospective, longitudinal, single-center cohort study. The cohort will
comprise 650 persons with Parkinson’s disease. The inclusion criteria are purposely
broad: age ≥ 18 years; and disease duration ≤5 years. Participants are followed for 2
years, with three annual assessments at the study center. Outcomes include a clinical
assessment (including motor and neuro-psychological tests), collection of
biospecimens (stool, whole blood, and cerebrospinal fluid), magnetic resonance
imaging (both structural and functional), and ECG recordings (both 12-lead and
Holter). Additionally, collection of physiological and environmental data in daily life
over 2 years will be enabled through the Verily Study Watch. All data are stored with
polymorphic encryptions and pseudonyms, to guarantee the participants’ privacy on
the one hand, and to enable data sharing on the other. The data and biospecimens
will become available for scientists to address Parkinson’s disease-related research
questions.
Discussion
The PPP has several distinguishing elements: all assessments are done in a single
center; inclusion of “real life” subjects; deep and repeated multi-dimensional
phenotyping; and continuous monitoring with a wearable device for 2 years. Also,
the PPP is powered by privacy and security by design, allowing for data sharing with
scientists worldwide respecting participants’ privacy. The data are expected to open
the way for important new insights, including identification of biomarkers to predict
differences in prognosis and treatment response between patients. Our long-term
aim is to improve existing treatments, develop new therapeutic approaches, and
offer Parkinson’s disease patients a more personalized disease management
approach.
Trial registration
Clinical Trials NCT03364894. Registered December 6, 2017 (retrospectively
registered).
CONTENTS
Title Page i
About the company ii
Certificate iii
Acknowledgement iv
ABSTRACT
CONTENTS 1
CHAPTER 1 INTRODUCTION 2
1.1 PREDICTIVE ANALYSIS 2
CHAPTER 2 THEORY 4
2.1 PYTHON 4
2.2 MACHINE LEARNING 5
2.3 DATA SET 7
CHAPTER 3 METHODOLOGY ADOPTED 14
CHAPTER 4 CONCLUSIONS AND FUTURE SCOPE OF STUDY 15
REFERENCES 16
CHAPTER: 1 INTRODUCTION
1.1 PREDICTIVE ANALYSIS
Predictive analytics is the practice of making predictions about unknown future events using techniques from data mining, statistics, machine learning, mathematical modeling, and artificial intelligence. It makes these predictions with the help of past data. We use predictive analytics in our day-to-day life without giving it much thought: for example, predicting the sales of an item (say, flowers) in a market on a particular day. If it is Valentine's Day, the sales of roses will be high! We can easily say that the sales of flowers will be higher on festive days than on regular days.
In predictive analytics, we identify the factors responsible, gather data, and apply techniques from machine learning, data mining, predictive modeling, and other analytical methods to predict the future. The insights drawn from the data include patterns and relationships among different factors that may previously have been unknown. Unraveling these hidden insights is worth more than you might think. Businesses use predictive analytics to improve their processes and to achieve their targets. Insights obtained from both structured and unstructured data can be used for predictive analytics.
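The flower-sales example above can be sketched in code. The following is a minimal, library-free illustration of predicting a future value from past data by fitting a straight-line trend with least squares; the daily sales figures are invented for illustration.

```python
def fit_trend(values):
    """Least-squares fit of a line y = slope * x + intercept
    to past observations taken at x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_next(values):
    """Predict the next (unseen) value by extrapolating the fitted trend."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept

# Hypothetical daily flower sales leading up to a festive day.
past_sales = [12, 15, 14, 18, 20, 23]
print(round(forecast_next(past_sales), 1))  # rising trend, so the forecast exceeds recent sales
```

Real predictive models are far richer than a straight line, but the shape is the same: learn a pattern from past data, then apply it to an unseen point.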
3. Integrate the data obtained from various sources.
Data mining for predictive analytics prepares data from multiple sources for analysis, providing a complete view of customer interactions.
6. Model generation
A model is generated with algorithms that automate the process as new data is combined with existing data. Multiple models can also be combined to obtain better results.
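The idea of combining multiple models can be sketched with a simple majority vote. The three "models" below are hypothetical stand-ins for trained classifiers, used only to illustrate the mechanism:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine several models' predictions by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers that label an input as "high" or "low" risk.
model_a = lambda x: "high" if x > 10 else "low"
model_b = lambda x: "high" if x > 12 else "low"
model_c = lambda x: "high" if x > 8 else "low"

def ensemble_predict(x):
    """Let each model vote, then return the majority label."""
    return majority_vote([model(x) for model in (model_a, model_b, model_c)])

print(ensemble_predict(11))  # two of the three models say "high"
```

A disagreement by one model (as for the input 11 above) is outvoted by the other two, which is why combined models often give more stable results than any single one.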
CHAPTER: 2
THEORY
2.1 PYTHON
Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for
Rapid Application Development, as well as for use as a scripting or glue
language to connect existing components together. Python's simple, easy to
learn syntax emphasizes readability and therefore reduces the cost of
program maintenance. Python supports modules and packages, which
encourages program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.
Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace. A source-level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective.
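The behaviour described above — an exception with a stack trace instead of a crash — can be seen in a few lines. The `safe_int` helper is our own illustrative wrapper, not part of the standard library:

```python
import traceback

def safe_int(text, default=None):
    """Parse an integer, returning a default instead of crashing on bad input."""
    try:
        return int(text)
    except ValueError:
        # Bad input raises an exception; it never causes a segmentation fault.
        traceback.print_exc()  # print the same stack trace the interpreter would
        return default

print(safe_int("42"))               # parses normally
print(safe_int("oops", default=0))  # prints a traceback, then returns the default
```

Uncaught, the same `ValueError` would terminate the program with a printed stack trace, which is exactly the behaviour the paragraph above describes.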
2.2 MACHINE LEARNING
The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.
With classic machine learning algorithms, text is treated as a mere sequence of keywords; an approach based on semantic analysis, by contrast, mimics the human ability to understand the meaning of a text.
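The keyword-based treatment of text described above can be sketched without any libraries: each document is reduced to its set of words, and a topic is chosen by counting keyword overlaps. The topics and keyword lists here are invented for illustration:

```python
def classify_by_keywords(text, topic_keywords):
    """Score each topic by how many of its keywords occur in the text,
    treating the text as a bag of keywords rather than understanding it."""
    words = set(text.lower().split())
    scores = {topic: len(words & keywords)
              for topic, keywords in topic_keywords.items()}
    return max(scores, key=scores.get)

topics = {
    "sports": {"match", "goal", "team", "score"},
    "finance": {"stock", "market", "profit", "shares"},
}
print(classify_by_keywords("the team celebrated a late goal in the match", topics))
```

Note what this sketch cannot do: it would misread "the team missed every goal", because keyword counting ignores meaning — precisely the limitation that semantic approaches address.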
In reinforcement learning, the agent learns to behave within a specific context in order to maximize its performance. Simple reward feedback is required for the agent to learn which action is best; this is known as the reinforcement signal.
Machine learning enables the analysis of massive quantities of data. While it generally delivers faster, more accurate results for identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. Combining machine learning with AI and cognitive technologies can make it even more effective at processing large volumes of information.
2.3 DATA SET
Machine learning methods learn from examples, so it is important to have a good grasp of the input data and of the various terminology used when describing data. In this section, you will learn the terminology used in machine learning when referring to data.
When I think of data, I think of rows and columns, like a database table or an Excel spreadsheet. This is a traditional structure for data and is what is common in the field of machine learning. Other data, such as images, videos, and text (so-called unstructured data), is not considered at this time.
• Instance: A single row of data is called an instance. It is an
observation from the domain.
• Feature: A single column of data is called a feature. It is a
component of an observation and is also called an attribute of a data
instance. Some features may be inputs to a model (the predictors)
and others may be outputs or the features to be predicted.
• Data Type: Features have a data type. They may be real or integer-valued, or may have a categorical or ordinal value. You can
have strings, dates, times, and more complex types, but typically
they are reduced to real or categorical values when working with
traditional machine learning methods.
• Datasets: A collection of instances is a dataset and when working
with machine learning methods we typically need a few datasets for
different purposes.
• Training Dataset: A dataset that we feed into our machine learning
algorithm to train our model.
• Testing Dataset: A dataset that we use to validate the accuracy of
our model but is not used to train the model. It may be called the
validation dataset.
• We may have to collect instances to form our datasets or we may be
given a finite dataset that we must split into sub-datasets.
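The terms above — instances, features, and the split into training and testing datasets — can be tied together in a short sketch. The tiny dataset is invented for illustration; in practice one would typically use a library such as scikit-learn for this:

```python
import random

def split_dataset(instances, test_fraction=0.25, seed=42):
    """Shuffle the instances reproducibly and split them into
    a training dataset and a testing (validation) dataset."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)

# Each instance is one row: two real-valued features and a label to predict.
data = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.9, "b"), (5.9, 3.0, "b"),
        (5.5, 2.4, "b"), (5.0, 3.4, "a"), (6.7, 3.1, "b"), (4.6, 3.1, "a")]
train, test = split_dataset(data)
print(len(train), len(test))  # 6 2
```

Fixing the seed makes the split reproducible, so the same testing dataset is held out every time; the model is trained only on `train` and its accuracy is judged only on `test`.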
CHAPTER: 3
METHODOLOGY ADOPTED
If you decide to commission a machine learning model from a data analysis company, you will sometimes have to communicate directly with the developers. This means interacting with people from another field of work, delving into their work process, and reviewing and testing their results.
IT companies do not all work on a single principle; there are different approaches to handling projects. Software development methodologies for web applications differ in how the workflow is built, which tools are used, and which principles and rules are emphasized.
Recently, some of these methodologies have also found application in other areas of business, so they are worth knowing about beyond software development.
CHAPTER: 4
CONCLUSION:
In a nutshell, this internship has been an excellent and rewarding experience. I can conclude that there is a lot I have learnt from my work at the training and research centre. Needless to say, the technical aspects of the work I have done are not flawless and could be improved given enough time. As someone with no prior experience in Python whatsoever, I believe my time spent in training and discovering new languages was well worth it and contributed to finding an acceptable solution to an important aspect of web design and development. Two main things I have learned the importance of are time-management skills and self-motivation. Although I have often stumbled upon these problems at university, they had to be approached differently in a working environment. Working with machine learning languages has increased my interest in them, prompting my move into predictive analysis.