4HG21CS007


Government of Karnataka, Department of Collegiate and Technical Education
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI - 590014
(Affiliated to AICTE Delhi)

Internship Report On: Data Science

Under the Guidance of:
Dr. Raghu M.E
Assistant Professor
Dept. of CS&E

HOD:
Dr. K.C. Ravishankar
Professor and Head
Dept. of CS&E

Presented By:
Darshan K - 4HG21CS007
Data Science
Table of Contents

Introduction to Data Science
Python For Data Science
Statistics and Probability
Predictive Modelling and Machine Learning
Introduction to Data Science
Data Science is about data gathering, analysis and
decision-making.
Data Science is about finding patterns in data through
analysis and using them to make predictions about the future.

By using Data Science, companies are able to make:


1. Better decisions (should we choose A or B)
2. Predictive analysis (what will happen next?)
3. Pattern discoveries (are there patterns or hidden
information in the data?)
Python For Data Science
Python has built-in mathematical functions and a rich
set of libraries, making it easier to solve
mathematical problems and to perform data
analysis.
Commonly used Python libraries:
1. Pandas
2. NumPy
3. Matplotlib
4. SciPy
CSV
CSV stands for Comma-Separated Values. It is a simple text
format for storing tabular data where each line represents a
row, and each value within the row is separated by a comma.
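
As a minimal sketch of the format described above, the snippet below parses a small CSV text with Python's standard csv module. The column names and values are hypothetical and only serve as an illustration; in practice the text would usually come from a file.

```python
import csv
import io

# Hypothetical CSV text: the first line is the header, each later line is one row.
csv_text = "name,age,city\nAsha,21,Hassan\nRavi,22,Belagavi\n"

# csv.DictReader uses the header line as the keys for each row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    # Every value is read as a string; convert types explicitly if needed.
    print(row["name"], int(row["age"]), row["city"])
```

Note that the reader returns every field as text, so numeric columns such as age have to be converted before doing arithmetic on them.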

Dataframe
A DataFrame is a two-dimensional, size-mutable, and potentially
heterogeneous tabular data structure with labeled axes (rows
and columns).
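
The properties in this definition can be illustrated with a short pandas sketch. The data, column names and row labels below are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data: columns of different types (heterogeneous).
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [21, 22, 20]},
    index=["s1", "s2", "s3"],  # row labels
)

# Size-mutable: a new column can be added after creation.
df["passed"] = [True, False, True]

print(df.shape)               # (number of rows, number of columns)
print(df.loc["s2", "age"])    # look up a value by row label and column label
```

Here the index supplies the row labels and the dictionary keys supply the column labels, which is what "labeled axes" refers to.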
Statistics
Statistics is the science of analyzing data.
Descriptive Statistics
Descriptive statistics summarizes important features of a data
set such as:

Central Tendency:
1. Mean
2. Median
3. Mode

Variability:
1. Variance
2. Standard Deviation
3. Frequency
4. Interquartile Range (IQR)

Range:
1. Maximum
2. Minimum
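
Most of the measures listed above can be computed with Python's standard statistics module. The sketch below uses a small hypothetical data set and population (not sample) variance; both are assumptions for illustration.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical observations

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variability (population versions; use variance/stdev for a sample)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

# Range
minimum, maximum = min(data), max(data)
value_range = maximum - minimum

print(mean, median, mode)
print(variance, std_dev, iqr)
print(minimum, maximum, value_range)
```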
Histograms
A histogram is a widely used graph to show the distribution of
quantitative (numerical) data.
It shows the frequency of values in the data, usually in intervals of
values. Frequency is the number of times a value appears in
the data.
Key Elements of a Histogram:
1. Bins: Intervals or ranges into which the data is divided. Each bin
represents a subset of the data points.
2. Frequency: The number of data points (or observations) that fall into
each bin.
3. X-axis: Represents the range of values or intervals (bins) of the data.
4. Y-axis: Represents the frequency or count of observations within each
bin.
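
The relationship between bins and frequencies can be sketched in plain Python without a plotting library. The values and the bin width below are hypothetical, and the text output stands in for the bars of a real chart.

```python
from collections import Counter

values = [1, 2, 2, 3, 5, 6, 6, 7, 8, 9]  # hypothetical data
bin_width = 3  # assumption: equal-width bins [0, 3), [3, 6), [6, 9), [9, 12)

# Map each value to the lower edge of its bin, then count values per bin.
counts = Counter((v // bin_width) * bin_width for v in values)

# Bins run along the x-axis; the bar length plays the role of the y-axis frequency.
for lower in sorted(counts):
    upper = lower + bin_width
    print(f"[{lower}, {upper}): {'#' * counts[lower]}")
```

With a plotting library such as Matplotlib the same counts would be drawn as vertical bars, but the binning step is the same idea.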
Probability

Bernoulli Trials
Continuous Random Variable
Central Limit Theorem
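
The topics above can be connected with a small simulation. The sketch below repeats Bernoulli trials and looks at how the sample means are distributed, which is what the central limit theorem describes. The success probability, sample size and number of samples are assumed values chosen for the illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

p = 0.3             # assumed success probability of one Bernoulli trial
n = 50              # trials per sample
num_samples = 2000  # number of samples drawn

def bernoulli_trial(prob):
    """Return 1 (success) with probability prob, otherwise 0 (failure)."""
    return 1 if random.random() < prob else 0

# Mean of each sample of n Bernoulli trials.
sample_means = [
    sum(bernoulli_trial(p) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.pstdev(sample_means)
expected_spread = math.sqrt(p * (1 - p) / n)  # standard error from theory

print(mean_of_means, spread_of_means, expected_spread)
```

The sample means cluster around p, and their spread is close to sqrt(p(1 - p) / n), even though each individual trial can only be 0 or 1.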
Introduction to Machine Learning
Machine Learning (ML) is a subfield of artificial intelligence (AI) that
enables computers to learn from data without being explicitly
programmed.

Predictive Modelling
Predictive modelling is the process of using data to make
predictions about unknown future events or outcomes.
It involves the following steps: Data Collection, Feature Selection,
Model Selection, Model Training, Model Evaluation and Prediction.
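
The steps above can be sketched end to end with a very small example. The data here is synthetic, the model is a straight line fitted by least squares, and the hold-out rule is arbitrary; all three are assumptions for illustration, not the method used in the internship.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# 1. Data collection (synthetic): y is roughly 2x + 1 plus random noise.
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1.0) for x in xs]

# 2. Feature selection: the single feature x is used as-is.
#    Every fourth point is held out for evaluation.
pairs = list(zip(xs, ys))
train = [pt for i, pt in enumerate(pairs) if i % 4 != 0]
test = [pt for i, pt in enumerate(pairs) if i % 4 == 0]

# 3 and 4. Model selection and training: least-squares line y = slope * x + intercept.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

# 5. Model evaluation: mean squared error on the held-out points.
mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)

# 6. Prediction for an input the model has not seen.
prediction = slope * 50.0 + intercept

print(slope, intercept, mse, prediction)
```

Evaluating on points that were not used for training gives a more honest estimate of how the model behaves on new data.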
Flow Chart
1. Data Collection
2. Data Preprocessing
3. Data Analysis
4. Feature Selection & Engineering
5. Model Selection & Training
6. Model Evaluation
7. Model Deployment
8. Monitoring & Maintenance
Conclusion
Data science combines statistics, machine learning, and domain expertise
to extract insights from data, driving decision-making across industries.
With advances in data technologies, its applications and impact continue
to expand, making data science crucial for modern business and research.
Thanks
