
Government of Karnataka

Department of Collegiate and Technological Education

GOVERNMENT ENGINEERING COLLEGE


MOSALEHOSAHALLI
Department of Computer Science and Engineering
Internship report on
Data science

Presented By
Kiran Shivanand Totager
4HG21CS021
TABLE OF CONTENTS
Introduction to data science
Python for data science
Introduction to CSV file & pandas
Descriptive statistics
Histograms
Introduction to probability
Machine learning
Conclusion
INTRODUCTION
Data Science is a multi-disciplinary field
that uses scientific methods, algorithms,
and systems to extract knowledge and
insights from structured and unstructured
data.

Applications
Search Engines
Digital Advertising
Recommendation Systems
Image Recognition
PYTHON FOR DATA SCIENCE
Operators
Variables & variable naming conventions
Data types in python
Conditional statements
Looping statements
Functions
Packages in Python
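
As a rough illustration of the topics listed above, the following sketch touches each of them once. The names (`student_name`, `marks_list`, `grade`) and the grade cut-offs are hypothetical and chosen only for this example.

```python
import math  # a package from Python's standard library


def grade(marks):
    """Function using a conditional statement to map marks to a grade."""
    if marks >= 75:
        return "A"
    elif marks >= 50:
        return "B"
    return "C"


student_name = "Asha"        # str variable, snake_case naming convention
marks_list = [82, 47, 64]    # list of int values

total = 0
for m in marks_list:         # looping statement
    total += m               # arithmetic / assignment operator

average = total / len(marks_list)        # float result
grades = [grade(m) for m in marks_list]  # call the function for each value
rounded_up = math.ceil(average)          # use a function from a package

print(student_name, total, grades, rounded_up)
```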
INTRODUCTION TO CSV FILE & PANDAS
CSV file:
A Comma Separated Values (CSV) file is a way of storing
information in tabular format within a text file.
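
To make the format concrete, here is a minimal sketch that parses CSV text with Python's built-in `csv` module. The column names and values are made up for illustration, and the text is read from a string rather than a real file so the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; normally this would be read from a .csv file.
csv_text = """name,branch,marks
Asha,CSE,82
Ravi,ECE,74
Meena,CSE,91
"""

# DictReader treats the first line as the header (column names).
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV stores everything as text, so numeric columns must be converted.
marks = [int(row["marks"]) for row in rows]

print(rows[0]["name"], marks)
```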

Pandas:
Pandas is a Python library. It is used for:

Data Wrangling and Exploration
Data Visualization
Integration with Presentations
Foundation for Data Science
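
A small sketch of data wrangling and exploration with pandas, assuming the library is installed. The data set is hypothetical and is loaded from an in-memory string instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content used only for this illustration.
csv_text = """name,branch,marks
Asha,CSE,82
Ravi,ECE,74
Meena,CSE,91
"""

# Load the CSV text into a DataFrame (a table with labelled columns).
df = pd.read_csv(io.StringIO(csv_text))

# Wrangling: keep only the rows for one branch.
cse = df[df["branch"] == "CSE"]

# Exploration: average marks per branch.
mean_by_branch = df.groupby("branch")["marks"].mean()

print(df.head())
print(mean_by_branch)
```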
DESCRIPTIVE STATISTICS
Descriptive statistics summarize and describe the main features of a
dataset.

Key concepts

Measures of Central Tendency:
Mean
Median
Mode

Measures of Variability:
Range
Variance
Standard Deviation

Types of Data:
Categorical
Numerical
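
The measures above can be computed with Python's standard `statistics` module. This is a sketch on a made-up list of marks; it uses the population variance and standard deviation (`pvariance`, `pstdev`), which is an assumption, since the sample versions (`variance`, `stdev`) may be more appropriate when the data is a sample.

```python
import statistics

marks = [82, 74, 91, 74, 68]  # hypothetical numerical data

# Measures of central tendency
mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)

# Measures of variability
data_range = max(marks) - min(marks)
variance = statistics.pvariance(marks)  # population variance
std_dev = statistics.pstdev(marks)      # population standard deviation

print(mean, median, mode, data_range, variance, std_dev)
```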
HISTOGRAMS
A histogram is a graphical representation of data using bars of
different heights. It summarizes a data set by dividing it into
bins and plotting the count of data points in each bin.
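
The binning step can be sketched in plain Python as follows. The helper `histogram_counts`, the bin width of 10, and the marks are all hypothetical; a plotting library would normally draw the bars, so a text bar is printed here instead.

```python
def histogram_counts(values, bin_start, bin_width, num_bins):
    """Count how many values fall into each equal-width bin."""
    counts = [0] * num_bins
    for value in values:
        index = int((value - bin_start) // bin_width)
        if 0 <= index < num_bins:  # ignore values outside the bins
            counts[index] += 1
    return counts


marks = [52, 58, 61, 67, 68, 73, 74, 79, 85, 91]  # hypothetical data
counts = histogram_counts(marks, bin_start=50, bin_width=10, num_bins=5)

# Print one text bar per bin; the bar length is the count in that bin.
for i, count in enumerate(counts):
    low = 50 + i * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```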

PURPOSE:
VISUAL SUMMARY: Provides a visual summary of the distribution and
frequency of data values.
COMPARISON: It allows you to compare measurements to
specifications.
DATA EXPLORATION: Histograms assist in decision-making by revealing
patterns and outliers.
INTRODUCTION TO PROBABILITY
Probability is the measure of the likelihood of
an event occurring.

Bernoulli Trials:
A Bernoulli trial is a random experiment
with precisely two possible outcomes:
success and failure.
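
A Bernoulli trial can be simulated as below. This is a minimal sketch: the success probability `p = 0.3`, the seed, and the helper name `bernoulli_trial` are arbitrary choices for illustration.

```python
import random


def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0


rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.3                  # assumed probability of success

trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_rate = sum(trials) / len(trials)

# With many trials, the observed success rate should be close to p.
print(success_rate)
```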

Central Limit Theorem (CLT):
States that the distribution of sample means
approaches a normal distribution as the
sample size increases, whatever the shape of the
underlying population, provided it has finite variance.
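
The theorem can be explored with a small simulation. The sketch below assumes a fair six-sided die as the population (a flat, non-normal distribution) and compares sample means for two sample sizes; the seed and the sample counts are arbitrary.

```python
import random
import statistics


def sample_means(sample_size, num_samples, rng):
    """Draw many samples of die rolls and return the mean of each sample."""
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed for repeatability

means_small = sample_means(sample_size=2, num_samples=2000, rng=rng)
means_large = sample_means(sample_size=50, num_samples=2000, rng=rng)

# Larger samples give means that cluster more tightly around 3.5,
# the expected value of a fair die.
spread_small = statistics.pstdev(means_small)
spread_large = statistics.pstdev(means_large)

print(statistics.mean(means_large), spread_small, spread_large)
```

Plotting `means_large` as a histogram should show a roughly bell-shaped curve, even though individual die rolls are uniformly distributed.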
INTRODUCTION TO MACHINE LEARNING
Machine learning involves using algorithms that
learn patterns from data and make predictions or decisions
without being explicitly programmed for each case.

Predictive modeling:
Combines AI techniques and historical data to predict
future outcomes.
Predictive modeling steps:
Define Your Objective
Data Collection
Data Cleaning and Preparation
Choose the Right Algorithm
Model Training
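
The steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the data (hours studied versus marks) is invented, simple least-squares linear regression stands in for "the right algorithm", and the final check on held-out data is an extra step not named in the list.

```python
# Objective (assumed): predict marks from hours studied.

# Data collection: hypothetical (hours, marks) records, one with a gap.
raw = [(1, 52), (2, 55), (3, None), (4, 65),
       (5, 68), (6, 74), (7, 79), (8, 83)]

# Data cleaning and preparation: drop records with missing marks,
# then hold back the last two records for checking the model.
clean = [(x, y) for x, y in raw if y is not None]
train, test = clean[:5], clean[5:]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Model training: estimate the line from the training records only.
slope, intercept = fit_line(train)


def predict(hours):
    return slope * hours + intercept


# Extra check (not in the list above): mean absolute error on held-out data.
mae = sum(abs(predict(x) - y) for x, y in test) / len(test)
print(slope, intercept, mae)
```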
CONCLUSION
Data science combines statistics,
machine learning, and domain
expertise to extract insights from
data, driving decision-making
across industries. With advances in
data technologies, its applications
and impact continue to expand,
making data science crucial for
modern business and research.
THANK YOU
