Python for Data Science


Training Program

Pre-requisites
1. Intermediate-level proficiency in Python
2. Basic familiarity with the Python data ecosystem
3. Some background in file formats and relational databases

Lab requirements
1. 1:1 or 2:1 participant-to-machine ratio for hands-on sessions and exercises
2. Local installation of the Anaconda distribution for Python 3.6
-or-
3. Local installation of ActivePython 3.6+ Community Edition, plus additional modules as needed
4. The following Python modules need to be installed: numpy, pandas, jupyter, matplotlib, flask

Agenda

Python refresher
• The Python interpreter
• Python Data Types
• Data and type introspection basics
• Control structures
• Functions
• Classes
• Errors and exceptions
• Regular expressions

Class basics
• __init__
• self
• private vs public convention
• magic (dunder) methods
• object creation
• type of objects
• inheritance, multiple inheritance

Errors & exceptions


• Standard exception hierarchy
• exception payloads
• defining new exceptions
• chaining exceptions
• traceback objects
• Assertions
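
As a rough illustration of the topics above (payloads, new exception classes, chaining), a minimal sketch might look like the following. The names `ConfigError` and `read_port` are hypothetical and exist only for this example.

```python
class ConfigError(Exception):
    """Hypothetical application-specific exception that carries a payload."""

    def __init__(self, message, key):
        super().__init__(message)
        self.key = key  # extra payload attached to the exception


def read_port(settings):
    """Return the configured port, re-raising low-level errors as ConfigError."""
    try:
        return int(settings["port"])
    except (KeyError, ValueError) as exc:
        # "raise ... from ..." chains the new exception to the original one
        raise ConfigError("invalid or missing port", key="port") from exc


try:
    read_port({"port": "eighty"})
except ConfigError as err:
    caught = err  # keep a reference; "err" is cleared when the block ends

# The original ValueError is available as __cause__,
# and the traceback object as __traceback__.
print(type(caught.__cause__).__name__, caught.key)
```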

Relational Database Interaction


• CRUD operations
• SQL
• Python DB API 2.0
• sqlite3
• MySQLdb module
o connect()
o Connection objects
o Cursor objects
o execute()
o fetch*()
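
The DB API 2.0 flow listed above (connect, cursor, execute, fetch) can be sketched with the standard-library sqlite3 module. This is only a sketch: the `trainee` table and its rows are made up for illustration, and an in-memory database is assumed so that nothing is written to disk.

```python
import sqlite3

# connect() returns a Connection object; ":memory:" keeps the database in RAM
conn = sqlite3.connect(":memory:")
cur = conn.cursor()  # Cursor object used to run statements

# Create
cur.execute("CREATE TABLE trainee (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
cur.executemany(
    "INSERT INTO trainee (name, score) VALUES (?, ?)",
    [("Asha", 81.5), ("Ben", 67.0)],
)
conn.commit()

# Read: fetchone() returns a single row as a tuple
cur.execute("SELECT name, score FROM trainee WHERE name = ?", ("Asha",))
first = cur.fetchone()

# Update and Delete, using "?" placeholders rather than string formatting
cur.execute("UPDATE trainee SET score = ? WHERE name = ?", (70.0, "Ben"))
cur.execute("DELETE FROM trainee WHERE name = ?", ("Asha",))
conn.commit()

# Read: fetchall() returns the remaining rows as a list of tuples
cur.execute("SELECT name, score FROM trainee ORDER BY id")
rows = cur.fetchall()

conn.close()
```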

Data Ecosystem in Python


• SciPy
• NumPy
• pandas
• Matplotlib
• IPython
• Jupyter

NumPy
• Why numpy?
• Memory and run-time comparison with native Python lists
• Numpy arrays
• Multi-dimensional arrays
• Mapped (element-wise) operations on numpy arrays
• Filtering
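
A small sketch of mapped operations and filtering on numpy arrays, assuming numpy is installed as listed in the lab requirements. The sample values are arbitrary.

```python
import numpy as np

values = np.array([3, 8, 1, 10, 6])

# Mapped operation: the arithmetic is applied to every element at once
doubled = values * 2

# Filtering: a boolean mask selects the elements that satisfy a condition
mask = values > 5
large = values[mask]

# Multi-dimensional array: 2 rows by 3 columns, summed down each column
grid = np.arange(6).reshape(2, 3)
col_sums = grid.sum(axis=0)
```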

Pandas
• DataFrames
• Series
• Indexes
• Inherited operations from numpy arrays
• read_* functions for reading file formats
• Selecting columns with [] and dot (attribute) notation
• Filtering
• value_counts()
• groupby() and aggregation functions
• sort_index() and sort_values() to speed up lookups
• pivoting/unstacking
• Merging dataframes
• Appending
• .loc[] and .iloc[] based lookup
• Working with dates
• Timeseries
• Real examples to try all these operations
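
Several of the operations above can be tried on a toy DataFrame. The following is a sketch only, assuming pandas is installed as listed in the lab requirements. The column names and values are invented for illustration.

```python
import pandas as pd

# Toy data; the columns and values are invented for this example
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Pune"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, 4, 7, 12, 3],
})

# Filtering with a boolean condition
high = df[df["units"] > 5]

# value_counts(): how often each city appears
counts = df["city"].value_counts()

# groupby() with an aggregation function
totals = df.groupby("city")["units"].sum()

# .iloc[] is position-based, .loc[] is label- or condition-based
first_row = df.iloc[0]
delhi_units = df.loc[df["city"] == "Delhi", "units"]
```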

Machine Learning: Basics


• Algorithmic logic vs ML logic
• Supervised vs Unsupervised
• Training Data and Test Data
• Classification
• Regression
• Clustering

Supervised Learning: Classification


• The Classification Problem
• Bayes' Theorem
• Conditional Probability
• Probabilistic classifier: Naive Bayes Classifier
• Non-probabilistic classifier: k-nearest neighbours (k-NN)
• Choice of k in k-NN
• Kinds of problem instances in k-NN
• Distance
• Differences between Naive Bayes and k-NN

Support Vector Machines (SVM)


• Formal definition
• The intuition
• SVM classes in sklearn
• SVM kernels
• RBF and Linear kernels

Unsupervised learning in Python


• Need for dimensionality reduction
• Principal Component Analysis (PCA)
• Difference between principal components and latent factors
• Factor Analysis
• Hierarchical, K-means & DBSCAN Clustering, Gaussian Mixture Models
• SVD
• Clustering Use Cases

Generalized Linear Models in Python
• Linear Regression
• Regularization of Generalized Linear Models
• Ridge and Lasso Regression
• Logistic Regression
• Methods of threshold determination and performance measures for classification score models

Basics of Natural Language Processing

• Text Processing
• Lemmatization
• Parts of Speech Tagging
• Named Entity Recognition
• Word Embeddings
• N-grams
• TF-IDF
• Text Classification

Wrap-up, Discussion and Q&A
