
Session 04: Python for Data Science

Course: Data Science Algorithms and Applications (19ECE436A)

Session Duration: 50 minutes

Course Leader:
Prof. Raghavendra V. Kulkarni, PhD
Department of Electronics and Communication Engineering
Faculty of Engineering and Technology, MSRUAS, Bengaluru
Email: raghavendra.ec.et@msruas.ac.in
Faculty of Engineering & Technology © Ramaiah University of Applied Sciences
Intended Learning Outcomes
At the end of this session, the student should be able to:

• Discuss the advantages of the Python Programming Language

• List the major Python libraries used for scientific computation, data
science and visualization

• Explain the functionalities and features of the NumPy, Pandas and
Matplotlib libraries

• Install Python on a personal computer and use the Jupyter
Notebook for effective coding.

Why Python?

• Easy to Learn and Use
➢ Easy to use, very accessible, simplified syntax, easily written and executed,
interpreted language
• Mature and Supportive Community
➢ Plenty of documentation, guides and video tutorials. The developer community is
incredibly active.
• Support from Renowned Corporate Sponsors
➢ Heavily backed by Facebook, Amazon Web Services, and especially Google.
Why Python?
• Hundreds of Python Libraries and Frameworks
➢ NumPy for scientific computing
➢ Pandas for data science
➢ Matplotlib for plotting charts and graphs
➢ SciPy for engineering applications, science, and mathematics
➢ BeautifulSoup for HTML and XML parsing
➢ Django for server-side web development

• Versatility, Efficiency, Reliability, and Speed
➢ Runs in nearly any kind of environment, without a noticeable loss of
performance from one platform to another
• Best-fit for Big data, Machine Learning and Cloud Computing
➢ Second most popular tool, after the R language, for data science and analytics.

Why Python?
• First-choice Language
➢ The first choice for many programmers and students because it is in high demand
in the development market.
• Flexibility
➢ Gives the developer the chance to try something new.
➢ An expert is not limited to building similar kinds of things but can also go on to
make something different from before.
• Automation
➢ Helps a lot in the automation of tasks, as many tools and modules are available.

Basic Python Libraries for Data Science

The NumPy Library
• NumPy (Numerical Python) is an open-source Python library that’s
used in almost every field of science and engineering.
• It’s the universal standard for working with numerical data in Python,
and it’s at the core of the scientific Python and PyData ecosystems.
• The NumPy API is used extensively in Pandas, SciPy, Matplotlib, Scikit-
learn, Scikit-image and most other data science and scientific Python
packages.
• The NumPy library contains multidimensional array and matrix data
structures.
• It provides ndarray, a homogeneous 𝒏-dimensional array object,
with methods to efficiently operate on it.
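The ndarray object described above can be sketched as follows. This is a minimal illustration with made-up values; the variable names are assumptions and are not taken from the course material.

```python
import numpy as np

# Build a 2-D ndarray from nested Python lists (illustrative values).
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)    # number of dimensions: 2
print(a.shape)   # size along each dimension: (2, 3)
print(a.dtype)   # a single integer type shared by every element

# Homogeneous storage: mixing ints and a float upcasts every element to float.
b = np.array([1, 2.5, 3])
print(b.dtype)   # float64

# Element-wise arithmetic runs without an explicit Python loop.
doubled = a * 2
print(doubled)
```

Because every element has the same type, NumPy can store the data in one contiguous block, which is what makes operations such as `a * 2` efficient.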
The NumPy Library
• NumPy can be used to perform a wide variety of mathematical
operations on arrays.
• It adds powerful data structures to Python that enable efficient
calculations with arrays and matrices.
• It provides an enormous library of high-level mathematical functions
that operate on these arrays and matrices.
• It is a core dependency of other libraries, such as Matplotlib, Pandas,
TensorFlow, etc.
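A few of the mathematical operations mentioned above might look like this in practice. The arrays and names below are illustrative assumptions chosen only to keep the results easy to check by hand.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

mean = np.mean(x)     # arithmetic mean of all elements: 2.5
root = np.sqrt(x)     # element-wise square root
total = x.sum()       # sum of all elements: 10.0

# Matrices are 2-D arrays; the @ operator performs matrix multiplication.
m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
identity = np.eye(2)        # 2 x 2 identity matrix
product = m @ identity      # multiplying by the identity returns m unchanged
m_t = m.T                   # transpose

print(mean, total)
print(product)
print(m_t)
```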

The Pandas Library
• Pandas is an open-source Python library that provides high-
performance data manipulation and analysis tools using its powerful
data structures.
• The name Pandas is derived from the term "panel data", an
econometrics term for multidimensional data sets.
• Developed by Wes McKinney in 2008.
• Prior to Pandas, Python was mainly used for data munging and
preparation. It contributed very little to data analysis.
• Using Pandas, we can load, prepare, manipulate, model, and analyze
data regardless of its origin.
• Python with Pandas is used in a wide range of fields in academic and
commercial domains including finance, economics, statistics, analytics,
etc.
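The load, prepare, manipulate and analyze steps listed above can be sketched as below. The CSV text, column names and pass mark are invented for illustration; in practice `pd.read_csv` would usually be given a file path instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical data, kept in a string so the sketch needs no external file.
csv_text = """name,department,marks
Asha,ECE,82
Ravi,CSE,74
Meera,ECE,91
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: add a derived column (the pass mark of 75 is an assumption).
df["passed"] = df["marks"] >= 75

# Manipulate: keep only the rows for one department.
ece = df[df["department"] == "ECE"]

# Analyze: compute a summary statistic.
mean_marks = df["marks"].mean()

print(df)
print(ece)
print(mean_marks)
```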
Key Features of the Pandas Library
• Fast and efficient DataFrame object with default and customized
indexing
• Tools for loading data into in-memory data objects from different file
formats
• Data alignment and integrated handling of missing data
• Reshaping and pivoting of data sets
• Label-based slicing, indexing and subsetting of large data sets
• Columns from a data structure can be deleted or inserted
• Grouping of data for aggregation and transformation
• High performance merging and joining of data
• Time Series functionality
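Several of the features listed above can be tried on a small DataFrame. The following is a rough sketch with invented data; the column names, index labels and manager names are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# DataFrame with a customized (label) index and one missing value.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, np.nan, 7, 5],
    },
    index=["a", "b", "c", "d"],
)

# Handling of missing data: replace the missing unit count with 0.
filled = sales.fillna({"units": 0})

# Label-based slicing: .loc includes both end labels.
subset = filled.loc["a":"c", "units"]

# Columns can be inserted and deleted.
filled["revenue"] = filled["units"] * 100
trimmed = filled.drop(columns=["revenue"])

# Group by a column and aggregate.
totals = filled.groupby("region")["units"].sum()

# Merge (join) with a second table on a shared column.
managers = pd.DataFrame(
    {"region": ["North", "South"], "manager": ["Anil", "Bina"]}
)
merged = filled.merge(managers, on="region", how="left")

print(totals)
print(merged)
```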
The Matplotlib Library
• Matplotlib is one of the most popular Python packages used for data
visualization.
• It is a cross-platform library for making 2D plots from data in arrays.
• Matplotlib makes use of NumPy and provides an object-oriented API
that helps in embedding plots in applications using Python GUI toolkits.
• It can be used in Python and IPython shells, Jupyter notebook and web
application servers as well.
• Written by John D. Hunter in 2003.
• One of the greatest benefits of visualization is that it allows us visual
access to huge amounts of data in easily digestible visuals.
• Matplotlib supports several plot types, such as line, bar, scatter and histogram plots.
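A line plot and a bar chart drawn from NumPy data with the object-oriented API might look like the sketch below. The data, labels and figure size are illustrative assumptions. The `Agg` backend is selected only so that the sketch also runs without a display; in a Jupyter Notebook that line can be omitted and the figure is shown inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Object-oriented API: create a Figure holding two Axes side by side.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of a NumPy array.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.legend()

# Bar chart of three made-up categories.
ax_bar.bar(["A", "B", "C"], [3, 5, 2])
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Render to an in-memory PNG; pass a file name instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```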

Python Installation
• Anaconda Individual Edition is an open source, flexible solution that
provides the utilities to build, distribute, install, update, and manage
software in a cross-platform manner.
• Conda (package and environment management system) makes it easy
to manage multiple data environments that can be maintained and run
separately without interference from each other.
• Let us use the Anaconda Individual Edition as the Python 3.8
distribution.
• My choice is the Python 3.8 64-bit Graphical Installer for Windows 10.
• This can be downloaded from
https://www.anaconda.com/products/individual-d
• Jupyter is a free, open-source, interactive web tool that can be used to
combine software code, computational output, explanatory text and
multimedia resources in a single document.
Anaconda Libraries

Action Items
• Download the Python 3.8 64-bit (or 32-bit, depending on your
computer hardware) Graphical Installer for whichever operating
system your computer has.
• Install the package using the default settings.
• Brush up on your Python programming skills.
• Practice basic Python programmes from your First-Year class
using the Jupyter Notebook.
• There will be brief hands-on introductions to NumPy, Pandas
and Matplotlib libraries in the subsequent sessions.
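One possible way to confirm that the installation worked is to run a short check in a Jupyter Notebook cell. This is only a sketch; the list of package names is an assumption based on the three libraries covered in this session.

```python
import importlib.util
import sys

# Report the version of the Python interpreter that is running this cell.
print("Python version:", sys.version.split()[0])

# Check whether each library can be found, without importing it.
required = ["numpy", "pandas", "matplotlib"]
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If any library is reported as missing, the installation probably did not complete with the default settings.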

Summary
Advantages of the Python Programming Language are the
following:

• Easy to Learn and Use
• Mature and Supportive Python Community
• Support from Renowned Corporate Sponsors
• Hundreds of Python Libraries and Frameworks
• Versatility, Efficiency, Reliability, and Speed
• Big data, Machine Learning and Cloud Computing
• First-choice Language
• The Flexibility of Python Language
• Automation
Summary
• NumPy is an open-source Python library that’s used in science
and engineering.
• It adds powerful data structures to Python that enable
efficient calculations with arrays and matrices.
• Pandas is an open-source Python library that provides high-
performance data manipulation and analysis tools using its
powerful data structures.
• Using Pandas, we can load, prepare, manipulate, model, and
analyze data regardless of its origin.
• Matplotlib is one of the most popular Python packages used for
data visualization.
• It allows us visual access to huge amounts of data in easily
digestible visuals.