Unit2 PDS
Dr Kruti Dangarwala
CSE & IT Department
SVMIT
Unit 2 - Part I Syllabus
Discovering the match between data science and python:
• Defining the Sexiest Job of the 21st Century, Considering
the emergence of data science, Outlining the core
competencies of a data scientist, Linking data science, big
data, and AI, Understanding the role of programming,
Creating the Data Science Pipeline, Preparing the data,
Performing exploratory data analysis, Learning from data,
Visualizing, Obtaining insights and data products,
Understanding Python's Role in Data Science, Considering
the shifting profile of data scientists, Working with a
multipurpose, simple, and efficient language, Learning to
Use Python Fast, Loading data, Training a model, Viewing a
result.
Unit 2 - Part II Syllabus
Introducing Python's Capabilities and Wonders:
• Why Python?, Grasping Python's Core Philosophy, Contributing to
data science, Discovering present and future development goals,
Working with Python, Getting a taste of the language,
Understanding the need for indentation, Working at the command
line or in the IDE, Performing Rapid Prototyping and
Experimentation, Considering Speed of Execution, Visualizing
Power, Using the Python Ecosystem for Data Science, Accessing
scientific tools using SciPy, Performing fundamental scientific
computing using NumPy, Performing data analysis using pandas,
Implementing machine learning using Scikit-learn, Going for deep
learning with Keras and TensorFlow, Plotting the data using
matplotlib, Creating graphs with NetworkX, Parsing HTML
documents using Beautiful Soup.
Data Science
• Data science is devoted to the extraction of clean information from raw data
to form actionable insights.
• There is a lot of data out there. By 2025, an estimated 175 zettabytes of
data will exist worldwide (a zettabyte is a trillion gigabytes). Data has been
called the “oil of the 21st century.”
• So, what do we do with all of this data?
• How do we make it useful to us?
• What are its real-world applications?
These questions are the domain of data science.
WHAT IS DATA SCIENCE?
This stage involves identifying data on the internet or in internal/external
databases and extracting it into useful formats.
Prerequisite skills:
Distributed Storage: Hadoop, Apache Spark/Flink.
Database Management: MySQL, PostgreSQL, MongoDB.
Querying Relational Databases.
Retrieving Unstructured Data: text, videos, audio files, documents.
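As a minimal sketch of the "Querying Relational Databases" skill listed above, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `sales` and its columns are hypothetical, invented only for illustration; a real project would connect to an existing database such as MySQL or PostgreSQL instead.

```python
import sqlite3

# In-memory database, so the example needs no external server or file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Aggregate the raw rows into one total per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```

The query returns a list of tuples, which is a convenient format to pass on to the later pipeline stages.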
Source: https://www.datacamp.com/blog/top-programming-languages-for-data-scientists-in-2022
5) Interpreting the Data
This stage is similar to paraphrasing your data science model. Always remember: if
you can’t explain it to a six-year-old, you don’t understand it yourself. So,
communication is the key. This is the most crucial stage of the pipeline, where, with
the use of psychological techniques, sound business domain knowledge, and your
storytelling abilities, you explain your model to a non-technical audience.
Prerequisite skills:
Business domain knowledge.
Data visualization tools: Tableau, D3.js, Matplotlib, ggplot2, Seaborn.
Communication: Presenting/speaking and reporting/writing.
6) Revision
As the nature of the business changes, new features are introduced that may
degrade your existing models. Therefore, periodic reviews and updates are very
important from both the business’s and the data scientist’s point of view.
Python’s Role in Data Science / Why Python for Data Science?
• Python is one of the most widely used programming languages in the field, and
many data scientists use Python for data science.
• This dynamic language is easy to learn and read, so it’s an optimal choice for
beginners.
• Python enables rapid development and can interface with high-performance
algorithms written in Fortran or C.
• For data scientists who need to incorporate statistical code into production
databases or integrate data with web-based applications, Python is often the
ideal choice.
• It is also well suited to implementing algorithms, which is something that data
scientists need to do often.
• There are also Python packages that are specifically tailored for certain
functions, including pandas, NumPy, and SciPy.
• Data scientists working on various machine learning tasks find that Python’s
scikit-learn is a useful and valuable tool.
• Matplotlib, another Python package, is a good fit for data science projects
that require graphics and other visuals.
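The point above about interfacing with high-performance compiled code can be illustrated with NumPy, whose linear-algebra functions call compiled LAPACK routines. This is a small sketch, not a prescribed method: the data points are made up and roughly follow y = 2x + 1, and the variable names are chosen only for this example.

```python
import numpy as np

# Made-up data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 9.1])

# Design matrix: one column for x, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq hands the least-squares solve to compiled routines.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The Python code stays short and readable while the numerical work runs in compiled code.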
Python Features That Captured the Imagination of the Data Science Community
Easy to learn
• The most appealing quality of Python is that anyone who wants to learn it,
even a beginner, can do so quickly and easily. This is one of the reasons
why learners prefer Python for data science. It also works well for busy
professionals who have limited time to spend learning. Compared to other
languages, R for instance, Python offers a shorter learning curve thanks to
its easy-to-understand syntax.
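To give a taste of the syntax described above, here is a short sketch that uses only the standard library. The names and scores are hypothetical sample data; the point is that the code reads almost like plain English and that indentation marks the blocks.

```python
from statistics import mean

# Hypothetical exam scores for a small class.
scores = {"Asha": 82, "Ravi": 67, "Meera": 91}

average = mean(scores.values())

# Indentation, not braces, marks the body of the loop and of the if statement.
above_average = []
for name, score in scores.items():
    if score > average:
        above_average.append(name)

print(average, above_average)
```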
Scalability
• Compared with languages such as R, Python scales well from small
scripts to large applications. It is also often reported to be
faster than tools like MATLAB and Stata. It scales well because it
gives data scientists flexibility and multiple ways to approach
different problems, which is one of the reasons why YouTube
migrated to the language. You can find Python across multiple
industries, powering the rapid development of applications for all
kinds of use cases.
Choice of data science libraries
• Another key benefit of using Python for data science is access to a wide
variety of data analysis and data science libraries. These include pandas,
NumPy, SciPy, StatsModels, and scikit-learn. These are just some of the many
available libraries, and the community continues to add to this collection.
Many data scientists who use Python find that this robust programming
language addresses a wide range of needs by offering new solutions to
problems that previously seemed unsolvable.
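As one illustration of what these libraries offer, the sketch below uses pandas to summarize a small table. The column names and values are invented for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical measurements; the column names are invented for this example.
df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b"],
        "length": [1.0, 3.0, 10.0, 14.0],
    }
)

# Group the rows by species and compute the mean length of each group.
mean_length = df.groupby("species")["length"].mean()

print(mean_length)
```

A grouped summary like this is a typical first step in exploratory data analysis.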
Python community
• One reason that Python is so well known is its community. As the data
science community continues to adopt it, more users are volunteering to
create additional data science libraries. This drives the creation of the
most modern tools and advanced processing techniques available today,
which is why so many people prefer Python for data science.
• The community is a tight-knit one, and finding a solution to a challenging
problem has never been easier. A quick internet search is often all you need
to find an answer to a question or to connect with others who may be able
to help. Programmers can also connect with their peers on Codementor and
Stack Overflow.
Graphics and visualization
• Python comes with many visualization options. Matplotlib provides
the solid foundation around which other libraries like Seaborn,
pandas plotting, and ggplot have been built. The visualization
packages help make sense of data and create charts, graphical plots,
and web-ready interactive plots.
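A minimal Matplotlib sketch of the kind of chart mentioned above follows. The data, the axis labels, and the output file name `example_plot.png` are arbitrary choices for illustration; the `Agg` backend is selected so that the script can run without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# Draw a line chart with a marker at each data point.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line chart")

# Save the figure as a PNG file in the system's temporary directory.
out_path = os.path.join(tempfile.gettempdir(), "example_plot.png")
fig.savefig(out_path)
plt.close(fig)
```

In an interactive session or notebook, `plt.show()` would display the figure instead of saving it to a file.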
Bottom Line
• There is no denying that the current job market is competitive, as
the Bureau of Labor Statistics recently reported. If you’re looking
for a stable industry that isn’t going anywhere anytime soon, data
science is an excellent choice. But choosing a successful industry is
only half the battle when it comes to job security.
• There is also competition to consider, and it’s important to
remember that many qualified candidates are often competing
for the same job opening. One of the best ways to stand out to
recruiters and employers is to have the right credentials.
Earning a certification in Python with data science or another
relevant field can help get your resume noticed by the right people.
Most Popular Python Libraries for Data Science
• Python is an interpreted, interactive, portable, and object-oriented
programming language. This open-source, general-purpose
language runs on many Unix variants, including Linux and macOS,
as well as on Windows.
• TensorFlow
• NumPy
• SciPy
• Matplotlib
• pandas
• Keras
• Scikit-learn
• Statsmodels
• Plotly
• Seaborn