Advanced Data Analytics Using Python - Unit II

Topics

● Different IDEs
● Advanced data acquisition methods
● APIs (Application Programming Interfaces)
● Web scraping
● Databases
● Data Cleaning and Preprocessing Techniques
Python Development IDEs

An IDE (integrated development environment) is a tool that provides a comprehensive environment
for software development, combining code editing, debugging, and project management.

Examples:
● Google Colab: Cloud-based, collaborative Python notebooks for development and sharing.
● Visual Studio Code (VS Code): A lightweight, open-source code editor with powerful features and
a large extension ecosystem.
● Jupyter Notebook: An interactive, web-based tool for data analysis, visualization, and
documentation.
● PyCharm: A comprehensive IDE with intelligent code assistance, integrated testing, and support
for various Python frameworks.
Google Colab
● Access Google Colab:
○ Visit Google Colab in your web browser.
● Sign In with Google Account:
○ Sign in with your Google account to access Google Drive integration.
● Create a New Notebook:
○ Click on File > New Notebook to create a new Python notebook.
● Write Python Code:
○ Write Python code in cells within the notebook.
○ Example code: print("Hello, Google Colab!")
● To add a new cell, click Insert > Code cell.
● Run Code Cells:
○ To run a particular cell, select the cell and press Ctrl + Enter.
● View Output and Results:
○ Observe the output and results directly below the code cells.
○ Use the Share button to generate a shareable link.
Visual Studio Code (VS Code)

● Visit the official website:
○ Go to the official Visual Studio Code website.
● Download VS Code:
○ Click the prominent "Download" button for your operating system (Windows, macOS, or
Linux).
● Install VS Code:
○ Click on the installer icon to start the installation of Visual Studio Code.
○ When the installer opens, it asks you to accept the terms and conditions of
Visual Studio Code. Select "I accept the agreement" and then click the Next button.
○ When setup finishes, the installer shows a completion window. Tick the "Launch Visual
Studio Code" checkbox and then click Finish.
● Open a Python File:
○ Open Visual Studio Code.
○ Create a new Python file or open an existing one.
● Write Python Code:
○ Write a simple Python script in the editor.
○ print("Hello, Visual Studio Code!")
● Select Python Interpreter:
○ Ensure the correct Python interpreter is selected.
● Run Python Code:
○ Use the Run Python File in Terminal option.
○ Alternatively, open the Extensions view, search for the Python and Code Runner extensions, and install them.
● View Output in Terminal:
○ Observe the output in the integrated terminal at the bottom.
Jupyter Notebook
● Install Jupyter Notebook:
○ Use pip to install Jupyter Notebook.
○ Open a command prompt, type pip install notebook, and press Enter.
● Verify Installation:
○ Confirm that Jupyter Notebook is installed by checking the version.
○ jupyter notebook --version
● Launch Jupyter Notebook:
○ Start Jupyter Notebook by entering the following command.
○ jupyter notebook
○ A web browser opens with the Jupyter Notebook interface.
PyCharm IDE
● Visit the Official Website:
○ Go to the PyCharm Official Website.
● Download PyCharm:
○ Click on the "Download" button for the desired version (Community or
Professional).
● Selecting the Appropriate Version:
○ Choose between the Community (free) and Professional (paid) versions.
● Run the installer and follow the wizard steps:
○ After the download finishes, open the installer and click Next.
○ Choose the destination folder you prefer, then click Next.
○ When installation completes, the installer confirms that PyCharm was installed successfully.
Select "I want to manually reboot later" and click Finish to complete the process.
What is Data acquisition?

Data acquisition (DAQ) is the process of gathering, measuring, and recording data from different
sources or sensors in real-world scenarios. For sensor data, this involves converting analog
signals into digital data that computers can process and analyze.
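
As a rough illustration of the analog-to-digital step, the sketch below maps a voltage to an integer code the way an idealized converter might. The 5 V reference, the 10-bit resolution, and the function name quantize are assumptions chosen for this example; they do not come from the slides or from any specific device.

```python
def quantize(voltage, v_ref=5.0, bits=10):
    """Map a voltage in [0, v_ref] to the nearest integer code of an ideal ADC.

    v_ref and bits are illustrative assumptions, not values from the slides.
    """
    max_code = 2 ** bits - 1                      # highest code, e.g. 1023 for 10 bits
    clamped = min(max(voltage, 0.0), v_ref)       # inputs outside the range saturate
    return int(clamped / v_ref * max_code + 0.5)  # round to the nearest code


# Hypothetical sensor readings in volts.
samples = [0.0, 1.25, 2.5, 5.0]
codes = [quantize(v) for v in samples]
print(codes)
```

Each reading becomes one of 1024 discrete codes, which is the digital form a computer can then store and analyze.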

Data Sources

● Databases
● Files
● APIs
● Web Scraping
● Sensors and IoT Devices
What is Data acquisition?
Data collection Sources
Main Purpose of a Data Acquisition System (DAQ)

● Measure and Record Data: Collect data from diverse sensors and transducers.
● Analog-to-Digital Conversion: Convert analog signals from sensors into digital data.
● Processing and Analysis: Facilitate data processing and analysis using computers or
data processing units.
The purposes of data acquisition

● Data recording
● Data storing
● Real-time data visualization
● Post-recording data review
● Data analysis using various mathematical and statistical calculations
● Report generation
What Types of Data Do Companies Collect?
Importance of Data Acquisition Systems
Data acquisition systems play a critical role in modern data-driven applications. The key
reasons they are vital:

● Accurate Data Collection
● Real-Time Monitoring
● Scientific Research
● Industrial Automation
● Predictive Analysis
● Decision-Making
● Safety and Security
What is an API?
API stands for Application Programming Interface: a defined interface that software uses to access
data, server software, or other applications. APIs have been around for quite some time.
Types of API
How can an API be used for data collection?

An API integration can be used for data collection in several ways. A common use case is to
collect data from multiple sources through their APIs and then analyze the combined data, which is
typically more efficient and less error-prone than gathering it by hand.
Five data science projects that use APIs

● Social Media Sentiment Analysis: uses data from the Twitter and Facebook APIs.

● Opinion Mining: uses data from the Twitter and Facebook APIs.

● Stock Prediction: uses data from the Yahoo Stock API and Quandl API.

● Most Popular Languages on GitHub: uses data from the GitHub API.

● Microsoft Face Sentiment Recognition: uses the Microsoft Face API.


Machine Learning APIs for Data Science
Basic elements of an API:

An API has three primary elements:

● Access: who is allowed to ask for data or services.
● Request: the actual data or service being asked for (e.g., "given my current location from a
game such as Pokemon Go, return the map around that place"). A request has two main parts:
○ Methods: the questions you can ask, assuming you have access (they also define the type of
responses available).
○ Parameters: additional details you can include in the question or response.
● Response: the data or service returned as a result of your request.
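
The three elements can be sketched with Python's standard library. The endpoint https://api.example.com/v1/map, the token, and the parameter names below are hypothetical. The request is only constructed, not sent, and a hand-written JSON string stands in for the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Access: a credential that identifies who is asking (hypothetical token).
API_TOKEN = "demo-token"

# Request: a method (GET) plus parameters that describe what is wanted.
BASE_URL = "https://api.example.com/v1/map"  # hypothetical endpoint
params = {"lat": 12.97, "lon": 77.59, "radius_m": 500}
request = Request(
    f"{BASE_URL}?{urlencode(params)}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    method="GET",
)

# Response: the data returned for the request. A real call would pass the
# request to urllib.request.urlopen; a sample JSON body stands in for it here.
sample_body = '{"places": [{"name": "Park", "distance_m": 120}]}'
response = json.loads(sample_body)
place_names = [place["name"] for place in response["places"]]

print(request.get_method(), request.full_url)
print(place_names)
```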


Categories of API
● Web-based system
Popular examples of web-based APIs include the Twitter REST API, Facebook Graph API, and
Amazon S3 REST API.
● Operating system
Examples of OS-based APIs include Cocoa, Carbon, and WinAPI.
● Database system
Popular examples include the Drupal 7 Database API, Drupal 8 Database API, and Django
API.
● Hardware system
Examples of hardware APIs include QUANT Electronic, WareNet CheckWare, OpenVX Hardware
Acceleration, and CubeSensors.
APIs and Data Science Today

● APIs are essential building blocks for data science.
● They are pieces of code that can be put together to enhance applications and websites.
● For example, speech recognition APIs, such as those used in chatbots, improve the connection
between models and consumers.
Difference between an API and a Library
APIs and the Future of Data Science

Data science is constantly evolving, and it has the potential to grow beyond its current capabilities with the
additional help of APIs.

APIs enable different industries to innovate, improve, and become more data-driven.

Additionally, they pave the way for new business partnerships and help with app development.

APIs have the potential to make advanced analytics more understandable so that better business forecasting can
take place.
Web Scraping

● Web scraping is the process of automatically extracting information from websites.
● It involves writing code that visits a website, downloads its content, and extracts the
relevant information from the HTML or XML code.
● Web scraping is also known as web harvesting or web data extraction.
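
The extraction step can be sketched with the standard library's html.parser module. The HTML below is an inline sample rather than a downloaded page, and the class name LinkExtractor is made up for this example. A real scraper would first fetch the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Inline sample standing in for downloaded page content.
SAMPLE_HTML = """
<html><body>
  <h1>Reports</h1>
  <ul>
    <li><a href="/reports/2023.csv">2023 report</a></li>
    <li><a href="/reports/2024.csv">2024 <b>report</b></a></li>
  </ul>
</body></html>
"""

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
links = parser.links
print(links)
```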
Steps For Web Scraping
Python Web Scraping Libraries
Introduction to Databases in Data Science

Data science involves extracting value and insights from large volumes of data to drive
business decisions. It also involves building predictive models using historical data. Databases
facilitate effective storage, management, retrieval, and analysis of such large volumes of data.
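
A minimal sketch of storage, retrieval, and analysis, using Python's built-in sqlite3 module with an in-memory database. The sales table and its rows are invented for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Storage: define a table and insert a few made-up rows.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Retrieval and analysis: total sales per region, computed by the database.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

Letting the database do the aggregation means only the summarized rows need to be loaded into Python.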
Essential Database Skills for Data Science
Types of Databases
What is data preprocessing?

● Data preprocessing is the initial processing of raw data to prepare it for subsequent data
analysis or machine learning tasks.
● It transforms raw data into a format better suited to efficient processing in tasks like
data mining and machine learning.
● By refining and organizing the input data, preprocessing helps improve the accuracy of
results.
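
A small sketch of what such preprocessing can look like, using only the standard library. The sample rows, the choice to fill missing ages with the median, and the helper name parse_row are assumptions for illustration. Other strategies, such as dropping incomplete rows, are equally valid.

```python
from statistics import median

# Made-up raw records with stray whitespace, a missing value, and a duplicate.
raw_rows = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "alice", "age": "34"},
    {"name": "Cara", "age": "29"},
]


def parse_row(row):
    """Normalize the name and convert age to int, or None when missing."""
    name = row["name"].strip().title()
    age = int(row["age"]) if row["age"].strip() else None
    return {"name": name, "age": age}


parsed = [parse_row(row) for row in raw_rows]

# Drop exact duplicates while preserving the original order.
seen = set()
unique_rows = []
for row in parsed:
    key = (row["name"], row["age"])
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Fill missing ages with the median of the known ages.
known_ages = [row["age"] for row in unique_rows if row["age"] is not None]
fill_value = median(known_ages)
clean_rows = [
    {**row, "age": fill_value if row["age"] is None else row["age"]}
    for row in unique_rows
]

print(clean_rows)
```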
Steps for data preprocessing
Preprocessing techniques for complex datasets

Handling complex datasets can be challenging, but Python offers various preprocessing techniques to make your
life easier. Here are some common techniques:

● Data Cleaning
● Data Transformation
● Feature Engineering
● Handling Text Data
● Dealing with Date/Time Data
● Handling Imbalanced Datasets
● Dimensionality Reduction
● Data Splitting
● Handling Multicollinearity
● Handling Time Series Data
What is Data Cleansing?
Data Cleaning Cycle
Usage of Data Cleaning
Data Transformation

Data transformation is a technique used during data processing. It converts raw data into the
required format so that the subsequent steps of data processing and data modelling can be
performed efficiently.
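
Two common numeric transformations, min-max scaling and standardization, can be sketched as follows. The slides do not name specific techniques at this point, so these are illustrative choices, and the sample values are invented.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


incomes = [10, 20, 30, 40, 50]  # made-up sample values
scaled = min_max_scale(incomes)
z_scores = standardize(incomes)

print(scaled)
print(z_scores)
```

Both transformations keep the ordering of the values and only change their scale, which is what many modelling algorithms need.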
Ways of data transformation
Data Transformation Techniques
Data Transformation Process
Advantages of Data Transformation
What is Feature Engineering?
The process of identifying and extracting relevant features from raw data for a machine learning
algorithm is called feature engineering. It involves selecting the most important
characteristics (features), transforming them using mathematical operations, constructing
new variables as required, and extracting features.
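
As a small sketch of constructing new variables, the example below derives three features from a raw order record. The record layout and the feature names (total, hour, is_weekend) are hypothetical.

```python
from datetime import datetime


def build_features(order):
    """Derive model-ready features from one raw order record."""
    placed_at = datetime.fromisoformat(order["placed_at"])
    return {
        # New variable built from two raw fields.
        "total": order["unit_price"] * order["quantity"],
        # Features extracted from the timestamp.
        "hour": placed_at.hour,
        "is_weekend": placed_at.weekday() >= 5,  # Monday is 0, so 5 and 6 are the weekend
    }


# Made-up raw record; 2024-03-09 falls on a Saturday.
order = {"unit_price": 20.0, "quantity": 3, "placed_at": "2024-03-09T14:30:00"}
features = build_features(order)
print(features)
```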
Handling Text Data
Text Analysis Techniques
Dealing with Date/Time Data
Handling Imbalanced Datasets

Some well-known examples of imbalanced datasets are:

1 – Fraud detection: the number of fraud cases could be much smaller than the number of
non-fraudulent transactions.

2 – Prediction of disputed or delayed invoices: the problem is to predict defaulted or
disputed invoices.

3 – Predictive maintenance datasets, etc.
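
One simple way to handle such imbalance is random oversampling, which duplicates minority-class rows until the classes are the same size. The slides do not prescribe a technique, so this is only an illustrative choice. The transaction data and the function name random_oversample are made up.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to make up the shortfall (none for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Made-up dataset: 95 legitimate transactions and 5 fraudulent ones.
transactions = [{"amount": a, "fraud": False} for a in range(95)]
transactions += [{"amount": 1000 + a, "fraud": True} for a in range(5)]

balanced = random_oversample(transactions, "fraud")
counts = Counter(row["fraud"] for row in balanced)
print(counts)
```

Oversampling is normally applied to the training portion only, so that duplicated rows do not leak into evaluation data.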


Dimensionality Reduction

In both statistics and machine learning, the number of attributes, features, or input
variables of a dataset is referred to as its dimensionality.
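
The sketch below counts the dimensionality of a tiny made-up dataset and then reduces it by dropping features that never vary. Removing constant features is only one of the simplest reduction approaches and is an assumption of this example, not a method stated in the slides.

```python
from statistics import pvariance

# Made-up dataset: three rows, three features. country_code is constant.
rows = [
    {"height": 1.70, "weight": 65.0, "country_code": 1},
    {"height": 1.82, "weight": 80.0, "country_code": 1},
    {"height": 1.65, "weight": 58.0, "country_code": 1},
]

feature_names = list(rows[0])
dimensionality = len(feature_names)  # number of input variables

# Keep only features whose values actually vary across rows.
kept = [
    name for name in feature_names
    if pvariance([row[name] for row in rows]) > 0
]
reduced_rows = [{name: row[name] for name in kept} for row in rows]

print(dimensionality, "->", len(kept), kept)
```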
Data Splitting
THANK YOU
