
Week 1: Data Analysis

Introduction
Data Analysis
Methods used in Data analysis

Data Mining

Text Analytics

Business Intelligence

Data Visualization
Data Analysis definition

Data analysis is the process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information, informing conclusions and supporting decision-making. It is the process of evaluating data using analytical and statistical tools to discover useful information and aid in business decision making.
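
The four activities in the definition (inspecting, cleansing, transforming and modelling) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the region names and the function names `inspect`, `cleanse`, `transform` and `model` are invented for this example, and the "model" is just a mean per group.

```python
from statistics import mean

# Hypothetical raw records (region, sales as text), invented for illustration.
raw_records = [
    ("north", "120"),
    ("south", " 95 "),
    ("north", ""),       # missing value
    ("south", "105"),
    ("north", "130"),
    ("east", "n/a"),     # unparseable value
]

def inspect(records):
    """Inspect: count how many records hold a value that parses as a number."""
    valid = sum(1 for _, value in records if value.strip().isdigit())
    return {"total": len(records), "valid": valid}

def cleanse(records):
    """Cleanse: drop records whose value is missing or not numeric."""
    return [(region, value.strip()) for region, value in records
            if value.strip().isdigit()]

def transform(records):
    """Transform: convert text to integers and group the values by region."""
    grouped = {}
    for region, value in records:
        grouped.setdefault(region, []).append(int(value))
    return grouped

def model(grouped):
    """Model: summarize each region by its mean (a deliberately simple model)."""
    return {region: mean(values) for region, values in grouped.items()}

summary = inspect(raw_records)
averages = model(transform(cleanse(raw_records)))
print(summary)
print(averages)
```

The averages are the kind of "useful information" the definition refers to: a decision maker can compare regions without reading the raw records.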

Methods used in Data Analysis

Methods used:

1. Data Mining
2. Text Analytics
3. Business Intelligence
4. Data Visualization
Data Mining definition

Data mining is a method of data analysis for discovering patterns in large data sets using methods from statistics, artificial intelligence, machine learning and databases. The goal is to transform raw data into understandable business information. This might include identifying groups of data records (known as cluster analysis) or identifying anomalies and dependencies between data groups.
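
As a rough sketch of the two tasks named in the definition, the Python below groups one-dimensional values with a small k-means loop (cluster analysis) and flags values far from the mean (anomaly detection). The data, the function names `kmeans_1d` and `zscore_anomalies`, the starting centers and the threshold of 2 standard deviations are all assumptions chosen for illustration; real data mining tools use more robust methods.

```python
from statistics import mean, stdev

def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

def zscore_anomalies(values, threshold=2.0):
    """Flag values lying more than `threshold` sample standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) / spread > threshold]

# Hypothetical purchase amounts forming two obvious groups.
amounts = [12, 15, 14, 13, 95, 98, 102, 100]
centers, clusters = kmeans_1d(amounts, centers=[min(amounts), max(amounts)])
print(centers, clusters)

# Hypothetical daily order counts with one unusual day.
daily_orders = [50, 52, 49, 51, 48, 50, 53, 120]
anomalies = zscore_anomalies(daily_orders)
print(anomalies)
```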
Text Analytics definition

Text analytics is the process of deriving useful information from text. It is accomplished by processing unstructured textual information, extracting meaningful numerical indices from it, and making the information available to statistical and machine learning algorithms for further processing.
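
One common way to turn unstructured text into "numerical indices" is a bag-of-words count, sketched below. The sample sentences, the simple regular-expression tokenizer and the names `tokenize`, `vocabulary` and `to_vector` are assumptions made for this example; it is one possible approach, not the only one.

```python
import re
from collections import Counter

# Hypothetical customer comments, invented for illustration.
documents = [
    "The delivery was fast",
    "Fast delivery, great price",
    "The price was too high",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

# The vocabulary fixes the meaning of each position in the numeric vectors.
vocabulary = sorted({token for doc in documents for token in tokenize(doc)})

def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vectors = [to_vector(doc) for doc in documents]
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each comment is now a list of numbers of equal length, which is the form that statistical and machine learning algorithms can process further.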
Business Intelligence definition

Business intelligence transforms data into actionable intelligence for business purposes and may be used in an organization's strategic and tactical business decision making. It offers a way for people to examine trends in collected data and derive insights from them.


Data Visualization definition

Data visualization refers, very simply, to the visual representation of data. In the context of data analysis, it means using the tools of statistics, probability, pivot tables and other artifacts to present data visually. It makes complex data more understandable and usable.
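
A pivot table and a simple chart can be sketched without any plotting library. In the Python below, the sales rows, the function names `pivot` and `bar_chart`, and the scale of one `#` per 25 units are assumptions for illustration; dedicated visualization tools would draw a proper chart from the same totals.

```python
from collections import defaultdict

# Hypothetical sales rows (region, quarter, amount), invented for illustration.
sales = [
    ("north", "Q1", 120),
    ("north", "Q2", 135),
    ("south", "Q1", 90),
    ("south", "Q2", 110),
    ("north", "Q1", 30),
]

def pivot(rows):
    """Build a pivot table: regions as rows, quarters as columns, summed amounts as cells."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(quarters) for region, quarters in table.items()}

def bar_chart(totals, unit=25):
    """Render each total as a text bar, one '#' per `unit` (rounded down)."""
    return [f"{label:<6} {'#' * (value // unit)} {value}"
            for label, value in totals.items()]

table = pivot(sales)
region_totals = {region: sum(quarters.values()) for region, quarters in table.items()}
chart = bar_chart(region_totals)
print(table)
print("\n".join(chart))
```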


Data Mining Techniques

7 most important data mining techniques

1. Tracking patterns
2. Classification (predictive)
3. Association (descriptive)
4. Outlier detection
5. Clustering (descriptive)
6. Regression (predictive)
7. Prediction
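
To make the descriptive/predictive distinction in the list concrete, the sketch below shows one technique of each kind: association (how often items occur together) and regression (fitting a line and using it to predict). The shopping baskets, the spend and sales figures, and the function names `support`, `confidence`, `fit_line` and `predict` are assumptions invented for this example.

```python
from statistics import mean

# --- Association (descriptive): how often do items appear together? ---
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent <= b]
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)

# --- Regression (predictive): fit y = intercept + slope * x by least squares. ---
def fit_line(xs, ys):
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(intercept, slope, x):
    """Use the fitted line to estimate y for an unseen x."""
    return intercept + slope * x

# Hypothetical advertising spend (x) and sales (y).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
intercept, slope = fit_line(spend, sales)

print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}))
print(intercept, slope, predict(intercept, slope, 5))
```

The association measures only describe the existing baskets, while the fitted line is used to estimate a value that was never observed.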
Data Mining tools

1. Rapid Miner
2. Orange
3. WEKA
4. KNIME
5. R programming
Rapid Miner

Rapid Miner is one of the best predictive analysis systems, developed by the company of the same name. It is written in the Java programming language. It provides an integrated environment for deep learning, text mining, machine learning and predictive analysis.

Rapid Miner offers its server both on premise and in public/private cloud infrastructures. It has a client/server model as its base. It is rated as the number one business analytics software.

It consists of three modules:

1. Rapid Miner Studio - for workflow design and prototyping
2. Rapid Miner Server - to operate predictive data models created in Studio
3. Rapid Miner Radoop - executes processes directly in a Hadoop cluster to simplify predictive analysis
Orange

Orange is a software suite well suited to machine learning and data mining. It is particularly good at data visualization and is component-based software. It is written in the Python programming language.

Because it is component-based, the components of Orange are called "widgets". These widgets range from data visualization and pre-processing to the evaluation of algorithms.
WEKA

WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. The tool is sophisticated and is used in many different applications, including visualization and algorithms for data analysis and predictive modelling.
KNIME

KNIME is primarily used for data preprocessing, that is, data extraction, transformation and loading (ETL). It is a powerful tool with a GUI that shows the network of data nodes. It is popular among financial data analysts.
R programming

R is a free software programming language and software environment for statistical computing and graphics. It is primarily written in C and Fortran, and many of its modules are written in R itself. It supports nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering and other techniques.
