
Data analytics is the science of analyzing raw data to draw conclusions about that
information. Many of the techniques and processes of data analytics have been
automated into mechanical processes and algorithms that work over raw data for
human consumption.

Data analysis is the process of inspecting, cleansing, transforming, and modeling
data with the goal of discovering useful information, informing
conclusions, and supporting decision-making. [1] Data analysis has multiple facets
and approaches, encompassing diverse techniques under a variety of names, and is
used in different business, science, and social science domains. [2] In today's
business world, data analysis plays a role in making decisions more scientific and
helping businesses operate more effectively.

Types of Data Analytics


Data analytics is broken down into four basic types.

1. Descriptive analytics: This describes what has happened over a given period of
time. Has the number of views gone up? Are sales stronger this month than last?
2. Diagnostic analytics: This focuses more on why something happened. This
involves more diverse data inputs and a bit of hypothesizing. Did the weather
affect beer sales? Did that latest marketing campaign impact sales?
3. Predictive analytics: This moves to what is likely going to happen in the
near term. What happened to sales the last time we had a hot summer? How
many weather models predict a hot summer this year?
4. Prescriptive analytics: This suggests a course of action. If the likelihood of
a hot summer, measured as the average of five weather models, is
above 58%, we should add an evening shift to the brewery and rent an
additional tank to increase output.
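
As a rough illustration of the prescriptive rule in item 4, the sketch below averages five model probabilities and compares the result with the 58% threshold. The function name recommend_action, the action strings, and the forecast numbers are made up for this example; only the averaging and the threshold come from the text.

```python
from statistics import mean

# Threshold from the text: act when the averaged likelihood is above 58%.
HOT_SUMMER_THRESHOLD = 0.58


def recommend_action(model_probabilities):
    """Return a suggested action from per-model hot-summer probabilities."""
    likelihood = mean(model_probabilities)
    if likelihood > HOT_SUMMER_THRESHOLD:
        return "add an evening shift and rent an additional tank"
    return "keep the current production plan"


# Hypothetical outputs of five weather models (illustrative numbers only).
forecasts = [0.70, 0.55, 0.62, 0.48, 0.66]
print(recommend_action(forecasts))
```

A real prescriptive system would weigh costs and uncertainty as well; a single threshold is the simplest form such a rule can take.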
Real World Applications of Data Analytics

1. Health Care
2. Customer Acquisition & Retention
3. Internet Search
4. Risk Detection Management
5. Security
6. Education
7. Business Automation
8. Marketing & Data Advertising

Data Analytics Techniques

1. Regression Analysis
2. Monte Carlo Simulation
3. Factor Analysis
4. Cohort Analysis
5. Cluster Analysis
6. Time Series Analysis
7. Sentiment Analysis

Data Analytics Tools

1. Microsoft Excel
2. Apache Spark
3. RapidMiner
4. Google Analytics
5. Tableau
6. Zoho Analytics
7. Microsoft Power BI
Microsoft Excel is a spreadsheet editor developed
by Microsoft for Windows, macOS, Android, iOS and iPadOS. It
features calculation or computation capabilities, graphing tools, pivot tables, and
a macro programming language called Visual Basic for Applications (VBA). Excel
forms part of the Microsoft 365 suite of software.
Uses of MS Excel

1. Regression Analysis
2. Data Entry and Storage
3. Performing Calculations
4. Data Analysis and Interpretation
5. Reporting and Visualizations
6. Accounting and Budgeting
7. Collection and Verification of Business Data
8. Calendars and Schedules
9. Administrative and Managerial Duties
10. Forecasting and Automating Repetitive Tasks

Advanced Excel functions are the features and functions of Microsoft Excel that
help the user perform complex calculations, data analysis, and much more. The most
commonly used advanced capabilities are listed below.

Uses of MS Excel (Advanced)

1. Data Consolidation
2. Visual Representations
3. Power Query
4. Quality Control
5. Conditional Formatting
6. Advanced Charting
7. Pivot Tables
8. Data Selection
What is Power BI?
Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical
business users with tools for aggregating, analyzing, visualizing, and sharing data. Power
BI’s user interface is fairly intuitive for users familiar with Excel, and its deep
integration with other Microsoft products makes it a versatile self-service tool that
requires little upfront training.
Common uses of Power BI
Microsoft Power BI is used to find insights within an organization’s data. Power BI can
help connect disparate data sets, transform and clean the data into a data model, and
create charts or graphs to provide visuals of the data.

Key features of Power BI

• Artificial intelligence.
• Hybrid deployment support.
• Quick Insights.
• Common data model support.
• Cortana integration.
• Customization.
• API integration.
• Self-service data prep.
• Modeling view.
Advantages of Power BI
• Visualization.
• Data connectivity.
• Real-time data.
• Collaboration.
• Scalability.
Why Power BI is so popular
• Power BI is easy to use and doesn’t require any coding skills.
• Power BI is visual and allows users to create beautiful reports.
• Power BI integrates with many popular data sources, making it easy to get started.
• Power BI has a huge online community, making it easy to find answers to any
questions.
• Power BI has many built-in features, making it very powerful.
• Power BI is affordable, with a free version available for personal use.
Python has been around since 1991. It is one of the most widely used
programming languages in data analytics. It is easy to use, quick to
write, and manipulates data well through its libraries. It
supports various data analytics activities such as data
collection, analysis, modelling, and visualisation.

Python Libraries for Data Analytics

1. NumPy: NumPy supports n-dimensional arrays and provides numerical computing
tools. It is useful for linear algebra and Fourier transforms.
2. Pandas: Pandas provides functions to handle missing data, perform
mathematical operations, and manipulate the data.
3. Matplotlib: The Matplotlib library is commonly used for plotting data points and
creating interactive visualizations of the data.
4. SciPy: The SciPy library is used for scientific computing. It contains modules for
optimization, linear algebra, integration, interpolation, special functions, and signal
and image processing.
5. Scikit-Learn: The Scikit-Learn library has features that allow you to build
regression, classification, and clustering models.
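
To make the first two entries of the list concrete, here is a minimal sketch that assumes NumPy and pandas are installed. It inverts a small matrix with NumPy and fills a missing value in a pandas table. The table contents are invented, and filling with the column mean is just one of several possible strategies.

```python
import numpy as np
import pandas as pd

# NumPy: a 2-D array and a linear-algebra routine (matrix inverse).
matrix = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(matrix)

# pandas: a small table with one missing value (all numbers are made up).
sales = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar"], "units": [120.0, np.nan, 150.0]}
)

# Count the missing entries, then fill them with the mean of the observed ones.
missing_count = int(sales["units"].isna().sum())
sales["units_filled"] = sales["units"].fillna(sales["units"].mean())

print(inverse)
print(missing_count)
print(sales)
```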

Data analysts often use Python to describe and categorize the data that currently
exists. They engage in exploratory data analysis, which includes profiling the data,
visualizing results, and creating observations to shape the next steps in the
analysis. Python can be used to manipulate data (using libraries such as pandas),
streamline workflows, and create visualizations (using Matplotlib).
Many businesses generate a trove of data that helps them better understand their
customers and processes. However, this data isn’t helpful in and of itself. What
businesses must do is develop non-intuitive, predictive, and insightful information
from this data, which is called data mining.
Data modeling is a process that helps data scientists define and classify data so that
it can be aligned to business hierarchies or other structures necessary for analysis.
The goal of data modeling is to produce high quality, consistent, structured data for
running business applications and informing decision makers. Python is one of the
most helpful tools in data modeling because it is highly scalable, flexible,
well-supported, and has a robust user community. It handles large data sets efficiently
and is excellent for data categorization and building hierarchies as well.
Project work: insurance data
Description: An insurance group consists of 10 property and casualty insurance, life
insurance, and brokerage companies. The property and casualty companies in the
group operate in a 17-state region. The group is a major regional property and
casualty insurer, represented by more than 4,000 independent agents who live and
work in local communities throughout a six-state region. Define the metrics to analyze
agent performance based on several attributes such as demography, products sold, new
business, etc. The goal is to improve the existing knowledge used for agent
segmentation in a supervised predictive framework.
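
One possible starting point for such metrics is sketched below with the standard library only. It aggregates row-level records into per-agency figures: number of distinct products sold, total written premium, and the share of premium that is new business. The column names follow the data dictionary in this document, while the function name agency_metrics, the choice of metrics, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows, one per agency and product (all values are made up).
rows = [
    {"AGENCY_ID": 1, "PROD_ABBR": "BOP",
     "WRTN_PREM_AMT": 1000.0, "NB_WRTN_PREM_AMT": 200.0},
    {"AGENCY_ID": 1, "PROD_ABBR": "CRIME",
     "WRTN_PREM_AMT": 500.0, "NB_WRTN_PREM_AMT": 100.0},
    {"AGENCY_ID": 2, "PROD_ABBR": "YACHT",
     "WRTN_PREM_AMT": 0.0, "NB_WRTN_PREM_AMT": 0.0},
]


def agency_metrics(rows):
    """Aggregate per-agency metrics from row-level records."""
    totals = defaultdict(lambda: {"products": set(), "written": 0.0, "new": 0.0})
    for row in rows:
        agg = totals[row["AGENCY_ID"]]
        agg["products"].add(row["PROD_ABBR"])
        agg["written"] += row["WRTN_PREM_AMT"]
        agg["new"] += row["NB_WRTN_PREM_AMT"]

    metrics = {}
    for agency_id, agg in totals.items():
        # The share is undefined when no premium was written; report None then.
        share = agg["new"] / agg["written"] if agg["written"] > 0 else None
        metrics[agency_id] = {
            "products_sold": len(agg["products"]),
            "written_premium": agg["written"],
            "new_business_share": share,
        }
    return metrics


metrics = agency_metrics(rows)
for agency_id, values in sorted(metrics.items()):
    print(agency_id, values)
```

Features of this kind could then feed a supervised model for agent segmentation; which target variable to predict is a separate decision that this sketch does not address.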
References :

AGENCY_ID Description
PRIMARY_AGENCY_ID master agency if part of group
PROD_ABBR 33 products of which 14 are CL and 19 are PL
PROD_LINE commercial lines (CL) or personal lines (PL)
STATE_ABBR  
STAT_PROFILE_DATE_YEAR data starts in mid-2005 and continues into 2015
RETENTION_POLY_QTY current number of policies that are still active from previous year
POLY_INFORCE_QTY number of policies active for that year
PREV_POLY_INFORCE_QTY number of policies active in the previous year
NB_WRTN_PREM_AMT new business in written premium
WRTN_PREM_AMT total written premium
PREV_WRTN_PREM_AMT written premium during the same period in the previous year
PRD_ERND_PREM_AMT amount of premium taken in
PRD_INCRD_LOSSES_AMT losses
MONTHS number of months included in the data for that year; the original data was monthly and some months were missing, so the aggregate doesn’t make up the entire year if the months value is less than 12
RETENTION_RATIO computed for each row in the data as RETENTION_POLY_QTY / PREV_POLY_INFORCE_QTY; therefore it’s a granular measure of the retention for that agency writing that particular product in that particular state from the previous year
LOSS_RATIO computed for each row in the data as PRD_INCRD_LOSSES_AMT / WRTN_PREM_AMT; currently I’m only computing results where the WRTN_PREM_AMT is greater than 0; however there are many cases where there are losses but no premium, maybe from a previous claim that’s still being paid, so I included codes to indicate whether there are positive or negative losses on zero premiums; they are located at the end of this document
LOSS_RATIO_3YR computed by agency by line of business by year for the three year period ending in that year, if there is data for three years, otherwise the two years or one year of data available; to make this more tangible, the first complete year of data we have for an agency will have the loss ratio for that year; the second complete year of data will have the mean loss ratio for that year and the previous year; the third complete year of data will have the mean loss ratio for those three years; then the fourth and greater will have the mean of the three year period ending in that year; note that the mean loss ratios are computed independently for PL and CL
GROWTH_RATE_3YR computed by agency by line of business by year for the three year period ending in that year; measures the average growth in written premium for that agency in that line of business; only computes results for agencies that have data for the entire range of years; since the measure is over three years of growth, there needs to be a base year to be used as a standard so four years of data are needed; in order to include as many results as possible, the PREV_WRTN_PREM_AMT column is used if it exists in the first year of data available, so that it can be used as a base year, otherwise the WRTN_PREM_AMT is the only column used
AGENCY_APPOINTMENT_YEAR year the agency started doing business with Azure
ACTIVE_PRODUCERS number of active producers in the agency
MAX_AGE (contains missing values & results may not be accurate) maximum age of a producer at that agency
MIN_AGE minimum age of a producer at that agency
VENDOR_IND indicator column to specify whether the agency subscribes to a vendor
VENDOR the vendor that the agency subscribes to
PL_START_YEAR year the agency started using the PL vendor
PL_END_YEAR year the agency stopped using the PL vendor
COMMISIONS_START_YEAR year the agency started using the COMMISIONS vendor
COMMISIONS_END_YEAR year the agency stopped using the COMMISIONS vendor
CL_START_YEAR year the agency started using the CL vendor
CL_END_YEAR year the agency stopped using the CL vendor
ACTIVITY_NOTES_START_YEAR year the agency started using the ACTIVITY NOTES vendor
ACTIVITY_NOTES_END_YEAR year the agency stopped using the ACTIVITY NOTES vendor
CL_BOUND_CT_MDS number of bound policies quoted through a MDS (probably a data recording error, should be DSM) in the current year to date, that is the first six months of 2015, in commercial lines
CL_QUO_CT_MDS number of quoted policies through a MDS (probably a data recording error, should be DSM) in the current year to date, that is the first six months of 2015, in commercial lines
CL_BOUND_CT_SBZ number of bound policies quoted through a SBZ in the current year to date, that is the first six months of 2015, in commercial lines
CL_QUO_CT_SBZ number of quoted policies through a SBZ in the current year to date, that is the first six months of 2015, in commercial lines
CL_BOUND_CT_eQT number of bound policies quoted through an eQT in the current year to date, that is the first six months of 2015, in commercial lines
CL_QUO_CT_eQT number of quoted policies through an eQT in the current year to date, that is the first six months of 2015, in commercial lines
PL_BOUND_CT_ELINKS number of bound policies quoted through ELINKS since September 2013 in personal lines
PL_QUO_CT_ELINKS number of quoted policies through ELINKS since September 2013 in personal lines
PL_BOUND_CT_PLRANK number of bound policies quoted through PLRANK since September 2013 in personal lines
PL_QUO_CT_PLRANK number of quoted policies through PLRANK since September 2013 in personal lines
PL_BOUND_CT_eQTte number of bound policies quoted through eQTte since September 2013 in personal lines
PL_QUO_CT_eQTte number of quoted policies through eQTte since September 2013 in personal lines
PL_BOUND_CT_APPLIED number of bound policies quoted through APPLIED since September 2013 in personal lines
PL_QUO_CT_APPLIED number of quoted policies through APPLIED since September 2013 in personal lines
PL_BOUND_CT_TRANSACTNOW number of bound policies quoted through TRANSACTNOW since September 2013 in personal lines
PL_QUO_CT_TRANSACTNOW number of quoted policies through TRANSACTNOW since September 2013 in personal lines
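
The computed columns described above (RETENTION_RATIO, LOSS_RATIO, LOSS_RATIO_3YR, GROWTH_RATE_3YR) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the original computation: returning None for a zero denominator in the retention ratio is an assumption, and because the text does not say whether the average growth is arithmetic or compound, the sketch assumes the arithmetic mean of three year-over-year growth rates. The function names and sample numbers are hypothetical.

```python
def retention_ratio(retention_poly_qty, prev_poly_inforce_qty):
    """RETENTION_POLY_QTY / PREV_POLY_INFORCE_QTY for one row."""
    if prev_poly_inforce_qty == 0:
        return None  # assumption: undefined when no policies existed last year
    return retention_poly_qty / prev_poly_inforce_qty


def loss_ratio(prd_incrd_losses_amt, wrtn_prem_amt):
    """PRD_INCRD_LOSSES_AMT / WRTN_PREM_AMT, only where premium is above 0."""
    if wrtn_prem_amt <= 0:
        return None  # the text computes the ratio only for positive premium
    return prd_incrd_losses_amt / wrtn_prem_amt


def loss_ratio_3yr(yearly_loss_ratios):
    """Mean loss ratio over the period of up to three years ending each year."""
    result = []
    for i in range(len(yearly_loss_ratios)):
        window = yearly_loss_ratios[max(0, i - 2): i + 1]
        result.append(sum(window) / len(window))
    return result


def growth_rate_3yr(premiums):
    """Assumed arithmetic mean of three year-over-year growth rates.

    Expects four consecutive yearly premiums: a base year plus three years.
    """
    if len(premiums) != 4 or any(p <= 0 for p in premiums[:3]):
        return None
    growth = [(premiums[i + 1] - premiums[i]) / premiums[i] for i in range(3)]
    return sum(growth) / 3


# Hypothetical values for one agency and one line of business.
print(retention_ratio(80, 100))
print(loss_ratio(300.0, 1000.0))
print(loss_ratio_3yr([0.5, 0.7, 0.6, 0.9]))
print(growth_rate_3yr([100.0, 110.0, 121.0, 133.1]))
```

Note that loss_ratio_3yr averages yearly ratios, as the description states, rather than dividing three-year total losses by three-year total premium; the two approaches generally give different results.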

PROD_ABBR | DESCRIPTION | DESCRIPTION DETAILED
BOILERMACH | Boiler Machinery | Policy provides coverage for physical damage to and fina[...] equipment breakdown. Boiler and machinery insurance, [...] breakdown insurance, covers the cost of repairing or rep[...] equipment and business losses incurred from the equipm[...]
BOP | Business Owner's Package | comprehensive package of Property and Liability Covera[...] of small to mid-sized business entities
COMMAUTO | Automobile | vehicles used for business
COMMINLMAR | Inland Marine | Type of Property Insurance policy that covers certain kin[...] property and other specialized items that standard Prope[...] can't cover. Many Inland Marine policies cover insured pr[...] where it's located. [...] range of items, such as cameras, musical instruments, a[...] equipment. These things move from location to location, [...] to more opportunities for loss, damage, and theft. These [...] call for a separate form of property coverage, and that's w[...] coverage comes in.
COMMPOL | |
COMMUMBREL | Umbrella | higher liability limits and broadened coverage.
CRIME | Crime | protection from losses resulting from business-related cri[...]
FIREALLIED | Fire Allied | Standard fire insurance policy extended to cover associa[...] damage to the property
GARAGE | Garage Keeper's Liability | Policy for vehicles protection within business premises
GENERALIAB | General Liability | Broad type of insurance policy which provides liability ins[...] business risks
WORKCOMP | Worker's Compensation | Policy for employees of a firm, business or organization i[...] performance of a job function or duty. This is mandatory [...] majority of the U.S. states.
ANNIV | |
ANNIV 12 | |
CYCLES | Cycles | Discounted bike insurance covering theft, accidental dam[...]
CYCLES 12 | |
DTALK | |
DTALK 12 | |
DWELLFIRE | Dwell Fire | Fire policy designed to insure a dwelling, which is anothe[...] structure people may live in
HOMEOWNERS | Homeowners | Policy covers losses and damages to an individual's hou[...] the home
MOBILEHOME | Mobilehome | Policy for mobile and manufactured homes
MOTORHOM12 | |
MOTORHOME | Automobile | Policy for home car insurance
PERSAIP | Special Automobile Insurance Policy | helps make auto insurance available to drivers who are l[...] because of limited financial resources.
PERSINLMAR | Inland Marine | same as commercial
PERSUMBREL | Umbrella | Policy designed to add extra liability coverage over and a[...] insurance policy, such as auto or homeowners insurance
SNOWMOBI12 | |
SNOWMOBILE | Snowmobile | Policy to protect snowy adventure
YACHT | Yacht | Policy that provides indemnity liability coverage on pleas[...]
