Kingword
information. Many of the techniques and processes of data analytics have been
automated into mechanical processes and algorithms that work over raw data for
human consumption.
1. Health Care
2. Customer Acquisition & Retention
3. Internet Search
4. Risk Detection and Management
5. Security
6. Education
7. Business Automation
8. Marketing & Data Advertising
1. Regression Analysis
2. Monte Carlo Simulation
3. Factor Analysis
4. Cohort Analysis
5. Cluster Analysis
6. Time Series Analysis
7. Sentiment Analysis
1. Microsoft Excel
2. Apache Spark
3. RapidMiner
4. Google Analytics
5. Tableau
6. Zoho Analytics
7. Microsoft Power BI
Microsoft Excel is a spreadsheet editor developed
by Microsoft for Windows, macOS, Android, iOS and iPadOS. It
features calculation capabilities, graphing tools, pivot tables, and
a macro programming language called Visual Basic for Applications (VBA). Excel
forms part of the Microsoft 365 suite of software.
Uses of MS Excel
1. Regression Analysis
2. Data Entry and Storage
3. Performing Calculations
4. Data Analysis and Interpretation
5. Reporting and Visualizations
6. Accounting and Budgeting
7. Collection and Verification of Business Data
8. Calendars and Schedules
9. Administrative and Managerial Duties
10. Forecasting and Automating Repetitive Tasks
Advanced Excel functions are the features and functions of Microsoft Excel that
help the user perform complex calculations, analyze data, and much more. In this
article, you will learn some of the most commonly used advanced functions in Excel.
Uses of MS Excel (Advanced)
1. Data Consolidation
2. Visual Representations
3. Power Query
4. Quality Control
5. Conditional Formatting
6. Advanced Charting
7. Pivot Tables
8. Data Selection
What is Power BI?
Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical
business users with tools for aggregating, analyzing, visualizing, and sharing data. Power
BI's user interface is fairly intuitive for users familiar with Excel, and its deep
integration with other Microsoft products makes it a versatile self-service tool that
requires little upfront training.
Common uses of Power BI
Microsoft Power BI is used to find insights within an organization's data. Power BI can
help connect disparate data sets, transform and clean the data into a data model, and
create charts or graphs that provide visuals of the data.
Data analysts often use Python to describe and categorize the data that currently
exists. They engage in exploratory data analysis, which includes profiling the data,
visualizing results, and creating observations to shape the next steps in the
analysis. Python can be used to manipulate data (using libraries such as pandas),
streamline workflows, and create visualizations (using Matplotlib).
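As a minimal sketch of the profiling step in exploratory data analysis, the example below summarizes one numeric column: how many values are present, how many are missing, and a few basic statistics. It uses only the standard library so it runs without extra installs; in practice, pandas' `DataFrame.describe()` produces a similar summary. The function name `profile_column` and the premium values are hypothetical.

```python
import statistics
from typing import Optional


def profile_column(values: list[Optional[float]]) -> dict:
    """Summarize one numeric column: count, missing values, and basic statistics."""
    present = [v for v in values if v is not None]  # drop missing entries (None)
    summary = {
        "count": len(present),
        "missing": len(values) - len(present),
    }
    if present:  # statistics are undefined for an all-missing column
        summary.update(
            mean=statistics.mean(present),
            median=statistics.median(present),
            min=min(present),
            max=max(present),
        )
    return summary


# Hypothetical written-premium values, with one missing entry.
premiums = [1200.0, 950.0, None, 3100.0, 1800.0]
print(profile_column(premiums))
```

Reporting the missing count alongside the statistics matters here because, as the data dictionary below notes, some fields (such as MAX_AGE) contain missing values.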
Many businesses generate a trove of data that helps them better understand their
customers and processes. However, this data isn't helpful in and of itself. What
businesses must do is extract non-obvious, predictive, and insightful information
from this data; this process is called data mining.
Data modeling is a process that helps data scientists define and classify data so that
it can be aligned to business hierarchies or other structures necessary for analysis.
The goal of data modeling is to produce high quality, consistent, structured data for
running business applications and informing decision makers. Python is one of the
most helpful tools in data modeling because it is highly scalable, flexible, well-supported,
and has a robust user community. It handles large data sets efficiently
and is excellent for data categorization and building hierarchies as well.
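To illustrate aligning records to a business hierarchy, here is a minimal sketch that rolls up written premium into an agency → product line structure and rejects records that do not fit the expected categories. The field names are borrowed from the insurance data dictionary in the project section (AGENCY_ID, PROD_LINE, WRTN_PREM_AMT); the `PolicyRecord` class, the `build_hierarchy` function, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRecord:
    agency_id: str
    prod_line: str  # "CL" (commercial lines) or "PL" (personal lines)
    wrtn_prem_amt: float


def build_hierarchy(records: list[PolicyRecord]) -> dict[str, dict[str, float]]:
    """Group records as agency -> product line -> total written premium."""
    tree: defaultdict = defaultdict(lambda: defaultdict(float))
    for r in records:
        # Enforce consistent categories so the model stays well structured.
        if r.prod_line not in ("CL", "PL"):
            raise ValueError(f"unknown product line: {r.prod_line!r}")
        tree[r.agency_id][r.prod_line] += r.wrtn_prem_amt
    # Convert the nested defaultdicts to plain dicts for a predictable result.
    return {agency: dict(lines) for agency, lines in tree.items()}


records = [
    PolicyRecord("A1", "PL", 500.0),
    PolicyRecord("A1", "CL", 1200.0),
    PolicyRecord("A1", "PL", 250.0),
    PolicyRecord("A2", "CL", 800.0),
]
print(build_hierarchy(records))  # {'A1': {'PL': 750.0, 'CL': 1200.0}, 'A2': {'CL': 800.0}}
```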
Project work: insurance data
Description: An insurance group consists of 10 property and casualty insurance, life
insurance, and brokerage companies. The property and casualty companies in the
group operate in a 17-state region. The group is a major regional property and
casualty insurer, represented by more than 4,000 independent agents who live and
work in local communities throughout a six-state region. Define the metrics to analyze
agent performance based on several attributes such as demographics, products sold, new
business, etc. The goal is to improve their existing knowledge used for agent
segmentation in a supervised predictive framework.
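A minimal sketch of how some of the metrics defined in the reference table could be computed. It follows the formulas stated there: RETENTION_RATIO = RETENTION_POLY_QTY / PREV_POLY_INFORCE_QTY, LOSS_RATIO = PRD_INCRD_LOSSES_AMT / WRTN_PREM_AMT (computed only when WRTN_PREM_AMT is greater than 0), and the trailing-mean rule for LOSS_RATIO_3YR, applied to one agency and one line of business at a time. The function names, returning `None` for undefined ratios, and treating a zero prior-year policy count as undefined are assumptions for illustration; the zero-premium loss codes mentioned in the table are not reproduced.

```python
from typing import Optional


def retention_ratio(retention_poly_qty: float, prev_poly_inforce_qty: float) -> Optional[float]:
    """RETENTION_POLY_QTY / PREV_POLY_INFORCE_QTY; None if there were no prior policies (assumption)."""
    if prev_poly_inforce_qty <= 0:
        return None
    return retention_poly_qty / prev_poly_inforce_qty


def loss_ratio(prd_incrd_losses_amt: float, wrtn_prem_amt: float) -> Optional[float]:
    """PRD_INCRD_LOSSES_AMT / WRTN_PREM_AMT, computed only when written premium > 0."""
    if wrtn_prem_amt <= 0:
        return None
    return prd_incrd_losses_amt / wrtn_prem_amt


def loss_ratio_3yr(yearly_loss_ratios: list[float]) -> list[float]:
    """Trailing mean of up to three years for one agency and one line (PL or CL).

    Year 1 -> its own ratio; year 2 -> mean of years 1-2;
    year 3 onward -> mean of the three-year window ending in that year.
    """
    result = []
    for i in range(len(yearly_loss_ratios)):
        window = yearly_loss_ratios[max(0, i - 2): i + 1]
        result.append(sum(window) / len(window))
    return result


print(retention_ratio(90, 100))                 # 0.9
print(loss_ratio(300.0, 0.0))                   # None: losses but zero premium
print(loss_ratio_3yr([0.5, 0.75, 0.25, 0.5]))   # [0.5, 0.625, 0.5, 0.5]
```

Because the table states that the mean loss ratios are computed independently for PL and CL, the function takes a single ordered series of yearly ratios and would be called once per agency and line of business.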
References:
AGENCY_ID Description
PRIMARY_AGENCY_ID master agency if part of group
PROD_ABBR 33 products of which 14 are CL and 19 are PL
PROD_LINE commercial lines (CL) or personal lines (PL)
STATE_ABBR
STAT_PROFILE_DATE_YEAR data starts in mid-2005 and continues into 2015
RETENTION_POLY_QTY current number of policies that are still active from previous year
POLY_INFORCE_QTY number of policies active for that year
PREV_POLY_INFORCE_QTY number of policies active in the previous year
NB_WRTN_PREM_AMT new business in written premium
WRTN_PREM_AMT total written premium
PREV_WRTN_PREM_AMT written premium during the same period in the previous year
PRD_ERND_PREM_AMT amount of premium taken in
PRD_INCRD_LOSSES_AMT losses
MONTHS number of months included in the data for that year; the original data was monthly and some months were missing, so the aggregate doesn't make up the entire year if the months value is less than 12
RETENTION_RATIO computed for each row in the data as RETENTION_POLY_QTY / PREV_POLY_INFORCE_QTY; therefore it's a granular measure of the retention for that agency writing that particular product in that particular state from the previous year
LOSS_RATIO computed for each row in the data as PRD_INCRD_LOSSES_AMT / WRTN_PREM_AMT; currently I'm only computing results where the WRTN_PREM_AMT is greater than 0; however, there are many cases where there are losses but no premium, possibly from a previous claim that's still being paid, so I included codes to indicate whether there are positive or negative losses on zero premiums; they are located at the end of this document
LOSS_RATIO_3YR computed by agency by line of business by year for the three-year period ending in that year if there are three years of data, otherwise for the two years or one year of data available; to make this more tangible, the first complete year of data we have for an agency will have the loss ratio for that year; the second complete year of data will have the mean loss ratio for that year and the previous year; the third complete year of data will have the mean loss ratio for those three years; then the fourth and later years will have the mean of the three-year period ending in that year; note that the mean loss ratios are computed independently for PL and CL
GROWTH_RATE_3YR computed by agency by line of business by year for the three-year period ending in that year; measures the average growth in written premium for that agency in that line of business; only computes results for agencies that have data for the entire range of years; since the measure is over three years of growth, there needs to be a base year to be used as a standard, so four years of data are needed; in order to include as many results as possible, the PREV_WRTN_PREM_AMT column is used if it exists in the first year of data available, so that it can be used as a base year; otherwise, WRTN_PREM_AMT is the only column used
AGENCY_APPOINTMENT_YEAR year the agency started doing business with Azure
ACTIVE_PRODUCERS number of active producers in the agency
MAX_AGE maximum producer age at that agency (contains missing values & results may not be accurate)
MIN_AGE minimum producer age at that agency
VENDOR_IND indicator column to specify whether the agency subscribes to a vendor
VENDOR the vendor that the agency subscribes to
PL_START_YEAR year the agency started using the PL vendor
PL_END_YEAR year the agency stopped using the PL vendor
COMMISIONS_START_YEAR year the agency started using the COMMISIONS vendor
COMMISIONS_END_YEAR year the agency stopped using the COMMISIONS vendor
CL_START_YEAR year the agency started using the CL vendor
CL_END_YEAR year the agency stopped using the CL vendor
ACTIVITY_NOTES_START_YEAR year the agency started using the ACTIVITY NOTES vendor
ACTIVITY_NOTES_END_YEAR year the agency stopped using the ACTIVITY NOTES vendor
CL_BOUND_CT_MDS number of bound policies quoted through an MDS (probably a data recording error, should be DSM) in the current year to date, that is, the first six months of 2015, in commercial lines
CL_QUO_CT_MDS number of policies quoted through an MDS (probably a data recording error, should be DSM) in the current year to date, that is, the first six months of 2015, in commercial lines
CL_BOUND_CT_SBZ number of bound policies quoted through an SBZ in the current year to date, that is, the first six months of 2015, in commercial lines
CL_QUO_CT_SBZ number of policies quoted through an SBZ in the current year to date, that is, the first six months of 2015, in commercial lines
CL_BOUND_CT_eQT number of bound policies quoted through an eQT in the current year to date, that is, the first six months of 2015, in commercial lines
CL_QUO_CT_eQT number of policies quoted through an eQT in the current year to date, that is, the first six months of 2015, in commercial lines
PL_BOUND_CT_ELINKS number of bound policies quoted through ELINKS since September 2013 in personal lines
PL_QUO_CT_ELINKS number of policies quoted through ELINKS since September 2013 in personal lines
PL_BOUND_CT_PLRANK number of bound policies quoted through PLRANK since September 2013 in personal lines
PL_QUO_CT_PLRANK number of policies quoted through PLRANK since September 2013 in personal lines
PL_BOUND_CT_eQTte number of bound policies quoted through eQTte since September 2013 in personal lines
PL_QUO_CT_eQTte number of policies quoted through eQTte since September 2013 in personal lines
PL_BOUND_CT_APPLIED number of bound policies quoted through APPLIED since September 2013 in personal lines
PL_QUO_CT_APPLIED number of policies quoted through APPLIED since September 2013 in personal lines
PL_BOUND_CT_TRANSACTNOW number of bound policies quoted through TRANSACTNOW since September 2013 in personal lines