Professional Documents
Culture Documents
AnalytixWise - Risk Analytics Unit 2 Data and Analytics Prerequisite
AnalytixWise - Risk Analytics Unit 2 Data and Analytics Prerequisite
2
Risk Analytics
▪ To facilitate effective risk decisions, data must be turned into the Right Information and delivered
to the Right People in an understandable format at the Right Time via the Right Channel. [4R’s]
▪ As banks face increasing regulatory scrutiny, risk managers and senior bankers must adapt their
current Risk Management practices to meet the evolving regulations – such as:
▪ Basel II and III,
▪ Financial Reporting (FINREP)
▪ Compliance Reporting (COREP)
▪ International Financial Reporting Standards (IFRS) 7 and 9
▪ Dodd-Frank
▪ Comprehensive Capital Analysis and Review (CCAR)
▪ The European Central Bank’s (ECB) Asset Quality Review (AQR) and the stress tests
▪ Data became particularly relevant when the Basel Committee on Banking Supervision (BCBS)
published a document, Principles for Effective Risk Data Aggregation and Risk Reporting (BCBS
239) in 2013
4
Risk Regulations Data & Information Requirements
5
Analytical Data is needed for Regulatory & Business Risk Reporting
▪ Analytical data is the data a bank uses in its regulatory and business risk reporting,
which is the subject of the BCBS principles
▪ Analytical data is different from the operational or transactional data that banks
traditionally use, in that it:
▪ Is sourced from different types of systems – finance, assets, capital modeling, and risk systems – many of
which are highly specialized and desktop-based. These systems in turn rely on data from core
administration/IT systems
▪ Requires a high degree of granularity to support multi-dimensional reporting (e.g., interactive dashboards)
▪ Often must be aggregated and consolidated
▪ Must be readily available, accurate, and comprehensive (covering all risks) to support monthly, quarterly,
and annual reporting cycles – as well as ad hoc and real-time analyses
▪ Is used primarily in regulatory, business, and financial reporting and to support risk and capital decision-
making
6
Main Categories of Analytical Data for Risk Analytics
7
Key Challenges Faced regrading Data for Risk Analytics
1. Massive amounts of data - Banks have large amounts of different types of data stored
in multiple, siloed systems based on entities, lines of businesses, risk types, etc. Many
of these systems are old and antiquated, with no standardized data models or data
sets. After identifying what systems contain analytical data, extracting, standardizing,
and consolidating it is the next challenge
2. Lack of a common data model and standards - Few banks today have an enterprise-
wide analytical data model [FS-LDM], which makes standardizing data sets and
aggregating data difficult
3. Reliance on manual data processes - Many banks still employ large numbers of
employees to review and validate data, as well as find “gaps.” This manual approach is
slow, costly, and almost impossible to analyse and audit
8
Key Challenges Faced regrading Data for Risk Analytics
4. Low quality data and audit trails - Most banks struggle with the poor quality of data
held in their core systems due to input errors, unchecked changes, and the age of the
data. This limitation is compounded by the fact that banks often have multiple loan,
credit card, asset, administration, and finance systems with no common data (or
metadata) models. The lack of sound audit trails and lineage are an issue, particularly
with legacy systems and manual processes
5. Structured and unstructured data - A significant amount of data within a bank is still
unstructured (e.g., information that does not have predefined relationships), such as
within portfolios or derivatives, and is not stored in existing centralized databases.
These data are in the forms of Documents, PDFs, Images, Media Files and other
proprietary formats, which are not amenable for easy processing to derive information.
Aggregating unstructured data and combining it with structured data is the key
challenge
9
Key Challenges Faced regrading Data for Risk Analytics
6. Accurate counterparty data - A particular problem for banks is getting the deep,
accurate, and granular counterparty data essential for credit risk modelling –
classification, jurisdiction, entity type, etc. Complexity increases with guarantors and
insurers between layers. Counterparty data can also come from multiple sources
10
A Typical Data Management & Governance Framework for RA
11
Addressing Data Quality
▪ A key element in the management of data is improving the quality of raw data held in
the source and modelling systems. Within the data quality process, there are two
factors to consider:
▪ First, data quality should not be regarded as a one-off process. New data is always emerging, so there has to be a process to
monitor data quality on an ongoing basis. The documentation of this process should be automated as much as possible
▪ Second, data continues to diversify. Perhaps one of the most interesting recent developments is the ability to enrich validated
data with extra data from external sources, such as the sociographic or demographic data of policyholders or buying habits
from supermarket chains. This widening of data types is particularly useful for enhancing the single view of a customer. It does,
however, add a layer of complexity into the system
▪ Dimensional Model supports two kinds of Schema – Star & Snowflake Schema
13
Star Schema for Risk Analytics
▪ The Star Schema is the simplest data warehouse schema, which comprises of a
central Fact Table surrounded by two or more Dimension Tables
▪ The center of the star consists of one or more Fact Tables and the points of the
star are the Dimension Tables
Product
Date
Transaction Fact
Dimension Dimension
Tables Tables
Transaction Amount
Account Location
14
Snowflake Schema for Risk Analytics
▪ The Snowflake Schema is a type of Star Schema itself but a bit more complex
data warehouse model than a star schema
▪ In the Snowflake Schema, the Dimension Tables are further normalized to
remove redundancy
Year
Hierarchies
Category Quarter
Sub-Category Month
Product Date
Normalized Transaction Fact Normalized
Dimension Dimension
Tables Transaction Amount Tables
Account Location
Sub-Type State
A/c Type Logical Snowflake Schema Model Country
15
Logical Data Model For Financial Services [Teradata FS-LDM]
16
Categories of Data Types
Data Types
Internal External
Semi- Semi-
Publicly
Structured Structured & Structured Structured &
Available Data
Unstructured Unstructured
• ERP Data • Data from Email • Data from Syndicators • Data from Social Media • Macro-economic
• Data in Databases • ESAT Survey Data • EDI Data • CSAT Survey Data • Weather Data
• Delimited Files • Resumes & Certificates • Data from Payment • PDF’s / Docs • Geo-spatial Data
• Excel Files • Images & Videos Gateways • Speech files, • Government Docs
• Images & Videos • Satellite Images
• Surveillance Images
17
Some Analytics are closer to Decisions than others…
Higher
Predictive
Decisions
Business Value
Actions
“What will happen?”
Diagnostic
“Why did it happen?”
Descriptive
“What has happened?”
Prescriptive
Analytics
Predictive
Analytics
Diagnostic
Analytics
Descriptive
Analytics
19
BFSI - Opportunities at the Intersection of Data & Analytics Sophistication
DANAS Framework
Virtual Assistants / Intelligent Document Processing
Cognitive Chatbots Entity Extraction, OCR Image Recognition, Tabular Data Extraction from
unstructured Documents / PDF’s
Intelligence
Analytical Sophistication
AI based Credit Algorithmic RPA for KYC /
Artificial ITSM Automation Scoring with Mortgage
Trading
Intelligence collection Traits Processing
Premium
Predictive Forecasting; Fraud Prevention Early Warning Customer Loan
Agency Attrition & AML analytics Prediction Segmentation Decomposition
Analytics
Prediction
Customer Lifetime
Descriptive Value; Fund Performance
Analytics Branch Analysis
Performance;
Predictive
Decisions
Business Value
Actions
“What will happen?”
Diagnostic
“Why did it happen?”
Descriptive
“What has happened?”
• Media sites rely on machine learning to sift through millions of options to give you
song or movie recommendations
• Retailers use it to gain insight into their customers’ purchasing behaviour
• Utility companies use it to predict machinery breakdowns and hence can prevent
outages proactively
• Telecom Companies can prevent customer churn using ML
• BFSI Industry to identify & measure Risks
23
Advanced Analytics [ML/AI] – Application Fabric
24
Advanced Analytics Projects – Are they different from other projects
1. Yes & No
2. New / Existing clients faces certain business problems from time to time, which needs to be
resolved
3. They may have all the data pertaining to the business but do not know how to analyze and derive
insights, so as to take the right decisions
4. Many a times factors which impacts the desired business outcomes cannot by found using
traditional BI (Descriptive and Diagnostic analytics) and are posed to the Data Scientists to solve
5. Data Scientists need to resort to Predictive and Prescriptive analytics techniques, which comes
under the purview of “Advanced Analytics”
6. Both diagnosis and solution to these problems can be found after crunching through all the
underlying data to find patterns, trends and associations
7. These are made possible by Machine Learning Algorithms which basically helps in
▪ Data assisted Human Decisions
▪ Data assisted machine Decisions
8. All such problems and decision-making scenarios can be either operational, tactical or strategic in
nature
25
Vertical / Horizontal Advanced Analytics Solutions
Vertical Centric Solutions
Retail, Customer Marketing
Customer Store Demand
Loyalty Mix
CPG Segmentation Segmentation Forecasting
Analytics Analytics
26
When to Use ML/AI?
▪ For example, Machine Learning is a good option if you need to handle situations
like these:
Hand-written rules and The rules of a task are The nature of data keeps
equations are too complex constantly changing changing, hence need to adapt
- as in face recognition, speech - as in fraud detection from - as in automated trading,
recognition, risk modelling transaction records energy demand forecasting, and
predicting risk trends
27
Different Data Types of Variables
CATEGORICAL DATA CONTINUOUS NUMERICAL DATA
Categorical data has values to describe entities or Continuous numerical data shows values about various
classes around us. Example: Products, Customer measures. Example: Sales, Salaries, Temperature, Stock
Segments, Regions, Months, Groups, Risk Types etc. Prices, Mark to Market values, Inventory, etc.
28
How to decide which ML/AI Algorithm to use?
1. Choosing the right algorithm can seem
overwhelming - there are dozens of
supervised and unsupervised machine
learning algorithms, and each takes a
different approach to learning
2. There is no best method or one size fits all
3. Finding the right algorithm is partly just trial
and error - even highly experienced data
scientists can’t tell whether an algorithm
will work without trying it out
4. But algorithm selection also depends on the
size and type of data you’re working with,
the insights you want to get from the data,
and how those insights will be used
29
How to decide which ML/AI Algorithm to use?
Interpretability
Note: Using larger training datasets often yield
models that generalize well for new data.
30
How to decide which ML/AI Algorithm to use?
31
Challenges faced in a Data Science Project
▪ Most machine learning challenges relate to handling the data and finding the right
model
▪ Data comes in all shapes and sizes. Real-world datasets can be messy, incomplete, and
in a variety of formats. You might just have simple numeric data. But sometimes
you’re combining several different data types, such as sensor signals, text, and
streaming images from a camera
▪ Pre-processing your data might require specialized knowledge and tools. For example,
to select features to train an object detection algorithm requires specialized
knowledge of image processing
▪ It takes time to find the best model to fit the data. Choosing the right model is a
balancing act. Highly flexible models tend to over fit data by modelling minor
variations that could be noise. On the other hand, simple models may assume too
much. There are always trade-offs between model speed, accuracy, and complexity
Problem Statement • Understand the business domain and the problem context
Definition • Frame the necessary hypotheses that fit the context
• This involves Data Munging/Wrangling – Preparing data (from its raw state) for a
Data Preparation dedicated purpose and use-case scenario
Predictive Model Building, • Apply Feature Engineering concepts, use relevant supervised or unsupervised
Training & Scoring machine Learning techniques to uncover insights and patterns
33
Iterative Analytics Problem Solving Framework
Define the
Problem diagnosis Make actionable
analysis objective and resolution recommendations
SME driven Collaborative Hypothesis driven approach Context Sensitive Performance
& Planned beginning • Use collaborative techniques /
driven approach
• Define business questions to JAD sessions to zero in the • Measure, Track & Report on
be answered & corresponding relevant hypotheses Business outcomes
business value • Validate and prioritise • Recommend tactical action
• Ensure Alignment among hypotheses with the business backed by Cost-Benefit /ROI
constituents before significant analysis
investment in analytics
Reporting
Data
Capabilities Management
Visualize Insights Determine Information Need,
& Communicate gather and wrangle data
•
•
Represent insights through rich
data visualizations
Communicate business insights
Problem diagnosis •
•
•
Identify internal & external data sources
Data Sampling, profiling & exploration
Extract and stage data
•
and business impacts to relevant and resolution Data Quality Analysis and anomaly or
outlier detection
stakeholders and get their buy-in
• Integrate with existing BI Tools for • Data cleansing, wrangling and
reporting insights Supervised & Unsupervised preparation
Modelling & Statistical Analysis • Data storage and security
• Master data conforming
• Apply use case specific statistical & • Missing value imputation
predictive modelling / algorithms • Determine Data partition strategy
• Train/Score and iterate for accuracy
improvement
34
4D+O Methodology: For Advanced Analytics Solution Delivery
Discover Design Develop Deploy
Production
Identify Exploratory Run,
Data Collection Iterative
Business Data Analysis Communicate
& Preliminary Model
Problem & & Preparation, Business
Data Quality Development
Hypothesis and Feature Insight &
Analysis & Validation
Generation Engineering Measure
Outcomes
Frame the business Collect data (internal or EDA, Data Preparation Execute Model with
problem and identify external) pertinent to & Wrangling; Build Models using Production Data;
possible alternative to the business problem; Missing Value Analytics Tools; Present the Actionable
framing; Perform data profiling Imputation; Sample Data for Insights in the form of
Generate Hypothesis and data analysis to Identify and/or create Training the Models; reports, KPIs and
and scenarios identify quality gaps or derived variables that Score/Validate Models; Dashboards using Data
pertaining to the known trends & may affect the Iterative review of Visualizations;
business case; patterns outcome; model outputs; Track & Report KPIs
Operationalize
Persist the output of the Analytical Model in
new data structures; Integrate &
Integrate with Data Integration jobs with Automate
requisite frequency to automate the analytics with existing
output; Business
Expose the Analytics Information output for Analytics
consumption by downstream BI applications
35
Data Science Project in a Nut-Shell
Model 80% of
1. Prepare to create a Model the
Work
80% of
3. Interpret the Results the
Impact
36
Various Data Analytics use cases in Risk Modelling in BFSI
Unit 4 Unit 3
37
Lorem ipsum dolor sit amet
Header 3
Thank
Lorem ipsum
Header 1
dolor sit amet Check out01
video tutorials atdolor
Lorem ipsum … Coming
sit amet soon…
Header 1
Lorem ipsum
dolor sit amet
Header 2
Lorem ipsum For any queries please write to us at
dolor sit amet
adroy@yahoo.com
Header 1
03 Lorem ipsum dolor sit amet