
Risk Analytics

Unit 2 – Data & Analytical Pre-requisites

Arup Duttaroy | Mar 2022


Agenda
▪ Data – The Foundation of Risk Management & Analytics
▪ Risk Regulations – Data & Information Requirements
▪ Need for Analytical Data for Risk & Compliance Reporting
▪ Key Challenges Faced for Risk Analytics
▪ Typical Data Management & Governance Framework/Architecture for Risk Analytics
▪ Importance of Addressing Data Quality
▪ Dimensional Model for Designing OLAP applications
▪ Financial Services Logical Data Model (FS-LDM)
▪ Five Different Data Types
▪ Six Analytical Sophistication Levels
▪ The DANAS Framework
▪ Analytical Sophistication needed for Risk Analytics
▪ Six Analytical Sophistication Levels
▪ Why Advanced Analytics?
▪ ML/AI – Application Fabric
▪ When to use ML/AI?
▪ Machine Learning 101
▪ Getting Started with a Data Science Project
▪ 4D+O Methodology for Data Science Projects



Topic 2.1 – Data: The Foundation of Risk Management & Analytics
Data – The Foundation of Risk Management & Analytics
▪ Obtaining and storing data is crucial, but it is only the first step

▪ To facilitate effective risk decisions, data must be turned into the Right Information and delivered
to the Right People in an understandable format at the Right Time via the Right Channel. [4R’s]

▪ As banks face increasing regulatory scrutiny, risk managers and senior bankers must adapt their
current Risk Management practices to meet the evolving regulations – such as:
▪ Basel II and III,
▪ Financial Reporting (FINREP)
▪ Common Reporting (COREP)
▪ International Financial Reporting Standards (IFRS) 7 and 9
▪ Dodd-Frank
▪ Comprehensive Capital Analysis and Review (CCAR)
▪ The European Central Bank’s (ECB) Asset Quality Review (AQR) and the stress tests
▪ Data became particularly relevant when the Basel Committee on Banking Supervision (BCBS)
published a document, Principles for Effective Risk Data Aggregation and Risk Reporting (BCBS
239) in 2013
Risk Regulations Data & Information Requirements

Analytical Data is needed for Regulatory & Business Risk Reporting
▪ Analytical data is the data a bank uses in its regulatory and business risk reporting,
which is the subject of the BCBS principles
▪ Analytical data is different from the operational or transactional data that banks
traditionally use, in that it:
▪ Is sourced from different types of systems – finance, assets, capital modeling, and risk systems – many of
which are highly specialized and desktop-based. These systems in turn rely on data from core
administration/IT systems
▪ Requires a high degree of granularity to support multi-dimensional reporting (e.g., interactive dashboards)
▪ Often must be aggregated and consolidated
▪ Must be readily available, accurate, and comprehensive (covering all risks) to support monthly, quarterly,
and annual reporting cycles – as well as ad hoc and real-time analyses
▪ Is used primarily in regulatory, business, and financial reporting and to support risk and capital decision-
making
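
As a minimal sketch of what "granular, but aggregated and consolidated" can mean in practice, the Python snippet below rolls up record-level exposures along any chosen set of reporting dimensions. The field names (entity, risk_type, region, exposure) and all figures are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical granular exposure records (names and numbers are illustrative only).
records = [
    {"entity": "Bank A", "risk_type": "credit", "region": "EU", "exposure": 120.0},
    {"entity": "Bank A", "risk_type": "market", "region": "EU", "exposure": 45.5},
    {"entity": "Bank B", "risk_type": "credit", "region": "US", "exposure": 80.0},
    {"entity": "Bank B", "risk_type": "credit", "region": "EU", "exposure": 30.0},
]

def aggregate(rows, dimensions, measure="exposure"):
    """Sum `measure` over every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# The same granular data supports several reporting views.
by_risk_type = aggregate(records, ["risk_type"])
by_entity_region = aggregate(records, ["entity", "region"])
```

Because the data is kept at record level, a new reporting view only needs a different list of dimensions, not a new extract.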

Main Categories of Analytical Data for Risk Analytics

Key Challenges Faced regarding Data for Risk Analytics
1. Massive amounts of data - Banks have large amounts of different types of data stored
in multiple, siloed systems based on entities, lines of businesses, risk types, etc. Many
of these systems are old and antiquated, with no standardized data models or data
sets. After identifying what systems contain analytical data, extracting, standardizing,
and consolidating it is the next challenge

2. Lack of a common data model and standards - Few banks today have an enterprise-
wide analytical data model [FS-LDM], which makes standardizing data sets and
aggregating data difficult

3. Reliance on manual data processes - Many banks still employ large numbers of
employees to review and validate data, as well as find “gaps.” This manual approach is
slow, costly, and almost impossible to analyse and audit

4. Low quality data and audit trails - Most banks struggle with the poor quality of data
held in their core systems due to input errors, unchecked changes, and the age of the
data. This limitation is compounded by the fact that banks often have multiple loan,
credit card, asset, administration, and finance systems with no common data (or
metadata) models. The lack of sound audit trails and data lineage is an issue, particularly
with legacy systems and manual processes

5. Structured and unstructured data - A significant amount of data within a bank is still
unstructured (i.e., information that does not have predefined relationships), such as
data within portfolios or derivatives, and is not stored in existing centralized databases.
This data takes the form of documents, PDFs, images, media files and other
proprietary formats, which are not amenable to easy processing to derive information.
Aggregating unstructured data and combining it with structured data is the key
challenge
6. Accurate counterparty data - A particular problem for banks is getting the deep,
accurate, and granular counterparty data essential for credit risk modelling –
classification, jurisdiction, entity type, etc. Complexity increases with guarantors and
insurers between layers. Counterparty data can also come from multiple sources

7. Regulatory compliance - A plethora of regulatory initiatives focus on accurate and
correct data at the right level of granularity with full audit trails. Banks not only need
governance frameworks, but also IT platforms that actually deliver and aid compliance

A Typical Data Management & Governance Framework for RA

Addressing Data Quality
▪ A key element in the management of data is improving the quality of raw data held in
the source and modelling systems. Within the data quality process, there are two
factors to consider:
▪ First, data quality should not be regarded as a one-off process. New data is always emerging, so there has to be a process to
monitor data quality on an ongoing basis. The documentation of this process should be automated as much as possible
▪ Second, data continues to diversify. Perhaps one of the most interesting recent developments is the ability to enrich validated
data with extra data from external sources, such as the sociographic or demographic data of policyholders or buying habits
from supermarket chains. This widening of data types is particularly useful for enhancing the single view of a customer. It does,
however, add a layer of complexity into the system

Detailed Process For Improving Data Quality
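
One way to treat data quality as an ongoing, automated process is to run the same rule checks on every data load and log the counts. The sketch below assumes three illustrative rules (duplicate keys, missing fields, negative balances) and a hypothetical loan record layout; real rule sets would be far richer.

```python
# Hypothetical loan records; None or an empty string marks a missing value.
loans = [
    {"loan_id": "L1", "balance": 1000.0, "country": "IN"},
    {"loan_id": "L2", "balance": None, "country": "IN"},
    {"loan_id": "L3", "balance": -50.0, "country": ""},
    {"loan_id": "L1", "balance": 200.0, "country": "US"},  # duplicate key
]

def quality_report(rows):
    """Count rule violations so the result can be logged on every load."""
    seen = set()
    duplicates = missing = negative = 0
    for row in rows:
        if row["loan_id"] in seen:
            duplicates += 1
        seen.add(row["loan_id"])
        if row["balance"] is None or not row["country"]:
            missing += 1
        if row["balance"] is not None and row["balance"] < 0:
            negative += 1
    return {
        "rows": len(rows),
        "duplicates": duplicates,
        "missing": missing,
        "negative_balance": negative,
    }

report = quality_report(loans)
```

Storing each run's report alongside a load timestamp would give the documented, repeatable trail the slide asks for.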


Dimensional Model for Risk Analytics
▪ A Dimensional Model is used to design the data warehouse schema to store data
that you need for Online Analytical Processing (OLAP) Applications

▪ A schema is a collection of database objects, including tables, views, indexes, and
synonyms

▪ Dimensional Model supports two kinds of Schema – Star & Snowflake Schema

Star Schema for Risk Analytics
▪ The Star Schema is the simplest data warehouse schema. It consists of a
central Fact Table surrounded by two or more Dimension Tables
▪ The center of the star consists of one or more Fact Tables and the points of the
star are the Dimension Tables

[Figure: Logical Star Schema Model and Physical Star Schema Model. A central Transaction Fact table (measure: Transaction Amount) is surrounded by four Dimension Tables: Product, Date, Account and Location]
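
The star layout can be sketched with Python's built-in sqlite3 module. The table and column names below (fact_transaction, dim_product, dim_location) and the sample rows are assumptions made for illustration; only two of the four dimensions are modelled to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_transaction (
    product_id  INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    transaction_amount REAL
);
INSERT INTO dim_product  VALUES (1, 'Home Loan'), (2, 'Credit Card');
INSERT INTO dim_location VALUES (1, 'Mumbai'), (2, 'Pune');
INSERT INTO fact_transaction VALUES (1, 1, 500.0), (1, 2, 250.0), (2, 1, 75.0);
""")

# A typical OLAP-style question: total transaction amount per product.
totals = conn.execute("""
    SELECT p.product_name, SUM(f.transaction_amount)
    FROM fact_transaction AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Each analytical query needs only one join per dimension it slices by, which is the main practical appeal of the star layout.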

Snowflake Schema for Risk Analytics
▪ The Snowflake Schema is a variant of the Star Schema and a slightly more complex
data warehouse model
▪ In the Snowflake Schema, the Dimension Tables are further normalized to
remove redundancy
[Figure: Logical Snowflake Schema Model. A central Transaction Fact table (measure: Transaction Amount) is surrounded by normalized Dimension Tables, each with its own hierarchy:
Product → Sub-Category → Category
Date → Month → Quarter → Year
Account → Sub-Type → A/c Type
Location → State → Country]
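
A hedged sketch of the same idea in sqlite3: the Product dimension is normalized into Product → Sub-Category → Category tables, so rolling up to a higher level of the hierarchy takes one extra join per level. All table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_sub_category (
    sub_category_id INTEGER PRIMARY KEY, sub_category_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    sub_category_id INTEGER REFERENCES dim_sub_category(sub_category_id));
CREATE TABLE fact_transaction (
    product_id INTEGER REFERENCES dim_product(product_id),
    transaction_amount REAL);
INSERT INTO dim_category     VALUES (1, 'Lending');
INSERT INTO dim_sub_category VALUES (1, 'Secured', 1), (2, 'Unsecured', 1);
INSERT INTO dim_product      VALUES (1, 'Home Loan', 1), (2, 'Personal Loan', 2);
INSERT INTO fact_transaction VALUES (1, 500.0), (2, 120.0), (2, 30.0);
""")

# Roll up one level of the hierarchy: product -> sub-category.
by_sub_category = conn.execute("""
    SELECT s.sub_category_name, SUM(f.transaction_amount)
    FROM fact_transaction AS f
    JOIN dim_product      AS p ON p.product_id = f.product_id
    JOIN dim_sub_category AS s ON s.sub_category_id = p.sub_category_id
    GROUP BY s.sub_category_name
    ORDER BY s.sub_category_name
""").fetchall()

# Roll up two levels: product -> sub-category -> category.
by_category = conn.execute("""
    SELECT c.category_name, SUM(f.transaction_amount)
    FROM fact_transaction AS f
    JOIN dim_product      AS p ON p.product_id = f.product_id
    JOIN dim_sub_category AS s ON s.sub_category_id = p.sub_category_id
    JOIN dim_category     AS c ON c.category_id = s.category_id
    GROUP BY c.category_name
""").fetchall()
```

The category name is stored once rather than on every product row, which removes redundancy at the cost of longer join chains.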

Logical Data Model For Financial Services [Teradata FS-LDM]

Categories of Data Types

Data Types

▪ Internal
  ▪ Structured
    • ERP Data
    • Data in Databases
    • Delimited Files
    • Excel Files
  ▪ Semi-Structured & Unstructured
    • Data from Email
    • ESAT Survey Data
    • Resumes & Certificates
    • Images & Videos
▪ External
  ▪ Structured
    • Data from Syndicators
    • EDI Data
    • Data from Payment Gateways
  ▪ Semi-Structured & Unstructured
    • Data from Social Media
    • CSAT Survey Data
    • PDFs / Docs
    • Speech files
    • Images & Videos
    • Surveillance Images
  ▪ Publicly Available Data
    • Macro-economic
    • Weather Data
    • Geo-spatial Data
    • Government Docs
    • Satellite Images
Some Analytics are closer to Decisions than others…
[Figure: analytics sophistication ladder. Horizontal axis: Complexity of Analysis (Lower to Higher), moving from Data through Information, Knowledge and Analytical Insights to Recommended Actions. Vertical axis: Business Value (Lower to Higher). Other labels on the chart: Analysis, Human Interventions, Decisions, Actions, Decision Support, Decision Automation]
▪ Descriptive: “What has happened?”
▪ Diagnostic: “Why did it happen?”
▪ Predictive: “What will happen?”
▪ Prescriptive: “How do I ensure what would happen would happen & What Should I Do then?”

Source: Gartner 2014


Opportunities at the Intersection of Data & Analytics Sophistication
DANAS Framework
[Figure: DANAS Framework grid]
▪ Vertical axis – Analytical Sophistication (lowest to highest): Descriptive Analytics, Diagnostic Analytics, Predictive Analytics, Prescriptive Analytics, Artificial Intelligence, Cognitive Intelligence
▪ Horizontal axis – Data Types or Sources: Internal Structured Data, Internal Unstructured Data, External Structured Data, External Unstructured Data, Macro / Industry / Weather / Geo-spatial Data
BFSI - Opportunities at the Intersection of Data & Analytics Sophistication
DANAS Framework
[Figure: DANAS Framework grid populated with BFSI use cases. Rows are levels of Analytical Sophistication; columns are Data Types or Sources (Internal Structured Data, Internal Unstructured Data, External Structured Data, External Unstructured Data, Macro / Industry / Weather / Geo-spatial Data). Use cases by row, from highest to lowest sophistication:]
▪ Cognitive Intelligence: Virtual Assistants / Chatbots; Intelligent Document Processing (Entity Extraction, OCR Image Recognition, Tabular Data Extraction from unstructured Documents / PDFs)
▪ Artificial Intelligence: ITSM Automation; AI based Credit Scoring with collection Traits; Algorithmic Trading; RPA for KYC / Mortgage Processing
▪ Prescriptive Analytics: Initial Credit Limit / Collection Optimization; Automated Contract Responses; ML based A/B scoring for new Products; Geo sensitive promotional offers
▪ Predictive Analytics: Premium Forecasting; Agency Attrition Prediction; Fraud Prevention & AML analytics; Early Warning Prediction; Customer Segmentation; Loan Decomposition
▪ Diagnostic Analytics: Employee Satisfaction Survey Analytics; Payments Straight Through processing; Regulatory Reporting; Voice of Customer Analytics
▪ Descriptive Analytics: Customer Lifetime Value; Branch Performance; Fund Performance Analysis
Topic 2.2 – Analytical Sophistication needed for Risk Analytics
Some Analytics are closer to Decisions than others…
[Figure: analytics sophistication ladder. Horizontal axis: Complexity of Analysis (Lower to Higher), moving from Data through Information, Knowledge and Analytical Insights to Recommended Actions. Vertical axis: Business Value (Lower to Higher). Other labels on the chart: Analysis, Human Interventions, Decisions, Actions, Decision Support, Decision Automation]
▪ Descriptive: “What has happened?”
▪ Diagnostic: “Why did it happen?”
▪ Predictive: “What will happen?”
▪ Prescriptive: “How do I ensure what would happen would happen & What Should I Do then?”

Source: Gartner 2014


Advanced Analytics [ML/AI] – Application Fabric
Machine Learning (ML) algorithms find natural patterns in
data that generate insight and help you make better
decisions and predictions. They are used every day to make
critical decisions in medical diagnosis, stock trading, energy
load forecasting, credit risk scoring, customer attrition and
more.

• Media sites rely on machine learning to sift through millions of options to give you
song or movie recommendations
• Retailers use it to gain insight into their customers’ purchasing behaviour
• Utility companies use it to predict machinery breakdowns so that they can prevent
outages proactively
• Telecom companies use it to predict and reduce customer churn
• The BFSI industry uses it to identify and measure risks
Advanced Analytics Projects – Are they different from other projects?
1. Yes & No
2. New and existing clients face certain business problems from time to time, which need to be
resolved
3. They may have all the data pertaining to the business but do not know how to analyze it and derive
insights, so as to take the right decisions
4. Many a time, the factors which impact the desired business outcomes cannot be found using
traditional BI (Descriptive and Diagnostic analytics) and are posed to the Data Scientists to solve
5. Data Scientists need to resort to Predictive and Prescriptive analytics techniques, which come
under the purview of “Advanced Analytics”
6. Both the diagnosis of and the solution to these problems can be found after crunching through all the
underlying data to find patterns, trends and associations
7. These are made possible by Machine Learning Algorithms, which basically help in
▪ Data assisted Human Decisions
▪ Data assisted machine Decisions
8. All such problems and decision-making scenarios can be either operational, tactical or strategic in
nature

Vertical / Horizontal Advanced Analytics Solutions
Vertical Centric Solutions
▪ Retail, CPG: Customer Segmentation; Store Segmentation; Customer Loyalty Analytics; Marketing Mix Analytics; Demand Forecasting
▪ Life Sciences: Clinical Analytics; Drug Treatment Effectiveness; Payer and provider analytics; Patient Segmentation
▪ Media: Audience Targeting and Segmentation; Audience Monetization; Recommendation Engine
▪ BFSI: Customer 360; Cross-Sell / Up-Sell; Algorithmic Trading; Risk Forecasting & Analytics

Horizontal Function Centric Solutions
▪ Sales & Marketing: Segmentation, Marketing Mix Optimization, Competitor Analysis, Channel Analysis, Sales Performance Analysis, Campaign Analysis, Sales Forecasting, A/B Testing, Affinity Analysis
▪ Customer Analytics: Loyalty Analytics, Customer Life Time Value, Propensity Analytics, Churn Analytics, Customer Segmentation, Up-sell / Cross sell
▪ Risk Analytics: Credit Risk Analytics, Anomaly Detection, Social Media Analytics, Sentiment Analytics, Affinity Analysis, Fraud Analytics
When to Use ML/AI?

▪ Consider using Machine Learning when
• You have a complex task or problem involving a large amount of data and
• Lots of variables, but no existing formula or equation

▪ For example, Machine Learning is a good option if you need to handle situations
like these:
• Hand-written rules and equations are too complex - as in face recognition, speech
recognition, risk modelling
• The rules of a task are constantly changing - as in fraud detection from
transaction records
• The nature of data keeps changing, hence need to adapt - as in automated trading,
energy demand forecasting, and predicting risk trends
Different Data Types of Variables
CATEGORICAL DATA
Categorical data has values to describe entities or classes around us. Example: Products, Customer
Segments, Regions, Months, Groups, Risk Types etc.

CONTINUOUS NUMERICAL DATA
Continuous numerical data shows values about various measures. Example: Sales, Salaries, Temperature,
Stock Prices, Mark to Market values, Inventory, etc.
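
The distinction matters because the two kinds of variable are usually prepared differently before modelling. As an illustrative sketch (the variable names and values are made up), a categorical variable can be one-hot encoded while a continuous one can be standardized:

```python
from statistics import mean, pstdev

# Hypothetical observations: one categorical and one continuous variable.
segments = ["Retail", "Corporate", "Retail", "SME"]
balances = [100.0, 300.0, 200.0, 400.0]

def one_hot(values):
    """Encode each category as a 0/1 indicator vector (columns sorted by name)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

categories, encoded = one_hot(segments)
scaled = standardize(balances)
```

One-hot encoding and standardization are only two common choices; which preparation is appropriate depends on the algorithm used afterwards.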

How to decide which ML/AI Algorithm to use?
1. Choosing the right algorithm can seem
overwhelming - there are dozens of
supervised and unsupervised machine
learning algorithms, and each takes a
different approach to learning
2. There is no best method or one size fits all
3. Finding the right algorithm is partly just trial
and error - even highly experienced data
scientists can’t tell whether an algorithm
will work without trying it out
4. But algorithm selection also depends on the
size and type of data you’re working with,
the insights you want to get from the data,
and how those insights will be used

▪ It’s also a trade-off between specific characteristics of the algorithms, such as:
• Speed of training
• Memory usage
• Predictive accuracy on new data
• Transparency or interpretability (how easily you can understand the reasons an
algorithm makes its predictions)

Note: Using larger training datasets often yields models that generalize well to new data.


▪ Working on a Machine Learning assignment means going up
the Analytics Value Chain…
▪ With Machine Learning assignments there’s rarely a straight
line from start to finish. You’ll have to constantly iterate
and try different ideas and approaches
▪ Every machine learning workflow begins with three
questions:
• What kind of data are you working with?
• What insights do you want to get from it?
• How and where will those insights be applied?
▪ Your answers to these questions help you decide your
approach and methodology

Challenges faced in a Data Science Project
▪ Most machine learning challenges relate to handling the data and finding the right
model
▪ Data comes in all shapes and sizes. Real-world datasets can be messy, incomplete, and
in a variety of formats. You might just have simple numeric data. But sometimes
you’re combining several different data types, such as sensor signals, text, and
streaming images from a camera
▪ Pre-processing your data might require specialized knowledge and tools. For example,
selecting features to train an object detection algorithm requires specialized
knowledge of image processing

▪ It takes time to find the best model to fit the data. Choosing the right model is a
balancing act. Highly flexible models tend to overfit data by modelling minor
variations that could be noise. On the other hand, simple models may assume too
much. There are always trade-offs between model speed, accuracy, and complexity
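
To make the flexibility trade-off concrete, here is a small sketch on synthetic data. It assumes a made-up linear relationship (y = 2x plus Gaussian noise) and compares a simple least-squares line with a 1-nearest-neighbour model that memorises the training set; both model choices are illustrative, not a recommendation.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples of the hypothetical relationship y = 2x + noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 2)) for x in xs]

train, test = make_data(30), make_data(30)

def fit_line(data):
    """Ordinary least squares for y = a*x + b (the simple model)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    a = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    return lambda x: a * x + (my - a * mx)

def fit_nearest(data):
    """1-nearest-neighbour: memorises every training point (the flexible model)."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of `model` on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

line, nearest = fit_line(train), fit_nearest(train)
errors = {
    "line": (mse(line, train), mse(line, test)),          # (train, test)
    "nearest": (mse(nearest, train), mse(nearest, test)),  # (train, test)
}
```

The nearest-neighbour model reproduces its training points exactly, so its training error is zero while its error on unseen data is not. A large gap between training and test error is the usual symptom of over-fitting.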

Sounds Daunting? A systematic Workflow will help


Data Science Project Typical Workflow

Problem Statement Definition
• Understand the business domain and the problem context
• Frame the necessary hypotheses that fit the context

Data Acquisition
• Understand what data is required for the problem defined
• Extract the requisite data from various data sources

Data Preparation
• This involves Data Munging/Wrangling – preparing data (from its raw state) for a
dedicated purpose and use-case scenario

Exploratory Data Analysis
• This involves missing data analysis, outlier detection, factor analysis, and
understanding different data patterns

Predictive Model Building, Training & Scoring
• Apply Feature Engineering concepts, use relevant supervised or unsupervised
Machine Learning techniques to uncover insights and patterns

Insight Communication
• Communicate predicted outcomes with associated probabilities
• Use appropriate data visualizations to communicate results and insights in a
meaningful format to business users
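
Two of the steps named in this workflow, missing data handling and outlier detection, can be sketched with the standard library alone. The income figures are invented, and median imputation and Tukey's 1.5 × IQR rule are just one common pair of choices among many.

```python
from statistics import median, quantiles

# Hypothetical monthly incomes; None marks a missing observation.
incomes = [42.0, 45.0, None, 47.0, 44.0, 300.0, 46.0, None, 43.0]

# Missing data analysis: impute with the median, which is robust to extremes.
observed = [v for v in incomes if v is not None]
fill = median(observed)
imputed = [fill if v is None else v for v in incomes]

# Outlier detection with Tukey's rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(imputed, n=4)
iqr = q3 - q1
outliers = [v for v in imputed if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
```

Whether a flagged value is an error or a genuine extreme is a business question, so flagged points are normally reviewed rather than dropped automatically.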

Iterative Analytics Problem Solving Framework
[Figure: activities arranged around a central "Problem diagnosis and resolution" label]

Define the analysis objective: SME driven Collaborative & Planned beginning
• Define business questions to be answered & corresponding business value
• Ensure Alignment among constituents before significant investment in analytics

Hypothesis driven approach
• Use collaborative techniques / JAD sessions to zero in on the relevant hypotheses
• Validate and prioritise hypotheses with the business

Data Management: Determine Information Need, gather and wrangle data
• Identify internal & external data sources
• Data Sampling, profiling & exploration
• Extract and stage data
• Data Quality Analysis and anomaly or outlier detection
• Data cleansing, wrangling and preparation
• Data storage and security
• Master data conforming
• Missing value imputation
• Determine Data partition strategy

Supervised & Unsupervised Modelling & Statistical Analysis
• Apply use case specific statistical & predictive modelling / algorithms
• Train/Score and iterate for accuracy improvement

Reporting Capabilities: Visualize Insights & Communicate
• Represent insights through rich data visualizations
• Communicate business insights and business impacts to relevant stakeholders and get their buy-in
• Integrate with existing BI Tools for reporting insights

Make actionable recommendations: Context Sensitive Performance driven approach
• Measure, Track & Report on Business outcomes
• Recommend tactical action backed by Cost-Benefit / ROI analysis

4D+O Methodology: For Advanced Analytics Solution Delivery
Phases: Discover, Design, Develop, Deploy, plus Operationalize

Activities from Discover through Deploy, in sequence:
1. Identify Business Problem & Hypothesis Generation
▪ Frame the business problem and identify possible alternative to framing
▪ Generate Hypothesis and scenarios pertaining to the business case
2. Data Collection & Preliminary Data Quality Analysis
▪ Collect data (internal or external) pertinent to the business problem
▪ Perform data profiling and data analysis to identify quality gaps or known trends & patterns
3. Exploratory Data Analysis & Preparation, and Feature Engineering
▪ EDA, Data Preparation & Wrangling
▪ Missing Value Imputation
▪ Identify and/or create derived variables that may affect the outcome
4. Iterative Model Development & Validation
▪ Build Models using Analytics Tools
▪ Sample Data for Training the Models
▪ Score/Validate Models
▪ Iterative review of model outputs
5. Production Run, Communicate Business Insight & Measure Outcomes
▪ Execute Model with Production Data
▪ Present the Actionable Insights in the form of reports, KPIs and Dashboards using Data Visualizations
▪ Track & Report KPIs

Operationalize: Integrate & Automate with existing Business Analytics
▪ Persist the output of the Analytical Model in new data structures
▪ Integrate with Data Integration jobs with requisite frequency to automate the analytics output
▪ Expose the Analytics Information output for consumption by downstream BI applications
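
The Operationalize step can be sketched as a repeatable batch job that persists model output in its own table, where downstream BI tools can query it. Everything named here is hypothetical: the score function is a made-up stand-in for a trained model, and account_risk_score is an invented table name.

```python
import sqlite3

def score(balance, days_past_due):
    """Stand-in for a trained model: a made-up rule returning a value in [0, 1]."""
    return min(1.0, 0.01 * days_past_due + (0.2 if balance > 1000 else 0.0))

# Hypothetical accounts: (account_id, balance, days_past_due).
accounts = [("A1", 500.0, 0), ("A2", 2500.0, 30), ("A3", 1200.0, 90)]

conn = sqlite3.connect(":memory:")  # a warehouse connection in practice
conn.execute(
    "CREATE TABLE account_risk_score (account_id TEXT PRIMARY KEY, score REAL)"
)

def run_scoring_job(connection, rows):
    """One batch run: score every account and upsert the result."""
    connection.executemany(
        "INSERT OR REPLACE INTO account_risk_score VALUES (?, ?)",
        [(acc, score(bal, dpd)) for acc, bal, dpd in rows],
    )
    connection.commit()

run_scoring_job(conn, accounts)
run_scoring_job(conn, accounts)  # re-running the job must not duplicate rows

# A downstream BI application would read the persisted output like this.
high_risk = conn.execute(
    "SELECT account_id FROM account_risk_score WHERE score >= 0.4 ORDER BY account_id"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM account_risk_score").fetchone()[0]
```

Making the job idempotent (here via INSERT OR REPLACE on the primary key) lets a scheduler rerun it at the requisite frequency without manual clean-up.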

Data Science Project in a Nut-Shell

1. Prepare to create a Model (80% of the Work)
2. Train & Score a Model
3. Interpret the Results (80% of the Impact)

Various Data Analytics use cases in Risk Modelling in BFSI

Unit 4 Unit 3

Thank You

Check out video tutorials at … Coming soon…

For any queries please write to us at adroy@yahoo.com