Impairment Modelling Using R v1.0
INTERNAL TO EMERIO AND STRICTLY CONFIDENTIAL: PLEASE DO NOT DISTRIBUTE, COPY OR DISSEMINATE
R and .NET Architecture Diagram
1/11/2019 1
Impairment - Forward Looking Process
• Macro Economic – Single Variable Selection
• Variable Clustering Process
• Regression Model
• Out Of Sample Tests
• In Sample Tests
Step 1 : Macro Economic Modelling
1.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
1.8 Save result to DB & R file
• Variables selected to DB
• Transformed MEV & Dep. Variable to R file
Step 2 : Variable Clustering Process Flow
• Save the rotated data (eigenvector * centered & scaled DATA)
• Save the factor loadings from oblique rotation
• Repeat from PCA step until end for each cluster
1 – Variation explained : 1st eigenvalue, Total variation : Sum(all eigenvalues), Proportion explained : Variation explained / Total variation
2 – r2.own : square of correlation between each column of DATA & the rotated data, r2.next : initially all 0, otherwise the squared correlation between each column of DATA & the rotated data of the next nearest cluster, r2.ratio : (1 - r2.own) / (1 - r2.next)
3 – If abs(fac1) - abs(fac2) >= 0 then 1st cluster, else 2nd cluster
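The three footnotes above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the deck's script: the toy columns, the variable names, and the use of `promax` as the oblique rotation are all assumptions, since the slides do not name the rotation used.

```r
# Toy data: two groups of correlated columns (illustrative only)
set.seed(42)
f1 <- rnorm(200)
f2 <- rnorm(200)
DATA <- cbind(a1 = f1 + 0.2 * rnorm(200),
              a2 = f1 + 0.2 * rnorm(200),
              a3 = f1 + 0.2 * rnorm(200),
              b1 = f2 + 0.2 * rnorm(200),
              b2 = f2 + 0.2 * rnorm(200))

Z   <- scale(DATA)         # centered & scaled DATA
eig <- eigen(cor(DATA))    # PCA on the correlation matrix

# Footnote 1: proportion of variation explained by the 1st eigenvalue
prop_explained <- eig$values[1] / sum(eig$values)

# Rotated data: 1st eigenvector * centered & scaled DATA
rotated <- Z %*% eig$vectors[, 1]

# Footnote 2: r2.own, with r2.next starting at 0
r2_own   <- as.vector(cor(Z, rotated))^2
r2_next  <- rep(0, ncol(DATA))
r2_ratio <- (1 - r2_own) / (1 - r2_next)

# Footnote 3: loadings of the first two components after an oblique
# rotation (promax is an ASSUMPTION), then assign each column to a cluster
load2 <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))
rownames(load2) <- colnames(DATA)
fac <- unclass(promax(load2)$loadings)
cluster <- ifelse(abs(fac[, 1]) - abs(fac[, 2]) >= 0, 1L, 2L)
```

With data this cleanly separated, the `a*` columns should land in one cluster and the `b*` columns in the other; real MEV data will be less clear-cut, which is why the process repeats the PCA step per cluster.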
Step 3 : Regression Model Process
3.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
3.2 Variable Selection & get Transformed MEV
• Get Variable Selection [VW_TMP_IFRS_MODEL_FILTER]
• Load R file & get transformed MEV, Dep. variable
3.3 Generate data frame for linear model
• Get the In-sample Period & MEV Variables (either Cluster or Combination) and filter data
3.6 Process Variable Combination
• All possible variable combination of 2 & 3 vars.
• Check variables in same MEV group
3.7 Filter Models
• Statistical test, VIF, Significant Coeff., Adj. R2
• Coefficient sign with economic trend
3.8 Save result to DB & R file
• Models filtered, Full & In-Sample Data etc. to R file
• Save model result to DB & flat file
Step 4 : Out Of Sample Test Process
4.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
4.2 Models filtered & get full data
• Load R file & get Models passed, full transformed MEV, Dep. Variable etc.
4.3 Generate Out of Sample data frame
• Filter from full transformed MEV data based on Out of sample period
4.6 Calculate MSE & MAPE
• Calculate MSE & MAPE for each model for both Logit var. & actual dep. var
4.7 Top N Models
• Filter top N model based on MSE/MAPE of actual dep. var
4.8 Save result to DB
• Save model out of sample result to DB
Step 5 : In Sample Test Process
5.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
5.2 Models filtered & get full data
• Load R file & get Models passed, full transformed MEV, Dep. Variable etc.
5.3 Generate In Sample data frame
• Filter from full transformed MEV data based on In-sample period
5.6 Calculate MSE & MAPE
• Calculate MSE & MAPE for each model for both Logit var. & actual dep. var
5.7 Top N Models
• Filter top N model based on MSE/MAPE of actual dep. var
5.8 Save result to DB
• Save model in-sample result to DB
R Scripts Setup
• Identify the base/root folder (or directory) [e.g. C:\IFRS9 or C:\Users\Emerio or
/home/oracle]
MACRO ECONOMIC MODELLING – SINGLE VARIABLE SELECTION PROCESS
Step 1.1 : Read Config & Setup
• Parameter file “IFRS9_Rscripts_param.ini”
• Function : get_ifrs9_rscripts_param [get_ifrs9_rscripts_param.R]
1. Read the file
2. Convert “data.frame” to “data.table” & define ‘Key’
3. Convert Key & Value to character
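The three steps above can be sketched as follows. This is only an illustration under assumptions: the layout of “IFRS9_Rscripts_param.ini” is not shown in the slides, so a one `key=value` pair per line layout is assumed, the function name `get_params_sketch` and the sample keys are hypothetical, and base R row names stand in for the data.table key so that the sketch needs no extra package.

```r
# Hypothetical parameter file content: one key=value pair per line
param_file <- tempfile(fileext = ".ini")
writeLines(c("db_host=localhost", "model_id=101", "verbose=1"), param_file)

get_params_sketch <- function(path) {
  # 1. Read the file into a data.frame
  params <- read.table(path, sep = "=", header = FALSE,
                       col.names = c("Key", "Value"),
                       stringsAsFactors = FALSE, strip.white = TRUE)
  # 3. Convert Key & Value to character
  params$Key   <- as.character(params$Key)
  params$Value <- as.character(params$Value)
  # 2. Base R stand-in for a keyed data.table: index the rows by Key
  rownames(params) <- params$Key
  params
}

params <- get_params_sketch(param_file)
params["model_id", "Value"]
```

Keeping every value as character and converting at the point of use avoids surprises when a parameter column mixes numbers and text.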
• Command line arguments for other model parameters such as DB connection, Logit, Verbose etc.
• Function : get_cmdline_arg [get_cmdline_arg.R]
1. Get the argument value (based on the argno)
2. Validate that it is within the range of allowed values
3. Otherwise return the default value
• Function : connect_to_DB [connect_to_DB.R] – Using RJDBC package
– Specify the driver, connection string, username and password
• Model parameters : Query the table TMP_IFRS_MACRO_ECONOMIC_MODEL for the model id
• Model parameters include the cut-off dates (Out of sample, In-Sample), Adj. R2, VIF, p-value, eigenvalue, number of clusters etc.
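The get/validate/default logic described for get_cmdline_arg might look like the sketch here. The real function's signature is not shown in the slides, so the name `get_cmdline_arg_sketch` and its parameters (`args`, `argno`, `allowed`, `default`) are hypothetical.

```r
get_cmdline_arg_sketch <- function(args, argno, allowed, default) {
  # 1. Get the argument value at position argno (if it was supplied)
  if (length(args) >= argno) {
    value <- args[argno]
    # 2. Accept it only if it is within the allowed values
    if (value %in% allowed) return(value)
  }
  # 3. Otherwise return the default value
  default
}

# In a script the arguments would come from commandArgs(trailingOnly = TRUE);
# fixed vectors are used here so the example is reproducible.
logit_flag   <- get_cmdline_arg_sketch(c("DB", "Y"), 2, c("Y", "N"), "N")
verbose_flag <- get_cmdline_arg_sketch(c("DB", "Y"), 3, c("Y", "N"), "N")
bad_flag     <- get_cmdline_arg_sketch(c("DB", "maybe"), 2, c("Y", "N"), "N")
```

Here `logit_flag` takes the supplied value, while `verbose_flag` (argument missing) and `bad_flag` (value not allowed) both fall back to the default.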
Step 1.2 : MEV Transformation
• Output of MEV transformation to be saved in the file ‘Bank_macro_trans_output_<<model_id>>.csv’
• Function : do_MEV_Transformation [do_macro_Trans.R]
– Raw MEV input is FILE or DB
– Check MEV data type
– Time Series freq & start
– Save transformed MEV to output file
• Get the data from DB (DB query based on the variable type, e.g. ODR, PD or LGD)
• Convert the data to ts & xts (extensible time series) [the xts package provides more functions for handling time series data, such as filtering and conversion]
– 2. Use the mean and stddev of the data until cutoff_date (e.g. Dec2016) to normalize the remaining data
– 3. Bind the above 2 data sets to form the full data [both In-sample & Out of sample]
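A minimal base R sketch of the normalization in items 2 and 3 follows. The series, dates and variable names are made up, and it assumes that the data up to the cutoff date is normalized with its own mean and stddev (the corresponding first item is not present in the text above).

```r
# Hypothetical monthly transformed MEV series
dates  <- seq(as.Date("2016-07-01"), by = "month", length.out = 12)
mev    <- c(2.1, 2.4, 2.0, 2.6, 2.3, 2.2, 2.9, 3.1, 2.8, 3.0, 3.3, 3.2)
cutoff <- as.Date("2016-12-01")

in_sample <- dates <= cutoff
mu    <- mean(mev[in_sample])   # mean of the data until the cutoff date
sigma <- sd(mev[in_sample])     # stddev of the data until the cutoff date

# ASSUMED first step: normalize the data until the cutoff date
norm_in  <- (mev[in_sample] - mu) / sigma
# Item 2: normalize the remaining data with the SAME mean and stddev
norm_out <- (mev[!in_sample] - mu) / sigma

# Item 3: bind both parts to form the full data
norm_full <- data.frame(date = dates, value = c(norm_in, norm_out))
```

Reusing the in-sample mean and stddev keeps out-of-sample information from leaking into the scaling, so the later back test stays a genuine hold-out check.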
Step 1.4 : MEV Processing
• If test_Stationarity_MEV_flg flag is enabled
– Function : do_Stationarity_Test_MEV [do_Stationarity_Test_MEV.R]
– Package : fUnitRoots , Function : adfTest [3 types available, nc – no constant nor trend, c –
constant & no trend, ct (default) – constant & trend]
– Collect the adf statistic and p-value for all the 3 types
[Alternatively package : urca , function : ur.df which also has 3 types “none”, “drift” & “trend” for
stationarity test is available and can be used]
Step 1.5 : Single Variable Analysis
• Correlation data set : If corr_using_norm_MEV_flg flag is enabled, use the
standardized (or normalized) data. Otherwise use the transformed MEV data
• Define the starting and ending period for correlation calculation
• For correlation calculation we need the economic trend & this can be either FILE or
DB based.
– Function : do_correlation_test_DB [do_correlation_test_DB.R] / do_correlation_test_FILE
[do_correlation_test.R]
– DB : TMP_IFRS_INDEPENDENT_VARIABLE, FILE : MEF_Expected_Trend.xlsx
– Calculate correlation between dep. Var & all transformed MEV [cor function]
– Update “Expected_Trend” & “lookupinitial” fields (i.e. variable group name) for each MEV from
trend data
– If ‘cor.test.flag’ is enabled, do correlation test (hypothesis test) between each MEV & dep. Var and
calculate the statistic & p-value [p-value <= 0.05 is successful]
– Build linear regression model for shortlisted variables and collect key output like Adj. R2, coefficient
estimate, p-value etc.,
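The correlation, correlation test and single-variable regression steps above can be sketched with base R's `cor`, `cor.test` and `lm`. The data is simulated and the column names (`CORRELATION`, `COR_PVALUE`, `COR_TEST_PASS`, `ADJ_R2` and so on) are hypothetical; the slides do not say how variables are shortlisted, so passing the correlation test is used as the shortlist rule purely for illustration.

```r
set.seed(1)
n   <- 40
dep <- rnorm(n)                                        # dependent variable
mev <- data.frame(GDP_C   = dep + rnorm(n, sd = 0.5),  # strongly related
                  UNEMP_M = rnorm(n))                  # unrelated noise

# Correlation and correlation test between the dep. var and each MEV
res <- data.frame(MEV = names(mev),
                  CORRELATION = sapply(mev, function(x) cor(dep, x)),
                  COR_PVALUE  = sapply(mev, function(x) cor.test(dep, x)$p.value),
                  row.names = NULL, stringsAsFactors = FALSE)
res$COR_TEST_PASS <- res$COR_PVALUE <= 0.05

# Single-variable linear model for each shortlisted MEV (ASSUMED shortlist rule)
shortlisted <- res$MEV[res$COR_TEST_PASS]
lm_out <- lapply(shortlisted, function(v) {
  s <- summary(lm(dep ~ mev[[v]]))
  data.frame(MEV = v,
             ADJ_R2   = s$adj.r.squared,
             ESTIMATE = s$coefficients[2, "Estimate"],
             P_VALUE  = s$coefficients[2, "Pr(>|t|)"],
             stringsAsFactors = FALSE)
})
lm_res <- do.call(rbind, lm_out)
```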
Step 1.6 : Additional Correlation Analysis
• Sometimes YoY change (_C) or Moving avg. (_M – 3,6 or 12 months)
transformation has greater weight on correlation result & hence need some
adjustment
• If adj_corr_CM_trans_flg flag is enabled, in the correlation result data set
– set “TRANSFORM_C” & “TRANSFORM_M” to 1 for MEV transformation with _C & _M
– Sum the “TRANSFORM_C” & “TRANSFORM_M” fields [SUM_TRANSFORM] & adjust the final
absolute correlation with product of correlation adj. factor (default 0.2) & SUM_TRANSFORM
Step 1.7 : Variable Selection Process
• If test_Stationarity_MEV_flg flag is enabled, combine Correlation result and
Stationarity result. Otherwise use only Correlation result
– If chk_any_stationarity_flg flag is enabled, check any one of the 3 stationarity type p-value with
‘stationarity_pvalue_param’ (e.g. 0.05 or 0.1). Otherwise check stationarity with constant & trend
• If check_econ_trend_flg flag is enabled, validate the economic trend sign (+ve / -ve)
with correlation value
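One way the flags above could combine is sketched here. The column names, the "+" / "-" encoding of the expected trend, and the rule that a p-value at or below `stationarity_pvalue_param` counts as stationary are assumptions made for illustration, not details confirmed by the slides.

```r
# Hypothetical combined correlation + stationarity result, one row per MEV
sel <- data.frame(
  MEV            = c("GDP_C", "UNEMP_M", "CPI"),
  CORRELATION    = c(-0.62, 0.48, 0.30),
  EXPECTED_TREND = c("-", "+", "-"),     # expected sign of the relationship
  PVAL_NC        = c(0.20, 0.03, 0.40),  # ADF p-values for the 3 test types
  PVAL_C         = c(0.04, 0.08, 0.30),
  PVAL_CT        = c(0.09, 0.12, 0.02),
  stringsAsFactors = FALSE
)
chk_any_stationarity_flg  <- TRUE
check_econ_trend_flg      <- TRUE
stationarity_pvalue_param <- 0.05

# Stationarity: any of the 3 types, or only constant & trend
pvals <- sel[, c("PVAL_NC", "PVAL_C", "PVAL_CT")]
stationary <- if (chk_any_stationarity_flg) {
  apply(pvals, 1, function(p) any(p <= stationarity_pvalue_param))
} else {
  sel$PVAL_CT <= stationarity_pvalue_param
}

# Economic trend: the sign of the correlation must match the expected sign
trend_ok <- if (check_econ_trend_flg) {
  ifelse(sel$EXPECTED_TREND == "+", sel$CORRELATION > 0, sel$CORRELATION < 0)
} else {
  rep(TRUE, nrow(sel))
}

sel$VARIABLE_SELECTION <- as.integer(stationary & trend_ok)
```

In this toy input, CPI is stationary but its positive correlation contradicts the expected negative trend, so it is the only variable not selected.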
Step 1.8 : Save result to DB & R file
• For Saving to DB :
– Table TMP_IFRS_MODEL_FILTER_DTL : Only MEV Stationarity & no Correlation test
– Table TMP_IFRS_MODEL_FILTER_CORTEST : Only Correlation test & no Stationarity
– Table TMP_IFRS_MODEL_FILTER_DTL_FULL : Both Stationarity & Correlation test
– Table TMP_IFRS_1MEVLM_RESULT_DTL : 1 MEV linear model test
• For Saving to DB,
– Delete the data before inserting (if DB key constraints are set) – Using dbSendUpdate function
• For Saving to R file,
– resave function will overwrite the object if it already exists in the R file
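The overwrite-on-save behaviour for the R objects file can be illustrated with base R `load` and `save`. This is a stand-in written for this example, not the `resave` function the scripts use; the helper name `resave_sketch`, the object names and the value "ODR_LOGIT" are hypothetical.

```r
rfile <- tempfile(fileext = ".RData")

# First save: create the R objects file
final_xts_full <- data.frame(period = 1:3, GDP_C = c(0.1, 0.2, 0.3))
save(final_xts_full, file = rfile)

# Later save: add or overwrite one object while keeping the others
resave_sketch <- function(name, value, file) {
  env <- new.env()
  if (file.exists(file)) load(file, envir = env)  # bring back existing objects
  assign(name, value, envir = env)                # overwrites if already present
  save(list = ls(env, all.names = TRUE), envir = env, file = file)
}
resave_sketch("dep_var_col", "ODR_LOGIT", rfile)

# Check what the file now contains
check <- new.env()
load(rfile, envir = check)
sort(ls(check))
```

A plain `save` call with only the new object would have dropped `final_xts_full` from the file, which is why the existing objects are loaded first.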
REGRESSION MODEL PROCESS
Step 3.2 : Variable Selection, Get MEV & Dep. Var data
• If use_varclus_flg flag is enabled, load the Cluster result from
‘ME_VarClus_Model_ROut_<<model_id>>.R’ file
• If use_varcomb_flg flag is enabled, get the MEV selected from DB view
‘VW_TMP_IFRS_MODEL_FILTER’ with ‘VARIABLE_SELECTION = 1’
– Limit max. number of MEV to 72
– Identify the distinct variable group & calculate the max. number of MEV per group
– If total number of MEV greater than limit, override with top N (per variable group) based on
ADJ_CORRELATION (descending) for each MEV group
• Load the R objects file and get the saved R objects from previous MEV
transformation process
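The top N override for the variable combination case might be sketched as follows. The per-group cap is computed here as `floor(limit / number of groups)`, which is an assumption: the slides say only that a maximum per group is calculated. The function name `limit_top_n` and the column name `VAR_GROUP` are hypothetical; `ADJ_CORRELATION` comes from the text.

```r
limit_top_n <- function(mev, max_mev = 72) {
  # Nothing to do if the total number of MEV is within the limit
  if (nrow(mev) <= max_mev) return(mev)
  n_groups  <- length(unique(mev$VAR_GROUP))
  per_group <- max(1, floor(max_mev / n_groups))   # ASSUMED rule
  # Sort by group, then by ADJ_CORRELATION descending
  mev <- mev[order(mev$VAR_GROUP, -mev$ADJ_CORRELATION), ]
  # Rank within each group and keep the top per_group rows
  rank_in_group <- ave(seq_len(nrow(mev)), mev$VAR_GROUP, FUN = seq_along)
  mev[rank_in_group <= per_group, ]
}

# Small example with a limit of 4 instead of 72
mev <- data.frame(MEV = paste0("V", 1:6),
                  VAR_GROUP = c("GDP", "GDP", "GDP", "CPI", "CPI", "CPI"),
                  ADJ_CORRELATION = c(0.9, 0.5, 0.7, 0.2, 0.8, 0.6),
                  stringsAsFactors = FALSE)
top <- limit_top_n(mev, max_mev = 4)
```

With two groups and a limit of 4, each group keeps its two highest `ADJ_CORRELATION` entries.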
Step 3.3 : Generate data.frame for linear model
• Get the MEV variables (either from cluster or individual MEV selected)
• Define the In-sample period (start & end) and filter the data
• Keep a count variable for all the data sets & initialize to 1
– ‘(frs_scn_model_cnt, regr_res_cnt, regr_coeff_cnt, regr_stats_cnt, regr_vif_res_cnt) = 1’
• Initialize model seq (mseq = 0) & no. of statistical test (e.g. BLUE_test_cnt = 4)
• Calculate total possible models and create a list for all data sets for that total
– For Clustering model :
Step 3.4 : Initialize data.table
• Initialization when using variable combination (use_varcomb_flg flag enabled)
– Enable incl_dep_lags_flg flag when lagged dep. variables need to be added to the model
– Also need to create a new data.frame with the above lagged dep. vars
Step 3.5 : Iterate Cluster
• use_varclus_flg flag is enabled
• Identify number of clusters
• Create linear model by selecting one variable from each cluster
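Picking one variable from each cluster amounts to a Cartesian product of the cluster memberships, which base R's `expand.grid` produces directly. The cluster contents and the dependent variable name `dep_var` are hypothetical.

```r
# Hypothetical cluster membership
clusters <- list(c1 = c("GDP_C", "GDP_M3"),
                 c2 = c("UNEMP", "UNEMP_C", "UNEMP_M6"),
                 c3 = c("CPI_C"))

# One row per candidate model: one variable picked from each cluster
models <- expand.grid(clusters, stringsAsFactors = FALSE)

# Total possible models = product of the cluster sizes
total_models <- prod(lengths(clusters))

# Model formula text for each row (dep_var is a placeholder name)
formulas <- apply(models, 1, function(v) {
  paste("dep_var ~", paste(v, collapse = " + "))
})
```

Each formula string can then be passed to `lm` through `as.formula` together with the in-sample data frame.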
Step 3.6 : Process Variable Combination
• use_varcomb_flg flag is enabled
• Generate all possible combinations from variable list
• Create the model & collect the result and add it to the list data set
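Generating the combinations can be sketched with base R's `combn`. The process overview mentions combinations of 2 and 3 variables and a check for variables in the same MEV group; this sketch assumes that combinations containing two variables from the same group are dropped. The variable names and groups are made up.

```r
# Hypothetical selected MEVs and their variable groups
vars  <- c("GDP_C", "GDP_M3", "UNEMP", "CPI_C")
group <- c(GDP_C = "GDP", GDP_M3 = "GDP", UNEMP = "UNEMP", CPI_C = "CPI")

# All combinations of 2 and of 3 variables
combos <- c(combn(vars, 2, simplify = FALSE),
            combn(vars, 3, simplify = FALSE))
n_all <- length(combos)

# ASSUMED rule: drop combinations with two variables from the same MEV group
distinct_groups <- function(v) !any(duplicated(group[v]))
combos <- Filter(distinct_groups, combos)
n_kept <- length(combos)
```

The number of combinations grows quickly with the number of variables, which is presumably why Step 3.2 caps the MEV count before this step runs.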
Steps after all model creation
• Convert list data set to data.table
• Save the data.table to file (just as backup) and if ‘save_regr_to_DB’ flag enabled,
save it to DB
– Function : save_to_FILE (save_to_FILE.R) [Using R write.table function]
Step 3.7 : Filter models
• Add a new ‘FINAL_STATUS’ field to regr_stats based on the p-values of all statistical tests [p-value >= 0.05 for a test to be successful]
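Deriving the status field from the test p-values can be sketched as follows. The p-value column names and the "PASS" / "FAIL" labels are hypothetical; only `regr_stats`, `FINAL_STATUS` and the 0.05 threshold come from the text.

```r
# Hypothetical p-values of 4 statistical tests for three models
regr_stats <- data.frame(MODEL_SEQ = 1:3,
                         TEST1_P = c(0.40, 0.30, 0.02),
                         TEST2_P = c(0.12, 0.04, 0.50),
                         TEST3_P = c(0.75, 0.60, 0.33),
                         TEST4_P = c(0.08, 0.22, 0.41))

# A model passes only if every test has p-value >= 0.05
p_cols   <- grep("_P$", names(regr_stats), value = TRUE)
all_pass <- unname(apply(regr_stats[, p_cols], 1, function(p) all(p >= 0.05)))
regr_stats$FINAL_STATUS <- ifelse(all_pass, "PASS", "FAIL")
```

Note the direction of the check: for these residual-type tests a large p-value means the model assumption is not rejected, the opposite of the correlation test in Step 1.5.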
Step 3.8 : Save Result to R
• Define the R object file & save the required objects
– Filename : ‘ME_linreg_ROut_<<model_id>>.R’
BACK TEST PROCESS (OUT-OF-SAMPLE / IN-SAMPLE)
Step 4.2/5.2 : Get Models Filtered & Data
• Load the R object file and get the Model & data
– Filename : ‘ME_linreg_ROut_<<model_id>>.R’
– R objects : ‘model_vars_list_<<model_id>>’ – Models filtered, ‘final_xts_full_<<model_id>>’ – Full
MEV data transformed, ‘dep_var_col_<<model_id>>’ – Dep. variable
Step 4.3 / 5.3 : Generate the Test Data
• Define the date range for the test data
– Out of Sample : From cutoff date [mev_model_setting.CUT_OFF_DATE] + 1 until Last Sample date
[mev_model_setting.OUT_OF_SAMPLE_LASTDT]
– In-Sample : need the 1st sample date & cutoff date [mev_model_setting.CUT_OFF_DATE]
– In Sample :
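The two date ranges can be illustrated with a plain date filter. The data is made up, the settings are held in an ordinary list named after `mev_model_setting`, and "cutoff date + 1" is interpreted as "any period after the cutoff date", which is an assumption.

```r
# Hypothetical full transformed data (quarterly) and model settings
full <- data.frame(period = seq(as.Date("2016-01-01"), by = "quarter",
                                length.out = 8),
                   GDP_C  = (1:8) / 10)
mev_model_setting <- list(CUT_OFF_DATE         = as.Date("2016-12-31"),
                          OUT_OF_SAMPLE_LASTDT = as.Date("2017-07-31"))

# In-Sample: from the 1st sample date up to the cutoff date
in_sample <- full[full$period <= mev_model_setting$CUT_OFF_DATE, ]

# Out of Sample: after the cutoff date until the last sample date
out_sample <- full[full$period >  mev_model_setting$CUT_OFF_DATE &
                   full$period <= mev_model_setting$OUT_OF_SAMPLE_LASTDT, ]
```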
Step 4.4 / 5.4 : Calculate the Model Fitted results
• Add the period as columns & get the macro value for the period [for intercept it’s 1]
• Multiply the coefficients with value and sum up for each model seq
• Out Of Sample:
• In-Sample:
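The "multiply the coefficients with value and sum up" step is a matrix product once coefficients and macro values share the same columns. The models, coefficients and periods here are made up; a coefficient of 0 marks a variable that a model does not use.

```r
# Hypothetical coefficients: one row per model seq, one column per term
coefs <- rbind(m1 = c(Intercept = -3.0, GDP_C = -0.5, UNEMP = 0.0),
               m2 = c(Intercept = -2.5, GDP_C =  0.0, UNEMP = 0.4))

# Macro values per period; the intercept column is always 1
macro <- rbind(`2017Q1` = c(Intercept = 1, GDP_C = 0.2, UNEMP = 1.0),
               `2017Q2` = c(Intercept = 1, GDP_C = 0.4, UNEMP = 0.5))

# Fitted value = sum over terms of coefficient * value,
# giving one row per model and one column per period
fitted_vals <- coefs %*% t(macro)
```

The same code serves both the out-of-sample and the in-sample case; only the rows of `macro` (the periods) differ.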
Step 4.5 / 5.5 : Transform for Logit of Dep. Var
• Create a data.frame to get the Actual & Fitted (model) values for each model for
each period & apply the transformation formula from Logit to actual
• Out Of Sample:
• In-Sample:
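Assuming the Logit variable is the standard log-odds, logit(p) = log(p / (1 - p)), the transformation back to the actual scale is the inverse logit. The slides do not spell out the formula, so this is a sketch of the standard definition with made-up values.

```r
# Hypothetical model output and actuals on the logit scale
logit_fitted <- c(-3.1, -3.2, -2.1)
logit_actual <- c(-3.0, -3.3, -2.0)

# Inverse logit: p = 1 / (1 + exp(-x)); plogis() computes the same function
res <- data.frame(fitted = 1 / (1 + exp(-logit_fitted)),
                  actual = plogis(logit_actual))
```

Both columns now lie between 0 and 1, so the errors in the next step can be computed on the scale of the actual dependent variable as well as on the logit scale.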
Step 4.6 / 5.6 : Calculate MSE & MAPE for Models
• MSE : Mean Squared Error [Average[(Actual - Fitted)^2]]
• MAPE : Mean Absolute Percentage Error [Average[abs((Actual - Fitted) / Actual)] * 100]
• Out Of Sample:
• In-Sample:
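The two error measures translate directly into one-line R functions. The function names and the sample values are illustrative only.

```r
# Mean Squared Error: average of the squared differences
mse <- function(actual, fit) mean((actual - fit)^2)

# Mean Absolute Percentage Error: average absolute relative error, in percent
mape <- function(actual, fit) mean(abs((actual - fit) / actual)) * 100

actual <- c(0.040, 0.050, 0.020)
fit    <- c(0.044, 0.045, 0.020)

mse_val  <- mse(actual, fit)
mape_val <- mape(actual, fit)
```

MAPE is undefined when an actual value is 0, so periods with a zero actual would need to be excluded or handled separately before calling `mape`.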
Step 4.7 / 5.7 : Choose Top N Models
• Out Of Sample:
• In-Sample:
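Choosing the top N models is a sort followed by a cut. The error values are made up, and ranking by MSE first with MAPE as the tie-breaker is an assumption; the slides say only that the filter is based on MSE/MAPE of the actual dependent variable.

```r
# Hypothetical error measures per model (on the actual dep. var scale)
model_err <- data.frame(MODEL_SEQ = 1:5,
                        MSE  = c(4e-5, 1e-5, 9e-5, 2e-5, 6e-5),
                        MAPE = c(7.1, 3.2, 12.5, 4.0, 9.8))
top_n <- 3

# Smallest error first (ASSUMED ranking: MSE, then MAPE), keep the first N
ranked     <- model_err[order(model_err$MSE, model_err$MAPE), ]
top_models <- head(ranked, top_n)
```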
Q&A