Impairment Modelling Using R v1.0

Regla® IFRS 9 – Impairment Modelling Using R

INTERNAL TO EMERIO AND STRICTLY CONFIDENTIAL: PLEASE DO NOT DISTRIBUTE, COPY OR DISSEMINATE
R and .NET Architecture Diagram

R base (version 3.2 or above) has to be installed; any additional packages/libraries required can be installed separately on the Web Server

1/11/2019 1
Impairment - Forward Looking Process

1. Macro Economic – Single Variable Selection
2. Variable Clustering
3. Regression Model Process
4. Out Of Sample Tests
5. In Sample Tests

Additional Process:
• Calculate Z-factor (Credit Index Model)
• Time Series Models (HW & ARIMA)

Step 1 : Macro Economic Modelling
1.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
1.2 MEV Transformation
• Do all possible transformations & align column wise
• Save to File
1.3 Get & Process dependent variable
• Logit Transform
• Stationarity & Transform
1.4 MEV Processing
• Stationarity Test
• Normalize variable
1.5 Single Variable Analysis
• Correlation between Dep. Var & MEV & Correlation Test
• Linear Regression between Dep. Var & MEV
1.6 Additional Correlation Analysis
• Adjust Correlation for _C & _M transformation
1.7 Variable Selection Process
• 1 or more variables passing correlation analysis
• Variables passing Linear model (sig. coeff.)
1.8 Save result to DB & R file
• Variables selected to DB
• Transformed MEV & Dep. Variable to R file

Step 2 : Variable Clustering Process Flow
1. Get DATA (All Ind. Vars)
2. Do PCA and get eigenvalues & eigenvectors
3. Do and save other PCA calculations [1]
4. Is 2nd eigenvalue > threshold? If N, END. If Y, continue
5. Do oblique rotation on first two eigenvectors
6. Save the factor loadings from oblique rotation
7. Save the rotated data (eigenvector * centered & scaled DATA)
8. Do and save calculations on factor loadings [2]
9. Split into 2 clusters based on difference between absolute value of factor loadings [3]
10. Repeat from PCA step until end for each cluster

[1] Variation explained : 1st eigenvalue, Total variation : Sum(all eigenvalues), Proportion explained : Variation explained / Total variation
[2] r2.own : square of correlation between each column of DATA & rotated data, r2.next : initially all 0, otherwise correlation between each column of DATA & next nearest cluster rotated data, r2.ratio : (1-r2.own) / (1-r2.next)
[3] If abs(fac1)-abs(fac2) >= 0 then 1st cluster else 2nd cluster
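
One pass of the split described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the slides do not name the oblique rotation, so stats::promax is used as a stand-in; the function name split_into_two_clusters, the threshold of 1 and the simulated data are hypothetical.

```r
# Hypothetical sketch of one clustering pass (not the original script).
split_into_two_clusters <- function(data, eig_threshold = 1) {
  z   <- scale(data)                          # centered & scaled DATA
  eig <- eigen(cor(data), symmetric = TRUE)   # PCA on the correlation matrix
  prop_explained <- eig$values[1] / sum(eig$values)   # footnote [1]
  if (eig$values[2] <= eig_threshold) {
    return(list(split = FALSE, prop_explained = prop_explained))
  }
  # Oblique rotation of the first two eigenvectors (promax is an assumption)
  rot <- promax(eig$vectors[, 1:2])
  fac <- unclass(rot$loadings)
  rotated_data <- z %*% fac                   # rotated vectors * scaled DATA
  # Footnote [3]: abs(fac1) - abs(fac2) >= 0 -> 1st cluster, else 2nd cluster
  cluster <- ifelse(abs(fac[, 1]) - abs(fac[, 2]) >= 0, 1L, 2L)
  names(cluster) <- colnames(data)
  list(split = TRUE, prop_explained = prop_explained,
       loadings = fac, rotated_data = rotated_data, cluster = cluster)
}

# Simulated data: two groups of three variables, each driven by its own factor
set.seed(42)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
dat <- data.frame(
  x1 = a + rnorm(n, sd = 0.2), x2 = a + rnorm(n, sd = 0.2), x3 = a + rnorm(n, sd = 0.2),
  y1 = b + rnorm(n, sd = 0.2), y2 = b + rnorm(n, sd = 0.2), y3 = b + rnorm(n, sd = 0.2)
)
res <- split_into_two_clusters(dat)
res$cluster
```

In the full process the same function would be applied again to the columns of each resulting cluster until the second eigenvalue falls below the threshold.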

Step 3 : Regression Model Process
3.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
3.2 Variable Selection & get Transformed MEV
• Get Variable Selection [VW_TMP_IFRS_MODEL_FILTER]
• Load R file & get transformed MEV, Dep. variable
3.3 Generate data frame for linear model
• Get the In-sample Period & MEV Variables (either Cluster or Combination) and filter data
3.4 Initialize data tables
• Calculate number of models & initialize list
• Initialize model seq and model count for each list
3.5 Iterate Cluster list
• Generate all possible linear model combinations among clusters (by choosing one var. from each cluster)
3.6 Process Variable Combination
• All possible variable combinations of 2 & 3 vars.
• Check variables in same MEV group
3.7 Filter Models
• Statistical test, VIF, Significant Coeff., Adj. R2
• Coefficient sign with economic trend
3.8 Save result to DB & R file
• Models filtered, Full & In-Sample Data etc. to R file
• Save model result to DB & flat file

Step 4 : Out Of Sample Test Process
4.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
4.2 Models filtered & get full data
• Load R file & get Models passed, full transformed MEV, Dep. Variable etc.
4.3 Generate Out of Sample data frame
• Filter from full transformed MEV data based on Out of sample period
4.4 Calculate the Out of sample result
• Multiply model coeff. with out of sample data for each period for corresponding variables and sum
4.5 Transform for Logit of Dep. var
• Transform from Logit to Rate [exp(var)/(1+exp(var))]
4.6 Calculate MSE & MAPE
• Calculate MSE & MAPE for each model for both Logit var. & actual dep. var
4.7 Top N Models
• Filter top N models based on MSE/MAPE of actual dep. var
4.8 Save result to DB
• Save model out of sample result to DB

Step 5 : In Sample Test Process
5.1 Read Config & setup
• DB Connection & Model Parameters
• Command Line Arguments
5.2 Models filtered & get full data
• Load R file & get Models passed, full transformed MEV, Dep. Variable etc.
5.3 Generate In Sample data frame
• Filter from full transformed MEV data based on In-sample period
5.4 Calculate the In-sample result
• Multiply model coeff. with in-sample data for each period for corresponding variables and sum
5.5 Transform for Logit of Dep. var
• Transform from Logit to Rate [exp(var)/(1+exp(var))]
5.6 Calculate MSE & MAPE
• Calculate MSE & MAPE for each model for both Logit var. & actual dep. var
5.7 Top N Models
• Filter top N models based on MSE/MAPE of actual dep. var
5.8 Save result to DB
• Save model in-sample result to DB

R Scripts Setup
• Identify the base/root folder (or directory) [e.g. C:\IFRS9 or C:\Users\Emerio or /home/oracle]
• Create various folders for input, output, scripts etc.
• Place the parameter file “IFRS9_Rscripts_param.ini” in the base/root folder (or directory)
• Various paths/folders are defined in the parameter file
• Change the “Rscripts_def_dir” & “rscripts_param_file” paths in the main R scripts (MacroEconomic.R, run_linear_regression_model.R, run_ME_Out_Of_Sample_Test.R, run_ME_In_Sample_Test.R)

MACRO ECONOMIC MODELLING – SINGLE VARIABLE SELECTION PROCESS

Step 1.1 : Read Config & Setup
• Parameter file “IFRS9_Rscripts_param.ini”
• Function : get_ifrs9_rscripts_param [get_ifrs9_rscripts_param.R]
1. Read the file
2. Convert “data.frame” to “data.table” & define ‘Key’
3. Convert Key & Value to character
• Function : get_param_value [get_ifrs9_rscripts_param.R]
1. Return the Value matching the ‘Key’ from the “data.table”
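
The two functions above could look roughly like the following base-R sketch. It is not the original code: the function names read_param_file and lookup_param, the "key=value" file layout and the example keys are all assumptions, and a plain data.frame is used in place of data.table to keep the example dependency-free.

```r
# Hypothetical stand-in for get_ifrs9_rscripts_param: read "key=value" lines
# into a two-column, all-character data.frame (the file layout is assumed).
read_param_file <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines) & grepl("=", lines, fixed = TRUE)]
  pos   <- regexpr("=", lines, fixed = TRUE)   # position of the first "="
  data.frame(
    Key   = trimws(substr(lines, 1, pos - 1)),
    Value = trimws(substr(lines, pos + 1, nchar(lines))),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for get_param_value: return the Value matching a Key,
# or NA when the key is not present.
lookup_param <- function(params, key) {
  hit <- params$Value[params$Key == key]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Usage with a temporary file; the keys below are made up for illustration
tmp <- tempfile(fileext = ".ini")
writeLines(c("input_dir=C:/IFRS9/input", "output_dir=C:/IFRS9/output"), tmp)
params <- read_param_file(tmp)
lookup_param(params, "output_dir")
```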

Step 1.1 : Read Config & Setup
• Command line arguments for other model parameters like DB connection, Logit, Verbose etc.
• Function : get_cmdline_arg [get_cmdline_arg.R]
1. Get the argument value (based on the argno)
2. Validate that it is within the range of values
3. Otherwise return the default value
• Function : connect_to_DB [connect_to_DB.R] – Using RJDBC package
Specify the driver, connection string, username and password
• Model parameters : Query the Table TMP_IFRS_MACRO_ECONOMIC_MODEL for the model id

• Model parameters are like cut off date (Out of sample, In-Sample), Adj. R2, VIF, p-value,
eigenvalue, number of clusters etc.,
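
The argument handling described for get_cmdline_arg can be illustrated with a short sketch. The function name get_arg_or_default, its signature and the Y/N values are hypothetical; only the get / validate / default logic comes from the slide.

```r
# Hypothetical sketch of the get_cmdline_arg logic: take argument number
# `argno`, accept it only if it is one of `allowed`, otherwise use `default`.
get_arg_or_default <- function(args, argno, allowed, default) {
  if (argno > length(args)) return(default)   # argument not supplied
  value <- args[argno]
  if (value %in% allowed) value else default  # out-of-range value -> default
}

# In a script started with Rscript, `args` would normally come from
# commandArgs(trailingOnly = TRUE); a fixed vector is used here instead.
args <- c("Y", "maybe")
use_logit <- get_arg_or_default(args, 1, allowed = c("Y", "N"), default = "N")
verbose   <- get_arg_or_default(args, 2, allowed = c("Y", "N"), default = "N")
use_db    <- get_arg_or_default(args, 3, allowed = c("Y", "N"), default = "Y")
```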

Step 1.2 : MEV Transformation
• Output of MEV transformation to be saved in the file ‘Bank_macro_trans_output_<<model_id>>.csv’
• Function : do_MEV_Transformation [do_macro_Trans.R]
1. Raw MEV input is FILE or DB
– FILE : MEV_data_input.csv
– DB : IFRS_MACRO_ECONOMIC
2. Check MEV data type
– NOMINAL / PERCENT (certain transformations not applicable if PERCENT)
– Default : ALL
3. Time Series freq & start
– Adjust the freq if Quarterly (adjust_tsstart_for_Qtrly_freq.R)
– Call ‘derive_new_MEV_var’ for every MEV
4. Save transformed MEV to output file
– Column wise append for each MEV & its transformation
– Saved to ‘Bank_macro_trans_output_<<model_id>>.csv’

• Function : derive_new_MEV_var [derive_new_macro_var.R]
1. Define function list vectors
– Define full set of functions & repeat functions
2. Create time series & apply transformation
– ts(var,start,freq)
– Call ‘add_macro_transformation’ for each transformation
3. Loop all required transformations
– e.g. Lag for every transformation
– e.g. Diff or growth func. & other transformation for each diff/growth


• Function : add_macro_transformation [add_macro_transformation.R]
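
As an illustration of the kind of transformations involved, the sketch below derives a few columns from one monthly series using base R only. The series values are made up; the _C (YoY change) and _M (moving average) suffixes appear later in the deck, while the _D1 and _L1 suffixes and the exact formulas are assumptions rather than the contents of add_macro_transformation.R.

```r
# Monthly series starting Jan 2015 (made-up values 1..24)
gdp <- ts(1:24, start = c(2015, 1), frequency = 12)

# Year-on-year change: value at t relative to the value 12 months earlier
gdp_C <- gdp / stats::lag(gdp, -12) - 1

# 3-month trailing moving average (the first two periods are NA)
gdp_M3 <- stats::filter(gdp, rep(1 / 3, 3), sides = 1)

# First difference and a one-period lag
gdp_D1 <- diff(gdp, lag = 1)
gdp_L1 <- stats::lag(gdp, -1)

# Column-wise alignment of the series and its transformations;
# periods missing for a given column are padded with NA
mev_trans <- cbind(gdp, gdp_C, gdp_M3, gdp_D1, gdp_L1)
```

The stats:: prefix is used for lag and filter so that the calls still work when packages that mask these names are attached.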


Step 1.3 : Dependent Variable Process
• Function : get_dependent_variable_data [get_dependent_variable_data.R]

• Get the data from DB (DB query based on the variable type like ODR, PD or LGD etc.)
• If use_Logit flag enabled, add new dependent variable [log(var/(1-var))]
• If transform_PD_flg flag enabled, add transformation of dependent variable [e.g. added lagged dependent variables]
• If testStationarityDep flag enabled, do stationarity test
– Function : do_Stationarity_Test_ts [do_Stationarity_Test.R]
– Package : fUnitRoots, Function : adfTest [3 types available, nc – no constant nor trend, c – constant & no trend, ct (default) – constant & trend]
– Keep differencing [e.g. 1st order differencing : var(t) – var(t-1)] if not stationary (until max. of 12 lags)
– Save the ADF Test results in ‘TMP_IFRS_R_ADF_TEST_DTL’

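
The logit transform and first-order differencing mentioned above are simple to show in base R. The ODR values are made up, and the ADF test itself is left out because it needs the fUnitRoots package.

```r
# Made-up observed default rates
odr <- c(0.020, 0.025, 0.030, 0.028, 0.035)

# Logit transform: log(var / (1 - var))
odr_logit <- log(odr / (1 - odr))

# 1st order differencing: var(t) - var(t-1); the result is one value shorter
odr_logit_d1 <- diff(odr_logit, differences = 1)
```

In the process described above, differencing would be repeated only while the stationarity test fails.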
Step 1.4 : MEV Processing
• Read the MEV transformed data ‘Bank_macro_trans_output_<<model_id>>.csv’

• Convert the data to ts & xts (extensible time series) [xts package provides more functions for handling time series data like filter, conversion etc.]
• If stdzindvar flag is enabled,
– 1. Split the data until cutoff_date (e.g. Dec2016) and normalize [i.e. subtract mean & divide by SD]
– 2. Use the mean and stddev for data until cutoff_date (e.g. Dec2016) to normalize the remaining data
– 3. Bind the above 2 data sets and form the full data [Both In-sample & Out of sample]
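
A minimal sketch of this normalization, assuming a plain numeric vector with a matching vector of dates (the function name standardize_with_cutoff and the data are hypothetical). Because the same in-sample mean and SD are applied to every period, the split / normalize / bind steps collapse into one vectorized expression.

```r
# Standardize a series using the mean and SD of the periods up to the cutoff
standardize_with_cutoff <- function(x, dates, cutoff_date) {
  in_sample <- dates <= cutoff_date
  mu  <- mean(x[in_sample])
  sdv <- sd(x[in_sample])
  (x - mu) / sdv   # in-sample and out-of-sample periods share mu and sdv
}

dates <- seq(as.Date("2016-07-01"), by = "month", length.out = 9)
x     <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)
z     <- standardize_with_cutoff(x, dates, as.Date("2016-12-31"))
```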

Step 1.4 : MEV Processing
• If test_Stationarity_MEV_flg flag is enabled
– Function : do_Stationarity_Test_MEV [do_Stationarity_Test_MEV.R]
– Package : fUnitRoots, Function : adfTest [3 types available, nc – no constant nor trend, c – constant & no trend, ct (default) – constant & trend]
– Collect the ADF statistic and p-value for all the 3 types
[Alternatively, package : urca, function : ur.df, which also has 3 types (“none”, “drift” & “trend”), is available and can be used for the stationarity test]

Step 1.5 : Single Variable Analysis
• Correlation data set : If corr_using_norm_MEV_flg flag is enabled, use the
standardized (or normalized) data. Otherwise use the transformed MEV data
• Define the starting and ending period for correlation calculation
• For correlation calculation we need the economic trend & this can be either FILE or DB based.
– Function : do_correlation_test_DB [do_correlation_test_DB.R] / do_correlation_test_FILE [do_correlation_test.R]
– DB : TMP_IFRS_INDEPENDENT_VARIABLE, FILE : MEF_Expected_Trend.xlsx
– Calculate correlation between dep. Var & all transformed MEV [cor function]
– Update “Expected_Trend” & “lookupinitial” fields (i.e. variable group name) for each MEV from trend data
– If ‘cor.test.flag’ is enabled, do correlation test (hypothesis test) between each MEV & dep. Var and calculate the statistic & p-value [p-value <= 0.05 is successful]
– Set the ‘Pass’ field based on
• number of samples > minimum samples
• Correct economic trend sign [e.g. correlation > 0 if trend is +ve & < 0 if trend is -ve]
• Correlation > correlation threshold param [mev_model_setting$CORRELATION_FILTER_VALUE, default 0.5]
• Correlation test p-value (if cor.test.flag enabled)
– Save the correlation result to ‘Bank_correlation_output_FILE.csv’ [Bank_correlation_output_DB_<<model_id>>.csv if DB]

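
The 'Pass' rules above can be sketched for a single MEV as follows. The function name correlation_check, the minimum sample count of 10, the "+" / "-" trend coding and the use of the absolute correlation against the threshold are assumptions; the 0.5 threshold and the 0.05 p-value cut-off come from the slide.

```r
# Hypothetical sketch of the pass/fail rules for one transformed MEV
correlation_check <- function(dep, mev, expected_trend, min_samples = 10,
                              corr_threshold = 0.5, p_threshold = 0.05) {
  ok <- complete.cases(dep, mev)             # use periods present in both
  n  <- sum(ok)
  r  <- cor(dep[ok], mev[ok])
  p  <- cor.test(dep[ok], mev[ok])$p.value   # hypothesis test on correlation
  sign_ok <- if (expected_trend == "+") r > 0 else r < 0
  pass <- n > min_samples && sign_ok &&
    abs(r) > corr_threshold && p <= p_threshold
  data.frame(n = n, correlation = r, p_value = p, Pass = as.integer(pass))
}

# Made-up data: the MEV moves almost one-for-one with the dependent variable
dep <- c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.0, 8.3, 9.1, 9.9, 11.2, 12.0)
mev <- seq(2, 24, by = 2)
res_pos <- correlation_check(dep, mev, expected_trend = "+")
res_neg <- correlation_check(dep, mev, expected_trend = "-")
```

With an expected trend of "-", the same series fails because the correlation has the wrong sign.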
Step 1.5 : Single Variable Analysis
• If test_1MEV_lm_flg flag is enabled, do the linear model coefficient significance check for each MEV against dep. Variable
– Function : do_linear_reg_1MEV [do_linear_reg_1MEV.R] – Using the correlation data set
– Shortlist MEV variables which have the minimum samples (same as dep. var count)
– Build linear regression model for shortlisted variables and collect key output like Adj. R2, coefficient estimate, p-value etc.
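
A single-variable regression of this kind, and the outputs named above, can be sketched with lm and summary. The function name single_mev_lm and the data are made up for illustration.

```r
# Fit dep ~ mev and collect Adj. R2, the coefficient estimate and its p-value
single_mev_lm <- function(dep, mev) {
  s <- summary(lm(dep ~ mev))
  data.frame(
    adj_r2   = s$adj.r.squared,
    estimate = s$coefficients["mev", "Estimate"],
    p_value  = s$coefficients["mev", "Pr(>|t|)"]
  )
}

# Made-up data: dep is roughly 3 + 2 * mev plus a small disturbance
mev <- 1:10
dep <- 3 + 2 * mev + c(0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1)
lm_res <- single_mev_lm(dep, mev)
```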

Step 1.6 : Additional Correlation Analysis
• Sometimes the YoY change (_C) or Moving avg. (_M – 3, 6 or 12 months) transformation has greater weight on the correlation result & hence needs some adjustment
• If adj_corr_CM_trans_flg flag is enabled, in the correlation result data set
– Set “TRANSFORM_C” & “TRANSFORM_M” to 1 for MEV transformations with _C & _M
– Sum the “TRANSFORM_C” & “TRANSFORM_M” fields [SUM_TRANSFORM] & adjust the final absolute correlation with the product of correlation adj. factor (default 0.2) & SUM_TRANSFORM

Step 1.7 : Variable Selection Process
• If test_Stationarity_MEV_flg flag is enabled, combine Correlation result and Stationarity result. Otherwise use only Correlation result
– If chk_any_stationarity_flg flag is enabled, check any one of the 3 stationarity type p-values against ‘stationarity_pvalue_param’ (e.g. 0.05 or 0.1). Otherwise check stationarity with constant & trend
• If check_econ_trend_flg flag is enabled, validate the economic trend sign (+ve / -ve) with correlation value
• If choose_max_corr_var_flg flag is enabled, choose only 1 MEV with ‘Pass = 1’ having max. adjusted correlation from each variable group (lookupinitial). Otherwise choose all ‘Pass = 1’ variables as selected
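
Choosing one MEV per variable group can be sketched in base R by sorting and keeping the first row of each group. The data frame below is made up; the lookupinitial, ADJ_CORRELATION and Pass names follow the slides, while the MEV column name is an assumption.

```r
# Made-up correlation result set
corr_res <- data.frame(
  MEV             = c("GDP_C", "GDP_M3", "CPI_C", "CPI_L1", "UNEMP_M6"),
  lookupinitial   = c("GDP", "GDP", "CPI", "CPI", "UNEMP"),
  ADJ_CORRELATION = c(0.72, 0.81, 0.65, 0.58, 0.40),
  Pass            = c(1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only passing variables, sort by group and descending adjusted
# correlation, then keep the first (highest) row of each group
passed   <- corr_res[corr_res$Pass == 1, ]
passed   <- passed[order(passed$lookupinitial, -passed$ADJ_CORRELATION), ]
selected <- passed[!duplicated(passed$lookupinitial), ]
```

Here GDP_M3 wins the GDP group, CPI_C is the only passing CPI variable, and the UNEMP group contributes nothing because its only variable did not pass.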

Step 1.8 : Save result to DB & R file
• For Saving to DB :
– Table TMP_IFRS_MODEL_FILTER_DTL : Only MEV Stationarity & no Correlation test
– Table TMP_IFRS_MODEL_FILTER_CORTEST : Only Correlation test & no Stationarity
– Table TMP_IFRS_MODEL_FILTER_DTL_FULL : Both Stationarity & Correlation test
– Table TMP_IFRS_1MEVLM_RESULT_DTL : 1 MEV linear model test
• For Saving to DB,
– Delete the data before inserting (if DB key constraints are set) – Using dbSendUpdate function
– Function : save_to_DB [save_to_DB.R] – Using RJDBC Package ‘dbWriteTable’ function
• For Saving to R file:
– Specify the filename
– Define the object name & assign the R objects to it
– Function : resave [save_R_objects.R] – using save function
– resave function will overwrite the object if it already exists in the R file
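
One common way to get this "add or overwrite" behaviour with save is to load the existing file into a scratch environment, assign the new objects and save everything back. The sketch below follows that pattern; the function name resave_objects and its named-list interface are hypothetical and may differ from the resave function in save_R_objects.R.

```r
# Hypothetical sketch: add or overwrite named objects in an R data file
resave_objects <- function(objects, file) {
  stopifnot(is.list(objects), !is.null(names(objects)))
  env <- new.env()
  if (file.exists(file)) load(file, envir = env)   # keep what is already saved
  for (nm in names(objects)) {
    assign(nm, objects[[nm]], envir = env)         # add or overwrite
  }
  save(list = ls(env, all.names = TRUE), file = file, envir = env)
}

# Usage: the second call overwrites model_id but keeps dep_var
f <- tempfile(fileext = ".RData")
resave_objects(list(model_id = 101, dep_var = c(0.02, 0.03)), f)
resave_objects(list(model_id = 102), f)

chk <- new.env()
load(f, envir = chk)
```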

REGRESSION MODEL PROCESS

Step 3.2 : Variable Selection, Get MEV & Dep. Var data
• If use_varclus_flg flag is enabled, load the Cluster result from ‘ME_VarClus_Model_ROut_<<model_id>>.R’ file
• If use_varcomb_flg flag is enabled, get the MEV selected from DB view ‘VW_TMP_IFRS_MODEL_FILTER’ with ‘VARIABLE_SELECTION = 1’
– Limit max. number of MEV to 72
– Identify the distinct variable groups & calculate the max. number of MEV per group
– If total number of MEV is greater than the limit, override with top N (per variable group) based on ADJ_CORRELATION (descending) for each MEV group
• Load the R objects file and get the saved R objects from previous MEV transformation process
• Get the dependent variable data

Step 3.3 : Generate data.frame for linear model
• Get the MEV variables (either from cluster or individual MEV selected)
• If chk_varcombn_same_group_flg flag is enabled, need to check whether MEV are in the same group or not
• Define the In-sample period (start & end) and filter the data
• data.frame ‘finaldf_lm’ will be used in the linear regression model

Step 3.4 : Initialize data.table
• Initialize variable ‘insModel = datatable’ to enable using data.table (otherwise we use data.frame)
– 5 data sets required for linear regression model
– frs_scn_model : Model formula (e.g. ODR_LOG ~ GDP + INFLATION)
– regr_res : Model anova details & other results like Adj. R2, SSE, SST etc.
– regr_coeff : Model coefficients, p-value & significance
– regr_stats : Model statistical tests like Normality, Homoskedasticity, Autocorrelation
– regr_vif_res : Model multi-collinearity result (using VIF – Variance Inflation Factor)
• Keep a count variable for all the data sets & initialize to 1
– ‘(frs_scn_model_cnt, regr_res_cnt, regr_coeff_cnt, regr_stats_cnt, regr_vif_res_cnt) = 1’
• Initialize model seq (mseq = 0) & no. of statistical tests (e.g. BLUE_test_cnt = 4)
• Calculate total possible models and create a list for all data sets for that total
– For Clustering model :

Step 3.4 : Initialize data.table
• Initialization when using variable combination (use_varcomb_flg flag enabled)
– Enable incl_dep_lags_flg flag when needed to add lagged dep. variables to the model
– Also need to create new data.frame with the above lagged dep. vars
• Define the variable combination fields possible (e.g. 2 & 3)
– Calculate variable combinations & total models
– Create the list of all data sets for the possible total models
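
Counting and generating the 2- and 3-variable combinations can be done with choose and combn. The variable names below are made up, and the pre-allocated list only stands in for the per-model result lists.

```r
# Made-up list of selected MEV
vars        <- c("GDP_C", "CPI_C", "UNEMP_M3", "HPI_L1")
combo_sizes <- c(2, 3)

# Total number of models: C(4,2) + C(4,3) = 6 + 4
total_models <- sum(choose(length(vars), combo_sizes))

# All combinations of each size, as a flat list of character vectors
combos <- unlist(
  lapply(combo_sizes, function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# Pre-allocated list with one slot per possible model
model_list <- vector("list", total_models)
```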

Step 3.5 : Iterate Cluster
• use_varclus_flg flag is enabled
• Identify number of clusters
• Create linear model by selecting one variable from each cluster
• run_linear_reg_model function calls ‘add_linear_reg_model_li’ [add_linear_reg_model_li.R]
• Once a successful model is created, it is added to the list of data set [internal function : add_list_to_model]

Example of how regr_coeff is added
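
Independently of the original example, picking one variable from each cluster can be sketched with expand.grid. The cluster contents are made up; the ODR_LOG response name follows the formula example given earlier in the deck.

```r
# Made-up clusters of MEV
clusters <- list(
  c1 = c("GDP_C", "GDP_M3"),
  c2 = c("CPI_C"),
  c3 = c("UNEMP_M3", "UNEMP_L1")
)

# One row per model: every way of choosing one variable from each cluster
grid <- expand.grid(clusters, stringsAsFactors = FALSE)

# Build the model formula text for each row
formulas <- apply(grid, 1, function(v) {
  paste("ODR_LOG ~", paste(v, collapse = " + "))
})
```

With 2, 1 and 2 variables in the three clusters this gives 2 x 1 x 2 = 4 candidate models.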

Step 3.6 : Process Variable Combination
• use_varcomb_flg flag is enabled
• Generate all possible combinations from variable list
• If ‘chk_varcombn_same_group_flg’ flag is enabled, check whether variables in the model belong to the same variable group. If found, skip the model & continue with next combination
• Create the model & collect the result and add it to the list data set
• run_linear_reg_model function calls ‘add_linear_reg_model_li’ [add_linear_reg_model_li.R]
• Once a successful model is created, it is added to the list of data set [internal function : add_list_to_model]

Example of how regr_coeff is added
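
The same-group check can be illustrated separately from the original example. In this sketch the mapping from MEV to variable group is made up, and a combination is dropped when any two of its variables share a group.

```r
# Made-up mapping from MEV name to variable group
var_group <- c(GDP_C = "GDP", GDP_M3 = "GDP", CPI_C = "CPI", UNEMP_M3 = "UNEMP")

# All 2-variable combinations
combos <- combn(names(var_group), 2, simplify = FALSE)

# TRUE when a combination contains two variables from the same group
same_group <- vapply(
  combos,
  function(v) anyDuplicated(var_group[v]) > 0,
  logical(1)
)

# Combinations that would go on to model building
kept <- combos[!same_group]
```

Only the (GDP_C, GDP_M3) pair is skipped, leaving 5 of the 6 combinations.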

Steps after all model creation
• Convert list data set to data.table
• Save the data.table to file (just as backup) and if ‘save_regr_to_DB’ flag enabled, save it to DB
– Function : save_to_FILE (save_to_FILE.R) [Using R write.table function]
• Convert data.table columns to numeric [internally they become character as we concatenate the columns]

Example of how regr_coeff table columns are converted
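
A column conversion of this kind can be sketched with a plain data.frame (the deck uses data.table). The column names and values below are hypothetical, not the actual regr_coeff layout.

```r
# Made-up coefficient table whose numeric columns arrived as character
regr_coeff <- data.frame(
  MODEL_SEQ = c("1", "1"),
  VARIABLE  = c("(Intercept)", "GDP_C"),
  ESTIMATE  = c("-3.2", "0.45"),
  P_VALUE   = c("0.001", "0.03"),
  stringsAsFactors = FALSE
)

# Convert the selected columns back to numeric in one step
num_cols <- c("MODEL_SEQ", "ESTIMATE", "P_VALUE")
regr_coeff[num_cols] <- lapply(regr_coeff[num_cols], as.numeric)
```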

Step 3.7 : Filter models
• Add a new ‘FINAL_STATUS’ field to regr_stats based on the p-value for all statistical tests [p-value >= 0.05 for the test to be successful]
• Other parameters are VIF_VALUE (for multi-collinearity), ADJUSTED_R_SQUARE_VALUE [mev_model_setting], min_no_vars [no. of Cluster or minimum number of variable combination (2)]
• Flag ‘chk_Expected_Trend_coeff_flg’ to be enabled to check that the model coefficient sign matches the expected economic trend [e.g. coeff > 0 if trend is +ve]
• Internal function : filter_linear_models to filter models based on parameters
• Flag ‘do_MEV_stationarity_flg’ to be enabled to test Stationarity of the selected models’ MEV
• Flag ‘rebuild_lm_sigf’ to be set if linear models need to be rebuilt using only the significant variables (with / without intercept - add_Intercept_for_all_models flag)
– For the above step, we need to build the linear regression formula from the selected model coefficients
– Need to add (based on flag) or remove the intercept if not significant
– Add to the list data set & convert to data.table (with new model seq no)
– Again apply the model filtering rules & get the final selected model

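
Two of the filters above, VIF and Adj. R2, can be sketched in base R. The VIF of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the other predictors. The function name vif_values, the simulated data and the threshold values (5 and 0.6) are assumptions; the real thresholds come from mev_model_setting.

```r
# VIF for each predictor: regress it on the others and use 1 / (1 - R2)
vif_values <- function(df, predictors) {
  vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = df))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Simulated data: x3 is almost a copy of x1, so the two are collinear
set.seed(7)
df <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
df$x3 <- df$x1 + rnorm(40, sd = 0.1)
df$y  <- 1 + 2 * df$x1 - df$x2 + rnorm(40, sd = 0.5)

# Two candidate models; the threshold values are made up
candidates       <- list(c("x1", "x2"), c("x1", "x2", "x3"))
vif_threshold    <- 5
adj_r2_threshold <- 0.6

passes <- vapply(candidates, function(vars) {
  fit <- summary(lm(reformulate(vars, response = "y"), data = df))
  max(vif_values(df, vars)) < vif_threshold &&
    fit$adj.r.squared >= adj_r2_threshold
}, logical(1))
```

The second candidate is rejected on VIF alone, even though its fit is at least as good as the first.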
Step 3.7 : Filter models
1. Models passing all statistical tests, VIF < (threshold) & having at least 2 vars (or no. of cluster)
2. Models passing 1 less than all statistical tests, VIF < (threshold) & having at least 2 vars (or no. of cluster)
3. If no models from Step 1 & 2, loop Step 2 with incrementing VIF (max. loop 10 times)
4. From models filtered from Step 1/2/3, now select models by Adj. R2 parameter
5. If no models from Step 4, just select top N models by Adj. R2 descending (highest first)
6. From models filtered from Step 4/5, select models with all variables having significant coefficients & VIF passed
7. If no models from Step 6, then just select models & variables with significant coefficients & VIF passed
8. For models from Step 6/7, check model coefficients against the economic trend for the corresponding variable group
9. For models filtered from Step 8, get the model seq, variable name & its coefficient and VIF value as output

Step 3.8 : Save Result to R
• Define the R object file & save the required objects
– Filename : ‘ME_linreg_ROut_<<model_id>>.R’

BACK TEST PROCESS (OUT-OF-SAMPLE / IN-SAMPLE)

Step 4.2/5.2 : Get Models Filtered & Data
• Load the R object file and get the Model & data
– Filename : ‘ME_linreg_ROut_<<model_id>>.R’
– R objects : ‘model_vars_list_<<model_id>>’ – Models filtered, ‘final_xts_full_<<model_id>>’ – Full MEV data transformed, ‘dep_var_col_<<model_id>>’ – Dep. variable

Step 4.3 / 5.3 : Generate the Test Data
• Define the date range for the test data
– Out of Sample : From cutoff date [mev_model_setting.CUT_OFF_DATE] + 1 until Last Sample date
[mev_model_setting.OUT_OF_SAMPLE_LASTDT]
– In-Sample : need the 1st sample date & cutoff date [mev_model_setting.CUT_OFF_DATE]

• Filter the date range from the full data


– Out Of Sample :

– In Sample :
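
The two date filters can be illustrated with a plain data.frame (the deck works with xts objects). The data, the period column name and the dates are made up; cut_off_date and oos_last_date stand in for CUT_OFF_DATE and OUT_OF_SAMPLE_LASTDT.

```r
# Made-up full data set: 18 monthly periods starting Jan 2016
periods   <- seq(as.Date("2016-01-01"), by = "month", length.out = 18)
full_data <- data.frame(period = periods, GDP_C = seq_along(periods) / 100)

cut_off_date  <- as.Date("2016-12-31")
oos_last_date <- as.Date("2017-06-30")

# In-sample: from the first sample date up to the cutoff date
in_sample <- full_data[full_data$period <= cut_off_date, ]

# Out-of-sample: after the cutoff date, up to the last sample date
out_sample <- full_data[full_data$period > cut_off_date &
                          full_data$period <= oos_last_date, ]
```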

Step 4.4 / 5.4 : Calculate the Model Fitted results
• Add the period as columns & get the macro value for the period [for intercept it’s 1]
• Multiply the coefficients with value and sum up for each model seq

• Out Of Sample:

Step 4.4 / 5.4 : Calculate the Model Fitted results
• In-Sample:
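
For either period, the fitted value is the sum of each coefficient times the corresponding macro value, with 1 used for the intercept. A minimal sketch for one model, with made-up coefficients and data:

```r
# Made-up coefficients of one model (names must match the data columns)
coefs <- c("(Intercept)" = -3.0, GDP_C = -0.5, CPI_C = 0.2)

# Made-up test data: one row per period
test_data <- data.frame(GDP_C = c(0.1, 0.2), CPI_C = c(1.0, 1.5))

# Design matrix: a column of 1s for the intercept plus the model's variables
X <- cbind("(Intercept)" = 1, as.matrix(test_data[, names(coefs)[-1]]))

# Multiply coefficients with values and sum up, for each period
fitted_logit <- as.numeric(X %*% coefs)
```

For the first period this gives -3.0 + (-0.5)(0.1) + (0.2)(1.0) = -2.85.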

Step 4.5 / 5.5 : Transform for Logit of Dep. Var
• Create a data.frame to get the Actual & Fitted (model) values for each model for
each period & apply the transformation formula from Logit to actual
• Out Of Sample:

Step 4.5 / 5.5 : Transform for Logit of Dep. Var
• In-Sample:
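
The back-transformation from logit to rate is exp(var) / (1 + exp(var)), which inverts log(var / (1 - var)). A small sketch with made-up fitted values:

```r
# Made-up fitted values on the logit scale
fitted_logit <- c(-2.85, -2.80, 0)

# Logit to rate: exp(var) / (1 + exp(var))
fitted_rate <- exp(fitted_logit) / (1 + exp(fitted_logit))

# Round trip back to the logit scale as a sanity check
back_to_logit <- log(fitted_rate / (1 - fitted_rate))
```

Base R's plogis function computes the same transformation.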

Step 4.6 / 5.6 : Calculate MSE & MAPE for Models
• MSE : Mean Squared Error [Average[(Actual - Fitted)^2]]
• MAPE : Mean Absolute Percentage Error [Average[abs((Actual - Fitted)/Actual)] * 100]
• Out Of Sample:

– When Logit is enabled

Step 4.6 / 5.6 : Calculate MSE & MAPE for Models
• In-Sample:

– When Logit is enabled
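
Both error measures follow directly from their definitions. A sketch with made-up actual and fitted rates for one model:

```r
# Made-up actual and fitted rates for three periods
actual <- c(0.020, 0.040, 0.050)
fitted <- c(0.022, 0.036, 0.050)

# Mean Squared Error: average of squared differences
mse <- mean((actual - fitted)^2)

# Mean Absolute Percentage Error: average absolute relative error, in percent
mape <- mean(abs((actual - fitted) / actual)) * 100
```

Here the relative errors are 10%, 10% and 0%, so the MAPE is 20/3, about 6.67.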

Step 4.7 / 5.7 : Choose Top N Models
• Out Of Sample:

• In-Sample:
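
Selecting the top N models by error can be sketched by sorting on the chosen measure. The result table, its column names and N = 3 are made up for illustration.

```r
# Made-up back-test results: one row per model
bt_res <- data.frame(
  model_seq = 1:5,
  MAPE      = c(12.1, 4.3, 8.8, 3.9, 15.0)
)

# Keep the N models with the lowest MAPE
top_n      <- 3
top_models <- head(bt_res[order(bt_res$MAPE), ], top_n)
```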

Q&A