
Change Point Modelling in the Vulnerability Discovery Process

Ruchi Sharma, Ritu Sibal, and Sangeeta Sabharwal

Netaji Subhas Institute of Technology, Delhi, India


rs.sharma184@gmail.com, ritusib@hotmail.com,
ssab63@gmail.com

Abstract. The discovery of vulnerabilities and their successful fixing depend on
various factors such as testing strategy, test case effectiveness, team
constitution, efficiency, and environmental factors. These factors are prone to
change over time. Change point analysis is the process of detecting the point at
which the cumulative effect of these factors alters the rate of vulnerability
discovery. In this paper, we propose a mathematical model which captures the
point of switch or change in the regression. The practical utility of the model
is confirmed by validating it on three real-life software datasets. The results
show that the proposed model with change point consideration gives a better
goodness of fit than mathematical models without a change point.

1 Introduction

Quantitative assessment of the quality of software plays a crucial role in providing
secure and reliable software [3, 4]. Vulnerability discovery models (VDMs) help in this
quality assessment by analyzing the trend of vulnerability occurrence. Vulnerabilities
are faults which have the potential to cause security breaches and hence to compromise
the confidentiality, integrity or availability of crucial resources in a system, which
might further lead to monetary losses. Moreover, the information extracted as a result
of these security breaches can be used by attackers to cause damage. Therefore, testing
of software is continued even after it is released in the market so as to come up with
patches that fix the vulnerabilities residing in the software in its operational phase.
This process of vulnerability fixing is preceded by vulnerability detection or
discovery.
Various factors affect the process of vulnerability discovery. These factors may
include the strategy adopted, the testing environment, the effectiveness of test cases,
the skills of the testing team, efficiency and many more [5–7]. VDMs are formulated to
track the growth rate of discovered vulnerabilities at any point of time during the
operational phase while taking into consideration some or all of these factors. The
models are formulated under certain assumptions about the process [1, 2].
The parameters of these models represent various factors associated with the trend of
vulnerability discovery [5, 7]. When a VDM is applied in a real software environment to
predict the number of vulnerabilities, it is usually assumed that the parameters of the
VDM remain unchanged during the course of testing.
© Springer Nature Singapore Pte Ltd. 2019
A. K. Luhach et al. (Eds.): ICAICR 2018, CCIS 956, pp. 559–568, 2019.
https://doi.org/10.1007/978-981-13-3143-5_46
However, this assumption may not hold true in all cases. It may happen that once the
process of testing has started, the team decides to include a more experienced and
skilled member. The team may also change the current testing strategy by adopting
automated tools for better results. The basic motive is to obtain the best possible
results, and this might lead to multiple changes during the testing process. In such a
situation, the model parameters estimated without considering these changes may not
give an accurate description of the testing progress. Owing to these changes, some or
all of the parameters used in the model may show significant variations, as is usually
observed in operational environments [22].
The paper is organized as follows: Sect. 2 presents the existing related work,
followed by the notations, assumptions and framework of the proposed model in Sect. 3.
In Sect. 4, the change point based vulnerability discovery model is developed.
Section 5 includes the numerical illustration of the proposed model and its analysis.
Finally, conclusions are drawn in Sect. 6 based on the model development and analysis.

2 Related Work

Over the years, researchers have proposed different mathematical models with varied
assumptions for the process of vulnerability discovery. Anderson’s thermodynamic model
was the first model in the direction of modelling the process of vulnerability
discovery [10, 21]. It was initially developed in the field of thermodynamics and later
applied to model the vulnerability detection process. Though it did not give
appreciable results, it marked the beginning of research in this direction. Soon after
Anderson’s thermodynamic model, Rescorla proposed two models, one linear and one
exponential, which gave slightly better results. The most widely used model is the
Alhazmi-Malaiya Logistic (AML) model [1, 2], which came out in the mid-2000s and gave
commendable results. Its vulnerability prediction results [24] for the Windows
operating system were very close to the actual number of vulnerabilities observed. The
same authors also proposed an effort-based discovery model, called the Alhazmi-Malaiya
Effort-based (AME) model, but it does not show any improvement in results over AML.
AML established that the vulnerability discovery curve is sigmoid in shape due to three
phases: learning, linear and saturation.
Later, Joh et al. [13] used the Weibull distribution function to model the process of
vulnerability discovery but could not give appreciable predictions. Kapur et al. [14]
gave two models, developed using two different distribution functions, and compared
them with the AML model. Shrivastava et al. [20] built a VDM on the AML model using a
stochastic differential equation. Kim et al. [17] considered multiple versions of
software during the modelling process. Sharma et al. developed a VDM using the Gamma
function and discussed differences in the process of vulnerability discovery for open
and closed source software systems [26]. Anand et al. [9] suggested a vulnerability
discovery framework for multiple versions of software, where they assumed that the
total number of vulnerabilities of the nth version is the sum of the vulnerabilities
detected in the current version and the remaining vulnerabilities that are detected in
the upcoming version.
In 2003, Shyur suggested that a change point exists in the process of testing and
hence should be considered during model development [7]. Change point models are
extensively used in hardware and software reliability studies [22]. Here we apply the
change point phenomenon to the specific area of software vulnerability discovery. In
the existing work in this field, the models have been derived under the assumption of a
constant detection rate [1, 2, 8–11, 13, 19, 24, 26]. Researchers proceeded with the
assumption that, in the operational phase, the probability of detection is equal for
all vulnerabilities and the rate at which they are detected does not change. However,
in a real scenario, the vulnerability detection rate relies on the skills of the
testing team, defect density, size of the program, factors of code expansion, team
constitution, testing, testability of the software, etc. [7, 15, 18, 22, 25].

3 Methodology

In this study, we have used the methodology applied for reliability assessment in
Software Reliability Growth Models [22], which is now being applied to Vulnerability
Discovery Models as well [16, 20, 26].
A Non-Homogeneous Poisson Process (NHPP) is a counting process in which arrivals or
detections occur randomly and the increments are non-stationary. It shows all the
properties of a Poisson process except that the rate is a function of time. In this
study, we have considered the process of vulnerability discovery as an NHPP, since the
number of vulnerabilities detected by the testing team is random and the number of
vulnerabilities found may differ from one time interval to the next. The VDM is hence
developed under the assumptions that hold true for NHPPs. The various assumptions
considered during model development are described in the next subsection.

3.1 Assumptions
Mathematical models are usually developed with certain predefined notions treated as
model assumptions. These are conditions that usually prevail and that simplify the
process of model development. The assumptions made during model development are as
follows:
i Initially, the number of vulnerabilities detected is zero, i.e. at t = 0, V(0) = 0.
ii The number of vulnerabilities detected at any time depends on the number of unresolved (remaining) ones.
iii The software has a finite number of vulnerabilities.
iv The rate of vulnerability detection is not constant and it can change at any point.

3.2 Notations

V: total number of vulnerabilities present in the software.
V(t): expected number of vulnerabilities identified during the time interval (0, t].
r(t): time-dependent rate of vulnerability removal per remaining vulnerability.
c: the point of change.
D(t): cumulative distribution function of vulnerability occurrence, with density
function $d(t) = \frac{dD(t)}{dt}$.

4 Model Development

A recent study on change point models [22] suggested a methodology for developing
models with a change point by giving an appropriate cumulative distribution function
for the software vulnerability discovery times, D(t). The exponential model with a
single change point is then developed from this framework.
As per the Non-Homogeneous Poisson Process, the rate at which vulnerabilities are
discovered is proportional to the count of residual vulnerabilities in the software
and to the rate of detection. This is depicted by Eq. (1) below.

$$\frac{dV(t)}{dt} = r(t)\,\bigl(V - V(t)\bigr) \tag{1}$$
Equation (1) above shows the general model without considering the phenomenon
of change point.
While considering the notion of change point, the rate of vulnerability detection
changes with time. Hence, in order to develop the single change point model, we take
two different hazard functions for before and after the occurrence of the change point.
Using Eq. (1),

$$r(t) = \begin{cases} r_1(t) = \dfrac{d_1(t)}{1 - D_1(t)}, & 0 \le t \le c \\[6pt] r_2(t) = \dfrac{d_2(t)}{1 - D_2(t)}, & c < t \end{cases} \tag{2}$$

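Solving (1) using (2), with the initial condition V(0) = 0,

$$V(t) = \begin{cases} V\,D_1(t), & 0 \le t \le c \\[6pt] V\left[1 - \dfrac{\bigl(1 - D_1(c)\bigr)\bigl(1 - D_2(t)\bigr)}{1 - D_2(c)}\right], & c < t \end{cases} \tag{3}$$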
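The step from Eqs. (1) and (2) to Eq. (3) can be spelled out as follows. This is a sketch that uses only the definitions given above, plus the assumption that D_1(0) = 0 (which holds for the logistic form used in this paper); the index j in the second line stands for either 1 or 2.

```latex
\begin{aligned}
\frac{dV(t)}{dt} = r(t)\,\bigl(V - V(t)\bigr),\quad V(0) = 0
  \;&\Longrightarrow\; V(t) = V\left[1 - \exp\!\left(-\int_0^t r(s)\,ds\right)\right], \\
r_j(s) = \frac{d_j(s)}{1 - D_j(s)} = -\frac{d}{ds}\ln\bigl(1 - D_j(s)\bigr)
  \;&\Longrightarrow\; \exp\!\left(-\int_a^b r_j(s)\,ds\right) = \frac{1 - D_j(b)}{1 - D_j(a)}, \\
0 \le t \le c:\quad \exp\!\left(-\int_0^t r_1(s)\,ds\right)
  &= 1 - D_1(t), \\
t > c:\quad \exp\!\left(-\int_0^c r_1(s)\,ds - \int_c^t r_2(s)\,ds\right)
  &= \bigl(1 - D_1(c)\bigr)\,\frac{1 - D_2(t)}{1 - D_2(c)} .
\end{aligned}
```

Substituting the last two lines into the first gives the two branches of Eq. (3), and the two branches agree at t = c, so V(t) is continuous at the change point.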

Let D(t) be defined by a logistic distribution function, i.e.

$$D_1(t) = \frac{1 - e^{-r_1 t}}{1 + l\,e^{-r_1 t}}, \quad 0 \le t \le c \tag{4}$$

and

$$D_2(t) = \frac{1 - e^{-r_2 t}}{1 + l\,e^{-r_2 t}}, \quad c < t \tag{5}$$
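For simplification, the shape parameter l of the distribution is presumed to be the
same before and after the occurrence of the change point. The mean value function of
the VDM with change point is then

$$V(t) = \begin{cases} V\left[1 - \dfrac{(1 + l)\,e^{-r_1 t}}{1 + l\,e^{-r_1 t}}\right], & 0 \le t \le c \\[10pt] V\left[1 - \dfrac{(1 + l)\,\bigl(1 + l\,e^{-r_2 c}\bigr)}{\bigl(1 + l\,e^{-r_1 c}\bigr)\bigl(1 + l\,e^{-r_2 t}\bigr)}\; e^{-r_1 c - r_2 (t - c)}\right], & c < t \end{cases} \tag{6}$$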

This model represents a flexible learning curve that generally provides a decent
estimate of the number of software vulnerabilities and hence of the reliability and
security of the software. For mathematical simplification, it is assumed that the
parameter l of the failure time distribution does not change at the change point.
However, it might change, and the mean value function can be computed accordingly for
any practical scenario.
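The mean value function can be evaluated numerically as in the minimal sketch below. It assumes the logistic form of Eqs. (4) and (5) with a shared shape parameter l; the function names and the parameter values in the demo are hypothetical and are not taken from the paper's datasets.

```python
import math


def logistic_cdf(t, r, l):
    """Logistic distribution function of Eqs. (4)-(5)."""
    e = math.exp(-r * t)
    return (1.0 - e) / (1.0 + l * e)


def mean_value(t, V, r1, r2, l, c):
    """Expected cumulative vulnerabilities V(t) from Eq. (3) with logistic D1, D2."""
    if t <= c:
        return V * logistic_cdf(t, r1, l)
    # Fraction of vulnerabilities still undiscovered after the change point.
    remaining = (
        (1.0 - logistic_cdf(c, r1, l))
        * (1.0 - logistic_cdf(t, r2, l))
        / (1.0 - logistic_cdf(c, r2, l))
    )
    return V * (1.0 - remaining)


def eq6_post_change(t, V, r1, r2, l, c):
    """Second branch of Eq. (6), written out explicitly (valid for t > c)."""
    numerator = (1.0 + l) * (1.0 + l * math.exp(-r2 * c))
    denominator = (1.0 + l * math.exp(-r1 * c)) * (1.0 + l * math.exp(-r2 * t))
    return V * (1.0 - numerator / denominator * math.exp(-r1 * c - r2 * (t - c)))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    params = dict(V=1000.0, r1=0.2, r2=0.5, l=5.0, c=6.0)
    for t in (0, 3, 6, 9, 12, 20):
        print(t, round(mean_value(float(t), **params), 2))
```

Computing the post-change branch both from Eq. (3) and from the closed form of Eq. (6) gives a quick consistency check on the algebra: the two should agree to floating-point precision for any t > c.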

5 Parameter Estimation

Parameters are estimated for three different software data sets using Statistical Package
for Social Sciences (SPSS).

5.1 Dataset Description


The data used for the estimation of parameters has been extracted from CVE Details
and the National Vulnerability Database (NVD) [12]. The data sets contain vulnerability
data for three software products, namely Adobe (D1), Windows XP (D2), and Mozilla
Firefox (D3). For parameter estimation in the existing and proposed models, we have
used maximum likelihood estimation (MLE), which is based on non-linear regression. The
existing model without change point is depicted by Eq. (1) above and the proposed
model with change point is given by Eq. (6) (Tables 1, 2 and 3).
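The estimates in this paper were obtained with SPSS. As a rough illustration of what a non-linear least-squares fit of the model without change point involves, the sketch below recovers the parameters of a logistic curve by brute-force grid search. The grid search is a stand-in chosen for simplicity and is not the SPSS procedure; the data are synthetic and noise-free, and all names, grids and parameter values are hypothetical.

```python
import math
from itertools import product


def logistic_vdm(t, V, r, l):
    """Model without change point: V * D(t), with the logistic D(t) of Eq. (4)."""
    e = math.exp(-r * t)
    return V * (1.0 - e) / (1.0 + l * e)


def sse(params, times, counts):
    """Sum of squared errors between the model curve and the observed counts."""
    V, r, l = params
    return sum((logistic_vdm(t, V, r, l) - y) ** 2 for t, y in zip(times, counts))


def grid_fit(times, counts, V_grid, r_grid, l_grid):
    """Return the (V, r, l) grid point with the smallest sum of squared errors."""
    return min(product(V_grid, r_grid, l_grid), key=lambda p: sse(p, times, counts))


# Synthetic, noise-free observations generated from hypothetical parameters.
true_params = (800.0, 0.4, 20.0)
times = list(range(1, 16))
counts = [logistic_vdm(t, *true_params) for t in times]

best = grid_fit(
    times,
    counts,
    V_grid=[600.0 + 50.0 * i for i in range(9)],  # 600, 650, ..., 1000
    r_grid=[i / 10 for i in range(1, 9)],         # 0.1, 0.2, ..., 0.8
    l_grid=[5.0 * i for i in range(1, 9)],        # 5, 10, ..., 40
)
print(best)
```

In practice a gradient-based non-linear regression routine would replace the grid search, and the change point model of Eq. (6) would add r2 and c to the parameter list.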

Table 1. Parameter estimation results on D1

| Parameter | Estimate without change point | Estimate with change point |
|---|---|---|
| V | 6168.235 | 3640.083 |
| r1 | 0.273 | 0.320 |
| r2 | – | 347.856 |
| l | 288.268 | 0.375 |

5.2 Prediction Capabilities


The prediction capabilities of the proposed model with change point and the existing
model without change point are evaluated on the datasets used in this study using
various comparison criteria.
Table 2. Parameter estimation results on D2

| Parameter | Estimate without change point | Estimate with change point |
|---|---|---|
| V | 853.543 | 825.613 |
| r1 | 0.352 | 0.252 |
| r2 | – | 12.865 |
| l | 33.121 | 0.331 |

Table 3. Parameter estimation results on D3

| Parameter | Estimate without change point | Estimate with change point |
|---|---|---|
| V | 2121.472 | 2314.820 |
| r1 | 0.309 | 0.124 |
| r2 | – | 22.360 |
| l | 52.139 | 0.245 |

5.2.1 Criteria for Comparison


For gauging the prediction capability of the existing and proposed models, we have
used the following statistical comparison criteria.
Let m be the size of the sample dataset, $V_i$ the number of vulnerabilities observed by
time $t_i$ (as recorded in the original dataset) and $V(t_i)$ the estimated number of
vulnerabilities by time $t_i$.
(1) Bias

$$\text{Bias} = \sum_{i=1}^{m} \frac{V(t_i) - V_i}{m}$$

The difference between the predicted and observed number of vulnerabilities detected
at any time instant $t_i$ is the prediction error ($PE_i$). Bias is the average of the
PEs. A bias closer to zero indicates better curve fitting and hence better prediction.
(2) Variance

$$\text{Variance} = \sqrt{\frac{1}{m - 1} \sum_{i=1}^{m} \bigl(PE_i - \text{Bias}\bigr)^2}$$

The standard deviation of the prediction errors about the Bias is used as a measure of
variance from the original dataset. A lower value of variance means better prediction.
(3) Root Mean Square Prediction Error (RMSPE)

$$\text{RMSPE} = \sqrt{\text{Variance}^2 + \text{Bias}^2}$$

RMSPE is a measure of how closely the model predicts the observed values.
A lower RMSPE value is desired as it suggests close predictions.
(4) Mean Square Error (MSE)
The difference between the observed data $V_i$ and the expected values $V(t_i)$ is
measured by MSE as follows:

$$\text{MSE} = \sum_{i=1}^{m} \frac{\bigl(V(t_i) - V_i\bigr)^2}{m}$$

where m denotes the number of data points. The lower the value of MSE, the lower the
error in curve fitting, and hence a lower value is desirable.
(5) Coefficient of Multiple Determination ($R^2$)
The coefficient of multiple determination is defined as 1 minus the ratio of the sum
of squares resulting from the trend model to that from the constant model. The closer
the value of $R^2$ is to 1, the better the curve fitting and hence the predictions, i.e.

$$R^2 = 1 - \frac{\text{residual SS}}{\text{corrected SS}}$$
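The five criteria above can be computed together as in the following minimal sketch. It assumes that the corrected sum of squares is taken about the mean of the observed values; the function name and the small data set are hypothetical and serve only to illustrate the formulas.

```python
import math


def comparison_criteria(observed, predicted):
    """Compute Bias, Variance, RMSPE, MSE and R^2 for one fitted model."""
    m = len(observed)
    # Prediction errors PE_i = V(t_i) - V_i.
    errors = [p - o for p, o in zip(predicted, observed)]
    bias = sum(errors) / m
    # "Variance" as defined in the text: standard deviation of the errors about the bias.
    variance = math.sqrt(sum((e - bias) ** 2 for e in errors) / (m - 1))
    rmspe = math.sqrt(variance ** 2 + bias ** 2)
    residual_ss = sum(e ** 2 for e in errors)
    mse = residual_ss / m
    # Assumption: corrected SS is measured about the mean of the observations.
    mean_observed = sum(observed) / m
    corrected_ss = sum((o - mean_observed) ** 2 for o in observed)
    r2 = 1.0 - residual_ss / corrected_ss
    return {"bias": bias, "variance": variance, "rmspe": rmspe, "mse": mse, "r2": r2}


if __name__ == "__main__":
    # Hypothetical cumulative counts, not taken from the paper's datasets.
    observed = [10, 20, 30, 40]
    predicted = [12, 19, 33, 38]
    for name, value in comparison_criteria(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Because Bias can be negative, it is its magnitude that matters when comparing two models, whereas the other three error measures are non-negative by construction.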

Table 4. Comparison criteria results of model with and without change point (CP)

| Comparison criteria | D1 (Adobe), without CP | D1 (Adobe), with CP | D2 (Windows XP), without CP | D2 (Windows XP), with CP | D3 (Mozilla Firefox), without CP | D3 (Mozilla Firefox), with CP |
|---|---|---|---|---|---|---|
| Bias | 0.287 | 4.74 | 41.56 | 11.71 | −12.4 | 0.996 |
| Variance | 21.66 | 18.55 | 95.67 | 16.45 | 37.95 | 13.95 |
| RMSPE | 21.66 | 19.15 | 104.3046 | 20.19 | 39.95 | 13.98 |
| MSE | 439.8 | 344.9 | 8694.5 | 3398.4 | 1352.7 | 747.11 |
| R2 | 0.984 | 0.995 | 0.994 | 0.995 | 0.996 | 0.998 |
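As a quick arithmetic check, the RMSPE entries in Table 4 can be reproduced from the Bias and Variance entries using the RMSPE definition given in the comparison criteria. Two of the "with CP" columns are worked out below; the inputs are the rounded values from the table, so agreement is only to the displayed precision.

```latex
\begin{aligned}
\text{D1, with CP:}\quad \sqrt{18.55^2 + 4.74^2} &= \sqrt{344.1025 + 22.4676} = \sqrt{366.5701} \approx 19.15, \\
\text{D2, with CP:}\quad \sqrt{16.45^2 + 11.71^2} &= \sqrt{270.6025 + 137.1241} = \sqrt{407.7266} \approx 20.19 .
\end{aligned}
```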

Fig. 1. Goodness of fit curve for D1 (x-axis: Time; y-axis: Cumulative No. of Vulnerabilities; series: Actual, Pred_without CP, Pred_with CP)


Fig. 2. Goodness of fit curve for D2 (x-axis: Time; y-axis: Cumulative No. of Vulnerabilities; series: Actual, Pred_without CP, Pred_with CP)

Fig. 3. Goodness of fit curve for D3 (x-axis: Time; y-axis: Cumulative No. of Vulnerabilities; series: Actual, Pred_without CP, Pred_with CP)

6 Conclusion and Future Work

This work introduced a new model which uses the idea of a change point in the process
of vulnerability discovery. The proposed model was compared with the existing model,
and it is observed that the proposed model with change point performs well for all
three software datasets used. In order to evaluate the prediction capability of the two
models, we have used five comparison criteria, namely mean square error, bias, root
mean square prediction error, variance and coefficient of multiple determination, as
shown in Table 4 above. For the first four criteria, a lower value suggests better
curve fitting and hence indicates that the model is comparatively better, while for the
last criterion, i.e., coefficient of multiple determination, a higher value indicates
better curve fitting. As can be seen in Table 4, the proposed model shows better
results on every criterion for all the datasets, with the exception of Bias on D1,
where the model without change point gives the smaller value. The goodness of fit
curves are shown in Figs. 1, 2, and 3 for datasets D1, D2 and D3 respectively. It can
be seen that the plot of predictions with change point almost overlaps the observed
data, while the predictions without change point do not overlap as closely with the
data points. This suggests that the phenomenon of change point exists during testing in
the operational phase and that vulnerability predictions can be effectively improved by
modelling with change point considerations.
In future, this work can be enhanced by considering imperfect debugging and error
generation during the process of vulnerability fixing. Patch analysis can also be
considered in future while modelling the process of vulnerability discovery.

References
1. Alhazmi, O.H., Malaiya, Y.K.: Modeling the vulnerability discovery process. In: 16th IEEE
International Symposium on Software Reliability Engineering (ISSRE 2005), pp. 10-pp.
IEEE (2005)
2. Alhazmi, O.H., Malaiya, Y.K.: Application of vulnerability discovery models to major
operating systems. IEEE Trans. Reliab. 57(1), 14–22 (2008)
3. Musa, J.D.: A theory of software reliability and its application. IEEE Trans. Softw. Eng. 3,
312–327 (1975)
4. Lyu, M.R.: Handbook of Software Reliability Engineering (1996)
5. Hsu, C.J., Huang, C.Y., Chang, J.R.: Enhancing software reliability modeling and prediction
through the introduction of time-variable fault reduction factor. Appl. Math. Model. 35(1),
506–521 (2011)
6. Musa, J.D.: Software Reliability Engineering: More Reliable Software, Faster and Cheaper.
Tata McGraw-Hill Education, New York (2004)
7. Shyur, H.J.: A stochastic software reliability model with imperfect-debugging and
change-point. J. Syst. Softw. 66(2), 135–141 (2003)
8. Anand, A., Bhatt, N.: Vulnerability discovery modeling and weighted criteria based ranking.
J. Indian Soc. Probab. Stat. 17(1), 1–10 (2016)
9. Anand, A., Das, S., Aggrawal, D., Klochkov, Y.: Vulnerability Discovery modelling for
software with multi-versions. In: Ram, M., Davim, J. (eds.) Advances in Reliability and
System Engineering. Management and Industrial Engineering, pp. 255–265. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-48875-2_11
10. Anderson, R.: Security in open versus closed systems—the dance of Boltzmann, Coase and
Moore. Technical report, Cambridge University, England (2002)
11. Bhatt, N., Anand, A., Yadavalli, V.S.S., Kumar, V.: Modeling and characterizing software
vulnerabilities. Int. J. Math., Eng. Manag. Sci. 2(4), 288–299 (2017)
12. National Vulnerability Database. https://nvd.nist.gov/
13. Joh, H.C., Kim, J., Malaiya, Y.K.: Vulnerability discovery modeling using Weibull
distribution. In: 19th International Symposium on Software Reliability Engineering,
pp. 299–300. IEEE (2008)
14. Kapur, P.K., Yadavalli, V.S.S, Shrivastava, A.K.: A comparative study of vulnerability
discovery modeling and software reliability growth modeling. In: International conference
on Futuristic Trends in Computational Analysis and Knowledge Management, pp. 246–251.
IEEE (2015)
15. Kapur, P.K., Sachdeva, N., Khatri, S.K.: Vulnerability discovery modeling. In: Quality,
Reliability, Infocomm Technology and Industrial Technology Management, pp. 34–54. I.K.
International Publishing House (2015)
16. Kapur, P.K., Garg, R.B.: A software reliability growth model for an error-removal
phenomenon. Softw. Eng. J. 7(4), 291–294 (1992)
17. Kim, J., Malaiya, Y.K., Ray, I.: Vulnerability discovery in multi-version software systems.
In: 10th IEEE High Assurance Systems Engineering Symposium. HASE 2007, pp. 141–148.
IEEE (2007)
18. Kimura, M.: Software vulnerability: definition, modelling, and practical evaluation for
e-mail transfer software. Int. J. Press. Vessel. Pip. 83(4), 256–261 (2006)
19. Rescorla, E.: Is finding security holes a good idea? IEEE Secur. Priv. 3(1), 14–19 (2005)
20. Shrivastava, A.K., Sharma, R., Kapur, P.K.: Vulnerability discovery model for a software
system using stochastic differential equation. In: International Conference on Futuristic
Trends in Computational Analysis and Knowledge Management, pp. 199–205. IEEE (2015)
21. Brady, R.M., Anderson, R.J., Ball, R.C.: Murphy’s law, the fitness of evolving species, and
the limits of software reliability. Technical report no. 471, Cambridge University Computer
Laboratory (1999)
22. Kapur, P.K., Pham, H., Gupta, A., Jha, P.C.: Software Reliability Assessment with OR
Applications. Springer, London (2011). https://doi.org/10.1007/978-0-85729-204-9
23. Krusl, I., Spafford, E., Tripunitara, M.: Computer vulnerability analysis. Department of
Computer Sciences, Purdue University, COAST TR 98-07 (1998)
24. Alhazmi, O.H., Malaiya, Y.K., Ray, I.: Measuring, analyzing and predicting security
vulnerabilities in software systems. Comput. Secur. 26(3), 219–228 (2007)
25. Huang, C.Y.: Performance analysis of software reliability growth models with testing-effort
and change-point. J. Syst. Softw. 76(2), 181–194 (2005)
26. Sharma, R., Sibal, R., Shrivastava, A.K.: Vulnerability discovery modeling for open and
closed source software. Int. J. Secur. Softw. Eng. (IJSSE) 7(4), 19–38 (2016)
