Software Vulnerability Discovery Process
1 Introduction
However, this assumption may not hold in all cases. Once testing has started, the team
may decide to bring in a more experienced and skilled member. It may also change its
current testing strategy, for example by adopting automated tools for better results. The
basic motive is to obtain the best possible results, and this can lead to multiple changes
during the testing process. In such a situation, model parameters estimated without
considering these changes may not describe the testing progress accurately. Owing to
these changes, some or all of the parameters used in the model may show significant
variations of the kind usually observed in operational environments [22].
The paper is organized as follows: Sect. 2 presents the existing related work,
followed by the notations, assumptions and framework of the proposed model in Sect. 3.
In Sect. 4, the change point based vulnerability discovery model is developed.
Section 5 includes the numerical illustration of the proposed model and its analysis.
Finally, conclusions are drawn in Sect. 6 based on the model development and
analysis.
2 Related Work
Over the years, researchers have proposed different mathematical models with varied
assumptions for the process of vulnerability discovery. Anderson's thermodynamic
model was the first model of the vulnerability discovery process [10, 21]. It was
initially developed in the field of thermodynamics and later applied to model the
vulnerability detection process. Though it did not give appreciable results, it marked
the beginning of research in this direction. Soon after Anderson's thermodynamic model,
Rescorla proposed two models, one linear and one exponential, which gave slightly
better results. The most widely used model is the Alhazmi-Malaiya Logistic (AML)
model [1, 2], which came out in 2005 and gave commendable results. Its vulnerability
prediction results [24] for Windows operating systems were very close to the actual
number of vulnerabilities observed. The same authors also proposed an effort-based
discovery model, the Alhazmi-Malaiya Effort-based (AME) model, but it does not show
any improvement in results over AML. AML established that the vulnerability discovery
curve is sigmoid in shape due to three phases: learning, linear and saturation.
Later, Joh et al. [13] used the Weibull distribution function to model the process of
vulnerability discovery but could not give appreciable predictions. Kapur et al. [14]
proposed two models, developed using two different distribution functions, and
compared them with the AML model. Shrivastava et al. [20] modelled a VDM by building
the AML model around a stochastic differential equation. Kim et al. [17] considered
multiple versions of software during the modelling process. Sharma et al. [26]
developed a VDM using the Gamma function and discussed differences in the process of
vulnerability discovery for open and closed source software systems.
Anand et al. [9] suggested a model for multiple versions of vulnerability discovery
framework where they assumed that the total number of vulnerabilities of the nth
Change Point Modelling in the Vulnerability Discovery Process 561
version is the sum of the vulnerabilities detected in the current version and the
remaining vulnerabilities that are detected in the upcoming version.
In 2003, Shyur suggested that a change point exists in the process of testing and
hence should be considered during model development [7]. Change point models are
extensively used in hardware and software reliability studies [22]. Here we apply the
change point phenomenon to the specific area of software vulnerability discovery. In
the existing work in this field, the models have been derived under the assumption of
a constant detection rate [1, 2, 8–11, 13, 19, 24, 26]. Researchers proceeded with the
assumption that, in the operational phase, the probability of detection is equal for
all vulnerabilities and the rate at which they are detected does not change. However,
in a real scenario, the vulnerability detection rate relies on the skills of the
testing team, defect density, size of the program, factors of code expansion, team
constitution, testing, testability of the software, etc. [7, 15, 18, 22, 25].
3 Methodology
In this study, we use the methodology applied for reliability assessment in
Software Reliability Growth Models [22], which is now being applied to Vulnerability
Discovery Models (VDMs) as well [16, 20, 26].
A Non-Homogeneous Poisson Process (NHPP) is a counting process in which arrivals or
detections occur randomly and the increments are non-stationary. It shows all the
properties of a Poisson process except that the rate is a function of time. In this
study, we consider the process of vulnerability discovery as an NHPP, since the number
of vulnerabilities detected by the testing team is random and the number found in each
time interval may not be stationary. The VDM is hence developed under the assumptions
that hold true for NHPPs. The assumptions considered during model development are
described in the next section.
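To make the NHPP idea concrete, the following minimal Python sketch simulates discovery times with a time-dependent rate using the thinning (acceptance-rejection) technique. It is only an illustration: the function name `simulate_nhpp`, the exponentially decaying intensity and the values of `total` and `r` are assumptions chosen for the example, not quantities taken from this paper.

```python
import math
import random

def simulate_nhpp(rate, rate_max, horizon, seed=0):
    """Simulate event times of an NHPP on (0, horizon] by thinning.

    rate     -- intensity as a function of time; must satisfy rate(t) <= rate_max
    rate_max -- upper bound on the intensity over the whole horizon
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # Candidate arrival from a homogeneous Poisson process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t > horizon:
            return events
        # Keep the candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            events.append(t)

# Hypothetical intensity: discovery slows down as fewer vulnerabilities remain.
total, r = 100.0, 0.3

def intensity(t):
    return total * r * math.exp(-r * t)

times = simulate_nhpp(intensity, rate_max=total * r, horizon=20.0, seed=42)
expected = total * (1.0 - math.exp(-r * 20.0))  # integral of the intensity on (0, 20]
print(len(times), "simulated discoveries; expected count is about", round(expected, 1))
```

The counts in equal-length intervals differ in distribution because the intensity changes with time, which is the non-stationarity referred to above.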
3.1 Assumptions
Mathematical models are usually developed with certain predefined notions treated
as model assumptions. These are the conditions that usually prevail and simplify the
process of model development. The assumptions during model development are as
follows:
i Initially, the number of vulnerabilities detected is zero, i.e. at t = 0, V(0) = 0.
ii The total number of vulnerabilities detected depends on the number of unresolved ones.
iii The software has a finite number of vulnerabilities.
iv The rate of vulnerability detection is not constant and can change at any point.
3.2 Notations
V: Total number of Vulnerabilities present in the software.
V(t): expected no. of Vulnerabilities identified during the time interval (0,t].
562 R. Sharma et al.
4 Model Development
A recent study on change point models [22] suggested a methodology for developing
models with a change point by specifying an appropriate cumulative distribution
function, D(t), for the software vulnerability discovery times. The exponential model
with single change point is then developed from this framework.
As per the Non-Homogeneous Poisson Process, the rate of change of the count of
vulnerabilities discovered is proportional to the count of residual vulnerabilities in
the software and to the rate of detection. This is depicted by Eq. (1) below.
\[
\frac{dV(t)}{dt} = r(t)\,\bigl(V - V(t)\bigr) \tag{1}
\]
Equation (1) above shows the general model without considering the phenomenon
of change point.
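As a worked step, Eq. (1) can be solved by separation of variables using only the initial condition V(0) = 0 from the assumptions. The last line, with a constant rate r(t) = r, is included purely as an illustration of the form the solution takes when no change point is considered:

```latex
\begin{aligned}
\frac{dV(t)}{V - V(t)} &= r(t)\,dt \\
\ln V - \ln\bigl(V - V(t)\bigr) &= \int_0^t r(s)\,ds \\
V(t) &= V\left[1 - \exp\left(-\int_0^t r(s)\,ds\right)\right] \\
V(t) &= V\left(1 - e^{-rt}\right) \quad \text{when } r(t) = r \text{ is constant}
\end{aligned}
```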
While considering the notion of change point, the rate of vulnerability detection
changes with time. Hence, in order to develop the single change point model, we take
two different hazard functions for pre and post occurrence of change point. Using
Eq. (1),
\[
r(t) =
\begin{cases}
r_1(t) = \dfrac{d_1(t)}{1 - D_1(t)}, & 0 \le t \le c \\[6pt]
r_2(t) = \dfrac{d_2(t)}{1 - D_2(t)}, & c < t
\end{cases}
\tag{2}
\]
\[
D_1(t) = \frac{1 - e^{-r_1 t}}{1 + l\,e^{-r_1 t}}, \quad 0 \le t \le c \tag{4}
\]
and
\[
D_2(t) = \frac{1 - e^{-r_2 t}}{1 + l\,e^{-r_2 t}}, \quad c < t \tag{5}
\]
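The link between the logistic distribution functions and the hazard rates of Eq. (2) can be checked numerically. The sketch below assumes the logistic form D(t) = (1 - e^{-rt}) / (1 + l e^{-rt}); the function names and the parameter values are illustrative choices, and the closed form r / (1 + l e^{-rt}) is obtained here by differentiating D(t), not quoted from the paper.

```python
import math

def logistic_cdf(t, r, l):
    """D(t) = (1 - e^{-r t}) / (1 + l e^{-r t})."""
    e = math.exp(-r * t)
    return (1.0 - e) / (1.0 + l * e)

def logistic_pdf(t, r, l):
    """d(t) = dD/dt = r (1 + l) e^{-r t} / (1 + l e^{-r t})^2."""
    e = math.exp(-r * t)
    return r * (1.0 + l) * e / (1.0 + l * e) ** 2

def hazard(t, r, l):
    """Hazard rate d(t) / (1 - D(t)), the ratio that appears in Eq. (2)."""
    return logistic_pdf(t, r, l) / (1.0 - logistic_cdf(t, r, l))

def hazard_closed_form(t, r, l):
    """Simplified hazard rate r / (1 + l e^{-r t})."""
    return r / (1.0 + l * math.exp(-r * t))

r, l = 0.4, 3.0  # illustrative values only
for t in (0.5, 2.0, 10.0):
    print(t, round(hazard(t, r, l), 6), round(hazard_closed_form(t, r, l), 6))
```

Under this form the detection rate starts at r / (1 + l) and rises towards r, which is what produces the learning phase of the S-shaped curve.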
For simplification, the shape parameter of the distribution is presumed to be the
same before and after the change point. The mean value function of the VDM with
change point is then
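\[
V(t) =
\begin{cases}
V\left[1 - \dfrac{(1 + l)\,e^{-r_1 t}}{1 + l\,e^{-r_1 t}}\right], & 0 \le t \le c, \\[10pt]
V\left[1 - \dfrac{(1 + l)\left(1 + l\,e^{-r_2 c}\right)}{\left(1 + l\,e^{-r_1 c}\right)\left(1 + l\,e^{-r_2 t}\right)}\,e^{-r_1 c - r_2 (t - c)}\right], & c < t
\end{cases}
\tag{6}
\]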
This model represents a flexible learning curve that generally provides a decent
estimate of software vulnerabilities and hence of the reliability and security of
the software. For mathematical simplification, the parameter l of the failure time
distribution is assumed not to change and is therefore taken to be the same before and
after the change point. However, it might change in practice, in which case the mean
value function can be computed accordingly.
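The mean value function with a single change point can be sketched in Python as follows. This is a minimal illustration that assumes the logistic form with rates r1 and r2, a common shape parameter l and change point c; the function names and the parameter values in the example are hypothetical and are not the estimates reported in this paper.

```python
import math

def mvf_change_point(t, V, r1, r2, l, c):
    """Expected cumulative number of vulnerabilities found by time t."""
    if t <= c:
        e1 = math.exp(-r1 * t)
        return V * (1.0 - (1.0 + l) * e1 / (1.0 + l * e1))
    # After the change point the rate switches from r1 to r2.
    num = (1.0 + l) * (1.0 + l * math.exp(-r2 * c))
    den = (1.0 + l * math.exp(-r1 * c)) * (1.0 + l * math.exp(-r2 * t))
    return V * (1.0 - num / den * math.exp(-r1 * c - r2 * (t - c)))

def mvf_no_change_point(t, V, r, l):
    """Same model with a single rate r over the whole time axis."""
    e = math.exp(-r * t)
    return V * (1.0 - (1.0 + l) * e / (1.0 + l * e))

# Hypothetical parameters: 1000 vulnerabilities, rate rises from 0.2 to 0.5 at t = 6.
V, r1, r2, l, c = 1000.0, 0.2, 0.5, 4.0, 6.0
for t in (0, 3, 6, 9, 12, 18):
    print(t, round(mvf_change_point(t, V, r1, r2, l, c), 1))
```

Two properties are worth checking when using such a function: the two branches agree at t = c, so the curve is continuous, and setting r1 = r2 recovers the model without change point.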
5 Parameter Estimation
Parameters are estimated for three different software data sets using Statistical Package
for Social Sciences (SPSS).
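The estimation itself was carried out in SPSS, whose procedure is not reproduced here. As a rough illustration of what least-squares estimation involves, the sketch below fits the model without change point to synthetic, noise-free data by a grid search over the rate r, assuming the logistic form of the mean value function. Everything in it is an assumption made for the example: the data are generated from known parameters, the shape parameter l is held fixed, and the helper names are hypothetical.

```python
import math

def logistic_shape(t, r, l):
    """Fraction of vulnerabilities found by time t under the logistic model."""
    e = math.exp(-r * t)
    return 1.0 - (1.0 + l) * e / (1.0 + l * e)

def fit_v_and_r(times, counts, l, r_grid):
    """Grid search over r; for each r the best V has a closed form (linear least squares)."""
    best = None
    for r in r_grid:
        g = [logistic_shape(t, r, l) for t in times]
        v_hat = sum(y * x for y, x in zip(counts, g)) / sum(x * x for x in g)
        sse = sum((v_hat * x - y) ** 2 for x, y in zip(g, counts))
        if best is None or sse < best[0]:
            best = (sse, v_hat, r)
    return best

# Synthetic data generated from known (hypothetical) parameters.
times = list(range(1, 16))
true_v, true_r, l = 500.0, 0.3, 2.0
counts = [true_v * logistic_shape(t, true_r, l) for t in times]

r_grid = [k / 100 for k in range(5, 101, 5)]
sse, v_hat, r_hat = fit_v_and_r(times, counts, l, r_grid)
print("estimated V:", round(v_hat, 3), "estimated r:", r_hat)
```

A statistical package would normally use an iterative nonlinear least-squares routine and estimate all parameters, including the change point, jointly; the grid search is used here only because it is easy to follow.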
RMSPE is a measure of how closely the model predicts the observed values.
A lower RMSPE value is desired as it suggests close predictions.
\[
\mathrm{MSE} = \frac{\sum_{i=1}^{m} \bigl(V(t_i) - y_i\bigr)^2}{m}
\]
where m denotes the number of data points. The lower the value of MSE, the lower the
error in curve fitting, and hence lower values are desirable.
(5) Coefficient of Multiple Determination (R²)
The coefficient of multiple determination is defined as 1 minus the ratio of the
sum of squares resulting from the trend model to that from the constant model.
The closer the value of R² to 1, the better the curve fitting and hence the
predictions, i.e.
\[
R^2 = 1 - \frac{\text{residual SS}}{\text{corrected SS}}
\]
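The two criteria defined above can be computed as in the following minimal Python sketch. It assumes that "corrected SS" means the sum of squared deviations of the observations from their mean, which is the usual meaning of the term; the sample values are made up for the example and do not come from the datasets used in this paper.

```python
def mse(predicted, observed):
    """Mean square error between model predictions and observed counts."""
    m = len(observed)
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / m

def r_squared(predicted, observed):
    """Coefficient of multiple determination: 1 - residual SS / corrected SS."""
    mean_y = sum(observed) / len(observed)
    residual_ss = sum((y - p) ** 2 for p, y in zip(predicted, observed))
    corrected_ss = sum((y - mean_y) ** 2 for y in observed)  # assumed definition
    return 1.0 - residual_ss / corrected_ss

# Hypothetical cumulative counts and model predictions.
observed = [10, 25, 45, 60, 70]
predicted = [12, 24, 44, 61, 69]
print("MSE:", mse(predicted, observed))
print("R2 :", round(r_squared(predicted, observed), 4))
```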
Table 4. Comparison criteria results of model with and without change point (CP)

Comparison    D1 (Adobe)              D2 (Windows XP)         D3 (Mozilla Firefox)
criteria      Without CP   With CP    Without CP   With CP    Without CP   With CP
Bias          0.287        4.74       41.56        11.71      −12.4        0.996
Variance      21.66        18.55      95.67        16.45      37.95        13.95
RMSPE         21.66        19.15      104.3046     20.19      39.95        13.98
MSE           439.8        344.9      8694.5       3398.4     1352.7       747.11
R2            0.984        0.995      0.994        0.995      0.996        0.998
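The definitions of bias, variance and RMSPE are not given in this section. The tabulated values are, however, consistent with the relation RMSPE = sqrt(Bias² + Variance²) that is commonly used in the software reliability growth literature. The short Python check below treats that relation as an assumption and compares it with the figures copied from Table 4.

```python
import math

# (bias, variance, reported RMSPE) copied from Table 4.
table4 = {
    "D1 without CP": (0.287, 21.66, 21.66),
    "D1 with CP": (4.74, 18.55, 19.15),
    "D2 without CP": (41.56, 95.67, 104.3046),
    "D2 with CP": (11.71, 16.45, 20.19),
    "D3 without CP": (-12.4, 37.95, 39.95),
    "D3 with CP": (0.996, 13.95, 13.98),
}

def rmspe_from(bias, variance):
    """Assumed relation: RMSPE = sqrt(bias^2 + variance^2)."""
    return math.sqrt(bias ** 2 + variance ** 2)

for name, (bias, variance, reported) in table4.items():
    computed = rmspe_from(bias, variance)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

The computed and reported values agree to within rounding for all six columns, which supports reading the Bias and Variance rows as the two components of RMSPE.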
[Figure: goodness-of-fit plot. x-axis: Time (1 to 19); y-axis: Cumulative No. of Vulnerabilities (0 to 3000)]
[Figure: goodness-of-fit plot. x-axis: Time (1 to 16); y-axis: Cumulative No. of Vulnerabilities (0 to 800)]
[Figure: goodness-of-fit plot. x-axis: Time (1 to 18); y-axis: Cumulative No. of Vulnerabilities (0 to 2000)]
6 Conclusion
This work introduced a new model that uses the idea of a change point in the process
of vulnerability discovery. The proposed model has been compared with the existing
model without change point, and it is observed that the proposed model performs well
for all three software datasets used. To evaluate the predictive capability of the
two models, we used five comparison criteria, namely mean square error, bias, root
mean square prediction error, variance and coefficient of multiple determination, as
shown in Table 4 above. For the first four criteria, a lower value suggests better
curve fitting and hence indicates that the model is comparatively better, while for
the last criterion, the coefficient of multiple determination, a higher value
indicates better curve fitting. As can be seen in Table 4, the proposed model shows
better results on every criterion for all the datasets, with the exception of bias
for D1 (4.74 with change point against 0.287 without). The goodness of fit curves
are shown in Figs. 1, 2, and 3 for datasets D1, D2 and D3 respectively. It can be seen
that the plot of predictions with change point almost overlaps the observed data,
while the predictions without change point do not overlap the data points as closely.
This indicates that the phenomenon of change point exists during testing in the
operational phase and that vulnerability predictions can be effectively improved by
modelling with change point considerations.
In future, this work can be extended by considering imperfect debugging and error
generation during the process of vulnerability fixing. Patch analysis can also be
considered when modelling the process of vulnerability discovery.
References
1. Alhazmi, O.H., Malaiya, Y.K.: Modeling the vulnerability discovery process. In: 16th IEEE
International Symposium on Software Reliability Engineering (ISSRE 2005), pp. 10-pp.
IEEE (2005)
2. Alhazmi, O.H., Malaiya, Y.K.: Application of vulnerability discovery models to major
operating systems. IEEE Trans. Reliab. 57(1), 14–22 (2008)
3. Musa, J.D.: A theory of software reliability and its application. IEEE Trans. Softw. Eng. 3,
312–327 (1975)
4. Lyu, M.R.: Handbook of Software Reliability Engineering (1996)
5. Hsu, C.J., Huang, C.Y., Chang, J.R.: Enhancing software reliability modeling and prediction
through the introduction of time-variable fault reduction factor. Appl. Math. Model. 35(1),
506–521 (2011)
6. Musa, J.D.: Software Reliability Engineering: More Reliable Software, Faster and Cheaper.
Tata McGraw-Hill Education, New York (2004)
7. Shyur, H.J.: A stochastic software reliability model with imperfect-debugging and change-
point. J. Syst. Softw. 66(2), 135–141 (2003)
8. Anand, A., Bhatt, N.: Vulnerability discovery modeling and weighted criteria based ranking.
J. Indian Soc. Probab. Stat. 17(1), 1–10 (2016)
9. Anand, A., Das, S., Aggrawal, D., Klochkov, Y.: Vulnerability Discovery modelling for
software with multi-versions. In: Ram, M., Davim, J. (eds.) Advances in Reliability and
System Engineering. Management and Industrial Engineering, pp. 255–265. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-48875-2_11
10. Anderson, R.: Security in open versus closed systems—the dance of Boltzmann, Coase and
Moore. Technical report, Cambridge University, England (2002)
11. Bhatt, N., Anand, A., Yadavalli, V.S.S., Kumar, V.: Modeling and characterizing software
vulnerabilities. Int. J. Math., Eng. Manag. Sci. 2(4), 288–299 (2017)
12. National Vulnerability Database. https://nvd.nist.gov/
13. Joh, H.C., Kim, J., Malaiya, Y.K.: Vulnerability discovery modeling using Weibull
distribution. In: 19th International Symposium on Software Reliability Engineering,
pp. 299–300. IEEE (2008)
14. Kapur, P.K., Yadavalli, V.S.S, Shrivastava, A.K.: A comparative study of vulnerability
discovery modeling and software reliability growth modeling. In: International conference
on Futuristic Trends in Computational Analysis and Knowledge Management, pp. 246–251.
IEEE (2015)
15. Kapur, P.K., Sachdeva, N., Khatri, S.K.: Vulnerability discovery modeling. In: Quality,
Reliability, Infocomm Technology and Industrial Technology Management, pp. 34–54. I.K.
International Publishing House (2015)
16. Kapur, P.K., Garg, R.B.: A software reliability growth model for an error-removal
phenomenon. Softw. Eng. J. 7(4), 291–294 (1992)
17. Kim, J., Malaiya, Y.K., Ray, I.: Vulnerability discovery in multi-version software systems.
In: 10th IEEE High Assurance Systems Engineering Symposium. HASE 2007, pp. 141–148.
IEEE (2007)
18. Kimura, M.: Software vulnerability: definition, modelling, and practical evaluation for e-
mail transfer software. Int. J. Press. Vessel. Pip. 83(4), 256–261 (2006)
19. Rescorla, E.: Is finding security holes a good idea? IEEE Secur. Priv. 3(1), 14–19 (2005)
20. Shrivastava, A.K., Sharma, R., Kapur, P.K.: Vulnerability discovery model for a software
system using stochastic differential equation. In: International Conference on Futuristic
Trends in Computational Analysis and Knowledge Management, pp. 199–205. IEEE (2015)
21. Brady, R.M., Anderson, R.J., Ball, R.C.: Murphy’s law, the fitness of evolving species, and
the limits of software reliability. Technical report no. 471, Cambridge University Computer
Laboratory (1999)
22. Kapur, P.K., Pham, H., Gupta, A., Jha, P.C.: Software Reliability Assessment with OR
Applications. Springer, London (2011). https://doi.org/10.1007/978-0-85729-204-9
23. Krusl, I., Spafford, E., Tripunitara, M.: Computer vulnerability analysis. Department of
Computer Sciences, Purdue University, COAST TR 98-07 (1998)
24. Alhazmi, O.H., Malaiya, Y.K., Ray, I.: Measuring, analyzing and predicting security
vulnerabilities in software systems. Comput. Secur. 26(3), 219–228 (2007)
25. Huang, C.Y.: Performance analysis of software reliability growth models with testing-effort
and change-point. J. Syst. Softw. 76(2), 181–194 (2005)
26. Sharma, R., Sibal, R., Shrivastava, A.K.: Vulnerability discovery modeling for open and
closed source software. Int. J. Secur. Softw. Eng. (IJSSE) 7(4), 19–38 (2016)