
PM World Today August 2010 (Vol XII, Issue VIII)


Multi-parametric Models for Project Data Mining and Project Planning

By Pavel Barseghyan, PhD

Abstract

The main drawback of the statistical methodology of project data mining is the use of a single approximating curve to replace the entire body of data, which leads to large errors when estimating the parameters of new projects being planned. The obvious remedy is to replace the data with families of curves instead of a single approximating curve. It must be borne in mind, however, that the choice of the analytical form of the approximating curves also has a significant impact on the accuracy of estimates. The selection of an appropriate analytical form can be controlled by a top-down approach to the data mining process.

Another important point in the new methodology for estimating project parameters is the consideration of project goals and objectives. To ensure high accuracy of estimates, it is important to select projects that have the same goals and the same priorities among those goals. This kind of analysis also requires a quantitative representation of the hierarchy of project objectives, so that practical criteria of project similarity by goals and objectives can be developed.

This article develops a new methodology of project data mining for the synthesis of projects based on input information about project complexity, team productivity and the priorities of project objectives.

Key words: project data mining, critique of statistical data mining methods, top-down methodology of project data mining, project synthesis and planning, TRANSCALE tool for top-down data mining and knowledge extraction.

Introduction

Even a cursory analysis of the state of affairs in the field of project data mining shows that over the past 40-50 years there has been no significant progress in solving the vital problems of this critical area of research.

PM World Today is a free monthly eJournal - Subscriptions available at


Project data, despite its unsystematic nature resulting from a random data collection process, is a unique and important source of information on which truly effective quantitative methods of project management can be built. A genuinely efficient scientific methodology of project data mining should become the heart of a new quantitative project management, an area that is currently in deep crisis. Concealing this crisis may benefit some universities, and especially companies directly involved in this business, but it does not benefit the industry as a whole. Moreover, the consequences of this concealment have been disastrous for the industry, where the losses associated with multiple, even massive, project failures are huge. The worldwide scale of this phenomenon has prompted broad discussion of the problem, particularly of its financial aspects, but unfortunately these huge losses have not been linked to the crisis in quantitative project management. Yet, along with other causes of massive project failures, the lack of a truly scientific methodology for planning and executing projects is one of the major reasons for this state of affairs.

The statistical methodology of project data mining, which is the ideological basis of modern quantitative project management, has in fact long since proved a failure. Numerous statistical relationships between project parameters, intended for the estimation and prediction of projects, have unacceptably low accuracy and are in fact unsuitable for the purposes of the industry. All these relationships are the result of direct processing of project data by methods of regression analysis. As stated in [1], such an approach to data processing has a number of serious shortcomings. First, a single approximating curve cannot cover the entire range of project parameters.
This means that, for more accurate estimation of projects and a more reliable description of the relationships between project parameters, we need to use families of curves instead of a single approximating curve. Second, determining the mathematical form of the approximating curves directly from the data may not correctly reflect the essence of the functional relationships between specific project parameters. In other words, a bottom-up data processing methodology that ignores the internal logic of the data may misrepresent the essence of the phenomena under study.


Thus, to improve the accuracy of estimating project parameters, we need families of curves that reflect the fundamental nature of the functional relationships between project parameters. This paper develops such families of curves based on the state equation of projects and on a quantitative representation of the priorities of project goals. These families of approximating curves are then used to develop a new methodology of project data mining and of project synthesis based on three input parameters: project complexity, team productivity and the priorities of project objectives.

Accuracy analysis of project estimation

To improve the accuracy of estimating project parameters, it is necessary to analyze the structure of that accuracy. The accuracy of estimates of project parameters depends on two factors: the accuracy of the input parameters and the accuracy of the estimation mechanism itself. Fig.1 shows an example of estimating project parameters when the input parameters are the project complexity W, the team productivity P and the project objectives in the form of the Time/People ratio R [1, 2, 3]. The output parameters are the project effort E, the average team size N_av and the project duration T. The accuracy of these estimates is also affected by the estimation mechanism itself, that is, by the formulas that determine the output parameters.

Fig.1 Restoration or synthesis of projects based on three input parameters. The diagram shows the input parameters of projects (complexity, team productivity, project goals and their priorities) passing through the project estimation methodology to the output parameters (project effort, project duration, project average staffing).

In general form, these formulas can be represented as follows:

$$E = f_1(W, P, R) \tag{1}$$

$$T = f_2(W, P, R) \tag{2}$$

$$N_{av} = f_3(W, P, R) \tag{3}$$
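Formulas (1)-(3) leave f_1, f_2 and f_3 unspecified. As a minimal sketch, assume they follow from the state equation of projects P·T·N_av = W [6] (equation (9) below) together with R = T/N_av. Solving these two relations gives E = W/P, T = √(W·R/P) and N_av = √(W/(P·R)). The function `synthesize` and the input numbers are hypothetical illustrations, not the TRANSCALE implementation; any consistent units can be used.

```python
import math
from typing import NamedTuple


class ProjectEstimate(NamedTuple):
    effort: float        # E
    duration: float      # T
    avg_staffing: float  # N_av


def synthesize(W: float, P: float, R: float) -> ProjectEstimate:
    """Hypothetical f1, f2, f3: restore (E, T, N_av) from complexity W,
    team productivity P and Time/People ratio R.

    Assumes the state equation P * T * N_av = W and R = T / N_av.
    """
    if W <= 0 or P <= 0 or R <= 0:
        raise ValueError("W, P and R must be positive")
    effort = W / P                      # E = T * N_av = W / P
    duration = math.sqrt(W * R / P)     # W = P * T**2 / R  =>  T = sqrt(W * R / P)
    avg_staffing = duration / R         # N_av = T / R
    return ProjectEstimate(effort, duration, avg_staffing)


print(synthesize(W=1800.0, P=2.0, R=4.0))
# ProjectEstimate(effort=900.0, duration=60.0, avg_staffing=15.0)
```

In this idealized model the effort E does not depend on R; the ratio only redistributes the same effort between duration and staffing.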


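This means that if the input parameters contain errors ΔW, ΔP and ΔR, and the corresponding formulas f_1, f_2 and f_3 are inadequate, errors ΔE, ΔT and ΔN_av arise in the output parameters E, T and N_av. Mathematically, these errors in the estimates of project parameters can be interpreted as differentials. Accordingly, for the differentials of project effort ΔE, cycle time ΔT and average number of working people ΔN_av one obtains

$$\Delta E = \frac{\partial E}{\partial W}\Delta W + \frac{\partial E}{\partial P}\Delta P + \frac{\partial E}{\partial R}\Delta R = \frac{\partial f_1(W,P,R)}{\partial W}\Delta W + \frac{\partial f_1(W,P,R)}{\partial P}\Delta P + \frac{\partial f_1(W,P,R)}{\partial R}\Delta R \tag{4}$$

$$\Delta T = \frac{\partial T}{\partial W}\Delta W + \frac{\partial T}{\partial P}\Delta P + \frac{\partial T}{\partial R}\Delta R = \frac{\partial f_2(W,P,R)}{\partial W}\Delta W + \frac{\partial f_2(W,P,R)}{\partial P}\Delta P + \frac{\partial f_2(W,P,R)}{\partial R}\Delta R \tag{5}$$

$$\Delta N_{av} = \frac{\partial N_{av}}{\partial W}\Delta W + \frac{\partial N_{av}}{\partial P}\Delta P + \frac{\partial N_{av}}{\partial R}\Delta R = \frac{\partial f_3(W,P,R)}{\partial W}\Delta W + \frac{\partial f_3(W,P,R)}{\partial P}\Delta P + \frac{\partial f_3(W,P,R)}{\partial R}\Delta R \tag{6}$$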
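Expressions (4)-(6) are first-order (linearized) error propagation. The sketch below illustrates them under an assumption not made in the text at this point: the estimating formulas are taken to be the ones implied by the state equation P·T·N_av = W [6] and R = T/N_av, that is f_1 = W/P, f_2 = √(W·R/P), f_3 = √(W/(P·R)). The helper `first_order_error` is hypothetical and approximates each partial derivative by a central difference.

```python
import math
from typing import Callable

Estimator = Callable[[float, float, float], float]


def f1(W: float, P: float, R: float) -> float:
    return W / P                       # effort E (independent of R)


def f2(W: float, P: float, R: float) -> float:
    return math.sqrt(W * R / P)        # duration T


def f3(W: float, P: float, R: float) -> float:
    return math.sqrt(W / (P * R))      # average staffing N_av


def first_order_error(f: Estimator, W: float, P: float, R: float,
                      dW: float, dP: float, dR: float, h: float = 1e-6) -> float:
    """Linearized output error as in (4)-(6): sum of (df/dx) * dx over x in (W, P, R).

    Each partial derivative is a central difference with a step
    proportional to the parameter value.
    """
    point = {"W": W, "P": P, "R": R}
    errors = {"W": dW, "P": dP, "R": dR}
    total = 0.0
    for name, value in point.items():
        step = h * value
        up = {**point, name: value + step}
        down = {**point, name: value - step}
        partial = (f(up["W"], up["P"], up["R"]) - f(down["W"], down["P"], down["R"])) / (2.0 * step)
        total += partial * errors[name]
    return total


args = (1800.0, 2.0, 4.0, 90.0, 0.0, 0.4)  # W, P, R, then dW, dP, dR (illustrative)
for label, f in (("dE", f1), ("dT", f2), ("dN_av", f3)):
    print(label, round(first_order_error(f, *args), 4))
```

For these illustrative inputs the analytical values are ΔE = 45, ΔT = 4.5 and ΔN_av = -0.375, which the numerical sketch reproduces. With the same input errors, a different choice of f_1, f_2, f_3 would give different output errors, which is the point the text makes about the estimating formulas.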

As these expressions show, the accuracy of the input parameters and the accuracy of the estimating formulas affect the accuracy of the output parameters independently. That is, regardless of the accuracy of the input parameters W, P and R, increasing the accuracy of formulas (1), (2) and (3) can improve the accuracy of project estimates. Fig.2 presents all the increments ΔW, ΔP, ΔR, ΔE, ΔT and ΔN_av in the multi-dimensional flat project space with the aid of the TRANSCALE tool.

Remarkable curves in the project space generated by the condition of constant Time/People ratio R

Using families of curves of different natures in the project space is crucial for the analysis of project similarity. Such systems of curves can be generated by conditions of constancy of project parameters [4] or of their various combinations. They can also be generated by various extremum conditions, such as minimization of effort and risk [5]. Like the other conditions of constancy of project parameters or their combinations, the condition of constant priorities of project objectives can also generate a system of similarity curves in the project space.

Fig.2 Estimation errors of project parameters in multi-dimensional flat project space

Fig.3 presents three projects with the same complexity and Time/People ratio. As the figure shows, in each field the project points lie along special curves that are direct consequences of the condition of constant Time/People ratio.

Fig.3 Implementation of the same project with three different development teams


$$W = \text{const} \tag{7}$$

$$R = \frac{T}{N_{av}} = \text{const} \tag{8}$$

Such remarkable curves can also be generated for projects with constant complexity W = const and different Time/People ratios by means of the transformation of projects in the project space [4]. As Fig.3 shows, each field, or coordinate system of project parameters, has its own remarkable curve. Let us consider these curves separately with the aid of the state equation of projects [6].

Equations of the remarkable curves for different project fields

The two constancy conditions (7) and (8) represent a hyperbola in the [E, P] field and a straight line in the [N_av, T] field, respectively. To obtain the equations of the remarkable curves in the [P, N_av] and [E, T] fields, one can use the state equation of projects [6]:

$$P \cdot T \cdot N_{av} = W \tag{9}$$

Since the effort is E = T·N_av, equation (9) can also be read as P·E = W, so condition (7) indeed gives a hyperbola in the [E, P] field.

To find the equation of the curve in the [P, N_av] field that reflects the condition R = T/N_av = const, we can rewrite (9) in the form

$$P \cdot T \cdot N_{av} = P \cdot N_{av}^2 \cdot \frac{T}{N_{av}} = P \cdot N_{av}^2 \cdot R = W \tag{10}$$

From this, the desired relationship between team productivity and average team size is

$$P = \frac{W}{R\,N_{av}^2} \tag{11}$$
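A quick numerical check of (11), in the spirit of Fig.3: hold W and R fixed, vary the team productivity for three hypothetical teams, compute each team's operating point from (9) with N_av = T/R, and confirm that the point satisfies P = W/(R·N_av²). The values W = 1800, R = 4 and the helper names are illustrative assumptions.

```python
import math
from typing import Tuple


def operating_point(W: float, R: float, P: float) -> Tuple[float, float]:
    """Return (N_av, T) for a team of productivity P, with W and R held constant."""
    T = math.sqrt(W * R / P)   # from P * T * N_av = W and N_av = T / R
    return T / R, T


def productivity_on_curve(W: float, R: float, N_av: float) -> float:
    """Expression (11): P = W / (R * N_av**2)."""
    return W / (R * N_av ** 2)


W, R = 1800.0, 4.0                       # hypothetical complexity and Time/People ratio
for P in (0.5, 2.0, 8.0):                # three hypothetical teams
    N_av, T = operating_point(W, R, P)
    print(f"P={P}: N_av={N_av}, T={T}, P from (11)={productivity_on_curve(W, R, N_av)}")
```

Quadrupling the productivity halves the average staffing, which is the quadratic-hyperbola behavior of (11).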


This means that, for each pair of constant values of W and R, the functional relationship between team productivity and average staffing has the form of a quadratic hyperbola, which is very important for project data mining and interpretation. Similarly, for the relationship between project effort and duration at a constant Time/People ratio R we obtain

$$E = \frac{W}{P} = T \cdot N_{av} = T \cdot \frac{T}{R} = \frac{T^2}{R} \tag{12}$$
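Expression (12) can be checked the same way: for a fixed R, the three equivalent forms of the effort (W/P, T·N_av and T²/R) coincide at every operating point, and doubling the duration quadruples the effort. The values W = 1800, R = 4 and the productivities are hypothetical.

```python
import math


def effort_on_parabola(T: float, R: float) -> float:
    """Expression (12): E = T**2 / R at a constant Time/People ratio R."""
    return T ** 2 / R


W, R = 1800.0, 4.0                        # hypothetical complexity and Time/People ratio
for P in (0.5, 2.0, 8.0):                 # hypothetical team productivities
    T = math.sqrt(W * R / P)              # duration from (9) with N_av = T / R
    N_av = T / R
    # Three ways of computing effort coincide on the remarkable curve.
    print(P, W / P, T * N_av, effort_on_parabola(T, R))
```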

This means that, for constant values of the Time/People ratio, the functional relationship between effort and project duration has a parabolic form.

Project similarity zones

Because of the many uncertainties encountered in practice, the exact determination of the Time/People ratio faces various difficulties. It is therefore more appropriate to speak not about an exact value of this ratio but about its lying within a certain range ΔR, the minimum width of which is determined by these uncertainties.

Fig.5 Similarity zone of projects for a small interval ΔR of Time/People ratio R


Experimental project points located within this interval ΔR form the basis for a refined regression between the parameters under study. When choosing the interval ΔR it is always necessary to find a trade-off between reliability and accuracy: widening the interval increases the probability that the expected value of the ratio falls within it, while narrowing the interval increases the accuracy of the regression.

Presentation of project similarity by families of curves in different coordinate systems

A set of project data can be divided into groups according to the principle of relative constancy of their Time/People ratio. As a result, each group of projects covers a certain range of values of this ratio. In different coordinate systems, the set of Time/People ratios generates different families of curves that cover the entire range of project parameters.

1. Family of project similarity curves in the coordinate system [N_av, T]

In the general case of nonlinear scaling of projects, the Time/People ratio is a nonlinear function of project complexity. As project complexity decreases, this nonlinear dependence gradually weakens and the Time/People ratio becomes constant. As complexity increases, this dependence can take different forms depending on market pressure. The study of this nonlinear dependence of the Time/People ratio on project scaling is very important for giga-projects and is the subject of a separate paper. The specific case of a constant Time/People ratio is presented in Fig.6 as a family of linear relationships between project average staffing N_av and project duration T.
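The grouping by relative constancy of R can be sketched as follows. The record layout `(T, N_av)`, the zone boundaries `edges` (each zone plays the role of one interval ΔR) and the least-squares fit of the line N_av = T/R through the origin are illustrative assumptions, not the TRANSCALE procedure; the data are made up.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Project = Tuple[float, float]  # hypothetical record: (duration T, average staffing N_av)


def group_by_ratio(projects: List[Project], edges: List[float]) -> Dict[int, List[Project]]:
    """Put each project into the zone [edges[i], edges[i + 1]) that contains R = T / N_av.

    `edges` must be sorted ascending; projects outside every zone are dropped.
    """
    zones: Dict[int, List[Project]] = defaultdict(list)
    for T, N_av in projects:
        i = bisect_right(edges, T / N_av) - 1
        if 0 <= i < len(edges) - 1:
            zones[i].append((T, N_av))
    return dict(zones)


def zone_ratio(members: List[Project]) -> float:
    """Fit N_av = T / R through the origin by least squares and return R.

    The slope k of N_av = k * T is sum(T * N_av) / sum(T**2), so R = 1 / k.
    """
    return sum(T * T for T, _ in members) / sum(T * N for T, N in members)


data = [(40.0, 10.0), (60.0, 15.0), (50.0, 5.0), (90.0, 10.0), (200.0, 5.0)]
zones = group_by_ratio(data, edges=[2.0, 6.0, 12.0])
for i, members in sorted(zones.items()):
    print(i, members, round(zone_ratio(members), 3))
```

Narrower zones (more closely spaced `edges`) give tighter lines but fewer projects per zone, which is the reliability/accuracy trade-off described above.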



Fig.6 Zones of project similarity in the coordinate system [T, N_av]

2. Family of project similarity curves in the coordinate system [P, N_av]

The functional relationship between team productivity P and team size N_av has long been the subject of debate and continues to be at the center of professionals' attention. Despite this, there is still no clarity as to what this functional relationship is.

Fig.7 Zones of project similarity in the coordinate system [P, N_av]

The fact is that, depending on additional conditions, several such functional relationships may exist. In particular, in [5] this functional relationship is derived from the conditional minimum of project effort. In [7] its derivation is based on considerations of communication and interaction between people. In this paper, the same functional relationship is obtained from the condition of constant Time/People ratio; the result is expression (11).


With the aid of that expression we can divide the [Team productivity, Team size] field into zones of project similarity (Fig.7).

3. Family of project similarity curves in the field [P, E]

By dividing the project database into groups according to the constancy of their complexity, one can represent the functional relationship between team productivity P and project effort E as a family of hyperbolas (Fig.8).

Fig.8 Zones of project similarity in the coordinate system [P, E]

4. Family of project similarity curves in the coordinate system [E, T]

As shown above, the condition of constant Time/People ratio yields the functional relationship between project effort and duration in the form of expression (12). Fig.9 presents this relationship for different constant values of the Time/People ratio R as a family of parabolic curves.

Fig.9 Zones of project similarity in the coordinate system [E, T]


Presentation of the zones of project similarity with the aid of the TRANSCALE tool

Thus, by dividing project databases into groups using various similarity criteria, it is possible to improve the accuracy of project estimation dramatically. This is because each group of projects has its own regression equation, whose accuracy is much higher than when the entire body of project data is replaced by a single regression curve. A graphical presentation of project similarity zones is very useful for selecting groups of similar projects when planning a new project. For multivariate comparisons and project similarity analysis it is convenient to use the TRANSCALE tool. Fig.10 is a joint presentation of a project database and the zones of project similarity in the multi-parametric flat project space.

Accuracy analysis of multi-parametric models of projects

As an example, let us consider the functional relationship between total project effort and project duration. This relationship can be analyzed by taking a project database and breaking it into different numbers of groups. The project database is grouped by similarity of the Time/People ratio R.


Fig.10 Multi-parametric presentation of projects with the zones of their similarity

Fig.11 contains the results of project grouping and approximation for one to four groups of projects. As the figure shows, increasing the number of groups of projects increases the accuracy of approximation. In addition, the numerical values of the approximating power-law exponents approach 2, the theoretical value of this exponent given by expression (12).

It is also important to analyze the possibility of using the Time/People ratio as an input parameter for estimating project parameters. From this point of view, one can analyze the fitting curves with higher values of r². For example, for the middle curve in the case of three approximating curves, with r² = 0.989, the Time/People ratios in the corresponding group range from 7.786 to 12. This means that even rough estimates of the Time/People ratio can significantly improve the quality of project estimates.
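The effect of grouping on r² can be reproduced on synthetic data. The sketch below is not the analysis behind Fig.11: it generates noise-free points E = T²/R for three hypothetical values of R and fits the power law E = a·T^b by ordinary least squares in log-log coordinates, once per group and once for the pooled data. Within each group the fit recovers b = 2 and r² = 1; pooled, r² drops to 15/17 (about 0.88) for this balanced design. Real project data add scatter, and when R is correlated with project size the pooled exponent can also be biased away from 2.

```python
import math
from typing import List, Tuple


def fit_power_law(T: List[float], E: List[float]) -> Tuple[float, float, float]:
    """Fit E = a * T**b by least squares on (ln T, ln E); return (a, b, r2)."""
    x = [math.log(t) for t in T]
    y = [math.log(e) for e in E]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    r2 = sxy ** 2 / (sxx * syy)   # squared correlation of ln T and ln E
    return a, b, r2


# Synthetic, noise-free data: E = T**2 / R for three hypothetical ratio groups.
durations = [2.0, 4.0, 8.0, 16.0]
groups = {R: [T ** 2 / R for T in durations] for R in (4.0, 8.0, 16.0)}

for R, efforts in groups.items():
    a, b, r2 = fit_power_law(durations, efforts)
    print(f"group R={R}: a={a:.4f}, b={b:.3f}, r2={r2:.3f}")

pooled_T = durations * len(groups)
pooled_E = [e for efforts in groups.values() for e in efforts]
a, b, r2 = fit_power_law(pooled_T, pooled_E)
print(f"pooled:      a={a:.4f}, b={b:.3f}, r2={r2:.3f}")
```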


Fig.11 Four options of grouping of the same project data

Conclusions

1. In spite of their non-systematic character, project databases are important sources of information about the functional relationships between project parameters.
2. The statistical methodology of project data mining, which is the ideological basis of modern quantitative project management, has in fact long since proved a failure.
3. Numerous statistical relationships between project parameters intended for project estimation have extremely low accuracy and therefore cannot be used for the everyday practical purposes of the industry.
4. To extract correct functional relationships between project parameters from project data, it is necessary to develop special non-statistical methods.
5. Efficient scientific project data mining should be the core of quantitative project management, an area that is currently in deep crisis.
6. Given the large scatter of project parameters in the databases, acceptable accuracy of the relationships obtained from these data requires covering the data with families of approximating curves.
7. The identification of project objectives and their priorities is directly related to the accuracy of estimates of project parameters.
8. A quantitative description of the priorities of project objectives is the key to improving the accuracy of project estimates.


9. Analysis of project data indicates that even rough estimates of the priorities of project objectives can significantly improve the quality of project estimates.
10. The new methodology of project data mining can serve as an ideological basis for developing a new methodology and new algorithms of project synthesis.

References

1. Pavel Barseghyan (2010). Project Nonlinear Scaling and Transformation Methodology and TRANSCALE Tool. PM World Today, May 2010 (Vol XII, Issue V).
2. Pavel Barseghyan (2009). Problems of the Mathematical Theory of Human Work (Principles of Mathematical Modeling in Project Management). PM World Today, August 2009 (Vol XI, Issue VIII).
3. Pavel Barseghyan (2010). Project Data Mining and Project Estimation Top-down Methodology with TRANSCALE Tool. PM World Today, June 2010 (Vol XII, Issue VI).
4. Pavel Barseghyan (2010). Similarity of Projects: Methodology and Analysis with TRANSCALE Tool. PM World Today, July 2010 (Vol XII, Issue VII).
5. Pavel Barseghyan (2009). Principles of Top-Down Quantitative Analysis of Projects. Part 2: Analytical Derivation of Functional Relationships between Project Parameters without Project Data. PM World Today, June 2009 (Vol XI, Issue VI).
6. Pavel Barseghyan (2009). Principles of Top-Down Quantitative Analysis of Projects. Part 1: State Equation of Projects and Project Change Analysis. PM World Today, May 2009 (Vol XI, Issue V).
7. Pavel Barseghyan (2009). Quantitative Analysis of Team Size and its Hierarchical Structure. PM World Today, July 2009 (Vol XI, Issue VII).

About the Author


Pavel Barseghyan, PhD


Dr. Pavel Barseghyan is a consultant in the field of quantitative project management, project data mining and organizational science. He is the founder of Systemic PM, LLC, a project management company. He has over 40 years of experience in academia, the electronics industry, the EDA industry, and project management research and tools development. From 1999 to 2010 he was the Vice President of Research for Numetrics Management Systems. Prior to joining Numetrics, Dr. Barseghyan worked as an R&D manager at Infinite Technology Corp. in Texas. He was also a founder and the president of an EDA start-up company, DAN Technologies, Ltd., which focused on high-level chip design planning and RTL structural floor planning technologies. Before joining ITC, Dr. Barseghyan was head of the Electronic Design and CAD department at the State Engineering University of Armenia, focusing on the development of the Theory of Massively Interconnected Systems and its applications to electronic design. From 1975 to 1990 he was also a member of the University Educational Policy Commission for Electronic Design and CAD Direction in the Higher Education Ministry of the former USSR. Earlier in his career he was a senior researcher at the Yerevan Research and Development Institute of Mathematical Machines (Armenia). He is the author of nine monographs and textbooks and more than 100 scientific articles in the areas of quantitative project management, the mathematical theory of human work, electronic design and EDA methodologies, and tools development. More than 10 Ph.D. degrees have been awarded under his supervision. Dr. Barseghyan holds an MS in Electrical Engineering (1967), and a Ph.D. (1972) and Doctor of Technical Sciences (1990) in Computer Engineering, from Yerevan Polytechnic Institute (Armenia). Pavel can be contacted at
