Scaled Agile Framework Implementation in Organizations', Its Shortcomings and An AI Based Solution To Track Team's Performance
Abstract—The beginning of the 21st century was the Software Enlargement Period, where software development methods were designed to create and deliver software to market under limited resources, time, and budget. The traditional Software Development Life Cycle (SDLC) created

machine learning model can be developed to predict the probability of feature completion in the given timeline at an early stage of the development process.

II. AGILE SOFTWARE DEVELOPMENT
2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT) | 978-1-6654-6855-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/GCAT55367.2022.9971968
Authorized licensed use limited to: Universitas Indonesia. Downloaded on October 25,2023 at 17:07:37 UTC from IEEE Xplore. Restrictions apply.
architectural view. Then the teams prepare a draft plan and present it to the business owner and management, who then review it and provide feedback [4].
Now, the team plans for an iteration using a visual board called the Program Board. Teams mark features that will be executed in the next sprint. Features are further broken into independent user stories. The size of user stories is estimated in story points, and then they are prioritized by the PO according to the customer requirement and team capacity [5]. The agile team members also determine the risks and dependencies. All these steps are repeated for each sprint.
2) Day-2: The Program Manager highlights the changes in scope and resources. Teams update the plan according to the provided feedback, define risks and dependencies, and finalize the plan. Finally, teams present the final features, dependencies, and milestones on the Program Board.
B. Iteration Planning and Development
An iteration may last up to two weeks, where teams execute the designated features according to the plan.
1) Sprint Planning: A Sprint Planning Meeting is organized by the Scrum Master, where teams discuss the purpose of user stories and check off the user stories that are ready for implementation. User stories are assigned to team members, and the complexity of user stories is determined using story points [4]. Team members also discuss their holiday plans to keep track of planned leave and avoid delays.
2) Product Backlog Preparation and Refinement: The Product Owner communicates with the customers, understands their requirements, converts requirements into features, and places them into the product backlog [5].
A Backlog Refinement process is carried out by the teams, where they filter the user stories which are ready for execution in the next iteration. The user stories having the highest priority from the Product Backlog are placed into the Sprint Backlog for the upcoming iteration. User stories are refined based on story points, the number of team members, and allotted time [6]. When multiple teams in ART perform this practice, several risks, challenges, and dependencies are encountered at the early stages.
3) Daily Stand-Up: A Stand-Up meeting of 15 minutes is organized daily by the Scrum Master during the entire sprint. Team members discuss the tasks they completed yesterday, tasks they will do today, and the challenges they are encountering. The Scrum Master reviews the status of the project and the work and progress of individual team members to increase transparency [3].
4) Sprint Review and Retrospective: A working model of the final product is displayed to the PO and customer in the Sprint Review Meeting, called the System Demo [2]. Customers review the prototype, check progress, and give feedback about changes to be incorporated into the final product. This keeps teams from heading in the wrong direction [3].
A Retrospective meeting is held at the end of an iteration, where all project members discuss problems encountered, things that went well in this sprint, and actions to be taken for future improvements.
5) Inspect and Adapt (I&A): The Inspect & Adapt Event can be seen as an escalated retrospective in which all team members and stakeholders of ART are present. It comprises three parts:
a) PI System Demo: It is a more formal activity of about an hour where all the features developed by ART during the PI are presented to a comparatively broader audience.
b) Quantitative and Qualitative Measurement: Agile teams review qualitative and quantitative metrics and record data patterns. The RTE collects this information and analyses the possible problems [6].
c) Retrospective and Problem Solving Workshop: A small retrospective is organized to discuss further issues, and then the RTE facilitates a Problem Solving Workshop to tackle these issues.
Fig. 1 SAFe Workflow
VI. PROBLEM STATEMENT AND PURPOSE
SAFe is a better framework than other development strategies as it follows an iterative and incremental approach. It is a convenient methodology in today’s world, where technology, business, and the market are constantly changing. SAFe welcomes late-changing requirements of customers and incorporates these changes flexibly in the project [6].
However, applying SAFe for developing complex projects in large enterprises can be challenging. Sometimes, teams cannot deliver solutions by the given deadline, even after going through such a rigorous planning process.
Fig. 2 Problem Scenario
Several challenges may occur during the development process:
1) Insufficient Time: The timeline has to be drawn very carefully, as extra time is needed to incorporate the
continuously changing requirements of the customers. An inadequate amount of time may leave teams unable to stick to the estimated timeline and may result in poor product quality.
2) Unresolved Dependencies: Multiple ARTs are cross-functionally dependent on data and information. There could be some unresolved dependencies between data points [2]. If one team in ART misses the deadline, it eventually hampers the work of other agile teams and may lead to project failure.
3) Pandemic: During unprecedented times such as Covid-19, on-time product delivery is the biggest concern of enterprises and customers.
4) Other: There may be other factors like lack of teamwork, over-commitment, team members leaving the project, lack of team motivation, and disappointed stakeholders [4].
Even after spending 8-10 days per year in the rigorous planning process for Sprints and PI, teams are not able to complete the designated features from the Product Backlog. It may result in missed deadlines, loss of money and reputation of the enterprise, and unsatisfied customers.
Organizations can overcome this challenge if there exists a mechanism to predict the on-time delivery of solutions beforehand. It could help teams to understand the timeline and increase team size and working hours if needed to complete the work on time.
One possible solution to this problem is to build such a model using a machine learning algorithm that uses past experiences and data of agile teams to predict the on-time delivery of value to customers.
VII. MACHINE LEARNING
3) Semi-Supervised learning: This algorithm is a combination of supervised and unsupervised learning as it works on both labeled and unlabeled data. Since the labeled data is less than the data to be predicted, this algorithm falls between learning with supervision and without supervision [13]. Some Semi-Supervised Machine Learning algorithms are Generative Models, Self-Training, Transductive SVM, etc. This algorithm can be used in image and speech analysis, internet content classification, etc.
4) Reinforcement Learning: This algorithm makes decisions and takes actions to maximize the number of positive outcomes. There is no prior knowledge about the action to be taken until a situation is given [11]. Some Reinforcement Machine Learning algorithms are Q-Learning, State-Action-Reward-State-Action (SARSA), Deep Q-Network (DQN), etc. This algorithm can be used in robotics, game theory, operations research, etc.
B. Supervised Machine Learning
This learning has a function that matches an input to an output based on sample input-output pairs. The input data set is partitioned into a labelled training data set and a testing data set to derive a function [11]. It is a task-driven approach.
1) Types of Supervised Machine Learning: The types are described below:
a) Classification: Data is assigned to different categories based on the specific features of the input data set. E.g., Spam Filtering.
b) Regression: It fits the data and predicts other data features based on some available features. E.g., Weather Forecast, Market Trends, etc.
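As a minimal sketch of the supervised setting described above, the following Python fragment derives a function from labelled input-output pairs and then checks it on held-out pairs. The toy data, the helper names, and the choice of one-variable least squares are illustrative assumptions and are not taken from the paper's model.

```python
# Hypothetical input-output pairs (x, y), generated from y = 2x + 1 for illustration.
pairs = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0), (5, 11.0)]

# Partition the labelled data into a training set and a testing set.
train, test = pairs[:4], pairs[4:]


def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the learned function to a new input."""
    intercept, slope = model
    return intercept + slope * x


model = fit_simple_linear_regression([x for x, _ in train], [y for _, y in train])

# Largest absolute error on the held-out pairs.
test_error = max(abs(predict(model, x) - y) for x, y in test)
```

Classification follows the same train-then-test pattern, except that the predicted output is a category rather than a number.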
Fig. 4 Supervised Machine learning Workflow
4) Classification of Supervised Machine Learning
a) Decision Tree: Decision trees are used for classification purposes. Each node of the tree represents an attribute, and its branches represent the values that the node can assume [11].
b) Linear Regression: This algorithm makes predictions by modelling the relationship between one dependent variable and one or more independent variables. For one dependent and one independent variable, it is simple linear regression. In the case of multiple independent variables, it is multiple linear regression.
c) Naïve Bayes: This algorithm creates a Bayesian Network based on the probability of occurring events [11]. It is used for text classification and clustering.
d) Support Vector Machine (SVM): This algorithm minimizes classification error by drawing margins between classes such that the distance between the margin and the classes is maximum [12].
e) Logistic Regression: It is a probability-based statistical model that can be used to solve both classification and regression problems but is mostly used in classification [13]. It calculates probabilities using the mathematical logistic function.
VIII. IMPLEMENTATION
The proposed solution is implemented by building a model using a logistic regression algorithm in Machine Learning that will take features like User Story Name, Story State, Story Points, Parent Feature Id, Team Id, Sprint Id, and Release Id as input and then calculate the mean probability of completion of features in the particular data set requested by the user.
Following is the workflow of the implementation process:
Fig. 5 Implementation Workflow
A. Read Data
The model currently reads data from an Excel sheet. However, in a real-world implementation, data could be extracted from several databases like MongoDB, PostgreSQL, etc., provided the data is in tabular form. The data set includes attributes such as User Story Name, Story State, Status, Story Points, Parent Feature Id, Team Id, Sprint Id, Release Id, and Date. Story State is the only dependent feature or target variable whose value (Done/Not Done) depends on the rest of the features.
B. Pipeline
Several steps are followed in the pipeline:
1) Data Cleaning: Data cleaning techniques are applied to the raw data set to drop unnecessary columns, remove NULL values, and reset the index [13].
2) Feature Engineering: Feature Engineering is done to handle missing values, bin numerical values, extract parts of a date, etc. [11]. The model adds another column to the data set for calculating the number of days between the different Story States for each user story, i.e., Open to Implementing and Implementing to Done. This new feature is called Days.
3) Data Preparation: The purpose of this step is to drop irrelevant attributes, check for duplicates, and get the numerical form of data [12]. The categorical values of features in the data set are converted into numerical form by creating vector arrays. An N-1 dimensional vector is created for a feature having N distinct values. Now, this data is converted into tabular form and is ready to be used in the model.
The obtained data set is further bifurcated into training data and testing data. The training data set contains data till today’s date, and the testing data set contains future data. 80% of the training data set is used for training, and the remaining 20% is for validation.
Stratification is done on the Story State feature to obtain samples from data that best represent the data values.
4) Scaling: This model uses a Robust Scaler to bring the distribution of the data onto a common, standardized scale. Scaling limits the influence of outlier data points for all independent features. The scaler is fit on the training data, and the transformation is then applied to the training, validation, and testing data [11].
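The feature engineering and data preparation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the record fields (`team_id`, `opened`, `last_changed`, `story_state`), the sample values, and the helper names are hypothetical, and the paper does not say which library performs these steps.

```python
import random
from datetime import date

# Hypothetical user-story records: (team, days between state changes, story state).
raw = [
    ("T1", 2, "Done"), ("T2", 4, "Done"), ("T3", 3, "Done"),
    ("T1", 5, "Done"), ("T2", 1, "Done"),
    ("T3", 9, "Not Done"), ("T1", 8, "Not Done"), ("T2", 7, "Not Done"),
    ("T3", 6, "Not Done"), ("T1", 10, "Not Done"),
]
stories = [
    {"team_id": team, "opened": date(2022, 1, 3),
     "last_changed": date(2022, 1, 3 + d), "story_state": state}
    for team, d, state in raw
]

# Feature engineering: add a Days column with the days between two state changes.
for story in stories:
    story["days"] = (story["last_changed"] - story["opened"]).days


def one_hot_drop_first(values):
    """Encode N distinct categories as N-1 indicator columns (first category = all zeros)."""
    kept = sorted(set(values))[1:]
    return [[1 if v == c else 0 for c in kept] for v in values], kept


def stratified_split(rows, label_key, train_fraction=0.8, seed=0):
    """Split rows so that each label keeps the same share in both parts."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    train, valid = [], []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid


# Three distinct teams give two indicator columns.
team_vectors, team_columns = one_hot_drop_first([s["team_id"] for s in stories])

# 80% for training and 20% for validation, stratified on Story State.
train_rows, valid_rows = stratified_split(stories, "story_state")
```

Dropping the first category avoids a redundant column, because a row of all zeros already identifies it; this matches the N-1 dimensional vector mentioned in the Data Preparation step.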
Fig. 6 Robust Scaler Mathematical Formula
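Assuming the conventional definition of robust scaling, each value x is mapped to (x - median) / (Q3 - Q1), where the median and the quartiles Q1 and Q3 are computed on the training data only. The sketch below illustrates this with the Python standard library; the helper names, the sample values, and the quartile interpolation method are assumptions made for illustration.

```python
from statistics import median, quantiles


def fit_robust_scaler(train_values):
    """Learn the center (median) and scale (interquartile range) from training data."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    scale = q3 - q1
    if scale == 0:
        scale = 1.0  # avoid division by zero for constant features
    return median(train_values), scale


def robust_transform(values, center, scale):
    """Apply the learned center and scale to any data set."""
    return [(v - center) / scale for v in values]


# Hypothetical Days values; 100 is an outlier that barely affects median and IQR.
train_days = [1, 2, 3, 4, 100]
center, scale = fit_robust_scaler(train_days)

scaled_train = robust_transform(train_days, center, scale)
# Validation and testing data reuse the parameters learned from the training data.
scaled_valid = robust_transform([5], center, scale)
```

Because the median and interquartile range are insensitive to extreme values, the outlier does not compress the scale of the remaining points, which is the motivation for choosing a robust scaler over mean and standard deviation scaling.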
5) Modelling: The next step is the model initiation by using a machine learning algorithm. Since the data is in linearly separable form, logistic regression would be the best approach to build the model. Logistic regression is best suited for predicting the probability of a binary event occurring. Since the Story State feature has two possible outcomes, Done or Not Done, this can be said to be a binary classification. Probability in Binary Logistic regression ranges between 0 and 1. In the case of Story State, the model assigns the Done state to be 1 and the Not Done state 0.
The logistic function, also called the sigmoid function, takes an input value and maps it to a value between 0 and 1.
$$\frac{1}{1 + e^{-\mathrm{value}}} \qquad (1)$$
where e is the base of the natural logarithm.
The Logistic Regression equation is as follows:
$$y = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}} \qquad (2)$$
where y = predicted output,
$b_0$ = bias or intercept term,
$b_1$ = coefficient for the single input variable (x).
The Logistic Regression Probability Prediction is as follows:
$$P(\text{Story State} = \text{Done} \mid \text{Not Done}) \qquad (3)$$
$$p(X) = P(Y = 1 \mid X) \qquad (4)$$
$$p(X) = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}} \qquad (5)$$
The model is fit to the training data so that it learns and improves itself over time.
6) Scores: The confusion matrix is used to calculate the model performance for a set of testing data. Accuracy, error, precision, recall, and f1_score are calculated to fully evaluate the model's effectiveness.
Accuracy=68.31, Error=31.69, Precision=63.57, Recall=99.44 and f1_score=77.56
7) Pre Processing: Binary values are again mapped into categorical values for Story State. The column added during feature engineering for storing the number of days is dropped, and the index is reset.
C. Prediction
Now, the model is ready to be accessed by users. The user is requested to enter the Team Id, Sprint Id, and Release Id for which they want the probability. The model calculates the mean over all Parent Feature Ids of that particular data piece, and the percentage probability of feature completion is calculated and returned to the user.
Fig. 7 depicts the implementation model that can be used to calculate the feature completion probability by user:
Fig. 7 Model Dashboard
Fig. 8 Model Prediction Result
IX. CONCLUSION AND FUTURE SCOPE
This machine learning model will predict the mean percentage probability of whether the features of a particular sprint in a release will be completed in the remaining time or not. It will help teams to predict on-time product delivery at an early stage, so that appropriate quality features can be developed within the deadline. If organizations have this sort of mechanism, teams can increase their velocity, size, and working hours to finish the work on time.
The expected benefits of this model could be:
• Faster and Smarter Investment decision-making (driven by machine learning model predictions)
• Maximize ROI (Return on Investment) with a reduction in Project Delivery delays
• Increase team productivity, efficiency, and morale
• Supports Lean-Agile Mindset
• Cost Avoidance (cost assumptions and average time period of delayed features)
In the future, this model can be integrated with software like Rally and Jira used by companies employing the Scaled Agile Framework. Data sets of teams, sprints, and releases could be directly extracted from the Rally database and fed into this model to predict the probability of completion of given features in a particular sprint of a release.
REFERENCES
[1] Bc. Martin Kalenda, “Scaling Agile Software Development in Large Organizations,” Brno, Spring, 2017.
[2] Lou Hinterberg, Fredrik Hoffman, “Exploring the Scaled Agile Framework in a Virtual Team Setting,” Department of Informatics, Lund School of Economics and Management, Lund University, 2018.
[3] Nichamon Chantachaimongkol, Puangpetch Sincharoenpanich, “Critical factors for implementing the Scrum Software Development Methodology,” 2013.
[4] Priya Mishra, “Quality Deployment and Use of the Scaled Agile Framework® - Managing teamwork and Software Quality in the Banking Sector,” University of Tampere, 2018.
[5] Terryboy Simplicio Pereira, “Scrum and XP agile Practices used by Project Managers contribution towards Software Project Success,” Dublin Business School, 2019.
[6] Ameta, U., Patel, M., Sharma, A.K., “Scrum Framework Based on Agile Methodology in Software Development and Management,” In: Mathur, R., Gupta, C.P., Katewa, V., Jat, D.S., Yadav, N. (eds) Emerging Trends in Data Driven Computing and Communications. Studies in Autonomic, Data-driven and Industrial Computing, Springer, Singapore, 2021.
[7] Sen, S., Patel, M., Sharma, A.K., “Software Development Life Cycle Performance Analysis,” In: Mathur, R., Gupta, C.P., Katewa, V., Jat, D.S., Yadav, N. (eds) Emerging Trends in Data Driven Computing and Communications. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore, 2021.
[8] Astha Singhal, Divya Gupta, “Scrum: An Agile Method,” IJETMAS, 2014.
[9] Peter Maher, “Weaving Agile Software Development Techniques into a Traditional Computer Science Curriculum,” Proc. of 6th IEEE International Conference on Information Technology: New Generation, 2009.
[10] Abheeshta Putta, Maria Paasivaara, and Casper Lassenius, “Benefits and Challenges of Adopting the Scaled Agile Framework (SAFe): Preliminary Results from a Multivocal Literature Review,” Aalto University, Dept. of Computer Science, Espoo, Finland.
[11] Subhas C Misra, Uma Kumar, Vinod Kumar, Gerald Grant, “The Organizational Changes Required and the Challenges Involved in Adopting Agile Methodologies in Traditional Software Development Organizations,” pp. 25-28, IEEE 2006.
[12] Ayon Dey, “Machine Learning Algorithms: A Review,” International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 7 (3), 2016.
[13] Özer ÇELİK, “A Research on Machine Learning Methods and Its Applications,” Journal of Educational Technology and Online Learning, September 2018.
[14] Iqbal H. Sarker, “Machine Learning: Algorithms, Real-World Applications and Research Directions,” Springer, 2021.
[15] F. Khan, R. Kothari, M. Patel and N. Banoth, “Enhancing Non-Fungible Tokens for the Evolution of Blockchain Technology,” 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), 2022.
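As a supplementary sketch for Section VIII, the logistic function of equation (1), the confusion-matrix metrics of the Scores step, and the mean completion percentage of the Prediction step can be written in a few lines of Python. The coefficients, inputs, confusion-matrix counts, and function names below are hypothetical and serve only to exercise the formulas; they are not the paper's fitted model or data.

```python
import math


def sigmoid(value):
    """Logistic function from equation (1)."""
    return 1.0 / (1.0 + math.exp(-value))


def done_probability(x, b0, b1):
    """Probability that Story State is Done for one input x, as in equation (5)."""
    return sigmoid(b0 + b1 * x)


def scores(tp, fp, fn, tn):
    """Metrics derived from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "precision": precision, "recall": recall, "f1": f1}


def mean_completion_percentage(probabilities):
    """Mean predicted probability, expressed as a percentage."""
    return 100.0 * sum(probabilities) / len(probabilities)


# Hypothetical coefficients and inputs.
probs = [done_probability(x, b0=-1.0, b1=0.5) for x in (0.0, 2.0, 4.0)]
mean_percentage = mean_completion_percentage(probs)

# Hypothetical confusion-matrix counts.
summary = scores(tp=8, fp=2, fn=1, tn=9)

# F1 recomputed from the reported Precision=63.57 and Recall=99.44.
reported_f1 = 2 * 63.57 * 99.44 / (63.57 + 99.44)
```

Recomputing F1 from the reported precision and recall with this formula gives 77.56 after rounding, which agrees with the reported f1_score. The reported error is likewise 100 minus the reported accuracy.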