
SCS2205

Data Science
Project Management

Lecture 9: PM Methodologies & Data Science Process Models

© NUST 2022 SCS2205 Slide 1


Objectives

Introduce different methodologies and explain
their importance

Overview of existing methodologies

Motivation for using methodologies

How these methodologies relate to each other



Methodologies Explained

Project management methodologies are
essentially different ways to approach a project.


Project managers should adopt a systematic and
organised approach to their work and use
appropriate tools and techniques depending on
the problem to be solved, the development
constraints and the resources available.



Methodologies Explained

A methodology (process model) is a simplified representation of how
tasks are to be done, presented from a specific perspective.


Examples of process perspectives are
• Workflow perspective - sequence of activities;
• Data-flow perspective - information flow;
• Role/action perspective - who does what.
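The same process can be read from each of these perspectives. Below is a minimal Python sketch that assumes a hypothetical three-activity process (the activity names, roles and artefacts are invented for illustration) and reads it back from each perspective:

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    role: str           # role/action perspective: who performs the activity
    inputs: list[str]   # data-flow perspective: information consumed
    outputs: list[str]  # data-flow perspective: information produced


# Hypothetical process; the list order is the workflow (sequence of activities).
process = [
    Activity("Gather requirements", "Analyst", [], ["requirements"]),
    Activity("Build model", "Data scientist", ["requirements", "dataset"], ["model"]),
    Activity("Review results", "Sponsor", ["model"], ["decision"]),
]


def workflow_view(p):
    """Workflow perspective: activities in the order they are performed."""
    return [a.name for a in p]


def data_flow_view(p):
    """Data-flow perspective: (producer, artefact, consumer) for every hand-over."""
    return [(src.name, art, dst.name)
            for src in p for art in src.outputs
            for dst in p if art in dst.inputs]


def role_view(p):
    """Role/action perspective: which activities each role is responsible for."""
    table = {}
    for a in p:
        table.setdefault(a.role, []).append(a.name)
    return table


print(workflow_view(process))
print(data_flow_view(process))
print(role_view(process))
```

In this toy process, `dataset` has no producer. A data-flow view would therefore show it as an external input.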



Factors Affecting Choice of Methodology

Risk

Cost

Duration

Complexity

Customer involvement

Goal & Solution Clarity

# Departments Affected

Organizational Environment

Team Skills & Competencies

Completeness of Requirements



Traditional, sequential methodologies
Waterfall project management methodology


The most common way to plan a project is to sequence the tasks that
lead to a final deliverable and work on them in order.

• Simplest to understand.
• Every step is preplanned and laid out in the proper sequence.
• Excels in predictability but lacks flexibility.
• Ideal for projects that are not complex and whose project plans can
easily be replicated for future use.
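The sequential idea can be illustrated with a small scheduling sketch. The phase names and durations below are hypothetical. To keep things simple, durations are counted in calendar days, with no allowance for weekends or holidays:

```python
from datetime import date, timedelta

# Hypothetical phases and durations in days; the order is fixed up front.
PHASES = [
    ("Requirements", 10),
    ("Design", 15),
    ("Implementation", 30),
    ("Testing", 12),
    ("Deployment", 5),
]


def sequential_plan(phases, start):
    """Return (phase, start, end) tuples; each phase starts when the previous one ends."""
    plan = []
    current = start
    for name, days in phases:
        end = current + timedelta(days=days)
        plan.append((name, current, end))
        current = end  # the next phase cannot begin before this one finishes
    return plan


for name, s, e in sequential_plan(PHASES, date(2022, 1, 3)):
    print(f"{name:<15} {s} -> {e}")
```

Because every step is preplanned, a delay in any phase pushes back every phase after it. This is the predictability/flexibility trade-off described above.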



Traditional, sequential methodologies



What is AGILE?

Agile is a method that uses continuous stakeholder
feedback to deliver high quality code through use
cases and a series of short time-boxed development
iterations.


A style of product development that concentrates on
adaptive and exploratory, rather than anticipatory and
prescriptive management.


Agile is not a methodology, but is a conceptual
framework for undertaking software engineering
projects.



Agile Manifesto for Agile Software Development

We are uncovering better ways of developing software
by doing it and helping others do it. Through this work
we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right,
we value the items on the left more.



Agile Models: Key Attributes

eXtreme Programming (XP)
• Based on values of simplicity/communication/feedback/courage
• Start with simple solution, add complexity through refactoring
• Frequent feedback through unit, integration and acceptance testing
• Four development phases: coding, testing, listening, designing

Scrum
• Daily 15-minute “Scrum” to discuss work for the day
• Divide projects into 30-day “Sprints”
• “Backlog” of requirements to be addressed in each Sprint
• “Retrospective” at end of Sprint to review progress / revise backlog

Open-UP
• Simplified Rational Unified Process (RUP): reduced no. of disciplines
• Uses RUP’s four phases: inception, elaboration, construction, transition

Crystal
• Key mechanisms: frequent delivery; reflective improvement; close
communication with personal safety; access to expert users; automated
testing; frequent integration; and configuration management

Adaptive
• Repeating cycles: speculate, collaborate, learn
• Provides for continuous learning / adaptation to changing project state

Dynamic Systems Development Method (DSDM)
• Three primary phases: pre-project, project life-cycle, post-project
• Project life-cycle phase consists of feasibility study, business study,
functional model iteration, design/build iteration, and implementation

Feature Driven Development (FDD)
• Key mechanisms: more value on design than on “code is the design”;
model-driven; develop feature list; and plan, design, build by feature

Agile (or Rapid) techniques have been used for decades to resolve
key challenges that adversely impact solution development projects.

Key challenge: Requirements are dynamic and difficult to lock down
Common Agile techniques:
• Business participation as a project team member
• Joint requirements, design and prototyping sessions
• Use of visual modelling and prototyping tools
• Documentation of results vs. targets
• Time boxing to fixed dates and fixed cost

Key challenge: Deliver business value more quickly
Common Agile techniques:
• Incremental delivery of highest priority project components first
• Decomposition of large initiatives into multiple releases
• Small, dedicated, co-located teams in a teaming environment

Key challenge: Reduce risk of adopting new technologies
Common Agile techniques:
• Architects participate and direct lead developers
• High-risk proofs of concept are performed early in the project
• Right skills are dedicated to the project team
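Time boxing and "highest priority first" can be combined into a simple sprint-planning rule. The following minimal Python sketch assumes a hypothetical backlog in which a lower `priority` number means more important and effort is estimated in story points:

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    priority: int  # hypothetical convention: lower number = higher business priority
    points: int    # estimated effort in story points


def plan_sprint(backlog, capacity):
    """Greedy rule: walk the backlog in priority order, taking items that still fit."""
    selected, remaining = [], capacity
    for item in sorted(backlog, key=lambda i: i.priority):
        if item.points <= remaining:
            selected.append(item)
            remaining -= item.points
    return selected


backlog = [
    BacklogItem("Customer churn report", 1, 8),
    BacklogItem("Data quality dashboard", 3, 5),
    BacklogItem("Scoring API", 2, 13),
    BacklogItem("Export to CSV", 4, 3),
]
sprint = plan_sprint(backlog, capacity=20)
print([i.title for i in sprint])
```

The greedy rule skips an item that does not fit the time box (here the 13-point Scoring API) and keeps filling the box with smaller items. A team might instead split the large item, so treat this as one possible policy, not a standard.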



Agile Adoption Challenges

Potential pitfalls:
• Resistance to collaboration
• Waterfall culture
• Low-trust environment
• Unwillingness to change
• Rigid management hierarchy
• Lack of automated tool support

Very expensive to redeploy the system.

Significant dependencies on new
hardware development.

They feel they are doing a good job with
non-agile approaches.



Why Are Organizations Adopting Agile Strategies?
Dr. Dobb’s Journal (DDJ) 2008 Project Success Survey:
• Agile teams have an average success rate of 70%, compared
with 66% for traditional/waterfall teams.
• Agile teams produce higher quality work, are quicker to
deliver, are more likely to deliver the right functionality,
and are more likely to provide greater ROI than traditional
teams.

[Chart: Quality, Functionality, Money and Time ratings for Agile, Iterative, Traditional and Ad-Hoc teams]



Scrum Management Methods
1. Scrum has emerged as the predominant method for managing
Agile projects.
2. Scrum has well-defined methods to manage code
development and testing.
3. These methods form a minimum PM system within Scrum:
• Minimum planning is required, in the form of a Release Plan and
an Iteration Plan.
• Tracking and control occur via the Daily Scrum meeting and burndown
charts.
• Technical change management occurs as a byproduct of Agile.
• Risk is minimized via frequent stakeholder reviews and input.
• Issue management occurs informally and through the Daily
Scrum meeting.
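A burndown chart plots the work remaining against time. The sketch below computes the two series usually drawn on it: the ideal straight line and the actual remaining work. The sprint length and daily completions are hypothetical numbers:

```python
def ideal_burndown(total_points, sprint_days):
    """Ideal line: remaining work falls linearly from total_points to zero."""
    return [total_points - total_points * day / sprint_days
            for day in range(sprint_days + 1)]


def actual_burndown(total_points, completed_per_day):
    """Actual line: remaining work after each day, given points completed that day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


# Hypothetical 10-day sprint with 40 story points committed.
ideal = ideal_burndown(40, 10)
actual = actual_burndown(40, [0, 5, 3, 6, 0, 4, 8, 5, 4, 5])

for day, (i, a) in enumerate(zip(ideal, actual)):
    status = "behind" if a > i else "on track"
    print(f"day {day:2}: ideal {i:5.1f}  actual {a:3}  {status}")
```

On day 3 the team has 32 points left against an ideal of 28, so the chart shows it running behind. The Daily Scrum meeting is where such a gap would be raised.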



Disciplined Agile Teams

1. Produce working software on a regular basis.
2. Do continuous regression testing, and better yet take a
Test-Driven Development (TDD) approach.
3. Work closely with their stakeholders, ideally on a daily basis.
4. Are self-organizing, and work within an appropriate governance
framework.
5. Regularly reflect on, and measure, how they work together, and
then act to improve on their findings in a timely manner.
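The test-first cycle can be shown with Python's built-in `unittest` module. In this sketch, `clean_age` and its 0 to 120 valid range are hypothetical. What matters is the order of work: the tests describe the behaviour first, the function is then written to satisfy them, and the whole suite is re-run after every change as a regression check.

```python
import unittest


# Step 1 (red): write the tests first; they describe the wanted behaviour.
class TestCleanAge(unittest.TestCase):
    def test_negative_age_is_treated_as_missing(self):
        self.assertIsNone(clean_age(-5))

    def test_implausible_age_is_treated_as_missing(self):
        self.assertIsNone(clean_age(150))

    def test_valid_age_is_kept(self):
        self.assertEqual(clean_age(42), 42)


# Step 2 (green): write just enough code to make the tests pass.
def clean_age(value):
    """Hypothetical rule: ages outside 0..120 are treated as missing (None)."""
    if value is None or not 0 <= value <= 120:
        return None
    return value


# Step 3: re-run the whole suite after every change (continuous regression testing).
if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCleanAge)
    unittest.TextTestRunner(verbosity=2).run(suite)
```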



Compared with waterfall project teams, agile project teams:
• Enjoy higher success rates.
• Deliver higher quality.
• Have greater levels of stakeholder satisfaction.
• Provide better return on investment (ROI).
• Deliver systems to market sooner.



Differences between agile project
teams and traditional project teams.



Project Management Methodologies
Comparison



Why Should There be a Standard Process?

The data mining process must be reliable and repeatable by people
with little data mining background.



Why Should There be a Standard Process?


Framework for recording experience
• Allows projects to be replicated


Aid to project planning and management


“Comfort factor” for new adopters
• Demonstrates maturity of Data Mining
• Reduces dependency on “stars”



Knowledge Discovery in
Databases

The non-trivial extraction of implicit, previously unknown and
potentially useful information from large collections of data.

KDD is an iterative process: evaluation measures can be enhanced,
mining can be refined, and new data can be integrated and transformed
in order to get different and more appropriate results.



Data Mining: A KDD Process



Steps of a KDD Process

Data Cleaning: Data cleaning is the removal of noisy and irrelevant
data from the collection.
• Cleaning in the case of missing values.
• Cleaning noisy data, where noise is a random error or variance.
• Cleaning with data discrepancy detection and data
transformation tools.
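Two of the cleaning tasks above can be sketched with Python's standard library: filling missing values with the attribute mean, and smoothing noisy values by bin means (equal-frequency bins). These are common textbook choices, not the only ones, and the sample values are illustrative:

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None (missing) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort, split into equal-frequency bins and replace each value by its bin mean.

    The result is returned in sorted order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


print(fill_missing_with_mean([23, None, 31, 27]))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3))
```

With bins of three, the values 4, 8 and 15 all become 9, which damps the random variation inside each bin.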

Data Integration: Data integration combines heterogeneous data from
multiple sources into a common store (a data warehouse).
• Data integration using data migration tools.
• Data integration using data synchronization tools.
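Here is a minimal sketch of integration. Two hypothetical sources use different names for the same customer key (`cust_id` vs `customer`), and a small schema mapping merges them into one record per customer. The resulting dictionary stands in for a warehouse table:

```python
# Two hypothetical sources that describe customers with different schemas.
crm = [
    {"cust_id": 1, "name": "Ada", "region": "north"},
    {"cust_id": 2, "name": "Ben", "region": "south"},
]
billing = [
    {"customer": 1, "total_spend": 120.0},
    {"customer": 3, "total_spend": 75.5},
]


def integrate(crm_rows, billing_rows):
    """Merge both sources into one record per customer id (a tiny 'warehouse' table)."""
    warehouse = {}
    for row in crm_rows:
        warehouse.setdefault(row["cust_id"], {}).update(
            name=row["name"], region=row["region"])
    for row in billing_rows:
        # Schema mapping: billing's 'customer' is the same key as the CRM's 'cust_id'.
        warehouse.setdefault(row["customer"], {})["total_spend"] = row["total_spend"]
    return warehouse


merged = integrate(crm, billing)
print(merged)
```

Customer 3 exists only in billing, and customer 2 has no spend recorded. Integration exposes exactly this kind of gap, which cleaning must then handle.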



Steps of a KDD Process

Data Selection: Data selection is the process in which the data
relevant to the analysis is identified and retrieved from the data
collection.
• Data selection using neural networks.
• Data selection using decision trees.
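Decision-tree learners such as ID3 rank attributes by information gain, so the same measure can be used to select relevant attributes. Below is a sketch of that idea on a hypothetical churn dataset (the neural-network approach is not shown):

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def select_attributes(rows, attributes, target, k):
    """Keep the k attributes with the highest information gain."""
    ranked = sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)
    return ranked[:k]


# Hypothetical customer records.
rows = [
    {"contract": "monthly", "region": "north", "churn": "yes"},
    {"contract": "monthly", "region": "south", "churn": "yes"},
    {"contract": "yearly", "region": "north", "churn": "no"},
    {"contract": "yearly", "region": "south", "churn": "no"},
]
print(select_attributes(rows, ["contract", "region"], "churn", k=1))
```

Here `contract` separates the classes perfectly (gain 1.0), while `region` tells us nothing (gain 0.0), so only `contract` is kept.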

Data Transformation: Data transformation is the process of
transforming data into the appropriate form required by the mining
procedure.
• Data mapping: assigning elements from the source base to the
destination to capture transformations.
• Code generation: creation of the actual transformation program.
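Data mapping and code generation can be illustrated together. A hypothetical mapping specification lists, for each source field, the destination field and a transformation. A small function then turns that specification into a callable transformation program. The field names and the reference year 2022 are assumptions:

```python
# Hypothetical mapping specification: source field -> (destination field, transformation).
MAPPING = {
    "dob": ("age_years", lambda v: 2022 - int(v[:4])),  # crude: ignores birthdays
    "gender": ("is_female", lambda v: 1 if v.strip().upper() == "F" else 0),
    "income": ("income_k", lambda v: round(float(v) / 1000, 1)),
}


def generate_transform(mapping):
    """Stand-in for code generation: build a transformation function from the spec."""
    def transform(record):
        return {dest: fn(record[src]) for src, (dest, fn) in mapping.items()}
    return transform


transform = generate_transform(MAPPING)
print(transform({"dob": "1990-05-17", "gender": "f", "income": "52340"}))
```

Real tools may emit SQL or script files rather than a closure, but the principle is the same: the mapping is declared once and the program is derived from it.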



Steps of a KDD Process

Data Mining: Data mining is the step in which intelligent techniques
are applied to extract potentially useful patterns.
• Transforms task-relevant data into patterns.
• Decides the purpose of the model, e.g. classification or
characterization.
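The slide does not name a specific mining technique. As one illustration, this sketch finds frequent item pairs in hypothetical shopping baskets, which is the first counting step of Apriori-style association mining:

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Item pairs whose support (fraction of transactions containing both) >= min_support."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
print(frequent_pairs(baskets, min_support=0.5))
```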

Pattern Evaluation: Pattern evaluation identifies the truly interesting
patterns representing knowledge, based on given interestingness measures.
• Finds the interestingness score of each pattern.
• Uses summarization and visualization to make the data
understandable to the user.
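For association rules, common interestingness measures are support, confidence and lift. A lift above 1 means the items occur together more often than they would if they were independent. Below is a sketch for single-item rules on hypothetical baskets:

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift for the single-item rule antecedent -> consequent."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    both = sum(1 for t in sets if antecedent in t and consequent in t) / n
    ante = sum(1 for t in sets if antecedent in t) / n
    cons = sum(1 for t in sets if consequent in t) / n
    confidence = both / ante
    lift = confidence / cons
    return {"support": both, "confidence": confidence, "lift": lift}


# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
print(rule_measures(baskets, "butter", "milk"))
print(rule_measures(baskets, "bread", "milk"))
```

Both rules have the same support (0.5). However, butter -> milk has a lift of about 1.33, while bread -> milk has a lift below 1. Ranking by an interestingness measure is therefore what separates the two rules.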



Steps of a KDD Process

Knowledge representation: Knowledge representation is the technique
of using visualization tools to present data mining results.
• Generate reports.
• Generate tables.
• Generate discriminant rules, classification rules,
characterization rules, etc.



KDD Steps can be Merged
Data cleaning + data integration = data pre-processing
Data selection + data transformation = data consolidation

KDD Is an Iterative Process

Principles of Knowledge Discovery in Data


CRISP-DM

Cross-Industry Standard Process for Data Mining

Aim:
• To develop an industry, tool and application neutral
process for conducting Knowledge Discovery
• Define tasks, outputs from these tasks, terminology and
mining problem type characterization

Designed by Integral Solutions Ltd., NCR (databases),
Daimler Chrysler, and OHRA (insurance company);
the last two provided data and case studies

CRISP-DM Special Interest Group ~ 200 members
• Management Consultants
• Data Warehousing and Data Mining Practitioners



Levels of Abstraction

Phases
• Example: Data Preparation

Generic Tasks
• A stable, general and complete set of tasks
• Example: Data Cleaning

Specialized Task
• How the generic task is carried out
• Example: Missing Value Handling



Phases of CRISP-DM


Data Mining is a standards-based, iterative and adaptive process



Business Understanding Phase

The focus is on understanding the project objectives and requirements
from a business perspective, then converting them into a DM problem
definition.

Understand the business objectives
• What is the status quo?
• Understand business processes
• Associated costs/pain
• Define the success criteria
• Develop a glossary of terms: speak the language
• Cost/Benefit Analysis

Current Systems Assessment
• Identify the key actors
• Minimum: The Sponsor and the Key User
• What should the output look like?
• Integration of output with existing technology landscape
• Understand market norms and standards



Business Understanding Phase

Task Decomposition
• Break down the objective into sub-tasks
• Map sub-tasks to data mining problem definitions

Identify Constraints
• Resources
• Law e.g. Data Protection

Build a project plan
• List assumptions and risk (technical/financial/business/
organisational) factors



Data Understanding Phase

Includes identification of data quality problems,
discovery of initial insights into the data, and
detection of interesting data subsets.

Collect Data
• What are the data sources?
• Internal and External Sources
• Document reasons for inclusion/exclusions
• Depend on a domain expert
• Accessibility issues
  - Legal and technical
• Are there issues regarding data distribution
across different databases/legacy systems
• Where are the disconnects?



Data Understanding Phase II

Data Description
• Document data quality issues
• Requirements for data preparation
• Compute basic statistics


Data Exploration
• Data Quality Issues
• Missing Values
  - Understand their source: missing vs. null values
• Strange Distributions
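Basic statistics and the missing-vs-null distinction can be computed together. In this sketch, a field that is absent from a record counts as "missing", while a field that is present with the value `None` counts as "null". The records are hypothetical:

```python
from statistics import mean, median

# Hypothetical records.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},  # field present but recorded as null
    {"income": 48000},               # 'age' field absent altogether (missing)
    {"age": 29, "income": 250000},
]


def describe(rows, field):
    """Basic statistics for one field, counting absent and null values separately."""
    absent = sum(1 for r in rows if field not in r)
    null = sum(1 for r in rows if field in r and r[field] is None)
    values = [r[field] for r in rows if r.get(field) is not None]
    return {
        "count": len(values),
        "absent": absent,
        "null": null,
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


print(describe(records, "age"))
print(describe(records, "income"))
```

For income, the mean (102750) sits far above the median (56500) because of one very large value. This is the kind of strange distribution worth documenting at this stage.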



Data Preparation Phase

Covers all activities to construct the final
dataset, which constitutes the data to be fed into
DM tool(s) in the next step.


Integrate Data
• Joining multiple data tables
• Summarisation/aggregation of data


Select Data
• Attribute subset selection
• Rationale for Inclusion/Exclusion
• Data sampling
• Training/Validation and Test sets
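Below is a minimal sketch of splitting a prepared dataset into training, validation and test sets. The 60/20/20 proportions and the fixed seed are assumptions chosen for illustration:

```python
import random


def train_val_test_split(rows, train=0.6, val=0.2, seed=42):
    """Shuffle a copy once, then cut it into training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


rows = list(range(100))
tr, va, te = train_val_test_split(rows)
print(len(tr), len(va), len(te))
```

Fixing the seed makes the split reproducible, which supports the "reliable and repeatable" goal of a standard process.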



Data Preparation Phase II

Data Transformation
• Using functions such as log
• Factor/Principal Components analysis
• Normalization/Discretisation/Binarisation
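Several of the listed transformations fit in a few lines each. This sketch applies a log transform, min-max normalisation to [0, 1], equal-width discretisation and threshold binarisation to hypothetical income values. Factor/principal components analysis is omitted:

```python
from math import log10


def log_transform(values):
    """Compress large, skewed values (requires v > 0)."""
    return [log10(v) for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the smallest value maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Discretise into k bins of equal width; returns bin indices 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value into the last bin instead of bin k
    return [min(int((v - lo) / width), k - 1) for v in values]


def binarise(values, threshold):
    """1 if the value reaches the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


incomes = [10000, 20000, 40000, 100000]  # hypothetical values
print(log_transform(incomes))
print(min_max(incomes))
print(equal_width_bins(incomes, 3))
print(binarise(incomes, 30000))
```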


Clean Data
• Handling missing values/Outliers


Data Construction
• Derived Attributes
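Outlier handling and derived attributes can be sketched as follows. Capping values at Tukey's fences (1.5 times the IQR beyond the quartiles) is one common rule, not the only one, and `spend_per_visit` is a hypothetical derived attribute:

```python
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def add_spend_per_visit(record):
    """Derived attribute (hypothetical): total spend divided by number of visits."""
    visits = record["visits"]
    ratio = record["total_spend"] / visits if visits else None
    return {**record, "spend_per_visit": ratio}


readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(cap_outliers_iqr(readings))
print(add_spend_per_visit({"total_spend": 120.0, "visits": 4}))
```

Only the reading of 102 lies beyond the upper fence, so it is capped. All other values pass through unchanged.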



The Modelling Phase

Selects and applies various modeling tools. It involves using
several methods for the same DM problem and optimizing
their parameters.

Selection of the appropriate modelling technique
• Data pre-processing implications
  - Attribute independence
  - Data types/Normalisation/Distributions
• Dependent on
  - Data mining problem type
  - Output requirements


Develop a testing regime
• Sampling
• Verify samples have similar characteristics and are representative
of the population
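One way to make test samples representative is stratified sampling, which draws the same fraction from every class. Comparing class proportions afterwards then verifies that the sample resembles the population. The churn labels and the 20/80 split are hypothetical:

```python
import random
from collections import Counter


def stratified_sample(rows, label, fraction, seed=0):
    """Draw the same fraction from every class so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label], []).append(row)
    sample = []
    for members in by_class.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def class_proportions(rows, label):
    """Share of each class label among the rows."""
    counts = Counter(row[label] for row in rows)
    return {cls: n / len(rows) for cls, n in counts.items()}


# Hypothetical population: 200 churners and 800 non-churners.
population = [{"id": i, "churn": "yes" if i < 200 else "no"} for i in range(1000)]
sample = stratified_sample(population, "churn", fraction=0.1)
print(class_proportions(population, "churn"))
print(class_proportions(sample, "churn"))
```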



The Modelling Phase

Build Model
• Choose initial parameter settings
• Study model behaviour
• Sensitivity analysis
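Sensitivity analysis means re-running the model over a range of parameter settings and observing how much the results change. Below is a sketch with a hypothetical one-parameter model that predicts 1 when a score reaches a threshold:

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Accuracy of the hypothetical model: predict 1 when score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def sensitivity(scores, labels, thresholds):
    """Re-evaluate the model for each parameter setting."""
    return {t: accuracy_at_threshold(scores, labels, t) for t in thresholds}


# Hypothetical model scores and true labels.
scores = [0.1, 0.2, 0.45, 0.5, 0.65, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1, 1]
print(sensitivity(scores, labels, [0.3, 0.5, 0.7]))
```

Accuracy peaks at a threshold of 0.5 and drops on either side. If small parameter changes cause large swings, the parameter deserves closer study.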


Assess the model
• Investigate the error distribution
• Identify segments of the state space where the model is less
effective
• Iteratively adjust parameter settings
• Document the reasons for these changes
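Investigating the error distribution by segment is a quick way to find where a model is less effective. The sketch below computes the mean absolute error over hypothetical predictions tagged with a `segment` field:

```python
from collections import defaultdict


def error_by_segment(records):
    """Mean absolute error per segment."""
    errors = defaultdict(list)
    for r in records:
        errors[r["segment"]].append(abs(r["actual"] - r["predicted"]))
    return {seg: sum(e) / len(e) for seg, e in errors.items()}


# Hypothetical predictions.
predictions = [
    {"segment": "urban", "actual": 100, "predicted": 95},
    {"segment": "urban", "actual": 80, "predicted": 84},
    {"segment": "rural", "actual": 60, "predicted": 40},
    {"segment": "rural", "actual": 50, "predicted": 75},
]
mae = error_by_segment(predictions)
worst = max(mae, key=mae.get)
print(mae, "-> weakest segment:", worst)
```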



The Evaluation Phase

Results are evaluated and reviewed from the perspective of the
business objectives.

Validate Model
• Human evaluation of results by domain experts
• Evaluate usefulness of results from business
perspective
• Define control groups
• Calculate lift curves
• Expected Return on Investment
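A lift curve compares the response rate among the highest-scored customers with the overall response rate. This sketch computes cumulative lift per bin from hypothetical scores and 0/1 responses. It assumes at least one responder; otherwise the base rate is zero:

```python
def cumulative_lift(scores, responses, n_bins=5):
    """Cumulative lift per bin: response rate in the top-scored b bins / overall rate."""
    ranked = [y for _, y in sorted(zip(scores, responses), key=lambda p: p[0], reverse=True)]
    base_rate = sum(ranked) / len(ranked)  # assumes at least one responder
    size = len(ranked) // n_bins
    lifts = []
    for b in range(1, n_bins + 1):
        top = ranked[: b * size] if b < n_bins else ranked
        lifts.append((sum(top) / len(top)) / base_rate)
    return lifts


# Hypothetical model scores and observed responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print([round(x, 2) for x in cumulative_lift(scores, responses)])
```

A lift of 2.5 in the first bin means the top 20% of scored customers respond at 2.5 times the average rate. The curve falls to 1.0 once everyone is included.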

Review Process

Determine next steps
• Potential for deployment
• Deployment architecture
• Metrics for success of deployment



The Deployment Phase

Organization and presentation of the
discovered knowledge in a user-friendly way.

Knowledge Deployment is specific to
objectives
• Knowledge Presentation
• Deployment within Scoring Engines and
Integration with the current IT infrastructure
• Automated pre-processing of live data feeds
• Generation of a report
• Online/Offline
• Monitoring and evaluation of effectiveness
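Deploying a model within a scoring engine means applying exactly the same pre-processing used during modelling to each live record before scoring it. Below is a minimal sketch with a hypothetical linear model stored as a dictionary and a feed of JSON lines:

```python
import json

# Hypothetical artefacts saved after modelling: scaling parameters plus model weights.
MODEL = {
    "income_min": 10000,
    "income_max": 100000,
    "weights": {"income": -2.0, "complaints": 0.8},
    "bias": 0.5,
}


def preprocess(record, model):
    """Apply the same min-max scaling used in training to a live record."""
    span = model["income_max"] - model["income_min"]
    return {"income": (record["income"] - model["income_min"]) / span,
            "complaints": record["complaints"]}


def score(record, model):
    """Linear score: bias plus weighted sum of the pre-processed features."""
    x = preprocess(record, model)
    return model["bias"] + sum(w * x[f] for f, w in model["weights"].items())


def score_feed(lines, model):
    """Score a live feed of JSON lines, one record per line."""
    return [round(score(json.loads(line), model), 3) for line in lines]


feed = ['{"income": 55000, "complaints": 2}', '{"income": 90000, "complaints": 0}']
print(score_feed(feed, MODEL))
```

Keeping the pre-processing parameters next to the weights ensures that live data is transformed exactly as the training data was, which is the main risk this step guards against.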

Process deployment/production

Produce final project report
• Document everything along the way

Advantages of the CRISP-DM model
• uses easy-to-understand vocabulary and is well
documented
• acknowledges the iterative nature of the process, with
loops between the steps
• frequently used because of its grounding in real-world
industrial practice; example applications:
• medicine, engineering, marketing, sales
Thank You.
