Implementing A Data Warehouse Via Vertical Slicing
One of the fundamental concepts of taking Disciplined Agile's (DA) approach to development is to slice functionality vertically into small, consumable pieces that may be potentially
deployed into production quickly. These vertical slices are completely implemented - the analysis, design, programming, and testing are complete - and offer real business value to
stakeholders. Although it is fairly clear how to do this when you're building a website, or a mobile application, it isn't clear how to do so if you're building a data warehouse (DW) or
business intelligence (BI) solution.
The focus of this article is to describe what vertical slicing potentially means for a DW/BI solution.
For a DW/BI solution a vertical slice is fully implemented from beginning (the data sources) to end (accessibility by end users in the DW or BI output). This means that you have fully
implemented (within a matter of days or even hours):
Extraction from the data source(s). This is required only for the data elements that you need for the given vertical slice. For a mature DW you are likely to have most of the
data elements already, and maybe even all of them. The worst case is when you need elements from one or more "new" data sources that you have never accessed before. This
will require the initial work to gain access to the data source and to analyze the data source (ideally read its supporting documentation) to identify the data elements that you
require.
Staging of the raw source data. I typically recommend that, whenever you access a table in a data source for the first time, you stage the entire table at that point. The
implication is that you may already be staging the required data elements even if you've never needed them before this point. When that's not the case you'll need to do the
work to stage the required tables. Of course, if your DW architecture doesn't include staging incoming raw data first then this step should be skipped.
Transformation/cleansing of the source data. You need to do the work, if any, to transform the incoming source data for just the new data elements that you require for
this vertical slice.
Loading the data into the DW. Once again, you need to do this for just the new data elements required for this vertical slice.
Loading your data marts (DMs). If you have DMs and this vertical slice needs the data elements in them, then you will need to do this work too.
Updating the appropriate BI views/reports where needed. As you'll soon see, your slice may simply make some data available in the DW or DMs for ad-hoc reporting.
A common theme running through all of those steps is that you only do the work for the vertical slice that you're currently working on. This is what enables you to get the work done
in a matter of days (and even hours once you get good at it) instead of weeks or months.
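The steps above can be sketched end to end for a single, very small slice. This is only an illustration under stated assumptions, not a prescribed implementation: it uses Python's built-in sqlite3 module as a stand-in for the staging area, the DW, and the BI layer, and the names run_slice, stg_student, dw_student, and rpt_student_names are hypothetical. The slice carries just one data element (the student name) from source to report, while the whole source table is still staged.

```python
import sqlite3


def run_slice(conn, source_rows):
    """Carry one vertical slice (student names only) from source to report.

    source_rows stands in for rows extracted from a source table, given as
    (id, name, program) tuples. All table and view names are hypothetical.
    """
    cur = conn.cursor()

    # Extract and stage: land the entire source table as-is, even though
    # this slice only needs the name column.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS stg_student (id INTEGER, name TEXT, program TEXT)"
    )
    cur.executemany("INSERT INTO stg_student VALUES (?, ?, ?)", source_rows)

    # Transform and load: cleanse and load only the elements this slice needs.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS dw_student ("
        "student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL)"
    )
    cur.execute(
        """
        INSERT OR REPLACE INTO dw_student (student_id, student_name)
        SELECT id, TRIM(name)
        FROM stg_student
        WHERE name IS NOT NULL AND TRIM(name) <> ''
        """
    )

    # BI output: expose the new data element through a report view.
    cur.execute(
        "CREATE VIEW IF NOT EXISTS rpt_student_names AS "
        "SELECT student_name FROM dw_student ORDER BY student_name"
    )
    conn.commit()

    return [row[0] for row in conn.execute("SELECT student_name FROM rpt_student_names")]
```

Calling run_slice(sqlite3.connect(":memory:"), rows) with a handful of rows returns the cleansed, sorted names. A later slice would add only the further elements it needs (for example the program column, which is already staged) rather than revisiting the whole pipeline.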
1. Reduced feedback cycle. By focusing on delivering small, vertical slices every few days or weeks you have more opportunities to show working functionality to
stakeholders and thereby receive concrete feedback that you can act on. This enables your stakeholders to steer your work more effectively. It also motivates your team to
test throughout the lifecycle, thereby dramatically reducing the overall cost of fixing any defects that you find.
2. Increased ability to meet actual stakeholder needs. By taking a flexible, evolutionary approach to developing your DW/BI solution, where you regularly seek feedback, you
end up discovering what your stakeholders actually need in practice. With a traditional approach, where you attempt to think everything through up front, the best you can
possibly do is to build something to specification. This is unfortunately ineffective because people are not good at defining their needs up front, and even if they were they
would change their minds anyway due to changes in the marketplace.
3. More competitive. Delivering in small, incremental slices enables your team to react to changing requirements quickly. The ability to deploy these vertical slices easily
enables your organization to react quickly to marketplace dynamics, thereby increasing your competitiveness.
4. Increased quality. Vertical slicing forces data professionals to adopt modern, agile database techniques that have a significantly greater focus on quality than do traditional
techniques. Agile database techniques, such as database refactoring and database regression testing, are clearly focused on data quality.
5. Lower implementation risk. Working in small vertical slices forces the team to fully integrate and test their solution very early in the lifecycle. If there are integration issues
they will be found much earlier in the lifecycle when they are easier and less expensive to address.
6. Reduced cost of delay. Delivering in vertical slices enables teams to get working functionality into the hands of their stakeholders quickly, reducing overall cost of delay
(opportunity cost from a management accounting point of view).
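To make the cost of delay point concrete, here is a small worked calculation. Every figure in it is hypothetical and chosen only for illustration: six slices, two weeks of work per slice, each slice worth 1,000 per week once it is in production, measured over a 26-week horizon. It also assumes, generously to the serial approach, that both approaches take the same total effort.

```python
def value_accrued(delivery_weeks, weekly_value, horizon_weeks):
    """Total value earned by the horizon, given the week each slice goes live."""
    return sum(weekly_value * max(0, horizon_weeks - week) for week in delivery_weeks)


# Hypothetical figures, for illustration only.
slices = 6
weeks_per_slice = 2
weekly_value = 1_000
horizon = 26

# Incremental delivery: each slice goes live as soon as it is finished.
incremental = [weeks_per_slice * (i + 1) for i in range(slices)]

# Serial delivery: nothing goes live until all of the work is finished.
big_bang = [weeks_per_slice * slices] * slices

incremental_value = value_accrued(incremental, weekly_value, horizon)
big_bang_value = value_accrued(big_bang, weekly_value, horizon)

# The opportunity cost of holding finished work back until the end.
cost_of_delay = incremental_value - big_bang_value  # 30000 with the figures above
```

With these assumed numbers the incremental approach accrues 114,000 of value against 84,000 for the serial approach. The gap comes purely from the earlier slices earning value while the later ones are still being built.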
There are several common complaints about working this way, but they rarely seem to hold water in practice. These complaints are:
1. It takes longer to deliver the overall solution. No, the traditional/serial approach tends to take longer in practice because there is less sense of urgency and because the
team is likely to spend time building functionality that stakeholders don't actually want (because it builds to the specification). By building incrementally you deliver smaller, valuable
functionality into production sooner, thereby reducing cost of delay.
2. We need to think everything through at the beginning. Yes, it is a good idea to do some up front thinking, which is why disciplined agile techniques such as requirements
envisioning and architectural envisioning exist.
3. It's more expensive in the long run. This is also very rare in practice. Furthermore, the real issue is producing value, not the expense of doing so. Agile teams enjoy
higher levels of ROI on average than traditional teams because they work in priority order and deliver incrementally (once again, reducing cost of delay).
There are several interesting things about the stories in the table:
1. They are written from the point of view of your stakeholders. They aren't a technical specification. For example, the first story describes how professors want a list of
student names but it isn't saying from what data source(s), what the element names are, … These are design issues, not requirement issues.
2. They always provide business value. The first story appears to be the beginnings of an attendee list for a seminar. Having something as simple as a list of names does in
fact provide a bit of value to professors.
3. Sometimes that business value isn't (yet) sufficient. It may take several iterations to implement something that your stakeholders want delivered into production,
particularly at first. For example, although a list of student names is the beginnings of a class list it might not be enough functionality to justify putting it into production.
Perhaps professors also need to know the program that the student is enrolled in, their current year of study, and basic information about the seminar such as its course
name, time, and location. The decision as to whether the functionality is sufficient to ship is in the hands of your stakeholders (this is one of the reasons why you want to
demo your work on a regular basis).
Agile data modeling - For lightweight initial modeling and evolutionary detailed modeling of your data structures.
Agile modeling in general - There's more to modeling than data.
Database refactoring - To safely and easily evolve existing databases, including your data warehouse and data marts.
Database regression testing - To validate your work in an automated manner.
Continuous database integration - To ensure changes are automatically regression tested.
Continuous database deployment - To ensure working updates to your database are shared appropriately.
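Two of the techniques listed above, database refactoring and database regression testing, can be sketched together. This is a simplified illustration, not a full treatment: it uses Python's built-in sqlite3 and unittest modules, the table and column names are hypothetical, and the refactoring shown (introducing a better-named column alongside the legacy one) leaves out the synchronization between old and new columns that a real transition period would need.

```python
import sqlite3
import unittest


def rename_column_refactoring(conn):
    """Introduce student_name alongside the legacy name column.

    The old column is kept so that existing consumers keep working while
    they migrate. Names are hypothetical.
    """
    conn.execute("ALTER TABLE dw_student ADD COLUMN student_name TEXT")
    conn.execute("UPDATE dw_student SET student_name = name")


class StudentRegressionTest(unittest.TestCase):
    """Checks that the refactoring preserved the data."""

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE dw_student (student_id INTEGER PRIMARY KEY, name TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO dw_student VALUES (?, ?)",
            [(1, "Ada Lovelace"), (2, "Alan Turing")],
        )
        rename_column_refactoring(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_no_rows_lost(self):
        count = self.conn.execute("SELECT COUNT(*) FROM dw_student").fetchone()[0]
        self.assertEqual(count, 2)

    def test_new_column_matches_old(self):
        # IS NOT is SQLite's null-safe inequality, so NULLs are compared too.
        mismatches = self.conn.execute(
            "SELECT COUNT(*) FROM dw_student WHERE student_name IS NOT name"
        ).fetchone()[0]
        self.assertEqual(mismatches, 0)


def run_regression_suite():
    """Run the suite and report success, as an automated build step might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StudentRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A function such as run_regression_suite is the kind of step that continuous database integration would run automatically on every change, so that a refactoring which loses or corrupts data is caught before it is deployed.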
Figure 1 below depicts a mini-waterfall approach where a team works through the traditional phases, mostly in order, throughout an iteration/sprint. These iterations are typically
longer than usual, often four or more weeks in length, whereas 80% of agile teams have iterations of two weeks or less. Mini-waterfalls are common with teams that are very new to
agile and in this case should be seen as a step in the right direction away from the traditional/serial approach towards an agile approach. However, if you're taking a mini-waterfall
approach because of one or more of the complaints discussed earlier, then what's really happened is that the team is using one of those flimsy excuses for
not making the behavioral changes required to be truly agile.
Figure 1. A mini-waterfall.
The Staggered Mini-Waterfall anti-pattern is depicted in Figure 2 below. The basic strategy is that the team is organized into functional silos such as data analysts, data
architects/designers, developers, and testers - usually along the lines of what people were comfortable with when taking a traditional approach. The analysts do their "sprint" where they
complete the data analysis work for one or more stories. They then hand this off to the designers who do their "design sprint", who hand off to the developers to do their
"development sprint", and finally to the testers who do their "testing sprint." Once the analysts hand off their work to the designers they move on to analyze the next batch of
requirements (often user stories). Once again, at best this might be a step towards becoming agile but it certainly isn't agile. Many times when I run into a DW/BI team taking this
approach it's because the team is composed of people who are overly specialized (remember, agilists strive to become cross-functional generalizing specialists) and often have not
bothered to learn modern agile database skills. This is OK if you're just starting out with agile: as we like to say, you go to war with the army that you've got, so if everyone is a
specialist then that's how you start out. But when you invest in your people, and when team members recognize the importance of learning new skills, they can quickly learn
those skills from one another by working together.
As we show in the article Disciplined Agile Data Warehousing, it is in fact possible for DW/BI teams to work in an agile manner. There is absolutely no reason, except as a step in
your team's overall learning effort, to follow either a Mini-Waterfall or Staggered Mini-Waterfall approach. You can and should do better.
7. Parting Thoughts
Vertical slicing is an important skill for any agile team, regardless of what they are building. In this article you learned that it is highly desirable to slice a DW/BI solution vertically and, more
importantly, that the techniques exist to do so. For most people the hardest thing about vertical slicing is adopting the agile mindset behind working this way, something that can be
very tough for experienced data professionals given the cultural impedance mismatch between traditional data professionals and modern agile practitioners.
At Scott Ambler + Associates we help teams to become more effective in the way that they work. We coach, educate and train people in advanced agile and lean skills. We have a
wide variety of workshops that we deliver, including one on Disciplined Agile DB/BI skills. We would love to help you on your agile journey.
8. Recommended Resources
Recommended Reading
This book, Choose Your WoW! A Disciplined Agile Delivery Handbook for Optimizing Your Way of Working, is an indispensable guide for agile coaches and
practitioners to identify what techniques - including practices, strategies, and lifecycles - are effective in certain situations and not as effective in others. This
advice is based on proven experience from hundreds of organizations facing similar situations to yours. Every team is unique and faces a unique situation,
therefore they must choose and evolve a way of working (WoW) that is effective for them. Choose Your WoW! describes how to do this effectively, whether they
are just starting with agile/lean or if they're already following Scrum, Kanban, SAFe, LeSS, Nexus, or other methods.
I also maintain an agile database books page which overviews many books you will find interesting.