What is a data warehouse and what are the benefits?


What are the benefits of a data warehouse and when do you need one?
June 9, 2018
Analytics
Marketing

Nearly every business leader (99% according to an HBR study) recognizes the need to
build a data-driven organization. However, many struggle to use data for a variety of
reasons, a big one being that data exists in all sorts of places. This is especially the
case with startups, which use a number of out-of-the-box tools that don’t connect to each
other and that only offer so much granularity and flexibility.

The solution?  A data warehouse.

What is a data warehouse?  

A data warehouse, often loosely referred to as a data lake or a data mart, is a single
database or set of databases that contains all of your business data. Typically the data
is piped in from a number of disparate sources – core product, CRM, help desk, analytics
tools, accounting software – basically all of your operational systems. Sometimes the
data is transformed or cleansed on the way in. The data is not accessed by any
operational systems, and thus it’s stored in a manner that optimizes for accessibility
and analysis.

The benefits of a data warehouse

Storing all of your data in a data lake has a number of benefits, including:

Benefit #1: Tables are optimized for read access: You can run the more complex,
inefficient queries that are necessary for analysis but wouldn’t be practical in a
production environment.
Benefit #2: Cheaper queries: Because of the previous point, data warehouses are
typically cheaper to run queries on than production databases.
Benefit #3: Quicker reporting: No need to export CSVs from three different tools
to update metrics.

These are all great, but in my opinion, the biggest benefit of having a data warehouse
is this: there are game-changing insights that can only be achieved when all of your
data is in one place in its raw form.

The next section will cover a few examples of these insights.

Reaping the Benefits: Examples of a data warehouse in action

Here are a handful of examples where I’ve seen first hand the value of a data warehouse.

1 – Risk splitting

In the loan industry – where I first cut my teeth in marketing and analytics – it’s typical
that customers who actually read terms and conditions before hitting the ‘apply’
button are less risky. By less risky, I mean less likely to default on their loan.

For the lending institution, this is a big freaking deal. First of all, they have to decide
whether to approve this borrower or not. Secondly, they have to decide which line of
credit to extend, and at what interest rate. That, combined with their assessment of the
borrower’s riskiness, dictates capital requirements. At an aggregate level, we’re talking
billions of dollars here. All this is driven by a few click and scroll events indicating
whether a person read the terms or not.

So what’s my point? Well, for one, be sure to take 5 minutes to read (or pretend to read)
the terms when applying for a credit card.

But the bigger point is that pre-purchase behavior can be highly correlated with
post-purchase behavior, and can make really freaking big differences.

Most startups that have their act together will set up a proper funnel, typically using
an out-of-the-box tool like Google Analytics or Mixpanel, and will hopefully start A/B
testing and optimizing for conversion rate. Unfortunately, that’s typically where it stops.
While these tools are great, they typically offer no way to connect pre- and post-purchase
behavior. That’s where the data warehouse comes in.

2 – Calculating customer lifetime value by channel

Let’s suppose you’re running growth for a startup that’s just starting to get traction.
After a bunch of Excel finagling, you come to the conclusion that each customer will be
worth $28 after year one. You decide that you want to target a 1 year breakeven (this is
a whole debate in and of itself that we can save for another post).

After a month of testing and tweaking, you think you have 4 viable channels, as all of
them cost $28 or less per customer. Here are their costs to acquire (CAC), broken down
by channel:

[Chart: customer acquisition cost (CAC) by channel]

One year down the road, you plot your cumulative LTV curve, and it turns out your
initial estimate was correct: your 1 year LTV is in fact around $28.


Everything looks good, right?

Adwords – your most expensive channel – is breaking even, and you know you can
easily optimize that and/or adjust your bids if need be.
Facebook and Twitter are both $2 below your target CPA, suggesting room to bid a
tiny bit more.
Your cheesy SXSW gimmicks yielded the lowest CAC, so you gear up to tour
nerd-fests around the country.

All seems good – right?

Lots of companies stop here, and can’t really go any further. If you have a data
warehouse set up, you can get a much more granular level of insight.
Suppose we decide to break these LTV curves down by channel and see something like
the following.


This tells a very different story:

Facebook customers are crushing it with a $38 1 year LTV, compared to a $26 CAC.
You can afford to spend more!
Adwords customers are pretty darn close to breakeven, so you’re probably bidding
right. That said, LTV appears to be increasing at a fairly linear pace, so you may
even be able to increase your spend a bit.
Twitter customers’ LTV plateaus at $24, under your cost per acquisition. You
definitely didn’t hit 1 year breakeven, and you may never break even.
Your SXSW gimmicks were actually an utter failure. They plateaued around month
9 with an LTV of $20, and will probably never break even.

While in aggregate it appears as if you’re doing well, when you break it down by channel
you’re missing out on some channels and overspending on others.

If you never connect pre-purchase data (the customers’ source) to post-purchase activity
(the customers’ spend), you’d never know that two of your channels are flawed and that
you’re underinvesting in a third. Different channels quite often have different lifetime
values, and you can really screw yourself if you don’t have this insight. This is a case
where having a properly set up data lake can help you out a lot.

This is a fairly straightforward example, and I debated even including it because I feel
like most people understand this concept intuitively.  However, very few companies I
know practice it, especially consumer companies that don’t use an out of the box CRM.
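
For the curious, here is a rough sketch of what that channel breakdown could look like once source and spend data sit side by side. The two row shapes (`customers` and `payments`) are hypothetical stand-ins for warehouse tables, and the amounts are made up so that the totals echo the example above; this is an illustration, not the setup we actually ran.

```python
from collections import defaultdict

# Hypothetical rows, shaped like two warehouse tables (all values are made up).
# customers: (customer_id, acquisition_channel)
customers = [
    (1, "facebook"),
    (2, "facebook"),
    (3, "twitter"),
    (4, "sxsw"),
]
# payments: (customer_id, months_since_signup, amount)
payments = [
    (1, 0, 10.0), (1, 6, 14.0), (1, 11, 16.0),
    (2, 1, 12.0), (2, 9, 24.0),
    (3, 0, 15.0), (3, 3, 9.0),
    (4, 0, 20.0),
]


def cumulative_ltv_by_channel(customers, payments, months=12):
    """Return {channel: [cumulative revenue per acquired customer, by month]}."""
    channel_of = dict(customers)

    # Cohort size = number of customers acquired through each channel.
    cohort_size = defaultdict(int)
    for _, channel in customers:
        cohort_size[channel] += 1

    # Bucket revenue by channel and by month of customer age.
    revenue = {channel: [0.0] * months for channel in cohort_size}
    for customer_id, month, amount in payments:
        if month < months:
            revenue[channel_of[customer_id]][month] += amount

    # Running total per channel, divided by the number of original customers.
    curves = {}
    for channel, monthly in revenue.items():
        running, curve = 0.0, []
        for value in monthly:
            running += value
            curve.append(running / cohort_size[channel])
        curves[channel] = curve
    return curves


curves = cumulative_ltv_by_channel(customers, payments)
for channel, curve in sorted(curves.items()):
    print(f"{channel}: 1 year LTV per customer = ${curve[-1]:.2f}")
```

The key step is the lookup from `customer_id` to channel: it only works if the acquisition source and the payment history share an identifier, which is exactly what the out-of-the-box tools tend not to give you.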

3 – Data warehouses provide greater predictive power

Around the time we raised our Series A round of financing last spring, I was interested
in digging into how to increase retention.

Fortunately, I knew of a fellow Techstars company called Data Robot, which allows
people like me (non-data scientists) to build machine learning models pretty easily.



To build your model, you just upload a tabular CSV file that contains the variable
you want to predict (the output) and whatever variables you think might be correlated
with it.

In my case, the predicted output was whether the person had canceled during the last
month, and my inputs were all sorts of things: lot size, price, average rating, NPS score,
location, lead source, whether they joined via our app or online, and all sorts of activity.

Data Robot then builds a bunch of machine learning models for you, and recommends
the best to use based on a variety of metrics.  It also does a univariate analysis on your
dataset, and shows which variables play the biggest role in the outputs.  I won’t get too
into the details here, but it’s a pretty cool tool.

Turns out the data we had collected actually had a good amount of predictive power.

Creating this model, I learned a few things:

1. Data Robot is really freaking cool – seriously, you should check it out. In a night, a
guy who slept through stats class made a real machine learning model.
2. You don’t need a ginormous dataset to start using predictive analytics – I always
sort of assumed you needed to be at Pinterest-scale to warrant doing data science.
Yet there we were, barely a Series A stage company, making models that had a
real impact.
3. You never know what factors are going to matter – turns out that people who spell
out “street” vs abbreviating it “st” are more likely to churn…weird.

On that last point, there are so many datapoints that could help you predict churn, and
all of them live in different systems: a nuance from the sales process (CRM), the number
of times someone contacts support (helpdesk), interactions with your email (marketing
automation tool) or engagement with your app (event tracking).

Predictive analytics can be a powerful business driver, but it hinges on being able to
connect all of your data together. A data warehouse makes that clean and easy.

4 – Cohorting support burden 


A common shortcut in calculating lifetime value is to plot your realized LTV curves on a
gross revenue basis, extrapolate them out, and fudge a terminal value to get your gross
spend per original customer in a cohort. Then, multiply that by your contribution margin
– the % earned after variable costs – which you can find on your income statement.

It’s an easy shortcut and usually suffices, but it’s technically incorrect. Simply dividing
your total contribution by your total revenue lumps together all of your customers,
regardless of whether they are new or old.

At LawnStarter, we found that new customers tend to create more support issues than
established customers.  Many of my friends at B2B companies say the same thing – it’s
more likely a customer contacts support for help learning the product early on.

To properly plot your LTV curves, you can use payroll and your ticket system to cohort
out the support cost (or other variable costs) over time. See how support cost goes
down over time in the fictitious example below?

The more early cohort churn there is in the first couple of months (common for low
price, self serve software), the worse the problem. If you plot the LTV curves for the
same fictitious company using the shortcut method and the support cost cohort method,
you’ll note they both arrive at the same overall value, but the shortcut method makes it
look like your payback period is sooner than it is.

Now, there are many businesses where the shortcut method is close enough, but it’s
important to know if that’s the case or not.  Fooling yourself on payback period can have
big implications when you start focusing on cash conversion cycle.

Without a data warehouse, this exercise would be very difficult.
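
To make the difference between the two methods concrete, here is a small sketch with entirely fictitious numbers. The per-month revenue, the ticket counts, the cost per ticket and the CAC are all assumptions picked for illustration; in practice the ticket counts would come from your ticket system and the cost per ticket from payroll.

```python
from itertools import accumulate

# Fictitious figures for one signup cohort, per original customer, by month of age.
revenue = [10.0, 9.0, 8.0, 8.0, 7.0, 7.0]   # gross revenue
tickets = [1.2, 0.6, 0.2, 0.2, 0.1, 0.1]    # support tickets, heaviest early on
cost_per_ticket = 5.0                       # assumed: support payroll / total tickets
cac = 20.0                                  # assumed cost to acquire one customer

support_cost = [t * cost_per_ticket for t in tickets]


def payback_month(curve, cac):
    """First month (1-indexed) where cumulative contribution covers CAC, else None."""
    for month, value in enumerate(curve, start=1):
        if value >= cac:
            return month
    return None


# Shortcut method: one blended contribution margin applied to cumulative revenue.
blended_margin = (sum(revenue) - sum(support_cost)) / sum(revenue)
shortcut_curve = [blended_margin * total for total in accumulate(revenue)]

# Cohort method: subtract the support cost actually incurred at each customer age.
cohort_curve = list(accumulate(r - c for r, c in zip(revenue, support_cost)))

print("shortcut payback month:", payback_month(shortcut_curve, cac))
print("cohort payback month:  ", payback_month(cohort_curve, cac))
print("same end value:", abs(shortcut_curve[-1] - cohort_curve[-1]) < 1e-9)
```

With these made-up inputs, both curves end at the same lifetime contribution, but the shortcut curve crosses CAC a month earlier than the cohort curve does, because it spreads the early support burden evenly across the customer's life.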

5 – Getting past Salesforce’s terrible reporting

“There are a lot of amazing things about Salesforce, but what’s most amazing is how bad
it is.” – Michael Berliner

Salesforce is the go-to CRM, and it really is the best across most applications. But a
number of its features are lacking, one of them being the reporting.

Not only is the reporting clunky and hard to use, it doesn’t allow you to do a number of
basic things. One that I found frustrating was visualizing a funnel from lead to
opportunity to closed won. Once we piped that data into Redshift, that was a
breeze. It was also much easier to gain insights from.
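
As a loose illustration of the funnel calculation, here is a sketch over hypothetical rows. The field names and stage labels are invented for the example and are not Salesforce's actual values, and it assumes that a record sitting at a later stage passed through every earlier one (closed-lost records are ignored).

```python
# Hypothetical lead rows, shaped like CRM data after it lands in the warehouse.
leads = [
    {"id": "a", "stage": "lead"},
    {"id": "b", "stage": "opportunity"},
    {"id": "c", "stage": "closed_won"},
    {"id": "d", "stage": "lead"},
    {"id": "e", "stage": "closed_won"},
    {"id": "f", "stage": "opportunity"},
]
FUNNEL = ["lead", "opportunity", "closed_won"]


def funnel_counts(rows, stages=FUNNEL):
    """Count the records that reached each stage, in funnel order."""
    rank = {stage: i for i, stage in enumerate(stages)}
    return [
        sum(1 for row in rows if rank[row["stage"]] >= i)
        for i in range(len(stages))
    ]


counts = funnel_counts(leads)
previous = [counts[0]] + counts[:-1]  # the first stage is compared with itself
for stage, count, prev in zip(FUNNEL, counts, previous):
    print(f"{stage:<12} {count:>3}  ({count / prev:.0%} of previous stage)")
```

In a warehouse this would more likely be a single grouped query, but the logic is the same: rank the stages, then count how many records made it at least that far.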

When to invest in a data warehouse

Creating and maintaining a data warehouse is an investment. When should you make
this investment? Whenever you think that you can start gaining real insights.

A startup selling enterprise packages could be doing $10 million in revenue with a
couple dozen customers, whereas a consumer-facing company might have thousands.
The latter is likely to have more customer-level insights simply because of the sample
size.

It also depends on when your out-of-box solutions begin to fall short.  Google Analytics
alone can take you really far if you’re a media site, whereas if you’re a marketplace you’ll
need deeper insights much earlier on.

Do recognize that getting your data warehouse right is somewhat of an iterative
process, and will take time. Even if there are no errors, it’s common to find that data
needs to be collected differently to be useful. Give it time to get it right.

Data warehouse vs data lake vs data mart

Data warehouse and data lake are often used interchangeably and I’m including the
phrase ‘data warehouse vs data lake’ as SEO for that term. A data mart is also
sometimes used interchangeably, but can also refer to a section of a data warehouse.
Marketing, for example, may have their own data mart. Every company has their own
definition, so it’s not worth debating semantics.

Experiencing the benefits of a data warehouse doesn’t require custom work

Fortunately, there are out of the box tools that make creating a data warehouse easy,
with minimal engineering resources needed.

At LawnStarter, we use Segment which, for starters, writes all of our event tracking data
straight to Amazon Redshift. It also takes data from Google Adwords, Salesforce,
Zendesk, Stripe, Google Analytics and even Jira and loads it into the same Redshift
database. Additionally, we use a different service that pipes all of our production
MySQL data into Redshift. We even write search engine rankings straight to Redshift
using SerpDB, an SEO rank tracking tool.

Just because these solutions exist doesn’t mean it’s plug and play. The biggest challenge
that I found was making sure that data was collected in a way that is useful. You need
to think through how you’ll connect each dataset – usually this involves some sort of
unique identifier. Additionally, the processes you’ve been using may not be best for
data integrity. Once we set up our data warehouse, we had to completely redo the
categorization of tickets in Zendesk, for example.
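
To show what connecting datasets through a unique identifier can look like, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the warehouse. The table and column names (`users`, `identities`, `tickets`) are hypothetical, and the mapping-table approach is just one common way to handle a customer who shows up under more than one email address; it is not a description of our actual schema.

```python
import sqlite3

# An in-memory SQLite database standing in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, lead_source TEXT);
    -- One row per known identifier (email, analytics id, ...) for each user.
    CREATE TABLE identities (identifier TEXT PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, requester_email TEXT);

    INSERT INTO users VALUES (1, 'facebook'), (2, 'adwords');
    INSERT INTO identities VALUES
        ('pat@example.com', 1),
        ('pat.work@example.com', 1),
        ('sam@example.com', 2);
    INSERT INTO tickets VALUES
        (100, 'pat@example.com'),
        (101, 'pat.work@example.com'),
        (102, 'sam@example.com'),
        (103, 'unknown@example.com');
""")

# Support tickets per lead source, joined through the identity mapping.
tickets_by_source = conn.execute("""
    SELECT u.lead_source, COUNT(*) AS ticket_count
    FROM tickets t
    JOIN identities i ON i.identifier = t.requester_email
    JOIN users u ON u.user_id = i.user_id
    GROUP BY u.lead_source
    ORDER BY u.lead_source
""").fetchall()

# Tickets that cannot be tied to a known user point at a data collection gap.
unmatched = conn.execute("""
    SELECT COUNT(*)
    FROM tickets t
    LEFT JOIN identities i ON i.identifier = t.requester_email
    WHERE i.user_id IS NULL
""").fetchone()[0]

print(tickets_by_source)
print("unmatched tickets:", unmatched)
```

The count of unmatched tickets is worth watching: if it is large, the fix is usually a process change in how the data is collected rather than a cleverer query.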

5 Comments



Jenny Bebout • 3 years ago

You've got me intrigued by this Data Robot. Many questions about to come your way. You're my
hero, Ryan. 

Ryan Farley Mod > Jenny Bebout • 3 years ago

Thanks Jenny!

Tanner Corbin • 3 years ago

Great post Ryan, thanks for sharing. I'm curious what "database columns to join all these datasets
together" do you use? How do you know you're comparing the same person across datasets?
Thank you!

Ryan Farley Mod > Tanner Corbin • 3 years ago

That's probably the hardest part about it.

For example, with Mixpanel we store Mixpanel's distinct id in a MySQL database column
as soon as we can link a user to a lead. This not only requires a little bit of a hack when
it comes to Mixpanel, but also using their aliasing. As a sidenote, Mixpanel sucks when it
comes to big boy data collection; I can't wait to switch to Segment.

Another challenging one was Zendesk. Zendesk's data uses email address to identify a
customer. However, a customer might use a different email address to create a ticket. Or
they might call in. So we had to create ops processes that ensured we always link it up.

Basically, you have to get the entire organization aligned in order to make it happen, but
it's well worth it in the end.

Thanks for reading!


