A Game Plan For Success in Data Analytics

Table of Contents

YOUR 5-POINT GAME PLAN FOR SUCCESS


A Game Plan for Success in Data Analytics
DEDICATION
A GAME PLAN FOR SUCCESS IN DATA ANALYTICS
INTRODUCTION
MOTIVATION
EXPECTATION
WHO IS THIS BOOK FOR?
WHAT DO I BRING TO THE TABLE?
WHAT DOES THIS BOOK OFFER?
What this book is NOT about
INTRODUCING THE DATA NINJA
The Data Ninja Defined
Ninja Tip 1.
The Ideal Data Ninja Candidate
DATA NINJAS AS INTERPRETERS OF DATA
HIRING OUTLOOK
REAL NUMBERS – HIRING DEMAND
REAL NUMBERS - JOB POSTINGS
WHAT INDUSTRIES DO DATA NINJAS WORK IN?
List of companies and industries that hire Data Ninjas
ROLES AND SPECIALIZATIONS OF DATA NINJAS
HOW MUCH DO DATA NINJAS MAKE?
What’s in it for you?
THE EXPONENTIAL GROWTH OF DATA
Illustration
Dazzling facts about the current growth of data
Some More Data Analytics Facts and Data Trends
WHY ARE DATA NINJAS NEEDED?
WHAT DO DATA NINJAS DO?
From Data to Wisdom and everything in between
The Different States of Data and Information
WHAT IS DATA ANALYSIS?
FIVE STEPS TO DATA ANALYSIS BY DATA NINJAS
Ninja Tip 2.
JOB RESPONSIBILITIES OF A DATA NINJA
Ninja Tip 3.
It’s Your Business to Know Your Business
THE TYPES OF ANALYSIS A DATA NINJA PERFORMS
Ninja Tip 4.
WHAT ARE THE SKILLSETS FOR THE DATA NINJA?
Required Skillsets for the Data Ninja
Ninja Tip 5.
WHAT TOOLS DO DATA NINJAS USE?
Ninja Tip 6.
Tools are Important, but Not the End-All-Be-All
WHAT TRAINING DO DATA NINJAS NEED?
Training for Data Ninja Candidates
Training of Some Sort is Crucial for Success
Ninja Tip 7.
SAMPLE: REAL WORLD DATA NINJA STORY
SAMPLE: REAL WORLD DATA NINJA JOB DESCRIPTION AND POSTINGS
MASTER MS EXCEL
MOTIVATION
MS EXCEL CORE FUNCTIONALITIES
MS EXCEL FOR DATA ANALYSIS
MS EXCEL: AVAILABLE DATA SOURCES
MS EXCEL FOR DATA PIVOTING
MS EXCEL FOR DATA PRESENTATION
MICROSOFT POWERBI
The Future of Power BI
WHAT EXCEL IS NOT
PAIN POINTS WITH EXCEL
Ninja Tip 8.
SUMMARY
MS EXCEL - NINJA RESOURCES CORNER & SHADOW SKILLS
KEY NINJA LESSONS
CONQUER SQL
MOTIVATION
WHAT IS SQL?
WHY IS SQL BENEFICIAL
Ninja Tip 9.
RDBMS SYSTEMS
KEY CHARACTERISTICS OF THE RDBMS
VENDORS OF RDBMS
MANY FLAVORS OF SQL
SQL Portability
DATA ANALYSIS WITH SQL
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL) and Others
Transaction Control Language (TCL)
THE DBA: ROLE AND RESPONSIBILITIES
Ninja Tip 10.
WHY SQL CONTINUES TO REMAIN RELEVANT FOR DATA ANALYTICS?
SUMMARY
STRUCTURED QUERY LANGUAGE (SQL) - NINJA RESOURCES CORNER & SHADOW SKILLS
KEY NINJA LESSONS
TAME DATA WAREHOUSING
MOTIVATION
WHAT IS A DATA WAREHOUSE?
Data Warehousing Defined
DATA WAREHOUSE ARCHITECTURE
WHY IS THE DATA WAREHOUSE IMPORTANT?
KEY BENEFITS OF THE DATA WAREHOUSE
DIMENSIONAL MODELLING
Dimensional Model Pros and Cons
Why is Dimensional Modeling Beneficial to the Data Ninja?
SOME KEY DATA WAREHOUSE DEFINITIONS AND CONCEPTS
Ninja Tip 11.
SUMMARY
MS DATA WAREHOUSING - NINJA RESOURCES CORNER & SHADOW SKILLS
KEY NINJA LESSONS
PICKUP CODING
MOTIVATION
MAKING A CASE FOR PROGRAMMING SKILLS
WHAT IS PROGRAMMING?
WHY LEARN HOW TO PROGRAM?
What Experts Say About the Mastery of Programming Skills
Work your way up the programming ladder
STARTING PROGRAMMING AS A NOVICE
Practice Makes Perfect
Programming Success
WHICH PROGRAMMING LANGUAGES TO LEARN DATA ANALYSIS
DIVERSE SKILL SETS FOR PROGRAMMING
PROGRAMMING IS THE GATEWAY TO BROADER LEARNING
SUMMARY
PROGRAMMING - NINJA RESOURCES CORNER & SHADOW SKILLS
KEY NINJA LESSONS
CONTINUE ADAPTING
MOTIVATION
HYPE CYCLE FOR EMERGING TECHNOLOGIES
PREDICTIONS ARE THE NEXT FRONTIER IN ANALYTICS
Why is Predictive Analytics Important?
WHAT IS PREDICTIVE ANALYTICS ABOUT?
PREDICTIVE ANALYTICS IN ACTION
THE INEVITABILITY OF CHANGE IS THE ONLY CERTAINTY.
EVOLUTIONARY CHANGES
REVOLUTIONARY CHANGES
SUMMARY
Ninja Tip 12.
CHANGE AND ADAPTATION - NINJA RESOURCES
KEY NINJA LESSONS
CONCLUSION
Stuff to Blow Your Mind
Acronyms
Glossary of Definitions
Relevant Quotes Glossary
BIBLIOGRAPHY
About the Author
YOUR 5-POINT GAME PLAN FOR SUCCESS

1. MASTER EXCEL
2. CONQUER SQL
3. TAME DATA WAREHOUSING
4. PICKUP CODING
5. CONTINUE ADAPTING
A Game Plan for Success in Data Analytics

© 2015 by Fru N. All rights reserved.


No part of this book may be reproduced in any written, electronic,
recording, or photocopying form without written permission of the author,
Fru Nde.
Every effort has been made in the preparation of this book to ensure the
accuracy of the information presented. However, the information contained
in this book is sold without warranty, either express or implied. Neither the
author, nor the publishing company, nor its dealers and distributors will be
held liable for any damages caused or alleged to be caused directly or
indirectly by this book.
First Edition
Printed in the United States of America
DEDICATION

In Africa, there is a proverb that says, “It takes a village to raise a child.”
I have certainly been raised in good hands by a village: by my family, my
friends, and everyone whose path has crossed mine.
I wish to dedicate this book to everyone that is dear to me.
Even more so, I wish to dedicate this book to my mom, dad, and
siblings who have nurtured and stood steadfastly beside me every step of
the way.
Thank you for your love, support and continued guidance.
— Fru N.
A GAME PLAN FOR SUCCESS IN DATA ANALYTICS

FRU NDE

www.nextgenglb.com
NEXTGEN GLOBAL
INTRODUCTION

The world today is awash with data. Companies have invested, and continue
to invest, considerable resources in collecting and storing vast amounts of
data. With this data sitting in companies’ data farms, there is now a great
need for Data Professionals (a.k.a. Data Ninjas) who can come in and make
sense of it.
MOTIVATION
I recently had the privilege of being invited, along with some of my peers, to
speak with a group of students pursuing a Master’s degree in Business
Analytics and Big Data at the University of Minnesota Carlson School of
Management.
The discussion was about analytics and the options for students in the
Master’s program looking to make a career out of working with data. The
panel session was lively and engaging, and the enthusiasm the students
expressed reinforced my appreciation of how vibrant the field of Data
Analytics is and how important it is for those who want to get into the
field to have a game plan for success.
After further discussing the topic of what it takes to be a data
professional with peers, what skills are required to start, and what steps
beginners can take to get into the field, our conversation precipitated
thoughts which eventually led to the writing of this book on those topics.
EXPECTATION
This book, as a whole, is NOT intended to be a technical book. Even
though some technical concepts will be discussed, you will not, for
example, learn how to use Excel by reading this book. Instead, resources
are provided at the end of each chapter to help you follow up and do a
more in-depth study of the proposed topics.
Within this book, I will make the case for why you should learn Excel.
The same approach applies to other concepts such as SQL, Data
Warehousing, ETL and Programming. The list of resources provided at the
end of each chapter is intended for you to follow up and study the
concepts in more depth on your own.
WHO IS THIS BOOK FOR?
This book was created as an overview of some of the most common and
basic tools used in Data Analytics today. As such, it is geared toward new
users – who are looking to get high level summaries of the concepts, and
more especially, users who are looking for resources and pointers to get
started. For more advanced readers, the resource links that have been
included at the end of each chapter can still be very valuable as a point of
reference or as an avenue to find other new resources that can help polish
up on certain specific areas.
We hope the content of this book, along with the references and resource
links provided, will help newcomers get a better understanding of some of
the skills required for Data Analytics. We also hope that the material will
guide and encourage you to take the next step in your career, whether by
securing a new job, making a career change, building your skills, or
pursuing that promotion you have been yearning for in the field of Data
Analytics.
WHAT DO I BRING TO THE TABLE?
With a solid background in helping companies transform their data
assets into information, I have been fortunate over my career to work with
several small, midsized and Fortune 500 companies.
In these engagements, I have used tools ranging from simple MS Excel
spreadsheets to more advanced ones such as SQL and programming languages
to build complex data integration, data warehousing and data analytics
solutions for businesses. Through these experiences, I have witnessed both
the good and the bad sides of working with data. But the most common and
most promising lesson I have learned is that the processes and skills
needed to get in the game are not that complicated.
With a bit of effort and dedication, anybody should be able to get in and
excel in the field of data analytics. My hope in this guide is to draw on
the experience I have gained from working in the trenches analyzing data to
inform readers and provide a simple 5-point game plan for success.
WHAT DOES THIS BOOK OFFER?
This book offers a survey of the Data Analytics landscape (for beginners)
and presents a slice of the tools and skillsets that can help individuals
transform data into information. This includes discussing concepts and
tools such as MS Excel, SQL, Data Warehousing, Programming and Change.
By the end, you will have been pointed to resources (at the end of each
chapter) that can help you gain the skills necessary to take on the role of
analyzing data confidently.
What this book is NOT about
Being a good data professional is about using technology to solve
business problems. As you start and mature in the space of data analytics,
you may find yourself talking to individuals or working for companies who
try to push you into learning technology X because they think it is superior
to Y, or vice versa. Or they may say technology A is dead in favor of
technology B.
Some might even say, “Why Excel?” or “Why SQL? My tool is better.
Choose mine!”
Of course, the point of this book is not to sell you on any individual
technology or vendor, but to expose you to the fundamental industry trends
and concepts as far as data analytics is concerned. The specific technology
tool you end up using really isn’t the focus.
A good analogy I have seen to describe this is the good old fishing
parable with a slight twist.
In this book, I will teach you about fishing, why you should learn how to
fish for yourself, and provide resources for places where you can go learn
specific fishing techniques. But I will not teach you how to use a particular
fishing rod. That is up to you to learn, using books, links, videos and all the
other resources that have been provided here or that are available online, at
the bookstore, or in libraries.
INTRODUCING THE DATA NINJA
Data Ninja is the term that will be used informally throughout this book
to refer to any individual (or group of individuals) who aspires to work
intimately with data and make a career in the analytics field, whether as a
Data Analyst, Report Writer, Business Intelligence professional, Data
Architect, Data Engineer, Data Scientist, etc.
The Data Ninja Defined
Because the possibilities for defining and classifying Data Professionals can
be very broad, and the required skillsets for each job role can vary widely,
in this book the term Data Ninja refers to:
An entry-level, unspecialized individual who works within a structured
environment (usually within a company or team), employs a variety of
tools, and performs a variety of tasks related to collecting, organizing, and
interpreting data to gain useful information.
Ninja Tip 1.
As a data ninja, you will be expected to play the role of the multi-skilled,
multi-talented individual who helps companies quickly and efficiently
transform their data assets into information.
The Ideal Data Ninja Candidate
The definition of the Data Ninja provided above is extremely important
because it would help guide the scope and breadth of content covered in the
ensuing chapters of this book.
Because there are many different levels of expertise required by data
professionals, ranging from the non-technical persons in the business world
to PhD wielding Math and CompSci geniuses, it makes sense to get a clear
definition of the Data Ninja so as to narrowly guide the scope and content
of this book.
It is understandable that some readers, though Data Professionals
themselves, may be too advanced for the concepts presented herein. But we
aim to stick to the basics and target readers who fit some or all of the
following descriptions:

Non-technical, but not scared of a few technical details here and there.
Have little or no experience with data.
Are excited about the prospects of working with data.
Just getting into the data analytics field.
Are looking to land their first job or make a career change.
Are looking for opportunities to progress in their current careers.
Are curious and just looking to expand their fundamental knowledge
about the concepts.

If one, some, or all of the above criteria fit you, then this book is
for you, and you are the ideal Data Ninja this book is aimed at.
DATA NINJAS AS INTERPRETERS OF DATA
The purpose of the data analysis undertaken by Data Ninjas is usually to turn
some form of raw data into meaningful information. In this regard, Data
Ninjas are like the data interpreters of the company, and it is critical
that they get the analysis right.
A company that employs Data Professionals who do not perform their
jobs well would be akin to someone having a wrong interpreter on their side
– a point that is devastatingly exemplified by the interpreter who performed
interpretive services for dignitaries at Nelson Mandela’s funeral ceremony
in South Africa, back in 2013.
Some have accused the interpreter of being a fraud and have described
him as "waving his hands around but there was no meaning" or even
describing the situation as "childish hand gestures and clapping, it was as if
he had never learnt a word of sign language in his life".
I do not speak sign language, but if I did, I would be seriously offended
by someone who stood up and pretended to interpret, or who was blatantly
incapable of performing the necessary interpretive services. The biggest
losers in a situation like that are the honest listeners in the crowd who
depend on such individuals to learn what the speaker is saying.

A sign language interpreter during a memorial in honor of Nelson Mandela.


Similar to language interpreters, data professionals are the ones at the
forefront tasked with interpreting their company’s data and turning it into
consumable information. It is critical that they do the job right, and provide
decision makers with the information they need.
Having a good data interpreter working for a company would ensure that
reports of daily sales numbers are correct, the forecast of website traffic
volumes are reasonable, the analysis of a marketing campaign effort is
effective, and so on.
On the other hand, the consequences of having a bad data interpreter who
cannot competently perform the task of turning data into information and
answering the company’s analytical questions can lead to situations that can
be very undesirable to the company’s bottom line. Sales numbers would be
reported wrongly, customer sentiments won’t be analyzed, financial
forecasts would be off, and so on.
Having the wrong data interpreter on your side not only confuses the
company’s decision makers, but might actively lead them astray, as
was the case with the interpreter at Nelson Mandela’s funeral who left the
crowd and millions of viewers around the world baffled and bewildered by
the subpar job he did.
"You can have data without information, but you cannot have information
without data." - Daniel Keys Moran
HIRING OUTLOOK
It’s fair to assume that most readers want to master some data analytics
skills in order to get hired. So, before we get into the nuts and bolts of the
Data Ninja game plan, we will first take a look at what the hiring outlook
looks like for Data Professionals.
Most reputable research firms in the industry are predicting a significant
upsurge in the hiring of Data Ninja-type professionals as the amount of
computing, web and digital data grows. The data professionals in demand
are people trained in collecting, storing, interpreting and making
inferences from data.
Because demand for these professionals is so high, some have even gone as
far as calling working with data the sexiest job of the 21st century.
REAL NUMBERS – HIRING DEMAND
As far back as 2009, IBM announced in a press release its plans to open
a global network of Analytics Solution Centers. The expectation was to
use these worldwide centers to retrain or hire up to 4,000 additional data
professionals.
“ARMONK, N.Y. - 28 Apr 2009: IBM (NYSE: IBM) today announced a
significant expansion of its capabilities around business analytics with
plans to open a network of Analytics Solution Centers around the world,
beginning with five in the second quarter of 2009. These initial centers will
be located in Tokyo, London, New York City, Beijing and Washington, D.C.
As part of this initiative, IBM will retrain or hire as many as 4,000 new
analytics consultants and professionals.”
(IBM)
Furthermore, an article by Allison Stadd goes further to illustrate the
need and demand for data professionals.
It’s not just the skills needed, it’s also the raw manpower. In a survey by
Robert Half Technology of 1,400 U.S.-based CIOs, 53% of the respondents
whose companies are actively gathering data said they lacked sufficient
staff to access that data and extract insights from it. Translation: you are
sorely needed.
(Stadd)
REAL NUMBERS - JOB POSTINGS
From the hiring outlook above, we see that the field for professionals
who work with data is booming, and the hiring numbers are very
promising. In addition to the hiring demand and shortage of labor
described above, job postings for data professionals also remain
steadily high.
In a recent search I did on the job site Glassdoor.com, there were
102,305 job positions listed with Data Analyst tags.
Job listings from Glassdoor.com

Also, a similar search done on career site Dice.com came up with very
impressive numbers for Data analytics related job openings. More than
40,000 jobs were listed in a search on their database.

Data analyst job search from Dice.com

Translation
These numbers all seem to indicate a very strong outlook in the jobs market
for data professionals. The numbers also underscore the point that data is
the way of the future and that there is great need for skilled professionals
who aspire to make a career out of working with data.
WHAT INDUSTRIES DO DATA NINJAS WORK IN?
Aspiring data professionals usually want to understand which industries or
verticals Data Ninjas work in. The simple answer is: all industries
and verticals.
Data is everywhere, and today we see Data Ninjas working part-time,
full-time and on a contract basis in a wide variety of industries. These
range from non-profit organizations, government and education to
healthcare, retail, high-tech, finance, e-commerce, and consumer products.
See the list below.
List of companies and industries that hire Data Ninjas

Construction companies
Utility companies
Oil, gas and mining companies
Hospitals and healthcare organizations
Colleges and universities
Federal, provincial/state and municipal government departments
Transportation companies
Telecommunications companies
Insurance, finance and banking organizations
Management consulting companies
Manufacturing companies

ROLES AND SPECIALIZATIONS OF DATA NINJAS


Data Ninjas can be tasked with performing varying roles within different
industries or organizations, some of which include:
Web Data Analysis
Data Ninjas in this domain focus on analyzing website data and logs. This
may be extended to include site polls, survey results, web traffic,
web usage, and so on. Such analysis helps companies better understand
web usage and develop strategies for optimizing it.
Financial Data Analysis
Data Ninjas in this area focus on studying the company’s financial
statements and analyzing the company’s current and projected value or
earnings. They have visibility into many of the company’s numbers, including
sales, profit and loss, and operations, and may even meet with company
officials to gain better insight into the organization so that they can
predict future trends and identify potential opportunities.
CRM Data Analysis
Data Ninjas in this domain focus on analyzing the company’s customers. In
many companies, customer data is usually stored in some sort of a
Customer Relationship Management (CRM) system. The analyst’s job is to
mine that system to better understand patterns and relationships within
available data. The result of such analysis can help the company’s Sales and
Marketing departments to spot under-served markets, identify best
customers, fine-tune advertising campaigns or predict new markets.
Marketing / Sales Data Analysis
Data Ninjas in this domain typically focus on analyzing data sets used by
sales and marketing teams. Such analysis usually involves identifying,
modeling, understanding and predicting sales trends and outcomes while
helping sales management understand where salespeople can improve.
The analyzed data can provide insights into how the company’s
marketing efforts are performing, how effective campaigns are, and which
sales channels are most valuable.
Health Data Analysis
Data Ninjas in this domain typically focus on analyzing healthcare data.
They might work on multiple projects related to health services delivery,
healthcare costs, quality of care, insurance coverage, and access to care.
The Health Data Analyst works with healthcare information from a variety
of sources, including medical and pharmacy claims data, hospital
inpatient/outpatient data, quality measurement data, and other resources.
They use software and statistics to provide data support, solve issues,
perform research and improve information quality and accuracy.
In addition to the five listed above, there are scores of other areas in which
Data Ninjas can specialize. But despite the different specializations, the
common thread that runs through all of them is the impact their analysis
work can have on the business.
Such impact can range from finding out how to reduce costs, increase
sales, or reduce customer churn, to determining what product to sell or how
many people should be scheduled to work the Saturday shift.
HOW MUCH DO DATA NINJAS MAKE?
Most would agree that the vast majority of people would not do their jobs
if they weren't paid for them. Because of this, salary is a very important
consideration on the minds of most professionals, including aspiring or
practicing Data Ninjas.
As with many other jobs out there, pay or salary often depends heavily
on skill level, years of experience, along with a host of other factors. But
the good news is that, in most cases, individuals can rise on the pay scale to
the far limits of where their talent takes them.
Show me the money!
If you do a quick search on the internet, you will quickly find a lot of
numbers regarding the pay of Data Professionals. Some of the figures are
lower than others, but according to the talent firm DataJobs, national salary
ranges for a few data-related job titles are as follows:
Data Analyst, Entry Level: $50,000-$75,000
Data Analyst, Experienced: $65,000-$110,000
Data Scientist: $85,000-$170,000
Database Administrator, Entry Level: $50,000-$70,000
Database Administrator, Experienced: $70,000-$120,000
Data Engineer, Junior: $70,000-$115,000
Data Engineer, Domain Expert: $100,000-$165,000
Translation
The salary figures presented above look very promising, especially given
that, according to the U.S. Census Bureau, the U.S. real (inflation-adjusted)
median household income was $51,939 in 2013, up from $51,759 in 2012
(U.S. Census Bureau). So, the pay ranges for data-related roles are on
average significantly higher than the median income numbers reported by
the U.S. Census Bureau.
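The comparison above can be checked with a few lines of Python. This is only a rough sketch: it uses the midpoint of each quoted range as a stand-in for a typical salary, which is an assumption made here for illustration and not a figure published by DataJobs. It also compares an individual salary with a household income, so the ratios should be read loosely.

```python
# Salary ranges (USD per year) as quoted from DataJobs in the text.
salary_ranges = {
    "Data Analyst, Entry Level": (50_000, 75_000),
    "Data Analyst, Experienced": (65_000, 110_000),
    "Data Scientist": (85_000, 170_000),
    "Database Administrator, Entry Level": (50_000, 70_000),
    "Database Administrator, Experienced": (70_000, 120_000),
    "Data Engineer, Junior": (70_000, 115_000),
    "Data Engineer, Domain Expert": (100_000, 165_000),
}

# 2013 U.S. real median household income, as cited in the text.
MEDIAN_HOUSEHOLD_INCOME_2013 = 51_939


def midpoint(low, high):
    """Midpoint of a range; an assumed proxy for a 'typical' salary."""
    return (low + high) / 2


for title, (low, high) in salary_ranges.items():
    mid = midpoint(low, high)
    ratio = mid / MEDIAN_HOUSEHOLD_INCOME_2013
    print(f"{title}: midpoint ${mid:,.0f} ({ratio:.2f}x the 2013 median)")
```

Under this assumption, every midpoint in the list comes out above the 2013 median household income.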
To further substantiate these pay and income numbers, the technology
consulting firm Robert Half Technology (RHT) published its 2015 salary
ranges for data professionals.

Salary range for data professional

Translation
Again, these numbers look very lucrative, especially given that a quick
analysis of the salaries reveals an average net increase in pay of more
than 5% for 2015 compared to 2014, and the average salary figures are
substantially higher than the $51,939 median income we saw reported by the
U.S. Census Bureau earlier for 2013.
What’s in it for you?
Someone looking at these pay numbers might rightly ask: What does this
all mean for my pocketbook? Or better still, what’s in it for me? These
are all good questions to ask, and the answer to them is remarkably
simple.
With data growing exponentially, demand for data professionals very high,
the supply of qualified candidates short, and pay increasing year after
year, it becomes very clear that aspiring and practicing Data Ninjas all
across the country have a favorable career outlook in terms of wages and
job prospects for years to come.
And whether your goal is to keep the paychecks coming at a steady rate
or you are just looking to advance your career, the field of data analytics is
proving to be the place where you can confidently make that happen.
THE EXPONENTIAL GROWTH OF DATA
It would be hard to talk about anything related to Data Analytics without
acknowledging the exponential growth in data we are experiencing. The
world today is awash with data, and this data touches nearly all aspects of
our lives.
The far-reaching tentacles of data today range from how we live our
social lives online and how we communicate with loved ones or business
partners to how we bank our money and how we get health care.
Companies of all forms, shapes and sizes are experiencing the effects of
the astounding rate at which data volumes are growing. Projections of
about 40% growth a year into the next decade are not uncommon.
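To get a feel for what 40% growth a year compounds to, here is a minimal Python sketch. The starting volume of 1.0 is an arbitrary unit assumed for illustration, and the ten-year horizon is simply one reading of "into the next decade"; neither comes from the cited projections.

```python
# Compound growth sketch: how much does data volume grow at 40% a year?
# The starting volume (1.0) is an arbitrary unit, assumed for illustration.


def compound_growth(start, annual_rate, years):
    """Return the volume after `years` of growth at `annual_rate` per year."""
    return start * (1 + annual_rate) ** years


growth_factor = compound_growth(1.0, 0.40, 10)
print(f"After 10 years at 40% a year: about {growth_factor:.1f}x the start")
```

At that rate, the volume after ten years is roughly 29 times the starting volume, which is why a seemingly modest yearly percentage produces such a steep curve.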

Data growth outlook. (Oracle)

The chart presented above is what most people usually reference when
they talk about the rapid rate of growth of data today. The projected data
volumes by 2020 are truly mind-blowing, especially when compared in
perspective to where we are today.
Illustration
To put this into perspective, if the data volumes of today were the size of
two-door sedans, the data volumes of 2020 and beyond would be reaching
the size of aircraft carriers or more. The difference is truly
astonishing.
This point of comparison can be further substantiated if we look at some
dazzling facts about the data growth rates currently being experienced.
Dazzling facts about the current growth of data

It is expected that by 2020 the amount of digital information in
existence will have grown from 3.2 zettabytes today to 40 zettabytes.
The total amount of data being captured and stored by industry doubles
every 1.2 years.
Every minute we send 204 million emails, generate 1.8 million
Facebook likes, send 278 thousand Tweets, and upload 200,000 photos
to Facebook.
Google alone processes on average over 40 thousand search queries
per second, making it over 3.5 billion in a single day.
Around 100 hours of video are uploaded to YouTube every minute and
it would take you around 15 years to watch every video uploaded by
users in one day.
570 new websites spring into existence every minute of every day.
Today’s data centers occupy an area of land equal in size to almost
6,000 football fields.
The NSA is thought to analyze 1.6% of all global internet traffic –
around 30 petabytes (30 million gigabytes) every day.
The number of Bits of information stored in the digital universe is
thought to have exceeded the number of stars in the physical universe
in 2007.
Retailers could increase their profit margins by more than 60% through
the full exploitation of big data analytics. (Marr)
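Two of the figures above lend themselves to a quick sanity check. The sketch below, offered as a rough illustration only, converts the quoted per-second search rate into a daily total and translates the "doubles every 1.2 years" claim into an implied yearly growth rate. It assumes a flat 40,000 queries per second (the text says "over" that) and steady exponential growth.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Search queries: the text quotes "over 40 thousand" per second;
# a flat 40,000 is assumed here.
queries_per_second = 40_000
queries_per_day = queries_per_second * SECONDS_PER_DAY
print(f"{queries_per_day:,} queries per day")  # 3,456,000,000

# Doubling every 1.2 years, assuming steady exponential growth,
# implies a yearly growth factor of 2 ** (1 / 1.2).
doubling_time_years = 1.2
annual_growth_rate = 2 ** (1 / doubling_time_years) - 1
print(f"Implied annual growth: {annual_growth_rate:.0%}")
```

A flat 40,000 queries per second gives about 3.46 billion a day, so the quoted "over 3.5 billion" fits a rate slightly above 40,000. A 1.2-year doubling time works out to roughly 78% growth a year.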

Some More Data Analytics Facts and Data Trends


Business analytics is a $12.2 billion industry, according to Gartner Inc.
A McKinsey & Company report forecasts a shortage by 2018 of
professionals specializing in the field, people trained to distill data into
meaningful information.
Every dollar that a company invests in business analytics earns $10.66,
according to Nucleus Research.
According to Forrester Research, 97 percent of companies with
revenue of more than $100 million are pursuing expertise in business
analytics. (Kristal)

Translation
Numbers don’t lie. These statistics on the rate of data growth are truly
astounding, and with more than 2 quintillion bytes of new data being
generated every single day, we now see companies, corporate managers and
executives scrambling to hire individuals who understand data and can work
with it to derive competitive value. For the Data Ninja, all this
potentially translates into a high-demand position in the job market,
long-term job security and strong pay.
WHY ARE DATA NINJAS NEEDED?
According to a report published by McKinsey & Company’s Business
Technology Office in 2011 entitled Big data: The next frontier for
innovation, competition, and productivity, data has swept into every
industry and business function and is now an important factor of
production, alongside labor and capital.
This observation indicates significant need for Data Ninja-type talent in
the coming years and also emphasizes the great opportunities that this new
era of abundant data holds for companies, particularly in terms of being
able to use data to gain efficiency and tap into new business opportunities.
“The United States alone faces a shortage of 140,000 to 190,000 people
with deep analytical skills as well as 1.5 million managers and analysts to
analyze big data and make decisions based on their findings.”
(McKinsey)
Recently, research firm Gartner in one of their publications cited many
industries and companies as having a great need (demand) for more people
skilled in managing and analyzing data.
“By 2015, big data demand will reach 4.4 million jobs globally, but only
one-third of those jobs will be filled. The demand for big data is growing,
and enterprises will need to reassess their competencies and skills to
respond to this opportunity. Jobs that are filled will result in real financial
and competitive benefits for organizations. An important aspect of the
challenge in filling these jobs lies in the fact that enterprises need people
with new skills — data management, analytics and business expertise and
nontraditional skills necessary for extracting the value of big data, as well
as artists and designers for data visualization.”
(Gartner)
Even though the statistics and articles presented by some research firms
might explicitly refer to “Big Data” (a topic that is a bit more
advanced than the intended scope of this book), we nonetheless see the
move to Big Data analytics as a natural career progression for any
person working in the data field.
Translation
As we can see from Gartner’s predictions above, the tremendous growth in
new job opportunities, coupled with the potential shortage of suitable Data
Ninja-type individuals, translates into healthier demand and salaries for
those individuals who are skilled and capable of crunching data.
WHAT DO DATA NINJAS DO?
Some of the most important responsibilities of Data Ninjas involve
collecting, sorting, and analyzing different sets of data to gain insights.
These datasets being analyzed can range from simple business metrics such
as sales numbers to more exotic datasets like user behavior and product
performance.
From Data to Wisdom and everything in between
The ultimate goal of any analysis effort carried out by Data Ninjas is to
transform data into information. In its raw form, data is just what it is, data.
And raw data is not very useful unless it is synthesized and transformed into
information that people and organizations can actually consume and act on.
To get a good sense of what data analysis is about, we can borrow an
existing model that describes how data gets transformed into wisdom.
This is the DIKW (Data, Information, Knowledge, Wisdom) pyramid (shown
in the figure below).

DIKW pyramid. (Longlivetheux)


The DIKW pyramid is especially relevant because it visualizes the
knowledge hierarchy, showing how information is defined in terms of data,
knowledge in terms of information, and wisdom in terms of knowledge.
It is important to also observe that the lineage in the DIKW pyramid
starts with Data at the foundational level, and Data Ninjas are vital in the
lifecycle because they perform the Data Analytics tasks that
move us up the DIKW hierarchy. As a result, some might
argue that without the right data analytics work being performed, having
wisdom might not be possible.
The Different States of Data and Information
Data and Information are one and the same thing. They just exist in
different states. A good way I have found to explain this is to consider
water. The fundamental compound of water (H2O) doesn’t change, but it
can exist in different states such as liquid (water), gas (water vapor) and
solid (ice).
Regardless of its state, water will still be water; but it would not be wise
for a thirsty person to quench their thirst by trying to drink water vapor or
ice. They need to drink water, and more importantly, they need to drink it
in its liquid state.
This analogy has great parallels to the way companies go about
quenching their decision making thirst. In order to make insightful
decisions, leaders in organizations cannot go about it by consuming raw
data — that would simply be akin to a thirsty person trying to drink ice or
water vapor in order to quench their thirst.
Instead, decision makers need to consume information in order to make
insightful decisions. And just as some energy has to be put into the process
of transforming ice into liquid water, some energy also has to be put into
the process of transforming data into information that is safe for
consumption and decision making.
This energy is what you, the Data Ninja of your company, would be tasked
to bring to the table.
WHAT IS DATA ANALYSIS?
Data analysis is the primary function performed by Data Ninjas.
Data Analysis: Definition I
Data Analysis is the process of systematically applying statistical and/or
logical techniques to describe and illustrate, condense and recap, and
evaluate data.
(Illinois)
An alternate definition of the data analysis process is presented in a course
description by Johns Hopkins University on Coursera.
Data Analysis: Definition II
Data analysis is the process of finding the right data to answer your
question, understanding the processes underlying the data, discovering the
important patterns in the data, and then communicating your results to have
the biggest possible impact.
(Leek)
FIVE STEPS TO DATA ANALYSIS BY DATA NINJAS
Typically, Data Ninjas perform five simple steps as part of their Analytics
work:
1. Formulate the question
2. Collect the data
3. Analyze data
4. Communicate results
5. Reiterate
All of these steps are critical to the process and none can be ignored,
skipped or overlooked.
Depending on their level of expertise, experience or specialization, a
Data Ninja can spend more time collecting and
sorting the data than analyzing it, or vice versa.
And the industry in which a Data Ninja works may dictate or influence the
specific type of data that they collect and analyze.
Ninja Tip 2.
The data analysis work performed by a Data Ninja is not for the faint of
heart. It requires a creative problem solver who finds it rewarding to
identify, investigate, isolate, and resolve data issues.
JOB RESPONSIBILITIES OF A DATA NINJA
The job responsibilities of a Data Ninja can vary widely by company,
industry, or skill level, but generally they include the
following broad tasks:

Write queries to retrieve data from databases and other data sources.
Scrub data to remove duplicates and other errors within the data.
Analyze data to find insights or trends that can be used to improve
the company's key performance indicators (KPIs).
Prepare reports based on the analysis and present them to management.
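As a rough illustration of the "scrub" responsibility above, here is a minimal Python sketch that drops incomplete records and exact duplicates. The field names (customer_id, amount) and the sample rows are hypothetical, and real cleanup rules will depend on the business you work with.

```python
def scrub(rows, key_fields):
    """Drop rows with missing key fields, then drop duplicates (first one wins)."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if any(value is None or value == "" for value in key):
            continue  # skip incomplete records
        if key in seen:
            continue  # skip rows already seen with the same key
        seen.add(key)
        clean.append(row)
    return clean


# Hypothetical sample data, for illustration only.
raw = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C2", "amount": 75.5},
    {"customer_id": "C1", "amount": 120.0},  # exact duplicate
    {"customer_id": "", "amount": 40.0},     # missing customer id
]

cleaned = scrub(raw, ["customer_id", "amount"])
print(cleaned)
```

Which fields define a "duplicate" is a business decision, so the sketch takes them as a parameter rather than guessing.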

Ninja Tip 3.
There are many paths an aspiring Data Ninja can take, but understanding
the business you work with before doing any data analysis is extremely
important because businesses often have different needs and approaches to
working with data.
It’s Your Business to Know Your Business
A Data Ninja who works with social data at a company like Facebook
has totally different datasets and might employ totally different techniques
to analyze data than, say, an analyst working at a financial firm like
Goldman Sachs, or an analyst at a health insurance company like
UnitedHealth Group.
So, it is important to understand the business you work with before doing
any data analysis – especially given that businesses often have different
needs and approaches to working with data.
THE TYPES OF ANALYSIS A DATA NINJA PERFORMS
In general, the process of analyzing data can be divided into exploratory
data analysis (EDA), where new features in the data are discovered, and
confirmatory data analysis (CDA), where existing hypotheses are proven
true or false.
Exploratory Data Analysis (EDA) - Example
A Data Ninja at a national retail chain, as part of their exploration, can
analyze and plot on a graph their product sales by region or zip code.
Without any prior knowledge of what to expect, the exploratory exercise
can reveal that a particular product sells more in
the East Coast stores than it does in the West Coast stores, or that the sales
of a particular product spike during severe snowstorms compared with
periods of average weather.
These are findings that can be profound and have the potential to influence
the way the company works, advertises or uses data. When done right, EDA
can expose trends, patterns, and relationships that are not readily apparent.
The results from EDA can help the company improve its
marketing efforts, change the way it targets customers with ads, or
open new lines of business that can in turn increase sales revenue, reduce
cost, and affect the bottom line positively.
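To make the snowstorm example a little more concrete, the following Python sketch groups daily unit sales by weather and compares the averages. It is only an illustration: the records, the field names (weather, units) and the numbers are all made up.

```python
from statistics import mean


def average_units_by(records, field):
    """Group records by the given field and average the 'units' in each group."""
    groups = {}
    for record in records:
        groups.setdefault(record[field], []).append(record["units"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical daily sales of one product, for illustration only.
daily_sales = [
    {"day": 1, "weather": "average", "units": 40},
    {"day": 2, "weather": "snowstorm", "units": 90},
    {"day": 3, "weather": "average", "units": 50},
    {"day": 4, "weather": "snowstorm", "units": 110},
    {"day": 5, "weather": "average", "units": 30},
]

by_weather = average_units_by(daily_sales, "weather")
for weather, avg in by_weather.items():
    print(weather, avg)
```

A gap between the two averages is a lead worth investigating, not a conclusion; with only a handful of days it could easily be chance.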
“Numbers have an important story to tell. They rely on you to give them a
clear and convincing voice.”
(Stephen Few)
Confirmatory Data Analysis (CDA) - Example
A Data Ninja at an e-commerce company can hypothesize that customers
who buy Product X have a 60% likelihood of buying Product Y if an ad
impression for Product Y is shown to them at
checkout.
This is a solid hypothesis that can be verified using data. In this case, the
analyst can examine website traffic or navigation patterns to
determine whether the hypothesis holds, i.e. whether, based on the data,
customers are more or less likely to buy Product Y after prior
exposure to impressions about the product.
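One simple way to check a hypothesis like the 60% figure above is a one-sample test for a proportion. The Python sketch below uses the normal approximation; the counts (500 Product X buyers shown the ad, 270 of whom bought Product Y) are hypothetical, and the choice of this particular test is an assumption for illustration rather than a prescribed method.

```python
import math


def proportion_z_test(successes, n, p0):
    """One-sample z-test for a proportion, using the normal approximation."""
    p_hat = successes / n                       # observed conversion rate
    std_error = math.sqrt(p0 * (1 - p0) / n)    # standard error under the hypothesis
    z = (p_hat - p0) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_hat, z, p_value


# Hypothetical counts, for illustration only.
p_hat, z, p_value = proportion_z_test(successes=270, n=500, p0=0.60)
print(f"observed rate = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the observed rate is hard to reconcile with the hypothesized 60%. It does not, by itself, show that the ad caused the purchases.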
Ninja Tip 4.
As a data ninja, you must enjoy working with data, have great attention to
detail and enjoy looking at data sets to find anomalies, outliers, and
patterns.
WHAT ARE THE SKILLSETS FOR THE DATA NINJA?
On a day to day basis, a Data Ninja will be required to employ a number of
skills to get the job done - such as their technical skills, business acumen,
presentation skills, database skills, analysis skills, and sometimes coding
abilities.
These skills allow Data Ninjas to perform their duties of analyzing data
with competence, as well as help them overcome any new challenges that
come up along the way.
Below, we have broken down the skillset requirements of Data Ninjas
into two broad classifications.
1. Technical Skills
As a Data Ninja, your technical skills are absolutely essential to landing
and keeping your job. There is no getting around that. In the coming
chapters of this book, we will cover some of the technical skills (such as
Excel, SQL, Data Warehousing, and Programming) by presenting 5
Nuggets that are essential for professionals looking to make a career
working with data.
For the basics, some understanding of Excel and being able to work
proficiently in it will help (see Nugget on MS Excel). Also being able to
understand data sources, data structures, schemas, Data Warehousing,
Structured Query Language (SQL), and some programming concepts (if
possible) would be extremely useful.
Some math and a general understanding of statistics and set theory would
go a long way to help. More advanced technical users can go on to master
tools like Python, MATLAB, and statistical packages such as R, SAS, and SPSS.
These advanced concepts are recommended, but generally not required for
most entry level candidates and thus not covered in this book.
2. Soft Skills
In performing analysis work, defining the problem and narrowing the
analysis down often requires a lot of soft skills. When analyzing data for a
company or client, it is important to be able to balance your time, limit
endless “what-if?” scenarios and understand the priority of the needs at
hand. Mastering all of these skills requires good self-awareness and control.
Unlike hard technical skills (mentioned earlier), which comprise a
person's technical abilities to perform certain functional tasks, soft skills are
interpersonal and broadly applicable across job titles and industries. You
have to get along well with the people you work with, be dependable, be
timely, be honest, be curious, and so on.
Interestingly, many soft skills are tied to an individual's personality rather
than any formal training and are thus considered more difficult to develop
than the technical skills. But with continued practice and perseverance,
many data ninja professionals should be able to develop and advance the
soft skills required to perform the job.
Technical skills may get you the job, but soft skills will help you keep the
job.
Skillsets for the Data Ninja. (Nde)

From the diagram above, we see that analytical, critical thinking, and
math skills are absolutely essential to perform at a high level as a Data
Ninja. Generally speaking, the analytical and math skills might fall under
the technical skills category, while communication and critical thinking
skills might fall under the soft skills category.
Translation
The intersection of all of these skills produces the ideal Data Ninja
candidate, i.e. someone who is acutely analytical, thinks critically and
communicates effectively.
Required Skillsets for the Data Ninja
Not each and every single skillset is absolutely required to be a
successful Data Ninja. Depending on the job, some of the skills may or may
not be a requirement. But having them can be advantageous to excelling
and thriving in your career.
For example, you can have a successful career as a Data Ninja without
knowing a lot of math and statistics, or how to write a line of programming,
but knowing math, statistics, or programming will simplify things when it
comes to solving very complex or advanced problems.
Ninja Tip 5.
It doesn’t really matter how much you know about the analytics process or
how much effort you have put into an analytics project.
If you can’t communicate your results in a clear and timely manner to
decision makers, then you can’t impact the business bottom line.
WHAT TOOLS DO DATA NINJA USE?
Tools used by Data Ninjas can vary widely depending on the level of
expertise, the specific job requirements, preferences in the company you
work at and much more. So, trying to provide an exhaustive list of every
single tool out in the market as part of this book would be next to
impossible.
But below we have listed some of the general tools and concepts that
are highly recommended as the starting point for individuals looking to get
into the data analytics field.
Tools and Concepts for Data Ninja Candidates

MS Excel
SQL Server
Data Warehousing Concepts
Programming
Adapting

Each of these tools and concepts listed will be covered in more depth in the
ensuing chapters.
Ninja Tip 6.
As the saying goes, “Anybody can buy a tool, but only a few special people
can make magic happen with the tools they have.” As a Data Ninja, you
should definitely stop focusing on the tools at hand and instead focus on the
magic you want to see happen.
Tools are Important, but Not the End-All-Be-All
By the time you go on to read the specific details in each chapter, my hope
is to continuously convey the all-important message that although software
tools make analysis easier, they are only as valuable as the information that
you put in and analysis that you conduct. As one of the popular sayings
goes:
"Anybody can buy a tool. A few special people can make magic happen with
it."
Data Professionals should always have the mindset of striving to make
the best of whatever tool they have. As a Data Ninja, no matter what tool
you have at hand, I would encourage you to take a moment to challenge
yourself. Learn a few new tricks with the tools you work with. Then, let the
tools serve as a medium to enhance and complement the logic and
reasoning skills that you already have – instead of being a distraction to the
process.
WHAT TRAINING DO DATA NINJAS NEED?
There is a lot of training available for anyone interested in becoming a
Data Ninja. The trainings vary in scale and rigor, ranging from informal
training by personal study to more formalized training by pursuing
scholarly curricula at an accredited academic institution.
The more advanced role you play in the job, the more advanced the
training that may be required. But, given that this book naturally dwells on
entry level candidates, the training required to start off might not be as
rigorous or formal compared to what people might think.
Training for Data Ninja Candidates
To start off, candidates are typically encouraged (but not required) to
have an undergraduate degree in a field such as accounting, statistics,
mathematics, computer science or business. Most of these requirements for
formal degrees can be waived if the candidate has sufficient years of
experience, or demonstrates strong competence in performing the job-
specific roles offered to them.
Different employers might have different practices and hiring
requirements. As a result, some employers might require their Data Ninja
candidates to have a master’s or doctoral degree in an area closely related to
fields such as mathematics, accounting, statistics, computer science or
business. But in most cases, these advanced degree requirements are usually
only for candidates looking to get advanced analytics roles or leadership
roles, and usually do not apply for entry level or beginner candidates.
Translation
Training helps individuals gain a systematic approach to problem-
solving. Some intuition, artistry and guesswork may be needed when
analyzing data, but for the most part, the process is very scientific, with
very systematic and repeatable ways to go about analyzing a data set.
Understanding this systematic approach to working with data makes
things more standardized and saves data professionals the burden of having
to reinvent the wheel on concepts that have already been mastered. This is
the main value proposition we see offered by many training programs.
Training of Some Sort is Crucial for Success
Training sometimes gets a bad reputation, because some people
perceive it as being expensive or taking a lot of time. No matter the
circumstances, that should not stop you from pursuing a rigorous training
program that will give you the essential skills to perform competently at
your job as a Data Ninja. Without competence, you will not have a job.
Training, and the benefits from it, is a point that cannot be overemphasized.
Practice Makes Perfect when it comes to Data Ninja Training
Through books, videos, lessons and a myriad of online resources, it is
possible to teach yourself much of what is needed to be an exceptional data
analytics ninja.
No matter how you start or what route you take for training, it is possible to
become much better and even extraordinary in the data analytics game by
making a commitment to learning and embracing new techniques to help
solve business problems.
In addition to reading and studying up on the concepts presented in this
book, to get much better at working with data, you have to actually do it;
and do it as often as you can. This is where practicing more will make you
more perfect at what you do.
Ninja Tip 7.
As a Data Ninja candidate looking to land your ideal job, there are many
benefits to aggressively pursuing a training path and a regimen of
continuous learning, whether by formal training or not.
Continuous learning is about the constant expansion of skills and skill-sets
through learning and increasing knowledge. As life changes, the need to
adapt both professionally and personally is as important as the
changes themselves.
SAMPLE: REAL WORLD DATA NINJA STORY
One of my first real, non-restaurant jobs was as a “Data Analyst” for a
really large insurance/healthcare corporation. I worked in the area that
managed the marketing database for the company.
For example, we could only market to certain zip codes (by law) and once
I had to input something like 10,000 zip codes into a database in about
three days.
We would also do a lot of analysis on what groups were more responsive
to our marketing campaigns. So I’d end up throwing a ton of information
into a spreadsheet (Lotus 1-2-3 at the time; Excel is very similar) and then
doing lots of analysis to find the best performing groups (people between
the ages of 55-65 who live in non-urban areas of Florida might be an
example).
Using this info, we’d go out and try to find more people (our target
audience) who mirrored the most responsive groups. Purchasing a mailing
list or advertising in certain publications whose readers are similar to these
people (like AARP) would be a way in which to target them.
You might be responsible for coming up with the stats for the target
audience and then going out and finding them. You’ll also spend vast
amounts of time in front of a computer inputting and looking at data. You
may also work with programmers (who may be in India or somewhere else).
In order to do a good job, you’ll need to be very detail oriented. You will
need to like to work with numbers. Logical/critical thinking skills required.
You will need to be ok with sitting in front of a computer looking at data for
long periods of time. There may not be much room for artistic expression.
Leann C.
December 7, 2010 at 5:32 am
SAMPLE: REAL WORLD DATA NINJA JOB DESCRIPTION AND
POSTINGS
In this section, we present a sample Data Ninja Job description. This
sample is available free on the resources.workable.com website and we
have gone through the job description to highlight and call out the specific
skills and requirements that could be of importance to an aspiring data
ninja.
As mentioned earlier, different companies and industries may have
specific job requirements tailored to the specific Data Analytics role that
they are looking to fill, but in general, there are some broad skills and
concepts that can be found in most résumés.
Note: This sample is presented for educational
purposes ONLY, to highlight some of the skills employers look for
when hiring Data Ninjas in the real world.
(WFM)
Data analyst job posting (workable)
MASTER MS EXCEL
MOTIVATION
MS Excel is a spreadsheet application developed by Microsoft Corporation
for Windows and other operating systems. The product allows for easy
calculations, graphing, tabulation and pivoting of data.
A poll from kdnuggets.com shows Excel on top of the list as one of the
most popular analytical tools being used in the industry today.
Compared to other products in its class, Excel stands as a very powerful
tool and certainly has its place in the market as far as data manipulation and
analytics is concerned.
The ubiquity and versatility of the product gives users the ability to
manipulate, cleanse, and merge data sets with relative ease. As an analytics
platform for small datasets, Excel has proven to be very generous and can
seriously reward anyone who takes the time to learn and play with its
formulas and calculations.
MS EXCEL CORE FUNCTIONALITIES
MS Excel offers several functionalities that are useful in analyzing and
working with data. Here are the main ones:

Sort: Sorts rows and columns of data in either ascending or
descending order.
Filter: Restricts the data shown to the rows that meet
a certain criterion.
Conditional Formatting: Highlights a specific column,
row, or block of cells with a color depending on the values in those cells.
Charts: Graphically display data in the form of a chart
or a graph that shows the rise or fall in the values, e.g.
a rise in profit. Trends are often easier to see in a graph than in
a table of numbers.
Pivot Tables: One of the most powerful tools in Excel; they extract the
significance from a detailed data set by allowing for quick
summarizations.
Tables: Properly arranged tables allow one to analyze data
faster.
What-If Analysis: Allows one to try different values in cells in order
to see how the changes affect the outcome of a formula.
Analysis ToolPak: An Excel add-in program that can perform
financial, statistical and engineering data analysis.

MS EXCEL FOR DATA ANALYSIS


Data Ribbon
As a data ninja, you might find the data ribbon in Excel to be one of the
most useful of all. This ribbon holds functions that can help anyone quickly
perform a variety of statistical and non-statistical calculations on their data.
Data Analysis ToolPak
In addition to the out-of-the-box functions present in the data tab, Excel
also makes available the Data Analysis ToolPak.
The Data Analysis ToolPak is especially relevant because it presents a
powerful set of tools used for statistical analysis and can help analysts
figure out the variance, correlation and covariance of data as well as other
features.
If you need to develop complex statistical or engineering analyses, you can
save steps and time by using the Analysis ToolPak. You provide the data
and parameters for each analysis, and the tool uses the appropriate
statistical or engineering macro functions to calculate and display the
results in an output table. Some tools generate charts in addition to output
tables.
(Office)
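For readers curious about what sits behind statistics such as variance, covariance and correlation, here is a small Python sketch that computes them by hand. The two series (ad_spend and sales) are hypothetical. The sketch uses sample statistics, dividing by n - 1, which is an assumption on my part; a given Excel tool may use the population version instead, so the numbers it reports can differ slightly.

```python
import math
import statistics


def covariance(xs, ys):
    """Sample covariance of two equally long series (divides by n - 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(
        statistics.variance(xs) * statistics.variance(ys)
    )


# Hypothetical monthly figures, for illustration only.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 40, 70, 80, 110]

var_spend = statistics.variance(ad_spend)  # sample variance
cov = covariance(ad_spend, sales)
corr = correlation(ad_spend, sales)
print(var_spend, cov, round(corr, 4))
```

A correlation close to 1 says the two series move together; it says nothing about which one drives the other.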
MS EXCEL: AVAILABLE DATA SOURCES
Excel allows a data ninja to import data into their
workbooks from a wide variety of sources for analysis.
Example data sources for Analytics in Excel

From the web
From files: Excel, CSV, XML, Text or Folder that contains files with
metadata and links.
From databases: SQL Server, Windows Azure SQL Database, Access,
Oracle, IBM DB2, MySQL, PostgreSQL and Teradata.
From other data sources: SharePoint List, OData feed, Windows Azure
Marketplace, Hadoop Distributed File System - HDFS, Windows
Azure Blob storage, Windows Azure Table storage, Active Directory
and Facebook.

MS EXCEL FOR DATA PIVOTING


As a data ninja, you may be required to spend a lot of time summarizing
data, because people prefer looking at summaries.
Pivoting is an incredibly powerful tool that makes it easy to tabulate and
summarize data in exciting ways. Though there are many products in the
market with pivoting functionality, Excel’s ubiquity makes its pivot table
one of the most widely used in the market.
The good thing about the pivot functionality in Excel (or any other pivot
tool for that matter) is that it allows analysts to quickly change how data is
summarized with very little to no effort.
Microsoft Excel screenshot. (Excel)

Whether you want to summarize daily sales data for your company by
line of business (LOB), or sum employees’ total working hours for the
week, pivot tables will let you do that with relative ease.
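The idea behind a pivot table can also be sketched in a few lines of Python, which may help clarify what Excel is doing when it summarizes sales by line of business. This is an illustration only, not how Excel implements pivoting; the function name pivot_sum, the field names and the sample records are all hypothetical.

```python
from collections import defaultdict


def pivot_sum(records, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        table[record[row_field]][record[col_field]] += record[value_field]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical daily sales records, for illustration only.
daily_sales = [
    {"date": "2024-01-01", "lob": "Retail", "amount": 1200.0},
    {"date": "2024-01-02", "lob": "Retail", "amount": 800.0},
    {"date": "2024-01-01", "lob": "Wholesale", "amount": 500.0},
    {"date": "2024-01-01", "lob": "Retail", "amount": 300.0},
]

pivot = pivot_sum(daily_sales, "lob", "date", "amount")
for lob, totals in pivot.items():
    print(lob, totals)
```

Swapping the row and column fields in the call re-cuts the same data by date instead of by line of business, which is the kind of quick re-summarizing that makes pivoting so useful.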
MS EXCEL FOR DATA PRESENTATION
Another extremely important feature within Excel is that of presentation.
The results of any data analysis usually need to be presented to users and
decision makers for consumption. Microsoft Excel excels at this.
The sky is literally the limit for using Microsoft Excel for dashboarding
and presentation of data that tells a consistent and coherent story. (Excel)

Some of the common presentation and visualization functions within
Excel include:

Pie Charts
Maps
KPIs (Key Performance Indicators)
Hierarchies
Drill Up and Drill Down
Background Color and Background Images
Hyperlinks

MICROSOFT POWERBI
In recent years, Microsoft has been putting in efforts toward a number of
integrated components for data collection, analysis and visualization. These
products are currently being distributed under the solution named Power BI.
Some of the products within the Power BI ecosystem include:

Power Pivot. Power Pivot provides end-user accessible, in-memory
data modeling for large data sets. Power Pivot was introduced as an
add-in to Excel 2010, and has since been fully integrated as an out-of-
the-box feature in Excel 2013.
Power View. Power View is a complementary technology to Power
Pivot, enabling advanced visualizations for data models created in
Power Pivot. Power View delivers interactive visualizations, including
animated visuals and maps powered by Bing Maps. Originally Power
View was available only as a SharePoint feature, but has since been
integrated directly into Excel 2013.
Power Map. Power Map, previously known by the development name
GeoFlow, is an add-in to Excel 2013 that provides more compelling
Bing Maps-powered visualizations, extending Power View’s
capabilities with 3D map visualizations.
Power Query. Power Query, previously known by the development
name Data Explorer, is an add-in to Excel 2013 that provides a more
fluid, open data discovery environment than is provided by Power
Pivot alone.

The Future of Power BI


By leveraging the popularity and versatility of Excel, Microsoft has
worked on providing users with new capabilities for analyzing and working
with data. It is truly exciting, to say the least, and will potentially redefine
the way data analysis is done within organizations.
Power BI seems to hold a lot of promise, and because of this Microsoft is
staking a great deal on it.
Microsoft is not content to let Excel define the company’s reputation
among the world’s data analysts. That’s the message the company sent on
Tuesday when it announced that its PowerBI product is now free.
According to a company executive, the move could expand Microsoft’s
reach in the business intelligence space by 10 times.
If you’re familiar with PowerBI, you might understand why Microsoft is
pitching this as such a big deal. It’s a self-service data analysis tool that’s
based on natural language queries and advanced visualization options. It
already offers live connections to a handful of popular cloud services, such
as Salesforce.com, Marketo and GitHub. It is delivered as a cloud service,
although there’s a downloadable tool that lets users work with data on their
laptops and publish the reports to a cloud dashboard.
(Gigaom)
WHAT EXCEL IS NOT
Excel is a very versatile tool that plays a pivotal role in the data analysis
process for most companies. But Excel doesn’t come without its drawbacks.
As such, it becomes very important to have an understanding of what Excel
is and is not.
Let’s face it: We all have seen a crazy Microsoft Excel spreadsheet or
encountered one of its dreaded “Not Responding” messages. Unfortunately,
the flexibility and ease of Excel makes it the ideal candidate for
inappropriate use and widespread abuse.
Modern Excel 2013 and the latest Power BI add-ins do sizzle in
demonstrations, but there are analyses that simply do not make sense to use
Excel for today.
(Datanami)
PAIN POINTS WITH EXCEL
Collaboration: Excel is inherently designed for personal use and for single-
user access at a time. Spreadsheets tend to be shared via email, which
causes duplicate copies or inconsistent data.

Maintenance: With data coming from different sources, it can become
very difficult to maintain spreadsheets manually, especially if the data
changes frequently.
Data Integrity: With multiple users having the capability to make
copies of any spreadsheet with ease, it becomes very difficult to
control where the single version of the truth lies, potentially
causing serious data integrity issues.
Accessibility: Being a desktop-based application, Excel does not offer
anytime, anywhere access unless you carry the workbook with you on a
portable device such as a laptop.
Data Security: Excel data is usually downloaded and made locally
available to user machines. This makes the spreadsheets not only error
prone but also susceptible to theft and data loss.
Scalability: There are times when spreadsheets get so big that it
doesn’t make any sense to hold the data in Excel. The new PowerPivot
side of Excel compresses and handles more data than traditional Excel
alone. However, capacity will still be constrained by the memory and
processing power of the user’s machine, since all the processing is done
on the local desktop.

Ninja Tip 8.

Choosing between Excel and some other solutions for your data analytics
purposes is not an either/or proposition, nor is it a zero sum game.
Excel has capabilities that are proving to be very valuable for certain types
of data analysis; and the point of this book is to make that clear and
encourage you as a Data Ninja to go out, explore the tool further and use it
for those use cases that work well for you.
SUMMARY
In this nugget, I have presented a good suite of solutions that come with
Excel which can help companies and their analysts do wonders with data. I
have also tried to present some of the drawbacks that can come with
building enterprise-wide solutions on Excel.
But I also realize there are those who would rather focus all the attention
on the negatives rather than the positives of Excel — or any other product
for that matter.
Excel is Extremely Good, but has its Limitations
As mentioned earlier, the point of this book is not to get into a debate over
whether Excel is good or bad as an analytical tool, but instead to appreciate
that Excel has potential (whether you like it or not) which may or may not
be useful for your needs.
When it comes to data analytics, Microsoft Excel should not be seen as the
panacea, because it simply isn’t — and no single tool is for that matter.
Excel offers a lot of functionality to help companies and data ninja
professionals in their analytical journey. But Excel doesn’t solve (and
shouldn’t be expected to solve) all problems faced by companies today.
Nonetheless, it has a vital and pivotal role to play within the data analytics
ecosystem, and must not be ignored.

MS EXCEL - NINJA RESOURCES CORNER & SHADOW SKILLS


GCF Learn Free
GCF Learn Free will teach you how to create formulas. They will guide you
through the basics of creating formulas for any kind of spreadsheet. They
also give you opportunities to practice with real-world scenarios.
Link: http://www.gcflearnfree.org/excelformulas
Chandoo
Chandoo has a straightforward approach to learning Excel. They cover the
basics and will move you along into advanced knowledge. They go over
everything from formulas and charts to VBA, dashboards, and more.
Link: http://chandoo.org/wp/
Five Minute Lessons
Just like the name suggests, this site offers 5-minute lessons to bring you to
the next level in Excel. These are short lessons, but they teach important
skills that everyone should know in Excel.
Link: http://fiveminutelessons.com/learn-microsoft-excel
Excel Exposure
Excel Exposure displays the Microsoft Most Valuable Professional badge on its
landing page. They have tons of resources and video tutorials that can take
you from beginner to advanced user. Their lessons include videos,
infographics, and workbooks.
Link: http://excelexposure.com/
Trump Excel
“An effort to learn and share amazing tricks on excel spreadsheets.” Trump
Excel focuses more on tips and tricks rather than walking through the
basics. Trump Excel assumes the user already has a basic knowledge but
wants to learn some new things. Even if you are a beginner you will pick up
some interesting tricks here that you might not find elsewhere or on your
own.
Link: http://trumpexcel.com/
Excel Tip of The Month
Isaac Gottlieb created this site to give monthly tips. This is another site that
focuses on tips and tricks rather than guiding you through from start to
finish. It is a straightforward site with great tips, and once you get started in
Excel it will continue to be useful.
Link: http://isaacgottlieb.com/tip-of-the-month
Excel Tips
You guessed it, more tips and tricks. Excel is such an extensive program
that it is difficult to master every single feature it has.
That isn’t meant to discourage you but rather to encourage you to learn
everything you can.
Link: http://excelribbon.tips.net/
Peltier
Peltier is a blog that focuses on Excel charts. They have pages and pages of
Excel charts and how-to guides. If you are struggling with charts in Excel,
this is the place for you.
Link: http://peltiertech.com/
Excel Central
Excel Central offers videos, eBooks, and file downloads. They sell courses
and books but they will let you view the first 8 chapters in their courses for
free to see if it is right for you. Not a very fancy website but they do have
some good essential information here.
Link: http://excelcentral.com/
Howcast
Howcast provides short-form instructional video and text content. They do
not specifically concentrate on Excel, but their videos are a great way to get
started.
Link: http://www.howcast.com/guides/573-How-to-Use-Microsoft-Excel/
PC World: “Use Microsoft Excel for Everything”
This is an excellent article that will enlighten you on the many uses
of Excel.
Link: http://www.pcworld.com/article/229504/five_excel_nightmares_and_how_to_fix_them.html
Microsoft
Microsoft has put together an excellent resource for learning Excel. You
know you can trust that they know what they are talking about since they
created the program.
Link: https://support.office.com/en-us/article/Excel-2010-training-courses-videos-and-tutorials-807211fe-ee81-4887-b48a-68a94e1e912f?CorrelationId=d30498f0-5ed7-48c3-9952-57864b6fecbe&ui=en-US&rs=en-US&ad=US
Lynda
There is just no escaping Lynda’s vast categories of tutorials. Lynda is a
paid service that offers a huge list of different kinds of tutorials and videos,
a great resource for anyone who wishes to expand their knowledge on
pretty much any subject.
Link: http://www.lynda.com/search?q=excel
KEY NINJA LESSONS

Learn to extend Excel using add-ins and scale using SharePoint and
Office 365.
PowerBI — including PowerPivot, PowerQuery, PowerViews, and
PowerMaps — extends the capabilities of native Excel tremendously.
So, don’t ignore it.
Excel is here and is promising to be around for a while, so learn it.
Master the formulas and techniques for acquiring, analyzing and
presenting data from different data sources.
CONQUER SQL
MOTIVATION
As companies embrace data for better and faster decision making, the
database environments they use have become increasingly complex. The
need for mastery of a computer language aimed at accessing, manipulating,
and querying data stored in relational databases has become extremely
important. This is where SQL comes in.
WHAT IS SQL?
Structured Query Language (SQL), pronounced “sequel” or “ess-queue-ell”,
is the primary language used to request information from a database, and
it is everywhere.
For example, a database-driven dynamic web page takes user input from
forms and clicks and uses it to compose a SQL query that retrieves
information from the database required to generate the next web page.
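To make the web page example concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The products table, its rows, and the find_products function are hypothetical names invented for illustration; a real site would have its own schema and web framework.

```python
import sqlite3

def find_products(conn, search_term):
    """Return (name, price) rows whose name contains the user's search term."""
    # The "?" placeholder passes the user's input to the database as data,
    # so it is never spliced into the SQL text itself.
    cursor = conn.execute(
        "SELECT name, price FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{search_term}%",),
    )
    return cursor.fetchall()

# Hypothetical catalog held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Blue Widget", 9.99), ("Red Widget", 12.50), ("Gadget", 30.00)],
)

# The text typed into a search form becomes the query parameter; the rows
# that come back are what the next page would display.
rows = find_products(conn, "Widget")
```

Passing the form input as a parameter, rather than pasting it into the query string, is the usual way to keep user input from changing the meaning of the query.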
Even more astounding is the fact that all Android Phones and iPhones
have easy access to a SQL database called SQLite and many applications
on your phone use it directly.
Today, many of the applications that run our banks, hospitals,
universities, governments, and small businesses rely on SQL, and just about
every computer eventually touches something running SQL.
This ubiquity makes SQL an incredibly powerful tool and it has proven
itself over the years to be a very successful and solid data analytics
technology worth mastering.
WHY IS SQL BENEFICIAL
SQL is especially beneficial because it provides some standardization to
the way data in databases can be queried. It is tremendously flexible,
powerful, and very accessible, which makes it simple to master. Some of
the key benefits of using SQL to store, manage and analyze data over other
approaches are listed below:

1. You can query and make updates to data in a database.
2. You can look up data from a database relatively rapidly.
3. You can relate data from two different tables together using JOINs.
4. You can create meaningful reports from data in a database.
5. Your data has a built-in structure to it.
6. Information of a given type is always stored only once.
7. SQL databases can handle very large data sets (compared to Excel spreadsheets).
8. SQL databases are concurrent, i.e. multiple users can use them at the same time without corrupting the data.
9. SQL databases scale well (well beyond the data volumes that can be handled in simple Excel spreadsheets).

With SQL, you can build databases, enter data into the database, manipulate
data, and query the database data with relative ease.
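As a rough illustration of those four activities (build, enter, manipulate, query), the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build the database and enter some data (table names are illustrative).
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Manipulate existing data: correct the amount on order 2.
conn.execute("UPDATE orders SET amount = 60.0 WHERE id = 2")

# Query: relate the two tables with a JOIN and summarize per customer.
report = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, order_count, total in report:
    print(name, order_count, total)
```

Each customer's name is stored once in customers and referenced by id from orders, which is the "stored only once" benefit from the list above in practice.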
Ninja Tip 9.
There are many database products such as Microsoft SQL Server, Oracle,
Netezza, Teradata, MySQL, etc. which support SQL.
RDBMS SYSTEMS
At the core of SQL is the Relational Database Management System
(RDBMS).
RDBMS Defined
A relational database management system (RDBMS) is a program that lets
you create, update, and administer a relational database.
In an RDBMS, data is structured in database tables, fields, and records.
Tables within the RDBMS might be related by common fields for easy
cross-table querying.
RDBMS also provides relational operations (in the form of SQL) to
manipulate and/or store data into the database tables.
The RDBMS is the basis for SQL in all modern relational database systems,
such as MS SQL Server, IBM DB2, Oracle, and MySQL.
KEY CHARACTERISTICS OF THE RDBMS

Data is stored in a set of tables. Each RDBMS table consists of database table rows, and each row consists of one or more database table fields.
Rows represent records, and columns represent record attributes or fields.
Each row in the table is usually identified by a primary key that uniquely identifies the record to other systems.
An RDBMS uses several design patterns to reduce duplication of data in the database (normalization).
Every row in the same table has exactly the same number of columns (even though some of the column values might be NULL).
An RDBMS has a data manipulation language (SQL) that is used for querying and manipulating data in the RDBMS.
ACID is used to keep transactions reliable. The acronym refers to the four key properties of a transaction: Atomicity, Consistency, Isolation, and Durability.

Atomicity: All changes to data are performed as if they are a single operation; either all of them are applied or none are.
Consistency: Data is in a consistent state when a transaction starts and when it ends.
Isolation: The intermediate state of a transaction is invisible to other transactions.
Durability: After a transaction successfully completes, changes to data persist and are not undone, even in the event of a system failure.
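Atomicity is the easiest of the four properties to see in action. The sketch below, written with Python's built-in sqlite3 module, moves money between two accounts. The accounts table, the balances, and the transfer function are hypothetical, and the CHECK constraint is just one simple way to make the second transfer fail partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # balances may never go negative
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to bob runs first and succeeds, but because the debit from alice is rejected, the whole transaction is rolled back and bob's credit disappears with it.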

VENDORS OF RDBMS
There are many vendors supplying RDBMS products in the market,
some of which are proprietary and some of which are open source. A few
examples of these RDBMS systems include: PostgreSQL, SQLite, MySQL,
Microsoft SQL Server, Oracle, Teradata, Netezza, and Sybase.
MANY FLAVORS OF SQL
Although (in theory) SQL is standardized, in practice it is not. There are
many vendors in the market and each of them has their own variation and
flavor of the language. In general, SQL written for one RDBMS system,
such as Sybase, may not work for another RDBMS system, such as MySQL
or PostgreSQL, because the syntax is different.
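A small, well-known example of these differences is asking for only the first few rows of a result. The sketch below writes the same question three ways: LIMIT (used by MySQL, PostgreSQL, and SQLite), TOP (used by Microsoft SQL Server), and FETCH FIRST (the form defined in the SQL standard). It then runs all three against SQLite through Python's built-in sqlite3 module to show that one engine accepts only its own flavor. The employees table and the try_query helper are made up for illustration, and the other engines are not exercised here.

```python
import sqlite3

# The same question ("who are the two highest-paid employees?") in three
# flavors. All three are run against SQLite below; no other engine is tested.
TOP_TWO = {
    "limit": "SELECT name FROM employees ORDER BY salary DESC LIMIT 2",
    "top": "SELECT TOP 2 name FROM employees ORDER BY salary DESC",
    "fetch_first": (
        "SELECT name FROM employees ORDER BY salary DESC "
        "FETCH FIRST 2 ROWS ONLY"
    ),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 90), ("Bob", 70), ("Cy", 80)],
)

def try_query(conn, sql):
    """Return the result rows, or None if this engine rejects the syntax."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        return None

results = {style: try_query(conn, sql) for style, sql in TOP_TWO.items()}
```

Only the LIMIT version returns rows on SQLite; the other two come back as None because SQLite reports a syntax error for them.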
SQL Portability
SQL database platforms tend to implement the SQL standard in
different ways. For example, the SQL date and time data types are
sometimes omitted in favor of proprietary solutions.
PostgreSQL notoriously contains a number of custom data types; for
instance, it provides an entire range of data types that define geometric
objects (e.g. box and line). These geometric object types are not necessarily
available in other database systems, so the database developer who uses
those types may be “locked in” to PostgreSQL. This situation would arise if
converting the geometric object types to another type usable by a different
database would consume too much time or be altogether impossible.
An argument typically made against complaints about SQL’s lack of
portability is that the SQL standard, despite being long and complex, is not
completely defined and, in some cases, is ambiguous.
(Learn)
DATA ANALYSIS WITH SQL
Today, SQL is the premier language used for querying and working with
relational data. This is accomplished by writing SQL query statements.
SQL statements are divided into two main categories: DML (Data
Manipulation Language) and DDL (Data Definition Language). Below, we
provide a high-level overview of these two categories and how to leverage
them for your data analytics needs.

SQL commands. (TechnologyCrowds)


Data Manipulation Language (DML)
DML statements manipulate data. As such, they are used for inserting data
into database tables, retrieving existing data, deleting data from existing
tables, and modifying existing data. DML never modifies the schema of the
database (table features, relationships, etc.).
Example SQL DML:

SELECT
UPDATE and INSERT statements
JOINS
DISTINCT
IN
BETWEEN
LIKE
GROUP BY
ORDER BY
PARTITION BY
AGGREGATE FUNCTIONS such as AVG, MIN, MAX, SUM,
COUNT, etc.

The processes performed by DML statements are what a data ninja may
be tasked with performing on a day-to-day basis. That is why it is
recommended that readers are proficient or at least familiar with the
concepts of writing SQL DML statements.
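To give a feel for how several of these DML keywords combine in day-to-day analysis, here is a short sketch using Python's built-in sqlite3 module. The sales table and its figures are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL
    );
    INSERT INTO sales (region, product, amount) VALUES
        ('East', 'Laptop', 1200), ('East', 'Lamp', 40),
        ('West', 'Laptop', 1100), ('West', 'Lamp', 45),
        ('North', 'Desk', 300);
""")

# DISTINCT + ORDER BY: which regions appear in the data?
regions = [row[0] for row in conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region"
)]

# IN, BETWEEN and LIKE: narrow the rows down with several filters at once.
filtered = conn.execute("""
    SELECT product, amount
    FROM sales
    WHERE region IN ('East', 'West')
      AND amount BETWEEN 40 AND 100
      AND product LIKE 'La%'
    ORDER BY amount
""").fetchall()

# GROUP BY with aggregate functions: one summary row per region.
totals = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM sales
    GROUP BY region
    ORDER BY SUM(amount) DESC
""").fetchall()
```

The same statements would run largely unchanged on most other RDBMS products, since SELECT, WHERE, GROUP BY, and the basic aggregate functions are among the most portable parts of SQL.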
Data Definition Language (DDL)
DDL statements are used to build and modify the structure of objects in a
database. These database objects include views, schemas, tables, indexes,
etc.
Example SQL DDL:

CREATE - create objects in the database.
ALTER - alter the structure of the database.
DROP - delete objects from the database.
TRUNCATE - remove all records from a table.
RENAME - rename an object.
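The sketch below walks through these DDL statements with Python's built-in sqlite3 module. The staff and employees table names are hypothetical. Note that SQLite has no TRUNCATE statement and spells RENAME as ALTER TABLE ... RENAME TO, which is itself a small example of vendor differences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(conn):
    """List the user tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

# CREATE: define a new table.
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure by adding a column.
conn.execute("ALTER TABLE staff ADD COLUMN hired_on TEXT")

# RENAME: in SQLite this is a form of ALTER TABLE.
conn.execute("ALTER TABLE staff RENAME TO employees")

columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
after_rename = table_names(conn)

# TRUNCATE equivalent: SQLite empties a table with an unqualified DELETE.
conn.execute("DELETE FROM employees")

# DROP: remove the table definition altogether.
conn.execute("DROP TABLE employees")
after_drop = table_names(conn)
```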

Data Control Language (DCL) and Others


In addition to DMLs and DDLs, there are more advanced topics in SQL
used for interacting with the RDBMS. DCL (Data Control Language) and
TCL (Transaction Control Language) are used to manage transaction
integrity, security around the data and more.
Data Control Language (DCL) is used to create roles and permissions, and
to control access to the database by securing it.
Example SQL DCL:

GRANT - give users access privileges to the database.
REVOKE - withdraw access privileges given with the GRANT command.
EXECUTE AS - used for impersonation, to run as a particular user (specific to Microsoft SQL Server).

Transaction Control Language (TCL)


Transaction Control Language (TCL) statements are used to manage the
changes made by DML statements. They allow statements to be grouped
together into logical transactions.
Example SQL TCL:

COMMIT - save work done.
SAVEPOINT - identify a point in a transaction to which you can later roll back.
ROLLBACK - restore the database to its state as of the last COMMIT.
SET TRANSACTION - change transaction options such as isolation level and which rollback segment to use.
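Here is a minimal sketch of COMMIT, SAVEPOINT, and ROLLBACK using Python's built-in sqlite3 module. The log table and the savepoint name are made up for illustration. SET TRANSACTION is left out because SQLite does not support it.

```python
import sqlite3

# isolation_level=None stops the module from opening transactions on its
# own, so the BEGIN / COMMIT / ROLLBACK statements below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('first')")
conn.execute("SAVEPOINT before_second")              # mark a point to return to
conn.execute("INSERT INTO log VALUES ('second')")
conn.execute("ROLLBACK TO SAVEPOINT before_second")  # undo only 'second'
conn.execute("COMMIT")                               # 'first' is now permanent
committed = [row[0] for row in conn.execute("SELECT step FROM log")]

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('third')")
conn.execute("ROLLBACK")                             # discard the whole transaction
final = [row[0] for row in conn.execute("SELECT step FROM log")]
```

After both transactions, only the row that was inserted before the savepoint and then committed remains in the table.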

THE DBA: ROLE AND RESPONSIBILITIES


Commercial RDBMS systems such as Microsoft SQL Server, Oracle
Database, MySQL, and IBM DB2 are complex applications that call for
specialized knowledge and training. As a result, some organizations hire
dedicated database administrators (DBAs) to manage and administer their
RDBMS environments.
This role of DBA, which is usually within the Information Technology
department, is charged with the creation, maintenance, backups, querying,
tuning, user rights assignment and security of an organization's databases.
DBA RESPONSIBILITIES

Install, configure, and upgrade Microsoft SQL Server/MySQL/Oracle server software and related products.
Establish and maintain sound backup and recovery policies and procedures.
Take care of database design and implementation.
Implement and maintain database security (create and maintain users and roles, assign privileges).
Perform database tuning and performance monitoring.
Perform application tuning and performance monitoring.
Set up and maintain documentation and standards.
Plan growth and changes (capacity planning).
Do general technical troubleshooting and give consultation to development teams.

TYPES OF DBA

Administrative DBA - Works on maintaining the server and keeping it running. Concerned with backups, security, patches, replication, and other matters that involve the actual server software.
Development DBA - Works on building queries, stored procedures, etc. that meet business needs. This is the equivalent of the programmer. (Many data ninjas would fall into this category.)
Architect DBA - Designs schemas and builds tables, foreign keys, primary keys, etc. They work to build a structure that meets the business needs in general. The designs they produce are used by developers and development DBAs to implement applications.

Ninja Tip 10.


As the data ninja of your organization, you might not be explicitly
responsible for playing the role of a DBA, but it still might be worthwhile
for you to have some rudimentary understanding of the concepts of the
DBA in general. This will make you more versatile, and hence more
marketable in the industry.
WHY SQL CONTINUES TO REMAIN RELEVANT FOR DATA
ANALYTICS?
There are a number of key reasons why it is important for data ninjas
to master RDBMS systems and to be proficient in SQL. The top 3 of
these include:

1. Ubiquitous: SQL is a ubiquitous standard for accessing data within databases, and many of the current programming languages out there have a way to access SQL databases.
2. Easy to learn: SQL is widely popular and is widely accepted and utilized within the industry. It is easy to find experts who know the subject and have years of experience using it. It’s also relatively easy for newcomers to pick up the syntax without requiring much training.
3. Tried and Tested: Finally, SQL is very closely tied to the relational model, which has been thoroughly explored with regard to optimization and scalability. Even though SQL solutions still require manual tweaking (index creation, query structure, etc.), the platform has been around for a while now and is well tested in production environments.

SUMMARY
A lot of the data in existence today is stored in RDBMS databases and
SQL is the premier interface used to access and manipulate this data.
Your smartphone stores its contact database in a relational database. Your
online banking information and all your financial history, statements,
personal data and so forth, are all stored in a relational database of some
sort.
SQL is the primary language used to access and analyze this information.
So, as a data ninja who aspires to work intimately with data, it is paramount
to master SQL.
STRUCTURED QUERY LANGUAGE (SQL) - NINJA RESOURCES
CORNER & SHADOW SKILLS
Now that you have read this nugget on SQL and have got your feet wet, I
would recommend that you continue to read and practice your skills. The
more you read the more you are going to learn, and pretty soon you will be
fluent in writing SQL statements to interrogate data.
I have put together the best learning resources that I have come across to
help you on your journey. Below you will find links to paid online courses
that have spent years developing their videos and courses to really immerse
you into the material. If you are not ready to shell out some cash, there are
also free courses that have excellent resources available.
First we will look at some of the premium resources that are available to
purchase. These resources typically have the most to offer and will carry
you further than some of the free resources.
Lynda.com
Lynda.com is a very popular online education company. They offer
thousands of different courses for creative software and business skills.
Their MySQL courses cover several skill levels, from beginner to
advanced. They offer a 10-day trial period to get started, a perfect way
to see if their services are right for you.
At the end of the trial you can choose between different levels of payment,
from $25 a month on a month-to-month basis to $375 annually.
Link : http://www.lynda.com/search?q=sql
Infinite Skills
Infinite Skills was recently purchased by O’Reilly Media. They offer 142
training videos on MySQL. They also have downloadable practice files
that help further your skills beyond the videos. I find their website to be a
little confusing and counterintuitive, but they do offer good videos that will
help you. They offer a $25 month-to-month fee that includes a mobile app.
Link: http://www.infiniteskills.com/training/sql-beyond-the-basics.html
Learn Now Online
Learn Now Online has a wide range of topics from programming and
mobile development to SQL. They have a nice set of online videos and
options to choose from. In terms of cost, they are a little more affordable
with options starting at $49 annually.
Link: http://www.learnnowonline.com/
Paid premium courses are not for everyone; maybe you want to dive in a
little deeper before you decide to pay to further your skills. Below are some
great free online courses.
Udemy
Udemy is an online marketplace where experts can create their own courses
which can then be offered to the public for free. Each course has a different
author and has user reviews so you can decide which course will be best for
you.
Link: http://www.udemy.com/
Learn Code the Hard Way
Learn Code the Hard Way offers books on various subjects. They are
currently working on an SQL book, but have posted the book online for free
while they work on it. You can view the book by chapters. Don’t let the
name fool you, it is very approachable with easy to understand topics. I am
assuming once the book is completed it will be available to purchase from
their website.
Link: http://sql.learncodethehardway.org/
SQLServerCentral
SQL Server Central is a resource for the Microsoft SQL Server community.
It has many DBAs, developers, and users, and plenty of useful and valuable
information. This is one to keep bookmarked as you continue your
career in SQL.
Link: http://www.sqlservercentral.com/
SQL Fiddle
SQL Fiddle allows you to select a database, build a schema, populate the
schema, and run queries against it. SQL Fiddle is a great resource for
practicing different syntax of SQL and testing your queries.
Link: http://sqlfiddle.com/
Database Journal
Database Journal is a script library that offers a huge database of material
across an array of subjects. They have articles, news, and tutorials, all offered
for free. Feel free to post questions and comments on their forum. They update
the site frequently and have topics dating from 2010 to 2015.
Link: http://www.databasejournal.com/scripts/
SQL-Tutorial
SQL-Tutorial offers problems and solutions. They have resources for
novice users and those who feel they already have a grasp on SQL but want
to learn more. They will help you to program queries. The information is
presented as an Ebook you can read through.
Link: http://www.sql-tutorial.ru/
1KeyData SQL
1KeyData SQL is a very nice resource to help you with SQL. They have
common SQL commands, functions, constraints, and tables available to
access whenever you may need them. They also offer some video tutorials
and quizzes to help you along.
Link: http://www.1keydata.com/sql/sql.html
The Schemaverse
The Schemaverse is a space-based strategy game implemented entirely
within a PostgreSQL database. Play against other players using raw SQL
commands to command your fleet. This is a fun way to keep your skills
sharp.
Link: https://schemaverse.com/
SQL Zoo
SQL Zoo is a step-by-step tutorial with live interpreters, allowing access to
tables using any of the Oracle, SQL Server, MySQL, and PostgreSQL engines.
Once you feel ready, they also have online quizzes to help assess your skills.
Link: http://sqlzoo.net/
Tutorials Point
Tutorials Point has tons of free online tutorials and reference manuals. They
also offer premium services for a fee; if you decide to pay, you will get
premium support and instructor help. If you do not wish to pay, their free
resources are excellent and will help you from the installation of MySQL
all the way through to importing databases.
Link: http://www.tutorialspoint.com//sql/index.htm
SQLCourse
SQLCourse is an interactive online SQL training resource that offers free
training. They will get you started with the basics and move you along to
more advanced topics. The site is funded by advertisements so you will
have to scroll past some ads while you are reading but they offer great
material all about SQL.
Link: http://www.sqlcourse.com/
W3Schools
W3Schools offers free material to view, but they also offer premium
services such as certificates. In order to receive a certificate you must pay
the premium price of $95 and pass an online test. The site has a built-in
interpreter in the browser so you can try different queries and see the outcomes.
Link: http://www.w3schools.com/
Sol Tutorials
Sol Tutorials GalaXQL is an interactive SQL tutorial. This is another fun
tutorial: take a journey into outer space while writing SQL code. The site
was created by Jari Komppa, is entirely nonprofit, and carries very few
ads. An enjoyable resource all around.
Link: http://sol.gfxile.net/galaxql.html
These are all great online resources you can use to help yourself along on
your journey to mastering SQL, but sometimes you need to give your eyes a
rest from the screen and turn to a physical book.
Holding a book and turning the pages has always been my favorite way to
learn. There is something about highlighting and underlining key points in
the book that just seems to help me remember.
I have collected some of my favorite titles and created a small list below.
These titles can be found anywhere that sells computer reference titles.
SQL in 10 minutes
Author: Ben Forta
Published by: Sams Publishing
ISBN-13: 978-0672336072
Learning SQL
Author: Alan Beaulieu
Published by: O’Reilly Media
ISBN-13: 978-059652083
SQL Cookbook
Author: Anthony Molinaro
Published by: O’Reilly Media
ISBN-13: 978-0596009762
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in
SQL (3rd Edition)
Author: John Viescas
Published by: Addison-Wesley Professional
ISBN-13: 978-0321992475
Head First SQL
Author: Lynn Beighley
Published by: O’Reilly Media
ISBN-13: 978-0596526849
It is important to keep learning and using what you’ve learned. You will
only get more proficient as time goes on, so keep fine tuning your skills and
never give up.
KEY NINJA LESSONS

Gain familiarity and proficiency in using SQL to interrogate data in databases.
Understand Relational Database Management Systems (RDBMS), the players in the market, and how they work to store and manage data at the fundamental level.
Go beyond SQL into the NoSQL (Not Only SQL) world. Gain familiarity or at least some elementary understanding of NoSQL databases and also understand why the trend is moving in that direction.
TAME DATA WAREHOUSING

MOTIVATION
In the first Nugget, I addressed the importance for a data ninja of
leveraging Excel for their analytical needs. The second Nugget was about
leveraging RDBMS systems to go beyond Excel for analysis. In this
Nugget, we will explore the intricacies of working with huge datasets and
the need to adequately store them in data warehouses.
Numbers have an important story to tell. They rely on you to give them a
clear and convincing voice.
-Stephen Few
As mentioned in the nugget for MS Excel, using Excel for big data projects
can pose some serious challenges, especially relating to privacy, data
redundancy and concurrency issues that arise when users retain their own
personal copies of sensitive corporate data on their personal computers and
laptops.
Because of these challenges with using Excel spreadsheets alone,
companies often find themselves needing other more robust, enterprise-
scale solutions to help out.
The solution that often comes to the rescue when companies are
challenged with the need to move and store huge volumes of data falls
broadly into the category of Data Warehousing (DW).
WHAT IS A DATA WAREHOUSE?
Many organizations today have a data warehouse of one form or another.
A data warehouse serves many purposes within organizations, but at a basic
level a data warehouse is defined as a massive database typically housed on
a cluster of servers, or a mini or mainframe computer serving as a
centralized repository of all data generated by all departments and units of a
large organization.
Brief History of Data Warehousing
The term was coined by W. H. Inmon, a prominent figure in the
field of data warehousing.
The DW consolidates data from a variety of sources in one centralized
location and is typically designed to support Business Intelligence
processes, along with strategic and tactical decision making.
Data Warehousing Defined
Data warehousing allows a company or organization to create a
consolidated view of its enterprise data, optimized for reporting and
analysis. Basically, a data warehouse is an aggregated, sometimes
summarized copy of transaction and non-transaction data specifically
structured for dynamic queries and fast, efficient business analytics.
(InformationBuilders)
With all of the company’s data available in one location, i.e. the Data
Warehouse, companies can provide data consumers with a coherent picture
of the business at a point in time.
DATA WAREHOUSE ARCHITECTURE
In data warehousing, data and information are extracted from
heterogeneous production data sources as they are generated, or in periodic
stages and loaded to the Data Warehouse. This approach makes it simpler
and more efficient to run queries over data that originally came from
different sources.
The diagram below captures the complete architecture of an end-to-end
data solution within a company and it shows the pivotal role played by the
Data Warehouse.

Data Warehouse ETL Architecture. (serra)


From the image illustration above, we see data coming in from various
data sources, including CRM, ERP, and any other data sources the company
may have.
The data from these incoming sources gets cleansed in staging areas and
is eventually stored in the DW. From the DW, end user applications like
Excel, the Microsoft stack (SQL Server, SSIS, SSRS, and SSAS), Oracle
OBIEE, Scribe, Informatica, DataStage, Tableau, MicroStrategy, QlikView,
etc. can all access and consume the data directly or push it downstream
for consumption.
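The flow described above (extract from source systems, cleanse in a staging area, load into the warehouse) can be sketched in a few lines of Python with the built-in sqlite3 module. Everything here is a toy assumption: the CRM and ERP extracts are hard-coded lists, the clean_name rule is invented, and a real warehouse would use a dedicated ETL tool and a far richer model.

```python
import sqlite3

# Extract: pretend these rows came from two different source systems.
crm_rows = [
    {"customer": " Acme ", "email": "SALES@ACME.COM"},
    {"customer": "Globex", "email": None},
]
erp_rows = [("acme", "2015-01-05", "100.50"), ("Globex", "2015-01-06", "75")]

def clean_name(raw):
    """Staging rule (illustrative): trim whitespace and normalize capitalization."""
    return raw.strip().title()

# Stage: cleanse each feed so both systems agree on names and data types.
staged_customers = [
    (clean_name(row["customer"]), (row["email"] or "").lower()) for row in crm_rows
]
staged_orders = [
    (clean_name(name), order_date, float(amount))
    for name, order_date, amount in erp_rows
]

# Load: write the cleansed rows into one central store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_customers (name TEXT PRIMARY KEY, email TEXT)")
warehouse.execute("CREATE TABLE dw_orders (customer TEXT, order_date TEXT, amount REAL)")
warehouse.executemany("INSERT INTO dw_customers VALUES (?, ?)", staged_customers)
warehouse.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", staged_orders)
warehouse.commit()

# Consume: a single query now spans data that began life in two systems.
combined = warehouse.execute("""
    SELECT c.name, c.email, SUM(o.amount)
    FROM dw_customers AS c
    JOIN dw_orders AS o ON o.customer = c.name
    GROUP BY c.name, c.email
    ORDER BY c.name
""").fetchall()
```

Without the cleansing step, " Acme " from the CRM feed and "acme" from the ERP feed would not match in the final join, which is the kind of inconsistency a staging area exists to remove.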
WHY IS THE DATA WAREHOUSE IMPORTANT?
The Data Warehouse proves to be especially relevant because data and
information are extracted from heterogeneous production data sources as
they are generated, or in periodic stages, making it simpler and more
efficient to run queries over data that originally came from different
sources.
The Data Warehouse therefore becomes an ideal go-to source for Data
Ninjas looking to do analysis work because they are more than likely going
to find a vast majority of the data needed for analysis in one single source
rather than having to connect to many disparate sources.
Point to Note
The task of building a data warehouse can take several years and usually
involves a team of highly specialized data professionals whose task is to
build both the data warehouse model and the integration points to get data
from source systems around the company such as CRM, ERP, etc.
For the purposes of this book, we see Data Ninjas as people who are not
involved with the actual building of the Data Warehouse, but depend on the
data that is present in the DW for their analytical needs.
KEY BENEFITS OF THE DATA WAREHOUSE
As a data ninja tasked with analyzing the company’s data, the job
requirements would typically involve handling and working with data
coming from or going into a data warehouse.
In this regard, we see that the data warehouse offers a number of key
benefits to the analytics process as a whole. Below, we provide a list of a
few of these benefits.
Benefits of the Data Warehouse

Standardization of data across an organization.
Consolidation of data from multiple sources.
Timely access to data.
Availability of historical data for analysis.
One version of the truth, so each department will produce results that are in line with all the other departments, providing consistency.
Enhanced data quality and consistency.
Reduction in manual reconciliations.

DIMENSIONAL MODELLING
Dimensional modelling is a crucial part of the Data Warehouse process.
Given that most data warehouses today follow the dimensional model
pattern, an understanding of the concept of dimensional modelling is
therefore extremely important when performing analytics.
In dimensional modelling, all data is contained in two types of tables
called Fact Table and Dimension Table. The Fact table contains the
measurements, metrics or facts of business processes, while the
Dimensional Tables contain the context of the measurements.
Dimensional modeling differs from normalized modeling (which
is more focused on reducing and eliminating data redundancy) in that it is
designed to enable analysis through massive and unpredictable queries.
Processing such queries is one of the things a normalized relational model
is ill-equipped to handle.
Dimensional Model Pros and Cons
Pros:

Data retrieval performance.
Good for analysis: slice and dice, roll up, drill down.
Easy for administrators to maintain and interpret.

Cons:

Data loading time is increased.
Reduced flexibility in case of business change or dimension change.
Storage increases due to denormalization. The same information might multiply considerably during storage.

Example of Dimensional Modeling


Dimensional Model (Data-Warehouses)
The figure above shows a dimensional model of
company sales information. In the model presented, Units Sold is a Fact, and
Location, Date, and Product are Dimensions.
An analyst working with the model can analyze the sales (fact)
across the different dimensions of Product, Channel, Customer, Order, Store
and Time.
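To show what analyzing a fact across dimensions looks like in SQL, here is a small sketch using Python's built-in sqlite3 module. It models only three dimensions (Product, Location, Date) around a Units Sold fact. The table names, column names, sample rows, and the units_by helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- The fact table holds the measurement plus one key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units_sold INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Lamp');
    INSERT INTO dim_location VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO dim_date VALUES (1, 2015, 1), (2, 2015, 2);
    INSERT INTO fact_sales VALUES (1, 1, 1, 5), (1, 2, 1, 3), (2, 1, 2, 10), (1, 1, 2, 4);
""")

def units_by(conn, dim_table, key, label):
    """Total units sold, grouped by one attribute of one dimension.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted code such as the calls below, never from user input.
    """
    sql = (
        f"SELECT d.{label}, SUM(f.units_sold) "
        f"FROM fact_sales AS f JOIN {dim_table} AS d ON d.{key} = f.{key} "
        f"GROUP BY d.{label} ORDER BY d.{label}"
    )
    return conn.execute(sql).fetchall()

by_product = units_by(conn, "dim_product", "product_id", "product_name")
by_city = units_by(conn, "dim_location", "location_id", "city")
by_month = units_by(conn, "dim_date", "date_id", "month")
```

The same fact table answers all three questions; only the dimension being joined changes, which is what "slicing" the data by a dimension means.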
Why is Dimensional Modeling Beneficial to the Data Ninja?
Ease of Use
Dimensional modeling is very beneficial to the Data Ninja because it is a
well-established DW design approach that the business can understand:
information is grouped into coherent business categories, or
dimensions, that make sense to business people. Usually, dimensional
models are based entirely on business terms, so the business knows what
each fact, dimension, or attribute means.
Query Performance
In addition to ease of use, query performance is the second reason
dimensional modeling is of great use to Data Ninjas. Denormalized
dimension hierarchies have a significant impact on query performance,
because queries need fewer joins, and they can easily be tuned further.
Dimensional models are also very extensible, allowing new attributes to be
added easily to the dimension tables without affecting facts in the fact
table.
SOME KEY DATA WAREHOUSE DEFINITIONS AND CONCEPTS

Data Warehouse: The queryable source of data in the enterprise.


Data Mart: A logical subset of the complete data warehouse.
Operational Data Store (ODS): The point of data integration for
operational systems.
Metadata Database: All of the information in the data warehouse
environment that is not the actual data itself. It is centrally maintained
and stored.
Entity Relationship Models (ER): Entity-relationship modeling is a
logical design technique that shows the relationship between data.
Dimensional Models: Dimensional modeling is the name of a logical
design technique used for data warehouses. Every dimensional model
is composed of a “fact” table and a set of “dimension” tables.
Facts: A fact is an event that happened or that has been measured,
usually captured as a number, e.g. a single sale of a product to a
consumer or the total sales amount in a specific month.
Dimensions: A dimension relates to facts and contains attributes that
can be used to add qualitative information to the numeric information
contained in facts, e.g. a dimension can be a list of products or
customers, or a time period, that can be used to analyze the fact.
OLAP: (Online Analytical Processing) Online analysis of transactional
data. OLAP tools enable users to analyze different dimensions of
multidimensional data.
OLTP: Online Transaction Processing. This is a class of information
systems that facilitate and manage transaction-oriented applications,
typically for data entry and retrieval transaction processing.
ETL: Short for the Extract, Transform, and Load process. ETL
processes retrieve data from operational systems and pre-process it
for further analysis by reporting and analytics tools. ETL processes
are also responsible for feeding a data warehouse with data.
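The ETL definition above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the in-memory CSV text stands in for an export from an operational system, the sales table stands in for a warehouse table, and the cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read rows from an operational export.
# (An in-memory CSV stands in for a real file or source system.)
raw_export = io.StringIO(
    "order_id,product,amount\n"
    "1, widget ,19.99\n"
    "2,GADGET,5.00\n"
    "3,widget,not available\n"
)
extracted = list(csv.DictReader(raw_export))


def transform(rows):
    """Standardize product names and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be used for analysis
        product = row["product"].strip().lower()
        cleaned.append((int(row["order_id"]), product, amount))
    return cleaned


# Transform: clean the extracted rows.
transformed = transform(extracted)

# Load: write the cleaned rows into the (stand-in) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (order_id INTEGER, product TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
loaded_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(transformed)
print(loaded_count)
```

Real ETL tools add scheduling, logging and error handling on top of this, but the three steps are the same.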

Ninja Tip 11.


As a Data Ninja, you might not need the details and intricacies of data
warehousing directly in your day-to-day job, which is analyzing
data.
But having some familiarity with these concepts, the tools and their
functions is highly recommended.
Such understanding will not only put you ahead of your peers in terms of
marketability, but will also give you the full picture of data through
its life cycle: from the creation layer, through the ETL and storage
layers, to the consumption layer, which is likely to matter most to you.
SUMMARY
Like many other concepts discussed in this book, the data warehouse is
at the forefront of a company’s data ecosystem, and it influences the way
businesses acquire, store, analyze and consume data for business
planning and decision making.
As a Data Ninja tasked with working with a company’s
data, you will undoubtedly encounter the data warehouse as part of the
process of working with data. You might use it either as a source or
destination of datasets used for analysis purposes.
Hence, an understanding of data warehouse system architecture will be
important to analyzing data effectively.
MS DATA WAREHOUSING - NINJA RESOURCES CORNER &
SHADOW SKILLS
Data warehouses can be a bit daunting, but after reading this chapter, I
hope you feel a little more at ease. Since data warehouses are crucial to
most enterprises, it is important to fully understand how they work and how
you can harness their full potential. Below are some resources to help you
further your knowledge and understanding of the concepts.
The Data Warehousing Information Center
The Data Warehousing Information Center is a vast collection of essays and
articles about everything data warehousing. The site is geared towards
someone who is just getting started with data warehouses.
Link: http://www.dwinfocenter.org/getstart.html
Tutorials Point
I’ve listed Tutorials Point elsewhere because they offer so many great
resources. Again, they offer some great tools for learning more about data
warehouses. Before you start with them, you should have a basic
understanding of database concepts such as schema, ER models, and
structured query language.
Link: http://www.tutorialspoint.com/dwh/
Learning Data Modeling
Learning Data Modeling focuses on the concepts of data warehouses. It is
geared towards the novice user and offers guides and diagrams to help
you understand these concepts. The site is relatively small but has some
good articles.
Link: http://www.learndatamodeling.com/
1KeyData
Another site that you will see throughout my book, 1KeyData has an
impressive collection of tutorials and resources. It is not aimed directly
at beginners; rather, it tries to bring those with a basic level of
understanding to a higher level. You will learn about the tools needed to
implement a data warehouse, the steps needed to fulfill your needs, and the
concepts that underpin data warehouses.
Link: http://www.1keydata.com/datawarehousing/datawarehouse.html
Why Learning Data Warehousing Still Matters
This is a great article about why you should learn about data warehouses. It
does not offer guides or tutorials but will keep you motivated if you start to
lose faith in the importance of learning about data warehouses.
Link: https://infocus.emc.com/william_schmarzo/why-learning-data-warehousing-still-matters/
Data Warehousing: Academic Tutorials
Academic Tutorials touts the slogan “Quick and Easy Learning”. They
want you to get in and get out as quickly as possible while packing in the
most information possible. Their tutorials go over pretty much everything
you need to know about data warehouses. The site is not very user friendly
and is a bit scattered, but if you can navigate through it you will be
handsomely rewarded.
Link: http://www.academictutorials.com/data-warehousing/data-warehousing-introduction.asp
Lynda
The great Lynda makes another appearance. Lynda.com is a great resource
for learning anything you need to know. This is a premium site that charges
a fee for its courses, but it is worth the money. Their support is superb and
their courses are great. There is nothing here that you couldn’t find for free
with a bit of searching, but if you want a one-stop shop then Lynda.com
is the place for you.
Link: http://www.lynda.com/SQL-Server-tutorials/Implementing-Data-Warehouse-Microsoft-SQL-Server-2012/156150-2.html
Data Modeling 101
Data Modeling 101 has some great diagrams that clearly lay out the
information you need. The resources provided are limited, but if you read
through their pages you will pick up some good lessons and well-thought-out
information.
Link: http://agiledata.org/essays/bestPractices.html
Wikipedia
A lot of people shy away from Wikipedia, but when it comes to tech-related
information it is usually a good place to start. It lays out each topic in
plain English without relying on technical jargon. It also has some good
links that will help you find further information. It is definitely worth
checking out.
Link: https://www.wikipedia.org/
Along with the web-based resources, I highly recommend Ralph Kimball’s
book The Data Warehouse Toolkit. It is a leading authoritative guide on data
warehousing. In its second edition, The Data Warehouse Toolkit has
developed into the most comprehensive collection on dimensional modeling
for data warehousing. A must-read if you want to fully understand data
warehouses.
The Data Warehouse Toolkit
Author: Ralph Kimball
Published by: Wiley
ISBN-13: 978-0471200246
In all, I know data warehousing can seem obscure, but it is important, and the
more you know, the more valuable you will be. Data warehouses are
crucial to most enterprises, and mastering how they work and how to best
use them will definitely further your career as a Data Ninja.
KEY NINJA LESSONS

Data warehousing concepts are vital to learn as they provide the full
picture of data through its life cycle: creation, movement,
storage and consumption.
A dimensional model may be used for any reporting or querying of data.
The data warehouse provides an environment separate from the
operational systems and is completely designed for decision-support,
analytical-reporting, ad-hoc queries, and data mining.
PICKUP CODING
MOTIVATION
Computers are a critical component of all our lives. Most things we
interact with in the world today are run directly or indirectly by
computer systems. As a result, it has become more crucial than ever for
everyone (young and old) to learn programming or at least understand the
concepts.
Bill Gates and Mark Zuckerberg recently donated ten million dollars to
Code.org, a non-profit that believes that “every student in every school
should have the opportunity to learn computer programming,” and that
“computer science should be a part of the core curriculum.”
(NewYorker)
MAKING A CASE FOR PROGRAMMING SKILLS
Coding is not a goal. It’s a tool for solving problems. Learning to program
teaches computational thinking, and computational thinking teaches people
how to tackle large problems by breaking them down into a sequence of
smaller, more manageable problems.
You Can Play God
When you program, you are a creator. You go from a blank text file to a
working program with nothing to limit you but your imagination (and
maybe some issues like how long your program takes to run). Programming
is like having access to the absolute best set of legos in the world in almost
unlimited quantities. Even better, you can get all of your building materials
completely for free (once you own a computer) on the internet. Amazing!
It's also great fun to see someone using something that you made. Your
ability to improve your life and the lives of your friends and family is
limited only by your ideas once you can take full control of your computer.
Moreover, your work can be extremely high quality because the limiting
factor is not manual dexterity or other non-mental attributes. If you can
understand a programming technique, you can implement and use it.
(Cprogramming.com)
WHAT IS PROGRAMMING?
In general, a programming language is defined as the vocabulary and set of
grammatical rules for instructing a computer to perform specific tasks.
Programming is the process of designing, writing, testing, debugging, and
maintaining the source code of computer programs. This code can be
written in a variety of computer programming languages. Some of these
languages include Java, C, and Python. Computer code is a collection of
typed words that the computer can clearly understand. Just as a human
translator might translate from the English language to Spanish, the
computer interprets these words as ones and zeros. We as humans use
programming languages, instead of writing directly in ones and zeros, so we
can easily write and understand the computer code and can organize it. We
can think of the different lines of our code as being individual instructions
that we give to the computer. The computer follows these instructions
explicitly to execute our written code.
(EarSketch)
Programming is highly detailed work, and it usually involves fluency in
several languages. Projects can be short and require only a few days of
coding, or they can be very long, involving upward of a year to write.
WHY LEARN HOW TO PROGRAM?
Many reasons can be given as to why it is important to learn
programming. But the most important of them is the attitude embodied by
most programmers. Programmers primarily use their skills to decompose
and solve complex and challenging problems.
It is often said that some people, when faced with a challenging
situation, throw their hands up in surrender and run away. Others, when
faced with similar challenging problems, will set about trying to break
down the problem into subsets and work on it until they understand what is
going on. The latter are those who make good programmers. They solve
challenging problems and they like doing it.
What Experts Say About the Mastery of Programming Skills
Coding isn’t particularly easy to learn but that’s exactly why it’s so
valuable. Even if you have no plans to become a software developer, spend
a few weeks or months learning to code and I can guarantee it will sharpen
your ability to troubleshoot and solve problems.
(DIY Genius)
A deep understanding of programming, in particular the notions of
successive decomposition as a mode of analysis and debugging of trial
solutions, results in significant educational benefits in many domains of
discourse, including those unrelated to computers and information
technology per se.
(Seymour Papert, in "Mindstorms")
It has often been said that a person does not really understand something
until he teaches it to someone else. Actually a person does not really
understand something until after teaching it to a computer, i.e., express it
as an algorithm.
(Donald Knuth, in "American Mathematical Monthly," 81)
Computers are not sycophants and won't make enthusiastic noises to ensure
their promotion or camouflage what they don't know. What you get is what
you said.
(James P. Hogan in "Mind Matters")
“I think everybody in this country should learn how to program a computer
because it teaches you how to think.”
(Steve Jobs)
When you learn to read, you can then read to learn. And it’s the same thing
with coding: if you learn to code, you can then code to learn.
(Mitch Resnick)
Work your way up the programming ladder
As you work with data and mature within the data analytics space,
you may well progress from working with small data in spreadsheets
to crunching Big Data with tools like Hadoop and MapReduce, and then
maybe on to being a Data Scientist. In such roles, the need for programming
becomes even greater.
But, when we talk about programming, it does not have to be fancy or
complicated. It can be as simple as creating simple routines, or scripts, or
workflows to automate mundane tasks, such as moving files, searching
folders, merging data sets, creating new datasets, de-duplicating datasets,
standardizing datasets, etc.
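As a small taste of what such a routine can look like, here is a sketch in Python that merges two contact lists, standardizes the records and removes duplicates. The field names, the sample records and the rule that an email address identifies a person are all assumptions made for this illustration.

```python
def standardize(record):
    """Trim whitespace and normalize case so equivalent records compare equal."""
    return {
        "name": record["name"].strip().title(),
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Keep the first record seen for each email address."""
    seen = set()
    unique = []
    for record in records:
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique


# Two hypothetical datasets that overlap.
list_a = [{"name": "ada lovelace", "email": "Ada@Example.com "}]
list_b = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": " grace hopper", "email": "grace@example.com"},
]

# Merge, standardize, then de-duplicate.
merged = [standardize(record) for record in list_a + list_b]
contacts = deduplicate(merged)

for contact in contacts:
    print(contact["name"], contact["email"])
```

Standardizing before de-duplicating matters here: without it, the two spellings of the same email address would be treated as different people.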
So start small and work your way up the ladder by continuously
practicing and developing your skills.
STARTING PROGRAMMING AS A NOVICE
As a novice to programming, you can start simple. The journey of a
thousand miles begins with the first step.
The most common question asked by anybody new to computer
programming is “What language is the best to start with?” Many people
will tell you to jump straight into it by learning a more advanced language
such as C++ or Java, others will tell you to start with a more dated language
such as C. In my personal opinion, the best programming language to begin
learning is Visual Basic .NET. VB.NET is a really good language to learn
for a beginner because it requires no previous experience in programming.
The syntax used in VB.NET is simple and very easy to understand.
Learning Visual Basic will give you a basic understanding of how computer
programming works and is also really entertaining! Although VB.NET is a
good place to start, I would not recommend using it for too long. More
advanced languages have a more advanced syntax and spending all of your
time using VB.NET could make it harder to move onto the more advanced
languages in the future.
Although every programming language has a different syntax, most
programming languages are similar. The first language that you learn will
be the hardest language that you learn because the concept will be new to
you. After learning your first language, you will have an understanding of
how computer programming works and that will help you a lot when it
comes to learning other languages. If you choose a language such as C++
with a more complicated syntax then it is going to be very confusing and
hard for you to understand if you do not have any prior experience. The first
language that you choose to learn is completely your choice, but we
strongly recommend that you begin with VB.NET.
(HowTo)
As the excerpt from the Howtostartprogramming.com article explains, the
first step to programming may entail choosing a language and then writing
a simple “Hello, World!” program. It’s that simple.
From there you can progress to the basic building blocks,
such as language syntax, operators, variables and assignments, data types,
control flow, arrays and iterators, etc.
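Here is what those first steps can look like. I use Python for this sketch purely as an illustration (the excerpt above recommends VB.NET, and the ideas carry over); the variable names and sales figures are made up.

```python
# The traditional first program.
print("Hello, World!")

# Variables and assignment, with a few basic data types.
analyst = "Data Ninja"            # a string
monthly_sales = [120, 95, 140]    # a list (array) of integers
target = 100                      # an integer

# Flow control and iteration: count the months that met the target.
months_on_target = 0
for sales in monthly_sales:
    if sales >= target:
        months_on_target += 1

# Operators: compute the average of the list.
average_sales = sum(monthly_sales) / len(monthly_sales)

print(analyst, months_on_target, round(average_sales, 1))
```

Every concept in the list above appears here in miniature: syntax, operators, variables, data types, a loop and a condition.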
With the simpler concepts mastered, depending on the programming
language, you can then progress to other concepts such as classes, objects,
methods, instances and instantiation. Eventually you can move on to more
advanced concepts like threads, concurrency, etc.
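To show what classes, objects, methods and instantiation mean in practice, here is a minimal Python sketch. The Dataset class and its sample values are hypothetical, chosen only to illustrate the vocabulary.

```python
class Dataset:
    """A class bundles data (attributes) with the behavior (methods) that uses it."""

    def __init__(self, name, values):
        # Attributes hold the state of one particular instance.
        self.name = name
        self.values = list(values)

    def mean(self):
        """A method: behavior that works on the instance's own data."""
        return sum(self.values) / len(self.values)


# Instantiation: creating objects (instances) from the class.
units = Dataset("units_sold", [10, 5, 7, 3])
returns = Dataset("returns", [1, 0, 2])

# Each instance carries its own data but shares the same methods.
print(units.name, units.mean())
print(returns.name, returns.mean())
```

The class is the blueprint; units and returns are two separate objects built from it.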
Practice Makes Perfect
I must acknowledge that getting into the programming game can prove to
be a challenge and poses a serious learning curve for non-programmers.
But, I would encourage anyone looking to take that step to not be
intimidated by the process.
The one important thing I’ve come to realize is that when learning to
program, as with any other thing we learn in life, we don’t start off by being
experts.
It takes practice, courage, determination and then some more practice in
order to succeed. I wish I could say it otherwise, but there is simply no way
of getting around the practice part of it. So, go out and start practicing.
Programming Success
Get interested in programming, and do some because it is fun. Make
sure that it keeps being enough fun so that you will be willing to put in
your ten years/10,000 hours.
Program. The best kind of learning is learning by doing. To put it
more technically, "the maximal level of performance for individuals in
a given domain is not attained automatically as a function of extended
experience, but the level of performance can be increased even by
highly experienced individuals as a result of deliberate efforts to
improve." (p. 366) and "the most effective learning requires a well-
defined task with an appropriate difficulty level for the particular
individual, informative feedback, and opportunities for repetition and
corrections of errors." (p. 20-21) The book Cognition in Practice:
Mind, Mathematics, and Culture in Everyday Life is an interesting
reference for this viewpoint.
Talk with other programmers; read other programs. This is more
important than any book or training course.
If you want, put in four years at a college (or more at a graduate
school). This will give you access to some jobs that require credentials,
and it will give you a deeper understanding of the field, but if you don't
enjoy school, you can (with some dedication) get similar experience on
your own or on the job. In any case, book learning alone won't be
enough. "Computer science education cannot make anybody an expert
programmer any more than studying brushes and pigment can make
somebody an expert painter" says Eric Raymond, author of The New
Hacker's Dictionary. One of the best programmers I ever hired had
only a High School degree; he's produced a lot of great software, has
his own news group, and made enough in stock options to buy his own
nightclub.
Work on projects with other programmers. Be the best programmer on
some projects; be the worst on some others. When you're the best, you
get to test your abilities to lead a project, and to inspire others with
your vision. When you're the worst, you learn what the masters do, and
you learn what they don't like to do (because they make you do it for
them).
Work on projects after other programmers. Understand a program
written by someone else. See what it takes to understand and fix it
when the original programmers are not around. Think about how to
design your programs to make it easier for those who will maintain
them after you.
Learn at least a half dozen programming languages. Include one
language that emphasizes class abstractions (like Java or C++), one
that emphasizes functional abstraction (like Lisp or ML or Haskell),
one that supports syntactic abstraction (like Lisp), one that supports
declarative specifications (like Prolog or C++ templates), and one that
emphasizes parallelism (like Clojure or Go).
(Norvig)

WHICH PROGRAMMING LANGUAGES TO LEARN FOR DATA ANALYSIS
Given that this book is ultimately about informing you on the importance of
learning programming and being able to use it in your role as a Data Ninja
when analyzing data, we’ve pulled together survey results on some of the
popular programming languages in the industry to show how they stack
up in terms of popularity.
Which programming tools to use for data analysis. (kdnuggets)
DIVERSE SKILL SETS FOR PROGRAMMING
Programming requires a very rich and diverse set of skills to master, and
there are many programming concepts out there one can potentially learn.
But you do not have to set out to learn every single programming concept
there is. That simply isn’t possible, and it would not make sense to try.
What you mainly need is an understanding of the concepts
and how to apply them to solve business problems, as opposed
to getting locked into a particular programming language or syntax.
Once you have mastered the fundamental concepts of programming, you
can then apply them to solve specific business problems, regardless of
which language or vendor product you use.
Comparing programming to some physical tasks, programming does not
require some innate talent or skill, like gymnastics or painting or singing.
You don't have to be strong or coordinated or graceful or have perfect pitch.
Programming does, however, require care and craftsmanship, like carpentry
or metalworking. If you've ever taken a shop class, you may remember that
some students seemed to be able to turn out beautiful projects effortlessly,
while other students were all thumbs and made the exact mistakes that the
teacher told them not to make. What distinguished the successful students
was not that they were better or smarter, but just that they paid more
attention to what was going on and were more careful and deliberate about
what they were doing.
(Eskimo)
PROGRAMMING IS THE GATEWAY TO BROADER LEARNING
The point of learning to program or learning programming concepts is
not necessarily to label yourself as a “programmer”. It’s about learning how
to tackle and solve complex problems.
It’s understandable that some readers with less technical inclinations
would be scared of being called a “Programmer”; they might prefer
“Analyst” or some other title of their choosing. But Mitchel Resnick of
the MIT Media Lab put it rightly when he said, “coding is a gateway to
broader learning.”
Once you have mastered programming, it can provide you with the
means to think creatively, reason systematically and work collaboratively to
solve countless other problems. This is an exciting proposition, and that is
why learning to code or read code is highly recommended, and will help
you not only excel, but thrive as a data ninja.
Good coders are a special breed of persistent problem-solvers who are
addicted to the small victories that come along a long path of trial and error.
Learning how to program is very rewarding, but it can also be a frustrating
and solitary experience. If you can, get a buddy to work with you along the
way. Becoming really good at programming, like anything else, is a matter
of sticking with it, trying things out and getting experience as you go.
(Lifehacker)
SUMMARY
Programming is a lifelong endeavor. It is like learning a language. And
as with any (spoken) language you learn for the first time, it is crucial
that you use it frequently to remain fluent.
The resources and discussions provided in this nugget will help you get
started with a solid career in programming. But it is imperative that you
practice your new skills every day to keep them sharp.
PROGRAMMING - NINJA RESOURCES CORNER & SHADOW
SKILLS
Codecademy
Codecademy is arguably the most well-known website on this list for
learning to program online. They offer courses in Web Fundamentals, PHP,
JavaScript, jQuery, Python, Ruby, and APIs. You can track your progress
and learn interactively. The site is well put together and is a great place to
start.
Link: http://www.codecademy.com/
EdX
EdX connects students with the highest quality education, through their
institutional partners. They have a huge catalog of different categories.
They offer courses in almost anything you can think of. This is a great
resource for anyone trying to expand their knowledge in different sciences.
Link: https://www.edx.org/
Code Avengers
Code Avengers is elegantly designed to make the learning process fun.
Every course they offer is strategically designed to entertain and delight
while also educating. Code Avengers offers small mini games after each
lesson to help keep your mind relaxed and focused. It is very easy to lose
track of time while studying these courses because they are so entertaining.
If you have trouble staying focused, Code Avengers is perfect for you.
Link: http://www.codeavengers.com/
ilovecoding
ilovecoding strives to turn beginners into confident developers who can
solve any programming task. They offer video tutorials in JavaScript,
AngularJS, jQuery, Node.js, and HTML5/CSS. Their courses are
designed to be completed within one or two weeks.
Link: https://ilovecoding.org/
Code School
Code School is geared towards those who already have a good
understanding of programming and want to go more in-depth. Not only do
they offer programming courses, they also go over concepts like the
industry’s best practices to keep you ahead of the curve. They offer courses
in Ruby, JavaScript, HTML/CSS, and iOS.
Link: https://www.codeschool.com/
Bento
Bento offers free tutorials and for a fee they will help you along your path
to learn programming. Bento picks the best free tutorials and guides you
through them. An interesting site, worth checking out but you will probably
want to also visit some other sites to get a full experience.
Link: https://www.bento.io/
Khan Academy
The Khan Academy offers a vast array of courses that cover everything
from coding and calculus to computer science. People from all around the
world use Khan Academy and join together to create an astounding
online community.
Link: https://www.khanacademy.org/
Coursera
Coursera is a huge online institution that offers courses from some of the
top universities. Coursera is available in five languages:
English, Spanish, French, Italian, and Chinese. This is a great way to get a
top-of-the-line education for free.
Link: https://www.coursera.org/
Google University
Google has put together a catalog of online resources to learn an array of
things. Google University has just recently gone from their beta stage to
being live to the public and is still in its early years. This means, it is still
growing and will continue to grow. Google is a very reliable company, but
since this is not their main area of focus, beginners might find this resource
a little unapproachable.
Link: https://developers.google.com/university/
Udacity
Udacity is designed to model universities, with online videos led by
industry leaders. They have some giants helping them make their
videos: Google, AT&T, Facebook, Salesforce, and Cloudera, just to name a few.
Udacity offers a “Nanodegree” and credentials to give you something to put
on your resume.
Link: https://www.udacity.com/
Code Combat
A lot like Code Avengers, Code Combat is designed to teach while the user
plays a fun game. It is geared towards beginners and instantly throws you
into a game where you are writing code. The people at Code Combat really
tried to make this game addicting so you forget you are learning to code
while you are playing. It can seem like it’s for children but just because you
are an adult doesn’t mean you can’t have fun while learning too.
Link: http://codecombat.com/
The Odin Project
The Odin Project is still in its beta release but everything seems to be
running immaculately. The people who started The Odin Project felt there
was a gap in the market, and they wanted to create something that would
fill that gap. They offer resources in web development, Ruby
programming, Ruby on Rails, HTML5/CSS3, JavaScript, jQuery, and offer
discussions on how to get hired as a web developer.
Link: http://www.theodinproject.com/
Quackit
Quackit offers free web tutorials, codes, templates, and tools. It’s a very
friendly site with resources in HTML, CSS, coding, databases, web hosting,
and XML. Each category offers many tutorials for you to read through.
Link: http://www.quackit.com/
Saylor Academy
Saylor Academy is a non-profit online academy. They have designed their
site after universities. They name their courses just as a college would (e.g.
arth110). Once you sign up it will feel like you are taking an online class at
your local college, however they are not accredited and you will not receive
a diploma. They offer courses in arts and sciences; this is a great resource
for anyone who wants to further their knowledge in any subject.
Link: https://eportfolio.saylor.org/
Learn Python the Hard Way
I’ve listed the Learn the Hard Way platform elsewhere, but again, don’t let the
name scare you off. Their online books are very approachable but force you
to type every lesson yourself so you cannot take any short cuts. The only
hard part is the self-discipline. The cost of the online book is $29.
Link: http://learnpythonthehardway.org/
Code Mentor
Code Mentor is a unique experience that connects you with real people who
specialize in coding. Codementor connects you with experts for instant
problem solving, technical advice, pair programming, and code review.
Every mentor sets their own rate so the prices vary from mentor to mentor.
Link: https://www.codementor.io/
Career Foundry
Career Foundry is geared towards beginners with no previous experience.
They only offer two courses, web developer and user experience designer.
Link: http://careerfoundry.com/
BaseRails
BaseRails is a project-based learning site where you learn Ruby on Rails
along with other web technologies. Rather than simply teaching you code,
they guide you through building a specific kind of site, such as a review
site, a marketplace, a data collection site, or a classifieds site.
Link: https://www.baserails.com/
Coder Camps
Coder Camps is a very intricate site that offers very good courses. Coder
Camps offers courses in .NET, JavaScript, iOS, and HTML/CSS. You can
apply for a scholarship to help pay for your tuition. The tuition fee is
large, from $9,000 to $12,000, but they offer a deferred payment option where
you pay a $1,000 down payment and then pay off the rest after completion. It’s
hard to recommend a site with such a high fee when there are so many great
learning resources online at no cost. If you have the money, this site
will offer you a lot more than the free sites, but don’t worry if you don’t:
there are some great free sites that I’ve already listed.
Link: http://www.codercamps.com/
Code School
Code School is a lot less expensive than Coder Camps but still has some
great options. They offer courses in Ruby, JavaScript, HTML/CSS, iOS, Git,
and many more. Their fee is $29 monthly or $290 annually.
Link: https://www.codeschool.com/
Hack Reactor
Hack Reactor is an online immersive coding program modeled on
classroom-based learning. Classes have fixed signup periods, and if you
miss one you will have to wait for the next. You will be a student in a
class with an instructor who can help you one-on-one if needed.
Link: http://www.hackreactor.com/
Treehouse
The courses at Treehouse are great for the novice programmer. Unlike most
of the other sites listed here, Treehouse is project-oriented and will help
you prepare for most projects you have planned.
Link: https://teamtreehouse.com/
BLOC
BLOC is an online boot camp with programs in web development, mobile
development, and design. BLOC has designed a structured program that is
immersive but can still fit into your busy schedule. They offer structured
course tracks of 40, 30, or 15 hours.
Link: https://www.bloc.io/
Learnable
Learnable offers over 500 video tutorials with unlimited online access to all
of them. They offer help in HTML/CSS, JavaScript, PHP, Ruby,
Design/UX, Mobile OS, and Workflow. When you subscribe to their service
you’ll also gain access to a huge eBook library that you can read online
or download to a tablet device and read anywhere.
Link: https://learnable.com/
Thinkful
Thinkful is another mentor based site that connects you with experts in your
field of study and offers one-on-one help. These mentor sites are great if
you are working on a project and get stuck. They can look at your code and
help you debug the problem.
Link: https://www.thinkful.com/
Below is a list of resources that are aimed towards kids. These resources
understand how younger brains work and have designed their courses to be
engaging and fun for kids to learn how to code.
Code.org
A simple name with extraordinary results. The two Partovi brothers created
this non-profit organization to encourage school students to learn computer
science. This is a very well thought out site that aims to reach minorities
within the programming world.
Link: http://code.org/
CS Unplugged
CS Unplugged takes a different approach to teaching programming. Rather
than having online courses they offer a free PDF book that is a collection of
games, puzzles, cards, and activities that students can use without a
computer to learn computational thinking. It won’t take someone from
beginner to master coder but it will get kids to start thinking in a way that
will set them up for success later in life. You can also purchase a physical
book from their partner website for $20.
Link: http://csunplugged.org/
Programming is becoming essential for everyone in the modern world.
There are so many resources to learn programming or sharpen your skills,
and we hope you find the ones that have been included in this nugget to be
valuable. Learning to program ultimately is like learning a language, and
you must use it every day to stay fluent.
KEY NINJA LESSONS
Get beyond the fear of reading or writing code and start getting
familiar with the concepts. Programming syntax and languages
may change, but the concept of using programming techniques to solve
particular business problems does not change for the most part.
Remember that just as we learn to read so that we can read to learn, so
must you learn to code so that you can code to learn.
The choice is yours to make. Start simple! Pick a language that suits
your needs and practice with it in order to build competence. There’s
no easy way to get around the practicing part.
CONTINUE ADAPTING

MOTIVATION
Looking back 20 years or so, cassette tapes, Walkmans, and floppy disks
were the norm, and every cool kid on the block wanted to own one. But
these once "cool" technologies of the 1980s and 1990s bear almost no
resemblance to what we have today.
In the same way, our jobs and organizations of today probably bear little
resemblance to those of that time. Or, let’s play that forward and look
ahead 20 years from now. For one, we can guarantee that things will have
changed and will not be the same as they are today.
In that scenario, new gadgets will have sprung into existence, companies
will have upgraded their platforms and tools, and the way we do business
or interact with each other will have changed.
Dealing with all of this change can be daunting. Yet being able to do so is
vital to your career as a successful Data Ninja.
HYPE CYCLE FOR EMERGING TECHNOLOGIES
Making predictions about the future is hard, especially when changes
happen almost daily. In the industry, the Gartner Hype Cycle is a leading
source of predictions about which technologies will thrive or flop in the
coming years. Mind you, it's only a prediction, not a declaration.
In the more than 10 years it has been published, Gartner has added a
comprehensive range of hype cycles covering technology applications like
e-commerce, CRM, ERP and Business Intelligence. (Many of their
predictions are only available to subscribers, but Gartner does share some
of the broader hype cycles through its blog and press releases.)
Gartner’s Hype Cycle for Emerging Technologies Maps the Journey to
Digital Business

Gartner’s Hype Cycle (gartner)

In order to come up with their hype cycle, Gartner examined more than
2,000 technologies on their maturity, their business benefits and their future
orientation.
The Hype Cycle that results from this evaluation is especially worth
reviewing because it offers a cross-sector perspective on the technologies
and trends that senior executives, CIOs, innovators, and technology
planners should consider when preparing a strategic technology portfolio.
In the recent hype cycle they released (shown in the figure above) we see
three big predictions stand out – all promising to be the three most
important strategic technology trends in the coming years.

Computing Everywhere
The Internet of Things (IoT)
Big Data

PREDICTIONS ARE THE NEXT FRONTIER IN ANALYTICS


Predictive analytics is one area of analytics that holds a lot of promise
and may well be the next frontier in data analytics. Open source
tools and solutions like Spark, Mahout, and R are a few of the top
contenders in this area.
Proprietary predictive analytics solutions like AzureML, Angoss
Predictive Analytics, RapidMiner, SAS Analytics, IBM Analytics, and SAP
Analytics offer more options companies can choose from.
Why is Predictive Analytics Important?
The ability to predict the future and influence it is a lucrative
opportunity, and companies such as IBM and SAP are great examples of
organizations that have embraced it. IBM uses predictive analytics
software to increase profitability, prevent fraud, and even measure the
social media impact of marketing campaigns. SAP allows customers to act on
big data and offers insights on new opportunities and any hidden risks.
Predictive analytics also extends beyond these two companies to various
industries, some of which are listed below.
(Subramanyan)
WHAT IS PREDICTIVE ANALYTICS ABOUT?
Predictive analytics is a very important capability companies are looking
to have now. With Predictive analytics, organizations can use historical
performance data to extrapolate and make predictions about the future. If
the predictions are right, the company can then take actions to influence the
outcomes in their favor.
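As a minimal sketch of what "using historical performance data to
extrapolate" can look like, the Python example below fits a straight-line
trend to a few years of sales and projects it one year ahead. The sales
figures and the function names (fit_linear_trend, predict) are made up
for illustration; real predictive models are usually far richer than a
single trend line.

```python
# Minimal sketch: fit a least-squares trend line to historical data and
# extrapolate one period ahead. All figures are hypothetical.

def fit_linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


years = [2010, 2011, 2012, 2013, 2014]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical sales, in $ millions

slope, intercept = fit_linear_trend(years, sales)
forecast_2015 = predict(slope, intercept, 2015)
print(f"Projected 2015 sales: {forecast_2015:.1f}")
```

The forecast is only as good as the assumption that the past trend
continues, which is why the text stresses that a prediction must turn out
to be right before acting on it pays off.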
Today, we see companies rushing to get ahead of the curve and incorporate
predictive analytics models as part of their data analytics processes. Gartner
substantiates this point with the claim that by 2016, 70% of the most
profitable companies will manage their business processes using real-time
predictive analytics or extreme collaboration. (Gartner)
PREDICTIVE ANALYTICS IN ACTION
Examples of the use of predictive analytics are legion. Retailers such as
supermarket chains use the concept to analyze current and historical sales
data. Using predictive analytics, they can see and identify patterns in
customer behavior and use these patterns of behavior to predict what
products customers are most likely to buy.
Banks and financial institutions use predictive analytics to forecast the
likelihood of a customer defaulting on loans, and health insurance
companies use predictive analytics methods to screen members and
establish which claims are most likely to be bogus or even fraudulent.
All these examples we see on predictive analytics help to reinforce the
fact that Predictive Analytics is no longer science fiction — it is something
that is actually happening, and more than likely will be coming to a
company near you.
CASE IN POINT
Recently in the news, there was a story of data analytics teams at Target
Corporation being able to figure out, through the infinite wisdom of
predictive analytics, that a shopper at one of their stores was pregnant –
before the father of the shopper even knew it.
How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did
Every time you go shopping, you share intimate details about your
consumption patterns with retailers. And many of those retailers are
studying those details to figure out what you like, what you need, and which
coupons are most likely to make you happy. Target, for example, has
figured out how to data-mine its way into your womb, to figure out whether
you have a baby on the way long before you need to start buying diapers.
How Did They Do It?
[The analyst who implemented the solution] ran test after test, analyzing the
data, and before long some useful patterns emerged. Lotions, for example.
Lots of people buy lotion, but one of Pole’s colleagues noticed that women
on the baby registry were buying larger quantities of unscented lotion
around the beginning of their second trimester. Another analyst noted that
sometime in the first 20 weeks, pregnant women loaded up on supplements
like calcium, magnesium and zinc. Many shoppers purchase soap and
cotton balls, but when someone suddenly starts buying lots of scent-free
soap and extra-big bags of cotton balls, in addition to hand sanitizers and
washcloths, it signals they could be getting close to their delivery date.
Even More of this Target Story
As Pole’s computers crawled through the data, he was able to identify about
25 products that, when analyzed together, allowed him to assign each
shopper a “pregnancy prediction” score. More important, he could also
estimate her due date to within a small window, so Target could send
coupons timed to very specific stages of her pregnancy.
(Forbes)
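To make the idea of a prediction score concrete, here is a minimal
Python sketch of scoring a shopping basket against a short list of signal
products. The products echo the ones named in the excerpt, but the point
values, the threshold, and the function names are invented for
illustration. This is not Target's actual model, which the article does
not describe in detail.

```python
# Minimal sketch of a weighted "prediction score" over a shopping basket.
# Point values are hypothetical and chosen only for illustration.

SIGNAL_POINTS = {
    "unscented lotion": 30,
    "calcium supplement": 20,
    "magnesium supplement": 15,
    "zinc supplement": 15,
    "large bag of cotton balls": 10,
    "scent-free soap": 10,
}


def prediction_score(basket):
    """Sum the points of every signal product found in the basket (0 to 100)."""
    items = set(basket)
    return sum(points for product, points in SIGNAL_POINTS.items()
               if product in items)


def flag_shoppers(baskets, threshold=50):
    """Return the shoppers whose score meets or exceeds the threshold."""
    return [shopper for shopper, basket in baskets.items()
            if prediction_score(basket) >= threshold]


baskets = {
    "shopper_a": ["unscented lotion", "calcium supplement", "milk"],
    "shopper_b": ["milk", "bread", "scent-free soap"],
}

for shopper, basket in baskets.items():
    print(shopper, prediction_score(basket))
print("Flagged:", flag_shoppers(baskets))
```

In practice the weights would be learned from historical data (for
example, purchases by shoppers on a baby registry) rather than set by
hand, but the principle of combining many weak signals into one score is
the same.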
This solution developed by Target certainly underscores the enormous
benefits of Predictive Analytics solutions, and this is not all theory or fun
and games.
Solutions like this produce real results that positively influence the
bottom line of companies.
For example, the Forbes article that carried the Target story goes further
to call out the direct dollar figures that were attributed to the predictive
analytics solution – and let’s just say it was in the B’s (as in billions of
dollars)!
Dollar Value of Target’s Predictive Analytics
Duhigg suggests that Target’s gangbusters revenue growth — $44 billion in
2002, when Pole was hired, to $67 billion in 2010 — is attributable to
Pole’s helping the retail giant corner the baby-on-board market, citing
company president Gregg Steinhafel boasting to investors about the
company’s “heightened focus on items and categories that appeal to
specific guest segments such as mom and baby.”
(Forbes)
From the Target example, we see $23 billion of sales growth from 2002 to
2010 that Duhigg suggests is attributable to the predictive analytics
solutions. That is a lot of money that could potentially turn the fortunes
of many companies around. More importantly, such positive results can make
you, the Data Ninja who made it all happen, the number one superhero of
the company.
And that is why Predictive analytics and the potential value it can deliver
is a trend worth paying attention to going forward.
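The revenue figures quoted from the Forbes excerpt can be checked with a
few lines of Python. The sketch below uses only the two numbers given in
the excerpt ($44 billion in 2002 and $67 billion in 2010); the variable
names are my own. It also works out the compound annual growth rate, a
figure the article does not state.

```python
# Worked arithmetic on the revenue figures quoted in the Forbes excerpt.
revenue_2002 = 44.0  # billions of dollars
revenue_2010 = 67.0  # billions of dollars
years = 2010 - 2002

absolute_growth = revenue_2010 - revenue_2002
percent_growth = absolute_growth / revenue_2002 * 100

# Compound annual growth rate: the steady yearly rate that would turn
# the 2002 figure into the 2010 figure.
cagr = ((revenue_2010 / revenue_2002) ** (1 / years) - 1) * 100

print(f"Absolute growth: ${absolute_growth:.0f} billion")
print(f"Total growth:    {percent_growth:.1f}%")
print(f"Annual growth:   {cagr:.1f}% per year")
```

Keep in mind that this arithmetic only describes how much revenue grew.
It says nothing about how much of that growth was actually caused by
predictive analytics.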
THE INEVITABILITY OF CHANGE IS THE ONLY CERTAINTY.
I cannot predict the future of data analytics, but if history is any
teacher, I can say for sure that things will change and be different from
what we are used to today. Change is inevitable, and comes in two main
flavors: evolutions and revolutions.
EVOLUTIONARY CHANGES
Most of us are accustomed to evolutionary changes. They are slow, take
time, are less noticeable, but still happen. For example, going from SQL
Server 2000 to SQL Server 2008 or SQL Server 2014 was an evolutionary
process. New features were added to the base product, and all SQL Server
professionals had to do was mature and update their skill sets along the
way.
Going from MS Excel 2000 to Excel 2013 was evolutionary, with new
features being added incrementally along the way, including the PowerBI
stack that consists of PowerPivot, PowerView, PowerQuery, PowerMap, and
so on.
The changes as we see in SQL or Excel are evolutionary by nature —
they come often, move us forward, and we can deal with them through
upgrades and by reading a few new books.
Revolutionary Change. (HTB)

REVOLUTIONARY CHANGES
Revolutionary changes, on the other hand, are the ones that come much
less often, and when they do, they fundamentally change the affected
industries and rock them from top to bottom.
Think about inventions such as paper currency, the light bulb, the
automobile, antibiotics, the transistor, the microprocessor and more.
Without these inventions, we probably wouldn’t be where we are today as a
society.
Even in the world of working with data, we see these kinds of changes on
the horizon of how we collect, store and process data.
Today in the mainstream, we think in terms of databases, with tables,
rows and columns when storing data. Tomorrow, that concept of databases
as we know it may change, replaced by entirely new constructs – forcing us
all to eschew what we previously knew and adopt the new ones.
Tools like Graph databases, NoSQL, Hadoop, Spark, and many other
products are all standing on the cutting edge, promising to fundamentally
change the field of data analytics as we know it.
Whether or not these technologies will actually deliver on their promises
and the hype surrounding them, is not a prediction I’m comfortable
making.
But one way or the other, things will change, including the tools and
techniques we currently use for data analysis.
SUMMARY
Change is constantly around us. Sometimes it is so minute and consistent
that we do not notice it, while other times it is so severe and sudden that it
bowls us over.
As a saying often attributed to Charles Darwin, author of On the Origin
of Species, puts it:
It is not the strongest of the species that survives, nor the most intelligent
that survives. It is the one that is most adaptable to change.
--attributed to Charles Darwin
The field of data analytics is constantly changing and no one (including
me) can make a prediction of what tools or processes will be used 5, 10 or
20 years from now. At the very most, all we can do is speculate.
But the inevitability of change should not paralyze us or prevent us from
being effective Data Ninjas. What it should do is make us prepared and
ready to adapt to whatever new technologies, tools or practices the future
throws at us.
As a Data Ninja, if you can learn how to stay relatively unaffected by
change, handle new technological developments with confidence, and adapt
to any curve balls that come your way, then you will stand the test of time
and have an easier time growing your career in the profession.
Ninja Tip 12.
In his classic work The Art of War, author and military strategist Sun Tzu
wrote about the importance of observing signs of the enemy.
In it, he wrote that movement among the trees of a forest indicates an
advancing enemy, and that dust rising in a high column indicates the
approach of chariots.
In the same vein, it’s important that you find and pay attention to such
vital signs in your career. Doing so will help you know when, and when
not, to make critical career decisions.
CHANGE AND ADAPTATION - NINJA RESOURCES
The tech world is a fast-paced, ever-changing, living organism. New
advancements can make a whole field obsolete overnight. Startup
companies are popping up everywhere, offering a vast array of products,
services, and technologies. It is crucial to stay current and on top of how
things are changing, to keep yourself competitive and relevant.
How Technology Is Transforming Our Brains
Published, 2013
http://www.digitaltonto.com/2013/how-technology-is-transforming-our-brains/
Top 5 Reasons why software professionals need social skills, too
Published, 2011
http://www.computertrainingschools.com/articles/importance-of-social-skills-for-tech-professionals.html
10 highly valued soft skills for IT pros
Published, 2013
http://www.techrepublic.com/blog/10-things/10-highly-valued-soft-skills-for-it-pros/
7 Simple Ways to Stay Current on Technology
Published, 2012
http://allthingsadmin.com/administrative-professionals/stay-current-technology/
8 Ways to Advance Your Career by Staying Relevant
Published, 2012
http://www.cio.com/article/2448966/careers-staffing/8-ways-to-advance-your-it-career-by-staying-relevant.html
6 Ways to stay Current in Your Field and Advance
Published, 2010
http://www.personalbrandingblog.com/6-ways-to-stay-current-in-your-field-and-advance/
As you can see, some of these articles are a few years old, which in the
tech world could mean they are obsolete. But I have made an effort to
select articles that remain relevant.
To stay informed and be able to evolve in the field of data analytics, you
need to keep reading and learning new concepts and techniques. Below is a
list of my favorite Tech blogs and websites that can be a valuable source of
news and updates for you to learn and stay informed on technologies,
especially relating to general trends within the industry. I have tried to only
include sites that have a stable revenue and user base to ensure that they
will be around for years to come.
ZDNet
ZDNet was founded in 1991 and acquired by CNET in 2000. ZDNet
publishes product reviews, software downloads, news, analysis, and
guides.
Link: http://www.zdnet.com/
GIGAOM
Gigaom was created in 2006 by Om Malik. They devote all their efforts to
finding the newest and best in tech, with news and analysis on web 2.0,
startups, gaming, social media, and everything else tech. With over 6.5
million unique visitors every month, Gigaom is trying to humanize
technology and make it approachable for everyone.
Link: https://gigaom.com/
Mashable
Mashable reports on the importance of digital innovation. With over 42
million unique visitors monthly, Mashable is truly a powerhouse.
They report on everything tech, from social media, entertainment, news,
and startups to anything else techies are talking about.
Link: http://mashable.com/
Wired
Wired is a full-color monthly magazine based in the United States. They
report on emerging technologies, economics, and politics. Their magazine is
full of interesting, thought-provoking articles that will inspire and amaze.
Their website offers free articles and news. They cover just about
everything a tech-savvy person could care about. Subscribe to their
magazine and you’ll learn something new every time you pick it up.
Link: http://www.wired.com/
TechCrunch
TechCrunch is one of my favorite tech sites to visit on a daily basis. Like
Wired above, they cover almost everything you’ll need to stay current in the
world. They do not limit themselves to tech; they delve into politics and
worldwide news. Founded in 2005, they have grown immensely.
Link: http://techcrunch.com/
DataTau
This is the most technical and data-oriented of all the resources listed in
this nugget. DataTau is like Hacker News for data science. The simple
interface is little more than a list of articles for big data practitioners
and data scientists, but the quality of content that can be found there is
great.
Link: http://www.datatau.com
This list only scratches the surface of tech-related blogs and sites. If
you visit any one from the list above, you will discover more blogs and
affiliates to expand your knowledge. Keep searching and exploring for new
technologies and new skills while adding to the wealth of skills you’ve
gained from reading this book.
KEY NINJA LESSONS

Change requires flexibility. The better able you are to adapt to change,
the greater your chances of being successful.
“Enjoying success requires the ability to adapt. Only by being open to
change will you have a true opportunity to get the most from your
talent.” --Nolan Ryan.
Stay curious and adapt. Change is the only thing that will remain
constant.

CONCLUSION
As we have seen throughout, data volumes continue to grow at mind-blowing
rates, the demand for data-crunching professionals is through the roof,
the pay for quality talent is astonishingly lucrative, and the entry
barriers for beginners are remarkably low. So, what are you waiting for?
The choice is yours: get in and get started.
Many companies increasingly depend on their Data Analysts to crunch
numbers, but they depend even more on Data Ninja-type professionals like
you, who can go beyond the basic aspects of crunching numbers and
understand the subtle nuances of the trade.
These are Ninjas who can see the big picture and perform analysis or
make predictions in ways that positively affect the bottom line of their
companies.
Stuff to Blow Your Mind
To conclude, I will leave you with a few real-world stories of
high-performing companies that are making the best of the data boom.
The excerpts presented below describe companies that are leveraging
the tremendous power of data analytics and positively affecting their
bottom lines in the process.
The point of doing this is to have you picture yourself as the Data Ninja
in charge (or one of the Data Ninjas in charge) who helped make that
happen. Then, in that situation, consider what that would mean for your
career, your ambitions, your goals and, above all, your pocketbook or
take-home pay.
IBM
IBM’s work has revealed genetic traits of cancer survivors, tracked the
source of an E. coli outbreak. It recently created a visualization to help the
influential Washington, D.C.–based think tank, Institute for the Study of
War, map terrorist behavior in and around Baghdad during a campaign to
free imprisoned Al Qaeda members.
THE WEATHER COMPANY
By analyzing the behavior patterns of its digital and mobile users in 3
million locations worldwide—along with the unique climate data in each
locale—the Weather Company has become an advertising powerhouse,
letting shampoo brands, for example, target users in a humid climate with a
new antifrizz product. It’s no surprise that more than half of the Weather
Company’s ad revenue is now generated from its digital operations.
EVOLV
Evolv’s data scientists have uncovered: People with two social media
accounts perform much higher than those with more or less, and in many
careers, such as call-center work, employees with criminal backgrounds
perform better than those with squeaky-clean records. Evolv’s sales grew a
whopping 150% from Q3 2012 to Q3 2013.
GE
Over the past year, General Electric has taken the lead in tying together
what Chairman Jeff Immelt calls "the physical and analytical worlds."
Translation: GE's many machines—everything from power plants to
locomotives to hospital equipment—now pump out data about how they're
operating. GE's analytics team crunches it, then rejiggers machines to be
more efficient. Even tiny improvements are substantial, given the scale: By
GE's estimates, data can boost productivity in the U.S. by 1.5%, which over
a 20-year period could save enough cash to raise average national incomes
by as much as 30%.
(FastCompany)
Acronyms

ACID
Atomicity, Consistency, Isolation and Durability
BI
Business Intelligence
CDA
Confirmatory Data Analysis
CRM
Customer Relationship Management
CSV
Comma-separated Values
DBA
Database Administrator
DCL
Data Control Language
DDL
Data Definition Language
DIKW
Data Information Knowledge Wisdom
DML
Data Manipulation Language
DW
Data Warehouse
EDA
Exploratory Data Analysis
ERM
Entity Relationship Models
ETL
Extract Transform Load
HDFS
Hadoop Distributed File System
KPI
Key Performance Indicator
LOB
Line of Business
NoSQL
Not Only SQL
ODS
Operational Data Store
OLAP
Online Analytical Processing
OLTP
Online Transaction Processing
RDBMS
Relational Database Management System
SAS
Statistical Analysis System
SQL
Structured Query Language
TCL
Transaction Control Language
XML
Extensible Markup Language
Glossary of Definitions

Data Ninja
A Data Ninja is an entry-level, unspecialized, entrepreneurial individual who
works within a structured environment (usually within a company or team),
employing a variety of tools and performing a variety of tasks related to
collecting, organizing, and interpreting data to gain useful information.
Data Analysis
Data analysis is the process of finding the right data to answer your
question, understanding the processes underlying the data, discovering the
important patterns in the data, and then communicating your results to have
the biggest possible impact.
Big Data
Big data is an evolving term that describes any voluminous amount of
structured, semi-structured and unstructured data that has the potential to be
mined for information. Although big data doesn't refer to any specific
quantity, the term is often used when speaking about petabytes and exabytes
of data.
DIKW Pyramid
The DIKW Pyramid, also known variously as the "DIKW Hierarchy", is a
model used for representing structure and functional relationships between
data, information, knowledge, and wisdom. In the Model, information is
defined in terms of data, knowledge in terms of information, and wisdom in
terms of knowledge.
Database
A database (abbreviated DB) is a collection of information that is organized
so that it can easily be accessed, managed, and updated. A database
basically helps to organize a collection of information in such a way that a
computer program can quickly select desired pieces of data. Many datasets
in companies are stored and manipulated in databases.
Data Warehousing
In computing, a data warehouse (DW or DWH), also known as an
enterprise data warehouse (EDW), is a system used for reporting and data
analysis. DWs are central repositories of integrated data from one or more
disparate sources.
Dimensional Modelling
Dimensional modeling (DM) names a set of techniques and concepts used
in data warehouse design. Many dimensional models typically consist of fact
tables and lookup tables.
Predictive Analytics
Predictive analytics is the practice of extracting information from existing
data sets in order to determine patterns and predict future possibilities and
trends. It doesn't dictate the future, but it forecasts what might happen in the
future with an acceptable level of reliability, and includes what-if scenarios
and risk assessment.
Programming
Programming is the process of taking an algorithm and encoding it into a
notation, a programming language, so that it can be executed by a computer.
Programming involves activities such as analysis, developing
understanding, generating algorithms, verification of requirements of
algorithms including their correctness and resources consumption, and
implementation (commonly referred to as coding) of algorithms in a target
programming language. Programming can be done in many different
programming languages (such as C, FORTRAN, JavaScript, Lisp, Python,
Ruby, Smalltalk, etc.)
Relevant Quotes Glossary

* A point of view can be a dangerous luxury when substituted for insight
and understanding. Marshall McLuhan, Canadian Communications
Professor
* In God we trust, all others must bring data. W. Edwards Deming
* I never guess. It is a capital mistake to theorize before one has data.
Insensibly one begins to twist facts to suit theories, instead of theories to
suit facts. Sir Arthur Conan Doyle, Author of Sherlock Holmes stories
* He uses statistics as a drunken man uses lamp posts – for support rather
than for illumination. Andrew Lang, Scottish Writer
* Facts do not cease to exist because they are ignored. Aldous Huxley
* If you torture the data long enough, it will confess. Ronald Coase,
Economist (a reminder to be careful in your analysis and not to stretch to
get the results you’d like)
* Errors using inadequate data are much less than those using no data at all.
Charles Babbage
* Once we know something, we find it hard to imagine what it was like not
to know it. Chip & Dan Heath, Authors of Made to Stick, Switch
* If the Statistics are boring, you've got the wrong numbers. Edward Tufte
* Data are just summaries of thousands of stories – tell a few of those
stories to help make the data meaningful. Chip & Dan Heath, Authors of
Made to Stick, Switch
* Numbers have an important story to tell. They rely on you to give them a
clear and convincing voice. Stephen Few
* The goal is to turn data into information, and information into insight.
Carly Fiorina, Former CEO of HP
* There is a magic in graphs. The profile of a curve reveals in a flash a whole
situation — the life history of an epidemic, a panic, or an era of prosperity.
The curve informs the mind, awakens the imagination, convinces. Henry D.
Hubbard, 1939
* People take good care of data that is important to them. Data that is loved
tends to survive. Kurt Bollacker, Data Scientist, Freebase
* If we have data, let’s look at data. If all we have are opinions, let’s go
with mine. Jim Barksdale, former Netscape CEO
BIBLIOGRAPHY

(Information Governance Initiative), IGI. "http://iginitiative.com/biggest-
risk-big-data-inability-extract-value/." 20 March 2015. "IS THE BIGGEST
RISK OF BIG DATA THE INABILITY TO EXTRACT VALUE?". Article.
25 April 2015.
Bain. "http://www.bain.com/publications/articles/the-value-of-big-
data.aspx." n.d. "The Value of Big Data: How Analytics Differentiates
Winners.". Article. 24 March 2015.
Big Data Salary, "Inside Big Data Salary". https://datajobs.com/big-data-
salary. n.d. Table. 25 March 2015.
Brown, Tim. "http://insights.som.yale.edu/insights/how-do-you-build-
culture-innovation." n.d. "How Do You Build a Culture of Innovation? -
Yale Insights". Article. 25 April 2015.
Carroll, Jim. "Chapter 7 - Creating an Innovation Culture." Carroll, Jim.
"What I Learned from Frogs in Texas: Saving Your Skin with Forward
Thinking Innovation.". Mississauga, Ont.: Oblio, 2004. 81. Print.
CDN. "http://cdn.images.express.co.uk/img/dynamic/78/590x/nelson-
mandela-interprete-448108.jpg." 10 December 2013. Figure 1: A sign
language interpreter during a memorial service at FNB Stadium in honor of
Nelson Mandela in Soweto, near Johannesburg. Image. 20 April 2015.
coca-cola. "http://www.coca-colacompany.com/stories/program-or-perish-
why-everyone-should-learn-to-code." n.d. "Program or Perish: Why
Everyone Should Learn to Code.". Report. 11 April 2015.
Cprogramming.com. "http://www.cprogramming.com/whyprogram.html."
n.d. "Why Learn to Program?" . Article. 11 April 2015.
Datanami. "http://www.datanami.com/2014/10/02/top-three-things-excel/."
14 October 2014. "Top Three Things Not To Do in Excel.". 11 April 2015.
Data-Warehouses. "http://data-
warehouses.net/glossary/dimensionalmodel.html." n.d. "Dimensional
Model.". Diagram. 25 March 2015.
Dice. Dice.com. 20 April 2015. Search. 20 April 2015.
DNews. "http://news.discovery.com/tech/apps/youngest-video-game-
programmer-130215.htm." n.d. "Meet the Youngest Video Game
Programmer : DNews.". Article. 11 April 2015.
Doyle, Martin. "What Is the Difference Between Data and Information?".
n.d. Business 2 Community. Article. 25 March 2015.
EarSketch. "http://earsketch.gatech.edu/category/learning/anatomy-of-an-
earsketch-project/what-is-programming." n.d. "EarSketch." . quote. 11 April
2015.
Eridon, Corey. "http://blog.hubspot.com/marketing/problem-with-
predictive-analytics." 11 June 2014. "The Problem With Predictive
Analytics.". Article. 20 April 2015.
Eskimo. "https://www.eskimo.com/~scs/cclass/progintro/sx1.html." n.d.
"Skills Needed in Programming.". Document. 11 April 2015.
Excel, Microsoft. Pivot Functions within MS Excel. 2015. Screenshot.
FastCompany. "http://www.fastcompany.com/most-innovative-
companies/2014/industry/big-data." 10 February 2014. "The World's Top 10
Most Innovative Companies in Big Data.". Article. 24 April 2015.
Forbes. "http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-
figured-out-a-teen-girl-was-pregnant-before-her-father-did/." n.d. "How
Target Figured Out A Teen Girl Was Pregnant Before Her Father Did.".
Article. 19 April 2015.
Franks, Bill. "http://iianalytics.com/research/analytics-gone-wrong-dire-
consequences-for-kids." 9 November 2011. "Analytics Gone Wrong: Dire
Consequences for Kids". Article. 24 April 2015.
Gartner. "Gartner Reveals Top Predictions for IT Organizations and Users
for 2013 and Beyond". 2013. Press Release. 25 March 2015.
—. "Gartner Says by 2016, 70 Percent of the Most Profitable Companies
Will Manage Their Business Processes Using Real-Time Predictive
Analytics or Extreme Collaboration." Analysis. 2015. Report.
gartner.
http://na2.www.gartner.com/imagesrv/newsroom/images/HC_ET_2014.jpg;
pv4cc7877f7de80268. 2014. chart. 2015.
GeekWire. "http://www.geekwire.com/2014/analysis-examining-computer-
science-education-explosion/." n.d. "Analysis: The Exploding Demand for
Computer Science Education, and Why America Needs to Keep up". Chart.
11 April 2015.
Gigaom. " https://gigaom.com/2012/10/14/why-becoming-a-data-scientist-
might-be-easier-than-you-think/." n.d. "Why Becoming a Data Scientist
Might Be Easier than You Think.". Quote. 11 April 2015.
—. "https://gigaom.com/2015/01/27/microsoft-throws-down-the-gauntlet-
in-business-intelligence/." 2015. "Microsoft Throws down the Gauntlet in
Business Intelligence.". 11 April 2015.
GlassDoor. www.glassdoor.com. 20 April 2015. Search. 20 April 2015.
HowTo. "http://howtostartprogramming.com/getting-started/." n.d. "How
To Start Programming.". Document. 11 April 2015.
HTB. http://www.hangthebankers.com/wp-content/uploads/2012/10/Stages-
of-change.jpg. 2015.
IBM. http://www-03.ibm.com/press/us/en/pressrelease/27357.wss. 25
March 2015. Article. 20 April 2015.
Illinois.
"http://ori.hhs.gov/education/products/n_illinois_u/datamanagement/datopic
.html." n.d. "Data Analysis." Responsible Conduct of Research (RCR).
Government Resource. 25 March 2015.
InformationBuilders. "http://www.informationbuilders.com/data-
warehousing." n.d. "Data Warehousing (Data Warehouse) Solutions |
Information Builders.". Article. 11 April 2015.
Jain, Piyanka. ""5 Steps To Transition Your Career To Analytics: Step 1 -
Identify Your Ideal Job."." Forbes Magazine (2015, Jan 5). Article.
kdnuggets. "http://www.kdnuggets.com/2012/08/poll-analytics-data-
mining-programming-languages.html." n.d. "Poll Results: Top Languages
for Analytics/data Mining Programming.". Poll. 11 April 2015.
Kearney, A. T. "Big Data and the Creative Destruction of Today's Business
Models.". n.d. Table. 26 March 2015.
Kimball. "http://www.kimballgroup.com/1997/08/a-dimensional-modeling-
manifesto/." 2 August 1997. "A Dimensional Modeling Manifesto -
Kimball Group.". Article. 11 April 2015.
Kristal, Murat. ""Mining Mountains of Data is Key for Canadian
Businesses"." 12 September 2012. The Globe and Mail. Article. 25 March
2015. <http://www.theglobeandmail.com/report-on-
business/economy/canada-competes/mining-mountains-of-data-is-key-for-
canadian-businesses/article4540604/>.
Learn. "http://www.learn.geekinterview.com/database/sql/sql-
standardization.html." 2015. "SQL Standardization | Online Learning.".
Website. 20 April 2015.
Leek, Jeff. PhD. "https://www.coursera.org/course/dataanalysis." 2013.
Johns Hopkins Bloomberg School of Public Health: Data Analysis.
Coursera Course. Webpage. 25 March 2015.
Lifehacker. "http://lifehacker.com/5401954/programmer-101-teach-
yourself-how-to-code." n.d. "Programmer 101: Teach Yourself How to
Code.". Instructions. 11 April 2015.
Longlivetheux. DIKW Pyramid, Wikipedia. 5 January 2015. DIKW
Pyramid. 25 March 2015.
Marr, Bernard. "Big Data: Using Smart Big Data, Analytics and Metrics to
Make Better Decisions and Improve Performance." Marr, Bernard. Big
Data: Using Smart Big Data, Analytics and Metrics to Make Better
Decisions and Improve Performance. n.d., 2015. Print.
McGee, Marianne Kolbasuk. "http://www.databreachtoday.com/prison-
time-for-health-data-theft-a-5442." 23 January 2013. "Prison Time for
Health Data Theft." Data Breach Today. Article. 25 March 2015.
McKinsey. "Big data: The Next Frontier for Innovation, Competition, and
Productivity." Business Technology. 2011. Report.
ModelViewCulture. "https://modelviewculture.com/pieces/manufacturing-
the-talent-shortage." n.d. "Manufacturing the Talent Shortage.". Document.
11 April 2015.
nde. "Data Warehouse ETL architecture." n.d. Chart.
NetworkWorld. "http://www.networkworld.com/article/2226514/tech-
debates/what-s-better-for-your-big-data-application--sql-or-nosql-.html."
n.d. "What's Better for Your Big Data Application, SQL or NoSQL?". 11
April 2015.
NewYorker. "http://www.newyorker.com/tech/elements/do-we-really-need-
to-learn-to-code." n.d. "Do We Really Need to Learn to Code?" . quote. 11
April 2015.
NGGS. Nextgen Global Solutions. 2015. Chart. 25 March 2015.
Norvig. http://norvig.com/21-days.html. 2015. 2015.
Nuggets, K. D. "http://www.kdnuggets.com/polls/2011/tools-analytics-data-
mining.html." May 2011. "Data Mining/Analytic Tools Used.". Poll. 11
April 2015.
Office, MS. "https://support.office.com/en-us/article/Use-the-Analysis-
ToolPak-to-perform-complex-data-analysis-f77cbd44-fdce-4c4e-872b-
898f4c90c007." n.d. "Use the Analysis ToolPak to Perform Complex Data
Analysis.". Support. 11 April 2015.
Oracle. "Big Data and the Creative Destruction of Today's Business
Models". 2012. Chart. 25 March 2015.
Robert Half Technology, 2015 Salary Guide. www.RHT.com. 2015. Table.
25 March 2015.
Scott and Scott, technology Attorneys.
"http://www.scottandscottllp.com/main/business_impact_of_data_breach.as
px." n.d. "The Business Impact of Data Breach.". Article. 26 March 2015.
Serra, James. http://www.jamesserra.com/wp-
content/uploads/2013/05/DataWarehouseWithMDMDQS2.jpg. 2013. 2015.
Somerville, Richard. Interview. American climate scientist. 2011. Quote.
Stadd, Allison. "“Data Analysts: What you’ll make and where you’ll make
it.” ." 26 November 2014. http://blog.udacity.com/2014/11/data-analysts-
what-youll-make.html. Web Article. 25 March 2015.
study.com. "http://study.com/become_a_data_analyst.html." 2015. "Become
a Data Analyst: Education and Career Roadmap.". Chart. 26 March 2015.
Subramanyan, Vignesh. "http://www.business2community.com/business-
intelligence/predictive-analytics-important-0610132." 9 September 2013.
"Why Is Predictive Analytics Important?" Business 2 Community. Article.
15 April 2015.
TechnologyCrowds. www.TechnologyCrowds.com. n.d. Chart. 26 March
2015.
U.S. Census Bureau, Income and Poverty in the United States. U.S.
Department of Commerce Economics and Statistics Administration. 2013.
Census. 27 March 2015.
WFM.
"https%3A%2F%2Fcareer4.successfactors.com%2Fcareer%3Fcareer_ns%3
Djob_listing%26company%3DEA%26navBarLevel%3DJOB_SEARCH%2
6rcm_site_locale%3Den_US%26career_job_req_id%3D45383%26jobPipel
ine%3DIndeed." n.d. "Career Opportunities: Data Analyst". 24 March 2015.
Workable. ""Data Analyst Job Description. Ready to Post and Easy to
Customize."." n.d. http://resources.workable.com/data-analyst-job-
description. Job Description Resources. 26 March 2015.
Yale. "http%3A%2F%2Fits.yale.edu%2Fsecure-computing%2Fprotecting-
yales-data%2Fsecure-removal-data-or-disposal-computing-devices." n.d.
"Secure Removal of Data or Disposal of Computing Devices.". Article. 24
April 2015.
www.nextgenglb.com
About the Author

Fru Nde is a Data Professional who is passionate about the ROI that
companies can realize by using their data assets effectively.
As a practicing Data Ninja, Fru uses his Ninja Skills to help companies
make sense of data, whether that means moving, storing, or analyzing it.
He is battle-tested, with extensive experience both consulting for and
working full-time with corporations across the globe, including several
Fortune 500 companies in many different core industries: Retail, Banking,
Health Services, and Food & Hospitality.
Fru is also the founder of NextGen Global (NGG), LLC, a boutique
solutions firm that conducts research and provides consulting and advisory
services to organizations large and small.
NGG’s mandate and number one value is to provide solid solutions,
techniques and services that enable companies to utilize all of their
available assets (people, process and tools), to run, grow and transform their
businesses.
