A Game Plan For Success in Data Analytics
1. MASTER EXCEL
2. CONQUER SQL
3. TAME DATA WAREHOUSING
4. PICK UP CODING
5. CONTINUE ADAPTING
In Africa, there is a proverb that says, “It takes a village to raise a child.”
I have certainly been raised in good hands by a village: by my family,
friends and anyone whose path has crossed mine.
I wish to dedicate this book to everyone who is dear to me.
Even more so, I wish to dedicate this book to my mom, dad, and
siblings who have nurtured and stood steadfastly beside me every step of
the way.
Thank you for your love, support and continued guidance.
— Fru N.
A GAME PLAN FOR SUCCESS IN DATA ANALYTICS
FRU NDE
www.nextgenglb.com
NEXTGEN GLOBAL
INTRODUCTION
The world today is awash with data. Companies have invested, and continue
to invest, a great amount of resources in collecting and storing vast amounts
of data. With this data sitting on companies' data farms, there is now a great
need to employ Data Professionals (a.k.a. Data Ninjas) who can come in and
make sense of this data.
MOTIVATION
I recently had the privilege of being invited along with some of my peers to
speak with a group of students pursuing a Master’s degree in Business
Analytics and Big Data at the University of Minnesota Carlson School of
Management.
The discussion was about Analytics and the options available to students in the
Master’s program looking to make a career out of working with data. The
discussions we had in the panel session were very lively and engaging, and
the enthusiasm the students expressed reinforced my appreciation of how
vibrant the field of Data Analytics is and how important it is for those who
want to get into the field to have a game plan for success.
Further discussions with peers about what it takes to be a data
professional, what skills are required to start, and what steps beginners
can take to get into the field sparked the thoughts that eventually led to
the writing of this book.
EXPECTATION
This book, as a whole, is NOT intended to be a technical book. Even
though some technical concepts will be discussed, you will not, for
example, learn how to use Excel by reading this book. Instead, resources
will be provided at the end of each chapter to help you follow up and do a
more in-depth study of the proposed topics.
Within this book, I will make the case for why you should learn Excel.
The same approach applies to other concepts such as SQL, Data
Warehousing, ETL and Programming. The list of resources provided at the
end of each chapter is intended for you to follow up on and study the
concepts in more depth on your own.
WHO IS THIS BOOK FOR?
This book was created as an overview of some of the most common and
basic tools used in Data Analytics today. As such, it is geared toward new
users who are looking for high-level summaries of the concepts and,
especially, users who are looking for resources and pointers to get
started. For more advanced readers, the resource links included at the
end of each chapter can still be valuable as a point of
reference or as an avenue to find new resources that can help polish
specific skills.
We hope the content of this book, along with the references and resource
links provided, will help and encourage newcomers to get a better
understanding of some of the skills required for Data Analytics. We also
hope that the material will guide and encourage you to take that next step in
your career, whether by securing a new job, making a career change, getting
more skilled or going after that promotion you have been
yearning for in the field of Data Analytics.
WHAT DO I BRING TO THE TABLE?
With a solid background in helping companies transform their data
assets into information, I have been fortunate over my career to work with
several small, midsized and Fortune 500 companies.
In such engagements, I have leveraged tools as simple as MS Excel
spreadsheets on the one end, to more advanced tools like SQL and
Programming on the other end in order to build complex data integration,
data warehousing and data analytics solutions for businesses. In all these
experiences, I have been witness to both the good and bad sides of working
with data. But the common and most promising lesson I have learned is
that the processes and skills needed to get in the game are not that
complicated.
With a bit of effort and dedication, anybody should be able to get in and
excel in the field of data analytics. My hope in this guide is to leverage the
experience I have gained working in the trenches analyzing data to inform
readers and provide a simple 5-point game plan for success.
WHAT DOES THIS BOOK OFFER?
This book offers a survey of the Data Analytics landscape (for beginners)
and presents a slice of the tools and the skillsets that can help individuals
transform data into information. This includes discussing concepts
and tools like MS Excel, SQL, Data Warehousing, Programming and
Change.
Along the way, you will be provided with empowering resources (at the
end of each chapter) that can help you gain the skills necessary to take on
the role of analyzing data confidently.
What this book is NOT about
Being a good data professional is about using technology to solve
business problems. As you start and mature in the space of data analytics,
you may find yourself talking to individuals or working for companies who
try to push you into learning technology X because they think it is superior
to Y, or vice versa. Or they may say technology A is dead in favor of
technology B.
Some might even say, “Why Excel?” or “Why SQL? My tool is better.
Choose mine!”
Of course, the point of this book is not to sell you on any individual
technology or vendor, but to expose you to the fundamental industry trends
and concepts as far as data analytics is concerned. The specific technology
tool you end up using really isn’t the focus.
A good analogy I have seen to describe this is the good old fishing
parable with a slight twist.
In this book, I will teach you about fishing, why you should learn how to
fish for yourself, and provide resources for places where you can go learn
specific fishing techniques. But I will not teach you how to use a particular
fishing rod. That is up to you to learn, using books, links, videos and all the
other resources that have been provided here or that are available online, at the
bookstore, or in libraries.
INTRODUCING THE DATA NINJA
Data Ninja is the term that will informally be used throughout this book
to refer to any individual (or group of individuals) who aspires to work
intimately with data and make a career in the analytics field, whether as a
Data Analyst, Report Writer, Business Intelligence professional, Data Architect, Data
Engineer, Data Scientist, etc.
The Data Ninja Defined
Because the possibilities for defining and classifying Data Professionals can
be very broad, and the required skillsets for each job role can vary widely,
in this book, the term Data Ninja refers to:
An entry-level, unspecialized individual who works within a structured
environment (usually within a company or team), employs a variety of
tools and performs a variety of tasks related to collecting, organizing, and
interpreting data to gain useful information.
Ninja Tip 1.
As a data ninja, you would be expected to play the role of the multi-skilled,
multi-talented individual who helps companies quickly and efficiently
transform their data assets into information.
The Ideal Data Ninja Candidate
The definition of the Data Ninja provided above is extremely important
because it helps guide the scope and breadth of content covered in the
ensuing chapters of this book.
Because there are many different levels of expertise required of data
professionals, ranging from non-technical people in the business world
to PhD-wielding math and computer science geniuses, it makes sense to have a
clear definition of the Data Ninja to narrow the scope and content
of this book.
It is understandable that some readers, though Data Professionals
themselves, may be too advanced for the concepts presented herein. But we
aim to stick to the basics and target readers who match some or all
of the following descriptions:
Are non-technical, but not scared of a few technical details here and there.
Have little or no experience with data.
Are excited about the prospects of working with data.
Are just getting into the data analytics field.
Are looking to land their first job or make a career change.
Are looking for opportunities to progress in their current careers.
Are curious and just looking to expand their fundamental knowledge
about the concepts.
If any or all of the above criteria fit you, then this book is
for you, and you are the ideal Data Ninja this book is targeted towards.
DATA NINJAS AS INTERPRETERS OF DATA
The goal of data analysis undertaken by Data Ninjas is usually to turn
some form of raw data into meaningful information. In this regard, we see
that Data Ninjas are like the data interpreters of the company, and it is critical
that they get the analysis done right.
A company that employs Data Professionals who do not perform their
jobs well would be akin to someone having the wrong interpreter on their side,
a point that was devastatingly exemplified by the sign language interpreter who
interpreted the speeches of dignitaries at Nelson Mandela’s memorial service
in South Africa back in 2013.
Some have accused the interpreter of being a fraud, describing
him as "waving his hands around but there was no meaning" and
describing the situation as "childish hand gestures and clapping, it was as if
he had never learnt a word of sign language in his life".
I do not speak sign language, but if I did, I would be seriously offended
by someone who stood up and pretended, or was blatantly incapable of
performing the necessary interpretive services. The biggest losers in a situation
like that are the honest listeners in the crowd who depend on such
individuals to understand what the speaker is saying.
Also, a similar search on the career site Dice.com returned very
impressive numbers for data analytics-related job openings: more than
40,000 jobs were listed in a search of its database.
Translation
These numbers all seem to indicate a very strong outlook in the jobs market
for data professionals. The numbers also underscore the point that data is
the way of the future and that there is great need for skilled professionals
who aspire to make a career out of working with data.
WHAT INDUSTRIES DO DATA NINJA’S WORK IN?
Aspiring data professionals often want to know which industries or
verticals Data Ninjas work in. The simple answer is: all industries
and verticals.
Data is everywhere, and today we see Data Ninjas working part-time,
full-time and on a contract basis in a wide variety of industries. These
range from non-profit organizations, government and education, to
healthcare, retail, high-tech, finance, e-commerce, and consumer products.
See the list below.
List of companies and industries that hire Data Ninjas
Construction companies
Utility companies
Oil, gas and mining companies
Hospitals and healthcare organizations
Colleges and universities
Federal, provincial/state and municipal government departments
Transportation companies
Telecommunications companies
Insurance, finance and banking organizations
Management consulting companies
Manufacturing companies
Translation
Again, these numbers look very lucrative, especially given that a quick
analysis of the salaries reveals an average net pay increase of more than 5%
for 2015 compared to 2014, and the average salary figures are substantially
higher than the $51,939 median household income we saw reported by the U.S.
Census Bureau earlier for 2013.
What’s in It for You?
Someone looking at these pay numbers might rightly ask: What does this
all mean for my pocketbook? Or better still, what’s in it for me? These
are good questions to ask, and the answer to them is remarkably
simple.
With data growing exponentially, demand for data professionals very high,
the supply of qualified candidates short, and pay increasing year after year,
it is very clear that aspiring or practicing Data Ninjas all across the
country have a favorable career outlook in terms of wages and job prospects
for years to come.
And whether your goal is to keep the paychecks coming at a steady rate
or you are just looking to advance your career, the field of data analytics is
proving to be the place where you can confidently make that happen.
THE EXPONENTIAL GROWTH OF DATA
It would be hard to talk about anything related to Data Analytics without
acknowledging the exponential growth in data we are experiencing. The
world today is awash with data, and this data touches nearly all aspects of
our lives.
The far-reaching tentacles of data today extend to how we live our
social lives online, how we communicate with loved ones or business
partners, how we bank our money and how we get health care.
Companies of all forms, shapes and sizes are experiencing the effects of
the astounding rate at which data volumes are growing. Projections of
about 40% growth a year into the next decade are not uncommon.
The chart presented above is what most people usually reference when
they talk about the rapid rate of growth of data today. The projected data
volumes by 2020 are truly mind-blowing, especially when compared
to where we are today.
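Compound growth is easy to underestimate. As a minimal sketch, assuming a constant 40% annual growth rate (the projection quoted above) and an arbitrary starting volume of 1, the arithmetic below shows how quickly volumes multiply and how often they double. The function names are illustrative, not taken from any library.

```python
import math


def projected_volume(current_volume: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of constant compound growth (0.40 means 40% a year)."""
    return current_volume * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    growth = 0.40  # the "about 40% a year" projection from the text
    for years in (1, 5, 10):
        factor = projected_volume(1.0, growth, years)
        print(f"Year {years:2d}: {factor:6.2f}x the starting volume")
    print(f"Doubling time: {doubling_time(growth):.2f} years")
```

At a steady 40% a year, data volumes roughly double every two years and grow almost 29-fold over a decade, which is why the projections look so dramatic.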
Illustration
To put this into perspective, if the data volumes of today were the size of
two-door sedans, the data volumes of the future, 2020 and beyond, would
reach the size of aircraft carriers or more. The difference is truly
astonishing.
This point of comparison can be further substantiated if we look at some
dazzling facts about the data growth rates currently being experienced.
Dazzling facts about the current growth of data
Translation
Numbers don’t lie. These statistics on data growth rates are truly astounding,
and with more than 2 quintillion bytes of new data being generated every
single day, we now see companies, corporate managers and executives
scrambling to hire individuals who understand data and can work with it to
derive competitive value. For the data ninja, all this potentially translates
into a high-demand position in the job market, long-term
job security and a strong salary.
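To make "2 quintillion bytes" more tangible, here is a small unit-conversion sketch. It assumes decimal (SI) prefixes, where one exabyte (EB) is 10^18 bytes, and takes the text's figure of more than 2 quintillion bytes per day as exactly 2 x 10^18 for a lower-bound estimate. The helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
# Decimal (SI) prefixes: each step up is a factor of 1,000.
DECIMAL_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_decimal_units(num_bytes: float) -> tuple[float, str]:
    """Scale a byte count to the largest decimal unit that keeps it below 1,000."""
    value = float(num_bytes)
    for unit in DECIMAL_UNITS[:-1]:
        if value < 1000:
            return value, unit
        value /= 1000
    return value, DECIMAL_UNITS[-1]


# "More than 2 quintillion bytes" per day, taken here as exactly 2 x 10**18.
bytes_per_day = 2 * 10**18
per_day = to_decimal_units(bytes_per_day)
per_second = to_decimal_units(bytes_per_day / SECONDS_PER_DAY)

print(f"Per day:    {per_day[0]:.1f} {per_day[1]}")
print(f"Per second: {per_second[0]:.1f} {per_second[1]}")
```

Even this lower bound works out to 2 exabytes a day, or roughly 23 terabytes of new data every second.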
WHY ARE DATA NINJAS NEEDED?
According to a report published by McKinsey & Company’s Business
Technology Office in 2011 entitled Big data: The next frontier for
innovation, competition, and productivity, data has swept into every
industry and business function and is now an important factor of
production, alongside labor and capital.
This observation indicates significant need for Data Ninja-type talent in
the coming years and also emphasizes the great opportunities that this new
era of abundant data holds for companies, particularly in terms of being
able to use data to gain efficiency and tap into new business opportunities.
“The United States alone faces a shortage of 140,000 to 190,000 people
with deep analytical skills as well as 1.5 million managers and analysts to
analyze big data and make decisions based on their findings.”
(McKinsey)
In one of its publications, research firm Gartner cited many
industries and companies as having a great need (demand) for more people
skilled in managing and analyzing data.
“By 2015, big data demand will reach 4.4 million jobs globally, but only
one-third of those jobs will be filled. The demand for big data is growing,
and enterprises will need to reassess their competencies and skills to
respond to this opportunity. Jobs that are filled will result in real financial
and competitive benefits for organizations. An important aspect of the
challenge in filling these jobs lies in the fact that enterprises need people
with new skills — data management, analytics and business expertise and
nontraditional skills necessary for extracting the value of big data, as well
as artists and designers for data visualization.”
(Gartner)
Even though the statistics and articles presented by some research firms
may explicitly refer to “Big Data” (a topic that is a bit more
advanced than the intended scope of this book), we nonetheless see the
move to Big Data Analytics as a natural career progression step for any
person working in the data field.
Translation
As we can see from Gartner’s predictions above, the tremendous growth in
new job opportunities, coupled with potential shortages of suitable Data
Ninja-type individuals, translates into healthier demand and salaries for
those individuals who are skilled and capable of crunching data.
WHAT DO DATA NINJAS DO?
Some of the most important responsibilities of Data Ninjas involve
collecting, sorting, and analyzing different sets of data to gain insights.
These datasets being analyzed can range from simple business metrics such
as sales numbers to more exotic datasets like user behavior and product
performance.
From Data to Wisdom and everything in between
The ultimate goal of any analysis effort carried out by Data Ninjas is to
transform data into information. In its raw form, data is just what it is, data.
And raw data is not very useful unless it is synthesized and transformed into
information that people and organizations can actually consume and act on.
To get a good sense of what data analysis is about, we will leverage an
existing model that describes how data gets transformed into wisdom:
the DIKW (Data, Information, Knowledge, Wisdom) pyramid (shown in figure below).
Write queries to retrieve data from a database and other data sources.
Scrub data to remove duplicates and other errors within the data.
Analyze data to find insights or trends that can be used to improve
their company's key performance indicators (KPIs).
Prepare reports based on analysis and present them to management.
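The first two tasks listed above, writing queries and scrubbing duplicates, often happen together. As a minimal sketch, assuming a hypothetical `orders` table in which one row was accidentally imported twice, Python's built-in `sqlite3` module can run a SQL query that drops exact duplicate rows before totaling revenue by region. The table, columns, and data are all invented for illustration.

```python
import sqlite3

# Hypothetical raw extract: order 2 was accidentally imported twice.
RAW_ORDERS = [
    (1, "North", 250.0),
    (2, "South", 100.0),
    (2, "South", 100.0),
    (3, "North", 75.0),
]


def clean_revenue_by_region(rows):
    """Load rows into an in-memory table, drop exact duplicates, total by region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        query = """
            SELECT region, SUM(amount) AS revenue
            FROM (SELECT DISTINCT order_id, region, amount FROM orders)  -- scrub duplicates
            GROUP BY region
            ORDER BY region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(clean_revenue_by_region(RAW_ORDERS))
```

Without the `DISTINCT` subquery, the South region would be double-counted at 200.0. Note that `DISTINCT` only removes rows that match exactly; near-duplicates (a typo in a name, a different timestamp) need more careful scrubbing rules.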
Ninja Tip 3.
There are many paths an aspiring Data Ninja can take, but understanding
the business you work with before doing any data analysis is extremely
important because businesses often have different needs and approaches to
working with data.
It’s Your Business to Know Your Business
A Data Ninja who works with social data at a company like Facebook
has totally different datasets and might employ totally different techniques
to analyze data than, say, an analyst working at a financial firm like
Goldman Sachs, or an analyst at a health insurance company like
UnitedHealth Group.
So, it is important to understand the business you work with before doing
any data analysis – especially given that businesses often have different
needs and approaches to working with data.
THE TYPES OF ANALYSIS A DATA NINJA PERFORMS
In general, the process of analyzing data can be divided into exploratory
data analysis (EDA), where new features in the data are discovered, and
confirmatory data analysis (CDA), where existing hypotheses are tested
and either supported or rejected.
Exploratory Data Analysis (EDA) - Example
A Data Ninja at a national retail chain, as part of their exploration, can
analyze and plot on a graph their product sales by region or zip code.
Without any prior knowledge of what to expect, the exploratory exercise
can offer insights, such as the revelation that a particular product sells more in
the East Coast stores than it does in the West Coast stores, or that sales
of a particular product spike during severe snowstorms compared with
average weather.
These are findings that can be profound and have the potential to influence
the way the company works, advertises or uses data. When done right, EDA
can expose trends, patterns, and relationships that are not readily apparent.
The results from EDA can help the company improve its
marketing efforts, change the way it targets customers with ads, or
open new lines of business that can in turn increase sales revenue, reduce
cost, and affect the bottom line positively.
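As a minimal sketch of the exploratory step described above, the snippet below groups a handful of hypothetical store-day sales records (all numbers invented) by region and by weather, and compares average units sold. It uses only the Python standard library; in practice the same grouping is often done with a pivot table in Excel or a `GROUP BY` in SQL.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily store sales, invented for illustration only.
DAILY_SALES = [
    {"product": "Snow shovel", "region": "East Coast", "weather": "snowstorm", "units": 120},
    {"product": "Snow shovel", "region": "East Coast", "weather": "snowstorm", "units": 95},
    {"product": "Snow shovel", "region": "East Coast", "weather": "average", "units": 40},
    {"product": "Snow shovel", "region": "West Coast", "weather": "average", "units": 15},
    {"product": "Umbrella", "region": "East Coast", "weather": "average", "units": 30},
    {"product": "Umbrella", "region": "West Coast", "weather": "average", "units": 55},
]


def average_units_by(rows, field):
    """Average units sold per store-day, grouped by product and one other field."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["product"], row[field])].append(row["units"])
    return {key: mean(units) for key, units in groups.items()}


if __name__ == "__main__":
    for field in ("region", "weather"):
        print(f"Average units per store-day by {field}:")
        for (product, value), avg in sorted(average_units_by(DAILY_SALES, field).items()):
            print(f"  {product:<12} {value:<11} {avg:6.1f}")
```

With these made-up records, shovels average 85 units per store-day on the East Coast versus 15 on the West Coast, and 107.5 during snowstorms versus 27.5 otherwise. That is the kind of pattern EDA surfaces; it is a lead to investigate further (for example, how many days fall into each group), not yet a confirmed conclusion.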
“Numbers have an important story to tell. They rely on you to give them a
clear and convincing voice.”
(Stephen Few)
Confirmatory Data Analysis (CDA) - Example
A Data Ninja at an e-commerce company can hypothesize that customers
who buy Product X have a 60% likelihood of buying Product Y if an ad
impression about Product Y is shown to them during checkout.
This is a solid hypothesis that can be verified using data. In this case, the
Analyst can set out to examine website traffic or navigation patterns to
determine whether their hypothesis is true, i.e. whether, based on the data,
customers are more or less likely to buy Product Y after prior
exposure to impressions about the product.
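One common way to check a hypothesis like this is to compare the purchase rate of Product X buyers who saw the ad with those who did not, using a two-proportion z-test. The sketch below is a minimal illustration of that technique under stated assumptions: the counts are invented, and the test relies on the normal approximation, which is reasonable when each group has plenty of buyers and non-buyers. A real analysis would also need to make sure the two groups are comparable (ideally through a randomized test).

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare two conversion rates with a pooled two-proportion z-test.

    Returns (rate_a, rate_b, z, two_sided_p_value) using the normal approximation.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical checkout data for Product X buyers (invented numbers):
# 1,000 saw the Product Y ad and 620 bought Y; 1,000 did not and 450 bought Y.
rate_ad, rate_no_ad, z, p_value = two_proportion_z_test(620, 1000, 450, 1000)

print(f"Bought Y with ad:    {rate_ad:.0%}")
print(f"Bought Y without ad: {rate_no_ad:.0%}")
# A point-estimate check against the hypothesized 60%, not a formal test.
print(f"Observed rate at or above 60%: {rate_ad >= 0.60}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2g}")
```

With these made-up counts, the ad-exposed group buys Product Y 62% of the time versus 45% for the unexposed group, and a z statistic of about 7.6 is far beyond the usual 1.96 cutoff, so the difference is very unlikely to be chance.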
Ninja Tip 4.
As a data ninja, you must enjoy working with data, have great attention to
detail and enjoy looking at data sets to find anomalies, outliers, and
patterns.
WHAT ARE THE SKILLSETS FOR THE DATA NINJA?
On a day-to-day basis, a Data Ninja will be required to employ a number of
skills to get the job done, such as technical skills, business acumen,
presentation skills, database skills, analysis skills, and sometimes coding
abilities.
These skills allow Data Ninjas to perform their duties of analyzing data
with competence, as well as help them overcome any new challenges that
come up along the way.
Below, we have broken down the skillset requirements of Data Ninjas
into two broad classifications.
1. Technical Skills
As a Data Ninja, your technical skills are absolutely essential to landing
and keeping your job. There is no getting around that. In the coming
chapters of this book, we will cover some of the technical skills (such as
Excel, SQL, Data Warehousing, and Programming) by presenting 5
Nuggets that are essential for professionals looking to make a career
working with data.
For the basics, some understanding of Excel and being able to work
proficiently in it will help (see Nugget on MS Excel). Also being able to
understand data sources, data structures, schemas, Data Warehousing,
Structured Query Language (SQL), and some programming concepts (if
possible) would be extremely useful.
Some math and a general understanding of statistics and set theory would
go a long way. More advanced technical users can go on to master
tools like Python, MATLAB, and statistical languages and packages such as
R, SAS, and SPSS. These advanced tools are recommended, but generally
not required for most entry-level candidates and thus not covered in this book.
2. Soft Skills
In performing analysis work, defining the problem and narrowing the
analysis down often requires a lot of soft skills. When analyzing data for a
company or client, it is important to be able to balance your time, limit
endless “what-if?” scenarios and understand the priority of the needs at
hand. Mastering all of these skills requires good self-awareness and control.
Unlike hard technical skills (mentioned earlier), which comprise a
person's technical abilities to perform certain functional tasks, soft skills are
interpersonal and broadly applicable across job titles and industries. You
have to get along well with the people you work with, be dependable, be
timely, be honest, be curious, and so on.
Interestingly, many soft skills are tied to an individual's personality rather
than to any formal training and are thus considered more difficult to develop
than technical skills. But with continued practice and perseverance,
most Data Ninjas should be able to develop and advance the
soft skills required to perform the job.
Technical skills may get you the job, but soft skills will help you keep the
job.
Skillsets for the Data Ninja. (Nde)
From the diagram above, we see that analytical, critical thinking, and
math skills are absolutely essential to perform at a high level as a Data
Ninja. Generally speaking, the analytical and math skills might fall under
the technical skills category, while communication and critical thinking
skills might fall under the soft skills category.
Translation
All of these skills intersecting together produce the ideal Data Ninja
candidate, i.e., someone who is acutely analytical, thinks critically and
communicates effectively.
Required Skillsets for the Data Ninja
Not every skillset is absolutely required to be a
successful Data Ninja. Depending on the job, some of the skills may or may
not be a requirement. But having them can be an advantage in excelling
and thriving in your career.
For example, you can have a successful career as a Data Ninja without
knowing a lot of math and statistics, or how to write a line of code,
but knowing math, statistics, or programming will simplify things when it
comes to solving very complex or advanced problems.
Ninja Tip 5.
It doesn’t really matter how much you know about the analytics process or
how much effort you have put into an analytics project.
If you can’t communicate your results in a clear and timely manner to
decision makers, then you can’t impact the business bottom line.
WHAT TOOLS DO DATA NINJAS USE?
Tools used by Data Ninjas can vary widely depending on the level of
expertise, the specific job requirements, the preferences of the company you
work at and much more. So, trying to provide an exhaustive list of every
single tool on the market as part of this book would be next to
impossible.
But below we have listed some of the general tools and concepts that
are highly recommended as the starting point for individuals looking to get
into the data analytics field.
Tools and Concepts for Data Ninja Candidates
MS Excel
SQL Server
Data Warehousing Concepts
Programming
Adapting
Each of these tools and concepts listed will be covered in more depth in the
ensuing chapters.
Ninja Tip 6.
As the saying goes, “Anybody can buy a tool, but only a few special people
can make magic happen with the tools they have.” As a Data Ninja, you
should definitely stop focusing on the tools at hand and instead focus on the
magic you want to see happen.
Tools are Important, but Not the End-All-Be-All
By the time you go on to read the specific details in each chapter, my hope
is to continuously convey the all-important message that although software
tools make analysis easier, they are only as valuable as the information that
you put in and analysis that you conduct. As one of the popular sayings
goes:
"Anybody can buy a tool. A few special people can make magic happen with
it."
Data Professionals should always have the mindset of striving to make
the best of whatever tool they have. As a Data Ninja, no matter what tool
you have at hand, I would encourage you to take a moment to challenge
yourself. Learn a few new tricks with the tools you work with. Then, let the
tools serve as a medium to enhance and complement the logic and
reasoning skills that you already have – instead of being a distraction to the
process.
WHAT TRAINING DO DATA NINJAS NEED?
There is a lot of training available for anyone interested in becoming a
Data Ninja. Training options vary in scale and rigor, ranging from informal
personal study to more formal training by pursuing
scholarly curricula at an accredited academic institution.
The more advanced the role you play in the job, the more advanced the
training that may be required. But, given that this book focuses on
entry-level candidates, the training required to start off might not be as
rigorous or formal as people might think.
Training for Data Ninja Candidates
To start off, candidates are typically encouraged (but not required) to
have an undergraduate degree in a field such as accounting, statistics,
mathematics, computer science or business. Most of these requirements for
formal degrees can be waived if the candidate has sufficient years of
experience, or demonstrates strong competence in performing the job-
specific roles offered to them.
Different employers might have different practices and hiring
requirements. As a result, some employers might require their Data Ninja
candidates to have a master’s or doctoral degree in an area closely related to
fields such as mathematics, accounting, statistics, computer science or
business. But in most cases, these advanced degree requirements are usually
only for candidates looking to get advanced analytics roles or leadership
roles, and usually do not apply for entry level or beginner candidates.
Translation
Training helps individuals gain a systematic approach to problem-
solving. Some intuition, artistry and guesswork may be needed when
analyzing data, but for the most part, the process is very scientific, with
very systematic and repeatable ways to go about analyzing a data set.
Understanding this systematic approach to working with data makes
things more standardized and saves data professionals the burden of having
to reinvent the wheel on concepts that have already been mastered. This is
the main value proposition we see offered by many training programs.
Training of Some Sort is Crucial for Success
Training sometimes gets a bad reputation, because some people
perceive it as being expensive or taking a lot of time. No matter the
circumstances, that should not stop you from pursuing a rigorous training
program that will give you the essential skills to perform competently at
your job as a Data Ninja. Without competence, you will not have a job.
The benefits of training cannot be overemphasized.
Practice Makes Perfect when it comes to Data Ninja Training
Through books, videos, lessons and a myriad of online resources, it is
possible to teach yourself much of what is needed to be an exceptional data
analytics ninja.
No matter how you start or what route you take for training, it is possible to
become much better and even extraordinary in the data analytics game by
making a commitment to learning and embracing new techniques to help
solve business problems.
In addition to reading and studying up on the concepts presented in this
book, to get much better at working with data, you have to actually do it,
and do it as often as you can. This is where practice makes
perfect.
Ninja Tip 7.
As a Data Ninja candidate looking to land your ideal job, there are many
benefits to aggressively pursuing a training path and a regimen of
continuous learning, whether through formal training or not.
Continuous learning is about the constant expansion of skills and skill-sets
through learning and increasing knowledge. As life changes, the need to
adapt, both professionally and personally, becomes as important as the
changes themselves.
SAMPLE: REAL WORLD DATA NINJA STORY
One of my first real, non-restaurant jobs was as a “Data Analyst” for a
really large insurance/healthcare corporation. I worked in the area that
managed the marketing database for the company.
For example, we could only market to certain zip codes (by law) and once
I had to input something like 10,000 zip codes into a database in about
three days.
We would also do a lot of analysis on what groups were more responsive
to our marketing campaigns. So I’d end up throwing a ton of information
into a spreadsheet (Lotus 1-2-3 at the time; Excel is very similar) and then
doing lots of analysis to find the best-performing groups (people between
the ages of 55 and 65 who live in non-urban areas of Florida might be an
example).
Using this info, we’d go out and try to find more people (our target
audience) who mirrored the most responsive groups. Purchasing a mailing
list or advertising in certain publications whose readers are similar to these
people (like AARP) would be a way to target them.
You might be responsible for coming up with the stats for the target
audience and then going out and finding them. You’ll also spend vast
amounts of time in front of a computer inputting and looking at data. You
may also work with programmers (who may be in India or somewhere else).
In order to do a good job, you’ll need to be very detail oriented. You will
need to like working with numbers, and logical/critical thinking skills are
required. You will need to be OK with sitting in front of a computer looking
at data for long periods of time. There may not be much room for artistic
expression.
Leann C.
December 7, 2010 at 5:32 am
SAMPLE: REAL WORLD DATA NINJA JOB DESCRIPTION AND POSTINGS
In this section, we present a sample Data Ninja Job description. This
sample is available free on the resources.workable.com website and we
have gone through the job description to highlight and call out the specific
skills and requirements that could be of importance to an aspiring data
ninja.
As mentioned earlier, different companies and industries may have
specific job requirements tailored to the specific Data Analytics role that
they are looking to fill, but in general, there are some broad skills and
concepts that can be found in most job descriptions.
Note: This sample is presented for educational
purposes ONLY, to highlight some of the skills employers look for
when hiring Data Ninjas in the real world.
(WFM)
Data analyst job posting (workable)
MASTER MS EXCEL
MOTIVATION
Microsoft Excel is a spreadsheet application developed by Microsoft
Corporation for Windows and other operating systems. The product allows
for easy calculation, graphing, tabulation and pivoting of data.
A poll from kdnuggets.com shows Excel at the top of the list of the
most popular analytical tools being used in the industry today.
Compared to other products in its class, Excel stands as a very powerful
tool and certainly has its place in the market as far as data manipulation and
analytics are concerned.
The ubiquity and versatility of the product gives users the ability to
manipulate, cleanse, and merge data sets with relative ease. As an analytics
platform for small datasets, Excel has proven to be very generous and can
seriously reward anyone who takes the time to learn and play with its
formulas and calculations.
MS EXCEL CORE FUNCTIONALITIES
MS Excel offers several functionalities that are useful in analyzing and
working with data. Here are the main ones:
Whether you want to summarize daily sales data for your company by
line of business (LOB), or sum employees’ total working hours for the
week, pivot tables will let you do that with relative ease.
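As a rough illustration outside Excel, the following Python sketch produces the same kind of summary a one-field pivot table gives: daily sales rows grouped by line of business (LOB) and summed. The sample rows and the `pivot_sum` helper are hypothetical, and this is not how Excel computes pivot tables internally.

```python
from collections import defaultdict

# Hypothetical daily sales rows: (date, line_of_business, amount)
sales = [
    ("2015-03-02", "Auto", 1200.0),
    ("2015-03-02", "Home", 800.0),
    ("2015-03-03", "Auto", 300.0),
    ("2015-03-03", "Life", 450.0),
    ("2015-03-03", "Home", 200.0),
]


def pivot_sum(rows, key_index, value_index):
    """Group rows by one column and sum another, like a one-field pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# Summarize by line of business (column 1), summing the amount (column 2)
by_lob = pivot_sum(sales, key_index=1, value_index=2)
for lob, total in sorted(by_lob.items()):
    print(f"{lob:<6}{total:>10.2f}")
```

Passing `key_index=0` instead groups the same rows by date, much like dragging a different field into a pivot table's row area.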
MS EXCEL FOR DATA PRESENTATION
Another extremely important feature of Excel is presentation.
The results of any data analysis usually need to be presented to users and
decision makers for consumption, and Microsoft Excel excels at this.
The sky is the limit for using Microsoft Excel for dashboarding
and for presenting data that tells a consistent and coherent story. (Excel)
Pie Charts
Maps
KPIs (Key Performance Indicators)
Hierarchies
Drill Up and Drill Down
Background Color and Background Images
Hyperlinks
MICROSOFT POWERBI
In recent years, Microsoft has been investing in a number of
integrated components for data collection, analysis and visualization. These
products are currently distributed under the name Power BI.
Some of the products within the Power BI ecosystem include:
Ninja Tip 8.
Choosing between Excel and some other solutions for your data analytics
purposes is not an either/or proposition, nor is it a zero sum game.
Excel has capabilities that are proving to be very valuable for certain types
of data analysis; the point of this book is to make that clear and
encourage you as a Data Ninja to go out, explore the tool further and use it
for the use cases that work well for you.
SUMMARY
In this nugget, I have presented a good suite of solutions that come with
Excel which can help companies and their analysts do wonders with data. I
have also tried to present some of the drawbacks that can come with
building enterprise-wide solutions on Excel.
But I also realize there are those who would rather focus all the attention
on the negatives rather than the positives of Excel — or any other product
for that matter.
Excel is Extremely Good, but has its Limitations
As mentioned earlier, the point of this book is not to get into a debate over
whether Excel is good or bad as an analytical tool, but instead to appreciate
that Excel has potential (whether you like it or not) which may or may not
be useful for your needs.
When it comes to data analytics, Microsoft Excel should not be seen as the
panacea, because it simply isn’t — and no single tool is for that matter.
Excel offers a lot of functionality to help companies and data ninja
professionals in their analytical journey. But Excel doesn’t (and shouldn’t
be expected to) solve all the problems companies face today.
Nonetheless, it has a vital and pivotal role to play within the data analytics
ecosystem, and must not be ignored.
Learn to extend Excel using add-ins and scale using SharePoint and
Office 365.
Power BI, including Power Pivot, Power Query, Power View, and
Power Map, extends the capabilities of native Excel tremendously.
So, don’t ignore it.
Excel is here and is promising to be around for a while, so learn it.
Master the formulas and techniques for acquiring, analyzing and
presenting data from different data sources.
CONQUER SQL
MOTIVATION
As companies embrace data for better and faster decision making, the
database environments they use have become increasingly complex. The
need for mastery of a computer language aimed at accessing, manipulating,
and querying data stored in relational databases has become extremely
important. This is where SQL comes in.
WHAT IS SQL?
Structured Query Language (SQL), pronounced “sequel” or “ess-cue-ell”,
is the primary language used to request information from a relational
database, and it is everywhere.
For example, a database-driven dynamic web page takes user input from
forms and clicks and uses it to compose a SQL query that retrieves
information from the database required to generate the next web page.
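A minimal sketch of that idea, using Python’s built-in `sqlite3` module: a search term typed into a form is turned into a SQL query. The `products` table and the `search_products` function are hypothetical. The term is passed as a bound parameter (`?`) rather than pasted into the SQL text, which is the usual way to keep user input from being executed as SQL (SQL injection).

```python
import sqlite3


def search_products(conn, search_term):
    """Return product names matching a search term typed into a web form.

    The term is bound to the ? placeholder instead of being concatenated
    into the SQL string, so quotes in the input are treated as plain data.
    """
    sql = "SELECT name FROM products WHERE name LIKE ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, ("%" + search_term + "%",))]


# Hypothetical product catalog in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Blue Shirt",), ("Red Shirt",), ("Blue Jeans",)],
)

print(search_products(conn, "Shirt"))
```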
Even more astounding is the fact that Android phones and iPhones
ship with an embedded SQL database engine called SQLite, and many
applications on your phone use it directly.
Today, the applications that run our banks, hospitals, universities,
governments and small businesses, and just about every computer,
eventually touch something running SQL.
This ubiquity makes SQL an incredibly powerful tool and it has proven
itself over the years to be a very successful and solid data analytics
technology worth mastering.
WHY IS SQL BENEFICIAL
SQL is especially beneficial because it provides some standardization to
the way data in databases can be queried. It is tremendously flexible,
powerful, and very accessible, which makes it relatively easy to learn. Some
of the key benefits of using SQL to store, manage and analyze data over
other approaches are listed below:
With SQL, you can build databases, enter data into the database, manipulate
data, and query the database data with relative ease.
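The four activities named above (building a database, entering data, manipulating it, and querying it) can be sketched with Python’s standard `sqlite3` module and an in-memory SQLite database. The `customers` table and its rows are hypothetical examples.

```python
import sqlite3

# Build: an in-memory database with one (hypothetical) table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)

# Enter data
conn.executemany(
    "INSERT INTO customers (name, state) VALUES (?, ?)",
    [("Ada", "MN"), ("Grace", "FL"), ("Alan", "MN")],
)

# Manipulate data: Grace moves to Minnesota
conn.execute("UPDATE customers SET state = 'MN' WHERE name = 'Grace'")

# Query the data
rows = conn.execute(
    "SELECT name FROM customers WHERE state = ? ORDER BY name", ("MN",)
).fetchall()
print(rows)
```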
Ninja Tip 9.
There are many database products such as Microsoft SQL Server, Oracle,
Netezza, Teradata, MySQL, etc. which support SQL.
RDBMS SYSTEMS
At the core of SQL is the Relational Database Management System
(RDBMS).
RDBMS Defined
A relational database management system (RDBMS) is a program that lets
you create, update, and administer a relational database.
In an RDBMS, data is structured in database tables, fields, and records.
Tables within the RDBMS might be related by common fields for easy
cross-table querying.
RDBMS also provides relational operations (in the form of SQL) to
manipulate and/or store data into the database tables.
The RDBMS is the basis for SQL, and for all modern database systems such
as MS SQL Server, IBM DB2, Oracle and MySQL.
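To make “related by common fields” concrete, here is a small sketch using SQLite through Python’s `sqlite3` module: two hypothetical tables share a `dept_id` column, and a JOIN uses that field to answer a cross-table question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)  -- the common field
    );
    INSERT INTO departments VALUES (1, 'Marketing'), (2, 'Finance');
    INSERT INTO employees VALUES (10, 'Ana', 1), (11, 'Ben', 2), (12, 'Cy', 1);
""")

# Cross-table query: match each employee to a department through dept_id
query = """
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
"""
for name, dept in conn.execute(query):
    print(name, dept)
```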
KEY CHARACTERISTICS OF THE RDBMS
VENDORS OF RDBMS
There are many vendors supplying RDBMS products in the market, some
of which are proprietary and some of which are open source. A few
examples of these RDBMS systems include: PostgreSQL, SQLite, MySQL,
MSSQL Server, Oracle, Teradata, Netezza, and Sybase.
MANY FLAVORS OF SQL
Although SQL is (in theory) standardized, in practice it is not. There are
many vendors in the market, and each of them has its own variation and
flavor of the language. In general, SQL written for one RDBMS, such as
Sybase, may not work unchanged on another RDBMS, such as MySQL or
PostgreSQL, because the syntax is different.
SQL Portability
SQL database platforms tend to implement the SQL standard in
different ways. For example, the SQL date and time data types are
sometimes omitted in favor of proprietary solutions.
PostgreSQL notoriously contains a number of custom data types; for
instance, it provides an entire range of data types that define geometric
objects (e.g. box and line). These geometric object types are not necessarily
available in other database systems, so the database developer who uses
those types may be “locked in” to PostgreSQL. This situation would arise if
converting the geometric object types to another type usable by a different
database would consume too much time or be altogether impossible.
A common defense against complaints about SQL’s lack of portability is
that the SQL standard, despite being long and complex, is not completely
defined and is, in some cases, ambiguous.
(Learn)
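SQLite is a handy example of the date and time point above. The sketch below (table and values are hypothetical) declares a column as `DATE`, but SQLite has no dedicated date storage class, so the value is stored as text and date arithmetic goes through SQLite’s own `date()` function. Other systems, such as PostgreSQL, have a native `DATE` type with their own arithmetic syntax, so a query like this may need rewriting when it moves between platforms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "DATE" is accepted as a declared type, but SQLite has no separate date
# storage class; this ISO-formatted value ends up stored as TEXT.
conn.execute("CREATE TABLE orders (order_date DATE)")
conn.execute("INSERT INTO orders VALUES ('2015-01-31')")

stored_type = conn.execute("SELECT typeof(order_date) FROM orders").fetchone()[0]

# Date arithmetic uses SQLite's date() function with a modifier string
next_day = conn.execute(
    "SELECT date(order_date, '+1 day') FROM orders"
).fetchone()[0]

print(stored_type, next_day)
```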
DATA ANALYSIS WITH SQL
Today, SQL is the premier language used for querying and working with
relational data. This is accomplished by writing SQL query statements.
Two of the main categories of SQL statements are DML (Data
Manipulation Language) and DDL (Data Definition Language). Below, we
provide a high-level overview of these two categories and how to leverage
them for your data analytics needs.
Data Manipulation Language (DML)
SELECT
UPDATE and INSERT statements
JOINS
DISTINCT
IN
BETWEEN
LIKE
GROUP BY
ORDER BY
PARTITION BY
AGGREGATE FUNCTIONS such as AVG, MIN, MAX, SUM,
COUNT, etc.
The operations performed by DML statements are what a data ninja may
be tasked with on a day-to-day basis. That is why it is
recommended that readers be proficient in, or at least familiar with, the
concepts of writing SQL DML statements.
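The sketch below exercises several of the DML elements listed above (DISTINCT, IN, BETWEEN, LIKE, GROUP BY, ORDER BY, and the COUNT and AVG aggregates) against a small in-memory SQLite table. The `responses` table and its rows are hypothetical, loosely modeled on the campaign-response analysis described earlier in this book. Window functions such as PARTITION BY are left out because older SQLite builds do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (region TEXT, age INTEGER, campaign TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [
        ("FL", 58, "Spring Mailer"),
        ("FL", 62, "Spring Mailer"),
        ("MN", 45, "Spring Mailer"),
        ("MN", 57, "Spring Mailer"),
        ("FL", 70, "Summer Ad"),
        ("MN", 60, "Summer Ad"),
    ],
)

# DISTINCT: which campaigns appear in the data?
campaigns = [row[0] for row in conn.execute(
    "SELECT DISTINCT campaign FROM responses ORDER BY campaign")]

# IN, BETWEEN, LIKE, GROUP BY, ORDER BY and aggregates in one query:
# responders aged 55-65 in selected states for any "Spring" campaign
summary = conn.execute("""
    SELECT region, COUNT(*) AS responders, AVG(age) AS avg_age
    FROM responses
    WHERE region IN ('FL', 'MN')
      AND age BETWEEN 55 AND 65
      AND campaign LIKE 'Spring%'
    GROUP BY region
    ORDER BY responders DESC
""").fetchall()

print(campaigns)
print(summary)
```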
Data Definition Language (DDL)
DDL statements are used to build and modify the structure of objects in a
database. These database objects include views, schemas, tables, indexes,
etc. Some examples of DDL statements:
Example SQL DDL:
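As a hedged stand-in for a DDL example, the following sketch runs common DDL statements (CREATE TABLE, ALTER TABLE, CREATE INDEX, CREATE VIEW, DROP INDEX) against an in-memory SQLite database and then lists the schema objects that remain. The table, index and view names are hypothetical, and DDL syntax varies between vendors; SQLite’s ALTER TABLE, for instance, supports fewer operations than many server databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table
conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        lob     TEXT NOT NULL,
        amount  REAL
    )
""")

# ALTER: change the structure of an existing table
conn.execute("ALTER TABLE sales ADD COLUMN sale_date TEXT")

# CREATE INDEX and CREATE VIEW: other kinds of schema objects
conn.execute("CREATE INDEX idx_sales_lob ON sales (lob)")
conn.execute(
    "CREATE VIEW sales_by_lob AS "
    "SELECT lob, SUM(amount) AS total FROM sales GROUP BY lob"
)

# DROP: remove an object again
conn.execute("DROP INDEX idx_sales_lob")

# Inspect what the schema now contains
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales)")]
print(objects)
print(columns)
```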
TYPES OF DBA
SUMMARY
A lot of the data in existence today is stored in RDBMS databases and
SQL is the premier interface used to access and manipulate this data.
Your smartphone stores your contacts in a relational database. Your
online banking information and all your financial history, statements,
personal data and so forth, are all stored in a relational database of some
sort.
SQL is the primary language used to access and analyze this information.
So, as a data ninja who aspires to work intimately with data, it is paramount
to master SQL.
STRUCTURED QUERY LANGUAGE (SQL) - NINJA RESOURCES
CORNER & SHADOW SKILLS
Now that you have read this nugget on SQL and have got your feet wet, I
would recommend that you continue to read and practice your skills. The
more you read the more you are going to learn, and pretty soon you will be
fluent in writing SQL statements to interrogate data.
I have put together the best learning resources that I have come across to
help you on your journey. Below you will find links to paid online courses
from providers that have spent years developing their videos and courses to
really immerse you in the material. If you are not ready to shell out some
cash, there are also free courses with excellent resources available.
First we will look at some of the premium resources that are available to
purchase. These resources typically have the most to offer and will carry
you further than some of the free resources.
Lynda.com
Lynda.com is a very popular online education company. They offer
thousands of different courses for creative software and business skills.
Inside the MySQL course they have several different skill levels, from
beginner to advanced. They offer a 10-day trial
period to get started; a perfect way to see if their services are right for you.
At the end of the trial you can choose between different levels of payment,
from $25 a month on a month-to-month basis to $375 annually.
Link : http://www.lynda.com/search?q=sql
Infinite Skills
Infinite Skills was recently purchased by O’Reilly Media. They offer 142
training videos on MySQL. They also have downloadable practice files
that help further your skills beyond the videos. I find their website to be a
little confusing and counterintuitive, but they do offer good videos that will
help you. They charge a $25 month-to-month fee that includes a mobile app.
Link: http://www.infiniteskills.com/training/sql-beyond-the-basics.html
Learn Now Online
Learn Now Online has a wide range of topics from programming and
mobile development to SQL. They have a nice set of online videos and
options to choose from. In terms of cost, they are a little more affordable
with options starting at $49 annually.
Link: http://www.learnnowonline.com/
Paid premium courses are not for everyone; maybe you want to dive in a
little deeper before you decide to pay to further your skills. Below are some
great free online courses.
Udemy
Udemy is an online marketplace where experts can create their own courses,
which can then be offered to the public; many are free, while others are
paid. Each course has a different author and has user reviews, so you can
decide which course will be best for you.
Link: http://www.udemy.com/
Learn Code the Hard Way
Learn Code the Hard Way offers books on various subjects. They are
currently working on an SQL book, but have posted the book online for free
while they work on it. You can view the book chapter by chapter. Don’t let
the name fool you: it is very approachable, with easy-to-understand topics.
assuming once the book is completed it will be available to purchase from
their website.
Link: http://sql.learncodethehardway.org/
SQLServerCentral
SQL Server Central is a resource for the Microsoft SQL Server community.
Its members include many DBAs, developers and users, and it offers plenty
of useful and valuable information. This is one to keep bookmarked as you
continue your career in SQL.
Link: http://www.sqlservercentral.com/
SQL Fiddle
SQL Fiddle allows you to select a database, build a schema, populate the
schema, and run queries against it. SQL Fiddle is a great resource for
practicing different syntax of SQL and testing your queries.
Link: http://sqlfiddle.com/
Database Journal
Database Journal is a script library offering a huge database across an array
of subjects. They have articles, news, and tutorials, all offered for free. Feel
free to post questions and comments on their forum. They update their
databases frequently and have topics dating from 2010 to 2015.
Link: http://www.databasejournal.com/scripts/
SQL-Tutorial
SQL-Tutorial offers problems and solutions. They have resources for
novice users and those who feel they already have a grasp on SQL but want
to learn more. They will help you learn to write queries. The information is
presented as an Ebook you can read through.
Link: http://www.sql-tutorial.ru/
1KeyData SQL
1KeyData SQL is a very nice resource to help you with SQL. They have
common SQL commands, functions, constraints, and tables available to
access whenever you may need them. They also offer some video tutorials
and quizzes to help you along.
Link: http://www.1keydata.com/sql/sql.html
The Schemaverse
The Schemaverse is a space-based strategy game implemented entirely
within a PostgreSQL database. Play against other players using raw SQL
commands to command your fleet. This is a fun way to keep your skills
sharp.
Link: https://schemaverse.com/
SQL Zoo
SQL Zoo is a step-by-step tutorial with live interpreters, allowing access to
tables using any of the Oracle, SQL Server, MySQL, and PostgreSQL engines.
Once you feel ready they also have online quizzes to help assess your skills.
Link: http://sqlzoo.net/
Tutorials Point
Tutorials Point has tons of free online tutorials and reference manuals. They
also offer premium services for a fee; if you decide to pay, you will get
premium support and instructor help. If you do not wish to pay, their free
resources are excellent and will help you from the installation of MySQL
all the way through to importing databases.
Link: http://www.tutorialspoint.com//sql/index.htm
SQLCourse
SQLCourse is an interactive online SQL training resource that offers free
training. They will get you started with the basics and move you along to
more advanced topics. The site is funded by advertisements so you will
have to scroll past some ads while you are reading but they offer great
material all about SQL.
Link: http://www.sqlcourse.com/
W3Schools
W3Schools offers free material to view but they also offer premium
services such as certificates. In order to receive a certificate you must pay
the premium price of $95 and pass an online test. It has a built-in interpreter
in the browser, so you can try different queries and see the outcomes.
Link: http://www.w3schools.com/
Sol Tutorials
Sol Tutorials’ GalaXQL is an interactive SQL tutorial. This is another fun
tutorial: take the journey into outer space while writing SQL code. The site
was created by Kari Komppa, is totally nonprofit, and has very few
ads; an enjoyable resource all around.
Link: http://sol.gfxile.net/galaxql.html
These are all great online resources you can use to help yourself along on
your journey to mastering SQL, but sometimes you need to give your eyes a
rest from the screen and turn to a physical book.
Holding a book and turning the pages has always been my favorite way to
learn. There is something about highlighting and underlining key points in
the book that just seems to help me remember.
I have collected some of my favorite titles and created a small list below.
These titles can be found anywhere that sells computer reference titles.
SQL in 10 Minutes
Author: Ben Forta
Published by: Sams Publishing
ISBN-13: 978-0672336072
Learning SQL
Author: Alan Beaulieu
Published by: O’Reilly Media
ISBN-13: 978-059652083
SQL Cookbook
Author: Anthony Molinaro
Published by: O’Reilly Media
ISBN-13: 978-0596009762
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (3rd Edition)
Author: John Viescas
Published by: Addison-Wesley Professional
ISBN-13: 978-0321992475
Head First SQL
Author: Lynn Beighley
Published by: O’Reilly Media
ISBN-13: 978-0596526849
It is important to keep learning and using what you’ve learned. You will
only get more proficient as time goes on, so keep fine-tuning your skills and
never give up.
KEY NINJA LESSONS
TAME DATA WAREHOUSING
MOTIVATION
In the first Nugget, I addressed the importance of leveraging Excel for a
data ninja’s analytical needs. The second Nugget was about
leveraging RDBMS systems to go beyond Excel for analysis. In this
Nugget, we will explore the intricacies of working with huge datasets and the
need to adequately store them in data warehouses.
Numbers have an important story to tell. They rely on you to give them a
clear and convincing voice.
-Stephen Few
As mentioned in the nugget for MS Excel, using Excel for big data projects
can pose some serious challenges, especially relating to privacy, data
redundancy and concurrency issues that arise when users retain their own
personal copies of sensitive corporate data on their personal computers and
laptops.
Because of these challenges with using Excel spreadsheets alone,
companies often find themselves needing other more robust, enterprise-
scale solutions to help out.
The solution that often comes to the rescue when companies are
challenged with the need to move and store huge volumes of data falls
broadly into the category of Data Warehousing (DW).
WHAT IS A DATA WAREHOUSE?
Many organizations today have a data warehouse of one form or another.
A data warehouse serves many purposes within organizations, but at a basic
level a data warehouse is defined as a massive database typically housed on
a cluster of servers, or a mini or mainframe computer serving as a
centralized repository of all data generated by all departments and units of a
large organization.
Brief History of Data Warehousing
The term is generally credited to W. H. (Bill) Inmon, a prominent figure in
the field of data warehousing.
The DW consolidates data from a variety of sources in one centralized
location and is typically designed to support Business Intelligence
processes, along with strategic and tactical decision making.
Data Warehousing Defined
Data warehousing allows a company or organization to create a
consolidated view of its enterprise data, optimized for reporting and
analysis. Basically, a data warehouse is an aggregated, sometimes
summarized copy of transaction and non-transaction data specifically
structured for dynamic queries and fast, efficient business analytics.
(InformationBuilders)
With all the companies’ data available in one location, i.e. the Data
Warehouse, companies can provide data consumers with a coherent picture
of the business at a point in time.
DATA WAREHOUSE ARCHITECTURE
In data warehousing, data and information are extracted from
heterogeneous production data sources, either as they are generated or in
periodic batches, and loaded into the Data Warehouse. This approach makes
it simpler and more efficient to run queries over data that originally came
from different sources.
The diagram below captures the complete architecture of an end-to-end
data solution within a company and it shows the pivotal role played by the
Data Warehouse.
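The extract-and-load flow described above is commonly called ETL (extract, transform, load). Below is a toy Python sketch of it under stated assumptions: two hypothetical source systems (a CSV export and a JSON feed) use different field names and units, the transform step maps them onto one standard shape, and an in-memory SQLite database stands in for the warehouse. Real warehouses use dedicated ETL tools and much larger databases; this only shows the shape of the process.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical source systems with different formats and conventions
crm_csv = "customer,state,amount\nAda,mn,120.50\nGrace,FL,80.00\n"
web_json = '[{"cust": "Alan", "st": "MN", "amt_cents": 4500}]'


def extract():
    """Read each source in its native format."""
    crm_rows = list(csv.DictReader(io.StringIO(crm_csv)))
    web_rows = json.loads(web_json)
    return crm_rows, web_rows


def transform(crm_rows, web_rows):
    """Map both sources onto one standard shape: (customer, STATE, dollars)."""
    unified = [(r["customer"], r["state"].upper(), float(r["amount"]))
               for r in crm_rows]
    unified += [(r["cust"], r["st"].upper(), r["amt_cents"] / 100)
                for r in web_rows]
    return unified


def load(conn, rows):
    """Append the standardized rows to a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_dw "
                 "(customer TEXT, state TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_dw VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))

# One query now spans data that came from both sources
result = warehouse.execute(
    "SELECT state, SUM(amount) FROM sales_dw GROUP BY state ORDER BY state"
).fetchall()
print(result)
```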
DIMENSIONAL MODELLING
Dimensional modelling is a crucial part of the Data Warehouse process.
Given that most data warehouses today follow the dimensional model
pattern, an understanding of the concept of dimensional modelling is
therefore extremely important when performing analytics.
In dimensional modelling, all data is contained in two types of tables:
fact tables and dimension tables. Fact tables contain the
measurements, metrics or facts of business processes, while
dimension tables contain the context of those measurements.
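A small star-schema sketch may help. In the hypothetical example below, `fact_sales` holds the measurements (units sold and revenue) plus keys pointing at the dimensions, while `dim_product` and `dim_date` hold the descriptive context; the analytic query then slices the facts by dimension attributes. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              product_name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY,
                              full_date TEXT, month TEXT);

    -- Fact table: numeric measures plus foreign keys to the dimensions
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        units_sold  INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Basic Plan', 'Insurance'),
                                   (2, 'Premium Plan', 'Insurance');
    INSERT INTO dim_date VALUES (20150301, '2015-03-01', '2015-03'),
                                (20150402, '2015-04-02', '2015-04');
    INSERT INTO fact_sales VALUES (1, 20150301, 3, 300.0),
                                  (2, 20150301, 1, 250.0),
                                  (1, 20150402, 2, 200.0);
""")

# Typical analytic query: aggregate facts by attributes from the dimensions
report = conn.execute("""
    SELECT d.month, p.product_name, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date    AS d ON f.date_key    = d.date_key
    GROUP BY d.month, p.product_name
    ORDER BY d.month, p.product_name
""").fetchall()
print(report)
```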
Dimensional modeling differs from normalized modeling (which focuses on
reducing and eliminating data redundancy): it is designed to enable
analysis through massive and unpredictable queries, which a normalized
relational model is ill-equipped to handle efficiently.
Dimensional Model Pros and Cons
Pros:
Cons:
Data warehousing concepts are vital to learn as they provide the full
picture of data through its life cycle: creation, movement,
storage and consumption.
The dimensional model may be used for any reporting or querying of data.
The data warehouse provides an environment separate from the
operational systems and is completely designed for decision-support,
analytical-reporting, ad-hoc queries, and data mining.
PICKUP CODING
MOTIVATION
Computers are a critical component of all our lives. Most things we
interact with in the world today are run directly or indirectly by
computer systems. As a result, it has become more crucial than ever for
everyone (young and old) to learn programming or at least understand the
concepts.
Bill Gates and Mark Zuckerberg recently donated ten million dollars to
Code.org, a non-profit that believes that “every student in every school
should have the opportunity to learn computer programming,” and that
“computer science should be a part of the core curriculum.”
(NewYorker)
MAKING A CASE FOR PROGRAMMING SKILLS
Coding is not a goal. It’s a tool for solving problems. Learning to program
teaches computational thinking, and computational thinking teaches people
how to tackle large problems by breaking them down into a sequence of
smaller, more manageable problems.
You Can Play God
When you program, you are a creator. You go from a blank text file to a
working program with nothing to limit you but your imagination (and
maybe some issues like how long your program takes to run). Programming
is like having access to the absolute best set of LEGO bricks in the world in
almost unlimited quantities. Even better, you can get all of your building materials
completely for free (once you own a computer) on the internet. Amazing!
It's also great fun to see someone using something that you made. Your
ability to improve your life and the lives of your friends and family is
limited only by your ideas once you can take full control of your computer.
Moreover, your work can be extremely high quality because the limiting
factor is not manual dexterity or other non-mental attributes. If you can
understand a programming technique, you can implement and use it.
(Cprogramming.com)
WHAT IS PROGRAMMING?
In general, a programming language is defined as the vocabulary and set of
grammatical rules for instructing a computer to perform specific tasks.
Programming is the process of designing, writing, testing, debugging, and
maintaining the source code of computer programs. This code can be
written in a variety of computer programming languages. Some of these
languages include Java, C, and Python. Computer code is a collection of
typed words that the computer can clearly understand. Just as a human
translator might translate from English to Spanish, the
computer translates these words into ones and zeros. We as humans use
programming languages, instead of writing directly in ones and zeros, so we
can easily write and understand the computer code and can organize it. We
can think of the different lines of our code as being individual instructions
that we give to the computer. The computer follows these instructions
explicitly to execute our written code.
(EarSketch)
Programming is highly detailed work, and it usually involves fluency in
several languages. Projects can be short and require only a few days of
coding, or they can be very long, involving upward of a year to write.
WHY LEARN HOW TO PROGRAM?
Many reasons can be given as to why it is important to learn
programming. But the most important of them all is the attitude
embodied by most programmers. Programmers use their skills primarily
to decompose and solve complex and challenging problems.
It is often said that some people, when faced with a challenging
situation, throw their hands up in surrender and run away. Others, when
faced with similar challenging problems, will set about trying to break
down the problem into subsets and work on it until they understand what is
going on. The latter are those who make for good programmers. They solve
challenging problems and they like doing it.
What Experts Say About the Mastery of Programming Skills
Coding isn’t particularly easy to learn but that’s exactly why it’s so
valuable. Even if you have no plans to become a software developer, spend
a few weeks or months learning to code and I can guarantee it will sharpen
your ability to troubleshoot and solve problems.
(DIY Genius)
A deep understanding of programming, in particular the notions of
successive decomposition as a mode of analysis and debugging of trial
solutions, results in significant educational benefits in many domains of
discourse, including those unrelated to computers and information
technology per se.
(Seymour Papert, in "Mindstorms")
It has often been said that a person does not really understand something
until he teaches it to someone else. Actually a person does not really
understand something until after teaching it to a computer, i.e., express it
as an algorithm.
(Donald Knuth, in "American Mathematical Monthly," 81)
Computers are not sycophants and won't make enthusiastic noises to ensure
their promotion or camouflage what they don't know. What you get is what
you said.
(James P. Hogan in "Mind Matters")
“I think everybody in this country should learn how to program a computer
because it teaches you how to think.”
(Steve Jobs)
When you learn to read, you can then read to learn. And it’s the same thing
with coding: if you learn to code, you can code to learn.
(Mitch Resnick)
Work your way up the programming ladder
As you work with data and mature within the data analytics space, you
might inevitably progress from working with small data in spreadsheets
to crunching Big Data with tools like Hadoop and MapReduce, and then
maybe on to being a Data Scientist. In such roles, programming
becomes even more essential.
But, when we talk about programming, it does not have to be fancy or
complicated. It can be as simple as creating simple routines, or scripts, or
workflows to automate mundane tasks, such as moving files, searching
folders, merging data sets, creating new datasets, de-duplicating datasets,
standardizing datasets, etc.
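For example, a short Python script can merge two extracts of the same contact list, standardize the formatting, and de-duplicate the result. The sample data and the `standardize` and `merge_and_dedupe` helpers are hypothetical; the sketch only uses the standard `csv` and `io` modules.

```python
import csv
import io

# Two hypothetical extracts of the same contact list, with messy formatting
january = "name,zip\n Ada Lovelace ,55401\nGRACE HOPPER,33101\n"
february = "name,zip\nada lovelace,55401\nAlan Turing,55102\n"


def standardize(row):
    """Trim whitespace and normalize capitalization so duplicates line up."""
    return {"name": row["name"].strip().title(), "zip": row["zip"].strip()}


def merge_and_dedupe(*csv_texts):
    """Merge several CSV extracts, keeping the first copy of each (name, zip)."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            clean = standardize(row)
            key = (clean["name"], clean["zip"])
            if key not in seen:
                seen.add(key)
                merged.append(clean)
    return merged


merged = merge_and_dedupe(january, february)
for record in merged:
    print(record)
```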
So start small and walk your way up the ladder by continuously
practicing and developing your skills.
STARTING PROGRAMMING AS A NOVICE
As a novice to programming, you can start simple. The journey of a
thousand miles begins with the first step.
The most common question asked by anybody new to computer
programming is “What language is the best to start with?” Many people
will tell you to jump straight into it by learning a more advanced language
such as C++ or Java, others will tell you to start with a more dated language
such as C. In my personal opinion, the best programming language to begin
learning is Visual Basic .NET. VB.NET is a really good language to learn
for a beginner because it requires no previous experience in programming.
The syntax used in VB.NET is simple and very easy to understand.
Learning Visual Basic will give you a basic understanding of how computer
programming works and is also really entertaining! Although VB.NET is a
good place to start, I would not recommend using it for too long. More
advanced languages have a more advanced syntax and spending all of your
time using VB.NET could make it harder to move onto the more advanced
languages in the future.
Although every programming language has a different syntax, most
programming languages are similar. The first language that you learn will
be the hardest language that you learn because the concepts will be new to
you. After learning your first language, you will have an understanding of
how computer programming works and that will help you a lot when it
comes to learning other languages. If you choose a language such as C++
with a more complicated syntax then it is going to be very confusing and
hard for you to understand if you do not have any prior experience. The first
language that you choose to learn is completely your choice, but we
strongly recommend that you begin with VB.NET.
(HowTo)
As the excerpt from the Howtostartprogramming.com article suggests, the
first step to programming may entail choosing a language and then writing
a simple “Hello, World!” program. It’s that simple.
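The excerpt recommends VB.NET; purely for consistency with the other examples here, this is the same first program in Python, wrapped in a small function so the message can be checked.

```python
def greeting():
    """Return the classic first-program message."""
    return "Hello, World!"


print(greeting())
```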
From there you can progress to understanding more complex concepts,
such as language syntax, operators, variables and assignments, data types,
flow controls, arrays and iterators, etc.
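A minimal Python sketch touching several of those concepts at once: variables and assignment, a few data types, operators, flow control, a list (Python’s everyday array-like type), and iteration with a `for` loop. The order amounts are made up for illustration.

```python
# Variables and assignment; the values have different data types
threshold = 100                # int
label = "large"                # str
orders = [40, 120, 75, 300]    # list, Python's everyday array-like type

large_count = 0
for amount in orders:          # iterate over every element of the list
    if amount >= threshold:    # comparison operator inside flow control
        large_count += 1       # arithmetic and assignment in one operator

print(f"{large_count} {label} orders out of {len(orders)}")
```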
With the simpler concepts mastered, depending on the programming
language, you can then progress to other concepts such as classes, objects,
methods, instances and instantiation. Eventually you can move on to more
advanced concepts like threads, concurrency, etc.
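And a sketch of the object-oriented concepts: a hypothetical `Dataset` class, a `mean` method, and two separate instances created from it.

```python
class Dataset:
    """A hypothetical class that holds a name and a list of numbers."""

    def __init__(self, name, values):
        # Instance attributes, set when an object is created (instantiation)
        self.name = name
        self.values = list(values)

    def mean(self):
        """A method: behavior shared by every Dataset object."""
        if not self.values:
            return 0.0
        return sum(self.values) / len(self.values)


# Two separate instances (objects) of the same class
q1 = Dataset("Q1 sales", [100, 200, 300])
q2 = Dataset("Q2 sales", [50, 150])
print(q1.name, q1.mean())
print(q2.name, q2.mean())
```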
Practice Makes Perfect
I must acknowledge that getting into the programming game can prove to
be a challenge and poses a serious learning curve for non-programmers.
But, I would encourage anyone looking to take that step to not be
intimidated by the process.
The one important thing I’ve come to realize is that when learning to
program, as with any other thing we learn in life, we don’t start off by being
experts.
It takes practice, courage, determination and then some more practice in
order to succeed. I wish I could say otherwise, but there is simply no way
around the practice part. So, go out and start practicing.
Programming Success
o Get interested in programming, and do some because it is fun. Make
sure that it keeps being enough fun so that you will be willing to put in
your ten years/10,000 hours.
o Program. The best kind of learning is learning by doing. To put it
more technically, "the maximal level of performance for individuals in
a given domain is not attained automatically as a function of extended
experience, but the level of performance can be increased even by
highly experienced individuals as a result of deliberate efforts to
improve." (p. 366) and "the most effective learning requires a well-
defined task with an appropriate difficulty level for the particular
individual, informative feedback, and opportunities for repetition and
corrections of errors." (p. 20-21) The book Cognition in Practice:
Mind, Mathematics, and Culture in Everyday Life is an interesting
reference for this viewpoint.
o Talk with other programmers; read other programs. This is more
important than any book or training course.
o If you want, put in four years at a college (or more at a graduate
school). This will give you access to some jobs that require credentials,
and it will give you a deeper understanding of the field, but if you don't
enjoy school, you can (with some dedication) get similar experience on
your own or on the job. In any case, book learning alone won't be
enough. "Computer science education cannot make anybody an expert
programmer any more than studying brushes and pigment can make
somebody an expert painter" says Eric Raymond, author of The New
Hacker's Dictionary. One of the best programmers I ever hired had
only a High School degree; he's produced a lot of great software, has
his own news group, and made enough in stock options to buy his own
nightclub.
o Work on projects with other programmers. Be the best programmer on
some projects; be the worst on some others. When you're the best, you
get to test your abilities to lead a project, and to inspire others with
your vision. When you're the worst, you learn what the masters do, and
you learn what they don't like to do (because they make you do it for
them).
o Work on projects after other programmers. Understand a program
written by someone else. See what it takes to understand and fix it
when the original programmers are not around. Think about how to
design your programs to make it easier for those who will maintain
them after you.
o Learn at least a half dozen programming languages. Include one
language that emphasizes class abstractions (like Java or C++), one
that emphasizes functional abstraction (like Lisp or ML or Haskell),
one that supports syntactic abstraction (like Lisp), one that supports
declarative specifications (like Prolog or C++ templates), and one that
emphasizes parallelism (like Clojure or Go).
(Norvig)
MOTIVATION
Looking back 20 years or so, cassette tapes, Walkmans and floppy disks
were the norm, and every cool kid on the block wanted to own one. But
these once “cool” technologies of the 1980s and 1990s bear almost no
resemblance to what we have today.
In the same way, our jobs and organizations today probably bear little
resemblance to those of that time. Now play that forward and look 20 years
ahead. One thing we can guarantee is that things will have changed and will
not be the same as they are today.
In that scenario, new gadgets will have sprung into existence, companies
will have upgraded their platforms and tools, and the way we do business
and interact with each other will have changed.
Dealing with all of this change can be daunting. Yet being able to do so is
vital to your career as a successful Data Ninja.
HYPE CYCLE FOR EMERGING TECHNOLOGIES
Making predictions about the future can be hard, especially when changes
happen almost daily. In the industry, Gartner’s Hype Cycle is one of the
best-known tools for predicting which technologies will take off or flop in
the coming years. Mind you, it is only a prediction, not a declaration.
Over the more than 10 years it has been published, Gartner has added a
comprehensive range of hype cycles covering technology applications such
as e-commerce, CRM, ERP and Business Intelligence. (Many of its
predictions are available only to subscribers, but Gartner does share some
of the broader hype cycles through its blog and press releases.)
Figure: Gartner’s Hype Cycle for Emerging Technologies Maps the Journey to
Digital Business (themes shown: Computing Everywhere, The Internet of
Things (IoT), Big Data)
REVOLUTIONARY CHANGES
Revolutionary changes, on the other hand, come much less often, and when
they do, they fundamentally change the affected industries and rock them
from top to bottom.
Think about inventions such as paper currency, the light bulb, the
automobile, antibiotics, the transistor, the microprocessor and more.
Without these inventions, we probably would not be where we are today as a
society.
Even in the world of data, we can see such changes on the horizon in how
we collect, store and process data.
Today in the mainstream, we think in terms of databases, with tables,
rows and columns, when storing data. Tomorrow, the concept of databases
as we know it may change, replaced by entirely new constructs that force us
all to set aside what we previously knew and adopt the new ones.
Tools like graph databases, NoSQL, Hadoop, Spark and many other
products are all standing on the cutting edge, promising to fundamentally
change the field of data analytics as we know it.
Whether these technologies will actually live up to their promises and the
hype surrounding them is not a prediction I’m comfortable making.
But one way or the other, things will change, including the tools and
techniques we currently use for data analysis.
SUMMARY
Change is constantly around us. Sometimes it is so minute and consistent
that we do not notice it, while other times it is so severe and sudden that it
bowls us over.
As a line commonly attributed to Charles Darwin puts it (the words do not
actually appear in his book On the Origin of Species):
It is not the strongest of the species that survives, nor the most intelligent
that survives. It is the one that is most adaptable to change.
--commonly attributed to Charles Darwin
The field of data analytics is constantly changing, and no one (including
me) can predict what tools or processes will be used 5, 10 or 20 years
from now. At most, we can speculate.
But the inevitability of change should not paralyze us or prevent us from
being effective data ninjas. Instead, it should make us prepared and ready
to adapt to whatever new technologies, tools or practices the future throws
at us.
As a data ninja, if you can learn to stay relatively unaffected by change,
handle new technological developments with confidence, and adapt to any
curve balls that come your way, you will stand the test of time and grow
more smoothly in your career.
Ninja Tip 12.
In his classic work The Art of War, the military strategist Sun Tzu wrote
about the importance of observing signs of the enemy.
He wrote that movement among the trees of a forest indicates an advancing
enemy, and that dust rising in a high column signals the approach of
chariots.
By the same token, it is important to find and pay attention to the vital
signs in your career. They can help you know when, and when not, to make
critical career decisions.
CHANGE AND ADAPTATION - NINJA RESOURCES
The tech world is a fast-paced, ever-changing, living organism. New
advancements can make a whole field obsolete overnight. Startup
companies are popping up everywhere, offering a vast array of products,
services and technologies. It is crucial to stay current on how things are
changing in order to keep yourself competitive and relevant.
How Technology Is Transforming Our Brains
Published 2013
http://www.digitaltonto.com/2013/how-technology-is-transforming-our-brains/
Top 5 Reasons why software professionals need social skills, too
Published 2011
http://www.computertrainingschools.com/articles/importance-of-social-skills-for-tech-professionals.html
10 highly valued soft skills for IT pros
Published 2013
http://www.techrepublic.com/blog/10-things/10-highly-valued-soft-skills-for-it-pros/
7 Simple Ways to Stay Current on Technology
Published 2012
http://allthingsadmin.com/administrative-professionals/stay-current-technology/
8 Ways to Advance Your Career by Staying Relevant
Published 2012
http://www.cio.com/article/2448966/careers-staffing/8-ways-to-advance-your-it-career-by-staying-relevant.html
6 Ways to stay Current in Your Field and Advance
Published 2010
http://www.personalbrandingblog.com/6-ways-to-stay-current-in-your-field-and-advance/
As you can see, some of these articles are a few years old, which in the
tech world could mean they are obsolete. However, they were selected with
care to ensure they remain relevant.
To stay informed and be able to evolve in the field of data analytics, you
need to keep reading and learning new concepts and techniques. Below is a
list of my favorite tech blogs and websites, which can be a valuable source
of news and updates on technology, especially on general trends within the
industry. I have tried to include only sites with a stable revenue and user
base to ensure that they will be around for years to come.
ZDNet
ZDNet was founded in 1991 and acquired by CNET in 2000. ZDNet
publishes product reviews, software downloads, news, analysis, and
guides.
Link: http://www.zdnet.com/
GIGAOM
Gigaom was created in 2006 by Om Malik. It devotes its efforts to
finding the newest and best in tech, with news and analysis on Web 2.0,
startups, gaming, social media and everything else tech. With over 6.5
million unique visitors every month, Gigaom is trying to humanize
technology and make it approachable for everyone.
Link: https://gigaom.com/
Mashable
Mashable reports on the importance of digital innovation. With over 42
million unique visitors monthly, Mashable is truly a powerhouse. It
reports on everything tech: social media, entertainment, news, startups and
anything else techies are talking about.
Link: http://mashable.com/
Wired
Wired is a full-color monthly magazine based in the United States. It
reports on emerging technologies, economics and politics. The magazine is
full of interesting, thought-provoking articles that will inspire and amaze.
The website offers free articles and news and covers almost everything a
tech-savvy person could care about. Subscribe to the magazine and you’ll
learn something new every time you pick it up.
Link: http://www.wired.com/
TechCrunch
TechCrunch is one of my favorite tech sites to visit on a daily basis. Like
Wired, it covers almost everything you’ll need to stay current. It does not
limit itself to tech; it also delves into politics and worldwide news.
Founded in 2005, it has grown immensely.
Link: http://techcrunch.com/
DataTau
This is the most technical and data-oriented of all the resources listed in
this nugget. DataTau is like Hacker News for data science. Its simple
interface makes it look like just a list of articles for big data
practitioners and data scientists, but the quality of the content is great.
Link: http://www.datatau.com
This list is just the tip of the iceberg of tech-related blogs and sites. If
you visit any of the sites above, you will discover more blogs and
affiliates that expand your knowledge. Keep searching and exploring new
technologies and new skills, adding to the skills you’ve gained from
reading this book.
KEY NINJA LESSONS
Change requires flexibility. The better able you are to adapt to change,
the greater your chances of being successful.
“Enjoying success requires the ability to adapt. Only by being open to
change will you have a true opportunity to get the most from your
talent.” --Nolan Ryan.
Stay curious and adapt. Change is the only thing that will remain
constant.
CONCLUSION
As we have seen throughout this book, data volumes continue to grow at
mind-blowing rates, the demand for data-crunching professionals is through
the roof, the pay for quality talent is astonishingly lucrative, and the
barriers to entry for beginners are remarkably low. So, what are you
waiting for? The choice to get in and get started is yours.
Many companies increasingly depend on their Data Analysts to crunch
numbers, but they depend even more on Data Ninja-type professionals, like
you, who can go beyond the basics of crunching numbers and understand the
subtle nuances of the trade.
These are the ninjas who can see the big picture and perform analysis or
make predictions in ways that positively affect their companies’ bottom
lines.
Stuff to Blow Your Mind
To conclude, I will leave you with a few real-world stories of
high-performing companies that are making the most of the data boom.
The excerpts below describe companies that are leveraging the tremendous
power of data analytics and improving their bottom lines in the process.
The point is to have you picture yourself as the data ninja in charge (or
one of the data ninjas in charge) who helped make that happen. Then
consider what that would mean for your career, your ambitions, your goals
and, above all, your pocketbook or take-home paycheck.
IBM
IBM’s work has revealed genetic traits of cancer survivors and tracked the
source of an E. coli outbreak. It recently created a visualization to help the
influential Washington, D.C.–based think tank, Institute for the Study of
War, map terrorist behavior in and around Baghdad during a campaign to
free imprisoned Al Qaeda members.
THE WEATHER COMPANY
By analyzing the behavior patterns of its digital and mobile users in 3
million locations worldwide—along with the unique climate data in each
locale—the Weather Company has become an advertising powerhouse,
letting shampoo brands, for example, target users in a humid climate with a
new antifrizz product. It’s no surprise that more than half of the Weather
Company’s ad revenue is now generated from its digital operations.
EVOLV
Evolv’s data scientists have uncovered that people with two social media
accounts perform much better than those with more or fewer, and that in
many careers, such as call-center work, employees with criminal
backgrounds perform better than those with squeaky-clean records. Evolv’s
sales grew a whopping 150% from Q3 2012 to Q3 2013.
GE
Over the past year, General Electric has taken the lead in tying together
what Chairman Jeff Immelt calls "the physical and analytical worlds."
Translation: GE's many machines—everything from power plants to
locomotives to hospital equipment—now pump out data about how they're
operating. GE's analytics team crunches it, then rejiggers machines to be
more efficient. Even tiny improvements are substantial, given the scale: By
GE's estimates, data can boost productivity in the U.S. by 1.5%, which over
a 20-year period could save enough cash to raise average national incomes
by as much as 30%.
(FastCompany)
Acronyms
ACID
Atomicity, Consistency, Isolation and Durability
BI
Business Intelligence
CDA
Confirmatory Data Analysis
CRM
Customer Relationship Management
CSV
Comma-separated Values
DBA
Database Administrator
DCL
Data Control Language
DDL
Data Definition Language
DIKW
Data Information Knowledge Wisdom
DML
Data Manipulation Language
DW
Data Warehouse
EDA
Exploratory Data Analysis
ERM
Entity Relationship Models
ETL
Extract Transform Load
HDFS
Hadoop Distributed File System
KPI
Key Performance Indicator
LOB
Line of Business
NoSQL
Not Only SQL
ODS
Operational Data Store
OLAP
Online Analytical Processing
OLTP
Online Transaction Processing
RDBMS
Relational Database Management System
SAS
Statistical Analysis System
SQL
Structured Query Language
TCL
Transaction Control Language
XML
Extensible Markup Language
Glossary of Definitions
Data Ninja
A Data Ninja is an entry-level, unspecialized, entrepreneurial individual who
works within a structured environment (usually within a company or team),
employs a variety of tools and performs a variety of tasks related to
collecting, organizing and interpreting data to gain useful information.
Data Analysis
Data analysis is the process of finding the right data to answer your
question, understanding the processes underlying the data, discovering the
important patterns in the data, and then communicating your results to have
the biggest possible impact.
Big Data
Big data is an evolving term that describes any voluminous amount of
structured, semi-structured and unstructured data that has the potential to be
mined for information. Although big data doesn't refer to any specific
quantity, the term is often used when speaking about petabytes and exabytes
of data.
DIKW Pyramid
The DIKW Pyramid, also known variously as the "DIKW Hierarchy", is a
model used for representing structure and functional relationships between
data, information, knowledge, and wisdom. In the model, information is
defined in terms of data, knowledge in terms of information, and wisdom in
terms of knowledge.
Database
A database (abbreviated DB) is a collection of information that is organized
so that it can easily be accessed, managed, and updated. A database
basically helps to organize a collection of information in such a way that a
computer program can quickly select desired pieces of data. Many datasets
in companies are stored and manipulated in databases.
Data Warehousing
In computing, a data warehouse (DW or DWH), also known as an
enterprise data warehouse (EDW), is a system used for reporting and data
analysis. DWs are central repositories of integrated data from one or more
disparate sources.
Dimensional Modelling
Dimensional modeling (DM) names a set of techniques and concepts used
in data warehouse design. A typical dimensional model consists of fact
tables and dimension (lookup) tables.
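To make the fact/lookup split concrete, here is a minimal sketch of a star schema built with Python’s built-in sqlite3 module in an in-memory database. The table names (fact_sales, dim_product, dim_date), columns and rows are all hypothetical; a production warehouse would add surrogate-key management, more dimensions and load processes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension (lookup) tables hold descriptive attributes (hypothetical names)
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT)")

# The fact table holds numeric measures plus foreign keys pointing at the dimensions
cur.execute("""
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        revenue     REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Coffee", "Beverages"),
    (2, "Tea", "Beverages"),
    (3, "Muffin", "Bakery"),
])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240101, "2024-01-01", "2024-01"),
    (20240201, "2024-02-01", "2024-02"),
])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 20240101, 10, 30.0),
    (2, 20240101, 5, 10.0),
    (3, 20240101, 4, 12.0),
    (1, 20240201, 8, 24.0),
    (3, 20240201, 6, 18.0),
])

# A typical analytical query: join facts to dimensions, then group by their attributes
rows = cur.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date    AS d ON f.date_key = d.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for category, month, total in rows:
    print(category, month, total)

conn.close()
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table is what makes “revenue by category by month” a simple join-and-group query.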
Predictive Analytics
Predictive analytics is the practice of extracting information from existing
data sets in order to determine patterns and predict future possibilities and
trends. It doesn't dictate the future, but it forecasts what might happen in the
future with an acceptable level of reliability, and includes what-if scenarios
and risk assessment.
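One of the simplest forms of this idea is fitting a straight-line trend to past values and extending it forward. The sketch below does that with ordinary least squares in plain Python. The function names and the monthly sales figures are hypothetical, and real predictive models usually add seasonality, more variables and error estimates.

```python
def fit_linear_trend(values: list[float]) -> tuple[float, float]:
    """Fit y = intercept + slope * t by ordinary least squares, where t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(values) / n
    # Slope = covariance(t, y) / variance(t)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(values: list[float], periods_ahead: int = 1) -> float:
    """Extend the fitted trend line `periods_ahead` steps past the last observation."""
    intercept, slope = fit_linear_trend(values)
    next_t = len(values) - 1 + periods_ahead
    return intercept + slope * next_t


# Hypothetical monthly sales figures
monthly_sales = [100.0, 112.0, 118.0, 131.0]
intercept, slope = fit_linear_trend(monthly_sales)
next_month = forecast(monthly_sales)
print(f"Trend: {slope:.1f} per month; next-month forecast: {next_month:.1f}")
```

Changing periods_ahead gives a crude what-if view of where the trend would lead. As the definition above stresses, such a forecast does not dictate the future; it only extends the pattern in the existing data.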
Programming
Programming is the process of taking an algorithm and encoding it into a
notation, a programming language, so that it can be executed by a computer.
Programming involves activities such as analysis, developing
understanding, generating algorithms, verification of requirements of
algorithms including their correctness and resource consumption, and
implementation (commonly referred to as coding) of algorithms in a target
programming language. Programming can be done in many different
programming languages (such as C, FORTRAN, JavaScript, Lisp, Python,
Ruby, Smalltalk, etc.)
Relevant Quotes Glossary
Fru Nde is a Data Professional who is very passionate about the ROI that
companies can realize by effectively using their data assets.
As a practicing Data Ninja, Fru uses his Ninja Skills to help companies
make sense of data; be it moving, storing, or analyzing data.
He has been battle-tested, with extensive experience both consulting for
and working full-time with corporations across the globe, including several
Fortune 500 companies in many different core industries: Retail, Banking,
Health Services, and Food & Hospitality.
Fru is also the founder of NextGen Global (NGG), LLC a boutique
solutions firm that conducts research and provides consulting and advisory
services to organizations large and small.
NGG’s mandate and number one value is to provide solid solutions,
techniques and services that enable companies to utilize all of their
available assets (people, process and tools), to run, grow and transform their
businesses.