Essential Data Science Skills That Need To Be Mastered


Essential Data Science Skills that need to be mastered:

 Programming

 Statistics

 Machine Learning

 Linear Algebra and Calculus

 Data Visualization

 Communication

 Data Wrangling

 Software Engineering

 Data Intuition

Programming

Programming skills are required no matter which role or company you’re interviewing for; you’ll
probably be presumed to know how to use the tools of the trade. This means a database
querying language like SQL and a statistical programming language like Python or R.

Statistics

Good knowledge of statistics is vital for a data scientist. You need to be familiar with distributions,
statistical tests, maximum likelihood estimators, etc. Statistics and math are essential at all company
types, but especially at data-driven enterprises where stakeholders will rely on your support to make
design decisions and evaluate experiments.

Machine Learning

If you work at a large company with huge amounts of data, or at a company where the product itself
is data-driven (e.g., Google Maps, Netflix, Uber), you may be expected to
already be familiar with machine learning methods. This can mean things like ensemble methods,
random forests, k-nearest neighbors, etc. Many of these techniques can be implemented
using Python and R libraries.
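
As a rough illustration of one of the methods named above, k-nearest neighbors can be sketched in a few lines of plain Python. The training points, the labels, and the function name knn_predict are made up for this example; in real work you would more likely reach for an established library than hand-rolled code.

```python
import math
from collections import Counter

# Hypothetical training data: (features, label) pairs in two clusters.
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query (Python 3.8+).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (1.1, 1.0)))  # a
print(knn_predict(TRAIN, (5.1, 5.0)))  # b
```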

Linear Algebra and Calculus

Grasping these concepts is crucial for companies where the data define the essence of the product, and
algorithm optimization or small enhancements in predictive performance can lead to the success of the
company. When you interview for a role in data science, your interviewer may ask you some
fundamental linear algebra or multivariable calculus questions. Or you may be asked to derive some of
the statistics or machine learning results you use elsewhere.

Data Visualization

Images often communicate more efficiently than either words or numbers, so presenting data in a
visually engaging way makes a data scientist more effective. This requires you not only to familiarize
yourself with the principles of visualizing data effectively but also to master data visualization tools.

Communication

Data scientists must be able to report technical findings in a way that is
comprehensible to non-technical colleagues, whether they are peers or corner-office executives in
the marketing department. Make your data-driven story not merely understandable but convincing,
and you could persuade your manager to give you a raise.

Data Wrangling

Data wrangling, also called data munging, is the process of mapping and transforming data from
a raw form into another format. Usually, the data you analyze is messy and challenging to work
with. Some of the imperfections in data include inconsistent string formatting, missing
values, and inconsistent date formatting. This is especially crucial at small companies where you’re an
early data hire.
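
The three kinds of imperfection mentioned above can be sketched with a small cleanup routine. This is a minimal example using only the standard library; the sample records, the field names, and the list of date layouts are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical raw records showing messy strings, mixed dates, missing values.
RAW_ROWS = [
    {"city": "  new york ", "signup": "2019-03-18", "spend": "120.5"},
    {"city": "New York", "signup": "03/18/2019", "spend": ""},
    {"city": "BOSTON", "signup": "18 Mar 2019", "spend": "80"},
]

# Date layouts assumed to occur in the raw data; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def parse_date(text):
    """Try each known layout and return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Normalize one record: tidy strings, parse dates, mark missing values."""
    spend = row["spend"].strip()
    return {
        "city": " ".join(row["city"].split()).title(),
        "signup": parse_date(row["signup"]),
        "spend": float(spend) if spend else None,
    }


CLEAN_ROWS = [clean_row(r) for r in RAW_ROWS]
for cleaned in CLEAN_ROWS:
    print(cleaned)
```

Missing values are kept as None here rather than filled in, since the right way to fill them depends on the analysis.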

Software Engineering

If you’re interviewing at a smaller company and are one of the first hires in data
science, it is important to have a strong software engineering background. You’ll be responsible for
handling a lot of data logging, and potentially the development of data-driven products.

Data Intuition

Companies expect to see that you’re a data-driven problem-solver. At some point
during the interview process, you’ll likely be asked about some high-level problem, for example
about a data-driven product the company may want to develop or a test it may want to run.
It’s crucial to consider what things are important and what things aren’t.

Conclusion

These are the skills that will lead to a successful data science career. It’s a fantastic time to
advance in this field, as there will be a need for many data scientists in the near future.

Data Scientist Skills List and Examples


BY ALISON DOYLE

Updated March 18, 2019

“Data scientist” is a broad term that can refer to a number of types of careers.
Generally, a data scientist analyzes data to learn about scientific processes. Some job
titles in data science include data analyst, data engineer, computer and information
research scientist, operations research analyst, and computer systems analyst.

Data scientists work in a variety of industries, ranging from tech to medicine to
government agencies. The qualifications for a job in data science vary because the title
is so broad. However, there are certain skills employers look for in almost every data
scientist. Data scientists need strong statistical, analytical, and reporting skills.

Here's a list of data scientist skills for resumes, cover letters, job applications, and
interviews. Included is a detailed list of the five most important data scientist skillsets,
accompanied by lists of related skills and work responsibilities.

Tips for Using a Skills List

A key part of creating a resume and cover letter that gets noticed by employers is to
incorporate as many job-specific keywords and keyword phrases as possible. This is
because employing companies now frequently use automated applicant tracking
systems (ATS systems) to provide first-stage analysis of the job applications they
receive. The more keywords your resume contains, the more likely it is to pass the first
cut by the ATS system and, ultimately, reach the human eye of a hiring manager.

The terms listed here are among the most frequently-sought keywords programmed into
ATS systems and utilized in job advertisements for data scientists. Thus, you should try
to incorporate many of these keyword phrases into your resume – in an initial summary
of qualifications, in your work history section, and in a tech table describing your
hardware and software skills.
You should also describe your command of the most important of these skills in
your cover letter and, eventually, during your personal interviews. Be sure to enhance
these descriptions with specific examples of how you have utilized each skill in a work
or training setting.

Your best guide as to which of these keywords you should include is the job description
to which you are applying. Each job you apply to will require different skills and
experiences, so make sure you read the job description carefully and focus on the skills
listed by the employer, tailoring each resume and cover letter you submit to the
qualifications requested by different employers.

Top Five Data Scientist Skills

Analytical
Perhaps the most important skill for a data scientist is to be able to analyze information.
Data scientists have to look at, and make sense of, large swaths of data. They have to
be able to see patterns and trends in the data and explain those patterns. All of this
takes strong analytical skills.

 Analytical Tools
 Analytics
 Big Data
 Constructing Predictive Models
 Creating Controls to Assure Accuracy of Data
 Critical Thinking
 Data
 Data Analysis
 Data Analytics
 Data Manipulation
 Data Wrangling
 Data Science Tools / Data Tools
 Data Mining
 Evaluating New Analytical Methodologies
 Interpreting Data
 Metrics
 Mining Social Media Data
 Modeling Data
 Modeling Tools
 Producing Data Visualizations
 Research
 Risk Modeling
 Testing Hypotheses

Creativity
Being a good data scientist also means being creative. Firstly, you have to use creativity
to spot trends in data. Secondly, you need to make connections between data that
might seem unrelated. This takes a lot of creative thinking. Finally, you need to explain
this data in ways that are clear to the executives at your company. This often requires
creative analogies and explanations.

 Adaptability
 Conveying Technical Information to Non-Technical People
 Decision Making
 Decision Trees
 Executing in a Fast-Paced Environment
 Logical Thinking
 Problem Solving
 Working Independently

Communication
Data scientists not only have to analyze data, but they also have to explain that data to
others. They must be able to communicate data to people, explain the importance of
patterns in the data, and suggest solutions. This involves explaining complex technical
issues in a way that is easy to understand. Often, communicating data requires visual,
oral, and written communication skills.
 Assertiveness
 Collaboration
 Consulting
 Cultivating Relationships with Internal and External Stakeholders
 Customer Service
 Documenting
 Drawing Consensus
 Facilitating Meetings
 Leadership
 Mentoring
 Presentation
 Project Management
 Project Management Methodologies
 Project Timelines
 Providing Guidelines to IT Professionals
 Reporting
 Supervisory Skills
 Training
 Verbal Communications
 Writing

Mathematics
While soft skills like analysis, creativity, and communication are important, hard
skills are also critical to the job. A data scientist needs math skills, particularly in
multivariable calculus and linear algebra.

 Algorithms
 Creating Algorithms
 Information Retrieval Data Sets
 Linear Algebra
 Machine Learning Models
 Machine Learning Techniques
 Multivariable Calculus
 Statistics
 Statistical Learning Models
 Statistical Modeling

Programming and Technical Proficiencies
Data scientists require basic computer skills, but programming skills are particularly
important. Being able to code is critical to almost any data scientist position. Knowledge
of programming languages such as Java, R, Python, or SQL is essential.

 AppEngine
 Amazon Web Services (AWS)
 C++
 Computer Skills
 CouchDB
 js
 ECL
 Flare
 Google Visualization API
 Hadoop
 HBase
 Java
 Matlab
 Microsoft Excel
 Perl
 PowerPoint
 Python
 R
 js
 Reporting Tool Software
 SAS
 Scripting Languages
 SQL
 Tableau

Job Outlook for Data Scientists

According to the Bureau of Labor Statistics, 27,900 people were employed as computer
and information research scientists in 2016; their median annual wage in 2017 was
$114,520. Career opportunities in this field are anticipated to grow 19 percent by 2026,
much faster than average.

Role of a Data Scientist


Sequoia
Apr 17

Building a data-informed product requires a high-functioning data
organization, which in turn requires highly effective data
professionals. These are the people who will build and shape the
company’s data culture, which informs product strategy and
shapes key decisions across the company.

The composition of a data organization varies based on the
maturity of the company and data team’s mandate, but it generally
includes some combination of data engineers, infrastructure
architects, machine-learning engineers, product analysts, and data
scientists.

In this post, we will cover the different roles that data scientists fill
and the skills required to do each job well. We will also cover some
of the common pitfalls and myths associated with hiring data
scientists. We will cover other roles, such as data engineers, and
the skill sets they require in future posts.

THE FIVE CORE SKILLS


There are five core skills and abilities that all good data scientists
need. They should be able to deconstruct and identify the
components of complex business problems, they should have the
technical skills to extract and manipulate data, they must possess
an analytical ability that enables them to extract value from the
data, they must be able to clearly synthesize the results of their
analysis, and finally, they must be able to communicate those
results in a way that influences decisions. Here’s how these core
skills are connected:

Let’s take a closer look at each skill.

1. Problem formulation. Data scientists must be able to
formulate and structure problems. This requires
deconstructing complex business problems into their
constituent pieces by asking the right business questions. Much
of the process behind asking questions requires curiosity,
which leads to hypothesis generation. Next, these business
questions can be posed as a set of technical problems.
2. Technical ability. Once business questions have been posed
as technical problems, technical skills like coding, statistics and
quantitative abilities are needed to extract data. This process
may be iterative, as some of the questions asked cannot be
answered for various reasons, including unavailability of
data.
3. Analytical ability. Once all the data is in place, data scientists
need analytical skills to extract and manipulate data sets, and
to extract value from the data in the form of tables, charts, etc.
4. Synthesis. Although outputs from data analyses are numbers
and figures, data scientists need to connect all of the
information their analysis has produced back to their original
problem formulation questions. They need to interpret the
results, simplify and synthesize. The output at this stage is
simplified to the fewest possible images, tables, and numbers.
5. Influence. Connecting the business problems to specific
actionable insights (decisions) and influencing these decisions
by storytelling is important for creating impact. A compelling
story can be told orally, in writing, or through a combination of the
two. Being able to tell the entire story succinctly (a one-pager),
note what really matters (an executive summary), and clearly
articulate outcomes rather than inputs are all important
skills.

The skills required for the generalist will suffice for most types of
problems. However, there are specific types of analytical problems
that may require some specialization. Even for specialists, the
roles and responsibilities differ only slightly from those of the
generalist, and in most cases an emphasis on some skills over
others is all that is needed.

DATA SCIENTIST ROLES AND RESPONSIBILITIES
The role of a data scientist depends on the type and maturity of the
product they work on. In the early stages of product development,
all data scientists have similar functions, and they are primarily
focused on setting up the computational and analytical
infrastructure. As the product evolves, data scientists’ roles change
depending on the needs of the product team.

Generally speaking, data scientists fall into six categories:

1. Product generalists who are generic problem solvers
working across product issues you may encounter
2. Early product analyst to determine product market fit for a
nascent product
3. Growth analyst to move a metric
4. Core marketplace analyst to ensure healthy liquidity on
your platform
5. Ecosystem analyst to identify competitive threats and
strategic opportunities
6. Machine-learning analyst to ensure healthy operation of
the algorithms that power your product

Product Generalist
Unsurprisingly, product generalists are the most frequently hired
data scientists because their broad skill sets enable them to take
on a wide range of functions and problems. The primary focus of
product generalists is to inform, influence, support, and execute
product decisions. At a high level, they help set goals, roadmap,
and strategy for products, and execute on product operations.
More specifically, product generalists do the following:

Define product success by determining and evaluating key
metrics and goals for the product team.

Monitor product health by building dashboards and reports,
understanding root causes of changes in metrics, and proposing courses
of action. This includes:

 Ensuring that the right metrics are tracked (e.g., measurement
of users, messages, posts, purchases).
 Building a robust infrastructure to support data analysis.
 Ensuring data integrity by verifying that raw and derived fields
are accurate so that metrics are correctly counted.
 Monitoring key performance indicators (KPIs) via dashboards or
other reporting tools.
 Diagnosing issues and proposing solutions, including with respect
to setting targets, forecasting, and investigating anomalies, as
well as understanding drivers of metric changes and diagnosing
the underlying causes of those changes. (Are they behavioral
changes, mix shift, data issues, product changes, or related to
seasonality?)
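
One way to tell a mix shift apart from a genuine behavioral change is to split the change in a blended metric into a mix effect and a rate effect. The sketch below assumes a conversion-style metric, two hypothetical segments, and made-up numbers; the function name decompose_change is illustrative only.

```python
def decompose_change(before, after):
    """Split the change in a blended rate into mix and rate effects.

    Each argument maps segment -> (share_of_traffic, segment_rate).
    The two effects add up to the total change in the blended rate.
    """
    # Mix effect: traffic shares moved while segment rates stayed fixed.
    mix_effect = sum(
        (after[seg][0] - before[seg][0]) * before[seg][1] for seg in before
    )
    # Rate effect: segment rates moved, weighted by the new shares.
    rate_effect = sum(
        after[seg][0] * (after[seg][1] - before[seg][1]) for seg in before
    )
    return mix_effect, rate_effect


# Hypothetical data: traffic moves toward mobile, per-segment rates unchanged.
BEFORE = {"desktop": (0.6, 0.10), "mobile": (0.4, 0.05)}
AFTER = {"desktop": (0.4, 0.10), "mobile": (0.6, 0.05)}

MIX, RATE = decompose_change(BEFORE, AFTER)
print(f"mix effect: {MIX:+.3f}, rate effect: {RATE:+.3f}")
```

In this made-up case the blended rate falls from 8% to 7% even though neither segment converts worse, so the entire drop is attributed to mix shift.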

Design, evaluate, and ship experiments and products.

 Work closely with marketing, design, product, and engineering
teams to design the right experiments and quantify the impact
of existing product features and future changes (A/B testing),
and then make recommendations based on the findings.
 Work with the data engineering team to develop and
implement new analytical tools and modules, and to scale
analytics. Help build product roadmaps in partnership with the
data engineering and data infrastructure teams.
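
To make the A/B testing step concrete, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between a control and a variant. It is not necessarily what any particular team uses, and the visitor and conversion counts below are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 users per arm.
Z, P_VALUE = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {Z:.2f}, p = {P_VALUE:.3f}")
```

A small p-value suggests the difference is unlikely to be noise alone, but the recommendation should still weigh effect size and business cost.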

Set product roadmaps and strategy

 Build key data sets to empower exploratory analysis that helps
set product roadmaps.
 Run exploratory analyses (analyze and interpret trends or
patterns to develop a thorough understanding of products,
users, and acquisition channels) to uncover issues and new
areas of opportunity, generate hypotheses, and prioritize
product changes and improvements.
 Influence product teams by presenting data-based
recommendations.

Early Product Analyst


The primary focus of early product analysts is on identifying
whether there is product market fit and, if so, the
characteristics of the users who love the
product. The key to leveraging the expertise of early product
analysts is to build the right infrastructure so that they can answer
these questions in a scalable way.

Below are the roles and responsibilities of an early product
analyst. Many overlap with those of product generalists. The key
difference is that their focus is on defining and tracking the right
metrics and ensuring data integrity rather than on setting goals
and experimentation, which come at later stages of product
development.

 Monitor product health.
 Define product success by setting the right KPIs for the
product/business.
 Identify whether there is early product market fit through
exploratory data analysis.
 Help drive the early product roadmap by building a persona of
the ideal product user for the product team. Deeply understand
their characteristics through behavioral analysis. Generally,
much of the analysis at this stage is bottom-up rather than top-
down since there is far less data to perform top-down user
segmentation.

An early product analyst should have enough technical proficiency
to understand the basics of data pipelines, storage, and software
engineering. Some also strive to automate their analyses and data
pipelines, creating enduring value from their work. The impact of
even the most technically proficient early product analyst is
blunted, however, without certain non-technical skills, including
the ability to ask the right questions in the context of the product,
and the ability to tie analytical results to actions — delivering not
just interesting but useful insights. This is where the early product
analyst’s skills converge with those of the product generalist.

Growth Analyst
The primary focus of growth analysts is to move metrics. These
metrics may measure data around users, developers, payers,
advertisers, content creators, or anything valuable for the
business. Ultimately, this is done by deeply understanding any
phenomena, uncovering issues and opportunities related to the
problem space, identifying key drivers of the issues, and
recommending improvements. Specifically, a growth analyst needs
to:

 Define product success and monitor the health of a product,
including identifying and tracking the right metrics as well as
building growth accounting funnels to understand conversion
and opportunities.
 Set product goals and roadmap, and optimize the product in
line with both.
 Build a data-informed culture of growth within the
company.
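
The funnel idea in the first point above can be sketched as step-by-step conversion rates between consecutive stages. The stage names and counts here are hypothetical, and funnel_conversion is an illustrative helper rather than part of any particular tool.

```python
def funnel_conversion(stages):
    """Return (step name, conversion rate) for each consecutive stage pair.

    stages is a list of (name, count) ordered from top of funnel to bottom.
    """
    report = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        report.append((f"{prev_name} -> {name}", count / prev_count))
    return report


# Hypothetical funnel counts for one week.
FUNNEL = [("visited", 1000), ("signed_up", 200), ("activated", 50)]
STEPS = funnel_conversion(FUNNEL)

for step, rate in STEPS:
    print(f"{step}: {rate:.0%}")

# The weakest step is a natural first place to look for opportunities.
WEAKEST_STEP = min(STEPS, key=lambda item: item[1])[0]
print("largest drop-off:", WEAKEST_STEP)
```
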
A growth analyst must be highly quantitative and keep a
finger on the pulse of the business by deeply understanding the
drivers of changes in the business. They also need a strong growth
marketing mindset, the ability to run exploratory analysis and
identify roadmaps, and an iterative approach that allows them to
continuously make small improvements that compound over time.
A strong growth analyst should also have knowledge of statistics
and experimental design since good growth teams have a strong
test-and-learn philosophy.

Machine-Learning Analyst
The primary role of a machine-learning analyst is to identify
opportunities to improve products through machine learning.
Their primary role is not to build models, but instead to monitor
the models’ health and recommend improvements. They do this by
identifying root causes and suggesting areas for improvement,
including data quality, adding new features, improving algorithms,
and determining the right objective functions and tradeoffs.
Specifically, they:

 Define success by proposing the right objective function. With
the wrong objective function, one would not be able to truly
reach success in the product.
 Monitor the health of a model by identifying and tracking the
right metrics, and building frameworks to conduct a root cause
analysis. A mutually exclusive collectively exhaustive (MECE)
framework for conducting gap analysis on model performance
(measuring the difference between reality and model
expectations) is valuable. The drivers of changes in model
performance can be determined and connected to their root
causes, which are generally data quality, operational efficiency,
algorithms, and feature engineering.
 Set goals and the product roadmap, often by identifying
opportunities from the root cause analysis framework.
 Improve decisions by building explainability and determining
the right tradeoffs. Despite widespread adoption, machine
learning models remain mostly black boxes. Understanding the
reasons behind predictions is valuable for transparency,
improving the predictions and making business decisions. Many
of these business decisions involve tradeoffs, typically between
exploring and exploiting or between precision and
recall. Machine-learning analysts must have a principled
approach and build explainability to determine which
tradeoffs are necessary to scale a product/company.
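
The precision and recall tradeoff mentioned above can be seen by moving the decision threshold of a classifier. The following is a small sketch with invented scores and labels; treating precision as 1.0 when nothing is predicted positive is a convention chosen here, not a universal rule.

```python
def precision_recall(scores, labels, threshold):
    """Compute (precision, recall) when scores >= threshold count as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Assumed convention: no positive predictions means precision of 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true labels (1 = positive).
SCORES = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
LABELS = [1, 1, 0, 1, 0, 0]

print(precision_recall(SCORES, LABELS, 0.35))  # precision 0.75, recall 1.0
print(precision_recall(SCORES, LABELS, 0.70))  # precision 1.0, recall about 0.67
```

Raising the threshold removes the false positive but also misses one true positive, which is the kind of tradeoff the analyst has to weigh against the product's goals.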

In addition to all of the skills required for a product generalist, a
machine-learning analyst needs to be good at statistics, machine
learning, coming up with frameworks, root cause analysis, and
optimization.

Marketplace Analyst
The primary role of marketplace analysts is to maximize the value
of the marketplace by improving its efficiency. Many consumer
technology companies can be thought of as two-sided
marketplaces. These create value primarily by enabling direct
interactions between two (or more) distinct types of affiliated
customers. Many products, including PayPal, eBay, Uber, and
YouTube, are two-sided marketplaces.

The marketplace problem can be simplified to three parts:
supply, demand, and liquidity. Marketplace analysts focus on
identifying opportunities by understanding the drivers of each.
The marketplace team would require three types of analysts.

 Growth analyst: Supply and demand can each individually be
posed as growth problems. For example, there is a metric on
the supply side (say, number of drivers) and another on the
demand side (number of riders) that a growth analyst would try
to move.
 Machine-learning analyst: The marketplace team would
also need to optimize routing using matching algorithms. Thus,
a marketplace team would also consist of machine-learning
analysts.
 Core marketplace analyst: The marketplace team would also
need a core marketplace analyst whose role is to understand
the interactions between the supply and demand sides of the
marketplace and to improve the liquidity and efficiency of the
marketplace overall.

Core marketplace analysts must be adept at the following:

 Monitoring the health of the product and defining product success by
identifying and tracking the right metric that connects
supply and demand (for example, the sell-through metric) and
setting the right goals.
 Setting product roadmaps by understanding the utilization of
supply/demand and determining areas where one or both are
constrained or under-optimized.

On top of all of the skills that product generalists should have, a
core marketplace analyst needs a deep understanding of
economics (especially supply and demand), optimization,
network effects, and marketplace dynamics.

Ecosystem Analyst
Ecosystem analysts help drive business and product strategy by
analyzing market trends and educating product leaders on their
product’s market landscapes. These market trends can be internal
(e.g., how users of an internal product are embracing mobile
more) or external (e.g., the effects of a competitor). Specifically, an
ecosystem analyst:

 Sets product roadmaps by providing key insights on market
trends, customer behaviors, and competitive moves to help
drive product roadmaps.
 Drives product strategy by: 1) Building business cases for
specific product initiatives based on a deep understanding of
how different parts of the ecosystem interact. (For example,
knowing that content production ultimately drives active usage
could lead to a recommendation to focus on content
production.) 2) Performing competitive monitoring and
analysis and identifying key opportunities by performing
market research and competitive landscape analysis and
creating relevant benchmarks.
 Drives business strategy by building the case for new business
initiatives, with a focus on market, opportunity sizing, and
product synergy. (For example, knowing that mobile usage in
teens is increasing exponentially could lead to recommending
new mobile products for teens.)
 Identifies potential M&A targets by constantly monitoring
competition and areas of strategic interest.

Good ecosystem analysts have a deep understanding of the
domain, an ability to communicate their insights effectively to
multiple cross-functional partners, and the skills to perform
market and competitive research.

TAKEAWAYS
 The role of a data scientist is to leverage insights from data
analysis to help drive product decisions.
 There are six types of data scientists: product generalists,
early product analysts, growth analysts, core marketplace
analysts, ecosystem analysts, and machine-learning analysts.
 These data scientists have five core skills: problem
formulation, technical ability, analytical ability, synthesis, and
influence.

This work is a product of Sequoia Capital’s Data Science team.
Chandra Narayanan, Hem Wadhar and Ahry Jeon wrote this
post. See the full data science series here. Please email
data-science@sequoiacap.com with questions, comments and other
feedback.
