Professional Documents
Culture Documents
Introduction To Datascience
Introduction To Datascience
Introduction to
Data Science
What is Data Science?
What is Data Science Methodology?
Getting Started with Data Science. Making Sense of Data with Analytics by Murtaza Haider.
IBM Press: Pearson plc
The data science handbook by Field Cady. Wiley 2017.
Steps to Enroll in online course #2: Data Science Methodology? Steps to Enroll:
Visit https://www.coursera.org/learn/data-science-methodology?specialization=introduction-data-science
Chapter 1: Introduction to data science
In this course, we'll unpack the what, why, and how of data science. By the end of this course, you'll have
a better grasp of how data is being used around you and how you can use data.
Data science is the field of exploring, manipulating, and analysing data, and using data to answer
questions or make recommendations.
Everyone you ask will give you a slightly different description of what Data Science is, but most people
agree that it has a significant data analysis component.
Analyze structured and unstructured data from many sources, and depending on the nature of the
problem, they can choose to analyze the data in different ways.
Using multiple models to explore the data reveals patterns and outliers; sometimes, this will
confirm what the organization suspects, but sometimes it will be completely new knowledge,
leading the organization to a new approach.
When the data has revealed its insights, the role of the data scientist becomes that of a storyteller,
communicating the results to the project stakeholders.
Chapter 1: Introduction to data science
Data scientists can use powerful data visualization tools to help stakeholders understand the
nature of the results, and the recommended action to take.
Data Science is changing
Curiosity is absolute must. If you're not curious, you would not know what to do with the data.
So, curiosity being able to take a position, strong position, and then moving forward with it.
Judgmental because if you do not have preconceived notions about things you wouldn't know
where to begin with.
Argumentative because if you can argument and if you can plead a case, at least you can start
somewhere and then you learn from data and then you modify your assumptions and hypotheses
and your data would help you learn. And you might start at the wrong point. You may say that I
thought I believed this, but now with data I know this. So, this allows you a learning process.
The other thing that the data scientist would need is some comfort and flexibility with analytics platforms:
some software, some computing platform, but that's secondary. The most important thing is curiosity and
the ability to take positions. Once you have done that, once you've analyzed, then you've got some
answers.
And that's the last thing that a data scientist need, and that is the ability to tell a story. That once you
have your analytics, once you have your tabulations, now you should be able to tell a great story from it.
Because if you don't tell a great story from it, your findings will remain hidden, remain buried, nobody
would know. But your rise to prominence is pretty much relying on your ability to tell great stories. A
starting point would be to see “what your competitive advantage is”.
Let's say you want to be a data scientist and work for an IT firm or a web-based or Internet based firm,
then you need a different set of skills.
And if you want to be a data scientist in the health industry, then you need different sets of skills.
So, figure out first what you're interested, and what is your competitive advantage. Your
competitive advantage is not necessarily going to be your analytical skills. Your competitive
advantage is your understanding of some aspect of life where you exceed beyond others in
understanding that. Maybe it's film, maybe it's retail, maybe it's health, maybe it's computers.
Once you've figured out where your expertise lies, then you start acquiring analytical skills.
What platforms to learn and those platforms, those tools would be specific to the industry that
you're interested in.
Chapter 1: Introduction to data science
And then once you have got some proficiency in the tools, the next thing would be to apply your
skills to real problems, and then tell the rest of the world what you can do with it.
Quiz:
The three important qualities to possess in order to succeed as a data scientist are:
Curious.
Judgmental.
Argumentative.
Proficient in Programming.
Summary:
Data science is the study of large quantities of data, which can reveal insights that help organizations make
strategic choices.
There are many paths to a career in data science; most, but not all, involve a little math, a little science, and
a lot of curiosity about data.
New data scientists need to be curious, judgemental and argumentative.
Why data science is considered the sexiest job in the 20th century, paying high salaries for skilled workers.
Chapter 1: Introduction to data science
Why now?
So now we know what data science is. The next question is why is it so popular? The answer is pretty
obvious: we're collecting more data than ever before. Suppose that you visit a car dealership and fill out
some information. All of that data is automatically entered into a computer, and combined with the data
from hundreds of dealerships into one big database.
Once we have that data, it's easy to use the email address that you provided when you bought that car to
tie your car purchase data with your data from social media or web browsing. Suddenly, we have a very
complete picture about everyone who purchased a car in the last year: their ages, their likes and dislikes,
their friends and family. This additional data can be used to predict what price you can pay for your car,
what other purchases you're likely to make, or how best to sell you insurance for that new car. Data is
everywhere, and it is incredibly valuable information for businesses, organizations, and governments.
At this point, data is in its raw form, so the next step is to prepare data. This includes "cleaning data", for
instance finding missing or duplicate values, and converting data into a more organized format.
Then, we explore and visualize the cleaned data. This could involve building dashboards to track how the
data changes over time or performing comparisons between two sets of data.
Finally, we run experiments and predictions on the data. For example, this could involve building a
system that forecasts temperature changes or performing a test to find which web page acquires more
customers.
Chapter 1: Introduction to data science
To answer this question, you might start by gathering information about each purchase, such as the
amount, date, location, purchase type, and card-holder's address. You'll need many examples of
transactions, including this information, as well as a label that tells you whether each transaction is valid
or fraudulent. Luckily, you probably have this information in a database. These records are called
"training data", and are used to build an algorithm. Each time a new transaction occurs, you'll give your
algorithm information, like amount and date, and it will answer the original question: What is the
probability that this transaction is fraudulent?
Chapter 1: Introduction to data science
We could express the picture as a matrix of numbers where each number represents a pixel. However, this
approach would probably fail if we fed the matrix into a traditional machine learning model. There's
simply too much input data!
Deep learning
We need more advanced algorithms from a subfield of machine learning called deep learning. In deep
learning, multiple layers of mini-algorithms, called "neurons", work together to draw complex
conclusions. Deep learning takes much, much more training data than a traditional machine learning
model, but is also able to learn relationships that traditional models cannot. Deep learning is used to solve
data-intensive problems, such as image classification or language understanding.
1. Data engineer
Data engineers control the flow of data: they build custom data pipelines and storage systems. They
design infrastructure so that data is not only collected, but easy to obtain and process. Within the data
science workflow, they focus on the first stage: data collection and storage.
2 Data analyst
Data analysts describe the present via data. They do this by exploring the data and creating visualizations
and dashboards. To do these tasks, they often have to clean data first. Analysts have less programming
and stats experience than the other roles. Within the workflow, they focus on the middle two stages: data
preparation and exploration and visualization.
3. Data scientist
Data Scientists have a strong background in statistics, enabling them to find new insights from data,
rather than solely describing data. They also use traditional machine learning for prediction and
forecasting. Within the workflow, they focus on the last three stages: data preparation and exploration
and visualization, and experimentation and prediction.
Review:
Here's a recap what we've covered. It may be intimidating to see all these tools and languages, but they
aren't as difficult to learn as spoken languages. If you know English, it may take you years to learn
French. Programming languages are more similar to power tools. If you know how to use a power drill,
you don't necessarily know how to use an electric saw, but you can learn with a little training!
Chapter 1: Introduction to data science