
DATA SCIENCE

DSC 302
The Ascendance of Data
• We live in a world that’s drowning in data. Websites track every user’s every click.
Your smartphone is building up a record of your location and speed every second of
every day. “Quantified selfers” wear pedometers-on-steroids that are always
recording their heart rates, movement habits, diet, and sleep patterns.
• Smart cars collect driving habits, smart homes collect living habits, and smart
marketers collect purchasing habits. The internet itself represents a huge graph of
knowledge that contains (among other things) an enormous cross-referenced
encyclopedia; domain-specific databases about movies, music, sports results,
pinball machines, memes, and cocktails; and too many government statistics (some
of them nearly true!) from too many governments to wrap your head around.
• Buried in these data are answers to countless questions that no one’s ever thought
to ask.
Definition of terms
• Data science is a process of collecting, cleaning, analyzing, and interpreting data to
extract valuable insights and information, often using statistical methods and
machine learning techniques.
• Data science can also be defined as the field of study that uses scientific methods,
processes, algorithms, and systems to extract knowledge and insights from
structured and unstructured data.
• Data science is like a toolbox that uses different methods and tools to make sense
of information from data. It helps in making decisions, recognizing patterns, and
predicting outcomes in various fields. It's important because it helps us find
valuable insights from the huge amounts of data we have in our digital world today.
• A data scientist is someone who extracts insights from messy data. Today’s world is
full of people trying to turn data into insight.
Intro…
• For instance, the dating site OkCupid asks its members to answer
thousands of questions in order to find the most appropriate matches for
them. But it also analyzes these results to figure out innocuous-sounding
questions you can ask someone to find out how likely someone is to sleep
with you on the first date. Facebook asks you to list your hometown and
your current location, ostensibly to make it easier for your friends to find
and connect with you. But it also analyzes these locations to identify
global migration patterns and where the fanbases of different football
teams live. As a large retailer, Target tracks your purchases and
interactions, both online and in-store. And it uses the data to predictively
model which of its customers are pregnant, to better market baby-related
purchases to them.
• In 2012, the Obama campaign employed dozens of data scientists who
data mined and experimented their way to identifying voters who
needed extra attention, choosing optimal donor-specific fundraising
appeals and programs, and focusing get-out-the-vote efforts where
they were most likely to be useful. And in 2016 the Trump campaign
tested a staggering variety of online ads and analyzed the data to find
what worked and what didn’t. Now, before you start feeling too jaded:
some data scientists also occasionally use their skills for good—using
data to make government more effective, to help the homeless, and
to improve public health. But it certainly won’t hurt your career if you
like figuring out the best way to get people to click on advertisements.
Key concepts in Data Science
Several important concepts form the foundation of data science. Understanding these
concepts is crucial for anyone entering the field. Here are some key concepts in data
science:
• Data:
• Data is the raw information that data scientists work with. It can be structured (organized in a
specific format) or unstructured (not organized in a pre-defined way), and it can come in various
types such as text, numbers, images, or videos.
• Statistics:
• Statistics involves the collection, analysis, interpretation, presentation, and organization of data.
It is fundamental to understanding patterns and making predictions from data.
• Machine Learning:
• Machine learning is a subset of artificial intelligence that focuses on creating systems that can
learn and make predictions or decisions without being explicitly programmed. It includes
supervised learning, unsupervised learning, and reinforcement learning.
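To make the Statistics bullet concrete, here is a minimal sketch using Python's built-in statistics module. The step counts are made-up numbers chosen only for illustration.

```python
import statistics

# Hypothetical sample: daily step counts for one week (made-up numbers).
daily_steps = [4200, 5100, 4800, 12000, 5300, 4900, 5000]

mean_steps = statistics.mean(daily_steps)      # arithmetic average
median_steps = statistics.median(daily_steps)  # middle value when sorted
spread = statistics.stdev(daily_steps)         # sample standard deviation

print(f"mean:   {mean_steps:.1f}")
print(f"median: {median_steps}")
print(f"stdev:  {spread:.1f}")
```

The single unusually active day pulls the mean above the median, which is the kind of pattern summary statistics help reveal.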
• Programming:
• Proficiency in programming languages like Python and R is essential for data
scientists. These languages are commonly used for data analysis, manipulation,
and modeling.
• Data Cleaning and Preprocessing:
• Data cleaning involves identifying and correcting errors or inconsistencies in the
data. Preprocessing includes tasks like normalization, scaling, and feature
engineering to prepare the data for analysis.
• Exploratory Data Analysis (EDA):
• EDA is the process of visually and statistically exploring datasets to summarize
their main characteristics, often using techniques like histograms, scatter plots,
and summary statistics.
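The cleaning and scaling steps described above can be sketched in plain Python. The records and the helper names clean_ages and min_max_scale are hypothetical, chosen for illustration; real projects usually rely on a data analysis library for this.

```python
# Hypothetical raw records: ages entered as text, with a blank and a typo.
raw_ages = ["23", " 35", "", "41", "abc", "29"]


def clean_ages(values):
    """Keep only the entries that parse as whole numbers."""
    cleaned = []
    for value in values:
        text = value.strip()          # remove stray spaces
        if text.isdigit():            # skip blanks and non-numeric entries
            cleaned.append(int(text))
    return cleaned


def min_max_scale(numbers):
    """Rescale numbers linearly so the smallest is 0.0 and the largest is 1.0."""
    low, high = min(numbers), max(numbers)
    if high == low:                   # avoid dividing by zero
        return [0.0 for _ in numbers]
    return [(n - low) / (high - low) for n in numbers]


ages = clean_ages(raw_ages)
scaled = min_max_scale(ages)
print(ages)     # [23, 35, 41, 29]
print(scaled)
```

Min-max scaling is only one of several scaling choices; it is used here because it is easy to check by hand.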
• Data Visualization:
• Data visualization is the representation of data in graphical or visual formats,
such as charts or graphs, to help communicate insights effectively.
• Big Data:
• Big data refers to extremely large and complex datasets that traditional data
processing methods may struggle to handle. Technologies like Apache
Hadoop and Apache Spark are used to manage and process big data.
• Domain Knowledge:
• Understanding the specific industry or domain you are working in is crucial.
Data scientists need domain knowledge to interpret results correctly and
provide meaningful insights.
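As a rough illustration of the Data Visualization bullet, the sketch below draws a text-only bar chart with the standard library. In practice a plotting library would normally be used; the survey answers here are made up.

```python
from collections import Counter

# Hypothetical survey answers (made-up data).
favorite_language = ["Python", "R", "Python", "SQL", "Python", "R"]

counts = Counter(favorite_language)

# Draw one row per category, most common first.
for language, count in counts.most_common():
    print(f"{language:<8}{'#' * count} ({count})")
```

Even this crude chart shows at a glance which answer dominates, which is the purpose of any visualization.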
• Model Evaluation and Validation:
• It's essential to assess the performance of machine learning models.
Evaluation metrics and validation techniques help determine how well a
model generalizes to new, unseen data.
• Feature Importance:
• In machine learning, identifying which features (variables) are most influential
in predicting the outcome is important. This helps in understanding the key
factors driving the model's predictions.
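The idea of checking a model on unseen data can be sketched as follows. This is a toy illustration, not a real machine learning algorithm: the data are made up, and the "model" is just a threshold on one feature. The names predict and accuracy are hypothetical helpers.

```python
import random

# Hypothetical labelled data: (hours_studied, passed) pairs, made up for illustration.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1),
        (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]

random.seed(0)                 # fixed seed so the split is repeatable
shuffled = data[:]
random.shuffle(shuffled)

# Hold out 30% of the rows; the model never sees them during "training".
split_point = len(shuffled) * 7 // 10
train, test = shuffled[:split_point], shuffled[split_point:]


def predict(hours, threshold):
    """Toy model: predict a pass when hours studied reaches the threshold."""
    return 1 if hours >= threshold else 0


def accuracy(rows, threshold):
    """Fraction of rows whose label matches the toy model's prediction."""
    correct = sum(1 for hours, label in rows if predict(hours, threshold) == label)
    return correct / len(rows)


# "Training": pick the threshold that scores best on the training rows only.
best_threshold = max(range(1, 11), key=lambda t: accuracy(train, t))

print("threshold:", best_threshold)
print("train accuracy:", accuracy(train, best_threshold))
print("test accuracy: ", accuracy(test, best_threshold))
```

The test accuracy, measured on rows that played no part in choosing the threshold, is the number that indicates how well the model generalizes.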
These concepts provide a solid foundation for data science. Continuous
learning and staying updated on emerging technologies and techniques
are also important in this dynamic field.
Introduction to Python
• Install the Anaconda distribution, which already includes most of the
libraries that you need to do data science.
• Launch the Jupyter notebook.
• Start writing code…
• The Python language design is distinguished by its emphasis on
readability, simplicity, and explicitness.
Language Semantics
Print
• In Python, the print() function is used to display output. Here's a simple example:
print("Hello, World!")

Variables
A variable is a named container for storing data in a program.
In Python, the basic data types include:
Integers (int) – whole numbers, e.g. 1, 57, 9056
Floating point (float) – decimals, e.g. 2.345, 500.09062
Strings (str) – sequences of characters, e.g. "Name", "City"
Python has no separate character type; a single letter such as "h" or "k" is a string of length one.
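The built-in type() function reports which data type Python has assigned to a value. A short sketch using the example values from this slide:

```python
count = 57        # whole number
price = 2.345     # decimal number
label = "Name"    # text
letter = "h"      # a single letter

print(type(count))    # <class 'int'>
print(type(price))    # <class 'float'>
print(type(label))    # <class 'str'>
print(type(letter))   # <class 'str'>
```

Note that the single letter is reported as str, the same type as longer text.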
Rules for naming variables:
1. The name may contain only letters, digits, and underscores.
2. The name must not start with a digit.
3. The name must not contain white space.
4. The name cannot be a Python keyword.
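These rules can be checked from Python itself. The sketch below combines the string method isidentifier() with the standard keyword module; is_valid_name is a hypothetical helper written for this illustration.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["total_sales", "2nd_place", "first name", "class", "rate%"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

Only total_sales passes: the others start with a digit, contain a space, are a reserved keyword, or contain a symbol.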
Declaring Variables
In Python, you can declare variables by assigning values to them. Unlike
some other programming languages, you don't need to explicitly declare
the data type of a variable; Python infers it dynamically. Here are some
examples:
# Assigning a value to a variable
my_variable = 10

# Assigning a string to another variable
name = "Alice"

# Assigning a floating-point number to a variable
pi_value = 3.14
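Because the type is inferred from the value, the same name can later refer to a value of a different type. A minimal sketch, with value as an arbitrary illustrative name:

```python
value = 10
print(type(value).__name__)          # int

value = "ten"                        # the same name now refers to a string
print(type(value).__name__)          # str

value = 10 / 4                       # dividing with / always gives a float
print(value, type(value).__name__)   # 2.5 float
```

This flexibility is convenient, but reusing one name for unrelated types can make code harder to follow.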
Language Semantics
Indentation, not braces
• Python uses whitespace (tabs or spaces) to structure code instead of using
braces as in many other languages like R, C++, Java, and Perl.
• Consider code below:
x = 5
y = 7
if x < y:
    x += 1
x
In many other programming languages, the body of the if statement would be enclosed in braces.
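The sketch below shows how the indentation level alone decides which block a line belongs to. The variable names are illustrative.

```python
total = 0
for number in [1, 2, 3, 4]:
    if number % 2 == 0:
        total += number           # inside the if: runs only for even numbers
    print("checked", number)      # inside the for, outside the if: runs every time
print("sum of evens:", total)     # not indented: runs once, after the loop
```

Moving a line one level left or right changes when it runs, so consistent indentation (four spaces is the usual convention) matters.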
Language Semantics
Comments
In Python, you can add comments to your code to provide explanations
or notes for yourself or other developers. Comments are not executed
by the Python interpreter; they are ignored during runtime. Here are
the ways to add comments in Python:
1. Single line comments
# This is a single-line comment
variable = 42  # You can also add comments at the end of a line of code
2. Multi-line comments:
• In Python, there is no direct syntax for multi-line comments. However,
you can use triple-quotes (''' or """) as a workaround to create a multi-
line string, and then leave it as a comment. This string will not be
assigned to any variable, so it won't affect your code.
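One detail worth knowing about this workaround: a triple-quoted string placed as the first statement of a function is stored as that function's docstring, while # comments are discarded. A small sketch, with area as a hypothetical example function:

```python
def area(width, height):
    """Return the area of a rectangle."""   # first statement: stored as the docstring
    # A regular comment: discarded, not stored anywhere.
    return width * height


"""
A bare triple-quoted string elsewhere in the code is evaluated
and then thrown away, which is why it can stand in for a comment.
"""

print(area.__doc__)   # Return the area of a rectangle.
print(area(3, 4))     # 12
```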
'''
This is a multi-line comment.
It spans across multiple lines.
'''

"""
Another way to create a multi-line comment.
Use triple-quotes at the beginning and end.
"""
Language Semantics
Functions
In Python, a function is a block of reusable code that performs a
specific task or set of tasks.
Functions help in organizing and modularizing code, making it more
readable and maintainable.
The rules for naming a function are the same as those for naming a
variable.
Here's a basic overview of defining and using functions in Python:
Defining a Function:
def greet(name):
    """This function prints a simple greeting."""
    print(f"Hello, {name}!")
# The 'def' keyword is used to define a function.
# 'greet' is the function name.
# '(name)' is the parameter the function takes.
# The triple-quoted string is a docstring that describes the function.
# The actual code of the function is indented under the 'def' statement.
# The 'print' statement is the body of the function.
# To end the function, unindent the code.
# Example call to the function:
greet("Alice")
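The greet example prints its result directly. Functions more often hand a value back to the caller with return, so the result can be stored or reused. A minimal sketch, where make_greeting is a hypothetical variant of greet:

```python
def make_greeting(name):
    """Build a greeting and hand it back to the caller."""
    return f"Hello, {name}!"


message = make_greeting("Alice")   # the returned string is stored in a variable
print(message)                     # Hello, Alice!
print(message.upper())             # HELLO, ALICE!
```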
Function parameters
• Function parameters can also be given default arguments, which only
need to be specified when you want a value other than the default:
def my_print(message="my default message"):
    print(message)

my_print("hello")  # prints 'hello'
my_print()         # prints 'my default message'
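When a function has several default values, arguments can also be passed by name, which lets you override one default and keep the others. A short sketch; describe is a hypothetical function used only for illustration:

```python
def describe(item, quantity=1, unit="piece"):
    """Format a short description such as '3 sheets of paper'."""
    return f"{quantity} {unit} of {item}"


print(describe("paper"))                # 1 piece of paper
print(describe("paper", 3, "sheets"))   # 3 sheets of paper
print(describe("paper", unit="ream"))   # 1 ream of paper
```

In the last call only unit is overridden; quantity keeps its default of 1.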
