
Data science for service change

“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard
DATA
Anything with information

Information sources and the digital world, now intertwined with almost every person's life, got big.

Welcome to Big Data.

A data scientist is the person who connects the dots between the business world and the data world. Similarly, data science is the craft that a data scientist utilizes to make this happen.
Data, data everywhere…

[Chart: data produced each year, logarithmic scale]
2002: 5 EB
2006: 161 EB
2009: 800 EB
2011: 1.8 ZB
2015: 8.0 ZB

Reference points on the same scale:
14 PB: human brain's capacity
60 PB: 100 years of HD video
120 PB: 100 years of HD video + audio

Units: 1 TB = 1000 GB; 1 Petabyte = 1000 TB
References

(2002) 5 EB: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm
(2006) 161 EB: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
(2009) 800 EB: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf
(2011) 1.8 ZB: http://www.emc.com/leadership/programs/digital-universe.htm
(2015) 8 ZB: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
(brain) 14 PB: http://www.quora.com/Neuroscience-1/How-much-data-can-the-human-brain-store
(life in video) 60 PB: in 4320p resolution, extrapolated from 16MB for 1:21 of 640x480 video
(w/sound): almost certainly a gross overestimate, as sleep can be compressed significantly!


I'd call it data, not information

Pyramid (top to bottom): wisdom, knowledge, information, data
3 Vs of Big Data
TYPES OF DATA
Binary Data
• Binary data is a type of data that is represented or displayed in the binary numeral
system. Binary data is the only category of data that can be directly understood and
executed by a computer. It is numerically represented by a combination of zeros and
ones.
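
As a small illustration of the binary representation described above, the sketch below uses Python's built-in `bin`, `int`, `ord`, and `format`. The sample number and letter are arbitrary choices made for this example.

```python
# Show how an integer and a text character look in binary.
number = 10
binary_text = bin(number)           # string form with a "0b" prefix
back_to_int = int("1010", 2)        # parse a base-2 string back to an int

# A character is stored as a number, which in turn is stored as bits.
letter = "A"
letter_bits = format(ord(letter), "08b")   # 8 digits, zero-padded

print(binary_text, back_to_int, letter_bits)
```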
Nominal Data
• In statistics, nominal data (also known as nominal scale) is a type of data that is used to
label variables without providing any quantitative value. It cannot be ordered or
measured.

Examples of Nominal Scales
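
A minimal sketch of working with nominal data, assuming a made-up list of eye-color answers. Because the labels have no order or magnitude, counting and finding the most frequent label are the operations that make sense.

```python
from collections import Counter

# Hypothetical survey answers: labels only, no order and no arithmetic.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Count how often each label occurs and pick the most frequent one (the mode).
counts = Counter(eye_colors)
most_common_label, most_common_count = counts.most_common(1)[0]

print(counts)
print("mode:", most_common_label)
```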


Ordinal Data
• In statistics, ordinal data are data whose values follow a natural order. Their most
notable feature is that the differences between values cannot be determined or are not
meaningful, because the categories do not represent equal increments of the underlying
attribute.

Example of Ordinal Scales
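
The sketch below illustrates ordinal data with a hypothetical four-level satisfaction scale. It relies only on the order of the levels (sorting and a median), never on the distance between them; the level names and responses are invented for the example.

```python
from statistics import median_low

# Hypothetical satisfaction scale, listed from lowest to highest.
LEVELS = ["unhappy", "neutral", "happy", "very happy"]
rank = {label: position for position, label in enumerate(LEVELS)}

responses = ["happy", "unhappy", "very happy", "neutral", "happy"]

# Sorting and the median use only the order of the levels.
ordered = sorted(responses, key=rank.get)
median_response = LEVELS[median_low(rank[r] for r in responses)]

print(ordered)
print("median:", median_response)
```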


Interval
• Interval scales are numeric scales in which we know both the order and the exact
differences between the values. The classic example of an interval scale
is Celsius temperature because the difference between each value is the same. For
example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the
difference between 80 and 70 degrees.

Example of Interval Scale
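
A short sketch of the interval idea using the Celsius example above. Differences are meaningful, but ratios are not, because 0 degrees Celsius is not an absolute zero; the helper `celsius_to_fahrenheit` and the sample temperatures are introduced here only for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Differences are meaningful: both gaps are 10 degrees Celsius.
gap_high = 80 - 70
gap_low = 60 - 50

# Ratios are not: "twice as hot" changes when the unit changes.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

print(gap_high, gap_low)
print(ratio_celsius, ratio_fahrenheit)
```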


Ratio
• Ratio scales tell us the order and the exact difference between values, and they also
have an absolute zero, which allows a wide range of both descriptive and inferential
statistics to be applied.

This Device Provides Two Examples of Ratio Scales (height and weight)
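
To illustrate why an absolute zero matters, the sketch below compares two made-up weights in kilograms and in pounds. On a ratio scale the statement "twice as heavy" survives a change of unit; the conversion factor is approximate and the values are hypothetical.

```python
KG_TO_LB = 2.20462  # approximate conversion factor

# Hypothetical weights in kilograms.
weight_a_kg = 80.0
weight_b_kg = 40.0

# The ratio is the same in either unit because zero means "no weight" in both.
ratio_kg = weight_a_kg / weight_b_kg
ratio_lb = (weight_a_kg * KG_TO_LB) / (weight_b_kg * KG_TO_LB)

print(ratio_kg, ratio_lb)
```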
Data Types Summary
• In summary, nominal variables are used to “name,” or label, a series of values. Ordinal scales provide good
information about the order of choices, such as in a customer satisfaction survey. Interval scales give us
the order of values plus the ability to quantify the difference between each one. Finally, ratio scales give us
the most: order, interval values, and the ability to calculate ratios, since a “true zero” can be defined.

Summary of data types and scale measures


WIKIPEDIA…
Data science is the science that uses computer science, statistics, machine learning,
visualization, and human-computer interaction to collect, clean, integrate, analyze,
visualize, and interact with data to create data products.
What is data science?

Data Science: Applying advanced statistical tools to existing data to generate new insights
Service Change: Converting new data insights into (often small) changes to business processes
Smarter Work: More efficient and effective use of staff and resources
Data Science
• Data science, also known as data-driven science, uses scientific methods,
processes, algorithms, and systems to extract knowledge or insights from raw
data, with the goal of discovering hidden patterns.
• Data scientists help with processing, cleaning, and mining data for analysis.
They build predictive models based on machine learning algorithms.
Let's take an example…
1. BUSINESS REQUIREMENT
2. DATA COLLECTION
3. DATA CLEANING
4. DATA EXPLORATION AND ANALYSIS
5. DATA MODELLING
6. DATA VALIDATION
7. DEPLOYMENT AND OPTIMIZATION
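
The workflow stages above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the data are made up, the "model" is a plain least-squares line, and the `predict` helper is a hypothetical stand-in for deployment.

```python
# Toy walk-through of the workflow stages; every number here is made up.

# Data collection: (hours_studied, exam_score) pairs, some unusable.
raw = [(1, 52), (2, 55), (None, 60), (3, 61), (4, 64), (5, None), (6, 70)]

# Data cleaning: drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Data exploration: simple summary statistics.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Data modelling: least-squares line y = slope * x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Data validation: mean absolute error, computed here on the same data
# for brevity; a real project would check against held-out data.
mae = sum(abs(y - (slope * x + intercept)) for x, y in clean) / len(clean)


def predict(hours):
    """Deployment in miniature: reuse the fitted line on new input."""
    return slope * hours + intercept


print(f"slope={slope:.2f} intercept={intercept:.2f} mae={mae:.2f}")
```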
Applications: The Internet Search
Applications: Digital Advertisement
Applications: IRCTC
Applications: Image/ Speech Recognition
Applications: Gaming
Applications: A lot more...
• Price comparison websites
• Banking
• Social networking
• Airline Route Planning
• Fraud Risk Detection
• Delivery logistics
• Marketing
• Finance
• Human Resources
• Health Care
• Government Policies
• every possible industry where data gets
generated………
Examples: Free fire alarms in New Orleans
Service Issue: Fire alarms to homes that have them
Data Science: ID homes with high probability of no alarm
Service Change: Use list to shape outreach
Result: 2x increase in hit rate
Examples: Use of force alerts in Charlotte
Service Issue: Excessive force has a negative impact on the community
Data Science: Identify patterns to refine early warning
Service Change: Flagged recurring complaints
Result: Accuracy up 20%; false positives down 55%
Examples: Chicago Pest Control
Service Issue: Challenging to predict outbreaks
Data Science: Analyze data associated with outbreaks
Service Change: Proactive targeting of leading indicators
Result: 15% drop in requests for service
DPH WIC: Help moms and babies stay in nutrition program
Service Issue: Since 2011, DPH has seen an increase in mothers dropping out of their nutrition program. Which moms are most at risk of dropout?
Data Science: Built a predictive model that identified moms and infants who are at greatest risk for dropping out
Service Change: Using the high-risk client profiles to conduct targeted interviews to identify program barriers and make service changes
Result: Expected: Reduce the dropout rate of moms, infants and children, leading to healthier outcomes for both
Flag “stuff” early
Full write up at datasf.org/showcase/datascience/
ART: Preserve City art for the future
Service Issue: The Arts Commission needs to accurately and efficiently project long-term costs to budget for art preservation
Data Science: Revised cost formula and new tool to provide long-term projections and prioritization of conservation projects on demand
Service Change: Use tool to model cost scenarios instead of manual, one-time process
Result: Expected: Reduction in staff time, more accurate cost estimates, and earlier identification of pieces in need of conservation
Optimize your resources
Full write up at datasf.org/showcase/datascience/
INSTALLING ANACONDA
Installing on Windows
• Download the Anaconda installer.
• Double click the installer to launch.
• Click Next.
• Read the licensing terms and click “I Agree”.
• Select an install for “Just Me” unless you’re installing for all users
(which requires Windows Administrator privileges) and click Next.
• Select a destination folder to install Anaconda and click the Next
button.
• Choose whether to add Anaconda to your PATH environment variable. We
recommend not adding Anaconda to the PATH environment variable, since this
can interfere with other software. Instead, use Anaconda software by
opening Anaconda Navigator or the Anaconda Prompt from the Start Menu.
• Choose whether to register Anaconda as your default
Python. Unless you plan on installing and running multiple
versions of Anaconda, or multiple versions of Python,
accept the default and leave this box checked.
• Click the Install button. If you want to watch the packages
Anaconda is installing, click Show Details.
• Click the Next button.
• After your install is complete, verify it by opening
Anaconda Navigator, a program that is included with
Anaconda: from your Windows Start menu, select the
Anaconda Navigator shortcut under Recently added, or
type “Anaconda Navigator” to search for it. If Navigator
opens, you have successfully installed Anaconda. If not,
check that you completed each step above.
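
As an additional check beyond opening Navigator, the sketch below can be run in Python started from the Anaconda Prompt. It reports which interpreter is running and whether its environment contains a `conda-meta` folder, which conda-managed environments normally have. The function name `describe_python` is made up for this example, and the folder check is a heuristic rather than an official verification step.

```python
import os
import sys


def describe_python():
    """Collect basic facts about the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Conda-managed environments keep package metadata in a
        # "conda-meta" folder under the environment prefix.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


info = describe_python()
for key, value in info.items():
    print(f"{key}: {value}")
```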
PYTHON Introduction
• Python is a simple, object-oriented,
interpreted, high-level programming language
with the following characteristics:
– High Software quality
– Increased developer productivity
– Program portability
– Huge set of libraries support
– Open source
– Expressive
– Platform independent
PYTHON INTRODUCTION

Python is an easy to learn, powerful programming language. It has efficient
high-level data structures and a simple but effective approach to object-oriented
programming. Python’s elegant syntax and dynamic typing, together with its
interpreted nature, make it an ideal language for scripting and rapid application
development in many areas on most platforms.
Python is an object-oriented, interpreted language. It was created by
Guido van Rossum at CWI in the Netherlands; work began in the late 1980s and the
first release appeared in 1991.
It is a multi-paradigm programming language, supporting procedural, object-oriented,
and functional styles, and it is used for the web, enterprise and web services,
mobile applications, big data, cloud, etc.
Python makes development and debugging fast because there is no separate
compilation step.
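
A brief sketch of two properties mentioned above, dynamic typing and multi-paradigm support. The same sum is written in procedural, object-oriented, and functional style; all names (`total_procedural`, `Accumulator`, `total_functional`) are invented for this illustration.

```python
from functools import reduce

# Dynamic typing: the same name can refer to values of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__


# Procedural style: an explicit loop that updates a running total.
def total_procedural(numbers):
    total = 0
    for n in numbers:
        total += n
    return total


# Object-oriented style: state and behaviour live together in a class.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self  # returning self allows chained calls


# Functional style: combine the values with a function, no mutation.
def total_functional(numbers):
    return reduce(lambda a, b: a + b, numbers, 0)


print(first_type, second_type)
print(total_procedural([1, 2, 3]), total_functional([1, 2, 3]))
```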
FEATURES OF PYTHON
WHY PYTHON?
• Software quality
• Developer Productivity
• Program portability
• Supports a huge set of libraries
• Easy to learn
APPLICATION OF PYTHON

• Software development
• System programming
• Creating graphical applications
• Internet scripting
• Database programming
• Fields such as data mining, data science, machine learning, and artificial intelligence
• Working with different databases such as MySQL, MSSQL, PostgreSQL, SQLite…
• Developing games
• Serial port communication
• Image processing
• Robotics control
• Natural Language Processing
• Office automation and file-related applications
• …etc.
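
As one concrete instance of the database programming item in the list above, here is a minimal sketch using the `sqlite3` module from Python's standard library. The in-memory database, the `city` table, and its rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database; the table and rows are made up.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE city (name TEXT, population INTEGER)")
connection.executemany(
    "INSERT INTO city (name, population) VALUES (?, ?)",
    [("Alpha", 1200), ("Beta", 800), ("Gamma", 1500)],
)

# Parameterized query: the "?" placeholder keeps values separate from SQL text.
rows = connection.execute(
    "SELECT name FROM city WHERE population > ? ORDER BY name", (1000,)
).fetchall()
connection.close()

print(rows)
```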
PYTHON MODES
• SCRIPT MODE
• SHELL MODE
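
The two modes can be illustrated with one small file. This is a sketch only: `hello.py` is a hypothetical file name and `greeting` is a made-up function.

```python
# Script mode: save this text as hello.py (a hypothetical file name) and run
#     python hello.py
# Shell mode: start the interpreter by typing `python`, then enter the same
# statements one at a time at the >>> prompt.


def greeting(name):
    """Build a greeting string for the given name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs when the file is executed as a script.
    print(greeting("world"))
```

In shell mode, typing `greeting("world")` at the prompt displays the returned value immediately, which makes it convenient for quick experiments; script mode suits programs that are saved and run repeatedly.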
Integrated Development Environments
Which IDE to use?
1. PyDev with Eclipse
2. Komodo
3. Emacs
4. Vim
5. TextMate
6. Gedit
7. IDLE
8. PIDA (Linux) (Vim based)
9. Notepad++ (Windows)
10. Bluefish (Linux)
Let's go to programming basics…
• Python basics
• Data types
• Operators
• Decision control statements
• Iterative control statements
• Functions
• Modules
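
The topics in the list above can be previewed in one short sketch. All names and values here are invented for illustration; each comment points to the topic it demonstrates.

```python
import math  # Modules: reuse code from the standard library


def classify(number):
    """Decision control: return a label for a number."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    return "positive"


def circle_area(radius):
    """Functions: wrap a calculation so it can be reused."""
    return math.pi * radius ** 2


# Data types
count = 3           # int
price = 9.5         # float
name = "widget"     # str
in_stock = True     # bool

# Operators
total = count * price                  # arithmetic
is_bulk = count >= 3 and in_stock      # comparison and logical

# Iterative control: a for loop over a range of numbers
squares = []
for n in range(1, 4):
    squares.append(n ** 2)

print(total, is_bulk, squares, classify(-5), circle_area(1))
```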
PYTHON PROGRAMMING
LAB SESSION
• INPUT
• OUTPUT
• COMMENT
• VARIABLES
• DATATYPES
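
A possible starting point for the lab, touching each item in the list above. The function `run_lab` and its `read` and `write` parameters are made up for this sketch; they default to the built-in `input` and `print` so the same code can also be exercised without a keyboard.

```python
def run_lab(read=input, write=print):
    # COMMENT: text after "#" is ignored by Python.
    name = read("Enter your name: ")       # INPUT always returns a str
    age = int(read("Enter your age: "))    # convert the typed text to an int
    next_year = age + 1                    # VARIABLES hold values
    message = f"{name} will be {next_year} next year"
    write(message)                         # OUTPUT
    # DATATYPES: report the type name of each variable.
    return {"name": type(name).__name__, "age": type(age).__name__}
```

Calling `run_lab()` with no arguments prompts at the keyboard. Entering something that is not a whole number for the age raises `ValueError`, which is a useful first error to explore in the lab.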
