Vega-Lite Tutorial UC Davis


Vega-Lite Tutorial

Dominik Moritz & Kanit “Ham” Wongsuphasawat


Interactive Data Lab, University of Washington
http://bit.ly/vega-lite-tutorial-slides
Who are we?
Dominik Moritz, Kanit “Ham” Wongsuphasawat

● 3rd year PhD students in CS


● Working on data visualization tools including Vega-Lite!

Who are you?


Goals & Logistics
● We want you to learn something!
○ Hands on
○ Ask questions at any time (via Etherpad)
■ https://etherpad.wikimedia.org/p/2016-05-18-vegalite

● 9:15am - noon*

● 15-minute break
Outline
Effective Visualization Design

Vega-Lite

Using Vega-Lite with Jupyter

Using Vega-Lite with Polestar

Future of Vega-Lite
Why Visualization?
Visualizations are an effective tool to explore and communicate data.

● Human visual system is high-bandwidth channel to brain


○ Effective parallel and background processing
● Suitable when there is a need for augmenting human capabilities
○ Not needed if we have full automation
○ Analysis is often ill specified
○ Visualization offers: presentation of results, understanding of models, building trust

More on this in https://www.cs.ubc.ca/~tmm/courses/547-15/, http://cs.uw.edu/512


Anscombe’s Quartet

Some graphics taken from https://www.cs.ubc.ca/~tmm/courses/547-15/


Anscombe’s Quartet
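The point of the quartet can be checked numerically. The sketch below assumes the standard published values of Anscombe's four datasets (they are not reproduced on the slides) and computes only the means; the variable names are illustrative.

```python
from statistics import mean

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of x and mean of y per dataset, rounded to two decimals.
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2))
    for name, (xs, ys) in quartet.items()
}
```

All four datasets end up with the same rounded means, even though their scatter plots look very different, which is why plotting the data matters.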
Analysis: What, Why and How?
What is shown?

● Data abstraction

Why is the user looking at it?

● Task abstraction

How is it shown?

● Visual encoding
What? Data
Usually Tabular Data

Fields (columns)

Data Types

Quantitative
Ordinal
Nominal
Temporal

There are other types such as networks, but these are not the focus.
Quantitative to Ordinal with binning
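As a rough illustration of binning, the sketch below maps each quantitative value to an interval; the intervals have a natural order, so the result can be treated as ordinal. The function name `bin_label`, the bin width of 5, and the sample temperatures are assumptions for this example, not Vega-Lite behavior.

```python
import math

def bin_label(value, step=5):
    """Return the half-open interval [lo, hi) of width `step` that contains value."""
    lo = math.floor(value / step) * step
    return (lo, lo + step)

temps = [12.8, 10.6, 11.7, 12.2, 8.9, 4.4]

# Tuples compare element by element, so sorting gives the bins in order.
bins = sorted({bin_label(t) for t in temps})
```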
Why? Task
How? Visual Encoding
Facet
Partition data by an ordinal variable.
Color

Diverging color scale for quantitative data
How to choose visual encoding
Consistency: The properties of the image (visual variables) should match the properties of the data.

Importance Ordering: Encode the most important information in the most effective way.

Expressiveness: Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission).

Effectiveness: Use encodings that people decode better (where better = faster and/or more accurate).
Expressiveness Example
Mackinlay’s Effectiveness Ranking
Certain encodings are more effective than others
Which one is better?

<
Which one is better?

>
Which one is better?

?
Components of a visualization
Data + Transformation

Channel: X, Y, Color, Size, ...


Mark
Encoding: Field -> Channel

Scale: Domain values -> Visual values

Legend
Legend label

Guide

Axis
Axis label
Tick
Axis title

Vega-Lite has defaults for everything except mark type and encoding.
Scales
Data domain -> Pixel range
Data domain -> Color range

Image from http://www.jeromecukier.net/blog/2011/08/11/d3-scales-and-color/
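A scale can be thought of as a function from data values to visual values. The following is a minimal sketch of a linear scale in plain Python, not how Vega-Lite implements scales; `linear_scale`, `color_scale`, and the example domains and ranges are hypothetical.

```python
def linear_scale(domain, range_):
    """Build a function that maps the data domain linearly onto a visual range."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        t = (value - d0) / (d1 - d0)  # position of value within the domain, 0..1
        return r0 + t * (r1 - r0)

    return scale

def color_scale(domain, start_rgb, end_rgb):
    """Interpolate each RGB channel with its own linear scale."""
    channels = [linear_scale(domain, (a, b)) for a, b in zip(start_rgb, end_rgb)]
    return lambda value: tuple(round(c(value)) for c in channels)

# Data domain -> pixel range
x = linear_scale((0, 100), (0, 500))

# Data domain -> color range (white to green)
to_color = color_scale((0, 100), (255, 255, 255), (0, 128, 0))
```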


Outline
Effective Visualization Design

Vega-Lite

Using Vega-Lite with Jupyter

Using Vega-Lite with Polestar

Future of Vega-Lite
What is Vega-Lite
● A visualization specification language
● A concise JSON syntax for rapidly creating visualizations for analysis
○ Can serve as a file format for creating and sharing visualizations

A Vega-Lite spec describes:

● Data source
● (Optional) data transformation
● Graphical mark type
● Mappings between data fields and encoding channels (x, y, color, etc.): the encoding
○ Properties of Scale, Axis, Legend
● Faceting
JSON (JavaScript Object Notation)
Text-based data interchange format based on JavaScript’s object syntax.

{
  "numberProperty": 1234,
  "stringProperty": "abc",
  "dateProperty": "1/1/2013",
  "arrayProperty": [1, 2, 3],
  "objectProperty": {
    "childNumberProperty": 444
  }
}
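To see how such a document is read by a program, here is a small sketch using Python's standard `json` module. JSON has no date type, so the date property stays a string after parsing.

```python
import json

text = """
{
  "numberProperty": 1234,
  "stringProperty": "abc",
  "dateProperty": "1/1/2013",
  "arrayProperty": [1, 2, 3],
  "objectProperty": {
    "childNumberProperty": 444
  }
}
"""

obj = json.loads(text)  # JSON objects become dicts, arrays become lists

child = obj["objectProperty"]["childNumberProperty"]
date_is_string = isinstance(obj["dateProperty"], str)
```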
Please open
http://vega.github.io/vega-editor/?mode=vega-lite
Tutorial A:
Getting Started with Vega-Lite
(adapted from https://vega.github.io/vega-lite/tutorials/getting_started.html)
Data

Given a table, we can represent it in JSON.

[
{"a": "C", "b": 2},
{"a": "C", "b": 7},
{"a": "C", "b": 4},
{"a": "D", "b": 1},
{"a": "D", "b": 2},
{"a": "D", "b": 6},
{"a": "E", "b": 8},
{"a": "E", "b": 4},
{"a": "E", "b": 7}
]
Describing data inline with data.values

{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
}
}

Note: Vega-Lite also supports loading data from a URL (CSV or JSON files).
mark

{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point"
}
encoding

{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"}
}
}

Mapping nominal field “a” to channel “x” (x-position of the point mark).
Mapping quantitative field “b” to y?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"}
???
}
}
Mapping quantitative field “b” to y?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"field": "b", "type": "quantitative"}
}
}
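Because a spec is plain JSON, it can also be assembled programmatically. The sketch below builds the same spec as a Python dict and serializes it with the standard `json` module; the variable names are illustrative, and the resulting text could be pasted into the online editor.

```python
import json

# The table from the tutorial as (a, b) pairs.
rows = [("C", 2), ("C", 7), ("C", 4),
        ("D", 1), ("D", 2), ("D", 6),
        ("E", 8), ("E", 4), ("E", 7)]

spec = {
    "data": {"values": [{"a": a, "b": b} for a, b in rows]},
    "mark": "point",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

spec_json = json.dumps(spec, indent=2)  # JSON text of the spec
```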
aggregate
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "mean", "field": "b", "type": "quantitative"}
}
}
Data query roughly equivalent to “SELECT a, AVG(b) FROM table GROUP BY a” in SQL
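The grouping that the aggregate performs can be sketched in plain Python. This only illustrates the computation; it is not Vega-Lite's implementation.

```python
from collections import defaultdict
from statistics import mean

values = [
    {"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
    {"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
    {"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7},
]

# Group the b values by a (the field that is not aggregated) ...
groups = defaultdict(list)
for row in values:
    groups[row["a"]].append(row["b"])

# ... then reduce each group to its mean.
means = {a: mean(bs) for a, bs in groups.items()}
```

Each category in `a` ends up with a single value, which is why the aggregated chart shows one point per category.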
aggregate
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
Try other aggregate functions such as “sum”, “min”, and “max”.
Making a bar chart (of the same data)?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "point",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
Making a bar chart (of the same data)?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
Transpose?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "nominal"},
"y": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
Transpose?
{
"data": {
"values": [
{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4},
{"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6},
{"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}
]
},
"mark": "bar",
"encoding": {
"y": {"field": "a", "type": "nominal"},
"x": {"aggregate": "average", "field": "b", "type": "quantitative"}
}
}
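Transposing amounts to swapping the definitions of the x and y channels. A small sketch of that swap on a spec held as a Python dict follows; `transpose` is a hypothetical helper, not part of any Vega-Lite library.

```python
def transpose(spec):
    """Return a copy of spec with the x and y channel definitions swapped."""
    encoding = dict(spec.get("encoding", {}))
    x = encoding.pop("x", None)
    y = encoding.pop("y", None)
    if y is not None:
        encoding["x"] = y
    if x is not None:
        encoding["y"] = x
    return {**spec, "encoding": encoding}

spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "nominal"},
        "y": {"aggregate": "average", "field": "b", "type": "quantitative"},
    },
}

flipped = transpose(spec)  # the original spec is left unchanged
```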
Customize
- Axis title: you can set encoding.x.axis.title = “Average of b”
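The dotted path above names a nested property of the spec. When the spec is held as a Python dict, setting it could look like the sketch below; `set_path` is a hypothetical helper written for this example.

```python
def set_path(spec, path, value):
    """Set a nested property such as 'encoding.x.axis.title' in place."""
    *parents, last = path.split(".")
    node = spec
    for key in parents:
        node = node.setdefault(key, {})  # create missing intermediate objects
    node[last] = value
    return spec

spec = {
    "mark": "bar",
    "encoding": {
        "x": {"aggregate": "average", "field": "b", "type": "quantitative"},
        "y": {"field": "a", "type": "nominal"},
    },
}

set_path(spec, "encoding.x.axis.title", "Average of b")
```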
Publishing Vega-Lite Visualization on a web-page
See example page: http://vega.github.io/vega-lite-demo/

and code https://github.com/vega/vega-lite-demo

You can easily fork!


Tutorial B:
Exploring Data with Vega-Lite
(adapted from https://vega.github.io/vega-lite/tutorials/explore.html)
Seattle Weather Data Set
Let’s work with some real data.

date    precipitation (cm)  temp_max (celsius)  temp_min (celsius)  wind (m/s)  weather
1/1/12  0                   12.8                5                   4.7         drizzle
1/2/12  10.9                10.6                2.8                 4.5         rain
1/3/12  0.8                 11.7                7.2                 2.3         rain
1/4/12  20.3                12.2                5.6                 4.7         rain
1/5/12  1.3                 8.9                 2.8                 6.1         rain
1/6/12  2.5                 4.4                 2.2                 2.2         rain
...
From https://www.ncdc.noaa.gov/cdo-web/
Describing data from url with data.url

{
"data": {
"url": "data/seattle-weather.csv",
"formatType": "csv"
}
}

Other format types include tsv and json.


Examine each variable with Dot Plot & Histogram
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "tick",
"encoding": {
"x": {"field": "precipitation", "type": "quantitative"}
}
}

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "bar",
"encoding": {
"x": {"bin": true, "field": "precipitation", "type": "quantitative"},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"}
}
}
Exercise: create dot plots or histograms for
temp_max, temp_min, wind, weather, date
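What the histogram spec computes ("bin" on x, "count" on y) can be sketched in plain Python on the first six precipitation values from the table. The fixed bin width of 5 is an assumption for this example; Vega-Lite picks its own bin boundaries.

```python
import math
from collections import Counter

precipitation = [0, 10.9, 0.8, 20.3, 1.3, 2.5]  # first six rows of the data set

def histogram(values, step):
    """Count how many values fall into each bin [start, start + step)."""
    counts = Counter(math.floor(v / step) * step for v in values)
    return sorted(counts.items())  # list of (bin start, count)

hist = histogram(precipitation, step=5)
```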
TimeUnit for Dates
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "tick",
"encoding": {
"x": {"timeUnit": "month", "field": "date", "type": "temporal"}
}
}

Try “date”, “yearmonth”


TimeUnit for Dates
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "tick",
"encoding": {
"x": {"timeUnit": "month", "field": "date", "type": "temporal"}
}
}

Try “date”, “yearmonth”

As mentioned, this data set represents daily weather data from January 1st, 2012 to December 31st, 2015.
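A time unit keeps only part of each date. The sketch below mimics "month" and "yearmonth" with Python's `datetime`; the helper names and the four sample dates are illustrative, and the M/D/YY format is taken from the table shown earlier.

```python
from datetime import datetime

def parse(text):
    """Parse dates written as M/D/YY, as in the weather table."""
    return datetime.strptime(text, "%m/%d/%y")

def month(d):
    # "month": the year is dropped, so January 2012 and January 2013 coincide
    return d.month

def yearmonth(d):
    # "yearmonth": year and month are kept, the day is dropped
    return (d.year, d.month)

dates = [parse(s) for s in ["1/1/12", "1/2/12", "2/1/12", "1/1/13"]]

months = sorted({month(d) for d in dates})
yearmonths = sorted({yearmonth(d) for d in dates})
```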
Sorting a Field

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "bar",
"encoding": {
"x": {
"field": "weather", "type": "ordinal",
"sort": {"aggregate": "count", "field": "*"}
},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"}
}
}
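Sorting by an aggregate means ordering the categories by a value computed per category. A rough sketch on the six weather values from the table, in ascending order of count (the direction is an assumption for this example):

```python
from collections import Counter

weather = ["drizzle", "rain", "rain", "rain", "rain", "rain"]  # first six rows

counts = Counter(weather)

# Order the categories by how many records each one has.
ordered = sorted(counts, key=lambda w: counts[w])
```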
Calculate New Fields

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"transform": {
"calculate": [{
"field": "temp_range",
"expr": "datum.temp_max - datum.temp_min"
}]
},
"mark": "bar",
"encoding": {
"x": {"bin": true, "field": "temp_range", "type": "quantitative"},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"}
}
}
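The calculate transform adds a derived field to every record before the encoding is applied. In plain Python, the same idea could be sketched as follows; `calculate` is a hypothetical helper, and `datum` stands for one record, as in the expression above.

```python
rows = [
    {"temp_max": 12.8, "temp_min": 5.0},
    {"temp_max": 10.6, "temp_min": 2.8},
    {"temp_max": 11.7, "temp_min": 7.2},
]

def calculate(rows, field, expr):
    """Return new records with `field` set to expr(record); inputs are not modified."""
    return [{**row, field: expr(row)} for row in rows]

with_range = calculate(
    rows,
    "temp_range",
    lambda datum: datum["temp_max"] - datum["temp_min"],
)
```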
Relationship between variables
Try to create a visualization to answer the following questions:

- Average precipitation for different months


- Max temperature for different year and months
- Average max temperature over the years.
- Average temperature range for different months
Average precipitation for different months
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "line",
"encoding": {
"x": {
"timeUnit": "month",
"field": "date",
"type": "temporal"
},
"y": {
"aggregate": "mean",
"field": "precipitation",
"type": "quantitative"
}
}
}
Max temperature for different year and months
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "line",
"encoding": {
"x": {
"timeUnit": "yearmonth",
"field": "date",
"type": "temporal"
},
"y": {
"aggregate": "max",
"field": "temp_max",
"type": "quantitative"
}
},
"config": {
"unit": { "width": 300 }
}
}
Average max temperature over the years
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "line",
"encoding": {
"x": {
"timeUnit": "year",
"field": "date",
"type": "temporal"
},
"y": {
"aggregate": "mean",
"field": "temp_max",
"type": "quantitative"
}
}
}
Average temperature range for different months
{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"transform": {
"calculate": [{
"field": "temp_range",
"expr": "datum.temp_max - datum.temp_min"
}]
},
"mark": "line",
"encoding": {
"x": {"timeUnit": "month", "field": "date", "type": "temporal"},
"y": {"aggregate": "mean", "field": "temp_range", "type": "quantitative"}
}
}
Stacking

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "bar",
"encoding": {
"x": {"bin": true, "field": "temp_max", "type": "quantitative"},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"},
"color": {"field": "weather", "type": "ordinal"}
}
}
Faceting

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "bar",
"encoding": {
"x": {"bin": true, "field": "temp_max", "type": "quantitative"},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"},
"row": {"field": "weather", "type": "ordinal"}
}
}
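Faceting partitions the records by a field and draws one small chart per group. The partitioning step can be sketched like this; `facet` is a hypothetical helper and the three rows are taken from the table shown earlier.

```python
from collections import defaultdict

def facet(rows, field):
    """Partition records into one list per distinct value of `field`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[field]].append(row)
    return dict(panels)

rows = [
    {"weather": "drizzle", "temp_max": 12.8},
    {"weather": "rain", "temp_max": 10.6},
    {"weather": "rain", "temp_max": 11.7},
]

panels = facet(rows, "weather")  # one entry per row of the faceted chart
```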
Filtering

{
"data": {"url": "data/seattle-weather.csv", "formatType": "csv"},
"mark": "bar",
"transform": {
"filter": "datum.weather === 'fog'"
},
"encoding": {
"x": {"bin": true, "field": "temp_max", "type": "quantitative"},
"y": {"aggregate": "count", "field": "*", "type": "quantitative"}
}
}
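A filter keeps only the records for which the expression is true. Below is a minimal Python sketch of the same idea; `filter_rows` is a hypothetical helper, and the rows come from the table shown earlier. The predicate tests for 'rain' because that sample contains no 'fog' days.

```python
def filter_rows(rows, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [row for row in rows if predicate(row)]

rows = [
    {"weather": "drizzle", "temp_max": 12.8},
    {"weather": "rain", "temp_max": 10.6},
    {"weather": "rain", "temp_max": 11.7},
]

rainy = filter_rows(rows, lambda datum: datum["weather"] == "rain")
```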
More Examples
https://vega.github.io/vega-lite/examples/
and docs
http://vega.github.io/vega-lite/docs/
Outline
Effective Visualization Design

Vega-Lite

Using Vega-Lite with Jupyter

Using Vega-Lite with Polestar

Future of Vega-Lite
Set up
https://github.com/vega/ipyvega

● In a virtual environment: pip install jupyter vega pandas


● jupyter nbextension install --py vega
● jupyter notebook
Use
import pandas as pd
df = pd.read_json('cars.json')

from vega import vegalite


vegalite.view(df, {
"mark": "point",
"encoding": {
"y": {"type": "quantitative","field": "Acceleration"},
"x": {"type": "quantitative","field": "Horsepower"}
}
})
Example Workbook
https://github.com/vega/ipyvega/blob/master/Vega-Lite.ipynb
Coming soon: github.com/ellisonbg/altair
Outline
Effective Visualization Design

Vega-Lite

Using Vega-Lite with Jupyter

Using Vega-Lite with Polestar vega.github.io/polestar

Future of Vega-Lite
Please open
http://vega.github.io/polestar/
Outline
Effective Visualization Design

Vega-Lite

Using Vega-Lite with Jupyter

Using Vega-Lite with Polestar

Future of Vega-Lite
Composition and Interaction
Q&A
Relationship with D3 and Vega
D3’s low-level building blocks to Vega

Vega-Lite

Vega

D3
Faceting