DataStax C*ollege Credit: Data Modelling for Apache Cassandra

DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE CASSANDRA
Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
General Guidelines
API Choice
Example
Cassandra is good at reading data from a row in the order it is stored.
Typically an efficient data model will denormalize data and use the storage engine order.
To create a good data model, understand the queries your application requires.
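
To make the first guideline concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code, and the `Row` class and its method names are made up for illustration. It models a row as columns kept sorted by name, so a range read is one contiguous slice in storage order.

```python
import bisect


class Row:
    """Hypothetical stand-in for a row: columns kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the names sorted on write, so reads need no sorting.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        # Read the contiguous range [start, end] in storage order.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = Row()
for tweet_id in (3, 1, 2, 5, 4):
    row.insert(tweet_id, f"tweet {tweet_id}")

print(row.slice(2, 4))
```

The sorting work is done on write, so the read is a lookup of two positions and a copy of what lies between them.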
General Guidelines
API Choice
Example
Multiple APIs?
Initially there was only a Thrift / RPC API, used by language-specific clients.
Multiple APIs...
Cassandra Query Language (CQL) started as a higher level, declarative alternative.
Multiple APIs...
CQL 3 brings many changes. Currently in Beta in Cassandra v1.1.
CQL 3 uses a table-orientated, schema-driven data model.
(I said it had many changes.)
General Guidelines
API Choice
Example
Twitter Clone
Previously done with Thrift at WDCNZ

Hello @World #Cassandra - Apache Cassandra in action
http://vimeo.com/49762233
Twitter clone...
using CQL 3 via the cqlsh tool.
bin/cqlsh -3
Queries?
* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers

Keyspace is a namespace container.
Our Keyspace
CREATE KEYSPACE cass_college
    WITH strategy_class = 'NetworkTopologyStrategy'
    AND strategy_options:datacenter1 = 1;

Table is a sparse collection of well known, ordered columns.
First Table
CREATE TABLE User
(
    user_name text,
    password text,
    real_name text,
    PRIMARY KEY (user_name)
);
Some users...
cqlsh:cass_college> INSERT INTO User
... (user_name, password, real_name)
... VALUES
... ('fred', 'sekr8t', 'Mr Foo');
cqlsh:cass_college> select * from User;
 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo

Some users...
cqlsh:cass_college> INSERT INTO User
... (user_name, password)
... VALUES
... ('bob', 'pwd');
cqlsh:cass_college> select * from User where user_name = 'bob';
 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
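
The `null` above is how cqlsh shows a column that was never written for that row. A minimal in-memory sketch of that sparse behaviour in Python follows; the `insert` and `select` helpers are hypothetical and are not a Cassandra API.

```python
COLUMNS = ("user_name", "password", "real_name")  # columns declared by the table
users = {}  # primary key (user_name) -> only the columns that were written


def insert(**cols):
    # Store just the supplied columns; nothing is written for the rest.
    users.setdefault(cols["user_name"], {}).update(cols)


def select(user_name):
    # Columns that were never written read back as None (shown as null in cqlsh).
    stored = users.get(user_name)
    if stored is None:
        return None
    return {c: stored.get(c) for c in COLUMNS}


insert(user_name="fred", password="sekr8t", real_name="Mr Foo")
insert(user_name="bob", password="pwd")

print(select("bob"))
```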
Data Model (so far)

 CF / Value | User
------------+-------------
 user_name  | Primary Key
Tweet Table
CREATE TABLE Tweet
(
    tweet_id bigint,
    body text,
    user_name text,
    timestamp timestamp,
    PRIMARY KEY (tweet_id)
);
Tweet Table...
cqlsh:cass_college> INSERT INTO Tweet
... (tweet_id, body, user_name, timestamp)
... VALUES
... (1, 'The Tweet','fred',1352150816917);
cqlsh:cass_college> select * from Tweet where tweet_id = 1;
 tweet_id | body      | timestamp                | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred

Data Model (so far)

 CF / Value | User        | Tweet
------------+-------------+-------------
 user_name  | Primary Key | Field
 tweet_id   |             | Primary Key
UserTweets Table
CREATE TABLE UserTweets
(
    tweet_id bigint,
    user_name text,
    body text,
    timestamp timestamp,
    PRIMARY KEY (user_name, tweet_id)
);
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
... (tweet_id, body, user_name, timestamp)
... VALUES
... (1, 'The Tweet','fred',1352150816917);
cqlsh:cass_college> select * from UserTweets where user_name='fred';
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name='fred' and tweet_id=1;
 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
... (tweet_id, body, user_name, timestamp)
... VALUES
... (2, 'Second Tweet', 'fred', 1352150816918);
cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
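
One way to picture `PRIMARY KEY (user_name, tweet_id)`: the first component selects a group of rows, and the second orders the rows inside that group. The Python sketch below is an in-memory illustration only. The names `user_tweets`, `insert` and `select` are hypothetical, and the overwrite branch reflects that an INSERT for an existing primary key replaces the earlier values.

```python
import bisect
from collections import defaultdict

# user_name -> list of (tweet_id, body), kept sorted by tweet_id
user_tweets = defaultdict(list)


def insert(user_name, tweet_id, body):
    rows = user_tweets[user_name]
    i = bisect.bisect_left(rows, (tweet_id,))
    if i < len(rows) and rows[i][0] == tweet_id:
        rows[i] = (tweet_id, body)        # same primary key: overwrite
    else:
        rows.insert(i, (tweet_id, body))  # new key: keep tweet_id order


def select(user_name, descending=False):
    rows = user_tweets.get(user_name, [])
    # Descending order is the same data walked backwards; no sort is needed.
    return list(reversed(rows)) if descending else list(rows)


insert("fred", 2, "Second Tweet")
insert("fred", 1, "The Tweet")

print(select("fred"))
print(select("fred", descending=True))
```

Because the rows are already ordered by `tweet_id`, both the ascending and the `order by tweet_id desc` reads are cheap walks over stored data.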
UserTimeline
CREATE TABLE UserTimeline
(
    tweet_id bigint,
    user_name text,
    body text,
    timestamp timestamp,
    PRIMARY KEY (user_name, tweet_id)
);
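Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline
------------+-------------+-------------+-----------------------+-----------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component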
UserMetrics Table
CREATE TABLE UserMetrics
(
    user_name text,
    tweets counter,
    followers counter,
    following counter,
    PRIMARY KEY (user_name)
);
UserMetrics Table...
cqlsh:cass_college> UPDATE
... UserMetrics
... SET
... tweets = tweets + 1
... WHERE
... user_name = 'fred';
cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1
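
The update above never reads the current value; it only sends an increment, and counters that were never incremented come back as null. A rough Python sketch of that behaviour follows. It is in-memory only, and `increment` and `select` are hypothetical helpers, not part of any Cassandra client.

```python
COUNTERS = ("followers", "following", "tweets")
user_metrics = {}  # user_name -> {counter name: value}


def increment(user_name, counter, delta=1):
    # The first increment creates the row and the counter, starting from zero.
    row = user_metrics.setdefault(user_name, {})
    row[counter] = row.get(counter, 0) + delta


def select(user_name):
    row = user_metrics.get(user_name, {})
    # Counters that were never incremented read back as None (null in cqlsh).
    return {name: row.get(name) for name in COUNTERS}


increment("fred", "tweets")
print(select("fred"))
```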
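Data Model (so far)

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
------------+-------------+-------------+-----------------------+-----------------------+--------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |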
Relationships
CREATE TABLE Followers
(
    user_name text,
    follower text,
    timestamp timestamp,
    PRIMARY KEY (user_name, follower)
);

CREATE TABLE Following
(
    user_name text,
    following text,
    timestamp timestamp,
    PRIMARY KEY (user_name, following)
);
Relationships
INSERT INTO
Following
(user_name, following, timestamp)
VALUES
('bob', 'fred', 1352247749161);
INSERT INTO
Followers
(user_name, follower, timestamp)
VALUES
('fred', 'bob', 1352247749161);
Relationships
cqlsh:cass_college> select * from Following;
 user_name | following | timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300
cqlsh:cass_college> select * from Followers;
 user_name | follower | timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
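
With these tables in place, the "Post Tweet to Followers" query from the list of queries can be pictured as a fan-out on write: the tweet is copied into the author's UserTweets rows and into the UserTimeline rows of each follower, so that reading a timeline is a single lookup. The slides do not show this step, so the Python sketch below is an assumption about how the tables fit together. It uses plain dictionaries and a hypothetical `post_tweet` function rather than a Cassandra client, and it assumes authors do not see their own tweets in their timeline.

```python
from collections import defaultdict

tweets = {}                        # Tweet: tweet_id -> row
user_tweets = defaultdict(dict)    # UserTweets: user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)  # UserTimeline: user_name -> {tweet_id: row}
followers = defaultdict(set)       # Followers: user_name -> follower names


def post_tweet(tweet_id, user_name, body, timestamp):
    row = {"tweet_id": tweet_id, "user_name": user_name,
           "body": body, "timestamp": timestamp}
    tweets[tweet_id] = row                   # serves Get Tweet by ID
    user_tweets[user_name][tweet_id] = row   # serves List Tweets by User
    for follower in followers[user_name]:    # serves List Tweets in User Timeline
        user_timeline[follower][tweet_id] = row


def timeline(user_name):
    # Newest first, assuming larger tweet_id values are newer.
    rows = user_timeline.get(user_name, {})
    return [rows[tid] for tid in sorted(rows, reverse=True)]


followers["fred"].add("bob")
post_tweet(1, "fred", "The Tweet", 1352150816917)
post_tweet(2, "fred", "Second Tweet", 1352150816918)

print([t["body"] for t in timeline("bob")])
```

The trade-off is the one the general guidelines describe: each post costs several writes, and in exchange every query in the list reads from one place in stored order.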

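Data Model

 CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows / Followers
------------+-------------+-------------+-----------------------+-----------------------+--------------+---------------------
 user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Field
 tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |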
Thanks.
Aaron Morton
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
