This document provides an overview of modeling data for Apache Cassandra. It begins by discussing general guidelines and choices for APIs. Then it demonstrates creating tables to model a Twitter clone application using Cassandra Query Language (CQL). Various tables are created to store users, tweets, a user timeline, user metrics, and follower relationships. The document shows how to insert and query sample data in these tables. It uses the example of modeling a Twitter application to illustrate best practices for denormalizing data and designing Cassandra data models around queries.
APACHE CASSANDRA

Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

General Guidelines / API Choice / Example

General Guidelines

Cassandra is good at reading data from a row in the order it is stored.

Typically an efficient data model will denormalize data and use the storage engine order.

To create a good data model, understand the queries your application requires.

API Choice

Multiple APIs?
Initially only a Thrift / RPC API, used by language-specific clients.

Multiple APIs...
Cassandra Query Language (CQL) started as a higher level, declarative alternative.

Multiple APIs...
CQL 3 brings many changes. Currently in Beta in Cassandra v1.1.

CQL 3 uses a Table Orientated, Schema Driven, Data Model.
(I said it had many changes.)

Example

Twitter Clone
Previously done with Thrift at WDCNZ
Hello @World #Cassandra - Apache Cassandra in action
http://vimeo.com/49762233

Twitter clone...
using CQL 3 via the cqlsh tool.

    bin/cqlsh -3

Queries?
* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers
Keyspace is a namespace container.

Our Keyspace

    CREATE KEYSPACE cass_college
      WITH strategy_class = 'NetworkTopologyStrategy'
      AND strategy_options:datacenter1 = 1;
Table is a sparse collection of well known, ordered columns.

First Table

    CREATE TABLE User (
      user_name text,
      password text,
      real_name text,
      PRIMARY KEY (user_name)
    );

Some users...

    cqlsh:cass_college> INSERT INTO User
                    ... (user_name, password, real_name)
                    ... VALUES
                    ... ('fred', 'sekr8t', 'Mr Foo');

    cqlsh:cass_college> select * from User;
     user_name | password | real_name
    -----------+----------+-----------
          fred |   sekr8t |    Mr Foo
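One way to picture "a sparse collection of well known columns" is the small Python sketch below. It is only an in-memory illustration, not how Cassandra is implemented and not a client API: the `SparseTable` class and its methods are hypothetical names made up for this example. The idea it shows is that the schema fixes the column names, each row stores only the columns that were written, and unwritten columns read back as null.

```python
# Hypothetical in-memory stand-in for the User table; not the Cassandra API.
class SparseTable:
    def __init__(self, columns, key):
        self.columns = columns  # the "well known" columns, in schema order
        self.key = key          # name of the primary key column
        self.rows = {}          # key value -> {column: value}, written columns only

    def insert(self, **values):
        unknown = set(values) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        stored = self.rows.setdefault(values[self.key], {})
        # Only the supplied non-key columns are stored for this row.
        stored.update({c: v for c, v in values.items() if c != self.key})

    def select(self, key_value):
        stored = self.rows.get(key_value)
        if stored is None:
            return None
        # Columns that were never written are returned as None (null).
        row = {c: stored.get(c) for c in self.columns}
        row[self.key] = key_value
        return row


users = SparseTable(("user_name", "password", "real_name"), key="user_name")
users.insert(user_name="fred", password="sekr8t", real_name="Mr Foo")
users.insert(user_name="bob", password="pwd")  # no real_name supplied
print(users.select("bob"))
```

In this sketch the row for 'bob' holds a single stored column, yet a select still reports every column in the schema.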
Some users...

    cqlsh:cass_college> INSERT INTO User
                    ... (user_name, password)
                    ... VALUES
                    ... ('bob', 'pwd');

    cqlsh:cass_college> select * from User where user_name = 'bob';
     user_name | password | real_name
    -----------+----------+-----------
           bob |      pwd |      null

Data Model (so far)
    CF / Value | User
    -----------+-------------
    user_name  | Primary Key

Tweet Table

    CREATE TABLE Tweet (
      tweet_id bigint,
      body text,
      user_name text,
      timestamp timestamp,
      PRIMARY KEY (tweet_id)
    );

Tweet Table...

    cqlsh:cass_college> INSERT INTO Tweet
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (1, 'The Tweet', 'fred', 1352150816917);

    cqlsh:cass_college> select * from Tweet where tweet_id = 1;
     tweet_id |      body |                timestamp | user_name
    ----------+-----------+--------------------------+-----------
            1 | The Tweet | 2012-11-06 10:26:56+1300 |      fred
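The integer written to the timestamp column is a count of milliseconds since the Unix epoch, which cqlsh renders as a date and time. The Python sketch below reproduces that rendering for the value used above. The helper name `render_timestamp` is hypothetical, and the UTC+13 offset is an assumption read off the `+1300` in the output rather than anything Cassandra stores.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the cqlsh session ran in a UTC+13 time zone, as the output suggests.
NZDT = timezone(timedelta(hours=13))


def render_timestamp(millis, tz=NZDT):
    """Format milliseconds since the Unix epoch at one-second precision."""
    moment = datetime.fromtimestamp(millis // 1000, tz=tz)
    return moment.strftime("%Y-%m-%d %H:%M:%S%z")


print(render_timestamp(1352150816917))  # 2012-11-06 10:26:56+1300
```

The display drops the milliseconds, so two tweets written a millisecond apart show the same time even though the stored values differ.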
Data Model (so far)
    CF / Value | User        | Tweet
    -----------+-------------+-------------
    user_name  | Primary Key | Field
    tweet_id   |             | Primary Key

UserTweets Table

    CREATE TABLE UserTweets (
      tweet_id bigint,
      user_name text,
      body text,
      timestamp timestamp,
      PRIMARY KEY (user_name, tweet_id)
    );

UserTweets Table...

    cqlsh:cass_college> INSERT INTO UserTweets
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (1, 'The Tweet', 'fred', 1352150816917);

    cqlsh:cass_college> select * from UserTweets where user_name='fred';
     user_name | tweet_id |      body |                timestamp
    -----------+----------+-----------+--------------------------
          fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

    cqlsh:cass_college> select * from UserTweets where user_name='fred' and tweet_id=1;
     user_name | tweet_id |      body |                timestamp
    -----------+----------+-----------+--------------------------
          fred |        1 | The Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

    cqlsh:cass_college> INSERT INTO UserTweets
                    ... (tweet_id, body, user_name, timestamp)
                    ... VALUES
                    ... (2, 'Second Tweet', 'fred', 1352150816918);

    cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
     user_name | tweet_id |         body |                timestamp
    -----------+----------+--------------+--------------------------
          fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300
          fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300

UserTweets Table...

    cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
     user_name | tweet_id |         body |                timestamp
    -----------+----------+--------------+--------------------------
          fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
          fred |        1 |    The Tweet | 2012-11-06 10:26:56+1300

UserTimeline

    CREATE TABLE UserTimeline (
      tweet_id bigint,
      user_name text,
      body text,
      timestamp timestamp,
      PRIMARY KEY (user_name, tweet_id)
    );

Data Model (so far)
    CF / Value | User        | Tweet       | User Tweets           | User Timeline
    -----------+-------------+-------------+-----------------------+-----------------------
    user_name  | Primary Key | Field       | Primary Key           | Primary Key
    tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component

UserMetrics Table

    CREATE TABLE UserMetrics (
      user_name text,
      tweets counter,
      followers counter,
      following counter,
      PRIMARY KEY (user_name)
    );

UserMetrics Table...

    cqlsh:cass_college> UPDATE
                    ... UserMetrics
                    ... SET
                    ... tweets = tweets + 1
                    ... WHERE
                    ... user_name = 'fred';

    cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
     user_name | followers | following | tweets
    -----------+-----------+-----------+--------
          fred |      null |      null |      1

Data Model (so far)
    CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
    -----------+-------------+-------------+-----------------------+-----------------------+--------------
    user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
    tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |

Relationships

    CREATE TABLE Followers (
      user_name text,
      follower text,
      timestamp timestamp,
      PRIMARY KEY (user_name, follower)
    );

    CREATE TABLE Following (
      user_name text,
      following text,
      timestamp timestamp,
      PRIMARY KEY (user_name, following)
    );

Relationships

    INSERT INTO Following (user_name, following, timestamp)
    VALUES ('bob', 'fred', 1352247749161);

    INSERT INTO Followers (user_name, follower, timestamp)
    VALUES ('fred', 'bob', 1352247749161);

Relationships

    cqlsh:cass_college> select * from Following;
     user_name | following |                timestamp
    -----------+-----------+--------------------------
           bob |      fred | 2012-11-07 13:22:29+1300

    cqlsh:cass_college> select * from Followers;
     user_name | follower |                timestamp
    -----------+----------+--------------------------
          fred |      bob | 2012-11-07 13:22:29+1300
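The slides list "Post Tweet to Followers" among the queries but do not show the write path. The Python sketch below is one plausible reading, offered as an assumption rather than the talk's method: each tweet is copied (denormalized) into Tweet, the author's UserTweets partition and every follower's UserTimeline partition, so that each read hits one partition in tweet_id order. Plain dicts stand in for the tables, and every function name here is hypothetical. None of this is the Cassandra API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables above.
tweet = {}                          # Tweet: tweet_id -> row
user_tweets = defaultdict(dict)     # UserTweets: user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)   # UserTimeline: user_name -> {tweet_id: row}
followers = defaultdict(dict)       # Followers: user_name -> {follower: timestamp}
following = defaultdict(dict)       # Following: user_name -> {following: timestamp}
user_metrics = defaultdict(lambda: defaultdict(int))  # UserMetrics counters


def follow(follower, followed, timestamp):
    # Both directions are written, as in the two INSERT statements above.
    following[follower][followed] = timestamp
    followers[followed][follower] = timestamp
    user_metrics[follower]["following"] += 1
    user_metrics[followed]["followers"] += 1


def post_tweet(tweet_id, user_name, body, timestamp):
    row = {"tweet_id": tweet_id, "user_name": user_name,
           "body": body, "timestamp": timestamp}
    tweet[tweet_id] = row                     # Get Tweet by ID
    user_tweets[user_name][tweet_id] = row    # List Tweets by User
    for follower in followers[user_name]:     # assumed fan-out on write
        user_timeline[follower][tweet_id] = row
    user_metrics[user_name]["tweets"] += 1


def read_partition(partition, newest_first=False):
    # Rows come back ordered by tweet_id, the second primary key component.
    return [partition[k] for k in sorted(partition, reverse=newest_first)]


follow("bob", "fred", 1352247749161)
post_tweet(1, "fred", "The Tweet", 1352150816917)
post_tweet(2, "fred", "Second Tweet", 1352150816918)
for row in read_partition(user_timeline["bob"], newest_first=True):
    print(row["tweet_id"], row["body"])
```

The trade-off in this sketch is more writes per tweet in exchange for timeline reads that need no join and no sort across users.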
Data Model
    CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows     | Followers
    -----------+-------------+-------------+-----------------------+-----------------------+--------------+-------------+-----------
    user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Primary Key | Field
    tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |             |

Thanks.

Aaron Morton
@aaronmorton
www.thelastpickle.com