
Inner Architecture of a Social Networking System


Petr Kunc, Jaroslav Škrabálek,
Tomáš Pitner

Who am I?
Master student of FI MU
Member of LaSArIS
Webtops
Modern web applications
Cloud (and distributed) solutions

First-time conference speaker

Social network systems


Hundreds of millions of users => advanced software architecture and technologies
High performance
Scalability
Billions of rows

Table of contents
What and why?
Takeplace
Which way?
Hadoop
HBase
Memcached
How?
Architecture and design
Was it worth it?
Testing

Takeplace

Takeplace and Social Networking
Web-based service facilitating the organization of events, based on meeting, sharing and communication.
Emphasis on social and interpersonal interaction
Easy tool for commenting on conferences (feedback)
Professional user network: creating relations between the academic and professional worlds around common interests
Analysis and statistics
Goal: to behave like Facebook, with Twitter-like relations, and to be used like LinkedIn.

Functional requirements
Entities can create asymmetric relations
Posts
Walls and news feed
Comments and likes

Technology requirements
Linux and Cloud
Data-oriented application
High throughput
Heavy loads
Concurrent requests
Caching tool

Relational databases
Fixed schema, ACID, indexes, joins
Problems
Scaling up the dataset size
Read/write concurrency

Typical evolution of a MySQL deployment: production => Memcached (losing ACID) => more expensive servers => denormalizing => materializing the most common queries => dropping triggers and indexes (each step is a compromise or expensive)

HBase

Inspired by Google Bigtable

Tables split into regions (ranges of row keys)
Multidimensional, sorted, persistent, distributed key-value map
4 dimensions: row, column family (CF), column and version
Keys and values are arrays of bytes

Example
{
  aa : {
    cf : {
      c1 : data,
      c2 : data
    },
    cf2 : {
      anyByteArray : true
    }
  },
  ab : { }
}
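The example above can be read as nested sorted maps: row -> column family -> column -> version -> value. The following minimal in-memory sketch models that shape with Java `TreeMap`s. It is not the HBase client API, and the class `SortedTableSketch` and its methods are hypothetical. Row keys are compared as strings for readability, whereas HBase compares raw bytes. Versions are kept newest first, so a plain read returns the latest value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's logical data model:
// row -> column family -> column -> version (timestamp) -> value (bytes).
class SortedTableSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String column, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // versions sorted in descending order, so the newest version comes first
            .computeIfAbsent(column, c -> new TreeMap<>(Collections.reverseOrder()))
            .put(version, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String family, String column) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> families = rows.get(row);
        if (families == null || !families.containsKey(family)) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = families.get(family).get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return new String(versions.firstEntry().getValue(), StandardCharsets.UTF_8);
    }

    // Row keys are kept sorted, which is what makes range scans cheap.
    NavigableSet<String> rowKeys() {
        return rows.navigableKeySet();
    }

    public static void main(String[] args) {
        SortedTableSketch t = new SortedTableSketch();
        t.put("ab", "cf", "c1", 1L, "old");
        t.put("aa", "cf", "c1", 1L, "v1");
        t.put("aa", "cf", "c1", 2L, "v2");
        t.put("aa", "cf2", "anyByteArray", 1L, "true");
        System.out.println(t.getLatest("aa", "cf", "c1")); // v2
        System.out.println(t.rowKeys());                   // [aa, ab]
    }
}
```

Because every level is sorted, the same structure supports both point lookups and ordered scans over row keys. A real table additionally spreads these rows across regions.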

Hadoop
Software framework forming the backbone of a distributed environment
MapReduce
HDFS
HBase

No real indexes
Automatic partitioning
Scales linearly and automatically
Parallel
Cheap
Not for everyone
Write once, read many
Built on top of Hadoop
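MapReduce splits a job into a map phase, which emits key/value pairs, and a reduce phase, which aggregates all values that share a key. As a hedged illustration, the plain-Java sketch below counts followers per entity from hypothetical (follower, followee) pairs. It does not use Hadoop's API. A parallel stream stands in for parallel map tasks, and `groupingBy` plays the role of the shuffle and reduce steps. The class and method names are made up for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java imitation of the MapReduce model (not Hadoop's API):
//   map:     each (follower, followee) edge -> (followee, 1)
//   shuffle: group the pairs by key
//   reduce:  sum the values of each key
class FollowerCountMapReduce {
    static Map<String, Long> countFollowers(List<String[]> edges) {
        return edges.parallelStream()
                // map phase: emit (followee, 1) for every edge
                .map(edge -> new SimpleEntry<>(edge[1], 1L))
                // shuffle + reduce phase: group by key and sum the values
                .collect(Collectors.groupingBy(
                        SimpleEntry::getKey,
                        TreeMap::new,
                        Collectors.summingLong(SimpleEntry::getValue)));
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[] {"alice", "bob"},
                new String[] {"carol", "bob"},
                new String[] {"bob", "alice"});
        System.out.println(countFollowers(edges)); // {alice=1, bob=2}
    }
}
```

In Hadoop the same two functions would run on many nodes over data stored in HDFS. The sketch only shows the shape of the computation.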

Memcached
Distributed cache
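Typical usage (cache-aside)
// Look in the cache first and fall back to the database on a miss
public Data getData(String query) {
    Data data = memcached.get(query);     // cache lookup, keyed by the query
    if (data == null) {                   // cache miss
        data = database.get(query);       // load from the database
        if (data != null) {
            memcached.set(query, data);   // populate the cache for later reads
        }
    }
    return data;
}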
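Memcached keeps items in memory with an optional expiration time, and when memory runs out it evicts the least recently used items. The sketch below imitates just those two behaviours in a single process, using a `LinkedHashMap` in access order. It is not a distributed cache or a Memcached client. The class `TtlLruCache` is hypothetical, and so are its explicit `nowMillis` parameters, which exist only to make expiry easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-process stand-in for two Memcached behaviours:
// entries expire after a TTL, and the least recently used entry is
// evicted once the cache holds more than `capacity` entries.
class TtlLruCache<K, V> {
    // Named CacheItem (not Entry) so it is not shadowed by Map.Entry inside the map subclass.
    private static final class CacheItem<T> {
        final T value;
        final long expiresAtMillis;

        CacheItem(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, CacheItem<V>> map;

    TtlLruCache(int capacity, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, CacheItem<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CacheItem<V>> eldest) {
                return size() > capacity; // drop the LRU entry when over capacity
            }
        };
    }

    void set(K key, V value, long nowMillis) {
        map.put(key, new CacheItem<>(value, nowMillis + ttlMillis));
    }

    // Returns null on a miss; expired entries are removed and also reported as a miss.
    V get(K key, long nowMillis) {
        CacheItem<V> item = map.get(key);
        if (item == null) {
            return null;
        }
        if (nowMillis >= item.expiresAtMillis) {
            map.remove(key);
            return null;
        }
        return item.value;
    }

    public static void main(String[] args) {
        TtlLruCache<String, String> cache = new TtlLruCache<>(2, 1000);
        cache.set("a", "A", 0);
        cache.set("b", "B", 0);
        cache.get("a", 10);                       // touch "a", so "b" is now least recently used
        cache.set("c", "C", 20);                  // over capacity: "b" is evicted
        System.out.println(cache.get("b", 30));   // null (evicted)
        System.out.println(cache.get("a", 1500)); // null (expired)
    }
}
```

Both mechanisms mean a cached value can disappear at any time. This is why the `getData` pattern always keeps the database as the fallback.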

Architecture

Architecture (2)

To be used in any system


Interface of services (REST, SOAP, …)
User tables
Services: Follow, Wall, Like and Discussion
Security

Architecture (3)

User ID transformation

Data!
Three tables
Entities
Followers, Following, Blocked, Count, News

Walls
Info, text, likes

Discussions (similar to Walls)
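Because relations are asymmetric, a single "follow" has to be recorded on both sides: the followed entity gains a follower, and the follower gains a followee, but no reverse relation is created. The sketch below keeps the Followers, Following and Blocked data as in-memory sets to show how they relate. It says nothing about how the authors actually laid them out in HBase. `RelationStore` and its methods are hypothetical. The blocking rule (if B blocks A, A's follow of B is removed and A cannot follow B again) is an assumption. Counts are derived from set sizes here, whereas the Count data presumably stores them precomputed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of asymmetric relations between entities.
// Assumption: when B blocks A, any A -> B follow is removed and A cannot follow B again.
class RelationStore {
    private final Map<String, Set<String>> followers = new HashMap<>(); // entity -> who follows it
    private final Map<String, Set<String>> following = new HashMap<>(); // entity -> whom it follows
    private final Map<String, Set<String>> blocked = new HashMap<>();   // entity -> whom it blocked

    private static Set<String> setFor(Map<String, Set<String>> table, String id) {
        return table.computeIfAbsent(id, k -> new HashSet<>());
    }

    private static Set<String> view(Map<String, Set<String>> table, String id) {
        return table.getOrDefault(id, Collections.emptySet());
    }

    // A follows B: written on both sides, but B does NOT automatically follow A.
    boolean follow(String a, String b) {
        if (a.equals(b) || view(blocked, b).contains(a)) {
            return false;
        }
        setFor(following, a).add(b);
        return setFor(followers, b).add(a);
    }

    void block(String owner, String target) {
        setFor(blocked, owner).add(target);
        setFor(following, target).remove(owner);
        setFor(followers, owner).remove(target);
    }

    boolean isFollowing(String a, String b) {
        return view(following, a).contains(b);
    }

    // Derived on demand here; a stored count would avoid recomputing it.
    int followerCount(String id) {
        return view(followers, id).size();
    }

    int followingCount(String id) {
        return view(following, id).size();
    }

    public static void main(String[] args) {
        RelationStore r = new RelationStore();
        r.follow("alice", "bob");
        System.out.println(r.isFollowing("alice", "bob")); // true
        System.out.println(r.isFollowing("bob", "alice")); // false
    }
}
```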

Storing data

Row IDs are crucial for performance!
Rows are sorted lexically (by bytes)
Sequential scanner
Row key: UID (constant length) + timestamp yyyymmddhhmmssSSS
Inverted timestamp bytes -> rows ordered from newest to oldest
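The row-key scheme can be made concrete with a short sketch. The assumptions are these: the UID is a non-negative number zero-padded to a hypothetical width of 10 digits, and only the timestamp bytes are inverted. All keys then have the same length, and keys of one user share a prefix, so a scan over that prefix returns the user's rows. Inverting each timestamp byte reverses the unsigned lexicographic order, so newer rows sort first. The class `WallRowKey` is hypothetical. Java's pattern letters differ from the slide's notation (`MM` is month, `HH` is hour, `mm` is minute).

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical row-key layout: fixed-length UID + inverted timestamp.
// With equal-length keys and inverted timestamp bytes, an unsigned
// lexicographic sort lists each user's rows from newest to oldest.
class WallRowKey {
    // yyyyMMddHHmmssSSS: year, month, day, hour, minute, second, millisecond
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    static byte[] rowKey(long uid, LocalDateTime time) {
        if (uid < 0 || uid > 9_999_999_999L) {
            throw new IllegalArgumentException("UID must fit in 10 digits to keep a constant length");
        }
        byte[] uidPart = String.format(Locale.ROOT, "%010d", uid).getBytes(StandardCharsets.US_ASCII);
        byte[] tsPart = time.format(TS).getBytes(StandardCharsets.US_ASCII);
        byte[] key = Arrays.copyOf(uidPart, uidPart.length + tsPart.length);
        for (int i = 0; i < tsPart.length; i++) {
            key[uidPart.length + i] = (byte) ~tsPart[i]; // invert so newer timestamps sort first
        }
        return key;
    }

    public static void main(String[] args) {
        LocalDateTime older = LocalDateTime.of(2012, 3, 1, 10, 0, 0);
        LocalDateTime newer = older.plusSeconds(5);
        List<byte[]> keys = new ArrayList<>(List.of(rowKey(42, older), rowKey(42, newer)));
        keys.sort((a, b) -> Arrays.compareUnsigned(a, b)); // byte order, as a sorted store would use
        System.out.println(Arrays.equals(keys.get(0), rowKey(42, newer))); // true
    }
}
```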

News feed
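Option 1: read the posts of the followed entities one by one (slow)

OR
Option 2: store the news at each follower's profile (great redundancy)

Solution: MEMCACHED!
Post stored in DB => look up the author's followers => store a minimized copy of the post in Memcached => add links to it to the followers' news feeds => reading a feed takes 1 normal query and 1 batch query to Memcached
TTL (LRU eviction)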
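The write path above is a form of fan-out on write. The post is cached once in minimized form, and only a link (the post key) is pushed to each follower's feed. A reader then needs one lookup for the feed's list of keys and one batch lookup for the posts themselves. In the sketch below, plain `HashMap`s stand in for the Followers data and for Memcached, and storing the full post in the database is omitted, as are TTL and LRU eviction. `FeedFanOut`, `publish`, `readFeed` and the `post:` key prefix are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical fan-out-on-write news feed with HashMaps standing in for Memcached.
class FeedFanOut {
    private final Map<String, Set<String>> followersOf;                   // stand-in for Followers data
    private final Map<String, String> postCache = new HashMap<>();        // post key -> minimized post
    private final Map<String, Deque<String>> feedCache = new HashMap<>(); // user -> post keys, newest first
    private final int maxFeedLength;

    FeedFanOut(Map<String, Set<String>> followersOf, int maxFeedLength) {
        this.followersOf = followersOf;
        this.maxFeedLength = maxFeedLength;
    }

    void publish(String author, String postId, String text) {
        String postKey = "post:" + postId;
        postCache.put(postKey, author + ": " + text); // minimized copy stored once
        for (String follower : followersOf.getOrDefault(author, Set.of())) {
            Deque<String> feed = feedCache.computeIfAbsent(follower, k -> new ArrayDeque<>());
            feed.addFirst(postKey);                   // a link, not a copy of the post
            if (feed.size() > maxFeedLength) {
                feed.removeLast();                    // keep only the newest links
            }
        }
    }

    // 1 normal lookup (the feed's key list) + 1 batch lookup (all posts at once).
    List<String> readFeed(String user) {
        Deque<String> keys = feedCache.getOrDefault(user, new ArrayDeque<>());
        Map<String, String> posts = getMulti(keys);
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            String post = posts.get(key);
            if (post != null) {
                result.add(post);                     // skip posts no longer in the cache
            }
        }
        return result;
    }

    // Stand-in for a batch ("multi-get") cache request.
    private Map<String, String> getMulti(Iterable<String> keys) {
        Map<String, String> found = new HashMap<>();
        for (String key : keys) {
            String value = postCache.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        FeedFanOut feeds = new FeedFanOut(Map.of("bob", Set.of("alice", "carol")), 100);
        feeds.publish("bob", "1", "Hello");
        feeds.publish("bob", "2", "Slides are online");
        System.out.println(feeds.readFeed("alice")); // [bob: Slides are online, bob: Hello]
    }
}
```

Storing links instead of full copies keeps the redundancy of option 2 small. With a real cache, the null check in `readFeed` matters because TTL or LRU may already have dropped a post.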

Conclusion
Pros
High volume data distribution
Scalability
High throughput
Heavy data load (write once, read many)

Cons
Losing relations, indexes and triggers
Responsibility for data consistency moves to the application
Still not sure how it will behave when deployed in production
