
“My Life with HBase”

Lars George, CTO of WorldLingo


Apache Hadoop HBase Committer
www.worldlingo.com www.larsgeorge.com
WorldLingo
 Co-founded 1999
 Machine Translation Services
 Professional Human Translations
 Offices in US and UK
 Microsoft Office Provider since 2001
 Web-based services
 Customer Projects
 Multilingual Archive
Multilingual Archive
 SOAP API
 Simple calls
◦ putDocument()
◦ getDocument()
◦ search()
◦ command()
◦ putTransformation()
◦ getTransformation()
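
As a rough sketch only, the Java-side shape of such an interface might look like the following; the operation names come from the list above, but every signature and type is invented for illustration, not WorldLingo's actual SOAP contract.

// Illustrative sketch only: operation names are from the list above,
// all parameter and return types are invented for illustration.
public interface MultilingualArchive {
    String putDocument(byte[] document, String language); // returns a document id
    byte[] getDocument(String documentId, String language);
    String[] search(String query);
    String command(String name, String... args);
    String putTransformation(byte[] transformation);      // returns a transformation id
    byte[] getTransformation(String transformationId);
}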
Multilingual Archive (cont.)
 Already planned, then implemented as a customer project
 Scale:
◦ 500 million documents
◦ Random Access
◦ “100%” Uptime
 Technologies?
◦ Database
◦ Zip-Archives on file system, or Hadoop
RDBMS Woes
 Scaling MySQL hard, Oracle expensive (and
hard)
 Machine cost goes up faster than speed does
 Turn off all relational features to scale
 Turn off secondary indexes too
 Tables can be a problem at sizes as low as
500GB
 Hard to read data quickly at these sizes
 Write speed degrades with table size
 Future growth uncertain
MySQL Limitations
 Master becomes a problem
 What if your write speed is greater than a single machine can handle?
 All slaves must have the same write capacity as the master (can't cheap out on slaves)
 Single point of failure, no easy failover
 Can (sort of) solve this with sharding
Sharding
Sharding Problems
 Requires either a hashing function or
mapping table to determine shard
 Data access code becomes complex
 What if shard sizes become too large?
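
A minimal sketch of the hash-based routing described above (all names are illustrative, not WorldLingo code):

import java.util.List;

// Pick a MySQL shard by hashing the row key.
class ShardRouter {
    private final List<String> shardUrls; // one JDBC URL per shard

    ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    String shardFor(String key) {
        // Mask the sign bit rather than Math.abs (abs overflows for MIN_VALUE)
        int bucket = (key.hashCode() & 0x7fffffff) % shardUrls.size();
        return shardUrls.get(bucket);
    }
}

Note that adding a shard changes shardUrls.size() and therefore remaps almost every key, which is exactly why resharding hurts.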
Resharding
Schema Changes
 What about schema changes or
migrations?
 MySQL not your friend here
 Only gets harder with more data
HBase to the Rescue
 Clustered, commodity(-ish) hardware
 Mostly schema-less
 Dynamic distribution
 Spreads writes out over the cluster
HBase
 Distributed database modeled on Bigtable
◦ Bigtable: A Distributed Storage System for
Structured Data by Chang et al.
 Runs on top of Hadoop Core
 Layers on HDFS for storage
 Native connections to MapReduce
 Distributed, High Availability, High
Performance, Strong Consistency
HBase
 Column-oriented store
◦ Wide table costs only the data stored
◦ NULLs in row are 'free'
◦ Good compression: columns of similar type
◦ Column name is arbitrary
 Rows stored in sorted order
 Can random read and write
 Goal of billions of rows × millions of cells
◦ Petabytes of data across thousands of servers
Tables
 Table is split into roughly equal-sized "regions"
 Each region is a contiguous range of keys, [start, end)
 Regions split as they grow, thus
dynamically adjusting to your data set
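
Conceptually, finding the region for a row key is a floor lookup over sorted start keys; a toy sketch (server names and keys are made up):

import java.util.TreeMap;

public class RegionLookup {
    public static void main(String[] args) {
        // start key of each region -> server hosting it; the region covering
        // a row key is the one with the greatest start key <= that key
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "server1");  // first region: ["", "m")
        regions.put("m", "server2"); // last region: ["m", <end of table>)

        System.out.println(regions.floorEntry("kiwi").getValue());  // server1
        System.out.println(regions.floorEntry("zebra").getValue()); // server2
    }
}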
Tables (cont.)
 Tables are sorted by Row
 Table schema defines column families
◦ Families consist of any number of columns
◦ Columns consist of any number of versions
◦ Everything except table name is byte[]

(Table, Row, Family:Column, Timestamp) → Value


Tables (cont.)
 As a data structure
SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)
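
In Java terms the same structure can be modeled with nested sorted maps; a toy sketch of the semantics, not the actual HBase internals:

import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ToyTable {
    // row key -> (column -> (timestamp, newest first -> value))
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    public void put(String row, String column, long ts, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    public byte[] getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) return null;
        // the reversed comparator puts the newest timestamp first
        return columns.get(column).firstEntry().getValue();
    }
}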
Server Architecture
 Similar to HDFS
◦ Master ≈ Namenode
◦ Regionserver ≈ Datanode
 Often run these alongside each other!
 Difference: HBase stores state in HDFS
 HDFS provides robust data storage
across machines, insulating against failure
 Master and Regionserver fairly stateless
and machine independent
Region Assignment
 Each region from every table is assigned
to a Regionserver
 Master Duties:
◦ Responsible for assignment and handling
regionserver problems (if any!)
◦ When machines fail, move regions
◦ When regions split, move regions to balance
◦ Could move regions to respond to load
◦ Can run multiple backup masters
Master
 The master does NOT
◦ Handle any write requests (not a DB master!)
◦ Handle location finding requests
◦ Not involved in the read/write path
◦ Generally does very little most of the time
Distributed Coordination
 Zookeeper is used to manage master
election and server availability
 Set up as a cluster, provides distributed
coordination primitives
 An excellent tool for building cluster
management systems
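
The core primitive is the ephemeral node; a minimal election sketch (the znode path, quorum address, and timeout are illustrative, not HBase's actual values):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MasterElection {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> {});
        try {
            // First process to create this ephemeral node is the master;
            // the node disappears when its session dies, so a backup
            // master can watch it and take over automatically.
            zk.create("/demo-master", "host1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("became master");
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("another master is active");
        }
    }
}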
HBase Storage Architecture
HBase Public Timeline
 November 2006
◦ Google releases paper on Bigtable
 February 2007
◦ Initial HBase prototype created as Hadoop contrib
 October 2007
◦ First "usable" HBase (shipped with Hadoop 0.15.0)
 December 2007
◦ First HBase User Group
 January 2008
◦ Hadoop becomes TLP, HBase becomes subproject
 October 2008
◦ HBase 0.18.1 released
 January 2009
◦ HBase 0.19.0 released
 September 2009
◦ HBase 0.20.0 released
HBase WorldLingo Timeline
HBase - Example
 Store web crawl data
◦ Table crawl with family content
◦ Row is URL with columns
 content:data stores raw crawled data
 content:language stores http language header
 content:type stores http content-type header
◦ If processing raw data for hyperlinks and
images, add families links and images
 links:<url> column for each hyperlink
 images:<url> column for each image
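
A sketch of writing one crawled page with the 0.20-era Java client; the table layout follows the example above, while the URL and values are made up:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CrawlWrite {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "crawl");
        // Row key is the URL; everything is a plain byte[]
        Put put = new Put(Bytes.toBytes("http://example.com/"));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("data"),
                Bytes.toBytes("<html>...</html>"));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("language"),
                Bytes.toBytes("en"));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("type"),
                Bytes.toBytes("text/html"));
        table.put(put); // autoflush is on by default, so this writes immediately
    }
}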
HBase - Clients
 Native Java client/API
◦ get(Get get)
◦ put(Put put)
 Non-Java clients
◦ Thrift server (Ruby, C++, Erlang, etc.)
◦ REST server (Stargate)
 TableInput/TableOutputFormat for
MapReduce
 HBase shell (jruby)
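
The matching read, again a sketch against the hypothetical crawl table:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CrawlRead {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "crawl");
        Get get = new Get(Bytes.toBytes("http://example.com/"));
        get.addColumn(Bytes.toBytes("content"), Bytes.toBytes("type"));
        Result result = table.get(get);
        byte[] type = result.getValue(Bytes.toBytes("content"), Bytes.toBytes("type"));
        System.out.println(Bytes.toString(type)); // e.g. "text/html"
    }
}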
Scaling HBase
 Add more machines to scale
◦ Automatic rebalancing
 Base model (BigTable) scales past 1000TB
◦ No inherent reason why HBase couldn't
What to store in HBase
 Maybe not your raw log data...
 ... but the results of processing it with
Hadoop!
 By storing the refined version in HBase,
can keep up with huge data demands and
serve to your website
!HBase
 “NoSQL” Database!
◦ No joins
◦ No sophisticated query engine
◦ No transactions (sort of)
◦ No column typing
◦ No SQL, no ODBC/JDBC, etc. (but there is
HBql now!)
 Not a replacement for your RDBMS...
 Matching Impedance!
Why HBase?
 Datasets are reaching Petabytes
 Traditional databases are expensive to
scale and difficult to distribute
 Commodity hardware is cheap and
powerful (but HBase can make use of
powerful machines too!)
 Need for random access and batch
processing (which Hadoop does not
offer)
Numbers
 Single reads are 1-10ms depending on
disk seeks and caching
 Scans can return hundreds of rows in
dozens of ms
 Serial read speeds
Multilingual Archive (cont.)
 44 Dell PESC1435, 12GB RAM, 2 x 1TB
SATA drives
 Java 6
 Tomcat 5.5
 88 Xen domUs
◦ Apache
◦ Hadoop/HBase
◦ Tomcat application servers
 Currently split into two clusters
Lucene Search Server
 43 fields indexed
 166GB size
 Automated merging/warm-up/swap
 Looking into scalable solution
◦ Katta
◦ Hyper Estraier
◦ DLucene
◦ …
 Sorting?
Multilingual Archive (cont.)
 5 Tables
 Up to 5 column families
 XML Schemas
 Automated table schema updates
 Standard options tweaked over time
◦ Garbage Collection!
 MemCached(b) layer
Layers
 Network: Firewall
 LWS: Director 1 … Director n
 Web: Apache 1 … Apache n
 App: Tomcat 1 … Tomcat n
 Cache: MemCached 1 … MemCached n
 Data: HBase
Map/Reduce
 Backup/Restore
 Index building
 Cache filling
 Mapping
 Updates
 Translation
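
Such jobs read from HBase via TableInputFormat; a minimal row-counting sketch over the crawl table from earlier (the job wiring is an assumption about the 0.20-era mapreduce API):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CrawlRowCount {
    // TableInputFormat hands the mapper one Result per row
    static class CountMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            context.getCounter("crawl", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "crawl-row-count");
        job.setJarByClass(CrawlRowCount.class);
        TableMapReduceUtil.initTableMapperJob("crawl", new Scan(),
                CountMapper.class, ImmutableBytesWritable.class, Result.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0); // map-only; the count ends up in a job counter
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}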
HBase - Problems
 Early versions (before HBase 0.19.0!)
◦ Data loss
◦ Migration nightmares
◦ Slow performance

 Current version
◦ Read HBase Wiki!!!
 Single point of failure (name node only!)
HBase - Notes
 RTFM(ine)

 HBase Wiki, IRC Channel


 Personal Experience:
◦ Max. file handles (32k+)
◦ Hadoop xceiver limits (NIO?)
◦ Redundant meta data (on name node)
◦ RAM (4GB+)
◦ Deployment strategy
◦ Garbage collection (use CMS, G1?)
◦ Maybe not mix batch and interactive?
Graphing
 Use supplied Ganglia context or JMX
bridge to enable Nagios and Cacti
 JMXToolkit: swiss army knife for JMX
enabled servers:
http://github.com/larsgeorge/jmxtoolkit
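
For reference, polling a metric over plain JMX could look like this; the service URL, MBean, and attribute names are assumptions that vary by version and configuration:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
    public static void main(String[] args) throws Exception {
        // Host, port, and names below are placeholders; check your setup
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://regionserver1:10102/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "hadoop:service=RegionServer,name=RegionServerStatistics");
            System.out.println(mbsc.getAttribute(name, "requests"));
        } finally {
            jmxc.close();
        }
    }
}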
HBase - Roadmap
 HBase 0.20.x “Performance”
◦ New Key Format – KeyValue
◦ New File Format – HFile
◦ New Block Cache – Concurrent LRU
◦ New Query and Result API
◦ New Scanners
◦ Zookeeper Integration – No SPOF in HBase
◦ New REST Interface
◦ Contrib
 Transactional Tables
 Secondary Indexes
 Stargate
HBase - Roadmap (cont.)
 HBase 0.21.x “Advanced Concepts”
◦ Master Rewrite – More Zookeeper
◦ New RPC Protocol (Avro)
◦ Multi-DC Replication
◦ Intra Row Scanning
◦ Further optimizations on algorithms and data
structures
◦ Discretionary Access Control
◦ Coprocessors
Questions?
 Email: lars@worldlingo.com
larsgeorge@apache.org
lars@larsgeorge.com
 Blog: www.larsgeorge.com
 Twitter: larsgeorge
