Topic 3 - Big Data Characteristics
Technology: Relational, Federated, Map/Reduce
Eng. N.F Thusabantu
Content
1. Introduction
2. Volume, Velocity, Variety, Veracity, Value
3. What is a database
4. Modern database types
o MongoDB
o Cassandra
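# Assumes conn is an already-open DB-API connection returned by a driver's connect() call.
cursor = conn.cursor()
cursor.execute('SELECT * FROM db_name.Table')  # db_name.Table is a placeholder name
rows = cursor.fetchall()  # collect the result rows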
• NoSQL (non-relational) databases have existed since the late 1960s, but
did not acquire the "NoSQL" moniker until a surge of popularity in the
early 21st century, triggered by the needs of Web 2.0 companies.
COLUMN: Accumulo, Cassandra, Scylla, Druid, HBase, Vertica
DOCUMENT: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
KEY-VALUE: Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper
GRAPH: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso
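The four families listed above differ mainly in how a record is shaped. The sketch below is only an illustration using plain Python structures, not the API of any of the listed products. The record (a user named "Ada" living in "Harare"), the key formats, and the helper `city_of` are all hypothetical.

```python
# Hypothetical record used only to contrast the four data models.

# Column-family: a row key maps to column families, each holding columns.
column_store = {
    "user:1": {"profile": {"name": "Ada", "city": "Harare"}},
}

# Document: one self-contained, possibly nested document per record.
document_store = [
    {"_id": 1, "name": "Ada", "address": {"city": "Harare"}},
]

# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:1:name": "Ada", "user:1:city": "Harare"}

# Graph: nodes plus labelled edges between them.
graph_nodes = {1: {"name": "Ada"}, 2: {"name": "Harare"}}
graph_edges = [(1, "LIVES_IN", 2)]


def city_of(user_id: int) -> str:
    """Follow the LIVES_IN edge from a user node to its city node."""
    for source, label, target in graph_edges:
        if source == user_id and label == "LIVES_IN":
            return graph_nodes[target]["name"]
    raise KeyError(user_id)
```

The same fact (where the user lives) is reached by a column lookup, a nested field, a composite key, or an edge traversal, depending on the model.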
5. Database Architectures
• The design of a DBMS depends on its architecture, which can be
centralized, decentralized, or hierarchical.
• Each table has one or more columns, each holding a data category, and
each row holds one data instance with a value for the categories
defined in the columns.
• Relationships between tables can then be defined with foreign keys:
a foreign key is a field in one table that references the primary key
of another table.
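The primary key / foreign key relationship described above can be sketched with Python's built-in sqlite3 module. The table and column names (`department`, `employee`, `department_id`) are assumptions chosen for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Parent table: id is the primary key.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Child table: department_id is a foreign key referencing department.id.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL REFERENCES department(id))"
)

conn.execute("INSERT INTO department (id, name) VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee (id, name, department_id) VALUES (1, 'Ada', 1)")

# The foreign key lets the two tables be joined back together.
rows = conn.execute(
    "SELECT employee.name, department.name FROM employee"
    " JOIN department ON employee.department_id = department.id"
).fetchall()

# A row that points at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO employee (id, name, department_id) VALUES (2, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here `rows` holds the joined employee and department names, and `rejected` records that the database refused the dangling reference.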
Understanding SQL
• The Structured Query Language (SQL) is the standard user
and application program interface for a relational database.
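As a minimal sketch of SQL acting as the application program interface, the example below sends SQL statements through Python's DB-API using the built-in sqlite3 module. The `reading` table and its sample values are hypothetical. Other relational databases would use their own driver in place of sqlite3, and placeholder syntax can differ between drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Data definition: create a table.
cursor.execute("CREATE TABLE reading (sensor TEXT, value REAL)")

# Data manipulation: insert rows using ? placeholders instead of string formatting.
cursor.executemany(
    "INSERT INTO reading (sensor, value) VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)
conn.commit()

# Query: aggregate per sensor.
cursor.execute("SELECT sensor, AVG(value) FROM reading GROUP BY sensor ORDER BY sensor")
averages = cursor.fetchall()

# Query: filter with a parameter.
cursor.execute("SELECT COUNT(*) FROM reading WHERE sensor = ?", ("a",))
count_a = cursor.fetchone()[0]

conn.close()
```

The application states what data it wants, and the DBMS decides how to retrieve it.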
Transaction Control
Query Processing
• Distribution of Control
o Degree to which individual DBMSs can operate independently
o Logically integrated multiple DBMS
o Federated DBMS
o Multidatabase system
o Single DBS
o Many DBSs in a local area network
o Many DBSs in a wide area network
Multiple Sites
Single DBS
• Benefits of distribution
o Improved access times
o Improved availability
o Improved reliability
• Data models
o Structures
o Constraints
o Query languages
• Linear scaling in the ideal case. It is designed to run on cheap,
commodity hardware.
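Assuming the bullet above refers to the Map/Reduce model named at the start of these slides, the following single-process sketch shows the map, shuffle, and reduce steps on a word count. It is not a distributed implementation. The function names and input strings are hypothetical. In a real cluster the map and reduce calls would run in parallel on many machines, which is where the near-linear scaling comes from.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is mapped independently, so this step could be spread over workers.
    pairs = [pair for document in documents for pair in map_phase(document)]
    # Each key is reduced independently as well.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big", "data volume"])
```

Because no map call depends on another, adding workers ideally divides the work evenly among them.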