GMDB
5 Conclusions
A Glimpse of Huawei
[Figure: Huawei data management portfolio — IT DBMS products (OLTP DBMS, OLAP DBMS, Big Data platform, NoSQL) and CT DBMS products for device/edge data management, supporting telecom infrastructure cloudification]
In a cloudified system:
• Logic and state are separated
• Load balancing is distributed
• Each layer is distributed
• Each layer is scalable
NFV Framework
[Figure: Cloud OS deployed across data centers]
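The "logic and state are separated" principle above can be sketched in a few lines: handlers are stateless, all session state lives in a shared store, so any replica behind the load balancer can serve any request. Names below are illustrative, not part of any Huawei API.

```python
# Minimal sketch of logic/state separation in a cloudified service.
# The StateStore stands in for a distributed KV store; handlers keep
# no local state, so replicas are interchangeable and freely scalable.

class StateStore:
    """Illustrative stand-in for a shared, replicated state store."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

def handle_request(store, session_id, amount):
    """Stateless handler: read state, apply logic, write state back."""
    balance = store.get(session_id, 0)
    balance += amount
    store.put(session_id, balance)
    return balance

store = StateStore()
# Two requests for the same session may hit different replicas;
# because state is external, the result is the same either way.
handle_request(store, "user-42", 100)
print(handle_request(store, "user-42", 50))  # 150
```

Because each layer holds no sticky state, both the handler tier and the store tier can be scaled out independently, which is exactly what the slide's "each layer is scalable" bullet claims.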
Online banking becoming distributed

1985–1998: Computerized banking
• Computerized accounting; data recorded in standalone computers
• Earliest IT systems based on IBM mainframe computers

1998–2008: Computer-assisted banking
• Machines moved from branches to HQ
• Interconnection across branches all over China
• IT covering all services
• Data centers with centralized, platform-based data
• Smaller branches

After 2008: Revolutionary
• Big Data; distributed architecture over commodity hardware
• Mobile connectivity; AI; IoT; blockchain; omni-channel
• Mobile banking apps on smartphones; Virtual Teller Machine (VTM)

Benefits:
• High availability across multiple DCs
• Launch businesses faster
• Lower cost

[Chart x-axis: customer requirement complexity, with service milestones — bank book, card, telegraphic transfer, universal deposit and withdrawal, real-time, e-banking, mobile app, intelligent, personalized, off-bank/off-cashier]
Source: International Data Corporation (IDC) and Gartner
Global Mobile Traffic Growth Forecast
[Chart: global mobile data traffic forecast; M2M = machine-to-machine; 1 exabyte = 1,000 PB]
Sources: Cisco Visual Networking Index: Global Data Traffic Forecast Update, 2017–2022, White Paper (https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-); DBS Asian Insights, "Internet of Things: The Pillar of Artificial Intelligence", June 28, 2018
Huawei MPPDB System Overview
Observations
Many customers' OLTP transactions access only a single shard
E.g., Internet banking users mostly access their own data
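This observation is what makes a GTM-Lite-style optimization possible: a transaction that touches only one shard can commit locally, and only cross-shard transactions need the global transaction manager (GTM). A minimal routing sketch, with our own illustrative names and a simple modulo sharding scheme (not GaussDB internals):

```python
# Illustrative sketch: classify transactions by the set of shards they
# touch. Single-shard transactions take the cheap local-commit path;
# multi-shard ones are coordinated through the GTM.

NUM_SHARDS = 4

def shard_of(account_id):
    """Toy sharding function: modulo over a numeric key."""
    return account_id % NUM_SHARDS

def classify(txn_account_ids):
    """Return the commit path for a transaction over the given keys."""
    shards = {shard_of(a) for a in txn_account_ids}
    return "local-commit" if len(shards) == 1 else "gtm-coordinated"

print(classify([8, 12]))  # both on shard 0 -> local-commit
print(classify([8, 13]))  # shards 0 and 1 -> gtm-coordinated
```

If most workload traffic takes the local-commit path (as in the Internet-banking example), the GTM stops being a scalability bottleneck, which is the effect the scalability chart below measures.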
GTM-LITE SCALABILITY
[Chart: TPMC (0 to 1,200,000) vs. cluster size (1, 2, 4, 8 nodes) for GTM-Lite SS, GTM-Lite MS, Base_SS, and Base_MS]
Autonomous Vehicles
Predicate Cache
• Uses KNN (k-nearest neighbors) to find the best match among similar cached predicates
• Currently handles only selection predicates
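A minimal sketch of the idea, with illustrative names (not GaussDB internals): cache the observed selectivity of each executed predicate keyed by its constant, and estimate a new constant's selectivity by averaging its K nearest cached neighbors.

```python
# Toy predicate cache for a parameterized selection predicate.
# Each entry is (constant, observed_selectivity); estimation averages
# the K nearest cached constants (K=5, as in the experiments below).

from bisect import insort

class PredicateCache:
    def __init__(self, k=5):
        self.k = k
        self.entries = []  # sorted list of (constant, observed selectivity)

    def record(self, const, selectivity):
        """Store the selectivity actually observed at execution time."""
        insort(self.entries, (const, selectivity))

    def estimate(self, const):
        """Average the K nearest neighbors' selectivities; None if empty."""
        if not self.entries:
            return None
        nearest = sorted(self.entries, key=lambda e: abs(e[0] - const))[:self.k]
        return sum(sel for _, sel in nearest) / len(nearest)

cache = PredicateCache(k=5)
# e.g. observed selectivities of: l_receiptdate <= l_commitdate - c days
for c, sel in [(1, 0.62), (10, 0.45), (20, 0.30), (30, 0.20),
               (40, 0.12), (60, 0.05), (80, 0.01)]:
    cache.record(c, sel)
print(round(cache.estimate(15), 3))  # 0.338
```

The selectivity numbers here are made up; the point is only the mechanism: observed execution feedback replaces (or corrects) the optimizer's static estimate for similar future predicates.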
Experimental Results: plan execution cache
• 1 TB TPC-H without collecting statistics on the Lineitem table, to show the effectiveness of adaptive optimization
• 16x overall improvement in total runtime
Experimental Results: predicate cache [VLDB 2018]
[Charts: single-parameter and two-parameter experiments]
• Predicate cache with K = 5
• One-parameter (constant) experiment
• Early line items: l_receiptdate <= l_commitdate - c days, where c is between 1 and 80
• Late line items: l_receiptdate > l_commitdate + c days, where c is between 1 and 120
• KNN estimation error less than 4%
• Two-parameter (constants) experiment
• Combine early and late line items into a range: l_receiptdate between l_commitdate - c1 and l_commitdate + c2
• Where c1 ranges from 1 to 80 and c2 from 1 to 120
• KNN estimation error less than 6%
Auto Tuning Summary
Adaptive query optimization with ML algorithms
Greatly improves selectivity estimation accuracy
Significantly reduces the labor cost of performance tuning
Network Function Virtualization (NFV)
[Figure: a database cluster (DB VMs) and network functions (NF VMs) running on shared VNF infrastructure in the data center]
• Transformation in Telecom: specialized hardware with monolithic software ==> virtualized network functions on commodity hardware
• Scalability: database and network functions scale independently
• High availability (> 99.999%)
• Cost saving: commodity hardware and more efficient utilization of resources
Dynamic Data Conversion at Data Nodes
• Data nodes keep only one copy of the data: no data redundancy
• The system maintains schema history information
• Data nodes automatically convert data between versions when serving requests
• Only delta data changes are synced between clients and the database
[Figure: Session 1 (schema V1) and Session 2 (schema V2) access the same records (e.g. IMSI, MM, PDP columns holding 2G/3G/4G values); the data node converts each record on the fly to the requesting session's schema version]
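The conversion step can be sketched as follows, with a made-up two-version schema history (field names and layout are illustrative, not GMDB's actual format):

```python
# Illustrative sketch of version-aware conversion at a data node:
# each record is stored once under the version it was written with;
# the schema history drives conversion to the reader's version.

SCHEMA_HISTORY = {
    1: ["imsi", "state"],           # schema V1
    2: ["imsi", "state", "apn"],    # schema V2 added a column
}
DEFAULTS = {"apn": None}            # fill value for columns a version lacks

def convert(record, from_v, to_v):
    """Convert a stored record between schema versions on the fly."""
    fields = dict(zip(SCHEMA_HISTORY[from_v], record))
    return [fields.get(col, DEFAULTS.get(col)) for col in SCHEMA_HISTORY[to_v]]

stored = ["46000-001", "PDP"]               # record written by a V1 client
print(convert(stored, 1, 2))                # ['46000-001', 'PDP', None]
print(convert(convert(stored, 1, 2), 2, 1)) # back to ['46000-001', 'PDP']
```

Because conversion happens per request at the data node, V1 and V2 sessions can coexist during an online application upgrade without rewriting the stored data.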
GMDB Summary
GMDB: a distributed in-memory KV database for CT
High availability
Extremely low latency
Relaxed ACID
Low resource consumption
Online schema evolution
Supports online application upgrades
Dynamically converts data with different schemas at data nodes
Future work:
Support for more types of online schema changes
Hardware acceleration to further reduce latency and CPU consumption
Motivation of Edge Computing
Cloud computing paradigm
Both data and computation are in the cloud
Cloud computing benefits: high availability, high elasticity, no/low management overhead, pay-per-use, etc.
IoT devices are becoming pervasive in our lives
E.g., smartphones, video surveillance cameras, autonomous vehicles, etc.
They produce/consume huge amounts of data
Advantages of edge computing over cloud computing
Low latency
Reduced network bandwidth consumption
Better privacy by avoiding sending sensitive data to the cloud
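The bandwidth and privacy points can be illustrated with a toy edge-side aggregator (names and window size are ours): raw sensor readings stay at the edge, and only per-window summaries are shipped to the cloud.

```python
# Toy sketch of edge-side aggregation: the edge node reduces every
# window of raw readings to one summary record, so uplink traffic
# shrinks and raw (potentially sensitive) data never leaves the edge.

def edge_summarize(readings, window=10):
    """Aggregate raw readings locally; upload one record per window."""
    summaries = []
    for i in range(0, len(readings), window):
        chunk = readings[i:i + window]
        summaries.append({
            "count": len(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "mean": sum(chunk) / len(chunk),
        })
    return summaries

raw = list(range(100))            # 100 raw readings produced at the edge
uploaded = edge_summarize(raw)    # 10 summary records sent to the cloud
print(len(raw), "->", len(uploaded))  # 100 -> 10
```

The same shape generalizes to filtering (send only anomalies) or local inference (send only model outputs), which is where the cloud/edge integration mentioned next comes in.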
Integration between Cloud Computing and Edge Computing
Takeaway I: IT/CT Convergence also in DBMS
IT/CT Convergence
• With the constant advancement of DB technologies used in IT and CT, we see a clear convergence
• The convergence is in fact a mutual borrowing and improvement of technologies
• This further increases the chances to borrow and reuse
[Figure: IT DBMS and CT DBMS converging]
Takeaway II: Server DBMS Transitioning to Ubiquitous Data Management
Device and edge are becoming more and more important
• Cloud-centric => ubiquitous across device, edge, and cloud
Application scenarios are becoming much more versatile
• The boundary between DBMS and applications becomes more vague
• New abstractions and common middle tiers can be useful
Core DBMS concepts and technologies are still useful
• E.g., declarative APIs, distributed transactions, query optimization, etc.
Many DBMS research problems may become 100x harder in the ubiquitous computing environment
• E.g., heterogeneity, elasticity, scalability, security, etc.
Takeaway III: Huawei Data Management Progress and Future