InfiniScaleStorage TAR
June 2013
Abstract
This decade is seeing a tremendous proliferation of cloud- and mobility-enabled services, not only in social interaction but also in almost every commercial activity of consuming goods or services. This report analyzes the impact of this techno-economic trend on the IT consumption of large enterprises, which are vigorously re-architecting their infrastructures to enable modern consumption paradigms for their end users. Beyond social media, these trends have significantly impacted business operations in e-commerce, financial services, healthcare, media and application development/deployment. We are all familiar with the consumer side of this impact as end users of enterprises in these verticals. However, there is a greater disruption underway in the business and IT architectures of these enterprises, which this report characterizes and analyzes as InfiniScale Storage Architectures. The report also studies the response of NetApp’s competitors and partners to this trend, and concludes with recommendations for NetApp.
Executive Summary
What? NetApp’s large enterprise customers in the e-commerce, retail, financial services, public sector
and telco/SP verticals are rolling out new analytics-driven, cloud-scale business operations.
These operations are characterized by (a) lean supply-chain management through
application of the Internet of Things and (b) deeper real-time consumer insights through
analysis of social media traces. Both are leading to a new generation of rapidly growing low-
latency, high-throughput data stores optimized for analytics. An emerging API-driven
storage stack, which has almost become a de-facto standard through its OSI-like 7-layer
model, is driving the migration of data management value to layers stacked above data
storage.
Disruptions driven by the cloud business model are leading applications to demand newer
developer-friendly data services and logical data models (such as map-reduce, graph and
columnar stores, which are more sophisticated than the basic file/volume/block models
used traditionally), while changing technology curves, which have produced an abundance
of CPU, memory and networking, are impacting the physical data abstractions and data
distribution of those logical models. NetApp’s current product portfolio targets only the
data storage beneath the physical abstractions in this stack, and hence NetApp needs to
follow the value that has migrated up the stack to the data distribution and data
abstraction layers.
We see the emergence of three new workloads: real-time analytics, session stores and active
blob stores. Traditional storage architectures are stretched to address these emerging
workloads along one or more of the following vectors: cost, scale, latency, throughput and
the need to support non-POSIX, application-driven data models. While these environments
currently pose a large business threat through open source software and commodity
infrastructure, NetApp has the potential to protect its challenged market share by
differentiating in this space with engineered solutions.
Why? Some of the key drivers for this area of work are:
Application middleware and in-memory databases are driving a trend towards doing
fine-grained data management higher in the stack. Most of cDOT’s data management
value moves into that space. Further, new kinds of data management emerge close to
the application that are difficult for cDOT to provide. Not heeding this shift would
mean that storage is relegated to being used as JBOD. This trend presents itself as the
emergence of custom, narrow-focused databases, called data stores.
Enterprises are demanding real-time analytics on most types of data. In many cases
these applications cannot tolerate disk latencies, have very high transaction rates
(millions of transactions per second) and large working sets. Thus, these applications
want a lot of memory on each node and utilize scale-out architectures. SCMs have a
number of end-to-end issues to resolve before they become real in a data center,
but DRAM-based InfiniScale solutions like SAP HANA and Microsoft’s Hekaton are not
waiting. Further, open-source solutions like Cassandra are being used by most of
How? The recommendations involve engineering application-guided agile data layouts that can
accommodate application-defined granularity of data management. We propose the
following broad set of investigations in ATG targeted towards accomplishing this:
Most object stores are good at storing large objects. They can work neither with tiny
objects nor with the volume and velocity of tiny key-value data elements. A KV store,
from that perspective, is an IOPS tier of the object store. Being able to deal with tiny
key-value pairs is a challenging storage problem, because even as relatively colder
data is written to more stable storage, it will be accessed using the same access
mechanisms as when it was in memory. How can a tiny key-value-pair data store
organize data into large back-end IOs to stable storage, for efficient subsequent
retrieval and processing? We speculate that such a store has the potential to become
the unified storage for NoSQL databases.
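One plausible shape for such a store is a log-structured design: buffer tiny key-value writes in memory, then flush them as sorted batches in large sequential back-end IOs, while an index preserves the same get/put access mechanism for hot and cold data alike. The sketch below is purely illustrative; the class, threshold and structures are invented, not a NetApp design.

```python
class TinyKVStore:
    """Toy log-structured store: buffer tiny puts in RAM, then flush
    them as one large sequential segment write (hypothetical sketch)."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}            # recent writes, served at memory speed
        self.segments = []            # stand-in for stable storage: flushed batches
        self.index = {}               # key -> (segment_no, offset) for cold reads
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Sort by key so later range reads enjoy spatial locality on disk.
        batch = sorted(self.memtable.items())
        seg_no = len(self.segments)
        self.segments.append(batch)   # one large back-end IO, not N tiny ones
        for offset, (key, _) in enumerate(batch):
            self.index[key] = (seg_no, offset)
        self.memtable.clear()

    def get(self, key):
        # Same access mechanism whether the datum is hot or already flushed.
        if key in self.memtable:
            return self.memtable[key]
        seg_no, offset = self.index[key]
        return self.segments[seg_no][offset][1]
```

The point of the sketch is the amortization: many tiny writes are absorbed in memory and reach stable storage as a few large, sorted IOs, which is what makes subsequent retrieval and scan-style processing efficient.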
CPU architectures are becoming very potent and very complex. Treating DRAM as
purely random-access has a serious effect on cache effectiveness, so spatial locality of
data access from DRAM is very important for high-transaction, low-latency
workloads. Memory bandwidth also continues to be highly constrained, which further
aggravates the need for its effective utilization, achieved through spatial data locality.
Given these aspects, memory layouts tailored to the nature of the queries being
processed are needed. Thus, application-guided data layouts need to be explored.
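As an illustration of why the layout should follow the query, the toy sketch below contrasts a record-oriented layout with a columnar one for an aggregate query; the schema and field names are hypothetical, and a real system would see the difference as cache-line and bandwidth savings rather than in Python semantics.

```python
# Array-of-records layout: good for point lookups of whole records.
records = [{"ts": i, "price": 100 + i, "qty": i % 5} for i in range(8)]

# Column-major layout: an aggregate over one field touches one
# contiguous array, maximizing spatial locality and cache-line reuse.
columns = {
    "ts":    [r["ts"] for r in records],
    "price": [r["price"] for r in records],
    "qty":   [r["qty"] for r in records],
}

def avg_price_rowwise(recs):
    # Strides over entire records; drags unused fields through the cache.
    return sum(r["price"] for r in recs) / len(recs)

def avg_price_columnar(cols):
    # Scans one dense column; the layout matches the query's access pattern.
    prices = cols["price"]
    return sum(prices) / len(prices)
```

An application-guided layout engine would, in effect, pick between these two organizations (or a hybrid) based on the query mix the application declares.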
Explore storage efficiency in an IOPS-sensitive world by enabling reads over compressed
and encrypted data, with highly selective decompression.
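A minimal sketch of such selective decompression, assuming a simple fixed-size chunking scheme in which each chunk is compressed independently so a read touches only the chunks covering the requested byte range (the chunk size and helper names are invented):

```python
import zlib

CHUNK = 64  # bytes per independently compressed chunk (toy value)

def compress_chunked(data: bytes):
    """Compress fixed-size chunks independently so a later read can
    decompress only the chunks that cover the requested range."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def read_range(chunks, offset, length):
    """Selective decompress: touch only chunks overlapping [offset, offset+length)."""
    first, last = offset // CHUNK, (offset + length - 1) // CHUNK
    buf = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    start = offset - first * CHUNK
    return buf[start:start + length]
```

The same chunk boundary could serve as the encryption unit, so a read decrypts and decompresses only a small, bounded amount of data per IO.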
Explore resiliency options through geo-distributed coding techniques that provide
storage-efficient resiliency.
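As a simplified illustration of storage-efficient geo-coding, the sketch below uses a single XOR parity fragment across k data fragments; a production system would use Reed-Solomon or local-reconstruction codes, so treat this purely as a toy model of the overhead argument.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int = 3):
    """Split data into k equal fragments plus one XOR parity fragment.
    Placing the k+1 fragments in different sites tolerates any single
    site failure at (k+1)/k storage overhead, versus 2x-3x for full
    geo-replication (single-parity toy sketch, not a real code)."""
    assert len(data) % k == 0
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor_bytes(parity, f)
    return frags + [parity]

def recover(frags, lost: int) -> bytes:
    """Rebuild the single lost fragment by XOR-ing the survivors."""
    survivors = [f for i, f in enumerate(frags) if i != lost and f is not None]
    out = survivors[0]
    for f in survivors[1:]:
        out = xor_bytes(out, f)
    return out
```

The efficiency claim is visible in the arithmetic: for k = 3 the coded layout stores 4/3 of the data, while two-way geo-replication stores 2x for the same single-failure tolerance.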
We also call out potential inorganic investments, in the form of three technology
startups in this space.
3 RECOMMENDATIONS ---------------------------------------------------------------------------------------------- 41
3.1 ABSTRACT ARCHITECTURES--------------------------------------------------------------------------------------- 41
3.1.1 InfiniScale Architecture ------------------------------------------------------------------------------------------------------------- 41
3.1.2 Real-time Data Stores --------------------------------------------------------------------------------------------------------------- 41
3.1.3 Capacity-based Data Stores ------------------------------------------------------------------------------------------------------- 42
3.1.4 Summary ------------------------------------------------------------------------------------------------------------------------------- 42
3.2 POTENTIAL ATG INVESTIGATIONS ------------------------------------------------------------------------------- 43
3.2.1 In-memory Data Layout ------------------------------------------------------------------------------------------------------------ 43
3.2.2 On Storage Data Layout ------------------------------------------------------------------------------------------------------------ 45
3.2.3 Storage Efficiency -------------------------------------------------------------------------------------------------------------------- 46
3.2.4 Data Distribution --------------------------------------------------------------------------------------------------------------------- 47
3.2.5 Coding for Resiliency ---------------------------------------------------------------------------------------------------------------- 47
3.2.6 Others ----------------------------------------------------------------------------------------------------------------------------------- 49
3.3 POTENTIAL TECHNOLOGY TARGETS ------------------------------------------------------------------------------ 49
3.3.1 Acunu: Real-time Monitoring and Analytics for High-velocity Data ----------------------------------------------------- 49
3.3.2 FoundationDB: A NoSQL Database with ACID Transactions --------------------------------------------------------------- 50
3.3.3 BangDB: A NoSQL for Real Time Performance -------------------------------------------------------------------------------- 52
4 CONCLUSION --------------------------------------------------------------------------------------------------------- 54
4.1 KEY INSIGHTS ------------------------------------------------------------------------------------------------------ 54
5 REFERENCES----------------------------------------------------------------------------------------------------------- 56
LIST OF TABLES
TABLE 1: FACTORS THAT DIFFERENTIATE INFINISCALE FROM OTHER STORAGES ------------------------------------------------------------- 6
TABLE 2: WORKLOAD CHARACTERISTICS FOR EMERGING DATA STORES-------------------------------------------------------------------- 12
TABLE 3: NETAPP CUSTOMER REFERENCES ------------------------------------------------------------------------------------------------- 13
TABLE 4: PORTFOLIO OF EMC FOR EMERGING DATA STORES ----------------------------------------------------------------------------- 17
TABLE 5: TIPPING OVER TO SHARED-NOTHING DATA STORES ------------------------------------------------------------------------------ 30
The scope of this report does not include cheap, capacity-optimized [$/GB] storage architectures or
cold-data/archival stores; separate CTO office initiatives are in progress around those. The
scope of this report is IT architectures driving active business operations at cloud scale, with the
relevant performance, data management and availability requirements. As datasets pass through
their lifecycle, they would find themselves in lower-SLA, coarsely managed active archives and
eventually in deep/cold archives [Govil08].
1. Cloud-based Services: There has been a flood of cloud-based web services offered in the
aforementioned industry verticals that are leading much of the enterprises’ business growth.
These have led to new requirements for concurrency, security, scale and resiliency from the IT
infrastructure that supports the business operations. As an example, in May 2013 MetLife
launched1 a 360° consolidated customer-view service called “The Wall”, built on the NoSQL
document store MongoDB. This Facebook-like internal cloud service handled 45 million
agreements across 140 million transactions in a short span of 90 days. In a public statement,
MetLife’s CIO has committed investments to transform customer experience using state-of-
the-art InfiniScale technologies.
1 http://www.10gen.com/press/metlife-leapfrogs-insurance-industry-mongodb-powered-big-data-application
2 http://basho.com/assets/basho-casestudy-comcast.pdf
3 http://en.wikipedia.org/wiki/Internet_of_Things
4 http://www.mckinsey.com/insights/business_technology/the_internet_of_things_and_the_future_of_manufacturing
5 http://www.westfieldlabs.com/blog/a-new-global-approach-to-social-media/
6 http://www.zdnet.com/westfield-hires-digital-guru-for-tech-push-1339336946/
1. Multi Data-Model Support: Web-storage requirements have spurred open-community-defined
data models/services that have been widely adopted by application developers. A
differentiated web-storage solution must support the popular data models (such as
tabular/columnar, document and key-value) in order to cater to the diverse needs of web
applications.
7 http://www.openstack.org/
1. Timeliness: Both the data and the analysis on it should be available almost instantaneously.
2. Comprehensiveness: Real-time analysis doesn't involve sampling but complete datasets, like
the analysis needed for the last quarter of a business.
3. Accuracy: Data should be accurate, as much of the data involved is used not for statistical
analysis but for guaranteeing compliance to a generated model.
4. Accessibility: The raw data should be accessible for a few days (to a few weeks) while the
result of analysis should be accessible forever.
5. Performance: Most reports/dashboards of the analytics framework should render in less
than 5 seconds. Most of these operations involve an interactive session; anything more
than 5 seconds is considered unacceptable, while anything less than 2 seconds is considered
“very responsive”.
Properties that are required for a storage system that is to be used for real-time analytics are:
1. Highly available and distributed: The system should have high tolerance to individual node
failures and should make it easy to add multi-data-center support if data affinity or
sovereignty is an issue. It should also be easy to expand the cluster with new nodes
when necessary.
2. Extremely good write performance: Individual writes are expected to be tiny, and there are
a large number of sessions that need to be handled. Both low latency and high throughput
are required from storage.
3. Low latency reads: This is needed for drill-down and interactive analytics. Most of the
analysis is around a range of data elements, with high degree of time or space locality. Thus
storage must be organized to cater to this low-latency on reads.
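One illustrative way to organize storage for this locality is to key events by a coarse time bucket, so a drill-down over a time range becomes a single contiguous scan rather than many random reads. In the sketch below the bucket size, the in-memory sorted list standing in for on-disk order, and all names are assumptions:

```python
from bisect import bisect_left

BUCKET = 60  # seconds of events per bucket (illustrative)

rows = []    # kept sorted by ((bucket, ts), value); stands in for on-disk order

def insert(ts, value):
    key = (ts // BUCKET, ts)
    rows.insert(bisect_left(rows, (key, value)), (key, value))

def range_scan(t0, t1):
    # Locality: every event in [t0, t1) sits in one contiguous slice,
    # so the read path issues one sequential scan, not scattered IOs.
    lo = bisect_left(rows, ((t0 // BUCKET, t0),))
    hi = bisect_left(rows, ((t1 // BUCKET, t1),))
    return [v for _, v in rows[lo:hi]]
```

Log-structured column families in Cassandra-style stores achieve the same effect by making the clustering key a time component.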
Session state has the following properties:
It is not shared.
It is semi-persistent.
It is keyed to a particular user.
It is updated on every interaction.
It needs only limited-scope ACID semantics.
Given these properties, the functionality necessary for a session state store can be greatly simplified
as follows:
Currently, session state stores are built using relational databases, file systems, single-copy in-
memory stores and replicated in-memory stores.
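To make the simplification concrete, here is a hypothetical minimal interface such a store could expose, reflecting the properties above (per-user keying, TTL-based semi-persistence, atomicity scoped to a single key); all class and parameter names are invented:

```python
import time

class SessionStore:
    """Minimal session-state store sketch: unshared, keyed per user,
    semi-persistent via TTL, refreshed on every interaction, with
    atomicity limited to a single key (hypothetical interface)."""

    def __init__(self, ttl_seconds=1800):
        self.ttl = ttl_seconds
        self.data = {}   # user_id -> (state_dict, expiry_time)

    def update(self, user_id, **changes):
        # Limited-scope ACID: the read-modify-write touches one key only,
        # so no cross-key transactions or distributed locks are needed.
        state, _ = self.data.get(user_id, ({}, 0))
        state.update(changes)
        self.data[user_id] = (state, time.time() + self.ttl)  # refresh TTL
        return state

    def get(self, user_id):
        entry = self.data.get(user_id)
        if entry is None or entry[1] < time.time():
            return None  # expired: semi-persistence means state may vanish
        return entry[0]
```

Because every operation is confined to one user's key, the store shards trivially by user ID, which is exactly what makes session stores a natural scale-out workload.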
An example of a session state store is eBay’s metadata service platform9 for all of its web apps – this
service acts as the single source of truth for all its apps for media placement, ad placement, analytics
(cross-sell) and so on. It consists of hundreds of billions of tiny metadata objects constantly updated
by users’ interactions with eBay’s apps. This is a multi-datacenter platform service with a read:write
ratio of 10:1 and a latency SLA of ~500μs (as disclosed to our account team). They currently deploy a
400-node MongoDB cluster with replication across two datacenters, accelerated by PCIe flash hardware.
8 http://www.youtube.com/watch?v=8SP9klEv-Ho
9 http://www.slideshare.net/mongodb/storing-ebays-media-metadata-on-mongodb-by-yuri-finkelstein-architect-ebay
Petascale in capacity with hundreds of billions of objects with variable sizes (~10kB and
larger)
Latency SLA on accessing & ingesting the first bytes of the objects (typically ~10ms)
Emphasis on predictability of performance
Storage & transmission errors need to be detected & corrected
Multi-geo availability of content with disaster recovery built-in
BlobStores are aimed at storing and managing data objects, called blobs, that are much larger than
the objects allowed in the real-time analytics store and the session state store. Blobs are useful
for serving large files, such as video or image files, and for allowing users to store binary large
objects. The most commonly known BlobStores are Microsoft Azure Blob Service and Amazon S3. Here
are some key points about BlobStores:
Globally addressable
Key, value with metadata
Accessed via HTTP
Containers are provisioned on demand through API calls
Unlimited scaling
Commonly available BlobStores, such as the Google App Engine Blobstore and Amazon S3,
consist of three concepts: service, container, and blob.
A BlobStore is a key-value store, such as Amazon S3, in which a user can create containers.
A container is a namespace for the data, and a user can have many of them.
Inside a container, a user stores data as a blob referenced by a name. Commonly, in existing
BlobStores, the combination of a user's account, container, and blob maps directly to an
HTTP URL.
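The account/container/blob naming model can be sketched as follows; the endpoint, class and method names are made up for illustration and do not correspond to any particular vendor's API:

```python
# Hypothetical illustration of the account/container/blob model: the
# triple maps directly to an HTTP URL, making every blob globally
# addressable through plain GET/PUT (endpoint and names are invented).

BASE = "https://blobs.example.com"

def blob_url(account: str, container: str, blob: str) -> str:
    return f"{BASE}/{account}/{container}/{blob}"

class Container:
    """A container is a flat namespace of named blobs under one account."""

    def __init__(self, account: str, name: str):
        self.account, self.name, self.blobs = account, name, {}

    def put(self, blob_name: str, payload: bytes) -> str:
        self.blobs[blob_name] = payload        # value, keyed by blob name
        return blob_url(self.account, self.name, blob_name)

    def get(self, blob_name: str) -> bytes:
        return self.blobs[blob_name]
```

Because the address is just a URL, any HTTP client can fetch a blob, which is what makes these stores convenient for serving large media files directly to end users.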
10 http://engineering.twitter.com/2012/12/blobstore-twitters-in-house-photo.html
11 https://github.com/Netflix/astyanax/wiki/Features
12 http://techblog.netflix.com/2012/01/announcing-astyanax.html
[Table 2: Workload characteristics for emerging data stores13 – columns compare the Traditional Store, Real-time Analytics Store, Session-state Store and Active Blob Store; the table body was not recovered]
13 Based on data collected from existing NetApp customers that include Thomson Reuters, eBay & Apple
12 InfiniScale Storage Architectures NetApp Confidential – Limited Use
1.2.5 Customer References
Below is a compilation of the NoSQL data stores in use at our existing customers. These data stores are not leveraging NetApp storage for (one or
more) reasons of scale, cost, latency or throughput. ONTAP data management is not considered a differentiating factor with these data stores.
Customers prefer on-premise model of InfiniScale storage consumption due to one or more
of the following reasons [Khai11]:
2. Cloud PaaS/IaaS: Deployments built ground-up on the cloud are inherently InfiniScale
in nature, as they exploit the platform’s elasticity and seamless support for client mobility.
Large SPs offer InfiniScale storage as the primary service for persisting unstructured, semi-
structured & structured datasets. Some examples of popular InfiniScale persisted data stores
offered by SPs are:
a. AWS offers DynamoDB, S3 for semi-structured & unstructured data respectively
b. OpenStack-based HP Cloud offers Apache Cassandra as its primary InfiniScale
persisted datastore
c. Azure offers MongoDB as the primary data store for all its .NET based applications
Customers are taking a cloud-first approach while architecting new InfiniScale applications
ground-up. Recent examples of cloud-first architectures from NetApp’s customers include:
[Figure: InfiniScale market size (in $ millions), CY2012–CY2016 – two series growing from $348M and $736M in CY2012 to $1,051M and $2,752M in CY2016, a combined market of ~$3.8bn]
SpringSource [Aug 10, 2009]14: VMware’s acquisition of SpringSource for $420M in Aug 2009
heralded EMC’s foray into InfiniScale application & IT frameworks. SpringSource was the innovator
and driving force behind some of the most popular and fastest growing open-source developer
communities, application frameworks, runtimes, and management tools (including Apache Tomcat,
Groovy & Grails). In just five years, SpringSource established a presence in a majority of the
Global 2000 companies, rapidly delivering a new generation of commercial products and
services. VMware continues to support the principles that have made SpringSource solutions
popular: the interoperability of SpringSource software with a wide variety of middleware software,
and the open-source model that is important to the developer community. Just prior to this
14 http://www.vmware.com/company/news/releases/springsource.html
Cloud Foundry [August 19, 2009]15: SpringSource had planned the acquisition of Cloud Foundry, an
Oakland-based open-source PaaS provider, prior to its own acquisition by VMware. VMware
endorsed this decision and the deal closed as planned right after the SpringSource acquisition. Cloud
Foundry complements SpringSource by allowing the applications developed on it to take full
advantage of elastic cloud computing. Over the years since, VMware has invested in integrating
Cloud Foundry with all popular IaaS platforms – vCloud, AWS and OpenStack. VMware has also
provided an open-source cloud provider interface (CPI) called BOSH for integration into any
infrastructure (IaaS) platform. Cloud Foundry has found great traction with session-state-heavy
InfiniScale applications (such as e-commerce) through its seamless integration with MongoDB. eBay
has developed an e-commerce-as-a-service platform called X.com on top of Cloud Foundry, which it
also uses internally.
GemStone [May 6, 2010]16: GemStone Systems, Inc. was a privately held provider of enterprise data
management solutions based in Beaverton, Oregon. The acquisition advanced
SpringSource/VMware/EMC’s vision of providing the infrastructure necessary for emerging cloud-
centric applications, with built-in availability, scalability, security and performance guarantees for an
elastic session state store. These modern applications require new approaches to data management,
given they will be deployed across elastic, highly scalable, geographically distributed architectures.
With the addition of GemStone’s data management solutions, customers will be able to make the
right data available to the right applications at the right time within a distributed cloud environment.
Greenplum [July 6, 2010]17: EMC acquired the privately held Greenplum Inc. in 2010 and added a
data warehousing technology to enable big data clouds and self-service analytics. Greenplum utilizes
a shared-nothing massively parallel processing (MPP) architecture that has been designed from the
ground up for real-time analytical processing using virtualized x86 infrastructure. Greenplum is
capable of delivering 10 to 100 times the performance of traditional database software at a
dramatically lower cost. Post-acquisition, EMC invested in adding map-reduce Hadoop
capabilities to Greenplum and built a proprietary version of Hadoop called Greenplum HD.
Pivotal Labs [March 20, 2012]18: EMC acquired a boutique mobile / cloud application development
consulting and project management SaaS firm, Pivotal Labs, in March 2012. This is an important
acquisition that added much required talent force to enable EMC’s internal cloud service ambitions
as well as offer this as a professional service to its customers.
Cetas [April 24, 2012]19: VMWare acquired an early stage 18-month old startup Cetas that developed
an elastic cloud friendly query platform on top of Hadoop. It virtualized Hadoop’s architecture into a
cloud friendly stack that could be deployed on AWS or vCloud.
EMC Pivotal Initiative [April 2013]20: After the mixed success of the EMC Unified Analytics Platform
product based on Greenplum & Greenplum HD, EMC & VMware are on the cusp of rolling out a
federated platform-as-a-service called Pivotal. In this joint venture, EMC holds a 69% stake,
contributing the Greenplum & Pivotal Labs technologies. VMware holds the remaining 31% stake with Cloud Foundry,
15 http://classic.cloudfoundry.com/news.html
16 http://www.vmware.com/company/news/releases/spring-gemstone.html
17 http://www.emc.com/about/news/press/2010/20100706-01.htm
18 http://www.emc.com/about/news/press/2012/20120320-02.htm
19 http://gigaom.com/2012/04/24/vmware-buys-big-data-startup-cetas/
20 http://gigaom.com/2013/03/13/the-pivotal-initiative-in-case-you-were-wondering-is-now-official/
Note that Pivotal marks EMC’s entry into a service-based business model competing head-on
with AWS & Azure, while also being able to interoperate with them as pure IaaS platforms (owing to
Cloud Foundry’s BOSH CPIs). This emphasizes the close relationship between InfiniScale
applications and the cloud; EMC’s offering allows its customers to consume its technology portfolio
as a wide catalogue of services. Here is a summary of how EMC Pivotal maps onto InfiniScale
architectures:
Here are the InfiniScale services offered by AWS segmented by workload categories:
(Realtime) Analytics: Apart from seamless support for Apache Cassandra & SAP HANA21, AWS also
recently launched a massively parallel cloud data warehouse called Amazon Redshift22 (currently in
beta). Redshift simplifies integrating datasets in AWS S3 (active blob store) and AWS DynamoDB
(session state store) into a queryable interface for analytics. Redshift guarantees a < $1/GB/year price
21 https://aws.amazon.com/marketplace/b/6153421011/ref=mkt_ste_L3_MP
22 http://aws.amazon.com/redshift/
Session State Store: AWS offers a wide range of low latency data store functionalities for InfiniScale
applications to persist session state data from transactions / interactions. The key service,
DynamoDB23, is based on Amazon’s well-known Dynamo storage engine and offers flexible key-value
interfaces for applications to persist semi-structured data with flexible schemas. An associated
service, ElastiCache24, offers in-memory caching in front of DynamoDB for very low-latency
performance, while DynamoDB itself is backed by SSDs. DynamoDB provides a scale-out architecture
that can rapidly and seamlessly scale from a few thousand users (~100 reads-writes/sec) to many
millions of concurrent users (~100k reads-writes/sec) without requiring the customer to alter the
architecture or the application. This has made DynamoDB very popular25 amongst gaming, social-app,
advertising & e-commerce customers who see volatile surges in demand.
Active Blob Store: AWS offers both map-reduce and RESTful interfaces to blob store data through its
EMR26 and S327 services. It is noteworthy that both these are integrated with AWS RedShift and AWS
DynamoDB, thus letting customers build architectures with seamless integration across the
InfiniScale workloads. Another important data service that lets customers build customized
architectures is AWS Data Pipeline28 that reliably moves data between AWS services.
The most significant benefits of cloud-based InfiniScale architectures are elastically high utilization of
hardware & software resources and extremely simplified manageability, which together bring great
agility and economics to the business.
23 http://aws.amazon.com/dynamodb/
24 http://aws.amazon.com/elasticache/
25 http://www.allthingsdistributed.com/2012/06/amazon-dynamodb-growth.html
26 http://aws.amazon.com/elasticmapreduce/
27 http://aws.amazon.com/s3/
28 http://aws.amazon.com/datapipeline/
MongoDB: MongoDB was developed in 2009 by 10gen as a general-purpose transaction store with
web-friendly document/JSON data model. 10gen built extensive library extensions across all popular
programming languages to let developers seamlessly persist data structures as documents on to
MongoDB. This led to a very high level of uptake of MongoDB amongst InfiniScale app developers
using Java, .NET, SpringSource, Python or Ruby/Rails frameworks. 10gen also built an SQL-like query
interface that helped app developers move over from SQL Server to MongoDB easily. Today,
MongoDB is the de-facto choice for session state stores in web-based InfiniScale apps in e-
commerce, gaming, SaaS & web-based transactions. Documents give a schema-free architecture
(with support for indexes) that brings agility for accommodating changes in schema dynamically (as
data structures are ingested) without any downtime. It also requires minimal admin work, as schema
management is built in. It shards automatically and handles failures through replica sets (which also
help read performance). Being an in-memory database that uses mmap() to persist memory images
to disk, it has shown cache-coherency issues with NFS (especially with journaling on), leading to
reduced write performance over NFS. This has led customers to choose internal HDDs as the preferred
storage architecture (barring a handful of iSCSI/FC deployments). Thus the choice of MongoDB
displaces the NetApp install base due to architectural issues with NAS storage. As with Cassandra, wide
customer adoption of MongoDB has led to revenue losses for NetApp. Some of the large NetApp
customers with MongoDB deployments include eBay, Disney, News Corp., Intuit and Apple.
Built-in manageability that leverages linear scaling of commodity infrastructure & failure
management and ability to sustain very rapid growth in demand with minimal admin
overhead
High performance with extreme flexibility of changing schemas / data models owing to the
non-relational architecture of the DBs themselves
Abundance of IT resources due to high end CPU, memory and network bandwidth available
at commodity prices, leading to shifting of the financial bottleneck to operational efficiencies
Economics around the business value of data that has fine-grained analytics into all modern
business operations percolating deep down into infrastructure architectures
The InfiniScale apps exploit the abundance of commodity IT resources and deliver the fine-grained
business value of data to the operations of enterprises. This is a fundamental value-domain
migration from previous decades, which were about infrastructural efficiency (conserving memory,
disk and CPU resources at high operational cost) and application-agnostic aggregation of IT
resources (such as shared storage), without any architectural underpinning of the business value of
data. As NetApp customers migrate to modern business operations (following the trends above),
they are demanding new product values from vendors like NetApp. It is very important for NetApp
to evolve and support modern product values relevant to this world. As the market-sizing analysis
shows, NetApp would miss out on hyper-growth in this market segment if it continues to support
only the traditional product values.
Thus, InfiniScale is not just impacting and growing in Internet companies but is also home to rapidly
growing data stores in enterprises such as eBay, Intuit, Thomson Reuters, UBS, UHG and the like.
Jay Kidd, our CTO, provided the following sequence of causation that accentuates relevance of
InfiniScale to NetApp:
Demand for real-time analytics will drive creation and adoption of in-memory compute apps
and models
In-memory apps will drive demand for large storage class memory (SCM) extensions to
memory to work on larger working sets. This will drive reduced cost of SCM-loaded systems
and a virtuous cycle of adoption will begin.
The rise of in-memory/SCM stores will give rise to in-memory/SCM data management
models; Intel’s non-volatile memory ‘file’ system for SCM is one example. These data
management models will deal with a cache-line-sized block as the primitive and provide
distribution, protection, recovery and efficiency services, while maintaining low latency.
This data management model will put new demands on the capacity tier to provide efficient
capacity for cool data, excellent latency for warm data to feed the SCM. These capacity
stores must not assume traditional block or file structures to write to disk, but must start
with the performance requirements of the cache-line sized granular objects and figure out
how SSD and HDD can store them. In short, everything we know will change.
One of the fundamental trends is to leverage commodity hardware in InfiniScale solutions, which
allows scaled-out, shared-nothing architectures to be built and operated. As a result, the storage
attached to each host is managed by the middleware on that host, and collectively, across nodes,
this middleware presents an abstraction to the application.
Figure 3: Data Store Stack: The new OSI-like Model for Storage
Figure 3 presents an OSI-like model for emerging data stores, also called the Data Store Stack. Well-
defined, de-facto standard APIs are emerging between each of these layers. The various layers of
the data store stack are explained below:
1. Application: Applications are developed and deployed (typically) on a PaaS platform with
language bundled data service APIs. For example, eBay develops its web-service applications
on CloudFoundry and uses its bundled document store or MongoDB APIs.
2. Data Service: This encapsulates the underlying complexity of the stack and presents a
convenient API to use. For example, MongoDB provides a convenient JSON interface for a
transaction store permitting applications to persist data structures as Mongo documents. A
Data Service is also responsible for offering typical CRUD and/or query interfaces to the
application.
3. Data Model: A data model describes the logical relationships, ordering and organization of
data items, when accessed through their keys. Commonly used data models are key-value
stores (Riak, Acunu), document stores (MongoDB), graph stores (Neo4J) and columnar stores
From a NetApp perspective, it is the data distribution and data abstraction layers where we have an
opportunity to innovate and differentiate with well-engineered products and solutions.
Traditionally, NetApp has focused on the bottom half of the stack and has been most deeply invested
in the data layout, with WAFL. It is imperative that we expand that focus into the data distribution
layer, as our focus must also include addressing the needs of a more geo-dispersed cloud infrastructure.
In the past, we have driven efficiencies by coupling storage resiliency (through RAID) with data
layout. With a more geo-dispersed infrastructure, efficiencies will need to be driven by coupling
storage resiliency with the data distribution layer. This is a fundamental shift in thinking at NetApp,
but the industry is already trending in that direction.
InfiniScale solutions, as of this writing, are fueled primarily by the need to ingest and analyze large
amounts of machine-generated data. Riding on the trend of the Internet of Things, much of this
machine-generated data arrives as a continuous stream from a very large number of sources.
One of the first storage challenges is the ability to ingest large amounts of tiny data from a large
number of data sources, without dropping (losing) any data. If lost, that data packet might have
been carrying anomalous behavior information of the system. So, this is not about being statistically
correct in a large corpus. It is about gathering and analyzing all data. A slightly extreme case of such
data ingest is that of Twitter, which needs to handle close to 5,000 tweets/sec, where the number of
data sources might not be known beforehand. An enterprise deployment would be slightly more
predictable than that. One of the ways in which this workload challenge is addressed is by
minimizing system resource hold time. Thus, asynchronous response mechanisms must be
developed to help scale the solution better and not couple the front-end source-side processing with
the back-end data sink processing.
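The asynchronous decoupling described above can be sketched with a bounded queue between the front-end ingest path and the back-end sink; the function names and the Python threading machinery here are illustrative, not any product's implementation:

```python
import queue
import threading

def make_ingest_pipeline(sink, maxsize=1024):
    """Decouple front-end ingest from back-end processing: the caller
    gets an immediate acknowledgement once the datum is enqueued, and a
    background worker drains the queue into the (slower) sink."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            datum = q.get()
            if datum is None:        # shutdown sentinel
                break
            sink(datum)

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    def ingest(datum):
        q.put(datum)                 # blocks only if the queue is full
        return "ack"                 # acknowledge before sink processing

    def shutdown():
        q.put(None)
        t.join()

    return ingest, shutdown
```

The key property is that source-side hold time is bounded by the enqueue, not by the sink, which is what lets the front-end scale independently of the back-end.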
Another challenge is to be able to lay out the tiny data on stable storage, reliably. This gets
challenging due to the huge amount of randomness that might be introduced by the requirement of
storing the incoming datum along a certain dimension mandated by the key. This is one such
challenge that Thomson Reuters has been faced with in their stock ticker service, where they cannot
afford loss of any input and the inputs received have to be organized along the stock timeline. One
of the ways in which the industry is addressing this challenge is by leveraging the in-memory data
layout capabilities of data stores, like Cassandra. Cassandra allows the application to specify a key,
which may be a compound string using the stock symbol. The value of that may be stored in a
Cassandra column, which may be time-versioned.
But that is only half the explanation: it describes what happens in the logical domain (the data
model), not how the data is physically organized to meet the ingest criteria. Most
of the InfiniScale solutions are heavy on the use of memory. This is because when data is ingested
into a Cassandra node, it is stored in memory, organized along a certain dimension, logged for
recoverability, replicated to another node for fault tolerance and then acknowledged to the client.
So, ingest has very low latency. Periodically, that data collected in memory is flushed to stable
storage. Organization of data in a specific form on stable storage is covered in the next sub-section.
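As a sketch of the ingest path just described (log for recoverability, sorted in-memory organization, periodic flush to stable storage), under the assumption of a single node, with replication to a peer omitted:

```python
import bisect
import json

class TinyLSMNode:
    """Sketch of the Cassandra-style ingest path: append to a commit
    log for recoverability, keep data sorted in memory (a memtable),
    and periodically flush the memtable to an immutable on-"disk"
    segment (an SSTable).  Replication to a peer node is omitted."""

    def __init__(self, flush_threshold=4):
        self.commit_log = []          # stand-in for a durable log file
        self.memtable = []            # sorted list of (key, value)
        self.sstables = []            # flushed, immutable segments
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append(json.dumps({"k": key, "v": value}))
        bisect.insort(self.memtable, (key, value))
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"                  # ack after log + memtable insert

    def flush(self):
        self.sstables.append(list(self.memtable))
        self.memtable.clear()
        self.commit_log.clear()       # log no longer needed for this data

    def read(self, key):
        # Newest data first: memtable, then segments newest to oldest.
        for k, v in self.memtable:
            if k == key:
                return v
        for seg in reversed(self.sstables):
            for k, v in seg:
                if k == key:
                    return v
        return None
```

Note that the acknowledgement depends only on the log append and the in-memory insert, which is why ingest latency stays low regardless of stable-storage organization.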
It is worthwhile to examine how such heavy usage of memory (DRAM) is considered economically
feasible. First off, volume DRAM prices are at $15/GB, as of this writing, and dropping 20% every
year. Secondly, the data model of Cassandra enables a number of optimizations that help bring into
memory only the data that is needed. This is through the column family
abstractions of Cassandra. Other attributes (column families) of the object (row) in question are not
brought into memory. Thirdly, organization of data as column families leads to high compressibility.
Because of high similarity of content of a column family, data is highly compressible, and therefore
IO throughput is also kept high. Cassandra uses bloom filters to selectively read segments within a
column family. Thus, when data is finally brought into memory it is absolutely the data that needs to
be consumed. Research in the areas of data compression and database performance has also shown
that compressed data can be used directly without having the need for necessarily uncompressing
the same. This allows for better memory bandwidth utilization although at a slightly higher CPU
utilization. Given that the cost per CPU cycle halves every 2 years, and that memory bandwidth is
always challenged, these techniques go a long way to increase the usable memory capacity, making
heavy reliance on DRAM economically viable.
The above techniques give an InfiniScale solution the ability to acknowledge each datum ingested in
well under a millisecond. It also gives the capability to analyze historical data, over short windows,
and raise alarms for anomalous behaviors.
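The Bloom-filter-based segment selection mentioned above can be sketched as follows; the bit and hash counts are illustrative, not tuned values from Cassandra:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter of the kind Cassandra uses to decide, without
    touching the data, whether a segment may contain a key.  False
    positives are possible; false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several independent bit positions from one key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= (1 << pos)

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```

A per-segment filter lets a read skip segments that definitely do not hold the key, so only the data that will actually be consumed is brought into memory.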
From a NetApp perspective, Ontap can handle tiny data updates, as long as each update is not
individually addressed by subsequent accesses. In the context of emerging InfiniScale workloads, that
individual addressability is precisely what is needed: each data element (addressed by a key or
objectID) needs to be tracked in a sea of
data elements. These tiny data elements need to be versioned and accessed with a certain temporal
and spatial locality with other data elements. From this perspective, an InfiniScale solution is more
like a database to Ontap, as the inter-relationship among data elements is alien to Ontap. When
fine-grained data management happens at the higher levels, Ontap value is diminished. When the
intensity of workloads threatens to create a bottleneck at the controller, Ontap is not the storage of
choice.
Our intention here is not to analyze those 120+ data stores, but to state that in this era of
specialization, there is a growing need to adapt storage layouts to the needs of the application. This
is to increase operational efficiencies and improve application effectiveness. To demonstrate what
value a custom data store can provide to an application, we examine two more in this section
(Cassandra was covered in a previous sub-section).
Another reason for terming this era as an era of specialization is that there is a shift in the way
products are being constructed. Rather than a single large monolithic solution, like Oracle Database
(or even Btrfs, for that matter), which has just about every feature under-the-sun, the move is to
now have simpler and more nimble, but highly efficient products along a certain dimension. These
products do “one” thing and they do it well. Some of these perform as much as 100x better on
ingest and query performance.
To address the question of why a single in-memory data layout does not suffice, we will need to
look at the impact of treating DRAM as a purely random access medium. As an example, if we were
to traverse an array of pointers to data elements (which are dispersed in DRAM), we would be
accessing data with almost no spatial locality. This would result in data being brought in from DRAM
into CPU cache lines. If each access results in accessing a different cache line, we would be incurring
a 60-nanosecond access (to DRAM), as opposed to a 3-nanosecond access (to an L1 cache).
For this reason, if the data model of the InfiniScale solution wants to present the capability of being
able to provide efficient spatial access to related nodes in a graph structure, as an example, the
underlying layout should support that intent. A reference from a row structure to another row, to
simulate a graph will result in poor spatial locality, leading to poor performance of nearest neighbor
queries. Thus, a columnar layout is inappropriate for a graph database.
The data layout should be driven by the application intent. If the nature of the queries is known in
advance, and the essence of those queries is passed down to the layout as hints, the data layout can
be organized in a fashion that best serves those queries.
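As a small illustration of layout following intent, the same records can be held row-wise or column-wise; a planned aggregate over one attribute then scans a single contiguous array in the columnar form (the data and field names are invented):

```python
def to_columnar(rows, columns):
    """Re-organize row-oriented records into column arrays.  A query that
    scans one attribute (e.g. an aggregate over 'volume') then touches a
    small number of contiguous arrays instead of striding across whole
    rows with poor spatial locality."""
    return {c: [r[c] for r in rows] for c in columns}

rows = [
    {"symbol": "TRI", "price": 34.1, "volume": 900},
    {"symbol": "NTAP", "price": 37.8, "volume": 1200},
    {"symbol": "TRI", "price": 34.3, "volume": 400},
]
cols = to_columnar(rows, ("symbol", "price", "volume"))

# A planned query ("total volume per symbol") served from the columnar form:
total_tri = sum(v for s, v in zip(cols["symbol"], cols["volume"])
                if s == "TRI")
```

The converse holds too: a pointer-chasing graph traversal over this columnar form would scatter accesses, which is the layout mismatch described above.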
From a NetApp perspective, Ontap (WAFL) never bothered about such optimizations. What is driving
the solutions towards these optimizations? Ontap played in a world of 1-millisecond to 10-millisecond
access latencies, due to the network hop. When focus shifts towards a low-latency play, with the bulk
of data access from memory, latencies of the order of microseconds start to play an important part.
Thus, the engineering required in InfiniScale solutions is very different from the optimizations that
we have traditionally played in.
1. Traditional SQL, Converged DB: Examples of this are SAP Hana and Microsoft’s Hekaton. The
fundamental target is the traditional business applications, which are presented the familiar
SQL interface. These solutions run OLTP and OLAP on a single underlying database, which is
hosted in memory. The fundamental goal is to support interactive and real-time query
processing over hot data (transactions).
2. Emerging NoSQL, Converged: Most emerging applications are not SQL-based. They work
with specialized data models, which are closer to their problem domain. Examples are
GraphDB, Neo4J, Cassandra, Riak, HBase and the like. Scale of operation is the fundamental goal.
[1], [2] and [4] have led to a very significant shift in storage architectures, all lead by analytics.
Traditionally, there have been separate Online Transactional Processing (OLTP) data stores and
Online Analytical Processing (OLAP) data stores. The data model of the OLTP data stores (also called
operational data stores) is optimized for transactions and OLAP data stores follow a different data
model (the data warehouses). The organization of data in the OLAP data stores is typically along the
dimensions (attributes) of interest, along which planned queries will be executed.
Such levels of response time are feasible only if the batch operations of copying data from
operational data stores to the data marts can be avoided. Thus, the call is to have a single data store
which can serve as the transactional data store, as well as against which we can issue OLAP queries
(and as we evolve, even unplanned analytical queries).
Unfortunately, OLTP and OLAP have orthogonal workloads. OLTP constitutes small random writes,
while OLAP mandates the data store to be re-organized to leverage the large sequential read
throughputs of disks. But, this re-organization of the data along specific dimensions in the OLAP data
store was needed because of its underlying storage medium (disks) and was not a top-down
decision. Such orthogonal workloads on a single data store thus call for the data store to be placed
on random access medium. With DRAM prices down to nearly $15/GB, SAP chose to skip flash as a
medium and host the working set in DRAM. Microsoft, since 2009, has been known to be working on
Hekaton, an in-memory SQL server realization, which was announced to be in beta trials in
November 2012 [Lars12]. There is, however, a fundamental difference between the approaches of
SAP-HANA and Oracle Exalytics. Oracle has revived a 20-year-old database, TimesTen (an
embedded database), and has chosen to use it as an in-memory database. But they have not
converged the OLTP and OLAP databases.
From a NetApp perspective, the unfortunate part is that our OLTP customers might not ask us for a
change. It is the OLAP-side, where we don't have a substantial footprint and where we are not
exploring, which threatens to change the storage architectures and do it in a way that has a positive
side-effect of enhancing the transaction latency and throughput of OLTP workloads. We thus, run a
risk of being blind-sided while storage evolves towards the new world-order.
Due to this infrastructural shift (and the cost points of the same), the notional differences and gaps
between Tier-1 and Tier-2 diminish, and in some extreme cases the worlds collide. Thus, the so-
called emerging Tier-1 solutions threaten to enter into NetApp’s green zone Tier-2 business
processing, and eat into the same.
Because the application is now joined at the hip with the storage, the storage need not jump
through hoops to guess what the application is trying to do with storage. Instead,
the application can now specify what it needs of storage. The interface between the application and
storage is up for grabs and can be defined in ways that allow fine-grained, application-driven and in-
band data management. For example, a SQL query will now be able to specify, through SQL
constraints (as an example), that a transaction involving $3 Million is more important than a $30
transaction. This in-band specification of hints from application will allow the high-value transaction
to be synchronously committed to a remote DR site before acknowledging completion to the
application, while other transactions are protected using the usual asynchronous DR mechanisms.
This allows for application-driven and transaction-selective continuous data protection. This is an
example of application-driven in-band data management, which is transaction-granular.
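The transaction-selective protection described above might be sketched as a commit path that consults an in-band value hint; the callables and the threshold are hypothetical stand-ins, not any product interface:

```python
def commit(txn, sync_replicate, async_replicate, sync_threshold=1_000_000):
    """Sketch of application-driven, in-band data protection: a hint on
    the transaction (here its monetary value) decides whether the commit
    is synchronously shipped to the DR site before acknowledging, or
    handed to the usual asynchronous replication stream."""
    if txn["value"] >= sync_threshold:
        sync_replicate(txn)       # block until the DR site has it
        return "committed-sync"
    async_replicate(txn)          # queued; acknowledge immediately
    return "committed-async"
```

The point is that the protection decision is made per transaction, in-band, from an application-supplied hint, rather than per volume or per LUN as in traditional storage-side replication.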
SAP-HANA is the Google of this infrastructure because what they have done is pioneering. Most
enterprises would want to leverage what SAP has contributed, but can't build it themselves. It
should be our endeavor to partner with this Google. There is an opportunity to work with HANA and
re-draw the storage boundaries while they are still not cast-in-stone. There is also the opportunity to
drive these interfaces into de-facto community standards such as OpenStack and assume a
leadership position. Some of these interfaces are at the data abstraction layer of the data store
stack.
Another aspect is that the level of memory and structure optimizations done for HANA has been
done keeping a specific Data Layout in perspective. In the case of HANA, it is a custom data layout,
but essentially column-oriented, for the most part. And, this layout was chosen to allow for data
warehouse queries. However, as HANA moves into the territory of more complex and varied
analytics on its data streams, it is considering options for alternative data layout schemes. The
question that begs to be answered is: is there a single data layout scheme that can take us to the
point of 80th-percentile performance in 60% of the cases?
The transaction layer is going to be a very important central control point that we should stake a
claim on. Transactions help knit mutations on multiple objects, provide data storage consistency
points, and are critical to businesses. It is also what is expected to see the most evolution in the near future.
This change has been driven by technology limitations and fueled by business needs. In the past 5
years businesses have been pushing for disproportionately increased levels of ingest and query
performance.
Interestingly, most of the data growth has been happening as unstructured content. Thus the need
for ingesting and analyzing large volumes of high-throughput unstructured content led to the
evolution of storage architectures in the direction of leveraging scale-out and shared-nothing
paradigms. What would otherwise be considered low-value data, this unstructured content now
carries significant value when analyzed at scale.
29
Source: http://www.stanford.edu/class/ee282/08_handouts/L07-IO.pdf
To what extent up the stack should these shared-nothing notions be exposed? From one
perspective, cDOT is also a shared-nothing system, as a D-Blade of one system does not access data
from another D-Blade. But, the N-Blade can go across nodes (the remote path). There is also a cross-
node transactional binding in the control paths at the M-Host. And, the HA-pair works to protect its
partner by leveraging shared disks at the backplane. cDOT exports a POSIX interface (covered in the
next section) that hides the underlying NUDA (Non-Uniform Data Access) model from the
applications. Thus, an application could potentially run transactions and joins that span nodes, while
being totally agnostic of the underlying data distribution across nodes. This simplicity has been very
important to the enterprise applications we support.
Shared-nothing architectures on the other hand pass the complexity of topology boundaries up to
the middleware at the host. Most InfiniScale middleware would not allow for cross-node
transactions and data access in a single unit, at a certain low level of abstraction. This is not just for
better performance. It is also for a more robust system, by avoiding cross-node state and lock
maintenance.
However, at higher levels of programming abstractions, even Google, with its Megastore, has gone
down the path of simplifying programming abstractions while leveraging the scale-out and shared-
nothing paradigms as its underpinnings. Thus, Megastore builds a higher-level Data Model over a
lower-level Data Model.
Below is an attempt to classify when one would move from a centralized storage model to a shared-
nothing and potentially a peering model.
Application Model \ Storage Model | Shared Storage       | Shared-nothing Storage
Monolithic                        | 2-hops (Traditional) | 3-hops
Sharded Function30                | 3-hops               | 2-hops (Peering)
The above analysis is centered on the premise that a network hop significantly impacts the latency
seen by an application and thus alters its performance profile. The number of hops is counted from
the client, where the application initiates the request. These architectures are explained below:
1. Monolithic over Shared Storage: Access from the client would see two hops. One from the
client to Application Server and the second from the Application Server to Shared Storage.
This is a rather deterministic number of hops due to the monolithic nature of the application
at the Application Server. This is the traditional siloed data access model in an enterprise.
2. Monolithic over Scale-out, Shared-nothing: In this case, the number of hops involved would
be three. The additional hop is introduced at the storage layer. Counting the hops, the first
is from the client to the Application Server, the second is from the monolithic application at
the Application Server to a node at the scale-out storage layer, and the third is within the
storage layer, when the contacted node forwards the request to the node that owns the data.
30
Function transformations are co-located with the data shard, within a node boundary, that it operates on. This leads to
highly filtered data movement over the network.
The above analysis is an attempt to define the architecture of choice for InfiniScale and when the
scale-out and shared-nothing architecture becomes mainstream architecture of choice. Middleware
in InfiniScale solutions is already split and sharded and follows approach [4], as defined above. In this
model, each node hosts an Application Server function that serves to operate over the data and
provide a highly filtered data movement over the network. However, not all problems can be broken
down into nicely partitioned function blocks working over partitioned datasets. Most of traditional
data management, as seen by enterprise applications and admins, is built over our Snapshot®
technology. Coordinating a snapshot across a 1000-node cluster is as yet an unsolved hard problem.
Being able to do this in the time frame of mainstreaming of SCM would be important.
In the past, network latencies of 5 milliseconds matched up nicely with disk seek (rotational)
latencies, and thus disk-based shared storage was viable for a long time. Even with the advent of
flash, the 100 microseconds of read latency matched well with the comparable round-trip latencies
of 10GbE. Thus, there was still a compelling enough reason to stay with shared storage across the
network. But SCM on the horizon promises 100-nanosecond access latencies, which
leads to a singularity that fosters growth of shared-nothing architectures. This brings up other issues
around data availability and how one can make data resilient in the context of SCM, when working
with an order of magnitude higher latency interconnects. For further analysis of how SCM impacts
storage architectures, the reader is advised to refer to the DC2015 Technology Report. It should also
be stated that while SCM is on the horizon, some of these changes are happening today, with DRAM-
based storage becoming mainstream.
As covered thus far, most InfiniScale solutions are DRAM-based with data durability on locally
attached disks, which would shift to SCM, in the near future. Data replication across nodes is
employed to protect against data loss, in the event of a node fault.
From a NetApp perspective, InfiniScale architecture has control points at the host, and in the data
path. This isn’t a place that NetApp has traditionally played at. Data availability is also provided at
the host layers. For purposes of cross-data-center disaster recovery, some deployments might want
to rely on a storage array for replication, but most would continue to depend on the InfiniScale
solution to provide that capability. A related but orthogonal capability is to be able to replicate into a
much smaller cluster at the DR target. We examine and contrast these architectures with host-based
caching architectures in a later section.
While this might seem slightly provocative, it is also a fact. Most new and InfiniScale applications
being developed today are being programmed against InfiniScale middleware, which encapsulates
storage and presents higher-level abstractions to interface with. The APIs (Data Model) provided by
the InfiniScale middleware are the emerging de-facto standard. At this point, however, no single
interface has won the race, but there are clear favorites. Some popular InfiniScale interfaces through
which applications program storage are MongoDB’s JSON interface, Cassandra’s columnar structures
and Neo4J’s Graph APIs. These are becoming popular also because of the simplicity of their
integration with data structures in a programming environment.
POSIX is not dead, but it has been relegated to use within a node and at very low levels of
abstraction, almost making it irrelevant. As an analogy, programming against POSIX is becoming
similar to programming in the assembler of yesteryear. POSIX interfaces are used in InfiniScale
middleware for 3 different purposes:
- For large blob IO, where a blob layout is completely managed by the middleware
- For memory mapping a large segment of a file into the address space of the middleware for
manipulating the contents of the same, and
- For log management, for recoverability
Thus, in most cases, content layout within the storage region is managed almost completely by the
InfiniScale middleware. POSIX is relegated to managing those extents and for providing mapping of
those extents into the address space, as needed. More often than not, we have a key-value layout
within these extents. This usage trend will only intensify with the advent of SCM, as memory
mapping is also the preferred programming model being proposed as a standard in the NVM
Programming TWG in SNIA.
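The memory-mapping usage above can be sketched as follows: the file system merely provides an extent, and the middleware manages its own fixed-width key-value layout within it (the record format and sizes are an invented example):

```python
import mmap
import os
import struct
import tempfile

# Sketch of how InfiniScale middleware uses POSIX: the file system only
# provides an extent; the middleware memory-maps it and manages its own
# key-value layout inside (record here: 8-byte key, 8-byte value).
RECORD = struct.Struct("<qq")

def write_records(path, records, extent_size=4096):
    with open(path, "wb") as f:
        f.truncate(extent_size)            # pre-allocate the extent
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), extent_size) as mm:
            for i, (k, v) in enumerate(records):
                RECORD.pack_into(mm, i * RECORD.size, k, v)
            mm.flush()                      # msync: push to stable storage

def read_record(path, index):
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)
```

POSIX here only allocates and maps the extent; the layout of tiny records within it never passes through the file system's data model.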
POSIX is also not being used as a presentation layer interface. POSIX was created largely for
purposes of large block IO, not tiny updates. This interface still has its roots in tape-based IO,
which is evident in its lack of capability to efficiently read back tiny updates. The data model presented by
POSIX is that of flat blobs, which isn’t very useful when the boundary required is very fine grained.
Another reason is that POSIX and scale-out are hard to get right. If each tiny datum was an
independent file, the metadata overheads are very high due to inodes and directory entries. Access
to a single file thus also has high metadata overhead. Most file systems also do not do much to
maintain spatial locality across files in a dataset. If each such file were distributed using a consistent
hash, listing of a directory would suffer. Serialization across nodes to get POSIX right is also hard and
difficult to scale. Thus, alternate richer and more flexible data models are used in InfiniScale
architectures.
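The consistent-hash placement mentioned above can be sketched as a simple hash ring; note how lexically adjacent file names scatter across nodes, which is exactly why a directory listing over hash-placed files loses locality (node names and virtual-node count are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: each node owns many points (virtual
    nodes) on the ring, and a key is placed on the first node clockwise
    from the key's hash."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self._tokens, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Placement is deterministic and balanced, but neighbouring keys (`dir/file-1`, `dir/file-2`, ...) land on unrelated nodes, so any cross-key operation fans out across the cluster.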
This trend is further fueled by gains seen when bypassing the kernel and the kernel buffer cache and
by taking fine-grained control of what is cached, and for how long, through judicious use of madvise.
This complexity is managed by the InfiniScale middleware, which becomes the new User-space
Kernel. Very significant CPU path lengths can be eliminated through these means and through judicious use of
lock-free data structures and algorithms. In one experiment on Intel’s Sandy Bridge, converting a
ConcurrentArrayQueue (a Java data structure) into a lock-free form yielded a 10x improvement in
operations per second, while reducing latency by almost 50%.
In summary, POSIX was a good abstraction to be working with when the resting place for the data
was over the IO bus. But with in-memory processing catching on, the resting place changes to DRAM
and the memory bus is used instead of the IO bus. This changes the expectations of latencies, and
suddenly the nicely structured IO subsystem appears burdensome. This leads to the call for a re-
think of the use of POSIX in the application stack.
31
Source: http://www.mysqlops.com/2012/04/09/linux-io-stack.html
Analysis presented in the previous section on POSIX provides a wedge into this topic, which is
related to low latency processing and strict guarantees to meet those latency requirements. The
deep stacks and layers of software that need to be traversed for IO processing not only impact the
latencies, but also increase the probability of missing deadlines due to the sheer complexity (and
thus uncertainty of operation) of the layers involved in the stack. Troubleshooting performance
issues in these deep layers has also proven to be a challenge. This, too, leads to the conclusion of
bypassing the entire IO stack and taking control of processing in the user-space kernel. It is also the
cost of modularity, paid for creating a stack applicable to a broad segment, as opposed to
custom-built stacks, in line with the one-size-does-not-fit-all paradigm.
One of the questions that we often encounter, given our NetApp lineage, is what data management
capabilities are needed in real-time data stores. A real-time data store is not attractive by virtue of
its rich data management features, as known to us, but because of its simplicity and its ability to
meet the performance SLO within strict and bounded deviations. That is the single most important
data management feature needed in real-time stores.
Metamarkets’ Druid, used by Netflix, is an example of a real-time data store. With 70 billion log
events per day and ingesting over 2TB of data per hour, it is one of the largest log-collection
infrastructures known, as of this writing. The kinds of operations that are subject to real-time
processing are: aggregation (group by), time-series roll-ups and generalized regular expression
searches.
In order to meet real-time needs, it is often required to lay out data in a way driven by the kind of
queries, filters and dimensional analysis that the data will be subject to. With sequential data
storage, in-memory allocation, and automatic text enumeration (as found in Lisp), searching for
a symbol is really just scanning for an integer in an array. That is why such data stores are a few
orders of magnitude faster than common relational databases for reading and analytics. So, it is
never about just placing data in-memory as-is and hoping for a speedup. There is substantial
engineering and hand-organization required to best organize data for cache-efficient access.
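The "scanning for an integer" point can be sketched with dictionary encoding; the symbols and arrays here are invented examples:

```python
def dictionary_encode(values):
    """Enumerate distinct strings into small integers (dictionary
    encoding).  A search for a symbol then becomes an integer scan over
    a dense array, which is what makes columnar analytic stores fast."""
    mapping = {}
    codes = []
    for v in values:
        codes.append(mapping.setdefault(v, len(mapping)))
    return mapping, codes

symbols = ["GOOG", "NTAP", "GOOG", "TRI", "NTAP", "GOOG"]
mapping, codes = dictionary_encode(symbols)

# "Find all rows for GOOG" is now an integer comparison over `codes`:
goog_rows = [i for i, c in enumerate(codes) if c == mapping["GOOG"]]
```

The encoded array is dense, contiguous and highly compressible, so both the memory-bandwidth and compressibility arguments made earlier apply to it directly.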
Writing data is a weakness of column-oriented storage. Because each column is an array (in-
memory) or file (on-disk), changing a single row means updating each array or file individually as
opposed to simply streaming the entire row at once. Furthermore, appending data in-memory or on-
disk is pretty straightforward, as is updating/inserting data in-memory, but updating/inserting data
on-disk is practically impossible. That is, the user can't change historical data without some massive
hack. For this reason, historical data (stored on-disk) is often considered append-only. In practice,
column-oriented data stores require the user to adopt a bi-temporal or point-in-time schema. Such a
scheme has been adopted by SAP’s HANA solution too.
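A minimal sketch of such a point-in-time, append-only scheme follows (assuming versions arrive in timestamp order; the class and names are ours, not HANA's):

```python
import bisect

class PointInTimeStore:
    """Append-only, point-in-time store of the kind column-oriented
    systems adopt: an 'update' appends a new (timestamp, value) version
    and never rewrites history.  Reads ask for the value as of a time.
    Assumes versions for a key are appended in timestamp order."""

    def __init__(self):
        self._ts = {}      # key -> sorted list of timestamps
        self._vals = {}    # key -> values parallel to _ts

    def put(self, key, ts, value):
        self._ts.setdefault(key, []).append(ts)
        self._vals.setdefault(key, []).append(value)

    def as_of(self, key, ts):
        # Latest version at or before ts, or None if none exists.
        times = self._ts.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._vals[key][i - 1] if i else None
```

Because history is never rewritten, the on-disk representation stays append-only, which is exactly what the columnar layout's write weakness demands.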
In the scope of in-memory data stores of an InfiniScale solution, the role of flash and other storage-
class memory technologies could be twofold: First, flash volumes can be used as major persistent
storage devices, leaving disks as backup and archiving devices. The insert-only paradigm of an in-
memory database matches the advantages of flash memory. In an insert-only database the number
of random writes can be reduced if not eliminated and the disadvantage of limited durability is
alleviated by the fact that no in-place updates occur and no data is deleted. Second, the low readout
latency of flash storage guarantees a fast system recovery in the event of a system shutdown or
even failure. In a second scenario, flash could be used as memory-mapped storage to keep less
frequently used data or large binary objects that are mainly used during read accesses. The
InfiniScale solution can transfer infrequently used columns to a special memory region representing
a flash volume based on a simple heuristic or manually by the user. The amount of main memory can
However, not all real-time analytics are in the order of a few milliseconds. Quite a few analytics
demand higher flexibility and drill-down capabilities and are willing to pay an additional order of
magnitude or two in latency for that flexibility. This is where a combination of data layout and
inverted indices is used to address the needs of ad-hoc and near-real-time analytics. However,
there are data stores, such as HyperDex, that solve the exact same problem through a concept they
call Hyperspace Hashing, which allows for simultaneous multi-dimensional analysis without having
to deal with multiple single-dimensional inverted indexes.
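A much-simplified sketch of the hyperspace-hashing idea: each searchable attribute hashes to one coordinate of a multi-dimensional grid, and a partially specified search only needs to consult the matching slice of cells (bucket counts and names are illustrative, not HyperDex's actual scheme):

```python
import hashlib

def _coord(value, buckets):
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16) % buckets

def hyperspace_cell(obj, dimensions, buckets=4):
    """Map an object to a cell in a multi-dimensional hyperspace: one
    hashed coordinate per searchable attribute."""
    return tuple(_coord(obj[d], buckets) for d in dimensions)

def candidate_cells(query, dimensions, buckets=4):
    """Cells a search must consult: a fixed coordinate for each attribute
    present in the query, and every bucket for unspecified ones."""
    axes = [
        [_coord(query[d], buckets)] if d in query else list(range(buckets))
        for d in dimensions
    ]
    cells = [()]
    for axis in axes:
        cells = [c + (b,) for c in cells for b in axis]
    return cells
```

A fully specified query resolves to exactly one cell; each additional unspecified dimension multiplies the cells to consult, without requiring a separate inverted index per dimension.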
One of the major impacts of the need for real-time ingest and query is that the authentic copy of
data will start to come into InfiniScale solutions, rather than first being ingested into shared
storage. This threat has started to become a reality with our customers, and will only grow, calling
for anti-caching32 solutions.
Given that an InfiniScale solution is sized for a certain working set, the size of the cluster need not
change unless business requirements change. Another important aspect is the analysis around
latencies. Most DRAM accesses are of the order of 100 nanoseconds (rounded up for simplicity of
analysis), while going over the network (to a storage array) involves a millisecond, a difference of 4
orders of magnitude. A cache miss is thus very expensive for the application, which it will not be
able to tolerate or choose not to tolerate. Thus, most InfiniScale applications would not use shared
storage. Tiering is adopted, but in the direction of InfiniScale to active archive (anti-caching), rather
than caching into InfiniScale. The amount of data kept as the working set is typically driven by
policies in an organization, which may vary from 1 week to 1 quarter.
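A sketch of the anti-caching direction described here: entries older than the working-set retention window are moved out of memory into the active archive (the policy and data structures are invented for illustration):

```python
def anti_cache(store, archive, now, retention_seconds):
    """Sketch of anti-caching: data older than the working-set retention
    window is moved out of the in-memory store into the active archive,
    the reverse of caching archive data into memory.  `store` maps
    key -> (timestamp, value); evicted entries land in `archive`."""
    for key in list(store):
        ts, value = store[key]
        if now - ts > retention_seconds:
            archive[key] = (ts, value)
            del store[key]
    return len(archive)
```

Note the direction of movement: the in-memory tier holds the authentic working set, and the shared/archive tier receives only what ages out of it.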
Thus, there is a clear split between the InfiniScale architectures and applications created for
InfiniScale and those created for active archives. Hadoop Map-Reduce is applicable to applications
for active archives, not for the real-time InfiniScale space. StorageGRID is an example of an active
archive. FlashAccel clearly falls short due to the issues with caching semantics, as described earlier.
We have a portfolio gap in not having a solution for InfiniScale.
32
http://istc-bigdata.org/index.php/anti-caching-and-non-volatile-memory-for-transactional-dbms/
Transactional updates with full ACID semantics are also supported, but only in limited scopes: in
MongoDB the supported scope is a document, while in Cassandra a transaction cannot span a row.
These limited semantics exist to sidestep hard technology problems, such as distributed
transactions and distributed locks, and to avoid maintaining any shared state. This not only simplifies
the data store's design and implementation, it also yields a blazingly fast solution for that need.
Avoiding shared state across nodes in a cluster also improves partition tolerance. Most applications
using these data stores do not need the missing features, and find it acceptable to absorb some
additional complexity themselves in case they need higher-level semantics.
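These limited scopes can be illustrated with a toy store in which atomicity holds only within a single key, the analogue of a MongoDB document or a Cassandra row. The class and method names are ours, not any product's API:

```python
import threading

class SingleKeyStore:
    """Toy store: updates are atomic per key, never across keys."""
    def __init__(self):
        self._data = {}
        self._locks = {}                 # one lock per key; no global lock
        self._registry = threading.Lock()

    def _lock_for(self, key):
        with self._registry:
            return self._locks.setdefault(key, threading.Lock())

    def compare_and_set(self, key, expected, new):
        """Atomic within one key: the whole value is swapped, or nothing is."""
        with self._lock_for(key):
            if self._data.get(key) != expected:
                return False             # a concurrent writer won; caller retries
            self._data[key] = new
            return True

    def get(self, key):
        return self._data.get(key)

store = SingleKeyStore()
store.compare_and_set("doc:1", None, {"views": 1})
ok = store.compare_and_set("doc:1", {"views": 1}, {"views": 2})      # succeeds
stale = store.compare_and_set("doc:1", {"views": 1}, {"views": 99})  # stale, fails
print(ok, stale, store.get("doc:1"))
```

Because no lock or state spans keys, there is nothing shared for the cluster to coordinate, which is exactly the design trade the text describes.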
There are, thus, elements of self-healing and self-management baked into the InfiniScale
middleware. This eliminates major costs: duplication in hardware can be eliminated because the
software self-heals and is built with failure, and recovery from various failure scenarios, in mind.
InfiniScale solutions are only getting better at self-healing with wider deployment exposure. Another
aspect of cost reduction and simplicity is that the solution is self-managed. For the most part, admins
monitor the state of the system rather than actively manage it. Thus, most of the management tools
used in such deployments (Puppet, Chef, Nagios, and a host of others) allow for simplified
provisioning and setup at the front end of the infrastructure lifecycle, and subsequently monitor the
performance of workloads. This is very different from the active monitoring and management of
workloads and infrastructure done in traditional enterprise deployments.
InfiniScale solutions are developed for analytics needs and are managed by analytical means. By
definition, any InfiniScale solution generates enough forensic data to provide insights into its
operations. Any system providing feedback control at machine speeds must be monitored and
controlled on its own timelines. Thus, most controls are built into the solution by design; other
aspects are monitored, and violations analyzed, through other analytical solutions.
A few questions typically posed in the context of InfiniScale solutions are discussed below:
- How important are storage efficiencies? Short answer: very important. Replication by mere
copying is used for both data resiliency and parallel data access, to avoid hot spots.
However, this approach becomes questionable at large scale as the total cost of ownership
increases. As an example, Acunu was able to engineer Cassandra clusters such that a 10-node
Acunu-engineered cluster could replace a 100-node stock-Cassandra cluster, through well-engineered
data layout and data distribution schemes. At small scale, 3TB versus 9TB is a matter of two
additional disks, but extending that
There is very significant innovation happening in the way InfiniScale solutions are designed,
constructed, deployed and managed. Some of the design tenets and management aspects of
InfiniScale have already been discussed in the previous sub-sections. It is out of scope of this
document to discuss the development process and methodologies, including their release models,
but we would like to say that the release model enables quick turn-around times and fosters faster
innovation. It enables features to be released and pulled-back with equal ease.
There are a few aspects that have proven to be useful to both simplify development as well as
achieve high levels of productivity. Some of these are touched upon here:
1. DevOps: It is important to see how development happens in the new world and what the
developers' workbench looks like. Many developers use Eclipse-based IDEs with Maven integration,
bringing build tools, tests, version control and release mechanisms together. In fact, these IDEs
integrate with a test infrastructure in the cloud, and even extend the development environment into
deployment. This is exactly what is referred to as DevOps: IT operations linked very closely to the
developers' workbench. SpringSource is one such community, created, supported and driven by
VMware. SpringSource integrates Grails, an open source, full-stack web application framework.
33 http://lwn.net/Articles/475681/
A couple of years back, Facebook started the Open Compute (OCP)34 and Open Rack35 projects in the
community. While this originally seemed like a good-Samaritan move, it was an excellent business
decision. Among the more recent developments in that community, Facebook contributed designs of
motherboards that allow Intel and AMD processors (and soon ARM processors) on the same board,
so an ARM part could potentially replace an Intel processor very easily. Upgrades become cheaper
because only the components that need upgrading are replaced, and processor upgrade cycles can
be decoupled from memory upgrade cycles. All this contributes towards their stated goal of the
"most efficient computing infrastructure at the lowest cost".
OCP says its standards promise to deliver hardware that is 24% more energy efficient and 38% more
cost efficient, on average, than so-called commodity hardware. The group is working on specs for
storage, motherboard and server design, racks, interoperability, hardware management, and data
center design. They plan to do this through disaggregation.
Disaggregation is about separating and modularizing storage, compute, interconnects, power,
cooling and other components so companies can custom configure to their workload requirements.
This approach also supports smarter technology refreshes, so companies can swap out and replace
quickly evolving components, such as CPUs, while keeping in service slowly evolving components,
such as memory and network interface cards.
With their new rack designs as shown in Figure 7, the node boundaries are eliminated and a whole
rack is a computer. This allows for efficiencies in the production, operations and upgrades of these
racks. This is now changing the DAS-based approaches and taking the industry toward a more
decomposable and open architecture. While this sounds like good news for NetApp, storage
continues to be treated as a commodity. Also, specialized storage options are emerging in that
community for high-speed IO, dense storage and long-term archival. The OpenVault36
solution looks to be competitive with E-Series. Another storage option is their cold storage
specification, which leverages shingled disks and is designed as a bulk-load fast archive. To achieve
low cost and high capacity, Shingled Magnetic Recording (SMR) hard disk drives are used in the cold
storage system. This kind of HDD is extremely sensitive to vibration; so only 1 drive of the 15 on an
34 http://www.opencompute.org/
35 http://www.opencompute.org/projects/open-rack/
36 http://www.opencompute.org/projects/open-vault-storage/
Figure 7: Open Compute Rack & One Open Compute Project Server in the rack.
Calxeda has adopted ARM's low-power processors and produced ARM-based motherboards,
coupling them with the OpenVault JBOD storage enclosures to produce a storage server. The
interesting observation they make is that a 32-bit processor is sufficient as a controller for cold
storage, and correspondingly it supports only 4GB of onboard RAM. While a 64-bit processor can be
supported, they do not recommend it, for the sake of staying true to controlling cost and power
consumption.
AMD and Facebook have also contributed designs of their micro-servers, which are essentially low
power servers that further enable the anti-virtualization drive. Each micro-server is a cluster of low-
power processors, a small amount (4GB) of low power SDRAM and about 128GB of MLC Flash.
The member organizations have, at this stage, contributed their previous-generation designs to the
community. This also ensures that the Open Compute community gets a validated design to claim
under its banner. But as we go along, other members (such as Intel) have started to bring design
proposals to the table, rather than validated designs. This is an interesting shift and a point of
inflection in the maturity curve of a community project. Intel has contributed specs to OCP for
Silicon Photonics interconnect technologies that already surpass 100 gigabits per second -- nearly
twice the speed of the fastest interconnect technologies currently available.
While many OCP members are hyper-scale Internet companies, we cannot ignore this space:
Goldman Sachs and Fidelity are leading innovation in data center design through OCP, and
Rackspace is known to have wanted to leverage OCP designs. The Open Compute and OpenStack
combination is evolving into a very potent one for InfiniScale solutions and beyond. At this point,
NetApp should track developments in this community.
We would like to draw focus on the bottom-half of the figure, which we have defined as InfiniScale
Storage, as it is responsible for core storage functions. Ingest is handled through Data Ingest module
by writing incoming data into the write-ahead log for local resiliency and replicating the same to
another node for cluster-wide availability. Finally, a write-optimized data store is updated with the
incoming update.
A read is handled by looking at the data layout through a combination of the write-optimized and
read-optimized data stores. Data is moved from the write-optimized store into the read-optimized
store, periodically.
The replication engine is a part of the data distribution layer, which is responsible for availability of
data in the cluster, and cluster-wide consistency semantics.
The Data Model is identical to the ones presented earlier. Data Analytics is part of the data service
layer of the 7-layer model.
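The ingest and read paths described above can be sketched as follows. The WAL append, the replication hook, and the periodic merge are simplified stand-ins for the actual modules:

```python
class IngestNode:
    """Sketch of the InfiniScale Storage bottom-half described above."""
    def __init__(self, replica=None):
        self.wal = []              # write-ahead log: local resiliency
        self.write_store = {}      # write-optimized store
        self.read_store = {}       # read-optimized store
        self.replica = replica     # peer node, for cluster-wide availability

    def ingest(self, key, value):
        self.wal.append((key, value))                  # 1. log locally
        if self.replica is not None:
            self.replica.wal.append((key, value))      # 2. replicate to a peer
        self.write_store[key] = value                  # 3. update write-optimized store

    def read(self, key):
        # Combined layout: the write-optimized store holds the newest data.
        if key in self.write_store:
            return self.write_store[key]
        return self.read_store.get(key)

    def merge(self):
        """Periodic move from the write-optimized to the read-optimized store."""
        self.read_store.update(self.write_store)
        self.write_store.clear()
        self.wal.clear()           # entries are now durable in the read store

peer = IngestNode()
node = IngestNode(replica=peer)
node.ingest("sensor:42", 3.7)
print(node.read("sensor:42"), len(peer.wal))   # served from write store; replicated
node.merge()
print(node.read("sensor:42"))                  # now served from read store
```

The point of the split is that ingest touches only append-friendly structures, while the read-optimized store is rebuilt in the background.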
The fact that InfiniScale solutions adopt a scale-out architecture leads to the need to distribute data
across the cluster. This distribution might be for capacity balancing, for higher availability, or for
parallel data access. Typically, data is distributed in its original form rather than coded into an
asymmetric form, because latency of operation is critical. Finally, data layout defines the
organization of data on the storage media, which may be flash or disk.
As the tiny data elements are collected together in InfiniScale solutions, the data chunks written to
stable storage are much larger than the potential 30-byte ingested fragments. The typical size of
data chunks written to storage is of the order of 10MB. Thus, if the larger data chunks (of order of
1GB) in a capacity store can be broken down into manageable data elements of the size of 10MB, we
can leverage the data store we used for InfiniScale in the capacity solution. This is what we use the
Data Chunking layer for. We then need to check if this chunk is already available within the cluster. If
it is, we will only need to update the metadata (not shown in Figure 10). The Data De-duplication
block does this function. Once we identify a chunk that is not already in the cluster, we will need to
store that chunk in the cluster with high efficiency. So that we can retrieve the data element even in
the case of a disaster, it should be possible to code the chunk such that its sub-chunks can be
distributed across geos. The Coding Layer performs this function. Finally, we distribute the sub-
chunks and store them on the identified nodes.
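A minimal sketch of the chunking and de-duplication steps above. The use of a SHA-256 content hash as the chunk identity and fixed-size chunking are illustrative assumptions, not the actual design:

```python
import hashlib

CHUNK_SIZE = 10 * 1024 * 1024   # ~10MB, the chunk size cited above

def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Data Chunking layer: break a large blob into manageable chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class DedupStore:
    """Data De-duplication block: store a chunk only if its content is new."""
    def __init__(self):
        self.chunks = {}     # content hash -> chunk bytes

    def put(self, data: bytes, size: int = CHUNK_SIZE):
        keys = []
        for c in chunk(data, size):
            h = hashlib.sha256(c).hexdigest()
            if h not in self.chunks:       # already in the cluster? metadata only
                self.chunks[h] = c
            keys.append(h)
        return keys          # metadata: the recipe to reassemble the blob

    def get(self, keys):
        return b"".join(self.chunks[k] for k in keys)

store = DedupStore()
blob = b"abcd" * 8
recipe = store.put(blob, size=4)           # 8 identical 4-byte chunks...
print(len(recipe), len(store.chunks))      # ...stored once
assert store.get(recipe) == blob
```

The coding and geo-distribution steps would then operate on these unique chunks rather than on the original blob.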
3.1.4 Summary
Real time data stores and Capacity based data stores thus fit into the data store stack discussed in
section 2.2.1 as shown in Figure 11.
A related and often ignored element is: How much should the InfiniScale solution be aware of NUMA
that is present in the system? From Figure 12, above, it is clear that a local DRAM bank costs 65ns
while a remote DRAM bank is 105ns, a 60% penalty. Thus, optimizing the in-memory layout and the
threading schemes to leverage this discontinuity of access latencies will result in significantly more
efficient systems leading to lowered TCO.
Figure 13 shows typical access latencies as one goes down the memory hierarchy. If data were
organized such that access required pointer chasing from one memory location to another, cache
locality would not be leveraged well. This would have a significant impact on the performance of the
InfiniScale solution; thus, a great deal of work goes into engineering the in-memory data layout.
Data Layouts thus have to be cache-aware while being cache-size oblivious. Cache-aware means that
they should be aware of the memory hierarchies and ignoring the same has significant performance
and operational costs. Cache-size oblivious refers to the condition that the algorithms should be
such that they do not tie themselves to the sizes of different caches. An extreme example is cDOT,
which treats 4KB as a special buffer size and optimizes around it; as the environment changes, it
takes a herculean effort to move away from that size affinity.
Another aspect we would like to highlight in the context of in-memory data stores is that random
reads are still very different from sequential reads, even in “random access memory”. The in-
memory data layout is driven by the access patterns. What is sequential for one access pattern
might turn out to be randomized access pattern for another algorithm. For example, a columnar
store can be optimized for reads along certain dimensions and is O(N) for the most part. But if the
same layout is subjected to a nearest-neighbor algorithm, it becomes O(N²). Workloads do not
change randomly, but do evolve over a period of time. To be able to evolve the layout based on
needs is one way of addressing this. Another way of addressing this is to invent a data layout that
can be used by a majority of analytics algorithms.
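The dependence of cost on layout can be seen even in a toy columnar store: a scan along the stored dimension is one contiguous pass, while reassembling whole rows requires a jump into every column. This is an illustrative sketch, not any particular engine's layout:

```python
# Toy column store: each column is stored contiguously.
columns = {
    "user":  ["a", "b", "c", "d"],
    "score": [10, 20, 30, 40],
    "ts":    [1, 2, 3, 4],
}

def column_scan(col):
    """Sequential for this layout: one contiguous pass over a single column."""
    return sum(columns[col]) if col != "user" else None

def fetch_row(i):
    """Random for this layout: one jump into every column per row."""
    return {name: vals[i] for name, vals in columns.items()}

print(column_scan("score"))   # one pass over contiguous data
print(fetch_row(2))           # len(columns) scattered accesses for one row
```

An algorithm that repeatedly needs whole rows (such as a nearest-neighbor search) pays the scattered-access cost on every row, which is the asymmetry the text describes.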
The ARIES transactional model [Mohan92], which has been the basis for transactional systems for the
past 20+ years, will be redefined in the context of SCM; ideally, we should not need a separate
write-ahead log for protecting transactions.
ATG Investigations
R1. Devising an in-memory layout for Cassandra: Cassandra is a very popular InfiniScale
middleware and a column-oriented data store. A column-oriented store is also our first choice, as it
helps address the needs of our enterprise customers, which are mostly database-oriented but need a
few specific functions from their data store that are not being met: rate of data ingest, tiny-data
ingest, scale of processing, and the like. For such customers, it would be good to investigate a
solution that is key-value oriented but can support a columnar structure. The goal is to optimize the
data layout in memory for subsequent storage on disk and efficient retrieval from it. It is also
important to investigate how the layout should evolve if the longer-term retention medium is flash,
as opposed to rotating media.
If we do treat InfiniScale as an IOPS tier over object stores, there has to be a translation from the
tiny keys to the large blobs into which large numbers of tiny key-value pairs are packed. Thus, given
a tiny key, one should be able to determine the blob key that encapsulates it. This metadata mapping
is internal to the InfiniScale solution.
There are multiple ways in which this can be achieved. Some of the solutions involve putting an
upper bound on the number of blobs that shall be inspected. Others need a more deterministic
mapping. As the value of the tiny key becomes smaller, it becomes more and more challenging to
meet the deterministic mapping schemes. This is because the amount of overhead to manage the
mapping from the tiny key to a blob key will be very large. We can thus resort to mechanisms such
as a probabilistic lookup into the blob to determine if the tiny key would be found in that blob with
high confidence. Once such a blob is determined, the key space within that blob is either searched,
or yet another key hash is looked up for a deterministic match.
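The probabilistic lookup just described is commonly realized with a Bloom filter per blob. The sketch below illustrates the idea; the filter parameters and helper names are ours:

```python
import hashlib

class BloomFilter:
    """Tiny probabilistic membership filter, one per blob (sketch only)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0                     # bitset held in a Python int

    def _positions(self, key: str):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, key: str):
        for p in self._positions(key):
            self.array |= 1 << p

    def might_contain(self, key: str) -> bool:
        # No false negatives; rare false positives.
        return all(self.array >> p & 1 for p in self._positions(key))

# One filter per blob: a tiny-key lookup first asks each filter,
# then searches only the blobs that answer "maybe".
blob_filters = {}

def index_blob(blob_key, tiny_keys):
    f = BloomFilter()
    for k in tiny_keys:
        f.add(k)
    blob_filters[blob_key] = f

def candidate_blobs(tiny_key):
    return [b for b, f in blob_filters.items() if f.might_contain(tiny_key)]

index_blob("blob-001", ["k1", "k2"])
index_blob("blob-002", ["k3"])
print(candidate_blobs("k3"))   # narrows the search; may rarely include false positives
```

Once a candidate blob is identified, the key space within it is searched or a per-blob key hash is consulted for the deterministic match, as described above.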
The capacity tier can make no assumption about the metadata content within the blobs. The fact
that these blobs are immutable also allows one to consolidate the metadata signatures into a
central store that may be cached in memory. Once that is done, the capacity tier is just a large blob
object store, which can serve as the store for immutable data elements from InfiniScale as well as
for just about any other capacity-tier purpose.
ATG Investigations
R4. Unified Data Store: As stated, the bottom halves of the InfiniScale solutions and the capacity
solutions have the potential for common underpinnings; this validation is in order. It will also be
useful to construct a simple capacity solution with efficiency- and resiliency-based codes built over
the disk-based storage layout of a large blob store (on the order of 10MB per blob). We already have
IOPS solutions that consolidate tiny KV-pairs into large blobs. Having a single unified data store will
enable more seamless data tiering and movement between a cold capacity tier and a warmer
InfiniScale solution. It also enables one to have
Compression for columnar stores also lines up nicely, as similar KV-pairs are batched into a single
column that finds its way into a single physical file/blob on disk. For example, SAP HANA's
SanssouciDB37 uses dictionary encoding, where a dictionary maps each distinct value of a column to
a shorter, so-called value ID. Each value ID is then compressed using only as many bits as are needed
for the operating range. In the dictionary below, 3 bits are needed to encode the 5 values of the
highlighted column.
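The bit-width arithmetic behind dictionary encoding can be sketched as follows; the function name and column contents are illustrative:

```python
import math

def dictionary_encode(column):
    """Map each distinct value to a short value ID (dictionary-encoding sketch)."""
    dictionary = {v: i for i, v in enumerate(sorted(set(column)))}
    ids = [dictionary[v] for v in column]
    # Only ceil(log2(distinct values)) bits are needed per value ID.
    bits = max(1, math.ceil(math.log2(len(dictionary))))
    return dictionary, ids, bits

col = ["red", "blue", "red", "green", "cyan", "pink", "blue"]
dictionary, ids, bits = dictionary_encode(col)
# 5 distinct values -> 3 bits per value ID, matching the example above.
print(len(dictionary), bits)
```

The column itself is then stored as a packed array of these small IDs plus the (much smaller) dictionary.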
InfiniScale solutions also leverage open source compression techniques such as Snappy, a
compression/decompression library. It does not aim for maximum compression, or compatibility
with any other compression library; instead, it aims for very high speeds and reasonable
compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude
faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On
a single core of an Intel i7® processor in 64-bit mode, Snappy compresses at about 250 MB/sec or
more and decompresses at about 500 MB/sec or more.
ATG Investigations
R5. Storage Efficiency through read-compressed: Storage Efficiency in the InfiniScale context is
very different from traditionally known mechanisms, like de-duplication and compression.
37 http://link.springer.com/chapter/10.1007%2F978-3-642-29575-1_4
Master-slave architectures have centralized metadata management, which can be looked up to find
the nodes holding the data, from where it can be served [Kerr94, Vei01]. This also helps identify
less-loaded nodes so that load is better distributed. HDFS is an example of centralized metadata
management. This typically follows a tight consistency model and is fine within a data center, but
not across a high-latency wide-area link.
The other extreme is Cassandra, which follows a peering architecture that helps split the key-space
and routes requests within the cluster for a maximum of O(logN) hops, as in DHT. This has better
resiliency characteristics and decentralized balancing logic when nodes enter or leave the cluster
membership. It also spreads workloads better, as there is no single metadata server to become a
bottleneck.
What is ideally needed is the peering architecture of Cassandra with the single-hop operation of
HDFS. This is important in the context of Flash and more so in the context of SCM. Options in
between the above-mentioned extremes must thus be explored.
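The decentralized placement underlying the peering approach can be illustrated with a minimal consistent-hash ring. With full membership knowledge, any node can compute placement locally in a single hop; with only partial knowledge, as in a DHT, routing takes up to O(logN) hops. The hashing details are illustrative:

```python
import hashlib
from bisect import bisect_right

def _hash(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring: decentralized placement, no metadata master.
    A key belongs to the first node point at or after its hash on the ring."""
    def __init__(self, nodes):
        self.points = sorted((_hash(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        h = _hash(key)
        idx = bisect_right(self.points, (h, chr(0x10FFFF)))
        return self.points[idx % len(self.points)][1]   # wrap around the ring

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
placements = {k: ring.owner(k) for k in ("user:1", "user:2", "user:3")}
print(placements)   # any node can compute this locally; no central lookup
```

Since every node derives the same answer from the same membership view, there is no metadata server to become a bottleneck, which is the property the text seeks to combine with HDFS-style single-hop lookups.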
ATG Investigations
R6. Geo-scale Data Distribution: There is a clear opportunity to merge the notions of disaster
recovery and data resiliency (covered in the next sub-section), to achieve higher levels of
storage efficiency. However, mechanisms of cross-data-center distribution and wide-area topology
construction are not well understood within NetApp, as we have not played much in the WAN
space. Reconstruction typically happens in the context of a file/object. Given
that these are coded objects that were distributed, each data center must keep track of
(metadata) that associates the objects to their coded chunks. Protocol mechanisms, such as
torrents, will be needed to stream these data chunks to a point of reconstruction, which is
typically closest to the point-of-consumption. Data ingest should take into account capacity
balancing as well as safeguard erasures due to a disaster. It should be possible to reconstruct
and redistribute object chunks that were lost due to a whole data center outage.
One significant shift needed in the coding algorithms is that they should have the property of
logical reconstruction and must not be tied to the capacity of a physical device. They should also
spread the load of reconstruction over the entire group, yet be able to contain the fault boundary.
HDFS and most of the other InfiniScale solutions resort to this logical reconstruction by rebuilding
the contents of a logical bucket. This yields better capacity balancing across nodes in a cluster while
all nodes contribute to reconstruction, thereby reducing reconstruction time.
However, if we need a DR site, we will need to create a full replica of the primary data at that site.
Most cold data will be consolidated in the cloud. The cloud is fundamentally a geo-distributed
store with high bandwidth links between those sites. If disaster strikes one of those sites, it should
still be possible to recover all data from the other sites. So, if there are 10 data center sites, a
complete erasure of one site should not result in any data loss. There is also the need of load and
capacity balancing across those sites. If coding can be done in a way that all these aspects can be
provided by a single solution, the efficiency of the solution would be maximized, leading to optimal
TCO.
Hierarchical regenerative codes have the property of being able to regenerate all data locally, with
the help of a master. If the master and another node are lost locally, the master from another site
will be needed to complete reconstruction. This code has a very nice property of graceful
degradation, and the degree of degradation depends on the severity of the fault.
Other codes, such as network codes, have the property of minimizing data transfer over the WAN,
and extending those properties into storage will help with end-to-end storage efficiency. Typically,
the regeneration code is also sent along with the coded data to enable the destination to
reconstruct; if this regeneration code can be secured, the transmission is provably secure. Thus, by
combining two or more functions of this nature, the efficiencies of the end-to-end system are
further improved.
Finally, asymmetric erasure codes, such as (modified) Reed-Solomon codes, are known to be
expensive to compute but provide very good storage efficiency at high erasure rates. It is possible
to handle 5 erasures over 25 chunks at 30% overhead. Creating a replica at 3 known locations would
instead incur 200% overhead, with further overhead for maintaining metadata on where data
chunks can be found; and even during reconstruction, the entire bisection bandwidth cannot be
used. Thus, a replica-based solution at geo-scale does not work well, and the alternatives listed here
must be explored.
Yet another aspect in favor of asymmetric coding is the fact that Intel CPUs have SIMD instructions
that further help alleviate the cost of coding and reconstruction.
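To make the overhead comparison concrete, the sketch below uses a single XOR parity, the simplest erasure code: k data chunks plus one parity tolerate any single erasure at 1/k overhead, versus 200% for 3-way replication. The real designs discussed above use Reed-Solomon or regenerative codes, which this does not implement:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """k data chunks + 1 XOR parity: tolerates any single erasure
    at 1/k storage overhead (vs 200% for 3-way replication)."""
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return chunks + [parity]

def reconstruct(stripe, lost_index):
    """Rebuild the lost chunk by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != lost_index]
    out = survivors[0]
    for c in survivors[1:]:
        out = xor_bytes(out, c)
    return out

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # k = 4 -> 25% overhead
stripe = encode(data)
assert reconstruct(stripe, 2) == b"CCCC"      # recover an erased chunk
print("overhead:", f"{1 / len(data):.0%}")
```

Stronger codes generalize this idea to multiple simultaneous erasures, at the computational cost the text notes SIMD instructions help alleviate.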
ATG Investigations
R7. Mapping the Coding Landscape: Prof. Muriel has expressed deep interest in mapping the
coding landscape and co-authoring a survey paper with NetApp. The pros and cons of different
coding schemes severely lack thorough treatment, so comparisons have been difficult; the absence
of such a survey is seen as a significant gap in academic and industry circles.
R8. Experimentation with Coding Algorithms: Coding theory and implementation specifics have
evolved as newer techniques have been proposed and newer CPU capabilities have evolved.
Even newer implementations are posted in open source. It is thus required to make some
investments within ATG to assess some of these algorithms from a realization standpoint.
3.2.6 Others
One of the major areas where NetApp is looking to invest is solutions that meet the growing needs
of a large market. Cloud is one such market. This recommendation is thus an opportunistic one.
Amazon S3 has come to define how storage in the cloud is described. What would compete with it?
Or what would enable a provider to create a solution compelling enough in another dimension?
Amazon S3 is a capacity-based object store, and we just described a Unified Data Store that has an
object store as its underpinnings.
ATG Investigation
R9. NetApp Key-Value Appliance: This investigation is not about data layouts or building a
functional KV-store; that is covered by the unified data store and other investigations. It is about
the challenges of getting those into an appliance form factor. Some basic cost modeling shows that
E-Series can be a viable platform for a cost-sensitive storage tier. The fundamental persona needed
is that of an object store, and we just showed that the capacity tiers of InfiniScale can be leveraged
as an object store, which is a KV-store. Identifying the design options for retaining the optimal
aspects of the eos firmware while adding the functionality of a large-blob KV-store will enable a
low-cost, native object store. This native object store can then be combined with memory-heavy
hosts for an InfiniScale solution. In its native form, it can be used as a storage building block for the
cloud, in a 4U form factor. The more feature-rich StorageGRID software could even be deployed on
this solution, which can also be given an S3 persona to make it cloud-ready.
Acunu Data Platform: The Acunu Data Platform is a next-generation Big Data Database combining
Apache Cassandra, Acunu Control Center and the Acunu Storage Engine (also known as Acunu
Castle), as shown in Figure 15. The Acunu Storage Engine is at the heart of Acunu's distribution for
Apache Cassandra. It comprises a rewrite of the Linux storage stack that offloads much of the storage work
from Cassandra and includes advanced OS caching and buffering schemes that eliminate the need
for tuning and provide high and predictable performance for a wide range of workloads. Acunu
transforms Cassandra into an easy to use, enterprise-ready database system optimized for today's
demanding NOSQL workloads and cloud environments. The Acunu Control Center provides simple
web-based management to support common administrative tasks including cluster management
and database creation together with unique features such as cluster-wide snapshot and clone.
Acunu requires no changes to Cassandra applications; it's integrated, tested and hardened; and is
100% compatible with Apache Cassandra drivers and APIs including Thrift and CQL.
Acunu Castle: Acunu developed its platform from the ground up to leverage developments such as
distributed servers in the cloud and the huge throughput increases enabled by SSDs. Acunu built
Castle -- the storage core, an open-source Linux kernel module that contains optimizations and data
structures targeted to be deployed on commodity hardware. Castle offers a new storage interface,
where keys have any number of dimensions, and values can be very small or very large. Whole
ranges can be queried, and large values streamed in and out. It’s designed to be just general-
purpose enough to model simple key-value stores, BigTable data models such as Cassandra’s, Redis-
style data structures, graphs, and others.
Acunu Analytics: Acunu Analytics delivers a platform and toolset that makes it possible to build and
extend complex, real-time applications easily and quickly. It does this by layering flexible and
expressive data modeling on top of Cassandra's base 'key-value pair' data model and by delivering a
much richer query capability; one that is more recognizable to developers used to the ease of use
and power of SQL. Acunu Analytics runs as a layer above any Apache Cassandra ring. In fact, it is
actually a Cassandra client application, using the Hector library, so you can run it on a shared
Cassandra cluster, alongside your existing applications. Similar to Cassandra, Acunu Analytics is
scalable, high performance and handles node failures and cluster membership changes without
interruption.
Acunu Analytics provides SQL-like query constructs to the NoSQL world, enabling familiar concepts
such as SELECT, WHERE, JOIN, and GROUP BY and built in aggregating functions such as topK and
Standard Deviation. Data collection can be performed by a JSON-based API, custom data integration,
and integration with Apache Flume. Acunu Analytics also ships with Acunu Dashboards, a powerful
and flexible browser-based tool for building live dashboards, configuring Analytics schemas, and
visualizing results.
Bare Bones: The founders of Acunu have several publications validating their underlying data
structures. The first showed the copy-on-write B-tree finally being beaten by Andy Twigg et al.
[Twigg11], introducing their work on data structures for versioned data stores; a more detailed
account appears in [Byde11].
A startup database vendor based in Vienna, VA, launched in March 2013, claims that its database,
FoundationDB, delivers on the promise of true data consistency for a NoSQL database without a
huge loss of speed or flexibility. The initial release of the FoundationDB data store is
The founders of FoundationDB claim that CAP has been misunderstood by most people, and that
choosing C and P does not in fact preclude a system from being highly available in failure
scenarios.38
Database analyst Curt Monash, of Monash Research, has warned against data stores designed to
support multiple data models, noting that "To date, nobody has ever discovered a data layout that is
efficient for all usage patterns"39.
FoundationDB is a key-value-like storage engine that can support (multiple) layers of NoSQL data
models. It can support a document data model to replace MongoDB, or support a key-value model
to replace memcached, or support a graph model to replace Neo4J. This enables developers to much
more easily code their apps against FoundationDB. These layers, according to the founders, can't be
built on other key-value systems, because without consistent transactions they would not work. As
building a distributed, fault-tolerant, high-performance database with cross-node ACID transactions
was difficult enough, as many database features as possible were pushed out of the core and into
layers40. Thus, the data model is a simple ordered key-value store (like a dictionary) and the API is
simple, but ACID transactions make building higher-level data models and features very simple.
Also, since data is going to be consistent, applications won't have to be built to wait for data to
catch up within a given transaction, making apps less complex and easier to build.
FoundationDB, however, has found a way to offer both availability and consistency through Paxos --
an agreement algorithm, which ensures that multiple copies of the data -- the database keeps three
copies of all data it stores -- stay synchronized. Google engineers also used Paxos [Les01] in its
Spanner global database architecture [Corb12], though Google's setup is different from
FoundationDB's. Google's up-and-coming Spanner database, a second-generation distributed
database that could ultimately replace the search engine company's Bigtable systems, is being built
on the premise that transactional integrity has to be a part of that database, too.
FoundationDB does not offer the traditional SQL interface; instead it offers data access through C,
Python, Ruby, Node.js and Java APIs. It uses optimistic concurrency control and multi-version
concurrency control to construct a lock-free database, which is essential in a high-performance
distributed system. The transaction conflict resolution function is decoupled from the data storage
function, enabling the two to be optimized separately. FoundationDB is optimized to take advantage
of the high random I/O of SSDs, delivering high performance with strong durability guarantees. It is
written in Flow, a new language of FoundationDB's own design that extends C++ with Erlang-like
asynchronous functionality while retaining the performance advantages of C++. Flow also gave the
team the ability to simulate thousands of failure scenarios that could cause ACID violations.
The company has published detailed metrics based on running a $39k, 24-machine cluster against a
dataset of two billion key-value pairs. It reports a stable 500,000 operations per second at 90
percent reads and 10 percent writes, 150,000 operations per second at a 50/50 mix, and up to
1,080,000 writes per second across blocks of 140 adjacent keys. The software is not available as
open source, though the company has promised to release a no-cost community version. The full
general release is expected by the end of 2013. The software runs on Linux, OS X, and Windows, as
well as on Amazon's Elastic Compute Cloud (EC2).
38
http://foundationdb.com/#CAP
39
http://www.dbms2.com/2013/02/21/one-database-to-rule-them-all/
40
http://www.foundationdb.com/#layers
The sweet spot for BangDB is to run as many threads as there are CPUs on the machine. Since
performance was one of BangDB's main design goals, the database had to be concurrent and able to
take advantage of all the CPUs in the machine41. Concurrency adds complexity and overhead, but
settling for low performance on a high-capacity machine was not acceptable. Key elements of the
design42 include:
- Manipulation of the tree by any thread uses only a small, constant number of page locks at
any time
- Searching the tree does not prevent reading any node; the search procedure in fact does no
locking most of the time
- Based on the Lehman and Yao paper [Lehman81], but extended further for performance
improvement
- Separate pools for different types of data, giving flexibility and better performance when
managing data in the buffer pool in different scenarios
- Semi-adaptive data flush to ensure performance degrades gracefully when data overflows
the buffer
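The Lehman-Yao idea [Lehman81] behind BangDB's lock-free search can be sketched as follows: each node carries a high key and a right-sibling link, so a reader that reaches a node after a concurrent split simply chases the link rightward instead of taking a lock. The sketch below flattens the structure to a chain of leaves and omits the tree descent and all writer-side locking.

```python
# Simplified B-link sketch: high keys plus right-sibling links let readers
# recover from concurrent splits without holding any locks.

class Node:
    def __init__(self, keys, high_key, right=None):
        self.keys = keys          # sorted keys stored in this node
        self.high_key = high_key  # upper bound on keys this node may hold
        self.right = right        # link to the right sibling

def search(node, key):
    # If the key exceeds this node's high key, a split has moved it to a
    # sibling on the right -- follow the link instead of locking.
    while node is not None and key > node.high_key:
        node = node.right
    return node is not None and key in node.keys

# A split moved keys >= 30 into a new right sibling; a stale reader that
# still enters at the left node finds 40 by chasing the link.
right = Node([30, 40], high_key=float("inf"))
left = Node([10, 20], high_key=29, right=right)
found = search(left, 40)   # -> True
```

The invariant doing the work is that every key is always reachable by moving rightward, so a reader can never be stranded by an in-progress split.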
41
http://highscalability.com/blog/2012/11/29/performance-data-for-leveldb-berkley-db-and-bangdb-for-rando.html
42
http://www.iqlect.com/architecture.php
Other
Apart from the open source projects that contribute to InfiniScale solutions, two major players (EMC
and Amazon) were analyzed for their play in the InfiniScale space. We concluded with
recommendations for ATG to pursue, and potential technology targets that NetApp should consider
as inorganic options.
Given below are some key insights that were gathered during various phases of authoring this
report.
References
[Beer13] Leander Beernaert, Pedro Gomes, Miguel Matos, Ricardo Vilaça, and Rui Oliveira. Evaluating
Cassandra as a manager of large file sets. In Proceedings of the 3rd International Workshop on
Cloud Data and Platforms (CloudDP '13). ACM, New York, NY, USA, 25-30, 2013.
[Brewer00] Eric A. Brewer. Towards robust distributed systems. In PODC, page 7, 2000.
[Byde11] Andrew Byde, Andy Twigg. Optimal query/update tradeoffs in versioned dictionaries,
http://arxiv.org/abs/1103.2566, April 2011.
[Chen12] Jianjun Chen, Chris Douglas, Michi Mutsuzaki, Patrick Quaid, Raghu Ramakrishnan, Sriram Rao,
and Russell Sears. 2012. Walnut: a unified cloud object store. In Proceedings of the 2012 ACM
SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY,
USA, 743-754.
[Corb12] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman,
Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh,
Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura,
David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak,
Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google’s Globally-Distributed
Database, Proceedings of OSDI'12: Tenth Symposium on Operating System Design and
Implementation, Hollywood, CA, October, 2012.
[Gilbert02] Seth Gilbert and Nancy A. Lynch. Brewer’s conjecture and the feasibility of consistent, available,
partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.
[Govil08] Jivika Govil; Kaur, N.; Kaur, H.; Jivesh Govil, "Data/Information Lifecycle Management: A Solution
for Taming Data Beast," Fifth International Conference on Information Technology: New
Generations, 2008. ITNG 2008., vol., no., pp.1226,1227, 7-9 April 2008.
[Gray81] J. Gray. The Transaction Concept, Virtues and Limitations. In Proceedings of VLDB, Cannes,
France, Sept 1981.
[Grid06] G. Grider, L. Ward, R. Ross, and G. Gibson, "A Business Case for Extensions to the POSIX I/O API
for High End, Clustered, and Highly Concurrent Computing,"
www.opengroup.org/platform/hecewg, 2006.
[Hild09] Dean Hildebrand, Arifa Nisar, and Roger Haskin. 2009. pNFS, POSIX, and MPI-IO: a tale of three
semantics. In Proceedings of the 4th Annual Workshop on Petascale Data Storage (PDSW '09).
ACM, New York, NY, USA, 32-36.
[Kan12] Kan, M.; Kobayashl, D.; Yokota, H., "Data layout management for energy-saving key-value storage
using a write off-loading technique," Cloud Computing Technology and Science (CloudCom), 2012
IEEE 4th International Conference on , vol., no., pp.74,81, 3-6 Dec. 2012.
[Kerr94] Kerr, A.U., "Towards Distributed Storage and Data Management Systems," Mass Storage Systems,
1994. 'Towards Distributed Storage and Data Management Systems.' First International
Symposium. Proceedings., Thirteenth IEEE Symposium on , vol., no., pp.1,, 1994.
[Khai11] Cho Cho Khaing; Thinn Thu Naing, "The efficient data storage management system on cluster-
based private cloud data center," Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE
International Conference on , vol., no., pp.235,239, 15-17 Sept. 2011.
[Lehman81] Philip L. Lehman and S. Bing Yao. Efficient locking for concurrent operations on B-trees.
ACM Trans. Database Syst. 6, 4, 650-670, December 1981.
[Les01] Lamport, Leslie. Paxos Made Simple. ACM SIGACT News (Distributed Computing Column) 32, 4
(Whole Number 121), 51-58, December 2001.
[Leva13] Justin J. Levandoski, David B. Lomet, Sudipta Sengupta, The Bw-Tree: A B-Tree for New Hardware
Platforms, 29th IEEE International Conference on Data Engineering, 2013.
[Ling03] Benjamin C. Ling and Armando Fox. 2003. The case for a session state storage layer. In
Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9 (HOTOS'03),
Vol. 9. USENIX Association, Berkeley, CA, USA, 30-30.
[Mohan92] C. Mohan , Don Haderle , Bruce Lindsay , Hamid Pirahesh , Peter Schwarz. Aries: A transaction
recovery method supporting fine-granularity locking and partial rollbacks using write-ahead
logging, ACM Transactions on Database Systems, Vol 17, 94-162, 1992.
[Nishi12] Nishikawa, N.; Nakano, M.; Kitsuregawa, M., "Energy Efficient Storage Management Cooperated
with Large Data Intensive Applications," Data Engineering (ICDE), 2012 IEEE 28th International
Conference on , vol., no., pp.126,137, 1-5 April 2012.
[Sakr11] Sherif Sakr, Anna Liu, Daniel M. Batista, and Mohammad Alomari, A Survey of Large Scale Data
Management Approaches in Cloud Environments , IEEE Communications Surveys & Tutorials, Vol.
13, No. 3, Third Quarter 2011, 311 - 336.
[Ston05] Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has
Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE
'05). IEEE Computer Society, Washington, DC, USA, 2-11, 2005
[Twigg11] Andy Twigg, Andrew Byde, Grzegorz Milos, Tim Moreton, John Wilkes, Tom Wilkie. Stratified B-
trees and versioning dictionaries, http://arxiv.org/abs/1103.4282, March 2011.
[Vei01] Veitch, A.; Riedel, E.; Towers, S.; Wilkes, J., "Towards global storage management and data
placement," Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on ,
vol., no., pp.184,, 20-22 May 2001.
[Voul11] Voulodimos, A.; Gogouvitis, S.V.; Mavrogeorgi, N.; Talyansky, R.; Kyriazis, D.; Koutsoutos, S.;
Alexandrou, V.; Kolodner, E.; Brand, P.; Varvarigou, T., "A Unified Management Model for Data
Intensive Storage Clouds," Network Cloud Computing and Applications (NCCA), 2011 First
International Symposium on , vol., no., pp.69,72, 21-23 Nov. 2011.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided herein. The
information in this document is distributed AS IS, and the use of this information or the implementation of
any recommendations or techniques herein is the implementers’ responsibility and depends on their
ability to evaluate and integrate them into the operational environment. This document and
the information contained herein may be used solely in connection with the NetApp products discussed
in this document.
© 2013 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp,
Inc. Specifications are subject to change without notice. NetApp, the NetApp logo and Go further, faster are trademarks or registered
trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered
trademarks of their respective holders and should be treated as such.