
An application-centric model for cloud management

Terence Harmer, Peter Wright, Christina Cunningham, John Hawkins and Ron Perrott
Belfast e-Science
Queen’s University of Belfast
Belfast, UK
{t.harmer, p.wright, c.cunningham, j.hawkins, r.perrott}@besc.ac.uk

Abstract—The cloud model is increasingly popular as a means of creating dynamic, flexible and cost-effective network-centric application infrastructures. The model separates the applications, or application cloud, from the resources, or resource cloud, upon which the applications will be hosted. There are an increasing number of utility resource providers that aim to provide cloud infrastructure on demand to users, and libraries that aim to manage owned infrastructure as a resource cloud. There is, unfortunately, no common API for cloud resources and it is unlikely that one will emerge soon given the immaturity of the area and the competing commercial interests in the domain. In this paper, we outline our commodity and application-centric approach to resource management, and describe our integration framework for cloud application management, illustrating its use in a field-deployed application and a particular dynamic component within that application.

This work is supported by the UK Technology Strategy Board under grant TP/3/PIT/6/l/15656 and the UK EPSRC under Platform Award EP/F066139/1.

I. INTRODUCTION

The cloud computing or cloud model is increasingly popular as a means of creating dynamic, flexible and cost-effective network-centric application infrastructures. The model distinguishes between the software application components, the application cloud, and the hosting computing resource components upon which the application will be executed and managed, the resource cloud. This separation of concerns is intended to bring flexibility, mobility, scalability and greater robustness to the application, and the potential of cost savings in the deployment and operation of network-centric infrastructures.

The application cloud is most often dynamic and designed to scale to meet demand from users for application services; the hosting cloud provides a reliable and scalable physical infrastructure that hosts application services. From the application perspective, the hosting cloud is a utility that is provisioned to meet the needs of the application cloud. The resource cloud will consist of one or more resource providers, each of which supplies hosting, network and storage capabilities on demand. The role of the resource cloud provider is to supply a robust, secure and flexible environment that is configured to meet the needs of the application and to provide a framework that enables an application to be monitored and managed.

The separation between application concerns and resource concerns means that the application can conceivably be moved between different hosting environments, providing potential resilience and robustness to the application deployment. The resource cloud may be thought of as a resource marketplace, with resources consumed by the application from the best available resource provider to match application demands. This offers researchers and businesses the potential of significant cost savings in that it is possible for them to match the cost of their computing and storage to their demand for such resources. Thus, a user needing 1000 compute resources for 1 day per week may acquire those resources only for the day they are required.

An application could be hosted on an internal resource cloud consisting of compute resources, storage and network that are wholly owned by the organisation, on similar resources rented from a commercial resource provider, or indeed on a combination of both. For wholly-owned resource clouds, the model enables organisation-wide aggregation of resource requirements, which should lead to improved resource utilisation, reduced management costs and a better return on investment in resources. In today's market, it is possible for a business or researcher to be a significant computation resource consumer without owning any significant infrastructure, instead relying primarily on resources purchased from commercial hosting providers when they need a compute infrastructure. Again, this can bring significant cost benefits and reduce the barriers faced by businesses in becoming significant service providers, and by researchers in undertaking computationally intense research experiments.

The cloud model has much potential, and the creation of a resource marketplace will bring significant cost savings and robustness to application users. However, there is no common provider interface and little prospect of one being adopted in the short term given the immaturity of the cloud model; the biggest provider, Amazon, might be regarded as a de facto standard, leading to open source libraries based on its model [1]. On-demand provision of resources to users has been around for some time with grid computing services, such as the UK National Grid Service [2] and the
US TeraGrid [3], but these have largely been job-focused rather than service-focused. In these infrastructures the goal is to optimise the compute jobs that are being requested by users, delivering the fastest compute for users and making the best use of the fixed compute infrastructure.

In general, an application must be aware of the providers it is using and their particular features; these can vary from the charging models they use to the network configuration options and virtualisation technology they can provide. In this evolving area, capabilities are added rapidly and the supporting APIs to use these capabilities change frequently. This represents a barrier to using the cloud model and increases the current cost of use by requiring software to track the evolution of provider APIs.

We (BeSC) have developed a number of large-scale commercial and research cloud deployments that use a commodity market approach to resources provided within a resource cloud. In the commodity marketplace, resources are discovered and selected to satisfy the high-level needs of the application. To support this work we have created an application-centric and provider-neutral interface [4] that enables an application to use a variety of providers in a generic way and to express provisioning using commodity market centric ideas. In this paper we develop these commodity market API ideas and illustrate their use within a large-scale operation that is currently in user field trial.

II. BACKGROUND: RESOURCE-CENTRIC APIS

Almost all of the current cloud APIs focus on resource clouds, removing the requirement to host or own the physical cloud resources to be used. It is still necessary, however, to manage this dynamic collection of resources that will host the application cloud. Effective management of the resource layer in a provider-neutral way is difficult. There are both significant and subtle differences in current APIs. More importantly, there are significant differences in the resource allocation models those APIs support and which are exposed to the application. In addition, utility provider APIs are evolving rapidly and are subject to significant changes; for example, Amazon's introduction of Spot Instances introduced an entirely new EC2 resource life-cycle model.

There has been an explosion of provider APIs (such as EC2 [5], Rackspace [6], OpenNebula [7] OCA, OCCI [8], Azure [9], NewServers [10], etc.) and user libraries (such as libcloud.org [11], Dasein [12]) for the cloud model. The APIs, in general, are focused on exposing the available infrastructure-level capabilities, such as block devices, virtual machines, snapshots and power states. This low-level view makes cross-provider operations (or porting a novel cloud between providers) heavy work if there is no exact match to the features being used.

By way of an example, suppose a user provisions a database and some time later they wish to produce a snapshot of its state; if the database files sit on an Amazon EBS (a networked block device system) volume then they can simply call the EBS snapshot method on that volume. However, if the user had provisioned this same system with Flexiscale or NewServers (where there is no block device snapshot API) then the user cannot make such a request. If the user was using an application-level abstraction, however, they would have provisioned "a database" and would be requesting a snapshot of the database. With such a higher abstraction there is more room for the API to change how the request is executed: in the case of EC2 it could use the Amazon Relational Database Service, whereas in the Flexiscale or NewServers case it could provision a host, configure it with MySQL, say, and ask the database for a snapshot directly.
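To make that contrast concrete, the following sketch is our own illustration of such an application-level abstraction; the DatabaseCapability, Ec2DatabaseCapability and MySqlDatabaseCapability types are hypothetical and the provider calls are indicated only in comments, not taken from any real SDK.

  // Hypothetical application-level abstraction: the application asks for a
  // snapshot of "the database" and each provider binding decides how to do it.
  interface DatabaseCapability {
      String snapshot(String label);   // returns a provider-specific snapshot id
  }

  // EC2-style binding: the database files live on a block device, so a snapshot
  // is a single block-device operation (the real EC2 call is not shown here).
  class Ec2DatabaseCapability implements DatabaseCapability {
      private final String ebsVolumeId;
      Ec2DatabaseCapability(String ebsVolumeId) { this.ebsVolumeId = ebsVolumeId; }
      public String snapshot(String label) {
          // e.g. ask the provider to snapshot the volume holding the database files
          return "block-snapshot-of-" + ebsVolumeId + "-" + label;
      }
  }

  // Binding for a provider with no block-device snapshot API: ask the database
  // engine itself (for example MySQL) to produce a consistent dump instead.
  class MySqlDatabaseCapability implements DatabaseCapability {
      private final String host;
      MySqlDatabaseCapability(String host) { this.host = host; }
      public String snapshot(String label) {
          // e.g. run a dump utility on the provisioned host and store the result
          return "dump-from-" + host + "-" + label;
      }
  }

  class SnapshotExample {
      public static void main(String[] args) {
          DatabaseCapability db = new Ec2DatabaseCapability("vol-12345678");
          System.out.println(db.snapshot("nightly"));  // provider detail is hidden
      }
  }

The application code only ever speaks in database terms; which mechanism actually produces the snapshot is a property of the binding chosen at provisioning time.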
The next section demonstrates the basic API differences between provisioning the cheapest resources in the clouds of Amazon EC2 and Flexiscale.

A. Provisioning: Flexiscale

Flexiscale's [13] API requires users to define the specification of the machines they wish to start: the number of CPUs, the amount of RAM and the size of the hard disk. The valid combinations of these are not exposed through the API but rather are shown on the Flexiscale website in an HTML table; if this table changes then users of the API would need to modify their provisioning logic to take account of it.

  String id = "MyNewServer";

  // Get default hosting package, VLAN + OS (Ubuntu is image number 27)
  Package pkg = flexiscale.packages[0];
  Vlan vlan = flexiscale.vlans[0];
  OSImage os = new OSImage(27);

  // 1 CPU, 512MB of RAM, 4GB of disk
  Server s = new Server();
  s.setName(id);
  s.setDisk_capacity(4);
  s.setProcessors(1);
  s.setMemory(512);
  s.setPackage_id(pkg.id);
  s.setOperatingSystem(os);
  s.setInitialPassword("changeme");

  flexiscale.createServer(s, vlan);
  flexiscale.startServer(id, "no notes");

  String ip = flexiscale.getServer(id)[0].ip[0];

  System.out.printf("%s's IP: %s\n", id, ip);

In the above code we pick our hosting package (a handle to Flexiscale's billing system), the network VLAN to which the machine should be connected and a pointer to the Ubuntu operating system (image number 27). We then create a server with 1 CPU, 512MB of RAM and 4GB of disk space. Once created, we start the machine and then request its details from the API to determine the IP address assigned to this instance. Flexiscale must create a machine image when requested and so turnaround time can be in the region of 10 minutes; as a result it is important to create a resource in advance of when it will be required by an application (which can complicate scaling to meet changes in application demand).

B. Provisioning: Amazon EC2

  RunInstRequest req = new RunInstRequest();
  req.setImageId("ami-1c5db975");        // Ubuntu machine image
  req.setPlacement("us-east-1a");        // datacentre
  // 1GHz Xeon, 1.7GB RAM, 150GB disk
  req.setInstType("m1.small");
  req.setKeyName("someAuthKey");         // SSH login key

  Reservation res = ec2.runInstances(req);

  String id = res.getInstance().getInstanceId();
  String ip = res.getInstance().getPublicDNS();

  System.out.printf("%s's IP: %s\n", id, ip);

In the above code we pick the Amazon Machine Image (AMI) representing an Ubuntu operating system, specify the datacentre we wish the resource to reside in, and pick an Instance Type: a template specifying CPU cores, speed, RAM, and scratch disk size and quantity. Once these have been specified we also define the SSH login key to use and start the machine. Machines in Amazon start reasonably quickly, generally within two minutes, allowing us to rapidly acquire large numbers of resources.

These code snippets (above) are performing the same simple operation, selecting a basic CPU type, a suitable operating system and a basic configuration, which is a basic operation of a dynamic multi-provider cloud application. In the Flexiscale and EC2 examples, the available operating systems and resource specifications (CPU, RAM, disk) are somewhat static: when a new template is added, the developer (or deployment team) must modify the application to enable the use of the new resource. This static approach reduces the flexibility of applications and tends to lock applications into particular providers and particular known configurations. The differences are significant for such a basic cloud operation and become larger when underlying features such as network configuration, security and data storage are used.

III. AN APPLICATION-CENTRIC API

We are interested in dynamic service-centric systems that are deployed as dynamic application clouds. The application cloud is hosted in a resource cloud implemented by a collection of resource providers, and is intended to remain agnostic to the underlying provider, using cloud resources from whichever provider meets the needs of the application. In this model, a resource-provider-agnostic API is required to enable an application dynamically to
  1) specify the CPU and operating system types that are suitable for the application or a component;
  2) specify the associated storage and networking infrastructure that are required for components;
  3) specify the configuration of a component or infrastructure;
  4) specify software versions within the application stack that are suitable;
  5) specify online sources from which software should be taken when required;
  6) manage the application life-cycle, such as instantiation, discard and shutdown; and
  7) monitor the resource and application behaviour during its life-cycle.

As we have already described, there are many APIs and libraries. The problem is that few of these enable an application to remain agnostic to the provider environment and thus enable the provider cloud to be viewed as a generic resource marketplace for resource allocation and trading. In such a trading environment, an application establishes the underlying providers with which it has a financial arrangement in order to enable those resources to be accessible. Forming an agreement may be as simple as creating an account and providing credit card details, but it could also involve personally auditing the provider's data centres.

Once a pool of resources is defined, an application can select and allocate suitable resources within it. But what model should the API take in specifying resource properties to cope with the differing resource provider models? Given the range of possible hardware and software requirements for applications, defining a flexible and extensible API is a challenge. We take a discovery-based approach where an application attempts to find resources (from the pool of acceptable providers) that meet its requirements.

These requirements express constraints on the acceptable resources and resource configurations in order to satisfy the needs of the application. Each constraint expresses a simple testable property; for example, that a particular CPU type, RAM size, operating system version or storage access type is to be used. The idea is that a collection of these simple constraints can be used to express more complex needs and provide for systematic and automated comparison of what is available within the provider, thereby allowing us, for example, to define a Windows 7 OS hosted on a dual core processor with 3GB of RAM, with input network capacity of 100Mbyte/sec and using storage resilience of at least 99%.

The capability to specify constraints to select a resource rather than mandating a design-time specified solution is very powerful since the developer can utilise the best available solution at the application's runtime based on their current situational requirements. For example, if the user could express "I need 2GB of RAM but I'd be willing to pay $0.10 more per hour for 4GB" or "the resource must be in an ISO 27001 accredited datacentre" then any approved provider capable of satisfying those requirements at runtime could be automatically considered. Thus by expressing the requirements of the application explicitly it is possible to
take advantage of any new resource type or capability when it becomes available within the provider without any change in the application code.
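A minimal sketch of what such constraint-driven discovery could look like against a provider-neutral library follows; the ResourcePool, Constraint and Template names are hypothetical and only loosely in the spirit of the interface described in [4], not its actual signatures.

  import java.util.List;

  // Hypothetical provider-neutral types: a pool of approved providers is queried
  // with simple, testable constraints and answers with candidate templates.
  interface Template { String provider(); String name(); }

  interface ResourcePool {
      List<Template> find(List<Constraint> constraints);
  }

  class Constraint {
      final String property, operator, value;
      Constraint(String property, String operator, String value) {
          this.property = property; this.operator = operator; this.value = value;
      }
  }

  class DiscoveryExample {
      static List<Template> findWindowsHosts(ResourcePool pool) {
          // The complex requirement from the text, expressed as simple constraints.
          List<Constraint> wanted = List.of(
              new Constraint("os",                 "=",  "Windows 7"),
              new Constraint("cpu.cores",          ">=", "2"),
              new Constraint("ram.gb",             ">=", "3"),
              new Constraint("net.in.mbytesPerSec",">=", "100"),
              new Constraint("storage.resilience", ">=", "0.99"));
          // Any approved provider whose templates satisfy every constraint matches;
          // new providers or templates become usable without changing this code.
          return pool.find(wanted);
      }
  }

The property names here are purely illustrative; the point is that requirements are data, evaluated at runtime, rather than provider calls baked into the application.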
The find operation returns a list of suitable templates from which an application can choose an appropriate resource. The application chooses the most appropriate resource by assigning a cost to each suitable template. The cost is, ideally, an estimate of the cost of a typical usage scenario to a particular application. A usage scenario is a collection of chargeable operations such as:
  1) allocating/deallocating a resource;
  2) reserving a resource;
  3) keeping a resource switched on for a certain period of time;
  4) using a certain amount of incoming/outgoing bandwidth within the provider (over a given period of time);
  5) performing a number of IO operations on a local or SAN disk (over a given period of time);
  6) storing a certain volume of data on a local or SAN disk (for a given period of time).

These scenarios are simple to create; however, predefined scenarios are available with the API for compute-only, network-only and storage-only applications to allow developers to use the API without significant effort.

Each template has a cost model which can be interrogated about the monetary cost of all of the above chargeable operations. We term the estimated cost of a usage scenario a scenario cost. An application may have multiple scenarios, of which it will pick one representative scenario based on the available features of the resource. A good example of this is where the resource must provide some storage redundancy; there would be a scenario for redundant storage (with fewer write operations and half as much data stored) as well as a worst case for non-redundant storage. Since we can calculate the cost of providing redundancy on top of a non-redundant infrastructure we can determine the cost effectiveness of provider-managed redundancy versus managing redundancy at the application or operating system level.

Once an appropriate resource has been selected it can be reserved. A reservation is a short-term hold on resources (and is a no-op for many providers currently) designed to provide a guarantee that the resources will be available when needed. Once resources are needed they are instantiated, which provides the application with a running resource which it can use as it sees fit. When the application finishes with the resource it discards it, returning it to the provider.

Combining all the steps discussed so far provides a usage model of define provider, find resource, reserve resource, instantiate resource, use resource, discard resource.
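This usage model can be read as a small loop over a provider-neutral API. The sketch below is purely illustrative: Pool, Template, Scenario, Reservation and Resource are hypothetical names, and the scenario is priced against each template's cost model as described above.

  import java.util.Comparator;
  import java.util.List;

  // Hypothetical life-cycle: define provider, find resource, reserve resource,
  // instantiate resource, use resource, discard resource.
  class UsageModelExample {

      interface Template {
          double scenarioCost(Scenario scenario);   // estimated cost of the scenario on this template
          Reservation reserve();                    // short-term hold (may be a no-op)
      }
      interface Reservation { Resource instantiate(); }
      interface Resource    { void use(); void discard(); }
      interface Pool        { List<Template> find(List<String> constraints); }

      // A usage scenario: chargeable operations for a typical run, e.g. hours
      // switched on, GB transferred in/out, IO operations and GB stored.
      static class Scenario {
          final double hoursOn, gbIn, gbOut, ioOps, gbStored;
          Scenario(double hoursOn, double gbIn, double gbOut, double ioOps, double gbStored) {
              this.hoursOn = hoursOn; this.gbIn = gbIn; this.gbOut = gbOut;
              this.ioOps = ioOps; this.gbStored = gbStored;
          }
      }

      static void run(Pool approvedProviders, List<String> constraints) {
          Scenario typicalWeek = new Scenario(24, 50, 200, 1_000_000, 500);

          // find: all templates matching the constraints, cheapest scenario cost wins
          List<Template> candidates = approvedProviders.find(constraints);
          Template best = candidates.stream()
                  .min(Comparator.comparingDouble(t -> t.scenarioCost(typicalWeek)))
                  .orElseThrow(() -> new IllegalStateException("no suitable resource"));

          // reserve -> instantiate -> use -> discard
          Resource resource = best.reserve().instantiate();
          try {
              resource.use();
          } finally {
              resource.discard();   // return the resource to the provider
          }
      }
  }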
This model is intentionally simple and attempts to reflect an ideal for the application user, who wishes to specify resources without knowing the internal behaviour of the resource provider. Currently, when developing cloud applications, many of the steps in our model are performed manually and updated frequently to reflect API or resource model changes. This approach means that the application can take a dynamic, commodity-centric approach to resource usage. An application cloud can be deployed and scaled according to application and system constraints, stay within a strict budget and be mobile between all available infrastructure providers. As an example we consider a large-scale system which manages an on-demand media infrastructure for a media test community, and specifically a component which manages video transcoding; this infrastructure has been in field deployment and testing for more than 2 years and has managed more than 1 petabyte of media content in that time.

IV. EXAMPLE: PRISM, A UTILITY INFRASTRUCTURE AND SERVICE MARKETPLACE

The PRISM [14] on-demand media infrastructure manages large-scale and distributed media content creation and distribution to a user and service community; the media content is the daily television and radio output of a broadcaster and contributed media from the user community. PRISM developed ideas and infrastructure from the GridCast project [15], [16], which created a cloud model for an internal broadcasting infrastructure with the aim of exploring new models for on-demand provisioning, access and delivery for the media domain. Whereas GridCast focused on the rather static needs of a deployed broadcaster infrastructure with known demand and usage patterns, PRISM is a dynamic service infrastructure with user-driven and more random demand patterns within the evolving digital media economy. The media economy is increasingly a marketplace for services with shared infrastructure, collaborative development and service trading between specialist companies. This marketplace is global, rapidly evolving and cost sensitive, where establishing a market service quickly and cost effectively brings significant business benefits. The basic capabilities of PRISM, as illustrated in Figure 1, within this marketplace are
  • content access, providing a content portal, user interface and media exchange; and
  • collaborative sharing and production with B2B and media services.

These services are implemented as dynamic service clouds providing
  • content management, replication, managed sharing, and secure and authenticated user access;
  • content services to process the content into a form that is sharable and available to users;
  • content metadata management, indexing, replication and sharing; and
  • content management including release, withdrawal and access rights.

Figure 1. Media Economy and PRISM Services

The components within these clouds are scaled by our cloud infrastructure management layer Zeel (outlined below) in line with the needs of the infrastructure in order to stay within established service level agreements. A basic internal scenario is that new media content is being released (to users or collaborators): the content is processed to its correct distributable format(s), shared with authorised distributors and supplied with appropriate metadata to enable discovery and access. The PRISM services infrastructure provides a platform for distributed service provision between media service companies and for supplying content to distributors and other end-users.

The services within the PRISM cloud, as shown in Figure 1, are hosted on a collection of owned resource providers, in this instance owned by BBC, BT and BeSC, and utility providers, such as Amazon and Flexiscale. Each of the resource cloud providers is managed identically, as outlined above, using the provider-agnostic library, with each provider advertising available templates that may be purchased in order to host application cloud services. The balance of use between providers will depend on the availability of resources; in the past we have lost one of our owned providers due to network failure and the services were migrated automatically to a utility provider.

To deploy and manage the PRISM infrastructure we use the above API to allocate and scale the resources on which services are hosted. PRISM software has a wide range of requirements. Some components, such as media transcoders, require high performance resources with fast network connectivity to enable sharing of the processed content. Other services, such as user interfaces, require lower powered resources with connectivity to search and streaming engines; these components are most likely to scale as demand changes. Some services have large data storage requirements in order to host large-scale archives accessible to users. Moreover, the allocation of services to a provider cannot be handled in a per-service way, as some services form logical groups that should be deployed together, reflecting a need to share data. For example, a content repository usually requires supporting transcoding services to enable content to be delivered to a user in the format required. The provider API has the capability to support multi-resource allocation, enabling a group of templates to be output from a find operation to support costing based on service-group hosting.

All of the software within the PRISM infrastructure is annotated with metadata that defines its particular requirements when being deployed; this metadata is generated as an integral part of the development process and stored as part of the system definition. For commercial software, the metadata is generated by the developer when the software is integrated into the PRISM development. A system's metadata defines
  • resource requirements such as CPU, RAM, network capacity and security requirements;
  • hosting operating systems and acceptable versions;
  • the supporting software stack, listing software, acceptable versions and acceptable online repositories from which the software can be sourced; and
  • deployment costing scenarios.

The deployment costing scenarios define a costing model for particular patterns of use. For example, the requirements for a transcode component are different when it is deployed as an on-demand/on-the-fly transcoder compared to one that is used as a batch transcoder.
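As an illustration only (the actual PRISM metadata format is not reproduced in this paper, and every key and value below is invented for the sketch), the metadata attached to a transcode component might capture requirements, acceptable platforms, software sources and costing scenarios along these lines:

  import java.util.List;
  import java.util.Map;

  // Hypothetical shape of per-component deployment metadata (illustrative values).
  class TranscoderMetadataExample {
      static final Map<String, Object> METADATA = Map.of(
          // resource requirements
          "cpu.cores.min",       4,
          "ram.gb.min",          4,
          "network.gbitSec.min", 1,
          // hosting operating systems and acceptable versions
          "os.acceptable",       List.of("Ubuntu 8.04", "Ubuntu 10.04"),
          // supporting software stack and acceptable online repositories
          "software",            List.of("transcoder-engine >= 2.1"),
          "software.repos",      List.of("http://repo.example.org/media"),
          // deployment costing scenarios: batch vs on-demand usage patterns
          "scenario.batch",      "8 hours/day bulk transcoding, heavy SAN IO",
          "scenario.onDemand",   "bursty per-request transcoding, low latency");
  }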
The deployment of the PRISM infrastructure and the management of its day-to-day operation is performed by the Zeel management system developed by BeSC.

V. ZEEL - A CLOUD MANAGEMENT LAYER

In developing projects at BeSC (some five years ago) we faced a task that many cloud developers now face. We have a collection of resources that we wish to treat as utility resources, configured as we require them and in the form we require; sometimes these will be Windows flavours, other times Linux flavours, and each should be pre-configured with an application stack to suit its purpose. The other problem we faced was that much of our infrastructure was remote to BeSC and often required a plane journey in order to access the physical hardware. Our solution was to develop a resource manager and application deployment framework, called Zeel, which has been in use (in various forms) within BeSC for more than four years to manage and monitor resources, manage virtual machines, deploy/undeploy software, and configure storage, security and networks. Zeel is capable of managing machines securely across multiple administrative domains using the above provider-agnostic API (which within BeSC is usually called the Zeel infrastructure layer, or simply Zeel/i) to control physical resource clouds.

In essence, Zeel is a management framework rather than a software solution, in that it ties together available software
solutions to create a management layer; the Zeel whole is more significant than the sum of its parts. For example, the Zeel deployment solution is a framework that provides generic access to custom solutions provided by the resource, such as rpm on RedHat Linux or msi under Windows; and the Zeel virtualisation solution is a framework with plugins for VMware, Xen, OpenVZ, etc. Thus the Zeel management layer talks about virtualisation and deployment in a generic fashion rather than in particular technology terms; it is then relatively easy to integrate a new operating system or virtualisation technology without changes in the management layer.

Zeel provides core management layers for deployment (internally referred to as zeel/d), tasks such as configuration (zeel/t) and monitoring (zeel/m). These management layers exist to allow the co-ordinating framework, eZeel, to operate at a very abstract level.

A. eZeel

eZeel is designed around deploying groups of software onto sets of machines, for example a storage and transcode group that are part of the PRISM infrastructure. The machines can either be directly specified (by IP address, for instance) or searched for (for instance, a machine with SSSE3 and at least 4GB of RAM as constraints to Zeel/i).

The high-level operation of eZeel is shown in Figure 2. eZeel takes abstract deployment plans and enacts them; generally this would be a request to deploy a variety of services onto a collection of resources with specific compute or storage capabilities, based on the currently available software and infrastructure. Plans can be flexible (such as a request for a transcoding service) or highly prescriptive (such as a request for a specific transcoder to be installed and attached to a particular storage system and work queue).

Figure 2. A high-level view of Zeel's components

Flexible plans can deliver major benefits to the end user since a variety of software and platforms can be used; a requirement for a transcoder could potentially be solved using a Software as a Service (SaaS) transcode provider from a 3rd party. A plan can be undeployed when it is no longer needed, allowing the user's system to be torn down and resources discarded or restored to their original state.

At eZeel's core is a software registry and a machine registry; the software registry defines security policies and obligations for using various pieces of software, and the machine registry details the state of the resource, the software installed on it and the policies and obligations in effect as a result of either its location, its role or the software installed. This allows the user to express high-level restrictions such as "software x cannot be on the same machine as software y or data z", and this can be enforced in an auditable way. Policies are specified in XACML and enforced by an XACML engine, such as the Sun XACML engine or the HERAS-AF XACML engine.

eZeel is a framework that allows a system to be managed and monitored in its own terms: a system is comprised of one or more capabilities. Capabilities encapsulate all monitoring and management tasks for a software component, putting them in terms that are meaningful to the wider system. For example (a sketch follows this list):
  • a webserver capability would provide monitoring data relevant to serving pages (hits per second, the cost to serve the average request, etc.);
  • a storage capability would show storage terms (usage, storage cost per gigabyte per hour, current latency);
  • a policy engine capability would show policy evaluation terms (requests evaluated per second, number of denied requests).
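The sketch below illustrates the idea in code; the Capability interface, WebServerCapability class and HttpStats type are our own illustrative names, not eZeel's actual types. Each capability reports monitoring data in the terms of the service it wraps rather than in hardware terms.

  import java.util.Map;

  // Hypothetical capability abstraction: management and monitoring expressed in
  // the terms of the wrapped software component, not of the underlying host.
  interface Capability {
      String name();
      Map<String, Double> monitor();   // metric name -> current value
  }

  class WebServerCapability implements Capability {
      private final HttpStats stats;   // contact point for the service it represents
      WebServerCapability(HttpStats stats) { this.stats = stats; }

      public String name() { return "webserver"; }

      public Map<String, Double> monitor() {
          // Metrics meaningful to the wider system: hits per second and an
          // estimate of the cost of serving the average request.
          double hitsPerSecond = stats.requestsInLastMinute() / 60.0;
          double costPerRequest =
              stats.hostCostPerHour() / Math.max(1.0, 3600.0 * hitsPerSecond);
          return Map.of("hitsPerSecond", hitsPerSecond,
                        "costPerRequest", costPerRequest);
      }

      // Minimal stand-in for the monitored service's own statistics interface.
      interface HttpStats {
          double requestsInLastMinute();
          double hostCostPerHour();
      }
  }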
Capability objects have access to the Zeel APIs for deployment, monitoring and task execution; they are also free to make contact with the service they represent to perform additional functions. Each capability has one or more Hosts upon which its software is running; the Host object encapsulates all the operating system and provider-specific tasks. This encapsulation allows the Capability to be completely agnostic to hardware terms: in fact, it makes it possible for multiple Capabilities to be co-located on a single host when the cost of running them on separate hosts cannot be justified (and where the Capabilities permit such co-location). A Capability could also be a composite capability, comprised of other capabilities and generally adding value to the grouping, for instance by exposing average or peak throughput data for a group of webservers.

We have discussed systems, capabilities and hosts; these exist when our system is running. However, in order to construct our system we have other types which are closely related: a CapabilityTemplate will set up the software so the associated Capability can manage it; a HostTemplate will set up the hardware so the associated Host type can manage it. A System is constructed from a Recipe; a recipe can be configured to take custom parameters (such as budgets, static external service endpoints, etc.) using an ingredients list. The ingredients list can include information such as the following (a sketch of a recipe built from such a list is given after it):
  1) The recipe name
  2) The instance name
  3) The load balancer to use within the system
  4) The providers we are willing to use for this system
  5) Access control requirements
  6) Storage budgets
  7) System budgets
  8) A minimum / maximum number of resources to instantiate
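As referenced above, the following sketch shows how a recipe might be instantiated from such an ingredients list; the Recipe, Ingredients and SystemHandle types, and all of the parameter values, are hypothetical illustrations rather than eZeel's real API.

  import java.util.List;

  // Hypothetical construction of a System from a Recipe plus an ingredients list.
  class RecipeExample {
      interface SystemHandle { }
      interface Recipe { SystemHandle build(Ingredients ingredients); }

      static class Ingredients {
          String recipeName   = "prism-transcode-group";
          String instanceName = "transcode-group-1";
          String loadBalancer = "lb.internal.example";      // placeholder endpoint
          List<String> providers = List.of("amazon-ec2", "flexiscale", "besc-owned");
          List<String> accessControl = List.of("group:media-services");
          double storageBudgetPerMonth = 500.0;             // currency units, illustrative
          double systemBudgetPerMonth  = 2000.0;
          int minResources = 2;
          int maxResources = 20;
      }

      static SystemHandle deploy(Recipe transcodeRecipe) {
          // The recipe turns the declarative ingredients into concrete hosts,
          // software deployments and capabilities, within the stated budgets.
          return transcodeRecipe.build(new Ingredients());
      }
  }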
VI. TRANSCODING IN THE CLOUD

Video transcoding is the process of converting video files between container formats, video and audio codecs. It is an interesting task for a cloud because it has high compute, storage and bandwidth requirements and scales very well (individual video files are usually quite short so many transcodes can be run in parallel). We developed a scalable budget-aware transcoding solution to work around the great expense and inflexibility of existing transcoding hardware, which is generally deployed on very expensive fixed machines with little sharing. We wanted to be able to quickly transcode the daily output of our broadcasting partner's many TV and radio channels without having expensive transcoding hardware idle for most of the day, whilst also keeping down the latency between when a piece of content is broadcast and when it is transcoded and available online for users.

Figure 3. Transcoder high-level model

The general transcoding model, shown in Figure 3, allows video to be ingested and converted into different formats while minimising the number of resources used. To accomplish this, a Transcode Coordinator adopts groups of transcode jobs submitted through the Transcode Queue API. A group specifies its deadline and a budget for the conversion. Individual jobs specify the source and destination formats, along with a link to the source video. The Transcode Coordinator ensures that transcodes are completed before they are needed, potentially increasing parallelism by provisioning more worker nodes if the current level of throughput is not high enough.

The Transcode Coordinator constantly re-evaluates the value for money delivered by worker nodes. If a particular node is not providing sufficient value for money then it is discarded just before its next billing cycle. The system tries to ensure it does not deliver content significantly in advance of its deadline: by delivering in a JIT (Just In Time) fashion it can get the best value for money.

The Transcode Coordinator also records the performance of various machine types; this is useful in a number of system components:
  1) to determine whether a machine's performance is in line with expectations;
  2) to form the usage scenarios to evaluate machines against;
  3) to allow new machines from various providers to be used instantly without the need for benchmarks to be run.

Using historical data to drive decisions is very flexible since it allows new providers to be added quickly and automatically avoids slow resources, and it will react instantly to pricing changes (if using, for instance, Amazon EC2 Spot Instances).

Jobs are sent to the Transcode Coordinator in batches with a common deadline. Each batch is provisionally planned by the Transcode Coordinator, which allocates the longer-running jobs to appropriate resource types, with the intention of completing within an hour if possible. Shorter jobs are left to be allocated opportunistically by the Transcode Coordinator, running either during unused time at the end of longer jobs, or on resources that turn out to be substandard. Each planned resource is started at a time chosen by the Transcode Coordinator, which aims to finish the work well ahead of the deadline, but spread it evenly across that time, allowing high utilisation of the best-performing machines. Multi-core machines can be used to run several jobs simultaneously, one per core, as this is more efficient than transcoding a job with multiple threads; this is done only if necessary to complete a long job within the deadline.

The Transcode Coordinator continually monitors the performance of the running nodes. Historical data is used to determine what is acceptable performance for each resource type, and resources which fall short of this have their allocated jobs reassigned to another resource. Such resources are never kept for more than an hour, and are used for running shorter jobs until their hour is up. Once any node has finished its planned work, it uses the performance data to decide whether to keep the node and allocate another planned node's jobs to it, or to discard it on its next billing cycle and use it to run short jobs until then.
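The per-node policy described in this section can be summarised in a small decision routine. The sketch below is our illustration of that policy only; WorkerNode and its methods are hypothetical names, not the Transcode Coordinator's actual implementation.

  // Hypothetical per-node decision applied shortly before each node's next
  // billing cycle: keep nodes that are good value for money, discard the rest.
  class NodeReviewExample {

      interface WorkerNode {
          double observedSecondsPerTranscode();   // measured on this node
          double expectedSecondsPerTranscode();   // from historical data for this resource type
          boolean hasPlannedWorkRemaining();
          void assignShortJobsUntilBillingCycle();
          void discardAtNextBillingCycle();
      }

      static void review(WorkerNode node, double acceptableSlowdown,
                         boolean otherPlannedWorkAvailable) {
          boolean underperforming =
              node.observedSecondsPerTranscode()
                  > acceptableSlowdown * node.expectedSecondsPerTranscode();

          if (underperforming) {
              // Planned jobs are reassigned elsewhere; the node runs short jobs
              // until its already-paid-for hour is up and is then discarded.
              node.assignShortJobsUntilBillingCycle();
              node.discardAtNextBillingCycle();
          } else if (!node.hasPlannedWorkRemaining() && !otherPlannedWorkAvailable) {
              // Finished early with nothing left to adopt: run short jobs, then
              // discard just before the next billing cycle to use the full hour.
              node.assignShortJobsUntilBillingCycle();
              node.discardAtNextBillingCycle();
          }
          // Otherwise the node is delivering acceptable value for money and is kept.
      }
  }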
VII. FUTURE WORK

PRISM as an infrastructure is still in field deployment and testing with an active test user group that ranges from children as young as 8 to teenagers and adults. This broad age group enables a range of different types of services and usage patterns to be investigated. For example, young children require simpler user interfaces and integration of content with associated online games; teenagers generally watch streamed content which is of a reduced quality and has peaks at well defined points in the day; while adults often prefer high quality content and are willing to wait while that content downloads to their home. The ideas within PRISM are currently being developed by commercial partners for deployment as a commercial product that manages film content for in-home viewing.

Zeel is open source and is in active development within BeSC and in deployment with a number of our industrial partners. Its primary purpose within BeSC is to provide a stable platform for the development of novel dynamic service-centric infrastructures, and it is being developed to support the projects we are involved in; these currently include financial services with high computational and sporadic behaviour, military media with strict security and high volume media traffic, and utility hosting providers creating utility resource offerings. Zeel is largely an integration framework and much of its basic development is integrating developments from utility providers, such as Amazon, and generic frameworks such as OpenNebula [7]. We are developing our ideas in the area of system recipes to define system deployment and configuration, and developing our marketplace ideas that guide resource selection and mobility between resource providers through a standard API. A current strand of development is attempting to optimise the layout of software on resources.

VIII. CONCLUSION

In this paper we have presented a simple and resource-provider-agnostic approach to resource management within cloud applications. This API has been in use for several years and is the core technology for a number of large-scale applications in digital media, financial services, military and utility hosting that are field deployed with established user groups. The API is developed around the idea that it is an integration framework rather than a solution technology; the approach defines a management framework that brings together the technologies that are required rather than reinventing new variants of existing and accepted technologies. This integration approach has permitted us to include new virtualisation and provider solutions with few changes to our core applications, which are our primary focus; the development of Zeel began as an effort to simplify our applications development and we did not view it as an end in itself. Our applications view the resource cloud in generic terms and interact with it by expressing solution-centric requirements and selecting resources and resource locations using solution-centric scenarios that reflect the application's view of resource cost. The use of (on the face of it) a simplistic approach to expressing requirements as constraints has proven a powerful and automated way of filtering the resources that are available from providers to find the resources that are suitable for an application. A service or application can have a long list of constraints in our model, but these are defined once and the process is an integral part of our development approach; the creation of a software/service/service-group metadata repository and its management and update has been fundamental to our work.

REFERENCES

[1] Nurmi et al., in CCGRID '09: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (IEEE Computer Society, Washington, DC, USA, 2009), pp. 124-131.
[2] L. Wang, W. Jie, J. Chen, Grid Computing: Infrastructure, Service, and Applications (CRC Press, 2009).
[3] TeraGrid, http://www.teragrid.org, 2008.
[4] T. Harmer, P. Wright, C. Cunningham, R. Perrott, in Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Springer-Verlag, Berlin, Heidelberg, 2009), pp. 454-465.
[5] S. Garfinkel, An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS. Tech. rep., Harvard University (2007).
[6] Rackspace Cloud, http://www.rackspacecloud.com, 2009.
[7] R. S. Montero, Open Source Grid and Cluster Conference, Oakland, CA, 2008.
[8] OCCI Working Group, http://occi-wg.org, 2009.
[9] Microsoft Azure, http://www.microsoft.com/windowsazure, 2010.
[10] NewServers, Inc., http://www.newservers.com, 2008.
[11] libcloud.org, http://libcloud.org, 2009.
[12] Dasein Multi-cloud API, http://dasein-cloud.sourceforge.net, 2009.
[13] Xcalibre, Inc., Flexiscale, http://www.flexiscale.com, 2008.
[14] R. Perrott, T. Harmer, R. Lewis, Computer 41(11), 67 (2008).
[15] T. Harmer, Cluster Computing 10(3), 277 (2007).
[16] T. Harmer, J. McCabe, P. Donachy, R. Perrott, C. Chambers, S. Craig, R. Lewis, B. Mallon, L. Sluman, in SCC '05: Proceedings of the 2005 IEEE International Conference on Services Computing (IEEE Computer Society, Washington, DC, USA, 2005), pp. 35-42.
