Design the Support for Granting Required SLA in Public Cloud Environments Based on Cloud Foundry


ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA

SCUOLA DI INGEGNERIA E ARCHITETTURA


DIPARTIMENTO DI INFORMATICA - SCIENZA E INGEGNERIA
CORSO DI LAUREA MAGISTRALE IN INGEGNERIA INFORMATICA

TESI DI LAUREA in RETI DI CALCOLATORI M

DESIGN THE SUPPORT FOR GRANTING REQUIRED SLA IN PUBLIC CLOUD ENVIRONMENTS BASED ON CLOUD FOUNDRY

CANDIDATO: Guido Davide Dall'Olio

RELATORE: Chiar.mo Prof. Ing. Antonio Corradi
CORRELATORI: Ing. Diana J. Arroyo, Dr. Ing. Luca Foschini, Ing. Darrell Reimer, Dr. Ing. Malgorzata Steinder

Anno Accademico 2012/13 Sessione III

Design the support for granting required SLA in public Cloud Environments based on Cloud Foundry
Guido Davide Dall'Olio

Keywords: PaaS, Cloud Foundry, Isolation

Contents

Introduction
1 Introduction to Cloud Computing
  1.1 Cloud Computing
  1.2 Different Clouds
    1.2.1 Public Cloud
    1.2.2 Private Cloud
    1.2.3 Private vs Public Cloud
    1.2.4 Hybrid Cloud
  1.3 Service Level Agreement in the Cloud
2 Cloud layers and its uses
  2.1 Cloud Layers
    2.1.1 IaaS
    2.1.2 PaaS
    2.1.3 SaaS
  2.2 Main Platforms
    2.2.1 Google Cloud Platform
    2.2.2 Amazon Web Services
3 Cloud Foundry
  3.1 A Good Choice
  3.2 The Architecture
    3.2.1 NATS
    3.2.2 Cloud Controller
    3.2.3 Droplet Execution Agent
    3.2.4 Warden
    3.2.5 Router
    3.2.6 Health Manager
    3.2.7 User Account and Authentication Server
    3.2.8 Services
  3.3 Roles and Organizations
  3.4 Command Line Client
  3.5 Applications Guidelines
  3.6 Interaction and Usage
    3.6.1 Staging
    3.6.2 Start of an Application
4 BOSH
  4.1 BOSH
  4.2 The Architecture
    4.2.1 Stemcell
    4.2.2 Jobs and Packages
    4.2.3 BOSH Agent
    4.2.4 Blobstore
    4.2.5 Director
  4.3 BOSH Manifest
5 Cloud Foundry Deployment
  5.1 The deployment
  5.2 Local deployment
    5.2.1 CF Nise Installer
    5.2.2 Local Development Environment
  5.3 Distributed Deployment
    5.3.1 Micro BOSH
    5.3.2 The Steps Involved
    5.3.3 Distributed Development Environment
    5.3.4 OpenStack
    5.3.5 Deploying Micro BOSH
    5.3.6 Deploying a distributed Cloud Foundry
6 Application Isolation in Cloud Foundry
  6.1 Isolation
  6.2 Virtualization
  6.3 Process Groups - Control Groups
    6.3.1 Hierarchy and Subsystems
    6.3.2 An example of usage
  6.4 Containers: a lightweight approach
    6.4.1 LXC
    6.4.2 Warden
    6.4.3 Docker
    6.4.4 Docker vs Warden
  6.5 Risks of a Container Based isolation
7 Improving provided isolation
  7.1 The current simple isolation
  7.2 Where to hook up a virtualization isolation: the Stack
    7.2.1 Current Stack usage
    7.2.2 Cloud Foundry Stack in details
    7.2.3 Our proposal to employ the Stack
    7.2.4 Integrate the change with BOSH
  7.3 Enabling a Dynamic Provisioning
    7.3.1 Heat
    7.3.2 Scaling DEA nodes with Heat
8 Isolation and Co-location Performances
  8.1 Application Tested
    8.1.1 CPU-intensive application
    8.1.2 Network-intensive application
    8.1.3 Disk I/O-intensive application
    8.1.4 Distributed Application
    8.1.5 Media Stream over network
    8.1.6 Multi-tier Application
  8.2 Technical Conclusions
Conclusions and Future work

List of Figures
1.1 Cloud Computing
1.2 Virtual Machine Monitor and Virtualization
1.3 Deployment Models
2.1 The cloud computing stack
2.2 Google Cloud Platform
3.1 A Triangle of choice
3.2 Cloud Foundry Architecture
3.3 Organization and Roles
3.4 Start of an application
4.1 BOSH architecture
4.2 BOSH Director and Agent basic interaction
4.3 BOSH APIs
5.1 MicroBosh
5.2 Final BOSH deployment
6.1 Xen and KVM virtualization
6.2 A single hierarchy can have one or more subsystems attached
6.3 Attaching multiple subsystems
6.4 Lightweight virtualization layers
6.5 Container vs Virtualization
6.6 DEA and Warden interaction
6.7 Docker features and Virtualization
6.8 Container and Virtual Machine comparison
7.1 DEAs and Applications
7.2 DEA pools
7.3 DEA advertisements
7.4 Cloud Controller selection process
7.5 New DEA advertisements
7.6 DEA and Cloud Controller configuration files
7.7 New pools and processing
7.8 A new Controller processing
7.9 Heat architecture
7.10 Cloud Foundry scaler, using Heat
7.11 A different Stack is running
8.1 Virtual Machines and test configuration
8.2 Two VCPU deployment for test
8.3 Four VCPU deployment for test
8.4 Whetstone Benchmark host score
8.5 Whetstone Benchmark two VCPU average container score
8.6 Whetstone Benchmark two VCPU average execution time
8.7 Whetstone Benchmark four VCPU average container score
8.8 Whetstone Benchmark four VCPU average execution time
8.9 Two VCPU deployment iperf test
8.10 TCP average Bandwidth
8.11 UDP two VCPU average Bandwidth
8.12 UDP four VCPU average Bandwidth
8.13 Disk intensive I/O Write Disk speed
8.14 Disk intensive I/O total average execution time
8.15 Two VCPU deployment distributed computation test
8.16 Average communication Bandwidth sending the chunk
8.17 Average execution time
8.18 Two VCPU deployment media stream test
8.19 Average connection speed during media transfer two VCPU
8.20 Average connection speed during media transfer four VCPU
8.21 Two VCPU deployment Multi-tier test
8.22 Average execution time two VCPU
8.23 Average transfer speed two VCPU
8.24 Average execution time four VCPU
8.25 Average transfer speed four VCPU

Introduction
Cloud Computing, that is, providing computer resources as a service, is a technology revolution offering flexible IT usage in a cost-efficient and pay-per-use way. The Cloud approach can be applied to the application development process through special platforms and environments that provide access to remote resources. One of these models, Platform as a Service (PaaS), offers opportunities for software companies to create applications more easily, concentrating on business processes instead of coding and maintenance, to reduce costs associated with hardware and software, to anticipate possible problems in scalability, and to carry out the whole development lifecycle within the same environment. In the last 12 months the adoption of PaaS has increased dramatically and it is now one of the fastest growing areas of all the cloud computing services. Gartner estimates a steep rise in PaaS adoption and forecasts an increase in spending to more than $2.9 billion by 2016, and that every organization will run some or all of its business software on public or private PaaS. Early PaaS offerings, however, restrict developers to specific or non-standard development frameworks, a limited set of application services or a single, vendor-operated Cloud service. These incompatible platforms inhibit application portability, locking developers into a particular offering and restricting movement of applications across Cloud providers or even into an enterprise's own datacenter. Cloud Foundry is a modern application platform built specifically to simplify the end-to-end development, deployment and operation of Cloud era applications; it is an open source Cloud Computing project that offers a platform supporting many languages and many
services. Thanks to its openness it can be adapted and partially changed, but also integrated to fulfill many tasks in different environments. Cloud Foundry represents a new generation of application platform, architected specifically for Cloud Computing environments and delivered as a service from enterprise datacenters and public Cloud service providers. The project is not tied to any single Cloud environment; rather, Cloud Foundry supports deployment to any public and private Cloud environment. However, each Cloud platform has to face some challenges, such as application portability, security, scalability and SLA delivery. The open source platform project sacrifices a solid run-time application isolation for an easier architecture and deployment process; however, in certain scenarios, some applications should not run or be co-located with others of a different classification or requiring different SLAs. This work represents a proposed solution and research to meet customers' required SLAs on Cloud Foundry, granting application separation and isolation: during the thesis work we defined a change in the PaaS architecture that adds new features while always preserving backward compatibility with the applications already developed. Moreover, the newly added placement option grants stricter and more effective boundaries between the applications hosted on the PaaS, allowing more advanced and efficient SLAs. In Chapter 1 the idea and concept of Cloud Computing is presented, while in Chapter 2 we explore the different Cloud layers and the different solutions available to enterprises. Chapter 3 deeply analyzes Cloud Foundry's architecture and characteristics, highlighting its qualities and use cases, while in Chapter 4 BOSH, a specific deployer, is introduced. Chapter 5 shows how to install and deploy the open project, then Chapter 6 examines the application placement weaknesses and drawbacks present in Cloud Foundry. Chapter 7 explains the proposed change and its painless integration, compatible with the current release, while in Chapter 8 many use cases, applications and tests were carried out to quantify and understand the real benefits of the different isolated placements.

Chapter 1 Introduction to Cloud Computing


Cloud computing has recently emerged as one of the common words in the Information and Communications Technology (ICT) industry. Several Information Technology (IT) vendors are promising to offer computation, storage, and application hosting services and to provide coverage in several continents, offering specific Service Level Agreements (SLA) and ensuring performance and uptime promises for their services. These clouds are distinguished by exposing resources such as computation, data storage and applications as standards-based Web services and by following a pricing model where customers are charged based on their utilization of computational resources, storage, and transfer of data. Nowadays we are experiencing a continuous transition to the Cloud, mostly because it aims to cut costs and help users focus on their core business instead of being impeded by IT obstacles.

1.1 Cloud Computing

We can track the roots of cloud computing by observing the advancement of several technologies, especially in hardware (virtualization, multi-core chips),
Internet technologies (Web services, service-oriented architectures, Web 2.0), distributed computing (clusters, grids), and systems management (autonomic computing, data center automation). Figure 1.1 shows the convergence of technology fields that significantly advanced and contributed to the advent of cloud computing.

Figure 1.1: Cloud Computing

While these emerging services have increased interoperability and usability and reduced the cost of computation, application hosting, and content storage and delivery, there is significant complexity involved in ensuring that applications and services can scale as needed to achieve consistent and reliable operation under peak loads. Cloud vendors, researchers, and practitioners alike are working to ensure that potential users are educated about the benefits of cloud computing and the best way to harness the full potential of the cloud. Cloud Computing is closely connected to the concept of utility computing, described by a business model for on-demand delivery of computing power: a scenario where
consumers pay providers based on usage, similar to the way in which we currently obtain services from traditional public utilities such as water, electricity and gas. However, while the realization of real utility computing appears closer than ever, its acceptance is currently restricted to cloud experts due to the perceived complexities of interacting with cloud computing providers. From a final user perspective, Cloud Computing can be seen as a set of useful functions and resources that hide how their internals work and do not worry the customer. Before this concept took hold, the access, management and elaboration of data was achieved through bare-metal configurations of gradually higher performance; now computing itself, which may be considered fully virtualized, allows computers to be built from distributed components such as processing, storage, data, and software resources. One of the main aims of Cloud Computing is to allow access to large amounts of computing power in a fully virtualized manner, by aggregating resources and offering a single system view. In addition, an important aim of this technology has been delivering computing as a utility [2]. Cloud computing has been coined as an umbrella term to describe a category of sophisticated on-demand computing services; it denotes a model in which a computing infrastructure is viewed as a cloud, from which businesses and individuals access applications from anywhere in the world on demand [3]. The main principle behind this model is offering computing, storage, and software as a service. Three main aspects can generally be considered as new Cloud features [4]:

- Perception of infinite computing resources on demand;
- Removal of an excessive investment in resources;
- Ability to pay for the use of short-term resources only when necessary.

They seem simple topics, but with a complex feasibility. We are currently experiencing a switch in the IT world, from in-house generated computing power to utility-supplied computing resources delivered
over the Internet as Web services. Cloud computing services are usually backed by large-scale data centers composed of thousands of computers. Such data centers are built to serve many users and host many disparate applications. For this purpose, hardware virtualization can be considered a perfect fit to overcome most operational issues of data center building and maintenance, as it allows running multiple operating systems and software stacks on a single physical platform. A software layer, the Virtual Machine Monitor (VMM), also called a hypervisor and shown in Figure 1.2, mediates access to the physical hardware, presenting to each guest operating system a Virtual Machine (VM), which is a set of virtual platform interfaces [5].

Figure 1.2: Virtual Machine Monitor and Virtualization

Traditionally, the perceived benefits were improvements in sharing and utilization, better manageability, and higher reliability. More recently, with the adoption of virtualization on a broad range of server and client systems, researchers and practitioners have been emphasizing three basic capabilities regarding management of workload in a virtualized system, namely: isolation, consolidation and migration [6]. Workload isolation is achieved since all program instructions are fully confined inside a VM, which leads to improvements in security. Better reliability is also achieved because software failures
inside one VM do not affect others [5]. Moreover, better performance control is attained since the execution of one VM should not affect the performance of another VM. The consolidation of several individual and heterogeneous workloads onto a single physical platform leads to better system utilization. Workload migration, also referred to as application mobility, targets facilitating hardware maintenance, load balancing, and disaster recovery. It is done by encapsulating a guest operating system (OS) state within a VM and allowing it to be suspended, fully serialized, migrated to a different platform, and resumed immediately or preserved to be restored at a later date. Certain features of a cloud are essential to enable services that truly represent the cloud computing model and satisfy the expectations of consumers, and cloud offerings grant:

- Self-service: consumers of cloud computing services expect on-demand, nearly instant access to resources. To support this expectation, clouds must allow self-service access so that customers can request, customize, pay, and use services without intervention of human operators [7];
- Per-usage metering and billing: cloud computing eliminates up-front commitment by users, allowing them to request and use only the necessary amount. For this reason, clouds must implement features to allow efficient trading of services such as pricing, accounting, and billing (a small billing example follows this list);
- Elasticity: cloud computing gives the illusion of infinite computing resources available on demand. Therefore users expect clouds to rapidly provide resources in any quantity at any time. In particular, it is expected that additional resources can be provisioned, possibly automatically, when an application load increases and released when the load decreases;
- Customization: resources rented from the cloud must be highly customizable. In the case of infrastructure services, customization means allowing users to deploy specialized virtual appliances and to be given, for example, privileged (root) access to the virtual servers.
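The per-usage metering and billing point above can be made concrete with a toy calculation. The following Python sketch is purely illustrative: the instance types, hourly rates and usage records are invented for the example and do not come from any real provider's price list.

    # Toy pay-per-use billing: charge only for the hours actually consumed.
    # Instance types and hourly rates below are invented for illustration.
    HOURLY_RATES = {"small": 0.05, "medium": 0.10, "large": 0.40}  # $/hour

    usage = [
        {"instance_type": "small", "hours": 720},  # one small VM for a whole month
        {"instance_type": "large", "hours": 36},   # a large VM used only during a peak
    ]

    def monthly_bill(records):
        """Sum, over all records, the hours used times the hourly rate."""
        return sum(r["hours"] * HOURLY_RATES[r["instance_type"]] for r in records)

    print(f"Total due: ${monthly_bill(usage):.2f}")  # Total due: $50.40

The same principle extends to storage and data transfer: the customer pays only for what was metered, with no up-front commitment.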


1.2 Different Clouds

Although cloud computing has emerged mainly from the appearance of public computing utilities, other deployment models, with variations in physical location and distribution, have been adopted. In this sense, regardless of its service class, a cloud can be classified as public, private or hybrid [7], based on the model of deployment as shown in Figure 1.3.

Figure 1.3: Deployment Models

In most cases, establishing a private cloud means restructuring an existing infrastructure by adding virtualization and cloud-like interfaces. This allows users to interact with the local data center while experiencing the same advantages of public clouds, most notably a self-service interface, privileged access to virtual servers, and per-usage metering and billing. A public cloud can be shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy and compliance considerations) [7]. A hybrid cloud, in turn, takes shape when a private cloud is supplemented with computing capacity from public clouds [9].

1.2.1 Public Cloud

It is the most common model. Public cloud or external cloud describes cloud computing in the traditional mainstream sense, whereby resources are dynamically provisioned via publicly accessible Web applications/Web services (SOAP or RESTful interfaces) from an off-site third-party provider. Commonly, when services and applications rely on public visibility and
reachability from the Internet, a public cloud is the first choice. To run a public cloud, a service provider will first need to define the services that will be offered to enterprises that want to place their workloads in the cloud: an offer for many customers. This is the reason why Cloud providers, such as Amazon EC2, can host a large number of applications [10] in a multitude of independently managed virtual machines and offer a collection of remote computing services that together make up a platform over the Internet. Those who choose this approach should not worry about computational resource supply or availability issues, as the provider will totally take care of them. The main benefits of using a public cloud service can be summarized as:

- Easy and inexpensive set-up, because hardware, application and bandwidth costs are covered by the provider;
- Scalability to meet needs;
- Very few or totally absent wasted resources.

Nothing comes free: typically public clouds are subject to billing services based on a pay-per-use basis and a time-based resource usage calculation. The provider shares resources and bills customers on a fine-grained utility computing basis; the user pays only for the capacity of the provisioned resources at a particular time. Usually, from an organization's point of view, a public cloud is chosen when no sensitive data is involved, enterprise public services are needed, or when company data centers cannot fulfill load requests.

1.2.2 Private Cloud

Private cloud (also called internal cloud or corporate cloud) is a marketing term for a proprietary computing architecture that provides hosted services to a limited number of people behind a firewall. A private cloud is a cloud infrastructure operated solely for an organization. It can be managed by the organization or a third party and can exist on-premises or off-premises. It aims at providing public cloud functionality, but on private resources,
while maintaining control over an organization's data and resources to meet security and governance requirements. It usually consists of a compute platform with these goals:

- Simplicity: allow service provisioning, setup and compute capability for an organization's users in a self-service manner;
- Potency: automate and provide well-managed virtualized environments;
- Management: optimize computing resources and server utilization;
- Adaptability: support specific workloads.

Differently from public clouds, instead of a pay-as-you-go model, there could be other schemes in place, which take into account the usage of the cloud and proportionally bill the different departments or sections of the enterprise. Private clouds have the advantage of keeping the core business operations in-house, relying on the existing IT infrastructure and reducing the burden of maintaining it once the cloud has been set up. In spite of these advantages, private clouds cannot easily scale out in the case of peak demand, and the integration with public clouds could be a solution to the increased load. However, some drawbacks may occur, since the configuration and the installation of the private infrastructure is a mandatory phase: a totally private configuration comes at the cost of an initial investment.

1.2.3 Private vs Public Cloud

After an initial enthusiasm for this new trend, it soon became evident that a solution built on outsourcing the entire IT infrastructure to third parties would not be applicable in many cases, especially when there are critical operations to be performed and security concerns to consider. Moreover, with the public cloud distributed anywhere on the planet, legal issues arise and they simply make it difficult to rely on a virtual public infrastructure for any IT operation. As an example, data location and confidentiality are two of the
major issues that scare stakeholders away from moving into the cloud: data that might be secure in one country may not be secure in another [2]. In many cases, though, users of cloud services don't know where their information is held and different laws can apply. It could be stored in a data center in either Europe, where the European Union favors very strict protection of privacy, or America, where laws such as the U.S. Patriot Act invest government and other agencies [16] with virtually limitless powers to access information, including that belonging to companies. In addition, enterprises already have their own IT infrastructures. In spite of this, the distinctive feature of cloud computing still remains appealing, and the possibility of replicating in-house (on their own IT infrastructure) the resource and service provisioning model proposed by cloud computing led to the development of the private cloud concept. In this scenario, security concerns are less critical, since sensitive information does not flow out of the private infrastructure. Moreover, existing IT resources can be better utilized since the private cloud becomes accessible to all the divisions of the enterprise. Another interesting opportunity that comes with private clouds is the possibility of testing applications and systems at a comparatively lower price than on public clouds before deploying them on the public virtual infrastructure. For enterprises there are some key advantages in the use of a private cloud:

- Customer information protection: despite the public cloud offerings about the specific level of security, in-house security is easier to maintain and to rely on;
- Infrastructure ensuring Service Level Agreements (SLAs): quality of service implies that specific operations such as appropriate clustering and failover, data replication, system monitoring and maintenance, disaster recovery, and other uptime services can be commensurate to the application needs. While public cloud vendors provide some of these features, not all of them are available as needed;
- Compliance with standard procedures and operations: if organizations are subject to third-party compliance standards,
specific procedures have to be put in place when deploying and executing applications. This might not be possible in the case of a virtual public infrastructure.

However, private clouds may not easily scale. Hence, hybrid clouds, which are the result of a private cloud growing and provisioning resources from a public cloud, are likely to be the best option in many cases.

1.2.4 Hybrid Cloud

A hybrid cloud is a cloud infrastructure composed of two or more clouds, either private or public, that remain separate entities but are bound together by standardized technology that enables data and application portability. Hybrid clouds allow exploiting existing IT infrastructures, maintaining sensitive information within the premises, and naturally growing and shrinking by provisioning external resources and releasing them when needed. Security concerns are then limited only to the public portion of the cloud, which can be used to perform operations with less stringent constraints but that are still part of the system workload. Hybrid clouds change their composition and topology over time. They form as a result of dynamic conditions such as peak demands or specific SLAs attached to the applications currently in execution. An open and extensible architecture that allows easily plugging in new components and rapidly integrating new features is of great value in this case.

1.3 Service Level Agreement in the Cloud

A Service Level Agreement (SLA) is a contract between a network service provider and a customer that specifies, usually in measurable terms, what services the network service provider will furnish. SLAs are offered by providers to express their commitment to the delivery of a certain QoS; to customers they serve as a warranty. An SLA usually includes availability and performance
guarantees. Additionally, metrics must be agreed upon by all parties, as well as penalties for violating these expectations. Service Level Agreements can prove to be a useful instrument in facilitating enterprises' trust in cloud-based services. Cloud providers are typically not directly exposed to the service semantics or the SLAs that service owners may contract with their end users. The capacity requirements are, thus, less predictable and more elastic. The use of reservations may be insufficient, and capacity planning and optimizations are required instead. The cloud provider's task is, therefore, to make sure that resource allocation requests are satisfied with specific probability and timeliness. These requirements are formalized in infrastructure SLAs between the service owner and the cloud provider, separate from the high-level SLAs between the service owner and its end users. In many cases, either the service owner is not resourceful enough to perform an exact service sizing or service workloads are hard to anticipate in advance. Therefore, to protect high-level SLAs, the cloud provider should cater for elasticity on demand. There are two types of SLAs from the perspective of hosting, at two different levels:

- Infrastructure: the infrastructure provider manages and offers guarantees on the availability of the infrastructure, namely server machines, power, network connectivity, and so on. Enterprises manage, by themselves, their applications that are deployed on these server machines. The machines are leased to the customers and are isolated from machines of other customers. In such dedicated hosting environments, a practical example of service level is represented by a Quality of Service (QoS) condition related to the availability of the system CPU, data storage and network for efficient execution of the application at peak loads.
- Application: in the application co-location hosting model, the server capacity is available to the applications based solely on their resource demands. Hence, the service providers are flexible in allocating and de-allocating computing resources among the co-located applications. Therefore, the service providers are also responsible for ensuring that their customers' application Service Level Objectives are met.

It is also possible for a customer and the service provider to mutually agree upon a set of SLAs with different performance and cost structures rather than a single SLA. The customer has the flexibility to choose any of the agreed SLAs from the available offerings and, at runtime, can switch between the different SLAs. Currently, cloud solutions come with primitive or reduced SLAs [2]. This is surely bound to change: as the cloud market gets crowded with an increasing number of cloud offerings, providers have to gain some competitive differentiation to capture a larger share of the market. This is particularly true for the market segments represented by enterprises and large organizations, where those entities will be particularly interested in choosing the offering with sophisticated SLAs providing more assurances. Many businesses are ready to move, and many have already migrated to the cloud to this day.
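To make the idea of measurable terms and penalties concrete, the hypothetical sketch below checks a measured monthly availability against an agreed target and computes the service credit owed when the target is missed; the target and the credit tiers are invented for the example and are not taken from any real provider's SLA.

    # Hypothetical SLA check: compare measured availability with the agreed
    # target and compute the service credit owed when the target is violated.
    AGREED_AVAILABILITY = 99.95                                  # percent, per the SLA
    CREDIT_TIERS = [(99.0, 25.0), (99.9, 10.0), (99.95, 5.0)]    # (floor %, credit %)

    def service_credit(measured):
        """Return the percentage of the monthly fee refunded to the customer."""
        for floor, credit in CREDIT_TIERS:
            if measured < floor:
                return credit
        return 0.0  # target met, no penalty

    downtime_minutes = 65.0
    minutes_in_month = 30 * 24 * 60
    measured = 100.0 * (1 - downtime_minutes / minutes_in_month)
    print(f"measured availability: {measured:.3f}%")               # about 99.850%
    print(f"credit owed: {service_credit(measured)}% of the fee")  # 10.0%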

Through the idea of the Cloud we understand why many enterprise applications are moving in this direction and which kinds of SLAs are implied. Now, we are going to present the different layers of the Cloud and how the different models interact with each other.

Chapter 2 Cloud layers and its uses


Cloud computing services are divided into three classes, according to the abstraction level of the capability provided and the service model of providers: a Cloud provider typically offers subscription-based access to infrastructure (Infrastructure as a Service), platforms (Platform as a Service) and applications (Software as a Service), popularly referred to as IaaS, PaaS, and SaaS. Figure 2.1 depicts the layered organization of the cloud stack from physical infrastructure to applications.

2.1 Cloud Layers

The abstraction levels can also be viewed as a layered architecture where services of a higher layer can be composed from services of the underlying layer [14]. Cloud development environments are built on top of infrastructure services to offer application development and deployment capabilities; at this level, various programming models, libraries, APIs, and mashup editors enable the creation of a range of business, Web, and scientific applications. Once deployed in the cloud, these applications can be consumed by end users. We start describing from the lowest layer, going up to the most abstract.


Figure 2.1: The cloud computing stack

2.1.1 IaaS

Offering virtualized resources (computation, storage, and communication) on demand is known as Infrastructure as a Service (IaaS) [9]. A cloud infrastructure enables on-demand provisioning of servers running several choices of operating systems and a customized software stack. Infrastructure services are considered to be the bottom layer of cloud computing systems [11]. Amazon Web Services mainly offers IaaS, which in the case of its EC2 service means offering VMs with a software stack that can be customized similarly to how an ordinary physical server would be customized; in the same way OpenStack provides the same abstraction to the final consumers. Users are given privileges to perform numerous activities on the server, such as: starting and stopping it, customizing it by installing software packages, attaching virtual disks to it, and configuring access permissions and firewall rules. A key challenge IaaS providers face when building a cloud infrastructure is managing physical and virtual resources, namely servers, storage, and networks, in a holistic fashion. The orchestration of resources must be performed in a way that rapidly and dynamically provisions resources to applications [9]. Public
Infrastructure as a Service providers commonly offer virtual servers containing one or more CPUs, running several choices of operating systems and a customized software stack. In addition, storage space and communication facilities are often provided. In spite of being based on a common set of features, IaaS offerings can be distinguished by the availability of specialized features that influence the cost-benefit ratio experienced by user applications when moved to the cloud. The most relevant features are:

- Geographic presence: to improve availability and responsiveness, a provider of worldwide services would typically build several data centers distributed around the world;
- User interfaces and access to servers: ideally, a public IaaS provider must provide multiple means of access to its cloud, thus catering for various users and their preferences. Graphical User Interfaces (GUIs) are preferred by end users who need to launch, customize, and monitor a few virtual servers and do not necessarily need to repeat the process several times. On the other hand, Command Line Interfaces (CLIs) offer more flexibility and the possibility of automating repetitive tasks via scripts;
- Advance reservation of capacity: advance reservations allow users to request that an IaaS provider reserve resources for a specific time frame in the future, thus ensuring that cloud resources will be available at that time;
- Automatic scaling and load balancing: elasticity is a key characteristic of the cloud computing model. Applications often need to scale up and down to meet varying load conditions. Automatic scaling is a highly desirable feature of IaaS clouds. It allows users to set conditions for when they want their applications to scale up and down, based on application-specific metrics such as transactions per second, number of simultaneous users, request latency, and so forth (a short scaling example closes this subsection);
- Hypervisor and operating system choice: IaaS providers need expertise in Linux, networking, virtualization, metering, resource management, and many other low-level aspects to successfully deploy and maintain their cloud offerings.

One of the most well-known open-source IaaS platforms is OpenStack [12]: a cloud-computing project that aims to provide the ubiquitous open source cloud computing platform for public and private clouds.
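To make the automatic scaling feature listed above concrete, the following sketch shows the control loop it boils down to: observe an application-specific metric and adjust the number of instances when user-defined thresholds are crossed. The thresholds, the metric and the instance limits are hypothetical, and the actual provisioning call is left out; this is not the API of any specific IaaS.

    # Minimal threshold-based auto-scaling decision (hypothetical values).
    SCALE_UP_TPS = 500      # add an instance above this per-instance load
    SCALE_DOWN_TPS = 100    # remove an instance below this per-instance load
    MIN_INSTANCES, MAX_INSTANCES = 1, 10

    def desired_instances(current, tps_per_instance):
        """Decide the next instance count from the observed per-instance load."""
        if tps_per_instance > SCALE_UP_TPS and current < MAX_INSTANCES:
            return current + 1
        if tps_per_instance < SCALE_DOWN_TPS and current > MIN_INSTANCES:
            return current - 1
        return current

    # Example: 3 instances serving 1800 transactions per second in total.
    instances = desired_instances(3, 1800 / 3)  # 600 tps each -> scale up to 4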

2.1.2 PaaS

Platform as a Service (PaaS) is a category of cloud computing services that provides a computing platform and a set of software subsystems or components, needed to perform a task without further external dependencies, as a service. Typically an IaaS is not agile enough for developers: developers that adopt infrastructure-layered clouds become responsible for managing their virtual machines, needing to understand more about the infrastructure, the VMM and the OS than when they were using traditional IT. For service providers, as the number of VMs grows, it becomes very difficult to manage and keep track of which virtual machine has specific applications running. It becomes a logistical nightmare as the ecosystem of users, as well as applications, grows within the cloud infrastructure. In addition to infrastructure-oriented clouds, which provide raw computing and storage services, another approach is to offer a higher level of abstraction to make a cloud easily programmable, through the PaaS. Public Platform as a Service providers commonly offer a deployment environment that allows users to create and run their applications with little or no concern for the low-level details of the platform. Such a cloud platform offers an environment on which developers create and deploy applications and do not necessarily need to know how many processors or how much memory their applications will be using. In addition, multiple programming models and specialized services (e.g., data access, authentication, and payments) are offered as building blocks for new applications [13]. Specific programming languages and frameworks
are made available in the platform, as well as other services such as persistent data storage and in-memory caches. Typical features of these platforms are:

- Programming models, languages and frameworks: PaaS providers usually support multiple programming languages. The languages most commonly used in platforms include Python (e.g., Google AppEngine), Java (e.g., Google AppEngine, Cloud Foundry), .NET languages (e.g., Microsoft Azure), Ruby (e.g., Heroku, Cloud Foundry) and NodeJS (e.g., Cloud Foundry). A variety of software frameworks are usually made available to PaaS developers, depending on the application focus. Providers that focus on Web and enterprise application hosting offer popular frameworks such as Ruby on Rails, Spring and Java EE (frameworks that can be used in the Cloud Foundry PaaS), and sometimes well-defined APIs too;
- Persistence options: a persistence layer is essential to allow applications to record their state and recover it in case of crashes, as well as to store user data. In the cloud computing domain we can rely on two common solutions: relational databases and distributed storage technologies. Typically PaaS providers offer several solutions or connection mechanisms to integrate these persistence options with the applications;
- Scalability: a PaaS is usually built in order to let an agile team work and iterate quickly on software. Application scalability is not only an operations issue but a development issue as well, and a PaaS provides these functions out of the box. Moreover, scaling an application and running it in production should be offered as a fundamental feature. In this way the downtime penalty to be paid, when a specific scaling is required, is drastically reduced.

Cloud consumers of PaaS can employ the tools and execution resources provided by cloud providers to develop, test, deploy and manage the applications hosted in a cloud environment. PaaS consumers can be application developers who design and implement application software, application testers who
run and test applications in cloud-based environments, application deployers who publish applications into the cloud, and application administrators who configure and monitor application performance on a platform. Moreover, an additional desired feature for a PaaS is being portable, regarding both the applications and the PaaS itself. Selecting the right PaaS has a significant impact on keeping an application portable. Basically an application should be portable among several deployments of the same PaaS with no issues, as a PaaS offers developers a set of services that are independent of the infrastructure, ensuring that the application and operational tools integrated by developers are agnostic of any cloud infrastructure. Moreover, a more portable PaaS is capable of being installed on many IaaS offerings, thus increasing the portability of the application independently from the IaaS (open and portable PaaS offerings, like Cloud Foundry [20], can be deployed to public or private cloud configurations, giving the most flexible deployment alternatives).

2.1.3 SaaS

Software as a Service (SaaS) is a software delivery model in which both software and data are totally hosted on the cloud, providing to the consumer the capability to access the provider's applications, running on a cloud infrastructure, from various client devices through a thin client interface such as a web browser. In the SaaS domain, cloud applications can be built as compositions of other services from the same or different providers. Services such as user authentication, e-mail, payroll management, and calendars are examples of building blocks that can be reused and combined in a business solution in case a single, ready-made system does not provide all those features. Many building blocks and solutions are now available in public marketplaces. Applications reside at the top of the cloud stack; the services provided by this layer can typically be accessed by end users through Web portals. SaaS on Cloud offerings are focused on supporting large software package usage, leveraging
cloud benefits. This layer represents the most abstract one: most of the users of these packages will access the services and applications directly, totally unaware of the underlying cloud support. Traditional desktop applications such as word processing and spreadsheets can now be accessed as a service on the Web. This model of delivering applications alleviates the burden of software maintenance for customers and simplifies development and testing for providers [14][15]. The SaaS model has no physical need for indirect distribution since it is not distributed physically and is deployed almost instantaneously. Therefore SaaS customers have no hardware or software to buy, install, maintain, or update. Access to applications is easy, as only an Internet connection is required. Applications, especially the line-of-business services that are large customizable business solutions aimed at facilitating business processes, are normally designed for ease of use and based upon proven business architectures. The advantages of this approach include:

- Multitenant architecture: all users and applications share a single, common infrastructure and code base that is centrally maintained. Because SaaS vendor clients are all on the same infrastructure and code base, vendors can innovate more quickly and save the valuable development time previously spent on maintaining numerous versions of outdated code;
- Customization: the ability to easily customize applications to fit enterprise business processes without affecting the common infrastructure. Because of the way SaaS is architected, these customizations are unique to each company or user and are always preserved through upgrades. That means SaaS providers can make upgrades more often, with less customer risk and much lower adoption cost;
- Scalability: since most of the software runs on provider infrastructure, the same provider is responsible for its availability and scalability.

Applications are steadily being moved to clouds, where they are exposed as services, delivered via the Internet to user agents or humans and
accessed through the ubiquitous web browser. In a SaaS approach, most of the time, we do not have to worry about the installation, setup and running of the application, because the service provider will take care of it; good realizations, for example, can be found in Google Apps [17] (a cloud-based productivity suite) and Microsoft Office 365 [18] (an online office suite, based on the cloud).

2.2 Main Platforms

Before diving completely into a cloud technology, it is interesting to take a look at the enterprise companies and how well the Cloud, in these last years, has been integrated and used by them. The cloud ecosystem is evolving very fast and several businesses have to deal with it; enterprise businesses need to use clouds, not build them. A cloud technology should be seen as a commodity and not a different way to achieve the same tasks or reach the same clients in a different way. Nowadays, two of the main providers offering a suite of tools and a platform to ease the move towards Cloud Computing are Google and Amazon.

2.2.1 Google Cloud Platform

Google Cloud Platform [19] is a set of services that enables developers to build, test and deploy applications on Google's reliable infrastructure. Generally, we talk about cloud computing when taking applications and running them on infrastructure other than our own. As a developer, the cloud should be seen as a service that provides resources to our applications. Built on the same infrastructure that allows Google to return billions of search results in milliseconds, we can rapidly develop, deploy and iterate applications without worrying about system administration, as Google completely manages the application life-cycle, database and storage servers. As shown in Figure 2.2, the offering mainly covers two cloud layers: IaaS and PaaS.


Figure 2.2: Google Cloud Platform

Thanks to a solid infrastructure, the Compute Engine, the platform is capable of handling millions of requests, and applications can automatically scale up to handle the most demanding workloads and scale down when traffic subsides. Google's compute infrastructure provides consistent CPU, memory and disk performance, while the network and edge cache serve responses rapidly to users across the world. While Cloud Platform offers both a fully managed platform and flexible virtual machines, App Engine, a PaaS, supports application development when focusing only on the code is required. In addition, if storage is required, the service offer comprises databases such as MySQL or NoSQL stores.

2.2.2 Amazon Web Services

Amazon Web Services (AWS) is a suite of remote services that together make up a cloud computing platform. While with Google Cloud Platform we have an environment at a higher logical level, closer to developers and applications, with AWS we reach a lower level, meaning direct access to virtual machines, configurations and more adaptability. The most well-known services offered are: EC2, S3, Route 53 and ELB. Elastic Compute Cloud (EC2) is a central service, allowing users to rent virtual computers and deploy applications on virtual machines called instances; users are allowed to select the most suitable flavor, represented by a purpose-specific instance type based on different sizes or numbers of CPUs, GPUs and memory. Amazon
Simple Storage Service (S3) is an online file storage web service providing storage through Web Services interfaces, based on a highly durable and available store for a variety of content, ranging from web applications to media files. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service, designed to give developers and businesses an extremely reliable and cost-effective way to route end users to Internet applications by translating names into numeric IP addresses; Route 53 effectively connects user requests to infrastructure running in AWS and can also be used to route users to infrastructure outside of AWS. Elastic Load Balancing, finally, automatically distributes incoming application traffic across multiple Amazon EC2 instances, enabling users to achieve greater levels of fault tolerance in their applications and seamlessly providing the required amount of load-balancing capacity needed to distribute application traffic. While with AWS we can tweak the architecture more and directly touch the lower layer, by configuring it and developing specific services that interface with the cloud suite, with Google Cloud Platform we can get the same benefits at the cost of a more opaque architecture: quick to develop on, scalable but not transparent.
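As a small illustration of what "renting virtual computers" looks like in practice, the sketch below launches a single EC2 instance with today's boto3 SDK; the AMI id is a placeholder, credentials and region are assumed to be configured in the environment, and the snippet is only illustrative, not part of the deployments discussed later in this thesis.

    # Launch one EC2 instance programmatically (illustrative boto3 sketch).
    import boto3

    ec2 = boto3.resource("ec2")
    instances = ec2.create_instances(
        ImageId="ami-xxxxxxxx",    # placeholder image id
        InstanceType="t2.micro",   # the "flavor" chosen by the user
        MinCount=1,
        MaxCount=1,
    )
    print("launched:", instances[0].id)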

After an overview of the different Cloud layers, we know what a PaaS is and why, for certain purposes, it is attractive: the strong points offered are fast development, a pre-provisioned environment and easy scalability. As we have seen, many businesses are running on the Cloud and several technologies and approaches are utilized. The market offers different solutions and suites to meet developers' needs; many of them provide different layers of abstraction and environments to work with. However, all of these offerings come in a bundle, with a pretty static choice based on the vendor's technologies. On the other hand, some good projects at the Platform level exist; we want to take a look at a specific PaaS that has great momentum today, the open PaaS par excellence: Cloud Foundry.

Chapter 3 Cloud Foundry


In the cloud era, the application platform delivered as a service, often called Platform as a Service (PaaS), makes it much easier to deploy, run and scale applications. At the state of the art, some PaaS offerings support only a limited set of languages and frameworks, do not deliver key application services and restrict deployment to a single cloud. Cloud Foundry [20] is the industry's open PaaS and provides a choice of clouds, frameworks and application services. As an open source project, there is a broad community both contributing to and supporting Cloud Foundry [21]. Open cloud and open source are only part of the transformation underway; there are also continuous innovation and high-velocity agile development along the way. Some open source projects foster inclusiveness and sacrifice velocity, while some increase velocity at the expense of transparency. Cloud Foundry's unique vision is to foster contributions from a broad community of developers, users, customers, partners and independent software vendors while advancing development of the platform at extreme velocity. Cloud Foundry exists to provide a platform for the community of customers, partners and even former competitors to collaborate, teach, share and learn together, accelerating the pace of innovation and contribution.


3.1 A Good Choice

Cloud Foundry does not stop its openness at the code, but extends it to different environments. Being an open Platform as a Service is about having the ability to make several choices [22] that best fit developers, as represented in Figure 3.1, such as:

- Choice of Developer Frameworks: the platform supports several common frameworks such as Spring for Java, Rails and Sinatra for Ruby, and Node.js. There is also support for Grails on Groovy and other JVM-based frameworks integrated into Cloud Foundry. While for now the choice is restricted to those languages, the project will soon integrate other languages as Cloud Foundry matures;
- Choice of Application Services: application services allow developers to take advantage of data, messaging, and web services as building blocks for their applications. Cloud Foundry currently offers abstract logical components that link applications to external services like MySQL, MongoDB and Redis [23]. In addition, the services can be extended; the PaaS offers interfaces and constructs to link, from scratch, services that are not natively supported;
- Choice of Clouds: Cloud Foundry can run on a variety of clouds, both private and public [24]; it is up to the developer and the organization where they want to run it. Cloud Foundry can be run on top of OpenStack or Amazon Web Services as well;
- Type of Usage: the platform's code is open sourced at CloudFoundry.org under the Apache License, making it easy for anyone to adopt and use the technology in virtually any way they want. This is one of the best ways to avoid the risk of lock-in and foster additional innovation.


Figure 3.1: A Triangle of choice

Cloud Foundry is an interoperable PaaS framework that allows users to have freedom of choice across cloud infrastructures, application programming models, and cloud applications. Therefore developers do not have to worry about Virtual Machine configuration or environment setup anymore; a deployment can really be sped up, as we discussed in 2.1.2. It appears clear now that Cloud Foundry needs to offer portability, extensibility and scalability to comply with PaaS standards. The architecture itself demands modularity and cross-compatibility between the different IaaS offerings, in order to provide an environment ready to be used.

3.2 The Architecture

Cloud Foundry has been designed with a simple but effective concept in mind: the closer to the center of the system, the dumber the code should be [27]. Distributed systems raise fundamentally hard problems [25], so every component that cooperates to form the entire system should be as simple as it possibly can be while still doing its job properly. The architecture is both portable across different infrastructures and fundamentally extensible. The components are modular and loosely coupled: they know about each other only through a publish-subscribe message system called NATS, over which all direct calls, tasks and requests between agents travel. Every component in this system is horizontally scalable and self-reconfigurable in case of failure, meaning it is possible to add as many copies of each component as needed to support the load of a cloud, in any order and with resilience always in mind. Since everything is so decoupled, it does not really matter where each component resides or runs. We can break the whole system down into five main components, as displayed in Figure 3.2, plus a message bus: the Cloud Controller, the Health Manager, the Router, the DEAs (Droplet Execution Agents) and a set of Services.

Figure 3.2: Cloud Foundry Architecture


3.2.1 NATS

NATS is a lightweight publish-subscribe and distributed queueing cloud messaging system, Cloud Foundry's message bus, written in Ruby. It is the system all the internal components communicate on. The different agents within the architecture fire off messages and receive them from other components on different subjects. When the other components have completed their tasks, they usually send a message back on the NATS bus. Other components can listen to what is happening on NATS and perform peripheral tasks, such as ensuring that DNS is correctly configured for deployed applications, logging activity, or managing scalability. The NATS client is also built with EventMachine [26], which means communication is asynchronous and does not block the invoker, which can immediately handle any NATS messages pushed to it; in addition, the pub-sub system offers multiple subjects for different communications and tasks. When each daemon first boots, it connects to the NATS message bus, subscribes to the subjects it cares about (e.g. provision or heartbeat signals), and begins to publish its own heartbeats and notifications. In this way we are able to replicate almost any component, since each copy only needs the NATS endpoint to acquire the connection and the message flow for each task.
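
To give an idea of how a component might attach to the bus, the minimal Ruby sketch below uses the EventMachine-based NATS client to subscribe to one subject and publish periodic heartbeats. The subject names, payload fields and NATS address are illustrative assumptions, not the exact ones used by the Cloud Foundry components.

```ruby
require "nats/client"   # EventMachine-based NATS client gem
require "json"

NATS.start(uri: "nats://nats.example.internal:4222") do
  # React to messages published by other components on a subject
  # (the subject name here is only an example).
  NATS.subscribe("dea.advertise") do |msg|
    puts "received advertisement: #{msg}"
  end

  # Periodically publish our own heartbeat so peers know we are alive.
  EM.add_periodic_timer(10) do
    NATS.publish("component.heartbeat", { timestamp: Time.now.to_i }.to_json)
  end
end
```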

3.2.2 Cloud Controller

The Cloud Controller is the main orchestrator of the system. It is an application that uses EventMachine [28] to be fully asynchronous and Sinatra [29] (a web application library) to expose REST APIs. This component exposes the main REST interface that the Command Line Interface (CLI) tool cf talks to. The orchestrator of the system wears many hats, the main ones being:

Control of the life cycle of an application;
Initiation of the staging process of a new application;
Selection of the best DEA agent;
Reception of information from the Health Manager about applications;
Control of clients' access credentials;
Management of spaces, organizations and users;
Binding of services to the applications.

The Cloud Controller maintains a database (CC DB) with tables for organizations, spaces, applications, services, service instances, user roles and tasks. Relying on the data structure created during the deployment and first run of this component, the Controller takes care of several tasks. Each time a command is issued via the CF CLI, the Cloud Controller checks whether the user is authenticated (authentication is performed by providing a UAA token in the Authorization HTTP header) and whether the user has the right role, combined with a set of permissions, to manage the life cycle of the applications; an example of such a call is sketched at the end of this section. There is an access validation whenever users try to access the associated space and organization. While the Cloud Controller answers the REST calls via Sinatra and provides the right endpoints for all the client requests, by using Sequel (an object-relational mapping tool) it can asynchronously update the Postgres database and dynamically be notified of changes. The Cloud Controller can be seen as a Model View Controller (MVC) application, where the model is persisted on a database and associated with specific logic classes at run time; the view is represented by the REST APIs that offer specific endpoints to the CF CLI commands issued; and the controller is partially obtained through specific REST logic and partially through a set of specific classes that interact directly with other Cloud Foundry components, via NATS, and are driven by database updates and events. The high-level architecture of this version of the Cloud Controller can be summarized as follows:

Sinatra HTTP framework;
Sequel ORM;
Thread per request, currently using Thin in threaded mode;
NATS-based communication with other CF components.

By adopting these components the Cloud Controller can offer specific APIs to the clients that grant:

Consistency across all resource URLs, parameters, request-response bodies and error responses;
Partial updates of a resource, performed by providing a subset of the resource's attributes;
Pagination support for each of the collections;
Filtering support for each of the collections.

A developer typically interacts with the Cloud Controller only during the first process of pushing an application to Cloud Foundry, which can be translated into a simple upload of the application and a transfer of only the files that are really required to run the piece of software. The deployment of an application always starts with an initial push. Thanks to the CF CLI and the Cloud Controller, the application's files are fingerprinted and the orchestrator can keep track of the changes, like a built-in version control system. The client then only sends the objects that the cloud requires in order to create a full droplet (a droplet is a tarball of all the application's code plus its dependencies, wrapped up with a start and a stop button). Moreover, the Cloud Controller is in charge of managing a blob store, containing:

Resources: files that are uploaded to the Cloud Controller with a unique SHA such that they can be reused without re-uploading the file;
Application packages: un-staged files that represent an application;
Droplets: the result of taking an application package, processing a buildpack and getting it ready to run.

The blob store uses the FOG library (a Ruby cloud service library), so that it can use abstractions like Amazon S3 or an NFS-mounted file system for storage.
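
As a concrete illustration, the hedged sketch below issues a plain HTTP request against a Cloud Controller REST endpoint, passing a UAA token in the Authorization header the way the CF CLI does. The host name and token are placeholders, and the exact response schema may differ between Cloud Controller versions.

```ruby
require "net/http"
require "json"
require "uri"

# Placeholder values: a real client obtains the bearer token from the UAA first.
api_endpoint = URI("https://api.cloudfoundry.example.com/v2/apps")
uaa_token    = "bearer PLACEHOLDER-TOKEN"

request = Net::HTTP::Get.new(api_endpoint)
request["Authorization"] = uaa_token   # the Cloud Controller validates this token with the UAA

response = Net::HTTP.start(api_endpoint.host, api_endpoint.port, use_ssl: true) do |http|
  http.request(request)
end

apps = JSON.parse(response.body)
# Field name assumed for illustration; check the API version actually deployed.
puts "applications visible to this user: #{apps["total_results"]}"
```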

3.2.3 Droplet Execution Agent

This is the agent that runs on each node that actually hosts the applications; in any particular cloud build of Cloud Foundry there will be more DEA nodes than any other type of node in a typical setup. Each DEA can be configured to advertise a different capacity and a different built-in image for the applications, identified via a stack label, so not all DEA nodes are of the same size or able to run the same applications. The DEA itself is written in Ruby and takes care of managing an application instance's life cycle. It can be instructed by the Cloud Controller to start and stop application instances. It keeps track of all started instances, and periodically broadcasts messages about their state over NATS (meant to be picked up by the Health Manager). The Droplet Execution Agents were designed to be as modular as possible. We do not need to know the exact id of a DEA node, or to make a direct call to start or stop an application; when we talk to a DEA we need a service, an agent that can fulfill the request. NATS has a central role here: the nodes publish an advertise message with their capabilities and the orchestrator, the Cloud Controller, browses all the messages to find the most suitable node. We can configure each execution node with a different capacity, stack and disk size in order to create pools of execution agents. The DEA does not necessarily care what language an app is written in. All it sees are droplets: a droplet is a simple wrapper around an application that takes one input, the port number to serve HTTP requests on, and has two buttons, start and stop. The DEA therefore treats droplets as black boxes: when it receives a new droplet to run, it tells it what port to bind to and runs the start script. A droplet, again, is just a tarball of the application, wrapped up in a start/stop script and with all the configuration files rewritten in order to bind to the proper database. Once the DEA tells the droplet what port to listen on for HTTP requests and runs its start script, the app binds to the correct port; later the DEA broadcasts the location of the new application on the bus so the Routers can learn about it. If the app did not start successfully, the DEA returns log messages to the CF client that tried to push the app, telling the user why it did not start. To summarize, the key functions of a Droplet Execution Agent (DEA) are:

Stage applications: a DEA uses the appropriate buildpack to stage the application; the result of this process is a droplet;
Manage Warden containers: after the staging process the applications run in Warden containers, which the DEA is in charge of controlling;
Run droplets: a DEA manages the life cycle of each application instance running in it, starting and stopping droplets upon request of the Cloud Controller.

The DEA monitors the state of a started application instance, and periodically broadcasts application state messages over NATS for consumption by the Health Manager. To guarantee good availability, the DEA also periodically checks the health of the applications running in it. If a URL is mapped to an application, the DEA attempts to connect to the port assigned to the application; if the application port is accepting connections, the DEA considers that application state to be RUNNING. If there is no URL mapped to the application, the DEA checks the system process table for the application's process PID; if the PID exists, the DEA considers that application state to be RUNNING.
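
The Ruby fragment below sketches that health-check logic under stated assumptions (a known host port for the instance and a known PID). It mirrors the behaviour described above but is not the DEA's actual code.

```ruby
require "socket"

# Decide whether an application instance should be reported as RUNNING.
# `port` is the port assigned to the instance (nil if no URL is mapped);
# `pid` is the process id recorded when the droplet was started.
def instance_running?(port, pid)
  if port
    # A URL is mapped: try to open a TCP connection to the assigned port.
    begin
      TCPSocket.new("127.0.0.1", port).close
      true
    rescue SystemCallError
      false
    end
  else
    # No URL mapped: fall back to checking the process table for the PID.
    begin
      Process.kill(0, pid)   # signal 0 only checks that the process exists
      true
    rescue Errno::ESRCH
      false
    end
  end
end
```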

3.2.4 Warden

Warden is the container technology used to isolate the different apps on DEA nodes. Warden's primary goal is to provide a simple API for managing isolated environments. These isolated environments (or containers) can be limited in terms of CPU usage, memory usage, disk usage, and network access. The isolation is achieved by namespacing kernel resources that would otherwise be shared, because the applications are co-located on the same node. The intended level of isolation is such that multiple containers present on the same host should not be aware of each other's presence. This means that these containers are given (among others) their own PID (process ID) namespace, network namespace, and mount namespace, while resource control is done using control groups (cgroups). Every container is placed in its own control group, where it is configured to use an equal slice of CPU compared to other containers, and where the maximum amount of memory it may use is set. Warden is a daemon that manages containers and can be controlled via a simple API, rather than a set of tools that are individually executed. While the Linux backend for Warden was initially implemented with LXC, the current version no longer depends on it, because running LXC out of the box is a very opaque and static process [30]: there is little control over when the different parts of the container start process are executed, and how they relate to each other. Because Warden relies on a very small subset of the functionality that LXC offers, the tool executes pre-configured hooks at different stages of the container start process, so that required resources can be set up without worrying about concurrency issues. These hooks make the start process more transparent, allowing for easier debugging when parts of this process are not working as expected.
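
Warden drives these kernel facilities through its own daemon and protocol; the snippet below is only a minimal illustration of the underlying cgroup mechanism (writing limits into the cgroup v1 filesystem, which requires root), not Warden's real API. The paths, group names and values are assumptions.

```ruby
require "fileutils"

# Illustrative only: give a container an equal CPU share and a memory cap
# by creating a dedicated cgroup and writing limit files into it (cgroup v1 layout assumed).
container_id  = "w-instance-1"
cpu_cgroup    = "/sys/fs/cgroup/cpu/warden/#{container_id}"
memory_cgroup = "/sys/fs/cgroup/memory/warden/#{container_id}"

FileUtils.mkdir_p(cpu_cgroup)
FileUtils.mkdir_p(memory_cgroup)

File.write(File.join(cpu_cgroup, "cpu.shares"), "1024")                  # equal slice of CPU
File.write(File.join(memory_cgroup, "memory.limit_in_bytes"),
           (256 * 1024 * 1024).to_s)                                     # 256 MB memory cap

# The container's processes would then be added to the group via the `tasks` file, e.g.:
# File.write(File.join(memory_cgroup, "tasks"), container_pid.to_s)
```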


3.2.5 Router

The Router routes traffic coming into Cloud Foundry to the appropriate component, usually the Cloud Controller or a running application on a DEA node. The Router is implemented in Go. Implementing a custom router in Go gives full control over every connection to the router, which makes it easier to support WebSockets and other types of traffic. All routing logic is contained in a single process, removing unnecessary latency. When gorouter is used in Cloud Foundry, it receives route updates via NATS from the Droplet Execution Agents. Routes that have not been updated in two minutes are pruned by default; therefore, to keep a route active, it needs to be refreshed at least every two minutes. In this way routes stay up to date and we obtain a form of monitoring: if an application has an entry in the routing table, we can assume that it is running and reachable, otherwise it would have been removed. If the DEA node or the application itself crashes, the gorouter loses the entry after a few minutes. In a larger production setup there is a pool of Routers load balanced behind Nginx or some other load balancer. These Routers listen on the bus for notifications from the DEA nodes about new apps coming online and apps going offline. When they get a real-time update, they change the in-memory routing table that they consult in order to properly route requests. A request coming into the system thus goes through Nginx, or some other HTTP termination endpoint, which load balances across a pool of identical Routers. One of the Routers picks up the request and inspects its headers just enough to find the Host: header, from which it extracts the name of the application the request is headed for. It then does a basic hash lookup in the routing table to find a list of potential backends that represent this particular application.
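
A minimal sketch of that lookup is shown below, assuming a routing table keyed by host name and updated from register messages received over NATS; the real gorouter is written in Go and keeps additional metadata per backend, so field and subject names here are only illustrative.

```ruby
require "json"

# In-memory routing table: host name -> list of "ip:port" backends.
routes = Hash.new { |table, host| table[host] = [] }

# Called when a DEA announces a started instance (illustrative register message).
def register(routes, message_json)
  msg = JSON.parse(message_json)
  msg["uris"].each { |uri| routes[uri] |= ["#{msg["host"]}:#{msg["port"]}"] }
end

# Pick a backend for an incoming request by inspecting its Host header.
def lookup(routes, host_header)
  routes[host_header].sample   # naive load balancing across the known backends
end

register(routes, '{"uris":["myapp.example.com"],"host":"10.0.1.12","port":61002}')
puts lookup(routes, "myapp.example.com")   # => "10.0.1.12:61002"
```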

3.2.6 Health Manager

The Health Manager is a standalone daemon that has a copy of the same models the Cloud Controller has and can currently see into the same database as the Cloud Controller. This daemon wakes up at a fixed interval and scans the Cloud Controller database to see what the state of the world should be, then inspects the real state to make sure it matches the desired one. If there are things that do not match, it sends specific messages back to the Cloud Controller to correct the incongruity. This is how the loss of an application, or even of an entire DEA node, is handled. If an application goes down, the Health Manager notices and quickly remedies the situation by signalling the Cloud Controller to start a new instance. If a DEA node completely fails, the app instances that were running on it are redistributed across the grid of remaining DEA nodes. The Health Manager monitors the state of the applications and ensures that started applications are indeed running, with the correct versions and numbers of instances. It is essential to ensuring that apps running on Cloud Foundry remain available and scale correctly; it is needed to restart applications whenever the DEA running an app shuts down for any reason, or Warden kills the app because it violated a quota, or the application process simply exits with a non-zero exit code. Conceptually, this is done by maintaining the actual state of the applications and comparing it against the desired state. When discrepancies are found, actions are initiated to bring the applications to the desired state, i.e. start or stop commands are issued for missing or extra instances, respectively. The current Cloud Foundry release uses this component, but a brand new version will soon be released under the name HM9000.
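Conceptually, the reconciliation loop looks like the sketch below, which compares desired instance counts (as stored by the Cloud Controller) with the actual counts observed from heartbeats and decides what to request; the data structures and wording are illustrative, not the Health Manager's real code.

```ruby
# Desired state as stored by the Cloud Controller (app id -> instances wanted).
desired = { "app-1" => 3, "app-2" => 1 }

# Actual state as reconstructed from DEA heartbeats (app id -> instances seen).
actual  = { "app-1" => 2, "app-2" => 2 }

(desired.keys | actual.keys).each do |app_id|
  delta = desired.fetch(app_id, 0) - actual.fetch(app_id, 0)
  if delta > 0
    puts "ask the Cloud Controller to start #{delta} instance(s) of #{app_id}"
  elsif delta < 0
    puts "ask the Cloud Controller to stop #{-delta} instance(s) of #{app_id}"
  end
end
```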

3.2.7 User Account and Authentication Server

Also called UAA, this is the identity management service for Cloud Foundry. Its primary role is as an OAuth2 provider, issuing tokens for client applications to use when they act on behalf of Cloud Foundry users. It can also authenticate users with their Cloud Foundry credentials, and can act as a Single Sign-On (SSO) service using those credentials (or others). It has endpoints for managing user accounts and for registering OAuth2 clients, as well as various other management functions. It provides single sign-on for web applications and secures Cloud Foundry resources. In addition, it grants access tokens to client applications for use in accessing resource servers in the platform, including the Cloud Controller. It is a plain Spring MVC web application that provides:

An OAuth2 endpoint for issuing authorization tokens;
A login endpoint, to allow querying for login prompts;
A check-token endpoint, to allow resource servers to obtain information about an access token submitted by an OAuth2 client;
A Simple Cloud Identity Management (SCIM) user provisioning endpoint;
OpenID Connect endpoints to support authentication, to get user information and to check ids.

Authentication can usually be performed by command line clients by submitting credentials directly to the authorization endpoint.
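
For example, a command line client might obtain a token with the OAuth2 resource-owner password grant, roughly as sketched below; the UAA host, client id and credentials are placeholders, and the exact parameters (including any required client authentication) can vary with the UAA configuration.

```ruby
require "net/http"
require "json"
require "uri"

uaa_token_url = URI("https://uaa.cloudfoundry.example.com/oauth/token")  # placeholder host

response = Net::HTTP.post_form(uaa_token_url,
  "grant_type" => "password",
  "client_id"  => "cf",              # assumed public CLI client id
  "username"   => "dev@example.com",
  "password"   => "secret")

token = JSON.parse(response.body)
# The access token is then sent to the Cloud Controller in the Authorization header.
puts token["access_token"]
```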

3.2.8 Services

Cloud Foundry Services are add-ons that can be provisioned alongside an application. There are two ways in which Cloud Foundry enables developers to add services to their applications: Managed Services and User-provided Service Instances. Managed Services have been integrated with Cloud Foundry via APIs and provision new service instances and credentials on demand, while User-provided Service Instances are a mechanism to deliver credentials to applications for service instances which have been pre-provisioned outside of Cloud Foundry.

3.2.8.1 User-provided Service Instances

Sometimes we only need to provide a simple endpoint and a set of credentials to our application before we push it to the PaaS. If we know the connection parameters before the deployment, or we simply do not need any broker logic, we can inject these settings during the publication phase. Whenever we want to add a service instance, an entity represented by a hostname, a port and a password, the Cloud Foundry CLI prompts us for the basic, static information we want to associate with the service. Service instances enable developers to use external services with their applications using familiar workflows; in addition, the user-provided ones (as opposed to those provided via a Service Broker) are service instances which have been provisioned outside of Cloud Foundry, so we can only define the parameters to connect to them, without providing any kind of additional logic. For example, a DBA may provide a developer with credentials to an Oracle database managed outside of, and unknown to, Cloud Foundry. Rather than hard coding credentials for these instances into an application, it is possible to create a mock service instance in Cloud Foundry to represent the external resource using the familiar create-service command, and provide whatever credentials the application requires. Once created, user-provided service instances behave just like other service instances.

3.2.8.2 Managed Services

Cloud Foundry provides an API which is used to integrate services with Cloud Foundry. Each time a new Managed Service manager is added, an interaction starts where the Cloud Controller is the client and the manager of those services is the Service Broker. The APIs involved are RESTful HTTP APIs; their version should not be confused with the version of the Cloud Controller API, which is often used to refer to the version of Cloud Foundry itself (when one refers to Cloud Foundry V2, one typically refers to the Cloud Controller version). The services API is versioned independently of the Cloud Controller API. When we need to offer a Managed Service, we need to provide a logical construct called a Service Broker. The broker is the term used to refer to the component which implements the service broker API and offers an endpoint to the Cloud Controller during the provisioning of the required service. In general, service brokers advertise a catalog of service offerings and service plans to Cloud Foundry, and receive calls from Cloud Foundry for five functions: fetch catalog, create service, bind service, unbind service and delete service. What a broker does with each call can vary between services and is entirely up to the business logic a developer adds; in general, the create command reserves resources on a service and bind delivers to an application the information necessary for accessing the resource. The reserved resource is called a Service Instance; what a service instance represents can obviously vary by service: it could be a single database on a multi-tenant server, a dedicated cluster, or even just an account on a web application. How a Service Broker handles the life cycle of all the services it creates is again up to the service provider or developer. Basically, Cloud Foundry only requires that the service provider implement the service broker API: a broker can be implemented as a separate application, or by adding the required HTTP endpoints to an existing service; we only need to comply with a specific set of REST APIs.
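
A broker can therefore be a very small web application. The sketch below uses Sinatra to stub the catalog and provisioning endpoints in the spirit of the broker API described above; the route paths, field names and identifiers are illustrative assumptions rather than the exact contract.

```ruby
require "sinatra"
require "json"

# Advertise one service offering with a single plan (illustrative catalog).
get "/v2/catalog" do
  content_type :json
  { services: [{ id: "example-db", name: "exampledb", description: "Demo database service",
                 bindable: true,
                 plans: [{ id: "small", name: "small", description: "Shared instance" }] }] }.to_json
end

# Create (provision) a service instance: reserve resources for the given id.
put "/v2/service_instances/:id" do
  status 201
  { dashboard_url: "https://broker.example.com/dashboards/#{params[:id]}" }.to_json
end

# Bind: hand back the credentials an application needs to reach the resource.
put "/v2/service_instances/:id/service_bindings/:binding_id" do
  status 201
  { credentials: { uri: "exampledb://user:pass@db.example.com/#{params[:id]}" } }.to_json
end

# Unbind and delete simply acknowledge in this sketch.
delete "/v2/service_instances/:id/service_bindings/:binding_id" do
  "{}"
end

delete "/v2/service_instances/:id" do
  "{}"
end
```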

3.3 Roles and Organizations

As shown in Figure 3.3, Cloud Foundry offers many meta objects within the concept of the organization. Each organization is a logical abstraction that encompasses three things: domains, spaces and users. A domain is exactly a domain name, like acme.com or foo.net. This feature allows the final user to associate applications with custom domains registered to an organization. Each application deployed on the PaaS is always a web application that needs to be reachable from the Internet. Via a domain we can configure several applications and aggregate them under the same Internet domain name. Typically, when we use Cloud Foundry for the first time, a default domain is available to all spaces. Domains can also be multi-level and contain sub-domains, like store in store.acme.com. Domain objects belong to an organization and are associated with zero or many spaces within the organization; moreover, they are not directly bound to applications, but a child of the domain object, called a route, is.

Figure 3.3: Organization and Roles

A route is associated with an application and binds the application to the domain. Once a web application is pushed to Cloud Foundry, a route must be provided. The route is, in the end, a subdomain that lets the Router component forward all requests to the right application. The space, as shown in Figure 3.3, is always part of an organization; in addition, every organization can have multiple spaces. The concept of spaces provides separation and boundaries for all the applications. The default spaces for a standard Cloud Foundry installation are development, test, and production. In each space we can deploy multiple applications. In order to control and manage the users and the whole organization, some permissions are required, such as:

Org manager: the org-admin permission is used to edit the Access Control List (ACL) of the organization. The org-admin permission is required in order to create or delete an app-space, to enumerate app-spaces, to manage organization-level features, to change the plan for an organization, and to add users to the organization;

Org audit: the org-audit permission gives the user rights to see all organization-level and app-space-level reports, as well as all organization-level and app-space-level events.

The app-space permissions are required to handle applications and services; some of them are:

App space manager: this permission is required to edit the ACL on an app-space. In addition, it is required to add additional managers, to invite developers and to enable/disable/add features to the app-space which can then be used by applications within the app-space. The admin permission on an app-space does not give one the ability to create or delete app-spaces: that function is considered to be an operation on the org object;

Developer: this permission is required in order to perform all operations against apps and services within the app-space. With this permission it is possible to create, delete and stop apps, change the instance count, bind and unbind services, read logs and files, read stats, enumerate apps, and change app settings. If we were to map this to today's system, all users have the developer permission for their account;

App space audit: the audit permission is required to read all state from the app-space and all contained apps. Users with audit access can do anything that is non-destructive: they can enumerate apps and services, and read all logs, files and stats from all apps and services within the space. This permission does not allow any destructive operations or mutations. Note that the audit permission is a subset of the developer permission.

3.4 Command Line Client

When a developer wants to interact with Cloud Foundry and control his own applications, the easiest way is to use the CF Command Line Interface. CF is a Ruby gem command line interface that can be used to deploy and manage applications running on most Cloud Foundry based environments. This tool is useful not only for handling the life cycle of an application, but can also be employed to manage many other properties of the system, like organizations, spaces and services. The client can manage different things:

1. login and logout of users;
2. creation of different users;
3. the complete life cycle of an application and all kinds of related information;
4. customization of services;
5. creation and set-up of organizations and spaces;
6. handling of the different routes and domains for the applications.

Usually, when we deploy an application we can provide a deployment descriptor file where all the configuration parameters are written. The CF CLI parses these values and interacts directly with the Cloud Controller, calling the remote APIs, to apply the desired configuration. The client can also handle the versioning of the projects via interaction with the Controller. When we need to scale the instances of an application or just increase its memory limits, the CF client allows the user to do that.

3.5 Applications Guidelines

Applications written using the runtimes and frameworks supported by Cloud Foundry often run unmodified on Cloud Foundry, provided the application design follows a few simple guidelines and does not write files to the local file system. There are a few reasons for this:

Local file system storage is short-lived: when an application instance crashes or stops, the resources assigned to that instance are reclaimed by the platform, including any local disk changes made since the app started. When the instance is restarted, the application starts with a new disk image. Although an application can write local files while it is running, the files will disappear after the application restarts;

Instances of the same application do not share a local file system: each application instance runs in its own isolated container, so a file written by one instance is not visible to other instances of the same application. If the files are temporary, this should not be a problem. However, if an application needs the data in the files to persist across application restarts, or the data needs to be shared across all running instances of the application, the local file system should not be used; it is recommended to use a shared data service like a database or blob store for this purpose.

For example, rather than using the local file system, it is a good idea to use a Cloud Foundry service such as the MongoDB document database or a relational database like MySQL or Postgres. Another option is to use cloud storage providers such as Amazon S3, Google Cloud Storage, Dropbox, or Box. If the application needs to communicate across different instances of itself (for example to share state), a good approach is to consider a distributed cache system like Redis or a messaging-based architecture with RabbitMQ. Regarding client sessions and connections, Cloud Foundry supports session affinity, or sticky sessions, for incoming HTTP requests to applications. If multiple instances of an application are running on Cloud Foundry, all requests from a given client will be routed to the same application instance. This allows application containers and frameworks to store session data specific to each user session. Cloud Foundry does not persist or replicate HTTP session data. If an instance of an application crashes or is stopped, any data stored for HTTP sessions that were sticky to that instance is lost. When a user session that was sticky to a crashed or stopped instance makes another HTTP request, the request is routed to another instance of the application. Session data that must be available after an application crashes or stops, or that needs to be shared by all instances of an application, should be stored in a Cloud Foundry service. Applications running on Cloud Foundry receive requests only on the URLs configured for the application, and only on ports 80 (the standard HTTP port) and 443 (the standard HTTPS port).


3.6 Interaction and Usage

When we want to interact with Cloud Foundry and deploy our application, we have to know that the deployment is performed by the Cloud Foundry CLI cf push command. The deployment process is often referred to as pushing an application. When an application is pushed, Cloud Foundry performs a variety of staging tasks which, at a high level, consist of finding a container to run the application, provisioning the container with the appropriate software and system resources, starting one or more instances of the application, and storing the expected state of the application in the Cloud Controller database. The staging process is required in order to have a running application. By default, all files in the application's project directory tree, except version control files and directories (e.g. .svn, .git and .darcs), are uploaded to the Cloud Foundry instance. If the application directory contains other files (such as temp or log files), or complete sub-directories that are not required to build and run the application, the best practice is to exclude them using a .cfignore file (.cfignore is similar to git's .gitignore). Especially with a large application, uploading unnecessary files slows down application deployment. To avoid the risk of an application being unavailable during Cloud Foundry upgrade processes, it is recommended to run more than one instance of an application. When a DEA is upgraded, the applications running on it are evacuated: shut down gracefully on the DEA to be upgraded, and restarted on another DEA. BOSH is configured to upgrade DEAs one at a time, so for an application whose start-up time is less than two minutes, running a second instance should be sufficient; Cloud Foundry recommends running more than two instances of an application that takes longer than two minutes to start. With Cloud Foundry, the choice of runtimes is quite tight: it supports only three kinds of buildpacks to deploy applications: Ruby, JavaScript (Node.js) and Java.


Cloud Foundry stages applications using framework- and runtime-specific buildpacks, and currently provides buildpacks for several runtimes and frameworks. It also supports custom buildpacks, as described in Custom Buildpacks [31]: there are several community-developed buildpacks. To use a buildpack that is not built into Cloud Foundry, it is necessary to specify the URL of the buildpack when the application is pushed, using the buildpack qualifier in the deployment file. The details of how an application is deployed are governed by a set of required and optional deployment attributes. For example, one option is to specify the name of the application, the number of instances to run, and how much memory to allocate to the application via the command line when cf push is run; another option is to write them in an application manifest file. If a buildpack is not specified during a push, Cloud Foundry determines which built-in buildpack to use by running the bin/detect script of each buildpack.
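
As an illustration, the deployment attributes mentioned above can be collected in a small manifest file. The sketch below builds such a structure in Ruby and writes it out as YAML; the attribute names follow common manifest conventions but are assumptions, so the exact schema should be checked against the Cloud Foundry documentation.

```ruby
require "yaml"

# Illustrative application manifest: name, instances, memory and an
# optional custom buildpack URL (all values are placeholders).
manifest = {
  "applications" => [{
    "name"      => "hello-app",
    "memory"    => "256M",
    "instances" => 2,
    "host"      => "hello-app",
    "buildpack" => "https://github.com/example/custom-buildpack.git"
  }]
}

File.write("manifest.yml", manifest.to_yaml)
puts manifest.to_yaml
```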

3.6.1 Staging

When we want to deploy an application and see it running, we need to issue a cf push command. With this command the application's files are uploaded to the Cloud Controller and at the end we have a running application, reachable via a URL. Between the push and the run of our application there is a process called staging. It is possible to upload an application without requiring a running state; however, before the first run, the staging step is mandatory. When an application is staged, a DEA processes the uploaded application in accordance with the buildpack selected by Cloud Foundry or specified by the user; the result of the staging process is a droplet. A buildpack is a set of scripts that the DEA runs on an application package to create a droplet that contains everything the application needs to run; a buildpack is specific to a particular framework or runtime environment. When an application is uploaded, Cloud Foundry examines the application artifacts to determine which buildpack to apply, then a DEA node is selected to start the staging process. At the end of the process the droplet is produced. A droplet is an uploaded application to which a buildpack has been applied. It is the original application, with a wrapper around it that accepts one input, the port where it should listen for HTTP requests, and has two methods, start and stop, within an executable start-up script. The stager is the component in charge of staging the applications; each DEA node is both a stager and a running agent at the same time. When a node running a DEA service starts a staging process, it depends on:

the NATS server;
an HTTP endpoint for getting a zipped version of the application;
an HTTP endpoint for posting the droplet package, the result of the process, ready to be run.

The Cloud Controller knows all the available stagers thanks to a NATS subject called staging.advertisement, where all the stagers publish messages. Whenever a new DEA is spawned, it starts publishing its capabilities and properties, such as memory availability, disk availability, stack type and id, to the NATS server. The Cloud Controller also knows whether an application has been staged by querying the database containing application information. To summarize, when we push an application and require it to be started at the same time, this happens:

1. The Cloud Controller checks in the database whether the application is already staged;
2. If the application needs to be staged (detected via the AppObserver component), the Controller issues a staging request on the staging.dea-id.start subject;
3. The DEA node receives a staging message on that subject (a sketch of such a message is shown at the end of this section), containing: the application ID; properties regarding required memory, disk and services; the download URL used to get the files that need the staging process; the upload URL used to store the droplet; an optional URL for a different buildpack; a start message, so that the DEA node knows how many instances of the app need to be started;
4. The DEA node starts the staging process.

The staging process is made up of these steps:

1. Unzip the app package;
2. Detect the buildpack to apply;
3. Find the right buildpack folder and script to run;
4. Run the right building script;
5. Tar the output files of the building script, producing the droplet;
6. Upload the droplet to the URL provided by the Cloud Controller;
7. Save the droplet locally for the run.

After all these tasks, the DEA node sends a report to the Cloud Controller. If the staging process did not fail, the application goes directly to the start process, as shown in Figure 3.4.
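
Putting the pieces together, a staging request of the kind described in step 3 might be published over NATS roughly as below; the subject name and field names are reconstructed from the description above and are not guaranteed to match the exact wire format.

```ruby
require "nats/client"
require "json"

# Illustrative staging request sent by the Cloud Controller to one DEA.
staging_request = {
  "app_id"        => "example-app-guid",
  "properties"    => { "memory" => 256, "disk" => 1024, "services" => [] },
  "download_uri"  => "http://cc.internal/staging/apps/example-app-guid",      # zipped application files
  "upload_uri"    => "http://cc.internal/staging/droplets/example-app-guid",  # where to put the droplet
  "buildpack"     => nil,                                                     # optional custom buildpack URL
  "start_message" => { "instances" => 2 }                                     # how many instances to start afterwards
}

NATS.start(uri: "nats://nats.internal:4222") do
  NATS.publish("staging.dea-1234.start", staging_request.to_json) { NATS.stop }
end
```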

3.6.2 Start of an Application

Figure 3.4: Start of an application

Each time we change the number of instances of a running application, or we start a stopped application, the Cloud Controller REST endpoint receives the command and updates the applications table in the database. The Controller detects a change in the number of running instances requested, and the AppObserver component inside the Cloud Controller checks whether the application needs to be staged. The application can only be run on a Droplet Execution Agent, and the Controller knows all available DEAs thanks to the dea.advertisement NATS subject, where each DEA broadcasts its capabilities and properties such as memory availability, disk availability and number of applications running. The Controller then sends a start message on the NATS subject dea.dea-id.start, containing:

the droplet ID;
the name of the application;
the URL where to find the droplet (database);
the URL requested by the developer, bound to the web application;
the version of the application;
the services requested;
the memory and disk size required;
an optional start command.

The DEA receives the message and checks whether it has the droplet locally and whether the required version is available. If not, the droplet is downloaded from the database. When the droplet is available locally, these steps are executed in order:

1. Creation of a dedicated container, via Warden;
2. Extraction of the droplet;
3. Execution of the start script extracted from the droplet.

The DEA knows nothing about Ruby or Java or PHP; it only knows about the start-up script within the package. At the end, a status message is sent to the Cloud Controller in order to update the database information. Cloud Foundry and its components rely on an IaaS layer. Luckily, a specific deployer exists: BOSH. By using this tool, many tasks are automated and the open PaaS can be installed on a variety of providers such as OpenStack, AWS, vSphere, etc.

Chapter 4 BOSH
While exploring all the parts and components of Cloud Foundry, we need to understand that we are dealing with PaaS software, a layer above the IaaS. Cloud Foundry services and agents cannot be installed directly on an infrastructure layer, on a cloud platform such as OpenStack, AWS or vSphere, by hand or using scripts; we need an intermediate-level deployer, a distributed services maintainer, in order to ease the deployment and installation process on a cloud infrastructure. The Cloud Foundry open source project includes an open source tool chain for release engineering, deployment, and life cycle management of large-scale distributed services; its name is BOSH.

4.1 BOSH

BOSH while developed to deploy Cloud Foundry, has a more general purpose; because it can be used to deploy even other distributed services; BOSH can deploy services and agents on Infrastructure as a Service (IaaS ) providers such as VMware vSphere, vCloud Director, Amazon Web Services EC2 and OpenStack. This is possible thanks to several BOSH components all written in Ruby language and many Cloud Provider Interfaces (CPI). When we want to install Cloud Foundry on a IaaS provider, rst of all we need a BOSH installation, then we can install later our PaaS software. This is necessary as we need to abstract from the dierent cloud providers implementations:

64

BOSH

we cannot bind our PaaS to specic IaaS APIs, we need to be more open and orchestrate our services with a deployer that is aware of all dierent providers, but can keep the same logic. The project BOSH was created with this purpose in mind. BOSH is scalable and Cloud aware, meaning that we can deploy BOSH on dierent variety of clouds. The BOSH architecture is designed to communicate via NATS and save status and tasks information on a database; in this way all the assignment can be handled in a synchronous manner, avoiding overlappings. We have to remember that BOSH has to deal with Cloud IaaS APIs and issue several task in order to get many running instances; we have to do with possible latency problems, API rate limits and quota limitations. BOSH usually can handle all these problems automatically, relying on is Architecture.

4.2 The Architecture

When we want to use BOSH we have to keep in mind this separation: we have a BOSH deployer architecture installed on top of an IaaS layer, and we have a BOSH client, called the BOSH Command Line Interface (CLI). While the former directly interacts with the cloud infrastructure APIs, the latter is in charge of offering a simple client interface to issue commands to the BOSH APIs. The main BOSH components, as shown in Figure 4.1, are the Director, the Workers, the BOSH Agents, the Health Monitor, the Stemcells, and a message bus as well. All these components can run within the same VM provided by the cloud infrastructure, or within several VMs, depending on the size of our BOSH deployment.

Figure 4.1: BOSH architecture

4.2.1 Stemcell

Before introducing all the BOSH parts, we need to understand what a stemcell is. BOSH stemcells are really like stem cells: each stemcell is a VM template with an embedded BOSH Agent built in. Basically, when we need to run services or applications on a cloud infrastructure, we need instances, that is virtual machines running an operating system suitably modified to be used on a cloud provider. When we use a stemcell, we start a new cloud instance deployed from the stemcell image. The stemcell can be a simple QCOW2 image (or another format suitable for the cloud provider), ready to be used on an IaaS cloud provider and containing some BOSH-related services and Agents ready to talk with the Director (the brain and coordinator of our BOSH system). A stemcell can be used for many deployments: it is a standard clean Ubuntu distribution, with few additions. BOSH can do more via stemcells: we can aggregate our jobs (the specific services to be deployed by BOSH, requested by our BOSH CLI) or spawn them inside instances, so as to obtain all the benefits of a cloud platform. BOSH can orchestrate and keep track of all the jobs that we requested to run, but it needs a sort of monitor or agent inside each spawned instance. Stemcells are uploaded to BOSH using the BOSH CLI and are used by the BOSH Director when creating VMs through the Cloud Provider Interface (CPI). If we need personalized stemcells, or we want to create our own stemcell with several dependencies and services built in, BOSH provides a tool that lets us do that, as seen here [32]. When the Director creates a VM through the CPI, it passes along configurations for networking and storage, as well as the location and credentials for the Message Bus and the Blobstore (a container for packages and jobs).

4.2.2 Jobs and Packages

BOSH can deploy a great variety of services and applications; it can install Ruby, Java or Go applications on a cloud system. We need to fill the gap between the developer's point of view (where only source code and services are designed) and the deployer's point of view. BOSH is not tailored to a specific kind of service or language; it can deploy almost any kind of application on top of an IaaS provider. When our services are written, we need to prepare BOSH jobs and packages so that they are ready to be deployed on a stemcell. A package [33] is a collection of source code along with a script that describes how to compile the code to binary format and install the package, with optional dependencies on other prerequisite packages. Packages are compiled, as necessary, during deployment. To turn source code into binaries, each package has a packaging script that is responsible for the compilation and is run on a compile VM (BOSH starts some instances only for the compiling process). The package contents are specified in the spec file, which has three sections:

Name: the name of the package;
Dependencies: an optional list of other packages this package depends on;
Files: the list of files this package contains; regular expressions are supported.

The package spec file thus contains a section that lists other packages the current package depends on. These dependencies are compile-time dependencies, as opposed to the job dependencies, which are runtime dependencies. When the Director plans the compilation of a package during a deployment, it first makes sure all dependencies are compiled before it proceeds to compile the current package, and that, prior to starting the compilation, all dependent packages are installed on the compilation VM. This means that when a service or component has to be deployed by BOSH, we need to create a package with all the binaries and specify two different files (packaging and spec); in this way BOSH can prepare our future jobs and dependencies. Once all the packages are compiled from source, BOSH can start dealing with the jobs we want to deploy. Jobs [34] are a realization of packages; they are the real components and services that we want to run. A job contains the configuration files and start-up scripts to run the binaries from a package. There is a many-to-many mapping between jobs and VMs: one or more jobs can run in any given VM, and many VMs can run the same job. For example, four VMs could run the same Cloud Controller job, or we could start the Cloud Controller job and the DEA job within the same VM. If we need to run two different processes (from two different packages) on the same VM, we need to create a job that starts both processes. When a job is specified, we need to prepare a spec file, a monit file and a folder named templates, where all our configuration files are stored. The templates folder contains template files used for configuration and management of the job's life cycle: a template is a Ruby ERB file that can be loaded with variables written in the spec file, and BOSH uses templates to obtain the final scripts that launch our job (a small rendering sketch follows the list below). When a template is turned into a final configuration file, instance-specific information is abstracted into a property that is later provided when the Director starts the job on a VM. This information includes, for example, which port the web server should run on, or which username and password a database should use, and so on. When a job is started, it may need some specific properties written inside configuration files; via the properties section of the spec file it is possible to inject the right values into the ERB template files when BOSH prepares the jobs. The job spec file contains:

Name: the name of the job;
Templates: the ERB files that will be turned into running scripts or configuration files;
Packages: the dependencies and packages required by the job at runtime or in order to start it;
Properties: the values and attributes that the template files will read, to be written into the final configuration or start files.
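
The rendering step can be pictured with plain Ruby ERB, as in the hedged sketch below: a small template references some properties, and values that would normally come from the spec file and the deployment manifest are bound in when the final file is generated. BOSH's real template binding offers more helpers than this; the property names here are invented for illustration.

```ruby
require "erb"
require "ostruct"

# A tiny job template, as it might appear under a job's templates/ folder.
template_text = <<~ERB
  listen_port: <%= properties.server.port %>
  db_password: <%= properties.server.db_password %>
ERB

# Property values that would normally come from the job spec / deployment manifest.
properties = OpenStruct.new(
  server: OpenStruct.new(port: 8080, db_password: "secret")
)

# Render the template into the final configuration file content.
rendered = ERB.new(template_text).result(binding)
puts rendered
```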

We need to keep in mind that when a VM is first started, it is a simple stemcell, which can become any kind of job or service; the BOSH Director turns our plain instance into a specialized VM with several duties. The set of software, configurations and templates (organized in packages and jobs) is called a BOSH Release. A daemon is in charge of keeping track of all the jobs running on each VM; for this reason, inside the job folder, we also have a monit file. That file is mandatory, as BOSH uses monit to keep track of all the running jobs.

4.2.2.1 Monit

The Director is able to keep track of all running processes and jobs thanks to monit. BOSH uses monit [35] to manage and monitor the process(es) for a job. Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations (i.e. monit can start a process if it does not run, restart a process if it does not respond, and stop a process if it uses too many resources). Monit provides an HTTP(S) interface, and it is possible to use a browser, or a plain HTTP connection, to access the monit server. Basically, with monit it is possible to:

Monitor daemon processes running locally;
Automatically stop or restart a process if certain issues come up;
Monitor network connections and perform network tests;
Run test scripts at certain times and send alerts;
Monitor machine resource usage such as CPU, memory and load average.

The monit configuration file describes how the specified job will be stopped and started, and it contains at least three sections:

with pidfile: where the process keeps its pid file;
start program: how monit should start the process;
stop program: how monit should stop the process.

Usually the monit file contains a script to invoke to start and stop the process, but it can also invoke the binary directly; that script is specified via the job template files. Each stemcell includes monit, so it can accommodate any kind of process we want, as the monitoring and management system is built in. The Director can therefore track all the jobs and query their status via monit, which is running inside each stemcell. So far so good, but we still do not know how a running stemcell knows which job it has to start, or where all the packages and jobs are stored. We need to introduce the BOSH Agent, a BOSH service running within each BOSH stemcell.

4.2.3 BOSH Agent

BOSH Agents listen for instructions from the BOSH Director, and every VM under BOSH control contains an Agent. Through the Director-Agent interaction, VMs are given jobs, or roles. If the instance's job is to run MySQL, for example, the Director will send instructions to the Agent about which packages must be installed and what the configurations for those packages are. The dialogue between the Agent and the Director takes place on the NATS messaging system. Basically, the Agent [36] can perform the following tasks:

Mount any persistent disk assigned to the VM;
Compile packages and upload the results to the blobstore;
Apply a spec and install packages with job templates;
Manage the life cycle of a job with monit;
Set up an ssh connection to the VM it is running on.

When a job needs to be started, the Director takes action and starts a dialogue with an Agent. By default, and within a normal BOSH deployment, a BOSH Agent listens and responds to API requests via NATS; alternatively, it can be run to respond to HTTPS requests. Everything starts with a new virtual machine request, issued by the BOSH Director to the cloud provider via the CPI. The virtual machine instance is started from a stemcell image and the Agent comes to life thanks to the OpenStack user data injected by the cloud provider.

Figure 4.2: BOSH Director and Agent basic interaction

Once the Agent is up and running, it starts listening on the right NATS subject, ready to receive a first ping message. The Director waits until the virtual machine is ready, then tries to contact the Agent. If the Agent answers the ping message, the Director starts preparing the apply message, as shown in Figure 4.2. This kind of message is in charge of transforming an equipotent instance into a specialized server running a certain job. The apply message contains:

the name of the deployment;
the release in use;
the job or jobs to run, each represented by a name and a template to apply;
the network configuration;
the resource pool in use;
the packages required;
the properties to apply.

As soon as the Agent receives the apply message, the transformation happens: with a get blobs message the Agent asks the blobstore for the packages and jobs required, then it downloads all the packets. When everything is stored locally on the instance, the BOSH Agent untars the compressed files and starts the compiling process, if required for the packages. When all the packages are compiled, the Agent sends everything back to the blobstore, which stores everything in the database, and starts to apply the templates to the job that needs to be run. The job manifest file received by the Agent is exactly the spec file written during the generic job specification. When all the configuration and start-up scripts have been converted from the template files into the right files, the Agent asks monit to reload its configuration. Now the Agent waits only for a start message from the Director. It is important to understand that the Director has a central role and coordinates all the Agents during their transformation; it is possible to have several Agents compiling packages during the deployment of a complex architecture like Cloud Foundry.

4.2.4 Blobstore

The BOSH Blobstore is used to store the content of releases, that is, BOSH jobs and packages in their source form as well as the compiled images of BOSH packages. Releases are uploaded by the BOSH CLI and inserted into the Blobstore by the BOSH Director. When we deploy a release, BOSH orchestrates the compilation of packages, coordinates the Agents and stores the result in the Blobstore. When BOSH deploys a job to a VM, the BOSH Agent pulls the specified job and associated packages from the Blobstore. BOSH also uses the Blobstore as an intermediate store for large payloads, such as log files and output from the BOSH Agent that exceeds the maximum size for messages over the message bus. A Blobstore installation can use a local path to store files, or Amazon S3, Atmos or OpenStack Swift as well.

4.2.5 Director

Figure 4.3: BOSH APIs

The Director is the core orchestrating component in BOSH. It controls the creation of VMs, deployments and other life cycle events of software and services. Command and control are handed over to the Director-Agent interaction after the CPI has created resources and talked with the cloud provider. Several specific sub-components manage the many tasks: these components are controllers of the classes shown in Figure 4.3, referenced from the APIs provided:

Deployment Manager: it is responsible for creating, updating and deleting the deployments that are specified in the deployment file. When the BOSH CLI issues a deploy command, this controller is in charge of starting the deployment process and inserting a new Task in the Job Runner queue;

Instance Manager: a component required to maintain and manage the instances. When a new Task is created, a new AgentClient is required to dialogue with the Agent running on a selected instance; the Instance Manager is in charge of handling the clients for the several Agents;

Problem Manager: this component helps scan a deployment for problems and helps apply resolutions. It keeps information about the problems that occurred and has a one-to-many relationship with the Deployment Model used by the Deployment Manager;

Property Manager: all the properties defined for a certain job are written statically in a deployment file; however, it is still possible to update job properties. This manager is called when attributes specific to a given job change;

Resource Manager: it provides access to the resources stored in the Blobstore. With this manager we can get a resource using an ID, delete a resource, and obtain the resource path starting from an ID;

Release Manager: a component that manages the creation and deletion of releases. When we want to deploy a specific release, we need to upload all the packages and jobs; this manager handles the request;

Stemcell Manager: all the stemcells are managed by this component; it plays the role of the stemcell life cycle manager;

Task Manager: each time we issue a command, or the Director starts a set of actions, a Task is created. Each Task is handled in a job queue and its state is stored in the database; in this way we avoid repeating the same tasks. We need to keep in mind that the cloud provider is not always fast to respond;

VM State Manager: it checks the status of the VMs used in a deployment; by issuing this command we get the current state of the instances running a certain set of jobs.

The Director acts as a central point of control and coordination, and it is the endpoint for all the requests issued by the BOSH CLI. When we start using the BOSH CLI we need to target, the first time, the IP address of our Director, in order to start a dialogue.

4.3 BOSH Manifest

We can specify jobs and packages, and our agents are ready to be installed on cloud instances, but we still do not know how to configure BOSH and define the mappings between our tasks and the running VMs. The BOSH Manifest is a YAML (YAML Ain't Markup Language) file with all the settings and properties for our jobs. We need to keep in mind that a release is a set of packages and jobs, while a deployment is the actual installation of our architecture components on top of a cloud environment; we can upload different releases to BOSH and issue several deployments. The Manifest, also known as the deployment file, consists of several sections, the most important of which are listed below (a trimmed example follows the list):

Name: the name of the deployment we want to install with a cloud provider;

Director UUID: the universally unique identifier of the Director in our BOSH installation. This value is extremely important: the Director checks it during the deployment process, and only authorized BOSH users can deploy releases and read the UUID;

Release: the name and version of the release we want to use for our deployment;

Compilation: contains the number of Worker instances, the networks to be used, and the cloud properties (instance types, availability zones, etc.) for the deployment process. Some of our packages may require a building process; this section of the Manifest specifies how many VMs we need to speed up and accomplish the task;

Networks: the part specific to the cloud provider network settings. Usually we can choose between an automatic configuration and a manual one. When the former is chosen, BOSH uses the CPI to obtain IP addresses for all the virtual machines and manages the network connections; when the latter is requested, static IPs (visible only internally in the cloud network), gateway, DNS servers and security groups need to be specified manually;

Resource Pools: contains the cloud-specific instance attributes, flavors and image types that we want to be used during our deployment. Here we set the different kinds of resources that will be referenced in the jobs section of this file;

Jobs: under this section we specify the mapping between our BOSH release jobs and the cloud instances. This part of the manifest is called jobs and may be misleading: we do not define the jobs here, we associate the jobs defined in our BOSH release (with packages) to the VMs started by the IaaS. Under this tag we can have several entries, each one with:

Name: the name of the association;

Template: the name of the release job that will be installed on the virtual machine. Usually one job is set per instance, but we can save resources and merge different jobs inside the same instance; useful if we do not want to pay for additional VMs, counterproductive if we need performance;

Instances: the number of identical instances desired;

Resource Pool: which kind of resource pool, previously defined, we want to use;

Networks: the network, previously defined, bound to our new virtual machine;

Properties: here we override and add all the properties available in our release job spec file. This section contains many entries, each of them bound one-to-one to the job properties.

When this file is ready, we can deploy our release on top of a Cloud Computing platform with a running BOSH. In this chapter we have seen BOSH, how it works and how we can use it; this distributed services maintainer greatly eases the long and complicated steps required to install a PaaS on an IaaS layer. However, due to the novelty of the project, its fast open-source growth and continuous contributions on a daily basis, BOSH might appear somewhat opaque in its usage and configuration. In the next chapter we are going to present two main ways to use this powerful instrument to install and interact with Cloud Foundry.

Chapter 5 Cloud Foundry Deployment


Now we know Cloud Foundry and its deployer, BOSH; when we want to try our PaaS or understand it better, the best path is to install it. Our chosen PaaS requires a lower cloud layer to run, an infrastructure ready to host its components on separate virtual machine instances; for this reason a deployment through BOSH is mandatory and, as we have seen, the deployer itself is meant to be used with an IaaS provider. A distributed installation might not be the first choice when we simply need a test environment to become familiar with the PaaS. All the components we have seen can run within different virtual machines or inside the same host. One advantage of the installation on one single host is the opportunity to quickly understand all the running services, make changes in a deployment that is local to our machine and try them out whenever we want to verify our changes; this sort of installation on a single node can easily be achieved with a local installation, as described in this chapter. If instead a complete, state-of-the-art installation is required, a different approach is needed: a distributed installation, a deployment that ends with several virtual machines running on an IaaS layer and hosting all the Cloud Foundry components we have requested. Both methods will be discussed here in detail and a working example will be provided for each of them, as a good starting point to play with is really important. We will give an idea of the configuration and explain all the steps involved to obtain a running environment, as the official documentation can be lacking in some ways and all the procedures are typically complicated.


5.1 The deployment

As we have seen, Cloud Foundry consists of several components and agents. When it comes to deploying or installing Cloud Foundry, we need to deal with a BOSH release: the typically complex architecture of our convoluted PaaS has been designed to work on an IaaS cloud. To deploy it, we need to check out the source code of cf-release[37] with git and issue a deploy command on BOSH. Inside the cf-release folder we find all the required jobs, packages and the release definition. All the Cloud Foundry components are specified as jobs and their definitions are placed under the jobs folder, while all the dependencies live under the packages folder. The releases folder contains the version in use and all the fingerprints of the source code on GitHub of the several components. There are various releases of Cloud Foundry, more than 140 to this day. We talked about BOSH, the main and complete version, designed to be used on an IaaS cloud; however we can also find Nise BOSH[38] and BOSH-lite[39]: both are a sort of emulator of the real BOSH, with some reduced features. While the first is a lightweight BOSH emulator capable of installing Cloud Foundry locally on a machine, the latter is a simplified BOSH ready to be installed in a local Virtual Machine via Vagrant.

5.2 Local deployment

It is possible to install a Cloud Foundry distribution within a single machine. This can help in understanding the whole architecture before trying to deploy the distributed version. In addition, it can be a good starting point to start digging into the code, modify the deployment and keep track of all the message flow between the components. In the following, a method is presented and suggested to obtain a running environment while avoiding most of the issues.


5.2.1 CF Nise Installer

The only requirements to run Cloud Foundry and install it via this script are a Ubuntu Linux operating system and enough memory to handle all the processes. It is important to remember that Cloud Foundry components are designed to work on different instances and communicate over the network; some of these components (especially the DEA) require abundant resources to run, easy to obtain with a Cloud Provider. For a local installation cf_nise_installer[40], based on Nise BOSH, is a simple tool: it lets the developer install a working Cloud Foundry release in a few steps, within a single machine; the set of scripts will:

- Install git, required to check out the Cloud Foundry release code;
- Install Nise BOSH and initialize it: this part of the process installs the tools to build a local stemcell and monit;
- Install Ruby 1.9.3 on the host;
- Download cf-release from git;
- Create a deployment file;
- Use Nise BOSH to install a local cf-release on the host machine, passing the Cloud Foundry release folder, the deployment file, the name of the deployment and the local machine IP address.

Nise BOSH will act as a BOSH Director: it will read the deployment file and start a BOSH Agent rubygem, which will receive an apply message for the deployment file and prepare the host machine. All this process completes without any use of Cloud Provider Interfaces, as everything is stored locally. This lets us install all the Cloud Foundry components on a single machine. At the end we will have this folder structure under /var/vcap:

- bosh: containing the files required by the BOSH Agent;
- jobs: containing all the jobs requested by our deployment; obviously we will find here the Cloud Foundry jobs specified in its release;
- packages: providing all the packages requested by the jobs at runtime. Here we will find the Cloud Foundry components source code, as Ruby is interpreted and not compiled;
- monit: containing all the files requested by the monitoring daemon;
- store: supplying a directory for the PostgreSQL database used by the Cloud Controller and UAA;
- shared: granting a folder and a path to the storing process handled by the Cloud Controller; we will find here our staged droplets;
- sys: storing all the components' logs.

When the whole process ends, that folder structure will have been created and we can start all the Cloud Foundry components by asking monit to do so. There are many great conveniences in using this local approach: we will have the local Cloud Foundry source code, all Cloud Foundry components working within a single host, no need for a BOSH deployment (in a cloud environment we would have to deal with a BOSH deployment first, in order to install a distributed Cloud Foundry) and a development installation ready to be edited. This local installation has been the first starting point to get in touch with Cloud Foundry and the first choice to develop changes before deploying them on a distributed installation.
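As a small, hedged illustration (the path below is the conventional BOSH location for monit and may differ in a Nise BOSH installation), the components can be inspected and started like this:

sudo /var/vcap/bosh/bin/monit summary     # list all Cloud Foundry jobs and their state
sudo /var/vcap/bosh/bin/monit start all   # start every component (NATS, Cloud Controller, DEA, ...)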

5.2.2 Local Development Environment

The local setup has been chosen to better understand the different parts of the Cloud Foundry system and to speed up the development process, moving from a small configuration to a bigger, distributed one. The suggested minimum configuration is:

- Ubuntu 10.04 64-bit operating system;
- 4 GB of RAM;
- 8 GB of free HDD space;
- Dual core processor.


A native installation of Ubuntu on a physical machine grants the best performance, but it is usually easy to mess up the Cloud Foundry configurations and end up with a non-working system. Virtual Machines, created by Oracle VirtualBox, can be really useful as a starting point and can provide good features, like snapshots, at the cost of lower performance. The local development machine has been a Virtual Machine, started with the resource requirements written above. When the local VM is created, we can start deploying Cloud Foundry via cf_nise_installer; we only need to issue this command using curl (a command-line tool for transferring data using various protocols), in detail 5.1:
export CF_RELEASE_BRANCH=release-candidate
bash < <(curl -s -k -B https://raw.github.com/yudai/cf_nise_installer/${INSTALLER_BRANCH:-master}/local/bootstrap.sh)

Listing 5.1: Local Installation

Pay attention to the variable CF_RELEASE_BRANCH: its value declares which cf-release will be downloaded during the process. Cloud Foundry is an open project and, as open source, many developers contribute to it. It is continuously updated and it changes on a daily basis. Release-candidate versions, under the git branch of the same name, are released each week and contain the latest working commits, avoiding the issues caused by the most recent updates. A strong recommendation is to stick to the release-candidate branch, avoiding many problems introduced by commits that did not pass all the tests. Once the Nise installer finishes, all the working Cloud Foundry source code can be found under the folder /var/vcap/data/packages, while the jobs controlled by monit can be found under /var/vcap/jobs.


One of the best ways to understand whether our local installation is working is to use the Cloud Foundry CLI and start a sample application. We can install the Cloud Foundry CLI easily with gem install cf (Ruby is required). Then we need to target our Cloud Foundry Cloud Controller (the API endpoint) and log in. To do so, as in 5.2, change the Virtual Machine IP accordingly, leaving xip.io at the end of the domain; the default username and password are admin and c1oudc0w.

cf target http://api.192.168.17.129.xip.io
cf login

Listing 5.2: CLI target and login

To better understand all the properties set and injected into our local jobs, we can take a look at the local deployment file used by the Nise installer, under the folder $HOME/cf_nise_installer/manifests/deploy.yml. That file contains all the configurations set up in our local deployment, as well as the passwords and credentials to access the local PostgreSQL database managed by the Cloud Controller and UAA. Once we are logged in and have received the authorization token from the login process, we can start our first sample application; but before doing that we need to create an organization and a space, both meant to manage the separation between users and developed applications. We simply need to create a new organization and space, as in 5.3, and understand the output from the Cloud Controller:

cf create-org my-sample-org
cf create-space development
cf switch-space development

Listing 5.3: CLI Spaces and Organizations

After this initial setup, we are ready to deploy a first Cloud Foundry application. We just need to go into the folder $HOME/cf_nise_installer/hello and push a simple Hello World Ruby Sinatra application to the PaaS. To obtain a running application, simply issue a cf push command from the folder containing the source code of the sample application. The Cloud Foundry Command Line client will ask for:

1. Name of the application to be pushed;
2. Number of instances desired and memory for each instance (this is misleading: Cloud Foundry calls instances the Warden containers started for an application. This value must not be confused with the meaning of instance on a Cloud Platform like OpenStack);
3. Subdomain for the application;
4. Services for the application.

If everything works (it is easy to forget about local configuration files while pushing an application), we will see the staging process of our application and then the start process. The final output should be something like 5.4:
Checking status of app HelloWorld...
  1 of 1 instances running (1 running)
Push successful!
App HelloWorld available at http://helloworld.192.168.17.129.xip.io

Listing 5.4: CLI Pushed Application

Now we only need to open a web browser and go straight to that page to see our application working.
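Equivalently, a quick check from the host shell with curl (using the hypothetical route printed in 5.4) is:

curl http://helloworld.192.168.17.129.xip.io   # should return the Hello World page served by the Sinatra app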

5.3 Distributed Deployment

When it comes to installing Cloud Foundry on an IaaS, we need to use a full BOSH version, with all the integrated CPIs. We need to install BOSH first; then we can achieve a Cloud Foundry deployment working on a Cloud Provider. Before diving completely into the installation path and providing all the details to install the distributed version, we need to introduce Micro BOSH: a lighter and more compact tool.


5.3.1 Micro BOSH

Micro BOSH is simply a complete BOSH system within a single Virtual Machine; Figure 5.1 represents the idea. This single instance contains all the main BOSH components required to deploy something meant to be deployed with BOSH. A user, starting from a BOSH CLI gem installed on a host machine, can install a Micro BOSH by providing a deployment manifest file for BOSH itself. This process ends with a single cloud instance running the BOSH Director, BOSH DB, BOSH Blobstore, BOSH Health Monitor and NATS message bus, and providing all the IaaS interfaces. To install a cloud distribution of BOSH, we first need a running Micro BOSH. Since our main goal is to have a Cloud Foundry distribution running on a Cloud Environment, we will not cover here how to deploy a full BOSH cloud distribution, as it is not needed.

Figure 5.1: MicroBosh

5.3.2 The Steps Involved

Everything starts from the BOSH CLI. The first step needed to install Micro BOSH and Cloud Foundry is the installation of the BOSH CLI rubygem. This CLI comes with a lot of plugins and requires many BOSH dependencies.
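A hedged sketch of this first step (gem names as commonly used with the classic BOSH CLI; exact versions and plugin names may vary between releases):

gem install bosh_cli                 # the BOSH command line client
gem install bosh_cli_plugin_micro    # plugin providing the "bosh micro" commands
bosh help                            # verify the installation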


When we install the CLI we end up having installed the stemcell, CPI, Agent Client and Micro components too. These additional parts of the BOSH suite let us deploy BOSH and issue commands to the future BOSH Director. Once the BOSH CLI is installed, we can start installing Micro BOSH. We only need a BOSH stemcell and a BOSH deployment file, in order to help our CLI during the deployment process[41]. Once the Micro BOSH deployment manifest is specified and the stemcell downloaded, the BOSH CLI will:

1. Verify the presence of a valid BOSH stemcell locally;
2. Unpack the archive containing the stemcell and upload the image to the cloud image store, using the right CPI;
3. Create a new VM with the image uploaded to the IaaS Cloud and wait for the BOSH Agent;
4. Create a volume, used by the future Micro BOSH as a database and blobstore;
5. Issue a task for the Agent, specifying all the BOSH components required;
6. Start the new jobs within the VM previously created;
7. Target the brand new BOSH Director.

This process gives us a little BOSH installation on our Cloud, ready to receive our Cloud Foundry installation. Now that we have a BOSH deployer working, we can check out the Cloud Foundry release code from git. Once that is done, we only need to start the built-in update script, which will update all the submodules and the Cloud Foundry components to the latest version, according to the latest release YAML file provided under the releases folder. After this step, we will use the BOSH CLI to create a local Cloud Foundry release: a BOSH release that contains all the definitions of packages, jobs and dependencies, ready to be stored locally on our machine and then uploaded to BOSH. This step is mandatory if we bring changes to the Cloud Foundry source code and we want to test them in a Cloud Distribution, where our components might be replicated and strongly isolated between several instances. Once the process finishes we end up with all the files stored locally and a dev release file that tells the BOSH CLI itself where to find the jobs and packages required by our release. Now we are ready to talk with our previously created Micro BOSH; to do so we will use the BOSH CLI and ask it to upload our release to the BOSH system. After this process our deployer will know all the jobs and packages available within our release, ready to be deployed and read from our deployment manifest file. The BOSH CLI will:

1. Find all the requested package and job files locally;
2. Build a tarball with our release jobs and packages;
3. Upload the compressed archive to the Micro BOSH instance, where the BOSH Director is running;
4. On the other side, the BOSH Director side, the coordinator will extract the release, verify the manifest and save the package and job definitions.

At this point we have our release, containing all the definitions, uploaded to BOSH; but not all the packages are built. We need to remember that when we describe a package, we only specify a collection of source code that has not been compiled yet; we do not have binaries ready to run. The next step is the real deployment process. Starting from a BOSH manifest file, the Director receives the settings and desired placement of our jobs. Once we upload the deployment file to the Micro BOSH instance, we only have to issue the deploy command on our BOSH CLI in order to ask our Director to start the whole process. The BOSH coordinator will:

1. Read and check the deployment manifest file;
2. Check that the requested resources are defined in our Cloud Provider;
3. Analyze which packages are requested by our deployment and which of them need a compiling process; if we are lucky, we could have some packages already compiled from a previous deployment;

4. Start new instances, responsible for handling the compiling process;
5. Assign the compiling jobs to the BOSH Agents within those instances;
6. Once all the requested packages are stored back into the database, create the missing VMs (requested by the manifest file);
7. Bind the jobs to the started VMs and issue an Apply message to the BOSH Agent listening on each spawned instance;
8. Receive the status of all the requested jobs, contacting the monit daemons on the instances.

This is usually what happens during a correct deployment; however it is pretty common to get some issues from the Cloud Provider manager. It is strongly recommended to turn off API rate limits and quota limitations before starting the deployment process, especially in development environments, so that BOSH will not face problems during the calls to the CPIs.

5.3.3 Distributed Development Environment

Installing Cloud Foundry on a cloud provider such as OpenStack takes time and usually triggers many issues: many logic layers, configurations and possible mistakes are involved. First of all we need an IaaS platform, we need to configure it, and then we can deploy Cloud Foundry on top of it. Among the different Cloud Providers, our choice fell on OpenStack. It has been installed on a single node, a server with these resources:

- Two Intel Xeon X5570, for a total of 16 processors;
- 48 GB of RAM;
- 500 GB hard drive.

With these capabilities OpenStack can run flawlessly on a single node, without requiring a multi-node installation. We need to remember that we will ask our IaaS to provide several instances, generating a high load on our infrastructure. The host runs the Ubuntu 12.04 64-bit operating system, with rbenv to manage the Ruby installation.


5.3.4 OpenStack

Devstack, a set of shell scripts to build complete OpenStack development environments, helped us during the installation process of the IaaS; it sped up the installation of OpenStack and left us free to quickly deploy our PaaS. To use Devstack and deploy OpenStack we simply need to clone the Devstack repository and check out the Grizzly version; in order to do so 5.5:
git clone https://github.com/openstack-dev/devstack.git
cd devstack
git checkout stable/grizzly

Listing 5.5: Devstack download and Grizzly checkout

Cloud Foundry is compatible with OpenStack, however only with the Grizzly and Folsom versions, respectively 2013.1 and 2012.2. OpenStack Havana, the latest version 2013.2, could not be used due to incompatibility and instability. Then we need to write a localrc file, containing all the configurations we wish to set in OpenStack; this is the working configuration 5.6:

FIXED_RANGE=10.1.0.0/24
NETWORK_GATEWAY=10.1.0.1
FLOATING_RANGE=192.168.1.224/27
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
HOST_IP=9.47.226.200
API_RATE_LIMIT=False
VOLUME_BACKING_FILE_SIZE=400000M
DATABASE_PASSWORD=passw0rd
ADMIN_PASSWORD=passw0rd
MYSQL_PASSWORD=passw0rd
SERVICE_PASSWORD=passw0rd
SERVICE_TOKEN=passw0rd
RABBIT_PASSWORD=passw0rd
# Compute Service
NOVA_BRANCH=stable/grizzly
# Volume Service
CINDER_BRANCH=stable/grizzly
# Image Service
GLANCE_BRANCH=stable/grizzly
# Web UI (Dashboard)
HORIZON_BRANCH=stable/grizzly
# Auth Services
KEYSTONE_BRANCH=stable/grizzly
# Quantum (Network) service
QUANTUM_BRANCH=stable/grizzly
SWIFT_BRANCH=stable/grizzly
disable_service n-net
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-l3
enable_service q-meta
enable_service quantum

Listing 5.6: Localrc configuration

This file defines:
- the IP range for all the instances;
- the floating IP range for all the instances;
- the OpenStack API rate limit;
- the persistent backing file for all OpenStack Volumes;
- the password for all the services;
- the specific branch version for all the OpenStack components;
- the Quantum network services, instead of the Nova Network service.


To obtain a working Cloud IaaS, it is really important to realize how these settings are strongly related to our BOSH deployment. Inside the BOSH manifest we will set instance IPs manually, use OpenStack volumes to provide persistence to some of our instances (Micro BOSH) and define OpenStack security groups. BOSH sends a high rate of REST API calls to OpenStack, so it is really important to disable the API_RATE_LIMIT during the installation, in order to avoid errors from the BOSH CPI interfaces. Another place where we must disable the API rate limit is inside api-paste.ini; instructions are provided here [42]. After we have started the ./stack.sh script, we end up with a running OpenStack system, accessible through its Horizon dashboard and ready to create new instances. One of the fastest ways to know whether our OpenStack is working is to start an instance from an uploaded image. We chose to add a Ubuntu 12.04 64-bit cloud image to the system and issued a start command. Everything should work fine; sometimes, with Quantum Network enabled, instances might not access the internet. To solve this issue we used this iptables command: sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE (if the network interface connected to the outside world is eth0). Nova Network, a simpler OpenStack network provider, initially caused many issues: during the BOSH deployment the switches on the network were flooded by a very high number of DHCP requests. That issue was addressed and the solution was found in Quantum Network, which let us manually specify every setting and parameter of the virtual networks in OpenStack. The OpenStack Horizon dashboard is a really good configuration instrument to start from when setting up the network properties. In addition we need to set the correct OpenStack flavors, increase the project quota limits and add the right security groups, as our BOSH manifest will request specific ones for each instance we are going to spawn. It is really important to start matching OpenStack settings with BOSH requirements right away, as many issues and problems, usually sneaky, can be avoided.
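As a hedged example (the group names mirror those referenced later in the BOSH manifest, the ports are illustrative), security groups can also be prepared with the nova CLI of that era instead of Horizon:

nova secgroup-create cf-public  "Cloud Foundry public endpoints"
nova secgroup-create cf-private "Cloud Foundry internal traffic"
# open HTTP/HTTPS towards the Router from anywhere
nova secgroup-add-rule cf-public tcp 80 80 0.0.0.0/0
nova secgroup-add-rule cf-public tcp 443 443 0.0.0.0/0
# let the instances of the deployment talk to each other on any TCP port
nova secgroup-add-rule cf-private tcp 1 65535 10.1.0.0/24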


5.3.5 Deploying Micro BOSH

OpenStack is ready to launch all the instances we want, and now it is BOSH's turn. Before deploying Cloud Foundry, we must have a BOSH system running; this is fundamental, as we cannot install our PaaS without BOSH. To begin with the long process, the BOSH CLI (the client side of BOSH) must be installed on our host machine or on another machine that will be used to deploy Cloud Foundry. All the steps covered here can be found in the Cloud Foundry docs[43], updated to the latest release; we will cover only the most important ones. Once our IaaS is ready to launch new instances, it is time to deploy a little BOSH, Micro BOSH, to start the Cloud Foundry deployment. Micro BOSH can be deployed using a BOSH manifest file and a stemcell. But a question could arise: we have a BOSH Manifest file to deploy Micro BOSH, but there is no BOSH Director available yet, so who is going to deploy our system? The BOSH CLI itself. This BOSH client is really powerful: it includes the basic CPI APIs and the capabilities to manage a simple stemcell. The BOSH CLI itself is capable of deploying a Micro BOSH release, by using a pre-built Ruby gem plugin. Once our deployment file is ready, we only need to download the right stemcell for our IaaS, by issuing 5.7:
bosh public stemcells

Listing 5.7: Public BOSH stemcells

The output of this command lists all the stemcells, archives containing OS images, ready for several Cloud Providers. There is only one for OpenStack: bosh-stemcell-XXXX-openstack-kvm-ubuntu.tgz, with a release number coming with it. Usually, the greater that number is, the more recent the Cloud Foundry release we should use with it. This is the basic stemcell that will be used soon to deploy all the instances requested by our BOSH Manifest deployment file. The BOSH CLI is also capable of deploying a standard BOSH deployment; the keyword micro lets us distinguish between a regular BOSH deployment and


a specialized Micro BOSH deployment. Once our stemcell and deployment files are ready, we only need to run these two commands 5.8:
bosh micro deployment ./path/to/deployment-file.yml
bosh micro deploy ./path/to/bosh-stemcell-XXXX-kvm-ubuntu.tgz

Listing 5.8: Micro BOSH deploy
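The stemcell archive passed to bosh micro deploy must be available locally first; a hedged sketch of fetching it from the public index shown in 5.7 (the file name is abbreviated exactly as in the listing):

# download the OpenStack stemcell advertised by "bosh public stemcells"
bosh download public stemcell bosh-stemcell-XXXX-openstack-kvm-ubuntu.tgz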

5.3.6 Deploying a distributed Cloud Foundry

Finally, once our little BOSH is running, we can deploy the Cloud Foundry release. Two methods may be adopted: we can use a pre-built cf-release YAML file or we can build our release file from scratch. While the first is really useful when we want to deploy a certain release of Cloud Foundry without worrying about the struggle with the source code, the latter approach is fundamental to test our code changes in a real deployment on an IaaS platform. To start with the process, we need the Cloud Foundry release files, obtained via git 5.9:
git clone https://github.com/cloudfoundry/cf-release
cd cf-release
git checkout release-candidate

Listing 5.9: Cloud Foundry CF release

One of the most important parts of this process is the checkout command we issue on the local git repository. Cloud Foundry is growing and evolving on a daily basis, and it is really easy to come across an unstable release. Usually the Cloud Foundry Pivotal engineers publish an almost stable release each week on the release-candidate branch; it is a good suggestion to stick to release-candidate releases. To find the latest release BOSH manifest file, we just need to take a look under the releases folder and use the most recent one; to make BOSH point to that file we issue bosh upload release releases/cfXXX.yml. But if we want to change the source code or we want to test our changes on a distributed version of Cloud Foundry, we need to create a new release: a development one.


After we have downloaded the CF release code and checked out the release-candidate branch, it is important to update the local source files and obtain the code of the components. To do that, two commands are required 5.10:
bundle update
./update

Listing 5.10: Download and update of the source code

The second command updates the git submodules and downloads all the source code of the components, specified under the src folder. After this process we are able to integrate our changes into the source code and test them in a BOSH deployment; but we do not yet have any Cloud Foundry release YAML file pointing to our changes. We must tell BOSH to create a new, updated release, to integrate all our changes into a local development BOSH release file; the command to do that is bosh create release (it must be launched inside the Cloud Foundry release folder, as the BOSH CLI will check the jobs and packages folders). If we have a BOSH system running, a Cloud Foundry release uploaded to the BOSH Coordinator instance and a stemcell saved on BOSH, we are finally ready to start the real deployment and get a running distributed installation like the one displayed in Figure 5.2. One of the most important steps is the generation of a BOSH deployment manifest file. The lack of documentation and examples typically leads to misunderstandings and wrong settings. This is due to two main problems: the Cloud Foundry release code changes very quickly (every two weeks there is a new release) and the official documentation is constantly out of date. A good starting point is to keep track of all the recent commits to the cf-release repository on GitHub, but sometimes that is not enough. Another good suggestion is to take a look at the single job spec files and templates, just to get an idea of which parameters are really mandatory. Sometimes it is really hard to create a working deployment file; final help can be found on the Google Group Cloud Foundry Development forum [44]. Now we will see the most important parts of our deployment manifest. First of all, in order to write a proper BOSH manifest file, it is important


to have access to and know the OpenStack network settings, the BOSH Director UUID and the BOSH releases.

Figure 5.2: Final BOSH deployment

In the first part of the manifest file it is possible to specify the name of our deployment and the BOSH UUID, as we see here 5.11:
name: MyCloudFoundryDeployment149
director_uuid: e6276249-d138-482d-8e54-147998ba0972

Listing 5.11: BOSH deployment manifest: Name and UUID

To obtain the director_uuid value, we need to log into BOSH via the BOSH CLI with an administrator account and run a bosh status command to get the right information from the system. As we discussed before, it is important to declare which release our deployment file points to; immediately after the first declarations, the releases field appears 5.12:

releases:
- name: MyBuiltV149-release
  version: 149.1-dev

Listing 5.12: BOSH deployment manifest: Releases


The release requested under this section of the deployment file must be present and available on BOSH. The one defined here is a built one, from the source and subsequently uploaded to our deployer; it is also possible to use a pre-built release specified under the cf-release/releases folder. Next appears the section about the compilation and update workers, specific cloud instances started by the BOSH Director to compile the packages included in the requested release. This is a sample of what can be written in the deployment file. A low number of workers and a low max_in_flight parameter are really suggested on slow systems or on an OpenStack single-node installation 5.13:
compilation:
  workers: 1
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    instance_type: m1.microbosh
update:
  canaries: 1
  canary_watch_time: 30000-300000
  update_watch_time: 30000-300000
  max_in_flight: 1
  max_errors: 1

Listing 5.13: BOSH deployment manifest: Compilation

Here it is specified how many parallel workers we want to compile the BOSH release packages, while the canary and update watch times are the time the BOSH Director, once it has established a connection with the BOSH Agent, must wait until a job is deployed. Already here, at this point of the manifest file, it is possible to recognize some Cloud Provider attributes, such as the option to reuse compilation VMs (to later deploy specific jobs on them), the cloud instance type associated with the worker virtual machine and a network setting. Those parts are covered in the next section of the


deployment file. Next comes the cloud-provider-specific network configuration 5.14:
networks:
- name: default
  subnets:
  - range: 10.1.0.0/24
    gateway: 10.1.0.1
    static:
    - 10.1.0.10 - 10.1.0.50
    cloud_properties:
      net_id: 3b5ae455-e5c7-4606-a410-3ebc4cc02eed
      security_groups:
      - default
      - cf-public
      - cf-private

Listing 5.14: BOSH deployment manifest: Network settings

Only a part of the configuration is listed here. The most important entries are the range of static IPs, the net_id and the security groups. The first lets us choose the range of static IPs bound to the instances started and associated with specific jobs, while the network identification is a mandatory value requested by BOSH in order to link our deployment network settings to a virtual network currently active on OpenStack and managed by Quantum. Security groups can be specified here and are really important: without them we can deploy Cloud Foundry, but we cannot talk to its services from an external network. It is here that we can set up the right groups and open the ports we need on OpenStack accordingly. Immediately after the network-specific settings, it is possible to define the resource pools, or rather the real number of virtual machines required by our deployment file and started by the BOSH Director 5.15:

resource_pools:
- name: small
  network: default
  size: 7
  stemcell:
    name: bosh-openstack-kvm-ubuntu
    version: latest
  cloud_properties:
    instance_type: m1.small

Listing 5.15: BOSH deployment manifest: Resource Pools

These values represent the types of instances we want, in terms of the image to use (a stemcell image is recommended), the flavor desired, the associated network and the number of instances. It is important to recall that BOSH lets us run multiple jobs within the same cloud instance. A sufficient amount of resources is really important and should be chosen according to the number of jobs per VM.
Finally, we cover the jobs and properties configuration. Below is a brief sample of one of the jobs configured to be deployed by BOSH; the NATS job configuration can be written as in listing 5.16.

jobs:
- name: nats
  template:
  - nats
  instances: 1
  resource_pool: small
  networks:
  - name: default
    default: [dns, gateway]
    static_ips:
    - 10.1.0.10

Listing 5.16: BOSH deployment manifest: Job configuration

properties:
  nats:
    machines: [10.1.0.10]
    address: 10.1.0.10
    port: 4222
    user: nats
    password: p4zzw0rd
    authorization_timeout: 20

Listing 5.17: BOSH deployment manifest: Job Properties

As we can see, we can define how many instances running that job we want, the kind of resource pool we require and, most important, the template to be associated. The values for the template are defined under the properties section of the same file, as in listing 5.17. The NATS properties attached here are a sample and represent the attributes that we can set, as defined in the spec file of the job definition. Obviously, we can only set values mentioned in the job definition; if the values do not match, the BOSH Director will refuse to deploy our manifest file. The output of our deployment task will be something similar to 5.18:
Director task 5

Preparing deployment
  binding deployment (00:00:00)
  binding releases (00:00:00)
  binding existing deployment (00:00:00)
  binding resource pools (00:00:00)
  binding stemcells (00:00:00)
  binding templates (00:00:00)
  binding properties (00:00:00)
  binding unallocated VMs (00:00:00)
  binding instance networks (00:00:00)
Done                9/9 00:00:00

Preparing package compilation

Compiling packages
  libyaml/2 (00:01:34)
  [...]
  cloud_controller_ng/27.1-dev (00:04:51)
Done                11/11 00:09:29

Creating bound missing VMs
Done                8/8 00:04:43

Binding instance VMs
  syslog_aggregator/0 (00:00:01)
  postgres/0 (00:00:01)
  nats/0 (00:00:01)
  uaa/0 (00:00:01)
  cloud_controller/0 (00:00:01)
  router/0 (00:00:01)
  health_manager/0 (00:00:01)
  dea/0 (00:00:02)
Done                8/8 00:00:04

Updating job nats
  nats/0 (canary) (00:00:40)
Done                1/1 00:00:40

[...]

Task 5 done
Started   2013-12-05 03:17:43 UTC
Finished  2013-12-05 03:40:20 UTC
Duration  00:22:37
Deployed DeploymentManifest.yml to MicroBosh

Listing 5.18: BOSH Deployment output
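For reference, the CLI sequence that produces a task like the one above is roughly the following hedged sketch (the file names are placeholders for the stemcell, the dev release built with bosh create release and the manifest discussed in this section):

bosh upload stemcell ./bosh-stemcell-XXXX-openstack-kvm-ubuntu.tgz
bosh upload release  ./dev_releases/MyBuiltV149-release-149.1-dev.yml
bosh deployment ./DeploymentManifest.yml     # select the manifest
bosh deploy                                  # start the deployment task on the Director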


Both approaches have been adopted during the study of Cloud Foundry: both the distributed and the local installations have been configured and used. Now the PaaS is running and ready to accept our applications; however, not all that glitters is gold: we are going to discuss some weaknesses of Cloud Foundry and some suggested improvements.

Chapter 6 Application Isolation in Cloud Foundry


Cloud Foundry is growing quickly and, right now, we are taking part in one of the youngest projects, with a great momentum. The architecture is solid and the components are well defined; however the ensemble lacks some features due to its youth. The open PaaS guarantees ease of development, sometimes at the cost of poor flexibility. Some capabilities can still be extended and it is possible to integrate new services and deliver additional properties; but policies, extensibility and dynamism are still missing from the architecture. One of the most important characteristics in an execution environment such as Cloud Foundry is application isolation; as we have seen in the previous chapters, all the applications, whether or not they belong to the same organization or user, run inside the same droplet execution node, separated by a container mechanism. This feature is interesting and useful when we want to save and optimize resource usage; however, when we want to guarantee a strong separation between applications of different tenants, or constant performance paired with a billing system, Cloud Foundry neither provides a real solution nor supports them.

6.1 Isolation

Isolation can be referred to as a set of different hardware and software technologies designed to protect and guarantee the resources of each process or application at run-time, against other processes or applications running on the same operating system or environment. Whether serving external customers or internal business units, cloud platforms (such as a PaaS) typically allow multiple users, or tenants, to share the same physical server and network infrastructure, as well as use common platform services. Because they rely on shared infrastructures, however, these services face two key, related issues:

- Multi-tenant interference and unfairness: tenants simultaneously accessing shared services contend for resources and degrade performance [45];
- Variable and unpredictable performance: tenants often experience significant performance variations, e.g., in response time or throughput, even when they can achieve their desired mean rate [46].

Benign, or perhaps malicious, interference between tenants can cause significant performance degradation that hurts the performance of applications as well; in addition, system performance becomes unpredictable if a co-located tenant tries to grab resources (CPU, disk, I/O) disproportionately: this is why we need a separation between the different tenants and applications. The word isolation may cover different aspects, because several levels and kinds of isolation are generally provided when we want to achieve separation during the execution of software or during resource management. Isolation is a concern at implementation, design and execution time, and is usually obtained through the concept of process. Although operating system processes have well-defined isolation boundaries and inter-process communication mechanisms, current operating systems sometimes provide insufficient mechanisms[47] for isolating components of a particular application from each other. A badly-written or misbehaving component can easily damage the containing host, and other components, either accidentally or deliberately. Many approaches are available, but typically it is possible to distinguish between three levels of isolation [48], according to how it is achieved; the techniques are:

- Virtualization: it relies on using a virtual machine monitor and virtual machines;

- Process groups: it requires control of groups of processes;
- Containers: through the division of the system into independent blocks it is possible to create jails, such as containers.

Now we are going to introduce these main concepts and discuss how they can achieve different levels of isolation, their differences and benefits.

6.2 Virtualization

Between the isolation obtained from virtual machines and the isolation achieved with the other two techniques there is a strong difference: the first can rely on hardware support (Intel VT-x or AMD-V) and is mainly managed by a hypervisor (also known as Virtual Machine Monitor), while the two latter rely on virtualization at the operating system level, based on kernel features [49]. Hypervisor virtualization can be summarized as running a full operating system on top of a host operating system; this is very effective for server consolidation, when we have in mind consolidating existing workloads into a virtualized environment. Performance, however, is going to take a slight hit when running on a hypervisor: with hypervisor virtualization we introduce an extra layer of abstraction between the operating system and the hardware.

Figure 6.1: Xen and KVM virtualization


Moreover, a complete OS stack is needed for each guest when using hypervisor virtualization, from the kernel to libraries, applications, and so on. An additional drawback is the extra storage overhead and memory use that come from running entirely separate OSes. A fully virtualized system gets its own set of resources allocated to it and does minimal sharing: we get more isolation, but it is much heavier as it requires more resources. That being said, performance is less of a factor today than a few years ago: the various hypervisor solutions have been pretty well optimized for heavy loads, and continue to improve at a good clip. Some technologies and implementations are Xen [50], KVM [51] and QEMU [52], as shown in Figure 6.1.

6.3 Process Groups - Control Groups

Through the usage of groups and Linux-kernel-specific features it is possible to limit, account and isolate resource usage such as CPU, memory and disk I/O; one of the most common tools is Control Groups (cgroups). Cgroups sit at the bottom of Figure 6.4 and represent the Linux kernel extensions used to obtain the isolation of resources by providing a mechanism for aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour [54]. There are multiple efforts to provide process aggregations in the Linux kernel, mainly for resource-tracking purposes. Such efforts include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server namespaces. These all require the basic notion of a grouping or partitioning of processes, with newly forked processes ending up in the same group (cgroup) as their parent process. The kernel cgroup patch provides the minimum essential kernel mechanisms required to efficiently implement such groups. It provides hooks for specific subsystems such as cpusets to provide additional behaviour as desired.

6.3.1 Hierarchy and Subsystems

Cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents. However, there are differences from the Linux process model. In the Linux process model all processes are child processes of a common parent: the init process, which is executed by the kernel at boot time and starts other processes (which may in turn start child processes of their own); with cgroups, instead, it is possible to have different hierarchies. If the Linux process model is a single tree of processes, then the cgroup model is one or more separate, unconnected trees of tasks (i.e. processes). Multiple separate hierarchies of cgroups are necessary because each hierarchy is attached to one or more subsystems. A subsystem represents a single resource, such as CPU time or memory. Several subsystems exist, depending on the version of the kernel and the Linux distribution, but usually the most common ones are cpuset, memory and net_cls [55]. A single hierarchy can have one or more subsystems attached to it. As a consequence, the cpu and memory subsystems (or any number of subsystems) can be attached to a single hierarchy, as long as each one is not attached to any other hierarchy which already has other subsystems attached to it, as displayed in Figure 6.2.

Figure 6.2: A single hierarchy can have one or more subsystems attached

Any single subsystem (such as cpu) cannot be attached to more than one hierarchy if one of those hierarchies has a different subsystem attached to it already. As a consequence, the cpu subsystem can never be attached to two different hierarchies if one of those hierarchies already has the memory subsystem attached to it. However, a single subsystem can be attached to two hierarchies if both of those hierarchies have only


that subsystem attached, as shown in Figure 6.3. Each time a new hierarchy is created on the system, all tasks on the system are initially members of the default cgroup of that hierarchy, which is known as the root cgroup. For any single hierarchy we create, each task on the system can be a member of exactly one cgroup in that hierarchy. A single task may be in multiple cgroups, as long as each of those cgroups is in a different hierarchy. As soon as a task becomes a member of a second cgroup in the same hierarchy, it is removed from the first cgroup of that hierarchy: at no time is a task ever in two different cgroups in the same hierarchy.

Figure 6.3: Attaching multiple subsystems

As a consequence, for example: if the cpu and memory subsystems are attached to a hierarchy named cpu_mem_cg, and the net_cls subsystem is attached to a hierarchy named net, then a running process could be a member of any one cgroup in cpu_mem_cg, and any one cgroup in net. The cgroup in cpu_mem_cg that the process is a member of might restrict its CPU time to half of that allotted to other processes, and limit its memory usage. Additionally, the cgroup in net that it is a member of might limit its transmission rate. Moreover, any process (task) on the system which forks creates a child task; the child task automatically inherits the cgroup membership of its parent but can be moved to different cgroups as needed, since once forked, the parent and child processes are completely independent.
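A hedged sketch of how such hierarchies could be created on a cgroup-v1 system (mount points and group names are illustrative):

# one hierarchy with the cpu and memory subsystems attached...
mkdir -p /sys/fs/cgroup/cpu_mem_cg
mount -t cgroup -o cpu,memory cpu_mem_cg /sys/fs/cgroup/cpu_mem_cg
# ...and a separate hierarchy for the net_cls subsystem
mkdir -p /sys/fs/cgroup/net
mount -t cgroup -o net_cls net /sys/fs/cgroup/net
# a child cgroup is just a directory inside a hierarchy
mkdir /sys/fs/cgroup/cpu_mem_cg/half_cpu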


6.3.2 An example of usage

In the beginning all prioritization-type switches are balanced, that is, resources are distributed equally among cgroups on the same level. In most cases this alone is enough to stop excessive usage of shared resources like CPU and disk I/O. There are times, however, when equal sharing is not enough: we want to increase or decrease the resource usage of certain groups for certain subsystems. First of all, to obtain that we need to attach our process to the desired group; we can do it easily by issuing 6.1, where /sys/fs/cgroup is the cgroup mount point:
echo $PID > /sys/fs/cgroup/my_group/tasks

Listing 6.1: Attaching process

Now that we have a process in a group we can set some limits. For example, when it comes to limiting memory, it is possible to limit how much memory a certain process can use, and that is what memory.limit_in_bytes and memory.memsw.limit_in_bytes are for. memory.limit_in_bytes limits the total memory usage of a group including file cache, whereas memory.memsw.limit_in_bytes limits the amount of memory and swap a group can use. Note that we do not need to specify the amount in bytes: it is possible to use the shorthand multipliers k or K for kilobytes, m or M for megabytes, and g or G for gigabytes. To adjust the limit, we can simply echo the desired value into the group file 6.2:

echo "1024M" > $CGROUP_PATH/mygroup/memory.limit_in_bytes Listing 6.2: Limiting a group Cgroups can also keep track of the memory pages used by each group, because each page is charged to a group when we associate the task to the group. With this functionality in mind and by using control groups, we can share certain pages between tasks or individual group limits, but also kill out of memory requests.


6.4 Containers: a lightweight approach

The other solution is container-based isolation, also called operating system virtualization, and sometimes not really seen as virtualization at all. Instead of trying to run an entire guest OS, container virtualization isolates the guests but does not try to virtualize the hardware: we have a container for each virtual environment. With container-based technologies, a patched kernel and user tools to run the virtual environments are needed. The kernel provides process isolation and performs resource management: this means that even though all the virtual machines are running under the same kernel, they effectively have their own filesystem, processes, memory, devices, etc. In addition, the container approach is limited to a single operating system. In Cloud Foundry both virtualization and container management are used. While the IaaS provides isolation via virtualization, separating the different components of the PaaS, the control of groups (containers handled by Warden) lets the DEA node manage multiple applications within the same virtual machine and co-locate different pieces of software, providing separation of the execution context. Not just because of potential breaches, but also because of resource optimization problems and an easier accounting process, isolation turns into one of the most important topics around the co-location of applications and services within a cloud system. Another aspect to keep in mind is the separation of roles: when we refer to container isolation we need to distinguish between different layers and roles. Isolation based on this kind of technology is also called lightweight virtualization, due to the lesser confinement between the executing applications. From a Linux process point of view, a container should isolate or limit environments and resources such as the kernel, file system, network system, PID, memory, CPU and others. We can differentiate the different layers and roles required to grant process isolation without hypervisor virtualization; starting from the lowest level, closer to the kernel, there are [53]:


- the technology, most of the time a kernel extension, used to provide a sort of isolation or to limit resources;
- the low-level container management;
- the high-level container management.

Figure 6.4: Lightweight virtualization layers

These layers can be seen in Figure 6.4, and different components are involved to provide the complete management and administration of the lightweight isolation. With hypervisor isolation we have a stronger isolation but slower performance, due to the overhead and the cost of resources, while with a lightweight approach we can really co-locate many applications on a single node, at the cost of a lighter isolation. Cloud Foundry sacrifices strong isolation and complete separation of the applications' execution context to speed up start times and let developers push their projects and applications faster; this results in a fast environment where we can quickly test our projects built with the PaaS in mind, but it easily opens some issues when we want a more solid final environment and an easy and accurate management of resources.


6.4.1 LXC

LinuX Containers (LXC) is an operating-system-level virtualization for running containers on a single host, a userspace interface for the Linux kernel containment features. Taking a look at Figure 6.4, we can place it almost at the bottom level, just above the kernel features. LXC provides these features, obviously not via a virtual machine, but rather via the cgroups functionality and other Linux kernel facilities; through a powerful API and simple tools, it lets Linux users easily create and manage system or application containers [56]. The goal of LXC is to create an environment as close as possible to a standard Linux installation but without the need for a separate kernel, as shown in Figure 6.5.

Figure 6.5: Container vs Virtualization

This allows us to create a large number of isolated sandboxes for running applications on a single host operating system. For Linux Containers to function correctly, several components are needed. The Linux kernel provides namespaces to ensure process isolation and cgroups to control the system management. SELinux is used to control the separation between the host and the container and also between the individual containers. The libvirt toolbox provides an interface for the construction and management of containers. The kernel assigns system resources by creating separate namespaces for containers. Namespaces allow creating an abstraction of a particular global system resource and make it appear as a separate instance to the processes within a namespace. Consequently, several containers can use the same resource simultaneously without creating a conflict. The kernel uses cgroups to group processes for the purpose of system


resource management. Cgroups let you allocate CPU time, system memory, network bandwidth, or combinations of these among user-defined groups of tasks. An end user can benefit from several features when using containers:

- Security: by running key services within separate containers, you can ensure that a security flaw in one of them remains better isolated from affecting the other services;
- Portability: LXC containers can be zipped up and moved to any other host with the same processor architecture;
- Limits: thanks to its use of Linux cgroups, LXC containers can be configured with limitations on resources. When running a large number of containers, this can ensure that the most important ones get first priority.

In addition, LXC is more powerful than a plain cgroup: LXC integrates cgroups and provides command-line tools to create, manage and model containers in an easy way. When it comes to using it, we can basically create a container via 6.3:

lxc-create -n name-of-test-container -t ubuntu

Listing 6.3: Creating a container

By issuing that command we end up with a container ready to be started and used. Under the hood, LXC downloads and caches all the required components, uses the requested ubuntu template and gives us the default username and password associated with the container. When a container is created, LXC downloads all the Linux core packages, extracts them, configures them, updates the Linux distribution and installs additional packages (such as openssh); at the end, the template is ready. Once a template is created, it is stored and ready to be reused for other containers. When it is time to run the container, we only need to start it via lxc-start to obtain a working separate process running an isolated ubuntu, or whichever distribution we previously downloaded. Providing the name of the container to lxc-info we can obtain the running state of our container and its PID;


while with lxc-console we can access the container running in the background. LXC automatically configures the control groups, namespaces, network properties and filesystem isolation. While this automatism is really helpful for an end user, from a developer point of view it can be opaque and weakly configurable. This is the reason why Cloud Foundry opted for that implementation only in its early days and then moved to a more sophisticated and advanced solution such as Warden.
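To make the resource-limiting side more concrete, the following is a minimal sketch, assuming a cgroups v1 memory controller mounted under /sys/fs/cgroup/memory and root privileges, of what tools like LXC and Warden do internally when they cap a container's memory: a named group is created, the limit is written to its control file and the target processes are added to it. The group name and limit are purely illustrative.

require 'fileutils'

# Hypothetical cgroup for demonstration; container managers create one per container.
cgroup = "/sys/fs/cgroup/memory/demo-container"
FileUtils.mkdir_p(cgroup)

# Cap the group at 256 MB; processes exceeding the limit are killed by the kernel.
File.write(File.join(cgroup, "memory.limit_in_bytes"), (256 * 1024 * 1024).to_s)

# Move the current process into the group so the limit applies to it.
File.write(File.join(cgroup, "tasks"), Process.pid.to_s)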

6.4.2 Warden

Warden is the agent in charge of providing confinement between the different applications running within the same DEA instance, and it is the current container orchestration service for Cloud Foundry. It includes some of the features that LXC offers and, at the same time, uses direct calls to make the most of the Linux kernel features required to maintain the isolation between processes. Keeping Figure 6.4 in mind, it is clear that Warden is more than a simple cgroup wrapper or container manager; it is designed so that multiple backends can be implemented for different containerization technologies. Currently the only public backend implemented is based on Linux cgroups, while a Warden.NET backend for Iron Foundry is also available on Windows. The project's primary goal is to provide a simple API for managing isolated environments: these isolated environments, or containers, can be limited in terms of CPU usage, memory usage, disk usage, and network access (as they rely on cgroups). When a DEA node is running, it comes along with a Warden service: while the DEA is in charge of handling the requests from the Cloud Controller, Warden is responsible for the real isolation and for the container management.

6.4.2.1 Warden's Architecture

Warden itself is composed of three parts [57]:


- warden-protocol: a set of classes for the communication between the warden-client and the warden-server. The available commands let the user create and destroy containers, stop processes inside a container, limit the disk quota and the associated memory, and open or close ports for a selected container;
- warden-client: the logic element in charge of providing communication between a user, or a DEA, and the warden-server. It exposes methods and APIs;
- warden: also known as warden-server, the main component that achieves the isolation and manages the containers. From now on, when we talk about Warden we implicitly refer to the server component.

The containers are generated using control groups and namespaces, two tools integrated in the Linux kernel: while the first allows Warden to set limits for each container, the latter lets the processes created by Warden believe they run in a separate environment. The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization. Linux implements six different types of namespaces: Mount, UTS, IPC, PID, Network and User [58]. Regarding the network, every container spawned by Warden is assigned a network interface which is one side of a virtual ethernet pair created on the host. The other side of the virtual ethernet pair is only visible on the host (from the root namespace). The pair is configured to use IPs in a small and static subnet. As for the filesystem, each container sees a filesystem created by stacking a read-only filesystem and a read-write filesystem. This concept is implemented using aufs on Ubuntu versions from 10.04 up to 11.10, and overlayfs on Ubuntu 12.04. While the read-only filesystem contains the minimal set of Ubuntu packages and Warden-specific modifications


Figure 6.6: DEA and Warden interaction

common to all containers, the read-write filesystem stores files overriding container-specific settings when necessary. Because all writes are applied to the read-write filesystem, containers can share the same read-only base filesystem, while the read-write filesystem is created by formatting a large sparse file. Each container is identified by its handle, which is returned by Warden upon creating it. It is a hexadecimal representation of the IP address allocated for the container. Regardless of whether the backend providing the container functionality supports networking or not, an IP address will be allocated by Warden to identify a container. When a container is created and its handle is returned to the caller, it is immediately ready for use. All resources are allocated automatically, the necessary processes are started and all firewalling tables are updated. If Warden is configured to clean up containers after activity, it uses the number of connections that have referenced the container as a metric to determine inactivity. If the number of connections referencing the container drops to zero, the container is automatically destroyed after a preconfigured interval. If in the meantime the container is referenced again, this timer is cancelled. The container can be used by running arbitrary scripts, copying files in and out, modifying firewall rules and modifying resource limits. When a container is destroyed (either per user request, or automatically after being idle), Warden


first kills all unprivileged processes running inside the container. These processes first receive a TERM signal, followed by a KILL if they have not exited after a couple of seconds. When these processes have terminated, the root of the container's process tree is sent a KILL. Once all resources the container used have been released, its files are removed and it is considered destroyed.

6.4.2.2 Warden realization and containers

Warden uses a line-based JSON protocol to communicate with its clients, and does so over a Unix socket which is located at /tmp/warden.sock by default. Every command invocation is formatted as a JSON array, where the first element is the command name and subsequent elements can be any JSON object, as specified by the warden-protocol. When it comes to applying limits, Warden uses the cpu, cpuacct, devices and memory control group subsystems, but currently only the memory and disk limits can be set. A limit is specified in number of bytes and is enforced using the control group associated with the container; when a container exceeds it, one or more of its processes are killed by the kernel. Typically, whenever a DEA component receives a request to start a droplet, it creates a warden-client, sends a create message on the warden socket and stores the received handle in an internal hash. That handle is essential to manage the life cycle of the started container, as Warden only behaves like a daemon, without any decision-making capacity. Each time a request for a new container is received, the server will: copy a skeleton of the file system associated with the future container, mount the cgroup subsystems individually, create specific rules on iptables, mount the file system with overlayfs, set up the IPC, NET, PID and UTS namespaces, and create and set the cgroups limits.


Each container is logically represented by an instance object within the DEA component, and each instance has its own handle saved in a logical structure managed by the DEA. In Figure 6.6 we can see how the different components interact: when a create or start message is received by the node, the DEA generates a new logical instance object and, with the help of a warden-client, asks the warden-server to create a new container. The green arrow represents the warden-protocol. When the creation process completes successfully, a handle is returned directly to the DEA, which stores it for further communications. Usually the memory and disk quota limitations are applied before a new container is created, by issuing direct commands through the warden server's socket.
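As an illustration of this interaction, the following is a minimal Ruby sketch of a warden-client talking to the server over the Unix socket; it assumes the warden-client gem exposes a Client connected to the socket and a call method taking warden-protocol request objects (class and field names are indicative and not verified against a specific release).

require 'warden/client'
require 'warden/protocol'

# Connect to the warden-server socket on the DEA node (default path).
client = Warden::Client.new("/tmp/warden.sock")
client.connect

# Ask the server for a new container; the reply carries its handle.
handle = client.call(Warden::Protocol::CreateRequest.new).handle

# Apply a memory limit to the container, as a DEA would before starting a droplet.
client.call(Warden::Protocol::LimitMemoryRequest.new(handle: handle,
                                                     limit_in_bytes: 256 * 1024 * 1024))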

6.4.3 Docker

Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that runs virtually anywhere; it can be seen as an LXC wrapper with additional features and a valid alternative to Warden. Looking at Figure 6.4, Docker is simply a user-friendly interface to LXC. Again, the isolation is achieved using namespaces, aufs and cgroups, the same tools Warden uses and that are part of LXC as well. This container manager is gaining good momentum thanks to its ease of use and its inter-compatibility; it is also a shipping-container system for code, enabling any payload to be encapsulated in a container that can be manipulated with standard operations and run consistently on virtually any hardware platform, as we can see in Figure 6.7. While Warden is tightly bound to Cloud Foundry and not user friendly at all, Docker can be used as a stand-alone tool, as it provides truly portable containers across several platforms, supported by a standard format for containers and APIs.

6.4.3.1 Docker functioning

Whenever we start a new container, using a certain base template or image, Docker will execute these steps in order:


Figure 6.7: Docker features and Virtualization

1. Download the requested image from the Docker index or a local repository;
2. Create a new LXC container;
3. Allocate a dedicated file system;
4. Mount a read-write layer;
5. Prepare a network interface;
6. Execute the desired process in that container.

All these steps are performed transparently, without leaving the user worried about configurations or installations. One of the most important features is the ability to snapshot the distribution used within a container into a common image and store it in a local or remote repository; this makes it really easy to deploy and share the same snapshots on other Docker hosts, and our applications can be ported without any complicated configuration issues. Docker is capable of incremental changes starting from a base image (a feature totally lacking in Warden): it is possible to start with a base image (say, an Ubuntu 12.04 release made for containers), then make some changes and commit them to create a separate image. This second image contains only the differences from the base; it is not a complete copy. Later,


when we want to run the new image, we obviously need the base image and then deploy our snapshot containing all the differences, a process done via a layered file system, thanks to aufs. Aufs merges the different layers together and we get what we want, and we just need to run it; there is no limit, we can keep adding more and more images (layers) and Docker will keep saving only the diffs. It is a differential backup system, really close to the one used in many code management systems. Moreover, Docker supports the notion of a repository, a hosted collection of tagged images that together create the file system for a container. One or more repositories can be hosted on a registry, where a tag represents an implicit or explicit host name for the repositories. Docker is not only a tool for creating and managing containers; it is also a tool for sharing. The Docker project provides a Central Registry to host public repositories, namespaced by user, and a Central Index which provides user authentication and search over all the public repositories. In addition, it is possible to host a private registry and then create private repositories. These handy features can fully leverage the container-based approach in several projects. In fact, developers could use Docker for:

- Automating the packaging and deployment of applications;
- Creation of lightweight, private PaaS environments;
- Automated testing and continuous integration/deployment;
- Deploying and scaling web apps, databases and backend services.

6.4.4 Docker vs Warden

Docker has been taken into consideration by the Cloud Foundry development team. Currently Warden is still used, but it is possible that in the near future a new container manager will be developed or adopted. The ability to store and save snapshots of developer modifications, and to port Docker containers between several completely different environments, makes Docker a really strong contender. The main differences between the two approaches can be summarized as follows:


- Multiple processes per container: Warden allows multiple processes by default, while Docker only runs a single process within the container. However, Docker can be configured to run an init-style process that launches multiple child processes;
- Bandwidth and disk limits: while Warden can limit both, via Unix-like tools other than cgroups, Docker cannot;
- Dynamic container configuration: Warden allows many on-the-fly container configuration changes. However, Cloud Foundry does not use this feature of Warden and typically restarts an application in a new container when, for example, the memory limits change. Docker configuration is quite static and there is no convenient way to change the current limits of a container while it is running;
- Events: while there is no concept of events in Warden, Docker is able to offer events such as startup or shutdown by exposing APIs available via streaming or polling;
- System images: as we have seen, this is one of the greatest features of Docker; thanks to the registry it is possible to have ready-made images. This can really speed up creation time, as droplets or self-contained images are no longer required. Warden still uses a different approach, as each time a new application is pushed a new droplet is created from scratch;
- Local API and remote clients: Warden is tightly bound to Cloud Foundry and its instance manager, the DEA. Warden listens only on a local Unix socket and does not expose any APIs externally. Docker, on its side, provides good API support both locally and remotely via TCP sockets;
- OS support: Warden is compatible only with Ubuntu 10.04 and newer releases, but requires an old kernel version. Docker only requires a Linux kernel (greater than 3.8) and for this reason is more broadly compatible.


6.5 Risks of a Container Based isolation

Cloud Foundry provides a running agent for all the applications, the Droplet Execution Agent, typically a single virtual machine that hosts multiple, unrelated applications, which may run on behalf of independent organizations, as is common when a data center consolidates different instances. The applications in such a scenario have no need to share information; indeed, it is important that they have no impact on each other. For this reason, hypervisors (and virtual machine monitors) heavily favor full isolation over sharing. However, when each virtual machine is running the same kernel and similar operating system distributions, the degree of isolation offered by hypervisors comes at the cost of efficiency. A number of emerging usage scenarios, such as web/db/game hosting organizations or distributed hosting (Akamai, Amazon EC2), benefit from virtualization techniques that isolate different groups of users and their applications from one another. From a security perspective, VM isolation confines the actions of a faulty, compromised, or malicious application to a particular VM, ensuring that a vulnerability in one service does not act as a stepping stone against other services or system resources; the same cannot be said for a lightweight container approach, because the same operating system is shared between the applications, seen as processes. Each time a container is generated, fewer guarantees are provided; more interference may occur between the applications and resource isolation problems can arise. Containers and virtual machines can both handle the isolation of applications, but the use of one technique or the other implies some considerations; a brief comparison is displayed in Figure 6.8. It is true that namespaces provide the first, and most straightforward, form of isolation: processes running within a container cannot see, let alone affect, processes running in another container, or in the host system. Usually each container also gets its own network stack, meaning that a container does not get privileged access to the sockets or interfaces of another container. Of course, if the host system is set up accordingly, containers can interact with each other through their respective network interfaces, just like they can interact with external hosts.


From a network architecture point of view, all containers on a host sit on a bridge interface. This means that they are just like physical machines connected through a common Ethernet switch. The same can be achieved with the virtualization approach, with a more demanding logical layer managed by the Virtual Machine Monitor. Control groups ensure that each container gets its fair share of memory, CPU and disk I/O and, more importantly, that a single container cannot bring the system down by exhausting one of those resources. However, Warden does not limit the CPU associated with a container, currently only disk space and memory; this can lead to unfair utilization of computational resources by a resource-intensive container running a badly programmed application. Furthermore, while cgroups do not play a role in preventing one container from accessing or affecting the data and processes of another container, they are essential to defend against some denial-of-service attacks. When it comes to faults or errors, if a container generates a kernel fault, all

Figure 6.8: Container and Virtual Machine comparison

the running ones will be affected, causing a cascade of failures; when this happens in a separate virtual machine, instead, the fault can be restricted to the affected instance only, without involving the other virtual machines. Traditional virtualization techniques (as implemented by Xen, VMware, KVM, etc.) are deemed to be more secure than containers, since they provide an extra level of isolation. A container can issue syscalls to the host kernel, while a


full VM can only issue hypercalls to the host hypervisor, which generally has a much smaller attack surface. In addition, most CPUs nowadays (via Intel VT-x or AMD-V) provide hardware virtualization support and ring-level isolation during the execution of virtual machines. Virtual machines also get more exposure in production, and more scrutiny, as there are many providers selling virtual machines to the public. However, it has been pointed out that if a kernel vulnerability allows arbitrary code execution, it will probably allow breaking out of a container, but not out of a virtual machine. No exploit has been crafted yet to demonstrate this, but it will certainly happen in the future, especially as more and more containers in production make them a more interesting target for a malicious user. On the other side, critical kernel issues tend to be fixed very quickly when they are discovered, but it is hard to keep a deployment updated to the latest kernel patches, especially when we talk about a Cloud Foundry deployment, where all the applications run within virtual machines. There is another side to the coin: when an exploit or security hole is found in the kernel, the kernel needs to be updated and the machine rebooted. This translates into an update of the virtual machine images. Generally speaking, with virtual machines it is possible to run several operating systems, and in this way it is possible to broaden the range of offerings. It is in beta stage, but a .NET-compatible Warden container manager exists, and using that component on a Windows virtual machine is not far from reality. Virtual machines are more secure today, but containers are definitely catching up; containers are already easier to manage, and are also way faster when it comes to cold boot: a fully virtualized system usually takes minutes to start, while containers take seconds, and sometimes even less than a second. In addition, the ability to concentrate more applications inside a single host, via containers, lets us optimize the use of resources. As we have seen, some drawbacks can arise when a lightweight isolation is adopted. This does not mean that Cloud Foundry needs to change completely; for certain purposes a container approach is more than reliable.


However, we want to provide a choice and an additional feature that is totally compatible with the current Cloud Foundry version. An additional placement selection can extend the flexibility of the system, giving the user an improvement for those applications that do require a different kind of isolation. Moreover, with this proposal we can ease the billing and metering process, as several tools are already available for many IaaS solutions. The next chapter explains which steps and changes are required in the Cloud Foundry project to add the new additional behavior.

Chapter 7 Improving provided isolation


We have seen which issues a lightweight isolation approach implies; this chapter explains and discusses a proposed change, fully backward compatible and integrated with the current Cloud Foundry release: how we can leverage the virtualization layer and add support for a stronger isolation. Our proposal takes advantage of an existing Cloud Foundry logical feature called stack, a pre-built image associated with each DEA. The stack concept provides a sort of labeling system for the different DEA nodes, letting us leverage this feature to manage pools of instances.

7.1 The current simple isolation

The current level of isolation in Cloud Foundry relies on Warden containers. During a Cloud Foundry deployment we must set up DEA nodes, where our applications will run after the staging process, as shown in Figure 7.1. Everything works flawlessly and provides a great and fast experience for developers, who want to test their applications quickly without worrying about the provisioning of resources. This simple and fast approach, based on the co-location of applications, is extremely suitable for test or development environments, but it can lead to additional problems in production. First of all, Warden currently supports only memory and disk limitations; the CPU cannot be limited. If many applications are placed within the same virtual machine host and most of them have a normal usage of CPU


Figure 7.1: DEAs and Applications

except for one, we can imagine a scenario. While the CPU-aggressive application could run without issues, the others, running on the shared machine, could suffer from that behavior and run slowly or receive an unfair time slice of CPU usage. In addition, Cloud Foundry does not provide any accounting system tied to the actual resource utilization. There are limits for the space and for the number of applications per space, but no application monitoring system associated with accounting management is present yet. Considering how the isolation is handled via Warden containers, adding a monitoring system is rather complicated without disrupting the current implementation or rewriting the code from scratch. This is because, when an application is started, the DEA node sees only a droplet, an instance GUID, totally meaningless for accounting purposes, as there is no association between the running instances of an application and the space to which they belong; moreover, each new droplet is launched in a separate Warden container that is represented by a user id, gradually increased, and not associated with the real name or organization of the application on the database. If a cloud provider wanted to monitor the usage of resources and start a billing system to account for the amount of resources associated with each instance, it would be a problem, as all the instances of the same or, worse, of several other applications run in the same virtual machines without any way to keep track of the different resource usage. Another issue, in


addition, could be the resource limitation among different users, as introducing thresholds based on user ids would be totally unexpected and would imply a deep distortion of the current Cloud Foundry architecture. As we have seen, Cloud Foundry manages the users via the concepts of spaces and organizations, but these concepts are not associated with the placement of the instances. It is true that all the instances of the many applications are isolated by the lightweight container approach, but there is no separation or distinction between applications belonging to different organizations or spaces. We can have separate spaces and well-classified organizations, but all the applications related to these conceptual constructs will, in the end, run within the same DEA node or nodes. In the running environment these concepts are totally mixed and not used to keep track of separation or accounting. Organizations and spaces are only used for quota association, not for the placement of the different instances. What we are going to suggest is a proposed change that could help solve both the placement and the accounting problem, leveraging virtualization isolation as opposed to the lightweight container way.

7.2 Where to hook up a virtualization isolation: the Stack

As we previously discussed, isolation between running applications can be obtained in different ways. While Cloud Foundry relies on a container approach, we would like to introduce a stronger type of isolation, based on a virtualization layer, while maintaining backward compatibility. The current PaaS implementation is strongly tailored to Warden and, as in many open source projects, subverting the architecture to provide new features is not advisable. Cloud Foundry keeps its simplicity at the cost of a stronger way to separate applications. We wanted to provide a totally compatible concept that could coexist with the current container-based approach.


7.2.1 Current Stack usage

The solution implemented relies on Cloud Foundry's stack concept. As we have previously seen, a stack is a pre-built image associated with each DEA node. When we install and configure a distributed version of Cloud Foundry, we must define a stack configuration value for the DEA nodes we are going to spawn. The use of this label can simply extend the versatility of a deployment, offering the maintainers a way to configure different nodes and to mark them. A stack is basically a rootfs image that includes an operating system supporting the execution of applications with certain characteristics. This image is the same one that Warden manages during the bootstrap of the containers. Currently any DEA can support exactly one stack, because in each DEA node there exists only one instance of the Warden server and that instance is able to start containers layered on top of only one rootfs image. In addition, Cloud Foundry comes with only one available stack: lucid64. This image is practically an Ubuntu 10.04 64-bit system containing additional programs and libraries such as MySQL, PostgreSQL, git and Ruby. The use of the stack comes in handy when we want to integrate additional tools,

Figure 7.2: DEA pools

binaries or functions into a basic Linux image later used by Warden. It is possible to create a new stack from scratch with the tools provided in [59]; however, in order to use it we need to upload the created package to BOSH and store it in the blobstore. Unfortunately, only the newly started DEAs will be able to use it, if that stack is set in their configuration file. It is clear enough how the stack concept is rather static and tightly bound to the deployment process, while at run time it becomes really difficult to take advantage of it and reconfigure the DEA nodes dynamically. That is why this feature is rarely considered and usually left to its default in most deployments.


Despite the fact that the stack feature, as it appears, lacks some capabilities and is most of the time forgotten and left to its default during a deployment, it can turn out to be a handy label when it comes to arranging pools of DEA nodes, as we can see in Figure 7.2. First of all, we wanted to achieve an isolation immediately accessible by the clients (who use the CF CLI) and totally compatible with the current Cloud Foundry installation. In addition, each DEA node advertises its capabilities and availability, including its stack, to the brain of the architecture: the Cloud Controller. Each time this latter component receives a start application request, it browses through all the available stacks and sends the application only to a DEA that exposes the requested stack. Basically we can use the stack flag to split the DEA pool and provide a different placement for those apps that require a special stack.

7.2.2 Cloud Foundry Stack in details

As we have seen, Cloud Foundry supports the usage of stacks and lets the DEA nodes be configured statically during the deployment. The current stack support lets the clients choose, during the push of an application, which stack they prefer; on the PaaS side, the support is based on a message communication system and on the interaction between two components: the Cloud Controller and the DEA.

7.2.2.1 DEA bootstrapping

Whenever a DEA is deployed on a virtual instance and started, it reads a configuration file called dea.yml, which contains all the configuration parameters, such as: the NATS endpoint and credentials, the Warden socket to use, the base application domain, the maximum resource limits associated with the containers (memory and disk), the advertisement and heartbeat intervals and the associated stack flag. During the bootstrap of the Ruby process, all the settings are loaded from the YAML file and other sub-tasks are spawned.
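As a rough illustration, the configuration can be pictured as a plain hash loaded from that YAML file; the key names used here are indicative only and differ between DEA releases.

require 'yaml'

# Illustrative keys only; actual dea.yml property names vary between releases.
config = YAML.load_file("/var/vcap/jobs/dea_next/config/dea.yml")

nats_uri      = config["nats_uri"]        # e.g. "nats://user:pass@10.0.0.10:4222"
warden_socket = config["warden_socket"]   # e.g. "/tmp/warden.sock"
stacks        = config["stacks"]          # e.g. ["lucid64"]
memory_mb     = config["resources"]["memory_mb"]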


During the configuration process the DEA sets up the NATS connection and subscribes to the dea.#{id}.start and dea.stop subjects. Those channels are required to receive start or stop application requests. The first channel includes an id value that represents a universally unique identifier (UUID) and allows the Cloud Controller to send a specific start request to a single DEA node. When the NATS connection is ready, a secondary task kicks off: the DEA::Responders::DeaLocator. That component is in charge of subscribing the node to the dea.advertise NATS subject and periodically, thanks to EventMachine (an event-driven I/O concurrency library), publishes advertisements containing: the DEA id, the available stacks, the available memory (calculated with an overcommit factor) and the number of instances for each application (again represented by an id), as displayed in Figure 7.3. As we can see, the stack flag is advertised in those messages and the Cloud Controller is well informed about the status of all the DEAs.

Figure 7.3: DEA advertisements

As we said before, a DEA node is both a running agent and a staging agent; just as for the start process, there are similar NATS staging channels to which the DEA is subscribed. An additional secondary task is started when the DEA is launched: the DEA::Responders::StagingLocator. This component is almost identical to the DeaLocator, except that it publishes on the staging.advertise NATS subject. While the first locator shares information about the number of running instances per application, the second one does not. Since a DEA is treated as a staging agent without any difference, this lack of information can lead to certain issues that we will discuss later.
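For illustration, a minimal Ruby sketch using the classic EventMachine-based nats gem shows how these advertisements can be observed on the bus; the payload fields follow the ones described above, and the broker address is an assumption.

require 'nats/client'
require 'json'

NATS.start(uri: "nats://127.0.0.1:4222") do
  # Runtime advertisements published periodically by DEA::Responders::DeaLocator.
  NATS.subscribe("dea.advertise") do |msg|
    ad = JSON.parse(msg)
    puts "DEA #{ad['id']}: #{ad['available_memory']}MB free, stacks=#{ad['stacks']}"
  end

  # Staging advertisements published by DEA::Responders::StagingLocator.
  NATS.subscribe("staging.advertise") do |msg|
    puts "stager advertisement: #{JSON.parse(msg)}"
  end
end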

7.2.2.2 Cloud Controller bootstrapping

We understand that, from a coordination point of view, we have enough information to make decisions whenever an application needs to be started and requires a placement. In effect the Cloud Controller, during a staging or


start process, checks that the memory and stack requirements are met. To accomplish this, the Controller, during its bootstrap, configures an AppObserver component that is instructed to keep coherency between the logical application state stored on the database and the current live situation. The AppObserver keeps an eye on both the expected application state and the run-time state, and if required invokes a staging, start or stop process. When the Observer is started, it subscribes to the staging.advertise NATS subject, the channel where all the staging advertisement messages are sent. Similarly to the Observer, the Controller configures a DeaClient component that, during its activation, subscribes to the dea.advertise channel. Whenever an application requires a staging process, an AppStagerTask request is generated; during this flow of requests the find_stager method is invoked on a stager pool class. The Cloud Controller, in mutual exclusion, checks whether the requested stack is available in the deployment (typically this stack is always lucid64), then finds the best top five stagers by pruning and browsing through the advertisements previously received, and finally picks a random one among them. The selection is done by checking whether the candidate stager has enough free memory and the requested stack. We need to remember that there is no difference between a stager node and a DEA node, as both represent the same component in the end; in fact, the node chosen first will be the host of the future staged application. Much like the staging process, the Controller handles the start process: the AppObserver reacts automatically to a state change of the application; if the state changes to STARTED, this component uses the DeaClient construct to deploy a new instance of an application (by instance we mean a single running container with an application inside), as represented in Figure 7.4. The Client chooses the right DEA node by calling the find_dea method on a dea pool class, checking that the memory and stack requirements are met. In addition, during the selection of the most suitable DEA, the Controller checks that the DEA node has the fewest instances of the requested app; in this way we obtain a little balancing between the applications, spawning the same one on several nodes and balancing the different kinds of applications. When the CF CLI invokes a push of an application, a stack different from


Figure 7.4: Cloud Controller selection process

the default one can be set thanks to the deployment file or console parameters. The PaaS will first ensure that the required stack is known and exists in the system, then it will choose the most appropriate DEA node according to additional statistics obtained from the advertise messages. This process takes place each time we want to start an application from scratch, but certain issues can arise. The staging advertise messages lack an important piece of information: no count of running applications is provided during the selection of a stager. In a small deployment, with few nodes and a small load, this is usually not a problem, but thinking of more realistic deployments with replication of nodes and resource-intensive applications, that missing information could generate many problems. Whenever an application start request is received and the application is not already staged, the Cloud Controller only checks for a suitable DEA that has enough memory to stage that application; moreover, the same node will run the application it staged without passing through the find_dea process (during a staging process a start message is built and sent [60]). Considering a situation where a few applications with high memory limits but low CPU usage run inside one DEA node and


many applications with small memory limits but aggressive CPU rates run in another DEA node, the Cloud Controller could choose the stager node with more available memory but worse CPU time slices, and furthermore overload that DEA node. This is because no application count information is shared in the staging advertise messages, while the stager and runner nodes are totally collapsed into one single DEA agent.

7.2.3 Our proposal to employ the Stack

Our idea of isolation comes to life through a simple mechanism: the current usage of the stack lets the Cloud Foundry administrator, at deployment time, mark the different DEA nodes. The stack, in the end, is a sort of flag that can be used to label different DEA nodes or arrange DEA pools. Trying to be as compatible as possible, we can create several stacks, or just stack labels, and fire up different DEA pools. Inside the DEA node itself it is not possible to provide a stronger isolation, due to the very concept of the component. The Warden containers keep the applications free of conflicts and virtually give the idea of a dedicated environment per application, but in practice there is no way to provide stricter isolation inside the single VM holding the applications. When it comes to strongly dividing the use of resources, a separate virtual machine is a better approach and a much stronger solution compared to containers. If we think about it, we already have a way, embedded in Cloud Foundry, to organize DEA pools and to dispatch the different client requests; Cloud Foundry supports this whole process. Now it is up to us to provide a new kind of isolation by changing the meaning of the stack flag and editing the controller logic. Basically we can change the logic inside the Cloud Controller, which knows all the available stacks in the deployment, and make it behave in a different way when an application asks for a stack different from the stock one; that is, when the application requires a stack that falls under a concept of stronger isolation. In this way we can shift the problem to a different layer, assuming that Cloud Foundry can dispatch the different stack requests by itself. It is not a matter of deep changes in the architecture or in the staging-starting process, it is only a matter of dealing differently with certain stacks. We can imagine


Figure 7.5: New DEA advertisements

to have a pool of isolated DEA nodes under a certain stack flag, and to send special staging-start requests to those nodes under certain conditions: an isolation achieved by the virtualization layer offered by OpenStack, a secondary isolation supported by both the IaaS and the PaaS layer.

7.2.3.1 Proposal of DEA modifications

A DEA virtual machine can be both a stager and a running agent. In addition, each time a stager is chosen, it becomes the first node to accommodate the new application. What we want to achieve is a differentiation of the staging process itself. We need to know, the Controller needs to know, during the choice of the DEA stager, which one is more suitable and which one could hold a new application without running it alongside existing ones. First of all we need to solve the staging message issue, as it does not provide any information regarding the number of applications running on a node. The DEA node periodically generates the advertise messages via two responders: the StagingLocator and the DeaLocator. During the bootstrap, the DEA sets a timer in the EventMachine currently used and passes it the two advertise methods exposed by the locators. Regarding the StagingLocator, we can see the new addition in Listing 7.1: the app_id_to_count field conforms to the one provided via the DeaLocator. Now, unlike before, the DEA also publishes on this channel the number of applications running. This addition is useful to distinguish, at staging time, which stager is more loaded with apps or which one has just one application running. Now we have the DEAs publishing complete information on the right NATS subjects, no matter whether they are merely runners or stagers; an example is represented in Figure 7.5.


def advertise
  nats.publish("staging.advertise", {
    "id" => dea_id,
    "stacks" => config["stacks"],
    "available_memory" => resource_manager.remaining_memory,
    "app_id_to_count" => resource_manager.app_id_to_count
  })
end

Listing 7.1: Staging Locator Advertise

7.2.3.2 Proposal of Cloud Controller changes

When it comes to taking decisions, the Cloud Controller is the starting point. While we have seen how the single DEA nodes know which stack they belong to via the dea.yml configuration, we did not cover how Cloud Foundry knows all the available stacks in the system: the Cloud Controller is in charge of this task. During the first bootstrap of the component, a stacks.yml file is read; these configuration files and components are displayed in Figure 7.6. That file contains information regarding the default stack to use, if the

Figure 7.6: DEA and Cloud Controller configuration files

applications do not specify it during a deployment, and a list of all the available stacks in the system. During the very first run, the Controller reads that file and populates the internal database, adding new entries for each new known stack. While the information written on the database is specifically used by the Cloud Controller REST responders and queried by the CF CLI clients, the other configuration properties are loaded at run-time and used during the life-cycle of the component. In order to achieve a different behavior for those special stacks, we decided to change the stacks.yml by adding a configuration parameter called isolated_stacks. As you can see in Listing 7.2, the new property accepts a


list of possible values, each one representing a known stack available in the deployment. What we want to achieve is displayed in Figure 7.7.

default: "lucid64" stacks: - name: lucid64 description: Ubuntu 10.04 - name: isolatedPool description: Ubuntu isolated pool isolated_stacks: ["isolatedPool"] Listing 7.2: stacks.yml Change bootstrap of the Controller component, the le is read and all the conguration operations are run; when the dea pool and stager pool are created, to provide additional functionalities to the Cloud Controller, the isolated stacks value is passed, containing all the stacks that are require a dierent handling

Figure 7.7: New pools and processing

Now, based on the information we have and considering that resources and nodes are pre-allocated, we can change the Cloud Controller logic. First of all we need to consider the AppObserver. As we have seen, this component reacts automatically to state changes on the database and checks whether an application already pushed to the PaaS has the desired state. If an application transitions to a started state,


the AppObserver immediately checks whether the application is already staged and then issues the right process if needed: the staging. During a staging process the find_stager method is called on the stager pool class, and it is here that we can hook up our changes. In Listing 7.3 we can see how we added our changes, to achieve the flow explained in Figure 7.8.

Figure 7.8: A new Controller processing

By default the Cloud Controller, during the staging flow, searches for a valid stager by, first of all, validating the availability of the stack in the deployment and then finding the top 5 stagers. As we can see in Listing 7.3, we added a new method to select special stagers, which are part of the specific pool of stagers that require a specific selection. Now we understand how the Cloud Controller is able, and in charge, to take a decision and select the most correct node according to the logic within the top-5 methods. In addition, the selection can vary according to the list of isolated stacks loaded during the bootstrap of the component. In the same way, in Listing 7.4 we can see the modification to the previous approach during the selection of a DEA node. Again, here we filter the advertisements received and we make a decision according to the kind of stack that the client request selected. If the stack is part of the isolated pool, a different decision is taken. In both scenarios, the specific method that contains the isolation logic returns a DEA id only if that node has no application running at all.


def find_stager(stack, memory)
  mutex.synchronize do
    validate_stack_availability(stack)
    prune_stale_advertisements
    if @isolated_stacks.include? stack
      best_ad = top_5_isolated_stagers_for(memory, stack).sample
    else
      best_ad = top_5_stagers_for(memory, stack).sample
    end
    best_ad && best_ad.stager_id
  end
end

Listing 7.3: New find_stager

To summarize, each time the Cloud Controller needs to stage or start an application and create a new instance - a Warden container - it checks the requested stack; if the flag selected by the client is in the isolated stack list, the application starts only in a DEA node from the pool exposing that specific stack. The Controller behaves differently when we request an isolated stack: basically it checks which DEA node currently has zero applications running. In this way, we achieve a stronger isolation between running applications: they run in separate VMs. In addition, the accounting process and the association of resources to the application could become easier, as resource limitation policies or accounting tools are well integrated and widely available for many different IaaS. We need to keep in mind that Cloud Foundry can be deployed on several different platforms, each one with different and specific implementations and integration tools. Through this detailed explanation it is clear how Cloud Foundry's pieces are totally modular and editable. It is pretty easy to just provide new information via the advertisement messages and shift all the logic, with the decision-making process, straight to the Cloud Controller node. We can then grant and add more policies and features during the selection of the runner nodes.


def find_dea(stack, mem, app_id)
  mutex.synchronize do
    prune_stale_deas
    if @isolated_stacks.include? stack
      best_dea_ad = EligibleDeaAdvertisementFilter.
                      new(@dea_advertisements).
                      only_meets_needs(mem, stack).
                      eligible_for_isolation.
                      sample
    else
      best_dea_ad = EligibleDeaAdvertisementFilter.
                      new(@dea_advertisements).
                      only_meets_needs(mem, stack).
                      only_fewest_instances_of_app(app_id).
                      upper_half_by_memory.
                      sample
    end
    best_dea_ad && best_dea_ad.dea_id
  end
end

Listing 7.4: New find_dea
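Listing 7.4 relies on an eligible_for_isolation filter that the text describes only in words: a DEA qualifies only if it is running no application at all. A minimal sketch of how such a method could be added to the advertisement filter follows; it assumes each advertisement exposes the per-application instance counts (field and accessor names are illustrative).

class EligibleDeaAdvertisementFilter
  # Keep only DEAs that currently run zero application instances, so an
  # application asking for an isolated stack gets a dedicated node.
  # Assumes each advertisement carries an "app_id_to_count" hash.
  def eligible_for_isolation
    @dea_advertisements.select! do |ad|
      counts = ad.stats["app_id_to_count"] || {}
      counts.values.reduce(0, :+) == 0
    end
    self
  end
end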

7.2.4 Integrate the change with BOSH

Having seen how to modify the logic in Cloud Foundry, we need to understand how we changed the deployer configuration files, according to the BOSH manifest. Do not forget that Cloud Foundry is strictly tailored to BOSH and that it is mandatory to deploy our open PaaS with that specific deployer. In order to combine the changes made with the current implementation, the Cloud Foundry job release files are involved. On the DEA side we do not need any change. The dea_next job exposes in its spec file the dea_next.stacks property; by using a modified deployment file, we can just write down a DEA template and set that stacks value to a special attribute. BOSH can handle multiple deployments, so it is possible to just


start and manage single or multiple DEA node deployments with specific manifest files, at the cost of only one added property, the stack name. While BOSH can handle the deployment of the different DEA nodes that are part of the various pools, on the Cloud Controller side an additional job change is needed. The current cloud_controller_ng spec file allows whoever deploys the release to insert a ccng.stacks value that represents the stack values to be written in the stacks.yml file; but that property is not adequate to fulfill our purposes. A new property has been added inside the spec file: ccng.isolated_stacks. Via this new property, and by editing the stacks.yml.erb job template file, it is possible to provide a configuration value to the manifest deployment files. In this way, during the deployment, the Cloud Controller can be created with knowledge of all the available stacks in the system, while we create the DEA pools during a second deployment. When all the Cloud Foundry nodes are started and communicating, we can then decide which stack to choose, or which kind of isolation we want, by configuring the right stack during the deployment of our application.

7.3 Enabling a Dynamic Provisioning

Cloud Foundry is open source and installable on many IaaS providers, but it is really tailored to a specific deployer entity such as BOSH. While Cloud Foundry can be installed easily, the only current way to configure and then install it in a distributed environment is via BOSH. By using this deployer we come across some pros and cons. BOSH is handy for specifying all the logical tasks in jobs and packages, it is really easy to create new jobs with their packages from scratch, and in addition it can be used for several other tasks (not only Cloud Foundry). Moreover, the deployer's features can ease the configuration process and offer a simple way to run our daemons or components, as the CPIs offer many ways to interface with several cloud providers with a very well-formed integration. Unfortunately, all these features come with a very big drawback: the deployment is totally static. So far, when we consider the BOSH manifest and prepare all the jobs by


adapting the templates to the desired state we want to obtain, we are binding our future virtual machines or instances to specific BOSH jobs. When the association is made, the virtual machine is totally configured to run that specific set of jobs with a static configuration, which cannot be changed dynamically, if not at the cost of an update via the BOSH CLI. The biggest drawback is the impossibility of scaling up the nodes dynamically, at run time, while all the components are executing. Even without considering the additional change we made to the current design, once a deployment is running and BOSH completes the procedure, there is no way to provide any sort of dynamic scaling of pools or simply of nodes. Cloud Foundry is meant to be easy; it is up to us to take care of its consistency, availability and resilience capabilities. BOSH can monitor the tasks running inside an instance and can restart single tasks, but it will not create any new instance from scratch if an issue occurs. Until now, there is no BOSH command or feature that could help us manage the life cycle of the instances associated with the jobs. BOSH is a bridge between the IaaS layer and the PaaS layer; it is a deployer but does not offer more advanced tools. If we want to adopt the additional isolation approach we have seen, or provide more advanced capabilities, BOSH cannot be considered a viable and satisfactory answer. Considering the stack definition and how it can be used, each time we specify a stack that is part of an isolated pool, we strictly tie a single instance of an application to a virtual machine running a DEA agent. It is clear how this approach, coupled with the set of BOSH features, may lead to resource availability problems: we can specify multiple DEA nodes and manage them all via a stack pool, but in a standard Cloud Foundry installation the system cannot auto-scale under heavy load or simply when too many instances are requested. A solution can be found in one of the many orchestration tools available, or in a BOSH modification, since the PaaS does not interface with the IaaS layer below. On AWS, the CloudFormation [61] tool gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them. On OpenStack, Heat [62] implements an


orchestration engine to launch multiple composite cloud applications and offers the users tools to manage resources and instances based on templates. Both tools can connect directly to the cloud platform APIs and request specific configurations or start new ones.

7.3.1 Heat

Heat is a service to orchestrate multiple cloud applications using the AWS CloudFormation template format, through both an OpenStack-native REST API and a CloudFormation-compatible query API. This engine provides a template-based orchestration for describing a cloud application, executing the appropriate OpenStack API calls to generate running cloud applications. The software integrates other core components of OpenStack into a one-file template system. In this way the templates allow the creation of most OpenStack resource types (such as instances, floating IPs, volumes, security groups, users, etc.), as well as some more advanced functionality such as instance high availability, instance autoscaling, and nested stacks. In addition, this handy engine can be used with Ceilometer to provide smarter metering and auto-configuration via Ceilometer Alarms. Basically, when we use Heat we prepare templates that specify which kind of new instances we want to be added, and ask for the deployment via the OpenStack Horizon web interface, the REST APIs or the command-line tools. When a template is sent to be applied, Heat, in this order: provisions a node, provides the JSON configuration metadata, runs the configuration triggers and, at the end, reports the node information (success or failure). As we can see in Figure 7.9, Heat consists of a few components:

- Heat API: the REST API exposed to users to communicate with the engine (the calls pass through AMQP, the messaging technology that OpenStack uses between all its components); these APIs represent the endpoint for the interaction with the Heat engine;
- Heat engine: the core component in charge of orchestrating OpenStack, the layer on which the resource integration is implemented. In addition, it contains abstractions to use the Auto Scaling and High Availability features;


- Heat metadata: a specific server that collects all the metadata information, useful for metrics and wait-conditions, and interacts with in-instance agents.

Figure 7.9: Heat architecture

Whenever a new template is applied and Heat succeeds in creating a new virtual machine, a new stack is created. A stack, not to be confused with the Cloud Foundry stack concept, is an abstraction that represents an applied template. We can specify a template and apply it many times behind different stack names. This feature is important, as inside each template it is possible to set parameters and configurations such as the number of instances or the autoscaling groups; when we have multiple instances, for example, we only need to know the stack ID and set the different configuration parameters that we specified within the template to obtain a change in the instance number. A template is the main object through which we can configure new OpenStack instances, and is made of:

- Description: a simple text description of the template, usually inserted at the beginning of the file;
- Parameters: all the parameters that we want to be able to specify each time we submit a template. In this section we define parameters


such as key names (the OpenStack keypairs to be associated with the new instances started), ImageId and LinuxDistribution. These parameters are totally configurable and up to us;

Mappings: optional, they represent the mappings between AWS and OpenStack. In this section of the file we map and specify how our parameters, for example the image types, correspond to the Amazon definitions or sizes; Resources: a section in which it is possible to define the packages required by the instance and the daemons or services that need to be started immediately after the virtual machine is started. In this section the parameters previously specified are transformed into properties to be used during the interaction with the OpenStack APIs. In addition, under the resources we can also add specific scripts required by our instance; Outputs: the values, like public IPs, that the Heat engine needs to extract from the final deployment of the instance and store in its metadata server. Whenever we query the Heat APIs via the CLI, we are able to read and get these specific values as a result of a query on the stack ID. When we want to deploy a template, we basically need to interact with the Heat engine, provide the template file and specify all the required parameters (declared inside our template) during the invocation of the command, as sketched below.
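For illustration only, a template like this could be submitted from the command line through the Heat CLI. The following minimal Ruby sketch simply shells out to the client; the stack name, template file name and parameter values are hypothetical, and the flags follow the python-heatclient syntax of that period, which may differ in other versions.

  # Hypothetical submission of the isolated-DEA template via the Heat CLI.
  template = 'isolated-dea.template'

  # Create a new stack, overriding the parameters declared in the template.
  system('heat', 'stack-create', 'isolated-dea-pool',
         '-f', template,
         '-P', 'InstanceType=m1.dea;ImageId=isolated-dea-snapshot;NumInstances=1')

  # Later, the same stack can be grown simply by raising NumInstances.
  system('heat', 'stack-update', 'isolated-dea-pool',
         '-f', template,
         '-P', 'NumInstances=3')

Because the stack keeps its name, updating a single parameter is enough to change the number of instances, which is exactly the mechanism exploited in the next section.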

7.3.2 Scaling DEA nodes with Heat

With these tools, it is possible to provide a form of scaling for Cloud Foundry. Considering how Heat works, its features and the configurability of its templates, a scaler has been developed: this little application, in combination with the NATS advertise messages, is able to provide a basic form of scaling for Cloud Foundry. The same idea could have been implemented relying on BOSH as well; BOSH provides a CLI client like Heat, but we decided to use the OpenStack tool, as its template feature is more configurable and provides a user-script option.


Figure 7.10: Cloud Foundry scaler, using Heat

First of all, to build a little auto scaler it is important to find a good metrics system that can monitor the load on the nodes and provide statistics during the life cycle of the components. Cloud Foundry comes with that sort of system out of the box; considering the modification added in 7.2.3, both the NATS heartbeat subjects staging.advertise and dea.advertise provide us sufficient information to take decisions. While the first channel is more useful from a Cloud Controller point of view, whenever an application requires an isolated stack, the latter is useful to give us metrics about the load on the different DEA nodes. The DEA's ResourceManager collects and broadcasts information such as: the DEA node's stack in use, available memory, available disk and number of applications running on the node. Keeping this in mind, we developed a little application that subscribes to the same channel and collects the advertisements broadcast. When we receive metrics from the DEA nodes, we can take decisions for both isolated and non-isolated pools. Obviously, as we see in Figure 7.10, if the pool of DEA nodes that requires isolation is running out of nodes, we can gently ask Heat to provide a new node by scaling up the corresponding Heat stack; or, for example, if we are running out of default DEA nodes running the lucid64 stack, we can just as easily ask Heat to scale that pool of instances. An extract from a template like the one shown in Listing 7.5 lets us understand how to define new instances for the Heat engine.


{
  [...]
    "InstanceType" : {
      "Description" : "Instance type",
      "Type" : "String",
      "Default" : "m1.dea",
      "AllowedValues" : [ "m1.tiny", "m1.small", "m1.medium",
                          "m1.large", "m1.xlarge", "m1.dea" ],
      "ConstraintDescription" : "must be a valid OpenStack type"
    },
    "ImageId" : {
      "Description" : "Name of the image to use",
      "Type" : "String",
      "Default" : "isolated-dea-snapshot"
    },
    "NumInstances": {
      "Default": "1",
      "MinValue": "1",
      "MaxValue": "100",
      "Description" : "Number of instances to create",
      "Type": "Number"
    }
  },
  "Resources" : {
    "JobServerGroup" : {
      "Type" : "OS::Heat::InstanceGroup",
      "Properties" : {
        "LaunchConfigurationName" : {"Ref":"JobServerConfig"},
        "Size" : {"Ref": "NumInstances"}
      }
    },
    [...]
  }
}

Listing 7.5: A template for scaling



First of all, we need to take a snapshot of a clean DEA node after it is deployed, in order to have a clean starting point for the future, additional Cloud Foundry instances. Then, in the bottom part of the same file, we need to specify the script required to run the Ruby DEA code, taken exactly from the BOSH release job template. The scaler will: 1. Subscribe to the right heartbeat channels; 2. Receive the metrics and elaborate them; 3. If required, issue a Heat create-stack command, passing the right template associated with the DEA pool and increasing the number of instances required for that pool. A minimal sketch of this loop is shown below.
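The sketch uses the Ruby NATS client [26] to listen to the dea.advertise subject and shells out to the Heat CLI when the advertised free memory of a monitored pool drops below a threshold. The message field names, the threshold, the stack names and the lack of any debouncing are assumptions made for illustration; the real scaler and its tuning are not reproduced here.

  require 'nats/client'
  require 'json'

  THRESHOLD_MB = 1024                                        # assumed low-memory threshold
  TEMPLATES    = { 'isolated' => 'isolated-dea.template' }   # Heat stack name -> template
  sizes        = Hash.new(1)                                 # current NumInstances per stack

  def scale_up(stack, template, new_size)
    # Ask Heat to grow the pool; flags follow the python-heatclient syntax.
    system('heat', 'stack-update', stack, '-f', template,
           '-P', "NumInstances=#{new_size}")
  end

  NATS.start do
    # Each DEA periodically advertises its stack, free memory, free disk and app count.
    NATS.subscribe('dea.advertise') do |msg|
      ad    = JSON.parse(msg)
      stack = ad['stack']                  # e.g. "lucid64" or "isolated" (assumed field name)
      next unless TEMPLATES.key?(stack)

      if ad['available_memory'].to_i < THRESHOLD_MB
        sizes[stack] += 1                  # a real scaler would debounce repeated triggers
        scale_up(stack, TEMPLATES[stack], sizes[stack])
      end
    end
  end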

Figure 7.11: A different Stack is running


The scaler is merely a starting point for an auto-scaling feature: it is not yet able to scale down nodes (when unused), it relies on a previously taken snapshot (this should be automated in a different way) and it should be tuned based on real usage statistics. A final representation of the new, additional behavior obtained is displayed in Figure 7.11; each virtual machine is running a specific Cloud Foundry component. Only the DEA nodes exposing the default stack lucid64 are able to run multiple applications in separate containers within the same instance, while all the other nodes, part of a different stack called isolated, are treated in a different way by the Cloud Controller. For those nodes, part of the isolated stack's value, we obtain a stronger isolation, based on the virtualization layer, as the applications will be spawned onto separate nodes. As we discussed, this new isolation approach relying on virtualization can be useful for certain kinds of applications. Which applications may really make the most out of this new feature? In the next chapter we are going to analyze and discuss which categories of application can benefit from this additional feature.

Chapter 8 Isolation and Co-location Performances


The stronger isolation provided, based on the virtualization support, adds a choice of more security and separation among the applications. The added capability offers more confinement at runtime, but can also grant some performance improvements for certain sets of applications. In order to measure and consider the benefits, we tried to categorize different kinds of applications that can run on Cloud Foundry and find which of them could benefit more from this stronger separation in terms of performance. Below are described and defined the different categories of applications developed or used to generate load and stress conditions. The goal of these tests is to measure and quantify the benefit of an isolated or co-located placement for applications that mostly require a heavy usage of CPU resources, network communication and local disk I/O. We focused our tests on these three main categories, as most of the applications developed for Cloud Foundry may take advantage, in terms of performance, of those three resources. Whenever an application is pushed onto the PaaS, we generally pay most of our attention to resources such as: CPU, as applications require computational power to serve requests and elaborate the most complex tasks as fast as they can;


Network, because reliable and good network support is mandatory for Web applications and for those applications that usually deal with a lot of network traffic; Local storage, not always essential for Web applications, but usually an unavoidable bottleneck when it is involved.

The different applications used to stress the physical VM resources, one at a time, can be listed under these main categories:

CPU intensive tasks; Network intensive communication; Disk intensive input and output.

Then we are going to present some real examples of applications that try to take advantage of the different placement; these test applications will stress a combination of the resources we listed and will show the benefits or drawbacks of the different placement approaches. The samples for this second category are:

Distributed application, a distributed application that mixes network traffic and the calculation of operations on a chunk of data; Media stream over network, a media streaming service that offers content to multiple clients (stressing the multiple-connection problem); Multi-tier application, a multi-tier J2EE application offering a list of contents to clients, built on many layers and technologies such as JavaServer Pages (JSP), the Spring Framework and the Hibernate persistence framework.


8.1 Application Tested

All the evaluations were carried out on an Ubuntu Cloud Havana installation, an OpenStack installation integrated into the Ubuntu 12.04 LTS release. The cloud platform deployment comprises multiple compute nodes, each of them with: a KVM-QEMU hypervisor; 32 VCPUs; 256 GB of RAM; 2.2 TB of storage; a 1 Gbit network interface. The core of our comparison focuses on the main differences between a single container running in a separately allocated VM and a container running inside a VM but co-located with other containers. Each time we request a new instance of an application, we get a new Warden container running, but we can choose between a separate VM, dedicated to that application, and a co-located one. For this purpose a snapshot has been prepared, containing the Warden system and all the tools and applications required to test the different performances. In order to guarantee a fair comparison, we decided to grant the same computational resources to a separate virtual machine holding a single Warden container and, in proportion, to a single virtual machine holding multiple containers. We try to split the same computing power and resources among the different deployments; considering the same hosting machine, a compute node with multiple cores and a large amount of memory, we want to allocate the same properties during the creation of the instances via OpenStack. This means that each time we refer to the performance of an isolated VM (which is part of the isolated stack pool we discussed in 7.2.3), we mean the performance of a single VM, with a single Warden container running within, which can use a single virtual CPU (VCPU), compared to a bigger single


Figure 8.1: Virtual Machines and test configuration

virtual machine hosting multiple containers that can benefit from a directly proportional number of VCPUs; an explanation can be found in Figure 8.1, where it is clear how the resources associated with a VM grow as the number of containers increases: a single Warden container can take advantage of a single VCPU at most, both in the isolated deployment and in the co-located one. The concurrent execution inside the containers, during all the tests in both configurations, grants a fair comparison between the two placements when the resources are stressed. The tests were performed mainly with configurations of two and four concurrent containers. To be more precise, each time we refer to an isolated configuration, the virtual machine can benefit from at most 1 VCPU, which is directly mapped to a single KVM process on the host that can get at most 100% of CPU utilization (a single process running in a multi-core system of 16 cores may request at most 1600% CPU utilization). The same isolated virtual machine can benefit from 1024 MB of RAM. In an equivalent and proportional manner, a single virtual machine hosting two Warden containers has been associated with 2 VCPUs and 2048 MB of RAM, in order to grant each single container 100% of CPU utilization within the VM and a fair amount of memory, totally equivalent to a single isolated VM; that configuration has then been compared with the isolated approach, iterating all the tests using two containers hosted in the single VM and using two different VMs capped at a single VCPU, as Figure 8.2 shows. For a configuration of four containers, the VM hosting all the Warden containers has been configured to use 4 VCPUs and 4096 MB of memory


Figure 8.2: Two VCPU deployment for test

and has been compared to a deployment of four identical isolated VMs of 1 VCPU running the same tests, as Figure 8.3 displays. This fair resource association lets us understand and measure in which cases

Figure 8.3: Four VCPU deployment for test

the same exact amount of resources, in terms of memory and CPU, is better utilized; in addition, we try to understand and find those applications that can benefit from a separate operating system at runtime. The answer is not always obvious or predictable, as many factors are involved; moreover, it depends on the character of the application and its resource needs. Each time we measure the performance of two or four running applications, we start the applications at the same time, no matter whether they execute in the same virtual machine or in different ones; this has to be considered to keep a fair comparison between concurrent executions of the applications. In addition, we need to keep in mind that with a single iso-


lated virtual machine we are just providing a separate operating system at the small cost of a single VCPU and a few megabytes of RAM, while when we co-locate multiple containers within the same instance, we are using the same ideal amount of computational resources, but the virtualized operating system has less strictly limited system resources and has to handle more context switches due to the increased number of processes.

8.1.1 CPU-intensive application

To measure the real difference between the isolated approach and the co-located one, we decided to use the Whetstone Benchmark [63] to generate a high load on the CPU and measure a probable difference, in configurations with a total of two and four VCPUs. The amount of resources has been split accordingly

Figure 8.4: Whetstone Benchmark host score

among two separate isolated VMs and four isolated VMs, as opposed to a single VM running with two or four VCPUs but dealing with an equal number of Warden containers running the same task. The test has been repeated scaling the number of threads started by each benchmark instance running within a Warden container, from one to thirty-two; in this way we tried to generate a context-switch overhead and looked for a feasible, real


benefit in adopting one of the two different approaches when a large number of threads is running. Each instance of the application runs the benchmark and then gets a score, a number that measures the Millions of Whetstone Instructions Per Second (MWIPS): a MWIPS value represents a score obtained from the calculation of different floating-point tests (MFLOPS, IF MOPS, COS MOPS, EQUAL MOPS, FIXPT MOPS, EXP MOPS), a result determined by measuring the time it takes to perform some sequences of floating-point instructions. Generally the benchmark takes advantage of an increasing number of CPU cores; this means that increasing the number of threads according to the number of physical cores grants a higher score [64]: as an example, an Intel Core i7 930 (a four-core CPU with Hyper-Threading) can generally achieve a score between three thousand and four thousand in a single-thread configuration. Figure 8.4 reports the host score, in order to provide a comparison parameter and a real result representing how the benchmark can benefit from an increased number of threads on a multi-core CPU available to the native operating system, without considering the virtualization layer (as an additional overhead). As is clear in the Figure, the host fully exploits its computational cores and the increasing number of threads lets the benchmark reach a higher score, taking advantage of the multiple cores of the CPU. Figures 8.5 and 8.6 compare the scores and execution times of the isolated execution and the co-located one, when we cap the test at two VCPUs. We should not jump to any hasty conclusions: on the axis of ordinates we have a magnification of the values, represented by a range between 3780 and 3900. It is clear, from the results, that there is no real significant performance difference between the two deployments; the gap, in terms of benchmark score achieved, is really reduced and not meaningful. It is important to consider that each container in a separate virtual machine can benefit from one single VCPU at most, while a co-located container, in a bigger-sized virtual machine, can virtually benefit from two virtual CPUs; however, due to the concurrent run, it can obtain at most one single VCPU. The virtual CPU dedicated to the single virtual machine is a stronger boundary, compared to the bigger-sized VM.


Figure 8.5: Whetstone Benchmark two VCPU average container score

Figure 8.6: Whetstone Benchmark two VCPU average execution time

As of now, Warden cannot be associated with CPU cycles nor with CPU cores; this means that we have multiple applications that can virtually see multiple cores, but, due to the concurrent execution, they reach a peak of resource utilization that is comparable with the single, isolated virtual approach. The two scores on the chart point out how a CPU-intensive application can generally get the same performance in a co-located environment as in a single isolated one. A CPU benchmark does not seem to find a real benefit in a totally separate and isolated VM.


Figure 8.7: Whetstone Benchmark four VCPU average container score

Similarly, in Figures 8.7 and 8.8, we can observe a negligible gap between the two different approaches when we cap the limit at a total of four virtual CPUs.

Figure 8.8: Whetstone Benchmark four VCPU average execution time

Briefly, in terms of CPU-intensive applications, the isolated placement and the additional abstraction provided by a separate operating system do not guarantee real performance increases; however, we should also note that the additional overhead added by the virtualization layer does not reduce the final performance. Whenever we add stricter isolation, we request new virtual machines; a possible drawback might come up


if we consider the reduced computational power associated with them; nevertheless, the dedicated instance with a single VCPU takes advantage of the reduced environment and makes better use of the dedicated resources.

8.1.2 Network-intensive application

In order to measure the network performance we can get from one approach or the other, we decided to use a tool like Iperf [65], a program written in C++, well known as an instrument to generate network traffic over the TCP and UDP protocols. This tool has been really essential to measure the

Figure 8.9: Two VCPU deployment iperf test

differences in terms of bandwidth between the isolated instances and the co-located ones. While the Warden containers acted as clients and generated the traffic (an operation that puts load on the CPU during UDP communication), on the other side of the communication the servers registered the network bandwidth and the number of packets received, as shown in Figure 8.9. Considering the Figure, we show both the isolated and co-located placements with a configuration of two VCPUs; however, the tests on connection speed and protocol were run at two different times, keeping the two measurements totally separated. The tests tried to gauge the TCP and UDP speeds at different bandwidths. Obviously both configurations, isolated or not, share the same network interface, as the compute nodes are capped at a 1 Gbit/s interface. This limitation turned out to be interesting, considering that we compare multiple virtual machine instances to a gradually bigger one with more computational resources.
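As a reference, the traffic generation inside each Warden container follows the standard iperf 2 usage pattern; the server address, the test duration and the UDP target rate below are illustrative assumptions, since the exact values used during the runs are not reported here.

  # Traffic generators run inside each Warden container; a matching
  # "iperf -s" (or "iperf -s -u" for UDP) server records the bandwidth
  # and the number of packets received on the other side.
  SERVER = '10.0.0.5'                              # hypothetical address of the server

  system("iperf -c #{SERVER} -t 60")               # TCP throughput for 60 seconds
  system("iperf -c #{SERVER} -u -b 100M -t 60")    # UDP at a 100 Mbit/s target rate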


Most of the network load balancing among the different VMs is done by the Virtual Machine Monitor, which is in charge of dispatching the packets to the right virtualized OS during the communication. In Figure 8.10 we can distinguish the real difference between the two configurations: thanks to a fair VMM load balancing between the instances, we can clearly see how the single isolated containers can reach a higher average bandwidth compared to the co-located ones. This is not surprising: the single virtualized instances can benefit from a fairer round-robin load balancing on the virtualized network interface, while the multiple containers inside a single instance have to compete for the same medium and thus inevitably reach a lower bandwidth. All the virtual machines run on the same host in the end, therefore the real nominal bandwidth is split between the different instances. The gaps

Figure 8.10: TCP average Bandwidth

between the two different scenarios let us clearly understand that the more we co-locate containers within a single virtual machine, the less each of them can take of the available bandwidth during a TCP communication. In the first case, considering two VCPUs, we can quantify an improvement of 8%, while in the second case, where four VCPUs are involved and, consequently, more containers are communicating at the same time inside a single virtual machine, we can get a 20% increase in terms of bandwidth for each container with the isolated placement.


When it comes to monitoring the UDP performance, a true benefit does not appear clearly. As we can see in Figures 8.11 and 8.12, the bandwidth used during the transfer is almost the same,

Figure 8.11: UDP two VCPU average Bandwidth

Figure 8.12: UDP four VCPU average Bandwidth

with only a very small improvement for the isolated container deployment. However, it is important to consider that the generation of UDP packets, during an iperf UDP communication, puts a lot of load on the CPU. Considering that CPU-intensive applications perform almost the same in both approaches, it is partially understandable why the isolated containers, running in separate VMs, do not benefit from that placement.


8.1.3 Disk I/O-intensive application

The purpose of this test is to evaluate the benefit of the isolation when multiple containers, or single containers running within different operating systems, access the same logical disk. The test is an ad hoc application that ascertains how fast a set of multiple threads writes the results of a simple calculation to local files. Each instance of the application, running in a separate Warden container, spawns many threads (15) and each of them is responsible for writing its results to its own file, saved on the local logical disk; a sketch of this writer is given below. The test runs continuously for ten minutes and repeats the elaboration for many iterations. In this situation, we can clearly understand the real benefit of a separate execution context when we isolate the local storage, as the number of files handled by an operating system is limited. If we think about four different containers running on the same operating system and trying to operate on sixty files at the same time, there is likely to be a real disadvantage. Moreover, the container isolation does not replicate the virtual disk used by each container, causing a straightforward drawback when multiple instances of the same application try to access the same disk concurrently.
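The writer itself is described only at a high level; the following Ruby sketch is therefore an assumed reconstruction of its structure, with fifteen threads that repeat a simple calculation and append the result to their own file for a fixed amount of time.

  THREADS  = 15
  DURATION = 600   # seconds: the test runs continuously for ten minutes

  workers = (1..THREADS).map do |i|
    Thread.new do
      deadline = Time.now + DURATION
      File.open("result-#{i}.dat", 'w') do |f|
        while Time.now < deadline
          # A simple calculation whose result is written straight to disk; the real
          # workload of the test is not specified, so this computation is illustrative.
          value = (1..1_000).reduce(0) { |sum, n| sum + Math.sqrt(n) }
          f.puts(value)
          f.flush                     # force the write through to the file system
        end
      end
    end
  end

  workers.each(&:join)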

Figure 8.13: Disk intensive I/O write disk speed

In Figure 8.13, scaling the number of virtual CPUs, it is evident how well the isolation grants better performance when multiple applications run


within a separate and virtualized operating system. Referring to only two containers executing in the same virtualized operating system, we can benefit from a 38% increase in write speed when we choose the isolated placement for multiple threads and simultaneous writes. In addition, scaling up the number of concurrent applications running within a node and the total number of VCPUs used, the performance increment reaches a peak of 178%. Similarly, scaling the number of logical CPUs associated and, consequently, the

Figure 8.14: Disk intensive I/O total average execution time

number of simultaneous applications, a significant gap is highlighted. With only two instances of the same application we can benefit from approximately a 50% increase in terms of performance, but increasing the number of coexisting instances to just four lets us achieve an improvement at least three times better. By the use of mount, Warden generates a logical disk for each of the containers; each container can benefit from a separate logical disk layered on top of the operating system one. However, the isolation provided via containers grants a separation in terms of ownership and visibility of folders and files, but does not guarantee the same performance while accessing the medium. A stronger separation and improvement is clearly visible when the isolated approach is chosen: a new virtual operating system, provided for each isolated VM, guarantees a different logical disk to work on each time, a logical storage unit that is not layered on top of an existing one.


It is clear how a separate operating system for each container can really help to keep the same performance when the same logical storage is accessed concurrently, with multiple accesses at the same time.

8.1.4 Distributed Application

To test the performance of a more complex application, closer to real behavior, we designed a specific application based on the communication of data over a TCP socket between a controller (server) and a worker (client). Basically we have a set of operations to be done on a large set of

Figure 8.15: Two VCPU deployment distributed computation test data: the server, whenever a client connects to it, sends a chunk of data that requires the elaboration; meanwhile the client keeps receiving the message, elaborate parts of it and sends back to the server the single nal result. Each time the worker starts calculating the result, the controller keeps sending new data that requires elaboration while the socket is left open (on purpose) and a load on the worker CPU (due to the operations) is set. We measure the average speed reached, during the transmission of the chunk to be elaborated by the worker; but keep in mind that the communication is continuous and the controller tries continuously to send the data, no matter


Figure 8.16: Average communication Bandwidth sending the chunk

For each run we executed the application for 10 iterations of the communication and calculation. With this test we want to verify the benefits of a separate execution environment when both CPU load and concurrent access to the same virtualized operating system interface are involved. As we can see in Figure 8.16, we have a real difference between the two different placement techniques.

Figure 8.17: Average execution time

The isolated instances, running in separate VMs, benefit fully from the separate operating system, resulting in an average advantage of approximately 30% in both scenarios,


while in terms of execution time, again, the workers running on the isolated virtual machines take advantage of the separate OS and reach an average improvement of 40%, as shown in Figure 8.17.

8.1.5 Media Stream over network

Specific tests on CPU and network performance were not the only ones conducted: to simulate a multimedia stream over the network and a continuous communication between a client and a server application, we decided to test the average speed obtained by streaming media content, such as an audio file, over the network from a server to the client application, as displayed in Figure 8.18. The streaming communication starts after the client connects to the

Figure 8.18: Two VCPU deployment media stream test

server application; the service then reads a local media file and starts the communication. With this test we stress and pay more attention to multiple open connections (sockets); for this reason, each client opens multiple connections and tries to receive the same stream from the streaming application running inside the container. The number of simultaneous connections has been scaled from five to twenty in order to show how a single co-located


approach might worsen its performance when more open sockets are added on the same virtualized network interface provided by the single VM. In Figure 8.19 a speed comparison is shown between the two approaches when we limit the number of VCPUs to two. As we can see, the real speed

Figure 8.19: Average connection speed during media transfer two VCPU

obtained during the communication slowly decreases when we add more threads per application instance. While in the isolated environment this performance decrease occurs gradually, in the co-located approach we have a more abrupt decrease and a generally slower connection speed, due to two drawbacks: a larger number of threads and concurrent access to the same virtual network interface. Limited to the test with two VCPUs, we can have an average improvement of approximately 40%. In the same way, in Figure 8.20, it is apparent how in the isolated environment we have a smaller decrease in the connection speed, while four containers running within the same OS are already overworked and limited to a lower speed that is around half of the isolated one. Clearly, opening multiple connections on the same virtualized network interface, in the same operating system, and streaming content does not guarantee the same performance as the isolated approach.


Figure 8.20: Average connection speed during media transfer four VCPU

8.1.6 Multi-tier Application

With this kind of test we tried to measure the benefit of the isolated approach when multi-tier applications are involved. A sample J2EE application, running in an Apache Tomcat Web server, has been our choice. The

Figure 8.21: Two VCPU deployment Multi-tier test

application consists of a JSP front-end, served by a servlet that relies on the Hibernate and Spring frameworks. Each time a request is received, the application queries a database via Hibernate and then provides as a result a list


of entries from a single table. This sample tries to measure how much a web application with a lot of overhead, due to the different layered tiers, might benefit from the isolated placement. The test has been repeated scaling the number of concurrent requests issued by multiple clients (each instance of the web application had a separate Apache Benchmark client generating the traffic), for a total of 10000 requests with HTTP keep-alive enabled; an example invocation is sketched below.
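The load generation relies on standard Apache Benchmark options; the endpoint URL and the concurrency steps in the sketch below are placeholders, shown here only to make the test procedure concrete.

  # One Apache Benchmark client per web application instance; -k enables HTTP
  # keep-alive, -n sets the total number of requests, -c the concurrency level.
  APP_URL = 'http://192.168.1.10:8080/sample/list'   # hypothetical endpoint

  [10, 50, 100, 150, 200].each do |concurrency|      # assumed scaling steps
    system("ab -k -n 10000 -c #{concurrency} #{APP_URL}")
  end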

Figure 8.22: Average execution time two VCPU

As we can see in Figure 8.22, the more we increase the number of concurrent requests, the more the total execution time to get a complete answer grows. It is clear how, in the co-located environment, the time delay increases quickly as the number of concurrent requests scales, while in the isolated environment we still have an increase, but a considerably slower one, going from an almost absent difference between the two approaches to a 30% slower answer. Regarding transfer speeds, in Figure 8.23 we can see how the isolated environment performs better when we have a larger number of concurrent requests for the same web resource. The peak transfer speed, for both configurations, is around 100 concurrent requests for each instance of our web application; however, both worsen when we cross 150 concurrent requests per instance. In addition, the gap between the two placements clearly decreases.


Figure 8.23: Average transfer speed two VCPU

Talking about performance, with the isolated placement the virtual machine monitor can generally grant a well-balanced network dispatch among all the separated nodes, and this explains the disadvantages of a co-located approach. In Figure 8.24, under the four VCPU configuration, a gap in total execution time between the two different placements is shown again.

Figure 8.24: Average execution time four VCPU

Again, an isolated virtual machine running a single Warden container with the Apache web server performs better. However, by increasing the number of virtual machines to four, adopting the isolated placement, we


have more requests in total, and this is translated into a smaller improvement between the two approaches. As for the transfer speeds, shown in Figure 8.25, the isolated containers again perform better up to a peak of 50 concurrent requests; beyond this configuration the gap between the two approaches gets thinner.

Figure 8.25: Average transfer speed four VCPU

8.2 Technical Conclusions

As we have discussed in this chapter, we tried to verify the differences and benefits achieved by adopting one of the two placement approaches in a small distributed environment. We first tried to measure the advantages in terms of CPU, network and disk I/O, then we attempted to define some categories of Web applications simulating real use cases. The isolated placement can guarantee real benefits and improvements in certain scenarios, while co-locating the applications typically saves more resources and reaches the same performance in some cases. Regarding CPU-intensive applications and CPU usage, we can clearly understand from the first tests, in Figures 8.5 and 8.7, how an application cannot completely benefit from an isolated VM. The benchmark shows how


close the scores of the two solutions are and how the execution times fit together. However, we must not draw premature conclusions: one of the real advantages of the isolated placement is the resource isolation, meaning that when an application is associated with, and running inside, a separate VM, we are granting it a given computational power. Moreover, if we take into consideration malicious or resource-greedy applications, in a co-located placement we might come across certain unfair situations. Typically, when more Warden containers are executing on the same VM, with multiple VCPUs associated, all the containers can see and take advantage of all the virtual cores; however, if an application recklessly starts using more CPU, the others sharing the same node could suffer from a smaller amount of dedicated computational power. This boundary is strongly present and enforced in the isolated approach, as the amount of resources is totally guaranteed and locked by the VMM. If we slowly shift the stress load onto the network and try to quantify the benefit of the isolation, we can clearly understand how, under a heavy TCP communication, the separate VMs can reach a higher average bandwidth. This result is due to the network load balancing managed by the VMM. When there is higher competition for the same virtual network interface, the average communication speed is lower, as shown in chart 8.10. In the UDP communication test, the gap between the two approaches is limited, but we need to pay attention to the CPU load, as the generation of UDP packets shifts more of the load onto the processing power and, as we have seen, there are no real advantages in that field. When we test the real isolation generated in terms of the local logical disk, we can clearly see and quantify a great improvement. A Warden container puts to use a private file system for many purposes, but that disk is a folder mounted on top of the same operating system disk, like an additional layer. When many concurrent processes, the containers, access the same logical disk with read and write cycles, we can understand how the final performance degenerates. It is clear how the isolated placement can guarantee a separate logical disk and storage to the different containers.


When the load is mixed and the stress on resources is combined, again the isolated placement may guarantee better performance, especially when multiple connections are open (as shown in Figure 8.20) or a large number of requests needs to be served (as seen in Figures 8.24 and 8.25). Understandably, the isolated placement choice is not always the best answer to every need: the cost of this solution is higher, as each time a new instance is allocated in the isolated pool, we are requesting a new VM bound to that specific application. Nevertheless, ensuring specific SLAs for certain applications might be easier, because more tools and configuration parameters exist when it comes to associating a specific VM with pre-allocated resources. Through the modified concept of stack we are introducing a new placement feature, available to the final users, that can fully scale in a small or large-scale deployment; thanks to the modular Cloud Foundry components and the infrastructure layer offered by the cloud, we can achieve a scalable isolation for those applications that do require a different placement. Moreover, the boundaries set up by the virtualization guarantee computational power and memory in a stricter way, letting the applications take full advantage of the hardware resources. The solution presented represents a completely new placement technique, fully integrated in Cloud Foundry and compatible with the latest releases of the PaaS. The results show how, for certain applications, the new placement can really improve performance and throughput, at the cost of a single, smaller dedicated VM. The isolation provided via the VMM is an additional placement and deployment approach that can really extend Cloud Foundry's capabilities and offerings, at the minimum cost of additional virtual machines.

Conclusions and Future work


In this thesis we explored and analyzed one of the best-known PaaS: Cloud Foundry, a state-of-the-art platform that is gaining great momentum and growing really fast. The project is becoming an enterprise-class platform standard and many big companies, such as IBM, SAP, Rackspace, Intel and Canonical, are actively joining the project. With this work we shared and discussed a proposed change for a new placement technique, measuring its benefits in different scenarios on a distributed open PaaS deployment. By exploiting the isolation offered by the virtualization layer, we provided a second choice to Cloud Foundry users and allowed the management of pools of DEA nodes under the Cloud Foundry stack concept. This new solution assists SLA-constrained applications, as the boundaries set up by a VMM are more solid and efficient than the ones supplied by a container management system. Although many advanced features are still missing in Cloud Foundry, the project has great momentum and is becoming established as one of the most open PaaS, with a great community and user base. The proposed idea and implementation have been discussed and presented to Christopher Ferris, distinguished engineer and CTO for Cloud Interoperability at IBM; the presented work has been taken into consideration for future Cloud Foundry releases. The results of the tests demonstrate real benefits in a small-scale system, but the same or better performance can certainly be achieved in large-scale systems, thanks to the loosely coupled architecture based on network communication and cloud infrastructure. The advanced capability added can be improved with a more reliable and compatible auto-scaling feature, based on the BOSH deployer itself, instead


of the limited, OpenStack-tied one we tested. The BOSH project is more tightly coupled to Cloud Foundry and could integrate really well a new auto-scaling mechanism, compatible with all the IaaS providers and leveraging the different CPIs. With more extensive tests, an auto-provisioning feature could be implemented, allowing a more correct and precise tailoring of VMs to the specific Web applications pushed on Cloud Foundry; an integrated service could dynamically categorize the different applications and recommend the best placement, providing the right resources at deployment time. Finally, the Warden project could be extended to integrate many of the interesting features offered by Docker, in order to speed up the staging process, sometimes slow, and the execution of the droplets, by adding the concepts of registry and repository. The new isolation offered represents a new approach for application deployment and separation in Cloud Foundry, a new way to comply with different and strict SLAs, a new offering that grants stronger boundaries and resource allocation.

Bibliography
[1] K. Rajani Kanth, Worldwide PaaS revenue to touch $2.9 billion in 2016: Gartner, available: http://www.business-standard.com/article/technology/worldwide-paas-revenue-to-touch-2-9-billion-in-2016gartner-112111900148_1.html
[2] William Voorsluys, James Broberg, and Rajkumar Buyya, Cloud Computing - Principles and Paradigms, Hoboken, NJ: John Wiley & Sons, 2011.
[3] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems, 25:599-616, 2009.
[4] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia, Above the Clouds: A Berkeley View of Cloud Computing, Electrical Engineering and Computer Sciences, University of California at Berkeley, Feb. 2009.
[5] R. Uhlig et al., Intel virtualization technology, IEEE Computer, 38(5):48-56, 2005.
[6] P. Barham et al., Xen and the art of virtualization, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, New York, 2003, pp. 164-177.


[7] P. Mell and T. Grance, The NIST Definition of Cloud Computing, National Institute of Standards and Technology, Information Technology Laboratory, Technical Report Version 15, 2009.
[8] L. Youseff, M. Butrico, and D. Da Silva, Toward a unified ontology of cloud computing, in Proceedings of the 2008 Grid Computing Environments Workshop, 2008, pp. 1-10.
[9] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. Foster, Virtual infrastructure management in private and hybrid clouds, IEEE Internet Computing, 13(5):14-22, September-October 2009.
[10] Amazon Web Services, Customer Success. Powered by the AWS Cloud., available: http://aws.amazon.com/solutions/case-studies/
[11] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, The Eucalyptus open source cloud computing system, in Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2009), Shanghai, China, pp. 124-131, University of California, Santa Barbara, September 2009.
[12] OpenStack, Open source software for building private and public clouds, available: https://www.openstack.org/
[13] Appistry Inc., Cloud Platforms vs. Cloud Infrastructure, White Paper, 2009.
[14] L. Youseff, M. Butrico, and D. Da Silva, Toward a unified ontology of cloud computing, in Proceedings of the 2008 Grid Computing Environments Workshop, 2008, pp. 1-10.
[15] B. Hayes, Cloud computing, Communications of the ACM, 51:9-11, 2008.


[16] Joep Ruiter, Martijn Warnier, Privacy Regulations for Cloud Computing: Compliance and Implementation in Theory and Practice, in Computers, Privacy and Data Protection: an Element of Choice, 2011, pp. 361-376.
[17] Google, Google Apps for Business, available: http://www.google.it/intx/en/enterprise/apps/business/
[18] Microsoft, Office 365, available: https://mspartner.microsoft.com/en/us/pages/solutions/office-365.aspx
[19] Google, Google Cloud Platform, available: https://cloud.google.com
[20] Cloud Foundry, Open PaaS, available: http://www.cloudfoundry.org/
[21] Cloud Foundry Community, Get Involved and Contribute, available: http://cloudfoundry.org/get-in/index.html
[22] Cloud Foundry, Explaining The Magic Triangle, available: http://blog.cloudfoundry.com/2011/04/14/explaining-themagic-triangle
[23] Cloud Foundry, Adding a Service, available: http://docs.cloudfoundry.com/docs/dotcom/adding-aservice.html
[24] Steve Herrod, Cloud Foundry - Delivering on VMware's Open PaaS Strategy, available: http://blogs.vmware.com/vmware/2011/04/cloud-foundrydelivering-on-vmwares-open-paas-strategy.html
[25] James Aspnes, Costas Busch, Shlomi Dolev, Panagiota Fatourou, Chryssis Georgiou, Alex Shvartsman, Paul Spirakis, Roger Wattenhofer, Eight open problems in Distributed Computing, in EATCS Bulletin, Number 90, October 2006, viii+248 pp.


[26] Derek Collison, client.rb source code, available: https://github.com/derekcollison/nats/blob/master/lib/nats/client.rb#L123
[27] Derek Collison, Cloud Foundry - Open PaaS Deep Dive, available: http://blog.cloudfoundry.com/2011/04/19/cloud-foundryopen-paas-deep-dive/
[28] Cloud Foundry, runner.rb source code, available: https://github.com/cloudfoundry/cloud_controller_ng/blob/master/lib/cloud_controller/runner.rb#L104
[29] Cloud Foundry, base.rb source code, available: https://github.com/cloudfoundry/cloud_controller_ng/blob/master/lib/cloud_controller/rest_controller/base.rb
[30] Cloud Foundry, Warden, available: http://docs.cloudfoundry.com/docs/running/architecture/warden.html
[31] Cloud Foundry, Custom Buildpacks, available: http://docs.cloudfoundry.com/docs/using/deployingapps/custom-buildpacks.html
[32] Cloud Foundry, BOSH Stemcell source code, available: https://github.com/cloudfoundry/bosh/tree/master/boshstemcell
[33] Cloud Foundry, BOSH Package, available: http://docs.cloudfoundry.com/docs/running/bosh/reference/packages.html
[34] Cloud Foundry, BOSH Job, available: http://docs.cloudfoundry.com/docs/running/bosh/reference/jobs.html


[35] Cloud Foundry, BOSH Monit source code, available: https://github.com/cloudfoundry/bosh/tree/f41902c05341f55b7728a3a8e4910e6e4b1b7071/stemcell_builder/stages/bosh_monit
[36] Cloud Foundry, BOSH Agent source code, available: https://github.com/cloudfoundry/bosh/tree/master/bosh_agent
[37] Cloud Foundry, cf-release source code, available: https://github.com/cloudfoundry/cf-release
[38] NTT Labs, nise bosh source code, available: https://github.com/nttlabs/nise_bosh/
[39] Cloud Foundry, bosh-lite source code, available: https://github.com/cloudfoundry/bosh-lite
[40] Iwasaki Yudai, cf nise installer source code, available: https://github.com/yudai/cf_nise_installer
[41] Cloud Foundry, Deploying Micro BOSH, available: http://docs.cloudfoundry.com/docs/running/deployingcf/openstack/deploying_microbosh.html
[42] OpenStack, Configuring the Compute API, available: http://docs.openstack.org/grizzly/openstack-compute/admin/content//configuring-compute-API.html
[43] Cloud Foundry, Deploying Cloud Foundry on OpenStack, available: http://docs.cloudfoundry.com/docs/running/deployingcf/openstack/
[44] Cloud Foundry, Cloud Foundry Developers Google Group, available: https://groups.google.com/a/cloudfoundry.org/forum/#!forum/vcap-dev


[45] David Shue, Michael J. Freedman, Anees Shaikh, Performance isolation and fairness for multi-tenant cloud storage, in OSDI 2012, Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pp. 349-362.
[46] Jeyakumar, V., Alizadeh, M., Mazieres, D., Prabhakar, B., Kim, C., EyeQ: Practical network performance isolation for the multi-tenant cloud, in Proc. of USENIX HotCloud 2012.
[47] Nuwan Goonasekera, William Caelli, Colin Fidge, A Hardware Virtualization Based Component Sandboxing Architecture, Journal of Software, 7(9), pp. 2107-2118, Dec. 2012.
[48] Kamezawa Hiroyuki, Cgroup and Memory Resource Controller, Japan Linux Symposium, Nov. 2008.
[49] Joe Zonker Brockmeier, Containers vs. Hypervisors: Choosing the Best Virtualization Technology, available: http://www.linux.com/news/technology-feature/virtualization/300057-containers-vs-hypervisors-choosingthe-best-virtualization-technology
[50] Linux Foundation, Xen Project, available: http://www.xenproject.org/
[51] KVM, Kernel Based Virtual Machine, available: http://www.linux-kvm.org/page/Main_Page
[52] QEMU, QEMU - Open-Source Processor Emulator, available: http://wiki.qemu.org/Main_Page
[53] Chen YiFei, Lightweight virtualization docker in practice, available: http://www.slideshare.net/dotCloud/docker-in-praticechenyifei-28898132


[54] Paul Menage, CGroups, available: https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
[55] Red Hat, How Control Groups Are Organized, available: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html#sec-How_Control_Groups_Are_Organized
[56] LXC, Linux Containers, available: http://linuxcontainers.org/
[57] Cloud Foundry, Warden source code, available: https://github.com/cloudfoundry/warden
[58] Michael Kerrisk, Namespaces in operation, part 1: namespaces overview, available: http://lwn.net/Articles/531114/
[59] Cloud Foundry, stacks source code, available: https://github.com/cloudfoundry/stacks
[60] Cloud Foundry, app_stager_task.rb source code, available: https://github.com/cloudfoundry/cloud_controller_ng/blob/3a1907d2941ac5ff1a4568be9b0068b9cdbff16f/lib/cloud_controller/app_stager_task.rb#L98
[61] Amazon AWS, AWS CloudFormation, available: http://aws.amazon.com/cloudformation/
[62] OpenStack, Heat, available: https://wiki.openstack.org/wiki/Heat
[63] Roy Longbottom, Whetstone Benchmark Detailed Results on PCs, available: http://freespace.virgin.net/roy.longbottom/whetstone%20results.htm


[64] Roy Longbottom, Four Core Eight Thread Computing Benchmarks, available: http://www.roylongbottom.org.uk/quad%20core%208%20thread.htm
[65] NLANR/DAST, Iperf, available: http://iperf.sourceforge.net/
