Terraform - Automating Infrastructure As A Service
Michael Howard
Computer Science Department
Portland State University
Portland, USA
mihoward@pdx.edu

arXiv:2205.10676v1 [cs.SE] 21 May 2022

Abstract— Developing a software service requires a strict
software development life cycle and process. This process demands
controlling all application code through source control
management as well as a rigorous versioning and branching
strategy. However, the platform and infrastructure also benefit
from this rigor. Software services must be deployed to a target
run time environment and provisioning that environment through
manual user actions is tedious and error-prone. Provisioning
manually also becomes prohibitive as the number of resources
grows and spreads globally over multiple regions. The answer is to
apply the same rigor to provisioning the infrastructure as applied
to developing the application software. Terraform provides a
platform allowing infrastructure resources to be defined in code.
This code not only allows the automation of the infrastructure
provisioning but also allows for a strict development and review
life cycle, same as the application software.
Index Terms—cloud computing, terraform, infrastructure, pro-
visioning

I. INTRODUCTION
Terraform is an Infrastructure As Code (IaC) client tool
developed by HashiCorp. It allows the user to define both
cloud and on-premise compute resources in human-readable
configuration files. These files are created using the HashiCorp
Configuration Language (HCL). The syntax is declarative with
each block of code defining a resource to be provisioned.
Declarative definitions (versus imperative) allow the user to
define the desired state, rather than an exhaustive list of all
the interim steps required to achieve that state.
Section II discusses typical Terraform workflows and how
the tool is used. Section III digs more into the structure of
the HCL code. Various platforms that can run the Terraform
code are discussed in Section IV while the provider plugins
are expanded upon in Section V. Section VI details the Cloud
Development Kit interfacing Terraform to multiple high-level
programming languages. The alternatives to Terraform are
examined in Section VII. Other related topics of research are
discussed in Section VIII which leads to a summary discussion in
Section IX.

II. BASIC WORKFLOWS
Figure 1 introduces the basic workflow for Terraform. The
development and implementation of IaC is broken into write,
plan and apply stages.

Fig. 1: High-level workflow for Terraform covering write, plan
and apply stages.

A. Write
The write stage focuses on developing the code required to
drive the plan. The infrastructure resources are defined here
for the providers and services required. Multiple providers
may be included. The code is contained in configuration
files with a .tf extension. There is an additional option to
store the configuration in JavaScript Object Notation (JSON)
format which then requires the .tf.json extension. The correct
extension must be used in order for the Terraform tool to detect
the configuration file while generating the plan.
For example, a user wishes to create a mini-cluster of
virtual machines in Amazon Web Services (AWS). Three virtual
machines, a Virtual Private Cloud (VPC) network, a security
group and a load balancer service are required for this cluster.
The Terraform configuration code declares each of these
resources separately with corresponding parameters. Section
III goes into more detail of the configuration language.

B. Plan
Once the configuration code has been written, the next stage
is to run the Terraform tool to generate a plan. The tool is run
either through the local Command Line Interface (CLI) or via
another high-level language that can interface to the Terraform
framework via the Cloud Development Kit (CDK). Section VI
digs into further detail of the CDK. Running the Terraform
plan will scan the working directory for configuration files ending
in .tf or .tf.json and process these into a list of actions to be
sent to the provider(s). This list is called the execution plan
and encompasses all create, update and destroy actions needed
to make the target infrastructure match what is declared in the
configuration code.
The plan generation additionally has a dependency on the
existing infrastructure that is represented in the state. The
state details all infrastructure resources that are currently
present. The state file exists either locally in the file system or
remotely. For example, the configuration contains an Elastic
Compute Cloud (EC2) virtual machine named "VirtualMachine1".
Upon running Terraform to generate the plan, the current state
file is checked. If the VirtualMachine1 EC2 instance already
exists, the plan does not create it. The planned action will be
either an update or no operation.

C. Apply
The final stage in the Terraform workflow is to apply.
Running the tool on a plan executes each action against the
corresponding provider. Figure 2 shows Terraform interacting
with its provider plugin which subsequently calls into the
Application Programming Interface (API) of the corresponding
cloud provider (e.g. AWS). Provider modules act as the
abstraction between the configuration code and the unique API
defined by each infrastructure provider. The providers are
further discussed in Section V. As part of apply, the state is
updated to represent the changes in the target infrastructure.

Fig. 2: Terraform actions go through a provider module to
translate into API calls specific to that provider.

III. CONFIGURATION LANGUAGE
Terraform scans configuration files and generates a
corresponding plan. The configuration files are written in the
HashiCorp Configuration Language (HCL). This language is
declarative. A declarative language offers an advantage over an
imperative one in that the desired state of the infrastructure
can be directly coded. An imperative language requires defining
all the interim steps to arrive at the desired state.
The main purpose of HCL is to define resources. The code is
written in blocks with each block representing an infrastructure
object. A Terraform configuration is a complete document in
HCL telling Terraform how to manage a given collection of
infrastructure resources. Figure 3 shows example code that
declares the required provider plugin as well as a VPC and
corresponding subnet. The purpose of the block is defined by
the block type. A variable block defines a value that is used as
a parameter in other blocks. A provider block defines which
provider plugin is used to translate the resource block into API
calls to the infrastructure provider. The resource block is used
to define a concrete resource in the provider's infrastructure.
Resource blocks are translated into create, update or delete API
calls to the provider's target infrastructure service.

Fig. 3: An HCL example declaring the required provider, a VPC
and subnet.

IV. FRAMEWORK ENVIRONMENT
Environments that can run Terraform are the CLI, Terraform
Cloud, Terraform Enterprise and CDK. The CLI is the most
common. Pre-built binaries can be downloaded or the Golang
source code¹ can be cloned and built. The Terraform tool
runs on a local set of configuration files. These files can be
organized into subdirectories, which Terraform loads as modules
when the root configuration references them. The state file is
typically generated in the same directory that the tool runs
from. However, there is a remote option which generates state
files in a remote, central location such that multiple Terraform
clients may apply their plans and still synchronize their view
of the existing infrastructure.

¹ https://github.com/hashicorp/terraform

Terraform Cloud is a Terraform environment hosted by
HashiCorp. As a hosted service, users log in to generate and
apply plans. The Terraform tool-chain itself is maintained by
HashiCorp while the state files are centrally stored such that all
users are running against the same current infrastructure state.
Terraform Enterprise is a self-hosted version of Terraform
Cloud. It offers the same cloud-based feature set but is
designed to be deployed within an enterprise’s private cloud.
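As a rough illustration of the remote state option described in this section, a configuration can name a shared backend so that every client reads and writes the same state. The sketch below assumes an existing S3 bucket serves as the central location; the bucket name, key and region are hypothetical values chosen for illustration and do not come from the paper.

```hcl
# Minimal sketch, assuming an existing S3 bucket is used as the
# shared state location. All values below are hypothetical.
terraform {
  backend "s3" {
    bucket = "example-terraform-state"    # assumed bucket name
    key    = "cluster/terraform.tfstate"  # path of the state object in the bucket
    region = "us-west-2"                  # assumed region of the bucket
  }
}
```

With a block of this kind in place, running terraform init sets up the backend, and later plan and apply runs from any client work against the same stored state rather than a local file.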
Another version of the CLI or local environment is the
Cloud Development Kit (CDK). Rather than running the CLI
tool directly, CDK permits five supported high-level languages
to generate and apply Terraform plans. Code in these supported
languages is able to call in to the Terraform framework,
replacing the Terraform CLI. Section VI provides further
details on CDK.
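To make the configuration files consumed by these environments more concrete, the following is a minimal HCL sketch in the spirit of the example described for Figure 3, extended with the three virtual machines from the cluster example in Section II. It is not the paper's figure: the resource names, CIDR ranges, region, instance type and the ami_id variable are assumptions chosen for illustration, and the security group and load balancer are left out for brevity.

```hcl
# Illustrative sketch only; every name and value here is an assumption.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws" # provider plugin fetched from the public registry
    }
  }
}

# Variable blocks define parameters that other blocks can reference.
variable "region" {
  type    = string
  default = "us-west-2"
}

variable "ami_id" {
  type        = string
  description = "Machine image for the cluster nodes (supplied by the user)"
}

# The provider block selects and configures the plugin.
provider "aws" {
  region = var.region
}

# Resource blocks declare the desired infrastructure objects.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "main" {
  vpc_id     = aws_vpc.main.id # reference creates an implicit dependency on the VPC
  cidr_block = "10.0.1.0/24"
}

# Three identical virtual machines placed in the subnet.
resource "aws_instance" "node" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.main.id

  tags = {
    Name = "VirtualMachine${count.index + 1}"
  }
}
```

The blocks only state the desired end result. The ordering of the create calls (VPC, then subnet, then instances) is derived by Terraform from the references between blocks, which is the practical meaning of the declarative style discussed in Section III.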

V. PROVIDERS
Terraform relies on plugins called providers to interact with and
abstract the various infrastructure providers. Each provider
must be declared in the configuration using the "provider"
block type. Once a provider has been declared, the corresponding
plugin is included while generating the plan. Declared
resources utilize the provider's underlying API to perform the
create, update and delete actions needed to ensure the resource
ends up in the desired state.
Providers come from a publicly available registry of known
plugins². The list is extensive and covers a wide range of cloud,
Software as a Service (SaaS) and other APIs. These provider
plugins allow the resource and data source blocks to be
declared without needing details on the specific provider's
API. Each provider maintains documentation on the Terraform
blocks it supports along with the corresponding parameters.

² https://registry.terraform.io/browse/providers

VI. CLOUD DEVELOPMENT KIT
The Cloud Development Kit (CDK) for Terraform allows
the use of other programming languages to define and provision
infrastructure. CDK gives access to the entire Terraform
ecosystem without requiring development in HashiCorp
Configuration Language (HCL) and running it via the CLI tool.
Additionally, a user can more easily integrate with an existing
tool-chain for testing and dependency management. The
following languages are currently supported:
• TypeScript
• Python
• Java
• C#
• Go
Figure 4 shows the various input pathways to Terraform.
CDK may be invoked from its five supported languages while
configuration code in HCL or JSON requires the Terraform
CLI. Kubernetes' Custom Resource Definitions (CRDs) are
another possibility but will not be covered here.

Fig. 4: CDK and other pathways to define configuration, input
to Terraform and provision infrastructure through multiple
providers. Configuration input may be through CDK, CRDs, HCL
or JSON.

VII. ALTERNATIVES
Terraform provides an abstraction of providers and their
resources. It can represent physical hardware, virtual machines,
containers, network configurations, email and Domain Name
Service (DNS) providers. Given the breadth of resources and
providers, Terraform does overlap with other tools. Some of
these tools will be discussed here.

A. Chef and Puppet
Chef³ and Puppet⁴ are configuration management tools.
They are designed to install and manage software on compute
resources that already exist. Terraform instead focuses on the
bootstrapping and initializing of those compute resources. It
works well in conjunction with configuration management.

³ https://www.chef.io/
⁴ https://puppet.com/

B. CloudFormation and Heat
CloudFormation⁵ and Heat⁶ are both tools that represent
infrastructure as code, just like Terraform. The configuration
files allow the infrastructure to be elastically created, modified
and destroyed. The big advantage which Terraform provides
is that it is provider-agnostic. CloudFormation is an Amazon Web
Services (AWS) tool and only works with provisioning other
AWS resources. Heat similarly operates only on an OpenStack
API. Terraform not only supports multiple providers but can
also combine resources from each into a single plan. Thus, it
enables multi-cloud provisioning.

⁵ https://aws.amazon.com/cloudformation/
⁶ https://docs.openstack.org/heat/latest/

Another feature which Terraform has over CloudFormation
and Heat is the separation of the plan and execution. Terraform
has a distinct stage to generate a plan which also takes into
account the existing state of the infrastructure. The plan is
then optionally reviewed and approved before the apply stage
executes each plan action. Terraform also has a graph feature
which displays the plan actions ordered by dependency.
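The provider-agnostic point made in this subsection can be sketched with a single configuration that declares two providers and one resource from each, so that both end up in the same plan. This is a minimal illustration under stated assumptions: the choice of AWS and Google Cloud, the project name, the regions and the bucket names are all hypothetical and are not taken from the paper.

```hcl
# Illustrative sketch only; provider choice and all values are assumptions.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    google = {
      source = "hashicorp/google"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

provider "google" {
  project = "example-project" # hypothetical project identifier
  region  = "us-west1"
}

# One storage bucket per cloud, declared side by side.
resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-aws"
}

resource "google_storage_bucket" "assets" {
  name     = "example-assets-gcp"
  location = "US"
}
```

Running terraform plan on a configuration like this would list the actions for both providers in one execution plan, which can then be reviewed before terraform apply carries them out.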

C. Boto and Fog
Boto⁷ and Fog⁸ are similar to provider plugins in Terraform.
They abstract the API to a particular infrastructure provider.
Both still require some high-level programming language to
declare the resources and call in to their respective libraries. In
contrast, Terraform abstracts this functionality as an extensive
set of plugins and provides the high-level configuration
language to allow declaring the resources in a provider-agnostic
manner.

⁷ http://boto.cloudhackers.com/en/latest/
⁸ https://github.com/fog/

VIII. RELATED WORK
The Organization for the Advancement of Structured
Information Standards (OASIS)⁹ was founded in 1993 as a
non-profit consortium that works on the development,
convergence and adoption of open standards for cybersecurity,
cloud computing and related areas [2]. The two standards
that are relevant to this paper are Topology and Orchestration
Specification for Cloud Applications (TOSCA) and Cloud
Application Management for Platforms (CAMP).

⁹ https://www.oasis-open.org/

A. TOSCA
TOSCA is an open standard that describes a topology of
cloud-based web services, their components, relationships and
the processes that manage them. Version 1.0 was approved 16
January 2014 by OASIS. The standard enables portability and
automated management across cloud providers regardless of
the underlying infrastructure. This standard improves
reliability and reduces cost while facilitating the continuous
delivery of applications across their entire lifecycle.
Version 2.0 is the current version and was approved 28 October
2020 [3]. The core specification provides a language for
describing service components and their relationships using
a service topology, and it provides for specifying the lifecycle
management procedures that allow for creation or modification
of services using orchestration processes. A TOSCA
Service Template, as shown in Figure 5, combines topology
and orchestration needed in different environments to enable
automated deployment of services and their management
throughout the complete service lifecycle.

Fig. 5: Structural Elements of a TOSCA Service Template and
its Relations.

The TOSCA language has the ability to automate lifecycle
management for the following:
• Infrastructure as a Service deployments for multiple cloud
providers (e.g. OpenStack, AWS, Microsoft Azure).
• Deploy containerized applications to existing orchestrators
(e.g. Kubernetes).
• Define the management of Virtual Network Functions.
• Support on-demand creation of network services.
• Define Functions as a Service applications that can run
without any corresponding deployment.
• Deploy services to Internet of Things (IoT) and Edge
devices while minimizing latency.
• Support open and interoperable process control
architectures.
Implementations of the TOSCA standard can take the form
of:
• Source code: a YAML Ain't Markup Language (YAML)
document which defines the Service Template.
• Processor: a tool or engine to parse the Service Template
document.
• Orchestrator: a tool or engine that processes the Service
Template in order to deploy and manage an application.
• Translator: a tool to translate the Service Template into
another language such as Helm Charts or Amazon
CloudFormation templates.
• Generator: a tool to generate a Service Template.
• Archive: Cloud Service Archive (CSAR), a package
containing the Service Template and other artifacts needed
for deployment.
Links to all known TOSCA implementations are listed in the
oasis-open repository in GitHub¹⁰. Also, a full list of TOSCA
technical committee members may be viewed at the oasis-open
membership page¹¹.

¹⁰ https://github.com/oasis-open/tosca-community-contributions/wiki/Known-TOSCA-Implementations
¹¹ https://www.oasis-open.org/committees/membership.php?wg_abbrev=tosca

B. CAMP
The Cloud Application Management for Platforms (CAMP)
is another standard to come from the OASIS consortium. Its
technical committee published version 1.0 in August 2012; the
work was a collaboration between CloudBees, Cloudsoft
Corporation, Huawei, Oracle, Rackspace, Red Hat, and Software AG
[4]. The technical committee was closed by OASIS on 23
April 2021 and is no longer active. As CAMP is referenced
frequently in cloud computing research, we will include it here
for completeness.
The CAMP technical committee's goal was to advance an
interoperable protocol that packages and deploys cloud-hosted
applications. The standard defines models, mechanisms and
protocols for the management of a Platform as a Service
(PaaS) environment. PaaS describes a service where the users
manage the platform that applications are hosted on. In
contrast, the TOSCA standard focuses on Infrastructure as
a Service (IaaS). PaaS exists a level above IaaS and thus
CAMP and TOSCA are complementary standards rather than
overlapping.

IX. CONCLUSION AND DISCUSSION
In this paper, we discuss Terraform as an Infrastructure as
Code (IaC) tool. Its framework allows the user to declare
infrastructure resources through code, generate an execution
plan from that code and finally apply the plan. Applying works
through provider plugin modules which translate the execution
actions into API calls specific to the provider. Not only is the
infrastructure created in a programmatic manner, it is agnostic
of the underlying provider (e.g. AWS, private cloud, VMware,
Microsoft Azure). This abstraction avoids vendor lock-in and
increases portability between vendors.
The Terraform configuration language is discussed as well
as the possible environments that can interpret and run the
corresponding code. These environments range from local
execution to both cloud and private hosted services. Provider
modules allow interfacing to almost any third-party infrastructure
vendor or service while the Cloud Development Kit (CDK)
enables a user to integrate with the Terraform framework from
five popular high-level programming languages. Configuration
management tools such as Chef, Ansible and Puppet focus
more on automating the software and configurations within
an infrastructure resource and thus do not directly compete
with Terraform. Amazon's CloudFormation and OpenStack's
Heat do compete. However, their frameworks only support
their own infrastructure and cannot provision resources from
other providers.
As of this writing, Terraform focuses on the provisioning
of infrastructure and services. The building and deployment
of software applications to that infrastructure requires
additional tool-chains and automation. A product that oversees an
entire operation, from multi-cloud infrastructure provisioning
to application deployment and runtime orchestration, does not
exist at this time. One promising possibility is Topology and
Orchestration Specification for Cloud Applications (TOSCA).
TOSCA is a standard developed by the Organization for the
Advancement of Structured Information Standards (OASIS)
consortium. Part of the standard is a specification language
that allows users to create YAML-based Service Templates.
These templates declare nodes, workflows and relationships
such that a more complete picture may be specified for the
total operation of an application and its required infrastructure.
TOSCA implementations include template code interpreters,
orchestrators, translators and archive tools. OpenStack Heat
is an example of a TOSCA-compatible framework but is still
limited in its breadth and focuses mostly on the provisioning.
Further research in TOSCA is required, both to expand and
mature the standard and to bring a new suite of framework
tools into the mainstream that implement it.

REFERENCES
[1] HashiCorp, Terraform Documentation. Accessed: May 2, 2022. [Online].
Available: https://www.terraform.io/docs
[2] Wikipedia, OASIS (organization). Accessed: May 6, 2022. [Online].
Available: https://en.wikipedia.org/wiki/OASIS_(organization)
[3] Oasis Open, TOSCA Version 2.0. Accessed: May 7, 2022. [Online].
Available: https://docs.oasis-open.org/tosca/TOSCA/v2.0/TOSCA-v2.0.html
[4] Oasis Open, CAMP Charter. Accessed: May 7, 2022. [Online].
Available: https://www.oasis-open.org/committees/camp/charter.php
