
The Importance Of IaaS To DevOps

Cloud computing has been part of the internet service delivery landscape for over a decade, and has now become so ubiquitous that even our friends and family who are not in the industry understand what it is. As “The Cloud” has grown and expanded, so too has the concept of DevOps and what it means within an organization and within the industry.

Infrastructure as a Service (IaaS) is the new paradigm offered by all major cloud providers. Whether you need the dynamic scaling capabilities cloud computing was founded on, or you just need to reduce the cost and overhead of a full-blown datacenter, treating your Infrastructure as Code is the key to deploying resilient, performant web-based services.

In addition to the compute or database resources necessary for your service, your infrastructure includes networking, routing, configuration management, and various third-party services such as resource caching and metrics/monitoring/alerting services. The DNS records necessary for service discovery are a crucial part of your infrastructure, and should be managed via code. In a cloud environment resources can disappear or move frequently, but not necessarily regularly or routinely. By having your services announce their availability via DNS records, those services become and remain discoverable, even when they change location.

While all providers offer a web-based UI for deploying and managing your infrastructure resources, they also offer APIs, and therein lies the power of maintaining your Infrastructure as Code. The dynamic aspect of the cloud demands that your resources be managed programmatically via APIs.

Recently we added Terraform to our quiver for managing the infrastructure we use to deliver our services. Terraform provides API integrations for a plethora of IaaS/PaaS/SaaS providers.
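At its simplest, a Terraform configuration declares which provider API to drive and the resources to manage through it. The sketch below uses AWS; the region, resource names, and AMI ID are illustrative placeholders, not our actual configuration:

# Point Terraform at the provider API it should drive.
provider "aws" {
  region = "us-west-2"
}

# A single compute instance declared as code rather than clicked
# together in a web UI (the AMI ID is a placeholder).
resource "aws_instance" "api_host" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t2.micro"
}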

Managing Cloud Infrastructure with Terraform

When defining our infrastructure in code, we treat the Terraform modules that define it as first-class citizens. The modules live in our code repositories, follow a pull request review pattern, and use our CI/CD build pipeline to push changes automatically on merge after successful test runs.
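Consuming one of those modules from another configuration is a single block; the module name, source path, and input variable below are hypothetical:

# Pull in a module that lives in the same repository.
module "service_network" {
  source = "./modules/network"

  # Input variable assumed to be declared by the module.
  vpc_cidr = "10.0.0.0/16"
}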

Terraform is very handy for managing integration testing infrastructure in an automated fashion, allowing full platforms to be built and destroyed based on testing criteria. Following a well-known pattern from configuration management, we use environment-specific variable files to set values dynamically. Examples of variables that might differ between your integration and production environments are the VPC ID, the subnet where an instance is launched, or the hostname and domain of services.
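As a sketch, an environment-specific variable file for an integration environment might look like this (all values are placeholders):

# integration.tfvars -- values specific to the integration environment.
vpc_id    = "vpc-0123456789abcdef0"
subnet_id = "subnet-0123456789abcdef0"
domain    = "subdomain.testdomain.com"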

It is not only end-to-end testing that requires bringing up a full infrastructure, especially for cloud native deployments. Developers need to be able to repeatedly and reliably build the infrastructure their service relies on. With that infrastructure consistently defined in code, there is no chance that a bespoke handcrafted configuration file will be lost when a resource is destroyed (or becomes unavailable for some reason).

To keep all commands and options for building and testing our infrastructure consistent between developer workstations and our build pipelines, we wrap our Terraform actions in Makefile targets. Here’s an example of a target that applies a Terraform module. Using the -var-file option allows us to override variables with environment-specific values.

tf-apply:
	TF_LOG=${TF_LOG} TF_LOG_PATH=${TF_LOG_PATH} \
	terraform apply \
	-var-file=${TF_VAR_ENV_FILE}

With our automated tests in place, we use our Make targets to build the infrastructure, run the tests, and tear the infrastructure back down (see the tf-destroy sketch below). Destroying test infrastructure when it is no longer needed is crucial to keeping your cloud costs in check.
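The teardown target mirrors tf-apply. This sketch assumes a Terraform version whose destroy command accepts -auto-approve for non-interactive runs (older releases used -force):

tf-destroy:
	TF_LOG=${TF_LOG} TF_LOG_PATH=${TF_LOG_PATH} \
	terraform destroy -auto-approve \
	-var-file=${TF_VAR_ENV_FILE}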

Being able to easily and repeatedly build and destroy infrastructure is important for integration testing. Deploying a fresh, clean infrastructure on each test run increases confidence in the test results: we know that artifacts from a previous test are not affecting the results. Including an environment variable such as TF_VAR_tag_environment for tagging instances with a unique ID allows for concurrent testing on test-run-specific resources.
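On the Terraform side, that environment variable maps onto an input variable (Terraform automatically reads TF_VAR_tag_environment into var.tag_environment), which can then feed resource tags. The instance below is illustrative:

# Populated automatically from the TF_VAR_tag_environment
# environment variable.
variable "tag_environment" {}

# Tagging each resource with the test-run identifier keeps
# concurrent test runs from colliding on shared resources.
resource "aws_instance" "test_host" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI
  instance_type = "t2.micro"

  tags = {
    Environment = var.tag_environment
  }
}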

We use the following pattern in our CI job builder to automatically run tests.

# TEST_RUN_ID stands in for whatever unique identifier your CI system provides
# first stand up some infrastructure
TF_VAR_tag_environment=${TEST_RUN_ID} make -e tf-apply
# run our tests
make our-test-run
# take it all down
TF_VAR_tag_environment=${TEST_RUN_ID} make -e tf-destroy

A successful run hands off to a downstream job for deployment. In the event a test run fails, we skip the destroy step and report the failure. The failed infrastructure remains tagged with the test run ID, making it easy to identify and review.

Having a simple, repeatable infrastructure build process helps our developers too. They can spin up and destroy the necessary resources easily. If devs do not have to painstakingly build up their development infrastructure by hand, it can be destroyed when not in use, again cutting costs. This has the added benefit of exercising the infrastructure-building code dozens of times a day, giving us confidence that we can rebuild our infrastructure successfully at the push of a button.

DNS and CNAMEs Are Your Friends

Having your hostnames consistently available is crucial to the success of your tests. In many cases the recreated resources will have a different IP address from their predecessors. Additionally, cloud resources (especially in AWS) have some rather convoluted names that you wouldn’t want to expose to your end users. This is where DNS and CNAMEs come to the rescue.

Your tests will have hostnames for the services they expect to exist, which is where automated DNS record management comes into play. Using a robust managed DNS service with a programmatic interface for fast, dynamic updates gives you the flexibility and control you need to solve this dilemma. If your cloud platform provider does not offer such a service, you can use an outside provider.

In the DNS zone for your testing domain, you can include records like these:

testapi.subdomain.testdomain.com 30 IN CNAME api-id.execute-api.region.amazonaws.com

or

webhost.subdomain.testdomain.com 30 IN CNAME ec2-12-34-56-78.us-west-2.compute.amazonaws.com

The hostname (the left side of the record) remains constant, while the value of the CNAME can be updated as part of your infrastructure deployment. The “30” in each record is the Time To Live (TTL): how long resolvers may cache the value before looking it up again. With a low TTL you can update the values rapidly, which is valuable in testing scenarios.

Terraform has built-in providers for several managed DNS services, including Dyn, and knows how to map relationships between resources. You can issue a DNS record update when your resources are provisioned, ensuring your test endpoints are reachable. Terraform understands the dependencies between your resources: when you use an attribute like the public_dns of an AWS instance as an input to your DNS record resource, Terraform knows not to issue the DNS update until after the AWS instance has been launched, as sketched below.
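Here is a minimal sketch of that dependency using the Dyn provider’s dyn_record resource. It assumes an aws_instance named web is declared elsewhere in the configuration (older Terraform releases express the reference with “${...}” interpolation syntax):

# Because "value" references the instance's public_dns attribute,
# Terraform creates this record only after the instance is running.
resource "dyn_record" "webhost" {
  zone  = "testdomain.com"
  name  = "webhost.subdomain"
  type  = "CNAME"
  ttl   = "30"
  value = aws_instance.web.public_dns
}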

Recreating Your Infrastructure

Infrastructure as Code plays a crucial role in any organization’s recovery or response plans. With your infrastructure defined in a manner that has been tested (as part of your automated testing), you can be confident that you have the necessary resources and mappings defined. You no longer need to fear that the bespoke, hand-fed pet server with 500+ days of uptime may die.

In the event of a cloud resource becoming unavailable, you can quickly respond to your monitoring alerts by re-deploying a known working infrastructure. Terraform has the added advantage of keeping track of the state of your infrastructure, and will only rebuild the resources that do not match the expected state.
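In practice that recovery is the standard plan-and-apply cycle run against the existing state; the variable file named here is the same hypothetical one used above:

# Show which resources have drifted from (or disappeared out of) the
# desired state recorded in Terraform's state file...
terraform plan -var-file=integration.tfvars

# ...then recreate only those resources.
terraform apply -var-file=integration.tfvars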

Just as your applications need to be cloud-aware and resilient to ephemeral resources, so must your infrastructure.

It’s the cloud, things are going to evaporate. Be prepared.



Whois: Lisa Hagemann

Lisa Hagemann is a Senior Automation Engineer at Oracle Dyn Global Business Unit, a pioneer in managed DNS and a leader in cloud-based infrastructure that connects users with digital content and experiences across a global internet.