Cloud Optimization

AWS has reduced prices 67 times since launching in 2006. However, costs increase when consumption increases. It’s important to work on cloud optimization continuously, and Tiqqe can help you on that journey.

Cloud costs tend to creep upward for two reasons. First, there is a natural asymmetry between the provisioning and decommissioning of resources. We always know when we need something, but we don’t always remember when we no longer need it. Because of this, unused resources tend to accumulate. This principle applies at the application portfolio level, in the form of an inflated application count, as well as at the infrastructure level, in the form of unused servers, storage, and other components. We’re not used to actively decommissioning resources because in the fixed-cost model the payoff isn’t as significant, but in the consumption-based model actively decommissioning resources is vital.

Second, performance tuning is typically only performed in response to performance problems. When too few resources are allocated, we feel it in the form of poor application performance, so we add resources, such as upsizing EC2 instances and adding Provisioned IOPS to EBS storage volumes. But when too many resources are allocated it is essentially invisible—perfectly allocated resources and over-allocated resources feel identical in the context of application performance. Performance and cost are directly correlated in a consumption-based model, so we should be looking for opportunities to reduce performance where possible in order to reduce cost. In the fixed-cost model investments in performance are a sunk cost, but in the consumption-based model the costs are recoverable.

Continuous optimization is an iterative process where we implement a set of simple, high-impact cost reduction methods across all applications, and then measure and report the cost savings results. The process is then repeated on a regular cadence. Below are two essential tenets of continuous optimization.

Cost optimization is not a project, it’s a way of life.

We are never finished with continuous optimization. It is integrated into our existing operating procedures and we work to improve the process every cycle. The process is designed to be low-cost and low-overhead. And within those limitations, continuous optimization is designed to find out exactly what level of super-optimization is possible. “How inexpensively can we run each application?” is the question to be answered.

Focus on big impact/low effort.

Each optimization idea should be ranked by its impact/effort ratio. Ideas should be implemented starting from the top of the list and progressing downward until the effort exceeds the impact. Different organizations will draw this line in different places, and it can move over time to suit business priorities. Some of the ideas I’ve implemented are listed below as a starting point.
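The ranking can be sketched in a few lines of Python. This is only an illustration: the Idea class, the plan function, the min_ratio cutoff, and the sample figures are all hypothetical, and it assumes impact and effort have been estimated in the same unit, with effort greater than zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    impact: float  # estimated savings (assumed unit, e.g. USD per month)
    effort: float  # estimated cost to implement, same unit, must be > 0

    @property
    def ratio(self) -> float:
        # Impact/effort ratio used for ranking.
        return self.impact / self.effort


def plan(ideas: list[Idea], min_ratio: float = 1.0) -> list[Idea]:
    """Return the ideas worth doing, best impact/effort ratio first.

    Ideas whose ratio is below min_ratio are dropped; that cutoff is the
    line where effort is considered to exceed impact.
    """
    ranked = sorted(ideas, key=lambda idea: idea.ratio, reverse=True)
    return [idea for idea in ranked if idea.ratio >= min_ratio]


# Made-up numbers, for illustration only.
ideas = [
    Idea("refactor batch pipeline", impact=300, effort=600),
    Idea("delete orphan volumes", impact=400, effort=20),
    Idea("downsize idle instances", impact=900, effort=100),
]

for idea in plan(ideas):
    print(f"{idea.name}: ratio {idea.ratio:.1f}")
```

Raising or lowering min_ratio is how an organization would move the line to match its own priorities.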

Here are three categories of optimization along with several examples of each.

Category 1: Remove

These are the easiest ideas to implement, and they produce the most cost savings.

Remove unused applications. Determine whether the application is really needed. If not, delete all infrastructure and data associated with it.
Remove unused instances. Look for instances that are no longer used, and then shut them down. Amazon CloudWatch metrics can be a useful starting point to discover idle instances.
Remove unused storage volumes. Volumes that are no longer attached to an instance (orphan volumes) are almost never needed. A helpful policy is to require that any orphan volume still needed carry a tag stating who needs it and why. Verify that the remaining orphan volumes are not needed, then remove them.
Remove unused snapshots. Storage and instance snapshots accumulate when there isn’t an active process to remove them. Determine what is needed and remove the rest.
Reallocate or sell unused Reserved Instances (RIs). AWS Cost Explorer is a great tool for finding unused RIs. Either move an on-demand instance to an instance type that is covered by an RI, or resell the RIs on the Amazon EC2 Reserved Instance Marketplace.
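The orphan volume tagging policy above lends itself to a simple check. The sketch below is a minimal illustration, not a finished tool: the function name and the tag keys "owner" and "reason" are hypothetical, and the input dictionaries are modeled on the volume records returned by the EC2 DescribeVolumes API (where a State of "available" means the volume is not attached). It works on plain data so it can be tried without AWS credentials.

```python
def untagged_orphans(volumes, required_tags=("owner", "reason")):
    """Return IDs of unattached volumes that lack any of the required tags."""
    flagged = []
    for vol in volumes:
        if vol.get("State") != "available":  # attached or in transition: skip
            continue
        tag_keys = {tag["Key"] for tag in vol.get("Tags", [])}
        if not set(required_tags) <= tag_keys:
            flagged.append(vol["VolumeId"])
    return flagged


# Made-up sample records, for illustration only.
volumes = [
    {"VolumeId": "vol-0001", "State": "in-use"},
    {"VolumeId": "vol-0002", "State": "available"},
    {
        "VolumeId": "vol-0003",
        "State": "available",
        "Tags": [
            {"Key": "owner", "Value": "data-team"},
            {"Key": "reason", "Value": "audit archive"},
        ],
    },
    {
        "VolumeId": "vol-0004",
        "State": "available",
        "Tags": [{"Key": "owner", "Value": "web-team"}],
    },
]

print(untagged_orphans(volumes))
```

The flagged volumes are candidates for review rather than automatic deletion; someone still has to verify that they are not needed.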

Category 2: Resize

Everything that can’t be deleted should be evaluated to ensure it isn’t overprovisioned.

Resize instances. Use Amazon CloudWatch metrics to determine which instances can be downsized.
Resize storage volumes. Look at storage volume utilization and reduce any unnecessary free space. Re-evaluate any overgenerous free space policies that were carried over from on-premises. It’s possible to be far more efficient with storage in AWS.
Reduce performance of storage volumes. Use Amazon CloudWatch metrics to determine if Provisioned IOPS volumes can be detuned or moved to less expensive non-PIOPS volumes.
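As a rough sketch of how utilization metrics could drive the resize decision, the function below flags instances whose CPU stayed low over a whole sample window. Everything here is an assumption made for illustration: the function name, the 20% average and 50% peak thresholds, and the sample data, which stands in for CPU utilization datapoints exported from CloudWatch. CPU is only one signal, so memory, network, and disk should be checked before downsizing.

```python
from statistics import mean


def downsize_candidates(cpu_by_instance, avg_limit=20.0, peak_limit=50.0):
    """Return instance IDs whose CPU average and peak are both below the limits."""
    candidates = []
    for instance_id, samples in cpu_by_instance.items():
        if not samples:
            continue  # no data for this instance: do not guess
        if mean(samples) < avg_limit and max(samples) < peak_limit:
            candidates.append(instance_id)
    return sorted(candidates)


# Made-up CPU utilization percentages, for illustration only.
cpu_by_instance = {
    "i-aaa": [3.0, 5.5, 4.2, 8.1],      # consistently idle
    "i-bbb": [35.0, 62.0, 48.0, 71.0],  # busy
    "i-ccc": [6.0, 7.0, 55.0, 5.0],     # low average, but spikes above the peak limit
    "i-ddd": [],                        # no datapoints collected
}

print(downsize_candidates(cpu_by_instance))
```

Checking the peak as well as the average avoids downsizing an instance that is mostly idle but needs its full capacity during short bursts.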

Category 3: Refactor

This category should be done less frequently, as it takes more effort and is less likely to produce results after the first pass. However, the first pass will likely produce significant results, so this step should be done at least once. After that, a quarterly or annual review is a reasonable cadence. Look at each application and ensure that the architecture is as efficient as possible. We recommend performing an AWS Well-Architected review.