The Future of Cloud Economics: Cost Savings with Automated AWS Resource Management

Executive Summary

Cloud adoption is now critical to business success, and controlling cloud costs has become a significant challenge for organizations. This whitepaper examines why cloud cost management is difficult and introduces automated optimization strategies to address it. It outlines practical steps, a technical architecture, and a real-world case study to guide managers in implementing effective cost optimization. Applied consistently, these practices give managers better control over cloud expenses, improve operational efficiency, and can deliver substantial cost reductions.

Start optimizing your cloud costs today by integrating automated solutions and a FinOps framework tailored to your organization’s needs.

Introduction

Cloud computing offers unparalleled scalability and flexibility, but these benefits come with the challenge of managing costs. Many organizations struggle with overprovisioned resources, underutilized instances, and idle services, leading to inflated cloud bills. Traditional cost management practices, which often rely on manual monitoring and adjustments, are time-consuming and prone to errors.

Context

The objective of our automated AWS resource management tool is to address these challenges by providing a scalable, efficient, and automated solution for optimizing cloud costs. This tool identifies underutilized, overutilized, and idle resources within AWS environments and performs cost-saving actions such as resizing, terminating, or pausing resources based on usage patterns.

Solution Overview

AWS Lambda

Lambda functions are the core of our tool’s logic. They analyse resource usage metrics and determine whether a resource is underutilized, overutilized, or idle. Based on this analysis, Lambda triggers appropriate actions such as resizing or terminating resources.
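
As an illustration of the kind of classification such a function might perform, here is a minimal Python sketch. The function name and the 20% and 80% thresholds are assumptions made for this example; only the 1% idle threshold appears later in this paper, and the tool's actual logic is not published here.

```python
from statistics import mean

# Hypothetical thresholds for illustration; only the 1% idle value
# is taken from the EC2 workflow described in this paper.
IDLE_CPU_PCT = 1.0
UNDERUTILIZED_CPU_PCT = 20.0
OVERUTILIZED_CPU_PCT = 80.0


def classify_utilization(cpu_samples):
    """Label a resource from a list of average CPU percentages."""
    if not cpu_samples:
        return "unknown"  # no data: never act on a resource we cannot measure
    if max(cpu_samples) < IDLE_CPU_PCT:
        return "idle"  # every sample is below the idle threshold
    avg = mean(cpu_samples)
    if avg < UNDERUTILIZED_CPU_PCT:
        return "underutilized"
    if avg > OVERUTILIZED_CPU_PCT:
        return "overutilized"
    return "ok"
```

Returning "unknown" for missing data is a deliberate choice in this sketch: a resource without metrics is skipped rather than treated as idle.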

Tool Architecture

Key Components:

The core of our cost optimization tool is built around a series of AWS Lambda functions, each responsible for a specific aspect of the optimization process. These functions are orchestrated through Amazon EventBridge and monitor resource utilization through Amazon CloudWatch and other AWS APIs.
Amazon EventBridge
EventBridge is responsible for orchestrating the scheduling of tasks and responding to specific events. It triggers Lambda functions at predefined intervals or in response to events such as CloudWatch alarm state changes, ensuring that resource management actions are executed when they are needed.
Amazon CloudWatch
CloudWatch monitors resource usage across the AWS environment, collecting metrics such as CPU utilization, memory usage, and network activity. (For EC2 instances, memory metrics are only available when the CloudWatch agent is installed.) These metrics are fed into Lambda functions, where they are analysed to identify optimization opportunities.
Amazon DynamoDB
DynamoDB stores the state of each resource (e.g., Action Status) and tracks the progress of optimization tasks.
AWS Compute Optimizer
Compute Optimizer provides recommendations for EC2 instances, helping identify overprovisioned resources.

How this approach aligns with the Best Practices

Serverless Architecture
Using AWS Lambda for automation reduces operational overhead and costs, as we only pay for the compute time that is consumed. This is a key principle in cost-efficient architectures.

Event-Driven Automation
EventBridge enables event-driven automation, ensuring that these cost optimization processes are triggered automatically based on predefined schedules or specific events, minimizing the need for manual intervention.
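
To make the two trigger types concrete, the sketch below shows how a Lambda handler could tell a scheduled invocation apart from a CloudWatch alarm state change. The `source` and `detail-type` values are the ones EventBridge uses for these events; the task names ("scan", "evaluate", "ignore") and the function names are hypothetical and not taken from the tool.

```python
def route_event(event):
    """Return the (hypothetical) task name an incoming event should start."""
    source = event.get("source")
    detail_type = event.get("detail-type")

    # Events emitted by an EventBridge scheduled rule.
    if source == "aws.events" and detail_type == "Scheduled Event":
        return "scan"

    # Events emitted when a CloudWatch alarm changes state.
    if source == "aws.cloudwatch" and detail_type == "CloudWatch Alarm State Change":
        state = event.get("detail", {}).get("state", {}).get("value")
        return "evaluate" if state == "ALARM" else "ignore"

    return "ignore"


def lambda_handler(event, context):
    """Lambda entry point: report which task the event maps to."""
    return {"task": route_event(event)}
```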

Data-Driven Decisions
Leveraging CloudWatch Metrics to make informed decisions based on actual usage data is a best practice. It ensures that any rightsizing or termination of resources is based on concrete metrics, reducing the risk of unnecessary downtime or resource wastage.
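
A minimal sketch of how such a usage figure could be retrieved is shown below. It assumes a client object that behaves like `boto3.client("cloudwatch")` is passed in; the function name, the hourly period, and the 7-day default are choices made for this example rather than details of the tool.

```python
from datetime import datetime, timedelta, timezone


def average_cpu(cloudwatch, instance_id, days=7, now=None):
    """Average EC2 CPUUtilization over the last `days` days, or None without data.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    end = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    points = [p["Average"] for p in response.get("Datapoints", [])]
    return sum(points) / len(points) if points else None
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before it is pointed at a real account.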

Holistic Approach
By combining Lambda, EventBridge, CloudWatch, DynamoDB, and a well-thought-out tagging strategy, we have created a comprehensive and automated cost optimization solution that addresses both operational efficiency and cost-effectiveness.

Flexibility and Customization
Using DynamoDB for state management makes the tool flexible and customizable: thresholds, statuses, and per-resource progress live in data rather than in code, so the tool can adapt to different environments and requirements.
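
One way to picture the state record is as a small state machine. The sketch below uses the status values from the EC2 workflow described later in this paper; the attribute names `ResourceId` and `ActionStatus`, the function name, and the strict one-step ordering are assumptions made for illustration.

```python
# Status values come from the EC2 workflow in this paper; the strict
# one-step ordering is an illustrative assumption.
ALLOWED_TRANSITIONS = {
    None: {"Alerted"},
    "Alerted": {"Right Sized"},
    "Right Sized": {"Stopped"},
    "Stopped": {"Terminated"},
    "Terminated": set(),
}


def next_state_item(item, new_status):
    """Return an updated copy of a state record, or raise on an illegal jump."""
    current = item.get("ActionStatus")
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {new_status!r}")
    return {**item, "ActionStatus": new_status}
```

In a real table, the same guard could plausibly be enforced at write time with a DynamoDB condition expression, so that two concurrent Lambda invocations cannot both advance the same resource.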

Detailed Implementation

Our tool systematically identifies underutilized, overutilized, and idle resources by analysing metrics over a specified period (e.g., 7 days). Here is how it works for different resource types:

Expanding the Tool to Other AWS Services

RDS Instances
The tool monitors CPU utilization, memory usage, and IOPS for RDS instances. If an instance is consistently underutilized, the tool triggers a resize operation to a smaller instance class, reducing costs without impacting performance.
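
"Consistently underutilized" can be read in several ways. The sketch below takes a strict reading: every daily value of every monitored metric must stay below its threshold for the whole window. The function name, metric keys, thresholds, and the 7-day minimum are hypothetical.

```python
def is_consistently_underutilized(daily_samples, thresholds, min_days=7):
    """True only if every metric stays below its threshold on every day.

    `daily_samples` maps metric name -> list of daily values;
    `thresholds` maps metric name -> upper limit for "underutilized".
    """
    for metric, limit in thresholds.items():
        samples = daily_samples.get(metric, [])
        if len(samples) < min_days:
            return False  # not enough history to justify a resize
        if any(value >= limit for value in samples):
            return False  # at least one busy day: leave the instance alone
    return True
```

For example, with thresholds such as `{"cpu_pct": 20, "memory_pct": 40, "iops": 100}`, a single day above any limit keeps the instance at its current class.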

EBS Volumes
The tool identifies idle EBS volumes by monitoring their read/write operations and current state. If a volume has been idle and in the ‘available’ (unattached) state for an extended period, it is flagged for backup and deletion to reduce unnecessary storage costs.
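
The selection rule can be sketched as a filter over volume descriptions. The dictionaries below follow the shape of items in the EC2 DescribeVolumes response (`VolumeId`, `State`, `Attachments`, `CreateTime`); the function name, the 30-day default, and the use of `CreateTime` as a stand-in for "extended period" are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def find_idle_volumes(volumes, ops_by_volume, min_idle_days=30, now=None):
    """Return IDs of unattached volumes with no recorded I/O in the window.

    `ops_by_volume` maps volume ID -> total read + write operations
    observed over the monitoring window (gathered separately).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_idle_days)
    idle = []
    for vol in volumes:
        unattached = vol["State"] == "available" and not vol.get("Attachments")
        # Assumption: volume age is used as a rough proxy, because the
        # volume description does not say when it was last detached.
        old_enough = vol["CreateTime"] <= cutoff
        no_io = ops_by_volume.get(vol["VolumeId"], 0) == 0
        if unattached and old_enough and no_io:
            idle.append(vol["VolumeId"])
    return idle
```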

EC2 Instances
For EC2 instances, the tool leverages AWS Compute Optimizer recommendations. It evaluates whether instances are overprovisioned based on historical usage patterns. Overprovisioned instances may be resized to a smaller instance type, while idle instances are stopped or terminated, depending on the usage threshold.
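
As a rough sketch of how recommendations could be turned into resize targets, the function below works on items shaped like the `instanceRecommendations` entries returned by Compute Optimizer's GetEC2InstanceRecommendations API. The function name is hypothetical, and the finding string is normalized defensively because its exact casing is treated as an assumption here.

```python
def pick_downsize_targets(recommendations):
    """Map instance ARN -> recommended type for overprovisioned instances."""
    targets = {}
    for rec in recommendations:
        # Normalize so both "Overprovisioned" and "OVER_PROVISIONED" match.
        finding = rec.get("finding", "").replace("_", "").lower()
        options = rec.get("recommendationOptions", [])
        if finding != "overprovisioned" or not options:
            continue
        # The lowest rank is treated as the preferred option.
        best = min(options, key=lambda o: o.get("rank", float("inf")))
        if best["instanceType"] != rec.get("currentInstanceType"):
            targets[rec["instanceArn"]] = best["instanceType"]
    return targets
```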

Detailed Optimization Process for EC2 Instances

Scan Lambda
This function is triggered by an EventBridge rule to periodically scan all EC2 instances tagged with CO-Tool=Yes. It checks whether the instance is part of an Auto Scaling Group (ASG), Elastic Kubernetes Service (EKS), or Elastic Beanstalk environment. If the instance is not part of these services and is overprovisioned according to AWS Compute Optimizer recommendations, the function updates the instance’s status in DynamoDB to “Alerted.”
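
The paper does not say how membership in an ASG, EKS cluster, or Elastic Beanstalk environment is detected. One plausible approach, sketched below, is to look for the tags those services normally place on their instances; treat the tag list as an assumption to verify in your own environment. The input follows the shape of an instance in the EC2 DescribeInstances response.

```python
# Tags that managed services typically add to their instances (assumed
# detection method; the paper does not specify one).
MANAGED_TAG_KEYS = (
    "aws:autoscaling:groupName",          # Auto Scaling Group member
    "elasticbeanstalk:environment-name",  # Elastic Beanstalk environment
    "eks:cluster-name",                   # EKS managed node group
)
MANAGED_TAG_PREFIXES = ("kubernetes.io/cluster/",)  # self-managed EKS nodes


def is_scan_candidate(instance):
    """True if the instance is opted in (CO-Tool=Yes) and not service-managed."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    if tags.get("CO-Tool") != "Yes":
        return False
    for key in tags:
        if key in MANAGED_TAG_KEYS or key.startswith(MANAGED_TAG_PREFIXES):
            return False
    return True
```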

Right Size Lambda
Once an instance is marked as “Alerted,” this function evaluates its recent utilization, specifically checking CPU usage. If the instance has been idle (CPU utilization < 1%) for the past hour and meets the right-sizing criteria, the function stops the instance, right-sizes it based on Compute Optimizer recommendations, and then restarts it. The status in DynamoDB is updated to “Right Sized.”
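
The stop, resize, restart sequence could look roughly like the sketch below. It assumes an `ec2` object that behaves like `boto3.client("ec2")` and a list of CPU datapoints for the past hour gathered elsewhere; the function names are hypothetical, and error handling is omitted for brevity.

```python
def is_idle(cpu_datapoints, threshold_pct=1.0):
    """True if there is data and every datapoint is below the threshold."""
    return bool(cpu_datapoints) and all(p < threshold_pct for p in cpu_datapoints)


def right_size_if_idle(ec2, instance_id, new_type, cpu_datapoints):
    """Resize an idle instance; return True if the resize was performed.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    if not is_idle(cpu_datapoints):
        return False
    ec2.stop_instances(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    ec2.start_instances(InstanceIds=[instance_id])
    return True
```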

Stop Lambda
For instances marked as “Right Sized,” this function re-evaluates their idle status. If the instance remains idle for another hour, it is stopped to further save costs, and the status in DynamoDB is updated to “Stopped.”

Terminate/Remove Lambda
This function processes instances with a “Stopped” status. Before terminating an instance, it takes a backup: if the instance has multiple EBS volumes, it creates snapshots of those volumes; otherwise, it creates an AMI of the instance. It then terminates the instance and updates the status in DynamoDB to “Terminated.”
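
The backup-then-terminate ordering can be expressed as a plan of steps. The sketch below only builds that plan and calls no AWS API; the step names and the function name are hypothetical, and it assumes that termination follows the backup in both branches and that the tool waits for the backup to complete first.

```python
def plan_termination(instance_id, volume_ids):
    """Return the ordered (action, target) steps for retiring an instance."""
    if len(volume_ids) > 1:
        # Multiple volumes: snapshot each volume individually.
        steps = [("create_snapshot", v) for v in volume_ids]
    else:
        # Single volume: capture the whole instance as an AMI.
        steps = [("create_image", instance_id)]
    steps.append(("wait_for_backup", instance_id))  # assumed safety step
    steps.append(("terminate_instance", instance_id))
    steps.append(("set_status", "Terminated"))
    return steps
```

Separating the plan from its execution also makes it straightforward to log or review what would happen before anything destructive runs.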

Notify Lambda
Throughout the process, this function generates notifications summarizing the actions taken by the tool. It collects relevant information from DynamoDB and sends detailed reports to the operations team.
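
A report of this kind could be assembled from the state records along the following lines. The record attribute names (`ResourceId`, `ActionStatus`), the function name, and the report layout are assumptions for this sketch; delivery (for example through SNS) is left out.

```python
from collections import defaultdict


def build_report(records):
    """Group state records by status and return a plain-text summary."""
    by_status = defaultdict(list)
    for record in records:
        by_status[record["ActionStatus"]].append(record["ResourceId"])

    lines = ["Cost optimization summary"]
    for status in sorted(by_status):
        ids = ", ".join(sorted(by_status[status]))
        lines.append(f"{status} ({len(by_status[status])}): {ids}")
    return "\n".join(lines)
```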

Tagging Strategy
Ensure Consistent Tagging: All resources must be tagged according to the agreed-upon tagging strategy, specifically with CO-Tool=Yes.

Automate Tagging Compliance: Use AWS Config to monitor and enforce tagging policies across all relevant AWS accounts.
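
One way to monitor the tag with AWS Config is its managed `required-tags` rule. The sketch below only builds the rule definition; the rule name, function name, and resource scope are hypothetical, and the paper does not state which Config rules the tool actually relies on.

```python
import json


def required_tag_rule(tag_key="CO-Tool", tag_value="Yes",
                      resource_types=("AWS::EC2::Instance",)):
    """Build a definition for the AWS-managed REQUIRED_TAGS Config rule."""
    return {
        "ConfigRuleName": "require-co-tool-tag",  # hypothetical name
        "Source": {"Owner": "AWS", "SourceIdentifier": "REQUIRED_TAGS"},
        # Input parameters are passed to Config as a JSON string.
        "InputParameters": json.dumps(
            {"tag1Key": tag_key, "tag1Value": tag_value}
        ),
        "Scope": {"ComplianceResourceTypes": list(resource_types)},
    }
```

The resulting dictionary is intended to be passed as the `ConfigRule` argument of the Config client's `put_config_rule` call. A rule like this reports non-compliant resources; correcting them requires a separate remediation step.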

Practical Checklist: Planning your own Implementation

Resource Identification
- Configure Lambda Functions: Set up Lambda functions to scan resources based on tags and usage patterns.
- EventBridge Schedules: Define and configure EventBridge rules to trigger the cost optimization process according to the required frequency.
Pre-Execution Validation
- Confirm Resource Dependencies: Verify that targeted resources are not part of critical groups like ASGs, EKS clusters, or Beanstalk applications unless intended.
- Backup Mechanisms: Ensure backups (e.g., AMIs or EBS snapshots) are in place before performing destructive actions like termination or resizing.
Optimization Execution
- Rightsizing EC2 Instances: Implement Lambda functions to automatically adjust the size of EC2 instances based on AWS Compute Optimizer recommendations.
- Terminate Idle Resources: Use Lambda functions to automatically terminate underutilized or idle resources, ensuring that usage metrics have been checked and verified.
Monitoring and Logging
- Set Up CloudWatch Alarms: Implement alarms to monitor both the performance of the tool and the resources it manages.
- Notification Mechanisms: Use SNS or custom Lambda functions to notify relevant teams about the actions taken by the optimization tool.
- Audit Logs and Actions: Regularly review the state records in DynamoDB and the relevant CloudWatch logs and metrics to assess the effectiveness of the optimizations.
Post-Optimization Review
- Continuous Improvement: Adjust Lambda functions, thresholds, and schedules based on feedback and performance data to improve efficiency.

Challenges, Results & Impact

Common Challenges:

Risk of Over-Optimization
- Challenge: Aggressive optimization, such as rightsizing or terminating resources based on strict thresholds, can lead to service disruptions.
- Solution: Implement a “dry-run” mode to simulate optimizations and fine-tune thresholds before applying changes in production.
Handling Exceptions and Custom Cases
- Challenge: The tool may not correctly handle resources with unique configurations or special requirements.
- Solution: Incorporate exception handling within Lambda functions and develop a manual review process for resources flagged as needing special attention.
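
A dry-run mode can be as simple as a wrapper that logs the intended action instead of performing it. The following is a minimal sketch; the function name, logger name, and return shape are hypothetical, and `execute` stands for whatever callable performs the real change.

```python
import logging

logger = logging.getLogger("co_tool")  # hypothetical logger name


def apply_action(action, resource_id, execute, dry_run=True):
    """Run `execute(resource_id)` unless dry_run is set; report what happened."""
    if dry_run:
        logger.info("DRY RUN: would %s %s", action, resource_id)
        return {"resource": resource_id, "action": action, "applied": False}
    execute(resource_id)
    return {"resource": resource_id, "action": action, "applied": True}
```

Defaulting `dry_run` to True means a misconfigured deployment simulates changes rather than applying them.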

Results & Impact:

Cost Savings
- Example: 55% reduction in monthly costs by optimizing EC2 instances alone
Resource Efficiency
- Increased utilization rates across optimized resources
Operational Impact
- Reduced manual effort and detailed reporting for operational awareness

CASE STUDY: Veterinary Care Company

- Collaborated with a large veterinary care company
- Reduced cloud expenses by ~30% in three months using the automated tool
- Rightsized EC2 instances, RDS instances, and EBS volumes
- Improved operational efficiency
- 45% improvement in resource utilization efficiency

Conclusion & Future Potential

The tool is scalable and adaptable to various AWS environments
Future development includes support for more services and refined automation
Continuous optimization is critical for sustained cost efficiency

Author

Pruthav Shingadia
January 29, 2025