Monitor Amazon VPC CIDR Resource Usage with IPAM, CloudWatch, and Serverless

Bishr Tabbaa
Published in Towards AWS
4 min read · Dec 2, 2022
IPAM Observability

I had an enterprise customer that ran Kubernetes workloads on EKS in the AWS cloud and regularly ran out of VPC subnet address space during large autoscaling events. The impact of running out of address space was lower-than-expected performance at best and infrequent downtime at worst. They asked me how they could solve the problem in a modern, proactive manner, without technical debt, and in a way that would integrate with their existing observability toolchain.

Amazon VPC IP Address Management (IPAM)

Amazon VPC IPAM makes it easier for you to plan, track, and monitor IP addresses for your AWS workloads, and it was released around re:Invent 2021.

You can use IPAM to do the following:

  • Organize IP address space into routing and security domains
  • Monitor IP address space that’s in use and monitor resources that are using space against business rules
  • View the history of IP address assignments in your organization
  • Automatically allocate CIDRs to VPCs using specific business rules
  • Troubleshoot network connectivity issues
  • Enable cross-region and cross-account sharing of your Bring Your Own IP (BYOIP) addresses
Amazon VPC IPAM Overview [source: AWS]

IPAM is enabled as a regional construct and should be set up in the management account where the AWS Organization is typically defined for a multi-account landing zone. An IPAM is a composite entity that consists of scopes; by default there is a private scope for all private address space and a public scope for all public address space. Scopes enable you to reuse IP addresses across multiple unconnected networks without causing the IP address overlap and conflict that can occur as your workloads and their interconnectivity grow across accounts, regions, and VPCs. Within scopes you create pools. A pool is a collection of contiguous IP address ranges, or CIDRs, grouped according to your routing and security needs (e.g. dev vs prod, or EMEA vs NAMER vs APAC geography). Within IPAM pools you allocate CIDRs to AWS resources. An allocation is a CIDR assignment from a pool to a resource or to another IPAM pool. For example, you can create a VPC and choose an IPAM pool for the VPC’s CIDR to enforce network governance rules at an organizational, enterprise level. You can also share IPAM pools using AWS Resource Access Manager (RAM), create additional scopes beyond the public and private defaults, and monitor CIDR usage using a dashboard UI and CloudWatch metrics.
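To make the pool and allocation idea concrete, here is a toy model using only Python's standard ipaddress module. It is a sketch of the concept, not how IPAM itself selects CIDRs: the function name allocate_from_pool, the "prod" pool, and the example ranges are all hypothetical. It picks the first block of the requested size in a pool that does not overlap an existing allocation.

```python
import ipaddress


def allocate_from_pool(pool_cidr, existing, new_prefix):
    """Return the first sub-CIDR of pool_cidr with prefix length new_prefix
    that does not overlap any existing allocation, or None if none is free.

    Toy model only: IPAM applies its own allocation rules.
    """
    pool = ipaddress.ip_network(pool_cidr)
    taken = [ipaddress.ip_network(c) for c in existing]
    for candidate in pool.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(t) for t in taken):
            return str(candidate)
    return None


# Hypothetical "prod" pool with two VPC-sized allocations already made.
prod_pool = "10.0.0.0/16"
allocations = ["10.0.0.0/18", "10.0.64.0/18"]
print(allocate_from_pool(prod_pool, allocations, 18))
```

Because every candidate is checked against the existing allocations, two VPCs drawing from the same pool can never receive overlapping ranges, which is the governance property that pools are meant to provide.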

So far, so good, right? Well, as Shakespeare wrote, "ay, there's the rub." While Amazon VPC IPAM provides CloudWatch metrics and alarms for IPAM-managed pools of IP addresses, it does NOT support this feature for address space that is not managed through IPAM pools. That is the case for many customers, since IPAM is a relatively new service released only in December 2021. Customers with existing VPCs, subnets, and other CIDR resources get limited visual observability into those resources, but no automation for self-managed CIDR resources. My customer still needed to be proactive about IP address allocation and usage well before these limited resources were exhausted. A natural approach is to publish metrics on IP address usage, compare them against a user-defined threshold, and then take downstream actions such as sending alerts to staff and feeding the metrics and alerts into the existing observability toolchain.
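The threshold idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code from my solution: the function names and sample subnets are hypothetical, it handles IPv4 only, and it assumes you already have each subnet's free address count (for example, the AvailableIpAddressCount field returned by the EC2 DescribeSubnets API). It also accounts for the five addresses AWS reserves in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def subnet_usage_percent(cidr, available_ips):
    """Percentage of a subnet's usable IPv4 addresses currently in use."""
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - AWS_RESERVED_PER_SUBNET
    used = usable - available_ips
    return round(100.0 * used / usable, 2)


def over_threshold(subnets, threshold_percent):
    """Return (subnet_id, usage) pairs at or above the threshold.

    subnets: iterable of (subnet_id, cidr, available_ips) tuples.
    """
    breaches = []
    for subnet_id, cidr, available in subnets:
        usage = subnet_usage_percent(cidr, available)
        if usage >= threshold_percent:
            breaches.append((subnet_id, usage))
    return breaches


# Hypothetical inventory: one nearly full /24 and one mostly empty /20.
inventory = [
    ("subnet-a", "10.0.1.0/24", 25),
    ("subnet-b", "10.0.16.0/20", 3000),
]
for subnet_id, usage in over_threshold(inventory, 80):
    print(f"{subnet_id} is at {usage}% of usable addresses")
```

Alerting at a percentage rather than a fixed free-address count means the same threshold works for a /24 and a /20, which matters when autoscaling events hit subnets of very different sizes.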

I built a solution that addresses these service gaps for users and customers who have existing VPC resources that are not managed through IPAM address pools. The solution consists of a set of Python functions packaged as a Docker container image for a Lambda function and deployed through the AWS Serverless Application Model (SAM). It can be run as-is as a standalone or scheduled Lambda function and should be installed in the parent/management account of your landing zone. The solution also integrates with other AWS services: it publishes alerts to SNS topics, sends metrics to CloudWatch, and passes IPAM CIDR information as JSON to other Lambda functions as part of a larger composite workflow.

IPAM Monitoring Solution for Self-Managed CIDRs [source: Bishr Tabbaa]

The Python code uses the boto3 EC2 APIs to query IPAM, and then the SNS and CloudWatch APIs for alerts and metrics respectively. The Docker container uses the AWS Lambda Python 3.9 base image. The CloudFormation template is parameterized with IpamUsageThreshold and IpamSnsTopic for convenient deployment, and SAM makes it easier to build and test locally before you deploy to the AWS cloud. You can inspect the complete solution code, build/deploy instructions, and documentation at the GitHub repo below.
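The flow described above might look roughly like the following. This is a minimal sketch and not the code in the repo: the function names, the environment variable names (IPAM_SCOPE_ID, IPAM_USAGE_THRESHOLD, IPAM_SNS_TOPIC), the Custom/IPAM namespace, and the IpUsagePercent metric name are all assumptions chosen for illustration. It relies on the EC2 GetIpamResourceCidrs API, which reports IpUsage as a decimal between 0 and 1, along with CloudWatch PutMetricData and SNS Publish. The AWS clients are passed in as arguments so the helpers can be exercised without an AWS account.

```python
import os


def find_breaches(ec2, scope_id, threshold):
    """Return resource CIDRs in an IPAM scope whose IpUsage (0.0 to 1.0)
    is at or above the threshold. Entries without IpUsage are skipped."""
    paginator = ec2.get_paginator("get_ipam_resource_cidrs")
    breaches = []
    for page in paginator.paginate(IpamScopeId=scope_id):
        for item in page.get("IpamResourceCidrs", []):
            usage = item.get("IpUsage")
            if usage is not None and float(usage) >= threshold:
                breaches.append(item)
    return breaches


def report(cloudwatch, sns, topic_arn, breaches, namespace="Custom/IPAM"):
    """Publish one metric per breaching resource and one summary alert."""
    for item in breaches:
        cloudwatch.put_metric_data(
            Namespace=namespace,  # hypothetical custom namespace
            MetricData=[{
                "MetricName": "IpUsagePercent",  # hypothetical metric name
                "Dimensions": [
                    {"Name": "ResourceId", "Value": item["ResourceId"]},
                ],
                "Value": float(item["IpUsage"]) * 100.0,
                "Unit": "Percent",
            }],
        )
    if breaches:
        lines = [
            f"{b['ResourceId']} {b.get('ResourceCidr', '?')} "
            f"usage={float(b['IpUsage']):.0%}"
            for b in breaches
        ]
        sns.publish(
            TopicArn=topic_arn,
            Subject="IPAM CIDR usage threshold exceeded",
            Message="\n".join(lines),
        )
    return len(breaches)


def handler(event, context):
    """Lambda entry point; the environment variable names are assumptions."""
    import boto3  # imported here so the helpers above stay testable offline

    breaches = find_breaches(
        boto3.client("ec2"),
        os.environ["IPAM_SCOPE_ID"],
        float(os.environ.get("IPAM_USAGE_THRESHOLD", "0.8")),
    )
    count = report(
        boto3.client("cloudwatch"),
        boto3.client("sns"),
        os.environ["IPAM_SNS_TOPIC"],
        breaches,
    )
    return {"breaches": count}
```

Keeping the query and the reporting in separate functions also makes it straightforward to hand the list of breaching CIDRs to another Lambda function as JSON instead of, or in addition to, publishing to SNS.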

Conclusion

AWS observability lets you collect, correlate, aggregate, and analyze telemetry from your network, infrastructure, and applications in cloud, hybrid, or on-premises environments, so you can gain insight into the behavior, performance, and health of the systems your business depends upon. This article described a mechanism for adding network observability to existing, self-managed CIDR resources and workloads running on AWS.

I am interested in community feedback on this approach to monitoring self-managed CIDR resources, and I look forward to extending it in several directions in the future, including integrating the CloudWatch metrics with external third-party tools, demonstrating alert notifications using Slack and Teams, and expanding the set of IPAM metrics.

Enjoy the article? Follow me on Medium and Twitter for more updates.

Architect @awscloud • Board Member • Fractional CTO • Built B2B DNA supply chain stack @GxGene • History of System Failure • AIML Data Science • @RiceAlumni