Implementing an AWS Account Vending Machine

Pascal Euhus
New Work Development
Dec 13, 2022

Automated AWS Account creation on enterprise scale

Abstract

This article demonstrates how to automate the creation of AWS accounts at enterprise scale and what the benefits are. This approach is based on AWS Control Tower.

What the Account Vending Machine is about

Control Tower is a service for centrally managing accounts, compliance, and security; it simplifies these tasks in part by providing predefined rule sets. Control Tower itself is built on various AWS services and orchestrates them, mainly AWS Organizations, Service Catalog, and CloudFormation StackSets. At present, Control Tower does not offer its own API for automated account creation. However, the Service Catalog API can be used to create AWS accounts that are automatically managed by Control Tower.
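To make the Service Catalog route more concrete, here is a minimal sketch of the request that provisions an account through the Account Factory product. The `AccountOrder` shape, the function name, and the `account-<orderId>` naming are assumptions made for illustration. The product name and the parameter keys are those we know from the Account Factory product and should be verified against your own landing zone.

```typescript
// Hypothetical shape of an account order; all field names are assumptions.
interface AccountOrder {
  orderId: string;
  accountName: string;
  accountEmail: string;
  organizationalUnit: string;
  ownerEmail: string;
  ownerFirstName: string;
  ownerLastName: string;
}

// Subset of the Service Catalog ProvisionProduct request used in this sketch.
interface ProvisionProductInput {
  ProductName: string;
  ProvisioningArtifactName: string;
  ProvisionedProductName: string;
  ProvisioningParameters: { Key: string; Value: string }[];
}

// Builds the request for one order. With the AWS SDK for JavaScript v3 the
// result would be passed to ProvisionProductCommand from
// "@aws-sdk/client-service-catalog"; the SDK call itself is left out here.
function buildProvisionProductInput(
  order: AccountOrder,
  artifactName: string, // version name of the product in your Service Catalog
): ProvisionProductInput {
  return {
    ProductName: "AWS Control Tower Account Factory",
    ProvisioningArtifactName: artifactName,
    ProvisionedProductName: `account-${order.orderId}`,
    ProvisioningParameters: [
      { Key: "AccountName", Value: order.accountName },
      { Key: "AccountEmail", Value: order.accountEmail },
      { Key: "ManagedOrganizationalUnit", Value: order.organizationalUnit },
      { Key: "SSOUserEmail", Value: order.ownerEmail },
      { Key: "SSOUserFirstName", Value: order.ownerFirstName },
      { Key: "SSOUserLastName", Value: order.ownerLastName },
    ],
  };
}
```

Keeping the request construction in a pure function makes it easy to unit test without touching AWS.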

Oftentimes, there are minimum requirements for a (new) AWS account, which are closely tied to the context of the company. Compliance requirements and security policies in particular vary considerably between enterprises. What is delivered to the (mostly internal) customers also varies, so a fully configured network setup may or may not be part of a basic account.

With Control Tower, AWS Organizations and CloudFormation StackSets, such requirements can be implemented easily.

If you look at Control Tower from the perspective of an (internal) service provider that wants to offer an AWS account as a service, you quickly run into a few pitfalls. In particular, account creation, which runs on top of Service Catalog, has to be done sequentially, which makes it difficult to fully automate the process. The official AWS documentation refers to a blog post that addresses this issue.

The solution described there is based on batch provisioning: you collect the accounts to be provisioned and then feed a list to an automation that provisions them. However, if you want to offer AWS accounts as an on-demand service, where a user orders an account and gets it delivered minutes later, this solution has its weaknesses. The approach of using a DynamoDB table as a lock to guarantee that AWS accounts are only created sequentially, however, sounds promising.
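The locking idea can be sketched as a "put if absent" operation on a single well-known key. The example below uses an in-memory stand-in so that it runs without AWS; with DynamoDB, `putIfAbsent` would typically be a `PutItem` call guarded by the condition expression `attribute_not_exists(pk)`. All names in the sketch are hypothetical.

```typescript
// Minimal store abstraction for the lock. With DynamoDB, putIfAbsent would be
// a conditional PutItem that fails if the item already exists.
interface LockStore {
  putIfAbsent(key: string, owner: string): boolean;
  remove(key: string, owner: string): void;
}

// In-memory stand-in used only to illustrate the semantics.
class InMemoryLockStore implements LockStore {
  private readonly items = new Map<string, string>();

  putIfAbsent(key: string, owner: string): boolean {
    if (this.items.has(key)) return false; // somebody else holds the lock
    this.items.set(key, owner);
    return true;
  }

  remove(key: string, owner: string): void {
    // Only the owner may release the lock.
    if (this.items.get(key) === owner) this.items.delete(key);
  }
}

// Hypothetical key under which the single provisioning lock is stored.
const PROVISIONING_LOCK = "account-provisioning-lock";

// Returns true if the given order now holds the lock and may be provisioned.
function tryAcquireProvisioningLock(store: LockStore, orderId: string): boolean {
  return store.putIfAbsent(PROVISIONING_LOCK, orderId);
}

// Releases the lock once the order has finished (successfully or not).
function releaseProvisioningLock(store: LockStore, orderId: string): void {
  store.remove(PROVISIONING_LOCK, orderId);
}
```

Because the check and the write happen in one conditional operation, two concurrent invocations cannot both acquire the lock.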

The Account Vending Machine should meet the following criteria:

  • Short turnaround times for AWS account creation
  • Fully automated, following a manual approval process
  • There must be a distinction between personalized playground accounts and staging/production accounts
  • A user should be able to order an account via a web UI

What we already had at New Work

The topic of automated account creation was not new. A (partial) automation already existed, but it predated Control Tower and used the APIs of AWS Organizations and CloudFormation directly. The solution, a Lambda function that ran for about 10 minutes, did its job but lacked features and automation. Furthermore, a single Lambda function with many API calls stacked on top of each other was not particularly resilient to errors. Given the maintenance and extension effort the existing solution would have required, we decided on a new implementation.

The cornerstones of the Account Vending Machine

The setup should be described reproducibly with Infrastructure-as-Code. HashiCorp’s Terraform has been the common choice to date. Since Control Tower was to be used for account management in the future, and since it is heavily based on CloudFormation, we decided against Terraform and instead opted for the Cloud Development Kit (CDK), which synthesizes CloudFormation templates. This also kept our tech stack homogeneous, since both the business logic and, through the CDK, the infrastructure are written in TypeScript. As a result, we can use the same ecosystem for automated testing, code analysis, etc., for infrastructure code and business logic alike.

TypeScript was chosen for the business logic primarily because of the broad support for Node.js in the AWS environment and because of the existing internal know-how.

Since the Account Vending Machine can be seen as a direct extension of Control Tower, which has to be deployed in the root account, the easiest setup is to deploy the Account Vending Machine in the root account as well. This gives access to the Control Tower native mechanisms and roles needed to manage and initialize accounts. Deploying outside the root account is possible, but requires a lot of configuration to forward events to the member account, and you still have to create mechanisms that allow access to all accounts.

The User Interface

A system is only as good as its usability and accessibility for the end user. Internally, the necessary approval process for an account order had already been implemented in Jira. To keep the effort low and preserve the convenience for the end user, Jira continues to act as the UI, so the process does not change for the customer. After approval, a Jira Automation workflow triggers the Account Vending Machine via an internal REST API, passing data from the ticket. Making the whole process ticket-based also provides a simple feedback mechanism at the transaction level: the customer can always track the status of her order on the ticket, up to the complete delivery.

The architecture

The whole stack is based on serverless functions, which is appropriate for that particular use case because we don’t expect the system to run all the time. All components are decoupled and glued via events. Fig. 1 depicts the components used for the implementation.

Fig. 1 Technical overview

If a customer orders an account via the UI (in our case, she raises a Jira ticket), the following process (Fig. 2) starts. The whole system keeps running as long as there are open account orders in the database.

Fig. 2 Sequence diagram, account ordering

Account orders are received via a REST API, and a Validator checks the input against a JSON schema. Since a valid input directly places an order in the order database (DynamoDB), the validation logic was deliberately not implemented at the API Gateway level, in order to have better control over it. A DynamoDB stream with a filter on the provisioning status field (new and completed orders) triggers the Dispatcher. The Dispatcher checks that no order is currently in the PROVISIONING status and, if that is the case, creates a message for the Account Creator. The message triggers the Account Creator, which creates a lock for the order in the database. After placing the lock, it makes an API call to Service Catalog, which in turn creates an AWS account and registers it with Control Tower.
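The Dispatcher's decision can be sketched as a small pure function. The `Order` shape, the status names other than PROVISIONING, and the "oldest order first" rule are assumptions made for this illustration, not necessarily what runs in production.

```typescript
// Hypothetical order statuses; only PROVISIONING is named in the text above.
type OrderStatus = "NEW" | "PROVISIONING" | "COMPLETED" | "FAILED";

// Hypothetical shape of an entry in the order table.
interface Order {
  orderId: string;
  status: OrderStatus;
  createdAt: number; // epoch milliseconds
}

// Returns the order that should be handed to the Account Creator next, or
// undefined if an order is already being provisioned or nothing is waiting.
function nextOrderToProvision(orders: readonly Order[]): Order | undefined {
  // Account creation must be sequential: never start a second one.
  if (orders.some((order) => order.status === "PROVISIONING")) {
    return undefined;
  }
  const waiting = orders.filter((order) => order.status === "NEW");
  if (waiting.length === 0) {
    return undefined;
  }
  // Assumption: orders are served first come, first served.
  return waiting.reduce((oldest, order) =>
    order.createdAt < oldest.createdAt ? order : oldest,
  );
}
```

Since the stream fires both for new and for completed orders, calling such a function on every event is enough to pick up the next waiting order as soon as the previous one has finished.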

Account creator function

If an account has been registered successfully, Control Tower emits a corresponding EventBridge event, which is used as the trigger for the Orchestrator (a Step Functions state machine). The actual account creation was deliberately not included in the Orchestrator because, at the time of implementation, waiting for events within a state machine was not natively possible (the alternative would have been to implement this logic ourselves via Lambda, which seemed less intuitive and transparent than the solution we ended up with). Once the Orchestrator has done its work, it publishes the result (SUCCESS/FAIL) to an SNS topic. Among other components, a mailer (based on SES) is subscribed to this topic to notify the customers. Since the process starts with a Jira ticket for the account order, a Lambda can hook in at that point, automatically update the corresponding Jira ticket with the relevant information (e.g. accountId and account-alias), and close the ticket.

Essentially, all inter-system communication is based on (internal) events and (DynamoDB) streams. This not only ensures a strong decoupling of the individual sub-steps (ordering, creation, and individualization), it also simplifies the provision and implementation of monitoring and alerting. It also makes it easier to test and develop the individual sub-functions, since the input of the individual functions is based on standardized formats, which means that consumer-driven contracts can be used extensively in development. In this context, a function defines which output the preceding sub-step must provide.
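As an illustration of the trigger, the sketch below shows an event pattern for the Control Tower `CreateManagedAccount` lifecycle event and a small function that extracts the fields the Orchestrator needs. The interface covers only the part of the event used here, and the function and type names are hypothetical; the exact event layout should be checked against the Control Tower lifecycle event documentation.

```typescript
// Only the fields used in this sketch; the real event carries more.
interface ControlTowerLifecycleEvent {
  source: string;
  "detail-type": string;
  detail: {
    eventName: string;
    serviceEventDetails?: {
      createManagedAccountStatus?: {
        state: string; // e.g. "SUCCEEDED" or "FAILED"
        account: { accountId: string; accountName: string };
        organizationalUnit: {
          organizationalUnitId: string;
          organizationalUnitName: string;
        };
      };
    };
  };
}

// EventBridge pattern that matches the account creation lifecycle event.
const createManagedAccountPattern = {
  source: ["aws.controltower"],
  "detail-type": ["AWS Service Event via CloudTrail"],
  detail: { eventName: ["CreateManagedAccount"] },
};

// Hypothetical input for the Orchestrator.
interface CreatedAccount {
  accountId: string;
  accountName: string;
  organizationalUnitName: string;
  succeeded: boolean;
}

// Returns the account details, or undefined for events we do not handle.
function parseCreateManagedAccount(
  event: ControlTowerLifecycleEvent,
): CreatedAccount | undefined {
  if (
    event.source !== "aws.controltower" ||
    event.detail.eventName !== "CreateManagedAccount"
  ) {
    return undefined;
  }
  const status = event.detail.serviceEventDetails?.createManagedAccountStatus;
  if (!status) return undefined;
  return {
    accountId: status.account.accountId,
    accountName: status.account.accountName,
    organizationalUnitName: status.organizationalUnit.organizationalUnitName,
    succeeded: status.state === "SUCCEEDED",
  };
}
```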

Using an orchestrator for account customization

The third step in the process is the customization of the account to meet the company’s internal requirements. This includes, for example, the enforcement of additional compliance policies, the creation of an account alias that is constrained by a fixed naming convention, and the tagging of accounts and resources for cost allocation and budget management.
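A naming convention and a tagging scheme of this kind could be sketched as follows. The convention `<company>-<team>-<stage>` and the tag keys are purely hypothetical examples, not our internal rules. The regular expression mirrors the constraint we know from the IAM `CreateAccountAlias` API and should be checked against the current IAM reference.

```typescript
// Constraint for IAM account aliases: 3 to 63 characters, lowercase letters,
// digits and single hyphens, not starting or ending with a hyphen.
const ACCOUNT_ALIAS_PATTERN = /^[a-z0-9]([a-z0-9]|-(?!-)){1,61}[a-z0-9]$/;

// Stages taken from the distinction between playground and staging/production.
type Stage = "playground" | "staging" | "production";

// Normalizes one part of the alias: lowercase, runs of other characters
// become a single hyphen, leading and trailing hyphens are removed.
function normalizePart(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds an alias following the hypothetical convention company-team-stage.
function buildAccountAlias(company: string, team: string, stage: Stage): string {
  const alias = [company, team, stage]
    .map(normalizePart)
    .filter((part) => part.length > 0)
    .join("-");
  if (!ACCOUNT_ALIAS_PATTERN.test(alias)) {
    throw new Error(`Invalid account alias: ${alias}`);
  }
  return alias;
}

// Builds cost allocation tags; the tag keys are assumptions for illustration.
function buildCostTags(team: string, costCenter: string, stage: Stage): Record<string, string> {
  return { Team: team, CostCenter: costCenter, Stage: stage };
}
```

For example, `buildAccountAlias("ACME", "Data Platform", "staging")` returns `acme-data-platform-staging`.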

The Orchestrator (Fig. 3) is based on AWS Step Functions and therefore provides a state machine for sequential steps, as well as the ability to run steps in parallel wherever possible to keep the overall turnaround time to a minimum. Retries and error handling are well covered by the built-in mechanisms of Step Functions. Furthermore, many AWS services are natively integrated, which often spares boilerplate code in the form of Lambda functions. Some customizations, such as setting an account alias, are deployed as a CloudFormation custom resource. Within the state machine, you need to poll the rollout of such resources to get their actual status.

Fig. 3 Orchestrator that handles account customization
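The combination of parallel branches, retries, and a polling loop can be sketched in the Amazon States Language, written here as a plain TypeScript object. This is a simplified illustration, not the Orchestrator shown in Fig. 3: the state names, function names, topic ARN, and the `status` field returned by the polling function are all hypothetical.

```typescript
// Hypothetical SNS topic for the result notification.
const RESULT_TOPIC_ARN = "arn:aws:sns:eu-central-1:111111111111:account-orders";

// Built-in retry with exponential backoff, reused by the task states.
const defaultRetry = [
  { ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 5, MaxAttempts: 3, BackoffRate: 2 },
];

const orchestratorDefinition = {
  Comment: "Sketch of an account customization workflow",
  StartAt: "CustomizeInParallel",
  States: {
    CustomizeInParallel: {
      Type: "Parallel",
      Branches: [
        {
          // Branch 1: a single customization step with retries.
          StartAt: "TagAccount",
          States: {
            TagAccount: {
              Type: "Task",
              Resource: "arn:aws:states:::lambda:invoke",
              Parameters: { FunctionName: "tag-account", "Payload.$": "$" },
              Retry: defaultRetry,
              End: true,
            },
          },
        },
        {
          // Branch 2: start a rollout, then poll until it has finished.
          StartAt: "DeployAccountAlias",
          States: {
            DeployAccountAlias: {
              Type: "Task",
              Resource: "arn:aws:states:::lambda:invoke",
              Parameters: { FunctionName: "deploy-account-alias", "Payload.$": "$" },
              ResultPath: "$.deployment",
              Retry: defaultRetry,
              Next: "WaitForRollout",
            },
            WaitForRollout: { Type: "Wait", Seconds: 30, Next: "CheckRollout" },
            CheckRollout: {
              Type: "Task",
              Resource: "arn:aws:states:::lambda:invoke",
              Parameters: { FunctionName: "check-rollout", "Payload.$": "$" },
              ResultPath: "$.rollout",
              Next: "RolloutDone?",
            },
            "RolloutDone?": {
              Type: "Choice",
              Choices: [
                { Variable: "$.rollout.Payload.status", StringEquals: "SUCCEEDED", Next: "AliasReady" },
                { Variable: "$.rollout.Payload.status", StringEquals: "FAILED", Next: "AliasFailed" },
              ],
              Default: "WaitForRollout", // still in progress: poll again
            },
            AliasReady: { Type: "Succeed" },
            AliasFailed: { Type: "Fail", Error: "RolloutFailed" },
          },
        },
      ],
      // Any failing branch ends up in the failure notification.
      Catch: [{ ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: "NotifyFailure" }],
      Next: "NotifySuccess",
    },
    NotifySuccess: {
      Type: "Task",
      Resource: "arn:aws:states:::sns:publish",
      Parameters: { TopicArn: RESULT_TOPIC_ARN, Message: "SUCCESS" },
      End: true,
    },
    NotifyFailure: {
      Type: "Task",
      Resource: "arn:aws:states:::sns:publish",
      Parameters: { TopicArn: RESULT_TOPIC_ARN, Message: "FAIL" },
      End: true,
    },
  },
};
```

Serialized with `JSON.stringify(orchestratorDefinition)`, such an object can serve as the definition of a state machine. The `Wait` and `Choice` pair forms the polling loop, while `Retry` and `Catch` replace error handling code that would otherwise live in Lambda functions.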

The next steps

The most obvious extension is to support additional account lifecycle events, such as account closures. This is straightforward thanks to AWS Organizations’ API, but there are a few things to keep in mind when using Control Tower (see).

Providing a REST API that creates AWS accounts and integrates with a workflow tool like Jira opens up many opportunities to successively extend and build an internal platform around the provision of an AWS-specific base setup. For example, the API can be extended to offer network configurations, standardized database setups, or other cloud products to customers via the same user interface (Jira). This centralization and standardization greatly improves the visibility of the internal service portfolio as well as the standardization of the system landscape. This in turn contributes directly to the goals of a cloud platform, namely to increase the speed of product teams and to shorten the time-to-market for innovations. At the same time, a standardized basic configuration reduces the effort for maintenance and operation and decreases the extrinsic cognitive load, especially in operations.

TL;DR

Automating account creation is not just an issue for medium to large enterprises that have a huge number of AWS accounts. Anyone running a multi-account setup has to address compliance, security, and FinOps issues in order to stay on top of things. The level of automation of the processes involved plays a critical role in efficiency. This article provided a detailed look at automating the account creation and basic configuration of a landing zone managed by Control Tower. An alternative approach based on AWS CodeBuild is described in this blog article.

If you are interested in working with us on such challenges using AWS at enterprise scale, feel free to reach out and let’s have a chat.

Software-Engineer and DevOps-Enthusiast, AWS Solutions Architect Professional, GCP Professional Cloud Architect