Terraform Journey in a Startup
Introduction
Thank you for clicking through to my article. I've been a DevOps engineer for two years on a dev team of seven engineers.
My name is MINSEOK, LEE, but I go by Unchaptered on the internet, so feel free to call me either "MINSEOK, LEE" or "Unchaptered" if you want to ask anything.
Topics
This post is a retrospective of an early startup's Terraform Journey.
Pros and Cons
Solutions to the Cons
Some example codes of Terraform
Target Engineer
As a DevOps engineer, I had the following concerns, so I hope this article helps startup engineers with similar ones.
Want to follow Best Practices and security principles.
Want to increase the reliability of infrastructure provisioning and operations.
Want to reduce "human error" in repetitive tasks.
Target Team, Account, Product
10 engineers or fewer.
1 or 2 DevOps engineers.
Complex infrastructure workloads.
Workloads based on schedulers.
Workloads based on messaging (SQS).
Several AWS accounts for several products.
Connections
GitHub : github.com/unchaptered
Inblog : inblog.ai/unchaptered
Medium : medium.com/@unchaptered
LinkedIn : linkedin.com/in/minseok_lee
Requisites
This section answers "What are IaC tools?" and "Why did you choose Terraform instead of CloudFormation, AWS CDK, and so on?"
What are IaC Tools?
IaC is short for Infrastructure as Code.
It literally means replacing the Console/GUI with code for provisioning infrastructure.
If you want more information,
- Read "Infrastructure as Code (IaC) — What is it?"
- Watch "Infrastructure as code: What is it? Why is it important?"
What were the expected Pros?
When I first learned about IaC tools in Oct 2022, I expected the following pros.
Versioning of infra
Increase reusability
Reduce documentation
"Can" power up an automation pipeline or CI/CD platform
And after using IaC tools through Dec 2023, I expected one new pro.
"Can" power up testability of infrastructure.
What's the meaning of "Can"?
I think Terraform is optimized for provisioning infrastructure resources.
With Terraform alone, you can't really power up automation or testability.
Rather, when you use Terraform modules, errors propagate between parent and child modules, so I think Terraform sometimes reduces testability.
So DevOps engineers end up integrating more tools, such as Ansible, Terragrunt, and Terratest, to power up automation and increase testability.
What were the expected Cons?
From Oct 2022 to Feb 2024, I thought IaC tools would reduce productivity temporarily.
But with a good module system and technical proficiency, I think productivity becomes similar to or better than the Console (manual process).
Also, a backend engineer had trouble working with Terraform because of the learning curve.
Why did I choose Terraform?
Alternatives to Terraform include AWS CloudFormation, CDK, and Pulumi.
I had the following requirements.
Must not require programming skills.
I didn't assume every DevOps engineer can comfortably handle multiple programming languages, and I thought writing IaC in Java or TypeScript has both pros and cons.
Must not be locked into a specific CSP (AWS, Azure).
I needed a solution that would scale to hybrid cloud or multi-CSP.
Must have an engineer community.
The first requirement ruled out CDK and Pulumi.
The second requirement ruled out CloudFormation.
Of the many other IaC tools out there, Terraform had the largest community.
So I chose Terraform as our IaC tool.
Design of Structure
Before starting the work, I thought about the structure of the task in terms of two concepts.
Define the complexity of the company's infrastructure.
Design a folder structure to match it.
Determine Structure Type with Best Practices
I believed restructuring a Terraform project later would be very difficult.
Therefore, I wanted to make our structure as extensible as possible.
First, I read the article "Terraform Best Practices - Code structure examples" by Anton Babenko, and I thought its categorization made sense for us.
However, due to the fast-paced nature of an early-stage startup, I thought about it in terms of time: past, present, and future.
Terms of Time | Key Requirements | Type
---|---|---
Past | | large
Present | | very-large
Future | | very-large
According to each business/product decision, the complexity had already become too high.
In Dec 2023, our first first-party service launched.
In Jan 2024, we prepared to adapt that first-party service to integrate with an outer service.
Also in Jan 2024, we prepared to launch a new MVP service for user demand research.
And several SaaS AI offerings were expected.
So I determined our service type was very-large.
Design Folder Structure with Best Practices
Around June 2023, I was using Ansible with a wrong file structure designed from my own ideas. It took me two weeks to change it to a good structure. So this time, I researched a lot of Best Practices references.
In particular, I looked up "MarketCurly, DevOps' Terraform Journey".
MarketCurly is a same-day grocery delivery service in the Republic of Korea.
MarketCurly introduced this folder structure.
├── README.md
├── env // Environment Files
│ ├── dev
│ └── stg
│ ├── main.tf
│ ├── terraform.tfvars
│ ├── variables.tf
│ └── version.tf
└── modules // AWS modules Codes
├── acm
└── compute
└── alb
├── main.tf
├── output.tf
├── variables.tf
└── versions.tf
By default, it separates environment variables from module files.
However, it only covered some troubleshooting and didn't give me an overall project structure.
Depending on their purpose, I've divided resources into two categories.
Company Resources
It means "resources used by multiple products".
Product Resources
It means "resources used by a single product".
└── services
├── <COMPANY>
├── <PRODUCT_A>
└── <PRODUCT_B>
I further categorized the modules based on their purpose.
For example, it would be dangerous to manage storage (S3), database (RDS), compute (EC2), and serverless (Lambda) resources in a single folder.
Why did I think a single folder is dangerous?
Terraform actions basically consist of create, update, and destroy.
Because some resources don't support update, a Terraform action may destroy and recreate the resource. By default these steps can overlap, which causes a fatal problem when a particular value must be unique: the previous resource and the new resource exist at the same time.
Also, some syntax such as for_each can cause every resource in a list to be destroyed and recreated.
Therefore, separating a product's modules by purpose is safer than a single folder.
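The destroy-and-recreate risk above can be sketched like this (a hypothetical example; the bucket names are made up). With count, removing the first item shifts every later index, so unrelated buckets get destroyed and recreated; with for_each keyed by a stable string, only the removed item is destroyed.

```hcl
variable "bucket_names" {
  type    = list(string)
  default = ["logs", "assets", "backup"]
}

# Index-based: addressed as by_index[0], by_index[1], ...
# Removing "logs" shifts the indices, so "assets" and "backup"
# are destroyed and recreated.
resource "aws_s3_bucket" "by_index" {
  count  = length(var.bucket_names)
  bucket = var.bucket_names[count.index]
}

# Key-based: addressed as by_key["logs"], by_key["assets"], ...
# Removing "logs" only destroys that one bucket.
resource "aws_s3_bucket" "by_key" {
  for_each = toset(var.bucket_names)
  bucket   = each.value
}
```

In practice you would declare only one of the two resources; they're shown side by side here to contrast the addressing schemes.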
└── services
├── <COMPANY>
│ └── domain
├── <PRODUCT_A>
└── <PRODUCT_B>
├── compute
├── storage
├── database
└── serverless
Example Codes
Here's a simple example to help your understanding.
After some feedback, I realized the problems with this structure.
Therefore, I recommend that you use this code "for understanding" only.
Defining S3 Module Codes
I've written code to provision an AWS S3 bucket using Terraform.
Following the designed folder structure, I separate main.tf, variables.tf, and outputs.tf.
main.tf : defines the module's resources
variables.tf : defines the module's arguments
outputs.tf : defines the module's attributes
# modules/s3/bucket/main.tf
resource "aws_s3_bucket" "aws_s3_bucket_module" {
  bucket = var.bucket_name
  acl    = var.bucket_acl # note: deprecated in AWS provider v4+; prefer the aws_s3_bucket_acl resource
}

# modules/s3/bucket/variables.tf
variable "bucket_name" {
  type = string
}
variable "bucket_acl" {
  type = string
}

# modules/s3/bucket/outputs.tf
output "bucket_domain_name" {
  value = aws_s3_bucket.aws_s3_bucket_module.bucket_domain_name
}
Why declare output blocks?
Inside the module, you can access aws_s3_bucket.aws_s3_bucket_module.bucket_domain_name directly.
However, in order for the module system to expose an internal module's properties to its caller, they must be declared as output in the internal module.
Use the S3 Module Codes
Let's provision our infrastructure by sourcing (importing) the S3 module in the services. If you want to deploy your product to dev, prod, stage, qa, and so on, you'll need to use variables again.
# services/product/storages/sample_s3_bucket.tf
module "sample_s3_bucket" {
  source      = "../../../modules/s3/bucket"
  bucket_name = "${var.service}-${var.stage}-s3-bucket-sample"
  bucket_acl  = "private"
}

# services/product/storages/_.variables.tf
variable "service" { type = string }
variable "stage" { type = string }
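Once the module is sourced, the caller can read the module's declared output via module.<NAME>.<OUTPUT>. For example (the file name here is hypothetical):

```hcl
# services/product/storages/sample_s3_bucket_outputs.tf (hypothetical file)
# Re-export the inner module's output so other configurations can use it.
output "sample_bucket_domain_name" {
  value = module.sample_s3_bucket.bucket_domain_name
}
```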
Create a tfvars file
And create a tfvars file:
# env/dev/sample_s3_bucket.tfvars
service = "example"
stage = "dev"
Create S3 Bucket
And then you can provision the infrastructure:
cd services/product/storages/
terraform init
terraform apply -var-file=../../../env/dev/sample_s3_bucket.tfvars
Can DevOps and backend engineers both work with Terraform in this system? "No!"
Just before the product launched, I did a self code review first.
Then I asked "Server Engineer Lay" a question: "Do you think you could make a simple modification using Terraform?" And he said, "No, it worries me a little."
Too many files, too long codes
S3 buckets will have different preferences depending on their purpose.
When you use the AWS Console/GUI, some options are assigned automatically. But in Terraform, you need to set those options manually.
In the S3 example, we use the following resources together.
acl
bucket
bucket_notification
bucket_policy
cors_configuration
ownership_controls
public_access_block
So a properly secured CSP resource takes tens to hundreds of lines in a single .tf file. If your service is more complex, you'll have anywhere from a few to a few dozen .tf files in one folder.
This means you'll end up with thousands of lines of Terraform code in one folder.
It makes the code hard to read and unwieldy to work with.
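As a rough sketch of how one secured bucket fans out into several resources (the names and option values are illustrative, not a complete hardening setup):

```hcl
# One logical "secure bucket" already spans four resource blocks.
resource "aws_s3_bucket" "assets" {
  bucket = "example-dev-assets"
}

# Block all forms of public access.
resource "aws_s3_bucket_public_access_block" "assets" {
  bucket                  = aws_s3_bucket.assets.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Disable ACLs; the bucket owner owns every object.
resource "aws_s3_bucket_ownership_controls" "assets" {
  bucket = aws_s3_bucket.assets.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

# Keep object history for recovery.
resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id
  versioning_configuration {
    status = "Enabled"
  }
}
```

Add a bucket policy, notifications, and CORS rules, and a single bucket easily reaches a hundred lines.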
Backend Developer's Answer
As I mentioned earlier, one folder ends up with thousands of lines.
And since each Terraform module has many references to other modules, the developer seemed afraid of side effects.
So I realized there was a fatal problem with this approach.
A backend engineer can't safely fix any module. If the organization wants backend engineers to fix Terraform code, the DevOps engineer must share all of the Terraform.
That doesn't look smart to me.
Were there any other critical issues?
If you use the Terraform module system for reusability, you'll encounter error propagation. When you modify a high-level module, a low-level module can sometimes break.
So you must write test code to reduce side effects.
Nowadays I use Terratest to test Terraform.
Conclusion
I've been working with Terraform for three months, seven days a week, and it's hard to change our production code now. So that's where I'd like to conclude for now.
Pros
[IDK] Versioning of Infra
Expected : Good
Reality : I don't feel any advantages yet.
[GOOD] Increase reusability
Expected : Good
Reality : It totally increased the reusability of all modules.
[GOOD] Reduce documentation
Expected/Reality : Good
[GOOD] "Can" power up automation pipeline or CI/CD Platform
Expected : Maybe good?
Reality : It's incredibly useful for DevOps. With cloud secret storage (Vault, AWS Secrets Manager), you can manage secrets and IDs from a central tower.
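As a sketch of the central-secret idea (the secret name and path are hypothetical), Terraform can pull a centrally managed secret from AWS Secrets Manager at plan time:

```hcl
# Read the current version of a secret stored in AWS Secrets Manager.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "example/dev/db-password" # hypothetical secret name
}

# The decoded value can then feed other resources (e.g. an RDS instance),
# so the secret never lives in the repository or in tfvars files.
locals {
  db_password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```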
[BAD] "Can" power up testability of infrastructure
Expected : Maybe Good?
Reality : I think the Terraform module system reduces testability.
Cons
[GOOD] Reduced productivity
Expected : reduced productivity
Reality : If I had used only VPC and EC2, Terraform would have reduced productivity. But if you use many cloud resources, I think Terraform increases productivity.
[BAD] backend engineer's learning curve
Expected : backend engineers can't modify infra using Terraform.
Reality : I think it's true.
How am I improving my use of Terraform?
Of all the expected pros and cons, I think one item from each list turned out to be a real con.
[Pros to Cons] "Can" power up testability of infrastructure
[Solution] Use test code with terratest
Article : How can I test Terraform?
GitHub : github.com/unchaptered/iac-storage
[Cons to Cons] backend engineer's learning curve
[Solution] Use a separated module layer
Article : Production-level Guide to Terraform
GitHub : github.com/unchaptered/iac-storage
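As a sketch of the separated-module-layer idea (the layers/ path, wrapper name, and variables are all hypothetical), a thin product-level module can expose only the few knobs a backend engineer needs while hiding the low-level resources:

```hcl
# layers/s3_secure_bucket/main.tf (hypothetical wrapper layer)
# Backend engineers only touch this thin interface; the hardened,
# DevOps-owned low-level module stays hidden behind it.
variable "service" { type = string }
variable "stage"   { type = string }
variable "purpose" { type = string }

module "bucket" {
  source      = "../../modules/s3/bucket"
  bucket_name = "${var.service}-${var.stage}-${var.purpose}"
  bucket_acl  = "private" # security default fixed by DevOps, not exposed
}

output "bucket_domain_name" {
  value = module.bucket.bucket_domain_name
}
```

With this split, a backend engineer changing a bucket's purpose edits three string variables instead of reading the whole Terraform tree.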