Rolling back Terraform Remote State in AWS S3

Ahmad Al-Sajid
5 min read · Sep 18, 2024


Rolling back a Terraform remote state stored in AWS S3 might be necessary in several scenarios where the current state of your infrastructure is incorrect, corrupted, or inconsistent. Here are some common reasons why a rollback might be needed:

1. Accidental Changes or Deletion

  • Misconfiguration: If you accidentally apply a Terraform configuration that deletes or modifies critical resources, the state file will reflect these changes. Rolling back to a previous version of the state can help restore the infrastructure to its prior state.
  • Human Error: Mistakes during manual updates or applying the wrong Terraform configuration can lead to unintended infrastructure changes. Rolling back can reverse these mistakes.

2. State File Corruption

  • State Corruption: Although rare, the state file can become corrupted due to network issues, partial updates, or bugs. A corrupted state file can lead to Terraform being unable to manage resources correctly. Rolling back to a previous, uncorrupted version can restore proper functionality.

3. Failed Terraform Operations

  • Partial Deployments: If a Terraform apply operation partially succeeds due to errors or timeouts, the state file may not accurately reflect the actual state of your infrastructure. Rolling back to a previous state allows you to start the operation from a known, stable point.
  • Errors During Terraform Apply: In case of unforeseen errors during an apply operation that result in an incomplete or incorrect state, rolling back can provide a clean slate to address the issues.

4. Drift in Infrastructure

  • Manual Changes: If changes are made to the infrastructure outside of Terraform (e.g., directly via the AWS console), the Terraform state may no longer accurately reflect the real-world state of your resources. Rolling back to a previous state might be necessary to reconcile these changes.
  • Resource Drift: Over time, resources might drift from their desired configuration due to manual changes or configuration management issues. Rolling back the state file can help reset the infrastructure to a known state.

5. Testing and Development

  • Environment Reset: In development or testing environments, you might need to reset the environment to a previous state after testing changes. Rolling back the state file in these scenarios helps you revert to a known baseline for further testing.

6. Disaster Recovery

  • Incident Response: If a critical issue occurs in production, such as a misconfigured Terraform module causing downtime or security vulnerabilities, rolling back to a previous state can be part of the incident response process to quickly restore services.
  • Restoring Deleted Resources: If resources are accidentally deleted, rolling back to a previous state can be used to recreate those resources as they were before the deletion.

7. Compliance and Auditing

  • Reverting Unauthorized Changes: If changes are made that violate compliance requirements or security policies, rolling back the state can revert the infrastructure to a compliant state.
  • Audit and Review: During an audit, if it’s discovered that unauthorized or unexpected changes were made, rolling back the state file can help in quickly restoring the infrastructure to a known and verified configuration.

How to Roll Back Terraform State in AWS S3:

  • Enable Versioning on S3 Bucket: AWS S3 supports versioning, which keeps every previous version of the state file. Versioning must be enabled before the problem occurs; only then can you restore an earlier version by identifying it and copying it back.
  • Identify the Correct Version: Review the history of changes to determine which version of the state file you need to roll back to. You can do this in the AWS S3 console or by listing and examining versions with AWS CLI commands.
  • Download and Replace: Download the previous state file and replace the current state in the S3 bucket with this version. Make sure to back up the current state before replacing it, in case you need to reference it later.
  • Terraform Refresh and Apply: After rolling back the state file, run terraform refresh (deprecated in recent Terraform releases in favor of terraform apply -refresh-only) to update the state with the actual resources, and then terraform apply to reconcile the infrastructure with your desired configuration.
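
The first step in the list above can be sketched as follows. This is a minimal example, not a definitive setup: the function names enable_state_versioning and show_state_versioning are hypothetical helpers introduced here for illustration, and the bucket name in the usage comment is the one used later in this article.

```shell
#!/bin/sh
# Hypothetical helpers; the bucket name is passed as the first argument.

# Turn on versioning so every state write keeps the previous copy.
enable_state_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

# Print the bucket's versioning status ("Enabled" once it is on).
show_state_versioning() {
  aws s3api get-bucket-versioning \
    --bucket "$1" \
    --query Status \
    --output text
}

# Usage (requires AWS credentials):
#   enable_state_versioning demo-sajid-tf-state-bucket
#   show_state_versioning demo-sajid-tf-state-bucket
```

Versions are only recorded from the moment versioning is enabled, so it is worth checking the status before you need a rollback.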

Building on the previous article, Terraform remote state in AWS S3, we can now demonstrate the process. In that article, we first opened ports 22, 80, and 8000, and then removed ports 22 and 80 from the ingress rules.

Let’s list all versions of terraform.tfstate and their modification times.

$ aws s3api list-object-versions --bucket demo-sajid-tf-state-bucket --prefix demo/terraform.tfstate --output json | jq -r '.Versions[] | [.VersionId, .LastModified]'

[
"hjzwbmH1.Di5yBMfQBB31ePeza4kNmHf",
"2024-08-21T10:13:28+00:00"
]
[
"Tjcpvwhfofesrd9MWYFFXih3FQtm_D9R",
"2024-08-21T10:09:09+00:00"
]
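
Note that --prefix matches every key that starts with the given string, so a listing like this can also pick up neighbouring objects such as backups stored under a similar name. The sketch below narrows the result to one exact key using the CLI's built-in --query option instead of jq. The function name list_state_versions is a hypothetical helper, and the IsLatest column is included only to make the current version easy to spot.

```shell
#!/bin/sh
# Hypothetical helper: list versions of exactly one key.
# $1 = bucket, $2 = object key
list_state_versions() {
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query "Versions[?Key=='$2'].[VersionId,LastModified,IsLatest]" \
    --output text
}

# Usage (requires AWS credentials):
#   list_state_versions demo-sajid-tf-state-bucket demo/terraform.tfstate
```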

We will roll back to the second version in the listing above (Tjcpvwhfofesrd9MWYFFXih3FQtm_D9R), as it is the older one. Download it to a local path and push it back to the S3 bucket:

$ aws s3api get-object --bucket demo-sajid-tf-state-bucket --key demo/terraform.tfstate --version-id Tjcpvwhfofesrd9MWYFFXih3FQtm_D9R terraform.tfstate
$ aws s3 cp ./terraform.tfstate s3://demo-sajid-tf-state-bucket/demo/terraform.tfstate
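
As an alternative to the download-and-upload pair, an older version can be copied over the current object entirely on the S3 side. The sketch below is one possible approach, not the method used in this article: restore_state_version is a hypothetical helper name, and it assumes the version ID contains no characters that would need URL-encoding in the copy source. Either way, the result is a new latest version with the old content, so the checksum step that follows still applies.

```shell
#!/bin/sh
# Hypothetical helper: make an older version the newest one via a server-side copy.
# $1 = bucket, $2 = object key, $3 = version ID to restore
restore_state_version() {
  aws s3api copy-object \
    --bucket "$1" \
    --key "$2" \
    --copy-source "$1/$2?versionId=$3"
}

# Usage (requires AWS credentials):
#   restore_state_version demo-sajid-tf-state-bucket demo/terraform.tfstate Tjcpvwhfofesrd9MWYFFXih3FQtm_D9R
```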

Now, check the Terraform state:

$ terraform state list
Failed to load state: state data in S3 does not have the expected content.

The checksum calculated for the state stored in S3 does not match the checksum
stored in DynamoDB.

Bucket: demo-sajid-tf-state-bucket
Key: demo/terraform.tfstate
Calculated checksum: 96bc0a406930c2ecd7b4a16b4fd483c8
Stored checksum: 57dcc8d2e38da2d87ccfd605dade3c11

This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again.

If this problem persists, and neither S3 nor DynamoDB are experiencing an
outage, you may need to manually verify the remote state and update the Digest
value stored in the DynamoDB table to the following value: 96bc0a406930c2ecd7b4a16b4fd483c8

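Right. Because the state file was replaced outside of Terraform, the digest stored in the DynamoDB lock table still describes the previous state, so we need to update it as well. Let’s check the existing value: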

$ aws dynamodb scan --table-name terraform-state-locking --region us-east-1
{
"Items": [
{
"LockID": {
"S": "demo-sajid-tf-state-bucket/demo/terraform.tfstate-md5"
},
"Digest": {
"S": "57dcc8d2e38da2d87ccfd605dade3c11"
}
}
],
"Count": 1,
"ScannedCount": 1,
"ConsumedCapacity": null
}
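
A scan reads every item in the table, which is fine for a single state file but becomes noisy once several states share one lock table. As a sketch, the digest for one state can be read directly by its key instead. The function name get_state_digest is a hypothetical helper, and the LockID format (bucket/key followed by -md5) is taken from the scan output above.

```shell
#!/bin/sh
# Hypothetical helper: read the stored digest for one state file.
# $1 = table name, $2 = region, $3 = bucket, $4 = object key
get_state_digest() {
  aws dynamodb get-item \
    --table-name "$1" \
    --region "$2" \
    --key "{\"LockID\":{\"S\":\"$3/$4-md5\"}}" \
    --query 'Item.Digest.S' \
    --output text
}

# Usage (requires AWS credentials):
#   get_state_digest terraform-state-locking us-east-1 demo-sajid-tf-state-bucket demo/terraform.tfstate
```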

Now, update the checksum to the calculated value from the error message:

$ aws dynamodb update-item \
--table-name terraform-state-locking \
--region us-east-1 \
--key '{ "LockID": { "S": "demo-sajid-tf-state-bucket/demo/terraform.tfstate-md5" } }' \
--update-expression "SET Digest = :newval" \
--expression-attribute-values '{":newval":{"S":"96bc0a406930c2ecd7b4a16b4fd483c8"}}' \
--return-values ALL_NEW
{
"Attributes": {
"Digest": {
"S": "96bc0a406930c2ecd7b4a16b4fd483c8"
},
"LockID": {
"S": "demo-sajid-tf-state-bucket/demo/terraform.tfstate-md5"
}
}
}

Check the state list again:

$ terraform state list
aws_instance.ec2_instance
aws_security_group.instance_security_group

Now, run apply again:

$ terraform apply -auto-approve
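
With -auto-approve, Terraform applies whatever it computes against the restored state without asking. Outside of a demo, a more cautious sequence is to save the plan, read it, and then apply exactly that plan. The sketch below shows this under stated assumptions: review_plan and apply_plan are hypothetical helper names, and rollback.tfplan is an arbitrary file name.

```shell
#!/bin/sh
# Hypothetical helpers: split planning from applying.
# $1 = path of the plan file

# Compute the changes against the restored state and save them.
review_plan() {
  terraform plan -out="$1"
}

# Apply exactly the saved plan; Terraform does not prompt for a saved plan.
apply_plan() {
  terraform apply "$1"
}

# Usage (inside the Terraform working directory):
#   review_plan rollback.tfplan   # read the proposed changes first
#   apply_plan rollback.tfplan
```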

You will see that the security group once again opens ports 22, 80, and 8000.

Precautions When Rolling Back:

  • Data Loss: Rolling back may result in the loss of data or configurations applied after the version you are restoring to. Carefully evaluate the implications before performing a rollback.
  • Resource Consistency: Ensure that the rollback does not lead to inconsistencies in your infrastructure. Review the infrastructure to confirm it aligns with the state file post-rollback.
  • Backup Current State: Always back up the current state file before performing a rollback. This allows you to restore it if the rollback does not resolve the issue or creates new problems.
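
The backup precaution can be sketched as a single download of the current object to a separate local file before anything is overwritten. The function name backup_current_state and the backup file naming in the usage comment are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: save the current (latest) state to a local backup file.
# $1 = bucket, $2 = object key, $3 = local backup path
backup_current_state() {
  aws s3api get-object \
    --bucket "$1" \
    --key "$2" \
    "$3"
}

# Usage (requires AWS credentials); the timestamp keeps backups from colliding:
#   backup_current_state demo-sajid-tf-state-bucket demo/terraform.tfstate \
#     "terraform.tfstate.$(date +%Y%m%dT%H%M%S).bak"
```

Writing the backup under a different name than terraform.tfstate avoids confusing it with the version that is about to be restored.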

Rolling back Terraform remote state in AWS S3 is a powerful recovery tool, but it should be used judiciously and with full awareness of the potential impacts on your infrastructure.
