Terraform Data Sources in AWS: Using Existing Infrastructure Instead of Creating Everything

As Terraform adoption grows in organizations, not every resource you work with will be created by your own Terraform code.

Often, infrastructure such as VPCs, subnets, security groups, and AMIs already exists and is managed by another team, by another Terraform project, or even manually.

This is where Terraform Data Sources become extremely valuable.

Today, I learned how to use Terraform Data Sources to discover and reference existing AWS resources without managing or recreating them.


What Are Terraform Data Sources?

Terraform Data Sources allow us to fetch information about existing infrastructure and use that information in our Terraform configuration.

Unlike resources, data sources do not create anything.

They simply query existing infrastructure and make the information available to Terraform.

Think of it as:

Resource → Creates infrastructure

Data Source → Reads infrastructure
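Here is a minimal sketch of that contrast in HCL. The CIDR block and block names are illustrative assumptions. The data source example also assumes the account still has a default VPC in the region; without one, the lookup fails.

```hcl
# Resource: Terraform creates this VPC and tracks it in its state
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16" # illustrative CIDR
}

# Data source: Terraform only reads an existing VPC (here, the default one)
data "aws_vpc" "default" {
  default = true
}
```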


Why Use Data Sources?

In real-world environments:

  • Networking teams often manage VPCs and Subnets.

  • Security teams manage Security Groups.

  • Platform teams manage AMIs and shared services.

  • Multiple Terraform projects may need to reference the same resources.

Instead of hardcoding IDs or recreating infrastructure, we can dynamically retrieve them using data sources.

Benefits include:

✅ Reusability

✅ Better collaboration across teams

✅ Reduced hardcoding

✅ More maintainable Terraform code

✅ Easier deployments across environments


Scenario

In this hands-on lab, the networking infrastructure already existed in AWS.

Pre-existing resources:

Existing VPC

Name = shared-network-vpc

Existing Subnet

Name = shared-primary-subnet

My task was to launch an EC2 instance inside this existing network infrastructure using Terraform Data Sources.


Architecture Overview

Existing VPC
      │
      ▼
Existing Subnet
      │
      ▼
Terraform Data Sources
      │
      ▼
EC2 Instance

Terraform reads the existing VPC and subnet and then uses those values to deploy a new EC2 instance.


Provider Configuration

First, I configured the AWS provider.

provider "aws" {
  region = "us-east-1"
}

This tells Terraform which AWS region to interact with.
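Alongside the provider block, it is common to declare which provider Terraform should download. The version constraint below is an assumption, so pin it to whatever provider version you have actually tested with.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint, adjust to your tested version
    }
  }
}
```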


Data Source: Existing VPC

The first step was locating the existing VPC.

data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}

How It Works

Terraform searches AWS for a VPC with the tag:

Name = shared-network-vpc

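The filter must match exactly one VPC. If no VPC carries that tag, or more than one does, the lookup fails and Terraform stops with an error. Once a single match is found, all of the VPC's attributes become available.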

For example:

data.aws_vpc.shared.id

returns the VPC ID.
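The same lookup can also be written with the tags argument instead of a filter block. Any other exported attribute can be surfaced as an output in the same way as the ID. A small sketch follows. The block and output names are hypothetical ones I chose for illustration.

```hcl
# Equivalent lookup using the tags argument instead of a filter block
data "aws_vpc" "shared_by_tag" {
  tags = {
    Name = "shared-network-vpc"
  }
}

# Surface another exported attribute of the VPC found by data.aws_vpc.shared
output "shared_vpc_cidr" {
  value = data.aws_vpc.shared.cidr_block
}
```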


Data Source: Existing Subnet

Next, I retrieved the existing subnet.

data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }

  vpc_id = data.aws_vpc.shared.id
}

Why Reference the VPC?

Adding:

vpc_id = data.aws_vpc.shared.id

ensures Terraform only searches within the correct VPC.

This creates a dependency between the subnet lookup and VPC lookup.
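Sometimes you need every subnet in the VPC rather than one specific subnet. In that case, the plural aws_subnets data source (available in recent AWS provider versions) returns a list of IDs. This is a sketch; the block and output names are hypothetical.

```hcl
# List every subnet that belongs to the shared VPC
data "aws_subnets" "in_shared_vpc" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }
}

output "shared_vpc_subnet_ids" {
  value = data.aws_subnets.in_shared_vpc.ids
}
```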


Data Source: Latest Amazon Linux 2 AMI

Instead of hardcoding an AMI ID, I used a data source to fetch the latest Amazon Linux 2 image.

data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
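The AMI is chosen at plan time, so it can help to print which image the lookup actually selected. A minimal sketch follows; the output name is hypothetical.

```hcl
# Show which image the lookup picked on this run
output "selected_ami" {
  value = {
    id            = data.aws_ami.amazon_linux_2.id
    name          = data.aws_ami.amazon_linux_2.name
    creation_date = data.aws_ami.amazon_linux_2.creation_date
  }
}
```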

Why This Approach Is Better

Hardcoding:

ami = "ami-123456789"

can become outdated quickly.

Using a data source:

data.aws_ami.amazon_linux_2.id

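ensures Terraform looks up the newest matching AMI every time it runs a plan. Keep in mind that changing the ami argument forces Terraform to replace the instance. When Amazon publishes a newer image, the next plan will therefore propose replacing any instance built from the previous one.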

Benefits:

✅ Latest supported image

✅ Better security updates

✅ Reduced maintenance

✅ Environment portability


Launching the EC2 Instance

After retrieving the VPC, subnet, and AMI, I used them to deploy an EC2 instance.

resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t3.micro"

  subnet_id  = data.aws_subnet.shared.id
  private_ip = "10.0.1.50" # must fall inside the existing subnet's CIDR block

  tags = {
    Name = "day13-instance"
  }
}
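One consequence of the dynamic AMI lookup is that a newly published Amazon Linux 2 image changes the ami value, and Terraform then plans to replace the instance. If that is not wanted, a lifecycle block can tell Terraform to ignore later AMI changes. The sketch below shows the same resource with that addition. It replaces the block above rather than sitting next to it, and whether to ignore AMI drift is a judgment call for your environment.

```hcl
resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t3.micro"

  subnet_id  = data.aws_subnet.shared.id
  private_ip = "10.0.1.50"

  tags = {
    Name = "day13-instance"
  }

  lifecycle {
    # Keep the AMI the instance was created with; a newer AMI returned by
    # the data source no longer shows up as a replacement in the plan
    ignore_changes = [ami]
  }
}
```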

Understanding the References

AMI

data.aws_ami.amazon_linux_2.id

Fetches the latest Amazon Linux 2 AMI.


Subnet

data.aws_subnet.shared.id

Fetches the existing subnet ID.


VPC Dependency

data.aws_vpc.shared.id

Used internally when locating the subnet.


Dependency Flow

Terraform automatically understands the dependency chain.

aws_vpc Data Source
          │
          ▼
aws_subnet Data Source
          │
          ▼
aws_instance Resource

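Terraform reads the VPC first, then the subnet, and finally creates the EC2 instance. The AMI lookup depends on neither the VPC nor the subnet, so Terraform is free to read it alongside them.

No explicit depends_on is required, because the references between blocks already define the order.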


Variable Configuration

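To make the deployment reusable, I also defined the AWS region as a variable. The variable only takes effect once the provider block references it with region = var.region instead of the hardcoded "us-east-1".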

variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

Benefits:

✅ Easier environment changes

✅ Better reusability

✅ Reduced hardcoding
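Here is a sketch of how the region variable, plus two extra variables for the Name tags, could be wired into the configuration. The vpc_name and subnet_name variables are my own hypothetical additions. These blocks replace the earlier provider and data blocks rather than being added alongside them.

```hcl
variable "vpc_name" {
  description = "Name tag of the existing VPC"
  type        = string
  default     = "shared-network-vpc"
}

variable "subnet_name" {
  description = "Name tag of the existing subnet"
  type        = string
  default     = "shared-primary-subnet"
}

# The provider now reads the region from var.region
provider "aws" {
  region = var.region
}

# Lookups driven by variables instead of literal tag values
data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = [var.vpc_name]
  }
}

data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = [var.subnet_name]
  }

  vpc_id = data.aws_vpc.shared.id
}
```

You can then target another environment by overriding the values, for example with terraform plan -var="region=us-west-2". This works only if matching tagged resources exist in that region.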


Common AWS Data Sources

Terraform provides many useful AWS data sources.

Examples:

VPC

data "aws_vpc"

Subnet

data "aws_subnet"

Security Group

data "aws_security_group"

Availability Zones

data "aws_availability_zones"

IAM Role

data "aws_iam_role"

AMI

data "aws_ami"

Caller Identity

data "aws_caller_identity"
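Here is a short sketch of three of these in use. The security group name shared-app-sg is a hypothetical example, and the lookup assumes a group with that name exists in the shared VPC.

```hcl
# Account ID, ARN, and user ID of the credentials Terraform is using
data "aws_caller_identity" "current" {}

# Availability Zones currently available in the provider's region
data "aws_availability_zones" "available" {
  state = "available"
}

# An existing security group, looked up by name inside the shared VPC
data "aws_security_group" "shared" {
  name   = "shared-app-sg" # hypothetical group name
  vpc_id = data.aws_vpc.shared.id
}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "availability_zones" {
  value = data.aws_availability_zones.available.names
}

output "shared_security_group_id" {
  value = data.aws_security_group.shared.id
}
```

You could then attach the security group ID to the instance through its vpc_security_group_ids argument.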

Real-World Use Cases

Data Sources are heavily used in production environments.

Examples include:

Shared Networking

Application teams deploy into existing VPCs and subnets.


Shared Security Groups

Reference security groups managed by another team.


Latest AMI Selection

Automatically use updated operating system images.


Multi-Team Infrastructure

Allow different teams to manage different parts of infrastructure independently.


Cross-Project Integration

Reference resources created by separate Terraform projects.
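For cross-project integration, Terraform also provides the terraform_remote_state data source. Instead of querying AWS directly, it reads the outputs stored in another project's state. The sketch below assumes the networking project keeps its state in S3. The bucket, the key, and the primary_subnet_id output are all hypothetical. This approach only works if that project declares the output and you have read access to its state.

```hcl
# Read the published outputs of a separate (networking) Terraform project
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-network-state"     # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical state key
    region = "us-east-1"
  }
}

# Only values that project exposes as outputs are readable here
output "network_primary_subnet_id" {
  value = data.terraform_remote_state.network.outputs.primary_subnet_id
}
```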


Best Practices

✅ Use data sources instead of hardcoded resource IDs

✅ Filter resources using tags whenever possible

✅ Use variables to make lookups reusable

✅ Confirm that each lookup matches exactly the resource you expect

✅ Use data sources for shared infrastructure

✅ Use dynamic AMI lookups instead of static IDs
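One way to act on the validation point is a lifecycle postcondition on a data source, which Terraform supports from version 1.2. The sketch below adds a check to the subnet lookup that the subnet still has free IP addresses. It replaces the earlier aws_subnet block, and the specific condition is only an example of what you might assert.

```hcl
data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }

  vpc_id = data.aws_vpc.shared.id

  lifecycle {
    # Fail the plan early if the subnet cannot host another instance
    postcondition {
      condition     = self.available_ip_address_count > 0
      error_message = "shared-primary-subnet has no free IP addresses."
    }
  }
}
```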


Key Takeaways

✔ Data Sources allow Terraform to read existing infrastructure

✔ They do not create or manage resources

✔ Data Sources reduce hardcoded values

✔ They improve collaboration between teams

✔ Existing VPCs and Subnets can easily be referenced

✔ Dynamic AMI lookups are better than static AMI IDs

✔ Terraform automatically handles dependencies between data sources and resources

✔ Data Sources are essential for enterprise-scale Terraform deployments


Conclusion

Terraform Data Sources are one of the most important concepts for working in real-world cloud environments.

Most organizations already have existing infrastructure that new applications must integrate with. Data Sources provide a clean and reliable way to discover and consume those resources without recreating or managing them.

This hands-on lab helped me understand how Terraform can seamlessly combine existing AWS infrastructure with newly provisioned resources, making Infrastructure as Code more practical, reusable, and scalable.

The official Terraform documentation and YouTube tutorials helped me understand Data Sources and practice deploying resources into existing AWS infrastructure.


Connect With Me

LinkedIn: https://www.linkedin.com/in/devops-samarjeet/