Terraform Data Sources in AWS: Using Existing Infrastructure Instead of Creating Everything

As Terraform adoption grows in organizations, not every resource you work with will be created by your own Terraform code.
Often, shared infrastructure such as VPCs, subnets, security groups, or AMIs already exists and is managed by another team, another Terraform project, or even by hand.
This is where Terraform Data Sources become extremely valuable.
Today, I learned how to use Terraform Data Sources to discover and reference existing AWS resources without managing or recreating them.
What Are Terraform Data Sources?
Terraform Data Sources allow us to fetch information about existing infrastructure and use that information in our Terraform configuration.
Unlike resources, data sources do not create anything.
They simply query existing infrastructure and make the information available to Terraform.
Think of it as:
Resource → Creates infrastructure
Data Source → Reads infrastructure
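To make the contrast concrete, here is a minimal sketch of the two block types side by side. The CIDR block and the tag value are placeholders chosen for illustration and are not part of the lab.

```hcl
# resource block: Terraform creates this VPC and manages its lifecycle.
resource "aws_vpc" "managed_example" {
  cidr_block = "10.20.0.0/16" # hypothetical CIDR, for illustration only
}

# data block: Terraform only looks up a VPC that already exists.
# Nothing is created, changed, or destroyed by this block.
data "aws_vpc" "lookup_example" {
  tags = {
    Name = "some-existing-vpc" # hypothetical tag value
  }
}
```

Running terraform destroy would remove the first VPC but leave the second one untouched, because Terraform never owned it.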
Why Use Data Sources?
In real-world environments:
Networking teams often manage VPCs and Subnets.
Security teams manage Security Groups.
Platform teams manage AMIs and shared services.
Multiple Terraform projects may need to reference the same resources.
Instead of hardcoding IDs or recreating infrastructure, we can look up the existing resources dynamically with data sources.
Benefits include:
✅ Reusability
✅ Better collaboration across teams
✅ Reduced hardcoding
✅ More maintainable Terraform code
✅ Easier deployments across environments
Scenario
In this hands-on lab, the networking infrastructure already existed in AWS.
Pre-existing resources:
Existing VPC
Name = shared-network-vpc
Existing Subnet
Name = shared-primary-subnet
My task was to launch an EC2 instance inside this existing network infrastructure using Terraform Data Sources.
Architecture Overview
Existing VPC
│
▼
Existing Subnet
│
▼
Terraform Data Sources
│
▼
EC2 Instance
Terraform reads the existing VPC and subnet and then uses those values to deploy a new EC2 instance.
Provider Configuration
First, I configured the AWS provider.
provider "aws" {
  region = "us-east-1"
}
This tells Terraform which AWS region to interact with.
Data Source: Existing VPC
The first step was locating the existing VPC.
data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}
How It Works
Terraform searches AWS for a VPC with the tag:
Name = shared-network-vpc
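The lookup has to match exactly one VPC. If nothing matches, or if several VPCs carry the same tag, the plan fails with an error.
Once found, the VPC's attributes become available.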
For example:
data.aws_vpc.shared.id
returns the VPC ID.
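Other attributes can be read the same way. As a small sketch, the outputs below expose the ID and the primary CIDR block of the VPC found above. The output names are my own choice.

```hcl
# Print values from the looked-up VPC after terraform apply.
output "shared_vpc_id" {
  description = "ID of the VPC found by the tag lookup"
  value       = data.aws_vpc.shared.id
}

output "shared_vpc_cidr" {
  description = "Primary CIDR block of the same VPC"
  value       = data.aws_vpc.shared.cidr_block
}
```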
Data Source: Existing Subnet
Next, I retrieved the existing subnet.
data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }
  vpc_id = data.aws_vpc.shared.id
}
Why Reference the VPC?
Adding:
vpc_id = data.aws_vpc.shared.id
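ensures Terraform only searches within the correct VPC, which matters if a subnet with the same Name tag exists in another VPC.
It also creates an implicit dependency: Terraform resolves the VPC lookup before the subnet lookup.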
Data Source: Latest Amazon Linux 2 AMI
Instead of hardcoding an AMI ID, I used a data source to fetch the latest Amazon Linux 2 image.
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
Why This Approach Is Better
Hardcoding:
ami = "ami-123456789"
can become outdated quickly.
Using a data source:
data.aws_ami.amazon_linux_2.id
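makes Terraform look up the newest matching AMI every time it plans.
One side effect to keep in mind: when a newer image is published, the next plan will propose replacing any instance that was built from the older one, because changing the AMI forces a new instance.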
Benefits:
✅ Latest supported image
✅ Better security updates
✅ Reduced maintenance
✅ Environment portability
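Because the selected image can change between runs, it helps to see which one the lookup actually picked. A minimal sketch, assuming the aws_ami data source defined above (the output name is my own):

```hcl
# Show which image the most_recent lookup resolved to on this run.
output "selected_ami" {
  description = "AMI chosen by the most_recent lookup"
  value = {
    id            = data.aws_ami.amazon_linux_2.id
    name          = data.aws_ami.amazon_linux_2.name
    creation_date = data.aws_ami.amazon_linux_2.creation_date
  }
}
```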
Launching the EC2 Instance
After retrieving the VPC, subnet, and AMI, I used them to deploy an EC2 instance.
resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.shared.id
  private_ip    = "10.0.1.50" # must be a free address inside the subnet's CIDR range
  tags = {
    Name = "day13-instance"
  }
}
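The hardcoded private IP only works if it falls inside the subnet's CIDR range, and the lab does not show that range. One possible variant, sketched below, derives the address from the looked-up subnet with the built-in cidrhost function and tells Terraform to ignore later AMI changes so the instance is not replaced when a newer image appears. The resource name and the host number 50 are assumptions for illustration.

```hcl
locals {
  # Host number 50 within the looked-up subnet's CIDR block.
  # This equals 10.0.1.50 only if the subnet is 10.0.1.0/24 (an assumption).
  instance_private_ip = cidrhost(data.aws_subnet.shared.cidr_block, 50)
}

# Hypothetical variant of the instance above; use one or the other, not both.
resource "aws_instance" "derived_ip_example" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.shared.id
  private_ip    = local.instance_private_ip

  lifecycle {
    # Keep the running instance when a newer AMI is published later.
    ignore_changes = [ami]
  }

  tags = {
    Name = "day13-instance"
  }
}
```

Whether to ignore AMI changes is a trade-off: it avoids surprise replacements, but the instance then stays on the older image until it is rebuilt on purpose.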
Understanding the References
AMI
data.aws_ami.amazon_linux_2.id
Fetches the latest Amazon Linux 2 AMI.
Subnet
data.aws_subnet.shared.id
Fetches the existing subnet ID.
VPC Dependency
data.aws_vpc.shared.id
Used by the subnet data source to scope its lookup. The instance never references it directly.
Dependency Flow
Terraform automatically understands the dependency chain.
aws_vpc Data Source
│
▼
aws_subnet Data Source
│
▼
aws_instance Resource
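Terraform reads the VPC first, then the subnet, and finally creates the EC2 instance.
The AMI lookup does not depend on the VPC or the subnet, so Terraform can resolve it independently before creating the instance.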
No manual dependency configuration is required.
Variable Configuration
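To make the deployment reusable, I also defined the AWS region as a variable. It only takes effect once the provider block reads var.region instead of the hardcoded "us-east-1".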
variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}
Benefits:
✅ Easier environment changes
✅ Better reusability
✅ Reduced hardcoding
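Assuming the provider block from earlier is changed to read the variable, the wiring could look like this:

```hcl
# Replaces the earlier provider block with the hardcoded region.
provider "aws" {
  region = var.region
}
```

The region can then be overridden per run, for example with terraform plan -var="region=us-west-2". The tag lookups will only succeed if a matching VPC and subnet exist in that region.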
Common AWS Data Sources
Terraform provides many useful AWS data sources.
Examples:
VPC
data "aws_vpc"
Subnet
data "aws_subnet"
Security Group
data "aws_security_group"
Availability Zones
data "aws_availability_zones"
IAM Role
data "aws_iam_role"
AMI
data "aws_ami"
Caller Identity
data "aws_caller_identity"
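Two of these need no filters at all, which makes them easy to try. A short sketch (the local names current and available, and the output names, are my own):

```hcl
# Who am I? Returns details about the credentials Terraform is using.
data "aws_caller_identity" "current" {}

# Which Availability Zones are usable in the configured region?
data "aws_availability_zones" "available" {
  state = "available"
}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "az_names" {
  value = data.aws_availability_zones.available.names
}
```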
Real-World Use Cases
Data Sources are heavily used in production environments.
Examples include:
Shared Networking
Application teams deploy into existing VPCs and subnets.
Shared Security Groups
Reference security groups managed by another team.
Latest AMI Selection
Automatically use updated operating system images.
Multi-Team Infrastructure
Allow different teams to manage different parts of infrastructure independently.
Cross-Project Integration
Reference resources created by separate Terraform projects.
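As an illustration of the shared security group case, the sketch below looks up a group by its Name tag inside the VPC from this lab. The tag value shared-app-sg and the data source name are hypothetical; the lab itself did not include a security group lookup.

```hcl
# Look up a security group that another team manages.
data "aws_security_group" "shared_example" {
  vpc_id = data.aws_vpc.shared.id

  filter {
    name   = "tag:Name"
    values = ["shared-app-sg"] # hypothetical tag value
  }
}

# The ID could then be attached to an instance with:
#   vpc_security_group_ids = [data.aws_security_group.shared_example.id]
```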
Best Practices
✅ Use data sources instead of hardcoded resource IDs
✅ Filter resources using tags whenever possible
✅ Use variables to make lookups reusable
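✅ Confirm that the expected resources exist and are uniquely tagged, since a lookup that matches nothing (or more than one resource) fails the plan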
✅ Use data sources for shared infrastructure
✅ Use dynamic AMI lookups instead of static IDs
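Two of these practices, variable-driven lookups and validation, can be combined in one block. The sketch below is one way to do it, assuming Terraform 1.2 or newer (for postcondition support). The variable name vpc_name and the data source name by_name are my own.

```hcl
variable "vpc_name" {
  description = "Value of the Name tag on the VPC to look up"
  type        = string
  default     = "shared-network-vpc"
}

data "aws_vpc" "by_name" {
  filter {
    name   = "tag:Name"
    values = [var.vpc_name]
  }

  lifecycle {
    # Checked after the lookup succeeds; fails the plan with a clear message.
    postcondition {
      condition     = self.state == "available"
      error_message = "The VPC was found but is not in the available state."
    }
  }
}
```

The lookup itself already fails when no VPC matches, so the postcondition is only needed for extra checks on what was found.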
Key Takeaways
✔ Data Sources allow Terraform to read existing infrastructure
✔ They do not create or manage resources
✔ Data Sources reduce hardcoded values
✔ They improve collaboration between teams
✔ Existing VPCs and Subnets can easily be referenced
✔ Dynamic AMI lookups are better than static AMI IDs
✔ Terraform automatically handles dependencies between data sources and resources
✔ Data Sources are essential for enterprise-scale Terraform deployments
Conclusion
Terraform Data Sources are one of the most important concepts for working in real-world cloud environments.
Most organizations already have existing infrastructure that new applications must integrate with. Data Sources provide a clean and reliable way to discover and consume those resources without recreating or managing them.
This hands-on lab helped me understand how Terraform can seamlessly combine existing AWS infrastructure with newly provisioned resources, making Infrastructure as Code more practical, reusable, and scalable.
The official Terraform documentation and YouTube tutorials helped me understand Data Sources and practice deploying resources into existing AWS infrastructure.
