4.04 Terraform Data Sources

Overview

Data sources let Terraform read attributes from infrastructure it doesn't manage — resources created manually, by other IaC tools, or by a separate Terraform configuration — and use that data inside its own managed resources.

Abstract

A data block is Terraform's read-only counterpart to a resource block. It fetches information about an existing object without creating, updating, or destroying it, making that information available elsewhere in the configuration via the data object.

Why It Matters in Production

Real infrastructure is rarely managed by a single tool. Resources may exist from Puppet, CloudFormation, SaltStack, Ansible, ad-hoc scripts, manual provisioning, or even a different Terraform state. Data sources let a Terraform configuration reference values from those external resources — like a manually-provisioned database's host address — without taking ownership of them, avoiding duplicated infrastructure or unsafe imports.

Key Concepts

Concept	Description
Data source (`data` block)	Reads attributes from an existing resource; does not create, update, or destroy it
Managed resource (`resource` block)	Creates, updates, and destroys infrastructure; fully owned by Terraform
`data` object	Where attribute values read by a data source are exposed for use elsewhere in configuration
Exported attributes	The specific fields a given data source makes available, defined per-provider in the Terraform Registry docs

Common Use Cases

Referencing a manually-provisioned AWS database's host, name, or user in a Terraform-managed application resource.
Pulling values from infrastructure created outside Terraform's control — ad-hoc scripts, other config management tools, or a separate Terraform state directory.
Looking up cloud provider metadata (e.g. the latest AMI ID, an existing VPC ID, or an existing IAM policy) without managing the lifecycle of that object.
Reading a locally-created file's content to feed into a Terraform-managed resource, as in the local_file example below.
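
The cloud metadata use case above could look roughly like the following. This is a minimal sketch, assuming the AWS provider is configured elsewhere; the block names, the AMI name pattern, and the instance size are illustrative choices, not values from these notes.

```hcl
# Assumption: the AWS provider (region, credentials) is configured elsewhere.
# Look up the most recent Amazon-owned image matching a name pattern.
data "aws_ami" "latest_al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # illustrative name pattern
  }
}

# The instance is managed by Terraform; the AMI is only read.
resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_al2023.id
  instance_type = "t3.micro" # illustrative size
}
```

Terraform manages the instance's lifecycle here, while the AMI is only queried and never created or destroyed.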

Example Configuration or Commands

Resource managed outside Terraform

Suppose a file was created independently of Terraform, for example by a shell script. Its contents:

cat /root/dog.txt

Dogs are awesome!

Terraform has no knowledge of this file — it exists only in "real world infrastructure," not in terraform.tfstate.

Reading it with a data source

data "local_file" "dog" {
  filename = "/root/dog.txt"
}

resource "local_file" "pet" {
  filename = "/root/pets.txt"
  content  = data.local_file.dog.content
}

The data keyword replaces resource to declare a read-only lookup.
The data source type (local_file) follows, in the same position as the resource type in a managed resource block.
The logical name (dog) is used to reference this data elsewhere.
Arguments inside the block (filename) are specific to that data source — check the Terraform Registry provider docs for which arguments and exported attributes apply.

The value is referenced elsewhere in configuration as:

data.local_file.dog.content

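For the local_file data source, the Terraform Registry lists content and its base64-encoded equivalent, content_base64, among the exported attributes. Recent provider versions also list checksum attributes, so check the docs for the provider version in use.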
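
As a small sketch of how exported attributes can be surfaced, the outputs below expose the file's content in both forms. The output names are made up for illustration, and content_base64 is the attribute name used by the hashicorp/local provider docs; confirm it against the Registry page for the provider version in use.

```hcl
data "local_file" "dog" {
  filename = "/root/dog.txt"
}

# Hypothetical output names, chosen only for this illustration.
output "dog_content" {
  value = data.local_file.dog.content
}

output "dog_content_base64" {
  value = data.local_file.dog.content_base64
}
```

After terraform apply, both values appear in the outputs section, which is a quick way to confirm what the data source actually read.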

Resource vs. data source comparison

	Resource	Data Source
Keyword	`resource`	`data`
Capability	Creates, updates, destroys infrastructure	Only reads infrastructure
Also known as	Managed resource	Data resource

Best Practices

Check the Terraform Registry provider documentation for each data source's required arguments and exported attributes before use — these vary per resource type and provider.
Use data sources instead of hardcoding values (IDs, ARNs, file contents) that come from infrastructure outside the current configuration.
Prefer a data source over importing a resource into state when you only need to read its attributes, not manage its lifecycle.
Keep data source lookups scoped narrowly (a single file, a single instance) rather than broad queries that could return ambiguous or multiple results.
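
One way to keep a lookup narrow is to query by a unique identifier rather than by loose filters. The sketch below assumes a configured AWS provider; the variable name and output name are hypothetical.

```hcl
# Hypothetical input: the ID of one existing EC2 instance that is
# managed outside this configuration.
variable "bastion_instance_id" {
  type        = string
  description = "ID of an existing EC2 instance to read (not managed here)."
}

# Looking up by instance ID can only match a single instance.
data "aws_instance" "bastion" {
  instance_id = var.bastion_instance_id
}

output "bastion_private_ip" {
  value = data.aws_instance.bastion.private_ip
}
```

A lookup by tag filters that happens to match several instances would fail with this singular data source, so pinning the query to one ID avoids that ambiguity.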

Security Best Practices

Data sources can expose sensitive attributes (e.g. database credentials, secrets stored in tags) into Terraform's plan output and state file — treat that output with the same care as managed resource state.
Be cautious referencing data sources that read from files or systems writable by other processes; unexpected content changes will flow directly into your managed resources on the next apply.
Restrict who can modify the externally-managed resource a data source reads from, since Terraform has no control over (or audit trail for) those changes.
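
As a hedged sketch of the first point, an output that carries a value read by a data source can be marked sensitive so Terraform redacts it in CLI output. The file path and block names below are hypothetical.

```hcl
# Hypothetical path: a secret written by some external process.
data "local_file" "db_password" {
  filename = "/root/db_password.txt"
}

output "db_password" {
  value     = data.local_file.db_password.content
  sensitive = true # redacted in plan/apply output
}
```

Marking the output sensitive only affects what is displayed. The value is still written to the state file in plain text, so the state file itself needs to be protected.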

Do and Don't

✅ Do	❌ Don't
Use a data source to read attributes from externally-managed infrastructure	Manually duplicate values from external resources into your config
Check the Registry docs for exported attributes before referencing them	Guess at attribute names and assume they're consistent across providers
Reference data via `data.<type>.<name>.<attribute>`	Confuse a data source's logical name with a managed resource of the same type
Use data sources for true read-only lookups	Use a data source when you actually need Terraform to manage the resource's lifecycle

Common Mistakes

Forgetting the data keyword and accidentally declaring a second managed resource block instead, which would attempt to create or conflict with the external resource.
Assuming every resource type's data source exposes the same attributes as its managed resource counterpart — exported attributes are defined per data source.
Not realizing that changes to the externally-managed resource (like the dog.txt file content) will flow into the dependent managed resource on the next terraform apply.
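
One possible guard against the last mistake is a postcondition on the data source, so that an unexpected external change stops the run instead of flowing silently into dependent resources. This is a sketch assuming Terraform 1.2 or later, where custom conditions are available; the specific check (non-empty content) is only an example.

```hcl
data "local_file" "dog" {
  filename = "/root/dog.txt"

  lifecycle {
    # Fail the plan if the externally managed file is empty.
    postcondition {
      condition     = length(trimspace(self.content)) > 0
      error_message = "The file /root/dog.txt is empty, so its content was not propagated."
    }
  }
}

resource "local_file" "pet" {
  filename = "/root/pets.txt"
  content  = data.local_file.dog.content
}
```

The condition can be whatever invariant matters for the dependent resource; the point is that the check runs every time the data source is read.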

Troubleshooting

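# Re-read data sources and see how their current values affect managed resources
terraform plan

# Inspect everything recorded in state, including data sources, after apply
terraform show

# Inspect a single data source in state (address taken from the example above)
terraform state show data.local_file.dog

# Check the Terraform Registry for a provider's data source arguments and exports
# (no CLI command; refer to registry.terraform.io/providers/<provider>/latest/docs/data-sources)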

Real-World Examples

Platform Team — Bridging Manually-Provisioned Databases

Scenario: A company had a production RDS database provisioned manually before adopting Terraform.
Problem: New application infrastructure needed the database's host address and name, but the team didn't want to import and take ownership of the existing database in Terraform.
Solution: Used a data source to read the existing database's attributes and passed them into a Terraform-managed application resource's connection configuration.
Outcome: New infrastructure could reference the existing database safely without risking accidental modification or deletion of a resource Terraform didn't own.
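
A configuration along these lines could implement that scenario. It is a sketch under assumptions: the AWS provider is configured elsewhere, the database identifier and parameter name are hypothetical, and an SSM parameter stands in for whatever resource holds the application's connection settings.

```hcl
# Read the manually provisioned database; Terraform does not manage it.
data "aws_db_instance" "legacy" {
  db_instance_identifier = "legacy-prod-db" # hypothetical identifier
}

# Terraform-managed resource that consumes the database's host address.
resource "aws_ssm_parameter" "db_host" {
  name  = "/app/db/host" # hypothetical parameter name
  type  = "String"
  value = data.aws_db_instance.legacy.address
}
```

Running terraform destroy on this configuration would remove the parameter but leave the database untouched, since only the parameter is in a resource block.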

Multi-Tool Infrastructure — Reading Ansible-Managed Config

Scenario: A hybrid environment where base servers were configured with Ansible, but application-layer resources were defined in Terraform.
Problem: Terraform-managed resources needed values (like a generated config file's contents) that only existed after Ansible runs completed.
Solution: Used a data source to read the Ansible-generated file and feed its content into the dependent Terraform resource.
Outcome: The two tools coexisted without Terraform attempting to manage resources outside its scope.

Cloud Migration Team — Cross-State Data Sharing

Scenario: A large organization split infrastructure across multiple separate Terraform configurations (state files) by team.
Problem: A networking team's VPC needed to be referenced by an application team's configuration, but merging state files wasn't desirable.
Solution: Used data sources to look up the VPC's attributes from the networking team's already-provisioned resources, rather than duplicating the VPC definition.
Outcome: Teams maintained independent state files while safely sharing required infrastructure values.
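
The application team's side of that arrangement might look like the sketch below. It assumes a configured AWS provider and that the networking team tags its VPC with a known Name; the tag value and security group name are hypothetical.

```hcl
# Look up the VPC created by the networking team's configuration.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network" # hypothetical tag value
  }
}

# Application-team resource placed inside the looked-up VPC.
resource "aws_security_group" "app" {
  name   = "app-sg" # hypothetical name
  vpc_id = data.aws_vpc.shared.id
}
```

The VPC never enters the application team's state as a managed resource, so the two configurations stay independent.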

Quick Recap

Data sources read attributes from resources Terraform doesn't manage, using the data keyword instead of resource.
They never create, update, or destroy infrastructure — only read it.
Retrieved values are accessed via data.<type>.<logical_name>.<attribute>.
Each data source's required arguments and exported attributes are defined per provider in the Terraform Registry.
Resources are "managed resources"; data sources are also called "data resources."

Interview / Revision Notes

Q: What's the main difference between a resource and a data source? A: Resources create, update, and destroy infrastructure; data sources only read information from existing infrastructure.
Q: What keyword declares a data source block? A: data, followed by the data source type and a logical name.
Q: How do you reference a value read by a data source elsewhere in configuration? A: data.<type>.<logical_name>.<attribute>.
Q: Where do you find which arguments and attributes a data source supports? A: The Terraform Registry documentation for that provider's data sources.
Q: What is another name for a resource? For a data source? A: Managed resource; data resource.