Auditing AWS with CloudTrail and Analyzing Logs Using Athena


When you work with AWS at scale, visibility and auditing become as important as compute and storage. Knowing who did what, when, and from where is critical for security, compliance, and troubleshooting.

In this blog, I’ll walk through:

  • What CloudTrail is and how it helps with auditing

  • How CloudTrail logs are stored in Amazon S3

  • How Athena helps query those logs without managing servers

  • Creating tables and partitions in Athena for cost-efficient querying

This setup is commonly used in real-world AWS environments.


Why CloudTrail + Athena?

  • CloudTrail records AWS API activity (management events by default; data events must be enabled explicitly)

  • S3 stores logs durably and cheaply

  • Athena lets you query logs directly from S3 using SQL

  • No servers, no databases to manage, pay only for what you query


Understanding CloudTrail (Auditing in AWS)

CloudTrail is an auditing service provided by Amazon Web Services that records API calls and account activity across your AWS account.

What CloudTrail Records

CloudTrail captures details such as:

  • Identity of the API caller (IAM user / role)

  • Time of the API call

  • Source IP address

  • AWS service and action performed

  • Request parameters

  • Response elements returned by AWS
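To make these fields concrete, here is a minimal sketch that pulls the who/what/when/where out of a log file. The sample record is hand-written for illustration (the account ID, user, IP address, and bucket name are all made up), but the field names follow the CloudTrail record format.

```python
import json

# Hypothetical, hand-written sample shaped like a CloudTrail log file:
# a top-level "Records" array with one JSON object per event.
SAMPLE_LOG = """
{"Records": [{
  "eventTime": "2025-01-15T10:30:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "DeleteBucket",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {"type": "IAMUser",
                   "arn": "arn:aws:iam::111122223333:user/alice"},
  "requestParameters": {"bucketName": "example-bucket"},
  "responseElements": null
}]}
"""


def summarize_events(log_text):
    """Return one who/what/when/where dict per record in a log file."""
    records = json.loads(log_text).get("Records", [])
    return [
        {
            "who": r.get("userIdentity", {}).get("arn"),
            "what": f'{r.get("eventSource")}:{r.get("eventName")}',
            "when": r.get("eventTime"),
            "where": r.get("sourceIPAddress"),
        }
        for r in records
    ]


for row in summarize_events(SAMPLE_LOG):
    print(row)
```

Real log files delivered to S3 are gzip-compressed, so you would decompress them before parsing.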

This makes CloudTrail extremely useful for:

  • Security investigations

  • Compliance audits

  • Debugging accidental changes

  • Tracking unauthorized access

Default CloudTrail Behavior

  • By default, CloudTrail Event history keeps the last 90 days of management events

  • You can download Event history as JSON or CSV

  • For longer retention, a trail can deliver events to:

    • Amazon S3

    • CloudWatch Logs (optional)


Creating a CloudTrail Trail

A Trail enables CloudTrail to continuously deliver log files to your S3 bucket.

Key Points About Trails

  • A trail delivers logs to a specified S3 bucket

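  • A trail created in the console applies to all Regions by default

  • A multi-Region trail records events from every Region in the AWS partition where it is created, not across partitions

  • Log files are automatically organized in S3 by account ID, Region, and date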
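As a rough illustration of that layout, the sketch below builds the S3 prefix where a standard (non-organization) trail writes one day of logs. The bucket name and account ID are placeholders, and `cloudtrail_prefix` is a hypothetical helper, not part of any AWS SDK.

```python
from datetime import date


def cloudtrail_prefix(bucket, account_id, region, day, key_prefix=""):
    """Build the S3 prefix holding one day of logs for one Region.

    Layout: [key_prefix/]AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/
    """
    parts = [
        key_prefix.strip("/"),   # optional prefix configured on the trail
        "AWSLogs",
        account_id,
        "CloudTrail",
        region,
        f"{day:%Y/%m/%d}",
    ]
    path = "/".join(p for p in parts if p)
    return f"s3://{bucket}/{path}/"


# Placeholder bucket and account ID, for illustration only.
print(cloudtrail_prefix("my-audit-bucket", "111122223333",
                        "us-east-1", date(2025, 1, 15)))
# s3://my-audit-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2025/01/15/
```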

Once the trail is created, CloudTrail will start pushing logs to S3 without any manual intervention.


Managing Access and Notifications

IAM Permissions

Using AWS IAM, you can control:

  • Who can create, modify, or delete CloudTrail trails

  • Who can start or stop logging

  • Who can access CloudTrail log buckets in S3

This ensures least-privilege access to sensitive audit data.
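One way to express these controls is an IAM policy that denies trail tampering and grants read-only access to the log bucket. The sketch below only builds the policy document as JSON. The statement IDs and bucket name are made up, and the exact set of actions you restrict is a judgment call for your environment.

```python
import json


def audit_guardrail_policy(bucket):
    """Build an example IAM policy document for audit-log readers."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Block changes that would stop or alter logging.
                "Sid": "DenyTrailTampering",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:StopLogging",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            },
            {
                # Read-only access to the log bucket and its objects.
                "Sid": "ReadOnlyAuditLogs",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
        ],
    }


# "my-audit-bucket" is a placeholder name.
print(json.dumps(audit_guardrail_policy("my-audit-bucket"), indent=2))
```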

SNS Notifications (Optional)

You can also:

  • Create an SNS topic

  • Subscribe to it

  • Receive notifications when new log files are delivered to S3

This is useful for security or compliance alerting.


Querying CloudTrail Logs with Athena

Once logs are in S3, querying them manually is difficult. This is where Athena shines.

What is Athena?

Athena is a serverless, interactive query service that allows you to run SQL queries directly on data stored in S3.

Why Use Athena?

  • No database setup required

  • Query data directly from S3

  • Pay per query, based on data scanned

  • Ideal for ad-hoc analysis

💰 Athena charges around $5 per TB scanned, so optimization matters.
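To get a feel for the numbers, here is a back-of-the-envelope estimator. It assumes the roughly $5 per TB figure above, treats 1 TB as 1024^4 bytes, and assumes scanned bytes are rounded up to the next megabyte with a 10 MB minimum per query. Check current pricing for your Region before relying on it.

```python
import math

PRICE_PER_TB_USD = 5.0      # approximate figure quoted in this post
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte
MIN_BILLED_MB = 10          # assumption: 10 MB minimum per query


def estimate_query_cost(bytes_scanned):
    """Estimate the cost in USD of one query from the bytes it scans."""
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


# Scanning 200 GB of raw logs in one query:
print(round(estimate_query_cost(200 * 1024 ** 3), 4))  # 0.9766
```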


Athena Tables: How They Work

  • All Athena tables are external tables

  • Table schema and actual data are loosely coupled

  • Dropping a table does not delete data from S3

  • You can recreate tables anytime without data loss

This makes Athena safe and flexible for log analysis.


Creating a Table in Athena (Using Console)

Before creating a table, make sure:

  • Log data already exists in S3

  • You’ve copied the S3 path of the folder (prefix) that contains the data

Steps to Create Table in Athena

  1. Navigate to Athena Console

  2. Click Get Started

  3. Choose Create table

  4. Select Create table from S3 bucket data

  5. Choose an existing database or create a new one

  6. Paste the S3 folder path of your data (a prefix ending in /, not a single object)

  7. Select file format (CSV / JSON)

  8. Define column names and data types

  9. Click Create table

  10. Start querying

Athena shows the amount of data scanned for every query, so keep an eye on it.


Cost Optimization in Athena

There are two major ways to reduce Athena query costs:

1. Use Compressed Columnar Formats

  • Prefer Parquet or ORC

  • These formats reduce:

    • Data scanned

    • Query execution time

    • Cost
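One common way to get data into a columnar format is an Athena CTAS (CREATE TABLE AS SELECT) statement. The sketch below only builds the SQL text. The table names and target location are placeholders, and `parquet_ctas` is a hypothetical helper.

```python
def parquet_ctas(source_table, target_table, target_location):
    """Build a CTAS statement that rewrites a table as Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{target_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


# Placeholder names; the target location should be an empty S3 prefix.
print(parquet_ctas("cloudtrail_raw", "cloudtrail_parquet",
                   "s3://my-audit-bucket/parquet/"))
```

The CTAS query itself scans the source data once, so it pays off when the converted table is queried repeatedly.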

2. Partition Your Data

Partitioning helps Athena scan only relevant data, instead of the entire dataset.


Creating Partitions in Athena

Prerequisite

For Athena to discover partitions automatically, your S3 folder names must follow the column=value pattern for each partition column.

Example:

s3://cloudtrail-logs/year=2024/
s3://cloudtrail-logs/year=2025/
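
To show what the column=value convention encodes, here is a small sketch that reads partition values back out of an S3 path. `partition_values` is a hypothetical helper written for this post.

```python
def partition_values(s3_uri):
    """Extract Hive-style column=value pairs from an S3 URI."""
    path = s3_uri.split("://", 1)[-1]
    segments = path.split("/")[1:]  # drop the bucket name
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)


print(partition_values("s3://cloudtrail-logs/year=2024/"))
# {'year': '2024'}
```

A query that filters on year = '2024' only needs to read objects under the matching folder.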

Partitioning Strategy

  • Group data that shares the same schema under one table location

  • Split it into folders by year (or further by month/day)


Steps to Create Partitioned Table

  1. Create folders in S3 like year=2024, year=2025

  2. Go to Athena → Create table from S3

  3. Provide S3 location

  4. Select file format (CSV / JSON)

  5. Define all columns except partition column

  6. Click Add partition

  7. Add year as partition column

  8. Create table
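
The console steps above end up generating a DDL statement. The sketch below builds a comparable one by hand. It assumes newline-delimited JSON data and the OpenX JSON SerDe, and the column names are illustrative. Raw CloudTrail files wrap events in a top-level Records array, so AWS documents a CloudTrail-specific table definition for those.

```python
def partitioned_table_ddl(table, location, columns, partition_columns):
    """Build a CREATE EXTERNAL TABLE statement for partitioned JSON data.

    columns and partition_columns are lists of (name, type) pairs.
    Partition columns are declared only in PARTITIONED BY.
    """
    cols = ",\n  ".join(f"{name} {type_}" for name, type_ in columns)
    parts = ", ".join(f"{name} {type_}" for name, type_ in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


# Illustrative table and column names.
print(partitioned_table_ddl(
    "audit_events",
    "s3://cloudtrail-logs/",
    columns=[("eventtime", "string"),
             ("eventname", "string"),
             ("sourceipaddress", "string")],
    partition_columns=[("year", "string")],
))
```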


Loading Partitions

For partitioned tables, queries won’t return results until partitions are loaded.

Two Ways to Load Partitions

  1. Click table name → Load partitions

  2. Run a SQL statement such as MSCK REPAIR TABLE in the query editor
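
For the SQL route, the usual statement is MSCK REPAIR TABLE, which scans the table location for column=value folders. An alternative is to add partitions explicitly with ALTER TABLE ADD PARTITION. The sketch below builds both. The table name is a placeholder and `load_partition_statements` is a hypothetical helper.

```python
def load_partition_statements(table, location, years):
    """Return (repair_statement, [add_partition_statements])."""
    base = location.rstrip("/")
    # Option A: let Athena discover every column=value folder.
    repair = f"MSCK REPAIR TABLE {table}"
    # Option B: register each partition explicitly.
    adds = [
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}') LOCATION '{base}/year={year}/'"
        for year in years
    ]
    return repair, adds


repair, adds = load_partition_statements(
    "audit_events", "s3://cloudtrail-logs/", ["2024", "2025"])
print(repair)
for statement in adds:
    print(statement)
```

MSCK REPAIR TABLE is convenient for a handful of partitions. Explicit ALTER TABLE statements tend to be faster when only one new partition has arrived.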

Once partitions are loaded, you can start querying efficiently.
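
As an example of an efficient query, the sketch below builds a statement that filters on the partition column first, so Athena reads only that year's folder. The table and column names (eventtime, eventname, sourceipaddress) are illustrative and must match your own table definition. The plain string formatting is for demonstration only and assumes trusted inputs.

```python
def audit_query(table, year, event_name, limit=50):
    """Build a query for recent occurrences of one API action in a year."""
    return (
        "SELECT eventtime, eventname, sourceipaddress\n"
        f"FROM {table}\n"
        f"WHERE year = '{year}'\n"            # partition filter: limits data scanned
        f"  AND eventname = '{event_name}'\n"
        "ORDER BY eventtime DESC\n"
        f"LIMIT {limit}"
    )


# Who deleted buckets in 2025? (placeholder table name)
print(audit_query("audit_events", "2025", "DeleteBucket"))
```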


Final Thoughts

Using CloudTrail + Athena together gives you:

  • Complete AWS activity visibility

  • Serverless log analysis

  • Scalable and cost-effective auditing

  • SQL-based troubleshooting without databases

This setup is widely used in production AWS environments, especially for:

  • Security teams

  • Compliance audits

  • Incident investigations

If you’re learning AWS seriously, mastering this flow is a big step forward 🚀

