Skip to main content

One post tagged with "Query S3 logs"

View All Tags

Querying Compressed S3 Logs Using AWS Athena: A Step-by-Step Guide

· 5 min read
Cloud & AI Engineering
Arina Technologies
Cloud & AI Engineering

Managing logs in cloud environments can often come with challenges, particularly when using solutions like Elasticsearch, which require managing clusters, shards, and other components that increase both cost and complexity. If you're looking for an alternative that provides the same querying capabilities without the added overhead, AWS Athena is a powerful tool to consider. In this blog post, we'll walk you through how to leverage Athena with Amazon S3 to efficiently query logs from CloudWatch without breaking the bank. Refer How to Export AWS CloudWatch Logs to OpenSearch (Elasticsearch) for details on exporting CloudWatch logs to OpenSearch/Elasticsearch.



Challenges with Opensearch/Elasticsearch

When managing logs through Opensearch/Elasticsearch, several issues can arise:

  • Cluster Management: You need to spin up a cluster to query logs, which can get complicated based on the number of log groups.
  • Cost: You incur costs for compute resources (such as EC2 instances), storage (which can scale to gigabytes), and managing shards and replication.
  • Policies: There is a need to create cost-optimization policies, manage shards, and regularly review the cluster.

These factors make Opensearch/Elasticsearch a more costly and complex solution, especially for organizations with large-scale logging needs. So, what's the alternative?


AWS Athena to the Rescue

Athena is a serverless query service that allows you to analyze structured and semi-structured data directly stored in Amazon S3 using standard SQL. Here's why Athena is an excellent choice:

  • Serverless: There's no need to manage servers or clusters.
  • Cost-Effective: You pay only for the queries you run, rather than for maintaining a persistent cluster.
  • No Data Transfer Needed: You can query data directly from files stored in S3 without having to move the data into a database.
  • Supports Multiple Formats: Athena supports compressed data formats, such as Gzip, which help reduce both storage costs and improve query performance.

Step-by-Step Guide to Querying Logs with Athena


Here is a quick walkthrough of how to query CloudWatch logs using Athena:


  1. Export Logs from CloudWatch to S3:

  • Create an S3 bucket for exporting the cloudWatch logs.

S3 bucket
  • You can do this by creating a new S3 bucket (or using an existing one) and configuring CloudWatch to store logs there.

permissions

  • Ensure that CloudWatch has appropriate permissions to write to the S3 bucket.
  • On log group, select Action --> Export data to S3 as shown in following figure:

Export Logs from CloudWatch to S3
  1. Create a Database in Athena:
    • Navigate to the Athena console and create a new database. This database will store your logs in a structured format, making it easy to query them using SQL.

Database in Athena
  • For this setup, specify the compression format (e.g., Gzip) and the S3 location where your logs are stored.

  1. Set Up Queries:
    • Once the database is ready, you can create tables that reflect the structure of your logs. This will enable you to query log data stored in various formats (structured, semi-structured, or unstructured) without any transformations.
    • Write your SQL queries to extract the relevant data, and run them directly in the Athena query editor.

Sample query:
CREATE EXTERNAL TABLE IF NOT EXISTS log_database.application_logs ( timestamp_iso STRING, log_date STRING, log_time STRING, message STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ( "input.regex" = "^([0-9\\-T:.Z]+)\\s([0-9\\-]+)\\s([0-9:.]+)\\s.*\\s-\\s(.*)$"
)
STORED AS TEXTFILE
LOCATION 's3://yt-athena-logs/cloudwatch/7f344873-86a8-4d17-b242-7302242a12d2/'
TBLPROPERTIES ('skip.header.line.count'='0', 'compressionType'='GZIP')

tables
  1. Analyze the Data:
    • After running your SQL queries, you’ll be able to see the output in an organized format, similar to what you’d expect from a traditional database system.
    • The querying process may take a few seconds depending on the volume of data, but it's far more efficient than manually decrypting or moving files into a traditional database.

SQL queries

Why Choose Athena Over Opensearch/Elasticsearch?

  • Lower Costs: With Athena, there is no need for persistent EC2 instances or complex cluster management. You pay only for the queries you run, making it ideal for sporadic log analysis.
  • Scalability: Since Athena queries logs directly from S3, it leverages the scalability of Amazon S3 to handle vast amounts of data.
    Refer How to Export AWS CloudWatch Logs to OpenSearch (Opensearch/Elasticsearch): Step-by-Step Tutorial to know how to export CloudWatch logs to OpenSearch
  • Ease of Use: Writing SQL queries for log analysis in Athena is much simpler and more straightforward compared to managing shards and replicas in Opensearch/Elasticsearch.

Conclusion


Using AWS Athena to query logs stored in S3 provides a simpler, more cost-effective solution compared to Opensearch/Elasticsearch. Whether you are dealing with structured or semi-structured data, Athena's powerful querying capabilities make it an excellent choice for those looking to reduce operational overhead and costs.



Ready to take your cloud infrastructure to the next level? Please reach out to us Contact Us