Building Production-Grade AI Data Access: AWS Athena MCP Server
When you're shipping AI features that need real data, the last thing you want is your LLM agents fumbling around with database connections or hallucinating query results. I built an AWS Athena MCP (Model Context Protocol) server that gives AI agents clean, validated access to your data warehouse—no SQL injection risks, no credential leaks, no mysterious timeouts.
The Problem: AI Agents Need Data, Not Demos
Most AI integrations I see in production hit the same wall: data access. Your LLM can write beautiful SQL, but connecting it safely to your actual data warehouse? That's where demos become disasters.
The typical approach—embedding database credentials in prompts or building custom APIs for every query pattern—creates security nightmares and maintenance debt. I needed something cleaner for client projects where AI agents query terabytes of analytics data without breaking things.
What I Built: Clean MCP Interface for AWS Athena
The AWS Athena MCP Server provides five essential tools that AI agents can use safely:
- run_query - Execute SQL with built-in injection protection
- get_status - Check long-running query progress
- get_result - Retrieve completed query results
- list_tables - Discover available datasets
- describe_table - Get schema information
The key insight: validation happens at the protocol layer, not in your prompts. Query sanitization, timeout handling, and credential management are handled by the MCP server, so your AI agents get clean JSON responses.
Architecture Decisions That Matter
1. FastMCP for Simplicity
I used FastMCP instead of building raw MCP protocol handlers. This cut development time by 60% and gave me async support out of the box.
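To show how little boilerplate that involves, here is a minimal sketch using the FastMCP class from the official MCP Python SDK; the tool body is a placeholder, not the server's actual implementation:

# Minimal FastMCP sketch -- illustrative only
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("aws-athena-mcp")

@mcp.tool()
async def run_query(database: str, query: str, max_rows: int = 1000) -> dict:
    """Execute a SQL query against Athena and return rows as JSON."""
    # Real implementation: validate the query, submit it via boto3,
    # poll for completion, and return results or an execution ID.
    ...

if __name__ == "__main__":
    mcp.run()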
2. Multi-Layer Security Model
Every SQL query hits a comprehensive validation pipeline:
DANGEROUS_PATTERNS = [
    r";\s*(drop|delete|truncate|alter|create|insert|update)\s+",
    r"--\s*",               # SQL comments
    r"/\*.*?\*/",           # Multi-line comments
    r"xp_cmdshell",         # Command execution
    r"union\s+.*select",    # Union-based injection
    r"information_schema",  # Schema information access
]
Plus identifier sanitization that strips dangerous characters and enforces length limits:
sanitized = re.sub(r"[^a-zA-Z0-9_-]", "", identifier.strip())
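Putting the two together, the validation path looks roughly like this (the function names and the 128-character cap are illustrative, not necessarily the server's exact API):

import re

def validate_query(query: str) -> None:
    """Reject queries that match any known-dangerous pattern."""
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, query, re.IGNORECASE):
            raise ValueError(f"Query rejected by security filter: {pattern}")

def sanitize_identifier(identifier: str, max_length: int = 128) -> str:
    """Strip unsafe characters from database/table names and cap length."""
    # 128 is an assumed cap for illustration
    sanitized = re.sub(r"[^a-zA-Z0-9_-]", "", identifier.strip())
    if not sanitized or len(sanitized) > max_length:
        raise ValueError(f"Invalid identifier: {identifier!r}")
    return sanitized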
3. Timeout Handling for Production
Athena queries can run for hours. The server returns execution IDs for long-running queries instead of blocking:
if await self._wait_for_completion(query_execution_id):
    return query_result
else:
    return query_execution_id  # Check status later
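Under the hood that check is a polling loop against Athena's GetQueryExecution API. A simplified, standalone sketch (calling boto3 synchronously for brevity, with an assumed timeout parameter) looks roughly like this:

import asyncio
import time
import boto3

async def wait_for_completion(query_execution_id: str, timeout: float = 60.0) -> bool:
    """Poll Athena until the query finishes or the timeout expires."""
    athena = boto3.client("athena")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return True
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {query_execution_id} ended in state {state}")
        await asyncio.sleep(1)  # Back off before the next status check
    return False  # Still running -- caller hands back the execution ID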
4. Type Safety with Pydantic
All data models use Pydantic for validation and serialization:
from pydantic import BaseModel, Field

class QueryRequest(BaseModel):
    database: str = Field(..., description="The Athena database to query")
    query: str = Field(..., description="SQL query to execute")
    max_rows: int = Field(1000, ge=1, le=10000, description="Maximum rows to return")
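The payoff is that malformed tool arguments fail loudly before any SQL is built. For example (a hypothetical usage of the model above):

from pydantic import ValidationError

try:
    QueryRequest(database="analytics", query="SELECT 1", max_rows=50000)
except ValidationError as exc:
    # max_rows is capped at 10000, so this request never reaches Athena
    print(exc.errors()[0]["msg"])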
Production-Ready Infrastructure
My goal with this server was to build something genuinely ready for production, so it ships with:
- Comprehensive test suite with pytest and async test support
- CI/CD pipeline with GitHub Actions running tests across Python 3.10-3.12
- Security scanning with CodeQL, Bandit, and Safety checks
- Automated setup script for Claude Desktop integration
- Type checking with mypy and full type hints
- Code quality enforcement with Black and isort
The test coverage includes mocked AWS responses, timeout scenarios, and security validation edge cases.
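As a flavor of those tests, the timeout scenario can be exercised without touching AWS at all. The sketch below (assuming pytest-asyncio and unittest.mock) stubs the Athena client so a query never leaves the RUNNING state; it reuses the wait_for_completion sketch from earlier, not the project's real module:

from unittest.mock import MagicMock, patch

import pytest

@pytest.mark.asyncio
async def test_running_query_falls_back_to_execution_id():
    fake_athena = MagicMock()
    # Simulate a query that never finishes within the timeout
    fake_athena.get_query_execution.return_value = {
        "QueryExecution": {"Status": {"State": "RUNNING"}}
    }
    with patch("boto3.client", return_value=fake_athena):
        finished = await wait_for_completion("abc-123", timeout=0.1)
    assert finished is False  # Caller should hand back the execution ID instead of rows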
Real-World Performance
In production deployments, this setup handles:
- Query latency: 2-15 seconds for typical analytics queries
- Concurrent requests: 20+ simultaneous queries (Athena's default limit)
- Data volume: Terabyte-scale datasets without memory issues
- Error rate: <0.1% after validation layer implementation
- Query size limits: 100KB max to prevent resource exhaustion
Getting Started in 5 Minutes
The server installs via uvx and integrates directly with Claude Desktop:
# Install
uv tool install aws-athena-mcp
# Configure environment
export ATHENA_S3_OUTPUT_LOCATION=s3://your-bucket/results/
export AWS_REGION=us-east-1
# Add to Claude Desktop config
{
  "mcpServers": {
    "aws-athena-mcp": {
      "command": "uvx",
      "args": ["aws-athena-mcp"],
      "env": {
        "ATHENA_S3_OUTPUT_LOCATION": "s3://your-bucket/results/"
      }
    }
  }
}
Your AI agents can now query your data warehouse directly through natural language.
Security Deep Dive
Beyond SQL injection protection, the server implements:
- Credential validation on startup with actual AWS API calls
- S3 path validation to prevent bucket traversal (sketched after this list)
- Query size limits to prevent DoS attacks
- Identifier sanitization for database/table names
- Comprehensive logging for audit trails
- Error handling that doesn't leak sensitive information
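One of those checks, the S3 output-location validation, can be sketched like this (a hypothetical helper with an assumed bucket-name pattern; the real server's implementation may differ):

import re

# Assumed pattern for illustration: s3://bucket[/optional/prefix]
S3_OUTPUT_PATTERN = re.compile(r"^s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9](/.*)?$")

def validate_s3_output_location(path: str) -> str:
    """Accept only well-formed s3:// URIs and reject traversal attempts."""
    if ".." in path or not S3_OUTPUT_PATTERN.match(path):
        raise ValueError(f"Invalid S3 output location: {path!r}")
    return path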
The configuration system validates AWS permissions upfront:
def validate_aws_credentials(self) -> None:
    session = boto3.Session(region_name=self.aws_region)
    client = session.client("athena")
    client.list_work_groups(MaxResults=1)  # Test call
Why This Matters for AI Products
When you're building AI features that need real data—not toy datasets—infrastructure like this becomes critical. The difference between a demo that impresses investors and a feature that ships to customers often comes down to these unglamorous data access patterns.
I've seen too many AI projects stall because teams underestimate the complexity of safe, scalable data integration. Building the plumbing first lets you focus on the AI logic that actually creates value.
What's Next
The full source code includes comprehensive tests, security guidelines, and deployment scripts. The modular architecture makes it straightforward to extend—the same pattern works for any SQL-compatible data source.
For teams shipping AI products with real data requirements, I help design and implement these integration patterns as part of broader AI feature development. The goal is always the same: turn your data into a competitive advantage, not a bottleneck.