Recently, I encountered a situation that highlights a common issue with LLM frameworks. Here's what happened and why I ended up building my own solution.
The False Promise
LLM frameworks often give a false sense of speed. After writing just 9 lines of code, I thought I had arrived at a solution. Reality proved otherwise.
What followed was an hour of debugging:
- Setting verbose flags
- Reading through source code
- Ultimately scrapping the framework entirely
The Task at Hand
I needed a basic "text to SQL" script to convert natural language into database queries. For some reason, the LLM framework I was using couldn't produce correct queries.
Instead of struggling with the framework, I took a different approach:
- Copied their examples
- Asked Claude (an AI assistant) to "Make this without these trash dependencies"
- Result: A working solution in 5 minutes
You can find the code here: GitHub Gist
The Custom Solution
The core of the solution is the `LLMSQLQueryEngine` class. It handles schema retrieval, SQL generation, and query execution. Here's a simplified version of its structure:
```python
from sqlalchemy import create_engine, inspect, text


class LLMSQLQueryEngine:
    def __init__(self, db_path):
        self.engine = create_engine(f"sqlite:///{db_path}")
        self.inspector = inspect(self.engine)
        self.tables = self.inspector.get_table_names()

    def get_schema(self):
        ...  # Schema retrieval logic

    def generate_sql(self, natural_language_query):
        ...  # SQL generation using the OpenAI API

    def execute_sql(self, sql_query):
        ...  # Query execution logic

    def query(self, natural_language_query):
        ...  # Combines generation and execution
```
Schema Retrieval
```python
def get_schema(self):
    schema = []
    for table in self.tables:
        columns = self.inspector.get_columns(table)
        schema.append(f"Table: {table}")
        for column in columns:
            schema.append(f"  - {column['name']}: {column['type']}")
    return "\n".join(schema)
```
This method generates a string representation of the database schema, which we'll use to inform our SQL generation.
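If you don't want to pull in SQLAlchemy at all, the same schema string can be built with nothing but the standard library. The sketch below (the function name `get_schema_stdlib` is mine) uses `sqlite3` and `PRAGMA table_info`, which is what SQLAlchemy's inspector calls under the hood for SQLite:

```python
import sqlite3


def get_schema_stdlib(db_path):
    """Build the same schema string using only the standard library."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        lines = []
        for table in tables:
            lines.append(f"Table: {table}")
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
                lines.append(f"  - {name}: {col_type}")
        return "\n".join(lines)
    finally:
        conn.close()
```

This is exactly the kind of swap that's easy when you own the code and painful when a framework owns it.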
SQL Generation
```python
def generate_sql(self, natural_language_query):
    schema = self.get_schema()
    prompt = f"""Given the following database schema:

{schema}

Generate a SQL query to answer the following question:

{natural_language_query}

Requirements:
...
"""
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a SQL expert. Generate SQL queries based on natural language questions and the given schema."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,
        response_format={"type": "json_object"},
    )
    content = response.choices[0].message.content.strip()
    return json.loads(content)["sql"]
```
This is where the magic happens. We construct a prompt with the schema and natural language query, then use OpenAI's API to generate a SQL query. Note the specific instructions in the prompt to ensure we get a valid SQLite query.
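One detail worth calling out: `response_format={"type": "json_object"}` guarantees the reply is valid JSON, but not that it contains the key you asked for. A small defensive parse (the helper name `extract_sql` is mine, not from the original gist) fails with a readable message instead of a bare `KeyError`:

```python
import json


def extract_sql(raw_content):
    """Parse the model's JSON reply and pull out the 'sql' field.

    json_object mode guarantees valid JSON, but the expected key can
    still be missing, so check for it explicitly.
    """
    data = json.loads(raw_content)
    if "sql" not in data:
        raise ValueError(f"Model reply has no 'sql' key: {data}")
    return data["sql"]
```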
SQL Execution
```python
from sqlalchemy.exc import SQLAlchemyError


def execute_sql(self, sql_query):
    try:
        with self.engine.connect() as connection:
            result = connection.execute(sql_query)
            return result.fetchall()
    except SQLAlchemyError as e:
        return f"Error executing SQL query: {str(e)}"
```
This method executes the generated SQL query against our database and returns the results.
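The same try/return-rows-or-error-string pattern works with the standard library too; here's a minimal sketch (the function name `execute_sql_stdlib` is mine) for when SQLAlchemy is overkill:

```python
import sqlite3


def execute_sql_stdlib(db_path, sql_query):
    """Return the result rows on success, an error string on failure."""
    try:
        with sqlite3.connect(db_path) as conn:
            return conn.execute(sql_query).fetchall()
    except sqlite3.Error as e:
        return f"Error executing SQL query: {e}"
```

Returning an error string rather than raising keeps the caller simple for a script like this, though a production system would likely want a proper exception.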
```python
def query(self, natural_language_query):
    sql_query = self.generate_sql(natural_language_query)
    result = self.execute_sql(text(sql_query))
    return {
        "natural_language_query": natural_language_query,
        "sql_query": sql_query,
        "result": result,
    }
```
The `query` method ties everything together: it generates SQL from the natural language question, executes it, and returns the results along with the original question and the generated SQL for transparency.
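To see the end-to-end shape of that contract without an API key, you can stub out the model call. The toy class below (entirely illustrative, using the standard library's `sqlite3` in place of SQLAlchemy) returns a canned query where the real engine would call OpenAI:

```python
import sqlite3


class FakeSQLQueryEngine:
    """Illustrative stand-in: same query() contract, canned 'model' output."""

    def __init__(self, db_path):
        self.db_path = db_path

    def generate_sql(self, natural_language_query):
        # A real engine would call the LLM here.
        return "SELECT COUNT(*) FROM users"

    def execute_sql(self, sql_query):
        with sqlite3.connect(self.db_path) as conn:
            return conn.execute(sql_query).fetchall()

    def query(self, natural_language_query):
        sql_query = self.generate_sql(natural_language_query)
        return {
            "natural_language_query": natural_language_query,
            "sql_query": sql_query,
            "result": self.execute_sql(sql_query),
        }
```

Because the whole pipeline is a few plain methods, swapping the LLM for a stub like this (for tests, or for a different provider) is a one-class change.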
Advantages of the Custom Solution
- No unnecessary abstractions
- Direct control over the prompt and response handling
- Easier to debug and modify
- Focuses on the core functionality without extra complexity
The Problem with Frameworks
These LLM frameworks often come with:
- Complex, unnecessary abstractions
- Difficulty in progressing beyond toy examples
- Time wasted on negative engineering
- Scattered code across multiple files, making it hard to find the actual prompt
Takeaway
Do it yourself. Building a custom solution often takes less time than wrestling with a framework's complexities. You'll have more control, better understanding, and likely a more efficient result.