Building a Gmail Auto Labeler With LLMs: A Step-by-Step Guide

I've been experimenting with LLM classification lately and found a neat way to tame my messy inbox. Using cheap, small LLMs like GPT-4o mini and a locally run Llama 3 via Ollama, I built an email classifier that auto-categorizes my emails. It's been a game-changer. In this post, I'll walk you through how I set it up. We'll cover connecting to Gmail, fetching unread emails, and letting the LLMs do their thing. It's pretty cool to watch the AI categorize emails, almost like it's rubber duck debugging with itself. Ready to build your own AI email assistant? Let's dive in!

Introduction

Email classification is a common task in natural language processing, but it often requires expensive models or large labeled datasets. With more affordable LLMs and API services now available, we can build a robust email classifier without breaking the bank. This project uses either OpenAI's GPT-4o mini or a locally run Llama 3 8B model served by Ollama to categorize emails based on their content.

Setting Up the Environment

First, let's set up our environment. We'll need to install several Python libraries and set up our credentials. Here's what you'll need:

import os
import base64
import json
import logging
from datetime import datetime, timedelta
from typing import List
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import Resource, build
from googleapiclient.errors import HttpError
from dotenv import load_dotenv
from openai import OpenAI
import sqlite3
import requests
import time
from ratelimit import limits, sleep_and_retry
from google.auth.transport.requests import Request
 
# Load environment variables
load_dotenv()
 
# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
llm_log_file = 'llm_interactions.jsonl'
 
# Constants and configurations
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.labels",
    "https://www.googleapis.com/auth/gmail.modify",
]
TOKEN_FILE = "token.json"
CREDENTIALS_FILE = "credentials.json"
LAST_RUN_FILE = "last_run.json"
PROCESSED_LABEL = "Processed"
CATEGORY_LABELS = [
    "Marketing",
    "Response Needed / High Priority",
    "Bills",
    "Subscriptions",
    "Newsletters",
    "Personal",
    "Work",
    "Events",
    "Travel",
    "Receipts",
    "Low quality",
    "Notifications"
]
DATABASE_FILE = "email_states.db"
PREVIEW_MODE = False
 
# OpenAI configuration
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_MODEL = "gpt-4o-mini"
LLM_SERVICE = os.getenv("LLM_SERVICE", "OpenAI")  # Default to OpenAI if not specified
 
# Ollama API URL
OLLAMA_API_URL = "http://localhost:11434/api/chat"  # default local Ollama chat endpoint

This setup includes importing necessary libraries, setting up logging, and defining constants and configurations. We're using environment variables to store sensitive information like API keys.
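
For reference, the .env file that load_dotenv() reads only needs a couple of entries. Here's a minimal sketch (placeholder values, not real credentials):

OPENAI_API_KEY=your-openai-api-key
# Set to "Ollama" to route classification to the local model instead of OpenAI
LLM_SERVICE=OpenAI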

Connecting to Gmail

To interact with Gmail, we need to set up authentication. Here's how we create a Gmail client:

def get_gmail_client() -> Resource:
    """Creates and returns a Gmail client."""
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
            creds = flow.run_local_server(port=8080)
        with open(TOKEN_FILE, "w") as token:
            token.write(creds.to_json())
    return build("gmail", "v1", credentials=creds)

This function handles the OAuth2 flow, either by loading existing credentials or by initiating a new authorization flow.
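
As a quick sanity check that authentication works, you can fetch your own profile (just a usage sketch):

gmail = get_gmail_client()
profile = gmail.users().getProfile(userId="me").execute()
logging.info(f"Authenticated as {profile['emailAddress']}")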

Fetching Unread Emails

Once we have our Gmail client, we can fetch unread emails:

def get_last_run_time() -> datetime:
    """Gets the last run time from file or returns a default time."""
    if os.path.exists(LAST_RUN_FILE):
        with open(LAST_RUN_FILE, 'r') as f:
            data = json.load(f)
            return datetime.fromisoformat(data['last_run'])
    return datetime.now() - timedelta(days=7)  # Default to 7 days ago if no last run
 
def build_query(last_run: datetime) -> str:
    """Builds the query string for fetching emails."""
    return f"is:unread after:{last_run.strftime('%Y/%m/%d')}"
 
def fetch_emails(gmail: Resource, query: str) -> List[dict]:
    """Fetches emails based on the given query."""
    try:
        results = gmail.users().messages().list(userId="me", q=query).execute()
        return results.get("messages", [])
    except HttpError as error:
        logging.error(f"Failed to fetch emails: {error}")
        raise

These functions work together to fetch unread emails since the last time the script was run. This approach ensures we're not repeatedly processing the same emails.
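
One gap worth noting: fetch_emails only returns message IDs, while the categorizer in the next section expects the actual email text. A helper along these lines can pull the sender, subject, and plain-text body. This is a sketch rather than part of the original code; it ignores HTML-only parts and falls back to Gmail's snippet:

def get_email_content(gmail: Resource, email_id: str) -> str:
    """Fetches a message and returns its sender, subject, and plain-text body."""
    msg = gmail.users().messages().get(userId="me", id=email_id, format="full").execute()
    payload = msg.get("payload", {})
    headers = payload.get("headers", [])
    subject = next((h["value"] for h in headers if h["name"].lower() == "subject"), "")
    sender = next((h["value"] for h in headers if h["name"].lower() == "from"), "")

    def extract_text(part: dict) -> str:
        # Plain-text parts carry base64url-encoded content in body.data
        data = part.get("body", {}).get("data")
        if part.get("mimeType") == "text/plain" and data:
            padded = data + "=" * (-len(data) % 4)  # Gmail may omit base64 padding
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
        # Multipart messages nest their pieces under "parts"
        return "".join(extract_text(p) for p in part.get("parts", []))

    body = extract_text(payload) or msg.get("snippet", "")
    return f"From: {sender}\nSubject: {subject}\n\n{body}"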

Categorizing Emails with LLMs

The heart of our classifier is the LLM-based categorization. We support both OpenAI's GPT models and a local model served through Ollama:

def categorize_email_with_openai(email_content: str) -> str:
    """Categorizes an email using OpenAI's language model."""
    client = OpenAI(api_key=OPENAI_API_KEY)
    prompt = f"""
    Categorize the following email into one of these categories: {', '.join(CATEGORY_LABELS)}.
    Respond with only the category name.
 
    Email content:
    {email_content}
    """
 
    try:
        response = client.chat.completions.create(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": "You are an AI assistant that categorizes emails."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=10,
            temperature=0.3
        )
        category = response.choices[0].message.content.strip()
        return category if category in CATEGORY_LABELS else "Other"
    except Exception as e:
        logging.error(f"Error in OpenAI categorization: {e}")
        return "Other"
 
def categorize_email_with_ollama(email_content: str) -> str:
    """Categorizes an email using the local Ollama LLM."""
    try:
        system_prompt = f"""You are an AI trained to categorize emails into predefined categories.
            Categorize the following email into one of these categories: {', '.join(CATEGORY_LABELS)}.
            Please respond in the following JSON format:
            {{
                "explanation": "string",
                "category": "string"
            }}
        """
        prompt = f"""
        <Email>
        {email_content}
        </Email>
        """
 
        response = call_ollama_api(prompt + system_prompt)
        category = json.loads(response)['category']
        return category if category in CATEGORY_LABELS else "Other"
    except Exception as e:
        logging.error(f"Error in Ollama categorization: {str(e)}")
        return "Other"
 
def categorize_email(email_content: str) -> str:
    """Wrapper function to categorize email using the selected LLM service."""
    if LLM_SERVICE == "OpenAI":
        return categorize_email_with_openai(email_content)
    elif LLM_SERVICE == "Ollama":
        return categorize_email_with_ollama(email_content)
    else:
        logging.error("Invalid LLM service specified.")
        return "Other"

These functions handle the interaction with the LLMs, sending the email content and receiving a categorization. The categorize_email function acts as a wrapper, choosing the appropriate LLM based on the configuration.
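
The Ollama path relies on a call_ollama_api helper that isn't shown above. Here's a minimal sketch of what it could look like, assuming Ollama's /api/chat endpoint, the llama3 model tag, and an arbitrary rate limit via the ratelimit decorators we imported earlier (all assumptions, not the author's exact code):

@sleep_and_retry
@limits(calls=30, period=60)  # assumed limit; tune for your hardware
def call_ollama_api(prompt: str) -> str:
    """Sends a single-turn chat request to the local Ollama server and returns the reply text."""
    payload = {
        "model": "llama3",  # assumed model tag; match whatever you've pulled with `ollama pull`
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "format": "json",  # ask Ollama for valid JSON so json.loads() succeeds upstream
    }
    response = requests.post(OLLAMA_API_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]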

Processing and Labeling Emails

Once we have the category for an email, we need to apply the appropriate label:

def get_or_create_label(gmail: Resource, label_name: str) -> str:
    """Gets or creates a label and returns its ID."""
    try:
        results = gmail.users().labels().list(userId="me").execute()
        labels = results.get("labels", [])
        for label in labels:
            if label["name"] == label_name:
                return label["id"]
 
        # If the label doesn't exist, create it
        label = {
            "name": label_name,
            "labelListVisibility": "labelShow",
            "messageListVisibility": "show"
        }
        created_label = gmail.users().labels().create(userId="me", body=label).execute()
        return created_label["id"]
    except HttpError as error:
        logging.error(f"An error occurred while managing label {label_name}: {error}")
        return None
 
def add_labels_to_email(gmail: Resource, email_id: str, label_ids: List[str]):
    """Adds labels to a specific email."""
    if PREVIEW_MODE:
        logging.info(f"Preview: Would add labels {label_ids} to email {email_id}")
        return
    try:
        gmail.users().messages().modify(
            userId="me",
            id=email_id,
            body={"addLabelIds": label_ids}
        ).execute()
        logging.info(f"Labels added to email {email_id}")
    except HttpError as error:
        logging.error(f"An error occurred while adding labels to email {email_id}: {error}")

These functions handle the creation of labels (if they don't exist) and the application of labels to emails.
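
Putting this together with the categorizer, labeling a single email looks roughly like this (a sketch; email_id and email_content come from the fetching step, and the Processed label marks mail the script has already handled):

category = categorize_email(email_content)
label_ids = [
    label_id
    for label_id in (
        get_or_create_label(gmail, category),
        get_or_create_label(gmail, PROCESSED_LABEL),
    )
    if label_id  # skip labels that failed to resolve
]
add_labels_to_email(gmail, email_id, label_ids)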

Automating Inbox Management

To keep our inbox clean, we can automatically move certain categories of emails out of the inbox:

def remove_from_inbox(gmail: Resource, email_id: str):
    """Remove an email from the inbox."""
    try:
        gmail.users().messages().modify(
            userId='me',
            id=email_id,
            body={'removeLabelIds': ['INBOX']}
        ).execute()
        logging.info(f"Email {email_id} has been removed from the inbox.")
    except HttpError as error:
        logging.error(f"Failed to remove email {email_id} from the inbox: {error}")

This function removes the 'INBOX' label from an email, effectively moving it out of the inbox.
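
The pieces above slot together into a simple run loop. Here's a minimal sketch of that glue, assuming the get_email_content helper sketched earlier and an ARCHIVE_CATEGORIES set of your own choosing (neither is part of the original code):

ARCHIVE_CATEGORIES = {"Marketing", "Low quality", "Notifications"}  # assumed; pick your own set

def main():
    gmail = get_gmail_client()
    query = build_query(get_last_run_time())

    for message in fetch_emails(gmail, query):
        email_id = message["id"]
        email_content = get_email_content(gmail, email_id)  # helper sketched earlier
        category = categorize_email(email_content)

        label_ids = [
            label_id
            for label_id in (
                get_or_create_label(gmail, category),
                get_or_create_label(gmail, PROCESSED_LABEL),
            )
            if label_id
        ]
        add_labels_to_email(gmail, email_id, label_ids)

        # Archive low-value categories so they skip the inbox entirely
        if category in ARCHIVE_CATEGORIES and not PREVIEW_MODE:
            remove_from_inbox(gmail, email_id)

    # Record this run so the next invocation only fetches newer mail
    with open(LAST_RUN_FILE, "w") as f:
        json.dump({"last_run": datetime.now().isoformat()}, f)

if __name__ == "__main__":
    main()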

Error Handling and Logging

Throughout the code, we've implemented error handling and logging to keep track of what's happening:

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
llm_log_file = 'llm_interactions.jsonl'
 
# Example of logging
log_entry = {
    "request_timestamp": start_time,
    "response_timestamp": end_time,
    "duration": end_time - start_time,
    "request": {"prompt": prompt},
    "response": response.choices[0].text.strip()
}
 
with open(llm_log_file, 'a') as f:
    f.write(json.dumps(log_entry) + '\n')

This logging helps us track the performance of our LLMs and diagnose any issues that arise.
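
Since each interaction lands in a JSONL file, it's easy to inspect later. For example, a small helper like this (a sketch, assuming duration is recorded in seconds) can report average latency:

def summarize_llm_log(path: str = llm_log_file) -> None:
    """Prints the number of logged LLM calls and their average latency."""
    durations = []
    with open(path) as f:
        for line in f:
            if line.strip():
                durations.append(json.loads(line)["duration"])
    if durations:
        logging.info(f"{len(durations)} LLM calls, avg latency {sum(durations) / len(durations):.2f}s")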

Conclusion

And there you have it - a neat little email classifier powered by cheap LLMs. It's pretty cool how we can use models like GPT-4o mini or a local Llama 3 via Ollama to wrangle our inboxes. With some Python skills and these LLMs, you can tackle everyday problems like email overload. Play around with different prompts, categories, and LLMs to find what works best for your email chaos. Hope this helps you tame your inbox! Let me know how it goes if you give it a shot. Happy hacking!