New to OtterLedger? Read the Documentation
45 Guides Available Quick Start Guide
Learn AI Categorization View Guide
Have Questions? Check the FAQ

Guide 23: AI Categorization

Let AI automatically categorize your transactions with a 5-tier intelligence system


Overview

OtterLedger uses a 5-tier AI system to automatically categorize your transactions. The system starts with simple user-defined rules and progresses through payee learning, machine learning, local AI, and cloud AI. It saves you hours of manual work and gets smarter over time as it learns from your choices.

The Categorization Assistant walks you through the entire process in an interactive, step-by-step workflow -- from analyzing payees and detecting transfers to running AI categorization and reviewing results.

What you'll learn:

  • How the 5-tier AI categorization pipeline works
  • Using the Categorization Assistant's 7-phase workflow
  • How web search enrichment identifies cryptic payees
  • Training the XGBoost ML model on your own data
  • Configuring AI providers and privacy settings
  • Tips for getting the best results

Time required: 10-15 minutes to configure; categorization runs automatically after that


Prerequisites

  • Transactions imported into OtterLedger (the more, the better the AI performs)
  • Categories set up (OtterLedger includes a default category tree to start with)
  • For Tier 3 (XGBoost): The XGBoost Python service running locally (optional)
  • For Tier 4 (Local LLM): A compatible local model downloaded via AI Setup (optional)
  • For Tier 5 (Cloud AI): An API key from OpenAI, Google Gemini, or Anthropic Claude (optional)

The 5-Tier AI System

OtterLedger tries each tier in order, from fastest to smartest. As soon as a tier produces a confident match, the system stops and uses that result. If no tier is confident enough, the transaction is left for you to review.

Tier 1: Rules (Fastest, most reliable)
   |  If no match...
Tier 2: Payee Learning (Pattern matching from your history)
   |  If no match...
Tier 3: XGBoost ML (Local machine learning model)
   |  If no match...
Tier 4: Local LLM (On-device AI model)
   |  If no match...
Tier 5: Cloud AI (Smartest, requires API key)
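
The fall-through behavior shown above can be sketched in a few lines of Python. This is a minimal illustration, not OtterLedger's actual code: the `categorize` function, the tier callables, and the 0.70 cutoff (borrowed from the default Minimum Confidence setting described later in this guide) are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier takes a payee string and returns (category, confidence) or None.
Tier = Callable[[str], Optional[Tuple[str, float]]]


def categorize(payee: str, tiers: Sequence[Tuple[str, Tier]],
               min_confidence: float = 0.70) -> Optional[Tuple[str, str, float]]:
    """Try each tier in order and stop at the first confident result."""
    for name, tier in tiers:
        result = tier(payee)
        if result is None:
            continue  # this tier had no answer, fall through to the next one
        category, confidence = result
        if confidence >= min_confidence:
            return name, category, confidence
    return None  # no tier was confident: leave the transaction for review


# Hypothetical stand-ins for two of the five tiers.
def rules_tier(payee: str) -> Optional[Tuple[str, float]]:
    if "starbucks" in payee.lower():
        return ("Food & Dining:Coffee", 1.0)
    return None


def cloud_tier(payee: str) -> Optional[Tuple[str, float]]:
    return ("Shopping", 0.80)


TIERS = [("rules", rules_tier), ("cloud", cloud_tier)]
```

With these stand-ins, `categorize("STARBUCKS #12345", TIERS)` is answered by the rules tier and the cloud tier is never called, which is the point of ordering the tiers from cheapest to most expensive.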

[Screenshot: AI tier flow diagram in Settings]

Tier 1: Rules

Your custom rules always run first and take priority over everything else.

  • What it does: Matches transactions based on payee names you define. For example, "If payee contains 'Starbucks', set category to Food & Dining:Coffee."
  • Speed: Instant
  • Accuracy: 100% (you control the rules)
  • Privacy: Fully local, no data leaves your computer
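
As a rough sketch of the "payee contains" rule described above, the following Python models a rule as a substring plus a category. The `Rule` class, the `apply_rules` function, and the case-insensitive comparison are assumptions for illustration; the guide does not specify how OtterLedger stores or evaluates rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    contains: str   # text to look for in the payee
    category: str   # category to assign when the text is found


def apply_rules(payee: str, rules: List[Rule]) -> Optional[str]:
    """Return the category of the first rule whose text appears in the payee."""
    haystack = payee.casefold()  # assumption: matching ignores case
    for rule in rules:
        if rule.contains.casefold() in haystack:
            return rule.category
    return None  # no rule matched: the next tier gets a turn


RULES = [Rule("Starbucks", "Food & Dining:Coffee")]
```

Because the first matching rule wins in this sketch, rule order matters whenever two rules could match the same payee.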

Tip: Create rules for your top 20 most frequent payees. This alone can categorize 60-70% of your transactions instantly. See Guide 24: Creating Rules for details.

Tier 2: Payee Learning

OtterLedger remembers how you have categorized transactions in the past and applies those patterns to new transactions.

  • What it does: When a new transaction arrives from "STARBUCKS #12345 SEATTLE WA", the system recognizes "Starbucks" from your history and applies the same category.
  • Speed: Very fast (in-memory lookups)
  • Accuracy: High for recurring payees
  • Privacy: Fully local

This tier also includes NSI Merchant Lookup, which uses an open-source database of known merchants to identify brands by name and map them to tax-relevant categories.
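
One way to picture payee learning is a lookup keyed on a normalized payee name. The sketch below is an assumption-heavy illustration: the `payee_key` normalization (keep leading words, stop at the first token containing a digit or `#`) and the `PayeeHistory` class are hypothetical and much simpler than whatever OtterLedger actually does.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Optional


def payee_key(description: str) -> str:
    """Keep the leading words, stopping at the first token with a digit or '#'."""
    words = []
    for token in description.split():
        if re.search(r"[\d#]", token):
            break
        words.append(token.lower())
    return " ".join(words)


class PayeeHistory:
    """Counts which category each normalized payee was given in the past."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, description: str, category: str) -> None:
        self._counts[payee_key(description)][category] += 1

    def suggest(self, description: str) -> Optional[str]:
        counts = self._counts.get(payee_key(description))
        if not counts:
            return None
        return counts.most_common(1)[0][0]  # most frequently used category
```

After `learn("STARBUCKS #99999 PORTLAND OR", "Food & Dining:Coffee")`, a later "STARBUCKS #12345 SEATTLE WA" maps to the same key and receives the same category.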

Tier 3: XGBoost ML

A local machine learning model trained on your own transaction data.

  • What it does: Uses gradient-boosted decision trees to classify transactions based on payee name, amount, date patterns, and account type.
  • Speed: Fast (milliseconds per transaction, supports batch processing)
  • Accuracy: Medium to high, improves as you train it
  • Privacy: Fully local -- runs as a Python microservice on your machine
  • Requires: The XGBoost service running at http://localhost:8101

The XGBoost provider also maintains a merchant dictionary that grows through incremental learning. Every time you correct or confirm a categorization, the dictionary is updated.

Tier 4: Local LLM

A small language model running entirely on your computer.

  • What it does: Uses a local AI model (such as Llama 3B via Ollama, or a locally-running Gemini model) to understand transaction descriptions and suggest categories.
  • Speed: Moderate (a few seconds per transaction)
  • Accuracy: Good for well-known merchants; can struggle with cryptic payee codes
  • Privacy: Fully local -- no internet required

OtterLedger detects your hardware (GPU/CPU) during setup and recommends the best model for your system. See the Hardware Detection section below.

Tip: The local LLM has a confidence cap of 90% to prevent overconfident suggestions. Responses are validated against your actual category list using fuzzy matching.
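
The validation step in the tip above might look roughly like this. The sketch uses Python's standard `difflib` for the fuzzy match, which is a stand-in: the guide does not say which fuzzy-matching method OtterLedger uses, and `validate_llm_answer` and its 0.6 similarity cutoff are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple

CONFIDENCE_CAP = 0.90  # the cap stated in this guide


def validate_llm_answer(answer: str, confidence: float, categories: List[str],
                        cutoff: float = 0.6) -> Optional[Tuple[str, float]]:
    """Map a free-text model answer onto a real category and cap its confidence."""
    lookup = {c.lower(): c for c in categories}
    matches = difflib.get_close_matches(
        answer.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # the answer is not close to any known category
    return lookup[matches[0]], min(confidence, CONFIDENCE_CAP)
```

The fuzzy match guards against a model inventing or misspelling a category, and the cap keeps a local model's result from ever outranking a rule or dictionary match.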

Tier 5: Cloud AI

Advanced cloud-based AI services for the highest accuracy.

  • What it does: Sends transaction descriptions to a cloud AI provider (OpenAI GPT, Google Gemini API, or Anthropic Claude) for categorization.
  • Speed: 1-3 seconds per transaction (network dependent)
  • Accuracy: Highest -- these models have broad knowledge of merchants and businesses
  • Privacy: Transaction descriptions are sent to the cloud provider (see Privacy section)

Supported providers:

Provider         | Model          | Best For
Google Gemini    | Gemini Pro     | General accuracy, good value
OpenAI           | GPT-4 / GPT-4o | Complex or unusual merchants
Anthropic Claude | Claude 3       | Nuanced categorization

The Categorization Assistant

The Categorization Assistant is an interactive dialog that walks you through categorizing uncategorized transactions. It processes transactions in 7 phases, showing you progress, timing, and results along the way.

Opening the Assistant

  1. Go to Banking Center
  2. Select an account (or choose "All Accounts")
  3. Click the Categorization Assistant button

[Screenshot: Categorization Assistant button in Banking Center]

Phase 1: Initialization

The assistant scans your pending transactions and shows you:

  • How many transactions need categorization
  • Which accounts are included
  • Options to configure before starting

Options available:

  • Include All Accounts -- Process transactions across all accounts, not just the selected one
  • Match Amazon Orders -- Automatically match Amazon transactions to imported order data
  • Pause on Similar Matches -- Stop and ask you when the system finds a close but uncertain match
  • Use Research and Cloud Fallback -- Allow web search + cloud AI for transactions that local tiers cannot handle

Click Start to begin.

[Screenshot: Categorization Assistant intro screen with transaction count and options]

Phase 2: Payee Analysis (Auto-Processing)

The assistant runs Tiers 1 and 2 automatically and flags likely transfers for confirmation in Phase 4:

  1. Rules matching -- Your custom rules are applied first
  2. Historical matching -- Known payees from your categorization history are matched
  3. Transfer detection -- Transactions that look like transfers between accounts are flagged

You will see a progress bar with the current payee being processed and running counts of matches found.

Counter         | Meaning
Rule Matched    | Transactions categorized by your rules
History Matched | Transactions categorized from past patterns
Needs Decision  | Transactions that require your input

[Screenshot: Auto-processing phase with progress bar and match counts]

Phase 3: Web Research Enrichment

For transactions with cryptic payee names (like "TST* BURGERPLACE 847" or "SQ *JOES COFFEE"), the assistant can use web search to identify what the business actually is before sending the transaction to AI.

This phase runs automatically if web search is enabled (see Web Search Enrichment section below).

Phase 4: Transfer Detection

The assistant identifies transactions that appear to be transfers between your accounts, such as:

  • "ONLINE TRANSFER TO SAVINGS ...3435"
  • "ZELLE PAYMENT TO JOHN DOE"
  • "ACH TRANSFER FROM CHECKING"

Detected transfers are matched to your existing accounts when possible. You will be asked to confirm or reject each detected transfer.
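
A simple way to approximate this phase is keyword matching plus a lookup on the trailing account digits. The patterns, the `detect_transfer` function, and the `accounts_by_last4` mapping below are assumptions chosen to fit the three example descriptions; the real detection logic is not documented in this guide.

```python
import re
from typing import Dict, Optional

# Keywords drawn from the example descriptions above (illustrative, not exhaustive).
TRANSFER_PATTERN = re.compile(r"\b(TRANSFER|ZELLE)\b", re.IGNORECASE)
# Matches a masked account suffix such as "...3435".
LAST_DIGITS = re.compile(r"\.{3}(\d{4})\b")


def detect_transfer(description: str,
                    accounts_by_last4: Dict[str, str]) -> Optional[dict]:
    """Flag likely transfers and try to match them to a known account."""
    if not TRANSFER_PATTERN.search(description):
        return None
    match = LAST_DIGITS.search(description)
    account = accounts_by_last4.get(match.group(1)) if match else None
    return {"is_transfer": True, "matched_account": account}
```

A result with `matched_account` set to `None` corresponds to a detected transfer that could not be tied to one of your accounts, which is why the assistant still asks you to confirm or reject each one.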

[Screenshot: Transfer detection results showing matched accounts]

Phase 5: AI Categorization

Transactions that were not matched in earlier phases are sent through the AI tiers (3, 4, and 5). During this phase you will see:

  • Active AI Providers -- Which tiers are being used (e.g., "XGBoost -> Gemini")
  • Elapsed Time -- How long the AI phase has been running (e.g., "1:23")
  • Estimated Time Remaining -- Projected time to complete (e.g., "~2:45")
  • Activity Log -- A real-time feed showing each transaction as it is categorized, with the provider that handled it and the confidence level

The AI processes transactions one by one (or in batches for XGBoost), trying each enabled tier until a confident result is found.

[Screenshot: AI processing phase with elapsed time, ETA, and activity log]

Phase 6: Confidence Review (Decisions)

For transactions where the AI was not fully confident, or where similar payee matches were found, you are asked to make a decision:

  • Accept the suggestion -- Apply the AI's recommended category
  • Choose a different category -- Select from your category tree
  • Use AI -- Send this specific transaction to AI for another attempt
  • Skip -- Leave uncategorized for now

For each transaction, you will see:

  • The payee name and amount
  • Similar payees from your history (if any) with usage counts and match percentages
  • The AI's suggestion (if available) with confidence level
  • A "Remember this choice" checkbox to save the pattern for future transactions

Confidence is color-coded:

Confidence | Color        | Meaning
90%+       | Green        | High confidence -- likely correct
70-89%     | Yellow/Amber | Medium confidence -- review recommended
Below 70%  | Red          | Low confidence -- manual review needed

[Screenshot: Decision phase showing similar payee options and AI suggestion]

Phase 7: Final Summary

After all transactions have been processed, you see a complete summary:

  • Total transactions processed
  • Rule matched -- Categorized by your rules
  • History matched -- Categorized from past patterns
  • User decisions -- Categorized by your choices
  • AI categorized -- Categorized by AI providers
  • Skipped -- Left uncategorized
  • Patterns saved -- New payee patterns learned for next time
  • Amazon matches -- Transactions matched to Amazon order data (if enabled)

[Screenshot: Final summary with categorization statistics]


Web Search Enrichment

Bank transaction descriptions are often cryptic. "POS DEBIT TST* BURGERPLACE 847 PORTLAND OR" does not tell the AI much. Web search enrichment fixes this by looking up the business online before sending the transaction to AI.

How It Works

  1. The system identifies payees that look cryptic (codes, numbers, abbreviations)
  2. It searches for the business name using Google Search or DuckDuckGo
  3. The search result provides the business type (restaurant, retail, etc.) and industry
  4. This enriched context is passed to the AI tier, dramatically improving accuracy

Search Providers

Provider      | Role     | Notes
Google Search | Primary  | Better coverage, but can be rate-limited with many transactions
DuckDuckGo    | Fallback | Privacy-focused, uses Instant Answer API

Both providers run in parallel when available. The system uses the preferred provider's result first, falling back to the other if needed.
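
The parallel-with-preference behavior described above can be sketched with Python's standard thread pool. Everything here is hypothetical: the `enrich` function and the provider callables are placeholders, and no real search API is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

SearchFn = Callable[[str], Optional[str]]


def enrich(payee: str, providers: Dict[str, SearchFn],
           preferred: str) -> Optional[str]:
    """Query all providers in parallel; use the preferred answer, else any other."""

    def safe_call(fn: SearchFn) -> Optional[str]:
        try:
            return fn(payee)
        except Exception:
            return None  # e.g. a rate-limit error is treated as "no result"

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(safe_call, fn)
                   for name, fn in providers.items()}
        results = {name: future.result() for name, future in futures.items()}

    if results.get(preferred):
        return results[preferred]
    for result in results.values():
        if result:
            return result  # fall back to whichever provider answered
    return None
```

Running both lookups at once means a rate-limited or slow primary provider does not add its full delay before the fallback result becomes available.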

Go to Settings > AI to configure:

  • Enable Web Search Enrichment -- Master toggle for web research
  • Prefer Google over DuckDuckGo -- Choose your primary search provider
  • Enable Google Search Fallback -- Allow Google as a fallback option

Note: Google Search uses web scraping and may hit rate limits (HTTP 429) when processing many transactions at once. DuckDuckGo is more reliable for large batches but may return fewer results for unusual payees.


XGBoost Machine Learning

The XGBoost provider is a local machine learning service that categorizes transactions using gradient-boosted decision trees. It runs as a lightweight Python microservice on your computer.

How XGBoost Works

  1. Merchant Dictionary -- A lookup table of known merchants and their categories. This is the primary categorization method and provides the highest confidence (90%+).
  2. ML Model -- A trained XGBoost classifier that predicts categories based on transaction features (payee text, amount, date, account type). Used when the merchant dictionary has no match.

The service exposes a REST API at http://localhost:8101 with endpoints for single categorization, batch categorization, health checks, and incremental learning.
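
If you want to check the service from a script rather than the Settings screen, a reachability probe could look like the sketch below. It assumes only the `/health` path mentioned in the Troubleshooting section; the response format is not documented here, so the code checks that the endpoint answers and does not parse the body. The function names are hypothetical.

```python
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8101"  # default service address from this guide


def http_get(url: str, timeout: float = 2.0) -> bytes:
    """Fetch a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def service_is_reachable(get: Callable[[str], bytes] = http_get) -> bool:
    """Return True if the /health endpoint answers at all."""
    try:
        get(f"{BASE_URL}/health")
        return True
    except OSError:
        # urllib's URLError, refused connections, and timeouts all derive from OSError
        return False
```

Passing the fetch function as a parameter is a design choice for the example: it lets you exercise the logic without the service running.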

Batch Processing

XGBoost supports efficient batch processing. During the AI Categorization phase, transactions can be sent in bulk rather than one at a time, significantly speeding up categorization for large imports.

Results are categorized by confidence:

Confidence Tier | Threshold | Typical Source
High            | 90%+      | Merchant Dictionary match
Medium          | 70-89%    | ML Model prediction
Low             | Below 70% | Uncertain -- passed to next tier

Incremental Learning

Every time you confirm or correct a categorization, the XGBoost service updates its merchant dictionary. This means:

  • The more you use OtterLedger, the more merchants the dictionary knows
  • Corrections are applied immediately -- the same payee will be categorized correctly next time
  • The dictionary persists across sessions
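
The interplay between the merchant dictionary and the ML model can be pictured as a dictionary lookup with a model fallback. This is a simplified sketch: the `MerchantDictionary` class, the `classify` function, and the fixed 0.95 confidence are illustrative assumptions, and persistence across sessions is left out.

```python
from typing import Callable, Dict, Optional, Tuple

Prediction = Tuple[str, float]  # (category, confidence between 0 and 1)


class MerchantDictionary:
    """In-memory merchant-to-category table updated by user feedback."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def learn(self, merchant: str, category: str) -> None:
        """Record a confirmation or correction; the latest choice wins."""
        self._entries[merchant.strip().lower()] = category

    def lookup(self, merchant: str) -> Optional[str]:
        return self._entries.get(merchant.strip().lower())


def classify(merchant: str, dictionary: MerchantDictionary,
             model: Callable[[str], Prediction]) -> Prediction:
    """Dictionary first; fall back to the model only when there is no entry."""
    category = dictionary.lookup(merchant)
    if category is not None:
        return category, 0.95  # illustrative value inside the guide's 90%+ band
    return model(merchant)
```

Because `learn` overwrites the previous entry, a single correction changes the next result for that merchant without retraining the model.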

Training the XGBoost Model

To train or retrain the ML model on your categorized transactions:

  1. Go to Settings > AI > XGBoost
  2. Ensure the XGBoost service is running (check the health indicator)
  3. Click Train Model
  4. The service trains on your existing categorized transactions
  5. Training progress is shown in real time

Tip: You need at least 50-100 categorized transactions for the ML model to be useful. The merchant dictionary, however, works with even a single correction.

Enabling XGBoost

  1. Install the XGBoost Python service (see setup documentation)
  2. Start the service: it runs on http://localhost:8101 by default
  3. In Settings > AI, toggle XGBoost Service to On
  4. The health indicator shows whether the service is reachable and the model/dictionary status

[Screenshot: XGBoost settings showing service URL, health status, and training button]


Friendly Name Extraction

Bank descriptions like "CHECKCARD 0215 TST* BURGERPLACE 847 PORTLAND OR 97209" are hard to read. OtterLedger's Friendly Name Extractor uses the local LLM to clean these up into readable names like "Burger Place."

This happens automatically during import and categorization. The extracted friendly name is:

  • Stored as the transaction's display payee name
  • Used by AI tiers for better categorization accuracy
  • Shown in your transaction list for easier reading

If a description already looks clean (short, no codes or reference numbers), the extractor skips it to save processing time.
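
The "already looks clean" check could be approximated with a heuristic like the one below. The 25-character limit and the set of marker characters are guesses made for illustration; the guide only says that short descriptions without codes or reference numbers are skipped.

```python
import re


def looks_clean(description: str, max_length: int = 25) -> bool:
    """Heuristic: short, and free of digits and the '*' and '#' markers."""
    text = description.strip()
    if not text or len(text) > max_length:
        return False
    return re.search(r"[\d*#]", text) is None
```

Under these assumptions "Burger Place" would skip extraction, while "SQ *JOES COFFEE" and the long CHECKCARD example above would still be sent to the local LLM.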


Hardware Detection

When you first set up AI features, OtterLedger detects your hardware capabilities to recommend the best configuration:

  • GPU Detection -- Identifies whether you have a compatible NVIDIA, AMD, or Intel GPU for accelerated AI inference
  • CPU Detection -- Assesses CPU cores and memory for running local models
  • Model Recommendation -- Suggests the optimal local LLM model size based on your hardware

You can re-run hardware detection from Settings > AI > Detect Hardware.

[Screenshot: Hardware detection results showing GPU/CPU capabilities and recommended model]


Configuring AI Settings

Accessing Settings

Go to Settings > AI to view and configure all AI features.

For detailed configuration instructions, see Guide 38: AI Configuration.

Quick Configuration

Tier                   | Toggle                    | Recommendation
Tier 1: Rules          | Always on                 | Create rules for your top payees
Tier 2: Payee Learning | Always on                 | Learns automatically from your choices
Tier 3: XGBoost ML     | On if service is running  | Best balance of speed and accuracy
Tier 4: Local LLM      | On if model is downloaded | Good for offline use
Tier 5: Cloud AI       | Optional                  | Enable for highest accuracy; requires API key

Confidence Thresholds

You can adjust how confident the AI must be before automatically applying a category:

Setting               | Default | Description
Auto-Accept Threshold | 90%     | Categories at or above this confidence are applied automatically
Minimum Confidence    | 70%     | Suggestions below this threshold are discarded
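
Read together, the two thresholds split suggestions into three outcomes. The sketch below assumes that suggestions between the two thresholds are the ones shown to you in the Confidence Review phase; the `Action` enum and `decide` function are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    AUTO_APPLY = "auto_apply"  # applied without asking
    REVIEW = "review"          # shown to the user for a decision
    DISCARD = "discard"        # suggestion dropped


def decide(confidence_pct: float, auto_accept: float = 90.0,
           minimum: float = 70.0) -> Action:
    """Map a confidence percentage to an action using the two thresholds."""
    if confidence_pct >= auto_accept:
        return Action.AUTO_APPLY
    if confidence_pct >= minimum:
        return Action.REVIEW
    return Action.DISCARD
```

Raising the auto-accept threshold sends more suggestions to review, while lowering the minimum lets weaker suggestions reach you instead of being dropped.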

[Screenshot: AI confidence threshold settings]


Training the AI

How the AI Learns

The AI improves every time you interact with it:

  1. Categorize a transaction -- The payee-to-category mapping is saved to the Payee Master database
  2. Accept an AI suggestion -- Reinforces the AI's choice for that payee
  3. Correct a wrong suggestion -- The correction overrides the previous mapping and teaches XGBoost's merchant dictionary
  4. Create a rule -- Rules always take priority and provide 100% reliable categorization

Best Practices for Training

  1. Be consistent -- Always assign the same category to the same payee. If "Starbucks" is sometimes "Food & Dining:Coffee" and sometimes "Food & Dining:Restaurants", the AI gets confused.
  2. Use "Remember this choice" -- When the Categorization Assistant asks for a decision, check the "Remember" box to save the pattern.
  3. Fix mistakes promptly -- If you notice a miscategorized transaction, correct it. The correction teaches the AI immediately.
  4. Clean up payee names -- The AI performs better with clean, recognizable names. Use the Friendly Name Extractor or manually edit messy bank descriptions.
  5. Train XGBoost periodically -- After you have categorized a significant batch of transactions, retrain the XGBoost model from Settings to capture new patterns.

Payment Processor Prefixes

The AI knows about common payment processor prefixes and decodes them automatically:

Prefix                | Meaning                 | Example
TST*                  | Toast POS (restaurants) | TST* BURGERPLACE -> "Burger Place"
SQ *                  | Square merchant         | SQ *JOES COFFEE -> "Joe's Coffee"
PP* / PAYPAL*         | PayPal transaction      | Extracts merchant name after prefix
GOOGLE*               | Google Play/Services    | GOOGLE*YOUTUBE -> "YouTube"
APPLE.COM/BILL        | Apple subscription      | Mapped to Subscriptions
AMZN MKTP             | Amazon Marketplace      | Mapped to Shopping
UBER EATS / UBER TRIP | Uber services           | Mapped to Food Delivery / Transportation
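
Stripping a processor prefix is mostly pattern matching. The sketch below covers four of the prefixes listed above with regular expressions; the patterns and the `strip_processor_prefix` function are illustrative assumptions. Turning the remaining text (for example "BURGERPLACE") into a readable name such as "Burger Place" is a separate step that this code does not attempt.

```python
import re
from typing import Optional, Tuple

# (pattern, processor) pairs based on the prefixes listed above.
PREFIXES = [
    (re.compile(r"^TST\*\s*", re.IGNORECASE), "Toast POS"),
    (re.compile(r"^SQ\s*\*\s*", re.IGNORECASE), "Square"),
    (re.compile(r"^(?:PAYPAL|PP)\s*\*\s*", re.IGNORECASE), "PayPal"),
    (re.compile(r"^GOOGLE\s*\*\s*", re.IGNORECASE), "Google"),
]


def strip_processor_prefix(payee: str) -> Tuple[Optional[str], str]:
    """Return (processor, remaining merchant text); processor is None if no prefix matches."""
    text = payee.strip()
    for pattern, processor in PREFIXES:
        match = pattern.match(text)
        if match:
            return processor, text[match.end():].strip()
    return None, text
```

Knowing the processor is useful on its own: a Toast prefix, for instance, already suggests a restaurant before the merchant name is even decoded.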

Privacy Considerations

Tiers 1-4: Fully Local

  • All data stays on your computer -- these tiers send nothing over the internet (web search enrichment, if enabled, is a separate feature; see Web Search Privacy)
  • Tiers 1-3 use only local databases, ML models, and lookup tables
  • Tier 4 (Local LLM) runs entirely on your hardware via Ollama or a local Gemini instance
  • No internet connection required for these tiers

Tier 5: Cloud AI

When Cloud AI is enabled, the following data is sent to the provider:

Sent:

  • Payee name / merchant name
  • Transaction amount
  • Transaction date
  • Memo or description text
  • Account type (e.g., "Checking", "Credit Card")

NOT sent:

  • Your account numbers
  • Your name or personal details
  • Your bank name
  • Your full financial history
  • Your full category tree (only a summarized list of categories is sent for matching)
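
One common way to enforce a sent / not-sent split like the one above is an allow-list applied just before the request is built. The field names and the `build_cloud_payload` function below are hypothetical; the guide lists what is sent but not how the request is structured.

```python
from typing import Any, Dict

# Fields this guide lists as sent to the cloud provider (key names are hypothetical).
ALLOWED_FIELDS = ("payee", "amount", "date", "memo", "account_type")


def build_cloud_payload(transaction: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allow-listed fields; everything else stays on the local machine."""
    return {key: transaction[key] for key in ALLOWED_FIELDS if key in transaction}
```

An allow-list fails safe: a field added to the transaction record later is not sent unless someone deliberately adds it to `ALLOWED_FIELDS`.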

Tip: If privacy is a top concern, you can get excellent results using only Tiers 1-4 (fully local). Enable Tier 5 selectively for transactions that local tiers cannot handle.

Web Search Privacy

When web search enrichment is enabled:

  • Google Search -- Payee names are sent to Google as search queries
  • DuckDuckGo -- Payee names are sent to DuckDuckGo's Instant Answer API (privacy-focused, no tracking)

Only the payee name is searched. No transaction amounts, dates, or personal information are included in search queries.


Tips and Best Practices

  1. Start with rules for your top payees. The 20-30 merchants you use most often account for the majority of your transactions. Create rules for them first, and the AI handles the rest.

  2. Let the Assistant run all phases. The 7-phase workflow is designed to maximize automated categorization before asking you to make decisions. Do not skip phases.

  3. Enable XGBoost if possible. The XGBoost ML tier provides the best balance of speed, accuracy, and privacy. It processes thousands of transactions in seconds.

  4. Review AI suggestions weekly. Spend a few minutes each week reviewing and correcting any miscategorized transactions. Each correction makes the AI smarter.

  5. Use "Include All Accounts" sparingly. Processing all accounts at once is convenient but can take longer. For routine maintenance, process one account at a time.

  6. Enable web search for cryptic payees. If your bank produces hard-to-read descriptions, web search enrichment significantly improves AI accuracy.

  7. Check the Activity Log. During the AI processing phase, the activity log shows you exactly which provider categorized each transaction and with what confidence. This helps you understand how the system is performing.

  8. Train XGBoost after large imports. When you import a large batch of historical transactions and categorize them, retrain the XGBoost model to capture all the new patterns.


Troubleshooting

Q: The AI keeps getting the same payee wrong.

A: Create an explicit rule for that payee (see Guide 24: Creating Rules). Rules always take priority over AI. Also check that you have not categorized the same payee inconsistently in the past.

Q: XGBoost is not available / shows as unhealthy.

A: Make sure the Python XGBoost service is running. Check that http://localhost:8101/health returns a response in your browser. If the service is running but the model is not loaded, try training it from Settings.

Q: The Local LLM is slow or not responding.

A: Check that Ollama is running and a model is downloaded. OtterLedger detects your hardware and recommends a model size. If your system lacks a GPU, consider using a smaller model or relying on XGBoost (Tier 3) and Cloud AI (Tier 5) instead.

Q: Cloud AI is not working.

A: Verify your API key is correct in Settings > AI. Use the Test Connection button to confirm connectivity. Check your internet connection and ensure your API key has not expired or exceeded its usage quota.

Q: Web search is getting rate-limited.

A: Google Search can return HTTP 429 errors when processing many transactions. Try switching to DuckDuckGo as the primary provider, or process smaller batches of transactions.

Q: Categorization confidence is always low.

A: This usually means the AI providers do not recognize the payee. Try enabling web search enrichment to give the AI more context. Also ensure you have enough categorized transactions (50+) for the learning tiers to work effectively.

Q: How do I reset AI learning for a specific payee?

A: Go to Settings > AI > Training to view and manage learned payee associations. You can delete individual patterns or reset all learning.

Q: Amazon transactions are not being matched.

A: Make sure you have imported your Amazon order data and that the "Match Amazon Orders" option is enabled in the Categorization Assistant. The assistant matches Amazon bank transactions to your imported order history and creates itemized splits.


Need help? Visit the OtterLedger community at github.com/openledger or check the FAQ.