Guide 23: AI Categorization
Let AI automatically categorize your transactions with a 5-tier intelligence system
Overview
OtterLedger uses a 5-tier AI system to automatically categorize your transactions. Starting with simple user-defined rules and progressing through machine learning, local AI, and cloud AI, the system saves you hours of manual work and gets smarter over time as it learns from your choices.
The Categorization Assistant walks you through the entire process in an interactive, step-by-step workflow -- from analyzing payees and detecting transfers to running AI categorization and reviewing results.
What you'll learn:
- How the 5-tier AI categorization pipeline works
- Using the Categorization Assistant's 7-phase workflow
- How web search enrichment identifies cryptic payees
- Training the XGBoost ML model on your own data
- Configuring AI providers and privacy settings
- Tips for getting the best results
Time required: 10-15 minutes to configure; categorization runs automatically after that
Prerequisites
- Transactions imported into OtterLedger (the more, the better the AI performs)
- Categories set up (OtterLedger includes a default category tree to start with)
- For Tier 3 (XGBoost): The XGBoost Python service running locally (optional)
- For Tier 4 (Local LLM): A compatible local model downloaded via AI Setup (optional)
- For Tier 5 (Cloud AI): An API key from OpenAI, Google Gemini, or Anthropic Claude (optional)
The 5-Tier AI System
OtterLedger tries each tier in order, from fastest to smartest. As soon as a tier produces a confident match, the system stops and uses that result. If no tier is confident enough, the transaction is left for you to review.
Tier 1: Rules (Fastest, most reliable)
| If no match...
Tier 2: Payee Learning (Pattern matching from your history)
| If no match...
Tier 3: XGBoost ML (Local machine learning model)
| If no match...
Tier 4: Local LLM (On-device AI model)
| If no match...
Tier 5: Cloud AI (Smartest, requires API key)
[Screenshot: AI tier flow diagram in Settings]
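The fall-through behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OtterLedger's actual code: the `Suggestion` type, the two demo tier functions, and the 0.70 cutoff are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Suggestion:
    category: str
    confidence: float  # 0.0 to 1.0
    tier: str

# A tier takes a payee string and returns a Suggestion, or None when it has no match.
Tier = Callable[[str], Optional[Suggestion]]

def categorize(payee: str, tiers: Sequence[Tier],
               min_confidence: float = 0.70) -> Optional[Suggestion]:
    """Try each tier in order and stop at the first confident result."""
    for tier in tiers:
        suggestion = tier(payee)
        if suggestion is not None and suggestion.confidence >= min_confidence:
            return suggestion
    return None  # no tier was confident enough: leave for manual review

# Hypothetical demo tiers (assumptions, for illustration only).
def rules_tier(payee: str) -> Optional[Suggestion]:
    if "starbucks" in payee.lower():
        return Suggestion("Food & Dining:Coffee", 1.0, "rules")
    return None

def weak_ml_tier(payee: str) -> Optional[Suggestion]:
    return Suggestion("Shopping", 0.40, "xgboost")

result = categorize("STARBUCKS #12345 SEATTLE WA", [rules_tier, weak_ml_tier])
unmatched = categorize("XYZ 9981", [rules_tier, weak_ml_tier])
```

Here `result` comes from the rules tier, while `unmatched` is `None` because the only suggestion offered falls below the cutoff.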
Tier 1: Rules
Your custom rules always run first and take priority over everything else.
- What it does: Matches transactions based on payee names you define. For example, "If payee contains 'Starbucks', set category to Food & Dining:Coffee."
- Speed: Instant
- Accuracy: 100% (you control the rules)
- Privacy: Fully local, no data leaves your computer
Tip: Create rules for your top 20 most frequent payees. This alone can categorize 60-70% of your transactions instantly. See Guide 24: Creating Rules for details.
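As a rough sketch, a "payee contains" rule amounts to a case-insensitive substring test. The rule table and the `apply_rules` helper below are hypothetical, and the "Auto:Fuel" category is made up for the example; the real rule engine may support more match types.

```python
from typing import Optional

# Hypothetical rule table: (text to look for, category to assign).
RULES = [
    ("starbucks", "Food & Dining:Coffee"),
    ("shell oil", "Auto:Fuel"),
]

def apply_rules(payee: str, rules=RULES) -> Optional[str]:
    """Return the category of the first rule whose text appears in the payee."""
    haystack = payee.casefold()
    for needle, category in rules:
        if needle in haystack:
            return category
    return None  # no rule matched; later tiers get a chance

print(apply_rules("STARBUCKS #12345 SEATTLE WA"))
```

Because the first matching rule wins, the order of the table matters when two rules could match the same payee.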
Tier 2: Payee Learning
OtterLedger remembers how you have categorized transactions in the past and applies those patterns to new transactions.
- What it does: When a new transaction arrives from "STARBUCKS #12345 SEATTLE WA", the system recognizes "Starbucks" from your history and applies the same category.
- Speed: Very fast (in-memory lookups)
- Accuracy: High for recurring payees
- Privacy: Fully local
This tier also includes NSI Merchant Lookup, which uses an open-source database of known merchants to identify brands by name and map them to tax-relevant categories.
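One way to picture payee learning is a normalized-name lookup backed by counts of your past choices. The sketch below is an assumption about how such a lookup could work, not a description of OtterLedger's internals; `normalize` and `PayeeHistory` are hypothetical names.

```python
import re
from collections import Counter, defaultdict
from typing import Optional

def normalize(payee: str) -> str:
    """Reduce a raw bank payee to a lookup key: the leading run of plain words."""
    words = []
    for token in payee.upper().split():
        if re.fullmatch(r"[A-Z&'.-]+", token):
            words.append(token)
        else:
            break  # stop at the first token with digits or symbols such as '#'
    return " ".join(words) or payee.upper().strip()

class PayeeHistory:
    """Remembers which category was chosen for each normalized payee."""

    def __init__(self) -> None:
        self._counts = defaultdict(Counter)

    def learn(self, payee: str, category: str) -> None:
        self._counts[normalize(payee)][category] += 1

    def suggest(self, payee: str) -> Optional[str]:
        counts = self._counts.get(normalize(payee))
        if not counts:
            return None
        return counts.most_common(1)[0][0]  # most frequently chosen category

history = PayeeHistory()
history.learn("Starbucks", "Food & Dining:Coffee")
print(history.suggest("STARBUCKS #12345 SEATTLE WA"))
```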
Tier 3: XGBoost ML
A local machine learning model trained on your own transaction data.
- What it does: Uses gradient-boosted decision trees to classify transactions based on payee name, amount, date patterns, and account type.
- Speed: Fast (milliseconds per transaction, supports batch processing)
- Accuracy: Medium to high, improves as you train it
- Privacy: Fully local -- runs as a Python microservice on your machine
- Requires: The XGBoost service running at http://localhost:8101
The XGBoost provider also maintains a merchant dictionary that grows through incremental learning. Every time you correct or confirm a categorization, the dictionary is updated.
Tier 4: Local LLM
A small language model running entirely on your computer.
- What it does: Uses a local AI model (such as Llama 3B via Ollama, or a locally running Gemini model) to understand transaction descriptions and suggest categories.
- Speed: Moderate (a few seconds per transaction)
- Accuracy: Good for well-known merchants; can struggle with cryptic payee codes
- Privacy: Fully local -- no internet required
OtterLedger detects your hardware (GPU/CPU) during setup and recommends the best model for your system. See the Hardware Detection section below.
Tip: The local LLM has a confidence cap of 90% to prevent overconfident suggestions. Responses are validated against your actual category list using fuzzy matching.
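The validation step in the tip above can be approximated with Python's standard `difflib` module. This is a sketch under assumptions: the function name and the 0.6 similarity cutoff are hypothetical, and only the 90% cap comes from the text.

```python
import difflib
from typing import Optional, Sequence, Tuple

CONFIDENCE_CAP = 0.90  # the cap stated in the tip above

def validate_llm_answer(raw_category: str, raw_confidence: float,
                        known_categories: Sequence[str],
                        cutoff: float = 0.6) -> Optional[Tuple[str, float]]:
    """Map the model's free-text answer onto a real category and cap its confidence."""
    lookup = {c.casefold(): c for c in known_categories}
    matches = difflib.get_close_matches(
        raw_category.casefold(), list(lookup), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # the model named a category that does not exist
    return lookup[matches[0]], min(raw_confidence, CONFIDENCE_CAP)

categories = ["Food & Dining:Coffee", "Shopping", "Transportation"]
print(validate_llm_answer("food & dining: coffee", 0.99, categories))
```

The fuzzy match tolerates small spelling or spacing differences in the model's reply, while an answer that resembles no existing category is rejected rather than creating a new one.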
Tier 5: Cloud AI
Advanced cloud-based AI services for the highest accuracy.
- What it does: Sends transaction descriptions to a cloud AI provider (OpenAI GPT, Google Gemini API, or Anthropic Claude) for categorization.
- Speed: 1-3 seconds per transaction (network dependent)
- Accuracy: Highest -- these models have broad knowledge of merchants and businesses
- Privacy: Transaction descriptions are sent to the cloud provider (see Privacy section)
Supported providers:
| Provider | Model | Best For |
|---|---|---|
| Google Gemini | Gemini Pro | General accuracy, good value |
| OpenAI | GPT-4 / GPT-4o | Complex or unusual merchants |
| Anthropic Claude | Claude 3 | Nuanced categorization |
The Categorization Assistant
The Categorization Assistant is an interactive dialog that walks you through categorizing uncategorized transactions. It processes transactions in 7 phases, showing you progress, timing, and results along the way.
Opening the Assistant
- Go to Banking Center
- Select an account (or choose "All Accounts")
- Click the Categorization Assistant button
[Screenshot: Categorization Assistant button in Banking Center]
Phase 1: Initialization
The assistant scans your pending transactions and shows you:
- How many transactions need categorization
- Which accounts are included
- Options to configure before starting
Options available:
- Include All Accounts -- Process transactions across all accounts, not just the selected one
- Match Amazon Orders -- Automatically match Amazon transactions to imported order data
- Pause on Similar Matches -- Stop and ask you when the system finds a close but uncertain match
- Use Research and Cloud Fallback -- Allow web search + cloud AI for transactions that local tiers cannot handle
Click Start to begin.
[Screenshot: Categorization Assistant intro screen with transaction count and options]
Phase 2: Payee Analysis (Auto-Processing)
The assistant runs Tiers 1 and 2 automatically:
- Rules matching -- Your custom rules are applied first
- Historical matching -- Known payees from your categorization history are matched
- Transfer detection -- Transactions that look like transfers between accounts are flagged
You will see a progress bar with the current payee being processed and running counts of matches found.
| Counter | Meaning |
|---|---|
| Rule Matched | Transactions categorized by your rules |
| History Matched | Transactions categorized from past patterns |
| Needs Decision | Transactions that require your input |
[Screenshot: Auto-processing phase with progress bar and match counts]
Phase 3: Web Research Enrichment
For transactions with cryptic payee names (like "TST* BURGERPLACE 847" or "SQ *JOES COFFEE"), the assistant can use web search to identify what the business actually is before sending the transaction to AI.
This phase runs automatically if web search is enabled (see Web Search Enrichment section below).
Phase 4: Transfer Detection
The assistant identifies transactions that appear to be transfers between your accounts, such as:
- "ONLINE TRANSFER TO SAVINGS ...3435"
- "ZELLE PAYMENT TO JOHN DOE"
- "ACH TRANSFER FROM CHECKING"
Detected transfers are matched to your existing accounts when possible. You will be asked to confirm or reject each detected transfer.
[Screenshot: Transfer detection results showing matched accounts]
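A simple pattern-based detector for descriptions like the three examples above might look as follows. The patterns are hypothetical and cover only those examples; the actual detection logic is not documented here and is likely broader.

```python
import re
from typing import Optional

# Hypothetical patterns modelled on the example descriptions above.
TRANSFER_PATTERNS = [
    re.compile(r"\b(?:ONLINE|ACH)\s+TRANSFER\s+(?P<direction>TO|FROM)\s+(?P<target>.+)", re.I),
    re.compile(r"\bZELLE\s+PAYMENT\s+(?P<direction>TO|FROM)\s+(?P<target>.+)", re.I),
]

def detect_transfer(description: str) -> Optional[dict]:
    """Return transfer details if the description looks like a transfer, else None."""
    for pattern in TRANSFER_PATTERNS:
        match = pattern.search(description)
        if match:
            target = match.group("target").strip()
            # Trailing account digits such as "...3435" help match an existing account.
            digits = re.search(r"(\d{4})\s*$", target)
            return {
                "direction": match.group("direction").upper(),
                "target": target,
                "last4": digits.group(1) if digits else None,
            }
    return None

print(detect_transfer("ONLINE TRANSFER TO SAVINGS ...3435"))
```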
Phase 5: AI Categorization
Transactions that were not matched in earlier phases are sent through the AI tiers (3, 4, and 5). During this phase you will see:
- Active AI Providers -- Which tiers are being used (e.g., "XGBoost -> Gemini")
- Elapsed Time -- How long the AI phase has been running (e.g., "1:23")
- Estimated Time Remaining -- Projected time to complete (e.g., "~2:45")
- Activity Log -- A real-time feed showing each transaction as it is categorized, with the provider that handled it and the confidence level
The AI processes transactions one by one (or in batches for XGBoost), trying each enabled tier until a confident result is found.
[Screenshot: AI processing phase with elapsed time, ETA, and activity log]
Phase 6: Confidence Review (Decisions)
For transactions where the AI was not fully confident, or where similar payee matches were found, you are asked to make a decision:
- Accept the suggestion -- Apply the AI's recommended category
- Choose a different category -- Select from your category tree
- Use AI -- Send this specific transaction to AI for another attempt
- Skip -- Leave uncategorized for now
For each transaction, you will see:
- The payee name and amount
- Similar payees from your history (if any) with usage counts and match percentages
- The AI's suggestion (if available) with confidence level
- A "Remember this choice" checkbox to save the pattern for future transactions
Confidence is color-coded:
| Confidence | Color | Meaning |
|---|---|---|
| 90%+ | Green | High confidence -- likely correct |
| 70-89% | Yellow/Amber | Medium confidence -- review recommended |
| Below 70% | Red | Low confidence -- manual review needed |
[Screenshot: Decision phase showing similar payee options and AI suggestion]
Phase 7: Final Summary
After all transactions have been processed, you see a complete summary:
- Total transactions processed
- Rule matched -- Categorized by your rules
- History matched -- Categorized from past patterns
- User decisions -- Categorized by your choices
- AI categorized -- Categorized by AI providers
- Skipped -- Left uncategorized
- Patterns saved -- New payee patterns learned for next time
- Amazon matches -- Transactions matched to Amazon order data (if enabled)
[Screenshot: Final summary with categorization statistics]
Web Search Enrichment
Bank transaction descriptions are often cryptic. "POS DEBIT TST* BURGERPLACE 847 PORTLAND OR" does not tell the AI much. Web search enrichment fixes this by looking up the business online before sending the transaction to AI.
How It Works
- The system identifies payees that look cryptic (codes, numbers, abbreviations)
- It searches for the business name using Google Search or DuckDuckGo
- The search result provides the business type (restaurant, retail, etc.) and industry
- This enriched context is passed to the AI tier, which then works from a business type and industry rather than a raw bank code
Search Providers
| Provider | Role | Notes |
|---|---|---|
| Google Search | Primary | Better coverage, but can be rate-limited with many transactions |
| DuckDuckGo | Fallback | Privacy-focused, uses Instant Answer API |
Both providers run in parallel when available. The system uses the preferred provider's result first, falling back to the other if needed.
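The parallel lookup with a preferred provider can be sketched with a thread pool. The `enrich` function and the two stub providers are hypothetical; real provider calls would make network requests, which the stubs deliberately avoid.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A search function takes a payee and returns a business type, or None.
SearchFn = Callable[[str], Optional[str]]

def enrich(payee: str, preferred: SearchFn, fallback: SearchFn) -> Optional[str]:
    """Query both providers concurrently; use the preferred answer when there is one."""
    def safe(search: SearchFn) -> Optional[str]:
        try:
            return search(payee)
        except Exception:
            return None  # treat failures such as rate limiting as "no result"

    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(safe, preferred)
        second = pool.submit(safe, fallback)
        return first.result() or second.result()

# Stub providers standing in for real searches (assumptions for the demo).
def rate_limited_stub(payee: str) -> Optional[str]:
    raise RuntimeError("HTTP 429")

def answer_stub(payee: str) -> Optional[str]:
    return "restaurant"

print(enrich("TST* BURGERPLACE 847", rate_limited_stub, answer_stub))
```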
Configuring Web Search
Go to Settings > AI to configure:
- Enable Web Search Enrichment -- Master toggle for web research
- Prefer Google over DuckDuckGo -- Choose your primary search provider
- Enable Google Search Fallback -- Allow Google as a fallback option
Note: Google Search uses web scraping and may hit rate limits (HTTP 429) when processing many transactions at once. DuckDuckGo is more reliable for large batches but may return fewer results for unusual payees.
XGBoost Machine Learning
The XGBoost provider is a local machine learning service that categorizes transactions using gradient-boosted decision trees. It runs as a lightweight Python microservice on your computer.
How XGBoost Works
- Merchant Dictionary -- A lookup table of known merchants and their categories. This is the primary categorization method and provides the highest confidence (90%+).
- ML Model -- A trained XGBoost classifier that predicts categories based on transaction features (payee text, amount, date, account type). Used when the merchant dictionary has no match.
The service exposes a REST API at http://localhost:8101 with endpoints for single categorization, batch categorization, health checks, and incremental learning.
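If you want to check the service from a script, a minimal probe using only the Python standard library could look like this. It assumes the health endpoint is `/health` (the path given in the Troubleshooting section) and checks only the HTTP status, since the response body format is not documented here. The `opener` parameter is a hypothetical hook that makes the function testable without a running service.

```python
import urllib.error
import urllib.request

def service_is_healthy(base_url: str = "http://localhost:8101",
                       opener=urllib.request.urlopen,
                       timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # service not running, unreachable, or returned an error

if __name__ == "__main__":
    print("XGBoost service reachable:", service_is_healthy())
```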
Batch Processing
XGBoost supports efficient batch processing. During the AI Categorization phase, transactions can be sent in bulk rather than one at a time, significantly speeding up categorization for large imports.
Results are categorized by confidence:
| Confidence Tier | Threshold | Typical Source |
|---|---|---|
| High | 90%+ | Merchant Dictionary match |
| Medium | 70-89% | ML Model prediction |
| Low | Below 70% | Uncertain -- passed to next tier |
Incremental Learning
Every time you confirm or correct a categorization, the XGBoost service updates its merchant dictionary. This means:
- The more you use OtterLedger, the more merchants the dictionary knows
- Corrections are applied immediately -- the same payee will be categorized correctly next time
- The dictionary persists across sessions
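The behaviour in the list above (immediate corrections that survive a restart) can be illustrated with a small dictionary saved to disk on every update. This is a conceptual sketch only; the class name, key normalization, and JSON file format are assumptions, not the service's actual storage.

```python
import json
from pathlib import Path
from typing import Optional

class MerchantDictionary:
    """A payee-to-category lookup table that is written to disk after each change."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(payee: str) -> str:
        return " ".join(payee.upper().split())  # ignore case and extra spaces

    def lookup(self, payee: str) -> Optional[str]:
        return self.entries.get(self._key(payee))

    def learn(self, payee: str, category: str) -> None:
        """Record a confirmation or correction; the newest choice replaces the old one."""
        self.entries[self._key(payee)] = category
        self.path.write_text(json.dumps(self.entries, indent=2))
```

A correction simply overwrites the previous entry, which is why the same payee is categorized correctly the next time it appears.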
Training the XGBoost Model
To train or retrain the ML model on your categorized transactions:
- Go to Settings > AI > XGBoost
- Ensure the XGBoost service is running (check the health indicator)
- Click Train Model
- The service trains on your existing categorized transactions
- Training progress is shown in real time
Tip: The ML model needs roughly 50-100 categorized transactions before it becomes useful. The merchant dictionary, however, works with even a single correction.
Enabling XGBoost
- Install the XGBoost Python service (see setup documentation)
- Start the service; it runs on http://localhost:8101 by default
- In Settings > AI, toggle XGBoost Service to On
- The health indicator shows whether the service is reachable and the model/dictionary status
[Screenshot: XGBoost settings showing service URL, health status, and training button]
Friendly Name Extraction
Bank descriptions like "CHECKCARD 0215 TST* BURGERPLACE 847 PORTLAND OR 97209" are hard to read. OtterLedger's Friendly Name Extractor uses the local LLM to clean these up into readable names like "Burger Place."
This happens automatically during import and categorization. The extracted friendly name is:
- Stored as the transaction's display payee name
- Used by AI tiers for better categorization accuracy
- Shown in your transaction list for easier reading
If a description already looks clean (short, no codes or reference numbers), the extractor skips it to save processing time.
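A "looks clean" check of this kind could be as simple as the heuristic below. The exact criteria OtterLedger uses are not documented here; the 25-character limit and the set of characters treated as code markers are assumptions.

```python
import re

def looks_clean(description: str, max_length: int = 25) -> bool:
    """Heuristic: short, with no digits and no '*' or '#' markers."""
    text = description.strip()
    if not text or len(text) > max_length:
        return False
    if re.search(r"[\d*#]", text):
        return False  # digits or processor markers suggest codes or reference numbers
    return True

for sample in ("Burger Place",
               "CHECKCARD 0215 TST* BURGERPLACE 847 PORTLAND OR 97209"):
    print(sample, "->", "skip" if looks_clean(sample) else "extract")
```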
Hardware Detection
When you first set up AI features, OtterLedger detects your hardware capabilities to recommend the best configuration:
- GPU Detection -- Identifies whether you have a compatible NVIDIA, AMD, or Intel GPU for accelerated AI inference
- CPU Detection -- Assesses CPU cores and memory for running local models
- Model Recommendation -- Suggests the optimal local LLM model size based on your hardware
You can re-run hardware detection from Settings > AI > Detect Hardware.
[Screenshot: Hardware detection results showing GPU/CPU capabilities and recommended model]
Configuring AI Settings
Accessing Settings
Go to Settings > AI to view and configure all AI features.
For detailed configuration instructions, see Guide 38: AI Configuration.
Quick Configuration
| Tier | Toggle | Recommendation |
|---|---|---|
| Tier 1: Rules | Always on | Create rules for your top payees |
| Tier 2: Payee Learning | Always on | Learns automatically from your choices |
| Tier 3: XGBoost ML | On if service is running | Best balance of speed and accuracy |
| Tier 4: Local LLM | On if model is downloaded | Good for offline use |
| Tier 5: Cloud AI | Optional | Enable for highest accuracy; requires API key |
Confidence Thresholds
You can adjust how confident the AI must be before automatically applying a category:
| Setting | Default | Description |
|---|---|---|
| Auto-Accept Threshold | 90% | Categories at or above this confidence are applied automatically |
| Minimum Confidence | 70% | Suggestions below this threshold are discarded |
[Screenshot: AI confidence threshold settings]
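Taken together, the two thresholds split suggestions into three outcomes. The sketch below uses the default values from the table; the `Thresholds` type and the outcome labels are hypothetical, and sending the middle band to review is an inference from the Confidence Review phase rather than a documented rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    auto_accept: float = 0.90  # default Auto-Accept Threshold
    minimum: float = 0.70      # default Minimum Confidence

def decide(confidence: float, thresholds: Thresholds = Thresholds()) -> str:
    """Classify a suggestion as 'apply', 'review', or 'discard'."""
    if confidence >= thresholds.auto_accept:
        return "apply"    # applied automatically
    if confidence >= thresholds.minimum:
        return "review"   # kept as a suggestion for you to confirm
    return "discard"      # too uncertain to show

for value in (0.95, 0.80, 0.55):
    print(f"{value:.0%} -> {decide(value)}")
```

Raising the auto-accept value means fewer categories are applied without review; lowering the minimum lets more uncertain suggestions through for you to judge.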
Training the AI
How the AI Learns
The AI improves every time you interact with it:
- Categorize a transaction -- The payee-to-category mapping is saved to the Payee Master database
- Accept an AI suggestion -- Reinforces the AI's choice for that payee
- Correct a wrong suggestion -- The correction overrides the previous mapping and teaches XGBoost's merchant dictionary
- Create a rule -- Rules always take priority and provide 100% reliable categorization
Best Practices for Training
- Be consistent -- Always assign the same category to the same payee. If "Starbucks" is sometimes "Food & Dining:Coffee" and sometimes "Food & Dining:Restaurants", the AI gets confused.
- Use "Remember this choice" -- When the Categorization Assistant asks for a decision, check the "Remember" box to save the pattern.
- Fix mistakes promptly -- If you notice a miscategorized transaction, correct it. The correction teaches the AI immediately.
- Clean up payee names -- The AI performs better with clean, recognizable names. Use the Friendly Name Extractor or manually edit messy bank descriptions.
- Train XGBoost periodically -- After you have categorized a significant batch of transactions, retrain the XGBoost model from Settings to capture new patterns.
Payment Processor Prefixes
The AI knows about common payment processor prefixes and decodes them automatically:
| Prefix | Meaning | Example |
|---|---|---|
| TST* | Toast POS (restaurants) | TST* BURGERPLACE -> "Burger Place" |
| SQ * | Square merchant | SQ *JOES COFFEE -> "Joe's Coffee" |
| PP* / PAYPAL* | PayPal transaction | Extracts merchant name after prefix |
| GOOGLE* | Google Play/Services | GOOGLE*YOUTUBE -> "YouTube" |
| APPLE.COM/BILL | Apple subscription | Mapped to Subscriptions |
| AMZN MKTP | Amazon Marketplace | Mapped to Shopping |
| UBER EATS / UBER TRIP | Uber services | Mapped to Food Delivery / Transportation |
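Stripping a processor prefix is mostly pattern matching. The sketch below handles the four asterisk-style prefixes from the table; the pattern list and function name are hypothetical. Turning the remainder (for example "BURGERPLACE") into a readable name such as "Burger Place" is a separate step that this sketch does not attempt.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the asterisk-style prefixes listed in the table.
PREFIXES = [
    (re.compile(r"^TST\*\s*", re.I), "Toast POS"),
    (re.compile(r"^SQ\s*\*\s*", re.I), "Square"),
    (re.compile(r"^(?:PP|PAYPAL)\s*\*\s*", re.I), "PayPal"),
    (re.compile(r"^GOOGLE\s*\*\s*", re.I), "Google"),
]

def strip_processor_prefix(payee: str) -> Tuple[Optional[str], str]:
    """Return (processor, remainder); processor is None when no prefix matches."""
    text = payee.strip()
    for pattern, processor in PREFIXES:
        match = pattern.match(text)
        if match:
            return processor, text[match.end():].strip()
    return None, text

print(strip_processor_prefix("SQ *JOES COFFEE"))
```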
Privacy Considerations
Tiers 1-4: Fully Local
- All data stays on your computer -- these tiers send nothing over the internet (web search enrichment, if enabled, is a separate step; see Web Search Privacy below)
- Tiers 1-3 use only local databases, ML models, and lookup tables
- Tier 4 (Local LLM) runs entirely on your hardware via Ollama or a local Gemini instance
- No internet connection required for these tiers
Tier 5: Cloud AI
When Cloud AI is enabled, the following data is sent to the provider:
Sent:
- Payee name / merchant name
- Transaction amount
- Transaction date
- Memo or description text
- Account type (e.g., "Checking", "Credit Card")
NOT sent:
- Your account numbers
- Your name or personal details
- Your bank name
- Your full financial history
- Your category tree (only a summarized list for matching)
Tip: If privacy is a top concern, you can get excellent results using only Tiers 1-4 (fully local). Enable Tier 5 selectively for transactions that local tiers cannot handle.
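The sent / not-sent split above amounts to an allow-list: only named fields are copied into the request, so anything else is dropped by default. The field names below are hypothetical and chosen to mirror the "Sent" list; they are not OtterLedger's actual schema.

```python
# Hypothetical field names mirroring the "Sent" list above.
SENT_FIELDS = ("payee", "amount", "date", "memo", "account_type")

def build_cloud_payload(transaction: dict) -> dict:
    """Copy only allow-listed fields; everything else stays on the machine."""
    return {field: transaction[field] for field in SENT_FIELDS if field in transaction}

transaction = {
    "payee": "Joe's Coffee",
    "amount": -4.50,
    "date": "2024-02-15",
    "memo": "SQ *JOES COFFEE",
    "account_type": "Checking",
    "account_number": "000123456",  # never copied
    "bank_name": "Example Bank",    # never copied
}
payload = build_cloud_payload(transaction)
print(sorted(payload))
```

An allow-list is the safer design here because a newly added transaction field is excluded until someone explicitly decides it should be sent.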
Web Search Privacy
When web search enrichment is enabled:
- Google Search -- Payee names are sent to Google as search queries
- DuckDuckGo -- Payee names are sent to DuckDuckGo's Instant Answer API (privacy-focused, no tracking)
Only the payee name is searched. No transaction amounts, dates, or personal information are included in search queries.
Tips and Best Practices
Start with rules for your top payees. The 20-30 merchants you use most often account for the majority of your transactions. Create rules for them first, and the AI handles the rest.
Let the Assistant run all phases. The 7-phase workflow is designed to maximize automated categorization before asking you to make decisions. Do not skip phases.
Enable XGBoost if possible. The XGBoost ML tier provides the best balance of speed, accuracy, and privacy. It processes thousands of transactions in seconds.
Review AI suggestions weekly. Spend a few minutes each week reviewing and correcting any miscategorized transactions. Each correction makes the AI smarter.
Use "Include All Accounts" sparingly. Processing all accounts at once is convenient but can take longer. For routine maintenance, process one account at a time.
Enable web search for cryptic payees. If your bank produces hard-to-read descriptions, web search enrichment significantly improves AI accuracy.
Check the Activity Log. During the AI processing phase, the activity log shows you exactly which provider categorized each transaction and with what confidence. This helps you understand how the system is performing.
Train XGBoost after large imports. When you import a large batch of historical transactions and categorize them, retrain the XGBoost model to capture all the new patterns.
Troubleshooting
Q: The AI keeps getting the same payee wrong.
A: Create an explicit rule for that payee (see Guide 24: Creating Rules). Rules always take priority over AI. Also check that you have not categorized the same payee inconsistently in the past.
Q: XGBoost is not available / shows as unhealthy.
A: Make sure the Python XGBoost service is running. Check that http://localhost:8101/health returns a response in your browser. If the service is running but the model is not loaded, try training it from Settings.
Q: The Local LLM is slow or not responding.
A: Check that Ollama is running and a model is downloaded. OtterLedger detects your hardware and recommends a model size. If your system lacks a GPU, consider using a smaller model or relying on XGBoost (Tier 3) and Cloud AI (Tier 5) instead.
Q: Cloud AI is not working.
A: Verify your API key is correct in Settings > AI. Use the Test Connection button to confirm connectivity. Check your internet connection and ensure your API key has not expired or exceeded its usage quota.
Q: Web search is getting rate-limited.
A: Google Search can return HTTP 429 errors when processing many transactions. Try switching to DuckDuckGo as the primary provider, or process smaller batches of transactions.
Q: Categorization confidence is always low.
A: This usually means the AI providers do not recognize the payee. Try enabling web search enrichment to give the AI more context. Also ensure you have enough categorized transactions (50+) for the learning tiers to work effectively.
Q: How do I reset AI learning for a specific payee?
A: Go to Settings > AI > Training to view and manage learned payee associations. You can delete individual patterns or reset all learning.
Q: Amazon transactions are not being matched.
A: Make sure you have imported your Amazon order data and that the "Match Amazon Orders" option is enabled in the Categorization Assistant. The assistant matches Amazon bank transactions to your imported order history and creates itemized splits.
What's Next?
- Guide 24: Creating Rules -- Build custom categorization rules for your most frequent payees
- Guide 25: AI Assistant -- Ask AI questions about your finances using natural language
- Guide 38: AI Configuration -- Detailed setup for all AI providers, API keys, and advanced settings
Need help? Visit the OtterLedger community at github.com/openledger or check the FAQ.