New to OtterLedger? Read the Documentation
45 Guides Available Quick Start Guide
Learn AI Categorization View Guide
Have Questions? Check the FAQ

Guide 45: Duplicate Detection

Prevent and resolve duplicate transactions


Overview

Duplicates can occur when the same transactions are imported from more than one source (bank sync, manual import, direct entry). OtterLedger detects duplicates automatically and helps you resolve them.


How Detection Works

Multi-Algorithm Similarity Pipeline

OtterLedger uses a three-algorithm pipeline when scanning for duplicate payees. Each algorithm runs in sequence, and a payee can only appear in one group — once matched, it is excluded from subsequent passes.
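The "one group per payee" rule can be sketched as a loop that removes matched payees before the next pass runs. This is a minimal illustration, not OtterLedger's implementation: the `run_pipeline` name and the shape of a pass (a function that takes the ungrouped names and returns disjoint groups) are assumptions made for the example.

```python
from typing import Callable, Iterable

# Hypothetical shape of a pass: it receives the still-ungrouped payee names
# and returns the groups it found (each group is a list of names).
# Assumption: the groups returned by a single pass do not overlap.
Pass = Callable[[list[str]], list[list[str]]]


def run_pipeline(payees: Iterable[str], passes: list[Pass]) -> list[list[str]]:
    """Run each pass in order, excluding matched payees from later passes."""
    remaining = list(payees)
    groups: list[list[str]] = []
    for find_groups in passes:
        # A "group" of one payee is not a duplicate, so drop singletons.
        found = [g for g in find_groups(remaining) if len(g) > 1]
        groups.extend(found)
        grouped = {name for g in found for name in g}
        # Once matched, a payee is never offered to a subsequent pass.
        remaining = [p for p in remaining if p not in grouped]
    return groups
```

Because each pass only sees what earlier passes left behind, the order of the passes matters: the strictest comparison runs first and claims its matches before the looser ones get a chance.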

Algorithm 1: Normalized Name Comparison

The first pass normalizes each payee name (stripping case differences, extra spaces, and punctuation) and groups exact normalized matches together. This catches cases like Amazon.com vs AMAZON COM vs amazon com.

  • Confidence: ~95%
  • Handles: case differences, spacing differences, punctuation differences
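A minimal sketch of this pass, assuming punctuation is replaced by a space before whitespace is collapsed (one choice that makes the three Amazon spellings above compare equal). The function names `normalize` and `group_by_normalized_name` are hypothetical, not OtterLedger identifiers.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    lowered = name.lower()
    # Assumption: punctuation (and underscores) become a space, not nothing.
    no_punct = re.sub(r"[^\w\s]|_", " ", lowered)
    return " ".join(no_punct.split())


def group_by_normalized_name(names: list[str]) -> list[list[str]]:
    """Group payees whose normalized names are identical."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name in names:
        buckets[normalize(name)].append(name)
    # Only buckets with two or more payees are duplicate groups.
    return [group for group in buckets.values() if len(group) > 1]
```

With this sketch, `normalize("Amazon.com")`, `normalize("AMAZON COM")`, and `normalize("amazon com")` all return `"amazon com"`, so the three payees land in one group.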

Algorithm 2: Levenshtein Distance (Edit Distance)

The second pass compares the normalized names of all remaining (ungrouped) payees pairwise using Levenshtein distance — a measure of how many single-character edits (insertions, deletions, substitutions) are needed to transform one name into the other.

Two payees are flagged as likely duplicates if:

  • The edit distance is 3 or fewer, and
  • The edit distance is less than 30% of the longer name's length

This catches typos and minor name variations such as Starbucks vs Starbuck or Walgreens vs Walgreen.

  • Confidence: 50–85% (scales with similarity; the closer the names, the higher the score)
  • Handles: typos, truncated names, minor spelling variations
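The two thresholds above can be sketched as follows. The Levenshtein routine is the standard dynamic-programming version; the `is_likely_duplicate` name and its parameters are assumptions for illustration, and the sketch does not attempt the confidence scaling, since the guide does not give its formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def is_likely_duplicate(a: str, b: str,
                        max_edits: int = 3, max_ratio: float = 0.30) -> bool:
    """Both conditions must hold: few edits, and few relative to name length."""
    distance = levenshtein(a, b)
    longer = max(len(a), len(b))
    return distance <= max_edits and distance < max_ratio * longer
```

For example, `starbucks` and `starbuck` differ by 1 edit, which is under both limits (3 edits, and 30% of 9 characters). Short names are protected by the second rule: `cat` and `dog` are 3 edits apart, which passes the first limit but is far more than 30% of a 3-character name.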

Algorithm 3: Common Prefix Matching

The third pass looks for payees whose normalized names share a significant common prefix. This catches cases where the same merchant appears with different trailing identifiers, store numbers, or location suffixes.

  • Confidence: lower than algorithms 1 and 2
  • Handles: location suffixes, store numbers, branch identifiers
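The guide does not say what makes a prefix "significant", so the sketch below picks two hypothetical thresholds purely for illustration: a minimum number of shared leading characters (`min_chars`) and a minimum share of the shorter name (`min_ratio`). Neither value is an OtterLedger setting.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the leading run of characters that a and b share."""
    count = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        count += 1
    return count


def shares_significant_prefix(a: str, b: str,
                              min_chars: int = 5,
                              min_ratio: float = 0.6) -> bool:
    """Assumed rule: the prefix is long in absolute and in relative terms."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return False
    prefix = common_prefix_len(a, b)
    return prefix >= min_chars and prefix / shorter >= min_ratio
```

Under these assumed thresholds, `walmart store 0012` and `walmart store 0455` match (15 shared characters out of 18), while `target` and `tarragon` do not (only 3 shared characters).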

Matching Criteria for Transaction Duplicates

During import, transaction-level duplicate detection checks:

  • Amount - Exact or within tolerance
  • Date - Same day or within range
  • Account - Same account
  • Payee - Similar payee names
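As a rough sketch, the four checks can be expressed as one function that reports which criteria a pair of transactions meets. The `Txn` type and `criteria_met` function are hypothetical; the default tolerances are the defaults listed in this guide's Settings table (3 days, 0%). Two points are assumptions: the percentage tolerance is measured against the larger of the two amounts, and the payee check is a simple case-insensitive comparison standing in for a real similarity test.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    amount: Decimal
    posted: date
    account: str
    payee: str


def criteria_met(a: Txn, b: Txn,
                 date_tolerance_days: int = 3,
                 amount_tolerance_pct: Decimal = Decimal("0")) -> dict[str, bool]:
    """Report, per criterion, whether two transactions match."""
    # Assumption: tolerance is a percentage of the larger absolute amount.
    allowed = max(abs(a.amount), abs(b.amount)) * amount_tolerance_pct / 100
    return {
        "amount": abs(a.amount - b.amount) <= allowed,
        "date": abs((a.posted - b.posted).days) <= date_tolerance_days,
        "account": a.account == b.account,
        # Placeholder for "similar payee names": case-insensitive equality.
        "payee": a.payee.casefold() == b.payee.casefold(),
    }
```

Counting the `True` values in the result gives the number of criteria that matched for the pair.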

Confidence Scores

Each detected duplicate group displays a confidence score from 0 to 100% indicating how likely the match is to be a true duplicate.

Score Range | Meaning                | Recommended Action
90–100%     | Near-certain duplicate | Review and merge or delete
70–89%      | Likely duplicate       | Review carefully before merging
50–69%      | Possible duplicate     | Inspect transaction history before deciding
Below 50%   | Low confidence         | Use "Keep Both" unless you recognize the duplication
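If you export scan results and want to triage them in a script, the score bands above translate into a simple lookup. The function name `recommended_action` is hypothetical; the bands and wording come from the table.

```python
def recommended_action(score: float) -> tuple[str, str]:
    """Map a 0-100 confidence score to (meaning, recommended action)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Near-certain duplicate", "Review and merge or delete")
    if score >= 70:
        return ("Likely duplicate", "Review carefully before merging")
    if score >= 50:
        return ("Possible duplicate", "Inspect transaction history before deciding")
    return ("Low confidence", 'Use "Keep Both" unless you recognize the duplication')
```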

Tip: Confidence scores are shown in the duplicate scan results next to each group. Sort by confidence (highest first) to tackle the most obvious duplicates first.

Match Confidence Summary (Import)

Level    | Criteria Met      | Action
Definite | All match exactly | Auto-skip
Likely   | 3+ criteria match | Flag for review
Possible | 2 criteria match  | Suggest review
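The import levels can be read as a small decision rule over the four matching criteria (amount, date, account, payee). The sketch below is one interpretation: `import_match_level` is a hypothetical name, and treating fewer than two matches as "not flagged" is an assumption, since the table does not list that case.

```python
from typing import Optional


def import_match_level(matched: int, all_exact: bool,
                       total: int = 4) -> Optional[tuple[str, str]]:
    """Return (level, action) for a candidate pair, or None if not flagged."""
    if not 0 <= matched <= total:
        raise ValueError("matched must be between 0 and total")
    if matched == total and all_exact:
        return ("Definite", "Auto-skip")
    if matched >= 3:
        return ("Likely", "Flag for review")
    if matched == 2:
        return ("Possible", "Suggest review")
    # Assumption: zero or one matching criterion is not treated as a duplicate.
    return None
```

Note that a pair matching all four criteria only within tolerance (not exactly) falls through to "Likely" in this reading, so it is flagged for review instead of being skipped automatically.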

During Import

Preview Screen

Before completing import, review flagged duplicates:

  • Keep Both - Import anyway
  • Skip - Don't import this transaction
  • Skip All Similar - Skip all with same pattern

Settings

Settings > Import > Duplicate Detection

  • Enable/disable detection
  • Set date range tolerance
  • Set amount tolerance

Finding Existing Duplicates

Duplicate Finder

Tools > Find Duplicates

  1. Select accounts to scan
  2. Choose date range
  3. Set sensitivity
  4. Click Scan

Review Results

Results are grouped by likely duplicate pairs or clusters. Each group shows the confidence score from the detection algorithm.

Date | Amount | Payee  | Account  | Confidence | Action
1/15 | $42.50 | Amazon | Checking | 95%        | [Keep] [Delete] [Merge]
1/15 | $42.50 | AMZN   | Checking | 95%        | [Keep] [Delete] [Merge]

Resolution Options

Delete Duplicate

Remove one transaction entirely.

Merge Transactions

Combine both into a single transaction:

  • Keeps earlier date
  • Preserves memo and attachments from both
  • Uses categorization from the one you choose
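The three merge rules above can be sketched as a small function. Everything here is illustrative: the `Txn` fields and the `merge` function are hypothetical, and joining the two memos with " | " is an assumption, since the guide only says that memos from both transactions are preserved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    posted: date
    memo: str
    category: str
    attachments: tuple[str, ...] = ()


def merge(a: Txn, b: Txn, keep_category_from: Txn) -> Txn:
    """Combine two duplicate transactions into one."""
    # Preserve both memos, skipping empty ones and exact repeats.
    memos = [m for m in (a.memo, b.memo) if m]
    merged_memo = " | ".join(dict.fromkeys(memos))
    # Preserve attachments from both, without listing the same file twice.
    merged_attachments = tuple(dict.fromkeys(a.attachments + b.attachments))
    return Txn(
        posted=min(a.posted, b.posted),        # keeps the earlier date
        memo=merged_memo,
        category=keep_category_from.category,  # the one you choose
        attachments=merged_attachments,
    )
```

Calling `merge(a, b, keep_category_from=a)` keeps the category from `a` while still taking the earlier of the two dates, whichever transaction it came from.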

For merging payee records (not just individual transactions), see Guide 47: Payee Management & Merge, which covers the full payee merge workflow including reassigning transaction history and consolidating AI learning rules.

Keep Both

Mark as "not duplicates" — won't be flagged again.


Prevention Tips

  1. Use one import source per account - Bank sync OR manual import
  2. Set clear date ranges - Don't overlap import periods
  3. Wait for sync - Don't manually enter transactions that will sync
  4. Review immediately - Catch duplicates before reconciliation

Common Scenarios

Pending vs Posted

  • Pending transaction imports
  • Same transaction posts later
  • Amount or date may differ slightly

Multiple Bank Accounts

  • Transfer shows in both accounts
  • Not a duplicate — it's the same transfer
  • Link as transfer instead

Manual Entry + Import

  • You enter a transaction manually
  • Bank sync imports same transaction
  • Use duplicate finder to merge

Settings

Setting            | Description        | Default
Date tolerance     | Days before/after  | 3
Amount tolerance   | % variance allowed | 0%
Auto-skip definite | Skip exact matches | On
Flag likely        | Show review prompt | On
