Guide 45: Duplicate Detection
Prevent and resolve duplicate transactions
Overview
Duplicates can occur when the same transactions reach OtterLedger from more than one source (bank sync, manual import, direct entry). OtterLedger detects duplicates automatically and helps you resolve them.
How Detection Works
Multi-Algorithm Similarity Pipeline
OtterLedger scans for duplicate payees with a three-algorithm pipeline. The algorithms run in sequence, and each payee can appear in only one group: once matched, it is excluded from subsequent passes.
Algorithm 1: Normalized Name Comparison
The first pass normalizes each payee name so that differences in case, spacing, and punctuation are ignored, then groups names whose normalized forms match exactly. This catches cases like Amazon.com vs AMAZON COM vs amazon com.
- Confidence: ~95%
- Handles: case differences, spacing differences, punctuation differences
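The normalization pass can be sketched in a few lines of Python. This is an illustration only, not OtterLedger's actual code: the function names `normalize_payee` and `group_exact_matches` are hypothetical, and treating punctuation as a word separator is an assumption chosen so that the three Amazon examples above normalize to the same string.

```python
import re
from collections import defaultdict


def normalize_payee(name: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse runs of whitespace.

    Assumption: punctuation acts as a separator, so "Amazon.com"
    becomes "amazon com" rather than "amazoncom".
    """
    lowered = name.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def group_exact_matches(names: list[str]) -> list[list[str]]:
    """Return groups of two or more names that share a normalized form."""
    buckets = defaultdict(list)
    for name in names:
        buckets[normalize_payee(name)].append(name)
    return [group for group in buckets.values() if len(group) > 1]


payees = ["Amazon.com", "AMAZON COM", "amazon com", "Starbucks"]
print(group_exact_matches(payees))
```

Names that end up alone in a bucket (here, Starbucks) are left ungrouped and would move on to the next pass.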
Algorithm 2: Levenshtein Distance (Edit Distance)
The second pass compares the normalized names of all remaining (ungrouped) payees pairwise using Levenshtein distance — a measure of how many single-character edits (insertions, deletions, substitutions) are needed to transform one name into the other.
Two payees are flagged as likely duplicates if:
- The edit distance is 3 or fewer edits, and
- The edit distance is less than 30% of the longer name's length
This catches typos and minor name variations such as Starbucks vs Starbuck or Walgreens vs Walgreen.
- Confidence: 50–85% (scales with similarity; the closer the names, the higher the score)
- Handles: typos, truncated names, minor spelling variations
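As a rough sketch of the two conditions above, the following Python computes the Levenshtein distance with the standard dynamic-programming method and applies both thresholds. The function names are hypothetical, and the inputs are assumed to be already normalized names; how OtterLedger turns the distance into a 50–85% confidence score is not documented here, so the sketch only returns a yes/no answer.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def is_likely_duplicate(a: str, b: str) -> bool:
    """Apply both rules: distance <= 3 AND distance < 30% of the longer name."""
    distance = levenshtein(a, b)
    longer = max(len(a), len(b))
    # Integer form of "distance < 0.30 * longer" to avoid float rounding.
    return distance <= 3 and distance * 10 < 3 * longer


print(is_likely_duplicate("starbucks", "starbuck"))  # 1 edit, 9 characters
print(is_likely_duplicate("cat", "dog"))             # 3 edits, 3 characters
```

The 30% rule is what keeps short names apart: three edits is within the first limit for cat vs dog, but it is the entire length of the name, so the pair is not flagged.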
Algorithm 3: Common Prefix Matching
The third pass looks for payees whose normalized names share a significant common prefix. This catches cases where the same merchant appears with different trailing identifiers, store numbers, or location suffixes.
- Confidence: lower than algorithms 1 and 2
- Handles: location suffixes, store numbers, branch identifiers
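A minimal sketch of prefix matching follows. The guide does not say how long a prefix must be to count as "significant", so `MIN_PREFIX_LENGTH = 6` is a made-up value for illustration, and the function names and sample payees are likewise hypothetical.

```python
MIN_PREFIX_LENGTH = 6  # hypothetical threshold; the real value is not documented


def common_prefix(a: str, b: str) -> str:
    """Return the longest string that both names start with."""
    length = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        length += 1
    return a[:length]


def shares_significant_prefix(a: str, b: str,
                              min_length: int = MIN_PREFIX_LENGTH) -> bool:
    """True when the shared prefix, ignoring trailing spaces, is long enough."""
    prefix = common_prefix(a, b).rstrip()
    return len(prefix) >= min_length


print(shares_significant_prefix("target store 1234", "target store 0456"))
print(shares_significant_prefix("target", "tarragon"))
```

A fixed minimum length is the simplest possible rule. A stricter variant might also require the prefix to cover a minimum share of the shorter name, which would reduce false matches between unrelated merchants that happen to start with the same word.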
Matching Criteria for Transaction Duplicates
During import, transaction-level duplicate detection checks:
- Amount - Exact or within tolerance
- Date - Same day or within range
- Account - Same account
- Payee - Similar payee names
Confidence Scores
Each detected duplicate group displays a confidence score from 0% to 100% that indicates how likely the match is to be a true duplicate.
| Score Range | Meaning | Recommended Action |
|---|---|---|
| 90–100% | Near-certain duplicate | Review and merge or delete |
| 70–89% | Likely duplicate | Review carefully before merging |
| 50–69% | Possible duplicate | Inspect transaction history before deciding |
| Below 50% | Low confidence | Use "Keep Both" unless you recognize the duplication |
Tip: Confidence scores are shown in the duplicate scan results next to each group. Sort by confidence (highest first) to tackle the most obvious duplicates first.
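The score ranges in the table above amount to a simple threshold lookup, sketched here in Python. The function name `recommended_action` is hypothetical, and applying the boundaries to fractional scores (for example, treating 89.5 as "Likely duplicate") is an assumption, since the table lists whole-number ranges only.

```python
def recommended_action(score: float) -> str:
    """Map a confidence score (0 to 100) to the action from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Review and merge or delete"
    if score >= 70:
        return "Review carefully before merging"
    if score >= 50:
        return "Inspect transaction history before deciding"
    return 'Use "Keep Both" unless you recognize the duplication'


# Sorting highest first mirrors the tip above.
for score in sorted([62, 95, 40, 78], reverse=True):
    print(score, "->", recommended_action(score))
```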
Match Confidence Summary (Import)
| Level | Criteria Met | Action |
|---|---|---|
| Definite | All match exactly | Auto-skip |
| Likely | 3+ criteria match | Flag for review |
| Possible | 2 criteria match | Suggest review |
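To make the three levels concrete, here is one way the classification could work, written as a Python sketch under stated assumptions. The `Transaction` class and `match_level` function are hypothetical. The default tolerances (3 days, 0%) come from the settings table in this guide. Comparing payees by case-insensitive equality is a stand-in for whatever similarity check OtterLedger really uses, and counting a criterion as met when it falls within tolerance is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    amount: Decimal
    posted: date
    account: str
    payee: str


def match_level(a: Transaction, b: Transaction,
                date_tolerance_days: int = 3,
                amount_tolerance_pct: Decimal = Decimal("0")) -> Optional[str]:
    """Return "Definite", "Likely", "Possible", or None for a pair."""
    # Definite: all four criteria match exactly.
    if (a.amount, a.posted, a.account, a.payee) == \
            (b.amount, b.posted, b.account, b.payee):
        return "Definite"

    # Otherwise count how many criteria are met within tolerance.
    allowed = abs(a.amount) * amount_tolerance_pct / Decimal("100")
    criteria = [
        abs(a.amount - b.amount) <= allowed,
        abs((a.posted - b.posted).days) <= date_tolerance_days,
        a.account == b.account,
        # Stand-in for payee similarity: case-insensitive equality.
        a.payee.strip().lower() == b.payee.strip().lower(),
    ]
    met = sum(criteria)
    if met >= 3:
        return "Likely"
    if met == 2:
        return "Possible"
    return None


existing = Transaction(Decimal("42.50"), date(2024, 1, 15), "Checking", "Amazon")
incoming = Transaction(Decimal("42.50"), date(2024, 1, 17), "Checking", "amazon")
print(match_level(existing, incoming))
```

In this example the amounts and accounts match, the dates are two days apart (inside the 3-day tolerance), and the payees differ only in case, so the pair is not an exact match but is flagged for review.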
During Import
Preview Screen
Before completing import, review flagged duplicates:
- Keep Both - Import anyway
- Skip - Don't import this transaction
- Skip All Similar - Skip all with same pattern
Settings
Settings → Import → Duplicate Detection
- Enable/disable detection
- Set date range tolerance
- Set amount tolerance
Finding Existing Duplicates
Duplicate Finder
Tools → Find Duplicates
- Select accounts to scan
- Choose date range
- Set sensitivity
- Click Scan
Review Results
Results are grouped into likely duplicate pairs or clusters. Each group shows the confidence score assigned by the detection algorithm.
| Date | Amount | Payee | Account | Confidence | Action |
|---|---|---|---|---|---|
| 1/15 | $42.50 | Amazon | Checking | 95% | [Keep] [Delete] [Merge] |
| 1/15 | $42.50 | AMZN | Checking | 95% | [Keep] [Delete] [Merge] |
Resolution Options
Delete Duplicate
Remove one transaction entirely.
Merge Transactions
Combine the duplicates into a single transaction:
- Keeps the earlier date
- Preserves memo and attachments from both
- Uses categorization from the one you choose
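The merge rules above can be illustrated with a short Python sketch. The `Txn` class, its field names, and `merge_transactions` are hypothetical, and the way the two memos are combined (joined with " | ", with identical memos kept once) is an assumption; the guide only states that memos and attachments from both transactions are preserved.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Txn:
    posted: date
    category: str
    memo: str = ""
    attachments: list[str] = field(default_factory=list)


def merge_transactions(chosen: Txn, other: Txn) -> Txn:
    """Merge two duplicates; `chosen` supplies the categorization."""
    # Keep non-empty memos from both, dropping an exact repeat.
    memos = list(dict.fromkeys(m for m in (chosen.memo, other.memo) if m))
    # Keep attachments from both, preserving order without repeats.
    attachments = list(dict.fromkeys(chosen.attachments + other.attachments))
    return Txn(
        posted=min(chosen.posted, other.posted),  # earlier date wins
        category=chosen.category,
        memo=" | ".join(memos),
        attachments=attachments,
    )


manual = Txn(date(2024, 1, 16), "Shopping", "gift", ["r1.pdf"])
synced = Txn(date(2024, 1, 15), "Uncategorized", "order 123", ["r1.pdf", "r2.pdf"])
print(merge_transactions(manual, synced))
```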
For merging payee records (not just individual transactions), see Guide 47: Payee Management & Merge, which covers the full payee merge workflow including reassigning transaction history and consolidating AI learning rules.
Keep Both
Mark the transactions as "not duplicates" so that they are not flagged again.
Prevention Tips
- Use one import source per account - Bank sync OR manual import
- Set clear date ranges - Don't overlap import periods
- Wait for sync - Don't manually enter transactions that will sync
- Review immediately - Catch duplicates before reconciliation
Common Scenarios
Pending vs Posted
- Pending transaction imports
- Same transaction posts later
- Amount or date may differ slightly
Multiple Bank Accounts
- Transfer shows in both accounts
- Not a duplicate — it's the same transfer
- Link as transfer instead
Manual Entry + Import
- You enter a transaction manually
- Bank sync imports same transaction
- Use duplicate finder to merge
Settings
| Setting | Description | Default |
|---|---|---|
| Date tolerance | Days before or after a transaction's date that still count as a match | 3 |
| Amount tolerance | Percentage difference in amount that still counts as a match | 0% |
| Auto-skip definite | Skip exact matches | On |
| Flag likely | Show review prompt | On |