Duplicate Detection

Guide 45: Duplicate Detection

Prevent and resolve duplicate transactions

Overview

Duplicates can occur when importing transactions from multiple sources (bank sync, manual import, direct entry). OtterLedger automatically detects and helps resolve duplicates.

How Detection Works

Multi-Algorithm Similarity Pipeline

OtterLedger uses a three-algorithm pipeline when scanning for duplicate payees. Each algorithm runs in sequence, and a payee can only appear in one group — once matched, it is excluded from subsequent passes.
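The "once matched, excluded from later passes" rule can be sketched as follows. This is only an illustration of the control flow, not OtterLedger's actual code: `run_passes`, `group_by_key`, and the two toy passes are hypothetical names, and payee names are assumed to be unique strings.

```python
from collections import defaultdict
from typing import Callable

# A pass takes the still-ungrouped payee names and returns candidate groups.
GroupingPass = Callable[[list[str]], list[list[str]]]


def run_passes(payees: list[str], passes: list[GroupingPass]) -> list[list[str]]:
    """Run each pass in order; a payee grouped by one pass is hidden from later ones."""
    remaining = list(payees)
    groups: list[list[str]] = []
    for find_groups in passes:
        claimed: set[str] = set()
        for candidate in find_groups(remaining):
            # Guard against a pass returning overlapping groups.
            members = [p for p in candidate if p not in claimed]
            if len(members) > 1:
                groups.append(members)
                claimed.update(members)
        remaining = [p for p in remaining if p not in claimed]
    return groups


def group_by_key(key: Callable[[str], str]) -> GroupingPass:
    """Build a toy pass that groups payees sharing the same key (illustrative only)."""
    def find_groups(names: list[str]) -> list[list[str]]:
        buckets: dict[str, list[str]] = defaultdict(list)
        for name in names:
            buckets[key(name)].append(name)
        return [bucket for bucket in buckets.values() if len(bucket) > 1]
    return find_groups


# Stand-ins for the real algorithms: exact lowercase match, then a 4-character prefix.
exact_pass = group_by_key(str.lower)
prefix_pass = group_by_key(lambda name: name.lower()[:4])

result = run_passes(
    ["Acme", "ACME", "Acme Store 12", "acme store 7"],
    [exact_pass, prefix_pass],
)
```

In this toy run, "Acme" and "ACME" are grouped by the first pass, so the prefix pass only sees the two store entries and cannot pull the earlier pair into a second group.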

Algorithm 1: Normalized Name Comparison

The first pass normalizes each payee name by ignoring case, collapsing extra spaces, and dropping punctuation, then groups names whose normalized forms match exactly. This catches cases like Amazon.com vs AMAZON COM vs amazon com.

Confidence: ~95%
Handles: case differences, spacing differences, punctuation differences
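A minimal sketch of this pass, under stated assumptions: the function names are hypothetical, and punctuation is replaced with a space (rather than deleted outright) so that Amazon.com and AMAZON COM normalize to the same string. The exact normalization rules OtterLedger applies may differ.

```python
import re
from collections import defaultdict


def normalize_payee(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    lowered = name.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)  # assumption: punctuation becomes a space
    return " ".join(no_punct.split())


def group_exact_matches(payees: list[str]) -> list[list[str]]:
    """Group payees whose normalized names are identical; singletons are dropped."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for payee in payees:
        buckets[normalize_payee(payee)].append(payee)
    return [members for members in buckets.values() if len(members) > 1]


groups = group_exact_matches(["Amazon.com", "AMAZON COM", "amazon  com", "Starbucks"])
```

Here the three Amazon spellings all normalize to `amazon com` and land in one group, while Starbucks stays ungrouped and moves on to the next pass.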

Algorithm 2: Levenshtein Distance (Edit Distance)

The second pass compares the normalized names of all remaining (ungrouped) payees pairwise using Levenshtein distance — a measure of how many single-character edits (insertions, deletions, substitutions) are needed to transform one name into the other.

Two payees are flagged as likely duplicates if:

The edit distance is 3 or fewer edits, and
That distance is less than 30% of the longer name's length

This catches typos and minor name variations such as Starbucks vs Starbuck or Walgreens vs Walgreen.

Confidence: 50–85% (scales with similarity; the closer the names, the higher the score)
Handles: typos, truncated names, minor spelling variations
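The two conditions above can be sketched with a standard dynamic-programming Levenshtein distance. The function names are hypothetical, the inputs are assumed to be already normalized, and the 50–85% confidence scaling is not modeled because its formula is not given here.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def is_likely_duplicate(a: str, b: str, max_edits: int = 3, max_percent: int = 30) -> bool:
    """Apply both thresholds: at most max_edits, and under max_percent of the longer name."""
    longer = max(len(a), len(b))
    if longer == 0:
        return False
    distance = levenshtein(a, b)
    # Integer arithmetic avoids floating-point surprises at the 30% boundary.
    return distance <= max_edits and distance * 100 < max_percent * longer
```

For example, `starbucks` vs `starbuck` is one edit on a nine-character name, so both conditions hold. A short pair such as `amazon` vs `amzn` needs two edits, which is more than 30% of six characters, so it is not flagged by this pass.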

Algorithm 3: Common Prefix Matching

The third pass looks for payees whose normalized names share a significant common prefix. This catches cases where the same merchant appears with different trailing identifiers, store numbers, or location suffixes.

Confidence: lower than algorithms 1 and 2
Handles: location suffixes, store numbers, branch identifiers
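A minimal sketch of prefix matching follows. What counts as a "significant" prefix is not specified in this guide, so the six-character minimum below is purely an assumption for illustration, as are the function names.

```python
MIN_PREFIX_CHARS = 6  # assumption: the real threshold is not documented here


def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters the two names share."""
    length = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        length += 1
    return length


def shares_significant_prefix(a: str, b: str, min_chars: int = MIN_PREFIX_CHARS) -> bool:
    """True when the shared prefix of two normalized names is at least min_chars long."""
    return common_prefix_length(a, b) >= min_chars
```

With this threshold, `target store 1234` and `target store 0987` share the 13-character prefix `target store ` and would be grouped, while `shell` and `shake shack` share only `sh` and would not.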

Matching Criteria for Transaction Duplicates

During import, transaction-level duplicate detection checks:

Amount - Exact or within tolerance
Date - Same day or within range
Account - Same account
Payee - Similar payee names
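The four checks above could be expressed roughly as below. This is a sketch under assumptions: the `Transaction` type and `matched_criteria` function are hypothetical, the tolerance defaults mirror the documented defaults (3 days, 0%), the percentage is taken relative to the larger of the two amounts, and payee similarity is simplified to a case-insensitive comparison rather than the full payee pipeline.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    posted: date
    amount: Decimal
    account: str
    payee: str


def matched_criteria(
    a: Transaction,
    b: Transaction,
    date_tolerance_days: int = 3,
    amount_tolerance_pct: Decimal = Decimal("0"),
) -> set[str]:
    """Return the names of the criteria on which the two transactions agree."""
    matched: set[str] = set()

    # Amount: exact when the tolerance is 0%, otherwise within the allowed variance.
    allowed = max(abs(a.amount), abs(b.amount)) * amount_tolerance_pct / Decimal("100")
    if abs(a.amount - b.amount) <= allowed:
        matched.add("amount")

    # Date: same day or within the configured number of days.
    if abs((a.posted - b.posted).days) <= date_tolerance_days:
        matched.add("date")

    if a.account == b.account:
        matched.add("account")

    # Payee: simplified stand-in for the similarity algorithms described earlier.
    if " ".join(a.payee.casefold().split()) == " ".join(b.payee.casefold().split()):
        matched.add("payee")

    return matched
```

Returning the set of matched criteria, rather than a single yes/no, keeps enough information to decide later how strongly the pair should be flagged.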

Confidence Scores

Each detected duplicate group displays a confidence score from 0 to 100% indicating how likely the match is to be a true duplicate.

Score Range	Meaning	Recommended Action
90–100%	Near-certain duplicate	Review and merge or delete
70–89%	Likely duplicate	Review carefully before merging
50–69%	Possible duplicate	Inspect transaction history before deciding
Below 50%	Low confidence	Use "Keep Both" unless you recognize the duplication

Tip: Confidence scores are shown in the duplicate scan results next to each group. Sort by confidence (highest first) to tackle the most obvious duplicates first.
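As an illustration of the score ranges and the sorting tip, the table can be read as a simple lookup. The function name and the sample groups with their scores are made up for this example.

```python
def recommended_action(score: float) -> tuple[str, str]:
    """Map a 0-100 confidence score to the (meaning, recommended action) from the table."""
    if not 0 <= score <= 100:
        raise ValueError("confidence score must be between 0 and 100")
    if score >= 90:
        return ("Near-certain duplicate", "Review and merge or delete")
    if score >= 70:
        return ("Likely duplicate", "Review carefully before merging")
    if score >= 50:
        return ("Possible duplicate", "Inspect transaction history before deciding")
    return ("Low confidence", 'Use "Keep Both" unless you recognize the duplication')


# Illustrative scan results as (group label, confidence) pairs.
scan_results = [
    ("Walgreens / Walgreen", 80),
    ("Amazon.com / AMAZON COM", 95),
    ("Target 12 / Target 98", 45),
]

# Highest confidence first, as the tip suggests.
by_confidence = sorted(scan_results, key=lambda group: group[1], reverse=True)
```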

Match Confidence Summary (Import)

Level	Criteria Met	Action
Definite	All match exactly	Auto-skip
Likely	3 or more criteria match, but not all exactly	Flag for review
Possible	2 criteria match	Suggest review
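The import levels can be sketched as a small classifier. The names below are hypothetical, and the handling of pairs where all four criteria match only within tolerance (treated as Likely) and of pairs with fewer than two matches (not flagged) is an assumed reading of the table.

```python
from enum import Enum
from typing import Optional

TOTAL_CRITERIA = 4  # amount, date, account, payee


class MatchLevel(Enum):
    DEFINITE = "Auto-skip"
    LIKELY = "Flag for review"
    POSSIBLE = "Suggest review"


def classify_match(criteria_matched: int, all_exact: bool) -> Optional[MatchLevel]:
    """Return the import match level, or None when fewer than two criteria match."""
    if not 0 <= criteria_matched <= TOTAL_CRITERIA:
        raise ValueError("criteria_matched must be between 0 and 4")
    if criteria_matched == TOTAL_CRITERIA and all_exact:
        return MatchLevel.DEFINITE
    if criteria_matched >= 3:
        return MatchLevel.LIKELY
    if criteria_matched == 2:
        return MatchLevel.POSSIBLE
    return None  # assumption: too weak to flag
```

Each level's value is the action from the table, so `classify_match(4, True).value` gives the action to apply to an exact match.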

During Import

Preview Screen

Before completing the import, review the flagged duplicates and choose an action for each:

Keep Both - Import the transaction anyway
Skip - Don't import this transaction
Skip All Similar - Skip every transaction that matches the same pattern

Settings

Settings → Import → Duplicate Detection

Enable/disable detection
Set date range tolerance
Set amount tolerance

Finding Existing Duplicates

Duplicate Finder

Tools → Find Duplicates

Select accounts to scan
Choose date range
Set sensitivity
Click Scan

Review Results

Results are grouped by likely duplicate pairs or clusters. Each group shows the confidence score from the detection algorithm.

Date	Amount	Payee	Account	Confidence	Action
1/15	$42.50	Amazon	Checking	95%	[Keep] [Delete] [Merge]
1/15	$42.50	AMZN	Checking	95%	[Keep] [Delete] [Merge]

Resolution Options

Delete Duplicate

Remove one transaction entirely.

Merge Transactions

Combine the duplicates into a single transaction. The merged transaction:

Keeps the earlier date
Preserves the memo and attachments from both
Uses the categorization from the one you choose
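The merge rules can be sketched as below. The `Transaction` type and `merge_transactions` function are hypothetical, and several details are assumptions not stated in this guide: memos are joined with " | ", identical memos or attachments are kept only once, and the payee is taken from the same transaction as the category.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Transaction:
    posted: date
    payee: str
    category: str
    memo: str = ""
    attachments: list[str] = field(default_factory=list)


def merge_transactions(
    first: Transaction, second: Transaction, category_from: Transaction
) -> Transaction:
    """Merge two duplicates: earlier date, memos and attachments from both, chosen category."""
    if category_from is not first and category_from is not second:
        raise ValueError("category_from must be one of the two transactions being merged")
    # dict.fromkeys drops repeats while preserving order.
    memos = list(dict.fromkeys(m for m in (first.memo, second.memo) if m))
    attachments = list(dict.fromkeys(first.attachments + second.attachments))
    return Transaction(
        posted=min(first.posted, second.posted),
        payee=category_from.payee,  # assumption: payee follows the chosen transaction
        category=category_from.category,
        memo=" | ".join(memos),
        attachments=attachments,
    )


manual = Transaction(date(2024, 1, 14), "Amazon", "Shopping", "gift", ["receipt.pdf"])
synced = Transaction(date(2024, 1, 15), "AMZN", "Uncategorized", "", ["statement.png"])
merged = merge_transactions(manual, synced, category_from=manual)
```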

For merging payee records (not just individual transactions), see Guide 47: Payee Management & Merge, which covers the full payee merge workflow including reassigning transaction history and consolidating AI learning rules.

Keep Both

Mark the transactions as "not duplicates" so they won't be flagged again.

Prevention Tips

Use one import source per account - Bank sync OR manual import
Set clear date ranges - Don't overlap import periods
Wait for sync - Don't manually enter transactions that will sync
Review immediately - Catch duplicates before reconciliation

Common Scenarios

Pending vs Posted

A pending transaction is imported
The same transaction posts later and is imported again
The amount or date may differ slightly between the two

Multiple Bank Accounts

A transfer shows up in both accounts
This is not a duplicate: the two entries are the two sides of the same transfer
Link them as a transfer instead of merging or deleting

Manual Entry + Import

You enter a transaction manually
Bank sync later imports the same transaction
Use the Duplicate Finder to merge the two

Settings Reference

Setting	Description	Default
Date tolerance	Days before/after	3 days
Amount tolerance	% variance allowed	0%
Auto-skip definite	Skip exact matches	On
Flag likely	Show review prompt	On
