Free Download

Built in Rust

Forensic OCR with Audit Trails. Free.

Extract text from scanned documents, PDFs and images with per-word confidence scoring and court-ready chain of custody. Powered by Tesseract 5 LSTM. Hash-chained ed25519 audit log on every action.

Need batch processing? The Forensic Edition adds 1-32 parallel workers, EDRM XML export for Relativity and Nuix, searchable PDFs and signed forensic reports.

Sherlock Forensics OCR Reader is a free forensic OCR tool for Windows powered by Tesseract 5 LSTM. It provides per-word confidence scoring, ed25519 hash-chained audit trails, table extraction and 7 export formats including EDRM XML v1.2. The Forensic Edition at $67 adds batch OCR with 1-32 workers, searchable PDF export and signed forensic reports.

No signup · SHA-256 verified · 30 MB · Since 2006

Why Sherlock OCR

Built for Forensic Examiners. Usable by Anyone.

Hash-Chained Audit Log

Append-only JSONL audit trail with ed25519 cryptographic signatures on every entry. SHA-256 hash chain makes retroactive edits, insertions or deletions mathematically detectable. Verify offline with just a public key.

Per-Word Confidence Scoring

Tesseract 5 LSTM engine with 0-100 confidence on every word captured from hOCR x_wconf. Flag suspect tokens instantly. Per-line and per-word bounding boxes in image-pixel coordinates for precise visual review.
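
As a rough illustration of how per-word confidence can be read from hOCR, the sketch below pulls the text, bounding box and x_wconf value out of each ocrx_word span. The sample markup follows the shape Tesseract's hOCR output usually has. The names Word, parse_words and suspect, and the threshold of 60, are assumptions made for this example and are not taken from the tool.

```python
import html
import re
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    bbox: tuple[int, int, int, int]  # x0, y0, x1, y1 in image pixels
    conf: int                        # 0-100, taken from x_wconf


# One ocrx_word span with a "bbox ...; x_wconf N" title attribute.
WORD_RE = re.compile(
    r"<span class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+); x_wconf (\d+)['\"][^>]*>"
    r"(.*?)</span>",
    re.DOTALL,
)


def parse_words(hocr: str) -> list[Word]:
    """Extract every word with its bounding box and confidence."""
    words = []
    for m in WORD_RE.finditer(hocr):
        x0, y0, x1, y1, conf = (int(g) for g in m.groups()[:5])
        # Drop nested tags such as <strong> and decode HTML entities.
        text = html.unescape(re.sub(r"<[^>]+>", "", m.group(6))).strip()
        words.append(Word(text, (x0, y0, x1, y1), conf))
    return words


def suspect(words: list[Word], threshold: int = 60) -> list[Word]:
    """Words whose confidence falls below an operator-chosen threshold."""
    return [w for w in words if w.conf < threshold]
```

A reviewer could call suspect(parse_words(hocr_text)) and highlight the returned bounding boxes on the page image.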

7 Export Formats

Plain Text, CSV Summary, JSON with bounding boxes, hOCR, EDRM XML v1.2 for Relativity/Nuix/Axiom/FTK ingestion, Searchable PDF with selectable text and Signed Forensic Report with SHA-256 chain verification.

Batch OCR (1-32 Workers)

Process entire folders with parallel workers. Pause, resume or cancel per-file. Per-file progress bars with elapsed time and rolling ETA. Completed files drop below active work automatically. Forensic Edition feature.

Core Engine

Beyond Tesseract: Smarter OCR Out of the Box

Sherlock OCR Reader wraps the Tesseract 5 LSTM neural engine with a forensic-grade preprocessing pipeline and multi-pass voting system that consistently outperforms raw Tesseract on real-world documents. Smart bicubic upscaling recovers text from low-DPI scans that Tesseract alone would garble. Adaptive binarization strips colored backgrounds and decorative elements before the engine ever sees the page. Automatic deskew straightens crooked scans at 0.25-degree precision. Auto-rotation detects and corrects 90/180/270-degree source rotation. The multi-pass voting mode runs four different page segmentation strategies and merges results by best mean confidence, pulling accurate text from pages that defeat any single Tesseract pass. Every preprocessing step is recorded in the audit trail so you know exactly what was applied. Per-word confidence scoring flags suspect tokens instantly. The result: forensic-grade accuracy from an open-source engine, with a complete chain of evidence for every character recognized.

Multi-Pass Voting

Runs PSM 3, 6, 11 and 12 simultaneously and merges results by best mean confidence. Recovers text from difficult pages that single-pass OCR misses.
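
One way to picture "best mean confidence" is to score each pass by the average of its word confidences and keep the top scorer. This sketch assumes the comparison happens per page and that each pass yields (word, confidence) pairs. The page does not say at what granularity the merge happens, and best_pass is a hypothetical name.

```python
from statistics import mean


def best_pass(passes: dict[int, list[tuple[str, float]]]) -> tuple[int, str]:
    """passes maps a PSM number to its (word, confidence) pairs.

    Returns the winning PSM and that pass's text.
    """
    def score(psm: int) -> float:
        words = passes[psm]
        # A pass that recognized nothing scores zero.
        return mean(c for _, c in words) if words else 0.0

    winner = max(passes, key=score)
    return winner, " ".join(w for w, _ in passes[winner])
```

A pure mean can favor a pass that returns only a few high-confidence words, so a production merge would likely also weigh how much text each pass found.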

Multi-Language OCR

Combine any installed Tesseract language packs in a single pass. Built-in selectors for English, French, German, Spanish, Italian and common combinations like eng+fra.

Full Engine Control

Page Segmentation Mode (PSM), OCR Engine Mode (OEM), character whitelist and blacklist, user-words dictionary and user-patterns dictionary for domain vocabulary and regex-style patterns.
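
For readers who know the Tesseract command line, these controls correspond to options the tesseract CLI accepts. The sketch below only builds an argument list. Whether Sherlock OCR Reader invokes Tesseract this way is an assumption, and tesseract_args is a hypothetical helper.

```python
def tesseract_args(image, out_base, *, psm=3, oem=1, langs=("eng",),
                   whitelist=None, blacklist=None,
                   user_words=None, user_patterns=None):
    """Build a tesseract CLI argument list that requests hOCR output."""
    args = ["tesseract", image, out_base,
            "--psm", str(psm),       # page segmentation mode
            "--oem", str(oem),       # 1 selects the LSTM engine
            "-l", "+".join(langs)]   # for example eng+fra
    if user_words:
        args += ["--user-words", user_words]
    if user_patterns:
        args += ["--user-patterns", user_patterns]
    if whitelist:
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        args += ["-c", f"tessedit_char_blacklist={blacklist}"]
    args.append("hocr")              # config name that selects hOCR output
    return args
```

The resulting list can be passed to subprocess.run on a machine where Tesseract is installed.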

One-Click Presets

Optimized for Your Document Type

Document

Clean scanned pages. PSM 3, no preprocessing.

Photo / Poster

Sparse text. PSM 11 with 3x upscale and binarize.

Receipt / Invoice

Columnar data. PSM 4 with binarize and deskew.

Single Paragraph

Signs and labels. PSM 6 with upscale.

Best Quality

Multi-pass voting across PSM 3, 6, 11, 12 with upscale and auto-rotate.

Custom

Full manual control. Auto-flips to Custom when any knob is hand-edited.
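
The presets above can be thought of as named bundles of settings. The following sketch encodes them as data and mimics the "flip to Custom on edit" behavior. Field names and the edit helper are assumptions. Where the page gives no upscale factor, the factor is left as None.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Preset:
    name: str
    psm: tuple[int, ...]                    # more than one PSM means multi-pass voting
    upscale: bool = False
    upscale_factor: Optional[float] = None  # None: factor not stated on the page
    binarize: bool = False
    deskew: bool = False
    auto_rotate: bool = False


PRESETS = {p.name: p for p in (
    Preset("Document", (3,)),
    Preset("Photo / Poster", (11,), upscale=True, upscale_factor=3.0, binarize=True),
    Preset("Receipt / Invoice", (4,), binarize=True, deskew=True),
    Preset("Single Paragraph", (6,), upscale=True),
    Preset("Best Quality", (3, 6, 11, 12), upscale=True, auto_rotate=True),
)}


def edit(preset: Preset, **changes) -> Preset:
    """Any hand edit yields a copy labeled Custom."""
    return replace(preset, name="Custom", **changes)
```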

Preprocessing

Pure-Rust Image Pipeline

Auto-Rotate

Tesseract OSD detects 90, 180 and 270 degree rotation and uprights losslessly. Actual rotation angle recorded in the audit chain.

Smart Upscaling

Bicubic (Catmull-Rom) upscaling at 1.5x to 4x for low-DPI sources. Automatically skipped when resolution is already sufficient.
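
For reference, Catmull-Rom is the bicubic convolution kernel with parameter a = -0.5. The sketch below shows that kernel and applies it to a single row of grayscale pixels. A 2-D upscale runs the same pass over rows and then columns. The edge clamping and pixel-center mapping are choices made for this example, not details taken from the tool.

```python
import math


def catmull_rom(x: float) -> float:
    """Catmull-Rom weight for a sample at distance x from the output position."""
    x = abs(x)
    if x <= 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0


def upscale_row(row: list[int], factor: float) -> list[int]:
    """Resample one row of 0-255 pixels using four taps per output pixel."""
    n = len(row)
    out = []
    for i in range(int(n * factor)):
        src = (i + 0.5) / factor - 0.5          # map output pixel center to source
        base = math.floor(src)
        acc = 0.0
        for k in range(base - 1, base + 3):     # four neighboring source pixels
            sample = row[min(max(k, 0), n - 1)] # clamp at the row edges
            acc += sample * catmull_rom(src - k)
        out.append(min(255, max(0, round(acc))))
    return out
```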

Adaptive Binarization

Otsu binarization strips colored backgrounds and decorative elements for clean ink-on-paper output.
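
Otsu's method picks the gray level that best separates dark ink from lighter background by maximizing the between-class variance of the histogram. The following is a minimal pure-Python sketch over a flat list of 8-bit grayscale values. It is not the tool's Rust implementation, and the function names are illustrative.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return t such that pixels <= t are ink and pixels > t are paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue                  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map each pixel to 0 (ink) or 255 (paper)."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```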

Deskew Detection

Hough/row-variance sweep over -10 to +10 degrees at 0.25 degree precision with bilinear interpolation.
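
The row-variance idea can be sketched as follows: rotate the ink pixels by each candidate angle, count how many land in each pixel row, and keep the angle where the counts are most concentrated. The score used here is the sum of squared row counts, which ranks angles the same way as variance when the number of rows and pixels is fixed. The function names and the point-list input are assumptions for illustration, and the Hough variant is not shown.

```python
import math
from collections import Counter


def row_score(points, angle_deg: float) -> int:
    """Sum of squared per-row ink counts after undoing a skew of angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rows = Counter(round(y * cos_a - x * sin_a) for x, y in points)
    return sum(c * c for c in rows.values())


def estimate_skew(points, limit: float = 10.0, step: float = 0.25) -> float:
    """Sweep -limit..+limit degrees and return the best-scoring angle.

    points are (x, y) ink coordinates. A result of 2.0 means baselines
    follow y = y0 + x * tan(2 degrees).
    """
    steps = int(round(2 * limit / step))
    candidates = [-limit + i * step for i in range(steps + 1)]
    return max(candidates, key=lambda a: row_score(points, a))
```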

Confidence Filtering

Line-drop threshold slider drops lines below operator-set confidence. Garbage-character filter removes cartoon-art-style OCR noise.

Fully Recorded

Pipeline outcome recorded empirically: actual rotation applied, actual upscale factor, actual deskew angle. The audit captures what happened, not just what was requested.

PDF Intelligence

Born-Digital and Scanned PDFs

Sherlock OCR Reader handles born-digital and scanned PDFs in the same document. Text-layer pages extract authored text directly from the PDF stream. Image-only pages run through OCR. Mixed documents process each page with the right method automatically.

Pdfium Rasterization

High-DPI page rendering powered by the same engine behind Chrome. Operator-controlled DPI at 150, 200, 300 (forensic default), 400 or 600.

Born-Digital Fast Path

Text-layer extraction runs in milliseconds. Byte-faithful to what the document contains. Confidence reported as 100.0 to distinguish from OCR.

Embedded Object Recovery

JPEG, JPEG 2000, CCITT Fax G4 image streams and PDF attachments recovered with per-child SHA-256. Page-index attribution for inline images. One-click OCR per embedded image.

Tamper Detection Posture

Avoids the OCR-over-text-PDF pitfall that can mask text-stream divergence from rendered glyphs.

Table Detection

Rule-Less Table Extraction

Sherlock OCR Reader detects tables in receipts, invoices and court exhibits using alignment-based detection that works without visible table borders. Tables include per-table SHA-256 for audit chain integrity and export to CSV.

  • Alignment-based detection finds rule-less tables that line-detection algorithms miss
  • Row clustering by Y-center proximity with adjacency walk over column anchors
  • Minimum 3 consecutive rows sharing 2+ column anchors to suppress prose false-positives
  • Per-table bounding box, mean OCR confidence and SHA-256 locked into the audit chain
  • Copy-as-CSV and Save-as-CSV per table with RFC 4180 quoting
  • Works over both OCR word boxes and born-digital text-layer word boxes
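
The steps in the list above can be sketched roughly as follows: cluster word boxes into rows by Y-center, treat quantized left edges as column anchors, and keep runs of at least 3 consecutive rows that share 2 or more anchors. Tolerances, function names and the one-word-per-cell simplification are assumptions for this example. Python's csv module with its default dialect produces RFC 4180 style quoting.

```python
import csv
import io


def cluster_rows(words, y_tol=6):
    """words are (text, x0, y0, x1, y1). Group by Y-center proximity."""
    rows = []
    for w in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        yc = (w[2] + w[4]) / 2
        if rows and abs(yc - rows[-1]["yc"]) <= y_tol:
            rows[-1]["words"].append(w)
        else:
            rows.append({"yc": yc, "words": [w]})
    return [sorted(r["words"], key=lambda w: w[1]) for r in rows]


def find_tables(rows, x_tol=10, min_rows=3, min_anchors=2):
    """Return runs of consecutive rows sharing enough column anchors."""
    tables, run, shared = [], [], set()
    for row in rows + [None]:                 # None flushes the final run
        anchors = {round(w[1] / x_tol) for w in row} if row is not None else set()
        common = shared & anchors if run else anchors
        if row is not None and len(common) >= min_anchors:
            run.append(row)
            shared = common
            continue
        if len(run) >= min_rows:
            tables.append(run)
        # The row that broke the run may start a new one.
        if row is not None and len(anchors) >= min_anchors:
            run, shared = [row], anchors
        else:
            run, shared = [], set()
    return tables


def table_to_csv(table) -> str:
    """One word per cell. Quotes fields that contain commas or quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf)                  # default dialect uses CRLF line ends
    for row in table:
        writer.writerow([w[0] for w in row])
    return buf.getvalue()
```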

Metadata

Document Metadata Inspector

PDF Properties

Title, Author, Subject, Keywords, Creator, Producer, dates, Trapped flag, custom keys, PDF version, page count, encryption status and linearization detection.

Trailer IDs

Original and modified document GUIDs recovered as hex. The canonical eDiscovery anchor for detecting re-saved-but-unchanged files.

XMP and EXIF

Raw XMP packet preservation. EXIF extraction via kamadak-exif for JPEG, TIFF and PNG. GPS, camera body and capture-date tags distinguished by IFD label.

Crash-Hardened

Malformed PDFs return "no metadata available" instead of crashing. Non-ASCII author and title fields decoded correctly from PDFDocEncoding and UTF-16BE.

Chain of Custody

Ed25519 Hash-Chained Audit Log

Every action in Sherlock OCR Reader is recorded in an append-only JSONL audit log with ed25519 cryptographic signatures and SHA-256 hash chaining. Retroactive edits, insertions or deletions are mathematically detectable. The log can be verified offline with just the public key.

Per-Entry Signatures

Monotonic sequence number, UTC millisecond timestamp, event payload, previous-line hash, current-line SHA-256 and ed25519 signature on every log entry.

Session Anchoring

SessionStart and SessionEnd events capture license holder, operator identity (username, domain, machine, OS, locale) and OCR engine identity including a SHA-256 of the Tesseract binary itself.

Evidence Events

EvidenceVerified, FileOpened, PageOcrComplete, TableExtracted, EmbeddedObjectExtracted, ExportWritten. Every event links source SHA-256 to output SHA-256.

Crash Recovery

fsync on every append. Partially written final lines quarantined to .broken file with timestamps in audit-recovery.log. Zero silent data loss.
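
A durable append and a recovery pass along these lines might look like the sketch below. It assumes one JSON object per line and treats a final line with no trailing newline as a partial write. The function names are hypothetical, and the timestamped audit-recovery.log entry is left out for brevity.

```python
import json
import os


def append_durable(path: str, entry: dict) -> None:
    """Append one JSONL line and force it to disk before returning."""
    line = (json.dumps(entry, sort_keys=True) + "\n").encode("utf-8")
    with open(path, "ab") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def recover(path: str) -> int:
    """Move a partially written final line to path + '.broken'.

    Returns the number of bytes quarantined (0 if the log is clean).
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data or data.endswith(b"\n"):
        return 0
    cut = data.rfind(b"\n") + 1               # start of the incomplete line
    with open(path + ".broken", "ab") as q:
        q.write(data[cut:])
        q.flush()
        os.fsync(q.fileno())
    with open(path, "r+b") as f:
        f.truncate(cut)                       # keep only complete lines
        f.flush()
        os.fsync(f.fileno())
    return len(data) - cut
```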

Offline Verification

Standalone chain-verify function re-walks the entire log and validates hashes plus sequence and prev_hash continuity end to end. No network connection required.
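
To make the chain walk concrete, here is a minimal sketch of appending and verifying hash-chained JSONL entries using only the standard library. Field names, the all-zero genesis hash and the canonical JSON form are assumptions. The ed25519 signature check is omitted because Python's standard library has no Ed25519 support.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(log: list[str], ts_ms: int, event: dict) -> None:
    """Append one entry that commits to the previous entry's hash."""
    prev = json.loads(log[-1])["hash"] if log else GENESIS
    body = {"seq": len(log), "ts_ms": ts_ms, "event": event, "prev_hash": prev}
    log.append(json.dumps({**body, "hash": _digest(body)}, sort_keys=True))


def verify_chain(log: list[str]) -> bool:
    """Re-walk the log, checking sequence, prev_hash continuity and hashes."""
    prev = GENESIS
    for expected_seq, line in enumerate(log):
        entry = json.loads(line)
        claimed = entry.pop("hash")
        if entry["seq"] != expected_seq or entry["prev_hash"] != prev:
            return False
        if _digest(entry) != claimed:
            return False
        prev = claimed
    return True
```

A hash chain alone can be recomputed by anyone who rewrites the whole file, which is why a per-entry signature from a private key matters for the guarantee described above.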

Air-Gap Compatible

Operator identity captured from environment variables with zero network calls. The entire tool operates on fully air-gapped forensic workstations.

Evidence Protection

Type-State Evidence Model

  • Compile-time enforcement: a file must be hashed and sealed into VerifiedEvidence before any OCR, export or rendering call will accept it. Bypass attempts fail to compile.
  • Read-only opens: every source file opened with no write and no create flags. Even an in-tool defect cannot modify the source.
  • SHA-256 everywhere: source files, output files, tables, embedded objects and the hash chain itself all use SHA-256 (lowercase 64-char hex, eDiscovery-compatible).
  • Streaming hashing: 64 KB read buffer so multi-gigabyte exports hash without exhausting memory.
  • Lineage preservation: even unsupported PDF filters are recorded as "Unsupported" with a description so the audit trail knows they were seen, never silently skipped.
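
The points above can be approximated in a short sketch: hash the source in 64 KB chunks through a read-only handle, wrap the result in a sealed value, and have downstream functions accept only that value. Python checks this at run time rather than at compile time as the Rust type-state design does, so this is an analogy only. The run_ocr body is a placeholder, and all names other than VerifiedEvidence are hypothetical.

```python
import hashlib
from dataclasses import dataclass

CHUNK = 64 * 1024  # 64 KB read buffer


def sha256_file(path: str) -> str:
    """Stream the file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:        # read-only: no write or create flags
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()               # lowercase 64-character hex


@dataclass(frozen=True)
class VerifiedEvidence:
    path: str
    sha256: str

    @classmethod
    def seal(cls, path: str) -> "VerifiedEvidence":
        """Hash the source and return an immutable evidence record."""
        return cls(path, sha256_file(path))


def run_ocr(evidence: VerifiedEvidence) -> str:
    """Accept only sealed evidence, and re-check the hash before use."""
    if not isinstance(evidence, VerifiedEvidence):
        raise TypeError("OCR accepts only sealed VerifiedEvidence")
    if sha256_file(evidence.path) != evidence.sha256:
        raise ValueError("source changed since it was sealed")
    return f"ocr:{evidence.sha256[:8]}"  # placeholder for the real OCR call
```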

Export

7 Export Formats

Format | Contents | Edition
.txt | Plain text with page-break form-feeds | Free
.csv | Per-page summary with SHA-256, confidence, line counts | Free
.json | Per-line bounding boxes, confidence, metadata, tables | Forensic
.hocr | Standard hOCR HTML with x_wconf word confidence | Forensic
.xml | EDRM XML v1.2 for Relativity, Nuix, Axiom, FTK | Forensic
.pdf | Searchable PDF with selectable, Ctrl-F-searchable text | Forensic
.pdf | Signed Forensic Report with SHA-256 chain and ed25519 key | Forensic

Every export records a post-write SHA-256 of the output file and logs an ExportWritten audit event linking source SHA-256 to target SHA-256.

Compare

Free vs Forensic Edition

Feature | Free | Forensic ($67)
Single-file OCR | Yes | Yes
Per-word confidence scoring | Yes | Yes
Preprocessing pipeline | Yes | Yes
PDF text-layer extraction | Yes | Yes
Table extraction | Yes | Yes
Metadata inspector | Yes | Yes
Hash-chained audit log | Yes | Yes
Multi-language OCR | Yes | Yes
One-click presets | Yes | Yes
Region of Interest (ROI) selection | Yes | Yes
OCR box overlay (confidence heatmap) | Yes | Yes
Dark/light theme | Yes | Yes
Plain Text export (.txt) | Yes | Yes
CSV Summary export (.csv) | Yes | Yes
Batch OCR (folders) | Preview only | Full (1-32 workers)
JSON export | - | Yes
hOCR export | - | Yes
EDRM XML v1.2 export | - | Yes
Searchable PDF export | - | Yes
Signed Forensic Report | - | Yes
Multi-worker concurrency | 1 worker | 2-32 workers
Embedded image OCR in batch | - | Yes
Priority support | - | Yes

Input

Supported File Formats

PDF
Born-digital and scanned. Text-layer extraction or OCR per page. Embedded object recovery.
PNG
Full alpha flattening onto white for clean OCR input on RGBA images.
JPG / JPEG
Standard JPEG with EXIF metadata extraction.
TIF / TIFF
Multi-page TIFF support with per-page OCR.
BMP
Windows bitmap with automatic format detection.

Pricing

One-Time Payment. Yours Forever.

Forensic Edition

$67 USD
Single machine license. No subscription. One-time payment. Yours forever.
  • All free features included
  • Batch OCR with 2-32 parallel workers
  • JSON export with per-line bounding boxes
  • hOCR export with word-confidence values
  • EDRM XML v1.2 for Relativity, Nuix, Axiom, FTK
  • Searchable PDF with selectable text
  • Signed Forensic Report with ed25519 verification
  • Embedded image OCR in batch mode
  • Multi-seat licensing with seat release
  • Priority email support
  • 30-day money-back guarantee

5+ machines? Contact us for volume pricing.

Use Cases

Who Uses Sherlock OCR Reader

eDiscovery Document Processing

OCR scanned exhibits and produce EDRM XML load files for direct ingestion into Relativity, Nuix and Axiom. Parent-to-child SHA-256 lineage preserves embedded object relationships. Pairs with our eDiscovery services.

Insurance Claim Review

Batch OCR stacks of scanned claim documents with table extraction for receipts, invoices and expense reports. Per-word confidence flags illegible entries before they become disputes.

Court Exhibit Preparation

Produce searchable PDFs from scanned evidence with signed forensic reports documenting tool version, SHA-256 hashes and audit chain integrity. Built by CISSP-certified examiners with courtroom experience.

Compliance Audits

Digitize legacy paper records with full chain of custody. Hash-chained audit log provides mathematical proof that no document was altered during processing.

HR Investigation Records

OCR handwritten notes, printed memos and scanned correspondence. Region-of-interest selection targets specific areas. Low-confidence filtering surfaces words that need manual review.

Legacy Document Digitization

Process archives of old documents with multi-language support. Born-digital PDFs extract text instantly without OCR overhead. Mixed document handling processes each page with the right method.

Questions

OCR Reader FAQ

Does Sherlock OCR Reader require Tesseract to be installed?
Yes. Tesseract 5 must be installed separately. Sherlock OCR Reader auto-locates it from standard install paths including UB Mannheim default, Program Files and PATH. A built-in helper links to the recommended installer if Tesseract is not found.
Can I OCR PDFs that already have text?
Yes. Born-digital PDFs extract the text layer directly, which is faster and byte-faithful to what the document actually contains. Image-only pages use OCR. Mixed documents handle each page with the right method automatically. Confidence is reported as 100.0 for text-layer pages to distinguish authored content from recognized content.
What makes this different from Adobe Acrobat OCR?
Per-word confidence scoring with Tesseract 5 LSTM, ed25519 hash-chained audit trail with per-entry cryptographic signatures, EDRM XML v1.2 export for direct ingestion into Relativity, Nuix, Axiom and FTK, and a type-state evidence model that prevents accidental source modification at compile time. Built for forensic examiners, not general office use.
Can I process an entire folder of documents?
Yes, with the Forensic Edition. Batch OCR processes every PDF and image in a folder with 1 to 32 parallel workers. Pause, resume or cancel individual files mid-batch. Per-file progress tracking with elapsed time and rolling ETA. Completed files automatically separate from active work.
What export formats are supported?
Seven formats. Free: Plain Text (.txt) and CSV Summary (.csv). Forensic Edition: JSON with full per-line bounding boxes and confidence, hOCR (standard HTML with word-confidence values), EDRM XML v1.2 for eDiscovery platform ingestion, Searchable PDF with selectable text, and Signed Forensic Report with tool version, SHA-256 hashes, audit chain path and ed25519 signing key.
Is the $67 price a subscription?
No. The $67 USD Forensic Edition license is a one-time payment. No subscriptions, no recurring charges. You own the license permanently with free updates included.
Can I move my license to a different machine?
Yes. Use the built-in "Release seat" button to deactivate on the current machine, then paste the same license key on the new workstation. Reinstalls on the same machine are automatically recognized without consuming an additional seat.
Does it work on air-gapped workstations?
Yes. After initial license activation (which requires one network round trip), Sherlock OCR Reader operates fully offline. Operator identity is captured from environment variables with zero network calls. The audit log and all processing happen entirely locally.
What happens if the tool crashes mid-OCR?
Per-page subprocess isolation means a malformed image or pathological PDF page takes down only that one OCR job, never the whole tool. Batch mode preserves partial results from cancelled files. The audit log uses fsync on every append and recovers gracefully from partial writes.
How do I verify the download is safe?
Every download displays a SHA-256 hash on the download page. After downloading, compute the SHA-256 of the file and compare it to the published hash. If the values match, the file has not been tampered with. Use our Sherlock Forensics Hash tool or any SHA-256 calculator.

Get Started

Download Sherlock Forensics OCR Reader Today

Free for single-file OCR with confidence scoring and audit trails. Forensic Edition at $67 USD for batch processing, EDRM XML export and signed forensic reports. Built by the same team that delivers expert witness testimony and forensic investigations in Canadian courts.

Since 2006 · CISSP, ISSAP, ISSMP certified · 604.229.1994

Used for: eDiscovery document processing, insurance claim review, HR investigation records, court exhibit preparation, compliance audits and legacy document digitization

30-day money back guarantee on the Forensic Edition. If it does not meet your needs, contact us for a full refund.

fb1282aec47d5309965439b39f59d7436809eb08460555f3ddb9a9cfa4fc6608

How to verify:
1. Open PowerShell (right-click Start menu, click Terminal)
2. Run: Get-FileHash .\sherlock-ocr.exe -Algorithm SHA256
3. Compare the output with the hash above. PowerShell prints the hash in uppercase, so ignore letter case when comparing. If they match, the file has not been tampered with.

Sherlock Forensics OCR Reader is provided for lawful use. Terms of Service

Download

Enter your details to download. We will send you update notifications for new versions.

Checkout - OCR Reader Forensic Edition

$67.00 USD. One-time payment. License key delivered to your email.

Secure via Stripe · 30-day money back · No subscription