Free Download

Forensic OCR with Audit Trails. Free.

Extract text from scanned documents, PDFs and images with per-word confidence scoring and court-ready chain of custody. Powered by Tesseract 5 LSTM. Hash-chained ed25519 audit log on every action.

Need batch processing? The Forensic Edition at $67 adds batch OCR with 1-32 parallel workers, EDRM XML v1.2 export for Relativity and Nuix, searchable PDFs and ed25519-signed forensic reports.

Sherlock Forensics OCR Reader is a free forensic OCR tool for Windows and Linux. The free edition includes single-document OCR, per-word confidence scoring, table extraction, the hash-chained audit trail and Plain Text and CSV export; the remaining five of the 7 export formats (JSON, hOCR, EDRM XML v1.2, searchable PDF and signed forensic report) require the Forensic Edition.

Linux requires: libgtk-3, libfontconfig1, libxkbcommon. See install instructions.

No signup · SHA-256 verified · 30 MB · Since 2006

Forensic Context

OCR Forensics for Litigation, E-Discovery and Legal Hold

OCR forensics is the discipline of extracting text from scanned documents and image-only PDFs so that the output is court-ready: it carries the audit trail, Bates numbering, per-document hashes and per-word confidence scoring that downstream legal-tech workflows require. Sherlock Forensics OCR Reader is built for the practical end of OCR forensics: scanned-document forensics for civil litigation, criminal defense, regulated industries and internal investigations, at $67 lifetime versus $200-plus for ABBYY FineReader or $19.99/month for Adobe Acrobat Pro.

For OCR e-discovery review, Bates-numbered OCR output drops directly into Relativity, Concordance, Logikcull, Reveal, Everlaw and Disco ingest workflows through the EDRM XML v1.2 export. For OCR for litigation production, the producing party OCRs scanned documents under FRCP 34 obligations and the receiving party OCRs produced PDFs that arrived as flattened images. Both sides need OCR evidence with a tamper-evident chain of custody; Sherlock's Ed25519 hash-chained OCR audit trail covers that requirement. For legal hold, OCR of preserved scanned documents enables searchability across the legal-hold corpus without modifying the underlying preserved files.

Regulated industries (healthcare under HIPAA, financial services under SEC and FINRA rules, government under FOIA) commonly need searchable OCR'd archives with audit trails. Sherlock's Bates numbering, EDRM XML page metadata and Ed25519 OCR audit trail together address the documentary chain-of-custody bar for these regulated OCR forensics workflows. See also our Cellebrite vs Magnet AXIOM 2026 breakdown for the same mid-market versus enterprise-tier analysis in mobile forensics, our PST Viewer for mailbox evidence, our deleted email recovery from PST guide for deleted-record workflows and our PDF Editor for post-OCR redaction rendering.

Why Sherlock OCR

Built for Forensic Examiners. Usable by Anyone.

Hash-Chained Audit Log

Append-only JSONL audit trail with ed25519 cryptographic signatures on every entry. SHA-256 hash chain makes retroactive edits, insertions or deletions mathematically detectable. Verify offline with just a public key.

Per-Word Confidence Scoring

Tesseract 5 LSTM engine with 0-100 confidence on every word captured from hOCR x_wconf. Flag suspect tokens instantly. Per-line and per-word bounding boxes in image-pixel coordinates for precise visual review.

7 Export Formats

Plain Text, CSV Summary, JSON with bounding boxes, hOCR, EDRM XML v1.2 for Relativity/Nuix/Axiom/FTK ingestion, Searchable PDF with selectable text and Signed Forensic Report with SHA-256 chain verification.

Batch OCR (1-32 Workers)

Process entire folders with parallel workers. Pause, resume or cancel per-file. Per-file progress bars with elapsed time and rolling ETA. Completed files drop below active work automatically. Forensic Edition feature.

Core Engine

Beyond Tesseract: Smarter OCR Out of the Box

Sherlock Forensics OCR Reader wraps the Tesseract 5 LSTM neural engine with a forensic-grade preprocessing pipeline and multi-pass voting system that consistently outperforms raw Tesseract on real-world documents. Smart bicubic upscaling recovers text from low-DPI scans that Tesseract alone would garble. Adaptive binarization strips colored backgrounds and decorative elements before the engine ever sees the page. Automatic deskew straightens crooked scans at 0.25-degree precision. Auto-rotation detects and corrects 90/180/270-degree source rotation. The multi-pass voting mode runs four different page segmentation strategies and merges results by best mean confidence, pulling accurate text from pages that defeat any single Tesseract pass. Every preprocessing step is recorded in the audit trail so you know exactly what was applied. Per-word confidence scoring flags suspect tokens instantly. The result: forensic-grade accuracy from an open-source engine, with a complete chain of evidence for every character recognized.

Multi-Pass Voting

Runs PSM 3, 6, 11 and 12 simultaneously and merges results by best mean confidence. Recovers text from difficult pages that single-pass OCR misses.
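A minimal sketch of the merge step, assuming each pass yields a list of (word, confidence) pairs. The `PassResult` type and `pick_best_pass` function are hypothetical names, and picking the single pass with the highest mean confidence is one plausible reading of "merges results by best mean confidence", not a description of the tool's internals.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PassResult:
    psm: int                        # Tesseract page segmentation mode used
    words: list[tuple[str, float]]  # (text, confidence 0-100) per word

    def mean_confidence(self) -> float:
        # An empty pass scores 0 so it never wins over a pass with text.
        return mean(c for _, c in self.words) if self.words else 0.0


def pick_best_pass(passes: list[PassResult]) -> PassResult:
    """Return the pass whose words have the highest mean confidence."""
    return max(passes, key=lambda p: p.mean_confidence())


# Illustrative data only: four passes over the same two-word region.
passes = [
    PassResult(3, [("Invoice", 91.0), ("T0tal", 42.0)]),
    PassResult(6, [("Invoice", 93.0), ("Total", 88.0)]),
    PassResult(11, []),
    PassResult(12, [("lnvoice", 55.0)]),
]
best = pick_best_pass(passes)
print(best.psm, best.mean_confidence())
```

Here the PSM 6 pass wins because its mean confidence (90.5) is the highest of the four.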

Multi-Language OCR

Combine any installed Tesseract language packs in a single pass. Built-in selectors for English, French, German, Spanish, Italian and common combinations like eng+fra.

Full Engine Control

Page Segmentation Mode (PSM), OCR Engine Mode (OEM), character whitelist and blacklist, user-words dictionary and user-patterns dictionary for domain vocabulary and regex-style patterns.

Competitor Displacement

Tesseract Alternative + ABBYY FineReader Alternative + Adobe OCR Alternative

The OCR market splits into three buyer tiers: free open-source CLI (Tesseract), $200-plus commercial desktop (ABBYY FineReader) and subscription bundled (Adobe Acrobat Pro at $19.99/month). None of these were designed for OCR forensics or court-ready OCR work. Sherlock Forensics OCR Reader is built as the practical Tesseract alternative, ABBYY alternative and Adobe OCR alternative at $67 lifetime, with the Ed25519 hash-chained OCR audit trail, EDRM XML v1.2 export, Bates numbering, per-word confidence scoring and court-ready PDF report that the forensic and legal-tech buyer actually needs.

Capability	Tesseract	ABBYY FineReader	Adobe Acrobat Pro	Sherlock OCR Reader
Price	Free (CLI)	$199-$399 one-time	$19.99/month subscription	$67 lifetime
Forensic audit trail	None	None	None	Ed25519 hash-chained
Per-word confidence scoring	Possible via custom code	Internal, not exported	Hidden	Exposed and exportable
EDRM XML v1.2 export	None	None	None	Yes
Court-ready OCR PDF report	None	None	None	Yes (Forensic Edition)
Batch OCR processing	CLI scripting required	Yes	Yes	1-32 parallel workers, GUI
Bates numbering integration	None	None	Partial via Acrobat tools	Yes, native
OCR chain of custody	None	None	None	Yes
GUI on Windows	None	Yes	Yes	Yes
Subscription required	No	No	Yes ($19.99/mo)	No

The Tesseract alternative argument is straightforward: Tesseract is excellent free OCR for casual document scanning, but the absence of an OCR audit trail and the lack of EDRM XML output disqualify it for OCR forensics or OCR for litigation work. The ABBYY alternative argument: ABBYY FineReader is a competent desktop OCR product, but it ships none of the forensic features (no OCR audit trail, no court-ready OCR, no Bates numbering integration at the OCR layer). The Adobe OCR alternative argument: Adobe Acrobat Pro's OCR is a feature in a subscription-bundled product rather than a focused forensic OCR engine, and at $19.99/month the subscription costs more than Sherlock's lifetime price by the fourth month ($79.96 versus $67). For a $67 lifetime Tesseract alternative, ABBYY alternative and Adobe OCR alternative purpose-built for OCR forensics, Sherlock is the focused choice. See the Sherlock Forensics tool catalogue for the related forensic OCR, mailbox forensics, mobile forensics and event-log forensics tools at the mid-market tier.

One-Click Presets

Optimized for Your Document Type

Document

Clean scanned pages. PSM 3, no preprocessing.

Photo / Poster

Sparse text. PSM 11 with 3x upscale and binarize.

Receipt / Invoice

Columnar data. PSM 4 with binarize and deskew.

Single Paragraph

Signs and labels. PSM 6 with upscale.

Best Quality

Multi-pass voting across PSM 3, 6, 11, 12 with upscale and auto-rotate.

Custom

Full manual control. Auto-flips to Custom when any knob is hand-edited.

Preprocessing

Pure-Rust Image Pipeline

Auto-Rotate

Tesseract OSD detects 90, 180 and 270 degree rotation and uprights losslessly. Actual rotation angle recorded in the audit chain.

Smart Upscaling

Bicubic (Catmull-Rom) upscaling at 1.5x to 4x for low-DPI sources. Automatically skipped when resolution is already sufficient.

Adaptive Binarization

Otsu binarization strips colored backgrounds and decorative elements for clean ink-on-paper output.
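For readers unfamiliar with Otsu's method, the sketch below shows the textbook algorithm on a flat list of 8-bit grayscale values: it picks the threshold that maximizes the between-class variance of dark and light pixels. This is a generic illustration in Python, not the tool's Rust pipeline; the function names are hypothetical.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Illustrative data: dark ink around 20-30, light paper around 200-220.
pixels = [20] * 50 + [30] * 50 + [200] * 80 + [220] * 20
t = otsu_threshold(pixels)
mono = binarize(pixels, t)
print(t, mono.count(0), mono.count(255))
```

With this toy input the threshold falls at the top of the dark cluster, so the 100 ink pixels become black and the 100 paper pixels become white.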

Deskew Detection

Hough/row-variance sweep over -10 to +10 degrees at 0.25 degree precision with bilinear interpolation.

Confidence Filtering

Line-drop threshold slider drops lines below operator-set confidence. Garbage-character filter removes cartoon-art-style OCR noise.

Fully Recorded

Pipeline outcome recorded empirically: actual rotation applied, actual upscale factor, actual deskew angle. The audit captures what happened, not just what was requested.

Evidentiary Detail

Per-Word Confidence Scoring: Why It Matters for Evidentiary OCR

OCR per-word confidence is the field that turns OCR output from a black box into evidentiary OCR. When an OCR engine recognizes a scanned word, it computes a confidence score (typically 0 to 100) representing how certain the model is about that word. Tesseract computes this score and emits it in its hOCR and TSV outputs, but not in its default plain-text output. ABBYY FineReader and Adobe Acrobat Pro keep OCR per-word confidence internal. Sherlock Forensics OCR Reader exposes OCR per-word confidence as a first-class queryable field in the EDRM XML v1.2 output and the CSV export.

Why this matters for OCR forensics: defending OCR output in court requires showing which specific words the OCR engine was uncertain about. An examiner who can produce a list of low-confidence words (with their coordinates on the page) demonstrates due diligence: every uncertain OCR reading was either manually verified, manually corrected or flagged in the OCR audit trail. Without exposed OCR per-word confidence, the same examiner is asking the court to trust the entire OCR output as a monolithic block, which is harder to defend under cross-examination.

Operational workflow: run Sherlock OCR Reader on the scanned-document corpus, then filter the EDRM XML output for words with OCR confidence below a threshold (commonly 80). Route those words to manual review before producing OCR output as evidence. The OCR audit trail captures every action: original confidence score, who reviewed, what was corrected and when. The Ed25519 hash chain over the audit trail makes the entire OCR per-word confidence workflow tamper-evident at the cryptographic layer. This is why a focused $67 forensic OCR tool with exposed OCR per-word confidence beats $200-plus ABBYY FineReader or $19.99/month Adobe Acrobat Pro for litigation-grade work.
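The filtering step can be sketched against hOCR, whose `ocrx_word` spans carry `bbox` and `x_wconf` in the `title` attribute. The page does not give the EDRM XML field names, so this example uses hOCR instead; the sample markup, the `WordCollector` class and the `low_confidence_words` function are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Matches an hOCR word title such as "bbox 36 92 96 116; x_wconf 95".
TITLE_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+);.*?x_wconf (\d+(?:\.\d+)?)")


class WordCollector(HTMLParser):
    """Collect text, bounding box and confidence for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._pending = None  # word currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and a.get("class") == "ocrx_word":
            m = TITLE_RE.search(a.get("title") or "")
            if m:
                x0, y0, x1, y1 = (int(v) for v in m.groups()[:4])
                self._pending = {"text": "", "bbox": (x0, y0, x1, y1),
                                 "conf": float(m.group(5))}

    def handle_data(self, data):
        if self._pending is not None:
            self._pending["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._pending is not None:
            self.words.append(self._pending)
            self._pending = None


def low_confidence_words(hocr: str, threshold: float = 80.0) -> list[dict]:
    """Return the words whose confidence is below the review threshold."""
    parser = WordCollector()
    parser.feed(hocr)
    parser.close()
    return [w for w in parser.words if w["conf"] < threshold]


# Illustrative hOCR fragment, not real tool output.
HOCR = """
<span class='ocr_line' title='bbox 30 90 400 120'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Total</span>
  <span class='ocrx_word' title='bbox 104 92 180 116; x_wconf 61'>$1,2S0.00</span>
</span>
"""
for w in low_confidence_words(HOCR):
    print(w["text"], w["conf"], w["bbox"])
```

Only the second word falls below the threshold of 80, so it is the one routed to manual review together with its page coordinates.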

PDF Intelligence

Born-Digital and Scanned PDFs

Sherlock Forensics OCR Reader handles born-digital and scanned PDFs in the same document. Text-layer pages extract authored text directly from the PDF stream. Image-only pages run through OCR. Mixed documents process each page with the right method automatically.

Pdfium Rasterization

High-DPI page rendering powered by the same engine behind Chrome. Operator-controlled DPI at 150, 200, 300 (forensic default), 400 or 600.

Born-Digital Fast Path

Text-layer extraction runs in milliseconds. Byte-faithful to what the document contains. Confidence reported as 100.0 to distinguish from OCR.

Embedded Object Recovery

JPEG, JPEG 2000, CCITT Fax G4 image streams and PDF attachments recovered with per-child SHA-256. Page-index attribution for inline images. One-click OCR per embedded image.

Tamper Detection Posture

Avoids the OCR-over-text-PDF pitfall that can mask text-stream divergence from rendered glyphs.

Table Detection

Rule-Less Table Extraction

Sherlock Forensics OCR Reader detects tables in receipts, invoices and court exhibits using alignment-based detection that works without visible table borders. Tables include per-table SHA-256 for audit chain integrity and export to CSV.

Alignment-based detection finds rule-less tables that line-detection algorithms miss
Row clustering by Y-center proximity with adjacency walk over column anchors
Minimum 3 consecutive rows sharing 2+ column anchors to suppress prose false-positives
Per-table bounding box, mean OCR confidence and SHA-256 locked into the audit chain
Copy-as-CSV and Save-as-CSV per table with RFC 4180 quoting
Works over both OCR word boxes and born-digital text-layer word boxes
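The alignment rules in the list above can be sketched as follows, assuming each word box is a dict with a left edge `x0` and a vertical center `yc`. The tolerances, the anchor quantization and all function names are hypothetical simplifications; a real detector would handle anchors that straddle a quantization boundary.

```python
def cluster_rows(words, y_tol=6):
    """Group word boxes into rows by Y-center proximity."""
    rows = []
    for w in sorted(words, key=lambda w: w["yc"]):
        if rows and abs(w["yc"] - rows[-1][-1]["yc"]) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def column_anchors(row, x_tol=10):
    """Quantize left edges so nearly aligned words share one anchor."""
    return {round(w["x0"] / x_tol) for w in row}


def find_tables(words, min_rows=3, min_shared=2):
    """Return runs of consecutive rows that share enough column anchors."""
    tables, run, shared = [], [], set()
    for row in cluster_rows(words):
        anchors = column_anchors(row)
        common = shared & anchors
        if run and len(common) >= min_shared:
            run.append(row)
            shared = common  # keep only anchors present in every row so far
        else:
            if len(run) >= min_rows:
                tables.append(run)
            run, shared = [row], anchors
    if len(run) >= min_rows:
        tables.append(run)
    return tables


def box(text, x0, yc):
    return {"text": text, "x0": x0, "yc": yc}


# Illustrative page: one prose line, three receipt rows, one prose line.
WORDS = [
    box("Thank", 10, 10), box("you", 60, 10), box("for", 130, 10), box("shopping", 190, 10),
    box("Coffee", 10, 40), box("2", 200, 40), box("7.00", 300, 40),
    box("Bagel", 11, 60), box("1", 201, 60), box("3.25", 299, 60),
    box("Tea", 9, 80), box("3", 199, 80), box("9.00", 301, 80),
    box("Please", 10, 110), box("come", 72, 110), box("again", 140, 110),
]
tables = find_tables(WORDS)
print(len(tables), [[w["text"] for w in row] for row in tables[0]])
```

The two prose lines share only the left-margin anchor with their neighbours, so they are rejected, while the three receipt rows share three anchors and form a single table.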

Metadata

Document Metadata Inspector

PDF Properties

Title, Author, Subject, Keywords, Creator, Producer, dates, Trapped flag, custom keys, PDF version, page count, encryption status and linearization detection.

Trailer IDs

Original and modified document GUIDs recovered as hex. The canonical eDiscovery anchor for detecting re-saved-but-unchanged files.

XMP and EXIF

Raw XMP packet preservation. EXIF extraction via kamadak-exif for JPEG, TIFF and PNG. GPS, camera body and capture-date tags distinguished by IFD label.

Crash-Hardened

Malformed PDFs return "no metadata available" instead of crashing. Non-ASCII author and title fields decoded correctly from PDFDocEncoding and UTF-16BE.

Legal-Tech Mechanics

Bates Numbering, Redaction and Chain of Custody for OCR'd Evidence

Three legal-tech mechanics turn OCR output into court-ready OCR evidence: Bates numbering for sequential page identification, OCR redaction support for PII removal at the OCR layer and OCR chain of custody documentation that ties each OCR'd word back to its origin document and the examiner who processed it.

Bates Numbering

Bates numbering applies a sequential identifier to every page of a produced document set (for example SHF000001, SHF000002, SHF000003). Sherlock Forensics OCR Reader supports Bates numbering with custom prefixes, suffix patterns and configurable start numbers. The OCR Bates stamp is written to the output PDF and to the EDRM XML page metadata so downstream review-platform ingest (Relativity, Concordance, Logikcull) picks up the Bates identifier automatically. Bates numbering at the OCR layer is the e-discovery-grade Bates numbering workflow that Adobe Acrobat Pro can partially mimic but does not natively combine with forensic OCR.
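As a small illustration of the identifier scheme, the generator below produces sequential Bates numbers from a prefix, a start number and an optional suffix. The function name, the parameter names and the six-digit default width are assumptions chosen to match the SHF000001 example, not the tool's actual settings.

```python
from itertools import count, islice


def bates_numbers(prefix: str, start: int = 1, width: int = 6, suffix: str = ""):
    """Yield sequential Bates identifiers such as SHF000001, SHF000002, ..."""
    for n in count(start):
        # Zero-pad the running number to a fixed width so identifiers sort correctly.
        yield f"{prefix}{n:0{width}d}{suffix}"


first_three = list(islice(bates_numbers("SHF"), 3))
resumed = list(islice(bates_numbers("SHF", start=250, suffix="-A"), 2))
print(first_three)
print(resumed)
```

Starting a second production at 250 with a suffix shows how a later batch can continue the sequence without colliding with earlier pages.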

OCR Redaction Support

OCR redaction is the workflow of removing PII or privileged content from OCR'd documents before production. Sherlock OCR Reader's per-word confidence scoring plus the per-word position metadata (x, y, width, height per word) lets a downstream OCR redaction workflow identify candidate PII (social security numbers, account numbers, named individuals) at the OCR layer rather than at the visual-PDF layer. Sherlock produces the OCR redaction-ready input; the actual redaction-rendering of black boxes on the PDF is a separate workflow handled by tools like our PDF Editor, but the underlying word-level coordinates that make OCR redaction possible come from Sherlock OCR Reader.
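A downstream step of this kind might look like the sketch below, which scans word-level records for US social security number patterns and returns the boxes a redaction tool would need. The record layout (`text`, `x`, `y`, `width`, `height`, `conf`) and the function name are assumptions for illustration; numbers split across several OCR words or followed by punctuation would need extra handling.

```python
import re

# Simple US SSN shape: three digits, two digits, four digits.
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")


def redaction_candidates(words, review_below=80.0):
    """Return boxes of words that look like SSNs, flagging low-confidence hits."""
    hits = []
    for w in words:
        if SSN_RE.fullmatch(w["text"]):
            hits.append({
                "box": (w["x"], w["y"], w["width"], w["height"]),
                # A low OCR confidence means a human should confirm the match.
                "needs_review": w["conf"] < review_below,
            })
    return hits


# Illustrative word records, not real tool output.
WORDS = [
    {"text": "SSN:", "x": 150, "y": 340, "width": 52, "height": 22, "conf": 96.0},
    {"text": "123-45-6789", "x": 210, "y": 340, "width": 118, "height": 22, "conf": 93.0},
    {"text": "078-05-1120", "x": 210, "y": 380, "width": 118, "height": 22, "conf": 64.0},
]
hits = redaction_candidates(WORDS)
for h in hits:
    print(h["box"], h["needs_review"])
```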

OCR Chain of Custody

OCR chain of custody documents the path from the original scanned document to the final OCR output. Every OCR'd document in Sherlock captures: SHA-256 hash of the input scan, OCR confidence score per word, the examiner identity that ran the OCR job, the OCR engine version (Tesseract 5.x LSTM at processing time), batch ID and timestamp. The hash chain across all documents in an OCR batch means the entire batch is tamper-evident; modifying any single document's OCR output breaks the chain and is cryptographically detectable. This is what makes a court-ready OCR product defensible against a defense-expert challenge that the OCR output was modified after acquisition, a discipline that the free Tesseract and commercial ABBYY paths do not support natively.

Chain of Custody

Ed25519 Hash-Chained Audit Log

Every action in Sherlock Forensics OCR Reader is recorded in an append-only JSONL audit log with ed25519 cryptographic signatures and SHA-256 hash chaining. Retroactive edits, insertions or deletions are mathematically detectable. The log can be verified offline with just the public key.

Per-Entry Signatures

Monotonic sequence number, UTC millisecond timestamp, event payload, previous-line hash, current-line SHA-256 and ed25519 signature on every log entry.

Session Anchoring

SessionStart and SessionEnd events capture license holder, operator identity (username, domain, machine, OS, locale) and OCR engine identity including a SHA-256 of the Tesseract binary itself.

Evidence Events

EvidenceVerified, FileOpened, PageOcrComplete, TableExtracted, EmbeddedObjectExtracted, ExportWritten. Every event links source SHA-256 to output SHA-256.

Crash Recovery

fsync on every append. Partially written final lines quarantined to .broken file with timestamps in audit-recovery.log. Zero silent data loss.

Offline Verification

Standalone chain-verify function re-walks the entire log and validates hashes plus sequence and prev_hash continuity end to end. No network connection required.
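The chain walk can be illustrated with a minimal sketch that covers only the SHA-256 linkage, sequence and prev_hash checks. Ed25519 signature verification is left out because it needs a third-party library, and the entry field names (`seq`, `ts`, `event`, `prev_hash`, `hash`) are assumptions, not the tool's actual JSONL schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(seq, ts, event, prev_hash):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    body = json.dumps(
        {"seq": seq, "ts": ts, "event": event, "prev_hash": prev_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, ts, event):
    """Append an entry that commits to the hash of the previous one."""
    seq = len(log)
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"seq": seq, "ts": ts, "event": event, "prev_hash": prev_hash,
                "hash": entry_hash(seq, ts, event, prev_hash)})


def verify_chain(log):
    """Return the index of the first bad entry, or None if the chain is intact."""
    prev = GENESIS
    for i, e in enumerate(log):
        if e["seq"] != i or e["prev_hash"] != prev:
            return i
        if e["hash"] != entry_hash(e["seq"], e["ts"], e["event"], e["prev_hash"]):
            return i
        prev = e["hash"]
    return None


log = []
append_entry(log, "2024-01-01T00:00:00.000Z", {"type": "SessionStart"})
append_entry(log, "2024-01-01T00:00:01.250Z", {"type": "FileOpened", "file": "scan.pdf"})
append_entry(log, "2024-01-01T00:00:04.900Z", {"type": "PageOcrComplete", "page": 1})
intact = verify_chain(log)

log[1]["event"]["file"] = "other.pdf"  # simulate a retroactive edit
tampered = verify_chain(log)
print(intact, tampered)
```

Editing the second entry changes its recomputed hash, so verification stops at index 1. An attacker who also rewrote every later hash would still have to forge the per-entry signatures, which is the role the signing key plays.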

Air-Gap Compatible

Operator identity captured from environment variables with zero network calls. The entire tool operates on fully air-gapped forensic workstations.

Evidence Protection

Type-State Evidence Model

Compile-time enforcement: a file must be hashed and sealed into VerifiedEvidence before any OCR, export or rendering call will accept it. Bypass attempts fail to compile.
Read-only opens: every source file opened with no write and no create flags. Even an in-tool defect cannot modify the source.
SHA-256 everywhere: source files, output files, tables, embedded objects and the hash chain itself all use SHA-256 (lowercase 64-char hex, eDiscovery-compatible).
Streaming hashing: 64 KB read buffer so multi-gigabyte exports hash without exhausting memory.
Lineage preservation: even unsupported PDF filters are recorded as "Unsupported" with a description so the audit trail knows they were seen, never silently skipped.
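The streaming-hash item above follows a common pattern, shown here as a hedged Python sketch with a 64 KB read buffer: memory use stays constant regardless of file size. The function name is hypothetical, and the same routine can be used to check a downloaded file against a published SHA-256 value.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in fixed-size chunks and return lowercase 64-char hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo on a temporary file of 190,000 bytes, which spans three 64 KB reads.
data = b"scanned page bytes " * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
digest = sha256_file(tmp.name)
os.remove(tmp.name)
print(digest == hashlib.sha256(data).hexdigest())
```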

Export

7 Export Formats

Format	Contents	Edition
.txt	Plain text with page-break form-feeds	Free
.csv	Per-page summary with SHA-256, confidence, line counts	Free
.json	Per-line bounding boxes, confidence, metadata, tables	Forensic
.hocr	Standard hOCR HTML with x_wconf word confidence	Forensic
.xml	EDRM XML v1.2 for Relativity, Nuix, Axiom, FTK	Forensic
.pdf	Searchable PDF with selectable, Ctrl-F-searchable text	Forensic
.pdf	Signed Forensic Report with SHA-256 chain and ed25519 key	Forensic

Every export records a post-write SHA-256 of the output file and logs an ExportWritten audit event linking source SHA-256 to target SHA-256.

Compare

Free vs Forensic Edition

Feature	Free	Forensic ($67)
Single-file OCR	Yes	Yes
Per-word confidence scoring	Yes	Yes
Preprocessing pipeline	Yes	Yes
PDF text-layer extraction	Yes	Yes
Table extraction	Yes	Yes
Metadata inspector	Yes	Yes
Hash-chained audit log	Yes	Yes
Multi-language OCR	Yes	Yes
One-click presets	Yes	Yes
Region of Interest (ROI) selection	Yes	Yes
OCR box overlay (confidence heatmap)	Yes	Yes
Dark/light theme	Yes	Yes
Plain Text export (.txt)	Yes	Yes
CSV Summary export (.csv)	Yes	Yes
Batch OCR (folders)	Preview only	Full (1-32 workers)
JSON export	-	Yes
hOCR export	-	Yes
EDRM XML v1.2 export	-	Yes
Searchable PDF export	-	Yes
Signed Forensic Report	-	Yes
Multi-worker concurrency	1 worker	2-32 workers
Embedded image OCR in batch	-	Yes
Priority support	-	Yes

Input

Supported File Formats

PDF: Born-digital and scanned. Text-layer extraction or OCR per page. Embedded object recovery.
PNG: Full alpha flattening onto white for clean OCR input on RGBA images.
JPG / JPEG: Standard JPEG with EXIF metadata extraction.
TIF / TIFF: Multi-page TIFF support with per-page OCR.
BMP: Windows bitmap with automatic format detection.

Pricing

One-Time Payment. Yours Forever.

Forensic Edition

$67 USD

Single machine license. No subscription. One-time payment. Yours forever.

All free features included
Batch OCR with 2-32 parallel workers
JSON export with per-line bounding boxes
hOCR export with word-confidence values
EDRM XML v1.2 for Relativity, Nuix, Axiom, FTK
Searchable PDF with selectable text
Signed Forensic Report with ed25519 verification
Embedded image OCR in batch mode
Multi-seat licensing with seat release
Priority email support
Try the free version before you buy

5+ machines? Contact us for volume pricing.

Use Cases

Who Uses Sherlock Forensics OCR Reader

eDiscovery Document Processing

OCR scanned exhibits and produce EDRM XML load files for direct ingestion into Relativity, Nuix and Axiom. Parent-to-child SHA-256 lineage preserves embedded object relationships. Pairs with our eDiscovery services.

Insurance Claim Review

Batch OCR stacks of scanned claim documents with table extraction for receipts, invoices and expense reports. Per-word confidence flags illegible entries before they become disputes.

Court Exhibit Preparation

Produce searchable PDFs from scanned evidence with signed forensic reports documenting tool version, SHA-256 hashes and audit chain integrity. Built by CISSP-certified examiners with courtroom experience.

Compliance Audits

Digitize legacy paper records with full chain of custody. Hash-chained audit log provides mathematical proof that no document was altered during processing.

HR Investigation Records

OCR handwritten notes, printed memos and scanned correspondence. Region-of-interest selection targets specific areas. Low-confidence filtering surfaces words that need manual review.

Legacy Document Digitization

Process archives of old documents with multi-language support. Born-digital PDFs extract text instantly without OCR overhead. Mixed document handling processes each page with the right method.

Questions

OCR Reader FAQ

Does Sherlock Forensics OCR Reader require Tesseract to be installed?

Yes. Tesseract 5 must be installed separately. Sherlock Forensics OCR Reader auto-locates it from standard install paths including UB Mannheim default, Program Files and PATH. A built-in helper links to the recommended installer if Tesseract is not found.

Can I OCR PDFs that already have text?

Yes. Born-digital PDFs extract the text layer directly, which is faster and byte-faithful to what the document actually contains. Image-only pages use OCR. Mixed documents handle each page with the right method automatically. Confidence is reported as 100.0 for text-layer pages to distinguish authored content from recognized content.

What makes this different from Adobe Acrobat OCR?

Per-word confidence scoring with Tesseract 5 LSTM, ed25519 hash-chained audit trail with per-entry cryptographic signatures, EDRM XML v1.2 export for direct ingestion into Relativity, Nuix, Axiom and FTK plus a type-state evidence model that prevents accidental source modification at compile time. Built for forensic examiners, not general office use.

Can I process an entire folder of documents?

Yes, with the Forensic Edition. Batch OCR processes every PDF and image in a folder with 1 to 32 parallel workers. Pause, resume or cancel individual files mid-batch. Per-file progress tracking with elapsed time and rolling ETA. Completed files automatically separate from active work.

What export formats are supported?

Seven formats. Free: Plain Text (.txt) and CSV Summary (.csv). Forensic Edition: JSON with full per-line bounding boxes and confidence, hOCR (standard HTML with word-confidence values), EDRM XML v1.2 for eDiscovery platform ingestion, Searchable PDF with selectable text plus Signed Forensic Report with tool version, SHA-256 hashes, audit chain path and ed25519 signing key.

Is the $67 price a subscription?

No. The $67 USD Forensic Edition license is a one-time payment. No subscriptions, no recurring charges. You own the license permanently with free updates included.

Can I move my license to a different machine?

Yes. Use the built-in "Release seat" button to deactivate on the current machine, then paste the same license key on the new workstation. Reinstalls on the same machine are automatically recognized without consuming an additional seat.

Does it work on air-gapped workstations?

Yes. After initial license activation (which requires one network round trip), Sherlock Forensics OCR Reader operates fully offline. Operator identity is captured from environment variables with zero network calls. The audit log and all processing happen entirely locally.

What happens if the tool crashes mid-OCR?

Per-page subprocess isolation means a malformed image or pathological PDF page takes down only that one OCR job, never the whole tool. Batch mode preserves partial results from cancelled files. The audit log uses fsync on every append and recovers gracefully from partial writes.

How do I verify the download is safe?

Every download displays a SHA-256 hash on the download page. After downloading, compute the SHA-256 of the file and compare it to the published hash. If the values match, the file has not been tampered with. Use our Sherlock Forensics Hash tool or any SHA-256 calculator.

Does Sherlock Forensics OCR Reader work on Linux?

Yes. Sherlock OCR Reader is available as a native Linux x64 binary. Download the .tar.gz archive, extract and run. Requires libgtk-3, libfontconfig1 and libxkbcommon.

Is Sherlock Forensics OCR Reader a Tesseract alternative?

Yes. Sherlock OCR Reader is a Tesseract alternative built for OCR forensics at $67 lifetime. It uses the Tesseract 5 LSTM engine under the hood but wraps it with the OCR audit trail, Bates numbering, per-word confidence exposure, EDRM XML v1.2 export and court-ready PDF report that the free Tesseract CLI does not provide. For casual document scanning, Tesseract alone is fine. For OCR for litigation, OCR e-discovery and any OCR evidence work, the Sherlock Tesseract alternative is the focused forensic option.

Is Sherlock Forensics OCR Reader an ABBYY FineReader alternative?

Yes. Sherlock OCR Reader is a $67 lifetime ABBYY alternative versus ABBYY FineReader at $199 to $399 one-time. The trade-off is honest: ABBYY has very mature OCR engine tuning, broad language support and polished desktop UX. Sherlock is purpose-built for OCR forensics with the Ed25519 hash-chained OCR audit trail, EDRM XML v1.2 export, Bates numbering and per-word confidence exposure that ABBYY does not ship. For legal-tech and forensic OCR workflows, the ABBYY alternative argument is straightforward: focused features at lower cost.

Is Sherlock Forensics OCR Reader an Adobe Acrobat OCR alternative?

Yes. Adobe Acrobat Pro OCR is bundled into the $19.99/month subscription. After 4 months, the subscription cost exceeds Sherlock's $67 lifetime price. Adobe's OCR is a feature in a general PDF product, not a focused forensic OCR engine. The Adobe OCR alternative argument: Sherlock ships forensic-only features (Ed25519 OCR audit trail, EDRM XML, Bates numbering, per-word confidence exposure, court-ready OCR PDF report) that Adobe Acrobat does not, at a one-time price.

Does Sherlock Forensics OCR Reader do Bates numbering?

Yes. Sherlock OCR Reader applies Bates numbering with custom prefixes, configurable start numbers and suffix patterns. The OCR Bates stamp is written to the output PDF and to the EDRM XML page metadata so review platforms (Relativity, Concordance, Logikcull, Reveal, Everlaw, Disco) pick up the Bates identifier automatically on ingest. Bates numbering at the OCR layer is the e-discovery-grade workflow that legal-tech reviewers need for production document sets.

Can I use Sherlock Forensics OCR Reader output as evidence in litigation?

Yes. Sherlock OCR Reader Forensic Edition produces court-ready OCR output with the Ed25519 hash-chained OCR audit trail, per-word OCR confidence, EDRM XML v1.2 export, Bates numbering and a signed forensic report documenting tool version, SHA-256 hashes and ed25519 signing key. The tool is built by CISSP, ISSAP and ISSMP certified examiners with 20-plus years of courtroom testimony. Admissibility depends on jurisdiction and on the examiner following proper OCR chain of custody procedure, but the report format documents what courts typically require for OCR evidence.

How does Sherlock's OCR audit trail work?

Sherlock Forensics OCR Reader records every action to an append-only OCR audit trail with Ed25519 signing. The chain hashes: input scan SHA-256, OCR engine version, per-word confidence scores, examiner identity (captured from environment), batch ID and timestamp. Each new audit log entry includes the hash of the previous entry, creating a tamper-evident chain. Modifying any entry breaks the chain and is cryptographically detectable. The audit log is what differentiates court-ready OCR from casual OCR output.

What is EDRM XML and why does Sherlock export it?

EDRM XML (Electronic Discovery Reference Model XML, currently version 1.2) is the standardized format that e-discovery review platforms ingest. Relativity, Concordance, Logikcull, Reveal, Everlaw and Disco all support EDRM XML import. Sherlock Forensics OCR Reader exports OCR results as EDRM XML with per-page metadata, Bates identifiers, per-word confidence scores and embedded document hashes. This lets OCR e-discovery teams drop Sherlock output directly into review workflows without format conversion or custom scripting.

What is per-word OCR confidence scoring and why is it forensic?

OCR per-word confidence is a 0-to-100 score the OCR engine assigns to each recognized word. Tesseract computes this score and emits it in its hOCR and TSV outputs, but not in its default plain-text output. ABBYY and Adobe keep OCR per-word confidence internal. Sherlock exposes OCR per-word confidence as a first-class queryable field in EDRM XML and CSV exports. The forensic value: examiners can filter low-confidence words for manual review before producing OCR output as evidence, which is the OCR forensics due-diligence pattern that holds up under cross-examination.

Can Sherlock Forensics OCR Reader process scanned documents for e-discovery?

Yes. Scanned-document forensics for OCR e-discovery is a primary design target. Batch OCR processes folders of scanned PDFs and image-only PDFs with 1 to 32 parallel workers, applies Bates numbering, exports EDRM XML v1.2 for review-platform ingest and produces the Ed25519 hash-chained OCR audit trail. The same Sherlock OCR Reader workflow covers civil litigation OCR review, criminal defense OCR review, regulated-industry compliance OCR audits (HIPAA, SEC, FINRA, FOIA) and internal investigation OCR review.

What is the difference between batch OCR and individual file OCR?

Individual file OCR processes one document at a time, suitable for quick lookups or single-document evidence work. Batch OCR (Forensic Edition) processes folders of documents with 1 to 32 parallel workers, per-file progress tracking, pause/resume/cancel per file, partial-results preservation on cancellation and unified Bates numbering across the batch with chained Ed25519 OCR audit trail. For OCR e-discovery and large litigation production sets, batch OCR is the operational workflow; for one-off scanned-document forensics, individual file OCR is sufficient.

Get Started

Download Sherlock Forensics OCR Reader Today

Free for single-file OCR with confidence scoring and audit trails. Forensic Edition at $67 USD for batch processing, EDRM XML export and signed forensic reports. Built by the same team that delivers expert witness testimony and forensic investigations in Canadian courts.

Since 2006 · CISSP, ISSAP, ISSMP certified · 888.883.4550

Used for: eDiscovery document processing, insurance claim review, HR investigation records, court exhibit preparation, compliance audits and legacy document digitization

Try the free version before you buy. No limitations on viewing, searching or analysis.

SHA-256: be27bea489e30b2d9a239a9dbd8964305171175ab747e9565331a4d0d27e5790

How to verify:
1. Open PowerShell (right-click Start menu, click Terminal)
2. Run: Get-FileHash .\sherlock-ocr.exe
3. Compare the output with the hash above. If they match, the file has not been tampered with.

Sherlock Forensics OCR Reader is provided for lawful use. Terms of Service
