Does Sherlock Forensics OCR Reader require Tesseract to be installed?
Yes. Tesseract 5 must be installed separately. Sherlock Forensics OCR Reader auto-locates it in standard install locations, including the UB Mannheim installer default, Program Files and the system PATH. If Tesseract is not found, a built-in helper links to the recommended installer.
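The lookup order described above could be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual code; the candidate paths and the function name `find_tesseract` are assumptions made for the example.

```python
import shutil
from pathlib import Path

# Assumed candidate locations (hypothetical list for illustration only).
CANDIDATE_PATHS = [
    Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe"),
    Path(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"),
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return the path of a Tesseract executable as a string, or None."""
    # Check the known install locations first.
    for path in candidates:
        if path.is_file():
            return str(path)
    # Fall back to whatever is reachable through PATH.
    return shutil.which("tesseract")
```

A caller would treat a `None` result as the cue to point the user at the installer.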
Can I OCR PDFs that already have text?
Yes. For born-digital PDFs, the text layer is extracted directly, which is faster and byte-faithful to what the document actually contains. Image-only pages go through OCR. In mixed documents, each page is handled with the appropriate method automatically. Text-layer pages report a confidence of 100.0 to distinguish authored content from recognized content.
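The per-page decision can be pictured with a short Python sketch. It assumes the caller has already pulled any embedded text for the page and supplies an OCR callable; `PageResult`, `process_page` and the `ocr` parameter are hypothetical names, not part of the product.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    page: int
    method: str        # "text-layer" or "ocr"
    text: str
    confidence: float  # 0 to 100


def process_page(page_number, text_layer, ocr):
    """Prefer the embedded text layer; fall back to OCR for image-only pages."""
    if text_layer and text_layer.strip():
        # Authored content: reported at a fixed confidence of 100.0.
        return PageResult(page_number, "text-layer", text_layer, 100.0)
    # Image-only page: ocr() is assumed to return (text, confidence).
    text, confidence = ocr(page_number)
    return PageResult(page_number, "ocr", text, confidence)
```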
What makes this different from Adobe Acrobat OCR?
Four things: per-word confidence scoring with Tesseract 5 LSTM, an Ed25519 hash-chained audit trail with per-entry cryptographic signatures, EDRM XML v1.2 export for direct ingestion into Relativity, Nuix, Axiom and FTK, and a type-state evidence model that prevents accidental source modification at compile time. It is built for forensic examiners, not general office use.
Can I process an entire folder of documents?
Yes, with the Forensic Edition. Batch OCR processes every PDF and image in a folder with 1 to 32 parallel workers. Pause, resume or cancel individual files mid-batch. Per-file progress tracking with elapsed time and rolling ETA. Completed files automatically separate from active work.
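As a rough illustration of the worker model, the following Python sketch runs a per-file job across a bounded pool and keeps a failure in one file from affecting the others. It is an assumption-laden sketch: `run_batch`, `clamp_workers` and `ocr_one` are hypothetical names, and pause/resume and ETA tracking are left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def clamp_workers(requested):
    """Keep the worker count inside the 1 to 32 range."""
    return max(1, min(32, requested))


def run_batch(paths, ocr_one, workers=4):
    """Run ocr_one(path) for every path; return {path: (status, payload)}."""
    results = {}
    with ThreadPoolExecutor(max_workers=clamp_workers(workers)) as pool:
        futures = {pool.submit(ocr_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = ("done", future.result())
            except Exception as exc:
                # One bad file is recorded as failed; the rest continue.
                results[path] = ("failed", str(exc))
    return results
```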
What export formats are supported?
Seven formats. Free: Plain Text (.txt) and CSV Summary (.csv). Forensic Edition: JSON with full per-line bounding boxes and confidence, hOCR (standard HTML with word-confidence values), EDRM XML v1.2 for eDiscovery platform ingestion, Searchable PDF with selectable text, and a Signed Forensic Report with tool version, SHA-256 hashes, audit chain path and Ed25519 signing key.
Is the $67 price a subscription?
No. The $67 USD Forensic Edition license is a one-time payment. No subscriptions, no recurring charges. You own the license permanently with free updates included.
Can I move my license to a different machine?
Yes. Use the built-in "Release seat" button to deactivate on the current machine, then paste the same license key on the new workstation. Reinstalls on the same machine are automatically recognized without consuming an additional seat.
Does it work on air-gapped workstations?
Yes. After initial license activation (which requires one network round trip), Sherlock Forensics OCR Reader operates fully offline. Operator identity is captured from environment variables with zero network calls. The audit log and all processing happen entirely locally.
What happens if the tool crashes mid-OCR?
Per-page subprocess isolation means a malformed image or pathological PDF page takes down only that one OCR job, never the whole tool. Batch mode preserves partial results from cancelled files. The audit log uses fsync on every append and recovers gracefully from partial writes.
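The fsync-on-append and partial-write recovery behavior can be sketched in Python as below. The one-JSON-object-per-line layout and the function names are assumptions for illustration; the tool's real log format is not described here.

```python
import json
import os


def append_entry(log_path, entry):
    """Append one entry as a JSON line and force it to disk before returning."""
    line = json.dumps(entry, sort_keys=True) + "\n"
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit it to storage


def read_entries(log_path):
    """Return every complete entry, stopping at a truncated trailing line."""
    entries = []
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # a crash interrupted this write
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                break
    return entries
```

Because each entry ends with a newline only after it is fully written, a reader can treat a line without one as an interrupted append and ignore it.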
How do I verify the download is safe?
Every download displays a SHA-256 hash on the download page. After downloading, compute the SHA-256 of the file and compare it to the published hash. If the values match, the file has not been tampered with. Use our Sherlock Forensics Hash tool or any SHA-256 calculator.
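For examiners who prefer to script the check, a minimal Python sketch using only the standard library follows. The function names are illustrative, and the published hash is whatever value the download page shows.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the computed hash with the published one, ignoring case and spaces."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```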
Does Sherlock Forensics OCR Reader work on Linux?
Yes. Sherlock OCR Reader is available as a native Linux x64 binary. Download the .tar.gz archive, extract and run. Requires libgtk-3, libfontconfig1 and libxkbcommon.
Is Sherlock Forensics OCR Reader a Tesseract alternative?
Yes. Sherlock OCR Reader is a Tesseract alternative built for OCR forensics at $67 lifetime. It uses the Tesseract 5 LSTM engine under the hood but wraps it with the OCR audit trail, Bates numbering, per-word confidence exposure, EDRM XML v1.2 export and court-ready PDF report that the free Tesseract CLI does not provide. For casual document scanning, Tesseract alone is fine. For litigation, e-discovery and any other OCR evidence work, Sherlock is the focused forensic option.
Is Sherlock Forensics OCR Reader an ABBYY FineReader alternative?
Yes. Sherlock OCR Reader is a $67 lifetime ABBYY alternative versus ABBYY FineReader at $199 to $399 one-time. The trade-off is honest: ABBYY has very mature OCR engine tuning, broad language support and polished desktop UX. Sherlock is purpose-built for OCR forensics with the Ed25519 hash-chained OCR audit trail, EDRM XML v1.2 export, Bates numbering and per-word confidence exposure that ABBYY does not ship. For legal-tech and forensic OCR workflows, the ABBYY alternative argument is straightforward: focused features at lower cost.
Is Sherlock Forensics OCR Reader an Adobe Acrobat OCR alternative?
Yes. Adobe Acrobat Pro OCR is bundled into the $19.99/month subscription. By the fourth month, the cumulative subscription cost ($79.96) exceeds Sherlock's $67 lifetime price. Adobe's OCR is a feature in a general PDF product, not a focused forensic OCR engine. The Adobe OCR alternative argument: Sherlock ships forensic-only features (Ed25519 OCR audit trail, EDRM XML, Bates numbering, per-word confidence exposure, court-ready OCR PDF report) that Adobe Acrobat does not, at a one-time price.
Does Sherlock Forensics OCR Reader do Bates numbering?
Yes. Sherlock OCR Reader applies Bates numbering with custom prefixes, configurable start numbers and suffix patterns. The OCR Bates stamp is written to the output PDF and to the EDRM XML page metadata so review platforms (Relativity, Concordance, Logikcull, Reveal, Everlaw, Disco) pick up the Bates identifier automatically on ingest. Bates numbering at the OCR layer is the e-discovery-grade workflow that legal-tech reviewers need for production document sets.
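A Bates sequence built from a prefix, a start number and a suffix can be sketched in a few lines of Python. The six-digit zero padding and the function name `bates_numbers` are assumptions for the example, not the tool's documented defaults.

```python
def bates_numbers(prefix, start, count, width=6, suffix=""):
    """Return `count` consecutive Bates identifiers beginning at `start`."""
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(start, start + count)
    ]


# bates_numbers("SF-", 1, 3) returns ['SF-000001', 'SF-000002', 'SF-000003']
```

Carrying the last number forward as the next document's start keeps the sequence unbroken across a production set.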
Can I use Sherlock Forensics OCR Reader output as evidence in litigation?
Yes. Sherlock OCR Reader Forensic Edition produces court-ready OCR output with the Ed25519 hash-chained OCR audit trail, per-word OCR confidence, EDRM XML v1.2 export, Bates numbering and a signed forensic report documenting tool version, SHA-256 hashes and Ed25519 signing key. The tool is built by CISSP, ISSAP and ISSMP certified examiners with 20-plus years of courtroom testimony. Admissibility depends on jurisdiction and on the examiner following proper OCR chain of custody procedure, but the report format documents what courts typically require for OCR evidence.
How does Sherlock's OCR audit trail work?
Sherlock Forensics OCR Reader records every action to an append-only OCR audit trail with Ed25519 signing. The chain covers the input scan's SHA-256 hash, the OCR engine version, per-word confidence scores, examiner identity (captured from the environment), batch ID and timestamp. Each new audit log entry includes the hash of the previous entry, creating a tamper-evident chain. Modifying any entry breaks the chain and is cryptographically detectable. The audit log is what differentiates court-ready OCR from casual OCR output.
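The chaining idea can be shown with a small Python sketch. It covers only the hash chain: Ed25519 signing is left out because the Python standard library has no Ed25519 implementation, and the field names, the all-zero genesis value and the JSON canonicalization are assumptions rather than the tool's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(body):
    """Hash a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain, fields):
    """Add an entry that commits to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = dict(fields, prev_hash=prev)
    chain.append(dict(body, hash=entry_hash(body)))


def verify(chain):
    """Return True only if every entry matches its hash and its predecessor."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(body):
            return False
        prev = entry["hash"]
    return True
```

Editing any field of an earlier entry changes its hash, so that entry and every later `prev_hash` link stop verifying.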
What is EDRM XML and why does Sherlock export it?
EDRM XML (Electronic Discovery Reference Model XML, currently version 1.2) is the standardized format that e-discovery review platforms ingest. Relativity, Concordance, Logikcull, Reveal, Everlaw and Disco all support EDRM XML import. Sherlock Forensics OCR Reader exports OCR results as EDRM XML with per-page metadata, Bates identifiers, per-word confidence scores and embedded document hashes. This lets OCR e-discovery teams drop Sherlock output directly into review workflows without format conversion or custom scripting.
What is per-word OCR confidence scoring and why is it forensic?
OCR per-word confidence is a 0-to-100 score the OCR engine assigns to each recognized word. Tesseract computes this internally but omits it from its default plain-text output; it appears only in formats such as TSV and hOCR. ABBYY and Adobe keep OCR per-word confidence internal. Sherlock exposes OCR per-word confidence as a first-class queryable field in EDRM XML and CSV exports. The forensic value: examiners can filter low-confidence words for manual review before producing OCR output as evidence, which is the OCR forensics due-diligence pattern that holds up under cross-examination.
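The review filter described above amounts to a threshold test over word records. The Python sketch below assumes each record is a dictionary with `text`, `page` and `conf` keys and uses a cutoff of 80; both the record shape and the cutoff are hypothetical, not the tool's export schema or a recommended value.

```python
def low_confidence_words(words, threshold=80.0):
    """Return the word records whose confidence falls below the threshold."""
    return [w for w in words if w["conf"] < threshold]


def review_queue(words, threshold=80.0):
    """Order flagged words by page, then least confident first."""
    flagged = low_confidence_words(words, threshold)
    return sorted(flagged, key=lambda w: (w["page"], w["conf"]))
```

An examiner would work through the queue page by page and correct or annotate each flagged word before export.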
Can Sherlock Forensics OCR Reader process scanned documents for e-discovery?
Yes. Scanned-document forensics for OCR e-discovery is a primary design target. Batch OCR processes folders of scanned PDFs and image-only PDFs with 1 to 32 parallel workers, applies Bates numbering, exports EDRM XML v1.2 for review-platform ingest and produces the Ed25519 hash-chained OCR audit trail. The same Sherlock OCR Reader workflow covers civil litigation OCR review, criminal defense OCR review, regulated-industry compliance OCR audits (HIPAA, SEC, FINRA, FOIA) and internal investigation OCR review.
What is the difference between batch OCR and individual file OCR?
Individual file OCR processes one document at a time, suitable for quick lookups or single-document evidence work. Batch OCR (Forensic Edition) processes folders of documents with 1 to 32 parallel workers, per-file progress tracking, pause/resume/cancel per file, partial-results preservation on cancellation and unified Bates numbering across the batch with chained Ed25519 OCR audit trail. For OCR e-discovery and large litigation production sets, batch OCR is the operational workflow; for one-off scanned-document forensics, individual file OCR is sufficient.