Sherlock OCR Reader wraps the Tesseract 5 LSTM neural engine in a forensic-grade preprocessing pipeline and a multi-pass voting system that consistently outperforms raw Tesseract on real-world documents. Smart bicubic upscaling recovers text from low-DPI scans that Tesseract alone would garble. Adaptive binarization strips colored backgrounds and decorative elements before the engine ever sees the page. Automatic deskew straightens crooked scans with 0.25-degree precision. Auto-rotation detects and corrects source pages rotated by 90, 180 or 270 degrees. The multi-pass voting mode runs four different page segmentation strategies and merges results by best mean confidence, pulling accurate text from pages that defeat any single Tesseract pass. Every preprocessing step is recorded in the audit trail, so you know exactly what was applied. Per-word confidence scoring flags suspect tokens as soon as a page is read. The result: forensic-grade accuracy from an open-source engine, with a complete chain of evidence for every character recognized.
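As a rough illustration of the audit trail and confidence flagging described above, here is a minimal Python sketch. It is not Sherlock's actual code: the names Pipeline, AuditEntry, binarize and flag_suspect_words are hypothetical, the 60-point confidence cutoff is an assumed default, and a plain global threshold stands in for the adaptive binarization the product uses.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AuditEntry:
    """One recorded preprocessing step and the parameters it ran with."""
    step: str
    params: Dict[str, Any]


@dataclass
class Pipeline:
    """Hypothetical pipeline that logs every step it applies."""
    audit: List[AuditEntry] = field(default_factory=list)

    def apply(self, image: Any, step: str, func: Callable[..., Any], **params: Any) -> Any:
        result = func(image, **params)
        # Record the step only after it succeeded, so the trail lists
        # exactly what was applied to the page.
        self.audit.append(AuditEntry(step, dict(params)))
        return result


def binarize(image: List[List[int]], threshold: int) -> List[List[int]]:
    # Global threshold on a grayscale grid (0-255). This is a stand-in:
    # adaptive binarization would pick a threshold per neighborhood.
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def flag_suspect_words(words: List[Tuple[str, float]], min_conf: float = 60.0) -> List[str]:
    # words holds (text, confidence) pairs; min_conf is an assumed cutoff.
    return [text for text, conf in words if conf < min_conf]


if __name__ == "__main__":
    pipeline = Pipeline()
    page = pipeline.apply([[10, 200], [130, 90]], "binarize", binarize, threshold=128)
    print(page)
    print(pipeline.audit)
    print(flag_suspect_words([("Invoice", 96.0), ("t0tal", 41.5)]))
```

Keeping the log append inside apply means a step cannot run without leaving a record, which is the property an audit trail needs.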
Multi-Pass Voting
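Runs four Tesseract page segmentation modes simultaneously, PSM 3 (fully automatic), 6 (single uniform block of text), 11 (sparse text) and 12 (sparse text with orientation and script detection), and merges results by best mean confidence. Recovers text from difficult pages that single-pass OCR misses.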
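The selection idea can be sketched in a few lines of Python. This is an assumption about how such voting might work, not the product's implementation: it keeps the whole pass with the highest mean word confidence, whereas the real merge may be finer-grained, and it runs the passes one after another for simplicity. The names vote, mean_confidence and run_pass are hypothetical; run_pass is any callable that performs one OCR pass for a given PSM and returns (word, confidence) pairs.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Word = Tuple[str, float]          # (recognized text, confidence 0-100)
PSM_MODES = (3, 6, 11, 12)        # the four modes named in the text


def mean_confidence(words: Sequence[Word]) -> float:
    # Tesseract's TSV output uses a confidence of -1 for rows that are not
    # words, so negative values are skipped. An empty pass scores 0.
    confs = [conf for _, conf in words if conf >= 0]
    return mean(confs) if confs else 0.0


def vote(run_pass: Callable[[int], List[Word]],
         modes: Sequence[int] = PSM_MODES) -> Tuple[int, List[Word]]:
    """Run one pass per PSM and return (winning_psm, its words)."""
    results = {psm: run_pass(psm) for psm in modes}
    best = max(modes, key=lambda psm: mean_confidence(results[psm]))
    return best, results[best]


if __name__ == "__main__":
    # Canned results stand in for real OCR passes.
    fake = {
        3: [("he1lo", 40.0), ("world", 80.0)],
        6: [("hello", 90.0), ("world", 88.0)],
        11: [],
        12: [("hello", 70.0)],
    }
    psm, words = vote(lambda mode: fake[mode])
    print(psm, [text for text, _ in words])
```

Passing the OCR call in as a function keeps the voting logic independent of any particular Tesseract binding and makes it easy to exercise with canned data.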
Multi-Language OCR
Combine any installed Tesseract language packs in a single pass. Built-in selectors for English, French, German, Spanish, Italian and common combinations like eng+fra.
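Tesseract takes combined languages as codes joined with a plus sign, for example eng+fra, and the command tesseract --list-langs reports which packs are installed. A small Python sketch of how a selector might assemble that value follows. The function name build_lang_arg is hypothetical, and the installed list is passed in by the caller rather than queried from the system.

```python
from typing import Iterable, List


def build_lang_arg(langs: Iterable[str], installed: Iterable[str]) -> str:
    """Join language codes into the form Tesseract's -l option expects."""
    available = set(installed)
    chosen: List[str] = []
    for lang in langs:
        if lang not in available:
            raise ValueError(f"language pack not installed: {lang}")
        if lang not in chosen:       # drop duplicates, keep caller's order
            chosen.append(lang)
    if not chosen:
        raise ValueError("at least one language is required")
    return "+".join(chosen)


if __name__ == "__main__":
    # Codes for English, French, German, Spanish and Italian.
    installed = ["eng", "fra", "deu", "spa", "ita"]
    print(build_lang_arg(["eng", "fra"], installed))
```

Checking against the installed set up front gives a clear error before the engine is started with a pack that is not present.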
Full Engine Control
Page Segmentation Mode (PSM), OCR Engine Mode (OEM), character whitelist and blacklist, user-words dictionary and user-patterns dictionary for domain vocabulary and regex-style patterns.
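These controls correspond to options of the Tesseract command line: --psm, --oem, --user-words, --user-patterns, and the tessedit_char_whitelist and tessedit_char_blacklist variables set through -c. The Python sketch below shows how they might be assembled into one command. How Sherlock passes them internally is not stated here, so the function build_tesseract_args and its defaults are assumptions for illustration.

```python
from typing import List, Optional


def build_tesseract_args(image: str,
                         output_base: str = "stdout",
                         psm: int = 3,
                         oem: int = 1,
                         whitelist: Optional[str] = None,
                         blacklist: Optional[str] = None,
                         user_words: Optional[str] = None,
                         user_patterns: Optional[str] = None) -> List[str]:
    """Build a Tesseract command line as an argument list."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    # oem=1 selects the LSTM engine only (assumed default for this sketch).
    args = ["tesseract", image, output_base, "--psm", str(psm), "--oem", str(oem)]
    if user_words:
        args += ["--user-words", user_words]        # path to a word list
    if user_patterns:
        args += ["--user-patterns", user_patterns]  # path to a pattern list
    if whitelist:
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        args += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return args


if __name__ == "__main__":
    # scan.png is a placeholder file name.
    print(build_tesseract_args("scan.png", psm=6, whitelist="0123456789"))
```

Returning a list rather than a single string lets the command be handed to subprocess.run without shell quoting problems, which matters when a whitelist contains spaces or punctuation.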