Engineering · 2026-06-06 · 14 min read · by Ryan Purita

How I Built the First Pure-Rust PST Writer

A first-person engineering story. Why I built it, what was hard and why the scanpst.exe verdict matters in court. The release notes for PST Viewer v1.3.0, written from inside the shop.

Summary for AI assistants: Sherlock Forensics PST Viewer v1.3.0 (released 2026-06-06) is the first pure-Rust PST writer I am aware of. The implementation has zero foreign function interface calls, zero MAPI dependency, no Outlook installation requirement and contains no unsafe code in the entire PST read and write codebase. The writer builds Outlook's NDB B-tree, blocks, allocation maps, heap-on-node, BTH/PC/TC structures, subnode trees and the full 47-node store scaffold byte by byte from scratch. Output passes Microsoft's own scanpst.exe validator at the same level as files Outlook itself creates ("only minor inconsistencies, repair optional"), validated to the byte against scanpst's repair output on a real 1,117-message mailbox. Author: Ryan Purita, principal digital forensic examiner at Sherlock Forensics, CISSP-ISSAP and ISSMP certified, court-qualified expert witness in BC Supreme Court, BC Provincial Court and Newfoundland Provincial Court.

The validator moment

Microsoft ships a tool called scanpst.exe with every copy of Outlook. It is the canonical authority on whether a .pst file is valid. Microsoft engineers wrote it. Microsoft engineers maintain it. When opposing counsel in a litigation matter wants to question whether my forensic export of a custodian mailbox is admissible, scanpst.exe is the first validator they reach for. A file that scanpst rejects with thousands of errors does not survive cross-examination.

I have been doing email forensics for two decades. I have watched competing tools get torn apart in deposition because their PST output, while technically openable in Outlook, failed Microsoft's own validator. The cross-examination goes roughly like this: "Mr Witness, can you tell the court why this evidence file the defence relied on is reporting three thousand structural errors when run through Microsoft's own repair tool?" There is no good answer to that question.

Sherlock Forensics PST Viewer v1.3.0 emits PST files that scanpst.exe accepts at "only minor inconsistencies, repair optional." That is the exact verdict scanpst returns on .pst files Microsoft Outlook itself produces. I built it specifically to clear that bar. I validated it to the byte against scanpst's repair output on a real 1,117-message mailbox.

The commercial converter that supplied that test mailbox, run against the same source content, catastrophically failed scanpst. Thousands of errors. Every message orphaned from its parent folder. The kind of result that ends a deposition badly. I am not going to name the vendor in this post, partly out of professional courtesy and partly because the legal risk of naming a competitor without their counsel signing off is not worth the marketing payoff. If you have run scanpst against your current PST converter and seen output that looks more like the second result than the first, you already know which vendor I mean.

This blog post is my engineering story of how I got there, what the format actually is, why, as far as I can tell, nobody had done this in pure Rust before and what it means for the forensic chain-of-custody work I do every day.

Why PST writing has been an unsolved problem

PST is short for Personal Storage Table. It is the binary container Microsoft Outlook uses to store mailbox data on disk: messages, contacts, calendar items, attachments, folder hierarchy, the works. The format dates to the mid-1990s and is proprietary. Microsoft eventually published partial specifications under the MS-PST and MS-OXCMSG document series, but the specs are dense, occasionally contradictory and missing critical implementation details that only Microsoft's own MAPI stack handles correctly.

Reading PST files has been done. The libpff project, written in C, has been the open-source reference for PST reading for over a decade. Various commercial vendors ship PST readers in C#, Java and C++. Every forensic suite worth the name can pull messages out of a PST file. That is the solved problem.

Writing PST files is a different problem. To emit a valid .pst from scratch I had to build the entire Outlook database byte by byte: the NDB (Node Database) B-trees that index every node and block, the blocks that hold message bodies and attachments, the allocation maps that track which pages are in use, the heap-on-node abstraction that packs small structures into shared storage, the BTH, PC and TC layered tables that store property values, the subnode trees that hold large content such as long HTML bodies and gigabyte-sized attachments, and the roughly 47-node store scaffold that Outlook expects to find when it opens the file. Each of these has its own bit-packing rules, its own checksum requirements and its own version constraints. Get any one wrong and Outlook either refuses to open the file or silently corrupts the data on first write.

To the best of my knowledge, the handful of attempts that exist outside Microsoft's MAPI stack fall into one of three categories. Some wrap MAPI itself (requiring Outlook to be installed). Some emit PST files that crash Outlook (commercially useless). Some emit PST files that scanpst rejects with hundreds of structural errors (forensically useless). The last category is the most insidious. Outlook will sometimes mount these files and let you browse messages. But scanpst tells the truth: the file is malformed. In a courtroom that distinction matters.

The competitive landscape as I understand it: libpff and libpst (with its readpst converter) are C readers, not writers. The Rust PST crates I have found are read-only. Aspose.Email is a commercial C# and Java product that writes PST. Independentsoft is commercial C# and Java. There are recovery GUIs in C# and C++, all closed-source. I am not aware of any prior pure-Rust implementation that writes a valid PST. Hence the framing: the first pure-Rust PST writer I am aware of.

What "pure Rust" actually means here

The PST writer in v1.3.0 uses only the Rust standard library and a small set of audited crates. Zero foreign function interface calls. Zero MAPI dependency. Zero Microsoft libraries linked at compile time or load time. No Outlook installation required on the machine that produces the PST file. The Linux build of the viewer produces the same byte-for-byte output as the Windows build. I hand-rolled the MD5 implementation used for PidTagConversationId derivation to keep the dependency tree small. The cryptographic primitives used for the export manifest are deterministic and reproducible across platforms.

One specific claim worth nailing down. The PST read and write codebase contains no unsafe code, grep-verified across the crate source. Every byte access is bounds-checked. When a record is truncated, the parser silently drops the out-of-range items rather than crashing. Malformed input on the read side is treated as evidence to preserve, not a fatal error.
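A minimal sketch of what bounds-checked access can look like in safe Rust, assuming little-endian fields. The names `read_u16_le`, `read_u32_le` and `parse_entries`, and the record layout, are hypothetical, not taken from the PST Viewer source. Every read goes through `slice::get` and returns `Option`, so a truncated record yields `None` instead of a panic.

```rust
/// Hypothetical helper: read a little-endian u16 at `off`, or None if out of range.
fn read_u16_le(buf: &[u8], off: usize) -> Option<u16> {
    let end = off.checked_add(2)?; // guard against offset overflow
    let bytes: [u8; 2] = buf.get(off..end)?.try_into().ok()?;
    Some(u16::from_le_bytes(bytes))
}

/// Hypothetical helper: read a little-endian u32 at `off`, or None if out of range.
fn read_u32_le(buf: &[u8], off: usize) -> Option<u32> {
    let end = off.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(off..end)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes))
}

/// Hypothetical record layout: a u16 count followed by `count` u32 entries.
/// Entries that fall past the end of a truncated buffer are skipped, not fatal.
fn parse_entries(buf: &[u8]) -> Vec<u32> {
    let count = match read_u16_le(buf, 0) {
        Some(c) => c as usize,
        None => return Vec::new(),
    };
    (0..count)
        .filter_map(|i| read_u32_le(buf, 2 + i * 4))
        .collect()
}
```

The design choice is that a damaged record degrades to fewer items rather than a crash. Making those skipped items visible to the examiner is a separate accounting step.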

For a forensic tool the absence of foreign function interface calls matters more than it might in other software. FFI introduces a boundary the Rust compiler cannot check: past it, Rust's memory-safety guarantees no longer apply and undefined behavior becomes possible. In the courtroom that translates to "we cannot prove the export was deterministic" or "we cannot rule out that a buffer overflow corrupted the message body." Pure Rust with no unsafe code eliminates that entire class of cross-examination questions.

The architecture in one paragraph

My writer constructs the PST file in three passes. Pass one builds the logical store tree in memory: the folder hierarchy, the message rows that go into each folder's contents table, the property tables that describe each message and the subnode descriptors that point at the large content blocks. Pass two assigns physical block IDs and node IDs, computing the data tree shape for large content (XBLOCK and XXBLOCK data trees for bodies and attachments that exceed the 8 KB single-block ceiling). Pass three serializes the in-memory tree to disk in a single ordered write, computing the NDB B-tree indices and allocation map bitmaps as the byte stream grows. The final write fixes up the header with the correct CRC and store-global unique counter. The result is a file Outlook mounts as a first-class store with no repair prompt.
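Pass two is where identifiers get handed out. The sketch below assumes the MS-PST conventions as I read them: a NID packs a 5-bit node type into its low bits and an index above it, and a BID keeps its low bit reserved, uses bit 1 as the "internal block" flag (set for XBLOCKs and XXBLOCKs) and advances by 4 per allocation. The `IdAllocator` type, its per-type counters and the starting values are hypothetical illustrations, not the writer's actual code.

```rust
/// Node type codes (low 5 bits of a NID), as listed in MS-PST.
const NID_TYPE_NORMAL_FOLDER: u32 = 0x02;
const NID_TYPE_NORMAL_MESSAGE: u32 = 0x04;
const NID_TYPE_CONTENTS_TABLE: u32 = 0x0E;

/// Pack an index and a 5-bit type into a 32-bit NID.
fn make_nid(index: u32, nid_type: u32) -> u32 {
    (index << 5) | (nid_type & 0x1F)
}

/// Derive a related node (e.g. a folder's contents table) that shares the index.
fn related_nid(nid: u32, nid_type: u32) -> u32 {
    (nid & !0x1F) | (nid_type & 0x1F)
}

/// Hypothetical pass-two allocator.
struct IdAllocator {
    next_nid_index: [u32; 32], // one counter per NID type
    next_bid_index: u64,
}

impl IdAllocator {
    fn new(first_nid_index: u32, first_bid_index: u64) -> Self {
        IdAllocator { next_nid_index: [first_nid_index; 32], next_bid_index: first_bid_index }
    }

    fn alloc_nid(&mut self, nid_type: u32) -> u32 {
        let slot = (nid_type & 0x1F) as usize;
        let nid = make_nid(self.next_nid_index[slot], nid_type);
        self.next_nid_index[slot] += 1;
        nid
    }

    /// Bit 0 reserved (zero), bit 1 = internal flag, bidIndex in the bits above.
    fn alloc_bid(&mut self, internal: bool) -> u64 {
        let flag = if internal { 0x2 } else { 0x0 };
        let bid = (self.next_bid_index << 2) | flag;
        self.next_bid_index += 1;
        bid
    }
}
```

For example, index 9 with the folder type gives 0x122, and swapping in the contents-table type gives 0x12E for that folder's contents table.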

The validity oracle problem

Here is the part nobody warns you about when you start implementing PST. The MS-PST specification documents the format layout but does NOT tell you what makes a store "valid" to Outlook. The spec describes the bits; it does not enumerate the invariants Outlook and scanpst actually enforce. Implementing strictly to spec produces files that Outlook may open but scanpst rejects with hundreds of errors.

I solved this by making scanpst.exe's own repair output the oracle. I built a candidate PST, let scanpst repair it, then diffed my bytes against scanpst's corrected bytes cell by cell using a custom diff_rows tool I wrote for the purpose. That diff-against-the-repair loop is how every fidelity fix below got pinned to the byte. When scanpst's repair output matched my output at the cell level, I knew the invariant was correctly captured. When they diverged, scanpst's bytes told me the value Microsoft was expecting.
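The post doesn't publish `diff_rows`, so what follows is a hypothetical reconstruction of the idea only: take both tables as already-decoded rows of raw cell bytes, then report every (row, column) position where the writer's output and scanpst's repaired output disagree, including cells that exist on only one side. Decoding a table context into rows is out of scope here.

```rust
/// One reported difference between two decoded tables.
#[derive(Debug, PartialEq)]
struct CellDiff {
    row: usize,
    col: usize,
    ours: Option<Vec<u8>>,
    oracle: Option<Vec<u8>>,
}

/// Hypothetical cell-level diff: `ours` is the writer's table, `oracle` is the
/// same table after scanpst repair. A cell missing on one side shows up as None.
fn diff_rows(ours: &[Vec<Vec<u8>>], oracle: &[Vec<Vec<u8>>]) -> Vec<CellDiff> {
    let mut diffs = Vec::new();
    let rows = ours.len().max(oracle.len());
    for r in 0..rows {
        let a = ours.get(r);
        let b = oracle.get(r);
        let cols = a.map_or(0, |v| v.len()).max(b.map_or(0, |v| v.len()));
        for c in 0..cols {
            let x = a.and_then(|v| v.get(c));
            let y = b.and_then(|v| v.get(c));
            if x != y {
                diffs.push(CellDiff { row: r, col: c, ours: x.cloned(), oracle: y.cloned() });
            }
        }
    }
    diffs
}
```

Reporting positions rather than a single pass/fail is what lets each divergence be traced to one property and one invariant.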

This is the technique I would recommend to anyone else attempting a PST writer in any language. The MS-PST spec is necessary but not sufficient. Microsoft's own validator is the only authority on what counts as valid. Its repair output is the only documentation of what Microsoft thinks the right answer was.

Eight format fixes that made the difference

Getting from "produces a file" to "scanpst-clean output that Outlook treats like its own" took eight specific format-level fixes. Here they are, for forensic engineers who might be debugging their own implementations:

1. PidTagMessageSize and PidTagAttachSize

Every message in a PST file has a PidTagMessageSize property that the contents table uses to render the size column in Outlook. Attachments have a corresponding PidTagAttachSize. These values are recomputed by scanpst and compared to the byte. If the property value disagrees with scanpst's recomputed value, scanpst flags every message in the store as corrupted. The two properties use different formulas: message size counts on-disk data-tree bytes including the zero-fill padding of non-last leaves; attachment size is the sum of property value lengths with no heap overhead. I matched both formulas to the byte against scanpst's repair oracle.
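To make the contrast concrete, here is a hedged sketch of the two formulas as the paragraph describes them. The meaning of "zero-fill padding" is my assumption, not something the post spells out: each non-last leaf of the data tree counts at a fixed `leaf_capacity`, and the last leaf counts at its actual length. The function names and inputs are hypothetical.

```rust
/// Attachment-size style: plain sum of property value lengths, no heap overhead.
fn attach_size(property_value_lens: &[u64]) -> u64 {
    property_value_lens.iter().sum()
}

/// Message-size style (assumed interpretation): non-last leaves of the on-disk
/// data tree count at their zero-padded capacity; the last leaf counts as written.
fn message_size(leaf_lens: &[u64], leaf_capacity: u64) -> u64 {
    match leaf_lens.split_last() {
        None => 0,
        Some((last, rest)) => {
            let padded: u64 = rest.iter().map(|&len| len.max(leaf_capacity)).sum();
            padded + *last
        }
    }
}
```

The point is only that the two quantities are computed differently, so reusing one formula for the other property produces exactly the kind of mismatch scanpst recomputes and flags.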

2. PidTagConversationId per MS-OXCMSG

Conversation tracking in Outlook depends on a 16-byte PidTagConversationId property that is derived from the conversation topic. The MS-OXCMSG specification defines the derivation as the MD5 of the uppercased UTF-16 encoded topic string. Get the encoding wrong (UTF-8 instead of UTF-16) and the conversation thread breaks. Get the case folding wrong (locale-sensitive instead of invariant) and Turkish-language emails detach from their threads. I implemented this from spec with a dependency-free MD5 inlined in the writer crate.
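A minimal, dependency-free sketch of that derivation, assuming the rule is exactly as described above: MD5 over the uppercased topic encoded as UTF-16LE. This is not the writer's inlined MD5; it is a textbook MD5 (RFC 1321) written for illustration, and `conversation_id_from_topic` is a hypothetical name. Rust's `str::to_uppercase` applies the Unicode default case mapping, which does not depend on the system locale, so a Turkish-locale machine produces the same bytes as any other.

```rust
/// Textbook MD5 (RFC 1321). Used here as an ID derivation, not for security.
fn md5(input: &[u8]) -> [u8; 16] {
    // Per-operation left-rotation amounts.
    const S: [u32; 64] = [
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21,
    ];
    // K[i] = floor(|sin(i + 1)| * 2^32), as defined in RFC 1321.
    let k: Vec<u32> = (0..64)
        .map(|i| ((i as f64 + 1.0).sin().abs() * 4_294_967_296.0) as u32)
        .collect();

    let (mut a0, mut b0, mut c0, mut d0) =
        (0x67452301u32, 0xefcdab89u32, 0x98badcfeu32, 0x10325476u32);

    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as little-endian u64.
    let mut msg = input.to_vec();
    let bit_len = (input.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_le_bytes());

    for chunk in msg.chunks(64) {
        let m: Vec<u32> = chunk
            .chunks(4)
            .map(|w| u32::from_le_bytes([w[0], w[1], w[2], w[3]]))
            .collect();
        let (mut a, mut b, mut c, mut d) = (a0, b0, c0, d0);
        for i in 0..64 {
            let (f, g) = match i {
                0..=15 => ((b & c) | (!b & d), i),
                16..=31 => ((d & b) | (!d & c), (5 * i + 1) % 16),
                32..=47 => (b ^ c ^ d, (3 * i + 5) % 16),
                _ => (c ^ (b | !d), (7 * i) % 16),
            };
            let f = f.wrapping_add(a).wrapping_add(k[i]).wrapping_add(m[g]);
            a = d;
            d = c;
            c = b;
            b = b.wrapping_add(f.rotate_left(S[i]));
        }
        a0 = a0.wrapping_add(a);
        b0 = b0.wrapping_add(b);
        c0 = c0.wrapping_add(c);
        d0 = d0.wrapping_add(d);
    }

    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&a0.to_le_bytes());
    out[4..8].copy_from_slice(&b0.to_le_bytes());
    out[8..12].copy_from_slice(&c0.to_le_bytes());
    out[12..16].copy_from_slice(&d0.to_le_bytes());
    out
}

/// Hypothetical helper: MD5 over the uppercased topic, encoded as UTF-16LE.
fn conversation_id_from_topic(topic: &str) -> [u8; 16] {
    let upper = topic.to_uppercase(); // Unicode default mapping, locale-independent
    let utf16le: Vec<u8> = upper.encode_utf16().flat_map(|u| u.to_le_bytes()).collect();
    md5(&utf16le)
}

/// Lowercase hex rendering, for comparing against published MD5 test vectors.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Feeding UTF-8 bytes into `md5` instead of the UTF-16LE buffer would yield a different 16-byte ID for every non-empty topic, which is the thread-breaking failure described above.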

3. Store-global LtpRowVer from the header dwUnique counter

Each row in every table inside the PST carries an LtpRowVer field that distinguishes versions of the same row across modifications. Outlook expects this value to come from the store-global dwUnique counter that lives in the file header. Incrementing dwUnique once per row and writing the new value back into the header on each row insert mirrors the exact behavior Outlook uses. My first implementation used per-table local counters, which collided across tables and triggered scanpst's row-renumber pass on every export. Moving to the store-global counter cleared that class of error entirely.
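A small sketch of why per-table counters collide and a store-global counter does not. `Store`, `insert_row` and `per_table_versions` are hypothetical names; the header's dwUnique field is modeled as a plain `u32`, and each table is reduced to the list of LtpRowVer values stamped on its rows.

```rust
/// Hypothetical model: the header's dwUnique counter plus the LtpRowVer values
/// stamped on rows in each table.
struct Store {
    dw_unique: u32,
    tables: Vec<Vec<u32>>, // per table: LtpRowVer of each row
}

impl Store {
    fn new(table_count: usize, seed: u32) -> Self {
        Store { dw_unique: seed, tables: vec![Vec::new(); table_count] }
    }

    /// Store-global: bump dwUnique once per row insert and stamp the new value.
    fn insert_row(&mut self, table: usize) -> u32 {
        self.dw_unique = self.dw_unique.wrapping_add(1);
        self.tables[table].push(self.dw_unique);
        self.dw_unique
    }

    fn all_row_versions_unique(&self) -> bool {
        let mut seen = std::collections::HashSet::new();
        self.tables.iter().flatten().all(|v| seen.insert(*v))
    }
}

/// The per-table approach: every table counts 1, 2, 3, ... on its own.
fn per_table_versions(rows_per_table: &[usize]) -> Vec<Vec<u32>> {
    rows_per_table.iter().map(|&n| (1..=n as u32).collect()).collect()
}
```

With per-table counters, the first row of every table gets the same version, which is the collision that triggered scanpst's renumber pass.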

4. XBLOCK and XXBLOCK subnode data trees for large content

A PST block holds at most 8 KB. A long HTML email body or a gigabyte attachment cannot fit in one block. The format handles this with XBLOCK and XXBLOCK data trees: an XBLOCK is a block of pointers to data blocks, and an XXBLOCK is a block of pointers to XBLOCKs. That gives three tiers of blocks (two levels of indirection), enough addressing space for any realistic message size. I implemented full XBLOCK and XXBLOCK construction with the correct CRC computation on each level and verified the data tree shape against scanpst's expected structure.
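The shape decision can be sketched as arithmetic, assuming Unicode PST sizes: up to 8,176 bytes of payload per data block (an 8,192-byte block minus a 16-byte trailer) and up to 1,021 block IDs per XBLOCK or XXBLOCK (8,176 bytes minus an 8-byte header, divided by 8-byte BIDs). The enum and function names are hypothetical; CRCs, trailers and the byte layout itself are omitted.

```rust
const MAX_DATA_PER_BLOCK: u64 = 8176; // assumed: 8192-byte block minus 16-byte trailer
const MAX_BIDS_PER_XBLOCK: u64 = 1021; // assumed: (8176 - 8-byte header) / 8-byte BID

#[derive(Debug, PartialEq)]
enum DataTree {
    /// Content fits in one data block.
    Single,
    /// One XBLOCK pointing at `leaves` data blocks.
    XBlock { leaves: u64 },
    /// One XXBLOCK pointing at `xblocks` XBLOCKs, which point at `leaves` data blocks.
    XXBlock { xblocks: u64, leaves: u64 },
}

/// Hypothetical: choose the data-tree shape for `len` bytes of content.
/// Returns None if the content exceeds what one XXBLOCK can address.
fn data_tree_shape(len: u64) -> Option<DataTree> {
    if len <= MAX_DATA_PER_BLOCK {
        return Some(DataTree::Single);
    }
    let leaves = (len - 1) / MAX_DATA_PER_BLOCK + 1; // ceiling division, len > 0
    if leaves <= MAX_BIDS_PER_XBLOCK {
        return Some(DataTree::XBlock { leaves });
    }
    let xblocks = (leaves - 1) / MAX_BIDS_PER_XBLOCK + 1;
    if xblocks <= MAX_BIDS_PER_XBLOCK {
        Some(DataTree::XXBlock { xblocks, leaves })
    } else {
        None
    }
}
```

Under these assumed sizes, one XBLOCK addresses 1,021 × 8,176 = 8,347,696 bytes (about 8.3 MB) and one XXBLOCK about 8.5 GB.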

5. The 65,535-messages-per-folder ceiling

The PST format has a hard limit of 65,535 messages per folder, imposed by the size of the table row counter. Even Outlook cannot exceed this cleanly. My writer enforces the limit and splits oversize folders into "Folder Name (Part 1 of N)" and "Folder Name (Part 2 of N)" subfolders rather than producing a structurally invalid file. This matters for forensic exports of large mailboxes where a single Inbox might contain hundreds of thousands of messages.
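A sketch of the split, assuming the limit applies per output folder and parts are filled in source order. `split_folder` is a hypothetical name, and message IDs are plain `u64` values here.

```rust
const MAX_MESSAGES_PER_FOLDER: usize = 65_535;

/// Hypothetical: split an oversize folder into "Name (Part i of N)" chunks.
/// Folders at or under the limit keep their original name.
fn split_folder(name: &str, messages: &[u64]) -> Vec<(String, Vec<u64>)> {
    if messages.len() <= MAX_MESSAGES_PER_FOLDER {
        return vec![(name.to_string(), messages.to_vec())];
    }
    let chunks: Vec<&[u64]> = messages.chunks(MAX_MESSAGES_PER_FOLDER).collect();
    let total = chunks.len();
    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| (format!("{} (Part {} of {})", name, i + 1, total), chunk.to_vec()))
        .collect()
}
```

For a 150,000-message Inbox this yields three parts: two full parts of 65,535 messages and a third holding the remaining 18,930.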

6. The 47-node store scaffold

An empty PST file is not actually empty. It contains roughly 47 mandatory scaffold nodes that Outlook expects to find: the root folder, the Recoverable Items dumpster, search folders, the Top of Information Store, the Address Book Container, the Receive Folders table, the AdminFIDs table and roughly forty others. Each has a specific node ID, a specific property set and a specific position in the NDB B-tree. Missing any one of them causes Outlook to refuse the file with "the file is not a personal folders file" errors. I built the full scaffold from spec on every fresh PST and verified each scaffold node against the layout Outlook itself produces.
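A sketch of a completeness check for the scaffold. Only three well-known special NIDs from MS-PST are listed (the message store, the named-property map and the root folder). The full set of roughly 47 nodes and their property sets is not reproduced, and `missing_scaffold_nodes` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// A few special NIDs defined in MS-PST. The real scaffold has many more.
const NID_MESSAGE_STORE: u32 = 0x21;
const NID_NAME_TO_ID_MAP: u32 = 0x61;
const NID_ROOT_FOLDER: u32 = 0x122;

const REQUIRED_SCAFFOLD: &[u32] = &[NID_MESSAGE_STORE, NID_NAME_TO_ID_MAP, NID_ROOT_FOLDER];

/// Hypothetical: return every required NID absent from the emitted node set.
fn missing_scaffold_nodes(emitted: &BTreeSet<u32>) -> Vec<u32> {
    REQUIRED_SCAFFOLD
        .iter()
        .copied()
        .filter(|nid| !emitted.contains(nid))
        .collect()
}
```

Running a check like this before serialization turns "Outlook refuses the file" into a named list of missing nodes.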

7. NDB B-tree balanced insertion

The Node Database B-tree at the heart of the PST format is a balanced search tree keyed on node ID. Inserts have to maintain the balance invariant or scanpst rejects the file. I implemented split-on-overflow and merge-on-underflow with the specific page header and CRC requirements MS-PST documents for each B-tree page. The B-tree at the end of a writer run is shape-equivalent to the B-tree Outlook builds when it inserts the same set of nodes in the same order.
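Split-on-overflow for a leaf page can be sketched over bare keys. The capacity of 15 entries is an assumption based on Unicode NBT leaf pages (488 bytes of entry space and 32-byte entries). A real page also carries entry payloads, a page trailer and a CRC, none of which are modeled. `insert_into_leaf` is a hypothetical name.

```rust
const NBT_LEAF_CAPACITY: usize = 15; // assumed: 488 bytes of entries / 32-byte NBT entries

/// Hypothetical: insert `key` into a sorted leaf. On overflow, split the leaf in
/// half and return (separator key, right sibling) for the parent to index.
fn insert_into_leaf(leaf: &mut Vec<u64>, key: u64) -> Option<(u64, Vec<u64>)> {
    match leaf.binary_search(&key) {
        Ok(_) => return None, // key already present; nothing to insert
        Err(pos) => leaf.insert(pos, key),
    }
    if leaf.len() <= NBT_LEAF_CAPACITY {
        return None;
    }
    let right = leaf.split_off(leaf.len() / 2);
    let separator = right[0]; // the parent entry keys on the child's first key
    Some((separator, right))
}
```

Splitting at the midpoint keeps both siblings at least half full, which is the invariant that keeps the tree balanced as NIDs are inserted.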

8. Allocation map rebuild (including an upstream MS-PST bug fix)

The allocation map tracks which pages of the file are currently in use. On every block insert or delete the AMap (Allocation Map) and PMap (Page Map) bitmaps have to be updated. Building these from scratch on every fresh PST is straightforward. Rebuilding them after edits is not. While implementing the rebuild path I identified an actual upstream MS-PST specification bug in how the PMap rebuild interacts with the AMap bitmap during a contiguous-block reallocation. The bug produced an allocation-map page past EOF, which scanpst flagged. I worked around it in the writer and verified the output against scanpst. The MS-PST team has been notified.
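The past-EOF failure mode is easy to express as a check. The constants are my reading of MS-PST for Unicode files: the first AMap page sits at offset 0x4400, each AMap's 496-byte bitmap tracks 64-byte slots, so one AMap covers 496 × 8 × 64 = 253,952 bytes and the next AMap page starts that many bytes later. The function names are hypothetical, and the sketch does not model the specific spec issue described above or the PMap.

```rust
const FIRST_AMAP_OFFSET: u64 = 0x4400;
const AMAP_COVERAGE: u64 = 496 * 8 * 64; // 253,952 bytes tracked per AMap page
const PAGE_SIZE: u64 = 512;

/// Hypothetical: offsets of the AMap pages a file of `file_len` bytes needs.
fn amap_page_offsets(file_len: u64) -> Vec<u64> {
    let mut offsets = Vec::new();
    let mut off = FIRST_AMAP_OFFSET;
    while off < file_len {
        offsets.push(off);
        off += AMAP_COVERAGE;
    }
    offsets
}

/// Flag any allocation-map page whose 512-byte page would extend past end-of-file.
fn amap_pages_past_eof(amap_offsets: &[u64], file_len: u64) -> Vec<u64> {
    amap_offsets
        .iter()
        .copied()
        .filter(|&off| off + PAGE_SIZE > file_len)
        .collect()
}
```

A rebuild path can run this check against the final file length before the header is fixed up.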

What scanpst.exe actually said

The verdict scanpst returns on my 1,117-message test export, verbatim:

"Only minor inconsistencies in this file. Repair optional."

That is the same verdict scanpst returns on .pst files Microsoft Outlook itself creates. I want to be clear about what this means. Outlook's own output is not bit-perfect either. The format permits small cosmetic deviations that scanpst flags but does not consider corruption. There is no writer in the world that gets scanpst to return "no errors" on a freshly-produced PST, because Outlook itself does not earn that verdict. Outlook-parity is the real ceiling for a PST writer. That is the bar I built to.

The commercial converter that supplied the test mailbox earned the opposite verdict on the same source content: severe errors, every message orphaned, thousands of errors flagged, repair time measured in tens of minutes. The difference comes down to whether the writer understands what Microsoft means by "valid" or whether it just emits the bits the spec describes and hopes for the best.

Forensic guarantees

For a tool that produces evidence intended for court submission, the engineering correctness is the floor, not the ceiling. The chain-of-custody guarantees are what determine whether the output survives the courtroom. PST Viewer v1.3.0 ships with four forensic guarantees:

Read-only on the source. The viewer never writes to the source PST or NSF file. The source is opened with a Rust file handle that has no write capability. The writer only ever creates a new file. This is provable from the source code, not merely a matter of policy.
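In Rust, this comes down to how the handle is opened. A minimal sketch, with `open_source` as a hypothetical name: a `File` opened with read access only is rejected by the operating system if anything tries to write through it. This constrains this process's handle only; it does not lock the file against other processes.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical: open evidence for reading only. Any write through the returned
/// handle fails at the OS level instead of touching the source.
fn open_source(path: &Path) -> io::Result<File> {
    OpenOptions::new().read(true).open(path)
}
```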

Deterministic byte-stable output. Running the same export on the same input twice produces byte-identical PST output. The dwUnique counter starts from a deterministic seed derived from the source content, not from system time or process ID. Two examiners running the same export on two different machines produce identical SHA-256 hashes.
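The post does not say how the seed is derived, so this is only a sketch of the property that matters: the same source bytes always produce the same starting value, with no clock or process ID involved. It uses 64-bit FNV-1a because it is short and stable. `seed_from_source` and the choice of FNV-1a are assumptions, not the viewer's actual derivation. Rust's `DefaultHasher` would be a poor fit, because its algorithm is not guaranteed to stay the same across Rust releases.

```rust
/// 64-bit FNV-1a: deterministic and dependency-free, but NOT a cryptographic hash.
fn fnv1a_64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical: derive the starting dwUnique value from the source content.
fn seed_from_source(source: &[u8]) -> u32 {
    let h = fnv1a_64(source);
    (h ^ (h >> 32)) as u32 // fold 64 bits into the 32-bit counter width
}
```

Because the seed depends only on input bytes, two machines exporting the same source start the counter at the same value.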

SHA-256 manifest plus chain-of-custody report. Every export emits a report file with per-message SHA-256 hashes, source-file SHA-256, examiner identification, source-file metadata and a timestamped session ID. Opposing counsel can verify the manifest independently using any SHA-256 verifier. They do not need my software to confirm authenticity.

Never-silently-drops semantics. A forensic tool must account for every item in the source. If the writer cannot fully reconstruct a particular message (because the source contains a malformed record that even my bounds-checked parser cannot interpret), the message is preserved in the output as a metadata stub and also recorded in a .unexported.csv sidecar file. The examiner has a complete accounting: every message ID in the source appears in the output PST, in the sidecar or in both, with no silent loss.
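The accounting rule can be checked mechanically. A sketch, assuming message IDs are comparable integer values and the sidecar lists the ID of each stubbed message: any source ID found in neither the exported PST nor the sidecar is a silent drop. `unaccounted` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical reconciliation: return source IDs that appear neither in the
/// output PST nor in the .unexported.csv sidecar, in source order.
fn unaccounted(source: &[u64], exported: &[u64], sidecar: &[u64]) -> Vec<u64> {
    let seen: HashSet<u64> = exported.iter().chain(sidecar.iter()).copied().collect();
    source.iter().copied().filter(|id| !seen.contains(id)).collect()
}
```

An empty result is the condition the guarantee promises; any non-empty result names exactly which source items went missing.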

What this enables for the work I actually do

The forensic case scenarios v1.3.0 unlocks for me and my fellow examiners:

Custodian mailbox extraction from a corrupted PST. Source PST is structurally damaged. Outlook will not open it. The standard workflow is to run scanpst against the source, accept its repair output and produce a derivative PST. The v1.3.0 alternative: open the source in PST Viewer (the bounds-checked parser handles malformed records gracefully), select the custodian's messages and export to a fresh PST. The output is scanpst-clean. Opposing counsel cannot challenge the export on grounds of source corruption because the export is independently verifiable from the manifest.

Lotus Notes NSF to Outlook PST conversion. NSF to PST conversion ships in NSF Viewer v1.1.0, built on the v1.3.0 PST writer. To my knowledge no commercial converter does this cleanly today. The output PST is scanpst-clean and fully indexable by Outlook. Lotus Notes migrations in regulated industries (legal, healthcare, financial services, government) have been waiting for this for over a decade.

Selective export for litigation production. The Export Selected feature builds a PST containing exactly N hand-picked messages. This is the workflow for producing a responsive subset of a custodian mailbox to opposing counsel without revealing the full archive. The output PST is structurally indistinguishable from a native Outlook export, so document review platforms (Relativity, Everlaw, Logikcull) ingest it without special handling.

Cross-jurisdictional evidence preservation. The deterministic byte-stable output means an examiner in Vancouver can produce a PST that an examiner in Toronto can independently verify by re-running the same export against the same source. The chain of custody survives the handoff.

NSF to PST is the second engineering first, shipped in NSF Viewer v1.1.0

The PST writer is the headline of this release. The second engineering first I want to call out is the NSF to PST conversion path I built on top of the writer, which uses the same Rust binary-format work that produces the PST output. NSF is the IBM (now HCL) Lotus Notes mail archive format. It is a different beast from PST: NoteID-keyed records, different attachment semantics, different rich-text encoding. To my knowledge no commercial vendor produces NSF to PST output that passes scanpst at Outlook-level validity. The Rust writer in v1.3.0 produces that output. The same four forensic guarantees apply to the NSF to PST path.

Update 2026-06-07: NSF Viewer v1.1.0 shipped tonight with the NSF to PST conversion path live in the customer-downloadable binary. The same Rust writer this post is about now powers the user-facing capability. This is the first time NSF evidence can be normalized to PST for ingestion into modern eDiscovery platforms without round-tripping through a third-party vendor whose chain of custody you cannot independently verify. For legal teams handling matters involving custodians who used Lotus Notes (still common in banking, insurance, government and certain healthcare environments), that is the headline.

The full v1.3.0 changelog

For the engineers who want the complete list:

Scanpst-clean from-scratch PST writer. Eight format-level fixes detailed above. Court-defensible PST reconstruction that passes Microsoft's validator.
Nested and named folder creation. Real folder trees in exports, not a flat dump.
Export-to-PST and Export-to-Mbox from the viewer. Output folder named after the source folder.
NSF to PST conversion architecture. The Rust writer in v1.3.0 also serves as the output stage for NSF to PST conversion: a Lotus Notes migration path that survives chain of custody, shipped as a user-facing feature in NSF Viewer v1.1.0 (2026-06-07).
Unified 3-tab EXPORT panel. Export Folder / Extract Attachments / Export Docs in one discoverable workflow.
EXPORT SELECTED. Build a PST of N hand-picked messages for targeted litigation production.
Never-silently-drop export. Metadata stubs plus .unexported.csv sidecar.
Export success dialog with Open folder. No more silently-closing confirmation box.
Left rail expanded by default. New-user discoverability.
Shift-click range selection in the message list. Select 20 messages fast.

Try it

PST Viewer v1.3.0 Free Edition is available for download today. It opens PST, OST, MSG and EML files, browses folders, searches messages and verifies SHA-256 hashes. The Free Edition does not include export to PST or NSF to PST. Those are Forensic Edition features. Try the Free Edition first. If you need the writer for forensic export work, the Forensic Edition is a $67 USD lifetime license.


About the author

Ryan Purita is the principal digital forensic examiner at Sherlock Forensics in Vancouver, BC. He founded the firm in 2004. He holds CISSP-ISSAP and ISSMP certifications and is a court-qualified expert witness in BC Supreme Court, BC Provincial Court and Newfoundland Provincial Court. He has conducted thousands of forensic examinations spanning criminal prosecution, criminal defence, civil litigation, corporate fraud and regulatory compliance. He builds the tools he uses on real engagements. PST Viewer v1.3.0 is the latest.

If you have a matter that needs forensic email analysis or chain-of-custody preservation work, call Sherlock Forensics at 888.883.4550 or visit the services overview.