Active | 2025–Present | Document Intelligence

AI Memoir & Narrative Pipeline

A deterministic LLM pipeline that transforms fragmented long-form source material into publication-ready output while preserving retrieval quality, editorial control, and full auditability.

LLM APIs, FAISS, sentence-transformers, SQLite, Typer, FastAPI

Overview

This project is a deterministic AI pipeline built to convert raw communications into a structured long-form manuscript. The input layer ingests emails, WhatsApp archives, and document files, normalizes them into JSONL event streams, and stores those events in SQLite. The result is a searchable, reproducible record of source material rather than an ad hoc collection of prompts.
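As a rough illustration of the ingestion step, the sketch below loads JSONL events into SQLite using only the Python standard library. The table layout, the field names (`source`, `ts`, `text`), and the function names are assumptions made for this example, not the project's actual schema.

```python
import json
import sqlite3


def init_db(conn):
    """Create a hypothetical events table; the UNIQUE constraint keeps re-runs idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " ts TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " UNIQUE (source, ts, payload))"
    )


def ingest_jsonl(conn, lines):
    """Insert one event per non-empty JSONL line and return the number of new rows."""
    before = conn.total_changes
    with conn:  # one transaction: the whole batch commits, or none of it does
        for line in lines:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # Canonical JSON (sorted keys) so identical events always compare equal.
            payload = json.dumps(event, sort_keys=True, ensure_ascii=False)
            conn.execute(
                "INSERT OR IGNORE INTO events (source, ts, payload) VALUES (?, ?, ?)",
                (event["source"], event["ts"], payload),
            )
    return conn.total_changes - before
```

Because duplicates are ignored, ingesting the same export twice leaves the table unchanged, which is one simple way to keep the source record reproducible.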

Retrieval is handled through FAISS and sentence-transformers so the proposal engine can surface relevant evidence for each chapter draft before anything is sent to the LLM. That matters because the problem is not simply text generation; it is controlled generation with the right memory at the right time.
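The retrieval idea can be sketched without external dependencies. The example below is a stand-in, not the project's code: a toy bag-of-words vector replaces the sentence-transformers embedding, and a brute-force cosine ranking replaces the FAISS index. The function names and sample passages are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy unit-length bag-of-words vector (assumption: stands in for a real embedding model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def top_k(query, passages, k=2):
    """Return the k passages most similar to the query, with their cosine scores."""
    q = embed(query)
    scored = []
    for idx, passage in enumerate(passages):
        p = embed(passage)
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * p.get(tok, 0.0) for tok, w in q.items())
        scored.append((score, idx))
    # Highest score first; ties broken by original order for stable results.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(passages[i], round(score, 3)) for score, i in scored[:k]]


passages = [
    "we moved to Lisbon in spring",
    "the invoice was paid late",
    "Lisbon felt like home",
]
evidence = top_k("moved to Lisbon", passages, k=2)
```

Only the selected `evidence` would then be placed in the prompt for a chapter draft, which is what keeps generation tied to retrieved source material.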

The LLM produces structured chapter proposals that are applied as atomic patches instead of free-form replacements. That patch-based model protects manuscript integrity and makes it possible to review, approve, or reject discrete edits without losing the broader arc.
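One way to picture an atomic patch is an all-or-nothing apply function. The sketch below assumes a hypothetical proposal shape (a chapter ID plus a list of operations, each carrying the text it expects to replace); the real proposal schema is not described here.

```python
import copy


class PatchError(Exception):
    """Raised when a patch cannot be applied cleanly."""


def apply_patch(manuscript, patch):
    """Apply every operation in the patch or none of them.

    manuscript: dict mapping chapter ID -> list of paragraph strings
    patch: {"chapter": str, "ops": [{"index": int, "expect": str, "replace": str}]}
    Returns a new manuscript; the input is never mutated.
    """
    draft = copy.deepcopy(manuscript)  # work on a copy so a failure leaves no trace
    chapter = draft.get(patch["chapter"])
    if chapter is None:
        raise PatchError(f"unknown chapter: {patch['chapter']}")
    for op in patch["ops"]:
        i = op["index"]
        # Reject ops that are out of range or that target text which has since changed.
        if not 0 <= i < len(chapter) or chapter[i] != op["expect"]:
            raise PatchError(f"stale or invalid operation at paragraph {i}")
        chapter[i] = op["replace"]
    return draft
```

The `expect` field acts as a precondition: if the manuscript has drifted since the proposal was generated, the whole patch is rejected instead of being partially applied.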

Every run also passes through narrative safeguards and editorial quality gates. Structural anchors, agency beats, and tension calibration are enforced on each apply pass, while run IDs, metrics logs, and change history make the entire pipeline observable and auditable.
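A quality gate with run-level logging might look roughly like the following. This sketch checks only structural anchors, modeled here as required marker strings, plus a minimum word count; both checks, the record fields, and the function names are assumptions for illustration. Agency beats and tension calibration are not modeled.

```python
import json
import uuid
from datetime import datetime, timezone


def check_anchors(text, anchors):
    """Return the anchors that do not appear in the text."""
    return [a for a in anchors if a not in text]


def run_gate(chapter_text, anchors, min_words):
    """Evaluate a chapter and return an auditable record tagged with a fresh run ID."""
    missing = check_anchors(chapter_text, anchors)
    word_count = len(chapter_text.split())
    return {
        "run_id": uuid.uuid4().hex,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "word_count": word_count,
        "missing_anchors": missing,
        "passed": not missing and word_count >= min_words,
    }


def log_line(record):
    """Serialize a record as one JSON line, suitable for an append-only metrics log."""
    return json.dumps(record, sort_keys=True)
```

Writing one JSON line per run keeps the log easy to diff and query, and the run ID gives reviewers a handle for tracing any change back to the pass that produced it.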

Why This Matters to Employers and Consulting Buyers

  • AI adoption with controls: This shows how to introduce LLMs into document workflows without surrendering review rights, evidence grounding, or auditability.
  • Decision-content governance: Retrieval, chapter patching, and run history translate directly to policy, legal, research, and knowledge workflows where output quality matters.
  • Cross-functional coordination: The workflow gives subject-matter experts, editors, and technical owners a shared operating model instead of disconnected prompts and manual rewrites.

Pipeline Flow

Source archives
  |
  |  emails, chat exports, .docx files
  v
Structured ingestion
  |
  |  normalized JSONL events stored in SQLite
  v
Semantic retrieval layer
  |
  |  FAISS + sentence-transformers select relevant evidence
  v
Proposal engine
  |
  |  LLM generates structured chapter proposals
  v
Atomic patch application
  |
  |  manuscript updated through controlled chapter-level patches
  v
Narrative safeguards + quality gate
  |
  |  structural anchors, agency beats, tension calibration, approvals
  v
DOCX export + observability
  |
  +--> run IDs, metrics logs, CHANGELOG, Telegram notifications

Why It Matters

Document intelligence, not blind generation

The system is built around retrieval and validation, so the LLM works from retrieved source evidence rather than unsupported recall. Enterprise document-intelligence work has the same requirement: output has to be traceable to its sources.

Controlled long-form output

Atomic chapter patching makes this a governed content system, not a single-shot prompt workflow. That is the same operating principle needed for policy, legal, or knowledge outputs.

Quality gates reduce regression risk

The narrative safeguards and editorial approvals are designed to stop a run from silently breaking structure or consistency, so every applied patch is a reviewed change rather than an unchecked rewrite.

Observability supports trust

Run-level metrics, IDs, and audit trails make the workflow reviewable and explainable, which is essential when AI output needs stakeholder trust.