Overview

This project is a deterministic AI pipeline that converts raw communications into a structured long-form manuscript. The input layer ingests emails, WhatsApp archives, and document files and normalizes them into JSONL event streams stored in SQLite, creating a searchable, reproducible record of source material rather than a loose pile of prompts.
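A minimal sketch of the ingestion step, assuming a hypothetical `Event` schema (source, timestamp, author, text) that every source type is normalized into. Parsing real email, WhatsApp, and .docx exports is out of scope: the sketch starts from already-extracted fields. A content hash serves as the event ID, so re-ingesting the same archive adds nothing new, which is one simple way to keep the record reproducible. The schema, the `events` table, and all helper names are illustrative, not the project's actual code.

```python
import hashlib
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event shared by every source type."""
    event_id: str
    source: str     # e.g. "email", "whatsapp", "docx"
    timestamp: str  # ISO 8601, so string order matches time order
    author: str
    text: str


def make_event(source: str, timestamp: str, author: str, text: str) -> Event:
    # Derive the ID from the content, so re-ingesting the same archive
    # reproduces the same IDs instead of creating duplicates.
    key = "\x1f".join([source, timestamp, author, text]).encode("utf-8")
    return Event(hashlib.sha256(key).hexdigest()[:16], source, timestamp, author, text)


def to_jsonl(events: list[Event]) -> str:
    # One JSON object per line; sort_keys keeps the output byte-stable.
    return "".join(json.dumps(asdict(e), sort_keys=True) + "\n" for e in events)


def load_jsonl(conn: sqlite3.Connection, jsonl: str) -> int:
    """Insert events from a JSONL string; return how many rows were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, source TEXT, timestamp TEXT, "
        "author TEXT, text TEXT)"
    )
    before = conn.total_changes
    with conn:  # one transaction: commit on success, roll back on error
        for line in jsonl.splitlines():
            conn.execute(
                "INSERT OR IGNORE INTO events "
                "VALUES (:event_id, :source, :timestamp, :author, :text)",
                json.loads(line),
            )
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [  # made-up sample events
        make_event("email", "2021-03-04T09:15:00", "alice", "Draft attached."),
        make_event("whatsapp", "2021-03-04T10:02:00", "bob", "Reading it now."),
    ]
    data = to_jsonl(sample)
    print(load_jsonl(conn, data))  # 2
    print(load_jsonl(conn, data))  # 0: same IDs, nothing new
```

Keeping JSONL as the interchange format and SQLite as the queryable store means the raw event stream can be replayed into a fresh database at any time.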

Retrieval uses sentence-transformers to embed the source material and FAISS to index it, so the proposal engine can surface relevant evidence for each chapter draft before anything is sent to GPT-4. That matters because the problem is not simply text generation; it is controlled generation with the right memory in context at the right time.
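The retrieval step can be sketched as exact inner-product search over unit-length embedding vectors, which is what a flat FAISS index such as `IndexFlatIP` computes; when vectors are normalized (for example with `normalize_embeddings=True` in sentence-transformers' `encode`), inner product equals cosine similarity. To stay self-contained, this sketch replaces both libraries with plain Python and hand-written toy vectors, so `FlatInnerProductIndex` and the event IDs are hypothetical.

```python
import heapq
import math
from typing import Sequence


def normalize(vec: Sequence[float]) -> list[float]:
    # Unit-length vectors make the inner product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatInnerProductIndex:
    """Brute-force inner-product search: a plain-Python stand-in for a flat FAISS index."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, event_id: str, vec: Sequence[float]) -> None:
        self._ids.append(event_id)
        self._vectors.append(normalize(vec))

    def search(self, query: Sequence[float], k: int) -> list[tuple[str, float]]:
        """Return up to k (event_id, score) pairs, highest score first."""
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), event_id)
            for event_id, v in zip(self._ids, self._vectors)
        )
        return [(event_id, score) for score, event_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    # Toy 3-d vectors; real sentence-transformers embeddings have hundreds of dimensions.
    index = FlatInnerProductIndex()
    index.add("evt-hospital", [0.9, 0.1, 0.0])
    index.add("evt-wedding", [0.0, 0.2, 0.9])
    index.add("evt-ward-visit", [0.8, 0.3, 0.1])
    print([eid for eid, _ in index.search([1.0, 0.2, 0.0], k=2)])
    # ['evt-hospital', 'evt-ward-visit']
```

A linear scan like this is only practical for small collections; the reason to use FAISS is to run the same search efficiently, and optionally approximately, at scale.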

GPT-4 produces structured chapter proposals that are applied as atomic patches instead of free-form replacements. That patch-based model protects manuscript integrity and makes it possible to review, approve, or reject discrete edits without losing the broader arc.
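One way to read "atomic patches" is all-or-nothing application: every patch in a proposal must match exactly one passage in its target chapter, otherwise the whole proposal is rejected and the manuscript is left untouched. The sketch below assumes a hypothetical `Patch` shape (chapter ID, exact old passage, replacement) and a manuscript held as a dict from chapter ID to text; the project's real patch format is not described in this overview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch shape: replace one exact passage in one chapter."""
    chapter_id: str
    old: str
    new: str


class PatchConflict(Exception):
    """Raised when a proposal cannot be applied cleanly."""


def apply_patches(manuscript: dict[str, str], patches: list[Patch]) -> dict[str, str]:
    """Apply every patch or none; the input manuscript is never mutated.

    Patches apply in order, so a later patch sees the result of earlier ones.
    """
    draft = dict(manuscript)  # chapter texts are immutable str, so a shallow copy is enough
    for i, patch in enumerate(patches):
        text = draft.get(patch.chapter_id)
        if text is None:
            raise PatchConflict(f"patch {i}: unknown chapter {patch.chapter_id!r}")
        if not patch.old:
            raise PatchConflict(f"patch {i}: empty target passage")
        matches = text.count(patch.old)
        if matches != 1:
            # 0 matches: the proposal is stale; more than 1: the target is ambiguous.
            raise PatchConflict(f"patch {i}: expected 1 match, found {matches}")
        draft[patch.chapter_id] = text.replace(patch.old, patch.new, 1)
    return draft  # only reached if every patch applied


if __name__ == "__main__":
    book = {"ch1": "She left the ward at dawn.", "ch2": "Nothing had changed."}
    print(apply_patches(book, [Patch("ch1", "at dawn", "before dawn")])["ch1"])
    # She left the ward before dawn.
    try:
        apply_patches(book, [Patch("ch1", "at dawn", "at noon"), Patch("ch2", "missing", "x")])
    except PatchConflict as err:
        print("rejected:", err)  # rejected: patch 1: expected 1 match, found 0
    # The original manuscript is untouched:
    print(book["ch1"])  # She left the ward at dawn.
```

Requiring exactly one match doubles as a staleness check: if a chapter changed after GPT-4 saw it, the old passage no longer matches and the proposal fails loudly instead of overwriting newer text.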

Every run also passes through narrative safeguards and editorial quality gates: structural anchors, agency beats, and tension calibration are checked on each apply pass, while run IDs, metrics logs, and change history make the entire pipeline observable and auditable.
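A quality gate can be sketched as a list of independent checks that compare the manuscript before and after an apply pass, with the pass blocked if any check reports a failure. Both concrete checks below are hypothetical proxies: `anchors_preserved` treats structural anchors as required phrases per chapter, and `length_drift_within` is a crude regression guard. The agency-beat and tension-calibration checks would plug in as further `Check` functions, but their criteria are not specified in this overview, so they are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable

Manuscript = dict[str, str]
# A check compares the manuscript before and after an apply pass and
# returns human-readable failures; an empty list means the check passed.
Check = Callable[[Manuscript, Manuscript], list[str]]


def anchors_preserved(anchors: dict[str, list[str]]) -> Check:
    """Hypothetical structural-anchor check: listed phrases must survive the pass."""
    def check(before: Manuscript, after: Manuscript) -> list[str]:
        return [
            f"{chapter_id}: missing anchor {anchor!r}"
            for chapter_id, required in anchors.items()
            for anchor in required
            if anchor not in after.get(chapter_id, "")
        ]
    return check


def length_drift_within(max_ratio: float) -> Check:
    """Hypothetical regression guard: no chapter may grow or shrink past max_ratio."""
    def check(before: Manuscript, after: Manuscript) -> list[str]:
        failures = []
        for chapter_id, old in before.items():
            new = after.get(chapter_id, "")
            if old and not (1 / max_ratio <= len(new) / len(old) <= max_ratio):
                failures.append(f"{chapter_id}: length {len(old)} -> {len(new)}")
        return failures
    return check


@dataclass
class GateResult:
    passed: bool
    failures: list[str]


def run_gate(before: Manuscript, after: Manuscript, checks: list[Check]) -> GateResult:
    """Run every check and pass only if none reported a failure."""
    failures = [msg for check in checks for msg in check(before, after)]
    return GateResult(passed=not failures, failures=failures)
```

Running every check rather than stopping at the first failure gives reviewers the full list of problems for a rejected pass in one report.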

Pipeline Flow

Source archives
  |
  |  emails, chat exports, .docx files
  v
Structured ingestion
  |
  |  normalized JSONL events stored in SQLite
  v
Semantic retrieval layer
  |
  |  FAISS + sentence-transformers select relevant evidence
  v
Proposal engine
  |
  |  GPT-4 generates structured chapter proposals
  v
Atomic patch application
  |
  |  manuscript updated through controlled chapter-level patches
  v
Narrative safeguards + quality gate
  |
  |  structural anchors, agency beats, tension calibration, approvals
  v
DOCX export + observability
  |
  +--> run IDs, metrics logs, CHANGELOG, Telegram notifications
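The observability tail of the flow can be sketched with the standard library: a sortable run ID, one JSONL metrics record per run, and an appended CHANGELOG entry. The run ID format, the `metrics.jsonl` file name, the CHANGELOG entry layout, and the metric fields are assumptions for illustration. Telegram notifications are omitted because they depend on the Telegram Bot API and credentials outside the scope of a sketch.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_id() -> str:
    # UTC timestamp prefix sorts runs chronologically; the random suffix keeps IDs unique.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def log_run(log_dir: Path, run_id: str, metrics: dict, changes: list[str]) -> None:
    """Append one JSONL metrics record and one CHANGELOG entry for a run."""
    log_dir.mkdir(parents=True, exist_ok=True)
    record = {"run_id": run_id, **metrics}
    with open(log_dir / "metrics.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    with open(log_dir / "CHANGELOG", "a", encoding="utf-8") as f:
        f.write(f"{run_id}\n")
        f.writelines(f"  - {change}\n" for change in changes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "logs"
        run_id = new_run_id()
        # Hypothetical metric names for illustration.
        log_run(logs, run_id, {"patches_applied": 3, "gate_passed": True},
                ["ch2: tightened opening scene"])
        print((logs / "metrics.jsonl").read_text(encoding="utf-8"))
        print((logs / "CHANGELOG").read_text(encoding="utf-8"))
```

Because both files are append-only and every record carries the run ID, any chapter change can be traced back to the run, metrics, and gate result that produced it.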

Why It Matters

Document intelligence, not blind generation

The system is built around retrieval and validation so that GPT-4 output is grounded in source evidence, which is the same requirement at the heart of real enterprise document-intelligence problems.

Controlled long-form output

Atomic chapter patching makes this a governed content system rather than a single-shot prompt workflow. The same operating principle applies to policy, legal, or knowledge-base content.

Quality gates reduce regression risk

The narrative safeguards and editorial approvals check every run, so a pass that would silently break structure or consistency is caught before it reaches the manuscript.

Observability supports trust

Run-level metrics, IDs, and audit trails make the workflow reviewable and explainable, which is essential when AI output needs stakeholder trust.

AI Memoir & Narrative Pipeline