DDD Enforcer

LLM
RAG
TypeScript ^5.9.3
Gemini API >=1.0.0
Python

Project Overview

DDD Enforcer is an LLM-powered VSCode extension that lints code for Domain-Driven Design compliance. It treats your codebase as a domain and uses Gemini-backed retrieval to surface naming-convention violations, banned-term usage, and bounded-context leaks that a regex-based linter would miss.

The flagship behaviour: you tag your domain entities with synonyms_to_avoid lists (e.g., Customer has ["Client", "User", "Buyer"]), and the extension warns whenever any of those synonyms appears in code that's supposed to be talking about Customers. The LLM is what decides whether a usage of "user" is referring to the domain entity Customer or to something else entirely (like the OS user).
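To see why a plain text match is not enough here, consider a naive matcher over a glossary entry. This is a minimal sketch, not the extension's code: the `DomainEntity` shape, `findCandidates`, and the sample entry are hypothetical, and only the `synonyms_to_avoid` field name comes from this README.

```typescript
// Hypothetical glossary entry shape; only `synonyms_to_avoid` is named in the README.
type DomainEntity = {
  name: string;
  synonyms_to_avoid: string[];
};

type Candidate = { line: number; synonym: string; text: string };

// Escape regex metacharacters so each synonym is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive matcher: report every line that mentions a synonym as a whole word
// (case-insensitive). It cannot tell a domain "user" from an OS "user".
function findCandidates(source: string, entity: DomainEntity): Candidate[] {
  const candidates: Candidate[] = [];
  source.split("\n").forEach((text, index) => {
    for (const synonym of entity.synonyms_to_avoid) {
      const pattern = new RegExp(`\\b${escapeRegExp(synonym)}\\b`, "i");
      if (pattern.test(text)) {
        candidates.push({ line: index + 1, synonym, text: text.trim() });
      }
    }
  });
  return candidates;
}

// Sample entry mirroring the example above.
const customer: DomainEntity = {
  name: "Customer",
  synonyms_to_avoid: ["Client", "User", "Buyer"],
};
```

Run against a line like `const user = os.userInfo();`, this matcher flags `user` even though it refers to the OS account. That ambiguity is the part the LLM is there to resolve.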

How It Works

The extension ships a RAG pipeline tuned for source code:

Indexing. The codebase + DDD glossary documents (Markdown / PDF) are chunked and embedded into a local vector store. Chunks track their file + line origin so violation reports can cite the exact location.
Query. When the linter runs, it builds queries from each entity definition ("Customer entity, synonyms to avoid: Client, User, Buyer") and retrieves the top-K most similar chunks.
Augmented prompt. The retrieved chunks are sent to Gemini along with a structured-output schema asking for {file, line, violation_type, violation_message} per detected violation.
Surfacing. Gemini's response is parsed and surfaced as VSCode diagnostics — squiggly underlines, hover messages, quick-fix suggestions.
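
The query and retrieval steps can be sketched as follows. This is an in-memory stand-in under stated assumptions: the real pipeline retrieves from a vector store, and the names `Chunk`, `buildEntityQuery`, `cosineSimilarity`, and `topK` are hypothetical. Only the query wording is taken from the example above.

```typescript
type DomainEntity = { name: string; synonyms_to_avoid: string[] };

// A chunk keeps its origin so a violation report can cite file and line.
type Chunk = { file: string; startLine: number; text: string; embedding: number[] };

// Build the retrieval query for one entity, following the README's example wording.
function buildEntityQuery(entity: DomainEntity): string {
  return `${entity.name} entity, synonyms to avoid: ${entity.synonyms_to_avoid.join(", ")}`;
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query embedding, best first.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The query string would be embedded with the same model used at indexing time before being passed to `topK`; that embedding call is left out here.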

Architecture

The extension has three layers:

Indexer (Python). Chunks source files and Markdown glossaries; computes embeddings via a local sentence-transformer; writes to a Chroma collection.
Query Engine (TypeScript). Builds structured queries, retrieves chunks, calls Gemini with a strict prompt template, parses structured-output responses.
VSCode Extension Host (TypeScript). Wires the query engine into VSCode's diagnostic API; runs incrementally as files change.
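
As an illustration of the Query Engine's prompt step, here is one way retrieved chunks might be assembled into a prompt. The wording of the template, the `SourceChunk` shape, and `buildPrompt` are all assumptions made for this sketch; the actual template is not reproduced in this README.

```typescript
// Hypothetical chunk shape: text plus the file and line it came from.
type SourceChunk = { file: string; startLine: number; text: string };

// Assemble a prompt: instructions first, then each chunk labelled with its origin
// so the model can cite file and line in its answer.
function buildPrompt(query: string, chunks: SourceChunk[]): string {
  const context = chunks
    .map((c) => `--- ${c.file}:${c.startLine}\n${c.text}`)
    .join("\n\n");
  return [
    "You are a Domain-Driven Design linter.",
    `Rule: ${query}`,
    "Report only usages that refer to this entity; ignore unrelated meanings.",
    "Respond with a JSON array of {file, line, violation_type, violation_message}.",
    "",
    context,
  ].join("\n");
}
```

Labelling every chunk with `file:line` is what lets the response point back at an exact location rather than at a chunk index.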

Each violation has a typed schema:

type Violation = {
  file: string;
  line: number;
  violation_type: 'SynonymViolation' | 'BannedTermViolation' | 'ContextLeak';
  violation_message: string;
};
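
Even with structured output, it is worth validating the response before trusting it. Below is a minimal sketch of a runtime check for the schema above; `isViolation` and `parseViolations` are hypothetical helper names, and silently dropping malformed entries is one possible policy, not necessarily what the extension does.

```typescript
type Violation = {
  file: string;
  line: number;
  violation_type: "SynonymViolation" | "BannedTermViolation" | "ContextLeak";
  violation_message: string;
};

const VIOLATION_TYPES: readonly string[] = [
  "SynonymViolation",
  "BannedTermViolation",
  "ContextLeak",
];

// Type guard: true only if the value has every field with the expected type.
function isViolation(value: unknown): value is Violation {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const line = v["line"];
  const kind = v["violation_type"];
  return (
    typeof v["file"] === "string" &&
    typeof line === "number" &&
    Number.isInteger(line) &&
    typeof kind === "string" &&
    VIOLATION_TYPES.indexOf(kind) !== -1 &&
    typeof v["violation_message"] === "string"
  );
}

// Parse a raw model response; invalid JSON or a non-array yields an empty list,
// and entries that fail validation are dropped.
function parseViolations(raw: string): Violation[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];
  return data.filter(isViolation);
}
```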

What I Learned

Structured-output prompting (forcing the model to emit JSON matching a schema) is dramatically more reliable than free-form parsing. Gemini's response_schema setting was the unlock.
Embedding code is different from embedding prose — code-aware chunking that respects function boundaries beats fixed-size chunks by a wide margin.
VSCode's diagnostic API is well-documented, but the gotcha is latency: running an LLM-backed linter on every keystroke is a bad idea. Debounce edits and run the lint on save.
DDD purists will tell you "Client" and "Customer" are different concepts in different bounded contexts. The extension lets you scope synonym lists per context, but that's the kind of feature only one user will ever turn on (me).
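
The debounce advice above can be sketched as a per-file debouncer. This is an illustrative sketch, not the extension's code: `debouncePerKey` and `TimerApi` are hypothetical names, and the injectable timer parameter exists only so the behaviour can be tested without real delays.

```typescript
// Minimal timer interface so tests can substitute a fake scheduler.
type TimerApi = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

// Default implementation backed by the real setTimeout/clearTimeout.
const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a function that schedules `run(key)` after `delayMs`, cancelling any
// earlier pending call for the same key (for example, the same file path).
function debouncePerKey(
  run: (key: string) => void,
  delayMs: number,
  timers: TimerApi = realTimers,
): (key: string) => void {
  const pending = new Map<string, unknown>();
  return (key: string): void => {
    if (pending.has(key)) {
      timers.clear(pending.get(key));
    }
    const handle = timers.set(() => {
      pending.delete(key);
      run(key);
    }, delayMs);
    pending.set(key, handle);
  };
}
```

In an extension host, the returned function would typically be called from a document-change or save handler with the file path as the key, so a burst of edits to one file results in a single lint run.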