Create spec directory (vibe-kanban bb30cf94)

tomzx · tomzx · commit 9ea00c064e35 · 2026-01-21T23:47:58.000-05:00
/directory-to-spec

Symlink the spec directory in ../docs using a relative symlink and update mkdocs.yml.
diff --git a/docs/spec b/docs/spec
@@ -0,0 +1 @@
+../spec
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -48,6 +48,15 @@ nav:
   - Getting Started: getting-started.md
   - Commands: commands.md
   - Architecture: architecture.md
+  - Specifications:
+      - Code Parsing: spec/code-parsing.md
+      - Description Storage: spec/description-storage.md
+      - Staleness Tracking: spec/staleness-tracking.md
+      - LLM Integration: spec/llm-integration.md
+      - Configuration Management: spec/configuration-management.md
+      - Git Integration: spec/git-integration.md
+      - Hierarchical Scopes: spec/hierarchical-scopes.md
+      - CLI Interface: spec/cli-interface.md
 
 markdown_extensions:
   - pymdownx.highlight:
diff --git a/spec/cli-interface.md b/spec/cli-interface.md
@@ -0,0 +1,96 @@
+# CLI Interface
+
+## Overview
+
+The CLI provides command-line access to all code-lod operations and is built with Typer. Each command is implemented in a separate module for maintainability.
+
+## Requirements
+
+### MUST
+
+- The system MUST provide commands: init, generate, status, validate, read, update, clean, config, hooks
+- Each command MUST be implemented in a separate file under `cli/`
+- The system MUST use Typer for command parsing and help text
+- The system MUST auto-detect the project root from the current directory
+- The system MUST provide clear error messages for common failure cases
+
+### Command Descriptions
+
+**init**: Initialize code-lod in a project directory
+- Creates `.code-lod` directory structure
+- Creates default `config.json`
+
+**generate**: Generate descriptions for code entities
+- Parses source files
+- Generates descriptions via LLM
+- Stores in database and `.lod` files
+
+**status**: Check freshness status of descriptions
+- Shows total, fresh, and stale counts
+- Lists stale entries
+
+**validate**: Validate descriptions
+- Checks for stale descriptions
+- Can exit with code 1 if stale entries are found (e.g. when run with `--fail-on-stale`)
+
+**read**: Output descriptions in LLM-consumable format
+- Retrieves descriptions from storage
+- Formats for LLM input
+
+**update**: Update stale descriptions
+- Regenerates only stale entries
+- Updates database and `.lod` files
+
+**clean**: Clean all code-lod data
+- Removes `.code-lod` directory
+- Removes all `.lod` files
+
+**config**: Configuration management
+- View and edit configuration
+- Set provider and model options
+
+**hooks**: Git hooks management
+- install: Install pre-commit hook
+- uninstall: Remove installed hooks
+
+### SHOULD
+
+- Commands SHOULD support common options (verbose, quiet, etc.)
+- Commands SHOULD provide helpful output for success and failure cases
+
+### MAY
+
+- The system MAY add additional commands in the future
+- The system MAY support shell completion for commands
+
+## Implementation
+
+### CLI Structure
+
+```
+cli/
+├── __init__.py      # Main app registration
+├── init.py          # Initialize code-lod
+├── generate.py      # Generate descriptions
+├── status.py        # Check freshness
+├── validate.py      # Validate descriptions
+├── read.py          # Output descriptions
+├── update.py        # Update stale descriptions
+├── clean.py         # Clean all data
+├── config.py        # Configuration management
+└── hooks.py         # Git hooks
+```
+
+### Main App (`cli/__init__.py`)
+
+- Creates the main Typer app
+- Registers all sub-commands
+- Provides top-level help and version info
+
+### Command Pattern
+
+Each command module:
+- Defines one or more Typer functions
+- Uses `get_paths()` to find project root
+- Handles errors with appropriate exit codes
+- Provides user-friendly output via `typer.echo()`
diff --git a/spec/code-parsing.md b/spec/code-parsing.md
@@ -0,0 +1,50 @@
+# Code Parsing
+
+## Overview
+
+The code parsing feature extracts code entities (functions, classes, modules) from source files using Tree-sitter parsers. It computes AST hashes for each entity to enable change detection and staleness tracking.
+
+## Requirements
+
+### MUST
+
+- The parser MUST extract all functions, classes, and module-level entities from source files
+- The parser MUST compute AST hashes for each extracted entity using normalized source code
+- The parser MUST support Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, PHP, C#, Scala, Bash, YAML, JSON, TOML, and Markdown
+- The parser MUST provide a file extension to language name mapping
+- The parser MUST detect the programming language from file extensions automatically
+- The base parser interface MUST be implemented as an abstract base class
+- Each parsed entity MUST include: scope, name, location (path, start_line, end_line), source code, AST hash, language, and optional parent name
+
+### SHOULD
+
+- The parser SHOULD normalize source code before hashing to ignore cosmetic changes (comments, whitespace)
+- The parser SHOULD extract parent names for nested entities (methods in classes)
+- The parser SHOULD handle language-specific node types for functions and classes
+
+### MAY
+
+- The parser MAY support additional languages via Tree-sitter language pack
+- The parser MAY cache parsed entities for performance
+
+## Implementation
+
+### BaseParser Interface
+
+Abstract base class defining:
+- `language` property: Returns the language name
+- `parse_file(path)`: Parses a file and returns list of ParsedEntity
+- `parse_module(source, path)`: Parses a module as a whole
+
+### TreeSitterParser
+
+Concrete implementation using Tree-sitter:
+- Maintains language-specific node type mappings for functions and classes
+- Traverses the AST to extract entities with proper parent relationships
+- Uses tree-sitter-language-pack for dynamic language loading
+
+### Hash Computation
+
+- Normalizes source by stripping comments and normalizing whitespace
+- Computes SHA-256 hash prefixed with "sha256:"
+- Hashes are used for change detection and staleness tracking
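
The hash computation described at the end of `spec/code-parsing.md` above can be illustrated with a minimal sketch. It is not the project's implementation: the spec normalizes via Tree-sitter, whereas this sketch uses Python's standard `tokenize` module as a stand-in and therefore only handles Python source. The names `normalize_source` and `compute_ast_hash` are hypothetical.

```python
import hashlib
import io
import tokenize

# Structural tokens are replaced by fixed markers so that the amount of
# whitespace in the original text never influences the normalized form.
_MARKERS = {
    tokenize.NEWLINE: "<NEWLINE>",
    tokenize.INDENT: "<INDENT>",
    tokenize.DEDENT: "<DEDENT>",
}
# Comments and blank lines are cosmetic; ENDMARKER carries no content.
_DROPPED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def normalize_source(source: str) -> str:
    """Return a comment-free, whitespace-insensitive form of Python source."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _DROPPED:
            continue
        parts.append(_MARKERS.get(tok.type, tok.string))
    return " ".join(parts)


def compute_ast_hash(source: str) -> str:
    """Hash the normalized source with SHA-256, prefixed with 'sha256:'."""
    normalized = normalize_source(source)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

With this sketch, two versions of a function that differ only in comments, blank lines, or spacing produce the same hash, while a change to the logic produces a different one. Docstrings are string tokens rather than comments, so editing a docstring still changes the hash here.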
diff --git a/spec/configuration-management.md b/spec/configuration-management.md
@@ -0,0 +1,66 @@
+# Configuration Management
+
+## Overview
+
+The configuration management feature handles project configuration, provider settings, and model selection per scope. Configuration is stored in `.code-lod/config.json` and manages paths relative to the project root.
+
+## Requirements
+
+### MUST
+
+- The system MUST store configuration in `.code-lod/config.json`
+- The system MUST auto-detect the project root by searching for the `.code-lod` directory
+- The configuration MUST support: languages list, auto_update flag, fail_on_stale flag, provider selection, and per-provider model settings
+- Model settings MUST support default and scope-specific models (project, package, module, class, function)
+- The system MUST provide standard paths: code_lod_dir, lod_dir, config_file, hash_db
+- The system MUST validate hash format in `@lod` comments
+
+### SHOULD
+
+- The system SHOULD provide default configuration when config file doesn't exist
+- The system SHOULD handle configuration errors gracefully by falling back to defaults
+- The system SHOULD allow querying model configuration for specific scopes and providers
+
+### MAY
+
+- The system MAY support additional configuration options in the future
+- The system MAY provide configuration validation and schema checking
+
+## Implementation
+
+### Config Model
+
+Pydantic BaseModel with fields:
+- `languages`: List of supported languages (default: ["python"])
+- `auto_update`: Whether to auto-update descriptions (default: false)
+- `fail_on_stale`: Whether to fail validation on stale descriptions (default: false)
+- `provider`: LLM provider to use (default: Provider.MOCK)
+- `model_settings`: Dict mapping Provider to ModelConfig
+
+### ModelConfig Model
+
+Pydantic BaseModel for per-provider model settings:
+- `default`: Default model for the provider
+- `project`: Model for PROJECT scope
+- `package`: Model for PACKAGE scope
+- `module`: Model for MODULE scope
+- `class_`: Model for CLASS scope
+- `function`: Model for FUNCTION scope
+- `get_model_for_scope(scope)`: Method to retrieve model for a specific scope
+
+### Paths Dataclass
+
+Frozen dataclass with path management:
+- `root_dir`: Project root directory
+- `code_lod_dir`: `.code-lod` directory
+- `lod_dir`: `.code-lod/.lod` directory
+- `config_file`: `.code-lod/config.json`
+- `hash_db`: `.code-lod/hash-index.db`
+
+### Configuration Functions
+
+- `find_project_root(start_path)`: Searches upward for `.code-lod` directory
+- `get_paths(root_dir)`: Returns Paths object for the project
+- `load_config(paths)`: Loads configuration from file or returns defaults
+- `save_config(config, paths)`: Saves configuration to file
+- `get_model_for_scope(config, provider, scope)`: Retrieves configured model
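
The path handling described in `spec/configuration-management.md` above could look roughly like the following sketch. It makes several assumptions that the spec does not state: the derived paths are exposed as properties, `find_project_root` returns `None` when no `.code-lod` directory is found, and `start_path` defaults to the current working directory.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Paths:
    """Standard code-lod paths, all derived from the project root."""

    root_dir: Path

    @property
    def code_lod_dir(self) -> Path:
        return self.root_dir / ".code-lod"

    @property
    def lod_dir(self) -> Path:
        return self.code_lod_dir / ".lod"

    @property
    def config_file(self) -> Path:
        return self.code_lod_dir / "config.json"

    @property
    def hash_db(self) -> Path:
        return self.code_lod_dir / "hash-index.db"


def find_project_root(start_path: Path | None = None) -> Path | None:
    """Search upward from start_path for a directory containing .code-lod."""
    current = (start_path or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".code-lod").is_dir():
            return candidate
    # Assumption: signal "not initialized" with None instead of raising.
    return None


def get_paths(root_dir: Path) -> Paths:
    """Return the Paths object for a project root."""
    return Paths(root_dir=root_dir)
```

Calling `find_project_root()` from any subdirectory of an initialized project returns the directory that holds `.code-lod`, which is what lets each CLI command locate the project without an explicit path argument.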
diff --git a/spec/description-storage.md b/spec/description-storage.md
@@ -0,0 +1,82 @@
+# Description Storage
+
+## Overview
+
+The description storage feature provides a dual storage system for code descriptions: a SQLite database for metadata and caching, and `.lod` files alongside source code with structured `@lod` comments for human readability.
+
+## Requirements
+
+### MUST
+
+- The system MUST maintain a SQLite database at `.code-lod/hash-index.db` for hash-to-description mapping
+- The database MUST store: hash, description, stale status, created_at, updated_at, and hash_history
+- The system MUST create `.lod` files alongside source files to store descriptions
+- `.lod` files MUST use structured `@lod` comments with hash, stale status, and description
+- The system MUST support reading and writing `.lod` files
+- The system MUST parse `@lod` comments to extract hash, stale, and description fields
+- The database MUST support CRUD operations: get, set, mark_stale, mark_fresh, delete
+- Database connections MUST use context managers for proper cleanup
+
+### SHOULD
+
+- `.lod` files SHOULD include function/class signatures for readability
+- `.lod` files SHOULD preserve module-level descriptions
+- The writer SHOULD format comments appropriately for the programming language
+
+### MAY
+
+- The system MAY support additional storage backends in the future
+- The system MAY compress descriptions in the database for large codebases
+
+## Implementation
+
+### SQLite Database (HashIndex)
+
+Table schema:
+```sql
+CREATE TABLE descriptions (
+    hash TEXT PRIMARY KEY,
+    description TEXT NOT NULL,
+    stale BOOLEAN DEFAULT FALSE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    hash_history TEXT DEFAULT '[]'
+)
+```
+
+Operations:
+- `get(hash_)`: Retrieve a description record
+- `set(hash_, description, stale, hash_history)`: Create or update a record
+- `mark_stale(hash_)`: Mark a description as stale
+- `mark_fresh(hash_)`: Mark a description as fresh
+- `get_all_stale()`: Retrieve all stale records
+- `delete(hash_)`: Remove a record
+
+### .lod Files
+
+Structure:
+- Module-level description at the top (optional)
+- Entity descriptions with `@lod` annotations
+
+Comment format:
+```
+# @lod hash:sha256:<hexdigest> stale:<true|false>
+# @lod description:<description text>
+<class_or_function_signature>
+```
+
+### LodReader
+
+Parses `.lod` files and extracts:
+- Scope (function, class, module)
+- Name
+- Hash, stale status, description
+- Signature
+- Line numbers
+
+### LodWriter
+
+Writes `.lod` files with:
+- Module description header
+- Entity descriptions with signatures
+- Language-appropriate comment syntax
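
As an illustration of the `HashIndex` operations listed in `spec/description-storage.md` above, here is a minimal sketch built on the standard `sqlite3` module. It reuses the table schema from the spec, but the method bodies, the return type of `get` (a plain `dict` or `None`), and the upsert behaviour of `set` are assumptions rather than the project's actual code.

```python
from __future__ import annotations

import json
import sqlite3
from contextlib import closing

SCHEMA = """
CREATE TABLE IF NOT EXISTS descriptions (
    hash TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    stale BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    hash_history TEXT DEFAULT '[]'
)
"""


class HashIndex:
    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        with self._connect() as conn, conn:
            conn.execute(SCHEMA)

    def _connect(self):
        # closing() closes the connection; the connection's own context
        # manager (the second "conn" in each with-statement) commits.
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        return closing(conn)

    def get(self, hash_: str) -> dict | None:
        with self._connect() as conn, conn:
            row = conn.execute(
                "SELECT * FROM descriptions WHERE hash = ?", (hash_,)
            ).fetchone()
        if row is None:
            return None
        record = dict(row)
        record["stale"] = bool(record["stale"])
        record["hash_history"] = json.loads(record["hash_history"])
        return record

    def set(self, hash_, description, stale=False, hash_history=None) -> None:
        with self._connect() as conn, conn:
            conn.execute(
                "INSERT INTO descriptions (hash, description, stale, hash_history) "
                "VALUES (?, ?, ?, ?) "
                "ON CONFLICT(hash) DO UPDATE SET "
                "description = excluded.description, stale = excluded.stale, "
                "hash_history = excluded.hash_history, "
                "updated_at = CURRENT_TIMESTAMP",
                (hash_, description, stale, json.dumps(hash_history or [])),
            )

    def _set_stale(self, hash_: str, stale: bool) -> None:
        with self._connect() as conn, conn:
            conn.execute(
                "UPDATE descriptions SET stale = ?, updated_at = CURRENT_TIMESTAMP "
                "WHERE hash = ?",
                (stale, hash_),
            )

    def mark_stale(self, hash_: str) -> None:
        self._set_stale(hash_, True)

    def mark_fresh(self, hash_: str) -> None:
        self._set_stale(hash_, False)

    def get_all_stale(self) -> list[str]:
        with self._connect() as conn, conn:
            rows = conn.execute("SELECT hash FROM descriptions WHERE stale = 1")
            return [row["hash"] for row in rows]

    def delete(self, hash_: str) -> None:
        with self._connect() as conn, conn:
            conn.execute("DELETE FROM descriptions WHERE hash = ?", (hash_,))
```

Each operation opens and closes its own connection, which keeps the sketch simple and satisfies the context-manager requirement. A real implementation might prefer to reuse a connection. The `ON CONFLICT` upsert requires SQLite 3.24 or newer.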
diff --git a/spec/git-integration.md b/spec/git-integration.md
@@ -0,0 +1,58 @@
+# Git Integration
+
+## Overview
+
+The Git integration feature provides pre-commit hooks to ensure code descriptions stay fresh. It automatically validates descriptions before commits, preventing stale code documentation from being committed.
+
+## Requirements
+
+### MUST
+
+- The system MUST support installing pre-commit hooks
+- The system MUST support uninstalling hooks
+- The installed hook MUST run `code-lod validate --fail-on-stale`
+- The hook script MUST be executable (file mode `0o755`)
+- The hook MUST be installed in `.git/hooks/`
+- The system MUST verify that code-lod is initialized before installing hooks
+- The system MUST verify that the directory is a git repository before installing hooks
+
+### SHOULD
+
+- The system SHOULD support additional hook types (e.g., pre-push)
+- The system SHOULD provide clear error messages when initialization or git repository checks fail
+
+### MAY
+
+- The system MAY support hook customization (e.g., different validation commands)
+- The system MAY integrate with other hook managers (e.g., pre-commit framework)
+
+## Implementation
+
+### install_hook Function
+
+Creates a git hook script:
+1. Validates code-lod is initialized (checks for `.code-lod` directory)
+2. Validates the directory is a git repository (checks for `.git/hooks`)
+3. Creates the hook script with appropriate content
+4. Sets executable permissions (0o755)
+5. Reports success to the user
+
+Hook script template:
+```bash
+#!/bin/sh
+# code-lod {hook_type} hook
+code-lod validate --fail-on-stale
+```
+
+### uninstall_hook Function
+
+Removes the git hook:
+1. Validates code-lod is initialized
+2. Removes `.git/hooks/pre-commit` if it exists
+3. Reports success or that no hook was found
+
+### Error Handling
+
+- Exits with status code 1 if code-lod is not initialized
+- Exits with status code 1 if not in a git repository
+- Reports errors with user-friendly messages on stderr (e.g. `typer.echo(..., err=True)`)
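
The install and uninstall steps in `spec/git-integration.md` above can be sketched with the standard library alone. To keep the example free of third-party dependencies, it raises a hypothetical `HookError` where the spec calls for exiting with status code 1 through Typer. It also skips the initialization check in `uninstall_hook` for brevity. The function signatures are assumptions, and the permission handling assumes a POSIX filesystem.

```python
from pathlib import Path

# Same content as the hook script template in the spec.
HOOK_TEMPLATE = (
    "#!/bin/sh\n"
    "# code-lod {hook_type} hook\n"
    "code-lod validate --fail-on-stale\n"
)


class HookError(RuntimeError):
    """Hypothetical error type; a CLI layer would map it to exit code 1."""


def install_hook(root_dir: Path, hook_type: str = "pre-commit") -> Path:
    # 1. code-lod must be initialized.
    if not (root_dir / ".code-lod").is_dir():
        raise HookError("code-lod is not initialized in this project")
    # 2. The directory must be a git repository.
    hooks_dir = root_dir / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise HookError("not a git repository (no .git/hooks directory)")
    # 3. Write the hook script.
    hook_path = hooks_dir / hook_type
    hook_path.write_text(HOOK_TEMPLATE.format(hook_type=hook_type), encoding="utf-8")
    # 4. Make it executable.
    hook_path.chmod(0o755)
    return hook_path


def uninstall_hook(root_dir: Path, hook_type: str = "pre-commit") -> bool:
    """Remove the hook if present; return True when something was removed."""
    hook_path = root_dir / ".git" / "hooks" / hook_type
    if hook_path.exists():
        hook_path.unlink()
        return True
    return False
```

Note that this sketch overwrites any existing hook of the same name. The spec does not say whether an existing hook should be preserved, so a real implementation may need to decide that explicitly.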
diff --git a/spec/hierarchical-scopes.md b/spec/hierarchical-scopes.md
diff --git a/spec/llm-integration.md b/spec/llm-integration.md
diff --git a/spec/staleness-tracking.md b/spec/staleness-tracking.md