
Commit 52bc0f3: "Update documentation"

Parent: 9ea00c0

5 files changed: 258 additions & 35 deletions


CLAUDE.md

Lines changed: 27 additions & 8 deletions
@@ -2,6 +2,10 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+## Rules
+
+- Always update the `CLAUDE.md`, `README.md`, `docs/`, and `spec/` files when there are significant changes to the codebase or architecture.
+
 ## Common Commands
 
 ```bash
@@ -17,7 +21,11 @@ code-lod init              # Initialize in project directory
 code-lod generate          # Generate descriptions
 code-lod status            # Check description freshness
 code-lod validate          # Validate descriptions
+code-lod update            # Update stale descriptions
 code-lod read              # Output descriptions in LLM-consumable format
+code-lod config set-model  # Configure LLM models per scope
+code-lod install-hook      # Install git pre-commit hook
+code-lod clean             # Remove all code-lod data
 
 # Documentation
 uv run mkdocs build        # Build documentation
@@ -36,7 +44,7 @@ Code LoD is a CLI tool that generates and manages code descriptions at different
 
 3. **Staleness Tracking** (`staleness.py`): `StalenessTracker` uses the hash index to determine if descriptions need regeneration.
 
-4. **Generation** (`llm/`): Abstract `BaseGenerator` interface for LLM providers. Currently uses mock generator; real providers (OpenAI, Anthropic, Ollama) are planned.
+4. **Generation** (`llm/description_generator/`): LLM provider implementations (OpenAI, Anthropic, Ollama, Mock) with auto-detection from environment variables and scope-specific model selection.
 
 5. **Storage** (`db.py`, `lod_file/`): Dual storage system:
    - SQLite database (`hash_index.db`) for metadata and caching
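The hashing and staleness steps in the pipeline above can be illustrated with a minimal sketch. This is not the project's implementation (`hashing.py` uses tree-sitter parsers and normalization not shown here); the helper names are hypothetical, and Python's built-in `ast` module stands in for the real parser:

```python
import ast
import hashlib

def ast_hash(source: str) -> str:
    """Hash a dump of the parsed AST, so formatting- and comment-only
    edits produce the same hash as the original source."""
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

def is_stale(source: str, recorded_hash: str) -> bool:
    """A stored description is stale once the entity's AST hash changes."""
    return ast_hash(source) != recorded_hash
```

Whitespace and comments disappear in the AST dump, so only semantic edits flip `is_stale`.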
@@ -55,25 +63,26 @@ src/code_lod/
 ├── cli/                  # Typer CLI commands (one file per command)
 │   ├── __init__.py       # Main app entry point
 │   ├── clean.py          # Clean all code-lod data
-│   ├── config.py         # Configuration management
+│   ├── config.py         # Configuration management (config, set-model commands)
 │   ├── generate.py       # Generate descriptions
-│   ├── hooks.py          # Git hooks installation
+│   ├── hooks.py          # Git hooks installation/removal
 │   ├── init.py           # Initialize code-lod
 │   ├── read.py           # Output descriptions
 │   ├── status.py         # Check freshness status
 │   ├── update.py         # Update stale descriptions
 │   └── validate.py       # Validate descriptions
-├── config.py             # Paths management
+├── config.py             # Configuration and paths management
 ├── db.py                 # SQLite hash index
 ├── hashing.py            # AST hash computation
 ├── models.py             # Pydantic data models
 ├── staleness.py          # StalenessTracker
 ├── llm/
 │   ├── __init__.py
-│   └── description_generator/  # LLM generator abstraction
-│       ├── generator.py  # BaseGenerator interface
+│   └── description_generator/  # LLM generator implementations
+│       ├── generator.py  # BaseGenerator, Provider enum, get_generator()
 │       ├── anthropic.py  # Anthropic Claude provider
 │       ├── openai.py     # OpenAI provider
+│       ├── ollama.py     # Ollama local models provider
 │       └── mock.py       # Mock generator for testing
 ├── parsers/              # BaseParser, tree-sitter implementations
 └── lod_file/             # .lod file read/write/comment parsing
@@ -90,8 +99,18 @@ src/code_lod/
 
 ### Configuration
 
-Stored in `.code-lod/config.json` with supported languages. Paths are resolved relative to project root via `Paths` dataclass.
+Stored in `.code-lod/config.json`:
+- `languages`: List of supported languages
+- `provider`: LLM provider (openai, anthropic, ollama, mock)
+- `model_settings`: Hierarchical model configuration per scope
+  - Supports different models for different scopes (project, package, module, class, function)
+
+Provider auto-detection: Checks `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` environment variables. Falls back to mock if none found.
+
+Paths are resolved relative to project root via `Paths` dataclass.
 
 ### Git Hooks
 
-The `install_hook` command creates pre-commit hooks that run `code-lod validate --fail-on-stale` to ensure descriptions stay fresh.
+The `install-hook` command creates pre-commit hooks that run `code-lod validate --fail-on-stale` to ensure descriptions stay fresh. Use `uninstall-hook` to remove the hook.
+
+Supports both `pre-commit` and `pre-push` hook types via `--hook-type` option.
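The auto-detection order stated in the configuration notes above (Anthropic key, then OpenAI key, then mock) amounts to a few lines. This sketch uses a hypothetical `detect_provider` helper, not the project's actual function:

```python
import os

def detect_provider() -> str:
    """Mirror the documented lookup order for LLM providers."""
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    return "mock"  # no key found: fall back to the mock generator
```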

README.md

Lines changed: 44 additions & 1 deletion
@@ -70,7 +70,10 @@ Reading a project's README and source code works for small projects, but becomes
 | `validate` | Validate description freshness |
 | `update` | Update stale descriptions |
 | `read` | Output descriptions in LLM-consumable format |
+| `config` | Get or set configuration values |
+| `config set-model` | Configure LLM models per scope |
 | `install-hook` | Install git pre-commit hook |
+| `uninstall-hook` | Remove git hook |
 | `clean` | Remove all code-lod data |
 
 ## Architecture
@@ -80,7 +83,7 @@ Code LoD generates, manages, and updates code descriptions through a multi-layer
 1. **Parsing** (`parsers/`): Tree-sitter based parsers extract code entities (functions, classes, modules) with AST hashes
 2. **Hashing** (`hashing.py`): AST hashes are computed on normalized source to detect semantic changes
 3. **Staleness Tracking** (`staleness.py`): Uses the hash index to determine if descriptions need regeneration
-4. **Generation** (`llm/description_generator/`): Abstract `BaseGenerator` interface for LLM providers (Anthropic, OpenAI, Mock)
+4. **Generation** (`llm/description_generator/`): LLM provider implementations (OpenAI, Anthropic, Ollama, Mock) with auto-detection and scope-specific model selection
 5. **Storage** (`db.py`, `lod_file/`): Dual storage system with SQLite database and `.lod` files
 
 ### Storage
@@ -92,6 +95,46 @@ Code LoD uses a dual storage system:
 
 Descriptions are organized by hierarchical scope: `project` > `package` > `module` > `class` > `function`.
 
+## LLM Provider Configuration
+
+Code LoD supports multiple LLM providers for generating descriptions:
+
+### Supported Providers
+
+- **OpenAI**: GPT-4, GPT-4o, GPT-3.5-turbo
+- **Anthropic**: Claude Sonnet, Claude Haiku, Claude Opus
+- **Ollama**: Local models (e.g., llama2, mistral, codellama)
+- **Mock**: Placeholder descriptions for testing (no API key required)
+
+### Configuration
+
+Set your API key via environment variables:
+
+```bash
+# For OpenAI
+export OPENAI_API_KEY="sk-..."
+
+# For Anthropic
+export ANTHROPIC_API_KEY="sk-ant-..."
+```
+
+Code LoD auto-detects the available provider from environment variables. Configure different models for different scopes:
+
+```bash
+# Set model for all scopes
+code-lod config set-model --provider openai --model gpt-4o
+
+# Set model for specific scope
+code-lod config set-model --scope function --provider openai --model gpt-4o
+code-lod config set-model --scope project --provider anthropic --model claude-sonnet
+```
+
+For Ollama (local models):
+
+```bash
+code-lod config set-model --provider ollama --model codellama
+```
+
 ## Development
 
 ```bash

docs/architecture.md

Lines changed: 90 additions & 25 deletions
@@ -39,19 +39,23 @@ Code LoD uses a modular architecture with clear separation of concerns:
 
 ## Components
 
-### CLI Layer (`cli.py`)
-
-Built with [Typer](https://typer.tiangolo.com/), the CLI provides all user-facing commands:
-
-| Command | Handler |
-|---------|---------|
-| `init` | Creates project structure and config |
-| `generate` | Parses code, generates descriptions |
-| `status` | Reports freshness status |
-| `validate` | Checks for stale descriptions |
-| `update` | Regenerates stale descriptions |
-| `read` | Outputs descriptions for LLMs |
-| `install-hook` | Installs git hooks |
+### CLI Layer (`cli/`)
+
+Built with [Typer](https://typer.tiangolo.com/), the CLI provides all user-facing commands. Each command is implemented in its own module:
+
+| Command | Handler | Description |
+|---------|---------|-------------|
+| `init` | `cli/init.py` | Creates project structure and config |
+| `generate` | `cli/generate.py` | Parses code, generates descriptions |
+| `status` | `cli/status.py` | Reports freshness status |
+| `validate` | `cli/validate.py` | Checks for stale descriptions |
+| `update` | `cli/update.py` | Regenerates stale descriptions |
+| `read` | `cli/read.py` | Outputs descriptions for LLMs |
+| `config` | `cli/config.py` | Get/set configuration values |
+| `config set-model` | `cli/config.py` | Configure LLM models per scope |
+| `install-hook` | `cli/hooks.py` | Installs git hooks |
+| `uninstall-hook` | `cli/hooks.py` | Removes git hooks |
+| `clean` | `cli/clean.py` | Removes all code-lod data |
 
 ### Parser Layer (`parsers/`)
 
@@ -166,22 +170,67 @@ def authenticate_user(username: str, password: str) -> str:
 - **SQLite**: Fast lookups, caching, revert detection
 - **.lod files**: Human-readable, version-controlled, LLM-consumable
 
-### LLM Integration (`llm/`)
+### LLM Integration (`llm/description_generator/`)
+
+Abstract interface for description generation with multiple provider implementations:
+
+```python
+class Provider(str, Enum):
+    OPENAI = "openai"
+    ANTHROPIC = "anthropic"
+    OLLAMA = "ollama"
+    MOCK = "mock"
+
+def get_generator(
+    provider: Provider | None = None,
+    model: str | None = None,
+) -> DescriptionGenerator:
+    """Get a description generator instance."""
+```
+
+#### Provider Implementations
+
+**OpenAI** (`openai.py`): GPT-4, GPT-4o, GPT-3.5-turbo via `openai` package
+
+**Anthropic** (`anthropic.py`): Claude Sonnet, Claude Haiku, Claude Opus via `anthropic` package
+
+**Ollama** (`ollama.py`): Local models (codellama, mistral, llama2, etc.) via `ollama` package
 
-Abstract interface for description generation:
+**Mock** (`mock.py`): Placeholder descriptions for testing (no API key required)
+
+#### Auto-Detection
+
+The `get_generator()` function auto-detects providers from environment variables:
+1. Checks `ANTHROPIC_API_KEY` → uses Anthropic
+2. Checks `OPENAI_API_KEY` → uses OpenAI
+3. Falls back to Mock generator
+
+#### Scope-Specific Models
+
+Configure different models for different hierarchical scopes:
+
+```bash
+code-lod config set-model --scope function --provider openai --model gpt-4o
+code-lod config set-model --scope project --provider anthropic --model claude-sonnet
+```
+
+#### Base Generator Interface
 
 ```python
-class BaseGenerator(ABC):
+class DescriptionGenerator(ABC):
     @abstractmethod
-    def generate(self, entity: ParsedEntity) -> str:
+    def generate(self, entity: ParsedEntity, context: str | None = None) -> str:
         """Generate a description for a code entity."""
-```
 
-Currently uses a mock generator. Planned providers:
+    @abstractmethod
+    def generate_batch(self, entities: list[ParsedEntity], context: str | None = None) -> list[str]:
+        """Generate descriptions for multiple entities."""
+```
 
-- OpenAI (GPT-4, o1)
-- Anthropic (Claude)
-- Ollama (local models)
+The `BaseLLMDescriptionGenerator` provides:
+- Prompt templates for function, class, and module scopes
+- Source truncation for large code blocks
+- Automatic fallback to mock on API errors
 
 ## Data Models
 
@@ -215,15 +264,31 @@ class ParsedEntity:
 ```
 src/code_lod/
 ├── __init__.py
-├── cli.py                # Main CLI commands
-├── config.py             # Configuration management
+├── __main__.py           # Entry point
+├── cli/                  # CLI commands (one per file)
+│   ├── __init__.py       # Main app, command registration
+│   ├── clean.py          # Clean command
+│   ├── config.py         # Config and set-model commands
+│   ├── generate.py       # Generate command
+│   ├── hooks.py          # install-hook, uninstall-hook
+│   ├── init.py           # Init command
+│   ├── read.py           # Read command
+│   ├── status.py         # Status command
+│   ├── update.py         # Update command
+│   └── validate.py       # Validate command
+├── config.py             # Configuration and paths management
 ├── db.py                 # SQLite database layer
 ├── hashing.py            # AST hash computation
 ├── models.py             # Pydantic data models
 ├── staleness.py          # Staleness tracking
 ├── llm/                  # LLM integration
 │   ├── __init__.py
-│   └── generator.py      # Base generator interface
+│   └── description_generator/
+│       ├── generator.py  # Base classes, Provider enum, get_generator()
+│       ├── anthropic.py  # Anthropic Claude provider
+│       ├── openai.py     # OpenAI provider
+│       ├── ollama.py     # Ollama local models provider
+│       └── mock.py       # Mock generator for testing
 ├── parsers/              # Code parsers
 │   ├── __init__.py
 │   ├── base.py           # BaseParser interface

docs/commands.md

Lines changed: 45 additions & 1 deletion
@@ -379,5 +379,49 @@ code-lod config languages
 code-lod config languages python,javascript
 ```
 
+---
+
+## `config set-model`
+
+Configure LLM models per scope.
+
+```bash
+code-lod config set-model [OPTIONS]
+```
+
+### Options
+
+| Option | Short | Default | Description |
+|--------|-------|---------|-------------|
+| `--scope` | `-s` | `None` | Scope to configure (project, package, module, class, function) |
+| `--provider` | `-p` | `None` | LLM provider (openai, anthropic, ollama, mock) |
+| `--model` | `-m` | `None` | Model name |
+
+### Provider Options
+
+| Provider | Environment Variable | Default Models |
+|----------|---------------------|----------------|
+| `openai` | `OPENAI_API_KEY` | gpt-4o, gpt-4-turbo, gpt-3.5-turbo |
+| `anthropic` | `ANTHROPIC_API_KEY` | claude-sonnet, claude-haiku, claude-opus |
+| `ollama` | (none) | codellama, mistral, llama2, etc. |
+| `mock` | (none) | (no API key required) |
+
+### Examples
+
+```bash
+# Set model for all scopes
+code-lod config set-model --provider openai --model gpt-4o
+
+# Set different models for different scopes
+code-lod config set-model --scope function --provider openai --model gpt-4o
+code-lod config set-model --scope project --provider anthropic --model claude-sonnet
+
+# Use Ollama for local generation
+code-lod config set-model --provider ollama --model codellama
+
+# Use mock for testing (no API key)
+code-lod config set-model --provider mock
+```
+
 !!! note
-    This command is not yet fully implemented.
+    If no provider is specified, Code LoD auto-detects from environment variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`).
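The per-scope settings written by `config set-model` imply a lookup that prefers the scope's entry and falls back to a project-wide default. A hypothetical sketch; the actual `model_settings` schema is not shown in this commit:

```python
def resolve_model(scope: str, model_settings: dict) -> tuple[str, str]:
    """Return (provider, model) for a scope, falling back to the default entry."""
    entry = model_settings.get("scopes", {}).get(scope) or model_settings.get("default", {})
    return entry.get("provider", "mock"), entry.get("model", "")

# Example settings as two set-model invocations might produce them
model_settings = {
    "default": {"provider": "openai", "model": "gpt-4o"},
    "scopes": {"project": {"provider": "anthropic", "model": "claude-sonnet"}},
}
```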
