This is the largest remaining feature — the distributed mesh LLM described in the whitepaper and issue #27. The mesh LLM is an inter-model Mixture-of-Experts system where GPU donor nodes each run a small language model, a distributed router selects K-of-N experts per token, and the system self-prompts to improve the cluster.
This issue supersedes #27 and provides the detailed implementation breakdown.
Architecture (from whitepaper)
Each GPU donor runs a complete small model (LLaMA-3-8B at 4-bit quantization, ~4-6GB VRAM)
Distributed router selects K-of-N expert nodes per output token
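The K-of-N routing step above can be pictured with a minimal sketch; this is an illustration under assumed names (`ExpertId`, `route_top_k` are hypothetical, not from the spec), not the implementation:

```rust
// Hypothetical sketch of K-of-N expert selection: the router scores each
// registered expert node for the current token and picks the top K.

type ExpertId = u32;

/// Pick the K highest-scoring experts out of N candidates.
/// `scores` pairs each expert with a routing score (e.g. a gating logit).
fn route_top_k(mut scores: Vec<(ExpertId, f32)>, k: usize) -> Vec<ExpertId> {
    // Sort descending by score; ties broken arbitrarily.
    scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scores.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    // Four candidate experts, select K = 2.
    let scores = vec![(0, 0.1), (1, 0.9), (2, 0.4), (3, 0.7)];
    let chosen = route_top_k(scores, 2);
    assert_eq!(chosen, vec![1, 3]);
    println!("routed to experts {:?}", chosen);
}
```

In a real router the scores would come from a learned gating function and selection would have to tolerate unreachable nodes, but the K-of-N shape is the same.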
Testing (Principle V)
Measure tokens/second at various K values and latencies
Test kill switch → verify immediate halt
Test self-prompting loop → verify actionable output
Test action tier escalation → verify governance gating
Test with heterogeneous models (different sizes, same tokenizer)
Test graceful degradation with fewer than 280 nodes
Bandwidth measurement: verify <2KB per expert per token
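The <2KB budget is easy to sanity-check with arithmetic. The wire format assumed below (top-k sparse logits as (token id, logit) pairs plus a small framing header) is an illustration only, not the actual protocol:

```rust
// Back-of-envelope check that a sparse logit payload fits the
// <2 KB per-expert-per-token budget, under an assumed encoding.

const ENTRY_BYTES: usize = 4 + 4; // u32 token id + f32 logit
const HEADER_BYTES: usize = 16;   // illustrative framing overhead

fn payload_bytes(top_k: usize) -> usize {
    HEADER_BYTES + top_k * ENTRY_BYTES
}

fn main() {
    let bytes = payload_bytes(200); // top-200 sparse logits per expert
    assert!(bytes < 2048, "payload exceeds 2 KB budget");
    println!("top-200 sparse logits: {} bytes", bytes); // 1616 bytes
}
```

Halving the logit width to f16 would roughly double the number of entries that fit in the same budget.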
Three uses (from #27)
Fractal scaling (from #27)
Components (from spec Phase 9, T111-T119)
src/agent/mesh_llm/router.rs: K-of-N expert selection per token, LLaMA-3 tokenizer
src/agent/mesh_llm/expert.rs: registration, health tracking, capacity reporting
src/agent/mesh_llm/aggregator.rs: sparse logit aggregation, weighted average, sampling
src/agent/mesh_llm/self_prompt.rs: autonomous agent generating improvement tasks
src/agent/mesh_llm/subset.rs: independent parallel agent subsets for concurrent tasks
src/agent/mesh_llm/safety.rs: action tier classification, governance kill switch
proto/mesh_llm.proto: RegisterExpert, GetRouterStatus, SubmitSelfTask, HaltMesh
Action tiers (from whitepaper)
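As one way to picture the aggregator's job, here is a hedged sketch of weighted sparse-logit averaging, with greedy argmax standing in for sampling; the function names and in-memory representation are hypothetical, not taken from aggregator.rs:

```rust
use std::collections::HashMap;

// Illustrative sketch: each selected expert returns a sparse map of
// token id -> logit, and the aggregator combines them as a weighted
// average before picking the next token.

/// Weighted average of sparse logit maps. `experts` pairs a routing
/// weight with that expert's sparse logits; weights are normalized here.
fn aggregate(experts: &[(f32, HashMap<u32, f32>)]) -> HashMap<u32, f32> {
    let total: f32 = experts.iter().map(|(w, _)| w).sum();
    let mut merged: HashMap<u32, f32> = HashMap::new();
    for (w, logits) in experts {
        for (&tok, &logit) in logits {
            // Tokens missing from an expert's top-k implicitly contribute 0.
            *merged.entry(tok).or_insert(0.0) += (w / total) * logit;
        }
    }
    merged
}

/// Greedy stand-in for sampling: take the highest aggregated logit.
fn argmax(logits: &HashMap<u32, f32>) -> u32 {
    *logits
        .iter()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(tok, _)| tok)
        .unwrap()
}

fn main() {
    let a = HashMap::from([(10, 2.0), (11, 1.0)]);
    let b = HashMap::from([(10, 0.0), (12, 3.0)]);
    let merged = aggregate(&[(1.0, a), (1.0, b)]);
    // With equal weights: token 10 -> 1.0, token 11 -> 0.5, token 12 -> 1.5.
    assert_eq!(argmax(&merged), 12);
    println!("next token id: {}", argmax(&merged));
}
```

A production aggregator would sample from a softmax over the merged logits rather than taking the argmax, but the sparse merge is the part that keeps per-token bandwidth low.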
Phased rollout
Requirements
Success Criteria
Notes
This is a major undertaking that should be broken into sub-tasks during planning. The phased rollout means Phase 0-1 (centralized model, read-only) can ship first, with distributed ensemble features enabled at each phase transition via governance vote.
References:
research/09-mesh-llm.md
research/10-prior-art-distributed-inference.md