TomOstt/BlueOS
BlueOS

A GPU-first user-space inference runtime that orchestrates multiple frontier LLMs on a single consumer machine.

BlueOS treats your GPU as the primary processor and your CPU as an I/O coprocessor. Four specialized models (1T, 685B, and 80B parameters; the coder and fixer stages share the 685B model) collaborate through shared natural-language memory on hardware with only 16GB of VRAM — no cloud, no API keys, no subscription.

Rust orchestration (12K LOC) + C++/CUDA backend (5.7K LOC), targeting Linux userspace.


Why BlueOS Exists

Commodity hardware is massively underutilized for LLM inference. A single RTX 4080 has 700 GB/s memory bandwidth, 16,384 CUDA cores, and tensor cores designed for matrix math — but generic OS abstractions treat GPUs as a peripheral device behind a slow PCIe bus. The result: most consumer hardware achieves 10-20% of its theoretical throughput during inference.

The insight: treat the GPU as the primary compute engine and the CPU as an I/O coprocessor, not the other way around. This is the same architecture that made the N64 RCP revolutionary — dedicate the fastest silicon to the hot path, and use everything else to keep it fed. BlueOS runs entirely in userspace (no kernel modules), relying on SCHED_DEADLINE, mlock, and CUDA streams for the control a kernel module would otherwise provide.

The second insight: one model doing everything is worse than four models each doing one thing. A 1T-parameter thinker reasons about the problem. A 685B coder writes the solution. An 80B reviewer finds the bugs. The coder fixes them. They communicate through a shared natural-language memory called the Thinking Buffer — each model reads enriched context from prior stages without knowing other models exist.
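The slot mechanics can be sketched in a few lines of Rust. This is an illustrative model of the Thinking Buffer described above, not the actual BlueOS API: the type and method names are assumptions.

```rust
use std::collections::HashMap;

/// Minimal sketch of a named-slot thinking buffer (illustrative, not
/// the real `thinking_buffer.rs` API).
struct ThinkingBuffer {
    slots: HashMap<String, String>,
}

impl ThinkingBuffer {
    fn new() -> Self {
        Self { slots: HashMap::new() }
    }

    /// Each cascade stage writes exactly one slot.
    fn write(&mut self, slot: &str, text: String) {
        self.slots.insert(slot.to_string(), text);
    }

    /// A stage assembles its context from the slots it is configured to
    /// read; it never sees which model produced them.
    fn assemble(&self, reads: &[&str]) -> String {
        reads
            .iter()
            .filter_map(|s| self.slots.get(*s).map(|v| format!("[{s}]\n{v}")))
            .collect::<Vec<_>>()
            .join("\n\n")
    }
}
```

The point of the sketch: the coder stage is configured to read `task` and `reasoning`, so it sees enriched context without any notion of the thinker model's existence.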


Architecture

System Layer Stack

The full BlueOS stack, from hardware to API surface. Data flows bottom-up during boot, top-down during inference.

graph TB
    subgraph API["Layer 4: Inference API"]
        GEN["blue_generate()"]
        STREAM["blue_generate_stream_poll()"]
        CANCEL["blue_cancel()"]
        KVAPI["blue_kv_save / blue_kv_load"]
    end

    subgraph CASCADE["Layer 3: Cascade Pipeline"]
        TB["Thinking Buffer<br/>(named slots)"]
        SPEC["Speculative Decoding<br/>(Qwen3-1.5B draft)"]
        STAGES["4-Stage Cascade<br/>Think → Code → Review → Fix"]
        ENTROPY["Entropy Monitor<br/>(skip / trim / budget)"]
    end

    subgraph MEM["Layer 2: Memory Subsystem"]
        T0["Tier 0: VRAM<br/>16GB · 700 GB/s"]
        T1["Tier 1: Pinned RAM<br/>48GB · 38 GB/s DDR5"]
        T2["Tier 2: NVMe RAID-0<br/>4TB · 28 GB/s"]
        DMA["DMA Engine<br/>(A-B double buffer)"]
        UMH["Unified Memory Hub"]
    end

    subgraph SCHED["Layer 1: HCAL Scheduler"]
        HCAL["Heterogeneous Compute<br/>Abstraction Layer"]
        PCORES["P-cores 0-7<br/>(attention, orchestration)"]
        ECORES["E-cores 8-23<br/>(data prep, tokenization)"]
        GPU["GPU sm_89<br/>(FFN, MoE, dequant)"]
    end

    subgraph HAL["Layer 0: Hardware Abstraction"]
        PCIE["PCIe Topology<br/>Gen4 x16 · 25 GB/s"]
        GDS["GDS / pread() fallback"]
        CPUDET["CPU Feature Detection<br/>(AVX2, FMA, AMX)"]
        BLUEIR["Blue-IR 6-op ISA<br/>(sparsity fast-path)"]
    end

    subgraph HW["Hardware"]
        HWCPU["i9-13900F<br/>8P + 16E cores"]
        HWGPU["RTX 4080 16GB<br/>sm_89 · Ada Lovelace"]
        HWRAM["64GB DDR5-5600"]
        HWNVM["4× NVMe RAID-0"]
    end

    API --> CASCADE
    CASCADE --> MEM
    MEM --> SCHED
    SCHED --> HAL
    HAL --> HW

    classDef compute fill:#2563eb,stroke:#1e40af,color:#fff
    classDef memory fill:#059669,stroke:#047857,color:#fff
    classDef io fill:#d97706,stroke:#b45309,color:#fff
    classDef hw fill:#6b7280,stroke:#4b5563,color:#fff

    class GEN,STREAM,CANCEL,KVAPI,TB,SPEC,STAGES,ENTROPY,HCAL,PCORES,ECORES,GPU compute
    class T0,T1,T2,DMA,UMH memory
    class PCIE,GDS,CPUDET,BLUEIR io
    class HWCPU,HWGPU,HWRAM,HWNVM hw

Memory Hierarchy (TV-VRAM)

Three-tier virtual VRAM makes 16GB act like 64GB+. A-B double-buffering hides PCIe latency: while the GPU computes on Buffer A, the next layer's weights stream into Buffer B via DMA. Compute time per layer (~10ms) exceeds DMA time (~8ms), so the GPU never stalls.

graph LR
    subgraph TIER0["Tier 0 — GPU VRAM (16GB)"]
        ACT["Activations<br/>KV Cache"]
        BUF_A["Buffer A<br/>(computing)"]
        BUF_B["Buffer B<br/>(DMA filling)"]
        DRAFT["Draft Model<br/>(permanent)"]
    end

    subgraph TIER1["Tier 1 — Pinned Host RAM (48GB)"]
        WARM["Warm Weights<br/>(queued models)"]
        PREFETCH["Prefetched Experts<br/>(MoE co-activation)"]
        POOL["MemAscend Pools<br/>(adaptive sizing)"]
    end

    subgraph TIER2["Tier 2 — NVMe RAID-0 (4TB)"]
        GGUF["GGUF Model Files"]
        KVSNAPSHOT["KV Cache Snapshots"]
        COLD["Cold Weights"]
    end

    TIER2 -- "28 GB/s<br/>io_uring async" --> TIER1
    TIER1 -- "25 GB/s<br/>PCIe DMA" --> TIER0
    BUF_A <-.-> BUF_B

    classDef hot fill:#dc2626,stroke:#b91c1c,color:#fff
    classDef warm fill:#d97706,stroke:#b45309,color:#fff
    classDef cold fill:#2563eb,stroke:#1e40af,color:#fff

    class ACT,BUF_A,BUF_B,DRAFT hot
    class WARM,PREFETCH,POOL warm
    class GGUF,KVSNAPSHOT,COLD cold
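Why the GPU never stalls falls out of simple pipeline arithmetic. A minimal model, using the ~10 ms compute and ~8 ms DMA figures quoted above (the pipeline itself is simplified; startup and tail effects beyond the first fill are ignored):

```rust
/// With A-B double buffering, DMA for layer i+1 overlaps compute for
/// layer i, so steady-state per-layer cost is max(compute, dma).
fn pipelined_ms(layers: u32, compute_ms: f64, dma_ms: f64) -> f64 {
    // The first layer must be DMA'd in before any compute can start.
    dma_ms + (layers as f64) * compute_ms.max(dma_ms)
}

/// Without overlap, every layer pays compute + dma serially.
fn serial_ms(layers: u32, compute_ms: f64, dma_ms: f64) -> f64 {
    (layers as f64) * (compute_ms + dma_ms)
}
```

With 80 layers at 10 ms compute and 8 ms DMA, the pipelined version is bounded by compute alone; the serial version pays the full 18 ms per layer.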

Thinking Buffer Cascade

The central innovation. Four specialized models collaborate through shared natural-language memory. Each stage reads from previous slots, writes to its own, and the entropy monitor decides whether downstream stages can be skipped.

graph LR
    TASK["Task Input"]

    subgraph S1["Stage 1: THINKER"]
        M1["Kimi K2<br/>(1T MoE, 62B active)"]
    end

    subgraph S2["Stage 2: CODER"]
        M2["DeepSeek V3.2<br/>(685B MoE, 37B active)"]
    end

    subgraph S3["Stage 3: REVIEWER"]
        M3["Dense 80B"]
    end

    subgraph S4["Stage 4: FIXER"]
        M4["DeepSeek V3.2<br/>(685B MoE, 37B active)"]
    end

    OUTPUT["Final Output"]

    TASK -- "task" --> S1
    S1 -- "reasoning" --> S2
    S2 -- "code_draft" --> S3
    S3 -- "review" --> S4
    S4 -- "final_code" --> OUTPUT

    ENT{"Entropy<br/>Monitor"}
    ENT -. "H < threshold<br/>→ skip stage" .-> S3
    ENT -. "H < threshold<br/>→ skip stage" .-> S4

    classDef think fill:#7c3aed,stroke:#6d28d9,color:#fff
    classDef code fill:#2563eb,stroke:#1e40af,color:#fff
    classDef review fill:#059669,stroke:#047857,color:#fff
    classDef entropy fill:#dc2626,stroke:#b91c1c,color:#fff

    class M1 think
    class M2,M4 code
    class M3 review
    class ENT entropy

Slot rules: Each model sees only the Thinking Buffer slots it's configured to read. No model knows about the others — they just see progressively enriched context. The entropy monitor tracks output entropy per-token: if a stage produces low-entropy output (high confidence), downstream stages that would add minimal value are skipped.
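The skip decision reduces to Shannon entropy over each token's output distribution. A sketch of that logic (the threshold value is illustrative, not the BlueOS default):

```rust
/// Shannon entropy (in bits) of one token's probability distribution.
fn shannon_entropy(probs: &[f64]) -> f64 {
    probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}

/// If a stage's mean per-token entropy is below the threshold (i.e. the
/// model was highly confident), downstream stages are skipped.
fn should_skip_downstream(per_token: &[Vec<f64>], threshold_bits: f64) -> bool {
    if per_token.is_empty() {
        return false;
    }
    let mean = per_token.iter().map(|p| shannon_entropy(p)).sum::<f64>()
        / per_token.len() as f64;
    mean < threshold_bits
}
```

A near-one-hot distribution like `[0.99, 0.01]` carries about 0.08 bits, while a uniform distribution over four tokens carries 2 bits, so confident output clears a 0.5-bit skip threshold easily.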

FFI Boundary

Rust handles orchestration, scheduling, and memory management. C++/CUDA handles kernel launches and raw GPU compute. They meet at a strict C ABI boundary — no C++ types cross the FFI.

graph TB
    subgraph RUST["Rust Side (blueos/)"]
        RAPI["API Surface<br/>CascadeExecutor, ThinkingBuffer"]
        RSCHED["HCAL Scheduler"]
        RMEM["Memory Manager<br/>TV-VRAM, DMA, UMH"]
        RSPEC["Speculative Engine"]
        RFFI["common/ffi.rs<br/>extern C declarations<br/>#[repr(C)] structs"]
    end

    subgraph FFI["FFI Boundary — C ABI"]
        HEADER["runtime.h<br/>Single source of truth"]
        TYPES["Only: i32, u32, f32, *const u8,<br/>*mut opaque handles"]
    end

    subgraph CPP["C++ / CUDA Side (sovereign/blue/)"]
        RUNTIME["BlueRuntime<br/>blue_pool_*, blue_generate"]
        MPOOL["Model Pool<br/>weight switching"]
        KERN["CUDA Kernels (sm_89)<br/>fused_dequant_gemv<br/>fused_rmsnorm<br/>sparse_ffn<br/>async_prefetch"]
        LLAMA["llama.cpp / ggml<br/>tensor ops foundation"]
    end

    RUST --> FFI
    FFI --> CPP

    classDef rust fill:#d97706,stroke:#b45309,color:#fff
    classDef ffi fill:#6b7280,stroke:#4b5563,color:#fff
    classDef cpp fill:#2563eb,stroke:#1e40af,color:#fff

    class RAPI,RSCHED,RMEM,RSPEC,RFFI rust
    class HEADER,TYPES ffi
    class RUNTIME,MPOOL,KERN,LLAMA cpp
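On the Rust side, the boundary pattern looks roughly like this. The signatures and struct fields are hypothetical stand-ins for the real `runtime.h` surface; what matters is the shape: `#[repr(C)]` scalars and zero-sized opaque handles only.

```rust
use std::os::raw::{c_char, c_int};

/// Hypothetical parameter struct in the style of the FFI described
/// above: only C-compatible scalars, fixed layout.
#[repr(C)]
#[allow(dead_code)]
struct BlueGenParams {
    max_tokens: c_int,
    temperature: f32,
    top_p: f32,
}

/// Opaque handle: Rust never sees the C++ type behind the pointer.
#[repr(C)]
struct BlueRuntimeHandle {
    _private: [u8; 0],
}

extern "C" {
    // Illustrative declaration mirroring the runtime.h style; the real
    // function list lives in common/ffi.rs.
    #[allow(dead_code)]
    fn blue_generate(
        rt: *mut BlueRuntimeHandle,
        prompt: *const c_char,
        params: *const BlueGenParams,
    ) -> c_int;
}
```

Because the struct is `#[repr(C)]`, its size and alignment are fixed by the C ABI rather than left to the Rust compiler's field reordering.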

Inference Data Flow

What happens on a single blue_generate() call, end to end.

graph TB
    REQ["Inference Request"]
    SCHED["HCAL: select execution target"]
    CHECK{"Weights in<br/>VRAM?"}
    HIT["Proceed to compute"]
    MISS["Tier 2→1→0 DMA pipeline<br/>(A-B double buffer)"]
    DRAFT["Stage 1: Draft tokens<br/>(Qwen3-1.5B on E-cores)"]
    VERIFY["Stage 2: Verify batch<br/>(target model, 1 forward pass)"]
    ACCEPT{"Tokens<br/>accepted?"}
    COMMIT["Commit to KV cache"]
    REJECT["Rewind, reduce K"]
    EXTEND["Extend generation"]
    STREAM["Stream tokens to caller<br/>(ring buffer, per-token entropy)"]
    DONE["Response complete"]

    REQ --> SCHED
    SCHED --> CHECK
    CHECK -- "hit" --> HIT
    CHECK -- "miss" --> MISS
    MISS --> HIT
    HIT --> DRAFT
    DRAFT --> VERIFY
    VERIFY --> ACCEPT
    ACCEPT -- "yes" --> COMMIT
    ACCEPT -- "no" --> REJECT
    REJECT --> DRAFT
    COMMIT --> EXTEND
    EXTEND --> STREAM
    STREAM --> DONE

    classDef hot fill:#dc2626,stroke:#b91c1c,color:#fff
    classDef compute fill:#2563eb,stroke:#1e40af,color:#fff
    classDef mem fill:#059669,stroke:#047857,color:#fff

    class DRAFT,VERIFY,EXTEND compute
    class MISS,COMMIT mem
    class STREAM,DONE hot
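The draft/verify/accept core of the flow above can be sketched as follows. This is the greedy variant for clarity (token IDs must match exactly); the real engine verifies against the target model's full distribution, as noted under Key Design Decisions.

```rust
/// One draft/verify round. `drafted` holds the K tokens the draft model
/// proposed; `target_preds` holds the target model's prediction at each
/// of those positions, computed in a single batched forward pass.
/// The longest agreeing prefix is committed, plus the target's own token
/// at the first disagreement, so every round commits at least one token.
/// Returns (committed tokens, number of drafted tokens accepted).
fn verify_round(drafted: &[u32], target_preds: &[u32]) -> (Vec<u32>, usize) {
    let mut committed = Vec::new();
    for (i, (&d, &t)) in drafted.iter().zip(target_preds).enumerate() {
        if d == t {
            committed.push(d);
        } else {
            committed.push(t); // target's correction replaces the draft
            return (committed, i); // rewind: only i drafted tokens accepted
        }
    }
    let n = drafted.len();
    (committed, n) // full acceptance
}
```

On full acceptance the engine extends generation; on partial acceptance it rewinds and (per the diagram) reduces K for the next round.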

Key Design Decisions

  • User-space over kernel module. No insmod, no root required, no kernel version coupling. SCHED_DEADLINE + mlock + io_uring give us everything we need from userspace. The blast radius of a bug is one process, not a kernel panic.

  • Rust + C++/CUDA split at FFI boundary. Rust owns orchestration (scheduling, memory management, cascade logic) because those are complex state machines where memory safety matters. C++/CUDA owns GPU compute because CUDA's C++ API is the only way to access tensor cores, shared memory, and warp-level primitives. The FFI boundary is a strict C ABI — no C++ types, no Rust types, just i32, f32, and opaque pointers.

  • Tiered virtual VRAM over unified memory. CUDA Unified Memory hides the memory hierarchy, which means you can't optimize for it. TV-VRAM makes the tiers explicit: Tier 0 (VRAM, 700 GB/s) for hot data, Tier 1 (pinned RAM, 38 GB/s) for warm weights, Tier 2 (NVMe, 28 GB/s) for cold storage. A-B double buffering ensures the GPU never stalls waiting for PCIe.

  • Speculative decoding cascade over standard autoregressive. A small draft model (Qwen3-1.5B, always resident in VRAM) generates K candidate tokens. The target model verifies them in a single forward pass. Mathematically lossless — identical output distribution. 2-4x throughput increase at zero quality cost. Adaptive K tracks acceptance rate and adjusts speculation depth.

  • Four specialized models over one generalist. A 1T thinker with full attention on reasoning outperforms a generalist splitting attention between reasoning, coding, reviewing, and fixing. Each stage builds on verified prior work in the Thinking Buffer rather than maintaining everything in one attention window.

  • PCIe topology detection at boot. Not all NVMe drives have equal latency — CPU-direct drives bypass the PCH, saving ~2µs per I/O. BlueOS enumerates PCIe topology at boot, identifies which drives are CPU-direct vs. PCH-routed, and enables GPUDirect Storage only where the topology actually supports it. If GDS isn't available, it falls back to pread() without crashing.
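The adaptive-K behavior mentioned above can be sketched as a small feedback controller. The EMA constant, thresholds, and bounds here are illustrative choices, not the BlueOS defaults:

```rust
/// Sketch of an adaptive speculation-depth controller: track the draft
/// model's acceptance rate and deepen or shrink K accordingly.
struct AdaptiveK {
    k: usize,
    accept_rate_ema: f64,
}

impl AdaptiveK {
    fn new(k: usize) -> Self {
        Self { k, accept_rate_ema: 1.0 }
    }

    /// Fold in the fraction of drafted tokens accepted this round;
    /// returns the K to use for the next round.
    fn update(&mut self, accepted: usize, drafted: usize) -> usize {
        let rate = accepted as f64 / drafted.max(1) as f64;
        self.accept_rate_ema = 0.9 * self.accept_rate_ema + 0.1 * rate;
        if self.accept_rate_ema > 0.8 {
            self.k = (self.k + 1).min(16); // draft is trusted: speculate deeper
        } else if self.accept_rate_ema < 0.4 {
            self.k = (self.k / 2).max(1); // draft keeps missing: back off
        }
        self.k
    }
}
```

The EMA keeps the controller from overreacting to a single bad round while still collapsing K quickly under sustained rejection.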


Hardware Target

| Component | Spec | Role in BlueOS |
| --- | --- | --- |
| CPU | i9-13900F (8P + 16E cores) | P-cores: attention, orchestration. E-cores: tokenization, data prep |
| GPU | RTX 4080 16GB (sm_89, Ada Lovelace) | Expert FFN, dequantization, MoE compute. 700 GB/s VRAM bandwidth |
| RAM | 64GB DDR5-5600 | Tier 1 warm weights, prefetched experts. 38 GB/s bandwidth |
| Storage | 4× NVMe RAID-0 | Tier 2 cold storage, GGUF files. 28 GB/s aggregate |
| PCIe | Gen 4 x16 | GPU ↔ RAM highway. 25 GB/s effective |

Bandwidth Cheat Sheet

GPU VRAM internal:    ~700 GB/s    (Tier 0 — activations live here)
DDR5 RAM:             ~38  GB/s    (Tier 1 — warm weights)
NVMe RAID-0:          ~28  GB/s    (Tier 2 — cold storage)
PCIe Gen4 x16:        ~25  GB/s    (GPU ↔ RAM transfer)
L2 cache (GPU):       ~3   TB/s    (on-chip, exploited by fused kernels)
L3 cache (CPU):       ~300 GB/s    (36MB shared, used by CPU attention)
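These numbers can be sanity-checked against the ~8 ms per-layer DMA figure quoted in the memory hierarchy section, assuming roughly 200 MB of weights per streamed layer (an illustrative figure; the README does not state a per-layer size):

```rust
/// Back-of-envelope transfer time in milliseconds for moving `bytes`
/// over a link with `gb_per_s` bandwidth (decimal GB, as in the sheet).
fn transfer_ms(bytes: f64, gb_per_s: f64) -> f64 {
    bytes / (gb_per_s * 1e9) * 1e3
}
```

200 MB over the 25 GB/s PCIe link takes 8 ms, matching the double-buffering math; the same layer read from VRAM at 700 GB/s takes well under a millisecond, which is why activations must live in Tier 0.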

Project Structure

blueos/                          Rust workspace root
├── Cargo.toml                   Workspace configuration
├── common/                      Shared types: tensors, GPU, entropy, FFI declarations
│   └── src/
│       ├── entropy.rs           Shannon entropy, KL divergence, mutual information
│       ├── ffi.rs               All extern "C" FFI bindings (mirrors runtime.h)
│       ├── tensor.rs            Tensor primitives and quantization types
│       ├── gpu.rs               GPU device types, topology, compute scoring
│       ├── memory.rs            Memory region types, VRAM tier definitions
│       └── error.rs             BlueError types (thiserror)
├── boot/                        6-phase hardware init → runtime handoff
│   └── src/
│       ├── main.rs              Boot sequence entry point
│       ├── cpu_isolation.rs     Core pinning, IRQ steering, power lockdown
│       ├── hal.rs               Hardware abstraction layer
│       └── pcie_topology.rs     PCIe BDF enumeration, GDS capability gating
├── kernel/                      Resource management (not a real OS kernel)
│   └── src/
│       ├── memory/
│       │   ├── tv_vram.rs       Three-tier virtual VRAM manager
│       │   ├── dma_engine.rs    A-B double-buffered DMA transfers
│       │   ├── umh.rs           Unified Memory Hub (global address space)
│       │   ├── pinned_pool.rs   Pre-allocated pinned memory slabs
│       │   ├── expert_prefetch.rs  MoE expert prefetch prediction
│       │   ├── thrash_detector.rs  Detects tier-thrashing patterns
│       │   └── ...              Tier managers, victim selection, demand paging
│       ├── scheduler/
│       │   ├── hcal.rs          HCAL: heterogeneous compute scheduler
│       │   ├── batch.rs         Non-preemptive batch execution
│       │   └── sharding.rs      Proportional work distribution
│       └── arch/mod.rs          CPU feature detection (AVX2, AVX-512, AMX)
├── runtime/                     Multi-model inference orchestration
│   └── src/
│       ├── cascade.rs           Cascade pipeline executor (Think→Code→Review→Fix)
│       ├── thinking_buffer.rs   Shared natural-language memory (named slots)
│       ├── engine.rs            MultiModelEngine — FFI wrapper around BlueRuntime
│       ├── speculative.rs       Universal draft engine (adaptive K, KL tracking)
│       ├── kv_cache.rs          KV cache library and persistence
│       ├── pipelines.rs         Pre-built cascade templates
│       ├── wal.rs               Write-ahead log for Thinking Buffer durability
│       └── profiler.rs          Per-stage timing and telemetry
├── bluefs/                      Weight-optimized I/O
│   └── src/
│       ├── gguf_parser.rs       GGUF format parsing
│       ├── weight_store.rs      Weight streaming and caching
│       └── entropy_stream.rs    Entropy-aware I/O prioritization
└── blueir/                      Custom micro-ISA for sparsity fast-path
    └── src/
        ├── isa.rs               6-instruction ISA: LOAD, STORE, SPARSE, DEQUANT, FMAC, JUMP
        ├── nasm_lower.rs        Compiles Blue-IR to x86-64 via NASM
        └── kernels/dequant.asm  Hand-written dequantization kernel

sovereign/                       C++/CUDA inference backend
├── CMakeLists.txt               Build system (CUDA sm_89, llama.cpp integration)
├── Makefile                     Convenience build/serve/bench targets
├── llama.cpp                    Git submodule — ggml tensor ops foundation
└── blue/
    ├── runtime.h                PUBLIC C API — single source of truth for FFI
    ├── runtime.cpp              API implementation
    ├── model_pool.h/.cpp        Multi-model registry, weight switching, VRAM budgets
    ├── speculative.h/.cpp       Draft engine: adaptive K, verification, KL divergence
    ├── cpu_attention.h/.cpp     CPU-side attention using AVX2+FMA on P-cores
    ├── cpu_gpu_split.h/.cpp     Parallel CPU ∥ GPU execution coordinator
    ├── kernel_dispatch.h/.cpp   Custom CUDA kernel router
    ├── dma_bridge.h/.cpp        DMA transfer management
    ├── gds_bridge.cpp           GPUDirect Storage (NVMe→VRAM bypass)
    ├── streaming.cpp            Token streaming ring buffer
    ├── watchdog.cpp             Generation timeout watchdog
    ├── memory/
    │   ├── tv_vram.cpp          Three-tier memory (C++ side)
    │   └── pinned_pool.cpp      Pre-allocated pinned CUDA memory slabs
    ├── kernels/
    │   ├── fused_dequant_gemv.cu    INT4→FP16 during GEMV (fused)
    │   ├── fused_dequant_gemm_tc.cu Tensor core GEMM with inline dequant
    │   ├── fused_rmsnorm.cu         Fused RMSNorm + scale
    │   ├── sparse_ffn.cu            Skip inactive MoE experts
    │   ├── async_prefetch.cu        Background weight DMA
    │   ├── verify_crc.cu            CRC32 DMA integrity verification
    │   └── test_correctness.cu      Kernel correctness tests
    └── cli/main.cpp             CLI entry point

tools/
└── nanoquant/admm_ptq.py        ADMM-based post-training quantization

build.sh                         Unified build script (CMake → Cargo, mock/native modes)

Building

Prerequisites

  • Rust toolchain (stable, 2021 edition) — rustup
  • CMake 3.18+
  • C++17 compiler (GCC 11+ or Clang 14+)
  • CUDA Toolkit 12.0+ (optional — for GPU kernels, targets sm_89)
  • NASM assembler (optional — for Blue-IR compiled kernels)

Build Commands

# Full build: C++ sovereign runtime → Rust workspace (release)
./build.sh

# Rust only (mock mode — no GPU required, all tests pass)
./build.sh rust-only

# C++ only (sovereign runtime + llama.cpp)
./build.sh cpp-only

# Debug build
./build.sh debug

# Run all tests
./build.sh test

# Clean everything
./build.sh clean

The build system auto-detects CUDA availability. If nvcc is not found, it builds without GPU kernel support. If libblueruntime.a is not found, the Rust workspace builds in mock mode — all orchestration logic is testable without a GPU.

Two-Phase Build

  1. CMake compiles CUDA kernels (.cu) → links llama.cpp + BlueRuntime → produces libblueruntime.a
  2. Cargo detects libblueruntime.a → enables native feature; if missing → builds in mock mode

Both modes compile and pass tests.
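The detection step in phase 2 is the usual Cargo build-script pattern. This is a hedged sketch of how such a probe typically looks; the actual paths, feature name, and logic in the workspace's build scripts may differ:

```rust
use std::path::Path;

/// Sketch of a build.rs-style probe: emit link directives and enable the
/// native cfg only when the C++ static library is actually present.
fn cargo_directives(lib_dir: &Path) -> Vec<String> {
    if lib_dir.join("libblueruntime.a").exists() {
        vec![
            format!("cargo:rustc-link-search=native={}", lib_dir.display()),
            "cargo:rustc-link-lib=static=blueruntime".to_string(),
            "cargo:rustc-cfg=feature=\"native\"".to_string(),
        ]
    } else {
        vec!["cargo:warning=libblueruntime.a not found; building in mock mode"
            .to_string()]
    }
}
```

Because the fallback is a warning rather than an error, a GPU-less CI machine still builds and tests the full Rust workspace.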


Current Status

| Module | Status | Notes |
| --- | --- | --- |
| Boot sequence (6-phase init) | ✅ Complete | PCIe topology, CPU isolation, power lockdown |
| TV-VRAM (3-tier memory) | ✅ Complete | A-B double buffering, tier management |
| DMA engine | ✅ Complete | Async transfers, CRC32 verification |
| Unified Memory Hub | ✅ Complete | Global address space across tiers |
| HCAL scheduler | ✅ Complete | Compute scoring, proportional sharding |
| Cascade pipeline | ✅ Complete | 4-stage Think→Code→Review→Fix |
| Thinking Buffer | ✅ Complete | Named slots, WAL, context assembly |
| Speculative decoding | ✅ Complete | Adaptive K, KL divergence tracking |
| Entropy monitor | ✅ Complete | Skip logic, budget trimming |
| KV cache persistence | ✅ Complete | Save/load across model switches |
| Blue-IR (6-op ISA) | ✅ Complete | NASM lowering to x86-64 |
| CUDA kernels (6 kernels) | ✅ Complete | fused_dequant_gemv, rmsnorm, sparse_ffn, etc. |
| C API (runtime.h) | ✅ Complete | Full FFI surface, streaming, hooks |
| Expert prefetch prediction | ✅ Complete | Co-activation matrix, MI scoring |
| Mock mode (GPU-free testing) | ✅ Complete | All Rust tests pass without C++ backend |
| GPUDirect Storage | 🚧 In Progress | GDS bridge exists, topology gating implemented |
| Preemptive GPU scheduling | 📋 Planned | blue_abort_stream() for entropy-triggered preemption |
| Continuous training context | 📋 Planned | DPO from cascade traces, LoRA fine-tune on idle |
| Graph-aware VRAM scheduler | 📋 Planned | Timeline-driven DMA pre-scheduling |
| Blue-IR expansion (~25 ops) | 📋 Planned | tinygrad-aligned universal compilation target |

Three Laws of BlueOS

These are architectural invariants, not guidelines. Every commit, every optimization, every design decision is measured against them.

1. Zero Degradation — No abstraction may reduce output quality below the raw model baseline. Speculative decoding is mathematically lossless. Entropy-based skipping only removes redundant stages. If you can't prove losslessness, it doesn't ship.

2. Specialization > Generalization — Four focused models each doing one thing well outperform one giant model doing everything. The Thinking Buffer Cascade is the central innovation. Each stage has full attention on ONE concern: reasoning, coding, reviewing, or fixing.

3. Hardware-First — Software is shaped by the silicon. Pin CPU cores. Steer interrupts. Manage VRAM tiers. Fuse CUDA kernels. Know your bandwidth numbers. Generic abstractions that hide hardware are the enemy. Measure the silicon, then write the code.
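The losslessness claim behind Law 1 is the standard speculative-sampling guarantee (the derivation below is the textbook one; the README asserts BlueOS's verifier satisfies it). A draft token x drawn from the draft distribution q is accepted with probability min(1, p(x)/q(x)); on rejection, the target resamples from the normalized residual max(0, p - q). The marginal over both paths is exactly the target distribution p:

```latex
\Pr[X = x]
  = \underbrace{q(x)\,\min\!\Bigl(1,\tfrac{p(x)}{q(x)}\Bigr)}_{\text{draft accepted}}
  + \underbrace{(1-\alpha)\,\frac{\max\bigl(0,\,p(x)-q(x)\bigr)}{1-\alpha}}_{\text{resampled on reject}}
  = \min\bigl(p(x),q(x)\bigr) + \max\bigl(0,\,p(x)-q(x)\bigr)
  = p(x),
```

where \(\alpha = \sum_x \min(p(x), q(x))\) is the overall acceptance probability. Speed comes only from how many tokens each verification pass commits, never from changing the output distribution.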


Roadmap

  1. Blue-IR Expansion — Expand from 6 ops to ~25 tinygrad-aligned ops. Blue-IR becomes a universal compilation target for compute hardware. New backends: PTX (sm_89), future RDNA/Metal.

  2. Graph-Aware VRAM Manager — Replace reactive LRU eviction with planned timeline-driven scheduling. Pre-schedule all DMA transfers based on the deterministic cascade plan. Zero OOM surprises.

  3. Preemptive GPU Scheduling — When the entropy monitor detects degenerate output, kill the CUDA stream, flush partial KV to checkpoint, and advance to the next cascade stage.

  4. Continuous Training Context — Capture cascade traces as DPO training data (code_draft = rejected, final_code = chosen). LoRA fine-tune the draft model during idle time. The system improves with use.

  5. Scheduling Hint API — Lock-free SPSC channel from entropy monitor to HCAL. Hint types: LOW_ENTROPY, HIGH_ENTROPY, EARLY_EXIT, PREFETCH_NOW, SKIP_STAGE.
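Since this item is still planned, here is only a shape sketch: the hint enum from the roadmap plus a non-blocking drain on the scheduler side. Rust's `std::sync::mpsc` stands in for the planned lock-free SPSC ring, and the function names are hypothetical.

```rust
use std::sync::mpsc;

/// Hint types listed in the roadmap item above.
#[derive(Debug, PartialEq)]
enum SchedHint {
    LowEntropy,
    HighEntropy,
    EarlyExit,
    PrefetchNow,
    SkipStage,
}

/// Entropy monitor holds the sender; HCAL holds the receiver.
fn hint_channel() -> (mpsc::Sender<SchedHint>, mpsc::Receiver<SchedHint>) {
    mpsc::channel()
}

/// HCAL side: drain all pending hints without ever blocking the
/// scheduling loop.
fn drain(rx: &mpsc::Receiver<SchedHint>) -> Vec<SchedHint> {
    rx.try_iter().collect()
}
```

The design point is the non-blocking drain: the scheduler polls hints once per scheduling tick rather than waking on every message.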


License

MIT
