A neural network learns to play a 2D platformer in real time, entirely in the browser. Training, inference, and visualization run client-side via TensorFlow.js on WebGPU.
Live Demo · Hugging Face Space
A small agent spawns on a procedurally-generated Mario-style level and learns from scratch via REINFORCE policy gradients. You can watch the neural activations fire in real time as it figures out how to jump gaps and reach the goal flag.
Two swappable architectures (CoFrNet is the default):
- CoFrNet — a continued-fraction network (Puri et al., NeurIPS 2021): parallel ladders of nested fractions with pole-safe reciprocals (see the sketch after this list). Comes with a dedicated ladder-rung visualization and a live feature-attribution panel. Interpretable by construction — you can read which input features drive each action directly from the weights.
- MLP — a classic two-layer perceptron (24 → 64 → 4) with tanh activation and a live node-and-edge visualization.
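To make the continued-fraction idea concrete, here is a minimal sketch of how a single ladder could fold its rungs into a nested fraction with a pole-safe reciprocal. The epsilon, the per-rung linear parameterization, and the ladder-to-logit wiring are illustrative assumptions, not the code in src/net/cofrnet.ts.

```ts
// Illustrative only: one CoFrNet-style ladder evaluated as a continued fraction.
const EPS = 1e-2; // assumed lower bound keeping the reciprocal away from its pole

// Reciprocal that never divides by a value too close to zero.
function safeReciprocal(z: number): number {
  const sign = z >= 0 ? 1 : -1;
  return 1 / (sign * Math.max(Math.abs(z), EPS));
}

// One ladder of depth D over input x (the 24-dim observation). Each rung d
// applies its own linear map w_d · x, and rungs fold bottom-up as
//   w_1·x + 1 / (w_2·x + 1 / ( ... + 1 / (w_D·x) ... ))
function ladderForward(x: number[], rungWeights: number[][]): number {
  const dot = (w: number[]) => w.reduce((s, wi, i) => s + wi * x[i], 0);
  let acc = dot(rungWeights[rungWeights.length - 1]); // deepest rung first
  for (let d = rungWeights.length - 2; d >= 0; d--) {
    acc = dot(rungWeights[d]) + safeReciprocal(acc);
  }
  return acc;
}
```

Because every rung is a linear map of the raw observation, the rung weights can be read off directly, which is the property the feature-attribution panel builds on.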
Requires a browser with WebGPU support (Chrome/Edge 113+).
```bash
npm install
npm run dev
```

You should see backend=webgpu in the footer. After a few minutes the agent should start reliably reaching the flag.
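If the footer reports a different backend, the browser most likely lacks WebGPU. As a rough sketch (not necessarily how src/main.ts does it), TensorFlow.js backend selection with a WebGL fallback looks like this:

```ts
import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-webgpu'; // registers the "webgpu" backend

async function initBackend(): Promise<string> {
  // Prefer WebGPU; fall back to WebGL if the browser lacks support.
  const ok = await tf.setBackend('webgpu').catch(() => false);
  if (!ok) await tf.setBackend('webgl');
  await tf.ready();
  return tf.getBackend(); // the string shown in the footer, e.g. "webgpu"
}
```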
```bash
npm test        # unit tests (vitest, CPU backend)
npm run build   # typecheck + production bundle
```

```
src/
├── game/
│ ├── level.ts # seeded procedural level generation
│ ├── world.ts # physics, obs encoding (24-dim), rewards
│ └── render.ts # Three.js Mario-themed renderer
├── net/
│ ├── policy.ts # Policy interface + Activations union type
│ ├── mlp.ts # 2-layer MLP (~1,860 params)
│ ├── cofrnet.ts # CoFrNet-F (4 ladders × depth 8, ~900 params)
│ └── reinforce.ts # REINFORCE with Adam optimizer
├── viz/
│ ├── neurons.ts # MLP activation viz (canvas 2D)
│ └── cofrnet.ts # CoFrNet ladder viz + feature attribution
├── ui/
│ ├── stats.ts # reward/loss sparklines + controls
│ └── info.ts # expandable info panel
├── main.ts # boot + RAF loop
└── styles.css
```
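The data flow below hinges on the Policy interface from src/net/policy.ts. A hypothetical sketch of its shape (names and fields are assumptions, not the real declarations):

```ts
import * as tf from '@tensorflow/tfjs';

// Per-architecture activation snapshots consumed by the viz layer.
type MlpActivations = { kind: 'mlp'; hidden: Float32Array; logits: Float32Array };
type CoFrNetActivations = { kind: 'cofrnet'; rungs: Float32Array[]; logits: Float32Array };
type Activations = MlpActivations | CoFrNetActivations;

interface Policy {
  // 24-dim observation in, probabilities over the 4 actions out, plus the
  // activation snapshot that the visualization draws each frame.
  forward(obs: Float32Array): { probs: Float32Array; activations: Activations };
  // Trainable weights handed to the REINFORCE trainer.
  variables(): tf.Variable[];
}
```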
Data flow per frame:
```
world.obs() → net.forward(obs) → { probs, activations }
 ├─ viz.draw(activations)
 ├─ sample(probs) → next action
 └─ trainer.record(obs, action, reward)
on episode done: trainer.endEpisode() → gradient step
```
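The gradient step is plain REINFORCE. A minimal sketch of what the episode update could look like, assuming a differentiable batched logits pass and illustrative hyperparameters (the discount factor, learning rate, and any baseline or normalization in src/net/reinforce.ts may differ):

```ts
import * as tf from '@tensorflow/tfjs';

const GAMMA = 0.99;                    // assumed discount factor
const optimizer = tf.train.adam(3e-3); // assumed learning rate

function endEpisode(
  logits: (obs: tf.Tensor2D) => tf.Tensor2D, // differentiable policy pass: [T, 24] -> [T, 4]
  obs: number[][],
  actions: number[],
  rewards: number[],
): void {
  // Discounted returns-to-go: G_t = r_t + GAMMA * G_{t+1}
  const returns: number[] = new Array(rewards.length);
  let g = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    g = rewards[t] + GAMMA * g;
    returns[t] = g;
  }

  // One Adam step on the REINFORCE loss: -sum_t G_t * log pi(a_t | s_t)
  optimizer.minimize(() => {
    const logp = tf.logSoftmax(logits(tf.tensor2d(obs)));    // [T, 4]
    const taken = tf.oneHot(tf.tensor1d(actions, 'int32'), 4)
      .toFloat()
      .mul(logp)
      .sum(-1);                                              // log pi(a_t | s_t), [T]
    return tf.tensor1d(returns).mul(taken).sum().neg() as tf.Scalar;
  });
}
```

Keeping the whole update inside optimizer.minimize lets TensorFlow.js compute the backward pass on the same backend used for inference.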
- docs/design.md — overall architecture and rationale
- docs/cofrnet.md — CoFrNet integration design