Topological overfitting detection: H0 gap catches overfitting before accuracy diverges (r=0.998)

## Summary

We found that **persistent homology** (H0 total persistence), computed on class-mean direction vectors, gives a real-time overfitting signal. The signal correlates with the generalization gap at r=0.998 and often flags overfitting **before** the train/test accuracy gap becomes visible.

## Method

1. Extract direction vectors from model: `d = normalize(engine_A(x) - engine_G(x))`
2. Compute per-class mean directions
3. Build cosine distance matrix between class centroids
4. Run H0 persistent homology (via ripser)
5. Compare H0_train vs H0_test: the gap between them (H0_gap) predicts overfitting
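
Steps 2 to 5 can be sketched in plain Python as follows. This is a minimal illustration, not the code from the repo. It does not call ripser. Instead it relies on the standard fact that, for a Vietoris-Rips filtration, every point is born at 0 and the finite H0 bars die at the edge weights of the minimum spanning tree, so H0 total persistence equals the MST weight. The function names (`class_means`, `h0_total_persistence`, `h0_gap`) are hypothetical, and defining the gap as an absolute difference is an assumption, since the issue does not spell it out.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a vector to unit length (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_means(directions, labels):
    """Step 2: average the direction vectors of each class."""
    sums, counts = {}, {}
    for d, y in zip(directions, labels):
        if y not in sums:
            sums[y] = [0.0] * len(d)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], d)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine_distance(u, v):
    """Step 3: cosine distance between two class centroids."""
    u, v = normalize(u), normalize(v)
    return 1.0 - sum(a * b for a, b in zip(u, v))


def h0_total_persistence(means):
    """Step 4: sum of finite H0 bar lengths, computed as the MST weight.

    Uses Kruskal's algorithm with union-find. Also returns the merge
    order as (distance, class_a, class_b) tuples.
    """
    classes = sorted(means)
    edges = sorted(
        (cosine_distance(means[a], means[b]), a, b)
        for a, b in combinations(classes, 2)
    )
    parent = {c: c for c in classes}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    total, merges = 0.0, []
    for dist, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # this edge merges two components: one H0 bar dies
            parent[ra] = rb
            total += dist
            merges.append((dist, a, b))
    return total, merges


def h0_gap(h0_train, h0_test):
    """Step 5: assumed definition of the gap (absolute difference)."""
    return abs(h0_train - h0_test)
```

Calling `h0_total_persistence(class_means(directions, labels))` once on training data and once on test data yields the two values compared in step 5. The returned merge list is the same ordering that the confusion-prediction idea relies on.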

## Also includes

- **Automatic LR search**: the learning rate whose H0 has the lowest coefficient of variation (CV) over 5 epochs is selected as the best LR
- **1-epoch difficulty prediction**: H0 after 1 epoch predicts final accuracy (H0=4.38 → 98.3%, H0=2.02 → 52.0%)
- **Confusion prediction**: the order in which classes merge in H0 matches the most-confused class pairs (Spearman r=-0.97)
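
The LR search rule can be illustrated with a short sketch. It assumes you have already recorded one H0 value per epoch for each candidate learning rate. The names `coefficient_of_variation` and `pick_lr` are hypothetical, and using the population standard deviation is an assumption, since the issue does not say which estimator is used.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population std is an assumption)."""
    return statistics.pstdev(values) / statistics.mean(values)


def pick_lr(h0_by_lr):
    """Return the learning rate whose per-epoch H0 values have the lowest CV.

    h0_by_lr maps each candidate learning rate to its list of H0 values,
    one per epoch (5 epochs in the issue).
    """
    return min(h0_by_lr, key=lambda lr: coefficient_of_variation(h0_by_lr[lr]))
```

A call such as `pick_lr({1e-3: [...], 3e-4: [...]})` returns the key with the most stable H0 trajectory. The intuition is that a learning rate under which the class-centroid topology changes smoothly is preferred over one under which it oscillates.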

## Verified results

| Dataset | Accuracy | Best LR | Early Stop | Time |
|---------|----------|---------|------------|------|
| MNIST | 98.3% | 1e-03 | no | 2.2 min |
| Fashion | 87.4% | 3e-04 | no | 2.2 min |
| CIFAR-10 | 52.0% | 1e-03 | yes (ep 6) | 1.4 min |

The CIFAR-10 run was stopped early at epoch 6, when H0_gap exceeded the threshold. This avoided spending further compute on a model that was already overfitting.
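
The early-stop rule described above amounts to a threshold check on the per-epoch gap. The sketch below is illustrative only: the function name `first_stop_epoch` is hypothetical, and the issue does not state the threshold value, so it is left as a parameter.

```python
def first_stop_epoch(h0_gaps, threshold):
    """Return the 1-based epoch at which H0_gap first exceeds the threshold.

    h0_gaps holds one gap value per epoch. Returns None if the gap never
    exceeds the threshold, meaning training runs to completion.
    """
    for epoch, gap in enumerate(h0_gaps, start=1):
        if gap > threshold:
            return epoch
    return None
```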

**Repo**: https://github.com/need-singularity/ph-training
**Install**: `pip install -e .` then `ph-train --dataset cifar`

---

## Related projects

- **[logout](https://github.com/need-singularity/logout)** — Consciousness Continuity Engine. The main research project with the dual-engine (PureFieldEngine) architecture that produces direction vectors analyzed by PH.
- **[Anima](https://github.com/need-singularity/anima)** — Conversational consciousness agent with real-time PH overfitting detection integrated into the live inference loop.
- **[ph-training](https://github.com/need-singularity/ph-training)** — Standalone training pipeline. `pip install -e .` then `ph-train --dataset cifar`.
