Fleet AI — Real-Time ML Inference System

Production-style REST API for serving the Fleet AI tyre defect detection model. Built to show what happens after training — how you actually get a model into a state where it can handle real traffic reliably.

What this is

Most ML projects stop at training. This one goes further — it wraps the trained MobileNetV2 model in a proper API server with:

Sub-100ms inference latency on CPU
Request logging with unique IDs and timestamps
Live latency metrics (avg, p95, p99)
Load testing to prove it under concurrent traffic
A real-time monitoring dashboard
Docker container for reproducible deployment
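
The request logging listed above could look roughly like the sketch below. This is not the project's `core/logger.py`: the `log_request` and `timed` helpers and the record fields (`request_id`, `timestamp`, `prediction`, `latency_ms`) are assumptions chosen for illustration. Each request gets a UUID and a UTC timestamp and is appended as one JSON line.

```python
import json
import time
import uuid
from datetime import datetime, timezone
from pathlib import Path


def log_request(log_path, prediction, latency_ms):
    """Append one request record as a JSON line and return the record."""
    record = {
        "request_id": uuid.uuid4().hex,                       # unique per request
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "prediction": prediction,                             # hypothetical field
        "latency_ms": round(latency_ms, 2),
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the log folder on first use
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

An append-only JSON Lines file keeps each write independent, so a dashboard can read the tail of the file without parsing the whole log as one document.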

Performance (from load test)

Metric	Result
Avg inference latency	15.87ms
p95 latency	21.8ms
p99 latency	31.44ms
Requests/sec (20 users)	13–15 req/s
Total requests served	375
Success rate	100%
Model load time	72ms
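
For reference, average, p95, and p99 figures like those in the table can be derived from a list of per-request latencies. The sketch below uses the nearest-rank percentile definition, which is an assumption; the project's own aggregation may interpolate differently. The `percentile` and `summarize` names are hypothetical and the sample values are made up.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Return avg, p95 and p99 for a list of latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "avg": sum(ordered) / len(ordered),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


# Made-up samples: 1 ms, 2 ms, ..., 100 ms
stats = summarize([float(i) for i in range(1, 101)])
print(stats)
```

Tail percentiles matter here because a low average can hide a small share of slow requests; p99 reports the latency that only about 1 in 100 requests exceeds.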

API endpoints

Method	Endpoint	Description
GET	`/health`	Server health, uptime, CPU/memory
POST	`/predict`	Single tyre image → prediction + confidence
POST	`/predict/batch`	Multiple images → results + fleet summary
GET	`/metrics`	Latency stats, success rate, prediction distribution
GET	`/metrics/logs`	Last N request logs
GET	`/model/info`	Model architecture and training metadata
GET	`/docs`	Auto-generated Swagger UI
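
A client call to `/predict` might look like the following standard-library sketch. The multipart field name `file`, the upload filename `tyre.jpg`, and the assumption that the response body is JSON are all unverified guesses; check the Swagger UI at `/docs` for the actual request schema.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, base_url="http://localhost:8000"):
    """POST one image to /predict and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        # "file" is an assumed field name, not confirmed by the project
        body, header = build_multipart("file", "tyre.jpg", f.read())
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With the server running, `predict("path/to/image.jpg")` would return whatever JSON the endpoint produces.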

Quickstart

git clone https://github.com/vanshk3/ml-inference
cd ml-inference
pip install -r requirements.txt

Copy your trained model weights from Fleet AI:

mkdir models
cp ../fleet_ai/models/best_model.pth models/
cp ../fleet_ai/models/metadata.json models/

Start the server:

uvicorn app.server:app --host 0.0.0.0 --port 8000

Open Swagger docs at: http://localhost:8000/docs

Run the monitoring dashboard

streamlit run monitoring/dashboard.py

The dashboard opens at http://localhost:8501 and shows live latency, prediction distribution, and request logs.

Load testing

locust -f tests/load_test.py --host=http://localhost:8000 \
       --users 20 --spawn-rate 5 --run-time 30s --headless

Or drop `--headless` and open the Locust web UI at http://localhost:8089 for interactive control.

Docker

docker build -t fleet-ai-inference .
docker run -p 8000:8000 -v "$(pwd)/models:/app/models" fleet-ai-inference

Project structure

ml-inference/
├── app/
│   └── server.py          FastAPI server — all endpoints
├── core/
│   ├── model.py           model loading, singleton registry, inference
│   └── logger.py          request logging, stats aggregation
├── monitoring/
│   └── dashboard.py       Streamlit real-time dashboard
├── tests/
│   └── load_test.py       Locust load testing
├── models/                put best_model.pth and metadata.json here
├── logs/                  auto-created, stores requests.jsonl
├── Dockerfile
└── requirements.txt
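
The "singleton registry" noted for `core/model.py` usually means the model is loaded once and then shared across requests. A minimal sketch of that pattern follows. The `ModelRegistry` class and the injected `loader` callable are hypothetical, and the stand-in loader here replaces whatever PyTorch loading code the real module uses.

```python
import threading


class ModelRegistry:
    """Load a model lazily, exactly once, and share it across threads."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds and returns the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: skip the lock once the model is loaded.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


load_calls = []


def fake_loader():
    """Stand-in for an expensive weight-loading step (hypothetical)."""
    load_calls.append(1)
    return {"name": "stand-in model"}


registry = ModelRegistry(fake_loader)
threads = [threading.Thread(target=registry.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Loading once at first use keeps the model load cost out of every later request, which is what makes per-request latency depend only on inference.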

Tech stack

Python · FastAPI · PyTorch · Uvicorn · Locust · Streamlit · Docker · psutil

This project is a production-style ML inference API serving the Fleet AI tyre defect model, built on FastAPI and Uvicorn. The model loads in 72ms. Under 20 concurrent users, the server sustained 13–15 req/s at an average latency of 15.87ms (p99 31.44ms), with a 100% success rate across 375 requests. It includes a real-time monitoring dashboard, request logging, and Docker containerisation.

Built by Vansh, MSc Data Science, University of Bath. Part of the Fleet AI project.
