Pittsburghese Training

Training code and dataset for a small model that rewrites standard American English into Pittsburghese.

This repository contains the training data, fine-tuning scripts, local inference scripts, and browser export scripts for the project.

Base model: Qwen/Qwen2.5-0.5B-Instruct

Repository layout

data/
  pittsburghese_dataset_manual_edit.jsonl
  pittsburghese_dataset_expansion_batch1.jsonl
  pittsburghese_dataset_expansion_batch2_long.jsonl
  pittsburghese_dataset_expansion_batch4_literal_preservation.jsonl
  pittsburghese_dataset_expansion_batch5_grammar_preservation.jsonl

phase1_prep_dataset_prompt_completion.py
phase2_finetune_prompt_completion.py
phase3_inference_prompt_completion.py
phase4_export_web_prompt_completion.py

requirements.txt

Setup

Install PyTorch for your system first, then install the Python dependencies:

pip install -r requirements.txt

The repository currently works with these pinned versions:

transformers==4.57.6
huggingface_hub==0.36.2
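
To confirm that an environment matches the two versions listed above, a minimal sketch using only the standard library could look like the following. The helper name `check_pins` is hypothetical and is not part of this repository.

```python
from importlib import metadata

# Versions copied from the README's version list.
PINS = {"transformers": "4.57.6", "huggingface_hub": "0.36.2"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means both packages are installed at the listed versions.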

Run order

1. Prepare the dataset

python phase1_prep_dataset_prompt_completion.py

This creates:

pittsburghese_hf_prompt_completion_dataset/
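
The phase 1 script is the reference for how the dataset is actually built. Purely as an illustration of the prompt/completion idea, turning JSONL rows into pairs might be sketched like this. The field names `source` and `target`, the instruction prefix, and the function name `jsonl_to_pairs` are all assumptions and are not taken from the repository.

```python
import json

# Hypothetical instruction prefix; the real prompt wording lives in the phase 1 script.
INSTRUCTION = "Rewrite in Pittsburghese: "


def jsonl_to_pairs(lines, src_key="source", tgt_key="target"):
    """Turn JSONL lines into prompt/completion dicts, skipping blank lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        pairs.append(
            {"prompt": INSTRUCTION + row[src_key], "completion": row[tgt_key]}
        )
    return pairs
```

Passing an open file handle for one of the files under data/ as `lines` would yield one pair per non-empty row, provided the keys match whatever the real files use.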

2. Fine-tune the model

python phase2_finetune_prompt_completion.py

This creates:

pittsburghese-lora-prompt-completion/
pittsburghese-merged-prompt-completion/

3. Test locally

Interactive mode:

python phase3_inference_prompt_completion.py

Batch mode:

python phase3_inference_prompt_completion.py --batch

Validation set check:

python phase3_inference_prompt_completion.py --eval

Compare against the base model:

python phase3_inference_prompt_completion.py --compare "Please clean up the kitchen before the guests arrive."

4. Export for the browser

python phase4_export_web_prompt_completion.py

This creates:

pittsburghese-web/

Notes

This repository is for training and export.

Generated artifacts (merged model weights, LoRA outputs, dataset directories, and ONNX exports) are best kept out of Git history and published separately where appropriate.
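
One way to act on this is to check that each output directory from the run-order steps is listed in an ignore file. The sketch below is an assumption about how that check could be done, not a description of the repository's own ignore rules; the function name `missing_ignore_entries` is hypothetical, and only plain `name/` style entries are recognised.

```python
# Output directories named in the run-order steps of this README.
ARTIFACT_DIRS = [
    "pittsburghese_hf_prompt_completion_dataset",
    "pittsburghese-lora-prompt-completion",
    "pittsburghese-merged-prompt-completion",
    "pittsburghese-web",
]


def missing_ignore_entries(existing_text, dirs=ARTIFACT_DIRS):
    """Return artifact directories (with trailing slash) not yet listed
    in the given ignore-file text."""
    listed = {line.strip().rstrip("/") for line in existing_text.splitlines()}
    return [d + "/" for d in dirs if d not in listed]
```

Calling it with the contents of an ignore file returns the lines that would still need to be appended.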
