dev4pgh/pittsburghese-training

Pittsburghese Training

Training code and dataset for a small model that rewrites standard American English into Pittsburghese.

This repository contains the training data, fine-tuning scripts, local inference scripts, and browser export scripts for the project.

Base model: Qwen/Qwen2.5-0.5B-Instruct

Repository layout

data/
  pittsburghese_dataset_manual_edit.jsonl
  pittsburghese_dataset_expansion_batch1.jsonl
  pittsburghese_dataset_expansion_batch2_long.jsonl
  pittsburghese_dataset_expansion_batch4_literal_preservation.jsonl
  pittsburghese_dataset_expansion_batch5_grammar_preservation.jsonl

phase1_prep_dataset_prompt_completion.py
phase2_finetune_prompt_completion.py
phase3_inference_prompt_completion.py
phase4_export_web_prompt_completion.py

requirements.txt
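
The files under data/ are JSONL, so each non-empty line should hold one JSON value. This README does not document the record schema, so the sketch below assumes nothing about field names: it only counts records and reports which top-level keys appear. The helper name summarize_jsonl is hypothetical and is not part of the repository.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, sorted top-level keys seen) for one JSONL file."""
    count = 0
    keys = set()
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({error.msg})"
                ) from error
            count += 1
            if isinstance(record, dict):
                keys.update(record)  # collect key names only, not values
    return count, sorted(keys)


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for jsonl_path in sorted(Path("data").glob("*.jsonl")):
        total, seen_keys = summarize_jsonl(jsonl_path)
        print(f"{jsonl_path.name}: {total} records, keys: {seen_keys}")
```

Running it before phase 1 is a cheap way to catch a malformed line in a hand-edited dataset file.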

Setup

Install PyTorch for your system first, then install the Python dependencies:

pip install -r requirements.txt

The repository currently works with these versions:

  • transformers==4.57.6
  • huggingface_hub==0.36.2
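
To confirm that the active environment matches the versions listed above, a small check along these lines may help. It is a sketch, not a script shipped with the repository: the check_versions helper and the EXPECTED mapping are hypothetical names, and the version numbers are copied from this README rather than read from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from this README; adjust if requirements.txt changes.
EXPECTED = {
    "transformers": "4.57.6",
    "huggingface_hub": "0.36.2",
}


def check_versions(expected, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for package, (wanted, installed) in problems.items():
        found = installed or "not installed"
        print(f"{package}: expected {wanted}, found {found}")
    if not problems:
        print("Installed versions match the README.")
```

A mismatch does not necessarily mean the scripts will fail; it only flags a difference from the combination reported as working here.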

Run order

1. Prepare the dataset

python phase1_prep_dataset_prompt_completion.py

This creates:

pittsburghese_hf_prompt_completion_dataset/

2. Fine-tune the model

python phase2_finetune_prompt_completion.py

This creates:

pittsburghese-lora-prompt-completion/
pittsburghese-merged-prompt-completion/

3. Test locally

Interactive mode:

python phase3_inference_prompt_completion.py

Batch mode:

python phase3_inference_prompt_completion.py --batch

Validation set check:

python phase3_inference_prompt_completion.py --eval

Compare against the base model:

python phase3_inference_prompt_completion.py --compare "Please clean up the kitchen before the guests arrive."

4. Export for the browser

python phase4_export_web_prompt_completion.py

This creates:

pittsburghese-web/
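
The four steps above can also be chained from one place. The sketch below is an assumption about how that might be done and is not a script in this repository: it runs each phase with the current interpreter and stops at the first failure. It uses the --eval flag for phase 3 on the assumption that this mode needs no interactive input; the function names build_commands and run_all are hypothetical.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the phase commands in the run order described in this README."""
    return [
        [python, "phase1_prep_dataset_prompt_completion.py"],
        [python, "phase2_finetune_prompt_completion.py"],
        # Assumed to be non-interactive; the default mode is interactive.
        [python, "phase3_inference_prompt_completion.py", "--eval"],
        [python, "phase4_export_web_prompt_completion.py"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True aborts on a non-zero exit code."""
    for command in commands:
        print("Running:", " ".join(command))
        runner(command, check=True)


if __name__ == "__main__":
    # Assumes the working directory is the repository root.
    run_all(build_commands())
```

Because subprocess.run raises CalledProcessError when check=True and a phase exits with an error, a failed fine-tune will not be followed by an export of stale weights.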

Notes

This repository is for training and export.

Keep generated artifacts such as merged model weights, LoRA outputs, dataset directories, and ONNX exports out of Git history, and publish them separately where appropriate.
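
One way to keep those artifacts out of Git is to list the output directories in .gitignore. The sketch below appends any that are missing. The directory names come from the run order above; whether the repository already ships a .gitignore is not stated here, so the script handles both cases. The helper names are hypothetical.

```python
from pathlib import Path

# Output directories named in the run order section of this README.
GENERATED_DIRS = [
    "pittsburghese_hf_prompt_completion_dataset/",
    "pittsburghese-lora-prompt-completion/",
    "pittsburghese-merged-prompt-completion/",
    "pittsburghese-web/",
]


def missing_ignore_entries(gitignore_text, entries=GENERATED_DIRS):
    """Return the entries that do not already appear as a line in the text."""
    present = {line.strip() for line in gitignore_text.splitlines()}
    return [entry for entry in entries if entry not in present]


def update_gitignore(path=".gitignore"):
    """Append missing entries to the file, creating it if needed."""
    gitignore = Path(path)
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    missing = missing_ignore_entries(text)
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with gitignore.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing


if __name__ == "__main__":
    added = update_gitignore()
    print("Added:", added if added else "nothing, all entries already present")
```

The check is an exact line match, so an existing pattern that already covers a directory in a different form (for example a wildcard) would not be detected and the entry would be added anyway.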
