Case Parser

A robust Python tool for processing anesthesia case data from Excel files and converting it to standardized case log format with ACGME web form integration.

Features

Pattern-Based Architecture: Self-contained extraction modules with clear separation of concerns
Domain-Driven Design: Typed domain models with comprehensive validation and confidence tracking.
Modern Output: Professional terminal output with color-coding, tables, and panels using the rich library.
Flexible Configuration: Customizable column mappings and processing rules
Comprehensive Extraction: Airway management, vascular access, specialized monitoring
Intelligent Categorization: Surgery-specific logic for cardiac, vascular, intracerebral, and OB/GYN procedures
Validation Reporting: Detailed validation reports in text, JSON, or Excel format.
Chrome Extension: Auto-fill ACGME case entry forms (see Chrome Extension)
Debug Tools: Interactive categorization debugger with rich formatting

Requirements

Python 3.12+
uv (recommended) or pip

Installation

Using uv (recommended)

# Install dependencies
uv sync

# Install in development mode
uv pip install -e .

# Install with dev dependencies (ruff, pytest, type stubs)
uv sync --group dev

Using pip

pip install -e .

Usage

Basic Usage

# Process a single file
python main.py input.xlsx output.xlsx

# Or use the installed command
case-parser input.xlsx output.xlsx

Advanced Usage

# Specify sheet name and default year
case-parser input.xlsx output.xlsx --sheet "Data" --default-year 2024

# Override column mappings
case-parser input.xlsx output.xlsx --col-date "Date of Service" --col-age "Patient Age"

# Enable verbose logging
case-parser input.xlsx output.xlsx --verbose

# Generate validation report
case-parser input.xlsx output.xlsx --validation-report validation.txt

CSV v2 Format (MPOG Export)

# Process a directory of CaseList/ProcedureList CSV pairs
case-parser /path/to/csv-dir/ output.xlsx --v2

Batch Processing

# Process all residents in Output-Supervised/
python batch_process.py

# Custom directories
python batch_process.py --base-dir /path/to/supervised --output-dir /path/to/output

# Sort output files to match a names list
python sort_logs.py --names-file residents.txt --input-dir Output

Debug Categorization

# Test a specific procedure categorization
case-parser debug-categorize "CABG with CPB" "CARDIAC SURGERY"

# Test with multiple services
case-parser debug-categorize "AVR" "CARDIAC,THORACIC"

# Interactive bug tracking review
case-parser debug-categorize "CABG" "CARDIAC" --bug-track

The debug command displays:

Rule matching trace with formatted tables
Pattern matches and exclusions
The final category with color-coded results
Warnings and special cases

Interactive Bug Tracking

The --bug-track flag enables an interactive review mode for categorization results:

# Review all cases in a file interactively
case-parser input.xlsx output.xlsx --bug-track

Features:

Interactive Selection: Prompts you to confirm or correct the predicted category for each case.
Matched Rules Display: Shows exactly which rules triggered the categorization during review.
Markdown Output: Generates a concise Markdown table suitable for bug trackers (e.g., GitHub Issues), including original procedure, services, predicted vs. correct category, and matched rules.
Progress Tracking: Includes a "Done" checkbox for each row in the Markdown table.

Column Mapping

The tool automatically maps common column names, but you can override them using command-line options:

--col-date: Date column name (default: "Date")
--col-episode-id: Episode ID column name (default: "Episode ID")
--col-anesthesiologist: Anesthesiologist column name (default: "Responsible Provider")
--col-age: Age column name (default: "Age At Encounter")
--col-emergent: Emergent flag column name (default: "Emergent")
--col-asa: ASA status column name (default: "ASA")
--col-final-anesthesia-type: Anesthesia type column name (default: "Final Anesthesia Type")
--col-procedure-notes: Procedure notes column name (default: "Procedure Notes")
--col-procedure: Procedure column name (default: "Procedure")
--col-services: Services column name (default: "Services")

Input File Format

File types: .xlsx, .xls, or .csv
Header row required; one row per case
Required columns (exact names unless overridden): Date, Episode ID, Responsible Provider, Age At Encounter, ASA, Final Anesthesia Type, Procedure, Services
Optional columns: Emergent, Procedure Notes
Services values must be newline-separated within the cell
Date should be parseable by pandas (recommended: MM/DD/YYYY); missing/unparseable dates fall back to --default-year (default 2025, January 1)
Age At Encounter must be numeric

For a step-by-step walkthrough, see USER_GUIDE.md.

Chrome Extension

The ACGME case entry Chrome extension lives in the chrome-extension/ submodule:

razvanazamfirei/acgme-case-parser-extension

The extension reads the Excel output produced by this tool and auto-fills ACGME case entry forms.

To clone this repo with the extension included:

git clone --recurse-submodules https://github.com/razvanazamfirei/acgme-case-parser.git

If you already cloned without --recurse-submodules:

git submodule update --init

See chrome-extension/README.md for installation and usage instructions.

Project Structure

case-parser/
├── src/
│   └── case_parser/
│       ├── __init__.py              # Package initialization
│       ├── models.py                # Data models and configuration
│       ├── domain.py                # Typed domain models
│       ├── processor.py             # Core data processing
│       ├── extractors.py            # Extraction function exports
│       ├── validation.py            # Validation and reporting
│       ├── csv_io.py                # CSV v2 format I/O
│       ├── io.py                    # Excel file I/O
│       ├── cli.py                   # Command line interface
│       ├── exceptions.py            # Custom exceptions
│       ├── logging_config.py        # Logging configuration
│       ├── ml/                      # ML-enhanced classification
│       │   ├── features.py          # Feature engineering
│       │   ├── hybrid.py            # Hybrid rule+ML classifier
│       │   ├── loader.py            # Model loader
│       │   └── predictor.py         # ML predictor
│       └── patterns/                # Pattern-based extraction
│           ├── __init__.py          # Pattern exports
│           ├── README.md            # Pattern documentation
│           ├── extraction_utils.py  # Shared utilities
│           ├── airway_patterns.py   # Airway management
│           ├── vascular_access_patterns.py  # Vascular access
│           ├── monitoring_patterns.py       # Specialized monitoring
│           ├── procedure_patterns.py        # Categorization rules
│           ├── categorization.py    # Categorization logic
│           ├── approach_patterns.py # Surgical approach detection
│           ├── age_patterns.py      # Age range categorization
│           └── anesthesia_patterns.py       # Anesthesia type mapping
├── chrome-extension/                # Chrome extension (git submodule)
├── tests/                           # Unit tests
├── batch_process.py                 # Batch process all residents
├── sort_logs.py                     # Sort output files by names list
├── debug_categorization.py          # Categorization debugger
├── main.py                          # Main entry point
├── pyproject.toml                   # Project configuration and dependencies
└── README.md                        # This file

Data Processing

The tool processes anesthesia case data and extracts:

Case Information: Episode ID, date, responsible provider
Patient Demographics: Age categorization (ACGME categories a-e)
ASA Status: Physical status with emergency flag handling
Procedure Details: Original procedure text, intelligent categorization
Anesthesia Type: Standardized anesthesia type mapping
Airway Management: Intubation techniques (ETT, LMA, DLT), laryngoscopy types (DL, VL), difficult airway indicators
Vascular Access: Arterial lines, central venous catheters, PA catheters
Specialized Monitoring: TEE, electrophysiologic monitoring, CSF drains, invasive neuro monitoring

Intelligent Categorization

Surgery-specific categorization logic:

Cardiac: Distinguishes CPB vs non-CPB procedures (TAVR, CABG, valve replacements)
Vascular: Detects endovascular vs open approach (EVAR, TEVAR, open AAA)
Intracerebral: Categorizes by approach and pathology (endovascular, vascular open, nonvascular)
OB/GYN: Differentiates cesarean vs vaginal delivery, detects labor epidurals

Validation

The tool includes comprehensive validation with modern formatted reports:

Confidence Scoring: Tracks extraction confidence for each field
Warning Detection: Identifies missing data, unparseable fields, and low confidence extractions
Multiple Formats: Generate reports as text (with rich formatting), JSON, or Excel.
Problematic Case Flagging: Automatically identifies cases needing review

# Generate validation report
case-parser input.xlsx output.xlsx --validation-report validation.txt

# JSON format
case-parser input.xlsx output.xlsx --validation-report validation.json

# Excel format
case-parser input.xlsx output.xlsx --validation-report validation.xlsx

Development

Setup

# Install all dependencies including dev tools
uv sync --group dev

# Install the package in editable mode
uv pip install -e .

Code Quality

# Format and lint
ruff format .
ruff check --fix .

# Run tests
uv run pytest

Adding New Patterns

Edit the appropriate pattern file in src/case_parser/patterns/
Add your pattern to the relevant pattern list
Update the extraction function if needed
Test with the debug script
Run linting: ruff format . && ruff check .

See src/case_parser/patterns/README.md for detailed pattern documentation.

Documentation

USER_GUIDE.md: End-user guide with CLI walkthroughs
chrome-extension/README.md: Chrome extension installation and usage
src/case_parser/patterns/README.md: Comprehensive pattern documentation with examples and debugging tips
CLAUDE.md: Detailed architectural guidance for AI-assisted development

Error Handling

The tool includes comprehensive error handling:

File Validation: Checks file existence and format before processing
Data Validation: Validates required columns and data types
Processing Errors: Graceful handling of individual row failures
Logging: Detailed logging for debugging and monitoring
Modern Output: Color-coded error messages with rich formatting

License

MIT License — see LICENSE for details.

Contributing

When contributing:

Follow the pattern-based architecture
Add business logic to pattern modules, not processors
Use the debug script to test categorization changes
Run ruff format . && ruff check . before committing
Update relevant documentation

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.github		.github
chrome-extension @ 0d2ddcc		chrome-extension @ 0d2ddcc
ml_training		ml_training
src/case_parser		src/case_parser
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
USER_GUIDE.md		USER_GUIDE.md
batch_process.py		batch_process.py
codecov.yaml		codecov.yaml
list_attendings.py		list_attendings.py
main.py		main.py
mise.toml		mise.toml
prek.toml		prek.toml
pyproject.toml		pyproject.toml
ruff.toml		ruff.toml
sort_logs.py		sort_logs.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

Case Parser

Features

Requirements

Installation

Using uv (recommended)

Using pip

Usage

Basic Usage

Advanced Usage

CSV v2 Format (MPOG Export)

Batch Processing

Debug Categorization

Interactive Bug Tracking

Column Mapping

Input File Format

Chrome Extension

Project Structure

Data Processing

Intelligent Categorization

Validation

Development

Setup

Code Quality

Adding New Patterns

Documentation

Error Handling

License

Contributing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages