A Python tool that processes anesthesia case data from Excel files and converts it to a standardized case log format, with ACGME web form integration.
- Pattern-Based Architecture: Self-contained extraction modules with clear separation of concerns
- Domain-Driven Design: Typed domain models with comprehensive validation and confidence tracking.
- Modern Output: Professional terminal output with color-coding, tables, and panels using the rich library.
- Flexible Configuration: Customizable column mappings and processing rules
- Comprehensive Extraction: Airway management, vascular access, specialized monitoring
- Intelligent Categorization: Surgery-specific logic for cardiac, vascular, intracerebral, and OB/GYN procedures
- Validation Reporting: Detailed validation reports in text, JSON, or Excel format.
- Chrome Extension: Auto-fill ACGME case entry forms (see Chrome Extension)
- Debug Tools: Interactive categorization debugger with rich formatting
- Python 3.12+
- uv (recommended) or pip
# Install dependencies
uv sync
# Install in development mode
uv pip install -e .
# Install with dev dependencies (ruff, pytest, type stubs)
uv sync --group dev
# Or install with pip
pip install -e .
# Process a single file
python main.py input.xlsx output.xlsx
# Or use the installed command
case-parser input.xlsx output.xlsx
# Specify sheet name and default year
case-parser input.xlsx output.xlsx --sheet "Data" --default-year 2024
# Override column mappings
case-parser input.xlsx output.xlsx --col-date "Date of Service" --col-age "Patient Age"
# Enable verbose logging
case-parser input.xlsx output.xlsx --verbose
# Generate validation report
case-parser input.xlsx output.xlsx --validation-report validation.txt
# Process a directory of CaseList/ProcedureList CSV pairs
case-parser /path/to/csv-dir/ output.xlsx --v2
# Process all residents in Output-Supervised/
python batch_process.py
# Custom directories
python batch_process.py --base-dir /path/to/supervised --output-dir /path/to/output
# Sort output files to match a names list
python sort_logs.py --names-file residents.txt --input-dir Output
# Test a specific procedure categorization
case-parser debug-categorize "CABG with CPB" "CARDIAC SURGERY"
# Test with multiple services
case-parser debug-categorize "AVR" "CARDIAC,THORACIC"
# Interactive bug tracking review
case-parser debug-categorize "CABG" "CARDIAC" --bug-track
The debug command displays:
- Rule matching trace with formatted tables
- Pattern matches and exclusions
- The final category with color-coded results
- Warnings and special cases
The --bug-track flag enables an interactive review mode for categorization results:
# Review all cases in a file interactively
case-parser input.xlsx output.xlsx --bug-track
Features:
- Interactive Selection: Prompts you to confirm or correct the predicted category for each case.
- Matched Rules Display: Shows exactly which rules triggered the categorization during review.
- Markdown Output: Generates a concise Markdown table suitable for bug trackers (e.g., GitHub Issues), including original procedure, services, predicted vs. correct category, and matched rules.
- Progress Tracking: Includes a "Done" checkbox for each row in the Markdown table.
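The Markdown output described above can be sketched as a small table renderer. This is an illustration only, not the tool's actual code: `bug_track_table`, the dictionary keys, and the category labels in the sample row are hypothetical, and the "Done" column uses a literal `[ ]` placeholder.

```python
def bug_track_table(rows: list[dict]) -> str:
    """Render reviewed cases as a Markdown table with a Done checkbox column."""
    header = "| Done | Procedure | Services | Predicted | Correct | Matched rules |"
    divider = "| --- | --- | --- | --- | --- | --- |"
    lines = [header, divider]
    for row in rows:
        cells = [
            "[ ]",
            row["procedure"],
            ", ".join(row["services"]),
            row["predicted"],
            row["correct"],
            ", ".join(row["matched_rules"]),
        ]
        # Escape pipes so free-text procedure names cannot break the table.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)


# Invented sample row, for illustration only.
rows = [
    {
        "procedure": "CABG with CPB",
        "services": ["CARDIAC SURGERY"],
        "predicted": "Cardiac without CPB",
        "correct": "Cardiac with CPB",
        "matched_rules": ["cardiac_default"],
    }
]
print(bug_track_table(rows))
```

Keeping the predicted and corrected categories side by side makes each row usable as a self-contained bug report.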
The tool automatically maps common column names, but you can override them using command-line options:
- --col-date: Date column name (default: "Date")
- --col-episode-id: Episode ID column name (default: "Episode ID")
- --col-anesthesiologist: Anesthesiologist column name (default: "Responsible Provider")
- --col-age: Age column name (default: "Age At Encounter")
- --col-emergent: Emergent flag column name (default: "Emergent")
- --col-asa: ASA status column name (default: "ASA")
- --col-final-anesthesia-type: Anesthesia type column name (default: "Final Anesthesia Type")
- --col-procedure-notes: Procedure notes column name (default: "Procedure Notes")
- --col-procedure: Procedure column name (default: "Procedure")
- --col-services: Services column name (default: "Services")
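One way to picture the override behaviour is a dictionary of defaults updated by whatever the user passes. This is a minimal sketch, not the tool's implementation: `DEFAULT_COLUMNS` and `resolve_columns` are hypothetical names, and only the default values are taken from the option list above.

```python
from typing import Optional

# Hypothetical names; the default values come from the option list above.
DEFAULT_COLUMNS = {
    "date": "Date",
    "episode_id": "Episode ID",
    "anesthesiologist": "Responsible Provider",
    "age": "Age At Encounter",
    "emergent": "Emergent",
    "asa": "ASA",
    "final_anesthesia_type": "Final Anesthesia Type",
    "procedure_notes": "Procedure Notes",
    "procedure": "Procedure",
    "services": "Services",
}


def resolve_columns(overrides: dict[str, Optional[str]]) -> dict[str, str]:
    """Return the default mapping with any non-empty overrides applied."""
    unknown = set(overrides) - set(DEFAULT_COLUMNS)
    if unknown:
        raise KeyError(f"Unknown column keys: {sorted(unknown)}")
    resolved = dict(DEFAULT_COLUMNS)
    # Skip None/empty values so options the user did not pass keep their defaults.
    resolved.update({key: value for key, value in overrides.items() if value})
    return resolved


columns = resolve_columns({"date": "Date of Service", "age": "Patient Age", "asa": None})
print(columns["date"], "|", columns["age"], "|", columns["asa"])
```

Rejecting unknown keys early surfaces a mistyped option instead of silently ignoring it.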
- File types: .xlsx, .xls, or .csv
- Header row required; one row per case
- Required columns (exact names unless overridden): Date, Episode ID, Responsible Provider, Age At Encounter, ASA, Final Anesthesia Type, Procedure, Services
- Optional columns: Emergent, Procedure Notes
- Services values must be newline-separated within the cell
- Date should be parseable by pandas (recommended: MM/DD/YYYY); missing/unparseable dates fall back to --default-year (default 2025, January 1)
- Age At Encounter must be numeric
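The input rules above can be checked before running the tool. The sketch below is a standalone illustration using only the standard library. It is not the tool's own validation, which relies on pandas date parsing, and it accepts only the recommended MM/DD/YYYY format. The function names are hypothetical.

```python
from datetime import date, datetime

REQUIRED_COLUMNS = [
    "Date", "Episode ID", "Responsible Provider", "Age At Encounter",
    "ASA", "Final Anesthesia Type", "Procedure", "Services",
]


def missing_columns(header: list[str]) -> list[str]:
    """List required columns that are absent from the header row."""
    return [name for name in REQUIRED_COLUMNS if name not in header]


def parse_case_date(raw, default_year: int = 2025) -> date:
    """Parse MM/DD/YYYY; fall back to January 1 of default_year on failure."""
    try:
        return datetime.strptime(raw.strip(), "%m/%d/%Y").date()
    except (ValueError, AttributeError):
        # Covers unparseable strings and missing (None) values.
        return date(default_year, 1, 1)


def split_services(cell: str) -> list[str]:
    """Split a newline-separated Services cell into clean service names."""
    return [line.strip() for line in cell.splitlines() if line.strip()]


print(missing_columns(["Date", "Episode ID", "Procedure"]))
print(parse_case_date("03/14/2024"), parse_case_date("not a date"))
print(split_services("CARDIAC SURGERY\nTHORACIC"))
```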
For a step-by-step walkthrough, see USER_GUIDE.md.
The ACGME case entry Chrome extension lives in the chrome-extension/ submodule: razvanazamfirei/acgme-case-parser-extension
The extension reads the Excel output produced by this tool and auto-fills ACGME case entry forms.
To clone this repo with the extension included:
git clone --recurse-submodules https://github.com/razvanazamfirei/acgme-case-parser.git
If you already cloned without --recurse-submodules:
git submodule update --init
See chrome-extension/README.md for installation and usage instructions.
case-parser/
├── src/
│   └── case_parser/
│       ├── __init__.py                      # Package initialization
│       ├── models.py                        # Data models and configuration
│       ├── domain.py                        # Typed domain models
│       ├── processor.py                     # Core data processing
│       ├── extractors.py                    # Extraction function exports
│       ├── validation.py                    # Validation and reporting
│       ├── csv_io.py                        # CSV v2 format I/O
│       ├── io.py                            # Excel file I/O
│       ├── cli.py                           # Command line interface
│       ├── exceptions.py                    # Custom exceptions
│       ├── logging_config.py                # Logging configuration
│       ├── ml/                              # ML-enhanced classification
│       │   ├── features.py                  # Feature engineering
│       │   ├── hybrid.py                    # Hybrid rule+ML classifier
│       │   ├── loader.py                    # Model loader
│       │   └── predictor.py                 # ML predictor
│       └── patterns/                        # Pattern-based extraction
│           ├── __init__.py                  # Pattern exports
│           ├── README.md                    # Pattern documentation
│           ├── extraction_utils.py          # Shared utilities
│           ├── airway_patterns.py           # Airway management
│           ├── vascular_access_patterns.py  # Vascular access
│           ├── monitoring_patterns.py       # Specialized monitoring
│           ├── procedure_patterns.py        # Categorization rules
│           ├── categorization.py            # Categorization logic
│           ├── approach_patterns.py         # Surgical approach detection
│           ├── age_patterns.py              # Age range categorization
│           └── anesthesia_patterns.py       # Anesthesia type mapping
├── chrome-extension/                        # Chrome extension (git submodule)
├── tests/                                   # Unit tests
├── batch_process.py                         # Batch process all residents
├── sort_logs.py                             # Sort output files by names list
├── debug_categorization.py                  # Categorization debugger
├── main.py                                  # Main entry point
├── pyproject.toml                           # Project configuration and dependencies
└── README.md                                # This file
The tool processes anesthesia case data and extracts:
- Case Information: Episode ID, date, responsible provider
- Patient Demographics: Age categorization (ACGME categories a-e)
- ASA Status: Physical status with emergency flag handling
- Procedure Details: Original procedure text, intelligent categorization
- Anesthesia Type: Standardized anesthesia type mapping
- Airway Management: Intubation techniques (ETT, LMA, DLT), laryngoscopy types (DL, VL), difficult airway indicators
- Vascular Access: Arterial lines, central venous catheters, PA catheters
- Specialized Monitoring: TEE, electrophysiologic monitoring, CSF drains, invasive neuro monitoring
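To make the extraction idea concrete, here is a minimal regex sketch for the airway abbreviations listed above. The patterns and the `extract_airway` name are assumptions for illustration. The project's real patterns live in src/case_parser/patterns/ and are likely more thorough.

```python
import re

# Hypothetical patterns keyed by the label they produce.
AIRWAY_PATTERNS = {
    "ETT": re.compile(r"\bETT\b|\bendotracheal tube\b", re.I),
    "LMA": re.compile(r"\bLMA\b|\blaryngeal mask\b", re.I),
    "DLT": re.compile(r"\bDLT\b|\bdouble[- ]lumen\b", re.I),
    "DL": re.compile(r"\bDL\b|\bdirect laryngoscopy\b", re.I),
    "VL": re.compile(r"\bVL\b|\bvideo laryngoscop", re.I),
}


def extract_airway(notes: str) -> list[str]:
    """Return every airway label whose pattern appears in the notes."""
    return [label for label, pattern in AIRWAY_PATTERNS.items() if pattern.search(notes)]


print(extract_airway("Intubated with ETT via video laryngoscopy"))  # ['ETT', 'VL']
```

The word boundaries keep short abbreviations from matching inside longer ones, so `DL` does not fire on `DLT`.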
Surgery-specific categorization logic:
- Cardiac: Distinguishes CPB vs non-CPB procedures (TAVR, CABG, valve replacements)
- Vascular: Detects endovascular vs open approach (EVAR, TEVAR, open AAA)
- Intracerebral: Categorizes by approach and pathology (endovascular, vascular open, nonvascular)
- OB/GYN: Differentiates cesarean vs vaginal delivery, detects labor epidurals
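The cardiac case gives a feel for how ordered rules with a match trace might work. Everything below is a hypothetical sketch: the rule names, regexes, and category labels are invented for illustration and do not reflect the project's actual rules in procedure_patterns.py.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CategorizationResult:
    category: str
    matched_rules: list[str] = field(default_factory=list)


# Hypothetical rules as (name, regex, category); order matters, first match wins.
CARDIAC_RULES = [
    ("off_pump", re.compile(r"\boff[- ]pump\b|\bwithout CPB\b", re.I), "Cardiac without CPB"),
    ("tavr", re.compile(r"\bTAVR\b", re.I), "Cardiac without CPB"),
    ("cpb", re.compile(r"\bCPB\b", re.I), "Cardiac with CPB"),
]


def categorize(procedure: str, services: list[str]) -> CategorizationResult:
    """Apply the cardiac rules only when a cardiac service is listed."""
    if not any("CARDIAC" in service.upper() for service in services):
        return CategorizationResult("Other")
    for name, pattern, category in CARDIAC_RULES:
        if pattern.search(procedure):
            return CategorizationResult(category, [name])
    return CategorizationResult("Cardiac (CPB not stated)", ["cardiac_default"])


result = categorize("CABG with CPB", ["CARDIAC SURGERY"])
print(result.category, result.matched_rules)
```

Placing the exclusion-style rule (`off_pump`) ahead of the generic `cpb` rule is what stops "CABG without CPB" from being read as a CPB case.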
The tool includes comprehensive validation with modern formatted reports:
- Confidence Scoring: Tracks extraction confidence for each field
- Warning Detection: Identifies missing data, unparseable fields, and low confidence extractions
- Multiple Formats: Generate reports as text (with rich formatting), JSON, or Excel.
- Problematic Case Flagging: Automatically identifies cases needing review
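Confidence tracking and case flagging could look roughly like the following. This is a sketch under stated assumptions: `FieldResult`, `needs_review`, and the 0.5 threshold are hypothetical and are not taken from validation.py.

```python
from dataclasses import dataclass, field


@dataclass
class FieldResult:
    value: object
    confidence: float  # assumed scale: 0.0 (no evidence) to 1.0 (explicit match)
    warnings: list[str] = field(default_factory=list)


def needs_review(case: dict[str, FieldResult], threshold: float = 0.5) -> list[str]:
    """Return a reason for every field that is missing, warned, or low confidence."""
    reasons = []
    for name, result in case.items():
        if result.value is None:
            reasons.append(f"{name}: missing")
        elif result.confidence < threshold:
            reasons.append(f"{name}: low confidence ({result.confidence:.2f})")
        reasons.extend(f"{name}: {warning}" for warning in result.warnings)
    return reasons


# Invented sample case, for illustration only.
case = {
    "asa": FieldResult(3, 0.95),
    "airway": FieldResult("ETT", 0.4),
    "date": FieldResult(None, 0.0, ["unparseable date"]),
}
print(needs_review(case))
```

A case with a non-empty reason list would be the kind of case a validation report flags for manual review.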
# Generate validation report
case-parser input.xlsx output.xlsx --validation-report validation.txt
# JSON format
case-parser input.xlsx output.xlsx --validation-report validation.json
# Excel format
case-parser input.xlsx output.xlsx --validation-report validation.xlsx
# Install all dependencies including dev tools
uv sync --group dev
# Install the package in editable mode
uv pip install -e .
# Format and lint
ruff format .
ruff check --fix .
# Run tests
uv run pytest
To add a new extraction pattern:
- Edit the appropriate pattern file in src/case_parser/patterns/
- Add your pattern to the relevant pattern list
- Update the extraction function if needed
- Test with the debug script
- Run linting: ruff format . && ruff check .
See src/case_parser/patterns/README.md for detailed pattern documentation.
- USER_GUIDE.md: End-user guide with CLI walkthroughs
- chrome-extension/README.md: Chrome extension installation and usage
- src/case_parser/patterns/README.md: Comprehensive pattern documentation with examples and debugging tips
- CLAUDE.md: Detailed architectural guidance for AI-assisted development
The tool includes comprehensive error handling:
- File Validation: Checks file existence and format before processing
- Data Validation: Validates required columns and data types
- Processing Errors: Graceful handling of individual row failures
- Logging: Detailed logging for debugging and monitoring
- Modern Output: Color-coded error messages with rich formatting
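Graceful handling of individual row failures usually means isolating each row in its own try/except so one bad record cannot abort the run. The sketch below shows that pattern. `process_rows` and the logger name are hypothetical and are not the tool's actual processor.

```python
import logging

logger = logging.getLogger("case_parser_sketch")


def process_rows(rows, process_row):
    """Process each row independently, collecting failures instead of aborting."""
    results, failures = [], []
    for index, row in enumerate(rows, start=1):
        try:
            results.append(process_row(row))
        except Exception as exc:  # broad on purpose: one bad row must not stop the run
            logger.warning("Row %d skipped: %s", index, exc)
            failures.append((index, str(exc)))
    return results, failures


# The second row has a non-numeric age, so it is recorded as a failure.
results, failures = process_rows(
    [{"Age At Encounter": "42"}, {"Age At Encounter": "n/a"}],
    lambda row: float(row["Age At Encounter"]),
)
print(results, failures)
```

Returning the failures alongside the results lets the caller report skipped rows after the run rather than losing them in the log.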
MIT License — see LICENSE for details.
When contributing:
- Follow the pattern-based architecture
- Add business logic to pattern modules, not processors
- Use the debug script to test categorization changes
- Run ruff format . && ruff check . before committing
- Update relevant documentation