🚀 Added a Translation Pipeline #43

Open
leagrieder wants to merge 14 commits into EPFLiGHT:master from leagrieder:addTranslationModel

leagrieder commented Jan 28, 2026

This PR introduces a translation interface for NLLB-200 with fastText language detection.

✨ Key Contributions

  • Translator (translator.py) for multimeditron inference

    • Automatic language detection with fastText (80% confidence threshold)
    • Smart routing to prevent mistranslation of ambiguous inputs
    • Bidirectional medical translation (user language ↔ English)
    • Compatible with base and fine-tuned NLLB-200 models
  • Consensus-based data generation

    • Synthetic parallel medical corpora built from multi-model translation agreement
    • Scalable approach for low-resource languages
  • Fine-tuning & evaluation framework

    • Scripts for NLLB-200 medical fine-tuning
    • Comprehensive experiments on translation quality and downstream medical QA
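
The detection and routing step described above can be sketched roughly as follows. This is not the code in translator.py: the names `route`, `RoutingDecision` and `fake_detect` are hypothetical, the detector is injected as a plain callable so the example runs without a fastText model, and the fallback for low-confidence input (leave the text untranslated) is an assumption, since the description only states the 80% threshold.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

CONFIDENCE_THRESHOLD = 0.80  # the 80% threshold stated in the PR description
ENGLISH = "eng_Latn"         # FLORES-200 style code used by NLLB-200 for English
LABEL_PREFIX = "__label__"   # prefix fastText puts on predicted labels

# A detector maps text to (label, confidence). With fastText this could wrap
# model.predict(text.replace("\n", " "), k=1), which returns (labels, probabilities).
# Whether the PR's detector emits NLLB-style codes is an assumption here.
Detector = Callable[[str], Tuple[str, float]]


@dataclass(frozen=True)
class RoutingDecision:
    language: str      # detected language code, label prefix stripped
    confidence: float  # detector confidence in [0, 1]
    translate: bool    # True if the text should be translated to English


def route(text: str, detect: Detector,
          threshold: float = CONFIDENCE_THRESHOLD) -> RoutingDecision:
    """Decide whether to translate `text` before inference."""
    label, confidence = detect(text)
    language = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    # Assumption: below the threshold the input is treated as ambiguous and
    # passed through untranslated, to avoid mistranslating it.
    translate = confidence >= threshold and language != ENGLISH
    return RoutingDecision(language, confidence, translate)


def fake_detect(text: str) -> Tuple[str, float]:
    """Stand-in detector with canned outputs, for illustration only."""
    canned = {
        "Bonjour, j'ai de la fièvre.": ("__label__fra_Latn", 0.97),
        "ok": ("__label__deu_Latn", 0.41),
    }
    return canned.get(text, ("__label__eng_Latn", 0.99))


if __name__ == "__main__":
    for sample in ["Bonjour, j'ai de la fièvre.", "ok", "I have a fever."]:
        print(sample, "->", route(sample, fake_detect))
```

Keeping the detector behind a callable makes the threshold logic testable on its own; a real fastText model would be plugged in where `fake_detect` is used.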

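As an illustration of the consensus idea (keeping a synthetic sentence pair only when several models agree on its translation), here is a minimal sketch. The agreement metric is an assumption: it uses token-level `difflib.SequenceMatcher` similarity as a stand-in, and the function names and the 0.8 cutoff are hypothetical. The PR does not state which metric or threshold it uses.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def agreement(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] (stand-in metric, not the PR's)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def consensus(candidates: List[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the candidate closest to all others, or None without consensus.

    `candidates` holds one translation of the same source sentence per model.
    """
    if len(candidates) < 2:
        return None  # agreement is undefined for a single model

    def mean_agreement(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(agreement(candidates[i], o) for o in others) / len(others)

    # Pick the candidate with the highest mean agreement with the rest.
    best = max(range(len(candidates)), key=mean_agreement)
    if mean_agreement(best) < min_agreement:
        return None  # models disagree: drop this sentence from the corpus
    return candidates[best]


if __name__ == "__main__":
    agreeing = [
        "The patient has a fever.",
        "The patient has a fever.",
        "The patient has fever.",
    ]
    disagreeing = [
        "The patient has a fever.",
        "Take two tablets daily.",
        "Blood pressure is high.",
    ]
    print(consensus(agreeing))
    print(consensus(disagreeing))
```

Sentences for which `consensus` returns `None` would simply be left out of the synthetic parallel corpus, trading coverage for precision.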

3 participants