fix(deps): update dependency transformers to v5 (#317)
Status: Open

dreadnode-renovate-bot[bot] wants to merge 1 commit into main.
Force-pushed from 9f77e00 to 157a706, then to b5dd341.
This PR contains the following updates:
| datasource | package | from | to |
| ---------- | ------------ | ------ | ----- |
| pypi | transformers | 4.57.1 | 5.2.0 |

Constraint change: `>=4.41.0,<5.0.0` → `>=5.2.0,<5.3.0`

Release Notes
huggingface/transformers (transformers)
v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizer (Compare Source)
New Model additions
VoxtralRealtime
VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive.
The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.
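The padding-cache trick can be sketched in plain PyTorch. This is an illustration of the general technique the note describes, not VoxtralRealtime's actual code; the class and all sizes below are made up for the demo:

```python
import torch
import torch.nn as nn

class StreamingCausalConv1d(nn.Module):
    """Causal 1-D convolution with a padding cache: the last
    (kernel_size - 1) frames of each chunk are kept as left context
    for the next chunk, so chunked and offline outputs match.
    Assumes kernel_size >= 2."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.cache_len = kernel_size - 1
        self.cache = None

    def forward(self, chunk: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, channels, time)
        if self.cache is None:
            # First chunk: zero left-padding, exactly like an offline causal conv.
            batch, channels, _ = chunk.shape
            self.cache = chunk.new_zeros(batch, channels, self.cache_len)
        x = torch.cat([self.cache, chunk], dim=-1)
        self.cache = x[..., -self.cache_len:].detach()  # remember left context
        return self.conv(x)

layer = StreamingCausalConv1d(channels=4, kernel_size=3).eval()
audio = torch.randn(1, 4, 32)
with torch.no_grad():
    # Feeding 8-frame chunks reproduces the offline result incrementally.
    streamed = torch.cat([layer(c) for c in audio.split(8, dim=-1)], dim=-1)
```

Because only `kernel_size - 1` frames are carried between calls, latency stays bounded by the chunk size rather than the full utterance length.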
GLM-5 - GlmMoeDsa
The zAI team launches GLM-5, and introduces it as such:
Qwen3.5, Qwen3.5 Moe
The Qwen team launches Qwen 3.5, and introduces it as such:
VibeVoice Acoustic Tokenizer
VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.
One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.
Breaking changes
🚨 🚨 🚨 [Attn] New attn mask interface everywhere (#42848) - this one is quite breaking for super super super old models:
If the config does not have a `model_type` field, we no longer fall back to checking the repository name, as was previously done for e.g. https://huggingface.co/prajjwal1/bert-tiny/blob/main/config.json
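For checkpoints like that one, a workaround is to load with the explicit config class instead of `Auto*` and re-save, since `save_pretrained` writes the `model_type` field back out. A minimal sketch (the output directory name is arbitrary):

```python
from transformers import BertConfig

# bert-tiny's config.json predates the "model_type" field, so v5's
# Auto* classes can no longer resolve it from the repo name alone.
config = BertConfig.from_pretrained("prajjwal1/bert-tiny")

# Re-saving writes "model_type": "bert" into the new config.json.
config.save_pretrained("bert-tiny-with-model-type")
```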
Bugfixes and improvements
- Fix `convert_rope_params_to_dict` so it uses `rope_theta` from the config (#43766) by @hmellor
- `AGENTS.md` (#43763) by @tarekziade
- [Modular Dependencies] Fixup qwen rms norms (#43772) by @vasqu
- [Repo Consistency] Fix rms norm (#43803) by @vasqu
- `check_model_inputs` implementation (#43765) by @Cyrilvallez
- Add `do_sample=False` to qwen2_5_vl model tests to stabilize the output (#43728) by @kaixuanliu
- [Jamba] Fallback to slow path and warn instead of error out (#43889) by @vasqu
- [fix] Use `last_hidden_state` key from `get_image_features` for llama4 (#43882) by @tomaarsen
- Split `check_model_inputs` into `capture_outputs` and `merge_with_config_defaults` + ensure correctness (#43862) by @Cyrilvallez
- `_keys_to_ignore_on_load_missing` for now (#43893) by @ArthurZucker
- Rename `input_embeds` to `inputs_embeds` everywhere (#43916) by @Cyrilvallez
- Add `image_url` content support in `apply_chat_template` (#43786) by @kaixuanliu
- `generate` (#43734) by @zucchini-nlp
- `run_*_no_trainer.py` examples (#42769) by @casinca
- `run_*_no_trainer.py` examples (#43947) by @casinca
- `out_features` (#43886) by @zucchini-nlp
- `get_number_of_image_tokens` (#43948) by @zucchini-nlp
- `other_workflow_run_ids` for `issue_comment` in `utils/notification_service.py` (#44036) by @ydshieh

Significant community contributions
The following contributors have made significant changes to the library over the last release:
- [Jamba] Fallback to slow path and warn instead of error out (#43889)
- [Attn] New attn mask interface everywhere (#42848)
- [Repo Consistency] Fix rms norm (#43803)
- [Modular Dependencies] Fixup qwen rms norms (#43772)

v5.1.0: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, GLM-OCR (Compare Source)
New Model additions
EXAONE-MoE
K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.
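The total-versus-active distinction comes from top-k expert routing: each token only runs through a few experts, so most parameters sit idle on any given forward pass. A generic sketch of the idea (not K-EXAONE's actual implementation; all sizes are toy values):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer. With 8 experts and top_k=2,
    only about a quarter of the expert parameters are active per token,
    which is how a model can have 236B total but 23B active parameters."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); pick each token's top-k experts by router score.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():  # run each expert only on tokens routed to it
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

out = TopKMoE(dim=16)(torch.randn(10, 16))  # each token touched only 2 of 8 experts
```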
PP-DocLayoutV3
PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.
Youtu-LLM
Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports 128k long context, and has native agentic talents. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in terms of commonsense, STEM, coding, and long-context capabilities; in agent-related testing, Youtu-LLM surpasses larger-sized leaders and is truly capable of completing multiple end-to-end agent tasks.
GlmOcr
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
Breaking changes
🚨 T5Gemma2 model structure (#43633) - Makes sure that the attn implementation is set on all sub-configs. The `config.encoder.text_config` was not getting its attn set because we aren't passing it to `PreTrainedModel.__init__`. We can't change the model structure without breaking, so I manually re-added a call to `self.adjust_attn_implementation` in the modeling code.
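In practice the guarantee is that an `attn_implementation` passed at load time now reaches nested sub-configs too. A sketch of what that looks like; the checkpoint id below is hypothetical, and `_attn_implementation` is the library's internal config field for this setting:

```python
from transformers import AutoModelForSeq2SeqLM

# Hypothetical checkpoint id -- substitute a real T5Gemma2 repo.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/t5gemma2-example",
    attn_implementation="sdpa",
)

# With the fix, the nested text config picks up the setting as well.
assert model.config.encoder.text_config._attn_implementation == "sdpa"
```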
🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation to ensure sliding window configurations are now properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.
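Conceptually, the enforced limit behaves like a cache that evicts everything outside the window. A toy sketch of the mechanism (not transformers' actual cache classes, whose names this note doesn't specify):

```python
import torch

class SlidingWindowKV:
    """Toy sliding-window key cache: only the last `window` positions
    survive, so attention can never reach further back than the
    model's declared sliding window."""

    def __init__(self, window: int, head_dim: int = 8):
        self.window = window
        self.keys = torch.empty(0, head_dim)

    def update(self, new_keys: torch.Tensor) -> torch.Tensor:
        # Append this step's keys, then drop anything outside the window.
        self.keys = torch.cat([self.keys, new_keys])[-self.window:]
        return self.keys

cache = SlidingWindowKV(window=4)
for _ in range(6):                    # six decode steps, one token each
    visible = cache.update(torch.randn(1, 8))
assert visible.shape[0] == 4          # positions 0 and 1 were evicted
```

This is why generation behavior can change: models that previously ignored their window limit now see at most `window` past positions.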
🚨 Delete duplicate code in backbone utils (#43323) - This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes that decide which backbone to load, most of which are redundant and can be merged into one.
After this PR, we'll have only one `config.backbone_config` as a single source of truth. The models will load the backbone `from_config`, and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.
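A sketch of the resulting flow, with DETR as the composite model (the exact post-cleanup defaults are assumptions; before this release extra flags such as a timm toggle were also involved):

```python
from transformers import DetrConfig, DetrForObjectDetection, ResNetConfig

# The backbone is now described by config.backbone_config alone...
config = DetrConfig(backbone_config=ResNetConfig(out_features=["stage4"]))

# ...and building from the config gives a randomly initialised backbone;
# pretrained backbone weights are only loaded if the checkpoint has them.
model = DetrForObjectDetection(config)
```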
🚨 Refactor DETR to updated standards (#41549) - standardizes the DETR model to be closer to other vision models in the library.
🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - replaces an `int()` with `round()`; expect slight numerical differences (see the example below).

🚨 Remove deprecated AnnotionFormat (#42983) - removes a misnamed class in favour of `AnnotationFormat`.
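The practical impact of that one-character change is easiest to see with a concrete value (illustrative numbers, not the processor's actual resize formula):

```python
height, scale = 333, 0.6          # 333 * 0.6 == 199.79999999999998
print(int(height * scale))        # 199 -- truncates toward zero (old behavior)
print(round(height * scale))      # 200 -- rounds to nearest (new behavior)
```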
Bugfixes and improvements
- [feat] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen
- `image_sizes` input param (#43678) by @kaixuanliu
- [Attn] Fixup interface usage after refactor (#43706) by @vasqu
- `num_frames` in ASR pipeline (#43546) by @jiqing-feng
- `PreTrainedTokenizerBase` (#43675) by @tarekziade
- `FP8Expert` for DeepSeek R1 (#43616) by @yiliu30
- [HunYuan] Fix RoPE init (#43411) by @vasqu
- [Sam] Fixup training flags (#43567) by @vasqu
- `process_bad_commit_report.py`: avoid items appearing under a `null` author in the report (#43662) by @ydshieh
- Fix `KeyError` in `check_bad_commit.py` (#43655) by @ydshieh
- `tied_weight_keys` in-place (#43619) by @zucchini-nlp
- [Rope] Revert #43410 and make inheritance implicit again (#43620) by @vasqu
- `make_batched_video` with 5D arrays (#43486) by @zucchini-nlp
- `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584) by @ydshieh
- `MistralConverter.extract_vocab_merges_from_model` (#43557) by @tarekziade
- `templates` folder (#43536) by @Cyrilvallez

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever the PR becomes conflicted, or if you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.