CMTNet: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition
Code by: Shihe Dong, Jiajun Wei, Junfeng Zhao, Yibing Zhu, Jiayi Zhou
CMTNET_FOR_SER
├── features_extraction
│ ├── database.py
│ ├── features_util.py
│ └── run_extract_features.py
├── models
│ ├── Mamba
│ │ ├── BiMamba.py
│ │ └── Spec_Mamba.py
│ ├── transformers_encoder
│ │ ├── Cross_Attention.py
│ │ ├── Embedding.py
│ │ └── position_embedding.py
│ └── ser_model.py
├── crossval_SER.py
├── data_utils.py
├── README.md
├── requirements.txt
└── train_ser.py
Python 3.11 is recommended.
pip install -r requirements.txt
You can download the WavLM-Large model from https://huggingface.co/microsoft/wavlm-large and modify the corresponding path in ./models/ser_model.py.
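A minimal sketch of pointing the code at a locally downloaded checkpoint is shown below; the path, variable name, and loading call are illustrative assumptions, not the repository's actual code, so check ./models/ser_model.py for the real usage:

```python
# Illustrative only: load a locally downloaded WavLM-Large checkpoint.
# WAVLM_PATH is a placeholder; set it to the directory downloaded from
# https://huggingface.co/microsoft/wavlm-large and mirror whatever variable
# ser_model.py actually uses.
from transformers import WavLMModel

WAVLM_PATH = "/path/to/wavlm-large"

wavlm = WavLMModel.from_pretrained(WAVLM_PATH)
wavlm.eval()  # used as a frozen feature extractor in this sketch
```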
Run the run_extract_features.py script, modifying the parameters in the parse_arguments(argv) function to extract features from different datasets. The extracted features are saved in .pkl format for subsequent training.
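As a quick sanity check, a generated .pkl file can be opened with the standard pickle module. This is only a generic sketch; the file name below is a placeholder, and the structure of the stored object is determined by run_extract_features.py and features_util.py:

```python
# Generic sanity check for an extracted feature file (the actual structure
# depends on features_util.py; the file name here is a placeholder).
import pickle

with open("features/IEMOCAP_features.pkl", "rb") as f:
    features = pickle.load(f)

print(type(features))
if isinstance(features, dict):
    # Show a few keys to see what was stored.
    print(list(features.keys())[:10])
```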
Run the crossval_SER.py script to start training. Various training parameters can be modified within this script.
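The parameter names below are hypothetical and only illustrate the kind of settings typically adjusted for cross-validated training; the actual variables and defaults are defined inside crossval_SER.py:

```python
# Hypothetical example of training settings; the real names and defaults
# live in crossval_SER.py and may differ.
config = {
    "features_file": "features/IEMOCAP_features.pkl",  # output of feature extraction
    "num_folds": 5,          # cross-validation folds
    "batch_size": 32,
    "learning_rate": 1e-4,
    "epochs": 50,
    "seed": 42,
}
```

Training is then started from the repository root with `python crossval_SER.py`.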
If you use this code in your research, please cite our paper:
@article{dong2026cmtnet,
  title={CMTNet: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition},
  author={Dong, Shihe and Wei, Jiajun and Zhao, Junfeng and Zhu, Yibing and Zhou, Jiayi and Shao, Zhuhong and Niu, Mingyue and Tan, Xiaohui and Jiang, Yinan and Qin, Rongyin},
  journal={Pattern Recognition},
  pages={113159},
  year={2026},
  publisher={Elsevier}
}
We would like to thank the authors of https://github.com/Vincent-ZHQ/CA-MSER for their valuable insights and inspiration.