
CMT-Net: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition


AKAPhilipD/CMTNET_for_SER


CMTNet: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition

Code by: Shihe Dong, Jiajun Wei, Junfeng Zhao, Yibing Zhu, Jiayi Zhou

Code Structure

CMTNET_FOR_SER
├── features_extraction
│   ├── database.py
│   ├── features_util.py
│   └── run_extract_features.py
├── models
│   ├── Mamba
│   │   ├── BiMamba.py
│   │   └── Spec_Mamba.py
│   ├── transformers_encoder
│   │   ├── Cross_Attention.py
│   │   ├── Embedding.py
│   │   └── position_embedding.py
│   └── ser_model.py
├── crossval_SER.py
├── data_utils.py
├── README.md
├── requirements.txt
└── train_ser.py

Environment Requirements

Python Version

Python 3.11 is recommended.

Install Dependencies

pip install -r requirements.txt

Pre-trained Model

Download the WavLM-Large model from https://huggingface.co/microsoft/wavlm-large, then update the model path in ./models/ser_model.py to point to your local copy.
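Before editing the path, it can help to confirm that the downloaded directory is a usable checkpoint. The following is a minimal sketch, not the loading code used in ./models/ser_model.py (which may differ). The helper names `check_model_dir` and `load_wavlm` are hypothetical, and the sketch assumes the checkpoint was downloaded in the standard Hugging Face layout, with `config.json` in the model directory. `load_wavlm` additionally requires the `transformers` and `torch` packages.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return the resolved directory if it looks like a Hugging Face checkpoint.

    Hypothetical helper: it only checks that config.json is present.
    """
    path = Path(model_dir).expanduser().resolve()
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json found in {path}")
    return path


def load_wavlm(model_dir):
    """Load WavLM weights from a local directory.

    Assumption: the repository may load the model differently; this uses the
    generic `WavLMModel.from_pretrained` call from the transformers library.
    """
    # Imported lazily so that check_model_dir works without transformers installed.
    from transformers import WavLMModel

    return WavLMModel.from_pretrained(str(check_model_dir(model_dir)))
```

For example, `load_wavlm("./wavlm-large")` would load the model from a local folder named `wavlm-large`; the folder name here is only an illustration.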

Run This Model

Feature Extraction

Run the run_extract_features.py script to extract features. To process a different dataset, modify the parameters in the parse_arguments(argv) function. The extracted features are saved in .pkl format for use in training.
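To check what the extraction step produced before starting training, the .pkl file can be opened with Python's standard `pickle` module. The sketch below makes no assumption about the keys or layout the repository writes; it simply walks whatever object is stored and reports types, lengths, and array shapes. The function names `summarize` and `inspect_pickle` are hypothetical. Only unpickle files you generated yourself, since loading an untrusted pickle can execute arbitrary code.

```python
import pickle


def summarize(obj, depth=0, max_depth=2):
    """Return lines describing the nested structure of a loaded object."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)  # NumPy arrays and tensors expose .shape
    if shape is not None:
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = []
        for key, value in obj.items():
            lines.append(f"{indent}{key!r}:")
            lines.extend(summarize(value, depth + 1, max_depth))
        return lines
    if isinstance(obj, (list, tuple)):
        return [f"{indent}{type(obj).__name__} len={len(obj)}"]
    return [f"{indent}{type(obj).__name__}"]


def inspect_pickle(path):
    """Load a pickle file and summarize its contents."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return summarize(data)
```

Calling `print("\n".join(inspect_pickle("features.pkl")))` prints the summary; `features.pkl` is a placeholder for the file name produced by your extraction run.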

Training

Run the crossval_SER.py script to start training. The training parameters can be modified within this script.

Citation

If you use this code in your research, please cite our paper:

@article{dong2026cmtnet,
  title={CMTNet: A Collaborative Mamba-Transformer Network with Spatial-Temporal Cross-Fusion for Speech Emotion Recognition},
  author={Dong, Shihe and Wei, Jiajun and Zhao, Junfeng and Zhu, Yibing and Zhou, Jiayi and Shao, Zhuhong and Niu, Mingyue and Tan, Xiaohui and Jiang, Yinan and Qin, Rongyin},
  journal={Pattern Recognition},
  pages={113159},
  year={2026},
  publisher={Elsevier}
}

Special Thanks

We would like to thank https://github.com/Vincent-ZHQ/CA-MSER for the valuable insights and inspiration.
