Cross-modal

Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding

Knowledge distillation from a text BERT teacher to a speech model by matching sequence-level contextualized representations during pre-training and predicted logits during fine-tuning - ___[ICASSP 2021](https://2021.ieeeicassp.org)___
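
A minimal PyTorch sketch (not the authors' code) of what the two distillation stages could look like: an MSE loss that pulls the speech model's sequence-level representation toward the frozen text BERT teacher's, and a soft-label KD loss on predicted logits during fine-tuning. Function names, the `alpha`/`T` hyperparameters, and the toy shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pretrain_distill_loss(student_seq_repr, teacher_seq_repr):
    # Stage 1 (pre-training): match the speech model's sequence-level
    # representation to the frozen text BERT teacher's (MSE shown here).
    return F.mse_loss(student_seq_repr, teacher_seq_repr.detach())

def finetune_distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # Stage 2 (fine-tuning): combine the task loss on intent labels with
    # a temperature-scaled KD term on the teacher's predicted logits.
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random tensors standing in for model outputs
# (batch of 4, hidden size 768, 31 intent classes are arbitrary choices).
B, H, C = 4, 768, 31
loss1 = pretrain_distill_loss(torch.randn(B, H), torch.randn(B, H))
loss2 = finetune_distill_loss(torch.randn(B, C), torch.randn(B, C),
                              torch.randint(0, C, (B,)))
```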

ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding

Speech-text cross-modal pretraining with cross-modal masked language modeling (CM-MLM) and cross-modal conditioned language modeling (CM-CLM) - ___[ICASSP 2021](https://2021.ieeeicassp.org)___
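
A rough PyTorch sketch of a CM-MLM-style objective, under assumptions of my own: a shared Transformer encoder runs over concatenated speech features and text token embeddings, a fraction of text tokens is masked, and the masked tokens are predicted conditioned on both modalities. The class name, dimensions, mask id, and masking rate are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class CrossModalMLM(nn.Module):
    # Shared encoder over [speech frames ; text tokens]; masked text tokens
    # are predicted from the joint sequence (cross-modal MLM sketch).
    def __init__(self, vocab_size=30522, d_model=256, speech_dim=80, mask_id=103):
        super().__init__()
        self.mask_id = mask_id
        self.speech_proj = nn.Linear(speech_dim, d_model)  # project speech frames
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, speech_feats, text_ids, mlm_prob=0.15):
        # Randomly mask text tokens; labels keep the original ids at masked
        # positions and -100 (ignored by the loss) everywhere else.
        mask = torch.rand_like(text_ids, dtype=torch.float) < mlm_prob
        labels = text_ids.masked_fill(~mask, -100)
        masked_ids = text_ids.masked_fill(mask, self.mask_id)

        x = torch.cat([self.speech_proj(speech_feats), self.tok_emb(masked_ids)], dim=1)
        h = self.encoder(x)
        logits = self.lm_head(h[:, speech_feats.size(1):])  # text positions only
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
        )

# Toy usage: batch of 2, 50 speech frames of 80-dim features, 16 text tokens.
model = CrossModalMLM()
loss = model(torch.randn(2, 50, 80), torch.randint(0, 30522, (2, 16)))
```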