Distillation

Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding

Knowledge distillation from a text BERT teacher to a speech model, matching sequence-level contextualized representations during pretraining and predicted logits during finetuning - ___[ICASSP 2021](https://2021.ieeeicassp.org)___
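
A minimal PyTorch sketch of the two-stage idea described above (not the authors' code; the pooling choice, 768-dim representations, MSE for stage one, and the temperature-scaled KL loss for stage two are all assumptions):

```python
import torch
import torch.nn.functional as F

def pretrain_distill_loss(speech_repr, bert_repr):
    """Stage 1 (pretraining): pull the speech encoder's sequence-level
    representation toward BERT's contextualized sentence embedding.
    MSE is one plausible matching loss."""
    return F.mse_loss(speech_repr, bert_repr)

def finetune_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Stage 2 (finetuning): match the text teacher's predicted logits
    via KL divergence on temperature-softened distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Toy usage: batch of 4 utterances, 768-dim pooled representations,
# 10 intent classes (all sizes hypothetical).
speech_repr = torch.randn(4, 768)   # pooled speech-encoder output
bert_repr = torch.randn(4, 768)     # pooled BERT sentence embedding
stage1_loss = pretrain_distill_loss(speech_repr, bert_repr)

student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
stage2_loss = finetune_distill_loss(student_logits, teacher_logits)
```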