Paper Notes - Multi-Teacher Knowledge Distillation - 1

Learning from Multiple Teacher Networks http://library.usc.edu.ph/ACM/KKD%202017/pdfs/p1285.pdf loss: cross-entropy between the student and the average of the teachers' softmax outputs; relative dissimilarity of intermediate-layer representations (applies only to MTKD); triplets $(q_i...

13 min · Michelia-zhx
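
A minimal sketch of the averaged-softmax loss summarized in the excerpt above, assuming PyTorch; the function name and shapes are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def multi_teacher_ce_loss(student_logits: torch.Tensor,
                          teacher_logits: list[torch.Tensor]) -> torch.Tensor:
    """Cross-entropy between the mean teacher softmax and the student.

    student_logits: (batch, num_classes)
    teacher_logits: list of (batch, num_classes) tensors, one per teacher
    """
    # Average the teachers' softmax outputs to form one soft target.
    soft_target = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits]).mean(dim=0)
    # Cross-entropy of the student's log-softmax against that soft target.
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(soft_target * log_probs).sum(dim=-1).mean()
```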

Paper Notes - Multi-Teacher Knowledge Distillation - 2

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks https://arxiv.org/pdf/2004.05937.pdf Learning from Multiple Teacher Networks, KDD 2017. Efficient knowledge distillation from an ensemble of teachers, Interspeech 2017: take a weighted average of the teachers' logits; the weighted average and the student's logit...

12 min · Michelia-zhx
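
A companion sketch, under the same PyTorch assumption, of the weighted-logit variant from the Interspeech 2017 note. The excerpt is truncated, so the temperature-scaled KL objective below is an assumed choice of distillation loss, and the teacher weights are illustrative inputs:

```python
import torch
import torch.nn.functional as F

def weighted_logit_distill_loss(student_logits: torch.Tensor,
                                teacher_logits: list[torch.Tensor],
                                weights: list[float],
                                temperature: float = 1.0) -> torch.Tensor:
    # Normalize the per-teacher weights, then average the logits.
    w = torch.tensor(weights) / sum(weights)
    avg_logits = sum(wi * t for wi, t in zip(w, teacher_logits))
    # KL divergence between the softened ensemble and student distributions
    # (an assumed objective; the source excerpt cuts off before naming one).
    t_probs = F.softmax(avg_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_probs, reduction="batchmean") * temperature ** 2
```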