# Network Slimming & Knowledge Distillation

## Network Slimming

(Learning Efficient Convolutional Networks through Network Slimming)

Each channel's contribution is tied to its BatchNorm layer: if a channel's scaling factor is small, that channel's contribution can be ignored and the channel pruned.

A sparsity term $\lambda \sum_{\gamma \in \Gamma} g(\gamma)$ is added to the training loss, where $\lambda$ is a balancing factor and $g(\gamma)$ is the penalty on the scaling factors; with $g(s) = |s|$ this is L1 regularization, which pushes unimportant channels' factors toward zero.
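A minimal numpy sketch of this term (the paper trains in a deep-learning framework; the function name and stand-alone gradient update here are assumptions for illustration). Since $g(s) = |s|$ is not differentiable at 0, training adds the subgradient $\lambda \cdot \mathrm{sign}(\gamma)$ to each BatchNorm scale's gradient:

```python
import numpy as np

# Sketch: add the L1 subgradient lambda * d|gamma|/dgamma to the
# gradient of each BatchNorm scaling factor during backprop.
def add_l1_subgradient(gamma_grad, gamma, lam):
    """Return gamma_grad plus the subgradient of lam * sum(|gamma|)."""
    return gamma_grad + lam * np.sign(gamma)

gamma = np.array([0.5, -0.01, 0.0, 2.0])   # example BN scaling factors
grad = np.zeros_like(gamma)                # pretend upstream gradient
print(add_l1_subgradient(grad, gamma, lam=1e-4))
```

Because the penalty's subgradient has constant magnitude $\lambda$, it shrinks small and large factors at the same rate, driving redundant channels' factors close to zero.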

(Channel Pruning for Accelerating Very Deep Neural Networks)

For instance, to prune 70% of the channels across the whole network, we sort the absolute values of the scaling factors in ascending order, take the value at the 70th percentile as the threshold, and remove every channel whose factor falls below it. By doing so, we obtain a more compact network with fewer parameters, lower run-time memory, and fewer computing operations.
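The threshold selection above can be sketched in a few lines of numpy (function and variable names are assumptions; a real implementation would gather the factors from every BatchNorm layer in the network):

```python
import numpy as np

# Sketch: pick a global percentile threshold over all BN scaling
# factors and build a per-channel keep/prune mask.
def prune_mask(scaling_factors, prune_ratio=0.7):
    mags = np.abs(scaling_factors)
    threshold = np.percentile(mags, prune_ratio * 100)
    return mags > threshold          # True = keep channel

gammas = np.array([0.9, 0.05, 0.4, 0.01, 0.7,
                   0.02, 0.6, 0.03, 0.8, 0.5])
mask = prune_mask(gammas, prune_ratio=0.7)
print(mask.sum(), "of", mask.size, "channels kept")
```

Note the threshold is global, so layers whose factors are uniformly small lose proportionally more channels than layers with large factors.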

Multi-pass scheme: the train → prune → fine-tune cycle can be repeated for several passes to reach a higher compression rate.

## Knowledge Distillation

• Make use of the well-trained large network

• Let the small network mimic the behavior of large network

• Mimic the value of neuron (Hints)

• Usually use an ensemble of models as the teacher

• Mimic the final output (probability) of the large network

• Mimicking the logits and mimicking the softened probabilities become equivalent when T is large

Teacher probabilities (soft targets):

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where $q_i$ is the probability of class $i$, $z_i$ is the logit of the final prediction layer, and $T$ is the temperature. Softening raises the probabilities of related classes: for a cat image, the dog probability rises while the person probability stays low. This amplifies the teacher's hidden ("dark") knowledge: the teacher's soft labels carry information that hard human labels do not, which is why distillation improves the student's generalization.

(Learning Efficient Object Detection Models with Knowledge Distillation), taking Faster R-CNN as the example

• Adopt Faster R-CNN as the object detection framework
• Region Proposal Network (RPN)
• Region Classification Network (RCN)
• Hint learning with an adaptation layer
• Objective function

Classification: the student's class predictions are trained against both the hard labels and the teacher's soft outputs.

Regression: the teacher's bounding-box regression serves as an upper bound; the student is additionally penalized only when its regression error exceeds the teacher's.
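For the classification head, a generic distillation objective can be sketched as below (numpy; the weighting $\alpha$, temperature $T$, and function names are assumptions — the detection paper adds class weighting and the regression bound, which are not shown here). The soft term is scaled by $T^2$ so its gradient magnitude stays comparable to the hard term as $T$ changes:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                         # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

# L = (1 - alpha) * CE(hard labels, student)
#   + alpha * T^2 * CE(teacher soft targets, student soft predictions)
def kd_classification_loss(student_logits, teacher_logits,
                           onehot, T=4.0, alpha=0.5):
    hard = cross_entropy(onehot, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, T),
                         softmax(student_logits, T))
    return (1 - alpha) * hard + alpha * T * T * soft

s = np.array([2.0, 1.0, 0.1])    # student logits (assumed)
t = np.array([3.0, 1.5, -1.0])   # teacher logits (assumed)
y = np.array([1.0, 0.0, 0.0])    # hard one-hot label
print(kd_classification_loss(s, t, y))
```

The balance $\alpha$ and temperature $T$ are hyperparameters; the hard-label term keeps the student anchored to ground truth when the teacher is wrong.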

VGG has many parameters and is deep, giving it strong generalization ability, which makes it a suitable teacher network.
