Batch normalization tips

Batch normalization does not improve performance by itself

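According to Understanding Batch Normalization, adopting batch normalization makes it possible to train with a high learning rate. It is the high learning rate that delivers both fast convergence and, by avoiding sharp local minima, higher performance.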

If a high learning rate is used without batch normalization, the weights diverge and training fails.

Where to place batch normalization

Opinions differ on whether batch normalization should be placed before or after the activation function. With ReLU, performance is the same either way^*1.

Batch Normalization and Bounded Activation Functions argues that for bounded activation functions such as Tanh, placing batch normalization after the activation function improves performance.
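
The two orderings discussed above can be sketched in plain Python as follows. This is only an illustration: batch_norm here normalizes a batch of scalars without the learned scale and shift, and the helper names bn_then_act and act_then_bn as well as the sample values are assumptions made for this example, not taken from either paper.

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance.
    Simplification: no learned scale/shift, batch statistics only."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def bn_then_act(xs, act):
    """Batch normalization placed before the activation function."""
    return [act(x) for x in batch_norm(xs)]

def act_then_bn(xs, act):
    """Batch normalization placed after the activation function."""
    return batch_norm([act(x) for x in xs])

# Hypothetical pre-activation values for one unit over a batch of five samples.
pre_activations = [-3.0, -1.0, 0.5, 2.0, 4.0]

before = bn_then_act(pre_activations, math.tanh)  # output stays inside (-1, 1)
after = act_then_bn(pre_activations, math.tanh)   # output is re-centered and re-scaled

print("BN -> Tanh:", before)
print("Tanh -> BN:", after)
```

With batch normalization placed after Tanh, the values passed to the next layer have zero mean and roughly unit variance, whereas with the other ordering they are merely confined to the range of Tanh.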

The argument that Dropout is unnecessary

The article Deep LearningにおけるBatch Normalizationの理解メモと、実際にその効果を見てみる ran experiments with and without Dropout and argues that training is more stable with Dropout.

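Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift points out that Dropout changes the variance of its output between training and inference, whereas batch normalization at inference keeps using the statistics it accumulated during training. Because of this mismatch, combining Dropout with batch normalization degrades performance.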
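
The change in variance can be reproduced with a few lines of plain Python. This is a minimal sketch that assumes inverted dropout (kept units are scaled by 1/keep_prob during training) and zero-mean, unit-variance inputs. The function names, the seed, and the sample size are choices made for this illustration. Under these assumptions the training-time variance is expected to be about 1/keep_prob times the inference-time variance.

```python
import random

def dropout(xs, keep_prob, training, rng):
    """Inverted dropout: during training, keep each unit with probability
    keep_prob and scale it by 1/keep_prob; at inference, pass through."""
    if not training:
        return list(xs)
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
keep_prob = 0.5

train_var = variance(dropout(xs, keep_prob, training=True, rng=rng))
infer_var = variance(dropout(xs, keep_prob, training=False, rng=rng))

print("variance during training: ", train_var)
print("variance during inference:", infer_var)
```

A batch normalization layer that follows Dropout accumulates its statistics from the training-time distribution, so at inference it normalizes inputs whose variance no longer matches those statistics.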

Beyond Random Masking: When Dropout meets Graph Convolutional Networks explains the regularization mechanism at work when Dropout and batch normalization are combined in a Graph Convolutional Network. It argues that in Graph Convolutional Networks, using Dropout together with batch normalization gives better performance.

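Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks argues that a method called Independent-Component, which combines Dropout with batch normalization and places batch normalization after ReLU, improves training stability, convergence speed, and performance.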
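
The layer ordering described above might look like the following in plain Python. This is a rough sketch under several assumptions, not the paper's implementation: the order ReLU, batch normalization, Dropout, weight layer is one reading of the method, the weight layer is reduced to a single scalar weight and bias, batch normalization uses batch statistics even at inference, and ic_block and all numeric values are hypothetical.

```python
import math
import random

def relu(xs):
    return [max(0.0, x) for x in xs]

def batch_norm(xs, eps=1e-5):
    """Simplified batch normalization: batch statistics only, no scale/shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def dropout(xs, keep_prob, training, rng):
    """Inverted dropout: identity at inference."""
    if not training:
        return list(xs)
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]

def ic_block(xs, weight, bias, keep_prob, training, rng):
    """Assumed ordering: ReLU -> batch norm -> dropout -> weight layer."""
    h = relu(xs)
    h = batch_norm(h)
    h = dropout(h, keep_prob, training, rng)
    return [weight * v + bias for v in h]  # scalar stand-in for a weight layer

rng = random.Random(0)
batch = [-2.0, -0.5, 0.3, 1.2, 2.5]  # hypothetical inputs for one unit

out_eval = ic_block(batch, weight=0.7, bias=0.1, keep_prob=0.9, training=False, rng=rng)
out_train = ic_block(batch, weight=0.7, bias=0.1, keep_prob=0.9, training=True, rng=rng)

print("inference:", out_eval)
print("training: ", out_train)
```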

External links

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift #608

Ordering of batch normalization and dropout?

External links

Understanding Batch Normalization

Batch Normalization and Bounded Activation Functions

Intro to Optimization in Deep Learning: Busting the Myth About Batch Normalization

^*1 Moein Hasani and Hassan Khotanlou. An empirical study on position of the batch normalization layer in convolutional neural networks. In 2019 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), pp. 1–4. IEEE, 2019.