This note is based on the Coursera course by Andrew Ng.
(This is just a personal study note. Some sentences may be copied or read awkwardly because I am not a native speaker, but I want to learn Deep Learning in English, so it will keep getting better and better :))
INTRO
One thing that makes training more difficult is that Deep Learning tends to work best in the regime of big data. We train neural networks on huge data sets, and training on a large data set is slow. So having fast, good optimization algorithms can really speed up training. Let's get started by talking about mini-batch gradient descent.
MAIN
WHAT
An epoch means one pass over the full training set.
Batch means we use all our data to compute the gradient during one iteration.
Mini-batch means we use only a subset of the data during one iteration.
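To make these definitions concrete, here is a minimal NumPy sketch of splitting a training set into shuffled mini-batches. The function name `random_mini_batches` and the (n_x, m) one-example-per-column shape are my own assumptions following the course's convention; this is a sketch, not the course's exact implementation.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Split (X, Y) into shuffled mini-batches.

    X -- data of shape (n_x, m), one example per column
    Y -- labels of shape (1, m)
    Returns a list of (mini_batch_X, mini_batch_Y) tuples.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]

    # Step 1: shuffle the columns of X and Y with the same permutation
    permutation = rng.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]

    # Step 2: partition into consecutive slices of size mini_batch_size
    mini_batches = []
    for k in range(0, m, mini_batch_size):
        mini_batch_X = shuffled_X[:, k:k + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k:k + mini_batch_size]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # Note: the last mini-batch may be smaller if m is not a multiple of mini_batch_size
    return mini_batches
```

One pass over all the mini-batches returned by this function is one epoch; using a single slice per gradient step is what makes it mini-batch rather than batch gradient descent.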
HOW & WHY
With batch gradient descent, we expect the cost to decrease on every single iteration.
With mini-batch gradient descent, the cost may not decrease on every iteration. It should still trend downward, but it will be a bit noisier, because each mini-batch is a different sample of the training data.
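A minimal sketch of where that noise comes from: even with the parameters held fixed, each mini-batch gives a different cost value, which fluctuates around the full-batch cost. The toy linear model and squared-error cost below are my own illustrative assumptions, not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise, stored one example per column
m = 2048
X = rng.normal(size=(1, m))
Y = 3 * X + 0.5 * rng.normal(size=(1, m))

w = 2.5  # a fixed (not yet optimal) parameter

def cost(Xb, Yb, w):
    # Mean squared error on a batch, divided by 2 as usual
    return float(np.mean((w * Xb - Yb) ** 2) / 2)

print("full-batch cost:", cost(X, Y, w))

# The same parameter evaluated on different mini-batches gives different costs,
# which is exactly the noise seen in the mini-batch learning curve.
for k in range(0, 5 * 64, 64):
    print(f"mini-batch {k // 64}: cost =", cost(X[:, k:k + 64], Y[:, k:k + 64], w))
```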
One of the parameters we need to choose is the size of the mini-batch.
- If mini-batch size = m: batch gradient descent
  - Each iteration takes too long.
- If mini-batch size = 1: stochastic gradient descent
  - You lose the speed-up from vectorization.
- Proper size: somewhere in between
  - If the training set is small (m < 2000): use batch gradient descent.
  - If m is greater than 2000: use a mini-batch size of 64, 128, 256, or 512 (a power of 2; see the sketch below).
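Putting it together, here is a minimal sketch of a mini-batch gradient descent loop on a toy linear regression, using a power-of-2 mini-batch size of 64. The model, learning rate, and number of epochs are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3x + noise, one example per column
m = 4096
X = rng.normal(size=(1, m))
Y = 3 * X + 0.5 * rng.normal(size=(1, m))

w, b = 0.0, 0.0           # parameters of the linear model y_hat = w*x + b
learning_rate = 0.1
mini_batch_size = 64      # power of 2, as suggested above
num_epochs = 5

for epoch in range(num_epochs):
    # Re-shuffle once per epoch so every epoch sees a different mini-batch split
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]

    for k in range(0, m, mini_batch_size):
        Xb = X_shuf[:, k:k + mini_batch_size]
        Yb = Y_shuf[:, k:k + mini_batch_size]

        # Gradients of the mean squared error cost on this mini-batch
        error = w * Xb + b - Yb
        dw = float(np.mean(error * Xb))
        db = float(np.mean(error))

        # One gradient descent step per mini-batch, so many steps per epoch
        w -= learning_rate * dw
        b -= learning_rate * db

    epoch_cost = float(np.mean((w * X + b - Y) ** 2) / 2)
    print(f"epoch {epoch}: cost = {epoch_cost:.4f}, w = {w:.3f}, b = {b:.3f}")
```

With m = 4096 examples and a mini-batch size of 64, each epoch takes 64 gradient steps instead of just one, which is why mini-batch gradient descent makes progress much faster on large data sets.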
CONCLUSION