This note is based on the Coursera course by Andrew Ng.
(This is just a personal study note. Some sentences may be copied or read awkwardly because I am not a native speaker, but I want to learn Deep Learning in English, so it will keep getting better and better :))
INTRO
One thing that makes training more difficult is that Deep Learning tends to work best in the regime of big data. We train neural networks on huge data sets, and training on a large data set is slow. So having fast, good optimization algorithms can really speed up training. Let's get started by talking about mini-batch gradient descent.
MAIN
WHAT
An epoch means one pass over the full training set.
Batch means we use all our data to compute the gradient during one iteration.
Mini-batch means we use only a subset of the data during one iteration.
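To make these definitions concrete, here is a minimal NumPy sketch of splitting a training set into shuffled mini-batches. The function name `random_mini_batches` and the (n_x, m) one-example-per-column shape are my own assumptions following the course's convention; this is a sketch, not the course's exact implementation.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Split (X, Y) into shuffled mini-batches.

    X -- data of shape (n_x, m), one example per column
    Y -- labels of shape (1, m)
    Returns a list of (mini_batch_X, mini_batch_Y) tuples.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]

    # Step 1: shuffle the columns of X and Y with the same permutation
    permutation = rng.permutation(m)
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]

    # Step 2: partition into consecutive slices of size mini_batch_size
    mini_batches = []
    for k in range(0, m, mini_batch_size):
        mini_batch_X = shuffled_X[:, k:k + mini_batch_size]
        mini_batch_Y = shuffled_Y[:, k:k + mini_batch_size]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # Note: the last mini-batch may be smaller if m is not a multiple of mini_batch_size
    return mini_batches
```

One pass over all the mini-batches returned by this function is one epoch; using a single slice per gradient step is what makes it mini-batch rather than batch gradient descent.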
HOW & WHY
With batch gradient descent, we expect the cost to decrease on every single iteration.
With mini-batch gradient descent, the cost may not decrease on every iteration. It should still trend downward, but it will be a bit noisier, because each mini-batch is a different sample of the training data.
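A minimal sketch of where that noise comes from: even with the parameters held fixed, each mini-batch gives a different cost value, which fluctuates around the full-batch cost. The toy linear model and squared-error cost below are my own illustrative assumptions, not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise, stored one example per column
m = 2048
X = rng.normal(size=(1, m))
Y = 3 * X + 0.5 * rng.normal(size=(1, m))

w = 2.5  # a fixed (not yet optimal) parameter

def cost(Xb, Yb, w):
    # Mean squared error on a batch, divided by 2 as usual
    return float(np.mean((w * Xb - Yb) ** 2) / 2)

print("full-batch cost:", cost(X, Y, w))

# The same parameter evaluated on different mini-batches gives different costs,
# which is exactly the noise seen in the mini-batch learning curve.
for k in range(0, 5 * 64, 64):
    print(f"mini-batch {k // 64}: cost =", cost(X[:, k:k + 64], Y[:, k:k + 64], w))
```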
One of the parameters we need to choose is the size of the mini-batch.
- If mini-batch size = m: batch gradient descent
  - Each iteration takes too long.
- If mini-batch size = 1: stochastic gradient descent
  - You lose the speed-up from vectorization.
- Proper size: somewhere in between
  - If the training set is small (m < 2000): use batch gradient descent.
  - If m is greater than 2000: use a mini-batch size of 64, 128, 256, or 512 (a power of 2; see the sketch below).
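Putting it together, here is a minimal sketch of a mini-batch gradient descent loop on a toy linear regression, using a power-of-2 mini-batch size of 64. The model, learning rate, and number of epochs are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3x + noise, one example per column
m = 4096
X = rng.normal(size=(1, m))
Y = 3 * X + 0.5 * rng.normal(size=(1, m))

w, b = 0.0, 0.0           # parameters of the linear model y_hat = w*x + b
learning_rate = 0.1
mini_batch_size = 64      # power of 2, as suggested above
num_epochs = 5

for epoch in range(num_epochs):
    # Re-shuffle once per epoch so every epoch sees a different mini-batch split
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]

    for k in range(0, m, mini_batch_size):
        Xb = X_shuf[:, k:k + mini_batch_size]
        Yb = Y_shuf[:, k:k + mini_batch_size]

        # Gradients of the mean squared error cost on this mini-batch
        error = w * Xb + b - Yb
        dw = float(np.mean(error * Xb))
        db = float(np.mean(error))

        # One gradient descent step per mini-batch, so many steps per epoch
        w -= learning_rate * dw
        b -= learning_rate * db

    epoch_cost = float(np.mean((w * X + b - Y) ** 2) / 2)
    print(f"epoch {epoch}: cost = {epoch_cost:.4f}, w = {w:.3f}, b = {b:.3f}")
```

With m = 4096 examples and a mini-batch size of 64, each epoch takes 64 gradient steps instead of just one, which is why mini-batch gradient descent makes progress much faster on large data sets.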
CONCLUSION