This note is based on the Coursera course by Andrew Ng.
(This is just a study note for me. The sentences may sometimes be copied or awkward because I am not a native speaker. But I want to learn Deep Learning in English, so everything will get better and better :))
INTRO
RMSprop stands for Root Mean Square prop, and it can also speed up gradient descent.
MAIN
To build intuition for this example, let's say the vertical axis is b and the horizontal axis is w (they could equally be w1 and w2). We want to slow down the learning in the b direction and speed it up in the horizontal direction.
On each iteration, RMSprop keeps an exponentially weighted average of the squared derivatives: Sdw = beta * Sdw + (1 - beta) * dW^2 and Sdb = beta * Sdb + (1 - beta) * db^2. It then updates the parameters as W := W - alpha * dW / sqrt(Sdw) and b := b - alpha * db / sqrt(Sdb). Because the derivatives in the horizontal direction are small, Sdw will be relatively small, and we are dividing by sqrt(Sdw) in the W update, so learning speeds up in that direction. Whereas the derivatives in the vertical direction are large, Sdb will be relatively large, so dividing by sqrt(Sdb) slows down the updates in the vertical dimension. In practice, a small epsilon is added to the denominator so we never divide by something close to zero.
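The update above can be sketched as a small NumPy function. The function name and the default hyperparameter values here are my own choices for illustration; the update rule itself is the one described above.

```python
import numpy as np

def rmsprop_update(w, b, dw, db, s_dw, s_db,
                   learning_rate=0.01, beta=0.9, epsilon=1e-8):
    """One RMSprop step: keep an exponentially weighted average of the
    squared derivatives, then divide each raw gradient by its root."""
    # Exponentially weighted average of the squared derivatives
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    s_db = beta * s_db + (1 - beta) * db ** 2
    # Divide by the root mean square; epsilon avoids division by ~zero
    w = w - learning_rate * dw / (np.sqrt(s_dw) + epsilon)
    b = b - learning_rate * db / (np.sqrt(s_db) + epsilon)
    return w, b, s_dw, s_db
```

The same function works unchanged whether w and b are scalars or NumPy arrays, since all operations are elementwise.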
We call the vertical and horizontal directions b and w just to illustrate the example. In practice, we are in a very high-dimensional parameter space, so the vertical dimensions might be w1, w2, w17, and the horizontal dimensions might be w3, w4, w15. So we end up computing a weighted average of the squares of the derivatives, and this damps out the directions in which there are oscillations.
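To see the damping effect, here is a toy loop on an elongated bowl-shaped cost, steep in b and shallow in w. The cost function and all the constants are illustrative assumptions, not from the course; the inner update is the RMSprop rule described above.

```python
import numpy as np

def grads(w, b):
    # Gradients of f(w, b) = 0.5 * w**2 + 12.5 * b**2:
    # small derivative in w, large derivative in b.
    return w, 25.0 * b

w, b = -4.0, 2.0          # illustrative starting point
s_dw, s_db = 0.0, 0.0
alpha, beta, eps = 0.05, 0.9, 1e-8

for _ in range(200):
    dw, db = grads(w, b)
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    s_db = beta * s_db + (1 - beta) * db ** 2
    # The large db gets divided by a large sqrt(s_db), which damps the
    # vertical oscillations; the small dw is divided by a small sqrt(s_dw).
    w -= alpha * dw / (np.sqrt(s_dw) + eps)
    b -= alpha * db / (np.sqrt(s_db) + eps)
```

After the loop, both parameters have settled near the minimum at (0, 0), even though the raw gradient in b was 25 times larger than in w at the start.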
CONCLUSION