
[Neural Network and Deep Learning] Forward Propagation in a Deep Network

김 정 환 2020. 3. 23. 13:14

This note is based on the Coursera course by Andrew Ng.

(This is just a study note for me. Some sentences may be copied or awkward because I am not a native speaker, but I want to learn Deep Learning in English. So everything will get better and better :))

Let's see how we can perform forward propagation in a deep network. Given a single training example x, here is how we compute the activations of the first layer. 
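
In equations, for a generic layer l this is z[l] = W[l] a[l-1] + b[l] and a[l] = g[l](z[l]), with a[0] = x. Here is a minimal numpy sketch of that loop for a single example (the function name forward_single and the ReLU/sigmoid choice of activations are my own illustration, not fixed by the lecture):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_single(x, Ws, bs):
    """Forward propagation for a single example x of shape (n[0], 1).

    Ws and bs hold the per-layer parameters:
    Ws[l-1] has shape (n[l], n[l-1]) and bs[l-1] has shape (n[l], 1).
    """
    a = x  # a[0] = x
    L = len(Ws)
    for l in range(L):
        z = Ws[l] @ a + bs[l]                       # z[l] = W[l] a[l-1] + b[l]
        a = sigmoid(z) if l == L - 1 else relu(z)   # a[l] = g[l](z[l])
    return a  # a[L] = y_hat
```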

We have done all this for a single training example. How about doing it in a vectorized way for the whole training set at the same time?
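
A sketch of the vectorized version (again my own illustration, reusing relu and sigmoid from the sketch above): the only change is that the m examples are stacked as columns of X, so the loop body stays the same and every intermediate matrix picks up m columns.

```python
def forward_batch(X, Ws, bs):
    """Vectorized forward propagation; X has shape (n[0], m), one column per example.

    Uses relu and sigmoid as defined in the previous sketch.
    """
    A = X  # A[0] = X
    L = len(Ws)
    for l in range(L):
        Z = Ws[l] @ A + bs[l]                       # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L - 1 else relu(Z)   # A[l] = g[l](Z[l])
    return A  # shape (n[L], m)
```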

When implementing a deep neural network, one of the debugging tools Andrew often uses to check the correctness of his code is to pull out a piece of paper and work through the dimensions of the matrices he is working with. Let's see how to do that. If we implement forward propagation, the first step will be z[1] = W[1]x + b[1]. Let's ignore the bias term b for now and focus on the parameter W, and think about the dimensions of z, W and x. We can define the dimensions of z, W and x as follows.
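
To make the bookkeeping concrete, here is that first step written out with made-up sizes, say n[0] = 2 input features and n[1] = 3 units in layer 1:

z[1] : (n[1], 1) = (3, 1)
W[1] : (n[1], n[0]) = (3, 2)
x    : (n[0], 1) = (2, 1)

The product W[1]x is (3, 2) times (2, 1), which gives (3, 1), matching z[1]. So the number of columns of W[1] is forced by the input dimension, and the number of rows by the layer size.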

So, what we figured out above is that the dimension of W[1] has to be n[1] by n[0]. And more generally, the dimension of W[l] must be n[l] by n[l-1].
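
A small numpy sanity check of this rule (the sizes in layer_dims are arbitrary values I picked for illustration):

```python
import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]   # [n[0], n[1], ..., n[L]]; arbitrary example sizes

params = {}
for l in range(1, len(layer_dims)):
    # W[l] must be (n[l], n[l-1]) so that W[l] @ a[l-1] comes out (n[l], 1)
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    assert params["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
```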

Now, let's think about the dimension of the vector b. W[1] * x is going to be a (3, 1) vector, so we have to add another (3, 1) vector to it in order to get a (3, 1) vector as the output. So the more general rule is that b[l] should be n[l] by 1 dimensional.
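
Continuing the snippet above, the same kind of check for the bias vectors, plus one forward step on a single example to confirm everything lines up:

```python
for l in range(1, len(layer_dims)):
    # b[l] must be (n[l], 1) so it can be added to W[l] @ a[l-1]
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    assert params["b" + str(l)].shape == (layer_dims[l], 1)

x = np.random.randn(layer_dims[0], 1)   # a single example: (n[0], 1)
z1 = params["W1"] @ x + params["b1"]    # (n[1], n[0]) @ (n[0], 1) + (n[1], 1)
assert z1.shape == (layer_dims[1], 1)   # (3, 1) with these example sizes
```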

Now, in a vectorized implementation, we would have Z[1] = W[1]X + b[1]. The dimension of Z[1], instead of being n[1] by 1, ends up being n[1] by m, where m is the size of our training set. The dimension of W[1] stays the same. And X, instead of being a single column x of dimension n[0] by 1, is all our training examples stacked horizontally, so its dimension is n[0] by m. The final detail is that b[1] is still n[1] by 1, but when we take W[1]X and add b[1] to it, then through Python broadcasting b[1] gets duplicated into an n[1] by m matrix and the addition is done element-wise.
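
And continuing the same snippet once more, a quick demonstration of that broadcasting behavior (m = 4 is an arbitrary batch size):

```python
m = 4                                    # arbitrary number of training examples
X = np.random.randn(layer_dims[0], m)    # (n[0], m): examples stacked horizontally
Z1 = params["W1"] @ X + params["b1"]     # b1 is (n[1], 1); broadcasting duplicates it to (n[1], m)
assert Z1.shape == (layer_dims[1], m)    # the addition happens element-wise
```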
