
[Neural Network and Deep Learning] Why deep representations?

김 정 환 2020. 3. 24. 12:32

This note is based on the Coursera course by Andrew Ng.

(This is just a study note for me. Some sentences may be copied or awkward because I am not a native speaker, but I want to learn Deep Learning in English, so everything will get better and better :))

 

 

INTRO

We have all been hearing that deep neural networks work really well for a lot of problems, and it's not just that they need to be big neural networks. They need to be deep, that is, to have a lot of hidden layers. So why is that?

 

 

MAIN

What is a deep network computing? If we are building a system for face recognition or face detection, here's what a deep neural network could be doing. We feed in a picture of a face, and then the first layer of the neural network might act as a feature detector or an edge detector.

 

In this example, there is a neural network with 24 hidden units, and these are visualized by little square boxes. The top-left square box is trying to figure out where edges of that orientation ( / ) are in the image, and the bottom-right square box is trying to figure out where the horizontal edges are in the image.

 

 

Now, let's think about finding the edges in this picture by grouping pixels together to form edges. The network can then detect those edges and group edges together to form parts of faces. For example, we might have one neuron trying to see if it is finding an eye, or a different neuron trying to find that part of the nose. So, by putting together lots of edges, it can start to detect different parts of faces.

 

Finally, by putting together different parts of faces, it can then try to recognize or detect different types of faces. So intuitively, we can think of the earlier layers of the neural network as detecting simple functions, like edges, and the later layers as composing them together so that the network can learn more and more complex functions.
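
To make that layer-by-layer composition concrete, here is a minimal sketch in Python/numpy (my own illustration, not the face-recognition network from the lecture; the layer sizes and the random input are made up). Each layer can only combine the previous layer's activations, so later layers end up computing more and more complex functions of the raw input.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, parameters):
    """Run x through every layer; each activation is a function of the one before it."""
    a = x
    for W, b in parameters:
        a = relu(W @ a + b)          # layer l only sees layer l-1's features
    return a

# Hypothetical layer sizes: 64 "pixels" -> 24 edge detectors -> 12 part detectors -> 1 face score
layer_sizes = [64, 24, 12, 1]
rng = np.random.default_rng(0)
parameters = [(rng.standard_normal((n_out, n_in)) * 0.01, np.zeros((n_out, 1)))
              for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.standard_normal((64, 1))     # stand-in for a flattened image
print(forward(x, parameters).shape)  # (1, 1)
```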

 

The other piece of intuition about why deep networks seem to work well is the following. 

 

Circuit Theory and Deep Learning

"There are functions you can compute with a "small" L-layer deep neural network that shallower networks require exponentially more hidden units to compute."

 

For example, let's say we are trying to compute the exclusive OR, or the parity, of all our input features. If we build an XOR tree like the one below, it computes the XOR of X1 and X2, then takes X3 and X4 and computes their XOR, and so on. (Technically, if we are only using AND and NOT gates, we might need a couple of layers to compute each XOR rather than just one layer, but it is still a relatively small circuit.) The depth of this tree is on the order of log(n), and the number of nodes, or circuit components, or gates in the network is not that large. We don't need that many gates in order to compute the exclusive OR.
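
Here is a small sketch of that XOR tree (my own illustration, not code from the course): pair up the inputs, XOR each pair, and repeat until one value is left. It uses only n-1 XOR gates and has depth on the order of log2(n).

```python
def parity_tree(bits):
    """Compute the parity of the inputs with a tree of pairwise XORs."""
    level = list(bits)
    gates = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])   # one XOR gate per pair
            gates += 1
        if len(level) % 2 == 1:                   # an odd leftover passes through
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth

bits = [1, 0, 1, 1, 0, 0, 1, 0]                   # n = 8 inputs
print(parity_tree(bits))                          # (0, 7, 3): parity 0, 7 gates, depth 3
```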

 

 

 

But now, suppose we are not allowed to use a neural network with multiple hidden layers and are forced to compute this function with just one hidden layer. Then, in order to compute this XOR function, that hidden layer will need to be exponentially large, because essentially we need to exhaustively enumerate all 2^n possible configurations of the input bits that result in the exclusive OR being either 1 or 0. So we end up needing a hidden layer that is exponentially large in the number of bits.
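
A rough way to see the blow-up (my own illustration, not from the lecture): one straightforward single-hidden-layer construction dedicates one hidden unit to each odd-parity input configuration, which is 2^(n-1) units, i.e. exponential in n, compared with the n-1 gates of the tree above.

```python
from itertools import product

def shallow_parity_units(n):
    """Count hidden units if we use one unit per odd-parity input configuration."""
    odd_configs = [c for c in product([0, 1], repeat=n) if sum(c) % 2 == 1]
    return len(odd_configs)

for n in (4, 8, 16):
    print(n, shallow_parity_units(n))   # 4 -> 8, 8 -> 128, 16 -> 32768
```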

 

In addition to this reason for preferring deep neural networks, the other reason the term deep learning has taken off is branding. These are just neural networks with a lot of hidden layers, but the phrase "deep learning" is a great brand; it just sounds so deep.

 

 

Conclusion

Regardless of the PR branding, deep networks do work well. Sometimes people go overboard and insist on using tons of hidden layers. When Andrew starts out on a new problem, he will often begin with something as simple as logistic regression, then try something with one or two hidden layers, and treat the number of hidden layers as a hyperparameter to tune, as sketched below.
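
As a rough sketch of that workflow (my own example using scikit-learn on synthetic data, not Andrew's code), we can compare logistic regression against small networks with one and two hidden layers and pick the depth by validation score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; on a real problem, use your own training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "1 hidden layer":      MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    "2 hidden layers":     MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0),
}

# Treat depth like any other hyperparameter: keep whichever scores best on validation.
for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=3).mean()
    print(f"{name}: {score:.3f}")
```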
