I’m currently learning about deep vs shallow NNs, and it seems that a small deep NN is better than a large shallow NN. I’ve started to wonder whether we could go beyond deep learning:

- A shallow NN with 1 layer is a kind of 1D array (rank 1 NN). We can add neurons in one direction.
- A deep NN has multiple layers, so it is a kind of 2D matrix (rank 2 NN). We can increase its size in two directions (number of neurons per layer and number of layers).
- Could we create a 3D (rank 3) NN? For example, could we take a neural network with 5 layers and 10 neurons in each layer (let’s call it a sheet) and stack 5 such sheets one on top of another?

If a deep NN is better than a shallow NN, such a hyper-deep NN should be even better. Is it possible to create such a NN?

The shallow net Prof Ng shows us in DLS C1 W3 actually has 2 layers: 1 hidden layer and the output layer. So I think your 1D vs 2D analogy for shallow versus deep NNs doesn’t really apply in the way you are describing it. But the key point is that with these network architectures, everything is “serial”: you have one input vector and it feeds through each layer in sequence and produces one output (vector) that is the input to the next layer.

I’m not sure I understand what you have in mind with the “sheets” you describe in your case 3, but I think what you mean is that the sheets get applied in parallel to the inputs. If I’m interpreting that correctly, then it sounds a lot like Convolutional Networks, which we will learn about in Course 4 of the DLS series. ConvNets can handle inputs that are rank 3 tensors and apply multiple parallel “filters” to them, producing a multichannel output that is also a rank 3 tensor. Note that in the way I’m describing the inputs here, I mean just one sample being a rank 3 tensor. By comparison, in the “feed forward fully connected” shallow and deep nets we are learning about in Course 1, each input sample is a vector, so you could consider the inputs to be rank 1 tensors, even though we actually represent them as rank 2 tensors because we want to process multiple samples in parallel.
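To make the rank terminology concrete, here is a minimal NumPy sketch. The sizes (4 features, 3 samples, 8x8 RGB images) are made up purely for illustration, and the images are stored channels-last, which is just one common convention.

```python
import numpy as np

n_x, m = 4, 3  # hypothetical: 4 features per sample, 3 samples

x = np.zeros(n_x)              # one sample for a fully connected net: rank 1
X = np.zeros((n_x, m))         # m samples stacked as columns: rank 2
img = np.zeros((8, 8, 3))      # one image sample (height, width, channels): rank 3
imgs = np.zeros((m, 8, 8, 3))  # a batch of m images: rank 4

# ndim is the number of axes, i.e. the tensor rank in the sense used above
ranks = [a.ndim for a in (x, X, img, imgs)]
print(ranks)  # [1, 2, 3, 4]
```

So in both cases the batch adds one axis on top of the rank of a single sample.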

Sorry, what I said above may sound a bit incoherent. To state the overall point more simply: my suggestion would be that you “hold that thought” and see if what Prof Ng shows us about Convolutional Networks in Course 4 relates to your idea or not. But if not, then maybe it will at least give you a framework for expanding on your idea.

But if it sounds like I’m just missing your point, please describe your “sheets” idea in more detail.

Thanks for the answer. ConvNets are something different from what I have in mind.

In a standard NN, layer L is connected to layer L+1 (forward) and L-1 (backward). I was wondering whether we could also connect each layer with an additional layer “on the left” and “on the right”.

I used sheets to visualise my idea. You can draw a NN on a single sheet of paper, where each layer is actually a column of neurons. Each layer (a column on the flat surface of that sheet) is connected with the layer in front of it and the layer behind it. If you took a second sheet of paper that contains the same network (same number of layers and neurons in each layer), we could add connections between these two sheets. So just imagine two (or more) copies of a NN with connections between corresponding layers (L1 of the first network is connected with L1 of the second, L2 of the first with L2 of the second).

It would add a significant number of connections and just double the number of neurons. The problem I see is in the “serial” calculation of forward and back propagation. I’m not sure how to include propagation in the additional dimension (between sheets, so that left propagation would be calculated together with forward propagation, and right propagation together with back propagation). But maybe there is a simple way to do them separately and just average the results from forward-left and backward-right?

Thanks for the more detailed description. I can see now more what you mean by “sheets”. It’s easier to visualize with that picture in mind. As you say towards the end, the question is how you recombine the inputs from the multiple sheets. Also if you want twice the number of neurons, why is your sheets architecture better than just doubling the sizes of all the layers in the plain vanilla Fully Connected architecture? The key will be to think about how the connections between the sheets are defined. If you just treat them as parallel and feed the same inputs, then the fact that you start with different random initializations will (at least potentially) allow back propagation to learn different solutions in the various sheets. But they could actually end up just being symmetries of the same solution.
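Here is a minimal NumPy sketch of the “doubling the layers” comparison, under assumptions of my own: two sheets with 2 layers each, tanh activations, the same input fed to both, and no connections between the sheets. All names and sizes are hypothetical. It suggests that two parallel sheets compute the same thing as one double-width network whose second weight matrix is block diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 3, 4  # hypothetical sizes: 3 inputs, 4 neurons per layer per sheet

x = rng.normal(size=(n_in, 1))

def init(rows, cols):
    """Random weights and biases for one layer of one sheet."""
    return rng.normal(size=(rows, cols)), rng.normal(size=(rows, 1))

# Two sheets: same architecture, different random initializations
W_a, b_a = init(n_h, n_in)
V_a, c_a = init(n_h, n_h)
W_b, b_b = init(n_h, n_in)
V_b, c_b = init(n_h, n_h)

def sheet(x, W, b, V, c):
    """Forward pass through one 2-layer sheet."""
    return np.tanh(V @ np.tanh(W @ x + b) + c)

out_sheets = np.vstack([sheet(x, W_a, b_a, V_a, c_a),
                        sheet(x, W_b, b_b, V_b, c_b)])

# One network of double width: stack layer 1, make layer 2 block diagonal
zeros = np.zeros((n_h, n_h))
W_wide, b_wide = np.vstack([W_a, W_b]), np.vstack([b_a, b_b])
V_wide, c_wide = np.block([[V_a, zeros], [zeros, V_b]]), np.vstack([c_a, c_b])
out_wide = np.tanh(V_wide @ np.tanh(W_wide @ x + b_wide) + c_wide)

same = np.allclose(out_sheets, out_wide)
print(same)
```

If you replace the two zero blocks in `V_wide` with trainable weights, you get connections from one sheet’s layer to the other sheet’s next layer, and that is just the ordinary fully connected layer of double width.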

Anyway, it’s cool to try to think about new ideas like this. I still think it’s a good idea to “hold that thought” until you’ve seen all the interesting architectures that Prof Ng shows us in Courses 4 and 5. E.g. Residual Nets (with skip layers) and U-Net with transposed convolutions and YOLO where the outputs of the network are multi-dimensional tensors and then the architectures in Course 5 with LSTM and Attention models. Once you’ve seen all that, it may give you some more ideas about how to flesh out your “sheets” architecture.

Thanks for your answer. I will finish the remaining courses and keep my question in mind. Maybe my idea is somehow addressed in other kinds of architecture. Right now it seems that expanding neural networks as I proposed will just end up adding symmetrical cases.

I am not sure how to type comments in LaTeX, so I typed the answer on my computer and posted it as a picture. I hope it addresses your question.

The first option (no intra-layer connections) is trivial, but I’m not sure if option 2 can be reduced in such a way.

If we had linear activation functions, we could reduce every network to a single linear map with one combined W and b, but that can’t be done because of the nonlinearities in the activations.
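A quick numerical check of that statement, as a sketch with made-up layer sizes: with identity activations, two layers collapse to $W = W_2 W_1$ and $b = W_2 b_1 + b_2$, while putting a tanh in between breaks the equality.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 5))  # hypothetical: 3 features, 5 samples

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))

# Two layers with linear (identity) activations
two_layer_linear = W2 @ (W1 @ x + b1) + b2

# The same map written as a single layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b
collapses = np.allclose(two_layer_linear, one_layer)

# With a nonlinearity in the hidden layer the collapse no longer holds
two_layer_tanh = W2 @ np.tanh(W1 @ x + b1) + b2
tanh_matches = np.allclose(two_layer_tanh, one_layer)

print(collapses, tanh_matches)
```

The first flag should be true for any weights, since it is just matrix algebra. The second is false for generic random weights like these.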

Let me know if my understanding is correct.

@ravi1165 You can interpolate LaTeX by bracketing it with a single dollar sign on either side. This is described on the FAQ Thread. Since you are new to the course, the FAQ Thread is worth a look just on general principles: there are lots more topics besides the LaTeX one.

@Szymon.P.Marciniak, you have understood the basic idea correctly. The argument can be extended to non-linear cases as well:

In the case of intra-layer connections WITH the activations included, we can write the output of neuron A for the example above as

\begin{eqnarray} Y_A &=& g_A(W_AX_{in} + W_{AB}Y_B)\\ &=& g_A\left[W_AX_{in} + W_{AB} \times g_B(W_BX_{in})\right]\\ &=& g^{new}_A(X_{in}) \end{eqnarray}

The above math means that we would be using different activation functions for A and B, which is not the standard way of implementing neural networks. (However, the same idea is implemented in ResNets, which are covered in Course 4.) The point I want to drive home is that a higher-dimensional NN can be represented as a 2D NN, with each layer of the network being a column of neurons.
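As a rough illustration of that last point, here is a NumPy sketch of the two-neuron example. The assumptions are mine: tanh for both $g_A$ and $g_B$, no biases (matching the equation above), and a 3-dimensional input. It computes $Y_A$ directly from the intra-layer connection, and then again as two ordinary feedforward steps in which the input is carried forward alongside $Y_B$, in the manner of a skip connection.

```python
import numpy as np

rng = np.random.default_rng(2)
g = np.tanh  # assumed activation for both A and B

x = rng.normal(size=(3, 1))     # X_in, hypothetical size
W_A = rng.normal(size=(1, 3))   # input -> neuron A
W_B = rng.normal(size=(1, 3))   # input -> neuron B
W_AB = rng.normal(size=(1, 1))  # neuron B -> neuron A (intra-layer)

# Direct evaluation of the intra-layer formula
Y_B = g(W_B @ x)
Y_A_direct = g(W_A @ x + W_AB @ Y_B)

# Unrolled: step 1 outputs [x; Y_B], carrying x forward unchanged
h = np.vstack([x, g(W_B @ x)])    # shape (4, 1)
# Step 2: one ordinary weight row acting on the stacked vector
W_step2 = np.hstack([W_A, W_AB])  # shape (1, 4)
Y_A_unrolled = g(W_step2 @ h)

match = np.allclose(Y_A_direct, Y_A_unrolled)
print(match)
```

The two results agree by construction, since `W_step2 @ h` expands to `W_A @ x + W_AB @ Y_B`.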

Thanks @paulinpaloalto. The advice about LaTeX and the FAQ is really helpful.