Vectorisation has been around for a while now; it is completely independent of deep-learning developments. MATLAB has long provided an efficient implementation of matrix/vector operations. Python is more verbose, and its feedback from the command prompt is symbolic rather than straightforward like MATLAB's, which is cleaner and clearer.
I found the videos quite useful for gaining clarity, although I think of it a bit differently from Andrew.
- z is the INPUT to the neuron: the weighted sum of the activations from the previous layer, plus a bias. It's a scalar value at the level of the neuron.
- a is the OUTPUT of the neuron: a non-linear function of z. It's a scalar value at the level of the neuron.
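In numpy terms, that two-step view of a single neuron looks something like the toy sketch below (the numbers are made up and a sigmoid activation is just assumed for illustration):

```python
import numpy as np

# Toy single-neuron sketch (sigmoid activation assumed; values are made up).
a_prev = np.array([0.2, 0.5, 0.1])   # outputs of the previous layer (a vector)
w = np.array([0.4, -0.3, 0.8])       # this neuron's weights
b = 0.1                              # this neuron's bias

z = np.dot(w, a_prev) + b            # scalar "z": weighted sum plus bias
a = 1 / (1 + np.exp(-z))             # scalar "a": non-linear function of z

print(z, a)
```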
I totally agree that MATLAB handles vectorization much more cleanly. The way polymorphism works there is a thing of beauty. But the point is that the vector constructs were designed into the language from the beginning, as opposed to python where they were pasted on later as an afterthought.
On the question of the z and a values, you’re right that they are scalars at the level of any given neuron. I would phrase it differently, though:
The “activation” at a given layer happens in two steps: first the “linear activation” followed by the non-linear activation.
The inputs at a given layer are the vector of all the outputs of the neurons in the previous layer. The linear activation is the linear combination of that vector with the vector of the weights for a given neuron, followed by the addition of the bias term. So that is really what mathematicians call an “affine” transformation:
Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}
Then to compute the final output of the layer, you apply the non-linear activation function g^{[l]}:
A^{[l]} = g^{[l]}(Z^{[l]})
So Z is not the input to the layer; the input is the A from the previous layer, or X in the special case of the first hidden layer.
Of course I’m doing the fully vectorized version there, where the individual “samples” are the columns of the A values.
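Here's a minimal numpy sketch of that fully vectorized computation; the layer sizes, random values and tanh activation are just placeholders for illustration:

```python
import numpy as np

# Sketch of the vectorized layer computation described above.
# Convention from the formulas: samples are the columns of the A matrices.
# The dimensions (n_prev=3 inputs, n=4 neurons, m=5 samples) are arbitrary.
n_prev, n, m = 3, 4, 5
A_prev = np.random.randn(n_prev, m)   # A^[l-1]: one column per sample
W = np.random.randn(n, n_prev)        # W^[l]: one row per neuron in layer l
b = np.random.randn(n, 1)             # b^[l]: broadcast across the m columns

Z = W @ A_prev + b                    # Z^[l] = W^[l] . A^[l-1] + b^[l]
A = np.tanh(Z)                        # A^[l] = g^[l](Z^[l]), tanh assumed here

print(A.shape)                        # (4, 5): n neurons by m samples
```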
It’s just a matter of perspective: the math stays the same, so no foul.
I remember reading Rumelhart and McClelland some time back, and the neuron ('unit') was considered the non-linear transform. The activation then propagates to the next layer in the network. The activation pattern from the previous layer is a vector, and a linear transform of it (weighted sum plus bias) forms the input to the neuron. So from this perspective Z (weighted sum plus bias) is the input and A is the output of each neuron. Here I mean z and a as scalars, because we are at the level of the neuron :b
Yes, you were talking about it at the level of a single neuron, whereas I was showing the full vectorized version: vectorized both across the neurons and across the samples.
Sure, in the Rumelhart and McClelland formulation, they are just meaning something different by “neuron” than Prof Ng means. You’re welcome to think of it that way, if it seems more intuitive to you. As you say, the math is the same. But I like Prof Ng’s way of stating it: the neuron is the entire mechanism that takes all the outputs of the neurons in the previous layer and performs both calculations to give a single output. I haven’t explored the literature, so I don’t know which formulation is more common.
Thanks for the detailed explanation, Paul. It's great that one can join a learning platform and have these sorts of engagements on the tech and the materials. Different views are what drive innovation and make AI/ML such a great field. I really appreciate how well thought out and logically coherent your feedback is. The key learning objective is so well stated. I will definitely always remember the concept of vectorisation in deep learning.
I think earlier, when Rumelhart and McClelland were communicating their ideas, neural networks were seen as “the microstructures of cognition”: that neural networks (or what they referred to as “connectionist networks”) were a plausible architecture for the nervous system to implement cognitive processing.
They argued for a level of abstraction above the actual electrical phenomena of the nervous system, but below the level at which classical symbolic AI operated. They fundamentally changed our views of knowledge representation: the concept of parallel-distributed representation. Admittedly, thinking has changed a lot since then. Prof Ng represents the cutting edge of that thinking.
A lot of disruptive innovation in neural networks has moved away from searching for plausible cognitive architectures and toward performance measures based on "consequentialism" (cf. Russell and Norvig): creating desired states in the environment through intelligent and rational learning and decision-making, rather than something that is biologically plausible. To serve this purpose, biological plausibility is less important; we don't need to constrain networks (and AI systems) to be a plausible computational architecture implemented in the nervous system.
Perhaps a bit of a disclaimer is necessary here: the mentors are simply fellow student volunteers here. We can claim no credit for the quality of the course materials, but I completely agree that the courses are great. Prof Ng and his team at DeepLearning.AI are the ones who created all the course materials. The best we can do is make suggestions to them based on discussions and comments from the student community.
One more thing to say about vectorization (which I’m sure you already know, but for anyone else who finds this thread later) is that the real action happens at the hardware level. By formulating the computations correctly in python/numpy, the python interpreter ends up being able to call the underlying math libraries in a way that they can take advantage of the CPU’s vector instructions.
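Just to illustrate the point, here is a rough sketch comparing an explicit Python loop with the vectorized numpy call; actual timings will vary by machine, and the array size is arbitrary:

```python
import time
import numpy as np

# Same dot product two ways: a Python-level loop versus numpy's vectorized
# call, which dispatches to the optimized underlying math libraries.
x = np.random.randn(1_000_000)
y = np.random.randn(1_000_000)

t0 = time.time()
s_loop = 0.0
for i in range(len(x)):              # explicit loop in the interpreter
    s_loop += x[i] * y[i]
t_loop = time.time() - t0

t0 = time.time()
s_vec = np.dot(x, y)                 # one call into the math library
t_vec = time.time() - t0

print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.4f}s")
print(np.isclose(s_loop, s_vec))     # same result, very different cost
```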