Nothing to really write home about, but I was thinking: we describe (and program out) the processing of an ANN as matrix operations, but this seems low-level and also inherited from the conventions of linear algebra. It becomes subtle once batch processing enters the picture; then one needs fixes between the math level and the program level, like "broadcasting". Furthermore, the "individual elements of a batch" appear along the "2nd dimension" of matrices, but the "2nd dimension" is also used, as usual, for the W (weight) linear transformation matrices. As an IT guy, that looks somewhat sloppy to me.
First, note a subtlety that should be made clear: one can always remove or add an arbitrary number of size-1 dimensions to an array, thus making arrays of the following shapes (for example) equivalent:
- (a, b)
- (a, b, 1)
- (a, b, 1, 1)
- etc.
This is not unlike being able to add or remove 0s on the left of a number in standard notation without changing what the number designates: 000012 = 12.
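In NumPy terms (an illustrative sketch of my own, not tied to any particular library convention beyond NumPy itself), this equivalence can be checked with `reshape` and `squeeze`:

```python
import numpy as np

# Size-1 dimensions can be added or removed freely,
# like leading zeros on a number: the data is unchanged.
x = np.arange(6).reshape(2, 3)    # shape (a, b) = (2, 3)
y = x.reshape(2, 3, 1)            # shape (a, b, 1)
z = x.reshape(2, 3, 1, 1)         # shape (a, b, 1, 1)

# squeeze() drops all size-1 dimensions, recovering the original array
assert np.array_equal(np.squeeze(y), x)
assert np.array_equal(np.squeeze(z), x)
```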
But note this:
- (), no shape at all, a scalar
- (a), undetermined orientation ("upgrade" the array to (a, 1) or (1, a), as desired)
- (a, 1), a column vector of a rows
- (a, 1, 1)
- etc.
Contrariwise:
- (1, a), a row vector of a columns
- (1, a, 1)
- (1, a, 1, 1)
- etc.
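The distinction between the orientation-free shape (a) and its two possible "upgrades" can likewise be sketched in NumPy (an illustrative example with a = 4):

```python
import numpy as np

v = np.arange(4)          # shape (4,): orientation undetermined
col = v.reshape(4, 1)     # "upgraded" to a column vector, shape (a, 1)
row = v.reshape(1, 4)     # "upgraded" to a row vector, shape (1, a)

# The two upgrades are transposes of each other...
assert np.array_equal(col.T, row)
# ...and squeezing either one recovers the orientation-free array
assert np.array_equal(np.squeeze(col), v)
assert np.array_equal(np.squeeze(row), v)
```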
After this intro, the basic idea (it's very pedestrian) would be to do the following:
- express all the matrices we encounter during processing always as 3-D matrices/arrays
- …with dimensions row, col, depth (in that order, order being important), and
- …with the "batch examples" always arranged along the depth dimension, and
- …with the operations transforming one array into the next disentangled from the underlying linear algebra (but of course still explained via linear algebra) as follows:
- MIX: performs a linear transformation by multiplying each (a, 1, 1) shaped sub-array (slice) found along the depth dimension of an input array of shape (a, 1, m) with a matrix of shape (b, a) "on the left", yielding an output array of shape (b, 1, m) composed of the individually transformed slices along depth
- ADD: adds an array of shape (a, 1) to each (a, 1, 1) shaped sub-array (slice) found along the depth dimension of an input array of shape (a, 1, m), yielding an output array of shape (a, 1, m) composed of the individually transformed slices along depth
- APPLY: applies an arbitrary 1-argument operation to each element of an input array of shape (a, 1, m), yielding an output array of identical shape (a, 1, m)
- NegLL: applies the two-argument negative log-likelihood computation to pairwise elements picked from two input arrays of shape (1, 1, m) along the depth dimension, yielding an output array of shape (1, 1, m)
- MEAN: computes the mean of an input array of shape (1, 1, m) along the depth dimension, yielding an output array of shape (1, 1, 1), i.e. a scalar.
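The five operations above can be sketched in NumPy as follows (a minimal illustration; NegLL is written here as binary cross-entropy, which is one concrete choice of negative log-likelihood, an assumption on my part):

```python
import numpy as np

def MIX(W, X):
    """Linear transform: W has shape (b, a), X has shape (a, 1, m);
    each (a, 1, 1) depth slice is multiplied by W on the left,
    giving an output of shape (b, 1, m)."""
    return np.einsum('ba,ajm->bjm', W, X)

def ADD(b, X):
    """Add an array b of shape (a, 1) to every depth slice of X,
    which has shape (a, 1, m); the output shape is (a, 1, m)."""
    return X + b[:, :, np.newaxis]

def APPLY(f, X):
    """Apply a 1-argument function elementwise; the shape is preserved."""
    return f(X)

def NegLL(Yhat, Y):
    """Pairwise negative log-likelihood (here: binary cross-entropy)
    on two (1, 1, m) arrays, yielding a (1, 1, m) result."""
    return -(Y * np.log(Yhat) + (1 - Y) * np.log(1 - Yhat))

def MEAN(X):
    """Mean along the depth dimension: (1, 1, m) -> (1, 1, 1)."""
    return X.mean(axis=2, keepdims=True)
```

Note that MIX names its axes explicitly via einsum, so no reliance on broadcasting conventions is needed to keep the batch dimension separate from the linear-algebra dimensions.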
Now consider a processing chain of a 2-layer ANN like the one below. Note that the layer "widths" (i.e. the number of neurons in that layer) are denoted with:
- \lambda[0]
- \lambda[1]
- \lambda[2], which in this case is \lambda[2] = 1
We can now express the above through an isometric diagram like the following (I have no software to do that, so it's hand-drawn; there is PlotNeuralNet, but I have no idea how it works). Note the indications of dimension sizes on each array.
Now the "depth" of the arrays always expresses the size of the batch, and the matrices W^{[i]} and vectors b^{[i]} are used to parametrize the transformations from one array to the next, but are not bound up with them.
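Under these conventions, the forward pass of the 2-layer network can be sketched end to end (a hypothetical example with widths \lambda = [3, 4, 1], batch size m = 5, sigmoid activations, and random illustrative parameters; the MIX/ADD/APPLY/NegLL/MEAN steps are inlined as plain NumPy expressions):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = [3, 4, 1]   # layer widths lambda[0], lambda[1], lambda[2] = 1
m = 5             # batch size, carried along the depth dimension

# Parameters: W^{[i]} has shape (lam[i], lam[i-1]), b^{[i]} has shape (lam[i], 1)
W1, b1 = rng.normal(size=(lam[1], lam[0])), np.zeros((lam[1], 1))
W2, b2 = rng.normal(size=(lam[2], lam[1])), np.zeros((lam[2], 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(lam[0], 1, m))                    # input, shape (3, 1, 5)
Y = rng.integers(0, 2, size=(1, 1, m)).astype(float)   # labels, shape (1, 1, 5)

# Layer 1: MIX (einsum), ADD (bias), APPLY (sigmoid)
A1 = sigmoid(np.einsum('ba,ajm->bjm', W1, X) + b1[:, :, None])   # (4, 1, 5)
# Layer 2: MIX, ADD, APPLY
A2 = sigmoid(np.einsum('ba,ajm->bjm', W2, A1) + b2[:, :, None])  # (1, 1, 5)

# NegLL per example along depth, then MEAN over the depth dimension
per_example = -(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))       # (1, 1, 5)
loss = per_example.mean(axis=2, keepdims=True)                   # (1, 1, 1)
```

Every intermediate array keeps the batch along depth, and the shapes can be read off mechanically at each step, which is the point of the exercise.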