Initializers with Keras

Hello,
I was wondering about the difference between the He and Xavier initializers.
It seems Keras has a pretty well-documented page in this regard at Layer weight initializers. It also seems Xavier is called Glorot there, and it comes in a couple of flavors (normal and uniform).
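If I'm reading the docs right, the two schemes differ only in how the variance is scaled. A minimal sketch of my understanding (the shapes and seed are made up for illustration):

from tensorflow.keras import initializers

# HeNormal draws from a truncated normal with stddev = sqrt(2 / fan_in)  -> suited to ReLU
# GlorotNormal (Xavier) uses stddev = sqrt(2 / (fan_in + fan_out))
he = initializers.HeNormal(seed=42)
glorot = initializers.GlorotNormal(seed=42)

# Initializers are callables that take the shape when invoked:
w_he = he(shape=(128, 64))
w_glorot = glorot(shape=(128, 64))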

from tensorflow.keras import layers, initializers, activations, regularizers

layer[l] = layers.Dense(
    units=64,                                           # W[l].shape[0] = n[l]
    kernel_initializer=initializers.HeNormal(seed=42),  # W[l]; shape (n[l], n[l-1]) is inferred from the inputs
    bias_initializer=initializers.Zeros(),              # b[l]
    activation=activations.relu,                        # or sigmoid, softmax
    kernel_regularizer=regularizers.L2(lambd)           # I guess L2 is the Frobenius norm
)

regularizers: Layer weight regularizers
activations: Layer activation functions
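On the L2 comment above: if I understand the docs correctly, Keras's L2 regularizer adds l2 * sum(W**2) for the layer to the loss, i.e. the squared Frobenius norm scaled by l2, so the l2 argument plays the role of lambda/(2m) from class. A quick sketch of that reading:

import tensorflow as tf
from tensorflow.keras import regularizers

W = tf.ones((3, 4))
reg = regularizers.L2(l2=0.01)
penalty = reg(W)  # 0.01 * tf.reduce_sum(tf.square(W)) = 0.01 * 12 = 0.12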

It seems Keras has a 1:1 mapping with what is covered in class.
What is the purpose of reimplementing the bodies of these Keras methods? I’d totally understand the pedagogical interest of doing so to remove the “black magic” and understand what is going on under the hood.
However, in practice (a.k.a. real life), are we supposed to reinvent the wheel or to use higher-level libraries like Keras (TensorFlow, PyTorch…)?
I’d also find some pedagogical value in learning and practicing with these libraries as well. (As a beginner, I find Keras more approachable).

Sorry, but I don’t totally understand what question you’re asking.
Can you give more background?

Like Tom, I’m not sure what your question actually is, but I’ll take a shot at one interpretation of what I think you might be asking:

In “real life”, when implementing solutions to real world problems, everyone uses a framework like TF/Keras or PyTorch or one of the others. The reason is that the state of the art is pretty deep at this point and it’s just not possible to build all that you need from scratch. The other advantage of using the packages is that they are already debugged and optimized for you, so that’s another layer of work you don’t need to also do in order to achieve a working solution.

There may be cases in which you want to experiment with implementing your own algorithms, if you can’t find any of the “canned” ones that work well in your case. The State of the Art does continue to advance because researchers are trying to “push the envelope” and solve problems that are not well addressed by the current packages. If you discover something new that is valuable, then write a paper about it and pretty soon it will be another function or set of functions that get added to TF and the other packages.
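As one small example of that kind of experimentation, Keras will accept any callable that takes (shape, dtype) as a kernel_initializer, so you can prototype your own scheme without touching the framework internals. This is just a sketch with a made-up scaling rule, not a recommendation:

import tensorflow as tf

def scaled_normal_init(shape, dtype=None):
    # Hypothetical custom scheme: standard normal scaled by 1/sqrt(fan_in).
    # For a Dense kernel, shape is (input_dim, units), so fan_in = shape[0].
    fan_in = int(shape[0])
    return tf.random.normal(shape, stddev=fan_in ** -0.5, dtype=dtype or tf.float32)

layer = tf.keras.layers.Dense(64, kernel_initializer=scaled_normal_init)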

Now, in terms of why Prof Ng chooses to show us the internals of some of these functions, it is for pedagogical reasons. When you are trying to solve a problem, if you don't really understand what is happening “under the covers”, you may lack the intuition for which of the various possible functions to consider. He spends a fair amount of time educating us about how to approach problems (e.g. if you have overfitting, here's how to deal with it). If you just treat everything as a “black box”, you lose some level of understanding and intuition and are left with a random search, which is less efficient.

So you will see this pattern throughout all the courses in the DLS series: Prof Ng explains things in terms of the underlying principles, and the lectures and assignments walk us through building at least the fundamentals of the solution. Then, as we apply the ideas to more complex problems, he switches gears and shows us how to use TF/Keras to put together the solutions.
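To make that concrete with the example you started from: what HeNormal does under the covers is essentially one line of NumPy, which is more or less what the assignments have you write yourself (the variable names here are mine):

import numpy as np

def initialize_parameters_he(layer_dims, seed=42):
    # W[l] has shape (n[l], n[l-1]); He init scales a standard normal by sqrt(2 / n[l-1])
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * np.sqrt(2.0 / layer_dims[l - 1])
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

Once you have written that once, the Keras version is just a convenience wrapper around the same idea.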


My apologies if the context introducing a couple of questions was not clear enough. I indeed had two questions, and you nicely addressed both: one regarding the pedagogical aspect, which you confirmed, and the other regarding the usage of “canned” libraries. I may not be deep enough in the curriculum to have been exposed to these libraries yet. I'm looking forward to keeping up with this journey and getting acquainted with these higher-level libraries down the road.

Sorry, yes, I forgot that you asked the question in DLS C2. We first get introduced to TensorFlow in Week 3 of C2. Then in C4 (ConvNets) and C5 (Sequence Models) the usage of TF will get much deeper. C3 is not a programming course but covers other important issues.