Hi there,
Here is some feedback; I suppose a lot of these points have been raised in the past…
4.2 Linear-Activation Forward, text
The “sigmoid” text should probably talk about A rather than a.
The ReLU says:
The mathematical formula for ReLu is: A = RELU(Z) = max(0,Z).
But RELU is not a commonly known math symbol, so just:
The mathematical formula for ReLU is: ReLU(Z) := max(0,Z).
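In numpy that is presumably just (assuming Z is the pre-activation array, as in the notebook):
A = np.maximum(0, Z)  # element-wise max against 0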
“Arguments” vs “parameters” in text
Minor, but sometimes “arguments” is used when “parameters” is meant. The rule:
“parameters" are called "formal parameters”
“arguments” are called “actual parameters”
MSDN: “…the procedure defines a parameter, and the calling code passes an
argument to that parameter. You can think of the parameter as a parking space and the argument as an automobile.”
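A tiny Python illustration (the call-site names are hypothetical, only to pin the terminology down):
def linear_forward(A, W, b):  # A, W, b are parameters (formal parameters)
    ...

Z, cache = linear_forward(A_prev, W1, b1)  # A_prev, W1, b1 are arguments (actual parameters)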
Typehint everything!
It is absolutely time to typehint the Python functions. This exercise is complex enough as it is, and since we are not using objects, we had better describe those old-school generic datatypes (i.e. dicts & lists). The underlying Python version is high enough. Typehinting informs the student and makes commentary describing the I/O superfluous; it is self-maintaining documentation for both the human reader and the static analyzer.
Here are the declarations. The grader accepts them.
def initialize_parameters(n_x: int, n_h: int, n_y: int) -> Dict[str, np.ndarray]:
def initialize_parameters_deep(layer_dims: List[int]) -> Dict[str, np.ndarray]:
def linear_forward(A: np.ndarray, W: np.ndarray, b: np.ndarray) -> Tuple[np.ndarray,Tuple[np.ndarray,np.ndarray,np.ndarray]]:
def sigmoid(Z: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
def relu(Z: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
def linear_activation_forward(A_prev: np.ndarray, W: np.ndarray, b: np.ndarray, activation: str) -> Tuple[np.ndarray, Tuple[Tuple, np.ndarray]]:
def L_model_forward(X: np.ndarray, parameters: Dict[str, np.ndarray]) -> Tuple[np.ndarray, List[Tuple[Tuple,np.ndarray]]]:
def compute_cost(AL: np.ndarray, Y: np.ndarray) -> float:
def linear_backward(dZ: np.ndarray, cache: Tuple[np.ndarray, np.ndarray, np.ndarray]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
def relu_backward(dA: np.ndarray, activation_cache: np.ndarray) -> np.ndarray:
def sigmoid_backward(dA: np.ndarray, activation_cache: np.ndarray) -> np.ndarray:
def linear_activation_backward(dA: np.ndarray, cache: Tuple[Tuple[np.ndarray,np.ndarray,np.ndarray],np.ndarray], activation: str) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
def L_model_backward(AL: np.ndarray, Y: np.ndarray, caches: List[Tuple[Tuple[np.ndarray,np.ndarray,np.ndarray],np.ndarray]]) -> Dict[str,np.ndarray]:
def update_parameters(params: Dict[str,np.ndarray], grads: Dict[str,np.ndarray], learning_rate: float) -> Dict[str,np.ndarray]:
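(They assume the usual imports at the top of the notebook, roughly:)
from typing import Dict, List, Tuple
import numpy as np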
Here is a diagram for “caches”, which I actually suggest be added to the text:
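In text form, my reading of the nesting (inferred from the signatures above) is roughly:
# caches is a list with one entry per layer l = 1..L:
# caches[l-1] = (linear_cache, activation_cache)
#             = ((A_prev, W, b), Z)  # linear_cache is a 3-tuple of arrays, activation_cache is just Z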
LLM-assisted IDEs will make exercises obsolete
As I am editing in JetBrains PyCharm rather than in the notebook editor, so as to get type checking where possible, the local code-assistance LLM sometimes proposed the right solution even while I myself was still unsure about it, and that is without being connected to the big JetBrains LLM in the cloud.
6.3 - L-Model Backward
We read:
Recall that when you implemented the L_model_forward function, at each iteration, you stored a cache which contains (X,W,b, and z).
Should be:
contained A[L-1], W[L], b[L], and the name of the activation function
In the back propagation module, you’ll use those variables to compute the gradients.
“values” rather than “variables”
Exercise 4 - linear_activation_forward
The code says:
activation cache is a dict that contains “Z” → Z
NOPE! The “activation cache” is just Z, not a dict.
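Per the relu/sigmoid signatures above, it is simply the second element of the returned pair, e.g.:
A, activation_cache = relu(Z)  # activation_cache is just the array Z, not {"Z": Z}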
Do not stress the user in Exercise 9
Tell the user NOT to add dAL or dA2 to the grads dictionary:
# grads[f"dA{L}"] = dAL # NO! do not do that!
otherwise the unit test will fail because a key dA2 must not be in the grads dict (pretty mysteriously, too; the test does not provide enough info to find out what’s wrong)
Alternatively, improve the unit test code to complain properly (I hacked something together, but it’s not presentable to the public)
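For instance, something along these lines would already help (a hypothetical check against the two-layer test case, not the actual grader code):
L = 2  # number of layers in the test case (assumption)
expected = {f"dA{l}" for l in range(L)} | {f"dW{l}" for l in range(1, L + 1)} | {f"db{l}" for l in range(1, L + 1)}
unexpected = grads.keys() - expected
missing = expected - grads.keys()
assert not unexpected and not missing, f"grads: unexpected keys {sorted(unexpected)}, missing keys {sorted(missing)}"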
Nitpicks
In the screen following the exercise, “Confusing Output from the AutoGrader”, there is a typo:
In that particular scenerio (scenario), you can ignore the output
The reason for getting that output is that the AutoGrader has an alloted (allotted)
Note that I encountered this problem in Week 2 already. One gets a “keyboard interrupt” message, which I interpreted as the process running out of time; one then has to shrug and resubmit. So maybe this screen should be moved to before the Week 2 programming exercise.
That’s about it…
Thank you for reading.
I was also not content with the diagrams, so I made my own; it might be too telling (but not really, it just informs about how data flows between the functions).
Here is a part: