Notes and nitpicks on "Programming Assignment number 1" from Week 4

Hi there,

Here is some debriefing. I suppose a lot of these points have been raised in the past…

4.2 Linear-Activation Forward, text

The “sigmoid” text should probably talk about A rather than a.

The ReLU says:

The mathematical formula for ReLu is: A = RELU(Z) = max(0,Z).

But RELU is not a commonly known math symbol, so just write:

The mathematical formula for ReLU is: ReLU(Z) := max(0,Z).
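In NumPy terms this presumably boils down to a one-liner (a sketch of my own, assuming Z is an np.ndarray):

A = np.maximum(0, Z)  # elementwise maximum against a broadcast 0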

“Arguments” vs “parameters” in text

Minor, but sometimes “arguments” is used when “parameters” is meant. The rule:

“parameters” are also called “formal parameters”
“arguments” are also called “actual parameters”
MSDN: “…the procedure defines a parameter, and the calling code passes an
argument to that parameter. You can think of the parameter as a parking space and the argument as an automobile.”
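In Python terms (a hypothetical two-liner of my own, not from the course):

def greet(name: str) -> str:  # name is the parameter (the parking space)
    return f"Hello, {name}"

greet("Ada")  # "Ada" is the argument (the automobile)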

Typehint everything!

It is absolutely time to typehint the Python functions. This exercise is complex enough as it is, and since we are not using objects, we had better describe those old-school generic datatypes (i.e. dicts & lists) explicitly. The underlying Python version is recent enough. Typehinting informs the student and makes commentary describing the I/O superfluous: it’s self-maintaining documentation for both the human reader and the static analyzer.

Here are the declarations. The grader accepts them. :ok_hand:

def initialize_parameters(n_x: int, n_h: int, n_y: int) -> Dict[str, np.ndarray]:
def initialize_parameters_deep(layer_dims: List[int]) -> Dict[str, np.ndarray]:
def linear_forward(A: np.ndarray, W: np.ndarray, b: np.ndarray) -> Tuple[np.ndarray, Tuple[np.ndarray, np.ndarray, np.ndarray]]:
def sigmoid(Z: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
def relu(Z: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
def linear_activation_forward(A_prev: np.ndarray, W: np.ndarray, b: np.ndarray, activation: str) -> Tuple[np.ndarray, Tuple[Tuple, np.ndarray]]:
def L_model_forward(X: np.ndarray, parameters: Dict[str, np.ndarray]) -> Tuple[np.ndarray, List[Tuple[Tuple, np.ndarray]]]:
def compute_cost(AL: np.ndarray, Y: np.ndarray) -> float:
def linear_backward(dZ: np.ndarray, cache: Tuple[np.ndarray, np.ndarray, np.ndarray]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
def relu_backward(dA: np.ndarray, activation_cache: np.ndarray) -> np.ndarray:
def sigmoid_backward(dA: np.ndarray, activation_cache: np.ndarray) -> np.ndarray:
def linear_activation_backward(dA: np.ndarray, cache: Tuple[Tuple[np.ndarray, np.ndarray, np.ndarray], np.ndarray], activation: str) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
def L_model_backward(AL: np.ndarray, Y: np.ndarray, caches: List[Tuple[Tuple[np.ndarray, np.ndarray, np.ndarray], np.ndarray]]) -> Dict[str, np.ndarray]:
def update_parameters(params: Dict[str, np.ndarray], grads: Dict[str, np.ndarray], learning_rate: float) -> Dict[str, np.ndarray]:
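For completeness: the declarations above assume the following imports at the top of the notebook (the notebook already imports numpy as np, if I recall correctly; the typing import is the one to add):

from typing import Dict, List, Tuple
import numpy as np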

Here is a diagram for “caches”, which I actually suggest be added to the text:

LLM-assisted IDEs will make exercises obsolete

Since I edit in JetBrains PyCharm rather than the in-browser editor, so as to apply typechecking where possible, the local code-assistance LLM sometimes proposed the right solution even when I myself was still unsure about it, and that is without even being connected to the big JetBrains LLM in the cloud. :face_with_monocle:

6.3 - L-Model Backward

We read:

Recall that when you implemented the L_model_forward function, at each iteration, you stored a cache which contains (X,W,b, and z).

Should be:

contains A[l-1], W[l], b[l] and Z[l] (i.e. the linear cache plus the activation cache)

In the back propagation module, you’ll use those variables to compute the gradients.

“values” rather than “variables”

Exercise 4 - linear_activation_forward

The code says:

activation cache is a dict that contains “Z” → Z

NOPE! “activation cache” is just Z, not a dict.
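As far as I can tell from the signatures above (a sketch of my own, not the official text), the structure is:

# inside linear_activation_forward:
linear_cache = (A_prev, W, b)             # what linear_forward stored
activation_cache = Z                      # just the ndarray Z, no dict around it
cache = (linear_cache, activation_cache)  # this pair is what goes into "caches"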

Do not stress the user in Exercise 9

Tell the user NOT to add dAL (i.e. dA2) to the grads dictionary:

# grads[f"dA{L}"] = dAL # NO! do not do that!

otherwise the unit test will fail because the key dA2 must not be in the grads dict (pretty mysteriously too, since the test does not provide enough info to find out what’s wrong)

Alternatively, improve the unit test code to complain properly (I hacked something together, but it’s not presentable to the public)
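Something along these lines, say (a hypothetical sketch, not my actual hack):

def check_grads_keys(grads: dict, L: int) -> None:
    # complain about stray keys by name instead of failing opaquely
    expected = ({f"dA{l}" for l in range(L)}
                | {f"dW{l}" for l in range(1, L + 1)}
                | {f"db{l}" for l in range(1, L + 1)})
    unexpected = set(grads) - expected
    assert not unexpected, f"unexpected keys in grads: {sorted(unexpected)}"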

Nitpicks

In the screen following the exercise, “Confusing Output from the AutoGrader”, there is a typo:

In that particular scenerio (scenario), you can ignore the output

The reason for getting that output is that the AutoGrader has an alloted (allotted)

Note that I already encountered this problem in Week 2. One gets a “keyboard interrupt” message, which I interpreted as meaning the process ran out of time. One then has to shrug and resubmit. So maybe this screen should be moved to before the Week 2 programming exercise.

That’s about it…

Thank you for reading.

I also was not content with the diagrams, so I made my own :joy: but it might be too telling (then again, not really: it just shows how data flows between the functions)

Here is a part:

1 Like

Thanks for all the detailed feedback, but please realize that most of this is simply not going to happen. In particular, there’s no way they are going to spend the effort to go full “object” and “type” in python. This whole course is designed to be accessible to people who “just barely” know python, not those who are scholars in the subject.

But there are some of your other suggestions that might be low enough effort to get done, e.g. wording changes.

You’re right that the structure and contents of the caches are a frequent cause of confusion. Here’s one possible correction to your diagram on that:

The contents of the “activation cache” are Z or Z^{[l]}, not W^{[Z]}. But perhaps I am just misinterpreting your intent there…

2 Likes

Thanks Paul.

You are right of course. It should be Z!

Here we go:

Also at:

2 Likes

Well, I contend that if one understands what a “function” is and can program one, then one can also understand type annotations indicating what should “go in” and what will “come out” of the function. They are an indispensable help in understanding what one is doing. It’s just a way of writing down explicitly what one has in mind, after all, and much clearer than extensive comments that basically try to make up for missing type annotations with ambiguous prose.

Or in other words: you need to decide what types you are actually dealing with before designing a computable mapping between them (a.k.a. “coding”).

Or in other words: it is better to teach a person how to use a propeller than to make them suffer with a paddle.

But that’s just my 2 cent :sunglasses:

P.S.

ChatGPT has proven a great ally in researching and understanding the finer points of the Python type system and library functions. It’s like a live book that both knows what you are actually asking and provides examples to help you along. I don’t know why it works so well, but it does.

1 Like

Of course you are right at the philosophical level, but two other points I would add:

  1. Don’t forget about polymorphism: I first encountered this in MATLAB, but python also supports it. Notice that the sigmoid function we wrote in Week 2 will do something reasonable when fed either a scalar python int (any resolution), a scalar float (any resolution) or a numpy array (any dtype and any shape). Of course this virtue does not derive from any cleverness on our part: it’s that np.exp is already polymorphic, combined with the type-coercion behavior of python operators.

  2. From a more practical standpoint, I have been mentoring these courses since 2017 and I don’t remember very many (any?) cases in which the student’s mistake had to do with not understanding the type of a function argument. It’s usually something way more fundamental than that: e.g. the semantics of optional keyword arguments, the fact that indentation is part of the syntax or just fundamentally misunderstanding how the algorithm in question is intended to work.

print(f"np.exp(1) = {np.exp(1)}")
print(f"np.exp(2.) = {np.exp(2.)}")
print(f"np.exp(True) = {np.exp(True)}")

np.exp(1) = 2.718281828459045
np.exp(2.) = 7.38905609893065
np.exp(True) = 2.71875

Although now that I actually look at the output for True, something funny is going on there. The resolution is terrible: only accurate to 3 decimal places. Hmmmm, they must be doing something weirder than simply coercing True to 1.

2 Likes

I agree that errors are unlikely to be uncovered by type hints at this point (and one always has the tactical use of assert), but just the additional information provided by them would be worth it (self-documenting code, I reckon). No need to even act on it. That’s the good aspect of Python’s gradual typing: you are not forced to use it, although this also means I do not want to see Python code in medical applications. :joy:
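By “tactical assert” I mean something like this (my own hypothetical example, not assignment code):

import numpy as np

def linear_forward_checked(A: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # fail early with a readable message instead of deep inside the dot product
    assert W.shape[1] == A.shape[0], f"shape mismatch: W is {W.shape}, A is {A.shape}"
    assert b.shape == (W.shape[0], 1), f"b should have shape {(W.shape[0], 1)}, got {b.shape}"
    return W @ A + b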

Polymorphic calls (“anytyped” calls) may seem like a problem, but Python type hints actually support union types! ChatGPT suggested this example:

def add_numbers(x: int | float, y: int | float) -> float:
    return x + y
print(add_numbers(5, 3.2))  # ✅ Works with int and float
print(add_numbers("5", 3.2))  # ❌ Type error (if using a type checker like mypy)

The one problem is that the type hint itself may become too unwieldy. Then one needs to fall back to Any and anything goes. :joy:

Sadly one cannot specify, at least not with a plain union, “if I give it a float, then it returns a float, but if I give it an int, then it returns an int”. Or even “if I give it an array of N ints, it will return an array of 2*N ints”. But it’s already sufficiently interesting.

There are also Generics. Oh my.
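A constrained TypeVar, for instance, does express the float-in/float-out wish from above (a sketch of my own, based on mypy’s documented behavior rather than on the course material):

from typing import TypeVar

T = TypeVar("T", int, float)  # constrained: T is int or float, consistently per call

def double(x: T) -> T:
    return x * 2

i: int = double(3)      # mypy accepts: int in, int out
f: float = double(3.0)  # mypy accepts: float in, float out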

And here is ChatGPT trying to provide a weakness/strength comparison:

1 Like

Not wanting to be that guy, but my astrodroid friend ChatGPT has this theory:

The reason for the slight loss of precision in your output (2.71875 instead of 2.718281828459045) is due to Numpy’s default floating-point precision when handling boolean values.

Explanation

  1. Conversion of True to 1

    • In Python, True is equivalent to 1 when used in a numerical context.
    • Therefore, np.exp(True) is the same as np.exp(1).
  2. Expected Result in Full Precision

    • The mathematical value of ( e^1 ) is approximately 2.718281828459045.
    • If you compute np.exp(1.0), you typically get this high precision.
  3. Why 2.71875?

    • When passing True (a boolean) to np.exp(), NumPy internally promotes it to an integer (int8, int32, or int64, depending on platform).
    • In some NumPy configurations, operations on integer types can be computed in lower precision floating-point (like float16 or float32) rather than full float64 precision.
    • 2.71875 is the nearest representable value in lower precision (e.g., float16 or float32).

Verifying the Behavior

You can check the actual type and precision by explicitly casting to different NumPy data types:

import numpy as np

print(np.exp(True))  # Likely lower precision
print(np.exp(np.float16(True)))  # 2.71875 (float32)
print(np.exp(np.float32(True)))  # 2.71875 (float32)
print(np.exp(np.float64(True)))  # 2.718281828459045 (float64)

Solution: Explicitly Use float64

If you need full precision, convert explicitly:

print(np.exp(np.float64(True)))  # Outputs: 2.718281828459045

So let’s see for ourselves:

import numpy as np

print(np.exp(True))
print(np.exp(np.float16(True))) 
print(np.exp(np.float32(True)))
print(np.exp(np.float64(True)))

and indeed (well, mostly: float32 turns out to be more precise than ChatGPT claimed):

2.719
2.719
2.718282
2.718281828459045
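One way to confirm the promotion is to inspect the result’s dtype (on my setup this prints float16, which explains the 3-digit precision):

print(np.exp(True).dtype)  # float16: NumPy picks the half-precision loop for a bool input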
2 Likes

Exactly. Yes, it is beautifully expressive, but it will only add to the confusion in a context in which a lot of the students are relatively new to python. :smile:

2 Likes

Here.

Source: typing — Support for type hints — Python 3.13.2 documentation

1 Like