Some suggestions to improve the exercise “Logistic Regression with a Neural Network mindset”

Overview of the Problem set

This overlaps with “Exercise 1”, but we could provide a describe() procedure to introduce the data handed to us, like this:

# Loading the data (cat/non-cat)
# load_dataset() is defined in the task-specific "lr_utils.py"
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

import numpy as np

def describe(obj, name):
    if isinstance(obj, np.ndarray):
        print(f"{name} is a {type(obj)} of shape {obj.shape}")
    else:
        print(f"{name} is a {type(obj)}, that's all I know")

describe(train_set_x_orig,"train_set_x_orig")
describe(train_set_y,"train_set_y")
describe(test_set_x_orig,"test_set_x_orig")
describe(test_set_y,"test_set_y")

Output:

train_set_x_orig is a <class 'numpy.ndarray'> of shape (209, 64, 64, 3)

etc.

Even though the ndarray is “just a rectangular, n-dimensional bunch of numbers”, one can see the implied hierarchy of objects (an idea that does not exist in the mathematical object itself): image → row → pixel (col) → colors (RGB), although it is not immediately clear whether the row or the column is higher up in the hierarchy.
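
A quick illustration of that hierarchy, peeling off one level per index (assuming the shapes reported by describe() above):

image = train_set_x_orig[0]  # shape (64, 64, 3): one image
row   = image[0]             # shape (64, 3): one row of pixels
pixel = row[0]               # shape (3,): the RGB values of one pixel
red   = pixel[0]             # a single color intensity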

In “Example of a picture”, the code should be collected into a single procedure (always write procedures, even if the language doesn’t demand it). Also, one should use string interpolation in print() (this applies to every print statement on the page, really), and the “class name retrieval” should be made more explicit (I reflected for some time on what that might be about…). Here we go:

def retrieve(index):
    image      = train_set_x_orig[index]
    ylabel_arr = train_set_y[:, index]        # an ndarray of shape (1,)
    ylabel_int = int(np.squeeze(ylabel_arr))  # squeeze to a 0-d array, then convert to a plain int
    class_name = classes[ylabel_int].decode('utf-8')  # look up the name in the list of classes: 'cat'/'non-cat'
    plt.imshow(image)
    print(f"y = {ylabel_int}, it's a '{class_name}' picture.")

retrieve(25) # is cat
# retrieve(20) # is non-cat

Now the student can easily retrieve images.

In the next exercise, we should again prefer the power of string interpolation to make code more readable:

print (f"Number of training examples: m_train = {m_train}")
print (f"Number of testing examples: m_test = {m_test}")
print (f"Height/Width of each image: num_px = {num_px}")
print (f"Each image is of size: ({num_px}, {num_px}, 3)")
print (f"train_set_x shape: {train_set_x_orig.shape}")
print (f"train_set_y shape: {train_set_y.shape}")
print (f"test_set_x shape: {test_set_x_orig.shape}")
print (f"test_set_y shape: {test_set_y.shape}")

In Exercise 2, I was puzzled by the reshape instruction. But then I found out that:

A trick when you want to flatten a matrix X of shape (a,b,c,d) to a matrix X_flatten of shape (b∗c∗d, a) is to use:

X_flatten = X.reshape(X.shape[0], -1).T

due to the way Python processes argument tuples in calls, this is the same as

X_flatten = X.reshape((X.shape[0], -1)).T

which tells reshape to build an array with size X.shape[0] along the first axis and to figure out a fitting size for the second axis so that all cells are accommodated.
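
A quick check on a small array confirms the equivalence (a minimal sketch with made-up dimensions):

import numpy as np

X = np.arange(24).reshape(2, 2, 2, 3)  # stand-in for a shape (a, b, c, d) array

A = X.reshape(X.shape[0], -1).T    # two numeric parameters
B = X.reshape((X.shape[0], -1)).T  # one tuple parameter

print(A.shape)               # (12, 2), i.e. (b*c*d, a)
print(np.array_equal(A, B))  # True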

So maybe the remarks in the code should be extended to include the above.

Again in the exercise, prefer string interpolation for readability:

print (f"train_set_x_flatten shape: {train_set_x_flatten.shape}")
print (f"train_set_y shape: {train_set_y.shape}")
print (f"test_set_x_flatten shape: {test_set_x_flatten.shape}")
print (f"test_set_y shape: {test_set_y.shape}")

In Exercise 4, “Building the parts of our algorithm”, the pseudo-code given seems confusing. It seems to be the “non-batch” version, whereas up to now we have worked on the whole batch of training examples in one weight-update step: computing the cost over all examples rather than the per-example loss, and computing the gradient of that cost with respect to the weight parameters rather than the gradient of the loss.
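
For reference, a minimal sketch of the whole-batch step we have used so far (names and shapes follow the exercise: X is (n_x, m), Y is (1, m), sigmoid() as defined there):

m = X.shape[1]
A = sigmoid(np.dot(w.T, X) + b)     # activations for all m examples at once
cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
dw = np.dot(X, (A - Y).T) / m       # gradient of the cost w.r.t. w
db = np.sum(A - Y) / m              # gradient of the cost w.r.t. b
w = w - learning_rate * dw          # a single update per pass over the batch
b = b - learning_rate * db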

Below that, printing the sigmoid can be done more nicely with

print (f"sigmoid([0, 2]) = {sigmoid(np.array([0,2])}")

In fact, we can make this more flexible:

def apply_sigmoid(z):
    # normalize the input to an ndarray before applying sigmoid()
    if isinstance(z, (list, tuple)):
        nz = np.array(z)
    elif isinstance(z, (int, float)):
        nz = np.array([z])
    elif isinstance(z, np.ndarray):
        nz = z
    else:
        print(f"Can't handle {type(z)}")
        return
    print(f"sigmoid({nz}) = {sigmoid(nz)}")

apply_sigmoid(0) # scalar
apply_sigmoid([0,1]) # list
apply_sigmoid((0,2)) # tuple
apply_sigmoid(np.array([0,2])) # numpy array of int
apply_sigmoid(np.array([0.5,0,2.0])) # numpy array of float

Then:

sigmoid([0]) = [0.5]
sigmoid([0 1]) = [0.5        0.73105858]
sigmoid([0 2]) = [0.5        0.88079708]
sigmoid([0 2]) = [0.5        0.88079708]
sigmoid([0.5 0.  2. ]) = [0.62245933 0.5        0.88079708]

There are a few other places where string interpolation can simplify code.

Before “predict”, I suggest adding this note:

Always use m[i, j] for accessing elements of a NumPy array. While m[i][j] works, it is less efficient (m[i] first produces an intermediate array) and less idiomatic in NumPy.

because I was unsure about the access notation.
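
A short illustration (with a throwaway array just for demonstration):

import numpy as np

m = np.arange(6).reshape(2, 3)
print(m[1, 2])   # idiomatic: a single indexing operation
print(m[1][2])   # works, but m[1] first produces an intermediate row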

In “predict”, there is a line

w = w.reshape(X.shape[0], 1)

which seems useless to me as w already has that shape.
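
If w is indeed already a column vector, the reshape is a no-op; it would only matter if w arrived as a rank-1 array. A quick check (with a hypothetical size of 4):

w = np.zeros((4, 1))
print(w.reshape(4, 1).shape)  # (4, 1) -- unchanged

v = np.zeros(4)               # a rank-1 array of shape (4,)
print(v.reshape(4, 1).shape)  # (4, 1) -- now an explicit column vector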

Note that for filling in predict, one can point the student to the NumPy floor() function. The logistic function is well suited to that approach. :smirk:

Finally, we are told to display the picture of a single failed entry in the test set, in a hard-to-understand manner. This being completely unfun, I suggest displaying all of the failed tests in one go. Here is a reasonable (empirical) attempt, which displays the failures in a grid four images wide:

def find_failed_tests(testset_Y_predicted, testset_Y_expected):
    failed_tests = []  # collect indexes and outcomes
    assert testset_Y_predicted.shape[1] == testset_Y_expected.shape[1], \
        "predicted and expected test result counts must be equal"
    num_tests = testset_Y_predicted.shape[1]
    for testset_i in range(num_tests):
        class_predicted_num = int(testset_Y_predicted[0, testset_i])
        class_expected_num  = int(testset_Y_expected[0, testset_i])
        if class_predicted_num != class_expected_num:
            # store the index and outcome of the failed prediction
            class_predicted_text = classes[class_predicted_num].decode("utf-8")
            class_expected_text  = classes[class_expected_num].decode("utf-8")
            failed_tests.append([testset_i, class_predicted_text, class_expected_text])
    return failed_tests

def display_in_grid(failed_tests, test_set_x):
    width  = 4
    height = (len(failed_tests) + width - 1) // width  # ceiling division
    # figsize is the size of the whole figure in inches;
    # 10 x 10 inches seems best here, entirely empirical!
    # note: subplots() takes (nrows, ncols); squeeze=False keeps axes 2-D
    fig, axes = plt.subplots(height, width, figsize=(10, 10), squeeze=False)
    for i in range(height):
        for j in range(width):
            index = j + i * width
            if index < len(failed_tests):
                testset_i = failed_tests[index][0]
                pic_data = test_set_x[:, testset_i].reshape((num_px, num_px, 3))
                class_predicted_text = failed_tests[index][1]
                class_expected_text  = failed_tests[index][2]
                axes[i, j].imshow(pic_data)
                axes[i, j].set_title(f"Test pic {testset_i}.\nPredicted '{class_predicted_text}'\nExpected '{class_expected_text}'")
            axes[i, j].axis('off')
    plt.tight_layout()
    plt.show()
    
testset_Y_predicted = logistic_regression_model["Y_prediction_test"]
testset_Y_expected  = test_set_y

failed_tests = find_failed_tests(testset_Y_predicted,testset_Y_expected)

print(f"Found {len(failed_tests)} failed tests")

display_in_grid(failed_tests, test_set_x)

Aaand… that’s about it. Thank you for reading :saluting_face:

Thanks for your suggestions. In the case of the ones that are basically stylistic (e.g. always formulating code as functions), I don’t think there’s much chance that the course developers will consider that worthy of their time. But I will file an enhancement request and point them to this thread.

For the question about how the reshape is done, have you seen this thread? I think I may already have sent that to you in response to an earlier thread you wrote. To get the full value there, make sure to read all the way through to learn about “F” versus “C” order.

It’s entirely possible I’m missing your point here, but I don’t see how floor is relevant in predict. What is floor(0.75)? Perhaps you meant round(), but then you have to run the experiment to figure out how it handles the value of exactly 0.5. Or we can just write exactly the code we want, meaning using > 0.5 as the criterion for “Yes”. If I were going to suggest an enhancement here, it would be not to use a for loop and to use “logical indexing” instead, which is a more elegant way to express the same thing. Try running the following code and watch what happens:

A = np.random.randint(0, 10, (4,3))
print(f"A before =\n{A}")
A[A > 7] = 42
print(f"A after =\n{A}")

Thank you Paulin

I think I may already have sent that to you in response to an earlier thread you wrote.

Yes, but in this case, I was confused about how Python handles parameters according to numpy.reshape — NumPy v2.2 Manual

It took some time to understand that (1 tuple parameter)

X_flatten = X.reshape((X.shape[0], -1)).T

is actually the same as (2 numeric parameters)

X_flatten = X.reshape(X.shape[0], -1).T

Python eats both.

It’s entirely possible I’m missing your point here, but I don’t see how floor is relevant in predict.

Well, you just take the Y row vector, add 0.5 to it (vectorially, as it were), then “floor” it to obtain the dichotomous 0.0/1.0 vector you need. One line. That’s how I have done it since the… uh… late 80s :grin:
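
In code, the one-liner would look something like this (a sketch; A stands for the row vector of sigmoid activations):

Y_prediction = np.floor(A + 0.5)  # rounds to nearest, with exactly 0.5 mapping to 1.0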

(Philosophically, this actually trashes the ‘probability value’ of the result, which would be contrary to the spirit of “Logistic Regression”: what we are actually doing here is estimating the ‘probability’ that the system under observation (in this case, a human looking at a picture) would classify the input as ‘cat’. But that’s another discussion entirely.)

Ok, that works, but it seems a bit contrived. Why is that preferable to simply using round() on the value “as is”? Or the “vectorial” version that I showed with the condition A > 0.5?

Well, thinking ε more about it, your comment about the 1980s probably tells the story: it’s one of those cases where you can write the code in such a way that it runs as fast as possible on the hardware you’ve got, even if it takes a reader other than the author some thought to understand why it is written that way.

Fair enough, I guess. Performance still matters when training models at scale even in 2025. Although the more modern approach is to write clear and maintainable code and assume that the JIT compiler will optimize that for you to something that runs “fast enough”.