Random Initialization in Neural Networks

Hello,

I have seen the video “Random Initialization” in week 3 of the course. I understand why we initialize the W variables randomly to break the symmetry. However, to build better intuition I have a question about the idea. Let’s say that I have 3 hidden layers with many neurons and 1 output layer with 1 neuron. If I initialize the hidden layers’ weight variables symmetrically, but initialize the output layer’s weight variables randomly (to make sure all the variables in the output layer are different from each other), I believe gradient descent will still converge. In the first iteration of GD the layer just before the output layer will get different weights, in the next iteration the second hidden layer will get different weights, and so on.

My question is: if this is the case, since we want the W variables close to 0 (especially for sigmoid and tanh functions) to converge faster, can’t we just make all the weights 0 except for the output layer, which would be assigned random values close to 0? Wouldn’t that give better performance in gradient descent?

Thank you in advance,
Halil

@colakhalil I could be wrong, but I don’t think we play around with/initialize the variables in our output layer-- only the inner ones.

Nor would I say we ‘want the gradients close to zero’-- We just want to keep them on a steady keel (i.e. not ‘vanishing’, not ‘exploding’).

Also, reading your post more closely, yes ultimately we would like to minimize loss, but that does not imply sending ‘the whole model to zero’.

Therein, we would have learned nothing.

But, thank you-- Your thought made me think I really need to finally get to Claude Shannon’s classic on information theory this week, which I have been putting off.

I mean he basically defines your ‘cross-entropy loss’.

Hi @colakhalil

The idea of initializing the weights randomly is to let the neural network choose its own gradient path towards 0 (or close to 0), or, to put it a better way, to allow the network to learn the complexity of the features from random starting points.

If you assign all the weights to 0, then the network learns the same feature in every unit during training.

Also, if all the weights are initialized to 0, the derivative with respect to the loss function is the same for every w in W[l], so all the weights have the same values in subsequent iterations. This makes the hidden units symmetric, and it continues for all n iterations, i.e. setting the weights to 0 doesn’t give a better model; I should say the model doesn’t learn anything.
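
Here is a minimal NumPy sketch (just an illustration, not course code) of one forward/backward pass with all-zero weights. The hidden-layer gradient comes out exactly zero and every entry of the output-layer gradient is the same, so the updates can never make the units different from each other:

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 5)                     # 2 features, 5 examples
Y = (np.random.rand(1, 5) > 0.5) * 1.0        # toy binary labels

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))   # hidden layer: 3 units, all-zero init
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))   # output layer

# forward pass
A1 = sigmoid(W1 @ X + b1)                     # every row of A1 is identical (all 0.5)
A2 = sigmoid(W2 @ A1 + b2)

# backward pass (binary cross-entropy with a sigmoid output)
m = X.shape[1]
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m                          # all entries identical, because the rows of A1 are identical
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)            # exactly zero, because W2 is zero
dW1 = dZ1 @ X.T / m                           # all rows identical (here: all zero)

print(dW2)                                    # one repeated value
print(dW1)                                    # zeros, so the rows of W1 stay identical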

Regards
DP


There is a lot to say here. For starters, here’s a pre-existing thread that discusses Symmetry Breaking that’s worth a look. One interesting point covered on that thread is that it’s not necessary for Logistic Regression, but it is once we go to real Neural Networks with more than just the output layer.

It also turns out that you need to break symmetry at all layers of the neural net. If you look at the formulas for how the gradients are computed, it’s the inverse of how forward propagation works. In forward prop, the neurons of any given layer each get all the outputs of the previous layer, and each neuron has its own weights that it applies to those inputs. When we are going backwards, all the gradients from the subsequent layers apply equally to all the neurons in the current layer. So if those start out the same, then they stay the same. So any layer whose weights don’t start out asymmetric will just stay symmetric, which defeats the purpose of having multiple neurons in that layer.
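
For reference, these are the gradient formulas being described, in the notation used elsewhere in this thread:

dZ^{[l]} = (W^{[l+1]})^T dZ^{[l+1]} * g^{[l]'}(Z^{[l]})
dW^{[l]} = (1/m) dZ^{[l]} (A^{[l-1]})^T

When every entry of W^{[l+1]} is the same and layer l itself starts out symmetric (so the rows of Z^{[l]} are identical), every row of dZ^{[l]} comes out identical, hence every row of dW^{[l]} is identical too, and the neurons of layer l remain copies of each other after the update.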

Of course all this is an experimental science. If you still have doubts and don’t want to actually do the calculus, it’s easy enough just to try the experiment you are suggesting. E.g. define a 3 layer network and start the middle layer symmetric, but randomly initialize the other two layers. Then run the training and watch what happens: print out the weight matrix for the initially symmetric layer after the training and see what it looks like.
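
Just as a sketch, that experiment could look something like this in TF/Keras (toy data and layer names invented purely for illustration):

import numpy as np
import tensorflow as tf

# toy data: 4 features, binary label
X = np.random.randn(200, 4).astype("float32")
Y = (X.sum(axis=1, keepdims=True) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(5, activation="tanh", name="layer1"),   # random init (Keras default)
    tf.keras.layers.Dense(5, activation="tanh", name="layer2",
                          kernel_initializer=tf.keras.initializers.Constant(0.01)),  # symmetric init
    tf.keras.layers.Dense(1, activation="sigmoid", name="layer3"),  # random init (Keras default)
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.fit(X, Y, epochs=50, verbose=0)

# each column of the kernel is one unit of layer2: inspect whether the columns ever become different
print(model.get_layer("layer2").get_weights()[0])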

The other high level point here (which was also mentioned on that other thread above) is that it’s not just setting all the weights to zero that’s the problem: setting them to any fixed value is the problem. There’s nothing magically bad about zero: it’s symmetry that’s bad.

:thinking:.

Hello,

Firstly, thank you all for your support. Everybody’s comments and the resources on the thread helped me build better intuition. I have tried to come up with a proof of my idea; however, since I am not an expert I might be wrong on some points.

Secondly, after working through it I still think that having all symmetric neurons in the hidden layers while randomly initializing the output layer will not cause a problem, since all the symmetry will be broken at some point in the hidden layers’ neurons. That means GD will converge and all neurons will behave differently. To prove the idea I used 2 layers in my work, one hidden layer and one output layer. I also used a linear activation function and ignored the b values for simplicity. The digits under the weights, for example W111, mean (first layer, first neuron, first weight). If you have further questions about the work I can explain it.

Finally, I believe that if I have n hidden layers that are initialized symmetrically and one output layer that is randomly initialized, then we need n iterations of GD to break the symmetry in all layers, which is explained in the last part of my work.
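
For anyone who wants to run the numbers, here is a rough NumPy sketch of the setup I described (linear activations, no b values, one symmetric hidden layer, a randomly initialized output layer); it is only a sketch, not my full worksheet:

import numpy as np

np.random.seed(1)
X = np.random.randn(2, 10)            # 2 input features, 10 examples
Y = np.random.randn(1, 10)            # toy targets

W1 = np.full((3, 2), 0.5)             # hidden layer: symmetric (all entries equal)
W2 = np.random.randn(1, 3) * 0.01     # output layer: random
lr, m = 0.1, X.shape[1]

for step in range(3):
    # forward pass, linear activations, no biases
    A1 = W1 @ X
    Yhat = W2 @ A1
    # backward pass for a squared-error loss
    dYhat = (Yhat - Y) / m
    dW2 = dYhat @ A1.T
    dA1 = W2.T @ dYhat                # rows differ, because the entries of W2 differ
    dW1 = dA1 @ X.T
    W1 -= lr * dW1
    W2 -= lr * dW2
    print(f"step {step + 1}, W1 rows:\n{W1}")   # watch whether the rows stay equal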

If you check my work and find the points where I might be wrong, I would appreciate it.

Regards,
Halil

Hello @colakhalil! It seems you put a lot of effort into explaining your point. I like that. So, I wrote some quick code (ChatGPT did that): three hidden layers with weights initialized to zeros, and the output layer weights initialized randomly. Here is the result of 10 iterations.

Iteration 0
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 1
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 2
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 3
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 4
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 5
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 6
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 7
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 8
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 9
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Iteration 10
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[0.01062255]
[0.02110083]
[0.03210288]]

Below is the code if you want to experiment. For 1000 iterations, I see no symmetry breaking.

import tensorflow as tf
import numpy as np

# Generate example data
X = np.linspace(-2 * np.pi, 2 * np.pi, 100)
Y = np.sin(X)

# Create a sequential model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='relu', kernel_initializer=tf.keras.initializers.Zeros(), input_shape=(1,), name='hidden1'),
    tf.keras.layers.Dense(3, activation='relu', kernel_initializer=tf.keras.initializers.Zeros(), name='hidden2'),
    tf.keras.layers.Dense(3, activation='relu', kernel_initializer=tf.keras.initializers.Zeros(), name='hidden3'),
    tf.keras.layers.Dense(1, kernel_initializer=tf.keras.initializers.RandomUniform(minval=-0.1, maxval=0.1), name='output')
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Prepare the data
X_train = X.reshape(-1, 1)
Y_train = Y.reshape(-1, 1)

# Function to print weights in a readable format
def print_weights(model, iteration):
    print(f"\nIteration {iteration + 1}")
    for layer in model.layers:
        weights = layer.get_weights()
        if weights:
            print(f"{layer.name} weights:")
            print(weights[0])
    print()

# Display initial weights
print("Initial Weights")
print_weights(model, -1)

# Train the model for 10 iterations
for iteration in range(10):
    model.fit(X_train, Y_train, epochs=10, verbose=0)
    print_weights(model, iteration)

Iteration 998
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[-0.07681055]
[ 0.02625201]
[ 0.07456041]]

Iteration 999
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[-0.07681055]
[ 0.02625201]
[ 0.07456041]]

Iteration 1000
hidden1 weights:
[[0. 0. 0.]]
hidden2 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
hidden3 weights:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
output weights:
[[-0.07681055]
[ 0.02625201]
[ 0.07456041]]


In addition to Saif’s experiment, I took the “L layer” model from DLS C1 W4 A2 and modified it to use the following initialization function:

# Version that keeps some of the layers symmetric to see what happens

def initialize_parameters_deep_symmetric(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims) # number of layers in the network

    for l in range(1, L):
        #(≈ 2 lines of code)
        # parameters['W' + str(l)] = ...
        # parameters['b' + str(l)] = ...
        # YOUR CODE STARTS HERE
        # Randomly initialize the first hidden layer and the output layer
        # Keep the internal layers symmetric but non-zero
        if l == 1 or l == L - 1:
            parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
            parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        else:
            parameters["W" + str(l)] = np.ones((layer_dims[l], layer_dims[l - 1])) * 0.01 * l
            parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # YOUR CODE ENDS HERE
        
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l - 1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
        
        print(f"W{l} {parameters['W' + str(l)]}")
        print(f"b{l} {parameters['b' + str(l)]}")

        
    return parameters

So you can see that it initializes the first hidden layer and the output layer with a normal distribution * 0.01, but keeps all the other hidden layers symmetric, with each layer getting a different constant value.

Then I ran the training with the same 4 layer network that was used as the 4 layer example in the assignment:

layers_dims = [12288, 20, 7, 5, 1] # 4-layer model

Here is the beginning of the output, showing the initial W^{[l]} and b^{[l]} values:

layers_dims = [12288, 20, 7, 5, 1]
W1 [[ 0.01788628  0.0043651   0.00096497 ...  0.00742033  0.00777721
  -0.02044101]
 [-0.02034741 -0.01277108 -0.00845047 ... -0.01592858  0.01189758
   0.0136909 ]
 [ 0.00736324  0.01040032 -0.00610759 ... -0.00719972  0.01342522
  -0.00194119]
 ...
 [ 0.00152689  0.0117185  -0.01256988 ... -0.01793973  0.00977007
   0.00740467]
 [ 0.00301225  0.01519223  0.00774002 ... -0.00081801 -0.00483844
   0.01257785]
 [ 0.01000491  0.0052482  -0.0007646  ...  0.00668237  0.00346636
  -0.00618991]]
b1 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W2 [[0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]]
b2 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W3 [[0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]]
b3 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W4 [[-0.00421271  0.0013598   0.00384183  0.00058416  0.00534057]]
b4 [[0.]]
Cost after iteration 0: 0.6931605644178619 (overflows 0)
AL: [[0.49972569 0.49973082 0.49972838 0.49972952 0.49973165 0.49972385
  0.49972066 0.499729   0.49972601 0.49972862 0.49973926 0.49973179
  0.49972348 0.49974294 0.49973366 0.49973185 0.49973769 0.4997367
  0.4997345  0.4997252  0.49973627 0.49972727 0.49974024 0.4997315
  0.49973214 0.49972417 0.49972907 0.49973705 0.49973845 0.49972394
  0.49972659 0.49972628 0.49974055 0.49973314 0.49972778 0.499728
  0.49973429 0.49973589 0.49971446 0.49972885 0.49973606 0.49971959
  0.49972928 0.49972947 0.49972905 0.49972286 0.49973215 0.49972409
  0.49972881 0.49973311 0.49973125 0.49972948 0.49973276 0.49973369
  0.49973101 0.49974147 0.49972562 0.49973787 0.49971862 0.49971958
  0.49972673 0.49972886 0.49973284 0.49972563 0.49973753 0.4997234
  0.49973501 0.4997275  0.49973159 0.49973156 0.49973424 0.49972426
  0.49972422 0.49972423 0.49973815 0.49972863 0.49972171 0.49972711
  0.4997248  0.49972025 0.49973516 0.49972981 0.49972011 0.49972812
  0.4997241  0.49973542 0.49973359 0.49973297 0.49972927 0.4997203
  0.49972984 0.49972965 0.49972603 0.49972676 0.49973359 0.49973526
  0.49973052 0.49972859 0.49973293 0.49972895 0.49973115 0.49972546
  0.49973437 0.49973233 0.49972538 0.49973529 0.49973714 0.4997266
  0.49972539 0.4997359  0.49973812 0.49973408 0.49973244 0.4997341
  0.4997339  0.49973211 0.4997254  0.49973187 0.49973314 0.4997302
  0.4997266  0.49973327 0.4997359  0.49973994 0.49972821 0.49972594
  0.499727   0.49973026 0.49973647 0.49972627 0.49972591 0.49973081
  0.49972402 0.49973695 0.49972506 0.49973604 0.49973192 0.49974137
  0.49973189 0.49971853 0.49973544 0.49973524 0.49973815 0.49972493
  0.4997361  0.49972905 0.49972986 0.49972546 0.49971998 0.49973953
  0.49973587 0.49973047 0.49973228 0.49974149 0.49971936 0.49973198
  0.49973202 0.49972508 0.49972947 0.49972589 0.49972344 0.49974107
  0.49973363 0.49973533 0.49974094 0.49972372 0.49973235 0.49972893
  0.4997267  0.49973476 0.49973594 0.49972846 0.49973055 0.49973124
  0.49973178 0.49972869 0.49973134 0.49972541 0.49972555 0.4997276
  0.49973588 0.4997351  0.49973231 0.49973875 0.49972804 0.49972951
  0.49972865 0.49972701 0.49974082 0.49973418 0.49973714 0.49973181
  0.49972104 0.49973394 0.49972981 0.49971536 0.49972834 0.4997336
  0.49973154 0.49972806 0.49973272 0.49973757 0.49973695 0.49973105
  0.49973438 0.49971555 0.49973701 0.49971343 0.4997218 ]]
predictions: [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
true labels: [[0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0
  0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1
  1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0
  1 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 0 1 0 1
  0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0]]
Accuracy: 0.6555023923444976
Cost after iteration 100: 0.6780097962751666 (overflows 0)
Cost after iteration 200: 0.667592306155943 (overflows 0)
Cost after iteration 300: 0.6604119985676893 (overflows 0)
Cost after iteration 400: 0.655447628857486 (overflows 0)

Then I ran the training for 2500 iterations and here’s the result after all that:

Cost after iteration 2400: 0.643985211371017 (overflows 0)
AL: [[0.34679948 0.34677231 0.34678866 0.34677796 0.34677105 0.34680619
  0.34682216 0.3467869  0.34679707 0.3467831  0.3467428  0.34677323
  0.34680723 0.34672004 0.34676142 0.34677249 0.34674559 0.34674449
  0.34675819 0.34680376 0.34674951 0.34679054 0.34673632 0.34677365
  0.34676832 0.34680726 0.34678147 0.34674794 0.34674044 0.34680616
  0.34679451 0.34679732 0.34673486 0.34676368 0.34678518 0.34679127
  0.34675931 0.34675019 0.346851   0.34678509 0.3467485  0.34682502
  0.34678187 0.34677924 0.34678438 0.3468131  0.34677237 0.34680579
  0.34678256 0.34676687 0.34677608 0.34678371 0.34676549 0.34676485
  0.34677546 0.34673038 0.34680266 0.34674639 0.34683035 0.3468284
  0.34679872 0.34678572 0.34676818 0.34679941 0.34674523 0.34681092
  0.3467585  0.34679262 0.34677562 0.34676869 0.34676159 0.34680607
  0.34680772 0.34680497 0.34674296 0.34678692 0.34681338 0.34679241
  0.34679788 0.34682235 0.34674912 0.34678258 0.34682477 0.34679169
  0.34680674 0.34675246 0.34676413 0.34676711 0.34678647 0.34682146
  0.34678145 0.34678409 0.34679273 0.34679665 0.34676617 0.34675493
  0.34677587 0.34678776 0.3467626  0.34678103 0.34677475 0.34680106
  0.34675781 0.34676653 0.34680328 0.34675326 0.34675271 0.34679341
  0.34680305 0.34675128 0.34674219 0.34676624 0.34676808 0.34676266
  0.34676319 0.34676748 0.34679873 0.34677371 0.34676739 0.34678118
  0.34679688 0.34676751 0.3467524  0.34673289 0.34679187 0.34679389
  0.3467917  0.34678078 0.34675355 0.3467995  0.34679655 0.34677568
  0.3468075  0.34675123 0.34680092 0.34675578 0.34677178 0.34673443
  0.34676737 0.34683173 0.34675649 0.34675942 0.34674832 0.34680116
  0.34675393 0.34678394 0.34678284 0.3467981  0.34682558 0.34674067
  0.34675506 0.34677576 0.34676732 0.34673455 0.34682446 0.34676992
  0.34676766 0.34679912 0.34677932 0.34679574 0.34680535 0.34672712
  0.34676155 0.34675685 0.34673422 0.34680642 0.34677043 0.34678381
  0.34679582 0.34675593 0.34675711 0.34678663 0.34677911 0.34677495
  0.3467711  0.34678621 0.34677064 0.34680356 0.34680069 0.34679153
  0.34675448 0.34675328 0.34676674 0.34674161 0.34678494 0.34677998
  0.34678561 0.346793   0.34673775 0.3467607  0.34674501 0.34677344
  0.34681964 0.34676044 0.34677613 0.34684776 0.34678774 0.34676602
  0.34677365 0.34678654 0.34677118 0.34674254 0.34675039 0.34676949
  0.34676052 0.34684533 0.34674659 0.34685381 0.34681587]]
predictions: [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
true labels: [[0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0
  0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1
  1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0
  1 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 0 1 0 1
  0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0]]
Accuracy: 0.6555023923444976
Cost after iteration 2499: 0.6439817828953256 (overflows 0)

Then here are the final weight and bias values after 2500 iterations:

W1 = [[ 0.01788628  0.0043651   0.00096497 ...  0.00742033  0.00777721
  -0.02044101]
 [-0.02036943 -0.01278026 -0.00847095 ... -0.01593007  0.01190871
   0.01368698]
 [ 0.00737208  0.01042064 -0.00608053 ... -0.00719253  0.01344298
  -0.00192751]
 ...
 [ 0.00152864  0.0117237  -0.01257146 ... -0.01794156  0.00977583
   0.00740564]
 [ 0.00301708  0.01519869  0.00774623 ... -0.00081634 -0.0048349
   0.0125803 ]
 [ 0.01000668  0.00525214 -0.00075915 ...  0.00668461  0.00347258
  -0.00618627]]
b1 = [[ 0.00000000e+00]
 [ 3.51662485e-06]
 [-1.21415280e-05]
 [ 2.96873603e-05]
 [-1.35570898e-05]
 [-1.87651002e-05]
 [ 1.39109249e-05]
 [ 1.52668597e-05]
 [ 2.27734644e-05]
 [ 3.40384411e-05]
 [ 3.16009596e-06]
 [ 1.79391711e-06]
 [ 1.36460768e-05]
 [ 2.20328512e-05]
 [-2.50012905e-06]
 [ 6.28288347e-05]
 [ 3.39598606e-05]
 [-3.09986423e-06]
 [ 1.25808340e-05]
 [ 6.35296395e-06]]
W2 = [[0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]
 [0.02       0.02006    0.02000512 0.02009686 0.01989857 0.01998671
  0.02002449 0.02001099 0.02001473 0.02023282 0.0200023  0.02001529
  0.02007557 0.02002042 0.01999589 0.02014233 0.0201408  0.01992271
  0.02000572 0.02001768]]
b2 = [[0.00024163]
 [0.00024163]
 [0.00024163]
 [0.00024163]
 [0.00024163]
 [0.00024163]
 [0.00024163]]
W3 = [[0.03030641 0.03030641 0.03030641 0.03030641 0.03030641 0.03030641
  0.03030641]
 [0.03009032 0.03009032 0.03009032 0.03009032 0.03009032 0.03009032
  0.03009032]
 [0.02999407 0.02999407 0.02999407 0.02999407 0.02999407 0.02999407
  0.02999407]
 [0.0301204  0.0301204  0.0301204  0.0301204  0.0301204  0.0301204
  0.0301204 ]
 [0.02993595 0.02993595 0.02993595 0.02993595 0.02993595 0.02993595
  0.02993595]]
b3 = [[ 5.40045273e-03]
 [ 1.61919072e-03]
 [-6.50129094e-05]
 [ 2.14550134e-03]
 [-1.08199205e-03]]
W4 = [[-0.01327811 -0.00652045 -0.00351054 -0.00746104 -0.00169305]]
b4 = [[-0.63425246]]

So here is what happened overall:

The prediction results are terrible: the model just predicts “False” (not a cat) for every image. That gives 65% accuracy on the training set, because it has 65% false samples. It gives 34% accuracy on the test set, because that has 66% True samples.

Notice that the W2 values did learn a little bit: they changed. But the key point is that you see different values if you look across the rows, but if you look down the columns, notice that all the values are the same in each column. Each row is the weights for one output neuron, so what that means is that every neuron has learned exactly the same thing. The b2 values are all the same also.

Interestingly the W3 and b3 values are symmetric, but in a different way: they are the same across the rows, but with differences down the columns that are not very large.
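
As a side note, a quick way to check both kinds of symmetry numerically, assuming the trained matrices are available as NumPy arrays in the parameters dictionary:

import numpy as np

def symmetry_report(W, name):
    # np.ptp (max minus min) along an axis is ~0 wherever the values are identical
    print(f"{name}: spread down each column = {np.ptp(W, axis=0).max():.2e}, "
          f"spread across each row = {np.ptp(W, axis=1).max():.2e}")

# e.g. with the matrices shown above:
# symmetry_report(parameters['W2'], 'W2')   # columns identical -> first number ~0
# symmetry_report(parameters['W3'], 'W3')   # rows identical    -> second number ~0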

So this seems like pretty convincing evidence that this is not a useful strategy.

I will run one more experiment where I use the more sophisticated He Initialization for layers 1 and 4 and see if that works any better.


Ok, with He Initialization for the outer layers, here are the results:

layers_dims = [12288, 20, 7, 5, 1]
W1 [[ 0.01613539  0.0039378   0.00087051 ...  0.00669395  0.0070159
  -0.01844003]
 [-0.0183556  -0.01152091 -0.00762325 ... -0.01436933  0.01073293
   0.01235069]
 [ 0.00664245  0.00938224 -0.00550971 ... -0.00649494  0.01211102
  -0.00175117]
 ...
 [ 0.00137742  0.01057137 -0.01133941 ... -0.0161836   0.00881368
   0.00667983]
 [ 0.00271738  0.01370506  0.00698235 ... -0.00073793 -0.0043648
   0.01134661]
 [ 0.00902553  0.00473445 -0.00068975 ...  0.00602823  0.00312704
  -0.00558398]]
b1 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W2 [[0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]
 [0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
  0.02 0.02 0.02 0.02 0.02 0.02]]
b2 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W3 [[0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]
 [0.03 0.03 0.03 0.03 0.03 0.03 0.03]]
b3 [[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
W4 [[-0.18839812  0.06081193  0.17181204  0.0261246   0.23883766]]
b4 [[0.]]
Cost after iteration 0: 0.6936887994529632 (overflows 0)
AL: [[0.50037135 0.50057804 0.50047942 0.5005278  0.50061453 0.50029719
  0.50016703 0.50050605 0.50038426 0.50048932 0.50092124 0.50061879
  0.5002813  0.50107082 0.50069419 0.50062097 0.50085839 0.50081613
  0.50072712 0.50035241 0.50080051 0.50043603 0.50095994 0.50060571
  0.50063141 0.50031096 0.50050751 0.50083203 0.50089012 0.5002995
  0.50040907 0.50039359 0.50097537 0.5006721  0.50045629 0.50046589
  0.50072033 0.50078258 0.49991671 0.50049883 0.50079162 0.50012413
  0.50051609 0.50052449 0.50050824 0.50025695 0.50063362 0.50030581
  0.50049872 0.50067235 0.50059541 0.50052442 0.50065666 0.50069441
  0.50058753 0.50101185 0.50036737 0.50086423 0.50008412 0.5001228
  0.50041346 0.50050018 0.50066092 0.50036866 0.50085133 0.50027829
  0.50074877 0.50044473 0.50060902 0.5006084  0.50071731 0.50031375
  0.5003131  0.50031208 0.50087657 0.50048947 0.50020924 0.5004287
  0.50033423 0.50015139 0.5007547  0.50053743 0.50014522 0.50047104
  0.50030596 0.50076525 0.50069132 0.50066738 0.50051573 0.50015282
  0.5005383  0.5005316  0.50038435 0.5004142  0.50069152 0.50075764
  0.50056615 0.50048814 0.50066424 0.50050193 0.5005928  0.50036158
  0.50072269 0.50064071 0.50035904 0.50075917 0.50083599 0.50040793
  0.50035835 0.50078482 0.50087481 0.50071166 0.50064641 0.50071201
  0.50070305 0.50063131 0.50035833 0.50062195 0.50067514 0.50055325
  0.50040823 0.5006778  0.50078545 0.50094993 0.50047164 0.50038036
  0.50042284 0.50055731 0.50080788 0.5003953  0.50037911 0.50057855
  0.50030285 0.5008293  0.50034501 0.50078978 0.50062383 0.50100752
  0.50062194 0.50008085 0.50076683 0.50075884 0.50087706 0.50034135
  0.50079254 0.50050642 0.50053964 0.50036053 0.50014071 0.50093309
  0.50078367 0.50056616 0.50063762 0.50101311 0.5001142  0.50062567
  0.50062765 0.50034674 0.50052284 0.50037962 0.50028002 0.50099612
  0.50069226 0.50076232 0.50098945 0.50029155 0.50064183 0.50050206
  0.50041187 0.50073839 0.50078734 0.50048272 0.50056735 0.50059587
  0.50061732 0.50049312 0.50059971 0.50035908 0.50036638 0.50044838
  0.50078586 0.50075091 0.50063907 0.50090025 0.50046627 0.50052552
  0.50049239 0.5004249  0.5009852  0.50071593 0.50083474 0.50061788
  0.50018251 0.50070503 0.50053818 0.49995365 0.50047768 0.50069265
  0.50060875 0.50046764 0.50065653 0.50085339 0.50082922 0.5005883
  0.50072186 0.49996087 0.50083006 0.49987441 0.50021348]]
predictions: [[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1]]
true labels: [[0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0
  0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1
  1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0
  1 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 0 1 0 1
  0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0]]
Accuracy: 0.354066985645933
Cost after iteration 100: 0.6773813109399727 (overflows 0)
Cost after iteration 200: 0.6667941792821449 (overflows 0)
Cost after iteration 300: 0.6595967032880212 (overflows 0)

Here are the final results after 2500 iterations:

Cost after iteration 2400: 0.6438564294285455 (overflows 0)
AL: [[0.34631037 0.34596946 0.34621602 0.34598246 0.34599433 0.34650298
  0.34656188 0.3463928  0.34643882 0.34624509 0.34601864 0.34622778
  0.34655328 0.34636089 0.34608321 0.34610976 0.34596896 0.34610091
  0.34597793 0.3466738  0.34605429 0.34619185 0.34611411 0.34623971
  0.3461087  0.34659627 0.34604512 0.34610622 0.34605398 0.34641191
  0.34637865 0.34630743 0.34601037 0.3460238  0.34618332 0.34645074
  0.34598261 0.34596872 0.3469591  0.34630537 0.34598277 0.34653509
  0.346623   0.34598593 0.34617026 0.34661554 0.34603581 0.34663731
  0.34609454 0.34602715 0.34607786 0.34620624 0.34605708 0.3460078
  0.3462476  0.3460172  0.346549   0.34611598 0.3466932  0.34676386
  0.34658722 0.34648886 0.34597119 0.34632917 0.3460113  0.34652342
  0.34609473 0.34627259 0.34623138 0.34606653 0.34604676 0.34668677
  0.3464737  0.34651219 0.34611119 0.34624088 0.34642669 0.34640206
  0.3462875  0.34661729 0.3459811  0.34597222 0.34662289 0.34650033
  0.34644321 0.34599119 0.34597603 0.34608832 0.34639794 0.34658307
  0.34611952 0.34620625 0.34616406 0.34646976 0.34630213 0.34611301
  0.34617454 0.34629509 0.34600151 0.34606943 0.34628519 0.34628617
  0.34603865 0.34601632 0.34666153 0.34599913 0.34618295 0.3464292
  0.34655186 0.34609409 0.34613779 0.34598238 0.34602364 0.34596873
  0.34601932 0.34597412 0.34610541 0.34623989 0.34607455 0.34613257
  0.34628253 0.34601403 0.34596901 0.3460336  0.34625956 0.34622921
  0.34632024 0.34630514 0.34599997 0.34657087 0.34632288 0.34615073
  0.3462685  0.34599997 0.3465624  0.34615436 0.34610588 0.34623408
  0.34598478 0.34658108 0.34601904 0.34596872 0.34608446 0.34636966
  0.34610198 0.34634276 0.34615467 0.34642639 0.34662972 0.34620863
  0.34598984 0.34612181 0.34599152 0.34608681 0.34655982 0.34613747
  0.34604723 0.34624209 0.34598021 0.34627332 0.34631308 0.34598692
  0.3461252  0.34599919 0.3462155  0.34646924 0.34605399 0.34609702
  0.34629668 0.34616915 0.3461266  0.34644273 0.3460835  0.3461096
  0.34623591 0.34632135 0.34610739 0.34636453 0.34631005 0.34626167
  0.34599054 0.34604678 0.34601482 0.34606152 0.3460422  0.34617402
  0.3462181  0.34616598 0.34621692 0.34596906 0.34604638 0.34605678
  0.34662578 0.34605625 0.34600846 0.34695189 0.34625841 0.34614294
  0.34602474 0.34642581 0.34614474 0.3461181  0.34605503 0.3460673
  0.34602474 0.3468345  0.34597031 0.34699494 0.34645873]]
predictions: [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
true labels: [[0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0
  0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 1
  0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1
  1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0
  1 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 0 1 0 1
  0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0]]
Accuracy: 0.6555023923444976
Cost after iteration 2499: 0.643848153095711 (overflows 0)

Here are the final weights:

W1 = [[ 0.01613539  0.0039378   0.00087051 ...  0.00669395  0.0070159
  -0.01844003]
 [-0.01839656 -0.01149602 -0.00762733 ... -0.01433364  0.01085021
   0.01239119]
 [ 0.00662007  0.00936159 -0.00551797 ... -0.00648331  0.01212442
  -0.00174843]
 ...
 [ 0.00134554  0.01058824 -0.01131966 ... -0.01618656  0.00885519
   0.00670748]
 [ 0.00265556  0.01364538  0.00693916 ... -0.00076382 -0.00438564
   0.011324  ]
 [ 0.00905752  0.00477973 -0.00063062 ...  0.006047    0.00317198
  -0.00556362]]
b1 = [[ 0.00000000e+00]
 [ 6.96286507e-05]
 [-9.19637012e-05]
 [ 1.88251062e-04]
 [ 1.59978343e-05]
 [-1.54797085e-05]
 [ 3.55256587e-05]
 [ 9.44771481e-05]
 [ 2.74192345e-04]
 [ 2.62071646e-04]
 [-5.94038162e-05]
 [-1.31153317e-05]
 [ 2.42564395e-04]
 [ 1.22382210e-04]
 [-4.07133885e-05]
 [ 2.03648624e-04]
 [ 2.60070872e-04]
 [-4.15052811e-05]
 [-1.26541019e-04]
 [ 5.30027085e-05]]
W2 = [[0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]
 [0.02       0.020143   0.01998048 0.02241795 0.02121535 0.02024293
  0.02004604 0.02037558 0.02072049 0.02178992 0.0199291  0.02010069
  0.02250558 0.02022371 0.01995685 0.0210207  0.02113007 0.02043567
  0.0199286  0.02021147]]
b2 = [[0.00182235]
 [0.00182235]
 [0.00182235]
 [0.00182235]
 [0.00182235]
 [0.00182235]
 [0.00182235]]
W3 = [[0.03835712 0.03835712 0.03835712 0.03835712 0.03835712 0.03835712
  0.03835712]
 [0.0293258  0.0293258  0.0293258  0.0293258  0.0293258  0.0293258
  0.0293258 ]
 [0.03019203 0.03019203 0.03019203 0.03019203 0.03019203 0.03019203
  0.03019203]
 [0.02887905 0.02887905 0.02887905 0.02887905 0.02887905 0.02887905
  0.02887905]
 [0.03068656 0.03068656 0.03068656 0.03068656 0.03068656 0.03068656
  0.03068656]]
b3 = [[ 0.12291871]
 [-0.01887535]
 [-0.01943598]
 [-0.01858523]
 [-0.01975487]]
W4 = [[-0.23364311  0.06143069  0.17313634  0.02378412  0.24025545]]
b4 = [[-0.60362419]]

Note that the weights and bias values in layers 2 and 3 have the same symmetry properties as they did in my previous experiment. The predictions were mostly “True” at the beginning, but after training are back to the same results as with the simpler initialization: all predictions are “False”.


I would not disagree with you @paulinpaloalto, just here to learn, but why would you spin up your dense output layer-- that is where our cost is resolved.

Even completely (pseudo) random, isn’t there a risk that we’re biasing things?

I don’t understand your question. I am just implementing the scenario that was posed in the initial set of postings on this thread. The question was what happens if you do symmetry breaking on the output layer, but not any of the other layers. So I did a variation of that: I did symmetry breaking on the very first hidden layer and on the output layer. And it looks like the results are lousy. You can see that symmetry is preserved in the layers that were initialized symmetrically, so the learning is very limited.

The conclusion is that symmetry breaking is required on all layers.


Also note that I did not in any way change the architecture of the network: the number of layers, number of neurons in each layer, activation functions and cost function are all exactly the same. It is precisely the 4 layer model from DLS C1 W4 A2, but with different initialization.

The point is that symmetry breaking matters.

Okay;

My point was you inspired me to muse on this.


:+1: :+1: :+1:


I modified the predict code to also print the A^{[l]} values, so that we can see the results created by the symmetry of the W2 and W3 values. Here are the results for the test set, which is a bit easier to read because m = 50 instead of 209.

A1 = [[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 2.60491590e-02 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 1.77299175e-01 0.00000000e+00 5.38764667e-01
  0.00000000e+00 1.12330402e-01 0.00000000e+00 0.00000000e+00
  2.69820776e-02 0.00000000e+00 0.00000000e+00 1.82900921e-01
  0.00000000e+00 0.00000000e+00 2.85602656e-01 0.00000000e+00
  0.00000000e+00 7.69771887e-04 1.03409943e-01 0.00000000e+00
  0.00000000e+00 1.27154517e-01 2.31246673e-01 0.00000000e+00
  0.00000000e+00 4.88493165e-02 8.46590357e-02 1.40834202e-01
  1.21171863e-01 3.53949320e-02 2.82942968e-02 1.16152311e-01
  0.00000000e+00 3.78340242e-02 0.00000000e+00 0.00000000e+00
  2.95098242e-02 0.00000000e+00 4.29533066e-02 0.00000000e+00
  5.29227112e-02 0.00000000e+00 0.00000000e+00 1.40353780e-01
  7.79066601e-02 0.00000000e+00 0.00000000e+00 1.37115547e-01
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 7.05023028e-02 0.00000000e+00
  5.24799806e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [5.40531504e-01 6.40229763e-01 6.66266339e-01 3.22324786e-01
  6.83092730e-01 1.78932390e-01 6.25853629e-01 6.15525108e-01
  5.52737845e-01 5.53075184e-01 3.41570856e-01 3.33458490e-01
  3.09102191e-01 0.00000000e+00 1.73430778e-01 3.79514157e-01
  5.48945077e-01 6.34685862e-01 2.50066514e-01 3.07085028e-01
  5.90839079e-01 2.25869816e-01 5.09445771e-01 6.27787914e-01
  6.23899471e-01 3.80452731e-01 3.16842268e-01 1.95060840e-01
  0.00000000e+00 5.19278977e-01 3.71928054e-01 4.96647400e-01
  4.41353375e-01 4.84355079e-01 3.20906171e-01 4.56315284e-01
  2.88004664e-01 1.16357662e-01 8.39282792e-01 4.37769365e-01
  4.35914187e-04 6.08620319e-01 4.43557557e-01 3.72754352e-01
  1.83715001e-01 5.02583685e-01 4.43611362e-01 2.28951440e-01
  3.65280848e-01 6.80390305e-01]
 [0.00000000e+00 4.01181703e-01 2.01125287e-01 1.63140338e-01
  0.00000000e+00 1.18487940e-02 4.08154158e-01 7.83668492e-02
  1.71556130e-01 2.21472434e-01 0.00000000e+00 2.83412218e-01
  1.91756969e-01 4.01952220e-01 0.00000000e+00 3.91473171e-01
  0.00000000e+00 8.36221574e-02 0.00000000e+00 3.61942985e-02
  6.05686188e-01 0.00000000e+00 1.19109218e-01 0.00000000e+00
  2.74503629e-01 1.92471070e-01 1.15328338e-01 0.00000000e+00
  2.64539873e-01 0.00000000e+00 2.49574288e-01 1.76754676e-02
  0.00000000e+00 4.44625845e-01 4.86965961e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 2.31817948e-01 0.00000000e+00
  0.00000000e+00 3.77782016e-01 1.98368927e-01 0.00000000e+00
  3.71811485e-01 4.16790736e-02 0.00000000e+00 0.00000000e+00
  5.15000949e-01 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 1.51080353e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 9.14139972e-03 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 1.36478202e-01 3.56805284e-02 0.00000000e+00
  0.00000000e+00 2.55754889e-01 0.00000000e+00 1.15392045e-01
  0.00000000e+00 0.00000000e+00 0.00000000e+00 1.87954040e-01
  0.00000000e+00 5.88854498e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 1.36688451e-02 1.23034039e-01
  0.00000000e+00 0.00000000e+00 0.00000000e+00 8.35484576e-02
  5.02615611e-01 0.00000000e+00 0.00000000e+00 1.81796264e-01
  0.00000000e+00 2.71695763e-01 0.00000000e+00 3.14519242e-01
  3.04066801e-02 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 2.16736503e-01 0.00000000e+00
  1.36675709e-01 0.00000000e+00 1.42977710e-01 0.00000000e+00
  8.58551085e-03 0.00000000e+00 1.16144514e-01 2.56366259e-01
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  5.96455341e-02 1.60507157e-01 9.61710076e-02 0.00000000e+00
  7.21960770e-04 2.71539689e-01 8.37729002e-02 2.42039617e-01
  1.77801856e-01 0.00000000e+00 5.87613657e-03 0.00000000e+00
  0.00000000e+00 1.15789436e-01 0.00000000e+00 2.88464919e-01
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  2.42554057e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00
  1.32576311e-01 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 1.69048713e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 1.32754886e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 9.00329174e-02 0.00000000e+00 0.00000000e+00
  0.00000000e+00 2.54805534e-01 0.00000000e+00 0.00000000e+00
  3.01145757e-02 0.00000000e+00 1.36023690e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 1.62030840e-01 0.00000000e+00
  0.00000000e+00 2.20420398e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 4.52508466e-03 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [1.27985721e-01 5.66937769e-02 1.34787679e-01 0.00000000e+00
  9.38258953e-02 2.37667163e-02 1.41921517e-01 0.00000000e+00
  0.00000000e+00 2.67393551e-01 0.00000000e+00 5.90888005e-02
  0.00000000e+00 2.81079622e-01 1.70552663e-01 0.00000000e+00
  0.00000000e+00 5.53971492e-02 3.74129396e-01 1.01901433e-01
  0.00000000e+00 0.00000000e+00 4.12501554e-01 2.80414043e-01
  0.00000000e+00 1.77325481e-01 0.00000000e+00 9.11728277e-02
  7.61074128e-02 0.00000000e+00 9.98787159e-02 0.00000000e+00
  0.00000000e+00 2.53344522e-01 4.97820103e-01 3.94258398e-01
  3.80991384e-02 0.00000000e+00 1.15667173e-01 0.00000000e+00
  2.85580729e-01 0.00000000e+00 1.67591071e-01 5.15599115e-02
  0.00000000e+00 0.00000000e+00 0.00000000e+00 3.72697941e-01
  0.00000000e+00 1.00392566e-01]
 [9.26390410e-01 1.46023483e+00 1.82993253e+00 1.51447705e+00
  1.88693339e+00 1.00550942e+00 1.77532883e+00 1.44516847e+00
  1.65131515e+00 9.55012906e-01 1.65144827e+00 1.48484980e+00
  1.24603330e+00 1.15940269e+00 1.17346580e+00 1.33713303e+00
  8.95931002e-01 1.12155128e+00 1.22855262e+00 1.53052769e+00
  1.35308547e+00 6.64267495e-01 9.12762999e-01 9.88298589e-01
  1.10681351e+00 9.87683344e-01 1.67060235e+00 1.30354363e+00
  7.32987443e-01 1.45694667e+00 1.17547429e+00 1.51602422e+00
  1.74223564e+00 7.42993649e-01 1.07699275e+00 1.18008306e+00
  1.46843193e+00 1.52232044e+00 1.09727186e+00 1.17935550e+00
  9.29615234e-01 1.22941786e+00 1.39915067e+00 1.00964358e+00
  1.33071480e+00 1.20289035e+00 1.33386661e+00 1.43136007e+00
  1.09278496e+00 1.49572774e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  7.75725997e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 2.12924098e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 1.25562483e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 8.34404622e-02
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  2.47155527e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [1.81263049e-01 3.81388208e-01 5.08909229e-01 7.67360092e-01
  4.62435956e-01 4.49382608e-01 4.08158092e-01 8.62079726e-02
  8.27165805e-01 1.46335542e-01 6.27212184e-01 5.62418853e-01
  5.02534479e-01 0.00000000e+00 4.14428060e-01 3.93426398e-01
  3.49142850e-01 4.73065533e-01 3.96676692e-01 7.63052812e-01
  9.27234214e-01 2.31256307e-01 4.02466160e-01 4.11664923e-01
  2.07317033e-01 4.09401629e-01 9.89618989e-01 0.00000000e+00
  1.29482932e-01 6.46766315e-01 3.17912432e-01 4.94786816e-01
  5.62407730e-01 2.02647378e-01 3.95405968e-01 4.07148760e-01
  7.47935889e-01 6.88360990e-01 4.94976140e-01 1.76716145e-01
  1.01211410e-01 1.93996290e-01 4.89976449e-01 4.88710159e-01
  9.80073411e-02 3.26581990e-01 3.03873104e-01 0.00000000e+00
  8.99815930e-01 4.58707562e-01]
 [1.74623617e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 4.97798004e-02 0.00000000e+00
  0.00000000e+00 0.00000000e+00 7.06391387e-02 0.00000000e+00
  8.76489873e-02 0.00000000e+00 0.00000000e+00 3.30741458e-02
  0.00000000e+00 2.88203237e-02 3.82072180e-01 0.00000000e+00
  2.72354207e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00
  1.60206588e-01 0.00000000e+00 2.36345806e-01 0.00000000e+00
  0.00000000e+00 2.15448784e-01 1.52180624e-01 0.00000000e+00
  2.45627873e-01 3.33255168e-02 0.00000000e+00 2.87902015e-01
  1.01064298e-01 3.26370729e-01 2.03399925e-01 2.56985648e-01
  0.00000000e+00 3.22283916e-01 0.00000000e+00 5.62937030e-02
  0.00000000e+00 0.00000000e+00 1.02885776e-01 0.00000000e+00
  1.70349707e-01 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 7.29062498e-02 0.00000000e+00
  1.10170193e-01 0.00000000e+00 4.95767776e-01 0.00000000e+00
  2.41147286e-01 0.00000000e+00 2.86693113e-01 1.53092955e-01
  3.22416816e-01 0.00000000e+00 5.20120422e-01 5.28286325e-01
  1.41332129e-01 0.00000000e+00 3.75434866e-02 0.00000000e+00
  1.96513878e-01 0.00000000e+00 1.05194208e-01 0.00000000e+00
  0.00000000e+00 1.49814290e-01 4.57983076e-01 1.87913167e-01
  2.91340053e-01 0.00000000e+00 2.82990501e-01 1.58934476e-01
  8.43424088e-03 5.96449336e-03 3.50729935e-01 4.17187241e-01
  0.00000000e+00 0.00000000e+00 1.48106240e-01 3.24764370e-01
  9.97079592e-02 2.21041024e-01 0.00000000e+00 0.00000000e+00
  2.27247034e-02 3.59610629e-02]
 [8.52852361e-01 1.49670027e+00 1.14462355e+00 1.17658197e+00
  1.29652313e+00 4.13819337e-01 1.27057606e+00 9.56778579e-01
  7.83421233e-01 9.17951964e-01 3.91321972e-01 7.83513412e-01
  3.34273086e-01 1.44146624e+00 5.38423279e-01 6.69658698e-01
  6.80096159e-01 9.98624650e-01 8.14007949e-01 1.14649054e+00
  6.88990948e-01 7.27367551e-01 5.60089163e-01 7.05900247e-01
  4.15775626e-01 9.87653445e-01 9.47440423e-01 5.39493630e-01
  6.04362408e-01 3.22881970e-01 2.36024447e-01 7.30818921e-01
  9.82674865e-01 1.01418088e+00 7.14505945e-01 9.41320767e-01
  8.89034006e-01 6.76378153e-01 1.06585443e+00 1.03013214e+00
  7.20461980e-01 7.08919711e-01 6.69566613e-01 2.54180761e-01
  6.13963949e-01 9.31743233e-01 7.75701907e-01 9.22499429e-01
  4.20588073e-01 8.94955954e-01]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 2.19310365e-02 0.00000000e+00 0.00000000e+00
  0.00000000e+00 1.48910617e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 5.61660006e-03 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 1.49338188e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 3.50903652e-01 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 7.81197030e-02
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 9.08075907e-02 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  3.42178913e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 1.10031922e-01 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00]]
A2 = [[0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]
 [0.06240965 0.1014594  0.10239015 0.0984931  0.09799915 0.05005594
  0.10815373 0.0709308  0.09186419 0.07118785 0.0840437  0.08178703
  0.06792546 0.07471532 0.07016336 0.07466277 0.06282851 0.07851985
  0.09187173 0.10257841 0.09704088 0.05455891 0.07054594 0.07143489
  0.06931356 0.08143744 0.10068993 0.05452957 0.04373652 0.09934236
  0.08088161 0.08408458 0.09880554 0.07097915 0.08346977 0.08664237
  0.07949895 0.07643289 0.10117567 0.08451878 0.05679708 0.08062456
  0.07822291 0.06408918 0.06653141 0.07987078 0.06623446 0.07608429
  0.0810827  0.08173716]]
A3 = [[0.13967569 0.15016054 0.15041045 0.14936409 0.14923146 0.13635872
  0.15195797 0.14196362 0.14758423 0.14203263 0.14548443 0.14487851
  0.14115669 0.14297976 0.14175756 0.14296565 0.13978815 0.14400128
  0.14758625 0.15046099 0.14897417 0.13756777 0.14186028 0.14209897
  0.14152939 0.14478465 0.14995394 0.13755989 0.13466196 0.14959212
  0.14463541 0.1454954  0.14944798 0.1419766  0.14533033 0.14618217
  0.14426417 0.14344093 0.15008436 0.14561199 0.13816872 0.14456639
  0.14392155 0.14012664 0.14078238 0.144364   0.14070265 0.14334733
  0.1446894  0.14486512]
 [0.         0.00195229 0.00214335 0.00134337 0.00124197 0.
  0.00332651 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.002182   0.00104525 0.         0.         0.
  0.         0.         0.00179433 0.         0.         0.0015177
  0.         0.         0.0014075  0.         0.         0.
  0.         0.         0.00189405 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.        ]
 [0.         0.00200688 0.00220359 0.00137997 0.00127557 0.
  0.00342168 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.00224337 0.00107305 0.         0.         0.
  0.         0.         0.00184425 0.         0.         0.00155945
  0.         0.         0.001446   0.         0.         0.
  0.         0.         0.00194691 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.        ]
 [0.         0.00192513 0.00211328 0.00132548 0.00122563 0.
  0.00327841 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.00215134 0.00103191 0.         0.         0.
  0.         0.         0.00176958 0.         0.         0.00149716
  0.         0.         0.00138864 0.         0.         0.
  0.         0.         0.00186777 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.        ]
 [0.         0.00203921 0.00223914 0.00140203 0.00129592 0.
  0.00347719 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.00227957 0.00109008 0.         0.         0.
  0.         0.         0.00187392 0.         0.         0.00158445
  0.         0.         0.00146914 0.         0.         0.
  0.         0.         0.00197826 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.        ]]
AL: [[0.34609281 0.3457655  0.34577453 0.34573672 0.34573193 0.34626822
  0.34583044 0.34597184 0.34567475 0.34596819 0.34578573 0.34581775
  0.3460145  0.34591812 0.34598274 0.34591887 0.34608686 0.34586412
  0.34567465 0.34577635 0.34572264 0.34620428 0.34597731 0.34596469
  0.3459948  0.34582272 0.34575803 0.34620469 0.34635797 0.34574496
  0.3458306  0.34578515 0.34573975 0.34597116 0.34579387 0.34574885
  0.34585023 0.34589374 0.34576275 0.34577899 0.3461725  0.34583425
  0.34586834 0.34606897 0.34603429 0.34584495 0.34603851 0.34589869
  0.34582775 0.34581846]]
predictions: [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
true labels: [[1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 0 0
  0 1 0 0 1 1 1 0 0 0 1 1 1 0]]
Accuracy: 0.3400000000000001

You can see that all the rows of A2 are equal, as we would expect.