TensorFlow coffee roasting lab
I don’t think it is revealing too much to show this.
After training the model provided in the lab, I get a low loss, but my weights don’t match the weights given from a previous training run with a similar loss in the next step.
My loss:
Epoch 1/10
6250/6250 [==============================] - 5s 747us/step - loss: 0.1782
Epoch 2/10
6250/6250 [==============================] - 5s 754us/step - loss: 0.1165
Epoch 3/10
6250/6250 [==============================] - 5s 775us/step - loss: 0.0426
Epoch 4/10
6250/6250 [==============================] - 5s 742us/step - loss: 0.0160
Epoch 5/10
6250/6250 [==============================] - 5s 744us/step - loss: 0.0104
Epoch 6/10
6250/6250 [==============================] - 5s 756us/step - loss: 0.0073
Epoch 7/10
6250/6250 [==============================] - 5s 743us/step - loss: 0.0052
Epoch 8/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.0037
Epoch 9/10
6250/6250 [==============================] - 5s 749us/step - loss: 0.0027
Epoch 10/10
6250/6250 [==============================] - 5s 750us/step - loss: 0.0020
<keras.callbacks.History at 0x70985c5fe710>
The provided loss: “If you got a low loss after the training above (e.g. 0.002), then you will most likely get the same results.”
My weight output:
W1:
[[ -0.13 14.3 -11.1 ]
[ -8.92 11.85 -0.25]]
b1: [-11.16 1.76 -12.1 ]
W2:
[[-45.71]
[-42.95]
[-50.19]]
b2: [26.14]
The provided weight output from a previous run:
W1 = np.array([
[-8.94, 0.29, 12.89],
[-0.17, -7.34, 10.79]] )
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([
[-31.38],
[-27.86],
[-32.79]])
b2 = np.array([15.54])
So I tried a second model that didn’t tile the data, thinking that maybe the issue was correlations between data points, but I couldn’t get the loss to converge even with a variety of learning rates and epoch counts.
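For context, the lab enlarges its small training set by stacking repeated copies of it. The tile factor of 1000 below is my assumption, but it’s consistent with the logs: 6250 steps per epoch at the Keras default batch size of 32 is 200,000 examples. A minimal NumPy sketch:

```python
import numpy as np

# Toy stand-in for the lab's ~200 (normalized) coffee-roasting examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.integers(0, 2, size=(200, 1)).astype(float)

# Tile the data 1000x to get more gradient steps per epoch.
# This only duplicates examples; it adds no new information.
Xt = np.tile(X, (1000, 1))
yt = np.tile(y, (1000, 1))

print(Xt.shape)  # (200000, 2) -> 200000 / 32 per batch = 6250 steps per epoch
```

So dropping the tiling mainly means far fewer gradient updates per epoch, which by itself could explain slower convergence.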
So then I tried a different random seed, 1235. This is my output.
Epoch 1/10
6250/6250 [==============================] - 5s 732us/step - loss: 0.1848
Epoch 2/10
6250/6250 [==============================] - 5s 745us/step - loss: 0.1274
Epoch 3/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.1155
Epoch 4/10
6250/6250 [==============================] - 5s 738us/step - loss: 0.0459
Epoch 5/10
6250/6250 [==============================] - 5s 735us/step - loss: 0.0167
Epoch 6/10
6250/6250 [==============================] - 5s 743us/step - loss: 0.0109
Epoch 7/10
6250/6250 [==============================] - 5s 742us/step - loss: 0.0076
Epoch 8/10
6250/6250 [==============================] - 5s 730us/step - loss: 0.0055
Epoch 9/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.0040
Epoch 10/10
6250/6250 [==============================] - 5s 746us/step - loss: 0.0029
The loss is great, and it should be: the training behavior should be similar for a different random seed.
However, the parameters are fairly different. I suppose I should check whether the predictions differ, since that’s ultimately what matters.
Here are the parameters with random seed 1235.
W1:
[[-17.4 -10.71 -0.07]
[-14.5 -0.16 -8.56]]
b1: [ -2.62 -11.75 -10.71]
W2:
[[ 33.33]
[-46.6 ]
[-42.18]]
b2: [-9.08]
I’m not sure why the fitted parameters would be unstable in this way. That hasn’t been my experience fitting data with classical statistical methods. Varying the random seed in a Monte Carlo simulation also shouldn’t change the characteristics of the fit a great deal, generally speaking, and I expected this to behave similarly. However, I’m realizing that the “product” of this program is the prediction, rather than the fit parameters. Let me give that a try.
For the seed = 1235
predictions =
[[9.64e-01]
[1.10e-04]]
decisions =
[[1.]
[0.]]
Which is reasonable
For the provided weights:
predictions =
[[9.63e-01]
[3.03e-08]]
decisions =
[[1.]
[0.]]
Which is the same
Interesting.
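The prediction comparison above can be sketched in plain NumPy, assuming the lab’s architecture of 2 inputs → 3 sigmoid hidden units → 1 sigmoid output, with inputs already normalized (the 0.5 decision threshold is the usual convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    # Forward pass: 2 inputs -> 3 sigmoid hidden units -> 1 sigmoid output.
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

def decide(p, threshold=0.5):
    # Threshold predicted probabilities into 0/1 decisions.
    return (p >= threshold).astype(float)

# The provided weights from the lab's previous run.
W1 = np.array([[-8.94, 0.29, 12.89],
               [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

# Usage: a hypothetical, already-normalized input.
p = predict(np.zeros((1, 2)), W1, b1, W2, b2)
d = decide(p)
```

Running `predict` with each weight set on the same inputs and comparing `decide` outputs is exactly the check done above.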
As far as decision boundaries go, the two parameter sets differ roughly in which hidden neuron learned which feature, and in overall scale. I imagine introducing a regularization term would address the scale; the neurons, however, are still interchangeable: permuting the hidden units along with the corresponding output weights produces an identical network.
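The interchangeability is easy to verify numerically: permuting the columns of W1 and entries of b1, together with the matching rows of W2, leaves the output unchanged. A sketch using the provided weights (and assuming the lab’s sigmoid forward pass):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    # The lab's forward pass: 2 inputs -> 3 sigmoid units -> 1 sigmoid output.
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

# Provided weights from the lab's previous run.
W1 = np.array([[-8.94, 0.29, 12.89],
               [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

# Swap hidden neurons 0 and 2: permute the columns of W1 and entries of b1,
# and permute the matching rows of W2 the same way.
perm = [2, 1, 0]
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

# Any batch of (already-normalized) inputs gives identical predictions.
x = np.random.default_rng(0).normal(size=(5, 2))
assert np.allclose(predict(x, W1, b1, W2, b2), predict(x, W1p, b1p, W2p, b2))
```

With 3! = 6 equivalent orderings of the hidden units alone, two training runs have no reason to land on the same one, which is why the parameters differ while the predictions agree.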
I wrote this up and almost posted it before I solved it, but then I didn’t need the help. Now, given the bug on the final practice lab, I’m thinking maybe the numbers aren’t supposed to be different, and someone might want this help…
Steven