TensorFlow coffee roasting lab
I don’t think it is revealing too much to show this.
After training the model provided in the lab, I get a low loss, but my weights don’t match the weights given from a previous training run with a similar loss in the next step.
My loss:
Epoch 1/10
6250/6250 [==============================] - 5s 747us/step - loss: 0.1782
Epoch 2/10
6250/6250 [==============================] - 5s 754us/step - loss: 0.1165
Epoch 3/10
6250/6250 [==============================] - 5s 775us/step - loss: 0.0426
Epoch 4/10
6250/6250 [==============================] - 5s 742us/step - loss: 0.0160
Epoch 5/10
6250/6250 [==============================] - 5s 744us/step - loss: 0.0104
Epoch 6/10
6250/6250 [==============================] - 5s 756us/step - loss: 0.0073
Epoch 7/10
6250/6250 [==============================] - 5s 743us/step - loss: 0.0052
Epoch 8/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.0037
Epoch 9/10
6250/6250 [==============================] - 5s 749us/step - loss: 0.0027
Epoch 10/10
6250/6250 [==============================] - 5s 750us/step - loss: 0.0020
<keras.callbacks.History at 0x70985c5fe710>
The provided loss: “If you got a low loss after the training above (e.g. 0.002), then you will most likely get the same results.”
My weight output:
W1:
[[ -0.13 14.3 -11.1 ]
[ -8.92 11.85 -0.25]]
b1: [-11.16 1.76 -12.1 ]
W2:
[[-45.71]
[-42.95]
[-50.19]]
b2: [26.14]
The provided weight output from a previous run:
W1 = np.array([
[-8.94, 0.29, 12.89],
[-0.17, -7.34, 10.79]] )
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([
[-31.38],
[-27.86],
[-32.79]])
b2 = np.array([15.54])
So I tried a second model that didn’t tile the data, thinking that maybe the issue was correlations between data points, but I couldn’t get the loss to converge even with a variety of learning rates and epoch counts.
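For context, the lab enlarges its small training set by stacking repeated copies of it. The tile factor of 1000 below is my assumption, but it’s consistent with the logs: 6250 steps per epoch at the Keras default batch size of 32 is 200,000 examples. A minimal NumPy sketch:

```python
import numpy as np

# Toy stand-in for the lab's ~200 (normalized) coffee-roasting examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.integers(0, 2, size=(200, 1)).astype(float)

# Tile the data 1000x to get more gradient steps per epoch.
# This only duplicates examples; it adds no new information.
Xt = np.tile(X, (1000, 1))
yt = np.tile(y, (1000, 1))

print(Xt.shape)  # (200000, 2) -> 200000 / 32 per batch = 6250 steps per epoch
```

So dropping the tiling mainly means far fewer gradient updates per epoch, which by itself could explain slower convergence.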
So then I tried a different random seed, 1235. This is my output.
Epoch 1/10
6250/6250 [==============================] - 5s 732us/step - loss: 0.1848
Epoch 2/10
6250/6250 [==============================] - 5s 745us/step - loss: 0.1274
Epoch 3/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.1155
Epoch 4/10
6250/6250 [==============================] - 5s 738us/step - loss: 0.0459
Epoch 5/10
6250/6250 [==============================] - 5s 735us/step - loss: 0.0167
Epoch 6/10
6250/6250 [==============================] - 5s 743us/step - loss: 0.0109
Epoch 7/10
6250/6250 [==============================] - 5s 742us/step - loss: 0.0076
Epoch 8/10
6250/6250 [==============================] - 5s 730us/step - loss: 0.0055
Epoch 9/10
6250/6250 [==============================] - 5s 737us/step - loss: 0.0040
Epoch 10/10
6250/6250 [==============================] - 5s 746us/step - loss: 0.0029
The loss is great, and it should be: the training behavior should be similar for a different random seed.
However, the parameters are fairly different. I suppose I should check whether the predictions differ, since that’s ultimately what matters.
Here are the parameters with random seed 1235.
W1:
[[-17.4 -10.71 -0.07]
[-14.5 -0.16 -8.56]]
b1: [ -2.62 -11.75 -10.71]
W2:
[[ 33.33]
[-46.6 ]
[-42.18]]
b2: [-9.08]
I’m not sure why the fitted parameters would be unstable in this way. That hasn’t been my experience fitting data with classical statistical methods. Varying the random seed in a Monte Carlo simulation also shouldn’t change the characteristics of the fit a great deal, generally speaking, and I expected this to behave similarly. However, I’m realizing that the “product” of this program is the prediction, rather than the fit parameters. Let me give that a try.
For the seed = 1235
predictions =
[[9.64e-01]
[1.10e-04]]
decisions =
[[1.]
[0.]]
Which is reasonable
For the provided weights:
predictions =
[[9.63e-01]
[3.03e-08]]
decisions =
[[1.]
[0.]]
Which is the same
Interesting.
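The prediction comparison above can be sketched in plain NumPy, assuming the lab’s architecture of 2 inputs → 3 sigmoid hidden units → 1 sigmoid output, with inputs already normalized (the 0.5 decision threshold is the usual convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    # Forward pass: 2 inputs -> 3 sigmoid hidden units -> 1 sigmoid output.
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

def decide(p, threshold=0.5):
    # Threshold predicted probabilities into 0/1 decisions.
    return (p >= threshold).astype(float)

# The provided weights from the lab's previous run.
W1 = np.array([[-8.94, 0.29, 12.89],
               [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

# Usage: a hypothetical, already-normalized input.
p = predict(np.zeros((1, 2)), W1, b1, W2, b2)
d = decide(p)
```

Running `predict` with each weight set on the same inputs and comparing `decide` outputs is exactly the check done above.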
As far as decision boundaries go, the two parameter sets differ roughly in which hidden neuron learned which feature, and in overall scale. I imagine introducing a regularization term would address the scale; the neurons, however, are still interchangeable: permuting the hidden units along with the corresponding output weights produces an identical network.
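The interchangeability is easy to verify numerically: permuting the columns of W1 and entries of b1, together with the matching rows of W2, leaves the output unchanged. A sketch using the provided weights (and assuming the lab’s sigmoid forward pass):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    # The lab's forward pass: 2 inputs -> 3 sigmoid units -> 1 sigmoid output.
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

# Provided weights from the lab's previous run.
W1 = np.array([[-8.94, 0.29, 12.89],
               [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

# Swap hidden neurons 0 and 2: permute the columns of W1 and entries of b1,
# and permute the matching rows of W2 the same way.
perm = [2, 1, 0]
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

# Any batch of (already-normalized) inputs gives identical predictions.
x = np.random.default_rng(0).normal(size=(5, 2))
assert np.allclose(predict(x, W1, b1, W2, b2), predict(x, W1p, b1p, W2p, b2))
```

With 3! = 6 equivalent orderings of the hidden units alone, two training runs have no reason to land on the same one, which is why the parameters differ while the predictions agree.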
I wrote this up and almost posted it before I solved it, but then I didn’t need the help. Now, given the bug on the final practice lab, I’m thinking maybe the numbers aren’t supposed to be different, and someone might want this help…
Steven