Coffee Roasting Example. How come 3 neurons with the same activation function (sigmoid) provided outputs for 3 different regions?

alex_fkh · November 15, 2023, 5:12pm

Hi!

“C2_W1_Lab02_CoffeeRoasting_TF” Lab.

How come 3 neurons with the same activation function (sigmoid) provide outputs for 3 different regions? All 3 units apply the same function to the same set of input data. How did the units reach different “conclusions” from that data, and why do the 3 conclusions match the 3 regions (time, temperature, time*temperature)?

Thanks,
Alex

TMosh · November 15, 2023, 6:12pm

They have the same activation function, but not the same weights.

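Each unit’s weights are randomly initialized. Because the units start from different values, they receive different gradient updates during training, and since the cost function is not convex, each unit can evolve to learn a separate feature.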

This method is called “breaking symmetry”.
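To make the symmetry point concrete, here is a minimal NumPy sketch (not the lab's TensorFlow code). It trains a 2-3-1 sigmoid network twice with plain gradient descent: once with all hidden units starting from identical weights, once from random weights. The data, labels, learning rate, and the helper name `train` are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, X, y, lr=0.5, steps=100):
    """Gradient descent on a 2-3-1 sigmoid network with log loss (hypothetical helper)."""
    W1, b1, W2, b2 = W1.copy(), b1.copy(), W2.copy(), b2.copy()
    m = X.shape[0]
    for _ in range(steps):
        A1 = sigmoid(X @ W1 + b1)              # hidden activations, shape (m, 3)
        A2 = sigmoid(A1 @ W2 + b2)             # output, shape (m, 1)
        dZ2 = (A2 - y) / m                     # gradient at the output pre-activation
        dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)   # back-propagated to the hidden layer
        W2 -= lr * (A1.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1

# Made-up data: two features, label is 1 inside a circle (not the coffee data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = ((X ** 2).sum(axis=1) < 1.5).astype(float).reshape(-1, 1)

# Case 1: every hidden unit starts with the same weights.
W1_same_trained = train(np.full((2, 3), 0.1), np.zeros(3),
                        np.full((3, 1), 0.1), np.zeros(1), X, y)

# Case 2: hidden units start from different random weights.
W1_rand_trained = train(rng.normal(size=(2, 3)), np.zeros(3),
                        rng.normal(size=(3, 1)), np.zeros(1), X, y)

# Each column of W1 holds the weights of one hidden unit.
print("identical start:\n", W1_same_trained)
print("random start:\n", W1_rand_trained)
```

With the identical start, all three units see the same inputs and the same gradients, so the columns of W1 stay equal and the layer behaves like a single unit. With the random start, the columns end up different.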

alex_fkh · November 15, 2023, 7:17pm

Aha… In this case, does this mean that:

In the example, did unit 0 cover duration, unit 1 cover temperature, and unit 2 cover time*temperature by accident? If the starting random weights had been different, then, e.g., unit 0 could have covered temperature, and so on?
Due to the random nature, could it happen that 2 units learn the same feature and another feature is not covered at all? Is the situation we see in the example artificially created, and in a real situation with 3 neurons in a layer would we probably get different results?

Thanks!

TMosh · November 15, 2023, 11:35pm

There is a better way to show the NN architecture than what is in the Lab02 notebook. Lab02 doesn’t really show the weights between the input and hidden layer.

Note that W1 has six weights - that’s all of the combinations of the two inputs (temperature and duration) and the three hidden layer units.

W2 are the weights that are used to compute the A2 value “good or bad coffee”. In this example, that’s True/False for whether the coffee is good.

For simplicity I’m not showing the bias weights b1 (a 3-element vector) and b2 (a scalar).
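As a rough sketch of the shapes described above, the forward pass can be written in NumPy as follows. The weight values are random placeholders, not the trained values from the lab, and the input is a hypothetical, already normalized (temperature, duration) pair.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Placeholder parameters with the shapes described above (values are not from the lab).
W1 = rng.normal(size=(2, 3))   # 2 inputs x 3 hidden units = six weights
b1 = np.zeros(3)               # one bias per hidden unit
W2 = rng.normal(size=(3, 1))   # 3 hidden units -> 1 output
b2 = 0.0                       # scalar bias for the output unit

# Hypothetical normalized input: one example of (temperature, duration).
x = np.array([[0.5, -1.2]])

A1 = sigmoid(x @ W1 + b1)      # hidden layer activations, shape (1, 3)
A2 = sigmoid(A1 @ W2 + b2)     # probability of "good coffee", shape (1, 1)
is_good = A2 >= 0.5            # True/False decision at a 0.5 threshold

print("A1:", A1)
print("A2:", A2, "good coffee:", is_good)
```

Each column of W1 belongs to one hidden unit, so the three entries of A1 are three different sigmoid functions of the same two inputs.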

The A1 units (the hidden layer) compute non-linear combinations of the two input features. In general, these values don’t have any physical meaning.

In this simple coffee roasting example, it turns out that the three hidden layer units do have some explainable relationship.

Nowhere do we specifically compute a feature that is (temperature * duration). That’s an example of the non-linear process in the hidden layer activation function.

alex_fkh · November 16, 2023, 12:05am

I think I get it now. Thank you @TMosh for your help!
