What exactly is a TensorFlow Layer "unit"?

Adam_Ting · January 28, 2024, 4:46am

[Image: Week 2: TensorFlow implementation of content-based filtering]

I am working through the Machine Learning Specialization courses. I have reached the 3rd course (Unsupervised Learning, Recommenders, Reinforcement Learning) and finished Week 2, but I am still struggling to understand a couple of basics.
The course notes, lecture notes and code keep using the following:

tf.keras.layers.Dense(units=256),
tf.keras.layers.Dense(units=128),
tf.keras.layers.Dense(units=32),

And the question I have is, what is a “unit”?
I was under the impression that a single unit inside a layer is essentially a feature variable that is changed in order to reduce the cost. That makes sense if you put in a set of vectors whose length matches the number of units, but is “units” in TensorFlow just an arbitrary number of parameters? I don’t see what the corresponding purpose of a Dense layer is.

Thanks,
Adam

Adam_Ting · January 28, 2024, 4:48am

The first image included is from Advanced Learning Algorithms → Week 1 → Inference: making predictions (forward propagation). It is a reference point for another example where I don’t quite understand why the numbers are specified.

TMosh · January 28, 2024, 4:58am

Sorry, I do not understand what you mean by “feature variable”.

A unit computes a weighted sum of the input features (one weight per feature) plus a bias, and then applies an activation function.

a = g(w · x + b)

If you have more than one unit in a layer, they each have a unique set of weights and bias, so they each can learn different aspects of the input features.
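
As a minimal sketch of the formula above, here is one unit and then a layer of several units in plain Python (no TensorFlow). The sigmoid activation, the function names `unit` and `dense`, and all the numbers are assumptions chosen purely for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as one example choice of g."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(x, w, b, g=sigmoid):
    """One unit: a = g(w . x + b). Needs exactly one weight per input value."""
    assert len(w) == len(x), "one weight per input value"
    return g(sum(wi * xi for wi, xi in zip(w, x)) + b)

def dense(x, W, b, g=sigmoid):
    """A layer is a list of units; each has its own weight vector and bias."""
    return [unit(x, w_j, b_j, g) for w_j, b_j in zip(W, b)]

x = [0.5, -1.0, 2.0]       # one sample with 3 input features (made-up values)
W = [[0.1, 0.2, 0.3],      # weights of unit 1 (made-up values)
     [-0.4, 0.0, 0.5]]     # weights of unit 2 (made-up values)
b = [0.0, 0.1]             # one bias per unit

a = dense(x, W, b)
print(len(a))              # the layer emits one value per unit
```

The length of `a` equals the number of units, not the number of input features, which is why the next layer sees a vector whose size is set by `units`.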

rmwkwok · January 28, 2024, 10:28am

Hello @Adam_Ting

The equation above is what one unit of a TensorFlow Dense layer computes. It produces one output value per sample. If you like, you can call that value the sample’s embedded feature value in the corresponding layer.

Not arbitrary. Look at the following equation again:
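
In the course's usual notation, the equation for the first unit of layer 2 is presumably of the form below. This is a reconstruction under that assumption, with \vec{a}^{[1]} standing for the vector of outputs from layer 1:

```latex
a_1^{[2]} = g\left(\vec{w}_1^{[2]} \cdot \vec{a}^{[1]} + b_1^{[2]}\right)
```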

For the dot product to work, the shape of \vec{w_1}^{[2]} cannot be arbitrary, right?

To implement the NN architecture on the left, we need three Dense layers, one per vertical bar of circles. The first Dense layer represents the left-most bar of 256 circles (although only 4 are explicitly drawn). Each bar’s number of circles is the corresponding Dense layer’s number of units.
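
To make the shape constraint concrete, here is a small plain-Python sketch that walks through the three layers from the question and prints the weight shape each one needs. The input size of 14 features and the helper name `layer_shapes` are assumptions for illustration; the (inputs, units) layout mirrors how a Keras Dense layer stores its kernel.

```python
def layer_shapes(n_inputs, units_per_layer):
    """Return (weight_shape, bias_shape) for each dense layer, in order."""
    shapes = []
    n_in = n_inputs
    for units in units_per_layer:
        # Each of the `units` units needs one weight per incoming value.
        shapes.append(((n_in, units), (units,)))
        n_in = units  # this layer's outputs are the next layer's inputs
    return shapes

# 14 input features is a hypothetical value, not taken from the course.
shapes = layer_shapes(n_inputs=14, units_per_layer=[256, 128, 32])
for i, (w_shape, b_shape) in enumerate(shapes, start=1):
    print(f"layer {i}: W {w_shape}, b {b_shape}")
```

Only the first dimension of the first layer depends on the data. The unit counts are a design choice, and once they are chosen, the weight length in each later layer is fixed by the layer before it.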

Cheers,
Raymond

Adam_Ting · January 28, 2024, 11:30pm

Firstly, thanks for the reply.

I think what’s tripping me up is how you pick the number of units. Why is it 25 and not any other number? I presume that too many units “overfit” and too few “underfit” (right?), but I thought that if I were designing this for a new product I would have to pick the features myself, e.g. house price, house size, house age, etc. This seems to be missing that sort of data: it just takes 25 units of unknown inputs and tells them to calculate, and I don’t know what is being processed inside those 25 units.
I understand that a is the activation that is passed to the next layer, but it seems like it can be arbitrary if I change the length of a as well.

So instead of 25 → 15 → 1, it could be 250 → 150 → 1. I don’t see what’s preventing that, nor do I understand what each unit’s value is derived from.
In the handwriting example, why would it be 25 → 15 → 1? Where have those numbers come from?

TMosh · January 29, 2024, 12:14am

Experimentation.

You want the model to be complex enough to give good-enough results, but not so complicated that it takes too long to train or uses too many computing resources.
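
One rough way to see the cost side of that trade-off is to count trainable parameters. The sketch below compares the 25 → 15 → 1 network with the 250 → 150 → 1 alternative from the question. It assumes 400 input features (for example a 20×20 pixel image); that input size and the helper name `count_params` are assumptions, not something stated in this thread.

```python
def count_params(n_inputs, units_per_layer):
    """Total weights plus biases for a stack of dense layers."""
    total = 0
    n_in = n_inputs
    for units in units_per_layer:
        total += n_in * units + units  # one weight per input per unit, plus one bias per unit
        n_in = units
    return total

# 400 inputs is a hypothetical input size used only for this comparison.
small = count_params(400, [25, 15, 1])
large = count_params(400, [250, 150, 1])
print(small, large)  # 10431 138051
```

Both networks are valid. The larger one has roughly 13 times as many parameters to train, and whether that buys better accuracy is something you find out by experiment.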

rmwkwok · January 29, 2024, 12:57am

Hello @Adam_Ting,

I just want to highlight two keywords - overfit and underfit - from your message:

Therefore, you want just enough units to land in between the two. In other words, you need to know when the model is neither overfitting nor underfitting, which is introduced in Course 2 Week 3.

Cheers,
Raymond

Adam_Ting · January 31, 2024, 12:07am

I understand all of that, but I really just wanted more granular detail about what each unit represents. It sounds like I can have any arbitrary number of units I want, and it just processes a different number of data points to refine the model.

Thank you for all your help

Adam_Ting · January 31, 2024, 12:09am

Thank you for your assistance. It sounds like the number I pick is just an arbitrary number that I can tune to find the sweet spot that gives fast and accurate results, but it doesn’t seem to carry any specific meaning otherwise.

I appreciate all the help!

TMosh · January 31, 2024, 1:24am

You are correct.
