I am working through the Machine Learning Specialization courses. I have reached the 3rd course (Unsupervised Learning, Recommenders, Reinforcement Learning) and finished Week 2, but I am still struggling to understand a couple of basics.
So the problem I have is in the course notes, lecture notes and code. It keeps using the following code:

And the question I have is, what is a “unit”?
I was under the impression that a single unit inside a layer is essentially a feature variable that is changed in order to reduce the cost. That makes sense if you put in a set of vectors whose length matches the number of units, but is “units” in TensorFlow just an arbitrary number of parameters? I don’t see the corresponding purpose of a Dense layer.

The first image included is from Advanced Learning Algorithms → Week 1 → Inference: making predictions (forward propagation), as a reference point for another example where I don’t quite understand why the numbers are specified.

Sorry, I do not understand what you mean by “feature variable”.

A unit represents the combination of the input features and a weight for each of those features plus a bias, with some activation function applied.

a = g(\vec{w} \cdot \vec{x} + b)

If you have more than one unit in a layer, they each have a unique set of weights and bias, so they each can learn different aspects of the input features.
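To make that concrete, here is a minimal NumPy sketch of one dense layer. It is not the course's or TensorFlow's actual implementation; the sizes (5 samples, 3 input features, 4 units), the random weights, and the helper names `sigmoid` and `dense` are all made up for illustration. Each column of `W` holds the weights of one unit, so the number of units is simply the number of columns.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, used here as the activation function g
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b, g=sigmoid):
    # a_in: (m, n_in) inputs, W: (n_in, units) weights, b: (units,) biases
    # Column j of W and entry j of b belong to unit j.
    return g(a_in @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # hypothetical data: 5 samples, 3 features
units = 4                        # hypothetical number of units in the layer
W = rng.normal(size=(3, units))  # one weight per (input feature, unit) pair
b = np.zeros(units)              # one bias per unit

A = dense(X, W, b)
print(A.shape)  # one output value per sample per unit

# Unit 0 on its own is just a = g(w . x + b) with its own w and b
a_unit0 = sigmoid(X @ W[:, 0] + b[0])
```

Changing `units` only changes how many columns `W` has and how many values each sample produces; the number of input features stays fixed by the data.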

The equation above is what one unit of a TensorFlow Dense layer computes. It produces one output value per sample. If you like, you can think of that value as the sample’s embedded feature value in the corresponding layer.

Not arbitrary. Look at the following equation again:

For the dot product to work, the shape of \vec{w_1}^{[2]} cannot be arbitrary, right?
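A small NumPy sketch of that shape constraint, with made-up sizes (3 input features, 4 units in layer 1) and random weights: the weight vector of a unit in layer 2 must have exactly as many entries as layer 1 has units, otherwise the dot product fails.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_features, units_1 = 3, 4                 # hypothetical sizes

x = rng.normal(size=n_features)            # one input sample
W1 = rng.normal(size=(n_features, units_1))
b1 = np.zeros(units_1)
a1 = sigmoid(x @ W1 + b1)                  # layer 1 output, length = units_1

# Weights of unit 1 in layer 2: the length must equal units_1
w2_1 = rng.normal(size=units_1)
b2_1 = 0.0
a2_1 = sigmoid(np.dot(w2_1, a1) + b2_1)    # a single scalar activation

# A weight vector of any other length cannot be dotted with a1
try:
    np.dot(rng.normal(size=units_1 + 1), a1)
    mismatch_worked = True
except ValueError:
    mismatch_worked = False
```

So the number of units in a layer is a free choice, but once it is chosen it fixes the weight shapes of the next layer.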

To implement the NN architecture on the left, we need three Dense layers, where the first Dense layer represents the left-most vertical bar of 256 circles (although only 4 are explicitly drawn). With three bars, we need three Dense layers. Each bar’s number of circles is the corresponding Dense layer’s number of units.

I think what’s tripping me up is: how do you pick the number of units? Why is it 25 and not any other number? I presume that too many units would “overfit” and too few would “underfit” (right?), but I thought that if I were designing this for a new product I would have to pick the features, e.g. house price, house size, house age, etc. This seems to be missing that sort of data: it just takes 25 units of unknown inputs and tells them to calculate, and I don’t know what is being processed inside those 25 units.
I understand that a is the activation that is passed to the next layer, but it seems like it can be arbitrary if I change the length of a as well.

So instead of 25 → 15 → 1, it could be 250 → 150 → 1. I don’t see what’s preventing that, nor do I understand what each unit’s value is derived from.
In the handwriting example, why would it be 25 → 15 → 1? Where have those numbers come from?

You want the model to be complex enough to give good-enough results, but not so complicated that it takes too long to train or uses too much computer resources.
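As a rough illustration of the cost side, the sketch below counts trainable parameters for the two layouts you mention. The input size of 400 is my assumption (for example, a 20×20 pixel image flattened into a vector), and `count_params` is a hypothetical helper, not a TensorFlow function.

```python
def count_params(n_inputs, layer_units):
    # Each dense layer has (inputs * units) weights plus one bias per unit.
    total = 0
    n_in = n_inputs
    for units in layer_units:
        total += n_in * units + units
        n_in = units  # this layer's outputs feed the next layer
    return total

n_inputs = 400  # assumed input size, not stated in this thread

small = count_params(n_inputs, [25, 15, 1])
large = count_params(n_inputs, [250, 150, 1])
print(small, large)  # 10431 138051
```

Under that assumption the larger network has roughly 13 times as many parameters to train, which is where the extra training time and memory come from.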

I just want to highlight two keywords - overfit and underfit - from your message:

Therefore, you want just enough units to land in between them. In other words, you need to know when the model is neither overfitting nor underfitting, which is introduced in Course 2 Week 3.

I understand all of that, but I really just wanted more granular detail on what each unit represents. It sounds like I can have any arbitrary number of units I want, and it just processes a different number of data points to refine the model.

Thank you for your assistance. It sounds like the number I pick is just an arbitrary number that I can tune to find the sweet spot that gives fast and accurate results, but it doesn’t seem to represent any specific meaning otherwise.