Size of W matrix

Hi there,

I’ve got a question about the W[1] matrix in the 2-layer NN: Andrew says it is a 4x3 matrix, 4 because the layer has 4 nodes that get stacked horizontally. Why does this matrix have 3 columns? Each node's w vector should then be a 3x1 vector. Why?


Hey @Kaulfield,

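The weight matrix W[1], which connects the input layer to the hidden layer, has dimensions that depend on the number of nodes in the hidden layer and the number of nodes in the input layer. In the course's notation its shape is (n[1], n[0]). Let’s break down the dimensions:

  • Number of nodes in the hidden layer, n[1]: this determines the number of rows in the weight matrix, one row per hidden node.

  • Number of nodes in the input layer (features), n[0]: this determines the number of columns in the weight matrix, one column per input feature.

When Andrew says that the weight matrix W[1] is a 4x3 matrix, it means that:

  • The hidden layer has 4 nodes, so there are 4 rows in W[1]; row i is the transposed weight vector of hidden node i.
  • The input layer has 3 nodes (features x1, x2, x3), so there are 3 columns in W[1]. That is also why each node’s w vector is 3x1: it holds one weight per input feature.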
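A minimal NumPy sketch of these shapes, assuming the 3-input, 4-hidden-node network from the lecture: in the course's convention W[1] has one row per hidden node and one column per input feature. The random weights and the names `n_x`, `n_h`, `w`, `W1`, `b1` are illustrative, not the course's own code.

```python
import numpy as np

n_x = 3  # input features: x1, x2, x3
n_h = 4  # nodes in the hidden layer

rng = np.random.default_rng(0)

# Each hidden node i has its own weight vector w_i of shape (n_x, 1).
w = [rng.standard_normal((n_x, 1)) for _ in range(n_h)]

# Stacking the transposed vectors w_i.T as rows gives W1 with shape (n_h, n_x).
W1 = np.vstack([w_i.T for w_i in w])
b1 = np.zeros((n_h, 1))

x = rng.standard_normal((n_x, 1))  # one training example as a column vector

z1 = W1 @ x + b1  # shape (n_h, 1): one value per hidden node

# Row i of the matrix product equals node i's own computation w_i.T @ x + b_i.
for i in range(n_h):
    assert np.allclose(z1[i], w[i].T @ x + b1[i])

print(W1.shape, z1.shape)  # (4, 3) (4, 1)
```

Stacking the per-node vectors as rows is what lets a single matrix product compute all four hidden nodes at once.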

I hope it makes sense now.


Thanks, Jamal, yes, I see it now… Which leads to my next question, which I guess is pretty stupid, given that I saw it more clearly with the simple logistic regression example: given x1, x2, x3, etc. as inputs, I thought each of them was the value from a single training example. Andrew talks about ALL of them belonging to a single training example, then forming capital X as a matrix. I don’t get it: can a single training example have many inputs? Wouldn’t each of them be a single training example?


Hey @Kaulfield,

It’s not a stupid question at all! The distinction between individual input features (like x1, x2, x3, etc.) and individual training examples can be a bit confusing, but it’s an important concept in machine learning, and I will try to make it clear for you.

  • Each input feature (x1, x2, x3, etc.) represents a different characteristic or property of the data. These features are used to describe the attributes of the data points you’re trying to learn from.

  • A training example is a complete data point that includes values for all of the input features. In other words, a training example is a set of feature values that correspond to a single data point or observation. This is what makes up a single row of your training data matrix.

So let’s use an example to clarify it.

Suppose you are working with a dataset of houses and you want to predict the price of a house. Your input features could include things like the number of bedrooms (x1), the square footage (x2), and the neighborhood (x3). Each house listing would be a training example, and for each house, you have values for all these features:

  • House 1: (x1, x2, x3) = (3 bedrooms, 1500 sq. ft., “Suburb A”)
  • House 2: (x1, x2, x3) = (4 bedrooms, 1800 sq. ft., “Suburb B”)
  • House 3: (x1, x2, x3) = (2 bedrooms, 1200 sq. ft., “Downtown”)

In this example, each of these houses is a training example, and each one has multiple input features. These training examples would be organized into a matrix X, where each row represents a different house.

So, when you hear Prof. Andrew Ng or others talk about input features belonging to a single training example, it means that each row of the training data matrix represents a distinct data point, and the columns represent different attributes or features of that data point.
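A rough sketch of this row-per-house layout in NumPy, keeping only the two numeric features; the neighborhood is left out because it would first need to be encoded as numbers, and `X_rows` is just an illustrative name:

```python
import numpy as np

# One row per house (training example), one column per feature.
# Columns: x1 = number of bedrooms, x2 = square footage.
X_rows = np.array([
    [3, 1500],  # House 1
    [4, 1800],  # House 2
    [2, 1200],  # House 3
])

m, n_x = X_rows.shape   # m = 3 examples, n_x = 2 features
house_2 = X_rows[1]     # all features of one training example
sq_ft = X_rows[:, 1]    # one feature across all examples

print(m, n_x)             # 3 2
print(house_2.tolist())   # [4, 1800]
print(sq_ft.tolist())     # [1500, 1800, 1200]
```

Selecting a row gives one complete training example; selecting a column gives one feature measured across every example.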

I hope it’s clear now; feel free to ask for more clarification anytime.


Hey, Jamal:

You made the distinction crystal clear! Thanks a lot.


On Fri, 20 Oct 2023 at 13:17, Ahmed Gamal via DeepLearning.AI <> wrote:

Hi again,

Sorry to bother you again, but I was re-reading your explanations: given the matrix capital X with all training examples, of size n_x x m, each column in your house example should represent a house, right?



You are correct. For the purpose of this course, the capital X matrix is n_x x m. This means each column represents a single house, and each row represents one feature (for example, square footage) across all houses.

Different courses and articles may lay this out differently, so you might see the transposed convention if you read other articles. Fortunately, you can convert a matrix from (n x m) to (m x n) whenever necessary by taking its transpose, which is X.T in NumPy and PyTorch.
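A small sketch of that conversion in NumPy, reusing the bedroom and square-footage values from the house example (neighborhood omitted since it is not numeric); the names `X_rows` and `X` are illustrative:

```python
import numpy as np

# Row-per-example layout (m x n): one row per house.
X_rows = np.array([
    [3, 1500],
    [4, 1800],
    [2, 1200],
])

# Course layout (n_x x m): transpose so each column is one house.
X = X_rows.T

n_x, m = X.shape
print(n_x, m)            # 2 3
print(X[:, 0].tolist())  # [3, 1500] (House 1, the first column)
print(X[1].tolist())     # [1500, 1800, 1200] (square footage, the second row)

# Transposing again recovers the original layout.
assert (X.T == X_rows).all()
```

2-D PyTorch tensors support the same `.T` attribute, so the conversion looks identical there.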

Yes, I didn’t want to seem like a smart-ass; I just wanted to make sure I understood all the notation of this particular course (which I am enjoying a lot).

Thanks all for your help,

On Tue, 24 Oct 2023 at 7:20, JB (Don) Lau via DeepLearning.AI <> wrote:

Hello @Kaulfield,

In the previous explanation, each house is considered a training example, and the features, such as the number of bedrooms, square footage, and neighborhood, are represented as columns in your data.

I’ll provide a clearer explanation of how this relates to a neural network:

Input Layer:

  • In a neural network, the input layer consists of nodes, with each node corresponding to a specific feature in your dataset.
  • For instance, if you are working on a house price prediction model with features like the number of bedrooms, square footage, and neighborhood, you would typically have three input nodes in the input layer. Each of these nodes holds the value of one of these features.

To illustrate, you can think of the first training example, written x^(1) in the course (not to be confused with the first feature x1), as the “first house” in your dataset. Each feature of this example is passed to a dedicated node in the input layer of the neural network.
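A minimal sketch of how one house reaches the input layer under the course's column convention, assuming a hypothetical hidden layer of 4 nodes with random weights and only the two numeric features; `X`, `x_1`, `W1`, `b1`, and `Z1` are illustrative names:

```python
import numpy as np

# Course layout: X has shape (n_x, m); each column is one house.
X = np.array([
    [3, 4, 2],           # x1: bedrooms for houses 1..3
    [1500, 1800, 1200],  # x2: square footage for houses 1..3
], dtype=float)
n_x, m = X.shape

n_h = 4  # hypothetical number of hidden nodes
rng = np.random.default_rng(1)
W1 = rng.standard_normal((n_h, n_x))  # shape (n[1], n[0])
b1 = np.zeros((n_h, 1))

# First training example x^(1): the first column, kept as an (n_x, 1) vector.
x_1 = X[:, [0]]
# Each input node holds one feature of this example: 3 bedrooms, 1500 sq. ft.
z_1 = W1 @ x_1 + b1  # shape (n_h, 1)

# Vectorized over all m houses at once: column j of Z1 belongs to house j.
Z1 = W1 @ X + b1     # shape (n_h, m)
assert np.allclose(Z1[:, [0]], z_1)

print(x_1.shape, z_1.shape, Z1.shape)  # (2, 1) (4, 1) (4, 3)
```

Keeping examples as columns is what makes `W1 @ X` process every house in one product, with `b1` broadcast across the m columns.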

I hope it’s clear now and feel free to ask for more clarifications.