Hi, I am working on the “Multi-Class Classification lab” in Week 2 of Advanced Learning Algorithms. When analysing the working of Layer 1 with the ReLU activation function, I can't understand what the decision boundary of this function implies. In the case of the sigmoid function, the decision boundary was the line that distinguishes between the probability being greater than or less than 0.5. Can someone explain what the decision boundary implies in this case (ReLU activation)?

ReLU doesn’t give you a decision boundary. It isn’t used for making predictions. It is only used in a hidden layer to provide a non-linear activation function.

Hey, Thanks for replying.

In the lab that I mentioned above (the “Multi-Class Classification lab” in Week 2 of Advanced Learning Algorithms), layer 1 uses ReLU, and the solution shows that the two units of layer 1 each split the 4 coloured clusters in a distinct way. How is this different from classifying the data?

I will check the details and reply again later.

The hidden layer L1 uses ReLU units. They aren’t really doing ‘classification’ in the sense of the output of the entire model.

Each ReLU unit is just drawing a line that splits the input data into two regions. Nothing in the model tells the L1 units exactly how to do this. Each unit just learns the weight and bias values that help to minimize the cost at the model’s output.
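As a sketch of that idea, here is a single ReLU unit with made-up weight and bias values (not the ones learned in the lab). The line it "draws" is where w·x + b crosses zero: on one side the unit outputs a positive value, on the other side it outputs exactly zero.

```python
import numpy as np

# Hypothetical weights and bias for one ReLU unit (invented for illustration).
w = np.array([1.0, -2.0])
b = 0.5

def relu_unit(x):
    # Linear part z = w·x + b, then ReLU clips negative values to zero.
    z = np.dot(w, x) + b
    return max(z, 0.0)

# Points on either side of the line w·x + b = 0 land in different regions:
print(relu_unit(np.array([2.0, 0.0])))   # z = 2.5 > 0, so activation is 2.5
print(relu_unit(np.array([-2.0, 0.0])))  # z = -1.5 < 0, so activation is 0.0
```

The shaded regions in the lab's Layer 1 plots correspond to this split: where the unit is "off" (zero) versus "on" (positive).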

It’s not shown clearly in the lab, but the true “decision boundaries” are at the output layer, where each unit learns to identify one class and reject the other three. That happens automatically because there are four output units with linear activation (so their raw outputs are the logits), and the model is compiled to use “Sparse Categorical Crossentropy” computed from those logits as the cost (loss) function.
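For illustration, here is a small numpy sketch of what that loss computes for one example, using made-up logit values. This mirrors what Keras does internally when the loss is constructed with `from_logits=True`; it is not the lab's actual code.

```python
import numpy as np

# Numpy sketch of sparse categorical cross-entropy on raw logits
# (illustrative only; the lab uses Keras' built-in implementation).
# The output layer is linear, so the model emits raw scores ("logits"),
# and the loss applies softmax to them internally.

def softmax(z):
    e = np.exp(z - np.max(z))       # subtract the max for numerical stability
    return e / e.sum()

def sparse_categorical_crossentropy(logits, true_class):
    # -log of the probability that softmax assigns to the correct class
    return -np.log(softmax(logits)[true_class])

logits = np.array([2.0, -1.0, 0.5, 0.1])   # made-up raw outputs for 4 classes
loss = sparse_categorical_crossentropy(logits, true_class=0)  # ≈ 0.35
```

Because the correct class (index 0) already has the largest logit here, the loss is small; errors at this output are what get propagated back to the hidden ReLU units during training.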

During training, any errors in the output predictions are fed back into the hidden layer, where the two ReLU units each learn to identify a different pair of clusters - not because they’re specifically told to, but because that’s the solution that minimizes the cost.

OK, so the 2 units in the ReLU layer are just segregating the area of the plot into two regions, depending on where the value of the activation function crosses zero. The shaded area in the plots for Layer 1 shows just that, i.e. the “line” where this transition occurs, though it's still not a classification of any kind. To create these regions, the units in layer 1 adjust their weights so as to ultimately minimise the output loss. The real classification happens when the linear output of the second layer is passed through softmax, which gives the probability distribution for each unit of the output layer, as shown by the shaded probability-distribution areas in the 4 output plots.
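Putting the pieces together, here is a toy end-to-end forward pass with the shapes discussed in this thread (2 inputs → 2 ReLU hidden units → 4 linear outputs → softmax). All the weights and the input point are invented for illustration, not taken from the lab.

```python
import numpy as np

# Toy forward pass: 2 inputs -> 2 ReLU units -> 4 linear units -> softmax.
# Every numeric value below is made up for illustration.
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])        # hidden layer: 2 units x 2 inputs
b1 = np.array([0.0, -0.5])
W2 = np.array([[ 1.0,  0.0],
               [-1.0,  1.0],
               [ 0.5, -0.5],
               [ 0.0,  2.0]])       # output layer: 4 units x 2 hidden activations
b2 = np.zeros(4)

x = np.array([1.0, 1.0])
a1 = np.maximum(W1 @ x + b1, 0.0)   # ReLU: each unit zeroes out one side of its line
logits = W2 @ a1 + b2               # linear outputs: raw scores, no activation yet
exp = np.exp(logits - logits.max()) # softmax, stabilised by subtracting the max
probs = exp / exp.sum()             # probability distribution over the 4 classes
pred = int(np.argmax(probs))        # predicted class = highest probability
```

The ReLU layer only carves the plane into regions; the softmax at the end is what turns the four linear scores into the class probabilities shown in the output plots.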

Thanks a lot!!