Multi-label classification, "covariance", and conditional probabilities

The video ‘Classification with multi outputs’ explains that we are estimating three distinct output activations, which translate into the three probabilities for car, bus, and pedestrian.

I am getting hung up on the idea that the presence of a pedestrian may be correlated with the presence of buses and/or cars. So the activations/probabilities would not be independent but correlated. However, the way we have the three output activations makes them look independent. I get that we are not literally modeling the presence of cars, buses, and pedestrians through a multivariate model, but where is that covariance component? Is it somehow built into the fact that each output activation gets a piece of the activations from the previous layer?
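To make the structure concrete, here is a minimal sketch (with made-up weights, not the course's actual network) of a multi-label output layer: each of the three labels gets its own sigmoid unit, but all three read from the same hidden-layer activations, so any correlation between labels can only enter through those shared features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
a_hidden = rng.random(8)       # activations from the shared hidden layer
W_out = rng.random((3, 8))     # one weight row per label (hypothetical values)
b_out = np.zeros(3)

# p[0] = P(car), p[1] = P(bus), p[2] = P(pedestrian): three separate
# marginal probabilities; no joint distribution over the labels is modeled.
p = sigmoid(W_out @ a_hidden + b_out)
print(p)
```

Each sigmoid is computed independently given the shared activations, which is exactly why the outputs look independent even though the features feeding them are common.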

Going one step further, how would one get P(car=Y|pedestrian=Y), for instance? In other words, we describe how to get the marginal probabilities, but how would one get all the conditionals?
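One simple route (an aside, not something the course covers): a conditional like P(car=Y|pedestrian=Y) can be estimated directly from the label matrix by counting co-occurrences. A sketch with a small hypothetical label set:

```python
import numpy as np

# Hypothetical labels: one row per image, columns = [car, bus, pedestrian].
Y = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
])

car, ped = Y[:, 0], Y[:, 2]
# Among images that contain a pedestrian, what fraction also contain a car?
p_car_given_ped = car[ped == 1].mean()
print(p_car_given_ped)   # → 0.6
```

This estimates the conditional from the data itself rather than from the network's outputs, which only give you the marginals.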

My intuition splits this into 3 stages:

  1. Having set the target values, ANN model 1 outputs the probabilities of the prior events, i.e. the actual multi-label task.
  2. Compute the joint probabilities of the scenarios of interest from the relevant prior events. This can be done as part of the data preparation (image processing) stage, since these joint probabilities would ultimately be our new target values.
  3. In ANN model 2, the output of step 1 serves as the input features for identifying and computing the joint probabilities of the scenarios of interest, with the cost J maximized or minimized depending on how we frame the cost function.
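One way to read stage 2 above (my interpretation, not an established recipe): fold the three binary labels into a single joint target over the 2^3 = 8 label combinations, which a second softmax model could then be trained to predict.

```python
import numpy as np

# Hypothetical per-image labels, columns = [car, bus, pedestrian].
Y = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
])

# Binary encoding (car, bus, ped) -> joint class index 0..7.
joint_class = Y @ np.array([4, 2, 1])
print(joint_class)   # → [5 3 6]
```

A softmax over these 8 classes would model the full joint distribution, from which any conditional can be read off, at the cost of needing enough examples of every label combination.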

I also read somewhere on this forum about generative models(?), which can answer cause-and-effect time-course relations too, if that is of interest.

I think you’re taking this example one step too far. The simple image classification method taught here only assumes that each image contains exactly one of the labeled objects. No assumptions are made about real-life relationships or associations between the objects.

There are methods to deal with the topic you’re discussing - I believe they’re taught in the Deep Learning Specialization in Course 4 about Convolutional Neural Networks.

P(car=Y|pedestrian=Y) is the probability that there is a car given that there is a pedestrian, right? What if you pass in a photo that you know contains a pedestrian, and the model tells you the probability that there is a car?
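That suggestion can be sketched empirically: take images whose label says a pedestrian is present, run the model on them, and average its predicted car probability. `model_predict` below is a stand-in for the trained network, not the course's code.

```python
import numpy as np

def model_predict(images):
    # Placeholder for the trained network; returns one row of
    # [P(car), P(bus), P(pedestrian)] per image.
    rng = np.random.default_rng(1)
    return rng.random((len(images), 3))

images = np.zeros((6, 4, 4))                 # dummy image batch
ped_label = np.array([1, 1, 0, 1, 0, 1])     # hypothetical pedestrian labels

# Condition on "pedestrian present" by filtering, then average P(car).
preds = model_predict(images[ped_label == 1])
p_car_given_ped = preds[:, 0].mean()
print(p_car_given_ped)
```

Whether this average is a *good* estimate of P(car=Y|pedestrian=Y) is exactly the caveat raised in the next reply: the model was trained to produce marginals, not conditionals.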

We don’t discuss whether the model will give a good estimate of that, because a model gives a good estimate of something only if it is designed to be good at delivering that something. As Tom suggested, in the MLS we design the model to classify photos with labels that correctly describe them, and each label concerns one object only.

However, would the estimate I described at the beginning be something you are looking for? Or should we design a different model that takes care of more than one object?