How to qualify unknown in CNN response variable?

Manu · June 21, 2022, 10:09am

Let’s imagine that we are observing foxes and recording presences (1) and absences (0) in square cells (1 km × 1 km) of a raster, i.e. a matrix of cells covering the whole territory.

The neural network is a CNN:

The response variable, Y, is thus a matrix containing absences (0) and presences (1).
The predictor variables, X, are for example altitude, vegetation…

Because we could only conduct observations in a few spots, we can only assign our observed values (0 or 1) to a few cells.

How should the unknown (N/A) values in Y be encoded? All the other cells, which we didn’t have a chance to visit, may in fact contain either absences or presences of foxes.

→ Would assigning a neutral value of 0.5 to the unknown (N/A) cells in Y be appropriate? Y would then contain 0 for absence, 1 for presence and 0.5 for unknown. The risk is that the network would also learn from the unknown cells and thus introduce bias.
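One way to see that risk concretely, assuming a binary cross-entropy loss (the question does not name a loss): for a target of 0.5, the loss is minimised by predicting exactly 0.5, so every unvisited cell would actively pull the network toward “no idea”. Since unvisited cells far outnumber visited ones, that pull could dominate training. A small NumPy check:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of a predicted probability p against a target."""
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

# Scan candidate predictions for a cell whose target was set to the "neutral" 0.5.
p_grid = np.linspace(0.01, 0.99, 99)
loss_for_unknown = bce(p_grid, 0.5)
best_p = p_grid[np.argmin(loss_for_unknown)]
print(round(best_p, 2))  # 0.5
```

In other words, 0.5 is not “neutral” for the optimiser: it is a label like any other.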

→ Another solution would be for Y to be a superposition of 2 matrices: one, foxesAbsence, with value 1 where foxes were observed to be absent, and another, foxesPresence, with value 1 where foxes were observed to be present. In this case, the unknown cells would be left implicit (0 in both matrices).
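A minimal NumPy sketch of that two-matrix encoding, assuming the survey is first stored with NaN for unvisited cells (the 3x3 grid and its values are made up for illustration). The point to notice is that the unknown cells remain recoverable: they are exactly the cells where both channels are 0.

```python
import numpy as np

# Hypothetical 3x3 fox survey: 1 = presence, 0 = absence, NaN = cell not visited.
y = np.array([[1.0, 0.0, np.nan],
              [np.nan, 1.0, np.nan],
              [0.0, np.nan, np.nan]])

known = ~np.isnan(y)  # True where an observer visited the cell

# Two binary channels (foxesAbsence, foxesPresence); unvisited cells are 0 in both.
absence = np.where(known & (y == 0), 1.0, 0.0)
presence = np.where(known & (y == 1), 1.0, 0.0)
y_two_channel = np.stack([absence, presence], axis=-1)  # shape (3, 3, 2)

# The unknown cells can always be recovered from the two channels.
recovered_unknown = y_two_channel.sum(axis=-1) == 0
assert np.array_equal(recovered_unknown, ~known)
```

Whether this encoding helps depends on the loss (an assumption about the eventual model): with a 2-channel softmax and categorical cross-entropy, a cell whose targets are both 0 contributes exactly 0 to the loss, so it behaves like a mask, although a plain mean over all cells still divides by the total cell count. With an independent sigmoid and binary cross-entropy per channel, the all-zero target would instead train the network toward “neither absent nor present”.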

It should be mentioned that subsetting Y to leave out the unknown cells is not applicable, for two reasons: 1) the network used is a CNN, which works on the whole grid of cells, and 2) the response variable is also a superposition of matrices for other species. Therefore, a way to encode N/A (unknown) values in each species’ matrix constituting Y is needed.

Any thoughts or suggestion?

Elemento · June 21, 2022, 6:08pm

Hey @Manu,
That’s an interesting question. Before giving my opinion, please help me understand the question clearly. When you say this:

Are these defined only once for the entire raster, or individually for each square cell (1 km x 1 km)? I am assuming the latter, because otherwise how would we differentiate the square cells from one another? Moreover, how are these features represented for the square cells in the form of an image? Is it one image for the altitude of the raster, one image for its vegetation, and so on?

Additionally, do you want the network to predict “Unknown or N/A”, or do you want it to predict the presence or absence of foxes for these unknown square cells? Again, I am assuming the latter, but please clarify.

Lastly, please elaborate upon the following statement:

Regards,
Elemento

Manu · June 22, 2022, 6:18am

Hi @Elemento, thanks a lot for your interest and questions:

Single predictor X

Each variable, for example vegetation, is one raster layer with shape (nW, nH, nC), for example (100, 100, 1)
In this case, we have 100 × 100 cells, each of which could cover 1 km²
The value of vegetation (0 or 1) is then assigned to each cell: 0 if there is no vegetation in the cell, 1 if vegetation is present within it

Multiple predictors X

There could be other variables as well, such as water, altitude…
Each variable has the shape (100,100,1), with one value per cell. The values can be binary, as in the case of vegetation, or continuous, as in the case of altitude.
All raster layers are then stacked together
In the end, for 15 variables, you get X.shape = (100,100,15)
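The stacking step above can be sketched in NumPy as follows. The layer contents are random placeholders (hypothetical), standing in for one binary layer (vegetation), one continuous layer (altitude) and 13 others; only the shapes matter here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_vars = 100, 100, 15

# Hypothetical predictor layers, each a (100, 100) grid with one value per cell.
vegetation = rng.integers(0, 2, size=(n_rows, n_cols)).astype(np.float32)   # binary
altitude = rng.uniform(0, 3000, size=(n_rows, n_cols)).astype(np.float32)   # continuous
others = [rng.normal(size=(n_rows, n_cols)).astype(np.float32) for _ in range(n_vars - 2)]

# Stack along a new last axis: a channels-last "image" with 15 channels.
X = np.stack([vegetation, altitude, *others], axis=-1)
print(X.shape)  # (100, 100, 15)
```

Because binary and continuous layers live on very different scales (0/1 versus metres of altitude), standardising each channel before training is a common choice, though not something this thread prescribes.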

Single response variable y

For example, fox presence (1) or absence (0) within each cell
For a single species, Y.shape would then be (100,100,1)
The goal is for the network to predict the presence or absence of foxes within each unknown square cell, where no observer could go to check whether foxes were there.

Multiple response variables Y

Other species could be added, for example deer absence/presence within each cell
For 2 species, Y.shape would then be (100,100,2)
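A sketch of such a multi-species Y, assuming unvisited cells are stored as NaN and that each species has its own set of visited cells (the survey simulator and its rates are hypothetical). Because the visited cells differ per species, the “known” mask has to be per channel rather than per cell, which is why simply dropping cells does not work here.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (100, 100)

def simulate_survey(p_visited, p_presence, shape, rng):
    """Hypothetical survey: NaN where no observer went, otherwise 0/1."""
    y = (rng.random(shape) < p_presence).astype(np.float32)
    visited = rng.random(shape) < p_visited
    y[~visited] = np.nan
    return y

foxes = simulate_survey(0.05, 0.3, shape, rng)
deer = simulate_survey(0.08, 0.5, shape, rng)

Y = np.stack([foxes, deer], axis=-1)        # (100, 100, 2), NaN = unknown
mask = (~np.isnan(Y)).astype(np.float32)    # 1 where that species' label is known
Y_filled = np.nan_to_num(Y, nan=0.0)        # placeholder value; must be masked out of the loss
```

The placeholder 0 in `Y_filled` is only there so the array contains no NaN; it carries no meaning and must never reach the loss without `mask`.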

Model1: prediction of a single species

To reduce complexity, a network predicting the presence/absence of a single species will be built in a first phase
In this case, X.shape = (100,100,15) and Y.shape = (100,100,1)
Based on the X predictors, the model should output the probability of presence or absence of foxes within each cell with unknown status, in other words within each cell where no observer could go to conduct observations.
A CNN is used because it processes X (100,100,15) as an image with a depth of 15 channels. It’s like processing a picture, but instead of 3 colour channels, we now have 15.
In the case of single-species prediction, the CNN output shape will be (100,100,1)
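Getting one prediction per cell requires a fully convolutional network: “same” padding and no flatten/dense layers, so the spatial size (100, 100) is preserved up to the output. The following NumPy forward pass with untrained random weights shows only the shapes. The layer sizes (a 3x3 conv to 8 channels, then a 1x1 conv to 1 channel) are arbitrary assumptions; in practice a framework such as Keras or PyTorch would be used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w, b):
    """'Same'-padded 2D convolution (cross-correlation, as CNN libraries compute it).

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k; b: (C_out,).
    Returns (H, W, C_out): the spatial size is preserved.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))          # zero padding
    windows = sliding_window_view(xp, (k, k), axis=(0, 1))    # (H, W, C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", windows, w) + b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100, 15)).astype(np.float32)

# Untrained, randomly initialised weights: 3x3 conv (15 -> 8), ReLU, 1x1 conv (8 -> 1).
w1 = rng.normal(scale=0.1, size=(3, 3, 15, 8)).astype(np.float32)
b1 = np.zeros(8, dtype=np.float32)
w2 = rng.normal(scale=0.1, size=(1, 1, 8, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

hidden = np.maximum(conv2d_same(X, w1, b1), 0.0)   # (100, 100, 8)
logits = conv2d_same(hidden, w2, b2)              # (100, 100, 1)
probs = 1.0 / (1.0 + np.exp(-logits))             # per-cell presence probability
print(probs.shape)  # (100, 100, 1)
```

For the 2-species model, the last layer would simply have 2 output channels instead of 1.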

Model2: prediction of multiple species

To reduce complexity, model1 is developed in a first phase
In a second phase, the objective is for the model to output predictions for multiple species
In this example with 2 species (foxes and deer), this means X.shape = (100,100,15) and Y.shape = (100,100,2), i.e. this time an output with 2 channels.

Any thoughts on how to handle the unknown (N/A) cells in the response variable?

Regards,
Manu

Elemento · June 22, 2022, 6:39pm

Hey @Manu,
It is indeed an amazing question, but let’s take it step by step.

From your description above, am I safe to conclude that the features in X, which has a shape of (100, 100, 15), aren’t related to pixels in any way? If my conclusion is correct, have you considered posing it as a classification problem and, instead of a CNN, using a classification model such as XGBoost, a decision tree or logistic regression?

Because if X has dimensions (100, 100, 15), we can simply unroll it into a shape of (10000, 15); it is then nothing but a tabular dataset, and we can simply eliminate the rows with N/A labels.
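The unrolling can be sketched in NumPy as follows, assuming unknown labels are stored as NaN (the data and the 5% survey rate are hypothetical). Row `r` of the table corresponds to cell `(r // 100, r % 100)`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100, 15))          # hypothetical predictor stack
y = np.full((100, 100), np.nan)              # NaN = cell never visited
visited = rng.random((100, 100)) < 0.05      # hypothetical: ~5% of cells surveyed
y[visited] = rng.integers(0, 2, size=visited.sum())

# Unroll the grid: one row per cell, one column per predictor.
X_tab = X.reshape(-1, X.shape[-1])           # (10000, 15)
y_tab = y.reshape(-1)                        # (10000,)

# Keep only the cells with an observed label.
keep = ~np.isnan(y_tab)
X_train, y_train = X_tab[keep], y_tab[keep]
```

Each row only sees the values of its own cell, so any information about neighbouring cells is lost at this step.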

I am suggesting this because if your data doesn’t share the fundamental aspects of a typical image, such as horizontal edges, diagonal edges, circles, etc., CNNs won’t be of much help to you, don’t you think?

Let me know what you think about this, and then we can discuss further.

Cheers!

TMosh · June 22, 2022, 9:14pm

There are methods for dealing with missing features (for example, if specific ‘x’ values from a given example are not available).

But with supervised learning, you need to have output labels for all of the examples. So if you have an unavailable Y value, that example needs to be removed from the training set.

Perhaps the method @Manu is proposing is not a good match for this task.

Elemento · June 23, 2022, 5:15am

Hey @TMosh,
Thanks a lot for your input. I completely missed the fact that we don’t have y values for the examples with N/A labels. I will update my answer.

Regards,
Elemento

Manu · June 23, 2022, 5:54am

Hi @Elemento,
Thank you, excellent point regarding screening out N/A and using a tabular dataset. As a matter of fact, I did that already with a Gradient Boosting Machine approach. It worked fine in terms of accuracy, but I found 2 problems with a tabular approach that screens out N/A:

I am missing the environmental structure: the environmental variables are just the values of the raster cell in which the presences or absences are found, not the surrounding environmental structure, such as a nearby river network, or the interaction effects between different types of environmental variables (river network + rocky plain). A CNN would enable this by learning an environmental representation, as it does on an image. This comes back to your point regarding images: handling the stack of X variables as an image would capture this, and a CNN would be a clear added value. For info, some research has already been done on this: Convolutional neural networks improve species distribution modelling by capturing the spatial structure of the environment
It takes 90 minutes to process one single species, and the objective would be to process 300 of them regularly. I could of course do some optimization, but most importantly, single-species prediction does not allow learning an environmental representation common to a large number of species, which stabilizes predictions from one species to another.

Therefore, CNN seemed to be the next best move in this regard

Manu · June 23, 2022, 6:02am

Hey @TMosh,
Thanks for your feedback. As stated in my answer to @Elemento, some research has already been done on this:

But as you say, the structure of Y needs to be carefully considered. In this research, they used species presences only. The objective would be to improve on this and use absence points as well.

Elemento · June 23, 2022, 6:21am

Hey @Manu,
I think from this answer, we can safely conclude that CNNs have a great advantage in your application, too great to discard them for a simple ML classification model.

So, now we have some grid cells for which X and y values are missing. How about this? You may be able to find techniques for handling invalid pixels in images and use them to generate values for your raster data, i.e., X, provided that the assumptions of these techniques hold true. I have mentioned some of them below for your reference:

Once you employ one of these or some other technique, we will have a complete representation of X without any N/A values. Then you can use an ML-based classification model, for instance XGBoost, to predict the values of y for the grid cells for which we have just generated the values of X. You can easily train this model on the values of X and y that were already available. Once this step is done, you will have a complete X and y representation, and then you can employ CNNs.

Does this help you in any way? It may be computationally expensive, and I am not sure whether it is a valid method, but what is your opinion?

Regards,
Elemento

Manu · June 23, 2022, 12:26pm

Hey @Elemento,
Thanks for these interesting articles and points of view. It helps me a lot with my thinking.

The main problem in my case is to handle N/A in Y, not in X. For example:

X1 is a variable such as water: each pixel or cell has value 1 if a river is found in it, or 0 if not.
X2 is vegetation: 1 if found in a pixel, 0 if not.

Therefore, there is no problem with N/A at the level of X; the main issue is N/A in Y

Let’s continue with a simple example to help my thinking process as well: imagine pictures or rasters made of only 4 pixels (or cells)

X

X1.shape (2,2,1) = water with value 1 if present in a pixel or 0 if absent
X2.shape (2,2,1) = vegetation with value 1 if present in a pixel or 0 if absent
→ the X input shape is thus (2,2,2)

Y for a single species:
Y.shape(2,2,1) = foxes with values:

1 if found present in a pixel visited by an observer
0 if found absent in a pixel visited by an observer
N/A if pixel not visited by an observer
→ the Y output shape is (2,2,1)
→ the values found in the 4 pixels or cells could be, for example, [1, 0, N/A, 1]

Three potential solutions:
I think we have three potential scenarios for solving this problem.

1. Fill missing values (N/A) in Y
This would be the approach you mentioned above, using for example XGBoost on tabular data to fill N/A in Y.

The advantage is that it would make it possible to apply a CNN in a second phase to multiple species stacked in Y (Y1, Y2, Y3…)
The drawback is the processing time, as you mentioned. I am also thinking about potential bias: in a first phase, the tabular XGBoost processing would remove the environmental structure and focus only on the values of X in single cells to predict Y. It would then apply the learned function to predict values for all N/A cells of a species. In a second phase, the CNN would be applied, which means benefiting from the environmental structure but learning from new values of Y (that were N/A before) that may not be entirely correct due to the limitations of the tabular XGBoost processing.

2. Remove N/A weights
I do not know if it’s technically possible, but let’s imagine that we train a three-class classifier on Y

Class0 : absence
Class1 : presence
Class2 : unknown (N/A)

At test or production time, would there be a way to use only the weights of Class0 and Class1 for future predictions, and thus discard the weights of Class2? Image segmentation would produce such an output, but I do not know whether we could discard or neutralize all weights related to a specific class and use only the weights related to the two other classes (Class0, Class1) to get a prediction.

3. Neutralize N/A cells in Y
I do not know if it’s technically possible either. Would it be possible to define an area or zone on a picture that the CNN algorithm should not learn from? If we could define such “non-learning spots” on a picture, we could then apply this technique in our case.
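To make this concrete on the 2x2 toy example with labels [1, 0, N/A, 1]: one possible implementation (an assumption, not something established at this point in the thread) is to leave the N/A cell out of the loss rather than out of the image. With binary cross-entropy and hypothetical predicted probabilities (0.8, 0.3, 0.6, 0.9) for the four cells, the loss averages over the three visited cells only:

$$J_{\text{masked}} = -\frac{1}{3}\left[\ln 0.8 + \ln(1 - 0.3) + \ln 0.9\right] \approx -\frac{1}{3}\left(-0.2231 - 0.3567 - 0.1054\right) \approx 0.2284$$

The prediction 0.6 for the unvisited cell appears nowhere in this expression, so changing it leaves the loss, and therefore the gradient, unchanged.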

What do you think?

Elemento · June 23, 2022, 1:26pm

Hey @Manu,
I guess the first point requires no further discussion. You have laid down its pros and cons very clearly, and I agree with all of them.

Now coming to the second point. I haven’t seen this approach used in any research so far because, as far as I understand, we can’t draw a direct relationship between the weights and the classes. The weights are more related to the function that the neural network learns to predict one of the many classes. Even if (and that’s a big IF) we somehow learn to modify the weights (and hence the function learnt by the neural network) so that the network predicts only 2 classes instead of 3, it will predict incorrectly on the cells that would originally have been classified as ‘N/A’, because the dataset trained the model to classify a cell into 3 classes, not 2.

If you are wondering why it’s a big if: neural networks are, in general, black-box models. Understanding the function learnt by a neural network is not an easy task, and modifying it is even harder. There has been a ton of research in the past decade enhancing our understanding of neural networks, but whether that is enough for this task, I am uncertain.

Additionally, classifying a cell as ‘N/A’ doesn’t make much sense to me, because we simply have missing labels for these square cells. The distribution of X in these cells is no different from that in the cells that have labels; it’s just that their labels are missing. So how could a model possibly differentiate these cells? For example, consider 2 examples with the same features: for one you have the label, for the other you don’t. It’s a completely valid case, and there is no way a model can differentiate between these 2 examples. So, even if (once again, a big IF) we are able to somehow implement this approach, we will circle back to

Now coming to the third approach: this seems to be an interesting one. I assume you are thinking of some sort of masking approach. I thought about this too, and it seemed pretty good until something else came to my mind. Let’s say we apply the masking in the input layer, i.e., to X before it is fed to any layer. Then what’s stopping the neural network from making sense of these masking values and using them to learn a function that classifies each cell as presence or absence of a species? We wanted the neural network to exploit the spatial information in the first place, and I guess it might exploit a lot more than we wanted it to. At inference or production time, there will be no mask in any of the examples. So, will the performance be retained?

Another possible place to apply masking is the cost computation. But if we do this, it is equivalent to adding 0 to the cost for cells with ‘N/A’ labels, which is another way of saying that we have perfect predictions for these cells, which is definitely wrong. So, how do you think we can employ this approach?

In conclusion, the second approach seems to be a dead end to me. The first approach you have defined pretty well, as I just said, and the third approach could possibly be used, although I am quite uncertain about it as well.

Let me tag some other mentors; they will surely be able to correct our perspectives if we are going wrong somewhere, or perhaps provide some new ones.

@TMosh @paulinpaloalto @rmwkwok @anon57530071 Guys, can you please look into this query and provide your opinions. Thanks in advance.

Cheers!

rmwkwok · June 23, 2022, 2:55pm

Hey @Elemento, thanks for tagging me.

Hello @Manu, how are you? I have 2 ideas after reading your discussions.

#1
Your GBM worked fine, so why not just add some features that account for the environmental structure? A 3x3 filter takes only adjacent cells into account, so the GBM equivalent would be, for each cell, to calculate, for example, the sum of the 8 surrounding cells, and see if it improves your baseline accuracy. If not, I would expand one cell outward to aggregate 8 + 16 cells, and see the change.
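A NumPy sketch of those neighbourhood features, assuming cells outside the raster count as 0 (zero padding is an assumption; other border treatments are possible). `radius=1` sums the 8 surrounding cells and `radius=2` sums the 8 + 16 cells of the 5x5 window, in both cases excluding the cell itself.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_sum(layer, radius):
    """Sum of the cells within `radius` of each cell, excluding the cell itself.

    radius=1 -> the 8 surrounding cells; radius=2 -> 8 + 16 = 24 cells.
    Cells outside the raster are treated as 0 (zero padding).
    """
    k = 2 * radius + 1
    padded = np.pad(layer, radius)                     # zero padding on every side
    windows = sliding_window_view(padded, (k, k))      # (H, W, k, k)
    return windows.sum(axis=(-2, -1)) - layer

water = np.zeros((5, 5))
water[2, :] = 1  # hypothetical river running along the middle row

water_nb8 = neighbourhood_sum(water, radius=1)
water_nb24 = neighbourhood_sum(water, radius=2)

# Each new layer becomes one extra column in the GBM's table.
X_tab_extra = np.stack([water, water_nb8, water_nb24], axis=-1).reshape(-1, 3)
```

For a continuous layer such as altitude, a neighbourhood mean (or standard deviation) may be more meaningful than a sum; which aggregate works best is something to test against the baseline.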

#2
For your CNN, masking the unknown y in the calculation of the loss is a great idea, and you then normalize your total loss by the number of known y. You probably need to add the mask into the loss function yourself.
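A minimal sketch of such a masked loss, assuming binary cross-entropy and NaN for unknown labels. In Keras or PyTorch the same logic would be written with the framework’s own tensor operations so that automatic differentiation can see it; NumPy is used here only to show the arithmetic. It works for any shape, e.g. (batch, H, W, n_species).

```python
import numpy as np

def masked_bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over known labels only.

    y_true: 0/1 for observed cells, NaN for unknown cells, e.g. (batch, H, W, n_species).
    y_pred: predicted probabilities with the same shape as y_true.
    """
    mask = ~np.isnan(y_true)
    y = np.where(mask, y_true, 0.0)          # placeholder for unknown cells, masked below
    p = np.clip(y_pred, eps, 1 - eps)        # avoid log(0)
    per_cell = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    n_known = mask.sum()
    return (per_cell * mask).sum() / max(n_known, 1)   # normalise by the known count

# The 2x2 toy example [1, 0, N/A, 1] with hypothetical predictions.
y_true = np.array([1.0, 0.0, np.nan, 1.0])
y_pred = np.array([0.8, 0.3, 0.6, 0.9])
loss = masked_bce(y_true, y_pred)
```

Normalising over all known labels at once, rather than per species, is one design choice; with very unequal survey effort between species, a per-species average may balance them better.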

Cheers!

rmwkwok · June 23, 2022, 3:00pm

#3

Another way would be, for each known y, to cut a smaller surrounding raster area out as your new X; then your new X-y dataset always has a known y.

The best cutting size should be determined by analysis of your data, domain knowledge and/or experiments.

Elemento · June 23, 2022, 3:26pm

Hey @rmwkwok,
Won’t the second approach lead to the below issue?

Additionally, I am a little confused about the third approach you mentioned. Does it involve replicating the values of y to their surrounding square cells that have N/A labels, based on a KNN-sort-of algorithm, for each of the smaller regions that we cut out of the original region?

Regards,
Elemento

rmwkwok · June 23, 2022, 3:37pm

No. First, in the forward phase, it will predict something for the unknown cells, but because you masked them, their influence will not be propagated back to the weights of your model.

$$\text{Cost} = J(\hat{\vec{y}}, \vec{y}) = J(\hat{\vec{y}}_{\text{known}}, \vec{y}_{\text{known}}) + J(\hat{\vec{y}}_{\text{unknown}}, \vec{y}_{\text{unknown}})$$

$$\text{Cost}_{\text{masked}} = J(\hat{\vec{y}}_{\text{known}}, \vec{y}_{\text{known}})$$

Again, you need to introduce the mask to the loss function implementation to make that happen.
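This also addresses the worry that masking amounts to claiming “perfect predictions” for the N/A cells. A masked cell contributes a term that is constantly 0, whatever the model outputs, so its gradient is 0 and it neither rewards nor penalises anything; dividing by the number of known cells keeps the cost an honest average over the labelled cells. A NumPy check, assuming a sigmoid output with binary cross-entropy (the logits are hypothetical):

```python
import numpy as np

def masked_bce_from_logits(z, y_true):
    """Masked mean BCE as a function of logits z (sigmoid applied inside)."""
    mask = ~np.isnan(y_true)
    y = np.where(mask, y_true, 0.0)
    p = 1.0 / (1.0 + np.exp(-z))
    per_cell = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return (per_cell * mask).sum() / mask.sum()

def grad_wrt_logits(z, y_true):
    """Analytic gradient: (sigmoid(z) - y) / n_known on known cells, 0 elsewhere."""
    mask = ~np.isnan(y_true)
    y = np.where(mask, y_true, 0.0)
    p = 1.0 / (1.0 + np.exp(-z))
    return np.where(mask, p - y, 0.0) / mask.sum()

y_true = np.array([1.0, 0.0, np.nan, 1.0])   # the toy labels [1, 0, N/A, 1]
z = np.array([1.4, -0.8, 0.4, 2.2])          # hypothetical network outputs (logits)
g = grad_wrt_logits(z, y_true)

# Finite-difference check: nudging the unknown cell's logit leaves the cost unchanged.
h = 1e-6
z_nudged = z.copy()
z_nudged[2] += h
fd_unknown = (masked_bce_from_logits(z_nudged, y_true) - masked_bce_from_logits(z, y_true)) / h
```

Here `g[2]` and `fd_unknown` are both exactly 0, while the known cells keep their usual non-zero gradients, so only the visited cells drive the weight updates.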

No, not at all. Let’s say we have a 100 x 100 raster with two cells of known y = 1, at locations (25, 30) and (72, 80). Assuming the cutting size is 21 x 21, I am going to cut two squares centred at these 2 locations. The first one will span 15 to 35 on the x-axis and 20 to 40 on the y-axis, and this new, smaller subraster will carry the label y = 1.

So, a 21x21 raster as the new X, and 1 as the label for this X.
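The cutting step can be sketched in NumPy as follows, treating the first coordinate as the row index. Zero-padding for patches that extend past the raster edge is an assumption (reflection padding, or dropping points near the border, are alternatives), and the predictor stack is random placeholder data.

```python
import numpy as np

def extract_patch(X, row, col, size=21):
    """Cut a size x size window of X (H, W, C) centred on (row, col).

    Cells falling outside the raster are zero-padded (an assumption).
    """
    half = size // 2
    Xp = np.pad(X, ((half, half), (half, half), (0, 0)))
    r, c = row + half, col + half                  # centre in padded coordinates
    return Xp[r - half:r + half + 1, c - half:c + half + 1, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100, 15))              # hypothetical predictor stack
known_cells = [((25, 30), 1), ((72, 80), 1)]     # the two presences from the example

X_patches = np.stack([extract_patch(X, r, c) for (r, c), _ in known_cells])  # (2, 21, 21, 15)
y_patches = np.array([label for _, label in known_cells])                    # (2,)
```

The first patch covers rows 15 to 35 and columns 20 to 40 of the original raster, matching the example. The resulting dataset is an ordinary image-classification set (one label per patch), so no label is ever unknown.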

rmwkwok · June 23, 2022, 3:50pm

@Manu Is 90 minutes the time for training a model for one species? Or is it the time for making prediction for one species in all rasters?

Elemento · June 23, 2022, 4:13pm

I am really confused now, @rmwkwok. I guess I have just said the same thing, haven’t I?

Since the labels are for individual cells, isn’t using y = 1 for all the cells in the region with corners (15, 20), (35, 20), (15, 40) and (35, 40) the same as replicating y = 1 to all these cells? And I am assuming that if any of these cells has a known y, we will simply use that label. Please correct me if I am understanding it incorrectly.

Regards,
Elemento

rmwkwok · June 23, 2022, 4:15pm

No problem

Now the new model will accept my new X and predict one value of y, not a matrix of y. It’s like accepting a photo and predicting whether it is Cat or Not Cat.

Elemento · June 23, 2022, 4:17pm

But the problem requires us to predict the matrix of labels, right? I.e., whether the species is present in each cell of the raster, instead of whether it is present in the entire raster?

rmwkwok · June 23, 2022, 4:19pm

In that case, we need to make more predictions (one per cell, each using the patch centred on that cell) to go through the whole 100 x 100 raster.
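A sketch of that sweep, assuming zero padding at the borders and a trained patch classifier that maps a batch of patches to probabilities (`toy_patch_model` below is a hypothetical stand-in, not a real model). Patches are taken as a strided view and fed one raster row at a time, so the roughly 10,000 overlapping 21x21x15 patches are never all copied into memory at once.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def all_cell_patches(X, size=21):
    """View of shape (H, W, size, size, C): one patch centred on every cell (zero-padded)."""
    half = size // 2
    Xp = np.pad(X, ((half, half), (half, half), (0, 0)))
    windows = sliding_window_view(Xp, (size, size), axis=(0, 1))   # (H, W, C, size, size)
    return windows.transpose(0, 1, 3, 4, 2)                        # (H, W, size, size, C)

def predict_map(X, patch_model, size=21):
    """Rebuild a full (H, W) probability map from a patch -> probability model."""
    patches = all_cell_patches(X, size)    # a view: patches are not copied yet
    out = np.empty(X.shape[:2])
    for i in range(X.shape[0]):            # one raster row per batch keeps memory bounded
        out[i] = patch_model(patches[i])   # patches[i]: (W, size, size, C)
    return out

def toy_patch_model(batch):
    """Hypothetical stand-in for a trained classifier: sigmoid of the channel-0 patch mean."""
    return 1.0 / (1.0 + np.exp(-batch[..., 0].mean(axis=(1, 2))))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100, 15))
prob_map = predict_map(X, toy_patch_model)   # 100 * 100 = 10,000 patch predictions
```

Because neighbouring patches overlap almost entirely, this sweep repeats a lot of computation; that is the trade-off against a fully convolutional network that predicts the whole map in one pass but needs a masked loss during training.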
