Need suggestions on neural network framework for mapping spatial polygon data to an output metric

I am exploring neural networks to create a model for a specific problem: I have a 3D spatial input defined by rectangular polygons (xmin, ymin, xmax, ymax, zcenter). To each polygon I can apply a load (Load). This load results in the output metric, say temperature, for each of these polygons. A high load on a given polygon results in a high temperature for that polygon and somewhat lower temperatures in neighboring polygons due to heat spreading. I have training data for this behavior, obtained from physics-based solvers.

To simplify, my input and output look like the below:

Input: [N x 6] [xmin, ymin, xmax, ymax, zcenter, Load], where N is the number of polygons.

Output: [N x 1] [Temperature]
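For concreteness, a toy scenario in this format (all numbers are made up for illustration):

```python
import numpy as np

# One scenario: N polygons, each row is [xmin, ymin, xmax, ymax, zcenter, Load]
N = 4
X = np.array([
    [0.0, 0.0, 1.0, 1.0, 0.5, 10.0],  # only this polygon carries a load
    [1.0, 0.0, 2.0, 1.0, 0.5,  0.0],
    [0.0, 1.0, 1.0, 2.0, 0.5,  0.0],
    [1.0, 1.0, 2.0, 2.0, 0.5,  0.0],
])                                     # shape (N, 6)

# Target from the physics solver: one temperature per polygon; the loaded
# polygon is hottest, its neighbors are warmer than the far corner
y = np.array([[85.0], [60.0], [60.0], [45.0]])  # shape (N, 1)
```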

I tried a few architectures: a 1D CNN, a 1D CNN with an attention block, and a 2D CNN (all with some fully connected layers). I performed the convolution operation (in both the 1D and 2D scenarios) on the N x 6 input. None of them seem to capture the spatial behavior I am hoping for: a hotspot where there is load, with heat dissipating as we move away from the hotspot.

Can you please suggest some pointers on what you think would be a good NN framework to address the above problem?

If your dataset is really just a single large space that contains a set of adjacent polygons with heat flowing between them, then you really have only one example, not ‘N’ examples.

I don’t fully understand what your model is attempting to learn, since your dataset seems to consist of independent polygons that are located in 3D space, without regard to any adjacent polygons.

My first thought is that you are barking up the wrong tree here. Neural networks are not good at solving physics problems: they can only detect patterns in existing data. You’d have to provide training data for every possible physics input, and by the time you can do that, what is the point? It reminds me of a case from 4 or 5 years ago where a student wanted to train a neural network to predict whether a given airfoil design would stall at a particular speed. That is not a reasonable approach: if you show the model a shape for which there is no training data, it doesn’t know Bernoulli’s principle and has no way to make a reasonable prediction. Interpolation doesn’t work with physical processes that have critical points, like when the airflow breaks up and you stall, or when water changes from liquid to solid.

My suggestion would be to start by thinking about whether there is a way to extend the existing “physics solvers” you mentioned to handle your more general case.

Thanks for your question.

To clarify, I am attempting to learn the temperature at each polygon. Please see a simplified example below, illustrated with 4 polygons in a 2D scenario. As shown in the illustration, I will have multiple scenarios of N x 6 data where the load value and location change for each scenario. I want to train a model that can capture the spatial behavior of how temperature changes as we change the load value/location.

Hi Paul,

Thanks for your insight. I completely agree with you that it does not make sense to replace physics-based solvers with NN-based solvers. I am just trying to create a reduced-order model, or compact model, for a specific representation of my input/output data within the constraints of my training data. I don’t expect the NN model to capture the physics in scenarios where training data is not available.

In other words, this will be my quick-turnaround model that I can use to get a ballpark estimate of how a given load impacts temperature based on location. Once I have this estimate, I can use my higher-fidelity models, which capture the physics, to get accurate numbers.

So it appears your model takes multiple polygons as inputs, and has multiple outputs (one for each polygon).

So in your illustration, that’s one example that contains four polygons, not four different examples. Your NN architecture needs to change to match that structure.

If you have other examples with a different number of input polygons, each geometry would have to be handled via a different model.

That is correct. My model takes multiple polygons as inputs and has multiple outputs (one for each polygon).

I need to make sure the NN architecture is capable of taking inputs of varying size, whether there are 4 polygons or 50 polygons, assuming that is possible.

If I understand you correctly, you are saying it is not possible to handle a varying number of input polygons with a single model. Is that correct?

Thanks

Yes. NNs really require a fixed input size and shape (other than sequence models, which this isn’t).

I just finished the CNN course, so I’m fairly new, but it’s interesting for me to read this thread and try to understand. With my still very limited understanding, it sounds like you are basically aiming to model dependencies between spatial distance and physical properties, e.g. the temperature you mentioned and the way it radiates heat? Is that what you are aiming for?

Regarding your input: as far as I understand, with CNNs you can vary the size of your inputs as long as they don’t exceed the size of the input layer, since padding in your conv layer gives you the ability to pad smaller inputs up to that fixed size.
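Something like this sketch is what I mean (sizes made up):

```python
import numpy as np

def pad_to_fixed(grid, target_h, target_w):
    """Zero-pad a smaller 2D load grid up to the network's fixed input size."""
    h, w = grid.shape
    out = np.zeros((target_h, target_w), dtype=grid.dtype)
    out[:h, :w] = grid   # padded cells stay zero: no heat source there
    return out

small = np.random.rand(10, 10)        # a 10x10 scenario
fixed = pad_to_fixed(small, 64, 64)   # padded up to a fixed 64x64 input
```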

Thank you!

If I model them as a graph neural network (GNN), which I am exploring right now, where the polygons are represented as nodes and their spatial dependencies as edges, do you think it is possible to have a varying input size?

I am very new to GNNs, so I am trying to get a sense of how reasonable this approach is from a fundamental viewpoint.
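From what I have read so far, a GNN’s weights are shared across all nodes, which is why the node count could vary per example. A rough, untested sketch of my understanding (the layer size and mean aggregation are just placeholders):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of message passing: each polygon (node) updates its
    features from the mean of its neighbours' features."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h:   (N, dim) node features -- N may differ from example to example
        # adj: (N, N) adjacency matrix encoding spatial neighbourhood
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        msg = (adj @ h) / deg                 # mean over neighbours
        return torch.relu(self.update(torch.cat([h, msg], dim=1)))

# The same layer (same weights) handles 4 polygons or 50:
layer = MessagePassingLayer(dim=6)
out4 = layer(torch.randn(4, 6), torch.ones(4, 4))     # toy all-to-all adjacency
out50 = layer(torch.randn(50, 6), torch.ones(50, 50))
```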

Hi Ben,

Thanks for your input. I considered a CNN-based model, and in fact, as you pointed out, a CNN would be the perfect candidate. I want to be able to handle a wide range of input sizes (from a few polygons to hundreds or even thousands of polygons), if that is possible. I am afraid padding will only let me expand the size by a small margin, not to this extent. Please comment if you think that is not true and padding can change the size by an order of magnitude.

Sorry, I have never worked with a GNN.

I think it might become a complex endeavor, but basically everything is doable if you ask me. Please consider me a beginner, though, with barely any real-life production experience in this field.

Although it’s late, I tried to think about it; these are more thought experiments than anything.
Using padding as a kind of gap-closer might be a starting point. It depends on how much your inputs vary: the padded regions still pass through learned weights, I guess, and I am not sure how much variation in padding the network can handle before you start to get problems. At least the methods introduced in the bottleneck-layer part of the course, in which you expand and contract the channel dimensions, could help with the padding issue. Residual connections might also be useful to lower the risk of gradient problems in backpropagation.
Maybe you could even use the padding as some sort of heat radiation, so the padding carries information instead of being “useless” zeros; I am not sure about that.
But if so, you have to consider inputs that almost fill the whole input layer: these can’t be given the same amount of padding for your heat distribution as the small inputs, except via their channels.
Scaling values like physical properties proportionally to the padding is probably not an option; the depth of detail your data would need would be huge.

I learned very interesting things in Andrew’s course regarding CNNs, and with techniques like expansions, pointwise convolutions, and residual connections, I’m sure it’s possible to handle a certain range of padding variability, even a wide one. Additionally, there is so much research happening at such a high pace at the moment that there must be methods that can act as a little helper here.
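For example, a residual block with a pointwise (1x1) convolution, roughly as presented in the course (an untested sketch from my side; channel counts are arbitrary):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 conv followed by a pointwise (1x1) conv, with a skip
    connection to reduce the risk of vanishing gradients."""
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return torch.relu(x + self.conv1(torch.relu(self.conv3(x))))

block = ResidualBlock(channels=16)
y = block(torch.randn(1, 16, 32, 32))  # shape preserved: (1, 16, 32, 32)
```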

In the end, your data will have the most impact on what the model learns. A CNN’s ability to process spatial information suits what you are looking for well. Whether the effort is worth it compared to another solution is a different question; I don’t know. But to be honest, it sounds super interesting and fun.

Cool problem. I love the way that real-world problems challenge us to figure out how to apply ML tooling.
As I understand your problem so far, it seems very applicable to neural networks. They should be able to capture the non-linearities in your problem domain quite well. The challenge is just in characterising the problem domain.

Firstly, some clarifications that you might need to provide - but I’ll apply assumptions for now:

  1. As these are polygons, do they change in shape and size relative to each other, or are they a regular grid? I’ll assume a regular grid for now; if that’s not the case, you may need to apply some interpolation in order to regularize your data onto a regular grid.
  2. Are your polygons in 2D or 3D space? Can they be approximated to 2D space, or as a 2D manifold? Initially I’ll assume a flat 2D surface, but I’ll offer some adjustments for 2.5D manifolds afterwards.

I reckon a 2D CNN is perfect for your job.
We treat each polygon as a single position, with its neighbouring polygons as neighbouring positions.
CNNs nicely solve the problem of needing to calculate each position’s value based on its neighbours, and then calculating the neighbours’ values based on that first position’s outcome too.

Step 1 - Preparing data for CNNs.
A loosely located set of polygons in 3D space isn’t suitable for CNNs. They were designed for images and work best on data that is at least image-like. The first step is to take your diamond-oriented polygon arrangement and make it like a regular grid: take diagonal paths through your polygon space, either approximating or interpolating as appropriate, ending up with a regular matrix of load values. Something like in the following image:

Each “pixel” is now represented as a single decimal value.

If your polygons differ significantly in size and location, then I’d suggest sampling at a higher resolution, but still producing an image tensor of shape h x w x 1 (h = height, w = width).
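A rough sketch of that rasterization step (the cell size, grid shape, and additive handling of overlaps are my assumptions):

```python
import numpy as np

def rasterize_loads(polys, cell=1.0, H=64, W=64):
    """Burn each rectangle's load into a regular H x W grid.
    polys: (N, 6) rows of [xmin, ymin, xmax, ymax, zcenter, load]."""
    img = np.zeros((H, W, 1), dtype=np.float32)
    for xmin, ymin, xmax, ymax, _z, load in polys:
        i0, i1 = int(ymin // cell), int(np.ceil(ymax / cell))
        j0, j1 = int(xmin // cell), int(np.ceil(xmax / cell))
        img[i0:i1, j0:j1, 0] += load   # overlapping rectangles simply add here
    return img                          # h x w x 1, one value per "pixel"

grid = rasterize_loads(np.array([[0.0, 0.0, 2.0, 2.0, 0.5, 10.0]]))
```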

Step 2 - Output.
The output of your CNN is the heat value at each grid position, trained on your data set after the same approximations/interpolations are applied.

Extension to 2.5D manifolds: if your polygon-to-polygon relative coordinates differ in the third dimension, and that third dimension is important in determining heat (e.g. two surfaces close together causing heat to dissipate between them), then you’ll need to account for that too. I’d suggest adding it as an extra layer of information on top of the single load value in our regularized grid. You’ll have to figure out the details, but I imagine something like a bump map, as used in computer graphics, could approximate the relative proportion of expected surface-to-surface interactions.

On that note, and getting back to the simpler 2D polygon shape and position variability: another alternative is to use weighting to account for the relative differences. In that approach, you interpolate to a grid granularity that equals the number of polygons (like in my image), but you use a weight vector to indicate the relative differences in their effect. This could even go so far as capturing a value for each adjacent polygon. For example, in the following diagram, I’ve indicated with green arrows the way that four adjacent polygons influence the one in the middle (assuming that there are always exactly four adjacent polygons, or that we can sufficiently approximate the data that way). Assuming that surface area is the primary factor in how a polygon influences heat dissipation to the middle polygon, we can set each of the four weights to indicate the relative sizes of the four neighbours. This then forms a 5-vector for each polygon, holding the polygon’s own load plus the weighting that should be applied to the neighbours: <load, w1, w2, w3, w4>. You would use a fully-connected layer across this dimension, and a CNN across the height and width dimensions. (Sorry, I know this is possible, just not the coding that makes it happen.)
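That said, my best guess at the coding, untested: in a framework like PyTorch, a fully-connected layer applied across the channel dimension is just a 1x1 convolution (the layer widths here are arbitrary):

```python
import torch
import torch.nn as nn

# Input: (batch, 5, H, W) -- the <load, w1, w2, w3, w4> vector per polygon.
# A 1x1 conv is a fully-connected layer applied across the channel
# dimension at every grid position; the 3x3 conv then mixes neighbours.
model = nn.Sequential(
    nn.Conv2d(5, 16, kernel_size=1),             # per-polygon FC
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), # neighbour mixing
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),             # temperature per polygon
)

temps = model(torch.randn(8, 5, 32, 32))  # -> (8, 1, 32, 32)
```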

I hope that made sense.
Best of luck.


As an exercise in practicing drawing these diagrams, I think what I’m suggesting is represented like this. Notice that I’ve added some hidden layers and arbitrarily picked a depth of 16, and I’ve used the idea of “unwrapping” a 3D polygon surface to a flat map to illustrate what I was getting at about the 3D-to-2D mapping:

Hi Malcolm,

First, thank you for your detailed thoughts and illustrations. I think I can use a lot of your ideas for my problem. A few things to clarify your assumptions:

  1. The polygons are indeed differently sized (they are all still rectangles, but width and length vary).
  2. My domain is 3D, or 2.5D as you mentioned. The z dimension is also important for my problem. But as you said, if we can formulate the problem in 2D, it should be relatively easy to scale it to the third dimension.

I think I can work with the idea of mapping a 3D surface to a 2D image and converting it to a grid of 2D mesh points to feed into a CNN. The only gap I see is the variable input size, as I was discussing with Ben (also in this thread).

For instance, I could have one problem with very few polygons that results in a 2D input grid of size 10x10, and another problem with many polygons that results in a 2D input grid of size 1000x1000. Since a CNN depends on the input “image” size, I don’t know how to address this other than padding with empty pixels to match the input sizes (say, between a 10x10 grid and a 1000x1000 grid).

In other words, if we proceed with a CNN, which I totally agree perfectly fits this problem, do we have any way to train a CNN that can take images of multiple sizes, other than padding each image to a constant size?

Hi Ben,

Thank you again for the follow-up thoughts. I think I will try the approach of padding with empty pixels to arrive at a constant input image size.

One other thing I am thinking of is to train on a very large input mesh (say a 1000x1000 grid). This should capture all length scales within that grid size. Then, when I have a 10x10 grid as input, I can pad it with empty pixels up to 1000x1000 and pass it through the CNN trained on the 1000x1000 grid. This way, the empty pixels remain empty, since there is no real heat source in those pixels anyway.
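Roughly this (a sketch; the pad amounts assume the small grid sits in one corner):

```python
import numpy as np

small = np.random.rand(10, 10)   # a small 10x10 scenario
# Pad out to the 1000x1000 size the CNN was trained on; padded cells
# are zero, i.e. "no heat source" there.
big = np.pad(small, ((0, 990), (0, 990)), mode="constant")
assert big.shape == (1000, 1000)
```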

I am still interested in finding a more general solution with no limit on input size, similar to how sequence models can deal with input text of any length (from a few words to thousands of words). Any thoughts on how we can achieve this are very much appreciated.

Thanks again

@funnyfox, can you please clarify: when you say that the input mesh size varies substantially (10x10 vs 1000x1000), does that mean the absolute size varies, or that the resolution varies? The answer has quite a big influence on whether padding would work or not.

Or, put more succinctly, how does the grid size change the outcome? For example, does the extent to which heat distributes across polygons vary with grid size (which would be the case if the grid size variation represents different sampling resolutions across a physical surface)? Are there global effects that you need to take into account? For example, if one whole quadrant of the mesh heats up uniformly, does that influence the diagonally opposite quadrant much more than in some other situation?

Another assumption I was making is that the effects can be modelled in a localised way; in fact, that the heat dissipation effect distributes according to something similar to a Gaussian distribution with a narrow radius.

@malcolm.lett When I say the input size varies, I mean that the physical absolute size can vary. Assuming we keep the cell size fixed (say 1m x 1m), I want to be able to solve a 10x10 mesh (10m x 10m) or a 1000x1000 mesh (1000m x 1000m).

For the sake of simplicity, it is fair to assume that the cell size is fixed and that we don’t have to worry about global effects.

You are absolutely correct in assuming the Gaussian nature of the heat distribution, if that helps simplify the problem. For instance, if a polygon or grid point has a load or heat source, the temperature is highest at that point and dissipates in a Gaussian fashion as we move away from the heat source (in all 3 dimensions). Of course, it is possible for more than one polygon to have a heat source at any given time, and the neural network should be able to capture how multiple Gaussians interact with each other.
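To illustrate what I mean, here is a toy version of that behavior in 2D (the Gaussian width and the simple superposition of sources are assumptions for illustration):

```python
import numpy as np

def temperature_field(sources, H=100, W=100, sigma=5.0):
    """Toy model: each (row, col, load) heat source contributes a Gaussian
    bump; in this sketch multiple sources simply superpose."""
    yy, xx = np.mgrid[0:H, 0:W]
    T = np.zeros((H, W))
    for r, c, load in sources:
        T += load * np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
    return T

# Two hotspots: highest temperature at each source, Gaussian falloff,
# and the overlapping tails add up between them.
T = temperature_field([(30, 30, 10.0), (40, 45, 5.0)])
```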