By the way, in case you’re up for it, it occurred to me that the function used to compute what I’ve been calling the “bump map” (for translating from 2.5 to 2D) might also be learned through gradient descent. And likewise how the relative polygon sizes should be converted into weights.
This may be totally unnecessary, so ignore this if that’s the case. I’m guessing that calculating those weightings and/or bump maps might be complex - or rather, that trying to figure out the maths might be hard work.
The thing is that the gradient descent algorithm is quite good at optimising what would otherwise be hyperparameters. So, for the 2.5D->2D bump map for example, maybe you decide on some function that, for a given polygon, looks at the third axis (we’ll call it height) of the adjacent polygons, calculates the amount of curvature, then multiplies by some hyperparameter to derive a weight. You can use gradient descent to find the optimum hyperparameter, optimising via the same loss function that you’d use as part of your standard supervised learning. All you need to do is to include that hyperparameter in the gradient tape (TensorFlow terminology). Again, I know this is possible, I just don’t (yet) know what the code is to do it.
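To make that concrete, here’s a minimal TensorFlow sketch of the idea. The curvature values, the target temperatures, and the bump-map function (a simple multiply by a scalar `alpha`) are all made up for illustration - the point is only that once `alpha` is a `tf.Variable` watched by the `GradientTape`, gradient descent tunes it via the same loss you’d use for supervised training:

```python
import tensorflow as tf

# Hypothetical data: per-polygon curvature and target weightings.
curvature = tf.constant([[0.2, 0.5, 0.1]])
target = tf.constant([[0.4, 1.0, 0.2]])

# The "hyperparameter" becomes a trainable variable like any other weight.
alpha = tf.Variable(1.0)

for _ in range(200):
    with tf.GradientTape() as tape:
        # Toy bump-map function: weight each polygon by curvature * alpha.
        pred = curvature * alpha
        loss = tf.reduce_mean(tf.square(pred - target))
    # alpha is a Variable, so the tape tracks it automatically.
    grad = tape.gradient(loss, alpha)
    alpha.assign_sub(0.1 * grad)  # plain gradient-descent step
```

In a real model you’d pass `alpha` alongside the network weights to the optimiser rather than updating it by hand; this just shows that nothing special is needed beyond making it a variable inside the tape.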
Happy to explain better if you’re interested in this approach and my explanation isn’t clear enough.
An interesting logical extension, mind you, is that if you can use the network to learn this hyperparameter, it gives a clue on an extended network input structure so that you can have the network cope with the full 2.5D mesh coordinates. But I’d keep it simple, do the bump map function manually at first, and build on it once you’ve got things working.
Then I think you’ve got the right idea - train on 1000x1000 meshes. When you run against 10x10 meshes, pad out to 1000x1000 in a way that represents a wider sheet with load applied only in the middle.
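That padding is a one-liner in NumPy - the load pattern here is made up, and I’m assuming zero load (constant padding) is the right representation of the surrounding sheet:

```python
import numpy as np

# Hypothetical 10x10 load mesh with the load in the middle.
small = np.zeros((10, 10))
small[4:6, 4:6] = 1.0

# Pad symmetrically to 1000x1000 so the patch sits at the centre
# of a wider, unloaded sheet.
pad = (1000 - 10) // 2
big = np.pad(small, pad, mode="constant", constant_values=0.0)
```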
Keep your convolution map to an appropriate small size - the largest practical extent to which the heat can dissipate. This will probably be something like 10x10 or less.
Contradicting Ben a little, there are no extra weights to compute/learn in this padding solution. You’re only training a 10x10xdepth CNN, and even the depth here is a static hyperparameter.
Furthermore, because your model is CNNs only, without any FC layers, there’s nothing stopping you from applying the same trained network against meshes of any size. The problem is only in the APIs. When you build and train a model, it locks in the shape and prevents you from providing inputs with different widths/heights. I believe that’s fairly easy to overcome with a little bit of extra coding. What you have to do is, each time you evaluate:
create a brand new model sized to the dimension of the input mesh
copy the CNN weights from the trained model into the new model
do whatever it is that’s needed to mark the model as ready for use
use that cloned model for evaluation.
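The steps above can be sketched in Keras like so. The architecture is hypothetical (a stack of same-padded conv layers with no Dense layers), and I’ve used small grid sizes to keep it light - the same copy works for 1000x1000 because the conv kernel shapes don’t depend on the input grid size:

```python
import numpy as np
import tensorflow as tf

def build_model(h, w, depth=3):
    # Hypothetical all-convolutional model: no FC layers, so the
    # trainable weights are the same shape whatever (h, w) is.
    inp = tf.keras.Input(shape=(h, w, 1))
    x = inp
    for _ in range(depth):
        x = tf.keras.layers.Conv2D(8, 10, padding="same", activation="relu")(x)
    out = tf.keras.layers.Conv2D(1, 1, padding="same")(x)
    return tf.keras.Model(inp, out)

trained = build_model(20, 20)             # stand-in for the trained model
clone = build_model(50, 50)               # step 1: sized to the new mesh
clone.set_weights(trained.get_weights())  # step 2: copy the conv kernels
# step 3: nothing extra needed for inference in Keras
result = clone(np.zeros((1, 50, 50, 1), dtype=np.float32))  # step 4
```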
For meshes smaller than 1000x1000 there’s probably no benefit, assuming that 1000x1000 fits into your GPU. But for larger meshes it’s worth it. Alternatively, this gives you the option of picking the training mesh size that best optimises use of your GPU.
Coincidentally, just came across this in my notes. Apparently something called a Fully Convolutional Network might help you cope with meshes of different sizes:
You are totally right, the filter won’t have hyperparameters for the padding. That was not well communicated. Thanks for that.
Could funnyfox use something like a channel-wise correlation between polygons within a certain radius of each other, thus calculating the effect of dependent heat distribution on the temperature? Not sure if that makes sense.
Thanks for interesting thoughts again. I want to think through further on one of your suggestions:
create a brand new model sized to the dimension of the input mesh
copy the CNN weights from the trained model into the new model
In the second bullet, how do you envision we copy the weights from one model to the other? Or how do we copy weights from a model that is trained on, say, a 100 x 100 grid to use on a model that will solve a 1000 x 1000 grid?
I want to elaborate on this to understand more: let us say we fix a 100 x 100 grid as the baseline for our training purposes. We will use a 10x10 filter (like you suggested) and build the CNN as some type of U-net, where we convolve the input grid down to a latent representation (with some FC layers) and deconvolve back to a 100 x 100 grid of temperatures. Something like below:
If we look at the figure (which I agree is highly simplified), are you suggesting we use the 10x10 filter and 5x5 filter learned on a 100 x 100 grid and apply the same filters on a larger grid - say, 10,000 x 10,000? If the conv operation is indeed capturing the physics of how a local hotspot distributes heat to its surroundings, these kernels should have everything we need to apply at any length scale.
Please let me know if I understood your suggestion correctly when you say “copy the CNN weights from the trained model into the new model”.
Unfortunately my theoretical knowledge is well ahead of my practice skills with nets - otherwise I’d write the pseudo-code.
In any case, given what you’ve described so far, I wouldn’t bother with differently sized conv masks, nor with a U-net. The benefit of U-nets is for vision tasks where there may be, say, one object that could be anywhere within the image, and which has information across multiple scales. This is the local vs global influences thing we were discussing before.
Your situation is quite different. Effectively, each pixel is its own object, and there’s only a close local influence between objects. Also, you don’t care much about different scales (we’ve already confirmed that a 1000x1000 grid has the same resolution as a 10x10 grid). So a fixed CNN scale/size across all layers should work well.
In terms of the copying between models, I’m sorry, but you’ll have to google that one. It shouldn’t be hard. Keep in mind that you’ve only got weights + biases for the conv kernel, irrespective of input size; but you’ll have a separate conv kernel for each model layer. So you’re independent of the input grid size, but you do still have a few thousand parameters to copy.
By the way, a quick google has prodded my memory and thinking a bit more. Because you won’t have any Fully Connected layers, your model should handle input grids of any size without even needing any fiddly tricks.
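For what it’s worth, Keras supports this directly: declare the input height and width as `None` and the same model (same weights) accepts any grid size, provided no Dense/Flatten layer ever fixes the spatial dimensions. A minimal sketch with a made-up two-layer conv stack:

```python
import numpy as np
import tensorflow as tf

# Hypothetical conv-only model with unspecified height/width.
inp = tf.keras.Input(shape=(None, None, 1))
x = tf.keras.layers.Conv2D(8, 10, padding="same", activation="relu")(inp)
out = tf.keras.layers.Conv2D(1, 1, padding="same")(x)
model = tf.keras.Model(inp, out)

# The identical model instance handles both grid sizes.
small_out = model(np.zeros((1, 10, 10, 1), dtype=np.float32))
large_out = model(np.zeros((1, 64, 64, 1), dtype=np.float32))
```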
Just to clarify, is the goal to learn how heat transfers between the different materials/objects? Or possibly creating a neural network way to predict physics solver outputs?
If the physics solver represents the entire scene state where temperatures change between different time steps, perhaps “how the heat transfers” could be learned/predicted for the next time step, possibly with a large enough [3D (convolutional?/attention?) block module] volume which slides over each area of the space and learns to predict future temperature dissipations? (assuming it also has access/information to the various material types)
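One way to read that sliding-volume idea is a 3D convolution over the voxelised scene. This is only a sketch of the shapes involved - the channel layout (current temperature plus a material channel), the kernel size, and the scene dimensions are all assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: a 3D scene voxelised to (D, H, W) with two input
# channels (current temperature, material id) and one output channel
# (predicted temperature at the next time step).
inp = tf.keras.Input(shape=(None, None, None, 2))
x = tf.keras.layers.Conv3D(16, 5, padding="same", activation="relu")(inp)
out = tf.keras.layers.Conv3D(1, 1, padding="same")(x)
model = tf.keras.Model(inp, out)

scene = np.zeros((1, 16, 16, 16, 2), dtype=np.float32)
next_temp = model(scene)
```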
At different layers, different volume sizes could be used to capture small local temperature changes vs. broader across-scene temperature changes.