I’m trying to come up with a neural network model to solve the following generic interpolation problem:
Suppose x_c maps to y_c, where x_c lies on a coarse N-dimensional grid and y_c is a scalar, and we know the (x_c, y_c) pair for every x_c on the coarse grid. The task is to predict the value y_t at a point x_t that is NOT on the coarse grid. The peculiarity is that, although the coarse grid (and hence each x_c) can be assumed fixed, the corresponding set of values {y_c: x_c maps to y_c} is sampled from a continuous random distribution, so there are uncountably many possible such sets. A further limitation is that we cannot build / train a separate neural network for each sample set of (x, y) data. Note that all the y_c in a given set come from the same sample of the underlying random variable, and different sets come from different samples of this random variable.
For training, we can assume that for a given sample of the underlying random variable, we know not only all the (x_c, y_c) pairs but also a lot of (x_t, y_t) pairs.
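To make the shapes of the training data concrete, here is a minimal NumPy sketch. The random field used in `sample_field` (a random sum of sinusoids) is purely a hypothetical stand-in for the unknown underlying random variable, and the function names, the unit-cube domain and the uniform sampling of x_t are all assumptions made for illustration.

```python
import numpy as np

def make_coarse_grid(n_dims, points_per_axis):
    """Fixed coarse grid on [0, 1]^N, returned as an (M, N) array."""
    axes = [np.linspace(0.0, 1.0, points_per_axis)] * n_dims
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=-1)

def sample_field(n_dims, rng, n_modes=4):
    """Draw one random smooth function f: R^N -> R (hypothetical stand-in)."""
    freqs = rng.normal(scale=2.0, size=(n_modes, n_dims))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_modes)
    amps = rng.normal(size=n_modes)

    def f(x):
        # x has shape (K, N); the result has shape (K,)
        return np.sin(x @ freqs.T + phases) @ amps

    return f

def make_training_sample(x_c, n_targets, rng):
    """One sample of the random variable: y_c on the grid plus off-grid (x_t, y_t)."""
    f = sample_field(x_c.shape[1], rng)
    y_c = f(x_c)                                   # (M,)
    x_t = rng.uniform(0.0, 1.0, size=(n_targets, x_c.shape[1]))
    y_t = f(x_t)                                   # (n_targets,)
    return y_c, x_t, y_t

rng = np.random.default_rng(0)
x_c = make_coarse_grid(n_dims=2, points_per_axis=5)   # fixed across all samples
y_c, x_t, y_t = make_training_sample(x_c, n_targets=10, rng=rng)
```

Each call to `make_training_sample` corresponds to one sample of the random variable: the grid `x_c` stays the same, while `y_c` and the `(x_t, y_t)` pairs change together.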
To approach this problem from a deep learning perspective, I am thinking of a couple of ways:

1. Define the neural network input as the concatenation of all M (x_c, y_c) pairs together with x_t, so the input feature vector X has size (M*(N+1) + N). Define the output Y = y_t. Then train the neural network on all available (X, Y) pairs.

2. Suppose the input feature dimension above is too high to be practical. We know that y_t can be interpolated to the desired accuracy from only a sufficient number P of (x_c, y_c) pairs, with x_c chosen from the subgrid that surrounds x_t. To exploit this, I was thinking of limiting the input X to size (P*(N+1) + N), using only the P grid points surrounding x_t. The definition of Y stays the same, i.e. Y = y_t.
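The two input layouts described above can be sketched in NumPy as follows. This is only an illustration of the feature sizes: the function names are made up, and picking the P nearest grid points by Euclidean distance is an assumption standing in for "the subgrid that surrounds x_t", which could be defined differently.

```python
import numpy as np

def features_full(x_c, y_c, x_t):
    """Approach 1: all M (x_c, y_c) pairs plus x_t, size M*(N+1) + N."""
    pairs = np.concatenate([x_c, y_c[:, None]], axis=1)   # (M, N+1)
    return np.concatenate([pairs.ravel(), x_t])

def features_local(x_c, y_c, x_t, p):
    """Approach 2: only the P grid points closest to x_t, size P*(N+1) + N.

    Assumption: "surrounding" is approximated by the P nearest neighbours.
    """
    dist = np.linalg.norm(x_c - x_t, axis=1)               # (M,)
    idx = np.argsort(dist, kind="stable")[:p]              # indices of the P nearest points
    pairs = np.concatenate([x_c[idx], y_c[idx][:, None]], axis=1)  # (P, N+1)
    return np.concatenate([pairs.ravel(), x_t])

# Small 2-D example: a 4 x 4 grid (M = 16, N = 2) and one off-grid target point.
axis = np.linspace(0.0, 1.0, 4)
gx, gy = np.meshgrid(axis, axis, indexing="ij")
x_c = np.stack([gx.ravel(), gy.ravel()], axis=-1)          # (16, 2)
y_c = np.random.default_rng(0).normal(size=len(x_c))      # placeholder values
x_t = np.array([0.4, 0.4])

X_full = features_full(x_c, y_c, x_t)           # length 16*3 + 2
X_local = features_local(x_c, y_c, x_t, p=4)    # length 4*3 + 2
```

Note that in the local layout the same input slots are filled by different grid points depending on x_t, so the network sees the neighbours ordered by distance rather than by their position in the full grid.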
The question is: will #2 work? #2 can be viewed as starting from #1 and forcing the weights attached to the non-surrounding x_c and their corresponding y_c to zero. The problem, however, seems to be that which weights are forced to zero depends on one of the input features, x_c, so I am inclined to think that it may not work. Any thoughts on this would be useful.
Is there any prior work that deals with the above problem?