Clustering with something similar to convolution in CNNs

I know this is a crazy question and a very long shot, but I want to try.

Some time ago I got unsatisfactory results from Gaussian clustering on objects that had geographical coordinates and quantitative features. More specifically, I had property objects with attributes (square footage, price, number of bedrooms, etc.) plus geographical point coordinates. The task was to create overlapping (with some probability) “zones” in which nearby objects were reasonably similar.

I used Gaussian clustering, which did not work well with many dimensions, so I applied PCA with additional weights on more and less important features, computed distances from points, etc. The EM algorithm was quite computationally costly, and the grouping quality was so-so, especially when a zone was expected to be a ring (it happens!). In addition, I ignored some given restrictions (because I was only testing), e.g. a zone cannot cover two municipalities (a legally required methodology; I have the municipality polygons).

I just wonder: convolution finds boundaries of sorts in images. What if the geographical coordinates were treated like pixel locations in an image and the other quantitative features were channels? This could also improve house price prediction models, since location and distance matter.

Has anyone worked on similar problems? I would greatly appreciate a reference to a related paper!
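To make the "coordinates as pixels, features as channels" idea concrete, here is a minimal sketch, assuming the points are in a projected coordinate system and all fall inside a known bounding box. The names `rasterize`, `masked_mean_filter`, and the per-cell `region` grid of municipality IDs are hypothetical, made up for illustration. Each grid cell holds the mean feature vector of the properties inside it (empty cells are `None`), and a fixed 3×3 mean kernel that skips empty cells (a simple normalized convolution) smooths each channel. Passing `region` restricts the kernel to neighbors in the same municipality, which is one way to respect the two-municipalities restriction. A CNN would learn its kernels rather than use a fixed mean, and this smoothing is only a preprocessing step, not a clustering algorithm by itself.

```python
from typing import List, Optional, Sequence, Tuple

# grid[row][col] is a list of channel values, or None if no property falls in that cell
Grid = List[List[Optional[List[float]]]]


def rasterize(points: Sequence[Tuple[float, float]],
              features: Sequence[Sequence[float]],
              bounds: Tuple[float, float, float, float],
              shape: Tuple[int, int]) -> Grid:
    """Average each feature channel over the points falling in each grid cell.

    Assumes every point lies inside bounds = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bounds
    rows, cols = shape
    n_ch = len(features[0])
    sums = [[[0.0] * n_ch for _ in range(cols)] for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for (x, y), feat in zip(points, features):
        # Map the coordinate to a cell index; clamp points lying exactly on the max edge.
        c = min(int((x - min_x) / (max_x - min_x) * cols), cols - 1)
        r = min(int((y - min_y) / (max_y - min_y) * rows), rows - 1)
        counts[r][c] += 1
        for k, v in enumerate(feat):
            sums[r][c][k] += v
    return [[[s / counts[r][c] for s in sums[r][c]] if counts[r][c] else None
             for c in range(cols)]
            for r in range(rows)]


def masked_mean_filter(grid: Grid, radius: int = 1,
                       region: Optional[List[List[int]]] = None) -> Grid:
    """Box-kernel convolution that averages only over non-empty neighbor cells.

    If region (hypothetical per-cell municipality IDs) is given, only neighbors
    with the same ID as the center cell contribute."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # outside the image
                    if grid[rr][cc] is None:
                        continue  # empty cell carries no information
                    if region is not None and region[rr][cc] != region[r][c]:
                        continue  # do not mix cells from different municipalities
                    neighbors.append(grid[rr][cc])
            if neighbors:
                # Average channel by channel over the contributing neighbors.
                out[r][c] = [sum(ch) / len(neighbors) for ch in zip(*neighbors)]
    return out


# Toy data: (x, y) coordinates and [price in thousands, bedrooms].
points = [(0.5, 0.5), (1.5, 0.5), (2.5, 2.5)]
features = [[100.0, 2.0], [200.0, 4.0], [300.0, 3.0]]
region = [[0, 0, 1],
          [0, 0, 1],
          [1, 1, 1]]  # hypothetical municipality ID per cell

grid = rasterize(points, features, bounds=(0.0, 0.0, 3.0, 3.0), shape=(3, 3))
smoothed = masked_mean_filter(grid)
constrained = masked_mean_filter(grid, region=region)
print(smoothed[1][1], constrained[1][1])
```

In this toy example the center cell averages all three properties when unconstrained, but only the two in its own municipality when `region` is supplied. The smoothed channels could then be fed to whatever clustering step is used, though whether that improves zone quality on real data would have to be tested.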