Why data augmentation instead of constraints on weights

Well, with a little more thought, maybe it’s not so easy to do this. One thing to be aware of is that if you are talking about the Fully Connected Feed Forward networks in DLS Course 1, then take a look at how the input images are handled: they are “unrolled” or flattened into vectors, so the geometric symmetries are no longer easy to express. They are still buried in the data, but there is no obvious way to state them as constraints on the weights. Note also that there are two different orders in which you can do the flattening, so your symmetry method would need to know which one you are using. See this thread for more information; read the whole thread down to the section that discusses order = “F”.
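To make the flattening point concrete, here is a minimal NumPy sketch (my own toy example, not from the course materials) showing the two flattening orders, and how a simple geometric symmetry like a horizontal flip becomes a different permutation of the flattened vector depending on which order you chose:

```python
import numpy as np

# A tiny 2 x 3 "image" so both flattening orders are easy to see.
img = np.array([[1, 2, 3],
                [4, 5, 6]])

# "C" order (row-major) walks each row left to right.
print(img.reshape(-1, order="C"))   # [1 2 3 4 5 6]

# "F" order (column-major) walks each column top to bottom.
print(img.reshape(-1, order="F"))   # [1 4 2 5 3 6]

# A horizontal flip is a simple geometric symmetry of the 2D image ...
flipped = np.fliplr(img)

# ... but once flattened, that same symmetry turns into a different
# permutation of the vector entries depending on the order used.
print(flipped.reshape(-1, order="C"))  # [3 2 1 6 5 4]
print(flipped.reshape(-1, order="F"))  # [3 6 2 5 1 4]
```

So any constraint on the weights that is supposed to respect the flip would have to be written in terms of one of those permutations, which is exactly why the choice of flattening order matters.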

In Convolutional Nets (DLS Course 4), the geometry of the input tensors is preserved. But there the transformation being applied is not so straightforward: the filters are small movable windows with shared weights that are applied serially across the geometry of the input. More thought is required to see how one would encode the symmetries in that case.
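Just to illustrate what “movable filters applied serially across the geometry” means, here is a hedged toy sketch of a single “valid” convolution pass (technically a cross-correlation, as in the course). The point is only that the output keeps its 2D layout, so the geometry survives, but the learned parameters are the small shared filter rather than a full per-pixel weight matrix, so any symmetry constraint would have to be phrased in terms of the filters:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation: slide the kernel across the image."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value comes from one window of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 input
kernel = np.array([[1., 0.],
                   [0., -1.]])                     # toy 2 x 2 filter

out = conv2d_valid(image, kernel)
print(out.shape)   # (5, 5): still a 2D grid, so the geometry is preserved
```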

The other high level point is that what we are doing here is fundamentally different from Fluid Dynamics. Mind you, I never took any Fluid Dynamics, but I did get as far as Hamiltonians and the Calculus of Variations in Intermediate Physics (but it was a very long time ago :nerd_face:). There you are solving differential equations in very high dimensions. Here we are also in very high dimensions, but there are no differential equations governing the behavior: the only constraint is the Cost Surface created by the Loss Function we have chosen. Here’s a thread with a link to a really interesting paper about that from Yann LeCun’s group that’s worth a look to get a sense of what those solution spaces look like.