After finishing 3/4 of the CNN course of this specialization, I want to share some thoughts on the topic.
First of all, I have to admit that at some point during this course I really lost focus and some motivation, which has not been the case during any other MLS or DLS course before. So I asked myself why.
I came to the conclusion that somehow all the CNN stuff seems to be less “systematic” and “explainable” than all the topics before. The described architectures sometimes seem to come “out of nowhere”: they have just been proposed in some paper and have proven to work well. But why do they work well? In most cases, there seems to be only some “plausible” explanation for why these architectures work. This is a big difference from a mathematical explanation or even a proof. Sometimes, even Andrew seems to say: “Don’t think too much about why this works, it just has proven to work fine and nobody really knows why.”
Having worked as a software engineer, I am used to applying systematic approaches to build software systems. This is an engineering approach: instead of doing some “voodoo” and “trial and error”, you want a systematic approach from which you know that you will come to a result, and why. With CNNs, we seem to be back in the “voodoo” stage: put some frog legs and some plants in the water, dance around, and after some hours the medicine will be ready.
Having built up my knowledge slowly from probability theory to statistics to classical Machine Learning and then Deep Learning, I now have, for the first time, the impression that the whole method has lost its “connection” with mathematics. At least it seems that nobody fully understands why all these architectures really work. This is somehow disappointing, or at least it currently feels like it.
I just wanted to share these thoughts. How did you experience the CNN course? What do you think?
I can certainly understand your thoughts. In the end, CNN architectures and layers are quite sophisticated, and it can feel as if a certain magic is happening here.
Still, let me try to demystify it at least to a small extent: convolution is a really well established and recognised signal processing method in control and system theory, see also this thread: How to Calculate the Convolution? - #2 by Christian_Simonis. So when you come from a Mechatronics Engineering world (like I do) and are used to applying autocorrelation, doing some audio or image processing, applying the Laplace or Fourier transform, time series analysis and filtering, it seems a rather logical step to incorporate this filter option into layer architectures and allow the net to learn the parameters on its own using tons of data.
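To make the filtering view a bit more concrete, here is a minimal sketch in Python with NumPy. The signal, the noise level and the 15-tap moving-average kernel are all assumptions chosen purely for illustration; they are not taken from the course or from the linked thread. The kernel is fixed by hand here, whereas a convolutional layer would learn its kernel values from data.

```python
import numpy as np

# Illustrative setup (all values are assumptions for this sketch):
# a 3 Hz sine wave sampled at 500 points, plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

# A hand-designed low-pass filter: a 15-tap moving average.
# In a CNN, these 15 numbers would be trainable parameters instead.
kernel = np.ones(15) / 15

# Convolving the signal with the kernel applies the filter.
# mode="same" keeps the output the same length as the input.
smoothed = np.convolve(noisy, kernel, mode="same")

# Mean squared error against the clean signal, before and after filtering.
err_noisy = np.mean((noisy - clean) ** 2)
err_smoothed = np.mean((smoothed - clean) ** 2)
print(f"MSE before filtering: {err_noisy:.4f}")
print(f"MSE after filtering:  {err_smoothed:.4f}")
```

The filtered signal should be noticeably closer to the clean sine than the noisy one. Swapping in a different kernel (for example a difference kernel such as `[1, -1]`) would turn the same operation into an edge or change detector, which is roughly the kind of filter a ConvNet can end up learning on its own.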
One side note: convolution also plays an important role in probability theory and provides a nice illustration of why so many distributions in reality are approximately normal.
E.g. if you convolve two uniform distributions, you get a triangular distribution (like the distribution of the sum of points on two dice).
If you keep adding more uniform distributions (corresponding to the sum of points on three, four, … dice), you gradually approach the shape of a normal distribution.
So convolution is really well established in many mathematical core concepts.
I would be interested: do you think some real-world applications of 1D signal processing with convolution would have helped here? What do you think, @Matthias_Kleine?
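The dice example can be checked numerically. The following is a small sketch in Python with NumPy; the helper name `sum_of_dice_pmf` and the choice of 10 dice are my own assumptions for illustration. It convolves the distribution of a single fair die with itself repeatedly and compares the result with a normal density that has the same mean and variance.

```python
import numpy as np

def sum_of_dice_pmf(n_dice):
    """Return (sums, probabilities) for the total of n_dice fair six-sided dice.

    Hypothetical helper for this sketch: the distribution of a sum of
    independent variables is the convolution of their distributions.
    """
    die = np.ones(6) / 6          # P(1), ..., P(6) for one fair die
    pmf = die.copy()
    for _ in range(n_dice - 1):
        pmf = np.convolve(pmf, die)
    sums = np.arange(n_dice, 6 * n_dice + 1)
    return sums, pmf

# Two dice: the triangular distribution over the sums 2..12.
sums2, pmf2 = sum_of_dice_pmf(2)
print(np.round(pmf2 * 36).astype(int))  # counts out of 36 outcomes

# Ten dice: compare with a normal density of matching mean and variance.
n = 10
sums, pmf = sum_of_dice_pmf(n)
mean = n * 3.5
var = n * 35 / 12                 # variance of one fair die is 35/12
normal = np.exp(-((sums - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
max_gap = np.max(np.abs(pmf - normal))
print(f"largest gap between the 10-dice pmf and the normal density: {max_gap:.4f}")
```

For two dice the counts rise linearly from 1 to 6 and fall back to 1, which is the triangle shape. For ten dice the gap to the normal curve is already small compared with the peak probability of roughly 0.07.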
Thanks for your detailed reply and the additional links you provided.
As you have some background in signal processing, I take the opportunity to ask whether you could recommend an introductory book or course on this topic in particular. I found a course by Mike X Cohen (whom I know from another course) here: https://www.udemy.com/course/signal-processing/, which seems to be a decent introduction, also covering convolutions.
I have a special interest in time series, especially in analysing two or more signals and their relations.
Coming back to the original topic, you are asking whether some real-world applications of 1D signal processing with convolution would have helped,
and my clear answer is “yes”. I think anything that “anchors” the convolution operation and makes it more vivid or graspable would clearly help. As I am interested in time series, 1D signal processing would of course be of interest to me.
However, on the “higher levels” there will probably be some arbitrary choices left which cannot really be understood just by knowing the convolution operation better, for example the choice of how many layers, of the exact dimensions, and so forth.
A good book, even though it is more theoretical, was Signal- und Systemtheorie by Frey / Bossert (also in German), which I got in 2013 and still use occasionally today. It also includes several examples - no promotion:
When it comes to time series analysis, it depends a little on what you want to do. If you want to go for predictive analytics (basically what I was doing for several years), I learned a great deal from this book (also in German) - no promotion:
Here you can also find an application of time series analysis and ML.
But I would like to point out that applying these concepts in projects and in practice is even more important and helps concepts like conv filtering to become “ingrained”.
Personally, I do not know the Udemy course you posted, but it looks quite interesting from the preview material, so I guess it can be a good start. I believe it can be especially helpful if you can transfer the concepts and apply them hands-on.
In addition to the great and fact-filled discussion you’ve already had here, I had a couple of thoughts to throw into the mix:
When you say that you feel that the networks have gotten away from the mathematics, I would just suggest a different way to view that: there has to be mathematics there, but it’s just too complex for us to understand or to explain precisely at this point. As a mathematician, that feeling shouldn’t be unfamiliar. It happens all the time, and we (or people smarter than we are) have to continue working hard to get at those explanations. It’s not directly applicable to your doubts about ConvNets, but as an example of math that has been understood in the last 10 years and wasn’t before, this paper from Yann LeCun’s group about solution surfaces, which proves that there are reasonable solutions to the “local minimum” problem, seems like a good one. Mind you, I’m not claiming I understand the math in the paper.
Or think of it as analogous to the difference between theoretical physics and experimental physics. It has happened many times in the history of science that people have constructed an experiment that can’t be explained with the current state of the theory. Then the theorists have to sweat for a while until they can eventually enhance the theory to explain the phenomena. Dirac Scattering and the precession of the perihelion of Mercury come to mind as examples.
And think of biology and medicine. There are comparable examples there: you can get a drug approved by the FDA (in the US anyway) by running rigorous experiments to prove that it is safe and effective. But you’re not actually required to explain the “mechanism of action” of the drug. Needless to say, people feel more comfortable if you can, but it’s not required for approval.
One other more concretely applicable thought would be that maybe you just stopped one week too early. There’s a really interesting lecture in Week 4 titled “What are Deep ConvNets Learning?” that is really worth a look before you give up here.
Welcome to the community and thanks for your question!
I’m afraid I’ve only studied this literature in German and don’t have a really good substitute in English. Please note that this is more of a niche for systems and signal theory and processing in mechatronics and may not be the first choice for ML, depending on what you want to achieve.
Anyway: @paulinpaloalto has put together an excellent list which you can find here:
From my personal experience, I can also recommend the books by Chollet and Goodfellow mentioned there. I read them in 2018 during a two-week holiday and learned a lot!