Before I close out the DLS, I would like to take the opportunity to ask a question I have been asking myself for months now.
When looking at the architectures of NNs such as the attention model, we see a series of "blocks", each block representing a layer, a cell, or something similar.
Having been a software developer and architect for about 20 years, I have spent most of my professional career implementing software in object-oriented languages, especially Java. When I see these NN models drawn as blocks, I automatically start thinking in "objects", and my intuition for implementing such a model would be to write a class that takes some data as input (representing the state of the class in member variables) and offers some public methods (representing the functionality of the layer or cell).
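To make the intuition concrete, here is a minimal sketch of that object-oriented view: the layer is a class, its parameters live in member variables, and `forward` is a public method. The names (`DenseLayer`, `forward`) are hypothetical, not from the course.

```python
import random

class DenseLayer:
    """A dense layer as an object: parameters are member state."""

    def __init__(self, n_in, n_out):
        # small random weights and zero biases, stored on the instance
        self.W = [[random.gauss(0, 0.01) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        # z = W x + b, computed row by row over the weight matrix
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(self.W, self.b)]

layer = DenseLayer(3, 2)
print(layer.forward([1.0, 2.0, 3.0]))
```

The caller never sees the weights; state and behavior travel together in the object, which is exactly the encapsulation the question is asking about.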
However, as all the programming exercises show, this is obviously not how the AI community implements these models. Instead we observe a functional approach: what appears in an NN architecture diagram as a layer or cell is represented as a function that takes some input parameters and produces some outputs.
In some cases this results in a "strange" coding style, such as filling "cache" variables and passing them from function to function. Some functions also end up with very long signatures that look ugly from a clean software engineering point of view.
So I wonder whether there has ever been a push to move beyond this functional approach and think in a more object-oriented manner.
I would be happy if you could share your thinking and experience on this. Note that I do not want to be a smartass; as a newbie in the field I am not trying to tell the experienced people "how to do it". I just want to understand the way of thinking in this field, and why it is the way it is.
There are object-oriented approaches to ML programs, and they definitely automate the process much better, but I think the purpose of these courses is to instruct step by step and to teach the learner how the overall process of an ML program unfolds.
I think that is the reason: it makes it easier for any learner, whatever their background, to understand and learn.
If you take a glance at the Keras and TensorFlow source code over on GitHub, you will see that both are written taking full advantage of Python's support for classes and class hierarchies.
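As a stdlib-only sketch of the pattern those libraries use: a base class defines the interface, concrete layers subclass it, and `__call__` dispatches to the subclass's `call` method. This mirrors the shape of `tf.keras.layers.Layer` (which really does expect subclasses to override `call`), but it is a simplified illustration, not the real API.

```python
class Layer:
    """Minimal base class: subclasses implement call()."""

    def __call__(self, inputs):
        # the framework-style entry point delegates to call()
        return self.call(inputs)

    def call(self, inputs):
        raise NotImplementedError

class Scale(Layer):
    def __init__(self, factor):
        self.factor = factor  # parameter stored as object state

    def call(self, inputs):
        return [self.factor * x for x in inputs]

class ReLU(Layer):
    def call(self, inputs):
        return [max(0.0, x) for x in inputs]

# layers compose by chaining calls, as in a Sequential model
out = ReLU()(Scale(2.0)([-1.0, 3.0]))
print(out)  # → [0.0, 6.0]
```

So the OO foundation is very much there in the libraries themselves; the exercises simply stay at the functional surface.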
I also wrote software professionally for two decades, and I agree that the code provided in these exercises bears little resemblance to what I or my teams would have produced: poor variable and function name choices, no encapsulation or abstraction, virtually no error handling. It's just not professional code.
I attribute that in part to the fact that it isn’t written by professionals; it’s written by academics more focused on ML pedagogy. And in part because the content creators can’t assume their audience understands OO and don’t want the burden of teaching that along with the ML concepts. I think they actually go out of their way to hide the underlying OO foundation except where it is unavoidable, such as defining a custom Callback.
So take the ML ideas away from the exercises, and ignore the software engineering approach.
I think this is a nice insight. In my experience, early attempts at understanding a problem and demonstrating a solution concept were always more procedural, more cut-and-paste, with substantially less polish and documentation. The longer we lived with a body of code, the more we saw what worked and what didn't, what was reused and what wasn't, and the better architected the software became. It was always important for us to get some code working quickly so we could learn from it, rather than spend time on premature optimization before that learning had occurred. Whether we went back and refactored and polished was a data-driven economic decision, and to be fair, there isn't much financial incentive for deeplearning to invest in that here. Every person-hour spent refactoring working exercises is money not invested in new course content.