As usual, @paulinpaloalto provides some good insight. Here’s my take.
In my mind every neural network has 3 common elements: an input layer, hidden layers, and an output layer. Input and output have dependencies on the external context that may involve or require pre- and/or post-processing (e.g. image normalization, non-maximum suppression). The hidden layers must implement the proper downsampling to get from input shape to output shape, as well as the proper transformation(s) to achieve useful outcomes. Every NN does this, so it's really important to understand that idea.

Writing a simple NN completely yourself is a great way to build that understanding. It's important to understand what the activation layer contributes, so pick one. But after you write a sigmoid function, there is rapidly diminishing benefit from writing your own ReLU, tanh, softmax, etc. I can't recommend spending much time at that level. If you understand what a convolution is mathematically, maybe from exposure to Fourier Transforms, do you understand it better after writing your own? Maybe not. And once you have written your own 3-layer network and understand how data flows to and is transformed by each layer, do you need to write your own 10-layer NN? 20? Probably not.
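For what it's worth, here's roughly the level of "write it yourself" I mean: a bare-bones forward pass through one hidden layer and one output layer, with sigmoid as the one activation you implement by hand. The layer sizes (4 inputs, 3 hidden units, 1 output) are made-up illustrative choices, not from any particular course or paper:

```python
import numpy as np

def sigmoid(z):
    # The one activation worth writing yourself at least once.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))      # 5 examples, 4 input features

W1 = rng.normal(size=(4, 3))     # hidden layer: 4 features -> 3 units
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))     # output layer: 3 units -> 1 output
b2 = np.zeros(1)

# Forward pass: each layer transforms the representation until the
# input shape (4 features) is reduced to the output shape (1 value).
hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)
print(output.shape)              # (5, 1)
```

Once you've written (and backpropagated through) something like this and watched the shapes change layer by layer, you've gotten most of the benefit of the exercise.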
What I have personally found extremely helpful is taking one of the industry algorithms and doing a deep dive on it. For me, it was YOLO. I spent months reading the papers, reading open source implementations from darknet and others, and finally trying to reproduce it myself on a public dataset. I reused the architecture and the TensorFlow/Keras implementation of all the layers (convolution, pooling, activation, etc.) but wrote my own loss function and training loop. Really beneficial. I put some of my digital exhaust in these forums, which you can find through my @ai_curious profile.
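To give a feel for what "reuse the layers, write your own loss and training loop" looks like in TensorFlow/Keras, here's a rough sketch. The tiny model and the squared-error loss are placeholders of my own for illustration, not the actual YOLO architecture or loss; the point is only the pattern of pairing built-in layers with a custom loss and a GradientTape loop:

```python
import tensorflow as tf

# Toy stand-in for a real backbone: built-in Keras layers, nothing hand-rolled.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4),   # e.g. a simple box-regression head
])

optimizer = tf.keras.optimizers.Adam(1e-3)

def my_loss(y_true, y_pred):
    # Placeholder loss. A real detection loss (YOLO's, for instance) combines
    # localization, objectness, and class terms -- that's where the real work is.
    return tf.reduce_mean(tf.square(y_true - y_pred))

@tf.function
def train_step(images, targets):
    # Custom training loop body: forward pass, custom loss, gradients, update.
    with tf.GradientTape() as tape:
        preds = model(images, training=True)
        loss = my_loss(targets, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# One step on random data, just to show the shapes involved.
images = tf.random.normal([8, 64, 64, 3])
targets = tf.random.normal([8, 4])
print(float(train_step(images, targets)))
```

Writing the loss and the loop yourself forces you to understand exactly what the network is being asked to predict, which is where I learned the most.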
Speaking of papers, I do recommend reading them. Maybe with your background they will be directly accessible. For me, they are often a struggle, since these days I tend to glaze over when I see too many Greek letters. I find these papers are generally written by people very deep in the field for an audience of their peers. If you don't already know what they are talking about, sometimes it isn't easy to figure it out. Nonetheless, I think it is good to do, and I tend to circle back and reread the original papers periodically as my own knowledge and understanding advance. The papers provide good historical context and often refer to one another, as each group builds on and tries to overcome the limitations of solutions published before.
Ultimately the path to take depends on the destination. Do you want to invent new NN architectures? Improve runtime performance of existing architectures? Apply existing architectures to new business problems? In my opinion, the closer you are to the domain, the less you need to focus on the gears and pulleys: leverage the frameworks and pretrained models. The more interest you have in how NNs produce their output, the more attention you need to devote to what's behind the curtain. HTH