I think what you show above is essentially the same as what Prof Ng has shown us in the lectures in Week 3 and Week 4, just with slightly different notation. You write out the composition of the functions explicitly, which may make it more apparent how forward propagation actually works, but it doesn't really generalize: it would be pretty confusing to read with more than 2 hidden layers.
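For comparison, the layer-by-layer form Prof Ng uses generalizes to any number of layers with a simple loop. Here's a minimal sketch in numpy (the function and variable names are my own, not from the course assignments):

```python
import numpy as np

def forward(X, params, activations):
    """Forward propagation through L layers.

    params: list of (W, b) tuples, one pair per layer.
    activations: list of activation functions g, one per layer.
    Returns the output activation A[L].
    """
    A = X
    for (W, b), g in zip(params, activations):
        Z = W @ A + b   # linear step: Z[l] = W[l] A[l-1] + b[l]
        A = g(Z)        # activation step: A[l] = g[l](Z[l])
    return A
```

The same two lines handle 2 hidden layers or 20; only the length of the lists changes.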

One other thing to note: the activation functions (which you call f() and Prof Ng calls g()) do not all have to be the same. You have a choice at the hidden layers. Even within the hidden layers, they don't have to be the same, although using one activation throughout is typically the way it's done. At the output layer, the choice is determined by what you are doing: for a binary classification network, the output activation is always sigmoid, and for multiclass classification it is always softmax, which is the generalization of sigmoid.
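To make the "softmax generalizes sigmoid" point concrete: softmax over the two logits (z, 0) reproduces sigmoid(z). A quick numerical check (my own sketch, not course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=0):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# two-class softmax with logits (z, 0) equals sigmoid(z):
# e^z / (e^z + e^0) = 1 / (1 + e^-z)
z = 2.0
print(softmax(np.array([z, 0.0]))[0], sigmoid(z))
```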

You also haven’t shown any of the formulas for back prop, so it’s a bit unclear how this really adds any value over and above what Prof Ng has already shown us.
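For reference, the back prop formulas for a single layer are compact once you have dZ[l]. Here's a sketch in numpy following the course's naming conventions (the code itself is mine; at the output layer with sigmoid or softmax plus cross-entropy loss, dZ[L] is simply A[L] - Y):

```python
import numpy as np

def backprop_layer(dZ, A_prev, W, m):
    """Gradients for one layer, given dZ[l] = dL/dZ[l].

    dW[l]  = (1/m) dZ[l] A[l-1].T
    db[l]  = (1/m) sum of dZ[l] over the examples
    dA[l-1] = W[l].T dZ[l]   (propagated back to the previous layer)
    """
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dW, db, dA_prev
```

At a hidden layer you would then compute dZ[l-1] = dA[l-1] * g'(Z[l-1]) and repeat, which is where the choice of activation function comes back in.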

Also note that Prof Ng has specifically structured these courses so that they do not require knowledge of calculus (either univariate or vector calculus). There are lots of resources available on the web if you have a math background and actually want to see the derivations. Here's a thread that has some links to get you started on that quest.