Hi, I’m trying to figure out what the math behind Deep Learning approximately looks like.
Can somebody please verify whether my understanding of DL with a (simple, tiny, fully connected) neural network is correct? (1 epoch)
If it is correct, I think it would also help others gain a better understanding of neural networks.
Forward pass
* = dot product, + = addition (can include broadcasting), f = activation function (may be different ones)
a) example with 1 hidden layer
Y^ = f ( W2 * f ( W1 * X + b1 ) + b2 )
b) example with 2 hidden layers
Y^ = f ( W3 * f ( W2 * f ( W1 * X + b1 ) + b2 ) + b3 )
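To make the forward pass concrete, here is a minimal NumPy sketch of variant a) (1 hidden layer). The shapes (3 inputs, 4 hidden units, 1 output) and the choice of sigmoid for f are my own assumptions for illustration, not part of the question:

```python
import numpy as np

def sigmoid(z):
    # one possible choice for the activation f
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# hypothetical shapes: 3 input features, 4 hidden units, 1 output
X = rng.normal(size=(3, 1))                    # input column vector
W1 = rng.normal(size=(4, 3)); b1 = np.zeros((4, 1))
W2 = rng.normal(size=(1, 4)); b2 = np.zeros((1, 1))

A1 = sigmoid(W1 @ X + b1)      # hidden layer: f(W1 * X + b1)
Y_hat = sigmoid(W2 @ A1 + b2)  # output:       f(W2 * A1 + b2)
print(Y_hat.shape)             # a single prediction, shape (1, 1)
```

With a batch of m examples, X would have shape (3, m) and the `+ b1` / `+ b2` additions would broadcast across the batch, which is the broadcasting mentioned above.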
Y^, the result of the forward pass, is then fed into a cost function (e.g. binary cross-entropy for binary classification):

cost function
J = - ( Y * log ( Y^ ) + ( 1 - Y ) * log ( 1 - Y^ ) )   (averaged over the training examples)
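The cost above can be checked numerically. A small sketch (the helper name `bce`, the clipping epsilon, and the sample values are my own choices):

```python
import numpy as np

def bce(Y, Y_hat, eps=1e-12):
    # J = -( Y*log(Y^) + (1-Y)*log(1-Y^) ), averaged over the examples
    Y_hat = np.clip(Y_hat, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))))

Y = np.array([1.0, 0.0, 1.0])       # labels
Y_hat = np.array([0.9, 0.1, 0.8])   # predictions from a forward pass
print(bce(Y, Y_hat))
```

Note the leading minus sign: log of a probability is negative, so without it J would be negative and minimizing it would reward wrong answers. A perfect prediction gives a cost of (nearly) 0.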
back propagation
For back propagation, the partial derivatives of this cost function J with respect to each Wi and bi are computed (via the chain rule, working backwards from the output layer).
The parameters are then updated:
Wi := Wi - lr * dWi
bi := bi - lr * dbi
epoch is finished (assuming full-batch gradient descent, i.e. one update per pass over the training set)
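One full forward + backward + update step for the 1-hidden-layer case can be sketched like this. I'm again assuming sigmoid activations, binary cross-entropy, and made-up shapes; with those choices the output-layer gradient simplifies to (Y^ - Y), which is a known property of the sigmoid/cross-entropy pairing rather than something from the question:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 1)); Y = np.array([[1.0]])   # one training example
W1 = rng.normal(size=(4, 3)); b1 = np.zeros((4, 1))
W2 = rng.normal(size=(1, 4)); b2 = np.zeros((1, 1))
lr = 0.1

# forward pass
A1 = sigmoid(W1 @ X + b1)
Y_hat = sigmoid(W2 @ A1 + b2)

# backward pass: with sigmoid output + cross-entropy, dJ/dZ2 = Y^ - Y
dZ2 = Y_hat - Y
dW2 = dZ2 @ A1.T
db2 = dZ2
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)  # chain rule; sigmoid'(z) = f(z)(1 - f(z))
dW1 = dZ1 @ X.T
db1 = dZ1

# parameter updates: Wi := Wi - lr * dWi, bi := bi - lr * dbi
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```

Note each dWi has the same shape as its Wi, so the update is an elementwise subtraction.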
W : weight matrix
b : bias vector (broadcasting applies)
lr : learning rate
X : input
Y : label
Y^ : output of the forward pass
f : activation functions (may be different ones)
i : layer index for W, b