Hi, I’m trying to figure out what the math behind Deep Learning roughly looks like.
Can somebody please verify whether my understanding of DL with a (simple, tiny, fully connected) neural network is correct? (1 epoch)
If it’s correct, I think it would also help others acquire a better understanding of neural networks.
- Forward pass
- * = dot product, + = addition (can include broadcasting), f = activation function (may be different ones)
a) example with 1 hidden layer
Y^ = f ( W2 * f ( W1 * X + b1 ) + b2 )
b) example with 2 hidden layers
Y^ = f ( W3 * f ( W2 * f ( W1 * X + b1 ) + b2 ) + b3 )
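To make this concrete, here is a minimal NumPy sketch of the 1-hidden-layer forward pass. The sigmoid activation and the shapes in the comments are my own assumptions, not something fixed by the formulas above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed shapes: X is (n_x, m) with the m examples as columns,
# W1 is (n_h, n_x), b1 is (n_h, 1) and broadcasts over the m columns,
# W2 is (1, n_h), b2 is (1, 1).
def forward(X, W1, b1, W2, b2):
    A1 = sigmoid(W1 @ X + b1)       # hidden layer: f ( W1 * X + b1 )
    Y_hat = sigmoid(W2 @ A1 + b2)   # output layer: f ( W2 * A1 + b2 ) = Y^
    return A1, Y_hat
```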
Y^, the result of the forward pass, is then plugged into a cost function (e.g. for binary classification):
- Cost function
J = - ( Y * log ( Y^ ) + ( 1 - Y ) * log ( 1 - Y^ ) )
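As a sketch in code (the eps clipping is my addition to avoid log(0), not part of the formula):

```python
import numpy as np

def binary_cross_entropy(Y, Y_hat, eps=1e-12):
    Y_hat = np.clip(Y_hat, eps, 1.0 - eps)  # keep Y^ away from exactly 0 or 1
    # mean over the m examples; for a single example this is exactly J above
    return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
```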
- Back propagation
For back propagation, this cost function J is partially differentiated with respect to each Wi and bi (chain rule through the layers), giving dWi and dbi.
The parameters are then updated:
Wi := Wi - lr * dWi
bi := bi - lr * dbi
The epoch is then finished (with full-batch gradient descent, where X holds the whole training set, one such update is one epoch).
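Putting it together, here is a hedged sketch of one full training step for the 1-hidden-layer case, assuming sigmoid activations everywhere, the cost averaged over m examples, and `sigmoid` from the forward-pass sketch above. A helpful intermediate step: with a sigmoid output and this cost, the output error simplifies to dZ2 = Y^ - Y.

```python
def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
    m = X.shape[1]

    # forward pass (same as the sketch above)
    A1 = sigmoid(W1 @ X + b1)
    Y_hat = sigmoid(W2 @ A1 + b2)

    # back propagation: dJ/dWi and dJ/dbi via the chain rule
    dZ2 = Y_hat - Y                      # sigmoid output + cross-entropy: Y^ - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)   # sigmoid'(z) = a * (1 - a)
    dW1 = (dZ1 @ X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m

    # parameter update: Wi := Wi - lr * dWi, bi := bi - lr * dbi
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2
```

Calling train_step once on the full training set X would then correspond to one epoch as described above.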
W : weight matrix
b : bias vector
lr : learning rate
X : input
Y : label
Y^: output of forward pass
f : activation functions (may be different ones)
i : layer index for W, b