Context: This code is based on a 3-layer fully connected neural network coded from scratch (no libraries), trained on handwritten digits 0-9. This back-query code takes an ideal output vector (0.99 for the "0" node and 0.01 for the rest) and runs it backward through the network to recover the pixel values at the input, to see what the network's definition of a 0 is.
So my question is: after the inverse sigmoid is applied and that vector is multiplied by the transposed weight matrix, how is that supposed to give me the activation values from the previous layer? Because if I do a matrix product W·X = A, and then compute W.T·A, that does not give me X. So then how could back query be useful? It clearly is useful, because when I run the code it shows me the network's idea of a 0, but I can't piece together how it works in my head.
So, in short: how does back-query give me this visual representation if it gives completely different values for the activations going backward than going forward?
It's training a neural network like normal, but after training is done, just inputting an ideal output into the end of the network and going backward to see what the network has learned as its definition of a 0, in this case. There is no encoder/decoder.
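For reference, here is a minimal sketch of the back-query loop being described, not your actual code. The weight names `W_ih` and `W_ho` are hypothetical, and the clipping inside `logit` plus the rescaling step are assumptions needed to keep the inverse sigmoid defined at every layer:

```python
import numpy as np

def logit(y):
    # inverse of sigmoid; clip so log() never sees 0 or 1
    y = np.clip(y, 0.01, 0.99)
    return np.log(y / (1.0 - y))

def rescale(v):
    # squeeze values back into (0.01, 0.99) so logit() stays defined
    return 0.01 + 0.98 * (v - v.min()) / (v.max() - v.min())

def back_query(W_ih, W_ho, target):
    # target: ideal output column vector, e.g. 0.99 for the "0" node, 0.01 elsewhere
    hidden = rescale(W_ho.T @ logit(target))  # transpose, not a true inverse
    pixels = rescale(W_ih.T @ logit(hidden))
    return pixels
```

Note that the transpose step and the rescaling both throw information away, which is exactly where the "is this really an inverse?" question below comes in.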
I know you don't have a full autoencoder. I think you need one.
Back-feeding the output to re-compute an input is probably not useful. I've tried it before and it always looks like mush. I've never looked into it in detail.
When I plot the X with matplotlib, the image sometimes roughly resembles the number that was input at the back of the network and back-queried, but sometimes it looks like a complete mush of pixels.
I need to first know whether your maths are correct. If you can't prove that your maths are correct, why would you believe that your code was correct, given that your code is nothing but implementing the maths?
I don't skip the maths here. It's your choice whether you actually want me to have a look. If you do, please share what those question marks are in the form of math equations (not code).
The challenge here is that you want to reverse the forward propagation equation, so that you can provide the output label, and compute the most likely corresponding image.
So you need to write out the equation for forward propagation; it's going to look something like:
a_out = softmax(sigmoid(X * W1 + b1) * W2 + b2)
Then solve that equation so you have X = on the left side of the equation, with W1, W2, b1, b2, and a_out on the right.
If you can write out the math, then you can implement it.
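For concreteness, that forward pass can be sketched like this (the shapes and the exact bias handling are assumptions, since the thread only gives the one-line equation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def forward(X, W1, b1, W2, b2):
    # a_out = softmax(sigmoid(X * W1 + b1) * W2 + b2)
    a1 = sigmoid(X @ W1 + b1)
    return softmax(a1 @ W2 + b2)
```

One thing the algebra surfaces immediately: softmax maps many different pre-activation vectors to the same output (it is shift-invariant), so solving for X is already non-unique at the very first step, before the weight matrices even enter.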
@TMosh@rmwkwok Sorry for being confused but how was the response I sent not the math ? I wrote the equations that I used to get my answer. Z2= then a1= then z1= etc⌠Thank you guys for helping me just trying to understand what exact response you want.
Is it that you donât want for example a1=W2.TZ2 but instead a1=W2.Tinverse sigmoid (A2) ? So like all the steps for each and not shorten it buy using the variable from the previous calculation in the next one ?
Are you sure that transpose is the correct operation here?
Neglecting the bias for the moment.
In general, you have some A = Z * W from forward propagation.
Now you're trying to reverse that operation in order to get Z based on having A and W.
So in concept you want to multiply both sides by the "inverse of W", so you have Z = A * "W inverse".
But in practice, since we're using matrices, W only has an inverse if it is square. But W is almost never going to be square, except in very limited circumstances.
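A quick numeric check of this point, using the A = Z * W convention from above with arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((1, 5))
W = rng.standard_normal((5, 3))  # non-square: no true inverse exists
A = Z @ W                        # forward: A = Z * W
Z_back = A @ W.T                 # "reverse" with the transpose
# Z_back has the right shape but generally differs from Z
print(np.allclose(Z_back, Z))    # prints False
```

So the transpose gets you something with the right dimensions, which is why the plots are sometimes recognizable, but it is not an inverse.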
I still recommend going through the maths, because then you will be able to realize the limitations that can cause unreasonable results.
Going through the maths does not mean showing how the code does the math, but how you justify it. Writing the equations down is only the first step. You correctly pointed out the problem with the inverse, but how does that justify the transpose? You said the transpose can give you an OK result; is your result OK? You shared someone's code for "back query"; is that code always giving you an OK result? I can't defend the use of the transpose for someone else. You said the code comes from a book, so how did the book justify it?
Tom's pseudo-inverse is an inverse that works for non-square matrices, but it has its problems too, because a full-rank non-square matrix can be either over-determined or under-determined, and if you have studied linear algebra, you should know what that means (although a problem can also be a source of interesting findings). If you experiment carefully, you will understand more about that problem and probably find more problems along the way (e.g., what happens if the inverse sigmoid receives numbers outside the range between 0 and 1? What justifies just clipping the numbers off? If you can't justify it, what if you change the sigmoid to something else to avoid this problem?).
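One way to see the under-determined case concretely, using NumPy's Moore-Penrose pseudo-inverse `np.linalg.pinv` (shapes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((1, 5))
W = rng.standard_normal((5, 3))  # reversing gives 5 unknowns from 3 equations
A = Z @ W
Z_pinv = A @ np.linalg.pinv(W)   # minimum-norm solution, one of infinitely many
# Z_pinv reproduces A exactly, yet is not the Z we started from
print(np.allclose(Z_pinv @ W, A), np.allclose(Z_pinv, Z))  # prints: True False
```

The pseudo-inverse picks the smallest-norm Z among all vectors that map to A; the network's actual hidden activations need not be that particular one, which is one reason a back-queried image can look plausible without being "the" input.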
I can't guarantee you that "back query" will work, but you can study what problems it can face. After understanding them, maybe you can think about how to tackle those problems (if possible), or maybe you can be relieved and think about: (1) why don't I just display the training samples that rank highest? (2) should I search for another way?
This process may not yield a working "back query", but it will yield a more solid understanding of the problem, and THIS is what you can definitely get.
Wow, thanks @rmwkwok, that helped a lot. I need to find the reasons why the back query should or should not work, and then start changing things to solve or discover new problems. The back query might be good for the purpose in the book but bad in the context of other things. Thank you for this very insightful response; I will continue to explore this from different angles. I think going super deep into these fundamentals, without any libraries to abstract anything, will do wonders for me when I am manipulating or coming up with my own new NN architectures in the future, because I will be able to see the numbers flow through the networks perfectly and see how to iterate and come up with my own methods!