The E_i are the eigenvalues of the Hamiltonian. So if there’s a correlation between states i and j, then a transition between them is possible and the Hamiltonian will not be diagonal in that basis, so there may be additional terms in the exponent, I believe. If the correlation between i and j merely shifts the energy levels, E_i \rightarrow E_i^\prime and likewise for j, then the coefficients w_i would be correlated but there would be no correlation between neurons. In contrast, if it caused a transition, so that there’s “entanglement” between E_i and E_j, then the w's between neurons would be correlated as well.
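For concreteness, here is the correspondence I have in mind, as a minimal sketch: the softmax output can be read as a Boltzmann distribution if the logit is identified with a negative energy. The identification E_i \equiv -(w_i \cdot x + b_i) is my own assumption for illustration, not something stated above.

```latex
% Softmax output for class i, with logit z_i = w_i \cdot x + b_i
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}
    = \frac{e^{-E_i}}{Z},
\qquad
E_i \equiv -(w_i \cdot x + b_i),
\qquad
Z = \sum_j e^{-E_j}.

% A correlation that only shifts E_i -> E_i' rescales p_i;
% off-diagonal (transition-like) terms would add cross terms in the exponent.
```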
By this statement, are you assuming neurons = parameters? If so, the assumption is incorrect.
A neural network is organized in layers, where each layer might have many neurons or a single neuron, depending on the data we are working with.
So, from what I can see, you are trying to read thermodynamics into neural networks, which is not entirely justified: a neural network doesn’t incorporate statistical mechanics alone, but rather a mix of mathematical computation and statistical mechanics.
Deep neural networks are a very vast topic, and it is easy to get lost in parts of the subject we are close to or entirely unfamiliar with. A neural network doesn’t mimic statistics or thermodynamics end to end; it incorporates ideas from statistics and mathematical computation so that the neural units can make sense of the parameters/features fed into them and try to capture the hidden mechanics behind the outcome.
As for your other questions, I will do my best to respond after going through the links.
Each input neuron is associated with a weight, which represents the significance of the connection between the input neuron and the output neuron, whereas a bias term is added to each neuron’s weighted sum to provide additional flexibility in modeling complex patterns in the input data.
So here there is no statistics or thermodynamics.
Weights indicate how much each input affects a neuron and how much it can contribute to a prediction.
Neurons are the checkpoints through which the weighted signals move forward, picking out what is significant in the input data for the predicted outcome. During this propagation from the input neurons to the output neurons, the weighted signals pass through layers, and this is where statistical mechanics is usually used.
As mentioned earlier, it is not purely statistics; in deep neural networks it is a combination of gradient descent computation and cost computation together with statistical mechanics.
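To make the weight/bias description above concrete, here is a minimal forward-pass sketch for one dense layer. The array shapes and the choice of NumPy are my own illustration, not taken from the course materials:

```python
import numpy as np

def dense_forward(x, W, b, activation=np.tanh):
    """Forward pass of one dense layer.

    x : (n_features,)           input vector
    W : (n_neurons, n_features) weights -- W[i, j] links feature j to neuron i
    b : (n_neurons,)            biases, one per neuron
    """
    z = W @ x + b          # weighted sum plus bias for every neuron
    return activation(z)   # nonlinearity applied elementwise

# Tiny example: 3 input features feeding 2 neurons
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
print(dense_forward(x, W, b))
```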
I realize w and b can also be treated as matrices for softmax, but in this particular analysis, when discussing energy levels, it’s helpful to think of it this way.
Writing w^{(i)}_j for the i-th neuron and the j-th feature, an interaction that causes correlations can create them between w^{(i)}_j and w^{(i)}_k, or between w^{(i)}_j and w^{(n)}_j, or both.

The first case corresponds to E_j \rightarrow E_j^\prime (and likewise for k) within neuron i. The second case creates an energy transition, by analogy, between neuron i and neuron n.
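Formally, the two cases can be restated in terms of which covariances are nonzero; this is my own restatement, not a claim about how a trained network necessarily behaves:

```latex
% Case 1: correlations within a single neuron i, across features j and k
\operatorname{Cov}\!\left(w^{(i)}_j,\, w^{(i)}_k\right) \neq 0
\quad\text{(energy shift } E_j \rightarrow E_j^\prime \text{ inside neuron } i\text{)}

% Case 2: correlations across neurons i and n, for the same feature j
\operatorname{Cov}\!\left(w^{(i)}_j,\, w^{(n)}_j\right) \neq 0
\quad\text{(transition-like coupling between neurons } i \text{ and } n\text{)}
```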
@s-dorsher may I ask, have you completed the Deep Learning Specialization?
A neural network does not work in a phase-transition mode; rather, it accumulates the input neuron states initially and provides a minimal bias so the model can learn how the features of the parameters (inputs) fed to it predict the outcome. During this learning process, at the initial point the cost is 0 since the network is still untrained, and then, as the iterations pass information from the input neurons to the hidden-layer neurons, the network tries to understand how its features correlate with the outcome. For this it uses the ReLU, tanh, sigmoid, or softmax activation function (their standard definitions are sketched below), depending on the data and the model being worked on.
So, to use the correct statistical dynamics, it is first very important to understand the data and its spread, which is what the model is built from. This is explained in detail in the Deep Learning Specialization.
The Machine Learning Specialization explains more about how statistics relates to the data features present.
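For reference, here are the standard definitions of the activation functions mentioned above, in a minimal NumPy sketch (the example values are my own):

```python
import numpy as np

# Standard activation functions used in dense layers.
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(relu(z), sigmoid(z), tanh(z), softmax(z), sep="\n")
```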
Looks like we crossed paths on this one and you posted one minute before I did, while I was typing.
However, I disagree that there is nothing here relevant to statistics or thermodynamics. The math is identical. The system can be described using the same math.
If one wanted to make measurements of a simple softmax neural network, it would be possible, in principle, I believe, to learn something about complex physical systems with interactions between “particles”. Another way of saying that is that they’re entangled. That’s relevant to how information propagates and also to the macrostates of systems. I think that is extremely interesting for the emergent behavior of systems like these as far as physically observable effects are concerned.
When I mentioned this, it was specific to the input neurons and not to the network as a whole; please read the whole explanation, where I mentioned that when the weights pass through the neurons, statistical mechanics is used.
Yes I know! This has all been covered already by week two of the Advanced Learning Algorithms course. I have also done reading on my own prior to this in some more theoretical books. I lack practical experience. I was hoping for some deeper discussion but it seems we are very much not having that.
No, not yet
Great? Hopefully someone will answer this question then!!! In the meantime, I’ve passed it along to some physicists, I hope.
No, I think you’re still missing my point; this isn’t necessarily about which features specifically are selected.
Yes, mostly the same after careful discretization of continuous models (Lebesgue integral/measure vs summation).
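To spell out the discretization point (my own illustration of it): the continuous Gibbs measure and its discrete counterpart have the same form, with the Lebesgue integral replaced by a sum.

```latex
% Continuous: probability density over states x with energy E(x)
p(x) = \frac{e^{-\beta E(x)}}{Z}, \qquad Z = \int e^{-\beta E(x)}\, d\mu(x)

% Discrete: probability of state i with energy E_i (softmax-like form)
p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i}
```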
I’m not an expert, but here are two leads to bridge the gap:
Energy Based Models are inspired by physics. I learnt a bit from the neural networks course taught by Prof. Hinton, but it’s no longer available on Coursera
Probabilistic Graphical Models taught by Prof. Koller on Coursera uses a lot of the related math and intuition. I’d recommend doing a basic course on mathematical/Bayesian statistics before starting the specialization
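As one concrete example from the energy-based family mentioned above (a standard textbook form, not something specific to either course), the restricted Boltzmann machine assigns an energy to each visible/hidden configuration and a Boltzmann distribution over states:

```latex
% Restricted Boltzmann machine: visible units v, hidden units h
E(v, h) = -a^{\top} v - b^{\top} h - v^{\top} W h

p(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v', h'} e^{-E(v', h')}
```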
Thank you so much both of you for the wonderful links! I am going to have to take some time to look over this! This is really exciting.
I have taken advanced courses in statistics: a calculus-based course that addressed transformations between variables in 1998, while I was in high school in an accelerated college math program (I must admit I’m rusty); a class that refreshed some of this material later at another college; and a math methods for physics class in grad school that covered both Bayesian and frequentist statistics at an intro level. I have also used statistics in research: to develop a gravitational wave search algorithm in 2008-2010, to do an analysis of how common exoplanets are in the galaxy in 2004-2006, and to assess whether or not gravitational lensing could be used to measure dark matter and dark energy in 2003-2004. All of those research experiences were frequentist statistics, though I have since read a fair amount of Bayesian statistics as part of a scientific collaboration (the LIGO gravitational wave detector) that has used Bayesian statistics in its detections. However, I take your point that it couldn’t hurt to read a bit more.
I am truly excited about the links!
I know it’s maybe not a great citation, but I have a semi-clear explanation of one of the problems I was trying to clarify, although it wasn’t the whole issue. Grok has helped me explain this a bit better than I could have without its help, although I could have taken the derivative myself. I hope it is not offensive or wrong to post that link. I can rewrite it by taking the derivative myself if necessary, but I think seeing it done by Grok may settle this question a bit better.
I don’t think anyone cares, but you can find the undergrad thesis on cosmology, the exoplanet paper, and the long gravitational wave transients… (radon transform section) paper on ResearchGate if it matters lol probably not. Sadly none of the code survived for GitHub.