One concept that I’ve seen in Shogi and then Chess (Stockfish) is the idea of NNUE Models, that is ‘Efficiently Updatable Neural Networks’.
(I can’t include links, but the basic premise is that you have a feature set, and subsequent invocations of your model only change the input features by a very small amount, say flipping one binary input from 1 to 0.)
I’m a complete beginner, but in a different field I may have a similar use case, where I know that my input parameters will vary only slightly between invocations of the model. In the case of NNUE in both Stockfish and Shogi, the NN implementations are hand-crafted and get into low-level code (e.g. taking advantage of AVX2, since shifting data to/from the GPU would otherwise kill performance). I was wondering whether anything like this optimisation has been done in TensorFlow/PyTorch? (And, if so, any examples?)
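To make the premise a bit more concrete, here is a rough sketch of the trick as I understand it, written in plain PyTorch (the sizes and names are just made up for illustration): the first layer’s output is cached as an ‘accumulator’, and when a single binary input flips you add or subtract one weight column instead of redoing the whole matrix multiply.

```python
import torch

# Made-up sizes: 1000 binary input features, 256 hidden units.
N_FEATURES, HIDDEN = 1000, 256
layer1 = torch.nn.Linear(N_FEATURES, HIDDEN)

# Full evaluation: compute the first-layer output from scratch.
x = torch.zeros(N_FEATURES)
x[:30] = 1.0                     # some features are "on"
accumulator = layer1(x)          # W @ x + b

# Incremental evaluation: only feature i flipped 0 -> 1 since the last
# call, so add that one weight column instead of redoing the matmul.
i = 42                           # assume feature 42 was previously 0
x[i] = 1.0
accumulator = accumulator + layer1.weight[:, i]

# The (small) rest of the network then runs on the accumulator as usual.
hidden = torch.relu(accumulator)
```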
Hi @MichaelL,
Is it that you want to train a model in parallel?
Also, can I ask what you mean by this: what criteria is the flipping of the binary feature based on? Is it based on the ‘efficiently updatable’ neural network implementation?
If you are talking about AVX2-type optimisation, then I suppose they surely do in PyTorch, where data is distributed across multiple GPUs during training: the model is replicated onto each GPU, the input batches are split into smaller batches that are processed independently on each GPU, and after processing the results are gathered and combined, with the backward pass then performed to update the model parameters. A minimal sketch with torch.nn.DataParallel is below.
Even in TensorFlow this could be done in batches, using transfer learning with selective model layers: take the base model and retrain it with the updated model parameters or any new features, as in the Keras sketch further down.
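For instance, something roughly like this with torch.nn.DataParallel (just an illustrative sketch with a toy model, not specific to your case):

```python
import torch
import torch.nn as nn

# Toy model just for illustration.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Replicate the model on each visible GPU; each replica processes a
# slice of the input batch, and the results are gathered back.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(256, 128, device=device)
targets = torch.randint(0, 10, (256,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)   # forward pass split across GPUs
loss.backward()                          # backward pass for the parameter update
optimizer.step()
```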
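And in Keras, freezing a base model and retraining only a new head might look something like this (again just a sketch; the base model and head are made up):

```python
import tensorflow as tf

# Pretrained base model, kept frozen.
base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

# New task-specific head whose parameters are the ones being updated.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_images, new_labels, epochs=5)  # retrain on the new data/features
```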
Hope this helps!
Regards
DP
No, it isn’t during model training. It’s once you’ve built a model.
You have some predictions to make, but you know up front that the input parameters won’t change much between predictions, hence the ‘efficiently updatable’.
The input features can be binary switches, say. That’s one thing: you may find that you have a set of binary features, and between predictions the feature set changes only very slowly…
Even in this case the model is built first; the efficient updating of parameters would be based on how the model was trained independently on multiple GPUs, then combining the results to get the updated parameters.
I still haven’t got a response to my query about the hand-made NN implementation and low-level code, so please elaborate on that. Also, when you talk about predictions and efficient updating, are you talking about AI models or statistical models?
Can you please provide an example scenario of what you mean by a binary feature switch? I can’t quite comprehend it. Would switching any feature from 1 to 0 or 0 to 1 mean you’re creating a model that switches feature dimensionality?