Confusion regarding PSNR and differences in inference performance


So I am a little confused about the difference in inference results between the cloud and on device, as represented by PSNR.

I mean, presuming you are using the same fundamental types/bits in both cases (float, double, etc.), of course I might expect on-device inference to take longer-- but why would the results be different?

I’m not sure I’m getting that…

Hi @Nevermnd ,

The fundamental types may be the same, but several factors can lead to different performance and PSNR between cloud and on-device inference: quantization (lower precision that can reduce accuracy and PSNR), hardware differences (limited computational power and memory), memory constraints (which require smaller models or more aggressive optimizations, impacting accuracy), and software libraries (lighter inference engines).
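To make the quantization point concrete, here is a rough NumPy sketch (made-up data, not any particular toolchain) showing how simulating 8-bit quantization of a model output lowers its PSNR against the float32 reference:

```python
import numpy as np

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two arrays."""
    mse = np.mean((reference - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
x = rng.random((64, 64)).astype(np.float32)  # stand-in for a model output in [0, 1]

# Simulate uniform 8-bit quantization (256 levels), similar in spirit to the
# post-training quantization an on-device runtime might apply.
scale = 1.0 / 255.0
x_q = (np.round(x / scale) * scale).astype(np.float32)  # quantize, then dequantize

print(psnr(x, x))    # identical outputs -> infinite PSNR
print(psnr(x, x_q))  # quantized output -> finite PSNR, lower than the reference
```

The model did not change architecturally, yet its output is no longer bit-identical, and PSNR quantifies exactly how far it drifted.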


@Alireza_Saei Again, my question was 'apples-to-apples'-- quantization is a totally different question, since that changes the underlying processing of the model-- and hardware and memory constraints can all be handled (e.g., with paging) via clever programming tricks.

As mentioned, I have no doubt in my mind that on a cell phone, compared to a desktop with an RTX 4090, you are going to take a huge performance hit-- meaning it would run really, really slowly. But technically the results shouldn't be different.

Further, this use of PSNR as a metric has me suspecting it is a 'cover' for the fact that if it is low, well, your 'accuracy' isn't so great.

But no one trying to sell/pitch hardware wants to say 'well, on device your accuracy went down', because anyone who knows English knows that word.

So I am left wondering: is there a real, legitimate technical reason we are now using this term as a measurement?

Hey there @Nevermnd ,

I think you are trying to say that both do the same thing, one just slowly and the other quickly, so the results must be the same! However, in practice, as I said, some factors can impact PSNR and accuracy because the two inference pipelines are not truly equal:

  1. On-device models often require optimizations to fit within limited resources, which can impact accuracy.
  2. Variations in hardware capabilities can lead to different numerical precision and performance, affecting results.
  3. Different inference engines and libraries (e.g., TFLite) use distinct algorithms and optimizations, leading to slight differences in output.

And PSNR helps us quantify these subtle differences in output quality!
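As a hypothetical illustration of point 2 (a NumPy sketch, not tied to any specific runtime or accelerator), you can compare a float32 "cloud" output against the same values round-tripped through float16, mimicking device hardware that stores activations in half precision:

```python
import numpy as np

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two arrays."""
    mse = np.mean((reference - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
cloud_out = rng.random((32, 32)).astype(np.float32)  # hypothetical float32 result

# Round-tripping through float16 mimics an accelerator that keeps
# activations in half precision: the numbers change slightly.
device_out = cloud_out.astype(np.float16).astype(np.float32)

diff = np.max(np.abs(cloud_out - device_out))
print(diff)                         # small but nonzero
print(psnr(cloud_out, device_out))  # high but finite: close, not identical
```

So even with no quantization scheme applied at all, merely running the math at a different numerical precision produces outputs that are close but not bit-identical, which is exactly the kind of subtle gap PSNR is designed to measure.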