Latency will only increase

The average latency is defined as (total duration for the batch) / max_tokens. As the batch size increases, the numerator (the total duration for the batch) increases while the denominator (max_tokens) stays the same, so the average latency increases as a matter of arithmetic.
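To make the arithmetic concrete, here is a minimal sketch of the definition above. The timing model in `simulated_batch_duration` and its constants (`base_step_s`, `per_seq_step_s`) are hypothetical and are not measurements from any real benchmark. They only encode the assumption that the total batch duration grows with batch size.

```python
def average_latency(total_batch_duration_s: float, max_tokens: int) -> float:
    """Average latency as defined above: total batch duration / max_tokens."""
    return total_batch_duration_s / max_tokens


def simulated_batch_duration(
    batch_size: int,
    max_tokens: int,
    base_step_s: float = 0.020,     # hypothetical fixed cost per decoding step
    per_seq_step_s: float = 0.002,  # hypothetical extra cost per sequence per step
) -> float:
    """Made-up timing model: each decoding step gets slower as the batch grows."""
    return max_tokens * (base_step_s + per_seq_step_s * batch_size)


MAX_TOKENS = 128  # held constant, so the denominator never changes

latencies = {
    batch_size: average_latency(
        simulated_batch_duration(batch_size, MAX_TOKENS), MAX_TOKENS
    )
    for batch_size in (1, 2, 4, 8, 16)
}

for batch_size, latency_s in latencies.items():
    print(f"batch_size={batch_size:>2}  average latency={latency_s * 1000:.1f} ms")
```

Under these assumed numbers the reported value rises with every increase in batch size, because only the numerator changes.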

Hi @nmurugesh

