M4.4: Ungraded Lab - Not equal results when top_k and top_p are 0

Shavvy · April 8, 2026, 11:48am

In the lab “Exploring LLM Capabilities”, sections 3.2 and 3.3:

The outputs of the function with both top_k and top_p set to 0 should be identical, but they aren’t.

In the picture above you can see that the output of the second call is different:

Response: RAG (Retrieval Augmented Generation) is an AI technique

from the first call:

Response: RAG (Retrieval-Augmented Generation) is a technique

Is it expected?

Deepti_Prasad · April 8, 2026, 1:57pm

hi @Shavvy

Top-k limits the choices to the k most probable tokens, while top-p picks from the smallest set of top tokens whose cumulative probability exceeds p. Because the size of that set adapts to the distribution, top-p tends to give more dynamic, creative, coherent, and natural-sounding output.
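
To make the two filters concrete, here is a minimal pure-Python sketch. The function names `top_k_filter` and `top_p_filter` and the toy probabilities are made up for illustration; this is not the lab's code or any library's implementation. The sketch stops once the cumulative probability reaches p (`>=`); real implementations may differ on that boundary, and some libraries may treat a value of 0 as "filter disabled" rather than "keep one token".

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize (assumes k >= 1)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Hypothetical next-token distribution.
probs = {"a": 0.5, "technique": 0.3, "an": 0.15, "the": 0.05}

top_k_result = top_k_filter(probs, 2)    # fixed pool size: 2 tokens
top_p_result = top_p_filter(probs, 0.9)  # pool grows until 90% mass is covered

# Limit cases: both collapse to the single most probable token.
greedy_k = top_k_filter(probs, 1)
greedy_p = top_p_filter(probs, 0.0)
```

In this sketch, top_k = 1 and top_p = 0 both leave only the most probable token in the pool, which is the "greedy" behaviour the lab text seems to have in mind.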

Both should still give very similar output.

Shavvy · April 9, 2026, 1:47pm

But the explanation says “the same”, so is that a mistake?

Shavvy · April 9, 2026, 1:49pm

top_p and top_k should act the same when they are both 0;

they should both pick the most probable token, no?

Deepti_Prasad · April 9, 2026, 5:40pm

Yes, your understanding is correct @Shavvy, but LLM output is probabilistic, and the randomness can depend on other factors.

If temperature is not also set to 0, it rescales the logits, and therefore the token probabilities, before top-k or top-p filtering occurs. A high temperature can “flatten” the distribution, making several tokens nearly equally likely.
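
A small sketch of that flattening effect, using made-up logits and a hypothetical helper `softmax_with_temperature`. It assumes temperature > 0; how an API handles temperature = 0 is provider-specific and not modelled here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.1)  # low temperature: one token dominates
flat = softmax_with_temperature(logits, 5.0)   # high temperature: nearly uniform
```

With the low temperature almost all the probability mass sits on the first token, while with the high temperature the three tokens end up close to one another, so a top-p or top-k pool is more likely to contain several real candidates.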

At extremely low values, minute differences in how a GPU computes probabilities (floating-point rounding) can occasionally flip which token ranks first.
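
As a toy illustration of how rounding alone can flip the winner: floating-point addition is not associative, so the same terms summed in a different order can give a slightly different value. The numbers and the `argmax` helper below are hypothetical and run on the CPU in plain Python; they only mimic what a different GPU reduction order might do.

```python
def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


# The same three contributions to one token's logit, added in two orders.
parts = [0.1, 0.2, 0.3]
sum_left = (parts[0] + parts[1]) + parts[2]
sum_right = parts[0] + (parts[1] + parts[2])

# A competing token whose logit is essentially identical.
rival_logit = 0.6

pick_left = argmax([rival_logit, sum_left])    # accumulated logit is a hair larger
pick_right = argmax([rival_logit, sum_right])  # exact tie, so the rival wins
```

The two sums differ only in the last bit, yet that is enough to change which index comes out on top when two tokens are nearly tied.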

If the sampling pool is narrow and the final pick from the filtered set is random (here max_token is using random.randint()), a different seed will give a different pick whenever more than one token remains in the pool, leading to different output.
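
A minimal sketch of that seed dependence. It uses `random.choices` from the standard library as a stand-in sampler; the helper `sample_token` and the two-token pool are assumptions for illustration, not how the lab or any model server draws tokens.

```python
import random


def sample_token(probs, seed):
    """Draw one token from the filtered pool using a seeded generator."""
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Two tokens survive the filter with equal probability.
pool = {"technique": 0.5, "AI": 0.5}
picks = {sample_token(pool, seed) for seed in range(50)}

# Only one token survives: the seed no longer matters.
single_pool = {"technique": 1.0}
single_picks = {sample_token(single_pool, seed) for seed in range(50)}

# Same seed, same pool: the draw is reproducible.
repeatable = sample_token(pool, 7) == sample_token(pool, 7)
```

Across different seeds the two-token pool yields both tokens, while the one-token pool always yields the same one, which is why output is only guaranteed to repeat when the pool really has a single candidate or the seed is fixed.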

Another interesting reason, as I mentioned earlier, is hardware dependence.

One can see differences in LLM output when moving between hardware setups, due to differences in CUDA kernels or numerical rounding.
Top-k requires sorting the logit vector. On models with a large vocabulary (more than 100k tokens), this sorting adds significant overhead, and its efficiency depends heavily on optimized GPU kernels.
Different hardware, like NVIDIA H100 vs. A100 vs. consumer GPUs, uses different tensor cores and floating-point precisions (FP16, BF16, FP32). Small differences in the computed logits can cause the “top” tokens to shuffle slightly.
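
The precision point can be sketched on the CPU with the standard `struct` module, which can round a Python float to 32-bit (`"f"`) or 16-bit (`"e"`) IEEE format. The logit values and the helper `round_to` are made up for illustration; BF16 is not covered because `struct` has no format code for it.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to the precision of the given struct format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


# Two hypothetical logits that differ only in the third decimal place.
logits = {"technique": 10.001, "AI": 10.002}

fp32 = {tok: round_to("f", v) for tok, v in logits.items()}
fp16 = {tok: round_to("e", v) for tok, v in logits.items()}

# max() returns the first key on a tie, so insertion order decides in FP16.
top_fp32 = max(fp32, key=fp32.get)
top_fp16 = max(fp16, key=fp16.get)
```

In FP32 the two logits stay distinct and "AI" ranks first, but in FP16 both round to the same value and the tie is broken by position, so "technique" comes out on top. Under these assumptions, the same model weights could pick a different first token purely because of the number format.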

That’s probably why many researchers feel LLMs are intelligent because of hallucination.

regards

Dr. Deepti
