Thanks! Quick note on X = torch.randn(…) vs using the model’s actual hidden states
Hi everyone,
Thank you for putting these lessons together. I'm working through the LoRA section and wanted to share a small observation (and I might be misunderstanding the intent, so please feel free to correct me).
What confused me
In the example, after generating a token from input_ids, the code later introduces a new random tensor:
```python
# dummy input tensor
# shape: (batch_size, sequence_length, hidden_size)
X = torch.randn(1, 8, 1024)
```
If we then apply a LoRA-style adapter using this X, we’re no longer operating on the same activations produced by the model for the given input_ids. So it seems expected that downstream outputs/logits (and any argmax token) could differ — not necessarily because LoRA “changed the model” in a controlled way, but because the input to the layer changed.
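To make that concrete, here is a tiny standalone sketch (toy shapes, not the lesson's model): the same layer produces different outputs for a random X than for the X the model actually computed from input_ids, so any before/after token comparison conflates "LoRA changed the layer" with "the input changed".

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(8, 8)
emb = torch.nn.Embedding(10, 8)
input_ids = torch.LongTensor([[1, 2, 3]])

X_real = emb(input_ids)        # activations the model computes for these ids
X_rand = torch.randn(1, 3, 8)  # unrelated random tensor of the same shape

# Same layer, different inputs -> different outputs, adapter or no adapter.
print(torch.allclose(layer(X_real), layer(X_rand)))  # False
```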
What I think is clearer (reusing the “real” X)
When the goal is to illustrate LoRA as an additive low-rank update for the same hidden states, it helps to reuse the hidden states that already exist for the same input_ids, e.g.:
```python
X = model.embedding(input_ids)
```
(or equivalently, capture the embedding output with a forward hook).
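For reference, the hook variant can look like this (a minimal sketch using a bare `Embedding` as a stand-in for the model's embedding submodule):

```python
import torch

emb = torch.nn.Embedding(10, 4)
captured = {}

def save_output(module, inputs, output):
    # stash the submodule's output the next time it runs
    captured["X"] = output.detach()

handle = emb.register_forward_hook(save_output)
emb(torch.LongTensor([[0, 1, 2]]))  # forward pass fills captured["X"]
handle.remove()

print(captured["X"].shape)  # torch.Size([1, 3, 4])
```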
Below is a minimal reproducible snippet that:
- Generates a token from input_ids
- Reuses X = model.embedding(input_ids) (not torch.randn)
- Wraps model.linear with a LoRA-style module
Repro code
```python
import torch
import math

# ----- toy model -----
class TestModel(torch.nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, hidden_size)
        self.linear = torch.nn.Linear(hidden_size, hidden_size)
        self.lm_head = torch.nn.Linear(hidden_size, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        x = self.lm_head(x)
        return x

detokenizer = [
    "red", "orange", "yellow", "green", "blue",
    "indigo", "violet", "magenta", "marigold", "chartreuse",
]

def generate_token(model, input_ids):
    with torch.no_grad():
        logits = model(input_ids)
    last_logits = logits[:, -1, :]
    next_token_id = last_logits.argmax(dim=1).item()
    return detokenizer[next_token_id]

# ----- set seed BEFORE creating the model for reproducible weights -----
torch.manual_seed(0)
hidden_size = 1024
model = TestModel(hidden_size)
input_ids = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
print("Before LoRA:", generate_token(model, input_ids))

# ----- IMPORTANT: use the model's actual hidden states, not torch.randn -----
with torch.no_grad():
    X = model.embedding(input_ids)  # (batch, seq, hidden)
print("X shape:", X.shape)

# ----- LoRA wrapper -----
class LoraLayer(torch.nn.Module):
    def __init__(self, base_layer: torch.nn.Linear, r: int):
        super().__init__()
        self.base_layer = base_layer
        in_features = base_layer.in_features
        out_features = base_layer.out_features
        # Trainable LoRA params: A gets a standard init, B starts at zero,
        # so the adapter contributes exactly zero before any training.
        self.lora_a = torch.nn.Parameter(torch.empty(in_features, r))
        torch.nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.lora_b = torch.nn.Parameter(torch.zeros(r, out_features))

    def forward(self, x):
        y_base = self.base_layer(x)              # x @ W.T + b
        y_lora = x @ self.lora_a @ self.lora_b   # low-rank additive update
        return y_base + y_lora

# Replace the linear layer with the LoRA-wrapped linear
model.linear = LoraLayer(model.linear, r=2)

# sanity check: the LoRA layer works on the same X shape
with torch.no_grad():
    print("LoRA linear(X) shape:", model.linear(X).shape)

print("After LoRA (no training):", generate_token(model, input_ids))
```

I got:

```
Before LoRA: green
X shape: torch.Size([1, 8, 1024])
LoRA linear(X) shape: torch.Size([1, 8, 1024])
After LoRA (no training): green
```

Since lora_b is zero-initialized, the adapter is an exact no-op before training, so the generated token matches the pre-LoRA one. One gotcha to watch for: torch.empty returns uninitialized memory, so if lora_a is left without an explicit init (e.g. the kaiming_uniform_ call above), it can contain NaN/inf and change the output even while lora_b is all zeros.
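For completeness, the "B = 0 implies no-op" property can be checked in isolation (a quick standalone sketch with toy shapes, independent of the model above): with B all zeros, x @ A @ B is exactly zero, so the wrapped layer's output is bitwise identical to the base layer's.

```python
import torch

torch.manual_seed(0)
base = torch.nn.Linear(16, 16)
A = torch.randn(16, 2)   # any finite init works here
B = torch.zeros(2, 16)   # zero init -> adapter contributes exactly 0
x = torch.randn(1, 8, 16)

y_base = base(x)
y_lora = y_base + x @ A @ B  # LoRA-style additive update

print(torch.equal(y_base, y_lora))  # True
```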