In the short course "How Transformers Work?", I learned how to generate a single token using the provided Jupyter notebook.
Then I tried to generate a full sentence with this autoregressive model.
prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened. "
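for j in range(50):
    # Re-tokenize the whole prompt (original text plus everything generated so far)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    model_output = model.model(input_ids)
    lm_head_output = model.lm_head(model_output[0])
    # Greedy choice: the highest-scoring token at the last position
    token_id = lm_head_output[0, -1].argmax(-1)
    # Decode only the newly generated token
    next_token = tokenizer.decode(token_id)
    print(f"{j}th: {repr(next_token)}")
    prompt += next_token
print(prompt)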
This is the output. It looks good, except that it has no spaces between words! I wonder what I missed.
0th: '\n'
1th: '\n'
2th: 'S'
3th: 'ar'
4th: 'ah'
5th: ','
6th: '\n'
7th: '\n'
8th: 'I'
9th: 'am'
10th: 'deeply'
11th: 'sorry'
12th: 'for'
13th: 'the'
14th: 'un'
15th: 'fortun'
16th: 'ate'
17th: 'inc'
18th: 'ident'
19th: 'that'
20th: 'oc'
21th: 'cur'
22th: 'red'
23th: 'in'
24th: 'my'
25th: 'g'
26th: 'arden'
27th: '.'
28th: 'I'
29th: 'under'
30th: 'stand'
31th: 'that'
32th: 'your'
33th: 'pre'
34th: 'cious'
35th: 'plants'
36th: 'w'
37th: 'ere'
38th: 'har'
39th: 'med'
40th: 'd'
41th: 'uring'
42th: 'the'
43th: 'event'
44th: '.'
45th: '\n'
46th: '\n'
47th: 'The'
48th: 'inc'
49th: 'ident'
Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.
Sarah,
Iamdeeplysorryfortheunfortunateincidentthatoccurredinmygarden.Iunderstandthatyourpreciousplantswereharmedduringtheevent.
Theincident
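One possible explanation, offered as a guess rather than a confirmed diagnosis: if the notebook's tokenizer is SentencePiece-based (Llama-style), the space before a word is stored inside the token as a "▁" marker, and decoding a single token on its own tends to drop that leading space. The toy sketch below reproduces the symptom without loading any model. `toy_decode` and `SPACE_MARK` are hypothetical names made up for this illustration and are not part of the `transformers` API.

```python
# Toy illustration only: a hypothetical stand-in for a SentencePiece-style
# decoder. It is NOT the real tokenizer, just enough to show the effect.
SPACE_MARK = "\u2581"  # the marker SentencePiece uses for a preceding space


def toy_decode(pieces):
    """Join pieces, turn markers into spaces, and drop one leading space."""
    text = "".join(pieces).replace(SPACE_MARK, " ")
    return text[1:] if text.startswith(" ") else text


# Assumed token pieces for "I am deeply sorry"
pieces = [SPACE_MARK + word for word in ["I", "am", "deeply", "sorry"]]

# Decoding one token at a time loses the space in front of every piece.
per_token = "".join(toy_decode([p]) for p in pieces)

# Decoding the whole sequence in one call keeps the spaces between words.
whole = toy_decode(pieces)

print(per_token)  # Iamdeeplysorry
print(whole)      # I am deeply sorry
```

If this is what is happening, a likely fix in the loop above is to tokenize the prompt only once, append each new `token_id` to `input_ids` (for example with `torch.cat`), and call `tokenizer.decode` on the full sequence of generated ids at the end, instead of concatenating the text of individually decoded tokens.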