Model output is converted to float, but compared to ints

As I don’t have access to the GitHub repo for this course, I will post the issue here:

The function extract_number extracts the last number from the model’s generated output and converts it to a float with the following snippet:
return float(numbers[-1])
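For context, here is a minimal sketch of what such a helper might look like. The regex pattern and the handling of empty matches are my assumptions; only the final float(...) line is quoted from the course code:

```python
import re

def extract_number(text):
    # Collect every integer or decimal in the text
    # (this regex is an assumption, not the course's actual pattern).
    numbers = re.findall(r"-?\d+\.?\d*", text)
    if not numbers:
        return None
    # This is the line from the course: the last match becomes a float.
    return float(numbers[-1])
```

Whatever the surrounding code looks like, the key point is that the return value is always a float.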

When the model’s output is compared to the ground truth stored in GSM8K, the check fails whenever the model’s float is compared to GSM8K’s int:

109, Model: 109.0, Correct: False

13, Model: 13.0, Correct: False
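The mismatch is easy to reproduce in isolation. Assuming the check compares the stringified values (the grader’s exact comparison is an assumption on my part):

```python
ground_truth = "109"    # answer as stored in GSM8K (assumed format)
model_answer = 109.0    # what extract_number returns

# Comparing the float's string form against the stored answer fails:
print(str(model_answer) == ground_truth)    # "109.0" vs "109"

# Comparing numerically succeeds:
print(float(ground_truth) == model_answer)
```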

It looks like scoring for exercise 3 is affected by this:

Exercise 3

10/20
Overall Score: 0.50 (0.5/1 exercises) Exercise Results: FAIL ex3: 0.50 - Passed 1/2 tests - Function should achieve high accuracy on mock data (all questions should be answered correctly) FAILED!

Maybe someone with access to the GitHub repo for this course could pass this on?

Thanks!

I don’t have access to that repo either, but I will ping the staff directly.


In exercise 3, when you implement evaluate_model_correctness, if you run extract_number on both answers, you will be comparing values of the same type.
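A sketch of that approach, with the function names taken from the thread; the internals and signatures are my assumptions:

```python
import re

def extract_number(text):
    # Assumed helper: grab the last number in the text as a float.
    numbers = re.findall(r"-?\d+\.?\d*", text)
    return float(numbers[-1]) if numbers else None

def evaluate_model_correctness(model_output, ground_truth):
    # Run BOTH strings through extract_number so the final
    # comparison is float-to-float rather than float-to-int/str.
    model_num = extract_number(model_output)
    truth_num = extract_number(ground_truth)
    if model_num is None or truth_num is None:
        return False
    return model_num == truth_num
```

With this, "109" in the ground truth and "109.0" in the model output compare as equal.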

Yes, that’s one of a number of ways to make it work, but it conflicts with the statement that there should be ‘exact matches’, which would mean an int when the ground truth is an int. So the word ‘exact’ is misleading.

What can go wrong if you simply match text? Not all answers are integers, and exact string matching fails when different representations (like .1 and 0.1) represent the same value. Matching at the float level allows us to verify the answer regardless of formatting. While not perfect, this avoids complicated code while remaining effective for educational purposes.
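A quick illustration of that point (the values are chosen for illustration only):

```python
# Exact string matching treats these as different answers:
print(".1" == "0.1")

# Float-level matching recognizes them as the same value:
print(float(".1") == float("0.1"))
```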

I agree that comparing floats and ints is not a great practice.

Hi bong.seog.choi,

This is precisely my point. The learner is prompted in the following way:

"Exercise 3

In this exercise you will:

Generate responses for each problem
Extract numerical answers from both model output and ground truth
-> Compare the two answers for exact matches
Calculate overall accuracy"

Comparing the two answers for exact matches would imply that a learner uses sample[‘answer’] directly, rather than passing the ground truth through an additional function (extract_number) that converts it to a float.

It was confusing to me as a learner/tester. So I would suggest resolving this one way or another to avoid such confusion for other learners. But it’s not up to me to decide on this point.

I have submitted a GitHub issue for this, so we’ll eventually see what the staff thinks.

Thanks Tom!

I have raised the issue with the team. Thanks Reinoud.


Thanks for elaborating. I agree that the problem statement is not clear enough.