Failed test case in train_val_split in Assignment

For train_val_split(), I get the same output as expected when testing my function in the next cell.

However, the grader gives me an error:

Failed test case: incorrect number of training sentences when using split of 0.5 and a total of 2225 sentences.
Expected:
a value close to 1112 with absolute tolerance of +/- 1,
but got:
1780.

Failed test case: incorrect number of validation sentences when using split of 0.5 and a total of 2225 sentences.
Expected:
a value close to 1112 with absolute tolerance of +/- 1,
but got:
445.

My function is as follows:

# Compute the number of sentences that will be used for training (should be an integer)
train_size = 1780

# Split the sentences and labels into train/validation splits
train_sentences = sentences[0:train_size]
validation_sentences = sentences[train_size:]


What could be wrong?

Just a guess, but the comment says to compute the number of sentences for train_size, and you hardcode it. The grader calls your function with a split of 0.5, so a fixed 1780 (which is 0.8 of 2225) cannot match. As a general coding practice, hardcoding lengths and sizes is not preferred. In these classes, it is a recipe for unit test and/or autograder unhappiness. Try treating the split ratio and the corpus size as variables and computing train_size from them, and let us know what you find.

Thank you.
I have tried to use

`train_size = len(sentences) * training_split`

and am getting:
TypeError: slice indices must be integers or None or have an __index__ method

Thanks. I wrapped the expression in int() and then I passed.
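
For anyone landing here later, this is roughly what the fix looks like. It is only a sketch: the function signature, the label handling, and the return order are assumptions based on the snippets in this thread, not the assignment's official template.

```python
def train_val_split(sentences, labels, training_split):
    # Compute the size from the inputs instead of hardcoding it.
    # len(sentences) * training_split is a float, so int() truncates it
    # to a whole number that can be used as a slice index.
    train_size = int(len(sentences) * training_split)

    # Split the sentences and labels at the same position
    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    validation_sentences = sentences[train_size:]
    validation_labels = labels[train_size:]

    # Return order is an assumption; follow the assignment's template.
    return train_sentences, validation_sentences, train_labels, validation_labels


# Dummy data with the same total as in the grader message
sentences = [f"sentence {i}" for i in range(2225)]
labels = [i % 5 for i in range(2225)]

train_s, val_s, train_l, val_l = train_val_split(sentences, labels, 0.5)
print(len(train_s), len(val_s))  # 1112 1113
```

Both counts fall within the grader's +/- 1 tolerance around 1112.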

There is almost always more than one way to accomplish something built into Python, and converting a floating point number to an integer is no exception (pun intended).

Here is one: Built-in Functions — Python 3.12.0 documentation

The +/- 1 in the unit test error message is an acknowledgment that there are others that produce similar but not exact results. For conversation, here are some of them…

https://numpy.org/doc/stable/reference/generated/numpy.ceil.html

https://numpy.org/doc/stable/reference/generated/numpy.floor.html
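
To make the +/- 1 concrete, here is a small comparison using the numbers from the failed test. It uses the standard-library math module rather than the NumPy functions linked above (my substitution, to keep it dependency-free). Note that numpy.floor and numpy.ceil return floats, so their results would still need int() before being used as a slice index.

```python
import math

total = 2225   # total number of sentences from the grader message
split = 0.5    # split ratio from the grader message

raw = total * split  # 1112.5, a float

print(int(raw))         # 1112 (truncates toward zero)
print(math.floor(raw))  # 1112 (rounds down, returns an int)
print(math.ceil(raw))   # 1113 (rounds up, returns an int)
```

Either result is within the tolerance the grader allows.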