Exercise 4 - utils2 function `get_batches`: is it correct?

As far as I know, dividing the whole training set into batches is a standard technique for training a model, and the total amount of data stays constant. For example, if I have 1,000 examples and I set batch_size = 10, I will get 100 batches, because 1000 / 10 = 100. And once the model has learned from all 100 batches, we call that one epoch.
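Just to restate that arithmetic as code (the numbers are the ones from my example above):

```python
# 1,000 examples split into batches of 10 gives 100 batches per epoch
n_examples, batch_size = 1000, 10
batches_per_epoch = n_examples // batch_size
print(batches_per_epoch)   # 100
```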

I don't understand the `get_batches` function here.

```python
def get_batches(data, word2Ind, V, C, batch_size):
    batch_x = []
    batch_y = []
    for x, y in get_vectors(data, word2Ind, V, C):
        while len(batch_x) < batch_size:
            batch_x.append(x)
            batch_y.append(y)
        else:
            yield np.array(batch_x).T, np.array(batch_y).T
            batch_x = []
            batch_y = []
```

In the `while len(batch_x) < batch_size` loop, this code appends the exact same `x` and `y` to `batch_x` and `batch_y` until `len(batch_x)` (and `len(batch_y)`) equals `batch_size`. So if the purpose of this function were to duplicate the vector representations of one center word and its context words `batch_size` times, then yes, it does that correctly. But my question is: is this really the right function for training the model?
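To make the behavior concrete, here is a minimal reproduction of that `while`/`else` loop. The two `(x, y)` pairs are made-up stand-ins for what `get_vectors` would yield, not data from `utils2.py`:

```python
import numpy as np

def get_batches_as_written(pairs, batch_size):
    # Same loop structure as in utils2.py: the while-loop body runs until
    # the batch is full, and the else-clause runs when the condition fails.
    batch_x, batch_y = [], []
    for x, y in pairs:
        while len(batch_x) < batch_size:
            batch_x.append(x)
            batch_y.append(y)
        else:
            yield np.array(batch_x).T, np.array(batch_y).T
            batch_x, batch_y = [], []

# Two distinct toy examples
pairs = [(np.array([1., 0.]), np.array([0., 1.])),
         (np.array([0., 1.]), np.array([1., 0.]))]

batches = list(get_batches_as_written(pairs, batch_size=3))
print(batches[0][0])   # every column is the SAME first example
```

So each yielded "batch" is just `batch_size` copies of a single example, which is what I meant by duplication.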

Thanks

Oh, I'm sorry, this problem was already raised a year ago here: What's the purpose of batch_size in the "get_batches" function in "utils2.py", and there is still no answer and no fix to the function as of today, 2023-09-17.

That is a good question (and I had forgotten the thread you linked to). I think you are right that the `get_batches` function is flawed; I remember not having time to dig deeper at the time. But just from looking at the function, I agree with you, and since the assignment is about the gradient, it probably has no real influence on the outcome (for illustration purposes it might be good enough, even though it is confusing). I will report it.
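For reference, here is a sketch of what the conventional batching loop would look like: append each new `(x, y)` pair once, and yield whenever `batch_size` distinct examples have accumulated. This is my own guess at the intended behavior, not code from the course; `vector_stream` stands in for `get_vectors(data, word2Ind, V, C)`, and any leftover partial batch at the end is simply dropped:

```python
import numpy as np

def get_batches_fixed(vector_stream, batch_size):
    # Accumulate one NEW (x, y) pair per step; emit a (V, batch_size)
    # pair of arrays once batch_size distinct examples are collected.
    batch_x, batch_y = [], []
    for x, y in vector_stream:
        batch_x.append(x)
        batch_y.append(y)
        if len(batch_x) == batch_size:
            yield np.array(batch_x).T, np.array(batch_y).T
            batch_x, batch_y = [], []

# 8 distinct dummy one-hot examples, batch size 4 -> 2 batches of shape (V, 4)
V = 5
stream = ((np.eye(V)[i % V], np.eye(V)[(i + 1) % V]) for i in range(8))
batches = list(get_batches_fixed(stream, batch_size=4))
print(len(batches), batches[0][0].shape)   # 2 (5, 4)
```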

Thanks!