Exercise 4 - utils2 function get_batches: is it correct?

As far as I know, dividing the whole training set into batches is a standard technique for training a model, and the total amount of data stays constant. For example, if I have 1000 examples and batch_size = 10, I will get 100 batches, because 1000 / 10 = 100. And once the model has learned from all 100 batches, we call that 1 epoch.
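The arithmetic above can be illustrated with a minimal, self-contained sketch (the helper name split_into_batches is my own, just for illustration): 1000 examples with batch_size = 10 give 100 non-overlapping batches, and one pass over all of them is one epoch.

```python
def split_into_batches(data, batch_size):
    """Split a list of examples into consecutive, non-overlapping batches."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

data = list(range(1000))            # 1000 toy examples
batches = split_into_batches(data, 10)
print(len(batches))                 # 100 batches = one epoch's worth
```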

I don't understand what the get_batches function is doing here.

def get_batches(data, word2Ind, V, C, batch_size):
    batch_x = []
    batch_y = []
    for x, y in get_vectors(data, word2Ind, V, C):
        while len(batch_x) < batch_size:
            batch_x.append(x)
            batch_y.append(y)
        yield np.array(batch_x).T, np.array(batch_y).T
        batch_x = []
        batch_y = []

In the while len(batch_x) < batch_size part, the loop appends the exact same x and y to batch_x and batch_y until len(batch_x) and len(batch_y) equal the batch size. If the purpose of this function is to duplicate the vector representations of the center word and context words batch_size times, then yes, it does that correctly. But my question is: is this the right function for training the model?
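For contrast, here is a minimal sketch of what a non-duplicating batcher might look like. This is my own hypothetical fix, not the course's official code: vector_pairs stands in for the iterator returned by get_vectors(data, word2Ind, V, C), and each (x, y) is appended exactly once, so every batch holds batch_size distinct examples.

```python
import numpy as np

def get_batches_fixed(vector_pairs, batch_size):
    """Hypothetical corrected batching generator.

    Unlike the original, each (x, y) pair is appended once, so a batch
    contains batch_size distinct training examples. Any incomplete final
    batch is simply dropped.
    """
    batch_x, batch_y = [], []
    for x, y in vector_pairs:
        batch_x.append(x)          # one append per example, no duplication
        batch_y.append(y)
        if len(batch_x) == batch_size:
            # Transpose so each column is one example's V-dim vector
            yield np.array(batch_x).T, np.array(batch_y).T
            batch_x, batch_y = [], []

# Toy usage: 5 fake (x, y) vector pairs of dimension 3, batch_size = 2
pairs = [(np.full(3, i), np.full(3, -i)) for i in range(5)]
for bx, by in get_batches_fixed(pairs, 2):
    print(bx.shape)  # (3, 2): two distinct examples stacked as columns
```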


Oh, I'm sorry, this problem was already raised a year ago here: What's the purpose of batch_size in the "get_batches" function in "utils2.py". There is still no answer, and the function has not been modified as of today, 2023-09-17.

Hi @Stefanus_Yudi_Irwan

That is a good question (and I had forgotten about the thread you linked to). I think you are right that the get_batches function is flawed; I remember not having time to dig deeper at the time. Just by looking at the function, I agree with your reading, and since the assignment is about the gradient, the flaw might not have any real influence on the outcome (for illustration purposes it might be good enough, even though it is confusing). I will report it.