In one of our Coursera courses, Natural Language Processing with Sequence Models, I am asked to "# create an array with the indexes of data_lines that can be shuffled",
and the line of code is:
lines_index = [*range(num_lines)]
and the function containing it is defined as:
def data_generator(batch_size, max_length, data_lines, line_to_tensor=line_to_tensor, shuffle=True):
    """Generator function that yields batches of data

    Args:
        batch_size (int): number of examples (in this case, sentences) per batch.
        max_length (int): maximum length of the output tensor.
            NOTE: max_length includes the end-of-sentence character that will be added
            to the tensor.
            Keep in mind that the length of the tensor is always 1 + the length
            of the original line of characters.
        data_lines (list): list of the sentences to group into batches.
        line_to_tensor (function, optional): function that converts line to tensor. Defaults to line_to_tensor.
        shuffle (bool, optional): True if the generator should generate random batches of data. Defaults to True.

    Yields:
        tuple: two copies of the batch (jax.interpreters.xla.DeviceArray) and mask (jax.interpreters.xla.DeviceArray).
        NOTE: jax.interpreters.xla.DeviceArray is trax's version of numpy.ndarray
    """
    # initialize the index that points to the current position in the lines index array
    index = 0
    # initialize the list that will contain the current batch
    cur_batch = []
    # count the number of lines in data_lines
    num_lines = len(data_lines)
    # create an array with the indexes of data_lines that can be shuffled
    lines_index = [*range(num_lines)]
    # shuffle line indexes if shuffle is set to True
    if shuffle:
        rnd.shuffle(lines_index)
    ....
    ....# some more code
    ....
    # convert the batch (data type list) to a numpy array
    batch_np_arr = np.array(batch)
    mask_np_arr = np.array(mask)
    ### END CODE HERE ##
    # Yield two copies of the batch and mask.
    yield batch_np_arr, batch_np_arr, mask_np_arr
    # reset the current batch to an empty list
    cur_batch = []
Can someone explain to me what "[*range(num_lines)]" is?
I don't understand the structure of this expression, because I thought that an array would be created with np.array(). Thank you!
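
For reference, here is a minimal sketch of the line in isolation. It assumes a made-up value num_lines = 5 (in the function it is len(data_lines)) and assumes that rnd is the standard-library random module imported under that name. As far as I can tell, the * unpacks the range into the square brackets, so the result is an ordinary Python list rather than a NumPy array:

```python
import random as rnd  # assumption: `rnd` in the course code is the standard random module

num_lines = 5  # hypothetical value; the function uses len(data_lines)

# The * unpacks every value produced by range() into the list literal
lines_index = [*range(num_lines)]
print(lines_index)        # [0, 1, 2, 3, 4]
print(type(lines_index))  # <class 'list'>

# It gives the same result as the more common spelling list(range(...))
print(lines_index == list(range(num_lines)))  # True

# rnd.shuffle reorders the list in place (it returns None),
# which is why a mutable list is needed here and not the range itself
rnd.shuffle(lines_index)
print(sorted(lines_index) == list(range(num_lines)))  # True: same indexes, new order
```

So the word "array" in the comment seems to be used loosely for a list of indexes that can be shuffled.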