Question about the dataset mapping function in C4W3_HF_Lab2_QA_BERT lab

It’s explained in the lab that we “need to align the start and end indices with the tokens associated with the target answer word”, but the code provided is:

# Start/end character index of the answer in the text.
gold_text = sample["document_plaintext"][sample['annotations.minimal_answers_start_byte'][0]:sample['annotations.minimal_answers_end_byte'][0]]
start_char = sample["annotations.minimal_answers_start_byte"][0]
end_char = sample['annotations.minimal_answers_end_byte'][0]  # start_char + len(gold_text)

# sometimes answers are off by a character or two – fix this
if sample['document_plaintext'][start_char-1:end_char-1] == gold_text:
    start_char = start_char - 1
    end_char = end_char - 1     # When the gold label is off by one character
elif sample['document_plaintext'][start_char-2:end_char-2] == gold_text:
    start_char = start_char - 2
    end_char = end_char - 2     # When the gold label is off by two characters

This seems to do nothing: gold_text was itself sliced from sample['document_plaintext'], so the check only compares that string against shifted slices of itself. I would have expected a comparison against the text recovered from the tokenized input, to check whether tokenizing has caused any misalignment.
Am I missing something here?
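To show what I mean, here is a small sketch of my own (a toy example, not the lab's code). The function name adjust_span and the sample strings are made up for illustration; the logic is my attempt to replicate the check above on a plain string. As far as I can tell, the shifted slice only matches when the text happens to repeat itself:

```python
from typing import Tuple


def adjust_span(text: str, start_char: int, end_char: int) -> Tuple[int, int]:
    """Replicate the off-by-one / off-by-two check from the lab snippet."""
    gold_text = text[start_char:end_char]  # sliced from the same string
    if text[start_char - 1:end_char - 1] == gold_text:
        return start_char - 1, end_char - 1
    if text[start_char - 2:end_char - 2] == gold_text:
        return start_char - 2, end_char - 2
    return start_char, end_char


# Ordinary text: the shifted slices differ from gold_text, so nothing changes.
text = "BERT was released by Google in 2018."
start = text.index("Google")
unchanged = adjust_span(text, start, start + len("Google"))

# Repetitive text: a shifted slice happens to match, so the span moves
# even though the original offsets were already correct.
repetitive = "aaaaaa"
shifted = adjust_span(repetitive, 3, 5)

print(unchanged)
print(shifted)
```

So in the ordinary case the offsets come back untouched, and in the repetitive case they are moved away from a span that was already correct, which is why I suspect the check is not doing what the comment says.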
HERE IS A LINK TO THE LAB IN QUESTION

QUESTION 2:
If we have already mapped the dataset with the mapping function which returns the following:

return {'input_ids': tokenized_data['input_ids'],
        'attention_mask': tokenized_data['attention_mask'],
        'start_positions': start_position,
        'end_positions': end_position}

Why would we need to specify columns to return like this?

columns_to_return = ['input_ids','attention_mask', 'start_positions', 'end_positions']
processed_train_data.set_format(type='pt', columns=columns_to_return)
processed_test_data.set_format(type='pt', columns=columns_to_return)

Shouldn’t the mapping function already do this for us, as is the case with TensorFlow datasets?

Thank you in advance to everyone who takes the time to reply, and I wish you all a great day/night!