Why does f.read().split('\n') sort the vocabulary as shown below?

And why do we need to call sorted() again in the next code block if the list is already sorted?
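Here is a quick check. It uses made-up strings in place of the real hmm_vocab.txt. str.split('\n') only breaks the text at newlines and keeps the lines in file order, so any sorted look comes from the file itself. Calling sorted() a second time is harmless, because sorting data that is already sorted returns the same list.

```python
# Hypothetical file contents standing in for hmm_vocab.txt
unsorted_text = "cat\napple\nbanana"
sorted_text = "apple\nbanana\ncat"

# split("\n") keeps the original line order; it never reorders anything
unsorted_lines = unsorted_text.split("\n")
sorted_lines = sorted_text.split("\n")

print(unsorted_lines)  # ['cat', 'apple', 'banana']
print(sorted_lines)    # ['apple', 'banana', 'cat']

# Sorting is idempotent: re-sorting an already sorted list changes nothing
print(sorted(sorted_lines) == sorted_lines)  # True
```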

Why is there a space character as the first item in the list, but not when we check the first 50 items of the vocabulary list?
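One possible explanation assumes the file ends with a trailing newline, which is not confirmed here. In that case split('\n') produces an empty string '' as the last item. The raw list still starts with a real word. After sorted(), though, the empty string moves to the front, because '' sorts before every non-empty string, so it can look like a blank first entry. An entry that is literally a space ' ' would also sort first: its code point is 32, which is lower than punctuation such as '!' (33), digits, and letters. The vocabulary text below is hypothetical.

```python
# Hypothetical vocabulary text that ends with a trailing newline
text = "!\nApple\nzoo\n"

words = text.split("\n")
print(words)          # ['!', 'Apple', 'zoo', '']  -> empty string at the END

words_sorted = sorted(words)
print(words_sorted)   # ['', '!', 'Apple', 'zoo']  -> empty string now FIRST

# str.splitlines() does not produce the trailing empty item
print(text.splitlines())  # ['!', 'Apple', 'zoo']
```

Switching to splitlines() would change the vocabulary itself, so whether the empty entry should be kept depends on how the rest of the pipeline uses it.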

Thank you so much, Arvyzukai. So does that mean hmm_vocab.txt is already sorted?
hmm_vocab.txt is “kind of” sorted. By default, Python compares strings by Unicode code point (for UTF-8 text this gives the same order as comparing bytes), for example:
In: sorted(['C', 'B', 'A', 'b', 'c', 'a', '1'])
Out: ['1', 'A', 'B', 'C', 'a', 'b', 'c']
Note: capital letters come before lowercase ones, and digits come before both.
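The order in the example follows directly from the code points, which ord() exposes. Digits (48 to 57) come before uppercase letters (65 to 90), and those come before lowercase letters (97 to 122). The same rule puts é (233) after z (122). If you want a case-insensitive order, key=str.lower is one option. That part is only an illustration and not something the original code does.

```python
chars = ['C', 'B', 'A', 'b', 'c', 'a', '1']

# Default sort compares Unicode code points
print(sorted(chars))                     # ['1', 'A', 'B', 'C', 'a', 'b', 'c']
print([ord(c) for c in sorted(chars)])   # [49, 65, 66, 67, 97, 98, 99]

# By code point, 'é' (233) sorts after 'z' (122)
print(ord('z'), ord('é'))                # 122 233
print(sorted(['é', 'z', 'e']))           # ['e', 'z', 'é']

# Case-insensitive alternative (illustration only); ties keep their input order
print(sorted(chars, key=str.lower))      # ['1', 'A', 'a', 'B', 'b', 'C', 'c']
```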
It’s “safer” to sort the list again for consistency when you read a file that may have come from another computer. It may use a different encoding or a different locale. For example, é comes after z by code point, while a locale-aware sort typically places it next to e. Actually, the words in the vocabulary do not need to be in sorted order, but the order must be consistent throughout the whole process of achieving your goal (training, inference, etc.). You cannot have an inconsistent mapping: for example, the word ! cannot map to 0 in one part of your code and to 1 somewhere else.
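Here is a minimal sketch of building the word-to-index mapping once. It assumes a plain list of vocabulary words, like the one read from the file. The names build_vocab, train_words and infer_words are hypothetical. Sorting first makes the indices depend only on which words are present and not on the order the lines arrived in. That way training and inference agree on, for example, the index of !.

```python
def build_vocab(words):
    """Hypothetical helper: map each word to an index; sorting makes it order-independent."""
    return {word: i for i, word in enumerate(sorted(words))}

# The same words, read in two different orders (e.g. on two different machines)
train_words = ["the", "!", "cat"]
infer_words = ["cat", "the", "!"]

train_vocab = build_vocab(train_words)
infer_vocab = build_vocab(infer_words)

print(train_vocab)                 # {'!': 0, 'cat': 1, 'the': 2}
print(train_vocab == infer_vocab)  # True: '!' maps to 0 in both places

# Without sorting, the mapping depends on input order and the two can disagree
unsorted_train = {w: i for i, w in enumerate(train_words)}
unsorted_infer = {w: i for i, w in enumerate(infer_words)}
print(unsorted_train["!"], unsorted_infer["!"])  # 1 2
```

Sorting is only one way to get a stable mapping. Building the dictionary once and reusing the same object (or a saved copy) everywhere also works.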