Why use one-hot vectors instead of a more compact representation?

Why are one-hot vectors used to represent words in a dictionary instead of, say, the index of the word in the dictionary?

For large dictionaries, one-hot vectors look wasteful (unless they're stored efficiently? Maybe by storing only the index of the set bit? But in that case, why not make the desired output be the index itself?)


One-hot vectors are a very handy way to represent the features, because then every example is exactly the same size, and they are the same size as the vocabulary itself.

Using the index values would have the side-effect of falsely implying a linear relationship between the features (two words with adjacent indices would appear to be more similar than words that are farther apart alphabetically).
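To make the "false linear relationship" concrete, here is a minimal NumPy sketch with a hypothetical 4-word vocabulary (the words and vocabulary size are just for illustration). With index encoding, alphabetically adjacent words look numerically closer; with one-hot encoding, every pair of distinct words is equally far apart.

```python
import numpy as np

# Hypothetical vocabulary, alphabetically ordered.
vocab = ["apple", "banana", "cherry", "date"]

# Index encoding: "apple" (0) and "banana" (1) look closer to each other
# than "apple" and "date" (3), purely because of alphabetical order.
idx = {w: i for i, w in enumerate(vocab)}
dist_ab = abs(idx["apple"] - idx["banana"])  # 1
dist_ad = abs(idx["apple"] - idx["date"])    # 3

def one_hot(word, vocab):
    """Return a vector of zeros with a 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# One-hot encoding: the Euclidean distance between any two distinct
# words is sqrt(2), so no spurious ordering is implied.
d_ab = np.linalg.norm(one_hot("apple", vocab) - one_hot("banana", vocab))
d_ad = np.linalg.norm(one_hot("apple", vocab) - one_hot("date", vocab))
print(dist_ab, dist_ad)      # 1 3
print(np.isclose(d_ab, d_ad))  # True
```

The model never sees the arbitrary dictionary ordering, which is exactly the point.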

Would you not be able to achieve the same result by storing a single index?

No, because that would imply a linear relationship between the feature values.

I'm really sorry to keep at this; I'm just trying to understand the reasoning, not suggesting that you're wrong or anything.

I didn’t quite get the reason for the linear relationship.

For example, let's imagine we have a one-hot vector of length 4. In the case of a value like <0, 1, 0, 0>, wouldn't we achieve the same thing by saving the index that is "on"? In this case, by saving "1".

I'm looking at this from a memory-efficiency perspective. Obviously in this case the index is larger, but for very large vectors, 32-bit unsigned integers would be efficient.

These two representations can easily be converted back and forth, which makes me assume it's possible; the interface wouldn't even have to change and could still be treated as a one-hot vector. Maybe the existing frameworks already do this? I'm curious.
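The round trip the poster describes is indeed lossless. A minimal sketch (vocabulary size is an arbitrary assumption):

```python
import numpy as np

vocab_size = 4

def index_to_one_hot(i, n):
    """Expand a stored index into its one-hot vector."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def one_hot_to_index(v):
    """Recover the stored index: position of the single 1."""
    return int(np.argmax(v))

v = index_to_one_hot(1, vocab_size)
print(v)                      # [0. 1. 0. 0.]
print(one_hot_to_index(v))    # 1
```

So storage and interface are separable: you can store the index and expand to one-hot only when the math needs it.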


Yes, you could do that. It is a "sparse matrix". It is efficient for storage, but not for math calculations.
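And frameworks do exploit this: multiplying a weight matrix by a one-hot vector is mathematically just selecting one row, so the dense math can be replaced by an index lookup (this is essentially what embedding layers do). A minimal NumPy sketch, with an arbitrary toy weight matrix:

```python
import numpy as np

vocab_size, dim = 5, 3
# Toy weight matrix: one row per vocabulary word.
W = np.arange(vocab_size * dim, dtype=float).reshape(vocab_size, dim)

i = 2
one_hot = np.zeros(vocab_size)
one_hot[i] = 1.0

# Dense math: full vector-matrix product over mostly zeros.
dense_result = one_hot @ W

# Sparse shortcut: a one-hot product is just a row lookup by index.
sparse_result = W[i]

print(np.array_equal(dense_result, sparse_result))  # True
```

The lookup gives the same result without ever materializing the zeros, which is why storing only the index works in practice while the math still behaves as if a one-hot vector were used.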


Ah yes, I see what I was missing now, thanks!