Model embedding layer dim


In Lab1 I tried dim = 8 and got the same result.
How do I choose the correct embedding layer dim?


Hello @Taras_Buha

Thanks for reaching out.

There is no “right” answer to this question; there are many views on how to choose the number of embedding dimensions.

For example, this Google developer blog post says:

Well, the following “formula” provides a general rule of thumb about the number of embedding dimensions:

embedding_dimensions =  number_of_categories**0.25

The embedding vector dimension should be the 4th root of the number of categories.
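As a quick illustration, here is a minimal sketch of that rule of thumb in Python. The function name `rule_of_thumb_embedding_dim`, the rounding to the nearest integer, and the example vocabulary sizes are my own assumptions, not something the blog post prescribes.

```python
def rule_of_thumb_embedding_dim(number_of_categories: int) -> int:
    """Return the 4th root of the number of categories as a whole number.

    Rounding to the nearest integer (and never going below 1) is an
    assumption made here so the result can be used as a layer size.
    """
    if number_of_categories < 1:
        raise ValueError("number_of_categories must be at least 1")
    return max(1, round(number_of_categories ** 0.25))


if __name__ == "__main__":
    # Hypothetical vocabulary sizes, chosen only for illustration.
    for vocab_size in (100, 4_096, 10_000, 1_000_000):
        dim = rule_of_thumb_embedding_dim(vocab_size)
        print(f"{vocab_size} categories -> embedding dim {dim}")
```

Treat the result as a starting point for experiments rather than a fixed answer.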

The most important thing is to keep the following guidelines in mind:

  1. An embedding layer is a compression of the input. When the layer is smaller, you compress more and lose more information. When the layer is bigger, you compress less and may overfit your input dataset in this layer, making it useless.
  2. If your documents are very sparse relative to the vocabulary, you want to “get rid” of unnecessary and noisy words, so you should compress more: make the embedding smaller.
  3. The larger your vocabulary, the better a representation of it you want, so make the layer larger.
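To make the size trade-off in these guidelines concrete, here is a small sketch that counts the weights in an embedding table, which is vocabulary size times embedding dimension. The helper name `embedding_parameter_count`, the vocabulary size, and the candidate dims are assumptions picked for illustration only.

```python
def embedding_parameter_count(vocab_size: int, embedding_dim: int) -> int:
    """Number of trainable weights in an embedding table.

    The table holds one vector of length embedding_dim per vocabulary
    entry, so its size is vocab_size * embedding_dim.
    """
    return vocab_size * embedding_dim


if __name__ == "__main__":
    vocab_size = 10_000  # hypothetical vocabulary size
    for dim in (8, 16, 64, 256):  # hypothetical candidate dims
        params = embedding_parameter_count(vocab_size, dim)
        print(f"dim={dim}: {params} embedding weights")
```

A larger dim gives the layer more capacity to fit the training data, which is where the overfitting risk in point 1 comes from; comparing validation results across a few candidate dims is a reasonable way to choose.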

These are some recommendations from Tolik; note that they are only general guidelines, and you can set the number of embedding dimensions as you please.

Hope this helps :muscle:

With regards,


Hello @adonaivera

Thank you very much for such useful information.
Very helpful answer.

Best regards, Taras
