Hi,
I’m working on a personal project to improve the sentiment analysis model (Emojifier V2) using a bidirectional LSTM and an attention mechanism. Does this model seem logical and on the right track?
Thanks,
I’m not very familiar with the assignment, but what is the meaning of the arrows crossing into each of the attention blocks? Attention in transformers is a matrix multiplication!
Please see C5W3A1 (Neural Machine Translation) assignment and observe the similarity with your architecture.
I would question, and experiment with, why you need T_x attention layers that all accept the same set of inputs.
Also, if more than one attention layer is needed, you need a layer to combine their outputs, such as a concatenation, before the final FC layer (see the sketch below).
Arrows show the direction of information flow.
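For illustration, here is a minimal Keras sketch of that merging step. The layer sizes, the choice of two attention layers, and the use of `layers.Attention` are my assumptions, not taken from your diagram:

```python
import tensorflow as tf
from tensorflow.keras import layers

T_x, n_features, n_classes = 10, 50, 5                 # assumed dimensions

inputs = layers.Input(shape=(T_x, n_features))
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inputs)

# Two attention layers receiving the same inputs (query = value = h)
att1 = layers.Attention()([h, h])
att2 = layers.Attention()([h, h])

# Combine the attention outputs before the FC layer, e.g. with Concatenate
merged = layers.Concatenate()([att1, att2])
pooled = layers.GlobalAveragePooling1D()(merged)
outputs = layers.Dense(n_classes, activation="softmax")(pooled)

model = tf.keras.Model(inputs, outputs)
```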
Thanks for your observations. You’re right that they all accept the same set of inputs, so just one is needed. Also, how would you go about improving the performance of C4W2A1 without using transformers? Trying to answer this question is what led me to a Bi-LSTM with an attention mechanism. What do you think of this approach, or what do you suggest?
@gent.spah, @balaji.ambresh
Btw, putting dropout before softmax has an effect, but I seldom do it, so I can’t comment on that.
Did you mean C5 W2 A1? Even so, I am not sure that assignment was about sentiment analysis. The title of that assignment is “Operations on Word Vectors” and the only task is “Word Analogy Task” in section 4, so I have no idea what model you are comparing your architecture against.
Your current architecture uses many popular techniques, which is definitely a good start, but beyond that, frankly, I am not good at just telling which architecture will work best for a given dataset. To me, it is always a series of experiments and an understanding of the data.
However, one detail is that, if you have a good amount of computational resources and data, you might start with a bigger version of your architecture.
Sorry, I meant C5W2A2. I’m trying to practice by building an algorithm that takes in a sentence (e.g., “I didn’t enjoy my food”) and outputs a sentiment or an emoji (e.g., disappointed).
The course used an LSTM; its performance isn’t great, but it is better than averaging word embeddings. Andrew wanted to show us how LSTMs are better than just averaging word embeddings.
I’m taking it upon myself to improve performance by using a Bi-LSTM and an attention mechanism. This led me to create the simple architecture shown, and I posted it here to get suggestions and to know if I’m off to a good start… Btw, thanks for your feedback.
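To make the discussion concrete, here is a minimal sketch of the Bi-LSTM + attention idea in Keras. The dimensions (50-dim embeddings, max length 10, 5 emoji classes) and the simple additive self-attention are my assumptions, not necessarily your exact diagram:

```python
import tensorflow as tf
from tensorflow.keras import layers

max_len, emb_dim, n_classes = 10, 50, 5

inputs = layers.Input(shape=(max_len, emb_dim))            # embedded sentence
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)

# Simple additive self-attention: score each timestep, softmax over time,
# then take the weighted sum of the Bi-LSTM states as a context vector.
scores = layers.Dense(1, activation="tanh")(h)             # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)                   # attention over time
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

context = layers.Dropout(0.5)(context)
outputs = layers.Dense(n_classes, activation="softmax")(context)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Note that a single attention layer producing one context vector is enough here; there is no need for T_x separate attention blocks.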
Hello, @francktchafa,
Thanks for the additional background.
Now I think you are comparing with the assignment’s Emojifier V2. First, I take back my comment about putting dropout before softmax. When I commented, I thought there was no Dense layer in between, but now I believe your diagram was based on that assignment, which indeed has a Dense layer.
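For reference, this is roughly how I remember the assignment’s classification head (exact sizes are assumptions): dropout is applied to the last LSTM state, then the Dense layer, then softmax.

```python
from tensorflow.keras import layers

def emojifier_head(last_lstm_state):
    x = layers.Dropout(0.5)(last_lstm_state)  # dropout before the Dense layer
    x = layers.Dense(5)(x)                    # the Dense layer in between
    return layers.Activation("softmax")(x)    # softmax applied after Dense
```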
Second, I very much agree with practicing those techniques. If I were you, I would stick to the one-change-at-a-time principle. You are going from two layers of LSTMs to one layer of Bi-LSTM plus attention, so it may not be easy to tell what contributes more to the improvement. I would also keep track of the number of trainable parameters across all the models you experiment with, to understand whether the improvements come from more parameters or from the architectural change (see the snippet below).
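A quick way to track that (the two model variables here are hypothetical placeholders for your experiments):

```python
import numpy as np

def n_trainable(model):
    # Count only the trainable parameters of a Keras model
    return int(sum(np.prod(w.shape.as_list()) for w in model.trainable_weights))

# baseline_model and bilstm_attention_model are hypothetical models to compare
for name, m in [("2x LSTM baseline", baseline_model),
                ("Bi-LSTM + attention", bilstm_attention_model)]:
    print(f"{name}: {n_trainable(m):,} trainable parameters")
```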
Besides, though the following may be beyond what you asked, I would inspect the failed examples and reason about whether any change to the model configuration should help. Lastly, it may also be interesting to single out some examples and train some sub-models for more focused investigation. At all times, the bias/variance checks covered in C3 should be in place to know whether the current architecture is fully utilized, with a good balance of size and regularization.
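For the bias/variance part, a minimal check is simply to compare training and dev accuracy (the dataset variables here are hypothetical, already-prepared splits):

```python
# X_train/Y_train and X_dev/Y_dev are hypothetical, already-prepared splits
train_loss, train_acc = model.evaluate(X_train, Y_train, verbose=0)
dev_loss, dev_acc = model.evaluate(X_dev, Y_dev, verbose=0)
print(f"train acc: {train_acc:.3f}, dev acc: {dev_acc:.3f}")
# Low train accuracy  -> high bias: try a bigger model or longer training.
# Large train-dev gap -> high variance: add regularization or more data.
```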
I think your architecture is a good start, and I recommend the steps above because I believe it is always a series of experiments, particularly when it comes to practicing. If I ran out of ideas for more dedicated architectures/approaches, I would do some research, but it seems to me you are trying a simple architecture built from techniques covered in these courses, so it is a good start to me.
The only unfortunate thing is that I have never had the motivation to improve it myself, so my response cannot be more insightful for this particular dataset. In any case, if you want to share your findings here, please feel free to.
Cheers,
Raymond
Thank you for your valuable insights. I’m committed to continuous improvement and enjoy practicing to enhance my skills. I’ll experiment and keep you updated.