Some personal observations/thoughts with regard to detecting data bias and sentiment analysis…
In the Practical Data Science course (C1-W2), the Women’s Clothing data set includes a review such as “Product was damaged upon arrival”. This feedback is more relevant to the delivery service than to the product itself.
I’m sure there are plenty of similar examples where negative reviews relate not to the product itself but to the ordering process, customer support experience, etc.
If overall sentiment is being analysed, then fair enough. However, an inquisitive mind will naturally want to dive deeper. If we’re interested purely in product quality, then reviews should be vetted to ensure relevance.
I think this is an area where e.g. enhanced data structures and/or prompt engineering could really help to identify which aspects of a business operation potentially need to be improved upon.
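To make the vetting idea concrete, here is a rough sketch of tagging each review with the business aspect it seems to be about. It is only a keyword lookup, not something from the course; the `ASPECT_KEYWORDS` lists and the `tag_aspects` helper are my own hypothetical names, and a real solution would more likely use a trained classifier or a prompted language model.

```python
import re

# Hypothetical keyword lists, chosen by hand for illustration only.
ASPECT_KEYWORDS = {
    "delivery": ["arrival", "arrived", "shipping", "delivery", "package"],
    "customer_support": ["support", "refund", "customer service"],
    "ordering": ["checkout", "website", "order"],
    "product": ["fabric", "fit", "size", "colour", "color", "quality", "stitching"],
}

def tag_aspects(review):
    """Return the aspects whose keywords appear as whole words in the review."""
    text = review.lower()
    found = [
        aspect
        for aspect, words in ASPECT_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(word) + r"\b", text) for word in words)
    ]
    # Reviews that match no keyword are flagged for manual inspection.
    return found or ["unclassified"]

print(tag_aspects("Product was damaged upon arrival"))
```

With tags like these, reviews that only touch delivery or support could be set aside before analysing product-quality sentiment.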
A final thought: an important business objective should be to maximise positive sentiment across all aspects of the business (product quality, ordering system, delivery, customer support, etc.). But with specific relevance to product quality, if the majority of reviews (75%) have positive sentiment, then that is a good thing. Yet such a dataset is viewed as biased. The focus should be on identifying the negative sentiment reviews and trying to improve product quality from there. Having more granular data about the consumers should help to differentiate between positive and negative sentiment.
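As a small illustration of looking at sentiment per aspect rather than overall, the sketch below computes the share of positive reviews for each aspect. The rows are made up for the example (not taken from the Women’s Clothing data set), the function name is hypothetical, and I am assuming sentiment is coded as 1, 0 or -1.

```python
from collections import Counter, defaultdict

# Made-up (aspect, sentiment) pairs purely for illustration.
# Assumed coding: 1 = positive, 0 = neutral, -1 = negative.
labelled = [
    ("product", 1), ("product", 1), ("product", 1), ("product", -1),
    ("delivery", -1), ("delivery", 1),
]

def positive_share_by_aspect(rows):
    """Return the fraction of positive reviews for each aspect."""
    counts = defaultdict(Counter)
    for aspect, sentiment in rows:
        counts[aspect][sentiment] += 1
    return {
        aspect: counter[1] / sum(counter.values())
        for aspect, counter in counts.items()
    }

shares = positive_share_by_aspect(labelled)
print(shares)
```

A breakdown like this shows where the negative reviews sit, which is where the improvement effort would go.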
Welcome people’s thoughts.
Chris.