I need to know why the log-log scatter plot is useful for visualizing the word frequencies in this lab.
Thank you
Hi Ahmed_Mohamed,
The visualization shows how often each word is used in negative tweets versus positive tweets. Words on the red line occur as often in negative tweets as in positive tweets. Words in the triangle above the red line appear more often in negative tweets than in positive tweets, and words in the triangle below the red line appear more often in positive tweets than in negative tweets.
The reason for using a log-log scale is given in the assignment: “Instead of plotting the raw counts, we will plot it in the logarithmic scale to take into account the wide discrepancies between the raw counts (e.g. :) has 3691 counts in the positive while only 2 in the negative)”. Taking logarithms compresses that range, so rare words and very frequent words fit in the same picture instead of the rare ones being squashed into one corner.
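To make the effect concrete, here is a minimal sketch of the transformation, not the lab's actual code. Only the counts for :) come from the assignment text quoted above; the other words and their counts are made up for illustration. Adding 1 before taking the log (so a count of 0 does not break it) and putting positive counts on the x axis and negative counts on the y axis are my assumptions, chosen so that "above the line" means "more negative" as described above.

```python
import math

# (positive_count, negative_count) per word.
# Only the ":)" row is from the assignment text; the rest are made-up examples.
counts = {
    ":)": (3691, 2),
    "happy": (212, 25),    # hypothetical
    "sad": (5, 100),       # hypothetical
    "the": (1500, 1500),   # hypothetical
}

def log_point(pos, neg):
    """Map raw counts to log-log coordinates.

    log1p(n) computes log(n + 1); the +1 is an assumption that keeps
    words with a count of 0 plottable.
    """
    return math.log1p(pos), math.log1p(neg)

# x = log(positive + 1), y = log(negative + 1) (assumed axis assignment)
points = {word: log_point(pos, neg) for word, (pos, neg) in counts.items()}

for word, (x, y) in points.items():
    if x == y:
        side = "on the red line"
    elif y > x:
        side = "above the line (more negative)"
    else:
        side = "below the line (more positive)"
    print(f"{word:>6}  x={x:.2f}  y={y:.2f}  {side}")
```

On raw counts the axes would have to run from 2 up to 3691 just to show :), while after the log the same point sits at roughly (8.2, 1.1). Pairs like these can then be passed to any scatter plot routine together with a y = x reference line.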
Thank you @reinoudbosch . It is clear now.