A Naive Way to Analyse Competitors Using WordCloud

How to obtain competitor tweets & display them nicely using a word cloud

Posted by Dwi Hadyan Harsono on Sept 29, 2021
TL;DR: we collected our competitors' tweets to find out what they are talking about. We tried a word cloud, but it was dominated by stopwords, so no useful insight could be extracted. Message me on LinkedIn if you think I've missed anything.

Scenario

What I initially wanted to achieve was to 'snoop' on what our competitors are doing: collect their tweets via the Twitter API and create a word cloud to see what they talk about.

Of course, we will follow the OSEMN framework (Obtain, Scrub, Explore, Model, iNterpret). So, let's begin.

Obtain

We obtained the data from the Twitter API. You need to get an access token first here.

...

First, decide which data you want to get from Twitter (retweets, likes, tweets, etc.), then go to their GitHub to get the boilerplate code here (thanks, Twitter devs!). We will analyse a few of the biggest retail players in Indonesia (Ace, Shopee, Tokopedia and Transmart) by collecting their raw tweets.

Scrub

Remove punctuation and stopwords, which add no value to the tweet text. An Indonesian stopword list can be found here. Finally, tokenise the tweets.
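A minimal sketch of the scrubbing step could look like this. The stopword set below is a tiny, hypothetical subset used only for illustration (in practice you would load the full Indonesian list linked above), and stripping URLs and @mentions before removing punctuation is an extra assumption added here, since otherwise links collapse into junk tokens like `httpstcoxyz`.

```python
import re
import string

# Hypothetical, tiny subset of Indonesian stopwords for illustration only;
# swap in the full list in practice.
STOPWORDS_ID = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "ada"}

# Extra assumption: drop URLs and @mentions before stripping punctuation.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def scrub(tweet, stopwords=STOPWORDS_ID):
    """Lowercase, remove URLs/mentions and ASCII punctuation, tokenise on whitespace, drop stopwords."""
    text = URL_OR_MENTION.sub(" ", tweet.lower())
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()
    return [t for t in tokens if t not in stopwords]
```

For example, `scrub("Promo hari ini di @tokopedia! Cek https://t.co/xyz")` keeps only `promo`, `hari` and `cek`.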

Explore

...

Get a quick overview of the remaining data with a simple word count. We noticed that most of what remains are still stopword-like words that only carry meaning within a full sentence.
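A quick word count like this can be done with `collections.Counter`. The sketch assumes each tweet has already been scrubbed into a list of string tokens; `top_words` is a hypothetical helper name.

```python
from collections import Counter


def top_words(token_lists, n=10):
    """Count tokens across all tweets and return the n most frequent (word, count) pairs."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return counts.most_common(n)
```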

Model

...

Still, out of curiosity, we went ahead and generated a word cloud using a function we wrote, 'wordcloud_generator'.
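The post doesn't show `wordcloud_generator` itself, so the following is only a hypothetical re-creation, not the author's function. It assumes the third-party `wordcloud` package (its `WordCloud.generate_from_frequencies` and `to_file` methods) and token lists like the ones produced during scrubbing; `min_count`, `max_words` and the output path are illustrative parameters added for this sketch.

```python
from collections import Counter


def wordcloud_generator(token_lists, out_path="wordcloud.png", min_count=2, max_words=100):
    """Hypothetical sketch: count tokens, drop rare words, render a word cloud PNG."""
    freqs = Counter(tok for tokens in token_lists for tok in tokens)
    # Drop words seen fewer than min_count times so one-offs don't clutter the cloud.
    freqs = {w: c for w, c in freqs.items() if c >= min_count}
    if not freqs:
        raise ValueError("no words left to plot; lower min_count or check the scrubbing step")
    # Third-party package (pip install wordcloud), imported only when actually rendering.
    from wordcloud import WordCloud
    wc = WordCloud(width=800, height=400, max_words=max_words, background_color="white")
    wc.generate_from_frequencies(freqs)
    wc.to_file(out_path)
    return wc
```

Feeding it frequencies instead of raw text means the cloud reflects exactly the tokens left after scrubbing, rather than re-tokenising with the library's own rules.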

iNterpret

  • As expected, the remaining words only carry meaning when they appear in complete sentences.
  • I also had a "listen to your customers" idea that would change the process entirely: the user types in a product name and gets back all tweets about that product. The issue is that the results are too 'noisy'. For example, searching 'tupperware' on Twitter returns jokes that use the word tupperware and nothing about the Tupperware product itself.
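The "type a product name, get its tweets" idea could be sketched as a query against the Twitter API v2 recent-search endpoint. This is an assumption-laden illustration: `build_product_search_url` is a hypothetical helper, and the `lang:` and `-is:retweet` operators only trim retweets and other languages. They won't tell a Tupperware joke apart from a Tupperware review.

```python
import urllib.parse

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_product_search_url(product, lang="id", max_results=100):
    """Hypothetical helper: exact-phrase product query, one language, no retweets."""
    # Quoting the product name asks for the exact phrase; -is:retweet drops
    # retweets, which mostly duplicate the original tweet text.
    query = f'"{product}" lang:{lang} -is:retweet'
    params = {"query": query, "max_results": max_results}
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
```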

Final words from Dwi

You can find the code here.

Perhaps we can improve this by computing sentiment scores instead and ranking tweets from most negative to most positive (to find competitors' weaknesses?). Stay tuned for the second episode (jeng jeng jenggg)!
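To make the ranking idea concrete, here is a toy sketch that scores tokenised tweets against a tiny hand-made lexicon and sorts them from most negative to most positive. The lexicon, its weights and the function names are hypothetical placeholders; a real version would swap in a proper Indonesian sentiment lexicon or model.

```python
# Hypothetical toy lexicon (Indonesian words with hand-picked weights), illustration only.
TOY_LEXICON = {"bagus": 1, "mantap": 1, "cepat": 1, "kecewa": -1, "lambat": -1, "buruk": -1}


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Sum the weights of known words; unknown words count as 0."""
    return sum(lexicon.get(t, 0) for t in tokens)


def rank_by_sentiment(token_lists, lexicon=TOY_LEXICON):
    """Return (score, tokens) pairs sorted from most negative to most positive."""
    scored = [(lexicon_score(toks, lexicon), toks) for toks in token_lists]
    return sorted(scored, key=lambda pair: pair[0])
```

The most negative tweets land at the top of the list, which is where complaints about a competitor would show up first.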