An Introduction To Twitter Data Analysis in Python - Paper - 63
net/publication/308371781
1 author:
Vivek Wisdom
Deloitte & Touche Llp
All content following this page was uploaded by Vivek Wisdom on 21 September 2016.
lang: acronym of the tweet language, like 'en',
created_at: date of creation of the tweet,
favorite_count: number of favorites of the tweet,

Counting term frequencies is one of the simplest steps in Twitter data analysis. With it, we can analyze what a particular user frequently tweets about. One use case is that advertising companies can provide targeted ads based on a user's term frequencies, which makes it more likely that the user clicks through to or visits the promoted website.

The code below is an example of collecting all the terms whose frequencies are to be counted.

terms_only = [term for term in preprocess(tweet['text'])
              if term not in stop
              and not term.startswith(('@', '#'))]

In the above code snippet, we list all the terms in the preprocessed tweet text that are not in the stop-word list stop and do not start with @ or #. We can use the Counter class from Python's collections module to count the occurrences of the terms and list them alongside their counts. Here are the 20 most used terms on my personal user timeline. [5]

4.3.Most Used Hashtags

Hashtags are one of the most frequently used features of Twitter. They are used to represent the most recent happenings in the world. Using hashtags effectively, we can find many useful pieces of information, such as the number of tweets for particular hashtags, which is commonly used to compare the Twitter battle between teams in most sports.

In the current dataset, i.e. my_tweets.json, we will try to find the most used #hashtags. The sample code below generates all the hashtags used in the tweet dataset:

terms_hash = [term for term in preprocess(tweet['text'])
              if term.startswith('#')]

Below are the most used hashtags in my user timeline tweet data:
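The counting step for terms and hashtags described in this section can be sketched as follows. This is a minimal illustration rather than the paper's exact code. The preprocess function and the stop set here are simplified, hypothetical stand-ins for the ones defined earlier in the paper, and the three sample tweets are made up in place of the contents of my_tweets.json.

```python
from collections import Counter

# Hypothetical stand-in for the paper's preprocess(): lowercases the text,
# splits on whitespace and trims trailing punctuation from each token.
def preprocess(text):
    tokens = (tok.rstrip(".,!?;:") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

# Hypothetical, very small stop-word set; the paper's `stop` is larger.
stop = {"the", "a", "is", "to", "rt", "and", "of", "in"}

# Made-up sample tweets standing in for the records in my_tweets.json.
tweets = [
    {"text": "The transfer window is closing #DeadlineDay"},
    {"text": "RT @club: Big transfer news #DeadlineDay #Football"},
    {"text": "Transfer rumours and more rumours #DeadlineDay"},
]

count_terms = Counter()
count_hash = Counter()
for tweet in tweets:
    tokens = preprocess(tweet["text"])
    # Plain terms: no stop words, no @mentions, no #hashtags.
    terms_only = [t for t in tokens
                  if t not in stop and not t.startswith(("@", "#"))]
    # Hashtags only.
    terms_hash = [t for t in tokens if t.startswith("#")]
    count_terms.update(terms_only)
    count_hash.update(terms_hash)

# most_common(n) returns (item, count) pairs, highest count first.
print(count_terms.most_common(20))
print(count_hash.most_common(5))
```

Keeping one Counter for plain terms and another for hashtags lets both rankings be built in a single pass over the tweets.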
5.2.Bigrams Terms
Bigrams in this #DeadlineDay data show that the terms David and Luiz were used together most frequently, as together they form the complete name of the footballer from Brazil.
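Counting bigrams can be sketched in the same way as counting single terms. The example below is only an illustration under stated assumptions: the bigrams helper is a hypothetical function built with zip, which may differ from the helper the paper actually used, and the token lists are made-up samples rather than the #DeadlineDay dataset.

```python
from collections import Counter

def bigrams(tokens):
    """Return the adjacent token pairs of a list (hypothetical helper)."""
    return list(zip(tokens, tokens[1:]))

# Made-up, already preprocessed tweets (stop words removed).
tweets_tokens = [
    ["david", "luiz", "returns", "chelsea"],
    ["chelsea", "sign", "david", "luiz"],
    ["david", "luiz", "deal", "done"],
]

count_bigrams = Counter()
for tokens in tweets_tokens:
    # Each tweet contributes its own adjacent pairs; pairs never span tweets.
    count_bigrams.update(bigrams(tokens))

print(count_bigrams.most_common(3))
```

Because a name such as David Luiz always appears as the same adjacent pair, it accumulates a higher count than incidental pairs of words.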