Nandoo's Blog: Extracting Twitter Data using R

Lately I have been reading about, and spending quite a bit of time on, Big Data technologies (HDFS, Pig, Hive, Impala, etc.), Oracle Data Visualization Desktop (Oracle DVD) and R.

To try out Big Data techniques, I have been looking around for large data sets, and got this crazy idea: extract Twitter data, analyze it using Oracle DVD and generate some cool visuals.

But then I was stuck: I did not know how to pull data out of Twitter.

Googling around, I found this excellent post, which details the step-by-step process for extracting Twitter data using R.

The prerequisites are:

– R (version 3.3) installed on your desktop.

– A Twitter account, which you need in order to create a Twitter application.

Steps to Create a Twitter Application

Navigate to My Applications in the upper right hand corner.

Create a new application.

Fill out the new app form. Names should be unique, i.e., no one else should have used this name for their Twitter app. Give a brief description of the app. You can change this later on if needed. Enter your website or blog address. Callback URL can be left blank. Once you’ve done this, make sure you’ve read the “Developer Rules Of The Road” blurb, check the “Yes, I agree” box, fill in the CAPTCHA and click the “Create Your Twitter Application” button.

Scroll down and click the “Create my access token” button.

Note the values of the consumer key, consumer secret, access token and access token secret, and keep them handy; the authentication step later needs all four. Keep these values secret: if anyone were to get hold of them, they could effectively access your Twitter account.

Install and Load Required Packages

R comes with a standard set of packages. A number of other packages are available for download and installation. For the purpose of this post, we will need the following packages:

– ROAuth: Provides an interface to the OAuth 1.0 specification, allowing users to authenticate via OAuth to the server of their choice.

– twitteR: Provides an interface to the Twitter web API.

Install and load all the required packages:

install.packages("twitteR")
install.packages("ROAuth")
library("twitteR")
library("ROAuth")
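Note that install.packages() downloads and reinstalls a package every time it runs, even if it is already installed. If you re-run the script often, a minimal sketch like the one below installs only what is missing and then loads everything. ensure_packages() is a hypothetical helper, not part of any package; the list also includes base64enc, which the authentication step needs.

```r
# Hypothetical helper: install only the packages that are not yet
# installed, then load all of them.
ensure_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) install.packages(to_install)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(to_install)
}

# Usage:
# ensure_packages(c("twitteR", "ROAuth", "base64enc"))
```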

Creating Twitter Authentication Process

The procedure worked for most of the steps, except for Twitter authentication. Using "TwitterOAuth" for authentication did not work for me, so I had to replace that step with the following:

install.packages("base64enc")
library("base64enc")
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_token_secret)

Here consumer_key, consumer_secret, access_token and access_token_secret are variables that you must define first, assigning them the values generated for your Twitter app. Note that R is case-sensitive, so the names must match exactly.
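One way to keep the keys secret is to keep them out of the script altogether and read them from environment variables. The following is a minimal sketch; the helper read_twitter_credentials() and the environment variable names are hypothetical, so adapt them to your own setup.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables so they never appear in the script itself. The variable names
# below (TWITTER_CONSUMER_KEY, etc.) are assumptions.
read_twitter_credentials <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  # Fail early with a clear message instead of a confusing OAuth error later
  empty <- values == ""
  if (any(empty)) {
    stop("Missing environment variable(s): ", paste(vars[empty], collapse = ", "))
  }
  as.list(values)
}

# Usage (needs network access and valid keys):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

You can set these variables in a .Renviron file in your home directory, which R reads at startup.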

After this, everything worked fine: I was able to connect to Twitter and extract the data I was looking for.

Extract Tweets

From the R console, search Twitter for a hashtag (here #bigdata) and store up to 100 matching tweets in the list "tweets":

> tweets <- searchTwitter("#bigdata", n=100)

To verify the contents of the extract, print the first two tweets:

> print(head(tweets,2))
[[1]]
[1] "smuddu: Session at #gitpro2017 on  how to get ROI from Bigdata and ML? @CIOonline @strataconf #MachineLearning @bigdata #bigdata @hadoop @awscloud"

[[2]]
[1] "alevergara78: RT @bigdata: . @AnimaAnandkumar on distributed deep learning using MXNet \xed��\xed�\u008f  \xed��\xed�\u0092 #deeplearning sessions at #stratahadoop San Jose https://t.co…"
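The escape sequences in the second tweet most likely stand for emoji in the original tweet. If they get in the way of later analysis, one common cleanup, sketched here as a hypothetical helper strip_non_ascii(), is to drop every character that has no ASCII equivalent using base R's iconv(). This also removes accented letters, so it mainly suits English-only analysis.

```r
# Hypothetical helper: remove characters (emoji, accented letters, etc.)
# that cannot be converted to ASCII. sub = "" replaces each
# non-convertible byte with nothing.
strip_non_ascii <- function(texts) {
  iconv(texts, from = "UTF-8", to = "ASCII", sub = "")
}

# Usage:
# strip_non_ascii(c("caf\u00e9 #bigdata"))
```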

The following steps redirect R's printed output to a file, dumping the contents of the list "tweets" into it:

> sink("c:/Users/mah/Documents/tweets.txt")
> print(tweets)
> sink()
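sink() captures exactly what print() shows, including the [[1]] index lines. If you only need the tweet text, one tweet per line, a sketch like the following may be easier to load into other tools. It assumes, as the twitteR documentation describes, that each status object returned by searchTwitter() has a getText() method; write_tweet_texts() is a hypothetical helper.

```r
# Hypothetical helper: extract the text of each status object and write
# one tweet per line to the given file.
write_tweet_texts <- function(statuses, path) {
  texts <- vapply(statuses, function(s) s$getText(), character(1))
  # Tweets can contain line breaks; flatten them so each tweet stays on one line
  texts <- gsub("[\r\n]+", " ", texts)
  writeLines(texts, path, useBytes = TRUE)
  invisible(texts)
}

# Usage with the search results:
# write_tweet_texts(tweets, "c:/Users/mah/Documents/tweets.txt")
```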

The next step is to analyze the data using Oracle DVD; that's for another post.

2017/02/19