We know that computers understand programming languages, but how about making them understand human language, the language that you and I speak? Natural Language Processing (NLP)...
By: Rajendra Sharma | April 24, 2015
A word cloud is a graphical representation of the most frequently used words in a collection of text data. The size (height) of each word in the visualization indicates how often that word occurs in the entire text.
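To make the idea of word frequency concrete, here is a minimal base-R sketch that counts how often each word occurs in a short, made-up sentence. The sample text and the variable names are illustrative assumptions, not part of the data used later in this post. Counts like these are what a word cloud maps to word size.

```r
# Hypothetical sample text, used only for illustration
sample_text <- "the cat sat on the mat and the dog sat too"

# Split on whitespace to get the individual words
words <- strsplit(sample_text, "\\s+")[[1]]

# Count the occurrences of each word and sort from most to least frequent
word_counts <- sort(table(words), decreasing = TRUE)

print(word_counts)
```

In a word cloud built from this sentence, "the" would be drawn largest, followed by "sat".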
Why do we need text analytics?
Analytics is the science of processing raw information to bring out meaningful insights. This raw information can come from a variety of sources and may be structured, semi-structured, or unstructured. Free text is unstructured, so it has to be converted into a structured form before it can be analyzed.
Step-by-step coding in R:
Building the word cloud requires some packages that might not be pre-installed with R. You need to install the text mining package tm and the wordcloud package, and then load both with library().
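One possible way to handle the installation is sketched below. The helper name missing_packages is hypothetical, and listing SnowballC is an assumption: it is only needed if you stem words with stemDocument() from tm.

```r
# Packages used in this walkthrough; SnowballC is an assumption here,
# needed only if stemming with stemDocument() is used
required_pkgs <- c("tm", "wordcloud", "SnowballC")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)

# Install only in an interactive session, so that a script does not
# download packages unexpectedly
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After installation, library(tm) and library(wordcloud) load the packages for the current session.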
Note: In this process we fetch the data from the web in the form of a .csv file.
Step 1: Fetch the data from the web in the form of a .csv file.
Step 2: Create a corpus from the collection of text.
Step 3: Perform the data-processing transformations on the text (lower-casing, removing punctuation, numbers, and stop words, stripping whitespace, and optionally stemming).
Step 4: Create structured data from the text.
Step 5: Make the word cloud using the structured form of the data.
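To show what the Step 3 transformations do to a piece of text, here is a rough base-R sketch applied to a single made-up string. The function name clean_text, the sample string, and the two-word stop-word list are illustrative assumptions; the tm functions work on a whole corpus and use a much longer English stop-word list.

```r
# Hypothetical raw text, used only for illustration
raw_text <- "The 3 Quick   brown foxes, and the 2 lazy dogs!"

# A tiny hand-picked stop-word list (an assumption for this sketch)
stop_words <- c("the", "and")

clean_text <- function(x, stop_words) {
  x <- tolower(x)                          # convert to lower case
  x <- gsub("[[:punct:]]", "", x)          # remove punctuation
  x <- gsub("[[:digit:]]", "", x)          # remove numbers
  # remove stop words that appear as whole words
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)                # collapse repeated whitespace
  trimws(x)                                # drop leading/trailing spaces
}

cleaned <- clean_text(raw_text, stop_words)
print(cleaned)
```

Only the content words survive the cleaning, which is why stop words such as "the" do not dominate the final word cloud.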
library(tm)
library(wordcloud)
# stemDocument() below also needs the SnowballC package to be installed

# download the data from the web ("Url Path" is a placeholder for the .csv URL)
Sample_data <- read.csv("Url Path", stringsAsFactors = FALSE)

# stack all columns into a single-column data frame
df <- do.call("rbind", lapply(Sample_data, as.data.frame))

# check the dimensions
dim(df)

# build a corpus with one document per row
myCorpus <- Corpus(VectorSource(as.character(df[[1]])))

# convert to lower case; wrapping tolower in content_transformer() keeps the
# documents valid, so a separate PlainTextDocument step is not needed
myCorpus <- tm_map(myCorpus, content_transformer(tolower))

# remove punctuation
myCorpus <- tm_map(myCorpus, removePunctuation)

# remove numbers
myCorpus <- tm_map(myCorpus, removeNumbers)

# remove English stop words from the corpus
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))

# strip extra whitespace
myCorpus <- tm_map(myCorpus, stripWhitespace)

# optional: stem the words
myCorpus <- tm_map(myCorpus, stemDocument)

# document-term matrix: the structured form of the text (Step 4)
dtm <- DocumentTermMatrix(myCorpus)

# plot the word cloud
wordcloud(myCorpus, colors = brewer.pal(8, "Dark2"), random.order = FALSE)
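The document-term matrix from Step 4 can also drive the plot directly. The sketch below is one way to do that, not necessarily how the original analysis was run: it assumes the matrix has been converted with as.matrix(dtm), and uses a small hand-made matrix (hypothetical terms and counts) as a stand-in so that the snippet runs on its own.

```r
# Hypothetical stand-in for as.matrix(dtm): rows are documents, columns are terms
m <- matrix(c(2, 0, 1,
              1, 4, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(Docs = c("1", "2"),
                            Terms = c("data", "cloud", "word")))

# Total frequency of each term across all documents, most frequent first
freq <- sort(colSums(m), decreasing = TRUE)

# Structured form: one row per word with its frequency
word_freq <- data.frame(word = names(freq),
                        freq = as.numeric(freq),
                        stringsAsFactors = FALSE)
print(word_freq)

# Draw the cloud from the word/frequency pairs if wordcloud is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = word_freq$word, freq = word_freq$freq,
                       min.freq = 1, random.order = FALSE)
}
```

Passing explicit words and frequencies makes it easy to inspect or filter the counts (for example with min.freq) before plotting.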