A word cloud is a graphical representation of the words that appear most frequently in a collection of text data. The size of each word in the visualization indicates how often that word occurs in the entire text.
Why do we need text analytics?
Analytics is the science of processing raw information to extract meaningful insights. This raw information can come from a variety of sources and may be structured, semi-structured, or unstructured.
Step by step coding on R:
Building the word cloud requires some special packages that might not be pre-installed with R. You need to install the text mining package, which provides the tm library, along with the wordcloud package.
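For example, the setup might look like the following (a minimal sketch; tm and wordcloud are the standard CRAN package names for this task):

```r
# install the required packages (only needed once per machine)
install.packages(c("tm", "wordcloud"))

# load the libraries for the current session
library(tm)        # text mining: corpus creation and transformations
library(wordcloud) # plotting the word cloud
```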
Note: In this process we fetch the data from the web in the form of a .csv file.
Step 1: Fetch the data from the web in the form of a .csv file.
Step 2: Create a corpus from the collection of text files.
Step 3: Perform data-preprocessing transformations on the corpus.
Step 4: Create structured data (a document-term matrix) from the corpus.
Step 5: Build the word cloud from the structured form of the data.
# download the data from the web
# (hypothetical URL -- replace with the location of your .csv file)
Sample_data <- read.csv("https://example.com/sample.csv", stringsAsFactors = FALSE)
# convert to a data frame
df <- do.call("rbind", lapply(Sample_data, as.data.frame))
# check the dimensions
dim(df)
# build a corpus from the text
myCorpus <- Corpus(VectorSource(df))
# convert to lower case (base R functions such as tolower must be
# wrapped in content_transformer for current versions of tm)
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
# remove punctuation
myCorpus <- tm_map(myCorpus, removePunctuation)
# remove numbers
myCorpus <- tm_map(myCorpus, removeNumbers)
# remove English stopwords from the corpus
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
# strip extra whitespace
myCorpus <- tm_map(myCorpus, stripWhitespace)
# stem the words if needed (requires the SnowballC package)
myCorpus <- tm_map(myCorpus, stemDocument)
# create structured data: a document-term matrix
dtm <- DocumentTermMatrix(myCorpus)
# compute word frequencies from the matrix
word_freqs <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
# plot the word cloud
wordcloud(names(word_freqs), freq = word_freqs, min.freq = 2, random.order = FALSE)
Running the above code produces the word cloud. The order of the words is random, but the size of each word is directly proportional to its frequency of occurrence in the text file.
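To see the frequency counting that drives the word sizes, here is a minimal base-R sketch (the sample sentences are invented for illustration, and no tm functions are required):

```r
# a tiny made-up collection of text
text <- c("data science and data analytics", "science of data")

# split into lowercase words on runs of non-letter characters
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[nchar(words) > 0]

# count and sort word frequencies -- these counts would set the word sizes
freqs <- sort(table(words), decreasing = TRUE)
print(freqs)  # "data" appears 3 times, "science" twice
```

A table like this is exactly what the document-term matrix summarizes: once you have the sorted frequencies, the word cloud is just a plot of the names scaled by their counts.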