Why Am I Analyzing Amazon Alexa Reviews?
Some time ago I was thinking of buying a voice-controlled speaker and could not decide between Amazon Alexa and Google Mini. To be frank, I was more inclined towards Amazon Alexa because of its iridescent light colors.
That is how I got the idea for this project.
What Is NLP?
Formally, Natural Language Processing or NLP is defined as the application of computational techniques for the analysis and the synthesis of text. The aim of NLP is to give computers the ability to do tasks involving human language.
Uses of NLP
1) Sentiment Analysis: Finding out whether a text leans towards a positive or a negative sentiment.
2) Text Classification: Sorting text into various categories.
3) Document Summarization: Compressing a paragraph or document into a few words or sentences.
4) Parts of Speech Tagging: Identifying the nouns, adverbs, verbs, etc. in the text.
5) Machine Translation: Translating text from one language to another.
6) Named Entity Recognition: Identifying the entities present in the text.
7) Conversational AI: Chatting with a machine in natural language and getting queries resolved.
Importing Libraries:

Loading and Reading the file
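The original loading code is not shown here, so the following is only a minimal sketch of how such a file could be read and summarized with pandas. The column names (`rating`, `verified_reviews`, `feedback`) and the tab-separated layout are assumptions, and the inline sample stands in for the real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real reviews file: a few tab-separated rows.
# The column names are assumptions, not taken from the original post.
SAMPLE_TSV = (
    "rating\tverified_reviews\tfeedback\n"
    "5\tLove my Echo!\t1\n"
    "4\tWorks well, sound is good\t1\n"
    "1\tStopped working after a week\t0\n"
)

# With a real file you would pass its path instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t")

print(df.head())      # first rows of the table
print(df.describe())  # summary statistics of the numeric columns
```

`describe()` only summarizes numeric columns by default, which is why the review text itself does not appear in its output.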

Describing the Data:

Next, I added an extra column holding the length of each review and described it:
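A length column of this kind could be built as in the sketch below. The toy data and the column name `verified_reviews` are assumptions made for illustration.

```python
import pandas as pd

# Toy data; the column name `verified_reviews` is an assumption.
df = pd.DataFrame({
    "rating": [5, 4, 1],
    "verified_reviews": ["Love it", "Works well", "Stopped working"],
})

# Character count of each review, stored in a new column.
df["length"] = df["verified_reviews"].str.len()

print(df["length"].describe())
```

Note that `str.len()` counts characters. To count words instead, one option is `df["verified_reviews"].str.split().str.len()`.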

Describing the data by rating:

Describing the data by feedback:
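The two grouped summaries above can be sketched with `groupby` followed by `describe`. The toy values and the column names `rating`, `feedback` and `length` are assumptions; the original may have described every column rather than just the length.

```python
import pandas as pd

# Toy data with assumed column names.
df = pd.DataFrame({
    "rating": [5, 5, 4, 1],
    "feedback": [1, 1, 1, 0],
    "length": [7, 20, 10, 15],
})

# Summary statistics of review length within each rating and each feedback value.
by_rating = df.groupby("rating")["length"].describe()
by_feedback = df.groupby("feedback")["length"].describe()

print(by_rating)
print(by_feedback)
```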

Accessing the reviews with the command below

Accessing the most frequent words from the reviews
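One simple way to count frequent words is shown below, using only the standard library. It is not necessarily how the chart in this post was produced; the toy reviews and the small stop-word list are assumptions for the sake of the sketch.

```python
import re
from collections import Counter

# Toy reviews standing in for the real data.
reviews = [
    "Love my Echo, love the sound",
    "Great sound and easy to set up",
    "I love it",
]

# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = {"i", "my", "the", "and", "to", "it"}


def top_words(texts, n=3):
    """Return the n most common non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


print(top_words(reviews))
```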


From the above chart we can infer that most of the reviews are positive, since the most frequently occurring words are positive ones. The word "love" is used more often than any other across the reviews.
Getting the Feedback length


From the above chart we can infer that most feedback is approximately 500 words long, with the longest reaching about 2,500 words.

Feature Extraction from the Data
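A common way to turn review text into numeric features is a bag-of-words matrix. The sketch below uses scikit-learn's `CountVectorizer` on toy sentences; whether the original used this exact vectorizer or these settings is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy reviews standing in for the real data.
reviews = ["love the sound", "love it", "bad sound"]

# Learn the vocabulary and count how often each word occurs in each review.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix, one row per review

print(sorted(vectorizer.vocabulary_))  # the learned words
print(X.toarray())                     # word counts per review
```

Each column corresponds to one word of the vocabulary, so every review becomes a fixed-length vector that a classifier can consume.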


Using Random Forest
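As a rough illustration, a Random Forest can be fitted on bag-of-words features as follows. The toy reviews, the labels, and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions, not the settings behind the results reported in this post.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy reviews with made-up feedback labels (1 = positive, 0 = negative).
reviews = [
    "love it", "great sound", "works perfectly", "easy to set up",
    "stopped working", "terrible sound", "waste of money", "does not work",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Convert the text to word counts, then fit the forest on those counts.
X = CountVectorizer().fit_transform(reviews)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, labels)

# Accuracy on the training data only; this says little about generalization.
train_accuracy = model.score(X, labels)
print(train_accuracy)
```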


Applying K-Fold Cross-Validation
The dataset has already been divided into a training set and a test set by the authors of the dataset itself,
in a proportion of approximately 75% training data and 25% test data.
Models are trained on the training set only, and the test set is then used to evaluate their performance in terms of accuracy.
This approach is not always the best choice. Because of sample variability between the training and test sets, our model could give good predictions on the training data but fail to generalize to the test data, and the chosen subset could be biased and not representative of the entire dataset.
Cross-validation is a statistical technique that involves partitioning the data into subsets, training the model on some of the subsets, and using the remaining subset to evaluate the model's performance. To reduce variability, we perform multiple rounds of cross-validation with different subsets of the same data, then combine the results from these rounds into an estimate of the model's predictive performance.
Cross-validation will give us a more accurate estimate of a model’s performance.
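To make the mechanics concrete, here is a minimal standard-library sketch of k-fold cross-validation. The "model" is a deliberately trivial majority-class predictor and the labels are made up; both are assumptions for illustration. In practice one would typically rely on scikit-learn's `cross_val_score` with the real classifier.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_label(labels):
    """Toy 'model': always predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def cross_validate(labels, k):
    """Return the held-out accuracy for each of the k rounds."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        # Train on every sample that is not in the current fold.
        train = [y for i, y in enumerate(labels) if i not in held_out]
        prediction = majority_label(train)
        # Evaluate on the held-out fold only.
        hits = sum(labels[i] == prediction for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


labels = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # toy feedback labels
scores = cross_validate(labels, k=5)
print(scores, mean(scores))
```

Each sample is used for evaluation exactly once, and the mean of the per-fold scores is the combined estimate. This sketch does not shuffle or stratify the data, which real datasets usually call for.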



Hence, we can conclude that our model has an accuracy of 93.75%.




















