Summary

Twitter is a goldmine for data science, with interesting patterns and insights spread all across it. Its constant flow of user-generated content, coupled with unique, interest-based relationships, presents opportunities to understand human dynamics up close. Sentiment analysis is one such field, where Twitter provides the right set of ingredients to understand what opinions we hold and how we present and share them about products, brands, people, and so on.

Throughout this chapter, we have looked at the basics of sentiment analysis, its key terms, and its areas of application. We have also looked into the various challenges that arise when performing sentiment analysis. We have covered commonly used feature extraction methods for sentiment analysis (or textual analysis in general), such as tf-idf, n-grams, POS tagging, and negation handling. We have built on our code base from the previous chapter to streamline and structure utility functions for reuse. We have performed polarity analysis using Twitter search terms and have seen how public opinion about certain campaigns can be tracked and analyzed. We then moved on to supervised learning algorithms for classification, where we used SVM and boosting to build sentiment classifiers with libraries such as caret, RTextTools, ROCR, and e1071. Before closing the final chapter, we also briefly touched upon the highly researched and widely used field of ensemble methods, and learned about cross-validation-based model evaluation.

There are many other algorithms and analysis techniques that can be applied to extract even more interesting insights from Twitter and other sources on the Internet. Throughout this chapter (and this book), we have merely attempted to address the tip of a huge iceberg! Data science is not just about applying algorithms to solve a problem or derive insights. Apart from domain understanding, feature engineering, and data collection, it requires creative thinking and a lot of due diligence to try and solve problems that are as yet unknown.

To sum things up, consider this quote by Donald Rumsfeld:

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."

Data science is a journey of learning the knowns and exploring the unknown unknowns, and machine learning is a powerful tool to help accomplish it. #KeepMining!
