Text Mining

"I think it's much more interesting to live not knowing than to have answers which might be wrong."
- Richard Feynman

The world is awash with textual data. If you Google, Bing, or Yahoo how much of that data is unstructured, that is, in a textual format, the estimates range from 80 to 90 percent. The exact number doesn't matter. What does matter is that a large proportion of the data is in a text format. The implication is that anyone seeking to find insights in that data must develop the capability to process and analyze text.

When I first started out as a market researcher, I used to manually pore through page after page of moderator-led focus groups and interviews in the hope of capturing some qualitative insight (an Aha! moment, if you will) and then haggle with fellow team members over whether they had the same insight or not. Then there was always that one individual on a project who would swoop in, listen to two interviews out of the 30 or 40 on the schedule, and, alas, have their mind made up about what was really happening in the world. Contrast that with the techniques being used now, where an analyst can quickly distill data into meaningful quantitative results, support qualitative understanding, and maybe even sway the swooper.

Over the last few years, I've applied the techniques discussed here to mine physician-patient interactions, understand FDA fears on prescription drug advertising, and capture patient concerns about a rare cancer, to name just a few. Using R and the methods in this chapter, you too can extract the powerful information in textual data.
