Keyword Research Using Python, RAKE, and Support Chat Transcripts

There’s probably a better way to do this with R, but I already somewhat know Python, so that was my main tool of choice. I’d dabbled with nltk before, but all I needed for now was a simple keyword extraction with weighting of the terms. I wanted to know all the related terms, but I wanted to make sure I knew which were occurring frequently. After a bit more research, the RAKE package seemed to be exactly what I needed.

RAKE is pretty simple as far as these things go, but I came across a great tutorial here that walks you through doing an extraction. While you don’t need to get super fancy with it, I ended up adding a fair amount of functionality to allow me to use a text file as the input, and output a CSV with the scored keywords to make it easier to use. You can see what I ended up with here (feel free to fork and put your own spin on it).

I ran my chats through the RAKE extraction, and on the first run, I got some okay results. A lot of expected keywords pulled out, but a lot of noise from URLs that were pasted into chat messages. So, I went back into Atom and used more regex to clean those out.

My second pass came out quite a bit cleaner, but still a bit iffy on the words it was ranking highly. At that point, I discovered that Medelyan had included an optimization script in her repo for RAKE. I ran the `optimize_rake.py` script to determine the best settings for the keyword set, and what I had (defaults) were far different than the eventual suggestion from the optimization script.  I changed the settings, and let RAKE loose again on the chat messages.

After an hour or so of processing, RAKE finally spit out a pretty solid list of ranked keywords.

Screenshot of eventual keyword list.

As you can see, these are pretty on-point search marketing and SEO-related terms, which is exactly what I was looking for with this experiment. These are the things our users are actually asking us about, and not speculation or anecdotal evidence.

Your clients might not using chat for customer interactions; it requires a fair amount of “babysitting” and isn’t necessarily useful for all industries. However, they’ve probably got a CRM system that customer interactions run through, or a contact form for pre-sales questions. They might record phone calls that could be transcribed by an outsourced transcription service. There are a million ways that you can get customer input/inquiries in a manner that this technique can be used on. The silver bullet is that you’re not looking at all the same competitors and AdWords data that everyone else is; you’re gleaning keyword research directly from real, live customers. Using these weighted keyword lists as part of your research strategy will help drive down click budgets, create better targeted content, and come up with better answers for those common customer questions.

Tools
Python: https://www.python.org/downloads/
RAKE: https://github.com/zelandiya/RAKE-tutorial and https://github.com/chrisfromthelc/python-rake (my “value-added” version)
Atom: https://atom.io/

Pages: 1 2 3

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.