Chris W. Smith

Keyword Research Using Python, RAKE, and Support Chat Transcripts

June 16, 2017 By Chris Smith

Over at WordPress.com, our main avenue of customer support is live chat. We previously used Olark, but we have since built our own chat system, which we call HappyChat (our support folks are called Happiness Engineers). The development team has built in a number of excellent features, but one that’s often underused is chat tagging. If a user joins a chat and asks about a domain renewal, that chat might be tagged with “domains” or “domain-renewal”. We aren’t very strict about tagging, except in certain circumstances, but we can pull out the data we need even if the tagging is a little fuzzy.

While I’ve been writing search marketing and SEO articles for our users for a while, I wanted to know exactly what our users were asking in chat, and what we were spending chat time teaching them about. This is where chat tagging comes in. Most of the Happiness Engineers are pretty diligent about tagging their chats, so it was easy for me to get an SQL dump of all the transcripts tagged with “seo”. However, that dump ended up being roughly 18,000 lines of chat transcripts from the past few months.
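The post doesn’t show the query behind that dump, and HappyChat’s schema isn’t public. Purely as an illustration, here is a sketch using Python’s built-in sqlite3 module with a made-up two-table schema (`messages` and `chat_tags` are hypothetical names). A `LIKE` pattern on the tag is one way to tolerate fuzzy tagging, since it catches variants such as a hypothetical “seo-meta” tag along with plain “seo”; the trade-off is that it also matches any unrelated tag that happens to contain the substring. SQLite’s `LIKE` is case-insensitive for ASCII by default; in MySQL, case sensitivity depends on the column’s collation.

```python
import sqlite3


def fetch_tagged_messages(conn: sqlite3.Connection, tag_pattern: str) -> list[str]:
    """Return message bodies from every chat with a tag matching tag_pattern.

    Hypothetical schema: messages(chat_id, sent_at, body) and chat_tags(chat_id, tag).
    """
    query = """
        SELECT body
        FROM messages
        WHERE chat_id IN (SELECT chat_id FROM chat_tags WHERE tag LIKE ?)
        ORDER BY chat_id, sent_at
    """
    return [body for (body,) in conn.execute(query, (tag_pattern,))]


if __name__ == "__main__":
    # Tiny in-memory demo with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (chat_id INTEGER, sent_at INTEGER, body TEXT);
        CREATE TABLE chat_tags (chat_id INTEGER, tag TEXT);
    """)
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
        (1, 1, "How do I improve my SEO?"),
        (1, 2, "Let's start with your site title."),
        (2, 1, "My domain renewal failed."),
        (3, 1, "What is a meta description?"),
    ])
    conn.executemany("INSERT INTO chat_tags VALUES (?, ?)", [
        (1, "seo"), (2, "domain-renewal"), (3, "seo-meta"),
    ])
    for body in fetch_tagged_messages(conn, "%seo%"):
        print(body)  # prints the three messages from chats 1 and 3
```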

Time for some Python magic.

I exported the chats from my local MySQL instance into a CSV file, which needed some cleanup before I could really process it. I like Atom, but any decent text editor that supports regex will do. I needed to get this CSV, which held every detail about every chat (including many columns of diagnostic data), down to the bare messages. After a number of regex find-and-replaces in Atom, I finally trimmed it down to about 15,000 lines of pure chat messages, from both the user and the Happiness Engineer sides of the conversation.
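That cleanup was done by hand with regex in Atom. If you’d rather script it, a minimal sketch with Python’s csv module could keep just the message column and throw away the diagnostic columns. The column name `message` is an assumption; check the header row of your own export. Opening the input with `newline=""` lets the csv module handle messages that contain line breaks inside quoted fields.

```python
import csv
from pathlib import Path


def extract_messages(csv_path: Path, out_path: Path, column: str = "message") -> int:
    """Write one non-empty message per line from `column` of csv_path; return the count.

    `column` is a hypothetical header name; substitute whatever your export uses.
    """
    count = 0
    with csv_path.open(newline="", encoding="utf-8") as src, \
            out_path.open("w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse internal newlines and runs of whitespace so each message is one line.
            text = " ".join((row.get(column) or "").split())
            if text:
                dst.write(text + "\n")
                count += 1
    return count
```

Usage: `extract_messages(Path("chats.csv"), Path("messages.txt"))` writes a plain-text file with one message per line, ready for keyword extraction.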

Ok, now it’s time for some Python magic.

There’s probably a better way to do this with R, but I already know some Python, so that was my tool of choice. I’d dabbled with nltk before, but all I needed for now was a simple keyword extraction with weighting of the terms: I wanted to see all the related terms, but I also wanted to know which ones occurred most frequently. After a bit more research, RAKE (Rapid Automatic Keyword Extraction) seemed to be exactly what I needed.

RAKE is pretty simple as far as these things go, and I came across a great tutorial by Alyona Medelyan (linked under Tools below) that walks you through doing an extraction. While you don’t need to get super fancy with it, I ended up adding a fair amount of functionality so that I could use a text file as the input and output a CSV of scored keywords, which makes the results easier to use. You can see what I ended up with in my version, also linked under Tools (feel free to fork it and put your own spin on it).
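For readers who haven’t seen RAKE before, the core scoring fits in a few lines: split the text into candidate phrases at punctuation and stopwords, score each word as degree divided by frequency (degree being the total length of the candidate phrases the word appears in), and score each phrase as the sum of its word scores. The sketch below is a from-scratch illustration of that idea, not the tutorial’s code or the fork linked below. The short stopword list is a placeholder (real runs need a much fuller stoplist), and the text-file-in, CSV-out wrapper only mirrors the workflow described above. It needs Python 3.9+ for the built-in generic type hints.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Placeholder stopword list for illustration only; real runs need a much fuller list.
STOPWORDS = {"a", "an", "and", "are", "can", "do", "for", "how", "i", "in", "is", "it",
             "my", "of", "on", "or", "the", "to", "what", "with", "you", "your"}


def candidate_phrases(text: str, stopwords: set[str]) -> list[list[str]]:
    """Split text at punctuation, then at stopwords, into runs of content words."""
    phrases = []
    # Keep apostrophes and hyphens so "let's" and "domain-renewal" stay whole words.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current: list[str] = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, stopwords: set[str] = STOPWORDS) -> list[tuple[str, float]]:
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text, stopwords)
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in this phrase, including itself
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {" ".join(phrase): sum(word_score[w] for w in phrase) for phrase in phrases}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def write_keywords_csv(in_path: Path, out_path: Path) -> None:
    """Read a plain-text file and write its scored keywords to a CSV."""
    ranked = rake(in_path.read_text(encoding="utf-8"))
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "score"])
        writer.writerows(ranked)


print(rake("How do I improve my site title? Site title and meta description matter."))
# [('meta description matter', 9.0), ('site title', 4.0), ('improve', 1.0)]
```

Longer phrases made of words that co-occur a lot score highest, which is why RAKE tends to surface multi-word terms rather than single common words.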

I ran my chats through the RAKE extraction, and the first run gave some okay results: a lot of the expected keywords were pulled out, but there was also a lot of noise from URLs that had been pasted into chat messages. So I went back into Atom and used more regex to clean those out.
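The URL cleanup could also be scripted. A rough Python equivalent, assuming a simple pattern that only catches tokens starting with `http://`, `https://`, or `www.`; bare domains like example.com slip through, which may or may not matter for your data:

```python
import re

# Rough pattern (an assumption, not an exhaustive URL matcher):
# http(s):// or www. followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(line: str) -> str:
    """Remove URL-like tokens and collapse the leftover whitespace."""
    return " ".join(URL_PATTERN.sub(" ", line).split())


print(strip_urls("See https://example.wordpress.com/about for details"))
# See for details
```

Because `\S+` runs to the next space, punctuation stuck to the end of a URL is removed along with it.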

My second pass came out quite a bit cleaner, but RAKE was still ranking some questionable words highly. At that point, I discovered that Medelyan had included an optimization script in her RAKE repo. I ran the `optimize_rake.py` script to determine the best settings for the keyword set, and the settings I had (the defaults) were far different from what the optimization script eventually suggested. I changed the settings and let RAKE loose again on the chat messages.
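`optimize_rake.py` itself isn’t reproduced here. As a hedged illustration of the general idea behind this kind of tuning, the sketch below grid-searches over extractor settings and keeps whichever combination gets the best average F1 score against a small hand-labeled set of documents and expected keywords. The labeled set, the parameter name `min_chars`, and `toy_extract` are all hypothetical placeholders; a real run would wrap a RAKE extractor and search over its own settings (for example, minimum word length, maximum phrase length, or minimum keyword frequency).

```python
from itertools import product
from typing import Callable, Iterable, Mapping, Sequence


def f1(predicted: set[str], gold: set[str]) -> float:
    """F1 (harmonic mean of precision and recall) for one document's keywords."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def grid_search(
    extract: Callable[..., Iterable[str]],
    documents: Sequence[str],
    gold: Sequence[set[str]],
    grid: Mapping[str, Sequence[object]],
) -> tuple[dict[str, object], float]:
    """Try every combination of settings in `grid`; return the best one and its mean F1."""
    names = list(grid)
    best_params: dict[str, object] = {}
    best_score = -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [f1(set(extract(doc, **params)), keys) for doc, keys in zip(documents, gold)]
        mean = sum(scores) / len(scores)
        if mean > best_score:  # strict '>' keeps the first of any tied settings
            best_params, best_score = params, mean
    return best_params, best_score


def toy_extract(text: str, min_chars: int) -> list[str]:
    """Hypothetical stand-in extractor: keep words with at least min_chars characters."""
    return [word for word in text.lower().split() if len(word) >= min_chars]


docs = ["seo tips for my blog", "add a sitemap to google"]
gold = [{"seo", "tips", "blog"}, {"sitemap", "google"}]
print(grid_search(toy_extract, docs, gold, {"min_chars": [1, 3, 5]}))
# best setting here is min_chars=3
```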

After an hour or so of processing, RAKE finally spit out a pretty solid list of ranked keywords.

Screenshot of eventual keyword list.

As you can see, these are pretty on-point search marketing and SEO-related terms, which is exactly what I was looking for with this experiment. These are the things our users are actually asking us about, and not speculation or anecdotal evidence.

Your clients might not use chat for customer interactions; it requires a fair amount of “babysitting” and isn’t necessarily useful in every industry. However, they’ve probably got a CRM system that customer interactions run through, or a contact form for pre-sales questions. They might record phone calls that could be transcribed by an outsourced transcription service. There are countless sources of customer input and inquiries that this technique can be applied to. The silver bullet is that you’re not looking at the same competitor and AdWords data that everyone else is; you’re gleaning keyword research directly from real, live customers. Using these weighted keyword lists as part of your research strategy will help drive down click budgets, create better-targeted content, and come up with better answers to those common customer questions.

Tools
Python: https://www.python.org/downloads/
RAKE: https://github.com/zelandiya/RAKE-tutorial and https://github.com/chrisfromthelc/python-rake (my “value-added” version)
Atom: https://atom.io/

Filed Under: Data Science, Keyword Research, Keywords, SEO, WordPress.com Tagged With: atom, keyword research, Keywords, python, RAKE, wordpress.com

Grow Your Traffic with Keyword Research

July 7, 2016 By Chris Smith

This post is an introduction for WordPress.com bloggers to the concept of keyword research, with some real-world examples, tools, and step-by-step guidance.

Filed Under: Keyword Research, SEO, WordPress.com Tagged With: google suggest, keyword research, Keywords, keywordtool.io, long-tail keywords

Backlinking: High PR vs Relevancy

January 12, 2010 By Chris Smith

In my most recent research, I’ve come to discover that the high PR versus relevancy debate is less important than most think.

The typical procedure for creating backlinks (from high PR sites) usually goes as follows: pick a keyword or a couple of keywords, start linking from high PR sites, and then wait for the traffic to come in on those keywords via boosted SERP rankings.

Relevant linking is a much more drawn-out process: join some sort of community, build relationships, and people will naturally link to you, but most likely from low-PR sites that, in the end, usually are NOT relevant.

For the most powerful linking strategy, we need to combine the best parts of both procedures, and create one supercharged backlinking plan.

High-PR advantages:
1. High PR (obviously)
2. Bigger PR voting to Google

Relevancy advantages:
1. Narrowed scope of potential visitors (however, for normal backlinking, this isn’t really something we should worry about yet)
2. Diverse spread of anchor text

Take a look at two items in particular: high PR and diverse anchor text. I’ll skip high PR, since we all know it’s preferable, and focus on the second one: diverse anchor text.

Unless you stumble upon something that becomes an Internet meme overnight, building a lot of links in a relevant, organic manner will take months. The only benefit from this method is the diversity of anchor text that links back to your site. While you might have a site dedicated to “green widgets”, you could easily have anchor text linked from someone’s blog that reads “cool widget”, and yet another that used “cheapest widget I’ve found online”. Extrapolate a decent organic linking pace over a year, and you’ll have a respectable amount of backlinks with hugely varying anchor text.

Now, with the amount of refereeing that Google (and the other search engines) have been doing recently, it’s becoming harder to build a library of links quickly without unwillingly participating in the Google Dance.

So, how can we avoid this? Internet memes seem relatively unaffected by the Google Dance, even though a meme can pick up thousands of links overnight.

By creating backlinks with a very diverse set (at least 15-20) of widely varying anchor texts (albeit linking back to one or two key pages), it’s possible (but not guaranteed) to sidestep Google’s penchant for temporarily penalizing a page that suddenly gains a large number of incoming links. We know this is likely to work in most scenarios because of the way Google already handles hot trends: it doesn’t penalize seemingly organic waves of backlinks. Varied anchor text also makes the links look much less likely to be automatically generated, and makes the target site look relevant across a larger cross-section of varied but related contexts for your subject or keyword.

Essentially, you’re creating your own “hot trend”, and because of the varying anchor texts, you’ll have a stronger position over a greater number of what should be longer-tail keywords (if you follow the formula for creating your anchor texts that I’ll show later).

Here’s the formula that I typically use, which seems to work fairly well:

50-60% of your anchor texts should be keywords containing the root keyword that vary slightly. For example, “red widgets”, “blue widgets”, and “green widgets” would all be good if you want to target “widgets”. You could even do something like “widgets repair”. They should all be (assuming you have a single-word keyword) 2 or 3 words, but preferably 2 words.
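To make this first bucket concrete, here is a small sketch that builds slight variations around a root keyword from lists of prefix and suffix modifiers and keeps only the 2- to 3-word results. The function name and the modifier lists are made-up examples; set `max_words=2` to favor the preferred 2-word anchors.

```python
from collections.abc import Sequence


def keyword_variations(
    root: str,
    prefixes: Sequence[str],
    suffixes: Sequence[str] = (),
    max_words: int = 3,
) -> list[str]:
    """Build 2- to max_words-word anchors around `root`, dropping duplicates."""
    candidates = [f"{p} {root}" for p in prefixes] + [f"{root} {s}" for s in suffixes]
    # dict.fromkeys de-duplicates while keeping the first-seen order.
    unique = dict.fromkeys(" ".join(c.split()) for c in candidates)
    return [anchor for anchor in unique if 2 <= len(anchor.split()) <= max_words]


print(keyword_variations("widgets", ["red", "blue", "green", "special edition orange"], ["repair"]))
# ['red widgets', 'blue widgets', 'green widgets', 'widgets repair']
```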

Some people will tell you that it doesn’t matter what the rest are, as long as they are somewhat relevant, but I disagree. I take about half of what’s left (20-25%), make up a short sentence with the keyword in it, and create a phrase anchor text, for example: “I found a shop that repairs the special edition orange widgets for half of what the dealership does.” I then use the full sentence in the linking context, but link only a short phrase within it as the anchor text. I also write about 10 variations of these sentences, repeating each only a few times (if at all).

As for the rest of the links (20-25%), I usually pick 3-5 high-converting long-tail keywords (use AdWords to test click-through rates from Google) and backlink those without ANY modification to the keyword itself.

So, to recap:
50-60% – Variation of primary keyword
20-25% – Phrase within a sentence, loosely related to main keyword but highly relevant
20-25% – Only long-tail keywords, higher traffic ones with good conversion preferred
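The recap percentages translate directly into link counts for a given round of linking. A minimal sketch, assuming the midpoint of each range (55%, 22.5%, 22.5%) and largest-remainder rounding so the counts always add up to the total; both choices are assumptions rather than part of the formula. Shares are stored as integers per mille to avoid floating-point rounding surprises.

```python
# Midpoints of the recap's ranges, in tenths of a percent (per mille).
SHARES_PER_MILLE = {
    "keyword variations": 550,
    "phrase in a sentence": 225,
    "long-tail keywords": 225,
}


def allocate_links(total: int, shares: dict[str, int] = SHARES_PER_MILLE) -> dict[str, int]:
    """Split `total` links across buckets by share, rounding with largest remainders."""
    if sum(shares.values()) != 1000:
        raise ValueError("shares must add up to 1000 per mille")
    counts = {name: total * share // 1000 for name, share in shares.items()}
    remainders = {name: total * share % 1000 for name, share in shares.items()}
    leftover = total - sum(counts.values())
    # Give any remaining links to the buckets with the largest remainders
    # (sorted() is stable, so ties go to the earlier bucket).
    for name in sorted(shares, key=lambda n: remainders[n], reverse=True)[:leftover]:
        counts[name] += 1
    return counts


print(allocate_links(40))
# {'keyword variations': 22, 'phrase in a sentence': 9, 'long-tail keywords': 9}
```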

Try out this method in your next round of linking, and see if you can skip the Google Dance altogether!

Filed Under: Backlinking, Google, SERPs Tagged With: Backlinking, Google, Google Dance, Keywords

Copyright © 2022 · Built with the Genesis Framework on WordPress

 
