Chris W. Smith


Keyword Research Using Python, RAKE, and Support Chat Transcripts

June 16, 2017 By Chris Smith

Over at WordPress.com, our main avenue of customer support is live chat. We were previously on Olark, but have since built out a chat system that we call HappyChat (support folks are referred to as Happiness Engineers). The development team has built in a number of excellent features, but one that is often underutilized is chat tagging. If a user joins a chat and asks about a domain renewal, that chat might be tagged with “domains” or “domain-renewal”. We aren’t very strict on tagging, except in certain circumstances, and we can pull out the data we need even if the tagging is a little fuzzy.

While I’ve been writing search marketing and SEO articles for our users for a while, I wanted to know exactly what our users were asking in chat, and what kinds of things we were spending chat time teaching them about. This is where the chat tagging comes in. Most of the Happiness Engineers are pretty diligent about tagging their chats, so it was easy for me to get an SQL dump of all the transcripts tagged with “seo”. However, it ended up being roughly 18,000 lines of chat transcripts from the past few months.

Time for some Python magic.

I exported the chats out of my local MySQL instance into a CSV file, which needed some cleanup before I could really process it. I like Atom, but any decent text editor that supports regex will do. I needed to get this CSV, with every detail about every chat (which included many columns of diagnostic-type data), down to the bare messages. After a number of different regex find-and-replaces in Atom, I finally trimmed it down to about 15,000 lines of pure chat messaging, covering both the user and the Happiness Engineer side of the conversation.
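The same trimming could also be done in Python instead of a text editor. The snippet below is only a sketch of that idea, not what I actually ran: the column names (`chat_id`, `timestamp`, `author`, `message`) and the sample rows are made up for illustration, so swap in whatever your export actually contains.

```python
import csv
import io


def extract_messages(csv_file, column="message"):
    """Yield the non-empty text of one column from a CSV export."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:
            yield text


# Made-up sample standing in for the real export (hypothetical columns).
sample = io.StringIO(
    "chat_id,timestamp,author,message\n"
    "1,2017-05-01 10:00,user,How do I add meta descriptions?\n"
    "1,2017-05-01 10:01,he,\"Sure, let me walk you through it.\"\n"
    "2,2017-05-02 09:30,user,\n"
)
messages = list(extract_messages(sample))
```

Using the `csv` module rather than a regex keeps quoted commas inside a message from being split into extra columns.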

Ok, now it’s time for some Python magic.

There’s probably a better way to do this with R, but I already somewhat know Python, so that was my main tool of choice. I’d dabbled with nltk before, but all I needed for now was a simple keyword extraction with weighting of the terms. I wanted to know all the related terms, but I wanted to make sure I knew which were occurring frequently. After a bit more research, the RAKE package seemed to be exactly what I needed.

RAKE is pretty simple as far as these things go, but I came across a great tutorial here that walks you through doing an extraction. While you don’t need to get super fancy with it, I ended up adding a fair amount of functionality to allow me to use a text file as the input, and output a CSV with the scored keywords to make it easier to use. You can see what I ended up with here (feel free to fork and put your own spin on it).
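To give a feel for what the extraction does, here is a stripped-down sketch of the RAKE idea with text in and a scored CSV out. It is not the code from either repo and does not use their API: the function names, the tiny stopword list, and the sample text are all assumptions for illustration. Candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its words.

```python
import csv
import io
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); real runs need a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is",
             "my", "how", "do", "i", "can", "with", "it"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r'[.!?,;:()"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Score each phrase as the sum of its words' degree/frequency ratios."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def write_scores(scored, out_file):
    """Write (keyword, score) pairs as CSV rows with a header."""
    writer = csv.writer(out_file)
    writer.writerow(["keyword", "score"])
    writer.writerows(scored)


text = ("How do I improve search engine optimization? "
        "Search engine optimization tips for my blog.")
scored = rake_scores(text)
out = io.StringIO()  # stand-in for open("keywords.csv", "w", newline="")
write_scores(scored, out)
```

Longer phrases built from words that keep appearing together float to the top, which is why multi-word terms tend to outrank single words in the output.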

I ran my chats through the RAKE extraction, and on the first run, I got some okay results. A lot of the expected keywords were pulled out, but there was also a lot of noise from URLs that had been pasted into chat messages. So, I went back into Atom and used more regex to clean those out.
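For reference, the URL cleanup can be sketched in Python along these lines. The pattern is an assumption, not the exact regex I used in Atom; it only catches links that start with `http://`, `https://`, or `www.`, so bare domains would need something stricter.

```python
import re

# Matches http(s) links and bare "www." links up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(line):
    """Remove URLs from a line and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub(" ", line)
    return " ".join(without_urls.split())


cleaned = strip_urls("Check https://example.com/seo-tips and www.example.org for details")
```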

My second pass came out quite a bit cleaner, but still a bit iffy on the words it was ranking highly. At that point, I discovered that Medelyan had included an optimization script in her repo for RAKE. I ran the `optimize_rake.py` script to determine the best settings for the keyword set, and the defaults I had been using were far different from the settings the optimization script eventually suggested. I changed the settings, and let RAKE loose again on the chat messages.

After an hour or so of processing, RAKE finally spit out a pretty solid list of ranked keywords.

[Screenshot of the eventual keyword list]

As you can see, these are pretty on-point search marketing and SEO-related terms, which is exactly what I was looking for with this experiment. These are the things our users are actually asking us about, and not speculation or anecdotal evidence.

Your clients might not be using chat for customer interactions; it requires a fair amount of “babysitting” and isn’t necessarily useful for all industries. However, they’ve probably got a CRM system that customer interactions run through, or a contact form for pre-sales questions. They might record phone calls that could be transcribed by an outsourced transcription service. There are a million ways to collect customer input and inquiries in a form that this technique can be used on. The silver bullet is that you’re not looking at the same competitors and AdWords data that everyone else is; you’re gleaning keyword research directly from real, live customers. Using these weighted keyword lists as part of your research strategy can help you drive down click budgets, create better targeted content, and come up with better answers for those common customer questions.

Tools
Python: https://www.python.org/downloads/
RAKE: https://github.com/zelandiya/RAKE-tutorial and https://github.com/chrisfromthelc/python-rake (my “value-added” version)
Atom: https://atom.io/

Filed Under: Data Science, Keyword Research, Keywords, SEO, WordPress.com Tagged With: atom, keyword research, Keywords, python, RAKE, wordpress.com

Grow Your Traffic with Keyword Research

July 7, 2016 By Chris Smith

This post is an introduction for WordPress.com bloggers to the concept of keyword research, with some real-world examples, tools, and step-by-step guidance.

Filed Under: Keyword Research, SEO, WordPress.com Tagged With: google suggest, keyword research, Keywords, keywordtool.io, long-tail keywords

Analog Keyword Research

January 8, 2013 By Chris Smith

(originally posted on DesignBigger)

When it comes to keyword research, I’m guilty of doing the same old thing; I open up the AdWords Keyword Tool and start pecking away, hoping to take a few general head keywords and turn them into something workable. I’m pretty sure every other SEO on the planet starts out keyword research pretty similarly, if not the exact same way.

Most of our customers are traditionally paper-and-pen kinds of businesses; the digital landscape is pretty new to them, and consequently, they may not have many online assets (service manuals, documentation, whitepapers, etc.). It struck me today that while getting these things online and working to our benefit might be extremely costly, we can still take advantage of them for the initial brainstorming phase (not to mention content fodder down the road).

Here are some non-electronic assets you should absolutely be looking at for keyword research and content generation:

1. Product manuals

Typically, manuals exist for everything from CD players to industrial equipment costing thousands of dollars. Most manuals contain professionally-written content about the exact item you’re trying to market, and probably address (in the instructions or specifications) features that might be selling points. “Bar length = 20” can easily turn into “20 inch gas chainsaw” for a pretty solid keyword to bid on or build a product page around. It seems simple (and it is!), but it’s quite effective.

2. Tech Support/Customer Service/Receptionist/Sales

Ask any of the phone-answering employees in your company, and they’ll be able to tell you the top 3 things people call and ask about. These are great things to write FAQs about, and will give you keyword ideas for pages that could be built out or improved upon. If you want to get really tricky, use a service like CallRail to record your calls, and have them transcribed inexpensively by an outsourcer. Use the resulting data to generate a word frequency list. Pull out the winners and research the surprising outliers.

Better yet, parse your helpdesk and customer service emails for this information; if you use a CRM system, this functionality might even already be available to you.
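A word frequency list like the one mentioned above takes only a few lines of Python. This is a rough sketch under stated assumptions: the sample calls and the short stopword list are invented for illustration, and real transcripts would need a much fuller stopword list.

```python
import re
from collections import Counter

# Short illustrative stopword list (an assumption); extend it for real data.
STOPWORDS = {"do", "you", "for", "the", "my", "will", "not", "is", "it",
             "on", "what", "under"}


def word_frequencies(transcripts, stopwords=STOPWORDS, top_n=10):
    """Count non-stopword words across transcripts, most frequent first."""
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(top_n)


# Made-up transcribed calls standing in for real data.
calls = [
    "Do you sell replacement chains for the 20 inch chainsaw?",
    "My chainsaw will not start, is it under warranty?",
    "What is the warranty on replacement chains?",
]
top_words = word_frequencies(calls)
```

The words at the top of the list are the winners; anything unexpected further down is worth a closer look.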

3. Sales Literature/Contact Cards/Trade Show Materials

Yes, finally, there is a use for those cards you’re made to fill out before you can get a free t-shirt. While there’s a greater than 100% chance that John Doe isn’t going to buy from you, you probably got some information in the way of his job title (and it might even be real!), the problems he’s looking to solve in his business, and his location.

If John Doe’s a chemical engineer in Seattle, and he’s looking for pollution control devices, targeting “thermal oxidizer manufacturer seattle” might be a good idea. It’s going to be insanely cheap to bid on (if you’re doing PPC), very targeted for content, and will bring in buying traffic versus researching traffic.

As far as sales literature goes, review your own first, of course. However, your next step should be poring over your competitors’ information to see where you might’ve missed something. Perhaps they use a slightly different term for something that might be familiar to customers, and if so, addressing that term will bring in extra, already interested, traffic.

These aren’t groundbreaking new sources of keywords, and they won’t replace what we currently do. But, they’ll give you ideas for things that you can’t find in the AdWords Tool or WordTracker, especially when you’re in an industrial or technical niche. Glean some solid starting phrases, and then build on that success. You’ll be ahead of the rest of the people who keep doing the same thing they’ve always done.

 

Filed Under: Keyword Research, SEO

Three SEO Metrics You Should Monitor, or, “PageRank Is Still Mostly Useless”

July 26, 2011 By Chris Smith

It’s been pretty commonly agreed for quite a while that PageRank (in the form that’s publicly viewable) is a number that just doesn’t match up to the real authority of a particular site. The latest post from the Webmaster Tools blog solidifies the fact that this just isn’t something to base your success metrics on.

The key phrase of the article?

…PageRank comes in a number. Relevance doesn’t.

Anybody with any marked SEM experience would agree that conversions are the ultimate goal. Beyond the obvious (conversion rates, bounce rate, number of conversions), there are a few numerical metrics that I like to keep tabs on.

Domain Authority and Page Authority

Domain Authority is a pretty good indicator of how you’re doing when it comes to building your link profile. Yes, it’s a number, just like PageRank; however, there’s a clear definition of what’s being figured into that number. Inside of Domain Authority, you’ll find MozRank, MozTrust, and the number of linking domains, among others. Page Authority is a similar score, but limited to a single page. The domain’s Domain Authority score is a site-wide average of the Page Authority metrics.

Visit-To-Goal Conversion Rate Trending

You can have all the #1 organic rankings in the world, but if you can’t convert, you may as well just not have a website, right? Far too often, clients are so focused on achieving first place organic rankings that they lose sight of why they’re spending money on internet marketing in the first place: to make a sale, a contact, or a lead. Thorough documentation of SEO efforts (dates included) will help you determine fairly accurately what is working and what isn’t when it comes to your conversion rate metrics. People tend to overlook high-converting, low-competition, long-tail keywords because they normally don’t bring an appreciable amount of traffic. However, these quick and easy wins can make the difference between an okay month and a great month of conversions.
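One way to surface those overlooked long-tail keywords is to compute a conversion rate per keyword and filter for low traffic. The sketch below is only an illustration: the keywords, visit counts, and thresholds (`max_visits`, `min_rate`) are made-up assumptions, not figures from any real account.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    visits: int
    conversions: int

    @property
    def conversion_rate(self):
        """Conversions per visit; 0.0 when there were no visits."""
        return self.conversions / self.visits if self.visits else 0.0


def quick_wins(stats, max_visits=50, min_rate=0.05):
    """Low-traffic keywords whose conversion rate meets the threshold."""
    hits = [s for s in stats
            if s.visits <= max_visits and s.conversion_rate >= min_rate]
    return sorted(hits, key=lambda s: s.conversion_rate, reverse=True)


# Invented numbers for illustration only.
stats = [
    KeywordStats("chainsaw", 1200, 12),
    KeywordStats("20 inch gas chainsaw", 40, 6),
    KeywordStats("thermal oxidizer manufacturer seattle", 8, 2),
    KeywordStats("chainsaw pictures", 30, 0),
]
wins = quick_wins(stats)
```

Running the same calculation on each month’s export, alongside your dated notes on SEO changes, gives you the trend rather than a single snapshot.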

Top 100 Keyword Metrics

I like to individually monitor the top 100 keywords that are sending traffic to the site. This could even trickle down to single-digit visits, but that’s okay. The goal here is twofold: we want to make sure that we’re attracting the right traffic by watching the entrance keywords, and we want to look for areas that we’re not optimizing for that would be yet another easy win. Just so we’re clear, this list is going to naturally morph over time.

Additionally, 100 isn’t a hard limit. Your particular list could be much larger, or even smaller, depending on the site. The methods and reasoning are still the same: getting the correct, targeted traffic to the site with the least amount of hassle.

While there are many more metrics that you can and should be monitoring, these three are certainly very (if not the most) important things you should be keeping a close eye on. If you do well in these metrics, you’ll be pleased with your site’s performance (and hopefully, your client will be also).

Filed Under: Google, Keyword Research, SEO, SERPs
