Chris W. Smith

Keyword Research Using Python, RAKE, and Support Chat Transcripts

June 16, 2017 By Chris Smith

Over at WordPress.com, our main avenue of customer support is live chat. We were previously on Olark, but have since built out our own chat system, which we call HappyChat (support folks are referred to as Happiness Engineers). The development team has built in a number of excellent features, but an often underutilized one is chat tagging. If a user joins a chat and asks about a domain renewal, that chat might be tagged with “domains” or “domain-renewal”; we aren’t very strict about tagging, except in certain circumstances. We can pull out the data we need even if the tagging is a little fuzzy.

While I’ve been writing search marketing and SEO articles for our users for a while, I wanted to know exactly what our users were asking in chat, and what we were spending chat time teaching them about. This is where chat tagging comes in. Most of the Happiness Engineers are pretty diligent about tagging their chats, so it was easy for me to get an SQL dump of all the transcripts tagged with “seo”. However, it came to roughly 18,000 lines of chat transcripts over the past few months.
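HappyChat’s real schema isn’t covered here, so as a rough illustration of “pull every message from chats carrying a given tag,” here’s a sketch that uses Python’s built-in sqlite3 module as a stand-in for MySQL. The `chat_tags` and `messages` tables and their columns are hypothetical; swap in whatever your own chat system uses.

```python
import sqlite3

def messages_with_tag(conn, tag):
    """Return (chat_id, body) rows for every message in chats with `tag`.

    The chat_tags/messages schema is a hypothetical example.
    """
    query = """
        SELECT m.chat_id, m.body
        FROM messages AS m
        JOIN chat_tags AS t ON t.chat_id = m.chat_id
        WHERE t.tag = ?
        ORDER BY m.chat_id, m.id
    """
    return conn.execute(query, (tag,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in database
    conn.executescript("""
        CREATE TABLE chat_tags (chat_id INTEGER, tag TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, chat_id INTEGER, body TEXT);
        INSERT INTO chat_tags VALUES (1, 'seo'), (2, 'domains'), (3, 'seo');
        INSERT INTO messages (chat_id, body) VALUES
            (1, 'How do I get my site on Google?'),
            (2, 'When does my domain renew?'),
            (3, 'What is a meta description?');
    """)
    for chat_id, body in messages_with_tag(conn, "seo"):
        print(chat_id, body)
```

Using a parameterized query (the `?` placeholder) keeps the tag value out of the SQL string itself, which is the safe habit even for one-off exports.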

Time for some Python magic.

I exported the chats from my local MySQL instance into a CSV file, which needed some cleanup before I could really process it. I like Atom, but any decent text editor that supports regex will do. I needed to get this CSV, with every detail about every chat (including many columns of diagnostic-type data), down to the bare messages. After a number of regex find-and-replaces in Atom, I finally trimmed it down to about 15,000 lines of pure chat messages, covering both the user’s and the Happiness Engineer’s side of the conversation.
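I did this cleanup with regex in Atom, but if you’d rather script it, Python’s csv module can pull a single column out of an export directly, and it copes with commas inside quoted messages, which a regex find-and-replace can stumble over. This is a sketch of that alternative, not what I ran; the column name `message` is a hypothetical placeholder for whatever your export calls the message body.

```python
import csv
import io

def extract_messages(csv_text, column="message"):
    """Return the non-empty, stripped values of one column from CSV text.

    `column` is a hypothetical name; use your export's actual header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:  # skip rows with no message body
            messages.append(text)
    return messages

if __name__ == "__main__":
    sample = (
        "chat_id,user_agent,message\n"
        "1,Mozilla/5.0,How do I improve my SEO?\n"
        '2,Mozilla/5.0,"Can I edit my site title, too?"\n'
        "3,Mozilla/5.0,\n"
    )
    for line in extract_messages(sample):
        print(line)
```

Writing the result out one message per line gives you the same plain-text input the keyword extraction step works from.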

Ok, now it’s time for some Python magic.

There’s probably a better way to do this with R, but I already know some Python, so that was my tool of choice. I’d dabbled with NLTK before, but all I needed for now was simple keyword extraction with weighting of the terms. I wanted to know all the related terms, but I also wanted to know which ones occurred frequently. After a bit more research, the RAKE (Rapid Automatic Keyword Extraction) package seemed to be exactly what I needed.

RAKE is pretty simple as far as these things go, and I came across a great tutorial (the RAKE-tutorial repo listed under Tools below) that walks you through doing an extraction. While you don’t need to get super fancy with it, I ended up adding a fair amount of functionality so I could use a text file as the input and output a CSV of scored keywords, which makes the results easier to work with. You can see what I ended up with in my python-rake repo, also listed under Tools (feel free to fork it and put your own spin on it).
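For anyone curious about what RAKE actually does: it splits text into candidate phrases wherever a stopword or punctuation appears, scores each word as its degree (how many words it co-occurs with across candidates, itself included) divided by its frequency, and scores each phrase as the sum of its word scores. Here’s a from-scratch Python sketch of that idea. It is not the RAKE-tutorial code or my python-rake fork, the stopword list is a tiny stand-in, and `write_keywords_csv` just mirrors the text-in, CSV-out workflow described above.

```python
import csv
import re
from collections import defaultdict

# Tiny illustrative stopword list; real RAKE setups use a much longer
# list, such as the SMART stoplist.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "can", "do", "for",
    "from", "how", "i", "in", "is", "it", "my", "of", "on", "or", "the",
    "to", "what", "with", "you", "your",
}

def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for chunk in re.split(r"[.!?,;:\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", chunk):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text, stopwords=STOPWORDS):
    """Score each candidate phrase as the sum of its words' degree/frequency."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how often each word appears in candidates
    degree = defaultdict(int)  # co-occurrence count, the word itself included
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def write_keywords_csv(scored, path):
    """Write (keyword, score) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["keyword", "score"])
        writer.writerows(scored)

if __name__ == "__main__":
    sample = "How do I improve my SEO? Improve site speed for better SEO."
    for keyword, score in rake_scores(sample):
        print(f"{score:.1f}  {keyword}")
    # 8.0  improve site speed
    # 3.5  better seo
    # 2.0  improve
    # 1.5  seo
```

Because a word’s degree grows with the length of the phrases it appears in, RAKE naturally favors longer, more specific phrases, which is handy for spotting long-tail keywords.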

I ran my chats through the RAKE extraction, and the first run gave some okay results. Many of the expected keywords were pulled out, but there was a lot of noise from URLs that had been pasted into chat messages. So, I went back into Atom and used more regex to clean those out.
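The same kind of URL cleanup can be done in Python with the re module. The pattern below is my own rough assumption of what a “pasted link” looks like (anything starting with http://, https://, or www. up to the next whitespace); it won’t catch bare domains like example.com, so adjust it to the noise you actually see.

```python
import re

# Rough pattern for pasted links: http(s):// or www. up to the next space.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def strip_urls(line):
    """Remove URL-looking tokens and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub(" ", line)
    return " ".join(without_urls.split())

print(strip_urls("Check https://example.wordpress.com/about/ and www.example.com please"))
# prints: Check and please
```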

My second pass came out quite a bit cleaner, but the words it ranked highly were still a bit iffy. At that point, I discovered that Medelyan had included an optimization script in her RAKE repo. I ran the `optimize_rake.py` script to determine the best settings for the keyword set, and the defaults I had been using were far different from its eventual suggestion. I changed the settings and let RAKE loose again on the chat messages.
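Those settings decide which candidate phrases are allowed through as keywords. The sketch below is a standalone illustration, not code from Medelyan’s repo: its three thresholds (minimum length in characters, maximum words per phrase, minimum number of appearances) are modeled on the kind of settings RAKE exposes, and the parameter names are assumptions.

```python
from collections import Counter

def filter_phrases(phrases, min_char_length=1, max_words_length=5,
                   min_keyword_frequency=1):
    """Keep candidate phrases that pass three RAKE-style thresholds.

    `phrases` holds candidate phrases as strings, repeats included,
    so that frequency can be counted. Parameter names are illustrative.
    """
    counts = Counter(phrases)
    return [
        phrase
        for phrase, count in counts.items()
        if len(phrase) >= min_char_length            # long enough, in characters
        and len(phrase.split()) <= max_words_length  # not too many words
        and count >= min_keyword_frequency           # appears often enough
    ]

candidates = ["seo", "seo", "site speed", "google", "seo plugin meta box settings page"]
print(filter_phrases(candidates))
# ['seo', 'site speed', 'google']
print(filter_phrases(candidates, min_char_length=4, max_words_length=3))
# ['site speed', 'google']
print(filter_phrases(candidates, min_keyword_frequency=2))
# ['seo']
```

Small changes to these thresholds reshape the output a lot, which is why tuning them against your own data (as the optimization script does) can beat the defaults.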

After an hour or so of processing, RAKE finally spit out a pretty solid list of ranked keywords.

Screenshot of eventual keyword list.

As you can see, these are pretty on-point search marketing and SEO-related terms, which is exactly what I was looking for with this experiment. These are the things our users are actually asking us about, and not speculation or anecdotal evidence.

Your clients might not be using chat for customer interactions; it requires a fair amount of “babysitting” and isn’t necessarily useful for all industries. However, they’ve probably got a CRM system that customer interactions run through, or a contact form for pre-sales questions. They might record phone calls that could be transcribed by an outsourced transcription service. There are countless ways to collect customer input and inquiries that this technique can be applied to. The silver bullet is that you’re not looking at the same competitor and AdWords data that everyone else is; you’re gleaning keyword research directly from real, live customers. Using these weighted keyword lists as part of your research strategy will help drive down click budgets, create better-targeted content, and come up with better answers to common customer questions.

Tools
Python: https://www.python.org/downloads/
RAKE: https://github.com/zelandiya/RAKE-tutorial and https://github.com/chrisfromthelc/python-rake (my “value-added” version)
Atom: https://atom.io/

Filed Under: Data Science, Keyword Research, Keywords, SEO, WordPress.com Tagged With: atom, keyword research, Keywords, python, RAKE, wordpress.com

Grow Your Traffic with Keyword Research

July 7, 2016 By Chris Smith

This post is an introduction for WordPress.com bloggers to the concept of keyword research, with some real-world examples, tools, and step-by-step guidance.

Filed Under: Keyword Research, SEO, WordPress.com Tagged With: google suggest, keyword research, Keywords, keywordtool.io, long-tail keywords

Three Google Search Console Tips for Better SEO

December 16, 2015 By Chris Smith

This is a post I wrote geared towards WordPress.com users. However, the tips are great for any site, no matter the platform, and are the very first things I do after verifying a site on Google Search Console.

Filed Under: Google, Google Search Console, SEO, WordPress.com Tagged With: fetch as google, google search console, googlebot, international targeting, internationalization, sitemaps

Help! Googlebot Cannot Access CSS And JS files On My WordPress Site!

July 28, 2015 By Chris Smith

If you received this message this morning from Google Search Console (formerly Google Webmaster Tools), you’re not alone. Google has recently been pushing harder for webmasters to allow crawlers full access to all JavaScript and CSS so that they can render a site to determine whether or not it meets mobile standards (among other things). Many WordPress sites (along with sites on the other major CMSs) received an ominous warning that they were blocking assets, and that it could affect rankings. While technically true, many of these sites were only blocking /wp-admin/, where no public-facing assets should live. For those of you in that situation, relax.

However, a number of sites are also blocking /wp-includes/. While it isn’t obvious, a number of files live there that visitors (including crawlers) need to access to render pages properly. For example, Dashicons, the small icons you generally associate with the admin side of WordPress, can often be called by themes for front-end use. Another major thing that can hinder proper rendering by crawlers is jQuery: themes sometimes enqueue a different version, but by default it lives in the /wp-includes/ folder. Digging further, the built-in emoji support and comment-reply handling would also be affected.

So, what can be safely blocked? At this point, here’s what a “compliant” WordPress robots.txt would look like (as far as what is safe to block). Of course, you’d want to add in your own sitemap directives and other special cases, but this is a good starting point.


User-agent: *
Disallow: /cgi-bin/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /xmlrpc.php

One big word of warning: DO NOT block anything under /wp-content/. You can assume anything blocked there is going to hit you pretty hard eventually, since everything to do with site rendering lives there: plugins, themes, images, and so on. It was popular at one point to block crawlers (like Googlebot) from plugin folders, but given the amount of JavaScript and CSS many plugins now use, it’s a dangerous proposition. If you’re blocking anything there, open it back up immediately.
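If you want to double-check your own rules before Google does, Python’s standard-library urllib.robotparser can evaluate a robots.txt against a list of paths. This sketch feeds it the rules suggested above; the sample asset paths (including the theme name) are made up, and robotparser uses simple prefix matching, so it may not mirror every nuance of Googlebot’s own parser (wildcards, for example).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules suggested above, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /xmlrpc.php
"""

def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]

# Sample paths; file and theme names are hypothetical.
paths = [
    "/wp-admin/options.php",
    "/xmlrpc.php",
    "/wp-includes/js/jquery/jquery.js",
    "/wp-includes/css/dashicons.min.css",
    "/wp-content/themes/my-theme/style.css",
]
print(blocked_paths(ROBOTS_TXT, paths))
# ['/wp-admin/options.php', '/xmlrpc.php']
```

If anything under /wp-includes/ or /wp-content/ shows up in that list for your own file, that’s the rule to remove.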

Questions or comments? Leave them below!

Filed Under: Google, SEO, WordPress Tagged With: css, google search console, google webmaster tools, googlebot, gwt, javascript, js, robots.txt, search console, webmaster tools

Improving WordPress Site Speed

February 11, 2015 By Chris Smith

(originally posted on Bring Your Own Design)

Most of us don’t need to serve up millions (or even hundreds) of pageviews a day, but eventually you’ll run into a situation where your site is running slowly. While the reasons for a slow-loading site are countless, there are a few things you can do to help increase your WordPress site speed.

Hosting

It’s easy to get tempted by cheap hosting. You can even find free WordPress hosting if you look hard enough! But the old adage “you get what you pay for” definitely applies to the web hosting world. Cheap hosts tend to overcrowd their servers, and one badly behaved site can take down multiple others! For mission-critical sites, we recommend using a managed hosting provider like WP Engine or Synthesis (if you’re on StudioPress Genesis). If you’re handy with a command line, a managed VPS like the ones A Small Orange offers is probably your best bet.

Content Delivery Networks (CDNs)

If you have a lot of images or other media, a CDN is the way to go. A Content Delivery Network is essentially a network of servers in different locations that hold copies of your files, so visitors around the world can fetch them from somewhere nearby. For example, if your site is popular in Europe but your hosting is based in the US, it would make a lot of sense to use something like MaxCDN or Amazon CloudFront to cache files at one of their European locations for quick access from that part of the world.

While speed is the ultimate payoff with CDNs, cost can be a beneficial factor as well. Most CDNs will allow you to store gigabytes of data for pennies, compared to the much higher prices of purchasing additional storage and bandwidth from your hosting provider.

To help you manage using a CDN with WordPress, we’d suggest W3 Total Cache. While it is a full-fledged optimization plugin, it can be used only for the CDN functionality (but you really should use the other stuff too!).
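Under the hood, using a CDN with WordPress mostly means rewriting the URLs of static files so browsers fetch them from the CDN’s hostname instead of your server. W3 Total Cache handles this for you; purely as an illustration of the idea (not how the plugin is implemented), here’s a simplified Python sketch. The hostnames are placeholders, and it only rewrites absolute URLs under /wp-content/ and /wp-includes/.

```python
import re

def rewrite_to_cdn(html, site_host, cdn_host,
                   folders=("wp-content", "wp-includes")):
    """Point absolute URLs for static folders at a CDN hostname.

    site_host and cdn_host are placeholders (e.g. "example.com" and
    "cdn.example.com"); other URLs, like page links, are left alone.
    """
    folder_alternation = "|".join(re.escape(folder) for folder in folders)
    pattern = re.compile(
        r"(https?:)?//" + re.escape(site_host) + r"/(" + folder_alternation + r")/"
    )

    def swap_host(match):
        scheme = match.group(1) or ""  # keep http:, https:, or protocol-relative
        return f"{scheme}//{cdn_host}/{match.group(2)}/"

    return pattern.sub(swap_host, html)

page = (
    '<link rel="stylesheet" href="https://example.com/wp-content/themes/my-theme/style.css">'
    '<a href="https://example.com/about/">About</a>'
)
print(rewrite_to_cdn(page, "example.com", "cdn.example.com"))
```

Only the stylesheet URL changes; the link to the About page still points at your own server, since pages themselves aren’t served from the CDN in this setup.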

Minify, Static-fy

Minifying your static files will make an incremental improvement on site loading speed, but all those small improvements really can add up in the long run! HTML, CSS, and JavaScript can all be minified, and you can even combine your CSS and JavaScript files for even more improvements.
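To make “minify” concrete: it means stripping characters the browser doesn’t need (comments, indentation, line breaks) without changing what the file does, and “combining” means concatenating several files into one so the browser makes fewer requests. The sketch below is a deliberately naive CSS example for illustration only; it doesn’t handle edge cases such as whitespace or comment markers inside quoted strings, so in practice you’d let a plugin like W3 Total Cache or a dedicated minifier do this.

```python
import re

def minify_css(css):
    """Naive CSS minifier: drops comments and whitespace the browser ignores.

    Illustration only: it does not protect quoted strings or url() values.
    """
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)                       # collapse whitespace runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # trim around punctuation
    css = css.replace(";}", "}")                         # last semicolon is optional
    return css.strip()

def combine_and_minify(stylesheets):
    """Minify each stylesheet and concatenate them into a single file body."""
    return "".join(minify_css(sheet) for sheet in stylesheets)

print(minify_css("body {\n  color: red; /* brand */\n  margin: 0;\n}\n"))
# body{color: red;margin: 0}
```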

W3 Total Cache is our favorite for handling those tasks. Another thing that it does well is generate static files to serve up, instead of hitting the database on every single request. This method works best on sites with pages that don’t change much, but even busy blogs can benefit from it.
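The static-file idea boils down to caching: build a page once, then serve the saved copy until it expires or the content changes. W3 Total Cache does this with files on disk; the in-memory Python class below is only a sketch of the concept, and the `render` function it wraps is a hypothetical stand-in for WordPress’s normal PHP-and-database page build.

```python
import time

class PageCache:
    """Minimal in-memory page cache with a time-to-live (TTL).

    `render` is a hypothetical function that builds a page the slow way.
    """

    def __init__(self, render, ttl_seconds=300, clock=time.monotonic):
        self.render = render
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._pages = {}  # path -> (time cached, html)

    def get(self, path):
        now = self.clock()
        cached = self._pages.get(path)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]         # cache hit: no rendering, no database work
        html = self.render(path)     # cache miss: build the page once
        self._pages[path] = (now, html)
        return html

    def invalidate(self, path):
        """Drop a cached page, e.g. after its post is edited."""
        self._pages.pop(path, None)

render_count = 0

def slow_render(path):
    global render_count
    render_count += 1
    return f"<h1>{path}</h1>"

cache = PageCache(slow_render, ttl_seconds=60)
for _ in range(3):
    cache.get("/about/")
print(render_count)  # 1: rendered once, then served from the cache twice
```

The TTL is the trade-off knob: a longer one saves more work but risks serving stale pages, which is why this approach suits pages that don’t change much.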

NGINX

Here’s where we get into the really heavy stuff. If you’ve outgrown managed hosting and are running a VPS, you’ll definitely want to swap out Apache for NGINX. Many of the biggest and most popular WordPress-based sites run on it (because it’s simply faster than Apache for WordPress). Another huge advantage for WordPress site speed is being able to easily add caching layers like APC and memcached alongside NGINX. With a VPS, you’re typically on your own to get these things running, but some VPS hosts offer configurations like this as an option or a paid service. Either way, it’s well worth it to squeeze the extra speed out of your installation.

Plugins (or rather, getting rid of them)

Many people make the big mistake of overdoing plugins. Every active plugin adds code that runs on each page load, and many add their own database queries even for the simplest tasks, which slows loading of the page. Simple things like social media buttons can be coded directly into the template to save resources. Offloading plugin functionality to the theme really does make a huge difference in speed.

Finally…

While this is by no means a comprehensive list of to-do items, it certainly will give you a solid set of ideas and tools to work with on making your WordPress site faster. Is there a trick or tip that you think should be included? If so, leave us a comment!

Filed Under: SEO, Speed, Web Development Tagged With: cdn, content delivery networks, manage vps, managed hosting, nginx, Site Speed, w3 total cache, wordpress hosting

Copyright © 2022 · Built with the Genesis Framework on WordPress

 
