If you receive a list of target keywords or a keyword research spreadsheet, do you know how they were sourced? Where exactly does your keyword data come from? Your answer is likely "from my browser extension," "from our audience editorial team," "from my keyword research tool" or "from our digital marketing agency."

Now that digital marketing has become common, if not sophisticated, many basic aspects of optimization projects are taken for granted or considered "magic." You get the keywords from the keyword generator because your online SEO course or your boss explained that's where to find them. You use the software, organize the keywords, and pass them off to the next person.

You may even understand how some variations of your keywords got there: if you entered the phrase "comic books" into a keyword research tool, you shouldn't be surprised to see search terms like "comic book store near me," "marvel comic books" or even "graphic novel," the more literary synonym for comic books.

But why are you seeing terms like "seduction of the innocent," "orc stain" or "tokyo ghost" in your research report when the seed term is "comic books"?

The answer lies in how Google's search algorithms analyze and process textual data from both advertisers and search users, creating what semantic analysts call knowledge graphs and entities.

Google's decades-long data harvest

A note: Researching Google via Google is a bit of an ouroboros, since Google does its best to keep only its most current policies and best practices on its website and in its search results. Please know that this newsletter is an overview from an experienced professional on a deadline, and not from a historian. I acknowledge that a more journalistic documentation should use a tool other than Google... like a library or a book or an interview with an expert. I gave other search engines the ol' college try with little luck. Like much of 2000s-era digital history, the history of Keyword Planner or any of the platform's core advertising technologies would be a great thesis topic.

Since the Google search engine launched in 1998, it has collected the search terms that people enter and stored them in a database. In 2000, Google launched its search ads, followed by a self-service ad administration tool. With the self-service tool, advertisers could name the keywords they wanted their ads to appear next to, so if they were selling shoes, they could bid on "shoes."

To help advertisers find the most relevant keywords for their products (and likely to encourage keyword-oriented thinking in general), Google launched Keyword Planner. Again, more historical data is needed here, but for as long as I've been working in search, Keyword Planner has spit out recommended search queries and phrases based on a seed query, website or landing page.

Google's Keyword Planner results for the seed query "comic books"

In Google's nascent years, Keyword Planner data was freely available to all Google Ads users, no matter how much they spent on ad buys. But to "protect user privacy" and business interests, Google has significantly scaled back what's available in Keyword Planner for free.

Since 2016, organic researchers like me have needed to either piggyback on a well-funded paid Google Ads account or subscribe to a keyword research tool to access more detailed keyword research data. The free version of the data in the gif above is deeply stripped back. In the free version, researchers can see the keywords, but the search volumes and forecasts are obscured into broad ranges whose bounds step up by a factor of ten (for example, 1K to 10K).
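
To make that obscuring concrete, here is a minimal sketch of how an exact monthly search volume could be collapsed into a coarse range. The bucket boundaries (each range running from one power of ten to the next) and the function name `volume_range` are my assumptions for illustration; Google does not publish the exact logic.

```python
def volume_range(exact_volume: int) -> str:
    """Collapse an exact monthly search volume into a coarse range label.

    Assumption: ranges run from one power of ten to the next,
    e.g. 10-100, 100-1,000, 1,000-10,000.
    """
    if exact_volume < 10:
        return "0-10"
    lower = 10
    # Raise the lower bound until the next power of ten would overshoot.
    while lower * 10 <= exact_volume:
        lower *= 10
    return f"{lower:,}-{lower * 10:,}"


# Hypothetical volumes: very different keywords land in the same bucket.
for keyword, volume in [("comic books", 74000), ("orc stain", 2400), ("tokyo ghost", 9900)]:
    print(keyword, volume_range(volume))
```

In this toy version a keyword with 2,400 monthly searches and one with 9,900 look identical, which is why range data makes it hard to prioritize keywords.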

Where does Keyword Planner and Google Ads API data come from?

When you use Keyword Planner or a keyword research tool for the first, third, or three-hundredth time, it can feel like magic. How does Google know that all those words, many of which I've never even thought about before, are related to my original seed query?

Google Ads Keyword Planner data is compiled from two decades of the following:

  • Billions of end users who type queries into the platform to search for literally anything on the internet, ad nauseam, every day
  • Advertisers providing lists of keywords so they can buy relevant ad space
  • Professional and novice publishers, bloggers, Redditors, what have you building websites and pushing out content that Google's web crawlers scan and classify in its index
  • Data from other Google products, like Chrome and Analytics, that inform (if not directly influence) Google's understanding of content and search behavior
  • Google employees using semantic analysis, pattern recognition, predictive analytics, user feedback and machine learning to understand what it all means and tweak the algorithm so searchers see the best results.

With these inputs, Google catalogs and categorizes related words and patterns, building entities, or a series of commonly used words or images related to a topic. Google's entities are extremely sophisticated, which is why searching "comic books" brings up not only the usual superhero suspects from Marvel and DC, but also independent comics like the Umbrella Academy... as well as legacy comics like Little Lulu.
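
As a rough illustration of how related terms can fall out of usage patterns, here is a toy co-occurrence sketch. It is not Google's method, which is far more sophisticated and not public. The session data, the function names and the idea of counting queries that appear together in one session are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical data: each inner list is a set of queries one searcher made together.
sessions = [
    ["comic books", "marvel comic books", "graphic novel"],
    ["comic books", "little lulu", "graphic novel"],
    ["comic books", "umbrella academy"],
    ["stand up comic", "tig notaro"],
]


def build_cooccurrence(sessions):
    """Count how often each pair of queries shows up in the same session."""
    related = defaultdict(Counter)
    for queries in sessions:
        for a, b in combinations(sorted(set(queries)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def related_terms(related, seed, top_n=3):
    """Return the terms most often seen alongside the seed query."""
    return [term for term, _ in related[seed].most_common(top_n)]


related = build_cooccurrence(sessions)
print(related_terms(related, "comic books"))
```

Even in this tiny example, "tig notaro" never attaches to "comic books" because no searcher paired them, while "graphic novel" rises to the top because it co-occurs most often.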

Google's natural language processing capabilities identify patterns not only in the word itself, but also in syntax and intent. When I search for "comic books" in the Keyword Planner, Google knows that I'm looking for drawn panels and weekly releases of sequential art books, not Tig Notaro and live events. A few years ago, the Keyword Planner and the Ads API weren't quite as smart. But we all learn over time, and there's no shortage of comic book content online to help Google discern the nuances.*

The planner also understands that my seed query's intent is to learn about comic book series, not movies or TV shows based on superhero comics. Google's keyword database has memory, although it's never been clear to me how far back, which means that related keywords can appear next to a seed query for years after they've peaked.

*That said, I am surprised that there's not more on WebToons or manga in the keywords generated from the "comic books" query. It's another exploration for another time.

Search data as palimpsest

The concept of a palimpsest helps me comprehend how Google's database works: it's a document that rewrites itself and refreshes constantly, but traces of past iterations never quite disappear. This works fantastically for Little Lulu, a comic strip from the 1930s that still lingers in popular memory and search queries.

Little Lulu learns about keyword research, via The Comics Alliance
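
One way to picture the palimpsest is a score that fades over time but never reaches zero. The sketch below is a toy model only: the half-life, the floor value and the function name `lingering_score` are invented for illustration, and Google has not documented how long keyword associations persist.

```python
def lingering_score(peak_score: float, years_since_peak: float,
                    half_life_years: float = 5.0, floor: float = 0.01) -> float:
    """Decay a keyword's association score over time, but never below a small floor.

    Assumptions: exponential decay with a made-up half-life, plus a floor
    so that traces of old associations never fully disappear.
    """
    decayed = peak_score * 0.5 ** (years_since_peak / half_life_years)
    return max(decayed, floor)


# A term that peaked long ago still keeps a faint trace in this model.
for years in (0, 5, 20, 90):
    print(years, lingering_score(1.0, years))
```

The floor is the "trace that never quite disappears": a term that peaked 90 years ago scores low, but it can still surface next to a seed query.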

Keyword Planner's memory is not ideal for quickly evolving cultural norms, which you'll discover if you start researching topics related to race, gender, sexuality and religion.** "Why does Google think that term is related to my seed query?" is a rabbit hole of Wonderland scope where you're as likely to find singing flowers as that terrifying part where a baby turns into a pig.

Google's massive relational tables (smart spreadsheets), data storage, processing capabilities and advanced machine learning organize the keyword data. It's an astonishing act of computing, all so advertisers can find the right keywords for their entity, quickly and accurately. Google exports this data for free via the Google Ads API because it wants to "freely encourage innovation," according to its documentation. Sure, but...

Giving away raw data and free training on its software has proved wildly profitable for Google. It's a juggernaut business strategy in the open source tradition. The more people who learn to use keyword data, the more people understand search engine marketing, the more ads Google can sell.

**Read Safiya Umoja Noble's Algorithms of Oppression for a detailed analysis of this problem.

How do non-Google keyword research tools work?

Keyword research tools are independently operated platforms that harness and organize keyword data from the Google Ads API, which mirrors but doesn't necessarily replicate the data in Google's Keyword Planner. The API is free, but for organic researchers who don't have regular access to a well-funded Google Ads account, accessing its wealth requires purchase of a subscription keyword research tool.
