How does Google understand text?

On yoast.com, we talk a lot about writing and readability. We consider it a very important part of SEO. Your text needs to be easy to follow and it needs to satisfy your users’ needs. This, in turn, will help your rankings. However, we rarely talk about how Google and other search engines read and understand these texts. In this post, we’ll explore what we know about how Google analyzes online text.

Are we sure Google understands text?

We know that Google understands text to some degree. Just think about it. One of the most important things Google has to do is match what someone types into the search bar to a suitable search result. User signals (like click-through and bounce rates) alone won’t help Google do this. Moreover, we know that it’s possible to rank for a phrase that you don’t use in your text (although it’s still good practice to identify and use one or more specific keyphrases). So clearly, Google does something to actually read and assess your text in one way or another.

How Google understands text

Back to our initial question: How does Google understand text? To be honest, we don’t know this in detail. Unfortunately, that information isn’t freely available. And we also know, judging from the search results, that there is still a lot of work that needs to be done. But there are some clues here and there that we can draw conclusions from. We know that Google has taken big steps when it comes to understanding context. We also know that the search engine tries to determine how words and concepts are related to each other. How do we know this? On the one hand, by keeping an eye on any news surrounding Google’s algorithm. On the other hand, by considering how the actual search results pages have changed.

Word embeddings

One interesting technique Google has filed patents for and worked on is called word embedding. We’ll save the details for another post, but the goal is basically to find out which words are closely related to other words. This is what happens: a computer program is fed a certain amount of text. It then analyzes the words in that text and determines which words tend to appear together. Then, it translates every word into a series of numbers. This allows each word to be represented as a point in space in a diagram, like a scatter plot. This diagram shows which words are related and in what ways. More accurately, it shows the distance between words, sort of like a galaxy made up of words. So, for example, a word like “keywords” would be much closer to “copywriting” than it would be to, say, “kitchen utensils”.
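To make the idea a bit more concrete, here is a minimal sketch in Python. It is not how Google does it: real word embeddings are learned by models trained on enormous amounts of text, while this toy version simply counts which words share a sentence. The corpus, the function names, and the choice of cosine similarity as the "closeness" measure are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for this example.
CORPUS = [
    "keywords help copywriting for seo",
    "good copywriting uses keywords for seo",
    "seo copywriting needs keywords",
    "kitchen utensils include spoons and pans",
    "store kitchen utensils near pans and spoons",
]


def build_vectors(sentences):
    """Turn every word into numbers: counts of the words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine_similarity(a, b):
    """Return a score from 0.0 (nothing in common) to 1.0 (same direction)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = build_vectors(CORPUS)
close = cosine_similarity(vectors["keywords"], vectors["copywriting"])
far = cosine_similarity(vectors["keywords"], vectors["utensils"])

print(f"keywords vs copywriting: {close:.2f}")
print(f"keywords vs utensils:    {far:.2f}")
```

In this tiny corpus, “keywords” and “copywriting” keep showing up next to the same words, so they score as close. “Keywords” and “utensils” never share a neighbor, so they score as far apart. With more text, those scores become more nuanced and more reliable.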

Interestingly, this can also be done for phrases, sentences, and paragraphs. The bigger the dataset you feed the program, the better it can categorize words and work out how they’re used and what they mean. And, what do you know, Google has an index that covers a huge part of the web. With a dataset like that, it’s possible to create very reliable models that predict and assess the value of text and context.

From word embeddings, it’s only a small step to the concept of related entities. Let’s take a look at the search results to illustrate what related entities are. If you type in “types of pasta”, this is what you’ll see right at the top of the SERP: a heading called “pasta varieties”, with a number of rich results that include a ton of different types of pasta. These pasta varieties are even subcategorized into “ribbon pasta”, “tubular pasta”, and other subtypes of pasta. And there are lots and lots of similar SERPs that reflect the way words and concepts are related to each other.

Image: after typing [types of pasta], Google now shows this entity-based rich result.

The related entities patent that Google has filed actually mentions the related entities index database. This is a database that stores concepts or entities, like pasta. These entities also have characteristics. Lasagna, for example, is a pasta. It’s also made of dough. And it’s food. Now, by analyzing the characteristics of entities, they can be grouped and categorized in all kinds of different ways. This allows Google to better understand how words are related, and, therefore, to better understand context.
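As a rough illustration of grouping entities by their characteristics, consider the sketch below. It does not reflect the actual structure of Google’s related entities index, which isn’t public in that detail. The entity names, their characteristics, and both functions are hypothetical and only meant to show the principle.

```python
from collections import defaultdict

# Hypothetical entities and characteristics, invented for this example.
ENTITIES = {
    "lasagna": {"pasta", "made of dough", "food"},
    "tagliatelle": {"pasta", "made of dough", "food", "ribbon pasta"},
    "penne": {"pasta", "made of dough", "food", "tubular pasta"},
    "bread": {"made of dough", "food"},
    "apple": {"food", "fruit"},
}


def group_by_characteristic(entities):
    """Map every characteristic to the set of entities that have it."""
    groups = defaultdict(set)
    for name, characteristics in entities.items():
        for characteristic in characteristics:
            groups[characteristic].add(name)
    return groups


def related_entities(entities, name):
    """Rank the other entities by how many characteristics they share with `name`."""
    target = entities[name]
    scores = {
        other: len(target & characteristics)
        for other, characteristics in entities.items()
        if other != name
    }
    # Most shared characteristics first; ties are sorted alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


groups = group_by_characteristic(ENTITIES)
ranking = related_entities(ENTITIES, "lasagna")

print(sorted(groups["pasta"]))
print(ranking)
```

The same set of entities can be sliced in different ways: everything that is “pasta”, everything “made of dough”, or everything that is “food”. Entities that share more characteristics with lasagna end up higher in its ranking, which is a simple stand-in for the idea of relatedness.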

Google is heavily investing in NLP

Natural language processing (NLP) is the field concerned with getting machines to understand language. It is one of the hardest areas of computer science, and one where a lot of progress is being made. Today, with a world increasingly powered by AI systems, proper language understanding is key. Google understands this and invests a ton in the development of NLP models. One key system was BERT, a model that looks at the words that come both before and after a given word. This way, the system has the full context of a sentence to make proper sense of its meaning. What BERT did is awesome, but Google is doing more. Meet MUM.

MUM: Google’s upcoming language model

At its I/O event in 2021, Google introduced a new language model: MUM (Multitask Unified Model). According to Google, it’s supposed to be 1,000 times more powerful than BERT. How? Well, apparently, MUM can multitask. This means the model can read text, understand its meaning, form a deeper knowledge about the subject, use other media to enrich that knowledge, get insights from more than 75 languages, and translate everything into content that answers complex search queries. All at the same time.

Image: a visual representation of how Google’s MUM language model works (from Google’s blog).

Practical conclusions

So, how does Google understand text exactly? What we know leads us to two very important points:

1. Context is key

If Google understands context in one way or another, it’s likely to assess and judge context as well. The better your copy matches Google’s notion of the context, the better its chances of ranking well. So thin copy with a limited scope is going to be at a disadvantage. You need to cover your topics properly and exhaustively. And on a larger scale, covering related concepts and presenting a full body of work on your site will reinforce your authority on the topic you write about and specialize in.

2. Write for your reader

Easier texts that clearly reflect relationships between concepts don’t just benefit your readers; they help Google as well. Difficult, inconsistent, and poorly structured writing is harder to understand for both humans and machines. You can help the search engine understand your texts by focusing on:

Readability: making your text as easy to read as possible without compromising your message.
Good structure: adding clear subheadings and using transition words.
Good content: adding clear explanations that show how what you’re saying relates to what’s already known about a topic.

The better you do, the more easily your users and Google alike will understand your text and what it tries to achieve. That also helps you rank with the right pages when a user types in a certain search query, especially because Google is basically creating a model that mimics the way we humans process language and information.

Google wants to be a reader

In the end, it boils down to this message: Google is becoming more and more like an actual reader. By writing rich content that is well structured, easy to read, and clearly embedded in the context of the topic at hand, you’ll improve your chances of doing well in the search results.

The post How does Google understand text? appeared first on Yoast.