Entity Optimisation in 2019 – Kevin Indig

What are entities and how do they relate to SEO? Why are they important? How can you optimise for entities?

The brilliant Kevin Indig tackled all these questions (and a whole load more) in a fantastic talk he delivered at Optimisey.

The video, slides and transcript (usual caveat here – mistakes/typos etc. are mine, not Kevin’s) are below if you want to get stuck in.

A word of warning, Kevin gets into some pretty deep and complex topics like machine learning and natural language processing (some of the audience watching this talk had steam coming out of their ears partway through!) but stick with it.

Kevin brilliantly turns all that theory into actionable advice and tips for things you can actually do to your website and content plans in 2019 to help your SEO efforts really hit the mark.

If you enjoy all that and it leaves you eager for more you should grab a free place at the next Optimisey event.

Book your place

Video & Slides



Thanks for coming out tonight – really pumped to be here. Thanks also to the sponsors for making this happen, really kind, and thanks to Andrew for pulling this off. This is awesome, right? I think SEO meetups and events are super crucial. That’s where I got the most valuable information in the beginning, and where I still get the most valuable information.

So Andrew and I we’ve been trying to make this happen for almost a year I think now? And so I’m glad to finally be here. And when we talked about coming here and he was like ‘Hey why don’t you just talk about what really excites you?’ and what really excites me is, amongst other things, fitness… but this is not an option so I’m talking about entities tonight…

But on a serious note I’m talking about entities because I think this is really, really, really important to get right in SEO otherwise you might be left behind not in a couple of years but already. Today.

So before we move on – little correction right there. I know you got me, Andrew, with the S: it’s supposed to be a zee.

Why are entities important in SEO?

But yeah it starts with the why. Like why are these actually important?

And there’s three big reasons for why.

So yes, in general, SEO is getting much, much harder compared to ten years ago when I started and when I came up.

It’s getting much more complex, with reasons.

So first of all, in general, the complexity of SEO has increased because Google is using more machine learning. There are more ranking factors and they’re being rolled out fluidly, you know, over time, so it’s really hard to recognise a pattern and really know and understand what’s going on. We’ll talk a bit more about the updates in just a second.

And so the second reason here is that paid search in general is getting more expensive.

This is a slide from the internet trends report from Kleiner Perkins which I can really, really recommend you to read and it shows that the price for paid search in general is going up.

So more companies and more people will revert to organic search to get the customers.

And then the third reason is that you’re competing with Google more directly.

So this is a screenshot from the query ‘buy sneakers’ on desktop and mobile – you see the fold right here – meaning this upper part is what people see once they search that on Google.

And what I’m marking here are all the SERP features that Google shows you to immediately satisfy and answer your question or satisfy your intent.

So if you have any transaction intent – you’re trying to buy something – Google will show you certain ads where you can buy products; it will show you maps where you can buy certain products; if you have an informational query or an informational search (trying to figure something out) then Google will try to answer your question directly.

So Google has shifted in terms of sending traffic to websites and now it’s trying to answer the question that people have – right in the search results.

So it means you get fewer clicks on average, so you have to rank even higher and make sure that all your tactics are better.

So bottom line: we need to be more efficient. And as we all know content is one of the big pillars of SEO or has become one of the pillars of SEO.

Entities in SEO: Creating content

So the question is: how can we create better content to get the traffic that still comes through to other sites?

And we also have to take into account that more content is being created, so we have more competition, and less content is being shared.

So this is from a BuzzSumo study that basically shows Twitter shares over time versus how much content is being created. In this case for the topic of machine learning (but you will see the same happen in all the different sorts of topics) so even more competition when it comes to content.

And so that’s why I’m here to talk about entity optimisation.

A couple of words about myself. Until very recently I worked for Atlassian. I’m a free agent on the market right now but I’m also working for The German Accelerator, which is a startup accelerator in Silicon Valley.

And – as Andrew already hinted at – I live in the US and work with a couple of companies as well.

I’ll show a couple of examples.

So we’ll start with a more theoretical part and then we’ll switch over to more practical and tactical advice.

What are entities in SEO?

Let’s jump in. What actually are entities?

So entities are a somewhat abstract concept but I’ll do my best to explain it to you.

Entities can be different things they can be places, people, organisations but they can also be concepts or ideas. So for example when you have a text you’ll see that Google recognises different sorts of entities like people, an organisation like Google, a date like 2007 or a country like America and so on and so on.

Entities in SEO - a visual demonstrating different types of entities and how Google recognises them

And this is how Google improves its natural or its understanding of language and basically its understanding of content as well.

If I were to try to define what entities are I would say they are semantic interconnected objects that help machines to understand explicit and implicit language.

And this is the crucial part right here to understand: it’s not just about the explicit thing you say it’s also about implicit concepts, ideas, trends all these kind of things.

The question of course is why Google focuses on entity extraction or on entities as a whole?

And there are a couple of good reasons but here are some of them.

So first of all we all know that there is a lot of you know ‘link graph manipulation’, link building all that kind of stuff and there’s a lot of spam out there.

So Google is actually trying to shift away from that. For a time – you’ve all seen this – people kind of assumed that social signals were becoming more important, because they’re a better indicator of what relevant content is to Google. But right now it seems a lot like it’s entities.

It also helps to group content from the same brand or same asset. So if you were to think about an app from a company, and the website – and a couple of other things that we’ll talk about in just a second – then entities allow Google to really understand what is coming from the same brand and what might be different.

It’s also language agnostic and it is very supportive of voice search. And I will dive deeper into all of these reasons.

How to do entity extraction (and how Google does it)

So first of all if you were to try to do entity extraction yourself – and to do the extraction is basically trying to get these entities out of text to process them to work with them and so on – then there are basically four different parts that you need:

  1. An ID to recognise an entity, like an address. There are MREIDs that have been written about online; they’re basically like URLs, but not the exact same concept.
  2. Tonnes of data, which Google already has because of its corpus, the Google index.
  3. A knowledge base like Freebase or Wikipedia. Remember that Google bought Freebase a couple of years ago, and it’s probably for exactly this purpose.
  4. Attributes, which are basically the relationships between entities and which help Google to understand the concepts behind them.
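To make those four ingredients a bit more concrete, here is a minimal toy sketch in Python. It is only an illustration and not how Google does it: the IDs are made-up placeholders (not real MREIDs), the knowledge base is three hand-written entries, and the "extraction" is a plain name lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str    # made-up placeholder ID, not a real Google identifier
    name: str
    entity_type: str
    attributes: dict = field(default_factory=dict)  # relationships to other entities


# Tiny hand-written knowledge base standing in for something like Freebase or Wikipedia.
KNOWLEDGE_BASE = [
    Entity("/toy/1", "Google", "ORGANIZATION", {"acquired": "Freebase"}),
    Entity("/toy/2", "Freebase", "KNOWLEDGE_BASE", {"acquired_by": "Google"}),
    Entity("/toy/3", "Wikipedia", "KNOWLEDGE_BASE"),
]


def extract_entities(text):
    """Return known entities whose name appears in the text, in order of first appearance."""
    lowered = text.lower()
    found = [(lowered.find(entity.name.lower()), entity) for entity in KNOWLEDGE_BASE]
    return [entity for position, entity in sorted(found, key=lambda pair: pair[0]) if position >= 0]


if __name__ == "__main__":
    for entity in extract_entities("Google bought Freebase a couple of years ago."):
        print(entity.entity_id, entity.name, entity.entity_type, entity.attributes)
```

A real system would need far more data and would have to tell apart entities that share a name, which this lookup cannot do.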

And in 2012 Google published some data: they already had – this is supposed to be 500 million by the way, not five hundred – five hundred million objects in their knowledge base and 3.5 billion facts, and that was seven years ago.

So you can imagine how high that number is nowadays and how good Google’s understanding has become.

And Google uses a set of algorithms called Word2Vec to increase its understanding of language. It’s based on two sub-algorithms.

The first one is Skip-gram, the other one is Continuous Bag of Words – which is what ‘CBOW’ stands for – and it basically combines all of these words.

It looks at them you know it looks basically at what word is next to each other – and that on a large scale. And it’s able to understand how the words are related in the text.

For example what is the subject, object etc. And then it’s also able to cluster your content into categories.
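The "which word sits next to which" idea behind Skip-gram and CBOW can be sketched in a few lines of Python. This only shows how training examples are formed from a window of neighbouring words; the function names and the window size are illustrative assumptions, and the neural network that Word2Vec trains on these examples is left out.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each word is paired with every neighbour inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding words (context) are used to predict the middle word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


if __name__ == "__main__":
    words = "entities help machines understand language".split()
    print(skipgram_pairs(words, window=1))
    print(cbow_examples(words, window=1))
```

Skip-gram goes from a word to its neighbours, CBOW goes from the neighbours to the word; run over a very large corpus, both end up placing words that appear in similar contexts close together.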

And by the way this is publicly accessible information this is from the natural language processing API which I’ll talk more about in this presentation.

What it shows is that Google already has lots of different categories there – I think about a hundred – to cluster content into and understand what the broader scope and topic of that content is. And with that algorithm it’s able to basically map words or basically transform words into numbers and these numbers can be mapped onto a vector or a graph.

Remember math in eighth grade? That was no fun but this is! And it basically is able to understand words with the same relationship in different languages.

A chart showing words mapped to vectors - Word2Vec - a means for Google to understand entities

So this here says one, two, three, four, five and these are the same numbers in Spanish. And you see they’re almost in the same spot, right? They almost have the same position.

And the same down here: you have a pig, cow, horse and in Spanish they almost have the same position.

Once you translate words into numbers, the concepts and what they represent become very clear, independently of what language you use. That’s highly, highly powerful.

At the same time, when you connect these vectors, you’re able to understand the relationships between them. And this works in different formats.

So really we’ve already had the language and country comparison, right? So these are all capitals of different countries.

And so for example Ottawa and Canada is the same as Moscow and Russia from a conceptual perspective. They’re both capitals of a country. The same goes for verb tenses and male-female relationships.

A chart showing the relationship between words as understood by natural language processing

And these are just a few examples of language.

And what that allows is that you can use words to put up basic calculations. So if you were to say:

King – a man + a woman = a Queen.

That all of a sudden becomes incredibly logical, right?
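As a rough illustration of that calculation, here is a toy Python sketch. The three-number vectors are invented by hand for this example (think of the dimensions as "royalty", "male" and "female"); real Word2Vec vectors are learned from text and have hundreds of dimensions.

```python
import math

# Hand-made toy vectors: [royalty, male, female]. Not real Word2Vec output.
VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c), ignoring the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, VECTORS[word]))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

With these toy numbers, taking "man" away from "king" and adding "woman" lands on the "queen" vector, which is the same idea as the slide, just on a tiny scale.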

‘Things not strings’

It really helps you not only to understand how many words are on a page it helps you to understand what they mean. And you can translate that back into understanding what users are actually trying to figure out and what the user is actually trying to know.

So you can imagine if you were to mention a graph you can think of entities like nodes and relationships like edges right?

And so when you were to Google “United States presidential candidates in 2012” you have for example: Mitt Romney, Ron Paul, Gary Johnson and of course they’re all Presidential candidates.

But then they’re all of course living people right? They’re not dead, like Abraham Lincoln or George Washington, they’re all part of the Republican Party.

And then there are other people that also have the last name Johnson – like Andrew Johnson or Lyndon B Johnson, who have both been US presidents.

But then for example Mitt Romney and Johnson have both been involved in Medicare right?

So it’s all these connections between entities that help Google to understand what they’re all about.
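The nodes-and-edges picture can be sketched as a small set of triples in Python. The facts below are a simplified subset of the example from the talk; the relation names (`candidate_in`, `involved_in`, `last_name`) are invented for this sketch and are not Google's.

```python
from collections import defaultdict

# (entity, relationship, entity) triples: entities are nodes, relationships are edges.
TRIPLES = [
    ("Mitt Romney", "candidate_in", "2012 US presidential election"),
    ("Ron Paul", "candidate_in", "2012 US presidential election"),
    ("Gary Johnson", "candidate_in", "2012 US presidential election"),
    ("Mitt Romney", "involved_in", "Medicare"),
    ("Lyndon B. Johnson", "involved_in", "Medicare"),
    ("Gary Johnson", "last_name", "Johnson"),
    ("Lyndon B. Johnson", "last_name", "Johnson"),
    ("Andrew Johnson", "last_name", "Johnson"),
]


def build_graph(triples):
    """Map each entity to the set of (relationship, other entity) edges leaving it."""
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add((relation, obj))
    return graph


def entities_with(graph, relation, obj):
    """All entities that have the given edge, sorted by name."""
    return sorted(node for node, edges in graph.items() if (relation, obj) in edges)


def shared_facts(graph, first, second):
    """Edges that two entities have in common."""
    return graph.get(first, set()) & graph.get(second, set())


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(entities_with(graph, "candidate_in", "2012 US presidential election"))
    print(shared_facts(graph, "Mitt Romney", "Lyndon B. Johnson"))
```

Every shared edge is one more hint about how two entities relate, which is the kind of signal that keyword counting cannot give you.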

And if you scale that up a billion times then that’s a really, really powerful understanding. It’s very different from the keyword density that we used to look at a couple of years ago – five to ten years ago when, I cannot write – you just had to stuff text with keywords.

And so this obviously is already being applied for example to a Knowledge Graph where Google shows you lots of different information about all sorts of words.

It’s also being shown in these carousels over here where Google is clearly able to understand that all these brands created products for project management software.

And it’s also being applied to other SERP features – that suddenly start to pop out of nowhere – meaning features shown in the search result pages like these passenger planes.

I was looking for Airbus A380 and Google is able to understand that all these are passenger planes as well.

But Google also announced publicly that they’re going to make that shift. That was in 2012, in an announcement from Amit Singhal, the Head of Search back then.

I’m not going to read the whole thing to you but he basically said – and this is the highlighted part at the bottom :

That’s why we’ve been working on an intelligent model – in geek-speak, a ‘graph’ – that understands real-world entities and their relationships to one another: things, not strings.
– Amit Singhal

And you’ve probably heard this ‘things not strings’ a couple of times in different blog articles and it’s exactly the shift that I’m trying to convey here.

It’s the shift away from just the amount of keywords and the text and more to a deeper understanding of what the text is actually about.

So we’ve seen a couple of updates this year – with a couple I mean almost one update per month – there have been lots of shifts. I’ve worked with lots of bigger brands and sites that lost a lot of traffic.

And I and a couple of other people in the industry think that they’re all interconnected. These are not updates that happen in isolation but they all play into a greater thing which is this kind of shift to entities and entity extraction.

And if you want to read a really good article about that then check out AJ Kohn’s Algorithm Analysis in the Age of Embeddings, and while I’m plugging AJ Kohn, also check out the work of Cindy Krum.

She has a beautiful five-article series about entities and entity understanding that really shows different applications of the concept in voice search, organic search, mobile search and all these kinds of things.

But AJ was spot-on: these updates have been natural language understanding updates, meaning Google refined its understanding of what language and content is actually all about.

10 tips to apply entity extraction

So with all that theoretical stuff let’s actually get to some actionables like what should you do with all of that stuff that I just threw at you?

That’s why I compiled 10 tips to apply the theory about entity extraction.

And the first one is to actually:

Build a brand and not just a site

Now that sounds very abstract but what it actually means is when you have a brand make sure that you give Google all the indicators that known and popular brands have in search and on the internet.

Create the social accounts; maybe do some AdWords spend; make sure that there are people related to your brand and company, as authors or as employees; if you have an app, link it to your site.

Obviously the domain is a strong indicator. Logo is a strong indicator. Location is a strong indicator. So don’t forget to have your company’s address somewhere on the site. I’ve seen that forgotten in the past as well.

And obviously you also want to become an expert for different topics. So let me speak about topics.

Tip number two is:

Create expert level content

Now we’ve heard that idea of ’10x content’ but I really want to home in on that because I think it’s really important.

At Atlassian what we did is we created so called micro sites. And a micro site is a set of pages that solely revolves around one single topic – and by the way we had these on sub-directories not on sub-domains as they’re often being used.

Make sure that they’re in a sub-directory because it’s easier for Google to recognise that they belong to the same domain and therefore to the same brand.

And these micro sites, as I said, all revolve around a single topic. Here you see ‘The Agile Coach’ which is all about the topic of agile and you’ll find everything from what is agile, to how to apply it, to the actual ceremonies all the little sub-details and everything that you know you actually need to know to become an actual agile pro.

We also have the git micro site which is all you need to know about the topic of git.

And this is one example of what the traffic looks like over time. It starts at basically the end of 2015 and runs until recently, and we got some nice boosts along the way as Google’s understanding of language got better and as we continued to build out the site.

So think of this expert level content, first of all of content that revolves around a single topic but also content that looks like a library right? Not a series of blog articles where it’s mixed with lots of different topics but as an enclosed kind of you know library.

So that, once you land there, you can figure everything out about the topic.

And then third tip is to:

Use Google’s natural language API

So I talked about this earlier. If you go to cloud.google.com/natural-language you can not only sign up for the API and test it out you can also test it live. On that landing page you can copy paste your text and see what entities Google finds in your content.

And I did that here with an article that I wrote and it found 461 entities. That’s what they all look like.

A screenshot of Google's natural language API reviewing text

I mean look at how many Google can already recognise. These are all concepts, places, people, brands and whatnot that Google already understands in text right?

So this is massive. You can also find out the sentiment of each entity – meaning do you write about it in a positive or a negative context? And what’s very important is the salience here.

So salience is a value from zero to one that shows you how important that entity is in the greater context of the article you’re writing.

So obviously something like backlinks, linking, CheiRank etc. are crucial points and concepts in the text that I wrote. And I want them to appear high up here with as high a salience as possible.

Then sometimes Google will also link to the Wikipedia article about that specific topic. And that’s a good sign: because Google relies so heavily on Wikipedia to understand these entities, it’s good when a lot of your entities also have a Wikipedia article. It means Google has a better understanding of the importance of the text – here, the one I wrote – for that specific topic.

A nice feature as well that’s often overlooked are the categories. It’s still the same landing page and still the same API – so here you can see whether Google really understands what categories your content fits into and what not.

So here it’s about web services and business services so that’s pretty much spot-on and Google shows a very high confidence for these two categories as well so that’s really good.

A screenshot of Google's natural language API reviewing text and its related categories

And then if you use that you know – if you create content on a smaller scale but wanna make sure it’s optimised – simply copy and paste your content into that NLP API and just tweak it a little bit.

Just look at what comes up you know and make sure that the most important concepts that you’re trying to convey are showing up as the most important entities in the report.
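That check can be sketched in Python against the entity list the API returns. The dictionary below only mimics the general shape of an entity-analysis response as I understand it (a list of entities, each with a name, a type, a salience and optional metadata such as a Wikipedia URL); the entity names and numbers are invented for illustration, and no real API call is made.

```python
# Invented sample data shaped like an entity-analysis response; not real API output.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "internal linking", "type": "OTHER", "salience": 0.41, "metadata": {}},
        {"name": "backlinks", "type": "OTHER", "salience": 0.22, "metadata": {}},
        {"name": "Google", "type": "ORGANIZATION", "salience": 0.08,
         "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Google"}},
        {"name": "newsletter", "type": "OTHER", "salience": 0.02, "metadata": {}},
    ]
}


def top_entities(response, limit=3):
    """Names of the most salient entities, highest salience first."""
    ranked = sorted(response["entities"], key=lambda e: e["salience"], reverse=True)
    return [entity["name"] for entity in ranked[:limit]]


def missing_targets(response, targets, limit=3):
    """Concepts you wanted to be prominent that are not among the top entities."""
    top = {name.lower() for name in top_entities(response, limit)}
    return [target for target in targets if target.lower() not in top]


def entities_with_wikipedia(response):
    """Entities for which a Wikipedia article was linked."""
    return [e["name"] for e in response["entities"]
            if e.get("metadata", {}).get("wikipedia_url")]


if __name__ == "__main__":
    print(top_entities(SAMPLE_RESPONSE))
    print(missing_targets(SAMPLE_RESPONSE, ["backlinks", "CheiRank"]))
    print(entities_with_wikipedia(SAMPLE_RESPONSE))
```

Anything returned by `missing_targets` (a hypothetical helper name) is a concept you meant to be central that did not come out as one of the most salient entities, so that is where to tweak the text.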

Number four is:

Structure your content

And this is something relatively basic that you learn when you get started in SEO but I just want to highlight it again.

Make sure that your content is easy to read, easy to understand, use lots of headlines, use things like tables. Google loves that stuff.

Lists – you know, bulleted or numbered, doesn’t matter – just make sure you make it easy to convey not just to a reader but also to a machine.

And the more structured your content is the easier it is for Google to understand how the different parts, paragraphs and words play together and how important they are.

Don’t forget your author.

There was a lot of chatter about E-A-T, meaning expertise, authoritativeness and trustworthiness, being an important factor now in SEO.

I’m not saying this is wrong but I’m saying if you’re an author that writes on several different sites and creates lots of content Google will understand that this author is an expert in that specific field and might – and this is an assumption here – might just you know give you a bonus if you have an author writing your content as well.

And let me stress this again this is an assumption I don’t have proof for that – but it makes a lot of sense when you think about the relationships of entities.

Yeah: title, description, related articles all the fun stuff!

Number five:

Reverse engineer optimal content through the search results pages

And this seems like a bit of a no-brainer when you think about it but has proven to be an efficient tactic to create good content over and over again.

What I mean by that is use the features that Google shows in the search results pages to understand what Google thinks is important right?

So when you have this ‘People also ask’ box and you write an article about ‘How to get upgrades with miles’, look at the questions that Google shows here.

You can – when you open a couple of these Google will expand and show you more questions – so make use of that you know? Use it for paragraphs for subheadings and make sure that you cover all the information that seems to be important here.

At the same time look at the Knowledge Graph integration, if there is one, and see what data Google shows. This is all super valuable and might just help you to become more important for that specific topic.

And then there are a couple of other things that you can use. Google Suggest is obviously a no-brainer but still very valuable. There’s ‘Searches related to’ your query, at the very bottom of the search results page; it used to be called ‘People also search’ or ‘People also look for’ and it’s now ‘Searches related to’, so that’s very interesting. Make sure you’ve got that on your radar.

And then in the Knowledge Graph you now have this ‘People also search for’ so get inspired by that.

It’s really easy to put together a content briefing by just looking at what the search results pages actually look like.

And then of course schema the heck out of this.

So you probably all know that you should use structured data and schema.org JSON-LD, but there’s one property that really stands out and that’s ‘sameAs’.

It helps you to say, you know in code basically, ‘Hey this organisation is the same as the organisation mentioned in this Wikipedia article right here’ or ‘These are the social profiles for this brand’ so make sure you include this nice neat little tag when you add structured data to your site because it really helps Google to understand the direct relationships between entities.
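As a minimal sketch of what that markup could look like, the Python below builds a schema.org Organization object with a `sameAs` list and prints it as a JSON-LD script tag. The company name and every URL are placeholders; swap in your own site, Wikipedia article and social profiles.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build schema.org Organization markup with sameAs links, as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap the JSON-LD in the script tag that goes into the page's HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Placeholder values for illustration only.
MARKUP = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    [
        "https://en.wikipedia.org/wiki/Example",
        "https://twitter.com/example",
    ],
)

if __name__ == "__main__":
    print(as_script_tag(MARKUP))
```

Each URL in `sameAs` says "this is the same organisation as the one described over there", which is the direct entity-to-entity link the tip is about.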

Optimise for the search journey

And what I mean by that is when you create content try to embrace where the searcher has come from and where he’s going to next.

So what I mean by that is:

Every query, every keyword that brings a visitor to your site has a past and a future – as in what are the steps that the searcher might have looked for previously? And what are the next ones?

And you can use that to show other articles at the bottom of yours or even mention it in the content – you know add breadcrumbs, navigation to your page make it as easy as possible for the user to get the full bandwidth of the topic within that area the query lies in that he was searching for.

And this is what I meant earlier with create a library of your topic and not just you know banging out different blog articles about different topics.

Satisfy intent or nothing

So nowadays if you don’t satisfy the intent directly of the user you will not rank.

This is already a reality and what that looks like is that Google has given us four basic intents: so finding information; finding a location; finding a brand; and buy something or you know purchase something.

And when we look at the search results pages we’re able to identify what Google wants to see right now as an intent. So for example ‘buy sneakers’ very obvious somebody wants to buy something or you know purchase sneakers obviously.

But when people ask questions this is obviously very informational intent. And when we add all of this together we’re able to identify all these different user intents depending on what search features there are.

I wrote a little article about that called ‘User Intent on Steroids’ where you know it’s easier for you to identify what the searcher is doing right now.
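One way to picture that idea is a small Python sketch that guesses the intent from the SERP features you see for a query. The feature names and the feature-to-intent mapping are assumptions made up for this example, not an official Google list, and the counting rule is deliberately crude.

```python
from collections import Counter

# Assumed mapping from SERP features to the four basic intents mentioned in the talk.
FEATURE_TO_INTENT = {
    "shopping_ads": "purchase",
    "local_pack": "location",
    "people_also_ask": "information",
    "featured_snippet": "information",
    "sitelinks": "brand",
}


def guess_intent(serp_features):
    """Return the intent suggested by most of the observed SERP features."""
    votes = Counter(
        FEATURE_TO_INTENT[feature]
        for feature in serp_features
        if feature in FEATURE_TO_INTENT
    )
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(guess_intent(["shopping_ads"]))
    print(guess_intent(["people_also_ask", "featured_snippet", "local_pack"]))
```

Because the features change over time, the same query can give a different answer on a different day, so a check like this is worth repeating.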

It sounds very simple but in lots of cases it’s not right. So imagine something like ‘Independence Day’. For most time of the year when you Google ‘Independence Day’ you will see lots of results about the movie Independence Day.

But closer to 4th of July you’ll see everything about the holiday right? So these intents and these entities they’re seasonal. Their intent changes over time.

Think of something like Apple, right? Which is now obviously a brand but which is also a fruit. So imagine back when it wasn’t as well known as it is today: you would have seen something very different, right?

So be aware of that and use the search results features to understand what Google is trying to see and what Google is trying to get here.

And the ninth tip is to

Look at the first five results that are ranking

And understand what they’re writing about and try to be better right? Not only should you cover and summarise the topic in your content.

And this is just an arbitrary number – you know, five results that I wanted to show here – but you should look at what they’re writing about; what their sub-topics are; how their content is structured.

Do it the same and do it better right?

Think about the information that might be missing from these articles. Because there’s obviously a reason that Google is ranking these so high.

And then the last tip is to

Avoid weak click-through rate

So for a long time we said, or a lot of people including myself said, that user signals are really important.

User signals are something like click-through rate or time on site but now we actually have proof.

So this is from a site that I’ve been working with in the health space – very big site – that was hit by one of these bigger updates, this one specifically in March. And when I went to the click-through rate in Search Console I saw that almost all of the keywords or queries that the site lost traffic for had a really, really weak click-through rate.

So if you’ve been affected by one of the big updates this year go back to Search Console and check how the click-through rate was before your site lost traffic and see if maybe a low click-through rate might have triggered the update.

I would bet that this is the case.

So you can regularly go into Search Console, see which queries show a low click-through rate and then adjust accordingly, optimise your snippet, optimise the content, see if you know you’re maybe not hitting the user intent or see if maybe the content is just old or poorly designed and make sure you optimise them.
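That routine check can be sketched in Python over exported query data. The rows below are invented sample numbers, and the thresholds (1% click-through rate, at least 100 impressions) are arbitrary assumptions to tune for your own site; nothing here calls Search Console itself.

```python
def low_ctr_queries(rows, max_ctr=0.01, min_impressions=100):
    """Return (query, ctr) pairs with enough impressions but a weak click-through rate."""
    flagged = []
    for row in rows:
        if row["impressions"] < min_impressions:
            continue  # too little data to judge this query
        ctr = row["clicks"] / row["impressions"]
        if ctr <= max_ctr:
            flagged.append((row["query"], round(ctr, 4)))
    return sorted(flagged, key=lambda item: item[1])


# Invented sample rows, shaped like a query export (query, clicks, impressions).
ROWS = [
    {"query": "what is agile", "clicks": 120, "impressions": 2000},
    {"query": "agile ceremonies", "clicks": 4, "impressions": 1000},
    {"query": "git rebase", "clicks": 0, "impressions": 50},
    {"query": "scrum board", "clicks": 5, "impressions": 1000},
]

if __name__ == "__main__":
    for query, ctr in low_ctr_queries(ROWS):
        print(f"{query}: {ctr:.2%}")
```

The flagged queries are the ones to look at first: rework the snippet, check whether the page really matches the intent, and refresh content that is old.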

So this is a little summary – I’m not going to go through all of these again – but I’ll give you a quick second to take a picture. Yep, if you want to find out more I wrote an article about that on my site Kevin-Indig.com. I write a weekly newsletter, it’s free, we write about all of this stuff… and thanks for your attention.


That was pretty awesome, no? Don’t forget to subscribe to the Optimisey YouTube channel and make sure you grab a free seat at the next event!
