Untangling Generative Search Myths in SEO | Dawn Anderson at Optimisey

After COVID put the brakes on the in-person Optimisey SEO events I assumed it’d be a temporary pause.

A couple of years and several lockdowns later, they were still on pause.

Several people were kind enough to ask me: “When are you bringing them back?” Some were even more insistent: “You should bring them back.”

One of those people was Dawn Anderson. So when she said that, whilst we were chatting at the UK Search Awards, I said: “I will… if you come and speak at one.”

Happily for me (and all those that were there to see her speak) Dawn said: “Yeah, OK.” – despite being based in Manchester, some way from Cambridge!

SEO, AEO, GEO or EI-EI-O?

Dawn is an SEO expert with not just the experience and skills to show for it but the qualifications too: degrees, a Master’s degree and more.

Shortly before her talk, the book she was commissioned to write, The AI SEO Playbook: Practical strategies for entity SEO, AI citations, and visibility in generative search, went on pre-sale.

All the smart folks that come to speak at #OptimiseyEvents know their stuff but Dawn takes that to another level.

In her talk she tackles lots of the ‘AI in SEO’ myths that are bandied around by the new raft of ‘experts’.

Get the good stuff first

Want to be among the first to get this good stuff? Here are your options:

  1. Attend the Optimisey events in person and get it all first hand (plus the chance to discuss it with the speakers and audience afterwards!)
  2. Sign up to the Optimisey newsletter
  3. Subscribe to the Optimisey YouTube channel

Now, on to Dawn’s talk. The video, slides and a full transcript are all below, with the usual caveat: any errors or typos are almost certainly mine (or the AI tool I used to help), not Dawn’s.


Video & Slides


Transcript

I’m going to be talking about AI SEO. Obviously, we have been bombarded these past few years since ChatGPT launched globally. It is a strange new world at the moment in the SEO world in particular, as in a lot of other worlds. We are being asked all kinds of strange questions suddenly; the senior leadership teams want to know why we’re not in ChatGPT and why we are not appearing for this prompt and that prompt and so on and so forth.

So I’m going to talk about some of the myths that are starting to appear because, as with anything new, you end up with lots of folklore starting to emerge. As I say, some of it’s going to get a bit geeky. Bear with me, and hopefully most of it will make sense.

The Rise of AI SEO Myths and Misinformation

If you’ve been in SEO for any period of time, you will know that we get these perpetual cycles of debate around 301 redirects versus 302 redirects, or subdomains versus subdirectories, or all these strange things. We get all these strange emails promising to instantly rank number one, claiming we have bad backlinks, or saying content doesn’t matter, and so on and so forth. We have got a misinformation superhighway which has suddenly gone into overdrive with the emergence of AI.

Now, you may not know some of this terminology if you’re not into the geeky stuff, but I’m going to hopefully clear some of that up for you. We have been a bit bombarded with some new terms. One of them is llms.txt. Another one you may have heard of is information gain, which has been doing the rounds for a couple of years now. Chunking is another big one that seems to be having quite a big impact in the SEO world.

Obviously, we have also got stories where people are being laid off sometimes because nobody needs SEO anymore since AI can do it all, or nobody needs a copywriter anymore because AI can do it all. As I say, I’m going to go over some of these.

Why SEO Myths Spread So Quickly

Why do SEO myths spread generally? Our industry is fairly big on mistruths; we have had a reputation in the past, unfortunately, for the whole snake oil salesperson stereotype. We are trying to lose that, but obviously, when AI SEO comes along, we go back ten steps with this.

Part of the reason is obviously rapid evolution. We’re in a really fast-moving learning curve at the moment with AI SEO, with lots of guesswork going on, more so than normal. SEO does not have that many definitive answers. It’s not like PPC where you can just switch it on and switch it off, bid X, and get Y. The words “it depends” are kind of our industry catchphrase.

If you ever go on LinkedIn or any other social media channel where there is quite a big SEO community, you will end up with these echo chambers and almost sycophantic groups where you end up with tribes of this opinion and that opinion. Right now, there is a whole debate over whether it is GEO, SEO, or AEO, leading to a kind of herd behavior.

Patents are another big one in SEO. Everybody waves a patent around and says, “Hey, look at this Google patent,” without necessarily realizing that patents are not always in production. Unfortunately, from the C-suite in our industry, there is a desire for simple levers, like wanting to know what is going to make this work tomorrow. SEO is not like that, but it doesn’t stop people from pushing for quick change. One of the biggest issues at the minute is ambiguous terminology and newer terms appearing in our industry that people don’t fully understand.

The Myth of Content Chunking in SEO

The first one is the myth of chunking. Chunking has several meanings, and you’re going to hear a lot about chunking over the coming years in the SEO world, because it is largely to do with how large language models handle text differently from the way search engines handle what they call information retrieval.

Chunking as a term has different meanings depending on which particular background of computer science you’re looking at it from. There is a notion of chunking in information theory and choice theory, such as the famous paper by George Miller that was based on the magic number seven. Basically, his concept was that if short-term memory has to hold more than about seven pieces of information at once, you just end up not being able to remember them.

In the world of information architecture, this theory of the magic number seven plus or minus two has an influence on things like the number of items on a navigation menu, making sure people don’t have too much choice. Chunking is also used in education in what they call the segmenting principle, which is about breaking pieces of education and learning up into small pieces.

Unfortunately, the SEO world seems to have latched onto this chunking notion generally and they are trying to apply it to normal search. In actual fact, chunking relates to chopping up pieces of text for large language models like ChatGPT, Gemini, Copilot, and so forth. The way AI works with search is that it has what they call a context window, meaning it can only look at so many things at a time because of the computational expense of it.

Why Writing for Machine Chunks Fails

What some people have started doing in the SEO world, with some tools even suggesting it, is breaking up the text on their page into really silly, awful formats like one sentence, then a space, then a sentence, then a space, or five words and a space. Search engines do not want us to break up the text on the page so that it is ridiculously difficult to read, looks stupid, or looks unnatural.

In fact, just a couple of days ago, there was a Google Search Central event in Toronto and one of the big things that Danny Sullivan from Google said is, “Hey, Google doesn’t want you to chunk your content up into tiny pieces.” Don’t do it. What is happening is search engines are trying to learn how to integrate the LLM world in with the normal search world, so they are trying to teach their system to understand what natural looks like.

All of a sudden, if you have cut your content up into these tiny chunks which are really difficult for humans to consume, and search engines suddenly start to get better at understanding what humans want, your content stands out like an old keyword-stuffed page. We don’t need to do these things.

Chunking is used in machine learning because of the token limits on language models, to make sure there is no loss of information. It enables what they call vector search, and there is also this notion of Retrieval-Augmented Generation (RAG), which is where large language models need to utilize external knowledge from search or another knowledge base to help them fill gaps in the data they were trained on. They train language models in a closed system which has a cutoff time, so the training data is never fully up to date. They have to take extra learning from current search using this method of Retrieval-Augmented Generation, and chunking is used there as well.
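To make the machine-learning sense of the word concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an editorial illustration, not something shown in Dawn’s talk, and it is only one of many possible chunking strategies. The function name `chunk_tokens` and its parameters are hypothetical, and whitespace-separated words stand in for real model tokens.

```python
def chunk_tokens(text, chunk_size=50, overlap=10):
    """Split text into overlapping windows of whitespace-separated words.

    Assumption: real pipelines count model tokens, not words. Splitting on
    whitespace keeps this sketch dependency-free.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    page_text = " ".join(f"word{i}" for i in range(12))
    for chunk in chunk_tokens(page_text, chunk_size=5, overlap=2):
        print(chunk)
```

Note where this happens: the retrieval system splits the text after it has fetched the page. The author of the page does not need to pre-chop anything, which fits the point that chunking supports retrieval and is not a writing style.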

Therefore, it supports knowledge retrieval rather than ranking; it has nothing to do with ranking, and I think that’s part of the problem. There are also many different types of machine learning chunking, so how do we know which one Google is using if we try to game the system? Well, we just don’t, so we shouldn’t bother.

Semantic Structuring vs. Machine Chunking

The narrative seems misunderstood. You’re going to hear a lot about chunking, but the bottom line is: don’t go there. You even see respected tools in our industry displaying examples of what they claim chunking looks like versus unchunked text.

That is not chunked content on that page; that is semantically organized, well-structured content, probably utilizing a classic system such as the inverted pyramid of journalism. That is just well-organized content, not cut into little tiny pieces. Do not confuse semantic structuring, which you should use, with chunking.

By that logic, Wikipedia has been chunking for decades, because Wikipedia does a great job of structuring reference materials. You should organize your content like Wikipedia in that regard. Google explicitly says don’t chunk your content, so keep everything natural.

Don’t forget the search engines are trying to emulate humans, so do not write for machines. Nowadays you even hear people saying we are writing for machines again, but you are not. If you are, you’re writing for machines that are trying to emulate humans, so always think of the human reader.

Focus on clear, readable, user-first content, strong topical authority, and very strong internal linking to send strong signals of what the content is. Make sure you’re satisfying user intent. Technical performance is not going away anytime soon, and all the foundations of good technical SEO matter as much as they always did, alongside good structured data, accessibility, and crawlability.

You were probably doing that already, but this is the point: AI SEO is not any different, it is just a nuanced evolution as we move forward. It is going to be hybrid, but the foundations of good SEO will get you 99% of the way there.

Is LLMs.txt a Legitimate Web Protocol?

Most of you have heard of robots.txt, which we use on a website to say to search engine crawlers either go here or don’t go here. Somebody came up with an idea for large language model bots called llms.txt, literally supposedly to be like robots.txt but for large language model crawlers.
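For readers who have not looked inside a robots.txt file, the sketch below shows the "go here or don’t go here" idea using Python’s standard-library `urllib.robotparser`. It is an editorial illustration, not part of the talk. The bot name `ExampleBot`, the rules and the `example.com` URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one group for a named crawler,
# one fallback group for every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /checkout/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # ExampleBot has its own group, so only /private/ is off limits to it.
    print(parser.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
    print(parser.can_fetch("ExampleBot", "https://example.com/checkout/"))     # True
    # Any other crawler falls back to the "*" group.
    print(parser.can_fetch("OtherBot", "https://example.com/checkout/"))       # False
```

A well-behaved crawler runs a check like this before fetching a URL. The file is a request that crawlers choose to honor and does not enforce anything by itself.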

In principle, it kind of sounds like a good idea, but in reality, you don’t just throw out a new internet protocol and expect the internet to accept it overnight. Things like robots.txt are commissioned and accepted by the Internet Engineering Task Force (IETF) over many years of debates, alongside the W3C foundation and various professors. They are the ones deciding whether something gets accepted as a web protocol, so you don’t just throw out an llms.txt idea and say, “Hey, let’s all accept this.”

The proposed file did get some traction, and people started putting the llms.txt file on their websites to signpost the best parts of a website to AI bot crawlers, effectively minimizing what resources LLM bots would need. However, there is a lot of confusion over its purpose, and it is certainly not for crawling or LLM control.

Mark Williams-Cook, based in Norwich, jokingly said that if somebody is going to have an llms.txt, he was going to create a cats.txt to see if he could get a standard for that going. His actually did just as well as llms.txt. The point is that the SEO world has gone a bit nuts jumping on all sorts of trends.

Chris Green, who is a super technical, brilliant specialist, decided to see whether AI bots actually crawl this file across millions of websites as an example. He found that actually no LLM bots are crawling the file, so they are not interested. Google has also stated that no AI system is interested in this either, so it is probably not worth bothering with.
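If you want to run a similar check on your own site, one rough approach is to count requests for /llms.txt in your server access logs, grouped by AI crawler. The sketch below is an editorial illustration and not Chris Green’s method. It assumes logs in the common "combined" format, the sample log lines are invented, and the list of user-agent tokens is illustrative and incomplete.

```python
import re
from collections import Counter

# Assumed combined log format:
# ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative user-agent tokens for AI crawlers; extend as needed.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_llms_txt_hits(lines, path="/llms.txt"):
    """Count requests for `path` made by user agents containing a known token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != path:
            continue
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
    return hits


if __name__ == "__main__":
    # Invented sample lines, for demonstration only.
    sample = [
        '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.6 - - [01/Jan/2026:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '203.0.113.7 - - [01/Jan/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120"',
    ]
    print(dict(count_llms_txt_hits(sample)))
```

User-agent strings can be spoofed, so treat the counts as indicative only. A count of zero over a long period would be consistent with the finding described above.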

Google keeps saying, “Don’t bother,” but then Google’s own developers added it to their own websites. Google’s John Mueller was saying nobody should bother with llms.txt, and then Crystal Carter from Wix spotted that all the Google developer websites had added it. Lo and behold, the next day they all disappeared. There is some truth to the idea that Google doesn’t give SEO advice to their own internal development teams; they don’t even speak to them, so maybe they just got caught up in the SEO hype and thought that’s what you’re supposed to do.

Agentic Search and the Future of AI Protocols

At the same time, there is another argument for this file because we’re entering a world of agentic search. This is where you won’t just have standard ChatGPT queries; you’ll have agents where you can ask it to go and book tickets to a concert, specify what kind of seats or restaurant table you want, and it just goes off and does it. That is the future that Google is talking about going forward.

Andrea Volpini, who is super clever and deeply into semantic search, had a really good thought that llms.txt isn’t for discovery, it is actually for these automated agents to control agentic search using structured data. It is all kind of just starting to emerge now.

There is a lot going on in this agentic world, including the Universal Commerce Protocol and a collaboration between Shopify and OpenAI called an agentic commerce protocol. This is where eventually you will be able to just go to ChatGPT and check out instead of visiting a separate website. This involves a collaboration between Google, Shopify, and various LLM tech companies, meaning protocols like llms.txt might impact more on the agentic side of things. Much of the narrative, however, is driven by AI fear and blog exaggeration making claims and causing confusion.

Demystifying Information Gain

The next topic is information gain. It all centers around a single patent that somebody jumped on, which basically says if you’ve got more information than your competitor, we’re going to rank you higher. That’s the bottom line to it, but it is just one relatively old patent.

Terminology in one sector of computer science can have a completely different meaning elsewhere, and in reality, information gain in the machine learning world means nothing like that patent. In the content marketing world, people are using it to peddle this notion of adding more value, which makes perfect sense, but a patent being written doesn’t necessarily mean it has been deployed.

The bottom line is it looks similar to what we would call skyscraper SEO, where you see what your competitors are doing, do something better, and add a bit more value. But again, you were probably already doing that anyway. Who doesn’t look at what competitors are doing and say, “We’re going to do something better than that?”

In actual fact, information gain is much bigger than just that one patent. It is based on a paper called A Mathematical Theory of Communication by Claude Shannon, which has tens of thousands of citations and is huge in the world of machine learning and AI search using decision tree classification.

Basically, it is used to predict how pure a split is in data. Classification modeling is widespread in search engines; for instance, Gmail uses it as a classifier to decide if an email looks like spam. Information gain does what they call the quantification of reduction in entropy or uncertainty.
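As a rough illustration of information gain in this classification sense, the sketch below computes Shannon entropy for a set of labels and the reduction in entropy after a split. It is an editorial example, not taken from the talk, and the spam/ham labels and the function names `entropy` and `information_gain` are made up for it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure group, 1 for a 50/50 two-class mix."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    emails = ["spam"] * 4 + ["ham"] * 4
    # A split that separates the classes perfectly removes all uncertainty.
    clean_split = [["spam"] * 4, ["ham"] * 4]
    # A split that leaves each group as mixed as before tells us nothing.
    useless_split = [["spam", "spam", "ham", "ham"], ["spam", "spam", "ham", "ham"]]
    print(information_gain(emails, clean_split))
    print(information_gain(emails, useless_split))
```

A decision tree picks the split with the highest gain at each step. That is the "purity" being measured, and it is not about whether one web page says more than a competitor’s.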

It is possibly used in crawling to know if a crawler should continue down a path, where the value ends, or to determine content duplication by deciding where the purity is. Claude the LLM was actually named after Claude Shannon because his work is that significant, but it is not what SEOs are saying it is; it is about purity in classification.

Information gain gets misread due to the worshipping of patents under the assumption that they are automatically in production, but they are not designed for easily measurable factors. A lot of the time, patents are published just as defensive measures to protect things and stop a competitor from creating something you think they might make, and the majority of patents never actually ship.

Does Google Penalize AI Content?

Another prevalent SEO myth is that AI content is automatically penalized or automatically low quality, which is not true. However, if you scale AI content and just churn out a load of copy-and-paste stuff from ChatGPT and throw it out there, you run into an emerging trend called “Mount AI.” This is where visibility looks like it’s going to do really well for a few months and then it just falls off a cliff.

I think that is the period of time it takes for Google to return a few times, build a picture, and realize what’s going on. That penalty happens because you’re doing it at scale without thinking about quality, not simply because it is AI.

Studies show that if you use AI well for ideation, you can do really well with it. Google has said it won’t cause an automatic penalty; it’s not the AI itself that causes it. Automation has been used for many years already for sports scores, predictions, or travel sites summarizing user reviews. You can do great stuff with it, but it depends entirely on how you utilize it. Every e-commerce site uses automated content and has for years because they are dynamically driven templates.

The Future of Hybrid Search and AI Overviews

We’re starting to realize that AI isn’t completely taking over SEO; rather, we are learning to live alongside it. It is a really steep learning curve. Tom Capper from Moz gave a talk questioning if we have passed “peak AI Overviews,” but it turns out we haven’t. There are actually more AI Overviews appearing at the top of search results now, so we are having to learn to live with it.

The classic claim that SEO is dead is false, but we are definitely in a hybrid state. The industry cannot even agree on a name yet, debating terms like GEO, AIO, AEO, and ASO on LinkedIn. It is a steep learning curve and we have a lot to learn as we go along.

Focus on the Core Foundations of Good SEO

You should always just revert back to the key things: be helpful in your content, and focus heavily on originality. Google recently discussed the difference between “commodity content” and “non-commodity content.” Commodity content is generic stuff that anybody could create, like “10 tips for this” or “5 tips for that,” which a scraper or an AI could easily churn out.

The real value is in non-commodity content that uses your own data, surveys, human experience, and interviews—things that ChatGPT cannot necessarily just chuck out. You must also maintain entity clarity by making sure your messaging is consistent everywhere you are found on the internet.

Crawl efficiency, proper sitemaps for discoverability, and content freshness remain highly important. Fabrice Canel from Bing always talks about continuously revisiting your content to bring it up to date. Apparently, there is a 60% higher chance that you’re going to get cited in either Google’s AI Overviews or LLMs like ChatGPT or Perplexity if your content is fresh versus old content. Finally, search is complex, meaning critical thinking and the disambiguation of terminology are increasingly critical.


Thanks Dawn! Never have so many myths been blasted in such a short amount of time!

You can, of course, buy her book, and if you enjoyed this sort of thing and would like more, you should subscribe to the Optimisey YouTube channel too!

