
Google — The All-Knowing Decider of Indexing
Conflicts Within Site Signals Muddy The Process
Crawled Not Indexed
The “Noindex,Follow” Way to Wealth, Fame and Lost Value
But Wait — There’s More to It!
URL Parameters, Sitemap XML Files, and Inbound Links
Whenever discussing the crawl and indexation decision process, it’s important to also mention that URL parameters, when set to “representative URL” or “Let Googlebot Decide”, can also muck up that decision process. The same is true for inclusion in sitemap XML files and when enough high value inbound links exist. All of these can influence, to varying degrees, how and what Google crawls and ultimately indexes, in spite of robots settings or canonical tags.
_______________________________________
The SEO Indexing Bottom Line — Consistency
Okay that wasn’t the bottom line. It was just the last section label in this post. The best recommendation I have is one I repeat often in my audit work — Never leave it to Google to “figure it all out” when you have the ability to control, through consistency of signals, what you want their systems to do and how you want their systems to behave regarding your site.
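To make "consistency of signals" a bit more concrete, here is a minimal sketch of the kind of check I mean. It is not a real audit tool and does not reflect how Google actually weighs anything. The `PageSignals` structure, the `find_conflicts` function, and the four rules are all hypothetical, chosen only to illustrate signals that contradict each other on a single URL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageSignals:
    """Hypothetical container for the signals one URL sends (illustration only)."""
    url: str
    meta_robots: str = ""              # e.g. "noindex,follow"
    canonical: Optional[str] = None    # href of rel="canonical", if present
    in_sitemap: bool = False           # URL is listed in a sitemap XML file
    blocked_by_robots_txt: bool = False


def find_conflicts(page: PageSignals) -> List[str]:
    """Return human-readable descriptions of signals that contradict each other."""
    conflicts = []
    directives = {d.strip().lower() for d in page.meta_robots.split(",") if d.strip()}
    noindex = "noindex" in directives
    canonical_elsewhere = page.canonical is not None and page.canonical != page.url

    # A sitemap entry says "index this"; noindex says the opposite.
    if noindex and page.in_sitemap:
        conflicts.append("noindex page is listed in the XML sitemap")
    # A sitemap entry for a URL that names a different canonical is a mixed message.
    if canonical_elsewhere and page.in_sitemap:
        conflicts.append("sitemap lists a URL that canonicalizes elsewhere")
    # "Drop this page" and "treat this page as that other page" at the same time.
    if noindex and canonical_elsewhere:
        conflicts.append("noindex combined with a canonical pointing to another URL")
    # A crawler that is blocked from the page never gets to read its meta robots tag.
    if page.blocked_by_robots_txt and noindex:
        conflicts.append("robots.txt blocks crawling, so the noindex tag cannot be read")
    return conflicts


# Example with made-up values: a parameterized URL sending three conflicting signals.
page = PageSignals(
    url="https://example.com/shoes?color=red",
    meta_robots="noindex,follow",
    canonical="https://example.com/shoes",
    in_sitemap=True,
)
for issue in find_conflicts(page):
    print(issue)
```

In a real audit the inputs would come from a crawl, and the list of rules would be much longer. The point is only that each conflict is something you control and can remove before Google has to guess.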
It’s interesting to hear your comment about canonicals. Canonicals are something still new to me (only because I choose to focus my efforts on broader topics such as audits and consulting). But you’re the first authority I’ve heard mention Google does not truly respect canonicals.
But come to think of it, that has to be true.
Google has guidelines and recommendations, but that’s it. There’s no sure-fire way of performing many tasks when it comes to indexing a website properly. Google uses custom software, not human beings, to index websites.
A simple post, Alan, but very valuable to me. Thank you.
JL,
Google first confirmed it’s a hint, officially, in their own documentation many years ago. Here’s an entry from Google’s webmaster blog going back to 2009.
___________________________
Is rel=“canonical” a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.
___________________________
So it’s just a matter of knowing where to look, and of paying attention to as much of what they communicate as is reasonable. It’s also something I know from the many audits I’ve done where Google has completely ignored mass volumes of canonicals because other signals confused them.
Quote reference from:
https://webmasters.googleblog.com/2009/02/specify-your-canonical.html
Then is it safe to say that, for anything a webmaster has control over, Google will treat their directives as only recommendations or “hints”, and can choose a different final outcome?
Because it would not be in their best interest as a business to allow a person outside of their company to have control over how they display content.
Have you come across an absolute process for anything within search engine marketing, where you can predict the outcome of an action taken by an SEO every time?
Essentially, that is the situation we face. And that’s exactly why I always recommend that people be consistent in their signals wherever one signal can confirm or conflict with other signals.
As for predicting the outcome of any one thing, in every case, it’s tricky. There are “most of the time” scenarios, and “some of the time” scenarios for just about any one thing that can be done.
I wholeheartedly agree on how Google treats robots.txt, meta robots tags, x‑robots tags and canonicals, Alan, as well as their reasons for failing to respect them when faced with conflicting or confusing signals. I’ve seen those conflicts cause serious crawl budget issues on sites with only a few thousand pages… I can only imagine how much worse it could become for a huge ecomm site. Great post!
Thanks Doc. Yeah when it’s a half million pages that have conflicts, that takes a toll all over the place.