A word or 300 about using “noindex,follow” and other indexing signal factors…
Questions have been asked by people in the SEO community steadily over the years about using "noindex,follow", or canonical tags pointing to other pages, as ways to get Google to act a certain way. It's a subject that's come up from time to time recently in a group I'm an admin of over on Facebook — "Dumb SEO Questions".
In providing some insight there today, I thought the topic was strong enough to deserve a blog post of its own.
Note this is my perspective and experience on those lines of thinking. It doesn't mean what I convey here is absolute, set in stone, or applicable to every site and every situation. With all things SEO, there are edge case scenarios where something else may be true. So take what I offer here and do with it what you will.
Google — The All-Knowing Decider of Indexing
Google, being the all-wise cataloger of the web (in their view), does not truly respect robots.txt rules, meta robots tags, X-Robots-Tag headers, or canonical tags, in spite of each of those having the purpose of being a directive. Google's programmers, in their view of the world, consider each such signal only as a "hint" as to what site owners intend.
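For clarity, here's roughly what each of those four signals looks like in practice (illustrative snippets only, with example.com as a placeholder):

```text
# robots.txt (a crawl directive, served at the site root)
User-agent: *
Disallow: /private/

<!-- meta robots tag, placed in a page's <head> -->
<meta name="robots" content="noindex">

# X-Robots-Tag, sent as an HTTP response header (useful for PDFs and other non-HTML files)
X-Robots-Tag: noindex

<!-- canonical tag, placed in a page's <head>, pointing at the preferred URL -->
<link rel="canonical" href="https://example.com/preferred-page/">
```

Each of these was designed as an instruction, yet Google's systems weigh them against other signals rather than obeying them unconditionally.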
Conflicts Within Site Signals Muddy The Process
Of course, many sites inject conflicts across that range of signals, which is the altruistic reason Google decided long ago to only take them as hints. The trouble is that the imperfect algorithmic process quite often makes a poor determination as to what the system "should" do, rather than doing what those signals convey.
Because of all of that, even including a URL pattern in robots.txt, canonicalizing to a different URL, or having a meta robots noindex status can sometimes fail to prevent Google from crawling some URLs, and can even fail to keep them out of the index.
Crawled Not Indexed
"Most" of the time (a relative concept), if a URL is blocked in the robots.txt file, Google will still list that URL in search results when other signals point to it, yet they will have at least honored the spirit of the robots.txt file by not crawling and indexing the content of those URLs. It's pretty insane sometimes.
The “Noindex,Follow” Way to Wealth, Fame and Lost Value
As for "noindex,follow", there's never a valid reason to use that combination. Sure, Google will pass value THROUGH those pages, initially. If the noindex status remains long enough, though, Google will eventually stop passing value through them — they'll end up being removed from the Google process entirely.
John Mueller confirmed this in a Webmaster Hangout — Barry Schwartz shared the video over on Search Engine Roundtable.
And from the perspective of consistency of signals, if a page deserves "noindex" status, it's best to use "nofollow" as well, so as to convey, in one consistent statement, what you do and don't want crawled and indexed. For larger sites this is even more important because of crawl budget considerations, where consistent signals become integral to having Google's formulaic processing understand your site correctly.
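If it helps, the consistent version of that signal pair is a single meta tag, or the equivalent HTTP header for non-HTML files (this is just an illustrative snippet):

```text
<!-- in the <head> of a page you want fully excluded -->
<meta name="robots" content="noindex,nofollow">

# equivalent HTTP response header form
X-Robots-Tag: noindex, nofollow
```

Note the page still has to be crawlable (not blocked in robots.txt) for Google to see the tag at all.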
But Wait — There’s More to It!
Then there's the notion that other signals get factored into all of this as well. If enough pages are deemed, by Google's systems, to be unworthy of indexing for other reasons (too much duplication, not enough unique value, not enough trust, and so on), those pages won't always be indexed in spite of other signals. Or they may be indexed, yet not helpful. In fact, Google will sometimes index pages that don't ultimately deserve to be indexed, and that alone can weaken the value of pages that do deserve to be indexed.
URL Parameters, Sitemap XML Files, and Inbound Links
Whenever discussing the crawl and indexation decision process, it's important to also mention that URL parameters, when set to "representative URL" or "Let Googlebot Decide", can also muck up that decision process. The same is true for inclusion in XML sitemap files and for having enough high-value inbound links. All of these can influence, to varying degrees, how and what Google crawls and ultimately indexes, in spite of robots settings or canonical tags.
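As one concrete illustration, even a minimal XML sitemap entry acts as a "please crawl and index this" signal, which can conflict with a noindex tag on the same URL (placeholder URL shown):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/some-page/</loc>
  </url>
</urlset>
```

When a URL shows up here while its page says noindex, Google is left to resolve the contradiction on its own.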
The SEO Indexing Bottom Line — Consistency
Okay, that wasn't the bottom line. It was just the last section label in this post. The best recommendation I have is one I repeat often in my audit work: never leave it to Google to "figure it all out" when you have the ability to control, through consistency of signals, what you want their systems to do and how you want them to behave regarding your site.