Since I do a lot of site audit work, I often have clients with extensive volumes of old content, typically blog posts, with some “news” articles in the mix. With Google’s Panda algorithm focused on on-site quality considerations, many clients assume they have to kill all the old content, while many others who don’t keep up with SEO don’t know whether it’s hurting them or not, and it’s up to me to tell them what to do with that content.
Earlier today, Barry Schwartz posted an article over on Search Engine Roundtable entitled “Google: Panda Victims Don’t Necessarily Need To Delete Old Blog Posts”.
In that article, Barry covered a discussion where Marie Haynes (one of the industry’s leading link cleanup experts) asked Google’s John Mueller whether thousands of old, rarely read blog posts might harm a site specifically because of Panda.
NO SINGLE ANSWER
John responded by essentially saying that it’s not an absolute one way or the other. That’s an answer I give to clients way too often in response to questions across the entire spectrum of SEO. It’s just the nature of how complex the web is, and how complex search algorithms and the multidimensional considerations around quality have become.
What it comes down to is what I communicated in my most recent presentation on the Philosophy of SEO Audits at Pubcon in Las Vegas:
SEO is Google’s algorithmic attempt to emulate user experience
Summing up what John initially said about reasons not to kill off the old content:
There could be valid reasons to keep old content archives, and Google does their best to recognize where a site might have a lot of old content — but as long as the main focus of the site was high quality, they take that into consideration when they can as well.
He did, however, clearly communicate at least one scenario where you can’t ignore that content:
But sometimes it might also be that these old blog posts are really low quality and kind of bad, then that’s something you would take action on.
NOT CLEAR ENOUGH FOR EVERYONE
To many people this still leaves the question wide open for interpretation. Since John didn’t say “always do this” or “Don’t take a risk; when in doubt, delete it”, people are left guessing way too often. The “yeah but…” response kicks in. Or worse, they just lump that type of response in with all the other reasons out there why Google shouldn’t be trusted.
CRITICAL THINKING REQUIRED
Personally, I don’t see it as an insurmountable issue to figure out.
Yes, there are always at least some exceptions to the concept of a standard “best practices” approach here. Yet most of the time, even with variables in place, it’s a straightforward decision-making process.
THE QUALITY SCALE CONSIDERATION
Look at the signals that content is sending, and its scale compared to quality content. Whether it’s old or new is less relevant than the quality signals, especially at scale.
Individual page quality scores also need to be considered in relation to the totality of scores within an individual section of the site. If the overwhelming majority of content in a given section is strong, that can outweigh the negative signals from those pages within that section that are low quality.
The same applies to the entire site. If enough quality exists at scale across the entire site, that low quality portion is less harmful overall.
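To make the section-versus-site idea concrete, here is a minimal sketch of how you might tally it during an audit. Everything in it is an assumption for illustration: the per-page scores come from whatever rubric you apply yourself (they are not a Google metric), the 0.6 cutoff is arbitrary, and "section" is simply taken to be the first folder in the URL path.

```python
from collections import defaultdict

# Assumed cutoff for "good enough"; set this to your own quality standard.
QUALITY_BAR = 0.6


def section_of(path):
    """Return the first path segment, e.g. '/blog/old-post' -> 'blog'."""
    parts = path.strip("/").split("/")
    return parts[0] if parts[0] else "(root)"


def low_quality_share(pages, bar=QUALITY_BAR):
    """Summarize how much content falls below the bar.

    pages: iterable of (url_path, score) pairs, with score between 0.0 and 1.0
           taken from your own audit rubric (hypothetical input).
    Returns ({section: share_below_bar}, site_wide_share_below_bar).
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for path, score in pages:
        section = section_of(path)
        totals[section] += 1
        if score < bar:
            low[section] += 1
    per_section = {s: low[s] / totals[s] for s in totals}
    site_share = sum(low.values()) / sum(totals.values()) if totals else 0.0
    return per_section, site_share
```

A section where most pages fall below the bar is the one to look at first; a small share of weak pages spread across an otherwise strong site is much less of a worry.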
More often, the tough calls are the borderline cases: mediocre content that may or may not be a problem overall, but where it’s “likely” to be weighing down the site.
WHAT MAKES IT LOW QUALITY
Google has done a lot over the past few years to try to communicate what makes something low quality. Though to be honest, they have done so in very generic “what would a user think” terms.
From my audit work, though, several patterns have emerged that fit that notion.
- Page Speed
- Topical Confusion / duplication
- Topical association with your main message / goals
- Usefulness to Users (intuitiveness of access, helpfulness)
Those are all part of my five super signals.
WHEN IN DOUBT — FOLLOW BEST PRACTICES
From a best practices perspective the answer is simple — set a single high standard for quality, uniqueness, and relevance. If you do so, authority and trust will be an outgrowth of that effort.
Anything below that standard gets killed off. That way you don’t have to guess about whether it may or may not be hurting your site.
CRAWL EFFICIENCY CONSIDERATION
One other reason to slash and burn it, even if it’s borderline, applies if you have a big site: that old or low quality content doesn’t get crawled as frequently as newer or higher quality content, yet it gets crawled at some point. And that works against you from a crawl efficiency perspective.
If at any point Google is crawling that mass volume of “borderline” content, their system may very well abandon the crawl. It’s well known that their systems abandon crawls all the time, and more so the bigger the site being crawled. So why force them to crawl questionable content and, as a result, have other content that might be newer skipped? That’s crazy.
404 or 410 — WHICH IS BETTER?
One last recommendation: I’ve found that if you set those no-longer-existing pages to a 410 “Gone” server status, that sometimes helps speed up the pace at which Google removes those pages from their index. It’s an unequivocal signal. 404s can sometimes be caused by unintentional mistakes, whereas a 410 says the page was removed on purpose.
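As a rough illustration of that distinction, here is a small sketch of the decision a server makes for each request. The path lists are made up for the example; on a real site they would come from your CMS or from the removal list produced by the audit, and the actual response would be wired up in your web server or framework.

```python
from http import HTTPStatus

# Hypothetical path sets, for illustration only.
LIVE_PATHS = {"/", "/blog/panda-recovery-guide"}
RETIRED_PATHS = {"/blog/2009-link-roundup", "/news/old-press-release"}


def status_for(path, live=LIVE_PATHS, retired=RETIRED_PATHS):
    """Pick the HTTP status for a requested path."""
    if path in live:
        return HTTPStatus.OK
    if path in retired:
        # Content that was deliberately killed off: answer 410 Gone.
        return HTTPStatus.GONE
    # Anything else (typo, broken link, never existed): 404 Not Found.
    return HTTPStatus.NOT_FOUND
```

The point of keeping an explicit list of retired URLs is that the 410 is only ever sent for pages you chose to remove, so an accidental 404 elsewhere stays distinguishable from an intentional removal.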