While the actual numbers are different on each site, all too often I see sites where the volume of indexed pages is many times greater than the total number of products offered. For Ecommerce sites, it gets out of control and becomes a crisis situation.
Actual SKU Count: 9,000
Indexed Pages: 45,000
Actual SKU Count: 25,000
Indexed Pages: 350,000
Actual SKU Count: 750,000
Indexed Pages: 9,000,000
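To put those three pairs in perspective, here is a minimal sketch that computes how many indexed pages exist per actual product. The function name `index_bloat_ratio` is only an illustrative label; the figures are the ones listed above.

```python
# (SKU count, indexed pages) pairs quoted above
sites = [(9_000, 45_000), (25_000, 350_000), (750_000, 9_000_000)]


def index_bloat_ratio(sku_count, indexed_pages):
    """Return how many indexed pages exist per actual product."""
    return indexed_pages / sku_count


for skus, pages in sites:
    ratio = index_bloat_ratio(skus, pages)
    print(f"{skus:>9,} SKUs -> {pages:>11,} indexed pages ({ratio:.0f}x)")
```

Running the same calculation against your own SKU count and indexed page count is a quick way to see how far the index has drifted from the catalog.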
By understanding how this happens, why it happens, and what to do about it, you can, if you proceed with caution, dramatically reduce the indexed page count while simultaneously improving overall visibility in search engines for the pages you care most about. That, in turn, can significantly increase revenue.
A Word of Caution
Warning – every site is unique. Each situation is different. While the concepts presented here are based on real world experience repeated many times across several different sites in different industries, there are countless reasons this is not a guaranteed outcome.
An inability to implement technical changes is often at the top of the list of reasons this cannot be achieved. The complexity of the product / SKU catalog and the expectations of the customer base can be another major barrier.
Another critical consideration is that I do not believe in isolated changes to a site from an SEO perspective at scale. If a site has many weaknesses, making any one significant change may help, yet it may not be enough to score the big win. There’s no simple answer as to where that threshold of “we did enough, across enough areas” exists for any single site.
When the wins happen though, they can be big. Very big!
In this example, from an audit I performed this spring, where the work took weeks to implement and then a couple of months to stabilize, there was a dramatic reduction in total pages indexed based on my recommendations for cleaning up categorization and faceted navigation, along with a dramatic increase (over 20%) in revenue.
Several changes were made in late spring, and the big “index cleanup” steps in this example took place in June of 2016, with additional improvements made after that as well.
A Quick Primer
Before I dive into the ‘how to get there’, it’s important that people just getting started in SEO, or people not yet beyond an intermediate understanding, get a short primer in core concepts.
If you’re already deeply advanced, read this anyway. We can never become complacent in our understanding.
Search Engines Attempt to Emulate User Experience through Formulaic Methods
Crawl Efficiency Red Flag:
If a human being ends up having to spend too much time trying to find what they want or need, it is likely that search engine crawl budget will be exhausted in the same way, and your most important content will become lost in that process. End result – SEO suffers.
Sensation Overload Red Flag:
If a human being is confused or overwhelmed, it is most likely search engine algorithms will be formulaically overwhelmed.
Too many sites I audit have content organizational overload that becomes toxic at scale. End result – SEO suffers.
Let’s Dive In – Proper Planning Prevents Poor Performance
If you don’t take the time to plan things out, the results you end up with could be a death knell to your business.
The concepts I present here will hopefully give you enough understanding for you to be able to evaluate your own site situation and then plan for the cleanup, accordingly.
The Ecommerce Organizational Example
9,000 products – different organizational scenarios:
If you break it out into 30 categories, with no subcategories, and 300 products in each category, that is “going too wide and not deep enough” for both human experience and SEO.
If you have 3 categories, 300 subcategories in each, and 10 products in each of those, that is equally problematic because it goes too wide at the subcategory level and too thin within each of those.
If you have 10 categories, and 10 subcategories in each, with 90 products in each subcategory, that is one example of a reasonable site structure.
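The arithmetic behind those three scenarios can be checked with a short sketch. It assumes the second scenario means 300 subcategories within each of the 3 categories, since that is the reading that totals 9,000 products. The helper name `total_products` is illustrative.

```python
from math import prod


def total_products(*levels):
    """Multiply the fan-out at each level to get the total product count."""
    return prod(levels)


# Fan-out per level for each scenario, from top of the hierarchy down
layouts = {
    "30 categories x 300 products": (30, 300),
    "3 categories x 300 subcategories x 10 products": (3, 300, 10),
    "10 categories x 10 subcategories x 90 products": (10, 10, 90),
}

for name, levels in layouts.items():
    print(f"{name}: {total_products(*levels):,} products, "
          f"largest single level = {max(levels)}")
```

All three layouts hold the same 9,000 products; what differs is how many choices a visitor or crawler faces at any single level.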
Exceptions to the Rule
If you have some categories that contain more subcategories than others, that can be perfectly valid. If some subcategories have more products than others, that too, can be perfectly valid.
Sometimes you really do have a need for an extremely large volume of main categories or within any of those, a large volume of subcategories. Or within a single subcategory, a large volume of products.
Narrowing Opportunities Can Wreak Havoc
Another usability “feature” that can get out of control is “faceted” navigation – the ability for visitors to sort, group or refine any given category or subcategory even further.
- Sort by price (highest to lowest, lowest to highest)
- Refine by price group (products under $x, products in a $x to $y price range, etc.)
- Refine by color / size / brand / ______
- Newest Products
- Best Sellers / Most Popular
These are all valid ways, when it makes sense to offer them, to help visitors look for a narrow set of products to help meet their unique situation, goal or needs.
Except you can take that too far as well, and overwhelm the senses, end up causing duplicate content confusion, and severely weaken crawl efficiency.
Human Users vs. Search Engines as Users
One of the most crucial considerations in this process is that a particular refinement of product grouping may be beneficial to potential customers, yet overwhelm search engines and thus not be helpful when it’s indexed.
Sometimes people, once they arrive at a site, want to do things that, from an SEO perspective, don’t warrant indexation. Remembering that as you make decisions is critical.
Product Grouping & Filter Narrowing Organization Decision Process
In order to determine exactly what silos should exist, and in what combination, it will be vital for you to take the time, up front, to evaluate organizational criteria based on:
- Total number of products at each level
- Overall search volume
- Overall Revenue Value
- Refined Profit Margin Value
If this step is taken, it will help you make decisions about whether some subcategories might not be worth keeping in the new system. Some might not have enough products on their own (very thin pages).
Others might not have enough search volume to bother with (low value based on inactivity). Others still may be loss leaders, or have too low a profit margin to be worth keeping as individual pages. Consolidating those products or subcategories into other, more valuable pages can help bolster those pages.
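One way to make that evaluation repeatable is to score each subcategory against explicit thresholds. The sketch below is only an illustration: the `Subcategory` fields, the threshold values, and the sample names are hypothetical and should be replaced with numbers drawn from your own data. Revenue could be added as a further check in the same way.

```python
from dataclasses import dataclass


@dataclass
class Subcategory:
    name: str
    product_count: int
    monthly_searches: int
    profit_margin: float  # fraction between 0.0 and 1.0


# Hypothetical thresholds; tune these against your own catalog and market.
MIN_PRODUCTS = 10
MIN_MONTHLY_SEARCHES = 100
MIN_PROFIT_MARGIN = 0.10


def consolidation_reasons(sub):
    """Return the reasons a subcategory is a candidate for consolidation."""
    reasons = []
    if sub.product_count < MIN_PRODUCTS:
        reasons.append("thin page")
    if sub.monthly_searches < MIN_MONTHLY_SEARCHES:
        reasons.append("low search volume")
    if sub.profit_margin < MIN_PROFIT_MARGIN:
        reasons.append("low profit margin")
    return reasons


# Hypothetical sample rows
for sub in [
    Subcategory("summer dresses", 90, 5_000, 0.35),
    Subcategory("striped beach wraps", 4, 20, 0.30),
]:
    print(sub.name, "->", consolidation_reasons(sub) or "keep")
```

An empty list means the subcategory passes every check; anything else is a prompt for a human decision, not an automatic deletion.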
Product Group Filter / Refinement Functionality
When you present visitors with the ability to narrow a particular category or subcategory result by various criteria, it can help those visitors bypass the need to look through all of your products when they want a particular highly refined sub-set.
As mentioned earlier in this article, by giving visitors the ability to sort by price, new products, popularity, color or other “facets”, you can help them find what they are looking for sooner. Yet that can become toxic as well.
Too Many Filters End Up Becoming Toxic to Visitors & SEO
If I go to a page on a site and I am overwhelmed by the choices presented, I can become confused and frustrated. If search engine algorithms have to “figure it all out” with so many choices, formulaic evaluation decisions can break down.
Is this narrowing function helpful to visitors?
Is there enough search volume to warrant allowing this narrowing function result to become indexed?
Is there enough search volume to make the duplicate content concern worth addressing in other ways?
Blue Floral Print Summer Dresses
Blue Floral Print Summer Dresses Under $100
At what point is there enough search volume to justify any of these being indexable?
At what point is there not enough search volume to justify the duplicate result concern?
Other Channel Data
When determining whether there’s enough search volume, it’s also time to look at existing and previous sales data. At this point, don’t just rely on sales that originated from organic search, especially if a particular combination of features or “facets” wasn’t visible previously due to poor SEO.
So look at overall sales data from all channels. Direct, PPC, email marketing – all of these can help inform your decision as to whether a particular combination of features or facets may be more valuable than the very narrow data specific to organic listings.
Profit Margins Matter!
Just because you have data that may show “we sold a lot of this combination”, it doesn’t mean that combination deserves to be indexed even if there’s enough organic search volume overall to seemingly justify indexation.
If a particular group of products has a high sales volume, yet the profit margin on that group is very low, by preventing that from being indexed in search engines, you can reduce some of the duplicate content confusion, and that in turn, can help put more focus on higher profit margin product groups.
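The two checks described above, sales across all channels and profit margin, can be combined into one pass over sales data. This is a sketch under stated assumptions: the order rows, the "clearance summer dresses" group, and both thresholds are hypothetical, and real data would come from your own order system.

```python
from collections import defaultdict

# Hypothetical rows: (facet combination, channel, units sold, profit margin)
orders = [
    ("blue floral print summer dresses", "organic", 40, 0.35),
    ("blue floral print summer dresses", "ppc", 120, 0.35),
    ("blue floral print summer dresses", "email", 60, 0.35),
    ("clearance summer dresses", "direct", 900, 0.04),
]

# Hypothetical thresholds
MIN_UNITS = 200
MIN_MARGIN = 0.10


def indexation_candidates(rows):
    """Return facet combinations that sell enough across ALL channels
    and carry enough margin to be worth indexing."""
    units = defaultdict(int)
    margin = {}
    for combo, _channel, sold, combo_margin in rows:
        units[combo] += sold          # sum every channel, not just organic
        margin[combo] = combo_margin
    return sorted(
        combo for combo in units
        if units[combo] >= MIN_UNITS and margin[combo] >= MIN_MARGIN
    )


print(indexation_candidates(orders))
```

In this sample the first combination would fail on organic sales alone (40 units) but passes once all channels are counted, while the second sells well yet is excluded on margin.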
Stop Already! It’s Too Much to Figure Out!
One concept I drive repeatedly in my audit work is “don’t leave it to Google to figure it all out”. That’s because the more you leave to Google to have to figure out, when you can help their crawlers and algorithms, the more likely that automated, formulaic process is going to make poor choices, give more value to things you care less about, and give less visibility to things you care more about.
The same concept can be applied to you and to site decision makers with all of these steps needed to get to highly refined indexation choices.
This is especially true when you have tens of thousands or hundreds of thousands of products or SKUs.
If that’s the case for you, one way to deal with it, at least in the short to mid-term, is to block ALL of the various filter / feature / facet options on your site from indexation.
If you do that, you can then come back in a future phase, and slowly, methodically reintegrate a limited number of filters / features / facets to indexation, one at a time. Yet only if you’ve taken the time to determine which of those deserve indexation based on the criteria I’ve communicated above.
Content Blocking Tactics
Once you’ve mapped out what to keep in the index and what to block, there are multiple ways you can go about the blocking process.
Robots.txt Silo Blocking Method
You can use the robots.txt file to block entire silos from indexation, if groups of products you want to block are contained in their own hierarchical URL silo.
With a single robots.txt entry, you can block the entire “under-100-dollars” group. The flaw in that method appears when those “under 100 dollars” products are not accessible anywhere outside of that silo level. You don’t want to go too far in blocking, so that’s not always advisable.
If you do list all of your summer women’s dresses in the /summer/ level, then sure – you can block the /under-100-dollars/ group. That is a valid use of the robots.txt file, as long as the individual product URLs are not buried beneath that level though.
MyDomain.com/womens-dresses/summer/under-100-dollars/liz-claiborne-powder-blue-dress-8930w2/ is an example of having the individual product URL hierarchically situated in a silo level where blocking the /under-100-dollars/ level would be toxic to SEO.
If, instead, you have MyDomain.com/womens-dresses/summer/liz-claiborne-powder-blue-dress-8930w2/, and the /under-100-dollars/ level only exists to help show that subset of products for site visitors, you can safely block that /under-100-dollars/ level from indexation in the robots.txt file.
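Before deploying a rule like this, it can help to test it against the URLs discussed above. The sketch below uses Python's standard `urllib.robotparser`; the two-line robots.txt is an assumption about what the single entry could look like, and `is_crawlable` is an illustrative helper name. Note that robots.txt rules govern crawl access, which is what this check measures.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for the silo described above
ROBOTS_TXT = """\
User-agent: *
Disallow: /womens-dresses/summer/under-100-dollars/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url):
    """Return True if a generic crawler may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch("*", url)


base = "https://MyDomain.com/womens-dresses/summer/"
for url in [
    base + "under-100-dollars/",
    base + "liz-claiborne-powder-blue-dress-8930w2/",
    base + "under-100-dollars/liz-claiborne-powder-blue-dress-8930w2/",
]:
    print("allowed" if is_crawlable(url) else "BLOCKED", url)
```

The third URL shows the failure mode: a product page that lives beneath the blocked level is blocked along with it.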
Meta Robots and URL Parameter Blocking
If you pass filters or facets in your URLs, you can use a custom parameter to designate all the various combinations of filters and facets you want blocked with one easy method.
All of these are examples of how passing along filter or facet options in the URL can be done, and they’re great examples of how out of control filter and facet refinement can become.
Many clients I work with have Google Search Console set up to deal with URL parameters. Except that’s rarely done properly.
Not only is it dangerous to assume “Googlebot can figure it all out”, it leaves the site vulnerable to incorrect indexation results. And it’s not helpful to Bing or any other search engine.
So for URL parameter cases, if there are any filter or facet specific parameter combinations you want to block from indexation, you can add “noidx=1” to the end of those URL strings. Then, you can have an entry in your robots.txt file to look for the “noidx=1” parameter / state combination, and block those out.
Sure, if you do that, you can also change the Google Search Console setting to “no urls”. However, it’s best to take full control on-site as your primary method of communication.
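As a rough sketch of the “noidx=1” approach, the code below appends the parameter to a filtered URL and checks it against a wildcard rule. Several things here are assumptions: the rule `Disallow: /*noidx=1` is one possible robots.txt entry, the `color` and `size` parameters are hypothetical, and the small matcher is a simplified stand-in for the `*` and `$` pattern syntax that the major search engines document. Python's built-in `urllib.robotparser` is not used here because it only does prefix matching.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ROBOTS_RULE = "/*noidx=1"  # assumed Disallow pattern, not from the article


def add_noidx(url):
    """Append noidx=1 to a URL's query string unless it is already there."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("noidx", "1") not in params:
        params.append(("noidx", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


def rule_matches(pattern, path_and_query):
    """Simplified robots.txt matching: '*' is a wildcard, trailing '$' anchors."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


def is_blocked(url):
    """Return True if the URL's path and query match ROBOTS_RULE."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return rule_matches(ROBOTS_RULE, target)


filtered = add_noidx("https://MyDomain.com/womens-dresses/summer/?color=blue&size=8")
print(filtered, "->", "blocked" if is_blocked(filtered) else "allowed")
```

One caveat with this assumed pattern: it would also match a value such as `noidx=10`, so keep the parameter name and value unambiguous.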
Meta Robots Blocking Methods
If you need more granular control as to what you block, you can programmatically set all of the pages you want to be blocked through a meta robots “noindex,nofollow” state in the header of those pages.
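Setting that state programmatically can be as simple as a lookup in the page template. In this minimal sketch, `BLOCKED_PATHS` and `meta_robots_tag` are hypothetical names, and the blocked path is just the example silo used earlier; a real site would drive the lookup from its own planning data.

```python
# Hypothetical set of URL paths chosen for blocking during planning
BLOCKED_PATHS = {
    "/womens-dresses/summer/under-100-dollars/",
}


def meta_robots_tag(path):
    """Return the meta robots tag for a blocked page, or '' if indexable."""
    if path in BLOCKED_PATHS:
        return '<meta name="robots" content="noindex,nofollow">'
    return ""


print(meta_robots_tag("/womens-dresses/summer/under-100-dollars/"))
print(repr(meta_robots_tag("/womens-dresses/summer/")))
```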
The Meta Robots “noindex,follow” myth
While there are always exceptions to the rule, most of the time, it is not valid or appropriate to use noindex and follow together in the meta robots tag. Why? Because of the following:
- If you want a page indexed, it should, whenever possible, be accessed from a silo path where each page in that path is indexed.
- Since a “noindex” state causes a page to have zero added SEO value*, a noindex,follow state sends zero SEO value to the target pages.
- Forcing search crawlers to navigate through noindex pages for content discovery is very inefficient, and worsens crawl budget limitations.
*When I say a noindex page has zero SEO value, I mean there is no new value added from that page to pass along to pages linked from it. PageRank is a pass-through state on noindex pages, not an “add value” state. If you have 1 million links, the lost crawl efficiency does more harm than the pass-through is worth. Maximized link distribution at scale is more efficient when each page crawled both passes value through and adds value.
So while you CAN have noindex,follow, and it is possible that ranking won’t be harmed, from a crawl efficiency perspective it is not a best practice. At scale, the larger the site, the more the harm from crawl inefficiency outweighs any benefit of allowing noindex,follow.
Example: 1 million pages, of which 200,000 are noindex,follow. What if you also have intermittent server issues, and Google abandons the crawl some of the time? When is it ever acceptable to allow the possibility that tens of thousands of noindex,follow URLs get crawled while other pages you want crawled and indexed are abandoned? This is an extreme example (though I’ve seen much bigger sites), but the point is that in a maximized efficiency scenario, noindex,follow is not helpful.
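The scale of that example can be estimated with simple arithmetic. This sketch rests on a simplifying assumption that is not from any search engine documentation: that an abandoned crawl had visited URLs in proportion to their share of the site. The abandoned-crawl sizes are hypothetical.

```python
TOTAL_PAGES = 1_000_000
NOINDEX_FOLLOW_PAGES = 200_000


def wasted_fetches(urls_crawled_before_abandon):
    """Estimate fetches spent on noindex,follow pages, assuming the crawler
    visits URLs in proportion to their share of the site (a simplification)."""
    return urls_crawled_before_abandon * NOINDEX_FOLLOW_PAGES // TOTAL_PAGES


for crawled in (50_000, 150_000, 400_000):
    print(f"{crawled:>7,} URLs crawled -> about "
          f"{wasted_fetches(crawled):,} spent on noindex,follow pages")
```

Under that assumption, one in five fetches goes to a page that will never be indexed, which is where the “tens of thousands” figure comes from.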
Meta Robots / Canonicalization Conflicts
Another critical flaw I find in sites that attempt to control what gets indexed and what doesn’t is a conflict between meta robots and canonical tags.
In the above example, the meta robots tag communicates “don’t index this page”, while the canonical tag communicates “index this URL”.
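This kind of conflict can be detected automatically during an audit. The sketch below uses Python's standard `html.parser` and flags any page that carries both a meta robots noindex and a canonical tag, which is an assumption about what counts as a conflict. The sample markup and class name are hypothetical.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the meta robots noindex state and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            self.noindex = self.noindex or "noindex" in directives
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def has_conflict(html):
    """Return True if the page says noindex AND declares a canonical URL."""
    signals = HeadSignals()
    signals.feed(html)
    return signals.noindex and signals.canonical is not None


# Hypothetical page head with both signals present
page = """<head>
<meta name="robots" content="noindex,follow">
<link rel="canonical" href="https://MyDomain.com/womens-dresses/summer/">
</head>"""

print(has_conflict(page))
```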
A Word About Navigation and Hierarchical Grouping
In the examples I used early on in this post, I used a single case example – 9,000 products. In many of the audits I perform, an individual site can easily reach into the tens of thousands or hundreds of thousands of products.
When that happens, it’s tempting to have hundreds of categories, or in any one or more categories, hundreds of subcategories. And just as tempting to end up with a single subcategory containing tens of thousands of products.
There is no one right mix or mathematical formula for how many URLs and links you have at any single level. The most important concept is to always be looking at it from the perspective of your ideal client or customer market.
If a site visitor becomes overwhelmed with choices at any single point, that’s going to be a red flag that search crawlers and algorithms will also become overwhelmed.
When it comes to displaying navigational options to visitors, that too needs to be filtered through human experience. If I see a list of 500 choices on the navigation bar or sidebar (and yes, I’ve seen worse), not only does that overwhelm the senses, it dilutes the refined topical focus vital to maximized SEO.
General Navigation Volume Guidelines
As a general rule, it’s wise to not have more than eight to twelve main categories, and not to have more than eight to twelve subcategories in any single category.
As a user (human or search engine), I shouldn’t be assaulted with links to “all the things” in navigation. As a result, the more products you have, the more need there will be to go deeper in hierarchical silos.
When I, as a user, go into a single category, I should only see subcategory links that point to content within that category.
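The eight-to-twelve guideline above is easy to check against a navigation tree. In this sketch the tree structure, the category names, and the helper name `overloaded_levels` are all hypothetical, and the limit of 12 is simply the upper end of the guideline, not a hard rule.

```python
MAX_LINKS_PER_LEVEL = 12  # upper end of the eight-to-twelve guideline


def overloaded_levels(navigation):
    """Return (level name, link count) for every level over the limit.

    `navigation` maps each main category to its list of subcategories.
    """
    problems = []
    if len(navigation) > MAX_LINKS_PER_LEVEL:
        problems.append(("main navigation", len(navigation)))
    for category, subcategories in navigation.items():
        if len(subcategories) > MAX_LINKS_PER_LEVEL:
            problems.append((category, len(subcategories)))
    return problems


# Hypothetical navigation tree
navigation = {
    "womens-dresses": ["summer", "winter", "formal", "casual"],
    "accessories": [f"accessory-type-{n}" for n in range(40)],
}

print(overloaded_levels(navigation))
```

A flagged level is a cue to regroup those links into a deeper silo, keeping in mind that valid exceptions exist.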
A Word About Flat Architecture
One last concept I need to convey in this post has to do with flat architecture. The SEO myth states “the closer to the root domain a page exists, the more important that page is”.
While the very basic concept is sound, at this point in Internet history, flat architecture is almost always toxic.
The scale of individual sites and niche markets dictates that creating a flat hierarchical URL structure sends invalid, topically weakening signals.
In this example, you’re communicating “each of these pages is equally important”. That is not only not true, it dilutes the importance of those pages that are truly more important, and those that have a larger topical reach. It’s forcing search engine algorithms to have to “figure it all out”, which, as I have said previously, is dangerous and likely very harmful to maximized SEO.
“But my URLs will become too long”
Generally speaking, it’s not only acceptable to end up with longer URLs the deeper you go in navigation, it’s perfectly valid for human needs as well. How many site visitors actually read the entire URL of a given page deep within a site?
Obviously, you may need to alter how you generate product detail page URLs – instead of using the entire 20 word product title, you’d probably be better off crafting a shorter version of those for this situational need.
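One possible way to craft those shorter versions is to keep only the first few words of the product title and append the SKU. This is a sketch, not a recommendation of a specific word count: the five-word limit, the long sample title, and the helper names are assumptions, while the resulting URL mirrors the example used earlier in this article.

```python
import re


def short_slug(title, max_words=5):
    """Lowercase the title, keep alphanumeric words, and cap the word count."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words[:max_words])


def product_url(category, subcategory, title, sku):
    """Build a hierarchical product URL with a shortened title and the SKU."""
    return f"/{category}/{subcategory}/{short_slug(title)}-{sku.lower()}/"


# Hypothetical long product title
title = "Liz Claiborne Powder Blue Dress with Floral Lace Trim and Matching Belt"
print(product_url("womens-dresses", "summer", title, "8930W2"))
```

Keeping the SKU in the slug preserves uniqueness when two shortened titles would otherwise collide.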
A Final Word
I’ve done my best in this article to give as much clarity as I can to the issue of product category and faceted navigation challenges, and to the solution options for SEO. Even so, I can’t emphasize enough that each situation is unique, and every site has not only its own needs for SEO, but also any number of potential limitations regarding the implementation of fixes.
Like anything else in SEO though, the closer you can get to a healthier state from an existing mess, the better your site will be overall. For humans and search engines as users.
And there’s always more to do. Always.