I’ve been talking to a lot of prospects lately, from organisations large and small, and a common theme keeps emerging: whether their websites are brochure sites or large, complex ecommerce sites, reviewing them reveals a lack of site integrity.
By this, I’m mostly referring to common items: monitoring and fixing broken links, cleaning up after page URL changes, or repairing an improperly handled transfer from a previous domain.
All of these issues chip away at a website’s authority, and we’ve found that fixing them alone can really boost a website’s visibility in search engines.
Broken links just happen with websites. You link to great external resources that get moved without you realising, and the other site doesn’t redirect them properly (or simply deletes them, the fools). Sometimes this happens internally too, particularly when multiple stakeholders can change a website without thinking about the implications of the changes they’re making.
You might think that case study is out of date now and needs to be deleted and later replaced with a new kickass one, but did you remember to remove the links to this page from the rest of the website?
So, how do you find all these broken links? My favourite tool for this is, of course, the Screaming Frog SEO Spider, of which I’m a big fan (here’s a picture of me wearing my Screaming Frog Hoodie which they kindly sent me).
You can crawl a website with Screaming Frog really easily, and if it’s a small site you can crawl up to 500 URLs at no charge (albeit without the crawl customisation options). You can then review which linked-to pages are returning 404s or other server errors, and plan redirects to keep some of the authority held by those URLs, if they’re internal.
The quickest way to get a breakdown of the specific broken-link issues is to run the advanced export of the 4xx in links, which gives you a line-by-line breakdown of each broken link: its source page, where it points, and how it’s linked (the anchor text, or the alt text if it’s an image).
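To give a feel for how that kind of export can be worked with, here’s a minimal Python sketch. The CSV layout and URLs are hypothetical (not Screaming Frog’s exact export format); the idea is simply to group each broken destination by the pages still linking to it, so you can fix them page by page.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows mimicking a "broken link" export:
# source page, broken destination, anchor (or alt) text
EXPORT = """\
source,destination,anchor
https://example.com/blog/,https://example.com/old-case-study,Case study
https://example.com/about/,https://example.com/old-case-study,Our work
https://example.com/blog/,https://example.com/missing.png,IMAGE: logo alt
"""

def group_broken_links(export_csv):
    """Group broken destinations by the source pages that still link to them."""
    by_target = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_csv)):
        by_target[row["destination"]].append((row["source"], row["anchor"]))
    return dict(by_target)

report = group_broken_links(EXPORT)
for target, sources in report.items():
    print(target, "->", len(sources), "inlinks")
```

With the links grouped like this, a target with many inlinks is one fix with a big payoff, and the source column tells you exactly which pages to edit.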
When you’ve got this list, it’s usually relatively simple to get them fixed. You just alter the pages where the broken links currently exist and update the link to point to a new location or simply remove it altogether.
If the same links are broken on all pages this usually indicates that you’ll need to change a page template rather than just an individual page.
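One way to spot that pattern programmatically is to compare how many distinct pages carry the same broken link against the total pages crawled. This is a rough sketch with made-up numbers and an arbitrary threshold, not a hard rule:

```python
def likely_template_issue(source_pages, total_pages, threshold=0.9):
    """A broken link appearing on most crawled pages probably lives in a
    shared template (header, footer, navigation) rather than in one page."""
    return len(set(source_pages)) >= threshold * total_pages

# Hypothetical: a dead footer link found on 98 of 100 crawled pages
print(likely_template_issue(["/page-%d" % i for i in range(98)], 100))
# Versus a one-off broken link in a single blog post
print(likely_template_issue(["/blog/old-post/"], 100))
```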
Similarly, a website can be an ever-evolving monster at times, accumulating internal redirects. Unlike broken links, these might occur when you move a page and do remember to set up a 301 redirect to the new location. Or maybe you’re adding links a little sloppily in your content and link to a version of a page without a trailing “/”.
Either way, I often see these in far greater numbers than broken links and when there are no good reasons for these redirects to exist, we always recommend getting them fixed to remove the issue.
You might ask why this is an issue…
It’s generally accepted that you lose a percentage of a link’s authority when it passes through a 301 redirect, though most of the time I’ve heard people talking about this, they were referring to capturing the authority of an external link to your site.
This hasn’t come up quite as often in that context in a post-Penguin world, as so many more folks are trying to lose links that might damage them – serving a 410 Gone on a page that has lots of links, for example.
But it has always surprised me how many SEOs are prepared to let thousands of internal redirects continue to exist on a website when there’s always been that potential of losing some of your own internal link authority.
It may only be a small percentage for each link, but with several of these on every page of a website of thousands or millions of pages, you are slowly chipping away at that authority on a much bigger scale than you might realise when you look at a redirecting link in the navigation and think “it’s only one link”.
Putting a report together on these from Screaming Frog is simple too – run the advanced export ‘Redirection (3xx) In Links’ report and you’ll have a breakdown of each individual link that needs to be updated. Then the harder work of cleaning them up begins!
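Part of that clean-up is working out where each redirecting link should point instead, which means following any redirect chains through to their final destination. Here’s a sketch assuming a simple hypothetical URL-to-URL redirect map, with guards against loops and over-long chains:

```python
def resolve_final(url, redirects, max_hops=10):
    """Follow a redirect map until reaching a URL that no longer redirects,
    so internal links can be updated to point straight at the destination."""
    seen = set()
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url

# Hypothetical chain: /old -> /newer -> /newest
redirects = {"/old": "/newer", "/newer": "/newest"}
print(resolve_final("/old", redirects))
```

Updating links to the resolved URL in one pass also flattens multi-hop chains, which are worse than a single redirect for both crawlers and users.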
In the first two sections, on broken links and internal redirects, we were looking at integrity issues with the links that are crawlable on the site at the time you send your spiders through.
Sometimes though, you won’t be able to find a crawl error that a search engine spider has found on your website based on internal crawls alone.
There can be all sorts of reasons for this: malformed links, old links to pages from external sources, temporary server errors and former hacking issues are just a few possibilities.
However, Google (and Bing) Webmaster Tools are kind enough to give you reports on some of the errors that they encounter when they try to visit your website.
They categorise this based on the type of Googlebot crawl that encountered the error (Desktop, Smartphone and Feature phone), and I’ve known these three reports to have quite different results that need resolving.
These are then categorised further into Server Errors (a spike in 503s here often tells me a client site has been down for maintenance), Not Found (usually where the bulk of the issues that need looking at live) and Blocked.
Sometimes when you review these reports you might find that you’ve already fixed a lot of these as a result of fixing the items we identified in the broken links and redirects sections. I just tend to mark those as complete and if they are properly fixed, they’ll be gone for good.
My biggest irritation with this area of Google Webmaster Tools is that it limits you to marking 1,000 of these as fixed per day. As in the example above, where there were almost 20k errors at the peak, it can take a while to get a handle on all of these when you only get a small sample of the data Google has about these links.
I tend to export a whole list over the course of a week (or month, depending on how many there are), de-dupe it (these reports repeat themselves due to the small data samples) and look to resolve everything in one hit. The export usually tells you the header status and the date the error was detected.
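The de-dupe step is easy to script. Here’s a rough sketch using invented rows in a (URL, status, date-detected) shape rather than the exact export columns: it merges repeated exports into one list, keeping the earliest detection date for each URL.

```python
def merge_exports(*exports):
    """De-dupe repeated crawl-error exports. Each export is an iterable of
    (url, status, detected) tuples; keep the earliest detection per URL."""
    merged = {}
    for export in exports:
        for url, status, detected in export:
            # ISO dates compare correctly as strings
            if url not in merged or detected < merged[url][1]:
                merged[url] = (status, detected)
    return merged

# Hypothetical weekly exports with an overlapping error
week1 = [("https://example.com/a", 404, "2014-06-01")]
week2 = [("https://example.com/a", 404, "2014-06-08"),
         ("https://example.com/b", 410, "2014-06-08")]
print(merge_exports(week1, week2))
```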
I tend to use SEO Tools for Excel to do a live header check on the list, though it would be easy to export the list and use the Screaming Frog spider in list mode and do the same. I was quite late to the SEO Tools for Excel party, but I’ve found it very useful for a myriad of tasks since I started using it a couple of years ago and I think I’m barely scratching the surface of its capabilities at the moment.
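Whichever tool does the fetching, the logic of a re-check is simple. In this sketch the actual HTTP request is injected as a callable – in practice you might wrap urllib.request, but keeping it injectable makes the idea easy to demonstrate with fake data (the URLs and statuses below are hypothetical):

```python
def recheck_statuses(urls, fetch_status):
    """Re-verify exported error URLs. fetch_status is any callable taking a
    URL and returning an HTTP status code; anything 4xx/5xx is still broken."""
    still_broken = []
    for url in urls:
        if fetch_status(url) >= 400:
            still_broken.append(url)
    return still_broken

# Fake fetcher standing in for a live header check
fake = {"/a": 404, "/b": 200, "/c": 410}.get
print(recheck_statuses(["/a", "/b", "/c"], fake))  # ['/a', '/c']
```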
Once you’ve verified that the errors are still errors (or have resolved themselves, which I find happens quite a bit with the GWT data), you’ll have a list of issues, but not always the source of each issue.
This can be a bit annoying – if you go back through the web interface it can tell you where each error is linked from, but you need to click on each URL individually to get that data.
I usually look down the items on that list and make a judgement call on the best action to take. Lots of 410s might suggest a deliberate link clean-up to remove those pages from the index; some 404s might be easy to fix with a quick redirect to a page similar to the original destination.
Many times, you’ll just want to leave 404s to 404 though – if there is no similar or like-for-like page, sometimes you just have to let it go. I usually mark those as complete too, in the hope that they stop making my flatlined graphs look bumpy.
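Those judgement calls can even be roughed out as a rule of thumb. This is purely a hypothetical sketch of the triage described above – a starting point for a spreadsheet column, not a substitute for looking at each URL:

```python
def triage(status, url, redirect_map):
    """Hypothetical triage rule mirroring the judgement calls above.
    redirect_map holds URLs where a close-substitute page exists."""
    if status == 410:
        return "leave"  # deliberately removed; let it stay gone
    if status == 404 and url in redirect_map:
        return "redirect to " + redirect_map[url]  # a similar page exists
    return "let it 404"  # no like-for-like page; mark as complete

# Invented example: one old page has an obvious successor
print(triage(404, "/old-page", {"/old-page": "/new-page"}))
```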
Personally, with the sites I work on, I like to have these factors ‘as clean as a whistle’, as it gives the website a solid platform for its excellent content and on-site optimisation to be effective.
I can be a bit obsessive about this at times, but I’m a firm believer in setting that foundation as solidly as possible for the rest of the SEO work, on and offsite, to have maximum impact.
Obviously, there are often other factors at play when it comes to ranking a website, so cleaning up these sorts of issues won’t remove a link penalty, but sorting these issues is usually one of the first things we’ll do on a project, in order to have a firm base to work from.