Why crawling your website for errors is important

You’ll hear me say this a lot. Yes, having great content is amazing. It helps earn you links, and I’m currently learning just how important those links are.

One of the sites I manage has virtually zero links compared to its competitors, so I understand how important they are. However, without solid technical foundations, a brand can still suffer.

That’s why you need to get your geek on and sort out your site technically.

Do a technical audit of your website

When was the last time you did a technical audit of your website? Last month? The beginning of the year? When the new version of the site went live?

Every website I manage gets crawled weekly; it’s one of the benefits of using DeepCrawl over Screaming Frog.

Not only does it mean I can identify changes on a weekly basis, but DeepCrawl also provides a comparison report so that any changes are spotted quickly.

Here, we’re going to analyse two things you should be looking out for when auditing your website.

Internal broken links

So. You’re spending lots of resources creating new content to get links, but what about internal pages that are dead and have internal and external links pointing to them?

If you 301 any broken URLs to relevant content, you’ll not only improve your customer experience, you’ll also reclaim a great source of links.
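
If you want a rough check outside the usual tools, here’s a minimal Python sketch (using the requests library and the standard-library HTML parser; the start URL and page cap are placeholder assumptions) that walks same-host links from a start page and reports any that return a 404, exactly the pages you’d then 301 to relevant content.

```python
# A rough internal broken-link check: walk same-host links from a start
# page and report any URL that returns a 404. START_URL and MAX_PAGES are
# placeholders; swap in your own site and a sensible limit.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

import requests

START_URL = "https://www.example.com/"  # assumption: your own site
MAX_PAGES = 200                         # keep the sample small


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages):
    host = urlparse(start_url).netloc
    queue, seen, broken = deque([start_url]), {start_url}, []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            broken.append((url, "no response"))
            continue
        if resp.status_code == 404:
            broken.append((url, 404))
            continue
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        parser = LinkExtractor()
        parser.feed(resp.text)
        for href in parser.links:
            # Drop fragments, stay on the same host, skip pages already seen.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return broken


if __name__ == "__main__":
    for url, status in crawl(START_URL, MAX_PAGES):
        print(status, url)
```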

I wanted to give you an example, but it’s not in my nature to pull another SEO’s work to pieces. So I had a good think about which site I could use, and it came to me one night while watching television.

The BBC.

Screenshot: BBC crawl top-line stats in DeepCrawl

Here’s a site with millions of pages that doesn’t rely on SEO. So I asked the guys at DeepCrawl to crawl part of the site and get me some stats. And did they deliver!
Screenshot: BBC crawl internal broken links and non-301 redirects in DeepCrawl

They crawled around 200,000 pages and, as you can see from the screenshot, just this small portion of the BBC’s site has over 31,000 broken links and over 10,000 non-301 redirects!

Out of the 200,000 pages crawled, there were nearly 183,000 redirected links, simply because the server removes the trailing slash from the end of the URL with a redirect.

There are two solutions. The BBC could train ALL of its journalists not to put a trailing slash at the end of URLs. Or, more simply, it could sort out the CMS so that any links added by journalists don’t automatically include the slash.

After delving a bit deeper, it looks like the problem lies with the links in the top navigation menu, which would be a very simple fix. At the very least, the BBC could make these 301 redirects instead of 302s.
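
If you want to spot-check redirect behaviour like this on your own site, here’s a minimal sketch, assuming the requests library and placeholder URLs, that fetches a URL without following the redirect and reports whether the response is a 301 or a 302 and where it points.

```python
# Spot-check trailing-slash behaviour: fetch a URL without following the
# redirect and print the status code and Location header. The example
# URLs are placeholders; use pages from your own site.
import requests


def check_redirect(url):
    resp = requests.get(url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "-")
    print(f"{resp.status_code}  {url}  ->  {location}")


for url in ("https://www.example.com/news", "https://www.example.com/news/"):
    check_redirect(url)
```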

External broken links

The one downside with DeepCrawl is that it doesn’t check external links by default (checking them uses additional credits), and these are just as important as internal links.

If you’re telling customers there’s a great resource at this location and they click through to discover a 404 message, what do you think that does to your brand’s reputation? It implies that you don’t keep your website up to date and could even suggest you are lazy.

Either way, it’s not a positive sign.

To check for external broken links, I had to use Screaming Frog, but on a seriously powerful server. Like before, I couldn’t crawl the entire BBC site as there are over 19 million pages, but I did crawl a good sample, and I was surprised at how many broken links there were.

Doing some digging, I discovered these were mostly down to the following reasons (there’s a rough checking sketch after the list):

  • Expired websites
  • Page has been taken down, but the domain still exists
  • Incorrect links
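
As a rough way to run that check yourself, the sketch below reads a list of external URLs exported from a crawl (the file name and the "url" column are assumptions about your export) and buckets each one as ok, broken or unreachable, which roughly maps onto the reasons above.

```python
# Rough external-link check: read exported URLs and bucket each one as ok,
# broken (4xx/5xx) or unreachable (often an expired domain). The CSV file
# name and the "url" column are assumptions about your crawl export.
import csv

import requests


def check_external_links(path="external_links.csv"):
    results = {"ok": [], "broken": [], "unreachable": []}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            url = row["url"]
            try:
                # HEAD is lighter than GET; some servers reject it, which
                # would land in "broken" and is worth checking by hand.
                resp = requests.head(url, allow_redirects=True, timeout=10)
                bucket = "ok" if resp.status_code < 400 else "broken"
            except requests.RequestException:
                bucket = "unreachable"
            results[bucket].append(url)
    return results


if __name__ == "__main__":
    for bucket, urls in check_external_links().items():
        print(f"{bucket}: {len(urls)}")
```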

The BBC has been around for years and has a lot of historical pages that don’t get updated, but these pages have a lot of external links on them.

If you have a link from the BBC, it can be seen as a great sign of trust for your brand. So, if the link to your site from the BBC is broken because the page on your website has been taken down, it seems silly not to do something about it.

If you’ve had to take the page down, either 301 it to a similar, relevant page or speak to the site owner (or the journalist, in the case of the BBC) and get them to update the link.

If the link they’ve added is misspelled, contact the site owners to explain the error and ask them to update the article. You don’t want to waste links from powerhouses like the BBC!

As you’ll be crawling your own site and not the BBC’s, you might find old blog posts from yesteryear that linked to a great resource that no longer exists. In which case, update the article and point it at a new, relevant source.

You might find that some of your content is dated and could do with a refresh. Change it. Make it relevant and promote it again to your customers.

The good thing about DeepCrawl is that you can link it to your Analytics account. So, if you see a page with errors, you can at least check whether the page is getting any organic visitors. If it is, then fixing the broken links and updating the article is a must.
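
You can do the same prioritisation outside the tool by joining two exports yourself. This is a minimal sketch, assuming a crawl error export and an Analytics landing-page export as CSVs with hypothetical "url" and "organic_sessions" columns, that merges them and sorts by organic traffic.

```python
# Prioritise fixes by joining a crawl error export with an Analytics
# landing-page export. File names and the "url" / "organic_sessions"
# columns are assumptions about what your exports contain.
import pandas as pd

errors = pd.read_csv("crawl_errors.csv")        # e.g. columns: url, issue
traffic = pd.read_csv("analytics_landing.csv")  # e.g. columns: url, organic_sessions

merged = errors.merge(traffic, on="url", how="left")
merged["organic_sessions"] = merged["organic_sessions"].fillna(0)

# Pages with errors that still attract organic visitors go to the top of the list.
print(merged.sort_values("organic_sessions", ascending=False).head(20))
```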

Whether you use DeepCrawl or Screaming Frog, you should be crawling your site regularly for errors and fixing them. If you’re using DeepCrawl with a regular crawl, it’ll highlight any differences, which makes it a lot quicker, but you can do the same in Excel with exports from Screaming Frog.
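
If you’re on the Screaming Frog route, the week-on-week comparison doesn’t have to be Excel. Here’s a small sketch, assuming two hypothetical error exports with Screaming Frog’s usual "Address" column, that diffs them to show what’s new this week and what’s been fixed.

```python
# Week-on-week comparison of two crawl exports: which error URLs are new
# this week and which have been fixed. File names and the "Address" column
# are assumptions about your Screaming Frog exports.
import pandas as pd

last_week = set(pd.read_csv("errors_last_week.csv")["Address"])
this_week = set(pd.read_csv("errors_this_week.csv")["Address"])

print("New errors this week:")
for url in sorted(this_week - last_week):
    print(" ", url)

print("\nFixed since last week:")
for url in sorted(last_week - this_week):
    print(" ", url)
```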

Doing it once a year, a quarter or even every month isn’t good enough, even if your site only has a few pages. One of the sites I’m currently looking after only has 106 pages, but it still gets crawled every week.

It’s easy to miss a change made by the developers, or even someone else within the business, so crawling regularly is a good habit to get into.

If you’re reading this from the BBC, give me a quick call or drop me an email; there are some very quick fixes I’ve identified for you.
