OnCrawl: A review
I have used both Screaming Frog and DeepCrawl to crawl websites, and I have covered both many times. This time, though, I thought I would see what OnCrawl has to offer.
OnCrawl is more similar to DeepCrawl than to Screaming Frog. It’s a web-based platform, so unlike Screaming Frog it doesn’t use your machine’s resources. It doesn’t matter whether you’re a Windows, Mac or Linux fan; you can use the software. So that’s a great start.
I am not going to cover why you should be crawling your website, that’s for a different article. If you’re reading this you probably already know how important it is to regularly crawl your site and the benefits it can bring to your business.
Crawling your website for errors is something all SEOs should do, whether they are ‘white hat’, ‘black hat’ or somewhere in between.
Pricing is very straightforward and the model is sensible: the more you want to crawl, the more it will cost you, and the prices are very reasonable. For example, crawling 100,000 URLs a month is only €19.
So what about the software itself? To give a fair review I needed to crawl a website. As with my previous reviews of similar software, I am going to pick on the BBC again. They aren’t a commercial company, so my findings can’t really hurt anyone.
So I went with the 100,000-URL crawl. I know the BBC has more than 100,000 pages, but it should give a good sample and a solid basis on which to analyse the software.
I also chose to crawl external links, so in theory I could have crawled more of the BBC’s own links, but having broken external links is just as bad as having broken internal links.
After running the crawl, you are greeted with a home page for each project so you can quickly analyse the results. If you are working in-house or for an agency and need to present this data, there is a handy ‘print/PDF’ button, making reporting the results of the crawl easy.
The first thing to stand out is that, of the 100,000 crawled pages, 96k are indexable – this is fine, as there are always going to be pages on any site we don’t want indexed. What does ring alarm bells is that only 81k of those are indexable compliant pages. That leaves around 15,000 pages unaccounted for: 15% of pages which are never going to be indexed, so can never appear in the SERPs and have no chance of driving organic visitors.
The great thing about OnCrawl is that this is highlighted as you log in – you don’t have to go drilling around trying to find this data. It’s there in front of you.
So the first job has to be to find those 15k pages and work out why they aren’t compliant.
It’s so simple: all the numbers are links to the underlying report, so all I need to do is click on the ’96,645’ link and the following page appears.
At this point we are only interested in the missing 15k pages, so we just need to add a filter, which is very simple (OK, you need to know the symbol for ‘doesn’t equal’, but you can Google that if you don’t) and then hit ‘apply filters’.
We now have our 14,342 rows of data for pages that can’t be indexed. Yes, you can do more drilling down in the platform, but this is where my data analyst skills kick in – GIVE ME EXCEL – and there is a huge ‘export data’ button. The data is exported as tab-separated values, so in Excel you just need to use ‘Text to Columns’ and delimit on tabs.
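If you would rather skip Excel entirely, the same filtering can be scripted. Here is a minimal sketch of filtering a tab-separated crawl export for non-200 pages; the column names ("url", "status_code") are my assumption, not OnCrawl’s actual export headers, and the TSV content is made-up sample data.

```python
import csv
import io

# Sample TSV standing in for an OnCrawl export; headers are assumed.
export = io.StringIO(
    "url\tstatus_code\n"
    "https://example.com/a\t200\n"
    "https://example.com/b\t301\n"
    "https://example.com/c\t404\n"
)

# DictReader with a tab delimiter handles the tab-separated export directly.
reader = csv.DictReader(export, delimiter="\t")
non_200 = [row for row in reader if row["status_code"] != "200"]

for row in non_200:
    print(row["url"], row["status_code"])
```

In practice you would pass the downloaded file to `open(...)` instead of the `io.StringIO` sample.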
There are a lot of 301s and 302s, plus a few other errors. If anyone from the BBC marketing / web team is reading this – you have a few issues. Get in touch and I will fix them for you.
Now that we have identified where the 15k pages which can’t be indexed are, we can look at some of the other issues.
We have 3,355 pages which aren’t indexed because of the robots.txt file. I had a quick look through the list and the BBC probably have a good reason for not wanting them indexed, but if this was your site you could take a quick look and make sure that only the pages you wanted blocked were in the list. If a URL appeared in the results that wasn’t meant to be there, you could isolate it, find out why, and fix the issue.
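You can also sanity-check a robots.txt against a list of URLs yourself with Python’s standard library. This sketch uses made-up rules and URLs, not the BBC’s actual robots.txt:

```python
from urllib import robotparser

# Illustrative robots.txt rules (not a real site's file).
rules = """User-agent: *
Disallow: /private/
Disallow: /search
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Hypothetical URLs from a crawl export.
urls = [
    "https://example.com/news/story-1",
    "https://example.com/private/draft",
    "https://example.com/search?q=test",
]

for url in urls:
    # can_fetch reports whether the given user agent may crawl the URL.
    allowed = rp.can_fetch("*", url)
    print(url, "allowed" if allowed else "blocked by robots.txt")
```

Any URL that prints "blocked" unexpectedly is worth investigating, exactly as described above.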
Other issues highlighted to you are the 455 4xx pages – we identified these earlier while tracking down the missing 15k pages, but it’s useful to see at a glance how many pages return 4xx errors.
The next two figures in the summary document are very useful: pages with tag duplication (pages with at least one duplicate title, duplicate meta description or duplicate h1) and pages with duplicated content (pages whose text content is considered duplicated, or very similar to another page). Again, we can quickly drill down, find the issues and resolve them.
Scrolling further down the page, passing the headline numbers, brings up some useful graphs.
As SEOs we are always told to cut down the number of levels and to keep as much content as close to the root domain as possible and easy to find. The BBC has 10 levels (OK, level 10 only has one page), but that’s still 10 levels for Googlebot to navigate through while determining the correct importance of each page. Most of the BBC’s pages are at level 5. Is this right? I have never worked on a news site before, or a site as large as the BBC’s. I have worked on sites with over a million pages, but they didn’t have this many levels. And since I didn’t crawl the entire site, I’m not sure how far down it goes.
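If you only have a URL list to hand, you can get a rough proxy for depth by counting path segments. Note this is an approximation: crawlers like OnCrawl measure depth as click distance from the homepage, which can differ from URL structure. The URLs below are illustrative.

```python
from urllib.parse import urlparse

def path_depth(url: str) -> int:
    """Count non-empty path segments as a rough proxy for crawl depth."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

print(path_depth("https://example.com/"))                   # 0
print(path_depth("https://example.com/news/uk/story-123"))  # 3
```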
Keep scrolling and you get four graphs – three very useful graphs in fact.
Whatever discipline of internet marketing you work in – PPC, SEO, email or anything else – we all know speed is crucial. This chart clearly breaks down which pages are slow. You can do this analysis yourself in Google Webmaster Tools, but that requires inputting every URL on your site, and if you have more than a few pages OnCrawl could save you a lot of time and effort.
We only really care about the slow pages, and again, by clicking on any of the segments a custom report is created in the ‘Data Explorer’ section showing you the slow pages. While this gives you some useful information, such as page size, to figure out why a page is slow you are, in my opinion, going to need some other software – but at least OnCrawl has narrowed down which pages you need to analyse.
It also highlights which types of links you have, follow and nofollow, both internal and external, and you can drill down to find more detail. With the ‘export data’ tab I can download the data and analyse it in Excel.
The final section actually makes the response-code analysis we did earlier seem slow, because there is a lovely chart which breaks down all the response codes by number range. I still prefer my method above, as I can see all the response codes without any further drilling down, but for some quick analysis this is ideal.
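Bucketing response codes by range, as that chart does, is a one-liner once you have the codes. A minimal sketch with made-up status codes standing in for real crawl results:

```python
from collections import Counter

# Sample status codes; a real list would come from the crawl export.
status_codes = [200, 200, 301, 302, 404, 200, 500, 301, 410]

# Integer division by 100 maps each code to its range (2xx, 3xx, ...).
buckets = Counter(f"{code // 100}xx" for code in status_codes)
for bucket, count in sorted(buckets.items()):
    print(bucket, count)
```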
In my opinion the best part of the platform is that you can interrogate the raw data however you want, allowing you to get to the bottom of any technical issues on the site. It might be the data analyst in me, but I love any platform which gives me all the data and lets me filter and analyse the metrics which are useful to me – not just what the vendor decided beforehand. Even better, the filtering is done within the platform, so Excel isn’t needed for it.
You can add columns which are useful, remove others which you don’t need, and filter on any of the columns – the filtering and drill-down you can do is just amazing. This feature alone would save me a lot of time when looking at a client’s website.
As I said in my presentation at BrightonSEO, you wouldn’t build a million pound mansion on quicksand, so why build a million pound website on poor foundations? Technical SEO is one of the key pillars of success, so crawling your site for errors is crucial.
I wasn’t sure what to expect when I started with this platform. I am a huge fan of DeepCrawl – I have used their platform for a long time and have all my clients set up in it with automatic crawls, so I can quickly see differences between crawls. I haven’t tested this part of OnCrawl yet; maybe next month, when my credits refresh, I will crawl a smaller site, put it on auto recrawl and see what that feature is like.
While DeepCrawl may be prettier and produce nicer-looking graphs, it doesn’t let you interrogate the data to the same level OnCrawl does. Would I switch from DeepCrawl to OnCrawl? That would depend on what comparison analysis you get.
Would I recommend OnCrawl? Happily – I half wanted to write this article saying “it was a good attempt, a nice bit of software, but it doesn’t come close to DeepCrawl”, but I can’t, and the more I use the software the more I am becoming an advocate for the platform.
I will update this next month once I have done two crawls of the same URL and see what comparisons are reported.
I am the Managing Director of Coreter Media and have been in Digital Marketing since 2009. Initially in-house working for some of the UK’s biggest brands, but now I run my own agency helping small businesses grow.