An easy and fast way to map out potential technical errors and issues for any website.
For the last couple of months, I've been working on a brand new tool that we'd like to call the "Site Crawler". The purpose of this tool is to get an extensive overview, on the spot, of any potential errors or issues that we might stumble upon when "crawling" a desired domain.
It is pretty self-explanatory, but read along to get some additional insight on each step of the process.
Enter a valid URL or domain to initiate the crawl. The crawl always starts from the homepage and then works its way through the site via the internal links it comes across (naturally, we respect robots.txt and rel="nofollow").
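To illustrate that first step, here is a minimal Python sketch of extracting a page's links while honouring robots.txt and rel="nofollow". The user agent name "SiteCrawlerBot" is a placeholder for illustration, not the tool's actual identifier, and the real crawler's internals are of course more involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, skipping rel="nofollow" links."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Respect rel="nofollow": skip the link entirely.
        if "nofollow" in (attrs.get("rel") or "").split():
            return
        if attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

def crawlable_links(html, base_url, robots_txt):
    """Return the page's links that robots.txt allows us to fetch."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return [u for u in parser.links if rp.can_fetch("SiteCrawlerBot", u)]
```

For example, given a page with a normal link, a nofollow link, and a link into a disallowed directory, only the first survives.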
The URL limit defines the maximum number of unique URLs that the tool will crawl (including external URLs, if that option is checked; we'll return to this option later). In other words, if 50 URLs are chosen, the Site Crawler will crawl a maximum of 50 URLs. In a scenario where a higher limit is chosen (say 300 URLs) but the number of URLs found is lower (say 50), the number of credits spent is adjusted downwards after the crawl is complete.
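The adjustment can be sketched like this; the flat per-URL cost is a simplifying assumption for illustration, since the actual cost varies between the options (see below).

```python
def credits_charged(urls_chosen, urls_found, cost_per_url):
    """You are only charged for URLs actually crawled, capped at the chosen limit.
    cost_per_url is a hypothetical flat rate; real pricing differs per option."""
    return min(urls_chosen, urls_found) * cost_per_url
```

So choosing 300 URLs on a site where only 50 are found ends up costing the same as if 50 had been chosen.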
The credit cost varies with the number of URLs chosen, where a higher credit cost translates to a greater value per URL. The higher options also include AUR (Ahrefs URL Rating), which measures the strength of a specific URL's backlink profile. This increases the value of a single submit significantly.
By checking "Also check if all external URLs are working", the crawler will also follow external links and verify that they return a satisfactory HTTP status code (200 OK). This is optional.
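Such a check could look like the sketch below, which issues a HEAD request so no page body is downloaded. This is just one reasonable approach; the crawler's actual request strategy isn't documented here.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_external(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails
    entirely (DNS error, refused connection, timeout, and so on)."""
    try:
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        return e.code
    except URLError:
        return None
```

Anything other than 200 would then be flagged as a broken external link.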
Crawl time may vary from domain to domain due to a number of factors, such as the number of URLs found and the performance of the targeted server. At the moment, we won't allow any crawl to run longer than roughly 2 minutes, so the number of URLs found might be affected by this limitation. During our internal tests, this was rarely a problem.
The crawl will continue until either the URL limit or the time limit is reached. The crawler follows a road map in the following order:
* Provided a URL option including AUR was chosen.
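The two stopping conditions can be sketched as a breadth-first loop with both a URL cap and a time budget. Here `fetch_links` is a stand-in for fetching a page and extracting its links; this is an illustration of the stopping logic, not the tool's actual implementation.

```python
import time
from collections import deque

def timed_crawl(seeds, fetch_links, max_urls=50, time_limit=120.0):
    """Crawl breadth-first until either the URL cap or the time budget
    (in seconds) is exhausted, whichever comes first."""
    deadline = time.monotonic() + time_limit
    seen, queue = set(), deque(seeds)
    while queue and len(seen) < max_urls and time.monotonic() < deadline:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(link for link in fetch_links(url) if link not in seen)
    return seen
```

With `max_urls=50` and `time_limit=120`, a small fast site stops at the URL cap while a large slow one stops at the deadline.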
When the crawl is finished, the results are presented and you get to reap the rewards of your patience! A brief overview is placed on top of the table, which lets you do a quick assessment of the results.
You can filter the table by clicking the boxes in the overview or by using the input field at the top right of the table, and you can sort each column by errors and issues by clicking the column headers.
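A free-text filter of that kind might look like the sketch below, a case-insensitive substring match across every cell in a row; the actual filter implementation isn't documented, so treat this as an assumption about its behaviour.

```python
def filter_rows(rows, query):
    """Keep rows where any cell value contains the query, ignoring case."""
    q = query.lower()
    return [row for row in rows if any(q in str(v).lower() for v in row.values())]
```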
Areas marked in red are what we consider to be errors: for example, overly long load times, or missing or misused crucial meta elements.
Areas marked in yellow are what we consider to be issues. These are not as serious as errors, but they are areas with room for improvement.
Areas marked in green are satisfactory to our standards, good job!
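As a rough sketch, the red/yellow/green classification could be expressed as rules like the following. The thresholds and field names here are illustrative assumptions, not the tool's actual cut-offs.

```python
def classify(page):
    """Map a crawled page's metrics to a severity bucket.
    Thresholds are hypothetical examples, not the tool's real rules."""
    if page["load_time"] > 3.0 or not page.get("title"):
        return "error"   # red: serious problem
    if page["load_time"] > 1.5 or not page.get("meta_description"):
        return "issue"   # yellow: room for improvement
    return "ok"          # green: satisfactory
```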
URLs that are external or point to a file will only be shown in the table if they return a faulty HTTP status code (but they are of course counted in the number of URLs analyzed).
We hope that this tool will become as essential to you as it is for us internally.
The Site Crawler will initially enter an open beta phase, and with that said, we appreciate any feedback sent our way.
And as always, we have more exciting tools and features in the pipeline. Subscribe to our newsletter and you won't miss a beat!