Simon Lesser
December 22, 2020
Image crawling, hreflang, social markup, pixel widths, and more
We’ve been hard at work this year on some big improvements to our onsite crawler (powering tools like Site Auditor, Site Explorer, and URL X-Ray). Today, we’re excited to announce the launch of 30 new issues tracked by the Site Auditor, and over 40 new data fields collected for every crawled URL.
The new issues and fields are all powered by updates in 4 main areas:
- Image crawling
- Hreflang
- Social markup
- Pixel widths
Let’s check out each area individually, shall we?
Image crawling
In addition to HTML URLs, our crawler now includes images in the crawl.
A new Images tab has been added to URL X-Ray, so you can now view the images found on any URL in your site, along with additional optimization data.
We’ll look for a number of optimization issues on each image we crawl. Here’s a sample of some of the new image reports in Site Auditor:
- Broken images: Shows images returning an HTTP 4xx or 5xx status
- Alt text empty / missing: Checks whether the alt attribute was left empty or omitted for each image
- Alt text too long: Flags images with an alt text longer than the recommended length
- Invalid Content Type: Reveals files that were linked in an img tag, but are not actually images
- Asset is linked via redirect: Images returning an HTTP 3xx status code
- X-robots noindex resources: Images that search engines are told not to index via the X-Robots-Tag noindex header
- Resources blocked by robots.txt: Images that search engines are blocked from crawling by a robots.txt directive
- All Images: A list of all images found on the site
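To make the checks above concrete, here is a minimal sketch of how a fetched image could be mapped to these reports. It is only an illustration, not our crawler's actual code: the ImageResponse type, the issue names, and the 125-character alt text threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold; the length the Site Auditor actually uses is not stated here.
MAX_ALT_LENGTH = 125


@dataclass
class ImageResponse:
    """Illustrative record of one image found in an img tag and then fetched."""
    url: str
    status: int
    content_type: str = ""
    alt: Optional[str] = None   # None means the alt attribute was absent
    x_robots_tag: str = ""      # value of the X-Robots-Tag response header


def image_issues(img: ImageResponse) -> List[str]:
    """Return the (made-up) issue names that apply to a single image."""
    issues = []
    if 400 <= img.status < 600:
        issues.append("broken_image")
    elif 300 <= img.status < 400:
        issues.append("linked_via_redirect")
    elif not img.content_type.lower().startswith("image/"):
        # The file was linked in an img tag but is not served as an image.
        issues.append("invalid_content_type")

    if img.alt is None or not img.alt.strip():
        issues.append("alt_text_empty_or_missing")
    elif len(img.alt) > MAX_ALT_LENGTH:
        issues.append("alt_text_too_long")

    if "noindex" in img.x_robots_tag.lower():
        issues.append("x_robots_noindex")
    return issues


if __name__ == "__main__":
    print(image_issues(ImageResponse("https://example.com/logo.png", 404)))
```

A robots.txt check is left out of the sketch because it needs the site's robots.txt file rather than the image response itself.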
Hreflang
Hreflang is a set of markup placed on a URL to tell search engines that there are multiple versions of a page in different languages. This helps search engines serve visitors the most appropriate version of each URL by language or region.
As anyone who’s attempted to implement hreflang knows, the rules can get devilishly complicated quickly. There are so many places where mistakes can be made that it’s no wonder the vast majority of hreflang implementations have numerous issues.
Our crawler looks deep into all of the most common problems, uncovering them for you and providing the exact data you need to fix them quickly and easily.
- Broken hreflang link: Checks to make sure URLs are not returning an HTTP 4xx or 5xx response
- Invalid hreflang value: Flags hreflang tags that have a value that is not valid
- No self-referencing hreflang tag: Shows URLs that have one or more hreflang tags on the page, but don't link back to themselves
- Non-reciprocal hreflang tag: Lists pages that have at least one hreflang tag, but are missing one or more reciprocal links back to this page
- Hreflang to non-canonical: Displays URLs that are linked in an hreflang tag that are not canonical (and therefore not indexable)
- Redirecting hreflang link: Displays URLs that are linked in an hreflang tag that redirect to a different page (and are therefore not indexable)
- Multiple hreflang languages referencing the same URL: Flags URLs with one or more hreflang tags that use the same URL for multiple hreflang languages
- Multiple URLs referenced for the same hreflang language: Flags URLs with one or more hreflang tags that have more than one URL listed for a single hreflang language
- Empty or missing hreflang: A notice (not necessarily an issue) that hreflang was not found on this page
- Missing x-default hreflang: A notice (not necessarily an issue) that the x-default hreflang value was not found on this page
- Hreflang and HTML lang mismatch: Shows URLs that have different values for the HTML language attribute and the self-referencing hreflang tag
- Invalid HTML lang value: Reveals URLs with an invalid value for the HTML lang attribute
- Missing HTML lang value: Pages where no HTML lang value was found
In URL X-Ray (Content tab), you can view all hreflang tags on a page, whether each hreflang value is valid, and whether the linked URL is indexable. The values of the self-referencing tag and the x-default tag are shown as well.
You can even see all of the inbound hreflang links to this page too!
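For readers who like to see the logic spelled out, here is a rough sketch of three of the hreflang checks listed above: invalid value, missing self-reference, and missing reciprocal link. It is an illustration under simplifying assumptions, not our crawler's implementation. The pages structure and issue names are invented for the example, and the value check only tests the shape of a value (two-letter language, optional two-letter region, or x-default) rather than validating it against the real language and region code lists.

```python
import re

# Shape check only (an assumption for this sketch): "en", "en-gb", or "x-default".
# Real hreflang values may also carry script subtags, which this pattern rejects.
HREFLANG_SHAPE = re.compile(r"^(x-default|[a-z]{2}(-[a-z]{2})?)$", re.IGNORECASE)


def hreflang_issues(pages):
    """pages maps each crawled URL to its {hreflang value: target URL} tags."""
    issues = []
    for url, tags in pages.items():
        if not tags:
            continue  # pages without hreflang only get a notice, which is skipped here
        for value in tags:
            if not HREFLANG_SHAPE.match(value):
                issues.append((url, "invalid_hreflang_value", value))
        if url not in tags.values():
            issues.append((url, "no_self_reference", url))
        for target in tags.values():
            # A target is reciprocal if its own tags point back at this URL.
            if target != url and url not in pages.get(target, {}).values():
                issues.append((url, "non_reciprocal", target))
    return issues


if __name__ == "__main__":
    pages = {
        "https://example.com/en/": {
            "en": "https://example.com/en/",
            "de": "https://example.com/de/",
        },
        "https://example.com/de/": {"de": "https://example.com/de/"},
    }
    for issue in hreflang_issues(pages):
        print(issue)
```

In this sample the German page never links back to the English page, so the English page is reported as non-reciprocal. Note that the sketch also reports targets that were never crawled, since it cannot see their tags.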
Social markup
Our crawler now looks for Open Graph and Twitter Card markup, and checks whether they’re properly implemented. These tags are used to generate those nice-looking previews in social media and messaging platforms when a link to the URL is shared.
Here are a few of the most common issues with social markup we look for in Site Auditor:
- Open Graph tags missing: No Open Graph markup was found on this page. (just a notice, not necessarily an issue)
- Open Graph tags incomplete: At least one Open Graph tag was found, but one or more of the required tags were missing
- Open Graph URL not matching canonical: The URL listed in the Open Graph tag is different from this page's canonical URL
- Twitter card missing: No Twitter Card markup was found on this page. (just a notice, not necessarily an issue)
- Twitter card incomplete: At least one Twitter Card tag was found, but one or more of the required tags were missing
- Twitter card description too long: The content in the Twitter Card description tag was longer than the maximum length
In URL X-Ray (Content tab), you can view the value of every Open Graph and Twitter Card tag we track, too!
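As an illustration of the Open Graph checks listed above, here is a small sketch built on Python's standard html.parser module. It treats og:title, og:type, og:image, and og:url as the required set, which is what the Open Graph protocol itself lists. Whether Site Auditor uses exactly that set is an assumption, and the class, function, and issue names are made up for the example.

```python
from html.parser import HTMLParser

# Required properties per the Open Graph protocol; assumed, not confirmed,
# to match what the Site Auditor checks.
REQUIRED_OG = {"og:title", "og:type", "og:image", "og:url"}


class HeadTags(HTMLParser):
    """Collects og:* meta tags and the canonical link from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.og[a["property"]] = a.get("content") or ""
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")


def open_graph_issues(html):
    """Return the (made-up) Open Graph issue names that apply to a page."""
    parser = HeadTags()
    parser.feed(html)
    if not parser.og:
        return ["open_graph_missing"]
    issues = []
    if REQUIRED_OG - set(parser.og):
        issues.append("open_graph_incomplete")
    og_url = parser.og.get("og:url")
    if og_url and parser.canonical and og_url != parser.canonical:
        issues.append("open_graph_url_not_matching_canonical")
    return issues


if __name__ == "__main__":
    page = (
        '<head><link rel="canonical" href="https://example.com/a">'
        '<meta property="og:title" content="A">'
        '<meta property="og:url" content="https://example.com/b"></head>'
    )
    print(open_graph_issues(page))
```

The same pattern would extend to Twitter Card tags by also collecting meta tags whose name attribute starts with twitter:.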
Pixel widths
Each text character takes up a different amount of horizontal space on the page (for example, iiiii and wwwww are both 5 characters, but wwwww is much wider).
Therefore, when thinking about how much content will be shown in the SERP snippet, the number of characters found in the title tag or meta description is not important — what really matters is the width they take up (measured in pixels).
In light of this, our crawler now calculates the pixel width of the title tag and meta description, as rendered with the same font family and size as in the Google snippet.
As long as the URL is not flagged with the Title Too Long or Meta Description Too Long issues, you can be confident that your title and description will be displayed in the SERP snippet without being truncated!
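To show why pixel width and character count can disagree, here is a toy estimator. It is only a sketch: the character groups and the 5/10/15 pixel widths are invented for illustration, whereas a real measurement (like the one our crawler performs) depends on the actual font metrics. The pixel limit is left as a parameter because no specific limit is stated here.

```python
# Hypothetical character classes and widths; real values come from font metrics.
NARROW = set("iljtf.,:;'|! ")
WIDE = set("mwMW@")


def estimated_pixel_width(text, narrow=5, normal=10, wide=15):
    """Sum an assumed per-character width for every character in text."""
    total = 0
    for ch in text:
        if ch in NARROW:
            total += narrow
        elif ch in WIDE:
            total += wide
        else:
            total += normal
    return total


def is_too_wide(text, limit_px):
    """True if the estimated width exceeds a caller-supplied pixel limit."""
    return estimated_pixel_width(text) > limit_px


if __name__ == "__main__":
    for sample in ("iiiii", "wwwww"):
        print(sample, len(sample), estimated_pixel_width(sample))
```

With these made-up widths, iiiii and wwwww have the same character count but very different estimated widths, which is exactly the gap a pixel-based check is meant to catch.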
It’s not just the Site Auditor that has been upgraded — these changes have been woven throughout the app.
- Site Explorer: All 40+ new crawled fields are available in the table, filter, and export
- URL X-Ray (images tab): A new Images tab has been added to URL X-Ray, so you can now view the images found on any URL in your site, along with additional optimization data.
- URL X-Ray (content tab): New crawled fields, such as the values of all hreflang links, Open Graph tags, and Twitter Cards found on this page, are now visible
- URL X-Ray (optimization tab): Targeted recommendations and help text are shown for each issue found for this URL
- Dashboard: All new issue types are available in the Onsite Dashboard modules
- Report Builder: All new issue types are available in the Onsite Report Builder modules, including Site Issue Details.
Dozens of smaller updates
We’ve gone through our onsite tools, cleaned things up, and added plenty of small tweaks and optimizations to make the platform a bit easier to use.
For example, we’ve completely redesigned the filter and navigation on Site Explorer to be considerably simpler.
Rolling out soon
The new onsite updates will be rolling out over the next few weeks, so if you don’t see them today, don’t fret! Just check back again soon, or get in touch with our Customer Success team via live chat.