XML Sitemaps - A Beginner's Guide
Here's a step-by-step guide to creating and optimizing XML sitemaps, complete with a best practices checklist. Google and SEO evolve alongside the web. As a result, what is considered best practice is frequently in flux. What was sound advice yesterday may not be so today.
This is especially true for sitemaps, which have been around almost as long as SEO.
The issue is that when everyone and their dog has posted answers in forums, published recommendations on blogs, and amplified opinions on social media, it takes effort to separate the useful information from the misinformation. So, while most of us understand that submitting a sitemap to Google Search Console is essential, you may not understand the complexities of how to do so in a way that drives SEO key performance indicators (KPIs).
Let's clear the air about best practices for sitemaps right now.
What is an XML sitemap?
An XML sitemap is a file that lists all of a website's important pages so that Google can find and crawl them all. It also aids search engines in understanding the structure of your website. You want Google to crawl all of your website's important pages. However, pages without internal links pointing to them can become difficult to find. A sitemap can aid in content discovery.
An XML sitemap is simply a list of your website's URLs.
It serves as a road map for search engines, indicating what content is available and how to access it. An XML sitemap's ability to assist crawlers in faster indexation is especially important for websites that:
- Have a large number of pages and/or a complex website architecture.
- Frequently add new pages.
- Regularly change the content of existing pages.
- Suffer from weak internal linking and orphan pages.
- Lack a strong external link profile.
XML sitemap format
This is the XML sitemap for a one-page site that uses all available tags:
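Following the sitemaps.org protocol, a sketch of such a sitemap might look like this (the domain and values are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only mandatory tag -->
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15T09:30:00+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```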
However, how should an SEO employ each of these tags? Is all of the metadata useful?
The <loc> (location) tag:
This mandatory tag contains the absolute, canonical URL.
It should accurately reflect your site protocol (http or https) as well as whether you have included or excluded www.
This is also where you can implement your hreflang handling for international websites.
By indicating the language and region variants for each URL with the xhtml:link attribute, you reduce page load time, which other implementations of link elements in the <head> or in HTTP headers cannot offer. Yoast has an excellent post on hreflang for those interested in learning more.
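For example, assuming an English page with a German alternate (URLs are placeholders), the hreflang annotations sit inside each <url> entry, and the xhtml namespace must be declared on the <urlset> element:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/page/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/page/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/page/"/>
  </url>
  <!-- the German URL gets its own <url> entry with the same annotations -->
</urlset>
```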
The <lastmod> (last modified) tag:
This tag is optional but highly recommended for communicating the date and time the page was last modified. John Mueller has acknowledged that Google does use the lastmod metadata to determine when a page was last changed and whether it should be recrawled, although Illyes gave contradictory advice in 2015.
The last modified time is especially important for content sites because it helps Google determine that you are the original publisher.
It's also effective for communicating freshness, but make sure to update the modification date only when you've made significant changes.
Trying to fool search engines into thinking your content is new when it isn't may result in a Google penalty.
The <changefreq> (change frequency) tag:
This optional tag indicates to search engines how frequently the content at the URL is expected to change.
However, according to Mueller, "change frequency doesn't really play that much of a role with sitemaps" and that "it is much better to just specify the time stamp directly."
The <priority> tag:
This is an optional tag that tells search engines how important a page is relative to your other URLs, on a scale of 0.0 to 1.0. At best, it was a hint to search engines, and both Mueller and Illyes have stated unequivocally that they ignore it. An XML sitemap is required for your website, but the priority and change frequency metadata are not. Use the lastmod tag correctly and pay close attention to submitting the correct URLs.
Types of sitemaps
There are numerous types of sitemaps. Let's start with the ones you actually require.
XML sitemap index:
XML sitemaps come with a few hard limits:
- A maximum of 50,000 URLs are permitted.
- A maximum uncompressed file size of 50MB.
To save bandwidth on your server, sitemaps can be compressed using gzip (the file name becomes something like sitemap.xml.gz). However, once unzipped, the sitemap cannot exceed either limit. When you reach either limit, you must split your URLs across multiple XML sitemaps.
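As a sketch, compressing a sitemap with GNU gzip looks like this; the sitemap.xml file below is a stand-in created only for the demonstration, and the -k flag (keep the original file) assumes gzip 1.6 or later:

```shell
# Create a stand-in sitemap file for the demonstration
printf '<?xml version="1.0" encoding="UTF-8"?><urlset/>' > sitemap.xml

# Compress it; -k keeps sitemap.xml, -f overwrites any old .gz
gzip -kf sitemap.xml

# The compressed copy sits alongside the original
ls sitemap.xml.gz
```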
These sitemaps are then combined into a single XML sitemap index file, commonly referred to as sitemap-index.xml. In essence, a sitemap for sitemaps. You can also create multiple sitemap index files for extremely large websites that want to take a more granular approach. As an example:
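The sitemap index is itself a small XML file; a sketch with placeholder file names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```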
However, keep in mind that you cannot nest sitemap index files.
To make it easier for search engines to find all of your sitemap files at once, you should:
- Register your sitemap index(es) with Google Search Console and Bing Webmaster Tools.
- In your robots.txt file, specify your sitemap index URL(s). This directs search engines to your sitemap as you invite them to crawl.
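The robots.txt entry is a single Sitemap directive with an absolute URL (the path below is a placeholder); it can appear anywhere in the file, and you can list several:

```
Sitemap: https://www.example.com/sitemap-index.xml
```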
You can also submit sitemaps by pinging them to Google. But be cautious: Google no longer considers hreflang entries in "unverified sitemaps," which Tom Anthony believes are those submitted via the ping URL.
XML image sitemaps:
In modern SEO, images are embedded within page content, so they will be crawled alongside the page URL.
Furthermore, using JSON-LD schema.org/ImageObject markup to call out image properties to search engines is best practice because it provides more attributes than an image XML sitemap. As a result, most websites do not require an XML image sitemap, and incorporating one would be a waste of crawl budget. The only exception is if images are important to your business, such as for a stock photo website or an ecommerce site that gets product page visits from Google Image search.
Understand that images do not have to be on the same domain as your website in order to be included in a sitemap. A CDN can be used as long as it is verified in Search Console.
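A minimal sketch of the schema.org/ImageObject approach, embedded in the page as JSON-LD (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://cdn.example.com/images/blue-widget.jpg",
  "name": "Blue widget, front view",
  "description": "Front view of the blue widget on a white background",
  "license": "https://www.example.com/image-license"
}
```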
XML video sitemaps:
If videos are important to your business, you should submit an XML video sitemap. Otherwise, a video sitemap is unnecessary.
Save your crawl budget for the page where the video is embedded, and ensure that all videos are marked up with JSON-LD as a schema.org/VideoObject.
XML sitemap for Google News:
This sitemap should only be used by sites that are registered with Google News.
Include articles published within the last two days, up to a maximum of 1,000 URLs per sitemap, and update with new articles as soon as they are published. Contrary to popular belief, Google News sitemaps do not support image URLs. To specify your article thumbnail for Google News, Google recommends using schema.org image or og:image.
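A Google News sitemap entry carries the news namespace and a small set of news-specific tags; a sketch with placeholder values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/news/article-slug</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2024-01-15T08:00:00+00:00</news:publication_date>
      <news:title>Placeholder article headline</news:title>
    </news:news>
  </url>
</urlset>
```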
Mobile sitemaps:
Most websites do not require this, because Mueller has confirmed that mobile sitemaps are only for feature phone pages, not for smartphone compatibility. A mobile sitemap will be useless unless you have unique URLs designed specifically for feature phones.
HTML sitemaps:
XML Sitemaps cater to search engine requirements. HTML sitemaps were created to help human users find content. The question then becomes, do you need an HTML sitemap if you have a good user experience and well-crafted internal links?
In Google Analytics, look at the page views for your HTML sitemap. The chances are they're extremely low. If they're not, it's a good indication that your website navigation needs to be improved. HTML sitemaps are commonly found in website footers, obtaining link equity from each and every page of your website.
Consider this: is that the best way to put that link equity to use? Or are you including an HTML sitemap as a nod to best practices for legacy websites? If only a few people use it, and search engines don't require it because you have strong internal linking and an XML sitemap, is there a purpose for that HTML sitemap? No, I would say.
Dynamic XML sitemaps:
Static sitemaps are easy to create with a tool like Screaming Frog. The issue is that your sitemap becomes out of date as soon as you add or remove a page. If you make changes to a page's content, the sitemap will not automatically update the lastmod tag. So, unless you enjoy manually creating and uploading sitemaps for every single change, static sitemaps should be avoided. In contrast, dynamic XML sitemaps are automatically updated by your server to reflect relevant website changes as they occur.
To make a dynamic XML sitemap, follow these steps:
- Request that your developer create a custom script, making sure to provide detailed instructions.
- Make use of a dynamic sitemap generator.
- Install a plugin for your CMS, such as the Yoast SEO plugin for WordPress.
- Modern best practice includes dynamic XML sitemaps and a sitemap index; mobile and HTML sitemaps are not recommended.
- Only use image, video, and Google News sitemaps if improved indexation of these content types is important to your KPIs.
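To illustrate what such a script or plugin does under the hood, here is a minimal Python sketch. The build_sitemap function and the hard-coded page list are hypothetical; a real implementation would pull URLs and modification dates from your CMS database on each request, so the sitemap can never go stale:

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build an XML sitemap string from (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses W3C Datetime format, e.g. "2024-01-15"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

# Stand-in for a CMS query; each pair is (canonical URL, last modified date)
pages = [
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/blog/", "2024-01-10"),
]
print(build_sitemap(pages))
```

A server would return this string with a Content-Type of application/xml instead of printing it.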
XML sitemap indexation optimization
Include only SEO-relevant pages in XML sitemaps. An XML sitemap is a list of pages that you want crawled, which does not have to be every page on your website. When a search spider visits your website, it is given an "allowance" for the number of pages it will crawl. The XML sitemap indicates that you value the included URLs more than those that aren't blocked but aren't in the sitemap. You're using it to tell search engines, "I'd really appreciate it if you could pay special attention to these URLs." Essentially, it allows you to make better use of your crawl budget. By including only SEO-relevant pages, you help search engines crawl your site more intelligently, allowing you to reap the benefits of better indexation. You should not include:
- Non-canonical pages.
- Duplicate pages.
- Paginated pages.
- URLs with parameters or session IDs.
- Site search result pages.
- Reply-to-comment URLs.
- Share-via-email URLs.
- URLs created by filtering that aren't useful for SEO.
- Archived pages.
- Any redirection (3xx), missing page (4xx), or server error (5xx) URLs.
- Pages blocked by robots.txt.
- Pages with a noindex tag.
- Resource pages accessible only via a lead generation form (e.g., white paper PDFs).
I'd like to share an example of page prioritisation from Michael Cottam. Assume you have 1,000 pages on your website, and 475 of them contain SEO-relevant content. In an XML sitemap, you highlight those 475 pages, essentially asking Google to deprioritize indexing the rest. Let's say Google crawls those 475 pages and determines that 175 are "A," 200 are "B+," and 100 are "B" or "B-." That's a high average grade, indicating a high-quality website to which users should be directed.

Compare this to submitting all 1,000 pages via the XML sitemap. Now, Google examines the 1,000 pages you claim are SEO-relevant content and discovers that more than half are "D" or "F" pages. Your average grade is no longer looking so good, which may jeopardise your organic sessions.

But keep in mind that Google will only use your XML sitemap as a guide to what's important on your site; just because a page isn't in your XML sitemap doesn't mean Google won't index it. Overall site quality is an important factor in SEO. To evaluate the quality of your site, use Google Search Console's (GSC) sitemap reporting. Manage your crawl budget by restricting XML sitemap URLs to only SEO-relevant pages and investing time in reducing the number of low-quality pages on your website.
Fully Utilize Sitemap Reporting:
The new Google Search Console's sitemaps section is less data-rich than what was previously available. Its primary function now is to confirm that your sitemap index was successfully submitted.
You can also get a sense of the number of different types of SEO pages that have been "discovered" (aka all URLs found by Google via sitemaps as well as other methods, such as following links) if you have chosen to use descriptive naming conventions rather than numeric ones.
The Index Coverage report in the new GSC is the more valuable area for SEOs in terms of sitemaps.
XML sitemap best practice checklist
Make the time to:
- Incorporate hreflang tags into XML sitemaps.
- Include the <loc> and <lastmod> tags.
- Use gzip to compress sitemap files.
- Make use of a sitemap index file.
- Use image, video, and Google News sitemaps only if indexation of those content types is important to your KPIs.
- Generate XML sitemaps dynamically.
- Ensure that URLs are only included in one sitemap.
- In robots.txt, include the sitemap index URLs.
- Submit your sitemap index to Google Search Console as well as Bing Webmaster Tools.
- Only include SEO-relevant pages in XML sitemaps.
- Correct all errors and warnings.
- Examine trends and different types of valid pages.
- Calculate the indexation rates of submitted pages.
- Address the reasons for exclusion for submitted pages.
Now, go check your own sitemap to ensure you're doing it correctly.
When done correctly, XML sitemaps assist search engines in quickly finding, crawling, and indexing websites. To reap the most benefits from search engines, ensure that your XML sitemap has been properly formatted, compressed, and submitted.
With a sitemap, you no longer have to rely on links alone to get your pages crawled. Search engines will notice new or updated sites and pages faster, bots can crawl pages more intelligently thanks to the meta information sitemaps contain, and you can ensure that search engines find important information about images and videos that crawlers cannot otherwise access.