Sunday, March 21, 2010

What is "XML sitemap"?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling,They allow you to specify the importance of each page, the frequency which they should download them and the last time you modified them. or The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.
Sitemaps are particularly beneficial on websites where:
The webmaster can generate a Sitemap containing all accessible URLs on the site and submit it to search engines. Since Google, MSN, Yahoo, and Ask use the same protocol now, having a Sitemap would let the biggest search engines have the updated pages information.

Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results.Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites. Google, MSN and Yahoo announced joint support for the Sitemaps protocol in November 2006. The schema version was changed to "Sitemap 0.90", but no other changes were made.
In April 2007, Ask.com and IBM announced support for Sitemaps. Also, Google, Yahoo, MS announced auto-discovery for sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites.
The Sitemaps protocol is based on ideas from "Crawler-friendly Web Servers".
The Sitemap Protocol format consists of XML tags.Sitemaps can also be just a plain text list of URLs. They can also be compressed in .gz format.
The File format of XML Site Map is shown below in image:


Sitemap Index
The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a 'Sitemap index' file. The maximum Sitemap size of 10 MB or 50,000 URLs means this is necessary for large sites. As the Sitemap needs to be in the same directory as the URLs listed, Sitemap indexes are also useful for websites with multiple subdomains, allowing the Sitemaps of each subdomain to be indexed using the Sitemap index file and robots.txt.
Sitemap limits
Sitemap files have a limit of 50,000 URLs and 10 megabytes per sitemap. Sitemaps can be compressed using gzip, reducing bandwidth consumption. Multiple sitemap files are supported, with a Sitemap index file serving as an entry point for a total of 1000 Sitemaps.
As with all XML files, any data values (including URLs) must use entity escape codes for the characters : ampersand(&), single quote ('), double quote ("), less than (<) and greater than (>).

No comments:

Post a Comment