XML Sitemap URL Extractor
This program lists every URL in a sitemap, including the URLs found in any nested sitemaps.
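As a rough illustration of the idea (not the code behind this tool), the sketch below walks a sitemap and follows any nested sitemaps it finds. It assumes the standard sitemaps.org namespace and detects gzip-compressed sitemaps by their magic bytes. The names extract_urls, download and maybe_gunzip are made up for this example.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri}tag form.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def maybe_gunzip(data):
    """Decompress gzip data (detected by its magic bytes); pass other data through."""
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def download(url):
    """Fetch a sitemap over HTTP and return its bytes, decompressed if needed."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return maybe_gunzip(response.read())


def extract_urls(sitemap_url, fetch=download, seen=None):
    """Return page URLs from a sitemap, following nested sitemaps in an index."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against sitemaps that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    locs = [(loc.text or "").strip() for loc in root.iter(NS + "loc")]
    locs = [loc for loc in locs if loc]

    if root.tag == NS + "sitemapindex":
        # Each <loc> points at another sitemap: recurse into it.
        urls = []
        for nested in locs:
            urls.extend(extract_urls(nested, fetch, seen))
        return urls
    # Otherwise treat it as a <urlset>: each <loc> is a page URL.
    return locs
```

Calling extract_urls("https://example.com/sitemap.xml") (a placeholder address) would return a flat list of page URLs. The fetch parameter is there so the function can be tried against local data without touching the network.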
What is a Google XML Sitemap?
A Google XML Sitemap, or simply a sitemap, is a blueprint of your website contained in a file called sitemap.xml or something similar. It helps search engines find, crawl, and index all of your website's content. Sitemaps also tell search engines which pages on your site are most important.
Why is it called XML Sitemap?
It is sometimes called an XML Sitemap because it is stored in XML format. This is a sample XML entry in a sitemap:
<url>
  <loc>https://aruljohn.com/</loc>
  <lastmod>2024-02-26T19:33:40.000Z</lastmod>
</url>
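To show how an entry like this can be read programmatically, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It wraps the sample entry in a urlset element with the sitemaps.org namespace, which is how entries appear in a complete sitemap file. The function name read_entries and the SAMPLE constant are made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping for the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The sample entry from above, wrapped in a <urlset> root element.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://aruljohn.com/</loc>
    <lastmod>2024-02-26T19:33:40.000Z</lastmod>
  </url>
</urlset>"""


def read_entries(xml_text):
    """Return a (loc, lastmod) tuple for every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        # <lastmod> is optional, so it may come back as None.
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


print(read_entries(SAMPLE))
```

The namespace mapping matters: without it, ElementTree would not match the url and loc elements, because they belong to the sitemaps.org namespace declared on the root element.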
Where can I find the sitemap of a website?
Type the website's domain name and add /robots.txt to the end. If the site declares a sitemap, you will see it listed there. For example, to find the sitemap of slashdot.org, go to https://slashdot.org/robots.txt. You will find more than one sitemap listed there.
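The same lookup can be scripted. The sketch below pulls the Sitemap entries out of the text of a robots.txt file. It assumes the common "Sitemap: <url>" line format, matches the field name case-insensitively, and treats everything after a # as a comment. The function name sitemaps_from_robots and the example.com addresses are made up for illustration and are not the contents of any real robots.txt file.

```python
def sitemaps_from_robots(robots_text):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")  # split at the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap_news.xml  # second sitemap
"""

print(sitemaps_from_robots(EXAMPLE_ROBOTS))
```

Splitting at the first colon only keeps the "https://" part of each URL intact.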
Can you share the Linux commands to get this information?
Yes, you can read our article on how to write Linux commands to extract URLs from sitemaps. It uses curl, grep, gunzip, and other commands.