I heard on an older podcast that
- Google spider traffic is significant on Stack Overflow, and
- the sitemap only contains entries for the newest 50,000 pages.
Would it be more efficient, in terms of reducing Google spider traffic, to add every page to the sitemap?
It would allow Google to recognize which pages have not changed since they were last indexed, without fetching those pages individually.
It would take 40+ sitemap files (at 50,000 URLs per file), but the files covering older pages would only need to be regenerated once a month or so (to pick up new comments, etc.).
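Roughly what I have in mind is a sitemap index pointing at ~40 child sitemaps, each capped at the protocol's 50,000-URL limit, with `<lastmod>` telling the crawler which chunks it can skip. A minimal sketch, assuming placeholder values for the domain, URL pattern, and total page count (these are not Stack Overflow's real numbers):

```python
from datetime import date

SITE = "https://stackoverflow.com"
PER_FILE = 50_000        # sitemaps protocol limit of 50,000 URLs per file
TOTAL_PAGES = 2_000_000  # placeholder count; yields 40 sitemap files

def write_sitemap(name, page_ids, lastmod):
    # <lastmod> is what lets the crawler skip pages that haven't changed.
    with open(name, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for pid in page_ids:
            f.write(f"  <url><loc>{SITE}/questions/{pid}</loc>"
                    f"<lastmod>{lastmod.isoformat()}</lastmod></url>\n")
        f.write("</urlset>\n")

# Child sitemaps: the files holding older pages would only be regenerated
# (and get fresh lastmod values) once a month or so; only the newest chunk
# needs to change daily.
names = []
for chunk, start in enumerate(range(1, TOTAL_PAGES + 1, PER_FILE)):
    name = f"sitemap-{chunk:02d}.xml"
    ids = range(start, min(start + PER_FILE, TOTAL_PAGES + 1))
    write_sitemap(name, ids, date.today())
    names.append(name)

# Sitemap index tying the ~40 files together, so robots.txt only needs a
# single Sitemap: line pointing at this index file.
with open("sitemap-index.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for name in names:
        f.write(f"  <sitemap><loc>{SITE}/{name}</loc></sitemap>\n")
    f.write("</sitemapindex>\n")
```

The point of splitting by age is that a chunk whose lastmod never moves is a cheap signal that nothing inside it needs re-crawling, whereas today the spider has to hit each old question to find that out.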