In a recent NARA memo (NWM 13.2008) a high-ranking NARA official wrote:
The National Archives and Records Administration (NARA) preserved a one-time snapshot of agency public web sites as they existed on or before January 20, 2001, as an archival record in the National Archives of the United States. NARA also conducted a harvest (i.e., capture) of Federal Agency public web sites in 2004 and of Congressional web sites in 2006. See http://www.archives.gov/records-mgmt/policy/web-harvest-snapshot.html.
After considering our other records management program priorities for FY 2008, availability of harvested web content at other "archiving" sites (e.g., www.archive.org), and the resources required for conducting and preserving a government-wide web snapshot, NARA has determined that we will not conduct a web harvest or snapshot at the end of the current Administration.
But depending on the nonprofit Internet Archive to do the archiving of a nation is risky. The last Executive Branch web harvest that NARA conducted preserved 75 million web pages, many of which will be valuable records for historians in the coming decades. The Internet Archive may cease to exist in 10 years, but the archives will only grow more valuable with time.
Not capturing federal web sites now may mean losing millions of web pages authored under the Bush administration when leadership changes in January 2009.
In an inspirational NARA video, they say that "NARA has been a public trust on which our democracy depends—NARA enables people to personally inspect the record of what the government has done."
That may be true, but if the choice for NARA comes down to using resources to capture the glossy PR brochure of the Web or to capture millions of real records of the Administration, I'd vote for the latter. NARA isn't the demon here -- Congress and the Administration, which set funding for NARA, are where the real issue lies.
Unfortunately, it was not noted in the NARA release.
The two Executive Office web harvests or snapshots were very, very limited. At best, they went down 2-3 levels, and did not cover all web information found on Federal websites. My understanding is that these web harvests cost several hundred thousand dollars. Any further harvesting beyond the few levels covered would, in fact, have cost several hundred thousand dollars more.
NARA does, in fact, want agencies to schedule websites as records, and provide appropriate disposition instructions. Those that lack permanent value will be deleted and disposed of as appropriate; those that are of permanent value will be scheduled as such. In fact, if you reviewed the NARA website, I think you would easily find web transfer instructions.
All of this is based on the quaint but efficient and economical concept of scheduling records, and transferring permanent records to NARA. I could go on about the arbitrary capriciousness of the web crawler initiative, but I won't. Suffice it to say it was not well thought out, is not efficient, is not economical, and poorly reflects the Federal web presence.
The 2006 Congressional webcrawl was a funded Congressional mandate. An "earmark." It, too, cost a few hundred thousand dollars. For the National Archives, excluding the Electronic Records Archives system, a few hundred thousand dollars is real money. Just ask the NARA reference unit trying to provide reference services, the custodial units trying to process a few hundred thousand feet of records, the records management unit trying to schedule records, and many other units trying to fulfill a mission with a staff that has decreased considerably, perhaps by as much as 40% over the past 8 years. I hear no one screaming from the rooftops about this.
NARA "errors" are, how do we say, "low hanging fruit." Reach not very high, and "voila!!", there is a rotten one. No muss, no fuss. Don't have to actually review the NARA website for information on what we might be doing about websites, do we? God forbid we actually research the Federal Budget and such.
I think I could prove that 75% of all NARA critics lacked any real basis for their criticism.
Bottom line: yup, a few hundred thousand dollars is real money to us. And yes, a "web crawling" mission is not a priority. Sorry. North Texas has the bucks, has done a good job, and is now affiliated with GPO and NARA. Check it out.
Pretty darn good stuff for government-academic cooperation.
In regard to my previous comment, I have recently been made aware that certain sections are not conducive to civil discussion, particularly a certain "hypothesis." In addition, the tone could also stand a little improvement. Though what is said can still stand on whatever merit remains, I apologize for any incivility. Cheers!
Why do you think the Internet Archive will not be around in 10 years? I work at the Archive and there is long term funding and a track record.
The Internet Archive has done .gov domain crawls for the National Archives. These are deep crawls of those domains bringing back many millions of pages, not just the surface.
There are several libraries talking now about how to continue and grow this tradition. Kris Carpenter, the head of the Web Group at the Internet Archive, would be happy to correspond with anyone who would like to be involved.
We read with interest your postings on this topic.
The National Archives and Records Administration (NARA) has posted background information regarding our web harvest decision at http://www.archives.gov/records-mgmt/memos/nwm13-2008-brief.html. This background document includes links to our guidance products related to web records and describes the decision-making process we went through to arrive at our decision.
Paul M. Wester, Jr.
Director, Modern Records Programs
National Archives and Records Administration