About

Overview

The Web Curator Tool (WCT) is a tool for managing the selective web harvesting process, and is designed for use in libraries by non-technical users. It is integrated with version 3 of the Heritrix web crawler, which downloads the web material; the technical details of crawling are handled behind the scenes by system administrators.

The WCT supports:

  • Harvest authorisation: getting permission to harvest web material and make it available.
  • Selection, scoping and scheduling: what will be harvested, how, and how often?
  • Description: describing harvests with Dublin Core metadata.
  • Harvesting: downloading the material at the appointed time with the Heritrix web crawler, which can be deployed on multiple machines.
  • Quality review: making sure the harvest worked as expected, and correcting simple harvest errors.
  • Submission: submitting the harvest results to a digital archive.

What it is NOT

  • It is NOT a digital archive or document repository. It is not appropriate for long-term storage; instead, it submits material to an external archive.
  • It is NOT an access tool. It does not provide public access to harvested material (although it does let you review your harvests). Use an access tool such as Wayback or WERA for that.
  • It is NOT a cataloguing system. However, it does allow you to record external catalogue numbers and to describe harvests with Dublin Core metadata.
  • It is NOT a document management system. It does not store all your communications with publishers, but it may initiate these communications and it does record their outcome.

The Web Curator Tool supports a harvesting workflow comprising a series of specialised tasks:

  • selecting an online resource
  • seeking permission to harvest it and make it publicly accessible
  • describing it
  • determining its scope and boundaries
  • scheduling a web harvest or a series of web harvests
  • performing the harvests
  • performing quality review and endorsing or rejecting the harvested material
  • depositing endorsed material in a digital repository or archive.

Most current web archiving activities rely heavily on the technical expertise of the harvest operators. The Web Curator Tool, on the other hand, makes harvesting the responsibility of users and subject experts (rather than engineers and system administrators) by automatically handling the technical details of web harvesting. The tool is designed to operate safely and effectively in an enterprise environment, where technical support staff can maintain it.

History

The National Library of New Zealand has a legal mandate, and a social responsibility, to preserve New Zealand’s social and cultural history, be it in the form of books, newspapers and photographs, or of websites, blogs and videos. An increasing amount of New Zealand’s documentary heritage is available only online. Users find this content valuable and convenient, but its impermanence, lack of clear ownership, and dynamic nature pose significant challenges to any institution that attempts to acquire and preserve it.

The Web Curator Tool was developed to solve these problems by allowing institutions to capture almost any online document, including web pages, websites and web logs, in most current formats, including HTML pages, images, PDF and Word documents, and multimedia content such as audio and video files. These artifacts are handled with all possible care, so that their integrity and authenticity are preserved. The public benefit from the safe, long-term preservation of New Zealand’s online heritage is incalculable. Our online social history, and much of our government and institutional history, can be preserved for future researchers, historians and ordinary New Zealanders. They will be able to look back on our digital documents in the same way that New Zealanders today look back on the printed words left to us by previous generations.

The software was developed as a collaborative project between the National Library of New Zealand and the British Library, conducted under the auspices of the International Internet Preservation Consortium. The Web Curator Tool has been built with support and contributions from professionals at the National Library of New Zealand, the British Library, Sytec Resources Ltd., Oakleigh Consulting, the National Library of Australia, the Library of Congress, and many others.

Project objectives

The project set out to build a tool that:

  • Meets the needs of the National Library of New Zealand
  • Meets the needs of the British Library
  • Is modular and can be extended to meet the needs of IIPC members and other organisations engaging in web harvesting
  • Manages permissions, selection, description, scoping, harvesting and quality review
  • Provides a consistent, managed approach that allows users with limited technical knowledge to easily capture web content for archival purposes

The National Library of New Zealand has used the Web Curator Tool as the basis of its selective web archiving programme since January 2007. It is the primary tool of the web archivists in the Alexander Turnbull Library, who are responsible for it.

The tool is open-source software and is freely available for the benefit of the international web archiving community.