The University Archives seeks to collect academic, administrative, and social web content relating to Wake Forest University. This collection includes materials published by Wake Forest University, news about Wake Forest University, social media documenting University life, and related scholarship available on the web.
Identifying & Capturing Content
We collect web content to complement our physical holdings, organizing it according to our University Archives Record Groups. We identify creators and assign metadata based on these groups. Our selection criteria are based on our Records Retention Schedule.
To capture the content of wfu.edu and other Wake Forest-related domains, we use Archive-It, a subscription service from the Internet Archive, to harvest and preserve collections of digital content. Once harvested, the content is available and searchable through the Wake Forest University Archives collection in Archive-It as well as through the Wayback Machine. Other tools exist for collecting web content, but we value Archive-It both for its stability as a well-funded service and for its search tool, which allows users to search the Wake Forest University collection.
Social media is often challenging to harvest, especially as new sites and apps emerge, so we may adopt other tools as the web evolves. We work with Archive-It to apply best practices and new software developments to optimize our social media capture strategies. To collect University-related sites more effectively and completely, archivists or librarians may contact site owners to discuss how they can edit their sites to improve crawl results.
How Archive-It Works
Our web collections are harvested, stored, and made accessible through Archive-It, a subscription service from the Internet Archive. The University Archives selects websites, which are then crawled by Heritrix, a web crawler developed by the Internet Archive. The crawler captures entire web domains or individual web pages, taking a snapshot of each page and storing a copy with the Internet Archive. These copies can be accessed through our curated collection in Archive-It as well as through the Wayback Machine.
Frequency of Crawls
Websites are crawled on an annual, semiannual, quarterly, monthly, weekly, daily, or one-time basis. As sites are added to the collection, we assign each one a crawl frequency based on its activity, including its publication schedule and how often its content is updated. We regularly review the crawl status of each seed (a URL designated for crawling) and adjust its frequency as needed.
Scope & Limitations of Crawls
At the collection level, we may limit each crawl by the number of documents, the amount of data, or the duration; we can also choose to crawl PDFs only.
For individual seeds, we adjust settings such as robots.txt exclusions, regular expressions, and SURT (Sort-friendly URI Reordering Transform) rules to capture websites in our collection more completely.
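For readers unfamiliar with SURT rules: a SURT reverses a URL's host name so that a domain and all of its subdomains share a common prefix, which lets crawl scope be expressed as a simple prefix match. The Python sketch below is a simplified illustration under that assumption, not Archive-It's or Heritrix's actual implementation. The function names `to_surt` and `in_scope` are hypothetical, and real SURT canonicalization includes port handling and other URL normalization steps omitted here.

```python
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Return a simplified Heritrix-style SURT form of a URL (hypothetical helper).

    The host labels are reversed and comma-separated, so
    https://www.wfu.edu/about/ becomes https://(edu,wfu,www,)/about/.
    Ports, user info, and fragments are ignored in this sketch.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    # urlsplit lowercases .hostname; reverse the labels and close with a comma.
    reversed_host = ",".join(reversed(parts.hostname.split("."))) + ","
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://({reversed_host}){path}{query}"


def in_scope(url: str, surt_prefix: str) -> bool:
    """Hypothetical scope check: a URL is in scope if its SURT starts with the prefix."""
    return to_surt(url).startswith(surt_prefix)


if __name__ == "__main__":
    prefix = "https://(edu,wfu,"  # wfu.edu plus every subdomain, over https
    for candidate in [
        "https://www.wfu.edu/about/",
        "https://news.wfu.edu/2024/",
        "https://wfu.example.com/",
    ]:
        print(candidate, "->", to_surt(candidate), in_scope(candidate, prefix))
```

The trailing comma in the prefix `https://(edu,wfu,` matters: without it, the prefix would also match unrelated hosts whose reversed form merely begins with `edu,wfu`, such as a hypothetical `wfuexample.edu`.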
Some websites are challenging for Archive-It's crawlers, so the final capture of their pages may be incomplete. In many cases we can capture downloadable media, including audio and video files, and we review site crawls and perform quality assurance to capture media as completely as possible. Media behind a paywall or login, or media that can only be reached through a search query, may not be capturable. At present, Archive-It cannot capture live-streaming audio or video.
Access & Use Policies
Access to the WFU Web Archive is provided via the Archive-It interface.
Copyright & Permissions
Special Collections and Archives does not claim copyright to any materials it collects from the web. In most cases, the rights holder is identified within the captured site. Because this is primarily a University Records collection, the most common rights holder is Wake Forest University.
Anyone with questions, concerns, or takedown requests regarding content in the Wake Forest University Web Archives Collection is encouraged to contact Special Collections and Archives at email@example.com. We evaluate takedown requests on a case-by-case basis. Wake Forest University has the right to remove any website or web content from the Collection at any time.