Thomas at the AWS Carolinas Library Summit
On March 23, 2023, I attended a meeting hosted by Amazon Web Services (AWS) at the Atkins Library at UNC Charlotte (UNCC). AWS has made a deliberate push into higher education over the last few years, and the event was a chance to showcase a couple of library-oriented projects in the region and show off some very cool software.
In an opening talk, Greg Ritter, AWS’s Higher Education Lead, discussed how AWS can help libraries meet four main challenges that AWS hears about from its clients:
- Keeping pace with the growth of digital collections (think: massive amounts of incredibly reliable cloud storage, but also huge machine-learning-based boosts to descriptive metadata for photo and video archives)
- The changing needs of scholars and researchers as digital tools become more available and easier to use (think: Digital Humanities projects and research data curation)
- Quantifying and assessing the Library’s impact on student outcomes (library analytics)
- Transforming library physical spaces (notably including space usage sensors, down to individual seats, and managing the data they generate)
One of the specific resources AWS has created is a set of reference architectures: pre-configured setups for applications like Archivematica, for digital preservation, or Omeka, for digital presentations. These greatly simplify getting started with such platforms.
Speakers from the University of South Carolina (USC) and UNCC talked through several projects they’ve done with AWS. UNCC has been on a journey similar to ours, moving its on-premises servers to the AWS cloud. It is also beginning work on a collection of one million NASCAR photographs, for which it will use AWS software to identify drivers and other individuals.
USC described a project to extract the full text of approximately 10,000 scanned South Carolina probate records documenting the legal transfer of property. The challenge is that these are handwritten, tabular records from the early to mid-19th century. Nineteenth-century cursive is tricky enough to read even before you add faded ink on stained paper, and OCR systems chronically struggle to make sense of tables and forms, so this was impressive to see.
In the afternoon, I attended the technical track, which stepped through sample projects using some of this software. We worked with Rekognition, which “understands” the content of images, and Textract, which recognizes and extracts text from scanned documents. Text extracted with Textract can then be passed to other services for translation or for semantic comprehension (for example, determining that “London” in “London’s Whitechapel district” refers to a place, while in “Jack London’s White Fang” it refers to a person, and “White Fang” to a book).
As very early adopters of AWS servers, we already know the value they have given us. This was an illuminating look at other services we may want to take advantage of.