Data and Discovery: Roz at Midwinter


Last Midwinter I wrote my post on the theme of eBooks, as that was the dominant thread running through my sessions. This year I thought I’d do another ‘theme’ issue rather than a regurgitation of the sessions I have attended. This year, for me at least, the theme has been data and discovery. Beginning with Carol Tenopir’s presentation, followed by presentations by Serials Solutions and EBSCO, and then into Top Tech Trends and other sessions, more people are talking about data. The conversation involves several issues.

First, there is the data we collect in our libraries and how that data could be leveraged to improve our services, promote ourselves and help people visualize our value. I had already been thinking about this value issue from the assessment angle after the ACRL paper on the value of academic libraries came out. There is a tension in libraries between protecting the privacy of our users and realizing that data mining has helped commercial vendors create and improve their services. For example, the highlights in a Kindle book are viewable to others who also have that book. If we could determine the parts that readers find important, or even how far into a book a reader gets, what could that tell us about an author or publisher? If we tracked trends in the books that were checked out from year to year, might we be able to see which areas are growing in terms of book circulation? If, for example, we knew that every book we had on cyber war had been checked out every year for the past three, then we could perhaps see that we need more books in that area. But that requires keeping more and deeper circulation data AND having people on staff who can mine that data.
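To make the cyber war example concrete, here is a minimal sketch in Python of the kind of question deeper circulation data could answer. Everything in it is an assumption for illustration: the record layout (subject, book ID, year), the sample holdings and the function name are hypothetical and do not come from any real library system. Note that it needs no patron identifiers at all, which matters given the privacy tension mentioned above.

```python
from collections import defaultdict

# Hypothetical holdings: subject -> IDs of the books we own on it.
holdings = {
    "cyber war": ["cw1", "cw2"],
    "medieval history": ["mh1", "mh2"],
}

# Hypothetical anonymized checkout log: (subject, book_id, year).
checkouts = [
    ("cyber war", "cw1", 2008), ("cyber war", "cw1", 2009), ("cyber war", "cw1", 2010),
    ("cyber war", "cw2", 2008), ("cyber war", "cw2", 2009), ("cyber war", "cw2", 2010),
    ("medieval history", "mh1", 2008), ("medieval history", "mh1", 2010),
    ("medieval history", "mh2", 2009),
]


def subjects_in_steady_demand(checkouts, holdings, years):
    """Return subjects where every held book circulated in each given year."""
    # Collect the set of years in which each individual book was checked out.
    years_seen = defaultdict(set)
    for subject, book_id, year in checkouts:
        years_seen[(subject, book_id)].add(year)

    wanted = set(years)
    steady = []
    for subject, book_ids in holdings.items():
        # A subject qualifies only if it has books and all of them
        # were checked out in every one of the wanted years.
        if book_ids and all(wanted <= years_seen[(subject, b)] for b in book_ids):
            steady.append(subject)
    return sorted(steady)


print(subjects_in_steady_demand(checkouts, holdings, [2008, 2009, 2010]))
```

With this made-up data, only the cyber war shelf meets the test. A real analysis would have to decide how long to keep such records and who on staff maintains them.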

Which brings me to the next data point (pun intended): as we create and purchase more data, we need to create new positions in libraries. One is the data librarian (which many large libraries already have), who can help patrons navigate the datasets, research data and statistical sources that we are increasingly adding to our collections. The other is someone who can mine and interpret the data we have in libraries to help us improve our services: Web site usage, circulation stats, and other data that go far deeper than the superficial statistics most libraries now keep and that our accreditation agencies and ACRL demand. Is ‘presentations to groups,’ my personal pet peeve in the statistical world, really indicative of how good our services are? I don’t think so. But more qualitative data or more data-mining types of information might actually help us demonstrate long-term value to our institution.

A third data point that has been circulating is how to ‘curate’ the data that is produced on our campus by our patrons. This is not my area of expertise but it is an interesting issue going forward as we think toward cloud storage, institutional repositories and the like.

So then there are the ubiquitous ‘metadata’ discussions, and that brings me to my second theme, which is discovery. The abundance of information that confronts our faculty and students as they research is something we have long seen as an issue. It is not particularly efficient to have to go to multiple interfaces, using different search strategies, just to get what you need. This is why Google is so popular. People feel like they are searching everything at once. Search ‘William Shakespeare’ in Google and you get pictures, videos, books, fan pages, everything. What is missing, of course, is the filtering for quality that we know library sources provide. So the search has been on for a while for an application that can ‘Googleize’ library content. Federated searching was the first attempt to do this, but it was slow and relied on connectors going out to various sources and searching AFTER you typed in your terms. The results were lowest-common-denominator searches and lots of time-out errors. The current set of these tools is being called ‘discovery services.’ Serials Solutions (owned by ProQuest) has one called Summon, and EBSCO’s is called EBSCO Discovery Service (EDS). Of course, the big drawback is that Summon doesn’t search your EBSCO content and EDS doesn’t search ProQuest content. That is a BIG drawback (so is the cost), but the demos I saw of both of these give me hope that we may be nearing a new age in discovery where the searches are comprehensive, lightning fast and wickedly useful. Carol Tenopir’s presentation about how we market ourselves as time-savers in the faculty’s search process has really stuck with me. If in seconds you can search our catalog, databases, journal subscriptions, institutional repository, etc. and get back results that you can then narrow with clear facets to make them more relevant, then we do our students and faculty a huge service. But it comes at a cost, and there are no open source competitors on the horizon, because the technology is based on having the metadata pre-indexed, and that would require the big vendors to give you their metadata.

OK – that’s enough for now and it frankly pales in comparison to the news about our library award but I wanted to get it written before going off for usability testing for ERIC and a last run through the exhibits. Keep your fingers crossed for us getting home tomorrow and stay safe and warm!!

2 Comments on ‘Data and Discovery: Roz at Midwinter’

Lynn
Excellent analysis, Roz!
1:01PM, 1/10/2011
Mr. Gunn
Well, this is as good an invitation as I’m likely to get, so let me jump right in and mention Mendeley as a provider of discovery tools as well. In addition to searching the research catalog at http://www.mendeley.com/research-papers/ you can also get personalized recommendations tailored to you based on your reading history. Hopefully I’ll be able to talk much more about this at ALA11.
6:44PM, 1/11/2011