The Archivist’s Nook: Get Paranoid – Data Collection in Libraries

…with your user data?

The following post was authored by Digital Archivist Paul Kelly.

The issue of patron and student privacy has raged across library school classrooms and the profession in general since time immemorial. Indeed, my own MLIS final exam hinged upon presenting a cohesive (ha!) data collection plan for a mid-sized university that balanced the rights of students, the needs of institutions, and various legal requirements. Some librarians, energized by, for example, the Snowden revelations of 2013 or the fact that the Google education ecosystem tracks student activity with no opt-out clause, have kick-started initiatives to not only increase awareness of privacy issues but also help libraries take concrete steps to combat what could be interpreted as infringements on intellectual freedom. Whether setting up public Tor nodes (a core component of the Library Freedom Project) or using Riseup.net email addresses actually improves privacy is debatable, but one thing is clear – this is a conversation worth having.

Which brings me to the Archives here at CUA. It’s important to note that our mission is not that of a public library, and while we share similar values, there are a few differences in the ways that we operate. Number one – our reading room does not contain public computers, so the Tor issue is borderline-moot. Number two – as an archive, we require researchers to sign in before they handle original documents, so some data collection is inevitable. Finally, the data that we do gather center less on the researcher than on the materials being used for research.

Library Freedom Project

So what data are we collecting, and how are we using them? There are several sides to this! First, a local Microsoft Access database holds the details that researchers enter on our sign-in form (mandatory fields are name, contact number or email, and research area). Second, we use a set of Google spreadsheets to track reference questions (mandatory fields are collection name, date initiated, and date completed). Finally, we use Google Analytics to determine which of our finding aids are most popular in any given month.

Tor Project

What does this allow us to do? Well, we compare information from each of these sources to inform our digitization plan (a plan that continues to evolve based on that growing pool of data). We also examine our usage data to determine which collections should reside on campus, and which can be stored off-site. As a final example, we use our reference data to ask questions about levels of service, and where we can improve in terms of speed and accuracy.

Clearly, the above is an overview, so feel free to email us if you’re interested in more specifics. Double-clearly, we use platforms that have come under fire from privacy advocates. Let’s be realistic, though – in the modern world, avoiding such platforms entirely can be extremely difficult. That said, I like to think that we minimize risk and retain only what we need.

Until next time, be sensible about giving out your personal information, and don’t be afraid to ask questions before (or after) you do. Thanks for reading!
