Planet Code4Lib

Law and Mol / Ed Summers

Law, J. and Mol, A. (1995). Notes on materiality and sociality. The Sociological Review, 43(2):274–294.

While both Law and Mol were mentioned in Nicolini (2012), this article wasn’t referenced by Nicolini. I can’t remember where I ran across it now, which is a bit of a shame because I really enjoyed it. I can see from my BibDesk database that I added it the day before I added Nicolini. So perhaps it turned up in some bibliographic research I was doing when I was putting together the reading list for my independent study.

Law and Mol’s goal in this paper is to describe how materiality and sociality are produced together–but they don’t specifically use the term sociomateriality themselves. It seems that the term sociomateriality has been more prevalent in organizational studies, and was established largely by Orlikowski. I’ve got Orlikowski (2000) on my reading list already. I think I added it because it is where she pivots from Giddens’ structuration theory towards practice theory, and first starts talking about sociomateriality. I can see she references Latour and Law, so perhaps these ideas have their source partly in this work by Law and Mol? The use of the word produced here is also interesting because I was really interested in exploring the idea of coproduction from Jasanoff, where knowledge and technologies are developed together.

At any rate, Law, along with Latour and Callon, helped establish Actor Network Theory, which I took a quick look at in my last post. In the 90s he started moving away from the idea of ANT being a theory as such, and cited his collaborative work with Mol as being one of the reasons for this shift. It appears that this article was their second collaboration, at least in their writing. The first appears to have been just a year earlier in Mol & Law (1994). It’s interesting to identify these moments of intellectual shifting, where one idea gives way to another. Perhaps it’s where the limits of theory are easiest to see.

The style is very sparse and is driven by short case studies or stories that highlight three aspects, or metaphors, of materiality and sociality: semiotics, strategy and patchwork. For an example of this sparse style, which I really like, here’s how they start out:

What is materiality? What is sociality? Perhaps these are two different questions. Perhaps materiality is a matter of solid matter. And sociality has to do with interactive practices. Perhaps, then, sociology departs from matter. Perhaps it ‘departs’ from it in two different senses: perhaps it both rests upon it; and it goes beyond it. To say this would be to hold on to materialism. And to idealism. Together. It would be to hold on to a traffic between the two. An interchange.

Perhaps. But perhaps not. Perhaps materiality and sociality produce themselves together. Perhaps association is not just a matter for social beings, but also one to do with materials. Perhaps, then, when we look at the social, we are also looking at the production of materiality. And when we look at materials, we are witnessing the production of the social. That, at any rate, is a possibility. The possibility that we explore here.

I like the rhythm of the words here, and how it holds and turns the ideas around. The stories (electric cars, pasteurization, Speak & Spell games, World War 2 bunkers, the Long Island parkway) are also really helpful because they ground the philosophical discussion in particulars. They are drawn from other work by Latour, Langdon Winner and Sherry Turkle, which constellates their ideas.

I hadn’t run across the use of semiotics in my reading of practice theory yet…so that was a novel introduction. Perhaps it was trendier in the 90s when this was written. Semiotics basically provides a flattened space, where the social, mind, truth, knowledge and science can be deconstructed or dissolved.

Sherry Turkle’s stories suggest that the dividing line between people and machines is negotiable. And that sometimes it is difficult to draw a line at all. So that what we see is heterogeneous. Think of that heterogeneity. People have dental fillings, spectacles, drugs, heart pacemakers, condoms, alarm clocks, dresses, telephones, shopping bags, money, books, identity cards, bus passes and ball-point pens. And machines have drivers, pilots, users, service-people, designers, victims, onlookers, look-outs, cleaners, bricoleurs, adapters, admirers and abusers.

This flattened space, and the stylistic use of lists, reminds me a lot of Object Oriented Ontology, except that OOO attempts to circumvent anthropomorphism altogether and imagine what it’s like to be a thing. Law and Mol are keeping the baby in the bathwater, as it were. While I think it can be an interesting (and useful) poetical experiment to imagine what it’s like to be a thing, I appreciate this approach where the human is decentered, but still in the picture. I guess this flattened space has most in common with ANT, where objects can have agency as well as humans.

The idea of strategy is another metaphor they introduce. They cite the now classic example of Moses’ bridges over the Long Island Parkway that Winner (1986) used as an example of how artifacts have politics. Basically strategies are relations between people and objects, to achieve particular objectives. Sometimes the strategies shift but the material configurations don’t. Sometimes there are latent strategies that were not planned for, but were enabled. Strategies are embodied in performances, which organize and produce material distinctions.

This feeds into their last metaphor of patchwork, or the ways in which multiple strategies can coexist in a given scene. This idea draws directly on Mol’s idea of multiplicity or multiple forms of materiality. They use an example of a doppler probe being used in three different medical settings to show how the object participates in different material realities. These realities can reinforce each other, and drift apart. Sketching out these different scenarios is what patchwork is all about–and it seems to have a lot in common with practice theory. In fact I wonder if practice and strategy here have a great deal in common.

Interestingly Mol’s idea of bracketing came up in my ethnographic methods class reading just a week ago. I think this idea of patchwork, or multiplicity, could have a lot to offer in looking at web archives, so I’m going to give Mol (2002) a read soon. Another idea that I like, which Law and Mol stress, is the importance of durability. Networks of relations between people and artifacts can last longer than their strategies…which can cause them to be repurposed and improvised upon. They use this example or story:

A few miles outside Utrecht the fields are filled with large blocks of concrete and heavily armoured bunkers. These are part of a line of defence built by slaves for the Nazis during World War Two. The object was to preserve the Thousand Year Reich. Happily, though the concrete blocks remain, they didn’t work. Like the elaborate nineteenth century system of flooding the polders which failed to save the Netherlands from the Nazis in 1940, the blocks didn’t stop the Allied advance in 1945.

The object-networks which were supposed to obstruct allied tanks did not stand in their way. The soldiers are gone, and when the rain drives across the flat Dutch landscape the concrete blocks shelter cows. So the concrete is still there. But it isn’t an element in a Nazi network any more. That network was less durable than some of its concrete elements.

For some reason while reading this paper I got to thinking (again) about the idea to study open source repository systems through the lens of sociomateriality…or now perhaps practice theory. Mark Matienzo suggested to me during a cab ride in NYC this summer that it could be interesting to do a study of the development of the Fedora repository software. I think it could be a useful stepping stone in showing how the concept of “digital preservation” is built up or ontologized. I haven’t read it yet, but I think this could build off some recent work that looks at the role of standards such as OAIS in the digital preservation community (Bettivia, 2016). I just thought I’d jot down the idea in case I ever need to return to it, or it’s the seed of an idea for someone else to run with. I’m not sure it fits in completely with my idea of studying web archives…but perhaps it could.


Bettivia, R. S. (2016). Encoding power: The scripting of archival structures in digital spaces using the Open Archival Information System (OAIS) reference model (PhD thesis). University of Illinois Urbana-Champaign.

Mol, A. (2002). The body multiple: Ontology in medical practice. Duke University Press.

Mol, A., & Law, J. (1994). Regions, networks and fluids: Anaemia and social topology. Social Studies of Science, 24(4), 641–671.

Nicolini, D. (2012). Practice theory, work, and organization: An introduction. Oxford University Press.

Orlikowski, W. J. (2000). Using technology and constituting structures: A practice lens for studying technology in organizations. Organization Science, 11(4), 404–428.

Winner, L. (1986). The whale and the reactor: A search for limits in an age of high technology. University of Chicago Press.

Data and Humanism Shape Library of Congress Conference / Library of Congress: The Signal

Jer Thorp. Photo by Shawn Miller.

The presentations at the Library of Congress’ Collections As Data conference coalesced into two main themes: 1) digital collections are composed of data that can be acquired,  processed and displayed in countless scientific and creative ways and 2) we should always be aware and respectful that data is manipulated by — and derived from — people.

Jane McAuliffe, director of National and International Outreach, welcomed the attendees and live-stream viewers to “the largest repository of recorded knowledge in the world.” McAuliffe said, “It’s not enough anymore to just open the doors of this building and invite people in. We have to open the knowledge itself for people to explore and use.” She introduced the Library of Congress’s new division, National Digital Initiatives — which organized the event — and said, “Today is a perfect example of the work we want them to do, leveraging the Library to bring all of you together, discussing the best practices and lessons learned from your work, thinking through next steps and what we can do even better moving forward.”

Setting the tone for the conference in his opening keynote address, Jer Thorp, of the Office for Creative Research, touched on how data points and data-collection methods have become distanced and disconnected from the humans that the data describes. His examples included a project that ran a sentiment analysis of a high school’s Twitter feeds to find “the saddest high school in New York.” Researchers initially released inaccurate results, assuming the Tweets came from Hunter College High School when they actually came from a single Twitter account located just south of the school. Students themselves pointed out that high schoolers do not use Twitter. Thorp commented on advertisers freely taking personal data from people’s browsing habits and how information gleaned from browser ads and cookies creates a distorted picture of individuals. To prove his point, he paid a group to write a profile of him based on the ads that targeted his browser. Thorp — an exuberant artist, teacher, father and husband — said, “I learned some things about what advertisers believe about me…I am sad and I live alone and play video games.”

He displayed data visualizations: his travel patterns going to and from work, and people around the world Tweeting “good morning,” GIS-mapped to their geolocations (green dots representing people who got up early and red for people who got up late). Shifting to citizen science, he displayed a project that visually correlates weather events with chronic pain sufferers. Every week, volunteers submitted information about their pain, which was mapped to weather data. The project benefited both patients and doctors.

Thorp got away from visualization of solid data by asking the rhetorical question, “How do we present the cold clinical magnitude of data alongside the human story?” He demonstrated the Time of the Game project, where he and his colleagues overlapped many digital photos of people, residing in different places, watching the same World Cup game. They centered the television screen in each photo and aligned the photos so that as they flickered in an animated sequence, the TV screen became a hub around which the images of people changed. The visual effect conveyed viscerally — in ways that words could not — a shared, collective experience.

In another example of citizen science, Mark Bouslog, of Zooniverse, spoke about the power of crowdsourcing and what the Zooniverse site calls “people-powered research.” He showcased Zooniverse’s do-it-yourself Project Builder tool and a few tagging and transcription projects. Zooniverse’s Galaxy Zoo, for example, invites volunteers to classify galaxies by stepping them through simple observations about different galaxies’ features. “The pattern recognition for the initial classifying, humans excel at, while computers yet do not,” said Bouslog. “The results were incredible. There were over a thousand people contributing and it was eventually determined that the crowd-source consensus was as good as classifications shown by professional astronomers showing that we have gotten it right. Some of the volunteers are listed as co-authors.” Penguin Watch invites volunteers to identify penguins, again through carefully guided input and a controlled vocabulary. The interface is clean and colorful and there are competitive games to engage the volunteers. To date, 37,723 volunteers have cataloged 4,353,970 images. As for transcription of handwritten letters, Bouslog said of projects such as Shakespeare’s World that project organizers make it as simple and foolproof as possible to ensure success. “We don’t ask volunteers to transcribe an entire page or even transcribe the entire sentence. You simply ask them to transcribe one visual line at a time. That they feel they can do in confidence.”

Kate Zwaard. Photo by Shawn Miller.

Kate Zwaard, of the Library of Congress’ National Digital Initiatives division, addressed the perceived-versus-real tension between innovation and sustainability and how that informs NDI’s work. She talked about Henriette Avram, who helped create MARC standards, and how Avram embodied the complementary skills of computer programming and library science to create a durable cataloging system. Zwaard spoke about the complexity of creating a major collection online, about the number of resources — human, software and hardware — that go into it, and how online collections require complementary skills. She said NDI’s goals are to maximize the benefit of the Library’s digital collections to the American public and to the world; incubate, encourage and promote the digital innovation; and collaborate with other cultural-memory institutions and digital creators. Zwaard said, “What we do have the power of is knowing people, knowing technology and being able to connect folks.”

Elizabeth Lorang, of the University of Nebraska–Lincoln, researched the challenge of thinking about text in visual terms and computationally finding text that was part of image files. Using image-recognition software, they searched for poems by their shapes. Their reasoning was that regular newspaper text, for example, has a predictably boxy shape but poetry, with its staggered lines and generous use of space, has distinctive shapes. By means of Image Analysis for Archival Discovery, or AIDA, Lorang and her team discovered a batch of poems in images from the Chronicling America collection.

Leah Weinryb Grohsgal. Photo by Shawn Miller.

Leah Weinryb Grohsgal of the National Endowment for the Humanities and Deborah Thomas of the Library of Congress also spoke about the Chronicling America project and about the NEH Data Challenge built upon it. Thomas began by profiling Chronicling America. She said, “American historic newspapers are actually archived in state libraries around the country, so using digital technologies we are able to bring this material back together again through the partnerships with the NEH…The data is available for harvesting or reuse outside of the individual interfaces that we provide through the website. So we have digitized page images. We have optical character recognition, which is machine-readable text. We have metadata, which surrounds every page and issue in a standardized METS format for MODS description characteristics, which describes the place and time of that particular issue as well as the newspaper directives. All of this information can be taken out of the site and analyzed in different ways for researchers that don’t involve the actual website.” Grohsgal talked about the results of the Chronicling America Historic Newspaper data challenge, which she wrote about in The Signal in August, and took a closer look at each of the winning entries: Biblical Quotations in U.S. Newspapers, American Lynching, Historical Agricultural News, and Chronicling and Revealing History with Chronicling America.

Nicole Saylor, of the Library of Congress, talked about the American Folklife Center and, among other things, how it engages the public to contribute to AFC’s collections. At Halloween 2014, for example, AFC invited people to share their favorite photos on Flickr with the hashtag #FolklifeHalloween2014. Saylor also spoke about the personal stories acquired by AFC through StoryCorps and the StoryCorps app. Saylor said, “By leveraging or partnering with third-party software platforms, these efforts allow us to focus on preservation and long term access of records while still supporting immediate and dynamic engagement in the community.” Saylor also touched on the subject of bias in metadata, which several other speakers also addressed throughout the day, and how resources such as Traditional Knowledge Labels were enabling communities who have a personal stake in the collections to add their own metadata and reflect their own understanding and viewpoint of the content. This same topic was addressed by another group of scholars a few weeks earlier at the Library of Congress’s American Folklife Collections, Collaborations & Connections Symposium.

Matthew Weber. Photo by Shawn Miller.

Matthew Weber, of Rutgers University, spoke about the Archives Unleashed datathon, which the Library of Congress hosted in June. He said of his collaborative digital humanities work, “I’m a communications scholar and historian and Jimmy (Lin) is a computer scientist and we come from entirely different backgrounds and we speak entirely different languages as academics. And yet together in the same room, we are able to work with data and create meaning out of that data as we collaborate.” Weber talked about the increasing commonality of such collaborations and stressed the need for data laboratories in which scholars can come together and exchange ideas, and how scholars need to be educated about data-processing tools for their research.

Ricardo Punzalan, of the University of Maryland, talked about “virtual reunification” as a strategy to enable dispersed collections to be brought together. He said that in the past, the archival community acknowledged in their finding aids that related elements of the physical collection reside in other institutions but now you can link collections together virtually. He pointed to the Walt Whitman archives as an example of unified resources from various institutions. Punzalan also talked about repatriation, returning things to their cultural owners after the objects have been digitized. This cultural sensitivity echoed Saylor’s mention of Traditional Knowledge Labels and a presentation about repatriation that was presented at the IASA conference at the Library of Congress. In a related repatriation note, the Library of Congress recently donated digitized holdings relating to the culture and history of Afghanistan to cultural and educational institutions in Afghanistan for use in their own digital libraries and online repositories.

Bergis Jules, of UC Riverside, talked about Documenting the Now, which builds free and open-source tools for collecting, analyzing and sharing Twitter data. His work was inspired by the activism and protest that followed the police killing of Michael Brown in Ferguson, Missouri. Jules said that there was more to gathering information about such public events than just archiving Tweets and photos and news stories. He said, “We had the responsibility really not to forget that there are in fact people behind all of this data. We are really interested in how our building of these collections might affect peoples’ lives. It’s also why we are being really transparent with our work, at the same time trying to help build a community of people who also value these ideas….It’s about valuing people enough to care about how we collect and store their data.” Jules also raised the issue of privacy concerns, as Thorp did. “How will our collections of social media data be different than those built by law enforcement or private security firms?” Jules said. “How will the library respond to requests from private security firms and law enforcement for the data?” He said that we need to directly engage with users of social media regarding how collecting this type of data might affect their lives.

Left to right: Abbey Potter, Nikki Saylor, Maciej Ceglowski, Bergis Jules. Photo by Mike Ashenfelder.

Maciej Ceglowski, founder of the social bookmarking site Pinboard, delved even deeper into privacy concerns. “I worry about legitimizing a culture of surveillance,” Ceglowski said. “I am very uneasy when I see social scientists working with Facebook, for example.” He was wary also of finding patterns in data as an end in itself. Ceglowski spoke of the failure of imagination by so-called experts and he encouraged the audience to honor individuality. He recalled how slowly many people embraced Wikipedia. “I saw the (Andrew Mellon Foundation) librarians fail to engage in (Wikipedia in) the early days, a service that they later grew to love, basically because of the lack of trust and openness to an experiment around unstructured tagging,” he said. He acknowledged that collaborating with communities means relinquishing some amount of control, which is frightening and fascinating. He cited how people who would consider themselves technology amateurs actually develop marketable skills just by working and playing on the web. And how communities that form organically through online spaces, such as social bookmarking sites, actually help each other. Ceglowski said, “My dream of the web is for it to feel like the big city, a place where you rub elbows with people that are not like you. A little scary and chaotic and full of many things that you can imagine, and many things where you can’t and also for people to be themselves and for people to create their own spaces and to learn from one another.”

Harriet Green, of the University of Illinois at Urbana-Champaign, spoke about digital scholarship in the Humanities and Social Sciences. Green and her colleagues are developing “Digging Deeper, Reaching Further,” a project empowering users to mine the HathiTrust Digital Library resources. Green said, “We enable researchers to gather the information from the library and work with data sets and analyze and produce new findings. We facilitate the access to textual data.” DDRF’s goal is to train the trainer, to teach librarians fundamental text mining skills and how to work with data. They also train librarians and researchers together. They will eventually launch pilot workshops and take them on the road to major conferences and key geographic areas.

Trevor Muñoz, of the University of Maryland, spoke about involving the community with the archives that are supposed to represent it. He focused on an initiative titled “African American History, Culture and Digital Humanities,” or AADHUM. One of its goals is breaking down barriers between the archive and the community it documents, and engaging the people the archived material comes from. Muñoz said, “We put the data in the center and asked our community all of the ways in which they might wish to respond to it, without presuming that we know particular methods or techniques that we need to communicate outward to them.” Muñoz said that the communities that AADHUM builds will be as much a measure of success as the programming it produces. He sees AADHUM as being a feedback loop that continuously informs and restructures itself.

Marissa Parham, of Amherst College, spoke about the personal element of archives. “When you are thinking about the raw material that constitutes collections, you are often talking about personal things,” Parham said. Archivists too often look at collections, especially digital ones, as merely collections of data and lose sight of the person at the heart of it. She talked about finding photos with no descriptions and about the bias of facial-recognition software technology, and in each case how the person in the photo can become disconnected from the object. The archivist needs to not be invasive and clinical but to do her best to honor the humanity of the person the archives is built on. In speaking about a certain person’s archives, Parham said, “Much of what is at stake in this example comes down to ownership. The idea of personal collection is an exercise in exclusion. I have to be careful in thinking about the archive as her collection. It’s a collection of her stuff…We must have humility about the stewardship.”

Thomas Padilla, of UC Santa Barbara, delivered the ending keynote. Padilla stressed the need for expanding an individual’s capacity to act.  In the case of web archives, that applies to going beyond searching, browsing and reading archived web pages. “If we peel back the layer, to engage the underlying, less visible structures organizing the representation we see on the screen, it becomes possible to ask more questions of the collection as data,” Padilla said. As an example, he cited Ian Milligan’s network visualization of connections among Canadian Political Parties and Interest Groups.

Padilla faulted a lack of administrative support for exploration and experimentation in many institutions and he singled out the Cooper Hewitt museum and the British Library labs for “brave experiments” that generated innovative results. “You may have heard of (the British Library Labs’) pioneering work, automatically extracting a million images from historical text, and making them available via Flickr under a cc-0 license,” Padilla said. “Subsequently, they have encouraged a number of competitions for working with the collections that they release as data. These efforts…provide opportunities to refine the way that we prepare and provide access to collections, and can lead to concrete reciprocal benefits from outside our institutions.” Padilla echoed the same sentiments of many of the day’s speakers, that we need to be aware of who assembled the collections, who made the decisions and how they may have influenced the collections with a subtle – or not so subtle — bias. He called for more transparency and openness around digital collections to help avoid systematic bias — gender, racial, geographical or cultural — and he singled out the Digital Library Federation’s Cultural Assessment Group  as a step in the right direction.

The Library of Congress’s 500-seat Coolidge Auditorium was filled almost to capacity, with visitors from across the United States. The event was live streamed with real-time closed captioning, drawing 200-300 concurrent views at a time for a total of over 6,000 views. Individual videos from the event will be posted online soon.

Viewers Tweeted during the conference from across the United States and from as far away as Lebanon, Finland, Italy, Germany, Norway, Switzerland, France, the United Kingdom, Ireland, Brazil, Venezuela, Canada, Mexico, New Zealand, Australia, Indonesia and India.

The Library of Congress has commissioned a report based on the presentations from this event and the small, half-day workshop that followed the next day. We hope to share that with you soon.

A Cost-Effective Large LOCKSS Box / David Rosenthal

Back in August I wrote A Cost-Effective DIY LOCKSS Box, describing a small, 8-slot LOCKSS box capable of providing about 48TB of raw RAID-6 storage at about $64/TB. Now, the existing boxes in the CLOCKSS Archive's 12-node network are nearing the end of their useful life. We are starting a rolling program to upgrade them with much larger boxes to accommodate future growth in the archive.

Last week the first of the upgraded LOCKSS boxes came on-line. They are 4U systems with 45 slots for 3.5" drives from 45 Drives, the same boxes Backblaze uses. We are leaving 9 slots empty for future upgrades and populating the rest with 36 8TB WD Gold drives, giving about 224TB of raw RAID-6 storage, say a bit over 200TB after file system overhead, etc. We are specifying 64GB of RAM and dual CPUs. This configuration on the 45drives website is about $28K before tax and shipping. Using the cheaper WD Purple drives it would be about $19K.

45drives has recently introduced a cost-reduced version. Configuring this with 45 8TB Purple drives and 32GB RAM would get 280TB for $17K, or about $61/TB. It would be even cheaper with the Seagate 8TB archive drives we are using in the 8-slot box.
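As a back-of-the-envelope check on the dollars-per-terabyte figures above, here is a small sketch. The post doesn't say how the drives are grouped into RAID-6 sets, so only the stated post-RAID capacities and configured prices are used; everything else is an assumption.

```typescript
// Sanity check of the cost-per-terabyte figures quoted above. Capacities
// are the post-RAID-6 numbers stated in the text; prices are the
// configured prices before tax and shipping.
interface BoxConfig {
  name: string;
  raidTB: number;   // usable RAID-6 capacity in TB
  priceUSD: number;
}

const configs: BoxConfig[] = [
  { name: "45-slot box, 36 x 8TB WD Gold",     raidTB: 224, priceUSD: 28000 },
  { name: "45-slot box, 36 x 8TB WD Purple",   raidTB: 224, priceUSD: 19000 },
  { name: "cost-reduced box, 45 x 8TB Purple", raidTB: 280, priceUSD: 17000 },
];

for (const c of configs) {
  console.log(`${c.name}: ~$${Math.round(c.priceUSD / c.raidTB)}/TB`);
}
```

Run against those numbers, the cost-reduced configuration works out to roughly $61/TB, which matches the figure above; the Gold-drive configuration is about twice that per terabyte.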

2016 LITA Forum – Programs, Schedule Available / LITA

Check out the 2016 LITA Forum website now for the preliminary schedule, program descriptions and speakers. You’re sure to find sessions and more sessions that you really want to attend.

Register Now!

Fort Worth, TX
November 17-20, 2016

Participate with your LITA and library technology colleagues for the excellent networking opportunities at the 2016 LITA Forum.

And don’t forget the other conference highlights including the Keynote speakers and the Preconferences.

Keynote Speakers:

Cecily Walker, Vancouver Public Library

It was her frustration with the way that software was designed to meet the needs of highly technical users rather than the general public that led her to user experience, but it was her love of information, intellectual freedom, and commitment to social justice that led her back to librarianship.

Waldo Jaquith, U.S. Open Data

The director of U.S. Open Data, an organization that works with government and the private sector to advance the cause of open data

Tara Robertson, @tararobertson

“I like figuring out how things work, why they break, and how to make them work better. I’m passionate about universal design, accessibility, open source software, intellectual freedom, feminism and Fluevog shoes.”

The Preconference Workshops:

Librarians can code! A “hands-on” computer programming workshop just for librarians

Computer programming has a reputation for being highly technical, unintelligible, and out of reach for the average librarian. Meanwhile, kids as young as 7 years old are learning to code at library programs every day. This workshop will convince you that you can code. It’s not as hard as you think!

Letting the Collections Tell Their Story: Using Tableau for Collection Evaluation

The increased accessibility of data visualization platforms, both in cost and ease of use, has opened a floodgate of possibilities for telling stories. With our data in Tableau Public, available to all, we are encouraging sharing of library collections data, for the purposes of comparing measures and setting benchmarks.

Full Details

Join us in Fort Worth, Texas, at the Omni Fort Worth Hotel located in Downtown Fort Worth, for the 2016 LITA Forum, a three-day education and networking event featuring 2 preconferences, 3 keynote sessions, more than 55 concurrent sessions and 25 poster presentations. It’s the 19th annual gathering of the highly regarded LITA Forum for technology-minded information professionals. Meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Registration is limited in order to preserve the important networking advantages of a smaller conference. Attendees take advantage of the informal Friday evening reception, networking dinners and other social opportunities to get to know colleagues and speakers.

Get the latest information, register and book a hotel room at the 2016 Forum Web site.

We thank our LITA Forum Sponsors:

OCLC, Yewno, EBSCO, BiblioCommons

See you in Fort Worth.



Case studies in RDM capacity acquisition: A new project / HangingTogether

Managing research data is an important aspect of the scholarly process, and for many academic libraries, it’s what comes immediately to mind in regard to new roles and responsibilities they will potentially assume in fulfilling their traditional mission of collecting and sustaining the scholarly record.

In response to the growing importance of research data management (RDM), OCLC Research is pleased to introduce a new project that will look at the choices research institutions make in building or acquiring RDM capacity. Our new study extends earlier work in which we documented significant trends shaping the scholarly record (summarized by the picture above), and considered the implications of an evolving scholarly record for long-term stewardship and accessibility.

As research data figures more prominently within the scholarly record, we have seen the emergence of a variety of services aimed at supporting scholars’ research data management needs. These services range from educating scholars on the benefits – or in the case of compliance, the necessity – of managing research data, to the provision of repositories where data can be stored, preserved, and accessed. From the academic library’s perspective, RDM services fit into a broader portfolio of researcher-focused services that directly engage the researcher and the research process.

The RDM service space is a dynamic one with lots of solutions emerging, including internal capacity development, cooperative arrangements, subject-specific data repositories, and commercial services. Some of these solutions directly involve the library, and others do not involve it at all; some solutions are deployed to meet institutional needs, while others are directed at the needs of individual researchers. RDM is a complicated landscape that has yet to mature.

As the RDM space continues to develop, useful perspective can be gathered from institutional efforts to address RDM needs in a variety of contexts. Our new project will take an in-depth look at the RDM offerings of four research universities: the University of Illinois at Urbana-Champaign, Monash University, the University of Edinburgh, and Wageningen University and Research Centre. The data-gathering process will include a series of interviews with key staff at each institution. The goal is to produce a case study for each institution that details what RDM services are deployed, who is responsible within the institution for the provision of these services, and where the services are sourced. We will pay particular attention to the key decision points underpinning these profiles, in order to understand the thinking that shaped the RDM offering both as it exists today and will evolve in the future. In other words, we will seek to uncover the why behind the what, who, and where.

The results of this study are not intended to offer a comprehensive picture of RDM capacity choices among research institutions, nor to support grand generalizations on optimal RDM strategies. Rather, we aim to provide a detailed look at how four institutions, operating in four different national contexts, are acquiring RDM capacity to meet institutional needs in this area. We are hopeful that readers will see something of their own institutional context in these case studies, and benefit accordingly in thinking about their local RDM offerings. While our perspective in the study will be at the institutional level, we will highlight the role of the academic library in the broader institutional RDM context.

This work will be published as a series of short reports, each dealing with a different aspect of RDM capacity acquisition. The first report (to be released in early 2017) will serve as an introduction to the series, and present a simple framework for understanding the scope and breadth of the RDM service landscape. Subsequent reports will address the nature of the RDM service offerings at each institution (the what); the institutional units responsible for overseeing the provision of these services (the who); and the choices made for sourcing these services internally or externally (the where). In addition to detailing institution-specific choices, we will also highlight points of convergence and divergence among the four institutions in each of these areas.

RDM services are an important, yet still developing area of interest for academic libraries, campus IT units, public and private funders, publishers, and individual scholars. Documenting practical experiences in acquiring RDM capacity supplies helpful “signage and wayfinding” for other institutions as they navigate this space.

Please get in touch with the project team with any questions or comments:

Rebecca Bryant

Brian Lavoie

Constance Malpas


About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. He has worked on projects in many areas, such as digital preservation, cooperative print management, and data-mining of bibliographic resources. He was a co-founder of the working group that developed the PREMIS Data Dictionary for preservation metadata, and served as co-chair of a US National Science Foundation blue-ribbon task force on economically sustainable digital preservation. Brian's academic background is in economics; he has a Ph.D. in agricultural economics. Brian's current research interests include stewardship of the evolving scholarly record, analysis of collective collections, and the system-wide organization of library resources.

What is the Open Fiscal Data Package? / Open Knowledge Foundation


This post looks at the Open Fiscal Data Package – an open standard for publishing fiscal data developed by Open Knowledge International, GIFT and the World Bank. In September of 2016, Mexico became the first country to officially endorse the OFDP, by publishing Federal Budget data  in open formats using OpenSpending tools. OpenSpending is one of Open Knowledge International’s core projects. It is a free and open platform for accessing information on government spending. OpenSpending supports civil society organisations by creating tools and standards so citizens can easily track and analyse public fiscal information globally. 

The Open Fiscal Data Package (formerly Budget Data Package) is a simple specification for publishing fiscal data. The first iteration was developed between 2013 and 2014 in collaboration with multiple partners including the International Budget Partnership (IBP), Omidyar Network, the Global Initiative for Fiscal Transparency (GIFT), the World Bank and others. The 0.3 version of the OFDP was released at the beginning of 2016, featuring a major revision in the structure and approach, establishing the foundation for all future work leading up to a future v1 release of the specification.

The OFDP is part of our work towards “Frictionless Fiscal Data” where users of fiscal information –  from journalists to researchers to policy makers themselves – will be able to access and analyze government data on budgets and expenditures, reducing the time it takes to gather insights and drive positive social change. The Open Fiscal Data Package enables users to generate useful visualizations like the following one in only a few clicks:

Explore the visualization here.

Having a standard specification for fiscal data is essential to being able to scale this work, allowing tool-makers to automate:

  • Aggregations (e.g. how much did we spend on defence in 2014?)
  • Search (e.g. how much money did we give to IBM?)
  • Comparison (e.g. are we spending more or less than the country next door?)

We have drawn on excellent related work from similar initiatives like the International Aid Transparency Initiative (IATI), the Open Contracting Partnership, and others while aiming to keep the specification driven by new and existing tooling as much as possible.  The specification took into account existing tools and platforms, in order to ensure that adaptations are simpler and with less friction.

The Open Fiscal Data Package and its associated tooling was built to reduce the “friction” in accessing and using public fiscal information, making it much easier for governments to publish data and for users of the data, such as journalists, researchers and policy-makers, to access and analyse the information quickly and reliably.

What’s New in version 0.3?

The Open Fiscal Data Package is an extension of Tabular Data Package which itself is an extension of Data Package, an emerging standard for packaging any type of data.  Data Packages are formats for any kind of data based on existing practices for publishing open-source software. We extend this standard for fiscal data through mapping values from transaction line items in the on-disk dataset (in this case, a CSV file) to a conceptual representation of financial amounts, entities (e.g. payee/payor), classifications (e.g. COFOG), or government projects.

Our approach to describing the logical model is based heavily on the terminology and approach of OLAP (Online Analytical Processing). This approach allows answering multi-dimensional analytical queries swiftly. Through a system of community feedback via GitHub issues, we have defined methods of modeling hierarchical budgets, the “direction” of a given transaction, as well as the fiscal periods for specific spending.  In addition, we support both aggregated and transactional datasets, as well as budgets containing  “status” information (e.g. “proposed”, “approved”, “adjusted”, and “executed”).
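To make that mapping concrete, here is an illustrative sketch of what such a descriptor might look like, written as a TypeScript object literal. The field and dimension names below are simplified assumptions for illustration only; the published OFDP specification defines the actual schema.

```typescript
// Illustrative only: field names are assumptions, not the normative OFDP
// schema. The idea is to map CSV columns onto measures (amounts) and
// dimensions (classifications, entities, fiscal periods, phases).
const fiscalDataPackage = {
  name: "example-federal-budget",
  resources: [
    { name: "budget-2016", path: "budget-2016.csv", format: "csv" },
  ],
  model: {
    measures: {
      amount: { source: "approved_amount", currency: "MXN" },
    },
    dimensions: {
      functionalClassification: { fields: ["cofog_code", "cofog_label"] },
      payee: { fields: ["recipient_name"] },
      fiscalPeriod: { fields: ["fiscal_year"] },
      phase: { fields: ["budget_phase"] }, // e.g. proposed, approved, executed
    },
  },
};
```

Because the raw rows stay in plain CSV and the semantics live in the descriptor, the same data can be loaded, aggregated and visualized by any tool that understands the mapping.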


We are committed to developing this standard in concert with developing the tooling to support it. OpenSpending Next, the next version of OpenSpending is currently working natively with the Open Fiscal Data Package.

The Future

Fiscal data comes in many forms, and we have sought to model a large variety of datasets in the simplest terms possible.  In the future, we are looking to support a wider variety of data.

In the next few months, the OpenSpending team will pilot the OFDP specification in a number of countries. The specification and the OpenSpending tools are free and available to use to any interested stakeholder. To find out more, get in touch with us on the discussion forum.

To hear more about the Open Fiscal Data Package and OpenSpending tools, join us for the Google Hangout on October 25th, 4 pm Berlin time. More details can be found here.

Copytalk: Article takedowns webinar archived / District Dispatch

An archived copy of the CopyTalk webinar “SSRN: Another enclosure of the commons” is now available. Originally webcast on October 6th by the Office for Information Technology Policy’s Copyright Education Subcommittee, our presenter was Michael Wolfe, Executive Director of the Authors Alliance. Mike talked about the Authors Alliance’s reaction to the unannounced removal of scholarly articles from the Social Science Research Network (SSRN), after its purchase by Elsevier. Included is a discussion of a set of principles that Authors Alliance members expect when depositing articles in open access repositories like SSRN.

An archived copy of OITP’s recent webinar on article takedowns from the popular open access repository SSRN is now available.

Plan ahead: OITP’s free, one-hour CopyTalk webinars occur on the first Thursday of every month at 2 p.m. Eastern time/11 a.m. Pacific time. Our November 3rd webinar will focus on a new report from Public Knowledge on the systemic bias at the U.S. Copyright Office. Don’t miss it!

The post Copytalk: Article takedowns webinar archived appeared first on District Dispatch.

Jobs in Information Technology: October 19, 2016 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Roosevelt Children’s Academy Charter School, Open Position School Librarian, Roosevelt, NY

Sanibel Public Library, Information Technology Manager, Sanibel, FL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Ars Electronica Highlights 3 / Harvard Library Innovation Lab

I’m sharing more highlights from this year’s Ars Electronica Festival. See parts one and two for even more.



The future of the lab – I saw Ivan Poupyrev talk about the future of labs. He said a whole bunch of interesting things, but the thing that stuck with me most is his advice on staffing a lab. He said something like “Bring in people that are focused on solving a problem. If they have a project you want to support and grow, bring them in and give them space to build that thing.” Totally sold on this idea. Projects often need the stamina and focus of a single person (two people feels good too) to jam it through to success.


Jller by Benjamin Maus and Prokop Bartoníček – A beautiful, relaxing rock-sorting machine. An instrument floats over the top of a bed of rocks looking at each one. After examination, it picks up a rock and moves it to a place in a grid of rocks sorted by geological age.



Parasitic Symbiotic by Ann-Katrin Krenz – A machine that draws on trees. 😃🍃



Running Cola is Africa by Masao Kohmura, Kouji Fujino, and the Computer Technique Group – A classic piece of computer art from 1968: an algorithmic sequence of frames that starts with a running person, routes through a bottle of cola, and ends in the shape of the African continent.

An Intro to SQL for Librarians / LibUX

So Ruth Kitchin Tillman wrote a really nice intro for those of you interested in learning SQL — I pronounce it “sequel” — which, you know, you probably should.

When might it be appropriate for a librarian to learn SQL? … For example, at my place of work we have a MySQL database where we hold metadata before turning it into Fedora objects. I use SQL to extract that metadata into spreadsheets and then run update queries to improve that metadata. A lot of the work done around migrating Archivists’ Toolkit installations to ArchivesSpace involved digging into both databases. And it translates. Understanding SQL may help you better understand an API for a service you’re trying to set up. I’d definitely recommend it for metadata types and digital archives/collections types.

This last bit about better understanding APIs is, I think, more important than not. For those of you who aren’t already deep in the database there is an increasing chance you’re in a content management system working with a facade, like the WP_Query() in WordPress or some other API, that ultimately exists to make it easier for you to look up something in the database.
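For a sense of what the "extract, then update" workflow Tillman describes might look like, here is a hedged sketch. The table and column names (item_metadata, title, rights) are hypothetical, and the mysql2 client is just one way to run the queries from a script; the SQL strings are the part that matters.

```typescript
// Hypothetical example: extract rows missing rights metadata, then fix
// them with an update query. Table and column names are made up for
// illustration; adapt to your own schema.
import mysql from "mysql2/promise";

async function main() {
  const db = await mysql.createConnection({
    host: "localhost",
    user: "repo",
    database: "staging_metadata",
  });

  // Extract: the kind of query you might export to a spreadsheet.
  const [rows] = await db.execute(
    "SELECT id, title, rights FROM item_metadata WHERE rights IS NULL"
  );
  console.log(rows);

  // Update: improve the metadata in place once the correct value is known.
  await db.execute(
    "UPDATE item_metadata SET rights = ? WHERE rights IS NULL",
    ["http://rightsstatements.org/vocab/InC/1.0/"]
  );

  await db.end();
}

main().catch(console.error);
```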

Design your technology to work toward diversity, inclusion, and equity / LibUX

Chris Bourg posted her talk about libraries, technology, and social justice where she makes some really great observations about our role as designers.

So here’s what happened with AirBnB – first there was an experimental study out of Harvard about a year ago showing that renters were less likely to rent to people with black sounding names; then there were several reports of renters cancelling bookings for black guests; only to then rent to white guests for the same time period. … What is interesting is that AirBnB is trying to do something about it, and they are being unusually transparent about it; so we might learn what works and what doesn’t. … What’s really interesting is that they are also working on technical features to try to eliminate instances where hosts claim a room or house is booked when a black renter makes a request; only to then immediately rent for the same time period to a white renter. Here is how they explain it: With the new feature If a host rejects a guest by stating that their space is not available, Airbnb will automatically block the calendar for subsequent reservation requests for that same trip.

She adds: “Would it have been better if they had anticipated the racist behavior enabled by their platform? Sure. But now that they are trying to make corrections, and to use technology to do it, I think there might be a real opportunity for us all to learn how we might leverage technology in combatting discrimination.”

How do we promote an inclusive perspective, and an agenda of equity in and through our tech work?

But again, this just points to the fact that if we want our technology to work towards diversity, inclusion and equity; we have to intervene and design it explicitly to do so.


DRM is Still Coming to the Web and You Should Care More / LibUX

Original photo by Álvaro Serrano

In March, the World Wide Web Consortium (W3C) threw out a nonaggression covenant that would have safeguarded people from some of the legal risk associated with building DRM (digital rights management) into the open web. This means that the charter for the HTML Media Extensions Working Group—which oversees the Encrypted Media Extensions specification—was extended through September 2016.

This was a big deal and no one really seemed to notice.

What are Encrypted Media Extensions?

W3C is the nonprofit body governing the core technical standards for the web, responsible for ensuring that specs like HTML, WCAG (accessibility standards), and RDF are openly developed and implemented. In 2007 their HTML5 specification introduced the <video> element, which has since largely freed the web from the chokehold of third-party plugins—like Flash and Silverlight—except when that media needs to be locked down. Netflix obviously doesn’t want its viewers to right-click and download Jessica Jones the same way people save memes.

And although the capability of browsers improved so much that the cool stuff previously afforded by plug-ins like Flash is now replicable without them, our ability to Right-Click-View-Source pretty much guaranteed the persistence of those pesky, impossible-to-update Java applets. These impede the accessibility of the web and the security of its users. We knew it, the W3C knew it.

So, a few years ago, they requested an alternative:

In February 2012 several W3C members proposed Encrypted Media Extensions (EME) to extend HTMLMediaElement that would replace the need for users to download and install ’plug-ins’ with a standard API (Application Programming Interface) that would automatically discover, select and interact with a third-party’s protected content.

Encrypted Media Extensions work behind the scenes: when the browser recognizes that a video or audio happens to have one or more encrypted parts, it quickly negotiates the license then streams the content like you’d expect—no fuss, no Flash.
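Under the hood, that negotiation is exposed to the page through a small JavaScript API. Here is a rough sketch of the handshake; the key system string (the standardized Clear Key system is used as a stand-in) and the omitted license-server exchange are placeholders, not a recipe for any particular DRM vendor.

```typescript
// Rough sketch of the EME flow from the page's point of view.
const video = document.querySelector("video")!;

const config: MediaKeySystemConfiguration[] = [{
  initDataTypes: ["cenc"],
  videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
}];

// Ask the browser for a key system, then attach the resulting MediaKeys.
navigator
  .requestMediaKeySystemAccess("org.w3.clearkey", config)
  .then((access) => access.createMediaKeys())
  .then((mediaKeys) => video.setMediaKeys(mediaKeys));

// When the browser finds encrypted media it fires "encrypted"; the page
// opens a session and relays the license request to a license server.
video.addEventListener("encrypted", (event) => {
  const session = video.mediaKeys!.createSession();
  session.addEventListener("message", (msg) => {
    // msg.message would be POSTed to the license server here (omitted).
  });
  session.generateRequest(event.initDataType, event.initData!);
});
```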

Folks for and against agree DRM sucks for users

As a W3C member, the Electronic Frontier Foundation has long been involved, intending to “persuade the W3C that supporting DRM is a bad idea for the Web, bad for interoperability, and bad for the organization.”

EFF’s international director Danny O’Brien writes:

The W3C is composed of many thoughtful and experienced engineers and standards writers who know, often through personal experience, how painful digital rights management is to implement, how brittle it is in the face of inevitable attack, how privacy-invasive it can be, how damaging it can be for competition, and how often unpopular it is among consumers. … Our impression is that the dominant reason why the W3C (and Tim Berners-Lee, as its tie-breaking Executive Director) has continued to permit DRM to be considered in the HTML working group is their hope that within the W3C, the worst parts of DRM might be tamed.

Tim Berners-Lee himself said as much in October 2013:

No one likes DRM as a user, wherever it crops up. It is worth thinking, though, about what it is we do not like about existing DRM-based systems, and how we could possibly build a system which will be a more open, fairer one than the actual systems which we see today. If we, the programmers who design and build Web systems, are going to consider something which could be very onerous in many ways, what can we ask in return?

The legal argument against

The controversy, however, is chiefly legal. Opponents like O’Brien claim that since “laws like the US Digital Millennium Copyright Act, Canada’s C-11, New Zealand’s Bill 92A; and accords like the European EUCD, the Central American Free Trade Agreement, and the US-Australian and US-Korean Trade Agreements” make it illegal to tamper with DRM, the open, lawful development of the web essentially cannot continue since, paradoxically, DRM is core to that development.

He goes on:

The W3C can’t fix that [paradox]. Even if its most optimistic goals of limiting the dangers of DRM came to pass—by defining strict sandboxing, say, or carefully cabining off its use in other Web standards—W3C standards could still be used to punish security researchers and attempts at interoperability. You can’t prosecute a researcher for finding a bug in your browser, or threaten someone for using your Web standard in a lawful way you didn’t approve of; but those are the dangers that hang over anyone investigating a DRM implementation.

Thus the Electronic Frontier Foundation proposed in its “Objection to the rechartering of the W3C EME group” a DRM circumvention nonaggression covenant in which W3C members agree not to sue anyone for circumventing the encrypted media specification or in disclosing any vulnerabilities therein.

This proposal was rejected.

You need to be more involved in the conversation

DRM in HTML means that organizations like libraries could possibly, more usefully, incorporate vendor-blocked content into their sites and applications without the need for users to download special software, or potentially even visit the vendors’ sites.

The argument against, however, as presented by Cory Doctorow, suggests that what net gains there could be in the user experience may be at a substantial cost:

Equally significant in the world of open standards is protecting interoperability. The normal course of things in technology is that one company may make a product that interoperates with another company’s products, provided that they don’t violate a patent or engage in some other illegal conduct. But once DRM is in the mix, interoperability is only legal with permission.

It’s an important conversation to have.

To librarians, specifically

Libraries’ predilection toward open source and direct involvement in the development of specs like RDF demonstrate greater stake in these discussions than not. There is an opportunity—albeit a diminishing one—to participate in this conversation, one that librarians on the front-lines of open access look past, unaware.

Client-side Video Tricks for IIIF / Jason Ronallo

I wanted to push out these examples before the IIIF Hague working group meetings and I’m doing that at the 11th hour. This post could use some more editing and refinement of the examples, but I hope it still communicates well enough to see what’s possible with video in the browser.

IIIF solved a lot of the issues with working with large images on the Web. None of the image standards or Web standards were really developed with very high resolution images in mind. There’s no built-in way to request just a portion of an image. Usually you’d have to download the whole image to see it at its highest resolutions. Image tiling works around a limitation of image formats by just downloading the portion of the image that is in the viewport at the desired resolution. IIIF has standardized how to make requests for tiles, and image servers have implemented it. Dealing with high resolution images in this way seems like one of the fundamental issues that IIIF has helped to solve.
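As a reminder of what that standardization looks like in practice, here is a small sketch of how a client builds a IIIF Image API tile request; the server and identifier are placeholders.

```typescript
// IIIF Image API request pattern:
// {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
function tileUrl(x: number, y: number, w: number, h: number, scaledWidth: number): string {
  const base = "https://example.org/iiif/newspaper-page-1"; // placeholder
  return `${base}/${x},${y},${w},${h}/${scaledWidth},/0/default.jpg`;
}

// Just the 1024x1024 region at the top-left corner, scaled to 256px wide.
console.log(tileUrl(0, 0, 1024, 1024, 256));
```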

This differs significantly from the state of video on the web. Video only more recently came to the web. Previously Flash was the predominant way to deliver video within HTML pages. Since there was already so much experience with video and the web before HTML5 video was specified, it was probably a lot clearer what was needed when specifying video and how it ought to be integrated from the beginning. Also video formats provide a lot of the kinds of functionality that were missing from still images. When video came to HTML it included many more features right from the start than images.

As we’re beginning to consider what features we want in a video API for IIIF, I wanted to take a moment to show what’s possible in the browser with native video. I hope this helps us to make choices based on what’s really necessary to be done on the server and what we can decide is a client-side concern.

Crop a video on the spatial dimension (x,y,w,h)

It is possible to crop a video in the browser. There’s no built-in way to do this, but because of how video is integrated into HTML and all the other APIs that are available, cropping can be done. You can see one example below where the image of the running video is snipped and added to a canvas of the desired dimensions. In this case I display both the original video and the canvas version. We do not even need to have the video embedded on the page to play it and copy the images over to the canvas. The full video could have been completely hidden and this still would have worked. While no browser implements it, a spatial media fragment could let a client know what’s desired.

Also, in this case I’m only listening for the timeupdate event on the video and copying over the portion of the video image then. That event only fires a few times a second (depending on the browser), so the cropped video does not display as many frames as it could. I’m sure this could be improved upon with a simple timer or a loop that requests an animation frame.
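
A minimal sketch of the canvas approach might look something like this (the element names, source file, and crop region are made up for illustration, and it uses requestAnimationFrame rather than timeupdate):

<video id="source" src="movie.mp4" autoplay muted></video>
<canvas id="cropped" width="320" height="240"></canvas>
<script>
  var video = document.getElementById('source');
  var canvas = document.getElementById('cropped');
  var ctx = canvas.getContext('2d');

  // Copy a 320x240 region of the video, starting at (100, 50) in the
  // source frame, onto the canvas on every animation frame while playing.
  function draw() {
    if (!video.paused && !video.ended) {
      ctx.drawImage(video, 100, 50, 320, 240, 0, 0, 320, 240);
    }
    requestAnimationFrame(draw);
  }
  video.addEventListener('play', draw);
</script>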

Something similar can be done solely by creating a wrapper div around a video: the div is the desired width and height with overflow hidden, and the video is positioned relative to the div to give the desired crop.
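
For example (the file name and offsets here are placeholders):

<div style="width: 320px; height: 240px; overflow: hidden; position: relative;">
  <video src="movie.mp4" autoplay muted
         style="position: absolute; top: -50px; left: -100px;"></video>
</div>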

This is probably the hardest one of these to accomplish with video, but both of these approaches could probably be refined and developed into something workable.

Truncate a video on the temporal dimension (start,end)

This is easily accomplished with a Media Fragment added to the end of the video URL. In this case the URL ends like this: …,480/default.mp4#t=6,10. The video will begin at the 6 second mark and stop playing at the 10 second mark. Nothing here prevents you from playing the whole video or any part of the video, but what the browser does by default could be good enough in lots of cases. If this needs to be a hard constraint then it ought to be pretty easy to enforce with JavaScript. The user could download the whole video to play it, but any particular player could maintain the constraint on time. What’s nice with video on the web is that the browser can seek to a particular time and doesn’t even need to download the whole video to start playing any moment in the video, since it can make byte-range requests. And the server-side piece can just be a standard web server (Apache, nginx) with some simple configuration. This kind of “seeking” of tiles isn’t possible with images without a smarter server.
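
The markup is just a normal video element with the temporal fragment on the URL (the file name here is a placeholder):

<video src="movie.mp4#t=6,10" controls></video>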

Scale the video on the temporal dimension (play at 1.5x speed)

HTML5 video provides a JavaScript API for manipulating the playback rate. This means that this functionality could be included in any player the user interacts with. There are some limitations on how fast or slow the audio and video can play together, but there’s a larger range for how fast or slow just the images of the video can play. This will also differ based on browser and computer specifications.

This video plays back at 3 times the normal speed:

This video plays back at half the normal speed:
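
Setting the rate is a one-line property change on the media element; something like this (the selector is illustrative):

// Speed a video up or slow it down via the HTML5 media API.
var video = document.querySelector('video');
video.playbackRate = 3.0;  // three times normal speed
video.playbackRate = 0.5;  // half normal speed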

Change the resolution (w,h)

If you need to fit a video within a particular space on the page, a video can easily be scaled up and down on the spatial dimension. While this isn’t always very bandwidth friendly, it is possible to scale a video up and down and even do arbitrary scaling right in the browser. A video can be scaled with or without maintaining its aspect ratio. It just takes some CSS (or applying styles via JavaScript).
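
As a rough sketch, the CSS might look something like this (class names and dimensions are made up):

/* Fit the video into a 400px-wide box while keeping its aspect ratio */
video.scaled { width: 400px; height: auto; }

/* Arbitrary scaling that ignores the aspect ratio */
video.stretched { width: 400px; height: 100px; }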

Rotate the video

I’m not sure what the use case within IIIF is for rotating video, but you can do it rather easily. (I previously posted an example which might be more appropriate for the Hague meeting.)
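
A CSS transform is enough for this; a hedged example (the angle and class name are arbitrary):

/* Rotate the rendered video 90 degrees clockwise */
video.rotated { transform: rotate(90deg); }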

Use CSS and JavaScript safely, OK?


Two of the questions I’ll have about any feature being considered for IIIF A/V APIs are:

  1. What’s the use case?
  2. Can it be done in the browser?

I’m not certain what the use case for some of these transformations of video would be, but would like to be presented with them. But even if there are use cases, what are the reasons why they need to be implemented via the server rather than client-side? Are there feasibility issues that still need to be explored?

I do think that if there are use cases for some of these and the decision is made that they are a client-side concern, I am interested in the ways in which the Presentation API and Web Annotations can support those use cases. How would you let a client know that a particular video ought to be played at 1.2x the default playback rate? Or that the video (for some reason I have yet to understand!) needs to be rotated when it is placed on the canvas? In any case I wonder to what extent making the decision that something is a client concern might affect the Presentation API.

New Exhibition! Battle on the Ballot: Political Outsiders in US Presidential Elections / DPLA

With Election Day just three weeks away, we are pleased to announce the publication of our newest exhibition, Battle on the Ballot: Political Outsiders in US Presidential Elections. With only twenty-one days until we elect our next president, your inboxes, news feeds, and social media networks are likely abuzz with the minute-to-minute happenings in the world of polls, pundits, and party politics, yet historical perspective is sometimes hard to find. Both candidates—a billionaire businessman and the first woman nominated by a major party—approach the presidency as outsiders, reaching beyond the traditional boundaries of US presidential politics, though each in very different ways.

In Battle on the Ballot, the DPLA curation team digs into the vast collections of our partner institutions to explore the ways in which the 2016 race resonates with the legacies of the outsiders who have come before. The exhibition offers a dynamic definition of outsider and explores the rich history of select individuals, parties, events, and movements that have influenced US presidential elections from the outside—outside Washington politics, outside the two-party system, and outside the traditional conception of who can be an American president.

  • Have Americans elected past presidents with no political experience?
  • What third parties have successfully impacted election outcomes?
  • Who were some of the earliest women to run for president?
  • What happens when parties and politicians organize around fear of outsiders?
  • How have African Americans exercised their political power—as voters and candidates—in presidential elections over the last fifty years?

Explore the answers to these questions and more in the exhibition.

View the Exhibition

Battle on the Ballot: Political Outsiders in US Presidential Elections was curated using materials contributed by institutions across our partner network. In particular, we would like to thank the Digital Library of Georgia, Missouri Hub, North Carolina Digital Heritage Center, and California Digital Library for their assistance in creating this exhibition.  

The image in the featured banner comes from the collection of University of North Carolina at Chapel Hill via North Carolina Digital Heritage Center.

Why Did Institutional Repositories Fail? / David Rosenthal

Richard Poynder has a blog post introducing a PDF that contains a lengthy introduction, which expands on the blog post, and a Q&A with Cliff Lynch on the history and future of Institutional Repositories (IRs). Richard and Cliff agree that IRs have failed to achieve the hopes that were placed in them at their inception at a 1999 meeting in Santa Fe, NM. But they disagree about what those hopes were. Below the fold, some commentary.

Poynder sets out the two competing visions of IRs from the Santa Fe meeting. One was:
The repository model that the organisers of the Santa Fe meeting had very much in mind was the physics preprint server arXiv. ... As a result, the early focus of the initiative was on increasing the speed with which research papers were shared, and it was therefore assumed that the emphasis would be on archiving papers that had yet to be published (i.e. preprints).
The other was:
However, amongst the Santa Fe attendees were a number of open access advocates. They saw OAI-PMH as a way of aggregating content hosted in local - rather than central - archives. And they envisaged that the archived content would be papers that had already been published, rather than preprints. ... In other words, the OA advocates present were committed to the concept of author self-archiving (aka green open access). The objective for them was to encourage universities to create their own repositories and then instruct their researchers to deposit in them copies of all the papers they published in subscription journals. As these repositories would be on the open internet outside any paywall the papers would be freely available to all. And the expectation was that OAI-PMH would allow the content from all these local repositories to be aggregated into a single searchable virtual archive of (eventually) all published research.
Poynder's summary of the state of IRs is hard to dispute:
So while the OA movement may now appear unstoppable there is a growing sense that both the institutional repository and green OA have lost their way. It is not hard to see why. Not only are most researchers unwilling to self-archive their papers, but they remain sceptical about open access per se. Consequently, despite a flood of OA mandates being introduced by funders and institutions, most IRs remain half empty. What content they do contain often consists of no more than the bibliographic details of papers rather than the full text. More strikingly, many of the papers in IRs are imprisoned behind "login walls", which makes them accessible only to members of the host institution (and this is not just because of publisher embargoes). As a result, the percentage of content in IRs that is actually open access is often pretty low. Finally, since effective interoperability remains more aspiration than reality searching repositories is difficult, time-consuming and deeply frustrating.
A small part of this is because OAI-PMH was, as Herbert van de Sompel and Michael Nelson pointed out in Reminiscing About 15 Years of Interoperability Efforts, insufficiently "webby" to be an effective basis for aggregated search across IRs. A larger cause was inadequate investment in IRs:
What has surely also limited what IRs have been able to achieve is that by and large they have been seriously under resourced. This point was graphically made in 2007 by erstwhile repository manager Dorothea Salo. Her conclusion nine years ago was: there is need for a "serious reconsideration of repository missions, goals, and means."
My analysis of the major causes is different, and differs between the advocates of pre- and post-print IRs:
  • The pre-print IR advocates missed the key advantage that subject repositories, as opposed to institutional repositories, have for the user: each is a single open-access portal containing all the pre-prints (and essentially all the papers) of interest to researchers in that subject. The idea that a distributed search portal built on OAI-PMH would emerge to allow IRs to compete with subject repositories demonstrates a lack of understanding of user behavior on the Web.
  • The post-print IR advocates were naive in thinking that a loose federation of librarians with little institutional clout and few resources could disrupt a publishing oligopoly generating many billions of dollars a year on the bottom line. It should have been clear from librarians' experience of "negotiating" subscriptions that, without strong support from university presidents and funding agencies, the power of the publishers was too great.
There was an interesting discussion in the comments to the blog post, to which Poynder responded here.

Is the Framework Elitist? Is ACRL? / Meredith Farkas


Many of you who read my blog already know that I came to librarianship from social work, where I was a child and family psychotherapist. As a therapist, one of our major guiding documents (whether we liked it or not) was the DSM (Diagnostic and Statistical Manual of Mental Disorders). The DSM determined what things were considered “real” mental disorders and what the diagnostic criteria for each disorder were. It’s so important to the mental health fields that we actually had to memorize the majority of it in grad school. Many therapists, psychologists, and psychiatrists disagree with aspects of the DSM or the entirety of the DSM. I personally felt like it pathologized a lot of things that were not pathological (like being a bright, energetic little boy), but I still had to use the DSM to diagnose my clients so that we could bill Medicaid for their treatment. I didn’t let it influence how I looked at or treated my clients though because I didn’t have to. Therapists have different views of mental illness, work with different populations, and provide different types of therapy (solution-focused, cognitive-behavioral, narrative, etc.). There is no consensus in the mental health community that a particular approach to therapy or technique is the best (probably because different approaches work for different people and illnesses), and our individual approaches are guided by a mix of theory, experience, and our own personal biases.

Similarly, we librarians work in all sorts of different contexts with different populations. We have different approaches to teaching information literacy and there is no one agreed-upon approach that people have found is best. Our approaches have probably been influenced by theory, experience, and our own biases, and probably change over time. In that context, where neither we nor our patrons are interchangeable widgets, the idea that any guiding document or vision of information literacy is going to meet everyone’s needs is laughable. Given that diversity, it makes sense to develop a guiding document that is as flexible and as minimally prescriptive as possible, since we’re not billing insurance for our services.

At least that’s my view of things, but it clearly does not jibe with that of Christine Bombaro of Dickinson College, who argues in her Viewpoint article in Reference Services Review that “The Framework is Elitist” (sorry for recommending you read something so long, but it’s an easy skim). Before I get into critiquing the content of the article, I want to say that I was frankly dismayed that an article like this would be published in one of our best instruction-focused peer-reviewed journals. While it is a “Viewpoint” article, it reads like a ranty one-sided blog post; like something that would be published here (I can’t find another one in RSR’s archives that is similar in length or tone). I also find it funny that the article is published in a non-OA publication when so much of the article is about how education regarding the ACRL Framework for Information Literacy for Higher Education (from now on called the Framework) is inaccessible to most librarians who don’t have big professional development budgets.

But let’s get to the meat of it. I think Bombaro would have been well-served by focusing solely on why the Framework was elitist, but her article is a litany of complaints, many of which have nothing to do with elitism. Here are the ones I was able to tease out (there may be more):

  1. The Framework is not consistent with Threshold Concepts theory
  2. Threshold concepts cannot be taught
  3. The Framework is meant to be adapted by the individual institution.
  4. The Framework suggests that disciplinary faculty must be involved in the teaching of information literacy, which is totally unrealistic
  5. The Framework is more about validating librarians as scholars than supporting our work with students
  6. The Framework inspired divisions between “philosopher librarians” and “practical librarians”
  7. The Framework made her feel foolish because it referenced theory she wasn’t familiar with
  8. All of the professional development experiences she’s gone to re: the Framework have been given by people unqualified to be teaching how to use the Framework in instruction and assessment
  9. The Framework both requires us to completely change our instruction programs AND it’s not really so different from the recently-rescinded Information Literacy Competency Standards (from now on called the Standards) anyway (I have no idea how to reconcile these claims, but she seems to make them both)
  10. The concerns of people who valued the ACRL Standards were ignored and not addressed in a meaningful way by the ACRL Board
  11. The Standards were rescinded by the ACRL Board without any call for public feedback from the membership
  12. There has been little in the way of free support for librarians looking for ways to implement the Framework in their libraries

It looks to me like #6 might suggest that some librarians are elitist and #’s 10, 11, and 12 definitely suggest that ACRL is elitist. None of these really suggest that the Framework itself is elitist. In fact, I would suggest that the Framework is the opposite of elitist, recognizing that we don’t all work with exactly the same populations and need to define learning outcomes for our own context.

To me, the ACRL Information Literacy Standards seemed elitist because they proffered a very specific definition of what an information literate person looks like with, basically, a list of diagnostic criteria. To me, a person is successfully information literate if they are and feel they are successful at finding and using information in their lives, and I think that probably looks different depending on the individual. The Standards also viewed information literacy as a mechanical task focused on what we can see, ignoring the changes in thinking and feeling that come with being information literate. Information literacy is about empowerment, and I saw none of that in the Standards. The Framework suggests six basic concepts that are part of being information literate, though it makes clear that there are likely others (and others have created additional “Frames”). The dispositions and knowledge practices, though not totally dissimilar from what we saw in the Standards, embrace a lot of the internal thinking, feeling, and understanding that is not necessarily visible in a student’s work, but is the real meat of becoming information literate. While I did think it was weird to base our guiding document on threshold concepts, I loved how much more realistic and human (and less mechanistic and elitist) the Framework was versus the Standards.

Anyone who has read my blog for a while knows that there are few things I hate more than the creation of false dichotomies; the rhetorical us vs. them tool. We’ve seen that used a lot in politics over the years, but never more than in this election where Trump has positioned himself (somehow) as being the antithesis of the moneyed, liberal, intellectual elite. Bombaro made her dichotomy quite clear — it’s the “philosopher librarians” vs. the “practical librarians.” This was primarily in the context of discussions on the ACRL Framework listserv, but she used it liberally throughout the document, which I found incredibly divisive. The “philosopher librarians” had more advanced degrees, worked at moneyed libraries, had faculty status, got to take sabbaticals to explore philosophy, and loved to talk about theory. The “practical librarians” did not have PhDs, did not have tons of professional development funding, did not have access to sabbaticals, and were focused on their day-to-day work with patrons. While she hedged and said that librarians could fall into either category or both at different times, she described the “philosopher librarians” in very negative terms, even calling their contributions to the listserv “supercilious.”

I remember reading some of those contributions to the listserv myself. There was a group of mostly (or all?) men who were talking about theory a great deal and one in particular who seemed to suggest that to understand the Framework, you basically had to read a gazillion books about learning theory. I call bullshit on that, but I didn’t find most of the contributions superior at all (especially not Bill Badke, who you called out, Christine, and who has been so generous in sharing his work and ideas over so many years); they were just geeking out on theory in ways that I never will. The pedagogical theories that I have read have enriched my teaching hugely. I don’t like to talk about theory that much, nor am I nearly as well-read as a lot of my friends who geek out on theory, but I also don’t feel less-than because theory isn’t as much my jam. I might have found some of their conversations annoying or baffling, but we all have things that we’re really into that we want to talk about all the time. I’m sure my colleagues find my undying love of Bruce Springsteen annoying.

I saw a lot of insecurity in what Christine Bombaro was writing (a feeling I know well) and she came out and said that these conversation made her feel “frankly stupid.” While I agree that one of our major guiding professional documents should be easy for anyone to understand, to call the contributions of a whole group of people “supercilious” and to call the Framework “elitist” because you didn’t understand the dialogue on a listserv seems amazingly anti-intellectual. There are loads of things I don’t know or understand well (a-ha! I knew math was elitist!). And I have lots of friends who are deep into pedagogical theory and are also some of the best and most passionate instruction librarians I know. I may be less into theory, but it doesn’t make me in any way less than. I’ve learned a ton from them and they’ve also learned from me. Diversity in our profession is a good thing.

Another thing that bugged me a bit is that while Bombaro seems to have little regard for the “philosopher librarians” there is one whose opinions she seems to hold in high esteem: Lane Wilkinson. Apparently Lane (who is lovely, but holy moly, he writes a lot of high-minded stuff about philosophy) using terms like “agent-relative” and “reductionist” is not supercilious because he was criticizing the Framework. You can’t have it both ways. I feel like Bombaro would have been far better served leaving out her critique of the Framework not being 100% consistent with Threshold Concepts theory. Maybe I’m crazy, but, to me, theory is not some monolithic immoveable thing. It’s not made of concrete. It’s not incapable of being changed, tweaked, and adapted for different purposes. Maybe this is heretical, but I see nothing wrong with taking the aspects of theories that work for us and molding them to our purposes. Then again, I’m the lunatic who teaches the BEAM Model without the M, so you should probably ignore me. Intellectual impurity!!!

I’m no expert on threshold concepts (for that, talk to my dear friend Amy Hofer), but I do think you can teach to threshold concepts and create learning experiences for students focused on helping them move through a threshold. It appears that Bombaro is conflating teaching with learning and also crossing a threshold. Information literacy, because it is so complex and internal, is not easy to learn. It’s not a formula you can memorize nor facts you can digest, though lord knows librarians have tried to oversimplify aspects of it for decades. One hour of dipping them in the information literacy tea is not going to make them information literate, nor are we capable of making students learn at all (that’s their choice). We play our very small role in the big picture of their learning experience and all we can do is hope we have opened their minds a little more to these things. They may cross that threshold days, weeks, months, or years later, or not at all; all I hope is that I helped them come a little closer to it.

And that’s why making sure that our faculty feel equipped to teach information literacy within their curriculum is so critical. Bombaro might call that elitist (because at most libraries collaborating with faculty is really hard to do), but I call it reality. A daunting reality, but the reality we live in nonetheless. At my library this year, we are working on developing a toolkit to support faculty in teaching information literacy. It will contain videos we’ve created packaged with suggested in-class activities, worksheets, assignments, and lesson plans (for those keeping score, yes, I did this at PSU too). We’re also working on our marketing communication to faculty, which right now is rather hit-or-miss. Some of us have also talked about getting funding to pay faculty to attend research assignment design workshops that we would teach. We want to brand ourselves as partners in pedagogy who can advise them in their teaching of information literacy. This is not easy work by any means, but it’s probably actually more important than teaching students ourselves because students will have so much more contact with their disciplinary instructors.

What I read between the lines (or really, in the lines) of Bombaro’s article was that she was perpetually searching for authority. She wanted ACRL to tell her how to use the Framework in her library. She wanted answers from the ACRL Framework listserv that were authoritative, not musings on the theoretical aspects of the Framework and discussions about how “non-experts” implemented it. She wanted to go to a conference and learn the right way to implement the Framework, not to hear from “people who were largely unqualified” (talk about elitist!). Authority, authority, authority. This reminds me of students who just want to be told whether a source is good or not. They must hate it when we tell them “well, it depends on how you intend to use it.” It’s the same here. Context matters. We are all struggling with this stuff; no one is really an expert at implementing the Framework and whoever pretends to be is probably a charlatan. We can learn from each other, but we can’t expect that anyone is going to have all the answers.

What I agree with Christine Bombaro 100% about is her criticisms of ACRL and the ACRL Board. There were so many people who made it clear that they relied on the Standards for their teaching, assessment work, and accreditation, and so many who argued that the Standards and the Framework could coexist. Whether the latter was true or not (and whether or not people REALLY needed the Standards for accreditation, which I always wondered about), the fact that so many dues-paying ACRL members were so concerned about this should have merited additional work and communication, not just time. There were no calls for comment or open discussions with the ACRL membership before the Standards were rescinded by the Board at the ALA Annual Conference this past summer. The lack of openness and transparency was pretty stunning. While I didn’t care much about them being rescinded because I didn’t feel like it would affect my work, I did think it was a foolish move politically for ACRL.

Given the controversy among the membership around the Framework, the ACRL Board should not have filed the Standards without requiring a comprehensive plan for providing accessible (read: free) professional development around the Framework for its members. Bombaro is not alone in her concerns about the Framework and this has left a huge number of academic librarians feeling alienated from their professional home. The Sandbox was a great idea, but it’s still not here and I’m honestly incredulous that they rescinded the Standards before anything concrete really was provided to ACRL members to support their adoption of the Framework. It’s no surprise that people feel like the ACRL Board is elitist; their concerns were ignored!

I won’t tar all of ACRL with the same brush. ACRL Publishing has been an incredible trailblazer in terms of open access and the Instruction Section has done a brilliant and innovative job of engaging members who can’t attend the twice-yearly conferences. There are probably other groups within the Division that are doing a lot for members who don’t have any or much professional development funding. But ACRL as a whole has not exactly done a lot to support members who do not have big professional development budgets. Reading Bombaro’s article started to make me wonder what value ACRL provides me anymore, now that I do not have professional development funding and am focusing my service work at the state level. I pay my dues each year like a robot, but why? I especially wonder why people like Bombaro, who feel totally ignored by and alienated from ACRL now, should keep paying their dues. I wish Bombaro’s criticism had been more focused on ACRL itself, because it’ll be easy for many to write-off her article as a one-sided rant.

Maybe I’m naive, but I still can’t understand why people feel like everything has to change because of the Framework. I guess it should come as no surprise that I really don’t care whether the Framework or the Standards are the “law of the land” because I’m going to do what is best for my students at my institution. ACRL is not our boss, not an instruction cop, nor an accrediting agency that requires us to follow their standards to the letter. The Framework has influenced my teaching, assessment work, and instructional outreach focus, but neither I, nor my colleagues, felt like we had to totally change the way we did instruction and assessment because of this. That might be because we weren’t in lockstep with the Standards either. Bombaro even admits in the beginning of her article that the Framework has enriched her teaching, so she, like me, has found aspects of it that are useful. Take those and run with them! There is no requirement that you change something if it’s working well for your students. In the end, that is all that matters.

Image summary: A screenshot from this YouTube video on logical fallacies

Islandoracon Workshop Survey / Islandora

Islandoracon is coming up next May in Hamilton, ON. During the conference, we will have a full day of workshops to help you expand your Islandora skills. Workshops will be 90 minutes long and will run in two tracks, so we'll have a total of eight. 

The topics in our survey have been suggested by the Islandora community as potential workshops. Please select your top three choices. We will use the results of this survey to determine the schedule of workshops offered at Islandoracon.

You can fill out the one-question survey here.

More information about Islandoracon here.

A Thorough-as-hell Intro to the Kano Model / LibUX

In the 1980s, Noriaki Kano — a professor and consultant in the field of quality management — observed how customer satisfaction is determined by whether the service is good and if it meets customers’ expectations about what it should be. We can suss out these expectations and plan features that satisfy them (this know-your-users paradigm is central to user-experience design). However, features play off one another, and one that’s poorly implemented negates the benefits of the others.

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Music, or just plug our feed straight into your podcatcher of choice.

MyData 2016 – What we learned about personal data and where to go from here? / Open Knowledge Foundation

This piece is the final installment of a three-part series of posts from MyData 2016 – an international conference that focused on human-centric personal information management. The conference was co-hosted by the Open Knowledge Finland chapter of the Open Knowledge Network. Part 1 looked at what personal data has to do with open data and Part 2 looked at how access to personal data is linked to wider social issues.

The MyData2016 conference came to an end a couple of weeks ago, and we are now even past the International Open Data Conference, but given the discussions that emerged, it is clear this is only the beginning of something bigger. While MyData is still a vague concept, the conference started many processes that might evolve into something tangible. During the conference I met participants who enlightened me about the MyData concept, reminding me that a conference is more than panels and workshops; it is also about the human connection.


As I described in my first blog post in the series, I was keen to understand the connection between MyData and open data. Now, two weeks later and after hours of going over the materials, I still have more questions than answers. Open Data is a techno-legal definition of data; MyData is still less clear. The borders between ‘My Data’, private data, and public data are sometimes blurry and undefined, and there is a need for regulation and open debate about these issues. However, the open data world gives inspiration to the MyData world, and the MyData conference was an excellent opportunity for the two communities to learn from one another and think ahead.

“The borders between ‘My Data’, private data, and public data are sometimes blurry and undefined, and there is a need for regulation and open debate about these issues.”

What is MyData? One of the terms that was thrown around was “The Internet of Me.” At first, this sounded to me like a very millennial description (which brings, for me at least, a bad connotation). Lucie Burgess, from The Digital Catapult, shed a different light on the term. In her view, it means that we put people, not companies or technical terms, at the center of the internet.

To me, it recalled Evgeny Morozov’s concept of ‘Internet-centrism’ – when we give the term ‘The internet’ a life of its own. When we give the internet a life of its own, we sometimes forget that humans are actively creating it, and that other parts of the net are passive, like the data that we provide to companies just by using their services. We forget that the internet is what it is because of us. The ‘Internet of Me’ puts the ordinary citizen at the heart of that beast we call ”the internet”. It is a decentralized shift, the idea that we can control our data, our information.

Lucie on the Internet of Me:


Credit: Pouyan Mohseninia

What does it mean though when it comes to different types of data? Here is an example from one of the main promises in the field of MyData – the health sector. Health data is one of the most delicate data types out there. Having MyData as a way to make data sharing in the health sector safer and more responsible can assist many to unlock the promise of big and small health datasets to make not only services in the field better but also to improve research and human lives.

Health data raises some important questions – Who owns the data in official health registries? What is the line between MyData and public data? There is still a long way to go, but the conference (and the Ultrahack) helped to shape some new thinking about the topic and to look for new use cases.

Here is Antti Tuomi-Nikula, from THL, the Finnish Ministry of Health and Welfare, speaking about the potential of MyData and the questions we still need to answer:


The question of the border between personal and public data is also a concern for governments. In the last decade, many governments at different levels of jurisdiction have been making efforts to improve their services by using data for better policies. However, government personnel, and local government personnel in particular, often do not have the knowledge or capacity to build a better data infrastructure and release public data in an open way. MyData, therefore, looks like a dream solution in this case. I was excited to see how the local municipalities in Finland are already examining and learning about this concept, taking into consideration the challenges it brings.

Here is Jarkko Oksala, CIO of the city of Tampere, the second biggest city in Finland speaking about MyData, and what the open Data community should do in the future:


On the one hand, the MyData concept is about allowing people to take control of their data and make it open to be used when they want. For the open data community, MyData gives us all another opportunity – to learn. Open Data and MyData are frameworks and tools, not ends in themselves. It was good to see how people came to expand their horizons and acquire new tools to achieve some of our other goals.

Ultrahack in action. Credit: Salla Thure

One of the great side events that helped to facilitate this learning was the UltraHack, a three-day hack that tried to put this very vague concept into actual use. Interestingly enough, a lot of the hackathon work involved some open data as well. Open Knowledge Finland is an expert at organizing hackathons, and the vibrant, energetic spirit was there for the whole three days.

These spirits also attracted visitors from Estonia, who crossed the bay and came to learn about hackathons and the different types of data. It was very surprising for me to see that Estonians see Finland as a place to learn from, since I had assumed that because Estonia is known for its progressive e-gov services, it would similarly excel at creating an open data empire. I guess the truth is much more complicated than that, and I was very lucky to learn about the situation there. We were also excited to hold our first Open Knowledge event in Estonia a couple of weeks ago to discuss setting up a group there. This would not have come to life without the meetings we had in Helsinki.

Here is Maarja-Leena Saar speaking about this topic with me:


The Open Knowledge community indeed came to learn. I met School of Data Fellow Vadym Hudyma from Ukraine, who works with The Engine Room on privacy and responsible data. Vadym brought up many important points, like the fact that we should stop treating consent to giving personal data as a binary, and that we need to remember the people behind the data points we gather.



“We discussed what we want to do with our data and the question of privacy and the willingness too of people to share and to create open data from private data.”

I also met members from Open Knowledge chapters in Japan, Switzerland, Sweden, and Germany. They came to share their experiences but also to learn about the different opportunities of MyData. For me, it is always good to catch up with chapters and see their point of view on various topics. Here are some useful insights I got from Walter Palmetshofer from OKF DE, who started thinking about the MyData concept back in 2011. We discussed what we want to do with our data and the question of privacy and the willingness of people to share and to create open data from private data.

More of my conversation with Walter here


All in all, I am grateful for the opportunity I had to go and learn at MyData 2016. It gave me a different perspective on my usual work on open data and open government and allowed me to explore the ‘Internet of Me’. This is, I hope, just the beginning, and I would like to see what other members of the network have to say about this topic.

A big thank you to the members of Open Knowledge Finland and in particular Salla Thure, who hosted me so well and helped me to find my way around the conference. Special thanks also to Jo Barratt, Open Knowledge International’s own audio guru for editing my interviews. Watch this space for his audio blog post from the GODAN summit!

Querying Wikidata to Identify Globally Famous Baseball Players / Ted Lawless

Earlier this year I had the pleasure of attending a lecture by Cesar Hidalgo of MIT's Media Lab. One of the projects Hidalgo discussed was Pantheon. Pantheon is a website and dataset that ranks "globally famous individuals" based on a metric the team created called the Historical Popularity Index (HPI). A key component of HPI is the number of Wikipedia pages an individual has in various languages. For a complete description of the project, see:

Yu, A. Z., et al. (2016). Pantheon 1.0, a manually verified dataset of globally famous biographies. Scientific Data 2:150075.

Since the Pantheon project relies mostly on open data, I wondered if I could apply some of their techniques to look at the historical significance of Major League Baseball players.

Identifying famous baseball players using Wikidata

When the Pantheon team was assembling their data in 2012 - 2013 they considered using Wikidata rather than Wikipedia and Freebase data dumps, but they found, at that time, it wasn't quite ready in terms of the amount of data. A lot has changed since then: data has accumulated in Wikidata and there are various web services for querying it, including a SPARQL endpoint.

Querying for total Wikipedia pages

With Wikidata's SPARQL support, we don't have to parse data dumps from Wikipedia and Freebase to do some initial, Pantheon-inspired exploration. We can write a single SPARQL query to find entities (players) and the number of Wikipedia language pages each has.

Here is the query I used for this exercise.

SELECT ?player ?playerLabel ?brId (COUNT(DISTINCT(?sitelink)) as ?sites)
WHERE {
    ?player wdt:P31 wd:Q5 .
    ?player wdt:P1825 ?brId .
    ?sitelink schema:about ?player .
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }
}
GROUP BY ?player ?playerLabel ?brId

I'm restricting to instances with a Baseball Reference ID (Wikidata property P1825) rather than those with the occupation of baseball player because, when I initially ran this query, I found many non-professional baseball players with the occupation (P106) of baseball player (Q10871364) in Wikidata. This included former U.S. President George H.W. Bush and the actor and comedian Billy Crystal. These people played baseball at one time, which is interesting in a different way, but not in the MLB.

Retrieving the Baseball Reference ID is useful in another way: I can use it to join the knowledge stored in Wikidata with other sources, like Baseball Reference or the Lahman Baseball Database. This is one of the aspects that I find most promising with Wikidata; it can serve as an identifier hub that allows users to join data from many sources, each of which has unique aspects.
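
If you want to run the query yourself outside the web interface, the Wikidata SPARQL endpoint returns JSON over HTTP. Here is a rough, illustrative sketch of doing that from JavaScript (not the code used for this post):

// Run the query above against the Wikidata SPARQL endpoint and print
// each player's label, Baseball Reference ID, and sitelink count.
var endpoint = 'https://query.wikidata.org/sparql';
var query =
  'SELECT ?player ?playerLabel ?brId (COUNT(DISTINCT(?sitelink)) AS ?sites) ' +
  'WHERE { ' +
  '  ?player wdt:P31 wd:Q5 . ' +
  '  ?player wdt:P1825 ?brId . ' +
  '  ?sitelink schema:about ?player . ' +
  '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } ' +
  '} GROUP BY ?player ?playerLabel ?brId';

fetch(endpoint + '?query=' + encodeURIComponent(query), {
  headers: { 'Accept': 'application/sparql-results+json' }
}).then(function (response) {
  return response.json();
}).then(function (data) {
  data.results.bindings.forEach(function (row) {
    console.log(row.playerLabel.value, row.brId.value, row.sites.value);
  });
});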


Using the results from this SPARQL query we are able to rank players by the number of Wikipedia language pages written about them. The top 10 is as follows.

This top 10 list is filled with some of baseball's all time greats, including Babe Ruth at number one, which seems right. But there is at least one surprise, Jim Thorpe coming in at sixth. Thorpe had a remarkable athletic career in multiple sports but only played briefly in the MLB, so he's not often in discussions of baseball's great players.

I've also uploaded a csv file containing players that have 9 or more Wikipedia language pages, which means they are roughly in the top 250 players when ranked by number of language pages.

Digging deeper

Now that we have a list of globally famous baseball players determined by the number of Wikipedia pages in various languages, we can dig a little deeper and try to understand if fame has anything to do with actual performance on the baseball field.

Wins Above Replacement - WAR

Baseball Reference calculates a metric called Wins Above Replacement (WAR). Describing WAR in detail is beyond the scope of this post but, briefly, WAR is a metric that attempts to capture how much better a player is than an average, or replacement, player. If a player has a WAR of 2 for a season, that means his team won 2 more games than they would have had they used a replacement player instead. WAR attempts to cover all facets of the game: hitting, fielding, and base running. In recent years, WAR has begun to receive more attention from baseball media since it tries to capture the complete value of a player rather than a single aspect, like batting average.

WAR can also be a valuable way to rank players over the course of a career. Baseball Reference publishes a list of the top 1,000 players of all time based on career WAR. Here again, Babe Ruth tops the list. But does WAR, or performance on the field, relate at all to global fame?

To investigate this question, I grabbed the top 50 players by career WAR from Baseball Reference. Since WAR is calculated differently for position players and pitchers, I've focused this exercise just on position players.

I merged the career WAR information with the Wikidata information using the Baseball Reference IDs. I then generated a rank for each player based on the number of Wikipedia language pages and career WAR ranking. The full data is available as CSV and inline below.
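
The merge itself is straightforward once both datasets carry the Baseball Reference ID. A hedged sketch of the idea in JavaScript (the field names are made up; this is not the actual code behind the CSV):

// Join Wikidata sitelink counts with Baseball Reference career WAR rows,
// keyed on the shared Baseball Reference ID.
function mergeByBrId(wikidataRows, warRows) {
  var warById = {};
  warRows.forEach(function (row) { warById[row.brId] = row; });

  return wikidataRows
    .filter(function (row) { return warById[row.brId]; })
    .map(function (row) {
      return {
        brId: row.brId,
        name: row.playerLabel,
        sites: Number(row.sites),
        careerWar: warById[row.brId].war
      };
    });
}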

At least a few things stand out in this list.

  • some players, like the legendary Yankees centerfielder Joe DiMaggio, have significantly higher fame scores than WAR scores (5th vs 42nd). This can be because of non-baseball reasons (he served in WWII and was married to Marilyn Monroe) or because of a relatively short but impactful career.

  • other players performed well on the field but aren't as famous. Lou Whitaker and George Davis are both ranked in the top 50 for career WAR but not in the top 2,000 players when ranked by Wikipedia language pages.

  • there are still relatively few players that could be considered "globally famous" when thinking of history as a whole. The Pantheon team set a threshold of 25 language pages when they ran their analysis. At this time, only 12 players would meet that mark.

  • the list seems weighted towards players who have played during the last 10-15 years. We could use the Baseball Reference data to verify that.

To pursue these basic comparisons further, and produce more meaningful results, I would want to take a look at the other data used by the Pantheon team, like Wikipedia page views, and the time period of the careers, to develop an HPI-like metric for baseball players. We could also try to isolate by team, era, etc., or investigate which languages are writing about baseball players and see if we can glean any cultural insight from that.


The main takeaway for me is that using Wikidata and SPARQL, and the methods from the Pantheon project, we can relatively quickly explore global fame for groups of people we are interested in. Using identifiers stored in Wikidata, we can join information from the Wikidata knowledge base with external, domain-specific sets of data that allow us to dig deeper.

UX Means You / Aaron Schmidt

At the end of our 2014 book, Useful, Usable, Desirable: Applying User Experience Design to Your Library, Amanda Etches and I left readers with what we consider to be an important and inspiring message:

“Every decision we make affects how people experience the library. Let’s make sure we’re creating improvements.”

When we wrote this, it resonated with me. But over time, I’ve come to understand just how crucial it is for us to dwell on it.
Selecting materials for purchase? It impacts UX. Choosing replacement carpet for the YA department? It impacts UX. Taping up a sign? Changing the hours of operation? Cleaning the restroom? Waiving or collecting a fine? Creating the structure on your website? Yes, all of these things impact the experience you’re giving library members. And I could go on.

As important as the “decisions” message is, I realized that it could be communicated in a more straightforward, simplified, and effective manner:

All librarians are UX librarians. This means you. I hope you’ll take the role seriously.

Where to begin? Taking into account the wants, needs, and preferences of library members is a good start. If you’re on board with the above, chances are that you’re already doing this. But a lone-wolf UX-minded librarian can make only so much progress in a vacuum. Since everyone impacts the library’s UX, everyone has to be on board.

It is no small task to create an organization that thinks critically about UX and effectively crafts experiences.

Here are some ideas to get you started:

Conduct a library UX assessment, highlighting both what the library is doing well and areas for improvement. This can help open people’s eyes to UX issues, and it will also help you identify some potential initial projects.
Studying and assessing the entire library is a great way to engage the whole organization, but if that seems daunting, consider conducting some usability tests. They are quick and easy to administer and can help you demonstrate that library members often have wants, needs, and preferences that are different from those of ­librarians.

A third idea to get started: go on some Service Safaris. This technique will give everyone practice in analyzing and describing experiences. Having the skills and vocabulary to describe experiences is essential. For more on Service Safaris, see “Stepping out of the Library” (LJ 3/1/12).

Libraries are complex beasts. There’s no magic wand that you can wave for instant UX greatness. It is worth acknowledging that big changes may take time to happen. Staff need to be trained; issues must be studied. There might even be a talent management component. Hiring the right folks or reassigning roles could potentially be valuable.

However, long-term goals are no excuse for studying things to death or for delaying changes via endless committee meetings. You’ll need to make changes—even if they’re small at first—to engage staff and let them know that their efforts are being rewarded with actual impact.

In order to maintain momentum and have a long-term focus, you’ll need a plan. Consider answering this question: “What do we want to do in the next year to improve library UX?” Work together to set goals, and ensure they’re well known throughout the organization. Break down those goals into actionable items, determine who is responsible for doing what, and get to it.

As you carry out your plan throughout the year, be sure to acknowledge milestones and celebrate small victories. This will keep everyone’s UX morale high and encourage people to stick with the long-term plan. Consider having a monthly UX meeting and a weekly “What have we done for UX?” all-staff email.

If your organization is lucky enough to have someone on staff with the title UX Librarian, that’s great. The UX Librarians I know are invaluable guides for their organizations. Having a ringleader to think about the big UX picture and mentor the organization is most definitely a good thing. Still, just because your library doesn’t have a titled UX Librarian doesn’t mean you’re off the hook. Take up the mantle, find some allies, create a cross-departmental UX team, and go for it.

This first appeared in Library Journal’s “The User Experience.”

Looking Back at iCampMO / Islandora

We wrapped up the last Islandora Camp of 2016 last week in Kansas City, MO. 34 Islandorians from across the US and Canada came together to share how they use Islandora and learn more about how to extend it. We covered topics such as the State of Islandora CLAW, how the University of Missouri does Islandora, and how great custom sites like Bowing Down Home are put together. For the full slate of sessions, check out the Camp Schedule.

Kansas City was also a beautiful place to visit during this time of year. A couple of rainy mornings aside, we had beautiful fall weather and a gorgeous location at the Kauffman Conference Center. We also got to sample Kansas City Barbeque, which is certainly an experience. Our camp social at Char Bar involved sharing this:

Which made us glad to have such a scenic area to go for nice long walks in afterwards. Walks that were punctuated with so…

(for the record, the water flows frog -> child on that last one)

In short, iCampMO was a blast. Our sincere thanks to the University of Missouri Kansas City for hosting the event, and to Sandy Rodriguez for taking care of local arrangements. We look forward now to our next Islandora gathering: the second Islandoracon, in Hamilton, ON, Canada.

User Research for Everyone: Conference Notes / Shelley Gullikson

This was a virtual conference from Rosenfeld Media; a full day of sessions all about user research. Have a look at the program to see what a great lineup of speakers there was. Here are the bits that stood out for me.

Erika Hall: Just Enough Research

First off, Erika won me over right away with her first slide:

Slide text: Hello! You will need to imagine the emphatic gesturing.

I found she spoke more about the basic whys and hows of research, rather than how to do “just enough,” but she was so clear and engaging that I really enjoyed it anyway. Selected sound bites:

  • Keep asking research questions, but the answers will keep changing
  • Assumptions are risks
  • Research is fundamentally destabilizing to authority because it challenges the power dynamic; asking questions is threatening
  • Think about how your design decisions might make someone’s job easier. Or harder. (and not just your users, but your colleagues)
  • Focus groups are best used as a source of ideas to research, not research itself
  • 3 steps to conducting an interview: set up, warm up, shut up
  • You want your research to prove you wrong as quickly as possible

Leah Buley: The Right Research Method For Any Problem (And Budget)

Leah nicely set out stages of research and methods and tools that work best for each stage. I didn’t take careful notes because there was a lot of detail (and I can go back and look at the slides when I need to), but here are the broad strokes:

  • What is happening around us?
    • Use methods to gain an understanding of the bigger picture and to frame where the opportunities are (futures research fits in here too – blerg)
  • What do people need?
    • Ethnographic methods fit in nicely here. Journey maps can point out possible concepts or solutions
  • What can we make that will help?
    • User research with prototypes / mockups. New to me was the 5-second test, where you show a screen to a user for 5 seconds, take it away and then ask questions about it. (I’m guessing this assumes that what people remember corresponds with what resonates with them – either good or bad.)
  • Does our solution actually work?
    • Traditional usability testing fits in here, as does analytics.
    • I kind of like how this question is separated from the last, so that you think about testing your concept and then testing your implementation of the concept. I can imagine it being difficult to write testing protocols that keep them separate though, especially as you start iterating the design.
  • What is the impact?
    • Analytics obviously come into play here, but again, it’s important to separate this question about impact from the previous one about the solution just working. Leah brought up Google’s HEART framework: Happiness, Engagement, Adoption, Retention, and Task Success. Each of these is then divided into Goals (what do we want?), Signals (what will tell us this?), and Metrics (how do we measure success?).

Nate Bolt: How to Find and Recruit Amazing Participants for User Research

Recruiting participants is probably my least favourite part of user research, but I’m slowly coming around to the idea that it will always be thus. And that I’m incredibly lucky to be constantly surrounded by my target audience. Nate talked about different recruitment strategies, including just talking to the first person you see. For him, one of the downsides of that was that the person is unlikely to be in your target audience or care about your interface. Talking to the first person I see is how I do most of my recruiting. And it also works really well because they are very likely to be in my target audience and care about my interface. Yay!

One comment of Nate’s stood out most for me: If someone doesn’t like your research findings, they’ll most likely attack your participants before they’ll attack your methods. This is familiar to me: “But did you talk to any grad students?” “Were these all science students?” Nate recommended choosing your recruitment method based on how likely these kinds of objections are to sideline your research; if no one will take your results seriously unless your participants meet a certain profile, then make sure you recruit that profile.

Julie Stanford: Creating a Virtuous Cycle: The Research and Design Feedback Loop

Julie spoke about the pitfalls of research and design being out of balance on a project. She pointed out how a stronger emphasis on research than on design could lead to really bad interfaces (though this seemed to be more the case when you’re testing individual elements of a design rather than the whole). Fixing one thing can always end up breaking something else. Julie suggested two solutions:

  1. Have the same person do both research and design
  2. Follow a 6-step process

Now, I am the person doing both research and design (with help, of course), so I don’t really need the process. But I also know that I’m much stronger on the research side than on the design side, so it’s important to think about pitfalls. A few bits that resonated with me:

  • When evaluating research findings, give each issue a severity rating to keep it in perspective. Keep an eye out for smaller issues that together suggest a larger issue.
  • Always come up with multiple possible solutions to the problem, especially if one solution seems obvious. Go for both small and large fixes and throw in a few out-there ideas.
  • When evaluating possible solutions (or really, anytime), if your team gets in an argument loop, take a sketch break and discuss from there. Making the ideas more concrete can help focus the discussion.

Abby Covert: Making Sense of Research Findings

I adore Abby Covert. Her talk at UXCamp Ottawa in 2014 was a huge highlight of that conference for me. I bought her book immediately afterward and tried to lend it to everyone, saying “youhavetoreadthisitsamazing.” So, I was looking forward to this session.

And it was great. She took the approach that making sense of research findings was essentially the same as making sense of any other mess, and applied her IA process to find clarity. I took a ridiculous amount of notes, but will try to share just the highlights:

  • This seems really obvious, but I’m not sure I actually do it: Think about how your method will get you the answer you’re looking for. What do you want to know? What’s the best way to find that out?
  • Abby doesn’t find transcriptions all that useful. They take so much time to do, and then to go through. She finds it easier to take notes and grab the actual verbatims that are interesting. And she now does her notetaking immediately after every session (rather than stacking the sessions one after another). She does not take notes in the field.
  • Abby takes her notes according to the question that is being asked/answered, rather than just chronologically. Makes analysis easier.
  • When you’re doing quantitative research, write sample findings ahead of time to make sure that you are going to capture all the data necessary to create those findings. Her slide is likely clearer:
    Slide from Abby Covert's talk
  • Think about the UX of your research results. Understand the audience for your results and create a good UX for them. A few things to consider:
    • What do they really need to know about your methodology?
    • What questions are they trying to answer?
    • What objections might they have to the findings? Or the research itself?
  • In closing, Abby summarized her four key points as:
    1. Keep capture separate from interpretation
    2. Plan the way you capture to support what you want to know
    3. Understand your audience for research
    4. Create a taxonomy that supports the way you want your findings to be used

I have quite a few notes on that last point that seemed to make sense at the time, but I think “create a good UX for the audience of your results” covers it sufficiently.

Cindy Alvarez: Infectious Research

Cindy’s theme was that research – like germs – is not inherently lovable; you can’t convince people to love research, so you need to infect them with it. Essentially, you need to find a few hosts and then help them be contagious in order to help your organization be more receptive to research. Kind of a gross analogy, really. But definitely a few gems for people finding it difficult to get any buy-in in their organization:

  • Create opportunities by finding out:
    • What problems do people already complain about?
    • What are the areas no one is touching?
  • Lower people’s resistance to research:
    • Find out who or what they trust (to find a way in)
    • Ask point-blank “What would convince you to change your decision?”
    • Think about how research could make their lives worse
    • “People are more receptive to new ideas when they think it was their idea.” <– there was a tiny bit of backlash on Twitter about this, but a lot of people recognized it as a true thing. I feel like I’m too dumb to lie to or manipulate people; being honest is just easier to keep track of. If I somehow successfully convinced someone that my idea was theirs, probably the next day I’d say something like “hey, thanks for agreeing with my idea!”
  • Help people spread a message by giving them a story to tell.
  • Always give lots of credit to other people. Helping a culture of research spread is not about your own ego.

Final thoughts

It’s been interesting finishing up this post after reading Donna Lanclos’ blog post on the importance of open-ended inquiry, particularly related to UX and ethnography in libraries. This conference was aimed mostly at user researchers in business operations. Erika Hall said that you want your research to prove you wrong as quickly as possible; essentially, you want research to help you solve the right problem quickly so that you can make (more) money. All the presenters were focused on how to do good user research efficiently. Open-ended inquiry isn’t about efficiency. As someone doing user research in academic libraries, I don’t have these same pressures to be efficient with my research. What a privilege! So I now want to go back and think about these notes of mine with Donna’s voice in my head:

So open-ended work without a hard stop is increasingly scarce, and reserved for people and institutions who can engage in it as a luxury (e.g. Macarthur Genius Grant awardees).  But this is to my mind precisely wrong.  Open exploration should not be framed as a luxury, it should be fundamental.

… How do we get institutions to allow space for exploration regardless of results?

Polymer + Firebase Makefile / Alf Eaton, Alf

I’m at the Polymer Summit in London today, and finally writing up a build process for Polymer apps hosted on Firebase that I’m satisfied with enough to recommend.

The problem is this: if you create an app using web components installed with Bower, and import each component using <link rel="import" href="bower_components/component-name/component-name.html">, the bower_components folder will be inside your app folder.

This is fine for development, but when you deploy the app to a remote server (e.g. Firebase), you don’t want all the files in the bower_components folder to be deployed to the public server as part of your app - not only does it take much longer to upload, but there could be security problems with demo files hiding in any of the component folders.

So you need a build process, which builds the public app in a build folder and deploys that folder instead, without the `bower_components` folder.

The Makefile for this build process is quite straightforward:

.PHONY: clean lint build deploy serve

all: build

clean:
    rm -rf build

lint:
    #   node_modules/polylint/bin/polylint.js

build: clean lint
    cp -r app build
    node_modules/vulcanize/bin/vulcanize --inline-scripts --inline-css app/elements.html -o build/elements.html

deploy: build
    firebase deploy --public build

serve:
    firebase serve

Running make build builds the app: first making a copy of the app folder called build, then running vulcanize on app/elements.html (which includes all the <link rel="import" href="…"> imports) to create a single build/elements.html file containing all the imported code. It inlines all the script and style dependencies of those components, so everything is imported from build/elements.html instead of individual files in bower_components.

There are a few more things the build process can/could do: lint the app using polylint (currently disabled as it finds too many errors in third-party components) and lint the JavaScript using standard. It could also run crisper to move all the inline JS and CSS into separate files, but I haven’t done that here.

Running make deploy runs firebase deploy, setting build as the public folder. The “ignore” setting in the firebase.json file (see below) tells it not to upload anything in bower_components, so in the end only a few files are uploaded and deployed.

  "hosting": {
    "public": "app",
    "ignore": [
    "rewrites": [ {
      "source": "**",
      "destination": "/index.html"
    } ]

index.html is untouched, leaving it readable as a manifest and allowing it to be loaded and parsed quickly.

Law on ANT / Ed Summers

Here are some more notes from a reading that followed on from Nicolini (2012): a description of ANT by John Law, who worked closely with Latour and Callon to define it.

Law, J. (2009). The New Blackwell Companion to Social Theory, chapter Actor Network Theory and Material Semiotics, pages 141–158. Wiley-Blackwell, Oxford.

Law is careful to state up front that ANT is not a theory, and is instead a set of descriptive practices. It can be defined in the abstract, but it is best understood by looking at how it works in practice. Perhaps this is where Nicolini got the idea of the toolkit from, which eschews theory, and places ANT firmly in a practitioner space:

As a form, one of several, of material semiotics, it is best understood as a toolkit for telling interesting stories about, and interfering in, those relations. More profoundly, it is a sensibility to the messy practices of relationality and materiality of the world. (p. 141-142)

Law mentions that key to understanding ANT is the idea of translation, which was first introduced by [Michel Serres] and used by Latour and Callon in their early work on ANT. Translation is making two words equivalent, but since no two words are equivalent, translation is about betrayal, or shifting and linking words. He also situates ANT as a scaled-down, or empirical, version of what Foucault calls discourses or epistemes. He compares translation to Deleuze’s idea of nomadic philosophy, and draws a parallel between Deleuze’s assemblage or agencements and ANT. Just as an aside, it’s interesting to think about how this philosophical work involving networks was germinating in France in the 1970s and 1980s, and then we see the Web itself being created in the late 1980s.

Here are some features of Actor Network Theory as originally conceived:

Material Durability: social arrangements delegated into non-bodily physical form tend to hold their shape better than those that simply depend on face-to-face interaction

Strategic Durability: actor network conception of strategy can be understood more broadly to include teleologically ordered patterns of relations indifferent to human intentions

Discursive Durability: discourses define conditions of possibility, making some ways of ordering webs of relations easier and others difficult or impossible

And then here are some features of what Law calls the New Material Semiotic, a more “polytheistic” version of ANT that he groups under the Deleuzian heading Diaspora. Interestingly, he cites Star (1990) as offering one of the earliest critiques of ANT, from a feminist perspective.

Performativity: the social is not constructed, it is enacted or performed, and it’s in these performances that it can be understood and studied.

Multiplicity: a given phenomenon can be understood as a confluence of practices that aren’t different perspectives on the same phenomenon, but are actually different practices that may be coordinated for some duration (Mol, 2002). Aside: it’s kind of interesting that Mol has been one of the points of connection between my independent study on practice theory and my ethnographic methods class this semester.

Fluidity: the ability of objects and practices to mutate, change shape, reconfigure and persist.

Realities and Goods: networks create multiple overlapping ethical realities

Much of this discussion is oriented around the work of Haraway, Mol, Moser and Verran. He ends on this note:

To describe the real is always an ethically charged act. But, and this is the crucial point, the two are only partially connected: goods and reals cannot be reduced to each other. An act of political will can never, by itself, overturn the endless and partially connected webs that enact the real. Deconstruction is not enough. Indeed, it is trivial (Latour, 2004). The conclusion is inescapable: as we write we have a simultaneous responsibility both to the real and to the good. Such is the challenge faced by this diasporic material semiotics. To create and recreate ways of working in and on the real while simultaneously working well in and on the good.


Latour, B. (2004). Why has critique run out of steam? From matters of fact to matters of concern. Critical Inquiry, 30(2), 225–248.

Mol, A. (2002). The body multiple: Ontology in medical practice. Duke University Press.

Nicolini, D. (2012). Practice theory, work, and organization: An introduction. Oxford University Press.

Star, S. L. (1990). Power, technology and the phenomenology of conventions: On being allergic to onions. The Sociological Review, 38(S1), 26–56.

Maybe IDPF and W3C should *compete* in eBook Standards / Eric Hellman

A controversy has been brewing in the world of eBook standards. The International Digital Publishing Forum (IDPF) and the World Wide Web Consortium (W3C) have proposed to combine. At first glance, this seems a sensible thing to do; IDPF's EPUB work leans heavily on W3C's HTML5 standard, and IDPF has been over-achieving with limited infrastructure and resources.

Not everyone I've talked to thinks the combination is a good idea. In the publishing world, there is fear that the giants of the internet who dominate the W3C will not be responsive to the idiosyncratic needs of more traditional publishing businesses. On the other side, there is fear that the work of IDPF and Readium on "Lightweight Content Protection" (a.k.a. Digital Rights Management) will be another step towards "locking down the web". (see the controversy about "Encrypted Media Extensions")

What's more, a peek into the HTML5 development process reveals a complicated history. The HTML5 that we have today derives from a group of developers (the WHATWG) who got sick of the W3C's processes and dependencies and broke away from W3C. Politics above my pay grade occurred and the breakaway effort was folded back into W3C as a "Community Group". So now we have two slightly different versions of HTML: the "standard" HTML5 and WHATWG's HTML "Living Standard". That's also why HTML5 omitted much of W3C's Semantic Web development work such as RDFa.

Amazon (not a member of either IDPF or W3C) is the elephant in the room. They take advantage of IDPF's work in a backhanded way. Instead of supporting the EPUB standard in their Kindle devices, they use proprietary formats under their exclusive control. But they accept EPUB files in their content ingest process and thus extract huge benefit from EPUB standardization. This puts the advancement of EPUB in a difficult position. New features added to EPUB have no effect on the majority of ebook users because Amazon just converts everything to a proprietary format.

Last month, the W3C published its vision for eBook standards, in the form of an innocuously titled "Portable Web Publications Use Cases and Requirements". For whatever reason, this got rather limited notice or comment, considering that it could be the basis for the entire digital book industry. Incredibly, the word "ebook" appears not once in the entire document. "EPUB" appears just once, in the phrase "This document is also available in this non-normative format: ePub". But read the document, and it's clear that "Portable Web Publication" is intended to be the new standard for ebooks. For example, the PWP (can we just pronounce that "puup"?) "must provide the possibility to switch to a paginated view". The PWP (say it, "puup") needs a "default reading order", i.e. a table of contents. And of course the PWP has to support digital rights management: "A PWP should allow for access control and write protections of the resource." Under the oblique requirement that "The distribution of PWPs should conform to the standard processes and expectations of commercial publishing channels." we discover that this means "Alice acquires a PWP through a subscription service and downloads it. When, later on, she decides to unsubscribe from the service, this PWP becomes unavailable to her." So make no mistake, PWP is meant to be EPUB 4 (or maybe ePub4, to use the non-normative capitalization).

There's a lot of unalloyed good stuff there, too. The issues of making web publications work well offline (an essential ingredient for archiving them) are technical, difficult and subtle, and W3C's document does a good job of flushing them out. There's a good start (albeit limited) on archiving issues for web publications. But nowhere in the statement of "use cases and requirements" is there a use case for low cost PWP production or for efficient conversion from other formats, despite the statement that PWPs "should be able to make use of all facilities offered by the [Open Web Platform]".

The proposed merger of IDPF and W3C raises the question: who gets to decide what "the ebook" will become? It's an important question, and the answer eventually has to be open rather than proprietary. If a combined IDPF and W3C can get the support of Amazon in open standards development, then everyone will benefit. But if not, a divergence is inevitable. The publishing industry needs to sustain their business; for that, they need an open standard for content optimized to feed supply chains like Amazon's. I'm not sure that's quite what W3C has in mind.

I think ebooks are more important than just the commercial book publishing industry. The world needs ways to deliver portable content that don't run through the Amazon tollgates. For that we need innovation that's as unconstrained and disruptive as the rest of the internet. The proposed combination of IDPF and W3C needs to be examined for its effects on innovation and competition.

Philip K. Dick's Mr. Robot is one of the stories in Imagination: Stories of Science and Fantasy, January 1953. It is available as an ebook from Project Gutenberg and from GITenberg.
My guess is that Amazon is not going to participate in open ebook standards development. That means that two different standards development efforts are needed. Publishers need a content markup format that plays well with whatever Amazon comes up with. But there also needs to be a way for the industry to innovate and compete with Amazon on ebook UI and features. That's a very different development project, and it needs a group more like WHATWG to nurture it. Maybe the W3C can fold that sort of innovation into its unruly stable of standards efforts.

I worry that by combining with IDPF, the W3C work on portable content will be chained to the supply-chain needs of today's publishing industry, and no one will take up the banner of open innovation for ebooks. But it's also possible that the combined resources of IDPF and W3C will catalyze the development of open alternatives for the ebook of tomorrow.

Is that too much to hope?

ROUND-UP: Fedora Camp NYC Overview / DuraSpace News

Austin, TX  The Fedora Project is set to hold a three-day Fedora Camp in New York City, hosted by Columbia University Libraries at the Butler Library, November 28-30, 2016.  In this post you will find an overview of what will be covered on each day, and details regarding instructors and scheduling.

Telling DSpace Stories at The National Library of Finland with Samu Viita / DuraSpace News

“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview conducted by Carol Minton Morris with Samu Viita includes personal observations that may not represent the opinions and views of the National Library of Finland or the DSpace Project.

What’s your role with DSpace at your organization or institution?

10:21 PM EDT / Ed Summers


Q: Why was there such a large bump in the number of Hillary and Trump followers at 10:21 PM EDT during the second Presidential Debate?

A: I’m not sure, but maybe it has something to do with abortion or marriage equality?

During the last two presidential debates MITH has been collecting Twitter data using the words trump and hillary. Neil Fraistat’s motivation for organizing our Night Against Hate this week was to identify social media accounts for hate groups that are in the Southern Poverty Law Center’s Extremist Files, and then use this information to help see how these groups were using Trump’s own words in their tweets. So it was super to see How Trump Took Hate Groups Mainstream in Mother Jones just yesterday. Sarah Posner and David Neiwert have been on this trail for a while, and their write up is a must read.

We’re still talking about what we want to look for in the debate datasets and how we want to use the spreadsheet that we collaboratively built. Look for another blog post about the Night Against Hate soon.

One limitation of looking for patterns in collected Twitter data is that the Presidential Debates are high volume events. We know that the Twitter streaming API only gives us a portion of all the available tweets when there is a spike in traffic. For example our dataset for the first debate contains 1,303,084 tweets that mentioned “hillary” and the Twitter API let us know that at least 730,512 tweets were not delivered. Very little is known about what kind of sample Twitter provides, and without the full picture it is difficult to draw inferences from the numbers.


But one interesting metric that is available is the number of followers a person has. This number can be particularly interesting to look at over time as it rises and falls. Every tweet that you get from the Twitter API includes a User Object which in turn contains the followers_count for that user at the time that the request was made of the Twitter API. So if you collect data in real time from the Twitter Streaming API you get a moving picture of a user’s followers. What’s also super handy is that in every retweet you get the user object for the retweeter, and also for the user who sent the original tweet.

During the debates you can count on many people retweeting the candidates’ tweets. This means you can get a smooth record of how many followers they have by the minute … or even second. The fact that you’re not getting all the tweets at a given point in time doesn’t really matter.

So I wrote a very simple utility that reads through Twitter data I collected with twarc looking for particular users and collects how many followers they have at particular times and writes it out as a CSV file, for analysis in a spreadsheet:
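Here is roughly what such a utility might look like, as a hypothetical Python sketch rather than the actual script: it assumes line-oriented JSON from twarc (one tweet per line, in the v1.1 API format) and pulls the followers_count from both the tweeting user and, for retweets, the author of the original tweet.

import csv
import json
import sys

# Screen names to track; these particular handles are an assumption for illustration.
USERS = {"HillaryClinton", "realDonaldTrump"}

def follower_counts(lines):
    # Yield (created_at, screen_name, followers_count) for tracked users,
    # checking both the tweeting user and the author of any retweeted status.
    for line in lines:
        tweet = json.loads(line)
        users = [tweet["user"]]
        if "retweeted_status" in tweet:
            users.append(tweet["retweeted_status"]["user"])
        for user in users:
            if user["screen_name"] in USERS:
                yield tweet["created_at"], user["screen_name"], user["followers_count"]

if __name__ == "__main__":
    writer = csv.writer(sys.stdout)
    writer.writerow(["created_at", "screen_name", "followers_count"])
    for row in follower_counts(sys.stdin):
        writer.writerow(row)

Fed a file of collected tweets on stdin, it writes a CSV that opens straight into a spreadsheet.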

New Followers

I thought it could be interesting to look at the number of new followers each candidate received during the first debate. In theory these would be “people” that had decided to follow Clinton or Trump on Twitter as they were watching the debate. I put people in scare quotes there because it is possible, and perhaps even likely, that there are bot armies creating these accounts. You’d have to track who these new accounts were and sample them to know for sure.

So with that caveat in mind here’s what the graph for the first debate looked like:

It was encouraging (for me) to see that Hillary was gaining more followers than Trump during the debate. There were a few interesting changes in the graph, but I was mostly distracted by the fact that Hillary was outpacing Trump so much in new followers. We collected data for the second debate so I decided to look again with the same code:

I was pleased to see Clinton is gaining more supporters than Trump again; but this graph looks noticeably different. Do you see the bump in followers at 10:21 PM EDT (02:21 GMT)? What happened there? Well, thanks to the Web you can see exactly what happened there: it was the question from Beth Miller about the Supreme Court that begins at 10:19:54:

If you watch the time tick by you can see in the exact minute of 10:21 PM Hillary is saying this:

I want a Supreme Court that will stick with Roe v. Wade and a woman’s right to choose, and I want a Supreme Court that will stick with marriage equality. Now, Donald has put forth the names of some people that he would consider. And among the ones that he has suggested are people who would reverse Roe v. Wade and reverse marriage equality. I think that would be a terrible mistake and would take us backwards.

What is truly bizarre is that there is a bump at that point for both candidates. Maybe this points at a problem in my code, or my spreadsheet data (I’m happy to share the Tweet ID datasets if you would like to look for yourself). Or perhaps the bump is part of some kind of backlog in Twitter’s infrastructure? Maybe there’s a bot army that’s working for both candidates? It would be necessary to inspect the users to get a sense of that.

But maybe, just maybe, this data points at the fact that a woman’s right to choose and marriage equality are still polarizing and hot-button issues, more so than any others, for folks in this year’s election? At least for folks who use Twitter…which is definitely not all voters. That’s probably the biggest caveat there is.

If you have ideas, questions, criticisms about any of this I’d love to hear from you.


I’m not sure my rate of change calculation was the best. I simply subtracted the new followers in the previous minute from the new followers in the current minute. It does show that Trump got a bigger bump in this minute, and really highlights how much of a change it was. If you have a better idea for calculating the rate of change, take a look at the data.
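For what it’s worth, here is one way that per-minute calculation could be done with pandas (a sketch, assuming a CSV with the created_at, screen_name, and followers_count columns described above):

import pandas as pd

df = pd.read_csv("followers.csv", parse_dates=["created_at"])

# Highest follower count observed for each candidate in each minute
per_minute = (
    df.set_index("created_at")
      .groupby("screen_name")["followers_count"]
      .resample("1min")
      .max()
)

# New followers gained per minute, then the change from the previous minute
new_followers = per_minute.groupby(level="screen_name").diff()
rate_of_change = new_followers.groupby(level="screen_name").diff()

print(rate_of_change.dropna().head())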

OITP on National Broadband Research Agenda / District Dispatch

In response to a request from the National Telecommunications and Information Administration (NTIA) and the National Science Foundation, this week ALA’s Office for Information Technology Policy, along with our partners at the Benton Foundation, filed comments on the National Broadband Research Agenda. The NTIA’s Request for Comment (RFC) asked for help identifying the data and research needs in the areas of broadband technology and innovation, and prioritizing research proposals that can foster access to and adoption of broadband across unserved and underserved groups of people.

Among a number of things, we recommended that the NTIA and NSF “ensure that research addresses the needs of unserved and underserved groups (e.g., seniors, low-income families, persons with disabilities, people living in rural areas) that market actors usually ignore in the development of new broadband technologies.”

With explicit regard to libraries, the comments advised that the National Broadband Research Agenda:

  • “investigate the role of mobile Wi-Fi hotspot lending programs in libraries across the U.S. to promote broadband adoption and utilization inside and outside the home.”
  • “investigate, through qualitative research, the role of public libraries in promoting broadband adoption in rural areas across the U.S… [to] build on recent quantitative research, which shows that higher rates of broadband adoption can be found in rural areas with public libraries.”
  • “identify the role of anchor institutions and community organizations in supporting long-term social and economic development goals through their broadband adoption and utilization programs.”
  • “fund cross-disciplinary research, particularly across the fields of education, social work, community health, and library and information science, to deepen our understanding of the non-price barriers to broadband access, adoption, and use… [which] could also help to create solutions to the persistent challenges facing individuals and families from low-income and other vulnerable populations.”

OITP will continue to engage with allies and the NTIA to advance an inclusive broadband research agenda.

The post OITP on National Broadband Research Agenda appeared first on District Dispatch.

Welcome, Andrea! / Equinox Software

Equinox is excited to announce the newest member of our team:  Andrea Buntz Neiman!  Andrea has filled the position of Project Manager for Software Development and began work this week.  In her new role, she will coordinate with customers, developers, and other stakeholders to make sure everyone stays on the same page about development projects.

Andrea received her BA in Music at St. Mary’s College in Maryland before completing her MLS at University of Maryland College Park.  She worked at the Library of Congress for three years  on various special projects in the Music Division and Recorded Sound Section.  She then spent 11 years in public libraries.  Andrea has worked in almost every area of the public library world and has accumulated quite the Summer Reading Shirt collection.

In 2008, she helped her library migrate to Evergreen, making it the first (and so far only) public library in Maryland to use an open source ILS.  Grace Dunbar, Equinox Vice President, remarked: “I have been Andrea’s biggest fan ever since she gave an entire conference presentation on the many wonderful attributes of the Item Status screen. The team at Equinox has always been impressed with her work in the community and we are excited to have her as part of our amazing group.”

Andrea’s hobbies include gardening, cooking (and eating), shopping at thrift stores, sewing, and baseball.  She is a lifelong Baltimore Orioles fan and is still nursing her heartbreak over their recent Wild Card loss.  She has every intention of learning to play the bass guitar some day.  

C4L17: Call for Presentation/Panel proposals / Code4Lib


Code4Lib 2017 is a loosely-structured conference that provides people working at the intersection of libraries/archives/museums/cultural heritage and technology with a chance to share ideas, be inspired, and forge collaborations. For more information about the Code4Lib community, please visit

The conference will be held at the Luskin Conference Center at UCLA, from March 6, 2017 - March 9, 2017. More information about Code4lib 2017 will be coming soon.

We encourage all members of the Code4Lib community to submit a proposal for a prepared talk. Prepared talks should focus on one or more of the following areas:

- Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software
- Tools and technologies – How to get the most out of existing tools, standards, and protocols (and ideas on how to make them better)
- Technical issues – Big issues in library technology that are worthy of community attention or development
- Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc.

This year, in order to provide increased opportunities for a diversity of speakers and topics, we'll be soliciting 10, 15, and 20 minute talks. You'll be asked to indicate which talk lengths you would be willing to accommodate for your proposal. We are also considering holding a poster session at this year's conference. If you would be interested in presenting your topic as a poster, please indicate so on the form.

In addition to "traditional" presentations and posters, we plan to include a panel session this year. If you have a topic you'd like to suggest for a panel, and are willing to work with the Program Committee to organize / recruit for the session, please use the following form.

As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. The top 10 proposals are guaranteed a slot of their preferred length at the conference. The Program Committee will curate the remainder of the program in an effort to ensure diversity in program content and presenters. Community votes will, of course, still weigh heavily in these decisions.

Presenters whose proposals are selected for inclusion in the program will have conference registration slots held for them (up to 2 speakers per talk). In addition, panel participants will have registration slots held. The standard conference registration fee will apply.

Proposals can be submitted through November 7, 2016 at midnight PST (GMT−8). Voting will start on November 16, 2016 and continue through December 7, 2016. The URL to submit votes will be announced on the Code4Lib website and mailing list and will require an active account to participate. The final list of presentations will be announced in December.

Thank you,
The Code4Lib 2017 Program Committee

Hydra Connect 2016 / Brown University Library Digital Technologies Projects

Last week I attended Hydra Connect 2016 in Boston, with a team of three others from the Brown University Library. Our team consisted of a Repository Developer, Discovery Systems Developer, Metadata Specialist, and Repository Manager. Here are some notes and thoughts related to the conference from my perspective as a repository programmer.


There was a poster about IPFS, which is a peer-to-peer hypermedia protocol for creating a distributed web. It’s an interesting idea, and I’d like to look into it more.

APIs and Architecture

There was a lot of discussion about the architecture of Hydra, and Tom Cramer mentioned APIs specifically in his keynote address. In the Brown Digital Repository, we use a set of APIs that clients can access and use from any programming language. This architecture lets us define layers in the repository: the innermost layer is Fedora and Solr, the next layer is our set of APIs, and the outer layer is the Studio UI, image/book viewers, and custom websites built on the BDR data. There is some overlap in our layers (eg. the Studio UI does hit Solr directly instead of going through the APIs), but I still think it improves the architecture to think about these layers and try not to cross multiple boundaries. Besides having clients that are written in python, ruby, and php, this API layer may be useful when we migrate to Fedora 4 – we can use our APIs to communicate with both Fedora 3 and Fedora 4, and any client that only hits the APIs wouldn’t need to be changed to be able to handle content in Fedora 4.
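To illustrate what that layering buys, a client in any language only ever needs to speak HTTP to the API layer. A minimal sketch in Python, with a hypothetical endpoint and field name (not the BDR’s actual API), might look like this:

import json
import urllib.request

API_BASE = "https://repository.example.edu/api"  # hypothetical API-layer base URL

def get_item(pid):
    # The client talks only to the API layer, never to Fedora or Solr directly,
    # so the storage and indexing layers can change without breaking clients.
    with urllib.request.urlopen(f"{API_BASE}/items/{pid}/") as response:
        return json.load(response)

item = get_item("test:1234")
print(item.get("title"))

The same request could just as easily be made from Ruby or PHP, which is the point of keeping the boundary at HTTP.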

I would be interested in seeing a similar architecture in Hydra-land (note: this is an outsider’s perspective – I don’t currently work on CurationConcerns, Sufia, or other Hydra gems). A clear boundary between “business logic” or processing and the User Interface or data presentation seems like good architecture to me.

Data Modeling and PCDM

Monday was workshop day at Hydra Connect 2016, and I went to the Data Modeling workshop in the morning and the PCDM In-depth workshop in the afternoon. In the morning session, someone mentioned that we shouldn’t have data modeling differences without good reason (i.e. does a book in one institution really have to be modeled differently from a book at another institution?). I think that’s a good point – if we can model our data the same way, that would help with interoperability. PCDM, as a standard for how our data objects are modeled, might be a great way to promote interoperability between applications and institutions. In the BDR, we could start using PCDM vocabulary and modeling techniques, even while our data is in Fedora 3 and our code is written in Python. I also think it would be helpful to define and document what interoperability should look like between institutions, or between different applications at the same institution.

Imitate IIIF?

It seems like the IIIF community has a good solution to image interoperability. The IIIF community has defined a set of APIs, and then it lists various clients and servers that implement those APIs. I wonder if the Hydra community would benefit from more of a focus on APIs and specifications, and then there could be various “Hydra-compliant” servers and clients. Of course, the Hydra community should continue to work on code as well, but a well-defined specification and API might improve the Hydra code and allow the development of other Hydra-compliant code (eg. code in other programming languages, different UIs using the same API, …).

Versioning in the Caselaw Access Project / Harvard Library Innovation Lab

We have a data management dilemma, and we hope that you – data smart people of the world – can help us out. We need a versioning and change tracking system for around 50 million XML files, and no existing solutions seem to fit.

About The Project

The Caselaw Access Project or CAP, previously known as Free The Law, is making all U.S. case law freely accessible online. For more information, see our project page, and this New York Times article.

Our Tracking Task

Like most digitization projects, we generate many page images. The binary image files rarely change and are not difficult to track. However, in addition to images, we create rich XML files containing descriptive/structural metadata and OCR. As we uncover mistakes in the OCR, encounter metadata anomalies, and gather new data through CAP-facilitated research projects, we will need to update these files. Tracking those changes is going to be a bit more difficult.

The Files

We are scanning about 37,000 volumes. Each volume contains multiple pages (obviously) and multiple cases. Usually, a case takes up a few pages, but some cases are so small that several can fit on one page, so there’s no direct parent/child relationship between them. Cases never span volumes.

If you’re interested in checking out a case for yourself, you can grab a sample case with all the associated files here.

How we split these things up into files:

For each volume:

  • One METS XML file with all volume-level metadata (~1 MB avg)

For each page side:

  • One lossless jp2 (~2.5 MB avg)
  • One 1-bit tiff (~60 KB avg)
  • One ALTO v3 XML file (~75 KB avg)

For each case:

  • One METS XML file, which includes the text of each case body, and all case-level metadata (~75 KB avg)

The Scale

  • Roughly 37k volumes, so about 37,000 volume XML files
  • Roughly 40 million page-sides, so that many jp2s, tiffs, and ALTO XML files
  • A bit fewer than 10 million cases, so that many case METS XML files
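
For a rough sense of scale, here is a quick back-of-the-envelope calculation (in Python) based on the per-file averages above; the averages are approximate, so treat the totals as ballpark figures:

KB, MB = 1e3, 1e6
volumes, page_sides, cases = 37_000, 40_000_000, 10_000_000

# Total bytes of XML (volume METS + ALTO + case METS) and of images (jp2 + tiff)
xml_bytes = volumes * 1 * MB + page_sides * 75 * KB + cases * 75 * KB
image_bytes = page_sides * (2.5 * MB + 60 * KB)

print(f"XML files:   {volumes + page_sides + cases:,} (~{xml_bytes / 1e12:.1f} TB)")
print(f"Image files: {2 * page_sides:,} (~{image_bytes / 1e12:.1f} TB)")

That works out to roughly 50 million XML files totalling a few terabytes, with the page images dominating at around 100 TB.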

Our key requirements:

Data Set Versioning

Ideally this could be done at the corpus or series level (described below). This would be useful to researchers working with larger sets of data.

Sanitizable Change Tracking

As is the case with most change-tracking systems, when recording changes, we usually want to be able to ascertain the state of the data before the change, whether this is by recording the old version and the new version, or the delta between the two versions. However, with some change types, we do require the ability to either delete the delta or the old data state. Ideally, we would be able to do this without removing the entire change history for the file.

File Authentication

People should be able to check if the version of the file they have is, or ever has been in our repository.

Open Data Format

Even if the change/versioning data isn’t natively stored in an easily human-readable format, it must at least be exportable into a useful open format. No strictly proprietary solutions.

Access Control

We have to be able to control access to this data.

Our Wish List

  • FOSS (Free Open Source Software) Based Solution
  • Diffing — allow downstream databases to fetch deltas between their current version and the latest
  • Minimal system management overhead
  • Ability to efficiently distribute change history with the data, ideally in a human-readable format
  • XML-aware change tracking, so changes can be applied to XML elements with the same identifiers and content, in different files
  • Will automatically detect replacement images

What we’ve considered, and their disadvantages


  • Dataset is much too large to store in a single repository
  • Non-plain-text change history
  • Redacting a single file requires rewriting large portions of the tree
    Media Wiki

  • Not geared to handle XML data
  • Would require storing in a different format/syncing
  • Non-plain-text change history
  • Provides sanitizable change tracking but no versioning of larger data sets

  • Non-plain-text change history
  • Seems to not allow easy sanitization of change history

  • P2P Architecture doesn’t give us enough access control for the first phase of the project.
    Something we write ourselves

  • Reinvents the wheel, at least in part
  • Probably not as efficient as more mature tools

Should the data be restructured?

Currently, the repository is fairly flat with each volume in its own directory, but no other hierarchy.

Files could be partitioned by “series.” A series is a numbered sequence of volumes from a particular court, such as the Massachusetts Reporter of Decisions. The largest series so far contains approximately 1k volumes, 750k pages, and 215k cases, but they are rather inconsistently sized, with the smallest containing only one volume, and the average containing 71. There are 635 series in total.

Many data consumers will want only case files, and not per-page or per-volume files. It may make sense to store case XML files and non-case-XML files in separate repositories.

What We Need From You

Ideas. We want to make sure that we get this right the first time. If you have insight into solving problems like this, we’d love to hear from you.

Next Steps

Please reach out to us at

Code4Lib 2017 / Code4Lib

The Code4Lib 2017 Los Angeles Committee is pleased to announce that we have finalized the dates and location of the 2017 conference.

The 2017 conference will be held from March 6 through March 9 at the Luskin Conference Center at UCLA. With a very large and modern conference center at our disposal, Code4Lib 2017 will be able to accommodate the growing number of attendees while also retaining that small, tight-knit Code4Lib community feeling of previous years. We hope you can come join us!

More details to come soon; in the meantime, the Keynote Committee is about to close submissions for the conference keynote speaker, so be sure to nominate a keynote speaker on the Code4Lib wiki before Friday, October 14th. Proposals for prepared talks are also currently open and will be accepted until November 7th. This year, there is a new, separate process for panel proposals which we are very excited about. Proposals for pre-conference workshops are also currently open and will be accepted until November 8th.

Also, the Sponsorship Committee is actively looking for sponsors for 2017; more information about how to get in touch with the committee will be forthcoming.

It’s shaping up to be a great conference this year, and there will be lots more opportunities to volunteer. Our team is looking forward to seeing you on March 6!

~ The Code4Lib 2017 Los Angeles Committee

Tiny road trip: An Americana travelogue / Eric Lease Morgan

This travelogue documents my experiences and what I learned on a tiny road trip including visits to Indiana University, Purdue University, University of Illinois / Urbana-Champaign, and Washington University In St. Louis between Monday, October 26 and Friday, October 30, 2017. In short, I learned four things: 1) of the places I visited, digital scholarship centers support a predictable set of services, 2) the University Of Notre Dame’s digital scholarship center is perfectly situated in the middle of the road when it comes to the services provided, 3) the Early Print Project is teamed with a set of enthusiastic & animated scholars, and 4) Illinois is very flat.

Lafayette Bloomington Greenwood Crawfordsville

Four months ago I returned from a pseudo-sabbatical of two academic semesters, and exactly one year ago I was in Tuscany (Italy) painting cornfields & rolling hills. Upon my return I felt a bit out of touch with some of my colleagues in other libraries. At the same time I had been given an opportunity to participate in a grant-sponsored activity (the Early English Print Project) between Northwestern University, Washington University In St. Louis, and the University Of Notre Dame. Since I was encouraged to visit the good folks at Washington University, I decided to stretch a two-day visit into a week-long road trip taking in stops at digital scholarship centers. Consequently, I spent bits of time in Bloomington (Indiana), West Lafayette (Indiana), Urbana (Illinois), as well as St. Louis (Missouri). The whole process afforded me the opportunity to learn more and get re-acquainted.

Indiana University / Bloomington

My first stop was in Bloomington where I visited Indiana University, and the first thing that struck me was how much Bloomington exemplified the typical college town. Coffee shops. Boutique clothing stores. Ethnic restaurants. And teeming with students ranging from fraternity & sorority types, hippie wanna be’s, nerds, wide-eyed freshmen, young lovers, and yes, fledgling scholars. The energy was positively invigorating.

My first professional visit was with Angela Courtney (Head of Arts & Humanities, Head of Reference Services, Librarian for English & American Literature, and Director of the Scholars’ Commons). Ms. Courtney gave me a tour of the library’s newly renovated digital scholarship center. [1] It was about the same size as the Hesburgh Libraries’ Center, and it was equipped with much of the same apparatus. There was a scanning lab, plenty of larger & smaller meeting spaces, a video wall, and lots of open seating. One major difference between Indiana and Notre Dame was the “reference desk”. For all intents & purposes, the Indiana University reference desk is situated in the digital scholarship center. Ms. Courtney & I chatted for a long hour, and I learned how Indiana University & the University Of Notre Dame were similar & different. Numbers of students. Types of library collections & services. Digital initiatives. For the most part, both universities have more things in common than differences, but their digital initiatives were by far more mature than the ones here at Notre Dame.

Later in the afternoon I visited with Yu (Marie) Ma who works for the HathiTrust Research Center. [2] She was relatively new to the HathiTrust, and if I understand her position correctly, then she spends a lot of her time setting up technical workflows and designing the infrastructure for large-scale text analysis. The hour with Marie was informative on both of our parts. For example, I outlined some of the usability issues with the Center’s interface(s), and she outlined how the “data capsules” work. More specifically, “data capsules” are virtual machines operating in two different modes. In one mode a researcher is enabled to fill up a file system with HathiTrust content. In the other mode, one is enabled to compute against the content and return results. In one or the other of the modes (I’m not sure which), Internet connectivity is turned off to disable the leaking of HathiTrust content. In this way, a HathiTrust data capsule operates much like a traditional special collections room. A person can go into the space, see the book, take notes with a paper & pencil, and then leave sans any of the original materials. “What is old is new again.” Along the way Marie showed me a website — Lapps Grid — which looks as if it functions similarly to Voyant Tools and my fledgling EEBO-TCP Workset Browser. [3, 4, 5] Amass a collection. Use the collection as input against many natural language processing tools/applications. Use the output as a means for understanding. I will take a closer look at Lapps Grid.

Purdue University

The next morning I left the rolling hills of southern Indiana for the flatlands of central Indiana and Purdue University. There I facilitated a brown-bag lunch discussion on the topic of scalable reading, but the audience seemed more interested in the concept of digital scholarship centers. I described the Center here at Notre Dame, and did my best to compare & contrast it with others as well as draw into the discussion the definition of digital humanities. Afterwards I went to lunch with Michael Witt and Amanda Visconti. Mr. Witt spends much of his time on institutional repository efforts, specifically in regards to scientific data. Ms. Visconti works in the realm of the digital humanities and has recently made available her very interesting interactive dissertation — Infinite Ulysses. [6] After lunch Mr. Witt showed me a new library space scheduled to open before the Fall Semester of 2017. The space will be library-esque during the day, and study-esque during the evening. Through the process of construction, some of their collection needed to be weeded, and I found the weeding process to be very interesting.

University of Illinois / Urbana-Champaign

Up again in the morning and a drive to Urbana-Champaign. During this jaunt I became both a ninny and a slave to my computer’s (telephone’s) navigation and functionality. First it directed me to my location, but no parking places. After identifying a parking place on my map (computer), I was not able to get directions on how to get there. Once I finally found parking, I required my telephone to pay. Connect to remote site while located in concrete building. Create account. Supply credit card number. Etc. We are increasingly reliant (dependent) on these gizmos.

My first meeting was with Karen Hogenboom (Associate Professor of Library Administration, Scholarly Commons Librarian and Head, Scholarly Commons). We too discussed digital scholarship centers, and again, there were more things in common with our centers than differences. Her space was a bit smaller than Notre Dame’s, and their space was less about specific services and more about referrals to other services across the library and across the campus. For example, geographic information systems services and digitization services were offered elsewhere.

I then had a date with an old book, but first some back story. Here at Notre Dame Julia Schneider brought to my attention a work written by Erasmus and commenting on Cato which may be a part of a project called The Digital Schoolbook. She told me how there were only three copies of this particular book, and one of them was located in Urbana. Consequently, a long month ago, I found a reference to the book in the library catalog, and I made an appointment to see it in person. The book’s title is Erasmi Roterodami libellus de co[n]structio[n]e octo partiu[m]oratio[n]is ex Britannia nup[er] huc plat[us] : et ex eo pureri bonis in l[ite]ris optio and it was written/published in 1514. [7, 8] The book represented at least a few things: 1) the continued and on-going commentary on Cato, 2) an example of early book printing, and 3) forms of scholarship. Regarding Cato I was only able to read a single word in the entire volume — the word “Cato” — because the whole thing was written in Latin. As an early printed book, I had to page through the entire volume to find the book I wanted. It was the last one. Third, the book was riddled with annotations, made from a number of hands, and with very fine-pointed pens. Again, I could not read a single word, but a number of the annotations were literally drawings of hands pointing to sections of interest. Whoever said writing in books was a bad thing? In this case, the annotations were a definite part of the scholarship.

Manet lion art colors

Washington University In St. Louis

Yet again, I woke up the next morning and continued on my way. Along the road there were billboards touting “foot-high pies” and attractions to Indian burial grounds. There were corn fields being harvested, and many advertisements pointing to Abraham Lincoln stomping locations.

Late that afternoon I was invited to participate in a discussion with Doug Knox, Steve Pentecost, Steven Miles, and Dr. Miles’s graduate students. (Mr. Knox & Mr. Pentecost work in a university space called Arts & Sciences Computing.) They outlined and reported upon a digital project designed to help researchers & scholars learn about stelae found along the West River Basin in China. I listened. (“Stelae” are markers, usually made of stone, commemorating the construction or re-construction of religious temples.) To implement the project, TEI/XML files were being written and “en masse” used akin to a database application. Reports were to be written against the XML to create digital maps as well as browsable lists of names of people, names of temples, dates, etc. I got to thinking how timelines might also be apropos.

The bulk of the following day (Friday) was spent getting to know a balance of colleagues and discussing the Early English Print Project. There were many people in the room: Doug Knox & Steve Pentecost from the previous day, Joseph Loewenstein (Professor, Department of English, Director Of the Humanities Digital Workshop and the Interdisciplinary Project in the Humanities), Kate Needham, Andrew Rouner (Digital Library Director), Anupam Basu (Assistant Professor, Department of English), Shannon Davis (Digital Library Services Manager), Keegan Hughes, and myself.

More specifically, we talked about how sets of EEBO/TCP ([9]) TEI/XML files can be: 1) corrected, enhanced, & annotated through both automation as well as crowd-sourcing, 2) supplemented & combined with newly minted & copy-right free facsimiles from the original printed documents, 3) analyzed & reported upon through text mining & general natural language processing techniques, and 4) packaged up & redistributed back to the scholarly community. While the discussion did not follow logically, it did surround a number of unspoken questions, such as but not limited to:

  • Is METS a desirable re-distribution method? [10] What about some sort of database system instead?
  • To what degree is governance necessary in order for us to make decisions?
  • To what degree is it necessary to pour the entire corpus (more than 60,000 XML files with millions of nodes) into a single application for processing, and is the selected application up to the task?
  • What form or flavor of TEI would be used as the schema for the XML file output?
  • What role will an emerging standard called IIIF play in the process of re-distribution? [11]
  • When is a corrected text “good enough” for re-distribution?

To my mind, none of these questions were answered definitively, but then again, it was an academic discussion. On the other hand, we did walk away with a tangible deliverable — a whiteboard drawing illustrating a possible workflow going something like this:

  1. cache data from University of Michigan
  2. correct/annotate the data
  3. when data is “good enough”, put the data back into the cache
  4. feed the data back to the University of Michigan
  5. when data is “good enough”, text mine the data and put the result back into the cache
  6. feed the data back to the University of Michigan
  7. create new facsimiles from the printed works
  8. combine the facsimiles with the data, and put the result back into the cache
  9. feed the data back to the University of Michigan
  10. repeat

After driving through the country side, and after two weeks of reflection, I advocate a slightly different workflow:

  1. cache TEI data from GitHub repository, which was originally derived from the University of Michigan [12]
  2. make cache accessible to the scholarly community through a simple HTTP server and sans any intermediary application
  3. correct/annotate the data
  4. as corrected data becomes available, replace files in cache with corrected versions
  5. create copyright-free facsimiles of the originals, combine them with corrected TEI in the form of METS files, and cache the result
  6. use the METS files to generate IIIF manifests, and make the facsimiles viewable via the IIIF protocol
  7. as corrected files become available, use text mining & natural language processing to do analysis, combine the results with the original TEI (and/or facsimiles) in the form of METS files, and cache the result
  8. use the TEI and METS files to create simple & rudimentary catalogs of the collection (author lists, title lists, subject/keyword lists, date lists, etc.), making it easier for scholars to find and download items of interest
  9. repeat

The primary point I’d like to make in regard to this workflow is, “The re-distribution of our efforts ought to take place over simple HTTP and in the form of standardized XML, and I do not advocate the use of any sort of middleware application for these purposes.” Yes, of course, middleware will be used to correct the TEI, create “digital combos” of TEI and images, and do textual analysis, but the output of these processes ought to be files accessible via plain ol’ ordinary HTTP. Applications (database systems, operating systems, content-management systems, etc.) require maintenance, and maintenance is done by a small & specialized number of people. Applications are oftentimes “black boxes” understood and operated by a minority. Such things are very fragile, especially compared to stand-alone files. Standardized (XML) files served over HTTP are easily harvestable by anybody. They are easily duplicated. They can be saved on platform-independent media such as CDs/DVDs, magnetic tape, or even (gasp) paper. Once the results of our efforts are output as files, then supplementary distribution mechanisms can be put into place, such as IIIF or middleware database applications. XML files (TEI and/or METS) served over simple HTTP ought to be the primary distribution mechanism. Such is transparent, sustainable, and system-independent.
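
To make the “files over plain HTTP” idea concrete, here is a minimal sketch of steps 2 and 8 of the workflow above, assuming a local directory of TEI P5 files with the standard TEI namespace. The directory name, catalog file name, and port are illustrative only, not part of any agreed workflow.

```python
# Sketch of steps 2 and 8: build a rudimentary author/title catalog from the
# cached TEI files, then serve the whole cache over plain HTTP with no
# middleware in the way. All names and paths below are illustrative.
import csv
import functools
import http.server
import socketserver
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
CACHE = Path("eebo-tcp-cache")   # hypothetical local cache of TEI/XML files

def build_catalog(outfile="catalog.csv"):
    """Write a simple author/title list next to the TEI files themselves."""
    with open(CACHE / outfile, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "author", "title"])
        for path in sorted(CACHE.glob("*.xml")):
            title_stmt = ET.parse(path).find(".//tei:titleStmt", TEI)
            if title_stmt is None:
                continue
            author = (title_stmt.findtext("tei:author", "", TEI) or "").strip()
            title = (title_stmt.findtext("tei:title", "", TEI) or "").strip()
            writer.writerow([path.name, author, title])

def serve(port=8000):
    """Expose the cache (TEI, METS, catalogs, facsimiles) over plain HTTP."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(CACHE))
    with socketserver.TCPServer(("", port), handler) as httpd:
        httpd.serve_forever()

if __name__ == "__main__":
    build_catalog()
    serve()
```

Anybody could then harvest the corpus with nothing more exotic than wget, and the catalog file sits in the same directory as the data it describes.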

Over lunch we discussed Spenser’s Faerie Queene, the Washington University’s Humanities Digital Workshop, and the salient characteristics of digital humanities work. [13] In the afternoon I visited the St. Louis Art Museum, whose collection was rich. [14] The next day, on my way home through Illinois, I stopped at the tomb of Abraham Lincoln in order to pay my respects.

[Images: Lincoln, University, Matisse, Arch]

In conclusion

In conclusion, I learned a lot, and I believe my Americana road trip was a success. My conception and definition of digital scholarship centers was reinforced. My professional network was strengthened. I worked collaboratively with colleagues striving towards a shared goal. And my personal self was enriched. I advocate such road trips for anybody and everybody.


[1] digital scholarship at Indiana University –
[2] HathiTrust Research Center –
[3] Lapps Grid –
[4] Voyant Tools –
[5] EEBO-TCP Workset Browser –
[6] Infinite Ulysses –
[7] old book from the UIUC catalog –
[8] old book from the Universal Short Title Catalogue –
[9] EEBO/TCP –
[10] METS –
[11] IIIF –
[12] GitHub repository of texts –
[13] Humanities Digital Workshop –
[14] St. Louis Art Museum –

Jobs in Information Technology: October 13, 2016 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Pratt Institute, Library Systems Administrator – Pratt Institute Libraries, Brooklyn, NY

University of Arkansas, Business Librarian, Fayetteville, AR

University of Arkansas, Education Librarian, Fayetteville, AR

The Scripps Research Institute, Discovery and Access Systems Librarian, La Jolla, CA

University of Nevada, Las Vegas – UNLV, Application Developer, Las Vegas, NV

MIT Libraries, DevOps Engineer for Library Systems, Cambridge, MA

MIT Libraries, Senior Software Engineer, Cambridge, MA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

More Is Not Better / David Rosenthal

Quite a few of my recent posts have been about how the mainstream media is catching on to the corruption of science caused by the bad incentives all parties operate under, from science journalists to publishers to institutions to researchers. Below the fold I look at some recent evidence that this meme has legs.

Donald S. Kornfeld and Sandra L. Titus have a comment in Nature entitled Stop ignoring misconduct arguing that the bad incentives for researchers inevitably produce misconduct, but that this is routinely swept under the carpet:
In other words, irreproducibility is the product of two factors: faulty research practices and fraud. Yet, in our view, current initiatives to improve science dismiss the second factor. For example, leaders at the US National Institutes of Health (NIH) stated in 2014: “With rare exceptions, we have no evidence to suggest that irreproducibility is caused by scientific misconduct”. In 2015, a symposium of several UK science-funding agencies convened to address reproducibility, and decided to exclude discussion of deliberate fraud.
The scientific powers-that-be are ignoring the science:
Only 10–12 individuals are found guilty by the US Office of Research Integrity (ORI) each year. That number, which the NIH used to dismiss the role of research misconduct, is misleadingly low, as numerous studies show. For instance, a review of 2,047 life-science papers retracted from 1973 to 2012 found that around 43% were attributed to fraud or suspected fraud. A compilation of anonymous surveys suggests that 2% of scientists and trainees admit that they have fabricated, falsified or modified data. And a 1996 study of more than 1,000 postdocs found that more than one-quarter would select or omit data to improve their chances of receiving grant funding.
Linked from this piece are several other Nature articles about misconduct:
  • Misconduct: Lessons from researcher rehab by James M. DuBois, John T. Chibnall, Raymond Tait and Jillon Vander Wal is an interesting report on a program to which researchers are referred after misconduct has been detected. It identifies the pressure for "more not better" as a factor:
    By the metrics that institutions use to reward success, our programme participants were highly successful researchers; they had received many grants and published many papers. Yet, becoming overextended was a common reason why they failed to adequately oversee research. It may also have led them to make compliance a low priority. ... Scientists become overextended in part because their institutions value large numbers of projects.
  • Robust research: Institutions must do their part for reproducibility by C. Glenn Begley, Alastair M. Buchan and Ulrich Dirnagl argues for compliance processes for general research analogous to those governing animal research. They too flag "more not better":
    Institutions must support and reward researchers who do solid — not just flashy — science and hold to account those whose methods are questionable. ... Although researchers want to produce work of long-term value, multiple pressures and prejudices discourage good scientific practices. In many laboratories, the incentives to be first can be stronger than the incentives to be right.
  • Workplace climate: Metrics for ethics by Monya Baker reports on institutions using a survey of researchers' workplace climate issues in areas such as "integrity norms (such as giving due credit to others' ideas), integrity inhibitors (such as inadequate access to material resources) and adviser–advisee relations".
Also in Nature is Corie Lok's Science’s 1%: How income inequality is getting worse in research, which starts:
For a portrait of income inequality in science, look no further than the labs of the University of California. Twenty-nine medical researchers there earned more than US$1 million in 2015 and at least ten non-clinical researchers took home more than $400,000 each. Meanwhile, thousands of postdocs at those universities received less than $50,000. Young professors did better, but many still collected less than one-quarter of the earnings of top researchers.
The work of Richard Wilkinson, Kate Pickett and others shows that increasing inequality is correlated with misconduct, among other social ills. The finance industry is the poster-child of inequality; its machinations in the 2008 financial crisis, such as robosigning and synthetic CDOs, should be evidence enough. Which way round the chain of causation runs is not clear.

Even if there is no actual misconduct, the bad incentives will still cause bad science to proliferate via natural selection, or the scientific equivalent of Gresham's Law that "bad money drives out good". The Economist's Incentive Malus, subtitled Poor scientific methods may be hereditary, is based on The natural selection of bad science by Paul E. Smaldino and Richard McElreath, which starts:
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement.
The Economist writes that Smaldino and McElreath:
decided to apply the methods of science to the question of why this was the case, by modelling the way scientific institutions and practices reproduce and spread, to see if they could nail down what is going on.

They focused in particular on incentives within science that might lead even honest researchers to produce poor work unintentionally. To this end, they built an evolutionary computer model in which 100 laboratories competed for “pay-offs” representing prestige or funding that result from publications. ... Labs that garnered more pay-offs were more likely to pass on their methods to other, newer labs (their “progeny”).

Some labs were better able to spot new results (and thus garner pay-offs) than others. Yet these labs also tended to produce more false positives—their methods were good at detecting signals in noisy data but also, as Cohen suggested, often mistook noise for a signal. More thorough labs took time to rule these false positives out, but that slowed down the rate at which they could test new hypotheses. This, in turn, meant they published fewer papers.

In each cycle of “reproduction”, all the laboratories in the model performed and published their experiments. Then one—the oldest of a randomly selected subset—“died” and was removed from the model. Next, the lab with the highest pay-off score from another randomly selected group was allowed to reproduce, creating a new lab with a similar aptitude for creating real or bogus science. ... they found that labs which expended the least effort to eliminate junk science prospered and spread their methods throughout the virtual scientific community.
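
That mechanism maps onto a small toy simulation fairly directly. The sketch below is an illustrative re-implementation of the selection dynamic as described (pay-offs for publication volume, the oldest lab in one random subset dying, the highest-pay-off lab in another reproducing); the class names and parameter values are invented for illustration and are not Smaldino and McElreath's.

```python
# Toy re-creation of the selection dynamic described above: labs that spend
# less effort on rigor produce more publications per cycle, accumulate more
# pay-offs, and are more likely to be copied by new labs. All numbers here
# are invented for illustration; they are not Smaldino and McElreath's.
import random

class Lab:
    def __init__(self, effort, born):
        self.effort = effort   # 0..1, effort spent weeding out shaky results
        self.payoff = 0.0
        self.born = born

def cycle(labs, t):
    # Every lab "publishes"; rigor costs output.
    for lab in labs:
        lab.payoff += 1.0 - 0.5 * lab.effort
    # The oldest lab in a random subset dies...
    labs.remove(min(random.sample(labs, 10), key=lambda l: l.born))
    # ...and the highest-pay-off lab in another subset reproduces,
    # passing on its methods with a little mutation.
    parent = max(random.sample(labs, 10), key=lambda l: l.payoff)
    child_effort = min(1.0, max(0.0, parent.effort + random.gauss(0, 0.02)))
    labs.append(Lab(child_effort, born=t))

def run(cycles=5000, n_labs=100):
    labs = [Lab(random.random(), born=0) for _ in range(n_labs)]
    for t in range(1, cycles + 1):
        cycle(labs, t)
    return sum(l.effort for l in labs) / len(labs)

if __name__ == "__main__":
    random.seed(1)
    # Mean effort tends to drift downward: the least careful methods spread.
    print("mean effort after selection:", round(run(), 3))
```
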
Worse, they found that replication did not suppress this selection process:
Replication has recently become all the rage in psychology. In 2015, for example, over 200 researchers in the field repeated 100 published studies to see if the results of these could be reproduced (only 36% could). Dr Smaldino and Dr McElreath therefore modified their model to simulate the effects of replication, by randomly selecting experiments from the “published” literature to be repeated.

A successful replication would boost the reputation of the lab that published the original result. Failure to replicate would result in a penalty. Worryingly, poor methods still won—albeit more slowly. This was true in even the most punitive version of the model, in which labs received a penalty 100 times the value of the original “pay-off” for a result that failed to replicate, and replication rates were high (half of all results were subject to replication efforts).
The Economist reports Smaldino and McElreath's conclusion is bleak:
that when the ability to publish copiously in journals determines a lab’s success, then “top-performing laboratories will always be those who are able to cut corners”—and that is regardless of the supposedly corrective process of replication.

Ultimately, therefore, the way to end the proliferation of bad science is not to nag people to behave better, or even to encourage replication, but for universities and funding agencies to stop rewarding researchers who publish copiously over those who publish fewer, but perhaps higher-quality papers.
Alas, the people in a position to make this change reached this exalted state by publishing copiously, so The Economist's is a utopian suggestion. In Bad incentives in peer-reviewed science I wrote:
Fixing these problems of science is a collective action problem; it requires all actors to take actions that are against their immediate interests roughly simultaneously. So nothing happens, and the long-term result is, as Arthur Caplan (of the Division of Medical Ethics at NYU's Langone Medical Center) pointed out, a total loss of science's credibility:
The time for a serious, sustained international effort to halt publication pollution is now. Otherwise scientists and physicians will not have to argue about any issue—no one will believe them anyway.
(see also John Michael Greer).
This loss of credibility is the subject of Andrea Saltelli's Science in crisis: from the sugar scam to Brexit, our faith in experts is fading which starts:
Worldwide, we are facing a joint crisis in science and expertise. This has led some observers to speak of a post-factual democracy – with Brexit and the rise of Donald Trump the results.

Today, the scientific enterprise produces somewhere in the order of 2m papers a year, published in roughly 30,000 different journals. A blunt assessment has been made that perhaps half or more of all this production “will not stand the test of time”.

Meanwhile, science has been challenged as an authoritative source of knowledge for both policy and everyday life, with noted major misdiagnoses in fields as disparate as forensics, preclinical and clinical medicine, chemistry, psychology and economics.
Like I did above, Saltelli uses the finance analogy to point out the deleterious effect of simplistic metrics - you get what you reward:
One can see in the present critique of finance – as something having outgrown its original function into a self-serving entity – the same ingredients of the social critique of science.

Thus the ethos of “little science” reminds us of the local banker of old times. Scientists in a given field knew one another, just as local bankers had lunch and played golf with their most important customers. The ethos of techno-science or mega-science is similar to that of the modern Lehman bankers, where the key actors know one another only through performance metrics.
But I think in this case the analogy is misleading. The balkanization of science into many sub-fields leads to cliques and the kind of group-think illustrated in William A. Wilson's Scientific Regress:
once an entire field has been created—with careers, funding, appointments, and prestige all premised upon an experimental result which was utterly false due either to fraud or to plain bad luck—pointing this fact out is not likely to be very popular. Peer review switches from merely useless to actively harmful. It may be ineffective at keeping papers with analytic or methodological flaws from being published, but it can be deadly effective at suppressing criticism of a dominant research paradigm.
Charles Seife's How the FDA Manipulates the Media shows how defensive scientific institutions are becoming in the face of these problems. They are so desperate to control how the press reports science and science-based policy that they are using "close-hold embargos":
The deal was this: NPR, along with a select group of media outlets, would get a briefing about an upcoming announcement by the U.S. Food and Drug Administration a day before anyone else. But in exchange for the scoop, NPR would have to abandon its reportorial independence. The FDA would dictate whom NPR's reporter could and couldn't interview.
The FDA isn't the only institution doing this:
This January the California Institute of Technology was sitting on a great story: researchers there had evidence of a new giant planet—Planet Nine—in the outer reaches of our solar system. The Caltech press office decided to give only a dozen reporters, including Scientific American's Michael Lemonick, early access to the scientists and their study. When the news broke, the rest of the scientific journalism community was left scrambling. “Apart from the chosen 12, those working to news deadlines were denied the opportunity to speak to the researchers, obtain independent viewpoints or have time to properly digest the published research paper,” complained BBC reporter Pallab Ghosh about Caltech's “inappropriate” favoritism in an open letter to the World Federation of Science Journalists.
But it may be the only one doing it in violation of their stated policy:
in June 2011, the FDA's new media policy officially killed the close-hold embargo: “A journalist may share embargoed material provided by the FDA with nonjournalists or third parties to obtain quotes or opinions prior to an embargo lift provided that the reporter secures agreement from the third party to uphold the embargo.”
The downside of the close-hold embargo is obvious from this example:
in 2014 the Harvard-Smithsonian Center for Astrophysics (CfA) used a close-hold embargo when it announced to a dozen reporters that researchers had discovered subtle signals of gravitational waves from the early universe. “You could only talk to other scientists who had seen the papers already; we didn't want them shared unduly,” says Christine Pulliam, the media relations manager for CfA. Unfortunately, the list of approved scientists provided by CfA listed only theoreticians, not experimentalists—and only an experimentalist was likely to see the flaw that doomed the study. (The team was seeing the signature of cosmic dust, not gravitational waves.)
Defensiveness is rampant. Cory Doctorow's Psychology's reproducibility crisis: why statisticians are publicly calling out social scientists reports on a response by Andrew Gelman, a professor of statistics and political science, and director of Columbia's Applied Statistics Center, to a screed by Princeton University psychology professor Susan Fiske, a past president of the Association for Psychological Science. Fiske is unhappy that "self-appointed data police" are using blogs and other social media to criticize published research via "methodological terrorism", instead of using the properly peer-reviewed and "monitored channels". Gelman's long and detailed blog post starts:
Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth. I’ve written elsewhere on my problems with this attitude—in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work.
Gelman's post connects to the work of Smaldino and McElreath:
Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.

Hurdles and joys of introducing Data Journalism in post-Soviet universities / Open Knowledge Foundation

At the Data Journalism Bootcamp organized by the United Nations Development Programme (UNDP), a large proportion of the participants came from the Commonwealth of Independent States: Belarus, Ukraine, Moldova, Uzbekistan and Kyrgyzstan. Interestingly, the students were deans and professors of journalism departments who had to dive into a new field. The idea of the bootcamp was to understand how to start teaching data journalism at their respective universities. As an assistant to data journalism trainer Eva Constantaras, I was responsible for helping the Russian-speaking part of the class follow the densely packed schedule. Here are some of my thoughts and observations on this challenge.


Journalism professors turning students again during Excel labs © Alexander Parfentsov

Data journalism as the new punk for traditional societies

‘Data journalism is the new punk’, a metaphor used by Simon Rogers to express that anyone can do it, takes on new meanings when applied to conservative societies. Much like rock music in the 1970s, data journalism seems revolutionary to conservative media and academia. Going against ‘he said-she said’ narratives, it questions the very power of the word that journalists have traditionally been so proud of. Moreover, the whole system of journalism education in many post-Soviet countries is still oriented towards literature and the humanities rather than math and statistics. Overcoming this barrier is seen as a major challenge by Irina Sidorskaya, Professor in Journalism, Mass Media and Gender at Belarusian State University.


Cornelia Cozonac and myself © Alexander Parfentsov

Data journalism as the new toolset for investigative reporters

The good news is that, it seems, some countries have been advancing on data journalism without recognising it as such. One of the participants of the training was Cornelia Cozonac, president of the Center for Investigative Journalism in Moldova and senior lecturer at the Free International University. Take a look at this investigation she supervised, matching public-procurement contractors with political party donors – data journalism in its pure form. Or check this ‘Guess the salary’ news app, a Ukrainian production and a short-lister for the Data Journalism Awards 2016. In countries where investigative journalism is strong, and there is available data to work with that sheds light on areas of public interest, there is only one step left for data journalism to flourish. And this step is training journalism students to be data-driven.


Pitching ideas for data driven stories © Alexander Parfentsov

Data Journalism as the new discipline for journalism students

In western societies, data journalism has ventured from newsrooms to classrooms, brought mostly by people from other backgrounds, like bioengineering or statistics. UNDP challenged this at the Bootcamp by directly addressing academics from the Commonwealth of Independent States and the Balkans.

Is it too early to expect data journalism to be routinely taught in journalism schools? In my personal view, as a graduate of a classic journalism school, this move is very much needed. By letting data analysis only be performed by great minds and outliers like Nate Silver, we forget that a journalist’s duty is to think critically and orient their stories towards the public interest. To live up to this duty, journalists must be equipped with the training and skills required to understand the data, too.

Top tips on how to embed data journalism in university programmes (crowdsourced from participants at the Data Journalism Bootcamp)

  • Get computer classes to close the skills gap
  • Run a public awareness campaign, starting with your own professors
  • Invite experts to evangelize the subject
  • Create pilot classes, winter/summer schools, and bootcamps
  • Harvest or create opportunities for non-formal education, like this resource dedicated to data-driven journalism or this training (disclaimer: the author of this blogpost is involved in both)
  • Get governments interested and accountable as open data providers
  • Partner up with leading newsrooms to create job demand
  • Cultivate relationships with your own experts

Rewarding good practice in research / Jez Cope

Image: “Carrot + Stick < Love”, from Flickr

Whenever I’m involved in a discussion about how to encourage researchers to adopt new practices, eventually someone will come out with some variant of the following phrase:

“That’s all very well, but researchers will never do XYZ until it’s made a criterion in hiring and promotion decisions.”

With all the discussion of carrots and sticks I can see where this attitude comes from, and strongly empathise with it, but it raises two main problems:

  1. It’s unfair and more than a little insulting to anyone to be lumped into one homogeneous group; and
  2. Taking all the different possible XYZs into account, that’s an awful lot of hoops to expect anyone to jump through.

Firstly, “researchers” are as diverse as the rest of us in terms of what gets them out of bed in the morning. Some of us want prestige; some want to contribute to a greater good; some want to create new things; some just enjoy the work.

One thing I’d argue we all have in common is this: nothing is more offputting than feeling like you’re being strongarmed into something you don’t want to do.

If we rely on simplistic metrics, people will focus on those and miss the point. At best people will disengage, and at worst they will actively game the system. I’ve got to do these ten things to get my next payrise and still retain my sanity? Ok, what’s the least I can get away with and still tick them off? You see it with students taking poorly-designed assessments, and grown-ups are no different.

We do need to wield carrots as well as sticks, but the whole point is that these practices are beneficial in and of themselves. The carrots are already there if we articulate them properly and clear the roadblocks (don’t you enjoy mixed metaphors?). Creating artificial benefits will just dilute the value of the real ones.

Secondly, I’ve heard a similar argument made for all of the following practices and more:

  • Research data management
  • Open Access publishing
  • Public engagement
  • New media (e.g. blogging)
  • Software management and sharing

Some researchers devote every waking hour to their work, whether it’s in the lab, writing grant applications, attending conferences, authoring papers, teaching, and so on and so on. It’s hard to see how someone with all this in their schedule can find time to exercise any of these new skills, let alone learn them in the first place. And what about the people who sensibly restrict the hours taken by work to spend more time doing things they enjoy?

Yes, all of the above practices are valuable, both for the individual and the community, but they’re all new (to most) and hence require more effort up front to learn. We have to accept that it’s inevitably going to take time for all of them to become “business as usual”.

I think if the hiring/promotion/tenure process has any role in this, it’s in asking whether the researcher can build a coherent narrative as to why they’ve chosen to focus their efforts in this area or that. You’re not on Twitter but your data is being used by 200 research groups across the world? Great! You didn’t have time to tidy up your source code for github but your work is directly impacting government policy? Brilliant!

We still need to convince more people to do more of these beneficial things, so how? Call me naïve, but maybe we should stick to making rational arguments, calming fears and providing low-risk opportunities to learn new skills. Acting (compassionately) like a stuck record can help. And maybe we’ll need to scale back our expectations in other areas (journal impact factors, anyone?) to make space for the new stuff.

NDI Talk at Collections as Data / Library of Congress: The Signal

Here’s the text of the talk I gave last week at the Collections as Data event my group hosted on September 27, 2016. If you would like to watch it, the talk starts at about minute 54 of the video of the event.

Welcome to Collections as Data! I’m excited to tell you about our group, National Digital Initiatives that is hosting today’s event.

Meet our team: Abbey Potter, Mike Ashenfelder and Jaime Mears. They spent a lot of time putting this together and I think they did a great job. In addition to Jane McAuliffe, who you just heard from, Eugene Flanagan from our executive management team is here too, so if you like what we’re doing, be sure to stop them and thank them.

Text: "Hello and welcome from National Digital Initiatives," photos of staff

I’d like to take this opportunity to talk to you all a bit about this new group we’ve started, National Digital Initiatives, what we hope to accomplish, and a little more about what you can expect today. But first, a short story that I think helps illustrate the Library of Congress’ long history of technological innovation. I love to tell the story of Henriette Avram, whose work here at the Library of Congress replaced ink-on-paper card catalogs and revolutionized cataloging systems at libraries worldwide.

Photo of Henriette Avram with a quote of hers that reads "When I speak of and refer to it as ‘the Great Library,’ I do so with sincerity and appreciation for everything that I learned within those walls."

Henriette Avram. Photo by Reid Baker,

Henriette Avram was born in New York in 1919. She took two years of pre-med courses at Hunter College, then left to start a family. She was in her 30s when she started learning programming. (I love this story because, there are people who will insist if you haven’t been coding since you were crawling, you’ll never make it as a programmer. They’re wrong. She did, and she changed the world.)

Photo of a person using a card catalog

Woman at Main Reading Room card catalog in the Library of Congress. Photo by Jack Delano

As some of you know, Henriette and her team created the MARC format, a structure to contain bibliographic information for all forms of materials. This format was the keystone that made a revolution in information science possible. Because of her work, we can, with a few keystrokes, search the treasures of a library on the other side of the earth.

Ms. Avram is a bit of a hidden hero in computer science. She did all of this before the first relational database, before character encoding was mature and before computers were networked. It’s amazing to me that we don’t hear her mentioned more often.

Image of book stacks with text: "I am not a librarian by training but a brainwashed computer systems analyst." -Avram

Library of Congress, Jefferson Building, bookstacks area. Photo.

I like this story because it shows the power of cross-discipline fertilization. Ms. Avram herself combined two complex fields, computer programming and intricate cataloging practices to create a sea-change in how the public and scholars access library collections. Many of you who sit here may not remember using card catalogs, but this work was transformational. It made finding things much, much easier and allowed remote access to resources. I miss the smell of a card catalog, don’t you? But I don’t miss much else.

Ms. Avram’s work started the digital revolution in information science. We (both LC and the field as a whole) carry that spirit forward today in our contributions to standards work and open source tools. But innovation is not always in the big bang, it just looks like that from hindsight. People say MARC was released in 1968 but really it was released in stages over about a decade and continues to be refined to this day. The MARC of today is an international standard that covers character sets for all writing systems currently in use. Innovation often looks less like a dot on a timeline than a series of continuous improvements.

Representative images from collections released online in 2016: a portrait of Walt Whitman, an image of lace, a ballroom dancing manual cover....

William T. Sherman Papers: Certificate of thanks signed by Abraham Lincoln,
Needlework display at St. Nicholas Greek Orthodox Church Photo by Jonas Dovydenas.
Walt Whitman Papers: Photograph of Whitman. Photo by Frank Pearsall,
How to dance A complete ball-room and party guide. Published by Tousey & Small.
Walt Whitman Papers: Cardboard Butterfly and Notebooks,
Nathan W. Daniels Diary: Diary; Vol. I, 1861

With that in mind, I’d like to talk a little about the collections we have recently put online, which is one dot on a timeline that represents a lot of invisible labor and exciting treasures. Each new collection on that list contains work from people throughout the Library: especially in our CIO’s office and in Library Services.

So far this year, we have released a lot of new collections including the diaries of George Patton, The Chicago Ethnic Arts Project survey and Walt Whitman’s papers. These new collections represent the opportunity for remote access to resources that previously required a plane trip to DC for most of America.

I just wanted to take a minute to zoom down one more level and talk about one collection, Rosa Parks. Just as an example of the kind of digital work LC does. I picked this one because I think it’s really exciting. But we could just as easily be taking a close look at any of those other collections today, like ballroom dancing instruction manuals or web archives from the 2014 election.

Rosa Parks and Honorable Congresswoman Shirley Chisholm. [ca. 1968] Image.

Rosa Parks’ collection of personal correspondence and photographs is on a ten-year loan to LC from the Howard G. Buffett Foundation. The collection contains about 7,500 items in the Manuscript Division and 2,500 photographs in the Prints and Photographs division, documenting Mrs. Parks’ private life and public activism on behalf of civil rights for African Americans. The material was assessed and stabilized by LC’s Conservation Division staff, cataloged and described by archivists and librarians, and then digitized. Files were moved around by people and software, assessed and validated and then prepared for access. Metadata was added and transformed. Webpages were made. Search indexes populated. Rights assessed. A few years’ worth of work in less than 140 characters.

Image of a tweet from @libraryofcongress that reads: "Rosa Parks Collection Now Online..."

For those of you who are librarians, you know that this is just the job. A bunch of invisible work to make information usable. But I want to make a big deal out of it. Because it’s important. And because new, cool websites get a lot of attention but the usual production work of adding vast resources to the Nation’s collection is easy to overlook.

As I think about NDI, my new team, and what it’s tasked with and what I want to accomplish, I often think about the tension between innovation and sustainability. The sustained effort it takes to ready new collections for the web. The enduring power of the MARC format and bibliographic standards in general. Because I think groups like ours can become the “cool new thing” group, and I don’t want to be that. Henriette Avram is a shining beacon for me. Like me she comes to libraries from software. And like me she was excited about what infrastructure could do. I want to carry forward that enthusiasm. And I want us to try new things, with a vision for the future.

So with that in mind, I want to talk a little bit about our team at NDI. What do we plan on doing?

Photo of a woman operating a hand drill from the U.S. Office of War Information, 1944. Text reads: "Maximize the benefit of the digital collection"

Operating a hand drill at the North American Aviation, Inc., Photo by Alfred T. Palmer,

We want to maximize the benefit of the Library’s digital collection to the American public and the world.

We have a lot of stuff here and a lot of it is publicly available and digital. For example, right now on our websites there are more than 10 million pages of historic newspapers, 1.2 million prints and photographs, and one-of-a-kind collections such as the papers of George Washington, Abraham Lincoln, Carl Sagan, Jackie Robinson and many more books, maps and archived websites.

And we know they’re being used by students, scholars and researchers. But how can we expand that reach even further? How do we reach more life-long learners? How do we encourage a new generation of journalists and writers to turn to LC for reference help and for resources?

When I mentioned students, there is a division at the Library that is dedicated to reaching them. The Educational Outreach team develops innovative resources for K-12 classroom teachers and it does make a difference. How do we apply those same ideas to advanced scholars, researchers and the curious to help maximize the benefit of the collections to the American public?

And we’d like to see more creative re-use of collections materials.

Screenshot of the Flickr commons showing photos with the tag "greatmustachesoftheloc"

Great Mustaches of the LOC.

It’s fun but it can serve a scholarly and important purpose too. I talked earlier about the Rosa Parks papers, which contain a handwritten note…

Note handwritten by Rosa Parks

Rosa Parks Papers: Accounts of her arrest,

…in which she says “I had been pushed around all my life and felt at this moment that I couldn’t take it anymore.”

The papers give us insight into Parks as an American hero in that moment, but other pieces help give us a fuller picture, to humanize her.

Photo of Rosa Parks pancake recipe written on a bank deposit envelope

Rosa Parks Papers: Recipe for featherlite pancakes,

That’s why I love her pancake recipe, which got picked up by the press and food bloggers. From what I hear it makes good pancakes, but it helps us remember that she was a real human, not just an icon frozen in time. And hopefully it leads people to want to learn more from the source herself.

Photo of Martha Graham in dance with text: "Incubate, encourage, and promote digital innovation"

Ekstasis, No. 2. Library of Congress, Music Division. Reproduced with permission of Martha Graham Resources, a division of The Martha Graham Center of Contemporary Dance,

NDI has a tiny, tiny staff, which I like, actually. It helps us keep our focus and makes us very agile. We can try things, like a hackathon, without there being an impact on the critical production work of the Library of Congress. We can help make connections between staff members working on similar projects. I think of this work as being like an interface. I used to say semi-permeable membrane but that got a lot of weird looks. It’s the work of finding useful and interesting stuff in our community and making sure the people in LC who should know about it are connected. And it’s the work of letting the world know the cool and useful stuff that is going on here too.

Energy diagram of an exothermic reaction

Lowering the Activation Energy of a Reaction by a Catalyst

I think we can also provide a useful service in catalyzing innovative projects. You might remember from chemistry that catalysts lower the activation energy of reactions. In our world, connections, advice and support can make new ideas possible.

I often say that the library profession is poor (I mean, we’re not Silicon Valley) but we’re scrappy as heck and we like each other. The power of indefatigable people working together is like the stream over the stone – our problems may be immovable objects but we are focused, we are many and we have a very long memory.

Crowd shots of DPLAFest and the Archives Unleashed datathon

DPLAfest 2016. Photo by Jason Dixon
Archives Unleashed Hackathon. Photo by Jaime Mears

That is why one of the focuses of NDI will be on building relationships that can lead to successful partnerships, like our co-hosting of DPLAFest and the Archives Unleashed Datathon.

Matt Weber is up next, so in the interest of no spoilers, I won’t say much about the Hackathon that NDI co-hosted with other groups. But I will say that we had a great experience hosting coders, academics and librarians working together. In addition to making some new discoveries using data analysis on digital collections, the team collaborations showed just how important librarians are, acting as experts on the data and highlighting the evolving shape of reference in this new century.

I wonder if I could tell you a quick story about that. One of the groups was looking at a text analysis from a set of Supreme Court nomination websites and the results were looking a little funny. Some words didn’t make sense in context to the scholars but luckily a Law Librarian was sitting at their table. He explained a little bit about the contours of the data set and why they might be getting some of the artifacts they were seeing. And he suggested ways to refine that query to improve the quality of the results. It’s a great example of the unique service we provide in libraries.

Photo of a Scientific Laboratory

Scientific Laboratory. Photo by Prokudin-Gorskiĭ, Sergeĭ Mikhaĭlovich,

We’re thinking about ways to better support scholars and researchers working with digital material. The John W. Kluge Center hosts scholars from around the world to conduct research here at the Library of Congress. We’re working with the staff of the Kluge Center to determine how we can help scholars coming here to do computer-assisted research with the digital collections, such as visualization and network analysis.

We’ve asked two outside experts: Dan Chudnov and Michelle Gallinger to do a proof-of-concept for a digital scholars lab. Our goal in this pilot is to demonstrate what a lightweight implementation could look like and I’m really excited about it.

Photos and description from the Github accounts of Chris Adams and Tong Wang

Photos from the Github accounts of Chris Adams and Tong Wang.

I’d like to introduce Tong Wang and Chris Adams. They’re sharing NDI’s inaugural fellowship that seeks to sponsor work that demonstrates an innovative use of LC digital collections materials. They’re well on their way, exploring collections and some open source tools, and we hope they’ll have something to share with you all soon. We’re working to expand this program in future years so we can welcome a wider range of applicants. Don’t read too much into the bike gear – it’s not a requirement.

Illustration of spirals with the text "Collections as Data September 27th 2016 Library of Congress Open to the Public"

Art created by the User Experience Team of The Library of Congress, a team of professionals that are committed to making the Library’s collections more available and accessible to the American people.

As you can probably tell from our projects, our focus this inaugural year was on exploring Collections as Data, on seeking opportunities to get more value out of our own collections and on developing partnerships and friendships that can help advance the field. This summit is the public face to that work and one that we hope will grow into a sustainable program. And we’re exploring other ideas, like how we can enable more contributions to open source projects, what we can do to improve technical skill-building in libraries and how we improve shared infrastructure. All things that raise the tide, we hope.

Photo of Trees and a Bridge with the text "Thank you"

Theodore Roosevelt Island. Photo by Carol Highsmith,

Contact us! We want to work with you and we want to hear from you. Please drop us a line or read our blog, which we’ve recently relaunched with an expanded scope.

Like I said, we’re new and we are small, but I’m proud of what this team has been able to accomplish so quickly. I am particularly proud of holding this Collections As Data event, in which we invited librarians, programmers, archivists, researchers, artists, data journalists, thinkers and partners to help us explore this topic together. I hope the story about Henriette Avram inspires you to think about the power that an indefatigable and determined individual can have on the course of human knowledge. And looking around this room, I see hundreds of you. I see visionaries and even more important, a network of collaborators that can make those ideas happen.

It’s an exciting time to be in this field, let’s think big and make friends.

Call for Writers / LITA

blogger meme

meme courtesy of Michael Rodriguez

The LITA blog is seeking regular contributors interested in writing easily digestible, thought-provoking blog posts that are fun to read.

If you’re doing innovative work, finding ingenious solutions, or could conceivably be described as a “Library Technology MacGyver” we want you.

We are big on creativity and embracing new post formats, and 2016 saw the first LITA Vlog series, Begin Transmission. Whether you want to write a traditional blog post, create a podcast, host a Twitter chat, or stream a post with Periscope, we are eager to see what you make. Possible post formats could include interviews, how-tos, hacks, and beyond. Your posts can have an immediate impact, and you’ll find an audience of readers in the thousands. 

We embrace diverse formats and diverse voices. Library students and members of underrepresented groups are particularly encouraged to apply.

Writers contribute one post per month. A buddy system means you’ll be paired with an experienced blogger to help you draft and review your posts. You’ll join an active, supportive, and talented cadre of writers on whom you can rely for feedback. The average time commitment is between one and two hours a month.

If that sounds like too much of a time commitment, our guest contributor option may be a good fit for you.

To apply, send an email to lindsay dot cronk at gmail dot com by Friday, November 11th. Please include the following information:

  • A brief bio
  • Your professional interests, including 2-3 example topics you would be interested in writing about
  • If possible, links to writing samples, professional or personal, so we can get a feel for your writing style

Send any and all questions my way!

Lindsay Cronk, LITA blog editor




concerns about Reveal Digital’s statement about On Our Backs / Tara Robertson

This is my third post about Reveal Digital and On Our Backs. The first post in March outlines my objections to this content being put online. The second post has some contributor agreements I found in the Cornell’s Rare Books and Manuscripts collection and the notes from my talk at code4libNYS.

About a month ago Reveal Digital decided to temporarily take down the On Our Backs (OOB) content. I was happy to hear about this. However I’ve got several concerns about their public statement (PDF). First, I’m concerned that citing Greenberg v. National Geographic Society foreshadows that they are going to disregard contributor agreements and concerns and put the whole collection online. Second, I’m concerned that minors accessing porn is listed ahead of contributor privacy issues and that reflects Reveal Digital’s priorities. Finally, I’m glad that Reveal Digital has broadened their idea of community consultation from financial stakeholders to include publishers, contributors, libraries, archives, researchers, and others, however I’m still worried about whose voices will be centered in these discussions.


According to Reveal Digital, the Greenberg v. National Geographic Society ruling gives them “the legal right to create a faithful digital reproduction of the publication, without the need to obtain permissions from individual contributors”. ARL has a summary of this case and a 5-page brief written by Ben Grilliot, who was a legal intern for ARL at the time. I’m far from being an expert on US Copyright Law, but I understand this to mean that if Reveal Digital digitizes the entire run of OOB without making any changes, it doesn’t matter that the contributor agreements have limitations. Even if this is legal, it is not ethical.

The ARL summary says “The Copyright Act is “media-neutral,” and libraries believe that it should allow publishers to take advantage of new technologies to preserve and distribute creative works to the public.” I spoke to 3 people who modelled for OOB and none of them consented to have their photos appear online (PDF). As librarians we can’t uncritically fight for access to information, we need to take a more nuanced approach.


I’m puzzled by “minors accessing sexually explicit content” as the first reason Reveal Digital listed. I can understand that this might be a liability issue, but it’s not difficult to find porn on the internet, especially porn that is more explicit and hard-core than the images in OOB. I’m confused by this. Reveal Digital describes OOB as filling “an important hole in the feminist digital canon and is an essential artifact of the ‘feminist sex wars'”, so for me this is an unexpected reason. Their statement says that they need a window of time to make the necessary software upgrades to solve this issue. I’m disappointed that this reason is given ahead of contributor privacy.


I was really happy to read how Reveal Digital articulates the importance of contributor privacy:

On the more complex issue of contributor privacy, Reveal Digital has come to share the concerns expressed by a few contributors and others around the digitization of OOB and the potential impact it might have on contributor privacy. While we feel that OOB carries an important voice that should be preserved and studied, we also feel that the privacy wishes of individual contributors should have an opportunity to be voiced and honored.

I feel like the above statement shows that they really heard and understood the concerns that many of the contributors and I had.

Community consultation

I’m thrilled to read that Reveal Digital intends to consult with various communities including “publishers, contributors, libraries, archives, researchers, and others”.

Often when people talk about consultations they mention a need to balance interests. We reject that libraries are neutral, so we need to extend that understanding to community consultation processes like these. Contributors, especially many models, could have their lives damaged by this. Researchers seek to gain prestige, grants, tenure and promotion from access to this collection and don’t stand to lose much, if anything. Different communities have a different stake in these decisions. Also, these groups aren’t homogeneous–it’s likely that some contributors will want this content online, some will be OK with some parts, and others will not want any of it online. I hope that centering contributor voices is something that Reveal Digital will build into their consultation plan.

This isn’t the first digitization process that has needed community consultation. We can learn from the consultation process that took place around the digitization of the book Moko: or Maori tattooing or around the digitization of the second wave feminist periodical Spare Rib in the UK (thanks Michelle Moravec for telling me about this). Academic libraries can also learn from how public libraries build relationships with communities.

Redesigning our Subject Guides: Student-First and Staff-Friendly / Shelley Gullikson

I presented about our Web Committee’s redesign project at Access 2016 in Fredericton, NB on October 5, 2016. We started doing user research for the project in October 2015 and launched the new guides in June 2016 so it took a while, but I’m really proud of the process we followed. Below is a reasonable facsimile of what I said at Access. (I’ll link to the video of the session when it’s posted.)

Our existing subject guides were built in 2011 as a custom content type in Drupal and they were based on the tabbed approach of LibGuides. Unlike LibGuides, tab labels were hard-coded; you didn’t have to use all of them but you could only choose from this specific set of tabs. And requests for more tabs kept coming. It felt a bit arbitrary to say no to tab 16 after agreeing to tab 15.


We knew the guides weren’t very mobile-friendly but they really were no longer desktop-friendly either. So we decided we needed a redesign.

Rather than figure out how to shoe-horn this existing content into a new design, we decided we’d take a step back and do some user research to see what the user needs were for subject guides. We do user testing fairly regularly, but this ended up being the biggest user research project we’ve done.

  • Student user research:
    • We did some guerrilla-style user research in the library lobby with 11 students: we showed them our existing guide and a model used at another library and asked a couple of quick questions to give us a sense of what we needed to explore further
    • I did 10 in-depth interviews with undergraduate students and 7 in-depth interviews with grad students. There were some questions related to subject guides, but also general questions about their research process: how they got started, what they do when they get stuck. When I talked to the grad students, I asked if they were TAs and if they were, I asked some extra questions about their perspectives on their students’ research and needs around things like subject guides.
    • One of the big takeaways from the research with students is likely what you would expect: they want to be able to find what they need quickly. Below is all of the content from a single subject guide, and the highlighted bits are what students are mostly looking for in a guide: databases, citation information, and contact information for a librarian or subject specialist. It’s a tiny amount in a sea of content.

I assumed that staff made guides like this for students; they put all that information in, even though there’s no way students are going to read it all. That assumption comes with a bit of an obnoxious eye roll: staff clearly don’t understand users like I understand users or they wouldn’t create all this content. Well, we did some user research with our staff, and it turns out I didn’t really understand staff as a user group.

  • Staff user research
    • We did a survey of staff to get a sense of how they use guides, what’s important to them, target audience, pain points – all at a high level
    • Then we did focus groups to probe some of these things more deeply
    • Biggest takeaway from the research with staff is that guides are most important for their teaching and for helping their colleagues on the reference desk when students have questions. Students themselves are not the primary target audience. I found this surprising.

We analyzed all of the user research, looked at our web analytics and came up with a set of design criteria based on everything we’d learned. But we still had this issue that staff wanted all the things, preferably on one page and students wanted quick access to a small number of resources. We were definitely tempted to focus exclusively on students but about 14% of subject guide use comes from staff computers, so they’re a significant user group. We felt it was important to come up with a design that would also be useful for them. In Web Committee, we try to make things “intuitive for students and learn-able for staff.” Student-first but staff-friendly.

Since the guides seemed to have these two distinct user groups, we thought maybe we need two versions of subject guides. And that’s what we did; we made a quick guide primarily for students, and a detailed guide primarily for staff.

We created mockups of two kinds of guides based on our design criteria. Then we did user tests of the mockups with students, iterating the designs a few times as we saw things that didn’t work. We ended up testing with a total of 17 students.

Once we felt confident that the guides worked well for students, we presented the designs to staff and again met with them in small groups to discuss. Reaction was quite positive. We had included a lot of direct quotations from students in our presentation and staff seemed to appreciate that we’d based our design decisions on what students had told us. No design changes came out of our consultations with staff; they had a lot of questions about how they would fit their content into the design, but they didn’t have any issues with the design itself. So we built the new guide content types in Drupal and created documentation with how-tos and best practices based on our research. We opened the new guides for editing on June 13, which was great because it gave staff most of the summer to work on their new guides.

Quick Guide


The first of the two guides is the Quick Guide, aimed at students. I described it to staff as the guide that would help a student who has a paper due tomorrow and is starting after the reference desk has closed for the day.

  • Hard limit of 5 Key Resources
  • Can have fewer than 5, but you can’t have more.
  • One of the students we talked to said: “When you have less information you focus more on something that you want to find; when you have a lot of information you start to panic: ‘Which one should I do? This one? Oh wait.’ And then you start to forget what you’re looking for.” She’s describing basic information overload, but it’s nice to hear it in a student’s own words.
  • Some students still found this overwhelming, so we put a 160-character limit on annotations (a rough sketch of these constraints follows this list).
  • We recommend that databases feature prominently on this list, based on what students told us and our web analytics: databases are selected 3x more often than any other resource in subject guides.
  • We also recommend not linking to encyclopedias and dictionaries. Encyclopedias and Dictionaries were very prominent on the tabbed Subject Guides but they really aren’t big draws for students (student quotations from user research: “If someone was to give this to me, I’d be like, yeah, I see encyclopedias, I see dictionaries… I’m not really interested in doing any of these, or looking through this, uh, I’m outta here.”)
  • Related Subject Guides and General Research Help Guides
  • Link to Detailed Guide if people want more information on the same subject. THERE DOES NOT HAVE TO BE A DETAILED GUIDE.
  • Added benefit of the 2-version approach is that staff can use existing tabbed guides as the “Detailed Guides” until they are removed in Sept. 2017. I think part of the reason we didn’t feel much pushback was that people didn’t have to redo all of their guides right away; there was this transition time.
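
For anyone curious how those Quick Guide constraints might be enforced, here is a minimal TypeScript sketch. It is an illustration only, not our actual Drupal implementation; the KeyResource shape and the validateQuickGuide function name are assumptions made for the example.

    // Hypothetical sketch only -- not the library's Drupal code.
    // Checks the Quick Guide constraints described above:
    // no more than 5 Key Resources, annotations capped at 160 characters.
    interface KeyResource {
      title: string;
      url: string;
      annotation: string;
    }

    const MAX_KEY_RESOURCES = 5;
    const MAX_ANNOTATION_CHARS = 160;

    function validateQuickGuide(resources: KeyResource[]): string[] {
      const errors: string[] = [];
      if (resources.length > MAX_KEY_RESOURCES) {
        errors.push(`A Quick Guide can list at most ${MAX_KEY_RESOURCES} Key Resources.`);
      }
      for (const r of resources) {
        if (r.annotation.length > MAX_ANNOTATION_CHARS) {
          errors.push(`Annotation for "${r.title}" is over ${MAX_ANNOTATION_CHARS} characters.`);
        }
      }
      return errors;
    }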

Detailed Guide


  • From a design point of view, the Detailed Guide is simpler than the Quick Guide: accordions instead of tabs
    • Mobile-friendly
    • Students all saw all the accordions. Not all students saw the tabs (that’s a problem people have found in usability testing of LibGuides too)
  • Default of 5 accordions for the same reasons that Key Resources were limited to 5 – trying to avoid information overload – but because the target audience is staff and not students, they can ask for additional accordions. We wanted there to be a small barrier to filling up the page: once an editor adds the 5th section, the “Add another item” button is disabled and they have to ask us to create additional accordions (a rough sketch of this follows the list).
  • There’s now flexibility in both the labels and the content. Staff can put as much content as they want within the accordion – text, images, video, whatever – but we do ask them to be concise and keep in mind that students have limited time. I really like this student’s take and made sure to include this quotation in our presentation to staff as well as in our documentation:
    • When I come across something… I’ll skim through it and if I don’t see anything there that’s immediately helpful to me, it’s a waste of my time and I need to go do something else that is actually going to be helpful to me.
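
For readers wondering how that accordion cap might work in practice, here is a small browser-side sketch in TypeScript. It is an assumption-laden illustration, not our Drupal code: the element ids and class name are hypothetical, and in production the limit lives in the content type configuration.

    // Hypothetical sketch -- the real limit lives in our Drupal content type.
    // Disables the "Add another item" button once a guide reaches 5 accordions,
    // so editors have to ask the web team for additional sections.
    const DEFAULT_ACCORDION_LIMIT = 5;

    function updateAddAccordionButton(): void {
      // "accordion-list" and "add-accordion-item" are assumed ids for this sketch.
      const list = document.getElementById('accordion-list');
      const addButton = document.getElementById('add-accordion-item') as HTMLButtonElement | null;
      if (!list || !addButton) {
        return;
      }
      const count = list.querySelectorAll('.accordion-section').length;
      addButton.disabled = count >= DEFAULT_ACCORDION_LIMIT;
    }

    // Re-check the limit whenever an editor adds a section.
    document.getElementById('add-accordion-item')?.addEventListener('click', () => {
      // ...append the new accordion section here, then:
      updateAddAccordionButton();
    });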

And speaking of time, thank you for yours.

Putting Critical Information Literacy into Context: How and Why Librarians Adopt Critical Practices in their Teaching / In the Library, With the Lead Pipe

Image by flickr user jakecaptive (CC BY-NC 2.0)

In Brief

Critical information literacy asks librarians to work with their patrons and communities to co-investigate the political, social, and economic dimensions of information, including its creation, access, and use. This approach to information literacy seeks to involve learners in better understanding systems of oppression while also identifying opportunities to take action upon them. An increasing number of librarians appear to be taking up critical information literacy ideas and practices in various ways, from cataloging to reference. But what does it mean to make critical information literacy part of one’s work? To learn more about how and why academic librarians incorporate critical information literacy into their classroom instruction, I interviewed thirteen librarians living in the United States. This article describes why and how these individuals take a critical approach to teaching about libraries and information, including the methods they use, the barriers they face, and the positive influences that keep them going.

In these days of mass surveillance and the massive transfer of public goods into private hands, citizens need to know much more about how information works. They need to understand the moral, economic, and political context of knowledge. They need to know how to create their own, so that they make the world a better, more just place.

– Barbara Fister, “Practicing Freedom in the Digital Library: Reinventing Libraries” (2013)



Like many other librarians who teach, I stumbled upon my newfound job duties having no formal training or experience as a teacher. I led the students brought in by their professor through a maze of databases, books, and services within the space of one hour as best I could, and students were often sympathetic but uninterested. Even at this early stage in my career, while students quickly packed up their belongings and filed out of the library classroom, I couldn’t help but wonder if something crucial was missing. I realized I was having difficulty squaring the big reasons I became a librarian–to advocate for and widen people’s access to information, to find ways to contribute to the well-being of communities through a commitment to social responsibility, to fight as one of the last holdouts in a society where “sharing” and “free” are becoming endangered terms–with my primary responsibility as a teacher of, as one student put it, “how to do the library.”

I eventually found what I was searching for, but it took some time. I started to learn about the work in critical information literacy, and critical librarianship more broadly, that had been taking place for decades. I looked into examples of radical librarianship and activism that were sometimes mentioned fleetingly in my MLS program but more often absent altogether, like the efforts of Sanford Berman to identify and update or remove derogatory and harmful Library of Congress subject headings (1971), the Radical Reference collective (Morrone & Friedman 2009), and the Progressive Librarians Guild (PLG), founded in 1990. This long tradition of social justice work being done in and outside of libraries by all types of librarians led to my discovery of critical information literacy. I have briefly addressed some of the many inspirational books and articles of critical librarianship and critical information literacy (Tewell 2015), but the most impactful works for me personally were by Maria Accardi, Emily Drabinski & Alana Kumbier (2010), James Elmborg (2006), Heidi Jacobs (2008), Maria Accardi (2013), and Robert Schroeder (2014). Many of these are from Library Juice Press, a publisher specializing in library issues addressed from a critical perspective.

Critical information literacy aims to understand how libraries participate in systems of oppression and find ways for librarians and students to intervene upon these systems. To do so, it examines information, libraries, and the work of librarians using critical theories and most often the ideas of critical pedagogy. As stated by Lua Gregory and Shana Higgins, critical information literacy “takes into consideration the social, political, economic, and corporate systems that have power and influence over information production, dissemination, access, and consumption” (2013). Inspired by the books and articles I had been reading, I started using different critical information literacy approaches in my classes. As I tried these methods I became increasingly interested in how other librarians made critical information literacy part of their teaching. Having firsthand experience making critical information literacy “work” as a teacher and seeing students truly engaged with their learning and topics that matter to them was very transformative for me. Seeing the change that critical information literacy can make, and the ways it deepened how I approached my interactions with students, led to me being passionate about its potential.

I interviewed 13 librarians working in a variety of academic institutions, living in different regions within the United States, and with varied ages, ethnicities, genders, and ablednesses, to see how they practice critical information literacy within library instruction. I spoke with them by email and via Skype, and asked questions about how their teaching is influenced by critical information literacy, what barriers they faced, what benefits they saw, and what factors allowed their teaching to continue or even flourish. In her recent book Critical Information Literacy: Foundations, Inspiration, and Ideas, Annie Downey argues, “Critical information literacy looks beyond the strictly functional, competency-based role of information discovery and use, going deeper than the traditional conceptions of information literacy that focus almost wholly on mainstream sources and views” (2016). At its core, critical information literacy is an attempt to render visible the complex workings of information so that we may identify and act upon the power structures that shape our lives. How this may actually be done within our libraries is what I wished to investigate in this project.

How librarians learned about critical information literacy

Critical information literacy has become better known due to the efforts of many that started writing and thinking about it years ago. This shift from the margins toward the center that critical approaches to librarianship have made is observed by librarian Emily Drabinski in a recent keynote (2016). The interviewees I spoke with learned about critical information literacy in a variety of ways. How you learn about something can in many ways shape how you come to understand it, so I wanted to address this first to set the stage for other questions.

Most interviewees learned of critical information literacy from a colleague either in their workplace or at another library, coupled with an article or book related to the subject. This is described by one librarian who was searching for readings that discussed the cultural aspects of information literacy. Upon reading an article by Cushla Kapitzke recommended by a colleague, this librarian said, “the whole world stopped around me. And I just was just blown away, and I’d never read anything like it…it really spoke to me.” Indicative of critical information literacy’s growing popularity, two interviewees learned about the topic at conferences and unconferences, such as the first #critlib unconference. Though three interviewees discovered critical information literacy during their MLIS programs, only one did so through a formal class; another encountered it through a self-directed research paper, and a third while preparing a bibliography as a graduate assistant. One librarian’s critical practice at her diverse public university was informed by her background in anthropology, and she identifies the Ferguson uprising as the point where she began discussing social justice issues in relation to information in her classes. She soon after found that library workers were discussing these issues on Twitter at the #critlib hashtag and “realized after the fact that what I was doing fit into this CIL [critical information literacy] approach that was already in place.”

Coursework in areas other than librarianship was key for some interviewees, who learned about critical pedagogy or critical theory before finding an article or book related to critical information literacy. One librarian learned about critical information literacy while doing research for doctoral coursework after having read Paulo Freire and in particular Myles Horton, whose work with poor and undereducated people in rural Tennessee caused her to draw connections with her own work with underprepared students: “Importantly, the community (i.e. students) identified the skills they needed to learn. I began to see information literacy as one of those skills that is truly fundamental to living and working today. This idea really pushed me past what I had previously thought information literacy was.” Other librarians related their educational backgrounds in English, Women’s Studies, and Social Studies as priming them for the ideas of critical pedagogy as applied to information literacy.

A majority of interviewees learned about critical information literacy relatively recently, between 2011 and 2014. Yet three librarians, at a community college, four year university, and liberal arts college, mentioned they had already been practicing these same types of ideas before learning about the term, showing that one may very well use critical information literacy approaches without being aware of the name: “CIL felt like a natural extension of what I had already been doing, and I imagine I’d be practicing critical librarianship/IL even if it weren’t something of an established subfield of information literacy.”

How critical information literacy can be incorporated into classes

To gather a sampling of ways these librarians brought critical information literacy into their teaching, I asked them about a time when they incorporated critical information literacy into a class. The interviewees shared a wealth of examples for single sessions and credit-bearing courses, and a few of my favorite examples follow, which I appreciate for their creativity, applicability to a variety of settings, and potential for involving learners with critical concepts. It is important to note that librarians’ identities shape the ways they are able to pursue their work. What one librarian with marginalized identities may be able to discuss in terms of politically-oriented topics with students or negotiate with course instructors in terms of class content will differ from what librarians with privileged backgrounds are able to discuss or negotiate. Librarians with marginalized identities are more likely to face challenges in actualizing their critical information literacy practice.

One interviewee at a regional public university campus described an activity that asks students to explore library databases and present them to the class. This easy-to-implement idea turns the tables on lecture-dominated library instruction, and asks students to not just be involved, but to share their knowledge. As this librarian described:

When I do this activity, I don’t even turn on my projector to show [students] stuff on the screen to get them started–I just have them jump right in, even if they don’t really know what they’re doing. Relinquishing control of the demonstration disrupts the teacher/learner hierarchy of power and places power in the learner’s hands…it shows [students] that they have knowledge that is worth sharing and that they, too, can have power to speak and guide and teach. Their voice matters. The idea, of course, is that this leaks out of the walls of the classroom and into their lives and worlds.

Another librarian at a small college discusses power with students in terms of viewpoints represented within a database. “I love to talk about the role of power in information structures with students. One of the best ways to do this is to talk about what and who is and is not represented and why.” This librarian continues, “One way I’ve done this is to do a pre-search in a database on a topic with a bit of controversy and see if I can get a results list that is eye-opening…I had students look at the results list and evaluate the first 3-5 results and then we discussed their evaluation process…we talked about how information is created and who does the creating, including looking at who was funding the research in the peer review journals and who had ads in the trade journals.” This idea uses a database and the sources within to generate conversation and dialogue, and relates the evaluation process to the students instead of an external checklist.

One interviewee began a class discussion with a role-playing scenario: “Instead of simply demo-ing a database, I facilitated a role-playing activity in which [students] assumed the roles of scholars, and we then had a discussion about who gets to be a scholar and thus who has a voice in the literature. This was all new to them, and I think they were able to both understand what ‘the literature’ is and problematize academia in ways they hadn’t before.” She further explained, “when I did show them how to use a database, I was able to bring to their attention the ways in which information organization (subject headings) are also problematic, particularly when it comes to gender identity and sexuality…I think this one-shot was critical in that it not only allowed students to peek ‘behind the scenes’ (as far as how information is produced and organized in academia), but it also troubled these processes.”

These ideas sometimes come in flashes of brilliance, but more often they are the result of trying something small, revising it, and trying again. One librarian at a large research university began by carefully considering the language she used and how she applied her authority as a teacher. She followed these reflective practices by asking discussion questions regarding whose voices are missing from discovery systems, whether Google or a library catalog, and found that students responded to these questions with interest. This then led to the more intentional creation of instruction sessions centered around critical topics and discussions. Several interviewees mentioned they had been unsure whether they were “allowed” to do this type of instruction, particularly those librarians who began thinking about critical information literacy a few years ago when literature and conversations regarding the topic were scarce. Starting out small may help with these feelings of uncertainty, even though, as two interviewees pointed out by relating examples of when their classes did not go as planned, all teaching is difficult and critical information literacy instruction can be particularly demanding due to the emotional investment it often requires.

How classroom methods are used to practice critical information literacy

With an understanding of the ways critical librarians taught classes using a critical information literacy approach, I was curious whether they found that particular teaching methods were conducive for doing so. Their responses revealed a number of commonalities as well as some unique ideas. Looking over these different methods, they demonstrate that critical information literacy has the potential to uncover and question some very big issues and norms while simultaneously being something that is very do-able.

Creating opportunities for interaction between the librarian and students was a frequent goal. Nine interviewees mentioned class discussions as their teaching method of choice. One librarian spends a great deal of effort fostering discussions, stating, “I spend more time now developing the questions I am going to ask than any other part of my planning because if you don’t ask the right questions, the conversation never reaches the level it needs to.” Another librarian at a comprehensive public university carefully centers student questions: “One method I use when I teach many of my classes for graduate students or doctoral students is I base the class on their questions…I give them time to talk amongst themselves about what they want to know, then I ask them. I write their questions on the board and tell them I’ll base the class on these questions, and that they should ask more if they have them.” This method “shows the students I want to try to answer their questions – they are the most important. It also parallels the kind of work they will see the librarians do with them at the reference desk or in individual research consultations.”

Certain activities and teaching techniques were shared by interviewees, including the Jigsaw Method, with which one librarian has “experimented over the past several years…using small groups that then convene into a larger group to guide conversations about exploring databases and evaluating sources–and not just evaluating, but collaboratively developing criteria to evaluate,” and activities that range from “group work exploring a variety of sources surrounding the murder of Trayvon Martin to acting out a scholarly debate on the coming out process.” Others had success using search examples to introduce critical ideas, such as prison abolition or Black Lives Matter. “I always try to use a search example/keywords/ideas that, hopefully, will expose students to a set of results that gets them thinking about an important topic,” one interviewee stated. For example, in an online tutorial one librarian used the research question, “How does air quality affect women’s health?” which is relevant to their student population in terms of geography, health, and economic disparity, but also draws attention to the gendered dimensions of environmental racism.

Another method for teaching critical information literacy is adapted from Paulo Freire’s concept of problem-posing, wherein teachers and learners co-investigate an issue or question of importance to them. One librarian at a small college described a successful example of problem-posing in library instruction, noting that she has “started asking the faculty to help me think of a problem the class could work on together, which I think is the best thing to have happened for my teaching in a long time.” Noting that it took her a great deal of time to reach the point where she is confident in asking faculty for something in exchange for helping their students, she describes an example:

[We looked into] when a specific law was passed and who the primary players were in passing the law. This sounds simple, but there was misinformation all over the place about this law. The Wikipedia entry was wrong and had been cited over and over so the wrong date was starting to appear as the “official date.” This was wonderful for our purposes because we had conversations about source type, government documents, how information gets perpetuated, sourcing and evaluation, etc. Essentially, we are able to problematize information consumption and dissemination with this one little question. The students were very into it.

A small but meaningful change to a common part of library instruction–an overview of the services a library provides–was described by one interviewee: “Instead of telling students about the services that we have, I might actually have the students find one or two things that they didn’t know about the library, and share with each other.” This is a way to “change the expectation that I’m going to be the person standing there and telling them what they need to know. That they can also, you know, construct their own knowledge. And perhaps learn, with maybe my guidance, learn from each other.” More than a third of the librarians I spoke with found that reflection was key to their instruction. “Allowing students time to reflect or posing questions that ask them to consider how/if the lesson is meaningful to them is an important part of the classroom experience for me,” one person affirmed. “Ideally, it adds a small jolt to their experience and communicates to students that I’m here for them, that I want to be useful and a purposeful addition to their classroom, not some intruder with my own agenda.”

The difference in critical approaches to IL does not always relate to the method, and is instead more likely to be based upon the social and political orientation of one’s instructional aims: “My set of methods haven’t changed that much since I started to shift to a more critical focus, it’s the topics that we are discussing that have shifted.” One interviewee describes what worked and what they would like to further pursue in a credit-bearing course, which corresponds to the demographics of their institution:

[A] discussion that went fairly well but that I would like to explore more next year revolved around the idea that publishing academic stuff is a mechanism for someone or a group to gain/earn authority or academic legitimacy. I had walked students through the peer review process, why it is important yet flawed, who engages in it and why, etc. Then, because another course outcome dealt with understanding “the disciplines,” we moved to the history of Chicano/a studies and its struggle for legitimacy in the academy. One of the ways in which it contributed to academic discussions/developed a canon and thus gained legitimacy in the world of higher education, was by establishing its own scholarly journals. My class discussed this a bit, but I’d really like to make this the focal point of my course next year.

How theoretical understandings inform the practice of critical information literacy

In order to better understand the various ways the librarians I interviewed thought of critical information literacy, I asked if there were theoretical or conceptual understandings that influenced their work. I clarified that these could be theories, ideas, or writings related to education, social justice, libraries, or other things meaningful to them. Many interviewees conceived of their teaching aims in terms of critical pedagogy. As described by Lauren Smith, critical pedagogy argues that “learners can only truly learn to think critically if they are also able to challenge the problems within power and knowledge structures in their educational environment as well as the wider world” (2013, p. 19). For Alana Kumbier, “Critical pedagogy offers tools we can use to denaturalize and evaluate phenomena that are often understood as inevitable, like economic or cultural globalization, or natural, like a binary sex/gender system, or just accepted, like the authority of an encyclopedia entry” (2014, p. 161).

Many interviewees cited readings that influenced how they thought about the goals and realities of formal education–what one person referred to as the “classics” of critical pedagogy. “While I was in grad school we read Freire, and Giroux, all those sort of classics. bell hooks, Teaching to Transgress, I loved too…If someone else were to say to me, ‘Oh, I’m sort of interested in this critical pedagogy, what is this about?’ those would be the things I would pull off my shelf with excitement and say, ‘You have to read these!’” This same interviewee was also influenced by critical race theory, finding it extremely useful for ideas about building inclusive classroom environments. Critical race theory was discussed by four interviewees, and the Handbook of Critical Race Theory in Education (Lynn and Dixson, 2013) was mentioned as one key resource in this area.

Feminist pedagogy was another area of theory and practice that inspired critical librarians. “For critical pedagogy and theory, I really respond to Freire, Mezirow, Shor, and hooks,” one interviewee replied. “And then feminist pedagogues who have problematized critical pedagogy like Jennifer Gore, Elizabeth Ellsworth, and Patti Lather. I like the critiques of critical pedagogy by these authors because they address the fact that it is really hard to pull critical pedagogy off in our institutions.” Maria Accardi’s 2013 book Feminist Pedagogy for Library Instruction was mentioned by four interviewees as directly informing their practice, particularly feminist pedagogy as an educational approach that “honors student experience and voice, has social justice aims, and is attentive to power dynamics in the classroom.”

One librarian’s upbringing and personal sense of social justice fundamentally informs her work: “I think I am primarily motivated by a strong inner sense of social justice more than anything. I learned about Paulo Freire and the banking method and conscientização [critical consciousness] and all of that…which helped to align my already innate social justice framework with the educational environment. But that sense that I am here on this planet to help make it better was already inside of me…my social justice framework is profoundly informed by my Catholic upbringing and education.” Another interviewee found student rallies and local issues on their campus, such as the cutting of positions in the university’s Ethnic Studies department, to be impactful on a personal level. The academic disciplines interviewees studied as undergraduate or graduate students were also highly influential, whether literature, cultural studies, anthropology, or journalism.

How critical information literacy is beneficial

As bell hooks writes, because “our educational institutions are so deeply invested in a banking system, teachers are more rewarded when we do not teach against the grain” (1994, p. 203). Why do librarians teaching against the grain choose to do so? What impact might this approach to teaching make? One question I asked interviewees was whether they find critical information literacy beneficial. The reasons they gave were largely related to the engagement they saw with students as well as their own newly discovered or rediscovered commitment to their work as librarians.

Bryan Kopp and Kim Olson-Kopp argue that “the development of critical consciousness in a library setting depends first and foremost on humanizing, or putting a face on, research, and grounding it in the realities which shape it” (2010, p. 57). For the librarians I spoke with, having a basis in critical information literacy enabled them to be more engaged teachers who were able to bring their whole selves into the classroom. “I don’t think I’d still be doing what I’m doing if I hadn’t learned or figured out that I could use critical information literacy in the classroom, because I would be so burned out and bored by point-here-click-here teaching,” one interviewee states. Another librarian found the demands of this type of teaching have made them a better instructor: “It forces me to be self-reflective and challenges me to go outside of my comfort zone of knowing exactly what to do in front of a class. I’m definitely uncomfortable when I don’t have a concrete plan, but I believe that it really does benefit students more when I’m not following a rote plan and can instead allow for diversions.” These benefits of critical information literacy for instruction librarians are related to those for students, in particular fostering a sense of purpose and meaning.

Critical information literacy’s “student-centered emphasis” was influential for one interviewee, which has “meant moving beyond simply discussion or inquiry-based learning, and really bringing the students’ needs or knowledge or perspective to the fore, as much as possible.” Sincerely valuing student knowledge and the perspectives they bring, as well as finding ways to make this knowledge a meaningful part of classes, was discussed by several librarians. This focus on student-centeredness and its effects is described by one interviewee:

When classes are conducted in critical ways I think students get to hear their own voice and hear their own experiences validated. They see themselves, their whole selves, as part of the academic enterprise, an enterprise that they can change for the better. They can realize that the questions that really matter to them, many of which relate to power, are valid academic questions. It can help them make sense of the strange new place (to many) called the university.

Annie Downey notes that one “issue that arises from librarians’ lack of teacher training is that they struggle with finding ways to make their instruction meaningful to students. They often confront the problem of students being unaware to relate the information they are supposed to learn in library instruction sessions to what they may be doing in their classes or to their lives in any meaningful way” (2016). For critical librarians and their students alike, it appears that critical information literacy is one way to provide this meaningfulness.

One unexpected finding was that critical information literacy’s associated ideas and approaches to teaching were a way for librarians to connect with faculty and course instructors in a wide range of institutional settings. Many of these connections were related to shared pedagogical approaches or interests. One interviewee at a large research university said critical information literacy helped her “build a bridge” with teaching faculty and instructors in that these critical approaches to information act as “a shared language that we now have,” and as such, are a positive factor in developing collaborations such as alternative research assignments. For another librarian at a small liberal arts college, critical information literacy is a way to begin a conversation about going past the one-shot instruction model. “When someone comes to you and says, ‘I want you to come to my class and do a demo of JSTOR, can we have fifteen minutes of your time?’…I can comment back and say, ‘How about, if instead, we do this?’…Critical information literacy is really helpful to open up that dialogue again in a more meaningful way. And talk about the goals of instruction, and not just sort of fall back to that familiar routine and model.” Another interviewee found it a way to connect with faculty regarding shared interests in social justice issues: “As I’ve been exploring CIL I’ve been continually amazed at how many other faculty on campus come from a critical or social justice background! CIL helps me make immediate and deep connections with the faculty I relate to and with whom I work with as I teach IL sessions.”

One librarian at an urban public university observed how the campus setting impacted their teaching and the interests of students and faculty: “We’re a campus that’s really not far from the center of the city, where there are huge protests that happening downtown…even if I am not talking about those things, because…I don’t teach a class on my own, students might be thinking about these issues anyways in one their other classes, or maybe in the work that they do outside of class. So I think all of that makes it more easier for me to incorporate critical information literacy. Because, I feel like they also get it…they understand.” She added that instructors at her institution also generally think about the same types of political issues, such as protests and social movements occurring both on campus and off. Another librarian in a different setting found inspiration for her classes at a small private college in a rural area through an event that transpired near her campus. The U.S. Federal Government conducted a raid on immigrant workers at a meatpacking plant, the largest such raid at that point in time. In discussing this raid, the librarian asked students to gather materials from a variety of sources, including the College Archives, that documented or addressed the raid and its implications, and to evaluate what arguments each source was making as well as how it might or might not be useful to students’ research. For these librarians, a great deal of inspiration was found in events that impacted their nearby community and were important to students.

How barriers shape the practice of critical information literacy

The challenges faced by critical information literacy practitioners are important to consider because they identify obstacles that may then be more easily recognized and addressed. To learn more about challenges critical librarians face, I asked about barriers they experienced in making critical information literacy part of their classroom practice.

Far and away, the one-shot instruction model and a lack of time were the primary barriers. This is partly because critical information literacy requires a significant time investment: “It takes more time to enact critical information literacy instruction–time to plan, time to reflect. This is not the kind of teaching you can do on autopilot.” Another interviewee makes an important distinction between finding time and making time: “I don’t think we ever ‘find’ the time, we can only make the time for things we think important. So making the time to make critical information literacy important is the key.” Although prioritizing what one values in their work matters, a related challenge is that of the single instruction session that many academic librarians make do with. “Obviously, if that’s the best you have to work with, I fully encourage anyone to do what they can,” one person says. “But it takes a certain degree of trust for students to take risks and challenge hegemonic assumptions – and showing up to one class session is not enough time to develop that trust.” Five other interviewees made similar remarks about the limitations of this common teaching scenario.

Having the confidence or courage necessary for a critical approach to teaching as well as the support essential to do so was also discussed. One librarian found this intimidating at times: “When you’re committing to a way of teaching and learning that is mostly outside the norm, or outside what you’re most familiar with, it can feel scary,” while another interviewee noted how colleagues’ perspectives and their newness to a job impacted their ability to practice critical information literacy: “I struggle to feel confident in my professional actions in general, particularly when they differ from what my co-workers are doing or from what I’ve done in the past. Add to that the fact that I’m still new to my position…and that can lead to a lot of questioning and second-guessing on my part.” Others described being the only one at their institution teaching with critical information literacy as lonely, since “People don’t always get what you’re trying to do, even if you try to explain it to them.” An interviewee at a large institution felt confident in her teaching practice but misunderstood by colleagues, explaining that they “think that I’m wasting my time, or that I’m just being a bit too much of like a warrior, and take myself too seriously.”

Critical librarians also faced resistance from students, for a variety of reasons. Most often this was because critical information literacy is not always comfortable or enjoyable for students who are accustomed to lectures and passive means of education:

Students don’t always recognize critical pedagogy as teaching, because it doesn’t look like most of the teaching they’ve experienced before. And maybe they don’t want to be actively engaged; maybe they just want to be lectured to, to be passive. Similarly, the teaching faculty member also may not recognize or understand what you’re doing as real actual teaching and may try to interfere or undermine you while you’re in the middle of teaching.

While interviewees typically found a few students in their classes willing to engage, getting all students on board and changing their expectations, especially within a 50 to 75-minute session, proved difficult. Another concern was balancing critical information literacy topics and methods with immediate student needs and course instructors’ expectations: “This is a difficult balance because not only do I feel pressure to give the instructor and students what they want and expect (database demonstrations), I really need them to understand the practical use of this information and how to be engaged actors within their learning experience.” Helping students succeed in their academic work so they can complete their assignments, receive their grades, and attain the degree they seek is by no means incompatible with the goals of critical information literacy, but requires attention of its own. In an article on the tension between neoliberal definitions of student success and critical library instruction, Ian Beilin notes the necessity of reconciling student preparation and the demands they must meet with teaching approaches that emphasize broader structural ideas: “Especially for first-generation students, students of colour, and working-class students, librarians have a responsibility to teach skills, so many of which more-privileged students have already acquired” (2016, p. 17). One possibility for meeting this challenge is for library educators to “encourage alternative definitions of success while at the same time ensure success in the existing system,” which can be a necessary but difficult balance to strike (2016, p. 18).

Apart from some experiences with student resistance and pedagogical challenges, the librarians I spoke with also faced difficulties in terms of faculty expectations of what information literacy instruction is or could be. As noted by one interviewee: “I had an activity where the students were evaluating different sources and sharing, like going up to the front of the class and sharing what they found, and demoing the resource themselves…and the instructor just cut them off. Because they wanted just the traditional librarian standing up there, telling them where to click.” Several librarians stated the biggest barrier to critical information literacy was faculty and course instructor expectations. Part of the reason for this lies in the power differentials between librarians and faculty, described by a librarian at a small liberal arts college:

I move carefully with faculty because they are the ones holding the power. If I want to have them bring their classes in and send their students to me, I have to be very respectful of their wishes. I am even more careful with more established faculty. At a small college, especially, a wrong move with the faculty can seriously undermine your ability to do anything with a specific department or can appear to reflect badly on your performance in the eyes of supervisors in the library.

Related to the challenge of faculty expectations is that of course instructors impinging on librarians’ classroom decisions or interjecting themselves into discussions. One interviewee noted the raced and gendered dynamics among librarian teachers and faculty, and in particular the tendency for certain faculty to interrupt librarians when they are in the middle of teaching: “My supervisor who’s an older white woman…she can really command a class. She’s also an excellent teacher. But would an instructor tell her to stop doing what she’s doing? I don’t think so….Someone like me, I look young, and am a woman of color.” This observation draws attention to the challenges that librarians with marginalized identities are likely to face in classroom environments.

Beyond student resistance and faculty expectations, the broader environment in which education takes place was discussed as a challenge. The increasing corporatization of higher education, and the expectation fueled by universities that students “invest” in an education in order to receive a “return” such as a high-paying job, are major factors at odds with the goals of critical information literacy: “Though we get great support and have the enthusiastic backing of the administration, it’s impossible to ignore that there is an explicitly career-oriented education on offer here. This undermines efforts to make the student the subject of the learning process rather than the object and certainly cuts against any effort to liberate learning from creeping corporatization.” A related issue is the culture of assessment embraced by many universities. The tension between the values of assessment and reporting and critical approaches to education is related by one interviewee: “I’ve chaired our campus Academic Assessment Committee before; I fully understand the stakes of assessment in higher ed in a state-funded institution. Assessment culture privileges ways of teaching and learning that are quantifiable. I can’t put ‘changed lives and enacted social change’ on a rubric, but I am pressured to report student learning findings in ways that are rubric-able.”

How factors contribute positively to the practice of critical information literacy

While critical librarians face significant challenges, it is important to also understand what factors enable their practice and help it to thrive. The librarians I interviewed identified several elements: seeing critical information literacy work–something that has been a big contributing factor in my own practice–having a community of librarians who are attempting similar things, having colleagues at their own workplace to talk and collaborate with, and online spaces for discussion such as #critlib on Twitter.

“When you know you’ve been successful, that you have had an impact on the student in some way, this reinforces the practice of this kind of teaching in a very positive, affirming way,” one librarian wrote. “Knowing I’ve had an impact, for me, is more through informal observation and conversations, through the questions students ask, or what they write down on the 3-2-1 sheet I use a lot.” Seeing critical information literacy work in their classes, and knowing firsthand the difference that this approach can make, was cited by several interviewees as being a major positive factor. As one person stated, “the biggest thing is my past experiences where it’s worked. Where things have happened in a classroom and people have said things that I never, ever, ever, would have thought or said.” These opportunities for unexpected and authentic conversations inspired several critical librarians. In considering the challenges faced in her critical and feminist pedagogy, Maria Accardi writes, “What gave me hope, what kept me going, what helped me remember that feminist teaching is worth the effort and difficulty, was that even amongst all my failures and flops, there were shining moments of success” (2013, p. 3).

Librarians need to not just see that this approach to teaching works for them and their students, but to be supported in their efforts. For some librarians I spoke with, this meant finding a community of colleagues interested in critical librarianship. “Being connected to a like-minded community who also cares about critical information literacy instruction is hugely important. Without having anyone else to talk to about it, I don’t know if I’d have the fortitude to keep doing what I’m doing,” notes one interviewee. Another librarian came to the same conclusion: “The biggest positive factor is talking to other librarians trying to do the same thing. This is wonderfully beneficial at conferences, but even better was when people at my institution got interested.”

Just as finding a broader community of librarians helped foster critical practices, locating colleagues at one’s own institution, inside the library or out, was important for the same reasons. One interviewee identified this as building allies: “A big thing for me is when I can build allies. Whether that’s with the course instructors, or in the library. So I tend to sort of, try to find people, try to find instructors who I know are engaged with Women’s and Gender Studies, or some other kind of department that thinks about these kinds of things and uses these methods already.” Five other interviewees noted that they made strong connections with faculty colleagues and this contributed positively to their critical information literacy efforts. In describing the experiences of critical librarians she interviewed, Downey found a similar theme: “Even having just one other person at their institution who is cognizant of and uses critical information literacy can make a big difference for librarians’ comfort level with critical approaches and content” (2016, p. 141). At the same time, Downey makes a strong case for the necessity of librarians, solo or with others, working to change the direction of teaching at one’s institution, for “teachers inspire other teachers and trust other teachers” (2016, p. 145).

One tool that interviewees used to connect with others was the “critlib” hashtag on Twitter. Short for “critical librarianship,” this hashtag is used for chats on a variety of topics as well as a way to converse with other librarians following the hashtag. One interviewee discusses the helpfulness of #critlib in establishing an online meeting place for critical librarians: “The #critlib community and artifacts they create (conferences, website, etc.) have been really helpful, not necessarily for things like lesson planning or creating activities, but for giving me content to think about that I can then integrate into the classroom…It’s also really nice to know there are people out there thinking and excited about the same things as me.” For one librarian at a liberal arts college, #critlib provided an umbrella for different theories and approaches that could later be applied: “I sort of fumbled my way through information literacy at first, and that’s why I find #critlib very helpful. The nice thing about #critlib is that it has help [sic] provide a (loose) framework for bringing together many different theories and approaches that could be considered critical.”


The librarians I interviewed appreciated the fact that this approach to librarianship has been blooming. This shift was described by one librarian who has been interested in critical librarianship for several years: “The Progressive Librarians Guild and Library Juice Press have been going for some time, but criticality seems to be flowering these days. Just a few years ago I was looking for a conference to go to actually talk to some live folks about critical pedagogy. I looked all over the web in all sorts of disciplines and countries and didn’t hardly find anything, and definitely nothing LIS related.” But now there are many events, from those on critical librarianship specifically, such as the first #critlib unconference, events organized by the Radical Librarians Collective in the UK, the 2016 Critical Librarianship and Pedagogy Symposium, and the Critical Information Literacy Unconference held prior to the European Conference on Information Literacy, to the presence of critical sessions in large American conferences such as ACRL and ALA.

In noting the increasing popularity of critical information literacy, one interviewee urged librarians to continue applying critical thought to other areas as well: “Instruction doesn’t stop at the classroom door, and then this whole process of conscientization doesn’t just stop there as well.” Critical information literacy is not limited to teaching, and thinking broadly about the implications of libraries can encourage positive changes. Relatedly, critical librarianship must be informed by diverse perspectives. The issue of perceived barriers was brought up as one problem: “the biggest concern that I have…is that there are these perceived barriers, and that there is a price of admission. That you cannot be a critical librarian until you’ve read these six books, or have this degree, or this background. That really bugs me. Just because it’s the antithesis of what critical pedagogy is supposed to be, which is valuing the experiences and understandings that everybody brings.” One thing these interviews clearly showed me was that every person’s background contributed immensely to their critical practice, and it was drawing upon their individual passions that made them the librarian they are.

Regarding future directions for critical information literacy, some interviewees responded that they wished for information literacy to become “something that by nature needs to be critical”: “My hope is that someday there will come a time when information literacy and critical information literacy are the same thing. And we don’t have to live with this older model of, you know, dealing with the tools kind of instruction.” In contrast to this hope that information literacy and critical approaches would become one and the same, another person made the point that, “I also don’t think that it has to be for everybody. We are all very different people and we don’t all have the same political views, and…we definitely don’t have the same philosophies when it comes to our approach to being librarians. So, at the same time, if you don’t feel like this is for you, that’s also fine.”

Whether or not one makes efforts to adopt a critical approach to librarianship through action, reflection, and theory, the relationships we develop with our communities and ways to meaningfully work towards creating a better world should be a central consideration. “The way we research, the way we teach, the way we practice our profession are all really building relationships with scholarly communities and with students that are becoming part of scholarly communities,” one interviewee wrote. “So one thing about getting into critical practices is that I’m connecting with a whole network of folks that think deeply and believe a better future is possible. Librarians, scholars, and students.”

As I finished one interview, the librarian I was in contact with shared several reflective questions, stating, “These are the things that I try to reflect on to distance myself from the daily grind and getting caught up in the monotony or frustrations of work. I thought you might enjoy them as well.” I would like to conclude with some of those questions that were offered, with the hope that readers might take them as an invitation to reflect intentionally upon their work and themselves.

  • What are some existing forms of oppression our students engage with at the academy?
  • How do librarians reinforce those systems of oppression in the classroom or inadvertently within library practices? How do our assumptions work their way into our teaching practices?
  • What are some ways in which you design the classroom experience to be a democratic, collaborative, and transformative site?
  • How do we balance the lived experiences of our students with “canonical” sets of knowledge and skills that they are required to learn?
  • How do you view your role as an academic librarian and its relationship to social justice?

A sincere thank you to the reviewers who contributed their insights and expertise to make this a better piece: Lauren Smith, Sofia Leung, and Ryan Randall. Thank you to Annie Pho for serving as the Publishing Editor for this article and seeing it through the Lead Pipe publication process. The Institute for Research Design in Librarianship gave me the methodological footing I needed to begin this project and the camaraderie I needed to see it through. A special thanks to the 13 librarians I interviewed. It was a true pleasure to talk with them about critical information literacy and their work, and they left me inspired and hopeful that librarians and libraries can help create positive social change.

Works Cited

Accardi, Maria. (2013). Feminist pedagogy for library instruction. Sacramento, CA: Library Juice Press.

Accardi, Maria T., Emily Drabinski, and Alana Kumbier, eds. (2010). Critical library instruction: Theories and methods. Duluth, MN: Library Juice Press.

Beilin, Ian. (2016). “Student success and the neoliberal academic library.” Canadian Journal of Academic Librarianship 1, no. 1: 10-23. Available at: (retrieved 19 September 2016).

Berman, Sanford. (1971). Prejudices and antipathies: A tract on the LC subject heads concerning people. Metuchen, NJ: Scarecrow Press.

Downey, Annie. (2016). Critical information literacy: Foundations, inspiration, and ideas. Sacramento, CA: Library Juice Press.

Drabinski, Emily. (2008). “Teaching the radical catalog.” In Radical cataloging: Essays at the front, edited by K.R. Roberto, 198-205. Jefferson, NC: McFarland and Company. Available at: (retrieved 19 September 2016).

Drabinski, Emily. (2016). “Critical pedagogy in a time of compliance | Information literacy keynote, Emily Drabinski.” YouTube video, 01:09:36. Posted by Moraine Valley Community College Library, May 3, 2016.

Elmborg, James. (2006). “Critical information literacy: Implications for instructional practice.” Journal of Academic Librarianship 32, no. 2: 192-199. Available at: (retrieved 19 September 2016).

Ettarh, Fobazi. (2014). “Making a new table: Intersectional librarianship.” In the Library with the Lead Pipe. Available at: (retrieved 19 September 2016).

Fister, Barbara. (2013). “Practicing freedom in the digital library.” Library Journal, 26 August. Available at: (retrieved 19 September 2016).

Gregory, Lua, and Shana Higgins, eds. (2013). Information literacy and social justice: Radical professional praxis. Sacramento, CA: Library Juice Press.

hooks, bell. (1994). Teaching to transgress: Education as the practice of freedom. New York: Routledge.

Jacobs, Heidi L.M. (2008). “Information literacy and reflective pedagogical praxis.” The Journal of Academic Librarianship 34, no. 3: 256-262. Available at: (retrieved 19 September 2016).

Kopp, Bryan M., and Kim Olson-Kopp. (2010). “Depositories of knowledge: Library instruction and the development of critical consciousness.” In Critical library instruction: Theories and methods, edited by Maria T. Accardi, Emily Drabinski, and Alana Kumbier, 55-67. Duluth, MN: Library Juice Press.

Kumbier, Alana. (2014). Interview by Robert Schroeder. In Critical journeys: How 14 librarians came to embrace critical practice, 157-173. Sacramento, CA: Library Juice Press.

Lynn, Marvin, and Adrienne D. Dixson, eds. (2013). Handbook of critical race theory in education. New York: Routledge.

Morrone, Melissa, and Lia Friedman. (2009). “Radical reference: Socially responsible librarianship collaborating with community.” The Reference Librarian 50, no. 4: 371-396. Available at: (retrieved 19 September 2016).

Schroeder, Robert. (2014). Critical journeys: How 14 librarians came to embrace critical practice. Sacramento, CA: Library Juice Press.

Smith, Lauren. (2013). “Towards a model of critical information literacy instruction for the development of political agency.” Journal of Information Literacy 7, no. 2: 15-32. Available at: (retrieved 19 September 2016).

Tewell, Eamon. (2015). “A decade of critical information literacy: A review of the literature.” Communications in Information Literacy 9, no. 1: 24-43. Available at: (retrieved 19 September 2016).