Planet Code4Lib

Membership-driven news media / Casey Bisson

Journalism is facing both a trust crisis and a sustainability crisis. Membership answers to both. It is a social contract between a news organization and its members in which members give their time, money, energy, expertise, and connections to support a cause that they believe in. In exchange, the news organization offers transparency and opportunities to meaningfully contribute to both the sustainability and impact of the organization. Elsewhere it continues:

Weeknote 43 (2020) / Mita Williams

This was the week that I planned to remove myself as much as possible from my regular working responsibilities and reconnect with my chosen community of Access 2020 which is the GOAT of conferences, in my books.

This did not happen.

Instead, I ended up working on a variety of management-related responsibilities and caught what Access sessions I could, asynchronously. I mention this not as a consideration for myself as some sort of martyr but because middle management work is work that can be devalued by both librarians and administration.

I was able to watch the opening keynote. Jessie Loyer’s talk on indigenous language revitalization through the lens of technology was everything an opening keynote should be: welcoming, questioning, challenging, and illuminating.

I also want to give a special shout-out to Shelley Gullikson’s “Web librarians who do UX: We are sad, we are so very very sad”.

IMHO: Leadership/management/librarians must understand that charging individuals with the responsibility of the library website without the authority to make those changes without consensus or vote taking from librarians is nothing less than the abject rejection of professional expertise of UX librarians.

I say this as a former UX librarian who also found a relief from sadness in Scott Pilgrim.

Another Access presentation that I very much enjoyed was Amy McLay Paterson’s What is a Library Website, Anyway?

The library website is many different things to many different people, but in the academic context, it is primarily thought of as a research portal. But Paterson suggests that considering the library as a contribution to student success should not be completely overshadowed.

Later in the day, after I had watched Amy’s presentation, I tried to catch up on some of my reading and found this article, Creating a Student-Centered Alternative to Research Guides: Developing the Infrastructure to Support Novice Learners, that rhymed with some of the concerns Amy raised earlier.

Ruth L. Baker (2014) suggested that LibGuides could be used more effectively if they were structured as tutorials that guided students through the research process. Such guides would “function to reduce cognitive load and stress on working memory; engage students through metacognition for deeper learning; and provide a scaffolded framework so students can build skills and competencies gradually towards mastery.”28 In one of the few studies conducted to assess the impact of research guides on student learning, Stone et al. (2018) tested two types of guides for different sections of a Dental Hygiene first year seminar course. One guide was structured around resource lists organized by resource types (pathfinder design) while the second was organized around an established information literacy research process approach. The results showed that students found the pedagogical guide more helpful than the resource guide in navigating the information literacy research process. Stone et al. concluded that these pedagogical guides, structured around the research process with tips and guidance explaining the “why” and the “how” of the research process, led to better student learning.29

Jeremiah Paschke-Wood, Ellen Dubinsky and Leslie Sult, “Creating a Student-Centered Alternative to Research Guides: Developing the Infrastructure to Support Novice Learners”, In the Library with the Lead Pipe, 21 Oct 2020

I take some comfort from the conclusions above.

Recently I was asked to give a 3-hour lecture to a small class of graduate students from the University of Windsor’s Great Lakes Institute for Environmental Research. I found that I needed some form of scaffolding to frame the information I was about to present or students (and I) would feel terribly lost. I opted to structure the class around The Open Science Research Cycle, based on Jeroen Bosman and Bianca Kramer’s work on academic workflows at

In a perfect world, my set of H5P slides of The Open Science Research Cycle would be finished in time for the last day of Open Access Week, but here we are.

Political bias in social media algorithms and media monetization models / Casey Bisson

New reports reveal yet more structural political biases in consumption and monetization models. More evidence has emerged about how Facebook tilted its algorithm in favor of conservative voices: In fact, we have now learned that executives were even shown a slide presentation that highlighted the impact of the second iteration on about a dozen specific publishers—and Mother Jones was singled out as one that would suffer, while the conservative site the Daily Wire was identified as one that would benefit.

OCLC-LIBER Open Science Discussion on Metrics and Rewards / HangingTogether

What is the role of metrics and rewards in an ideal open science ecosystem? What are the challenges in getting there? What would collective action look like? The fourth session of the OCLC/LIBER Open Science Discussion series, which brought together a group of participants from universities and research institutes across ten different countries, focused on these questions.

Ideal future state

Photo by Jonathan Chng on Unsplash

The group envisioned an ideal future open science ecosystem in which the current academic system of metrics and rewards will have been overhauled. Its new characteristics were described as transparent and open, involving the researcher in the design of the assessment process, incentivizing research behaviour that attaches greater importance to openness and integrity and “thinking more of the researcher as a whole person”. In other words, the new system puts the metrics in a human context, looking at the researcher’s profile, network and relationships, development trajectory, and narrative. This requires a different taxonomy for research assessment and more attention for qualitative measurements. Should openness become the new focus of the reward system or should the system continue to focus on ranking research quality and excellence? There was agreement that in an Open Science ecosystem openness should be central and that what needs to be measured is “the way open impacts society”.

Doing away with the “perverse incentives” that are embedded in the current ecosystem, such as the “pressure to publish” and the associated use of the journal impact factor, was seen as the way forward – although one participant cautioned against throwing the baby out with the bathwater because the main problem is not the measure itself, but the way it is used. This led to a discussion of the current system of individualistic rewards linked to tenure and the question: at what level should metrics and rewards be applied? The dilemma of individual versus aggregate level metrics and the ways to analyze the data, is a methodological one. On the whole, the group felt that research metrics are best applied at the aggregate and not the individual level.

Responsible metrics and rewards requires “thinking more of the researcher as a whole person”

Obstacles and challenges

Photo by Alyssa Ledesma on Unsplash

It was clear to the participants that achieving this ideal future OS ecosystem for metrics and rewards won’t happen overnight. The new ecosystem is slow to start. One participant mentioned a recent survey on open research conducted at their university in the UK, which revealed that some researchers still do not know what Open Science is. There is no common understanding of OS across regions in the world, sighed another discussant. Someone observed that compliance figures in repositories did not make her optimistic about the pace of adoption of Open Science. On the other hand, there are also signs that the current system is causing much stress to researchers and is reaching its limitations. One participant mentioned a survey, conducted at a US university, which revealed that most respondents were burned-out on the research assessment practice and face mental health risks. The group mentioned many challenges and obstacles to effectuating change. Culture change was one of them, but not among the top three. The challenges that scored high and that were chosen to discuss further were:

  1. Uncertainty among researchers on what Open Science actually means to them
  2. Many open science activities don’t directly result in articles (e.g., open methods, open infrastructure, open data). Other stuff matters too but is not measured/supported and is based on unpaid and unrecognized labor.
  3. The big divide between Open Science advocacy and the day-to-day research work of researchers: ‘this is yet another thing we need to do’

Collective Action

With these three challenges in mind, what can libraries collectively do to raise awareness about Open Science and help effectuate systemic change in terms of metrics, incentives, rewards and funding? The group clearly saw an educational opportunity for libraries. Research libraries could act as a neutral party on campus, creating teachable moments on open research and explaining what it could mean for different disciplines. Someone suggested starting by educating early career researchers, or even targeting undergraduate students. The library is not always the natural go-to partner for researchers. Some participants thought it was more effective to engage higher-up, with campus administrators (the Research Office, Faculty Senate, etc.), in conversations around research impact and strength areas – as a way to raise awareness and demonstrate what can be done differently. Others suggested creating OS-metrics together with those who conduct research assessments and those who decide on assessment criteria. These stakeholders – in particular the funding agencies – have shown the influential role they play and their ability to effect change rapidly.

In the same vein, “ranking the rankers” was viewed as an effective way to change and help bridge the divide. This approach aims to disrupt the hierarchy of stakeholders in research evaluation, by targeting the top of the pyramid – the League Tables or World University Rankings – and adding an additional layer to evaluate the rankers. This is something the INORMS Research Evaluation Working Group have suggested and are currently working on. 

“We need to make open easy – and we are currently not making it easy.”

Finally, another option and effective way to engage researchers was discussed, namely, to support their workflow in sharing outputs. This was also seen as an opportunity to help address the burden for researchers of having to report their outputs multiple times, in different information systems. “We need to make things easy – and we are currently not making it easy.” So what would be ideal or responsible metrics in the day-to-day research workflows?

Participants pointed out that the new metrics & rewards system is not only about Open Access publications. It is about valuing and recognizing all contributions to research and scholarly activity. A main issue is standardisation, which is needed in the evaluation system at the national, regional and international levels. How aligned are systems across geographies? How different are the levels of adoption across different nations? To help start the conversation at the institutional and national levels, the European Working Group on Rewards under Open Science published a Report in 2017. The ‘Hong Kong’ principles for assessing researchers were also mentioned in this context. There is an effort to standardize the format of a CV – not just listing publications, but all sorts of research outputs, such as peer review contributions. Using open standards and infrastructures can help make OS easier and also support the greater system-to-system integration required to reduce researcher burden. Greater systems integration also allows access to more complete data – and this in turn enables the analytics that are the driving force behind the new metrics & rewards system.

About the OCLC-LIBER Open Science Discussion Series

The discussion series is a joint initiative of OCLC Research and LIBER (the Association of European Research Libraries). It focusses on the seven topics identified in the LIBER Open Science Roadmap, and aims to guide research libraries in envisioning the support infrastructure for Open Science (OS) and their roles at local, national, and global levels. The series runs from 24 September through 5 November.

The kick-off webinar opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all seven topical small group discussions will be published on the OCLC Research blog, Hanging Together. Up to now these are: (1) Scholarly Publishing, (2) FAIR research data and (3) Research Infrastructures and the European Open Science Cloud.

Join us! We invite all members of the open science community to join our organizations for the closing round-up webinar on 5 November, where we will synthesize and share the findings from all seven group discussions. Register today.

The post OCLC-LIBER Open Science Discussion on Metrics and Rewards appeared first on Hanging Together.

Ding, Ding, Round Two / Archives Unleashed Project

Photo by İrfan Simsar on Unsplash

The Archives Unleashed project was launched in 2017 with a focus on expanding web archive accessibility and lowering barriers for working with web archive data. Over the course of the following three years, the team developed tools such as the Archives Unleashed Toolkit, Cloud, and Warclight to enable scholars, librarians, and archivists to access, share and investigate archived webpages since the early days of the World Wide Web. Our 2020 Community Report details our accomplishments over the past three years.

But our work isn’t finished!

In July, the project was awarded a second Andrew W. Mellon Foundation grant, which will support the continuation and expansion of our work to make web archives more accessible over the next three years (2020–2023). Our original team of investigators will be joined by colleagues from the Internet Archive to integrate and blend complementary services and tools to broaden access to web archives.

What will we be doing?

The web archiving landscape continues to experience a growth of tools and approaches to working with web data. In this next project, we will contribute to this by bringing together the sector-leading and comprehensive web archiving collection service (Archive-It) with our focused tools for research analysis (Archives Unleashed). In other words, rather than needing to collect collections on one platform to then pivot and analyze them on another, we will be bringing them together!

Our project has two main priorities:

First, we will merge the Archives Unleashed analytical tools with the Internet Archive’s Archive-It service to provide an end-to-end process for collecting and studying web archives. This will be completed in three stages:

  1. Build. Our team will be setting up the physical infrastructure and computing environment needed to kick start the project. We will be purchasing dedicated infrastructure with the Internet Archive.
  2. Integrate. Here we will be migrating the back end of the Archives Unleashed Cloud to Archive-It and paying attention to how the Cloud can scale to work within its new infrastructure. This stage will also see the development of a new user interface that will provide a basic set of derivatives to users.
  3. Enhance. The team will incorporate consultation with users to develop an expanded and enhanced set of derivatives and implement new features.

Secondly, we will engage the community by facilitating opportunities to support web archives research and scholarly outputs. Building on our earlier successful datathons, we will be launching the Archives Unleashed Cohort program to engage with and support web archives research. The Cohorts will see research teams participate in year-long intensive collaborations and receive mentorship from Archives Unleashed with the intention of producing a full-length manuscript.

The underlying goal of this project is to help develop computational research and ensure the long-term sustainability of web archiving tools such as ours. By co-locating the analysis features of the Archives Unleashed Cloud within the storage and collection service framework of Archive-It, we anticipate scholars will conceptualize and realize new research paths in using web archive data.

We want to work with the community!

The broader web archiving community, both our immediate users as well as anybody interested in using the web to answer questions about the relatively recent past, are a vital component for how we approach development. As we move forward, there are a number of opportunities for getting involved!

User Feedback

While we send out occasional surveys, our team always appreciates feedback from our users. Your feedback helps to inform the features and processes we incorporate into the project! We invite you to use our general feedback form to provide comments, feedback on challenges you are facing, or suggestions for new features. To report any issues or specific feature requests, we have a reporting template in our Toolkit and Cloud repositories.

User Testing

As we migrate the Archives Unleashed Cloud and create a new interface with the Archive-It platform, we will work closely with our advisory board and community to seek focused feedback and conduct user testing.

Cohort Program

We will be opening the application process for the Archives Unleashed Cohort Program in February 2021. This will provide an opportunity for teams of researchers to engage in a year-long collaborative mentorship opportunity and will highlight tangible case studies to the broader community. Cohorts will begin in July 2021 and July 2022. Announcements will be made through our project channels (Slack and Twitter) so be sure to keep watch!

Stay Connected

Keep up to date with our project and the latest developments!

Ding, Ding, Round Two was originally published in Archives Unleashed on Medium, where people are continuing the conversation by highlighting and responding to this story.

DEI workshops and courses that I’m excited about / Tara Robertson

photo by Jacob Lund from Noun Project

Diversity, Equity and Inclusion (DEI) is a growing business. There are numerous DEI tech startups, DEI companies, DEI consultants and DEI certifications. I’ve been underwhelmed by the certifications offered by academic institutions as they are overly theoretical and don’t seem to equip learners with practical skills to do DEI work. Here are some trainings and workshops that are coming up that I’m excited about.

This Friday, October 23rd, Paradigm’s Joelle Emerson, Dr. Evelyn Carter, and Courri Brady are offering a 1 hour free webinar on Creating Your 2021 Diversity, Equity, and Inclusion Strategy.

Dereca Blackmon’s Inclusive Mindset for Committed Allies (free if you have LinkedIn Premium, unsure of the pricing if you don’t.) Dereca is the CEO of Inclusion Design Group and the former Assistant Vice Provost and Executive Director, Diversity and Inclusion Office at Stanford. She’s a dynamic speaker and I imagine this will be good. This 1 hour class is asynchronous, so you can do it at your own pace. 

Nicole Sanchez’s Building a More Inclusive Workplace: A 5-week series for measuring and improving DEI at your company (in partnership with O’Reilly). Nicole is the Founder and CEO of Vaya consulting and has been the VP of Social Impact at GitHub and a lecturer at UC Berkeley’s Haas School of Business. She is a thought leader and practitioner I admire. This class starts on November 3 and runs for 5 weeks, once a week. O’Reilly offers monthly or yearly subscriptions to their learning platform, so as an individual you would need to pay $49 USD for 2 months.

Dr. Dori Tunstall is offering Hiring for Decolonization, Diversity, and Inclusion in the Creative Industries Micro-Certification ($675 CAD or ~$515 USD) through OCAD U’s Continuing Studies department. Dori has been a leader at OCAD in effectively using cluster hires to increase the number of Black and Indigenous faculty and to start to shift the culture of the university. This class starts on November 18th and runs for 5 weeks, once a week synchronously with 3 optional synchronous sessions for students to share their work with each other.

What other trainings would you recommend?

Creating a Student-Centered Alternative to Research Guides: Developing the Infrastructure to Support Novice Learners / In the Library, With the Lead Pipe

In Brief:

Research and course guides typically feature long lists of resources without the contextual or instructional framework to direct novice researchers through the research process. An investigation of guide usage and user interactions at a large university in the southwestern U.S. revealed a need to reexamine the way research guides can be developed and implemented to better meet the needs of these students by focusing on pedagogical support of student research and information literacy skill creation. This article documents the justification behind making the changes as well as the theoretical framework used to develop and organize a system that will place both pedagogically-focused guides as well as student-focused answers to commonly asked questions on a reimagined FAQ/research page. This research offers academic libraries an alternative approach to existing methods of helping students. Rather than focusing on guiding students to a list of out-of-context guides and resources, it reconceptualizes our current system and strives to offer pedagogically-sound direction and alternatives for students who formerly navigated unsuccessfully through the library’s website, either requiring more support, or failing to find the assistance they needed.


The way librarians teach research methods and interact with faculty and staff across campus has changed over the years. This is due to a number of factors including reduced or flat budgets, increasing undergraduate enrollment, and changes to content delivery brought on by technological adaptations and users’ needs. Amid these trends, more and more librarians search for active ways to engage novice researchers with instruction that provides guidance and scaffolding into more complex research practices and concepts, instead of instruction that focuses on search mechanics or rote practices. Strangely, since their inception almost 50 years ago, research guides, often used to supplement instruction, have evolved into resource lists despite ample research suggesting this approach has limited efficacy as an instructional approach. Librarians also now often need to look to technology to help support student learning or provide this instruction, with fewer opportunities for in-person instruction or fewer librarians to conduct this instruction.

While academic libraries have long relied on subject guides as a means for supporting students through the research process, the advent of widespread Internet usage allowed libraries to begin making guides available online. This process was streamlined even further with Springshare’s development of the LibGuides platform in 2007. The ease of creating and copying LibGuides has provided librarians a means of developing online, scalable research support for students. In surveying guides across institutions, it is clear that the guides tend to follow a traditional “pathfinder” model that provides students with extensive lists of resources.1 While this is a valid use of guides, the changing expectations of students and faculty as well as more nuanced views of the research process require libraries to rethink the ways they support students as they attain information literacy skills and competencies.

Given these factors, our research focuses on whether or not current practices around the use and presentation of guides, which generally include comprehensive lists of resources without context or instruction, align with information literacy concepts as well as with commonly accepted practices around the way students learn. If the answer is no, what can we – as academic librarians and educators – do to provide a more useful and pedagogically sound option for early career undergraduates? How do we leverage our technology solutions to better serve this constituency who might not receive information literacy instruction through their coursework and might be intimidated by the prospect of asking for assistance from a person at a public service desk?

At the University of Arizona (UArizona), where this research is taking place, 20 liaison librarians are tasked with serving as the primary research support for the entire campus of over 45,000 students, while a smaller group focuses on information literacy instruction to the 4,000-6,000 new undergraduates that arrive on campus every year. The students possess varying levels of experience and skill in research. With the small number of liaisons working with this large community, the need for research support delivery via the library website and other online tools is more and more important. In this article, we will discuss utilizing the LibGuides and LibAnswers platforms to allow students to have more control over their research journey as they navigate the types of resources and library instructional support they need to develop successful research habits and practices. The methods we have used for these changes correspond to research in the application of adult learning theory in library instruction and the conclusions drawn by Kathy Watts in her 2018 analysis of the application of principles of andragogy in online library instruction that “college students… display the characteristics of adult learners. They like to know that their learning is relevant. They learn best when tutorials are problem-based. They come to library instruction with prior learning that needs to be accommodated. They prefer, and are capable of, self-directing their learning.”2


Given the current resource-heavy content in the University of Arizona Libraries’ (UAL) course and subject guides, we began our research by looking for older literature about subject and research guides with the hope of discovering how research guides evolved. While we knew of more recent literature and projects – such as those identified by Alison Hicks in her 2015 article “LibGuides: Pedagogy to Oppress”3 – that position LibGuides as instructional tools, we were surprised to find that researchers have stressed the importance of designing guides with pedagogy at the forefront for decades. Few of the suggestions that researchers previously put forth have been followed, including in the creation of LibGuides at our own institution.  

The origin story of library research guides usually starts with topic-specific reference aids developed at MIT in the early 1970s as part of the Model Library Program of Project Intrex.4 These printed aids were called Library Pathfinders and marketed as such.5 The Pathfinders were expressly “designed to be useful for the initial stages of library research.” They were not intended to be bibliographies, exhaustive guides to the literature, or accessions tools. Pathfinders were a “compact guide to the basic sources of information specific to the user’s immediate needs” and “a step-by-step instructional tool.”6 Canfield (1972) explained that by “a judicious combination of a series of selected informational elements … a Pathfinder enables the user to follow an organized search path.”7 The initial intention was never to create a comprehensive listing of resources but rather a suggested sequence of first steps.

An even earlier precursor may have been the Monteith College Library Experiment at Wayne State University in the early 1960s. Patricia Knapp, an academic librarian and library educator, was an early proponent of integrating librarianship with academic instruction. Knapp’s “path-ways” instruction embedded the library, both its physical collections and the organization of the collections, throughout the four-year Monteith curriculum, building assignments that progressed in complexity as students advanced in their study and understanding of their disciplines.8

Early articles described the strategic purposes of research/resource guides. Alice Sizer Warner (1983) acknowledged that Library Pathfinders could be used as teaching tools and could enhance students’ research skills, though she did not offer specifics on how to accomplish those goals.9 Thompson and Stevens (1985) felt that traditional pathfinders were unsatisfactory because “they provided specific references to information and did not require students to develop their own search strategies.”10 Jackson (1984) described the guides created at the University of Houston-University Park as “search strategy guides.”11 Their guides emphasized a process for searching rather than pointing to specific information resources. The intention was to teach users methods for searching that could be applied in situations where subject guides did not exist. Kapoun (1995) suggested that pathfinders failed to serve their original purpose. He stressed that pathfinders “should not dictate a single ‘correct way’ to perform topical research. Instead they should facilitate individual styles of information gathering…. A pathfinder should offer suggestions, not formulas [emphasis Kapoun].”12

By the late 1990s most libraries had developed online guides to both locally-held and internet-based subject resources, according to research by Cohen & Still (1999).13 While much of the literature continued to focus on the instructional purposes of online guides, many articles described methods, applications, and software that could be used to produce guides. Yet even within these “how-to” articles were references to the instructional uses of guides.

Andrew Cox (1996) described hypermedia library guides. He promoted the incorporation of graphics, images, sound and video files while acknowledging the technical challenges and limitations of existing browsers (Netscape at that time).14 Corinne Laverty (1997) suggested that the Web could function as a library’s desktop publishing system, revitalizing subject guides and pathfinders and allowing the creation and incorporation of interactive library tutorials. In addition to a discussion about technical solutions, she suggested several desired features of online pathfinders, including the “addition of a complete research strategy within a subject area rather than limitation to the traditional list of reference tools,” how to critically evaluate information and write a paper, and links to databases and tutorials.15 The challenge, according to Laverty, was to “take advantage of the versatility and accessibility of the Web in a way that enhances the library learning process.”16

A study of electronic pathfinders from nine Canadian university libraries (Dahl 2001) considered the intended functions of these guides. Dahl felt that pathfinders had an instructional purpose — if they were mere bibliographies, they could not help students learn how to do research.17 Carla Dunsmore (2002) looked at the explicit and implicit purposes, concepts, and principles of online pathfinders. Using both Canadian and American university library pathfinders on three business topics, Dunsmore identified two major functions: “facilitating access and providing a search strategy.”18 Galvin (2005) found that “[p]athfinders which only list resources without providing explanations of the type of information offered in different sources do not teach students to evaluate information.”19 Bradley Brazzeal (2006) compared online forestry research guides to study how the guides incorporated the ACRL’s Information Literacy Competency Standards for Higher Education. Findings showed that some guides engaged the users by incorporating features that corresponded directly to elements of a library instruction session. He concluded that research guides had great potential to educate library users by helping them to understand the practical use of library resources and services.20

The time required to create and maintain Internet-based subject guides was noted by Morris and Grimes (1999) in their study of research university libraries in the Southeast. While the creation of guides was time-consuming, the librarians surveyed believed that the guides saved their users’ time in finding quality sites. The additional challenges of creating Internet-based guides included the possible need for Web masters, student workers, paraprofessionals, and new software to create, monitor, and maintain the guides. Consideration of search strategies or methods of conducting research was eclipsed by the technical challenges of creating online guides.21 In a follow-up study, the same authors concluded that library Internet-based subject guides were becoming almost universal.22 The researchers’ use of the term “webliographies” speaks to the guides’ use as lists of links rather than as pedagogical tools.

Dupuis, Ryan, & Steeves (2004) discussed the creation of “dynamic subject guides” at York University using an open source CMS application. The key objective of their guides was to serve as a starting point for research for undergraduate students. While the guides could be updated and maintained by librarians rather than computing staff, the guides themselves were chiefly search interfaces for library e-resources.23 Moses & Richard (2008) detailed the experience of two university libraries in implementing Web 2.0 technologies (SubjectsPlus and LibGuides) for building online subject guides. At the time of writing in November 2008, the open source SubjectsPlus, developed by Andrew Darby at Ithaca College, had been adopted by 15 libraries.24 LibGuides, a vendor solution developed by Springshare, was reportedly being used by over 400 institutions.

Another early article (Kerico & Hudson 2008) about adopting LibGuides as a web-based platform described the ease of use and functionality of the LibGuides platform. The embedded Web 2.0 features allowed librarians without expertise in computer programming or Web design to quickly create general online resource guides and course-specific subject guides that utilized interactive Web 2.0 features. More importantly, LibGuides could help refine instruction: the platform could make it easy to identify instructional elements that are common to all disciplines and encourage a “refined and collaborative approach to best practices for delivering content online to students and faculty alike.”25

Glassman & Sorensen (2010) suggested several web-based tools for the creation of library subject guides, pathfinders, and toolkits. Options included content management systems such as Drupal, blogging software such as Blogspot and WordPress, and wikis such as MediaWiki. Other options included the open source applications LibData, developed by the University of Minnesota Libraries, and SubjectsPlus. Ultimately Glassman & Sorensen’s library chose LibGuides for their online guides, citing the platform’s ease of use, customizability, strong vendor support, and content sharing.26

A nuanced criticism of research guides was offered by Alison Hicks in 2015. Hicks questioned whether the predominant usage of LibGuides focused far too heavily on the decontextualized listing of tools and resources which isolated research from the reading and writing processes. This was troublesome because it positioned research as static and linear, leading to a predefined or pre-identified truth or right answer. A better solution would be guides designed around research processes, allowing opportunity for students to construct their own meaning-making process. Hicks argued that “when we construct LibGuides around the resources that the librarian thinks the student should know about in order to ace their research paper, we attempt to simplify the processes of research.”27

Ruth L. Baker (2014) suggested that LibGuides could be used more effectively if they were structured as tutorials that guided students through the research process. Such guides would “function to reduce cognitive load and stress on working memory; engage students through metacognition for deeper learning; and provide a scaffolded framework so students can build skills and competencies gradually towards mastery.”28 In one of the few studies conducted to assess the impact of research guides on student learning, Stone et al. (2018) tested two types of guides for different sections of a Dental Hygiene first year seminar course. One guide was structured around resource lists organized by resource types (pathfinder design) while the second was organized around an established information literacy research process approach. The results showed that students found the pedagogical guide more helpful than the resource guide in navigating the information literacy research process. Stone et al. concluded that these pedagogical guides, structured around the research process with tips and guidance explaining the “why” and the “how” of the research process, led to better student learning.29

A study focusing on the influence of guide design on information literacy competency (as delineated in the 2018 ACRL Framework for Information Literacy for Higher Education) for guides used outside the classroom by Lee & Lowe (2018) showed similar results.30 The pedagogical guide was organized around the research process identified in Carol Kuhlthau’s Information Search Process (1991) and employed numbered steps to lead students through the research process. Students using the pedagogical guide reported a more positive experience, spent more time using the guide, interacted more with the guide, and consulted more resources listed on the guide than students using a more traditional pathfinder (resource lists) guide.31 Even though the study did not reveal a statistically significant difference in the information literacy learning outcomes between the students using the pedagogical guide and the students using the pathfinder guide, the authors proposed that there was a pedagogical advantage to having a more usable guide as well as lessening students’ negative emotions and anxiety related to research.

If, as Hemmig (2005) suggests, the origin of subject guides was Knapp and the Monteith Library Experiment project’s library “path ways”, then one of the central aspects of Knapp’s research has been repeatedly lost and rediscovered, reiterated and ignored, over the last 50 years.32 There has been recurrent consideration of subject guides as pedagogical tools to teach how information is used within the disciplines and how research is conducted, but too often the focus has shifted to the maintenance, readability, format, consistency, language usage, and discoverability of guides. Several authors share the same message of teaching strategies and methods; few have reported on the successful implementation of those recommendations.

Our Challenge

As a large, public, land-grant university with over 35,000 undergraduate students, the two small departments of liaison librarians at UAL face a daunting task of supporting students in pedagogically sound ways with limited resources. Librarians often turn to online tutorials and guides to support the large student population. The UAL has a recently updated suite of tutorials that librarians work to embed into early career undergraduate courses. In addition, liaisons consistently collaborate with faculty to develop course guides that support specific classes and assignments. Although this approach has been useful, when we analyzed the usage of our guides as well as the questions that students were asking via chat and the reference desk, we found that the UAL could improve our support for students by investing more effort and energy into developing guides that better connect information literacy practices to the principles of andragogy and that better support students in the meaning making processes of research that Alison Hicks so adroitly champions in her article “LibGuides: Pedagogy to Oppress?” Research has shown that “[l]ibrary instruction seems to make the most difference to student success when it is repeated at different levels in the university curriculum, especially when it is offered in upper-level courses” and that “[a] tiered approach to teaching information literacy is in line with the way many universities teach other literacies, such as writing and math, with introductory skills at the freshman level and then more advanced practice as students matriculate.”33 A Utah State University study that examined the impact of sequenced library instruction reinforces these findings as well as the need to use online learning tools to take advantage of flipped models of instruction when setting up a scaffolded program.34

Given the need for scaling and providing opportunities for scaffolded and flipped instructional experiences that online research guides help fulfill, the use and usefulness of research guides for students is a primary concern for librarians. Courtois, Higgins, and Kapur (2005) studied user satisfaction of online subject research guides at George Washington University and found that while just over 50 percent of respondents rated the online guides positively, a full 40 percent rated the guides negatively.35 Reeb & Gibbons (2004) studied the disconnection between students and librarians’ mental models of information organization within academic disciplines as evident in online subject guides. Their usability testing repeatedly revealed low usage of or dissatisfaction with subject guides. Reeb & Gibbons suggested that an undergraduate student’s mental model was focused on courses or specific coursework rather than the discipline itself. Students found discipline-based subject guides lacking in context – they were confused by subject categorization and frustrated by not finding resources specifically tailored for their informational needs. The authors concluded that creating guides to support specific courses would be more useful to students than discipline-based guides.36 Data on the usage of subject guides produced at UAL bears out previous researchers’ doubts regarding usefulness. The research supports the conclusion that even though librarians may want to rely on subject guides as teaching and research support tools, most guides are underused. In observing the UAL website and existing subject guides in the period from January 1 to May 31, 2019, there is an apparent gap in the way that librarians present information and the way that library users wish to interact with the information being provided. Multiple subject guides produced by UAL have less than 100 views for that five-month period, which amounts to less than one view per day. 
The most heavily viewed guides on the UAL website are those that focus on a specific, narrow topic or that were developed for a specific course or program.

UAL LibGuides Page Views, Jan. 1, 2019–May 31, 2019

LibGuide                        Type       Page Views
AZ Residential Tenants Rules    topic      19,287
BCOM 214                        course     8,621
GIS & Geospatial Data           topic      6,230
ENGL 102/108                    course     4,837
Mexican Law                     topic      4,765
Business                        subject    2,439
Art                             subject    1,073
Psychology                      subject    881
Music                           subject    682
Nutritional Sciences            subject    640
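
The per-day figures implied by these page views can be checked with a few lines of arithmetic. The following is a minimal sketch for illustration only: the function and variable names are hypothetical and do not come from any UAL or LibGuides tooling, and the numbers are simply transcribed from the table above.

```python
from datetime import date

# Page views transcribed from the table above (Jan. 1 - May 31, 2019).
# Each row is (guide name, guide type, page views); the names are illustrative.
PAGE_VIEWS = [
    ("AZ Residential Tenants Rules", "topic", 19287),
    ("BCOM 214", "course", 8621),
    ("GIS & Geospatial Data", "topic", 6230),
    ("ENGL 102/108", "course", 4837),
    ("Mexican Law", "topic", 4765),
    ("Business", "subject", 2439),
    ("Art", "subject", 1073),
    ("Psychology", "subject", 881),
    ("Music", "subject", 682),
    ("Nutritional Sciences", "subject", 640),
]


def days_in_period(start: date, end: date) -> int:
    """Count calendar days from start to end, including both endpoints."""
    return (end - start).days + 1


def views_per_day(views: int, days: int) -> float:
    """Average daily views over a period of the given length."""
    return views / days


def total_views_by_type(rows):
    """Sum page views for each guide type (topic, course, subject)."""
    totals = {}
    for _name, guide_type, views in rows:
        totals[guide_type] = totals.get(guide_type, 0) + views
    return totals


# Length of the observation window used in the table.
DAYS = days_in_period(date(2019, 1, 1), date(2019, 5, 31))

if __name__ == "__main__":
    for name, guide_type, views in PAGE_VIEWS:
        print(f"{name} ({guide_type}): {views_per_day(views, DAYS):.1f} views/day")
    print(total_views_by_type(PAGE_VIEWS))
```

Under these assumptions, a guide with fewer than 100 views over the 151-day window averages well under one view per day, while the three topic guides listed together account for more views than the course and subject guides combined.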

Along with issues related to the use of UAL subject guides, an analysis of our current site reveals that novice researchers encounter a number of navigational challenges when looking for guided research and/or instructional support. When looking for guidance, a user must navigate to the “Research and Publish” link, which then activates a dropdown selection where the user can select between links to Research By Subject/Topic, Research By Course/Program (both linking to alphabetical lists of LibGuides), “Learn With Tutorials,” which links to a set of foundational tutorials, “Write & Cite,” which provides links to citation and plagiarism resources and “Support for Researchers,” which links to specialized support for advanced research. While this linear and alphabetical representation of instructional support materials is not uncommon in academic libraries, it creates access challenges and misses an opportunity to demonstrate to students that research is process-oriented and recursive. It also raises the question of whether students understand the terminology in a way that allows them to find the help they need. In addition to navigational challenges, local decisions that were made when LibGuides were first implemented in 2013 further confound the research process. The original templates that the UAL developed for LibGuides pages were designed through a lens that focused heavily on creating a consistent user experience (UX) across guides and are very linear and somewhat rigid in nature. As research on how students learn online has grown, we believe that UX concerns with navigation and consistency must be wedded to design approaches that incorporate the learner experience (LDX). We believe that the purposeful melding of UX and LDX will help ensure that libraries design interfaces that support and enhance “the cognitive and affective processes that learning involves.”37

A Two-Pronged Approach: FAQs and LibGuides

Several attempts have been made by the UAL over the years to address these challenges and better integrate guides into the academic lives of students. One of the more successful projects has involved embedding library resources and instructional materials directly into the campus Learning Management System (LMS).38 This project, named the Library Tools Tab, began in the early 2000s and remains in use today. The goal of the project was to develop a tool that would provide access to a robust, embedded set of library instructional materials and resources through the campus LMS. While the team did succeed in developing and launching a tool that integrated into the LMS, it struggled with maintaining ongoing support and development and was never able to build it into as robust a learning system as initially intended.

In response to the above observations and experiences, a small working group of librarians began the process of rethinking and revising the UAL’s approach to supporting online student research and learning. At the outset, the focus and intent were to improve the design of our subject and course guides. Our project grew as we worked to incorporate the findings and best practices that we had uncovered in the literature. Several factors influenced this expansion in scope, including research conducted by William Hemmig (2005), Jennifer Little (2010), Shannon Staley (2007), Carol Kuhlthau (1991), and Meredith Farkas (2012).

Hemmig’s 2005 article credited Patricia Knapp and the Monteith College Library Experiment project in the early 1960s as the genesis of pathfinders and later subject research guides. Knapp’s work to develop library instruction as part of the college curriculum was user-centered. It was designed to teach students the effective use of the library and its resources, creating both ways for the student to progress from their current state (what they know) to their desired level of knowledge (what they want to know) and methods for the student to navigate the organization of scholarly information resources.39 Knapp explained that “[k]nowing the way means understanding the nature of the total system, knowing where to plug into it, knowing how to make it work.”40 Jennifer Little (2010) pointed to cognitive load theory to inform the creation of pedagogically sound and useful research guides. Little’s suggestions for incorporating cognitive load theory principles into research guide creation included tying guides to specific courses rather than broad subject areas and assisting students in developing self-regulated learning strategies by breaking down research into smaller steps. According to Little, such guides “will motivate students to learn and remember how to navigate and use a wide variety of information resources.”41

Shannon Staley used the results of a 2007 study on the usefulness of subject guides at San José State University to suggest that the prevailing model of subject guides – primarily a presentation of lists of resources – did not match the Information Search Process (ISP) used by students that was first documented by Carol Kuhlthau in 1991.42 Kuhlthau, who focused on students’ information behavior, identified six stages of the ISP: initiation, selection, exploration, formulation, collection, and presentation. Staley proposed that subject guides incorporating “the cognitive process to completing course assignments – steps addressing the different stages of the student ISP – would more closely parallel students’ mental model” and thus prove more useful to and more used by students.43 In 2013, Meredith Farkas and a team of librarians at Portland State University released Library DIY, which is a “system of small, discrete learning-objects designed to give students the quick answers they need to enable them to be successful in their research.”44 The Library DIY approach is grounded in the idea that “Libraries also need to rethink how we create online instructional content, which is often designed based on how we teach. A patron looking for information on how to determine whether an article is scholarly doesn’t want to go through a long tutorial about peer review to find the answer.”45

A common theme across the instruction-focused articles on library guides is the need for libraries to unveil systems and processes so that students can engage in research in a way that supports them as creators, explorers, and interlocutors in the research conversation. After exploring several different ideas, we landed on developing a scaffolded approach that is centered on an online, student-initiated, and self-guided research experience. Our intent is to have a system that addresses discrete research concerns while surfacing the iterative nature of the research process. The centerpiece of the redesign is a set of reconceived Frequently Asked Questions (FAQ) pages, developed to support the pedagogical approaches identified by Knapp, Little and Kuhlthau, and heavily modeled on the Library DIY approach – so students have a great deal of personal control in the ways in which they plug into, navigate, and engage with library research. 

To begin, we gathered local data by looking at queries submitted to our current FAQ system between Jan. 1 and May 31, 2019. The queries represent suggested questions for the FAQ, which theoretically will guide the user to their topic via a keyword system. However, for the five-month time period, 202 questions did not result in users clicking on a FAQ item. We found that although over half (n=125) of these questions were related to account, software, or facilities issues (e.g., “How do I renew books when I have fines?”), most of the remaining 77 questions dealt with traditionally research-related topics. Citation/copyright help was heavily represented, as were questions about peer review and scholarly articles, general searching, finding liaison librarians, and other miscellaneous research topics. Chat transcripts followed a similar theme. The bulk (n=265) of the 479 sampled questions asked for basic research help (generally of the “How do I find an article about X?” variety or known-item searches), followed by general access issues (such as eBook or database access) and then by citation and/or copyright help questions. Although the UAL has a multi-search box in a central location on the website homepage, the data gathered from local chat transcripts and the FAQ meshed with the research literature and confirmed that students need support related to how they navigate, understand, and apply the steps of the research process, not just ease of access to resources.
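
The proportions behind these counts are straightforward to reproduce. The sketch below is illustrative only: the constant and function names are hypothetical, and the counts are those reported in the paragraph above, not a re-analysis of the underlying FAQ or chat data.

```python
# Counts reported in the text for Jan. 1 - May 31, 2019.
FAQ_UNMATCHED_TOTAL = 202   # FAQ queries that did not lead to a click on a FAQ item
FAQ_ACCOUNT_RELATED = 125   # account, software, or facilities questions
CHAT_SAMPLE_TOTAL = 479     # sampled chat questions
CHAT_BASIC_RESEARCH = 265   # chat questions asking for basic research help


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Questions left over once account, software, and facilities issues are set aside.
FAQ_REMAINING = FAQ_UNMATCHED_TOTAL - FAQ_ACCOUNT_RELATED

if __name__ == "__main__":
    print(f"Remaining FAQ questions: {FAQ_REMAINING}")
    print(f"Account-related share of FAQ queries: "
          f"{share(FAQ_ACCOUNT_RELATED, FAQ_UNMATCHED_TOTAL):.1f}%")
    print(f"Basic research share of chat sample: "
          f"{share(CHAT_BASIC_RESEARCH, CHAT_SAMPLE_TOTAL):.1f}%")
```

On these figures, a little over three fifths of the unmatched FAQ queries concerned account, software, or facilities issues, and a little over half of the sampled chat questions asked for basic research help.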

Armed with data and a strong theoretical underpinning, we began the process of creating landing pages that serve as the gateway to the new system. After a few false starts, we worked with our instructional designer to develop the landing page below. It is designed to be visually simple and to help provide a quick on-ramp to research and library navigation as well as straightforward access to help via chat, text, telephone, email, or a liaison librarian. All answers to FAQs are searchable from the landing page and are organized by category on the sub-pages.

Image 01. Image of Ask Us landing page.

Image 02. Image of Library Research FAQ subpage.

We labeled and ordered the sub-categories to represent the major components of the research process, but also included a search bar so that students can quickly access information that they are seeking.

The FAQ answers are grounded in approaching reference through the lenses of pedagogy and andragogy and are designed to scaffold students into increasingly complex and in-depth information after they have gleaned what they need from the introductory materials. Each FAQ is constructed to answer a specific question as succinctly as possible and then provide links to more in-depth tutorials and resources that students can use as they continue on their research journey. This approach supports Elmborg’s (2002) idea that librarians “must see our job as helping students to answer their own questions”46 and Nancy Fried Foster’s assertion that librarians need to provide opportunities for students “to develop their information seeking skills and their judgement.”47 We feel that this treatment allows us to support students as they take ownership of their searching and learning processes and devise paths through the research process.

Although the initial rationale behind FAQ pages on library websites might have been a means to avoid potential redundancy in the sort of questions asked by patrons to an already understaffed and overtaxed public services staff (West 2015), the authors feel that the platform has potential to provide an additional opportunity for research help, particularly for novice researchers.48 Since FAQs provide an opportunity to create a living document that is updated often (West), the authors hope that the FAQs might also provide an excellent opportunity to create a living pedagogical document that helps support students through the iterative process of research.

Along with restructuring the FAQs, our research helped us identify several ways that we could improve the pedagogical functioning of our course, subject, and topic guides. Our original guides were structured to encourage creators to list all resources and content in a single column. This approach was heavily informed by, and aligned well with, UX best practices, but at times it was overly restrictive and pushed creators into developing lists of decontextualized resources.

Image 04. A Linear Course Guide Layout for an Anthropology Course.

To address this, we worked closely with our instructional designer to develop guides that allowed the UAL to expand out of our linear, resource-centric approach.

The pilot guides have been well-received by faculty and students, and we soon realized that we would need to implement a system that supported content creators in developing their own instruction-focused guides rather than rely on a single person to develop these guides. To reach the goal of reimagining the way LibGuides can be developed and implemented to better support students in gaining research and information literacy skills, we constructed a system designed to support content creators in developing pedagogically sound guides that adhere to instructional best practices. We want this system to allow for flexibility in presentation and design while maintaining a consistent user experience. We searched across institutions to learn how different libraries managed guides and found that developing blueprint guides would be the most effective way of supporting UAL content creators. The blueprint guides we have developed are meant to synthesize and represent the findings of the many years of research that librarians have conducted on the best ways to teach and learn with library guides. The blueprints are designed to provide creators with flexibility in design as well as efficiency in creation. This support is achieved through providing easy-to-adapt frameworks as well as specific directions on how and why to use a particular type of guide.

Sample Blueprint Guide
Image 06. Image of One Page Four Column Guide page.


Our goal for the new process was to purposefully redesign our existing guides and reference ecosystem to move away from decontextualized lists of resources which encourage students to “engage in a one-stop shopping process.”49 Instead, we would focus on students as active learners constructing their own meaning through the process of research. Doing so would hopefully strengthen students’ sense of self-efficacy and ownership of the process, allowing them to become thoughtful contributors to the scholarly conversation. The new system was launched in August 2020, and guide creators are receiving training and support in adapting existing guides as well as in creating new ones. To ensure that librarians across the UAL system are able to successfully implement this new approach, we have developed an infrastructure that starts with pedagogically oriented FAQs that have been designed to adhere to adult learning theory and encourage independent use and discovery. Along with the FAQs, guides have been rethought to better accompany students through the process of research rather than simply provide them with lists of potential resources. Although constructing guides in this way often requires creators to commit to a philosophical move away from a “just in case” mindset about the provision of resources, as well as to invest more time in thinking about how to construct paths through a particular research process, we have attempted to lessen the workload by providing a set of easy-to-duplicate blueprints as well as regularly updated instructions on how to implement these new practices. As of this writing there are six different blueprints, with more in development. In the next phase of our research, the authors will be collaborating on a multi-institutional study to assess the pedagogical efficacy of the different blueprints and will share findings in a future publication.

Finally, this model offers a means of bridging the gap between the UAL discovery tool and the more in-depth tutorials and guides that UAL librarians create to support students in their in-class research. It has been designed to provide a way to support students who need help understanding or navigating a specific facet of their research process but are not in need of (or willing to invest the time in) more in-depth instruction. These changes are being undertaken with the intent of developing concrete ways to make the research experience as intuitive and seamless as possible for novice researchers.


Many thanks to publishing editor Kellee Warren, internal reviewer Dr. Nicole Cooke, and external reviewer Erica DeFrain for their many insightful and generous comments on the manuscript. A special thanks to Nicole Hennig for all her hard work and expertise taking our ideas and turning them into something concrete and functional. Thank you also to Jennifer Church-Duran for being supportive of the need for changes and our research around it. 


Baker, R.L. (2014). Designing LibGuides as instructional tools for critical thinking and effective online learning. Journal of Library & Information Services in Distance Learning, 8(3-4), 107-117. 

Bowles-Terry, M. (2012). Library Instruction and Academic Success: A Mixed-Methods Assessment of a Library Instruction Program. Evidence Based Library & Information Practice 7(1), 82–95. 

Brazzeal, B. (2006). Research Guides as library instruction tools. Reference Services Review, 34(3), 358-367.

Canfield, M.P. (1972). Library pathfinders. Drexel Library Quarterly, 8, 287-300.

Cohen, L.B. & Still, J.M. (1999). A comparison of research university and two-year college library web sites: content, functionality, and form. College & Research Libraries, 60(3): 275-289.

Courtois, M., Higgins, M. & Kapur, A. (2005). Was this guide helpful? Users’ perceptions of subject guides. Reference Services Review, 33(2), 188-196. 

Cox, A. (1996). Hypermedia library guides for academic libraries on the world wide web. Program, 30(1), 39-50.

Dahl, C. (2001). Electronic pathfinders in academic libraries: an analysis of their content and form. College and Research Libraries, 62(3), 227-237.

Dunsmore, C. (2002). A qualitative study of web-mounted pathfinders created by academic business libraries. Libri 52(3), 137-156.

Dupuis, J., Ryan, P. & Steeves, M. (2004). Creating dynamic subject guides. New Review of Information Networking, 10(2), 271-277.

Elmborg, J.K. (2002). Teaching at the Desk: Toward a Reference Pedagogy. portal: Libraries and the Academy 2(3), 455-464. doi:10.1353/pla.2002.0050.

Farkas, M.G. (2013, July 2). Library DIY: Unmediated point-of-need support. Information Wants to be Free [blog]. 

Farkas, M.G. (2012). Technology In Practice. The DIY Patron: Rethinking How We Help Those Who Don’t Ask. American Libraries, 43(11/12), 29.

Foster, N. F. & Gibbons, S. (Eds.). (2007). Studying Students: The Undergraduate Research Project at the University of Rochester. Chicago: Association of College and Research Libraries.

Galvin, J. (2005). Alternative Strategies for Promoting Information Literacy. The Journal of Academic Librarianship, 31(4), 352-357.

Glassman, N.R. & Sorensen, K. (2010). From Pathfinders to Subject Guides: One Library’s Experience with LibGuides. Journal of Electronic Resources in Medical Libraries, 7(4), 281-291.

Grimes, M. & Morris, S.E. (2001). A comparison of academic libraries’ webliographies. Internet Reference Services Quarterly, 5(4), 69-77.

Hemmig, W. (2005). Online pathfinders. Reference Services Review, 33(1), 66-87. 

Hicks, A. (2015). LibGuides: Pedagogy to Oppress? Hybrid Pedagogy.

Jackson, R. & Pellack, L.J. (2004). Internet subject guides in academic libraries: an analysis of contents, practices, and opinions. Reference and User Services Quarterly, 43(4), 319-27.

Jackson, W.J. (1984). The user-friendly library guide. College & Research Libraries News, 45(9), 468-71.

Kapoun, J.M. (1995). Re-thinking the library pathfinder. College and Undergraduate Libraries, 2(1), 93-105. 

Kerico, J. & Hudson, D. (2008). Using LibGuides for outreach to the disciplines. Indiana Libraries, 27(2), 40-42.

Kline, E., Wallace, N., Sult, L., & Hagedon, M. (2017). Embedding the Library in the LMS: Is It a Good Investment for Your Organization’s Information Literacy Program?. Distributed Learning, 255-269.

Kuhlthau, C. (1991). Inside the Search Process: Information Seeking from the User’s Perspective. Journal of the American Society for Information Science 42(5), 361-371. 

Laverty, C. (1997). Library instruction on the web: inventing options and opportunities. Internet Reference Services Quarterly, 2, 55-66.

Lee, Y.Y. & Lowe, M.S. (2018). Building positive learning experiences through pedagogical research guide design. Journal of Web Librarianship, 12(4), 205-231.

Little, J.J. (2010). Cognitive load theory and library research guides. Internet Reference Services Quarterly, 15(1), 53-63. https://doi.org/10.1080/10875300903530199

Lundstrom, K., Martin, P., & Cochran, D. (2016). Making Strategic Decisions: Conducting and Using Research on the Impact of Sequenced Library Instruction. College & Research Libraries, 77(2), 212-226. doi:

McMullin, R. & Hutton, J. (2010). Web subject guides: virtual connections across the university community. Journal of Library Administration, 50(7-8), 789-797.

Morris, S. E., & Grimes, M. (1999). A great deal of time and effort: an overview of creating and maintaining internet-based subject guides. Library Computing, 18(3), 213-216.

Moses, D. & Richard, J. (2008). Solutions for Subject Guides. Partnership: the Canadian Journal of Library and Information Practice and Research, 3(2).

Peters, D. (2012, July 24). UX for Learning: Design Guidelines for the Learner Experience. UX Matters.

Reeb, B. & Gibbons, S. (2004). Students, librarians, and subject guides: improving a poor rate of return. Portal: Libraries and the Academy, 4(1), 123-30. 

Sizer Warner, A. (1983, March). Pathfinders: a way to boost your information handouts beyond booklists and bibliographies. American Libraries 14, 151. 

Staley, S. M. (2007). Academic subject guides: a case study of use at San Jose State University. College & Research Libraries, 68(2), 119–139.

Stevens, C.H., Canfield, M.P. & Gardner, J.J. (1973). Library pathfinders: a new possibility for cooperative reference service. College & Research Libraries, 34(1), 40-6.

Stone, S.M., Lowe, M.S., Maxson, B.K. (2018). Does course guide design impact student learning? College & Undergraduate Libraries, 25(3), 280-296. 

Vileno, L. (2007). From paper to electronic, the evolution of pathfinders: a review of the literature. Reference Services Review, 35(3), 434-451. 

Watts, K. A. (2018). Tools and Principles for Effective Online Library Instruction: Andragogy and Undergraduates. Journal of Library & Information Services in Distance Learning, 12(1-2), 49-55.

West, J. (2015). Getting Your FAQs Straight. Computers in Libraries 35(3), 28-29.

Wilbert, S. (1981). Library pathfinders come alive. Journal of Education for Librarianship, 21(4), 345-349.

Worrell, D. (1996). The Work of Patricia Knapp (1914-1972). The Katharine Sharp Review, no. 3. Available at

  1. Little 2010; Hicks 2015
  2. p. 54
  3. Wilbert 1981; Sizer Warner 1983; Dunsmore 2002; Hemmig 2005; Brazzeal 2006; Vileno 2007
  4. Canfield 1972; Stevens et al 1973
  5. Stevens et al., 1973, p. 41
  6. p. 287
  7. Hemmig 2005; Worrell 1996
  8. p. 224
  9. p. 96
  10. p. 59
  11. p. 66
  12. p. 150
  13. p. 353
  14. Grimes and Morris 2000
  15. p. 41
  16. Hicks
  17. p. 111
  18. Bowles-Terry 2012
  19. Lundstrom et al 2016
  20. Peters 2012
  21. Kline 2017
  22. Knapp 82-84
  23. p. 82
  24. p. 60
  25. p. 132
  26. Farkas 2013
  27. Farkas 2012
  28. p. 459
  29. p. 78
  30. Hicks, 2015

OCLC-LIBER Open Science Discussion on Research Infrastructures and the European Open Science Cloud (EOSC) / HangingTogether

Astrid Verheusen

Thanks to Astrid Verheusen, Executive Director of LIBER, for contributing this guest blog post.

What is the ideal future vision for the European Open Science Cloud (EOSC) in the global Open Science ecosystem? What are the challenges in getting there and how can research libraries help address these challenges through collective action? The third installment of the OCLC/LIBER discussion series on open science brought together an international group of participants with a shared interest in research infrastructures and the EOSC.

The EOSC is an initiative of the European Commission and the vision behind it is as follows:  

“The EOSC will offer 1.7 million European researchers and 70 million professionals in science, technology, the humanities and social sciences a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines by federating existing scientific data infrastructures, currently dispersed across disciplines and the EU Member States.”

What does an ideal future state look like for Research Infrastructures and the EOSC?

The conversation began with a participant emphasizing the importance of awareness raising and promotion; all researchers in Europe should know what the federated infrastructure of EOSC aims to be. This is necessary to activate their involvement. The EOSC should become the backbone for storing, sharing, processing and archiving research data. If this core functionality is realized, then the EOSC would be a huge success. It would not only allow researchers to use data within their own discipline but also to easily find relevant data from other disciplines, according to the FAIR data principles. The EOSC should become part of the global Open Science endeavor and be embedded in a global network of federated infrastructures leading to a so-called “Internet of FAIR data and services.”

Photo courtesy of LIBER

In this future environment, researchers will be supported by “data experts” who will get credit for their work and reduce the workload of researchers. This would lead to a more balanced distribution of labor, reduce duplication, boost scientific output and improve data quality. One participant expressed the growing need for large numbers of data experts, while another indicated that EOSC should train users and be user friendly so that fewer data experts are required.

Another participant envisioned the EOSC as offering a participatory and inclusive infrastructure, not limited to a select group of users. In this vision, citizens would be allowed to have access to the data and scientific processes, in addition to scientific researchers. Transparency and access to data and services across different user communities and disciplines would lead to a more homogenous system, bridging information silos. The EOSC services layer should also support scientists with the many issues related to open data, such as license management, copyright, and intellectual property rights (IPR).

Other participants in the discussion confirmed these views, adding that the EOSC should be user-friendly. There was concern about how the EOSC compares and relates to the many national and local research infrastructures and services already in place or under development. The EOSC should ideally complement and integrate these existing infrastructures and services and build upon them. Duplication should be avoided. One participant even suggested that the EOSC should not be about data at all—because local repository infrastructure is already quite mature in much of Europe, and therefore, the EOSC should focus on new and innovative services—such as support for innovative peer-review or for linked data.

Finally, it was stated that the EOSC should include the principle of co-creation, so that researchers and data stewards at universities are involved from the start. Cross-system interoperability is also a priority, as researchers do not want to interact with multiple systems. Currently this is not the case.

What are the main challenges and obstacles preventing progress toward this ideal state?

While formulating a vision of what an ideal EOSC might look like in the future, the discussion revealed how far away we are from this ideal state. After a real-time online poll on the obstacles, the top three challenges ahead were identified and discussed further.

  1. Cultural change
  2. Involvement of researchers (and libraries)
  3. Current reward system

Cultural change

Researchers must become more aware of the importance of open science, and cultural change is necessary regarding open science and data sharing. But how do we catalyze this cultural change? What are the right incentives? Researchers who have already adopted open science can serve as champions. And researchers must be incentivized by reaping rewards that they value: research gets more dissemination, recognition, and funding when it is open and shared with everyone. For research libraries, however, it is not always easy to connect with researchers.

Participants agreed that culture change is also needed in libraries. In general, libraries are risk averse, and they are not always engaging in new developments. This is another reason why libraries are not yet participating in the development of the EOSC.

Involvement of researchers (and libraries)

In the discussion, concerns were expressed about the (lack of) involvement of universities, libraries, and researchers in the development of the EOSC. Many libraries already have Open Science programmes in place and can be part of other initiatives like the development of the EOSC. However, the EOSC is being developed by a small group of organisations with significant funding from the European Commission. Universities are not involved in its development and it is difficult to contribute. The principle behind building the EOSC should be co-creation and universities need funding to be able to contribute.

Participants in this discussion were divided on how involved researchers should be at this stage of EOSC development. One discussant suggested that in order to avoid frustration, we should not involve the researchers yet, but engage them after the EOSC has been developed a little further. But another participant expressed concern that the EOSC effort is too confident about who the researchers are and what they want: it wants to build a system and then present it to “the researcher,” ignoring the fact that researchers are a diverse group and that significant disciplinary differences inform their practices. The development of the EOSC must be researcher-driven and take the needs of researchers as its starting point.

Current reward system

The current reward system for researchers does little to incentivize participation in open science and data sharing activities. (This issue was also discussed in the “Rewards and Metrics” discussion within this series, and a blog on that discussion is forthcoming.) Therefore, without changes to the current system of metrics and rewards, EOSC will not operate optimally.

How can library (and other) communities take collective action to address these challenges?

Cooperation between research libraries, between librarians and their institutions, and beyond, is desperately needed to meet the challenges. The LIBER Open Science Roadmap also emphasizes cooperation, but the European landscape of research libraries is very diverse. We need to put extra energy into learning from each other to avoid making the same mistakes. One participant encouraged more transnational opportunities to exchange good practices, to learn, and to prevent pitfalls and mistakes.

Research libraries can also learn and become stronger by talking with other parts of the universities (e.g., the research office) and assist in developing innovative metrics for research evaluation. They can also work together with organisations such as the Research Data Alliance (RDA) and the GO-FAIR initiative and can provide input to funding agencies on institutional policies.

Participants offered some examples of collective action, taking place at national and international levels:

  • In the Netherlands, impulse funding for Digital Competence Centres was awarded by the Netherlands Organisation for Scientific Research (NWO) in September 2020. Universities of Applied Sciences will work together in a Digital Competence Centre for practice-based research to further facilitate research data management, FAIR data, and data-intensive research at universities of applied sciences in order to realize their open-science ambitions. NWO has awarded an impulse grant of 900,000 euros for this purpose.
  • In Germany, a National Research Data Infrastructure (NFDI) is under development. The aim of this initiative, which began in 2015, is to systematically manage scientific and research data, provide long-term data storage, backup and accessibility, and network the data both nationally and internationally. The NFDI will bring multiple stakeholders together in a coordinated network of consortia tasked with providing science-driven data services to research communities.
  • In the context of other initiatives for the development of research infrastructures, the GAIA-X project was also mentioned. GAIA-X is an initiative created to ensure that commercial parties in the EU have a better grip on their own data and to meet the desire for greater data sovereignty.

Although these examples can serve as a guide and good practice, the development of an interdisciplinary European or even global infrastructure is complex. The discussion on the role of research libraries in research infrastructures and the EOSC has only just begun. One participant concluded that libraries should take the risk — even if we do not know where we are going, we should keep walking along.

About the OCLC-LIBER Open Science Discussion Series

The discussion series is a joint initiative of OCLC Research and LIBER (the Association of European Research Libraries). It focusses on the seven topics identified in the LIBER Open Science Roadmap, and aims to guide research libraries in envisioning the support infrastructure for Open Science (OS) and their roles at local, national, and global levels. The series runs from 24 September through 5 November.

The kick-off webinar opened the forum for discussion and exploration and introduced the theme and its topics. Summaries of all seven topical small group discussions will be published on the OCLC Research blog, Hanging Together, including the first two: (1) Scholarly Publishing and (2) FAIR research data.

Join us! We invite all members of the open science community to join our organizations for the closing round-up webinar on 5 November, where we will synthesize and share the findings from all seven group discussions. Register today.

The post OCLC-LIBER Open Science Discussion on Research Infrastructures and the European Open Science Cloud (EOSC) appeared first on Hanging Together.

We should regulate virality / Eric Hellman

It turns out that virality on internet platforms is a social hazard! 

Living in the age of the Covid pandemic, we see around us what happens when we let things grow exponentially. The reason that the novel coronavirus has changed our lives is not that it's often lethal - it's that it found a way to jump from one infected person to several others on average, leading to exponential growth. Whether we get infected depends not on the lethality of the virus, but only on its reproduction rate.

For years, websites have been built to optimize virality of content. What we see on Facebook or Twitter is not shown to us for its relevance to our lives, its educational value, or even its entertainment value. It is shown to us because it maximizes our "engagement" - our tendency to interact and spread it. The more we interact with a website, the more money it makes, and so a generation of minds has been employed in the pursuit of more engagement. Sometimes it's cat videos that delight us, but more often these days it's content that enrages and divides us.

Our dissatisfaction with what the internet has become has led to calls to regulate the giants of the internet. A lot of the political discourse has focused on "section 230", a part of US law that gives interactive platforms such as Facebook a set of rules that result in legal immunity for content posted by users. As might be expected, many of the proposals for reform have sounded attractive, but the details are typically unworkable in the real world, and often would have effects opposite of what is intended.

I'd like to argue that the only workable approaches to regulating internet platforms should target their virality. Our society has no problem with regulations that force restaurants, food preparation facilities, and even barbershops to prevent the spread of disease, and no one ever complains that the regulations affect "good" bacteria too. These regulations are a component of our society's immune system, and they are necessary for its healthy functioning.

never going to give you covid

You might think that platform virality is too technical to be amenable to regulation, but it's not. That's because of the statistical characteristics of exponential growth. My study of free ebook usage has made me aware of the pervasiveness of exponential statistics on the internet. Sometimes labeled the 80-20 rule, the Pareto principle, or log-normal statistics, it's the natural result of processes that grow at a rate proportional to their size. As a result, it's possible to regulate virality of platforms because only a very small amount of content is viral enough to dominate the platform. Regulate that tiny amount of super-viral content, and you create incentive to moderate the virality of platforms. The beauty of doing this is that a huge majority of content is untouched by regulation.
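To make the statistical point concrete, here is a minimal sketch (not a model of any real platform) of a process that grows at a rate proportional to its size. Every item starts with one view and is multiplied by a random factor at each step, which produces roughly log-normal view counts. The function names, the number of items, the number of steps, and the spread `sigma` are all assumptions chosen for illustration.

```python
import random


def simulate_views(n_items=10_000, steps=30, sigma=0.5, seed=42):
    """Simulate view counts under multiplicative (size-proportional) growth.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    views = [1.0] * n_items
    for _ in range(steps):
        # Each item's growth in a step is proportional to its current size.
        views = [v * rng.lognormvariate(0.0, sigma) for v in views]
    return views


def top_share(values, fraction):
    """Return the share of the total held by the top `fraction` of items."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    views = simulate_views()
    print(f"Top 1% of items hold {top_share(views, 0.01):.0%} of all views")
```

If every item had the same number of views, the top 1% would hold exactly 1% of the total. Under multiplicative growth the top 1% holds a far larger share, which is why a rule aimed only at the most viral items can leave nearly all content untouched.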

How might this work? Imagine a law that removed a platform's immunity for content that it shows to a million people (or maybe 10 million - I'm not sure what the cutoff should be). This makes sense, too; if a platform promotes illegal content in such a way that a million people see it, the platform shouldn't get immunity just because "algorithms"! It also makes it practical for platforms to curate the content for harmlessness - it won't kill off the cat videos! The Facebooks and Twitters of the world will complain, but they'll be able to add antibodies and T-cells to their platforms, and the platforms will be healthier for it. Smaller sites will be free to innovate, without too much worry, but to get funding they'll need to have plans for virality limits.
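The cutoff idea can be sketched in a few lines. This is only an illustration of the rule as described above: the `Item` type, the function names, and the one-million default are hypothetical, and the post deliberately leaves the actual number open.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the right value is an open question.
REACH_CUTOFF = 1_000_000


@dataclass
class Item:
    item_id: str
    people_reached: int


def split_by_reach(items, cutoff=REACH_CUTOFF):
    """Return (over, under): items shown to at least `cutoff` people, and the rest."""
    over = [i for i in items if i.people_reached >= cutoff]
    under = [i for i in items if i.people_reached < cutoff]
    return over, under


def regulated_fractions(items, cutoff=REACH_CUTOFF):
    """Return (fraction of items over the cutoff, fraction of total reach they account for)."""
    over, _ = split_by_reach(items, cutoff)
    total_reach = sum(i.people_reached for i in items)
    frac_items = len(over) / len(items)
    frac_reach = sum(i.people_reached for i in over) / total_reach
    return frac_items, frac_reach
```

For a made-up catalogue of one item seen by 5 million people and 999 items seen by 1,000 people each, `regulated_fractions` reports that 0.1% of the items fall over the cutoff while accounting for roughly 83% of everything shown. Only that one item would need the platform's attention.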

So we really do have a choice: healthy platforms with diverse content, or cesspools of viral content. Doesn't seem like such a hard decision!

Web Librarians Who Do UX: Access presentation / Shelley Gullikson

This is the text (approximately) of my presentation from the virtual Access conference on Oct. 19, 2020, “Web librarians who do UX: We are so sad, we are so very very sad.”

Last year, I was doing interviews with library people who do User Experience work and noticed that people who were primarily focused on the web had the most negative comments and fewest positive comments overall. It made me think of the song from Scott Pilgrim—the comic and the movie—“I am so sad, I am so very very sad.”

So there’s the title.  And I’m saying “We are so sad” because I am also a web person who does UX work. And a lot of what I heard seemed familiar.

I want to say that although the title and the visuals are based around a comic and comic book movie, I’m not trying to be flip. A lot of the people who I talked to were very open about being unhappy. Not everyone was unhappy. But, there was a lot in common among the people who said they were struggling and those who were pretty positive. Here are some quotes from people who were generally pretty positive:

  • “How much can I do that no one will block me from doing?”
  • “Why am I really here then, if I’m just moving things around the page?”
  •  [I keep feedback] “for promotion purposes but also not-being-sad purposes.”

And from the not-so-positive:

  • “You have all the people who have their own personal opinions… and you’re like “you’re violating every good norm of website development”… they think their opinion is just as good as anyone else’s opinion. … That can definitely demoralize you.”
  • “I bounce back and forth between, for my own sanity’s sake, needing to be apathetic about it, saying ‘I can’t change this therefore I can’t be stressed about it’, and also on the other hand, caring that we have crappy stuff out there and wanting to improve it.”
  • “It is what it is. There’s lots of other things to be disappointed by.”

Heartbreaking, right? So why is this the case?

First, a tiny bit of background on the research project. The aim of the project was to look at how UX work is structured and supported in academic libraries and then to examine those supports within the context of the structures. I did hour-long semi-structured interviews with 30 people in academic libraries from 5 countries (Canada, the US, the UK, Sweden, and Norway). These were library workers who do UX, so not necessarily librarians, and not necessarily people in UX positions. The people I’m talking about today focus mostly on the web in their jobs.

The frustrations of web folks were particularly striking because I didn’t ask a question about frustrations; I asked what supports were helpful to them and what would be helpful. Admittedly, asking “what would be helpful” is going to bring up deficiencies, but I didn’t ask what supports were missing or what they found frustrating in their work. And again, the web folks talked more about frustrations and difficulties than participants who didn’t have a web focus.

So let’s dig in a bit. Why, specifically, are we so sad?

First off, we have a tendency to want to think big! Do more!

  • “That’s what motivates me—the opportunity to really sit down, talk, observe, have a conversation with our users, how they approach the website, how they approach the research process, how they approach finding out about our services and how we in turn can better highlight our resources, how we can better highlight our collections, our services.”
  •  “If I see people struggling with things, I want to make them better.”
  • “I don’t want UX to be just a website thing. I don’t want people to think of it ‘oh, it’s just a web thing.’ I want it to be in everything.”
  • “I just see lots of potential all the time. I see potential everywhere, the whole library. I see things we could do that would enhance things.”

That doesn’t sound sad. There’s energy and excitement in those words!

But contrast it with:

  • “Why am I really here then, if I’m just moving things around the page? I’m trying to get deeper. I’m trying to get a better understanding. It’s not just a matter of moving things around.”

Web people who do UX are, I think, well positioned—and perhaps uniquely positioned—to see big picture problems across the library. One participant told me they found that users were confused about the Circulation section of the website because there were 18 different policies underlying it; they could rewrite the web content but couldn’t do anything about the underlying spaghetti of policies. Another said that users found the floor maps confusing but the maps reflected the language used on the library’s signage; they could put clear language on the website’s floor maps but couldn’t do anything about the signage in the building.

So we see these problems and naturally want to solve them. We get excited about the potential to make more things better. And we chafe against having to think smaller and do less.

Which brings us to: lack of authority. Lack of authority often comes up around those larger library issues. One participant put it this way:

  • “The UX work is actually informing something else to happen. Whether that’s a space being reorganized or a webpage being redesigned—the UX work is informing this other work. Right? So it would be easier for me to do the UX work if I could actually do the work that it’s informing.”
  • Another person was even having problems at the research stage: [I’d like to] “have the authority and freedom to actively engage with users.”
  • And someone else, in talking specifically about their web work said: “Nobody tries to stop me.” The implication being that people try to stop them when they do other things.

But for many participants there was a lack of authority even when dealing with the library website:

  • “The web team doesn’t feel like they can really make changes without consult, consult, consult with everybody even though – even if, and even though – the web team has web expertise.”
  •  “Just because I’m our internal expert on this stuff doesn’t mean I can persuade everybody.”
  • “There’s too much of a sense that these things have to be decided by consensus”
  • “Everyone feels… like they should have the right to declare how databases should work, how links should be configured, things like that.”
  • [Each library unit feels] “they have the right to do whatever they want with their content and their presentation. … I’m not their boss and they realize that.  I’m happy to draw up more guidelines and stuff like that but if I’m not allowed to enforce that… [it’s] hard to keep things together when you just have to go hat in hand to people and say ‘pretty please, stop breaking the guidelines.’”

One participant described how having no authority for the one thing they were responsible for made them feel: “Of course that has stymied my initiative, not to mention my disposition. My purpose even.”

Another frustration that came through was resistance from colleagues. A few comments have already touched on colleagues ignoring expertise, but resistance comes through in other ways:

  • One participant described how they always approach a particular department: [I’m] “treading very slowly and carefully and choosing my words very carefully”
  • Another said: “Are they deliberately killing the idea but trying to avoid being disagreeable about it but just letting it die from attrition, or do they really actually mean it when they say they agree with the idea in principle but just don’t want to be bothered to follow through? I don’t know – I can’t tell the difference.”

These are things participants were told by their colleagues:

  • A manager said that “staff felt unfairly targeted” by their work
  • In opposing changes to the website: “We have to keep it this way because we teach it this way”
  • And similarly, “It’s our job to teach people how to use, not our job to make it easier to use.”

So, not surprisingly, these kinds of things make us feel isolated. Feelings of isolation come through in a few ways. Some participants felt they were completely on their own when deciding where to focus their attention. This is one participant talking about being new in their position:

  • “I remember asking for, if there were any focuses they wanted to focus on… they said ‘no, there’s nothing. We don’t have any direction for you to go in.’”

That lack of direction is often coupled with not having colleagues who do the same work:

  • “It’s really me and up to me to figure out where to focus my attention by myself. So sometimes having someone to bounce ideas off of and talk things through with… would be nice.”

And when no one else does what you do:

  • “Sometimes that’s a barrier, if I’m the ‘expert’ and other people don’t really know what I’m talking about.”

So, isolation, having to think small and do less, resistance from colleagues, and lack of authority. Yeah, no wonder we feel a bit sad.

What are my take-aways?

We need to find our people. UX folks who worked with groups of colleagues were more positive about their work. However, people who tried to do UX work with non-UX committees were even more negative than people who had no group at all. So we can’t just look for any people, they have to be the right people.

I wrote an article about the larger project that was published in Weave earlier this month and in it, one of my recommendations was to try to move beyond the website. But I want to say here that moving beyond the web is not a panacea. I talked to someone who had great success in UX for the website and other digital projects. They wanted to embed UX throughout the library and they had management support to do it. But after continued resistance from colleagues, they realized they couldn’t make it work, and decided to move to a completely different area of the library. Which brings me to my next point.

Advocacy is important, absolutely, but when we’re not getting buy-in, we need to look at next steps: do we need to change our tactics? Would it be better to have someone else advocate on our behalf? Do we need to wait for a change of leadership? Or, as a few participants said, a few retirements? At a certain point, do we give up, or do we get out? Because advocacy doesn’t always work. And if it’s not working, we shouldn’t keep banging our heads against the post, right?

Ultimately, I think we need to be clear about authority.

We need to understand how authority works in our own library. Not just who can block us and who can help, but are there organizational structures that confer some authority? Is it better to chair a committee or a working group? For example.

Then, we need a clear understanding of what our own authority is within our organization. Maybe we underestimate the authority we have. Maybe not. But we need to be clear before we get to the next part.

Which is: we need to clearly understand our own tolerance for doing work that will never be acted on. The report that sits in a drawer. If our tolerance is low, if it’s upsetting to have our work ignored, then we need to stick very closely to our own sphere of authority. We have to dream within that sphere or burn out.

“Dream small or burn out” is an exceptionally grim note to end on.  But these frustrations are largely beyond one person’s control. If you’re feeling so very very sad because of some of these things, IT’S NOT JUST YOU. The fact that these issues were common to web folks, regardless of how they seemed to feel about their work, suggests that these positions are prone to these kinds of frustrations.

I wish I had some ideas for how to fix it! If you do, please add them to the chat, tweet at me, email me (see contact info). I’ll gather it all in a blog post so it’s all in one spot. Thanks.

Do we trust the plane or the pilot? The problem with ‘trustworthy’ AI / Open Knowledge Foundation

On April 8th 2019, the High-Level Expert Group on AI, a committee set up by the European Commission, presented the Ethics Guidelines for Trustworthy Artificial Intelligence. It defines trustworthy AI through three principles and seven key requirements. Such AI should be: lawful, ethical and robust, and take into account the following principles:

  • Human agency and oversight
  • Technical Robustness and safety
  • Privacy and data governance
  • Transparency
  • Diversity, non-discrimination and fairness
  • Societal and environmental well-being
  • Accountability

The concept has inspired other actors such as the Mozilla Foundation, which has built on the concept and written a white paper clarifying its vision. Both the ethics guidelines and Mozilla’s white paper are valuable efforts in the fight for a better approach to what we at Open Knowledge call Public Impact Algorithms:

“Public Impact Algorithms are algorithms which are used in a context where they have the potential for causing harm to individuals or communities due to technical and/or non-technical issues in their implementation. Potential harmful outcomes include the reinforcement of systemic discrimination (such as structural racism or sexism), the introduction of bias at scale in public services or the infringement of fundamental rights (such as the right to dignity).”

The problem does not lie in the definition of trustworthiness: the ethical principles and key requirements are sound and comprehensive. Instead, it arises from the aggregation behind a single label of concepts whose implementation presents extremely different challenges.

Going back to the seven principles outlined above, two dimensions are mixed in: the technical performance of the AI and the effectiveness of the oversight and accountability ecosystem which surrounds it. The principles fall overwhelmingly under the Oversight and Accountability category.

Technical performance:
  • Technical robustness and safety

Oversight and Accountability:
  • Human agency and oversight
  • Privacy and data governance
  • Diversity, non-discrimination and fairness
  • Societal and environmental well-being

Talking about ‘trustworthy AI’ emphasizes the tool while de-emphasizing the accountability ecosystem, which becomes a bullet point, all but ensuring that it will not be given the attention it deserves.

Building a trustworthy plane

The reason why no one uses the expression ‘trustworthy’ plane(1) or car(2) is not that trust is inessential to the aviation or automotive industries. It’s because trust is not a useful concept for legislative or technical discussions. Instead, more operational terms such as safety, compliance or suitability are used. Trust exists in the discourse around these industries, but is instead placed in the ecosystem of practices, regulations and actors which drive the industry: for the civil aviation industry this includes the quality of pilot training, the oversight on airplane design, or the standard of safety written in the legislation(3).

The concept of ‘trustworthy AI’ displaces the trust from the ecosystem to the tool. This has several potential consequences:

  • Trust could become embedded in the discourse and legislation on the issue, pushing to the side other concepts that are more operational (safety, privacy, explicability) or essential (power, agency(4)).
  • Trustworthy AI could become an all-encompassing label (akin to an organic fruit label) which would legitimize AI-enabled tools, cutting off discussions about the suitability of the tool for specific contexts or questions about whether these tools should be deployed at all. Why do the hard work of building accountable processes when a label can be used as a shortcut?
  • Minorities and disenfranchised groups would again be left out of the conversation: the trust that a public official puts into an AI tool will be extended by default to their constituents.

This scenario can already be seen in the European Commission’s white paper on AI(5): their vision completely obscures the idea that some AI applications may not be desirable; they outline an ecosystem made of labels, risk levels(6) and testing centers, which would presumably give a definitive assessment of AI tools before their deployment; they use the concept of ‘trust’ as a tool for accelerating the development of AI rather than as a way to examine the technology on its merits. Trust as the oil in the AI industry’s engine.

We should not trust AI

Behind Open Knowledge’s Open AI and Algorithms programme is the core belief that we can’t and shouldn’t trust Public Impact Algorithms by default. Instead, we need to build an ecosystem of regulation, practices and actors in which we can place our trust. The principles behind this ecosystem will resonate with the definition given above of ‘trustworthy’ AI: human agency and oversight, privacy, transparency, accountability… But while a team of computer science researchers may discover a breakthrough in explainable deep learning, the work needed to set up and maintain this ecosystem will not come through a breakthrough: it will be a years-long, multi-stakeholder and cross-sector effort that will face its share of opponents and headwinds. This work cannot, and should not, simply be a bullet point under a meaningless label.

Concretely, this ecosystem would emphasize:

  • Meaningful transparency: at the design level (explainable statistical model vs black box algorithms)(7), before deployment (clarifying goals, indicators, risks and remediations)(8) and during the tool’s lifecycle (open performance data, audit reports)
  • Mandatory auditing: although algorithms deployed in public services should be open source, Intellectual Property Laws dictate that some of them will not be. The second best option should consequently be to mandate auditing by regulators (who would have access to source code) and by external auditors using APIs designed to monitor key indicators (some of them mandated by law, others defined with stakeholders)(9).
  • Clear redress and accountability processes: multiple actors intervene between the design and the deployment of an AI-enabled tool. Who is accountable for what will have to be clarified.
  • Stakeholder engagement: algorithms used in public services should be proactively discussed with the people they will affect, and the possibility of not deploying the tool should be on the table
  • Privacy by design: the implementation of algorithms in the public sector often leads to more data centralisation and sharing, with little oversight or even impact assessment.

These and other aspects of this ecosystem will be refined and extended as the public debate continues. But we need to make sure that the ethical debates and the ecosystem issue are not sidelined by an all-encompassing label which will hide the complexity and nuance of the issue. An algorithm may well be trustworthy in a certain context (clean structured data, stable variables, competent administrators, suitable assumptions) while being harmful in others, however similar they might be.

(1) The aviation industry talks about ‘airworthiness’ which is technical jargon for safety and legal compliance
(2) The automotive industry mainly talks about safety
(3) This is why aviation agencies in other countries generally do not re-certify a plane validated by the USA’s FAA: they trust its oversight. The Boeing scandal led to a breach of trust, and certification agencies around the world asked to re-certify the plane themselves.
(4) I purposefully did not mention fairness here. See this paper discussing the problems with using fairness in the AI debate:
(5) It was published in February 2020, which means that they already had access to the draft version of the Ethics Guidelines for Trustworthy AI
(6) See also the report from Data Ethics Commission of the Government which defines 5 risk levels
(7) Too little scrutiny is put on the relative performance of black box algorithms vs explainable statistical models. This paper discusses this issue:
(8) As of October 2020, Amsterdam (The Netherlands), Helsinki (Finland) and Nantes (France) are the only governments having deployed algorithm registers. But in all cases, the algorithms were deployed before being publicized.
(9) Oversight through investigation will still be needed. Algorithm Watch has several projects in that direction, including a report on Instagram. This kind of work relies on volunteers sharing data about their social media feeds. Mozilla is also involved in helping them structure this kind of ‘data donation’ project.

This Is Just Amazing / Harvard Library Innovation Lab

The other day, I noticed this on the side of the house.

Category 5 cable with broken jacket

That is near the bottom of the run of Cat 5 Ethernet cable I installed over twenty years ago, from the cable modem and router in the basement through a window frame, up the side of the house and into the third floor through another hole in a window frame. What I found amazing was not so much that the cable, neither shielded nor rated for the out-of-doors, had lasted so long in such an amateurish installation, but that all of our Zoom meetings for the last eight months had passed through these little wires.

The really amazing part, beyond the near-magic of all that audio and video flying through little twists of copper, is the depth of dependency: at each end of that cable is hardware that changes voltages on the wires, operating system drivers for interacting with the hardware, the networking stacks of the operating systems that offer network interfaces to software, the software itself, the systems of authentication and authorization that the software uses to permit or deny access—a cascade of protocols, standards, devices, programming languages, and codebases that become the (mostly) seamless experience of the discussion we have at ten each morning. Or, a moment later, the experience of confirming that the city has accepted the ballot I mailed.

Starry-eyed delight in an amazing machine is clearly not sufficient, with as good a view as we now have of the broken dream of a liberatory Internet. We have to have an acute awareness of the system accidents implicit in our tools and the societal technologies that are connected to them. I believe the delight is necessary, though—without it, I don't see how we can ever learn to treat computers as anything other than an apparatus of control. There's hope, if a grimy cable with a broken jacket can carry joy.

Weeknotes : 42 (2020) / Mita Williams

“Weeknotes are blogposts about our working week”

Web of Weeknotes

Having a regular writing schedule seems to work for me. Since 2016, I have sent out a small set of recommended reads, games, and other things every Saturday morning via a TinyLetter to around 200 people. Since August of this year, I’ve managed to send out weekly updates of local civic matters every Monday. I’ve been meaning to write more regularly about library things, so it would make sense to start writing weeknotes here. I’m going to aim for every Friday.

I quite enjoyed the latest Secret Feminist Agenda in which host Hannah McGregor discusses matters of academic mentorship with York Associate Professor and Associate Dean, Lily Cho. I liked how this discussion brought up the existence of the recalcitrant mentored: those students who do not recognize their abilities or do not see themselves in a particular role. But what I particularly appreciated in the conversation was Cho’s remark that either it is necessary to detangle closeness from mentorship or we need to reimagine closeness. Her insights into University Administration are also worth a listen.

To file under ‘high citation numbers do not always mean a great paper’ is this thread:

Last week I stumbled upon this video, which alerted me to a plug-in for Zotero called Zotfile that allows highlighted text from PDFs to be easily imported as a note.

This prompted me to revisit the Zotero plug-in page where I learned of a bunch of extensions that I wasn’t previously aware of.

The Zutilo extension appears particularly useful.

There are lots of videos in this inaugural Librarian of Things Weeknotes.

So I may as well include this fine one

Curating Corpora / Ed Summers

Here’s a really nice talk by Everest Pipkin about the need to curate datasets for generative text algorithms, especially when they are being used in creative work. Everest considers creative work broadly as any work where care for the experience is important.

To curate your own corpora is to let you have a hyper-specific control for the tone, vibe, content, ethics, language, and poetics of that space.

Since I’m teaching a data curation class in an information studies department this semester, and I also work in a digital humanities center, this approach to understanding the algorithmic impacts of data curation is super interesting to me. I particularly liked how Everest extended this idea of curation to the rules that sit on top of the models, which select and deselect particular interactions that are useful in a particular context.

It feels like this creative practice that Everest provides a view into could actually be more prevalent than the popular press might have it. There is so much focus on obtaining larger and larger datasets, and neural networks with more and more synapses to approach the complexity found in the brain.

Rather than the selection of an algorithm, the tuning of its hyperparameters, and the vast infrastructures for training being the secret sauce, perhaps the data that is selected and how model interactions are interpreted are equally if not more important, especially in particular contexts.

I guess this also highlights why data assets are so guarded, and why projects like Wikipedia and Common Crawl are so important for tools like GPT-3 and spaCy. It would be pretty cool to be able to select a model based on some subset, or subsets, in a dataset like Common Crawl. Like, for example, if you wanted to generate text based on text in a fan fiction site like AO3, or even an author or set of authors within AO3. Maybe something like that already exists?

Unexpected performance characteristics when exploring migrating a Rails app to Heroku / Jonathan Rochkind

I work at a small non-profit research institute. I work on a Rails app that is a “digital collections” or “digital asset management” app. Basically it manages and provides access (public as well as internal) to lots of files and descriptions of those files, mostly images.

It’s currently deployed on some self-managed Amazon EC2 instances (one for web, one for bg workers, one in which postgres is installed, etc). It gets pretty low traffic in general web/ecommerce/Rails terms. The app is definitely not very optimized: we know it’s kind of a RAM hog, and we know it has many actions whose response time is undesirable. But it works “good enough” on its current infrastructure for current use, such that optimizing it hasn’t been the highest priority.

We are considering moving it from self-managed EC2 to heroku, largely because we don’t really have the capacity to manage the infrastructure we currently have, especially after some recent layoffs.

Our Rails app is currently served by passenger on an EC2 t2.medium (4G of RAM).

I expected the performance characteristics moving to heroku “standard” dynos would be about the same as they are on our current infrastructure. But I was surprised to see some degradation:

  • Responses seem much slower to come back when deployed, mainly for our slowest actions. Quick actions are just as quick on heroku, but slower ones (or perhaps actions that involve more memory allocations?) are much slower on heroku.
  • The application instances seem to take more RAM running on heroku dynos than they do on our EC2 (this one in particular mystifies me).

I am curious if anyone with more heroku experience has any insight into what’s going on here. I know how to do profiling and performance optimization (I’m more comfortable with profiling CPU time with ruby-prof than I am with trying to profile memory allocations with say derailed_benchmarks). But it’s difficult work, and I wasn’t expecting to have to do more of it as part of a migration to heroku, when performance characteristics were acceptable on our current infrastructure.

Response Times (CPU)

Again, yep, I know these are fairly slow response times. But they are “good enough” on current infrastructure (EC2 t2.medium), and I wasn’t expecting them to get worse on heroku (standard-1x dyno, backed by heroku pg standard-0).

Fast pages are about the same, but slow pages (that create a lot of objects in memory?) are a lot slower.

This is not load testing; I am not testing under high traffic or for concurrent requests. This is just accessing demo versions of the app manually, one page at a time, to see response times when the app is only handling one request at a time. So it’s not about how many web workers are running or fit into RAM or anything; one is sufficient.

Action | Existing EC2 t2.medium | Heroku standard-1x dyno
Slow reporting page that does a few very expensive SQL queries, but they do not return a lot of objects. Rails logging reports: Allocations: 8704 | ~3800ms | ~3200ms (faster pg?)
Fast page with a few AR/SQL queries returning just a few objects each, a few partials, etc. Rails logging reports: Allocations: 8205 | 81-120ms | ~120ms
A fairly small “item” page, Rails logging reports: Allocations: 40210 | ~200ms | ~300ms
A medium size item page, loads a lot more AR models, has a larger byte size page response. Allocations: 361292 | ~430ms | 600-700ms
One of our largest pages, fetches a lot of AR instances, does a lot of allocations, returns a very large page response. Allocations: 1983733 | 3000-4000ms | 5000-7000ms

Fast-ish responses (and from this limited sample, actually responses with few allocations even if slow waiting on IO?) are about the same. But our slowest/highest allocating actions are ~50% slower on heroku? Again, I know these allocations and response times are not great even on our existing infrastructure; but why do they get so much worse on heroku? (No, there were no heroku memory errors or swapping happening).
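To make the "~50% slower" estimate a little more concrete, here is a rough sketch that computes the percentage change for each row of the table. It rests on a couple of assumptions: reported spans such as 600-700ms are collapsed to their midpoint, the second row is read as 8205 allocations with an EC2 time of 81-120ms, and the short page labels and helper names (`midpoint`, `percent_change`) are made up for this illustration.

```ruby
# Timings in milliseconds, as read from the comparison table.
# A Range stands for a reported span such as "600-700ms".
ROWS = [
  { page: "slow reporting page", allocations: 8_704,     ec2: 3800,         heroku: 3200 },
  { page: "fast page",           allocations: 8_205,     ec2: (81..120),    heroku: 120 },
  { page: "small item page",     allocations: 40_210,    ec2: 200,          heroku: 300 },
  { page: "medium item page",    allocations: 361_292,   ec2: 430,          heroku: (600..700) },
  { page: "largest page",        allocations: 1_983_733, ec2: (3000..4000), heroku: (5000..7000) }
].freeze

# Collapse a reported span to its midpoint; pass single values through.
def midpoint(value)
  value.is_a?(Range) ? (value.begin + value.end) / 2.0 : value.to_f
end

# Percentage change from the EC2 timing to the Heroku timing.
# Positive means Heroku was slower.
def percent_change(ec2_ms, heroku_ms)
  before = midpoint(ec2_ms)
  after = midpoint(heroku_ms)
  ((after - before) / before * 100).round(1)
end

ROWS.each do |row|
  change = percent_change(row[:ec2], row[:heroku])
  puts format("%-20s %9d allocations %+7.1f%%", row[:page], row[:allocations], change)
end
```

With midpoints, the three item pages come out roughly 50% to 70% slower on heroku, while the SQL-bound reporting page is a bit faster. These are single manual measurements, so the exact percentages should not be read too closely.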

RAM use of an app instance

We currently deploy with passenger (free), running 10 workers on our 4GB t2.medium.

To compare apples to apples, I deployed using passenger on a heroku standard-1x, with just one worker instance (because that’s actually all I can fit on a standard-1x!), to compare the size of a single worker from one infrastructure to the other.

On our legacy infrastructure, on a server that’s been up for 8 days of production traffic, passenger-status looks something like this:

  Requests in queue: 0
  * PID: 18187   Sessions: 0       Processed: 1074398   Uptime: 8d 23h 32m 12s
    CPU: 7%      Memory  : 340M    Last used: 1s
  * PID: 18206   Sessions: 0       Processed: 78200   Uptime: 8d 23h 32m 12s
    CPU: 0%      Memory  : 281M    Last used: 22s
  * PID: 18225   Sessions: 0       Processed: 2951    Uptime: 8d 23h 32m 12s
    CPU: 0%      Memory  : 197M    Last used: 8m 8
  * PID: 18244   Sessions: 0       Processed: 258     Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 161M    Last used: 1h 2
  * PID: 18261   Sessions: 0       Processed: 127     Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 158M    Last used: 1h 2
  * PID: 18278   Sessions: 0       Processed: 105     Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 169M    Last used: 3h 2
  * PID: 18295   Sessions: 0       Processed: 96      Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 163M    Last used: 3h 2
  * PID: 18312   Sessions: 0       Processed: 91      Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 169M    Last used: 13h
  * PID: 18329   Sessions: 0       Processed: 92      Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 163M    Last used: 13h
  * PID: 18346   Sessions: 0       Processed: 80      Uptime: 8d 23h 32m 11s
    CPU: 0%      Memory  : 162M    Last used: 13h

We can see, yeah, this app is low traffic, most of those workers don’t see a lot of use. The first worker, which has handled by far the most traffic, has a Private RSS of 340M. (Other workers, having handled fewer requests, are much slimmer.) Kind of overweight, not sure where all that RAM is going, but it is what it is. I could maybe hope to barely fit 3 workers on a heroku standard-2x (1024M) instance, if these sizes were the same on Heroku.
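The "barely fit 3 workers" arithmetic can be sketched by pulling the Memory figures out of the passenger-status text and dividing the dyno size by the largest worker. This is only an illustration: the regular expression is written against the output shown here, not against any documented passenger-status format, the helper names are hypothetical, and it ignores any memory needed outside the workers themselves.

```ruby
# A shortened copy of the passenger-status listing from the post.
STATUS = <<~TEXT
  * PID: 18187   Sessions: 0       Processed: 1074398   Uptime: 8d 23h 32m 12s
    CPU: 7%      Memory  : 340M    Last used: 1s
  * PID: 18206   Sessions: 0       Processed: 78200   Uptime: 8d 23h 32m 12s
    CPU: 0%      Memory  : 281M    Last used: 22s
  * PID: 18225   Sessions: 0       Processed: 2951    Uptime: 8d 23h 32m 12s
    CPU: 0%      Memory  : 197M    Last used: 8m 8
TEXT

# Pull the per-worker "Memory : 340M" figures out as integers (megabytes).
def worker_memory_mb(status_text)
  status_text.scan(/Memory\s*:\s*(\d+)M/).flatten.map(&:to_i)
end

# How many workers of a given size fit in a dyno, ignoring any other overhead.
def workers_that_fit(dyno_mb, worker_mb)
  dyno_mb / worker_mb
end

sizes = worker_memory_mb(STATUS)
largest = sizes.max
puts "worker sizes (MB): #{sizes.inspect}"
puts "512MB dyno:  #{workers_that_fit(512, largest)} worker(s)"
puts "1024MB dyno: #{workers_that_fit(1024, largest)} worker(s)"
```

Sizing by the largest worker is the pessimistic choice; it matches the reasoning in the post, where the busiest worker sets the budget.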

This is after a week of production use — if I restart passenger on a staging server, and manually access some of my largest, hungriest, most-allocating pages a few times, I can only see Private RSS use of like 270MB.

However, on the heroku standard-1x, with one passenger worker, using the heroku log-runtime-metrics feature to look at memory… private RSS is I believe what should correspond to passenger’s report, and what heroku uses for memory capacity limiting…

Immediately after restarting my app, it’s at sample#memory_total=184.57MB sample#memory_rss=126.04MB. After manually accessing a few of my “hungriest” actions, I see: sample#memory_total=511.36MB sample#memory_rss=453.24MB. That is just a few manual requests, not a week of production traffic, and already 33% more RAM than on my legacy EC2 infrastructure after a week of production traffic. It is actually approaching the limits of what can fit in a standard-1x (512MB) dyno as just one worker.
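For anyone wanting to repeat the comparison, here is a small sketch that extracts the memory samples from a log line and reproduces the 33% figure. The log line below contains only the two fields quoted in this post (real log-runtime-metrics lines carry more), and `parse_memory_samples` and `percent_over` are hypothetical helper names, not part of any heroku tooling.

```ruby
# A log line shaped like the log-runtime-metrics samples quoted in the post.
# Real lines carry more fields; only the two quoted ones are included here.
LOG_LINE = 'sample#memory_total=511.36MB sample#memory_rss=453.24MB'

# Extract every "sample#name=12.34MB" pair into a hash of name => megabytes.
def parse_memory_samples(line)
  line.scan(/sample#(\w+)=([\d.]+)MB/).map { |name, mb| [name, mb.to_f] }.to_h
end

# Percentage by which `value` exceeds `baseline`, rounded to a whole number.
def percent_over(value, baseline)
  ((value / baseline - 1) * 100).round
end

samples = parse_memory_samples(LOG_LINE)
rss = samples.fetch("memory_rss")
puts "RSS vs 340MB passenger worker: +#{percent_over(rss, 340.0)}%"
puts "share of a 512MB dyno: #{(rss / 512 * 100).round}%"
```

The 340MB baseline is the largest worker from the passenger-status listing; whether the two tools measure the same thing is exactly the open question raised next.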

Now, are heroku’s memory measurements being done differently than passenger-status does them? Possibly. It would be nice to compare apples to apples, and passenger hypothetically has a service that would let you access passenger-status results from heroku… but unfortunately I have been unable to get it to work. (Ideas welcome).

Other variations tried on heroku

Trying the heroku gaffneyc/jemalloc buildpack with heroku config:set JEMALLOC_ENABLED=true (still with passenger, one worker instance) doesn’t seem to have made any significant difference, maybe 5% RAM savings or maybe it’s just a fluke.

Switching to puma (puma5 with the experimental possibly memory-saving features turned on; just one worker with one thread), doesn’t make any difference in response time performance (none expected), but… maybe does reduce RAM usage somehow? After a few sample requests of some of my hungriest pages, I see sample#memory_total=428.11MB sample#memory_rss=371.88MB, still more than my baseline, but not drastically so. (with or without jemalloc buildpack seems to make no difference). Odd.

So what should I conclude?

I know this app could use a fitness regime; but it performs acceptably on current infrastructure.

We are exploring heroku because of staffing capacity issues, hoping not to have to do so much ops. But if we trade ops for having to spend much time on challenging (not really suitable for junior dev) performance optimization… that’s not what we were hoping for!

But perhaps I don’t know what I’m doing, and this haphazard anecdotal comparison is not actually data and I shouldn’t conclude much from it? Let me know, ideally with advice on how to do it better?

Or… are there reasons to expect different performance characteristics from heroku? Might it be running on underlying AWS infrastructure that has fewer resources than my t2.medium?

Or, starting to guess at hypotheses, maybe the fact that heroku standard tier does not run on “dedicated” compute resources means I should expect a lot more variance compared to my own t2.medium, and as a result when deploying on heroku you need to optimize more (so the worst case of variance isn’t so bad) than when running on your own EC2? Maybe that’s just part of what you get with heroku: unless you are paying for performance dynos, it is even more important to have a well-performing app? (Yeah, I know I could use more caching, but that of course brings its own complexities; I wasn’t expecting to have to add it in as part of a heroku migration.)

Or… I find it odd that it seems like slower (or more allocating?) actions are the ones that are worse. Is there any reason that memory allocations would be even more expensive on a heroku standard dyno than on my own EC2 t2.medium?

And why would the app workers seem to use so much more RAM on heroku than on my own EC2 anyway?

Any feedback or ideas welcome!

Evergreen 3.6.0 Released / Evergreen ILS

The Evergreen Community is proud to announce the release of Evergreen 3.6.0! Release 3.6 is more fully featured than originally planned, and includes a number of significant new features available to all Evergreen users thanks to the efforts of the Evergreen Community. Highlights include:

New Features:

  • Course Reserves Module
  • Curbside Pickup Module
  • Experimental Bootstrap-Based OPAC
  • Angular Staff Catalog becomes the default in the client
  • Patron API Authentication
  • Enhanced Documentation using Antora


  • Angularization of many client interfaces
  • Matomo Support for catalog analytics
  • Dedicated interface for managing Hopeless Holds
  • Test patron SMS and Email notification methods
  • Enhanced Print and Email from the public catalog
  • Stripe credit card payments upgraded to use v.3 (Elements)
  • Support for subtotals in Reports
  • Support for pasting a list of barcodes in Item Status

Release 3.6 is available for download from the Evergreen downloads page; the full feature list is available in the release notes.

Thanks to all those who contributed code, testing, feedback, documentation, and their time and expertise to make Release 3.6.0 possible!

The Evergreen 3.6 Release Team

Galen Charlton
Jason Boyer
Terran McCanna
Michele Morgan

Fedora Migration Paths and Tools Project Update: October 2020 / DuraSpace News

This is the first in a series of monthly blog posts that will provide updates on the IMLS-funded Fedora Migration Paths and Tools: a Pilot Project. The first phase of the project began in September with kick-off meetings for each pilot partner: the University of Virginia and Whitman College. These meetings established roles and responsibilities for the pilots, goals and deliverables, and timelines to accomplish the work. Following the meetings we established project infrastructure, including publicly-accessible GitHub project boards.

GitHub project board

The project plan for each pilot is available on the wiki: University of Virginia and Whitman College. These plans will be updated as work progresses and goals are met. Relevant project deliverables will be shared publicly as they are completed.

This month, the University of Virginia pilot will focus on working with a sample of their Fedora 3 content to document data models and metadata and make decisions with regard to mappings. An initial test migration will be conducted to determine what changes will need to be made in order to accomplish the complete migration. They are also setting up a new cloud-based Fedora 6.x instance using fcrepo-docker and the newly-created fcrepo-aws-deployer. These tools can be used by others interested in setting up a Fedora instance using Docker and AWS. At the same time, the grant team is working on a validation tool that will compare Fedora 3 data with migrated Fedora 6 data to ensure completeness and integrity of the migration.

The Whitman College team is focused on metadata this month, both in terms of remediation and mapping. Scope, goals, and a timeline for these activities have been established, and deliverables will be shared with the community for feedback as they are completed. The team is also working on a detailed set of functional requirements, which will be used to conduct a gap analysis and prioritize any development efforts that might be needed. This list of functional requirements will be shared once it has been completed.

Stay tuned for future blog posts as we make progress on this project. All relevant resources can be found linked off the project landing page. Please contact David Wilcox with any questions or feedback.



Fuzzy / Ed Summers

It’s pretty weird to watch your eyesight dim. Over the past 6 months or so I’ve noticed my eyesight degrading surprisingly fast. I learned a few months ago that I’ve developed cataracts in both eyes. My right eye is significantly worse, and it is at the point that I can still see light, but I can’t really make out objects any more. I have some difficulty reading unless I close my right eye. But now my left eye is getting worse too. I’m scheduled for cataract surgery on December 1st. Every day I wake up feeling grateful that modern science has made it possible to try to repair them, and doctors have gotten pretty good at it.

This will sound pretentious and/or melodramatic but this situation reminded me of how the writer Borges went slowly blind by the age of 55. I’m just a few years shy of that. It’s hard to imagine someone who treasured their eyes and reading more than Borges. I’m no writer, but I do rely heavily on my eyes to do my work as a software developer, so I can relate a little.

When I first read his poem In Praise of Darkness in my twenties I found it strangely comforting. This idea of a far off dimming as an old man, lost in memories, losing memories, a stripping away that leaves an essence. But now it has an unsettling edge to it–especially as I remember that Borges was reportedly frightened of mirrors. If there was a solution to the algebra of the self, would you want to know what it was?

In Praise of Darkness

Old age (this is the name that others give it)
may prove a time of happiness.
The animal is dead or nearly dead;
man and soul go on.
I live among vague whitish shapes
that are not darkness yet.
Buenos Aires,
which once broke up in a tatter of slums and open lots
out toward the endless plain,
is now again the graveyard of the Recoleta,
    the Retiro square,
the shabby streets of the old Westside,
and the few vanishing decrepit houses,
that we still call the South.
All through my life things were too many.
To think, Democritus tore out his eyes;
time has been my Democritus.
This growing dark is slow and brings no pain;
it flows along an easy slope
and is akin to eternity.
My friends are faceless,
women are as they were years back,
one street corner is taken for another,
on the pages of books there are no letters.
All this should make me uneasy,
but there’s a restfulness about it, a going back.
Of the many generations of books on earth
I have read only a few,
the few that in my mind I go on reading still–
reading and changing.
From south and east and west and north,
roads coming together have led me
to my secret center.
These roads were footsteps and echoes,
women, men, agonies, rebirths,
days and nights,
falling asleep and dreams,
each single moment of my yesterdays
and of the world’s yesterdays,
the firm sword of the Dane and the moon
    of the Persians,
the deeds of the dead,
shared love, words,
Emerson, and snow, and so many things.
Now I can forget them. I reach my center,
my algebra and my key,
my mirror.
Soon I shall know who I am.

ISLE Sprint: November 2nd to 13th / Islandora

dlamb, Tue, 10/13/2020 - 15:48

Have you seen ISLE for Islandora 8 yet? It's Islandora on Docker and it's crazy fast. It's a much better overall experience for installing and maintaining Islandora. And best of all, ISLE has recently transitioned into a full community project! What started as an Islandora Collaboration Group and Born Digital project is now officially under the umbrella of the Islandora Foundation. To ensure it gets the open source love it needs to really take off, we're proposing something we've never done before: quarterly sprints. ISLE already has so much momentum with an active interest group and contributors that we think this is sustainable moving forward.

ISLE is such an amazing facet of Islandora.  It lowers the overall barrier to entry by making installation and maintenance much smoother. What used to take 45 minutes to an hour and was subject to random fails now takes about 5 to 10 minutes and is much more stable.  After the initial build, making changes generally takes seconds because ISLE uses Docker's amazing buildkit feature, which dramatically speeds up the process using aggressive caching.  Oh, and did we mention it also works on Windows? But ISLE is still relatively new for Islandora 8, and it needs a little work... and a lot of documentation.

That's why we're asking all you developers, documenters, and users out there to help us out!  Even if you have no experience with Docker, just trying it out and letting us know what you think is helpful.  From November 2nd to the 13th, we'll be documenting, testing, and hacking on ISLE or Islandora 8.  If you're interested in joining us, please sign up here.  Commit to as much or as little as you like.

If you'd like to help organize the sprint, we're also holding a sprint planning meeting the week before to get ready.  Please fill out this doodle poll if you are interested.  We'll close the poll on October 23rd.

We hope to see you there. And if Docker isn't your thing, don't worry, we're lining up a bunch more community sprints around everything from Drupal 9 to embargoes to migration tools and batch ingesting.  And of course, there's always plenty of work to do on documentation. We'll be making more announcements in the coming weeks.




Announcing Incoming NDSA Coordinating Committee Members for 2021-2023 / Digital Library Federation

Please join me in welcoming the two newly elected Coordinating Committee members Elizabeth England and Jessica Neal, and one re-elected member, Linda Tadic. Their terms begin January 1, 2021 and run through December 31, 2023. 

 Elizabeth England is a Digital Preservation Specialist at the U.S. National Archives and Records Administration, where she participates in strategic and operational initiatives and services for the preservation of born-digital and digitized records. She previously was the Digital Archivist and a National Digital Stewardship Resident at Johns Hopkins University. Elizabeth currently serves on the NDSA Communications and Publications group and the DigiPres 2020 Planning Committee.

Jessica Neal was recently named the Sterling A. Brown Archivist at Williams College, having previously been the College Archivist at Hampshire College. Additionally, Jes is a workshop facilitator with DocNow, and a member of NDSA’s DigiPres 2020 Planning Committee.

 Linda Tadic has served on the Coordinating Committee for the past two years. As an educator, she incorporates NDSA reports and projects into her courses in the UCLA Information Studies department. Additionally, Linda brings her diverse experience working in non-profit and educational archives, managing digital asset management systems, and founding Digital Bedrock, a managed digital preservation service provider.

We are also grateful to the very talented, qualified individuals who participated in this election.

We are indebted to our outgoing Coordinating Committee members, Karen Cariani, Bradley Daigle (Chair), Sibyl Schaefer, and Paige Walker, for their service and many contributions. To sustain a vibrant, robust community of practice, we rely on and deeply value the contributions of all members, including those who took part in voting.

The post Announcing Incoming NDSA Coordinating Committee Members for 2021-2023 appeared first on DLF.

Endangering Data Interview with Sarah Lamdan / Digital Library Federation

Sarah Lamdan is a Professor of Law at CUNY School of Law in Long Island City, NY. She has a master’s degree in library science and legal information management. She also has a law certificate in environmental law. Her work focuses on information law and policy.

Professor Lamdan works on issues across the spectrum from open government to personal privacy. She is currently writing a book about data control and access called Data Cartels, which will be published by Stanford University Press. Sarah is a member of the Environmental Data & Governance Initiative and works with immigration groups on government surveillance issues. Lamdan’s book, Environmental Information: Research, Access & Environmental Decisionmaking (Environmental Law Institute 2017) serves as a resource for journalists, scientists, and researchers who use government science information in their work.

Tell us a bit about your projects and how you became interested in issues of data privacy, collection, and surveillance.

I became interested in the topic after seeing a news article in 2017 about ICE’s “extreme vetting” social media surveillance program, and noticing that Thomson Reuters and LexisNexis reps had attended an ICE event to learn about how to win gov’t contracts to participate in the invasive immigrant surveillance program. Thomson Reuters and LexisNexis (part of the data analytics giant RELX Group) are the main suppliers of legal research products for the legal profession. Their products, Westlaw and Lexis, are considered the “gold standard” legal research products, and together, the companies have a legal information duopoly. I was concerned about the ethical implications of immigration lawyers using products that may ultimately be participating in ICE surveillance programs that harm their clients.


You’ve written several pieces[1] detailing how many vendor business models go far beyond licensing scholarly journals to academic researchers and law firms, and include selling mailing addresses, social media data, credit and criminal records, and much more to marketing firms, political consultants, and law enforcement. How did those companies develop?

So, as I started researching about Thomson Reuters and LexisNexis’s relationships with ICE, it became clear that these companies weren’t the companies that I thought they were. As a librarian, these companies were marketed as publishers. I knew Reed Elsevier (RE of the RELX) as a publisher of scholarly journals, and LexisNexis (LX) as a publisher of legal resources and news. Thomson Reuters supplied financial and legal search platforms to business and law firm libraries that I’d worked in. 

I learned that, over the past decade, these companies have morphed from being “publishers” to being “data analytics corporations.” Library markets are changing as more information becomes open access and freely available online, especially when it comes to legal resources. Government websites and nonprofit groups have pushed to make laws more accessible on the internet. At the same time, data analytics seems to be the future profit source – collecting huge amounts of data and using algorithms, AI, and machine learning to “slice and dice” data to build informational resources for clients. Since the 90’s Thomson Reuters and RELX Group have acquired hundreds of companies and tons of data to position themselves as the premier data analytics firms.

Although vendors like Thomson Reuters and RELX are notoriously secretive about the library data they collect and how they use it, do members of the library community have any idea about how that data is used in their broader data broker ecosystem? How might data collected from users of LexisNexis, Scopus, Elsevier journals, etc. be of value to non-library audiences? How might it be aggregated with other data?

It seems that Thomson Reuters, RELX Group, and other online research platforms benefit from using library data to market their products, and create new products, for those same users. Sam Moore describes how these platforms use “seamless access” (“Get Full Text Research,” for example[2]) to gather data about their users, so that the companies can monetize researchers’ searches and tailor services for those, and other, users. Wolfie Christl similarly noticed that when you do research using Elsevier, ThreatMetrix, an RELX surveillance data product, stores a personal identifier in your browser to track your searching.

We can’t be sure what the companies are doing with this data (aka we don’t know whether they are using it internally or selling it/sharing it externally, etc.) but we do know that our research is being tracked by the companies whose platforms we, and our patrons, rely on to do our research.


You’re working on a book manuscript about data cartels. Can you share a little bit more about that project, and what the larger ecosystem of data cartels looks like?

As I tried to figure out what these data analytics companies do and how their different products connected, I learned that there isn’t much research on these publishers-turned-data analytics corporations. Information science tends to focus more on communication technologies and platforms (algorithms, machine learning, social media, search engines) and not as much on the duller, less-dynamic data vendor side. It’s like focusing on modems, themselves, instead of the Internet – boooring. Because there isn’t much discussion of these companies beyond librarianship, we haven’t seen the full pictures of these companies: they don’t just sell platforms to libraries, they also sell platforms to financial firms, cops, news orgs, and more. Several companies are simultaneously academic research oligarchies, legal research duopolies, federal and state police surveillance monopolies. These companies have consolidated control over informational flows in libraries and beyond, restricting and stratifying informational access and data privacy in all of our communities.


In Librarianship at the Crossroads of ICE Surveillance, you write that we must not pass privacy protections on to patrons, or donate the labor of erasing our patrons’ data to vendors, but rather demand “privacy by design” from vendors. Have you seen any progress on this front?

“Privacy by design” is an idea described by Ann Cavoukian, the former Information and Privacy Commissioner for the Canadian province of Ontario. I bought into this idea in an article I wrote in 2015 (Social Media Privacy: A Rallying Cry to Librarians), and tried to incorporate it into librarians’ work with vendors and the resources we use in our research and reference work. I haven’t seen any data analytics corporations affirm privacy by design concepts lately, and in fact, it seems that, based on research like Moore’s and Christl’s, they are expanding the surveillance in their own products. 


In Librarianship at the Crossroads of ICE Surveillance, you also wrote that librarians are information technology’s early adopters, and often information technology’s first critics. As information professionals, what do you think our role is outside of the library to advocate for data justice?

While I think that librarians have a lot of leverage as the gatekeepers for research platform products, the people who sign the contracts, teach the patrons how to use the products, etc., I am always cognizant of Fobazi Ettarh’s work on “vocational awe.” Librarians can harm ourselves as workers by assuming the huge societal burdens sometimes foisted on libraries and their employees. So, I think we have power, but I don’t think it’s our job, alone, to save the world. We can use our power as we choose, and there have been some really thoughtful and excellent library initiatives around data privacy including the open access movement, ideas around baking privacy guarantees into contracts with data analytics companies, and other negotiations with these data platform giants. We’ve seen how libraries can even choose to walk away from “big deal” contracts, which is very empowering.


Is there anything else you want to add, or any work or other projects you want readers to know about?

There is so much awesome librarian work going on right now. Information access and the products we use are changing all the time, and I think that there is no group more aware of how the changing data privacy and access universe impacts our lives than librarians. So, stay strong and keep going! 

  1. Defund the Police, and Defund Big Data Policing, Too (2020), Librarianship at the Crossroads of Big Data & Corporate Surveillance (2019), When Westlaw Fuels ICE Surveillance: Ethics in the Big Data Policing Era (2019).
  2. Individuation through infrastructure: Get Full Text Research, data extraction and the academic publishing oligopoly (2020)


The post Endangering Data Interview with Sarah Lamdan appeared first on DLF.

Things are already happening for Samvera Connect! / Samvera

Samvera Connect ‘proper’ is a little over a week away but the poster exhibition is now available here!  There is a Slack channel #connect-posters for asynchronous comment or discussion and each presenter has a 30 minute video conferencing slot Monday 19th – Wednesday 21st October for live discussion.  See Sched for details.  Need a Slack account? – use this link here.

The main conference events are  on Thursday 22nd, Friday 23rd, and Monday – Thursday 26th-29th from 11:00am – 2:30pm ET (approx).  Full details can be found on the wiki pages here and in Sched.

Connect 2020 On-line is free of charge but registration is required. To register:

  1. Create an account on the conference Sched website:
  2. Select the workshops, presentations, and/or community events you will attend and add them to your schedule. Some workshops have limited seats and are very close to full!
  3. Later this week, all Attendees registered in Sched will be emailed a Zoom webinar registration link. Follow this link to obtain your unique URL to join the conference.  The same URL will work throughout the conference program on Friday 10/23 and Monday – Thursday 10/26 – 10/29.  You may also receive specific connection information for any workshops or community events you’ve added to your schedule.

Questions? Email Heather Greer Klein, Samvera Community Manager,

The post Things are already happening for Samvera Connect! appeared first on Samvera.

Penny / Ed Summers

We welcomed a new dog into our house this weekend. Her name is Penny.

It’s amazing to me how much the energy changes in our family with a dog around. I still miss our Tim who died suddenly last year. But seeing Penny reminds me of all the good times we had. I wish they could have met each other.

Here is Maeve’s list of potential dog names. She was polling everyone to see what their favorite names were. We ended up going with Penny because she is the color of a penny, at least right now. We also liked the idea of having her full name be Penny Lane.

Her litter name was Latte, which we debated keeping. Ruthie had a strong contingent of support (despite the poll numbers). But it looks like Penny stuck for now.

Shiny - maintenance and memory / Hugh Rundle

When I opened my RSS feed to check the latest edition of IT and Libraries I could scarcely believe my eyes. There at the end of the list of articles is Integrated Technologies of Blockchain and Biometrics Based on Wireless Sensor Network for Library Management. It promised to be a horrifying nightmare combining all the worst technologies of our current moment, and it didn't disappoint - by which of course, I mean it turned out to be even worse than I expected.

Has ITAL been pwned?

Firstly, we need to better define some terms used in this rather strange paper.

Blockchain solves the conundrum of how to turn ransomware into accelerated climate change. Estimates of exactly how much energy is used by Bitcoin and friends vary - from "only as much as the whole of Estonia" to "single handedly ensuring fiery death for all mammals" - but whatever the true number is, nobody disagrees that by design proof-of-work blockchains use astounding amounts of energy in order to perform some basic accounting.

Biometrics are like a password but easier to use because you don't have to remember it. The big advantage of biometrics is that when the database is breached you simply change your face to a new one, or replace your eyeballs, or create new fingerprints.

Wireless Sensor Networks a.k.a. "The Internet of Things" (IoT) solve the problem of no longer being able to simply add radium to your uninspiring product in order to make it seem modern. With IoT, toy manufacturers can now solve the problem of not being able to record and store the inner thoughts of children; lightglobe manufacturers can solve the problem of customers still being able to see in the dark after the venture capital runs out; and the Dutch East India Company Amazon has been able to solve the multiple problems of customers needing to interact with freight transport workers, white Americans having to share public streets with Black people, and governments having to go to the bother of installing surveillance devices in the homes of citizens before spying on them.

The biggest problem solved by the Internet of Things, however, is that of private communications being too secure. Hand the problem to companies with no information security experience, sprinkle a bit of IoT magic on it, and you can finally sleep at night knowing that every message you send can be read by anyone, anywhere, at any time. Nice work 👍.

Whilst we have now established the individual problems these technologies set out to solve, you might be wondering what particular problem this proposal to combine them aims to solve. Alas, when you finally reach the end of the article, crying into your screen, you will remain none the wiser. Like George Mallory contemplating Mount Everest, the author appears to want to combine these technologies simply because "they're there".

When we look a little closer, however, the vision becomes even weirder. There seem to be only three possible explanations for the existence of this article. Either:

  1. It was actually written by GPT-3 and ITAL is following the lead of The Guardian and Aaron Tay;
  2. It's an Ern Malley-style prank;
  3. ITAL has given up peer-reviewing articles; or
  4. All of the above

I'm not going to give a blow-by-blow description of everything that is problematic about this article, but it shows such a questionable understanding of both basic library operations and basic software development principles that ITAL really needs to explain how it came to publish it. If we gloss over the particular details, however, you've read this sort of article many times.


Blockchain is designed for recording transactions across distributed, untrusting independent actors, in a way that deliberately prevents deletion of historical data (the database is "immutable"). What libraries need and want is the complete inverse of all these things: there could not be a less appropriate technology for managing library loans. Libraries trust members and members trust libraries: that's the deal. Libraries are built on trust, particularity, and free inquiry. These are human qualities, and need human maintenance. The most striking, if mundane, aspect of Integrated Technologies of Blockchain and Biometrics Based on Wireless Sensor Network for Library Management is the unexamined assumption that "library management" is not only possible without human staff, but desirable.

I say mundane because this is not exactly a new idea. The UK has spent much of the last decade turning its libraries into receptacles for old Tom Clancy novels, "managed" by retired busybodies or nobody at all. Purveyors of RFID hardware for libraries spruik integrated systems for controlling door access to staff-free "libraries". All of these pushers of the new shiny and efficient library seem oblivious to the fact that library management consists mostly not of transactions, but rather of maintenance.


Tech bros confusing their success in a very tiny subset of human endeavour for generalised genius in all areas of life is well documented. The case of How Uber Turned a Promising Bikeshare Company Into Literal Garbage is a particularly relevant example:

It was, to JUMP’s longtime employees, a fundamental misunderstanding of what kind of business they were in. Uber was running JUMP with the mindset that anything that’s broken can be patched, but, as one employee put it, “a firmware update can’t fix a bike chain.”

But the original JUMP team didn't just understand bicycle maintenance: they also understood that successful public infrastructure requires a huge amount of community maintenance: building relationships, trust, and mutual respect. Prior to the sale to Uber, the company spent months working on "Requests for Proposal" from city governments, developing partnerships with the cities to put infrastructure in the right places, where it would be both useful and accepted by local residents. Towards the end of the article, journalist Aaron Gordon provides a summary of why it was inevitable that venture-capital backed "dockless" share bikes would fail, as they have in both the United States and Australia:

Useful mass transportation doesn’t suddenly appear. It is carefully nurtured from a tiny seedling of a good idea to a fully-formed organism that breathes life into a city. It is a process that takes time and effort and patience as well as money.

But time, effort, patience and money are boring. When things — inevitably — go wrong, it's usually the technology, or the last remaining staff, who get the blame. Sometimes, as in Australia's robodebt clusterfuck, it's the victims themselves who are blamed. Mar Hicks describes in a piece for Logic how decades of under-investment in code maintenance and organisational knowledge was blamed on the entire COBOL language, when America's unemployment payment systems melted down at the onset of COVID-19. The pandemic is, itself, an illustration of simple, pro-social behaviours being much more effective than the latest shiny technology. The richest countries with the most expensive and fancy hospitals in the world have experienced a catastrophic breakdown in their health systems and tens or hundreds of thousands of deaths. The places most effectively dealing with the virus are doing so through low-tech techniques proven to work for centuries: masks, hand-washing, restricted movement, and quarantine.

Anthropologist Shannon Mattern explores similar ideas in a wonderful article in Places. Mattern has written extensively on maintenance, care, and — as our Marxist friends would call it — social reproduction:

We should always ask: what, exactly, is being maintained? “Is it the thing itself,” Graham and Thrift ask, “or the negotiated order that surrounds it, or some ‘larger’ entity?” Often the answer is all of the above.

What, exactly, is being maintained?

Mattern has put her finger on the big question of our time. For GLAM workers, asking "what, exactly, is being maintained?" has thrown up some discomforting answers. Assuming that "preserving the cultural record" is an incontestable good is foolish at best. Whose cultural record? On what terms? In what manner? To what end? Maintenance is politics.

How we approach maintenance is a reflection of our personal and institutional values. The Australian government's recent budget has allocated an undisclosed sum to pay an oil company for advice on how to manage a floating oil platform they owned before it was abandoned, whilst returning unemployment payments to a level where every unemployed single person in Sydney will have to fit into the literally six rental properties they can afford. Australia's first Evangelical Prime Minister flirted with compassion for a few months when COVID hit, but I guess he must have eventually remembered that God Wants You to be Rich.

So we're having a reckoning. Libraries, archives, museums, and art galleries are full of people like this. Soldiers who brought "civilisation" at the end of a bayonet or gun barrel. Wildlife artists who ensured their subjects were good and dead before they sketched them. Missionaries who "preserved" local languages and customs in order to best understand how to eradicate them. "Explorers" who were unembarrassed to create new place names like Massacre Bay, Convincing Ground and Murdering Creek. Our streets, universities, and towns are named after them. Our public parks are mass graves with the statues of violent robbers set on plinths above the bodies.

The "History wars"; the shrieks about "Cultural Marxism"; the armed Police protecting statues of long-dead military men — this is all about maintenance. They're ordering us to keep touching up the gold leaf, and instead we're peeling back the wallpaper to reveal what's underneath. Preserving culture as memory institutions requires constant maintenance. Tape turns to soup, paper desiccates, hard drives fail. But what, exactly, is being maintained? — culture is constantly produced and reproduced, and this requires communities to decide what to remember and what to forget. GLAM workers are in a powerful position to determine what is reproduced through maintenance and use.

Hence the push to remove us. Robots and software don't ask difficult questions. They don't ask whether we should change the date, or pay the rent. They're not interested in Makarrata, or reparations. They don't ask whether ancestors should be repatriated, or how the artefacts came to be locked in a box. They just do what they're told, maintaining the status quo in an endless, self-feeding loop.

A world with no workers, no trust, and no privacy, where nothing ever changes. That's the dream of libertarian-capitalists. Let's not give them any more space in our journals or our conversations: they're taking up enough already.

Samvera Connect 2020 On-line is nearly here! / Samvera

Samvera’s annual Connect conference has gone virtual this year, like so many others.  Nevertheless, we’ve put together an exciting program of workshops, presentations, posters and community social events that we hope will make up for not being able to meet in person.  The main events are on Friday 10/23 and Monday – Thursday 10/26 – 10/29 with workshops on 10/22.  The conference runs from 11:00am EDT to approximately 2:30pm EDT each day.

You can find full details of the conference on our wiki pages at this link.  There you’ll also find details of our online shop where you can get themed conference goodies to make up for the lack of our usual free conference t-shirt.

Connect 2020 On-line is free of charge but registration is required. To register:

  1. Create an account on the conference Sched website:
  2. Select the workshops, presentations, and/or community events you will attend and add them to your schedule. Some workshops have limited seats and are filling up fast!
  3. During the week of October 12th, all Attendees registered in Sched will be emailed a Zoom webinar registration link. Follow this link to obtain your unique URL to join the conference.  The same URL will work throughout the conference program on Friday 10/23 and Monday – Thursday 10/26 – 10/29.  You may also receive specific connection information for any workshops or community events you’ve added to your schedule.

Questions? Email Heather Greer Klein, Samvera Community Manager,

The post Samvera Connect 2020 On-line is nearly here! appeared first on Samvera.

Fuzzy File Formats / Ed Summers

We just finished the third module in the Intro to Digital Curation class that I’m teaching this semester. The first module was mostly us getting our bearings, and the second was learning about how confusing the term digital object is, and getting acquainted with interacting with the file system.

The overarching goal of the class is to start with seemingly simple ideas like digital objects and files, and keep zooming out until the last module where we talk about infrastructure. This may prove to be a bad idea, but so far it seems to be working ok. Each module is two weeks where we alternate between reading and discussion (in Canvas) and reading and coding (in Jupyter notebooks).

In Module 3 we discussed and experimented with file formats and standards. We talked about Chapter 3 in Trevor Owens’ Theory and Craft of Digital Preservation where he lays out some examples of the artifactual qualities of digital objects that center on the properties of several types of files, and their use. I asked students to identify the formats at play, and I was surprised and intrigued by the confusion between file formats and other computational objects. For example some thought that a game was a format. I mean, I guess they often are. But a game itself is too abstract a concept to be a file format.

Maybe the game is made up of some Python source code files (as in Owens’ Civilization example). Or perhaps the script to Rent is made up of Word files. Or in the case of the Mystery_House game a specific disk imaging format was used to store a representation of a physical disk. Some students confused the file format and the medium it was stored on (a zip disk, a CD or a floppy disk).

Perhaps this confusion arose from my pairing of the Owens reading with Andrew Russell and Lee Vinsel’s short NYTimes piece The Joy of Standards. I wanted students to think about the relationship of file formats and standards. But standards swarm in devices like storage media and computational devices. I didn’t do a great job of distinguishing between file formats, storage and standards, and showing how they were interrelated. That being said, more than half the students were able to communicate the difference–so it wasn’t a total wash.

In the Jupyter notebook exercise we explored file format identification through three different methods:

Since running fido wasn’t exactly trivial in a Jupyter notebook (it’s really designed to be run from the command line) I created a small pip installable utility called puid which gives you one function, get_puid(), which made it much easier for students to use. Maybe it could be useful in other contexts:

The notebook provides some examples of how to programmatically iterate through files and apply each method for file format identification. Then it asks them to do some file format checking of a unique dataset of files that I created for each student, sampled from the Govdocs1 dataset on Digital Corpora. The interesting part came when they were asked to compare the results, and try to explain why there might be differences.

Several students honed right in on the fact that python-magic and fido represented different ways of classifying formats. The granularities were often different, and the identifiers they used were different as well. Some even highlighted the processes by which these tools are maintained which was very interesting to see.
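To make that kind of disagreement concrete, here is a minimal sketch of two identification methods run over the same files. It is not the class notebook, and it does not use python-magic or fido: the standard library's mimetypes module stands in for name-based identification, and a tiny hand-written signature table (an assumption for illustration only) stands in for the much larger registries those tools rely on. The function names are hypothetical.

```python
import mimetypes
import tempfile
from pathlib import Path

# Hypothetical, deliberately tiny signature table. Real tools such as
# libmagic or PRONOM-based identifiers maintain far larger registries.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_by_extension(path):
    """Identify a format from the file name alone."""
    mime, _encoding = mimetypes.guess_type(str(path))
    return mime


def guess_by_signature(path):
    """Identify a format from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None  # unknown to this small table


def compare(paths):
    """Return (file name, extension guess, signature guess) per path."""
    return [
        (Path(p).name, guess_by_extension(p), guess_by_signature(p))
        for p in paths
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A PDF that has been given a misleading .txt extension.
        mislabeled = Path(tmp) / "report.txt"
        mislabeled.write_bytes(b"%PDF-1.4\n")
        for name, by_ext, by_sig in compare([mislabeled]):
            print(name, by_ext, by_sig)
```

The two functions answer slightly different questions (what does the name claim, versus what do the bytes look like), which is one reason tools can report different formats for the same file.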

All in all I think it was a successful exercise, because students started thinking about how different tools generate different truth values, and that it’s important to think critically about the tools we use, especially in digital curation practices. Tools themselves are part of particular computational practices, and not law etched in stone. Simple things like file format identifiers have fuzzy edges that can be hard to define. But identifying them is super important for rendering them as digital objects.

Next up we’re looking at internal metadata which I’ll try to write about here when we get to it. I would like to bundle up these notebooks in a useful way at the end of the semester if these exercises look useful to others. I’ve really been enjoying using Colab so far. Thanks Nick for the recommendation!

Why would anyone pay $1500 to learn how to write notes? / Mita Williams

Part one

In 2018, musician and writer Claire L. Evans spoke at the XOXO Festival sharing some of the stories that she tells more fully in her book, Broad Band: The Untold Story of the Women Who Made the Internet. It was from this presentation that I first learned about the Microcosm system – a working hypertext system that predated the world wide web.

I learned from Evans that the Microcosm system – like the world wide web – offered links between documents and media – but unlike the World Wide Web – the links between objects were not stored in the documents themselves but in a separate system. Not only did this extra infrastructure ensure that the reader would never be presented a broken link, but the system allowed for multiple sets of different links that could connect files together. This meant that a beginner could be provided a different experience from say, a domain expert.

It was a system that was more aligned to Vannevar Bush’s original vision of MEMEX – an environment in which it is the reader, and not the author, who makes the most associations between documents.

Crucially, Microcosm offered bi-directional linking.

“The system we were working on at Southampton Microcosm [the pre-web hypermedia system developed in the 1980s] had very sophisticated two way linking,” says Dame Wendy Hall, professor of computer science at the University of Southampton. “It was very prescient of the Semantic Web – you used the links to describe why you were making that relationship between those two data objects.”

How Google warped the hyperlink, WIRED UK, Sophie Charara, 26 March 2019
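The idea of keeping links outside the documents can be sketched in a few lines. This is only an illustration of the general technique, not Microcosm's actual design: the class name, method names, and file names below are all hypothetical. Links live in a separate store, are grouped into named sets (say, one for beginners and one for experts), can be followed in either direction, and carry a reason for the relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    reason: str  # why these two objects are related


class LinkBase:
    """Links kept outside the documents, grouped into named link sets."""

    def __init__(self):
        self._sets = {}  # link set name -> list of Link

    def add(self, link_set, source, target, reason=""):
        self._sets.setdefault(link_set, []).append(Link(source, target, reason))

    def outgoing(self, doc, link_set):
        """Links that start at doc in the given link set."""
        return [l for l in self._sets.get(link_set, []) if l.source == doc]

    def incoming(self, doc, link_set):
        """Links that point at doc: the other half of two-way linking."""
        return [l for l in self._sets.get(link_set, []) if l.target == doc]

    def remove_document(self, doc):
        """Drop every link touching doc, so no set keeps a dangling link."""
        for name, links in self._sets.items():
            self._sets[name] = [
                l for l in links if doc not in (l.source, l.target)
            ]


if __name__ == "__main__":
    links = LinkBase()
    links.add("beginner", "intro.txt", "glossary.txt", "defines terms")
    links.add("expert", "intro.txt", "paper.pdf", "primary source")
    print(links.outgoing("intro.txt", "beginner"))
    print(links.incoming("paper.pdf", "expert"))
```

Because the documents themselves never contain the links, the store can answer "what points here?" as easily as "where does this point?", and removing a document can clean up every link that touched it.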

Recently, I’ve become interested in new-to-me note taking software because some of my favourite newsletter writers wouldn’t stop talking about how much their lives had improved now that they had adopted Notion or Roam or Obsidian. Unable to restrain my curiosity any longer, I moved my to do lists and other notes to Notion and I watched a lot of YouTube videos on how to best build my system.

On September 16th, I wrote a blog post called Noting well about these systems and how they fit into a model called The Digital Garden.

On September 17th, Notion introduced bi-directional Linking to their system.

Part Two

Once you have a note-taking system such as Notion, Obsidian, or Roam Research, or another system that uses bi-directional linking, you can build your second brain.

How? You can spend $1500 USD to find out.

You will learn how to capture, organize, and share your ideas and insights using digital notes, with a systematic approach and tools that you trust to support creative breakthroughs in your work

Or you can spend $13.99 USD for the print version of How to take Smart Notes: One Simple Technique to Boost Writing, Learning and Thinking – for Students, Academics and Nonfiction Book Writers.

This is the step-by-step guide on how to set up and understand the principle behind the note-taking system that enabled Luhmann to become one of the most productive and systematic scholars of all time. But most importantly, it enabled him to do it with ease. He famously said: “I never force myself to do anything I don’t feel like.” Luhmann’s system is often misunderstood and rarely well explained (especially in English). This book aims to make this powerful tool accessible to everyone with an interest in reading, thinking and writing. It is especially helpful for students and academics of the social sciences and humanities and nonfiction writers.

I opted to spend the $13.99.

You may opt to watch this video instead:

Part Three

Both the Building a Second Brain and the Smart Notes systems are means to encourage better note taking for learning, and by demanding that the user immediately paraphrases what they’ve just learned, they end up creating an environment where excerpts can easily be found and brought together into a linear text.

From what I can understand, the major difference between the Building a Second Brain (BASB) method of note taking and the Smart Notes method is that while the Smart Notes method encourages the reader to connect captured ideas together as growing lines of thought, the BASB method encourages the reader to file ideas into new or existing Projects.

It is not surprising that newsletter writers, podcasters, YouTubers, and other content creators have gravitated to these note taking systems since they are built for “borrowed creativity”, “intermediate packets”, and “idea recycling”.

The video above is from Ali Abdaal who largely makes videos about productivity. In another video, Ali flexed that he makes more money from his passive income sources of YouTube Adsense and Skillshare than his day job as a junior doctor in the UK.

Is it surprising then to learn that the creator of the BASB method of note-taking situates that work in a larger context of being a Full-Stack Freelancer?

Excerpt from The Rise of the Full-Stack Freelancer

Is it just me or does this sound a little too much like a ponzi scheme or multi-level marketing system in which each influencer sells the promise of productivity systems through sponcon-paying videos on Adsense-paying YouTube channels to gather enough of an audience to drive the viewer to Skillshare?

It almost makes me worried for Academia.

Luckily Ali has a Skillshare course on stoicism for that worry.

(Man, what is it with these stoics?)

Part Four

For the record, I was surprised by how much I was inspired by the promise of the Smart Notes system as described by Sönke Ahrens.

I used my own version of it to develop this very blog post:

I am trying to take smart notes on my readings going forward. I wish I had started earlier. Much earlier.

I was not a great undergraduate student. I felt like I immediately forgot everything I learned in class after I wrote the final exam, even in courses that I had excelled in. What I learned never felt like my own. It felt like I was being asked to memorize textbooks rather than build my own sense of understanding and ask my own questions. What if, I wonder, what if I had otherwise imagined my undergraduate degree as a time to build up a zettelkasten to call my own?

There’s another reason why I am gravitating to the smart notes system.

I have been writing on the web (otherwise known as blogging) for over 20 years. I recognize that many times I feel inspired to share some insight that occurred only because I had stumbled on a connection between 2 or 3 disparate ideas within the span of a week or two. But I’m a middle aged woman now and I’ve forgotten more than I can even remember. I don’t write blog posts that mention an amazing essay I bookmarked seven years ago, because I’ve forgotten that I ever read it.

I’m not doing this for a future career in making Skillshare videos. I’m not even doing it for this blog. I’m doing this for myself because there is a particular quiet joy that comes from reading and writing and learning and sharing.

Nota bene.

Announcing the New Frictionless Framework / Open Knowledge Foundation

By Evgeny Karev & Lilly Winfree

Frictionless Framework

We are excited to announce our new high-level Python framework, frictionless-py. Frictionless-py was created to simplify the overall user experience of working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. Read more details below, or watch the intro video by Frictionless developer Evgeny.

Why did we write new Python code?

Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. However, our main Python libraries (datapackage-py, goodtables-py, tableschema-py, tabulator-py) were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands – mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is frictionless-py.

What happens to the old Python code (datapackage-py, goodtables-py, tableschema-py, tabulator-py)? How does this affect current users?

Datapackage-py (see details), tableschema-py (see details), and tabulator-py (see details) still exist, will not be altered, and will be maintained. If your project is using this code, these changes are not breaking and there is no action you need to take at this point. However, we will be focusing new development on frictionless-py, and encourage you to consider starting to experiment with or work with frictionless-py during the last months of 2020 and migrate to it starting from 2021 (here is our migration guide). The one important thing to note is that goodtables-py has been subsumed by frictionless-py (since version 3 of Goodtables). We will continue to bug-fix goodtables@2.x in this branch and it is also still available on PyPI as it was before. Please note that the API of frictionless@3.x is not stable, as we are continuing to work on it at the moment. We will release frictionless@4.x by the end of 2020 to be the first SemVer/stable version.

What does frictionless-py do?

Frictionless-py has four main functions for working with data: describe, extract, validate, and transform. These are inspired by typical data analysis and data management methods. 

Describe your data: You can infer, edit and save metadata of your data tables. This is a first step for ensuring data quality and usability. Frictionless metadata includes general information about your data like textual description, as well as field types and other tabular data details.

Extract your data: You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3 and data formats like CSV, XLS, JSON, SQL, and others.

Validate your data: You can validate data tables, resources, and datasets. Frictionless generates a unified validation report and supports many options to customize the validation process.

Transform your data: You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface to work with the data.
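
To make the describe and validate steps above more concrete, here is a minimal sketch of the underlying idea: infer a simple schema from one table, then check another table against it. It deliberately uses only the Python standard library rather than frictionless-py's own API (which this post notes is not yet stable in 3.x). The names describe_table, validate_table, infer_type and the sample CSV data are hypothetical and exist only for illustration; they are not part of Frictionless.

```python
import csv
import io

# Hypothetical type names mapped to the Python callables used to test them.
CASTERS = {"integer": int, "number": float, "string": str}


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value in a column."""
    if not values:
        return "string"
    for type_name in ("integer", "number"):
        try:
            for value in values:
                CASTERS[type_name](value)
        except ValueError:
            continue
        return type_name
    return "string"


def describe_table(csv_text):
    """Return a small schema (field names and inferred types) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name] not in (None, "")]
        fields.append({"name": name, "type": infer_type(values)})
    return {"fields": fields}


def validate_table(csv_text, schema):
    """Check every cell against the schema and collect type errors."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            value = row.get(field["name"])
            if value in (None, ""):
                continue  # empty cells are ignored in this sketch
            try:
                CASTERS[field["type"]](value)
            except ValueError:
                errors.append((line_number, field["name"], value))
    return {"valid": not errors, "errors": errors}


# Illustrative sample data (made up for this sketch).
GOOD_CSV = "id,name,price\n1,apple,0.5\n2,pear,1.25\n"
BAD_CSV = "id,name,price\n1,apple,0.5\nx,pear,cheap\n"

schema = describe_table(GOOD_CSV)
report = validate_table(BAD_CSV, schema)
```

In this sketch the schema inferred from the clean table is reused to check a second table, which mirrors the describe-then-validate workflow described above. A real framework would also handle details this sketch ignores, such as missing values, constraints, and other file formats.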

Additional features: 

  • Powerful Python framework
  • Convenient command-line interface
  • Low memory consumption for data of any size
  • Reasonable performance on big data
  • Support for compressed files
  • Custom checks and formats
  • Fully pluggable architecture
  • An included API server
  • More than 1000 tests

How can users get started?

We recommend that you begin by reading the Getting Started Guide and the Introduction Guide. We also have in-depth documentation for Describing Data, Extracting Data, Validating Data, and Transforming Data.

How can you give us feedback?

What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord or by opening an issue in the frictionless-py repo.


Where’s the documentation?

Are you a new user? Start here: Getting Started & Introduction Guide

Are you an existing user? Start here: Migration Guide

The full list of documentation can be found here: 

What’s the difference between datapackage and frictionless?

In general, frictionless is our new generation software while tabulator/tableschema/datapackage/goodtables is our previous generation software. Frictionless has a lot of improvements over them. Please see this issue for the full answer and a code example.

I’ve spotted a bug – where do I report it?

Let us know by opening an issue in the frictionless-py repo. For tabulator/tableschema/datapackage issues, please use the corresponding issue tracker and we will triage it for you. Thanks!

I have a question – where do I get help?

You can ask us questions in our Discord chat, and someone from the main developer team or from the community will help you. We also have a Twitter account (@frictionlessd8a) and community calls where you can come meet the team and ask questions.

I want to help – how do I contribute?

Amazing, thank you! We always welcome community contributions. You can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.

OCLC-LIBER Open Science Discussion on the FAIR Principles / HangingTogether

What is the ideal future vision of an open science ecosystem supporting FAIR data? What are the challenges in getting there? These were the topics of the second installment of the OCLC/LIBER discussion series on open science, which brought together an international group of participants with a shared interest in the FAIR principles. The discussion series, which runs from September 24 through November 5, illuminates key topics related to the LIBER Open Science Roadmap. Both the discussion series and the Roadmap have the mutual goal of informing research libraries as they envision their roles in an open science landscape.

The first discussion in the series addressed the topic of scholarly publishing; a summary of the discussion highlights can be found here. In the second discussion, the focus was FAIR research data. FAIR is a set of broadly articulated principles describing the foundations of “good data management”, aimed at those who produce, publish, and/or steward research data sets, and serving as a set of guideposts for leveraging the full value of research data in support of scholarly inquiry. FAIR research data – that is, data that is findable, accessible, interoperable, and reusable – is seen as an important component of a broader open science ecosystem.

What does an ideal future look like for FAIR research data?

The discussion led off with one participant noting that “open science is just science done the right way”, emphasizing that the FAIR principles, and other aspects of open science, are elements in service of a broader vision of scientific progress unencumbered with barriers to access and communication. In an ideal world, adherence to FAIR would mean that research outputs like data, as well as software and metadata, would be equally available to both humans and machines as part of a cooperative effort among stakeholders in the scholarly process – including vendors. However, as one participant noted, at present this is “just a dream”.

Several participants noted that application of the FAIR principles must take into account the priorities and needs of a diverse set of communities. For example, one participant mentioned the CARE principles (Collective Benefit, Authority to Control, Responsibility, Ethics), which complement FAIR by addressing the interests of Indigenous Peoples. Another participant noted that the idea of “accessible” data must extend to users with physical limitations – all should have access to data in ways that are both “natural and easy”.

As we unpacked the notion of an ideal future vision for FAIR research data, discussion centered around several broad themes:

Library as data steward: At times, the library is overlooked as a campus partner in data management. Yet a great deal of the expertise needed to bring about FAIR data is already embedded in the librarian’s skill set. Moreover, librarians are well-placed to advocate for and raise awareness among researchers about the principles of open science that underpin FAIR. In an ideal world, researchers and other stakeholders will recognize that the library is an integral partner in bringing the FAIR principles to life as part of a broader shift to open science practices.

Standardization and specialization: Data management practices vary across disciplines, with different research cohorts solving data management issues – including the application of the FAIR principles – in different ways. This is often the result of ad hoc, insular approaches to data management, and, ideally, opportunities will be found to consolidate practices, standards, and protocols across disciplines where possible. But data stewards must also understand and support necessary differences in data management practices across disciplinary settings, such as specialized description standards for research data.  

Researchers as partners: The ideal of FAIR data cannot be achieved without the active support of data producers. Researchers must therefore understand the importance of making their data available in ways that adhere to the FAIR principles, as well as have access to the resources necessary to put these principles into action, such as dedicated funding for data management. Ideally, data management training will become a standard component of educating future researchers, with learning communities forming to share curricula and best practices for instruction. All of this would be facilitated by the emergence of FAIR data “champions” among influential researchers, who would serve as models for good data management practices both as colleagues and mentors, as well as potential collaborative partners for the library in spreading awareness about the FAIR principles.

Staffing up and skilling up: In an ideal world, more library staff will be involved in supporting open science initiatives, including FAIR research data. However, the composition of staff skill sets will evolve: in particular, the complexities of increasingly sophisticated data management services will require the skills of individuals who are formally trained as data stewards. In addition, more cooperation across campus stakeholders in the provision of data services will occur, with partnerships between the library and units such as Campus IT and the Research Office. Close liaison with researchers will also help data librarians understand the nuances of data management needs in specific disciplines.

What are the main challenges in achieving this future?

After identifying how participants envisioned the FAIR principles operating in a future open science ecosystem, the conversation moved on to what challenges stood in the way of achieving that future, and how the library community can work together to overcome those challenges. A real-time, online poll among participants yielded their collective view of the top barriers to attaining FAIR research data in an open science environment: 

Lack of rewards/incentives for researchers: As noted above, achieving FAIR data requires the active support of researchers. However, too many researchers lack sufficient incentives to allocate scarce time and resources to data management activities. Participants noted a number of ways that the library might address this problem, including gathering evidence on the incentives gap and raising awareness about it among campus leadership, as well as presenting evidence to researchers on the potential benefits – such as reputation enhancement – of making their data FAIR. Libraries can also join data management communities, such as Dryad or ICPSR, and provide funds and staffing to support these memberships.

Participants noted that smaller libraries with fewer resources could team up and act collectively in supporting researchers. National-scale systems for rewarding researchers for making their data available can also be helpful. Several participants observed that both top-down and bottom-up incentives are needed to create the appropriate reward structures, and that libraries can play a key role by bringing together the right campus stakeholders to create the right incentives. One participant emphasized that top-down support is essential for reward structures to work; however, another participant advised libraries not to be idle while waiting for top-down support to materialize: instead, get started right away by building data management services that can attract top-down support.

One participant suggested that funds could be re-allocated from collection budgets for demonstration projects that show the value of FAIR data. This elicited mixed reactions from others, but nevertheless highlighted an important point: increased activity in new service areas like data management will likely involve resource trade-offs with more traditional library budget lines like collections. This means that library involvement in new initiatives must be carefully considered – as one participant put it, are we sure it is the library’s role to try to develop incentives for FAIR data?    

Culture change: A shift to open science in general, and to FAIR research data in particular, involves changes in attitudes, practices, and priorities for all stakeholders. Effecting this “culture change” will require a collective effort across campus, including but not limited to the library. Libraries need to support other campus units in navigating these changes, but they also need to support each other.

Changing the attitudes, and achieving the buy-in, of influential stakeholders at all levels – from campus leadership to senior researchers – is an essential step toward shifting the culture around data management. For example, one participant noted that the principal investigators (PIs) on a research project set the tone for the entire team. If the PIs treat data management requirements as merely “a box to tick”, junior members will likely do so as well; however, if PIs truly embrace the importance of making research data FAIR, they can instill a similar attitude throughout the rest of the project team. This highlights the importance of reaching out to “influencers” as a first step in bringing about culture change.

Several participants noted the Carpentries training model – in particular, the Data Carpentry version – as a possible means of promoting FAIR data and, equally important, cultivating the skills needed to support it. The Carpentries model has the dual advantage of providing a dynamic learning community for researchers, along with a strong emphasis on “training the trainer” instruction.

Skill-building: As libraries become more deeply embedded in research support services like data management, the skill set needed to support these services will continue to evolve. In response, as one participant observed, “we need to create a community of data librarians and train them!” One approach is to ensure that data management skills are included in library school curricula – several participants felt that training of future librarians is sometimes too oriented toward “classic” library topics. For current librarians, participants suggested that it is important to obtain buy-in from library staff, in the form of a critical mass of staff acknowledging the importance of acquiring data management skills.

One participant noted that while it is important to train data librarians, it is equally important to retain them. Opportunities exist for librarians with data management skills to move on to other domains or industries. What can be done to keep them engaged in the library world?  

Much of the discussion of this challenge focused on how libraries could act collectively to fill the skills gap in data management. One participant pointed out that many academic libraries are small, and do not have the resources to cultivate the full range of skills needed to address the diverse data management requirements encountered across disciplinary settings. A possible solution might be to develop coordinated specializations within groups of libraries, which could then be shared as a collective resource. Examples of pooling expertise in this way include the Data Curation Network in the United States, as well as the network of Dataverse communities.

The discussion continues …  

Our conversation about the FAIR principles, and data management generally, was an instructive example of the power of collective wisdom. The participants brought a multiplicity of national and institutional backgrounds to bear on the questions we discussed, and the result was an illuminating exploration of both the challenges and opportunities for libraries in making FAIR data a robust component of the open science ecosystem. Please join us for more blog posts as we continue to collect community perspective on the seven focus areas of the LIBER Open Science Roadmap.

The post OCLC-LIBER Open Science Discussion on the FAIR Principles appeared first on Hanging Together.

Evergreen 3.6-rc available / Evergreen ILS

The release candidate for Evergreen 3.6 is now available for download.

Barring the discovery of a significant blocking bug, the release candidate is expected to be nearly identical to the general release of 3.6.0 scheduled for 14 October 2020. Users of Evergreen are strongly encouraged to test the release candidate as soon as possible.

OCLC-LIBER Open Science Discussion on Scholarly Publishing / HangingTogether

Recently OCLC Research and LIBER (the Association of European Research Libraries) hosted the first of seven small group discussions comprising the OCLC-LIBER Open Science Discussion Series. This discussion series, which takes place from 24 September through 5 November 2020, is based upon the LIBER Open Science Roadmap, and will help guide research libraries in envisioning the support infrastructure for Open Science (OS) and their roles at local, national, and global levels. I wrote about this collaborative effort between our two organizations in an earlier blog on 28 September.

Our first small group discussion focused on the topic of Scholarly Publishing, included participants from eight countries and two continents, and was facilitated by Rachel Frick, Executive Director of the OCLC Research Library Partnership. The goal of this and all focused discussions in this series is to:

  • Imagine and articulate what the ideal open science ecosystem will look like 
  • Identify barriers toward that future
  • Envision how the library community—working together—could take collective action, in order to address the challenge and effect change

The overarching goal of the discussion series is to inform our organizations as we seek to identify research questions that OCLC and LIBER can collaboratively address to advance Open Science. 

What does the ideal future state look like for scholarly publishing?

Participants shared their goals for an open and accessible scholarly publishing ecosystem, which they believed should have the following characteristics: 

  • The open scholarly publishing system will be flexible enough to support a diverse and dynamic landscape of publishing options–facilitating innovation, sharing, and discovery.
  • This new ecosystem must be sensitive to, and adaptable to, disciplinary differences. Participants remarked on how divergent disciplinary norms can be in publishing (for instance, practices around author name order). Data management and sharing practices also vary widely.
  • Published digital content will not only be open, but it will also be systematically preserved. This in turn will help inspire confidence in researchers. Participants agreed that we need a landscape where scholars can be confident in the preservation of digital, open content. Humanities scholars are sometimes concerned about this, and it slows their participation in open publishing. 
  • Like research data, scholarly publications should be FAIR (findable, accessible, interoperable, and reusable), and discussants emphasized that metadata should be standardized, interoperable, and machine readable—a reality that is far from realization today, particularly for green open access existing in repositories. 
  • Persistent identifiers like ORCIDs and DOIs should be used across all disciplines and for all types of content, including publications and datasets, but also grants, proposals, and more. Relatedly, participants expressed concerns about the uneven adoption of ORCIDs and DOIs for much book scholarship, in great part because of the failure to implement PIDs in book publishing workflows. This makes for less robust information for discovery and access, and also hinders the potential for metadata harvesting of humanities content into CRIS/RIM systems.
  • Open Science terminology should be free from jargon and easy for scholars of all disciplines to understand and value. One participant in our discussion, who joined a university library after a career operating on another part of campus, expressed concern about the amount of jargon that may impede rather than accelerate the understanding, acceptance, and adoption of open science practices.
  • Researchers and scholars should be incentivized to publish open access. Research universities are heavily attuned to research metrics, and our participants suggested that open science metrics–like the number or percentage of open publications produced by an institution–are also vital for charting an open future.

What are the main challenges and obstacles preventing progress toward this ideal state?

Once we had a sense of what the ideal open publishing destination looked like, we used online polling to brainstorm a list of barriers–or roadblocks–toward progress. Participants offered about 20 items, although some are closely overlapping. Using up- and down-voting, we quickly identified three top challenges that we spent more time discussing:

  1. Evaluation and funding mechanisms for institutions and researchers
  2. No agreement around which standards to back–let’s all get behind CrossRef, DataCite, ORCID, etc. and stop trying to develop different solutions
  3. Many researchers are still not aware of Open Science practices

How can the library community, working together, take collective action—in order to address the challenge and effect change?

Show me the money

Several discussants described the need for scholarly publishing and access to be an institutional priority and not just a library priority. Currently support sits primarily in the library budget for subscriptions and licenses, but this support could also logically reside in the research office budget, as they are research support expenditures. One participant said, 

“In terms of funding, it would behoove us to think about our collections budgets in terms of monies that support the actual research process. Things are going to have to change internally as to how scholarly communications is funded.” 

Participants recognized the challenge of moving open science awareness and support out of the library and into other campus units. Collaboration and partnership with other university stakeholders is only going to grow in importance. The research office is seen as a primary stakeholder and partner in this regard, and, as one participant described it, “most researchers are [already] attached to the research office, whereas they may not have relationships with the library.” However, localized dynamics mean this will look different at different locales. One participant described their experience working on two different university campuses: one with a strong office of research and another with a weaker, less centralized research office. Advocacy and change are easier in the former environment. Note that OCLC Research has recently published the research report, Social Interoperability in Research Support: Cross-Campus Partnerships and the University Research Enterprise, of which I am a co-author, on this topic of intra-institutional collaboration and the need for libraries to effectively engage with other campus stakeholders.

One participant described the shift to open scholarly publishing going in one of three ways: “legal, flipping, or revolutionary.” This would happen through 1) transformative agreements, where the content is legally shifted from subscription to OA, 2) flipping from the prioritization of gold open access to the acceptance of green open access, with shorter embargo periods—and emphasizing that green OA is not less effective or important than other methods, and 3) starting more university presses to change the paradigm, with the more radical or revolutionary change of funneling money to smaller publishers instead of large behemoths. 

Show me the metadata

Participants applauded the use of open, persistent identifiers like ORCIDs, DOIs, and RORs (Research Organization Registry), and see them as essential for supporting disambiguation and interoperability. However, they pointed to the uneven adoption of PIDs across the scholarly communications landscape and one participant recommended the adoption of ORCID for “everything and anything,” including grants, research proposals, publishing, repository deposits, and much more. 

Another discussant emphasized how imperative it is for us to integrate these identifiers into all our systems, as this lack of infrastructure inhibits interoperability, and later, discoverability and access. DSpace lacks a native space for making ORCIDs usable and discoverable, for instance, and publishing workflows, particularly for humanities monographs, usually don’t capture ORCIDs, and might not even be friendly with DOIs. (And forget about being ready to ROR!) While many STEM publishers have been requiring ORCIDs in their publishing workflows for a few years now, large disciplinary gaps remain, particularly in humanities publishing. These gaps flow downstream, as library discovery systems may fail to pick up the 856 field that indicates if content is open access, meaning that open content exists but can’t be discovered or accessed by the user.

The group also discussed how these failures exist because of a lack of social interoperability (my term) between different silos in the research life cycle. Some discussants recommended greater involvement from the research office, and noted that librarians need to better understand the publishing workflow in order to advocate for change and to integrate better into library services. There are also opportunities for improved collaboration between scholarly communications librarians and technical services librarians. And of course, ILS service providers must also recognize the importance of collecting and making actionable information about open access availability.

Show me why I should care

In addition to metadata, conversations also kept coming back to issues of communication. One discussant described their weariness in this regard:

“Sometimes I’m really tired of the communications, because it’s just broadcasting. But this is not effective enough. A kind of tailor-made communication and interaction and somehow being more visible and a part of the researcher process is essential.”

This rang true with others, who recognized that broadcasting doesn’t work because of significant disciplinary differences: while terabytes of data matter to one community, they are a non-issue for another, and the message will fall upon deaf ears. Instead, discussants agreed that effective open science communications “will mean myriad things,” and must be customized and discipline-based. One non-library discussant also warned against the use of jargon in our communications, as library-speak may not jibe with terminology well known by researchers, especially across all disciplines. Efforts like the recent OAPEN OA Books Toolkit offer resources to help raise awareness about open access books. And posters like this one from the ARMA conference offer a way to educate researchers about why they should care about PIDs.

I found this a rich, positive conversation, and one that OCLC and LIBER will be collectively reflecting upon in the coming weeks, as we also accumulate input from the community on the other six focus areas in the LIBER Open Science Roadmap. Stay tuned for more blog posts in this series. 

The post OCLC-LIBER Open Science Discussion on Scholarly Publishing appeared first on Hanging Together.

On Bookstores, Libraries & Archives in the Digital Age / Open Library

The following was a guest post by Brewster Kahle on Against The Grain (ATG) – Linking Publishers, Vendors, & Librarians

See the original article here on ATG’s website

By: Brewster Kahle, Founder & Digital Librarian, Internet Archive

Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.” This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet” (a singular collection, say of web pages) to being “archives on the Internet,” plural. My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.

The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place. 

Bookstores: The Thrill of the Hunt

Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20s living in Boston, before Amazon.com and before the World Wide Web (but during the early Internet), new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners: they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt: I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for a while in the bookstore just reading, I always knew the game.

Libraries: Offering Conversations not Answers

The libraries that I used in Boston (MIT Libraries, Harvard Libraries, the Boston Public Library) were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans, not so much in practice, because they were slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination. (If you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored. 

Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better. 

Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).

But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others. 

But why was this library of law books not available to everyone? It stung me. It did not seem right. 

A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”  

Archives: A Wonderful Place for Singular Obsessions

When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.

So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.

The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future. 

Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.

Digital Libraries: A Memex Dream, a Global Brain

So when I helped start the Internet Archive, we wanted to build a digital library: a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vannevar Bush (do read “As We May Think”), of Ted Nelson’s Xanadu, of Tim Berners-Lee’s World Wide Web, of Danny Hillis’ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain.

Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.  I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be  a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”

About the Author: Brewster Kahle, Founder & Digital Librarian, Internet Archive

A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lending millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.

Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon in 1999. Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta.

Library IT Service Portfolio / Library Tech Talk (U of Michigan)

TRACC: A tool developed by Michigan to help with portfolio management

Academic library service portfolios are mostly a mix of big to small strategic initiatives and tactical projects. Systems developed in the past can become a durable bedrock of workflows and services around the library, remaining relevant and needed for five, ten, and sometimes as long as twenty years. There is, of course, never enough time and resources to do everything. The challenge faced by Library IT divisions is to balance the tension of sustaining these legacy systems while continuing to innovate and develop new services. The University of Michigan’s Library IT portfolio has legacy systems in need of ongoing maintenance and support, in addition to new projects and services that add to and expand the portfolio. We, at Michigan, worked on a process to balance the portfolio of services and projects for our Library IT division. We started working on the idea of developing a custom tool for our needs since all the other available tools are oriented towards corporate organizations and we needed a light-weight tool to support our process. We went through a complete planning process first on whiteboards and paper, then developed an open source tool called TRACC for helping us with portfolio management.

A Note On Blockchains / David Rosenthal

Blockchains have three components, a data structure, a set of replicas, and a consensus mechanism:
  • The data structure is often said to provide immutability or to be tamper-proof, but this is wrong. It is made out of bits, and bits can be changed or destroyed. What it actually provides is tamper-evidence, revealing that the data structure has changed.
  • If an unauthorized change to the data structure is detected the damage must be repaired. So there must be multiple replicas of the data structure to allow an undamaged replica to be copied to the damaged replica.
  • The role of the consensus mechanism is to authorize changes to the data structure, and prevent unauthorized changes. A change is authorized if the consensus of the replicas agrees to it.
Below the fold, some details.

Data Structure

The data structure used for blockchains is a form of Merkle or hash tree, published by Ralph Merkle in 1980. In the blockchain application it is a linear chain to which fixed-size blocks are added at regular intervals. Each block contains the hash of its predecessor; a chain of blocks. Hash algorithms have a limited lifetime, but while the hash algorithm remains unbroken it is extremely difficult to change blocks in the chain but maintain the same hash values. A change that does not maintain the same hash values is easy to detect.
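As a minimal sketch of the tamper-evidence described above, the following Python builds a short chain in which each block stores the hash of its predecessor, then alters one block and locates the break. The block layout, the function names, and the all-zero genesis placeholder are assumptions made for illustration; they do not correspond to any particular blockchain's format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical placeholder predecessor for the first block


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including its predecessor's hash."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, payload: str) -> None:
    """Add a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else GENESIS_HASH
    chain.append({"prev_hash": prev, "payload": payload})


def first_inconsistency(chain: list):
    """Return the index of the first block whose stored predecessor hash
    no longer matches the recomputed one, or None if all links agree."""
    expected = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["prev_hash"] != expected:
            return index
        expected = block_hash(block)
    return None


ledger = []
for payload in ["first", "second", "third"]:
    append_block(ledger, payload)

print(first_inconsistency(ledger))   # no mismatch in the untouched chain
ledger[1]["payload"] = "altered"     # change a block in the middle
print(first_inconsistency(ledger))   # the block after the altered one no longer links
```

Note that nothing here prevents the change; the bits were altered without difficulty. The chain only reveals that a change happened, and only for blocks that have a successor. Detecting a change to the most recent block requires knowing its hash from somewhere else, which is one reason the replicas matter.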


Replicas

The set of replicas can be either closed, composed of only replicas approved by some authority, or open, in which case no approval is required for participation. In blockchain jargon, closed replica sets correspond to permissioned blockchains, and open replica sets to permissionless blockchains.

Consensus Mechanism

Faults  Replicas
1       4
2       7
3       10
4       13
5       16
6       19
An important result in theoretical computer science was published in The Byzantine Generals Problem by Lamport et al in 1982. They showed that the minimum size of a replica set to survive f simultaneous failures was 3f+1. Thus Byzantine Fault Tolerance (BFT) is the most efficient possible consensus mechanism in terms of number of replicas. BFT requires a closed replica set, and synchronized operation of the replicas, so can be used only in permissioned blockchains.
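The 3f+1 bound can be checked against the table above with a few lines of Python. This is only arithmetic on the bound stated in the text; the function names are made up for illustration.

```python
def min_replicas(faults: int) -> int:
    """Smallest replica set that survives `faults` simultaneous
    Byzantine failures, using the 3f+1 bound."""
    if faults < 0:
        raise ValueError("faults must be non-negative")
    return 3 * faults + 1


def max_faults(replicas: int) -> int:
    """Largest number of simultaneous failures a replica set of the
    given size can survive, i.e. the largest f with 3f+1 <= replicas."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return (replicas - 1) // 3


for f in range(1, 7):
    print(f, min_replicas(f))
```

Read the other way round, a replica set of size 20 satisfies 3f+1 <= 20 for f up to 6, so it could survive six simultaneous failures.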

If joining the replica set of a permissionless blockchain is free, it will be vulnerable to Sybil attacks, in which an attacker creates many apparently independent replicas which are actually under his sole control. If creating and maintaining a replica is free, anyone can authorize any change they choose simply by creating enough Sybil replicas.

Defending against Sybil attacks requires that membership in a replica set be expensive. The cost of an attack is at least the membership cost of half the replica set, so that the attacker controls a majority of the replicas. Permissionless blockchains have implemented a number of ways to make it expensive to take part, including:
  • Proof of Work (PoW), a concept originated by Cynthia Dwork and Moni Naor in 1992, in which the expensive resource is CPU cycles. This is the "mining" technique used by Bitcoin, and is the only technique that has been demonstrated to work well at scale. But at scale the cost and environmental damage is unsustainable; the top 5 cryptocurrencies are estimated to use as much energy as The Netherlands. At smaller scales it doesn't work well because renting 51% of the mining power is cheap enough to motivate attacks. 51% attacks have become endemic among the smaller alt-coins. For example, there were three successful attacks on Ethereum Classic in a single month.
  • Proof of Stake (PoS) in which the expensive resource is capital tied up, or staked. Participants stand to lose their stake in case of detected misbehavior. The Ethereum blockchain has been trying to implement PoS for 5 years, so far without success. The technique has similar economic limits and vulnerabilities to PoW.
  • Proofs of Time & Space (PoTS), advocated by Bram Cohen, in which the expensive resource is disk storage.
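To make "the expensive resource is CPU cycles" concrete, here is a toy Proof of Work in Python in the hashcash style: search for a nonce whose hash has a required number of leading zero bits. It is a simplified sketch under stated assumptions, not Bitcoin's actual scheme (which, among other differences, hashes a structured block header and compares against a target value). The function names and the 8-byte nonce encoding are assumptions for illustration.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def pow_digest(data: bytes, nonce: int) -> bytes:
    """Hash the data together with a candidate nonce (8-byte encoding assumed)."""
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()


def mine(data: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until the hash has enough leading zero bits.
    Expected work doubles with each extra bit of difficulty."""
    nonce = 0
    while leading_zero_bits(pow_digest(data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs a single hash."""
    return leading_zero_bits(pow_digest(data, nonce)) >= difficulty_bits


nonce = mine(b"example block", 12)  # roughly 2**12 hashes on average
print(nonce, verify(b"example block", nonce, 12))
```

The asymmetry is the point: finding the nonce takes many hashes while checking it takes one, so taking part is costly but auditing is cheap. The cost is also the weakness described above, since whoever can pay for the majority of the hashing can authorize changes.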


Eric Budish points out the fundamental problem with expensive defenses in The Economic Limits of Bitcoin and the Blockchain:
From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power, ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) ... analogously to how a lock on a door increases the security of a house by more than the cost of the lock.
The difference between permissioned and permissionless blockchains is the presence or absence of a trusted authority controlling the replica set. A decision not to trust such an authority imposes enormous additional costs and performance penalties on the system because the permissionless consensus mechanism has to be expensive.

Decentralization in Bitcoin and Ethereum Networks by Adem Efe Gencer et al compares the cost of a permissioned system using BFT to the actual Bitcoin PoW blockchain:
a Byzantine quorum system of size 20 could achieve better decentralization than proof-of-work mining at a much lower resource cost.
As an Englishman I appreciate understatement. By "much lower", they mean around 5 orders of magnitude lower.

Endangering Data Interview with Terra Graziani / Digital Library Federation

Terra Graziani is a researcher and tenants’ rights organizer based in Los Angeles, CA. She founded and co-directs the Los Angeles chapter of the Anti-Eviction Mapping Project (AEMP), a digital storytelling collective documenting dispossession and resistance in solidarity with gentrifying communities through research, oral history, and data work. She is also a researcher with the UCLA Institute on Inequality and Democracy and The Center for Critical Internet Inquiry at UCLA. Before this, she organized with AEMP in the San Francisco Bay Area and worked for several tenants’ rights organizations including The Los Angeles Center for Community Law and Action, The Eviction Defense Collaborative and Tenants Together. She is currently Research Program Officer at Educopia where she works to cultivate community in the information field. Terra earned her Master’s in Urban and Regional Planning at UCLA and her Bachelor’s degree in Social and Cultural Geography at UC Berkeley.

Tell us a bit about your projects and how you became interested in issues of data privacy, collection, and surveillance.

One of my first jobs in the tenant movement was Community Outreach Coordinator at The Eviction Defense Collaborative (EDC) in San Francisco, a legal clinic where anyone who has received an eviction notice goes first to get immediate help responding to their eviction. I was responsible for writing EDC’s annual eviction report, which analyzed the data EDC collected through its clinic to provide a picture of displacement in San Francisco. To make this report, we partnered with The Anti-Eviction Mapping Project (AEMP), and the rest is history! I joined AEMP shortly thereafter and began to learn — both through the research and visualization I was doing with AEMP, as well as through the experiences I was having working with tenants facing eviction at EDC — all about eviction data! Since then, I’ve worked in various roles in the tenant movement organizing against displacement and building power through knowledge production around issues of tenants’ rights, our speculative and racist housing system (market), police and property, policing technologies, and so much more. I spend a lot of time thinking about how we can mobilize data to build power from below, rather than to surveil and punish.


Your work in AEMP and in tenant organizing more broadly has touched on areas where mass data collection can endanger tenants, as well as areas where civic data and grassroots data collection can help to counter anti-tenant narratives and build community power. 

How does the concept of “endangering data” arise in your work?

Early on, I learned that the most commonly used data set on evictions comes from the courts, and that each county’s courts have a different data management system that has a huge impact on how accessible that data is. I also learned that, in California, there is a masking law that prevents court eviction records from being made public. As the article states, before this masking law was passed, “under longstanding California law, records in eviction lawsuits [were] kept sealed for 60 days after they [were] filed. On the 61st day, the court clerk look[ed] to see if the tenant prevailed. If not, the record [became] public, even if there [had] been no ruling in the case, and even if the landlord [had] abandoned the lawsuit.” After these 60 days, the names of those who had lost their case and been evicted would be published; third-party companies took and republished this data, and landlords subscribed in order to blacklist tenants. A tenant’s record and credit score could be affected for up to 7 years, significantly impacting their access to housing, which is already so hard to secure. With the new state law, which took effect in January 2017, the landlord has to win the case in those 60 days for the record to be made public. The law is still not perfect: tenants still avoid taking their cases to court, where they could get a better outcome if they have access to an attorney (also a huge problem), for fear of losing and getting an eviction put on their record. But the new law protects many more tenants from being blacklisted.

This masking law is a double-edged sword, though, because it also means those within the tenant movement who want access to court eviction data, including AEMP, have a very hard time getting it. There is an exemption to the law that grants access to “a person who has an order from a court, issued upon a showing of good cause.” This is the exemption AEMP used to get a court order to access the data in Alameda County. In San Francisco, the court has a practice of sharing address-level eviction data with the rent board, from which we then request the data through a public records request. The format of the data we receive from these public bodies is all over the place, and it often takes months of work to make it usable. In sum, the data management and sharing practices of public institutions are highly variable and have a huge impact on the work we can do. 

One of my many jobs at EDC was to help re-vamp their intake form, meaning I was helping shape what data we collected on tenants facing eviction in San Francisco. I still think about this process often, because working with clinics and tenant organizations to mobilize their data is one way AEMP has been able to visualize a more holistic data set of evictions. If we only look at eviction data from courts, we’re only representing those tenants who made it to court. So many tenants are forced out informally, through intimidation, harassment, informal buy-out offers etc., none of which are captured by court data. Some of this is recorded in clinic records where tenants have gone for help. Another way this is helpful is that many clinics collect demographic data. EDC’s data taught us, for example, that in 2015, compared to the city’s population, Black residents were overrepresented by 300% in their eviction records. Race and other sociopolitical aspects of the displacement crisis are not at all captured in court records. 

We are still very far away from having holistic, community-controlled, non-punitive data on evictions, but this is the work AEMP is always engaged in. We collect, mobilize, and preserve data for justice, by and for those who face or have faced displacement. And we’re always collectively evolving our practices on how best to do this. 


How have you used data to fight for greater justice?

Wahoo, you’ve given me some space to talk about making property ownership data public! 

It is extremely difficult to track the real, or “beneficial owner” of rental property in our communities, because property owners are able to hide their identities by purchasing and owning property behind LLCs. Currently, owners are not required to disclose their real identity when they register an LLC or purchase a property. This anonymity in the public record is not an oversight – it is by design, and property owners benefit enormously from it. When a tenant is facing eviction, and they only know their landlord to be an LLC, they have to do a lot of work to figure out who to put pressure on to drop their eviction. In order to unveil and hold accountable the increasingly corporate actors who control so much property in our neighborhoods, municipalities around the world should demand the disclosure of beneficial ownership in both companies and property ownership. 

The Anti-Eviction Mapping Project has worked for years to unveil property ownership webs in the San Francisco Bay Area, Los Angeles, and New York so that tenants can use this information to fight displacement. We’re building a tool (which is almost ready!) called Evictorbook, which allows you to search a property, see who the real owner is, and see the property’s eviction history as well as whether or not it’s covered by rent control, and a few more details. So much data is collected on tenants, particularly by the real estate industry. Evictorbook aims to keep track of landlords and gives tenants a tool to fight back with. 


You and your colleagues on the AEMP team have written in multiple places[1][2][3] about the need for grassroots, community based work. What steps can we, as academics and information professionals, take for more equitable and democratic data practices?

Several members of AEMP, myself included, straddle academia and organizing work. Mary Shi, another AEMP member, and I recently published an article in ACME: An International Journal for Critical Geographies entitled “Data for Justice: Tensions and Lessons from the Anti-Eviction Mapping Project’s Work Between Academia and Activism.” In it, we talk about what we’ve learned from straddling these two spaces, focusing on mutual aid, accountability, and embeddedness as guiding principles to producing knowledge outside of academia. I’ll share our conclusion here, which I think sums up the article nicely:

“It is no accident that AEMP, as a project fighting displacement, finds itself straddling the space between academia and activism with its epistemologically critical perspective. Like traditional, objectivist knowledge, displacement is a strategy of violence through erasure. Resistance, therefore, requires strategies that fight this erasure at each point. Countermapping, story-telling, and deep collaborations with community organizers are all strategies AEMP has developed to fight such erasures at multiple levels. And as the guiding principles of mutual aid, accountability, and embeddedness illustrate, it is not only the critical nature of AEMP’s tools but also AEMP’s constant assessment of its work’s impacts as measured from the perspective of the communities, organizers, and activists it is meant to serve that allow AEMP to pursue its mission of producing data for justice.

“Projects like AEMP are being offered new opportunities, often by sympathetic insiders, to take advantage of the centuries of resources accumulated by universities, research institutes, and other such organizations in pursuit of their critical objectives. AEMP recognizes these as redistributive opportunities that should be taken with eyes wide open. In this spirit, AEMP continues straddling the space between academia and activism despite the challenges this position entails. The path AEMP has discovered in navigating this terrain is not disavowal and exit but rather constant critique and strategic engagement. We offer our reflections not as an end-all-be-all guide for scholars seeking to do critical, community engaged work, but instead as a sharing of the surest signposts we have discovered along the way. As more scholars reevaluate the way they study changing urban landscapes in particular and the relationship between academia and activism more generally, we hope this piece can contribute to the forging of a more just and reparative relationship between academia and the publics it serves.”

I’ll also share AEMP’s Data Use Agreement, which we’ve formulated over the years and welcome questions, concerns, and feedback on. 

  1. Graziani, Terra, and Mary Shi (2020) “Data for Justice”
  2. Aiello, Daniela, Lisa Bates, Terra Graziani, Christopher Herring, Manissa Maharawal, Erin McElroy, Pamela Phan, and Gretchen Purser (2018) “Eviction Lab Misses the Mark”
  3. Ferrer, Alex, Graziani, Terra et al., The Vacancy Report: How Los Angeles Leaves Homes Empty and People Unhoused

The post Endangering Data Interview with Terra Graziani appeared first on DLF.

Glassdoor’s D&I ratings: What does 4.6 out of 5 even mean? / Tara Robertson

stars by Darko Pevec, licensed under Creative Commons

Today I learned that Glassdoor recently added diversity and inclusion metrics to their company rankings. My first reaction was excitement–this could drive accountability and increase transparency on diversity, equity and inclusion (DEI). We know that many many people care about DEI in an employer’s brand, so this seems like useful functionality for candidates researching potential companies.

Glassdoor launched these user submitted D&I reviews with 12 companies. Salesforce scored the highest, with 4.6/5. That’s great! But what does it mean?

screenshot of Greenhouse's interface: Diversity and Inclusion at Salesforce 4.6 out of 5 (52 reviews)

When people are scoring their company on D&I, what factors are they considering in that category? I also wondered, “If 9/10 white people think their company is anti-racist, what does that mean? Or if 4/5 men think a company isn’t sexist, what does that mean?” In my DEI work I’ve used employee engagement data to diagnose the places where different demographic groups are having very different experiences and then ask “what’s going on here?”

In a blog post about this feature, Christian Sutherland-Wong, the Glassdoor CEO, said, “Job seekers and employees today really care about equity, and for too long they’ve lacked access to the information needed to make informed decisions about the companies that are, or are not, truly inclusive.” Equity is something I care about and the D&I score on Glassdoor doesn’t help me evaluate that. By only having an overall score it reflects the sentiment and scores of the largest demographic groups. I set up a Glassdoor account and saw that I wasn’t asked for demographic information, so any type of weighting based on demographic wouldn’t be possible. (I’m not sure what a methodologically sound way of weighting scores would be, but I’m sure smarter, mathier people would have informed opinions on this.)

Edit: The Glassdoor post about D&I in their product says that it is now possible for “U.S.-based employees and job seekers to voluntarily and anonymously share their demographic information to help others determine whether a company is actually delivering on its diversity and inclusion commitments. Glassdoor users can provide information regarding their race and ethnicity, gender identity, sexual orientation, disability status, parental status, and more, all of which can be shared anonymously through their Glassdoor user profile. With these demographic contributions, Glassdoor will soon display company ratings, workplace factor ratings, salary reports, and more aggregate, broken out by specific groups at specific companies. This information will equip employers with further data and insights to create and sustain more equitable workplaces.”

Most large companies now release diversity disclosure reports (Google, Apple, Microsoft, and the one I authored at Mozilla). It would be very useful if Glassdoor used this data to share high level diversity numbers in each company profile.

screenshot of Glassdoor's interface: Google's company overview

Overall, I’m glad to see Glassdoor add more information about D&I to their platform, but I question whether user-submitted scores out of 5 are terribly useful. I’d love to see Glassdoor surface diversity metrics in their company profiles. I’m curious to see how more transparency around D&I helps candidates and companies make better decisions.

Release 3.6 Beta2 Available / Evergreen ILS

Release 3.6 Beta2

The first Beta release of Evergreen 3.6 has been undergoing testing for nearly two weeks. Efforts of Evergreen community members and a successful Bug Squashing Week have resulted in 253 patches committed and released for 3.6 Beta1, and another 40 patches for 3.6 Beta2. That second beta release is now available on the downloads page.

Feature Highlights

Better Tools to Manage Hopeless Holds

A new interface identifies holds with no remaining items that can fill them, and provides actions staff can take to deal with them. This development was contributed by Equinox, sponsored by MassLNC and NOBLE.

Test Patron Notification Methods Functionality

The patron edit screen now offers the option of testing a patron’s email address and SMS notification preference by clicking a button next to the email and SMS data fields. This development was contributed by Catalyte, sponsored by MassLNC.

Enhancements to Print/Email from the OPAC

Patrons will have more options when printing bibliographic information from the catalog. Choices for Full and Brief format have been added as well as the option to include holdings. This development was contributed by Equinox, sponsored by MassLNC.

Deserving of a second mention is Chris Burton’s guide to the Bootstrap OPAC.

Other improvements for Beta2 include multiple Angular Catalog refinements, accessibility improvements and many bug fixes which will also be included in maintenance releases of 3.4 and 3.5.

What’s next

Bug Squashing week may be over, but continued testing of the beta release is strongly encouraged to further refine the 3.6 features. The 3.6 Release Candidate is planned for October 7th.

The 3.6 Release Team:

Galen Charlton
Jason Boyer
Terran McCanna
Michele Morgan