Planet Code4Lib

NDSA Welcomes Eight New Members / Digital Library Federation

On 17 September 2021, the NDSA Leadership voted unanimously to welcome its eight most recent applicants into the membership. Each new member brings a host of skills and experience to our group. Keep an eye out for them on your calls and be sure to give them a shout out. Please join me in welcoming our new members.

Botanical Research Institute of Texas

The Botanical Research Institute of Texas (BRIT) was founded with an institutional commitment to the preservation and dissemination of botanical knowledge. BRIT’s core collection is the Herbarium, which holds almost 1.5 million preserved plant specimens. Approximately 70% of this collection has been digitally imaged, a share that continues to grow through ongoing digitization efforts. These specimen images and data are disseminated through data portals such as TORCH and SERNEC and preserved through partnerships with the Texas Advanced Computing Center at the University of Texas and CyVerse.

Oklahoma State University Library

The Oklahoma State University Library has been dedicated to digital preservation for over two decades, beginning with efforts to preserve and make accessible online Charles Kappler’s Indian Affairs: Laws and Treaties. Digital preservation efforts were first enacted by the Library’s Electronic Publishing Center (EPC, 2000-2008), but larger initiatives were soon in place throughout the Library. Another major contributor in the digital preservation area has been the Oklahoma Oral History Research Program, which was founded in 2007 and has been fully digital since its inception.

NYC Department of Records and Information Services (DORIS)

The New York City Department of Records and Information Services (DORIS) manages records in formats dating from 1640 to 2020 and 500 TB of both digitized and born-digital material. A team of eight permanent staff is dedicated to digitizing collections at a growth rate of 150 TB per year. DORIS is preparing to ingest roughly 200 TB of born-digital government material with the mayoral changeover in January 2022. Since 2017, DORIS has used BitCurator for reviewing and ingesting born-digital content from city agencies.

Amistad Research Center

The Amistad Research Center (ARC) has been digitizing archives and manuscripts, photographs, and texts for online access and digital exhibitions for a number of years. In partnership with Adam Matthew Digital and other library vendors, one record collection has been digitized, with interest in digitizing other collections housed there. Additionally, over the last decade ARC has established a robust audiovisual reformatting program with the ability to digitize most audio and some video formats in house, while outsourcing film collections and U-matic videotapes as funding is acquired. ARC is now exploring long-term cloud storage solutions for its high-quality access and digital master files, as well as the funding required to maintain such storage.

University of Dubuque Charles C. Myers Library 

The Charles C. Myers Library manages its preservation environment using a mix of open-source, on-site, off-site, and cloud technologies. The library curates digital exhibitions featuring minority populations in the aviation field and groups underrepresented on campus over the university’s last 100 years.

University of Arkansas at Fayetteville Libraries

At the University of Arkansas at Fayetteville Libraries, two units participate in digital preservation at various levels. The Digital Services Department applies different stages of digital preservation depending on the collection scope and grant agreements for digitization. Special Collections, which also houses the University Archives, performs a detailed digital preservation process.

The African American Research Library and Cultural Center 

The Broward County African American Research Library and Cultural Center (AARLCC) is a public research library. Its digitization and digital preservation efforts aim to create access to and awareness of content in its collection, which focuses on Black history and life. AARLCC is also using 3D scanning for artifacts. As a Black collecting library and archive, AARLCC is further interested in web archiving for Black collecting institutions. AARLCC has co-created the Archiving the Black Web initiative to support the efforts of similar Black collecting organizations and to begin to document and preserve web content related to Black history and life.

Congregation of the Sisters of St. Joseph in Canada 

The Congregation of the Sisters of St. Joseph is a consolidated archive committed to following best practices to ensure the long-term preservation of both born-digital records received through transfer or donation and analogue records digitized for preservation or outreach purposes. The Congregation has built an in-house digital preservation system using free and open-source software, follows the OAIS model, and strives to achieve the highest NDSA Levels of Preservation over time.

~Nathan Tallman, NDSA Vice Chair

The post NDSA Welcomes Eight New Members appeared first on DLF.

Government Labels / Ed Summers

In the lead-up to the 2020 US Presidential Election, Twitter implemented new labels for government officials, organizations, and state-affiliated media accounts. This was a follow-on from their ban on state-backed political advertising in 2019.

By their own description, Twitter applies these labels to:

  • Accounts of key government officials, including foreign ministers, institutional entities, ambassadors, official spokespeople, and key diplomatic leaders. At this time, our focus is on senior officials and entities who are the official voice of the state abroad.
  • Accounts belonging to state-affiliated media entities, their editors-in-chief, and/or their senior staff.

Here is an example of a government official (look for the label “United Kingdom government official” just underneath their Twitter handle).

More importantly the label takes up significant screen real estate in each of that user’s tweets:

And here is a government organization:

Known government run media organizations have labels too:

But it’s important to note that not all government run accounts will have the labels. Here is the verified account for the Prime Minister of Pakistan.

How the Prime Minister of Pakistan’s office gets verified and not added to a list of known government accounts is hard to imagine. It probably says something about how difficult it is to uniformly apply these labels, and also keep them up to date. In case you were wondering, no the current Prime Minister of Pakistan doesn’t have a label either:

Over in the Documenting the Now Slack we had some questions recently about where to find these government labels in the data our tools collect. We took a look and it doesn’t appear that these labels are made available through either the v1.1 or v2 API endpoints. If this is wrong please get in touch!

Not making this information available is unfortunate because these labels are highly significant pieces of metadata for researchers who are studying the influence of government in social media conversations. They also provide a window in on how Twitter themselves see the governments of the world through their categorization rules and processes.

Since this information is only available through the web interface I recently spent some time adding functionality to the snscrape utility to extract the labels, as well as the label URLs. The label URLs are useful because, while they aren’t as specific as the label descriptions, they can sometimes be used to group together different language variants of the same label.

This morning I tested it out by collecting 60,000 tweets that mention the word “covid19”.

snscrape --jsonl twitter-search covid19 > results.jsonl

I then wrote a simple program to read the JSON and count the labels that were present:

Label (number of tweets):
China state-affiliated media 173
Iran state-affiliated media 21
Çin devletine bağlı medya 9
Медиј који сарађује са владом Србија 9
China government official 8
中国官方媒体 7
Thailand government organization 7
Russia state-affiliated media 4
Média affilié à un État, Russie 4
Média affilié à un État, Chine 4
Representante gubernamental de Cuba 3
Organisation du gouvernement - France 2
Organización gubernamental de España 2
Cuba - Funzionario di Stato 2
Medios afiliados al gobierno, China 2
Russia government account 2
Lembaga pemerintah Indonesia 2
Organización gubernamental de Cuba 2
Canada government official 1
Cina - Organizzazione governativa 1
Medios afiliados al gobierno, Honduras 1
Italia - Funzionario di Stato 1
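That counting step can be sketched as follows. This is a minimal reconstruction, not the original script, and it assumes the patched snscrape writes a label object with a description field under each tweet's user record (the exact field names may differ in the real output):

```python
import json
from collections import Counter

def count_labels(jsonl_lines):
    """Tally government/state-media label descriptions in snscrape JSONL output.

    Assumes each line is a tweet whose "user" record may carry a "label"
    object with a "description" field -- an assumption about the patched
    snscrape's output, not a confirmed schema.
    """
    counts = Counter()
    for line in jsonl_lines:
        tweet = json.loads(line)
        label = (tweet.get("user") or {}).get("label")
        if label and label.get("description"):
            counts[label["description"]] += 1
    return counts

# Usage:
#   with open("results.jsonl") as f:
#       for description, n in count_labels(f).most_common():
#           print(description, n)
```

Running something along these lines over the 60,000 collected tweets produced the counts below.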

However, for that same dataset of tweets only three distinct label URLs were used, covering 219, 47, and 22 tweets respectively.

So while the URLs don’t provide anywhere near the level of granularity that the descriptions do, they could be useful for grouping together language variants of the same label like:

  • China state-affiliated media
  • Média affilié à un État, Chine
  • Medios afiliados al gobierno, China
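That grouping could be sketched with a small helper like the one below (a hypothetical function; the (description, url, tweet_count) tuples are an illustrative shape derived from each tweet's label data, not the exact snscrape output):

```python
from collections import defaultdict

def group_variants(label_records):
    """Group label descriptions and tweet counts under their shared label URL.

    label_records: iterable of (description, url, tweet_count) tuples,
    e.g. derived from each tweet's user label (field shape is assumed).
    """
    variants = defaultdict(set)   # url -> set of language-variant descriptions
    totals = defaultdict(int)     # url -> total tweets carrying that URL
    for description, url, n in label_records:
        variants[url].add(description)
        totals[url] += n
    return variants, totals
```

Grouping on the URL rather than the description string collapses the per-language variants into one bucket per underlying label.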

Right now my modification to the snscrape tool is in a pull request. If you think it might be useful in your own work please go and give it a thumbs up. Normally I’m not a huge advocate of scraping social media. But when it comes to data that is not available through sanctioned channels (APIs), and the data is being created (and gatekept) by powerful entities, we aren’t left with much of a choice.

Reflections on Active Collecting During Difficult Times / In the Library, With the Lead Pipe

by Kyna Herzinger and Rebecca Pattillo

In Brief

At the onset of the COVID-19 pandemic, the Archives and Special Collections Library at the University of Louisville launched a project to collect the experiences of those living through what many saw as history in the making. Just weeks later, in the wake of the police killings of Breonna Taylor and George Floyd, activists and citizens took to Louisville’s streets to protest racial injustice, again marking an unequivocally historical moment. Instead of collecting protestors’ experiences, Archives and Special Collections began to consider the practicalities and the ethics of active collecting. These historical events occurred in succession, but the conversations around the movement for racial justice reframed our efforts as archivists to document pandemic experiences. This article explores the implications of active collecting and concludes that active collecting needs a framework of support. We briefly review the COVID-19 project we launched at the University of Louisville and assess its outcomes. We then use this project as a reflection on the role of active collecting as reframed by the lens of the movement for racial justice, and we propose that the scaffolding of this framework be formed by three important features: critical reflection, institutional affirmation, and an ethic of care.

Positionality Statement

Kyna Herzinger is a white-presenting Yonsei (fourth-generation Japanese American), Pacific Northwest U.S.-born, straight, cis, able-bodied woman. Rebecca Pattillo is a white, southern U.S.-born, queer, cis, able-bodied woman. At the time of writing, both are early to mid-career archivists employed in tenure-track faculty positions at the University of Louisville in Louisville, KY.


In 2016 and 2017, physicists Jordan Cotler and Frank Wilczek explored a framework in quantum theory that they called “entangled histories.” Using this framework, they conceptualized the past as “an entangled superposition [or interweaving] of time evolutions which are shaped by the outcome of measurement in the present.”1 For the non-physicist, journalist Jenna Wortham offers a useful translation of Cotler and Wilczek’s work. “Our best description of the past,” she summarizes, “is not a fixed chronology but multiple chronologies that are intertwined with each other.” 2 Historians, too, have harnessed the term “entangled histories” as a way to conceptualize the interconnected practices, perceptions, and processes that influence seemingly disparate cultures.3 Many of us, though, do not need the disciplines of physics or history to draw from simple observation this lesson: that countless interwoven stories give depth and nuance to a seemingly monolithic past, and at the same time, current technologies can potentially give voice to the many discrete stories that are entangled within a collective experience.4

Rewind to the early months of 2020 and the archivist’s thorniest question—what should we save?—seemed obvious. Significant history is rarely apparent when we are living in the moment, but with the global pandemic of the novel coronavirus, it felt like history was unfolding in real time. This awareness was paired with the sense that libraries, archives, and museums could throw all their resources toward documenting the evolving events but could never capture them all, and for that reason, archivists were uninhibited by the question of whose stories were the most significant and therefore the most worth saving. The pandemic represented an opportunity to preserve the entangled histories that would bring depth and nuance to tomorrow’s past. It was a chance for the documentary record to reflect more than just a handful of culturally dominant voices; it was a chance to capture a diverse cacophony.

The University of Louisville Archives and Special Collections (ASC) in Louisville, Kentucky, responded, sensing both an opportunity and an imperative. Along with many repositories across the United States, ASC launched a project to document the experiences of the university’s faculty, staff, students, and administrators during the COVID-19 pandemic by inviting the submission of photos, videos, emails, blog or social media posts, and the like. Experiences and reflections could range from direct observations to artistic expression and could touch on themes that spanned displacement from student housing, working from home, the shift to online learning or teaching, or leading the university through a crisis. In the following weeks, ASC also volunteered to serve as the preservation home for a COVID-19 time capsule initiated by the city’s Frazier Kentucky History Museum.5 This partnership avoided duplicated collecting efforts while harnessing each institution’s strengths, pairing the Frazier’s contacts with ASC’s technological infrastructure.

At the same time that schools moved online and white-collar employees were asked to work from home, a troublingly familiar story was captured by the media. A 26-year-old Black woman named Breonna Taylor was shot and killed by Louisville Metro Police when plain-clothed officers served a “no knock” warrant shortly after midnight on March 13, 2020. As the pandemic’s lockdowns and unprecedented job loss spurred widespread uncertainty, Louisville activists began to seek justice for Breonna. These local protests gained momentum as the national movement for racial justice grew in response to the May 25, 2020 murder of George Floyd by police in Minneapolis, Minnesota. Demonstrations and marches in Louisville grew from hundreds to thousands of participants and prompted a range of responses from local, state, and federal officials that at times grew violent. The events of the days and weeks that followed May 25 were extraordinary, and again, ASC saw the unequivocal significance of the moment. But while these events stirred the same impulse to collect, ASC did not attempt a documentation project. Instead, we began to consider the practicalities and the ethics of active collecting through the lens of this movement for racial justice.6

The reasons for this shift were varied and will be discussed in greater detail below. Without a doubt, though, the movement to recognize the deep-seated role of racial injustice within our infrastructures, our laws, our institutions, and our own selves served to illuminate an archival past that has been fraught with issues of ownership, agency, and representation. As we saw violence and force used against protestors, we turned to activist archivists like the Blackivists and members of the teams that support Documenting the Now and WITNESS.7 Each highlighted issues that could emerge when preserving emotional, often traumatizing history and doing so with no foundation of trust, no mechanism for sustained support, and no guarantee of protection from the authorities. Out of their poignant work, we began to ask questions like: Is our collecting appropriate? How do we shift focus from collecting materials to serving communities?8 How do we ensure underrepresented communities are not simply the subject of collections but exercise control over their own histories and representation? How do we, as Scott Cline advocated in 2009, “operate within a moral and ethical imperative that ultimately associates archival practice [with]…the call of justice”?9

This article does not offer a tidy formula or a multi-step plan that will guide active collecting during difficult times. Instead, it offers a glimpse into a messy and—as of yet—unfinished process, grappling with the areas of professional practice that these short-term projects expose, but that demand thoughtful, long-term cultivation and commitment. The authors hope that through our experience readers will find useful issues to consider before launching any active collecting project. We will briefly review the COVID-19 project that we launched and assess the outcomes, and we will set ASC’s efforts to collect individual experiences during the pandemic into a reflection on the role of active collecting in the archival profession as re-examined through the lens of the movement for racial justice.


Historically, ASC’s component units, which consist of Rare Books, Photographic Archives, University Archives, Digital Initiatives, and the Oral History Center, have collected materials related to underrepresented individuals, groups, and topics.10 Each collecting area has proudly embraced this identity, redoubling efforts in recent years and enabling targeted collecting to inform routine practice in an effort to capture a more representative documentary record. One of the factors that influenced this trajectory was the preexistence of a collecting repository in Louisville, the Filson Historical Society, which had been actively preserving the early history of the city and the surrounding region since the late nineteenth century. Local history for many early historical societies focused on affluent, white families, and such was the case for the Filson.11 The core of its collections and resulting publications reflected the affluent, white, cis male perspectives and interests of its founding members, and as a consequence, modern, twentieth and twenty-first-century collections that focus on urban history, social service, cultural organizations, and Black and LGBTQ rights found a more natural fit with ASC. These circumstances have enabled ASC staff to actively collect materials that range from the personal experiences of Latinx community members to Louisville’s underground punk rock scene, but these efforts have also been hindered by an institutional history of racism and a lack of racial diversity within ASC’s personnel.12

When situated within ASC’s collecting practices, the COVID-19 documentation project was unusual, but it was not unprecedented. In 2009, a flood of the Ohio River triggered ASC’s first in-the-moment active collecting. This was prompted by local interest in Louisville’s worst recorded flood, which had occurred in 1937. The 1937 flood served as a community touchstone, captivating interest across generations because of its wide-reaching impact. The 2009 flood project ensured that documentation would not be lost and, more importantly, supported the ability of Louisville’s citizens to reflect on their own history. Without established policies or procedures, ASC quickly adapted forms and workflows in order to document the flood’s impact. ASC solicited digital photographs from community members and collected online content through a trial subscription of Archive-It, but even though the project established a precedent for quick-response collecting, it did not ultimately prompt a conversation about future active collecting. It is our hope that the COVID-19 project will help ASC frame key considerations that could guide future decision-making.13


Even though this article seeks to explore the implications of active collecting and not the merits of how ASC executed the project, we will briefly chart our approach to establish context. The University of Louisville issued a work-from-home mandate effective March 16, 2020, and classes, which were scheduled to resume after spring break but had been postponed for two days, moved online. One week into the university’s remote operations, the Director of ASC shared the University of North Carolina – Charlotte’s newly launched COVID-19 active collecting project, suggesting that this could be a task for the hourly and student workers who we worried would lose hours during remote operations.14 The idea of launching a project of this nature garnered enthusiastic support from us and our colleagues. As with the flood of 2009, we sensed that future generations would want to know how the university and the broader Louisville community responded to and navigated the crisis.

We quickly realized that the challenge of designing and executing this project in a short time frame was ill-suited for hourly and student workers, so a team of four faculty archivists worked to implement the project.15 In fact, the initial task of determining a scope and workflow was more complex than we anticipated and was further complicated by our new teleworking realities. Workstations consisted of whatever equipment employees had on hand at home, and communication was handled through a flurry of overlapping emails, instant messages, and texts.

Despite these challenges, we quickly made several key decisions. First, we determined to collect solely from individuals affiliated with the University of Louisville. We felt that the deed of gift would be less complicated if we treated submissions like university records, and we were concerned about the number of submissions we might receive if we opened the project to the community at large. We also did not want to duplicate efforts that we learned were underway at other area institutions. Second, we selected a platform that would allow us to receive files with corresponding metadata along with a digitally signed agreement. Initially, we assumed that we could replicate the UNC Charlotte project wholesale, but we found that the Google Form that they used had some limitations. ASC would have had to pay for temporary storage, and contributors would have had to limit submissions to 10GB or less. We also considered file-sharing programs like the University’s Box subscription, DropBox, and others, but ultimately, we decided to use LibWizard. This product, which is an add-on to the popular LibGuides platform, ensured that each submission, along with its metadata and donor agreement, would remain together. Even though LibWizard accepted only certain standard file types, contributors could submit up to 100GB per entry, and we had the benefit of unlimited storage. Finally, we determined what information we wanted to collect from participants, and more importantly, we crafted a donor agreement (see Appendix A). The latter was adapted from existing agreements and drew heavily from what was used during the 2009 Flood project. Our libraries’ Scholarly Communications Endowed Chair, who holds a JD, reviewed the text, which gave ASC a non-exclusive license to exhibit, promote, reproduce, and distribute the donated materials. Because the license was non-exclusive, donors retained copyright over their materials.

Our director proposed the active collecting project on March 21, and we launched it on March 25. With the submission portal live and a workflow adapted from our existing digital processing practices, we then turned attention to publicizing the portal. First, we shared directly with faculty partners who had regularly incorporated archival research into their coursework and with whom ASC staff had already developed a rapport. We also shared widely, notifying departments across campus and requesting their assistance to disseminate the link. We sent additional messages to student and employee organizations and placed a short pitch in the university’s announcement digest, which distributes separate daily emails to students and employees. Within a few days, the Office of Communications and Marketing contacted us about a write-up for the university’s official news site, and later, the project was featured by the Association of Research Libraries, alongside several other libraries and archives that launched similar projects.16 We also shared the project on ASC’s Facebook and Instagram social media accounts. Archivists from other states reached out with questions, and we encouraged them to borrow as much or as little as made sense for their circumstances. Meanwhile, the University Libraries’ Communications Coordinator ensured that the project was visible to the libraries’ stakeholders.

Once the portal was ready to receive submissions, we also revisited the idea of collecting experiences from the broader Louisville community. We learned by then that a local institution, the Frazier Kentucky History Museum, had launched a similar project called the Coronavirus Capsule.17 We knew it was impractical to run similar projects in the same geographic region. We also knew that the Frazier, a Smithsonian Institution affiliate whose mission is to interpret stories from the history of the state, did not have the infrastructure to preserve born-digital content. For that reason, we sought to form a partnership by offering to be the preservation home of the Coronavirus Capsule.

By the time representatives from both institutions met, the Frazier had established a website for the project, received submissions via email, and displayed them in a virtual exhibit. ASC’s offer to preserve and provide long-term access to the Coronavirus Capsule was accepted, an arrangement that had the added benefit of satisfying our vision to document as many experiences as possible. Since the Frazier had been affected by the governor’s “Healthy-at-Home” order, which suspended operations of all non-essential businesses, frontline employees who typically worked with the public were assigned remote tasks arranging and describing the submissions.18 ASC’s Metadata Librarian created a Google Sheet to capture metadata and document the submissions. The fields were standardized as much as possible by using data validation, drop-down menus with localized controlled vocabularies, and directions embedded in field headings. After several weeks of ongoing training and iterative changes to the spreadsheet, the Frazier staff developed a rhythm, and we repeated a similar process when they began preparing the files for transfer.


What we hoped to accomplish was clear to us. We wanted to capture the individual stories that would enrich an understanding of our shared experience during a truly historic moment. We could not, however, predict what would occur as a result of our efforts. The COVID-19 documentation project and the Frazier’s Coronavirus Capsule were, in effect, wildcards. Following the March 25 launch of ASC’s campus project, a few submissions trickled in and stopped just two weeks later. A second batch of materials, which appeared to be a class assignment, was submitted a few months later, so that, in total, only eighteen individuals (fourteen students, three staff, and one faculty member) contributed their experiences. This was strikingly meager for a campus community of twenty-three thousand students.19 Although we plan to keep the portal open for at least the first half of 2021, there has been no evidence to suggest that we will capture the full campus experience that we sought.

In considering why we had such a low response rate, it is difficult to imagine that the project never reached its audience, especially given our thorough and repeated efforts to share it with the students, faculty, staff, and even administrators of our campus community. It is possible, though, that the submission portal itself was a deterrent. Requirements to provide a description of each submission, to upload each file individually, and to navigate the “legalese” tone of the agreement may have discouraged some individuals. Some simply may not have had the time. Spring break had coincided with midterms, and students and faculty needed to quickly turn their focus to learning or teaching online and then to tackling end-of-semester projects. More importantly, the early months of the pandemic were especially stressful and uncertain. Some people quite understandably did not want to participate. Others did not have the emotional, mental, or even physical capacity to reflect on the moment, particularly as many juggled family health concerns, childcare responsibilities, job loss, food or housing insecurity, and other challenges triggered by the pandemic.

The Frazier’s Coronavirus Capsule, in comparison, received a much greater response with 328 separate submissions from individuals, families, and entire classrooms. This number reflected the project’s broader scope, as it accepted materials from anyone in the Louisville Metro region, but it also reflected the relationships that the Frazier had already cultivated with area educators, formalized for the purposes of the project as a partnership with the Jefferson County Public Schools. Although there were more submissions, the vast majority came from elementary school children and were works of amateur art. The Coronavirus Capsule thus contained a considerable amount of material that we had not previously accepted as archival.

Similarly, materials received from the faculty, staff, and students of the university community were not items that we would have otherwise automatically accessioned. Submissions included some visual items like a photograph of the vacant campus and a queue of hardware store customers spaced every six feet, as well as a video of empty toilet paper shelves. As discrete items, the materials told little of the story that was not better framed and contextualized by local media. Submissions also included written reflections, but the narratives were short, averaging only a paragraph long, and tended to be broad. They touched on multiple themes like isolation from friends, increased time with family, and remote instruction, but were scant on the illustrations or explanations that give depth to first-hand accounts. Furthermore, the typos and unclear prose betrayed several authors’ hasty efforts.

One student, who is now an alumna, did succeed in capturing a full picture of her experience, and she did so by submitting thoughtful content every few weeks. Her submissions contained a mix of journal entries, photographs, and even an original political cartoon. The student struck a balance between specific experiences and contextual observations. Early on, she reflected on the inequities exposed by the pandemic, and as the summer progressed, she shifted to the movement for racial equality, the presidential election, and the spike of infections going into the new year. The Frazier’s Coronavirus Capsule contained similar gems such as an original folk song by two local musicians and evocative artwork that captured the range of experiences—from isolation and loss to determination and hope.

The trends we observed in the first few months remained steady throughout the year. In hindsight, these early observations hinted at some underlying issues with active collecting during difficult times, but these issues did not become apparent to us until the movement for racial justice began to unfold. This social reckoning, which centered on the impact of systemic racism and white supremacy in American life, highlighted racial disparities like higher death and unemployment rates in communities of color during the COVID-19 pandemic. This simple fact served as a catalyst to reframe our active collecting. We began to wonder how we might be perpetuating problematic representation in our collections and on whose emotional labor we were relying. Put another way, we witnessed two significant historical events—a pandemic and a social movement—that occurred in rapid succession, but in which the second illuminated the first. In response, we began to grapple with the ways that privilege or the lack of privilege, stress or trauma, and movements or individual acts of dissent can play out in what is collected, how it is collected, and how it is represented or used.


The death of Breonna Taylor on March 13 coincided with early efforts to contain the spread of COVID-19 in the United States, and the investigation into Taylor’s death paralleled the launch of both ASC and the Frazier’s active collecting projects. Taylor’s death was covered by the local media, but it was overshadowed by coverage of the growing pandemic. The movement to seek justice for Breonna Taylor began to take shape two months later around mid-May, and following the May 25 murder of George Floyd, quickly solidified locally and across the nation. Hundreds of protestors gathered in Louisville on May 26 and, over the following weeks, that number swelled to thousands. Jefferson Square Park, which is located at the heart of the city and adjacent to the county courthouse, city hall, and the Louisville Metro Police Department headquarters, became an impromptu base for activists and was renamed “Injustice Square Park” and “Breonna Taylor Park” by local Black organizers. Activists built a memorial to Taylor and the countless Black victims of racial and police violence and occupied the park daily until winter weather made conditions unsafe.20 Meanwhile, civic groups, institutions, religious organizations, and University of Louisville students, faculty, and staff organized additional marches throughout the city during the spring, summer, and fall of 2020.

The movement for racial justice occurred nearly back to back with the pandemic and captured our attention as caretakers and advocates of our local history and witnesses to the continued brutalization of Black lives. Our desire to document was driven by the conviction that history and the ability to reflect on one’s own experience are important, but that day-to-day life is often fleeting. These sentiments absolutely rang true in the early weeks of the pandemic and reverberated during the local movement for racial justice. But this resonance also offered an intense, strikingly useful lens that nudged us toward self-reflection. We had not grappled with the pragmatic issues encountered during the COVID-19 projects (such as the impact on staff and the lack of community engagement), much less the more troubling ones (the inclination to collect traumatic experiences, including from those who would bear a greater portion of the costs). Meanwhile, the movement for racial justice illuminated the need for humility and an ethical response in our archival work that we had overlooked. The following highlights some of the reflections and questions that we have been grappling with ever since.

As the Justice for Breonna Taylor movement gained momentum, we were immediately struck by our own impulse to actively collect materials that would document the protests. We considered our positionality as white and white-presenting women, respectively, within a nearly all-white archival staff, at a predominantly white institution. We did not trust our own perspectives, which, from our own place of privilege, had driven a sense of urgency into documenting the COVID-19 pandemic. We wondered whether collecting racial justice experiences would only reinforce the cultural dominance of racism. Our archival colleagues, after all, cautioned that records are “value-laden instruments of power,” and their creation, representation, and use are far from neutral.21 For that reason, we were hesitant to expand our active collecting to incorporate the movement for racial justice. Our first question shifted to whether, rather than what, we should collect.

We were also hesitant to document protest experiences without the collaboration of local organizers. Comments that came out of the Society of American Archivists’ Community Reflection on Black Lives and Archives forum stressed that some local organizations may not want the assistance of archivists from predominantly white institutions, and forum participants recognized that some communities had little reason to trust those who have historically ignored their history.22 These considerations reframed our role as we accepted that we may not be the right people to actively collect the materials, our institution may not be the right place to preserve them, and now may not be the right time. Underlying this archival restraint was not inaction, but a movement toward an archival autonomy that is grounded in the recognition of participants as co-creators who should be empowered as decision-makers.23 Because we did not have established partnerships with the activist community, nor an adequate relationship of trust with existing community-based memory workers, we chose to collect widely accessible materials as a launching point for our future researchers. These materials lacked depth as they took the form of local and national news articles, as well as social media posts promoting local events, press conferences, and actions by local activists and racial justice organizations. To protect privacy, the materials did not identify individual participants but captured the public face of organizations that were mentioned in the media. We have not ruled out the possibility of more substantial or directed collecting in the future, but only if it involves the Black activist community and memory workers, is given institutional priority and support, and can be done with what public historian Aleia Brown describes as an “ethic of care.”

Brown’s work is situated in a long history of Black feminist scholarship and memorializes ongoing events that affect Black life. Brown describes an ethic of care as “a deep love for Black folks, and commitment to being accountable to Black communities.”24 Michelle Caswell and Marika Cifor also frame the archivists’ role through the lens of feminist theory, highlighting that an ethic of care binds archivists “to records creators, subjects, users, and communities through a web of mutual affective responsibility.”25 As we discussed the possibility of documenting the racial justice movement in Louisville, we did so with an understanding that without a reciprocal and mutually beneficial relationship with the activists and citizens we sought to document, we would be neglecting that ethic of care.

The issues that became apparent during our conversations about documenting the racial justice movement in Louisville revealed a lack of thoughtful consideration about active collecting as a whole, so that, ironically, these relatively short-term projects exposed areas in need of long-term attention. Two areas, in particular, were captured by archivist Mario Ramirez, who highlighted, first, that simply acquiring diverse collections without diversifying those who are doing the collecting is inadequate. ASC’s success in collecting from underrepresented communities obscured an institutional history steeped in whiteness, and one tellingly marked by a nearly all-white archival staff. Our white identities influence the decisions we make about acquisition, description, and use, and ultimately, shape what and how collections are represented to researchers. By choosing a rather passive approach to our COVID-19 documentation project, an online portal with little direct interpersonal engagement, we relied on the neutrality and safety of our whiteness. Second, Ramirez argued that repositories must embrace “a paradigmatic shift in power wherein whiteness no longer claims unquestioned and protected status and where the roots of our professional imbalances are addressed.”26 Black, Indigenous, and other archivists of color have borne the weight of this work, highlighting the harmful and oppressive systems that are upheld by a predominantly white profession.27 Public historian GVGK Tang poignantly observed one of the consequences of these power structures. “White middle-class public historians,” they note, “are columbusing activist history-making and grassroots preservation work—treating it as a new frontier to be discovered, explored, and exploited. 
The recent spate of pandemic and protest collection projects initiated by traditional practitioners erases a rich history of activist-led scholarship and documentation efforts.”28 At minimum, white archivists must listen and learn, particularly when it comes to active collecting of events that disproportionately affect BIPOC, LGBTQ+, neurodiverse, disabled, and other historically marginalized groups. “Ultimately, rendering marginalized communities the subjects of your research [or in this case, active collecting effort]” Tang writes, “doesn’t absolve you of your privilege or complicity in an inherently anti-Black, racist, classist, and ableist system.”29 If, indeed, “one of the most overlooked but important things that archivists working in hegemonic institutions can do is to ensure the acquisition, preservation, and accessibility of the very records that hold that institution accountable to its constituents,” what can archivists do to redirect their efforts and attention?30 Instead of commandeering the work and history of underrepresented communities, perhaps archivists should consider what they can do to dismantle oppressive structures by maintaining the evidence of individual and group actions.

An ethic of care should have also extended to employee wellness. As we came to realize the importance of aligning rapid response projects with strategic priorities, we recognized that we had never discussed what tasks would be postponed to make room for active collecting. Although projects that take longer than expected are far from abnormal, our day-to-day responsibilities never abated, and some tasks took longer to accomplish simply because of our remote work environments. The financial impact of the pandemic, meanwhile, prompted salary reductions, retirement contribution cuts, and furloughs across campus, but we met the demands of the projects, aware of how important it was to ensure that ASC was seen as a contributing member of the campus community. To be sure, we were grateful to have jobs, but in hindsight we wondered how to affirm workers during times when compensation is simultaneously diminished. We also wondered what could have been done to ensure that the workload did not add additional pressure to employees who were already coping with the stresses of an evolving COVID-19 environment. ASC employees, after all, were not immune to the concerns of the pandemic and Louisville’s reckoning with racial injustice.

Our archival colleagues outside of Louisville took interest in our active collecting efforts, with an eye toward launching something similar at their own institutions. It seemed to us that these kinds of projects were almost (forgive the bad pun) contagious, as if many of us were engaged in a professional form of “keeping up with the Joneses.” Thanks to an increase in virtual professional development, which quickly became the norm through the summer and into the fall, we connected with institutions across the nation that had also launched documentation projects and that described the same lack of engagement and dearth of submissions that we had seen. In our own community, we were careful not to duplicate the collecting efforts of other area institutions, but if individuals did not want to share their experiences or did not see a need to do so, were these projects only serving archivists’ desires to collect? Eira Tansey described this initial fervor to collect first COVID-19 and then racial justice materials as “the newest form of archival commodification” wherein archivists exploit personal experiences—at times traumatic experiences—in their scramble to appear relevant or signal care for their communities. In the end, though, archivists only jeopardize their own relevance as they shift attention away from one of their most important functions: holding institutions accountable.31 If preserving evidence of and maximizing access to institutional acts enables public scrutiny, could our time have been better spent? For example, in the early months of the pandemic, our administrators claimed that the university was committed to ensuring furloughed employees remained financially whole, but in fact, the administration targeted the lowest wage earners at a time when unemployment offices were overwhelmed. 
Should we have done more to capture decisions that resulted in the most precariously employed navigating multiple pay periods without receiving either a paycheck or unemployment insurance benefits?32

Finally, in the midst of these weighty issues, we wondered whether active collecting was the best use of our greatest resource: time. By May, we had observed the lackluster response to the COVID-19 projects just as we realized that we had invested many more hours than we intended. This is not to say that an underperforming project or even a failed project is inherently without merit, but as time progressed, we lost sight of where active collecting fit into our core mission—to the degree that we wondered if, in taking on these projects, we had undermined our ability to accomplish our own strategic priorities that centered on access to existing collections and outreach to the faculty, staff, and students of our campus community. As we grappled with the best ways to achieve our goals within the confines of the pandemic, we realized that our finite resources deserved thoughtful allocation. This became even more apparent when we noticed that our administrators and publicists were the parties most interested in our COVID-19 active collecting. Archivist Eira Tansey aptly cautioned that these types of projects can “provide our administrators with feel-good press releases so they can somehow show that we’re responding to societal concerns, but without actually requiring any accountability or significant resource allocation on the part of the institution itself.”33 We felt as if we had presented ourselves as capable of weaving gold from the proverbial straw, and even though ASC has had a long history of shoe-string resources and can-do attitudes, we wondered what we were communicating to resource allocators and stakeholders when we continued to do more with less, especially in the face of a public health crisis that was financially impacting our institution.


The Latin roots of our English word record mean “to recall the heart of” and aptly capture the idea of revisiting the central significance of the past.34 In recording our experiences, we do much more than jot down facts or plot data points; we capture and remember an essential and vital part of our own story. The archivist is especially attuned to the value of capturing these stories, but not only that; the archivist is aware that the documents which support our understanding of history are skewed heavily toward those in power. “[W]e learn most about the rich, not the poor; the successful, not the failures; the old, not the young; the politically active, not the politically alienated; men, not women; white, not black; free people rather than prisoners.”35 In response, we archivists have seized the idea that ordinary, obscure, and even silenced people have something to say, which leads us, on the one hand, to suggest that active collecting—especially when undertaken during difficult times—should have a robust framework of support. Even when events necessitate a rapid response, the archivist’s instinct to collect should be buoyed with self-reflection, affirmed by institutional prerogatives, and marked by an ethic of care. But, on the other hand, we also recognize that no single archive can collect the experiences of all members of society; this work must happen widely, and much of it must happen outside of institutions.

We entered into our active collecting projects without the sort of scaffolding that supported self-reflection, but despite our short-sightedness, the important events of 2020 created space to reflect on our decisions. They challenged our assumptions and reframed our responsibilities, prompting us to ask how we might better prepare for the next historic moment. This is not to say that we have formulated a one-size-fits-all solution. We have increasingly wondered, in fact, how an entirely different approach might have better served the faculty, staff, and students of our university community, and how ASC, in partnership with other collecting repositories in our area, might have empowered individuals living in and around Louisville to document themselves. Rather than replicate the many documentation projects that sought to collect and preserve, we wondered how we could have facilitated discussion around personal archiving and preservation methods. How might we have equipped individual citizens to carry out their own active collecting?

Indeed, the most evocative stories about COVID-19 and the Justice for Breonna Taylor movement in Louisville were not captured by either ASC’s COVID-19 project or the Frazier’s Coronavirus Capsule. During the fall, professors from the university’s Departments of Pan-African Studies and History had students, most of whom were Black and Brown, conduct interviews as part of the semester’s coursework. The classes collaborated with the director of ASC’s Oral History Center, who provided the professors and students with training and guidance and accessioned the materials, which are now part of ASC’s permanent collection. The resulting oral histories captured rich stories that revealed the complex interweaving of the effects of COVID-19 and the racial justice movement on Louisville’s Black and Brown community and its activists, and they inspired a sense of ownership in the telling of history. They also highlighted an impressive level of participant engagement. Empowering University of Louisville students to do their own archival project, under the guidance of their professors and an archivist, challenged the traditional oppressive structure of the archive—one which places power of accession, description, and collecting in the hands of a majority-white institutional archive. In comparison, our efforts to document were, at best, clumsy and, at worst, dominating as we toed the line between collecting university employee and student records and controlling or even shaping them.36 The very nature of our online submission form demanded a degree of technical know-how from those whose submissions we sought. Who might this have excluded?

Although we have explored some issues that may surface while documenting history in the moment, we recognize that these considerations are not exhaustive; rather, they are the beginnings of a discussion centered on what is needed to support rapid response collecting. The questions that we have explored and that are outlined below may form the basis of a framework to guide decision-making, whether in response to a localized flood that displaces individuals from their workplaces and dorms, a pandemic that disproportionately impacts the most vulnerable populations, a movement calling for long-overdue justice in the face of racism, or any other significant event. Of all the things to consider, perhaps the most crucial is who is best equipped to engage, peer to peer, with the community that is being served. Is there potential for the project to be community-led with institutional support, rather than institutionally led with community support? Is there community-led or grassroots documentation already taking place? If a formal archive is best suited to handle an active collecting project, the project should be supported in three areas: critical reflection, institutional affirmation, and an ethic of care. The following questions may be used to consider these areas.

Critical reflection:

  • What resources do we anticipate we will need to complete the project?
  • Do we have enough financial, human, and technological resources to meet that need? If not, how will we secure those resources? 
  • What might need to be postponed or eliminated to take on the new project?
  • Does the project ultimately serve the institution, rather than archival users? 
  • Could this project be seen as archival commodification? Rather than ask what you should collect, determine whether you should collect.

Institutional affirmation:

  • Where does this project fit into the archive’s mission and strategic priorities?
  • How will we communicate (and later reiterate) the importance of this project to our staff? To our administrators? 
  • How can we seek additional support from directors, deans, and other administrators? What additional support is needed to see the project through?

Ethic of care:

  • What is the potential emotional impact to staff and community members?
  • What resources are available to assist both staff and community members, and how do we incorporate those resources into our messaging?
  • How do the positionality and identities of the archivist reinforce or dismantle existing power structures?
  • How does the project serve or support the community it documents? Is there potential for harm? If so, are there ways to mitigate harm?
  • Who are the project partners? How will we ensure the community’s perspectives are heard?
  • If there is little or no established relationship with those you seek to document, how can you build and sustain trust among them?

Ultimately, these questions explore the unintended consequences of archival work and seek to position that work within a framework of professional values. We hope that others will build on this foundation by revealing limitations, suggesting additional considerations, and connecting real-world examples to the professional principles that support our collective work.


We are indebted to Stevie Gunter for graciously agreeing to review this article, and to the Lead Pipe’s Ian Beilin and Denisse Solis for providing helpful insights and key feedback and for shepherding us through the editorial process. We would also like to thank our colleagues at the University of Louisville for providing space to grapple with these ideas. Rebecca would like to thank her partner, Charlotte Asmuth, for graciously offering their writing expertise as they read through many drafts of this article.

Appendix A: Project Portal

Documenting Your Experience during the Covid-19 Outbreak

We are living in an historic moment. In the same way that, today, we want to know how Louisvillians navigated the historic 1937 flood of the Ohio River, years from now, others will want to know how we navigated the experience of a global pandemic brought on by the novel coronavirus.

In the spirit of documenting this moment, the University of Louisville Archives and Special Collections wants to collect and preserve the experiences and reactions of UofL students, staff, faculty, and administrators. Personal accounts can range from direct observations to artistic reflection and may touch on any number of themes such as displacement from student housing, working from home, the shift to online learning or teaching, social distancing or self-quarantining, or leading the university through the crisis. Personal accounts can be in the form of a journal or blog, email, photos, videos, audio recordings, or social media posts.

First Name:

Last Name:

Email Address:

What is your University of Louisville affiliation (administrator, faculty, staff, or student)?


  • You may upload standard word processing documents, spreadsheets, presentations, images, audio or video recordings, and compressed files.
  • A single file may not exceed 100 MB (if your file does exceed this limit, please complete this form and contact us to coordinate an alternate file transfer method).
  • You may upload 10 files per form.

Describe your items. 

Please include the location/event, date, people included, and any other relevant information known.

Are you the sole creator of these materials?

If not, please list the names of any other creators or co-creators, their email address(es), and the circumstances of how you came to have the materials. IMPORTANT: you must seek the approval of your co-creator(s) before submitting co-created materials. Co-creator(s) should also complete a copy of this form, but note that files do not need to be uploaded.


By clicking on the checkbox and initialing this form, I acknowledge that as the creator and/or copyright holder of the submitted materials (“the materials”), I grant to the University of Louisville and Archives and Special Collections (“ASC”) an irrevocable and nonexclusive license to make use of the materials, including but not limited to reproduction, distribution, derivative adaptations, and public performances and displays, consistent with accepted archival practices and ASC policies as they may exist from time to time. A nonexclusive license transfers no copyright, and the submitter otherwise retains all other rights in the materials subject to this prior nonexclusive license. By submitting the materials, I certify that I am the creator and/or copyright holder and have full authority to grant this license and have exercised appropriate diligence in creating the materials and capturing any images, likenesses, and/or other inclusions of possible third-party copyrighted materials. I also acknowledge that ASC may distribute the materials under an open license, such as Creative Commons, of ASC’s sole choosing that allows others to make use of the materials consistent with the terms of the open license in order to make the materials available for educational, informational, and similar purposes worldwide.


Check one:

I grant permission to use my name as the donor for exhibits, description, and publicity

I wish to remain anonymous


Briston, Heather. “On Accountability.” In Archival Values: Essays in Honor of Mark A. Greene, edited by Christine Weideman and Mary A. Caldera. Chicago: Society of American Archivists, 2019.

Brown, Alex. Twitter, August 18, 2021.

Brooks, Caitlin. “UofL Archives and Special Collections documenting COVID-19 experiences.” UofL News, April 3, 2020.

Caswell, Michelle. “Teaching to Dismantle White Supremacy in the Archives.” Library Quarterly 87, no. 3 (July 2017): 222-235.

——— and Marika Cifor. “From Human Rights to Feminist Ethics: Radical Empathy in the Archives.” Archivaria 81 (spring 2016): 24. 

———, Marika Cifor, and Mario H. Ramirez. “’To Suddenly Discover Yourself Existing’: Uncovering the Impact of Community Archives.” American Archivist 79, no. 1 (spring/summer 2016): 56-81.

———, Alda Allina Migoni, Noah Geraci, and Marika Cifor. “‘To Be Able to Imagine Otherwise’: Community Archives and the Importance of Representation.” Archives and Records 38, no. 1 (2016): 5-26.

Clay, Richard H.C. “From the President.” The Filson News Magazine 20, no. 2 (Summer 2020): 3.

Cline, Scott. “‘To the Limit of Our Integrity’: Reflections on Archival Being.” American Archivist 72, no. 2 (fall/winter 2009): 331-343.

Cook, Terry. “Archival Science and Postmodernism: New Formulations for Old Concepts.” Archival Science 1 (2001): 3-24.

——— and Joan M. Schwartz. “Archives, Records, and Power: From (Postmodern) Theory to (Archival) Performance.” Archival Science 2 (2002): 171-185.

Cotler, Jordan and Frank Wilczek. “Entangled Histories,” Physica Scripta 2016, no. T168 (May 2016): 1-7.

———. “Temporal Observables and Entangled Histories.” (Unpublished manuscript, February 20, 2017).

Daniels, Caroline, Heather Fox, Sarah-Jane Poindexter, and Elizabeth Reilly. “Saving All the Freaks on the Life Raft: Blending Documentation Strategy with Community Engagement to Build a Local Music Archives.” American Archivist 78, no. 1 (spring/summer 2015): 238-261.

Drake, Jarrett M. “Liberatory Archives: Towards Belonging and Believing (Part 1).” On Archivy (blog). (October 22, 2016).

Dunbar, Anthony W. “Introducing Critical Race Theory to Archival Discourse: Getting the Conversation Started.” Archival Science 6, no. 1 (2006): 109-29.

Evans, Joanne, et al. “Self-determination and Archival Autonomy: Advocating Activism.” Archival Science 15 (2015): 356-57.

Gould, Eliga H. “Entangled Histories, Entangled Worlds: The English-Speaking Atlantic as a Spanish Periphery.” American Historical Review 112, no. 3 (June 2007): 766.

Groves, Kaylyn. “Research Libraries, Archives Document Community Experiences of COVID-19 Pandemic.” ARL Views (blog), May 14, 2020.

Harrison, Lowell H. “A Century of Progress: The Filson Club, 1884-1984.” Filson Club History Quarterly 58, no. 4 (October 1984): 381-407.

Howard, Rachel, Heather Fox, and Caroline Daniels. “The Born-Digital Deluge: Documenting Twenty-First Century Events.” Archival Issues 33, no. 2 (2011): 100-109.

Huvila, Isto. “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management.” Archival Science 8 (2008): 15-36.

Jimerson, Randall C. “Embracing the Power of Archives.” American Archivist 69, no. 1 (spring/summer 2006): 32.

———. Archives Power: Memory, Accountability, and Social Justice. Chicago: Society of American Archivists, 2009.

Jules, Bergis. “Confronting Our Failure of Care Around the Legacies of Marginalized People in the Archives.” On Archivy (blog), November 11, 2016.

———. “Archiving Protests, Protecting Activists.” DocNow, published June 17, 2020.

Ketelaar, Eric. “Tacit Narratives: The Meanings of Archives.” Archival Science 1 (2001): 131-141.

Project STAND. “Documenting Student Activism w/o Harm.” Accessed March 1, 2021.

Quinn, Patrick M. “Archivists and Historians: The Times They Are A-Changing.” Midwestern Archivist 2, no. 2 (1977): 5-13.

Ramirez, Mario. “Being Assumed Not to Be: A Critique of Whiteness as an Archival Imperative.” American Archivist 78, no. 2 (fall/winter 2015): 339-356.

Rothert, Otto A. The Filson Club and its activities, 1884-1922: A History of the Filson Club, including lists of Filson Club publications and papers on Kentucky History prepared for the Club, also Names of Members. Louisville, KY: J.P. Morton, 1922.

Sixty Inches from Center. “The Blackivists’ Five Tips for Organizers, Protestors, and Anyone Documenting Movements.” Published June 2, 2020.

Tang, GVGK. “We need to talk about public history’s columbusing problem.” History@Work (blog), National Council on Public History, June 25, 2020.

Tansey, Eira. “No One Owes Their Trauma to Archivists or the Commodification of Contemporaneous Collecting,” Eira Tansey (blog), June 5, 2020.

Texas After Violence Project. “Trainings.” Accessed March 1, 2021.

Wetherington, Mark V. “Filson Club Historical Society.” In The Encyclopedia of Louisville, edited by John E. Kleber, 289. Lexington, KY: University Press of Kentucky, 2001.

WITNESS. “Witness Resources.” Accessed March 1, 2021.

Wolf, Stephanie. “Downtown Breonna Taylor Memorial Will ‘Rest with her Ancestors’ at Roots 101.” WFPL News, November 2, 2020.

Wortham, Jenna. “How an Archive of the Internet Could Change History.” New York Times Magazine, June 21, 2016.
Zinn, Howard. “Secrecy, Archives, and the Public Interest.” Midwestern Archivist 2, no. 2 (1977): 14-26.

  1. Jordan Cotler and Frank Wilczek, “Entangled Histories,” Physica Scripta 2016, no. T168 (May 2016): 7. See also Jordan Cotler and Frank Wilczek, “Temporal Observables and Entangled Histories” (unpublished manuscript, February 20, 2017).
  2. Jenna Wortham, “How an Archive of the Internet Could Change History,” New York Times Magazine, June 21, 2016.
  3. Eliga H. Gould, “Entangled Histories, Entangled Worlds: The English-Speaking Atlantic as a Spanish Periphery,” American Historical Review 112, no. 3 (June 2007): 766.
  4. Wortham, “How an Archive of the Internet Could Change History.”
  5. “Coronavirus Capsule,” Frazier Kentucky History Museum, accessed March 1, 2021.
  6. Throughout this article we use the term “active” to describe the idea of contemporaneous collecting in order to document events as they are happening. Even though our project’s methods were relatively passive (participation, after all, was voluntary), we use this term in keeping with its broader use in the profession. Active collecting finds its roots in the American archival tradition and, specifically, in Theodore Schellenberg’s mid-twentieth century contributions to appraisal theory. Schellenberg argued that one of the archivist’s key roles is to select records for long-term preservation, and his work informs the intellectual foundation for the late-twentieth century idea of saving records as they are created. Nonetheless, a number of archivists have acknowledged the immense challenge and inherent pitfalls of actively shaping the historical record. As Canadian archivist Terry Cook noted, the archive is a “site where social memory has been (and is) constructed—usually in support, consciously or unconsciously, of the metanarratives of the powerful.” South African archivist Verne Harris highlighted an additional layer of complexity when he considered how the archivist infuses layers of meaning into the records at each stage of the curatorial process—not just when it is selected for preservation. When taken together Cook and Harris offer cautionary words, but they also invite archivists to be highly transparent in their professional practice and to create space for multiple interpretations of the record. While our active collecting project cannot be divorced from the positionality or temporality of ourselves or our institution, we echo Cook in acknowledging that records are in a continuous process of being reimagined, and we hope that this article can be a tool in that process. Theodore R. 
Schellenberg, Modern Archives: Principles and Techniques (Chicago: University of Chicago Press, 1956), 138-139; Terry Cook, “Fashionable Nonsense or Professional Rebirth: Postmodernism and the Practice of Archives,” Archivaria 51 (2001): 27; Verne Harris, “Claiming Less, Delivering More: A Critique of Positivist Formulations on Archives in South Africa,” Archivaria 44 (1997): 136-138.
  7. “The Blackivists’ Five Tips for Organizers, Protestors, and Anyone Documenting Movements,” Sixty Inches from Center, published June 2, 2020; Bergis Jules, “Archiving Protests, Protecting Activists,” DocNow, published June 17, 2020; “Witness Resources,” WITNESS, accessed March 1, 2021; “Trainings,” Texas After Violence Project, accessed March 1, 2021; “Documenting Student Activism w/o Harm,” Project STAND, accessed March 1, 2021. The authors are indebted to the members of Documenting the Now, WITNESS, The Blackivists, Texas After Violence Project, Project STAND, and other Black archivists and memory workers who have shared their expertise on documenting the racial justice movement.
  8. Throughout this article, we use the term “community” to describe groups of people that may share a common geographic location, race, ethnicity, position (i.e. student or employee), or some combination of these and other identities. We recognize that this is an imprecise term and, more importantly, that librarians and archivists can be imperceptive to the many communities that ought to be served. Alex Brown (@QueenOfRats) aptly noted, “Our service population is not 1 group but many. There is more than 1 community in a town or mentro [sic] area, but we provide the most services to 1 of those groups (guess which).” Twitter, August 18, 2021, 9:26 a.m.,
  9. Scott Cline, “‘To the Limit of Our Integrity’: Reflections on Archival Being,” American Archivist 72, no. 2 (fall/winter 2009): 331-343,
  10. ASC’s component units developed separately with Rare Books having been established in 1957; Photographic Archives in 1962; the Oral History Center in 1968; University Archives, which includes community manuscript collections, in 1973; and Digital Initiatives in 2006. All units merged into a single library in 2013. Of the collecting areas, the university’s archives and manuscripts was the one to embrace inclusive collecting from its inception thanks to the work of several key individuals. Broadly, their efforts reflected then-emerging currents within the field of history as well as their individual experiences. (Our colleague Tom Owen, for example, taught the University’s inaugural course in African American history—a concession by the university’s administration in the wake of student protests in 1969—and brought that lens into his work with the archives. He was also a self-proclaimed community hustler for defunct organizations.) When the archives received records from local individuals and organizations, the Oral History Center collected corresponding narratives, resulting in the African American Community and Louisville’s Jewish Community oral history collections. As the paper, photo, and rare book collections grew, the oral histories expanded as well to cover topics in the 1980s-90s like school integration, urban renewal, the Civil Rights Movement in Louisville, and more recently in the early 2000s Louisville’s LGBTQ community and Fairness Campaign.
  11. Patrick M. Quinn, “Archivists and Historians: The Times They Are A-Changing,” Midwestern Archivist 2, no. 2 (1977): 8. Quinn observed that “traditional notions of what types of primary source materials should be collected and from what sectors of the population…encouraged an elitist approach to writing history, an approach that in effect ignored the history of blacks and other minorities, women, working people and the poor.” Lowell H. Harrison, “A Century of Progress: The Filson Club, 1884-1984,” Filson Club History Quarterly 58, no. 4 (October 1984): 381-407; Otto A. Rothert, The Filson Club and its activities, 1884-1922: A History of the Filson Club, including lists of Filson Club publications and papers on Kentucky History prepared for the Club, also Names of Members (Louisville, Ky.: J.P. Morton, 1922), 15-19; Mark V. Wetherington, “Filson Club Historical Society” in The Encyclopedia of Louisville, ed. John E. Kleber (Lexington, Ky.: University Press of Kentucky, 2001), 289. Like many collecting repositories, the Filson Historical Society has since taken steps to expand its collecting scope. Most recently, in the wake of the movement for racial justice, to “[a]ctively engage with the Louisville Black community to more fully archive the marginalized histories of our city, state, and region.” Richard Clay, “From the President,” The Filson 20, no. 2 (Summer 2020): 3.
  12. Caroline Daniels et al., “Saving All the Freaks on the Life Raft: Blending Documentation Strategy with Community Engagement to Build a Local Music Archives,” American Archivist 78, no. 1 (spring/summer 2015): 238-261,
  13. Rachel Howard, Heather Fox, and Caroline Daniels, “The Born-Digital Deluge: Documenting Twenty-First Century Events,” Archival Issues 33, no. 2 (2011): 100-109,
  14. “Contribute your stories of the Covid-19 outbreak,” J. Murrey Atkins Library, UNC Charlotte, published March 26, 2020,
  15. The University of Louisville extends faculty status to some positions within the University Libraries, including to seven of the thirteen positions in ASC. Each individual who was involved in the project holds one of these tenure-track faculty positions.
  16. Caitlin Brooks, “UofL Archives and Special Collections documenting COVID-19 experiences,” UofL News, April 3, 2020; Kaylyn Groves, “Research Libraries, Archives Document Community Experiences of COVID-19 Pandemic,” ARL Views (blog), May 14, 2020,
  17. “Coronavirus Capsule,” Frazier Kentucky History Museum, accessed March 1, 2021,
  18. Executive Order 2020-257 (signed March 25, 2020),
  19. University of Louisville, Just the Facts 2019-20, 2020,
  20. Stephanie Wolf, “Downtown Breonna Taylor Memorial Will ‘Rest with her Ancestors’ at Roots 101,” WFPL News, November 2, 2020. The memorial to Breonna Taylor has since been transferred to Roots 101 African American Museum.
  21. Terry Cook and Joan M. Schwartz, “Archives, Records, and Power: From (Postmodern) Theory to (Archival) Performance,” Archival Science 2 (2002): 178. See also Terry Cook, “Archival Science and Postmodernism: New Formulations for Old Concepts,” Archival Science 1 (2001): 7; Eric Ketelaar, “Tacit Narratives: The Meanings of Archives,” Archival Science 1 (2001): 139, 141; Randall C. Jimerson, “Embracing the Power of Archives,” American Archivist 69, no. 1 (spring/summer 2006): 32. The literature applying postmodern theory to archival concepts is a useful foundation for conceptualizing archival power. Cook, for example, shows that the power at work during record creation—including even “the document’s structure, resident information system, and narrative conventions”—shapes the records, arguing that these forces are, in fact, more important than the records’ content. Ketelaar observed that “not only the administrative context, but also the social, cultural, political, religious contexts of…creation, maintenance, and use,” shape the record. He challenges archivists to “stress the archive’s power” through deconstruction and reconstruction, an idea that Jimerson also encouraged. Archivists, he wrote, should “embrace the power of archives” as a force for good, suggesting that archives can protect public interest rather than the privileges of the powerful.
  22. Society of American Archivists, Community Reflection on Black Lives and Archives (forum, held online, June 12, 2020), accessed March 1, 2021,
  23. Joanne Evans et al., “Self-determination and Archival Autonomy: Advocating Activism,” Archival Science 15 (2015): 356-57.  The authors defined “archival autonomy” as “the ability for individuals and communities to participate in societal memory, with their own voice, and to become participatory agents in recordkeeping and archiving for identity, memory, and accountability purposes.”
  24. Aleia Brown (@CollardStudies) “The missing details, assumptions, one-dimensional presentations, and lack of accountability were all symptomatic of a lack of care for Black life. Lack of care doesn’t seem nefarious until you consider the ways this approach to Black public history manifests deeper issues…Care is also answering Black Studies scholar Christina Sharpe’s inquiry, ‘How do we memorialize an ongoing event?’ What is the violence wreaked by a capitalist, white supremacists, patriarchal society?” Twitter, July 8, 2020, 7:06 p.m.,
  25. Michelle Caswell and Marika Cifor, “From Human Rights to Feminist Ethics: Radical Empathy in the Archives,” Archivaria 81 (spring 2016): 24.
  26. Mario Ramirez, “Being Assumed Not to Be: A Critique of Whiteness as an Archival Imperative,” American Archivist 78, no. 2 (fall/winter 2015): 352,
  27. Michelle Caswell et al., “‘To Be Able to Imagine Otherwise’: Community Archives and the Importance of Representation,” Archives and Records 38, no. 1 (2016): 5-26; Michelle Caswell, Marika Cifor, and Mario H. Ramirez, “‘To Suddenly Discover Yourself Existing’: Uncovering the Impact of Community Archives,” American Archivist 79, no. 1 (spring/summer 2016): 56-81; Michelle Caswell, “Teaching to Dismantle White Supremacy in the Archives,” Library Quarterly 87, no. 3 (July 2017): 222-235; Jarrett M. Drake, “Liberatory Archives: Towards Belonging and Believing (Part 1),” On Archivy (blog), October 22, 2016; Anthony W. Dunbar, “Introducing Critical Race Theory to Archival Discourse: Getting the Conversation Started,” Archival Science 6, no. 1 (2006): 109-29; Isto Huvila, “Participatory Archive: Towards Decentralised Curation, Radical User Orientation, and Broader Contextualisation of Records Management,” Archival Science 8 (2008): 15-36; Bergis Jules, “Confronting Our Failure of Care Around the Legacies of Marginalized People in the Archives,” On Archivy (blog), November 11, 2016,
  28. GVGK Tang, “We need to talk about public history’s columbusing problem,” History@Work (blog), National Council on Public History, June 25, 2020,
  29. Ibid.
  30. Eira Tansey, “No One Owes Their Trauma to Archivists or the Commodification of Contemporaneous Collecting,” Eira Tansey (blog), June 5, 2020,
  31. Ibid.
  32. Archivists define accountability as the capacity to “answer for, explain, or justify actions or decisions” that are the responsibility of a system, individual, or corporate entity. What is more, archivists have identified accountability as a core professional value, and one that should guide daily practices and characterize professional intentions. Archives’ role in supporting accountability gives archives power—power that is exercised during daily processes like deciding what gets saved, who gets access, and how records are represented, which ultimately impacts how we remember the past. Nevertheless, the work of archivists cannot foster accountability by itself; this is also the work of record creators and researchers, and in fact, the complexities of collecting for evidentiary purposes from the very institutions that often fund the archive can introduce tensions or, worse, interference. Still, archives play a key role in cultivating transparency as they engage in this work. Dictionary of Archives Terminology, Society of American Archivists, s.v. “accountability (n.),” accessed August 3, 2021; “SAA Core Values Statement and Code of Ethics,” Society of American Archivists, last modified March 30, 2018; Heather Briston, “On Accountability,” in Archival Values: Essays in Honor of Mark A. Greene, ed. Christine Weideman and Mary A. Caldera (Chicago: Society of American Archivists, 2019), 76-81; Randall C. Jimerson, Archives Power: Memory, Accountability, and Social Justice (Chicago: Society of American Archivists, 2009), 246-247.
  33. Tansey, “No One Owes Their Trauma to Archivists or the Commodification of Contemporaneous Collecting.”
  34. Merriam-Webster, s.v. “record (n.),” accessed March 1, 2021,
  35. Howard Zinn, “Secrecy, Archives, and the Public Interest” Midwestern Archivist 2, no. 2 (1977), 21.
  36. Jimerson, 22-24. In summarizing other scholars, Jimerson highlights the role that archives play in legitimizing and sanctifying certain documents over others and the role that archivists play in representation and controlling access.  The consequence, he notes, is a reinforcement of that which is culturally dominant: powerful, well-resourced, white, cis, male.

MilliCent / David Rosenthal

When I retired more than 4 years ago a top-priority task to keep me occupied was cleaning out the garage. It turned out that there were a lot of other things to do, and I never made it to the La-Z-Boy, let alone the mess in the garage.

As this Labor Day long weekend approached, Vicky became very insistent that we at least start actually doing some clearing. Our first target was the many boxes of books, a good portion of which were from the eclectic collection of the late Mark Weiser. In among them we found this 1998 CD, a relic of the early days of the Web when it was generally understood that the Web's business model would be micropayments.

Below the fold I discuss the history of what Paul Krugman would probably call a "zombie idea".

In the late 90s, as the Web was becoming an arena for commercial interests, it became obvious that content providers needed to be rewarded for their efforts. There were two competing models of how this could be done: advertising and micropayments. In the advertising model, companies would pay Web sites for including their message in pages that were free to view. In the micropayment model, visiting a page would automatically deduct a payment from the reader's wallet, the amount being so small that it would not be a disincentive. MilliCent is one of the four such systems Wikipedia discusses under "Early research and systems". It was a project of Digital Equipment's Systems Research Center.

The protocol was described in The Millicent Protocol for Inexpensive Electronic Commerce by Steve Glassman, Mark Manasse, Martín Abadi, Paul Gauthier and Patrick Sobalvarro. The basic idea was that users made infrequent, relatively large transfers of funds to brokers, and in return obtained digital scrip (think coins) with which to make small payments. The goal was to reduce transaction costs for small payments as much as possible because, like most micropayment advocates, they considered the barrier to monetizing Web content to be transaction costs.
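The broker-and-scrip idea can be sketched roughly as follows. This is a hedged illustration, not the actual Millicent wire format: the field names and layout are invented, and a keyed hash (HMAC) stands in for Millicent's lightweight certificate scheme. The point it demonstrates is the one the paper makes: once scrip is minted, a vendor can validate a payment with a single cheap hash computation instead of a round trip to a bank or broker.

```python
import hashlib
import hmac
import secrets

# Secret shared between the vendor and the broker that mints its scrip
# (illustrative value; in practice this would be provisioned securely).
MASTER_SECRET = b"vendor-master-secret"

def mint_scrip(value_millicents: int, customer_id: str) -> dict:
    """Broker mints a piece of scrip for a customer."""
    serial = secrets.token_hex(8)
    body = f"{serial}:{value_millicents}:{customer_id}".encode()
    # The "certificate" binds the scrip body to the vendor's secret.
    certificate = hmac.new(MASTER_SECRET, body, hashlib.sha256).hexdigest()
    return {"body": body, "certificate": certificate}

def validate_scrip(scrip: dict) -> bool:
    """Vendor checks scrip locally: one HMAC, no broker round trip."""
    expected = hmac.new(MASTER_SECRET, scrip["body"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, scrip["certificate"])

coin = mint_scrip(50, "reader-42")   # 50 millicents of scrip
assert validate_scrip(coin)          # cheap local check at purchase time
```

The design choice this illustrates is why per-payment transaction costs could be tiny: the expensive, trusted operations (funding an account, minting scrip) happen rarely, while the per-page check is a single hash.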

Clay Shirky's 2000 debunking, The Case Against Micropayments, starts by capturing the end-of-century hype around micropayments:
Jakob Nielsen, in his essay The Case for Micropayments writes, "I predict that most sites that are not financed through traditional product sales will move to micropayments in less than two years," and Nicholas Negroponte makes an even shorter-term prediction: "You're going to see within the next year an extraordinary movement on the Web of systems for micropayment ... ." He goes on to predict micropayment revenues in the tens or hundreds of billions of dollars.
Before providing a detailed analysis of their failure, Shirky provides the TL;DR:

The Short Answer for Why Micropayments Fail

Users hate them.
Why does it matter that users hate micropayments? Because users are the ones with the money, and micropayments do not take user preferences into account.
An even more detailed debunking was Andrew Odlyzko's 2003 The Case Against Micropayments, which again focused on users' preferences. While all these arguments are valid, they miss what I think is a much simpler reason why micropayments didn't become the Web's business model.

Micropayments were competing with advertising to be the mechanism for rewarding Web content providers. Shirky and Odlyzko were correct that readers hated micropayments, but even in those days before the full flowering of intrusive Web advertising, readers hated ads too. The choice of monetization strategies, however, wasn't up to the readers; it was up to the content providers. They were faced with a choice between a mechanism that readers hated and had to cooperate with, and one that readers hated but that didn't need their cooperation.

Advertising was an incumbent business model for content providers in other channels, such as TV, radio, newspapers, and magazines. To displace it, micropayments needed to be sufficiently better for the content providers than advertising. But, from their point of view, micropayments had a major disadvantage: they depended upon the reader doing something other than simply accessing the content. A business model in which the reader was passive was obviously less risky than one in which the reader was active. It isn't necessary to delve into the contorted UX (User Experience) of setting up accounts with brokers, monitoring account balances, refilling accounts, and deciding whether to purchase. Because the decision to adopt micropayments rested with the content provider, not the reader, the risk of an untried technology that depended on readers' cooperation was the key barrier. Transaction costs were at most a second-order problem, and the focus of micropayment system developers like the MilliCent team on reducing them was essentially irrelevant.

Despite these stakes through its heart, the zombie idea of micropayments refused to die. In early 2009 it formed a significant part of Satoshi Nakamoto's case for Bitcoin:
The root problem with conventional currency is all the trust that's required to make it work. The central bank must be trusted not to debase the currency, but the history of fiat currencies is full of breaches of that trust. Banks must be trusted to hold our money and transfer it electronically, but they lend it out in waves of credit bubbles with barely a fraction in reserve. We have to trust them with our privacy, trust them not to let identity thieves drain our accounts. Their massive overhead costs make micropayments impossible.
This was one of the many failures of Nakamoto's vision. With average transaction fees currently about $3.56, the "massive overhead costs" make micropayments infeasible. But this vastly understates the "massive overhead costs" of Bitcoin. First, fees are low now only because demand for transactions is low. As the graph shows, when demand for transactions is high, fees are high, spiking in April to $60. Second, the actual cost of a Bitcoin transaction comprises not just the fee but also the mining rewards. Currently, fees are under 10% of the total mining reward, so the total cost of a $3.60 average-fee transaction is actually about $36.
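The arithmetic behind that last claim is just a scaling: if fees make up only a given fraction of what miners earn per transaction, the full resource cost is the fee divided by that fraction. The figures ($3.60 average fee, fees under 10% of the total reward) are the post's; the function is only the implied division.

```python
def total_transaction_cost(avg_fee: float, fee_share_of_reward: float) -> float:
    """Estimate the full per-transaction cost when fees are only a
    fraction (fee_share_of_reward) of the total mining reward."""
    return avg_fee / fee_share_of_reward

# Fees at ~10% of the total mining reward imply a true cost of roughly
# $36 for a $3.60 average-fee transaction.
print(total_transaction_cost(3.60, 0.10))
```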

PS: Note that the CD contains tracks by Severe Tire Damage, the first band to play live on the Internet. The band's core members were:

Exploration and Consultation: The OCLC RLP Trajectory / HangingTogether

As we reach the midpoint of 2021 and look forward to what lies ahead, I want to take a moment to extend my thanks and gratitude to those of you who are members of the OCLC Research Library Partnership (OCLC RLP). It’s invigorating to see that the programming, expertise, and opportunities to connect continue to resonate with our Partners. Engaging with our Partners and closely listening to their feedback drive the direction and application of our work.

“Stop and learn” to better meet community needs

Cover of the OCLC Research Report, Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections

During the last year, the RLP team has learned that now is not the time to rush and rely on default practices. To grow, change, and contribute toward better supporting our communities, it is crucial that we pause, reflect, and make the time to reimagine our work. Part of this reimagining is to shift effort up front by planning time at the start of a project or new effort to connect with stakeholders, check assumptions, and invest in relevant learning. This strategy can help to achieve important goals and meet our communities’ expectations. The need to “stop and learn,” a sentiment expressed by Dorothy Berry (Houghton Library, Harvard University) at the Reimagine Descriptive Workflows convening, is a strong signal that we are hearing from our Partners across our programmatic focus areas.

This concept of shifting effort up front strongly informed The Total Cost of Stewardship: Responsible Collection Building in Archives and Special Collections. The Total Cost of Stewardship Framework introduces a holistic approach to responsible management of archives and special collections. This work has been enthusiastically embraced by the archives community and has been described as “game-changing.” A major lesson from this work is that responsible stewardship and transparent communication can help build trust with a diverse range of stakeholders.

Metadata innovations

Cover art for the Next Generation of Metadata report

We have also heard loud and clear from our Metadata Managers and others about the importance of more inclusive and anti-racist description. Our 2017 survey on EDI activities in the RLP identified harmful language as a focus area for many of our institutions. Based on what we learned from the Partnership survey, we’ve continued research and action in this area, leading to a grant award from the Mellon Foundation for a major global convening on this topic: Reimagine Descriptive Workflows. In this blog post, Merrilee Proffitt reports on the June convening and lays out our next steps.

The report Transitioning to the Next Generation of Metadata, which was published last year, synthesizes six years (2015–2020) of OCLC RLP Metadata Managers focus group discussions. One of the most downloaded reports on our website, this research informed discussions about the future of metadata across Europe, and we’ve been asked to contribute to many conferences and journals on the basis of this publication.

The OCLC RLP Metadata Managers focus group has made a difference for OCLC staff in understanding the needs of metadata managers at research institutions.

For example, the group’s continued investment in discussing the importance and role of identifiers has been critical to the more than 10 years of investment that OCLC has made in linked data research, which is now moving into production in OCLC services. Additionally, the work on enriching WorldCat with Cyrillic for Russian and other languages has been tremendously impactful, and we are now actively expanding that work for the benefit of all who use WorldCat. This makes those records more useful and those resources more discoverable, in WorldCat and beyond. That is an outcome of group discussions—a small idea that blossomed into action.

What’s new and next for the OCLC RLP

We are currently creating our end-of-year slate, which will include the publication of the RIM in the United States report, a webinar and additional reporting from the NAFAN project, further updates from the Reimagine Descriptive Workflows project, and many impactful webinars, such as the upcoming “Ngangaaanha (to care for)—Improving cultural safety at the University of Sydney Library.”

We are also pleased to share information about a new project to identify collaborative models to support the continued availability of the art research collective collection. The Operationalizing the Art Research Collective Collection (OpArt) project will help art libraries identify collaborative, sustainable operational models. OpArt is supported through a grant by the Samuel H. Kress Foundation with significant co-investment from OCLC. Look for more information in the coming months.

Maximize planning with OCLC RLP

Working with the OCLC Research Library Partnership

I’d like to highlight another way to leverage the RLP: utilizing the RLP team of experts in strategic planning processes. For example, this past year Senior Program Officer Rebecca Bryant provided direct consultation to senior university leaders as they examined their goals and investments in research information management (RIM) infrastructure. In these consultations, Rebecca not only provided context about RIM systems and their potential value to university libraries and senior library staff, but also met with institutional leaders including a VP of research, an associate provost for faculty affairs, and a CIO. And she provided formal presentations to RIM task force committees. Each of our program officers is ready to provide insight for your planning. This is a benefit of your RLP membership, and we welcome the opportunity to serve you.

Looking forward with the OCLC Research Library Partnership

If you would like to learn more about how to get the most out of your RLP affiliation, please contact me or your RLP liaison, and we will be happy to set up a virtual orientation or refresher on our programming for you and your staff. If you are not yet part of the OCLC Research Library Partnership, we invite you to join.

It is with deep gratitude that I offer my thanks to our partners for your investment in the Research Library Partnership. We are committed to offering our very best to serve your research and learning needs.

The post Exploration and Consultation: The OCLC RLP Trajectory appeared first on Hanging Together.

Catching up with past NDSA Innovation Awards Winners: Dorothea Salo / Digital Library Federation

In 2017, Dorothea Salo received the NDSA Innovation Award in the Educators category for her development projects RADD (Recovering Analog and Digital Data), PROUD (Portable Recovery of Unique Data), and PRAVDA (Portably Reformat Audio and Video to Digital from Analog). These projects were designed to extend the reach of digitization and preservation tools to those without the resources of large-scale memory institutions. Today, she is a Distinguished Faculty Associate in The Information School at the University of Wisconsin-Madison and took some time to offer us an update on these projects, her work, and her plans.

What have you been doing since receiving an NDSA Innovation Award?

A little bit of everything, as always! The iSchool is in a time of significant change, from joining the brand-new Computer, Data, and Information Sciences division to hiring several new faculty to launching an entirely new MS Information degree. I’ve been building and teaching a bunch of new courses, working with a peerless team of co-investigators on the Data Doubles research project, doing solo work on library privacy, teaching for the Digital POWRR workshop series… and, of course, surviving (knock on wood) the COVID pandemic. Right now I’m teaching an undergraduate computer-science course, the first time I ever have.

What did receiving the NDSA award mean to you?

Paraphrasing my favorite actress from my favorite movie: “it makes me feel as though my hard work ain’t been in vain for nothin’.” Quixotic solo projects like RADD can absolutely feel frustratingly pointless at times. I can’t say enough about how much I appreciated recognition from a group of people as wise, experienced, and pragmatic as NDSA. Bless you all!

What efforts/advances/ideas of the last few years have you been impressed with or admired in the field of data stewardship and/or digital preservation?

Oooh, let me check my Pinboard… I definitely think the Oxford Common File Layout and the Portland Common Data Model are valiant and worthwhile attempts to solve real issues in an efficient and effective way. I’m always grateful for NARA’s work, like their Digital Preservation Framework on GitHub. The revised NDSA Levels of Digital Preservation are terrific. On a lighter note, I also really appreciate the Australasia Preserves and Digital Preservation Coalition YouTube channels for their karaoke takes on preservation. They’re so fun and great!

How have the RADD, PROUD, and PRAVDA projects evolved since you won the Innovation Award? 

Less than I wish they had – I just haven’t had the time or the strength. I have managed to get several pieces of equipment properly overhauled and repaired, which, given that I have no dedicated or reliable budget and repairs are expensive, is a feat. (Of course as soon as I say this – I have three Digital8 cameras that are all aggravatingly broken in different ways…) I’ve gotten some projects done for folks, though the pandemic made that extra-difficult. The PROUD and PRAVDA kits did a fair bit of traveling (including by air) and demos pre-pandemic, and they have held up like troupers. I couldn’t know in advance how well that would work, so I’m pleased to say that it’s been fine, no equipment casualties whatever.

What I’m really rethinking now is the project model. I’ve demonstrated to my own dissatisfaction that I can’t manage RADD well as an all-comers rescue service: there isn’t enough of me, digitizing A/V takes too long, equipment breaks unpredictably, when the rig is in use I can’t take it out of service to improve it, and random-project work is too unpredictable to schedule. I’m tentatively thinking about an approach with a few more guardrails that provides more and better opportunities for iSchool students to get to know RADD and work with it.

What do you currently see as some of the biggest challenges in digitization and preservation for smaller memory institutions? 

You know, there was a time I would reflexively have yelled “funding!” in response to this question. Don’t get me wrong, funding is absolutely still a big obstacle! But the obstacle behind the funding obstacle, I think, is ignorance about what this work actually requires – everybody’s ignorance, from the general public to journalists to funders to legislators… all the way to actual information professionals. 

I went completely ballistic a couple of years back over a painfully ignorant, wrongheaded, and condescending article in Wired that came out shortly after the devastating Brazil national-museum fire, an article calling blithely for a “digital backup of cultural memory” with absolutely zero understanding of the magnitude and cost of such an undertaking. We can’t possibly get the funding to do the work we desperately have to do until there is a general understanding of very basic phenomena such as “audio and video digitize in real time.” 

Info pros don’t always make this better. I was told by a Digital POWRR participant that in some formal continuing education they’d done, the instructor, a respected archivist, had told them it was impossible to rescue data off digital media without a multi-thousand-dollar FRED device. If that’s what that archivist actually said (and it may not be, human memory being fragile)… it’s nonsense! PROUD rescues data from several common types of digital media at a small fraction of the cost of a FRED, and far more portably! This just breaks my heart, because when learners go home thinking they’ll never have the equipment budget, data will die of neglect. I built RADD, PROUD, and PRAVDA because I didn’t think it had to be this way. I still don’t!


The post Catching up with past NDSA Innovation Awards Winners: Dorothea Salo appeared first on DLF.

Paid hourly student research programmer position at UIUC for Fall 2021: network visualization in Python with NetworkX / Jodi Schneider

My Information Quality Lab is seeking a student research programmer (graduate hourly/undergraduate hourly) to do network visualization in Python with NetworkX this semester.

REQUIRED background:

  • Programming experience in Python
  • Elementary knowledge about network analysis including nodes, edges, attribute list, edge list, and adjacency matrices
  • Ability to read, store, and retrieve network data from a network object
  • Interest in or experience with NetworkX
  • Interest in or experience with visualization

PREFERRED background:

  • Experience in a research or R&D environment
  • Familiarity with publication and citation data

The immediate goal is to reformat dynamic network visualizations from a conference paper for a journal article to be submitted this semester (publication credit possible in addition to pay). The data is publicly available, and a conference paper describes the underlying ideas.

This person will also develop utilities to be used in future network visualizations (e.g. an ongoing analysis of a similar but larger network where other aspects, e.g. co-authorship and data cleaning, will also be relevant).
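For context, the structures named in the required background (a node list, an edge list, and an adjacency matrix) can be sketched in plain Python. This is a toy illustration with hypothetical data, not the project's code; the actual work uses NetworkX, which builds these structures for you:

```python
# Toy example: convert an undirected edge list into a node list
# and a symmetric 0/1 adjacency matrix. (Hypothetical data;
# NetworkX's Graph class does all of this internally.)

edges = [("a", "b"), ("b", "c"), ("a", "c")]

nodes = sorted({n for edge in edges for n in edge})   # node list
index = {n: i for i, n in enumerate(nodes)}           # node -> row/column

adj = [[0] * len(nodes) for _ in nodes]               # adjacency matrix
for u, v in edges:
    adj[index[u]][index[v]] = 1
    adj[index[v]][index[u]] = 1                       # undirected: symmetric

print(nodes)  # ['a', 'b', 'c']
print(adj)    # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```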

Application details in Virtual Job Board

Humane Ingenuity 40: In Sight / Dan Cohen

I’m back from a summer hiatus — perhaps not into the carefree fall I (and you) had hoped for. But with students streaming once again into my library, the beginning of this academic year still has that rejuvenating anticipation of new experiences and encounters — a prompt for all of us to shake out of our complacency, to open ourselves once again to new ways of seeing.


Seeing as an underexplored, strange experience animates the art of James Turrell. Our family made one of our pilgrimages to Mass MoCA to see the new Turrell exhibit “Into the Light,” which I recommend if you can make the journey to the far northwestern corner of Massachusetts. The exhibit restages some of his classic approaches to abstract lightwork, including a room where a floating pink cube is actually, somehow, an inset into a curved wall, and darkened spaces with just enough reflected light to confuse and, ultimately, enthrall.

Turrell’s art can be read on many levels, and I am too amateur an art critic to give it that proper multilevel reading, but my shorthand for what he is trying to do — beyond art and architecture’s traditional interest in color, form, space, and the interactions thereof, and the presentation of some deeply engaging, often transcendent experiences, like his now ubiquitous Skyspaces — is the disaggregation of seeing itself.

Two years ago on an episode of the What’s New podcast, I interviewed Ennio Mingolla, the head of the Computational Vision Laboratory at Northeastern University, and Ennio briskly shook up my ill-conceived, almost comically oversimplified notions about seeing. Human sight is not even close to a representation of the world around us, with the eye like the megapixel sensor at the heart of a digital camera. Instead, it is an aggregation of many distinct skills we have accumulated over the course of evolution, such as the ability to separate objects from backgrounds, the sense of when an object is coming toward us or moving away, and the talent of discerning colors at the periphery or in the center of our field of vision. Together, through a mysterious process in the brain, these elements are nearly instantly synthesized into something comprehensible, appearing as ho-hum as a hotel lobby painting.

Turrell rips that visual complacency apart, presenting to the eye profoundly abnormal situations that confront us with the wonder of vision itself. In his most powerful work in the Mass MoCA exhibit, shown above, you are placed into a cavernous room with no defined edges and an ever-shifting “screen” of color. In this environment, your brain cannot perform its tricks: it is unclear how far away the walls and screen are, or even if they firmly exist, and your peripheral vision and focal vision seemingly reverse their roles.

Because your regular assembly of sight has been scattered — the magic of the synthesis dispelled — you “see” new things. Subtle transitions between the screen tones make the room feel like it’s in a hazy cloud; the color you see on the backs of your eyelids when you blink changes repeatedly and begins a conversation with your open-eyed view; and when a strobe light startlingly comes on, for the first time you see…well, I don’t want to ruin the whole thing for you. Go see it. (And if you do, get reservations weeks in advance; they only let a dozen people in at a time, which makes it even more special.)

The larger lesson of Turrell’s art is that our apparently obvious views are complicated composites that should be challenged and deconstructed. Do not be lulled by the faux Rothko and mellow Musak in the hotel lobby. As this season of the Humane Ingenuity newsletter begins, I invite you to put yourself in the headspace of a first-year college student, curious and skeptical, averting your eyes from our monochromatic media landscape as you seek a more subtle and colorful world.

[For some of my previous writing on Mass MoCA exhibits, see “The Artistic and the Digital” (2007); “Sol LeWitt and the Soul of Creative and Intellectual Work” (2008); “For What It’s Worth: A Review of the Wu-Tang Clan’s ‘Once Upon a Time in Shaolin’” (2016)]

A visualization of the logos of 5,837 metal bands, grouped by theme, similarity, font, and “13 Dimensions of Doom”:


Extra headbanging points awarded for the creative use of:


An editor’s note about my media production: Over the summer, and as I have to remind myself to do every few years, I once again consolidated what I write (and broadcast and post) at my online home of the last twenty years. I am still planning to use Buttondown to send this newsletter to those who like to receive it by email; I like supporting small developer shops, and Justin does a great job with the mechanics of newslettering. But I’m moving finished issues back over to my own domain, so they can commingle and be archived with my other work rather than living elsewhere.

For those new to Humane Ingenuity, that means that you can now access back issues on my own domain, and that’s also where you can subscribe (as always, for free) to the newsletter.

Slow life, slow librarianship / Meredith Farkas


It’s been a quiet summer over here, focused on family, recovering from the stress of the academic year, and doing a lot of reading. I’d had fantasies of getting a lot of writing done over the summer (more on that below), but I didn’t get nearly as much done as I’d hoped. I’m trying to be very gentle with myself. I know I’m burnt out and emotionally exhausted. I’m dealing with stressful family health issues. I feel demoralized at work between the College trying to take away faculty and staff cost-of-living increases (which proved unsuccessful — woo hoo! union strong!), the increasing lack of voice and agency for faculty and staff in an ever-expanding hierarchy, and the fun of working during a pandemic. And I’m feeling really okay with not getting much done. I had some wonderful moments with my family this summer and that is without question the most important thing I could have accomplished in these months.

That would not have cut it a few years ago; I’d have been beating myself up for my laziness and lack of productivity. Back in the day, the worse I felt, the harder I’d work. I’d bury myself in work to focus on something other than my feelings. I’d work hard in the hopes of getting external validation that might make me feel better (spoiler: it never did). I taught classes through migraines and told myself that it would take my mind off the pain. And in a way it did, but these strategies of muscling through pain just led to burnout. We need to feel our feelings. We need to rest when our body or our mind is unwell. I’m immensely proud of the fact that I have not once felt guilty for not getting a lot done lately and I’ve resisted the pull to get involved in things that wouldn’t give me time to prioritize taking care of myself and my family. It’s taken a long time to get into recovery for my workaholism and I still do slip into bad habits from time to time, but those are getting fewer and I’m getting better and better at saying no to things.

One of the hardest parts of recovering from workaholism is having colleagues who still are active workaholics, constantly go above and beyond, and have very few boundaries. I don’t worry much about how my performance looks compared to theirs (though I used to), but I sometimes feel like I’m abandoning them. Last year, I was determined not to work on our annual instructional assessment project because I had worked on it or led it for so many years in a row and it is hard thankless work. But then my friend (who has worked on it even more times than I have) volunteered to lead and no one else was volunteering to help so I didn’t want her to have to do it alone. I’m struggling with the conflict between having boundaries and being in solidarity with my fellow workaholics. In the end though, I can’t make other people erect boundaries, and if I am ever to truly recover, I have to stay true to my own.

For the past couple of years, I’ve been thinking about something I call slow librarianship. It was in response to the realizations I had about my workaholism and the ideas I explored around ambition, striving, productivity, self-optimization, and achievement culture on this blog two years ago. It felt like the answer to all this was to slow down, to notice and reflect, to focus more on being true to our values than innovating, to build relationships, to really listen (to our communities, our colleagues, and ourselves), and to be in solidarity with others. I then discovered that another librarian, Julia Glassman, had written an essay in 2017 sharing her vision of slow librarianship called “The innovation fetish and slow librarianship: What librarians can learn from the Juicero.” In it, she brought up many of the concerns I have about achievement culture, reward structures that create a sense of scarcity and thus toxic competition, and the focus on flashy innovative work. While she wrote that defining slow librarianship was beyond the scope of her essay, I think she got a pretty good start with the last sentence she offers:

Perhaps, if we reject the capitalist drive to constantly churn out new products and instead take a stand to support more reflective and responsive practices, we can offer our patrons services that are deeper, more lasting, and more human.

It sounds about right to me. I can’t tell you how good it felt to see that someone was thinking along the exact same lines as I was. Thank you Julia for starting this conversation!

Last Fall, I gave a talk at the New York Library Association’s annual conference where I started sketching out my vision of slow librarianship. And I’ve been surprised since then to have been asked to speak at 8 different events on the topic (many of which I’ve turned down due to my own focus on slow living), when I’ve barely been asked to do any speaking outside of my state in YEARS. Clearly, we’re at a place where people are questioning the roles whiteness and capitalism play in our work and are looking for a new path. I’ve done a lot of reading and have made refinements since the NYLA talk and I have so many things in my head that I want to get onto the page. I started work this summer on something. I’m thinking it will be a book, since I’ve already written 9,000 words and have barely scratched the surface. I don’t exactly know what I’ll do with it when I’m done, but I know I want it to be open access and I don’t want it to be some perfectly polished scholarly product.

In my first draft of the first chapter of whatever it is I’m writing, I defined slow librarianship this way:

Slow librarianship is an antiracist, responsive, and values-driven practice that stands in opposition to neoliberal values. Workers in slow libraries are focused on relationship-building, deeply understanding and meeting patron needs, and providing equitable services to their communities. Internally, slow library culture is focused on learning and reflection, collaboration and solidarity, valuing all kinds of contributions, and supporting staff as whole people. Slow librarianship is a process, not a destination; it is an orientation towards our work, ourselves, and others that creates positive change. It is an organizational philosophy that supports workers and builds stronger relationships with our communities.

I’ve been thinking a lot about how individualism is at the root of so many of our problems and how things like solidarity, mutual aid, and collective action are the answer. Capitalism does everything it can to keep us anxious and in competition with each other. It gave us the myth of meritocracy – the idea that we can achieve anything if we work hard enough, that our achievements are fully our own (and not also a product of the privileges we were born to and the people who have taught us, nurtured us, and helped us along the way), and that we deserve what we have (and conversely that others who have less deserve their lot in life). It gave us petty hierarchies in the workplace – professional vs. paraprofessional, faculty vs. staff, full-time vs. part-time, white-collar vs. blue-collar – that make us jealously guard the minuscule privilege our role gives us instead of seeing ourselves in solidarity with all labor. It’s created countless individual awards and recognitions that incentivize us not to collaborate and to find ways to make ourselves shine. It’s created conditions of scarcity in the workplace where people view their colleagues as threats or competitors instead of rightly turning their attention toward the people in power who are responsible for the culture. This is how the system was made to work; to keep us isolated and anxious, grinding away as hard as we can so we don’t have time or space to view ourselves as exploited workers. It is only through relationships and collaboration, through caring about our fellow workers, through coming together to fight for change, that things will improve. But that requires us to focus less on ourselves and our desire to shine, rise, or receive external recognition, and to focus more on community care and efforts to see everyone in our community rise. It goes against everything capitalism has taught us, but we’ll never create meaningful change unless we replace individualism with solidarity and care more about the well-being of the whole than the petty advantages we can win alone.

I’m honestly really proud of myself for working so slowly on this. It used to be that I’d stay up until 2am writing if I felt passionately about something, so impatient to get my thoughts out of my brain and onto the screen. At the pace I’m working and with the academic year starting, it’s going to be a long time before this book sees the light of day, so I thought I’d share some things I’ve read, watched, and listened to that really influenced my own thinking (thank you all for your labor in getting these ideas out there and letting me learn from you!!). I hope these are just as inspirational for you as they have been to me.

(sorry my citations are sloppy and I don’t always include the url to articles (you know how to google/google scholar) )

Andrews, Nicola. “It’s Not Imposter Syndrome: Resisting Self-Doubt as Normal For Library Workers.” In the Library with the Lead Pipe, 2020.

Bowler, Kate. Everything Happens for a Reason: And Other Lies I’ve Loved. Random House, 2018. (Kate’s podcast is also consistently AMAZING!!!)

brown, adrienne maree. Emergent Strategy: Shaping Change, Changing Worlds. AK Press, 2017.

Ettarh, Fobazi. “Vocational awe and librarianship: The lies we tell ourselves.” In the Library with the Lead Pipe 10 (2018).

Ferretti, Jennifer A. “Building a Critical Culture: How Critical Librarianship Falls Short in the Workplace.” Communications in Information Literacy 14.1 (2020): 134-152.

Gallagher, Brian. “How Inequality Imperils Cooperation.” Nautilus, 9 Jan. 2020,

Glassman, Julia. “The innovation fetish and slow librarianship: What librarians can learn from the Juicero.” In the Library with the Lead Pipe, 18 Oct. 2017,

Graeber, David. “After the Pandemic, We Can’t Go back to Sleep.” Jacobin, 4 Mar. 2021,

Graeber, David. The Utopia of Rules: On Technology, Stupidity and the Secret Joys of Bureaucracy. Melville House, 2016.

Han, Byung-Chul. “The Tiredness Virus.” The Nation, 12 Apr. 2021,

Han, Byung-Chul. “Why Revolution Is No Longer Possible.” OpenDemocracy, 23 Oct. 2015,

Headlee, Celeste. Do Nothing: How to Break Away from Overworking, Overdoing, and Underliving. Harmony, 2020.

Honoré, Carl. In praise of slowness: Challenging the cult of speed. Harper Collins, 2009.

Hudson, David J. “The Displays: On Anti-Racist Study and Institutional Enclosure.” up//root: a we here publication. October 22, 2020.

Soooooo many episodes of the podcast Hurry Slowly inspired me — they’re too numerous to name.

Kendrick, Kaetrena Davis. “The low morale experience of academic librarians: A phenomenological study.” Journal of Library Administration 57.8 (2017): 846-878. (all of Kaetrena’s research and writing is amazing and her more recent works are OA, so look ’em up!)

Leung, Sofia and Jorge López-McKnight (Eds.), Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory. MIT Press, 2021.

Mountz, Alison, et al. “For slow scholarship: A feminist politics of resistance through collective action in the neoliberal university.” ACME: An International Journal for Critical Geographies 14.4 (2015): 1235-1259.

Nicholson, Karen P., Jane Schmidt, and Lisa Sloniowski. 2020. “Editorial.” Canadian Journal of Academic Librarianship 6: 1–11.

Nicholson, Karen P. “The ‘value agenda’: Negotiating a path between compliance and critical practice.” Canadian Libraries Assessment Workshop (CLAW) 2017 Conference, Victoria, BC.

Ndefo, Nkem. “Nkem Ndefo on the Body as Compass.” In Young, Ayana. For the Wild podcast, 24 March 2021,

Odell, Jenny. How to do nothing: Resisting the attention economy. Melville House Publishing, 2020. (if you don’t have time for her book, the conference talk she gave that became the book is excellent!)

Okun, Tema. White Supremacy Culture.

Parkins, Wendy. “Out of time: Fast subjects and slow living.” Time & Society 13.2-3 (2004): 363-382.

Petersen, Anne Helen. “Why Office Workers Didn’t Unionize.” Culture Study, 18 Oct. 2020,

Petrini, Carlo. Slow food: The case for taste. Columbia University Press, 2003.

Sandel, Michael J. The Tyranny of Merit: What’s Become of the Common Good? Penguin Books, 2021.

Seale, Maura, and Rafia Mirza. “The Coin of Love and Virtue: Academic Libraries and Value in a Global Pandemic.” Canadian Journal of Academic Librarianship/Revue canadienne de bibliothéconomie universitaire 6 (2020): 1-30.

Solnit, Rebecca. “When the Hero Is the Problem.” Literary Hub, 2 Apr. 2019,

Spade, Dean. Mutual aid: Building solidarity during this crisis (and the next). Verso Books, 2020.

Walters, Alicia. “Centering Blackness: A World Re-imagined.” In Parker, Priya. Together Apart podcast, 17 June 2020,

Weber, Max. 2001 [1930]. The Protestant Ethic and the Spirit of Capitalism. New York, NY: Routledge.

Wolff, Richard D. Democracy at Work: A Cure for Capitalism. Haymarket Books, 2012.

Image credit: slow by elycefeliz on Flickr (CC-BY-NC-ND)

New job, yay! / Coral Sheldon-Hess

I started a new job last week. I’m excited about it! I also want to issue this reminder: as has always been the case, my posts here represent my own feelings and opinions and not those of any employer, past, present, or future.

First, the truly excellent news: I’ve joined Coiled Computing, a startup (the company is 1 year old as of February) that offers, if I can try to sum it up in a phrase, a platform for distributed data science.

The company was founded by the creator of Dask, a library that allows scaling/parallelization of the Python data stack: your NumPy array or pandas dataframe won’t fit in memory? That’s where Dask comes in, using very nearly the same syntax. Dask is open source and free for anyone to use, and having spent part of the last week learning it, myself, I have to say: it’s pretty dang neat!
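The core idea behind Dask — split data into chunks and combine per-chunk partial results, so the full dataset never has to sit in memory at once — can be sketched without Dask itself. This is a toy stand-in using only the standard library, not the Dask API; in real Dask you would use dask.array or dask.dataframe with near-NumPy/pandas syntax:

```python
# Toy illustration of chunked (out-of-core) computation: compute a mean
# by streaming over chunks and combining (sum, count) partials.
# Stand-in for the idea only; Dask does this lazily, in parallel, and
# behind a NumPy/pandas-like interface.

def chunked(seq, size):
    """Yield successive chunks of `seq` (stands in for lazy loading)."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def chunked_mean(seq, size=3):
    """Mean computed one chunk at a time, never holding all of `seq` 'at once'."""
    total, count = 0.0, 0
    for chunk in chunked(seq, size):
        total += sum(chunk)   # per-chunk partial sum
        count += len(chunk)   # per-chunk partial count
    return total / count

print(chunked_mean([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```

The combine step is the interesting part: means don't aggregate directly, so you aggregate (sum, count) pairs instead — the same partials-then-reduce pattern Dask uses for its parallel reductions.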

While Dask will remain open source and free to use, Coiled’s deal is that we offer managed Dask on the cloud (GCP, AWS, and Azure), simplifying things for companies and individuals who don’t want to do all their own DevOps for distributed data science processing. There’s a free tier, so if that sounds like a thing you want to try out, but you don’t have any budget allocated, I mean… go wild (but not too wild 😉).

As for what I’ll be doing at Coiled: To start, I will be wearing a few hats, including customer support, quality assurance, and (soon, but not just yet) software development, including integration testing. I may or may not also be contributing to the docs—I’m taking notes as I go through them, anyway. Obviously, that many different roles won’t be sustainable forever, so after a while I’ll specialize. We’ll see what I like best and am best at, as well as what the company most needs at that point.

Between you and me, I have a little bit of a worry that, for instance, QA and automated testing is going to be where I want to focus, but customer support will be where they need me the most and/or what I’m best at, which will make for an imperfect situation. But, to reassure myself and potentially any coworker who might see this, I hurry to add: even if that ends up being the case, I can think of multiple people who have kept several hats on, long-term. I’m sure I’ll find opportunities to contribute in ways I find satisfying.

So that’s all fantastic, and I’m genuinely super excited about it! A full-time job in the Python data ecosystem! I’m overjoyed.

I also have to acknowledge, though, that I’m currently going through a lot, and despite how much I want! to start! contributing! right! away!, I’m not really at my best, as I start this new endeavor. My focus isn’t as good as it needs to be, and at only a week in, I already feel like I’m behind where I “should” be. (My manager does not seem concerned, but I still worry.) I’ll talk more in a minute about how the job change timing happened—no, surprisingly, I did not actually set out to start a new branch of my career during a pandemic, effectively throwing away 2+ years’ worth of effort in my last job—but also, we lost our most sociable pet bird, who had (to be maybe too honest) been getting me through the pandemic, on the day before my job was supposed to start, and I’ve been an absolute mess since then. Just. Absolutely destroyed. And then, in the same week, the highest court in my country downgraded the amount of human I am considered to be, so… it’s been a lot.

On one hand, the fact that I’ve learned anything at all this week is pretty good. If I was looking at any other person in this situation, accomplishing the same amount, I’d be impressed with them. But—I’m coming from academia, so this surprises nobody—I’m an anxious perfectionist who holds themself to very high standards. And—probably also to nobody’s surprise, given the ongoing pandemic, the loss I’ve just undergone, and the near-burnout I’m still nursing from my last position—I cannot possibly meet my own standards for myself, right now. Outside of work, I mostly just sleep, read romance novels, and watch TV I’m too embarrassed to name, even for the limited posterity represented by a blog post. I’ve found us a meal delivery service that doesn’t throw something I’m allergic to into every dish, so I am at least eating healthy food. At work, I can fake “chipper” during meetings OK, and I’ve picked up some pretty good tricks while shadowing coworkers on support calls; but when left to my own devices, I have trouble concentrating. I’m spending some of my time reading docs and working through tutorials, as one does in a new position. I am learning(!), but it’s so slow. “I know I just read a paragraph and ran some code, but I don’t know what it said or did, so I have to do it over”: that kind of slow.

I’m certain it will get better as I work through my grief and my near-burnout. It looks like we’ll be under pandemic conditions for a while, still, so I don’t know when/if I will be back at my very best. But even at less than my best, I tend to be a really good employee, far better than I’ve been this last week. I just … the timing is rough. My boss and grand-boss are both being really patient and kind, though, so while I am worried about the impression I’m making, I am not worried that I’ll lose the job. I’ll get it together well before I’m in danger, there.

Now I’m going to switch tracks, a bit, and talk about leaving my last job. Generally, in professional circles, we all agree not to say bad things about previous employers. However, given that they broke not only the law but also their own commitments (referred to as “the 5 Cs” or “the 5 [Employer Name] Commitments to Diversity, Equity and Inclusion”), I am not feeling the pressure to smooth things over that I usually would. Frankly, when an organization deliberately mistreats disabled and feminized employees, they do not deserve to be protected by professional norms.

So. Although I technically resigned, I felt very much under duress about it and continue to carry a lot of anger and sadness. And guilt, because my leaving made several students’ and coworkers’ lives harder. And relief, too, because I was not treated well in that position, in ways that I didn’t even fully internalize until I went to work for a company who treats me like a human. And, again, while I want to talk about what happened, I do not want to detract from my excitement about my new position. As upset as I was/am about being forced out of the college, in a lot of ways, this also feels like one of those “a door shuts and a window opens” kinds of things. I strongly suspect I will end up looking back on this with gratitude that it happened.

Anyway, a given: I did not want to leave when I did. One does not accept a tenure-track position with the intention of leaving in two years. Even more, teaching is one of those things that gets easier over time: you have more material prepped, more explanations thought through, more experience recognizing the pieces that trip new folks up. You get a chance to work through and clean up mistakes (e.g. eight assignments and four projects, all consisting of multiple files that had to be hand-graded, for 60+ students during a 14-week semester—that was a mistake I’d have remedied, the next time I taught that course), and you gradually end up with fewer “new to you” courses to design in between semesters, so you start being able to actually take breaks during the months of the year you aren’t paid to work. To put it bluntly, everyone agrees the first two years are hell, and it starts getting easier after the third year or so. So this was pretty bad timing.

It’s all made worse because, on top of the standard “first two years experience,” I had burned through all of my reserves trying to build the Data Analytics program and support our students through an incredibly difficult time. Like every faculty member and teacher in 2020, I moved my courses online with only a week’s notice, and did not get any kind of spring break to recharge or catch up on grading. (I mean, I’m a realist with a borked immune system, so I saw the writing on the wall before March 13, 2020, when we closed our buildings. But not very much before.) Like every faculty member and front-line academic staff person, everywhere, I ended up supporting a whole bunch of young adults who were facing enormous hurdles, from financial insecurity to mental health issues to illness and grief, adding a whole set of responsibilities that I didn’t have the training for on top of my already more than full-time job. (To be fair, yes, I taught at a community college; all of those hurdles were already there for a number of my students before the pandemic, but in smaller, more manageable numbers. And our support staff were less overwhelmed and could take over sooner.) Like many academic institutions, my employer also didn’t see fit to let any other job duties slide, in acknowledgment of the enormous added workload, so faculty found ourselves doing absolutely pointless paperwork at 2am, sometimes, for no good reason.

Unfortunately, the college also decided in early 2021 that bringing everyone back to campus would be a higher priority than their commitments to equity, their responsibilities to the community, or their legal obligations under the Americans with Disabilities Act. The Provost told the faculty last spring that HR would accommodate those of us who couldn’t safely return to campus, and HR seems to have consistently refused everyone who applied for accommodations, throughout the spring and summer and into the fall. (I do not know of anyone who received an accommodation, but I know of multiple people, including me, who applied, with more than sufficient documentation, and were denied.) The college also still does not mandate vaccination, even after full authorization of the Pfizer vaccine, which puts not only their students and employees at risk, but the whole community they purport to serve.

Here’s the real kicker, though: every Data Analytics course is taught remotely, and my parent department, Computer Information Technology, always has online and remote options, as well. I taught a wide enough variety of courses (more than maybe anyone else in the department) that I could have taught at a distance forever, even if I remained the lowest-seniority member of the faculty. So when I say that I was “pushed out” by HR’s refusal to provide an accommodation, I mean that. It would have cost the college nothing to accommodate me, but instead, like so many other academics, I found myself in the situation of weighing my life and long-term health against my need for an income.

I’m lucky to have been able to choose my health; my heart aches and my temple throbs to think about all of the people who could not make that decision. Academia in general and my former employer, specifically, are responsible for so much unnecessary suffering and more than a few deaths.

I’ve been asked if I would consider coming back—not by administrators, obviously, who could have stepped in and stopped this, but by faculty and staff colleagues—and my answer has been consistent: I have a personal guideline that I do not leave a job in fewer than three years, except in case of emergency, so it’ll be at least that long before I’m looking again, and the college would have to be able to 1) appoint me as a distance-only employee and 2) grant me the correct rank at (re)appointment, with credit for the years I’ve already served. (This has gotten long, so I’ve glossed over the reason my job search actually started before the accommodation was denied: a man in my department came in with two masters degrees, and they were both counted toward his initial rank; I also came in with two masters degrees, and they were not counted toward my rank. When I realized the discrepancy and reported it to HR, they refused to take action to correct it, giving me inconsistent reasoning and questionable information when I pressed.)

But I do love and believe I could grow to be quite good at remote and online teaching (my remote courses were already higher-quality than my in-person courses had been); I enjoy working with community college students; I like doing my part to build up a practical curriculum that will get my students hired; I am good at committee work (I know, right?!); and I’m interested in building up more knowledge around CS and data pedagogy.

So that door is not exactly closed—unless my having written this well-deserved critique results in their closing the door on me, which it might—but it’s not one I can possibly walk through quickly enough to help the students in the pipeline right now, or to save Data Analytics if it succumbs to no longer having a dedicated faculty member trying to keep it afloat, despite some seemingly insurmountable organizational hurdles. Just to clarify: I’m not calling my former department head “not dedicated”; I’m pointing out that she and the others left holding this particular bag have other priorities. She’s the only faculty member in her own department. The CIT Coordinator is extremely busy. The other Data Analytics faculty, one of whom is a 1-year hire, both seem to be focusing on cybersecurity instead of data analytics. So it does not feel out of line to say that the program may be in some danger. But if it survives these challenges, and if the stars align correctly, sure, it’s technically possible that I might come back.

But this window that I’ve gone through, in the meantime? It seems really, really good. So. I’m not going to spend a lot of energy holding open that door, you know?

Excess Deaths / David Rosenthal

It is difficult to comprehend how abject a failure the pandemic response in countries such as the US and the UK has been. Fortunately, The Economist has developed a model estimating excess deaths since the start of the pandemic. Unfortunately, it appears to be behind their paywall. So I have taken the liberty of screen-grabbing a few example graphs.

This graph compares the US and Australia. Had the US handled the pandemic as well as Australia (-17 vs. 250 excess deaths per 100K), about 885,000 more Americans would be alive today. With a GDP per capita of about $63.5K/year, this loses the economy about $56B/year.

This graph compares the UK and New Zealand. Had Boris Johnson handled the pandemic as well as Jacinda Ardern (-49 vs. 170), about 149,000 more Britons would be alive today. With a GDP per capita of about $42K/year, this loses the economy about $6.3B/year.
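The arithmetic behind those figures can be checked quickly. This is a rough sketch: the population numbers are my own assumptions (approximate 2021 values), not figures from The Economist's model.

```ruby
# Rough sanity check of the figures above.
# Population numbers are assumptions, not from The Economist's model.
US_POP = 332_000_000
UK_POP = 68_000_000

# Excess-death rates are per 100,000 people.
us_excess = (250 - -17) / 100_000.0 * US_POP  # extra deaths vs. Australia's rate, ≈ 886,000
uk_excess = (170 - -49) / 100_000.0 * UK_POP  # extra deaths vs. New Zealand's rate, ≈ 149,000

us_gdp_loss = us_excess * 63_500  # $63.5K GDP per capita per year
uk_gdp_loss = uk_excess * 42_000  # $42K GDP per capita per year

puts "US: ~#{us_excess.round(-3)} deaths, ~$#{(us_gdp_loss / 1e9).round}B/year"
puts "UK: ~#{uk_excess.round(-3)} deaths, ~$#{(uk_gdp_loss / 1e9).round(1)}B/year"
```

Both results land within rounding distance of the numbers quoted above.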

A graph is worth a thousand words. Below the fold, a little commentary.

The Economist argues that the true scale of the pandemic can only be determined from excess deaths:
Many people who die while infected with SARS-CoV-2 are never tested for it, and do not enter the official totals. Conversely, some people whose deaths have been attributed to covid-19 had other ailments that might have ended their lives on a similar timeframe anyway. And what about people who died of preventable causes during the pandemic, because hospitals full of covid-19 patients could not treat them? If such cases count, they must be offset by deaths that did not occur but would have in normal times, such as those caused by flu or air pollution.
Their machine-learning model:
estimates excess deaths for every country on every day since the pandemic began. It is based both on official excess-mortality data and on more than 100 other statistical indicators. Our final tallies use governments’ official excess-death numbers whenever and wherever they are available, and the model’s estimates in all other cases.
The model estimates that:
Although the official number of deaths caused by covid-19 is now 4.6m, our single best estimate is that the actual toll is 15.3m people. We find that there is a 95% chance that the true value lies between 9.4m and 18.2m additional deaths.
Excess deaths in the US and the UK are far from the worst, but my point is that countries at a similar level of development have done far better, and so have much less well-resourced countries. Had the US done as well as the model's estimate for China (38 vs. 250), about 702,000 more Americans would be alive today.

Redirecting to non-http-based protocols with nginx / Hugh Rundle

I've written before about Gemini, a relatively young protocol for sending, receiving and interpreting text-based files over the Internet: a cross between Gopher and HTTP.

Because Gemini — like HTTP — is built on top of other Internet technologies, it uses the Domain Name System. A side effect of this is that if you have a gemini capsule at a given domain and somebody tries to load that domain in an ordinary web browser, they will end up at whatever the default web site is on your server.

For my gemini capsule I dealt with this by just creating a landing page for the same domain name in the http/https world. But today I was looking into whether it's possible to simply redirect a whole site to the gemini protocol, and it turns out this is actually quite straightforward. The trick is to remember that http and https are themselves different protocols, yet we redirect from http to https all the time. In that case, it's easy to miss because they open the same files in the same browser. But if you have a Gemini browser like Lagrange, a redirect will open the URL in that application. If you're running nginx as your web server software, a redirect is as simple as making a block like this in your default site config, or as a separate file, in your sites-available directory:

server {
    listen 80;
    return 301 gemini://$host$request_uri;
}

To redirect both http and https, use certbot, which will essentially hoist your gemini redirect into an https block, and redirect http traffic to that (http => https => gemini). This will avoid users getting certificate errors, even though it feels like a waste of time given your https site is just redirecting to gemini.
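After certbot has done its rewriting, the result might look something like the following sketch. The domain name and certificate paths here are illustrative assumptions (certbot's defaults), not copied from my actual config:

```nginx
# Hypothetical post-certbot config for example.com:
# https terminates TLS, then hands the visitor off to gemini.
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    return 301 gemini://$host$request_uri;
}

# Plain http bounces to https first, so no certificate warnings.
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;
}
```

The double hop (http => https => gemini) is what lets browsers validate the certificate before being redirected away.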

Blacklight 7.x, deprecation of view overrides, paths forward / Jonathan Rochkind

This post will only be of interest to those who use the blacklight ruby gem/platform, mostly my colleagues in the library/cultural heritage sector.

When I recently investigated updating our Rails app to the latest Blacklight 7.19.2, I encountered a lot of deprecation notices. They were related to code, both in my local app and in a plugin, trying to override parts of Blacklight views — specifically the “constraints” area (i.e. the search limits/query “breadcrumbs” display) in the code I encountered; I’m not sure if it applies to more areas of view customization.

Looking into this more to see if I could get a start on changing logic to avoid deprecation warnings — I had trouble figuring out any non-deprecated way to achieve the overrides. After more research, I think it’s not totally worked out how to make these overrides keep working at all in future Blacklight 8, and that this affects plugins including blacklight_range_limit, blacklight_advanced_search, geoblacklight, and possibly spotlight. Some solutions need to be found if these plugins are to be updated to keep working in future Blacklight 8.

I have documented what I found/understood, and some ideas for moving forward, hoping it will help start the community process of figuring out solutions to keep all this stuff working. I may not have gotten everything right or thought of everything; this is meant to help start the discussion, so suggestions and corrections are welcome.

This does get wordy; I hope you can find it useful to skip around or skim if it’s not all of interest. I believe the deprecations start around Blacklight 7.12 (released October 2020). I believe Blacklight 7.14 is the first version to support ruby 3.0, so anyone wanting to upgrade to ruby 3 will encounter these issues.


Over blacklight’s 10+ year existence, it has been a common use-case to customize specific parts of Blacklight, including customizing what shows up on one portion of a page while leaving other portions ‘stock’. An individual local application can do this with its own custom code; it is also common for many of the shared blacklight “plug-in”/“extension” engine gems.

Blacklight had traditionally implemented its “view” layer in a typical Rails way, involving “helper” methods and view templates. Customizations and overrides, by local apps or plugins, were implemented by overriding these helper methods and partials. This traditional method of helper and partial overrides is still described in the Blacklight project wiki (it possibly could use updating for recent deprecations/new approaches).

This view/helper/override approach has some advantages: It just uses standard Ruby and Rails, not custom Blacklight abstractions; multiple different plugins can override the same method, so long as they all call “super”, to cooperatively add functionality; it is very flexible and allows overrides that “just work”.

It also has some serious disadvantages. Rails helpers and views are known in general for leading to “spaghetti” or “ball of mud” code, where everything ends up depending on everything/anything else, and it’s hard to make changes without breaking things.

In the context of shared gem code like Blacklight and its ecosystem, it can get even messier when it is unclear what is meant to be public API for an override. Blacklight’s long history has different maintainers with different ideas, and varying documentation or institutional memory of intents can make it even more confusing. Several generations of ideas can be present in the current codebase for both backwards-compatibility and “lack of resources to remove it” reasons. It can make it hard to make any changes at all without breaking existing code, a problem we were experiencing with Blacklight.

One solution that has appeared for Rails is the ViewComponent gem (written by github, actually), which facilitates better encapsulation, separation of concerns, and clear boundaries between different pieces of view code. The current active Blacklight maintainers (largely from Stanford, I think?) put in some significant work — in Blacklight 7.x — to rewrite some significant parts of Blacklight’s view architecture based on the ViewComponent gem. This is a welcome contribution to solving real problems! Additionally, they did some frankly rather heroic things to make this replacement with ViewComponent, as a temporary transition step, very backwards compatible, even for existing code doing extensive helper/partial overrides, which was tricky to accomplish and shows their concern for current users.

Normally, when we see deprecation warnings, we like to fix them, to get them out of our logs, and prepare our apps for the future version where deprecated behavior stops working entirely. To do otherwise is considered leaving “technical debt” for the future, since a deprecation warning is telling you that code will have to be changed eventually.

The current challenge here is that it’s not clear (at least to me) how to change the code to still work in current Blacklight 7.x and upcoming Blacklight 8x. Which is a challenge both for running in current BL 7 without deprecation, and for the prospects of code continuing to work in future BL 8. I’ll explain more with examples.

Blacklight_range_limit (and geoblacklight): Add a custom “constraint”

blacklight_range_limit introduces new query parameters for range limit filters, not previously recognized by Blacklight, that look like e.g. &range[year_facet][begin]=1910. In addition to having these affect the actual Solr search, it also needs to display this limit (which Blacklight core is ignoring) in the “constraints” area above the search results:

To do this it overrides the render_constraints_filters helper method from Blacklight, with some fancy code effectively calling super to render the ordinary Blacklight constraints filters and then adding on its own rendering of the constraints that only blacklight_range_limit knows about. One advantage of this “override, call super, but add on” approach is that multiple add-ons can do it, and they don’t interfere with each other — so long as they all call super, and only want to add additional content, not replace pre-existing content.
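The cooperative super-chaining can be sketched in plain Ruby. This is a standalone illustration using Module#prepend (all names here are made up for the sketch; Blacklight's real helpers are ordinary Rails helper modules, but the super-chaining behaves the same way):

```ruby
# The "core" helper, standing in for Blacklight's own implementation.
module CoreHelpers
  def render_constraints_filters
    "[facet constraints]"
  end
end

# What a plugin like blacklight_range_limit effectively does:
# override, call super, then append its own content.
module RangeLimitOverride
  def render_constraints_filters
    super + " [range constraints]"
  end
end

# A second plugin can stack on top, as long as it also calls super.
module GeoOverride
  def render_constraints_filters
    super + " [geo constraints]"
  end
end

class CatalogView
  include CoreHelpers
  prepend RangeLimitOverride
  prepend GeoOverride
end

puts CatalogView.new.render_constraints_filters
# => "[facet constraints] [range constraints] [geo constraints]"
```

Each override sees the accumulated output of everything below it, which is exactly why multiple plugins could safely share one helper method.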

But overriding this helper method is deprecated in recent Blacklight 7.x. If Blacklight detects any override to this method (among other constraints-related methods), it will issue a deprecation notice, and also switch into a “legacy” mode of view rendering, so the override will still work.

OK, what if we wanted to change how blacklight_range_limit does this, to avoid triggering the deprecation warnings, and to have blacklight continue to use the “new” (rather than “legacy”) logic, that will be the logic it insists on using in Blacklight 8?

The new logic is to render with the new Blacklight::ConstraintsComponent view component, which is rendered in the catalog/_constraints.html.erb partial. I guess if we want the rendering to behave differently in that new system, we need to introduce a new view component that is like Blacklight::ConstraintsComponent but behaves differently (perhaps a sub-class, or a class using delegation). Or, hey, that component takes some dependent view_components as args, so maybe we just need to get the ConstraintsComponent to be given an arg for a different version of one of the _component args; not sure if that will do it.
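The delegation idea might look like this standalone sketch (class names and the search_state initializer arg are hypothetical, not Blacklight's real component API):

```ruby
# Stands in for the stock component that renders the standard constraints area.
class StockConstraintsComponent
  def initialize(search_state:)
    @search_state = search_state
  end

  def call
    "[stock constraints: #{@search_state}]"
  end
end

# A wrapper delegates to the stock component, then appends its own markup,
# so everything the stock component renders is preserved.
class RangeAwareConstraintsComponent
  def initialize(inner_component)
    @inner = inner_component
  end

  def call
    @inner.call + " [range constraint]"
  end
end

stock = StockConstraintsComponent.new(search_state: "q=cats")
puts RangeAwareConstraintsComponent.new(stock).call
# => "[stock constraints: q=cats] [range constraint]"
```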

It’s easy enough to write a new version of one of these components… but how would we get Blacklight to use it?

I guess we would have to override catalog/_constraints.html.erb. But this is unsatisfactory:

  • I thought we were trying to get out of overriding partials, but even if it’s okay in this situation…
  • It’s difficult and error-prone for an engine gem to override partials, you need to make sure it ends up in the right order in Rails “lookup paths” for templates, but even if you do this…
  • What if multiple things want to add on a section to the “constraints” area? Only one can override this partial, there is no way for a partial to call super.

So perhaps we need to ask the local app to override catalog/_constraints.html.erb (or generate code into it), and that code calls our alternate component, or calls the stock component with alternate dependency args.

  • This is already seeming a bit more complex and fragile than the simple one-method override we did before (we have to copy-and-paste the currently non-trivial implementation in _constraints.html.erb), but even if we aren’t worried about that….
  • Again, what happens if multiple different things want to add on to what’s in the “constraints” area?
  • What if there are multiple places that need to render constraints, including other custom code? (More on this below). They all need to be identically customized with this getting-somewhat-complex code?

That multiple things might want to add on isn’t just theoretical, geoblacklight also wants to add some things to the ‘constraints’ area and also does it by overriding the render_constraints_filters method.

Actually, if we’re just adding on to existing content… I guess the local app could override catalog/_constraints.html.erb, copy the existing Blacklight implementation, then just add at the END a call to both, say, <%= render(BlacklightRangeLimit::RangeConstraintsComponent) %> and <%= render(GeoBlacklight::GeoConstraintsComponent) %>… it actually could work… but it seems fragile, especially when we start dealing with “generators” to automatically create these in a local app, for CI in the plugins, as blacklight plugins do?
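Such a local override could look roughly like the template fragment below. The component names come from the plugins discussed above, but the search_state initializer argument is my guess at the API, not confirmed:

```erb
<%# Hypothetical catalog/_constraints.html.erb override: the stock
    component first, then each plugin's additions appended at the end. %>
<%= render(Blacklight::ConstraintsComponent.new(search_state: @search_state)) %>
<%= render(BlacklightRangeLimit::RangeConstraintsComponent.new(search_state: @search_state)) %>
<%= render(GeoBlacklight::GeoConstraintsComponent.new(search_state: @search_state)) %>
```

The fragility is visible right in the sketch: the first line is a copy of Blacklight's own implementation, which the local app must now keep in sync by hand.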

My local app (and blacklight_advanced_search): Change the way the “query” constraint looks

If you just enter the query ‘cats’, “generic” out of the box Blacklight shows you your search with this as a sort of ‘breadcrumb’ constraint in a simple box at the top of the search:

My local app (in addition to changing the styling) changes that to an editable form to change your query (while keeping other facet etc filters exactly the same). Is this a great UX? Not sure! But it’s what happens right now:

It does this by overriding `render_constraints_query` and not calling super, replacing the standard implementation with my own.

How do we do this in the new non-deprecated way?

I guess again we have to either replace Blacklight::ConstraintsComponent with a new custom version… or perhaps pass in a custom component for query_constraint_component… this time we can’t just render and add on, we really do need to replace something.

What options do we have? Maybe, again, customizing _constraints.html.erb to call that custom component and/or pass the custom arg. And we’d have to make sure any customization is consistent with any customization done by, say, blacklight_range_limit or geoblacklight; make sure they aren’t all trying to provide mutually incompatible custom components.

I still don’t like:

  • having to override a view partial (when before I only overrode a helper method), in local app instead of plugin it’s more feasible, but we still have to copy-and-paste some non-trivial code from Blacklight to our local override, and hope it doesn’t change
  • Pretty sensitive to the implementation of Blacklight::ConstraintsComponent if we’re sub-classing it or delegating to it. I’m not sure what parts of it are considered public API, or how frequently they are likely to change… if we’re not careful, we’re not going to have any more stable/reliable/forwards-compatible code than we did under the old way.
  • This solution doesn’t provide a way for custom code to render a constraints area with all customizations added by any add-ons, which is a current use case, see next section.

It turns out blacklight_advanced_search also customizes the “query constraint” (in order to handle the multi-field queries that the plugin can do), also by overriding render_constraints_query, so this exact use case affects that plug-in too, with a bit more challenge in a plugin instead of a local app.

I don’t think any of these solutions we’ve brainstormed are suitable and reliable.

But calling out to Blacklight functionality is blocked too, as in spotlight….

In addition to overriding a helper method to customize what appears on the screen, traditionally custom logic in a local app or plug-in can call a helper method to render some piece of Blacklight functionality on screen.

For instance, the spotlight plug-in calls the render_constraints method in one of its own views, to include that whole “constraints” area on one of its own custom pages.

Using the legacy helper method architecture, spotlight can render the constraints including any customizations the local app or other plug-ins have made via their overriding of helper methods. For instance, when spotlight calls render_constraints, it will get the additional constraints that were added by blacklight_range_limit or geoblacklight too.

How would spotlight render constraints using the new architecture? I guess it would call the Blacklight view component (Blacklight::ConstraintsComponent) directly with render. But how does it manage to use any customizations added by plug-ins like blacklight_range_limit? Not sure. None of the solutions we brainstormed above seem to get us there.

I suppose spotlight (for example) could actually render the _constraints.html.erb partial, which then becomes the one canonical standardized “API” for constraints rendering, to be customized in the local app and re-used every time the constraints view is needed? That might work, but moving toward a view partial as API seems a step backwards to me; I feel like we were trying to get away from that for good reasons, it just feels messy.

This makes me think new API might be required in Blacklight, if we are not to have a reduction in “view extension” functionality for Blacklight 8 (which is another option: say, well, you just can’t do those things anymore, significantly trimming the scope of what is possible with plugins, possibly abandoning some plugins).

There are other cases where blacklight_range_limit, for example, calls helper methods to re-use functionality. I haven’t totally analyzed them. It’s possible that in some cases the plug-in should just copy-and-paste hard-coded HTML or logic, without allowing for other actors to customize them.

New API? Dependency Injection?

Might there be some new API that Blacklight could implement that would make this all work smoother and more consistently?

If we want a way to tell Blacklight “use my own custom component instead of Blacklight::ConstraintsComponent”, ideally without having to override a view template, at first that made me think “Inversion of Control with Dependency Injection”. I’m not thrilled with this generic solution, but thinking it through….

What if there was some way the local app or plugin could do Blacklight::ViewComponentRegistration.constraints_component_class = MyConstraintsComponent, and then when Blacklight wants to render it, instead of calling render(Blacklight::ConstraintsComponent.new(...)) directly like it does now, it’d do something like render(Blacklight::ViewComponentRegistration.constraints_component_class.new(...)).

That lets us “inject” a custom class without having to override the view component and every other single place it might be used, including new places from plugins etc. The specific arguments the component takes would have to be considered/treated as public API somehow.

It still doesn’t let multiple add-ons cooperate to each add a new constraint item, though. I guess to do that, the registry could have an array for each thing….

Blacklight::ViewComponentRegistration.constraints_component_classes = [

# And then I guess we really need a convenience method for calling
# ALL of them in a row and concatenating their results....

Blacklight::ViewComponentRegistration.render(:constraints_component_class, search_state: stuff)
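Fleshed out as standalone Ruby, the registry idea might look something like this (everything here is hypothetical, including the ViewComponentRegistration name, which Blacklight does not actually have):

```ruby
# Minimal sketch of a slot-based view component registry.
module ViewComponentRegistration
  @registry = Hash.new { |hash, key| hash[key] = [] }

  class << self
    # Plugins (or the local app) register component classes for a named slot.
    def register(slot, component_class)
      @registry[slot] << component_class
    end

    # Render ALL components registered for a slot, concatenating their results.
    def render(slot, **args)
      @registry[slot].map { |klass| klass.new(**args).call }.join
    end
  end
end

class RangeConstraintComponent
  def initialize(search_state:); @search_state = search_state; end
  def call; "[range: #{@search_state}]"; end
end

class GeoConstraintComponent
  def initialize(search_state:); @search_state = search_state; end
  def call; "[geo: #{@search_state}]"; end
end

# Two independent "plugins" register without knowing about each other:
ViewComponentRegistration.register(:constraints, RangeConstraintComponent)
ViewComponentRegistration.register(:constraints, GeoConstraintComponent)

puts ViewComponentRegistration.render(:constraints, search_state: "q=cats")
# => "[range: q=cats][geo: q=cats]"
```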

On the plus side, now something like spotlight can call that too to render a “constraints area” including customizations from BlacklightRangeLimit, GeoBlacklight, etc.

But I have mixed feelings about this, it seems like the kind of generic-universal yet-more-custom-abstraction thing that sometimes gets us in trouble and over-complexified. Not sure.

API just for constraints view customization?

OK, instead of trying to make a universal API for customizing “any view component”, what if we just focus on the actual use cases in front of us here? All the ones I’ve encountered so far are about the “constraints” area? Can we add custom API just for that?

It might look almost exactly the same as the generic “IoC” solution above, but on the Blacklight::ConstraintsComponent class…. Like, we want to customize the component Blacklight::ConstraintsComponent uses to render the ‘query constraint’ (for my local app and advanced search use cases), right now we have to change the call site for every place it exists, to have a different argument… What if instead we can just:

Blacklight::ConstraintsComponent.query_constraint_component =

And ok, for these “additional constraint items” we want to add… in “legacy” architecture we overrode “render_constraints_filters” (normally used for facet constraints) and called super… but that’s just cause that’s what we had, really this is a different semantic thing, let’s just call it what it is:

Blacklight::ConstraintsComponent.additional_render_components <<

Blacklight::ConstraintsComponent.additional_render_components <<

All those component “slots” would still need to have their initializer arguments established as “public API” somehow, so you can register one knowing what args its initializer is going to get.

Note this solves the spotlight case too: spotlight can just simply call render Blacklight::ConstraintsComponent.new(...), and it now does get customizations added by other add-ons, because they were registered with the Blacklight::ConstraintsComponent.
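Put together as a standalone sketch (again, every name and initializer argument here is hypothetical, not Blacklight's actual API):

```ruby
# Sketch of class-level "slots" on a constraints component.
class ConstraintsComponent
  class << self
    attr_writer :query_constraint_component

    def query_constraint_component
      @query_constraint_component ||= DefaultQueryConstraint
    end

    def additional_render_components
      @additional_render_components ||= []
    end
  end

  def initialize(search_state:)
    @search_state = search_state
  end

  # Render the (possibly swapped-out) query constraint, then every add-on.
  def call
    components = [self.class.query_constraint_component] +
                 self.class.additional_render_components
    components.map { |klass| klass.new(search_state: @search_state).call }.join(" ")
  end
end

class DefaultQueryConstraint
  def initialize(search_state:); @search_state = search_state; end
  def call; "[query: #{@search_state}]"; end
end

class EditableQueryConstraint
  def initialize(search_state:); @search_state = search_state; end
  def call; "[editable query form: #{@search_state}]"; end
end

class RangeConstraint
  def initialize(search_state:); @search_state = search_state; end
  def call; "[range facets]"; end
end

# A local app swaps the query constraint; a plugin appends its own piece:
ConstraintsComponent.query_constraint_component = EditableQueryConstraint
ConstraintsComponent.additional_render_components << RangeConstraint

puts ConstraintsComponent.new(search_state: "q=cats").call
# => "[editable query form: q=cats] [range facets]"
```

Any caller (a spotlight-style plugin included) just instantiates ConstraintsComponent and automatically picks up whatever was registered on it.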

I think this API may meet all the use cases I’ve identified? Which doesn’t mean there aren’t some I haven’t identified. I’m not really sure what architecture is best here; I’ve just tried to brainstorm possibilities. It would be good to choose carefully, as we’d ideally find something that can work through many future Blacklight versions without having to be deprecated again.

Need for Coordinated Switchover to non-deprecated techniques

The way Blacklight implements backwards-compatible support for constraints rendering is this: if it detects that anything in the app is overriding a relevant method or partial, it continues rendering the “legacy” way with helpers and partials.

So if I were to try upgrading my app to do something using a new non-deprecated method, while my app is still using blacklight_range_limit doing things the old way… it would be hard to keep them both working. If you have more than one Blacklight plug-in overriding relevant view helpers, it of course gets even more complicated.

It pretty much has to be all-or-nothing. Which also makes it hard for, say, blacklight_range_limit to do a release that uses a new way (if we figured one out) — it’s probably only going to work in apps that have changed ALL their parts over to the new way. I guess all the plug-ins could do releases that offered a choice of configuration/installation instructions, where the host app could choose the new way or the old way.

I think the complexity of this makes it more realistic, especially given actual blacklight community maintenance resources, that a lot of apps are just going to keep running in deprecated mode, with a lot of plugins available only in versions that trigger deprecation warnings, until Blacklight 8.0 comes out and the deprecated behavior simply breaks; then we’ll need Blacklight 8-only versions of all the plugins, with apps switching everything over all at once.

If different plugins approach this in an uncoordinated fashion, each trying to invent a way to do it, they really risk stepping on each other’s toes and being incompatible with each other. I think something really has to be worked out as the Blacklight-recommended consensus/best-practice approach to view overrides, so everyone can just use it in a consistent and compatible way. Whether that requires new API not yet in Blacklight, or a clear pattern achievable with what’s in current Blacklight 7 releases.

Ideally all worked out by currently active Blacklight maintainers and/or community before Blacklight 8 comes out, so people at least know what needs to be done to update code. Many Blacklight users may not be using Blacklight 7.x at all yet (7.0 released Dec 2018) — for instance hyrax still uses Blacklight 6 — so I’m not sure what portion of the community is already aware this is coming up on the horizon.

I hope the time I’ve spent investigating and considering and documenting in this piece can be helpful to the community as one initial step, to understanding the lay of the land.

For now, silence deprecations?

OK, so I really want to upgrade to the latest Blacklight 7.19.2, from my current 7.7.0. To just stay up to date, and to be ready for ruby 3.0. (My app def can’t pass tests on ruby 3 with BL 7.7; it looks like BL added ruby 3.0 support in BL 7.14.0, which does already have the deprecations.)

It’s not feasible right now to eliminate all the deprecated calls. But my app does seem to work fine, just with deprecation calls.

I don’t really want to leave all those “just ignore them for now” deprecation messages in my CI and production logs, though. They just clutter things up and make it hard to pay attention to the things I want to be noticing.

Can we silence them? Blacklight uses the deprecation gem for its deprecation messages; the gem is by cbeer, with logic taken out of ActiveSupport.

We could wrap all calls to deprecated methods in Deprecation.silence do … end blocks, including making a PR to blacklight_range_limit to do that? I’m not sure I like the idea of making blacklight_range_limit silent on this problem; it needs more attention at this point! Also I’m not sure how to use Deprecation.silence to affect that clever conditional check in the _constraints.html.erb template.

We could entirely silence everything from the deprecation gem with Deprecation.default_deprecation_behavior — I don’t love this; we might miss deprecations we want to see.

The Deprecation gem API made me think there might be a way to silence deprecation warnings from individual classes with things like Blacklight::RenderConstraintsHelperBehavior.deprecation_behavior = :silence, but I think I was misinterpreting the API; there didn’t actually seem to be methods like that available in Blacklight to silence what I wanted in a targeted way.

Looking/brainstorming more in the Deprecation gem API… I *could* change its behavior to its “notify” strategy, which sends ActiveSupport::Notification events instead of writing to stdout/log… and then write a custom ActiveSupport::Notification subscriber which ignores the ones I want to ignore… ideally still somehow keeping the undocumented-but-noticed-and-welcome default behavior in the test/rspec environment, where it reports out a summary of deprecations at the end…

This seemed like too much work. I realized that the only things that use the Deprecation gem in my project are Blacklight itself and the qa gem (I don’t think the gem has caught on outside the blacklight/samvera communities), so I guess I am willing to just silence deprecations from all of them, although I don’t love it.

David Graeber - 5 favourites from a radical life / Hugh Rundle

It was the first anniversary of anthropologist David Graeber's death yesterday.

I first encountered his work when I was intrigued by the title and description of Debt: The first 5000 years whilst browsing in a bookshop. This is on my list of psychotropic books because it drastically changed what I noticed and how I thought about money, economies, and human societies in general. Graeber covers all the big questions: What is money? Is repaying debt a moral responsibility? What was the Axial Age really all about? What does it mean to have an agreement without trust?

I found this book fascinating but I still didn't really understand much about Graeber and the rest of his work. Later, he became more widely known through his article and then book on the phenomenon of Bullshit Jobs. I found it interesting, but with 20 years working in local government I found The utopia of rules: On technology, stupidity, and the secret joys of bureaucracy to be more compelling and, well, confronting. Here Graeber explained why I was so attracted to bureaucracy at the same time it repelled me. Whilst The Utopia of Rules restates some of the observations and arguments of James Scott's Seeing like a state, there's a lot here that's new, and Graeber had a knack for putting things in clear, often blunt ways — particularly notable for someone whose entire working life was spent in academia.

It was hinted at in earlier work, but Graeber's joint article with David Wengrow, How to change the course of human history really blew my mind. My undergraduate degree was officially a "Bachelor of Arts" but the last two years were essentially just me taking every history class available. This article (soon to be published in longer form as a book), energetically argues that many historical "facts" about the rise of states and stratification of societies are at best conjecture, and at worst fly in the face of most evidence. The important thing about this argument, though, is that it widens the horizons of what is possible in the future. It suits a certain type of person and philosophy for us to believe that the rise of unitary states with different classes defined by wealth disparities is natural and inevitable. But ...what if it's neither?

Now I was hooked. Graeber wrote a lot of articles for a general readership in addition to his more "scholarly" work. Revolution in reverse can probably be blamed for me becoming comfortable with disengaging from electoral politics after being deeply involved in it for the decade prior:

Why is it that the idea of any radical social transformation so often seems “unrealistic”? What does revolution mean once one no longer expects a single, cataclysmic break with past structures of oppression? These seem disparate questions but it seems to me the answers are related.

Finally, there's Graeber's argument that There never was a West. This was written in 2016 but I only encountered it this year, and it feels as fresh as if written yesterday. A long treatise on the history and nature of democracy, like many of his works it shows how intricately connected are the intellectual histories of Europe, the Americas, and the Mediterranean world.

David Graeber was taken from the world too soon. His work has influenced me profoundly. Perhaps, if you've not yet read it, it will change the way you see the world too.

NDSA Announces 2021 Slate of Candidates for Coordinating Committee / Digital Library Federation

NDSA is happy to announce the 2021 slate of Coordinating Committee (CC) candidates. Elections will soon be held for three (3) CC members. The CC is dedicated to ensuring a strategic direction for NDSA, to the advancement of NDSA activities to achieve community goals, and to furthering communication among digital preservation professionals and NDSA member organizations. The CC is responsible for reviewing and approving NDSA membership applications and publications; updating eligibility standards for membership in the alliance, and other strategic documents; engaging with stakeholders in the community; and working to enroll new members committed to our core mission. The successful candidates will each serve a three-year term. Ballots will be sent to membership organization contacts in the coming weeks.

Stacey Erdman

Stacey Erdman is the Digital Preservation & Curation Officer at Arizona State University. In this position, she has responsibility for designing and leading the digital preservation and curation program for ASU Library. She is also currently serving as the Acting Digital Repository Manager at ASU, where she has been working with the repository team on migrating repository platforms to Islandora. She is the former Digital Archivist at Beloit College; and Digital Collections Curator at Northern Illinois University. She has been a part of the Digital POWRR Project since its inception in 2012, and is serving as Principal Investigator for the recently funded IMLS initiative, the Digital POWRR Peer Assessment Program. Stacey currently serves on the 2021 NDSA Program Committee, and is also a member of the Membership Task Force. She has been excited to see the steps that the NDSA has taken recently to diversify the member base, and would work as a part of the CC to help make this work mission-critical. Stacey feels passionately about making the digital preservation field more equitable and inclusive, and would be a strong advocate for expanding NDSA’s outreach, advocacy, and education efforts.

Daniel Johnson

Daniel Johnson is the Digital Preservation Librarian at The University of Iowa and Consulting Archivist for The HistoryMakers. Previously, Johnson worked as a digital archivist at The HistoryMakers and as a project archivist for the Gordon Hall and Grace Hoag Collection of Extremist and Dissenting Printed Propaganda at Brown University. Johnson has experience working in digital preservation, digital archives, reformatting/digitization, digital file management, web archiving, metadata standards, database management, and project management. Johnson has presented on digital preservation related topics at many conferences including the American Library Association, the Society of American Archivists, the Digital Library Federation, and the Upper Midwest Digital Collections Conference. Johnson earned his B.A. degree in English from the University of Illinois at Urbana-Champaign in 2007 and his MLIS from the University of Illinois at Urbana-Champaign in 2009.

Jen Mitcham

Jen Mitcham has been working in the field of digital preservation for 17 years after an early career as an archaeologist. Her data preservation work began at the Archaeology Data Service where she worked on the preservation of a range of different types of datasets, including databases, laser scan data and Geographic Information Systems, also developing front ends for online access. At the Archaeology Data Service, she led a successful application for the Data Seal of Approval (now CoreTrustSeal) and was involved in the ‘Big Data Project’. From here she moved to the Borthwick Institute for Archives at the University of York where she focused on establishing policy and procedures for both digital preservation and digitisation. Here she was heavily involved in research data management, both as a facilitator in training sessions for researchers and in working on a preservation infrastructure. She led the Jisc funded project ‘Filling the Digital Preservation Gap’ which was a finalist in the Research and Innovation category of the 2016 Digital Preservation Awards. She currently holds the post of Head of Good Practice and Standards at the Digital Preservation Coalition and has been working closely with the UK Nuclear Decommissioning Authority over the last two years on a digital preservation project. As part of this work, she has been involved in the development of a new maturity model for digital preservation called the Rapid Assessment Model and chairs a task force with a focus on the preservation of records from Electronic Document and Records Management Systems. She works with Coalition Members internationally to help facilitate collaboration and communication in the field of digital preservation. Jen has been involved in several NDSA efforts, including the NDSA Levels Revision working group, the NDSA Levels Steering group, the Standards and Practices interest group, and the Fixity Survey working group.

Hannah Wang

Hannah Wang works at Educopia Institute, where she is the Community Facilitator for the MetaArchive Cooperative and the Project Manager for the BitCuratorEdu project. Her work and research focus on digital archives pedagogy and on amplifying and coordinating the work of digital preservation practitioners through communities of practice. She currently serves on the NDSA Staffing Survey Working Group. Hannah was previously the Electronic Records & Digital Preservation Archivist at the Wisconsin Historical Society, and has taught graduate-level archives classes at the University of Wisconsin-Madison. She is interested in joining the Coordinating Committee because she wants to advance the NDSA as an educational and advocacy resource for practitioners, particularly students and early-career professionals. She is also interested in exploring how the NDSA can align itself with the activities of other communities working toward the common goal of advancing digital stewardship practice through collaboration and knowledge exchange.

The post NDSA Announces 2021 Slate of Candidates for Coordinating Committee appeared first on DLF.

DLF Digest: September 2021 / Digital Library Federation

DLF Digest

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation

This month’s news:

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), make sure to bookmark the DLF Community Calendar. Can’t find meeting call-in information? Email us at


DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member institution. Learn more about our working groups and how to get involved on the DLF website. Interested in starting a new working group or reviving an older one? Need to schedule an upcoming working group call? Check out the DLF Organizer’s Toolkit to learn more about how Team DLF supports our working groups, and send us a message at to let us know how we can help. 

The post DLF Digest: September 2021 appeared first on DLF.

Registration and Programs for 2021 CLIR Events are NOW LIVE, Keynotes Announced, and more / Digital Library Federation

Graphic: Left side contains multi-color lines in woven pattern with white text on top that reads, "Join Us Online." Right has a white background with text indicating the dates of DLF Forum, NDSA's Digital Preservation, and Learn@DLF.

Featuring sixteen sessions and keynote speakers Dr. Nikole Hannah-Jones, Dr. Stacey Patton, and Nisha Mody. Registration closes October 25 or when the events sell out.

CLIR/DLF is delighted to share many exciting updates and announcements pertaining to our events taking place in November. Read on and act fast!

  • Programs for ALL events (the DLF Forum; NDSA’s Digital Preservation 2021: Embracing Digitality; and Learn@DLF) are now available here. Some details are still being determined, but we think you’ll like what you’ll see in this year’s new conference platform, Midspace (formerly Clowdr).
  • As part of our program, we’re delighted to announce our amazing 2021 keynote speakers for the DLF Forum. Dr. Nikole Hannah-Jones will be interviewed by Dr. Stacey Patton to kick off the DLF Forum and Nisha Mody will deliver the Forum’s closing plenary keynote. The keynote speaker for DigiPres will be announced soon. 
  • Now that you’ve checked out the great content we’re offering in November, it’s time to register. The DLF Forum and NDSA’s Digital Preservation are free with the option to donate; Learn@DLF is $35 per workshop. Registration for ALL events is open (but limited) through October 25 or until the event sells out. 
  • None of this would be possible without our sponsors. As a DLF Forum and DigiPres sponsor you will be part of the premier digital library conference that fosters leadership, strengthens relationships, engages in difficult conversations, sets grassroots agendas, and organizes for action. Sponsorship opportunities are limited and will go quickly. Check out our 2021 sponsorship opportunities and reserve your preferred sponsor level today.
  • Want to show your 2021 DLF Forum spirit? Then head on over to our t-shirt fundraiser to nab your 2021 DLF Forum t-shirt, only available through October 1, benefiting our Child Care Fund as always.

Apply to Be a Community Journalist: This year, we will be providing $250 stipends to a cohort of 10 DLF Forum attendees from a variety of backgrounds. We will feature the voices and experiences of our 2021 Community Journalists on the DLF blog after our events this fall. Apply by September 20.

We’re excited to see everyone in November! Remember, registration closes October 25th or when the events sell out.

Register Here

The post Registration and Programs for 2021 CLIR Events are NOW LIVE, Keynotes Announced, and more appeared first on DLF.

Call for proposals EXTENDED to Sept 13th – Samvera Connect 2021 Online / Samvera

The Program Committee​​ is extending its Call for Proposals (CfP) for workshops, presentations, and panels through Monday, September 13th at 8pm EDT. 

Samvera Connect Online 2021 workshops will be held October 14th -15th, with plenary presentations October 18th – 22nd.

Whether you are new to the Community or a longtime Samveran, we welcome your participation in Connect 2021!  Any topic of interest to the Community is welcome, from any role in your organization. 

Workshops: submission form open through Monday, September 13th at 8pm EDT

Presentations and Panels: submission form open through Monday, September 13th at 8pm EDT

Lightning Talks: submission form open through Thursday, September 30th, 2021

Virtual Posters: submission form open through Thursday, September 30th, 2021

Here are just a few topics the Committee would love to see from the Community:

  • Hackathon workshops
  • Hybrid work challenges and success stories
  • Accessibility challenges and successes
  • Workflow tours
  • Overview of Samvera’s Github contribution process from issue creation to update (presentation or workshop)
  • Technology selection processes
  • Staff and departmental advocacy in times of change
  • “Cool tools” lightning talks to share something you built or use in your work
  • Repository services, advocacy, and promotion
  • Maintenance best practices and new approaches (for every aspect of the digital repository)

You may find it helpful to refer to the workshop program, presentation/lightning talk program, and posters from last year’s online conference.

The post Call for proposals EXTENDED to Sept 13th – Samvera Connect 2021 Online appeared first on Samvera.

Economies Of Scale / David Rosenthal

Steve Randy Waldman is a very interesting writer. He has a fascinating short post entitled Economies of scale in which he distinguishes four different types of "economies of scale". In reverse order, they are:
  1. Insurance
  2. Market power
  3. Network effects
  4. Technical economies
The effects of economies of scale in technology markets, such as storage media, digital preservation, and cryptocurrencies, are a topic on which I have written many times, drawing heavily on W. Brian Arthur's 1994 book Increasing Returns and Path Dependence in the Economy. Below the fold I discuss Waldman's classification of them.

Brian Arthur's analysis focuses on the effects of economies of scale, whereas Waldman's focuses on their causes.


Waldman's last type is probably one that few have identified, the economy of scale in risk pools:
there are economies of scale in the insurance of stakeholders, which is a genuine efficiency and of tremendous social value. A large firm can provide generous sick leave or parental leave, because the absent employee is one of a large stable among whom the extra burden can be shared, and over which the financial cost can be amortized. For a small firm, even temporary loss of a skilled worker can paralyze the business. And a small firm’s finances may be too weak to pay the leave. “Mom and Pop” firms are notoriously shitty at providing flexibility and insurance benefits not because Mom and Pop are bad people, but because a big insurance pool functions better than a tiny one. This is a real economy of scale.
My go-to source on risk and insurance is Peter L. Bernstein's Against The Gods: The remarkable story of risk. He writes:
In practice, insurance is available only when the Law of Large Numbers is observed. The law requires that the risks insured must be both large in number and independent of one another, like successive deals in a game of poker.
The idea that only a big firm can provide the necessary size of risk pool to generate economies of scale is a curiously US-centric one. Waldman even acknowledges this:
However, much of this advantage of bigness would disappear if the social insurance function were sensibly provided by the state instead of our relying upon individual businesses to offer “benefits”. (The state cannot relieve businesses of the risk that a critical employee may need to step back, but this risk fades even at small-to-medium scales beyond “Mom and Pop”.)
This "socialized medicine" might work in "less efficient" economies such as the EU, but how could it possibly work in the US? Two words: Medicare, and VA. Note that both huge risk pools deliver cheaper care despite catering for populations commercial insurers consider too expensive to cover. The point being that it isn't just the size of the risk pool, but its diversity that matters.
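Bernstein's Law of Large Numbers point is easy to check numerically. A rough simulation (the parameters here are invented purely for illustration: each member independently has a 5% annual chance of a 100-unit claim) shows how the per-member cost stabilizes as the pool grows, which is exactly the economy of scale Waldman describes:

```python
import random

random.seed(42)

def per_member_cost_spread(pool_size, trials=1000):
    """Simulate many years for a pool of independent members, each with
    a 5% chance of a 100-unit claim per year; return the standard
    deviation of the per-member cost across simulated years."""
    costs = []
    for _ in range(trials):
        claims = sum(100 for _ in range(pool_size) if random.random() < 0.05)
        costs.append(claims / pool_size)
    mean = sum(costs) / trials
    var = sum((c - mean) ** 2 for c in costs) / trials
    return var ** 0.5

small = per_member_cost_spread(5)      # "Mom and Pop" scale
large = per_member_cost_spread(2000)   # large-firm (or state) scale
print(f"cost spread, pool of 5:    {small:.2f}")
print(f"cost spread, pool of 2000: {large:.2f}")
```

The tiny pool sees wild year-to-year swings in its per-member burden, while the large pool's cost is nearly constant, which is why only the large pool can safely promise generous benefits. (The independence assumption is also why diversity of the pool matters, as noted above.)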

Market Power

Waldman is again on point when he describes two forms of market power. First:
There is traditional monopoly or market power by which firms can extract rents from workers, suppliers, and consumers. Market power is a correlate of scale that looks great from any firm’s perspective, but its “efficiencies” are just transfers from other stakeholders, and are destructive in aggregate.
Unfortunately, this entrenched destructiveness, a legacy of the Chicago school, is hard to displace because, second:
There are resource and coalitional “economies of scale”, the way very large firms can engage in predatory pricing, or coordinate the activities of lawyers and lobbyists and media, and eventually politicians and regulators, in a firm’s interest. Again, these are not true “economies” at all. They may benefit incumbent firms, but are of negative social value.
The "negative social value" of this second form was demonstrated in the Global Financial Crisis, when the banks collectively arranged for the negative effects of their reckless lending to fall everywhere but on themselves.

Network Effects

Because of the dominance in technology markets of the FAANGs, network effects are what most people think of as technology's economies of scale. As we see with markets such as social media, search and operating systems, network effects provide dominant companies both extraordinary profits and very strong defenses against rising competitors. Waldman correctly points out the need for governments to regulate "natural monopolies":
These are real economies, but as John Hussman describes them, network effects should be classified as “uninvented public goods”. Firms should be rewarded for discovering them — and indeed they have been and are rewarded, quite handsomely — but networks should not remain monopoly franchises of private entities indefinitely. They are “natural monopolies”, which competition will not regulate in the public interest. They should fall, whether through outright ownership or as “regulated utilities”, into management by the state. [1]
His footnote 1 is perceptive:
[1] Yes, states are corrupt, in that they often improperly serve particular private interests. But the only reason we don’t understand firms to be even more corrupt is that serving particular private interests is each firm’s overt function and purpose. It’s not that monopolists behave better, from a social perspective, than states, it’s that their misbehavior gets coded as legitimate competence.

Technical Economies

As regards technical economies of scale, Waldman writes:
States should not try to insist that Mom and Pop should be able to bootstrap competitors to GM out of savings from their second job. But technical economies of scale peter out at scales much smaller than megafirms. Tesla, which (in physical, rather than casino-financial terms) is not so big, can compete with GM. Technical economies of scale require the scale of a factory, producing in quantities that fully amortize fixed capital costs, but not more than that.
Waldman is just wrong about this, at least in the technology space. Its massive size in "casino-financial terms" is precisely the reason Tesla can compete with GM. The world is littered with small car companies that, lacking lavish early backing from a high-profile billionaire, could not compete with GM. And let's not lose sight of the fact that, until recently, Tesla's profits came from selling carbon credits; it lost money selling cars.

Tesla isn't even a good example. In many technology markets the investment needed to build "a factory, producing in quantities that fully amortize fixed capital costs" is immense. To compete in chip manufacturing you need a state-of-the-art 5nm fab, costing $12B. To stay competitive, you'll need to be planning the next one, at 3nm and $25B. To compete in AI you need huge data centers to train the models. Arguably, these "economies of scale" are actually financial rather than technical. The larger you are, the easier it is to finance investments at the necessary scale.

This has anti-trust implications. Would breaking up TSMC improve consumer welfare? The fragments combined wouldn't be able to afford TSMC's $200B investment program, so the world would take much longer to get to 3nm and beyond, increasing the price of chips that go into so many products. The co-evolution of dominant technology companies and their suppliers has created an investment "moat" protecting them from emerging competion. When ASML's EUV machines cost $160M each and the queue for them is several years long a new entrant is hopeless.


I find Waldman's classification useful. Government regulation is undoubtedly needed to counteract network effects and the abuses of market power, but as he points out these are not the only economies of scale in play. Governments certainly have a role to play in eliminating the economy of scale that the employer-based (and massively dysfunctional) US health insurance system imposes. But these still leave the technical economies of scale that Waldman downplays.

Brian Arthur's analysis of the way economies of scale drive market concentration is agnostic to the cause of the economies. It is hard to see how, at least in many technology markets, governments could push back against the very large technical economies of scale. So it seems we are doomed to live with highly concentrated markets. Perhaps we can learn from Raymond Zhong and Li Yuan's The Rise and Fall of the World’s Ride-Hailing Giant:
Under Xi Jinping, the Communist Party’s most powerful leader since Mao, China has taken a hard ideological turn against unfettered private enterprise. It has set out a series of strictures against “disorderly” corporate expansion. No longer will titans of industry be permitted to march out of step with the party’s priorities and dictates.
On issues like data security, privacy and worker protections, Beijing’s scrutiny is long overdue. Yet Chinese officials have moved against tech companies with a speed and ferocity that might unsettle even the most ardent Western trustbusters.

Learn about automated decision making in the UK housing benefits system / Open Knowledge Foundation

– Are you a lawyer, campaigner or activist working in the UK housing benefits system?
– Do you want to learn how automated decision systems are currently used in the housing benefits system in the UK?
– Do you want to learn about legal strategies for challenging the (mis)use of these technologies?

= = = = = = =

Join The Justice Programme team for an online 90-minute interactive workshop on Thursday, September 23rd 2021, between 12.00 – 13.30 BST (London time).

Tickets for the event cost £110 (inc VAT) and can be purchased online here.

= = = = = = =

Tickets are limited to 20 people – to ensure that everyone who attends can maximise their learning experience.

If you are unwaged and can’t afford a ticket, please email us: The Justice Programme is offering two places at a 75% discount (£27.50 each).

All proceeds from this event are reinvested in the work of The Justice Programme, as we work to ensure Public Impact Algorithms do no harm.

= = = = = = =

What will I learn ?

In this Interactive Workshop on housing benefit and automation we will:

– explore how AI and algorithms are presently being used, and likely to be used, in the UK and elsewhere
– review a summary of how algorithms work
– discuss the potential harms involved at the individual and societal levels
– summarise legal strategies, resources and best practices
– participate in a group exercise on a realistic case study

You will also get access to a guide summarising the key points of the workshop and documenting the answers to your questions.

This workshop is brought to you by Meg Foulkes, Director of The Justice Programme and Cedric Lombion, our Data & Innovation Lead.

Read more about The Justice Programme team here.

About The Justice Programme

The Justice Programme is a project of the Open Knowledge Foundation, which works to ensure that Public Impact Algorithms do no harm.

Find out more about The Justice Programme here, and learn more about Public Impact Algorithms here.

Why? / Ed Summers

I tuned into Clifford Lynch’s virtual talk at Berkeley last Friday evening (for me): Why “Web Archiving” is No Longer a Useful Concept or Phrase. Alas, the event was not recorded. As one participant pointed out, this was perhaps a great illustration of Lynch’s main point, that the concept of “web archiving” means different things to different people. Should the video from a desktop application (Zoom) that is pulling data from a web server using a URL be considered part of the web? I think Lynch was suggesting that it wasn’t, or that it was at best unclear. I think he is wrong.

I didn’t take good notes, and almost felt encouraged not to share anything publicly about the talk. But you can’t really put a talk title like this one out in the public eye and expect that. Lynch was mostly previewing a paper that he is writing for Against the Grain, and getting some feedback from attendees. So I think you can expect to see his argument in print, perhaps in some modulated form, at some point in the future.

His basic argument seemed to be that “Web Archiving” is no longer useful because:

  • the web is unfathomably large
  • web documents are no longer static, and are assembled dynamically based on user behavior
  • web documents are not canonical and are customized for particular users based on their browsing history, and countless other inscrutable algorithmic machinations
  • the web is run by corporations who actively make their content difficult to archive (so people have to come to them for it)
  • some web content is designed to disappear (e.g., Snapchat, Insta Stories, etc.)
  • all sorts of bespoke apps that traffic in JSON have supplanted browsers as clients of the web

I’m probably missing, and misstating, parts of his argument. But for me there was an implicit all that was missing from his title, and his argument, which (for clarity) should have read Why “Archiving All the Web” is No Longer a Useful Concept or Phrase. Of course this would hardly have been eye catching anymore, because most archivists, especially ones who are directly involved in archiving the web, already know this…all too well.

The problem with the original title is that it seems to suggest that collecting and storing content from the web is no longer a useful concept or practice. But if there is anything that’s no longer useful here, it is our antiquated idea that web archives have a singular, distinct architecture: harvesting data from the web (e.g. Heritrix), storing the data (WARC), and playing it back for users (e.g. Wayback Machine).

Clearly this is a type of web archive, but limiting our discussion of web archives to only this particular form ignores the fact that there is abundant practice by a diversity of actors who are actively collecting data from the web and using it for a wide range of purposes, sometimes at great cost. These are web archives too, and talking about them as web archives is important because it helps us put these practices in historical context.

One participant commented in the Zoom chat that perhaps a better phrase than “Web Archiving” was “Archiving from the Web”. I tend to agree since it better illustrates a particular mode of collecting and storing data that is prevalent in web archiving activities. But to Lynch’s point, I think he is onto something if instead he said that the practice of Web Archiving needs more specificity and nuance. Hopefully that’s the direction that his article is spun in.

The argument that, because the web is no longer static, web archiving is no longer possible is simply not the case. The web hasn’t been (only) static for a very long time, and that hasn’t prevented web archiving technology from evolving. The architecture of the web and the HTTP protocol still mean that web clients receive bitstreams of data with an envelope of metadata (aka representations), and these can be stored and retrieved. Furthermore the implication that static publishing to the web is no longer practiced is just not true. Much to the contrary, static web publishing has been going through a renaissance over the past decade.
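This point about representations is concrete: a captured response is just bytes plus a metadata envelope, and both survive storage and playback regardless of how dynamically the bytes were generated. Production web archives use the WARC format (written by crawlers like Heritrix, readable with libraries like warcio); the toy length-prefixed format below is purely illustrative of the idea, with a made-up record:

```python
import io
import json

def write_record(store, url, status, headers, body):
    """Append a captured representation: a JSON metadata envelope (URL,
    status, headers) followed by the raw bitstream. Each chunk is
    length-prefixed so records can be concatenated in one file."""
    envelope = json.dumps({"url": url, "status": status, "headers": headers}).encode()
    for chunk in (envelope, body):
        store.write(len(chunk).to_bytes(8, "big"))
        store.write(chunk)

def read_records(store):
    """Yield (envelope, body) pairs back out of the store for playback."""
    store.seek(0)
    while True:
        raw = store.read(8)
        if not raw:
            break
        envelope = json.loads(store.read(int.from_bytes(raw, "big")))
        body = store.read(int.from_bytes(store.read(8), "big"))
        yield envelope, body

# Capture one (hypothetical) response and play it back.
archive = io.BytesIO()
write_record(archive, "https://example.com/", 200,
             {"content-type": "text/html"},
             b"<html>dynamic or not, it's just bytes</html>")
for env, body in read_records(archive):
    print(env["url"], env["status"], len(body))
```

Whether those bytes were assembled dynamically, personalized, or served to a bespoke JSON client changes what the capture means, not whether capture is possible.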

It seems to me that this talk was primarily an extension of Lynch’s previous writing about the difficulties that algorithms present to archives (Lynch, 2017). I thought that presentation was pretty insightful (rather than inciteful) because of how it highlighted the important roles that social scientists and humanists play in understanding and documenting the algorithms that shape and govern what and how we see on the web. The slippery, elusive, and perennially important context. Archives have always been only a sliver of a window into process (Harris, 2002), and it remains the case when archiving from the web.

Ok, ok. Last year I finished writing a dissertation about web archives, so I’ll admit I’m kind of touchy on this subject :-) What do you mean this thing I spent 6 years researching is no longer a useful concept?!? As a way to vent some of my angst I created a little web app. I hope you enjoy it. It’s a static website.


Harris, V. (2002). The archival sliver: power, memory, and archives in South Africa. Archival Science, 2(1-2), 63–86.

Lynch, C. (2017). Stewardship in the age of algorithms. First Monday, 22(12). Retrieved from

Cons(train)ed music: a kit for the traveling ambient artist / Mark Matienzo

After an intense year and a half (pandemic, work, other personal stuff) I needed a vacation. I booked a trip on the Amtrak Empire Builder to go from Seattle to Milwaukee and Chicago. Because of persistent low-grade burnout, I wanted to use the considerable time I’d have offline to work on creative projects including music and writing. This is a brief writeup inspired by The Setup that talks about what I brought with me in an effort to travel relatively light.

Case Studies: Metadata Assessment / Digital Library Federation

This blog post was authored by members of the Digital Library Assessment Interest Group’s Metadata Assessment Working Group (DLF AIG MWG).

If you are interested in metadata evaluation, or want to learn more about the group’s work, please consider attending one of our meetings!


The DLF Assessment Interest Group Metadata Working Group (MWG) collaborates on projects related to a number of metadata quality and assessment issues. The metadata quality aspects discussed in this post were formulated by Bruce & Hillmann (2004) and were also used as the basis of the Metadata Assessment Framework developed by the MWG.

Organizations assess metadata in their digital collections in a number of different ways. This post provides case studies showing how three universities evaluate metadata in their digital library systems, as examples for other institutions that may be interested in similar assessment activities.



Colgate University 

System. Colgate currently uses Islandora 7 and is closely watching institutions like Whitman College, which is actively migrating to Islandora 8. We migrated our main digital collections from CONTENTdm about 4 years ago and are still migrating our institutional repository from Bepress’ Digital Commons.

Size. Colgate’s digital collections contain nearly 6,000 individual objects, but a count of book and newspaper pages increases the number to 112,781. An additional 500 items in our institutional repository will come online soon.

Editors. Students create or capture basic descriptive and administrative metadata for the majority of our Special Collections/University Archives materials. One librarian then adds subject-related metadata and checks the descriptive metadata for accuracy and conformance to our local standards before upload. Three paraprofessional catalogers have been working on two historical collections as remote work during the pandemic; their work is also reviewed by the same librarian before upload. Work on assessing and creating metadata and migrating the Colgate institutional repository has been shared by two librarians and one paraprofessional cataloger.

Assessment. Most of our larger assessment projects have been tied to migration, either out of CONTENTdm or out of Digital Commons. Now that we are seeing how some of the metadata is working together (or not working, as the case may be), we are doing targeted assessment, and resultant remediation, of our prestige and most-used collections.

University of North Texas 

System.  At the University of North Texas (UNT) Libraries, our digital library system was built in-house and is maintained by our team of programmers, so we can make changes and additions when needed.

Size. The Digital Collections currently house more than 3.1 million items/metadata records, hosted locally within our archival system. All materials are described using the same schema (UNTL), which has 21 possible fields, including 8 that are required for every item.

Editors.  Most metadata creation is handled by students who are employed and trained in UNT Libraries departments, or Libraries staff.  Also, many of our materials come from partner institutions that own the physical collections, so staff or volunteers at those locations often have access to edit records in their collections.  On a weekly basis, we usually have around 50 people editing records in our system.

Assessment. At UNT, we have ongoing assessment at a number of levels, including research that has been published in a number of places. Additionally, in 2017 we implemented three integrated tools (count, facet, cluster) that allow any editor to review metadata at a collection level or systemwide; we published a paper when we implemented them. These tools reorganize metadata values and information about the records to assist in identifying records that require correction (or just review), but an editor still has to manually edit each record. As a project is completed, an editor will generally use the tools to do general quality control of their own work; the tools are also used to identify various system-wide problems that can be addressed as time permits.

  • COUNT allows an editor to see the number of entries for a field or field/qualifier combination (e.g., how many records have 0 subjects, or 3 subjects, or 1,000 subjects; how many records have 1 series title or 15 series titles; etc.).
  • FACET displays the unique values for a field or field/qualifier combination, which is particularly useful for finding typos or seeing which values are most common in a collection (e.g., most popular keywords).  
  • CLUSTER has two functions that work across all values in a field or field/qualifier combination: [1] normalizes values based on a chosen algorithm (there’s a list of options), then displays any instances (clusters) where there are multiple unique strings that have the same normalized key (e.g., “John Smith” and “Smith, John” would both normalize to “john smith” in the default algorithm, so those would cluster together); or [2] sorts every value according to criteria (e.g., length or alphanumeric pattern) and groups together the values that have the same chosen quality (e.g., all the subject values that are 1 character long, or 3 characters long, or 500 characters long).
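The default clustering behavior described above can be approximated with a short fingerprint-style sketch. This is an illustration only, not UNT's actual code; the function names are invented:

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Normalize a value to a cluster key: lowercase, strip punctuation,
    then sort the unique tokens (so "Smith, John" and "John Smith" match)."""
    tokens = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values whose normalized keys collide; keep only groups with
    more than one distinct original string (the candidates for cleanup)."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {k: sorted(g) for k, g in groups.items() if len(g) > 1}

names = ["John Smith", "Smith, John", "Jane Doe", "john  smith"]
print(cluster(names))  # {'john smith': ['John Smith', 'Smith, John', 'john  smith']}
```

OpenRefine's "fingerprint" keying method works along the same lines, which is why results from tools like this tend to feel familiar to metadata editors.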

These tools have allowed us to identify and correct a large number of issues; however, we are still enhancing various parts of our system.

Whitman College 

At Whitman, we are currently in the process of migrating from Islandora 7 to Islandora 8 as part of a LYRASIS IMLS grant, which focuses on moving from Fedora 3 to Fedora 6. As a smaller institution, we have roughly 20,000 items, and we formed a Metadata Working Group to assess all things metadata.  Members consist of our Scholarly Communications Librarian and Institutional Repository Manager, our Associate Archivist, and our Digital Assets and Metadata Librarian. The three of us have weekly group meetings and primarily use Google Sheets and Google Documents. In the beginning, the Metadata Working Group also used Trello to help keep track of our metadata assessment and cleanup; however, as we've progressed and formalized our fields, we've moved away from it.


How do you assess accuracy?

Colgate.  We have generally measured accuracy in two ways: are the record fields present (or blank), and are the values correct or appropriate for the material, i.e., this is or is not a photograph of Colgate President Ebenezer Dodge.  The first we can assess in a mostly automated manner by using filters on our spreadsheets or OpenRefine.  The second can only be assessed by a human who has some familiarity with Colgate's collections and history, which naturally takes much longer and is still subject to human error.

UNT.  This generally requires manual assessment (e.g., to determine if the record values match the item), but depending on the problem, the count or facet tools may show where something doesn't align with what we know about a collection.  Also, facet is helpful for finding mismatches between values and qualifiers (e.g., a creator/contributor name labeled "organization" but formatted as a personal name, or vice versa).

Example of the count tool; this shows that 609 serial “type” items have 0 serial title entries
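A pre-screen for the value/qualifier mismatches UNT describes can be sketched with a simple heuristic. The comma test and the qualifier codes below are assumptions for illustration, not UNT's actual rules:

```python
def looks_personal(name):
    """Heuristic: inverted personal names ("Smith, John") contain a comma;
    organization names usually do not. Crude, but good enough to flag
    candidates for human review."""
    return "," in name

def find_mismatches(creators):
    """Flag (value, qualifier) pairs where the qualifier disagrees with
    the heuristic. Qualifier codes 'per'/'org' are hypothetical."""
    flagged = []
    for value, qualifier in creators:
        if qualifier == "org" and looks_personal(value):
            flagged.append((value, qualifier))
        if qualifier == "per" and not looks_personal(value):
            flagged.append((value, qualifier))
    return flagged

records = [("Smith, John", "org"), ("Acme Publishing Co.", "org"),
           ("Doe, Jane", "per")]
print(find_mismatches(records))  # [('Smith, John', 'org')]
```

As the text notes, a flag like this only narrows the list; an editor still has to decide which value or qualifier is actually correct.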

Whitman.  Our repository is broken into several Main Collections (Archives, Honors Theses, Faculty and Student Scholarship, Maxey Museum, and Sheehan Gallery). Each Main Collection has its own measure of accuracy; however, during the past year, our Metadata Working Group created guidelines to help unify our metadata and create a structural accuracy that would be of most help upon our migration.  We began our assessment on each field, comparing the field and metadata use across all collections. In this process, we created standards for our fields and evaluated the cleanup that needed to happen in order for our legacy metadata to follow the new standards. Once decisions were made, it was up to the three of us to manually make the changes and check our metadata for accuracy.


How do you assess completeness?

Colgate.  We have different, but informal, standards of completeness depending on whether the item resides in our digital collections or in our institutional repository.  For our digital collections, which are generally used by students or faculty looking for items on a topic or theme, we have 15 standard fields that we use, and all are required if appropriate metadata exists or can be determined.  In our digital repository, which contains university-sponsored research, patrons are generally looking for a ‘known item’, and their searches are more straightforward on titles, course titles/numbers, or authors.  In our IR, we use 10 fields, and consider the record complete if we have or can capture or develop metadata for each of them.  After we migrate to Islandora 8, we also hope to add a Drupal form onto the front of our IR where submitters can deposit their own research and accompanying metadata.  Submissions would not be accepted or deemed ‘complete’ unless all of the required fields are completed with the appropriate metadata chosen from a drop-down list.

UNT.  We have a formal definition of “minimally-viable records,” which includes a metric to calculate completeness based on values in 8 fields required for every record: (main) title, language, (content) description, 2 subjects, resource type, format, collection, and institution (i.e., the partner department or organization that owns the items). Our system automatically calculates “completeness” and displays it in the item record — both in a summary view and with color-coded notations in the editing form so that editors can see which fields are missing — and on the main Dashboard, so that editors can limit their view to just complete/incomplete records.

Screenshot of completeness filter on the Dashboard (left) and item summary noting “incomplete” (right)
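A completeness metric of the kind UNT describes might be sketched like this. The field names and equal weighting are illustrative assumptions; UNT's published definition of minimally-viable records is the authoritative version:

```python
# The 7 single-valued required elements, plus a 2-subject minimum,
# give 8 scored elements in total (names are illustrative).
REQUIRED = ["title", "language", "description", "resourceType",
            "format", "collection", "institution"]
MIN_SUBJECTS = 2

def completeness(record):
    """Return the fraction of required elements present; subjects count
    as satisfied only when at least two values are supplied."""
    satisfied = sum(1 for f in REQUIRED if record.get(f))
    satisfied += 1 if len(record.get("subject", [])) >= MIN_SUBJECTS else 0
    return satisfied / (len(REQUIRED) + 1)

rec = {"title": "Campus photo", "language": "eng",
       "description": "A photograph of campus.", "resourceType": "image_photo",
       "format": "image", "collection": "UNTA", "institution": "UNTA",
       "subject": ["Campuses"]}
print(f"{completeness(rec):.3f}")  # only 1 subject, so 7 of 8 -> 0.875
```

A score like this is what lets a dashboard filter records into "complete" (score 1.0) and "incomplete" (anything less) buckets automatically.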

Additionally, we can use the count tool to find fields that may be missing for particular material types or collections.  For example, we don’t require “creator” values, but if we are looking at a collection of theses/dissertations or published books, a record that has a creator count of 0 would need to be reviewed. Similarly, materials from our Libraries’ Special Collections always include a “citation” that references the archival collection, even though this is rarely used for materials from other partners.

Whitman.  It’s hard to pinpoint exactly when our metadata assessment is complete when preparing for our migration, as we’re constantly learning new things about Islandora 8 and adapting our metadata. However, during our initial metadata assessment, we used a spreadsheet to track changes. It was a good place to indicate that a change (such as a field being removed) was completed and what changes still needed to be implemented. This came in handy specifically as we worked on several fields at once and had up to 80 individual spreadsheets that needed to be worked on.

screenshot of spreadsheets

How do you assess conformance to expectations?

Colgate.  Colgate switched from CONTENTdm to Islandora because of patron complaints about the system, and how difficult items were to discover.  Whether that was because the metadata was non-standard or the interface was not user-friendly was not clearly documented.  In a previous iteration of our institutional repository, items could be self-uploaded with any value in any field, or even incomplete information, so there was very little conformance to either internal or external expectations.  A combination of Excel filters, individual assessment, or fresh creation of metadata was all that could ensure any conformity for our IR.  

We have learned that users like to search by subject in our more ‘historical’ collections, so we try to include at least one subject field for each described item.  In the case of our regular digital collections, we use Library of Congress Subject Headings, while our institutional repository uses a combination of other standards such as JEL codes for economics papers or author supplied keywords.

Screenshot showing popular Library of Congress Subject Headings for the Edward H. Stone Collection

For internal expectations, we have a formal best practices document that all metadata creators must use that provides standardized formats for names, dates, locations, collections names, rights management, and so on.  The occasional filtering or faceting of spreadsheets and MODS datastreams allows us to check general compliance with the best practices.  

Screenshot from Colgate Digital Collections Metadata Best Practices document showing how the Creator field is to be completed and formatted

UNT.  User/external expectations are difficult to evaluate and generally require manual proofreading, but we have started reviewing subject headings across various Libraries holdings/descriptions for ethical representation.  Facet can be helpful for finding records that contain specific, problematic values.

In terms of internal expectations, structural metadata elements are built into our system, and we have extensive guidelines that document usage and formatting for all fields.  Our evaluation tools help to identify values that do not meet formatting requirements, particularly related to authorities and controlled vocabularies.  One of the adjustments added within the last year is validation for selected controlled subject vocabularies.  A warning message displays in the edit form if someone enters one of these values incorrectly, and incorrect values are also highlighted in facet and cluster when limited to those qualifiers.  This also applies to date values that do not align with EDTF formatting.

Example of clusters for Library of Congress Medium of Performance (LCMPT) subject terms
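A much-simplified check for EDTF-style dates could look like the following. Real EDTF supports far more (uncertainty qualifiers, seasons, unspecified 'X' digits, and so on), so this sketch only covers the basic level-0 shapes and is not what UNT's system actually runs:

```python
import re

# Matches YYYY, YYYY-MM, or YYYY-MM-DD; intervals are two such
# dates joined by "/" (a simplification of EDTF level 0).
EDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

def valid_edtf(value):
    """Accept a single date or a "start/end" interval in the
    simplified shapes above; anything else gets flagged for review."""
    parts = value.split("/")
    if not 1 <= len(parts) <= 2:
        return False
    return all(EDTF_DATE.match(p) for p in parts)

for d in ["1923", "1923-07", "1923-07-04", "1923/1945", "July 4, 1923"]:
    print(d, valid_edtf(d))
```

Flagging the failures (here, the free-text "July 4, 1923") is exactly the kind of signal that the edit-form warning and the facet/cluster highlighting described above surface to editors.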

Whitman.  When it comes to conformance to expectations, it’s good to look at our Taxonomy Fields (field_linked_agent, field_subject, field_genre). As of now, our process is pretty labor intensive. First we take a field (field_subject, for example) and copy the content from all our collections, pasting it into a single spreadsheet tab. Then we separate out the values using either Text to Columns (Excel) or Split into Columns (Google Sheets). This helps address repeated fields.

screenshot of taxonomy fields in spreadsheet

Once terms are separated, we manually add them to one column. From there we remove duplicates. Once duplicates are removed, we visually go down each cell to make sure our terms conform. This is a labor-intensive practice; however, it allows us to track which collection each term came from, as well as create the opportunity for a drop-down list. For our relatively small item count and timeframe, it was easier to do in Google Sheets than OpenRefine.
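The split-and-deduplicate step described above can be approximated in a few lines of code. The pipe separator is an assumption (multi-valued cells use different delimiters in different systems):

```python
def split_terms(cells, sep="|"):
    """Split multi-valued spreadsheet cells into individual terms,
    strip stray whitespace, and return the sorted unique set --
    the raw material for a controlled drop-down list."""
    terms = set()
    for cell in cells:
        for term in cell.split(sep):
            term = term.strip()
            if term:
                terms.add(term)
    return sorted(terms)

column = ["Portraits | Students", "Students|Buildings", "Portraits"]
print(split_terms(column))  # ['Buildings', 'Portraits', 'Students']
```

This is essentially what Text to Columns plus Remove Duplicates does in a spreadsheet; the manual review of each remaining term, as the text notes, still has to happen by eye.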

How do you assess consistency?

Colgate.  We are a joint metadata and cataloging department, so we use many of the same standards of uniformity with our non-MARC metadata as with our MARC metadata.  The team of paraprofessional catalogers who create or edit non-MARC metadata find it helpful to apply the rules of authority control and uniformity that we use in the ‘other half’ of our work.  Combined with our core metadata best practices documents, we try to ensure that the way we format metadata is consistent across collections, and that we use our profiled fields consistently across multiple collections with very little variability.

We hope that our migration to Islandora 8, which allows for an ‘authority file’ style drop-down list of terms, will result in significantly less cleanup, particularly when it comes to names and geographical locations.  We anticipate this will open up a broader range of metadata creation to our student workers, who will appreciate the convenience of picking a name or location from a list, making these terms accurate, compliant, and less prone to typos or inconsistencies.

Screenshot of spreadsheet being filtered to show divergence in names found in the Stone collection

UNT.  Generally we think of consistency as representing values in standardized ways to ensure that exact-string matching or searching for specific phrases could find all relevant items.  Cluster and facet are both particularly useful to compare value formatting and usage for consistency.  Within particular collections, consistency may also be evaluated in terms of whether fields are used in the same ways, which tends to show up in count.

Example of photographer names in the facet tool

Cluster examples of (a) contributor names grouped by normalized key and (b) dates with the same alpha-numeric pattern


Whitman.  The standards we created as a group included an in-depth description of what our metadata would look like going forward.

screenshot of descriptive fields

We also created a document that depicted examples of our metadata in each field and collection. 

screenshot of examples document

As we assessed our metadata and remediated it to conform to our standards, we came across areas in which conformance was not possible (either it was not feasible, or the collection did not belong to the library). In those cases we indicated the exceptions in our documentation. Throughout our assessment, we continuously and manually checked for consistency both between and within our collection spreadsheets. Much like checking for conformance to expectations, using spreadsheets and having multiple eyes checking for inconsistencies helped make sure our metadata was consistent both in following our policies and in its formatting.


How do you assess provenance?

Colgate.  We use Islandora, which supports storage of multiple versions of metadata.  We can roll back metadata to previous versions in case of mistake, disaster, or re-assessment.  It helps keep us honest when we can see who is responsible for the work being done, and it also lets us identify the legacy metadata (by creation date) that was created by vendors or migrated in from our CONTENTdm system.

Screenshot of versioning of MODS datastream for a digital image in Islandora showing dates created, by whom, and the ability to delete or revert to previous versions (operator names obscured for privacy)

UNT.  We version all of our metadata and make the history/previous versions available to editors.  Although we don’t generally assess this (except for a couple of papers comparing change across versions), it can be useful when doing other assessment or review activities to see who “touched” the record, what they changed, or when specific values/changes were introduced into a record.  We also have the ability to “roll back” changes in extreme circumstances, although this isn’t something we have used.

Example of history for an item record, showing timestamps and editors with change summaries and versions

Whitman.  We have future plans to assess and fine-tune the provenance of our metadata. As of now, that information lives in archived spreadsheets across our shared Drive.


How do you define/measure timeliness?

Colgate.  We used the migration of our institutional repository as a chance to examine the timeliness and currency of our metadata and descriptions across collections.  Some collections had been bouncing around for years, variously described in Dublin Core, MARC, or sometimes according to no schema at all.  The IR is now fully MODS compliant, and it feels like we are on a strong foundation to accept and effectively describe new submissions in the fall.  Looking more broadly, we are now sure that dates have been a particular problem in our history collections that were migrated from CONTENTdm.  We plan to use Islandora’s filter option to look for missing date ranges where we are confident we have coverage, and OpenRefine to clean and standardize date-related metadata.

Screenshot from student newspaper collection duplicating coverage dates, rather than showing only one, indicating a date migration problem from legacy metadata
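Date cleanup of the kind Colgate plans, whether done in OpenRefine or in a script, often boils down to trying a few known legacy formats in order. A minimal sketch, with the format list as an assumption about what the legacy data contains:

```python
from datetime import datetime

# Candidate legacy formats, tried in order (an assumption; real
# collections would need their own list built from a facet review).
PATTERNS = ["%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d", "%Y"]

def normalize_date(value):
    """Emit ISO 8601 (YYYY-MM-DD, or bare YYYY for year-only values);
    return None when nothing matches, so the value gets manual review."""
    value = value.strip()
    for fmt in PATTERNS:
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return str(dt.year) if fmt == "%Y" else dt.date().isoformat()
    return None

for raw in ["May 3, 1921", "05/03/1921", "1921", "circa 1920s"]:
    print(raw, "->", normalize_date(raw))
```

Values like "circa 1920s" that fall through to None are exactly the ones that need a human decision, which matches Colgate's plan to combine automated filtering with manual cleanup.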

UNT.  Our system contains a large amount of imported data from government databases and old values from early records that do not conform to current guidelines.  Cluster and facet, in particular, help us to identify where values or formatting need to be updated.  We can also use count to locate records that we might want to target for additional research (e.g., historical photos of the UNT campus that don’t have dates or locations that the wider community might be able to identify).  Additionally, users often submit feedback about names, locations (including place points), or other information related to particular items that we can add to records.

Example of two types of text soliciting location information from users in a public record

Whitman.  When it comes to the timeliness of creating metadata, the process varies between the different Main Collections. For example, Faculty and Student Scholarship and Archives tend to contain items that are likely to fall into the “timely ingest” category so classes can use their content. Theses, on the other hand, take a lengthier path. First the theses are cataloged in OCLC, then the metadata is added to our repository. This process can take up to two months depending on our cataloger’s workload.

For our migration to Islandora 8, we started the remediation process in Summer of 2020 and are still fine-tuning it a year later. This lengthy process allowed us to assess our metadata and improve legacy descriptions. Though the clean up took a year, we acknowledge that it was time well spent as our new items will now have standards and examples to follow, making it easier and faster to add to our repository. 


What challenges do you have evaluating metadata?

Colgate.  Since we are a small department with a dual focus (metadata and cataloging), we are torn between balancing demands to create new metadata for new collections and to improve and standardize legacy metadata.  Some of the most commonly requested changes (addressing missing or ill-formatted date metadata) are also the most difficult and time-consuming to fix, as they cannot be made via a batch process.  We are trying to target collection-by-collection assessment by prioritizing projects that improve the overall user experience, rather than just cleanup for cleanup’s sake.

Our general metadata review workflow is also not sustainable, with many creators and only one reviewer. On our wishlist is another metadata librarian who can focus on standards compliance at the point of collection creation, as well as training and supervising student workers.  

UNT.  Despite our evaluation tools, there are still issues that are difficult for us to find or change.  For example, we can easily find unique values that exist across records, but not values that are duplicated in the same record.  Also, editors still need to determine which values are correct and know the expectations for various items or collections, which requires quite a bit of manual review or correction.  Additionally, some issues will always require an editor (or supervisor) to check metadata values against an item to verify accuracy or make a judgement call about how the item has been described within the bounds of our established guidelines and the context of the particular collection.
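Finding values duplicated within a single record, the gap noted above, is straightforward once records are exported for analysis. A hypothetical sketch, assuming records are dictionaries with list-valued repeatable fields:

```python
from collections import Counter

def intra_record_duplicates(record):
    """Return a mapping of field -> values that appear more than once
    within one record (e.g., the same subject entered twice)."""
    dupes = {}
    for field, values in record.items():
        if isinstance(values, list):
            repeated = [v for v, n in Counter(values).items() if n > 1]
            if repeated:
                dupes[field] = repeated
    return dupes

rec = {"title": "Annual report", "subject": ["Budgets", "Reports", "Budgets"]}
print(intra_record_duplicates(rec))  # {'subject': ['Budgets']}
```

Run over an export of all records, a check like this would surface the within-record repeats that cross-record uniqueness tools miss, though an editor would still decide which copy to keep.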

Another general concern is determining the best ways to prioritize and distribute resources to review and correct records across our system.  Our edit tools help us to find potential issues system-wide, but, like all institutions, we have limited time and resources available to review or address those issues.  Determining the most effective way to deploy editors or identify the “most important” issues is an area of ongoing research for us.

Whitman.  We had a couple of challenges when it came to evaluating our metadata. First was the timing of our plan to evaluate and remediate our metadata. We were selected for the LYRASIS grant, in which our content would migrate from Islandora 7 to Islandora 8, around the time we were beginning our metadata assessment and remediation. This was challenging in the sense that a lot of information needed to be kept in mind, not only for the metadata remediation, but also to prepare for a MODS-RDF migration and to document the process in detail. Our Metadata Librarian often had multiple projects going on in a single day, including planning the next fields to be assessed during our Metadata Working Group meetings, fixing fields that had just been assessed, and mapping all fields to RDF. Along with that, there were some communication challenges: some group members referred to fields by their public label, while others referred to them by their spreadsheet label (example: “Access Statement” vs. access_condition).

The second challenge in evaluating metadata was spreadsheet versioning. We were lucky enough to have a collection of manageable size, so the work could be done in spreadsheets; however, it was difficult keeping track of the various versions of spreadsheets and who did what work. Along with that, it was challenging to update each spreadsheet as we changed field names.

The post Case Studies: Metadata Assessment appeared first on DLF.

OpArt: The Art Research Collective Collection in Motion / HangingTogether

“Descending” by Bridget Riley (1965). Gemeentemuseum Den Haag; Rob Oo from NL, CC BY 2.0, via Wikimedia Commons

Optical art – Op Art for short – explores the illusion of movement in two-dimensional spaces. To create the impression that the images on a canvas are in motion – pulsating, moving in waves, blinking on and off, etc. – artists in this genre carefully orchestrate detailed patterns of shapes and colors to arrive at the desired effect. And that brings us to collective collections …

OCLC Research has done quite a bit of work using WorldCat bibliographic and holdings data to offer a view of what the collective collections of groups of institutions might look like – size, overlap patterns, salient characteristics, and so on. But ideally, collective collections should be more than static images in data; instead, they should be collections in motion: dynamic, functional collections that release value both to their stewards and users. For this, we need to think about the orchestrations that lie underneath – not of patterns of shapes and colors, but of collection characteristics and collaboration models that make the collective collection come to life, or in other words, put it in motion.  

This is the theme motivating a new OCLC Research project, Operationalizing the Art Research Collective Collection (OpArt). Art libraries play a vital role in supporting art scholarship within their own institutions and in the broader scholarly community. But the economic impact and other repercussions of the COVID pandemic are accelerating and amplifying challenges to institutional sustainability, and the consequences are likely to persist into the coming decade. An important component in building sustainability is identifying new opportunities for collaboration that will address the challenges brought on or exacerbated by the pandemic. OpArt will help art libraries find sustainable pathways through this season of diminished resources and shared challenges by identifying potential collaborative models around collective collections, and illuminating the practical considerations involved in such collaborations.

OpArt is a four-phase research project aimed at developing a better understanding of the relationship between art, academic, and independent research libraries, through an analysis of their collections and resource sharing activity. We will then use these findings to inform an exploration of new possibilities for collaboration and partnership models that support sustainable, ongoing availability of the rich collections of art libraries to researchers, wherever they may be.

The four phases of OpArt include:

  • Analyze Collective Collections: How do art library collections, and the collections of other types of libraries, compare against one another? What areas of overlap and redundancy do we see? What does this tell us about potential opportunities for cooperation, coordination, and/or sharing across collections?
  • Analyze Collection Sharing Patterns: What patterns exist in resource sharing activity (past and present) across art library collections, and across art library collections and the collections of other types of libraries? What does this tell us about the opportunities for cooperation, coordination, and sharing that art libraries are currently leveraging? 
  • Explore Collaborative Case Studies: Building on the findings from the first two phases, we will take a “deep dive” into several case studies of regional art library cooperation/coordination/sharing. What insights and lessons can we learn from these examples?
  • Operationalize Collaboration: What are the practical challenges of creating and maintaining cross-institutional relationships that can deliver the opportunities highlighted in our collections and resource sharing analyses, and the case studies? What general recommendations can be identified to help art libraries create and manage partnerships around their collections?  

The concept for this project originated in a discussion in 2019 between members of the OCLC Research Library Partnership (RLP), a transnational collaborative network formed to address issues of collective interest to research libraries. The discussion focused on an acute lack of space at art research libraries, difficulties in arranging for offsite storage of art research print collections, a lack of knowledge regarding the library collections of peer institutions, and the perceived value of art libraries partnering with other types of libraries on the shared management of print collections. The COVID pandemic has amplified the urgency of these and other issues related to the long-term sustainability of art research collections. 

The OpArt project will engage key stakeholders as partners in our research, tapping into the deep expertise within the RLP membership – including those members participating in SHARES, the resource sharing arm of the RLP. Art libraries are an important part of the RLP and SHARES, and the experiences of staff at these institutions will be critical in informing our case studies work, as well as providing guidance for all aspects of the investigation. In addition to regular engagement with the RLP membership, an advisory committee drawn primarily from SHARES member institutions will provide advice and consultation throughout the project, including helping to frame the collective collections and resource sharing analyses, identify case study partners, interpret results, and finalize recommendations. The advisory committee membership includes:

  • Jon Evans, Chief of Libraries and Archives, Museum of Fine Arts, Houston
  • Rebecca Friedman, Assistant Librarian, Marquand Library, Princeton University
  • Roger Lawson, Executive Librarian, National Gallery of Art
  • Autumn Mather, Director, Ryerson and Burnham Libraries, Art Institute of Chicago
  • Lori Salmon, Head, Institute of Fine Arts Library, New York University
  • Keli Rylance, Head Librarian, Richardson Memorial Library, Saint Louis Art Museum
  • Kathleen Salomon, Chief Librarian, Associate Director, Getty Research Institute
  • Tony White, University Librarian, OCAD University

The OpArt project is supported through a grant by the Samuel H. Kress Foundation, with significant co-investment from OCLC. It draws on OCLC Research’s deep experience analyzing collective collections, including those of art libraries (see An Art Resource in New York: The Collective Collection of the NYARC Art Museum Libraries), as well as more recent work focusing on strategies for operationalizing collective collections (see Operationalizing the BIG Collective Collection: A Case Study of Consolidation vs Autonomy). And as mentioned above, it benefits from the experiences and perspectives of practitioners engaged in the stewardship of art research collections.

In the time of COVID, budget realities and sustainability challenges are more acute than ever. Art libraries will look to innovative partnerships and collaborations as one way to move forward. Understanding the scope for cooperation and coordination between art, academic, and independent research libraries can help identify new collaborative models to support the continued availability of art research resources – and contribute toward putting the art research collective collection in motion. Stay tuned as the OpArt project explores these exciting topics!

The post OpArt: The Art Research Collective Collection in Motion appeared first on Hanging Together.

MarcEdit 7.5 Installer Questions / Terry Reese

Hi all,

As I continue to refine the Installer for 7.5 – I had a couple of specific questions.

How the installer works now

When the installer first runs, the first step is a dependency evaluation.  MarcEdit requires the following dependency to run the core program:

  • .NET 5.0 Desktop Runtime (32 bit or 64 bit/32 bit)

If this dependency is missing, the installer has the runtimes embedded and will attempt to install them; installing the dependency requires Admin permissions.  After dependencies are installed, if the user selects a per-user install, the program will install the application with user permissions and will *not* install COM components.  If installed for all users, an admin must run the install, and COM objects *will* be installed.  If the OCLC plugin is selected, an admin install will be needed to install the ODBC dependencies.

My Questions

  1. Because of the .NET 5.0 Desktop Runtime, there are dependencies that require Admin permissions to install the program.  A few questions come out of that.
    • Right now, I’ve removed the pre-req install dialogs and just prompt for install of the dependencies.  I find the reduced dialogs desirable, but I’m wondering if this is true for others.  If more information is better, I can put these dialogs back.
    • Do we need a true user-permissions-only installation?  There is a way to do this.  I can publish a build that “freezes” the framework files into the application folder, which would then allow for a truly user-only build.  It would be a big installer (~350 MB), but I could certainly do that.  That would create a fully user-only install without the need for any admin installation (it would also remove the OCLC plugin option in the installer, since that *must* have admin permissions to install; it’s a system component).
    • Currently, the installer on 64 bit systems will install both the 32 and 64 bit runtimes in order for COM functionality to work.  I could pull out the COM installation as a “feature” in the installer and change the requirements accordingly.  It would work like the OCLC plugin option – and allow users to go back to the installer and modify the install to add these features as necessary.
  2. COM automation has long been something MarcEdit has supported, which is why I’ve been working hard to make sure this continues in 7.5.  I’d implemented this early on but didn’t capture bitness because, at this point, all of my code development and the software I personally use is 64-bit.  I’ve started to either retire or find replacements for 32-bit applications.  I’d like to understand how much automation is occurring.  Some things I could consider doing:
    • For users, activate COM for users only (I can do this manually in the registry.  I’m not doing so now, and Microsoft’s official position is that they’d prefer I didn’t, but I could do this pretty easily since the user hive doesn’t affect the local system)
    • For COM support, bitness is important.  I could create install options to allow users to decide what COM they want to support (all, none, system type) which then changes what dependencies are required for installation.
  3. How is the unified installer working for people?

    So, from my perspective, a single installer makes my life a lot easier.  It’s easier for me to track, to build, and to test.  When I was building 4 installers, the install process was taking me an extra 3-4 hours of testing, as I had to maintain a significant number of VMs within my infrastructure to cover a larger number of install/update scenarios.  A single build simplifies some of that.  I can consider some different build types, but that work definitely makes the management process a bit more difficult on my end.  At this point, I’m looking for feedback from the community around what is easier, as the goal is to make the install/update process as easy as possible.  Ensuring the program remains updated also makes my life easier: it helps when questions come up, since I’m primarily testing/providing solutions using the current version of the application.
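On the per-user COM idea in question 2, a rough sketch of why per-user registration never touches the machine hive, and where bitness enters. This is purely illustrative Python (the CLSID is a placeholder, not one of MarcEdit's, and the paths are the conventional COM registration keys, not code from MarcEdit itself):

```python
# Illustration only: the registry key a COM server would be registered
# under, for a hypothetical placeholder CLSID.
CLSID = "{00000000-0000-0000-0000-000000000000}"  # placeholder, not real

def com_key(per_user: bool, bits32_on_64: bool) -> str:
    """Build the conventional registry key for a COM server."""
    # Per-user registration lives entirely in HKEY_CURRENT_USER, so it
    # needs no admin rights and never affects other accounts.
    hive = "HKEY_CURRENT_USER" if per_user else "HKEY_LOCAL_MACHINE"
    # 32-bit COM servers on 64-bit Windows are registered under the
    # WOW6432Node reflection of the Classes key -- this is why bitness
    # matters for which runtimes the installer must provide.
    node = r"Software\Classes\WOW6432Node\CLSID" if bits32_on_64 \
        else r"Software\Classes\CLSID"
    return f"{hive}\\{node}\\{CLSID}"

# Per-user, 64-bit registration (no admin needed):
print(com_key(per_user=True, bits32_on_64=False))
# System-wide registration of a 32-bit server on a 64-bit machine:
print(com_key(per_user=False, bits32_on_64=True))
```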

Thanks for the feedback,


Autonowashing / David Rosenthal

On the 16th Tom Krisher reported that US Opens Formal Probe Into Tesla Autopilot System:
The U.S. government has opened a formal investigation into Tesla’s Autopilot partially automated driving system after a series of collisions with parked emergency vehicles.

The investigation covers 765,000 vehicles, almost everything that Tesla has sold in the U.S. since the start of the 2014 model year. Of the crashes identified by the National Highway Traffic Safety Administration as part of the investigation, 17 people were injured and one was killed.

NHTSA says it has identified 11 crashes since 2018 in which Teslas on Autopilot or Traffic Aware Cruise Control have hit vehicles at scenes where first responders have used flashing lights, flares, an illuminated arrow board or cones warning of hazards.
The agency has sent investigative teams to 31 crashes involving partially automated driver assist systems since June of 2016. Such systems can keep a vehicle centered in its lane and a safe distance from vehicles in front of it. Of those crashes, 25 involved Tesla Autopilot in which 10 deaths were reported, according to data released by the agency.
On the 19th Katyanna Quach reported that Senators urge US trade watchdog to look into whether Tesla may just be over-egging its Autopilot, FSD pudding:
Sens. Edward Markey (D-MA) and Richard Blumenthal (D-CT) put out a public letter [PDF] addressed to FTC boss Lina Khan on Wednesday. In it, the lawmakers claimed "Tesla’s marketing has repeatedly overstated the capabilities of its vehicles, and these statements increasingly pose a threat to motorists and other users of the road."
These are ridiculously late. Back in April, after reading Mack Hogan's Tesla's "Full Self Driving" Beta Is Just Laughably Bad and Potentially Dangerous, I wrote Elon Musk: Threat or Menace?:
I'm a pedestrian, cyclist and driver in an area infested with Teslas owned, but potentially not actually being driven, by fanatical early adopters and members of the cult of Musk. I'm personally at risk from these people believing that what they paid good money for was "Full Self Driving". When SpaceX tests Starship at their Boca Chica site they take precautions, including road closures, to ensure innocent bystanders aren't at risk from the rain of debris when things go wrong. Tesla, not so much.
I'm returning to this topic because an excellent video and two new papers have shown that I greatly underestimated the depths of irresponsibility involved in Tesla's marketing.

Let me be clear. Tesla's transformation of electric cars from glorified golf carts to vehicles with better performance, features and economy than their conventional competitors is both an extraordinary engineering achievement and unambiguously good for the planet.

Family members drive a Model 3 and are very happy with it. This post is only about the systems that Tesla tells regulators are a Level 2 Automated Driver Assist System (ADAS) but that Tesla markets to the public as "Autopilot" and "Full Self-Driving".

Four years ago John Markoff wrote about Waymo's second thoughts about self-driving cars in Robot Cars Can’t Count on Us in an Emergency:
Three years ago, Google’s self-driving car project abruptly shifted from designing a vehicle that would drive autonomously most of the time while occasionally requiring human oversight, to a slow-speed robot without a brake pedal, accelerator or steering wheel. In other words, human driving was no longer permitted.

The company made the decision after giving self-driving cars to Google employees for their work commutes and recording what the passengers did while the autonomous system did the driving. In-car cameras recorded employees climbing into the back seat, climbing out of an open car window, and even smooching while the car was in motion, according to two former Google engineers.
As someone who was sharing the road with them, I can testify that seven years ago Waymo's cars were very good at self-driving, probably at least as good as Tesla's are now. But Waymo had run into two fundamental problems:
  • Over-trust or complacency. Markoff wrote:
    Over-trust was what Google observed when it saw its engineers not paying attention during commutes with prototype self-driving cars. Driver inattention was implied in a recent National Highway Traffic Safety Administration investigation that absolved the Tesla from blame in a 2016 Florida accident in which a Model S sedan drove under a tractor-trailer rig, killing the driver.
    The better the system works most of the time, the less likely the driver is to be paying attention when it stops working.
  • The hand-off problem. Markoff wrote:
    Last month, a group of scientists at Stanford University presented research showing that most drivers required more than five seconds to regain control of a car when — while playing a game on a smartphone — they were abruptly required to return their attention to driving.

    Another group of Stanford researchers published research in the journal Science Robotics in December that highlighted a more subtle problem. Taking back control of a car is a very different experience at a high speed than at a low one, and adapting to the feel of the steering took a significant amount of time even when the test subjects were prepared for the handoff.
    But as I wrote at the time:
    But the problem is actually much worse than either Google or Urmson say. Suppose, for the sake of argument, that self-driving cars three times as good as Waymo's are in wide use by normal people. A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life.

    Even if, when the hand-off happened, the human was not "climbing into the back seat, climbing out of an open car window, and even smooching" and had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life?
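The back-of-the-envelope numbers in that quote check out. A quick sketch, with my own assumed inputs (roughly 13,500 miles driven per year, a US average, and a 60-year driving lifetime; neither figure is from the post):

```python
# Checking the hand-off arithmetic quoted above. The 15,000-mile figure
# is the post's hypothetical (a car 3x better than Waymo's); the annual
# mileage and driving-lifetime figures are my assumptions.
miles_per_handoff = 15_000
miles_per_year = 13_500
driving_years = 60

handoffs_per_year = miles_per_year / miles_per_handoff
lifetime_handoffs = handoffs_per_year * driving_years

print(round(handoffs_per_year, 2))  # 0.9 -- less than once a year
print(round(lifetime_handoffs))     # 54 -- "maybe 50 times in their life"
```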
Waymo, the company that knows the problem best, understood seven years ago that "Full Self-Driving" was decades away. But Elon Musk started selling customers "Full Self-Driving" five years ago, two years after Waymo had decided it was too hard to be feasible in the medium term. Neither Waymo nor Tesla has an excuse for missing the problems of complacency or hand-off. They were well-understood in aviation and covered in 1997's Humans and Automation: Use, Misuse, Disuse, Abuse by Raja Parasuraman and Victor Riley:
Most automated systems are reliable and usually work as advertised. Unfortunately, some may fail or behave unpredictably. Because such occurrences are infrequent, however, people will come to trust the automation. However, can there be too much trust? Just as mistrust can lead to disuse of alerting systems, excessive trust can lead operators to rely uncritically on automation without recognizing its limitations or fail to monitor the automation's behavior. Inadequate monitoring of automated systems has been implicated in several aviation incidents — for instance, the crash of Eastern Flight 401 in the Florida Everglades. The crew failed to notice the disengagement of the autopilot and did not monitor their altitude while they were busy diagnosing a possible problem with the landing gear
That was published seventeen years before Waymo figured it out.

Musk was not just selling a product he couldn't deliver, he was selling an investment that couldn't deliver — the idea that a "Full Self-Driving" Tesla would make its owner a profit by acting as an autonomous taxi. Tesla's marketeers faced a choice that should not have been hard, but obviously was. They could either tell Musk to back off his hype (and get fired), or go along. Going along required two new marketing techniques, "Autonowashing" the product and "Econowashing" the investment.


Mahmood Hikmet's must-watch YouTube video Is Elon Musk Killing People? is an excellent introduction to the thesis of Liza Dixon's Autonowashing: The Greenwashing of Vehicle Automation:
According to a recent study, “automated driving hype is dangerously confusing customers”, and, further, “some carmakers are designing and marketing vehicles in such a way that drivers believe they can relinquish control” (Thatcham Research, 2018). Confusion created by OEMs via their marketing can be dangerous, “if the human believes that the automation has more capability than it actually has.” (Carsten and Martens, 2018). The motivations for this are clear: “Carmakers want to gain competitive edge by referring to ‘self-driving’ or ‘semi-autonomous’ capability in their marketing...” (Thatcham Research, 2018). As a result, a recent survey found that 71% of 1,567 car owners across seven different countries believed it was possible to purchase a “self-driving car” today (Thatcham Research, 2018).
Dixon uses three case studies to illustrate autonowashing. First, media headlines:
Over the past decade, terms such as “autonomous”, “driverless”, and “self-driving” have made increasing appearances in media headlines. These buzzwords are often used by media outlets and OEMs to describe all levels of vehicle automation, baiting interest, sales and “driving traffic” to their respective sites. It is not uncommon to come across an article discussing Level 2 automation as “autonomous” or a testing vehicle as “driverless”, even though there is a human safety driver monitoring the vehicle and the environment
Second, the Mercedes E class sedan:
In 2016, Mercedes-Benz launched a new advertising campaign called “The Future” in order to promote the new automated features launching in its E-Class sedan. The campaign stated:
“Is the world truly ready for a vehicle that can drive itself? An autonomous-thinking automobile that protects those inside and outside. Ready or not, the future is here. The all new E-Class: self-braking, self-correcting, self-parking. A Mercedes-Benz concept that's already a reality.”
The headline of one of the ads read, “Introducing a self-driving car from a very self-driven company.”
Mercedes pulled the campaign, in part because it appeared just after a fatal Autopilot crash and in part because consumer groups were pressuring the FTC.

But primarily Dixon focuses on Tesla's marketing:
It is explicitly stated on the Tesla website and in the vehicle owner's manual in multiple instances that the driver must keep their hands on the wheel and their attention on the road ahead (Tesla, 2019b, 2019a). Despite these statements, Tesla is the only OEM currently marketing Level 2, ADAS equipped vehicles as “self-driving” (The Center for Auto Safety and Consumer Watchdog, 2018).

In October 2016, Tesla announced that “all Tesla vehicles produced in our factory...will have the hardware needed for full self-driving capability at a safety level substantially greater than that of a human driver” (Tesla Inc., 2016a) (see Fig. 2). This announcement also came with the sale of a new Autopilot option called “Full Self-Driving Capability” (FSD). Tesla stated that customers who purchased the FSD upgrade would not experience any new features initially but that in the future, this upgrade would enable the vehicle to be “fully self-driving” (Lee, 2019). This option was later removed, but then subsequently reintroduced for sale in February of 2019.
Dixon Fig. 3
The message that "Full Self-Driving" was actually available, or at least just around the corner, was reinforced in many ways, including promotional videos but primarily by Elon Musk personally hyping the capability:
Tesla's CEO Elon Musk has promoted “Full Self-Driving Capability” on his personal Twitter account, in one case stating “Tesla drives itself (no human input at all) thru urban streets to highway to streets, then finds a parking spot” without clarifying that this feature is not yet enabled (@ elonmusk, 2016). Further, Musk has been seen in multiple TV interviews (Bloomberg, 2014; CBS News, 2018) removing his hands from the wheel with Autopilot active. In one of these examples, he did so and stated, “See? It's on full Autopilot right now. No hands, no feet, nothing,” as he demonstrates the system to the interviewer, who is sitting in the passenger seat (Fig. 3) (Bloomberg, 2014). This behavior is at odds with appropriate use, and is explicitly warned against in the Tesla Owner's Manual (Tesla, 2019a).
Screen grab
Tesla's 2016 "Paint It Black" advert was even more explicit.

Let's get real. It is half a decade later and Gabrielle Coppola and Mark Bergen have just published Waymo Is 99% of the Way to Self-Driving Cars. The Last 1% Is the Hardest:
In 2017, the year Waymo launched self-driving rides with a backup human driver in Phoenix, one person hired at the company was told its robot fleets would expand to nine cities within 18 months. Staff often discussed having solved “99% of the problem” of driverless cars. “We all assumed it was ready,” says another ex-Waymonaut. “We’d just flip a switch and turn it on.”

But it turns out that last 1% has been a killer. Small disturbances like construction crews, bicyclists, left turns, and pedestrians remain headaches for computer drivers. Each city poses new, unique challenges, and right now, no driverless car from any company can gracefully handle rain, sleet, or snow. Until these last few details are worked out, widespread commercialization of fully autonomous vehicles is all but impossible.
Musk wasn't alone in having excessively optimistic timelines, but he was alone in selling vehicles to consumers based on lying about their capabilities. This is bad enough, but the story Hikmet tells is worse. You need to watch his video for the details, but here is the outline (square brackets are timestamps for the video):

Tesla's description of "Autopilot" and "Full Self-Driving" reads:
Autopilot and Full Self-Driving Capability are intended for use with a fully attentive driver, who has their hands on the wheel and is prepared to take over at any moment.
In other words, when these automated systems are in use the driver must monitor their behavior and be ready to respond to any anomalies. Dixon writes:
There is a long-standing consensus in human-automation interaction literature which states that humans are generally poor monitors of automation (Bainbridge, 1983; Sheridan, 2002; Strand et al., 2014). Partial vehicle automation requires a shift in the role of the user, from manual control to a supervisory role. Thus, the demand on the user for monitoring increases
Humans aren't good at the monitoring task; because the system works well most of the time they become complacent. In Tesla's Q1 2018 earnings call Musk explained the problem [29:30]:
when there is a serious accident on autopilot people for some reason think that the driver thought the car was fully autonomous and we somehow misled them into thinking it was fully autonomous it is the opposite case when there is a serious accident maybe always the case that it is an experienced user ... the issue is more one of complacency like we just get too used to it
Thus it is necessary to equip vehicles with Driver Monitoring Systems (DMS), which ensure that the driver is actually paying attention to their assigned task. This has long been a standard in railroad practice. Hikmet's story is essentially about Tesla's DMS, and the conflict it posed between the need to ensure that customers were "fully attentive" at all times, and Elon Musk's irresponsible hype.

Other car companies' DMS are effective. They combine:
  • Capacitive sensors ensuring that the driver's hands are on the wheel.
  • A camera system looking at the driver, with image processing software ensuring that the driver is looking at the road.
  • Infra-red illumination ensuring that the camera system continues to operate at night.
The problem for Tesla was that these systems are not just relatively expensive but also highly intrusive. They would, for example, have been triggered during Elon Musk's televised demonstrations, and completely destroyed the "Look Ma, No Hands" message that Tesla had "Full Self-Driving". They would have constantly reminded customers that the high-tech features they had paid for did not actually exist.

Tesla's solution to this dilemma was to implement a DMS using a torque sensor to determine whether the driver's hands were on the wheel. This suffered from two problems: it did not determine whether the driver was looking at the road, or even sitting in the driver's seat, and the torque needed to activate the sensor was easy to provide with, as Consumer Reports demonstrated, a roll of tape [18:13]. Hikmet reports that purpose-built "Steering Wheel Boosters" are easily purchased on-line.

Musk's explanation for why Tesla hadn't adopted the industry standard DMS technology emerged in an April 2019 interview with Lex Fridman [31:19]:
Fridman: Do you see Tesla's Full Self-Driving for a time to come requiring supervision of the human being?
Musk: I think it will require detecting hands on wheel for at least 6 months or something like that. ... The system's improving so much, so fast, that this is going to be a moot point very soon. If something's many times safer than a person, then adding a person the effect on safety is limited. And in fact it could be negative. ... I think it will become very quickly, maybe towards the end of this year, I'll be shocked if its not next year at the latest, having a human intervene will decrease safety.
Steve Jobs notoriously possessed a "reality distortion field", but it pales compared to Musk's. Not the "end of this year", not "next year at the latest", but two years after this interview NHTSA is investigating why Teslas crash into emergency vehicles and Mack Hogan, writing for the authoritative Road and Track, started an article:
if you thought "Full Self Driving" was even close to a reality, this video of the system in action will certainly relieve you of that notion. It is perhaps the best comprehensive video at illustrating just how morally dubious, technologically limited, and potentially dangerous Autopilot's "Full Self Driving" beta program is.
Why can Musk make the ludicrous claim that Autopilot is safer than a human driver? Hikmet explains that it is because Tesla manipulates safety data to autonowash their technology [20:00]. The screengrab shows Tesla's claim that Autopilot is ten times safer than a human. Hikmet makes three points:
  1. Tesla compares a new, expensive car with the average car, which in the US is eleven years old. One would expect the newer car to be safer.
  2. Tesla compares Autopilot, which works only on the safest parts of the highway network, with the average car on all parts of the network.
  3. Tesla doesn't disclose whether its data includes Teslas used in other countries, almost all of which have much lower accident rates than the US.
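Point 2 is the classic base-rate trap, and it is worth seeing how large the effect can be. A toy calculation with entirely invented numbers (these are not Tesla's or NHTSA's figures): even if Autopilot were exactly as safe as the average driver on every road type, the road mix alone would make it look dramatically safer.

```python
# Invented per-road crash rates (crashes per million miles). The same
# rates apply to both groups, so there is no real safety difference.
crash_rate = {"highway": 0.5, "city": 2.0}

# Assumed mileage mixes: the whole fleet drives everywhere, while
# Autopilot miles are almost entirely highway miles.
fleet_miles = {"highway": 0.4, "city": 0.6}
autopilot_miles = {"highway": 0.95, "city": 0.05}

def blended_rate(mix):
    """Mileage-weighted average crash rate for a given road mix."""
    return sum(crash_rate[road] * share for road, share in mix.items())

fleet = blended_rate(fleet_miles)          # 0.4*0.5 + 0.6*2.0 = 1.400
autopilot = blended_rate(autopilot_miles)  # 0.95*0.5 + 0.05*2.0 = 0.575

# Identical per-road safety, yet the naive comparison shows a large
# apparent advantage, purely from where the miles were driven:
print(round(fleet / autopilot, 2))  # 2.43
```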
NHTSA's 2017 report on an early Autosteer fatality depended upon data Tesla generated. Elon Musk tweeted [24:57]:
Report highlight: "The data show that the Tesla vehicles crash rate dropped by almost 40% after Autosteer installation"
Two years later, after they FOIA'ed the data, Quality Control Systems Corporation published a report entitled NHTSA's Implausible Safety Claim for Tesla's Autosteer Driver Assistance System:
we discovered that the actual mileage at the time the Autosteer software was installed appears to have been reported for fewer than half the vehicles NHTSA studied. For those vehicles that do have apparently exact measurements of exposure mileage both before and after the software's installation, the change in crash rates associated with Autosteer is the opposite of that claimed by NHTSA - if these data are to be believed.

For the remainder of the dataset, NHTSA ignored exposure mileage that could not be classified as either before or after the installation of Autosteer. We show that this uncounted exposure is overwhelmingly concentrated among vehicles with the least "before Autosteer" exposure. As a consequence, the overall 40 percent reduction in the crash rates reported by NHTSA following the installation of Autosteer is an artifact of the Agency's treatment of mileage information that is actually missing in the underlying dataset.
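QCSC's point can be shown with a toy calculation. The numbers below are invented, chosen so the artifact comes out to the same 40% figure purely for illustration: if a chunk of pre-Autosteer mileage is missing from the data but the pre-Autosteer crashes are still counted, the "before" crash rate is inflated and an apparent post-installation improvement appears out of thin air.

```python
# Invented figures: identical true crash rates before and after
# Autosteer, so the real effect of the software here is zero.
before_crashes, before_miles_true = 40, 50.0  # miles in millions
after_crashes,  after_miles       = 40, 50.0

true_before_rate = before_crashes / before_miles_true  # 0.8
after_rate       = after_crashes / after_miles         # 0.8

# Now suppose only 60% of the "before" mileage is actually recorded,
# while all the "before" crashes remain in the numerator:
before_miles_recorded = before_miles_true * 0.6
biased_before_rate = before_crashes / before_miles_recorded  # 1.333

apparent_drop = 1 - after_rate / biased_before_rate
print(f"{apparent_drop:.0%}")  # 40% -- purely an artifact of missing miles
```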
Musk is high on his own supply. The interview continues [32:57]:
Fridman: Many in the industry believe you have to have camera-based driver monitoring. Do you think there could be benefit gained from driver monitoring?
Musk: If you have a system that's at or below human-level reliability, then driver monitoring makes sense, but if your system is dramatically better, more reliable than a human then driver monitoring does not help much.
Musk's reality distortion field is saying Autopilot is "dramatically better ... than a human". Who are you going to believe — Musk or the guys in the 11 emergency vehicles hit by Teslas on Autopilot?

As Timothy B. Lee reported nearly a year ago in Feds scold Tesla for slow response on driver monitoring:
The National Transportation Safety Board, a federal agency tasked with investigating transportation crashes, published a preliminary report Tuesday about a January 2018 crash in Culver City, California. For the most part, the report confirmed what we already knew about the incident: a Tesla Model S with Autopilot engaged crashed into a fire truck at 31 miles per hour. Thankfully, no one was seriously injured.

But near the end of its report, NTSB called Tesla out for failing to respond to a 2017 recommendation to improve its driver monitoring system.
What the 2017 report said [42:04] was:
monitoring steering wheel torque provides a poor surrogate means of determining the automated vehicle driver's degree of engagement with the driving task
They recommended manufacturers "develop applications to more effectively sense the driver's level of engagement". Five of the manufacturers responded; Tesla didn't.

Musk is famously contemptuous of regulation, but this May Tesla finally made a slight concession to reality [37:51]. Four years after the NTSB told them to do better, and two years after Musk claimed it "wouldn't help much", they implemented a camera-based driver monitoring system. Unfortunately, it uses the cabin camera, which wasn't designed for the job. Not only does it lack infra-red capability, so it doesn't work at night, but Hikmet also shows it being fooled by a picture [38:25].

This Potemkin system likely won't be enough for China. Simon Sharwood reports that, under new rules announced this month:
Behind the wheel, drivers must be informed about the vehicle's capabilities and the responsibilities that rest on their human shoulders. All autonomous vehicles will be required to detect when a driver's hands leave the wheel, and to detect when it's best to cede control to a human.
And, as a further illustration of how little importance Tesla attaches to the necessary belt and braces approach to vehicle safety, in May Tesla announced that their Model 3 and Model Y cars will no longer have radar. They will rely entirely on image processing from cameras. Removing a source of navigation data seems likely to impair the already inadequate performance of Autopilot and Full Self-Driving in marginal conditions. The kind of conditions that someone who takes Musk at his word would be likely to be using the systems.

As Hikmet says, people have died and will continue to die because Elon Musk regards driver monitoring as an admission of failure.


The stock market currently values Ford at around $50B, General Motors at around $70B, Toyota at around $237B and Volkswagen at around $165B. It currently values Tesla at about 1/3 more than these four giants of its industry combined. That is after its P/E dropped from a peak of almost 1,400 to its current 355, which is still incredible compared to established high-tech growth companies such as Nvidia (P/E 70). The market clearly can't be valuing Tesla on the basis that it makes and sells cars. That's a low-margin manufacturing business. Tesla needs a story about the glorious future enormous high-margin high-tech business that will shower it with dollars. And that is where econowashing comes in.
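The valuation arithmetic is stark when written out (market caps in billions of dollars, as quoted above; "about 1/3 more" taken literally):

```python
# Market caps from the paragraph above, in $B.
caps = {"Ford": 50, "GM": 70, "Toyota": 237, "Volkswagen": 165}
combined = sum(caps.values())  # the four incumbents together
tesla = combined * 4 / 3       # "about 1/3 more than ... combined"

print(combined)      # 522
print(round(tesla))  # 696 -- roughly $700B for a fraction of their output
```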

Musk is repeatedly on record as arguing that the relatively high price of his cars is justified because, since they ~~are~~ ~~will soon be~~ might eventually be completely autonomous, they ~~can~~ ~~will~~ might generate a return on the investment by acting as autonomous taxis when they aren't needed by the owner. For example, here is Musk more than two years ago:
I feel very confident in predicting autonomous robotaxis for Tesla next year. Not in all jurisdictions because we won't have regulatory approval everywhere, but I am confident we will have regulatory approval somewhere, literally next year
And here is Musk this year:
In January, after Tesla stock shot up nearly 700 percent over the course of a year, Elon Musk explained how shared autonomous vehicles, or SAVs, can help justify the company's valuation.

Speaking hypothetically on a fourth-quarter earnings call in January, Musk laid out a scenario in which Tesla reached $50 billion or $60 billion in annual sales of fully self-driving cars that could then be used as robotaxis.

“Used as robotaxis, the utility increases from an average of 12 hours a week to potentially an average of 60 hours a week,” he told investors on the call, according to a Motley Fool transcript. “So that’s, like, roughly a five times increase in utility.”
There are already companies that claim to be high-tech in the taxi business, Uber and Lyft. Both initially thought that robotaxis were the key to future profits, but both eventually gave up on the idea. Both have consistently failed to make a profit in the taxi business.

Transport economist Hubert Horan started his epic series Can Uber Ever Deliver nearly five years ago with this table. It shows his prediction that Uber's economics simply didn't work in the taxi business. It is still looking good.

So Uber needed to econowash itself. Horan's second part Understanding Uber’s Uncompetitive Costs explains how they did it:
Uber dealt with this Catch-22 with a combination of willful deception and blatant dishonesty, exploiting the natural information asymmetries between individual drivers and a large, unregulated company. Drivers for traditional operators had never needed to understand the true vehicle maintenance and depreciation costs and financial risks they needed to deduct from gross revenue in order to calculate their actual take home pay.

Ongoing claims about higher driver pay that Uber used to attract drivers deliberately misrepresented gross receipts as net take-home pay, and failed to disclose the substantial financial risk its drivers faced given Uber’s freedom to cut their pay or terminate them at will. Uber claimed “[our] driver partners are small business entrepreneurs demonstrating across the country that being a driver is sustainable and profitable…the median income on UberX is more than $90,000/year/driver in New York and more than $74,000/year/driver in San Francisco”[4] even though it had no drivers with earnings anything close to these levels.[5]
An external study of actual driver revenue and vehicle expenses in Denver, Houston and Detroit in late 2015, estimated actual net earnings of $10-13/hour, at or below the earnings from the studies of traditional drivers in Seattle, Chicago, Boston and New York and found that Uber was still recruiting drivers with earnings claims that reflected gross revenue, and did not mention expenses or capital risk.
Horan has just published the 27th part entitled Despite Staggering Losses, the Uber Propaganda Train Keeps Rolling explaining the current state of the process:
In order to prevent investors and the business press from understanding these results, Uber improperly combined the results of its ongoing, continuing operations with claims about valuation changes in securities of companies operating in markets they had abandoned. To further confuse matters, Uber and Lyft both emphasized a bogus, easily manipulated metric called “Adjusted EBITDA profitability” which does not measure either profitability or EBITDA.

Part Twenty-Seven returns to an important question this series has discussed on multiple occasions—how can a company that has produced eleven years of horrendous financial results and failed to present any semi-coherent argument as to how it could ever achieve sustainable profitability, still be widely seen as a successful and innovative company? One aspect of that was discussed in Part Twenty-Six: the mainstream business press reports of Uber’s financial results are written by people who have difficulty reading financial statements and do not understand concepts such as “profitability.”

The primary driver of the huge gap between Uber’s positive image and its underlying economic reality was its carefully crafted and extremely effective propaganda-based PR program. This series has documented the origins and results of this program in great detail over the years. [3] In the years before objective data about Uber’s terrible economics became widely available, these accounts were designed to lead customers and local officials into believing that Uber was a well-run and innovative company producing enormous benefits that justified its refusal to obey existing laws and regulations and its pursuit of monopoly power.

Uber propaganda is still being produced since the company needs to give potential investors and the general public some reason to believe that a company with its problematic history and awful financials still has a promising future.
Horan goes on to discuss two recent examples of Uber propaganda, Maureen Dowd's fawning profile of CEO Dara Khosrowshahi, and a Wall Street Journal editorial entitled How Uber and Lyft Can Save Lives based on more of the bogus "academic" research Uber has a track record of producing.

Musk's continual hyping of the prospect of robotaxis flies in the face of the history of Uber and its competitors in the non-automated taxi business. Even if robotaxis worked, they'd be a lot more expensive to buy than conventional taxis. They'd eliminate paying the driver, but the Uber driver is lucky to make minimum wage. And they'd incur other costs such as monitoring to rescue the cars when they need to hand off to a non-existent driver (as has been amply demonstrated by Waymo's Phoenix trial). If Uber can't make a profit, and can't figure out how to make one even if its cars drive themselves, skepticism of Musk's robotaxi hype was clearly justified.

Now, Estimating the energy impact of electric, autonomous taxis: Evidence from a select market by Ashley Nunes et al. puts some real detail behind the skepticism. They compare Autonomous Taxis (ATs) with Conventional Taxis (CTs) and Personal Vehicles (PVs):
The findings of our paper are fourfold. First, we illustrate that an AT’s financial proposition, while being more favorable than CTs, remains — contrary to existing discourse — less favorable than PVs. ATs impose a cost of between $1.42 and $2.24 per mile compared to $3.55 and $0.95 per mile incurred when using CTs and PVs respectively. Second, we identify previously overlooked parameters, the most notable being capacity utilization and profit incentive, as significant impediments to achieving cost parity between ATs and PVs. Omission of these parameters lowers AT rider costs to as low as $0.47 per mile. Third, we document that rebound effects do not require cost parity between ATs and PVs. We show that AT introduction produces a net increase in energy consumption and emissions, despite ATs being more expensive than PVs. Fourth we identify and quantify the technological, behavioral and logistical pathways — namely, conformance to AT-specific energy profile, ride-pooling and ‘smart deployment’ — required to achieve net reduction in energy consumption and emissions owing to AT deployment.
For the purpose of critiquing Tesla's econowashing, it is only necessary to consider Nunes et al.'s financial analysis. Their model includes the full range of cost factors:
Expenditures considered when estimating consumer cost include vehicle financing, licensing, insurance, maintenance, cleaning, fuel and, for ATs specifically, safety oversight (16,22). Requisite safety oversight is assumed to decrease as AT technology advances. We also take account of operator-envisioned profit expectations and fluctuations in capacity utilization rates that reflect demand heterogeneity.

AT cost estimates also consider heterogeneity in vehicle operational lifespan and annual mileage. As the pro-rating of fixed costs over time impacts the financial proposition of ATs, both factors warrant attention. Mileage heterogeneity considers vehicle recharging requirements that may limit vehicle productivity and subsequently, profitability (23,24). Productivity may be further impeded when vehicle electrification is paired with vehicle automatization owing to increased vehicular weight, sensor load and aerodynamic drag, all of which limit vehicle range (25).

We also consider consumer travel time in terms of hourly wages and thus transform differences in travel time to money units (19,26). Literature suggests that productivity benefits are realized through the re-allocation of time to paid or leisure activities that replace the demands of driving on attention. Envisioned benefits include would-be drivers performing other valued activities (19).
They explain the difference between their analysis and earlier efforts thus:
Our financial results admittedly differ from past studies demonstrating cost competitiveness of ATs with PVs (12-14,19). The primary reason for this is that our model accounts for capacity utilization considerations and operator-envisioned profit expectations. Although the inclusion of these factors ‘worsens’ an AT’s financial proposition, their consideration is timely and consistent with commercial fleet operator business practices (20,21).
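To make the role of those two factors concrete, here is a toy per-mile cost model in Ruby. All numbers and the cost structure are illustrative assumptions, not figures from the paper: fixed costs are pro-rated over annual mileage, then scaled by capacity utilization (riders only pay for revenue miles) and marked up by an operator profit margin.

```ruby
# Toy model (hypothetical numbers): rider cost per mile is fixed costs
# pro-rated over annual mileage plus variable per-mile costs, divided by
# capacity utilization and marked up by the operator's profit margin.
def cost_per_mile(fixed_annual:, variable_per_mile:, annual_miles:,
                  utilization:, profit_margin:)
  base = fixed_annual / annual_miles.to_f + variable_per_mile
  (base / utilization) * (1 + profit_margin)
end

# Omitting utilization and profit, as earlier studies did, looks cheap:
cost_per_mile(fixed_annual: 12_000, variable_per_mile: 0.15,
              annual_miles: 65_000, utilization: 1.0, profit_margin: 0.0)
# => 0.33 (rounded)

# Including them more than doubles the rider's per-mile cost:
cost_per_mile(fixed_annual: 12_000, variable_per_mile: 0.15,
              annual_miles: 65_000, utilization: 0.5, profit_margin: 0.1)
# => 0.74 (rounded)
```

This is only a sketch, but it shows why leaving out utilization and profit expectations can make an AT's financial proposition look far more favorable than it is.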


Why does Elon Musk keep lying about the capabilities, timescales and economics of his self-driving technology? After all this time it isn't plausible that someone as smart as Musk doesn't know that "Full Self-Driving" isn't, that it won't be in 6-24 months, and that even if it worked flawlessly the robotaxi idea won't make customers a profit. In fact, we know he does know it. In Tesla's most recent earnings call Musk said (my emphasis):
“We need to make Full Self-Driving work in order for it to be a compelling value proposition,” Musk said, adding that otherwise the consumer is “betting on the future.”
And last night he tweeted:
FSD Beta 9.2 is actually not great imo
Why Elon Musk Isn’t Superman by Tim O'Reilly suggests why Musk needs people to be “betting on the future.”:
Elon Musk’s wealth doesn’t come from him hoarding Tesla’s extractive profits, like a robber baron of old. For most of its existence, Tesla had no profits at all. It became profitable only last year. But even in 2020, Tesla’s profits of $721 million on $31.5 billion in revenue were small—only slightly more than 2% of sales, a bit less than those of the average grocery chain, the least profitable major industry segment in America.
O'Reilly should have noted where 56% of those profits came from:
Tesla’s revenue and bottom line were helped by the sale of $401 million in emissions credits in the fourth quarter to other automakers who need them to meet regulatory standards.
He continues:
Why is Musk so rich? The answer tells us something profound about our economy: he is wealthy because people are betting on him.
despite their enormous profits and huge cash hoards, Apple, Google, and Facebook have [P/E] ratios much lower than you might expect: about 30 for Apple, 34 for Google, and 28 for Facebook. Tesla at the moment of Elon Musk’s peak wealth? 1,396.
The insane 1,396 P/E, and the only slightly less insane current 355 P/E, depend upon investors believing a story. So far this year Musk has lost 22% of his peak paper wealth. If Tesla had dropped to Google's P/E Musk would have lost 93% of his peak paper wealth in 7 months. He would be only 7% as rich as he once thought he was. Preventing that from happening by telling plausible stories of future technologies is important to Musk.
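The 93%/7% figures can be reproduced with a bit of arithmetic, assuming (as the argument implicitly does) that Musk's paper wealth moves in proportion to Tesla's share price, with earnings held at their current level:

```ruby
# Price = P/E x earnings per share, so at fixed current earnings,
# falling from today's P/E to Google's scales today's price by 34/355.
current_pe       = 355.0
google_pe        = 34.0
current_fraction = 1 - 0.22   # he has already lost 22% of peak paper wealth

fraction_at_google_pe = current_fraction * (google_pe / current_pe)
puts fraction_at_google_pe.round(2)   # ~0.07: 7% as rich, a 93% loss
```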

Participatory Archiving / Ed Summers

These are some rough notes for a virtual event that University College London Department of Information Studies hosted on Participatory Archiving and Digital Memory. I woke up at 3:30AM to attend, so I apologize in advance for the sketchiness of what follows 😴 ☕ ☕ ☕

Data Cultures: Implications for Digital Equity and Archival Practice / Gillian Oliver

Gillian Oliver got things started by focusing attention on the importance of “data cultures”. As an example of data cultures she pointed to how the idea of data privacy can be seen through notions of collectivism or the continuum between I and Us. In some of her research she has looked at how shared use of cell phone technology in Bangladesh happens despite the fact that the devices themselves are designed for individual use by (Ayn Rand inspired) Silicon Valley. She cited two of her works that are available for more information on data cultures:

She emphasized the first because it is open access, and also authored by Frank Upward who just died last week :(

There was a brief discussion afterwards about the differences between talking about data, information and records. It seems to me this kind of tension is pretty important to address for archival research programs. A participant suggested that record keeping perspectives are a form of expertise, and that all academic disciplines have their forms of expertise. I wondered about the connection between “data cultures” and fields like critical data studies.

New Technologies, Crowdsourcing and Online User Participation in Archives / Alexandra Eveleigh

Alexandra Eveleigh is currently working at the Wellcome Trust and has been involved at the intersection between crowdsourcing/participation and archives for about 10 years. She gave an overview of the idea of participation in archives, and how the social web, or web 2.0 perspectives/technologies impacted archives. Terms like “crowdsourcing”, “archives 2.0”, “participatory archives” are slippery. Much work has ended up being facilitated in part through social media. Eveleigh put this XKCD cartoon alongside her timeline to emphasize how much early thinking about participation has been understood through our evolving conceptions about social media platforms, that are themselves fragmented territories.

Early on, Palmer (2009) emphasized that discussion needs to focus on benefits, who will benefit and how, rather than on the tools themselves. More recent thoughts about crowdsourcing can be found in the new Collective Wisdom: Perspectives on Crowdsourcing in Cultural Heritage which was itself written using a collaborative process as a book sprint.

Projects tend to focus on artifactual outputs: number of pages transcribed, etc. But there has been a shift towards understanding participatory archives in terms of people and communities. Participation isn’t just about adding user generated content, or transcribing … we should be talking about building trust, generating new knowledge, and stimulating critical reflection on record keeping. How we work together (process) is more important than what we are working on. People engage for different reasons, and engagement during the pandemic has exploded in projects like the Smithsonian’s Transcription program.

Participatory strategies need to focus on collaboration and “reciprocal curation” such as LocalContexts. Processes involving communities are equally important to the outcomes. Participation needs to be ethical and equitable. This reminds me a lot of what we are trying to do in Documenting the Now with Social Humans labels … to give agency to social media users to shape how their data can be made part of an archive.

Ten years ago it was acceptable to ask people to transcribe; today this is usually performed by OCR technology, although it is increasingly common for people to be involved in evaluating algorithmic outputs. Just because a task can be automated does not mean it should be. If not carefully applied, AI techniques can introduce subtle biases that alienate particular communities.

Early principles for participatory archives can be seen as having shifted:

  1. Decentralized Curation -> Agile and Adaptable
  2. Radical User Orientation -> Respectful and Responsive
  3. Contextualization of Records and the entire archival process -> Sustainable and Accessible

There’s a temptation to talk about “projects”. But the idea of a project is at odds with the idea of participation. Seeing participation as a one-off, or add-on doesn’t help organizations build participatory archival processes. Participation requires organizational commitment. Respecting the labor of participants by sustaining the work and ensuring accessibility are important.

There was a good question about who gets to decide what is participatory. Alexandra thought that a decade ago it was the archivist who decided, but today more mature projects allow participants to decide. This is why agile processes are important: things are learned during the participation.

I asked about how the goals of open access are sometimes at odds with important archival work that needs to be done. Institutions like Wellcome have a clear public mission. Alexandra responded saying that people should check out the CARE Principles which should be read in response or in relation to the open science focused FAIR Principles. Here CARE focuses on:

  • Collective Benefit
  • Authority to Control
  • Responsibility
  • Ethics

A shifting archival paradigm: Counter-archiving as a collaborative, community-led and participatory endeavour / Andrew Flinn

[Andrew Flinn] started by talking about different paradigms for participation:

  • Cook (2013) : Community
  • Caswell (2021) : Praxis
  • Kashmere (2010); Ben-David (2020) : Contesting what knowledge is

When talking about participation, Flinn notes that it’s important to recognize Sven Lindqvist’s Dig Where You Stand, on the significance of documenting your own history.

The Swedish connection is interesting, especially (for me) thinking about the impact that participatory design has made in digital work. Of course I thought about how social media offers a way for people to document their own lives in ways that are constrained (and exploited) by the social media platforms themselves.

Flinn gave a series of examples of different community focused archives that embody participatory ideas. I couldn’t really note all of these down, although many were familiar to me (perhaps because I’ve seen Flinn speak before). One new one to me was the MayDay Rooms which is an example of the counter-archive, run by activist communities to document their own activities. The archive is a social resource rather than a repository.

He mentioned the work of Documenting the Now (which was great) while citing Caswell:

Flinn notes that it’s not an either-or between the past and the now. He points towards ideas of [salvage] and challenging absence. This gets seen in the types of material that are valued. Things that are ephemeral sometimes are devalued: handbills, flyers, posters, programmes, are sometimes the only surviving records.

It was awesome to hear Flinn cite Anat Ben-David’s work, which I was familiar with from the web archiving connection.

Sometimes copyright frameworks constrain the type of participation that archives need to do. Witness is a good example of applying evidentiary practices from archives to witnesses of violence, similar to the work of the Syrian Archive. Counter archives are often personal and affective. Their records are preserved from the threat of deletion and are not removed from the context of their creation. They are useful for building solidarity.

Flinn noted in the discussion period that despite their colonial history archival institutions can be led in the right direction and can be allies. Relationships between community archives and institutions can be important for sustainability of some records. Institutions have a huge role to play when it comes to national narratives. There is a struggle within institutions about these narratives. Equitable partnerships should be pursued, and attention to how funding is distributed is critically important. Funding tends to be focused on projects which are time based, and go to the institution rather than the community. How can we move to more sustainable models that better recognize community expertise and community labor? We also need to reflect on who we fund and who we don't.

He cited dissertation work by Hannah Ishmael on The Development of Black-Led Archives in London which I want to check out. Elizabeth Shepherd during the Q&A noted that there is quite a bit of similarity between community informatics and the types of counter-archives that Flinn was talking about. Flinn agreed, and that much of this work is happening at Monash where Gillian Oliver is from.

He was asked also whether the role of the digital has much to do with counter-archives. Flinn noted that counter-archives are often hybrid, the digital content starts as informal documentation which shifts into a more archival/documentary mode. The role of platforms, whether people can get their content out of the platforms again, and share it in a sustainable way are a big concern.

Someone asked a question about how to find out about activist archiving. Flinn noted that many of these efforts are specialized and localized. If you are interested in the role of web documents in activism work I encourage you to join the Documenting the Now Slack. We want it to be bigger than just a discussion of Twitter, and welcome participation from all different perspectives.

A note to the organizers (in the unlikely event that they read this): Having an online event about participatory archives where participants can’t chat and use the Q&A is perhaps not the greatest look.


Ben-David, A. (2020). Counter-archiving Facebook. European Journal of Communication, 35(3), 249–264.

Caswell, M. (2021). Urgent archives: enacting liberatory memory work. Abingdon, Oxon ; New York, NY: Routledge.

Cook, T. (2013). Evidence, memory, identity, and community: four shifting archival paradigms. Archival Science, 13(2-3), 95–120.

Kashmere, B. (Ed.). (2010). Counter-Archive. Oakland, California. Retrieved from

Palmer, J. (2009). Archives 2.0: If We Build It, Will They Come? Ariadne, (60). Retrieved from

The Open Book Genome Project / Open Library

We’ve all heard the advice, don’t judge a book by its cover. But then how should we go about identifying books which are good for us? The secret depends on understanding two things:

  1. What is a book?
  2. What are our preferences?

We can’t easily answer the second question without understanding the first one. But we can help by being good library listeners and trying to provide tools, such as the Reading Log and Lists, to help patrons record and discover books they like. Since everyone is different, the second question is key to understanding why patrons like these books and making Open Library as useful as possible to patrons.

What is a book?

As we’ve explored before, determining whether something is a book is a deceptively difficult task, even for librarians. It’s a bound thing made of paper, right? But what about audiobooks and ebooks? Ok, books have ISBNs right? But many formats can have ISBNs and books published before 1967 won’t have one. And what about yearbooks? Is a yearbook a book? Is a dictionary a book? What about a phonebook? A price guide? An atlas? There are entire organizations, like the San Francisco Center for the Book, dedicated to exploring and pushing the limits of the book format.

In some ways, it’s easier to answer this question about humans than books because every human is built according to a specific genetic blueprint called DNA. We all have DNA; what makes us unique are the variations of the more than 20,000 genes that our DNA is made of, which help encode for characteristics like hair and eye color. In 1990, an international research group called the Human Genome Project (HGP) began sequencing the human genome to definitively uncover, “nature’s complete genetic blueprint for building a human being”. The result, completed in 2003, was a compelling answer to the question, “what is a human?”.

Nine years later, Will Glaser & Tim Westergren drew inspiration from HGP and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”.

Their system analyzed books and surfaced insights about their structure, themes, age-appropriateness, and even pace, bringing us within grasping distance of the answer to our question: What is a book?


Sadly, the project never released its data; it was acquired by Apple in 2014 and subsequently discontinued. But it left an exciting treasure map for others to follow.

And follow, others did. In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer for, “what is a book?”. We also found inspiration from complementary projects like StoryGraph, which elegantly crowd-sources book tags from patrons to help you, “choose your next book based on your mood and your favorite topics and themes”, the HathiTrust Research Center (HTRC), which has led the way in making book data available to researchers, and the Open Syllabus Project, which is surfacing useful academic books based on their usage across college curricula.

Introducing the Open Book Genome Project

Over the last several months, we’ve been talking to communities, conducting research, speaking with some of the teams behind these innovative projects, and building experiments to shape a non-profit adaptation of these approaches called the Open Book Genome Project (OBGP).

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and urls mentioned within books, and more.

OBGP hopes to achieve these things by employing a two-pronged approach which readers may continue learning about in the following two blog posts:

  1. The Sequencer – a community-engineered bot which reads millions of Internet Archive books and extracts key insights for public consumption.
  2. Community Reviews – a new crowd-sourced book tagging system which empowers readers to collaboratively classify & share structured reviews of books.

Or hear an overview of the OBGP in this half-hour tech talk:

Notes on retrying all jobs with ActiveJob retry_on / Jonathan Rochkind

I would like to configure all my ActiveJobs to retry on failure, and I’d like to do so with the ActiveJob retry_on method.

So I’m going to configure it in my ApplicationJob class, in order to retry on any error, maybe something like:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError # other args to be discussed
end

Why use ActiveJob retry_on for this? Why StandardError?

Many people use backend-specific logic for retries, especially with Sidekiq. That’s fine!

I like the idea of using the ActiveJob functionality:

  • I currently use resque (more on challenges with retry here later), but plan to switch to something else at some point medium-term. Maybe sidekiq, but maybe delayed_job or good_job. (Just using the DB and not needing a redis is attractive to me, as is open source.) I like the idea of not having to redo this setup when I switch back-ends, or am trying out different ones.
  • In general, I like the promise of ActiveJob as swappable commoditized backends
  • I like what I see as good_job’s philosophy here, why have every back-end reinvent the wheel when a feature can be done at the ActiveJob level? That can help keep the individual back-end smaller, and less “expensive” to maintain. good_job encourages you to use ActiveJob retries I think.

Note, dhh is on record from 2018 saying he thinks setting up retries for all StandardError is a bad idea. But I don’t really understand why! He says “You should know why you’d want to retry, and the code should document that knowledge.” — but the fact that so many ActiveJob back-ends provide “retry all jobs” functionality makes it seem to me an established common need and best practice, and why shouldn’t you be able to do it with ActiveJob alone?

dhh thinks ActiveJob retry is maybe for specific targeted retries, and the backend retry should be used for generic universal ones? Honestly, I don’t see myself doing many specific targeted retries. Making all your jobs idempotent (important! Best practice for ActiveJob always!) and just having them all retry on any error seems to me to be the way to go: a more efficient use of developer time, and sufficient for at least a relatively simple app.

One situation I have where a retry is crucial, is when I have a fairly long-running job (say it takes more than 60 seconds to run; I have some unavoidably!), and the machine running the jobs needs to restart. It might interrupt the job. It is convenient if it is just automatically retried — put back in the queue to be run again by restarted or other job worker hosts! Otherwise it’s just sitting there failed, never to run again, requiring manual action. An automatic retry will take care of it almost invisibly.

Resque and Resque Scheduler

Resque by default doens’t supprot future-scheduled jobs. You can add them with the resque-scheduler plugin. But I had a perhaps irrational desire to avoid this — resque and it’s ecosystem have at different times had different amounts of maintenance/abandonment, and I’m (perhaps irrationally) reluctant to complexify my resque stack.

And do I need future scheduling for retries? For my most important use cases, it’s totally fine if I retry just once, immediately, with a wait: 0. Sure, that won’t take care of all potential use cases, but it’s a good start.

I thought even without resque supporting future-scheduling, I could get away with:

retry_on StandardError, wait: 0

Alas, this won’t actually work, it still ends up being converted to a future-schedule call, which gets rejected by the resque_adapter bundled with Rails unless you have resque-scheduler installed.

But of course, resque can handle wait: 0 semantically, if the code were willing to queue an ordinary resque job…. I don’t know if it’s a good idea, but this simple patch to the Rails-bundled resque_adapter will make it willing to accept “scheduled” jobs when the scheduled time is actually “now”, just queuing them normally, while still raising on attempts to future-schedule. For me, it makes retry_on ..., wait: 0 work with just plain resque.

Note: retry_on attempts count includes first run

So wanting to retry just once, I tried something like this:

# Will never actually retry
retry_on StandardError, attempts: 1

My job was never actually retried this way! It looks like the attempts count includes the first run: it is the total number of times the job will be run, including the very first one before any “retries”. So attempts: 1 means “never retry” and does nothing. Oops. If you actually want to retry only once, in my Rails 6.1 app this is what did it for me:

# will actually retry once
retry_on StandardError, attempts: 2

(I think this means the default, attempts: 5 actually means your job can be run a total of 5 times– one original time and 4 retries. I guess that’s what was intended?)
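The semantics described above can be sketched as a toy simulation (this is not ActiveJob's code, just a model of an attempts budget that counts the first run):

```ruby
# Toy model of retry_on's `attempts` semantics for an always-failing
# job: the budget counts every run, including the first.
def run_with_attempts(attempts)
  executions = 0
  begin
    executions += 1
    raise "job failed"
  rescue
    retry if executions < attempts
  end
  executions # total runs, including the first
end

run_with_attempts(1)  # => 1  (never retried)
run_with_attempts(2)  # => 2  (one retry)
run_with_attempts(5)  # => 5  (the default: four retries)
```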

Note: job_id stays the same through retries, hooray

By the way, I checked, and at least in Rails 6.1, the ActiveJob#job_id stays the same on retries. If the job runs once and is retried twice more, it’ll have the same job_id each time, you’ll see three Performing lines in your logs, with the same job_id.

Phew! I think that’s the right thing to do, so we can easily correlate these as retries of the same jobs in our logs. And if we’re keeping the job_id somewhere to check back and see if it succeeded or failed or whatever, it stays consistent on retry.

Glad this is what ActiveJob is doing!

Logging isn’t great, but can be customized

Rails will automatically log retries with a line that looks like this:

Retrying TestFailureJob in 0 seconds, due to a RuntimeError.
# logged at `info` level

Eventually, when it decides its attempts are exhausted, it’ll say something like:

Stopped retrying TestFailureJob due to a RuntimeError, which reoccurred on 2 attempts.
# logged at `error` level

This does not include the job-id though, which makes it harder than it should be to correlate with other log lines about this job, and follow the job’s whole course through your log file.

It’s also inconsistent with other default ActiveJob log lines, which include:

  • the Job ID in text
  • tags (Rails tagged logging system) with the job id and the string "[ActiveJob]". Because of the way the Rails code applies these only around perform/enqueue, retry/discard related log lines apparently end up not included.
  • the exception message, not just the exception class, when there is one.

You can see all the built-in ActiveJob logging in the nicely compact ActiveJob::LogSubscriber class. And you can see how the log line for retry is inconsistent with, e.g., perform.

Maybe this inconsistency has persisted so long in part because few people actually use ActiveJob retry; they’re all still using their back-end’s specific functionality? I did try a PR to Rails for at least consistent formatting (my PR doesn’t do tagging); not sure if it will go anywhere, I think blind PRs to Rails usually do not.

In the meantime, after trying a bunch of different things, I think I figured out the reasonable way to use the ActiveSupport::Notifications/LogSubscriber API to customize logging for the retry-related events while leaving it untouched from Rails for the others? See my solution here.

(Thanks to BigBinary blog for showing up in google and giving me a head start into figuring out how ActiveJob retry logging was working.)

(note: There’s also this: But I’m not sure how working/maintained it is. It seems to only customize activejob exception reports, not retry and other events. It would be an interesting project to make an up-to-date activejob-lograge that applied to ALL ActiveJob logging, expressing every event as key/values and using lograge formatter settings to output. I think we see exactly how we’d do that, with a custom log subscriber as we’ve done above!)

Warning: ApplicationJob configuration won’t work for emails

You might think since we configured retry_on on ApplicationJob, all our bg jobs are now set up for retrying.

Oops! Not deliver_later emails.

The good_job README explains that ActiveJob mailer delivery jobs don’t descend from ApplicationJob. (I am curious if there’s any good reason for this; it seems like it would be nice if they did!)

The good_job README provides one way to configure the built-in Rails mailer superclass for retries.

You could maybe also try setting delivery_job on that mailer superclass to use a custom delivery job (thanks again BigBinary for the pointer)… maybe one that subclasses the default class to deliver emails as normal, but let you set some custom options like retry_on? Not sure if this would be preferable in any way.

GSoC 2021: Making Books Lendable with the Open Book Genome Project / Open Library

By: Nolan Windham & Mek

I’m Nolan Windham, an incoming freshman at Claremont McKenna College. This summer I participated in my first Google Summer of Code with the Internet Archive. I’ll be sharing the achievements I’ve made with the Open Book Genome Project sequencer, an open source tool which extracts structured data from the contents of the Internet Archive’s massive digitized book collection.

The purpose of the Open Book Genome Project is to create “A Literary Fingerprint for Every Book” using the Internet Archive’s 5-million-book digital library. A book’s fingerprint currently consists of 1-gram (single word) and 2-gram (two word) term frequencies, Flesch–Kincaid readability level, referenced URLs, and ISBNs found within the book.
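As a rough illustration of what the n-gram part of such a fingerprint involves, here is a minimal Ruby sketch (not the Sequencer's actual code; the function name is made up):

```ruby
# Count n-gram term frequencies in a text: split into lowercase words,
# slide an n-word window across them, and tally each n-gram.
def ngram_frequencies(text, n)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(n).map { |gram| gram.join(" ") }.tally
end

text = "the quick brown fox jumps over the lazy dog"
ngram_frequencies(text, 1)["the"]       # => 2
ngram_frequencies(text, 2)["the lazy"]  # => 1
```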

Try it out!

Anyone can try running the OBGP Sequencer on an Internet Archive open access book using the new OBGP Sequencer™ Google Colab Notebook. This interactive notebook runs directly within the browser, no installation required. If you have any questions, please email us.

If you are interested in seeing the source code or contributing, check out the GitHub. If this project sounds fascinating to you and you’d like to learn more or keep the project going, please talk to us!

How I got involved

I first found the Internet Archive in high school, where I used the Wayback Machine for research and Open Library for borrowing books. As I found out more about the Archive’s services and history, I became more and more interested in its operation and its mission: to provide “Universal Access to All Knowledge”. Once I heard this mission, I was hooked and knew I wanted to help. During a school trip to San Francisco, I joined one of the Archive’s Friday physical tours (which I highly recommend). The tour guide was impressed with how much this high-schooler knew about the Archive’s operation, took me aside after the tour, showed me the Book Reader’s read-aloud feature, and answered some questions about the book derive process. The tour guide then invited me to join the Open Library community chat where developers, librarians, and patrons discuss all things Open Library. That tour guide turned out to be Mek: my project mentor, Open Library Program Lead, and Citizen of The World.

I started attending the weekly Open Library community calls to learn more about how Open Library works, the issues the project faced, and how I could help. After months of showing up to calls, learning about open source, and developing my programming skills, Mek showed me an interesting prototype called the Open Book Genome Project.


The Open Book Genome Project (OBGP) is a public-good, community-run effort whose mission is to produce “open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.” It was based on a previous effort, the Book Genome Project, launched by a group in 2003 to “identify, track, measure, and study the multitude of features that make up a book.” Think of it as Pandora’s Music Genome Project, but for books. Apple acquired and discontinued the Book Genome Project in 2014, leaving a gap in the book ecosystem which the Open Book Genome Project community now hopes to help fill for the public benefit.

The Open Book Genome Project is one of many efforts facilitated by members of the Internet Archive’s Open Library community. Open Library, their flagship service, is a non-profit, open-source, public online library catalog founded by the late Aaron Swartz, which allows book lovers around the world to access millions of the Internet Archive’s digital books using Controlled Digital Lending (CDL). Open Library hopes the Open Book Genome Project may help patrons discover and learn more about books in some of the ways the Book Genome Project originally aimed to accomplish.

You can learn more about the history of the Open Book Genome Project in an upcoming blog post. You can also learn more about the other half of the Open Book Genome Project called Community Reviews in this blog post.

Here’s where we started

When I began working on the OBGP Sequencer, the general code structure and a few features were in place. The sequencer could extract a book’s n-gram term frequencies and identify its copyright page number. There were many features in the product development pipeline, but no one dedicated to implementing them. Over the past few months, I led development to add and improve the Sequencer’s functionality, created an automated pipeline to process books in volume, and deployed this pipeline to production on the Archive’s corpus of books.

One challenging part of the development process was getting ISBN extraction working accurately. The ISBN extractor works by first finding what it thinks is the book’s copyright page and then checking every number sequence on it for a valid ISBN checksum. Although this approach works, there are many strange edge cases, usually caused by poor optical character recognition. To address this, I manually spot-checked books for ISBNs that were detected and missed, and investigated why, in order to iteratively improve the extraction process. Here is a screenshot of my process.
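The checksum test itself is standard ISBN arithmetic: ISBN-10 weights each digit from 10 down to 1 and must sum to a multiple of 11, while ISBN-13 alternates weights 1 and 3 and must sum to a multiple of 10. A sketch of such a validator (not the Sequencer’s actual code):

```python
def is_valid_isbn(s: str) -> bool:
    """Check whether a digit sequence passes the ISBN-10 or ISBN-13 checksum."""
    digits = s.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        # ISBN-10: last character may be 'X', meaning 10.
        if not (digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X")):
            return False
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        return total % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1, 3, 1, 3, ...
        total = sum(int(c) * (1 if i % 2 == 0 else 3)
                    for i, c in enumerate(digits))
        return total % 10 == 0
    return False
```

Against OCR’d text, any digit run that happens to pass the checksum is a candidate, which is exactly why the edge cases described above need manual spot-checking.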

Another challenge later in the development process was getting books processed at scale. With a collection as large as the Archive’s, parallelization is an essential component of scaling the sequencer up. I taught myself to use some of Python’s parallelization libraries and implemented them. A related challenge was getting parallelization working with the database; I addressed this by using the file system and directory layout as the database, because modern file systems are built to work well with parallel I/O.

Here’s what we were able to accomplish with OBGP

  1. Make more books borrowable to patrons
  2. Add reading levels for thousands of books
  3. Identify & save URLs found within books
  4. Produce a large public dataset of book insights

Making Books Lendable

Nearly 200,000 books digitized by the Internet Archive were missing key metadata like ISBNs. The ISBN is used to look up all sorts of book information, which is helpful for determining whether a book is eligible for the Internet Archive’s lending program. The absence of this key information was thus preventing tens of thousands of eligible books from being lent.

As of writing, the Open Book Genome Project sequencer has extracted previously unknown ISBNs for 25,705 books. 12,700 of those are newly lendable to patrons. Take a look at them here!

These books now have identifying information and are linked to Open Library records. Open Library pages that had no books available now have borrowable books. Here is a before and after screenshot.



Adding reading levels

It’s often difficult to identify age-appropriate materials for students and children. By adding reading level information to Internet Archive’s book catalog, we’re able to make age-appropriate books more accessible.

The Sequencer now performs a Flesch–Kincaid readability test on each book it processes. The resulting Flesch–Kincaid grade-level estimate allows students, parents, and teachers to filter their searches by reading level.
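The Flesch–Kincaid grade level is a simple formula over word, sentence, and syllable counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A rough sketch (the Sequencer’s actual tokenization and syllable counting may well differ):

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text: str) -> float:
    """Flesch–Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```

Simple one-syllable-per-word sentences score below grade 0, while long sentences with polysyllabic words push the estimate well into the college range.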

Preserving URLs

Open Library is aware of more than 1M books containing URLs. These mentions by credible authors are like a vote of confidence in the URLs’ relevance and usefulness. Yet these websites are at risk of link rot and, without preservation, could be lost forever. Given that the average webpage lasts only 100 days, it’s a race against time to preserve the millions of URLs found in millions of books for future generations.

As of writing, URLs have been successfully extracted from more than 13,000 books, which will soon be preserved on the Wayback Machine. Many of the high quality references found in published books have not yet been preserved and now will be.
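The extraction step can be as simple as a regular-expression pass over each page’s OCR text; this is a hypothetical sketch, not the Sequencer’s actual pattern:

```python
import re

# OCR'd text often drops the scheme or leaves trailing punctuation attached,
# so match bare "www." domains too and normalize what we find.
URL_RE = re.compile(r"(?:https?://|www\.)[\w.\-/~%?=&#+]+", re.IGNORECASE)

def extract_urls(page_text: str) -> list[str]:
    urls = []
    for raw in URL_RE.findall(page_text):
        url = raw.rstrip(".,;:)")          # strip trailing punctuation
        if url.lower().startswith("www."):
            url = "http://" + url          # normalize scheme-less matches
        urls.append(url)
    return urls
```

Deduplicating the results and submitting them to the Wayback Machine’s save API would then complete the preservation loop.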

Producing public datasets

The original goal of OBGP was to produce an open, public dataset of book insights capable of powering the open web. As of writing, the Open Book Genome Project sequencer has uploaded genomes for 180,642 books. For every book sequenced, a book genome is made publicly accessible that provides insights into the book without needing to borrow it. The goal is to increase the quantity and quality of descriptive information publicly available for every book, so that readers and researchers can make better-informed decisions and glean deeper insights about books. This supports readers, researchers, book sellers, libraries, and beyond.

Personal Development

I really enjoyed participating in GSoC with the Internet Archive because I was able to build programming foundations and gain industry experience that will prove invaluable in my future. I developed my project management skills, became more comfortable programming in Python and using new software libraries, and advanced my knowledge of dev-ops tools like Docker.

The future of the project

If you’re interested in contributing or learning more about the Open Book Genome Project sequencer, please send us an email.

Although we made a lot of progress with this project’s development, there is still a lot more to be done. Here is a quick list of possible future features to get you excited about the possibilities of this project:

  • Make URLs clickable
  • Identify meaningful semantic elements in books, like entities and citations
  • Increase the number of previewed pages & the volume of previewable content
  • Make chapters clickable in the Table of Contents
  • Extract Library of Congress Catalog Numbers
  • Extract copyright information (publisher, copyright date)
  • Summarize books and chapters, and classify their topics

twarc-hashtags / Ed Summers

One of the nice things that we did with twarc2 is to design it so you can add plugins relatively easily. These plugins extend twarc’s basic functionality to do different things with collected Twitter data. This is just a quick post about a new plugin, twarc-hashtags.

twarc-hashtags was born of necessity. As I mentioned in the last post I’ve been doing some work with Alejandra Josiowicz to examine tweets about Brazilian activist Marielle Franco. Thanks to the Academic Research product track we were able to collect all the tweets matching the phrase Marielle Franco.

But we wanted to discover what hashtags were used in this initial dataset in order to broaden the search using relevant hashtags and then run it again.

Once you pip install twarc-hashtags you get a new command, hashtags, which you can use to generate a CSV dataset that represents the hashtags present in your data.

The generated CSV is pretty simple. It has two columns: hashtag and tweets. While your data is being read, a little SQLite database is populated which has three columns: tweet id, created, and hashtag. This allows for easy counting (using SQL) but also for a bit more manipulation.
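To make the counting concrete, here’s a minimal sketch of that kind of table and the SQL behind it; the column names are guessed from the description above, not the plugin’s exact schema:

```python
import sqlite3

# One row per (tweet, hashtag) pair, as described in the post.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hashtags (tweet_id TEXT, created TEXT, hashtag TEXT)")
db.executemany("INSERT INTO hashtags VALUES (?, ?, ?)", [
    ("1", "2020-03-14", "mariellefranco"),
    ("2", "2020-03-15", "mariellefranco"),
    ("3", "2020-03-15", "justice"),
])

# The hashtag/tweets CSV is just a GROUP BY over this table.
counts = db.execute(
    "SELECT hashtag, COUNT(*) AS tweets FROM hashtags "
    "GROUP BY hashtag ORDER BY tweets DESC"
).fetchall()

# Time-based grouping falls out of SQLite's strftime over the created column.
by_month = db.execute(
    "SELECT strftime('%Y-%m', created) AS month, hashtag, COUNT(*) "
    "FROM hashtags GROUP BY month, hashtag"
).fetchall()
```

Keeping the raw rows around (rather than only the counts) is what makes the regrouping described below cheap.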

For example, you can also group the hashtags by month.

The generated dataset will have an additional column for time containing a year-month value, e.g. 2020-03. You can do the same for day, week, and year if you want to group differently.

In addition, you can limit the number of hashtags to display, so you can see, say, just the top 20 hashtags per month.

And finally, since loading the SQLite database can take some time (for example if you are looking through 3.8 million tweets like I was) you can load it the first time and then use --no-import afterwards to skip the import step and use the existing SQLite database. This will allow you to try grouping by something different, or using a new limit, without needing to parse all that data again.

Maybe some fancier output than CSV will get added over time (ideas are welcome). Having the output in CSV means you can pretty easily drop it into another tool, like D3, GoogleSheets, Tableau, DataWrapper, etc.

I’ve been meaning to try RawGraphs after seeing it come up in a thread that Anne Helmond kicked off about some of her data visualization work some time ago. Here’s what the top 25 hashtags look like over time as a BumpGraph. If you want you can use the rawgraphs file (it’s JSON) to upload it yourself and tweak it.

If you click on the image it might be a bit easier to read (you can hover on the streams to see what they are), but clearly it needs a bit of work still. One thing it shows pretty clearly, though, is the emergence of #QuemMandouMatarMarielle (Who Ordered Marielle to be Killed), the yellow band, which started the year after her murder and has continued since.

If you get a chance to try the twarc-hashtags plugin please let me know!

Introducing: Community Reviews / Open Library

You can now publicly review books using structured book #tags on Open Library with Community Reviews. Take a look, try it out, and send us feedback!

Many social book websites including Goodreads & LibraryThing feature text reviews from the community. Why hasn’t Open Library?

As a non-profit library service with a small staff, there are three reasons we’ve resisted the urge to add text reviews to Open Library. First and foremost, we feel strongly about preserving Open Library as an inclusive, safe, neutral place where readers can trust the information they receive. Some opinionated reviews, even though valid, may contend with this goal. Secondly, we’re cautious about adding features which may require a large time investment to moderate well. We’d rather spend our time making it easier for people across the globe to find books in their native languages than sink all of our time reviewing spam. Finally, there are indeed already several websites which feature text reviews. We’re excited to link patrons to these resources and think our time may be better served exploring new ways of adding unique value back to the book ecosystem.

This all said, reviews are one of the most requested features by book lovers on Open Library, and we feel it’s important for readers to have their voices heard. So what are our options?

A review of reviews

One super-power of text reviews is that they are unstructured. Their open-ended format allows reviewers to express very nuanced and deep thoughts, like how impressively the male author Arthur Golden was able to portray the emotional turmoil of the female characters in Memoirs of a Geisha. This super-power does come with a trade-off. It can be challenging to compare reviews and know which should be trusted; two reviews may have completely diverging styles or focus. One reviewer may be reacting to the story line while another may be critiquing the book’s pace. Reviews are often not easily digestible, and a lot of information is lost when one tries to compress a review into a single star rating. Because of these challenges with “digestibility”, it’s also challenging to summarize text reviews as data which may be used to help people discover new books. Amazon has some techniques which we considered.

A collaborative approach

How can Open Library empower readers to share their impressions about books in a new way, facilitate useful reviews which are structured and easily digestible, while maintaining a safe and neutral library landscape?

Open Library’s collaborative approach, which we’re calling Community Reviews, borrows from an old (now defunct) project called BookLamp and a more recent project called StoryGraph, which let participants use tags to vote on & review various aspects of books like pace, genre, mood, and more:

StoryGraph crowd-sources tags like genre and mood from the community and uses this information to help readers find the right book for them.
BookLamp used a hybrid of robots and crowd-sourcing to identify themes and topics within books.

The more participants who vote using review tags, the more accurate and meaningful the review becomes for the community. Instead of sifting through dozens of text reviews, Community Reviews gives readers a bird’s-eye view across many publicly listed dimensions they might care about, like Pace, Enjoyability, Clarity, Difficulty, Breadth, Genre, Mood, Impressions, Length, Credibility, Text Features, Content Warnings, Terminology, and Purpose.

Here’s what Open Library Community Reviews looks like:

By clicking “+ Add your community review”, any logged-in reader may submit their own public, anonymous review:

Building Together

Community Reviews features a public schema which anyone may reference or propose changes to. It’s a work in progress and will undoubtedly need the community’s feedback to become useful over time.


Community Reviews is a beta work in progress and we expect it to change drastically over the coming weeks based on feedback from our community. We also anticipate issues and bugs may emerge — you can help by reporting bugs and issues here.

We do have every intention of including Community Reviews (in an anonymized form) in our public monthly data dumps and via our APIs for the benefit of our community, though this may take some time to implement.

As the number of Community Reviews increases, our plan is to include them in our search engine so you have ever more ways to identify the best books for you.

We know many patrons would still love to see text reviews on Open Library and that Community Reviews isn’t a replacement for every use case. We sincerely appreciate this and still, we hope that readers will find this new feature valuable and provide us with feedback to improve it over time.


We’d like to sincerely thank Jim Champ, who recently joined Open Library as a staff member and whose leadership was indispensable in bringing this feature to life. Thank you as well to Drini Cami, also on staff at Open Library, for his contributions to improving the user experience. If you hate the idea or execution, blame Mek, but do give us feedback to improve.

Optical Media Durability Update / David Rosenthal

Three years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Two years ago I repeated the mind-numbing process of feeding 45 disks through the reader and verifying their checksums. A year ago I did it again.

It is time again for this annual chore, and yet again this year I failed to find any errors. Below the fold, the details.

The fields in the table are as follows:
  • Month: The date marked on the media in Sharpie, and verified via the on-disk metadata.
  • Media: The type of media.
  • Good: The number of media with this type and date for which all MD5 checksums were correctly verified.
  • Bad: The number of media with this type and date for which any file failed MD5 verification.
  • Vendor: The vendor name on the media.
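The annual chore described above is just recomputing checksums against a stored manifest. A minimal sketch, assuming the manifest follows the usual `md5sum` output format of one `<hash>  <filename>` pair per line:

```python
import hashlib
import tempfile
from pathlib import Path

def verify_disk(mount_point, manifest_path):
    """Re-check every file listed in an md5sum-style manifest.

    Returns the names of files whose current MD5 no longer matches
    (an empty list means the disc verified cleanly).
    """
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        actual = hashlib.md5((Path(mount_point) / name).read_bytes()).hexdigest()
        if actual != expected:
            failures.append(name)
    return failures

# Tiny self-contained demo: one file whose stored checksum still matches.
disk = Path(tempfile.mkdtemp())
(disk / "photo.jpg").write_bytes(b"fake image data")
manifest = disk / "manifest.md5"
manifest.write_text(hashlib.md5(b"fake image data").hexdigest() + "  photo.jpg")

bad = verify_disk(disk, manifest)
```

On a real disc this amounts to mounting the media, pointing the function at the manifest burned alongside the data, and confirming the failure list is empty.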
Surprisingly, with no special storage precautions, generic low-cost media, and consumer drives, I'm getting good data from CD-Rs more than 17 years old, and from DVD-Rs nearly 15 years old. Your mileage may vary. Tune in again next year for another episode.