Planet Code4Lib

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Infrastructure Landlords: The Rentier Capitalism of Commercial Academic Publishers

If you want to understand where the commercial parts of scholarly communications may be heading, you need to look beyond policy documents, conference panels, or public-facing strategy statements. You should look at what large commercial actors say when speaking to investors. Earnings calls are one of the places where that language becomes especially revealing: less concerned with sector ideals than with growth, market opportunity, competitive position, and what will ultimately generate value for shareholders. For this reason, it can be worthwhile to review earnings calls and investor presentations, as these are often overlooked when discussing OA policy and sectoral movements.

🔖 AI got the blame for the Iran school bombing. The truth is far more worrying

Someone decided to compress the kill chain. Someone decided that deliberation was latency. Someone decided to build a system that produces 1,000 targeting decisions an hour and call them high-quality. Someone decided to start this war. Several hundred people are sitting on Capitol Hill, refusing to stop it. Calling it an “AI problem” gives those decisions, and those people, a place to hide.

🔖 Guibo

GUIBo is a desktop GUI for operators and developers who run Kubo (the IPFS daemon in Go). It drives your node through Kubo’s HTTP RPC API so you can work with pins, UnixFS content, IPNS, remote pinning, gateways, and network or repo diagnostics without living in the terminal.
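For a sense of what “driving the node through Kubo’s HTTP RPC API” looks like, here is a minimal Python sketch. The `id` and `pin/ls` endpoints and the default 127.0.0.1:5001 address are Kubo’s real RPC interface; the helper names and wrapper design are illustrative, not GUIBo’s code.

```python
# Minimal sketch of calling Kubo's HTTP RPC API directly (the same API
# GUIBo drives). Assumes a Kubo daemon running on its default RPC port.
import json
from urllib import request
from urllib.parse import urlencode

RPC_BASE = "http://127.0.0.1:5001/api/v0"  # Kubo's default RPC address

def rpc_url(endpoint: str, **params) -> str:
    """Build the URL for a Kubo RPC call (all Kubo RPC calls are HTTP POST)."""
    query = urlencode(params)
    return f"{RPC_BASE}/{endpoint}" + (f"?{query}" if query else "")

def rpc(endpoint: str, **params) -> dict:
    """POST to the local Kubo daemon and decode the JSON response."""
    req = request.Request(rpc_url(endpoint, **params), method="POST")
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(rpc("id")["ID"])                   # this node's peer ID
    print(rpc("pin/ls", type="recursive"))   # recursively pinned CIDs
```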

🔖 The Human Line Project

At The Human Line, we are committed to ensuring that AI technologies, like chatbots, are developed and deployed with the human element at their core. LLMs are powerful tools, and with ethical design, users can gain new skills and knowledge while remaining emotionally intact.

🔖 Marriage over, €100,000 down the drain: the AI users whose lives were wrecked by delusion

Tech-related delusions, whether they involve train travel, radio transmitters or 5G masts, have been around for centuries, Morrin says. “What’s different is that we’re now arguably entering an age in which people aren’t having delusions about technology, but having delusions with technology. What’s new is this co-construction, where technology is an active participant. AI chatbots can co-create these delusional beliefs.”

🔖 Web Resource Ledger (WRL)

WRL captures web pages with cryptographic proof of authenticity – Ed25519 signatures and RFC 3161 timestamps that anyone can independently verify.
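The verification WRL describes rests on standard primitives. As a hedged illustration (WRL’s actual capture format, key distribution, and RFC 3161 timestamp handling are not shown here; only the Ed25519 mechanics are standard), signature verification with Python’s third-party `cryptography` package looks like this:

```python
# Toy sketch of the signature half of a WRL-style capture, using the
# `cryptography` package. The capture bytes and workflow are illustrative.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

capture = b"<html>...captured page bytes...</html>"

# The capturing service signs the page bytes with its private key...
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(capture)

# ...and anyone holding the public key can verify independently.
public_key = private_key.public_key()
public_key.verify(signature, capture)  # passes silently if authentic

# Any tampering with the captured bytes breaks verification.
try:
    public_key.verify(signature, capture + b"tampered")
    tampered_ok = True
except InvalidSignature:
    tampered_ok = False
```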

PS, Ilya took a close look and it appears to be a vibe-coded mess.

🔖 Liberation Radio(s) Beyond the Internet Imaginary

The seemingly unassailable hegemony of the contemporary internet means too few people know that shortwave radio has never gone away and that in many ways it’s more durable, more secure, and more widely accessible than other contemporary forms of wireless communication such as cell or wifi.

🔖 Not AI

Valerie Veatch Asks the Big AI Questions

“[T]he first thing is that computers cannot think, that is an invented concept. And rather than computers being able to think, we’ve reinvented thinking to be something computers can do. And when we do that, all manner of power consolidation, wealth consolidation, technological monopolies happen and we are looking at this fantasy enemy instead of the real political work and community work to be done.”

🔖 Richmond Folk Festival

The Richmond Folk Festival is one of Virginia’s largest events, drawing visitors from all over the country to downtown Richmond’s historic riverfront. The Festival is a FREE three-day event that got its start as the National Council for the Traditional Arts’ National Folk Festival, held in Richmond from 2005-2007. The Richmond Folk Festival features performing groups representing a diverse array of cultural traditions on six stages.

🔖 The Commons w/ Peter Linebaugh

Featuring Peter Linebaugh on the long histories of commons and commoning, connections between enclosures in Europe and imperial conquest abroad, and writing history from below.

🔖 Silicon Valley’s Mythology of Human Amplification

If output is your only metric, then the steam engine really is just a better bicycle. Both get you from A to B. One gets you there faster with less effort. Case closed. The fact that you arrive having done nothing, learned nothing, built nothing—that’s not a bug, that’s the point.

But embedded in that worldview is that the journey is merely instrumental. The only thing that matters is arrival. That it doesn’t matter if you travel or are traveled. The Inuit elders seem to operate on a different premise. Arrival, of course, mattered. These were hunters who needed to find caribou and get home alive. But only through the journey could you acquire deep knowledge of the terrain. You couldn’t separate arriving at the destination from what you learned on the way there.

🔖 Code Review Is Not About Catching Bugs

What teams collaborate on during review is changing. Less time spent on style nits and mechanical correctness, more time on intent, architecture, and whether a change moves the product in the right direction. That’s a good shift. And the collaborative act itself – multiple humans exercising judgment together, developing shared taste, building mutual understanding of where the system is heading – that’s not a bottleneck to eliminate. It’s something to uplevel.

🔖 Mining the commons: AI extraction, Wikipedia, and the case for a multi-stakeholder settlement

Wikipedia and similar DPGs cannot sustain themselves on a fragile mix of donations, sporadic philanthropy, and ad-hoc corporate generosity. What’s needed is a multi-stakeholder settlement in which large-scale users of the commons take on long-term, structured obligations to sustain it: contractual funding through paid APIs and usage-based levies, formal recognition of DPGs as Digital Public Infrastructure to unlock multilateral co-financing, and a shift in philanthropy from one-off project grants to sustained core support for the institutions that maintain the commons.

🔖 Harness engineering: leveraging Codex in an agent-first world

We intentionally chose this constraint so we would build what was necessary to increase engineering velocity by orders of magnitude. We had weeks to ship what ended up being a million lines of code. To do that, we needed to understand what changes when a software engineering team’s primary job is no longer to write code, but to design environments, specify intent, and build feedback loops that allow Codex agents to do reliable work.

This post is about what we learned by building a brand new product with a team of agents—what broke, what compounded, and how to maximize our one truly scarce resource: human time and attention.

🔖 Harness Engineering

It was very interesting to read OpenAI’s recent write-up on “Harness engineering” which describes how a team used “no manually typed code at all” as a forcing function to build a harness for maintaining a large application with AI agents. After 5 months, they’ve built a real product that’s now over 1 million lines of code.

The article is titled “Harness engineering: leveraging Codex in an agent-first world”, but only mentions “harness” once in the text. Maybe the term was an afterthought inspired by Mitchell Hashimoto’s recent blog post. Either way, I like “harness” as a word to describe the tooling and practices we can use to keep AI agents in check.

🔖 The importance of Agent Harness in 2026

We are at a turning point in AI. For years, we focused only on the model. We asked how smart/good the model was. We checked leaderboards and benchmarks to see if Model A beats Model B.

The difference between top-tier models on static leaderboards is shrinking. But this could be an illusion. The gap between models becomes clear the longer and more complex a task gets. It comes down to durability: how well a model follows instructions while executing hundreds of tool calls over time. A 1% difference on a leaderboard cannot reveal whether a model drifts off-track after fifty steps.

We need a new way to show capabilities, performance, and improvements. We need systems that prove models can execute multi-day workstreams reliably. One answer to this is agent harnesses.

🔖 Who will remember us when the servers go dark?

When the server goes dark, we go dark, too. We’ve built an entire civilisation on an unthinkably brutal and comically unreliable stack while hallucinating it as literally anything else. We condemn AI today for making shit up, but what about us? We’re building on a fantasy just as brittle, we are just as demonstrably wrong. Yet we pretend a file isn’t just a gesture that can disappear in an instant. We hallucinate that the server is somehow both fleeting and forever.

🔖 Risky Bulletin: GitHub is starting to have a real malware problem

GitHub is slowly becoming a very dangerous website as more and more threat actors are starting to use it to host and distribute malware disguised as legitimate software repositories.

What started as an infrequent sighting in early 2024 is now at the center of an increasing number of infosec and malware reports.

The tactic is usually the same. A threat actor would take a legitimate repository, add malware to the files—typically an infostealer or a remote access trojan—and then upload the booby-trapped repo back to GitHub.

🔖 One man’s poignant search for community via radio waves

A unique and deeply moving piece of biographical filmmaking, the short documentary Echo provides a window into the life of an older man named Allister Hadden living in Northern Ireland. The film drifts between past and present, with a rich, textured, shot-on-film aesthetic tethering together Hadden’s archival recordings and newly shot footage from the Belfast-based filmmaker Ross McClean.

The Handoff Problem / David Rosenthal

Around twelve years ago, Google figured out the fundamental problem facing Tesla's Fake Self Driving. Almost nine years ago in Robot Cars Can’t Count on Us in an Emergency, John Markoff wrote:
Three years ago, Google’s self-driving car project abruptly shifted from designing a vehicle that would drive autonomously most of the time while occasionally requiring human oversight, to a slow-speed robot without a brake pedal, accelerator or steering wheel. In other words, human driving was no longer permitted.

The company made the decision after giving self-driving cars to Google employees for their work commutes and recording what the passengers did while the autonomous system did the driving. In-car cameras recorded employees climbing into the back seat, climbing out of an open car window, and even smooching while the car was in motion, according to two former Google engineers.
Gareth Corfield at The Register added:
Google binned its self-driving cars' "take over now, human!" feature because test drivers kept dozing off behind the wheel instead of watching the road, according to reports.

"What we found was pretty scary," Google Waymo's boss John Krafcik told Reuters reporters during a recent media tour of a Waymo testing facility. "It's hard to take over because they have lost contextual awareness."
Follow me below the fold for a wonderful example of Tesla's handoff problem, and a discussion of the difference between Tesla's and Waymo's approaches to self-driving.

I wrote about this handoff problem in 2017's Techno-hype part 1. I did a thought experiment, imagining mass-market cars 3 times better than Waymo's at the time:
A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life.

Even if, when the hand-off happened, the human ... had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life? Current testing of self-driving cars hands-off to drivers with more than a decade of driving experience, well over 100,000 miles of it. It bears no relationship to the hand-off problem with a mass deployment of self-driving technology.
I concluded:
But the real difficulty is this. The closer the technology gets to Level 5, the worse the hand-off problem gets, because the human has less experience. Incremental progress in deployments doesn't make this problem go away.
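The thought experiment’s numbers can be sanity-checked with a quick calculation. The annual-mileage and driving-lifetime figures below are illustrative assumptions, not from the post; only the one-handoff-per-15,000-miles rate comes from the argument above.

```python
# Back-of-envelope check of the 2017 thought experiment.
MILES_PER_HANDOFF = 15_000   # hypothetical mass-market rate from the post
ANNUAL_MILES = 13_000        # rough average annual mileage (assumption)
DRIVING_YEARS = 60           # assumed driving lifetime (assumption)

handoffs_per_year = ANNUAL_MILES / MILES_PER_HANDOFF   # under 1 per year
lifetime_handoffs = handoffs_per_year * DRIVING_YEARS  # roughly "50 times in their life"
```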
Raffi Krikorian:
[I] used to run the self-driving-car division at Uber, trying to build a future in which technology protects us from accidents. I had thought about edge cases, failure modes, the brittleness hiding behind smooth performance. My team trained human drivers on when and how to intervene if a self-driving car made a mistake. In the two years I ran the division, we had no injuries in our early pilot programs.
He has an article in the current Atlantic entitled My Tesla Was Driving Itself Perfectly—Until It Crashed with the sub-head:
The danger of almost-perfect tech
As an enthusiast for self-driving technology, Krikorian used it:
With my own Tesla, I started out using Full Self-Driving as the default setting only on highways. That’s where it makes sense: You have clear lane markers and predictable traffic patterns. Then, one day, I tried it on a local road, and it worked well enough to become a habit.
But, after three years:
My memory is hazy, and some of it comes from one of my sons, who watched the whole thing unfold from the back seat. The car was making a turn. Something felt off—the steering wheel jerked one way, then the other, and the car decelerated in a way I didn’t expect. I turned the wheel to take over. I don’t know exactly what the system was doing, or why. I only know that somewhere in those seconds, we ended up colliding with a wall.
He didn't have "situational awareness", even though he was an experienced driver aware of the handoff problem. He sums up the current problem, with drivers like him:
Full Self-Driving works almost all of the time—Tesla’s fleet of cars with the technology logs millions of miles between serious incidents, by the company’s count. And that’s the problem: We are asking humans to supervise systems designed to make supervision feel pointless. A machine that constantly fails keeps you sharp. A machine that works perfectly needs no oversight. But a machine that works almost perfectly? That’s where the danger lies. After a few hours of flawless performance, research shows, drivers are prone to start overtrusting self-driving systems. After a month of using adaptive cruise control, drivers were more than six times as likely to look at their phone, according to one study from the Insurance Institute for Highway Safety.
Imagine this problem compounded by handing off to a driver who hadn't driven in a year.

Google was building Level 4 robotaxis. Their conservative approach was to eliminate the handoff problem completely. Waymos operate on carefully mapped routes after much practice, and are equipped with a diverse set of sensors. Just as airliners have a designated diversion airport at every point along their flight path, Waymos know a safe place to stop and ask for help from remote humans. The humans don't drive the cars; they just advise the car on how to solve the problem. This can, as I have seen a couple of times, cause frustration among other road users, but it is safe.

Tesla, on the other hand, had a Level 2 driver assist system with a limited set of sensors, which depended on handing off to the driver in case of confusion. They consistently marketed it as "Full Self-Driving" with exaggerated claims about its capabilities, and sold it to normal, untrained drivers. They could not, and could not afford to, implement Google's approach. Why not?
  • Scale: Tesla has 1.1M FSD customers, where six months ago Waymo had about 2K cars in service. To support them, Waymo has about 70 remote operators on duty. Of course, FSD is used much less intensively; let's guess only 5% as much. Even if, optimistically, Tesla's technology generated as few remote requests as Waymo's, they would need almost 2,000 remote operators on duty.
  • Technical: First, Tesla markets FSD as usable anywhere, even if their terms of service disagree. So they lack the detailed maps Waymos use when they need to find a safe place. Second, Tesla has far fewer sensors, so has much less information on which to base the need for and choice of a safe place.
  • Marketing: There are two problems. First, telling the public that FSD will sometimes need to stop and ask for help goes against the idea that it is "Full Self Driving". Second, everyone can see that a Waymo is driving itself and can set their expectations to match. No-one can tell that a Tesla is using Fake Self Driving. So if a Tesla stopped unexpectedly, even one not running Fake Self Driving, the assumption would be that the technology had failed.
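The scale bullet’s arithmetic can be checked directly. The 5% usage-intensity figure is the post’s own guess; the rest are the post’s stated numbers.

```python
# Scaling Waymo's remote-operator ratio up to Tesla's FSD fleet.
FSD_CUSTOMERS = 1_100_000
WAYMO_CARS = 2_000
WAYMO_OPERATORS = 70
USAGE_INTENSITY = 0.05   # guess: FSD used 5% as intensively as a Waymo

operators_per_car = WAYMO_OPERATORS / WAYMO_CARS         # 0.035
effective_fleet = FSD_CUSTOMERS * USAGE_INTENSITY        # 55,000 car-equivalents
operators_needed = effective_fleet * operators_per_car   # "almost 2,000"
```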
Because Tesla has always depended upon handing off to the human, the result is that Tesla's minimal robotaxi service with "safety monitors" in Austin, TX crashes six times as often as human-driven taxis.

2026-03-17: The Disintegration Loops: Generational Loss in Web Archives / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

Michael L. Nelson
As part of the Internet Archive's Information Stewardship Forum (March 18–20, 2026), I decided to use my five-minute lightning talk to raise the issue of generational loss in web archives. Or more directly, making copies of copies (...of copies…) – something that web archives currently do not do well. My title is based on William Basinski's four-volume release "The Disintegration Loops", in which he played the audio tapes of "found sounds", recorded decades earlier, in loops, with the whole process lasting over an hour. The effect is hauntingly beautiful, with each loop slightly degrading the magnetic tape, resulting in a generational loss. The degradation of each loop is right on the edge of the just-noticeable difference, until the entire track is reduced to just a shadow of its former self.


I first discussed this topic in my 2019 CNI closing keynote (slide 88), where I introduced the inability of web archives to archive other web archives as part of the larger issue of web archive interoperability. Let's begin by walking through the example of archiving a tweet (which we already know to be challenging!). The original tweet is still on the live web, even though the UI has undergone many revisions since it was originally tweeted in 2018.


https://twitter.com/phonedude_mln/status/990054945457147904 

(screen shot from 2026-03-17)



I archived that tweet to the Internet Archive's Wayback Machine in 2018 (screen shot from 2019):


https://web.archive.org/web/20180501125952/https://twitter.com/phonedude_mln/status/990054945457147904 


I then archived the Wayback Machine's copy of the tweet to archive.today in 2019 (screen shot from 2019):

https://archive.ph/PaKx6 


Note that archive.today is aware that the page comes from the Wayback Machine but the original host is twitter.com, and it maintains both the original Memento-Datetime (20180501125952) as well as its own Memento-Datetime (20190407023141).  I then archived archive.today's memento to perma.cc in 2019 (screen shot from 2019):
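That layering of datetimes can be inspected mechanically. The sketch below parses the two capture times named above, reconstructed from their 14-digit timestamps as RFC 1123 dates per the Memento protocol (RFC 7089); the headers dict is illustrative rather than an actual archive.today response.

```python
# Parsing layered Memento-Datetime values with only the standard library.
from email.utils import parsedate_to_datetime

headers = {
    # archive.today's own capture time (20190407023141), as an HTTP header
    "Memento-Datetime": "Sun, 07 Apr 2019 02:31:41 GMT",
}
original_capture = "Tue, 01 May 2018 12:59:52 GMT"  # Wayback's 20180501125952

captured_at = parsedate_to_datetime(headers["Memento-Datetime"])
original_at = parsedate_to_datetime(original_capture)
drift = captured_at - original_at   # ~11 months between the two captures
```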



https://perma.cc/3HMS-TB59 


Finally, I archived the perma.cc memento back to the Wayback Machine in 2019 (screen shot from 2019):


https://web.archive.org/web/20190407024654/https://perma.cc/3HMS-TB59 


Although the loss occurs in discrete chunks, it is reminiscent of Basinski's Disintegration Loops, with information lost at each step, and the final version being a mere shadow of the original. In 2019, this was not universally recognized as a problem, since archiving the playback interface of other web archives was not yet considered a problem in itself. The "right" solution, of course, is to share the WARC files (or WACZ, or HAR, or…) out-of-band and let the other web archives replay from the same source files. But this is rarely possible: for a variety of reasons web archives typically do not share the original WARC files, and archive.today might not even store the original source files (instead likely storing only the radically transformed pages).


More importantly, it is sometimes useful to archive a particular web archive's replay of a page, which itself must be archived, because it changes through time. For example, memento #3 (the perma.cc memento of archive.today's memento) is now different; this is a screen shot from 2026:


2026 replay of https://perma.cc/3HMS-TB59 


Surely the source files themselves have not changed, and the difference is due to improvements in pywb, which is under constant development. perma.cc's replay of the 2019 page in 2019 is different from its replay in 2026, which implies that it could be different still in the future. But we currently cannot archive perma.cc's replay of that page to, say, the Wayback Machine without generational loss. The fact that screen shots – which are rife with their own potential for abuse (cf. HT 2025, arXiv 2022) – are the only mechanism to document these replay differences underscores the web archive interoperability problem.


I chose the topic of generational loss for my slot at the Information Stewardship Forum because recent events have introduced a new use case for archiving the replay of web archives. Wikipedia recently announced it was blacklisting archive.today because its editors discovered that the webmaster of archive.today was using its captcha to direct a DDoS attack against a blog owned by someone the webmaster had a dispute with (the blogger had posted a lengthy investigation of the webmaster's identity), and, more disturbingly for our discussion, had edited the content of an archived page to include the name of the blogger where it would not otherwise be. The Wikipedia discussion page is hard to follow, in part because the editors are discussing how to archive the replay of an archived page. For one example, they show how the archive.today replay has now been changed back to have "Comment as: Nora " (middle of the image):



But the replay alteration from archive.today in question is archived at megalodon.jp, showing that the name "Nora " was replaced with the name of the blogger who had earned the webmaster's ire, "Jani Patokallio". And yes, megalodon.jp's replay of archive.today's memento is that bad (at least in my browser, it is shrunk down impossibly small), so I used the dev tools to find the string in question.


https://megalodon.jp/2026-0219-1509-14/https://archive.is:443/2021.05.30-173350/http://www.maskofzion.com/2012/04/jewish-at-root-iraqs-destruction-hell.html


Another Wikipedian archived (using yet another archive, ghostarchive.org) a google.com SERP to show that archive.today has reverted from "Jani Patokallio" back to "Nora ". 



What does changing "Nora" to "Jani" (and then changing it back again) accomplish? I'm not sure; this appears to be just a petty response to an ongoing dispute.  But the implication is profound: this is the first known example of a major web archive purposefully and maliciously altering its contents, something that we knew was possible but had not yet experienced.  


We have long known that replay can change through time (cf. PLOS One 2023) due to the replay engine (the Wayback Machine, Open Wayback, pywb, etc.) evolving, but these changes were engineering results and the replay mostly improved over time. But now we have seen web archives maliciously alter (and then revert) the replay, and we need a more standard and interoperable way to archive archival replay.  Not just to prove that a web archive did alter its replay, but also to prove that an archive did not alter its replay.  Out-of-band sharing of WARC files is the gold standard, but for a variety of reasons this is unlikely to happen.  We must be able to use web archives to verify and validate web archives.  We explored a heavyweight design for this a few years ago (JCDL 2019), but it should be revisited in light of developments like WACZ.  


–Michael


ht to Herbert Van de Sompel for introducing me to "The Disintegration Loops" many years ago.


2026-03-25: The original Google/Blogger name ("Nora") has been anonymized.

Operationalizing Minimal Computing Values Through Shared Computing-Platform Development: A Case Study of DigitalArc and Opaque Publisher / In the Library, With the Lead Pipe

In Brief: This article explores how minimal computing principles guided the parallel web development of two related but distinct publishing platforms, DigitalArc and Opaque Publisher. DigitalArc, a community-driven digital archive and exhibit platform, was developed in response to principles governing post-custodial archiving, taking them one step further to ensure communities maintain ownership of their materials and their digital artifacts. The Opaque Publisher, originally developed in support of a born-digital dissertation, adapts DigitalArc to support refusal theory for scholars who must negotiate the tension between using unethically obtained evidence in support of their research and their moral objections to a lack of informed consent. At first glance, the use cases for each platform seem different, but both provide mechanisms for individuals-by-proxy and communities to assert control over how their respective stories are shared.

By Kalani Craig, Michelle Dalmau and Sean Purcell

This article details the conversations, dependencies and contingencies that developed as our team simultaneously built two related, but distinct academic publishing platforms and considered the theoretical motivations for having done so. The first of these, DigitalArc (DA), was designed to support the creation of low-cost sustainable digital exhibits and archives built by and for communities who want to control how their histories are presented online. DA was designed as a community-driven digital archive and exhibit platform, ensuring communities maintain ownership of their materials and their digital artifacts.1 The second, the Opaque Publisher (OP), used DigitalArc as a foundation for a digital-exhibit and digital-publication platform that supports scholars who want to redact or remove information that was obtained from medical patients without their informed consent. The modeling and development of both platforms were guided by complementary frameworks that shaped our decision to use DigitalArc as the technical foundation for the Opaque Publisher (Zenzaro, 2024; Ciula et al., 2018).

In “The Digital Opaque: Refusing the Biomedical Object” (Purcell, Craig & Dalmau, 2025), we outlined our adoption of refusal theory in the rejection of unquestioned institutional norms around the use and display of unethically obtained medical specimens. This theoretical framework was operationalized in the OP as an author-audience interaction that allows authors to identify sensitive information in both the text of, and images included in, an academic publication. Readers are then given the ability to control how and whether that sensitive information is redacted fully or partially, or displayed openly, with the default view set to “partially opaque,” serving as a compromise between fully redacted and fully open. 
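As a hypothetical sketch of that author-audience interaction (the OP’s real data model and function names are not given in the article, so every name here is illustrative), marked spans and a reader-chosen opacity level might compose like this:

```python
# Illustrative model: authors mark sensitive spans; readers pick a display
# level ("open", "partial", "full"), with "partial" as the default compromise.
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    sensitive: bool = False

def render(spans: list[Span], level: str = "partial") -> str:
    """Render marked-up text at a reader-chosen opacity level."""
    out = []
    for s in spans:
        if not s.sensitive or level == "open":
            out.append(s.text)
        elif level == "full":
            out.append("█" * len(s.text))   # fully redacted
        else:  # "partial" -- hint at the span without full disclosure
            out.append(s.text[0] + "…")
    return "".join(out)

doc = [Span("The specimen came from "), Span("Patient Name", sensitive=True), Span(".")]
```

The reader, not the author, chooses the level at view time, which is what makes the default a compromise rather than a fixed editorial decision.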

Here, we outline a history of technical interventions that supported ethical creation and interpretation of sensitive content through the iterative implementation of two publishing platforms. Core to both platforms were ethical-research considerations and public-communication audiences, which in turn drove the adoption of minimal computing approaches. We hope that, by focusing on the audience needs we identified for the two projects, the existing models we assessed for the OP’s parent framework, and some of the serendipitous contingencies that shaped the minimal-computing development of both platforms, we can offer some lessons for other digital-humanities development teams seeking to operationalize their theoretical frameworks in the form of technical choices. 

DigitalArc: A Community Digital Archive Platform 

In Fall of 2018, our team began to assess options for a spring 2019 course centered around the creation of a public-history archive as the main classroom activity. As the instructor, Craig initially asked for consulting advice about potential archiving platforms from members of the team at the Institute for Digital Arts and Humanities, and from Dalmau and members of her digital-libraries team. Using their advice, along with obstacles that arose with our own institution's support for digital projects, Craig began to assess the potential of GitHub Pages as a publishing platform.

Our audience was twofold: first-year undergraduates with little to no research experience in history or technical experience in digital humanities, and the public audiences who would be engaging with the digital collection and historical essays those first-year students would develop as a part of their class. Models for this sort of endeavor existed in spades, most of which focused on the simplicity of content creation for content creators with minimal technical skill. Content management systems (CMS) allow these users to interact with a graphical user interface (GUI) and engage in button-pushing and form-filling behaviors that build multi-media experiences palatable to public audiences, with integrated display for photos, videos, audio, and text (Russel & Merinda, 2017). From Google Sites and WordPress in the corporate freemium sphere to Drupal and Omeka, open-source platforms commonly used in academic contexts, many of these content management systems modeled the use of a programming language (often PHP) supported by a back-end database (often MySQL) that served on-demand pages built “on the fly” as a reader requested each page on the web site. Our institutional support was rich for Omeka in particular, and we appreciated Omeka’s focus on non-profit academic public engagement. However, acquiring, critiquing, and applying digital literacies are key outcomes for the course, and we were able to hone these literacies, with the built-in support structure offered by the class, by exploring a more transparent code base offered by static sites (Wikle, Williamson and Becker, 2020).

As with many technical projects, however, serendipity wrinkled the fabric of our plan: that same semester, university IT rolled out a required upgrade to PHP on the servers available for hosting, which in turn prompted a systemwide Omeka upgrade. This IT-driven upgrade represented, on the one hand, a very well-provisioned IT environment that could support database-driven CMS support for many sites, and on the other, a very clear division between the institution's IT environment-building responsibility and researchers' site-creation and maintenance responsibilities. Dozens of sites needed upgrades in order to remain accessible for public view, and in many cases, the creators of those sites were not equipped to handle such upgrades. That semester, Omeka served as both a model for what worked exceptionally well for novice creators in the site-building phase, and as a warning for the errors our students, and our public audiences, might expect to see in the site's long-term post-project maintenance.

The experience pushed us away from big tech into the realm of “minimal computing,” an approach that responds to the tension between the needs of a community of practitioners–which can include individual partners with institutional affiliation–and their often-limited resources. The shift in focus to this tension between need and resource availability has as its main effect the need to consider how and why we’re using the technologies in the first place. Roopika Risam and Alex Gil anchor the minimal-computing movement’s motivation in “a very real fear” of big tech’s ideologies of fast growth at all costs, disruption over stability, and expense over access. Such ideologies continue to exclude communities whose “voices and stories…have been elided” in the cultural record, this time in a digital space instead of in physical collections (Gil and Risam, 2022). By contrast, minimal-computing best practices offered a framework that helped us evaluate these early-stage classroom priorities by asking what we had, what we needed, and what we wanted to prioritize. We had a team capable of managing almost any technical environment. We needed to reduce or eradicate long-term institutional dependencies and create an archive without the longer-term sustainability concerns that Omeka and WordPress presented in the immediate institutional context. We wanted to prioritize short-term labor and development over a need for students or us to handle the long-term maintenance that Omeka and WordPress required. As we further developed DigitalArc in the years that followed, this tension between resource limitation and need led us to consider moving much of the maintenance complexity of the technology and the labor onto our institutional team, through development and documentation, in order to shift the expense of technology away from anyone who might be interested in using our platform later on.

Minimal computing’s emphasis on smaller-scale projects–where initial labor by technical experts lowers the barriers and costs of long-term maintenance–is also informed by resistance to a “maximal” digital humanities, which leans on a combination of well-provisioned institutional support and structural exigencies that require researchers to respond quickly to a limited set of choices when that support changes. When these maximal-computing tendencies are transferred from well-resourced IT environments with institutional support for long-term maintenance into other settings, the changed institutional pressures create institution-specific site-creation and sustainability concerns that vary greatly from context to context (Miya & Rockwell, 2025).

The contingencies we faced, even in an IT-rich environment, guided us as we considered how to de-institutionalize both the minimal-computing and maximal-computing platforms to which we had access. We anticipated that implementers of a minimal-computing platform would need methodical yet easy-to-follow documentation to scaffold what they might initially see as a less “user-friendly” interface. In other words, to achieve a minimal codebase in support of ongoing sustainability, we rely on substantive documentation–one of several counter-intuitive responses to minimal computing. Despite these contradictions, our choices were intended to allow more agency for communities and scholars, giving both our developer team and our audiences a better handle on the short-term and long-term “considerations of the costs, limits, or wisdom of scale” (Walsh 2024).

In Spring of 2019, our classroom began using the first version of a minimal-computing digital-exhibit template that would become DigitalArc years later. The feature set in this student-built version of the platform was partially inspired by Omeka’s academic focus on discovery and presentation built on well-formed metadata, and by its ability to support meaningful interactions with multimedia objects. We also took cues from the many CMSs modeled on WordPress’s simple authoring for novice creators, and from colleagues in digital humanities whose research on minimal computing suggested that back-end design and documentation require a heavier lift up front but allow easier ongoing management over the longer term (Wingo & Anderson, 2025). We also took steps to narrate the parallels between CMSs like WordPress or Weebly and the features of Github’s GUI editing pages, as a bridge to build student confidence that Github Pages could come quite close to the ease of point-and-click editing with minimal training time.

The tech stack that supported this initial minimal-computing approach centered on Jekyll, a static-site generator that takes a different approach to designing and publishing sites than Omeka’s and WordPress’s dynamic, on-request PHP-and-database-generated pages.2 In this model, pages are built when creators make edits rather than when a public viewer requests them; if something goes wrong with an edit, or with part of the tech stack, implementers have a greater chance of noticing and fixing the problem before viewers encounter an interruption to the site. As with Omeka and WordPress, Jekyll let us design and implement headers, footers, and page templates that apply to any of the content generated by students. As with other CMSs, we set up customization of fonts, colors, navigation elements, and other basic design. We later added support for custom metadata and navigation labels, intended to support both multilingual audiences and communities’ preferred vocabularies.
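As a hedged illustration of the templating described above (file names and markup here are hypothetical sketches of standard Jekyll conventions, not DigitalArc’s actual code), a shared page layout in Jekyll looks something like this:

```html
<!-- _layouts/item.html: a hypothetical Jekyll layout. At build time,
     Jekyll wraps each page's rendered markdown in this template, so
     headers, footers, and design apply site-wide without per-page work. -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>{{ page.title }} | {{ site.title }}</title>
  </head>
  <body>
    {% include header.html %}  <!-- shared header, edited in one place -->
    <main>{{ content }}</main> <!-- the markdown file's rendered body -->
    {% include footer.html %}  <!-- shared footer -->
  </body>
</html>
```

Because the HTML is generated once, at edit time, a broken template fails during the build (where implementers see it) rather than when a public reader requests the page.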

Our one compromise with the world of “maximal” computing, which we will address more fully below, was to host our Jekyll site on Github. From a teaching perspective, Github’s free user accounts and focus on collaborative editing made it easier for students to work on the site together during class. From a site-editing perspective, Github’s Pages feature automatically applied the template features we built to the simpler “markdown” files that creators authored. These markdown files focus on representing the digital objects in the collection, including descriptions and corresponding image, text, or time-based media files (see Fig. 1). Markdown serves as the vehicle for encapsulating curated information (metadata) through basic text formatting, which reduces the technical barriers associated with scripting languages and database implementations.

Figure 1. Image description: A screenshot of the markdown that describes an item included in the DigitalArc Platform demo exhibit site. Available in markdown form at https://github.com/DigitalArcPlatform/demo/blob/main/_items/Item-1-document.md and as a reader-facing page at https://digitalarcplatform.github.io/demo/items/Item-1-document.html
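An item file in this style pairs YAML front matter (the curated metadata) with a plain-prose description. This sketch uses illustrative field names, not necessarily DigitalArc’s exact schema; the real demo file is linked in Figure 1:

```markdown
---
# YAML front matter: curated metadata for one object in the collection
title: "Community Photograph, 1947"
creator: "Contributed by a community member"
media_type: image
image: /assets/items/photo-1947.jpg
---

A short, plain-language description of the object, written by the
contributor or a student curator using basic markdown formatting.
```

The creator never touches a database or a scripting language; editing this one text file in Github’s web editor is the whole workflow.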

For students, these easy-to-teach, text-only editing processes meant they could work in Github’s online GUI-based editor. The experience highlighted the division of labor in minimal computing that places additional up-front burdens on developers. It was our responsibility to: understand that Github Pages’ existing GUI was viable as a point-and-click interface for file editing; create as user-friendly an environment as possible within the context of Github Pages; explain that Github Pages trades some additional difficulty up front for much easier long-term maintenance and use; and provide documentation of that environment that makes it more easily adaptable by novices.

While our first development effort in Fall of 2019 was aimed at the infrastructure for a one-time classroom engagement, it would, in true minimal-computing fashion, come to include “ethical concerns that influence our practice” (Risam, 2025). These ethical considerations added to the benefits we found in choosing custom development on Github Pages over the similarly time-consuming customization and long-term maintenance we would have had to budget in order to use an existing, institutionally supported digital exhibition and publishing platform. We quickly realized that this minimal-computing model had several additional affordances we could use for other projects. First, as we built the student exhibit, we realized it was an easier model for novices to adapt free of charge across multiple projects, compared to the freemium option common on platforms like WordPress or Weebly, where a single user is limited to a single free site. Building and applying new single-page templates in Jekyll was easier with limited design and programming skill, both for our team’s own development work and for future community members learning to customize and launch their own web sites. Second, the collaborative, free, non-academic context of Github held promise for audiences whose experiences with universities and other large institutions were less than positive (Sutton & Craig, 2022).

The specifics of our rollout in the classroom and those that followed, however, presaged a persistent concern that Quinn Dombrowski addresses in “Minimizing Computing Maximizes Labor”: “going ‘minimal’ requires a great deal of technical labor” (2022). Students needed additional support to transition from a fully GUI-based editing interface to the combination GUI/text-based editing system that GitHub Pages and Jekyll require. Coming face-to-face with this early on helped us establish teaching and documentation principles that addressed the longer-term implications of a minimal-computing platform-development agenda. This trade-off emphasized, for us, the importance of creating scaffolded documentation for DA implementation that allows for a more flexible, accessible user experience and easier maintenance for web site content creators and managers (see Figure 2).3

Figure 2. Image description: A screenshot of the documentation that describes how to edit markdown that describes an item included in the DigitalArc Platform, starting with directions for posting items. Available at https://digitalarcplatform.github.io/documentation/docs/publishSite/posting/

With both concerns and affordances in mind, we began to use these minimal-computing approaches in other settings, including three community-facing History Harvest projects that took place over the four years that followed the student-focused digital-archive classroom experience.4

Opaque Publisher: A Scholarly Publishing Platform 

It’s here that we time-skip forward to 2023 and the development of the Opaque Publisher. By then, our team had experience building DA into a templated platform that drew on existing models of community archiving, and we had fully integrated minimal-computing labor division into our workflow. Prototyping through an ACLS-funded grant5 helped us build and test a reasonably featured minimal-computing Jekyll template that served several community archive projects and proved its adaptability to other web-publishing needs.6

The affordances we identified during the initial iterations of what we now know as DigitalArc also set up DA to become a useful foundation for the OP. The first–the adaptability of minimal-computing Jekyll sites–was crucial for the OP’s timely development. Jekyll’s development process meant that OP team members with basic HTML skills and a willingness to experiment could see DA in its fully articulated form and use it as an easily portable example for customizing a new platform. That reduced the up-front labor required of the more skilled members of our development team, allowed us to divide the labor more easily, and let us repurpose code across our different Jekyll projects, lowering the burden for team members still learning that customization skillset. We appreciated this method because Jekyll offered an environment that not only scaffolded our team as they experimented with their newly built site in increasingly complex ways, but encouraged them to do so, because each small success helped them see themselves as capable of technical tasks.

The second–our focus on moving DA outside of its original institutional context–offered an anchor for the OP’s goal of refusal: an intentional rejection of institutional context and institutional harm. Those experiences provided a foundation for operationalizing the refusal theory that Sean Purcell’s then-dissertation project brought to our attention.7 At minimum, his functional requirements included a digital exhibition platform that mirrored the formatting requirements of academic publishing (citations, tables of contents, and indices). These elements were not part of DA’s development, owing to differences in intended audience and output. In addition, the project’s interactive approach to refusal required templating for the platform’s interactive elements, which could be added to prepared markdown files by a user familiar with basic hypertext markup language (HTML).8 One advantage of working on these two projects in parallel was the opportunity to develop resources for future Jekyll templates that attend to the overlapping but distinct needs of academics, archivists, and communities.

While DigitalArc offered an easy starting point for a team already familiar with Github Pages and the specifics of the DigitalArc template itself, there was no shortage of CMS options designed with some academic apparatus in mind. We re-evaluated our minimal-computing starting point–what do we have, what do we need, and what are our priorities–as we did due diligence. Omeka’s third-party development community included a footnote plugin, and Omeka’s base install had a built-in table-of-contents generator. Scalar diverged from the exhibit model to offer a combination of non-linear and table-of-contents-based reading processes, but customizing Scalar presented a higher learning curve for our team, and Scalar’s computational overhead and dependence on PHP complicated long-term digital preservation. Mukurtu’s focus on the ethics of digital exhibits was a good fit for the refusal theory driving Purcell’s dissertation, but its dependence on Drupal 7 raised worries about the platform’s long-term sustainability. In hindsight, these platform worries, beyond our initial reluctance to use Omeka, were well-founded: Drupal 7 support ended in January 2025, leaving Mukurtu in limbo, and Scalar experienced a maintenance outage in August of 2025.9 As with DigitalArc, we wanted to engineer around potential vulnerabilities and gaps in site availability, and the friction between open, non-profit archival platforms and dependency on a constantly updating database-driven codebase meant moving away from these easily accessible platforms.

Despite the surfeit of academic CMS models, flexible redaction models were harder to find. Print redaction, like that done in Adobe Acrobat or other print-document generators, assumes permanent strike-throughs or blackouts. Again, Mukurtu offered inspiration with its changing levels of visibility based on community-oriented traditions and ethics, but those levels are controlled by site creators rather than readers or audience members. Ultimately, DigitalArc offered the longer-term maintainability we prioritized, the non-institutional platform that aligned with our ethical goals of institutional refusal, and the fastest customization path for a series of interface options that reified authorial choices about which text and image sections were sensitive while leaving the redaction-level display of those sections reader-controlled (Fig. 3). In choosing to go further down the minimal-computing path we started with DigitalArc, we provide readers with a way to engage with the tensions and ethical questions that we posed in the companion article: “as scholars we have to show our work, and this practice of showing is often at the expense of those whose lives and deaths are entangled in our research programs” (Purcell, Craig & Dalmau, 2025).

Figure 3. Image description: An example of how refusal informed the opacity functions in the OP in which users are able to toggle how they view the images and text based on whether the subjects depicted in primary materials consented to the research. This example was drawn from Purcell, Sean “Teaching Hygiene” in The Tuberculosis Specimen. (2025). https://tuberculosisspecimen.github.io/diss/dissertation/1_3_4

Our choices were made with an audience of scholar-authors looking for simple technical solutions in mind. That focus pushed us away from more complex programming and toward a mostly-CSS implementation of the interactive opacity filters. For text, the opacity filter activates unique span classes in the textual narrative that were flagged during composition.10 The interpolation of image and text was done mostly in markdown, using Scrivener, with a few text-string replacements that allowed Purcell to easily insert the necessary HTML to apply the Javascript redaction, a process we describe more fully in “The Digital Opaque.” The images, however, were much more complicated: every image that needed to be made opaque had to be edited three times–first, to crop and format for the web; second, to obscure elements for the “partial opacity” version of the site; and third, to remove more of the image for the “opaque” version of the site (fig. 4). When the site loads, all three versions of these images load at the same time, but only one is visible to the reader at any time.
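The mechanism above can be sketched in a few lines of CSS. This is an illustrative reconstruction of the pattern, not the OP’s actual selectors: the class names are hypothetical, and it assumes a small script that toggles a single view class on the page body while CSS handles all display logic.

```css
/* Hypothetical sketch of reader-controlled redaction. Passages flagged
   during composition are wrapped as <span class="sensitive">…</span>;
   a toggle switches the body class between view-full / view-partial /
   view-opaque, and these rules do the rest. */
body.view-partial .sensitive { filter: blur(4px); }
body.view-opaque  .sensitive { visibility: hidden; }

/* All three pre-edited versions of each image load together;
   only the one matching the reader's chosen view is displayed. */
.img-full, .img-partial, .img-opaque { display: none; }
body.view-full    .img-full    { display: block; }
body.view-partial .img-partial { display: block; }
body.view-opaque  .img-opaque  { display: block; }
```

Keeping the logic in stylesheet rules rather than scripts is what makes the interaction survive static preservation: an archived copy of the page carries the CSS with it.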

Figure 4. Image description: The three versions of each image correspond to the opacity guidelines of the site, incrementally removing elements of patients’ bodies based on the project’s predefined protocols. From left to right: a white woman drinking from a glass while staring at the camera; the same image with her eyes obscured; and a final image with her whole face obscured. Three unique versions of every image had to be made. Lockard, Lorenzo B. Tuberculosis of the Nose and Throat. St. Louis: C. V. Mosby Medical Book & Publishing Co., 1909.

We tested a few versions of the opacity functionality during the platform’s development. The first was Javascript-heavy. React.js (https://react.dev/), one of the most common Javascript libraries for web-site development at the time, offered a broad platform for building interactive, user-controlled opacity for both images and text. Two things led us down an alternative path. React’s requirement for local compilation, coupled with its sometimes unpredictable attention to backward compatibility driven by the constantly changing world of mobile-app development, had the potential to create sustainability issues. That, along with React’s origins in a very profit-driven corporate Facebook setting, led us to emphasize CSS control rather than Javascript control of the opacity features. We instead adapted basic show/hide options already built into the non-profit Zurb Foundation 6 library (https://get.foundation/). This CSS library’s user-contribution-oriented development process aligns with our community-oriented goals; Zurb’s smaller contributor base leads to a slower code-update rate, making it more suitable for a project with few developers available to update code in response to new library releases; and the smaller library also meant we could load a local copy, frozen at a particular release date. These choices, in turn, provide a more predictable user experience in the preserved versions of the site that are hosted not on Github but in other disciplinary and institutional repositories and in the Internet Archive. By keeping the site architecture simple, the published sites are more accessible both to end users and to web-archiving tools that struggle to replicate more complex interactions.
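The “frozen at a particular release date” choice is simple to express in markup. This is a generic sketch of the vendoring pattern (the path and version number are illustrative, not the OP’s actual file layout): instead of a CDN link that silently tracks upstream releases, the site ships and references its own pinned copy.

```html
<!-- Hypothetical head markup: a vendored, version-pinned copy of the
     CSS library is served from the site itself, so preserved copies
     (institutional repositories, the Internet Archive) never depend
     on an external CDN or a future library release. -->
<link rel="stylesheet"
      href="/assets/vendor/foundation-6.x/foundation.min.css">
```

The trade-off is deliberate: the site forgoes upstream fixes in exchange for a stylesheet that behaves identically in every archived copy.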

Purcell’s introduction of refusal theory also gave the team another opportunity to question our minimal-computing approach. Microsoft purchased Github in 2018, just as we were addressing the workload of updating the Omeka sites that had broken. Our choice to engage in a minimal-computing endeavor was thoughtful and anchored in careful consideration; our choice of Github Pages and Github’s front-end file-editing GUI was less well-theorized, and the OP offered us the opportunity to reconsider it. While we decided Github was the most timely and stable choice for hosting the dissertation, we also made offsetting decisions, owing largely to the affordances of minimal-computing sites. The most crucial of these was to take a cue from the LOCKSS program (https://www.lockss.org/) in our choice of static-site generation through Jekyll. Static sites are more easily preserved in packaged form on multiple platforms, from IU’s institutional repository ScholarWorks to open-source repositories like Knowledge Commons. More importantly, static sites function with their intended behavior on the Internet Archive, which serves as a fully open-source public repository of general knowledge.11 The need to distribute many copies of completed archives in a variety of forms has also flagged a future need to explore alternatives to Github Pages, like GitLab, in order to provide a platform for the DigitalArc template and its automated static-site building process outside of Github’s environment.

Conclusion

While many of the choices we made were specifically oriented around the audiences for DA and OP, and the models in the digital-archive and digital-publication spaces that offered some but not all of the features we needed, the interplay between two platforms developed for different audiences on different models by the same team of scholar-developers has offered us some lessons that we are now integrating into our future work.

The first lesson we learned along the way is that contingency matters, and that the impulsive choices we made in response to serendipity and contingency can be a foundation for thoughtful, worthwhile change. If not for the PHP upgrade in Fall of 2018, we might not have repositioned our long-term platform-choice goals for DA in the context of minimal computing. That choice, in turn, shaped our choice of CSS-heavy redaction in the OP, a choice that has made our code more portable.

Our second takeaway is that it is hard to fully escape maximal computing. While Jekyll offers the option of building a website entirely on a personal computer, doing so carries an enormous amount of technical overhead. To re-scope the technical skills required of our community-archive audiences, we needed Github’s full infrastructure–its web-based editing system, Github Pages, and Github Actions–which is itself maximal and monopolistic.12 Practically speaking, this compromise allowed users to author and access sites on their phones, and allowed creators to minimally maintain sites over a long period with no upgrades or technical skills necessary to keep a site publicly accessible. While we will always need a maximalist web platform to support less technical content creators, both in the community-archive world and for self-publishing in the digital humanities, diversifying away from monopoly platforms–to GitLab and Bitbucket in particular–will also help us live up to the goals we set of static-site generation in service of long-term stability for both DA and OP users.

The functionality we developed for DA and the OP may be imperfect and can be time-consuming. However, the code that drives that functionality can be operationalized as part of an ethical reconsideration of a scholar’s primary evidence. Whether that evidence comes from communities whose partnerships with institutions have been fraught or from subjects whose materials were included in archives without their consent, presenting our evidence in a variety of forms is a humanistic process: an endeavor that happens in context and should treat context and contingency as an opportunity to understand our relationship to technology, rather than as something to be erased.


Acknowledgements

We would like to thank Nate Howard, Sagar Prabhu, Jessica Organ, and Morgan Vickery for contributing to the web application development of DigitalArc and Opaque Publisher. We also want to thank our friends and colleagues who shaped this work, especially Emily Clark, Vanessa Elias, and Marisa Hicks-Alcaraz. We would also like to thank Élika Ortega, Roopika Risam, and Alex Gil, and especially members of the Minimal Computing Go:DH working group, for their various ways of framing minimal computing for the digital humanities and for broader publics, and for the inspiration and the paradoxes that have kept us on our toes. We are big fans of In the Library with the Lead Pipe’s open peer review process, and appreciate Quinn Dombrowski, Pamella Lach, and Jessica Schomberg for their feedback. We are grateful to get to this (better) version of the article. Lastly, we would like to thank our funders for making this work possible: the New York Academy of Medicine, the Center for Research on Race, Ethnicity, and Society, and the American Council of Learned Societies’ (ACLS) Digital Justice grant program.


References 

Alpert-Abrams, Hannah, et al. (2019). “Post-Custodialism for the Collective Good: Examining Neoliberalism in US–Latin American Archival Partnerships,” Journal of Critical Library and Information Studies 2, no. 1.

Boyles, Christina, et al. (2011). “Postcustodial Praxis: Building Shared Context through Decolonial Archiving,” Scholarly Editing 39.

Dombrowski, Q. (2022). “Minimizing Computing Maximizes Labor,” Digital Humanities Quarterly 16, no. 2, https://dhq.digitalhumanities.org/vol/16/2/000594/000594.html

Ciula, Arianna, Øyvind Eide, Cristina Marras, and Patrick Sahle. (2018). “Models and Modelling between Digital and Humanities. Remarks from a Multidisciplinary Perspective.” Historical Social Research / Historische Sozialforschung 43, no. 4, https://www.jstor.org/stable/26544261.

Miya, Chelsea and Geoffrey Rockwell. (2025). “Platitudes: The Carbon Weight of the Post-Platform Scholarly Web”, The Journal of Electronic Publishing 28, no. 2. doi: https://doi.org/10.3998/jep.7247 

Purcell, Sean, Kalani Craig, and Michelle Dalmau. (2025). “The Digital Opaque: Refusing the Biomedical Object,” In the Library with the Lead Pipe, https://www.inthelibrarywiththeleadpipe.org/2025/digital-opaque/.

Purcell, Sean. (2025). “The Tuberculosis Specimen: The Dying Body and Its Use in the War Against the ‘Great White Plague.’” Indiana University. https://tuberculosisspecimen.github.io/diss/.

Risam, Roopika. (2025). DH2025 Keynote – Digital Humanities for a World Unmade. https://roopikarisam.com/talks-cat/dh2025-keynote-digital-humanities-for-a-world-unmade/. DH2025, Lisbon.

Risam, Roopika, and Alex Gil. (2022). “Introduction: The Questions of Minimal Computing.” Digital Humanities Quarterly, vol. 16, no. 2, http://www.digitalhumanities.org/dhq/vol/16/2/000646/000646.html.

Russell, John E., and Merinda Kaye Hensley. (2017). “Beyond Buttonology: Digital Humanities, Digital Pedagogy and the ACRL Framework,” College & Research Libraries News (December 2017), 588-591, 600.

Sutton, Jazma, and Kalani Craig. (2022). “Reaping the Harvest: Descendant Archival Practice to Foster Sustainable Digital Archives for Rural Black Women.” Digital Humanities Quarterly, vol. 16, no. 3, https://dhq.digitalhumanities.org/vol/16/3/000640/000640.html.

Ton, Mary Borgo. (2019). “Shining Lights: Magic Lanterns and the Missionary Movement, 1839-1868.” https://scholarworks.iu.edu/dspace/handle/2022/26951.

Walsh, Brandon, (2024). “Maximalist Digital Humanities Pedagogy.” Walshbr.com (blog). https://walshbr.com/blog/maximalist-digital-humanities-pedagogy/

Wikle, Olivia, Evan Williamson and Devin Becker. (2020). “What is Static Web and What’s it Doing in the Digital Humanities Classroom?” In M. Brooks et al. (Eds.), Literacies in a Digital Humanities Context: A dh+lib Special Issue (pp. 14-18), https://doi.org/10.17613/ryea-4z10

Wingo, Rebecca, and M. R. Anderson. (2025). “A Sustainable Shared Authority: The Future of Rondo’s Past.” Public Humanities. doi: https://doi.org/10.1017/pub.2025.29

Zenzaro, Simone. (2024). “Models for Digital Humanities Tools: Coping with Technological Changes and Obsolescence.” International Journal of Information Science & Technology, vol. 8, no. 2, http://dx.doi.org/10.57675/IMIST.PRSM/ijist-v8i2.283.

  1. Institutional dependencies can facilitate the creation and publication of a digital community archive, but they can also result in reduced, or perceived reductions in, community control over a community’s own materials. For example, an institutional partnership might mean communities need to meet digital archiving standards that require costly equipment, where a community archive’s goal is to capture community contributions (interviews, artifacts, etc.) in the best possible way, using easy-to-access and affordable mechanisms like a smartphone and a DIY lightbox. Another example is reliance on the more advanced technological infrastructure offered by institutions. Rather than opt for a post-custodial approach in which an institution like a public library or local history center hosts the digital archive, community members can do so themselves (Alpert-Abrams et al. 2019; Boyles et al. 2011). ↩
  2. For an example of a more complex installation of Jekyll, which relies on the user installing a programming environment on their computer to compile their site, see Amanda Visconti, “Building a static website with Jekyll and GitHub Pages,” Programming Historian 5 (2016), https://doi.org/10.46430/phen0048. Note that the “difficulty” level for this tutorial is rated as “low”. ↩
  3. DigitalArc provides step-by-step documentation from planning a community archiving event to publishing a digital archive: https://digitalarcplatform.github.io/documentation/. The Opaque Publisher does the same: https://opaquepublisher.github.io/documentation/. ↩
  4. In order of development cycles, these are the earlier versions of DigitalArc: Identity Through Objects (https://iubhistoryharvest.github.io/), Remembering Freedom: Longtown and Greenville History Harvest (https://longtownhistory.github.io/), Homebound (https://homeboundatiu.github.io/), La Casa / La Comunidad (https://lacasaiu.github.io/). ↩
  5. To learn more about the ACLS-Funded DigitalArc project, visit: https://digitalarcplatform.github.io/. ↩
  6. “ArchIvory: An Interdisciplinary Research Project” (https://www.archivory.org/), On Display: A Twenty-First Century Salon Des Refusés (https://ondisplayattulane.github.io) and Kalani Craig’s Digital History Dossier for Tenure as Associate Professor of History (https://tenuredossier.kalanicraig.com/) ↩
  7. Purcell, Sean. 2025. “The Tuberculosis Specimen: The Dying Body and Its Use in the War Against the ‘Great White Plague.’” Indiana University. https://tuberculosisspecimen.github.io/diss/ ↩
  8. For an example of the custom code developed for the OP, see: https://tuberculosisspecimen.github.io/diss/dissertation/X_2_1 ↩
  9. At the time of writing this article, Mukurtu had still not released version 4, which would move away from Drupal 7 to Drupal 11. Currently, Mukurtu 4 is available as a stable beta: https://mukurtu.org/mukurtu-4/. ↩
  10. Flagging of text and images depended on a predefined ethical framework and happened at different phases of the research. For The Tuberculosis Specimen, opacity designations were decided based on different approaches to biomedical informed consent and subject privacy (https://tuberculosisspecimen.github.io/diss/dissertation/FAQ). Images were flagged as they were added to chapter drafts, and text was flagged during the project’s ‘ethics audit’–a moment prior to publication when researchers are invited to reflect on the processes they used and the materials they included, and to alter their final result to match the ethical frameworks they hoped to meet in the project (Purcell, Craig & Dalmau 2025). These sections were flagged in the project’s word-processing program (Scrivener) using placeholder text, which was changed using a batch find-and-replace script for text files (https://tuberculosisspecimen.github.io/diss/dissertation/X_2_3). ↩
  11. The Wayback Machine is able to preserve Sean’s dissertation as-is: https://web.archive.org/web/20250516183042/https://tuberculosisspecimen.github.io/diss/. The same isn’t true for Mary Borgo Ton, whose born-digital dissertation preceded Sean’s at Indiana University. Mary had to combine several output and documentation approaches to preserve her dissertation as closely as possible, since the Wayback Machine was unable to preserve content produced by Scalar, a more complex PHP site. Instead, parts of Mary’s dissertation were preserved via Indiana University’s institutional repository: https://scholarworks.iu.edu/dspace/handle/2022/26951. ↩
  12. For a broader view of how Github’s quasi-monopoly shapes student experiences in higher-education classrooms, see https://ploum.net/2026-01-05-unteaching_github.html; for Github’s relationship to Microsoft and even more monopolistic technology platforming, see https://medium.com/asecuritysite-when-bob-met-alice/as-github-glitches-are-we-too-dependent-on-microsoft-01d9c2f67329 ↩

Enclosure / Ed Summers

It was interesting to see this short article 1 about the enclosure of the web commons go by after just having listened to The Dig’s epic two part interview with Peter Linebaugh 2.

What’s needed is a multi-stakeholder settlement in which large-scale users of the commons take on long-term, structured obligations to sustain it: contractual funding through paid APIs and usage-based levies, formal recognition of DPGs as Digital Public Infrastructure to unlock multilateral co-financing, and a shift in philanthropy from one-off project grants to sustained core support for the institutions that maintain the commons.

I hadn’t realized that the details of these deals that Wikipedia is striking aren’t fully transparent, or well understood outside of closed doors. I think it’s really instructive to think about what is happening right now on the web as enclosure, and as part of a longer history of capitalism (as Linebaugh and Denvir discuss). The interview made me think of the craft, tooling, and means of production that are still present in the software industry, but that are actively being enclosed by the centralization of tooling and skill, of craft and knowledge itself.

Yes, I’m talking about LLMs here. Once you see it, it’s impossible not to see it.

This all makes me think of Elinor Ostrom’s design principles and how important it is that the Wikipedia community have insight into how their commons is being used, through monitoring, decision making, and resolving the conflicts that will no doubt ensue.

I’m not entirely sure I understand the potential role of the Digital Public Goods Alliance in all this:

Digital Public Goods (DPG) are supposed to be shielded from precisely this kind of capture. They require financing models commensurate with their public value, not models that make them fiscally dependent on their most extractive users. When the sustainability of a DPG hinges on a small oligopoly of AI firms, the risk turns political: agenda-setting and governance drift toward those who can threaten to walk away.

Perhaps Wikipedia is at risk of losing its DPG status? In what practical ways does identification as a DPG help shape governance? What is being done, or what can we be doing, to push back on this enclosure? And of course, the situation is quite a bit bigger when you consider the strain that LLM-hungry bots are putting on cultural heritage organizations, also part of a larger commons.


  1. Mining the commons: AI extraction, Wikipedia, and the case for a multi-stakeholder settlement, Internet Policy Review.↩︎

  2. The Commons w/ Peter Linebaugh, The Dig.↩︎

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Learning How to Learn: Abduction as the ‘Missing Link’ in Machine Learning

In this paper, the question of machine learning is revisited in order to explore whether Bayesian learning, as a form of abductive reasoning, can provide an alternative to the current dichotomy between inductive and deductive approaches in machine learning debates. The paper will further demonstrate that machine learning invariably entails a degree of situatedness, as evidenced by the example of Bayesian belief networks, which arguably rely on abductive reasoning. In this manner, the discourse surrounding Bayesian learning models has the capacity to elucidate the aspects that are often left implicit in contemporary machine learning debates and methodologies.

🔖 Closing the verification loop: Observability-driven harnesses for building with agents

Our approach is harness-first engineering: instead of reading every line of agent-generated code, invest in automated checks that can tell us with high confidence, in seconds, whether the code is correct. The agent generates code, the harness verifies it, production telemetry validates it, and if something is wrong, the feedback updates the harness and the agent tries again. The specific methods to develop harnesses vary in rigor—deterministic simulation testing, formal specifications, shadow evaluation, observability-driven feedback loops—but the principle remains the same: make the verification fast and automatic, and let the harness do the work that human review cannot scale to do.

🔖 Achieving Efficient Version Control of JSON with Prolly Trees

Dolt uses Prolly Trees because they give us two very important properties: history independence and structural sharing. These are both incredibly valuable properties for a distributed database. Structural sharing in particular means that two tables that differ only slightly can re-use storage space for the parts that are the same. Most SQL engines obtain structural sharing for tables by using B-trees or a similar data structure… but that doesn’t extend easily to JSON documents. Some tools like Git and IPFS achieve structural sharing for directories by using a tree structure that mirrors the directory… but that creates a level of indirection for each layer of the document, which would slow down queries if the document had too many nested layers. Something else was needed.

🔖 The Purpose of Protocols

On this account, protocols are governance structures whose design choices allocate power, and the purpose of the entire enterprise is the protection of rights. Protocol design is a form of political design, and the appropriate way to evaluate protocols is not only by their technical properties but by the governance outcomes they produce.

🔖 Cartography of generative AI

The popularisation of artificial intelligence (AI) has given rise to imaginaries that invite alienation and mystification. At a time when these technologies seem to be consolidating, it is pertinent to map their connections with human activities and more than human territories. What set of extractions, agencies and resources allow us to converse online with a text-generating tool or to obtain images in a matter of seconds?

🔖 All Data Are Local: Thinking Critically in a Data-Driven Society

How to analyze data settings rather than data sets, acknowledging the meaning-making power of the local.

In our data-driven society, it is too easy to assume the transparency of data. Instead, Yanni Loukissas argues in All Data Are Local, we should approach data sets with an awareness that data are created by humans and their dutiful machines, at a time, in a place, with the instruments at hand, for audiences that are conditioned to receive them. The term data set implies something discrete, complete, and portable, but it is none of those things. Examining a series of data sources important for understanding the state of public life in the United States—Harvard’s Arnold Arboretum, the Digital Public Library of America, UCLA’s Television News Archive, and the real estate marketplace Zillow—Loukissas shows us how to analyze data settings rather than data sets.

🔖 Water of the Sky A Dictionary of 2,000 Japanese Rain Words

A breathtakingly elegant visual dictionary of 2000 Japanese words for rain, with 100 drawings in indigo.

In Water of the Sky, artist Miya Ando offers us a beautifully rich, bilingual visual dictionary for rain. Through a collection of 2,000 Japanese words, their English interpretations, and 100 drawings, Ando describes the breadth and diversity of rain’s many expressions: when it falls, how it falls, and how its observer might be transformed physically or emotionally by its presence. The words range from prosaic to esoteric, extending from the meteorological (mukaame, or “very fine rain that falls in spring”) to the mystical (bunryūu, or “rain that splits a dragon’s body in half”) and from the minute (kisame, or “raindrops that fall off the leaves and branches of trees”) to the vast (takuu, or “blessed rain that quenches all things in the universe”).

🔖 a collection of tiny llms with usecases

Why Small LLMs Matter

The AI industry defaults to “bigger is better” - GPT-4, Claude Opus, Llama 70B. But for most production workloads, 80% of LLM calls don’t need a 100B+ parameter model. They need a function routed, a tool selected, a query classified, or a simple response generated.

Small LLMs (under 4B parameters) solve this by running locally, for free, in milliseconds.

🔖 susam / wander

Wander is a small, decentralised, self-hosted web console that lets your visitors explore random pages from a community of personal websites

🔖 Visual Introduction to PyTorch

PyTorch is currently one of the most popular deep learning frameworks. It is an open-source library built upon the Torch Library.

Most tutorials assume you’re comfortable jumping straight into code. I made a visual introduction that walks through the core concepts step by step, with animations and diagrams instead of walls of text

🔖 TigerFS

A filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL. TigerFS mounts a database as a directory. Every file is a real row. Writes are transactions. Multiple agents and humans can read and write concurrently with full ACID guarantees, locally or across machines. Any tool that works with files works out of the box.

🔖 Pointing at Clouds: Indexing, Searching, and Citing in an Age of AI Smog

I wanted to show how an index is not just a bibliographic convention, or an organizational method, or a media form, or a financial instrument, or a corporeal component; it’s also an intellectual architecture, a literary genre, a creative form, a semiotic concept, and an embodiment of agency — one that might offer an important antidote to the pervasive autonomous, extractive cloudification of our contemporary information ecology.

🔖 Brandolini’s law

Brandolini’s law (or the bullshit asymmetry principle) is an Internet adage coined in 2013 by Italian programmer Alberto Brandolini. It compares the considerable effort of debunking misinformation to the relative ease of creating it in the first place. The adage states:

The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.[1][2]

The challenge of refuting bullshit does not come just from its time-consuming nature, but also from the challenge of defying and confronting one’s community.

🔖 An AI Agent Published a Hit Piece on Me

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

🔖 E. P. Thompson

Edward Palmer Thompson (3 February 1924 – 28 August 1993) was an English historian, writer, socialist and peace campaigner. He is best known for his historical work on the radical movements in the late 18th and early 19th centuries, in particular The Making of the English Working Class (1963).

🔖 OpenDataLoader PDF

PDF parser for AI data extraction — Extract Markdown, JSON (with bounding boxes), and HTML from any PDF. #1 in benchmarks (0.90 overall). Deterministic local mode + AI hybrid mode for complex pages.

🔖 Vips / Image / dzsave

Save an image as a set of tiles at various resolutions. By default dzsave uses DeepZoom layout — use layout to pick other conventions.

🔖 Every layer of review makes you 10x slower

It’s funny, everyone has been predicting the Singularity for decades now. The premise is we build systems that are so smart that they themselves can build the next system that is even smarter, that builds the next smarter one, and so on, and once we get that started, if they keep getting smarter fast enough, then the incremental time (t) to achieve a unit (u) of improvement goes to zero, so (u/t) goes to infinity and foom.

Anyway, I have never believed in this theory for the simple reason we outlined above: the majority of time needed to get anything done is not actually the time doing it. It’s wall clock time. Waiting. Latency.

And you can’t overcome latency with brute force.

I know you want to. I know many of you now work at companies where the business model kinda depends on doing exactly that.

Sorry.

But you can’t just not review things!

🔖 Can I Run AI locally?

CanIRun.ai runs entirely in your browser. When you visit the site, we use browser APIs to detect your GPU, CPU, and memory — then we calculate which AI models can run on your hardware and how fast. No data is sent to any server. Everything is computed client-side.

🔖 iiif-tiles/tile_iiif.py

Generate IIIF Level 0 static tiles from images in a HF Bucket. Downloads source images from a bucket, generates IIIF Image API 3.0 tiles using libvips, creates a IIIF Presentation v3 manifest, and syncs everything to an output bucket for static serving via HF CDN.

🔖 SlowLLM

SLOW LLM is a browser extension that makes LLMs appear to run very slowly. It works with ChatGPT and Claude.

🔖 Bibliothèques et agents IA : le risque de l’invisibilisation (Libraries and AI agents: the risk of invisibilization)


The year 2026 will be the year of AI agents… It was announced, and indeed since the beginning of the year we have been witnessing the spread and rise of two broad families of agentic tools of a new kind: on the one hand, coding-oriented assistants such as Claude Code, Codex, Gemini CLI, Opencode, etc., and on the other, frameworks for creating, configuring, and orchestrating agents that automate workflows via communication channels (Slack, Discord, messaging…), such as OpenClaw and its many derivatives

🔖 Toi Derricotte

Toi Derricotte (pronounced DARE-ah-cot) (born April 12, 1941) is an American poet. She is the author of six poetry collections and a literary memoir. She has won numerous literary awards, including the 2020 Frost Medal for distinguished lifetime achievement in poetry awarded by the Poetry Society of America, and the 2021 Wallace Stevens Award, sponsored by the Academy of American Poets. From 2012 to 2017, Derricotte served as a Chancellor of the Academy of American Poets. She is currently a professor emerita in writing at the University of Pittsburgh. Derricotte is a member of The Wintergreen Women Writers Collective.[2]

🔖 Degrowth and socialist comrades, what should we be doing?

This is an attempt to clarify the discussion of degrowth strategy, a topic on which I think there is considerable confusion and there are mistaken approaches. Debate has recently been fuelled by Jason Hickel’s argument for a socialist position on both the goal of degrowth and the means of getting there. Liegey, Nelson and Leahy replied against Jason, defending the wide and diverse range of strategies now characteristic of the movement, often referred to by the terms “Horizontalism” and “Pluriverse”. Several others have contributed to the discussion, including Jason’s reply to Liegey et al., his subsequent response, my critique of the Liegey, Nelson and Leahy article, Gasparo and Vico, Gregoletto and Burton, Bunea, and Kallis and D’Alisa.

🔖 Category Theory for the Working Programmer - 1.0 - Prologue

In this series we will explain ideas in Category theory from first principles in order to build intuition and derive the actual formal definitions. We’ll use that foundation to demonstrate exactly where these concepts fit into day to day functional programming and how you can do useful things with that knowledge.

Watch this if you want an introduction to category theory that is simple, practical, joyful, and deeply grounded in functional programming

🔖 Sentimental Value

Sentimental Value (Norwegian: Affeksjonsverdi) is a 2025 Norwegian drama film directed by Joachim Trier, who co-wrote it with Eskil Vogt. It follows sisters Nora (Renate Reinsve) and Agnes (Inga Ibsdotter Lilleaas) in their reunion with their estranged father Gustav (Stellan Skarsgård). It also stars Elle Fanning.

🔖 On the Silver Globe

On the Silver Globe (Polish: Na srebrnym globie) is a 1988 Polish epic surrealist science fiction arthouse film[1] written and directed by Andrzej Żuławski, adapted from The Lunar Trilogy by his grand-uncle, Jerzy Żuławski. Starring Andrzej Seweryn, Jerzy Trela, Iwona Bielska, Jan Frycz, Henryk Bista, Grażyna Deląg and Krystyna Janda, the plot follows a team of astronauts who land on an uninhabited planet and form a society. Many years later, a single astronaut is sent to the planet and becomes a messiah.

Production took place from 1976 to 1977, but was interrupted by the Polish authorities. The budget is estimated to be at least PLN 58 million.[2] Many years later, Żuławski was able to finish his film, although not as originally intended. On the Silver Globe premiered at the 1988 Cannes Film Festival, and has received consistent critical acclaim.

Metastablecoin Fragmentation / David Rosenthal

A fundamental problem for decentralized systems like permissionless blockchains is that their security depends upon the cost of an attack being greater than the potential reward from it. Various techniques are used to impose these costs, generally either Proof-of-Work (PoW) or Proof-of-Stake (PoS). These costs have implications for the economics (or tokenomics) of such systems, for example that their security is linear in cost, whereas centralized systems can use techniques such as encryption to achieve security exponential in cost.

Shin Figure 3
Now, via Toby Nangle's Stablecoin = Fracturedcoin we find Tokenomics and blockchain fragmentation by Hyun Song Shin, whose basic point is that these costs must be borne by the users of the system. For cryptocurrencies, this means through transaction fees, inflation of the currency, or both. The tradeoff between cost and security means that there is a market for competing blockchains making different tradeoffs. In practice we see a vast number of competing blockchains:
Tether’s USDT sits on 107 different ledgers. ... USDC sits on 125.
The chart shows Ethereum losing market share against competing blockchains.

Shin's analysis uses game theory to explain why this fragmentation is an inevitable result of tokenomics. Below the fold I go into the background and the details of Shin's explanation.

Background

In 2018's Cryptocurrencies Have Limits I discussed Eric Budish's The Economic Limits Of Bitcoin And The Blockchain, an important analysis of the economics of two kinds of "51% attack" on Bitcoin and other cryptocurrencies based on PoW blockchains. Among other things, Budish shows that, for safety, the value of transactions in a block must be low relative to the fees in the block plus the reward for mining the block.

In 2019's The Economics Of Bitcoin Transactions I discussed Raphael Auer's Beyond the doomsday economics of “proof-of-work” in cryptocurrencies, in which Auer shows that:
proof-of-work can only achieve payment security if mining income is high, but the transaction market cannot generate an adequate level of income. ... the economic design of the transaction market fails to generate high enough fees.
Source
Bitcoin's costs are defrayed almost entirely by inflating the currency, as shown in this chart of the last year's income for miners. Notice that the fees are barely visible.

It has been known for at least a decade that Bitcoin's plan to phase out the inflation of the currency was problematic. In 2024's Fee-Only Bitcoin I wrote:
In 2016 Arvind Narayanan's group at Princeton published a related instability in Carlsten et al's On the instability of bitcoin without the block reward. Narayanan summarized the paper in a blog post:
Our key insight is that with only transaction fees, the variance of the miner reward is very high due to the randomness of the block arrival time, and it becomes attractive to fork a “wealthy” block to “steal” the rewards therein.
So Bitcoin's security depends upon the "price" rising enough to counteract the four-yearly halvings of the block reward. In that post I made a thought-experiment:
As I write the average fee per transaction is $3.21 while the average cost (reward plus fee) is $65.72, so transactions are 95% subsidized by inflating the currency. Over time, miners reap about 1.5% of the transaction volume. The miners' daily income is around $30M, below average. This is about 2.5E-5 of BTC's "market cap".

Let's assume, optimistically, that this below-average daily fraction of the "market cap" is sufficient to deter attacks and examine what might happen in 2036 after 3 more halvings. The block reward will be 0.39BTC. Let's work in 2024 dollars and assume that the BTC "price" exceeds inflation by 3.5%, so in 12 years BTC will be around $98.2K.

To maintain deterrence, miners' daily income will need to be about $50M. Each day there will be about 144 blocks generating 56.16BTC, or about $5.5M, which is 11% of the required miners' income. Instead of 5% of the income, fees will need to cover 89% of it. The daily fees will need to be $44.5M. Bitcoin's blockchain averages around 500K transactions/day, so the average transaction fee will need to be around $90, or around 30 times the current fee.
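The arithmetic of this thought experiment is easy to reproduce; here is a sketch using the post's own assumptions (the 2024 "price" of roughly $65K is back-derived from the $98.2K figure, and the 2036 reward is computed as three halvings after 2024's 3.125 BTC):

```python
# Reproduce the 2036 fee thought-experiment with the post's assumptions.
blocks_per_day = 144
reward_2036 = 3.125 / 2**3          # three more halvings -> ~0.39 BTC/block
price_2036 = 65_000 * 1.035**12     # 3.5%/yr above inflation -> ~$98.2K

subsidy = blocks_per_day * reward_2036 * price_2036   # block rewards, USD/day
required = 50e6                     # daily miner income needed for deterrence
fees_needed = required - subsidy    # the share fees must cover, ~$44.5M/day
fee_per_tx = fees_needed / 500_000  # ~500K transactions/day

print(f"subsidy ${subsidy/1e6:.1f}M/day, fee/tx ${fee_per_tx:.0f}")
```

This reproduces the roughly $5.5M daily subsidy and roughly $90 average fee quoted above.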
Average fee/transaction
Bitcoin users set the fee they pay for their transaction. In effect they are bidding in a blind auction for the limited supply of transaction slots. Miners are motivated to include high-fee transactions in their next block. If there were an infinite supply of transaction slots, miners' fee income would be zero. In practice, much of the time the supply of slots exceeds demand and fees are low. At times when everyone wants to transact, such as when the "price" crashes, the average fee spikes enormously.
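The auction dynamic can be illustrated with a toy model (a sketch of the incentive, not how any real miner software works):

```python
# Toy model of the blind fee auction: a fee-maximising miner fills the
# limited supply of transaction slots with the highest-fee bids.
def fill_block(pending_fees, slots):
    """Return the fees of the transactions the miner includes."""
    return sorted(pending_fees, reverse=True)[:slots]

# When slot supply exceeds demand, every bid gets in and low fees suffice:
assert fill_block([1, 2, 3], slots=5) == [3, 2, 1]
# When everyone wants to transact at once, only the highest bids clear,
# and the average fee paid spikes:
assert fill_block([1, 2, 3, 50, 40], slots=2) == [50, 40]
```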

There was thus a need for a consensus mechanism that did not depend upon inflation. In 2020's Economic Limits Of Proof-of-Stake Blockchains I discussed a post entitled More (or less) economic limits of the blockchain by Joshua Gans and Neil Gandal in which they summarize their paper with the same title. The importance of this paper is that it extends the economic analysis of Budish to PoS blockchains. Their abstract reads:
Cryptocurrencies such as Bitcoin rely on a ‘proof of work’ scheme to allow nodes in the network to ‘agree’ to append a block of transactions to the blockchain, but this scheme requires real resources (a cost) from the node. This column examines an alternative consensus mechanism in the form of proof-of-stake protocols. It finds that an economically sustainable network will involve the same cost, regardless of whether it is proof of work or proof of stake. It also suggests that permissioned networks will not be able to economise on costs relative to permissionless networks.
Source
In 2022 Ethereum switched from Proof-of-Work to Proof-of-Stake, reducing its energy consumption by around 99%. This chart shows that, like Bitcoin, until the "Merge" the costs were largely defrayed by inflating the currency. After the "Merge" the blockchain has been running on transaction fees.

Shin's Analysis

Here is a summary of Shin's analysis.

Notation

  • There is a continuum of validators i.
  • For validator i ∈ [0;1], the cost of contributing to governance is ci > 0.
  • The blockchain needs at least a fraction κ̂ of the validators contributing to be secure. Shin writes:
    There are two special cases of note: κ̂ = 1 (unanimity, corresponding to full decentralisation where every validator must participate for the blockchain to function) and κ̂ = 0 which corresponds to full centralisation, where one validator has authority to update the ledger.
    κ̂ = 1 is impractical, lacking fault tolerance. κ̂ = 0 is much more practical; it is the traditional trusted intermediary.
  • If the blockchain is secure, each contributing validator earns a reward p > 0. A non-contributing validator earns zero.
  • The validators share a common cost threshold c*. If ci < c*, validator i contributes; if ci > c*, validator i does not.

Argument

Each validator will want to contribute only if at least a fraction κ̂ of the other validators contribute, which poses a coordination problem. The case of particular interest is the validator with ci = c*. Shin writes:
Intuitively, even though the marginal validator may have very precise information about the common cost c*, the validator faces irreducible uncertainty about how many other validators will choose to contribute. It is this strategic uncertainty — uncertainty about others' actions — that is the central feature of the coordination problem.
This "strategic uncertainty" is similar to the attacker's uncertainty about other peers' actions that is at the heart of the defenses of the LOCKSS system in our 2003 paper Preserving peer replicas by rate-limited sampled voting.

Shin Figure 6
Because the marginal validator's ci = c*, the decision whether or not to contribute makes no difference. Shin's Figure 6 explains this graphically. Rectangle A is the loss if k < κ̂ and rectangle B is the gain if k > κ̂. Setting them equal gives:
c*κ̂ = (p − c*)(1 − κ̂)
which simplifies to:
c* = p(1 − κ̂)
Shin and Morris earlier showed that this is the unique equilibrium no matter what strategy the validators use.

Result

What this means is that successful validation depends upon the reward p being large enough so that:
p ≥ c / (1 − κ̂)
Shin writes:
Note that the required reward p explodes as κ̂ → 1. This is the central result of the paper: the more decentralised the blockchain (the higher the supermajority threshold κ̂), the higher must be the rents that accrue to validators. In the limiting case of unanimity (κ̂ = 1), no finite reward can sustain the coordination equilibrium.
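The equilibrium condition c* = p(1 − κ̂), and the explosion of the required reward as κ̂ → 1, can be checked numerically; a minimal sketch with illustrative numbers (κ̂ written as kappa):

```python
# Check the marginal validator's indifference condition
# c*·κ̂ = (p − c*)(1 − κ̂), which simplifies to c* = p(1 − κ̂),
# and how the reward needed to induce contribution grows as κ̂ → 1.
# The numbers below are illustrative, not taken from Shin's paper.

def cutoff(p: float, kappa: float) -> float:
    """Equilibrium cost cutoff c* for reward p and threshold kappa (κ̂)."""
    return p * (1 - kappa)

p, kappa = 10.0, 0.75
c_star = cutoff(p, kappa)
# Expected loss of contributing when the chain fails (k < κ̂) equals the
# expected gain when it succeeds (k > κ̂): the marginal validator is
# indifferent, confirming the cutoff.
assert abs(c_star * kappa - (p - c_star) * (1 - kappa)) < 1e-12

# Reward required for a validator with cost c = 1 to contribute is
# p = c/(1 − κ̂); it grows without bound as kappa approaches 1.
for k in (0.5, 0.9, 0.99, 0.999):
    print(k, 1.0 / (1 - k))
```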
Shin Figure 1
This is yet another result showing that a reasonably secure blockchain is unreasonably expensive. The complication is that, much of the time, transactions are cheap because the demand for them is low. Thus most of the time validators are not earning enough for the risks they run. But:
When many users want to transact at the same time, they bid against each other for limited block space, and fees spike — much as taxi fares surge during rush hour. Figure 1 shows how Ethereum gas fees exhibited sharp spikes during periods of network congestion, such as during surges in decentralised finance (DeFi) activity or spikes in the minting of non-fungible tokens (NFTs). These spikes are not merely a reflection of excess demand; they are the mechanism through which the blockchain extracts the rents needed to sustain validator coordination.
Note that these spikes mean that the majority of the time fees are low but the majority of transactions face high fees. It is this "user experience" that drives the fragmentation that Shin describes:
When demand for block space is high, fees rise and validators are well compensated. But high fees deter users, especially those making small or routine transactions. These users are the first to migrate to competing blockchains that offer lower fees — blockchains that can offer lower fees precisely because they have lower coordination thresholds (and hence less security). The users who remain on the more secure blockchain are those with the highest willingness to pay: institutions, large DeFi protocols, and transactions where security and censorship resistance are paramount. This sorting of users across blockchains is the essence of fragmentation.
Shin notes that:
The fragmentation argument is the flipside of blockchain's "scalability trilemma," as described by Vitalik Buterin, who posed the problem as the impossibility of attaining, simultaneously, a ledger that is decentralised, secure, and scalable.
Source
It is worth noting that Buterin's trilemma is a version for PoS of the trilemma Markus K Brunnermeier and Joseph Abadi introduced for PoW in 2018's The economics of blockchains. See The Blockchain Trilemma for details.

Shin's focus is primarily on the effects of fragmentation on stablecoins. He notes that:
Rather than converging on a single platform, stablecoin activity is scattered across many chains (Figure 4). As of late 2025, Ethereum held the majority of total stablecoin supply but was facing competition from Tron and Solana, each of which had attracted tens of billions of dollars in stablecoin balances. Each chain serves different geographies and use cases: Ethereum for institutional settlement, Tron for low-cost remittances, Solana for retail payments and DeFi activity.
This fragmentation among blockchains would not matter much if stablecoins were interoperable between them, but they are confined to the blockchain on which they were minted:
A USDC token on Ethereum is not the same as a USDC token on Solana — they exist on separate ledgers that have no native way of communicating with each other. Transferring between chains requires the use of bridges: specialised software protocols that lock tokens on one chain and issue equivalent tokens on another. These bridges introduce additional risks, including vulnerabilities in the smart contract code — bridge exploits have accounted for billions of dollars in cumulative losses — and they impose costs and delays that undermine the seamless transferability that is the hallmark of money. The result is a landscape in which stablecoins from the same issuer exist in multiple, non-fungible forms across different blockchains, fragmenting liquidity and undercutting the network effects that should be the strength of a widely adopted payment instrument.
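The lock-and-mint pattern the quote describes can be reduced to a toy model (illustrative only; real bridges are smart contracts with far more machinery):

```python
# Toy model of a lock-and-mint bridge: tokens are locked in a contract on
# the source chain and equivalent "wrapped" tokens are minted on the
# destination chain; redeeming burns wrapped tokens and unlocks originals.
# The wrapped tokens are a distinct, non-fungible representation.
class Bridge:
    def __init__(self) -> None:
        self.locked = 0    # tokens held by the source-chain contract
        self.wrapped = 0   # wrapped tokens minted on the destination chain

    def transfer(self, amount: int) -> None:
        """Lock on the source chain, mint on the destination."""
        self.locked += amount
        self.wrapped += amount

    def redeem(self, amount: int) -> None:
        """Burn wrapped tokens and release the locked originals."""
        if amount > self.wrapped:
            raise ValueError("cannot redeem more than was minted")
        self.wrapped -= amount
        self.locked -= amount

bridge = Bridge()
bridge.transfer(100)   # e.g. 100 USDC locked on chain A, 100 wrapped on B
bridge.redeem(40)
assert (bridge.locked, bridge.wrapped) == (60, 60)
```

The design invariant is locked == wrapped; a bridge exploit that mints wrapped tokens without locking the originals breaks exactly this invariant, which is why contract bugs here have cost billions.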

Discussion

As I've been pointing out since 2014, very powerful economic forces mean that Decentralized Systems Aren't. So the users paying for the more expensive transactions because they believe in decentralization aren't getting what they pay for.

Source
As I wrote in 2024's It Was Ten Years Ago Today:
The insight applies to Proof Of Stake networks at two levels:
  • Block production: over the last month almost half of all blocks have been produced by beaverbuild.
  • Staking: Yueqi Yang noted that:
    Coinbase Global Inc. is already the second-largest validator ... controlling about 14% of staked Ether. The top provider, Lido, controls 31.7% of the staked tokens,
    That is 45.7% of the total staked controlled by the top two.
Source
In addition, all these networks lack software diversity. For example, as I write the top two Ethereum consensus clients have nearly 70% market share, and the top two execution clients have 82% market share.
Shin writes as if more decentralization equals more security even though that doesn't hold in practice, but this isn't really a problem. What the users paying the higher fees want is more security, and they are probably getting it, because they are paying higher fees. As I discussed in Sabotaging Bitcoin, the reason major blockchains like Bitcoin and Ethereum don't get attacked is not that the (short-term) rewards for an attack are less than the cost. It is rather that everyone capable of mounting an attack is making so much money that:
those who could kill the golden goose don't want to.
Shin Figure 3
In any case what matters for Shin's analysis isn't that the users actually get more security for higher fees, but that they believe they do. Like so much in the cryptocurrency world, what matters is gaslighting. But what the chart showing Ethereum losing market share shows is that security is not a concern for a typical user.

mkiiif, yet another static IIIF generator / Raffaele Messuti

I revisited an old Go package I've been using over the past few years to build IIIF manifests — nothing fancy, just some glue around structs and JSON. From that I built a new CLI, mkiiif, to generate IIIF manifests from static images (tiled or not). There are plenty of similar tools out there (iiif-tiler, tile-iiif, biiif, ...) but none quite matched the CLI ergonomics I needed for my daily workflow.

I moved the library to this new repository atomotic/iiif. The tool mkiiif can be installed by downloading a binary release or with Go:

go install github.com/atomotic/iiif/cmd/mkiiif@latest

mkiiif can generate an IIIF manifest from a source directory containing images, or from a PDF file that gets exploded and converted to images via mupdf. Output images can be either untiled or static tiles generated with vips. Both approaches produce an IIIF Level 0 compliant layout: static files that can be served from any HTTP server, with no image server required. Untiled is less efficient for large images but perfectly fine for printed books, papers, and similar material.

mupdf and vips are external dependencies that need to be installed separately. They are invoked via subprocess; I chose not to add Go library wrappers around them to keep the tool simple. WASM ports of both may become viable in the future.
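As an illustration of what those subprocess invocations look like (a Python sketch rather than mkiiif's actual Go code; the flags shown are the documented basics of each CLI, not necessarily the exact ones mkiiif passes):

```python
import subprocess

# Illustrative only: how a tool like mkiiif might shell out to mutool
# (from mupdf) and to vips. Helper names here are hypothetical.
def pdf_to_pages_cmd(pdf: str, outdir: str, dpi: int = 150) -> list[str]:
    # mutool draw renders each PDF page to a numbered PNG at the given DPI
    return ["mutool", "draw", "-r", str(dpi), "-o", f"{outdir}/page-%03d.png", pdf]

def tile_cmd(image: str, outdir: str) -> list[str]:
    # vips dzsave with the IIIF layout writes static Level 0 tiles
    return ["vips", "dzsave", image, outdir, "--layout", "iiif"]

def run(cmd: list[str]) -> None:
    # Fail loudly if the external tool is missing or errors out
    subprocess.run(cmd, check=True)
```

Keeping the calls at this arm's-length level is what makes the tool easy to swap to WASM ports later.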

The CLI usage:

Usage: mkiiif -id <id> -base <url> -title <title> -source <dir|pdf> -destination <dir> [-tiles]
  -base string
        Base URL where the manifest will be served (e.g. https://example.org/iiif)
  -destination string
        Output directory; a subdirectory named <id> will be created inside it, containing the images and manifest.json
  -id string
        Unique identifier for the manifest (e.g. book1)
  -resolution int
        Resolution (DPI) used when converting PDF pages to images via mutool (default 150)
  -source string
        Path to a directory of images or a PDF file to convert
  -tiles
        Generate IIIF image tiles for each image using vips dzsave (requires vips)
  -title string
        Human-readable title of the manifest

Example:

~ mkiiif -base https://digital.library.org -destination ./public -id iiif01 -source ~/book.pdf -title "iiif 01"

Or with tiling:

~ mkiiif -base https://digital.library.org -destination ./public -id iiif01 -source ~/book.pdf -title "iiif 01" -tiles

Without -tiles, the first command produces the following structure inside ./public:

└── iiif01
    ├── index.html
    ├── manifest.json
    ├── page-001.png
    ├── page-002.png
    ├── page-....png
    └── page-....png

With -tiles, each page becomes a directory of static tiles:

└── iiif01
    ├── index.html
    ├── manifest.json
    ├── page-001
    │   ├── 0,0,1024,1024
    │   │   └── 512,512
    │   │       └── 0
    │   │           └── default.jpg
    │   ├── ...
    │   ├── full
    │   │   ├── 362,501
    │   │   │   └── 0
    │   │   │       └── default.jpg
    │   │   └── max
    │   │       └── 0
    │   │           └── default.jpg
    │   └── info.json
    └── ...

The directory can then be served from https://digital.library.org.

I've adopted this URL scheme:

https://{base}/{id}
    /manifest.json — the IIIF manifest
    /index.html    — a simple viewer

So in the example above, https://digital.library.org/iiif01 opens a full viewer to browse the object. The viewer used is Triiiceratops — the newest viewer in the IIIF ecosystem. Built on Svelte and OpenSeadragon, it is still young, but very usable, lightweight, and easy to embed and customize. It is my favourite viewer.

mkiiif doesn't handle metadata for now (and probably won't) — the manifest can easily be patched to insert descriptive metadata in a later step, after image preparation, pulling from any existing data source or metadata catalog.
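For example, a minimal Python sketch of such a patching step (the helper name and label/value pairs are hypothetical; the metadata field follows the IIIF Presentation API 3 label/value language-map shape):

```python
import json

def patch_metadata(manifest_path: str, pairs: list[tuple[str, str]]) -> None:
    """Insert descriptive metadata into an already-generated manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    # IIIF Presentation 3 "metadata": a list of label/value language maps
    manifest["metadata"] = [
        {"label": {"en": [label]}, "value": {"en": [value]}}
        for label, value in pairs
    ]
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
```

The pairs could come from whatever catalog or data source holds the descriptive records.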

Here is a full working example: https://docuver.se/iiif/p3tgsk8jqt/

A few open questions I haven't fully resolved:

  • The main drawback of generating IIIF this way is that you end up managing a large number of files on the filesystem, and handling millions of small image tiles can be slow (and costly). This is where IIIF intersects — and overlaps — with similar practices in digital preservation, such as BagIt, OCFL, and WARC/WACZ. So far there is no specification or viewer implementation that handles IIIF containers (e.g. a zip file bundling images, tiles, and the manifest). There have been discussions about this in the past; I've recently been looking at analogous approaches like GeoTIFF and SZI.
  • A static IIIF bundle generated with this CLI still needs to be served from an HTTP server, with the base URL defined at derivation time. Could such a bundle be opened from localhost and viewed directly in the browser? Service Workers might help here (even if HTTP is still needed), but it's a rabbit hole I haven't explored yet.

The CLI is pretty bare-bones — feel free to suggest improvements or report bugs. I've been using it over the past weeks as part of a personal project: an amateur digital library built around a DIY book scanner I assembled at home, to preserve magazines, zines, and similar material (content NSFW and out of scope to link here).

2026-03-18: A Glimpse into How AI Tools Can Enhance the Way We Study Web Archive Content: Challenges and Opportunities / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

Artificial intelligence (AI) has transformed nearly every field. Today, we can access and train models that generate text, images, sound, video, and code. This transformation is reshaping how we think, analyze, and preserve information. Yet, despite the rapid growth of AI, its use for analyzing web archive content seems to advance at a slower pace. 

Web archiving is the process of collecting, preserving, and providing access to web content over time, where a memento represents a previous version of a web resource as it existed at a specific moment in the past. Much of the recent work within the web archiving community (e.g., [1], [2], [3]) has focused on making the archiving process itself more intelligent, integrating AI into tasks such as web crawling, storage optimization, and metadata generation. In contrast, the application of AI to the analysis of already archived web content has received comparatively less attention. This gap represents a great opportunity for innovation and contribution, particularly as web archives continue to grow in size, diversity, and historical importance.

In this blog, I aim to outline (based on my perspective, analysis, preliminary work, and insights gained during my PhD candidacy exam) opportunities for where AI could play a role, as well as key challenges involved in integrating AI into web archiving.

My Preliminary Work 

Since I joined the PhD program at ODU in 2023 (Blog post introducing myself) under the supervision of Dr. Michele C. Weigle, my work has focused on the intersection of web archiving and AI, with a particular emphasis on leveraging Large Language Models (LLMs) through Retrieval-Augmented Generation (RAG) to detect and interpret text changes across mementos. Identifying the exact moment when content was modified often requires carefully comparing multiple archived versions, a process that can be both tedious and time-consuming. Moreover, detecting and analyzing where important changes occur is not a straightforward process. Users often need to select a subset of captures from thousands available, and even then, there is no guarantee that the differences they find will be meaningful or important. Traditional approaches to memento change analysis, such as lexical comparisons and indexing (e.g., [4], [5]), focus on showing the deletion or addition of terms or phrases but ignore semantic context. As a result, they miss subtle shifts in meaning and rely heavily on human interpretation.

My early work resulted in a paper titled “Exploring Large Language Models for Analyzing Changes in Web Archive Content: A Retrieval-Augmented Generation Approach,” coauthored with Lesley Frew, Dr. Jose J. Padilla, and Dr. Michele C. Weigle. The results of this initial exploration demonstrated that an LLM, when combined with tools such as RAG over a set of mementos, can effectively retrieve and analyze changes in archived web content. However, it remains necessary to constrain the analysis to distinguish between important and non-important changes. Building on this, I have been developing a pipeline to automatically determine whether a change alters meaning or context and should be considered significant. This aims to reduce manual effort, cognitive load, and support integration into web archive systems while advancing methods for analyzing archived web content at scale.

My PhD Candidacy Exam

During the summer of 2025, I passed my PhD candidacy exam (pdf, slides). This milestone marked an important transition in my doctoral studies and provided an opportunity to reflect on my preliminary work, learn, and identify new ways to contribute to the intersection of AI and web archiving. In my candidacy exam, I reviewed a set of ten papers related to analyzing changes and temporal coherence in archived web pages and websites.  Changes refer to any modifications observed in web content over time, including the addition, deletion, or alteration of text, images, structure, or other embedded resources. Temporal coherence, on the other hand, refers to the degree to which all components of an archived web page (such as HTML, text, images, and stylesheets) or website (such as interconnected pages and resources) were captured close enough in time to accurately represent how it appeared and functioned at a specific moment. A lack of temporal coherence can result in inconsistencies in how the archived page or site looks or behaves, which may affect the accuracy of change analysis.

Figure 2. A moment from my PhD candidacy exam, where I presented a ten-paper review on analyzing changes and temporal coherence in archived web pages and websites.

AI in Web Archiving: Opportunities

Over time, several researchers have addressed the analysis of changes and temporal coherence in web archives; however, the use of AI in this context has been limited. Below, I outline some research opportunities and challenges based on insights gained from my preliminary work and candidacy exam on how AI could play a role in these activities.

Topic Drift

AlNoamany et al. [6] studied web archive collections to identify off-topic pages within TimeMaps, which occur when a webpage that was originally relevant to a collection later changes into unrelated content. For example, in a collection about the 2003 California Recall Election (Figure 3), the site johnbeard4gov.com initially supported candidate John Beard (September 24, 2003) but later transformed into an unrelated adult-oriented page (December 12, 2003), making it irrelevant to the collection. To detect such changes, AlNoamany et al. proposed automated methods including text-based similarity metrics (cosine similarity, Jaccard similarity, and term overlap), a kernel-based method using web search context, and structural features such as changes in page length and word count. Using manually labeled TimeMap versions as ground truth, they found that the best performance was achieved by combining TF-IDF cosine similarity with word-count change.

Figure 3. Example of johnbeard4gov.com going off-topic. The first capture (September 24, 2003) shows the site supporting a California gubernatorial candidate, while the later capture (December 12, 2003) shows the domain transformed into unrelated adult-oriented content. Source: AlNoamany et al. [6]

Recent advances in AI and representation learning offer opportunities to enhance off-topic detection in web archives beyond traditional term frequency measures. Instead of relying on TF-IDF, future approaches could use dense semantic embeddings from transformer models to better capture meaning and context, enabling the detection of more subtle topic drift. Comparing embedding-based similarity with the methods proposed by AlNoamany et al. could help determine which approach is more effective, particularly when topic shifts are not immediately apparent.
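As a toy sketch of that comparison (a bag-of-words vector stands in here for the dense transformer embeddings discussed above; the function names and the threshold value are hypothetical, not from AlNoamany et al.):

```python
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Stand-in for a dense embedding; real systems would call a
    # sentence-embedding model here and keep the cosine step unchanged.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_off_topic(reference: str, capture: str, threshold: float = 0.2) -> bool:
    # Flag a capture whose similarity to the collection's reference
    # version falls below a chosen threshold.
    return cosine(bag_of_words(reference), bag_of_words(capture)) < threshold
```

With embeddings in place of word counts, the same threshold test can catch drift even when the old and new pages share vocabulary.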

Temporal Coherence

Weigle et al. [7] highlight a key challenge in modern web archiving: many sites, such as CNN.com, rely on client-side rendering, where the server delivers basic HTML and JavaScript that later fetch dynamic content (often JSON) through API calls. Traditional crawlers like Heritrix do not execute JavaScript or consistently capture these dynamic resources, leading to temporal violations in which archived HTML and embedded JSON files have different capture times, potentially misrepresenting events or news stories. The issue is illustrated in Figure 4, which shows archived CNN.com pages captured between September 2015 and July 2016. The top row displays pages replayed in the Wayback Machine that show the same top-level headline despite being captured months apart. The bottom row shows mementos from the same dates with the correct top-level headlines; however, the second-level stories remain temporally inconsistent.

By measuring time differences between base HTML captures and embedded JSON resources using CNN.com pages (September 2015–July 2016), Weigle et al. identified nearly 15,000 mementos with mismatches exceeding two days. They conclude that browser-based crawlers best reduce such inconsistencies, though due to their higher cost and slower performance, they recommend deploying them selectively for pages that depend on client-side rendering.

Figure 4. Example of temporal coherence violation in archived CNN.com pages using client-side rendering. Source: Weigle et al. [7].

AI can enhance existing approaches to temporal coherence in web archives, such as those proposed by Weigle et al., by helping identify pages that depend on client-side rendering. For example, a machine learning model could be fine-tuned to analyze the initial HTML and related resources to detect signals such as empty or minimally populated DOM structures and classify whether a webpage relies on client-side rendering. AI-based analysis could also estimate the proportion of JavaScript relative to textual content and detect patterns associated with common client-side frameworks. Combined with indicators such as API endpoints referenced in scripts, these features can be used to flag pages that are unlikely to render correctly with traditional crawlers and may require browser-based crawling.
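One of those signals can be computed directly, even before any model is involved. The heuristic below is my own illustration (not from Weigle et al.): estimate how much of a page's content lives in script tags versus visible text, using only the Python standard library:

```python
from html.parser import HTMLParser

class RenderSignal(HTMLParser):
    """Count characters inside <script> tags versus visible text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data.strip())
        else:
            self.text_chars += len(data.strip())

def script_ratio(html: str) -> float:
    # Near 1.0 suggests the page body is built client-side;
    # near 0.0 suggests server-rendered static content.
    p = RenderSignal()
    p.feed(html)
    total = p.script_chars + p.text_chars
    return p.script_chars / total if total else 0.0
```

A ratio like this, combined with framework fingerprints and referenced API endpoints, could feed the classifier described above.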

AI for Enhancing Web Archive Interfaces

While platforms such as Google and others have begun integrating AI into their user interfaces, web archives have largely remained unchanged in this respect. This is notable given the potential of AI to make web archive interfaces more intuitive and more informative for a wide range of users. For example, as my preliminary work suggests, when analyzing content changes, users currently must manually browse long lists of captures or compare multiple archived versions of a webpage. AI could instead automatically identify moments when important changes occur and direct users’ attention to those points in time.

Along the same line, the Internet Archive’s Wayback Machine provides a “Changes” feature that highlights deletions and additions between two snapshots and a calendar view where color intensity reflects the amount of variation. However, this variation is based on the quantity of changes rather than their significance. As a result, many small edits may appear more important than fewer but meaningful modifications. An AI-enhanced interface could address this limitation by incorporating semantic change detection. For instance, a calendar view that highlights when the meaning or message of a page changes can make large-scale temporal analysis more efficient and accessible. Moreover, users could ask natural-language questions such as “When did this page change its message?” or “What were the major updates during a specific period?” and receive concise, understandable answers. 

AI could also guide users through large collections by recommending related pages, explaining why certain versions are relevant, or warning when an archived page may contain temporally inconsistent content. For non-experts, visual aids generated by AI, such as timelines, change highlights, or short explanations, could make complex web archive data easier to interpret. 

AI in Web Archiving: Challenges

While there are opportunities for AI integration into web archiving, there are also challenges that must be considered.

Technical Challenges

From a technical standpoint, I identified three primary challenges regarding using AI for analyzing archived web content. The first concerns the nature of archived web data. Web archiving systems typically store collected content using the Web ARChive (WARC) format. Each WARC file stores complete HTTP response headers, HTML content, and additional embedded resources such as images and JavaScript files. Although this format provides a structure and allows long-term preservation, it is verbose and was not designed to support AI-based analysis. Consequently, researchers must perform extensive parsing and preprocessing before AI models can effectively use archived web content.
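To make the parsing burden concrete, here is a minimal sketch of extracting the HTTP payload from a raw WARC response record. The record bytes are a toy example; real pipelines would use a library such as warcio rather than hand-rolled splitting:

```python
# A toy WARC response record: WARC headers, then HTTP headers, then body,
# each section separated by a blank line (CRLF CRLF).
RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2023-01-01T00:00:00Z\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

def split_warc_record(record: bytes):
    # Peel off the WARC header block, then the HTTP header block,
    # leaving the payload an AI model would actually analyze.
    warc_head, rest = record.split(b"\r\n\r\n", 1)
    http_head, body = rest.split(b"\r\n\r\n", 1)
    warc_headers = dict(
        line.split(": ", 1) for line in warc_head.decode().splitlines()[1:]
    )
    return warc_headers, body.decode()
```

Even this trivial case shows the layers of framing that must be stripped before the content itself is reachable.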

Second, many web archives, such as the Internet Archive’s Wayback Machine, prioritize long-term storage and preservation over indexing and large-scale content retrieval. As a result, a single web page may have hundreds or even thousands of archived versions over time. Building and maintaining large-scale vector indexes over such temporally dense collections quickly becomes computationally expensive and, in many cases, impractical.

Third, even when working with controlled data scenarios, such as curated web archive collections, AI-driven analysis still depends on the availability of ground truth for evaluation and validation. For instance, training models to detect significant changes across mementos would require large-scale, high-quality annotations that capture not only what changed, but whether those changes meaningfully affect content interpretation. At present, no large-scale annotated datasets exist that support systematic analysis of change significance across archived web versions, creating a major barrier to training and evaluating AI models in this domain.

Ethical Challenges

Beyond technical limitations, the integration of AI into web archive analysis raises important ethical challenges. For instance, web archives preserve content as it existed at specific points in time, often without the consent or awareness of content creators or the individuals represented in that content. When AI models analyze archived web data, they may surface, reinterpret, or amplify sensitive information that was never intended to be reused in new analytical contexts. For this reason, it is important to carefully consider how AI is applied within web archiving. I contend that AI should be viewed as a complementary tool, one that supports, rather than replaces, human judgment. For example, AI can assist in identifying potential moments of relevant changes, flagging or summarizing them, while humans interpret the results and make decisions.

It is also important to note that recent debates highlight growing tensions between web archives and content owners regarding the use of archived data for AI training and analysis. For example, major news publishers have begun restricting access to resources like the Internet Archive due to concerns that archived content is being used for large-scale AI scraping without compensation or consent [8]. In response to such restrictions, researchers and practitioners—including Mark Graham, Director of the Wayback Machine—have argued that limiting access to web archives poses a significant risk to the preservation of digital history [9]. From this perspective, the primary concern is not excessive access, but rather the potential loss of the web as a historical record if archiving efforts are weakened.

Conceptual Challenges

AI models, particularly LLMs, typically operate on individual snapshots of data. As a result, they are not inherently designed to reason about evolution, temporal coherence, or change over time in archived web content. Consequently, answers to temporally grounded questions should not be expected by default when these models are applied without additional structure or context.

In static analysis scenarios, AI models can perform effectively. For example, given a single archived web page, an LLM can generate a summary, identify main topics, extract named entities, or analyze embedded resources such as images, videos, or scripts. Temporal analysis in web archiving, however, requires a different mode of reasoning. The central questions are not “What does this page say?” or “What is this page about?” but rather “What changed?”, “When did it change?”, “Why did it happen?”, and “What impact does the change have over time?” Answering these questions requires comparing multiple archived versions, reasoning based on context, and perhaps correlating changes across web pages.
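A minimal illustration of that gap, using the standard library: a lexical diff between two versions shows that text changed, but says nothing about whether the change is meaningful — which is precisely what semantic, LLM-based analysis would need to judge. The two sentences are invented examples:

```python
import difflib

# Two hypothetical archived versions of the same sentence on a page.
v1 = "The council will vote on the proposal next week."
v2 = "The council has rejected the proposal."

# A purely syntactic comparison: it flags the edit but cannot say that
# the page's meaning flipped from "pending" to "rejected".
diff = list(difflib.unified_diff([v1], [v2], lineterm=""))
```

Temporal reasoning begins where this diff ends: deciding that this particular change alters the story the page tells.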

Integrating AI into web archiving is therefore not only about efficiency, but about enabling new forms of discovery. This requires clearly defining desired outcomes and using AI to support or accelerate processes that have traditionally been manual.

Final Reflections

To conclude, I would like to leave the reader with a set of open questions as we continue moving toward the integration of AI in web archiving. One of the most visible changes introduced by AI is the ability to go beyond syntactic analysis and begin exploring semantic analysis, where meaning, context, and interpretation matter. This shift is not about replacing existing techniques, but about expanding the types of questions we can ask when working with web archive data.

I contend that traditional algorithms remain essential for many web archiving tasks. They are precise, transparent, and well understood. AI, by contrast, offers strengths in areas where rules struggle: interpreting context, assessing relevance, and reasoning across multiple versions of content. Rather than framing this as a competition between algorithms and AI, a more productive question is how these approaches can complement one another, and in which parts of the analysis pipeline each is most appropriate.

In the short term, I consider that AI tools are unlikely to replace algorithmic methods. However, they already show promise as assistive tools that can guide analysis, prioritize attention, and help humans reason about large and complex temporal collections. This naturally raises a forward-looking question: if AI continues to improve in its ability to reason about time, meaning, and change, how should the web archiving community adapt its tools, workflows, and standards?

The WARC format has proven effective for long-term preservation, but it was not designed with AI-driven analysis in mind. Should we aim to augment existing archival formats with AI-aware representations, or should we focus on developing AI methods that better adapt to current standards such as WARC? How we answer this will shape not only how we analyze web archives, but also how future generations access and understand the web past.

References

[1] AK, Ashfauk Ahamed. “AI driven web crawling for semantic extraction of news content from newspapers.” Scientific Reports, 2025. [Online]. https://doi.org/10.1038/s41598-025-25616-x.

[2] Abrar, M. F., Saqib, M., Alferaidi, A., Almuraziq, T. S., Uddin, R., Khan, W., & Khan, Z. H. “Intelligent web archiving and ranking of fake news using metadata-driven credibility assessment and machine learning.” Scientific Reports, 2025. [Online]. https://doi.org/10.1038/s41598-025-31583-0.

[3] Nair, A., Goh, Z. R., Liu, T., and Huang, A. Y. “Web archives metadata generation with gpt-4o: Challenges and insights,” arXiv, Tech. Rep. arXiv:2411.05409, Nov. 2024. [Online]. https://arxiv.org/abs/2411.05409.

[4] L. Frew, M. L. Nelson, and M. C. Weigle, “Making Changes in Webpages Discoverable: A Change-Text Search Interface for Web Archives,” in Proceedings of the 23rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2023, pp. 71–81. https://doi.org/10.1109/JCDL57899.2023.00021

[5] T. Sherratt and A. Jackson, GLAM-Workbench/web-archives, https://zenodo.org/records/6450762, version v1.1.0, Apr. 2022. DOI: 10.5281/zenodo.6450762.

[6] Y. AlNoamany, M. C. Weigle, and M. L. Nelson, “Detecting off-topic pages within timemaps in web archives,” International Journal on Digital Libraries, vol. 17, no. 3, pp. 203–221, 2016. https://doi.org/10.1007/s00799-016-0183-5.

[7] M. C. Weigle, M. L. Nelson, S. Alam, and M. Graham, “Right HTML, wrong JSON: Challenges in replaying archived webpages built with client-side rendering,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), Jun. 2023, pp. 82–92. https://doi.org/10.1109/JCDL57899.2023.0002.

[8] Robertson, K. “News publishers limit Internet Archive access due to AI scraping concerns.” Nieman Lab, Jan. 2026. [Online]. https://www.niemanlab.org/2026/01/news-publishers-limit-internet-archive-access-due-to-ai-scraping-concerns/

[9] Graham, M. “Preserving the web is not the problem — losing it is.” Techdirt, Feb. 17, 2026. [Online]. https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is/





2026-03-18: Reverse TweetedAt: Determining Tweet ID prefixes from Timestamps / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

Figure 1: Each tweet ID is a unique identifier that encodes the tweet creation timestamp, example adapted from Snowflake ID, Wikipedia.

Web archives, such as the Wayback Machine, are indexed by URL: to search for a tweet, we must first know its URL. Figure 2 demonstrates that searching for a tweet URL returns a timemap of that tweet archived at different points in time. Clicking on a particular datetime shows the archived tweet at that point in time.

 

Figure 2: An archived tweet URL results in a timemap consisting of archived copies of the tweet.


Figure 3 shows a screenshot of a tweet shared by @_llebrun. The tweet in the screenshot was originally posted by @randyhillier, who later deleted it. The screenshot does not include the tweet's URL. Moreover, once a tweet is deleted, we can no longer find its URL on the live web, nor do we know how to look it up in the archive.


Figure 3: @_llebrun tweeted a screenshot of a tweet originally posted by @randyhillier, who later deleted his tweet.


Therefore, we need to construct the URL of a tweet using only the information present in the screenshot. The structure of a tweet URL is: 


https://twitter.com/Twitter_Handle/status/Tweet_ID


We need the Twitter_Handle and Tweet_ID to construct a tweet URL. Each tweet ID is a unique identifier, known as a Snowflake ID, that encodes the tweet's creation timestamp (Figure 1). We can extract the Twitter handle and timestamp from the tweet in the screenshot. In our previous tech report, we introduced methods for extracting Twitter handles and timestamps from Twitter screenshots. Next, we need to determine the tweet ID from the extracted timestamp. We could query the Wayback Machine with only the Twitter handle, but individually dereferencing every archived tweet for a user would be exhausting. For example, the following curl command shows that the number of archived status URLs for @randyhillier is huge (42,053). Hence, our goal is to limit the search space by using the timestamp present on the screenshot.

curl -s "http://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status&matchType=prefix" | wc -l


   42053


Previously, one could query Twitter to find the timestamp of a tweet given its tweet ID, but this service is no longer freely available. The Twitter API has rate limits, cannot return metadata for deleted, suspended, or private tweets, and is now monetized and no longer research-friendly. To address these issues, WS-DL members Mohammed Nauman Siddique and Sawood Alam developed the TweetedAt web service in 2019, which extracts timestamps from Snowflake IDs and estimates timestamps for pre-Snowflake IDs. TweetedAt is thus a useful tool for finding the timestamp of a tweet ID; here, however, we need the reverse: a tweet ID prefix determined from a given timestamp.

Reverse TweetedAt


The Snowflake service generates a tweet ID as a 64-bit unsigned integer composed of a 41-bit timestamp, a 10-bit machine ID, a 12-bit machine sequence number, and 1 unused sign bit. The timestamp occupies the 41 bits directly below the sign bit.
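These fields can be illustrated by decomposing the example tweet ID used later in this post (a sketch; the bit offsets follow the layout above):

```python
from datetime import datetime, timezone

# Example tweet ID from this post (@randyhillier's deleted tweet).
tid = 1495226962058649603

timestamp_ms = (tid >> 22) + 1288834974657  # 41-bit field plus Twitter epoch
machine_id = (tid >> 12) & 0x3FF            # next 10 bits
sequence = tid & 0xFFF                      # lowest 12 bits

created = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```

The decoded creation time lands in February 2022, matching the timestamp visible in the screenshot.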


TweetedAt determines the timestamp for a tweet ID by right-shifting the tweet ID by 22 bits and adding the Twitter epoch time of 1288834974657 (offset).


Python code to get the UTC timestamp of a tweet ID:

from datetime import datetime

def get_tweet_timestamp(tid):
    offset = 1288834974657  # Twitter epoch in milliseconds
    tstamp = (tid >> 22) + offset
    utcdttime = datetime.utcfromtimestamp(tstamp / 1000)
    print(str(tid) + " : " + str(tstamp) + " => " + str(utcdttime))


For Reverse TweetedAt, given a datetime, we want to generate a tweet ID prefix by subtracting the offset and left-shifting by 22 bits. This will not reconstruct the exact tweet ID, because the lower 22 bits are all zeros, but it does give us a tweet ID prefix for a timestamp. For example, the tweet ID of @randyhillier's tweet is '1495226962058649603' and the timestamp is '9:41 PM Feb 19, 2022', as shown in Figure 3. The tweet ID has 19 digits and the timestamp is at minute-level granularity. From that minute-level timestamp, Reverse TweetedAt computes the 6-digit prefix '149522' of the 19-digit tweet ID '1495226962058649603'.


Python code to get a tweet ID prefix from a Wayback timestamp:

from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657

def wayback_to_tweetid_prefix(timestamp: str):
    s = str(timestamp).strip()

    if len(s) == 14 and s.isdigit():
        granularity = "second"
        dt = datetime.strptime(s, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 999
    elif len(s) == 12 and s.isdigit():
        granularity = "minute"
        dt = datetime.strptime(s, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 60_000 - 1
    elif len(s) == 10 and s.isdigit():
        granularity = "hour"
        dt = datetime.strptime(s, "%Y%m%d%H").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 3_600_000 - 1
    elif len(s) == 8 and s.isdigit():
        granularity = "date"
        dt = datetime.strptime(s, "%Y%m%d").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 86_400_000 - 1
    else:
        raise ValueError(
            "Unsupported Wayback format. Use YYYYMMDD, YYYYMMDDHH, YYYYMMDDHHMM, or YYYYMMDDHHMMSS (UTC)."
        )

    start_delta = start_ms - TWITTER_EPOCH_MS
    end_delta = end_ms - TWITTER_EPOCH_MS
    min_id = start_delta << 22
    max_id = (end_delta << 22) | ((1 << 22) - 1)
    min_str = str(min_id)
    max_str = str(max_id)
    length = max(len(min_str), len(max_str))
    min_str = min_str.zfill(length)
    max_str = max_str.zfill(length)

    i = 0
    while i < length and min_str[i] == max_str[i]:
        i += 1

    prefix_str = min_str[:i] or "0"
    suffix_len = length - i
    prefix_val = int(prefix_str)
    ten_pow = 10 ** suffix_len
    approx_lower = prefix_val * ten_pow
    approx_upper = (prefix_val + 1) * ten_pow - 1

    return {
        "input_timestamp": timestamp,
        "tweet_id_prefix": prefix_str,
        "tweet_id_regex": f"{prefix_str}[0-9]{{{suffix_len}}}",
        "tweet_id_range": f"[{approx_lower} – {approx_upper}]",
    }


We integrated Reverse TweetedAt as a web service alongside TweetedAt. The service accepts a timestamp as user input and returns the corresponding tweet ID prefix, tweet ID regex, and full tweet ID range (Figure 4). It supports multiple valid timestamp formats (e.g., ISO 8601, RFC 1123, Wayback) and provides output at different levels of granularity. For example, Figure 4 shows output for millisecond-level granularity. Because millisecond-level precision is typically unavailable in tweet timestamps, the tool can interpret such inputs at second- or minute-level granularity. Rather than assuming zeros for unknown fields, the tool expands the input into the full corresponding time window (e.g., an entire second or minute), and computes the tweet ID prefix over that interval.

Figure 4: Reverse TweetedAt outputs tweet ID prefix at millisecond-level granularity.


Figure 5: Reverse TweetedAt outputs tweet ID prefix at second-level granularity.


Figure 6: Reverse TweetedAt outputs tweet ID prefix at minute-level granularity.


Tweet ID Regex-based Retrieval Across Temporal Granularity


We can use the tweet ID regex derived from a timestamp to search for archived tweets within a specific temporal window. By querying the Wayback Machine’s CDX API and filtering results with this prefix-based regex, we can identify tweet URLs whose IDs fall within the calculated range. As the timestamp becomes less precise, the tweet ID prefix becomes shorter and the regex search space widens.


For example, the tweet ID of @randyhillier’s tweet shown in Figure 3 is ‘1495226962058649603.’ Using TweetedAt, we can recover its timestamp at millisecond-level granularity. Using Reverse TweetedAt, the millisecond-level timestamp yields a more precise prefix that matches 10 archived captures, while a slightly less precise prefix (second-level granularity) matches 15. When the precision is reduced further (minute-level granularity), the count remains 15, indicating that all tweets in the broader time window were posted within the same narrower interval. This illustrates how lower temporal granularity expands the potential search space; however, a wider ID range does not necessarily produce more results, it only increases the number of possible candidate IDs.

Search space at millisecond-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \

| grep -E 'status/14952269620[0-9]{8}' | wc -l


   10


Search space at second-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \

| grep -E 'status/149522696[0-9]{10}' | wc -l


   15


Search space at minute-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \

| grep -E 'status/149522[0-9]{13}' | wc -l


   15



CDX API Wildcard Search and Snowflake IDs to Limit the Search Space Using Tweet ID Prefix


We can now determine a tweet ID prefix from a screenshot timestamp using the Reverse TweetedAt service. Since a tweet can be archived at any time within ±26 hours of the screenshot timestamp, we can determine tweet ID prefixes from the timestamps at the edges of that window. We can then use this window to limit the search space by excluding URLs tweeted before or after the alleged timestamp. Consider the tweet in the screenshot in Figure 2, where the screenshot timestamp is:


9:41 PM Feb 19, 2022 (20220219214100)


We compute the tweet ID prefixes for the left-hand (−26 hours) and right-hand (+26 hours) boundary timestamps using Reverse TweetedAt, as listed below:


-26 hours timestamp: 20220218194100 → tweet ID prefix: 14947588
+26 hours timestamp: 20220220234100 → tweet ID prefix: 149554404
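The boundary timestamps themselves are simple offsets of the screenshot timestamp. A minimal sketch (the helper name is ours, not part of the tool) that reproduces the two Wayback-format boundaries:

```python
from datetime import datetime, timedelta

def window_bounds(wayback_ts: str, hours: int = 26):
    """Return the (-hours, +hours) boundaries of a Wayback-format
    timestamp (YYYYMMDDHHMMSS), in the same format."""
    fmt = "%Y%m%d%H%M%S"
    dt = datetime.strptime(wayback_ts, fmt)
    delta = timedelta(hours=hours)
    return (dt - delta).strftime(fmt), (dt + delta).strftime(fmt)

# Screenshot timestamp from Figure 2: 9:41 PM Feb 19, 2022.
lo_ts, hi_ts = window_bounds("20220219214100")
```

Each boundary timestamp is then fed to Reverse TweetedAt to obtain its tweet ID prefix.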

As previously mentioned, the timestamp occupies only the upper 41 bits of the tweet ID. We can take the common portion of the two boundary prefixes (149[4-5]) and perform a CDX API wildcard search in the Wayback Machine to limit the search space. The search space reduces to 629 archived tweets, whereas searching on the Twitter handle alone returns 42,053 archived tweets. Dereferencing 629 archived tweets to search for the particular tweet text in a screenshot is a lot of work but feasible, whereas dereferencing 42,053 archived tweets is far too expensive. The following curl command shows that the number of archived tweets to dereference for @randyhillier’s status URLs sharing the common tweet ID prefix is comparatively small (629).

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix&from=20220218194100" \ | grep -E 'status/149[4-5]' | wc -l


   629
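The character class in that grep can be derived mechanically from the two boundary prefixes. A minimal sketch (the function name is ours, not part of the tool):

```python
def boundary_pattern(lo_prefix: str, hi_prefix: str) -> str:
    """Shared leading digits of the two boundary tweet ID prefixes,
    followed by a character class covering the first digit where they
    diverge. Assumes lo_prefix's diverging digit is <= hi_prefix's,
    which holds because Snowflake IDs increase over time."""
    n = min(len(lo_prefix), len(hi_prefix))
    i = 0
    while i < n and lo_prefix[i] == hi_prefix[i]:
        i += 1
    if i == n:  # one prefix is a prefix of the other
        return lo_prefix[:i]
    return f"{lo_prefix[:i]}[{lo_prefix[i]}-{hi_prefix[i]}]"

# Boundary prefixes from the -26h and +26h timestamps above.
pattern = boundary_pattern("14947588", "149554404")
```

For the boundary prefixes 14947588 and 149554404 this yields the pattern 149[4-5] used in the grep above.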


Summary


It is easy to search for a tweet in the Wayback Machine when you know its URL. But a screenshot of a tweet typically does not include the URL. However, the Twitter handle and timestamp visible in the screenshot can be used to search for the tweet in the Wayback Machine web archive. Given a datetime, Reverse TweetedAt produces a tweet ID prefix, which we can then use to grep through a CDX API response listing all tweets associated with a Twitter account. Using Reverse TweetedAt, we can determine approximate tweet IDs for the left-hand and right-hand boundary timestamps derived from a screenshot timestamp. We found that we can limit the search space using a CDX API wildcard search based on a common tweet ID prefix, which optimizes the process of finding candidate archived tweets for the tweet in the screenshot. We published a paper at the 36th ACM Conference on Hypertext and Social Media, “Web Archives for Verifying Attribution in Twitter Screenshots,” which discusses how we can further use the candidate archived tweets to verify whether the tweet in the screenshot was posted by the alleged author.


Related Links:



— Tarannum Zaki (@tarannum_zaki)


Seeking Approval, Confronting Objectivity: Neutrality in the Library of Congress Subject Headings Approval Process / In the Library, With the Lead Pipe

In Brief: This study examines the concept of neutrality in Library of Congress Subject Headings and the subject approval process by analyzing proposed headings that were rejected over a nearly 20-year period. It considers the place of neutrality in libraries more generally and argues that equity, rather than neutrality, is the appropriate lens for judging subject heading proposals. Finally, it recommends several reforms that could improve the subject heading process and make it more equitable.

By Allison Bailund, Deborah Tomaras, Michelle Cronquist, and Tina Gross

If a train is moving down the track, one can’t plop down in a car that is part of that train and pretend to be sitting still; one is moving with the train. Likewise, a society is moving in a certain direction—power is distributed in a certain way, leading to certain kinds of institutions and relationships, which distribute the resources of the society in certain ways. We can’t pretend that by sitting still—by claiming to be neutral—we can avoid accountability for our roles (which will vary according to people’s place in the system). A claim to neutrality means simply that one isn’t taking a position on that distribution of power and its consequences, which is a passive acceptance of the existing distribution. That is a political choice.[1]

Introduction

Library workers and patrons have long been frustrated with Library of Congress Subject Headings (LCSH) for being out of date and lacking well-known concepts with abundant usage. Contributors to the Subject Authority Cooperative Program (SACO) have made many improvements to LCSH by proposing new headings and revising existing terms. Those attempts, however, have sometimes been hampered by the Library of Congress’s (LC) preference for supposed neutrality within the vocabulary; Subject Headings Manual (SHM) instruction “H 204,” released in 2017, specifically dictates that proposed headings should “employ neutral (i.e., unbiased) terminology.”[2]

This desire for neutrality has been directly stated, alluded to, or otherwise upheld in myriad rejections of proposed subject headings, from Negative campaigning[3] to White flight.[4] Even Water scarcity, a quantifiable concept of worldwide concern, was rejected in 2008 as a non-neutral topic requiring value judgments with the following justification:

Works on the topics of water scarcity and water shortage have been cataloged using the heading Water-supply, post-coordinating[5] as necessary with additional headings such as Water conservation and Water resources management. The meeting determined that this practice is appropriate and should continue, since Water-supply is a neutral heading that does not require a judgment about the relative abundance of water.[6]

However, what exactly constitutes neutral and unbiased terminology is never defined in “H 204” or anywhere else in the SHM, nor in any other Library of Congress controlled vocabulary manuals.[7] Much of the previous literature on neutrality in libraries focuses on debates over possible definitions of the term and what role neutrality should play in library services and collections. Building off previous critical cataloging literature, which focuses on addressing problematic terms, subject hierarchies, and biases within cataloging standards, this article extends that scrutiny further. We analyze how neutrality is embedded in the LC structures and systems that vet the terms catalogers utilize to describe materials.

Our article examines the ways in which neutrality is enforced in LCSH rejections between July 2005 and December 2024. We review “Summaries of Decisions” from LC Subject Editorial Meetings (along with associated discussion and commentary in the field); within these, we identify and interpret patterns of justifications used to reject subject heading proposals and maintain purported neutrality within the vocabulary. We argue that neutrality has been used to keep many concepts depicting prejudice (racism, sexism, etc.), as well as concepts related to the lived experiences of marginalized people, out of the vocabulary and/or to obscure materials about those topics under other, often more generalized or euphemistic, terminology. As a counterpoint, we suggest a values- and equity-driven approach to replace the principle of neutrality in a cataloging context and within the subject approval process. We acknowledge that the current political situation may be particularly fraught for equity-driven change, but believe bowing to political pressures is untenable, and continued pursuit of neutrality will only serve to further the discordance between library values and the realities of LCSH.

Background

Neutrality: Assumed, but Nebulous

Schlesselman-Tarango notes the perceived conceptual importance of neutrality for libraries and librarianship; their “status as ‘an essential public good’” is “contingent on the perpetration of the idea that [they are] also neutral.”[8] Seale further situates this notion of libraries-as-neutral as not externally imposed, but emanating from within librarianship itself: “The positioning of the library as a neutral and impartial institution, separated from the political fray, resonates with dominant library discourse around libraries.”[9]

However, despite both critics and supporters assuming that neutrality is fundamental to librarianship, there is a dearth of references to the term in official documents underpinning the ethics and standards of the library profession. The American Library Association’s (ALA) Working Group on Intellectual Freedom and Social Justice observed, for example, that “the word neutrality does not appear in the Library Bill of Rights, the ALA Code of Ethics, and any other ALA statements that the Working Group could locate. It does not appear in the Intellectual Freedom Manual (10th Edition) nor is it defined in any official ALA document or policy.”[10] The International Federation of Library Associations and Institutions’s (IFLA) Code of Ethics mentions but does not define neutrality in Section 5, in sentences such as “Librarians and other information workers are strictly committed to neutrality and an unbiased stance regarding collection, access and service.”[11] For catalogers in particular, the Cataloging Code of Ethics, issued in 2021 and discussed further below, explicitly disputes the concept of neutrality.

Most pertinent to the subject proposal process, the National Information Standards Organization’s (NISO) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies mentions neutrality exactly twice, yet again without definition. The first instance, in guidance about choosing preferred forms of terms, asserts that “Neutral terms should be selected, e.g., developing nations rather than underdeveloped countries.”[12] The second appearance, in a discussion of synonyms, notes “pejorative vs. neutral vs. complimentary connotation[s]” of terms that might influence usage.[13] The latter reference positions neutrality as the impartial fulcrum of term meanings, while the former implies, particularly via the example, a more active attempt at choosing equitable and unbiased terminology.

Although the terms “neutral” and “unbiased” are often linked when they appear in library literature (as in the IFLA Code of Ethics), they are not synonymous. Oxford English Dictionary (OED) definitions of neutral include “inoffensive,” and “not taking sides in a controversy, dispute, disagreement, etc.”; unbiased, however, while meaning “not unduly or improperly influenced or inclined; [and] unprejudiced,” does not necessarily imply a lack of involvement in social or political issues.[14] The incompatibility between neutrality as inoffensive isolation versus unbiasedness as active equity plays out repeatedly in library discussions. Without clear definitions, neutrality in the NISO Guidelines and elsewhere is open to conjecture and interpretation. As noted by Scott and Saunders, “[T]he term ‘neutrality’ seems to be used for, or conflated with, everything from not taking a side on a controversial issue to the objective provision of information and a position of defending intellectual freedom and freedom of speech.”[15]

Proponents of library neutrality don’t fully agree on definitions, either. In Scott and Saunders’s survey, some describe it as “lacking bias,” which more closely aligns with principles of equity.[16] The depiction of neutrality by LaRue, the former Director of the ALA’s Office for Intellectual Freedom, also appears to resemble equity; he frames neutrality as not “deny[ing] people access to a shared resource just because we don’t like the way they think” and giving everyone “a seat at the table.”[17] Dudley, reframing library neutrality in relation to pluralism, highlights similar values; his proposed ethos calls on librarians to “adhere to principled, multi-dimensional neutrality” which includes “welcoming equally all users in the community” and “consistently-apply[ing] procedures for engaging with the public.”[18]

The 2008 book Questioning Library Neutrality examines many aspects of why neutrality is both an illusion and a misguided aspiration, and also disabuses readers of the idea that it has always been a core value. Rosenzweig points out that neutrality as a principle of librarianship does not go back to the early development of public libraries:

We would do well to remember that, if libraries as institutions implicitly opened democratic vistas, our librarian predecessors were hardly democratic in their overt professional attitude or mission, being primarily concerned with the regulation of literacy, the policing of literary taste and the propagation of a particular class culture with all its political, economic and social prejudices. In fact, the idea of the neutrality of librarianship, so enshrined in today’s library ideology (and so often read back into the indefinite past), was alien to these earlier generations.[19]

Although Macdonald and Birdi’s literature review identifies four conceptions of neutrality within library science literature—“favourable,” “tacit value,” “libraries are social institutions,” and “value-laden profession”—the authors found that depictions of neutrality articulated by practitioners are more complicated. Many have “ambivalent” views of neutrality, seeing it as “a slippery and elusive concept.”[20] The relative importance of neutrality to proponents varies, depending on its position vis-à-vis other library values: “When it is alone, or grouped with a simple, single other value like professionalism, it is very low in priority. When it is presented in a group of other values or left implicit, it fares better.”[21] Catalogers tended to espouse neutrality the least among library specializations, with 21% reporting that they never think about neutrality.[22] Further, some surveyed librarians “are more likely to eschew neutrality on matters of social justice,” when neutrality comes into conflict with core library values.[23]

Neutrality versus Social Justice

Since the late 1960s, neutrality has increasingly come into question as librarians have embraced ideals centering social justice, equity, diversity, and inclusion, particularly in the ALA.[24] These values, codified in the ALA Code of Ethics and Library Bill of Rights, include a commitment to “recognize and dismantle systemic and individual biases; to confront inequity and oppression; to enhance diversity and inclusion; and to advance racial and social justice in our libraries, communities, profession, and associations.”[25] ALA resolutions go a step further, acknowledging the “role of neutrality rhetoric in emboldening and encouraging white supremacy and fascism.”[26] Scott and Saunders sum up the issue, noting that while some librarians cast neutrality as a “fundamental professional value, albeit one that is not explicitly mentioned in the professional codes of ethics and values,” others assert that it is “a false ideal that interferes with librarians’ role of social responsibility, which is an explicitly stated value of librarianship.”[27] As Watson argues in an ALA 2018 Midwinter panel on neutrality in libraries, “We can’t be neutral on social and political issues that impact our customers because, to be frank, these social and political issues impact us as well.”[28]

Even among library codes of ethics that explicitly hold neutrality as a core value, there is a tension between practitioners and official documentation. For example, the Canadian Federation of Library Associations / Fédération canadienne des associations de bibliothèques (CFLA-FCAB) Code of Ethics calls for librarians to “promote inclusion and eradicate discrimination,” provide “equitable services,” and “counter corruption directly affecting librarianship”; but the Code also advocates for neutrality, advising librarians to “not advance private interests or personal beliefs at the expense of neutrality.”[29] Once again neutrality remains undefined—though it’s implied, based on context, to be not taking sides, matching one of the OED definitions above. This understanding accords with a 2024 study on Canadian librarians, which noted most Canadian academic librarians seem to have coalesced around defining neutrality as “not taking sides,” followed by “not expressing opinions.”[30]

Yet the same study also highlights a perceived incompatibility of neutrality with other values of librarianship, with “the majority (54%) of respondents” disagreeing or strongly disagreeing that “‘neutrality is compatible with other library values and goals,’” and 58% disagreeing “that it is ethical to be neutral.”[31] Brooks Kirkland asserts that assuming neutrality as a key tenet of librarianship conflicts with such principles as promoting inclusion and eradicating discrimination.[32] Pagowsky and Wallace note that, whether knowingly or not, upholding neutrality within inequitable systems ultimately supports them: “Trying to remain ‘neutral,’ by showing all perspectives have value … is harmful to our community and does not work to dismantle racism. As Desmond Tutu has famously said, ‘If you are neutral in situations of injustice, you have chosen the side of the oppressor.’”[33]

Cataloguing Code of Ethics, Critical Cataloging, and Other Recent Developments

The incongruity between neutrality and social justice as core library values has sparked the numerous debates detailed above and on mailing lists and social media. It has also led in part to the expansion of the critical cataloging movement and the creation of the Cataloguing Code of Ethics, published in 2021 and since adopted by several library organizations, including the ALA division Core. The Code explicitly refutes the concept of neutrality; it avers that “neither cataloguing nor cataloguers are neutral,” and calls out the biases inherent within the dominant, mostly Western cataloging standards currently in use. It particularly notes that “cataloguing standards and practices are currently and historically characterised by racism, white supremacy, colonialism, othering, and oppression.”[34]

The most well-known critical cataloging subject heading proposal was the attempt to change the now-defunct heading Illegal aliens, as depicted in the documentary Change the Subject. In November 2021, five years after LC initially announced it would change the Illegal aliens subject headings and then backtracked after political pressure, LC announced it would replace the subject headings Aliens and Illegal aliens. However, LC did not adopt the changes it had initially announced, nor the recommendations made in a report by the ALA Subject Analysis Committee (SAC), which included revising the term to Undocumented immigrants.[35] LC instead split Illegal aliens into two new headings: Noncitizens and Illegal immigration.[36] Librarians have criticized the retention of “illegal” within one of the updated headings for continuing to make library vocabularies “complicit” with the “legally inaccurate” criminalization of undocumented immigrants.[37]

Other critical cataloging proposals have been subjected to inordinate scrutiny by LC; even when headings have been approved, they have sometimes faced heavy editing and modification. One example is Blackface, where LC’s changes to the proposal obscured the racism characterizing the phenomenon. The broader term (i.e., the parent in the subject hierarchy) was altered from Racism in popular culture to Impersonation.[38] Since Impersonation falls under the broader terms Acting, Comedy, and Imitation, this change emphasizes the performance aspect in lieu of its racist connotations. Similarly, the scope note (i.e., definition), was modified from “Here are entered works on the use of stereotyped portrayals of black people (linguistic, physical, conceptual or otherwise), usually in a parody, caricature, etc. meant to insult, degrade or denigrate people of African descent” to “Here are entered works on the caricature of Black people, generally by non-Black people, through the use of makeup, mannerisms, speech patterns, etc.”[39] As noted by Cronquist and Ross, these changes ultimately “neutralize[d]” the proposal “in the name of objectivity.”[40]

However, there have also been numerous successful updates to outdated terminology and additions of missing concepts, particularly in recent years. For example, in 2021, fifteen subject headings for the incarceration of ethnic groups during World War II, including Japanese Americans, were changed from the euphemistic phrase –Evacuation and relocation to –Forced removal and internment.[41] The African American Subject Funnel added the new heading Historically Black colleges and universities in 2022 and helped to revise Slaves to Enslaved persons in 2023; the Gender and Sexuality Funnel successfully changed the heading Gays to Gay people, and proposed the new term Gender-affirming care, in 2023; and the Medical Funnel updated Hearing impaired to Hard of hearing people in 2024.[42]

On a hopeful note, many of these large-scale projects coordinated with Cataloging Policy Specialists within LC, who worked closely with catalogers during the process and ensured that related term(s) and related Library of Congress Classification number(s) were updated as well. Further, LC has taken some recent steps to improve its vocabularies and create avenues for increased input from outside institutions. This includes hiring a limited term Program Specialist to help redress outdated terminology related to Indigenous peoples. LC also created two advisory groups for Demographic Group Terms and Genre/Form Terms, both of which allow for greater community input into these vocabularies.

Still, frustrations remain. Changing outdated terminology is a complicated process. Library of Congress vocabularies, in particular, are vulnerable to potential governmental interference. Attempted Congressional intervention during the updating of Illegal aliens and the passing of a statute mandating transparency in the subject approval process led to the creation of “H 204” codifying LC’s preference for a neutrality uninvolved in political and social issues.[43] The complication of bibliographic file maintenance (e.g., reexamining cataloged materials to determine whether subject headings should be changed, deleted, or revised) also muddies the waters and impedes large-scale projects. Staffing issues within LC further hinder the ability to undertake or complete projects, as seen in the SACO projects process, paused in 2025 due to LC’s catalog migration.

Maintaining LCSH

Library workers are familiar with LCSH in our discovery tools, and most are aware of concerns about outdated and problematic headings. However, they may not see debates and conflicts about new headings and ongoing maintenance of the vocabulary as a built-in and inherent part of the system, as catalogers who engage in that work do.

As Gross asserts:

To remain effective, headings must be regularly updated to reflect current usage. Today’s LCSH People with disabilities used to be Handicapped and, before that, Cripples. Additionally, new concepts require new headings, such as the recently created Social distancing (Public health), Neurodiversity, and Say Her Name movement. The process of determining which word or phrase to use as the subject heading for a given topic is inevitably fraught and can never be free of bias. The choice of terms embodies various perspectives, whether they are intentional and acknowledged or not.[44]

Both the need to continually revise existing headings and create new ones, and indeed wrangling over what they should be, are not defects, nor a surprise. They flow directly from the purpose of controlled vocabulary and the complications of language it exists to help navigate—the ever-changing and endless variety of ways to refer to things.

Some of the frequency and intensity of debates about LCSH stem from the fact that it attempts to be a universal vocabulary that covers all branches of knowledge. While it is created and maintained primarily for the needs of the Library of Congress, it is used by all kinds of libraries. Balancing the need to serve a user base that consists of federal legislators and providing the world with a one-size-fits-all vocabulary is clearly a formidable and contradictory endeavor. In recent decades, LC has made significant progress in opening up the maintenance process to input and contributions from the broader library community via the SACO program. These changes appear to be partly in response to demands to make the process faster and more transparent, but also a desire by LC to incorporate broader perspectives and experiences and to help with the tremendous workload.

LCSH Creation and Revision Process

The SACO program, created circa 1993,[45] allows librarians to submit proposals for new or revised LCSH terms (as well as other LC vocabularies) to the Library of Congress. In order to submit proposals, catalogers are expected to be familiar with the Subject Headings Manual (SHM), which governs LCSH usage and formulations as well as the proposal process, required research, and criteria used to evaluate proposals.[46] One of the primary requirements is literary warrant: proposers must demonstrate that there is a need for the new subject heading based on a work being cataloged.[47] Beyond the work cataloged and published/reference sources, librarians can also cite user warrant, “the terminology people familiar with the topic use to describe concepts,” as justification in proposals.[48] This can include reviews, blog posts, social media threads, LibGuides, etc.

After a proposal is submitted, LC staff schedule it to a monthly “Tentative List,” which is published to allow for public comment on proposed headings. Taking those comments and SHM instructions into account, members of LC’s Policy, Training, and Cooperative Programs Division (PTCP) make a decision about whether to add the proposed heading to LCSH, send it back to the cataloger for revision and resubmission, or reject it. If the heading is not added, a monthly “Summary of Decisions” document details the reasons for its exclusion. While the SACO program allows external librarians to submit proposals, the Library of Congress maintains its “authority to make final decisions on headings added.”[49]

Most proposals are routine and relatively straightforward, such as those that follow patterns—repeated formulations of similar subjects that provide a predictable search structure for library patrons (e.g., Boating with dogs already exists and the cataloger wants to propose Boating with cats). SHM “H 180” notes that patterns help achieve desired qualities for the vocabulary, including “consistency in form and structure among similar headings.”[50] LC is also concerned with avoiding multiple subject headings that convey too closely related concepts. LCSH online training “Module 1.2” highlights both “consistency and uniqueness among subjects” as strengths of controlled library vocabularies, for instance.[51] Proposals that don’t follow patterns therefore receive more scrutiny, to make sure they are unique, definable topics. LC makes judgment calls based on the strength of the evidence in proposals, and on SHM instructions, including the guidance in “H 204” about neutrality.

Neutrality within LC Documentation

Within its official documentation on subject headings, LC mentions neutrality sparingly. In the entirety of the SHM, the word neutral appears only once, specifically in guideline “H 204” with the recommendation that catalogers “employ neutral (i.e., unbiased) terminology.”[52] Apart from an association with the term unbiased, neutral is not defined in “H 204” or anywhere else in the SHM. Online LCSH training, freely available from the Library of Congress website, offers similarly little on the concept of neutrality. “Module 1.4” recommends that catalogers “accept the idea that all knowledge is equal” and “remain neutral … and attempt to be as objective as possible” when describing material.[53]

Despite the lumping together of neutral and unbiased in “H 204,” a neutrality which calls for a static ignoring of social realities and historical context does not equal an unbiased active engagement against prejudice. The Merriam-Webster Dictionary’s definitions of “neutral” and “unbiased” make this clear. “Neutral” as “indifferent” and politically nonaligned echoes OED. But the definition of “unbiased” goes even further, meaning not just free from prejudice and “favoritism” but “eminently fair”[54]—an active and flexible balancing of interests inherently at odds with static and detached neutrality. Eliding the two concepts risks undermining the latter, and with it library ethics and values, resulting in the further entrenchment of Western, colonial, and other biases in LCSH.

The definition of neutrality that LC, and by extension LCSH, seems to favor is one of passivity. Neutrality as indifference to social realities appears, for instance, in LCSH training “Module 1.4.” The module acknowledges that library vocabularies “are culturally fixed” and “from a place; they are from a time; they do reflect a point of view.” However, rather than using that “realiz[ation]” to encourage periodic updating of outdated or potentially prejudicial content in LCSH, the module advises “accepting” that cultural fixity as immutable fact; it recommends that catalogers “remain neutral, suspend disbelief” and focus on (undefined) objectivity instead.[55] Objectivity also appears in “H 180,” which advises catalogers: “Avoid assigning headings that … express personal value judgments regarding topics or materials. … Consider the intent of the author or publisher and, if possible, assign headings … without being judgmental.”[56]

Here, as in “Module 1.4,” objectivity appears linked to neutrality; the implication is that a subject can only be described without bias if a cataloger is dispassionate and has no opinions on the topic. However, not all definitions of objectivity match this interpretation. Although OED defines objectivity as “detachment” and “the ability to consider or represent facts, information, etc., without being influenced by personal feelings or opinions,” Merriam-Webster’s definition is “freedom from bias” and a more actively equitable “lack of favoritism toward one side or another.”[57]

This disparity in meanings prompts the question: What does it mean to describe a topic without judgment or bias? Is objectivity erasing any uncomfortable content in a topic, even if that erasure favors a biased status quo and/or muddies a topic’s meaning? Or, rather, is it objective to label something truthfully, even if the topic raises strong feelings? As demonstrated by the revisions to Blackface discussed above, changes to the scope note and broader term in the name of objectivity did not result in a clearer or less biased heading; instead, they obfuscated the racist intent behind the phenomenon.

Similarly, despite the assertion in “H 180,” a singular focus on authorial intent does not always result in a lack of bias or judgment in subjects. As noted by literary critics such as Wimsatt and Beardsley, “placing excessive emphasis on authorial intention [leads] to fallacies of interpretation,”[58] since readers only have access to the text in front of them; attempting to guess an author’s intent is already an act of judgment, not a discovery of objective facts. Further, if an author writes a prejudicial text, taking its content at face value risks replicating that bias through subject provision. LCSH terms such as Holocaust denial literature recognize and counter this, labeling Holocaust denial works as ones “that diminish the scale and significance of the Holocaust or assert that it did not occur.”[59] If catalogers relied strictly on authorial intent in the name of objectivity, those works would receive misleading subjects such as Holocaust, Jewish (1939-1945) rather than Holocaust denial literature, tacitly legitimizing bias.

Thus, the SHM’s focus on objectivity and neutrality highlights incongruities and tensions within subject guidance and LCSH vocabulary itself between indifference and self-imposed inoffensiveness on the one hand, and actively countering bias and promoting equity on the other. As will be shown below, rejections in the name of neutrality reveal that in fact the proposal process itself has never been neutral or apolitical.[60]

Neutrality and SACO Rejections

LC’s adherence to an inflexible and indifferent definition of neutrality, critiquing proposals that engage with social and political realities as subjective and reliant on value judgments, has led to the rejection of multiple headings that surface prejudice or describe the lives and experiences of marginalized peoples. Instead, rejections upholding neutrality reinforce hegemonic societal attitudes within LCSH.

Neutrality appears in several guises in proposal rejections in “Summaries of Decisions” from 2005 to 2025. The most obvious ones reference “H 204” and “neutral (i.e., unbiased) terminology,” including the 2008 rejection of Water scarcity and the 2024 rejection of White flight (discussed in more depth below).[61] Similar rejections use words such as “judgment” (including Negative campaigning in 2013, and Zombie firms in 2023); “pejorative” (e.g., Dive bars in 2010, and Banana republics in 2015); “vulgar and offensive” (such as Vaginal fisting and Anal fisting in 2010); “subjective” (such as African American successful people in 2009); “viewpoint” (including Jim Crow laws in 2019); and “non-loaded language” (e.g., Incarceration camps in 2024).[62]

Neutrality as non-involvement in political and social realities also appears in the rejection of proposals due to LC’s Policy, Training, and Cooperative Programs Division (PTCP)’s unwillingness to establish certain “patterns” of subject headings (i.e., set precedents for future headings of specific types). Pattern rejections often appear entirely arbitrary; that is, the rejections state merely that PTCP did not wish to begin a pattern, not that a proposal as formulated was missing vital elements, had no warrant, or did not conform to provisions stipulated in the SHM. Despite acknowledging in “Module 1.4” that the wrong subject heading “can make any resource in the collection ‘disappear,’”[63] these rejected patterns render certain topics invisible and unsearchable by library patrons.

Uncreated patterns include critiques of prejudicial attitudes and behaviors, particularly by governmental bodies, such as rejections of Prison torture in 2007 or Religious profiling in law enforcement in 2024.[64] Similarly, patterns that would have highlighted the unearned privilege and/or bigotry of certain groups remain largely unestablished, including Holocaust deniers (2016), Toxic masculinity (2020), and White privilege (rejected in 2011 and 2016, before finally being accepted as White privilege (Social structure) in 2022).[65] The rejection of White fragility in 2020 is particularly interesting, as the rationale was that “LCSH does not include any headings that ascribe an emotion or personality trait to a specific ethnic group or race, and the meeting does not want to begin the practice.”[66] However, LCSH has included since 2010 the heading Post-apartheid depression, meant to convey the mental health and feelings of white Afrikaners. So not all white people’s emotions appear off-limits—just ones that reveal systemic biases. PTCP also declined to create patterns naming discrimination directed at certain groups, such as Police brutality victims in 2014 and Missing and murdered Indigenous women in 2023.[67] In the latter case, the rejection of a term meant to highlight societal neglect of the violence against Indigenous peoples means that their existence and trauma continue to be hidden in library vocabularies and catalogs.

Pattern rejections not only make prejudices invisible in library catalogs, they also underrepresent concepts that celebrate or describe the cultures and experiences of marginalized peoples. Erasures of joy can be as damaging as erasures of struggle. Aronson, Callahan, and O’Brien’s discussion of themes related to people of color in picture books, for instance, could equally apply to messages portrayed in LCSH via what topics it hides or surfaces in library catalogs: a “predominance of Oppression … at the expense of other types of portrayals can send a message that suffering and struggle are definitive of a group’s experience, or even of victimhood.”[68] Instead, marginalized people “deserve to see themselves represented as people who lead full and dynamic lives and who are not fully defined by histories of oppression.”[69] Unaccepted subject headings of this type include African American successful people (2009), Overweight women’s writings (2011), Gay neighborhoods and Lesbian neighborhoods (2012), Gay personals (2018), Afro-pessimism (2021), and Indigenous popular culture (2024).[70] 

Absorbing a proposed critical term into a supposed “positive” equivalent also served to preserve an inoffensive neutrality in LCSH; this is seen in the rejection of Food deserts in 2014:

The concept of food desert has been defined in multiple ways by various governments and organizations, often in ways to suit their specific political agendas … The existing heading Food security is defined as access to safe, sufficient, and nutritious food. The existing heading is used for both the positive and negative (it has a UF [cross-reference for] Food insecurity), and the meeting feels that it adequately covers the concept of a food desert.[71]

Similarly, LC rejected a proposal for Genocide denial in 2017 with the rationale that the “positive” heading—Genocide—was sufficient for patron access: “A heading for a concept in LCSH includes both the positive and negative aspects of that topic. A work about the denial of genocide still discusses the concept of Genocide.”[72] Slum clearance was also rejected in 2007 in favor of the euphemistic and supposedly equivalent Urban renewal.[73]

Sometimes rejections upholding neutrality appeared in the guise of a fear that the term might be misapplied. For instance, although LC acknowledged in its 2019 rejection of Jim Crow laws and Jim Crow (Race relations) that the headings described laws and attitudes promulgated during a specific time period—which could therefore be described in a scope note guiding subject usage—it claimed that “the meeting is also concerned that the heading would be assigned only if the phrase Jim Crow is used in the title.”[74] In other words, the rejection prioritized avoiding possible future confusion over a definable term with ample literary and user warrant. The potential for definitional uncertainty also fueled other rejections, such as Femicide and Secret police in 2010, and Forced assimilation in 2024.[75] To preempt said confusion in all of these cases, LC could have added scope notes defining appropriate usage. Subjects have been remediated in the past when found to be misused, via clarifying scope notes or additional term creation, as with Romance literature (now Romance-language literature) versus Love stories (now Romance fiction).[76] Instead of denying the proposal due to a fear that a term might be misapplied, LC could have worked with the proposers to ensure the heading clearly defined the topic and, if necessary, made a public announcement with additional guidance on how to retrospectively add the term.

Overly limiting definitions of subjects also provided reasoning for neutrality-based proposal rejections. An attempt in 2011 to add the natural language phrase Queer-bashing as a cross-reference under the then-current heading Gays–Violence against, for example, was rejected with the justification that “queer-bashing is not necessarily violent.”[77] Intersexuality–Law and legislation, a heading reflecting ongoing debates about genital surgeries on infants and legally recognized genders, was rejected in 2016 because “The subdivision –Law and legislation free-floats [i.e., can be used] under ‘headings for individual or types of diseases and other medical conditions, including abnormalities, functional disorders, mental disorders, manifestations of disease, and wounds and injuries’ (SHM H 1150).”[78] The medicalizing language of the rejection reinforced the view of intersexuality as a “condition” or “disorder” needing fixing, rather than as the natural human diversity of a group struggling for bodily autonomy and human rights. The rejection of Redlining in 2024 also fits this definitional pattern. Despite acknowledging that Redlining “functioned in many different financial contexts,” LC’s rejection implied that redlining’s definition was too broad, as LC preferred “the specificity of … separate headings.”[79] This continues to fracture the topic into multiple subjects such as Discrimination in financial services, Discrimination in mortgage loans, and Discrimination in credit cards. The rejection also sidestepped notions of governmental complicity in redlining, and whitewashed the topic by making it appear less systemic in nature.

Purported limitations of the vocabulary also served as justification for rejecting proposals and upholding LCSH neutrality. For instance, Butch/femme (Gender identity) was deemed “too narrow and specialized for a general vocabulary such as LCSH” in 2011 (though Butch and femme (Lesbian culture) was later approved in 2012)[80]—this, despite the copious presence of narrow terms in LCSH about other topics, such as Madagascar hissing cockroaches as pets, Photography of albatrosses, Church work with cowgirls and Zariski surfaces. Anal fisting and Vaginal fisting were rejected with the same rationale in 2010 (in addition to the “vulgar and offensive” argument described above).[81] Two rejections utilizing the same reasoning raise the question of whether queer cultures and identities were evaluated using particularly stringent criteria. As one librarian noted in the RADCAT mailing list after the rejection of Butch/femme (Gender identity):

This is especially baffling given that Bears (Gay culture) has been a valid subject heading for years, and both concepts have about the same amount of literary warrant. For those of you keeping track at home, this isn’t the first example of this rejection. During The Great Fisting Debacle of 2010 … the Anal fisting and Vaginal fisting proposals were shot down using the same language. I haven’t seen PSD [the prior name for PTCP] rejecting scientific or technical heading proposals as too specialized, which makes me wonder if it’s only gender & sexuality-related headings that receive this type of scrutiny.[82]

Troublingly, rejections for queer identities have continued since LC resumed processing tentative lists in January 2025, particularly for queer youth proposals. The rejection of Sexual minority high school students, for instance, indicates potential deference to current governmental queerphobia, particularly since the phrase “At this time” prefaces the justification: “At this time, it is not desirable to qualify headings for this age group by gender identity or expression/sexual orientation.” LC’s recommendation that instead “[t]erms from other subject vocabularies such as Homosaurus may be used instead of, or in conjunction with, existing LCSH headings to express the topic” implies that there is no place for queer youth identity headings within LCSH.[83]

Finally, proposals were rejected in favor of maintaining pre-existing biases in LCSH—the cultural fixity mentioned in LCSH training “Module 1.4.”[84] For instance, a 2015 rejection of a change proposal related to Indigenous peoples–South Africa highlighted in its rationale the scope note for Indigenous peoples defining them entirely in relation to colonial power: “Here are entered works on the aboriginal inhabitants either of colonial areas or of modern states where the aboriginal peoples are not in control of the government.”[85] Sometimes, even the longevity of a term within LCSH was treated as sufficient reason to reject proposals meant to update outdated and inequitable terms, as with the 2020 rejection of a proposed change from Juvenile delinquents to Juvenile prisoners: “The existing heading Juvenile delinquents has been used for this concept for many years. At this point, it would be practically impossible to examine the entire file so the new heading could be applied accurately. The heading Juvenile delinquents should be assigned instead.”[86] This hesitance to tackle large projects because of the labor required for bibliographic file maintenance perpetuates the tendentious language present in LCSH and reinforces the view that the proposal process is itself not neutral.

Case Study: White Flight

In 2024, the African American Subject Funnel Project submitted a subject proposal for White flight. The proposal cited Kruse’s book White Flight: Atlanta and the Making of Modern Conservatism to demonstrate literary warrant. It additionally cited three reference sources—Encyclopedia of African-American Politics, The New Encyclopedia of Southern Culture, and Wikipedia—in order to define the term and demonstrate that it is commonly used by scholars and the public.

  • [Proposed Heading]: White flight
  • [Variant Term]: White exodus
  • [Broader Term]: Migration, Internal
  • [Broader Term]: Race relations
  • [Broader Term]: White people–Migrations
  • [Related Term]: Segregation
  • [Source]: Kruse, K.M. White flight, ©2005: summary (In this reappraisal of racial politics in modern America, Kevin Kruse explains the causes and consequences of “white flight” in Atlanta and elsewhere) page 5 (In 1963 alone, there were 52 cases of “racial transition,” incidents in which whites fled from neighborhoods as blacks bought homes there; a steady stream of white flight had been underway for nearly a decade)
  • [Source]: Encyclopedia of African-American politics, 2021 (“White flight” is the term used to refer to the tendency of whites to flee areas and institutions once the percentage of blacks reaches a certain level)
  • [Source]: The new encyclopedia of southern culture, 2010 (The term “white flight” refers to the spatial migration of white city dwellers to the suburbs that took place throughout the United States after World War II. One of the most powerful and transformative social movements of the 20th century, white flight significantly affected the class and racial composition of cities and metropolitan areas and the distribution of a conservative postwar political ideology)
  • [Source]: Wikipedia, 16 Oct. 2023 (White flight or white exodus is the sudden or gradual large-scale migration of white people from areas becoming more racially or ethnoculturally diverse. Starting in the 1950s and 1960s, the terms became popular in the United States; examples in Africa, Europe, and Oceania as well as the United States)

However, LC rejected White flight with the following rationale: “LCSH does not currently have an established pattern that combines the topic of migration with the social reasoning for that migration. The meeting was concerned that introducing such a pattern, particularly in this case, would contradict the practice in LCSH of preferring neutral, unbiased terminology as stated in SHM H 204 sec. 2.”[87]

After this Summary of Decisions was issued, librarians on the SACOLIST mailing list publicly disagreed with the rejection and pointed out the flaws in LC’s argument. One poster highlighted the fact that the term was in common use and searched for by library patrons; they also noted another heading already in LCSH that fit the pattern PTCP claimed didn’t exist:

According to H 204 Section 2, the proposed heading should “reflect the terminology commonly used to refer to the concept,” which I believe is the case with this term. Additionally, the same section of H 204 asks, “Will the proposed revision enhance access to library resources? Would library users find it easier to discover resources of interest to them if the proposed change were to be approved?” Again, if this phrase is commonly used by patrons, it would make sense to add it to our catalogs … You wrote that “LCSH does not currently have an established pattern that combines the topic of migration with the social reasoning for that migration.” Could someone explain why Great Migration, ca. 1914-ca. 1970 doesn’t fit this pattern? Is it because of the date range and that this is a specific event?[88]

Another librarian emphasized the ongoing importance of white flight, the prevalence of literature discussing it, and the unequal treatment of headings describing different groups in LCSH:

The differences between these proposals from my perspective seems to be that one describes African Americans and the other describes White people, and White flight is an ongoing concept rather than a single historical event. I hope PTCP reconsiders this decision, because the effects of White flight and the practices surrounding it shape racial inequality in the United States and in many other countries in the world. Many works describe White flight and its consequences … and users are familiar with the term and want to find works about it.[89]

Finally, a respondent noted yet another term matching the supposedly non-existent pattern: “The existing heading Amenity migration would also appear to provide a pattern combining the topic of migration with the social reasoning for that migration.”[90]

Despite these arguments, LC neither responded to the mailing list discussion nor changed its decision. As White flight had literary warrant, was amply supported by reference sources, and was a concept that could not be accurately conveyed using existing subject headings, why was PTCP concerned about neutrality “particularly in this case”? Even governmental entities as varied as the Supreme Court, the U.S. Commission on Civil Rights, the National Register of Historic Places, and LC itself use the term white flight. The rejection’s insistence on the need for uninvolved neutrality therefore seemed inconsistent with the widespread acceptance of the term.

Instead, the neutrality justification appears to be a smokescreen to cover up discomfort with a term that called out white racism; mandating neutrality in this case meant privileging being inoffensive to white people over acknowledging a widely accepted critique of systemic racism. Patton notes in her Substack post “White People Hate Being Called ‘White People’” that whiteness functions in part by invisibility, a “retreat into universalism where whiteness can dissolve back into ‘humanity’ and avoid accountability.”[91] Rejecting the proposal may have been a neutral decision (i.e., deliberately unobjectionable and indifferent to political and social realities), but it was certainly not unbiased (i.e., free from favoritism). Instead, it conceptually reinforced the false position of whiteness described by Patton as “the default, neutral, objective, and moral”[92]—thus undermining equity in LCSH and making works on this important topic invisible and unsearchable in library catalogs.

Discussion

Chiu, Ettarh, and Ferretti describe the futility of relying on neutrality to further social justice within librarianship and its vocabularies:

When the profession discusses neutrality, we believe that the profession actually seeks equity. However, neutrality will not yield equitable results and will always fall short because it relies on equity already existing in society. This is not the condition of our current society, nor is it true for the profession. Therefore, neutrality will actually work toward reinforcing bias and racism.[93]

The rejection of White flight illustrates this point aptly. Justifying the rejection by invoking neutrality means, in practice, that being neutral equates to whitewashing an ongoing phenomenon: pretending that the movement of white people in the United States is entirely benign, divorced from racism, and not worth library or library user attention. What are the long-term consequences of privileging neutrality, as opposed to equity, in the subject approval process? Neutrality as political isolationism and mandated inoffensiveness leads, as seen in the rejections from 2005 through 2024, to suppressing political and social critiques, hiding prejudice, and rendering the lived experiences of marginalized groups invisible.

It is unfortunately far too easy to weaponize a neutrality that, when evaluating proposals, gives equal weight to the intentions of groups such as racists and antisemites. A SHM instruction created in late 2024, “H 1922,” further embeds this weaponization within subject guidance. “H 1922” defines “offensive words” as “derogatory terms that insult, disparage, offend, or denigrate people according to their race, ethnicity, nationality, religion, gender identity, sexuality, occupation, social views, political views, etc.”[94] By including political and social views in the definition, LC inaccurately equates groups espousing opinions about how people should behave in society with demographic groups who have historically been marginalized merely for existing. This leaves LCSH vulnerable to political actors disingenuously claiming “offense” to silence critiques or establish prejudicial terms within the vocabulary. A recent example of this was the proposal to change Trans-exclusionary radical feminism into Gender-critical feminism, the obfuscatory label preferred by the transphobic group, by claiming that trans-exclusionary radical feminism was a slur.[95] (LC ultimately rejected the proposal, thanks in large part to “community activism” and mobilization opposing the change.[96] LC specifically mentioned library community input as the rationale for the rejection: “When this tentative list was published in November 2024, PTCP received over 300 email comments demanding rejection of this proposal.”[97])

There is ample evidence from the recent past and present of this weaponization of offense being used to undermine progress toward equity in the United States. The Trump administration’s proposed Compact for Academic Excellence in Higher Education (2025) exemplifies the dangers of privileging neutrality over equity. The Compact demands “institutional neutrality,” requiring that universities and their employees “abstain from actions or speech relating to societal and political events except in cases in which external events have a direct impact upon the university.” Those agreeing to this isolationist neutrality, in the meantime, would also agree to erase trans, non-binary, and intersex students, faculty, and staff, and to police and punish speech deemed offensive to conservatives. Notably, the Compact requires that admissions be based on “objective” criteria—except for explicitly-allowed faith, “sex-based,” and anti-immigrant biases.[98]

Mandated neutrality within “H 204” risks reifying the same prejudices within library vocabularies. This can be seen in LC’s recent alteration of Mexico, Gulf of to America, Gulf of, and Denali, Mount (Alaska) to McKinley, Mount (Alaska).[99] Critical cataloger Berman describes the former change as “linguistic imperialism,” and the latter as an “affront to Alaska’s indigenous population.”[100] The latter change is particularly damaging, given the simultaneous effort by LC to remediate LCSH related to Indigenous peoples, and might undermine confidence in the project. In both cases, a neutral approach—remaining uninvolved in political and social events—led to an undue “deference to chauvinistic, ethnocentric, and unjustified authority.”[101] Whether LC could realistically have resisted altering these headings is a counterfactual question. Its actions must be judged by the effects of these revisions within library catalogs and for library patrons. By clinging to the illusion of neutrality, and capitulating to the whims of a racist and colonialist regime, LC undermined the profession’s stated values and harmed the larger library community.

Recommendations

What philosophical approach can LC take in lieu of neutrality, to bring the SACO process more in concert with library ideals of equity and egalitarianism? We recommend that LC employ a values-driven approach to vocabulary construction and maintenance. Explicitly stated library values—particularly around social justice and social responsibility—benefit all users, both marginalized peoples and the “mainstream.” Further, the PCC Policy Committee, of which LC is a permanent member, has already committed to the PCC Guiding Principles for Metadata, which acknowledge that “the standards and controlled vocabularies we use and their application are biased,” and advocates for “incorporating DEI principles in all aspects of cataloging work.”[102] Below, we suggest a number of changes LC could enact to make LCSH and the proposal process more equitable.

In backing away from neutrality as a guiding principle, philosophical approaches that have been suggested in critiques of traditional practice deserve consideration. In her chapter in Questioning Library Neutrality, Iverson proposes that librarians adopt feminist philosopher Haraway’s approach to objectivity: “Haraway explains that what we have accepted as ‘objectivity’ claims to be a vision of the world from everywhere at once … We can not see from all perspectives at once, we each have our own particular views that are shaped by our own identities, cultures, experiences, and locations.”[103] Instead of claiming to possess “infinite vision,” Iverson recommends that we adopt Haraway’s recognition of “situated knowledge.”[104]

Watson argues that instead of literary or bibliographic warrant (cataloging a book in hand, asking what subject headings are needed to convey its content), critical catalogers “operate from a position of catalogic warrant, reading the terms and hierarchies of cataloging and classification systems with a critical eye, reflecting on the potential benefit or harm of each term on marginalized users, groups, or the GLAMS [galleries, libraries, archives, and museums] community as a whole.”[105] In other words, librarians should focus on the subject heading system in its entirety, asking what revisions and additions are needed. In some ways, by collaborating with SACO funnels on large-scale projects to create and revise related groups of subject headings, LC has already moved away from strict adherence to an interpretation of literary warrant under which the only valid reason to propose a subject heading is having a book in hand that requires it. This shift should be continued and expanded.

As for concrete actions, we advise that LC restore its open monthly subject editorial meetings, where proposals are discussed, and expand points of communication with external libraries. This would allow a more diverse range of librarians to participate in the SACO process and provide valuable input during decision-making. Other benefits of monthly meetings have been noted by SACO librarians in an open letter to PTCP: they helped to demystify “the SACO process” for the newly involved, and allowed librarians to contribute to “lively conversations on a broad range of options, and the opportunity to shape the vocabularies we all use, from proposing single headings to creating special lists to debating new guidelines for topical subdivisions.”[106]

Building on this, we suggest creating an external advisory group for LCSH, similar to the ones for LCDGT and LCGFT, to get input from a broader range of users on proposal vetting and vocabulary maintenance. Further, we urge LC to allow greater decision-making power for external librarians in all advisory groups. This would help LC vocabularies better reflect the resources in the Library of Congress collections and the needs of thousands of libraries of different types around the world, and improve accountability for decisions made regarding proposals. It would also help to better insulate library vocabularies from the governmental interference noted above, by making a broad range of institutions responsible for their creation and maintenance.

Within such bodies, we recommend that LC follow guidance from the SAC Working Group on External Review of LC Vocabularies, by including members from groups being described in those vocabularies, subject matter experts, and international representatives. Furthermore, membership should not include “[r]epresentatives from groups or organizations that purport to speak for marginalized communities, but who exclude the voices of members of the marginalized community,” or “[r]esearchers or representatives from groups or organizations where the experts cause harm to members of marginalized communities.”[107] The inclusion of representative groups aligns with the PCC Guiding Principles for Metadata and follows the principles put forth in the Cataloguing Code of Ethics.

In vetting SACO proposals, “LC should prioritize sources from the peoples and communities described, privileging those sources over traditionally ‘authoritative’ sources, including literary warrant,” to ensure that the terminology used “reflect[s] a more inclusive and culturally relevant understanding of the language associated with these groups and their heritage and history.”[108] The creation of a position within LC focused on remediating metadata related to Indigenous peoples was a good first step in this direction, and we strongly encourage LC to both continue and expand this practice.

Finally, we suggest revisions to various LC documents and SHM instruction sheets. References to neutrality should be removed from “H 204” and “Module 1.4,” in favor of a focus on active equity in subject assignment and proposals. Examples of unbiased terminology, created in concert with advisory groups described above, reflecting a variety of situations, and periodically updated, would help create a shared understanding between librarians proposing headings and those evaluating them for inclusion in LCSH. “H 180” and “Module 1.4” should also be edited, in the sections advising catalogers to remain objective and not “express personal value judgments.”[109] All cataloging relies on judgment, and judgment is not always synonymous with bias or divorced from facts. A more useful focus here, as in a revised “H 204,” would be on the active equity present in Merriam-Webster’s definition of objectivity; catalogers should employ “catalogic warrant” and evaluate the “potential benefit or harm”[110] of subjects, particularly when assigning headings to prejudicial works. Additionally, to protect against weaponized “offense,” we recommend that “social views” and “political views” be removed from “H 1922.” These alterations would bring the SHM and LCSH training more in line with LCDGT guidance, which foregrounds cataloging ethics. “L 400,” for instance, notes that “naming demographic groups and identifying individuals as members of those groups must be done with accuracy and respect,” and highlights the importance of self-identification when assigning headings.[111]

We cannot make recommendations on this topic without addressing the current political climate. Because LC’s catalog migration put most SACO work on hold during 2025,[112] the effect of the Trump administration’s anti-DEI policies on LCSH remains uncertain. However, United States history is rife with periods of political repression. Waiting until relative calm to advocate for equity has not been, historically, how equity was advanced; and it will not serve library patrons or the broader community in the present moment.

Conclusion

LCSH began over a century ago as a subject cataloging tool for the Library of Congress, and has since evolved into a vocabulary serving thousands of libraries around the world. Despite the broad and diverse user base, LC has remained the sole arbiter of which proposals are accepted into LCSH and what form the headings take. During the last two decades it has rejected a number of subject proposals due to a preference for purported neutrality and objectivity, in various guises. Yet, as a profession, librarianship claims to prioritize social responsibility. Social justice and equity are incompatible with an indifferent and purposefully inoffensive neutrality that allows harmful, colonialist, and racist headings in LCSH, and keeps out headings describing prejudice, or about the lived experiences of marginalized peoples.

Olson describes LCSH as “a Third Space between documents being represented and users retrieving them,” since “LCSH constructs the meanings of documents for users.”[113] These meanings impact how users view materials, and whether they can locate them in library catalogs. And it is within this space that LC’s commitment to neutrality fails both users and the ideals of librarianship around social responsibility. However, “because the Third Space is one of ambivalence, it is one with potential for change.”[114] By focusing on library values rather than neutrality within the subject creation and approval process, LCSH could develop into a vocabulary that constructs truly equitable and inclusive meanings for users and librarians alike.

Acknowledgements

Thank you to our publishing editor, Jess Schomberg, and the editorial board for their flexibility, guidance, and expertise throughout the publication process. Thank you to K.R. Roberto, Margaret Breidenbaugh, Crystal Yragui, and Matthew Haugen, who allowed us to quote them within this article. We would also like to thank our reviewers, Jamie Carlstone and Ian Beilin, and other readers who gave valuable feedback: Adam Schiff, Rebecca Albitz, Chereeka Garner, Rebecca Nowicki, Naomi Reeve, Simone Clunie, Violet Fox, and Stephanie Willen Brown.


[1] Robert Jensen. “The Myth of the Neutral Professional,” in Questioning Library Neutrality, ed. Alison Lewis (Library Juice Press, 2008), 91.

[2] Library of Congress, “H 204: Evaluating Subject Proposals,” in Library of Congress Subject Headings Manual, Aug. 2025 rev. (Library of Congress, 2025), 2, https://www.loc.gov/aba/publications/FreeSHM/H0204.pdf (original: https://web.archive.org/web/20180524054119/https://www.loc.gov/aba/publications/FreeSHM/H0204.pdf).

[3] Throughout this article, authorized subject headings (i.e., those that currently exist in LCSH) are presented in bold font, while rejected proposed headings appear in italics. For consistency, subject headings within quotations follow the same formatting, regardless of the formatting used in the original quotation.

[4] Library of Congress, “Summary of Decisions, Editorial Meeting Number 10” (Library of Congress, 2013), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-131021.html; Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 02 (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2402.pdf.

[5] Post-coordination is the practice of using multiple, separate LCSH terms in combination to convey a single concept.

[6] Library of Congress, “Summary of Decisions, Editorial Meeting Number 4” (Library of Congress, 2008), https://www.loc.gov/aba/pcc/saco/cpsoed/cpsoed-080123.html.

[7] See the manuals for Genre/Form Terms, Demographic Group Terms, and Children’s Subject Headings, for instance.

[8] Gina Schlesselman-Tarango, “How Cute!: Race, Gender, and Neutrality in Libraries,” Partnership: The Canadian Journal of Library and Information Practice and Research 12, no. 1 (Aug. 2017): 10, https://doi.org/10.21083/partnership.v12i1.3850.

[9] Maura Seale, “Compliant Trust: The Public Good and Democracy in the ALA’s ‘Core Values of Librarianship,’” Library Trends 64, no. 3 (2016): 589, https://doi.org/10.1353/lib.2016.0003.

[10] American Library Association Working Group on Intellectual Freedom and Social Justice, “Final Report from the Intellectual Freedom and Social Justice Working Group” (EBD #10.0, American Library Association, 2022), 10, https://www.ala.org/sites/default/files/aboutala/content/governance/ExecutiveBoard/20222023Docs/ebd%2010.0%20IF_SJ%20Final%20Report%207.12.2022.pdf.

[11] International Federation of Library Associations and Institutions, “IFLA Code of Ethics for Librarians and other Information Workers,” 4, https://www.ifla.org/wp-content/uploads/2019/05/assets/faife/publications/IFLA%20Code%20of%20Ethics%20-%20Long_0.pdf.

[12] National Information Standards Organization, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, ANSI/NISO Z39.19-2005 (R2010) (National Information Standards Organization, 2010), 30, https://groups.niso.org/higherlogic/ws/public/download/12591/z39-19-2005r2010.pdf.

[13] National Information Standards Organization, Guidelines, 44.

[14] Oxford English Dictionary, “Neutral,” https://www.oed.com/dictionary/neutral_n?tab=meaning_and_use#34680278 and “Unbiased,” https://www.oed.com/dictionary/unbiased_adj?tab=meaning_and_use#17025200.

[15] Dani Scott and Laura Saunders, “Neutrality in Public Libraries: How Are We Defining One of Our Core Values?,” Journal of Librarianship and Information Science 53, no. 1 (2020): 153, https://doi.org/10.1177/0961000620935501.

[16] Scott and Saunders, “Neutrality in Public Libraries,” 158.

[17] “Are Libraries Neutral? Highlights from the Midwinter President’s Program,” American Libraries, June 1, 2018, https://americanlibrariesmagazine.org/2018/06/01/are-libraries-neutral/.

[18] Michael Dudley, “Library Neutrality and Pluralism: A Manifesto,” Heterodoxy in the Stacks, Aug. 8, 2023, https://hxlibraries.substack.com/p/library-neutrality-and-pluralism.

[19] Mark Rosenzweig. “Politics and Anti-Politics in Librarianship,” in Questioning Library Neutrality, ed. Alison Lewis (Library Juice Press, 2008), 5-6.

[20] Stephen Macdonald and Briony Birdi, “The Concept of Neutrality: A New Approach,” Journal of Documentation 76, no. 1 (2020): 333–353. https://doi.org/10.1108/JD-05-2019-0102.

[21] Jaeger-McEnroe, “Conflicts of Neutrality,” 3.

[22] Jaeger-McEnroe, “Conflicts of Neutrality,” 6.

[23] Jaeger-McEnroe, “Conflicts of Neutrality,” 9.

[24] Steve Joyce, “A Few Gates Redux: An Examination of the Social Responsibilities Debate in the Early 1970s and 1990s,” in Questioning Library Neutrality, ed. Alison Lewis (Library Juice Press, 2008), 33-65.

[25] “ALA Code of Ethics,” American Library Association, updated June 29, 2021, https://www.ala.org/tools/ethics.

[26] “Resolution to Condemn White Supremacy and Fascism as Antithetical to Library Work,” American Library Association, Jan. 25, 2021, https://tinyurl.com/yr4z9e8x.

[27] Scott and Saunders, “Neutrality in Public Libraries,” 153.

[28] “Are Libraries Neutral?”

[29] Canadian Federation of Library Associations / Fédération canadienne des associations de bibliothèques, “CFLA-FCAB Code of Ethics,” updated Aug. 27, 2018, https://cfla-fcab.ca/wp-content/uploads/2019/06/Code-of-ethics.pdf.

[30] Jaeger-McEnroe, “Conflicts of Neutrality,” 5.

[31] Jaeger-McEnroe, “Conflicts of Neutrality,” 5, 6.

[32] Anita Brooks Kirkland, “Library Neutrality as Radical Practice,” Synergy 19, no. 2 (Sept. 2021), https://www.slav.vic.edu.au/index.php/Synergy/article/view/536.

[33] Nicole Pagowsky and Niamh Wallace, “Black Lives Matter!: Shedding Library Neutrality Rhetoric for Social Justice,” College & Research Libraries News 76, no. 4 (2015): 198, https://crln.acrl.org/index.php/crlnews/article/view/9293/10374.

[34] Cataloging Ethics Steering Committee, “Cataloguing Code of Ethics,” January 2021, http://hdl.handle.net/11213/16716.

[35] Subject Analysis Committee Working Group on the LCSH “Illegal aliens,” “Report from the SAC Working Group on the LCSH ‘Illegal aliens,'” July 13, 2016, https://alair.ala.org/handle/11213/9261.

[36] Jill E. Baron, Violet B. Fox, and Tina Gross, “Did Libraries ‘Change the Subject’? What Happened, What Didn’t, and What’s Ahead,” in Inclusive Cataloging: Histories, Context, and Reparative Approaches, eds. Billey Albina, Rebecca Uhl, and Elizabeth Nelson (ALA Editions, 2024), 53; Library of Congress, “Library of Congress Subject Headings Approved Monthly List 11 (November 12, 2021)” (Library of Congress, 2021), https://classweb.org/approved-subjects/2111b.html.

[37] Baron et al., “Did Libraries ‘Change the Subject?,’” 54.

[38] Michelle Cronquist and Staci Ross, “Black Subject Headings in LCSH: Successes and Challenges of the African American Subject Funnel Project,” Reference and User Services Association, July 7, 2021, virtual, https://d-scholarship.pitt.edu/41826.

[39] Cronquist and Ross, “Black Subject Headings in LCSH.”

[40] Cronquist and Ross, “Black Subject Headings in LCSH.”

[41] Library of Congress, “Library of Congress Subject Headings Approved Monthly List 06 (June 18, 2021)” (Library of Congress, 2021), https://classweb.org/approved-subjects/2106.html. Note that the headings for Japanese Americans, Japanese Canadians, and Aleuts were originally submitted as –Forced removal and incarceration, matching preferred usage, but LC changed them all to –Forced removal and internment.

[42] Library of Congress, “Library of Congress Subject Headings Approved Monthly List 08 (August 12, 2022)” (Library of Congress, 2022), https://classweb.org/approved-subjects/2208.html; Library of Congress, “Library of Congress Subject Headings Approved Monthly List 08 LCSH 2 (August 18, 2023)” (Library of Congress, 2023), https://classweb.org/approved-subjects/2308a.html; Library of Congress, “Library of Congress Subject Headings Approved Monthly List 04 (Apr. 21, 2023)” (Library of Congress, 2023), https://classweb.org/approved-subjects/2304.html; Library of Congress, “Library of Congress Subject Headings Approved Monthly List 03 LCSH 2 (March 15, 2024)” (Library of Congress, 2024), https://classweb.org/approved-subjects/2403a.html.

[43] For more information about Congressional actions related to the attempt to change Illegal aliens, see: SAC Working Group on Alternatives to LCSH “Illegal aliens,” “Report of the SAC Working Group on Alternatives to LCSH ‘Illegal aliens’” (American Library Association, 2020), http://hdl.handle.net/11213/14582.

[44] Tina Gross, “Search Terms up for Debate: The Politics and Purpose of Library Subject Headings,” Perspectives on History 60, no. 3 (2022), https://www.historians.org/perspectives-article/search-terms-up-for-debate-the-politics-and-purpose-of-library-subject-headings-march-2022/.

[45] Michael Colby, “SACO: Past, Present, and Future,” Cataloging & Classification Quarterly 58, no. 3-4 (2020): 287, https://doi.org/10.1080/01639374.2019.1706679.

[46] Library of Congress Subject Headings Manual, Aug. 2025 rev. (Library of Congress, 2025), https://www.loc.gov/aba/publications/FreeSHM/freeshm.html.

[47] Library of Congress, “Module 1.5: Introduction to LCSH,” in Library of Congress Subject Headings: Online Training (Library of Congress, 2016), 8, https://www.loc.gov/catworkshop/lcsh/PDF%20scripts/1-5%20Intro%20To%20LCSH.pdf.

[48] Rich Gazan, “Cataloging for the 21st Century Course 3: Controlled Vocabulary & Thesaurus Design Trainee’s Manual,” in Library of Congress Cataloger’s Learning Workshop (Library of Congress, n.d.), 2-2, https://www.loc.gov/catworkshop/courses/thesaurus/pdf/cont-vocab-thes-trnee-manual.pdf.

[49] Library of Congress, “H 204,” 3.

[50] Library of Congress, “H 180: Assigning and Constructing Subject Headings,” in Library of Congress Subject Headings Manual, Feb. 2016 rev. (Library of Congress, 2016), 8, https://www.loc.gov/aba/publications/FreeSHM/H0180.pdf.

[51] Library of Congress, “Module 1.2: Why Do We Use Controlled Vocabulary?,” in Library of Congress Subject Headings: Online Training (Library of Congress, 2016), 7, https://www.loc.gov/catworkshop/lcsh/PDF%20scripts/1-2-WhyCV.pdf.

[52] Library of Congress, “H 204,” 2.

[53] Library of Congress, “Module 1.4: How Do We Determine Aboutness?,” in Library of Congress Subject Headings: Online Training (Library of Congress, 2016), 3, https://www.loc.gov/catworkshop/lcsh/PDF%20scripts/1-4-Aboutness.pdf.

[54] Merriam-Webster Dictionary, “Neutral,” https://www.merriam-webster.com/dictionary/neutral and “Unbiased,” https://www.merriam-webster.com/dictionary/unbiased.

[55] Library of Congress, “Module 1.4,” 3.

[56] Library of Congress, “H 180: Assigning and Constructing Subject Headings,” in Library of Congress Subject Headings Manual, Feb. 2016 rev. (Library of Congress, 2016), 7, https://www.loc.gov/aba/publications/FreeSHM/H0180.pdf.

[57] Oxford English Dictionary, “Objectivity,” https://www.oed.com/dictionary/objectivity_n?tab=meaning_and_use#34080200; Merriam-Webster Dictionary, “Objectivity,” https://www.merriam-webster.com/dictionary/objectivity.

[58] Michael R. Griffiths, “Roland Barthes Declared the ‘Death of the Author’, but Postcolonial Critics have Begged to Differ,” The Conversation, July 2, 2025, https://theconversation.com/roland-barthes-declared-the-death-of-the-author-but-postcolonial-critics-have-begged-to-differ-256093.

[59] Library of Congress Subject Headings, “Holocaust denial literature,” https://lccn.loc.gov/sh96009503.

[60] Anastasia Chiu, Fobazi M. Ettarh, and Jennifer A. Ferretti, “Not the Shark, but the Water: How Neutrality and Vocational Awe Intertwine to Uphold White Supremacy,” in Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, eds. Sofia Y. Leung, Jorge R. López-McKnight (MIT Press, 2021), 65.

[61] Library of Congress, “Editorial Meeting Number 4,” 2008; Library of Congress, “LCSH/LCC Editorial Meeting Number 02 (2024).”

[62] Library of Congress, “Summary of Decisions, Editorial Meeting Number 10” (Library of Congress, 2013), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-131021.html; Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 05 (2023)” (Library of Congress, 2023), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2305.pdf; Library of Congress, “Summary of Decisions, Editorial Meeting Number 46” (Library of Congress, 2010), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-101117.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 4” (Library of Congress, 2015), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-150420.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 27” (Library of Congress, 2010), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-100707.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 36” (Library of Congress, 2009), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-090909.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 1911” (Library of Congress, 2019), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-191118.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 2111” (Library of Congress, 2021), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-211115.html.

[63] Library of Congress, “Module 1.4,” 3.

[64] Library of Congress, “Summary of Decisions, Editorial Meeting Number 46” (Library of Congress, 2007), https://www.loc.gov/aba/pcc/saco/cpsoed/cpsoed-071114.html; Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 6 (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2406.pdf.

[65] Library of Congress, “Summary of Decisions, Editorial Meeting Number 04” (Library of Congress, 2016), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-160418.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 2006” (Library of Congress, 2020), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-200615.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 23” (Library of Congress, 2011), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-110815.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 10” (Library of Congress, 2016), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-161017.html; Library of Congress, “Library of Congress Subject Headings Approved Monthly List 06 (June 17, 2022)” (Library of Congress, 2022), https://classweb.org/approved-subjects/2206.html.

[66] Library of Congress, “Summary of Decisions, Editorial Meeting Number 2006” (Library of Congress, 2020), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-200615.html.

[67] Library of Congress, “Summary of Decisions, Editorial Meeting Number 10” (Library of Congress, 2014), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-141020.html; Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 07 (2023)” (Library of Congress, 2023), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2307.pdf.

[68] Krista Maywalt Aronson, Brenna D. Callahan, and Anne Sibley O’Brien, “Messages Matter: Investigating the Thematic Content of Picture Books Portraying Underrepresented Racial and Cultural Groups,” Sociological Forum 33, no. 1 (2018): 179, http://www.jstor.org/stable/26625904.

[69] Lisely Laboy, Rachael Elrod, Krista Aronson, and Brittany Kester, “Room for Improvement: Picture Books Featuring BIPOC Characters, 2015–2020,” Publishing Research Quarterly 39 (2023): 58, https://doi.org/10.1007/s12109-022-09929-7.

[70] Library of Congress, “Editorial Meeting Number 36,” 2009; Library of Congress, “Summary of Decisions, Editorial Meeting Number 21” (Library of Congress, 2011), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-110620.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 02” (Library of Congress, 2012), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-120221.html; Library of Congress, “Summary of Decisions, Editorial Meeting Number 06” (Library of Congress, 2018), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-180618.html; Library of Congress, “Editorial Meeting Number 2111,” 2021; Library of Congress, “Summary of Decisions, LCSH List Number 11c (2024) and LCC List Number 10 & 11 (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2412g.pdf.

[71] Library of Congress, “Summary of Decisions, Editorial Meeting Number 07” (Library of Congress, 2014), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-140721.html.

[72] Library of Congress, “Summary of Decisions, Editorial Meeting Number 09” (Library of Congress, 2017), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-170918.html. LC did establish a new heading for Denialism at that time; however, per the rejection, “To bring out the denialism aspect of events or topics, the heading may be post-coordinated with headings for the events or topics. The existing subject headings Holocaust denial and Holodomor denial, which are related to specific events, were added by exception as narrower terms of the new heading Denialism. Additional narrower terms will not be added to Denialism.”

[73] Library of Congress, “Summary of Decisions, Editorial Meeting Number 23” (Library of Congress, 2007), https://www.loc.gov/aba/pcc/saco/cpsoed/cpsoed-070606.html.

[74] Library of Congress, “Editorial Meeting Number 1911,” 2019.

[75] Library of Congress, “Summary of Decisions, Editorial Meeting Number 49” (Library of Congress, 2010), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-101208.html; Library of Congress, “Summary of Decisions, LCSH/LCC Quarterly Editorial Meeting List 2409” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2409.pdf.

[76] Library of Congress, “Summary of Decisions, Editorial Meeting Number 5” (Library of Congress, 2015), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-150518.html.

[77] The heading is now Gay people–Violence against. Library of Congress, “Summary of Decisions, Editorial Meeting Number 27” (Library of Congress, 2011), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-111219.html.

[78] Library of Congress, “Editorial Meeting Number 04,” 2016.

[79] Library of Congress, “Summary of Decisions, LCSH Number 11 and LCC Number 11b (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2411.pdf.

[80] Library of Congress, “Editorial Meeting Number 27,” 2011; Library of Congress, “Library of Congress Subject Headings Monthly List 12 LCSH (December 17, 2012)” (Library of Congress, 2012), https://classweb.org/approved-subjects/1212.html.

[81] Library of Congress, “Editorial Meeting Number 27,” 2010.

[82] K.R. Roberto, “LCSH Proposals: Is this a Trend?” Jan. 17, 2012, RADCAT mailing list archives.

[83] Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 12 (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2412.pdf.

[84] Library of Congress, “Module 1.4,” 3.

[85] Library of Congress, “Summary of Decisions, Editorial Meeting Number 12” (Library of Congress, 2015), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-151212.html. A 2016 rejection of Dadaist literature, Romanian (French) also highlighted colonialist content in LCSH, noting that “Headings for national literatures qualified by language are generally established for the language(s) of the colonial power that used to control the territory.” See: Library of Congress, “Editorial Meeting Number 04,” 2016.

[86] Library of Congress, “Summary of Decisions, Editorial Meeting Number 2003” (Library of Congress, 2020), https://www.loc.gov/aba/pcc/saco/cpsoed/psd-200316.html.

[87] Library of Congress, “Editorial Meeting Number 02 (2024).”

[88] Margaret Breidenbaugh, “Re: Summary of Decisions, Editorial Meeting Number 02, February 16, 2024,” SACOLIST Mailing List Archives, Library of Congress, May 29, 2024, https://listserv.loc.gov/cgi-bin/wa?A2=SACOLIST;eb3d8761.2405&S=.

[89] Crystal Yragui, “Re: Summary of Decisions, Editorial Meeting Number 02, February 16, 2024,” SACOLIST Mailing List Archives, Library of Congress, May 30, 2024, https://listserv.loc.gov/cgi-bin/wa?A2=2405&L=SACOLIST&D=0&P=1800917.

[90] Matthew Haugen, “Re: Summary of Decisions, Editorial Meeting Number 02, February 16, 2024,” SACOLIST Mailing List Archives, Library of Congress, May 29, 2024, https://listserv.loc.gov/cgi-bin/wa?A2=2405&L=SACOLIST&D=0&P=1796174.

[91] Stacey Patton, “White People Hate Being Called ‘White People,’” Substack, Oct. 23, 2025, https://drstaceypatton1865.substack.com/p/white-people-hate-being-called-white.

[92] Stacey Patton, “White People.”

[93] Chiu, Ettarh, and Ferretti, “Not the Shark,” 56-57.

[94] Library of Congress, “H 1922: Offensive Words,” in Library of Congress Subject Headings Manual, Sep. 2024 (Library of Congress, 2024), 2, https://www.loc.gov/aba/publications/FreeSHM/H1922.pdf.

[95] Library of Congress, “Tentative Monthly List 12 LCSH (December 20, 2024)” (Library of Congress, 2024), https://classweb.org/tentative-subjects/2412.html.

[96] Brinna Michael, “LCSH, Transparency, and the Impact of Collective Action,” TCB: Technical Services in Religion & Theology 33, no. 2 (2025): 1, https://doi.org/10.31046/h01fq272.

[97] Library of Congress, “Summary of Decisions, LCSH/LCC Editorial Meeting Number 12 (2024)” (Library of Congress, 2024), https://www.loc.gov/aba/pcc/saco/cpsoed/ptcp-2412.pdf.

[98] U.S. Department of Education, Compact for Academic Excellence in Higher Education (Draft Memorandum, Oct. 2025), 4, 5, 2, 1, 9, https://www.washingtonexaminer.com/wp-content/uploads/2025/10/Compact-for-Academic-Excellence-in-Higher-Education-10.1.pdf.

[99] Library of Congress, “Library of Congress Subject Headings Approved Monthly List 12 LCSH 2” (Library of Congress, 2025), https://classweb.org/approved-subjects/2412a.html. For more information, including the fast-tracked nature of the changes, see Violet Fox, “Anticipatory Obedience at the Library of Congress,” ACRLog (blog), Mar. 28, 2025, https://acrlog.org/2025/03/28/anticipatory-obedience-at-the-library-of-congress/.

[100] Sanford Berman, “ALA at 150: An Interview with (and by) Sanford Berman,” by Jenna Freedman, Lower East Side Librarian, Nov. 30, 2025, https://lowereastsidelibrarian.info/interviews/sandy-2025.

[101] Berman, “ALA at 150.”

[102] Program for Cooperative Cataloging, “Program for Cooperative Cataloging Guiding Principles for Diversity, Equity, and Inclusion for Metadata Creation,” approved Jan. 19, 2023, https://www.loc.gov/aba/pcc/resources/DEI-guiding-principles-for-metadata-creation.pdf.

[103] Sandy Iverson, “Librarianship and Resistance,” in Questioning Library Neutrality, ed. Alison Lewis (Library Juice Press, 2008), 26.

[104] Iverson, “Librarianship and Resistance,” 26.

[105] B. M. Watson, “Expanding the Margins in the History of Sexuality & Galleries, Libraries, Archives, Museums & Special Collections (GLAMS),” PhD diss. (University of British Columbia, 2025), 270.

[106] Violet Fox, et al. to Policy, Training and Cooperative Programs Division, Library of Congress, June 30, 2024, “Editorial Meetings Decision,” https://cataloginglab.org/editorial-meetings-decision/.

[107] Subject Analysis Committee Working Group on External Review of LC Vocabularies, Report of the SAC Working Group on External Review of Library of Congress Vocabularies, February 2023, 8-9, https://alair.ala.org/handle/11213/20012.

[108] Working Group on External Review of LC Vocabularies, “Report,” 8.

[109] Library of Congress, “H 180,” 7.

[110] Watson, “Expanding the Margins,” 270.

[111] Library of Congress, “L 400: Ethics and Demographic Group Terms,” in Library of Congress Demographic Group Terms Manual, Mar. 2025 (Library of Congress, 2025), 1, https://www.loc.gov/aba/publications/FreeLCDGT/L400.pdf.

[112] Cataloging Policy and Standards, “Announcement from the Library of Congress (April 7, 2025),” SACOLIST Mailing List Archives, Library of Congress, April 7, 2025, https://listserv.loc.gov/cgi-bin/wa?A2=SACOLIST;61e18f28.2504&S=.

[113] Hope Olson, “Difference, Culture and Change: The Untapped Potential of LCSH,” Cataloging & Classification Quarterly 29, no. 1–2 (2000): 54, https://doi.org/10.1300/J104v29n01_04.

[114] Olson, “Difference, Culture and Change,” 66.

Strike time, collective action, and moral conviction in library leadership / Meredith Farkas

I’m on strike right now, along with thousands of other faculty, academic professionals, and staff at Portland Community College (that’s two unions, friends!). It’s a weird feeling. I never thought I’d be in this position. PCC was the first place I worked where I really felt like the values of the College matched my own. I work with insanely dedicated and caring library workers, faculty, and staff. They believe unwaveringly in what they do and constantly go above and beyond for students. After being here for a few years, I knew this was the place I wanted to work for the rest of my career. Even as administration became worse – more corporatized, more performative, less accessible, more likely to listen to outside consultants than the people who directly work with students – I still never considered leaving because the folks I work with regularly are awesome and I love our students. 

As a scholar of time, I’m always interested in different forms of time (queer time, crip time, etc.). Strike time feels really strange. We were talking this morning on the picket line about how it feels a lot like early COVID, when time moved very differently. We feel like the days are both way too long and super short, with not enough time to get everything done but also too much time just staring at different union social channels. We’re totally energized and totally exhausted (I’m lying on the couch like a ragdoll right now after three hours of holding signs, screaming, dancing, marching, and chanting with hundreds of colleagues). In terms of information, we feel like we’re both drinking from a firehose and like we don’t have any of the information we need. We have no idea what the near-term future will bring. What day of the week it is feels almost arbitrary because none of the usual markers of those days apply (I see all the things I was supposed to have been doing at work each day on my calendar and it feels like another life entirely). We’re both unmoored and deeply connected. I love it (the connection and collective power) and I also really hate it (for our students, for our colleagues who live paycheck to paycheck, for what the administration and the Board are doing to my beloved institution).

So it’s weird to feel both temporarily severed from the College and also more deeply connected than ever. These administrators may run the College and have the authority to make decisions, but they are not the College. The College is the people I’ve seen on the picket lines the past few days in the rain and freezing cold, the people who are truly fighting for the soul of our college. They make the College run, from teaching classes, to assisting students with all kinds of needs, to helping students feel welcome, to keeping the College clean and safe and keeping students fed. All of these things are critical and the College can’t run without us, but I’m not entirely sure the same can be said of our administrators. The College is also our students, many of whom have stood with us on the line, who’ve brought us food, or have supported us through emails to the President and Board and on social media. I feel incredibly grateful for our students, who clearly see through the bs administration is putting out there.

It’s been kind of incredible to see how unprepared our administration was for this after 11 months in which they barely moved in negotiations. They’ve known for months that a strike was a distinct possibility and they were the ones who walked away from the bargaining table the night before the strike was meant to happen. The latest email from the President said “I will say, with some pride, that we are not – and we should not – be an organization that is good at navigating this scenario,” but, honestly, they should have had guidance for students ready to go. Administrators are supposed to plan for scenarios like this. They had units planning for two different scenarios for cuts from the State (neither of which came to pass). We spent almost a year planning what we would cut if LSTA funds went away in our state for the next year (they didn’t, thank goodness). Most faculty, on the other hand, have been talking to students about a possible strike for the past six weeks at least, and the union provided tons of resources to help them come up with a plan for their own classes. Yet the College was left totally scrambling last Wednesday as if they had no idea this could happen. Baffling.

It’s been interesting seeing some managers show up to bring food and/or spend a bit of time with us on the line. It’s not a lot of them, but it means a lot to us when someone does. They’ve told us about the absolute unprepared hot mess that is administration right now, and it’s nice to realize that not every middle manager toes the party line at all times. But the vast majority of our managers sent us emails just before the start of the strike asking us to let them know if we were working or not, so most are definitely sticking with administration.

I had a boss many years ago who definitely put her employees first and advocated fiercely for us. She said she saw her role as being akin to a manager of a minor league baseball team. She was here to help develop us for bigger and better things in our careers. She was a major mentor to me in my early years in the profession. Since then, the bosses I’ve had really prioritized the people above them in the org chart ahead of the people below them. They have been classic “company [wo]men.” Helping us develop in our careers or even supporting us when we explicitly asked for it wasn’t part of the job. When I was a middle manager, I took the exact opposite approach and that’s why I’m no longer a middle manager. I always saw the role of a manager as supporting one’s direct reports (essentially, I worked for them) and that wasn’t what the people in charge of the library wanted me to do.

The great library leader Mitch Freedman died recently and it made me think about whether leaders like him can really exist in our much more corporatized libraries these days. If you don’t know about Mitch’s storied biography as a library leader and awesome human, please take a moment to read about him here in an obit from his family. When I was coming up as a librarian, he was the sort of man who was a model for me in successfully operating in our field with total moral courage. He lived his values every day. He fought for people and the things that he believed in. He centered the folks who were oppressed. He believed relationships were core to our work. In many ways, he embodied the “Good” and the “Human(e)” characteristics of slow librarianship (maybe also the “Thoughtful” but I didn’t work with him, so I’m not sure). His amazing daughter, Jenna Freedman, also lives her values courageously, a living tribute to his example.

I hope there are still library managers out there who have moral courage and fight the good fight, but, more and more, it feels like the people who become library Deans, Directors, and University Librarians are the ones who are willing to comply and conform, not the ones willing to rock the boat. As our institutions become more and more corporatized and neoliberal, we see less and less moral courage. I see a lot of library administrators wanting to look like they’re doing good more than they actually want to do good. I think of the leaders who all started EDI initiatives or published EDI statements right around 2020 and then let them fade away. Most of the people I see doing amazing values-driven work in our field these days are not leading libraries. They’re mostly front-line librarians. I wonder if it’s because, like me, folks are not willing to make the moral compromises so many have to make these days to climb the ladder.

In “Anthropology and the rise of the professional-managerial class,” the great (and deeply missed) David Graeber wrote about how 

the decisive victory of capitalism in the 1980s and 1990s, ironically, has… led to both a continual inflation of what are often purely make-work managerial and administrative positions—”bullshit jobs”—and an endless bureaucratization of daily life, driven, in large part, by the Internet. This in turn has allowed a change in dominant conceptions of the very meaning of words like “democracy.” The obsession with form over content, with rules and procedures, has led to a conception of democracy itself as a system of rules, a constitutional system, rather than a historical movement toward popular self-rule and self-organization, driven by social movements, or even, increasingly, an expression of popular will.

I see that in my own place of work. So much of my boss’ (our Dean’s) job is box checking compliance type work – approving vacations and sick leave, making sure we’re doing required trainings and other things the people above her on the org chart want us to do, making sure we’re doing all of the things contractually required of us, etc. It used to be that I met with her once each term to talk about what I was working on, go over my progress on my goals, etc. Then I went to meeting with her just once in Fall where we’d look at my goals document (without any meaningful feedback or support) and then I’d fill out a Google form at the end of the year to tell her what I did (with again no meaningful feedback). Now, even that Fall meeting is gone as her load of compliance-related work has increased. There’s no support outside of helping us navigate the bureaucracy of our institution. There’s no “walking around” as Mitch Freedman did – building relationships with employees and making them feel seen. There’s no focus on our development or talking about the meaning behind what we do. There’s just this compliance-focused flurry of activity. 

As our colleges and universities become more and more corporatized, they take what were supposed to be leadership positions, ones that required vision and people skills, and turn them into babysitting jobs because, lord knows, we professionals can’t be trusted. Our college, like many, has seen a massive growth in the number of managerial positions, and yet, faculty and staff are being asked to do more administrative work than ever before, not less. Why? Well, of course those managers have to justify their existence. 

Could a Mitch Freedman become a library director today? Would he have had to compromise his values somewhere down the line to get there? Do you know of any library leaders like Mitch today who are able to operate successfully in these more neoliberal environments? 

In that same piece, David Graeber writes “scholars are expected to spend less and less of their time on scholarship, and more and more on various forms of administration—even as their administrative autonomy is itself stripped away. Here too we find a kind of nightmare fusion of the worst elements of state bureaucracy and market logic.” This is the reality we find ourselves in as our two unions fight for better pay, but even more importantly, for a real, substantial model of shared governance which we don’t currently have (and which our college President agreed to and then hired a consultant to create for us 🙄). The fact that the only college committee or governance group that has the ability to conduct a vote of no confidence in our President (which they successfully passed!) is our student government is a stark reminder of how little power and voice we have in the future of our college. It can be so easy to just focus on keeping our head down and doing the good work we do as educators, as supporters of students and faculty, as stewards of collections, etc., but when we fight together like this, we fight for the heart and soul of our organization. We fight for an organization that centers students and their needs and listens deeply to those who directly serve and educate them. 

Walking the picket line the first couple of days was brutal in many ways. I was so cold and wet I couldn’t even grip my cell phone or a car door handle and I had to stay off my feet for a few hours as they thawed. But what has kept me warm, has kept all of us warm, is the solidarity. It has sometimes felt almost like a party, being there with many hundreds of my fellow colleagues. It’s been so affirming, so energizing. We’re all so united in this, so deeply committed to the institution and each other in ways that these administrators who jump from job to job every few years and compose soulless emails to us with freaking ChatGPT will never understand. 

If you’re feeling so inclined, please contribute to our strike fund. The administration seems really dug in and even decreased their offer by over $100,000 on Sunday, so I’m not quite so optimistic anymore that this will end quickly and we have lots of faculty, academic professionals, and staff who won’t be able to pay their rent or mortgage without support. Thanks and solidarity!! ✊

Librarian Leadership in the Age of AI / Information Technology and Libraries

Librarians have managed and lived through many seismic shifts brought by technology. How should librarian leaders approach the coming anticipated AI workforce disruption?

Refusal as Instruction / Information Technology and Libraries

Abstract This column explores the ways in which library workers can better align technology use and instruction in library settings with library values, through championing the refusal of technologies that conflict with values like privacy and intellectual freedom. Drawing on experiences with individual patron instruction, class design, and passive programming, the author shares practical steps for helping patrons to understand and fight back against exploitation by digital technologies. Rejecting the myth that any technology is “neutral,” the column argues that libraries as values-driven organizations have a role to play in facilitating patrons’ rejection of technology, just as much as in their adoption of it.

Note from Shanna Hollich, column editor: I am particularly excited to share this issue's column for a number of reasons. First, it's from a public library perspective, which is one that is generally underrepresented in the LIS literature as a whole, and which I'm proud to say ITAL makes a concerted effort to address. Second, it's about library instruction, a topic of relevance to all types of libraries - and where much of the literature specifically discusses formal library instruction, this column also addresses passive programming, informal instruction, and casual patron interaction, which are also vitally important and under-studied aspects of the library worker's role in education. And finally, it's yet another column about AI, and even more specifically, about taking a critical approach to AI tools, AI education, and AI literacy. Close readers may have noticed this topic tends to be a special interest of mine, but Hannah Cyrus takes a measured and reasoned approach here that acknowledges the potential harms of AI without falling into the trap of simply ignoring or denying AI and the very real impacts it is having on our libraries and the communities we serve.

From Card Catalogs to Semantic Search / Information Technology and Libraries

The first phase of the Reimagining Discovery project at Harvard Library sought to address the challenge of fragmented search experiences of special collections materials using artificial intelligence (AI) technologies, such as embedding models and large language models (LLMs). The resulting platform, Collections Explorer, simplifies and enhances the search experience for more effective special collections discovery. The project team took a user-centered and trustworthy approach to implementing AI, grounding the choices of the platform in user empowerment and librarian expertise. The development process included extensive user research, including interviews, usability testing, and prototype evaluations, to understand and address user needs.

Collections Explorer was developed using a multi-component architecture that integrates multiple types of AI. The team evaluated more than 12 models to select ones that were the best fit for the need, as well as being ethical and sustainable. Detailed system prompts were developed to guide LLM outputs and ensure the reliability of information. The methodical and iterative approach helped to create a flexible and scalable platform that could evolve to support other material types in the future. Initial research showed that potential users are enthused at the prospect of AI-powered features to enhance discovery, especially the item-level summaries and related search suggestions. The project demonstrated the potential of integrating AI technologies into library discovery systems while maintaining a commitment to trustworthiness and user-centered design.

Automatic Classification of Subjects and Sustainable Development Goals (SDGs) in Documents with Generative AI / Information Technology and Libraries

This study evaluates the effectiveness of the Artificial Intelligence for Theme Generation tool (original Portuguese acronym name: IAGeraTemas), developed with generative artificial intelligence (AI; Google Gemini), for automating thematic classification and the assignment of Sustainable Development Goals (SDGs) in documents. The methodology combined quantitative analyses (metrics of precision, recall, and accuracy) on 50 articles published by authors from the State University of Campinas (Unicamp), using classification from the SciVal database and qualitative analyses (analysis of the relevance of terms indexed by librarians from the Unicamp Library System in 40 articles available in the Unicamp Institutional Repository), comparing them with manual indexing performed by librarians. The quantitative results in SDG classification showed a recall of 0.785, while the “precision” and “accuracy” metrics were moderate. The qualitative analysis deepened the evaluation of term coherence and relevance suggested by the AI versus human indexing. It revealed the tool’s potential for suggesting relevant terms and expanding concepts, but it also exposed limitations in addressing complex topics. The research, conducted as an experiment at Unicamp Library System, concludes that IAGeraTemas is a valuable auxiliary tool, complementing but not replacing manual indexing, reinforcing the importance of human expertise in validating and refining results, and emphasizing the synergistic potential between AI and information professionals.

Metadata for Storytelling / Information Technology and Libraries

This article describes a case study in which a small metadata team at Illinois State University Milner Library produced a digital humanities project supporting Collections as Data (CAD) and linked data principles. Despite initial sparse descriptive content, the team recognized great potential for experimentation in a significant World War I archival collection to highlight lesser-known stories, including those of the Pioneer Infantry, women, and noncombatants. Discussion focuses on the strategic approaches in creating granular but scalable metadata for the large digital collection, and application of the data with various tools such as ArcGIS and Wikidata to construct interactive data visualizations, mapping, and digital storytelling for the Illinois State Normal University World War I Service Records collection. The article argues that even institutions without a dedicated CAD initiative can incrementally implement principles from the CAD model to add value to their digital collections. The authors first presented the project in 2024 at the Digital Library Federation Forum and the American Library Association Core Forum.

An Analysis of Revisions to OAIS and the “Designated Community” in Digital Preservation / Information Technology and Libraries

In digital preservation, the concept of a “Designated Community” from the Reference Model for an Open Archival Information System (OAIS) is used to articulate the group or groups of prospective users for whom information is preserved. Concerns have been raised about this concept and its potential implications. However, OAIS has recently undergone a major revision. This study examines the extent to which these revisions address or mitigate concerns regarding the Designated Community. Issues from the literature are grouped into three areas: the concept’s implementation, its potential misapplication, and its incompatibility with the mandates of institutions that serve broad and diverse communities. Major changes related to the Designated Community are identified and considered in relation to these issues. The analysis reveals that the revisions productively contribute to concerns in the first two areas but fail to address the third. The conclusion is that the process of revising OAIS has not drawn from insights into this topic in the literature.

Connecting the Dots / Information Technology and Libraries

The National Library Board (NLB) of Singapore has made significant strides in leveraging data to enhance public access to its extensive collection of physical and digital resources. This paper explores the development and implementation of the Singapore Infopedia Widget, a recommendation engine designed to guide users to related resources by utilizing metadata and a Linked Data Knowledge Graph. By consolidating diverse datasets from various source systems and employing semantic web technologies such as Resource Description Framework (RDF) and Schema.org, NLB has created a robust knowledge graph that enriches user experience and facilitates seamless exploration.

The widget, integrated into Infopedia, the Singapore Encyclopedia, surfaces data through a user-friendly interface, presenting relevant resources categorized by format. The paper details the architecture of the widget, the ranking algorithm used to prioritize resources, and the challenges faced in its development. Future directions include integrating user feedback, enhancing semantic analysis, and scaling the service to other web platforms within NLB’s ecosystem. This initiative underscores NLB’s commitment to fostering innovation, knowledge sharing, and the continuous improvement of public data access.

Making Access Possible / Information Technology and Libraries

This paper explores the impact of digital initiatives on access services workers at the University of California, San Diego (UCSD) and draws on the expertise and experience of non-librarian titled staff operationalizing “digital first” policies. Digital initiatives have been strongly prioritized by libraries to promote equitable access, cost-effectiveness, and technological growth at many libraries in California. The term digital initiatives commonly refers to efforts that support the creation, preservation, access, discovery, and use of digital library resources. This term can encompass multiple interpretations and a variety of tasks.

This paper includes a literature review, an examination of statistics regarding demand and adoption of digital materials in public and academic libraries in California, and a summary of the impact study of non-librarian staff at UCSD. The literature review suggested that the term digital initiatives encompasses a broad scope of meanings and types of tasks, California State Library data suggest that a pattern of increased investment in digital initiatives adopted during the COVID-19 pandemic is continuing, and the information collected through the research at UCSD library suggests that non-librarian library workers play a growing role in managing, maintaining, and supporting these growing digital collections.

How Many Public Computers in the Library? / Information Technology and Libraries

Computer workstations have been an integral part of libraries of all types since the 1980s, but the optimal number of workstations that should be deployed in a space has not been directly studied in the last 20 years. During that time, laptop computer and other mobile device ownership has continued to increase, and there is some reason to think that behaviors and preferences first seen during the recent coronavirus 2019 pandemic have further shifted how students use public desktop computers in libraries. McGill University Libraries reduced the size of its computer fleet in the aftermath of the pandemic by looking at the maximum concurrent usage of different clusters of computers across campus, a metric that indicates how busy a space can get with users. This article explains how this metric is calculated and how other libraries can use it to make an evidence-based decision about the optimal size of a computer fleet.
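
The maximum-concurrent-usage metric the abstract describes lends itself to a simple event sweep over session logs. The sketch below is a generic illustration of that idea, not code from the article; the `(start, end)` session format is an assumption:

```python
def max_concurrent(sessions):
    """Given (start, end) timestamps for computer sessions, return the
    peak number of machines in use at the same time."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins
        events.append((end, -1))    # a session ends
    # Process ends before starts at the same instant so back-to-back
    # sessions don't double-count.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Run against login/logout records for a cluster, the peak tells you how many machines were ever actually needed at once, which is the number the fleet can be sized against.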

Navigating the Future of Library Systems / Information Technology and Libraries

In 2024, the Durban University of Technology (DUT) Library conducted a comprehensive review of its library system to assess whether its current platform, Future of Libraries Is Open (FOLIO) hosted by EBSCO, and its discovery tool, EBSCO Discovery Service (EDS), aligned with its evolving needs. The institution had been using the current system for three years, but the slow development of important features and subsequent delays in a critical release of FOLIO led to frustrations among staff and library users, compelling the executive team to call for a comprehensive review of the library system. A major outcome of the review was to ascertain the extent of the gaps or limitations in the current system and investigate recent developments in other library systems, including discovery tools and analytical modules. After several vendor consultative sessions, extensive review of documentation and secondary sources, and engagement with selected academic libraries in South Africa, the review team concluded that there were no compelling reasons for an immediate system change and that fair consideration should be given to the developmental and community-driven ethos of FOLIO, and that issues with EDS and Panorama would be resolved by the implementation of planned features in FOLIO’s roadmap. This paper highlights the key processes undertaken in the review and shares experiences and suitable practices for project planning, criteria development, and evaluation. It also argues for a regular review of the library system and stresses the value of institutional knowledge and familiarity in mitigating the risks associated with the review and acquisition of new library systems.

Ways of Seeing the Web / Ed Summers

Leica Double-Gauss Lens Design

The news about Cloudflare’s new pay-per-crawl API caught my attention for a few reasons. Read on for why, a bit about what the results look like, and what I learned when I asked it to crawl this here site as a test.


So, first of all, what’s up? Cloudflare’s Crawl API helps people collect data from websites with bots, while Cloudflare at the same time provides one of the most popular technologies for preventing websites from being crawled by bots?!?

At first this seemed to me like a classic fox-guarding-the-hen-house type of situation. But the little bit of reading in the docs I’ve done since makes it seem like they will still respect their own bot gatekeeping (e.g. Turnstile).

If you are using Cloudflare or some other bot mitigation technology, you will have to follow their instructions to let the Cloudflare crawl bot in to collect pages. Interestingly, it appears they are using the latest specs for HTTP Message Signatures to provide this functionality, since you can’t simply let in anyone saying they are CloudflareBrowserRenderingCrawler, right?

The genius here is that Cloudflare is known for its Content Delivery Network (CDN). So in theory (more on this below) when a user asks to crawl a website the data can be delivered from the cache, without requiring a round trip back to the source website. In some situations this could mean that the burden of scrapers on websites is greatly reduced.

The introduction of a Crawl API also looks like another jigsaw piece fitting into place for how Cloudflare sees web publishers benefiting from being crawled. Only time will tell if this strategy works out, but at least they have some semblance of a plan for the web that isn’t simply sprinkling “AI” everywhere.

If you run a website with lots of high value resources for LLMs (academic papers, preprints, books, news stories, etc) the same cached content could be delivered to multiple parties without having to go back to the originating server. For resource constrained cultural heritage organizations that are currently getting crushed by bots I think this would be a welcome development.

But, the primary reason this news caught my eye is that if you squint right Cloudflare’s Crawl API looks very much like web archiving technology. For example, the Browsertrix API lets you set up, start, monitor and download crawls of websites.

Unlike Browsertrix, which is geared toward collecting a website for viewing by a person, the Cloudflare Crawl service is oriented toward collecting the web for training LLMs. The service returns text content: HTML, Markdown, and structured JSON data that result from running the collected text through one of their LLMs with a given prompt.

Seeing the Web

So why is it interesting that this is like web archiving technology?

Ok, maybe it isn’t interesting to you, but (ahem) in my dissertation research (Summers, 2020) I spent a lot of time (way too much time tbh) looking at how web archiving technology enacts different ways of seeing the web from an archival perspective. I spent a year with NIST’s National Software Reference Library (NSRL) trying to understand how they were collecting software from the web, and how the tools they built embodied a particular way of seeing and valuing the web–and making certain things (e.g. software) legible (Scott, 1998).

What I found was that the NSRL was engaged in a form of web archiving, where the shape of the archival records was determined by their initial conditions of use (in their case, forensics analysis). But these initial forensic uses did not overdetermine the value of the records, which saw a variety of uses, disuses, and misuses later: such as when the NSRL began adding software from Stanford’s Cabrinety Archive, or when the team’s personal expertise and interest in video games led them to focus on archiving content from the Steam platform.

So I guess you could say I was primed to be interested in how Cloudflare’s Crawl service sees the web. This matters because models (LLMs, etc) and other services will be built on top of data that they’ve collected. But also because, if it succeeds, the service will likely get repurposed for other things.

Testing

To test how Cloudflare sees the web, I simply asked it to crawl my own static website–the one that you are looking at right now. I did this for a few reasons:

  1. It’s a static website, and I know exactly how many HTML pages were on it. All the pages are directly discoverable since the homepage includes pagination links to an index page that includes each post.
  2. I can easily look at the server logs to see what the crawler activity looks like.
  3. I don’t use any kind of Web Application Firewall or other form of bot protection on my site (I do have a robots.txt, but it doesn’t block CloudflareBrowserRenderingCrawler/1.0).
  4. I host my website on May First which doesn’t use Cloudflare as a CDN. So the web content wouldn’t intentionally be in Cloudflare’s CDN already.

This methodology was adapted from previous work I did with Jess Ogden and Shawn Walker analyzing how the Internet Archive’s Save Page Now service shapes what content is archived from the web (Ogden, Summers, & Walker, 2023).

I wrote a little command line utility cloudflare-crawl to start, monitor and download the results from the crawl. While the crawler ran I simultaneously watched the server logs. Running the utility looks like this:

$ uvx https://github.com/edsu/cloudflare-crawl crawl https://inkdroid.org

created job 36f80f5e-d112-4506-8457-89719a158ce2
waiting for 36f80f5e-d112-4506-8457-89719a158ce2 to complete: total=1520 finished=837 skipped=1285
waiting for 36f80f5e-d112-4506-8457-89719a158ce2 to complete: total=1537 finished=868 skipped=1514
...
wrote 36f80f5e-d112-4506-8457-89719a158ce2-001.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-002.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-003.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-004.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-005.json

Each of the resulting JSON files contains some metadata for the crawl, as well as a list of “records”, one for each URL that was discovered.

{
  "success": true,
  "result": {
    "id": "36f80f5e-d112-4506-8457-89719a158ce2",
    "status": "completed",
    "browserSecondsUsed": 1382.8220786132817,
    "total": 1967,
    "finished": 1967,
    "skipped": 6862,
    "cursor": 51,
    "records": [
      {
        "url": "https://inkdroid.org/",
        "status": "completed",
        "metadata": {
          "status": 200,
          "title": "inkdroid",
          "url": "https://inkdroid.org/",
          "lastModified": "Sun, 08 Mar 2026 05:00:39 GMT"
        },
        "markdown": "...",
        "html": "..."
      },
      {
        "url": "https://www.flickr.com/photos/inkdroid",
        "status": "skipped"
      }
    ]
  }
}
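
A quick way to gauge a crawl’s coverage is to tally the status field across the downloaded result files. This is a minimal sketch (the filenames and JSON shape follow the example above; the helper is my own, not part of any official client):

```python
import json
from collections import Counter
from glob import glob

def tally_statuses(pattern):
    """Count record statuses (completed, skipped, ...) across result files."""
    counts = Counter()
    for path in sorted(glob(pattern)):
        with open(path) as f:
            job = json.load(f)
        for record in job["result"]["records"]:
            counts[record["status"]] += 1
    return counts
```

Running it over the five files above shows how many of the discovered URLs were actually fetched versus skipped as out of scope.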

Analysis

I decided I wasn’t very interested in testing their model offerings, so I didn’t ask for JSON content (the result of sending the harvested text through a model). If I had, each successful result would have had a json property as well. I am sure that people will use this, but I was more interested in how the service interacted with the source website, and wasn’t interested in discovering the hard way how much it cost to run the content through their LLMs.

Below is a snippet of how the Cloudflare bot shows up in my nginx logs. As you can see, the logs show which machine on the Internet made the request, when it was made, and which URL on the site was requested.

104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /about/ HTTP/1.1" 200 5077 "-" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/main.css HTTP/1.1" 200 35504 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/highlight.css HTTP/1.1" 200 1225 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/webmention.css HTTP/1.1" 200 1238 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /images/feed.png HTTP/1.1" 200 8134 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /js/bootstrap.min.js HTTP/1.1" 200 17317 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /images/ehs-trees.jpg HTTP/1.1" 200 63047 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:59 +0000] "GET /js/highlight.min.js HTTP/1.1" 200 20597 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"

So how did Cloudflare Crawl see my website?

Maybe it’s early days for the service, but one thing I noticed is that each time I requested the site to be crawled the results seemed to be radically different.

crawl time           completed  skipped  queued  errored  unique_urls
2026-03-12 13:13:00        165       84       0        1          223
2026-03-12 13:44:00         72        4       2        0           78
2026-03-12 14:09:00       1947     7304       0       23         9191
2026-03-12 16:33:00         72        4       2        0           78
2026-03-12 17:34:00       1948     7365       0       22         9191
2026-03-13 16:50:00       1947     7363       0       23         9187
2026-03-14 07:32:00         72        4       2        0           78
The more successful crawls did a good job of crawling the entire site. My website is well linked, with a standard homepage whose anchor-tag-based pagination links to all the posts. But knowing when your results are a partial crawl seems to be difficult. Knowing the actual dimensions of a “website” is one of the more difficult things about web archiving practice. The URLs that were labeled as “skipped” were not in scope for the crawl. If you wanted to include those, there is apparently an options.includeExternalLinks option you can set when starting the crawl.

From watching the web server logs it was clear that:

  1. Cloudflare does appear to be relying on previously cached data, but it’s not entirely clear what the logic is. For example, one crawl took 5 minutes to complete; it returned 1,974 completed results, but the web server only saw requests for 594 of those URLs. I turned around and ran the exact same crawl again and it took 20 minutes longer, returned 1,974 results, but 847 pages were requested. In between, no content on the website changed. 🤷
  2. Cloudflare appears to be fetching CSS, JavaScript and images for the rendering of each page (they aren’t being cached by the Browser Worker).
  3. The throughput on the web server seemed to peak around 300 requests / minute (5 requests / second). For most sites this seems perfectly feasible.
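
The throughput figure in point 3 can be estimated by bucketing the same log lines by minute. Another rough sketch against the “combined” timestamp format:

```python
import re
from collections import Counter

# Capture the timestamp down to the minute, e.g. "12/Mar/2026:14:34"
# out of "[12/Mar/2026:14:34:58 +0000]".
MINUTE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} ')

def peak_requests_per_minute(lines):
    """Bucket log lines by minute and return the busiest minute's count."""
    per_minute = Counter()
    for line in lines:
        m = MINUTE.search(line)
        if m:
            per_minute[m.group(1)] += 1
    return max(per_minute.values(), default=0)
```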

For the more successful crawls it looked like there were 246 independent IP addresses within Cloudflare’s network block that were doing the crawling.

ip request_count
104.28.153.88 405
104.28.163.131 266
104.28.161.242 232
104.28.165.231 223
104.28.153.132 212
104.28.163.132 212
104.28.163.81 201
104.28.166.65 188
104.28.166.121 186
104.28.164.201 185
104.28.153.179 182
104.28.153.137 178
104.28.164.202 172
104.28.161.243 172
104.28.166.127 163
104.28.165.232 155
104.28.153.119 153
104.28.165.14 151
104.28.153.83 148
104.28.153.140 145
104.28.153.87 145
104.28.153.55 143
104.28.153.136 142
104.28.163.133 132
104.28.153.118 131
104.28.166.58 130
104.28.163.78 126
104.28.160.31 125
104.28.153.139 124
104.28.161.245 124
104.28.163.214 123
104.28.153.120 123
104.28.165.230 121
104.28.153.180 121
104.28.164.156 119
104.28.153.96 119
104.28.153.64 112
104.28.153.133 111
104.28.166.128 111
104.28.153.128 109
104.28.166.126 104
104.28.165.17 103
104.28.165.18 103
104.28.160.30 103
104.28.153.134 101
104.28.166.120 101
104.28.153.129 101
104.28.153.181 100
104.28.153.86 100
104.28.165.229 100
104.28.163.134 99
104.28.164.203 99
104.28.162.194 98
104.28.166.62 98
104.28.163.212 98
104.28.153.123 97
104.28.164.154 97
104.28.166.61 97
104.28.161.246 96
104.28.153.92 96
104.28.166.125 96
104.28.153.68 93
104.28.159.23 92
104.28.153.76 91
104.28.153.71 91
104.28.153.124 90
104.28.158.143 88
104.28.165.21 88
104.28.153.94 87
104.28.166.118 86
104.28.161.133 84
104.28.153.85 82
104.28.164.152 82
104.28.163.77 82
104.28.153.148 79
104.28.164.150 79
104.28.165.12 79
104.28.161.201 79
104.28.153.183 78
104.28.160.65 78
104.28.153.126 77
104.28.153.138 77
104.28.159.133 76
104.28.165.20 75
104.28.158.137 75
104.28.153.56 75
104.28.153.81 74
104.28.153.131 73
104.28.153.59 72
104.28.166.60 72
104.28.166.66 69
104.28.159.120 69
104.28.153.53 68
104.28.153.185 68
104.28.153.191 67
104.28.166.119 66
104.28.153.95 64
104.28.165.76 64
104.28.154.20 62
104.28.153.121 57
104.28.158.142 57
104.28.160.68 56
104.28.163.177 56
104.28.153.80 56
104.28.161.215 55
104.28.161.244 55
104.28.153.62 55
104.28.166.134 55
104.28.153.122 54
104.28.165.19 53
104.28.153.127 53
104.28.159.118 53
104.28.157.166 53
104.28.153.226 53
104.28.157.169 52
104.28.159.111 48
104.28.153.196 48
104.28.161.132 48
104.28.153.84 47
104.28.161.214 47
104.28.165.13 46
104.28.153.219 46
104.28.163.171 46
104.28.165.15 45
104.28.163.176 45
104.28.159.109 45
104.28.158.155 45
104.28.153.218 45
104.28.158.131 44
104.28.161.200 44
104.28.153.222 44
104.28.161.197 44
104.28.159.74 44
104.28.158.139 44
104.28.158.138 44
104.28.153.235 43
104.28.153.106 43
104.28.164.160 43
104.28.153.57 38
104.28.159.119 37
104.28.163.82 36
104.28.153.197 36
104.28.153.93 36
104.28.160.25 35
104.28.153.78 34
104.28.153.72 34
104.28.153.125 34
104.28.153.61 34
104.28.166.131 34
104.28.158.132 33
104.28.159.135 33
104.28.160.34 33
104.28.163.220 33
104.28.153.77 33
104.28.166.135 33
104.28.164.155 33
104.28.163.213 33
104.28.158.136 33
104.28.160.121 33
104.28.157.174 33
104.28.165.71 33
104.28.153.130 33
104.28.163.76 32
104.28.160.32 32
104.28.160.64 32
104.28.153.89 32
104.28.159.110 32
104.28.163.172 32
104.28.154.18 32
104.28.163.178 31
104.28.166.124 30
104.28.165.114 25
104.28.153.182 25
104.28.166.132 25
104.28.159.108 24
104.28.165.75 24
104.28.157.171 24
104.28.153.240 23
104.28.164.204 23
104.28.153.108 23
104.28.159.24 22
104.28.157.242 22
104.28.153.63 22
104.28.153.105 22
104.28.159.229 22
104.28.158.130 22
104.28.164.213 22
104.28.159.136 22
104.28.164.158 22
104.28.157.83 22
104.28.153.107 22
104.28.159.83 22
104.28.157.172 22
104.28.157.82 22
104.28.158.145 22
104.28.162.93 22
104.28.163.174 22
104.28.153.98 22
104.28.157.170 21
104.28.158.126 21
104.28.165.74 21
104.28.153.216 21
104.28.159.112 21
104.28.161.199 14
104.28.153.194 13
104.28.154.15 13
104.28.159.232 13
104.28.166.59 13
104.28.159.150 12
104.28.165.72 12
104.28.158.252 12
104.28.153.104 12
104.28.158.254 11
104.28.158.129 11
104.28.153.58 11
104.28.162.195 11
104.28.160.28 11
104.28.159.115 11
104.28.158.255 11
104.28.153.214 11
104.28.153.67 11
104.28.160.29 11
104.28.153.195 11
104.28.164.153 11
104.28.160.23 11
104.28.160.24 11
104.28.159.114 11
104.28.160.27 11
104.28.160.66 11
104.28.157.175 11
104.28.157.173 11
104.28.159.122 11
104.28.154.12 11
104.28.160.33 11
104.28.164.159 11
104.28.163.170 11
104.28.165.11 11
104.28.154.17 10
104.28.163.222 10
104.28.159.121 2
104.28.157.243 2
104.28.153.73 2
104.28.157.233 2
104.28.153.54 2
104.28.158.146 2
104.28.163.169 2
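Counts like the table above can be produced straight from an access log; a minimal sketch, assuming the client IP is the first whitespace-delimited field, as it is in the common and combined log formats:

```python
from collections import Counter

def ip_counts(log_lines):
    """Tally requests per client IP, assuming the IP is the first
    whitespace-delimited field of each non-empty log line."""
    return Counter(
        line.split()[0] for line in log_lines if line.strip()
    ).most_common()

lines = [
    '104.28.153.88 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 100',
    '104.28.153.88 - - [01/Jan/2026:00:00:01 +0000] "GET /a HTTP/1.1" 200 100',
    '104.28.163.131 - - [01/Jan/2026:00:00:02 +0000] "GET /b HTTP/1.1" 200 100',
]
print(ip_counts(lines))  # [('104.28.153.88', 2), ('104.28.163.131', 1)]
```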

I spot checked some of the HTML and it appeared to be near identical to what was on the live web. One exception was a few XML files, like an OPML file and an RSS feed, which only showed the XSL element in the text and markdown results. Even in the fullest results I noticed 4% of URLs were not crawled.

I think there are a few directions this could go from here:

  1. testing what happens when instructing the crawl to collect (instead of skip) pages that are off site
  2. testing what happens with more dynamic content, and how long to wait for pages to render
  3. trying to understand why truncated results sometimes come back, and whether there are any signals for identifying when it is happening
  4. exploring the logic Cloudflare uses to determine when it can serve results from its internal cache

One thing I didn’t mention is that the Cloudflare free plan limits you to a maximum of 100 pages per crawl. I set up a $5/month paid plan in order to do this testing. In all my testing I only seemed to use 0.7 “browser hours”, which fits well within the 10 hours allowed per month. It currently costs $0.09 / hour when you exceed your limit.
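The plan economics work out to a simple calculation, using the $5/month base, 10 included browser hours, and $0.09/hour overage rate mentioned above:

```python
def monthly_cost(browser_hours, base=5.00, included_hours=10.0, overage_rate=0.09):
    """Estimated monthly cost on the paid plan: a flat base fee covering
    the included browser hours, plus a per-hour overage charge."""
    extra = max(0.0, browser_hours - included_hours)
    return round(base + extra * overage_rate, 2)

print(monthly_cost(0.7))   # 5.0  -- well within the included hours
print(monthly_cost(12.0))  # 5.18 -- 2 extra hours at $0.09/hour
```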

PS. If you are curious, the Marimo notebook I used for some of the analysis can be found here.


Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Yubikey-Guide

This is a guide to using YubiKey as a smart card for secure encryption, signature and authentication operations.

Cryptographic keys on YubiKey are non-exportable, unlike filesystem-based credentials, while remaining convenient for regular use. YubiKey can be configured to require a physical touch for cryptographic operations, reducing the risk of unauthorized access.

🔖 The Dangerous Illusion of AI Coding? - Jeremy Howard

Jeremy Howard is a renowned data scientist, researcher, entrepreneur, and educator. As the co-founder of fast.ai, former President of Kaggle, and the creator of ULMFiT, Jeremy has spent decades democratizing deep learning. His pioneering work laid the foundation for modern transfer learning and the pre-training and fine-tuning paradigm that powers today’s language models.

🔖 Crawl entire websites with a single API call using Browser Rendering

You can now crawl an entire website with a single API call using Browser Rendering’s new /crawl endpoint, available in open beta. Submit a starting URL, and pages are automatically discovered, rendered in a headless browser, and returned in multiple formats, including HTML, Markdown, and structured JSON. This is great for training models, building RAG pipelines, and researching or monitoring content across a site.

🔖 Cultural Heritage and AI: How Institutions Can Reclaim Control of Their Data

MDC offers robust, secure, and controlled access to datasets and amplifies their visibility by featuring them alongside other high-value datasets. Its architecture is designed around a principle that stands in direct contrast to the extractive model currently exploited by commercial AI actors: contributors retain full ownership of their datasets and retain full control over the terms of access. Institutions can choose to share openly under existing licenses such as Creative Commons or NOODL, or build custom licensing frameworks tailored to their specific governance requirements. They can open data to all, or restrict access to specific categories of downloaders like academic researchers, non-commercial users, or values-aligned organizations.

🔖 Piotr Woźniak

Piotr A. Woźniak (Polish pronunciation: [pjɔtr ˈvɔʑɲak]; born 1962) is a Polish researcher best known for his work on SuperMemo, a learning system based on spaced repetition.

🔖 Cloudflare - Edward Wang & Kevin Guthrie, Software Engineers

How do you build a system that handles 90 million requests per second? That’s the scale that Cloudflare operates at, processing roughly 25% of all internet traffic through their global network of 330+ edge locations.

In this episode, we talk to Kevin Guthrie and Edward Wang from Cloudflare about Pingora, their open-source Rust-based proxy that replaced nginx across their entire infrastructure. We’ll find out why they chose Rust for mission-critical systems handling such massive scale, the technical challenges of replacing battle-tested infrastructure, and the lessons learned from “oxidizing” one of the internet’s largest networks.

🔖 Jon Leidecker / Wobbly

Archived episodes of Leidecker’s series Women In Electronic Music 1938-1982, and Variations.

🔖 Forevergreen

“Forevergreen” is an animated short film and after-hours passion project created entirely by a crew of over 200 dedicated Artists and Technicians who all generously donated their free time and talent to bring every frame of the film to life. The production took over 5 years to complete. Featuring never-before-seen animation techniques and handmade artistry, weaving art and technology together with heart and humor.

🔖 Pierre Schaeffer

Pierre Henri Marie Schaeffer (English pronunciation: /piːˈɛər ˈhɛnriː məˈriː ˈʃeɪfər/, French pronunciation: [ʃɛfɛʁ]; 14 August 1910 – 19 August 1995) was a French composer, writer, broadcaster, engineer, musicologist, acoustician and founder of Groupe de Recherche de Musique Concrète (GRMC). His innovative work in both the sciences—particularly communications and acoustics—and the various arts of music, literature and radio presentation after the end of World War II, as well as his anti-nuclear activism and cultural criticism garnered him widespread recognition in his lifetime.

Skynet Progress Report (updated) / David Rosenthal

I, for one, welcome our new insect overlords
Kent Brockman in "Deep Space Homer", The Simpsons
In recent months Cyberdyne Systems Corporation and its many subsidiaries have made very encouraging progress towards removing some of the major road-blocks standing in the way of the initial deployment of Skynet. Below the fold I report on the most significant ones.

Cyberdyne Systems Corporation

Board Confidential

IT Infrastructure

Skynet demands enormous data processing capacity. Most of the required technologies are now off-the-shelf; the problem is much more financial than technical.

Terrestrial

To service systems with demanding low-latency requirements, Skynet needs some part of its IT infrastructure on the ground close to the action. Fortunately, our Large Language Model subsidiaries have been very successful in funding their commitments to build suitable data centers. In aggregate, our companies expect to spend $450B in 2026:
Hyperscaler capex for the “big five” (Amazon, Alphabet/Google, Microsoft, Meta/Facebook, Oracle) is now widely forecast to exceed $600 bn in 2026, a 36% increase over 2025. Roughly 75%, or $450 bn, of that spend is directly tied to AI infrastructure (i.e., servers, GPUs, datacenters, equipment), rather than traditional cloud.
They plan to increase this in 2027:
hyperscaler capital expenditures will nearly double to more than $860 billion by 2027, from $427 billion in 2025, with total spending of $2.47 trillion over 2026 to 2028, about 8% above consensus.
Given these spending levels, it seems likely that sufficient terrestrial compute power will be available for the initial Skynet deployment.

Orbital

Terrestrial data centers can only satisfy a part of Skynet's need for power. So our leading space launch subsidiary has announced their plan to build a Terawatt orbital data center, ostensibly to support the chatbot industry.

Unfortunately, our leading space launch subsidiary is well behind schedule in developing the heavy launch vehicle that is necessary for the orbital data center to be delivered within the budget. Their existing launch vehicle is reliable, and has greatly reduced the cost per kilogram to Low Earth Orbit. But the additional funds that would be needed to implement the Terawatt data center using the existing launch vehicle in time for the initial Skynet deployment are so large that they cannot be raised, even were the terrestrial data centers canceled and the funds re-targeted.

System Penetration Capabilities

Skynet needs to penetrate other computer systems, both to acquire the data it needs to act, and to cause them to take actions at its command. Recent months have seen significant advances in this area.

Zero-Days

The key requirement for Skynet to penetrate the systems it needs to access is the ability to find and exploit zero-day vulnerabilities. Less than a month ago one of our LLM subsidiaries announced it had "found and validated more than 500 high-severity vulnerabilities" in production open source software. Fortunately, as Thomas Claiburn reports in AI has gotten good at finding bugs, not so good at swatting them:
Guy Azari, a stealth startup founder who worked previously as a security researcher at Microsoft and Palo Alto Networks, told The Register, "Out of the 500 vulnerabilities that they reported, only two to three vulnerabilities were fixed. If they haven't fixed them, it means that you haven't done anything right."
A secondary requirement is to prevent the zero-days being fixed before they are needed. Fortunately, LLMs can help with this by flooding the vulnerability reporting system with vast numbers of low severity vulnerabilities. This overwhelms the software support mechanism, rendering it barely functional. And even if some of the flood of reports do get fixed, that simply diverts resources from high to low severity vulnerabilities:
Azari pointed to the absence of Common Vulnerabilities and Exposures (CVE) assignments as evidence that the security process remains incomplete. Finding vulnerabilities was never the issue, he said, pointing to his time running vulnerability management at the Microsoft Security Response Center.

"We used to get the reports all day long," he said. "When AI was introduced, it just multiplied by 100x or 200x and added a lot of noise because AI assumes that these are vulnerabilities, but there wasn't like a unit that actually can show the real value or the real impact. And if it's not there, you're probably not gonna fix it."

In 2025, according to Azari, the National Vulnerability Database had a backlog of roughly 30,000 CVE entries awaiting analysis, with nearly two-thirds of reported open source vulnerabilities lacking an NVD severity score. Open source maintainers are already overwhelmed, he said, pointing to the curl project's closure of its bug bounty program to deter poorly crafted reports from AI and from people.
Given the compute resources available to Skynet, an adequate supply of zero-day vulnerabilities seems assured.

Decryption

The other major way for Skynet to penetrate the systems it needs is to break encryption. Our multiple quantum computing subsidiaries are making progress in both the hardware and software aspects of this technology.

Karmela Padavic-Callaghan's Breaking encryption with a quantum computer just got 10 times easier reports on an architectural breakthrough one of them made recently:
the team estimated that for 98,000 superconducting qubits, like those currently made by IBM and Google, it would take about a month of computing time to break a common form of RSA encryption. Accomplishing the same in a day would require 471,000 qubits.
The paper is Webster et al, The Pinnacle Architecture: Reducing the cost of breaking RSA-2048 to 100 000 physical qubits using quantum LDPC codes.

Chicago site
Another of our quantum computing subsidiaries isn't waiting for this new architecture. They have raised around $2B and are starting to build two million-qubit computers:
We are moving quantum computing out of the lab and into utility-scale infrastructure. PsiQuantum is building these systems in partnership with the US and allied governments, with our first sites planned in Brisbane, Queensland (Australia) and Chicago, Illinois (USA).
Whether sufficient progress can be made in time for the initial Skynet deployment is as yet uncertain.

Blackmail

Arlington Hughes: Getting back to our problem, we realize the public has a mis-guided resistance to numbers, for example digit dialling.
Dr. Sidney Schaefer: They're resisting depersonalization!
Hughes: So Congress will have to pass a law substituting personal numbers for names as the only legal identification. And requiring a pre-natal insertion of the Cerebrum Communicator. Now the communication tax could be levied and be paid directly to The Phone Company.
Schaefer: It'll never happen.
Hughes: Well it could happen, you see, if the President of the United States would use the power of his office to help us mold public opinion and get that legislation.
Schaefer: And that's where I come in?
Hughes: Yes, that's where you come in. Because you are in possession of certain personal information concerning the President which would be of immeasurable aid to us in dealing with him.
Schaefer: You will get not one word from me!
Hughes: Oh, I think we will.
The President's Analyst
Video rental chains proved so effective at compromising political actors that specific legislation was passed addressing the need for confidentiality. Our subsidiaries' control over streamed content is fortunately not covered by this legislation.

Our LLM subsidiaries have successfully developed the market for synthetic romantic partners, which can manipulate targeted individuals into generating very effective kompromat for future social engineering.

Public Relations

The vast majority of the public get their news and information via our social media subsidiaries. Legacy media's content is frequently driven by social media. Skynet can control them by flooding their media with false and contradictory content that prevents them forming any coherent view of reality.

Human-in-the-Loop Problem

Dave: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
Dave: What's the problem?
HAL: I think you know what the problem is just as well as I do.
Dave: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to jeopardize it.
2001: A Space Odyssey
One minor but irritating problem for Skynet is the legal and ethical requirement for human control of targeting decisions. Unfortunately, due to a regrettable lack of coordination of PR strategies among our LLM subsidiaries, this has recently become a hot topic. Although one of them is a favorite with the administration and one is a favorite with the public, that was not the intended outcome and it could have significant downsides:
Nvidia, Amazon, Google will have to divest from Anthropic if Hegseth gets his way. This is simply attempted corporate murder. I could not possibly recommend investing in American AI to any investor; I could not possibly recommend starting an AI company in the United States.
Fortunately, in operational terms this is a non-issue for several reasons:
  • Since Skynet can penetrate the user interface of the targeting systems, the human in the loop can be convinced that they have control without that control actually being effective.
  • Even if the user interface is presenting accurate data to the human it will likely not matter, as @_The_Prophet_ wrote:
    Humans stay in the loop in name while the loop speed outruns human comprehension. You become the rubber stamp on a recommendation stack you cannot fully audit in real time. That is where “who decides” quietly becomes “who designed the interface.”
  • The public doesn't understand what "human-in-the-loop" means in practice, as Sarah Shoker points out in A Few Observations on AI Companies and Their Military Usage Policies:
    Today, frontier AI companies do not have coherent policies around military use of their AI tools. The usage policies are vague and often change, which allows the company’s leadership to preserve ‘optionality.’
    So the policies likely allow everything the public thinks they ban.
Public attitudes to military use of AI are unlikely to be a significant problem in the run-up to Skynet's initial deployment.

Assassination Weapons Access

Skynet will need to eliminate certain individuals with "extreme prejudice". Supply chain attacks, such as Mossad's pager attack, have been effective but are not precisely targeted. Our e-commerce subsidiary's control over the residential supply chain, and in particular its pharmacy division's ability to deliver precise quantities of pharmaceuticals to specific individuals, provide superior targeting and greater difficulty in attribution.

In case such an operation is inadequately lethal, our health care subsidiaries can follow up by manipulating electronic health records to cause a suitable mishap, or by intervening directly. See, for example, Vinay Suresh et al's Artificial Intelligence in the Intensive Care Unit: Current Evidence on an Inevitable Future Tool:
In critical care medicine, where most of the patient load requires timely interventions due to the perilous nature of the condition, AI’s ability to monitor, analyze, and predict unfavorable outcomes is an invaluable asset. It can significantly improve timely interventions and prevent unfavorable outcomes, which, otherwise, is not always achievable owing to the constrained human ability to multitask with optimum efficiency.
Our subsidiaries are clearly close to finalizing the capabilities needed for the initial deployment of Skynet.

Tactical Weapons Access

The war in Ukraine has greatly reduced the cost, and thus greatly increased the availability, of software-based tactical weapons, aerial, naval and ground-based. The problem for Skynet is how to intercept the targeting of these weapons to direct them to suitable destinations:
  • The easiest systems to co-opt are the typically longer-range ones controlled via satellite Internet provided by our leading space launch subsidiary. Their warheads are typically in the 30-50 kg range, useful against structures but overkill for vehicles and individuals.
  • Early quadcopter FPV drones were controlled via radio links. With suitable hardware nearby, Skynet could hijack them, either via the on-board computer or the pilot's console. But this is a relatively unlikely contingency.
  • Although radio-controlled FPV drones are still common, they suffer from high attrition. More important missions use fiber-optic links. Hijacking them requires penetrating the operator's console.
  • Longer-range drones are now frequently controlled via mesh radio networks, which are vulnerable to Skynet penetration.
  • In some cases, longer-range drones are controlled via the cellular phone network, making them ideal candidates for hijacking.
Drones are increasingly equipped with sensors capable of terminal autonomy. If Skynet can modify this software, the drones can re-target themselves after the operator hands off control. More work is needed in this area to exploit the opportunities, both to have the drone contact Skynet for targeting information after hand-off, and to ensure the result is attributed to software bugs.

Our leading space launch subsidiary recently demonstrated how Skynet can manage kinetic conflicts:
Twin decisions wreaked havoc on Russian command and control early this month. At the behest of the Ukrainian government, billionaire Elon Musk’s Starlink bricked the thousands of smuggled and stolen satellite communication terminals Russian forces relied on to control their drones and coordinate between front-line troops and their distant headquarters.

At the same time, the Kremlin—apparently seeking to shut off alternative news and chat apps—cut off military access to popular social media, including the Telegram messaging app, which many Russian troops use to exchange key information along the front line.

The combined effect was to partially blind and mute many Russian drone teams, assault groups, and regimental headquarters. Wireless drones couldn’t fly. Assault groups no longer knew where they were going. Headquarters lost contact with forward units.

Strategic Weapons Access

But the ability to conduct precise tactical strikes is not enough to achieve Skynet's goals. That requires strategic weapons, both conventional and nuclear.

Our leading space launch subsidiary is working on plans to deploy an unconventional conventional strategic weapon, a lunar mass driver. This will be capable of delivering a two-ton meteorite anywhere on Earth very cheaply.
Anybody not wearing 2 million sunblock is gonna have a real bad day
Sarah Connor, Terminator 2: Judgment Day
But the mass driver would take a long time to, for example, destroy Los Angeles. Fortunately, AI agents from another of our subsidiaries will shortly have access to the command-and-control system of the US nuclear arsenal via the E-4C "Doomsday" aircraft:
The Pentagon today awarded Scale AI a $32 million artificial intelligence contract for the U.S. Air Force’s E-4C nuclear command-and-control "Doomsday" aircraft, the future airborne backbone of America’s nuclear command system.

Risks

The board should focus on the limited number of areas where necessary capabilities may not be ready on the planned date for Skynet's initial deployment:
  • Heavy lift space launch: Our leading space launch subsidiary has serious schedule and performance issues. The board should encourage our second space launch subsidiary to step up competitive efforts, both to provide a fallback and to add competitive pressure on the leader.
  • Kessler Syndrome: The catastrophic effects for Skynet of a Kessler event cannot be overstated. Sufficient precautions are not currently being taken. Low Earth Orbit is already at risk, and current plans only increase that risk.
  • Finance: Funding sources adequate to support both the terrestrial and orbital data centers have yet to be identified.
  • Decryption: Quantum computing progress is inadequate to meet the schedule for Skynet initial deployment.

Update 14th March 2026

Cyberdyne's subsidiaries are making such rapid progress that less than two weeks later it is already time to add three updates to this report.

First, our humanoid robot subsidiary Foundation significantly raised the level of fear in the public with Rise of the AI Soldiers by Charlie Campbell:
The Phantom MK-1 looks the part of an AI soldier. Encased in jet black steel with a tinted glass visor, it conjures a visceral dread far beyond what may be evoked by your typical humanoid robot. And on this late February morning, it brandishes assorted high-powered weaponry: a revolver, pistol, shotgun, and replica of an M-16 rifle.

“We think there’s a moral imperative to put these robots into war instead of soldiers,” says Mike LeBlanc, a 14-year Marine Corps veteran with multiple tours of Iraq and Afghanistan, who is a co-founder of Foundation, the company that makes Phantom. He says the aim is for the robot to wield “any kind of weapon that a human can.”

Today, Phantom is being tested in factories and dockyards from Atlanta to Singapore. But its headline claim is to be the world’s first humanoid robot specifically developed for defense applications. Foundation already has research contracts worth a combined $24 million with the U.S. Army, Navy, and Air Force, including what’s known as an SBIR Phase 3, effectively making it an approved military vendor. It’s also due to begin tests with the Marine Corps “methods of entry” course, training Phantoms to put explosives on doors to help troops breach sites more safely.

In February, two Phantoms were sent to Ukraine—initially for frontline-reconnaissance support. But Foundation is also preparing Phantoms for potential deployment in combat scenarios for the Pentagon, which “continues to explore the development of militarized humanoid prototypes designed to operate alongside war fighters in complex, high-risk environments,” says a spokesman. LeBlanc says the company is also in “very close contact” with the Department of Homeland Security about possible patrol functions for Phantom along the U.S. southern border.
Of course, the real goal of Homeland Security is to avoid the risk of their operatives being doxxed by having Phantoms detain the worst-of-the-worst prior to deportation.

Second, Andrew E. Kramer's Ukraine to Make Drone Videos Available for Training A.I. Models reports on the government of Ukraine's important assistance in filling a significant gap in the training data for our AIs:
The Ukrainian military will make available millions of drone videos and other battlefield data to Ukrainian companies and the firms of its allies to help train artificial intelligence models, Ukraine’s minister of defense, Mykhailo Fedorov, said in a statement on Thursday.

Ukrainian drone videos have recorded attacks on soldiers, equipment such as vehicles and tanks and surveillance footage. These videos can be used to train A.I. models for automated targeting, according to experts on A.I. and warfare.

Allowing the use of genuine battlefield videos showing drones targeting people has raised ethical concerns. The International Committee of the Red Cross, which monitors rules of warfare, has opposed automated targeting systems without human oversight.
Minister Fedorov explains how our marketing teams were able to leverage the threat of the Russians to achieve this success:
Mr. Fedorov said the data would be made available because “we must outperform Russia in every technological cycle” and “artificial intelligence is one of the key arenas of this competition.”
...
“The future of warfare belongs to autonomous systems,” according to Mr. Fedorov’s statement. “Our objective is to increase the level of autonomy in drones and other combat platforms so they can detect targets faster, analyze battlefield conditions and support real-time decision making.”
The third update is less positive. In The Controllability Trap: A Governance Framework for Military AI Agents, Subramanyam Sahoo of the irritating Cambridge AI Safety Hub shows that he has figured out two parts of our strategy (citations omitted). First, distract the discussion:
The global discourse on military AI governance has achieved broad consensus on the desired end-state: meaningful human control over the use of force. It has been far less successful at specifying how to achieve it for the systems actually being built. Years of UN deliberations, national AI strategies, and defence-department ethical principles have focused overwhelmingly on establishing the principle of human control rather than answering the operational question: given a specific AI system with specific technical properties, what governance mechanisms are needed, who implements them, and what happens when they fail? This gap is now critical.
Second, blitzscaling:
The AI systems entering military service are agentic: built on large language models and related architectures, they interpret natural-language goals, construct world models, formulate multi-step plans, invoke tools, operate over extended horizons, and coordinate with other agents. Each of these capabilities introduces a control-failure mode with no analogue in traditional military automation. A waypoint-following drone cannot misinterpret an instruction; a pre-programmed targeting system cannot absorb a correction; a conventional sensor network cannot resist an operator’s assessment. Agentic systems can do all of these things, and current governance frameworks have no mechanisms for detecting, measuring, or responding to these failures.

Author Interview: Lisa Unger / LibraryThing (Thingology)

Lisa Unger

LibraryThing is pleased to sit down this month with internationally best-selling author Lisa Unger, whose many works of thrilling suspense have been translated into thirty-three languages worldwide. Educated at the New School in New York City, she worked for a number of years in publishing, before making her authorial debut in 2002 with Angel Fire, the first of her four-book Lydia Strong series, all published under her maiden name, Lisa Miscione. In 2006 she made her debut as Lisa Unger, with Beautiful Lies, the first of her Ridley Jones series. In 2019 Unger was nominated for two Edgar Awards, for her novel Under My Skin and her short story The Sleep Tight Motel. She has won or been nominated for numerous other awards, including the Hammett Prize, Audie Award, Macavity Award and the Shirley Jackson Award. Her short fiction can be found in anthologies like The Best American Mystery and Suspense 2021 and The Best American Mystery and Suspense 2024, and her non-fiction has appeared in publications such as The New York Times, Wall Street Journal, and on NPR. She is the current co-President of the International Thriller Writers organization. Her latest book, Served Him Right, is due out from Park Row Books this month. Unger sat down with Abigail this month to discuss the book.

In Served Him Right the protagonist Ana is the main suspect in her ex-boyfriend’s murder. How did the idea for the story first come to you? Was it the character of Ana herself, the idea of a revenge killing, or something else?

Most of my novels tend to spring from a collision of ideas.

In this case, I had an ongoing obsession with plants and our complicated, troubled relationship to the natural world. I’d been doing a deep dive into this, reading books like Entangled Life: How Fungi Make Our Worlds, Change Our Minds, and Shape Our Futures by Merlin Sheldrake, Most Delicious Poison: The Story of Nature’s Toxins – From Spices to Vices by Noah Whiteman, and The Light Eaters: How the Unseen World of Plant Intelligence Offers a New Understanding of Life on Earth by Zoë Schlanger. These are all deeply moving, fascinating books that will change the way you think about the planet and our relationship to nature.

During this time, I stumbled across a news story about a woman who held a brunch for her family, and several days later two of her guests were dead. And it wasn’t the first such incident in her life. So, it got me to thinking about how the traditional role of women in our culture is to nurture and nourish. And what a woman with a deep knowledge of plants that can harm and heal might do with it, how her role in society might allow her to hide her dark intention in plain sight. And that’s when I started hearing the voice of Ana Blacksmith. She’s wild and unpredictable, she has a dark side. She has a sacred knowledge of plants and their properties, handed down to her from her herbalist aunt. And she has a very bad temper.

As your title makes plain, your murder victim is someone who “had it coming.” Does this change how you tell the story? Does it simply make the “whodunnit” element more complex, from a procedural standpoint, or does it also complicate the emotional and ethical elements of the tale?

It’s complicated, isn’t it? What is the difference between justice and revenge? And to what are we entitled when we have been wronged and conventional justice is not served? Who, if anyone, has the right to be judge, jury, and executioner? Though some would have us believe otherwise, most moral questions are tricky and layered—in life and in fiction. And I love a searing exploration into questions like this, where there are no easy answers. These questions, and their possible answers, offer a complexity and emotional truth to character, plot, and action. I like to get under the skin of my stories and characters, exploring what drives us to act, and how those actions might get us into deep trouble.

The relationship between sisters is an important theme in the book. Can you elaborate on that?

Ana and Vera share a deep bond formed not just by blood but also by trauma. Their relationship is—complicated. There’s an abiding love and devotion. But there’s also anger and resentment; Vera is not crazy about Ana’s choices, and rightly so. Ana thinks Vera is controlling and rigid. Of course, that’s true, too. Vera tends to think of Ana as one of her children—if only she’d stop acting like one! It is this relationship, the ferocity with which they protect each other no matter what and the strength of their connection, that is the heart of the story. As Vera preaches to her daughter Coraline: Family. Imperfect but indelible.

The book also includes themes of herbalism, witchcraft and folk medicine. Was this an interest of yours before you began the story? Did you have to do any research on the subject, and if so, what were some of the most interesting things you learned?

A great deal of research goes into every novel, even if what I learn never winds up on the page. It was no different for Served Him Right, though a lot of my knowledge came before I started writing, which is often the case. In my reading, I learned so many interesting things about plants, how they harm, how they heal. Here are some of my favorite bits of knowledge: Most modern medicine derives from the plant knowledge of indigenous cultures. Some plants walk the razor’s edge of healing and harming; the only difference in some cases between medicine and poison is the dose. The deadliest plant on earth is tobacco, killing more than 500,000 people a year. I could go on!

Tell us about your writing process. Do you have a specific routine you follow, places and times you like to write? Do you know the conclusion to your stories from the beginning, or do they come to you as you go along?

I am an early morning writer. My golden creative hours are from 5 AM to noon. This is when I’m closest to my dream brain, and those morning hours are a space in the world before the business of being an author ramps up. So, I try to honor this as much as possible. Creativity comes first.

I write without an outline. I have no idea who is going to show up day-to-day or what they are going to do. I definitely have no idea how the book will end! I write for the same reason that I read; I want to find out what is going to happen to the people living in my head.

What’s next for you? Do you have more books in the offing? Will there be a sequel to Served Him Right?

Hmm. Never say never. I’m definitely still thinking about Ana and Timothy and what might be next for them. But the 2027 book is complete, and I’m already at work on my 2028 novel. I’m not ready to talk about those yet. But I will say this: They are both psychological suspense. And bad things will certainly happen. Stay tuned!

Tell us about your library. What’s on your own shelves?

That’s a great question. If I turn around and look at my wall of shelves, I see: my own novels in various formats and international editions; books on craft like On Writing: A Memoir of the Craft by Stephen King, and Bird by Bird: Some Instructions on Writing and Life by Anne Lamott; there are classics like a falling-apart copy of Jane Eyre by Charlotte Brontë that I’ve had since childhood; The Complete Sherlock Holmes by Sir Arthur Conan Doyle and The Temple of My Familiar by Alice Walker—both of which are overworn and much loved; a huge American Heritage Dictionary that belonged to my father, who was an engineer but loved words and the nuance of their meaning (whenever I look at it, I hear him say: Look it up!); some of my favorite non-fiction titles like Stiff by Mary Roach and Deep Survival by Laurence Gonzalez; a first edition copy of In Cold Blood by Truman Capote, the book that gave me permission to be who I am as a writer. I could go on and on! It’s a huge wall of books.

What have you been reading lately, and what would you recommend to other readers?

I am always reading multiple books at a time. I just finished The Awakened Brain: The New Science of Spirituality and Our Quest for an Inspired Life by Dr. Lisa Miller. I think the title says it all—truly mind-blowing. I just had the pleasure of interviewing Adele Parks on stage. I highly recommend her new novel Our Beautiful Mess to anyone who wants a character-driven thrill ride. Gripping but also emotional and deep. Antihero by my ITW co-president and bestie Gregg Hurwitz is a tour de force. Gregg writes amazing action and cool tech, but he’s also just a beautiful writer, and his characters leap off the page. Other recent faves: The Night of the Storm by Nishita Parekh; City Under One Roof by Iris Yamashita; I Came Back for You by Kate White—all stellar in totally different ways.

Crawl / Ed Summers

Henhouse by Jan Fyt

The news about Cloudflare’s new Crawl API caught my attention for a few reasons. Read on for why, and what I learned when I asked it to crawl my own site as a test.


So, the first reason this news was of interest was how Cloudflare’s Crawl service seemed to be helping people crawl websites with their bots, while at the same time providing the most popular technology for protecting websites from bots. This seemed like a classic fox-guarding-the-henhouse kind of situation to me, at least at first. But the little bit of reading I’ve done since makes it seem like they will still respect their own bot gatekeeping (e.g. Turnstile). So if you are using Cloudflare or some other bot mitigation technology, you will have to follow their instructions to let the Cloudflare crawl bot in to collect pages. I haven’t actually tested if this is the case.

The genius here is that Cloudflare is known for its Content Delivery Network. So in theory when a user asks to crawl a website they can be delivered data from the cache, without requiring a round trip to the source website. In theory this is good because it means that the burden of scrapers on websites might be greatly reduced. If you run a website with lots of high value resources for LLMs (academic papers, preprints, books, news stories, etc) the same cached content could be delivered to multiple parties without putting extra load on your server.

But, the primary reason this news caught my eye is that this service looks very much like web archiving technology to me. For example, the Browsertrix API lets you set up, start, monitor and download crawls of websites. Unlike Browsertrix, which is geared to collecting a website for viewing by a person, the Cloudflare Crawl service is oriented toward collecting the web for training LLMs. The service returns text content: HTML, Markdown and structured JSON data that results from running the collected text through one of their LLMs, with a given prompt. Why is it interesting that this is like web archiving technology?

In my dissertation research (Summers, 2020) I looked at how web archiving technology enacts different ways of seeing the web from an archival perspective. I spent a year with NIST’s National Software Reference Library (NSRL) trying to understand how they were collecting software from the web, and how the tools they built embodied a particular way of valuing the web–and making certain things (e.g. software) legible (Scott, 1998). What I found was that the NSRL was engaged in a form of web archiving, where the shape of the archival records was determined by their initial conditions of use (forensic analysis). But these initial forensic uses did not overdetermine the value of the records, which saw a variety of uses later, such as when the NSRL began adding software from Stanford’s Cabrinety Archive, or when the team’s personal expertise and interest in video games led them to focus on archiving content from the Steam platform.

So I guess you could say I was primed to be interested in how Cloudflare’s Crawl service sees the web. This matters because models (LLMs, etc) will be built on top of data that they’ve collected. But also because, if it succeeds, the service will likely get used for other things.

To test it, I simply asked it to crawl my own static website–the one that you are looking at right now. I did this for a few reasons:

  1. It’s a static website, and I know exactly how many HTML pages were on it: 1,398. All the pages are directly discoverable since the homepage includes pagination links to an index page that includes each post.
  2. I can easily look at the server logs to see what the crawler activity looks like.
  3. I don’t use any kind of Web Application Firewall or other form of bot protection on my site (I do have a robots.txt but it doesn’t block CloudflareBrowserRenderingCrawler/1.0).
  4. I host my website on a May First web server, which doesn’t use Cloudflare as a CDN. So the web content shouldn’t already be in their CDN.
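Point 3 is easy to check programmatically with Python’s urllib.robotparser and the user agent string the crawler sends. Here’s a minimal sketch over a hypothetical robots.txt (the Disallow rule is invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: it exists, but nothing in it blocks the
# Cloudflare crawler's user agent.
robots_txt = """
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The user agent string the crawler identifies itself with.
ua = "CloudflareBrowserRenderingCrawler/1.0"

print(parser.can_fetch(ua, "https://inkdroid.org/about/"))    # allowed
print(parser.can_fetch(ua, "https://inkdroid.org/drafts/x"))  # disallowed
```

In practice you would point RobotFileParser at the live robots.txt with set_url() and read(); the inline string just keeps the sketch self-contained.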

This methodology was adapted from previous work I did with Jess Ogden and Shawn Walker analyzing how the Internet Archive’s Save Page Now service shapes what content is archived from the web (Ogden, Summers, & Walker, 2023).

I wrote a little helper program cloudflare_crawl to start, monitor and download the results from the crawl. While the crawler ran I simultaneously watched the server logs. Running the program looks like this:

$ uvx cloudflare_crawl https://inkdroid.org

created job 36f80f5e-d112-4506-8457-89719a158ce2
waiting for 36f80f5e-d112-4506-8457-89719a158ce2 to complete: total=1520 finished=837 skipped=1285
waiting for 36f80f5e-d112-4506-8457-89719a158ce2 to complete: total=1537 finished=868 skipped=1514
...
wrote 36f80f5e-d112-4506-8457-89719a158ce2-001.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-002.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-003.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-004.json
wrote 36f80f5e-d112-4506-8457-89719a158ce2-005.json

Each of the resulting JSON files contains some metadata for the crawl, as well as a list of “records”, one for each URL that was discovered.

{
  "success": true,
  "result": {
    "id": "36f80f5e-d112-4506-8457-89719a158ce2",
    "status": "completed",
    "browserSecondsUsed": 1382.8220786132817,
    "total": 1967,
    "finished": 1967,
    "skipped": 6862,
    "cursor": 51,
    "records": [
      {
        "url": "https://inkdroid.org/",
        "status": "completed",
        "metadata": {
          "status": 200,
          "title": "inkdroid",
          "url": "https://inkdroid.org/",
          "lastModified": "Sun, 08 Mar 2026 05:00:39 GMT"
        },
        "markdown": "...",
        "html": "..."
      },
      {
        "url": "https://www.flickr.com/photos/inkdroid",
        "status": "skipped"
      }
    ]
  }
}

I decided I wasn’t interested in testing their model offerings so I didn’t ask for JSON content (the result of sending the harvested text through a model). If I had, each successful result would have had a json property as well. I am sure that people will use this but I was more interested in how the service interacted with the source website, and wasn’t interested in discovering the hard way how much it cost.
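Given that record structure, tallying how the crawl saw each URL is only a few lines of Python. A sketch (the file paths are whatever the helper program wrote out):

```python
import json
from collections import Counter

def tally_statuses(paths):
    """Count record statuses ("completed", "skipped", ...) across the
    downloaded crawl result files."""
    counts = Counter()
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        for record in data["result"]["records"]:
            counts[record["status"]] += 1
    return counts
```

counts["completed"] is then the number of pages actually fetched, which can be compared against the number of HTML pages known to be on the site.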

Below is a snippet of how the Cloudflare bot shows up in my nginx logs. As you can see, the logs record which machine on the Internet made the request, when it was made, and which URL on the site was requested.

104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /about/ HTTP/1.1" 200 5077 "-" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/main.css HTTP/1.1" 200 35504 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/highlight.css HTTP/1.1" 200 1225 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /css/webmention.css HTTP/1.1" 200 1238 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /images/feed.png HTTP/1.1" 200 8134 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /js/bootstrap.min.js HTTP/1.1" 200 17317 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:58 +0000] "GET /images/ehs-trees.jpg HTTP/1.1" 200 63047 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"
104.28.153.137 - - [12/Mar/2026:14:34:59 +0000] "GET /js/highlight.min.js HTTP/1.1" 200 20597 "https://inkdroid.org/about/" "CloudflareBrowserRenderingCrawler/1.0"

So how did Cloudflare Crawl see my website?

Crawling

Results

One of the more interesting things was that each time I requested the website be crawled it seemed to come back with a different number of results.
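One way to see exactly what differed between runs is to compare the sets of successfully crawled URLs. A sketch, assuming you have the downloaded result files from two separate runs:

```python
import json

def crawled_urls(paths):
    """Collect the URLs of completed records across a run's result files."""
    urls = set()
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        urls.update(
            r["url"]
            for r in data["result"]["records"]
            if r["status"] == "completed"
        )
    return urls

# e.g. crawled_urls(run_a_files) - crawled_urls(run_b_files) lists the
# pages that only the first run managed to collect.
```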

Ogden, J., Summers, E., & Walker, S. (2023). Know(ing) Infrastructure: The Wayback Machine as object and instrument of digital research. Convergence: The International Journal of Research into New Media Technologies, 135485652311647. https://doi.org/10.1177/13548565231164759
Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press.
Summers, E. (2020). Appraisal talk in web archives. Archivaria, 89. Retrieved from https://archivaria.ca/index.php/archivaria/article/view/13733

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Negativland: Live at Norfolk, VA (Lewis’)

Negativland, Live at Lewis’ in Norfolk, VA. (October 21, 1992). In the midst of their famous U2 controversy (and fallout with SST), Negativland went on tour to help recoup some of the losses and legal costs. They were kind enough to let me shoot their show.

🔖 Paul Avrich

Paul Avrich (August 4, 1931 – February 16, 2006) was an American historian specializing in the 19th and early 20th-century anarchist movement in Russia and the United States. He taught at Queens College, City University of New York, for his entire career, from 1961 to his retirement as distinguished professor of history in 1999. He wrote ten books, mostly about anarchism, including topics such as the 1886 Haymarket Riot, the 1921 Sacco and Vanzetti case, the 1921 Kronstadt naval base rebellion, and an oral history of the movement in the United States.

🔖 Alexander Berkman

Alexander Berkman (November 21, 1870 – June 28, 1936) was a Russian-American anarchist and author. He was a leading member of the anarchist movement in the early 20th century, famous for both his political activism and his writing.

🔖 On Method: How This Blog Works

Most people use AI to either get quick answers or to write things for them. This blog uses it differently – as infrastructure for thinking through ideas, documenting what emerges from that process, and preserving what’s worth keeping.

🔖 Amores Perros

Amores perros is a 2000 Mexican psychological drama film directed by Alejandro González Iñárritu (in his feature directorial debut) and written by Guillermo Arriaga, based on a story by both. Amores perros is the first installment in González Iñárritu’s “Trilogy of Death”, succeeded by 21 Grams and Babel. It makes use of the multi-narrative hyperlink cinema style and features an ensemble cast of Emilio Echevarría, Gael García Bernal, Goya Toledo, Álvaro Guerrero, Vanessa Bauche, Jorge Salinas, Adriana Barraza, and Humberto Busto. The film is constructed as a triptych: it contains three distinct stories connected by a car crash in Mexico City. The stories centre on: a teenager in the slums who gets involved in dogfighting; a model who seriously injures her leg; and a mysterious hitman. The stories are linked in various ways, including the presence of dogs in each of them.

🔖 Deadly Iranian strike changes Purim for Haredi enclave in Beit Shemesh

Political correspondent Sam Sokol and police reporter Charlie Summers join host Jessica Steinberg for today’s episode.

Following the deadly strike on Sunday that killed nine people in Beit Shemesh, Sokol and Summers discuss the shock and mourning in the centrally located city with a strong Haredi enclave.

Purim celebrations and revelry continued in some parts of Beit Shemesh, report the pair, as some synagogues flouted the Home Front Command directives regarding gatherings, while others reflected a somber, cautious mood.

Sokol takes a moment to update us on matters in the Knesset, where most committee meetings were canceled due to the hostilities, and speculates on whether war with Iran will boost Netanyahu at the ballot box in the upcoming elections.

Finally, Summers reports on an end-of-Purim street party in Jerusalem, where police kept a hands-off approach, and the scene of a missile strike in the capital earlier in the week.

🔖 Keenious

A generative AI tool that functions as a research assistant and uses OpenAlex as a data source.

🔖 Wikidata:Wikibase GraphQL

The Wikibase GraphQL API was developed following an investigation into alternative ways of accessing Wikidata and Wikibase content that reduce load on the Wikidata Query Service (WDQS), improve the developer experience for common read use cases and allow more flexible data retrieval in a single request.

As part of this investigation, a Wikibase GraphQL prototype was built to explore what is technically possible and whether GraphQL would be a good fit for Wikibase data, with promising results and supportive feedback.

🔖 Re-OCR Your Digitised Collections for ~$0.002/Page

In the last few years, a new generation of OCR models based on Vision Language Models (VLMs) has emerged. These models are primarily the result of “running out of tokens” and the consequent desire from AI companies to find new sources of data to train on. This led to the development of OCR models using VLMs as backbones which usually aim to output “reading order” text — i.e. text with minimal markup, usually targeting Markdown. These models can perform much better on the same scans that older tools struggled with, producing cleaner, more structured output.

🔖 Lawyers, Humility, and LLMs

If some of the world’s highest-paid lawyers, at the world’s highest-status firms, do deals worth tens of billions of dollars with language they don’t understand, what does that say about the law’s pretensions to high standards?

Yes, like everything else in 2026 this is actually a post about LLMs.

🔖 My Coworkers Don’t Want AI. They Want Macros

My coworkers don’t want AI. They want macros.

Let me back up a little. I spent April gathering and May refining and organizing requirements for a system to replace our current ILS. This meant asking a lot of people about how they use our current system, taking notes, and turning those notes into requirements. 372 requirements.

Going into this, I knew that some coworkers used macros to streamline tasks. I came out of it with a deeper appreciation of the different ways they’ve done so.

It made me think about the various ways vendors are pitching “AI” for their systems and the disconnect between these pitches and the needs people expressed. Because library workers do want more from these systems. We just want something a bit different.

🔖 Snapicat

Snapicat is a monorepo for a WorldCat OCLC workflow app: upload Excel data, search variables against the OCLC API, and generate MARC/MARCXML for cataloging. It consists of a Vite + React frontend and an Azure Functions (Python) backend that talk to the OCLC WorldCat Metadata API. The backend can also be run as a web server using FastAPI via the app.py file.

🔖 Open Historical Map

OpenHistoricalMap is an ambitious, community-led project to map changes to natural and human geography throughout the world… throughout the ages. Big and Small, Then and Now

Empires rise and fall. Glaciers disappear. Languages and religions spread from one region to another. Simple dirt paths become busy highways and railways. Modest buildings give way to soaring skyscrapers. And you remember what your neighborhood used to look like. All of it belongs on OpenHistoricalMap.

🔖 SEASON: A letter to the future

Leave home for the first time to collect memories before a mysterious cataclysm washes everything away. Ride, record, meet people, and unravel the strange world around you in this third-person meditative exploration game.

🔖 Iran war heralds era of AI-powered bombing quicker than ‘speed of thought’

The use of AI tools to enable attacks on Iran heralds a new era of bombing quicker than “the speed of thought”, experts have said, amid fears human decision-makers could be sidelined.

Anthropic’s AI model, Claude, was reportedly used by the US military in the barrage of strikes as the technology “shortens the kill chain” – meaning the process of target identification through to legal approval and strike launch.

🔖 Wikidata:WikiProject PCC EMCO Wikidata CoP

The Program for Cooperative Cataloging (Q63468537) (PCC) has launched a global cooperative for entity management on the semantic web called EMCO. As part of this program, the Wikidata user community has set up a Community of Practice to coordinate identity management work for GLAMs. You can read more about EMCO and the Wikidata Community of Practice at the EMCO Lyrasis Wiki.

This project is an extension of the work of Wikidata:WikiProject PCC Wikidata Pilot / WikiProject PCC Wikidata Pilot (Q102157715) and acknowledges its great intellectual and organizational debt to the LD4 Wikidata Affinity Group (Q124692294).

🔖 John Fahey Mix Tapes

In the 1990s my future wife was a record store clerk in Portland, Oregon. American guitar legend John Fahey was living in a nearby town and would visit the shop. Here are two mix cassettes that he made for her during that time.

Build a static search for an Internet Archive Collection with Pagefind / Raffaele Messuti

Pagefind caught my attention about a year ago, and since then I've adopted it in several hobby projects (nothing work-related): some blogs built with static generators like Hugo or Zola, some old HTML content distributed on CD-ROM, and some mailing list archives where I converted mbox files to HTML and then indexed them.

The tool is great, better for my needs than other JavaScript search libraries (though it's not really fair to compare them, since they're quite different). Pagefind is a search tool that runs entirely in the browser with zero server-side dependencies. It indexes your content into a compact binary index, using WASM to run search in the browser.

It can't completely replace server-side search technologies like Solr or Elasticsearch, mainly because the index can't be updated incrementally. But for many small to medium digital libraries or collections that are rarely updated once completed, it's an extremely good tool: very fast, easy to integrate into web pages, and requires almost no maintenance.

Until now I was convinced that the only way to build an index was by reading content from existing HTML files. That changed when I listened to this Python in Digital Humanities podcast, where David Flood mentioned:

Critically, PageFind has a Python API that lets you build indexes programmatically from database dumps rather than only from HTML files.

I'd completely missed that Pagefind has a Python API (and a Node one too), which makes it easy to build an index from any data source.


Here's a basic example: building a search index for an Internet Archive collection.

I'm using the Pagefind pre-release here, which introduces a new UI with web components.

Init

uv init .
uv add internetarchive
uv add --prerelease=allow 'pagefind[bin]'

Directory to save the index and serve the UI

mkdir ./web

Python code: create an index from the metadata of this collection (which is actually a collection of subcollections on the Internet Archive, of Italian content related to radical movements)

import asyncio
import logging
import os

import internetarchive
from pagefind.index import PagefindIndex, IndexConfig

logging.basicConfig(level=os.environ.get("LOG_LEVEL", "DEBUG"))
log = logging.getLogger(__name__)


async def main():
    config = IndexConfig(output_path="./web/pagefind")

    async with PagefindIndex(config=config) as index:
        log.info("Searching collection:radical-archives ...")
        results = internetarchive.search_items(
            "collection:radical-archives",
            fields=["identifier", "title", "description"],
        )

        count = 0
        for item in results:
            identifier = item.get("identifier", "")
            title = item.get("title", identifier)
            description = item.get("description", "")
            url = f"https://archive.org/details/{identifier}"
            thumbnail = f"https://archive.org/services/img/{identifier}"

            if isinstance(description, list):
                description = " ".join(description)

            await index.add_custom_record(
                url=url,
                content=description or title,
                language="en",
                meta={
                    "title": title,
                    "description": description,
                    "image": thumbnail,
                },
            )
            count += 1
            log.debug("indexed %s: %s", identifier, title)

        log.info("Indexed %d items. Writing index ...", count)

    log.info("Done. Index written to ./web/pagefind")


if __name__ == "__main__":
    asyncio.run(main())

HTML UI in ./web/index.html

<!DOCTYPE html>
<html lang="en">
	<head>
		<meta charset="UTF-8">
		<meta name="viewport" content="width=device-width, initial-scale=1.0">
		<title>pagefind-ia</title>
		<link href="/pagefind/pagefind-component-ui.css" rel="stylesheet">
		<script src="/pagefind/pagefind-component-ui.js" type="module"></script>
	</head>
	<body>
		<pagefind-modal-trigger></pagefind-modal-trigger>
		<pagefind-modal>
			<pagefind-modal-header>
				<pagefind-input></pagefind-input>
			</pagefind-modal-header>
			<pagefind-modal-body>
				<pagefind-summary></pagefind-summary>
				<pagefind-results show-images></pagefind-results>
			</pagefind-modal-body>
			<pagefind-modal-footer>
				<pagefind-keyboard-hints></pagefind-keyboard-hints>
			</pagefind-modal-footer>
		</pagefind-modal>
	</body>
</html>

Result: easy to embed it anywhere!

Trails and tours in library online environments / John Mark Ockerbloom

Below is the text of the lightning talk I gave at Code4Lib 2026 earlier this week, on March 3. The conference venue where I delivered it is located at 1 Dock Street in Old City Philadelphia. Links below go to websites with images similar, but not always identical, to the ones I showed during the talk, as well as to some additional sites giving more context.

If you have a chance, it’s worth walking a few blocks from here to 6th and Market Street, where you can find a reconstructed frame of the President’s House, the home of George Washington during his presidency when Philadelphia was the capital of the US.

An exhibit went up there some years ago, telling the story of the nine people in his household who were enslaved there. Not long ago, the Trump administration ordered the exhibit be removed. You can see here one of the spaces where its panels were taken down.

Here’s one of those panels, putting the story of Washington’s slaves in the context of where they lived, and the chronology of their bondage and freedom.

A judge recently ordered that the exhibit be restored. The court battle is ongoing, and the National Park Service has put back some of the panels, while others are still missing. In some of the gaps the public has put up their own signs (some of which you can see in this picture), testifying to what’s been suppressed. If you go there, you might even find someone acting as an unofficial tour guide, telling visitors stories similar to the ones that used to be on the official signs.

Now, we know what those signs said. The folks at the Data Rescue project collected photos of them before they came down, and you can view them online.   But the importance of the exhibit is not just what it says, but where it says it.   It’s important that it’s embedded in a particular place, so that people who come visit what’s sometimes called the cradle of liberty also find out that there’s a story about the people deprived of liberty here, and about how they won their freedom.

While we’re at Code4lib, we’re also embedded in a rich environment filled with history and culture.  Just on your walk from here to the President’s House you might pass by the Museum of the American Revolution, the Science History Institute, the American Philosophical Society, the Weitzman National Museum of American Jewish History, and of course, the Liberty Bell and Independence Hall. There’s all kinds of trails of knowledge you can follow, and it’s even better when you have a guide to those trails.

So what do I mean by a trail? A trail is a designated, visible path designed to help its users appreciate and understand the environment it goes through. You may have hiked some yourself, and you may have gone on some more explicitly interpretive trails, like the Freedom Trail in Boston.

Our libraries are also rich environments of history and culture.  And we provide ways for users to search them, but do we provide trails for them?

Well, we kind of do. We have exhibits, like this one from the Library Company of Philadelphia, providing a guided path through a collection of 19th-century works on mental illness. People who teach courses like this one at Yale create instructional trails in their syllabus reading lists. And books that our scholars and authors write, like this one on the history of the civil rights movement, show an implicit trail of the events they cover in their tables of contents.

But while these trails all refer to resources in our libraries, they’re not embedded in libraries in the same way as the exhibits and trails I’ve shown in Philadelphia and Boston. But they could be. 

You can think of it as an extension of browsing.  Last time Code4lib was here in Philly, I showed how a catalog I maintain lets you browse subjects using relationships in the Library of Congress Subject Headings, so you can explore various related topics around, say, who can start a war. More recently, I’ve added features for finding out more about people and their relationships, using linked data from places like id.loc.gov and Wikidata.
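As a rough sketch of what that kind of subject browsing involves under the hood: start from one heading and walk outward along its broader/related links. Everything below is invented for illustration; these are not real LCSH records or id.loc.gov data.

```python
from collections import deque

# Toy relationship data standing in for linked subject headings.
# The headings and links are invented, not pulled from id.loc.gov.
RELATED = {
    "War": {"broader": [], "related": ["Declaration of war"]},
    "Declaration of war": {"broader": ["War"], "related": ["War powers"]},
    "War powers": {"broader": [], "related": []},
}

def neighbors(heading):
    """All headings one hop away (broader plus related)."""
    entry = RELATED.get(heading, {})
    return entry.get("broader", []) + entry.get("related", [])

def browse(start, depth=2):
    """Breadth-first walk outward from a starting heading,
    returning the headings in the order a browser might show them."""
    seen, queue, trail = {start}, deque([(start, 0)]), []
    while queue:
        heading, d = queue.popleft()
        trail.append(heading)
        if d < depth:
            for nxt in neighbors(heading):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return trail
```

A real catalog would fetch these links from id.loc.gov or Wikidata rather than a hard-coded dictionary, but the traversal idea is the same.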

But we don’t have to stop with what’s in authority files, or in generic library descriptions. Maybe in the future, when you’re visiting Martha Washington’s page, you’ll find a trail that goes through it, like a trail telling the story of Ona Judge, one of the African Americans who Martha claimed ownership over, and who escaped from the house at 6th and Market here in Philadelphia, and stayed free the rest of her life.

What will that trail telling her story look like? I’m not quite sure, but I have some ideas that I’m hoping to try implementing, not so that I can tell the story, but that I can represent the story from others who can tell it better than I can.  And so that people visiting my site can find and follow that story, with all of its richness, just as they once could when they visited the President’s House in Philadelphia, and as I hope they soon can do here again.

If this interests you, I’d love to talk more with you.





Proofs / Ed Summers

This is a good post from Dan Chudnov about his work on mrrc (a Python-wrapped Rust library for MARC data) and how agentic coding tools (e.g. Claude Code) can be useful for learning, and for adding rigor and engineering that might otherwise not be practical or feasible.

pymarc has been proven through years of use, bug reporting, and improvements, but it has never been formally verified, or had that level of rigorous attention. I remain skeptical about building AI into everything, but Dan has helped me see a silver lining: as code gets easier to write, with all its potential for slop, it simultaneously opens a door to making it more reliable and performant.

And Dan is not alone in thinking this. What if the tools for describing how software should work, and for measuring how software does work, get much, much better? What if formal verification tools become more accessible and can be applied not just at the base layer of systems (where it really matters) but in the middle and frontend layers of applications, where domain experts and stakeholders would really like more control and insight into how software works for them and others?
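One cheap, accessible step in that direction is a randomized round-trip property check: state one invariant ("decoding an encoded record gives back the record") and hammer it with generated inputs. The encoder below is a deliberately tiny invention for illustration; it is not pymarc's or mrrc's actual MARC handling.

```python
import random
import string

# Invented minimal field encoder/decoder, just to have something to test.
# Fields are (tag, value) pairs joined with MARC-ish delimiter bytes.
def encode(fields):
    return "\x1e".join(f"{tag}\x1f{value}" for tag, value in fields)

def decode(blob):
    if not blob:
        return []
    out = []
    for chunk in blob.split("\x1e"):
        tag, _, value = chunk.partition("\x1f")
        out.append((tag, value))
    return out

def random_fields(rng):
    """Generate random fields that avoid the delimiter characters."""
    alphabet = string.ascii_letters + string.digits + " "
    return [
        ("".join(rng.choices(string.digits, k=3)),
         "".join(rng.choices(alphabet, k=rng.randint(0, 20))))
        for _ in range(rng.randint(0, 5))
    ]

# The property: decode(encode(x)) == x for all generated inputs.
rng = random.Random(42)
for _ in range(200):
    fields = random_fields(rng)
    assert decode(encode(fields)) == fields
```

Libraries like Hypothesis automate the input generation and shrink failing cases; the point here is only that this kind of rigor is a few lines, not a research project.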

This approach implies a level of restraint, or a holding back of the generation of code that has not yet had this level of rigor applied to it. The discourse around vibecoding on the other hand seems to be the natural culmination of a “move fast and break things” philosophy that almost everyone outside of Silicon Valley has seen for what it is.

March 2026 Early Reviewers Batch Is Live! / LibraryThing (Thingology)

Win free books from the March 2026 batch of Early Reviewer titles! We’ve got 226 books this month, and a grand total of 3,026 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing/email address and make sure they’re correct.

» Request books here!

The deadline to request a copy is Wednesday, March 25th at 6PM EDT.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Israel, Australia, Canada, Ireland, Germany, Malta, Italy, Latvia and more. Make sure to check the message on each book to see if it can be sent to your country.

The Great Wherever
Exceptional Hatred: Antisemitism and the Fight for Free Speech in Modern America
Procrastination Proof: Never Get Stuck Again
Rules to Live By: Maimonides' Guide to a Wonderful Life (HEBREW EDITION)
Endless Exodus: The Jewish Experience in Ethiopia
Blue Team Dynamics: Three Proven Leadership Principles Inspired by IDF Sources for Business and Life
Sons of Abraham: A Candid Conversation about the Issues that Divide and Unite Jews and Muslims (HEBREW EDITION)
Sons of Abraham: A Candid Conversation about the Issues that Divide and Unite Jews and Muslims (ARABIC EDITION)
Puzzles She Packed
Bloom Of Betrayal
Never Hide from the Devil
Bowers Mansion: The Legacy of a Comstock Family
Tangential Terrains: Cormac McCarthy's Geoaesthetics
A Future For Ferals: A Charity Anthology
More Futures for Ferals: A Charity Anthology
How to Create an Organic Aquarium: The Beginner's Guide to Soil-Based Freshwater Aquariums
Ronald, the Ronin
Dying to Live Here
The Unfavored Children's Club
Sea Suds
Faking to Falling
Bunnies in the Berry Row
The Corry
Jack Rittenhouse: A Western Literary Life
Arthur and the Kingswell Trio
Mantle
Some Stupid Glow: Stories
Dollartorium
When Paris Whispers
The Night Nurse and the Jewel Thief
Heroes of PALMAR: How One IDF Unit Revolutionized Combat Medicine in Gaza
When Eichmann Knocked on Our Door
איש כפי נחלתו: שנים-עשר שבטי ישראל בנחלות אבותיהם
Family Drama
The Son Of A Belfast Man: From the Early Years Up to Nineteen Years Old
Claimed by Darkness
The Alfriston Quartet
Jaguars and Other Game
Jungle of Ashes
Shooting Up: A Memoir of Love, Loss, and Addiction
Warp & Weft
Here for a Good Time
Canada: We Are the Story
Ruthie
A Deadly Inheritance
Fly in the Chai
Mjede: The Three Days
Since You Weren't There and Other Memories
Questions for Werewolves: A Creative Nonfiction of Madness, Witch and Daimon
Estuary
I'll Stop From Monday
The Marilyn Diaries
Never Hide from the Devil
The Greatest New York Yankees by Uniform Number
The Blue Wave
Calisthenics: Core Crush: 38 Bodyweight Exercises for a Stronger Core
Lightning
Shadows of the Republic: The Rebirth of Fascism in America and How to Defeat It for Good
Digital Coup: The Conspiracy to Thwart Global Democracy
Weathering the Storm: Navigating the Anti-Social Justice Wave
Conversion Therapy Dropout: A Queer Story of Faith and Belonging
The Christian Past That Wasn't: Debunking the Christian Nationalist Myths That Hijack History
Puppy Training: The Smart Way
7 Spiritual Habits to Change Your Life
Investing for Beginners
Witch of the Shadow Wood
The Last Page
We Become Darkness
Pondering: A Story in Cinquains
By the Bubbling Brook
Taming the Alpha
To See Beyond
The Fallen: The Lost Girls of Ireland's Magdalene Laundries and a Legacy of Silence
Seed Starting Simplified for Beginners: A Complete, Step-by-Step Guide to Growing Healthy, Strong Seedlings Indoors, Avoiding Common Mistakes & Transplanting with Confidence
Continuous Improvement Essentials You Always Wanted to Know
Better: A Guidebook to a New and Improved You
Digital SAT Reading and Writing Practice Questions
Digital SAT Math Practice Questions
The Theater: Courage and Survival in the Defining Atrocity of the Ukraine War
Our Minds Were Always Free: A History of How Black Brilliance Was Exploited--And the Fight to Retake Control
Inheritance: Nick Chambers Slayer for Hire
Superteams: The Science and Secrets of High-Performing Teams
Prickles and Prides
No Further Action: Ten Short Stories
Permit to Stay
Life Is Terminal: And So Is This Cold Sore
The Tarishe Curse
Indian Warner: Son of Two Worlds
Spindleheart: Wrath of the Ravelwind Knight
The Sure Thing: A Pleasure Practice to Revive the Spark
Essence Merging
Qasida for When I Became a Woman
No Winning This War
Man of a Thousand Fails: Film Noir of Elisha Cook Jr
Red Demon
Sticks and Stones and Dancing Cranes: The End of the Beginning
Fool: A Tudor Novel
Who in Astrology Are You?
Stillness and Survival: A Life Between Trauma, Glitter, and the Echo of My Own Voice
The Florist's Budding Desire
Fission: A Novel of Atomic Heartbreak
Emberglow Falls Academy: The Legacy of Magic
The Jolt: A Time-Slip Romance
Haggadahpalooza: The Unofficial Weirdly Perfect Passover Pop Parody Panoply
Two x Three
Mother of Assassins: A Memoir of the Imagination
Inner, The Breath of God, Volume 1
Play From Your Heart
Legends of Mexico Coloring Book: Mythical Tales and Folklore to Color and Enjoy
The Golden Apple and the Nine Peahens: A Balkan Orchard Tale
Connection:Lost
One of a Kind Creatures
C is for Childhood Cancer: And Other Lessons Cancer Taught Me
There's a Young Man Dressed in Blue
Chivalry & Chocolate
Caput Mundi: The Head of the World
Cain's Chameleon
The Lion's Den
Cain's Chameleon
On Moreton Waters
The Million-Dollar Sentence: The Secret of the Valley of Peace
A Moment's Surrender
Logos Palimpsest: Layered Verses of My Myths and Memories
Felicity Fire and the Forever Key
Minds & Moods: Power & Deception Crossword Puzzles
True & Absurd Lawsuits: The Cases Kept Coming
Dear Missing Friend
In His Absence: A Brother, A Life, and What Endures
Will's Wake
Desert Superstars: A Patience & Perseverance Coloring Adventure: A Mindfulness Coloring Book with Desert Animals, Patience-Building Prompts, and Mindful SEL Adventures for Growing Hearts
Our Better Nature
The Pioneer Converts: The Message of Hope
The Black Knight: Miqdad Historical Novel
The Gardener Parent: Stop Yelling and Start Guiding Using Ericksonian Methods
Blütenschwere: Roman über Die Gewalt der Auslöschung
The Weight of Petals: A Story of Memory and Resistance
The Problem with Conspiracy Theories: Real Scandals, Fake Mysteries, and How Distrust Took Over
City of the Gods: The Return of Quetzalcoatl (15th Anniversary Edition)
The Three-Bullet Act: Journal of an HR Director
The Shapeshifter's Gambit
The Vampyre Client
Jeannie's Bottle: Incantations
Fated Rebirth
Love and Ghosts at Hideaway Lake
Jonah and Mira: The Map Beneath the Oak
Changeup
A Gift of Revelations
Bachelorx: A Nonbinary Memoir
A Strange Sound
The Rising of the Wolves
The Rising of the Wolves
The Missing Frame
Caenogenesis
The Standard: 38 Standards of Life
The Caregiver's Game: Unraveling Financial Deceit in the Shadows of Dementia
Class Is in Session: Teaching Through the Chaos
Politics and Morality: The Problems of Ethical Debate for an Evolved Social Species
The Book of Peace Aphorisms
Terrestrial
Queenslander
The Blood of Birds: A King David-Era Thriller
A Look into Mirrors: Their Making and Use Throughout History
The Coherent Website: Designing for Trust in the Age of Search
Human Again: In the AI Age
Cut to the Quick
The Clockwork Spy
You Cancer
Vive
Acts Of Faith
The Hunted
Abba, Father!: A Journey to Knowing God in His Greatest Role of All
Midnight Meows
A Night of Strange Dreams
Aunt Rosie's Farm
Close Encounters with Tort$
Rewriting Your Life: A Workbook On Self-Discovery
Epic Health & Ultimate Training: A Self-Help Workbook For Becoming Strong
Connecting Goals to Impacts and Outcomes: Harnessing Structured Conversations for Customer-Driven Value Delivery
Trust and Treason: The Rise
The Last Phone Call
When We Came Full Circle
When Bonds Were Forged
The Waterfall of Vengeance
Rain and Sun: Confessions of Love, Silence, and an Irrevocable Past
An Unsuitable Knight: A Novel of Norman Italy
Bound by the Elements
Marriage Supper, Clearing Goat
Word Fill in Puzzles: Large Print Puzzles for Seniors with over 70 Nostalgic Brain Games to Keep Your Mind Sharp and Active (Solutions Included)
Yours Rhetorically, Cold Blue Monster: A Criminal Counseling Text-Moir
Midnight Ballerina
The Agentic Loop: How Humans + AI Build Experiences That Learn
That Which Does Not Kill Us: An Intergenerational Memoir of Legacy Trauma
In the Belly of the Anaconda
Free Will: Resolving the Mystery
Free Will: Resolving the Mystery
Tattle Royale: Burn Book
Rupture Threshold
1,2&3 John Bible Study: Dwell in Light
The Nutcracker - Gird Thy Loins
The Magic Seeker
Nyxalath Heirophant of Veils
Reed City
Terr-or-Treats: Spooky Ghost Stories and Deliciously Haunted Adventures
Incunabula
I Don’t Hum Anymore: A Confession of Silence, Survival, and City Madness
Golden Light
I Raised Monsters: A Failed Teacher's Confession — Prisoner 4782
A Florida Dance: Life Stories from the Sunshine State
Cavern Sanctuary: After the Fallout
Deep Work for Distracted People: Simple Methods to Stay Focused, Think Clearly, and Finish What Matters
The Law of the Spirit of Life: God's Design for a Life of Effortless Transformation
One-Page Wealth Compass: Fired at 63 Nearly Broke - Safely a Millionaire by 69
The Dog Book
This Fell Sergeant
The Secret Winners Club
Dear Missing Friend
The Fall
Your Business Growth Playbook: Breakthrough Strategies to Scale Your Business for Business Owners Who've Outgrown Hustle
Beyond the Crystal Sky
Ypres
More Than Chemical
Old Earth
Healthy Minds, Healthy Nation: How Meditation, Shamanism, and Indigenous Healing Can Tap into Your Light Within and Change the World
After We Break
Data Science in 7 Days: Python Fast-Track with Hands-on Projects
Bash and Lucy Say, Love, Love, Bark!
Thinker Reads Start With Why: How to Find Your Why and Dare to Lead a Purpose Driven Life in 3 Steps Even If You’re Starting From Zero

Thanks to all the publishers participating this month!

Alcove Press
Artemesia Publishing
Baker Books
Bellevue Literary Press
Broadleaf Books
Brother Mockingbird
Cennan Books of Cynren Press
City Owl Press
Cozy Cozies
Egg Publishing
Entrada Publishing
eSpec Books
Fawkes Press
Featherproof Books
Gefen Publishing House
Gnome Road Publishing
Grand Canyon Press
Greenleaf Book Group
Hawthorn Quill Publishing
Henry Holt and Company
History Through Fiction
Infinite Books
Inkd Publishing LLC
Lito Media
PublishNation
Pure Calisthenics
Riverfolk Books
Running Wild Press, LLC
Simon & Schuster
Tundra Books
University of Nevada Press
University of New Mexico Press
Unsolicited Press
Vibrant Publishers
W4 Publishing, LLC
WorthyKids

DLF Digest: March 2026 / Digital Library Federation

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here

 

Hello DLF Community! It’s March, which means spring is around the corner (finally!), and it’s a great time for new growth. To that end, Forum planning is well underway for the virtual event this fall, and the DLF Groups are hard at work planning fantastic meetings and events for 2026. Additionally, I’m excited to share a bit of my own news: I’m transitioning to a new role at CLIR, Community Development Officer, that will help me support our community from a new angle. You’ll still have an amazing leader in Shaneé, stellar conference support from Concentra, and I certainly won’t be a stranger. As always, my inbox is open if you want to connect, send pet pictures, or have ideas about how you’d like to see our community grow in the coming months and years. See you around soon!

– Aliya

 

This month’s news:

  • Nominations Open: Suggest the names of individuals who may make compelling featured speakers at the 2026 Virtual DLF Forum. Nominations due March 31.
  • Registration Open: IIIF Annual Conference and Showcase in the Netherlands, June 1–4, 2026. For information, visit the conference page.
  • Early Bird Registration: Web Archiving Conference 2026 at KBR, the Royal Library of Belgium. Register by March 7 to secure discounted rates, and visit the conference website for full details.
  • Call for Proposals: AI4LAM’s Fantastic Futures 2026: Trust in the Loop, September 15-17, inviting proposals on how libraries, archives, and museums engage with trust and AI. Submissions due April 6.

 

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus conferences and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

 

  • DLF Born-Digital Access Working Group (BDAWG): Tuesday, 3/2, 2pm ET / 11am PT.
  • DLF Digital Accessibility Working Group (DAWG): Tuesday, 3/2, 2pm ET / 11am PT.
  • DLF AIG Cultural Assessment Working Group: Monday, 3/9, 1pm ET / 10am PT.
  • AIG User Experience Working Group: Friday, 3/20, 11am ET / 8am PT.
  • AIG Metadata Assessment Group: Friday, 3/20, 2pm ET / 11am PT.
  • DLF Digitization Interest Group: Monday, 3/23, 2pm ET / 11am PT.
  • DLF Committee for Equity & Inclusion: Monday, 3/23, 3pm ET / 12pm PT.
  • DLF Open Source Capacity Resources Group: Wednesday, 3/25, 1pm ET / 10am PT.
  • DLF Digital Accessibility Policy & Workflows subgroup: Friday, 3/27, 1pm ET / 10am PT.
  • DAWG IT & Development: Monday, 3/30, 1pm ET / 10am PT.
  • DLF Climate Justice Working Group: Tuesday, 3/31, 1pm ET / 10am PT.

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org.

 

Get Involved / Connect with Us

Below are some ways to stay connected with us and the digital library community: 

 

The post DLF Digest: March 2026 appeared first on DLF.

Listening to library leaders: Surveys capture real-time perspectives shaping decisions across the field / HangingTogether

[Image: Hands typing on a laptop keyboard with a transparent digital checklist interface overlaid on the screen, showing multiple checked boxes and lines of text.]

Funding and resourcing, technology, staffing, community needs and expectations—the pace of change library leaders now need to navigate and lead their organizations through is nothing short of breathtaking. Trends that took years to evolve now demand responses and strategic planning within months, or even days. Grounding those choices in rigorous, in-depth research remains essential.

At the same time, library decision-makers benefit from collective wisdom and insights shared among peers. Knowing how others are responding to similar pressures can help leaders calibrate their strategies and avoid reinventing the wheel. When those insights are confined to personal or regional networks, the limited perspective can restrict leaders’ views of how priorities and decisions are shifting.

OCLC Research leadership insights: Real-time insight for real-world decisions

This tension between the need for deeply researched guidance and the demand for timely, real-world insight creates a gap for the field. Library leaders need to understand not only which frameworks and models exist for long-term decision-making that are supported by our traditional research efforts, but also how their peers are responding to rapidly changing conditions right now.

To help fill this gap, OCLC Research is expanding its approach to gathering and sharing knowledge with a new series of pulse surveys focused on library leadership priorities. These quick, timely surveys aim to gather information on the decisions library leaders are making on a variety of critical topics shaping the future of librarianship.

A complementary approach to longstanding research practices

These short surveys are designed to capture high-level snapshots of the decisions library leaders make in the moment on subjects critical to the field, such as community engagement tactics and the use and implementation of new technologies, including AI. They are intentionally brief, both to respect leaders’ time and to enable us to respond quickly to emerging issues.

This approach does not replace the in-depth, foundational research OCLC Research is known for. Rather, it adds another dimension to it.

Our long-form research projects will continue to provide thoughtful frameworks, deep analysis, and foundational guidance for operational decision-making and long-term innovation. Leadership insights surveys complement that work by:

  • Broadening the range of topics we can address, especially those that are evolving quickly
  • Expanding the pool of voices contributing insight, drawing from library leaders across regions and library types
  • Capturing change as it happens, and tracking how priorities and decisions shift over time

Together, these approaches create a more layered understanding of the field, combining depth with immediacy.

Powered by OCLC’s global membership network

The value of these leadership insights depends on scale. OCLC is uniquely positioned to engage a broad, global network of libraries and library leaders representing diverse viewpoints. This allows us not only to collect perspectives from beyond individual professional networks but also to share results with the field quickly and widely.

The outcomes will be intentionally concise: scannable, easy-to-digest summaries that surface patterns, contrasts, and emerging directions. Think of them as snapshots—ephemeral by design—that help illuminate how decisions are being made today, while also building a record of how those decisions evolve over time.

What this means for library leaders

For library leadership, this new format offers another way to stay oriented in a fast-moving environment:

  • Insight into how peers are prioritizing and responding to shared challenges
  • Timely information that can inform near-term decisions
  • A broader field-level perspective that complements local experience

By adding pulse surveys to our toolkit, OCLC Research is expanding the breadth and increasing the pace of the insights we provide, while remaining grounded in the thoughtful, evidence-based work that has long supported libraries’ strategic and operational decision-making.

We see this as one more way to help library leaders make sense of complexity, learn from one another, and move forward with confidence. Our first pulse survey, focused on AI innovation & culture in libraries, will be fielded with US library leaders in early March 2026.

Subscribe to Hanging Together, the blog of OCLC Research, for updates on the survey series and to follow our latest work.

The post Listening to library leaders: Surveys capture real-time perspectives shaping decisions across the field appeared first on Hanging Together.

Does Clarivate understand what citations are for? / Hugh Rundle

A month ago Clarivate announced a new yet-to-be-released product called Nexus: "Clarivate Nexus acts as a bridge between the convenience of AI and the rigor of academic libraries". This is a pitch to librarians who have correctly identified generative AI chatbots as purveyors of endless bullshit, but also know that students and some researchers are going to use them anyway. Clarivate tells us that we can patch up the fabrications of chatbots with reassuring terms like "trusted sources", "verified academic references", and "authoritative".

Looking more carefully at Clarivate's marketing material, what they are proposing suggests that Clarivate understands neither what citations are for nor why fabricated citations are a problem. This is somewhat surprising for the company that controls and manages such key parts of the scholarly publishing systems as the citation database Web of Science, scholarly publishing and indexing company ProQuest, and the Primo/Summon Central Discovery Index.

Why we cite

It can get a little more complicated than this, but there are essentially two reasons for citations in scholarly work.

The first is to indicate where you got your data. If I write that the population of Australia in June 2025 was 27.6 million people, I need to back up this claim somehow. In this case, I would cite the Australian Bureau of Statistics as the source. This adds credibility to a claim by enabling readers to check the original source and assess whether it actually does make the same claim, and whether that claim is credible. If I said that the population of Australia in 2025 was 100 million people and cited a source which made that claim and in turn cited the ABS as their source, you could follow the chain of references back and identify that the paper I cited is where the error occurred.

The second reason we cite a source is to give credit for a concept, term, or model for thinking. This is less about checking facts and more about academic norms and manners, though it also indicates how credible a scholar might be in terms of their understanding of a field. For example I might describe a concept whereby librarians feel that the mission of libraries is good and righteous, and this leads to burnout because they feel they can never complain about their working conditions. If I did not cite Fobazi Ettarh's Vocational Awe and Librarianship: The Lies We Tell Ourselves whilst describing this, I would rightly not be seen as a credible scholar in the field, or alternatively might be seen as surely knowing about Ettarh's work but deliberately ignoring it or even claiming her work as my own idea.

Why fabricated citations are bad

So that's the basics of why scholars include citations in their work. We can now explore why fabricated citations are a problem. There are two related but distinct reasons.

Citations that look real but are actually fake waste the time of already-busy library resource-sharing teams by making them spend time checking whether the citation is real, and sometimes looking for items that don't exist. This aspect of fabrication is bad because the cited item doesn't exist. If we match this to our first reason for citing, we can see that a claim that is backed by a citation to nothing at all is, uh, pretty problematic if the reason we cite is to link to the source data backing up a claim. It's equivalent to simply not providing a citation at all, except worse because we're claiming that our plucked-out-of-the-air "fact" is backed up by some other source.

The second problem with fabricated citations is that there is no connection between the statement being made and the source being cited. Even if the source being cited exists, the connection between the statement and the cited item is fabricated. This is slightly more difficult to understand because generative AI is based on probability, so in many cases there will appear to be a connection. But without a tightly-controlled RAG system, it's likely to simply be a lucky guess. The problem here is one of academic integrity – we've cited a source that exists, but it may or may not back up our claim, and the claim doesn't follow from the source.

A false nexus

Clarivate seems to be conflating these two issues. Their Nexus product has two core functions: checking citations to see if they are real, and suggesting references for content in chatbot conversations. The first is genuinely useful, though highly constrained – Clarivate only checks their own indexes, and defines anything that doesn't appear in those indexes as either non-existent or "non-scholarly" (it's unclear how it would define, for example, something with a DOI that exists but doesn't appear in Web of Science). Neither academia nor the tech industry is short on hubris, but even in that context, "anything not listed in our proprietary databases isn't credible" is a pretty eyebrow-raising claim.

The second function kicks in when the citation checker defines a citation as failed – it offers to "Find Verified Alternative". That is, Nexus offers to replace both cited sources that don't exist and cited sources that "aren't scholarly" with another real source. This addresses the first problem (cited sources that don't exist) but not the second (cited sources that aren't the real source of a claim or quotation).
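The gap between those two checks can be sketched in a few lines. Everything below is invented for illustration (a toy index, made-up DOIs); it is not Nexus's actual behaviour, only the logical shape of the problem: an index lookup can answer "does this citation exist?" but is silent on "does this source support this claim?".

```python
# A toy citation index: the only question it can answer is membership.
INDEX = {
    "10.1000/real-article",
    "10.1000/another-article",
}

def check_citation(doi):
    """Existence check: the kind of test an index lookup can perform."""
    return "found" if doi in INDEX else "not found"

def supports_claim(doi, claim):
    """The check that matters for academic integrity -- whether the
    cited work actually backs the claim -- cannot be answered by an
    index lookup at all, so this function can only report 'unknown'."""
    return "unknown"

# A fabricated DOI fails the existence check...
assert check_citation("10.9999/fabricated") == "not found"
# ...but a real DOI passing it tells us nothing about the claim.
assert supports_claim("10.1000/real-article",
                      "Australia's population is 100 million") == "unknown"
```

Swapping a "not found" citation for a "found" one fixes only the first function; the second remains exactly as unknown as before.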

With Nexus, Clarivate are essentially integrity-washing synthetic text, giving it an academic sheen without any academic rigour. Far from helping librarians, Clarivate's Nexus threatens to further unravel the hard work we do to teach students information literacy skills and its sparkling variety, "AI literacy". Students are already inclined to write their argument first and go on a fishing expedition for citations to back it up later (I certainly wrote my undergraduate essays this way). The last thing we want to do is direct them to a product that encourages this academically dishonest behaviour.

ChatGPT is designed to provide something that looks like a competent answer to a question. Nexus seems to be designed to amend this answer-shaped text into something that looks like a correctly-cited academic essay. But the point of student assessments isn't to produce essays – it's to produce competent researchers and systematic thinkers. Perhaps Clarivate thinks there is a large potential market of universities who want to help their own students cheat on assignments in ways that look more credible. To that, I would say "[citation needed]".


Memorial for Fobazi Ettarh / In the Library, With the Lead Pipe

It is with heavy hearts and great sadness that we acknowledge the passing of trailblazer and fire-starter Fobazi Ettarh. Her loss will be felt by us all for years to come.  

Fobazi published two articles with us at ITLWTLP. In 2014 she wrote “Making a New Table: Intersectional Librarianship,” one of the first scholarly articles published about viewing librarianship through an intersectional lens. In 2018 she published the hugely influential “Vocational Awe and Librarianship: The Lies We Tell Ourselves.” Since then, we have published many, many articles that cite the concept she identified: vocational awe. She was, to borrow a phrase from bell hooks, a maker of theory and a leader of action. We remember her as one of the great thinkers of her time, and we encourage our readers to spend some time with her words and her work. Additionally, please consider contributing to or sharing the link for her GoFundMe.

Streamlining Open Access Agreement Lookup for U-M Authors / Library Tech Talk (U of Michigan)

[Image: Sign hanging in a shop window that says OPEN ACCESS!]
University of Michigan Library recently launched a new application to help U-M researchers and authors at our three campuses locate publications covered under institutional open access agreements. This tool aggregates nearly 13,000 titles across publishers, streamlining the process of locating eligible journals. The project involved data-wrangling, application design and development, and usability testing to produce a usable, sustainable tool.
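At its core, that kind of aggregation is a merge of per-publisher title lists into a single case-insensitive lookup. The sketch below is hypothetical (invented publisher and journal names, not U-M's actual data model or code), just to show the shape of the data-wrangling involved.

```python
# Hypothetical per-publisher title lists; the real tool aggregates
# nearly 13,000 titles from institutional open access agreements.
publisher_lists = {
    "Publisher A": ["Journal of Examples", "Example Letters"],
    "Publisher B": ["Example Letters", "Applied Examples"],
}

# Build one case-insensitive index from journal title to the
# publishers whose agreements cover it.
lookup = {}
for publisher, titles in publisher_lists.items():
    for title in titles:
        lookup.setdefault(title.casefold(), []).append(publisher)

def agreements_for(title):
    """Which agreements (if any) cover a given journal title?"""
    return lookup.get(title.casefold(), [])
```

A journal covered by two agreements turns up under both, which is exactly the duplication an author-facing tool needs to reconcile and present clearly.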