Planet Code4Lib

Chatbots vs. Ozone / David Rosenthal

Back in February I posted The Kessler Syndrome, which also included a brief section mentioning the impacts of the proposed megaconstellations on the environment, specifically global warming from CO2 and black carbon, and depletion of the ozone layer. Three months earlier Anton Petrov had examined the last of these in Risk of Ozone Layer Destruction from Internet Satellite Swarms and Rocket Fuel. He has now followed up with SpaceX Is Conducting a Giant Chemical Experiment on Our Atmosphere Without Realizing. Below the fold I survey the papers Petrov cited and a few others.
The papers involved are listed below in date order, together with extracts from their abstracts:

Impact of Rocket Launch and Space Debris Air Pollutant Emissions on Stratospheric Ozone and Global Climate by Robert Ryan et al (9th June 2022):
Rockets, unlike other anthropogenic pollution sources, emit gaseous and solid chemicals directly into the upper atmosphere. We compile inventories of these chemicals from rocket launches in 2019 and projections of future growth and speculative space tourism activity. We incorporate these in a 3D atmospheric chemistry model to simulate the impact on climate and the protective stratospheric ozone layer. We find that loss of ozone due to current rockets is small, but that routine space tourism launches may undermine progress made by the Montreal Protocol in reversing ozone depletion in the Arctic springtime upper stratosphere. The BC (or soot) particles from rockets are also of great concern, as these are almost five hundred times more efficient at warming the atmosphere than all other sources of soot combined.
Note that even four years ago it was already clear that the space industry was both depleting ozone and aggravating global warming. But this was before the scale of the proposed megaconstellations was evident.

Metals from spacecraft reentry in stratospheric aerosol particles by Daniel Murphy et al (7th September 2023):
So far, models of spacecraft reentry have focused on understanding the hazard presented by objects that survive to the surface rather than on the fate of the metals that vaporize. Here, we show that metals that vaporized during spacecraft reentries can be clearly measured in stratospheric sulfuric acid particles. Over 20 elements from reentry were detected and were present in ratios consistent with alloys used in spacecraft. The mass of lithium, aluminum, copper, and lead from the reentry of spacecraft was found to exceed the cosmic dust influx of those metals. About 10% of stratospheric sulfuric acid particles larger than 120 nm in diameter contain aluminum and other elements from spacecraft reentry. Planned increases in the number of low earth orbit satellites within the next few decades could cause up to half of stratospheric sulfuric acid particles to contain metals from reentry.
Much of the reentry burn happens above the stratosphere, and it takes time for the aluminum nanoparticles to drift down to the levels where they were collected. So the 10% number represents pollution from an earlier period with fewer reentries than the 2020s. Murphy notes that:
Most of the meteoric mass is deposited at altitudes between 75 and 110 km by a very large number of sub-millimeter meteoroids. Reentering spacecraft, which are larger and moving more slowly, ablate between 40 and 70 km over a ~300 km long footprint
His samples were collected at 19km altitude.

Potential Ozone Depletion From Satellite Demise During Atmospheric Reentry in the Era of Mega-Constellations by José P. Ferreira et al (11th June 2024):
This paper investigates the oxidation process of the satellite's aluminum content during atmospheric reentry utilizing atomic-scale molecular dynamics simulations. We find that the population of reentering satellites in 2022 caused a 29.5% increase of aluminum in the atmosphere above the natural level, resulting in around 17 metric tons of aluminum oxides injected into the mesosphere. The byproducts generated by the reentry of satellites in a future scenario where mega-constellations come to fruition can reach over 360 metric tons per year. As aluminum oxide nanoparticles may remain in the atmosphere for decades, they can cause significant ozone depletion.
Ferreira et al confirm the potentially long delay between reentry and the nanoparticles reaching the ozone layer and depleting it:
we find that these reentry byproducts may take up to 30 years to settle from the top of the mesosphere into the stratospheric ozone layer. Upon reaching an altitude of about 40 km, aluminum oxides catalyze chlorine activation which promotes ozone depletion. This suggests that concentrations of aluminum oxide compounds may start increasing in the mesosphere well before reaching the stratospheric ozone layer. This would introduce a noticeable delay between the beginning of the injection process when orbiting bodies are decommissioned and the eventual ozone-depletion consequences in the stratosphere.
Investigating the Potential Atmospheric Accumulation and Radiative Impact of the Coming Increase in Satellite Reentry Frequency by Christopher Maloney et al (21st March 2025):
A lack of observations and validated models of reentry demise limits our ability to simulate the complex aerosols associated with reentry, which makes estimating the climate impacts difficult. Aluminum is a primary satellite component and will likely be emitted during reentry vaporization in the form of alumina. Unmodified alumina is a useful approximation for metallic reentry aerosol. In this study, we simulate a potential yearly emission of 10,000 metric tons of alumina from reentering space debris. We investigate how the location of atmospheric accumulation, aerosol size distribution, and radiative properties of reentry alumina impacts the middle atmosphere. We find that 20,000–40,000 metric tons of alumina accumulates at high latitudes between 10 and 30 km in both hemispheres. Small changes in mesospheric heating rates lead to 1.5-K temperature anomalies in the middle atmosphere at high latitudes. These temperature anomalies are accompanied by changes in wind speed in the polar vortex.
So there are thermal effects on the climate as well as the effects on the ozone layer.

Near-future rocket launches could slow ozone recovery by Laura Revell et al (9th June 2025):
To understand if significant ozone losses could occur as the launch industry grows, we examine two scenarios. Our ‘ambitious’ scenario (2040 launches/year) yields a −0.29% depletion in annual-mean, near-global total column ozone in 2030. Antarctic springtime ozone decreases by 3.9%. Our ‘conservative’ scenario (884 launches/year) yields −0.17% annual, near-global depletion; current licensing rates suggest this scenario may be exceeded before 2030. Ozone losses are driven by the chlorine produced from solid rocket motor propellant, and black carbon which is emitted from most propellants. The ozone layer is slowly healing from the effects of CFCs, yet global-mean ozone abundances are still 2% lower than measured prior to the onset of CFC-induced ozone depletion. Our results demonstrate that ongoing and frequent rocket launches could delay ozone recovery. Action is needed now to ensure that future growth of the launch industry and ozone protection are mutually sustainable.
Note that this paper addresses only the ozone depletion from launches, not from reentry. But their 'ambitious' scenario of 5.6 launches/day is far short of Musk's ambitions, let alone the other planned megaconstellations. My understanding is that the 2040 launches/year in their scenario are of Falcon 9 class vehicles, but "only 4.4% of launches are using vehicles designed for re-entry", which is implausible. And in any case the megaconstellations can't be built or maintained with Falcon 9s.

Will Lockett is, as one should be, skeptical of Musk's claims. In Musk’s Orbital Data Centre Idea Is Getting More Stupid By The Day he analyzes the claimed "million satellite data center" assuming it is built, as Musk claims, with Starship but over 15 years, a longer timescale than Musk's:
To achieve that, they would need to launch 120,000 satellites per year. Over the 15 years, they would launch 1.8 million satellites, but 800,000 of them would fail (as part of our 9% failure rate), leaving a total operational fleet of one million satellites. This equates to 3,158 Starship launches per year, or nearly nine launches per day. For some context, the current launch rate for Starship is just five per year.
...
In order to keep a million satellites in the constellation, it needs to be maintained. So, each year, SpaceX would have to launch 90,000 AI Sat Minis to replace the roughly 9% of the constellation that failed. That equates to 2,368 Starship launches per year, or 6.4 per day.
That's 9 launches/day for 15 years, then 6.4 launches/day indefinitely, of a rocket that is vastly bigger than Falcon 9 and is completely re-usable.
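
As a rough sanity check on the launch rates quoted above, the following Python sketch redoes the arithmetic. The annual figures are taken from the quotes; the satellites-per-launch value is not stated anywhere in them and is inferred here by simple division, so treat it as an assumption.

```python
# Annual figures come from the quoted text; the per-launch payload is
# inferred by division and is an assumption, not a number from the source.
DAYS_PER_YEAR = 365

def per_day(launches_per_year: float) -> float:
    """Convert an annual launch count to launches per day."""
    return launches_per_year / DAYS_PER_YEAR

revell_ambitious = 2040      # launches/year, Revell et al 'ambitious' scenario
build_sats = 120_000         # satellites/year during the 15-year build-out
build_launches = 3_158       # Starship launches/year during the build-out
maintain_sats = 90_000       # replacement satellites/year afterwards
maintain_launches = 2_368    # Starship launches/year for maintenance

print(f"Revell ambitious: {per_day(revell_ambitious):.1f} launches/day")
print(f"Build-out:        {per_day(build_launches):.1f} launches/day")
print(f"Maintenance:      {per_day(maintain_launches):.1f} launches/day")
print(f"Implied satellites per launch: "
      f"{build_sats / build_launches:.1f} (build-out), "
      f"{maintain_sats / maintain_launches:.1f} (maintenance)")
print(f"Satellites launched over 15 years: {build_sats * 15:,}")
```

Both phases imply roughly 38 satellites per Starship launch, and the maintenance rate works out to just under 6.5 launches/day, which the quote gives as 6.4. Either way, the build-out rate is well above the 5.6 launches/day of the 'ambitious' scenario in Revell et al.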

Of course, these claims are ridiculous - neither logistically nor economically feasible. But assuming Starship or a competitor such as Blue Origin does manage to create a reliable, reusable, 100 ton to LEO launch vehicle, there will be a lot more mass in LEO and a lot more of it reentering.

Measurement of a lithium plume from the uncontrolled re-entry of a Falcon 9 rocket by Robin Wing et al (19th February 2026):
A 10-fold enhancement of lithium atoms was detected at 96 km altitude by a resonance lidar at Kühlungsborn, Germany, approximately 20 hours after the uncontrolled re-entry of a Falcon 9 upper stage. The upper-atmospheric extension of the ICON general circulation model, nudged to ECMWF, was used to calculate winds. Backwards trajectories, including wind variability as measured by radar, traced air masses to the Falcon 9 re-entry path at 100 km altitude, west of Ireland. This study presents the first measurement of upper-atmospheric pollution resulting from space debris re-entry and the first observational evidence that the ablation of space debris can be detected by ground-based lidar. The analysis of geomagnetic conditions, atmospheric dynamics, and ionospheric measurements supports the claim that the enhancement was not of natural origin. Our findings demonstrate that identifying pollutants and tracing them to their sources is achievable, with significant implications for monitoring and mitigating space emissions in the atmosphere.
The effect of lithium and other spacecraft ingredients on the ozone layer doesn't appear to have been studied as much as that of aluminum. To be fair, there will be a lot more aluminum.

Radiative Forcing and Ozone Depletion of a Decade of Satellite Megaconstellation Missions by Connor Barker et al (14th May 2026):
We use a global inventory of launch and re-entry emissions covering the onset of the megaconstellation era (2020–2022), and project these to 2029 based on 2020–2022 growth rates. We implement this inventory into a 3D atmospheric chemistry model to determine the impacts of megaconstellations on the ozone layer and climate. We find that global stratospheric ozone depletion from all mission types is relatively small compared to surface sources and megaconstellation missions only account for about one-tenth of this depletion. This is because rockets launching megaconstellations almost all use kerosene, a large source of black carbon or soot particles, but not of chemicals such as chlorine that directly destroy ozone. Soot from rockets absorbs sunlight, warming the upper layers of the atmosphere and decreasing the amount of sunlight reaching Earth's lower atmosphere, causing it to cool. Megaconstellation missions are responsible for about half of this climate effect. In this regard, rockets launching megaconstellations and other missions are like small-scale stratospheric aerosol injection experiments without forethought for potential unintended consequences.
Again, this paper addresses only atmospheric impacts from launches, not from reentries. And the 2020-2022 launch rate is far lower, and the rockets much smaller, than those needed for the proposed "million satellite data center" and its competitors.

From Inherited Systems to Strategic Decisions / Information Technology and Libraries

The author examines the migration of Indiana University Libraries’ interlibrary loan platform, ILLiad, from a locally-hosted server to OCLC hosting through the perspective of a new department head inheriting this critical technology decision. He explores how staffing changes, lost institutional knowledge, recurring system instability, and limited technical capacity prompted a reassessment of long-standing local practices. The piece outlines research, consortium consultation, approval processes, implementation challenges, authentication and workflow issues, and post-migration tradeoffs. Ultimately, the author offers practical guidance for new leaders tasked with managing inherited systems, vendor relationships, imperfect information, and strategic change in complex academic library environments.

Locked Out of the Library / Information Technology and Libraries

While incarcerated students face many challenges when commencing higher education, a lack of access to the internet is a considerable barrier. This technological exclusion has implications for the delivery of course materials, most of which are offered only electronically. A project team from Curtin University Library sought to understand and address the challenges faced by incarcerated students in accessing library services, particularly ebooks and audiovisual content. It was found that restrictions related to contract terms, digital rights management, and copyright contribute to a reactive and uncertain situation for library services. This article outlines the state of the problem and offers possible pathways academic libraries can take to improve the state of information access for incarcerated students.

Improving Database Discovery and Understandability by Identifying and Reducing A–Z List Jargon / Information Technology and Libraries

Countless research questions arise when investigating connections between library resource discovery and student success. Existing literature explores best practices of database description language and style, the usability of database A–Z lists, and library resource jargon. Academic libraries continue to grapple with these challenges in resource discovery, even as online searching behavior evolves and new research tools emerge. A research team at the University of Arizona Libraries builds on the literature by examining these topics with a focus on the impact of a user’s academic discipline, university affiliation (faculty, staff, or student), and research experience on their understanding of database terminology, resource content and applications, and A–Z list type filters. The authors conducted an environmental scan of library websites along with several usability tests to identify and reduce library and disciplinary jargon on their A–Z list to make databases more understandable and approachable to all users. This article presents the results of these assessments as a case study for exploring external and internal factors that impact users’ understanding and discovery of databases.

Improving Accessibility of Electronic Course Reserve PDFs to Users with Disabilities at Hunter College Library / Information Technology and Libraries

By April 2027 and 2028, institutions covered by Title II of the Americans with Disabilities Act are expected to be legally required to ensure that digital content created or used at the institution is accessible as defined by Web Content Accessibility Guidelines (WCAG) 2.1 Level AA. The new law strongly emphasizes accessibility of course materials—including PDFs. This case study demonstrates how an R2 academic library staff can enhance the accessibility of PDF course materials by improving the accessibility of electronic reserves (e-reserves) PDFs at Hunter College Library (HCL).

Processes described here can be adapted by other libraries. Supporting campuses’ work to make course readings accessible may be a natural role for academic libraries. Locating or procuring the best quality version of a text available to the institution is a critical task for which libraries are optimally equipped. Furthermore, when readings are available only in print format, libraries can create higher-quality scans than those typically produced when the task is left to individual faculty members.

HCL began improving the accessibility of e-reserves PDFs in 2020. This article shares the knowledge acquired, established processes, limitations, and future directions. The workflow comprises checking each e-reserves reading. For those deemed poor, we locate an HCL collection or open access copy, purchase a digital copy, or remediate. Remediation involves optical character recognition (OCR), fixing errors therein, correcting reading order, removing repetitive headers and footers, and tagging. Literature the authors found on libraries proactively correcting OCR and tagging PDFs—that is, preceding a user’s request—was sparse, with the exceptions of the University of Toronto and the University of Michigan. Literature about proactively doing so for e-reserves was even narrower. This case study is intended to help fill the gap.

Generative AI Meets Cataloging Practice / Information Technology and Libraries

This study evaluates the performance of four generative AI models—ChatGPT, DeepSeek, Gemini, and Copilot—in generating descriptive metadata for bibliographic resources. Models were tested on a small, diverse set of resources using four prompt types: a basic prompt, a basic prompt with an example, a detailed prompt referencing Resource Description and Access (RDA) guidelines, and a detailed prompt with an example. Results show that both detailed RDA guidance and the inclusion of sample outputs improved metadata quality, particularly in formatting and field structure. While DeepSeek and ChatGPT showed better performance on the tasks, all models displayed limitations in parsing and following the prompts, using descriptive metadata fields, analyzing subject headings, and assigning URIs. These findings suggest that while generative AI holds potential to assist in metadata creation, its current capabilities fall short of meeting cataloging standards without human review.

Case Study of the Implementation of AI Primo Research Assistant (Beta Version) in Academic Libraries in Poland / Information Technology and Libraries

One of the generative artificial intelligence tools developed for use in libraries, including academic libraries, is the AI Primo Research Assistant. Of the 65 academic libraries in Poland, only 19 have access to software that supports this tool. In practice, only 9 libraries have implemented it (data from March 2025). For the purposes of this study, original research was conducted to assess the implementation status of the Primo Assistant in academic libraries in Poland. Two anonymous surveys were developed for this purpose and sent to libraries that had implemented the feature, as well as to those with the capability to run the Primo Assistant (i.e., the Primo VE Discovery admin role), in order to gather information on why they had chosen not to implement it. The analysis revealed several positive aspects, mainly a reduction in the workload of staff tasked with preparing publication lists on topics requested by library users. Some concerns were also raised by library employees, mainly regarding the reliability of the metadata provided and the accuracy of the recommended publications. The study also revealed a general lack of awareness and a need for further implementation. This paper presents the first scientific study focused on the implementation of the AI Primo Research Assistant in Polish academic libraries.

Enhancing Information Technology Governance at the University of Riau Library / Information Technology and Libraries

Effective information technology (IT) governance is essential for the University of Riau (UNRI) Library to achieve its research and educational objectives. This paper presents a qualitative pilot study investigating the library’s current IT governance processes, focusing on two COBIT 5 processes—DSS01 (Manage Operations) and DSS05 (Manage Security Services). These processes were selected in consultation with library and IT leadership due to their direct relevance to ensuring operational reliability and safeguarding the library’s information assets. COBIT 5 principles and capability models guide the assessment, emphasizing regulatory compliance, performance monitoring, and stakeholder collaboration. Using a detailed questionnaire and capability model, the study evaluates base practices and work products for DSS01 and DSS05. Results indicate varying proficiency levels, with DSS01 at level 0 and DSS05 at level 1, highlighting significant gaps between current and desired capability levels. Recommendations include implementing standard operating procedures, enhancing security measures, and optimizing resource management. In conclusion, the findings underscore the need for standardized processes, continuous monitoring, and alignment with established frameworks like COBIT 5. By addressing identified gaps and implementing recommended improvements, the UNRI Library can strengthen its IT governance, enhance operational efficiency, and better support its academic mission.

Access Reframed / Information Technology and Libraries

This study critically explores the transformative potential of human-computer interaction (HCI) in reimagining African public libraries as dynamic, user-centered, and culturally grounded spaces. Based on a literature review and comparative analysis of libraries across several African countries, the research investigates how HCI principles can enhance user engagement, usability, and inclusivity, particularly in multilingual, resource-constrained, and postcolonial contexts. The paper situates libraries as sociotechnical infrastructures that mediate between technology, local knowledge systems, and community needs, and argues for the importance of participatory and culturally responsive design approaches in library digitization efforts. The findings highlight significant gaps in current implementations of HCI within library services, including the lack of localized interfaces and limited user involvement in design processes. The study concludes by offering practical recommendations for integrating HCI into library development strategies and advocating for the co-creation of digital public spaces that reflect and empower Africa’s diverse knowledge ecologies. In doing so, the paper contributes to the growing discourse on decolonial approaches to technology and the future of public libraries in the digital age.

The Kids Are All Right / Dan Cohen

A banner that says "2026"

Writing has been light around here recently for a wonderful reason: our twins graduated from their respective colleges over the past month, and we have been in nearly nonstop revelry (and packing, and schlepping…). We are so fortunate to have two great kids; I’m super proud of them.

Speakers at our kids’ commencements, thankfully and remarkably, said little about artificial intelligence, but they did talk a lot about the complex circumstances and especially the psychology of this rising generation, and offered advice on how the graduating seniors should move forward in life given significant headwinds. I suppose it’s tempting to describe and analyze the troubles facing each graduating class, and provide sage guidance in response to the historical moment, but I’m not sure that my kids, their friends, and their generation overall are so very different from any other, or that any distinct advice is needed.

The Great Class of 2026 is, I’m afraid, just like every graduating class: happy and sad, confused and hopeful about the future, striving and procrastinating. Young adults, in other words. Sure, they seem to be impacted by new technology and our dreadful national politics and nerve-racking global challenges, but hasn’t it always been so? My college class graduated into a recession, the rise of the internet, the fall of the Berlin Wall, the chaotic end of the Soviet Union, and a messy war in the Middle East — all of these dominoes falling after a childhood in which we were fairly sure we would perish at any moment in a nuclear war. That was a lot to absorb! Back then, commencement speakers picked up on our anxiety, which had apparently morphed into excessive irony and a general lack of motivation, epitomized by the title and content of a Richard Linklater film: Slacker.

It may have taken some time, but we muddled through. So did the generation another turn of the clock back from ours (Vietnam, stagflation, etc.) and the generations before that (pick your World War and/or the Great Depression, etc.). History is, unfortunately, a procession of horrible developments, but also a showcase of astonishing resilience and creativity. Is it so Pollyannaish to simply say that Gen Z will also find a way forward, and frankly might be better off without pithy advice from the olds? Must we unconsciously mimic the opening of Woody Allen’s fictional commencement address, raising the graduating class’s blood pressure by declaring, “More than at any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly”?

Instead, I saw hope in every joyful row of begowned seniors, students who, despite all of the radical changes and stressful tensions around them, had nevertheless maintained their curiosity and maybe even cultivated a passion during college. Students who found their special niche in music, writing, art, or science, who felt compelled to listen to it all, read it all, see it all, or experiment late into the night, regardless of the requirements of the classroom. I have a feeling that this kind of deep and abiding engagement, born not from careerism but from genuine profound interest, will serve these graduates well in the years ahead. As it always has.


Books I Have Not Written

The class-action lawsuit of authors against Anthropic and its subsequent settlement have helpfully informed me of the many, many other writers named Daniel Cohen, because the settlement administrators, in their quest to match authors and texts, have sent emails and letters asking if I am the Dan Cohen who wrote this or that book. There are too many volumes by The Daniel Cohens to list in full here, but as a public service to a handful of special fellow Dans, I hereby declare:

I am not the Daniel Cohen who wrote The Monsters of Star Trek, but I would wager 100 quatloos on Triskelion that I would greatly enjoy meeting that Dan Cohen.

I am #$%@# mad I am not the Daniel Cohen who penned Famous Curses, because my family is on a mission to bring back the useful exclamation “Gordon Bennett!”

I did not write Southern Fried Rat and Other Gruesome Tales, but, based on the delightful cover of this not-me Daniel Cohen book, I probably read it at camp the year it was published.

My final confession: The settlement administrators believe there is a Daniel Cohen who authored a book titled Final Confession, but, alas, I am not the one.

My conscience is now clear.


Tree 2 / Ed Summers

Tree 2

Same as Tree 1 but after playing with some filters on my Android phone before uploading.

Tree 1 / Ed Summers

Tree 1

A reflection of a tree in the Northwest Branch river, cropped and turned upside down.

Bookmarks - data, design, vis, book / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 When Bits Rot - with C McKean, L Talboom, A Page-Mitchell

English Edition: floppy disks, hard drives, CDs, DVDs, SSD drives - no matter what you choose to store your data on - ultimately they all decay. With my guests Callum McKean, Leontien Talboom and Adrian Page-Mitchell, we’re going to talk about what kinds of data we find on old drives, why we want to get them in the first place, and what can go wrong with the storage media. To all of you who love all things retro - we’ll be talking about floppy disks a bit.

🔖 series: learning-rust (10 parts)

  • Learn Rust Basics By Building a Brainfuck Interpreter
  • Learn Rust Ownership and Borrowing By Building Mini Grep
  • Build a JSON Parser in Rust from Scratch
  • Learn Error Handling in Rust By Building a TOML Config Parser
  • Learn Rust Generics and Traits By Building a Mini Blackjack Game
  • Learn Rust Lifetimes by Building a Generic LRU Cache
  • Learn Rust HashMap and Iterators by Building a Git Object Store Reader
  • Learn Rust Closures By Building a Tiny Rule-Based Linter
  • Learn Rust Smart Pointers and Interior Mutability by Building Git Commit Graph Viewer
  • Learn Rust Concurrency By Building a Thread Pool

🔖 Porting our Django backend to Rust improved the infra usage by 90%

We went from using 220 CPUs and 800 GB of ram to just 24 CPUs and 64 GB. Thus, way less money, less things to maintain.

The number of open DB connections at any point in time have improved quite a bit, from the thousands to hundreds (about ~3-5x reduction).

The good news is that we haven’t even added caching to the Rust backend yet, and query timings are already 5-10 times faster.

🔖 How to Build an Agentic RAG with RubyLLM and Rails

I run a RAG application for Italian pension and tax consultants. Users ask questions about INPS, professional pension funds, laws and regulations, and the app answers using a knowledge base of uploaded documents.

For a long time the app used the classic single-shot RAG pipeline: take the question, search the database, stuff the results into a system prompt, ask the model. It works, but it has a hard limit: the retrieval happens once, before the model has any chance to reason about the question. If the first search misses, the answer is bad and there is nothing the model can do about it.

So I rebuilt the pipeline as an agent. Now the model drives the retrieval itself: it decides what to search, reads the results, searches again with different terms, follows cross references between documents, and only then writes the answer. All in plain Ruby, with RubyLLM and Rails. No LangChain, no Python sidecar.

In this article I will show you exactly how it works, with the real code from my application. One note before we start: since the app serves Italian consultants, all the prompts, tool descriptions and user-facing strings are in Italian in the real codebase. I translated them to English here so you can follow along, but the structure is identical.
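
The difference between the single-shot pipeline and the agent loop described above can be sketched in a few lines. This is a toy Python illustration, not the RubyLLM code from the bookmarked article: the in-memory `DOCS` store, the `search` function, and the scripted `choose_next_query` stand-in for the model are all hypothetical.

```python
from typing import List, Optional

# Hypothetical knowledge base: each document may cross-reference another by id.
DOCS = {
    "circular-12": {"text": "Contribution rates are set in decree-7.", "see": "decree-7"},
    "decree-7": {"text": "The contribution rate is 24 percent.", "see": None},
}

def search(query: str) -> List[str]:
    """Toy retrieval: ids of documents whose id or text mentions the query."""
    q = query.lower()
    return [doc_id for doc_id, doc in DOCS.items()
            if q in doc_id or q in doc["text"].lower()]

def single_shot(question_terms: str) -> List[str]:
    """Classic pipeline: retrieve once, then answer from whatever came back."""
    return search(question_terms)

def choose_next_query(seen: List[str]) -> Optional[str]:
    """Stand-in for the model: follow any cross reference not yet read."""
    for doc_id in seen:
        ref = DOCS[doc_id]["see"]
        if ref and ref not in seen:
            return ref
    return None  # nothing left to look up, so the answer can be written

def agentic(question_terms: str, max_steps: int = 5) -> List[str]:
    """Agent loop: search, read, decide whether to search again, repeat."""
    seen: List[str] = []
    query: Optional[str] = question_terms
    for _ in range(max_steps):
        if query is None:
            break
        for doc_id in search(query):
            if doc_id not in seen:
                seen.append(doc_id)
        query = choose_next_query(seen)
    return seen

print(single_shot("circular"))
print(agentic("circular"))
```

In this sketch the single-shot call only ever sees the first document, while the loop follows the cross reference and also reads the document that holds the figure. In a real system a model call would replace `choose_next_query`; the `max_steps` cap is one simple way to keep such a loop bounded.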

🔖 Zooming Out: Can We Integrate IIIF and Wikimedia?

Wikimedia and GLAM institutions share a challenge. How do we make cultural heritage collections accessible at scale without sacrificing quality, provenance, sustainability, or community control? The International Image Interoperability Framework, IIIF, is now used by thousands of institutions to serve high-resolution media through open standards. Wikimedia does not currently integrate IIIF in its core architecture. Should it?

🔖 5 things to know about the Eastern Silver Spring Communities Plan

Since 2023, Montgomery Planning staff have been working on the Eastern Silver Spring Communities Plan, drafting recommendations on zoning and land use, transportation, housing, parks and the environment, economic development and urban design. The plan is expected to set a vision for the area’s future development for decades to come. The plan is bordered by Colesville Road, University Boulevard and New Hampshire Avenue and will include three future Purple Line stations, the Piney Branch Road, Long Branch and Manchester Place

🔖 What Design Can’t Do

Design is broken. Young and not-so-young designers are becoming increasingly aware of this. Many feel impotent: they were told they had the tools to make the world a better place, but instead the world takes its toll on them. Beyond a haze of hype and bold claims lies a barren land of self-doubt and impostor syndrome. Although these ‘feels’ might be the Millennial norm, design culture reinforces them. In conferences we learn that “with great power comes great responsibility” but, when it comes to real-life clients, all they ask is to “make the logo bigger.”

🔖 Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson

On our strictest tests, Gemini 3 achieved a CER of 1.67% and a WER of 4.42%. On these tests, any difference between the ground truth and test texts counts as an error. WER is thus almost always a bit more than double the CER because if a single character in a word is wrong, including leading or trailing punctuation like commas, single quotes vs double quotes, etc, the whole word is marked as an error. On this measure, Gemini 3 performs nearly 50% better than the best, fine-tuned specialized models and achieved performance comparable to an early career, professional human typist.
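The relationship between the two metrics is easier to see with a small worked example. The following is a minimal sketch of how CER and WER are commonly computed from edit distance; it is not the benchmark's own scoring code, and the sample strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical transcription with a single wrong punctuation mark.
reference = "Dear Sir, I write to you today."
hypothesis = "Dear Sir. I write to you today."
print(f"CER: {cer(reference, hypothesis):.3f}")
print(f"WER: {wer(reference, hypothesis):.3f}")
```

One wrong character out of 31 is a small CER, but it makes one word out of seven wrong, which is why a strict WER runs higher than the CER.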

🔖 FacilMap

FacilMap is a privacy-friendly, open-source versatile online map that combines different services based on OpenStreetMap. FacilMap offers the following features:

  • Show different map styles, for example maps optimized for driving, cycling, hiking or showing the topography or public transportation networks.
  • Search for places
  • Show amenities and POIs
  • Calculate a route, optionally showing the elevation profile.
  • Find out what is at a particular point on the map
  • Open geographic files, for example GPX, KML or GeoJSON files
  • Show your location on the map
  • Share a link to a particular view of the map.
  • Add FacilMap as an app to your device.
  • Change the language settings in the user preferences.
  • FacilMap is privacy-friendly and does not track you

🔖 Vector Search, Visualised

SQL makes sense. But when it breaks, you reach for EXPLAIN. Vector search offers no such comfort. Multi-thousand-dimension embeddings, approximate nearest-neighbour indexes, and quantisation tradeoffs make it hard to know what your system is doing, and harder still to diagnose when results quietly degrade. Through interactive visualisations, Simon Hearne shows what embeddings look like in high-dimensional space, what quantisation does to your recall, and how to catch retrieval failures before your agents do. You’ll leave with a sharper mental model and a diagnostic toolkit for the production problems hardest to see.

🔖 Using the Screen Capture API to record a browser window

Once again I am reminded that modern web tech is amazing, and web browsers are incredibly capable. There’s a Screen Capture API to record the screen. You can select a tab, a window, or the entire screen. The feature has limited browser support so I don’t think I’d use it in a big web app, but it’s fine for a one-off screen recording. (I wonder how browser-based video conference apps like Google Meet do screen sharing? Do they use this API, or do they use something with wider support?)

TASCAM recovery / Ed Summers

TL;DR if you have a TASCAM 788 backup and don’t know how to get the audio out of it this script might help. Also: AI tools work best when paired with expertise.


I needed to take a very personal excursion into digital preservation recently as I attempted to listen to some audio recordings my brother John had made about 20 years ago. John died recently, and is sorely missed by his friends and family.

John was a continuous source of inspiration for me, because of his many varied interests and projects. One thing he did consistently since he was a teenager was perform music as a singer-songwriter.

As my family and I went through the very difficult process of emptying his apartment, we discovered a set of recordings he had made on CD-R. Three of these CDs were clearly conceived of as albums, and easily mounted as CDDA when I popped them in my CD player.

However he also left a binder of CD-Rs, where each CD was neatly labeled with a song title and a year. All in all there are 108 of them, from the 2003-2008 time period. There is a lot of material on these CDs that is not present on the three albums. However, when I popped these in my CD player all I saw was a macOS error dialog box saying:

The disk you attached was not readable by this computer.

John’s binder of CD-Rs

At first I thought they might be damaged or corrupted. But it seemed unlikely that so many of them would be. After some asking around I got pointed to two excellent guides to working with CDs:

These guides were great, and did help me extract the raw data from the CD-R with cdrdao, but ultimately I was unable to determine what format the data was in using tools like file, Siegfried and DROID.

In a fit of desperation I spent some time in Claude Code trying to see if it could help me identify what format the data was in. Despite several forays, it kept going round in circles, burning tokens.

One of those forays led me on a wild goose chase installing an old version of macOS in order to see if an old version of Retrospect might be able to read the CDs (it didn’t).

During this time I got some excellent advice over in the Fediverse at digipres.club. One of those messages was from Ross Spencer who took a look at a sample raw CD image. He was able to spot some markers that pointed to it possibly being a backup from a TASCAM DAW, specifically a TASCAM 788 (I believe Ross was using either strings or a hex editor to look for these clues).

TASCAM 788

Unfortunately, after poking around in various user forums, I discovered that there were not really any tools for working with TASCAM 788 backups. Everyone seemed to be recommending the purchase of a TASCAM 788 and its CD Burner, since the data was in a proprietary format, and there were no emulators.

Before dropping some money on Ebay I decided to roll the dice with Claude Code again, but this time with the more specific guidance that this was likely a TASCAM 788 backup, and asking about options for recovery. If you are interested you can read the transcript for this session. The key part of the back and forth for me was:

The 2488 stores audio as raw 16-bit or 24-bit PCM at 44.1kHz in a proprietary block structure. Once you identify the byte offset where audio data starts, you can use Audacity’s “Import Raw Data” with 24-bit signed big-endian PCM, 44.1kHz, to listen and verify.

I prompted it to try to identify the offset, so I could attempt the import in Audacity. It did some work writing Python snippets and executing them for a few minutes, and then output a likely offset. The first time I read it in I only heard white noise. But after twiddling some of the import options in Audacity I saw some promising waveforms appear in the Audacity display. And when I pressed play ✨✨✨✨ instead of white noise I heard John’s guitar and voice!

Audacity screenshot of imported raw data

What appeared to be a single track turned out to be multiple tracks created with the TASCAM that had been joined together. The final segment was the completed mix.

I continued to work with Claude on a program that would identify the offset in the raw CD data, then extract a WAV file, and then extract the separate tracks, as well as the complete track. It did this by looking for gaps inside the audio. I put the program here:

https://github.com/edsu/jas-discs/blob/main/extract_tracks.py
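To give a feel for the gap-finding idea, here is a minimal sketch. It is not the code from the script linked above. It assumes the audio has already been located and is mono, 24-bit signed big-endian PCM (the format suggested in the transcript excerpt), and the `threshold` and `min_gap` values are hypothetical defaults chosen for illustration.

```python
def decode_pcm24_be(raw):
    """Decode 24-bit signed big-endian PCM bytes into a list of ints."""
    usable = len(raw) - (len(raw) % 3)  # ignore a trailing partial sample
    return [
        int.from_bytes(raw[i:i + 3], byteorder="big", signed=True)
        for i in range(0, usable, 3)
    ]


def find_segments(samples, threshold=256, min_gap=44100):
    """Return (start, end) index pairs for runs of audio that are separated
    by at least min_gap consecutive samples with magnitude <= threshold."""
    segments = []
    start = None      # index where the current segment began
    last_loud = None  # index of the most recent non-silent sample
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The silence was long enough: close the previous segment.
                segments.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments


# Synthetic data: 5 loud samples, 10 silent samples, 3 loud samples.
loud = (1000).to_bytes(3, byteorder="big", signed=True)
silent = (0).to_bytes(3, byteorder="big", signed=True)
raw = loud * 5 + silent * 10 + loud * 3
print(find_segments(decode_pcm24_be(raw), threshold=256, min_gap=4))
```

With the default `min_gap` of 44100 samples, a gap has to last about one second at 44.1kHz before it counts as a track boundary.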

Here is the guitar / vocal first track (there are a few seconds of silence at the beginning):

And here is the mix including percussion and keyboards:

These recordings are Copyright John Summers CC-BY-NC

I have since been able to find John’s TASCAM 788 at my brother Matt’s house–although it doesn’t have the SCSI external CD burner anymore. So there’s no way to read the CDs with it.

These CDs and songs are important enough to me that I want to see if the actual hardware can do a better job of preserving John’s work. So I’ve got a bid on one of the external CD-Recorder devices I found on eBay.

John clearly spent a lot of time and care taking a snapshot of these songs he used to perform in coffee shops around Bucks County Pennsylvania. I plan to release some of them on his Bandcamp, with some of his artworks as album covers. I want to share them with people who knew him, and put these songs out into the world in a way that respects his memory and creative work, while also being something that he just wasn’t focused on as an artist. For John it was the creative process itself that mattered most.

None of this will bring John back of course. He’s gone now, and at peace. But he will always be remembered by those who loved him. Look for more posts here after I’ve been able to extract these songs in total.

Why are cached input tokens cheaper with AI services? / Xe Iaso

When you see AI model pricing pages, you usually see things broken down like this:

Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price (Cache Hit) | Input Price (Cache Miss) | Output Price
deepseek-chat | 64K | - | 8K | $0.07 / 1M tokens | $0.27 / 1M tokens | $1.10 / 1M tokens
deepseek-reasoner | 64K | 32K | 8K | $0.14 / 1M tokens | $0.55 / 1M tokens | $2.19 / 1M tokens

Source: DeepSeek API Docs

If you manage to have most of your input tokens be cached, you save a huge amount, in this case $0.20 per million tokens. What does this mean though? What does caching do that makes you save so much, in some cases upwards of tens of kilodollars?
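To see how twenty cents per million tokens turns into real money, here is a rough back-of-the-envelope sketch using the deepseek-chat input prices quoted above. The 50 billion token volume is a made-up figure for illustration, not a number from any real bill.

```python
HIT_PRICE = 0.07   # USD per 1M input tokens on a cache hit (deepseek-chat)
MISS_PRICE = 0.27  # USD per 1M input tokens on a cache miss (deepseek-chat)


def input_cost(total_tokens, hit_ratio):
    """Input cost in USD when hit_ratio of the tokens are served from cache."""
    hit_tokens = total_tokens * hit_ratio
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * HIT_PRICE + miss_tokens * MISS_PRICE) / 1_000_000


tokens = 50_000_000_000  # hypothetical: 50 billion input tokens
uncached = input_cost(tokens, hit_ratio=0.0)
for ratio in (0.0, 0.5, 0.9, 1.0):
    cost = input_cost(tokens, ratio)
    print(f"hit ratio {ratio:.0%}: ${cost:,.2f} (saves ${uncached - cost:,.2f})")
```

At that volume the gap between no cache hits and all cache hits is about $10,000, and agent-style workloads that resend a long conversation on every turn accumulate input tokens quickly.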

Someone explain the cached vs not thing to me for how this is $10,000 worth of savings lol



Chimney Sweepers Local 420 FKA yburyug (@bobbby.online), June 12, 2026 at 12:39 AM

Warning

I'm gonna be totally honest, I barely understand the basic outline of the math involved here. Where possible I aim to not be completely wrong here, but I'm not going to emit something 1:1 accurate with the mathematical truth of large language models' inner workings. Bear with me.

When you call a large language model service, the request looks something like the following:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'

That messages element is the key bit. Every time you accumulate messages from the initial system prompt, initial user request, AI responses and any tool use requests/responses, you add to that array and make it grow bigger and bigger.

A good way to think about this is that sending a conversation to a large language model is like having a pair of people share a roll of paper on two different typewriters. Every time you finish your message, you send the roll of paper back to the AI model and it has to re-read through the entire conversation in order to start typing on the end with its response. As the conversation gets longer, this gets more and more expensive because the model has to recalculate its internal state all over again for every additional message.

However, large language model inference is complicated but deterministic where it counts. Given the same input tokens, the model always computes the same internal state; any randomness comes from the sampling step that picks each output token. This means that you can use a technique called key-value caching (KV caching) in order to save that intermediate state and use it for next time. Most of the time this cache is a prefix cache because that allows you to just add on more messages to the end of the request pretty easily and be fine.

Imagine something like this:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "The sky is blue because of a phenomenon..."
    },
    {
      "role": "user",
      "content": "But I am looking outside right now and it is orange!"
    }
  ]
}'

If the model has already processed the question about the sky being blue and generated the response about Rayleigh scattering, it doesn't need to process both of those messages again to answer the user's question about sunsets. In production AI model deployments you would put that generated intermediate state into the KV cache so that the model doesn't need to run twice for the same data. This saves time and effort on the side of the AI model provider, and currently model providers decide to pass that savings onto API users in the form of cheaper inference costs for cached lookups.
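Here is a toy sketch of the prefix lookup idea. It is a big simplification: real KV caches work on tokens and store attention key/value tensors, while this stand-in works on whole messages and only remembers that a prefix was seen. The `PrefixCache` class and `prefix_key` helper are hypothetical names made up for this example.

```python
import hashlib
import json


def prefix_key(messages):
    """Stable hash of a list of messages, used as the cache key."""
    blob = json.dumps(messages, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class PrefixCache:
    """Toy stand-in for a provider-side prefix cache."""

    def __init__(self):
        self._seen = set()

    def lookup(self, messages):
        """Return how many leading messages are already cached."""
        for n in range(len(messages), 0, -1):
            if prefix_key(messages[:n]) in self._seen:
                return n
        return 0

    def store(self, messages):
        """Remember every prefix of this conversation."""
        for n in range(1, len(messages) + 1):
            self._seen.add(prefix_key(messages[:n]))


cache = PrefixCache()
question = {"role": "user", "content": "why is the sky blue?"}
answer = {"role": "assistant", "content": "The sky is blue because of a phenomenon..."}
followup = {"role": "user", "content": "But I am looking outside right now and it is orange!"}

print(cache.lookup([question]))                    # nothing cached yet
cache.store([question, answer])                    # state after the first turn
print(cache.lookup([question, answer, followup]))  # only the new message is uncached

edited = {"role": "user", "content": "why is the sky blue??"}
print(cache.lookup([edited, answer, followup]))    # editing history breaks the prefix
```

The last lookup shows why changing an earlier message is costly: everything after the changed message falls out of the cached prefix.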

As you develop an application with AI in it, try to avoid changing any inference settings or previous messages between prompts. This makes your application's queries much more likely to read from the cache, making it faster, reducing the environmental impact, and saving you(r users) money.

Reading the room: What global library leadership conversations teach us / HangingTogether

This is the first installment of a three-part series on global library leadership engagement, contributed by Ellen Hartman, OCLC Leaders Council Manager. We’re grateful to Ellen for sharing her perspectives on this topic.

Proof we engaged face to face

At a recent gathering of the OCLC Leaders Council, something happened that I always hope for but never take for granted. Connections were being made: there was laughter, there were sidebar conversations over lunch and dinner, and there was a willingness to challenge each other’s ideas, honesty about what people were struggling with, and genuine curiosity about what others were doing. All of this was built on a foundation of trust that made these in-depth conversations possible.

These moments don’t happen automatically. In my experience, they take time and, often, the opportunity to meet in person. Meeting online can be very efficient, but it can feel rushed and impersonal; it’s hard to truly get to know each other through a screen. Being in the room together over the course of a few days, in a small enough group that you actually get to speak to everyone, creates a solid foundation for future opportunities to meet again, online or in person, to build on the connections, themes, and conversations that started there.

What made this gathering particularly significant was its global dimension. Library leaders do come together regularly, but often within their own region, or among peers from the same library type. Academic and public library leaders, for instance, don’t always get the opportunity to meet for in-depth conversation, even though there is much they can learn from each other. Conversations organized by library type or region have real value, of course, but there is something additional that comes with a broader perspective that is still rooted in the library ecosystem while extending beyond your usual network. Every perspective in the room adds something, regardless of what an institution has or hasn’t yet achieved. The value of these conversations comes from the range of experiences present.

OCLC Research has published work on building relationships across unit boundaries within institutions (social interoperability), as well as creating and sustaining successful multi-institutional collaborative partnerships. But what I’d like to talk about here is more fundamental—the foundation for building successful partnerships: global library leaders from a variety of backgrounds and experiences engaging with one another in the same room. Prompted by the recent Leaders Council meeting, here are some reflections on the practical realities of these conversations, intended to deepen understanding and maximize their effectiveness.

Same words, different realities

Across international leadership spaces, a remarkably consistent vocabulary tends to surface. Terms recur across sessions, regions, and formats, and their repetition signals that we are all on the same page: a reassurance that participants are engaged with the same broad challenges and moving in a broadly similar direction.

The problem is that shared language doesn’t necessarily mean a shared understanding or a shared reality. One of the things that becomes apparent, watching these conversations unfold, is how often the same word lands differently depending on who is in the room.

Take efficiency, a term that surfaces regularly in conversations about how libraries operate and plan strategically. In some contexts, efficiency encompasses decisions about workforce size and structure. In others, those decisions are shaped by employment frameworks that lead to a very different kind of conversation, shifting the focus instead toward technology, software, or finding different ways of working within existing structures. The word is the same. The need it describes, and the range of solutions available, are not. This is why you need a deeper understanding of each other’s context to find out where you are using the same words but aren’t speaking the same language.

Glimpses, not full pictures

Even with that understanding in place, international leadership conversations can only ever offer glimpses of each other’s reality rather than the full picture. You see enough of someone else’s context to recognize the challenge, but rarely enough to understand all the constraints behind it.

This matters because those constraints are often what make the difference. Take something many library leaders struggle with: making the case for their library’s value to the broader institution or community they serve (for more on this topic, see OCLC Research’s latest report!). Some leaders have, through long-term effort and considerable perseverance, managed to position the library as visibly central to their institution’s priorities and a key part of its success. For others, making that same case remains difficult. The reasons could be structural or personal: the physical or organizational distance between the library and the part of the institution that makes key decisions, the data available to demonstrate the library’s impact, or the library leader’s own position, voice, and access to the right conversations at the right time.

In international settings, what tends to surface is the success story. What is harder to showcase is the full path to that success. The years of lobbying, the hundreds of stakeholder conversations, the incremental steps that made this outcome possible. A leader who has achieved that recognition may share what they did in good faith and genuinely want to help others reach the same goal. But because the conditions that made their success possible are often invisible in how the story gets told, it can be hard for them to understand why the same challenge feels insurmountable to a peer.

The value outside of the program

International leadership meetings are often evaluated by what happens in the formal program. But some of the most valuable exchanges happen elsewhere. Recognizing that is part of understanding how these spaces work in practice.

In smaller gatherings, it’s the time outside the formal agenda where a lot of the magic happens. When a group of library leaders meet for the first time, they are still in the process of getting to know one another. This is why you can’t expect them to immediately share their biggest challenges or most acute pain points. There is a measure of trust building that happens as a gathering takes place, especially over multiple days. It’s often after the official program ends, and there is room for leaders to relax and reflect together (for example, during dinner or at the bar) that the more personal and complex topics get discussed.

That kind of conversation requires enough prior exchanges that people feel safe being a little vulnerable. Admitting that your library is struggling to secure its position, or that you haven’t found a way to make your value proposition tangible enough to institutional leadership or other stakeholders that control funding, is not something most people are willing to do in a room full of peers they’ve just met. It becomes possible when the group has had time to become something more than a collection of strangers.

This is one of the reasons smaller, sustained gatherings tend to produce a different quality of exchange than large conferences. It is also why the informal spaces within those gatherings deserve to be nurtured rather than left entirely to chance.

No neat resolutions needed

One expectation worth setting aside is that international leadership conversations should resolve into clear conclusions. They rarely do, and that is not a failure.

Conversations like these do not need to end in consensus or a neat step-by-step path forward. It’s often the process of sharing and reflecting on both differences and commonalities that provides the greatest benefit. It might be an idea you hear and want to incorporate in your own library. A perspective that’s truly new to you and makes you see a topic in a different way. Or simply the opportunity to take a subject that was discussed at surface level and deepen the conversation in future gatherings.

That is why continued engagement matters more than resolution. Understanding accumulates across multiple conversations, multiple gatherings, and sometimes multiple years. It cannot be compressed into a single meeting, however well designed. The friction and the moments of genuine surprise are part of the value. Smoothing those moments away or rushing toward consensus risks losing exactly what makes international exchange worthwhile.

Conclusion

International leadership spaces are often judged by the ideas they surface or the alignment they appear to produce. But their deeper value lies in the glimpses they offer into realities that are different from our own. Those glimpses don’t tell the full story of what other library leaders are experiencing, but taken together, they help form a better understanding of what experiences are out there.

When designed well and when opportunities for informal interactions are cultivated, global library leadership spaces create the conditions for the kinds of conversations that go deepest. Those conversations rarely happen on the agenda, but rather emerge when enough trust has been built that people are willing to be open and candid with one another. That is not something that happens automatically: it requires continued investment in bringing people together, and repeated exposure to each other’s contexts, experiences, and points of view over time. Trust is not built overnight.

The next post in this series takes a closer look at what global engagement actually involves beyond the conversation itself and why showing up, in every sense of the phrase, costs more for some than others.

The post Reading the room: What global library leadership conversations teach us appeared first on Hanging Together.

Inside Out / Ed Summers

I have a problem with RSS. Not RSS itself, RSS is great!

The problem is that I subscribe to more feeds than I can possibly read, so the unread count in FreshRSS climbs faster than I can bring it down. Some days I skim titles, declare bankruptcy, and mark everything as read. Other days I let it pile up and feel guilty.

I’ve tried using newer tools like Current, which was definitely an improvement, but still didn’t quite do it. My friend Dan has been working on a new RSS tool that works a bit like a personal newspaper, which seems like it could be extremely helpful, and I’m keeping my eye on it. But meanwhile the list of unread posts grows…

Now, I’ve been very reluctant and slow to introduce LLMs into my daily work. But even from under my rock, in a cave, down by the river, I’ve heard that LLMs are good at text summarization.

I thought maybe, just maybe, I could try using one to summarize my unread posts? It seemed like a good fit for an experiment since the impact of getting things wrong is basically zero (in theory).

I wanted to try routing my unread RSS posts through an LLM to get a daily digest. From under my rock I’d also heard about Model Context Protocol (MCP), and how it is going to change everything. So I thought it would be a good exercise in seeing how that works in practice with a tool like Claude Code. I’d use Claude Code’s MCP support to connect directly to FreshRSS and ask Claude to summarize what I’d missed. Yeah, that’s the ticket.

This is the Way?

The first thing I tried was ChrisLAS’s freshrss-mcp server, which wraps the FreshRSS GReader API and exposes it as a set of MCP tools. The idea is that you drop it into your Claude configuration and Claude can then call those tools to fetch and read your articles.

I gave it a try, and it worked! But the results were… mixed. Claude would usually fetch articles. But then it would produce a lot of diagnostic chatter alongside the actual summary: narrating its own tool calls, noting what it was about to do, explaining why it was skipping certain things, asking for permission for this and that.

And more frustratingly, it would sometimes take strange detours: executing inline Python code, and Unix tools to do things it could have done by calling the MCP tools more directly, wandering into unnecessary computation. The experience felt noisy and unpredictable, and (frankly) just a bit scary.

I started by creating some “skills” and some scripts for those skills thinking it would make things a bit more deterministic. It kinda did?

I thought maybe my problem was that the skills weren’t bundled together, so I built my own plugin: freshrss-claude. This version bundled the MCP server as a Claude Code plugin with a set of “skills”, the structured prompts to guide Claude through fetching and summarizing in a more controlled way.

It seemed better? Not needing to start the MCP server was definitely better. But ultimately it wasn’t as big an improvement as I’d hoped for. Claude still exhibited strange behaviors: writing and executing Python scripts unnecessarily, going off-script in ways that were hard to anticipate. The summaries themselves were fine when they arrived, but the path to getting them there was erratic and unpredictable.

The last straw for me was the idea of running this Rube Goldberg machine from a cron job to generate the summary for me automatically. To run it automatically I needed to grant it all kinds of permissions to ensure it ran through. This scared the shit out of me, given that I was giving it permission to run arbitrary Python programs, reach out to the web, and interact with the filesystem. Running it once or twice manually was ok. But sticking it in my crontab and forgetting about it? Forget about it. I experimented briefly with putting things in a Docker container, and Claude Cowork’s sandboxing, but then…

Turning it inside out

I stepped back and rethought the problem. The thing I’d been trying to do, have an LLM orchestrate a set of tools to accomplish a task, is one (seemingly popular) way to use an LLM. But it turns out to be kinda demented. You’re asking the model to plan, to sequence, to decide. You are asking it to be An Agent. Sure models can do this, but they are not reliable in the way a simple program is. They wander. They improvise. They sometimes decide to take a detour. Do I really benefit from this runtime model in this little RSS digest app? Nah, not really.

So the alternative, and this is the inversion that made things click for me, is to write a deterministic program that calls the LLM as a component, rather than letting the LLM drive the program as an Agent. My code fetches the articles. My code shapes the prompt. My code writes the output to a file. The LLM does exactly one thing: it reads the content I hand it and produces a summary.
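In code, the inversion looks roughly like the sketch below. This is not the actual tool's source; the function names are hypothetical, and the model call is passed in as a plain function so a fake can stand in for a real LLM client and the whole thing runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def build_prompt(articles: List[Dict[str, str]]) -> str:
    """Deterministically turn fetched articles into a single prompt."""
    lines = ["Summarize the following unread articles as a short digest.", ""]
    for n, article in enumerate(articles, start=1):
        lines.append(f"{n}. {article['title']} ({article['url']})")
        lines.append(article["content"])
        lines.append("")
    return "\n".join(lines)


def write_digest(articles, summarize: Callable[[str], str], out_path) -> str:
    """Our code sequences the steps; the model only turns text into text."""
    if not articles:
        digest = "No unread articles.\n"  # no model call needed at all
    else:
        digest = summarize(build_prompt(articles))
    Path(out_path).write_text(digest, encoding="utf-8")
    return digest


def fake_summarize(prompt: str) -> str:
    """Stand-in for a real LLM call (an assumption for this sketch)."""
    return f"Digest of a {len(prompt.splitlines())}-line prompt.\n"


# Usage: ordinary control flow, with the model as one swappable function.
articles = [
    {"title": "Example post", "url": "https://example.org/post", "content": "Some text."},
]
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "digest.md"
    write_digest(articles, fake_summarize, out_file)
    print(out_file.read_text(encoding="utf-8"))
```

Because the model never chooses which function runs next, the program needs no permission to execute arbitrary code, and everything except the summary text can be tested without a model.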

Take Two (or Three, or Four?)

I threw it all on the fire and started over by writing rss-digest instead. Well, truth be told, Claude and I wrote it. Ok, ok, mostly Claude.

It’s a small Python CLI that connects to any GReader API-compatible RSS reader (FreshRSS, Miniflux, Tiny Tiny RSS, The Old Reader), fetches your recent unread articles, and asks an LLM to produce a digest. Because it uses LiteLLM under the hood, you can point it at any compatible model: OpenAI, a local model running in LM Studio, whatever you prefer.

The output is a Markdown file (or HTML with --html). I have a cron job run it in the morning and drop a file on my desktop for me to read. Here’s an example of what it looks like.

For smaller batches (≤25 articles) it gives you a structured list. For larger ones it produces a curated prose summary grouped by theme. You can pass a custom system prompt file if you want to tune the style or grouping. You can pass --mark-read if you want it to mark everything as read afterward.

The tool is on PyPI and the code is on GitHub. I’ve just started using it, so it quite possibly has problems. The prompt that is used for doing the summarization is configurable. If you have a different take on the prompt or want to extend it, please send me a pull request so I can add it as an alternative.

So…

What I keep coming back to is the design lesson underneath all of this.

There’s real value in being thoughtful about which part of your system is deterministic and which part is probabilistic. There’s no doubt that LLMs are magical things, but an LLM is not a reliable program. It shouldn’t always be the thing making decisions about what to fetch, when to stop, or how to structure output. Hand it a well-formed input, ask it a clear question, and (hopefully) it will return something useful. Everything else, the plumbing, the sequencing, the file I/O, stays in your code that you can look at, test and run directly.

I’m not saying all programs using LLMs need to take this approach. I’m just saying maybe you don’t need MCP, Agentic AI, etc, etc all the time. Experiment with it, but don’t forget to turn it inside out when you need to.

Library of Congress Storage Architecture Meeting 2026 / David Rosenthal

Once again I attended most of the Library of Congress' Designing Storage Architectures workshop remotely. I apologize for the delay in posting this; domestic duties have kept me very busy recently. Below the fold are notes on the talks that caught my attention, based on my now somewhat hazy memory and the slide decks for the talks from the Library of Congress website.

Data Storage Trends

As usual, IBM's Georg Lauhoff provided an invaluable overview of the storage industry as of late 2025, co-authored with Sassan Shahidi. They make an important point that I have been making since at least 2018's Archival Media: Not a Good Business:
Challenges of Alternative Archival Technologies
• Alternative archival technologies face technical and economic hurdles.
This justifies their focus on flash, hard disk and tape. Their "exabytes shipped" graph shows that indeed Hard Disk Unexpectedly Not Dead; the dramatic decline in HDD's share since 2008 reversed in 2024.
The key metric for technological progress in traditional storage media is areal density:
  • Lauhoff and Shahidi's graph shows that tape, which has the easiest path because of the relatively large size of the bits, has continued its steady growth, although one could argue both that their 24% annual growth exaggerates the period since 2017, and that INSIC's projection of 28% is optimistic.
  • It is clear that HDD areal density progress slowed dramatically about 2010 to around 11% per year. But the developments Jon Trantham reported, see the next section, could lead to a significant acceleration in HDD areal density.
  • Flash has continued a steady 30% per year growth since about 2010, thanks to stacking cells vertically and storing multiple bits in them. Both of these have limits, into which the industry will eventually run.
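One way to compare these annual growth rates is to convert them into doubling times. The sketch below assumes steady compound growth at the quoted rate, which is a simplification of the real, lumpier history.

```python
import math


def doubling_time_years(annual_growth):
    """Years for areal density to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Growth rates quoted in the list above.
rates = {
    "tape (reported)": 0.24,
    "tape (INSIC projection)": 0.28,
    "HDD since about 2010": 0.11,
    "flash since about 2010": 0.30,
}
for name, rate in rates.items():
    print(f"{name}: doubles every {doubling_time_years(rate):.1f} years")
```

At 11% per year a doubling takes more than six and a half years, against roughly three years or less for tape and flash at their quoted rates.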
As regards the relative cost per TB of the three media, the big picture is that since around 2010 change has been very gradual. Tape and flash have both become cheaper relative to HDD, but the rate of change has been much lower than predicted.

Lauhoff and Shahidi conclude that:
  • Tape Storage: continues to evolve.
  • HDD: improvements slow down but recently high demand.
  • NAND: well-suited for hot storage but not for archival purposes.
  • Lack of Alternatives: Within the foreseeable future (within 10 years), there are no viable alternatives to Tape, HDD, and NAND storage.
  • AI leads to storage demands across the tiers
This last point was a theme for the entire meeting. But it is important to note that the meeting was too early to capture the full impact of AI on the cost and availability of media and systems.

Mass Capacity Storage in an AI Era

Jon Trantham of Seagate reported that after more than a quarter-century of work and 14 years after HAMR was demonstrated in the lab, Seagate has finally been shipping HAMR drives in quantity since early 2025.
He also announced that they have started to ship their 40TB HAMR drives. Their roadmap to 100TB/drive presents some significant challenges, as shown in Trantham's slide. The history of HAMR shows that Seagate can surmount major technical challenges, but it may take longer than they project.
One of Trantham's slides vividly illustrated the technology challenges the HDD industry faces, showing to scale the evolution since 1997 of the sizes of the bits on the media, the reader, and the writer. Note the 1610-fold decrease in the area of the writer, the 305-fold decrease in the area of the bit, and the 289-fold decrease in the area of the reader.

Flash for Archival Storage

Fifteen years ago, Ethan Miller, Ian Adams and I published Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. It was inspired by work at Carnegie-Mellon from 2009, FAWN: a fast array of wimpy nodes, which argued that implementing fast storage using large numbers of small nodes built from cell-phone technology could save two orders of magnitude in energy per query. We argued that it would be possible to build low cost, low energy archival storage systems using a similar approach.

Our idea was ignored, but at this meeting Ethan Miller revived the idea of using flash as an archival medium. He argues for a rack-scale system storing 500PB/rack built from 5U shelves, similar to Backblaze's, each holding 216 of Pure Storage's 300TB DFMs (direct flash modules) stacked vertically.
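As a rough check on the rack arithmetic, here is a minimal sketch. It assumes decimal units (1PB = 1000TB) and a rack built from whole shelves; the shelf count is my inference from the figures above, not something Miller stated.

```python
import math

DFM_TB = 300          # capacity of one direct flash module
DFMS_PER_SHELF = 216  # modules per 5U shelf
SHELF_HEIGHT_U = 5
TARGET_PB = 500       # quoted capacity per rack

shelf_pb = DFM_TB * DFMS_PER_SHELF / 1000     # capacity of one shelf in PB
shelves = math.ceil(TARGET_PB / shelf_pb)     # whole shelves needed to reach the target
rack_pb = shelves * shelf_pb                  # resulting raw capacity
rack_units = shelves * SHELF_HEIGHT_U         # rack space consumed
total_dfms = shelves * DFMS_PER_SHELF

print(f"{shelf_pb:.1f}PB per shelf, {shelves} shelves, {rack_units}U")
print(f"{total_dfms} DFMs, {rack_pb:.1f}PB raw per rack")
```

Eight shelves occupy 40U and hold a little over 500PB raw, which is consistent with the quoted figure.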

There are three big challenges:
  • First, if all the DFMs were actively I/O-ing the rack would draw 45KW. Supplying the rack with that much power and cooling it would be very difficult (see the design of Nvidia's racks). But, just as with Facebook's hard disk cold storage, this can be mitigated by scheduling accesses so that only a small proportion of the drives are active.
  • Second, flash cells gradually leak electrons, so must be regularly refreshed by reading and re-writing them. This task must be scheduled along with the application's reads and writes, but doing so is fairly easy since the refresh timing isn't critical.
  • Third, flash is more expensive per TB than hard disk or tape. As I have argued for a long time, in the archival storage market the time value of money makes it difficult to justify trading increased capex for decreased opex:
    • The opex savings are significant, with essentially no mechanical failures, more benign failure modes, and much higher bandwidth for erasure code recovery.
    • Miller argues that the capex isn't as bad as the cost of the media makes it look, because at 0.5EB/rack there are savings in space, power and cooling. He doesn't point out that the lower latency for read access potentially allows for the elimination of an entire warm layer of the storage hierarchy.
    But he acknowledges that AI is driving up the media cost. This is probably only relative to tape, since hard drive prices are also skyrocketing.
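To make the time-value-of-money point concrete, here is a minimal sketch of the discounting arithmetic. Every number in it (the extra capex, the annual opex saving, the service life and the discount rate) is hypothetical, chosen only for illustration; none comes from Miller's talk.

```python
def present_value(annual_saving, discount_rate, years):
    """Present value of a constant saving received at the end of each year."""
    return sum(annual_saving / (1 + discount_rate) ** t for t in range(1, years + 1))


# Hypothetical figures, in arbitrary currency units.
EXTRA_CAPEX = 100.0         # additional up-front cost of the more expensive medium
ANNUAL_OPEX_SAVING = 12.0   # running cost saved each year
YEARS = 10                  # service life
DISCOUNT_RATE = 0.05        # annual discount rate

undiscounted = ANNUAL_OPEX_SAVING * YEARS
discounted = present_value(ANNUAL_OPEX_SAVING, DISCOUNT_RATE, YEARS)

print(f"undiscounted savings: {undiscounted:.2f}")
print(f"discounted savings:   {discounted:.2f}")
print(f"savings cover the extra capex: {discounted > EXTRA_CAPEX}")
```

With these made-up numbers the savings total 120 against 100 of extra capex if discounting is ignored, but are worth only about 92.7 once discounted, so the trade no longer pays.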
Miller argues that, over time, flash costs will come down. The scope for further shrinkage of the cells, and the addition of more layers, is limited. Once those limits are reached, the fabs that manufacture flash will gradually fall behind the leading edge and become depreciated.

Although I'm naturally biassed, I think Miller's case for archival flash is worth a detailed investigation.

Avoiding the Pitfalls of Cloud Storage for AI Applications

Fourteen years ago in Cloud vs. Local Storage Costs and More on Glacier Pricing I started writing about the way the complex and somewhat opaque pricing models of cloud storage platforms made it difficult to estimate how much you would end up paying. People are just now figuring out that AI has the same problem. Neither is an accident; these pricing models serve two goals important for the platform's business model. First, the purchase decision is based on the "Low, Low" advertised price. Second, once you discover how much more you're actually paying, you face the lock-in created by egress fees. In 2019's Cloud for Preservation I wrote about how egress charges implement vendor lock-in.
David Boland of Wasabi presented a current analysis of this issue. He reports that about half of all the organizations they surveyed exceeded their budget for public cloud storage.
The budget overruns were caused by the fact that the actual spend was about double the sticker price for the storage. The culprit was fees, which by design are much harder to project.

Digital Storage Architectures for AI and ML

Will Cavin of Amazon had two important items of news:

Announcing AI for Libraries – a weekly newsletter / Artefacto

AI is one of those generational tech topics that isn’t going away soon. But the signal to noise (or hype to reality) ratio can be truly overwhelming.  There are just so many links, opinions, new resources that are getting lost in the mix. And that’s for us information and tech nerds – we can only [...]


2026-06-09: Teaching Database Concepts for Senior Undergraduate and Graduate Students at ODU / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

In the Spring 2026 semester at Old Dominion University (ODU), I taught CS 450 (Undergrad) / CS 550 (Graduate): Database Concepts. The course was fully online, with synchronous live Zoom sessions held twice a week. Attendance was not mandatory but strongly encouraged. All lectures were recorded and made available for students to access whenever needed.


Figure 1: Canvas course page for CS 450/550: Database Concepts


Through this blog post, I want to share my experience of teaching a senior-level undergraduate/graduate course for the first time, the behind-the-scenes realities of course preparation through to the end of the course, and how student feedback actively shaped the course as it progressed. 


Since the course had been taught previously by other instructors, materials were already available, which made things easier. Rather than building everything from scratch, I started by copying over the existing course structure and then carefully updating it to align with the current semester. The more time-consuming part was setting everything up, cleaning up the Canvas course, especially updating deadlines and revising the syllabus, while ensuring the topics were properly aligned with assignment deadlines. If you are instructing for the first time, it is very important to make sure you get access to the course in time, so you can set everything up without a rush.

Throughout the semester, to make the most of class time, I spent a couple of hours before each session preparing things such as reviewing material, planning examples, and thinking through how topics would connect. I tried to debug issues during the class in real time whenever possible. If something took longer than expected, I pushed it to the end of class or moved it to the office hours. It helped me to continue the flow of the topic without interruptions.


I was able to experience first hand how handling a class of 50 students without a teaching assistant (TA) was, honestly, a lot more work than I expected. Grading labs, homework, quizzes, and discussions while also preparing for lectures and responding to emails required a constant balance. I wasn’t always perfect, but I made a steady effort to stay on top of it. Grades were returned as quickly as I could manage, and emails were typically answered within 24 hours, often sooner. Again, it reinforced something I had already noticed as a student: timeliness matters. Things do not have to be instant, but when there is a clear effort to respond and follow through, it builds trust and keeps students engaged.


One of the first challenges I faced as an instructor of this course involved managing classroom dynamics. After a few classes, a student shared a concern that some well-intentioned peer engagement (jumping in to answer questions or adding explanations during lecture) was becoming distracting and making it harder to follow along. It was a fair concern, and an important one. At the same time, I didn’t want to discourage participation. Active engagement is something every instructor hopes for, and it was clear that students were eager to contribute. My challenge was to find the right balance. I responded by acknowledging the concern and assuring the student that I would make adjustments so that participation remained helpful rather than overwhelming. Before taking action, I also reached out to a mentor for advice, which helped me approach the situation more thoughtfully. I thanked students for being engaged and willing to contribute, but also clarified expectations: participation was welcome, but lectures and question answering would be primarily instructor-led, with designated moments for peer discussion. I also reflected on something I had noticed during the class introductions: students were coming from a wide range of backgrounds. Some had prior experience with databases, while others were encountering these concepts for the first time. Because of that, maintaining a consistent pace and structure was important. I believe that framing it this way helped convey the message to the students that my goal was not to limit participation but to support a better learning environment for all. There were no further concerns raised afterwards and the students remained engaged while being supportive of the entire class.

   

Midway through the semester, I conducted an anonymous check-in survey to better understand how students were experiencing the course. To encourage participation, I offered a small amount of extra credit, which resulted in a strong response rate.


Figure 2: Screenshot of the CS 450/550 mid-semester student check-in survey page 


Overall, the feedback was encouraging: most students agreed or strongly agreed that assignments were clear, the workload was manageable, and the pace was appropriate (Figure 3). But what mattered more were the written responses. They highlighted patterns that helped me see the course from the students’ perspective (full set of responses).


A few consistent concerns stood out:

  1. Some students said they weren’t always sure what to prepare before class or whether a session would lean more toward lecture or lab. That feedback pushed me to be more specific in my announcements, clearly laying out what each class would cover.

  2. Several students pointed out that while their answers were marked incorrect or partially correct, the reasoning behind it wasn’t always clear. This was a fair point, and a difficult balance when grading at scale. Still, I made a more conscious effort to leave clearer comments.  

  3. Even when students understood the concepts, many struggled to translate them into SQL queries or ER diagrams. That reinforced something I kept coming back to: the need for more in-class examples and live coding, which I continued to prioritize.

  4. Interestingly, a lot of students said the challenge wasn’t the material itself, but managing their time. A few students shared situations where missing a single assignment significantly impacted their grade. This feedback later influenced my decision to allow requests for reopening missed work.


At the same time, there were plenty of positive notes that helped confirm what was working:

  1. Students consistently appreciated the clarity of explanations and examples.

  2. The labs and live coding sessions were frequently mentioned as highlights.

  3. Many felt the course structure was organized and manageable.   

  4. Some even described it as one of the best online courses they had taken.


I also asked students a simple question: what’s one thing I should keep doing, and one thing I could do better? Here are some of the responses that stood out:


“The instructor is great. Instructions are clear, vibes are good, I would recommend this class. The homework is work intensive but not unreasonable.”


“You have been doing a great job and this has been one of the best online courses I have taken at ODU”


“very good at explaining things, even when the students dont seem to get something she fines a new way of explaining it so they get it.”


“The instructor is accommodating to students within reason and I believe that is something they should keep doing.”


“keep being a great teacher :)”


Figure 3: Summary of student responses to four questions: assignments, workload, grading, and pace

At the mid-semester point, once the grades were up to date, I started reaching out to students who had missing work or were falling behind. The intention wasn’t to penalize them, but to give them an opportunity to catch up. At the same time, I made a point of recognizing those who were consistently performing well and, to maintain fairness, gave all students the same opportunity to request to make up any missed assignments. Many students responded well to that nudge.


One practice I intentionally carried forward from my own experience as a student was leaving comments on graded work, not just when points were deducted, but also to acknowledge strong submissions. It is a small effort from my end, but it helps students feel seen and motivates them to keep improving. As a student, those were the moments we looked forward to, knowing the instructor noticed good work.


As the semester came to an end, the focus shifted to final evaluations, especially grading the course projects and submitting final grades to the university. One thing I did not fully anticipate during this phase was the time needed to carefully evaluate student projects. Each submission reflected a significant amount of effort, and I wanted to give them the attention they deserved. As a result, grading ran later than I had initially expected, although it was still well within the official deadline. 


Teaching this course taught me some important things. Good teaching is not about getting everything perfect; it’s a way to strengthen your own knowledge while sharing that knowledge in a way others can truly grasp. It is also about being responsive, thinking about what’s working and what isn’t, and being willing to adjust along the way. Managing a full class without a TA was basically a one-person band situation (except I was the entire percussion section, keeping tempo, fixing the rhythm mid-performance, and still trying not to miss a beat while everyone else expected a flawless show). But throughout the semester, I focused on doing the best I could and continuously improving based on student input. Overall, this experience was incredibly rewarding and reaffirmed my plan to pursue a career in academia.


Acknowledgements


I sincerely thank my advisors, Dr. Michele C. Weigle, Dr. Michael L. Nelson, and Associate Professor & Assistant Chair of the Department of Computer Science, Dr. Steven J. Zeil for providing me with this invaluable opportunity to gain teaching experience as a PhD student. I am also grateful to my advisors and my colleague Dr. Bhanuka Mahanama, for always being available to answer questions. Special thanks to Dr. Santosh Nukavarapu for his mentorship throughout the semester and Syed R. Rizvi for providing the course slides. Credit for establishing and continuously refining this structure should go to the instructors who have taught the course over the years, including but not limited to Drs. Irwin Levinstein, Jian Wu, Vikas Ashok, Syed Rizvi, and Santosh Nukavarapu.


And finally, a very special thank you to my husband, Skanda Siva, for being endlessly flexible with his schedule and for his constant support, and to Yara Siva, who may not know it yet but was my tiniest companion through it all.

~ Himarsha Jayanetti (HimarshaJ)

"No way to prevent this" say users of only language where this regularly happens / Xe Iaso

In the hours following the release of CVE-2026-45447 for the project OpenSSL, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a heap use-after-free in PKCS7_verify(). This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Prof. Fabian Greenholt, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

Giving your Go apps Tigris superpowers / Xe Iaso

Tigris is S3-compatible, which means you can point the AWS SDK at it and most things just work. The catch is that the Tigris-exclusive features—bucket forking, snapshots, object renaming, and the like—need verbose workarounds because the AWS SDK doesn't know they exist.

So we wrote a Go SDK that does. It comes in two flavors: the storage package is a drop-in replacement for the standard S3 client with first-class methods for the Tigris-specific operations, and simplestorage is a higher-level client for the common single-bucket case that infers its configuration from the environment so you stop passing the same parameters over and over. You can adopt the Tigris features incrementally without refactoring your existing S3 code, and the simpler API still works against other S3-compatible providers.

I wrote up how it works and why we built it over on the Tigris blog.

WARCbench: A Swiss Army Knife for WARC Processing / Harvard Library Innovation Lab

The Perma team is excited to announce WARCbench, an open-source tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.

WARCbench builds on over a decade of experience gained from developing Perma.cc. Over that time, we’ve accumulated a collection of scripts, utilities, debugging workflows, and one-off experiments for dealing with web archives. WARCbench brings together those processes into a simple command-line tool that helps web archivists make sense of the wild, occasionally malformed, and deeply heterogeneous web archives they encounter in practice.

WARCbench was designed to make as few assumptions as possible about your familiarity with web archives, the kind of WARC you are working with, or what you want to do with it. It is intentionally a command-line tool. You can use it to explore and work with WARC files even without deep prior knowledge of the format, though it does assume you’re comfortable using a terminal and open to a bit of experimentation. The goal is not to hide the complexity of web archives. It is to make that complexity easier to inspect, manipulate, and learn from so you can experiment and iterate.

While many existing WARC tools are optimized for specific production workflows, the exploratory, in-the-moment WARC wrangling and debugging work archivists and developers often need to do benefits from different design choices. Sometimes you need to inspect a malformed or misbehaving WARC. Sometimes you need hooks and custom callbacks for an experiment. Sometimes you need to optimize for speed, memory, or convenience. Sometimes you just need to look and see what is there before deciding what to do next. WARCbench was designed for those moments.

We don’t know all the ways researchers or web archivists might use WARCbench, but we hope it becomes a versatile Swiss Army knife that others will find valuable to keep in their toolkit too.

Links

Slide Deck from IIPC Web Archiving Conference Presentation on April 21

 

Thanks and acknowledgments

We would like to thank our colleagues Chris Setzer and Ben Steinberg for their help and support in developing this tool.

WARCbench logo by Jacob Rhoades.

Welcome Ann McCranie, Director of Research Insights / HangingTogether

We’re delighted to welcome Ann McCranie, PhD, who joins OCLC on June 8 as our new Director of Research Insights.

Ann joins OCLC at an important moment as OCLC Research advances Research Reimagined, a strategic effort to strengthen the relevance, visibility, and impact of our research for library leaders and their institutions. In her role, Ann will help connect research priorities to practical insights that support decision‑making across a rapidly changing library and higher education landscape. She will lead a team of research scientists and engineers focused on advancing the Research Reimagined strategy.

Portrait photo of Ann McCranie

Ann brings more than a decade of experience leading research programs in higher education, with expertise in mixed methods research, research operations, and research communication. Most recently, she held senior leadership roles at Indiana University.

Her work has focused on building durable research services, guiding cross-functional teams, and helping researchers and administrators navigate change. Throughout her career, she has paired rigorous analysis with practical application.

Ann also brings a perspective shaped by close collaboration with researchers, research administrators, and campus leaders beyond the library. That experience informs how she thinks about the evolving roles of libraries as institutions respond to changes in technology, AI-informed scholarly workflows, and research infrastructure. This perspective will be especially valuable as OCLC Research continues exploring future-focused questions facing libraries and higher education.

To help introduce Ann to the community, we asked her a few informal questions.

What drew you to this role at OCLC?

What attracted me was the opportunity to help connect research to the decisions library leaders are making today, while also contributing to longer-term thinking about where libraries are headed. In higher education, I’ve worked with researchers, administrators, and institutional leaders who rely on strong evidence and practical insights to guide strategy, services, and priorities.

I was especially excited by the chance to bring that experience to OCLC and support work that can have both immediate and lasting value for libraries. Research is most meaningful to me when it helps people navigate change, make informed decisions, and think differently about what comes next.

How do you think about “research insights”?

I tend to think about research insights through the lens of impact. I once asked a doctor about a medical test, and she explained that she would not order it because the result would not change the treatment plan. At first, I was a little disappointed because I was genuinely curious, but that idea stayed with me.

It became a useful way for me to think about research. I’m always asking whether the work can help inform decisions, shape action, or open up new possibilities. If the findings don’t create an opportunity to do something differently, it’s worth asking how we can make the research more purposeful and useful.

What are you most looking forward to as you get started?

I’m really looking forward to getting to know my team and connecting with colleagues across OCLC to understand their work, priorities, and how Research Insights can support them.

As a social networks scholar, I’ve always been interested in the connections between people and how relationships help ideas spread and grow. So much innovation comes from those informal networks, whether that is among coworkers, library partners, or the broader community. I’m excited to learn from those connections and help build on the momentum already underway with Research Reimagined.

Ann will be attending ALA at the end of June, and we look forward to introducing her to many of you there. Until then, please join us in welcoming her to OCLC Research.

The post Welcome Ann McCranie, Director of Research Insights appeared first on Hanging Together.

Systems Life: Lost (and Found) in Log Data / Library | Ruth Kitchin Tillman

This post is part of a series in which I write about experiences or specific challenges from my day-to-day work. I’m hoping that these will be interesting for other librarians that work in entirely different areas, for my colleagues who are solving different problems on different systems (or maybe eventually the same one after we migrate), and for those who are thinking about doing this kind of work in the future.

Building from navigating the distributed database, I want to get more deeply into what cross-system problem solving can look like. To re-set the stage (but for more details about these tools, check the previous post), transaction history of items is only available for most users via our Analytics tool.

Transaction Histories

Transaction history represents the ways an item has traveled: checkouts, but also transits and receipts. This is one of the many transactions created while my request for the Alien: Romulus DVD was filled. In this transaction, a coworker at York (I’ve redacted any details, but the user ID is in the actual log) sets the item to transit for reason “HOLD” to “UP-PAT”:

Trans Hist Datetime | Trans Hist Workstation | Trans Hist Command Desc | Trans Hist Data Code Desc | Trans Hist Data Value
2025-08-27 12:10:48 | 0173 | Transit Item | call number | POPULAR
2025-08-27 12:10:48 | 0173 | Transit Item | copy number | 1
2025-08-27 12:10:48 | 0173 | Transit Item | item ID | 000080622957
2025-08-27 12:10:48 | 0173 | Transit Item | Max length of transaction response | 3000000
2025-08-27 12:10:48 | 0173 | Transit Item | station library | UP-PAT
2025-08-27 12:10:48 | 0173 | Transit Item | station login clearance | NONE
2025-08-27 12:10:48 | 0173 | Transit Item | station login user access | REDACTED
2025-08-27 12:10:48 | 0173 | Transit Item | station user’s user ID | REDACTED
2025-08-27 12:10:48 | 0173 | Transit Item | transit from | UP-PAT
2025-08-27 12:10:48 | 0173 | Transit Item | transit reason | HOLD
2025-08-27 12:10:48 | 0173 | Transit Item | transit to | UP-ANNEX

This is the Analytics export, which I transformed from a CSV into a table for readability in this post.

Unfortunately, even though the underlying Symphony database has unique item keys for records, Analytics seems to use the barcode as the primary key of an item table, not just the primary way to find an item record. An item’s transaction history is completely wiped from Analytics if someone changes the barcode. And sometimes, barcodes change. In our case, we change barcodes on everything that’s permanently shifted to the annex (see my post on macros). We also have barcodes wear out or fall off. So we have hundreds of thousands of items whose histories were lost, at least from Analytics.

These lost records came to a head when our Collection Maintenance team needed to be able to track large sets of items being moved to the Annex. Once the items arrived, their barcodes would be replaced with an Annex barcode, which serves a different function. So one could follow a set of barcodes on their journey until “poof,” every record related to them vanished. On the one hand, one could assume the item had been processed by the Annex since it had now disappeared. But it made tracking uneven and meant collections maintenance couldn’t tell what route an item had taken to get there or how long it’d taken.

First, I’ll note that our systems work is also quite distributed. While I was working with our collections maintenance data expert on getting access to older data, the Symphony admins were configuring item extended information to include an original barcode field, which is now populated when a barcode updates. They’ve also done some work hunting down barcode changes to update the original barcode fields. These will be exportable, even though they won’t be searchable the same way in our Analytics. Systems takes a village.

Where the Data Still Lives

Getting back to the problem-solving, this data can still be found through the oldest method of ILS data access: Workflows reports.

By running a Scan History Logs report against a set of barcodes, we can export every log in which that barcode shows up. This data wasn’t nearly as easy to use as an Analytics or Data Control export. It’s exported in a text file and uses opaque datacodes.1 Here are two example log entries from a barcode change (the actual user’s ID has been replaced with REDACTED):

2/4/2025,10:23:04 Station: 0265 Request: Sequence #: 59 Command: Edit Item Part B
$<datacode_FF>:REDACTED  $<datacode_FE>:UP-ANNEX  $<datacode_Fc>:NONE  $<datacode_FW>:REDACTED  $<datacode_NQ>:000009387393  $<datacode_IQ>:DD3.M825M66 1976  $<datacode_NX>:A2  $<datacode_NY>:2046  $<datacode_0A>:0902  $<datacode_0B>:015  $<datacode_IN>:CATO-PARK  $<datacode_NR>:20460902015  0y:Y  $<datacode_Fv>:3000000  
 
2/4/2025,10:23:04 Station: 0265 Request: Sequence #: 71 Command: Edit Item Part B
$<datacode_FF>:REDACTED  $<datacode_FE>:UP-ANNEX  $<datacode_Fc>:NONE  $<datacode_FW>:REDACTED  $<datacode_NQ>:20460902015  $<datacode_IQ>:DD3.M825M66 1976  $<datacode_Io>:USERID  $<datacode_Fv>:3000000

That top entry is really important because, even though there are other ways of accessing a permanent item ID, it’s not in the logs. So by scanning for that original barcode, we can get the entry where the barcode is in NQ and the new barcode is in NR.

I wrote a Python script that processed entire log entries, since the colleague from Collection Maintenance wasn’t just looking for old/new barcodes but for the transaction histories of that item. He could apply a date range to the log export itself, so he could set it just to export the last few months. I’m not going to share the entire script here, but this is the overall approach that I used:

current = re.search(r'\<datacode_NQ\>:(.+?) ',line)

I wrote a conditional function for all the entries which might not be present. For the handful whose data might contain a space, I wrote the search to break at the $<datacode that begins the next entry and trimmed space off the right side.
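The approach described above might be sketched as follows. This is not the actual script: the function names, the header regex, and the mapping from datacodes to output keys are my assumptions, inferred from the sample log entries and the condensed JSON in this post.

```python
import json
import re

# Header line: date,time Station: NNNN Request: Sequence #: N Command: text
HEADER = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4}),(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"Station:\s*(?P<station>\S+)\s+Request:\s*Sequence #:\s*(?P<sequence>\d+)\s+"
    r"Command:\s*(?P<command>.+?)\s*$"
)

# Datacode -> output key. Assumed mapping, inferred from the samples in this post.
FIELDS = {
    "FF": "station_user_login",
    "FE": "station_library",
    "FW": "station_user_ID",
    "NQ": "current_barcode",
    "NR": "new_barcode",
    "IQ": "call",
}

# Datacodes whose values may contain a space (call numbers, for example).
SPACED = {"IQ"}


def field(code, data_line):
    """Return the value stored under one datacode, or None if it is absent."""
    tag = r"\$<datacode_" + re.escape(code) + ">:"
    if code in SPACED:
        # Read up to the next $<datacode tag (or end of line), then trim the right side.
        pattern = tag + r"(.*?)(?=\$<datacode_|$)"
    else:
        # Simple values end at the first whitespace.
        pattern = tag + r"(\S+)"
    match = re.search(pattern, data_line)
    return match.group(1).rstrip() if match else None


def parse_entry(header_line, data_line):
    """Combine a header line and its datacode line into one dictionary."""
    header = HEADER.match(header_line.strip())
    if header is None:
        raise ValueError(f"unrecognized header line: {header_line!r}")
    entry = {key: header.group(key) for key in ("date", "time", "sequence", "command")}
    for code, key in FIELDS.items():
        value = field(code, data_line)
        if value is not None:  # skip datacodes that do not appear in this entry
            entry[key] = value
    return entry


# The first sample entry from above.
sample_header = "2/4/2025,10:23:04 Station: 0265 Request: Sequence #: 59 Command: Edit Item Part B"
sample_data = (
    "$<datacode_FF>:REDACTED  $<datacode_FE>:UP-ANNEX  $<datacode_Fc>:NONE  "
    "$<datacode_FW>:REDACTED  $<datacode_NQ>:000009387393  "
    "$<datacode_IQ>:DD3.M825M66 1976  $<datacode_NX>:A2  "
    "$<datacode_NR>:20460902015  0y:Y  $<datacode_Fv>:3000000  "
)

entry = parse_entry(sample_header, sample_data)
print(json.dumps(entry, indent=4))
```

Handling each datacode through one helper means an entry that lacks a code, such as a log line with no NR, simply omits that key instead of raising an error.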

The output is a very large JSON object, which I’ve condensed below to reflect the key fields from this transaction. Even with its size, it’s a lot more compact and efficient than the Analytics output shared above and thus might be easier to process, so this script may end up being useful in other contexts.

    {
        "date": "2/4/2025",
        "time": "10:23:04",
        "sequence": "59",
        "command": "Edit Item Part B",
        "station_user_login": "REDACTED",
        "station_library": "UP-ANNEX",
        "station_user_ID": "REDACTED",
        "current_barcode": "000009387393",
        "new_barcode": "20460902015",
        "call": "DD3.M825M66 1976"
    },

Going forward (to migration) this new process should meet the use cases of:

  1. connecting old and new barcodes in collection maintenance logs,
  2. tracking item histories that had been dropped from Analytics, and
  3. creating a friendly JSON object vs. the same entry spread across a dozen lines or more of a CSV, making it of potential use for reporting on new barcodes as well.

So in sum:

  1. Sysadmins created a new field for original barcodes and set it to populate when barcodes are changed.
  2. Sysadmins began hunting through logs to find barcode changes and wrote a script to populate them in the database for export/reference.
  3. I created a way to extract JSON objects for item transaction history out of log reports run on old barcodes since those transactions were no longer accessible in Analytics.

  1. There are also ways to export a formatted log which is human readable, but those logs are much harder to turn into data structures. ↩︎

Weekly Bookmarks / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Graph database-ball! Exploring the Game with the graph capabilities of LadybugDB, DuckDB and PostgreSQL

This article presents a comparison of graph capabilities in three different databases: DuckDB (v1.4.4 with duckpgq), LadybugDB (0.16.1), and PostgreSQL (19devel). We will load a large volume of records (5,635,972 rows of baseball data covering people, parks, team records, and game play-by-plays) into each database, define the entities and relationships, and write a variety of queries that take full advantage of the graph structure.

🔖 Ambient Church

Ambient Church transforms architecturally stunning spaces into immersive audio-visual environments. Our events feature pioneering artists presenting vibrant works in a context that elevates both the music and the space.

Founded in Brooklyn in 2016, we facilitate collective peak experiences through the soundscapes of modern contemplative music. With an emphasis on education and environment, we seek to illuminate an underacknowledged lineage of sonic exploration.

🔖 Language models transmit behavioural traits through hidden signals in data

Large language models (LLMs) are increasingly used to generate data to train improved models, but it remains unclear what properties are transmitted in this model distillation. Here we show that distillation can lead to subliminal learning—the transmission of behavioural traits through semantically unrelated data. In our main experiments, a ‘teacher’ model with some trait T (such as disproportionately generating responses favouring owls or showing broad misaligned behaviour) generates datasets consisting solely of number sequences. Remarkably, a ‘student’ model trained on these data learns T, even when references to T are rigorously removed. More realistically, we observe the same effect when the teacher generates math reasoning traces or code. The effect occurs only when the teacher and student have the same (or behaviourally matched) base models. To help explain this, we prove a theoretical result showing that subliminal learning arises in neural networks under broad conditions and demonstrate it in a simple multilayer perceptron (MLP) classifier. As artificial intelligence systems are increasingly trained on the outputs of one another, they may inherit properties not visible in the data. Safety evaluations may therefore need to examine not just behaviour, but the origins of models and training data and the processes used to create them.

🔖 The Archive in Art Art in the Archive

In this essay we will attempt to look at both the archive of art as well as the archive as art. When we draw a distinction between those materials that we treat as documents with a ‘factual’ historical significance (those which offer themselves in the service of scholarship), and the uses which artists make of the archive as one of the media of expression that intersect with their documentary value, we ask ourselves: which theories about the archive’s nature and function are applicable to Syrian art? What are the roles adopted by ‘the document’ and ‘the archivist’? To what extent do these roles alternate and intersect?

🔖 Cave of Forgotten Dreams

Cave of Forgotten Dreams is a 2010 3D documentary film by Werner Herzog about the Chauvet Cave in Southern France, which contains some of the oldest human-painted images yet discovered—some of them were crafted around 32,000 years ago. It consists of footage from inside the cave, as well as of the nearby Pont d’Arc natural bridge, alongside interviews with various scientists and historians. The film premiered on 13 September 2010 at the Toronto International Film Festival.

🔖 Starbucks ditches AI inventory system after just 9 months

Starbucks is saying goodbye to its artificial intelligence inventory management system about nine months after its debut, Reuters reported Thursday. The tool, which used computer vision to track some parts of the chain’s inventory, was announced in September as a method to simplify inventory record-keeping and prevent stockouts.

🔖 FediRoster

FediRoster is a slightly more heavyweight alternative to David Adler’s Sociologists on Mastodon software. It is intended to function as a public list of Mastodon and other fediverse accounts, geared primarily towards academic communities, but suitable for others as well. It offers functions for following listed accounts individually or in bulk. The main novelty here is that you can add yourself to the list through an authentication process instead of all the work falling on a list maintainer. You can sign in through your Mastodon account or send a message to the list’s bot to verify your account ownership. This also means that the hosting process for new lists is a bit more involved (it’s a Python/WSGI application).

🔖 Rust for Python Programmers: Complete Training Guide

A comprehensive guide to learning Rust for developers with Python experience. This guide covers everything from basic syntax to advanced patterns, focusing on the conceptual shifts required when moving from a dynamically-typed, garbage-collected language to a statically-typed systems language with compile-time memory safety.

🔖 CS336: Language Modeling from Scratch

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general purpose system address a range of downstream tasks. As the field of artificial intelligence (AI), machine learning (ML), and NLP continues to grow, possessing a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to provide students with a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

🔖 wasteback-machine

Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to enable retrospective, quantitative web research.

🔖 No, Artificial Intelligence Is Not Conscious

The primary difference between deepfake photos and LLM conversations is that the people who generate the former are deliberately trying to fool others, and many of the people who elicit the latter from LLMs have inadvertently fooled themselves.

🔖 A Visual Guide to Gemma 4 12B

The removal of the encoders, which are typically in charge of making sense of the multimodal inputs, places the burden of making sense of all outputs on the LLM. Although the model is encoder-free, all modalities are now unified within the LLM. Instead of the model having to wait for the encoders to finish processing the audio and image inputs, the LLM can get started earlier processing the input and generating output!

In this guide, I want to showcase what it took to remove the vision and audio encoders and replace them with something much faster. The result, a 12B model that can handle audio and image inputs but without the need for encoders.

🔖 LiteRT-LM Python API

The Python API of LiteRT-LM for Linux, macOS and Windows. Features like multi-modality, tools use, and GPU and NPU acceleration are supported.

🔖 LiteRT-LM

LiteRT-LM is Google’s production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices

🔖 Google AI Edge Gallery

AI Edge Gallery is the premier destination for running the world’s most powerful open-source Large Language Models (LLMs) on your mobile device. Experience high-performance Generative AI directly on your hardware—fully offline, private, and lightning-fast.

🔖 Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B and our more advanced 26B Mixture of Experts (MoE), Gemma 4 12B packages powerful capabilities inside a reduced memory footprint. It is also our first mid-sized model to feature native audio inputs

🔖 Solid State Books

Solid State Books is a full-service, Black-owned general interest bookstore with a great selection of fiction & non-fiction titles. We stock literary gifts, stationery, greeting cards & puzzles for all ages. We have a carpeted, playful children’s books area in both stores for kids & parents alike to spread out & read together. Come by for weekly children’s story hours, catch monthly book groups, author readings/signings, local interest panels, political conversations & more!

🔖 minisearch

MiniSearch is a tiny but powerful in-memory fulltext search engine written in JavaScript. It is respectful of resources, and it can comfortably run both in Node and in the browser.

🔖 Documents the Department of Justice tried to disappear.

In May 2026, the Justice Department began systematically removing material from its web sites regarding the many indictments and convictions related to the Jan. 6 attack on the U.S. Capitol. This archive reconstructs the vast bulk of those thousands of deleted records.

🔖 The Justice Department Erases History; Lawfare Restores It

Last week, the Justice Department began systematically removing material from its web sites regarding the many indictments and convictions related to the Jan. 6 attack on the U.S. Capitol.

The operation started without fanfare or formal announcement and proceeded largely unnoticed. Until, that is, journalists such as the Washington Post’s Meryl Kornfield took notice of certain press releases and other materials that had conspicuously disappeared from www.justice.gov.

“The Trump admin is quietly deleting info about the Capitol attack from the DOJ website as it prepares to give funds to J6ers,” Kornfield posted. “This week, DOJ deleted a press release about one man with an ongoing child solicitation case who came to the Capitol with bear spray.”

Then, with typical bombast, the Justice Department responded by taking issue with one particular aspect of Kornfield’s characterization. “Nothing ‘quiet’ about it,” the DOJ Rapid Response account replied. “We are proud to reverse the DOJ’s weaponization under the Biden administration. We will do everything in our power to make whole those who were persecuted for political purposes. This includes stripping DOJ’s website of partisan propaganda.”

We are not erasing history quietly, the Justice Department seemed to suggest. We are erasing history loudly and proudly.

At Lawfare, we have restored the vast bulk of what was deleted. We have also started to preemptively archive a raft of material that has not yet been deleted but probably will be, given its thematic relationship to the material that was 86ed.

🔖 Data Center Policy Database

Data centers are the physical facilities that power cloud services, AI systems, streaming, and nearly every digital platform people use each day. As demand for artificial intelligence accelerates, data centers are becoming major sources of electricity demand and local infrastructure pressure, which means their growth affects energy systems, communities, and long-term public planning.

🔖 The State of Data Center Policy in the United States

The regulatory landscape for data centers in the United States has shifted dramatically in recent years from a period of aggressive economic incentives to a phase of intense scrutiny, restriction, and community-led resistance. To track these legislative changes, the DIGS Lab at the University of Virginia reviewed more than 700 federal, state, and local policies related to data centers. The data center policy database aims to bring transparency around zoning, permitting, and regulating data centers and their impacts on communities. This is what we found.

🔖 R.E.M. Live - 1981-11-07 Viceroy Park, Charlotte, North Carolina

Another early R.E.M. set, from the same state but a different city as the previous show. Pretty much the same library of songs, but this one’s the superior show to get - it sounds slightly nicer and doesn’t have the equipment failures of the previous show. There’s already a source on here, but that’s a different master of the same recording.

🔖 itsjunetime / tdf

A terminal-based PDF viewer.

Designed to be performant, very responsive, and work well with even very large PDFs. Built with ratatui.

🔖 ratatui

Ratatui (ˌræ.təˈtu.i) is a Rust crate for cooking up terminal user interfaces (TUIs). It provides a simple and flexible way to create text-based user interfaces in the terminal, which can be used for command-line applications, dashboards, and other interactive console programs.

IPv6 zones in URLs are a mistake / Xe Iaso

IPv6 is weird. One of the more strange parts of the standard is that every interface's link local addresses are in fe80::whatever. If you have a machine with two network interfaces, both of them will be in fe80::, so if you have a packet destined to fe80::4, how do you disambiguate it?

The answer is you use IPv6 scopes/zones. The exact format of what goes into a zone is OS dependent, but on Linux it's the interface name and on Windows it's the interface ID. This lets the kernel's routing table know how to handle an address range conflict.

On my tower, this would be represented like this:

fe80::4%eth0
        

Where eth0 is the name of my tower's ethernet device.

When you create a host:port bindhost, you normally separate the hostname and port with a colon. IPv6 uses colons to separate hex groups. In order to disambiguate what's the host and what's the port, you typically format the IPv6 address in square brackets, so fe80::4 on port 80 would look like this:

[fe80::4]:80
        

And with the right scope it looks like this:

[fe80::4%eth0]:80
        

Now let's get URL encoding into the mix. From high orbit, you can imagine a URL's format as being something like this:

<scheme>:[//][<username>[:<password>]@][<hostname>][:<port>][/<path>][?<query>][#<fragment>]
        

An IPv6 zone would then be part of the hostname, just like with that fe80::4 port 80 example from earlier. So you'd think the URL would be something like this:

http://[fe80::4%eth0]:80
        

But if you try to parse this as a URL in Go, you get an error:

package main

import "net/url"

func main() {
	if _, err := url.Parse("http://[fe80::4%eth0]:80"); err != nil {
		panic(err)
	}
}

Yields:

panic: parse "http://[fe80::4%eth0]:80": invalid URL escape "%et"
        

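This happens because URLs can't represent every character directly, so any value that doesn't fit into the grammar of a URL has to be percent-encoded as a % followed by two hexadecimal digits. This is why sometimes you'll see a %20 in URLs in the wild; that's the encoded ASCII space character, which is invalid in URLs. The parser therefore treats the % in the zone as the start of an escape, and since "et" is not a valid pair of hexadecimal digits, parsing fails.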

In order to work around this, you need to percent-encode the percent sign in the IPv6 zone:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("http://[fe80::4%25eth0]:80")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Hostname())
}

Yields:

fe80::4%eth0
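
The same workaround can be sketched outside of Go. The following is a minimal Python illustration (not how Anubis or net/url does it) that only manipulates strings; the helper names `zoned_host_for_url` and `zoned_host_from_url` are made up for this example.

```python
from urllib.parse import quote, unquote


def zoned_host_for_url(address, zone):
    """Format an IPv6 address plus zone as a bracketed URL host.

    The literal "%" separator is written as "%25", and the zone itself is
    percent-encoded so that only URL-safe characters remain.
    """
    return "[" + address + "%25" + quote(zone, safe="") + "]"


def zoned_host_from_url(host):
    """Undo the encoding: strip the brackets and decode percent escapes."""
    return unquote(host.strip("[]"))


host = zoned_host_for_url("fe80::4", "eth0")
url = "http://" + host + ":80"
print(url)
print(zoned_host_from_url(host))
```

Keeping the encoding step in one helper means the awkward %25 only has to be remembered in one place, rather than by whoever types the URL.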
        

In theory, there is guidance for how to properly handle IPv6 zones in user interfaces in RFC 9844, but there's no such guidance for URLs. Go also does not seem to follow this RFC in net/url.

Cadey

EDIT: It seems that this behaviour is compliant with RFC 6874 and that this is in fact how it is meant to be done.

IP-literal = "[" ( IPv6address / IPv6addrz / IPvFuture  ) "]"

ZoneID = 1*( unreserved / pct-encoded )

IPv6addrz = IPv6address "%25" ZoneID

Our industry confounds me.

So in the meantime, in order for Anubis to point to IPv6 zoned addresses, you need to encode the % with percent encoding. This is horrible, but it seems that this is an edge case that applies to other frameworks, programming languages, and libraries.

Maybe some day in the future there will be a better option here. In the meantime my policy of not forking the Go standard library means that this somewhat terrible UX for an edge case is acceptable. I hate it, but what can you do?

TL;DR: computers were a mistake.

Sustaining Open-Source Web Archiving Infrastructure: Takeaways from IIPC Conference 2026 / Harvard Library Innovation Lab

The Perma team recently attended the International Internet Preservation Consortium’s (IIPC) Web Archiving Conference, held this year at the KBR—Royal Library of Belgium in Brussels. A recurring theme was that web archiving depends on collective stewardship of the open-source tools, institutions, and people that make preservation possible. At a moment when the web is becoming more difficult to archive, the conference offered an assessment of current challenges and a reminder that the sustainability of the field relies heavily on collaboration and shared responsibility.

The opening keynote panel—“Sustainability for Open Source Web Archiving Tools”—brought together perspectives from libraries, consortia, and open source service providers: Lauren Ko (University of North Texas Libraries), Tessa Walsh (Webrecorder), Neil Jefferies (Open Preservation Foundation), Yves Maurer (National Library of Luxembourg), and LIL’s very own Clare Stanton (Perma.cc). The conversation focused on the structural pressures now reshaping the digital landscape, and what collective stewardship might realistically look like. Key takeaways from this conversation are outlined below.

Five keynote speakers sitting in front of a slide in an auditorium describing the Library Innovation Lab and Perma.cc

Clare Stanton (center) discusses Perma.cc during the opening keynote.

Need for sustained investment in open-source software

The web archiving community no longer has the luxury of treating tool and infrastructure maintenance as someone else’s problem. Nearly every institution in the room relies on these open-source tools, including Perma itself. For example, the replay functionality for Perma.cc is built on replayweb.page, part of the software suite developed by our long-time collaborators at Webrecorder. Despite almost everyone using these open-source tools, almost no one is funding them proportionally. Historically, many projects survived on grants and foundation support, but that funding landscape is shrinking. Yves framed open-source work as a shared mission and responsibility, especially for national libraries and cultural heritage institutions whose mandates depend on long-term stewardship. Institutions should be contributing back to the web archiving ecosystem they depend on.

An asymmetric fight against a complex and closing web

Web archiving has become more difficult in the past few years, and the scale and pace of change are only accelerating. Tessa described the current environment as an “asymmetric fight” because bot detection and anti-scraping systems increasingly treat archiving crawlers the same way they treat commercial scrapers. Several panelists pointed to the collateral damage caused by large-scale scraping and large language model (LLM) training. Infrastructure providers are tightening access controls across the web, often in ways that make legitimate archival crawling significantly harder. Tessa noted that archivists now need to spend more time simply observing crawls to determine whether captures succeeded or whether crawlers archived nothing but bot verification pages. Clare suggested that the closing web may create an opportunity for archiving institutions to advocate collectively for differentiated treatment, making the case to infrastructure companies like Cloudflare that preservation work serves a fundamentally different purpose from commercial scraping.

Beyond single maintainers: Sustaining people, not just code

The panelists repeatedly returned to governance and community structure as equally important to technical capability, and also discussed the human labor behind open source tooling. Multiple panelists emphasized that storage and compute are not the primary costs in web archiving operations. The expensive part is retaining highly skilled people capable of adapting tools to a rapidly changing web environment. Neil argued that sustainability problems become especially acute when projects depend too heavily on single maintainers. The goal, Neil suggested, is not to remove human dependency, but to move from person-dependent systems to people-dependent systems, with succession planning, multiple technical leads, and stronger organizational support structures.

Digital preservation as collective responsibility

There was some cautious optimism about potential sources for more sustainable support. Panelists discussed adding funding requirements for upstream open-source projects into public tenders for web archiving services, creating institutional budget lines specifically for open-source maintenance, and treating contributions to community software as legitimate professional development work for developers within libraries and archives. Some panelists pointed to growing interest in digital sovereignty policies in Europe, where governments increasingly want more direct control over digital infrastructure and collections stewardship. Yves suggested that this political shift could create opportunities for open-source preservation tooling, particularly if public sector procurement rules begin explicitly rewarding contributions back to shared infrastructure.

Benefits and limitations of AI-assisted coding

Not surprisingly, AI hovered over much of the discussion. AI-assisted coding may reduce some development overhead, and some panelists described productive uses for code review, bug detection, and scripting assistance. However, the panel was skeptical of the idea that AI meaningfully solves the underlying sustainability problem. Faster code generation does not automatically create maintainable systems, healthy governance, or resilient communities. As Tessa noted, velocity without understanding creates its own risks.

Open-source software is critical preservation infrastructure

The key takeaway that emerged from the opening keynote was a reframing of open-source web archiving infrastructure not as ancillary technical tooling, but as critical preservation infrastructure. The field behaves as though these systems are indispensable, but there is a significant underinvestment in open-source tools. The harder question, and the one the panel kept circling back to, is whether institutions are willing to fund, maintain, and steward them accordingly.

Reading Rooms for the Archived Web / Ed Summers

Archives de Bevaix by Service intercommunal d’archivage

The Wayback Machine is (usually) good at preserving web pages, but it’s not always good at helping you find your way around what’s been preserved. URLs from a vanished website may be archived, but if the original site is gone, the paths into it (its navigation, its search, its tables of contents) are sometimes gone too.

This creates a need, and opportunity, for sites I want to call reading rooms for the archived web: standalone sites that sit to the side of archived web content and provide the index, browse, search and curation layers that the original site used to, with provenance links back to the captures they’re drawn from. The metaphor I have in mind is the reading room in a brick & mortar archive, the place you go to consult a collection, with finding aids close at hand and the records themselves a request slip away. Perhaps a finding aid is the better metaphor here?

The most recent example of this I’ve come across is work from Lawfare Media, who recovered 5,772 pages deleted from the Department of Justice website related to the Jan. 6 attack on the US Capitol. They’ve built a standalone archival viewer of the extracted content that links back to the Wayback Machine. There is more about the motivation for the project in their post The Justice Department Erases History. Lawfare Restores It. (Sadly the GitHub repo for the archive itself looks to be private.)

This is a bit archive-eating-its-own-tail, but one feature of the site that Lawfare Media built is that the search is operational from within the Wayback Machine’s own snapshot of the site, since the search runs client-side. A user search doesn’t require an API back to the server.

Searching the archive from inside the Wayback Machine

Looking at the HTML it appears the site is using minisearch for client-side search. A nice side effect of client-side search is that the indexed corpus (metadata for all the DoJ content) is itself available on the open web, as corpus.json. Some caring person has even already thought to archive corpus.json using Save Page Now:

A Wayback Machine snapshot of corpus.json from May 29, 2026

Other “Reading Rooms”

Lawfare’s archive sits in a small but growing genre. Or maybe it’s well established and I’m just noticing it for the first time? Another example is Ben Welsh’s FiveThirtyEight Index which he built after Disney shut down fivethirtyeight.com in March 2025. It catalogs over 38,000 articles, datasets, podcasts and graphics, browsable by author, date and series, with every record linked back to its Wayback Machine snapshot. (The Internet Archive also runs a companion collection.)

Another example is Internet Archive’s Scholar, which provides a catalog of published research (mostly journal articles) that is found in the Wayback Machine. I believe this is a presentation layer over data collected by IA’s FatCat project, which provides some ability to edit the metadata about the archived content.

In archival terms what these projects are doing is effectively what finding aids do: describing scope, arrangement, and provenance, but wrapping it in something that feels more like a reading room than a paper inventory. They are themselves websites that will eventually need to be archived. I think it’s interesting to think about them as a continuation of something archives have been doing for a very long time. It’s also interesting to think about the role that agentic coding tools played in their production (at least in the case of the Jan 6 Archive).

Jonathan Gray and the Public Data Lab at King’s College London run a project called Repurposing Web Archives (with the Internet Archive and Internet Archive Europe) that looks at the tools, methods, and stories of how researchers, journalists, and artists actually work with the archived web: see their recent Follow the Changes post. Perhaps this idea of Reading Rooms for web archives is a subset of the types of practices this project is interested in? It seems like there is a gray area between research that incorporates web archives, and more documentation oriented content for providing an entry point into web archives?

If you know of other examples of Reading Rooms (or finding aids) for Web Archives I’d love to hear about them!


This post was originally a thread over in the Fediverse. Thanks to (freegovinfo?) for the pointer to the Lawfare Media work.

2026-05-28: If LLMs can write abstracts, what's our job? The Uncanny Valley and Gell-Mann Amnesia Effect in the ACM Digital Library / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University


Michael L. Nelson

2026-05-28


I serve on the ACM Digital Libraries Board, and we are navigating a number of changes to the ACM's Digital Library, which, for a professional society and memory organization, is arguably the ACM's primary asset. A recent article (March 2026) by Jack Davidson and Wayne Graves provides a status update on the ACM's move to open access, which includes establishing a "basic" and "premium" service level. Although there are some questions regarding the long-term implications of moving to open access, I, and presumably all authors, welcome the ACM's bold strategy for ensuring that our content reaches the widest possible audience.


Jack and Wayne's article also addressed the DL's recent experimentation with AI/LLM enrichment of articles, specifically landing pages. And unfortunately, the experimentation got off on the wrong foot. Just before the holidays in 2025, the landing page for articles in the DL added AI-generated summaries as a sort of alternate or rival abstract. To make matters worse, these summaries were shown by default, and users had to select a tab to show the original, author-supplied abstracts. The figure below is an example taken from Dr. Casey Fiesler (CU Boulder), whose social media post about the summaries being shown by default instead of the abstracts gained a lot of traction.


AI-generated summary shown by default (2025-12-16) for https://doi.org/10.1145/3706598.3713322 


Fortunately, the expected behavior of showing the authors' abstract by default returned very quickly, and the AI-generated summary is now clearly marked as such, including the date that the summary was generated:



Author-generated abstract is now shown by default https://doi.org/10.1145/3706598.3713322 



The AI-generated summary is now clearly marked as such, and includes the date the summary was generated https://doi.org/10.1145/3706598.3713322 


First, let me be clear: showing the AI-generated summary by default instead of the authors' abstract was a terrible idea and was uniformly rebuked.  The DL board was not informed that this was going to happen, and I can't recall anyone on the DL board even suggesting it; perhaps it was just an oversight by an ACM staff member or engineer at Atypon. I don't recall exactly when the expected default behavior was restored, but it was soon after the author community complained. 


My original suggestion at the DL board meetings (echoed by Dr. Fiesler) was to provide wiki-style editing on the AI-generated summaries, possibly limited to logged-in authors (a possible premium feature?).  One can make a good argument for either opt-out or opt-in, but neither option adequately addresses the problem of the sizable back catalog of unreachable authors (JACM began in 1954).  


But what I find interesting is the level of author backlash against AI-generated summaries, at least as I observed on social media. This is all anecdotal, and I realize people don't post about things about which they are neutral or even mildly positive because, let's face it: carping is a lot more fun. But Dr. Fiesler and the others in the thread are all reasonable people and aren't just trolling. I think there's something more fundamental happening. I think our collective reaction (revulsion?) to AI-generated summaries can be explained by adapting two phenomena: the Uncanny Valley, and the Gell-Mann Amnesia Effect.


The Uncanny Valley is a hypothesis that our emotional response to depictions of humans (expressions, speech, movement, etc.) initially rises as the likeness becomes more human-like, and then takes a sharp dive as the likeness becomes nearly, but not quite, human. Basically, most cartoon characters, anthropomorphized animals, etc. are "cute", but the more realistic animated humans in movies like "The Polar Express" (2004) are just creepy.



The Uncanny Valley (Source: Wikipedia)


I propose that something similar happens with text.  Most authors have no problem with AI tools enriching the work, for example: language translation, extracting citations, repairing/rewriting hyperlinks, suggesting related works, suggesting/assigning keywords and ACM CCS values, and any number of other services and derived content.  But generating a summary that rivals the abstract?  Yuck.  No thanks.  An error in citation parsing or CCS assignment?  Meh, who cares, either ignore it or fix it, but no one takes to social media to complain.  A subtle but detectable (if only by the author) error in a summary?  That's glaring and viscerally wrong. And even if we can find no substantive errors, knowing the text is AI-generated, we will find fault with phrasing, the structure, and various minutiae (cf. humans' negative attitudes to replicants in Blade Runner).  Extracting keywords is what computers do. Writing abstracts is what we do. If LLMs can write abstracts, what's our job? 


Those assessments inevitably derive from us reviewing AI-generated summaries of our own work.  Presumably, no one knows the material better than us, so the best anyone / anything else can do is be "as good as", certainly not "better". We're writing for our peers, and we share a nuanced, high-bandwidth vocabulary that outsiders just can't appreciate.  On the other hand, if we have to read articles outside of our area of expertise, we often wonder: why are the authors so obtuse? Why can't "those people" just write plainly?  


gocomics.com/calvinandhobbes 


This is the essence of the Gell-Mann Amnesia Effect, which was coined by Michael Crichton to describe the phenomenon that the more you know about a topic, the more likely you are to see the flaws in a third-party analysis, while at the same time being less critical when that same third party summarizes a topic on which you are not an expert. Anyone who has been interviewed by the media has experienced this: the reporters inevitably butcher your hour-long exposition, provided in painstaking detail, covering all the nuances, edge cases, historical review, and possible future directions – all reduced to a minute or less of decontextualized soundbites. But that news outlet suddenly becomes a trusted and valuable source when they cover a topic outside of your expertise.  


dilbert.com


I suspect the Gell-Mann Amnesia Effect applies to AI-generated summaries as well: they are an abomination when applied to my work, but a useful de-jargoning tool for exploring unfamiliar or even adjacent sub-fields.  This even presupposes that there should be multiple AI-generated summaries, aimed at different audiences (e.g., lay person, High School, undergraduate, researcher).  In fact, the rival abstract in Dr. Fiesler's example might be the least useful summary, precisely because it does rival the author's abstract.  But writing for audiences other than our own is a different skill set: writing for my fellow researchers at JCDL, Hypertext, Web Science, etc. is what I do, but writing for high schoolers is not what I do.  Casting my work into something appropriate for high schoolers would be a good use of LLMs, and simplifications (if not outright errors) are to be expected.  


In summary, I think it's natural to feel revulsion when the LLMs are used to rival our work: it falls into the textual uncanny valley, in a way that other generative works, such as translation, do not (at least not currently).  But at the same time and based on the Gell-Mann Amnesia Effect, our harshest judgement of AI-generated summaries is reserved for areas in which we are experts, and our assessment of AI-generated summaries improves as we apply them to areas further from our own.  


With that in mind, it would make sense for the ACM DL to enable wiki-style editing on summaries, move away from the model of a single summary that rivals the author's abstract in length and complexity, and introduce multiple summaries, tailored to audience and intended purpose. 



–Michael 


2026-05-29 Update: I was chatting with Martin Klein, and he informed me that bioRxiv introduced in late 2023 on-demand summaries at variable reading levels. bioRxiv is far from my field, so I'm not completely clear on its status as a production service or just a prototype. For example, this recently published preprint doesn't show the option for AI-generated summaries: 


 

Clicking on the "Automated Services" for the recently published https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1


…shows "There are no automated services for this paper."  


However, I was able to find this preprint from a year ago that does have that option available:


The "Automated Services" option is active for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 


When clicked, the default AI-generated summary is for the "General" audience:


 

The "General" AI-generated summary for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 

The "Expert" AI-generated summary for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 


Are these good summaries?  I guess so – although I'm not sure what else to evaluate them against. I don't know the first thing about proteomics, so the "General" summary is certainly the most accessible to me.  The "Expert" summary is more detailed than the "General" summary, but still more accessible to me than the authors' abstract. That's not a surprise because 1) I haven't studied biology or chemistry since High School, some 40 (!) years ago, so Schär et al. aren't writing for me, and 2) the summaries are both about half the length of the authors' abstract. I saved all three into separate files:


% wc -w bio-*txt | grep -v total
     219 bio-abs.txt
     107 bio-expert.txt
      88 bio-general.txt
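For anyone who wants to repeat the comparison without `wc`, here is a minimal Python sketch. The three counts are the ones reported above; `word_count` and `relative_length` are hypothetical helper names, and whitespace splitting is only an approximation of how `wc -w` counts words.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words, roughly what `wc -w` reports."""
    return len(text.split())


def relative_length(summary_words: int, abstract_words: int) -> float:
    """Length of a summary as a fraction of the authors' abstract."""
    return summary_words / abstract_words


# Word counts as reported by `wc -w` in the post.
counts = {"bio-abs.txt": 219, "bio-expert.txt": 107, "bio-general.txt": 88}

# Compare each AI-generated summary against the authors' abstract.
ratios = {
    name: round(relative_length(words, counts["bio-abs.txt"]), 2)
    for name, words in counts.items()
    if name != "bio-abs.txt"
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} of the abstract's length")
```

Both ratios land between 0.4 and 0.5, which is consistent with "about half the length".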


Two hundred words is a good target for abstracts. I'm guessing the prompts for the AI-generated summaries had a target of about 100 words, so by design even the "Expert" summary will not rival the authors' abstract (though metadata and wiki-style editing would be nice). The "Automated Services" tab has at the bottom a link to "Explore Further on ScienceCast":


The target of the "Explore Further on ScienceCast" link https://sciencecast.org/casts/jpdm4k710oet 



I don't have an account (yet) on ScienceCast, so that's the end of my exploration for now.  But there's clearly a bigger AI↔paper ecosystem to explore, for both me personally and the ACM DL.  


–Michael 


2026-06-02 Update: In another chat, Martin Klein told me he had just discovered the institutional repository at Niigata University. It does not have a native English interface, so all of the translations shown below are via Chrome and thus a little clunky.  When you first visit the repository, it asks you to choose a persona or level from three choices: "adult", "junior and senior High School students", and "Elementary school student".


Choosing a persona when visiting https://repolab.lib.niigata-u.ac.jp/ 


Selecting a persona brings up the search page (with the persona changeable via a dropdown menu in the upper right-hand side):


Search page for https://repolab.lib.niigata-u.ac.jp/ 


I did a search for "web archiving".  The hits are not especially relevant (perhaps no one at Niigata is active in the field), but they are sufficient to demonstrate the personas.

Result #1 in the SERP for https://repolab.lib.niigata-u.ac.jp/ 


Clicking on "View AI explanation", there are three tabs corresponding to the three personas previously introduced:

AI Explanation for Middle and High School Students https://repolab.lib.niigata-u.ac.jp/records/record-2000416/  


AI Explanation for Adults https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 



AI Explanation for Elementary School Students https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 


Chrome's translation for Elementary School students is not smooth, but I'm guessing that's an issue with Chrome and not the LLM that Niigata is using – presumably there is less training data for translating "children's" Japanese?


The landing page of Niigata's institutional repository does have the regrettable "embedded PDF" interface, and it does list a truncated "AI Explanation" above the "Summary by the author" (to be fair, perhaps the name "summary by the author" instead of "abstract" is a function of the translation).

Top of the landing page for https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 


The bottom of the landing page for https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 



It is a little hard to evaluate this three-level approach, since there's the added dimension of language translation.  But it feels like an interesting application of LLMs, and aside from being listed at the top of the SERP, it does not seem to be in competition with the authors' abstract.  


Note that the landing page displayed above is likely an experimental and/or local UI since it is hosted at niigata-u.ac.jp, and is very different from the more conventional-looking landing page for the associated handle, which resolves to nii.ac.jp.


The handle http://hdl.handle.net/10191/0002000416 resolves to https://niigata-u.repo.nii.ac.jp/records/2000416  



I appreciate all of Martin's suggestions and pointers, and welcome more from other readers.


–Michael



*Apologies for including Dilbert, but the options for Gell-Mann Amnesia Effect cartoons are limited. 



AI's PR Problem / David Rosenthal

J.P. Morgan hits photographer with cane
This is just a brief post to explain to my old boss, Eric Schmidt, why he and his ilk are getting booed at college commencements, and why laws against data centers are getting passed. The explanation is below the fold.

Let us start from an under-appreciated fact. Paul Campos reports that:
The college wage premium, that is, the increased earnings associated with having a college degree as opposed to only being a high school graduate, hasn’t changed at all in the past 25 years, because median real wages have been flat as a pancake for everybody, no matter what their formal education level, for the past quarter century.
But:
I wonder what’s happened to capital over this time? Value of S & P 500, inflation-adjusted, 1/2000 to 9/2025 (same period as the wage data):

2000: $1,394

2025: $6,688
On average, for more than the students' entire lives, stock-owners like Schmidt and (to a much lesser extent) I have stolen every last drop of the productivity increase of US workers at every age and education level. (See the actual numbers in the appendix)

Now, the perpetrators of this theft are telling their victims, the students and the public at large, that whether they like it or not they will be subjected to AI because that will make the perpetrators even richer. The victims have been informed that this new technology will:
Nothing better illustrates the contempt of the Epstein class for the proletariat than that these oligarchs would expect the graduating class to enthusiastically accept this prospect.

Appendix

Here are the actual numbers from Paul Campos' 25 years of flat wages and no increase in the college wage premium, while value of capital has skyrocketed:
I was fooling around with FRED this morning, as one does, and here are some stats: (The FRED numbers are presented in nominal dollars; I’ve converted them to CPI-adjusted dollars).

Median usual weekly earnings of workers with a high school degree only:

2000: $968

2025: $980

Median usual weekly earnings of workers with a bachelor degree only:

2000: $1,587

2025: $1,580
...
Median usual weekly earnings of people with a bachelor’s degree or higher:

2000: $1,705

2025: $1,747
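The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the dollar amounts are copied from the Campos quotes, while `percent_change` and the dictionary labels are names invented here for illustration.

```python
def percent_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100


# (2000 value, 2025 value), all in CPI-adjusted dollars as quoted above.
figures = {
    "S&P 500": (1394, 6688),
    "High school only (weekly)": (968, 980),
    "Bachelor's only (weekly)": (1587, 1580),
    "Bachelor's or higher (weekly)": (1705, 1747),
}

changes = {
    label: percent_change(start, end)
    for label, (start, end) in figures.items()
}

# College wage premium: bachelor's-only earnings relative to high-school-only.
premium_2000 = 1587 / 968
premium_2025 = 1580 / 980

for label, change in changes.items():
    print(f"{label}: {change:+.1f}%")
print(f"Wage premium: {premium_2000:.2f}x in 2000, {premium_2025:.2f}x in 2025")
```

The index rises by roughly 380% while each wage series moves by a few percent at most, and the premium stays at about 1.6x in both years.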
Here is a short list of YouTube videos on this topic: As a boomer, I think this post might be the exception that proves Ms. Baba's rule.

Note that every single one of the ads that I saw watching these videos in an incognito window was advertising an AI company! As are 49% of all the billboards in the Bay Area. Read the room, guys!

An Automated Data Monitoring Toolkit and the AI Benchmarking Exercise at the Public Data Project / Harvard Library Innovation Lab

This post is being shared on both the dataindex.us newsletter and the Library Innovation Lab Blog.


“Is data changing? Is it being disappeared? How do we know? How can we know?” This interrogative refrain rang through just about every conversation I had when, almost a year ago, I came to Harvard Law School Library to lead the Public Data Project. Thanks to the dataindex.us Data Checkup, a plan is in place to do this complicated but essential work. Through the careful scaffolding dataindex.us has constructed and the assiduous research of its staff, more than a dozen federal datasets have “health assessments,” and the team continues to add to this list.

In October 2025, the Public Data Project partnered with dataindex.us to develop a data monitoring toolkit that could both work at scale and be user-driven. In addition to creating an automated tool that can process large numbers of datasets, we also want the user to determine which datasets they want to monitor. Let’s face it, when it comes to federal data, one person’s byzantine, inscrutable dataset is another person’s trove of invaluable ground truth. The anecdotes of data use collected by essentialdata.us offer varied examples of the ways people benefit from federal datasets. The range of uses is a clear indication that people need to be able to monitor the data that matters to them.

At the Public Data Project, we are creating a toolkit that will enable users to detect and monitor changes to federal datasets over time. It will enable users to select a dataset and track changes within the data itself, as well as to automate the monitoring of external sources that indicate whether the data might be changing. Indicators of change to a given dataset range from somewhat obvious sources, like major news sites, to more obscure sources, like the U.S. Code. At present, our tool development has produced two components.

First, Binoc is a command-line tool and library to generate changelogs for datasets that don’t have them.

Scanned illustration depicting a man made out of optical equipment; advertisement for L. Srisheim, optician (ca. 1840) Advertising card for L. Srisheim, optician. Source: American Antiquarian Society.

Unlike generic diffing utilities intended to describe line-level differences in plain-text content such as source code or Markdown, Binoc aims to efficiently summarize changes in real-world datasets, including file additions and deletions, row-level updates, and schema alterations. Given a series of dataset snapshots captured at different points in time, Binoc detects what changed, expresses any changes as a minimal structured diff, and produces a human-readable summary. Binoc is currently in a collaborative design phase of development, with new features being added regularly. We welcome feedback from early adopters.
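To make the idea of a structured dataset diff concrete, here is a minimal Python sketch. It is not Binoc's implementation or API: the snapshot representation (a dict of file name to rows), the `id` key column, and the function names `diff_snapshots` and `summarize` are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]
Snapshot = Dict[str, List[Row]]  # hypothetical shape: file name -> rows


def columns(rows: List[Row]) -> set:
    """Collect every column name that appears in any row."""
    cols = set()
    for row in rows:
        cols.update(row)
    return cols


def diff_snapshots(old: Snapshot, new: Snapshot, key: str = "id") -> dict:
    """Describe file, schema, and row-level changes between two snapshots."""
    diff = {
        "files_added": sorted(set(new) - set(old)),
        "files_removed": sorted(set(old) - set(new)),
        "files_changed": {},
    }
    for name in sorted(set(old) & set(new)):
        old_cols, new_cols = columns(old[name]), columns(new[name])
        shared = old_cols & new_cols
        old_rows = {row[key]: row for row in old[name]}
        new_rows = {row[key]: row for row in new[name]}
        change = {
            "columns_added": sorted(new_cols - old_cols),
            "columns_removed": sorted(old_cols - new_cols),
            "rows_added": sorted(set(new_rows) - set(old_rows)),
            "rows_removed": sorted(set(old_rows) - set(new_rows)),
            # A row counts as updated only if a column present in both
            # snapshots changed value; schema changes are reported separately.
            "rows_updated": sorted(
                k for k in set(old_rows) & set(new_rows)
                if any(old_rows[k].get(c) != new_rows[k].get(c) for c in shared)
            ),
        }
        if any(change.values()):
            diff["files_changed"][name] = change
    return diff


def summarize(diff: dict) -> List[str]:
    """Turn the structured diff into short human-readable lines."""
    lines = [f"added file {n}" for n in diff["files_added"]]
    lines += [f"removed file {n}" for n in diff["files_removed"]]
    for name, change in diff["files_changed"].items():
        for kind, items in change.items():
            if items:
                label = kind.replace("_", " ")
                lines.append(f"{name}: {label}: {', '.join(map(str, items))}")
    return lines


old = {
    "counties.csv": [{"id": "1", "pop": "100"}, {"id": "2", "pop": "200"}],
    "notes.txt": [],
}
new = {
    "counties.csv": [
        {"id": "1", "pop": "100", "area": "5"},
        {"id": "2", "pop": "250", "area": "7"},
        {"id": "3", "pop": "50", "area": "2"},
    ],
    "readme.txt": [],
}
diff = diff_snapshots(old, new)
for line in summarize(diff):
    print(line)
```

Keeping the structured diff separate from its summary mirrors the two outputs the post describes, and it means the same diff could feed either a changelog or an automated alert.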

We have also begun the research for a second component of the data monitoring toolkit development.

Photograph of cast bronze USGS benchmark Cast bronze benchmark. Source: United States Geological Survey.

We have created an AI benchmarking exercise to compare and to evaluate how well AI can monitor data and assess its risk when considered next to the processes and conclusions of a careful researcher. The goals of the exercise are to:

  • Test how well AI can assess various types of risk to federal datasets;
  • Evaluate what baseline a popular search model would use to answer those without a custom search harness;
  • Surface and reflect on the tacit knowledge necessary to perform risk assessment, including the sources needed, the steps involved, and the difficulty of defining criteria;
  • Create awareness and community through an intellectually engaging activity that includes both individual research and group reflection.

We have conducted an initial test run of this exercise with a group of 10 information professionals. After introducing the participants to the dataindex.us rubric to assess the risk level of a given dataset, each participant was assigned a dataset and asked to evaluate it across three of the six risk dimensions outlined in the rubric. Each participant was either assigned the first three dimensions — Historical Data Availability, Future Data Availability, and Data Quality — or the latter three — Statutory Context, Staffing and Funding, and Policy. For the first hour, participants more or less worked alone, diligently researching a subject that they lacked expertise in, but for which they had clear guidelines for the kind of information they sought. Participants then opened ChatGPT, and fed it prompts that we had scripted and tailored for each dataset. Participants then reflected on their findings, first in a form that asked them specific questions and then as a group that compared their results with ChatGPT’s. Going through their three assessment dimensions, participants compared their conclusions to those of AI, reflecting on what AI missed, what they missed, and on what parts of the rubric may have led to confusion.

This exercise gave us an early insight into the potentials and pitfalls of AI’s ability to assess data risk, as well as ways in which we might tweak both the exercise and the assessment rubric. This group of participants were information professionals, not policy wonks, and we are eager to see how area specialists’ experience might lead to different outcomes in this exercise. In addition, we want to experiment with prompt engineering and give participants more leeway in their interaction with AI. In the next iteration of the exercise, we will rely on the transcription of each participant’s interactions with AI for analysis, rather than asking individuals to respond in a form.

What we liked most about this exercise, however, were the collective reflections not just on AI, but on public data more generally. One participant described it as an “excellent empathy-building exercise” because, through the work, both alone and as a group, participants become aware of the importance of and perils to public data. They reflected on whether and how to translate their own empathetic experience to AI.

June 2026 Early Reviewers Batch Is Live! / LibraryThing (Thingology)

Win free books from the June 2026 batch of Early Reviewer titles! We’ve got 251 books this month, and a grand total of 3,098 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing/email address and make sure they’re correct.

» Request books here!

The deadline to request a copy is Thursday, June 25th at 6PM EDT.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the UK, the US, Canada, Australia, Germany, New Zealand, Ireland, Malta, Italy, Latvia and more. Make sure to check the message on each book to see if it can be sent to your country.

Employee No. 9The Weight of AngelsFood Is Medicine: Healing Our Bodies, Nourishing Our Minds, and Transforming Our Food SystemA Voice Like Mine: A MemoirLast Seen in Sea IsleAmarisa's Cooking Pot: Tales of Life in All Its WondersMount MiseryFunjeepups: A Beautiful SongFunjeepups: A Star WishGood Families Don'tMy Best Friend Is a Butternut SquashToad on the GoThe Wise PickleWhale, That Was UnexpectedConfessions of the Green River Killer: A True Story of Manipulation, Madness, and a Search for JusticeThe Roman Holiday RuleThe Crazy TestI Know ThingsJonathan's JournalEvery Nanny Before MeAntitherapiesRedworkThe Rise and Fall of the Republic of West Delphi: A MemoirNo One Will Ever Hear You: StoriesA Day in the Life: An NPC LitRPG AnthologyThe Set of All SpiesThe Set of All SpiesSwitching SidesBubbles, Roses, and RumpIs It Poop?: A Guessing Game With Poop and Animals That Look Like PoopRuptured: Jewish Woman in Australia Reflect on Life Post - October 7DiodeDestiny or DefeatErkül Bwaroo, Elf Detective - At Your ServiceWoman Outside the CityShoulders of GiantsThe Very Unremarkable Life of Mrs. 
Etty BloomParallel CircuitsThe Durbar's ReckoningPadani: A Family StoryA Misfit's Guide to Magic and MayhemQuo Vadis, Jane Mitchell?Wave 2: The SequelEden at DawnThe Necklace of Seven SoulsLong WeekendThe Chumash of Heroes: Bereshit (Genesis)The Girl Who Watched the Trains DepartWriting Memoir in Flashes: Creative Ways to Tell Your True Stories, One Memory at a TimeExalted ObjectsCircus of the Vanishing ElephantTessa's LandingDispatches from Grief: A Mother's Journey Through the UnthinkableThe Curse of Teed HouseHeracles: The Hydra of LernaPhantom of the GalleriaThe Quaint Convictions of Kit BennetThe Timeless Teachings of Conny Mendez: An Essential Collection of Metaphysics in Plain EnglishOcean Animals: An Animal Guessing Game Book Full of Fun and FactsLove in the AbstractThe Only CatchBigger Than VersaceBrandedBareThe Shattered MirrorA Voice of WrathLike MagicDevoted to His SwordWoman Afraid of WaterHunter's BloodBrutal Country: Ten Short StoriesFinding My Way Through Cancer: A Gentle Journey Through Early-Stage Lung CancerWhere Water Meets Sky: An Isekai RomantasyScales of DestinyChange By Doing Nothing: The Hidden Science of Self-Sabotage and Why You Can Change Only When You Stop Forcing ItThe WindowZombies of the Upper East SideEncoded Minds: A Biological ThrillerNot All HandsirlGood Grooming and a Healthy Respect for AuthorityThe Ash Cycle: How the Trypillians Defeated Urbanization Through FireBrilliant Life: The 5 Science-Backed Pillars to Boost Energy, Improve Sleep, and Build Healthy Habits That LastOnly Breath and ShadowTo The Moon and BackRest For The Weary: Biblical Support for Autistic Burnout and RecoveryTakes One to Know OneFlash PointThe Relief of Not Knowing: Stop Overthinking Decisions Start Trusting YourselfThe Fifth SilenceAfter the Altar: Living the Promises of the Wedding DayMaillane: That Morning Sun Comes Rising UpDeath in the End ZoneThe Devil of Tarsyn ForestThe Agentic CMO: How Artificial Intelligence Is Rewriting the Rules of Marketing 
LeadershipTech Equity: Freedom Through Enabling Technology: A Dream Officer's Playbook for Tech Equity in Disability & Aging ServicesBio-Logic Herbalism: Evidence-Based Natural Herbal Remedies & Home Apothecary Protocols for the Whole FamilyHome Apothecary for Healthy Lifestyle: The Practical DIY Herbal Guide for Household Use, Immune Support, and Natural First AidGoodbyeThe Journey Beyond the MapI Know When You're AsleepChronicle of the Stellar BridgesThe Legend of KaaliThe Family LiarBlue Year: A Literary Lesbian Erotic NovelBlack Hole Guns For HireHow to Show Up for Your Life: Start Living the Life You were MEANT to LiveParents Praying… Heaven Answers: When God Hears the Cry of Parents and IntercessorsTinkerHow to Be Enough: Your Worth Isn't Up for NegotiationThat McKenzie GirlMurder on the Shuffleboard CourtThe Light That Devours the SkyFreedom Quest: A Love StoryA Touch of Magic & DesireThe Last Sethu QueenThe Ever-Changing SandsA Spark of Earth and FlameHealing Your Inner Bestie: A Practical Guide to Master Self-Love and Overcome Self-DoubtLebanon: A Country for No One & EveryoneOutsphereBuduneliEvery Last BoneIntrospection: Exploring the Racialized Politics and Conception of Ideal-Blackness Within African American CultureBetween Home and Silence: A Memoir of Family, Silence, Work, Migration, and Survival Between Two WorldsHold On to MeWhere the Willow BendsThe Redux of Sam MurdochArcadian AlcoveEveryone Kept Quiet So I Did Too: Tales of a Reluctant SoldierSire, Oleander Isn't Dead! 
(Yet)The AwakeningImago Nine: The Popstar ApocalypseBlood ForgedThe CriticWhispers on FlowersDead ExitDead ExitNursing FlagstopDon't Believe a WordUnshaken: A 30-Day Anxiety Management Workbook for High-Functioning MenThe RanchThe Soufflé Also RisesSketchGates of LoryndasTY, Thel: Films of Thelma RitterA Vamp RevampedThe Supreme LunaThe Three Creature CurseWhat Hears YouAltars in the Ruins: Twenty-Five Sermons from the Ruins Redeemed by GraceCathedral of Scars: Fifteen Sermons from the Ruins That Became SanctuaryWaterspoutThe 1-Thing Way: A Sustainable Path to Reach Your Goals Without BurnoutReclaim Your Body for Life: The 1-Thing Way to Sustainable Fat Loss, Metabolic Health, and EnergyRadical Son Back to RootsThe Man with the Blue Suede ShoesThe Divine Feminine ScentThe Hollow Gospel: Scripture of WoundsConspiracy In TimeA Small Tree in a Texas Hurricane: A MemoirWhat No One Tells You About Caring for an Aging Parent: Real-Life Lessons, Emotional Survival, and Practical Wisdom From 14 Years as My Mother’s CaregiverA Perfectly Normal Childhood (and other lies I tell myself)Sterne: ValerieWho Is Singing?The Devil You KnowBuying Wealth With Money: A Workbook On LegacyH. A. L. T. 
Own Your Emotions: A Workbook on Self-ControlTeen Slang for Parents: What Your Kids Are Actually SayingMy Voice: A Guide to Mastering Life, Truth, and PurposeThe Park RaceA Tale of Two Chinas: A Fifteen-Year Odyssey Through China's Cultural HeartlandsAngel's SalvationDiddly Duggins and the Great Memory MisplacementA Devil AmidstHarbinger of DarknessThe Next Hundred Years404 Love Not Found: The Story of Harper and JonahThe Resilience of Red ThreadMurder At The Radio StationThe Summer That Changed UsSultana: The Last Road Home: The Titanic of the MississippiThe FalsehoodThe Thirteenth DreamThe One24 Hours to ForgetPelagic ShoresAre We Friends Yet?: How to Deepen Your Relationships and Create the Community You NeedMAX and the Beanstalk!HeliumThe Dying TideA Selfless Marriage: How Mutual Service Rebuilds Love, Respect, and Emotional ConnectionAI Adventures with Maya and ByteThe Quiet Night HugWhere Does It Live?: Learn Where Emotions Live in Your Body and What to Do About ThemOrdinary SoulsCardboard SpaceshipThis Sea WithinThe Statistically Unlikely ReunionThe Statistically Unlikely ReboundTicket to MarsThe Moonscorn MandateAchieve Financial Peace Budget Planner: 12 Month Practical Debt Workbook for Beginners in Large SizeThe Shipton PrincipleCrown and ChronosWalking Along the Ancient Tokaido Road: A Pilgrim's Path: Adventures and Transformations (Vol. 1: Departure)Walking Along the Ancient Tokaido Road: A Pilgrim's Path: Adventures and Transformations (Vol. 
2: Insight and Memories)My Mother Said My NameWhispers on FlowersCome, Play with Me: Writer's Camp 3rd AnthologyReflections from a ShoeboxHaiku Redo: A Collection of Haiku, Companion Pieces, and Space for Your OwnHow to Conquer The BillionairesKink-Affirming Therapy Worksheets: A Clinician’s Guide to Sex-Positive and Consensually Non-Monogamous IntegrationIn the Flesh: Why Manifestation Fails in Your Head, and Works in Your BodyCatamorphosisThe Fortress of UsLeo and the Dragon of Sound: A Journey Through the Kingdom of NoiseNotes on HopeKevin The Werewolf: Shattered MoonThe 7-Day Dopamine Detox: A Beginner's Guide to Unplugging, Resetting, and Not Falling Apart OnlineAuthor and Finisher Volume IBecause I Deserve It: What Chronic Illness Taught Me about Finding My Voice in the Healthcare SystemCalling Out the Shadows: A Father's Stand Against the CurrentKiera and Lamby: TokyoCalling Out the Shadows: A Father's Stand Against the CurrentOdysseyThe Brink of Becoming: Designing a Future Beyond Zionism and Cultural ProgrammingThe Last Summer on Hawthorne StreetEternalWe Never Signed the ContractWildfire & The Sun PrinceShadow & The Air TricksterThe Cave of Past and PresentMary FalconDeadly GroundThe Vow RewrittenLost HeroRepatriated: Sons of the SoilSame IceOur Lady of the ArtilectsFart, Laugh, and Be Happy: Inspiring Bathroom Humor Stories to Uplift Your SpiritThe Great Bathroom Humor Cover-Up: An Investigation into the Lost History of Bodily Function ComedyThe Coin of ForeverA Literary Offering: Observations & CommentaryStriking JusticeThe Question of When: A Practical Guide to Knowing When It's Time for Assisted Living, Memory Care, or Skilled NursingThe Protector and the AnnihilationCornelius & The Sneak Goose AttackBlind ItemThe Echo She Left Behind

Thanks to all the publishers participating this month!

  • ALIO Publishing Group
  • Attwater Books
  • Autumn House Press
  • Bricolage Lit
  • Brother Mockingbird
  • City of Words
  • City Owl Press
  • Crooked Lane Books
  • Entrada Publishing
  • Flat Sole Studio
  • Gefen Publishing House
  • Haven
  • Henry Holt and Company
  • History Through Fiction
  • HTF Publishing
  • Inferno Books
  • Infinite Books
  • Inkd Publishing LLC
  • LaPuerta Books and Media
  • Learning Spark Educational Publishing
  • NeoParadoxa
  • Plexus Publishing, Inc.
  • Pocketbook Press
  • PublishNation
  • Restless Books
  • Riverfolk Books
  • RIZE Press
  • Rootstock Publishing
  • Running Wild Press, LLC
  • Somewhat Grumpy Press
  • Thinking Ink Press
  • Tundra Books
  • Tuxtails Publishing, LLC
  • Type Eighteen Books
  • University of Nevada Press
  • University of New Mexico Press
  • UpLit Press
  • WorthyKids

DLF Digest: June 2026 / Digital Library Federation

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here

Happy June, DLF community! Thanks to everyone who participated in Community Voting for the 2026 Virtual DLF Forum. We appreciate your input as we work with the Forum Planning Committee to build this year’s program.

Look out for updates this month: the program release, registration opening, and Digital Storytelling Fellows applications. We’re excited to share what’s next!

Warmly,

-Shaneé

This month’s news

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus conferences and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

  • DLF Born-Digital Access Working Group (BDAWG): Tuesday, 6/2, 2pm ET / 11am PT.
  • DLF Digital Accessibility Working Group (DAWG): Tuesday, 6/2, 2pm ET / 11am PT. 
  • DLF AIG Cultural Assessment Working Group: Monday, 6/8, 1pm ET / 10am PT.
  • AIG Metadata Assessment Group: Friday, 6/12, 2pm ET / 11am PT.
  • AIG User Experience Working Group: Friday, 6/19, 11am ET / 8am PT.
  • Digitization Interest Group: Monday, 6/22, 2pm ET / 11am PT.
  • Committee for Equity & Inclusion: Monday, 6/22, 3pm ET / 12pm PT.
  • DLF Open Source Capacity Resources Group: Wednesday, 6/24, 1pm ET / 10am PT.
  • DAWG Policy & Workflows: Friday, 6/26, 1pm ET / 10am PT.
  • DAWG IT & Development: Monday, 6/29, 1pm ET / 10am PT.
  • DLF Climate Justice Working Group: Tuesday, 6/30, 3pm ET / 12pm PT.

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org

Get Involved / Connect with Us

Below are some ways to stay connected with the digital library community and us: 

The post DLF Digest: June 2026 appeared first on DLF.

"No way to prevent this" say users of only package manager where this regularly happens / Xe Iaso

In the hours following the news that Redhat Insights' JavaScript packages fell victim to a supply chain attack via NPM, developers and systems administrators scrambled to ensure all of their projects were unaffected by a supply chain attack that steals credentials for AWS, GCP, Azure, Kubernetes, HashiCorp Vault, npm, and CircleCI before then self-propagating via said stolen npm credentials and the bypass_2fa setting. This establishes persistence via Claude Code hooks and VS Code task injection. If you have installed the affected package, reprovision your development hardware. This is due to the affected dependencies being distributed via NPM, the only package manager where these supply-chain attacks regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Lady Eulah Howell, echoing statements expressed by hundreds of thousands of programmers who use the only package manager where 90% of the world's supply-chain attacks have occurred in the last decade, and whose projects are 20 times more likely to fall victim to supply chain attacks. "It's a shame, but what can we do? There really isn't anything we can do to prevent supply-chain attacks from happening if the maintainers don't want to secure access to their accounts in a robust manner". At press time, users of the only package manager in the world where these vulnerabilities have regularly happened once or twice per week for the last year were referring to themselves and their situation as "helpless".

For more information, please see upstream documentation published by Redhat Insights' JavaScript packages at the following link: redhat-javascript-clients-06-2026.

Systems Life: Navigating the Distributed Database / Library | Ruth Kitchin Tillman

This post is the first in a series in which I write about experiences or specific challenges from my day-to-day work. Planned posts include descriptions of a bug and how this impacted the coworkers, how I wrote a script to parse log data… I’m hoping that these will be interesting for other librarians that work in entirely different areas, for my colleagues who are solving different problems on different systems (or maybe eventually the same one after we migrate), and for those who are thinking about doing this kind of work in the future.

When we talk about the ILS or LSP, it can sound like we’re talking about a single system. And we are, some of the time. But just like our permissions shape what we can see and do, the ways we access the system and its data may lead to entirely different experiences. More importantly, if you don’t know how different tools and even databases work, you may end up with inaccurate results or not knowing that something is possible.

For example, our Sirsi ILS and reporting system(s) consist of two separate databases. These databases can be accessed in: one way for most folks (two for people using a BLUEcloud module), two-to-three ways for some, four ways if you’re special, and five ways if you’re one of two people.

Diffusion of Databases

The Sirsi Symphony Database, fka Unicorn1, underlies the whole thing. This Oracle database is the ultimate database of record. If we load MARC, it ends up in the Symphony database. If we place orders, they become entries across Symphony tables. If we loan materials, it triggers a series of updates in the Symphony database.

BLUEcloud Analytics runs off a separate database, also Oracle.2 This separation is common and appropriate. Alma also uses a separate Oracle database and FOLIO has the option of Metadb built with PostgreSQL. The analytics databases don’t contain live data. Instead, they’re updated regularly overnight, based on things that have occurred in the primary database. Change a title? It’ll show up in analytics tomorrow.3 Check out a book? That transaction will show up in circ stats tomorrow.

This is an appropriate choice for three reasons:

  1. It’s a bad idea to run large analytical queries on production. Plus, static indexes are much more efficient to search.
  2. The analytics system has no real demand overnight, so its server can do a full reindex before running any scheduled jobs.
  3. The analytics database can be designed differently.

Following that last point, the analytics database isn’t just a snapshot of production. It has a fundamentally different design. It anonymizes circulation transactions, but it also builds completely different indexes from the ones we need for daily work. For example, it indexes circulation data by hour, day, month, and year as well as by circulation desk. Sometimes we want big numbers. Sometimes we want to see which desks get the most traffic. Those aren’t the kind of searches we need to do in day-to-day work. It indexes MARC as fields and subfields, including invalid ones like λ.
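To make the time-bucketed indexing concrete, here is a minimal Python sketch of rolling anonymized circulation events up by desk and by hour, day, month, and year. It is an illustration only, not how BLUEcloud Analytics is actually built; the `rollup` function, the bucket formats, and the desk code are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical bucket formats, one per granularity mentioned above.
FORMATS = {
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def rollup(transactions):
    """Count (timestamp, desk) events per desk at each granularity.

    No patron identifier is passed in, mirroring the anonymization step.
    """
    counts = {name: Counter() for name in FORMATS}
    for when, desk in transactions:
        for name, fmt in FORMATS.items():
            counts[name][(desk, when.strftime(fmt))] += 1
    return counts


if __name__ == "__main__":
    events = [
        (datetime(2026, 5, 1, 9, 15), "MAIN"),
        (datetime(2026, 5, 1, 9, 40), "MAIN"),
        (datetime(2026, 5, 2, 14, 0), "MAIN"),
    ]
    for bucket, total in sorted(rollup(events)["hour"].items()):
        print(bucket, total)
```

Counting into precomputed buckets like this makes a question such as which desk was busiest in a given month cheap to answer, at the cost of no longer being a live view of individual records.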

Accessing the Databases

Most of my coworkers only access Symphony using one tool: Workflows. A few also use BLUEcloud Circ.4 Using the client, they look for records, update them, perform transactions, etc. We import single MARC records using Workflows wizards. We import batches of MARC records using Workflows reporting (and FTP). Global item updates are done in Workflows. The Workflows reporting module can be used to load, transform, or extract data, history, or (some) statistics.

Next, we have BLUEcloud Analytics. A much smaller set of people (but still plenty) have rights here. As described above, Analytics is a completely separate database. It’s also designed in a way that’s more oriented toward statistical work. Folks use it to extract shelf lists, acquisitions data, spreadsheets of MARC subfields, etc. The indexes are enormous and joined queries can take some time to run (and you can only run joined queries which are supported by the system), but you can get a lot of data and can’t accidentally bring down production.

About four years ago, we got access to Data Control. This is probably my favorite Sirsi product5. Unlike Analytics, Data Control gives you the power to query or even update the Symphony database itself. That means it doesn’t have some things that are in Analytics. You can’t see an item’s transaction history, for example, just its current data.6 Even fewer people have access to this, most use it on our Stage server, and just a couple of us are allowed to run batch updates to production.7

seltools is like Data Control for the command line. More properly, Data Control is an interface that lets ordinary humans use seltools with enough scaffolding not to mess quite as many things up. seltools can do even more and can do it very quickly. It is a sysadmin tool and only two people here have rights to use it. It can do extraordinary work in seconds and could cause irreparable damage (or at least, damage that requires restoring from backup). AFAIK it dates back to the launch of Unicorn.

How I Access the Data

I have rights in Workflows, BLUEcloud, Analytics, and Data Control. I tend to use them as a kind of grab bag and often chain Analytics and Data Control in my work, sometimes performing interim steps with Python or OpenRefine.

Because Analytics isn’t querying live data, it’s a much better place to do initial MARC searches. If I want to find every record with a 699, for example, Analytics is the place to do that fast. Or I could look for every 100 or 700 with a subfield “e” or search for a particular piece of text in one or more fields.

But in terms of output, Analytics leaves a lot to be desired for MARC work. It shows a field’s subfields like a table. For example:

Field  Subfield  Data
264    a         New York
       b         Grosset & Dunlap
       c         [1972]

That’s fine if I only want to facet down to the subfield b in each row, but if I want to deal with the MARC data as a field it becomes a problem.

In the Analytics reports I use, it’s easy to add the bib key to a report if it wasn’t already in there. Before we got Data Control, my next step would be to actually switch to something like Z39.50 and download all the bibs manually, hoping I got everything (because our keys are not always in the 001, it’s a long story). I then had to do a delimited export in MarcEdit or write a pymarc script to get the fields I wanted.

Now, if I want to see a set of fields from the record, I simply upload that same set of bibkeys in Data Control8. I structure my query to include the tables I want and output the fields I need from each table. I can then export them into a much nicer spreadsheet with the MARC field (and indicators, if desired) printed the way it appears in the original MARC. I can also export the entire set of records as MARC.

264
|aNew York :|bGrosset & Dunlap,|c[1972]
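The difference between the two shapes is easy to see in code. A minimal Python sketch, assuming a spreadsheet export with one row per subfield; the column layout (bib key, tag, field occurrence, subfield code, data) and the `rows_to_fields` helper are hypothetical, and the `|` delimiter simply follows the display above:

```python
from itertools import groupby


def rows_to_fields(rows):
    """Collapse one-row-per-subfield data into one string per MARC field.

    Each row is (bib_key, tag, occurrence, subfield_code, data). The
    occurrence number keeps repeated fields with the same tag apart.
    Rows must already be ordered by record and field, because groupby
    only merges adjacent rows.
    """
    fields = []
    for (bib_key, tag, _occurrence), group in groupby(rows, key=lambda r: r[:3]):
        text = "".join(f"|{code}{data}" for *_, code, data in group)
        fields.append((bib_key, tag, text))
    return fields


if __name__ == "__main__":
    sample = [
        ("a100", "264", 1, "a", "New York :"),
        ("a100", "264", 1, "b", "Grosset & Dunlap,"),
        ("a100", "264", 1, "c", "[1972]"),
    ]
    for bib_key, tag, text in rows_to_fields(sample):
        print(bib_key, tag, text)
```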

An Example Update

But, even better, I have the rights to update the data. In most cases, I can even use regular expressions. For example, when we added a new ILLiad request placement module to our MyAccount app, we grabbed the 020 (ISBN field) straight from the Symphony API.9 Unfortunately, about 600,000 of our 020 fields followed the pre-2013 structure, when qualifying information was still included in the subfield a. In 2013, subfield q was introduced to handle things like “(paperback)”. This unexpected data was messing with ILLiad’s automated processes. We could’ve changed the script, but it made more sense to fix the actual data, since we now had the tools.

First, I ran an Analytics query to find all records where the 020a contained “(”, “)”, or any letter except x. I exported the data, extracted the bibkey column, and then broke it into batches of 25,000 bibkeys.
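The filter and batching steps can be sketched in Python as follows. This is an illustration under assumptions, not the actual Analytics query: `needs_cleanup` and `batches` are hypothetical helpers, and the character class simply encodes "a parenthesis or any letter other than x", since x is a valid ISBN-10 check digit.

```python
import re

# A parenthesis, or any ASCII letter other than X/x.
SUSPECT = re.compile(r"[()A-WYZa-wyz]")


def needs_cleanup(subfield_a):
    """True if an 020 subfield a appears to hold more than a bare ISBN."""
    return SUSPECT.search(subfield_a) is not None


def batches(bibkeys, size=25_000):
    """Yield consecutive slices of at most `size` keys, dropping repeats first."""
    unique = list(dict.fromkeys(bibkeys))  # keeps first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


if __name__ == "__main__":
    print(needs_cleanup("0448021471 (paperback)"))
    print([len(batch) for batch in batches(range(60_000))])
```

Dropping repeated keys before slicing matters because a record with several flagged 020 fields would otherwise appear once per field.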

I spent a few weeks working on our stage server to develop the appropriate regex-based find and replace patterns to move qualifying data into a subfield q. I had to handle various edge cases: no parentheticals, only one half of the parenthetical, etc. Once I felt confident, I ran a batch of about 5000 on stage and QAd my results thoroughly. I then spent the next month running batches in production. I limited batch sizes and chose days when we didn’t have other jobs which would trigger big reindexes (you can only do so many jobs in a night or the reindex will take forever and throw off all the other cron jobs).
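A Python stand-in for that kind of find-and-replace, with the edge cases handled explicitly. This is a sketch under stated assumptions, not the patterns actually run in Data Control: the `|a`/`|q` delimiter notation follows the field display earlier in this post, `split_qualifier` is a hypothetical helper, dropping the parentheses in subfield q is a choice made for the example, and all qualifying text goes into a single subfield q.

```python
import re

# |a, an ISBN (digits and hyphens, optional X check digit), optional
# qualifying text up to the next delimiter, then any remaining subfields.
ISBN_FIELD = re.compile(
    r"^\|a(?P<isbn>[0-9][0-9-]*[0-9Xx])\s*(?P<qual>[^|]*)(?P<tail>\|.*)?$"
)


def split_qualifier(field):
    """Move qualifying text out of 020 subfield a into a subfield q."""
    match = ISBN_FIELD.match(field)
    if match is None or "|q" in field:
        return field  # not an ISBN in subfield a, or already converted
    # Strip whitespace, stray punctuation, and one or both parentheses.
    qualifier = match.group("qual").strip(" ():;,")
    tail = match.group("tail") or ""
    if not qualifier:
        return f"|a{match.group('isbn')}{tail}"
    return f"|a{match.group('isbn')}|q{qualifier}{tail}"


if __name__ == "__main__":
    for sample in ("|a0448021471 (paperback)", "|a044802147X (pbk.", "|a9780448021478"):
        print(sample, "->", split_qualifier(sample))
```

Returning the field unchanged when it already has a subfield q makes the function safe to re-run on a batch that was partly processed.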

Once the project was done, I was able to re-run queries in Analytics to ensure there weren’t any issues remaining.

I can also click into and update single records from the Data Control results page or set it to let me modify a particular field and paste repeating data into that field. The former is useful when there might be other related fields which need to be updated or I need more context. The latter is useful when only some of the results need to be updated or the person hasn’t yet got regex privileges on production.

Clashing Designs

So that’s what it looks like when things go well. Tech librarianship so often involves what Marshall Breeding called “Knitting Systems Together” that I almost don’t think about the ways I hop across tools. At most I feel a minor irritation. Recently, I ran across a case where the difference between system designs and who had permissions to access what was making a huge difference in my coworkers’ abilities to get their work done.

In theory, the data in Analytics should mirror what’s in Symphony, at most with a different structure. However, when a barcode is updated in Symphony (generally via Workflows), Analytics completely drops entries related to that barcode. The entries are not transferred to the new barcode. Data that’s still in the item record is retained, so we have the item last activity date, the circulation count (an incremented field), etc. But we can’t see the item transaction history.

Now, there were a couple things we could do about this… I’ll describe how system logs come into play in my next post!


  1. Still labeled Unicorn in some places. ↩︎

  2. Specifically, it’s MicroStrategy whose Wikipedia page starts off like any other data analytics software and then …pivots to Bitcoin. It’s Michael Saylor’s company, if that name means anything to you. ↩︎

  3. Timing could be more frequent, but I believe most have daily updates. ↩︎

  4. BLUEcloud is Sirsi’s next-gen browser client. To my knowledge, we still only use the circulation module and many people still use Workflows for circulation. ↩︎

  5. It’s extremely powerful, though extremely fragile – but that could also describe me, so I can only be so annoyed by it. ↩︎

  6. Transactions here meaning every time the item was scanned, some of which is available via Analytics. There is also transaction history in Symphony but it’s in logs. ↩︎

  7. It also supports two kinds of batch updates – a batch modify which lets you edit fields individually in a browser interface and a batch substitute which lets you run updates on fields using regular expressions. If you wanted to update a MARC 500 field on a set of items, for example, someone with batch modify permissions could display all 500 fields on the records, click Modify, and then paste a new text into any field they wanted to replace (while skipping 500 fields which didn’t match). Someone with regex permissions could find all notes matching the old note and sub it with the new note. ↩︎

  8. Why not do the whole search in Data Control? It is painfully slow compared to Analytics, especially for MARC searches. For the cases when Data Control is designed better for searching, I’ll export a set of keys for the overall records I want to search within and then perform it as a scoped search, which is much faster. ↩︎

  9. We only use the APIs for integrations not for reporting/updates/etc., so I didn’t list it above. Seltools are much faster and more powerful. ↩︎

Untitled / Ed Summers

Untitled

by John Summers

2026-05-26: URL Arguments in API Calls Can Cause Intermittent Temporal Violations While Replaying Archived Web Pages / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University


Michael L. Nelson

2026-05-26

Just over two months ago, I was at the Information Stewardship Forum 2026 at the Internet Archive, where I was fortunate enough to present a lightning talk about making copies of copies, entitled "The Disintegration Loops: Generational Loss in Web Archives".  During one of the breaks, Mark Graham asked Sawood Alam to take a look at a problem that had stumped the Wayback Machine support team.  I was sitting next to Sawood, and knowing my love for web archiving investigations, Mark invited me to take a look too.  The original inquiry:

Hi, everyone! Got a concerning report from a patron alleging that WBM "URLs were intermittently displaying the current version of the website instead of the archived version." The URLs in question are:

A quick check shows that when replaying these URLs, the content does resemble what is on the live web. For example, the text shown on the page references 2025 and 2026 updates, even though the captures are from 2024 - 2025. I've attached a screenshot of the 2025 capture appearing to show live web content as well as a printout/capture the patron provided of the same URL appearing to show the "actual" archive.


Sawood and I discovered that the problem is not that these URLs are sometimes displaying the live web (or at least not directly). The problem is that this seemingly simple "Terms of Use" page is unnecessarily complex, with the boilerplate legal text included via an API call.  The JavaScript that makes the call includes a number of superfluous URL arguments, including "screenWidth" and "screenHeight", which are probably appended to all API calls "just in case they are needed" (presumably the "Terms of Use" do not actually vary based on the size of the browser).  Thus, depending on the size of your browser, the legal text included in the page is potentially archived at different times, sometimes resulting in a temporal violation: a replay of an archived web page with subresources in a combination that did not exist at the time the top level page was archived.  


Although there are potentially a countably infinite number of archived "Terms of Use" pages, for the examples above there are two semantically interesting versions: one is marked (near the top, left-hand side) "Last Updated: January 18, 2024" and the other is marked "Last Updated: September 22, 2025".  Taking these "Last Updated" strings at face value, we would not expect the three URLs above (archived at "20240222221058" (February 22, 2024), "20241228224626" (December 28, 2024), and "20250531013827" (May 31, 2025)) to display "Last Updated: September 22, 2025". But sometimes they do – and sometimes they don't – and which archived version you get depends on the size of your browser.  
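The face-value check described here amounts to comparing two dates. A minimal Python sketch, assuming the 14-digit Wayback Machine timestamp format and the "Last Updated" wording quoted above; `capture_time`, `parse_last_updated`, and `is_from_the_future` are hypothetical helper names:

```python
from datetime import date, datetime

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def capture_time(timestamp14):
    """Parse a 14-digit capture timestamp such as 20240222221058."""
    return datetime.strptime(timestamp14, "%Y%m%d%H%M%S")


def parse_last_updated(text):
    """Parse a date such as 'September 22, 2025' without relying on locale."""
    month_name, day, year = text.replace(",", "").split()
    return date(int(year), MONTHS.index(month_name) + 1, int(day))


def is_from_the_future(timestamp14, last_updated_text):
    """True if the page claims an update later than its own capture date."""
    return parse_last_updated(last_updated_text) > capture_time(timestamp14).date()


if __name__ == "__main__":
    for ts in ("20240222221058", "20241228224626", "20250531013827"):
        print(ts, is_from_the_future(ts, "September 22, 2025"))
```

A check like this only catches content newer than the capture. It cannot tell whether an older "Last Updated" date is the right one for that capture or a stale one.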


First, as of the time of this writing, the live web still has the "Last Updated: September 22, 2025" version:


https://www.victoriassecret.com/us/site-terms-and-notices


What appears to be a relatively simple HTML page is unnecessarily complex, with nearly 200 subresources. The figure below shows the relevant portion of the call stack: the HTML page calls the cheekily named JavaScript "brastrap.js", which in turn calls the API at "api.victoriassecret.com".  


https://api.victoriassecret.com/categories/v15/page?...


For me, right now, the full live web URL is (emphasis added):


https://api.victoriassecret.com/categories/v15/page?categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a&brand=vs&isPersonalized=true&activeCountry=US&platform=mobile&deviceType=phone&platformType=ios&perzConsent=true&cid=&tntId=&screenWidth=701&screenHeight=605


Guessing at the URL arguments: 

  • categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a

    • I guess this hash identifies the "Terms of Use" page?

  • brand=vs

    • "vs" = Victoria's Secret?  I believe the parent company operates several affiliated brands, and perhaps the API serves all of them.   

  • isPersonalized=true

    • should this be "false"? – I don't have an account here

  • activeCountry=US

  • platform=mobile, deviceType=phone, platformType=ios

    • none of these are accurate; I'm on a Mac Air laptop. 

  • perzConsent=true

    • looks like a GDPR-related argument 

  • cid=, tntId=

    • tracking arguments (currently null)

  • screenWidth=701, screenHeight=605

    • these are the current dimensions of the active window in my Chrome browser 


It's the last two arguments, "screenWidth" and "screenHeight", that cause the intermittent behavior the original users noticed.  
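One way to see the effect is to compare the API URLs with and without those two arguments. The Python sketch below is illustrative only: `VOLATILE_ARGS` and `without_screen_size` are hypothetical names, and treating the two arguments as irrelevant to the response is the presumption stated above, not something the API documents. It is not a description of how the Wayback Machine indexes URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Arguments presumed (not documented) to have no effect on the response.
VOLATILE_ARGS = {"screenWidth", "screenHeight"}


def without_screen_size(url):
    """Return url with the window-size arguments removed, order preserved."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in VOLATILE_ARGS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    base = ("https://api.victoriassecret.com/categories/v12/page"
            "?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&tntId=")
    first = base + "&screenWidth=508&screenHeight=593"
    second = base + "&screenWidth=565&screenHeight=605"
    print(first == second)
    print(without_screen_size(first) == without_screen_size(second))
```

As full strings the two URLs differ, so an archive that keys captures on the exact URL treats them as unrelated resources even if the responses are identical.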


First, let's consider the page archived on February 22, 2024 ("20240222221058"), which clearly shows the "Last Updated: September 22, 2025" string:


https://web.archive.org/web/20240222221058/https://www.victoriassecret.com/us/site-terms-and-notices


And since the live web still has "Last Updated: September 22, 2025", this is what caused people to think they were getting a live web version (more on that in a bit).  First of all, the Wayback Machine's "About this capture" link does not help; it shows only some of the subresources (improving its function is a task for another time):


"About this capture" lists only some of the subresources, and not the problematic api.victoriassecret.com page. 


Sawood discovered the API URL first. It's well-obfuscated, so it's not a surprise that tech support staff did not find it immediately.  We were sitting side by side, each using our own laptops, and he's much smarter than me and he's always going to win that race. But I noticed that for me, the page seemed to be saved right then, just a minute or two before, whereas he saw that it was archived a few days before (it was then March 19, 2026).  That was odd, but the next session started and I had to stop. 


The 2024 archived version of the page uses a "/v12/" version of the API endpoint (note: this is a common but wrong way to version an API), but it's similar to the 2026 live web example above:


https://web.archive.org/web/20260319160602/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=508&screenHeight=593


In particular, the "/v12/" endpoint remains functional, even though the live web HTML & brastrap.js access the "/v15/" version.  Checking the Wayback Machine directly confirmed that this was indeed the first time that URL had been archived:


https://web.archive.org/web/20260401000000*/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=508&screenHeight=593


Although Sawood found the problem URL, and we confirmed it was archived in March, 2026 (and thus displayed the "Last Updated: September 22, 2025" string), it bothered me that he had an earlier archival time than I did (March 14, 2026 vs. March 19, 2026).  After the next session ended, I returned to this problem.  I changed the size of my browser, and was able to force another new archived version (reproduced on March 22, 2026 below):


The highlighted text shows: 

https://web.archive.org/save/_embed/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=565&screenHeight=605 


Although it's beyond the scope of this post, the Wayback Machine's Save Page Now has a "/save/_embed/" API that allows the Wayback Machine to "patch" the archive with missing URLs from the live web.  In this case, the version of the API response ending with "&screenWidth=565&screenHeight=605" was "missing" from the Wayback Machine, so it patched the archive from the live web, which still displays the "Last Updated: September 22, 2025" string, despite the main HTML page being archived in February, 2024.  So in essence, the Wayback Machine was displaying the live web version, immediately after it was saved to the Wayback Machine.  Presumably the "Terms of Use" page changes slowly, but this behavior would be more noticeable if the "Last Updated" string were updated, say, every minute. 
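The patching behavior can be modeled with a toy lookup keyed on the exact URL string. This Python sketch is a deliberately simplified model, not the Wayback Machine's implementation: `ToyArchive`, `replay`, and `fetch_live` are hypothetical, and the real service also selects among multiple captures of the same URL by time, which is ignored here.

```python
class ToyArchive:
    """Toy model: one capture per exact URL, patched from the live web on a miss."""

    def __init__(self, fetch_live):
        self.captures = {}            # exact URL string -> (timestamp, body)
        self.fetch_live = fetch_live  # callable standing in for the live web

    def replay(self, url, now):
        """Return (timestamp, body) for url, saving a new capture if none exists."""
        if url not in self.captures:
            self.captures[url] = (now, self.fetch_live(url))
        return self.captures[url]


if __name__ == "__main__":
    archive = ToyArchive(lambda url: "Last Updated: September 22, 2025")
    archive.captures["page?screenWidth=1400&screenHeight=900"] = (
        "20240222221058", "Last Updated: January 18, 2024")
    print(archive.replay("page?screenWidth=1400&screenHeight=900", "20260322000000"))
    print(archive.replay("page?screenWidth=565&screenHeight=605", "20260322000000"))
```

In this model, two visitors with different window sizes request different keys, so one can hit an existing capture while the other triggers a fresh one, which is consistent with the differing archival times seen side by side.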


A call to the CDX API confirmed that there were a variety of screenWidth and screenHeight combinations archived (horizontally scroll to the right in the gist below to see the combinations):



In fact, by inspection, there are at least two chances to get the wrong version. If your screen size is "screenWidth=1600&screenHeight=1000", you will get a version of the page that has the string "Last Updated: February 7, 2023", a temporal violation reaching into the past instead of the previously described version that is a temporal violation from the future.  A screen size of "screenWidth=1400&screenHeight=900" will produce the right result ("Last Updated: January 18, 2024"), and a screen size of "screenWidth=1440&screenHeight=900" will produce a different wrong result ("Last Updated: September 22, 2025"). And as shown above, a screenWidth and screenHeight combination not already archived will cause the Wayback Machine to be patched from the live web.  Furthermore, if/when the "/v12/" live web API endpoint is deprecated, then unarchived size combinations will just cause the page replay to silently fail, and most people won't understand why.   
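A lookup along those lines can be sketched in Python. The exact CDX call used above is not reproduced here, so the parameter choices (`matchType=prefix`, `output=json`, `fl=timestamp,original`) are assumptions, and `cdx_query` and `captures_by_screen_size` are hypothetical helpers. The sketch only builds the query URL and groups already-fetched rows; it makes no network request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(url_prefix):
    """Build a CDX query URL listing every capture whose URL starts with url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def captures_by_screen_size(rows):
    """Group (timestamp, original_url) rows by their screenWidth/screenHeight.

    With output=json the first row of a response is a header row, which the
    caller should drop before passing rows in.
    """
    sizes = {}
    for timestamp, original in rows:
        query = parse_qs(urlsplit(original).query)
        size = (query.get("screenWidth", [None])[0],
                query.get("screenHeight", [None])[0])
        sizes.setdefault(size, []).append(timestamp)
    return sizes


if __name__ == "__main__":
    print(cdx_query("api.victoriassecret.com/categories/v12/page"))
```

Any window size missing from the resulting dictionary is one that would send the replay to the live web instead of to an existing capture.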


In summary, this seemingly simple "Terms of Use" page is really quite challenging in practice:


  • The API call is not easily discovered, and the "About this capture" service does not show the API URL (and many of the other nearly 200 URLs of subresources in this page).

  • The API has a raft of (arguably) unnecessary URL arguments that do not change the response and cause the Wayback Machine to patch the archive from the live web. 

  • Because the temporally violative subresource is JSON and not, say, a JPEG, one can't simply right-click on the subresource and inspect when it was archived. 


We've encountered synchronization problems with HTML and JSON before (e.g., "Right HTML, Wrong JSON" (JCDL 2023), "Challenges in replaying archived Twitter pages" (IJDL 2024)), but the implementation complexity found in news outlets and social media was to be expected: the advanced UI features that make these sites engaging (e.g., auto-updating, infinite scroll, embedded media, personalized content) are the same features that make archival replay difficult. Without the "Last Updated: …" string, the problem would have been much harder to notice and diagnose. The seemingly intermittent nature, where you'd get a temporally coherent replay only if your browser was the same size as the previously archived responses, made the investigation especially challenging. 


Who pays attention to their browser's exact width and height? In this case, they were the keys to solving this puzzle.


–Michael 

Mark Graham welcoming attendees

My lightning talk

Me in front of the Internet Archive

Dr. Sawood Alam, me, Dr. Jian Wu

Dancing mad with sandboxing / Xe Iaso

Cadey

What is an operating system, really?

Aoi

I mean, isn't it obvious? It's something like FreeBSD or Fedora that has a kernel, userspace, graphics stack, core set of programs, and everything else you need to be able to use a computer. Is this a trick question?

Numa

Well it depends, is the Nintendo Switch OS an operating system? It doesn't have a shell in the same way FreeBSD does. Is SEL4 an OS? It doesn't ship with core utilities. Is Linux an OS? Is Windows an OS?

Aoi

Oh gods here we go again…

The definition of an operating system gets really fuzzy when you start looking at the edges of it, but let's say that an operating system is any part of a computer system that doesn't involve pure math. When you print to the screen, render 3D graphics, connect to the internet, and write to files, your code calls into the underlying system to do that work. These system calls are defined by your operating system and are exposed as functions*.

Mara

Okay they're not actually functions, but they quack enough like functions that you can treat them like functions and not have to worry about the details too much.

System calls are injected into each operating system process via a process kinda like how you inject dependencies into your applications for database sessions or object storage operations.

Bashing your head into the wall

A while ago a new JavaScript package got into the meme sphere at work: just-bash. It's a sandboxed environment with a shell interpreter that was originally intended for use with AI agents after its author observed that AI agents know how to use a tool called bash a lot better than a tool called search_documentation. This is backed by a "fake" shell with "fake" core utilities (cat, ls, etc, hereinafter coreutils) so that when an agent decides to rm -rf /, nothing important actually leaves the room. One of my coworkers made @tigrisdata/agent-shell on top of this that uses Tigris as its storage layer.

This is great for people in the JavaScript ecosystem, but I am not mainly a JavaScript developer. I really wanted to play with it so I started thinking about what it would take to have something like this in Go. mvdan's shell package makes this a heck of a lot easier, meaning that this "fake" shell would be powered by a real shell instead of either porting half of bash to JavaScript or making up hopefully-compatible behaviour.

After a bunch of thought, hacking, and a spot of vibe coding while I did some Dawntrail extreme mount farms, I ended up with Kefka, a "fake" shell with coreutils implementations that lets you put your programs in clown jail. This package lets you add a sandboxed-in-userspace shell to your existing projects without shelling out to the actual implementations of coreutils on your machine.

Mara

The name is inspired by Kefka Palazzo, the final boss of Final Fantasy VI. Need to chain uncontrollable demons? Use the power of a mad god driven to the brink of insanity with raw access to magic! What could possibly go wrong!

So I did that

So after some thought, I came up with this interface for the "commands" to use: Execer. This takes process context and passes it as an argument to a function named Exec. Exec then does whatever the process needs it to (list files, write to stdout, etc.) and returns an error if things went wrong and no error if things didn't.

type ExecContext struct {
	Stdin          io.Reader
	Stdout, Stderr io.Writer
	Dir            string
	Environ        expand.Environ
	FS             billy.Filesystem
	// Runner is the active shell runner. Commands that need to dispatch a
	// child command (for example, `time CMD`) should call Runner.Subshell()
	// and re-enter the shell so the call goes through the same exec handler
	// chain instead of poking at the registry directly. May be nil in
	// embedders or tests that have not wired up a runner.
	Runner *interp.Runner
}

type Execer interface {
	Exec(ctx context.Context, ec *ExecContext, args []string) error
}
        

This is where I started vibe coding things, mostly via a skill that ports a just-bash command to the Execer interface and filesystem in Go. just-bash itself looks vibe coded from help output and manpages; I tried to go further and stay POSIX compatible, down to matching flag syntax (and in some cases output formats). If your muscle memory fails you, it's a bug in my book.

Aoi

If I recall, some POSIX utilities like false aren't usable as Go identifiers, how did you handle package names for that?

Cadey

By naming them things like falsecmd:

package falsecmd

// ...

type Impl struct{}

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	return interp.ExitStatus(1)
}
        

Honestly the implementations of true and false are my favourite part of this implementation. Here's the implementation of true:

package truecmd

// ...

type Impl struct{}

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	return nil
}
        

This is a fully POSIX compliant implementation of true! Here's the relevant part of the spec if you don't believe me:

true - return true value

SYNOPSIS

true

OPTIONS

None.

OPERANDS

None.

Really, check out the POSIX spec for true. It's trivial to implement, here's a oneliner to implement it in Linux:

touch ./true && chmod +x ./true
        

I made an operating system*

This is basically an operating system: it provides interfaces for programs (well, in this case functions) to get input from a user, send output to a user, interact with a filesystem, and more. Eventually I want to add networking via a network stack on ExecContext, probably with tsnet or wireguard-go's netstack package for the user-level side. Maybe there's room for adding CEL based network filters there too.

Porting applications with WebAssembly

Once I got basic coreutils working, I thought it would be fun to get Python, jq, and ripgrep working. From previous experimentation back in the strawberry era of AI, I had already gotten Python running in WebAssembly via wazero. This used the stdlib io/fs#FS interface to allow me to inject virtual filesystems into the WebAssembly context. I used this to isolate my chatbot's filesystem state so that it (hopefully) wasn't able to delete anything important by accident.

io/fs#FS has methods for the important stuff, and runtime interface assertions let you bridge the gap for things like writes. But it was really designed for embedded filesystems, and writes get hairy fast once you're talking to object storage or anything that isn't a tree of bytes on disk.

At some point I hit a wall and had to switch from io/fs#FS to billy, another filesystem interface that I think predates the standard library one. This gives you a bunch more methods that map a lot closer to filesystem semantics in ways that coreutils crave. The interface was also mostly compatible with io/fs#FS so most of the hard part was really changing out the type and then chasing down compiler errors until I found enough of a pattern to have Opus automate the rest of it.

From there it was a matter of adapting billy's filesystem to wazero's experimental sys interface. Mostly glue code, except where I had to translate Go errors into POSIX errno values. I had to read the POSIX spec, the WASI spec, and the wazero source to figure out how to map errors between the two worlds. I think I'm at least 95% correct, which is likely within the margin of porting error.

Adapting that codeinterpreter/python library to the new interface was mostly straightforward, and I ended up with a flow like this:

// from https://tangled.org/xeiaso.net/kefka/blob/main/command/internal/python3/python3.go

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	fsConfig := wazero.NewFSConfig().
		(sysfs.FSConfig).
		WithSysFSMount(billyfs.New(ec.FS), "/")

	config := wazero.NewModuleConfig().
		// Pipe ExecContext stdio
		WithStdin(ec.Stdin).WithStdout(ec.Stdout).WithStderr(ec.Stderr).
		// Pipe argv
		WithArgs(append([]string{"python3"}, args...)...).
		WithName("python3").
		// Pipe filesystem
		WithFSConfig(fsConfig).
		// Pipe system time
		WithSysNanosleep().WithSysNanotime().WithSysWalltime()

	mod, err := runtime.InstantiateModule(ctx, compiled, config)
	if err != nil {
		// Fit the square peg into the round hole
		if exitErr, ok := errors.AsType[*wsys.ExitError](err); ok {
			if code := exitErr.ExitCode(); code != 0 {
				return interp.ExitStatus(uint8(code))
			}
			return nil
		}
		return err
	}
	return mod.Close(ctx)
}
        
Mara

See? The dependencies such as stdin, stdout, and stderr get injected into the WebAssembly guest. Wazero also makes you inject the implementation of time for boring reasons involving deterministic computing, but I'm sure you can see the ways things hook in. This basic dependency injection flow is how things like the linuxulator in FreeBSD or the old version of the Windows Subsystem for Linux work (WSL1 before it was made into a Linux VM with WSL2). The table of system calls and filesystem context is effectively an argument to the process.

Same trick got me ripgrep and jq. jq was annoying because wasi-sdk doesn't love jq's (ab)use of cmake; however 30 or so minutes of tweaking compiler flags got me a binary that works enough.

I could see it being pretty easy to port over arbitrary programs to Kefka using WebAssembly like this. There's just one small problem: WASI preview 1 (also known as WASI 0.1) doesn't allow you to open arbitrary network sockets. This has been a huge pain in practice (it means you can't do HTTP requests, database connections, or other common internet things from inside the WASM sandbox) and future work probably would include adapting wazero to use wasix instead of WASI 0.1.

Using filesystems that don't exist

OK, that handles filesystems that (arguably) exist, like the btrfs volume on my dev box. What about filesystems that don't? For the sake of argument, let's say you want this fake shell to interact with object storage as its main filesystem. At some level all you need to do is adapt the billy interface to object storage using something like storage-go.

Cadey

Disclaimer, I work at Tigris and developed this library for them. It's basically the S3 client with more methods to handle additional Tigris features like forks and snapshots. I'll be writing more about it soon.

After finding a basic implementation of an S3 -> Billy adapter, I vendored it into the Kefka repo and swapped out the "real" filesystem in cmd/kefka for an s3fs implementation pointed at a sample Tigris bucket. From there it was down to an iterative process of running commands, finding feature gaps when errors showed up, implementing them, fuzzing, and making sure things work mostly the same against Tigris as they do against a local filesystem.
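One of the feature gaps you hit early is that object storage has no directories, only flat keys. Here is a sketch of how a directory listing can be derived from keys and a `/` delimiter. It works on an in-memory slice instead of a real bucket, and `readDir` is a hypothetical helper, not the vendored adapter's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// readDir returns the immediate children of dir given a flat list of object
// keys. Entries ending in "/" stand for "directories" (common prefixes).
func readDir(keys []string, dir string) []string {
	prefix := strings.Trim(dir, "/")
	if prefix != "" {
		prefix += "/"
	}
	seen := map[string]bool{}
	var out []string
	for _, k := range keys {
		rest, ok := strings.CutPrefix(k, prefix)
		if !ok || rest == "" {
			continue
		}
		name := rest
		if i := strings.IndexByte(rest, '/'); i >= 0 {
			name = rest[:i+1] // collapse deeper keys into one directory entry
		}
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	keys := []string{"README.md", "samples/hello.js", "samples/py/a.py"}
	fmt.Println(readDir(keys, "/"))        // [README.md samples/]
	fmt.Println(readDir(keys, "/samples")) // [hello.js py/]
}
```

A real adapter would push the prefix and delimiter into the list request so the server does the grouping, but the shape of the result is the same.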

WASI is cursed: it has no process-level "current working directory," which most programs assume exists. You patch around it by passing a CWD envvar, or just use absolute paths. I haven't hit anything broken in casual use, but expect rough edges. Here be dragons and this code may be known by the state of California to cause cancer.
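A minimal sketch of the "just use absolute paths" workaround: resolve every relative path against a working directory pulled from the environment before it reaches the guest. The `lookupEnv` and `resolve` helpers are hypothetical, and the choice of `PWD` as the variable name is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// lookupEnv returns the value for key from a KEY=VALUE list, or "" if absent.
func lookupEnv(env []string, key string) string {
	for _, kv := range env {
		if v, ok := strings.CutPrefix(kv, key+"="); ok {
			return v
		}
	}
	return ""
}

// resolve turns name into a clean absolute path relative to cwd.
// A missing or relative cwd is treated as the root.
func resolve(cwd, name string) string {
	if cwd == "" || !path.IsAbs(cwd) {
		cwd = "/"
	}
	if path.IsAbs(name) {
		return path.Clean(name)
	}
	// path.Join cleans the result, so ".." can never climb above "/".
	return path.Join(cwd, name)
}

func main() {
	env := []string{"HOME=/", "PWD=/samples"}
	cwd := lookupEnv(env, "PWD")
	fmt.Println(resolve(cwd, "hello.js"))     // /samples/hello.js
	fmt.Println(resolve(cwd, "../etc/x"))     // /etc/x
	fmt.Println(resolve(cwd, "../../../top")) // /top
}
```

This uses `path` rather than `path/filepath` on purpose: the sandbox speaks forward slashes no matter what the host operating system is.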

Why does it have to use the command line?

Once everything got working with s3fs and a local shell, I wondered how hard it would be to make this work as an SSH server using the github.com/gliderlabs/ssh package. Hooking things up was pretty easy:

func HandleSSH(sess ssh.Session) error {
          // Convenience variables for SSH session values
          var stdout io.Writer = sess
          var stderr io.Writer = sess.Stderr()
          var stdin  io.Reader = sess
          ctx := sess.Context() // cancelled when the user disconnects
        
          // Kefka command registry with coreutils/python/jq/etc
          commands := registry.New()
          coreutils.Register(commands)
          wasmprog.Register(commands)
        
          // Base envvars for all programs, needed by POSIX
          env := expand.ListEnviron(
            "HOME=/",
            "PWD=/",
            "IFS=\n",
            "HOSTNAME=localhost",
            "USER="+sess.User(),
            // not strictly required, but just-bash sets it
            "MACHTYPE=x86_64-pc-linux-gnu",
          )
        
          // Create shell engine
          sh, err := interp.New(
            // Set the "interactive" flag so the shell expands aliases
            interp.Interactive(true),
            // Forward our envvars
            interp.Env(env),
            // Wire up stdio
            interp.StdIO(stdin, stdout, stderr),
            // Change the shell exec handler such that it's constrained to the
            // Kefka registry.
            //
            // Strictly speaking you don't have to do this, but if you don't
            // then any time the registry doesn't have a command
            // implementation, interp falls back to its default ExecHandler that
            // executes the command as a subprocess. This is almost certainly
            // not what you want.
            interp.ExecHandlers(constrainToRegistry(commands)),
            // Wire up per-command pwd state to the filesystem implementation
            interp.CallHandler(billysh.CallHandler(commands, fsys, stdout, stderr)),
            // Handle shell-level filesystem I/O (redirects, glob expansion, etc)
            interp.StatHandler(billysh.FsysStatHandler(commands, fsys)),
            interp.FsysOpenHandler(billysh.FsysOpenHandler(commands, fsys)),
            interp.ReadDirHandler2(billysh.FsysReadDirHandler(commands, fsys)),
          )
          if err != nil {
            return err
          }
        
          // Read shell commands
          parser := syntax.NewParser()
          fmt.Fprintf(stdout, "$ ")
        
          // Split input into commands
          for stmts, err := range parser.InteractiveSeq(stdin) {
            if err != nil {
              return err
            }
        
            if parser.Incomplete() {
              fmt.Fprintf(stdout, "> ")
              continue
            }
        
            for _, stmt := range stmts {
              err := sh.Run(ctx, stmt)
              if sh.Exited() {
                return err
              }
            }
        
            // Show prompt
            fmt.Fprintf(stdout, "$ ")
          }
        
          return nil
        }
        

The real handler is much messier because Python's REPL needs careful buffering, Ctrl-C has to actually cancel things, and pty wiring is its own can of cans of worms. None of that shows up if it's working. Tab completion and readline polish are easy enough; I'll let you wire those up as an exercise for the reader.

If you want to try it today, you can ssh into sophia.xeiaso.net:

$ ssh sophia.xeiaso.net
        

You'll get an isolated sandbox in your own bucket fork/branch. Every ls is a ListObjectsV2 against the bucket. Every qjs or python3 runs WebAssembly on the server, wired to that same bucket.

$ cat ./samples/hello.js
        console.log("Hello, world!");
        $ qjs ./samples/hello.js
        Hello, world!
        

The demo bucket is seeded with examples. You'll probably have to poke around to find everything. Worst case, run help.

Cadey

I should really hook up session recording to this.

I want more experimental WebAssembly hacks like this to exist. I'll keep poking at it.

Put your programs in clown jail

With some effort, yeet could use Kefka's shell utilities to run Anubis builds on Windows; and if management ever makes you babysit AI agents, clown jail is a decent answer.

The code lives on Tangled. I'm wiring it into an agent harness so I can automate small tools against a local model (I'm loving Qwen3-36B-A3B).

There's a sister post on the Tigris blog that goes deeper into the AI-agent angle and the porting work using Claude Code. If you want, you can check it out here:

Tigris Data: Give your agents disposable environments in Go. Kefka is a userspace shell sandbox in Go that gives every AI agent its own copy-on-write Tigris bucket fork plus Python, jq, and ripgrep via WebAssembly.

Moving Beyond Willpower: A New Direction for Media Literacy Instruction / In the Library, With the Lead Pipe

In brief

Academic librarians and others often engage with media literacy instruction by promoting fact-checking strategies, such as lateral reading or Mike Caulfield’s SIFT. Evidence shows that these strategies are valuable and can be effective, but they all ultimately rely on individual students to use willpower to overcome cognitive habits, biases, strong parasocial relationships with content creators, the power of algorithms, and other challenges to fact-checking content in the moment. This paper offers an alternative approach that instead encourages librarians to support students in intentionally redesigning their information environments to improve the quality of information that they encounter in the first place.

By Mandi Goodsett

“The task of breaking a bad habit is like uprooting a powerful oak within us. And the task of building a good habit is like cultivating a delicate flower one day at a time.” – James Clear

In a 2024 study conducted by the News Literacy Project, the organization found that 80% of the teen participants believed that journalists fail to produce more impartial information than other online content creators, and 69% said that news organizations intentionally make their content biased to advance a particular viewpoint. When the News Literacy Project followed up with these young adults a year later, they found that most of them believe that trustworthy, unbiased news is rare or maybe doesn’t even exist (2025).

Pew Research found, through a series of focus groups, that Americans don’t always agree on what constitutes a “journalist” or “news media,” and young adults are more likely than older adults to call “new media” platform hosts, such as podcasters and social media creators, “journalists” (Eddy et al., 2025). Overall, younger participants were less likely than older adults to even care whether the news they consume comes from a journalist. The investigation found that Americans are concerned that, besides maybe a few reliable ones, journalists are concerned with “clicks, eyeballs, money, things like that, and they don’t necessarily mind tweaking the truth to suit their audience or their advertisers” (quoted in Pew Research, 2025).

These statistics are significant because cynicism about standards-based news and other traditionally authoritative institutions has many negative impacts. First, news cynicism can lead to news disengagement, which pushes information consumers to less reliable platforms (Ahmed et al., 2025; Fletcher, et al., 2024; Mont’Alverne, 2022) and contributes to erosion of trust more broadly in institutions like voting (Park, et al., 2025; Raffio, 2025). When people disengage, news sources themselves are threatened by obsolescence, and this threatens their role as a watchdog and a keystone of democratic societies (Haider & Sundin, 2022). News cynicism makes it difficult for accurate information to reach people and, paradoxically, makes people more vulnerable to misinformation (Ahmed et al., 2025; Hasell & Halversen, 2024). Individuals may feel anxious, depressed, and helpless about their world, leading to a spiral of disengagement (Hasell & Halversen, 2024). News cynicism also fuels societal division and threatens democracy (Cappella & Jamieson, 1996; Valgarðsson et al., 2025). Widespread distrust in institutions such as the government, science, public authorities, and the press is a risk to media literacy, democracy, civil discourse, and our sense of agency.

Academic Librarians and Media Literacy Instruction

One strategy for helping students and others improve their media consumption is to teach them media literacy skills. Media literacy is generally thought to be the ability to access, evaluate, analyze, and create media messages (Aufderheide, 1993), although definitions vary considerably between researchers and practitioners (Fleming, 2014; Hobbs, 1998). Media literate individuals have the skills to identify media sources and messages that are unreliable, and, perhaps more importantly, craft an overall media diet that is more likely to consist of reliable information.

Academic librarians are interested in and possess relevant expertise to teach students media literacy skills that are relevant in academic and non-academic settings. Many librarians have explored tactics for teaching students source evaluation skills that move beyond the CRAAP test (Currency, Relevance, Authority, Accuracy, Purpose), such as the SIFT method (Stop, Investigate, Find, Trace), created by Mike Caulfield (2019), or lateral reading, popularized by the Civic Online Reasoning organization (Digital Inquiry Group, n.d.). Caulfield’s SIFT method provides a more up-to-date approach to source evaluation by offering strategies that are more efficient, straightforward, and applicable in a wide variety of contemporary information settings (Bull, 2021). “Lateral reading,” which is a key component of SIFT, involves leaving the source that is being evaluated and opening new browser tabs to investigate what other Internet sources report about the site and its claims (Wineburg & McGrew, 2019). Research has shown that the SIFT method and lateral reading result in more accurate student source evaluation (Bobkowski & Younger, 2020; Breakstone et al., 2021; Brodsky et al., 2021). These techniques reflect a better understanding of the modern online information environment than simplistic checklist strategies. However, they still expect students to avoid misinformation through careful self-control and self-monitoring.

Misinformation is an interdisciplinary problem with significant complexities. As Sullivan has argued, librarians have historically focused on media literacy instruction strategies that neglect the psychology of how people interact with information, and the field of library and information sciences is somewhat siloed in its exploration of source evaluation instruction (2019). For example, heuristics, systems thinking, mental models, and cognitive biases all play a role in how and why people adopt misinformed beliefs. Emotions also influence the ways that individuals evaluate information (Hewitt, 2023; Hicks & Lloyd, 2021), yet they play a minor role in most library source evaluation instructional strategies. Academic librarians may have a role in combatting misinformation, but we should proceed, as much as possible, guided by research conducted across disciplines (Saunders, 2025). As an example, academic librarians have often focused their source evaluation teaching on investigation strategies and fact-checking skills. These skills are very important, and we shouldn’t abandon them. But there are many reasons, informed by research outside of Library and Information Science (LIS), why reactive strategies that rely on individual willpower are destined to be difficult to maintain. 

Challenges of Fact-Checking and Other Traditional Source Evaluation Techniques

Evidence shows that, globally, trust in institutions is decreasing, including in democratic societies (Kavanagh & Rich, 2018; Gil de Zúñiga & Diehl, 2019). The consequences of this could be severe, as many scholars posit that trust in institutions is an important pillar of democracy (Haider & Sundin, 2022). There are also a number of well-studied examples of how bad actors can sow doubt in institutions, such as academia, to achieve their own ends (Haider & Sundin, 2022). This has played out in the case of the tobacco industry and fossil fuel companies; in both cases, the science is clear, but raising uncertainty can be enough to sway consumers to take actions that are not in their best interests (Oreskes & Conway, 2010). All of this said, when society’s institutions become corrupt or unreliable, or when institutions are systematically unfair to one’s group or identity, distrust in institutions is often justified (Haider & Sundin, 2022). So while dismissing institutionally-backed information in favor of persuasive individuals is risky, confidently pointing to institutions as always trustworthy is also unlikely to be effective. Easy-to-apply source evaluation checklists that are meant to be used across all contexts and blind trust in compelling individual voices both fail to reflect the complexity of information environments. 

While media literacy that relies on individual fact-checking skills is very important, there are many reasons why a willpower approach is likely to have limited success. The section below explores these limitations from internal factors, to external factors, and finally, to systemic factors.

Limits of Fact-Checking: Internal Factors

The intuitive solution to the problem of misinformation is to let media consumers know that a piece of information is untrue. However, there is mounting evidence that retractions and corrections have little effect on whether someone will make decisions based on misinformation (Seifert, 2014; Thorson, 2016; Zhou & Shen, 2024). There are many potential reasons for this, but one that almost certainly plays a role is the effect of cognitive bias. For example, epistemic egocentrism is a cognitive bias that occurs when individuals fail to consider their own privileged information when imagining the perspectives of others (Royzman et al., 2003; Zhou & Shen, 2024), which can cause people to judge their own source evaluation skills highly and blame the problem of misinformation’s spread on others. Closely related is blind spot bias, which is the belief that one is immune to bias (Pronin, et al., 2002). Confirmation bias is also relevant to the adoption and spread of misinformation; this bias is the tendency to seek out and remember information in ways that favor existing beliefs (Nickerson, 1998; Oswald & Grosjean, 2004). A consequence of confirmation bias is selective exposure, or a person’s proclivity to preferentially seek and engage with information that is in alignment with their existing values, beliefs, or attitudes (Zhou & Shen, 2024). These cognitive biases, which can occur whether or not the person has a pre-existing attitude about the misinformation, may lead people to dismiss corrections, assume they are correct in situations where there is substantial conflicting evidence, or, by consciously or subconsciously designing their information environment, rarely encounter threats to their existing worldview.

Research into the mechanisms that cause misinformation adoption to persist (sometimes called the “continued influence effect of misinformation”) shows that corrections can fail in their effectiveness when they leave a gap in someone’s mental model, especially when the misinformation fills that gap in a more satisfying way (Johnson & Seifert, 1994, p. 1420). Retrieval errors can also contribute; for example, when misinformation is retrieved from memory without the “false” label, or when misinformation is retrieved more readily than its correction (Ecker et al., 2011; Gordon et al., 2017; Lewandowsky et al., 2012). Because the misinformation and correction both exist in memory, deliberate, effortful thinking is necessary to retrieve corrections from memory, and natural cognitive efficiency processes can make this retrieval difficult or unlikely (Kendeou & O’Brien, 2014; Pennycook & Rand, 2019). These neurological processes make debunking misinformation incredibly challenging once it has been adopted into someone’s mental model. 

Information consumers are also often very confident about their beliefs, even if their knowledge about the topic at hand is, upon investigation, quite shallow. While perceptions of widespread misinformation increase, Americans are confident that they have the skills to identify this unreliable content. In 2016, a study found that 84% of participants were confident in their ability to spot “fake news” and 64% of those same participants believed that fabricated news stories caused significant confusion for Americans (Barthell et al.). Who is being confused by these stories? Not them, the participants in the study seemed to say; it’s everyone else. This points to an overconfidence that individuals have in their own ability to detect false information, contributing to the problem of misinformation’s spread.

One cognitive bias that helps to explain this phenomenon is the Dunning-Kruger effect, whereby individuals with limited knowledge of a subject fail to accurately assess their own level of expertise (Dunning, 2011). For example, research has shown that overconfidence in news judgments is associated with higher susceptibility to false news across a variety of topics, from autism awareness to nutrition claims (Lyons, et al., 2021; Motta, Callaghan, & Sylvester, 2018; Peng & Shen, 2025). Along the same lines, the “nobody-fools-me perception” is a cognitive bias whereby someone is overconfident in their ability to detect misinformation, especially as compared to others (Martinez-Costa et al., 2022). This leads people to make claims like “Many people haven’t learned to check facts” but fail to recognize their own media literacy deficiencies (Martinez-Costa et al., 2022).

Relatedly, the illusion of explanatory depth occurs when people believe they understand a complex topic more than they actually do upon further probing (Rozenblit & Keil, 2002; Sloman & Fernbach, 2017). Humans move through the complex, nuanced, and dangerous modern world by holding a naive intuition that they understand how the world around them works. This, combined with poor knowledge about the extent of our knowledge, causes a pervasive belief that we can explain the world around us even when we can’t (Bailey, 2021). The illusion of explanatory depth can cause people to adopt false beliefs confidently, not realizing their shallow understanding of the topic should cause them to question their self-assured stance.

It’s important to note that a 2025 study found that exposing participants to false news not only caused them to become overconfident in their judgments about whether news stories were true or false, it also fueled news mistrust (Altay et al.). This study demonstrates how news environments themselves contribute to issues that spur misinformation’s spread, such as overconfidence and cynicism. Along the same lines, some researchers worry that media literacy interventions that focus on “misinformation’s omnipresence” risk heightening the salience of misinformation as a threat to society and individuals, ultimately increasing news mistrust (van der Meer, Hameleers, & Ohme, 2023). Misinformation warnings alone can provoke a deception-bias, whereby people assume deception in news messages, rather than defaulting to a trust-bias as they often do in other contexts (van der Meer, Hameleers, & Ohme, 2023).

Limits of Fact-Checking: External Factors

While it’s clear that cognitive limitations make corrections to misinformation difficult or impossible, other researchers argue that misinformation itself is not as widespread of a problem as is commonly believed. They argue that the current perceived prevalence and “panic” about misinformation is a kind of “historical amnesia” (Stecula, 2025). The spread of misinformation is nothing new, and misleading messages have been created and spread for hundreds of years, from anti-vaccination movements of the early 1800s to disbelief about the real cause of JFK’s assassination, all of which occurred before the invention of social media (Stecula, 2025). What is different about the spread of false messages today is their overt support by important societal leaders and the new visibility their small groups of adherents have due to social media. These changes have allowed society to diverge into competing knowledge communities with unique standards for expertise, source evaluation, and, ultimately, defining truth (Stecula, 2025). These new, ideologically isolated communities with extreme views do not represent the majority of the population, but may seem to, given the way social media can amplify their messages. Fact-checking is likely to have limited reach and impact in these isolated, closely-knit communities.

Even in the rare cases when overtly false information is spread outside of isolated bubbles, fact-checking as a strategy for stopping its spread has limitations. Some argue that most fact-checking is ultimately reactive, constrained by scale and speed, and destined to always be catching up with rapidly changing misinformation messages (Wack, Duskin, & Hodel, 2024). Fact-checkers themselves worry that fact-checking risks drawing additional attention to misinformation and has limited impact for cognitive reasons; one said, “I can only convince those already convinced” (Westlund et al., 2024).

Another assumption of fact-checking is that knowledge of the truth impacts people’s behaviors in positive ways. However, research about climate change misinformation, for example, found that even when people have accurate beliefs about climate change, it has limited impact on their willingness to engage in pro-environmental behavior (Spampatti, 2025). Additional research has shown that, for some individuals, feeling and appearing independent from outside influence is more important than being correct; for these individuals, whether something is factual or not is irrelevant to whether it should be shared (Stein & Rutchick, 2025). 

It’s also possible that the problem of misinformation has been mischaracterized due to how it is typically studied. Current research on misinformation often focuses on issues that are likely to invoke false beliefs, and it also rarely asks participants to indicate confidence levels; both of these oversights may inflate the perception that people are deeply divided about many issues. In reality, participants may just be uninformed about issues, not misinformed, which is not captured in most studies (Stecula, 2025). Along the same lines, many studies that rely on truth discernment tasks impose a false dichotomy between true and false statements, when misinformation in real world contexts often rides the line between true and false, or may include some true statements with an overall misleading message (Spampatti, 2025). 

Limits of Fact-Checking: Systemic Factors

Research on the spread of misinformation has also frequently focused on individual-level susceptibility without addressing the role of structural inequities in shaping exposure to misinformation and capacity to resist it (Lin et al., 2022; Schirmer, et al., 2025; Walter et al., 2020). Socioeconomic disparities limit who can access high-quality information; lack of broadband access, language differences, and digital literacy deficiencies can all contribute to this problem (Schirmer, et al., 2025). Systemic mistrust, justified by decades of historical injustice, can lead some to seek information outlets alternative to the mainstream, exposing them to misinformation (Jaiswal et al., 2020; Pew Research Center, 2024). Many marginalized communities, however, are actively working to understand the impacts of misinformation and take grassroots efforts to combat it (Schirmer, et al., 2025). There are many ways to move beyond laying the responsibility of misinformation avoidance on individuals, and structural interventions have more potential to address the social disparities that shape misinformation adoption.

While fact-checking strategies in particular have limited utility, all misinformation interventions that expect individuals to exercise willpower in algorithmically-driven environments will face considerable difficulties. Algorithms have significant power to influence what information and voices individuals encounter. While evidence about the impact of “filter bubbles,” or isolated online spaces that perpetuate misinformation messages (Pariser, 2011), is mixed (Arguedas et al, 2022), there is some evidence that filter bubbles can limit users’ exposure to diverse points of view and increase users’ access to lower-quality content (Ciampaglia et al, 2018). It can be tempting, in today’s algorithm-rich environment, to assume that news will “find” you, with no need to intentionally seek out standards-based news (Skurka, et al., 2025). American adults who think the news will “find” them are more likely to overestimate their ability to tell false from true political news and more likely to engage confidently with false news messages (Skurka, et al., 2025).

One reason social media messages can be especially compelling has to do with influencers. Social media platforms allow for individual voices to have an outsized influence on large sections of the population. These individual voices, or “influencers,” do more than entertain people; they often drive the narrative around topics ranging from politics to economics to health (Thi & Ibrahim, 2025). While research shows that credibility, consistency, and transparency are important characteristics of an influencer that people trust, for an influencer to truly appear “authentic,” they must also build an emotional connection with their audience by seeming relatable and “being real” (Thi & Ibrahim, 2025). Accuracy of the messenger, while not completely irrelevant, is not the most important factor when people decide who to trust in social media settings.

The emotional bond that audience members form with influencers contributes to the rise of parasocial relationships, which are one-sided relationships in which someone develops a sense of closeness and intimacy with a media figure, usually a celebrity or influencer (Hoffner & Bond, 2022). The intensity of parasocial relationships is driven by the media figure’s moments of self-disclosure, glimpses into parts of the person’s life that are usually unknown, and momentary, technology-mediated interactions (e.g. reposting or liking a fan’s post) (Hoffner & Bond, 2022; Kim & Song, 2016; Kurtin, O’Brien, Roy, & Dam, 2018; Dai & Walther, 2018). Even though the influencer or celebrity does not know fans or even necessarily have their best interests at heart, it can feel to fans that they do because of the sense of closeness and trust they have for the influential person. 

Influencers are an important source of misinformation in the information ecosystem because of the scale of their impact. This is especially true for messages that are already viral or widespread; these messages actually help influencers gain more trust from their followers, regardless of the veracity of the message (Mulcahy, et al., 2024). However, influencers face little to no accountability when it comes to sharing misinformation, beyond the impact that being found to have shared inaccurate information might have on their reputation (Thi & Ibrahim, 2025). Unlike journalists, who receive training and commit to a code of ethics, social media creators operate outside any kind of formal ethical framework. 

Complicating the interplay between cognitive biases, algorithmically-driven online spaces, and persuasive social media personalities, is the rise of generative artificial intelligence (AI). Although access to this technology is fairly recent, the use of these systems contributes significantly to the existing problem of misinformation by allowing for the easy creation and customized dissemination of misinformation at scale (Bontridder & Poullet, 2021). Even elected officials have shared AI-generated misinformation with a wide audience (Skau, 2026). 

The widespread sharing of AI-generated misinformation has two main negative impacts. First, even when the content is fact-checked, it can continue to misinform due to the previously mentioned continued influence effect. Sandra Ristovska, an expert in visual evidence from the University of Colorado Boulder, described this challenge of false AI-generated images: “It lies deep in human nature and in the way we see and interpret images that it can be difficult to ‘un-see’ an image or a video once we have seen it” (Ristovska as cited in Skau, 2026, para. 10). The other negative effect is that it can contribute to a sense that nothing online is real, or that we shouldn’t bother determining if something is true or false; in other words, it deepens the cynicism many already feel. As Renee Hobbs, Professor of Communication at the University of Rhode Island, stated, “If we become indifferent to whether something is true or false, we risk losing many of the cooperative structures that make civilization possible” (as cited in Skau, 2026, para. 13).

Willpower and Habits

Clearly many factors make fact-checking a challenging strategy to rely on for stopping the spread of misinformation and improving students’ media literacy. Importantly, whether an individual is stumbling upon someone else’s fact-check or considering whether to fact-check something themselves, they must have the willpower to take additional critical steps.

It could be argued that the most effective means of improving this situation is to make systemic changes, such as improving social media and search engine algorithms to prioritize accuracy and flag misinformation, or requiring influencers to be more transparent about their motives or qualifications. But while we continue to push for these systemic changes, individuals must continue to make information choices every day, and this is what library instruction tends to focus on. With that in mind, how can we encourage individual actions that rely less on willpower?

What we are ultimately trying to accomplish is a habit change. Considerable research shows that changing someone’s habits through willpower is very challenging and often destined to fail (Bargh & Barndollar, 1996; Borland, 2013; Muraven, 2012; Wood et al., 2014). What is more effective is changing someone’s environment to encourage the desired behaviors (Bargh & Barndollar, 1996). In research conducted about the importance of environmental as opposed to willpower-based approaches to habit change, Duckworth et al. describe how “situational selection strategies” like putting a distracting device in another room during study time, spending time with friends who value studying, and telling someone else their study goal to hold them accountable had maximum success in improving student study habits (2016). These strategies were more successful than “self control” strategies, which students described as a mindset like, “Just deal with it and study” or “Just do it…I just focus and get my work done” (p. 334). This is just one example of many studies that show how stopping a bad habit through sheer willpower and keeping all other aspects of the environment the same has limited success. However, changing the environment to make the bad habit more difficult and good habits easy and effortless has a much better chance at success. 

The same is true with our information environments. When students spend considerable time in algorithmically-driven social media spaces, they may encounter more poor-quality information that requires fact-checking, and they may feel both a sense of cynicism about the information system more broadly as well as a lack of agency. However, when students spend less time being directed by an algorithm in information spaces with lots of tempting, low-quality information, and more time consulting reliable, standards-based information sources, they improve their information behavior, and, importantly, gain a sense of agency about what information they encounter and consume.

Recommendations for Academic Librarians

Although structural changes are necessary to address many of the issues discussed here, academic librarians may be able to contribute by changing how we approach information literacy instruction. While fact-checking methods like SIFT and lateral reading are important skills (that are convenient to fit into a 50-minute class period), librarians could instead (or in addition) address the importance of adopting new information habits. Rather than asking students to start with having the presence of mind and willpower to “stop” as in SIFT, maybe we should start our process before that “stop” is even necessary by intentionally designing the information environment in the first place.

“Lift Our Gaze”: Teach about Systemic Information Structures

One initial challenge that librarians must address is that it may require considerable motivation for students to take the initial steps to improve their information environments. If students believe that influencers are just as reliable as journalists (or more so), why would they change their habits? 

One strategy is to lean into the ACRL Frame “Information Creation is a Process” (2016). Librarians can help students better understand the systems that underlie the information they encounter through the concept of “infrastructural meaning-making” offered by Haider and Sundin (2023). They define infrastructural meaning-making as going “beyond examining the content’s sources, and even beyond evaluating the source’s content, to also be concerned with the institutions and systems, the platforms and algorithms that deliver it to us and onto our devices” (p. 2). To apply this concept, in addition to traditional source evaluation methods like CRAAP and SIFT, instructors would also encourage students to consider why that particular source appeared to them at that time – in other words, how do the conditions of access, along with the information and its source, help us understand the piece of information? (Haider & Sundin, 2019). Algorithmic literacy, situational awareness, and platform knowledge can all contribute to better decisions about whether to pay attention to a particular piece of information (Haider & Sundin, 2023). Fortunately, many simple and creative activities exist to help students understand how algorithms work to impact their information environments (Camarillo, 2025). While digital information infrastructures are often invisible to us (intentionally on the part of platform providers), we benefit from “lifting our gaze” to understand how networked environments impact what information we encounter (Haider & Sundin, 2023, p.3).

With this strategy, it’s important to consider how affective or attitudinal factors might impact students’ source evaluation approaches, and to add instructional interventions that address these factors to typical source evaluation instruction. For example, one researcher found that just teaching algorithmic awareness to students was helpful, but it was limited in its impact because students felt such a sense of powerlessness to shape their online experiences. However, by pairing algorithmic knowledge with activities that promote digital agency, we can help to combat the significant cynicism students feel about their digital environments (Chung, 2025). 

Along the same lines, helping students understand how standards-based news is created, especially in comparison to influencer-generated content, can help them view the information landscape with a wider scope, rather than focusing on fact-checking individual claims. In the field of communication, researchers have found that knowledge of how news is produced, disseminated, and consumed can improve misinformation detection (Ashley et al., 2023; Chan, 2024; Chan et al., 2024). 

Deliberately Design a News and Information Landscape

Next, students should be encouraged to intentionally seek out reliable information, rather than allow algorithms to determine their information landscape. Research shows that young adults who are exposed to news-rich environments, especially in the classroom, are more likely to develop news consumption habits (Edgerly, 2025; York & Scholl, 2015). In general, people need more help accepting true news than rejecting false news (Pfänder & Altay, 2025), so deliberately undertaking this task could be helpful. Researchers have also found that this approach – focusing on what sources to trust, rather than focusing on the small prevalence of misinformation – can increase trust in standards-based news, rather than fueling cynicism about news (Altay, De Angelis, & Hoes, 2024). However, it’s important to incorporate instruction about negativity bias and click-bait into this process, because research shows that a pessimistic outlook is correlated with self-selecting more negative and episodic news when given the chance to intentionally select news outlets (van der Meer & Hameleers, 2022). Encouraging students to deliberately select reliable information while also helping them break out of their cynical outlooks may improve the effectiveness of this strategy. Recommending platforms like the Good News Network and others that focus on positive news stories can help address the very real mental health concerns of increasing time spent focused on news.

Abstain from Unreliable Information Spaces

Finally, while it may not always be popular, taking time to teach students why social media platforms are an unreliable source of information is essential. These platforms are “firmly grounded in beliefs about individualism, capitalism and consumerism,” not the pursuit of accuracy (Fister, 2021). Librarians might even encourage students to step away from these platforms when possible and to the extent they feel comfortable. This might mean deliberately limiting or eliminating social media accounts, or engaging in phone-free time, which some college students are choosing to do for a variety of other reasons (Beres, 2025). In the habit example above, this is the step when the triggers for the bad habit are removed from the environment, and it is essential to success in new habit formation. Helping students recognize what platforms they engage in that deliver mostly low-quality information is an information literacy issue. 

Conclusion

Media literacy skills are essential to today’s college students, and academic librarians are among the few on campus with the expertise to promote these skills for students. However, teaching students quick fact-checking strategies that they must remember and be motivated to use in the moment may not be effective in real-world environments for a variety of reasons, including the power of cognitive biases, the sway of parasocial relationships, the influence of algorithms and generative AI, and the systemic nature of many of these problems. To teach students new habits, we should rely less on willpower and more on proactively/preemptively shaping information environments that help students feel empowered, informed, and positive (or at least realistic) about the information landscape.

It’s not as quick and easy as a fact-checking strategy, but helping students understand the information landscape and set up a more reliable information environment may have longer-lasting positive impacts than hoping to instill new habits for them that face considerable challenges to implement. It’s clear that we are facing more cynicism and disengagement from standards-based news and other authoritative information sources than we ever have before. Even with our limited resources, academic librarians can leverage our expertise to help with this major problem and move students towards a healthier relationship with online information. This foundational shift—from fact-checking individual claims to fostering a healthier, more intentional relationship with information—is arguably among the most critical skills college students can learn.


Acknowledgements

I would like to extend my sincere gratitude to editors Ian G Beilin, Jess Schomberg, and, especially, Brittany Paloma Fiedler, for their invaluable feedback throughout the editing process. I would also like to thank Amber Willenborg for her thoughtful peer review of the manuscript. The input of these reflective, considerate people greatly improved the story-telling and flow of the paper, and it ensured that it was as inclusive as possible. Finally, I would like to thank Andrea Baer for significantly contributing to the ideas behind this manuscript through our engaging, helpful, and inspiring discussions.


Works Cited

Ahmed, S., Masood, M., Deng, R., & Malviya, S. (2025). Why cynics disengage: the nexus of political cynicism, misinformation, and online political participation. Asian Journal of Communication, 35(5), 381-402. https://www.tandfonline.com/doi/pdf/10.1080/01292986.2025.2538142 

Altay, S., De Angelis, A., & Hoes, E. (2024). Media literacy tips promoting reliable news improve discernment and enhance trust in traditional media. Communications Psychology, 2(1), 74. https://www.nature.com/articles/s44271-024-00121-5 

Altay, S., Lyons, B. A., & Modirrousta-Galian, A. (2025). Exposure to higher rates of false news erodes media trust and fuels overconfidence. Mass Communication and Society, 28(2), 301-325.  https://doi.org/10.1080/15205436.2024.2382776 

Ashley, S., Craft, S., Maksl, A., Tully, M., & Vraga, E. K. (2023). Can news literacy help reduce belief in COVID misinformation? Mass Communication and Society, 26(4), 695-719. https://doi.org/10.1080/15205436.2022.2137040 

Association of College & Research Libraries. (2016). Framework for information literacy for higher education. ACRL. https://www.ala.org/acrl/standards/ilframework 

Aufderheide, P. (1993). Media literacy. A report of the national leadership conference on media literacy. Aspen Institute, Communications and Society Program. https://eric.ed.gov/?id=ED365294 

Bailey, J. J. (2021). False beliefs and the illusion of explanatory depth. Journal of Business and Behavioral Sciences, 33(2), 54-64. https://asbbs.org/files/2021-22/JBBS_33.2_Fall_2021.pdf#page=55 

Bargh, J. A., & Barndollar, K. (1996). Automaticity in action. The psychology of action, 457-481.

Barthel, M., Mitchell, A., & Holcomb, J. (2016, December 15). Many Americans believe fake news is sowing confusion. Pew Research Center. https://www.pewresearch.org/journalism/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/

Beres, D. (2025, November 5). The age of anti-social media is here. The Atlantic. https://www.theatlantic.com/magazine/2025/12/ai-companionship-anti-social-media/684596/ 

Bobkowski, P. S., & Younger, K. (2020). News credibility: Adapting and testing a source evaluation assessment in journalism. College & Research Libraries, 81(5), 822. https://doi.org/10.5860/crl.81.5.822 

Bontridder, N., & Poullet, Y. (2021). The role of artificial intelligence in disinformation. Data & Policy, 3, e32. https://doi.org/10.1017/dap.2021.20 

Borland, R. (2013). Understanding hard to maintain behaviour change: a dual process approach. John Wiley & Sons.

Breakstone, J., McGrew, S., Smith, M., Ortega, T., & Wineburg, S. (2018, March). Why we need a new approach to teaching digital literacy. Phi Delta Kappan, 99(6), 27-32. https://doi.org/10.1177/00317217187624 

Brodsky, J. E., Brooks, P. J., Scimeca, D., Todorova, R., Galati, P., Batson, M., … & Caulfield, M. (2021). Improving college students’ fact-checking strategies through lateral reading instruction in a general education civics course. Cognitive Research: Principles and Implications, 6, 1-18. https://link.springer.com/article/10.1186/s41235-021-00291-4 

Bull, A.C. (2021). Dismantling the evaluation framework. In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2021/dismantling-evaluation/ 

Camarillo, L. A. (2025). Squinting through the dawn of AI: Embedding algorithmic literacy principles in library instruction. ACRL 2025 Proceedings. https://www.ala.org/sites/default/files/2025-03/SquintingThroughtheDawnofAI.pdf 

Cappella, J. N., & Jamieson, K. H. (1996). News frames, political cynicism, and media cynicism. The Annals of the American Academy of Political and Social Science, 546(1), 71-84. https://www.jstor.org/stable/pdf/1048171.pdf 

Caulfield, M. (2019, June 19). SIFT (The four moves). Hapgood. https://hapgood.us/2019/06/19/sift-the-four-moves/ 

Chan, M. (2024). News literacy, fake news recognition, and authentication behaviors after exposure to fake news on social media. New Media & Society, 26(8), 4669-4688. https://doi.org/10.1177/146144482211276 

Chan, M., Vaccari, C., & Yamamoto, M. (2024). Examining the relationship between dispositional news literacy and discernment of real and misleading news: Cross-national evidence. International Journal of Public Opinion Research, 36(2), edae020. https://doi.org/10.1093/ijpor/edae020 

Chung, M. (2025). When knowing more means doing less: Algorithmic knowledge and digital (dis) engagement among young adults. Harvard Kennedy School Misinformation Review. https://misinforeview.hks.harvard.edu/wp-content/uploads/2025/10/chung_algorithmic_literacy_youth_20251013.pdf 

Ciampaglia, G. L., Nematzadeh, A., Menczer, F., & Flammini, A. (2018). How algorithmic popularity bias hinders or promotes quality. Scientific Reports, 8(1), 1-7. https://doi.org/10.1038/s41598-018-34203-2  

Clear, J. (2018). Atomic habits: An easy & proven way to build good habits & break bad ones: tiny changes, remarkable results. Random House Business.

Dai, Y., & Walther, J. B. (2018). Vicariously experiencing parasocial intimacy with public figures through observations of interactions on social media. Human Communication Research, 44, 322–342. https://doi.org/10.1093/hcr/hqy003

Digital Inquiry Group. (n.d.). Teaching lateral reading | Civic online reasoning. Retrieved February 11, 2026, from https://cor.inquirygroup.org/curriculum/collections/teaching-lateral-reading/ 

Duckworth, A., White, R., Matteucci, A., Shearer, A., & Gross, J. (2016). A stitch in time: Strategic self-control in high school and college students. Journal of Educational Psychology, 108(3): 329-41. https://psycnet.apa.org/fulltext/2016-15978-003.pdf 

Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one’s own ignorance. In Advances in Experimental Social Psychology (Vol. 44, pp. 247-296). Academic Press.

Ecker, U. K., Lewandowsky, S., Swire, B., & Chang, D. (2011). Correcting false information in memory: Manipulating the strength of misinformation encoding and its retraction. Psychonomic Bulletin & Review, 18(3), 570-578. https://doi.org/10.3758/s13423-011-0065-1

Eddy, K., Lipka, M., Matsa, K. E., Forman-Katz, N., Liedke, J., St Aubin, C., & Wang, L. (2025, August 20). How Americans view journalists in the digital age. Pew Research Center. https://www.pewresearch.org/journalism/2025/08/20/how-americans-view-journalists-in-the-digital-age/

Edgerly, S. (2026). Developing the habit: The socialization of US teens into distinct repertoires of news consumption. Journal of Children and Media, 20(1), 132-150. https://doi.org/10.1177/14648849211012922 

Fister, B. (2021). Lizard people in the library. PIL Provocation Series, 1(1). Project Information Literacy. https://files.eric.ed.gov/fulltext/ED613472.pdf 

Fleming, J. (2014). Media literacy, news literacy, or news appreciation? A case study of the news literacy program at Stony Brook University. Journalism & Mass Communication Educator, 69(2), 146–165. https://doi.org/10.1177/1077695813517885 

Fletcher, R., Andı, S., Badrinathan, S., Eddy, K. A., Kalogeropoulos, A., Mont’Alverne, C., … & Nielsen, R. K. (2025). The link between changing news use and trust: longitudinal analysis of 46 countries. Journal of Communication, 75(1), 1-15. https://academic.oup.com/joc/article/75/1/1/7907139 

Gil de Zúñiga, H., & Diehl, T. (2019). News finds me perception and democracy: Effects on political knowledge, political interest, and voting. New Media & Society, 21(6), 1253-1271. https://journals.sagepub.com/doi/pdf/10.1177/1461444818817548 

Gordon, L. T., & Thomas, A. K. (2017). The forward effects of testing on eyewitness memory: The tension between suggestibility and learning. Journal of Memory and Language, 95, 190-199. https://doi.org/10.1016/j.jml.2017.04.004 

Haider, J., & Sundin, O. (2019). The fragmentation of facts and infrastructural meaning-making: new demands on information literacy. Information Research, 24(4), 24-4. https://papers.ssrn.com/sol3/Delivery.cfm?abstractid=4698023 

Haider, J., & Sundin, O. (2022). Paradoxes of media and information literacy: The crisis of information. Routledge.

Haider, J., & Sundin, O. (2023, September 21). What is infrastructural meaning-making and why do we need it? Information Matters. https://informationmatters.org/2023/09/what-is-infrastructural-meaning-making-and-why-do-we-need-it/

Hasell, A., & Halversen, A. (2024). Feeling misinformed? The role of perceived difficulty in evaluating information online in news avoidance and news fatigue. Journalism Studies, 25(12), 1441-1459. https://doi.org/10.1080/1461670X.2024.2345676 

Hewitt, A. (2023). What Role Can Affect and Emotion Play in Academic and Research Information Literacy Practices? Journal of Information Literacy, 17(1), 120-137. https://files.eric.ed.gov/fulltext/EJ1393880.pdf

Hicks, A., & Lloyd, A. (2021). Deconstructing information literacy discourse: Peeling back the layers in higher education. Journal of Librarianship and Information Science, 53(4), 559-571. https://link.springer.com/chapter/10.1007/978-3-030-43687-2_28 

Hobbs, R. (1998). The seven great debates in the media literacy movement. Journal of Communication, 48(1), 16-32. https://mediaeducationlab.com/sites/default/files/Seven_Great_Debates_0.pdf 

Hoffner, C. A., & Bond, B. J. (2022). Parasocial relationships, social media, & well-being. Current Opinion in Psychology, 45, 101306. https://doi.org/10.1016/j.copsyc.2022.101306 

Jaiswal, J., LoSchiavo, C., & Perlman, D. C. (2020). Disinformation, misinformation and inequality-driven mistrust in the time of COVID-19: lessons unlearned from AIDS denialism. AIDS and Behavior, 24(10), 2776-2780. https://doi.org/10.1007/s10461-020-02925-y 

Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(6), 1420-1436. https://psycnet.apa.org/fulltext/1995-04372-001.pdf 

Kavanagh, J., & Rich, M. D. (2018). Truth decay: An initial exploration of the diminishing role of facts and analysis in American public life. https://www.rand.org/pubs/research_reports/RR2314.html 

Kendeou, P., & O’Brien, E. J. (2014). The Knowledge Revision Components (KReC) framework: Processes and mechanisms. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 353–377). MIT Press.

Kim, J., & Song, H. (2016). Celebrity’s self-disclosure on Twitter and parasocial relationships: A mediating role of social presence. Computers in Human Behavior, 62, 570–577. https://doi.org/10.1016/J.chb.2016.03.083

Kurtin, K. S., O’Brien, N., Roy, D., & Dam, L. (2018). The development of parasocial relationships on YouTube. The Journal of Social Media and Society, 7, 233–252.

Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106-131. http://dx.doi.org/10.1037/a0039684 

Lin, F., Chen, X., & Cheng, E. W. (2022). Contextualized impacts of an infodemic on vaccine hesitancy: The moderating role of socioeconomic and cultural factors. Information Processing & Management, 59(5), 103013. https://doi.org/10.1016/j.ipm.2022.103013 

Lyons, B. A., Montgomery, J. M., Guess, A. M., Nyhan, B., & Reifler, J. (2021). Overconfidence in news judgments is associated with false news susceptibility. Proceedings of the National Academy of Sciences, 118(23), e2019527118. https://doi.org/10.1073/pnas.2019527118 

Martínez-Costa, M. P., López-Pan, F., Buslón, N., & Salaverría, R. (2023). Nobody-fools-me perception: Influence of age and education on overconfidence about spotting disinformation. Journalism Practice, 17(10), 2084-2102. https://www.tandfonline.com/doi/full/10.1080/17512786.2022.2135128 

Mont’Alverne, C., Badrinathan, S., Ross Arguedas, A., Toff, B., Fletcher, R., & Kleis Nielsen, R. (2022). The trust gap: how and why news on digital platforms is viewed more skeptically versus news in general. Reuters Institute. https://reutersinstitute.politics.ox.ac.uk/trust-gap-how-and-why-news-digital-platforms-viewed-more-sceptically-versus-news-general 

Motta, M., Callaghan, T., & Sylvester, S. (2018). Knowing less but presuming more: Dunning-Kruger effects and the endorsement of anti-vaccine policy attitudes. Social Science & Medicine, 211, 274-281.

Mulcahy, R., Barnes, R., de Villiers Scheepers, R., Kay, S., & List, E. (2025). Going viral: Sharing of misinformation by social media influencers. Australasian Marketing Journal, 33(3), 296-309.

Muraven, M. (2012). Ego depletion: Theory and evidence. The Oxford handbook of human motivation, 111, 126.

News Literacy Project (2024). News literacy in America: A survey of teen information attitudes, habits and skills. NLP-Teen-Survey-Report-2024.pdf

News Literacy Project (2025). “Biased,” “boring” and “bad”: Unpacking perceptions of news media and journalism among U.S. teens. NLP-Teens-and-News-Media-Report-2025.pdf

Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175-220. https://doi.org/10.1037/1089-2680.2.2.175 

Oreskes, N., & Conway, E. M. (2010). Defeating the merchants of doubt. Nature, 465(7299), 686-687.

Oswald, M. E., & Grosjean, S. (2004). Confirmation bias. Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory. Psychology Press.

Pariser, E. (2011). The filter bubble: How the new personalized web is changing what we read and how we think. Penguin.

Park, S., Fisher, C., Fletcher, R., Tandoc Jr, E., Dulleck, U., Fulton, J., … & Yao, S. P. (2025). Exploring responses to mainstream news among heavy and non-news users: From high-effort pragmatic skepticism to low effort cynical disengagement. New Media & Society, 27(7), 4143-4163. https://journals.sagepub.com/doi/pdf/10.1177/14614448241234916 

Peng, R. X., & Shen, F. (2025). Why fall for misinformation? Role of information processing strategies, health consciousness, and overconfidence in health literacy. Journal of Health Psychology, 30(8), 2030-2045. https://doi.org/10.1177/13591053241273647

Pennycook, G., & Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39-50. https://doi.org/10.1016/j.cognition.2018.06.011 

Pew Research Center. (2024, June 15). Most Black Americans believe U.S. institutions were designed to hold Black people back. https://www.pewresearch.org/race-and-ethnicity/2024/06/15/most-black-americans-believe-u-s-institutions-were-designed-to-hold-black-people-back 

Pfänder, J., & Altay, S. (2025). Spotting false news and doubting true news: a systematic review and meta-analysis of news judgements. Nature Human Behaviour, 9(4), 688-699. https://www.nature.com/articles/s41562-024-02086-1 

Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perceptions of bias in self versus others. Personality and Social Psychology Bulletin, 28(3), 369-381. https://journals.sagepub.com/doi/pdf/10.1177/0146167202286008?casa_token=ZS3Q-iOSxikAAAAA:Pgfv8gXr2LjWr3lxdBD5Evj8BcjLJpGW9F6GJFfbWqnj4OGpbJcCNQbQoklWWxwO7lv4yMauNnY 

Raffio, N. (2024, October 28). Trust in voting: How misinformation threatens democracy. USC Today. https://today.usc.edu/trust-in-voting-how-misinformation-threatens-democracy/ 

Ross Arguedas, A., Robertson, C., Fletcher, R., & Nielsen, R. (2022). Echo chambers, filter bubbles, and polarisation: A literature review. The Royal Society. https://doi.org/10.60625/risj-etxj-7k60 

Royzman, E. B., Cassidy, K. W., & Baron, J. (2003). “I know, you know”: Epistemic egocentrism in children and adults. Review of General Psychology, 7(1), 38-65. https://doi.org/10.1037/1089-2680.7.1.38 

Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive science, 26(5), 521-562. https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog2605_1 

Saunders, L. (2025). Information literacy as part of an interdisciplinary approach to combat misinformation. Information Research an International Electronic Journal, 30(CoLIS), 424-442. https://publicera.kb.se/ir/article/download/52318/43437 

Schirmer, M., Walter, N., & Horvát, E. Á. (2025). Disparities by design: Toward a research agenda that links science misinformation and socioeconomic marginalization in the age of AI. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-178 

Seifert, C. M. (2014). The continued influence effect: The persistence of misinformation in memory and reasoning following correction. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 39-71). MIT Press.

Skurka, C., Cheng, Z., Goyanes, M., & Gil de Zúñiga, H. (2026). News Finds Me as the illusion of competence: evidence for overconfidence in discernment of political misinformation. Human Communication Research, 52(1), 11-23. https://doi.org/10.1093/hcr/hqaf015 

Skau, M. (2026, February 24). AI challenges our relationship with truth. Media Education Lab. Retrieved March 2, 2026, from https://mediaeducationlab.com/index.php/blog/ai-challenges-our-relationship-truth

Sloman, S., & Fernbach, P. (2018). The knowledge illusion: Why we never think alone. Penguin.

Spampatti, T. (2025). Truth discernment may not help to overcome misinformation. Nature Climate Change, 15(10), 1006-1009. https://www.nature.com/articles/s41558-025-02426-7 

Stecula, D. A. (2025). Getting misinformation wrong: Why context fixes can’t solve structural problems [white paper]. University of Delaware Biden School of Public Policy & Administration; Stavros Niarchos Foundation Ithaca Initiative. https://udspace.udel.edu/server/api/core/bitstreams/3cac95b7-d97c-4929-8ec9-e88cbc76470a/content

Stein, R., Rutchick, A. M., Sin, A. Y., & Jarrin Rueda, L. F. (2025). Symbolic show of strength: a predictor of risk perception and belief in misinformation. The Journal of Social Psychology, 1-27. https://doi.org/10.1080/00224545.2025.2541206 

Sullivan, M. C. (2019). Why librarians can’t fight fake news. Journal of Librarianship and Information Science, 51(4), 1146-1156. https://doi.org/10.1177/0961000618764258 

Thi, P. V., & Ibrahim, A. (2025). Influencer credibility and authenticity in the fight against misinformation. Feedback International Journal of Communication, 2(3), 205-215. https://doi.org/10.62569/fijc.v2i3.199 

Thorson, E. (2016). Belief echoes: The persistent effects of corrected misinformation. Political Communication, 33(3), 460-480. https://repository.upenn.edu/bitstreams/fde2b15d-38dd-4d96-9205-6ca7bfb356e2/download 

Valgarðsson, V., Jennings, W., Stoker, G., Bunting, H., Devine, D., McKay, L., & Klassen, A. (2025). A Crisis of Political Trust? Global Trends in Institutional Trust from 1958 to 2019. British Journal of Political Science, 55, e15. https://doi.org/10.1017/S0007123424000498 

van der Meer, T. G., & Hameleers, M. (2022). I knew it, the world is falling apart! Combatting a confirmatory negativity bias in audiences’ news selection through news media literacy interventions. Digital Journalism, 10(3), 473-492. https://doi.org/10.1080/21670811.2021.2019074 

Van Der Meer, T. G., Hameleers, M., & Ohme, J. (2023). Can fighting misinformation have a negative spillover effect? How warnings for the threat of misinformation can decrease general news credibility. Journalism Studies, 24(6), 803-823.  https://doi.org/10.1080/1461670X.2023.2187652 

Wack, M., Duskin, K., & Hodel, D. (2024). Political fact-checking efforts are constrained by deficiencies in coverage, speed, and reach. arXiv preprint arXiv:2412.13280.

Walter, N., Cohen, J., Holbert, R. L., & Morag, Y. (2020). Fact-checking: A meta-analysis of what works and for whom. Political Communication, 37(3), 350-375. https://doi.org/10.1080/10584609.2019.1668894 

Westlund, O., Belair-Gagnon, V., Graves, L., Larsen, R., & Steensen, S. (2024). What is the problem with misinformation? Fact-checking as a sociotechnical and problem-solving practice. Journalism Studies, 25(8), 898-918. https://www.tandfonline.com/doi/pdf/10.1080/1461670X.2024.2357316 

Willenborg, A., & Detmering, R. (2025). “I don’t think librarians can save us”: The material conditions of information literacy instruction in the misinformation age. College & Research Libraries, 86(4), 534. https://doi.org/10.5860/crl.86.4.535

Wineburg, S. & McGrew, S. (2019). Lateral reading and the nature of expertise: Reading less and learning more when evaluating digital information. Teachers College Record: The Voice of Scholarship in Education, 121(11): 1-40. https://doi.org/10.1177/016146811912101102 

Wood, W., Labrecque, J. S., Lin, P. Y., & Rünger, D. (2014). Habits in dual process models. Dual process theories of the social mind, 1, 371-85.

York, C., & Scholl, R. M. (2015). Youth antecedents to news media consumption: Parent and youth newspaper use, news discussion, and long-term news behavior. Journalism & Mass Communication Quarterly, 92(3), 681-699. https://doi.org/10.1177/1077699015588191 

Zhou, Y., & Shen, L. (2024). Processing of misinformation as motivational and cognitive biases. Frontiers in Psychology, 15, 1430953. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1430953/pdf