Rockets, unlike other anthropogenic pollution sources, emit gaseous and solid chemicals directly into the upper atmosphere. We compile inventories of these chemicals from rocket launches in 2019 and projections of future growth and speculative space tourism activity. We incorporate these in a 3D atmospheric chemistry model to simulate the impact on climate and the protective stratospheric ozone layer. We find that loss of ozone due to current rockets is small, but that routine space tourism launches may undermine progress made by the Montreal Protocol in reversing ozone depletion in the Arctic springtime upper stratosphere. The BC (or soot) particles from rockets are also of great concern, as these are almost five hundred times more efficient at warming the atmosphere than all other sources of soot combined.
Note that even four years ago it was already clear that the space industry was both depleting ozone and aggravating global warming. But this was before the scale of the proposed mega constellations was evident.
So far, models of spacecraft reentry have focused on understanding the hazard presented by objects that survive to the surface rather than on the fate of the metals that vaporize. Here, we show that metals that vaporized during spacecraft reentries can be clearly measured in stratospheric sulfuric acid particles. Over 20 elements from reentry were detected and were present in ratios consistent with alloys used in spacecraft. The mass of lithium, aluminum, copper, and lead from the reentry of spacecraft was found to exceed the cosmic dust influx of those metals. About 10% of stratospheric sulfuric acid particles larger than 120 nm in diameter contain aluminum and other elements from spacecraft reentry. Planned increases in the number of low earth orbit satellites within the next few decades could cause up to half of stratospheric sulfuric acid particles to contain metals from reentry.
Much of the reentry burn happens above the stratosphere, and it takes time for the aluminum nanoparticles to drift down to the levels where they were collected. So the 10% number represents pollution from an earlier period with fewer reentries than the 2020s. Murphy notes that:
Most of the meteoric mass is deposited at altitudes between 75 and 110 km by a very large number of sub-millimeter meteoroids. Reentering spacecraft, which are larger and moving more slowly, ablate between 40 and 70 km over a ~300 km long footprint
This paper investigates the oxidation process of the satellite's aluminum content during atmospheric reentry utilizing atomic-scale molecular dynamics simulations. We find that the population of reentering satellites in 2022 caused a 29.5% increase of aluminum in the atmosphere above the natural level, resulting in around 17 metric tons of aluminum oxides injected into the mesosphere. The byproducts generated by the reentry of satellites in a future scenario where mega-constellations come to fruition can reach over 360 metric tons per year. As aluminum oxide nanoparticles may remain in the atmosphere for decades, they can cause significant ozone depletion.
Ferreira et al. confirm the potentially long delay between reentry and the nanoparticles reaching the ozone layer and depleting it:
we find that these reentry byproducts may take up to 30 years to settle from the top of the mesosphere into the stratospheric ozone layer. Upon reaching an altitude of about 40 km, aluminum oxides catalyze chlorine activation which promotes ozone depletion. This suggests that concentrations of aluminum oxide compounds may start increasing in the mesosphere well before reaching the stratospheric ozone layer. This would introduce a noticeable delay between the beginning of the injection process when orbiting bodies are decommissioned and the eventual ozone-depletion consequences in the stratosphere.
A lack of observations and validated models of reentry demise limits our ability to simulate the complex aerosols associated with reentry, which makes estimating the climate impacts difficult. Aluminum is a primary satellite component and will likely be emitted during reentry vaporization in the form of alumina. Unmodified alumina is a useful approximation for metallic reentry aerosol. In this study, we simulate a potential yearly emission of 10,000 metric tons of alumina from reentering space debris. We investigate how the location of atmospheric accumulation, aerosol size distribution, and radiative properties of reentry alumina impacts the middle atmosphere. We find that 20,000–40,000 metric tons of alumina accumulates at high latitudes between 10 and 30 km in both hemispheres. Small changes in mesospheric heating rates lead to 1.5-K temperature anomalies in the middle atmosphere at high latitudes. These temperature anomalies are accompanied by changes in wind speed in the polar vortex.
So there are thermal effects on the climate as well as the effects on the ozone layer.
To understand if significant ozone losses could occur as the launch industry grows, we examine two scenarios. Our ‘ambitious’ scenario (2040 launches/year) yields a −0.29% depletion in annual-mean, near-global total column ozone in 2030. Antarctic springtime ozone decreases by 3.9%. Our ‘conservative’ scenario (884 launches/year) yields −0.17% annual, near-global depletion; current licensing rates suggest this scenario may be exceeded before 2030. Ozone losses are driven by the chlorine produced from solid rocket motor propellant, and black carbon which is emitted from most propellants. The ozone layer is slowly healing from the effects of CFCs, yet global-mean ozone abundances are still 2% lower than measured prior to the onset of CFC-induced ozone depletion. Our results demonstrate that ongoing and frequent rocket launches could delay ozone recovery. Action is needed now to ensure that future growth of the launch industry and ozone protection are mutually sustainable.
Note that this paper addresses only the ozone depletion from launches, not from reentry. But their 'ambitious' scenario of 5.6 launches/day is far short of Musk's ambitions, let alone the other planned megaconstellations. My understanding is that the 2040 launches/year in their scenario are of Falcon 9 class vehicles but "only 4.4% of launches are using vehicles designed for re-entry", which is implausible. But the mega-constellations can't be built or maintained with Falcon 9s.
To achieve that, they would need to launch 120,000 satellites per year. Over the 15 years, they would launch 1.8 million satellites, but 800,000 of them would fail (as part of our 9% failure rate), leaving a total operational fleet of one million satellites. This equates to 3,158 Starship launches per year, or nearly nine launches per day. For some context, the current launch rate for Starship is just five per year.
...
In order to keep a million satellites in the constellation, it needs to be maintained. So, each year, SpaceX would have to launch 90,000 AI Sat Minis to replace the roughly 9% of the constellation that failed. That equates to 2,368 Starship launches per year, or 6.4 per day.
That's 9 launches/day for 15 years, then 6.4 launches/day indefinitely, of a rocket that is vastly bigger than Falcon 9 and is completely re-usable.
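As a sanity check on the quoted arithmetic, here is a small Python sketch that recomputes the figures from the numbers in the quote. The satellites-per-launch value is not stated in the quote; it is inferred here from 120,000 satellites over 3,158 launches (about 38), and the function and variable names are illustrative only. On these assumptions the build phase comes out at roughly 8.7 launches per day and the maintenance phase at roughly 6.5, which the quote gives as 6.4.

```python
def launch_cadence(build_sats_per_year=120_000, build_years=15,
                   target_fleet=1_000_000, replacements_per_year=90_000,
                   build_launches_per_year=3_158, days_per_year=365):
    """Recompute the quoted launch figures from the quoted inputs."""
    total_launched = build_sats_per_year * build_years
    failed_during_build = total_launched - target_fleet
    # Assumption: payload per launch is inferred from the quote, not stated in it.
    sats_per_launch = build_sats_per_year / build_launches_per_year
    maintenance_launches = replacements_per_year / sats_per_launch
    return {
        "total_launched": total_launched,
        "failed_during_build": failed_during_build,
        "sats_per_launch": sats_per_launch,
        "build_launches_per_day": build_launches_per_year / days_per_year,
        "maintenance_launches_per_year": maintenance_launches,
        "maintenance_launches_per_day": maintenance_launches / days_per_year,
    }


if __name__ == "__main__":
    for name, value in launch_cadence().items():
        print(f"{name}: {value:,.1f}")
```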
Of course, these claims are ridiculous - neither logistically nor economically feasible. But assuming Starship or a competitor such as Blue Origin does manage to create a reliable, reusable, 100 ton to LEO launch vehicle, there will be a lot more mass in LEO and a lot more of it reentering.
A 10-fold enhancement of lithium atoms was detected at 96 km altitude by a resonance lidar at Kühlungsborn, Germany, approximately 20 hours after the uncontrolled re-entry of a Falcon 9 upper stage. The upper-atmospheric extension of the ICON general circulation model, nudged to ECMWF, was used to calculate winds. Backwards trajectories, including wind variability as measured by radar, traced air masses to the Falcon 9 re-entry path at 100 km altitude, west of Ireland. This study presents the first measurement of upper-atmospheric pollution resulting from space debris re-entry and the first observational evidence that the ablation of space debris can be detected by ground-based lidar. The analysis of geomagnetic conditions, atmospheric dynamics, and ionospheric measurements supports the claim that the enhancement was not of natural origin. Our findings demonstrate that identifying pollutants and tracing them to their sources is achievable, with significant implications for monitoring and mitigating space emissions in the atmosphere.
The effect of lithium and other spacecraft ingredients on the ozone layer doesn't appear to have been studied as much as that of aluminum. To be fair, there will be a lot more aluminum.
We use a global inventory of launch and re-entry emissions covering the onset of the megaconstellation era (2020–2022), and project these to 2029 based on 2020–2022 growth rates. We implement this inventory into a 3D atmospheric chemistry model to determine the impacts of megaconstellations on the ozone layer and climate. We find that global stratospheric ozone depletion from all mission types is relatively small compared to surface sources and megaconstellation missions only account for about one-tenth of this depletion. This is because rockets launching megaconstellations almost all use kerosene, a large source of black carbon or soot particles, but not of chemicals such as chlorine that directly destroy ozone. Soot from rockets absorbs sunlight, warming the upper layers of the atmosphere and decreasing the amount of sunlight reaching Earth's lower atmosphere, causing it to cool. Megaconstellation missions are responsible for about half of this climate effect. In this regard, rockets launching megaconstellations and other missions are like small-scale stratospheric aerosol injection experiments without forethought for potential unintended consequences.
Again, this paper addresses only atmospheric impacts from launches, not from reentries. And the 2020-2022 launch rate is far lower, and the rockets involved much smaller, than what the proposed "million satellite data center" and its competitors would require.
The author examines the migration of Indiana University Libraries’ interlibrary loan platform, ILLiad, from a locally-hosted server to OCLC hosting through the perspective of a new department head inheriting this critical technology decision. He explores how staffing changes, lost institutional knowledge, recurring system instability, and limited technical capacity prompted a reassessment of long-standing local practices. The piece outlines research, consortium consultation, approval processes, implementation challenges, authentication and workflow issues, and post-migration tradeoffs. Ultimately, the author offers practical guidance for new leaders tasked with managing inherited systems, vendor relationships, imperfect information, and strategic change in complex academic library environments.
While incarcerated students face many challenges when commencing higher education, a lack of access to the internet is a considerable barrier. This technological exclusion has implications for the delivery of course materials, most of which are offered only electronically. A project team from Curtin University Library sought to understand and address the challenges faced by incarcerated students in accessing library services, particularly ebooks and audiovisual content. It was found that restrictions related to contract terms, digital rights management, and copyright contribute to a reactive and uncertain situation for library services. This article outlines the state of the problem and offers possible pathways academic libraries can take to improve the state of information access for incarcerated students.
Countless research questions arise when investigating connections between library resource discovery and student success. Existing literature explores best practices of database description language and style, the usability of database A–Z lists, and library resource jargon. Academic libraries continue to grapple with these challenges in resource discovery, even as online searching behavior evolves and new research tools emerge. A research team at the University of Arizona Libraries builds on the literature by examining these topics with a focus on the impact of a user’s academic discipline, university affiliation (faculty, staff, or student), and research experience on their understanding of database terminology, resource content and applications, and A–Z list type filters. The authors conducted an environmental scan of library websites along with several usability tests to identify and reduce library and disciplinary jargon on their A–Z list to make databases more understandable and approachable to all users. This article presents the results of these assessments as a case study for exploring external and internal factors that impact users’ understanding and discovery of databases.
By April 2027 and 2028, institutions covered by Title II of the Americans with Disabilities Act are expected to be legally required to ensure that digital content created or used at the institution is accessible as defined by Web Content Accessibility Guidelines (WCAG) 2.1 Level AA. The new law strongly emphasizes accessibility of course materials—including PDFs. This case study demonstrates how an R2 academic library staff can enhance the accessibility of PDF course materials by improving the accessibility of electronic reserves (e-reserves) PDFs at Hunter College Library (HCL).
Processes described here can be adapted by other libraries. Supporting campuses’ work to make course readings accessible may be a natural role for academic libraries. Locating or procuring the best quality version of a text available to the institution is a critical task for which libraries are optimally equipped. Furthermore, when readings are available only in print format, libraries can create higher-quality scans than those typically produced when the task is left to individual faculty members.
HCL began improving the accessibility of e-reserves PDFs in 2020. This article shares the knowledge acquired, established processes, limitations, and future directions. The workflow comprises checking each e-reserves reading. For those deemed poor, we locate an HCL collection or open access copy, purchase a digital copy, or remediate. Remediation involves optical character recognition (OCR), fixing errors therein, correcting reading order, removing repetitive headers and footers, and tagging. Literature the authors found on libraries proactively correcting OCR and tagging PDFs—that is, preceding a user’s request—was sparse, with the exceptions of the University of Toronto and the University of Michigan. Literature about proactively doing so for e-reserves was even narrower. This case study is intended to help fill the gap.
This study evaluates the performance of four generative AI models—ChatGPT, DeepSeek, Gemini, and Copilot—in generating descriptive metadata for bibliographic resources. Models were tested on a small, diverse set of resources using four prompt types: a basic prompt, a basic prompt with an example, a detailed prompt referencing Resource Description and Access (RDA) guidelines, and a detailed prompt with an example. Results show that both detailed RDA guidance and the inclusion of sample outputs improved metadata quality, particularly in formatting and field structure. While DeepSeek and ChatGPT showed better performance on the tasks, all models displayed limitations in parsing and following the prompts, using descriptive metadata fields, analyzing subject headings, and assigning URIs. These findings suggest that while generative AI holds potential to assist in metadata creation, its current capabilities fall short of meeting cataloging standards without human review.
One of the generative artificial intelligence tools developed for use in libraries, including academic libraries, is the AI Primo Research Assistant. Of the 65 academic libraries in Poland, only 19 have access to software that supports this tool. In practice, only 9 libraries have implemented it (data from March 2025). For the purposes of this study, original research was conducted to assess the implementation status of the Primo Assistant in academic libraries in Poland. Two anonymous surveys were developed for this purpose and sent to libraries that had implemented the feature, as well as to those with the capability to run the Primo Assistant (i.e., the Primo VE Discovery admin role), in order to gather information on why they had chosen not to implement it. The analysis revealed several positive aspects, mainly a reduction in the workload of staff tasked with preparing publication lists on topics requested by library users. Some concerns were also raised by library employees, mainly regarding the reliability of the metadata provided and the accuracy of the recommended publications. The study also revealed a general lack of awareness and a need for further implementation. This paper presents the first scientific study focused on the implementation of the AI Primo Research Assistant in Polish academic libraries.
Effective information technology (IT) governance is essential for the University of Riau (UNRI) Library to achieve its research and educational objectives. This paper presents a qualitative pilot study investigating the library’s current IT governance processes, focusing on two COBIT 5 processes—DSS01 (Manage Operations) and DSS05 (Manage Security Services). These processes were selected in consultation with library and IT leadership due to their direct relevance to ensuring operational reliability and safeguarding the library’s information assets. COBIT 5 principles and capability models guide the assessment, emphasizing regulatory compliance, performance monitoring, and stakeholder collaboration. Using a detailed questionnaire and capability model, the study evaluates base practices and work products for DSS01 and DSS05. Results indicate varying proficiency levels, with DSS01 at level 0 and DSS05 at level 1, highlighting significant gaps between current and desired capability levels. Recommendations include implementing standard operating procedures, enhancing security measures, and optimizing resource management. In conclusion, the findings underscore the need for standardized processes, continuous monitoring, and alignment with established frameworks like COBIT 5. By addressing identified gaps and implementing recommended improvements, the UNRI Library can strengthen its IT governance, enhance operational efficiency, and better support its academic mission.
This study critically explores the transformative potential of human-computer interaction (HCI) in reimagining African public libraries as dynamic, user-centered, and culturally grounded spaces. Based on a literature review and comparative analysis of libraries across several African countries, the research investigates how HCI principles can enhance user engagement, usability, and inclusivity, particularly in multilingual, resource-constrained, and postcolonial contexts. The paper situates libraries as sociotechnical infrastructures that mediate between technology, local knowledge systems, and community needs, and argues for the importance of participatory and culturally responsive design approaches in library digitization efforts. The findings highlight significant gaps in current implementations of HCI within library services, including the lack of localized interfaces and limited user involvement in design processes. The study concludes by offering practical recommendations for integrating HCI into library development strategies and advocating for the co-creation of digital public spaces that reflect and empower Africa’s diverse knowledge ecologies. In doing so, the paper contributes to the growing discourse on decolonial approaches to technology and the future of public libraries in the digital age.
Writing has been light around here recently for a wonderful reason: our twins graduated from their respective colleges over the past month, and we have been in nearly nonstop revelry (and packing, and schlepping…). We are so fortunate to have two great kids; I’m super proud of them.
Speakers at our kids’ commencements, thankfully and remarkably, said little about artificial intelligence, but they did talk a lot about the complex circumstances and especially the psychology of this rising generation, and offered advice on how the graduating seniors should move forward in life given significant headwinds. I suppose it’s tempting to describe and analyze the troubles facing each graduating class, and provide sage guidance in response to the historical moment, but I’m not sure that my kids, their friends, and their generation overall are so very different from any other, or that any distinct advice is needed.
The Great Class of 2026 is, I’m afraid, just like every graduating class: happy and sad, confused and hopeful about the future, striving and procrastinating. Young adults, in other words. Sure, they seem to be impacted by new technology and our dreadful national politics and nerve-racking global challenges, but hasn’t it always been so? My college class graduated into a recession, the rise of the internet, the fall of the Berlin Wall, the chaotic end of the Soviet Union, and a messy war in the Middle East — all of these dominoes falling after a childhood in which we were fairly sure we would perish at any moment in a nuclear war. That was a lot to absorb! Back then, commencement speakers picked up on our anxiety, which had apparently morphed into excessive irony and a general lack of motivation, epitomized by the title and content of a Richard Linklater film: Slacker.
It may have taken some time, but we muddled through. So did the generation another turn of the clock back from ours (Vietnam, stagflation, etc.) and the generations before that (pick your World War and/or the Great Depression, etc.). History is, unfortunately, a procession of horrible developments, but also a showcase of astonishing resilience and creativity. Is it so Pollyannaish to simply say that Gen Z will also find a way forward, and frankly might be better off without pithy advice from the olds? Must we unconsciously mimic the opening of Woody Allen’s fictional commencement address, raising the graduating class’s blood pressure by declaring, “More than at any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly”?
Instead, I saw hope in every joyful row of begowned seniors, students who, despite all of the radical changes and stressful tensions around them, had nevertheless maintained their curiosity and maybe even cultivated a passion during college. Students who found their special niche in music, writing, art, or science, who felt compelled to listen to it all, read it all, see it all, or experiment late into the night, regardless of the requirements of the classroom. I have a feeling that this kind of deep and abiding engagement, born not from careerism but from genuine profound interest, will serve these graduates well in the years ahead. As it always has.
Books I Have Not Written
The class-action lawsuit of authors against Anthropic and its subsequent settlement have helpfully informed me of the many, many other writers named Daniel Cohen, because the settlement administrators, in their quest to match authors and texts, have sent emails and letters asking if I am the Dan Cohen who wrote this or that book. There are too many volumes by The Daniel Cohens to list in full here, but as a public service to a handful of special fellow Dans, I hereby declare:
I am not the Daniel Cohen who wrote The Monsters of Star Trek, but I would wager 100 quatloos on Triskelion that I would greatly enjoy meeting that Dan Cohen.
I am #$%@# mad I am not the Daniel Cohen who penned Famous Curses, because my family is on a mission to bring back the useful exclamation “Gordon Bennett!”
I did not write Southern Fried Rat and Other Gruesome Tales, but, based on the delightful cover of this not-me Daniel Cohen book, I probably read it at camp the year it was published.
My final confession: The settlement administrators believe there is a Daniel Cohen who authored a book titled Final Confession, but, alas, I am not the one.
English Edition: floppy disks, hard drives, CDs, DVDs, SSD drives - no
matter what you choose to store your data on - ultimately they all
decay. With my guests Callum McKean, Leontien Talboom and Adrian
Page-Mitchell, we’re going to talk about what kinds of data we find on
old drives, why we want to get them in the first place, and what can go
wrong with the storage media. To all of you who love all things retro -
we’ll be talking about floppy disks a bit.
I run a RAG application for Italian pension and tax consultants. Users
ask questions about INPS, professional pension funds, laws and
regulations, and the app answers using a knowledge base of uploaded
documents.
For a long time the app used the classic single-shot RAG pipeline: take
the question, search the database, stuff the results into a system
prompt, ask the model. It works, but it has a hard limit: the retrieval
happens once, before the model has any chance to reason about the
question. If the first search misses, the answer is bad and there is
nothing the model can do about it.
So I rebuilt the pipeline as an agent. Now the model drives the
retrieval itself: it decides what to search, reads the results, searches
again with different terms, follows cross references between documents,
and only then writes the answer. All in plain Ruby, with RubyLLM and
Rails. No LangChain, no Python sidecar.
In this article I will show you exactly how it works, with the real code
from my application. One note before we start: since the app serves
Italian consultants, all the prompts, tool descriptions and user-facing
strings are in Italian in the real codebase. I translated them to
English here so you can follow along, but the structure is identical.
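The agent loop described above can be sketched in a few lines. This is a toy
illustration in Python, not the Ruby and RubyLLM code from the application:
the model is replaced by a scripted function, the knowledge base is a
two-entry dictionary with placeholder text, and every name (search,
scripted_model, run_agent, max_steps) is a hypothetical stand-in.

```python
from typing import Callable, List, Tuple

# Hypothetical knowledge base with placeholder text. The second document is
# only reachable by following the cross reference in the first one.
KNOWLEDGE_BASE = {
    "contribution rates": "Rates are set yearly; see Circular 12 for the current figure.",
    "circular 12": "Circular 12: placeholder text holding the current figure.",
}


def search(query: str) -> List[str]:
    """Stand-in retrieval tool: return documents whose key appears in the query."""
    query = query.lower()
    return [text for key, text in KNOWLEDGE_BASE.items() if key in query]


def scripted_model(question: str, notes: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns ("search", query) or ("answer", text)."""
    if not notes:
        return ("search", "contribution rates")
    if not any(note.startswith("Circular 12:") for note in notes):
        return ("search", "circular 12")  # follow the cross reference
    return ("answer", notes[-1])


def run_agent(question: str,
              model: Callable[[str, List[str]], Tuple[str, str]],
              tool: Callable[[str], List[str]],
              max_steps: int = 5) -> Tuple[str, List[str]]:
    """Let the model alternate between searching and answering."""
    notes: List[str] = []
    queries: List[str] = []
    for _ in range(max_steps):
        action, text = model(question, notes)
        if action == "answer":
            return text, queries
        queries.append(text)
        notes.extend(tool(text))  # results feed the next decision
    return "No answer within the step budget.", queries


answer, queries = run_agent("What is the current contribution rate?",
                            scripted_model, search)
print(queries)
print(answer)
```

A single-shot pipeline would stop after the first search and never see the
second document; the step budget is one simple way to keep the loop bounded.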
Wikimedia and GLAM institutions share a challenge. How do we make
cultural heritage collections accessible at scale without sacrificing
quality, provenance, sustainability, or community control? The
International Image Interoperability Framework, IIIF, is now used by
thousands of institutions to serve high-resolution media through open
standards. Wikimedia does not currently integrate IIIF in its core
architecture. Should it?
Since 2023, Montgomery Planning staff have been working on the Eastern
Silver Spring Communities Plan, drafting recommendations on zoning and
land use, transportation, housing, parks and the environment, economic
development and urban design. The plan is expected to set a vision for
the area’s future development for decades to come. The plan is bordered
by Colesville Road, University Boulevard and New Hampshire Avenue and
will include three future Purple Line stations, the Piney Branch Road,
Long Branch and Manchester Place
Design is broken. Young and not-so-young designers are becoming
increasingly aware of this. Many feel impotent: they were told they had
the tools to make the world a better place, but instead the world takes
its toll on them. Beyond a haze of hype and bold claims lies a barren
land of self-doubt and impostor syndrome. Although these ‘feels’ might
be the Millennial norm, design culture reinforces them. In conferences
we learn that “with great power comes great responsibility” but, when it
comes to real-life clients, all they ask is to “make the logo bigger.”
On our strictest tests, Gemini 3 achieved a CER of 1.67% and a WER of
4.42%. On these tests, any difference between the ground truth and test
texts counts as an error. WER is thus almost always a bit more than
double the CER because if a single character in a word is wrong,
including leading or trailing punctuation like commas, single quotes vs
double quotes, etc, the whole word is marked as an error. On this
measure, Gemini 3 performs nearly 50% better than the best, fine-tuned
specialized models and achieved performance comparable to an early
career, professional human typist.
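To make the relationship between the two metrics concrete, here is a minimal
Python sketch. It assumes the standard definitions (edit distance divided by
the length of the ground truth, counted in characters for CER and in
whitespace-separated words for WER); the source does not spell out its exact
scoring code, and the function names and sample strings are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits per ground-truth character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits per ground-truth word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


ground_truth = "the quick, brown fox"
transcription = "the quick brown fox"   # one missing comma
print(cer(ground_truth, transcription))  # 0.05
print(wer(ground_truth, transcription))  # 0.25
```

A single dropped comma is one error out of 20 characters but one error out of
4 words, which is why strict WER runs well above strict CER.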
FacilMap is a privacy-friendly, open-source versatile online map that
combines different services based on OpenStreetMap. FacilMap offers the
following features:
Show different map styles, for example maps optimized for driving,
cycling, hiking or showing the topography or public transportation
networks.
Search for places
Show amenities and POIs
Calculate a route, optionally showing the elevation profile.
Find out what is at a particular point on the map
Open geographic files, for example GPX, KML or GeoJSON files
Show your location on the map
Share a link to a particular view of the map.
Add FacilMap as an app to your device.
Change the language settings in the user preferences.
FacilMap is privacy-friendly and does not track you
SQL makes sense. But when it breaks, you reach for EXPLAIN. Vector
search offers no such comfort. Multi-thousand-dimension embeddings,
approximate nearest-neighbour indexes, and quantisation tradeoffs make
it hard to know what your system is doing, and harder still to diagnose
when results quietly degrade. Through interactive visualisations, Simon
Hearne shows what embeddings look like in high-dimensional space, what
quantisation does to your recall, and how to catch retrieval failures
before your agents do. You’ll leave with a sharper mental model and a
diagnostic toolkit for the production problems hardest to see.
Once again I am reminded that modern web tech is amazing, and web
browsers are incredibly capable.
There’s a Screen Capture API to record the screen. You can select a tab,
a window, or the entire screen. The feature has limited browser support
so I don’t think I’d use it in a big web app, but it’s fine for a
one-off screen recording. (I wonder how browser-based video conference
apps like Google Meet do screen sharing? Do they use this API, or do
they use something with wider support?)
TL;DR if you have a TASCAM 788 backup and don’t know how to get the
audio out of it this script might help. Also: AI tools work best when
paired with expertise.
I needed to take a very personal excursion into digital preservation
recently as I attempted to listen to some audio recordings my brother
John had made about 20 years ago. John died recently,
and is sorely missed by his friends and family.
John was a continuous source of inspiration for me, because of his many
varied interests and projects. One thing he did consistently since he
was a teenager was perform music as a singer-songwriter.
As my family and I went through the very difficult process of emptying
his apartment, we discovered a set of recordings he had made on CD-R.
Three of these CDs were clearly conceived of as albums, and easily
mounted as CDDA
when I popped them in my CD player.
However he also left a binder of CD-Rs, where each CD was neatly labeled
with a song title and a year. All in all there are 108 of them, from the
2003-2008 time period. There is a lot of material on these CDs that is
not present on the three albums. However, when I popped these in my CD
player all I saw was a macOS error dialog box saying:
The disk you attached was not readable by this computer.
John’s binder of CD-Rs
At first I thought they might be damaged or corrupted. But it seemed
unlikely that so many of them would be. After some asking around I got
pointed to two excellent guides to working with CDs:
These guides were great, and did help me extract the raw data from the
CD-R with cdrdao, but ultimately I was unable to determine what format
the data was in using tools like file, Siegfried and Droid.
In a fit of desperation I spent some time in Claude Code trying to see
if it could help me identify what format the data was in. Despite
several forays, it kept going round in circles, burning tokens.
One of those forays led me on a wild goose chase installing an old
version of macOS in order to see if an old version of Retrospect
might be able to read the CDs (it couldn’t).
During this time I got some excellent advice over in the Fediverse at
digipres.club. One of
those messages was from Ross Spencer who took a look
at a sample raw CD image. He was able to spot some markers that pointed
to it possibly being a backup from a TASCAM DAW,
specifically a TASCAM 788 (I
believe Ross was using either strings or a hex editor to look
for these clues).
TASCAM 788
Unfortunately, after poking around in various user forums, I discovered
that there were not really any tools for working with TASCAM 788
backups. Everyone seemed to be recommending the purchase of a TASCAM 788
and its CD Burner, since the data was in a proprietary format, and there
were no emulators.
Before dropping some money on Ebay I decided to roll the dice with
Claude Code again, but this time with the more specific
guidance that this was likely a TASCAM 788 backup, and asking about
options for recovery. If you are interested you can read the
transcript for this session. The key part of the back and forth for
me was:
The 2488 stores audio as raw 16-bit or 24-bit PCM at 44.1kHz in a
proprietary block structure. Once you identify the byte offset where
audio data starts, you can use Audacity’s “Import Raw Data” with 24-bit
signed big-endian PCM, 44.1kHz, to listen and verify.
I prompted it to try to identify the offset, so I could attempt the
import in Audacity. It did some work writing Python snippets and
executing them for a few minutes, and then output a likely offset. The
first time I read it in I only heard white noise. But after twiddling
some of the import options in Audacity I saw some promising waveforms
appear in the Audacity display. And when I pressed play ✨✨✨✨ instead
of white noise I heard John’s guitar and voice!
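For anyone wanting to skip the Audacity step, here is a minimal sketch of the same import done in Python. It assumes the settings from the quote above (24-bit signed big-endian PCM at 44.1kHz) and a single channel, which is a guess on my part; since I had to twiddle the import options, the real settings may well differ. The function name `raw_be24_to_wav` and the offset you pass in are hypothetical, and this is not the script linked in this post.

```python
import wave


def raw_be24_to_wav(raw: bytes, offset: int, wav_path: str,
                    channels: int = 1, sample_rate: int = 44100) -> int:
    """Write 24-bit big-endian PCM starting at `offset` to a WAV file.

    Returns the number of audio frames written.
    """
    sample_width = 3  # bytes per 24-bit sample
    frame_size = sample_width * channels
    audio = raw[offset:]
    # Drop any trailing partial frame so the data divides evenly.
    usable = len(audio) - (len(audio) % frame_size)
    audio = audio[:usable]
    # WAV files store samples little-endian, so reverse each 3-byte sample.
    swapped = bytearray(usable)
    swapped[0::3] = audio[2::3]
    swapped[1::3] = audio[1::3]
    swapped[2::3] = audio[0::3]
    with wave.open(wav_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)
        out.writeframes(bytes(swapped))
    return usable // frame_size
```

If the result is white noise, the offset, byte order, or sample width is probably wrong, which is the same symptom I got in Audacity.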
Audacity screenshot of imported raw data
What appeared to be a single track turned out to be multiple tracks
created with the TASCAM, that were joined together. The final segment
was the completed mix.
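Segments that have been joined together like this can often be pulled apart by looking for long runs of silence between them. The sketch below shows that idea on a plain list of integer samples. The names `split_on_silence`, `threshold`, and `min_gap` are mine, the values are illustrative, and I am not claiming this is how the real TASCAM format marks track boundaries.

```python
def split_on_silence(samples, threshold=0, min_gap=3):
    """Return (start, end) index pairs for the loud segments in `samples`.

    A sample is "loud" when its absolute value exceeds `threshold`.
    Segments are separated wherever at least `min_gap` consecutive
    quiet samples occur; shorter pauses stay inside a segment.
    """
    segments = []
    start = None      # index where the current segment began
    last_loud = None  # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            elif i - last_loud - 1 >= min_gap:
                # The quiet run was long enough: close the segment.
                segments.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments
```

On real audio `min_gap` would be expressed in samples, so one second of silence at 44.1kHz would be 44100, and `threshold` would need to sit a little above the noise floor.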
I continued to work with Claude on a program that would identify the
offset in the raw CD data, then extract a WAV file, and then extract the
separate tracks, as well as the complete track. It did this by looking
for gaps inside the audio. I put the program here:
Here is the guitar / vocal first track (there are a few seconds of
silence at the beginning):
And here is the mix including percussion and keyboards:
These recordings are Copyright John Summers CC-BY-NC
I have since been able to find John’s TASCAM 788 at my brother Matt’s
house–although it doesn’t have the SCSI external CD burner anymore. So
there’s no way to read the CDs with it.
These CDs and songs are important enough to me that I want to see if the
actual hardware can do a better job of preserving John’s work. So I’ve
got a bid on one of the external CD-Recorder devices I found on Ebay.
John clearly spent a lot of time and care taking a snapshot of these
songs he used to perform in coffee shops around Bucks County,
Pennsylvania. I plan to release some of them on his Bandcamp, with some
of his artworks as album covers. I want to share them with people who
knew him, and put these songs out into the world in a way that respects
his memory and creative work, while also being something that he just
wasn’t focused on as an artist. For John it was the creative process
itself that mattered most.
None of this will bring John back of course. He’s gone now, and at
peace. But he will always be remembered by those who loved him. Look for
more posts here after I’ve been able to extract these songs in total.
If you manage to have most of your input tokens be cached, you save a huge amount, in this case $0.20 per million tokens. What does this mean though? What does caching do that makes you save so much, in some cases upwards of tens of kilodollars?
Someone explain the cached vs not thing to me for how this is $10,000
worth of savings lol
I'm gonna be totally honest, I barely understand the basic outline of the math
involved here. Where possible I aim to not be completely wrong here, but I'm
not going to emit something 1:1 accurate with the mathematical truth of large
language models' inner workings. Bear with me.
When you talk to a large language model service, you make an API call like the following:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
That messages element is the key bit. Every time you accumulate messages from the initial system prompt, initial user request, AI responses and any tool use requests/responses, you add to that array and make it grow bigger and bigger.
A good way to think about this is that sending a conversation to a large language model is like having a pair of people share a roll of paper on two different typewriters. Every time you finish your message, you send the roll of paper back to the AI model and it has to re-read through the entire conversation in order to start typing on the end with its response. As the conversation gets longer, this gets more and more expensive because the model has to recalculate its internal state all over again for every additional message.
However, large language model inference is complicated but deterministic in the part that matters here. Given the same input tokens, the model computes the same intermediate state every time (any randomness comes later, when output tokens are sampled). This means that you can use a technique called key-value caching (KV caching) in order to save that intermediate state and reuse it next time. Most of the time this cache is a prefix cache, because that lets you add more messages to the end of the request and still reuse everything before them.
Imagine something like this:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "The sky is blue because of a phenomenon..."
    },
    {
      "role": "user",
      "content": "But I am looking outside right now and it is orange!"
    }
  ]
}'
If the model has already processed the question about the sky being blue and generated the response about Rayleigh scattering, it doesn't need to process both of those messages again to answer the user's question about sunsets. In production AI model deployments you would put that generated intermediate state into the KV cache so that the model doesn't need to run twice for the same data. This saves time and effort on the side of the AI model provider, and currently model providers decide to pass that savings onto API users in the form of cheaper inference costs for cached lookups.
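To make the prefix idea concrete, here is a toy sketch in Python. It is emphatically not how a real KV cache works: real caches store per-token attention state, not hashes of whole messages. The `PrefixCache` class and its methods are made up for illustration; all it does is count how many messages at the start of a request have been seen before.

```python
import hashlib
import json


class PrefixCache:
    """Toy stand-in for a KV cache: remembers which message prefixes were seen."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _key(messages):
        blob = json.dumps(messages, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def remember(self, messages):
        """Record every prefix of `messages` as already processed."""
        for n in range(1, len(messages) + 1):
            self._seen.add(self._key(messages[:n]))

    def process(self, messages):
        """Return (cached, fresh) message counts, then remember the request."""
        cached = 0
        # Look for the longest prefix that has been processed before.
        for n in range(len(messages), 0, -1):
            if self._key(messages[:n]) in self._seen:
                cached = n
                break
        self.remember(messages)
        return cached, len(messages) - cached


cache = PrefixCache()
question = {"role": "user", "content": "why is the sky blue?"}
answer = {"role": "assistant", "content": "The sky is blue because of a phenomenon..."}
followup = {"role": "user", "content": "But I am looking outside right now and it is orange!"}

first = cache.process([question])                      # nothing cached yet
cache.remember([question, answer])                     # state after the model replied
second = cache.process([question, answer, followup])   # only the follow-up is new
edited = cache.process([{"role": "user", "content": "why is the sky green?"},
                        answer, followup])             # changed prefix: no reuse
```

The last call is the interesting one: changing an earlier message throws away the whole prefix, so everything has to be processed again.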
As you develop an application with AI in it, try to avoid changing any inference settings or previous messages between prompts. This makes your application's queries much more likely to read from the cache, making it faster, reducing the environmental impact, and saving you(r users) money.
This is the first installment of a three-part series on global library leadership engagement, contributed by Ellen Hartman, OCLC Leaders Council Manager. We’re grateful to Ellen for sharing her perspectives on this topic.
Proof we engaged face to face
At a recent gathering of the OCLC Leaders Council, something happened that I always hope for but never take for granted. Connections were being made. There was laughter, sidebar conversations over lunch and dinner, a willingness to challenge each other’s ideas, honesty about what people were struggling with, and genuine curiosity about what others were doing. All of this was built on a foundation of trust that made these in-depth conversations possible.
These moments don’t happen automatically. In my experience, they take time—and often, the opportunity to meet in person. Meeting online can be very efficient, but it can feel rushed and impersonal—it’s hard to truly get to know each other through a screen. Being in the room together over the course of a few days, in a small enough group that you actually get to speak to everyone, creates a solid foundation for future opportunities to meet again, online or in person, to build on the connections, themes, and conversations that started there.
What made this gathering particularly significant was its global dimension. Library leaders do come together regularly, but often within their own region, or among peers from the same library type. Academic and public library leaders, for instance, don’t always get the opportunity to meet for in-depth conversation, even though there is much they can learn from each other. Conversations organized by library type or region have real value, of course, but there is something additional that comes with a broader perspective that is still rooted in the library ecosystem while extending beyond your usual network. Every perspective in the room adds something, regardless of what an institution has or hasn’t yet achieved. The value of these conversations comes from the range of experiences present.
Across international leadership spaces, a remarkably consistent vocabulary tends to surface. Terms recur across sessions, regions, and formats, and their repetition signals that we are all on the same page: a reassurance that participants are engaged with the same broad challenges and moving in a broadly similar direction.
The problem is that shared language doesn’t necessarily mean a shared understanding or a shared reality. One of the things that becomes apparent, watching these conversations unfold, is how often the same word lands differently depending on who is in the room.
Take efficiency, a term that surfaces regularly in conversations about how libraries operate and plan strategically. In some contexts, efficiency encompasses decisions about workforce size and structure. In others, those decisions are shaped by employment frameworks that lead to a very different kind of conversation, shifting the focus instead toward technology, software, or finding different ways of working within existing structures. The word is the same. The need it describes, and the range of solutions available, are not. This is why you need a deeper understanding of each other’s context to find out where you are using the same words but aren’t speaking the same language.
Glimpses, not full pictures
Even with that understanding in place, international leadership conversations can only ever offer glimpses of each other’s reality rather than the full picture. You see enough of someone else’s context to recognize the challenge, but rarely enough to understand all the constraints behind it.
This matters because those constraints are often what make the difference. Take something many library leaders struggle with: making the case for their library’s value to the broader institution or community they serve (for more on this topic, see OCLC Research’s latest report!). Some leaders have, through long-term effort and considerable perseverance, managed to position the library as visibly central to their institution’s priorities and a key part of its success. For others, making that same case remains difficult. The reasons could be structural or personal: the physical or organizational distance between the library and the part of the institution that makes key decisions, the data available to demonstrate the library’s impact, or the library leader’s own position, voice, and access to the right conversations at the right time.
In international settings, what tends to surface is the success story. What is harder to showcase is the full path to that success. The years of lobbying, the hundreds of stakeholder conversations, the incremental steps that made this outcome possible. A leader who has achieved that recognition may share what they did in good faith and genuinely want to help others reach the same goal. But because the conditions that made their success possible are often invisible in how the story gets told, it can be hard for them to understand why the same challenge feels insurmountable to a peer.
The value outside of the program
International leadership meetings are often evaluated by what happens in the formal program. But some of the most valuable exchanges happen elsewhere. Recognizing that is part of understanding how these spaces work in practice.
In smaller gatherings, it’s the time outside the formal agenda where a lot of the magic happens. When a group of library leaders meet for the first time, they are still in the process of getting to know one another. This is why you can’t expect them to immediately share their biggest challenges or most acute pain points. There is a measure of trust building that happens as a gathering takes place, especially over multiple days. It’s often after the official program ends, and there is room for leaders to relax and reflect together (for example, during dinner or at the bar) that the more personal and complex topics get discussed.
That kind of conversation requires enough prior exchanges that people feel safe being a little vulnerable. Admitting that your library is struggling to secure its position, or that you haven’t found a way to make your value proposition tangible enough to institutional leadership or other stakeholders that control funding, is not something most people are willing to do in a room full of peers they’ve just met. It becomes possible when the group has had time to become something more than a collection of strangers.
This is one of the reasons smaller, sustained gatherings tend to produce a different quality of exchange than large conferences. It is also why the informal spaces within those gatherings deserve to be nurtured rather than left entirely to chance.
No neat resolutions needed
One expectation worth setting aside is that international leadership conversations should resolve into clear conclusions. They rarely do, and that is not a failure.
Conversations like these do not need to end in consensus or a neat step-by-step path forward. It’s often the process of sharing and reflecting on both differences and commonalities that provides the greatest benefit. It might be an idea you hear and want to incorporate in your own library. A perspective that’s truly new to you and makes you see a topic in a different way. Or simply the opportunity to take a subject that was discussed at surface level and deepen the conversation in future gatherings.
That is why continued engagement matters more than resolution. Understanding accumulates across multiple conversations, multiple gatherings, and sometimes multiple years. It cannot be compressed into a single meeting, however well designed. The friction and the moments of genuine surprise are part of the value. Smoothing those moments away or rushing toward consensus risks losing exactly what makes international exchange worthwhile.
Conclusion
International leadership spaces are often judged by the ideas they surface or the alignment they appear to produce. But their deeper value lies in the glimpses they offer into realities that are different from our own. Those glimpses don’t tell the full story of what other library leaders are experiencing, but taken together, they help form a better understanding of what experiences are out there.
When designed well and when opportunities for informal interactions are cultivated, global library leadership spaces create the conditions for the kinds of conversations that go deepest. Those conversations rarely happen on the agenda, but rather emerge when enough trust has been built that people are willing to be open and candid with one another. That is not something that happens automatically: it requires continued investment in bringing people together, and repeated exposure to each other’s contexts, experiences, and points of view over time. Trust is not built overnight.
The next post in this series takes a closer look at what global engagement actually involves beyond the conversation itself and why showing up, in every sense of the phrase, costs more for some than others.
I have a problem with RSS. Not RSS itself, RSS is great!
The problem is that I subscribe to more feeds than I can possibly read,
so the unread count in FreshRSS climbs faster than I can bring
it down. Some days I skim titles, declare bankruptcy, and mark
everything as read. Other days I let it pile up and feel guilty.
I’ve tried using newer tools like Current, which was
definitely an improvement, but still didn’t quite do it. My friend Dan
has been working
on a new RSS tool that works a bit like a personal newspaper, that seems
like it could be extremely helpful, and I’m keeping my eye on it. But
meanwhile the list of unread posts grows…
Now, I’ve been very reluctant and slow to introduce LLMs into my daily
work. But even from under my rock, in a cave, down by the river, I’ve
heard that LLMs are good at text summarization.
I thought maybe, just maybe, I could try using one to summarize
my unread posts? It seemed like a good fit for an experiment since the
impact of getting things wrong is basically zero (in theory).
I wanted to try routing my unread RSS posts through an LLM to get a
daily digest. From under my rock I’d also heard about
Model-Context-Protocol (MCP),
and how it is going to change everything. So I thought it would
be a good exercise in seeing how that works in practice with a tool like
Claude Code. I’d use Claude Code’s MCP support to connect directly to
FreshRSS and ask Claude to summarize what I’d missed. Yeah, that’s the
ticket.
This is the Way?
The first thing I tried was ChrisLAS’s
freshrss-mcp server, which wraps the FreshRSS GReader API and exposes
it as a set of MCP tools. The idea is that you drop it into your Claude
configuration and Claude can then call those tools to fetch and read
your articles.
I gave it a try, and it worked! But the results were… mixed. Claude
would usually fetch articles. But then it would produce a lot of
diagnostic chatter alongside the actual summary: narrating its own tool
calls, noting what it was about to do, explaining why it was skipping
certain things, asking for permission for this and that.
And more frustratingly, it would sometimes take strange detours:
executing inline Python code, and Unix tools to do things it could have
done by calling the MCP tools more directly, wandering into unnecessary
computation. The experience felt noisy and unpredictable, and (frankly)
just a bit scary.
I started by creating some “skills” and some scripts for those skills
thinking it would make things a bit more deterministic. It kinda did?
I thought maybe my problem was that the skills weren’t bundled together,
so I built my own plugin: freshrss-claude. This
version bundled the MCP server as a Claude Code plugin with a set of
“skills”, the structured prompts to guide Claude through fetching and
summarizing in a more controlled way.
It seemed better? Not needing to start the MCP server was definitely
better. But ultimately it wasn’t as big an improvement as I’d hoped for.
Claude still exhibited strange behaviors: writing and executing Python
scripts unnecessarily, going off-script in ways that were hard to
anticipate. The summaries themselves were fine when they arrived, but
the path to getting them there was erratic and unpredictable.
The last straw for me was the idea of running this Rube Goldberg machine
from a cron job to generate the summary for me automatically. To run it
automatically I needed to grant it all kinds of permissions to ensure it
ran through. This scared the shit out of me, given I was giving it
permission to run arbitrary Python programs and reach out to the web,
and interact with the filesystem. Running it once or twice manually was
ok. But sticking it in my crontab and forgetting about it? Forget about
it. I experimented briefly with putting things in a Docker container,
and Claude Cowork’s sandboxing, but then…
Turning it inside out
I stepped back and rethought the problem. The thing I’d been trying to
do, have an LLM orchestrate a set of tools to accomplish a task, is one
(seemingly popular) way to use an LLM. But it turns out to be kinda
demented. You’re asking the model to plan, to sequence, to decide.
You are asking it to be An Agent. Sure, models can do this, but they are
not reliable in the way a simple program is. They wander. They
improvise. They sometimes decide to take a detour. Do I really benefit
from this runtime model in this little RSS digest app? Nah, not really.
So the alternative, and this is the inversion that made things click for
me, is to write a deterministic program that calls the LLM as a
component, rather than letting the LLM drive the program as an Agent. My
code fetches the articles. My code shapes the prompt. My code writes the
output to a file. The LLM does exactly one thing: it reads the content I
hand it and produces a summary.
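Here is a minimal sketch of that shape in Python. It is not the code from the tool described below: `build_prompt`, `run_digest`, and the article dictionaries are invented for illustration, and the LLM is passed in as a plain function so the rest of the program can be tested without one.

```python
from pathlib import Path
from typing import Callable, Iterable


def build_prompt(articles: Iterable[dict]) -> str:
    """Turn fetched articles into a single prompt string."""
    lines = ["Summarize these unread articles as a short digest:", ""]
    for article in articles:
        lines.append(f"- {article['title']}: {article['content']}")
    return "\n".join(lines)


def run_digest(fetch: Callable[[], list],
               summarize: Callable[[str], str],
               out_path: Path) -> Path:
    """Deterministic pipeline: the LLM is one step, not the driver."""
    articles = fetch()               # my code decides what to fetch
    prompt = build_prompt(articles)  # my code shapes the prompt
    digest = summarize(prompt)       # the LLM's single job
    out_path.write_text(digest, encoding="utf-8")  # my code writes the file
    return out_path
```

In real use `summarize` would wrap a call to whatever model you like. Because it is just a function that takes a string and returns a string, it has no way to run scripts, touch the filesystem, or reach out to the web on its own.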
Take Two (or Three, or Four?)
I threw it all on the fire and started over by writing rss-digest instead. Well,
truth be told, Claude and I wrote it. Ok, ok, mostly Claude.
It’s a small Python CLI that connects to any GReader API-compatible
RSS reader (FreshRSS, Miniflux, Tiny Tiny RSS, The Old Reader), fetches
your recent unread articles, and asks an LLM to produce a digest.
Because it uses LiteLLM under the
hood, you can point it at any compatible model: OpenAI, a local model
running in LM Studio, whatever you prefer.
The output is a Markdown file (or HTML with --html). I have
a cron job run it in the morning and drop a file on my desktop for me to
read. Here’s an example
of what it looks like.
For smaller batches (≤25 articles) it gives you a structured list. For
larger ones it produces a curated prose summary grouped by theme. You
can pass a custom system prompt file if you want to tune the style or
grouping. You can pass --mark-read if you want it to mark
everything as read afterward.
The tool is on PyPI and the code is on
GitHub. I’ve just
started using it, so it quite possibly has problems. The prompt that is
used for doing the summarization is configurable. If you have a
different take on the prompt or want to extend it, please send me a pull
request so I can add it as an alternative.
So…
What I keep coming back to is the design lesson underneath all of this.
There’s real value in being thoughtful about which part of your
system is deterministic and which part is probabilistic. There’s no
doubt that LLMs are magical things, but an LLM is not a reliable program. It
shouldn’t always be the thing making decisions about what to fetch, when
to stop, or how to structure output. Hand it a well-formed input, ask it
a clear question, and (hopefully) it will return something useful.
Everything else (the plumbing, the sequencing, the file I/O) stays in
your code, which you can look at, test, and run directly.
I’m not saying all programs using LLMs need to take this approach. I’m
just saying maybe you don’t need MCP, Agentic AI, etc, etc all the time.
Experiment with it, but don’t forget to turn it inside out when you need
to.
Once again I attended most of the Library of Congress' Designing Storage Architectures workshop remotely. I apologize for the delay in posting this; domestic duties have kept me very busy recently. Below the fold are notes on the talks that caught my attention, based on my now somewhat faded memory and the slide decks for the talks from the Library of Congress website.
As usual, IBM's Georg Lauhoff provided an invaluable overview of the storage industry as of late 2025, co-authored with Sassan Shahidi. They make an important point that I have been making since at least 2018's Archival Media: Not a Good Business:
Challenges of Alternative Archival Technologies
• Alternative archival technologies face technical and economic hurdles.
This justifies their focus on flash, hard disk and tape. Their "exabytes shipped" graph shows that indeed Hard Disk Unexpectedly Not Dead; the dramatic decline in HDD's share since 2008 reversed in 2024.
The key metric for technological progress in traditional storage media is areal density:
Lauhoff and Shahidi's graph shows that tape, which has the easiest path because of the relatively large size of the bits, has continued its steady growth, although one could argue both that their 24% annual growth exaggerates the period since 2017, and that INSIC's projection of 28% is optimistic.
It is clear that HDD areal density progress slowed dramatically about 2010 to around 11% per year. But the developments Jon Trantham reported, see the next section, could lead to a significant acceleration in HDD areal density.
Flash has continued a steady 30% per year growth since about 2010, thanks to stacking cells vertically and storing multiple bits in them. Both of these have limits, into which the industry will eventually run.
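To put these annual growth rates in perspective, it can help to convert them into doubling times. The sketch below is my own back-of-the-envelope arithmetic, not something from Lauhoff and Shahidi's slides; it assumes steady compound growth, and the labels in `rates` simply restate the percentages quoted above.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity to double at a steady compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Annual areal density growth rates quoted in the text above.
rates = {
    "tape (reported)": 0.24,
    "tape (INSIC projection)": 0.28,
    "HDD since about 2010": 0.11,
    "flash since about 2010": 0.30,
}

for name, rate in rates.items():
    years = doubling_time_years(rate)
    print(f"{name}: {rate:.0%} per year doubles in about {years:.1f} years")
```

At 11% per year density takes roughly 6.6 years to double, against roughly 2.6 years at 30%, which is why the gap between HDD and flash has been closing.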
As regards the relative cost per TB of the three media, the big picture is that since around 2010 change has been very gradual. Tape and flash have both become cheaper relative to HDD, but the rate of change has been much lower than predicted.
Lauhoff and Shahidi conclude that:
• Tape Storage: continues to evolve.
• HDD: improvements slow down but recently high demand.
• NAND: well-suited for hot storage but not for archival purposes.
• Lack of Alternatives: Within the foreseeable future (within 10 years), there are no viable alternatives to Tape, HDD, and NAND storage.
• AI leads to storage demands across the tiers
This last point was a theme for the entire meeting. But it is important to note that the meeting was too early to capture the full impact of AI on the cost and availability of media and systems.
Trantham also announced that Seagate has started to ship their 40TB HAMR drives. Their roadmap to 100TB/drive presents some significant challenges, as shown in Trantham's slide. The history of HAMR shows that Seagate can surmount major technical challenges, but it may take longer than they project.
One of Trantham's slides vividly illustrated the technology challenges the HDD industry faces, showing to scale the evolution since 1997 of the sizes of the bits on the media, the reader, and the writer. Note the 1610-fold decrease in the area of the writer, the 305-fold decrease in the area of the bit, and the 289-fold decrease in the area of the reader.
Fifteen years ago, Ethan Miller, Ian Adams and I published Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. It was inspired by work at Carnegie-Mellon from 2009, FAWN: a fast array of wimpy nodes, which argued that implementing fast storage using large numbers of small nodes built from cell-phone technology could save two orders of magnitude in energy per query. We argued that it would be possible to build low cost, low energy archival storage systems using a similar approach.
Our idea was ignored, but at this meeting Ethan Miller revived the idea of using flash as an archival medium. He argues for a rack-scale system storing 500PB/rack built from 5U shelves, similar to Backblaze's, each holding 216 of Pure Storage's 300TB DFMs (direct flash modules) stacked vertically.
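As a sanity check on those numbers, here is my own arithmetic rather than anything from Miller's slides. It assumes decimal units (1PB = 1000TB), counts raw capacity only, and ignores any erasure coding or spare overhead; the shelf count is derived, not something Miller stated.

```python
import math

DFM_TB = 300          # capacity of one direct flash module
DFMS_PER_SHELF = 216  # modules per shelf
SHELF_HEIGHT_U = 5    # rack units per shelf
TARGET_PB = 500       # capacity goal per rack


def rack_layout(target_pb: float = TARGET_PB) -> dict:
    """Shelves needed to reach `target_pb`, assuming 1PB = 1000TB."""
    shelf_pb = DFM_TB * DFMS_PER_SHELF / 1000
    shelves = math.ceil(target_pb / shelf_pb)
    return {
        "shelf_pb": shelf_pb,
        "shelves": shelves,
        "rack_units": shelves * SHELF_HEIGHT_U,
        "raw_pb": shelves * shelf_pb,
        "dfms": shelves * DFMS_PER_SHELF,
    }


print(rack_layout())
```

Each shelf holds 64.8PB, so eight shelves (40U, 1728 modules) are needed to pass 500PB, which fits in a standard rack with a little room to spare.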
There are three big challenges:
First, if all the DFMs were actively I/O-ing the rack would draw 45kW. Supplying the rack with that much power and cooling it would be very difficult (see the design of Nvidia's racks). But, just as with Facebook's hard disk cold storage, this can be mitigated by scheduling accesses so that only a small proportion of the drives are active.
Second, flash cells gradually leak electrons, so must be regularly refreshed by reading and re-writing them. This task must be scheduled along with the application's reads and writes, but doing so is fairly easy since the refresh timing isn't critical.
Third, flash is more expensive per TB than hard disk or tape. As I have argued for a long time, in the archival storage market the time value of money makes it difficult to justify trading increased capex for decreased opex:
The opex savings are significant, with essentially no mechanical failures, more benign failure modes, and much higher bandwidth for erasure code recovery.
Miller argues that the capex isn't as bad as the cost of the media makes it look, because at 0.5EB/rack there are savings in space, power and cooling. He doesn't point out that the lower latency for read access potentially allows for the elimination of an entire warm layer of the storage hierarchy.
But he acknowledges that AI is driving up the media cost. This is probably only relative to tape, since hard drive prices are also skyrocketing.
Miller argues that, over time, flash costs will come down. The scope for further shrinkage of the cells, and the addition of more layers, is limited. Once those limits are reached, the fabs that manufacture flash will gradually fall behind the leading edge and become depreciated.
Although I'm naturally biassed, I think Miller's case for archival flash is worth a detailed investigation.
Fourteen years ago in Cloud vs. Local Storage Costs and More on Glacier Pricing I started writing about the way the complex and somewhat opaque pricing models of cloud storage platforms made it difficult to estimate how much you would end up paying. People are just now figuring out that AI has the same problem. Neither is an accident; these pricing models serve two goals important for the platform's business model. First, the purchase decision is based on the "Low, Low" advertised price. Second, once you discover how much more you're actually paying, you face the lock-in created by egress fees. In 2019's Cloud for Preservation I wrote about how egress charges implement vendor lock-in.
David Boland of Wasabi presented a current analysis of this issue. He reports that about half of all the organizations they surveyed exceeded their budget for public cloud storage.
The budget overruns were caused by the fact that the actual spend was about double the sticker price for the storage. Fees were the culprit, which by design are much harder to project.
Using Amazon's cloud services for long-term preservation always suffered from the fact that, to confirm the fixity of the preserved content, it had to be read and thus incur fees. Finally, two advances have improved things. First, it is now possible to use SHA-256 and SHA-512 checksums when uploading data. Second, it is now possible to use an S3 batch job to validate the checksums on objects without reading them.
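For readers unfamiliar with fixity checking, the local half of the process looks roughly like the sketch below: compute a SHA-256 digest before upload and compare it against a stored value later. This uses only Python's standard `hashlib`; the function names are mine, and it deliberately says nothing about the S3 side, whose API details are not covered here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, expected_hex: str) -> bool:
    """True when the file's current digest matches the recorded one."""
    return sha256_of_file(path) == expected_hex.lower()
```

The point of the two advances is that the stored digest can be a strong one, and that the comparison can happen inside the service rather than by downloading the object to run code like this.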
AI is one of those generational tech topics that isn’t going away soon. But the signal to noise (or hype to reality) ratio can be truly overwhelming. There are just so many links, opinions, new resources that are getting lost in the mix. And that’s for us information and tech nerds – we can only [...]
In the Spring 2026 semester at Old Dominion University (ODU), I taught CS 450 (Undergrad) / CS 550 (Graduate): Database Concepts. The course was fully online, with synchronous live Zoom sessions held twice a week. Attendance was not mandatory but strongly encouraged. All lectures were recorded and made available for students to access whenever needed.
Figure 1: Canvas course page for CS 450/550: Database Concepts
Through this blog post, I want to share my experience of teaching a senior-level undergraduate/graduate course for the first time, the behind-the-scenes realities of course preparation through to the end of the course, and how student feedback actively shaped the course as it progressed.
Since the course had been taught previously by other instructors, materials were already available, which made things easier. Rather than building everything from scratch, I started by copying over the existing course structure and then carefully updating it to align with the current semester. The more time-consuming part was setting everything up and cleaning up the Canvas course, especially updating deadlines and revising the syllabus while ensuring the topics were properly aligned with assignment deadlines. If you are teaching for the first time, it is very important to make sure you get access to the course in time, so you can set everything up without a rush. Throughout the semester, to make the most of class time, I spent a couple of hours before each session preparing: reviewing material, planning examples, and thinking through how topics would connect. I tried to debug issues during class in real time whenever possible. If something took longer than expected, I pushed it to the end of class or moved it to office hours. This helped me maintain the flow of the topic without interruptions.
I experienced firsthand that handling a class of 50 students without a teaching assistant (TA) was, honestly, a lot more work than I expected. Grading labs, homework, quizzes, and discussions while also preparing for lectures and responding to emails required a constant balance. I wasn’t always perfect, but I made a steady effort to stay on top of it. Grades were returned as quickly as I could manage, and emails were typically answered within 24 hours, often sooner. Again, it reinforced something I had already noticed as a student: timeliness matters. Things do not have to be instant, but when there is a clear effort to respond and follow through, it builds trust and keeps students engaged.
One of the first challenges I faced as an instructor of this course involved managing classroom dynamics. After a few classes, a student shared a concern that some well-intentioned peer engagement (jumping in to answer questions or adding explanations during lecture) was making the lecture harder to follow. It was a fair concern, and an important one. At the same time, I didn’t want to discourage participation. Active engagement is something every instructor hopes for, and it was clear that students were eager to contribute. My challenge was to find the right balance. I responded by acknowledging the concern and assuring the student that I would make adjustments so that participation remained helpful rather than overwhelming. Before taking action, I also reached out to a mentor for advice, which helped me approach the situation more thoughtfully. I thanked students for being engaged and willing to contribute, but also clarified expectations: participation was welcome, but lectures and question answering would be primarily instructor-led, with designated moments for peer discussion. I also reflected on something I had noticed during the class introductions: students were coming from a wide range of backgrounds. Some had prior experience with databases, while others were encountering these concepts for the first time. Because of that, maintaining a consistent pace and structure was important. I believe that framing it this way helped convey to the students that my goal was not to limit participation but to support a better learning environment for all. No further concerns were raised afterwards, and the students remained engaged while being supportive of the entire class.
Midway through the semester, I conducted an anonymous check-in survey to better understand how students were experiencing the course. To encourage participation, I offered a small amount of extra credit, which resulted in a strong response rate.
Overall, the feedback was encouraging: most students agreed or strongly agreed that assignments were clear, the workload was manageable, and the pace was appropriate (Figure 3). But what mattered more were the written responses. They highlighted patterns that helped me see the course from the students’ perspective (full set of responses).
A few consistent concerns stood out:
Some students said they weren’t always sure what to prepare before class or whether a session would lean more toward lecture or lab. That feedback pushed me to be more specific in my announcements, clearly laying out what each class would cover.
Several students pointed out that while their answers were marked incorrect or partially correct, the reasoning behind it wasn’t always clear. This was a fair point, and a difficult balance when grading at scale. Still, I made a more conscious effort to leave clearer comments.
Even when students understood the concepts, many struggled to translate them into SQL queries or ER diagrams. That reinforced something I kept coming back to: the need for more in-class examples and live coding, which I continued to prioritize.
Interestingly, a lot of students said the challenge wasn’t the material itself, but managing their time. A few students shared situations where missing a single assignment significantly impacted their grade. This feedback later influenced my decision to allow requests for reopening missed work.
At the same time, there were plenty of positive notes that helped confirm what was working:
Students consistently appreciated the clarity of explanations and examples.
The labs and live coding sessions were frequently mentioned as highlights.
Many felt the course structure was organized and manageable.
Some even described it as one of the best online courses they had taken.
I also asked students a simple question: what’s one thing I should keep doing, and one thing I could do better? Here are some of the responses that stood out:
“The instructor is great. Instructions are clear, vibes are good, I would recommend this class. The homework is work intensive but not unreasonable.”
“You have been doing a great job and this has been one of the best online courses I have taken at ODU”
“very good at explaining things, even when the students dont seem to get something she fines a new way of explaining it so they get it.”
“The instructor is accommodating to students within reason and I believe that is something they should keep doing.”
“keep being a great teacher :)”
Figure 3: Summary of student responses to four questions: assignments, workload, grading, and pace
At the mid-semester point, once the grades were up to date, I started reaching out to students who had missing work or were falling behind. The intention wasn’t to penalize them, but to give them an opportunity to catch up. At the same time, I made a point to recognize those who were consistently performing well and, to maintain fairness, gave all students the same opportunity to request to make up any missed assignments. Many students responded well to that nudge.
One practice I intentionally carried forward from my own experience as a student was leaving comments on graded work, not just when points were deducted, but also to acknowledge strong submissions. It is a small effort from my end, but it helps students feel seen and motivates them to keep improving. As students, those were the moments we looked forward to, knowing the instructor noticed good work.
As the semester came to an end, the focus shifted to final evaluations, especially grading the course projects and submitting final grades to the university. One thing I did not fully anticipate during this phase was the time needed to carefully evaluate student projects. Each submission reflected a significant amount of effort, and I wanted to give them the attention they deserved. As a result, grading ran later than I had initially expected, although it was still well within the official deadline.
Teaching this course taught me some important things. Good teaching is not about getting everything perfect; it’s a way to strengthen your own knowledge while sharing that knowledge in a way others can truly grasp. It is also about being responsive, thinking about what’s working and what isn’t, and being willing to adjust along the way. Managing a full class without a TA was basically a one-person band situation (except I was the entire percussion section, keeping tempo, fixing the rhythm mid-performance, and still trying not to miss a beat while everyone else expected a flawless show). But throughout the semester, I focused on doing the best I could and continuously improving based on student input. Overall, this experience was incredibly rewarding and reaffirmed my plan to pursue a career in academia.
Acknowledgements
I sincerely thank my advisors, Dr. Michele C. Weigle, Dr. Michael L. Nelson, and Associate Professor & Assistant Chair of the Department of Computer Science, Dr. Steven J. Zeil for providing me with this invaluable opportunity to gain teaching experience as a PhD student. I am also grateful to my advisors and my colleague Dr. Bhanuka Mahanama, for always being available to answer questions. Special thanks to Dr. Santosh Nukavarapu for his mentorship throughout the semester and Syed R. Rizvi for providing the course slides. Credit for establishing and continuously refining this structure should go to the instructors who have taught the course over the years, including but not limited to Drs. Irwin Levinstein, Jian Wu, Vikas Ashok, Syed Rizvi, and Santosh Nukavarapu.
And finally, a very special thank you to my husband, Skanda Siva, for being endlessly flexible with his schedule and for his constant support, and to Yara Siva, who may not know it yet but was my tiniest companion through it all.
In the hours following the release of CVE-2026-45447 for the project OpenSSL, site reliability workers
and systems administrators scrambled to desperately rebuild and patch all their systems to fix a heap use-after-free in PKCS7_verify(). This is due to the affected components being
written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes
these things just happen and there's nothing anyone can do to stop them," said programmer Prof. Fabian Greenholt, echoing statements
expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have
occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can
we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to
write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities
regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."
Tigris is S3-compatible, which means you can point the AWS SDK at it and most things just work. The catch is that the Tigris-exclusive features—bucket forking, snapshots, object renaming, and the like—need verbose workarounds because the AWS SDK doesn't know they exist.
So we wrote a Go SDK that does. It comes in two flavors: the storage package is a drop-in replacement for the standard S3 client with first-class methods for the Tigris-specific operations, and simplestorage is a higher-level client for the common single-bucket case that infers its configuration from the environment so you stop passing the same parameters over and over. You can adopt the Tigris features incrementally without refactoring your existing S3 code, and the simpler API still works against other S3-compatible providers.
I wrote up how it works and why we built it over on the Tigris blog.
The Perma team is excited to announce WARCbench, an open-source tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.
WARCbench builds on over a decade of experience gained from developing Perma.cc. Over that time, we’ve accumulated a collection of scripts, utilities, debugging workflows, and one-off experiments for dealing with web archives. WARCbench brings together those processes into a simple command-line tool that helps web archivists make sense of the wild, occasionally malformed, and deeply heterogeneous web archives that they encounter in practice.
WARCbench was designed to make as few assumptions as possible about your familiarity with web archives, the kind of WARC you are working with, or what you want to do with it. It is intentionally a command-line tool. You can use it to explore and work with WARC files even without deep prior knowledge of the format, though it does assume you’re comfortable using a terminal and open to a bit of experimentation. The goal is not to hide the complexity of web archives. It is to make that complexity easier to inspect, manipulate, and learn from so you can experiment and iterate.
While many existing WARC tools are optimized for specific production workflows, the exploratory, in-the-moment WARC wrangling and debugging work archivists and developers often need to do benefits from different design choices. Sometimes you need to inspect a malformed or misbehaving WARC. Sometimes you need hooks and custom callbacks for an experiment. Sometimes you need to optimize for speed, memory, or convenience. Sometimes you just need to look and see what is there before deciding what to do next. WARCbench was designed for those moments.
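For readers new to the format, a WARC file is a sequence of records, each made of a version line, named header fields, a blank line, and a content block whose size is given by `Content-Length`. The sketch below is not WARCbench and does not use its API. It is a minimal, standard-library-only Python illustration of looking to see what is there, assuming an uncompressed file; the `iter_warc_records` helper and the sample records are hypothetical, and the samples omit fields that a real WARC would require.

```python
import io

def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line at record start: {line!r}")
        headers = {}
        while True:
            header_line = stream.readline()
            if header_line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = header_line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers.get("Content-Length", 0))
        yield headers, stream.read(length)

# Simplified sample data (assumption): real records carry more header fields.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: warcinfo\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: https://example.com/\r\n"
    b"Content-Length: 3\r\n"
    b"\r\n"
    b"abc"
    b"\r\n\r\n"
)

records = list(iter_warc_records(io.BytesIO(SAMPLE)))
for headers, payload in records:
    print(headers["WARC-Type"], len(payload))
```

Many WARCs in the wild are gzip-compressed record by record; opening the file with Python's `gzip.open` before passing it in is one way to handle that case, though malformed files may need more forgiving parsing than this sketch provides.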
We don’t know all the ways researchers or web archivists might use WARCbench, but we hope it becomes a versatile Swiss Army knife that others will find valuable to keep in their toolkit too.
We’re delighted to welcome Ann McCranie, PhD, who joins OCLC on June 8 as our new Director of Research Insights.
Ann joins OCLC at an important moment as OCLC Research advances Research Reimagined, a strategic effort to strengthen the relevance, visibility, and impact of our research for library leaders and their institutions. In her role, Ann will help connect research priorities to practical insights that support decision‑making across a rapidly changing library and higher education landscape. She will lead a team of research scientists and engineers focused on advancing the Research Reimagined strategy.
Ann brings more than a decade of experience leading research programs in higher education, with expertise in mixed methods research, research operations, and research communication. Most recently, she held senior leadership roles at Indiana University.
Her work has focused on building durable research services, guiding cross-functional teams, and helping researchers and administrators navigate change. Throughout her career, she has paired rigorous analysis with practical application.
Ann also brings a perspective shaped by close collaboration with researchers, research administrators, and campus leaders beyond the library. That experience informs how she thinks about the evolving roles of libraries as institutions respond to changes in technology, AI-informed scholarly workflows, and research infrastructure. This perspective will be especially valuable as OCLC Research continues exploring future-focused questions facing libraries and higher education.
To help introduce Ann to the community, we asked her a few informal questions.
What drew you to this role at OCLC?
What attracted me was the opportunity to help connect research to the decisions library leaders are making today, while also contributing to longer-term thinking about where libraries are headed. In higher education, I’ve worked with researchers, administrators, and institutional leaders who rely on strong evidence and practical insights to guide strategy, services, and priorities.
I was especially excited by the chance to bring that experience to OCLC and support work that can have both immediate and lasting value for libraries. Research is most meaningful to me when it helps people navigate change, make informed decisions, and think differently about what comes next.
How do you think about “research insights”?
I tend to think about research insights through the lens of impact. I once asked a doctor about a medical test, and she explained that she would not order it because the result would not change the treatment plan. At first, I was a little disappointed because I was genuinely curious, but that idea stayed with me.
It became a useful way for me to think about research. I’m always asking whether the work can help inform decisions, shape action, or open up new possibilities. If the findings don’t create an opportunity to do something differently, it’s worth asking how we can make the research more purposeful and useful.
What are you most looking forward to as you get started?
I’m really looking forward to getting to know my team and connecting with colleagues across OCLC to understand their work, priorities, and how Research Insights can support them.
As a social networks scholar, I’ve always been interested in the connections between people and how relationships help ideas spread and grow. So much innovation comes from those informal networks, whether that is among coworkers, library partners, or the broader community. I’m excited to learn from those connections and help build on the momentum already underway with Research Reimagined.
Ann will be attending ALA at the end of June, and we look forward to introducing her to many of you there. Until then, please join us in welcoming her to OCLC Research.
This post is part of a series in which I write about experiences or specific challenges from my day-to-day work. I’m hoping that these will be interesting for other librarians that work in entirely different areas, for my colleagues who are solving different problems on different systems (or maybe eventually the same one after we migrate), and for those who are thinking about doing this kind of work in the future.
Building from navigating the distributed database, I want to get more deeply into what cross-system problem solving can look like. To re-set the stage (but for more details about these tools, check the previous post), transaction history of items is only available for most users via our Analytics tool.
Transaction Histories
Transaction history represents the ways an item has traveled: checkouts, but also transits and receipts. This is one of the many transactions created while my request for the Alien: Romulus DVD was filled. In this transaction, a coworker at York (I’ve redacted any details, but the user ID is in the actual log) sets the item to transit for reason “HOLD” to “UP-PAT”:
| Trans Hist Datetime | Trans Hist Workstation | Trans Hist Command Desc | Trans Hist Data Code Desc | Trans Hist Data Value |
| --- | --- | --- | --- | --- |
| 2025-08-27 12:10:48 | 0173 | Transit Item | call number | POPULAR |
| 2025-08-27 12:10:48 | 0173 | Transit Item | copy number | 1 |
| 2025-08-27 12:10:48 | 0173 | Transit Item | item ID | 000080622957 |
| 2025-08-27 12:10:48 | 0173 | Transit Item | Max length of transaction response | 3000000 |
| 2025-08-27 12:10:48 | 0173 | Transit Item | station library | UP-PAT |
| 2025-08-27 12:10:48 | 0173 | Transit Item | station login clearance | NONE |
| 2025-08-27 12:10:48 | 0173 | Transit Item | station login user access | REDACTED |
| 2025-08-27 12:10:48 | 0173 | Transit Item | station user’s user ID | REDACTED |
| 2025-08-27 12:10:48 | 0173 | Transit Item | transit from | UP-PAT |
| 2025-08-27 12:10:48 | 0173 | Transit Item | transit reason | HOLD |
| 2025-08-27 12:10:48 | 0173 | Transit Item | transit to | UP-ANNEX |
This is the Analytics export, which I transformed from a CSV into a table for readability in this post.
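As a rough illustration of how an export shaped like this one could be regrouped, the sketch below collapses the one-row-per-data-code layout into a single object per transaction. It is not the processing used for this post: the CSV layout, the `group_transactions` helper, and the choice of datetime, workstation, and command as the grouping key are assumptions, and only a few rows from the export are included.

```python
import csv
import io
import json
from collections import defaultdict

# A few rows from the export above, written as CSV. The exact CSV layout is
# an assumption; the column names match the export's headers.
SAMPLE_CSV = """\
Trans Hist Datetime,Trans Hist Workstation,Trans Hist Command Desc,Trans Hist Data Code Desc,Trans Hist Data Value
2025-08-27 12:10:48,0173,Transit Item,item ID,000080622957
2025-08-27 12:10:48,0173,Transit Item,transit from,UP-PAT
2025-08-27 12:10:48,0173,Transit Item,transit reason,HOLD
2025-08-27 12:10:48,0173,Transit Item,transit to,UP-ANNEX
"""

def group_transactions(csv_text):
    """Collapse one-row-per-data-code exports into one dict per transaction."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: datetime + workstation + command identify a transaction.
        key = (
            row["Trans Hist Datetime"],
            row["Trans Hist Workstation"],
            row["Trans Hist Command Desc"],
        )
        grouped[key][row["Trans Hist Data Code Desc"]] = row["Trans Hist Data Value"]
    return [
        {"datetime": dt, "workstation": ws, "command": cmd, "data": data}
        for (dt, ws, cmd), data in grouped.items()
    ]

transactions = group_transactions(SAMPLE_CSV)
print(json.dumps(transactions, indent=2))
```

Two transactions logged at the same second on the same workstation with the same command would be merged by this key, so a real export might need a stricter identifier.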
Unfortunately, even though the underlying Symphony database has unique item keys for records, Analytics seems to use the barcode as the primary key of an item table, not just the primary way to find an item record. An item’s transaction history is completely wiped from Analytics if someone changes the barcode. And sometimes, barcodes change. In our case, we change barcodes on everything that’s permanently shifted to the annex (see my post on macros). We also have barcodes wear out or fall off. So we have hundreds of thousands of items whose histories were lost, at least from Analytics.
These lost records came to a head when our Collection Maintenance team needed to be able to track large sets of items being moved to the Annex. Once the items arrived, their barcodes would be replaced with an Annex barcode, which serves a different function. So one could follow a set of barcodes on their journey until “poof,” every record related to them vanished. On the one hand, one could assume the item had been processed by the Annex since it had now disappeared. But it made tracking uneven and meant collections maintenance couldn’t tell what route an item had taken to get there or how long it’d taken.
First, I’ll note that our systems work is also quite distributed. While I was working with our collections maintenance data expert on getting access to older data, the Symphony admins were configuring item extended information to include an original barcode field, which is now populated when a barcode updates. They’ve also done some work hunting down barcode changes to update the original barcode fields. These will be exportable, even though they won’t be searchable the same way in our Analytics. Systems takes a village.
Where the Data Still Lives
Getting back to the problem-solving, this data can still be found through the oldest method of ILS data access: Workflows reports.
By running a Scan History Logs report against a set of barcodes, we can export every log in which that barcode shows up. This data wasn’t nearly as easy to use as an Analytics or Data Control export. It’s exported in a text file and uses opaque datacodes.1 Here are two example log entries from a barcode change (the actual user’s ID has been replaced with REDACTED):
That top entry is really important because, even though there are other ways of accessing a permanent item ID, it’s not in the logs. So by scanning for that original barcode, we can get the entry where the barcode is in NQ and the new barcode is in NR.
I wrote a Python script that processed entire log entries, since the colleague from Collection Maintenance wasn’t just looking for old/new barcodes but for the transaction histories of that item. He could apply a date range to the log export itself, so he could set it just to export the last few months. I’m not going to share the entire script here, but this is the overall approach that I used:
current = re.search(r'\<datacode_NQ\>:(.+?) ',line)
I wrote a conditional function for all the entries which might not be present. For the handful whose data might contain a space, I wrote the search to break at the $<datacode that begins the next entry and trimmed space off the right side.
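To make the optional-field and trailing-space handling concrete, here is a minimal sketch rather than the actual script. The `$<CODE>:value` layout, the `ZZ` and `QQ` codes, the new barcode value, and the `find_field` helper are all invented for illustration; only the NQ (old barcode) and NR (new barcode) datacodes come from the logs described above.

```python
import re

# Hypothetical log line. The "$<CODE>:value" layout, the "ZZ" code, and the
# new barcode are made up; NQ and NR are the datacodes named in the post.
LINE = "$<NQ>:000080622957 $<NR>:A00012345 $<ZZ>:moved to annex shelf  "

def find_field(code, line, allow_spaces=False):
    """Return the value for one datacode, or None when the field is absent."""
    if allow_spaces:
        # Read up to the "$<" that opens the next field, or the end of line.
        pattern = r"\$<" + re.escape(code) + r">:(.*?)(?=\$<|$)"
    else:
        # Values without spaces end at the first whitespace character.
        pattern = r"\$<" + re.escape(code) + r">:(\S+)"
    match = re.search(pattern, line)
    if match is None:
        return None
    # Trim the space left over before the next field.
    return match.group(1).rstrip()

entry = {
    "old_barcode": find_field("NQ", LINE),
    "new_barcode": find_field("NR", LINE),
    "note": find_field("ZZ", LINE, allow_spaces=True),
    "missing": find_field("QQ", LINE),  # absent fields come back as None
}
print(entry)
```

Returning `None` for absent fields keeps every entry the same shape, which makes the resulting objects easier to filter later.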
The output is a very large JSON object, which I’ve condensed below to reflect the key fields from this transaction. Even with its size, it’s a lot more compact and efficient than the Analytics output shared above and thus might be easier to process, so this script may end up being useful in other contexts.
Going forward (to migration) this new process should meet the use cases of:
connecting old and new barcodes in collection maintenance logs,
tracking item histories that had been dropped from Analytics, and
creating a friendly JSON object instead of the same entry spread across a dozen or more lines of a CSV, making it potentially useful for reporting on new barcodes as well.
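The first two use cases can be sketched as follows, assuming log entries have already been parsed into dictionaries. The field names (`barcode`, `new_barcode`), the Annex barcode, the commands other than Transit Item, and both helper functions are hypothetical, so treat this as an outline of the idea rather than the script itself.

```python
import json

# Hypothetical parsed entries: field names, the Annex barcode, and the
# "Change Barcode" / "Receive Item" commands are invented for illustration.
entries = [
    {"datetime": "2025-08-27 12:10:48", "command": "Transit Item",
     "barcode": "000080622957"},
    {"datetime": "2025-09-02 09:30:00", "command": "Change Barcode",
     "barcode": "000080622957", "new_barcode": "ANNEX-0001"},
    {"datetime": "2025-09-02 09:31:12", "command": "Receive Item",
     "barcode": "ANNEX-0001"},
]

def current_barcode(barcode, changes):
    """Follow old -> new links until reaching a barcode with no replacement."""
    seen = set()
    while barcode in changes and barcode not in seen:
        seen.add(barcode)  # guards against an accidental loop in the data
        barcode = changes[barcode]
    return barcode

def merge_histories(entries):
    """Group every entry under the latest barcode of the item it belongs to."""
    changes = {e["barcode"]: e["new_barcode"] for e in entries if "new_barcode" in e}
    histories = {}
    # Timestamps in this format sort correctly as plain strings.
    for entry in sorted(entries, key=lambda e: e["datetime"]):
        key = current_barcode(entry["barcode"], changes)
        histories.setdefault(key, []).append(entry)
    return histories

histories = merge_histories(entries)
print(json.dumps(histories, indent=2))
```

Keying on the latest barcode means events logged before and after the change end up in one history, which is the continuity that Analytics loses.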
So in sum:
Sysadmins created a new field for original barcodes and set it to populate when barcodes are changed.
Sysadmins began hunting through logs to find barcode changes and wrote a script to populate them in the database for export/reference.
I created a way to extract JSON objects for item transaction history out of log reports run on old barcodes since those transactions were no longer accessible in Analytics.
1. There are also ways to export a formatted log which is human readable, but those logs are much harder to turn into data structures.
This article presents a comparison of graph capabilities in three
different databases: DuckDB (v1.4.4 with duckpgq), LadybugDB (0.16.1),
and PostgreSQL (19devel). We will load a large volume of records
(5,635,972 rows of baseball data covering people, parks, team records,
and game play-by-plays) into each database, define the entities and
relationships, and write a variety of queries that take full advantage
of the graph structure.
Ambient Church transforms architecturally stunning spaces into immersive
audio-visual environments. Our events feature pioneering artists
presenting vibrant works in a context that elevates both the music and
the space.
Founded in Brooklyn in 2016, we facilitate collective peak experiences
through the soundscapes of modern contemplative music. With an emphasis
on education and environment, we seek to illuminate an underacknowledged
lineage of sonic exploration.
Large language models (LLMs) are increasingly used to generate data to
train improved models [1,2,3], but it remains unclear what properties are
transmitted in this model distillation [4,5]. Here we show that
distillation can lead to subliminal learning—the transmission of
behavioural traits through semantically unrelated data. In our main
experiments, a ‘teacher’ model with some trait T (such as
disproportionately generating responses favouring owls or showing broad
misaligned behaviour) generates datasets consisting solely of number
sequences. Remarkably, a ‘student’ model trained on these data learns T,
even when references to T are rigorously removed. More realistically, we
observe the same effect when the teacher generates math reasoning traces
or code. The effect occurs only when the teacher and student have the
same (or behaviourally matched) base models. To help explain this, we
prove a theoretical result showing that subliminal learning arises in
neural networks under broad conditions and demonstrate it in a simple
multilayer perceptron (MLP) classifier. As artificial intelligence
systems are increasingly trained on the outputs of one another, they may
inherit properties not visible in the data. Safety evaluations may
therefore need to examine not just behaviour, but the origins of models
and training data and the processes used to create them.
In this essay we will attempt to look at both the archive of art as well
as the archive as art. When we draw a distinction between those
materials that we treat as documents with a ‘factual’ historical
significance (those which offer themselves in the service of
scholarship), and the uses which artists make of the archive as one of
the media of expression that intersect with their documentary value, we
ask ourselves: which theories about the archive’s nature and function
are applicable to Syrian art? What are the roles adopted by ‘the
document’ and ‘the archivist’? To what extent do these roles alternate
and intersect?
Cave of Forgotten Dreams is a 2010 3D documentary film by Werner Herzog
about the Chauvet Cave in Southern France, which contains some of the
oldest human-painted images yet discovered—some of them were crafted
around 32,000 years ago. It consists of footage from inside the cave, as
well as of the nearby Pont d’Arc natural bridge, alongside interviews
with various scientists and historians. The film premiered on 13
September 2010 at the Toronto International Film Festival.
Starbucks is saying goodbye to its artificial intelligence inventory
management system about nine months after its debut, Reuters reported
Thursday. The tool, which used computer vision to track some parts of
the chain’s inventory, was announced in September as a method to
simplify inventory record-keeping and prevent stockouts.
FediRoster is a slightly more heavyweight alternative to David Adler’s
Sociologists on Mastodon software. It is intended to function as a
public list of Mastodon and other fediverse accounts, geared primarily
towards academic communities, but suitable for others as well. It offers
functions for following listed accounts individually or in bulk. The
main novelty here is that you can add yourself to the list through an
authentication process instead of all the work falling on a list
maintainer. You can sign in through your Mastodon account or send a
message to the list’s bot to verify your account ownership. This also
means that the hosting process for new lists is a bit more involved
(it’s a Python/WSGI application).
A comprehensive guide to learning Rust for developers with Python
experience. This guide covers everything from basic syntax to advanced
patterns, focusing on the conceptual shifts required when moving from a
dynamically-typed, garbage-collected language to a statically-typed
systems language with compile-time memory safety.
Language models serve as the cornerstone of modern natural language
processing (NLP) applications and open up a new paradigm of having a
single general purpose system address a range of downstream tasks. As
the field of artificial intelligence (AI), machine learning (ML), and
NLP continues to grow, possessing a deep understanding of language
models becomes essential for scientists and engineers alike. This course
is designed to provide students with a comprehensive understanding of
language models by walking them through the entire process of developing
their own. Drawing inspiration from operating systems courses that
create an entire operating system from scratch, we will lead students
through every aspect of language model creation, including data
collection and cleaning for pre-training, transformer model
construction, model training, and evaluation before deployment.
Wasteback Machine is a JavaScript library for analysing archived web
pages, measuring their size and composition to enable retrospective,
quantitative web research.
The primary difference between deepfake photos and LLM conversations is
that the people who generate the former are deliberately trying to fool
others, and many of the people who elicit the latter from LLMs have
inadvertently fooled themselves.
The removal of the encoders, which are typically in charge of making
sense of the multimodal inputs, places the burden of making sense of all
outputs on the LLM. Although the model is encoder-free, all modalities
are now unified within the LLM. Instead of the model having to wait for
the encoders to finish processing the audio and image inputs, the LLM
can get started earlier processing the input and generating output!
In this guide, I want to showcase what it took to remove the vision and
audio encoders and replace them with something much faster. The result
is a 12B model that can handle audio and image inputs without the need
for encoders.
AI Edge Gallery is the premier destination for running the world’s most
powerful open-source Large Language Models (LLMs) on your mobile device.
Experience high-performance Generative AI directly on your
hardware—fully offline, private, and lightning-fast.
Today, we are introducing Gemma 4 12B, our latest model designed to
bring agentic multimodal intelligence directly to laptops. Bridging the
gap between our edge-friendly E4B and our more advanced 26B Mixture of
Experts (MoE), Gemma 4 12B packages powerful capabilities inside a
reduced memory footprint. It is also our first mid-sized model to
feature native audio inputs.
Solid State Books is a full-service, Black-owned general interest
bookstore with a great selection of fiction & non-fiction titles. We
stock literary gifts, stationery, greeting cards & puzzles for all
ages. We have a carpeted, playful children’s books area in both stores
for kids & parents alike to spread out & read together. Come by
for weekly children’s story hours, catch monthly book groups, author
readings/signings, local interest panels, political conversations &
more!
MiniSearch is a tiny but powerful in-memory fulltext search engine
written in JavaScript. It is respectful of resources, and it can
comfortably run both in Node and in the browser.
In May 2026, the Justice Department began systematically removing
material from its web sites regarding the many indictments and
convictions related to the Jan. 6 attack on the U.S. Capitol. This
archive reconstructs the vast bulk of those thousands of deleted
records.
Last week, the Justice Department began systematically removing material
from its web sites regarding the many indictments and convictions
related to the Jan. 6 attack on the U.S. Capitol.
The operation started without fanfare or formal announcement and
proceeded largely unnoticed. Until, that is, journalists such as the
Washington Post’s Meryl Kornfield took notice of certain press releases
and other materials that had conspicuously disappeared from
www.justice.gov.
“The Trump admin is quietly deleting info about the Capitol attack from
the DOJ website as it prepares to give funds to J6ers,” Kornfield
posted. “This week, DOJ deleted a press release about one man with an
ongoing child solicitation case who came to the Capitol with bear
spray.”
Then, with typical bombast, the Justice Department responded by taking
issue with one particular aspect of Kornfield’s characterization.
“Nothing ‘quiet’ about it,” the DOJ Rapid Response account replied. “We
are proud to reverse the DOJ’s weaponization under the Biden
administration. We will do everything in our power to make whole those
who were persecuted for political purposes. This includes stripping
DOJ’s website of partisan propaganda.”
We are not erasing history quietly, the Justice Department seemed to
suggest. We are erasing history loudly and proudly.
At Lawfare, we have restored the vast bulk of what was deleted. We have
also started to preemptively archive a raft of material that has not yet
been deleted but probably will be, given its thematic relationship to
the material that was 86ed.
Data centers are the physical facilities that power cloud services, AI
systems, streaming, and nearly every digital platform people use each
day. As demand for artificial intelligence accelerates, data centers are
becoming major sources of electricity demand and local infrastructure
pressure, which means their growth affects energy systems, communities,
and long-term public planning.
The regulatory landscape for data centers in the United States has
shifted dramatically in recent years from a period of aggressive
economic incentives to a phase of intense scrutiny, restriction, and
community-led resistance. To track these legislative changes, the DIGS
Lab at the University of Virginia reviewed more than 700 federal, state,
and local policies related to data centers. The data center policy
database aims to bring transparency around zoning, permitting, and
regulating data centers and their impacts on communities. This is what
we found.
Another early R.E.M. set, from the same state as the previous show but a
different city. Pretty much the same library of songs, but this one’s
the superior show to get - it sounds slightly nicer and doesn’t have the
equipment failures of the previous show. There’s already a source on
here, but that’s a different master of the same recording.
Ratatui (ˌræ.təˈtu.i) is a Rust crate for cooking up terminal user
interfaces (TUIs). It provides a simple and flexible way to create
text-based user interfaces in the terminal, which can be used for
command-line applications, dashboards, and other interactive console
programs.
IPv6 is weird. One of the stranger parts of the standard is that every interface's link-local addresses are in fe80::whatever. If you have a machine with two network interfaces, both of them will be in fe80::, so if you have a packet destined to fe80::4, how do you disambiguate it?
The answer is you use IPv6 scopes/zones. The exact format of what goes into a zone is OS dependent, but on Linux it's the interface name and on Windows it's the interface ID. This lets the kernel's routing table know how to handle an address range conflict.
On my tower, this would be represented like this:
fe80::4%eth0
Where eth0 is the name of my tower's ethernet device.
When you create a host:port bindhost, you normally separate the hostname and port with a colon. IPv6 uses colons to separate hex groups. In order to disambiguate what's the host and what's the port, you typically format the IPv6 address in square brackets, so fe80::4 on port 80 would look like this:
[fe80::4]:80
And with the right scope it looks like this:
[fe80::4%eth0]:80
Now let's get URL encoding into the mix. From high orbit, you can imagine a URL's format as being something like this:
An IPv6 zone would then be part of the hostname, just like with that fe80::4 port 80 example from earlier. So you'd think the URL would be something like this:
http://[fe80::4%eth0]:80
But if you try to parse this as a URL in Go, you get an error:
package main

import "net/url"

func main() {
	if _, err := url.Parse("http://[fe80::4%eth0]:80"); err != nil {
		panic(err)
	}
}
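This happens because URLs can't represent all Unicode values, so any values that don't fit into the grammar of a URL become percent-encoded. This is why sometimes you'll see a %20 in URLs in the wild; that's the encoding of the ASCII space character, which is invalid in URLs. The parser therefore treats the % in %eth0 as the start of an escape sequence, and "%et" is not a valid one because "t" is not a hex digit.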
In order to work around this, you need to percent-encode the percent sign in the IPv6 zone:
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("http://[fe80::4%25eth0]:80")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Hostname())
}
Yields:
fe80::4%eth0
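If the zoned address comes from configuration rather than a literal, the escaping can be done in one place. This is only a sketch: zonedURL is a hypothetical helper, not part of net/url or Anubis, and it assumes the host it receives is raw (not already percent-encoded).

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// zonedURL is a hypothetical helper. It builds a URL for a host that may
// carry an IPv6 zone, percent-encoding the zone separator so that
// url.Parse accepts it. It assumes host is raw, so a host that already
// contains %25 would be encoded twice.
func zonedURL(scheme, host, port string) (*url.URL, error) {
	hostport := net.JoinHostPort(host, port) // adds brackets for IPv6 literals
	escaped := strings.ReplaceAll(hostport, "%", "%25")
	return url.Parse(scheme + "://" + escaped)
}

func main() {
	u, err := zonedURL("http", "fe80::4%eth0", "80")
	if err != nil {
		panic(err)
	}
	// Hostname returns the decoded host without brackets or port.
	fmt.Println(u.Hostname(), u.Port())
}
```

The design choice is to escape once at the boundary and hand the rest of the program a parsed *url.URL, so the %25 never has to be typed by hand in more than one spot.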
In theory, there is guidance for how to properly handle IPv6 zones in user interfaces in RFC 9844, but there's no such guidance for URLs. Go also does not seem to follow this RFC in net/url.
So in the meantime in order for Anubis to point to IPv6 zoned addresses, you need to encode the % with percent encoding. This is horrible, but it seems that this is an edge case that applies to other frameworks, programming languages, and libraries:
Maybe some day in the future there will be a better option here. In the meantime my policy of not forking the Go standard library means that this somewhat terrible UX for an edge case is acceptable. I hate it, but what can you do?
The Perma team recently attended the International Internet Preservation Consortium’s (IIPC) Web Archiving Conference, held this year at the KBR—Royal Library of Belgium in Brussels. A recurring theme was that web archiving depends on collective stewardship of the open-source tools, institutions, and people that make preservation possible. At a moment when the web is becoming more difficult to archive, the conference offered an assessment of current challenges and a reminder that the sustainability of the field relies heavily on collaboration and shared responsibility.
The opening keynote panel—“Sustainability for Open Source Web Archiving Tools”—brought together perspectives from libraries, consortia, and open source service providers: Lauren Ko (University of North Texas Libraries), Tessa Walsh (Webrecorder), Neil Jefferies (Open Preservation Foundation), Yves Maurer (National Library of Luxembourg), and LIL’s very own Clare Stanton (Perma.cc). The conversation focused on the structural pressures now reshaping the digital landscape, and what collective stewardship might realistically look like. Key takeaways from this conversation are outlined below.
Clare Stanton (center) discusses Perma.cc during the opening keynote.
Need for sustained investment in open-source software
The web archiving community no longer has the luxury of treating tool and infrastructure maintenance as someone else’s problem. Nearly every institution in the room relies on these open-source tools, including Perma itself. For example, the replay functionality for Perma.cc is built on replayweb.page, part of the software suite developed by our long-time collaborators at Webrecorder. Despite almost everyone using these open-source tools, almost no one is funding them proportionally. Historically, many projects survived on grants and foundation support, but that funding landscape is shrinking. Yves framed open-source work as a shared mission and responsibility, especially for national libraries and cultural heritage institutions whose mandates depend on long-term stewardship. Institutions should be contributing back to the web archiving ecosystem they depend on.
An asymmetric fight against a complex and closing web
Web archiving has become more difficult in the past few years, and the scale and pace of change are only accelerating. Tessa described the current environment as an “asymmetric fight” because bot detection and anti-scraping systems increasingly treat archiving crawlers the same way they treat commercial scrapers. Several panelists pointed to the collateral damage caused by large-scale scraping and large language model (LLM) training. Infrastructure providers are tightening access controls across the web, often in ways that make legitimate archival crawling significantly harder. Tessa noted that archivists now need to spend more time simply observing crawls to determine whether captures succeeded or whether crawlers archived nothing but bot verification pages. Clare suggested that the closing web may create an opportunity for archiving institutions to advocate collectively for differentiated treatment, making the case to infrastructure companies like Cloudflare that preservation work serves a fundamentally different purpose from commercial scraping.
Beyond single maintainers: Sustaining people, not just code
The panelists repeatedly returned to governance and community structure as equally important to technical capability, and also discussed the human labor behind open source tooling. Multiple panelists emphasized that storage and compute are not the primary costs in web archiving operations. The expensive part is retaining highly skilled people capable of adapting tools to a rapidly changing web environment. Neil argued that sustainability problems become especially acute when projects depend too heavily on single maintainers. The goal, Neil suggested, is not to remove human dependency, but to move from person-dependent systems to people-dependent systems, with succession planning, multiple technical leads, and stronger organizational support structures.
Digital preservation as collective responsibility
There was some cautious optimism about potential sources for more sustainable support. Panelists discussed adding funding requirements for upstream open-source projects into public tenders for web archiving services, creating institutional budget lines specifically for open-source maintenance, and treating contributions to community software as legitimate professional development work for developers within libraries and archives. Some panelists pointed to growing interest in digital sovereignty policies in Europe, where governments increasingly want more direct control over digital infrastructure and collections stewardship. Yves suggested that this political shift could create opportunities for open-source preservation tooling, particularly if public sector procurement rules begin explicitly rewarding contributions back to shared infrastructure.
Benefits and limitations of AI-assisted coding
Not surprisingly, AI hovered over much of the discussion. AI-assisted coding may reduce some development overhead, and some panelists described productive uses for code review, bug detection, and scripting assistance. However, the panel was skeptical of the idea that AI meaningfully solves the underlying sustainability problem. Faster code generation does not automatically create maintainable systems, healthy governance, or resilient communities. As Tessa noted, velocity without understanding creates its own risks.
Open-source software is critical preservation infrastructure
The key takeaway that emerged from the opening keynote was a reframing of open-source web archiving infrastructure not as ancillary technical tooling, but as critical preservation infrastructure. The field behaves as though these systems are indispensable, but there is a significant underinvestment in open-source tools. The harder question, and the one the panel kept circling back to, is whether institutions are willing to fund, maintain, and steward them accordingly.
Together with AGESIC, we are piloting a traceable AI system with Uruguay’s open data catalogue – so that citizens can receive verifiable answers from national data
The Wayback Machine is (usually)
good at preserving web pages, but it’s not always good at helping you
find your way around what’s been preserved. URLs from a vanished website
may be archived, but if the original site is gone, the paths into it
(its navigation, its search, its tables of contents) are sometimes gone
too.
This creates a need, and opportunity, for sites I want to call
reading rooms for the archived web: standalone sites that sit
to the side of archived web content and provide the index, browse,
search and curation layers that the original site used to, with
provenance links back to the captures they’re drawn from. The metaphor I
have in mind is the reading room in a brick & mortar archive, the
place you go to consult a collection, with finding aids close
at hand and the records themselves a request slip away. Perhaps a
finding aid is the better metaphor here?
The most recent example of this I’ve come across is work from Lawfare Media, who recovered
5,772 pages deleted from the Department of Justice website related to
the Jan. 6 attack on the US Capitol. They’ve built a standalone archival
viewer of the extracted content that links back to the Wayback Machine. There is more about
the motivation for the project in their post The
Justice Department Erases History. Lawfare Restores It. (Sadly the
GitHub repo for the archive itself looks to be private.)
This is a bit archive-eating-its-own-tail, but one feature of the site
that Lawfare Media built is that the search
is operational from within the Wayback Machine’s own snapshot
of the site, since the search runs client-side. A user search doesn’t
require an API back to the server.
Searching the archive from inside the Wayback Machine
Looking at the HTML it appears the site is using minisearch for
client-side search. A nice side effect of client-side search is that the
indexed corpus (metadata for all the DoJ content) is itself available on
the open web, as corpus.json.
Some caring person has even already thought to archive
corpus.json using Save Page Now:
A Wayback Machine snapshot of corpus.json from May 29, 2026
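To make the idea concrete, here is a rough sketch of what an in-memory, no-server search over a shipped corpus looks like. It is written in Go rather than JavaScript, it is not MiniSearch's API, and the record fields (id, title) and sample titles are assumptions, since the post does not describe the real shape of corpus.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Doc is a hypothetical record shape, not the real corpus.json schema.
type Doc struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

// tokenize lowercases the text and splits it on anything that is not a
// letter or a digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// buildIndex maps each term to the set of document IDs whose title has it.
func buildIndex(docs []Doc) map[string]map[int]bool {
	index := make(map[string]map[int]bool)
	for _, d := range docs {
		for _, term := range tokenize(d.Title) {
			if index[term] == nil {
				index[term] = make(map[int]bool)
			}
			index[term][d.ID] = true
		}
	}
	return index
}

// search returns the sorted IDs of documents containing every query term.
func search(index map[string]map[int]bool, query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var ids []int
	for id := range index[terms[0]] {
		match := true
		for _, t := range terms[1:] {
			if !index[t][id] {
				match = false
				break
			}
		}
		if match {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	// Made-up sample data standing in for a downloaded corpus file.
	raw := `[{"id":1,"title":"Sample press release"},{"id":2,"title":"Sample court filing"}]`
	var docs []Doc
	if err := json.Unmarshal([]byte(raw), &docs); err != nil {
		panic(err)
	}
	index := buildIndex(docs)
	fmt.Println(search(index, "sample release"))
}
```

Everything the search needs is in the one data file, which is why a snapshot of the page plus the corpus is enough for the search box to keep working.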
Other “Reading Rooms”
Lawfare’s archive sits in a small but growing genre. Or maybe it’s well
established and I’m just noticing it for the first time? Another example
is Ben Welsh’s
FiveThirtyEight Index
which he built after Disney shut down fivethirtyeight.com
in March 2025. It catalogs over 38,000 articles, datasets, podcasts and
graphics, browsable by author, date and series, with every record linked
back to its Wayback Machine snapshot. (The Internet Archive also runs a
companion
collection.)
Another example is Internet Archive’s Scholar, which provides a
catalog of published research (mostly journal articles) that is found
in the Wayback Machine. I believe this is a presentation layer over data
collected by IA’s FatCat project, which
provides some ability to edit the metadata about the archived content.
In archival terms what these projects are doing is effectively what
finding aids
do: describing scope, arrangement, and provenance, but wrapping it in
something that feels more like a reading room than a paper inventory.
They are themselves websites that will eventually need to be archived. I
think it’s interesting to think about them as a continuation of
something archives have been doing for a very long time. It’s also
interesting to think about the role that agentic coding tools played in
their production (at least in the case of the Jan 6 Archive).
Jonathan Gray and the Public Data Lab at King’s College
London run a project called Repurposing
Web Archives (with the Internet Archive and Internet Archive Europe)
that looks at the tools, methods, and stories of how researchers,
journalists, and artists actually work with the archived web: see their
recent Follow
the Changes post. Perhaps this idea of Reading Rooms for web
archives is a subset of the types of practices this project is
interested in? It seems like there is a gray area between research that
incorporates web archives, and more documentation oriented content for
providing an entry point into web archives?
If you know of other examples of Reading Rooms (or finding aids) for Web
Archives I’d love to hear about them!
This post was originally a thread over in the
Fediverse. Thanks to (freegovinfo?) for
the pointer to the Lawfare Media work.
The Uncanny Valley and Gell-Mann Amnesia Effect in the ACM Digital Library
Michael L. Nelson
2026-05-28
I serve on the ACM Digital Libraries Board, and we are navigating a number of changes to the ACM's Digital Library, which, for a professional society and memory organization, is arguably the ACM's primary asset. A recent article (March, 2026) by Jack Davidson and Wayne Graves provides a status update of the ACM's move to open access, which includes establishing a "basic" and "premium" service level. Although there are some questions regarding the long-term implications of moving to open access, I, and presumably all authors, welcome the ACM's bold strategy for ensuring that our content reaches the widest possible audience.
Jack's and Wayne's article also addressed the DL's recent experimentation with AI/LLM enrichment of articles, specifically landing pages. And unfortunately, the experimentation got off on the wrong foot. Just before the holidays in 2025, the landing page for articles in the DL added AI-generated summaries as a sort of alternate or rival abstract. To make matters worse, these summaries were shown by default, and users had to select a tab to show the original, author-supplied abstracts. The figure below is an example taken from Dr. Casey Fiesler (CU Boulder), whose social media post about the summaries being shown by default instead of the abstracts gained a lot of traction.
Fortunately, the expected behavior of showing the authors' abstract by default returned very quickly, and the AI-generated summary is now clearly marked as such, including the date that the summary was generated:
First, let me be clear: showing the AI-generated summary by default instead of the authors' abstract was a terrible idea and was uniformly rebuked. The DL board was not informed that this was going to happen, and I can't recall anyone on the DL board even suggesting it; perhaps it was just an oversight by an ACM staff member or engineer at Atypon. I don't recall exactly when the expected default behavior was restored, but it was soon after the author community complained.
My original suggestion at the DL board meetings (echoed by Dr. Fiesler) was to provide wiki-style editing on the AI-generated summaries, possibly limited to logged-in authors (a possible premium feature?). One can make a good argument for either opt-out or opt-in, but neither option adequately addresses the problem of the sizable back catalog of unreachable authors (JACM began in 1954).
But what I find interesting is the level of author backlash against AI-generated summaries, at least as I observed on social media. This is all anecdotal, and I realize people don't post about things about which they are neutral or have even mildly positive feelings because, let's face it, carping is a lot more fun. But Dr. Fiesler and the others in the thread are all reasonable people and aren't just trolling. I think there's something more fundamental happening. I think our collective reaction (revulsion?) to AI-generated summaries can be explained by adapting two phenomena: the Uncanny Valley, and the Gell-Mann Amnesia Effect.
The Uncanny Valley is a hypothesis that posits that our emotional response to depictions of humans (expressions, speech, movement, etc.) initially rises as the likeness becomes more human-like, and then takes a sharp dive as the likeness becomes nearly human-like but not quite. Basically, most cartoon characters, anthropomorphized animals, etc. are "cute", but the more realistic animated humans in movies like "The Polar Express" (2004) are just creepy.
I propose that something similar happens with text. Most authors have no problem with AI tools enriching the work, for example: language translation, extracting citations, repairing/rewriting hyperlinks, suggesting related works, suggesting/assigning keywords and ACM CCS values, and any number of other services and derived content. But generating a summary that rivals the abstract? Yuck. No thanks. An error in citation parsing or CCS assignment? Meh, who cares, either ignore it or fix it, but no one takes to social media to complain. A subtle but detectable (if only by the author) error in a summary? That's glaring and viscerally wrong. And even if we can find no substantive errors, knowing the text is AI-generated, we will find fault with phrasing, the structure, and various minutiae (cf. humans' negative attitudes to replicants in Blade Runner). Extracting keywords is what computers do. Writing abstracts is what we do. If LLMs can write abstracts, what's our job?
Those assessments inevitably derive from us reviewing AI-generated summaries of our own work. Presumably, no one knows the material better than us, so the best anyone / anything else can do is be "as good as", certainly not "better". We're writing for our peers, and we share a nuanced, high-bandwidth vocabulary that outsiders just can't appreciate. On the other hand, if we have to read articles outside of our area of expertise, we often wonder why are the authors so obtuse? Why can't "those people" just write plainly?
This is the essence of the Gell-Mann Amnesia Effect, a term coined by Michael Crichton to describe the phenomenon that the more you know about a topic, the more likely you are to see the flaws in a third-party analysis, while at the same time you are not as critical when that same third party summarizes a topic on which you are not an expert. Anyone who has been interviewed by the media has experienced this: the reporters inevitably butcher your hour-long exposition, provided in painstaking detail, covering all the nuances, edge cases, historical review, and possible future directions – all reduced to a minute or less of decontextualized soundbites. But that news outlet suddenly becomes a trusted and valuable source when they cover a topic outside of your expertise.
I suspect the Gell-Mann Amnesia Effect applies to AI-generated summaries as well: they are an abomination when applied to my work, but a useful de-jargoning tool for exploring unfamiliar or even adjacent sub-fields. This even presupposes that there should be multiple AI-generated summaries, aimed at different audiences (e.g., lay person, High School, undergraduate, researcher). In fact, the rival abstract in Dr. Fiesler's example might be the least useful summary, precisely because it does rival the author's abstract. But writing for audiences other than our own is a different skill set: writing for my fellow researchers at JCDL, Hypertext, Web Science, etc. is what I do, but writing for high schoolers is not what I do. Casting my work into something appropriate for high schoolers would be a good use of LLMs, and simplifications (if not outright errors) are to be expected.
In summary, I think it's natural to feel revulsion when the LLMs are used to rival our work: it falls into the textual uncanny valley, in a way that other generative works, such as translation, do not (at least not currently). But at the same time and based on the Gell-Mann Amnesia Effect, our harshest judgement of AI-generated summaries is reserved for areas in which we are an expert, and our assessment of AI-generated summaries improves as we apply them to areas further from our own.
With that in mind, it would make sense for the ACM DL to enable wiki-style editing on summaries, move away from the model of a single summary that rivals the author's abstract in length and complexity, and introduce multiple summaries, tailored to audience and intended purpose.
Are these good summaries? I guess so – although I'm not sure what else to evaluate them against. I don't know the first thing about proteomics, so the "General" summary is certainly the most accessible to me. The "Expert" summary is more detailed than the "General" summary, but still more accessible to me than the authors' abstract. That's not a surprise because 1) I haven't studied biology or chemistry since High School, some 40 (!) years ago, so Schär et al. aren't writing for me, and 2) the summaries are both about half the length of the authors' abstract. I saved all three into separate files:
% wc -w bio-*txt | grep -v total
219 bio-abs.txt
107 bio-expert.txt
88 bio-general.txt
Two hundred words is a good target for abstracts. I'm guessing the prompts for the AI-generated summaries had a target of about 100 words, so by design even the "Expert" summary will not rival the authors' abstract (though metadata and wiki-style editing would be nice). The "Automated Services" tab has at the bottom a link to "Explore Further on ScienceCast":
I don't have an account (yet) on ScienceCast, so that's the end of my exploration for now. But there's clearly a bigger AI↔paper ecosystem to explore, for both me personally and the ACM DL.
–Michael
2026-06-02 Update: In another chat with Martin Klein, we had just discovered the institutional repository at Niigata University. It does not have a native English interface, so all of the translations shown below are via Chrome and thus a little clunky. When you first visit the repository, it asks you to choose a persona or level from three choices: "adult", "junior and senior High School students", and "Elementary school student".
I did a search for "web archiving". The hits are not especially relevant (perhaps no one at Niigata is active in the field), but they are sufficient to demonstrate the personas.
Chrome's translation for Elementary School students is not smooth, but I'm guessing that's an issue with Chrome and not the LLM that Niigata is using – presumably there is less training data for translating "children's" Japanese?
The landing page of Niigata's institutional repository does have the regrettable "embedded PDF" interface, and it does list a truncated "AI Explanation" above the "Summary by the author" (to be fair, perhaps the fact that it's named "summary by the author" instead of "abstract" is a function of the translation).
It is a little hard to evaluate this three-level approach, since there's the added dimension of language translation. But it feels like an interesting application of LLMs, and aside from being listed at the top of the SERP, it does not seem to be in competition with the authors' abstract.
Note that the landing page displayed above is likely an experimental and/or local UI since it is hosted at niigata-u.ac.jp, and is very different from the more conventional looking landing page for the associated handle, which resolves to nii.ac.jp.
The college wage premium, that is, the increased earnings associated with having a college degree as opposed to only being a high school graduate, hasn’t changed at all in the past 25 years, because median real wages have been flat as a pancake for everybody, no matter what their formal education level, for the past quarter century.
I wonder what’s happened to capital over this time? Value of S & P 500, inflation-adjusted, 1/2000 to 9/2025 (same period as the wage data):
2000: $1,394
2025: $6,688
On average, for more than the students' entire lives, stock-owners like Schmidt and (to a much lesser extent) me have stolen every last drop of the productivity increase of US workers at every age and education level. (See the actual numbers in the appendix.)
Now, the perpetrators of this theft are telling their victims, the students and the public at large, that whether they like it or not they will be subjected to AI because that will make the perpetrators even richer. The victims have been informed that this new technology will:
Nothing better illustrates the contempt of the Epstein class for the proletariat than that these oligarchs would expect the graduating class to enthusiastically accept this prospect.
I was fooling around with FRED this morning, as one does, and here are some stats: (The FRED numbers are presented in nominal dollars; I’ve converted them to CPI-adjusted dollars).
Median usual weekly earnings of workers with a high school degree only:
2000: $968
2025: $980
Median usual weekly earnings of workers with a bachelor’s degree only:
2000: $1,587
2025: $1,580
...
Median usual weekly earnings of people with a bachelor’s degree or higher:
2000: $1,705
2025: $1,747
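For anyone who wants to check the arithmetic, here is a small sketch that turns the figures quoted above into percentage changes and premium ratios. The numbers are copied from this post; the function name and the way the premium is defined (bachelor's-only earnings divided by high-school-only earnings) are just one reasonable choice.

```go
package main

import "fmt"

// pctChange returns the percentage change from old to cur.
func pctChange(old, cur float64) float64 {
	return (cur - old) / old * 100
}

// premium returns how many times larger the college figure is than the
// high school figure.
func premium(college, highSchool float64) float64 {
	return college / highSchool
}

func main() {
	// CPI-adjusted median usual weekly earnings, 2000 vs 2025, from above.
	fmt.Printf("High school only:        %+.1f%%\n", pctChange(968, 980))
	fmt.Printf("Bachelor's only:         %+.1f%%\n", pctChange(1587, 1580))
	fmt.Printf("Bachelor's or higher:    %+.1f%%\n", pctChange(1705, 1747))

	// Inflation-adjusted S&P 500 value over the same period.
	fmt.Printf("S&P 500:                 %+.1f%%\n", pctChange(1394, 6688))

	// Bachelor's-only premium over high-school-only, in each year.
	fmt.Printf("College premium in 2000: %.2fx\n", premium(1587, 968))
	fmt.Printf("College premium in 2025: %.2fx\n", premium(1580, 980))
}
```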
Here is a short list of YouTube videos on this topic:
As a boomer, I think this post might be the exception that proves Ms. Baba's rule.
Note that every single one of the ads that I saw watching these videos in an incognito window was advertising an AI company! As are 49% of all the billboards in the Bay Area. Read the room, guys!
This post is being shared on both the dataindex.us newsletter and the Library Innovation Lab Blog.
“Is data changing? Is it being disappeared? How do we know? How can we know?” This interrogative refrain rang through just about every conversation I had when, almost a year ago, I came to Harvard Law School Library to lead the Public Data Project. Thanks to the dataindex.us Data Checkup, a plan is in place to do this complicated but essential work. Through the careful scaffolding dataindex.us has constructed and the assiduous research of its staff, more than a dozen federal datasets have “health assessments,” and the team continues to add to this list.
In October 2025, the Public Data Project partnered with dataindex.us to develop a data monitoring toolkit that could both work at scale and be user-driven. In addition to creating an automated tool that can process large numbers of datasets, we also want the user to determine which datasets they want to monitor. Let’s face it, when it comes to federal data, one person’s byzantine, inscrutable dataset is another person’s trove of invaluable ground truth. The anecdotes of data use collected by essentialdata.us offer varied examples of the ways people benefit from federal datasets. The range of uses is a clear indication that people need to be able to monitor the data that matters to them.
At the Public Data Project, we are creating a toolkit that will enable users to detect and monitor changes to federal datasets over time. It will enable users to select a dataset and track changes within the data itself, as well as to automate the monitoring of external sources that indicate whether the data might be changing. Indicators of change to a given dataset range from somewhat obvious sources, like major news sites, to more obscure sources, like the U.S. Code. At present, our tool development has produced two components.
First, Binoc is a command-line tool and library to generate changelogs for datasets that don’t have them.
Unlike generic diffing utilities intended to describe line-level differences in plain-text content such as source code or Markdown, Binoc aims to efficiently summarize changes in real-world datasets, including file additions and deletions, row-level updates, and schema alterations. Given a series of dataset snapshots captured at different points in time, Binoc detects what changed, expresses any changes as a minimal structured diff, and produces a human-readable summary. Binoc is currently in a collaborative design phase of development, with new features being added regularly. We welcome feedback from early adopters.
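To give a feel for what a row-level dataset diff involves, here is a minimal sketch. It is not Binoc's implementation or API: the Snapshot and Change types, the diff function, and the assumption that every row has a stable key are all hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical in-memory form of one dataset capture:
// row key -> column name -> value. Binoc's real data model may differ.
type Snapshot map[string]map[string]string

// Change describes one row-level difference between two snapshots.
type Change struct {
	Kind string // "added", "removed", or "modified"
	Key  string
}

// rowsEqual reports whether two rows have the same columns and values.
func rowsEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// diff lists rows that were removed, modified, or added between old and
// cur, sorted by row key so the output is stable.
func diff(old, cur Snapshot) []Change {
	var changes []Change
	for key, oldRow := range old {
		newRow, ok := cur[key]
		switch {
		case !ok:
			changes = append(changes, Change{"removed", key})
		case !rowsEqual(oldRow, newRow):
			changes = append(changes, Change{"modified", key})
		}
	}
	for key := range cur {
		if _, ok := old[key]; !ok {
			changes = append(changes, Change{"added", key})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Key < changes[j].Key })
	return changes
}

func main() {
	old := Snapshot{"a": {"value": "1"}, "b": {"value": "2"}}
	cur := Snapshot{"a": {"value": "1"}, "b": {"value": "3"}, "c": {"value": "4"}}
	for _, c := range diff(old, cur) {
		fmt.Printf("%s row %s\n", c.Kind, c.Key)
	}
}
```

A real tool also has to handle file additions and deletions, schema changes, and datasets without a natural key, which is where most of the difficulty lies.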
We have also begun the research for a second component of the data monitoring toolkit development.
We have created an AI benchmarking exercise to compare and to evaluate how well AI can monitor data and assess its risk when considered next to the processes and conclusions of a careful researcher. The goals of the exercise are to:
Test how well AI can assess various types of risk to federal datasets;
Evaluate what baseline a popular search model would use to answer those without a custom search harness;
Surface and reflect on the tacit knowledge necessary to perform risk assessment, including the sources needed, the steps involved, and the difficulty of defining criteria;
Create awareness and community through an intellectually engaging activity that includes both individual research and group reflection.
We have conducted an initial test run of this exercise with a group of 10 information professionals. After introducing the participants to the dataindex.us rubric to assess the risk level of a given dataset, each participant was assigned a dataset and asked to evaluate it across three of the six risk dimensions outlined in the rubric. Each participant was assigned either the first three dimensions (Historical Data Availability, Future Data Availability, and Data Quality) or the latter three (Statutory Context, Staffing and Funding, and Policy). For the first hour, participants more or less worked alone, diligently researching a subject that they lacked expertise in, but for which they had clear guidelines for the kind of information they sought. Participants then opened ChatGPT, and fed it prompts that we had scripted and tailored for each dataset. Participants then reflected on their findings, first in a form that asked them specific questions and then as a group that compared their results with ChatGPT’s. Going through their three assessment dimensions, participants compared their conclusions to those of AI, reflecting on what AI missed, what they missed, and on what parts of the rubric may have led to confusion.
This exercise gave us an early insight into the potentials and pitfalls of AI’s ability to assess data risk, as well as ways in which we might tweak both the exercise and the assessment rubric. This group of participants were information professionals, not policy wonks, and we are eager to see how area specialists’ experience might lead to different outcomes in this exercise. In addition, we want to experiment with prompt engineering and give participants more leeway in their interaction with AI. In the next iteration of the exercise, we will rely on the transcription of each participant’s interactions with AI for analysis, rather than asking individuals to respond in a form.
What we liked most about this exercise, however, were the collective reflections not just on AI, but on public data more generally. One participant described it as an “excellent empathy-building exercise” because, through the work, both alone and as a group, participants become aware of the importance of and perils to public data. They reflected on whether and how to translate their own empathetic experience to AI.
Win free books from the June 2026 batch of Early Reviewer titles! We’ve got 251 books this month, and a grand total of 3,098 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Thursday, June 25th at 6PM EDT.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the UK, the US, Canada, Australia, Germany, New Zealand, Ireland, Malta, Italy, Latvia and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
Happy June, DLF community! Thanks to everyone who participated in Community Voting for the 2026 Virtual DLF Forum. We appreciate your input as we work with the Forum Planning Committee to build this year’s program.
Look out for updates this month: the program release, registration opening, and Digital Storytelling Fellows applications. We’re excited to share what’s next!
Early Bird Registration Open: Early bird registration for iPRES 2026 is open until July 13. The conference will be held in Copenhagen, Denmark, from September 21-25. A call for Ad Hoc sessions is
Office closure: CLIR and DLF are closed on Thursday, June 18 and Friday, June 19, in observance of Juneteenth.
This month’s open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus conferences and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 6/2, 2pm ET / 11am PT.
DLF Digital Accessibility Working Group (DAWG): Tuesday, 6/2, 2pm ET / 11am PT.
DLF AIG Cultural Assessment Working Group: Monday, 6/8, 1pm ET / 10am PT.
AIG Metadata Assessment Group: Friday, 6/12, 2pm ET / 11am PT.
AIG User Experience Working Group: Friday, 6/19, 11am ET / 8am PT.
Digitization Interest Group: Monday, 6/22, 2pm ET / 11am PT.
Committee for Equity & Inclusion: Monday, 6/22, 3pm ET / 12pm PT.
DLF Open Source Capacity Resources Group: Wednesday, 6/24, 1pm ET / 10am PT.
DAWG Policy & Workflows: Friday, 6/26, 1pm ET / 10am PT.
DAWG IT & Development: Monday, 6/29, 1pm ET / 10am PT.
DLF Climate Justice Working Group: Tuesday, 6/30, 3pm ET / 12pm PT.
In the hours following the news that Redhat Insights' JavaScript packages fell
victim to a supply chain attack via NPM, developers and systems administrators
scrambled to ensure all of their projects were unaffected by the attack, which steals credentials for AWS, GCP, Azure, Kubernetes, HashiCorp Vault, npm, and CircleCI before self-propagating via the stolen npm credentials and the bypass_2fa setting. It establishes persistence via Claude Code hooks and VS Code task injection. If you have installed the affected package, reprovision your development hardware.
This is due to the affected dependencies being distributed via
NPM, the only package manager where these supply-chain
attacks regularly happen. "This was a terrible tragedy, but sometimes these
things just happen and there's nothing anyone can do to stop them," said
programmer Lady Eulah Howell, echoing statements expressed by hundreds of thousands of
programmers who use the only package manager where 90% of the world's
supply-chain attacks have occurred in the last decade, and whose projects are
20 times more likely to fall victim to supply chain attacks. "It's a shame, but
what can we do? There really isn't anything we can do to prevent supply-chain
attacks from happening if the maintainers don't want to secure access to their
accounts in a robust manner". At press time, users of the only package manager
in the world where these vulnerabilities regularly happen once or twice per
week for the last year were referring to themselves and their situation as
"helpless".
For more information, please see upstream documentation published by
Redhat Insights' JavaScript packages at the following link: redhat-javascript-clients-06-2026.
This post is the first in a series in which I write about experiences or specific challenges from my day-to-day work. Planned posts include descriptions of a bug and how this impacted the coworkers, how I wrote a script to parse log data… I’m hoping that these will be interesting for other librarians that work in entirely different areas, for my colleagues who are solving different problems on different systems (or maybe eventually the same one after we migrate), and for those who are thinking about doing this kind of work in the future.
When we talk about the ILS or LSP, it can sound like we’re talking about a single system. And we are, some of the time. But just like our permissions shape what we can see and do, the ways we access the system and its data may lead to entirely different experiences. More importantly, if you don’t know how different tools and even databases work, you may end up with inaccurate results or not knowing that something is possible.
For example, our Sirsi ILS and reporting system(s) consist of two separate databases. These databases can be accessed in: one way for most folks (two for people using a BLUEcloud module), two-to-three ways for some, four ways if you’re special, and five ways if you’re one of two people.
Diffusion of Databases
The Sirsi Symphony Database, fka Unicorn1, underlies the whole thing. This Oracle database is the ultimate database of record. If we load MARC, it ends up in the Symphony database. If we place orders, they become entries across Symphony tables. If we loan materials, it triggers a series of updates in the Symphony database.
BLUEcloud Analytics runs off a separate database, also Oracle.2 This separation is common and appropriate. Alma also uses a separate Oracle database and FOLIO has the option of Metadb built with PostgreSQL. The analytics databases don’t contain live data. Instead, they’re updated regularly overnight, based on things that have occurred in the primary database. Change a title? It’ll show up in analytics tomorrow.3 Check out a book? That transaction will show up in circ stats tomorrow.
This is an appropriate choice for three reasons:
It’s a bad idea to run large analytical queries on production. Plus, static indexes are much more efficient to search.
The analytics system has no real demand overnight, so its server can do a full reindex before running any scheduled jobs.
The analytics database can be designed differently.
Following that last point, the analytics database isn’t just a snapshot of production. It has a fundamentally different design. It anonymizes circulation transactions, but it also builds completely different indexes from the ones we need for daily work. For example, it indexes circulation data by hour, day, month, and year as well as by circulation desk. Sometimes we want big numbers. Sometimes we want to see which desks get the most traffic. Those aren’t the kind of searches we need to do in day-to-day work. It indexes MARC as fields and subfields, including invalid ones like λ.
Accessing the Databases
Most of my coworkers only access Symphony using one tool: Workflows. A few also use BLUEcloud Circ.4 Using the client, they look for records, update them, perform transactions, etc. We import single MARC records using Workflows wizards. We import batches of MARC records using Workflows reporting (and FTP). Global item updates are done in Workflows. The Workflows reporting module can be used to load, transform, or extract data, history, or (some) statistics.
Next, we have BLUEcloud Analytics. A much smaller set of people (but still plenty) have rights here. As described above, Analytics is a completely separate database. It’s also designed in a way that’s more oriented toward statistical work. Folks use it to extract shelf lists, acquisitions data, spreadsheets of MARC subfields, etc. The indexes are enormous and joined queries can take some time to run (and you can only run joined queries which are supported by the system), but you can get a lot of data and can’t accidentally bring down production.
About four years ago, we got access to Data Control. This is probably my favorite Sirsi product5. Unlike Analytics, Data Control gives you the power to query or even update the Symphony database itself. That means it doesn’t have some things that are in Analytics. You can’t see an item’s transaction history, for example, just its current data.6 Even fewer people have access to this, most use it on our Stage server, and just a couple of us are allowed to run batch updates to production.7
seltools is like Data Control for the command line. More properly, Data Control is an interface that lets ordinary humans use seltools with enough scaffolding not to mess quite as many things up. seltools can do even more and can do it very quickly. It is a sysadmin tool and only two people here have rights to use it. It can do extraordinary work in seconds and could cause irreparable damage (or at least, damage that requires restoring from backup). AFAIK it dates back to the launch of Unicorn.
How I Access the Data
I have rights in Workflows, BLUEcloud, Analytics, and Data Control. I tend to use them as a kind of grab bag and often chain Analytics and Data Control in my work, sometimes performing interim steps with Python or OpenRefine.
Because Analytics isn’t querying live data, it’s a much better place to do initial MARC searches. If I want to find every record with a 699, for example, Analytics is the place to do that fast. Or I could look for every 100 or 700 with a subfield “e” or search for a particular piece of text in one or more fields.
But in terms of output, Analytics leaves a lot to be desired for MARC work. It’ll show a field’s subfields like a table. For example:
| Field | Subfield | Data             |
|-------|----------|------------------|
| 264   | a        | New York         |
|       | b        | Grosset & Dunlap |
|       | c        | [1972]           |
That’s fine if I only want to facet down to the subfield b in each row, but if I want to deal with the MARC data as a field it becomes a problem.
In the Analytics reports I use, it’s easy to add the bib key to a report if it wasn’t already in there. Before we got Data Control, my next step would be to actually switch to something like Z39.50 and download all the bibs manually, hoping I got everything (because our keys are not always in the 001, it’s a long story). I then had to do a delimited export in MarcEdit or write a pymarc script to get the fields I wanted.
Now, if I want to see a set of fields from the record, I simply upload that same set of bibkeys in Data Control8. I structure my query to include the tables I want and output the fields I need from each table. I can then export them into a much nicer spreadsheet with the MARC field (and indicators, if desired) printed the way it appears in the original MARC. I can also export the entire set of records as MARC.
264    |aNew York :|bGrosset & Dunlap,|c[1972]
An Example Update
But, even better, I have the rights to update the data. In most cases, I can even use regular expressions. For example, when we added a new ILLiad request placement module to our MyAccount app, we grabbed the 020 (ISBN field) straight from the Symphony API.9 Unfortunately, about 600,000 of our 020 fields followed the pre-2013 structure, when qualifying information was still included in the subfield a. In 2013, subfield q was introduced to handle things like “(paperback)”. This unexpected data was messing with ILLiad’s automated processes. We could’ve changed the script, but it made more sense to fix the actual data, since we now had the tools.
First, I ran an Analytics query to find all records where the 020a contained “(”, “)”, or any letter except x. I exported the data, extracted the bibkey column, and then broke it into batches of 25,000 bibkeys.
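As a rough illustration of that selection and batching step (outside of Analytics itself), here is a minimal Go sketch. The regular expression, the function names, and the sample bib keys are all hypothetical; the real search was run as an Analytics query, not in Go.

```go
package main

import (
	"fmt"
	"regexp"
)

// suspect matches a parenthesis or any ASCII letter other than X/x,
// mirroring the search described above. X is allowed because it is a
// valid ISBN-10 check digit.
var suspect = regexp.MustCompile(`[()A-WYZa-wyz]`)

// NeedsCleanup reports whether an 020 $a value still appears to carry
// qualifying text alongside the ISBN.
func NeedsCleanup(subfieldA string) bool {
	return suspect.MatchString(subfieldA)
}

// Batches splits keys into consecutive slices of at most size elements.
func Batches(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

func main() {
	// Made-up bib keys and 020 $a values, for illustration only.
	rows := []struct{ key, isbn string }{
		{"b1001", "0448021285 (paperback)"},
		{"b1002", "9780448021287"},
		{"b1003", "044802128X"},
		{"b1004", "0448021285 pbk."},
		{"b1005", "0448021285 (pbk."},
	}
	var keys []string
	for _, r := range rows {
		if NeedsCleanup(r.isbn) {
			keys = append(keys, r.key)
		}
	}
	// The post used batches of 25,000; 2 keeps the demo readable.
	for i, b := range Batches(keys, 2) {
		fmt.Printf("batch %d: %v\n", i+1, b)
	}
}
```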
I spent a few weeks working on our stage server to develop the appropriate regex-based find and replace patterns to move qualifying data into a subfield q. I had to handle various edge cases: no parentheticals, only one half of the parenthetical, etc. Once I felt confident, I ran a batch of about 5000 on stage and QAd my results thoroughly. I then spent the next month running batches in production. I limited batch sizes and chose days when we didn’t have other jobs which would trigger big reindexes (you can only do so many jobs in a night or the reindex will take forever and throw off all the other cron jobs).
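To give a feel for the kind of transformation involved, here is a hedged Go sketch that splits an 020 $a value into an ISBN and one or more qualifiers. It is not the Data Control find-and-replace pattern actually used. The function names are hypothetical, the `|a`/`|q` notation simply follows the delimited output shown earlier, and dropping the parentheses from the qualifier is an assumption that local practice may not share.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// leadingISBN captures a run of digits and hyphens that may end in an X
// check digit (group 1), followed by whatever trails it (group 2).
var leadingISBN = regexp.MustCompile(`^\s*([0-9][0-9-]*[0-9Xx])\s*(.*)$`)

// SplitQualifiers separates an 020 $a value into the ISBN and any
// qualifiers. It copes with full parentheticals, half-open ones, several
// parentheticals in a row, and bare trailing text.
func SplitQualifiers(subfieldA string) (isbn string, qualifiers []string) {
	m := leadingISBN.FindStringSubmatch(subfieldA)
	if m == nil {
		// No recognizable ISBN at the start: leave the value untouched.
		return strings.TrimSpace(subfieldA), nil
	}
	parts := strings.FieldsFunc(m[2], func(r rune) bool {
		return r == '(' || r == ')'
	})
	for _, p := range parts {
		if p = strings.TrimSpace(p); p != "" {
			qualifiers = append(qualifiers, p)
		}
	}
	return m[1], qualifiers
}

// Rewrite020 renders the result with "|" as the subfield delimiter.
// Assumption: parentheses are dropped from each $q.
func Rewrite020(subfieldA string) string {
	isbn, quals := SplitQualifiers(subfieldA)
	var b strings.Builder
	b.WriteString("|a" + isbn)
	for _, q := range quals {
		b.WriteString("|q" + q)
	}
	return b.String()
}

func main() {
	for _, in := range []string{
		"0448021285 (paperback)",
		"044802128X (pbk.",
		"0448021285 (v. 1) (pbk.)",
		"0448021285",
	} {
		fmt.Printf("%-28q -> %s\n", in, Rewrite020(in))
	}
}
```

Whatever the tool, running a pattern like this over a small batch on a staging copy first is what surfaces the edge cases.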
Once the project was done, I was able to re-run queries in Analytics to ensure there weren’t any issues remaining.
I can also click into and update single records from the Data Control results page or set it to let me modify a particular field and paste repeating data into that field. The former is useful when there might be other related fields which need to be updated or I need more context. The latter is useful when only some of the results need to be updated or the person hasn’t yet got regex privileges on production.
Clashing Designs
So that’s what it looks like when things go well. Tech librarianship so often involves what Marshall Breeding called “Knitting Systems Together” that I almost don’t think about the ways I hop across tools. At most I feel a minor irritation. Recently, I ran across a case where the difference between system designs and who had permissions to access what was making a huge difference in my coworkers’ abilities to get their work done.
In theory, the data in Analytics should mirror what’s in Symphony, at most with a different structure. However, when a barcode is updated in Symphony (generally via Workflows), Analytics completely drops entries related to that barcode. The entries are not transferred to the new barcode. Data that’s still in the item record is retained, so we have the item last activity date, the circulation count (an incremented field), etc. But we can’t see the item transaction history.
Now, there were a couple things we could do about this… I’ll describe how system logs come into play in my next post!
Specifically, it’s MicroStrategy whose Wikipedia page starts off like any other data analytics software and then …pivots to Bitcoin. It’s Michael Saylor’s company, if that name means anything to you. ↩︎
Timing could be more frequent, but I believe most have daily updates. ↩︎
BLUEcloud is Sirsi’s next-gen browser client. To my knowledge, we still only use the circulation module and many people still use Workflows for circulation. ↩︎
It’s extremely powerful, though extremely fragile – but that could also describe me, so I can only be so annoyed by it. ↩︎
Transactions here meaning every time the item was scanned, some of which is available via Analytics. There is also transaction history in Symphony but it’s in logs. ↩︎
It also supports two kinds of batch updates – a batch modify which lets you edit fields individually in a browser interface and a batch substitute which lets you run updates on fields using regular expressions. If you wanted to update a MARC 500 field on a set of items, for example, someone with batch modify permissions could display all 500 fields on the records, click Modify, and then paste a new text into any field they wanted to replace (while skipping 500 fields which didn’t match). Someone with regex permissions could find all notes matching the old note and sub it with the new note. ↩︎
Why not do the whole search in Data Control? It is painfully slow compared to Analytics, especially for MARC searches. For the cases when Data Control is designed better for searching, I’ll export a set of keys for the overall records I want to search within and then perform it as a scoped search, which is much faster. ↩︎
We only use the APIs for integrations not for reporting/updates/etc., so I didn’t list it above. Seltools are much faster and more powerful. ↩︎
Just over two months ago, I was at the Information Stewardship Forum 2026 at the Internet Archive, where I was fortunate enough to present a lightning talk about making copies of copies, entitled "The Disintegration Loops: Generational Loss in Web Archives". During one of the breaks, Mark Graham asked Sawood Alam to take a look at a problem that had stumped the Wayback Machine support team. I was sitting next to Sawood, and knowing my love for web archiving investigations, Mark invited me to take a look too. The original inquiry:
Hi, everyone! Got a concerning report from a patron alleging that WBM "URLs were intermittently displaying the current version of the website instead of the archived version." The URLs in question are:
A quick check shows that when replaying these URLs, the content does resemble what is on the live web. For example, the text shown on the page references 2025 and 2026 updates, even though the captures are from 2024 - 2025. I've attached a screenshot of the 2025 capture appearing to show live web content as well as a printout/capture the patron provided of the same URL appearing to show the "actual" archive.
Sawood and I discovered that the problem is not that these URLs are sometimes displaying the live web (or at least not directly). The problem is that this seemingly simple "Terms of Use" page is unnecessarily complex, with the boilerplate legal text included via an API call. The JavaScript that makes the call includes a number of superfluous URL arguments, including "screenWidth" and "screenHeight", which are probably appended to all API calls "just in case they are needed" (presumably the "Terms of Use" do not actually vary based on the size of the browser). Thus, depending on the size of your browser, the legal text included in the page is potentially archived at different times, sometimes resulting in a temporal violation: a replay of an archived web page with subresources in a combination that did not exist at the time the top level page was archived.
Although there are potentially a countably infinite number of archived "Terms of Use" pages, for the examples above there are two semantically interesting versions: one is marked (near the top, left-hand side) "Last Updated: January 18, 2024" and the other is marked "Last Updated: September 22, 2025". Taking these "Last Updated" strings at face value, we would not expect the three URLs above (archived at "20240222221058" (February 22, 2024), "20241228224626" (December 28, 2024), and "20250531013827" (May 31, 2025)) to display "Last Updated: September 22, 2025". But sometimes they do – and sometimes they don't – and which archived version you get depends on the size of your browser.
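The sanity check in that paragraph can be written down as a small Go sketch: parse the 14-digit Wayback Machine timestamp, parse the "Last Updated" string, and flag any case where the page claims an update later than its own capture. The function name is hypothetical, and the comparison treats the "Last Updated" date as midnight UTC, which is an assumption since the page gives no time zone.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Wayback Machine timestamps have the form YYYYMMDDhhmmss.
	waybackLayout = "20060102150405"
	// Layout of the "Last Updated: ..." string quoted above.
	lastUpdatedLayout = "January 2, 2006"
)

// UpdatedAfterCapture reports whether the "Last Updated" date falls after
// the capture time, i.e. the replay shows content from the capture's future.
func UpdatedAfterCapture(captureTS, lastUpdated string) (bool, error) {
	captured, err := time.Parse(waybackLayout, captureTS)
	if err != nil {
		return false, fmt.Errorf("capture timestamp: %w", err)
	}
	updated, err := time.Parse(lastUpdatedLayout, lastUpdated)
	if err != nil {
		return false, fmt.Errorf("last-updated date: %w", err)
	}
	return updated.After(captured), nil
}

func main() {
	captures := []string{"20240222221058", "20241228224626", "20250531013827"}
	for _, ts := range captures {
		for _, lu := range []string{"January 18, 2024", "September 22, 2025"} {
			future, err := UpdatedAfterCapture(ts, lu)
			if err != nil {
				fmt.Println("error:", err)
				continue
			}
			fmt.Printf("capture %s, Last Updated %q, from the future: %v\n", ts, lu, future)
		}
	}
}
```

This only catches violations "from the future". An older-than-expected subresource would need the known update history of the page to detect.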
First, as of the time of this writing, the live web still has the "Last Updated: September 22, 2025" version:
What appears to be a relatively simple HTML page is unnecessarily complex, with nearly 200 subresources. The figure below shows the relevant portion of the call stack: the HTML page calls the cheekily named JavaScript "brastrap.js", which in turn calls the API at "api.victoriassecret.com".
And since the live web still has "Last Updated: September 22, 2025", this is what caused people to think they were getting a live web version (more on that in a bit). First of all, the Wayback Machine's "About this capture" link does not help; it shows only some of the subresources (improving its function is a task for another time):
"About this capture" lists only some of the subresources, and not the problematic api.victoriassecret.com page.
Sawood discovered the API URL first. It's well-obfuscated, so it's not a surprise that tech support staff did not find it immediately. We were sitting side by side, each using our own laptops, and he's much smarter than me and he's always going to win that race. But I noticed that for me, the page seemed to be saved right then, just a minute or two before, whereas he saw that it was archived a few days before (it was then March 19, 2026). That was odd, but the next session started and I had to stop.
The 2024 archived version of the page uses a "/v12/" version of the API endpoint (note: this is a common but wrong way to version an API), but it's similar to the 2026 live web example above:
In particular, the "/v12/" endpoint remains functional, even though the live web HTML & brastrap.js access the "/v15/" version. Checking the Wayback Machine directly confirmed that this was indeed the first time that URL had been archived:
Although Sawood found the problem URL, and we confirmed it was archived in March, 2026 (and thus displayed the "Last Updated: September 22, 2025" string), it bothered me that he had an earlier archival time than I did (March 14, 2026 vs. March 19, 2026). After the next session ended, I returned to this problem. I changed the size of my browser, and was able to force another new archived version (reproduced on March 22, 2026 below):
Although it's beyond the scope of this post, the Wayback Machine's Save Page Now has a "/save/_embed/" API that allows the Wayback Machine to "patch" the archive with missing URLs from the live web. In this case, the version of the API response ending with "&screenWidth=565&screenHeight=605" was "missing" from the Wayback Machine, so it patched the archive from the live web, which still displays the "Last Updated: September 22, 2025" string, despite the main HTML page being archived in February, 2024. So in essence, the Wayback Machine was displaying the live web version, after it was immediately saved to Wayback Machine. Presumably the "Terms of Use" page changes slowly, but this behavior would be more noticeable if the "Last Updated" string was updated, say, every minute.
A call to the CDX API confirmed that there were a variety of screenWidth and screenHeight combinations archived (horizontally scroll to the right in the gist below to see the combinations):
In fact, by inspection, there are at least two chances to get the wrong version. If your screen size is "screenWidth=1600&screenHeight=1000", you will get a version of the page that has the string "Last Updated: February 7, 2023", a temporal violation reaching into the past instead of the previously described version that is a temporal violation from the future. A screen size of "screenWidth=1400&screenHeight=900" will produce the right result ("Last Updated: January 18, 2024"), and a screen size of "screenWidth=1440&screenHeight=900" will produce a different wrong result ("Last Updated: September 22, 2025"). And as shown above, a screenWidth and screenHeight combination not already archived will cause the Wayback Machine to be patched from the live web. Furthermore, if/when the "/v12/" live web API endpoint is deprecated, then unarchived size combinations will just cause the page replay to silently fail, and most people won't understand why.
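To make the failure mode concrete, here is a minimal Go sketch showing how two requests that differ only in "screenWidth" and "screenHeight" are distinct URLs, and how they would collapse to a single key if those arguments were ignored. This is not how the Wayback Machine indexes captures. The host, path, and "lang" argument are placeholders, and treating the two screen arguments as irrelevant to the response is an assumption (the post only presumes it).

```go
package main

import (
	"fmt"
	"net/url"
)

// ignoredParams lists query arguments assumed not to affect the response.
var ignoredParams = []string{"screenWidth", "screenHeight"}

// CanonicalKey drops the ignored arguments and re-encodes the rest.
// url.Values.Encode sorts by key, so argument order does not matter either.
func CanonicalKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, p := range ignoredParams {
		q.Del(p)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder URLs: only the screen-size arguments come from the post.
	urls := []string{
		"https://api.example.com/v12/terms?lang=en&screenWidth=565&screenHeight=605",
		"https://api.example.com/v12/terms?lang=en&screenWidth=1440&screenHeight=900",
	}
	for _, raw := range urls {
		key, err := CanonicalKey(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s\n  -> %s\n", raw, key)
	}
}
```

Without some normalization of this kind, every new window size is a URL the archive has never seen.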
In summary, this seemingly simple "Terms of Use" page is really quite challenging in practice:
The API call is not easily discovered, and the "About this capture" service does not show the API URL (nor many of the other nearly 200 URLs of subresources in this page).
The API has a raft of (arguably) unnecessary URL arguments that do not change the response and cause the Wayback Machine to patch the archive from the live web.
Because the temporally violative subresource is JSON and not, say, a JPEG, one can't simply right-click on the subresource and inspect when it was archived.
We've encountered synchronization problems with HTML and JSON before (e.g., "Right HTML, Wrong JSON" (JCDL 2023), "Challenges in replaying archived Twitter pages" (IJDL 2024)), but the implementation complexity found in news outlets and social media was to be expected: the advanced UI features that make these sites engaging (e.g., auto-updating, infinite scroll, embedded media, personalized content) are the same features that make archival replay difficult. Without the "Last Updated: …" string, the problem would have been much harder to notice and diagnose. The seemingly intermittent nature, where you'd get a temporally coherent replay only if your browser was the same size as the previously archived responses, made the investigation especially challenging.
Who pays attention to their browser's exact width and height? In this case, they were the keys to solving this puzzle.
I mean, isn't it obvious? It's something like FreeBSD or Fedora that has a
kernel, userspace, graphics stack, core set of programs, and everything else
you need to be able to use a computer. Is this a trick question?
Well it depends, is the Nintendo Switch OS an operating system? It doesn't
have a shell in the same way FreeBSD does. Is seL4 an OS? It doesn't ship
with core utilities. Is Linux an OS? Is Windows an OS?
The definition of an operating system gets really fuzzy when you start looking at the edges of it, but let's say that an operating system is any part of a computer system that doesn't involve pure math. When you print to the screen, render 3D graphics, connect to the internet, and write to files, your code calls into the underlying system to do that work. These system calls are defined by your operating system and are exposed as functions*.
Okay they're not actually functions, but they quack enough like functions that
you can treat them like functions and not have to worry about the details too
much.
System calls are injected into each operating system process via a process kinda like how you inject dependencies into your applications for database sessions or object storage operations.
Bashing your head into the wall
A while ago a new JavaScript package got into the meme sphere at work: just-bash. It's a sandboxed environment with a shell interpreter that was originally intended for use with AI agents after its author observed that AI agents know how to use a tool called bash a lot better than a tool called search_documentation. This is backed by a "fake" shell with "fake" core utilities (cat, ls, etc, hereinafter coreutils) so that when an agent decides to rm -rf /, nothing important actually leaves the room. One of my coworkers made @tigrisdata/agent-shell on top of this that uses Tigris as its storage layer.
This is great for people in the JavaScript ecosystem, but I am not mainly a JavaScript developer. I really wanted to play with it so I started thinking what it would take to have something like this in Go. mvdan's shell package makes this a heck of a lot easier, meaning that this "fake" shell would be powered by a real shell instead of either porting half of bash to JavaScript or making up hopefully-compatible behaviour.
After a bunch of thought, hacking, and a spot of vibe coding while I did some Dawntrail extreme mount farms, I ended up with Kefka, a "fake" shell with coreutils implementations that lets you put your programs in clown jail. This package lets you add a sandboxed-in-userspace shell to your existing projects without shelling out to the actual implementations of coreutils on your machine.
The name is inspired by Kefka Palazzo, the final boss of Final Fantasy VI.
Need to chain uncontrollable demons? Use the power of a mad god driven to the
brink of insanity with raw access to magic! What could possibly go wrong!
So I did that
So after some thought, I came up with this interface for the "commands" to use: Execer. This takes process context and passes it as an argument to a function named Exec. Exec then does whatever the process needs it to (list files, write to stdout, etc.) and returns an error if things went wrong and no error if things didn't.
type ExecContext struct {
	Stdin          io.Reader
	Stdout, Stderr io.Writer
	Dir            string
	Environ        expand.Environ
	FS             billy.Filesystem

	// Runner is the active shell runner. Commands that need to dispatch a
	// child command (for example, `time CMD`) should call Runner.Subshell()
	// and re-enter the shell so the call goes through the same exec handler
	// chain instead of poking at the registry directly. May be nil in
	// embedders or tests that have not wired up a runner.
	Runner *interp.Runner
}

type Execer interface {
	Exec(ctx context.Context, ec *ExecContext, args []string) error
}
This is where I started vibe coding things, mostly via a skill that ports a just-bash command to the Execer interface and filesystem in Go. just-bash itself looks vibe coded from help output and manpages; I tried to go further and stay POSIX compatible, down to matching flag syntax (and in some cases output formats). If your muscle memory fails you, it's a bug in my book.
This is a fully POSIX compliant implementation of true! Here's the relevant part of the spec if you don't believe me:
true - return true value
SYNOPSIS
true
OPTIONS
None.
OPERANDS
None.
Really, check out the POSIX spec for true. It's trivial to implement, here's a oneliner to implement it in Linux:
touch ./true && chmod +x ./true
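For readers who want to see the shape of a command against the Execer interface, here is a minimal sketch rather than Kefka's actual source. ExecContext is cut down to standard-library fields so the example compiles on its own (no Environ, FS, or Runner). The True and Echo types and the Run helper are hypothetical, and it assumes args excludes the command name.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"strings"
)

// ExecContext is a trimmed, stdlib-only stand-in for the struct above.
type ExecContext struct {
	Stdin          io.Reader
	Stdout, Stderr io.Writer
	Dir            string
}

// Execer matches the interface described in the post.
type Execer interface {
	Exec(ctx context.Context, ec *ExecContext, args []string) error
}

// True does nothing and reports success, which is all POSIX asks of it.
type True struct{}

func (True) Exec(context.Context, *ExecContext, []string) error { return nil }

// Echo writes its arguments to the injected stdout, separated by spaces.
type Echo struct{}

func (Echo) Exec(_ context.Context, ec *ExecContext, args []string) error {
	_, err := fmt.Fprintln(ec.Stdout, strings.Join(args, " "))
	return err
}

// Run executes cmd with in-memory stdio and returns what it wrote to stdout.
func Run(cmd Execer, args ...string) (string, error) {
	var out bytes.Buffer
	ec := &ExecContext{
		Stdin:  strings.NewReader(""),
		Stdout: &out,
		Stderr: io.Discard,
		Dir:    "/",
	}
	err := cmd.Exec(context.Background(), ec, args)
	return out.String(), err
}

func main() {
	out, err := Run(Echo{}, "hello", "from", "clown", "jail")
	fmt.Printf("echo wrote %q (err: %v)\n", out, err)

	_, err = Run(True{})
	fmt.Printf("true returned err: %v\n", err)
}
```

Because the command only ever touches what is on ExecContext, swapping a bytes.Buffer for a real terminal changes nothing in the command itself.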
I made an operating system*
This is basically an operating system: it provides interfaces for programs (well, in this case functions) to get input from a user, send output to a user, interact with a filesystem, and more. Eventually I want to add networking via a network stack on ExecContext, probably with tsnet or wireguard-go's netstack package for the user-level side. Maybe there's room for adding CEL based network filters there too.
Porting applications with WebAssembly
Once I got basic coreutils working, I thought it would be fun to get Python, jq, and ripgrep working. From previous experimentation back in the strawberry era of AI, I had already gotten Python running in WebAssembly via wazero. This used the stdlib io/fs#FS interface to allow me to inject virtual filesystems into the WebAssembly context. I used this to isolate my chatbot's filesystem state so that it (hopefully) wasn't able to delete anything important by accident.
io/fs#FS has methods for the important stuff, and runtime interface assertions let you bridge the gap for things like writes. But it was really designed for embedded filesystems, and writes get hairy fast once you're talking to object storage or anything that isn't a tree of bytes on disk.
At some point I hit a wall and had to switch from io/fs#FS to billy, another filesystem interface that I think predates the standard library one. This gives you a bunch more methods that map a lot closer to filesystem semantics in ways that coreutils crave. The interface was also mostly compatible with io/fs#FS so most of the hard part was really changing out the type and then chasing down compiler errors until I found enough of a pattern to have Opus automate the rest of it.
From there it was a matter of adapting billy's filesystem to wazero's experimental sys interface. Mostly glue code, except where I had to translate Go errors into POSIX errno values. I had to read the POSIX spec, the WASI spec, and the wazero source to figure out how to map errors between the two worlds. I think I'm at least 95% correct, which is likely within the margin of porting error.
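The error translation is the fiddly part, so here is a rough sketch of the idea using only the standard library's sentinel errors. It is not the mapping Kefka uses. The Errno type below is a hypothetical stand-in for a real errno type, the choice of EACCES for permission errors and EIO as the fallback are assumptions, and a real adapter would cover many more cases.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// Errno is a hypothetical stand-in for a POSIX/WASI errno value. Names are
// used instead of numbers because the numbering differs between platforms.
type Errno string

const (
	ESUCCESS Errno = "ESUCCESS"
	EACCES   Errno = "EACCES"
	EBADF    Errno = "EBADF"
	EEXIST   Errno = "EEXIST"
	EINVAL   Errno = "EINVAL"
	EIO      Errno = "EIO"
	ENOENT   Errno = "ENOENT"
)

// ToErrno maps a Go error onto the closest errno. errors.Is sees through
// wrappers such as *fs.PathError, so callers can pass errors as returned.
func ToErrno(err error) Errno {
	switch {
	case err == nil:
		return ESUCCESS
	case errors.Is(err, fs.ErrNotExist):
		return ENOENT
	case errors.Is(err, fs.ErrExist):
		return EEXIST
	case errors.Is(err, fs.ErrPermission):
		return EACCES
	case errors.Is(err, fs.ErrInvalid):
		return EINVAL
	case errors.Is(err, fs.ErrClosed):
		return EBADF
	default:
		// Anything unrecognized becomes a generic I/O error.
		return EIO
	}
}

// Sample errors for the demo below.
var (
	errMissing = &fs.PathError{Op: "open", Path: "notes.txt", Err: fs.ErrNotExist}
	errOpaque  = errors.New("object storage backend timed out")
)

func main() {
	for _, err := range []error{nil, errMissing, errOpaque} {
		fmt.Printf("%v -> %s\n", err, ToErrno(err))
	}
}
```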
Adapting that codeinterpreter/python library to the new interface was mostly straightforward, and I ended up with a flow like this:
// from https://tangled.org/xeiaso.net/kefka/blob/main/command/internal/python3/python3.go
func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	fsConfig := wazero.NewFSConfig().(sysfs.FSConfig).WithSysFSMount(billyfs.New(ec.FS), "/")
	config := wazero.NewModuleConfig().
		// Pipe ExecContext stdio
		WithStdin(ec.Stdin).
		WithStdout(ec.Stdout).
		WithStderr(ec.Stderr).
		// Pipe argv
		WithArgs(append([]string{"python3"}, args...)...).
		WithName("python3").
		// Pipe filesystem
		WithFSConfig(fsConfig).
		// Pipe system time
		WithSysNanosleep().
		WithSysNanotime().
		WithSysWalltime()

	mod, err := runtime.InstantiateModule(ctx, compiled, config)
	if err != nil {
		// Fit the square peg into the round hole
		if exitErr, ok := errors.AsType[*wsys.ExitError](err); ok {
			if code := exitErr.ExitCode(); code != 0 {
				return interp.ExitStatus(uint8(code))
			}
			return nil
		}
		return err
	}
	return mod.Close(ctx)
}
See? The dependencies such as stdin, stdout, and stderr get injected
into the WebAssembly guest. Wazero also makes you inject the implementation of
time for boring reasons involving deterministic computing, but I'm sure you
can see the ways things hook in. This basic dependency injection flow is how
things like the linuxulator in FreeBSD
or the old version of the Windows Subsystem for Linux work (WSL1 before it was
made into a Linux VM with WSL2). The table of system calls and filesystem
context is effectively an argument to the process.
Same trick got me ripgrep and jq. jq was annoying because wasi-sdk doesn't love jq's (ab)use of cmake; however, 30 or so minutes of tweaking compiler flags got me a binary that works enough.
I could see it being pretty easy to port over arbitrary programs to Kefka using WebAssembly like this. There's just one small problem: WASI preview 1 (also known as WASI 0.1) doesn't allow you to open arbitrary network sockets. This has been a huge pain in practice (it means you can't do HTTP requests, database connections, or other common internet things from inside the WASM sandbox) and future work probably would include adapting wazero to use wasix instead of WASI 0.1.
Using filesystems that don't exist
OK, that handles filesystems that (arguably) exist, like the btrfs volume on my dev box. What about filesystems that don't? For the sake of argument, let's say you want this fake shell to interact with object storage as its main filesystem. At some level all you need to do is adapt the billy interface to object storage using something like storage-go.
Disclaimer, I work at Tigris and developed this library for them. It's
basically the S3 client with more methods to handle additional Tigris features
like forks and snapshots. I'll be writing more about it soon.
After finding a basic implementation of an S3 -> Billy adapter, I vendored it into the Kefka repo and swapped out the "real" filesystem in cmd/kefka for an s3fs implementation pointed at a sample Tigris bucket. From there it was down to an iterative process of running commands, finding feature gaps when errors showed up, implementing them, fuzzing, and making sure things work mostly the same against Tigris as they do against a local filesystem.
WASI is cursed: it has no process-level "current working directory," which most programs assume exists. You patch around it by passing a CWD envvar, or just use absolute paths. I haven't hit anything broken in casual use, but expect rough edges. Here be dragons and this code may be known by the state of California to cause cancer.
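A minimal sketch of that workaround: resolve every path a guest asks for against a working directory carried in an environment variable. The resolve helper is hypothetical, and using PWD as the variable name is an assumption for this example.

```go
package main

import (
	"fmt"
	"path"
)

// resolve turns name into an absolute, cleaned path. Relative names are
// joined onto pwd; a missing or relative pwd falls back to "/".
func resolve(pwd, name string) string {
	if !path.IsAbs(pwd) {
		pwd = "/"
	}
	if path.IsAbs(name) {
		return path.Clean(name)
	}
	return path.Join(pwd, name)
}

func main() {
	// In a real guest, pwd would come from something like os.Getenv("PWD").
	pwd := "/home/xe"
	fmt.Println(resolve(pwd, "notes.txt"))   // /home/xe/notes.txt
	fmt.Println(resolve(pwd, "../etc/motd")) // /home/etc/motd
	fmt.Println(resolve(pwd, "/tmp/x"))      // /tmp/x
}
```

This only helps for paths that pass through code you control; a program that calls chdir and expects the runtime to remember it is still out of luck.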
Why does it have to use the command line?
Once everything got working with s3fs and a local shell, I wondered how hard it would be to make this work as an SSH server using the github.com/gliderlabs/ssh package. Hooking things up was pretty easy:
func HandleSSH(sess ssh.Session) error {
	// Convenience variables for SSH session values
	var stdout io.Writer = sess
	var stderr io.Writer = sess.Stderr()
	var stdin io.Reader = sess
	ctx := sess.Context() // cancelled when the user disconnects

	// Kefka command registry with coreutils/python/jq/etc
	commands := registry.New()
	coreutils.Register(commands)
	wasmprog.Register(commands)

	// Base envvars for all programs, needed by POSIX
	env := expand.ListEnviron(
		"HOME=/",
		"PWD=/",
		"IFS=\n",
		"HOSTNAME=localhost",
		"USER="+sess.User(),
		// not strictly required, but just-bash sets it
		"MACHTYPE=x86_64-pc-linux-gnu",
	)

	// Create shell engine
	sh, err := interp.New(
		// Set the "interactive" flag so the shell expands aliases
		interp.Interactive(true),
		// Forward our envvars
		interp.Env(env),
		// Wire up stdio
		interp.StdIO(stdin, stdout, stderr),
		// Change the shell exec handler such that it's constrained to the
		// Kefka registry.
		//
		// Strictly speaking you don't have to do this, but if you don't
		// then any time the registry doesn't have a command
		// implementation, interp falls back to its default ExecHandler that
		// executes the command as a subprocess. This is almost certainly
		// not what you want.
		interp.ExecHandlers(constrainToRegistry(commands)),
		// Wire up per-command pwd state to the filesystem implementation
		interp.CallHandler(billysh.CallHandler(commands, fsys, stdout, stderr)),
		// Handle shell-level filesystem I/O (redirects, glob expansion, etc)
		interp.StatHandler(billysh.FsysStatHandler(commands, fsys)),
		interp.FsysOpenHandler(billysh.FsysOpenHandler(commands, fsys)),
		interp.ReadDirHandler2(billysh.FsysReadDirHandler(commands, fsys)),
	)
	// Bail out if the shell engine could not be created
	if err != nil {
		return err
	}

	// Read shell commands
	parser := syntax.NewParser()
	fmt.Fprintf(stdout, "$ ")

	// Split input into commands
	for stmts, err := range parser.InteractiveSeq(stdin) {
		if err != nil {
			return err
		}
		if parser.Incomplete() {
			fmt.Fprintf(stdout, "> ")
			continue
		}
		for _, stmt := range stmts {
			err := sh.Run(ctx, stmt)
			if sh.Exited() {
				return err
			}
		}
		// Show prompt
		fmt.Fprintf(stdout, "$ ")
	}
	return nil
}
The real handler is much messier because Python's REPL needs careful buffering, Ctrl-C has to actually cancel things, and pty wiring is its own can of cans of worms. None of that shows up if it's working. Tab completion and readline polish are easy enough; I'll let you wire those up as an exercise for the reader.
If you want to try it today, you can ssh into sophia.xeiaso.net:
$ ssh sophia.xeiaso.net
You'll get an isolated sandbox in your own bucket fork/branch. Every ls is a ListObjectsV2 against the bucket. Every qjs or python3 runs WebAssembly on the server, wired to that same bucket.
I should really hook up session recording to this.
I want more experimental WebAssembly hacks like this to exist. I'll keep poking at it.
Put your programs in clown jail
With some effort, yeet could use Kefka's shell utilities to run Anubis builds on Windows; and if management ever makes you babysit AI agents, clown jail is a decent answer.
The code lives on Tangled. I'm wiring it into an agent harness so I can automate small tools against a local model (I'm loving Qwen3-36B-A3B).
There's a sister post on the Tigris blog that goes deeper into the AI-agent angle and the porting work using Claude Code. If you want, you can check it out here:
Academic librarians and others often engage with media literacy instruction by promoting fact-checking strategies, such as lateral reading or Mike Caulfield’s SIFT. Evidence shows that these strategies are valuable and can be effective, but they all ultimately rely on individual students to use willpower to overcome cognitive habits, biases, strong parasocial relationships with content creators, the power of algorithms, and other challenges to fact-checking content in the moment. This paper offers an alternative approach that instead encourages librarians to support students in intentionally redesigning their information environments to improve the quality of information that they encounter in the first place.
“The task of breaking a bad habit is like uprooting a powerful oak within us. And the task of building a good habit is like cultivating a delicate flower one day at a time.” – James Clear
A 2024 study conducted by the News Literacy Project found that 80% of the teen participants believed that journalists fail to produce more impartial information than other online content creators, and 69% said that news organizations intentionally make their content biased to advance a particular viewpoint. When the News Literacy Project followed up with these young adults a year later, they found that most of them believe that trustworthy, unbiased news is rare or maybe doesn’t even exist (2025).
Pew Research found, through a series of focus groups, that Americans don’t always agree on what constitutes a “journalist” or “news media,” and young adults are more likely than older adults to call “new media” platform hosts, such as podcasters and social media creators, “journalists” (Eddy et al., 2025). Overall, younger participants were less likely than older adults to even care whether the news they consume comes from a journalist. The investigation found that Americans are concerned that, besides maybe a few reliable ones, journalists are concerned with “clicks, eyeballs, money, things like that, and they don’t necessarily mind tweaking the truth to suit their audience or their advertisers” (quoted in Pew Research, 2025).
These statistics are significant because cynicism about standards-based news and other traditionally authoritative institutions has many negative impacts. First, news cynicism can lead to news disengagement, which pushes information consumers to less reliable platforms (Ahmed et al., 2025; Fletcher, et al., 2024; Mont’Alverne, 2022) and contributes to erosion of trust more broadly in institutions like voting (Park, et al., 2025; Raffio, 2025). When people disengage, news sources themselves are threatened by obsolescence, and this threatens their role as a watchdog and a keystone of democratic societies (Haider & Sundin, 2022). News cynicism makes it difficult for accurate information to reach people and, paradoxically, makes people more vulnerable to misinformation (Ahmed et al., 2025; Hasell & Halversen, 2024). Individuals may feel anxious, depressed, and helpless about their world, leading to a spiral of disengagement (Hasell & Halversen, 2024). News cynicism also fuels societal division and threatens democracy (Cappella & Jamieson, 1996; Valgarðsson et al., 2025). Widespread distrust in institutions such as the government, science, public authorities, and the press is a risk to media literacy, democracy, civil discourse, and our sense of agency.
Academic Librarians and Media Literacy Instruction
One strategy for helping students and others improve their media consumption is to teach them media literacy skills. Media literacy is generally thought to be the ability to access, evaluate, analyze, and create media messages (Aufderheide, 1993), although definitions vary considerably between researchers and practitioners (Fleming, 2014; Hobbs, 1998). Media literate individuals have the skills to identify media sources and messages that are unreliable, and, perhaps more importantly, craft an overall media diet that is more likely to consist of reliable information.
Academic librarians are interested in, and possess relevant expertise for, teaching students media literacy skills that are relevant in academic and non-academic settings. Many librarians have explored tactics for teaching students source evaluation skills that move beyond the CRAAP test (Currency, Relevance, Authority, Accuracy, Purpose), such as the SIFT method (Stop, Investigate, Find, Trace), created by Mike Caulfield (2019), or lateral reading, popularized by the Civic Online Reasoning organization (Digital Inquiry Group, n.d.). Caulfield’s SIFT method provides a more up-to-date approach to source evaluation by offering strategies that are more efficient, straightforward, and applicable in a wide variety of contemporary information settings (Bull, 2021). “Lateral reading,” which is a key component of SIFT, involves leaving the source that is being evaluated and opening new browser tabs to investigate what other Internet sources report about the site and its claims (Wineburg & McGrew, 2019). Research has shown that the SIFT method and lateral reading result in more accurate student source evaluation (Bobkowski & Younger, 2020; Breakstone et al., 2021; Brodsky et al., 2021). These techniques reflect a better understanding of the modern online information environment than simplistic checklist strategies. However, they still expect students to avoid misinformation through careful self-control and self-monitoring.
Misinformation is an interdisciplinary problem with significant complexities. As Sullivan has argued, librarians have historically focused on media literacy instruction strategies that neglect the psychology of how people interact with information, and the field of library and information sciences is somewhat siloed in its exploration of source evaluation instruction (2019). For example, heuristics, systems thinking, mental models, and cognitive biases all play a role in how and why people adopt misinformed beliefs. Emotions also influence the ways that individuals evaluate information (Hewitt, 2023; Hicks & Lloyd, 2021), yet they play a minor role in most library source evaluation instructional strategies. Academic librarians may have a role in combatting misinformation, but we should proceed, as much as possible, guided by research conducted across disciplines (Saunders, 2025). As an example, academic librarians have often focused their source evaluation teaching on investigation strategies and fact-checking skills. These skills are very important, and we shouldn’t abandon them. But there are many reasons, informed by research outside of Library and Information Science (LIS), why reactive strategies that rely on individual willpower are destined to be difficult to maintain.
Challenges of Fact-Checking and Other Traditional Source Evaluation Techniques
Evidence shows that, globally, trust in institutions is decreasing, including in democratic societies (Kavanagh & Rich, 2018; Gil de Zúñiga & Diehl, 2019). The consequences of this could be severe, as many scholars posit that trust in institutions is an important pillar of democracy (Haider & Sundin, 2022). There are also a number of well-studied examples of how bad actors can sow doubt in institutions, such as academia, to achieve their own ends (Haider & Sundin, 2022). This has played out in the case of the tobacco industry and fossil fuel companies; in both cases, the science is clear, but raising uncertainty can be enough to sway consumers to take actions that are not in their best interests (Oreskes & Conway, 2010). All of this said, when society’s institutions become corrupt or unreliable, or when institutions are systematically unfair to one’s group or identity, distrust in institutions is often justified (Haider & Sundin, 2022). So while dismissing institutionally-backed information in favor of persuasive individuals is risky, confidently pointing to institutions as always trustworthy is also unlikely to be effective. Easy-to-apply source evaluation checklists that are meant to be used across all contexts and blind trust in compelling individual voices both fail to reflect the complexity of information environments.
While media literacy that relies on individual fact-checking skills is very important, there are many reasons why a willpower approach is likely to have limited success. The section below explores these limitations from internal factors, to external factors, and finally, to systemic factors.
Limits of Fact-Checking: Internal Factors
The intuitive solution to the problem of misinformation is to let media consumers know that a piece of information is untrue. However, there is mounting evidence that retractions and corrections have little effect on whether someone will make decisions based on misinformation (Seifert, 2014; Thorson, 2016; Zhou & Shen, 2024). There are many potential reasons for this, but one that almost certainly plays a role is the effect of cognitive bias. For example, epistemic egocentrism is a cognitive bias that occurs when individuals fail to set aside their own privileged information when imagining the perspectives of others (Royzman et al., 2003; Zhou & Shen, 2024), which can cause people to judge their own source evaluation skills highly and blame the problem of misinformation’s spread on others. Closely related is blind spot bias, which is the belief that one is immune to bias (Pronin, et al., 2002). Confirmation bias is also relevant to the adoption and spread of misinformation; this bias is the tendency to seek out and remember information in ways that favor existing beliefs (Nickerson, 1998; Oswald & Grosjean, 2004). A consequence of confirmation bias is selective exposure, or a person’s proclivity to preferentially seek and engage with information that is in alignment with their existing values, beliefs, or attitudes (Zhou & Shen, 2024). These cognitive biases, which can occur whether or not the person has a pre-existing attitude about the misinformation, may lead people to dismiss corrections, assume they are correct in situations where there is substantial conflicting evidence, or, by consciously or subconsciously designing their information environment, rarely encounter threats to their existing worldview.
Research into the mechanisms that cause misinformation adoption to persist (sometimes called the “continued influence effect of misinformation”) shows that corrections can fail in their effectiveness when they leave a gap in someone’s mental model, especially when the misinformation fills that gap in a more satisfying way (Johnson & Seifert, 1994, p. 1420). Retrieval errors can also contribute; for example, when misinformation is retrieved from memory without the “false” label, or when misinformation is retrieved more readily than its correction (Ecker et al., 2011; Gordon et al., 2017; Lewandowsky et al., 2012). Because the misinformation and correction both exist in memory, deliberate, effortful thinking is necessary to retrieve corrections from memory, and natural cognitive efficiency processes can make this retrieval difficult or unlikely (Kendeou & O’Brien, 2014; Pennycook & Rand, 2019). These neurological processes make debunking misinformation incredibly challenging once it has been adopted into someone’s mental model.
Information consumers are also often very confident about their beliefs, even if their knowledge about the topic at hand is, upon investigation, quite shallow. While perceptions of widespread misinformation increase, Americans are confident that they have the skills to identify this unreliable content. In 2016, a study found that 84% of participants were confident in their ability to spot “fake news” and 64% of those same participants believed that fabricated news stories caused significant confusion for Americans (Barthell et al.). Who is being confused by these stories? Not them, the participants in the study seemed to say; it’s everyone else. This points to an overconfidence that individuals have in their own ability to detect false information, contributing to the problem of misinformation’s spread.
One cognitive bias that helps to explain this phenomenon is the Dunning-Kruger effect, whereby individuals with limited knowledge of a subject fail to accurately assess their own level of expertise (Dunning, 2011). For example, research has shown that overconfidence in news judgments is associated with higher susceptibility to false news across a variety of topics, from autism awareness to nutrition claims (Lyons, et al., 2021; Motta, Callaghan, & Sylvester, 2018; Peng & Shen, 2025). Along the same lines, the “nobody-fools-me perception” is a cognitive bias whereby someone is overconfident in their ability to detect misinformation, especially as compared to others (Martinez-Costa et al., 2022). This leads people to make claims like “Many people haven’t learned to check facts” but fail to recognize their own media literacy deficiencies (Martinez-Costa et al., 2022).
Relatedly, the illusion of explanatory depth occurs when people believe they understand a complex topic more than they actually do upon further probing (Rozenblit & Keil, 2002; Sloman & Fernbach, 2017). Humans move through the complex, nuanced, and dangerous modern world by holding a naive intuition that they understand how the world around them works. This, combined with poor knowledge about the extent of our knowledge, causes a pervasive belief that we can explain the world around us even when we can’t (Bailey, 2021). The illusion of explanatory depth can cause people to adopt false beliefs confidently, not realizing their shallow understanding of the topic should cause them to question their self-assured stance.
It’s important to note that a 2025 study found that exposing participants to false news not only caused them to become overconfident in their judgments about whether news stories were true or false, it also fueled news mistrust (Altay et al.). This study demonstrates how news environments themselves contribute to issues that spur misinformation’s spread, such as overconfidence and cynicism. Along the same lines, some researchers worry that media literacy interventions that focus on “misinformation’s omnipresence” risk heightening the salience of misinformation as a threat to society and individuals, ultimately increasing news mistrust (van der Meer, Hameleers, & Ohme, 2023). Misinformation warnings alone can provoke a deception-bias, whereby people assume deception in news messages, rather than defaulting to a trust-bias as they often do in other contexts (van der Meer, Hameleers, & Ohme, 2023).
Limits of Fact-Checking: External Factors
While it’s clear that cognitive limitations make corrections to misinformation difficult or impossible, other researchers argue that misinformation itself is not as widespread a problem as is commonly believed. They argue that the current perceived prevalence and “panic” about misinformation is a kind of “historical amnesia” (Stecula, 2025). The spread of misinformation is nothing new, and misleading messages have been created and spread for hundreds of years, from anti-vaccination movements of the early 1800s to disbelief about the real cause of JFK’s assassination, all of which occurred before the invention of social media (Stecula, 2025). What is different about the spread of false messages today is their overt support by important societal leaders and the new visibility their small groups of adherents have due to social media. These changes have allowed society to diverge into competing knowledge communities with unique standards for expertise, source evaluation, and, ultimately, defining truth (Stecula, 2025). These new, ideologically isolated communities with extreme views do not represent the majority of the population, but may seem to, given the way social media can amplify their messages. Fact-checking is likely to have limited reach and impact in these isolated, closely-knit communities.
Even in the rare cases when overtly false information is spread outside of isolated bubbles, fact-checking as a strategy for stopping its spread has limitations. Some argue that most fact-checking is ultimately reactive, constrained by scale and speed, and destined to always be catching up with rapidly changing misinformation messages (Wack, Duskin, & Hodel, 2024). Fact-checkers themselves worry that fact-checking risks drawing additional attention to misinformation and has limited impact for cognitive reasons; one said, “I can only convince those already convinced” (Westlund et al., 2024).
Another assumption of fact-checking is that knowledge of the truth impacts people’s behaviors in positive ways. However, research about climate change misinformation, for example, found that even when people have accurate beliefs about climate change, it has limited impact on their willingness to engage in pro-environmental behavior (Spampatti, 2025). Additional research has shown that, for some individuals, feeling and appearing independent from outside influence is more important than being correct; for these individuals, whether something is factual or not is irrelevant to whether it should be shared (Stein & Rutchick, 2025).
It’s also possible that the problem of misinformation has been mischaracterized due to how it is typically studied. Current research on misinformation often focuses on issues that are likely to invoke false beliefs, and it also rarely asks participants to indicate confidence levels; both of these oversights may inflate the perception that people are deeply divided about many issues. In reality, participants may just be uninformed about issues, not misinformed, which is not captured in most studies (Stecula, 2025). Along the same lines, many studies that rely on truth discernment tasks impose a false dichotomy between true and false statements, when misinformation in real world contexts often rides the line between true and false, or may include some true statements with an overall misleading message (Spampatti, 2025).
Limits of Fact-Checking: Systemic Factors
Research on the spread of misinformation has also frequently focused on individual-level susceptibility without addressing the role of structural inequities in shaping exposure to misinformation and capacity to resist it (Lin et al., 2022; Schirmer, et al., 2025; Walter et al., 2020). Socioeconomic disparities limit who can access high-quality information; lack of broadband access, language differences, and digital literacy deficiencies can all contribute to this problem (Schirmer, et al., 2025). Systemic mistrust, justified by decades of historical injustice, can lead some to seek information outlets alternative to the mainstream, exposing them to misinformation (Jaiswal et al., 2020; Pew Research Center, 2024). Many marginalized communities, however, are actively working to understand the impacts of misinformation and take grassroots efforts to combat it (Schirmer, et al., 2025). There are many ways to move beyond laying the responsibility of misinformation avoidance on individuals, and structural interventions have more potential to address the social disparities that shape misinformation adoption.
While fact-checking strategies in particular have limited utility, all misinformation interventions that expect individuals to exercise willpower in algorithmically-driven environments will face considerable difficulties. Algorithms have significant power to influence what information and voices individuals encounter. While evidence about the impact of “filter bubbles,” or isolated online spaces that perpetuate misinformation messages (Pariser, 2011), is mixed (Arguedas et al., 2022), there is some evidence that filter bubbles can limit users’ exposure to diverse points of view and increase users’ access to lower-quality content (Ciampaglia et al., 2018). It can be tempting, in today’s algorithm-rich environment, to assume that there is no need to intentionally seek out standards-based news because that news will “find” you (Skurka, et al., 2025). American adults who think the news will “find” them are more likely to overestimate their ability to tell false from true political news and more likely to engage confidently with false news messages (Skurka, et al., 2025).
One reason social media messages can be especially compelling has to do with influencers. Social media platforms allow for individual voices to have an outsized influence on large sections of the population. These individual voices, or “influencers,” do more than entertain people; they often drive the narrative around topics ranging from politics to economics to health (Thi & Ibrahim, 2025). While research shows that credibility, consistency, and transparency are important characteristics of an influencer that people trust, for an influencer to truly appear “authentic,” they must also build an emotional connection with their audience by seeming relatable and “being real” (Thi & Ibrahim, 2025). Accuracy of the messenger, while not completely irrelevant, is not the most important factor when people decide who to trust in social media settings.
The emotional bond that audience members form with influencers contributes to the rise of parasocial relationships, which are one-sided relationships in which someone develops a sense of closeness and intimacy with a media figure, usually a celebrity or influencer (Hoffner & Bond, 2022). The intensity of parasocial relationships is driven by the media figure’s moments of self-disclosure, glimpses into parts of the person’s life that are usually unknown, and momentary, technology-mediated interactions (e.g. reposting or liking a fan’s post) (Hoffner & Bond, 2022; Kim & Song, 2016; Kurtin, O’Brien, Roy, & Dam, 2018; Dai & Walther, 2018). Even though the influencer or celebrity does not know fans or even necessarily have their best interests at heart, it can feel to fans that they do because of the sense of closeness and trust they have for the influential person.
Influencers are an important source of misinformation in the information ecosystem because of the scale of their impact. This is especially true for messages that are already viral or widespread; these messages actually help influencers gain more trust from their followers, regardless of the veracity of the message (Mulcahy, et al., 2024). However, influencers face little to no accountability when it comes to sharing misinformation, beyond the impact that being found to have shared inaccurate information might have on their reputation (Thi & Ibrahim, 2025). Unlike journalists, who receive training and commit to a code of ethics, social media creators operate outside any kind of formal ethical framework.
Complicating the interplay between cognitive biases, algorithmically-driven online spaces, and persuasive social media personalities, is the rise of generative artificial intelligence (AI). Although access to this technology is fairly recent, the use of these systems contributes significantly to the existing problem of misinformation by allowing for the easy creation and customized dissemination of misinformation at scale (Bontridder & Poullet, 2021). Even elected officials have shared AI-generated misinformation with a wide audience (Skau, 2026).
The widespread sharing of AI-generated misinformation has two main negative impacts. First, even when the content is fact-checked, it can continue to misinform due to the previously mentioned continued influence effect. Sandra Ristovska, an expert in visual evidence from the University of Colorado Boulder, described this challenge of false AI-generated images: “It lies deep in human nature and in the way we see and interpret images that it can be difficult to ‘un-see’ an image or a video once we have seen it” (Ristovska as cited in Skau, 2026, para. 10). The other negative effect is that it can contribute to a sense that nothing online is real, or that we shouldn’t bother determining if something is true or false; in other words, it deepens the cynicism many already feel. As Renee Hobbs, Professor of Communication at the University of Rhode Island, stated, “If we become indifferent to whether something is true or false, we risk losing many of the cooperative structures that make civilization possible” (as cited in Skau, 2026, para. 13).
Willpower and Habits
Clearly many factors make fact-checking a challenging strategy to rely on for stopping the spread of misinformation and improving students’ media literacy. Importantly, whether an individual is stumbling upon someone else’s fact-check or considering whether to fact-check something themselves, they must have the willpower to take additional critical steps.
It could be argued that the most effective means of improving this situation is to make systemic changes, such as improving social media and search engine algorithms to prioritize accuracy and flag misinformation, or requiring influencers to be more transparent about their motives or qualifications. But while we continue to push for these systemic changes, individuals must continue to make information choices every day, and this is what library instruction tends to focus on. With that in mind, how can we encourage individual actions that rely less on willpower?
What we are ultimately trying to accomplish is a habit change. Considerable research shows that changing someone’s habits through willpower is very challenging and often destined to fail (Bargh & Barndollar, 1996; Borland, 2013; Muraven, 2012; Wood et al., 2014). What is more effective is changing someone’s environment to encourage the desired behaviors (Bargh & Barndollar, 1996). In research conducted about the importance of environmental as opposed to willpower-based approaches to habit change, Duckworth et al. describe how “situational selection strategies” like putting a distracting device in another room during study time, spending time with friends who value studying, and telling someone else their study goal to hold them accountable had maximum success in improving student study habits (2016). These strategies were more successful than “self control” strategies, which students described as a mindset like, “Just deal with it and study” or “Just do it…I just focus and get my work done” (p. 334). This is just one example of many studies that show how stopping a bad habit through sheer willpower and keeping all other aspects of the environment the same has limited success. However, changing the environment to make the bad habit more difficult and good habits easy and effortless has a much better chance at success.
The same is true with our information environments. When students spend considerable time in algorithmically-driven social media spaces, they may encounter more poor-quality information that requires fact-checking, and they may feel both a sense of cynicism about the information system more broadly as well as a lack of agency. However, when students spend less time being directed by an algorithm in information spaces with lots of tempting, low-quality information, and more time consulting reliable, standards-based information sources, they improve their information behavior, and, importantly, gain a sense of agency about what information they encounter and consume.
Recommendations for Academic Librarians
Although structural changes are necessary to address many of the issues discussed here, academic librarians may be able to contribute by changing how we approach information literacy instruction. While fact-checking methods like SIFT and lateral reading are important skills (that are convenient to fit into a 50-minute class period), librarians could instead (or in addition) address the importance of adopting new information habits. Rather than asking students to start with having the presence of mind and willpower to “stop” as in SIFT, maybe we should start our process before that “stop” is even necessary by intentionally designing the information environment in the first place.
“Lift Our Gaze”: Teach about Systemic Information Structures
One initial challenge that librarians must address is that it may require considerable motivation for students to take the initial steps to improve their information environments. If students believe that influencers are just as reliable as journalists (or more so), why would they change their habits?
One strategy is to lean into the ACRL Frame “Information Creation is a Process” (2016). Librarians can help students better understand the systems that underlie the information they encounter through the concept of “infrastructural meaning-making” offered by Haider and Sundin (2023). They define infrastructural meaning-making as going “beyond examining the content’s sources, and even beyond evaluating the source’s content, to also be concerned with the institutions and systems, the platforms and algorithms that deliver it to us and onto our devices” (p. 2). To apply this concept, in addition to traditional source evaluation methods like CRAAP and SIFT, instructors would also encourage students to consider why that particular source appeared to them at that time – in other words, how do the conditions of access, along with the information and its source, help us understand the piece of information? (Haider & Sundin, 2019). Algorithmic literacy, situational awareness, and platform knowledge can all contribute to better decisions about whether to pay attention to a particular piece of information (Haider & Sundin, 2023). Fortunately, many simple and creative activities exist to help students understand how algorithms work to impact their information environments (Camarillo, 2025). While digital information infrastructures are often invisible to us (intentionally on the part of platform providers), we benefit from “lifting our gaze” to understand how networked environments impact what information we encounter (Haider & Sundin, 2023, p. 3).
With this strategy, it’s important to consider how affective or attitudinal factors might impact students’ source evaluation approaches, and to add instructional interventions that address these factors to typical source evaluation instruction. For example, one researcher found that just teaching algorithmic awareness to students was helpful, but it was limited in its impact because students felt such a sense of powerlessness to shape their online experiences. However, by pairing algorithmic knowledge with activities that promote digital agency, we can help to combat the significant cynicism students feel about their digital environments (Chung, 2025).
Along the same lines, helping students understand how standards-based news is created, especially in comparison to influencer-generated content, can help them view the information landscape with a wider scope, rather than focusing on fact-checking individual claims. In the field of communication, researchers have found that knowledge of how news is produced, disseminated, and consumed can improve misinformation detection (Ashley et al., 2023; Chan, 2024; Chan et al., 2024).
Deliberately Design a News and Information Landscape
Next, students should be encouraged to intentionally seek out reliable information, rather than allow algorithms to determine their information landscape. Research shows that young adults who are exposed to news-rich environments, especially in the classroom, are more likely to develop news consumption habits (Edgerly, 2025; York & Scholl, 2015). In general, people need more help accepting true news than rejecting false news (Pfänder & Altay, 2025), so deliberately undertaking this task could be helpful. Researchers have also found that this approach – focusing on what sources to trust, rather than focusing on the small prevalence of misinformation – can increase trust in standards-based news, rather than fueling cynicism about news (Altay, De Angelis, & Hoes, 2024). However, it’s important to incorporate instruction about negativity bias and click-bait into this process, because research shows that a pessimistic outlook is correlated with self-selecting more negative and episodic news when given the chance to intentionally select news outlets (van der Meer & Hameleers, 2022). Encouraging students to deliberately select reliable information while also helping them break out of their cynical outlooks may improve the effectiveness of this strategy. Recommending platforms like the Good News Network and others that focus on positive news stories can help address the very real mental health concerns of increasing time spent focused on news.
Abstain from Unreliable Information Spaces
Finally, while it may not always be popular, taking time to teach students why social media platforms are an unreliable source of information is essential. These platforms are “firmly grounded in beliefs about individualism, capitalism and consumerism,” not the pursuit of accuracy (Fister, 2021). Librarians might even encourage students to step away from these platforms when possible and to the extent they feel comfortable. This might mean deliberately limiting or eliminating social media accounts, or engaging in phone-free time, which some college students are choosing to do for a variety of other reasons (Beres, 2025). In the habit example above, this is the step in which the triggers for the bad habit are removed from the environment, and it is essential to success in new habit formation. Helping students recognize which of the platforms they use deliver mostly low-quality information is an information literacy issue.
Conclusion
Media literacy skills are essential to today’s college students, and academic librarians are among the few on campus with the expertise to teach them. However, teaching students quick fact-checking strategies that they must remember and be motivated to use in the moment may not be effective in real-world environments for a variety of reasons, including the power of cognitive biases, the sway of parasocial relationships, the influence of algorithms and generative AI, and the systemic nature of many of these problems. To teach students new habits, we should rely less on willpower and more on proactively shaping information environments that help students feel empowered, informed, and positive (or at least realistic) about the information landscape.
It’s not as quick and easy as a fact-checking strategy, but helping students understand the information landscape and set up a more reliable information environment may have longer-lasting positive impacts than trying to instill new habits that are difficult for them to put into practice. It’s clear that we are facing more cynicism and disengagement from standards-based news and other authoritative information sources than we ever have before. Even with our limited resources, academic librarians can leverage our expertise to help with this major problem and move students towards a healthier relationship with online information. This foundational shift, from fact-checking individual claims to fostering a healthier, more intentional relationship with information, reflects what is arguably among the most critical skills college students can learn.
Acknowledgements
I would like to extend my sincere gratitude to editors Ian G Beilin, Jess Schomberg, and, especially, Brittany Paloma Fiedler, for their invaluable feedback throughout the editing process. I would also like to thank Amber Willenborg for her thoughtful peer review of the manuscript. The input of these reflective, considerate people greatly improved the story-telling and flow of the paper, and it ensured that it was as inclusive as possible. Finally, I would like to thank Andrea Baer for significantly contributing to the ideas behind this manuscript through our engaging, helpful, and inspiring discussions.
Works Cited
Ahmed, S., Masood, M., Deng, R., & Malviya, S. (2025). Why cynics disengage: the nexus of political cynicism, misinformation, and online political participation. Asian Journal of Communication, 35(5), 381-402. https://www.tandfonline.com/doi/pdf/10.1080/01292986.2025.2538142
Altay, S., De Angelis, A., & Hoes, E. (2024). Media literacy tips promoting reliable news improve discernment and enhance trust in traditional media. Communications Psychology, 2(1), 74. https://www.nature.com/articles/s44271-024-00121-5
Altay, S., Lyons, B. A., & Modirrousta-Galian, A. (2025). Exposure to higher rates of false news erodes media trust and fuels overconfidence. Mass Communication and Society, 28(2), 301-325. https://doi.org/10.1080/15205436.2024.2382776
Ashley, S., Craft, S., Maksl, A., Tully, M., & Vraga, E. K. (2023). Can news literacy help reduce belief in COVID misinformation? Mass Communication and Society, 26(4), 695-719. https://doi.org/10.1080/15205436.2022.2137040
Aufderheide, P. (1993). Media literacy. A report of the national leadership conference on media literacy. Aspen Institute, Communications and Society Program. https://eric.ed.gov/?id=ED365294
Bobkowski, P. S., & Younger, K. (2020). News credibility: Adapting and testing a source evaluation assessment in journalism. College & Research Libraries, 81(5), 822. https://doi.org/10.5860/crl.81.5.822
Bontridder, N., & Poullet, Y. (2021). The role of artificial intelligence in disinformation. Data & Policy, 3, e32. https://doi.org/10.1017/dap.2021.20
Borland, R. (2013). Understanding hard to maintain behaviour change: a dual process approach. John Wiley & Sons.
Breakstone, J., McGrew, S., Smith, M., Ortega, T., & Wineburg, S. (2018, March). Why we need a new approach to teaching digital literacy. Phi Delta Kappan, 99(6), 27-32. https://doi.org/10.1177/00317217187624
Brodsky, J. E., Brooks, P. J., Scimeca, D., Todorova, R., Galati, P., Batson, M., … & Caulfield, M. (2021). Improving college students’ fact-checking strategies through lateral reading instruction in a general education civics course. Cognitive Research: Principles and Implications, 6, 1-18. https://link.springer.com/article/10.1186/s41235-021-00291-4
Cappella, J. N., & Jamieson, K. H. (1996). News frames, political cynicism, and media cynicism. The Annals of the American Academy of Political and Social Science, 546(1), 71-84. https://www.jstor.org/stable/pdf/1048171.pdf
Chan, M. (2024). News literacy, fake news recognition, and authentication behaviors after exposure to fake news on social media. New Media & Society, 26(8), 4669-4688. https://doi.org/10.1177/146144482211276
Chan, M., Vaccari, C., & Yamamoto, M. (2024). Examining the relationship between dispositional news literacy and discernment of real and misleading news: Cross-national evidence. International Journal of Public Opinion Research, 36(2), edae020. https://doi.org/10.1093/ijpor/edae020
Ciampaglia, G. L., Nematzadeh, A., Menczer, F., & Flammini, A. (2018). How algorithmic popularity bias hinders or promotes quality. Scientific Reports, 8(1), 1-7. https://doi.org/10.1038/s41598-018-34203-2
Clear, J. (2018). Atomic habits: An easy & proven way to build good habits & break bad ones: tiny changes, remarkable results. Random House Business.
Dai, Y., & Walther, J. B. (2018). Vicariously experiencing parasocial intimacy with public figures through observations of interactions on social media. Human Communication Research, 44, 322–342. https://doi.org/10.1093/hcr/hqy003
Duckworth, A., White, R., Matteucci, A., Shearer, A., & Gross, J. (2016). A stitch in time: Strategic self-control in high school and college students. Journal of Educational Psychology, 108(3), 329-341. https://psycnet.apa.org/fulltext/2016-15978-003.pdf
Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one’s own ignorance. In Advances in Experimental Social Psychology (Vol. 44, pp. 247-296). Academic Press.
Ecker, U. K., Lewandowsky, S., Swire, B., & Chang, D. (2011). Correcting false information in memory: Manipulating the strength of misinformation encoding and its retraction. Psychonomic Bulletin & Review, 18(3), 570-578. https://doi.org/10.3758/s13423-011-0065-1
Edgerly, S. (2026). Developing the habit: The socialization of US teens into distinct repertoires of news consumption. Journal of Children and Media, 20(1), 132-150. https://doi.org/10.1177/14648849211012922
Fleming, J. (2014). Media literacy, news literacy, or news appreciation? A case study of the news literacy program at Stony Brook University. Journalism & Mass Communication Educator, 69(2), 146–165. https://doi.org/10.1177/1077695813517885
Fletcher, R., Andı, S., Badrinathan, S., Eddy, K. A., Kalogeropoulos, A., Mont’Alverne, C., … & Nielsen, R. K. (2025). The link between changing news use and trust: longitudinal analysis of 46 countries. Journal of Communication, 75(1), 1-15. https://academic.oup.com/joc/article/75/1/1/7907139
Gordon, L. T., & Thomas, A. K. (2017). The forward effects of testing on eyewitness memory: The tension between suggestibility and learning. Journal of Memory and Language, 95, 190-199. https://doi.org/10.1016/j.jml.2017.04.004
Hasell, A., & Halversen, A. (2024). Feeling misinformed? The role of perceived difficulty in evaluating information online in news avoidance and news fatigue. Journalism Studies, 25(12), 1441-1459. https://doi.org/10.1080/1461670X.2024.2345676
Hewitt, A. (2023). What Role Can Affect and Emotion Play in Academic and Research Information Literacy Practices?. Journal of Information Literacy, 17(1), 120-137. https://files.eric.ed.gov/fulltext/EJ1393880.pdf
Hicks, A., & Lloyd, A. (2021). Deconstructing information literacy discourse: Peeling back the layers in higher education. Journal of Librarianship and Information Science, 53(4), 559-571. https://link.springer.com/chapter/10.1007/978-3-030-43687-2_28
Hoffner, C. A., & Bond, B. J. (2022). Parasocial relationships, social media, & well-being. Current Opinion in Psychology, 45, 101306. https://doi.org/10.1016/j.copsyc.2022.101306
Jaiswal, J., LoSchiavo, C., & Perlman, D. C. (2020). Disinformation, misinformation and inequality-driven mistrust in the time of COVID-19: lessons unlearned from AIDS denialism. AIDS and Behavior, 24(10), 2776-2780. https://doi.org/10.1007/s10461-020-02925-y
Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(6), 1420-1436. https://psycnet.apa.org/fulltext/1995-04372-001.pdf
Kendeou, P., & O’Brien, E. J. (2014). The Knowledge Revision Components (KReC) framework: Processes and mechanisms. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 353–377). MIT Press.
Kim, J., & Song, H. (2016). Celebrity’s self-disclosure on Twitter and parasocial relationships: A mediating role of social presence. Computers in Human Behavior, 62, 570–577. https://doi.org/10.1016/j.chb.2016.03.083
Kurtin, K. S., O’Brien, N., Roy, D., & Dam, L. (2018). The development of parasocial relationships on YouTube. The Journal of Social Media and Society, 7, 233–252.
Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106-131. http://dx.doi.org/10.1037/a0039684
Lin, F., Chen, X., & Cheng, E. W. (2022). Contextualized impacts of an infodemic on vaccine hesitancy: The moderating role of socioeconomic and cultural factors. Information Processing & Management, 59(5), 103013. https://doi.org/10.1016/j.ipm.2022.103013
Lyons, B. A., Montgomery, J. M., Guess, A. M., Nyhan, B., & Reifler, J. (2021). Overconfidence in news judgments is associated with false news susceptibility. Proceedings of the National Academy of Sciences, 118(23), e2019527118. https://doi.org/10.1073/pnas.2019527118
Martínez-Costa, M. P., López-Pan, F., Buslón, N., & Salaverría, R. (2023). Nobody-fools-me perception: Influence of age and education on overconfidence about spotting disinformation. Journalism Practice, 17(10), 2084-2102. https://www.tandfonline.com/doi/full/10.1080/17512786.2022.2135128
Mulcahy, R., Barnes, R., de Villiers Scheepers, R., Kay, S., & List, E. (2025). Going viral: Sharing of misinformation by social media influencers. Australasian Marketing Journal, 33(3), 296-309.
Muraven, M. (2012). Ego depletion: Theory and evidence. The Oxford handbook of human motivation, 111, 126.
News Literacy Project (2024). News literacy in America: A survey of teen information attitudes, habits and skills. NLP-Teen-Survey-Report-2024.pdf
News Literacy Project (2025). “Biased,” “boring” and “bad”: Unpacking perceptions of news media and journalism among U.S. teens. NLP-Teens-and-News-Media-Report-2025.pdf
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175-220. https://doi.org/10.1037/1089-2680.2.2.175
Oreskes, N., & Conway, E. M. (2010). Defeating the merchants of doubt. Nature, 465(7299), 686-687.
Oswald, M. E., & Grosjean, S. (2004). Confirmation bias. Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory. Psychology Press.
Pariser, E. (2011). The filter bubble: How the new personalized web is changing what we read and how we think. Penguin.
Park, S., Fisher, C., Fletcher, R., Tandoc Jr, E., Dulleck, U., Fulton, J., … & Yao, S. P. (2025). Exploring responses to mainstream news among heavy and non-news users: From high-effort pragmatic skepticism to low effort cynical disengagement. New Media & Society, 27(7), 4143-4163. https://journals.sagepub.com/doi/pdf/10.1177/14614448241234916
Peng, R. X., & Shen, F. (2025). Why fall for misinformation? Role of information processing strategies, health consciousness, and overconfidence in health literacy. Journal of Health Psychology, 30(8), 2030-2045. https://doi.org/10.1177/13591053241273647
Pennycook, G., & Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39-50. https://doi.org/10.1016/j.cognition.2018.06.011
Pfänder, J., & Altay, S. (2025). Spotting false news and doubting true news: a systematic review and meta-analysis of news judgements. Nature Human Behaviour, 9(4), 688-699. https://www.nature.com/articles/s41562-024-02086-1
Ross Arguedas, A., Robertson, C., Fletcher, R., & Nielsen, R. (2022). Echo chambers, filter bubbles, and polarisation: A literature review. The Royal Society. https://doi.org/10.60625/risj-etxj-7k60
Royzman, E. B., Cassidy, K. W., & Baron, J. (2003). “I know, you know”: Epistemic egocentrism in children and adults. Review of General Psychology, 7(1), 38-65. https://doi.org/10.1037/1089-2680.7.1.38
Saunders, L. (2025). Information literacy as part of an interdisciplinary approach to combat misinformation. Information Research an International Electronic Journal, 30(CoLIS), 424-442. https://publicera.kb.se/ir/article/download/52318/43437
Schirmer, M., Walter, N., & Horvát, E. Á. (2025). Disparities by design: Toward a research agenda that links science misinformation and socioeconomic marginalization in the age of AI. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-178
Seifert, C. M. (2014). The continued influence effect: The persistence of misinformation in memory and reasoning following correction. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 39–71). MIT Press.
Skurka, C., Cheng, Z., Goyanes, M., & Gil de Zúñiga, H. (2026). News Finds Me as the illusion of competence: evidence for overconfidence in discernment of political misinformation. Human Communication Research, 52(1), 11-23. https://doi.org/10.1093/hcr/hqaf015
Stein, R., Rutchick, A. M., Sin, A. Y., & Jarrin Rueda, L. F. (2025). Symbolic show of strength: a predictor of risk perception and belief in misinformation. The Journal of Social Psychology, 1-27. https://doi.org/10.1080/00224545.2025.2541206
Sullivan, M. C. (2019). Why librarians can’t fight fake news. Journal of Librarianship and Information Science, 51(4), 1146-1156. https://doi.org/10.1177/0961000618764258
Thi, P. V., & Ibrahim, A. (2025). Influencer credibility and authenticity in the fight against misinformation. Feedback International Journal of Communication, 2(3), 205-215. https://doi.org/10.62569/fijc.v2i3.199
Valgarðsson, V., Jennings, W., Stoker, G., Bunting, H., Devine, D., McKay, L., & Klassen, A. (2025). A Crisis of Political Trust? Global Trends in Institutional Trust from 1958 to 2019. British Journal of Political Science, 55, e15. https://doi.org/10.1017/S0007123424000498
van der Meer, T. G., & Hameleers, M. (2022). I knew it, the world is falling apart! Combatting a confirmatory negativity bias in audiences’ news selection through news media literacy interventions. Digital Journalism, 10(3), 473-492. https://doi.org/10.1080/21670811.2021.2019074
Van Der Meer, T. G., Hameleers, M., & Ohme, J. (2023). Can fighting misinformation have a negative spillover effect? How warnings for the threat of misinformation can decrease general news credibility. Journalism Studies, 24(6), 803-823. https://doi.org/10.1080/1461670X.2023.2187652
Wack, M., Duskin, K., & Hodel, D. (2024). Political fact-checking efforts are constrained by deficiencies in coverage, speed, and reach. arXiv preprint arXiv:2412.13280.
Walter, N., Cohen, J., Holbert, R. L., & Morag, Y. (2020). Fact-checking: A meta-analysis of what works and for whom. Political Communication, 37(3), 350-375. https://doi.org/10.1080/10584609.2019.1668894
Westlund, O., Belair-Gagnon, V., Graves, L., Larsen, R., & Steensen, S. (2024). What is the problem with misinformation? Fact-checking as a sociotechnical and problem-solving practice. Journalism Studies, 25(8), 898-918. https://www.tandfonline.com/doi/pdf/10.1080/1461670X.2024.2357316
Willenborg, A., & Detmering, R. (2025). “I don’t think librarians can save us”: The material conditions of information literacy instruction in the misinformation age. College & Research Libraries, 86(4), 534. https://doi.org/10.5860/crl.86.4.535
Wineburg, S., & McGrew, S. (2019). Lateral reading and the nature of expertise: Reading less and learning more when evaluating digital information. Teachers College Record: The Voice of Scholarship in Education, 121(11), 1-40. https://doi.org/10.1177/016146811912101102
Wood, W., Labrecque, J. S., Lin, P. Y., & Rünger, D. (2014). Habits in dual process models. Dual process theories of the social mind, 1, 371-385.
York, C., & Scholl, R. M. (2015). Youth antecedents to news media consumption: Parent and youth newspaper use, news discussion, and long-term news behavior. Journalism & Mass Communication Quarterly, 92(3), 681-699. https://doi.org/10.1177/1077699015588191