June 02, 2026

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2026-05-28: If LLMs can write abstracts, what's our job? The Uncanny Valley and Gell-Mann Amnesia Effect in the ACM Digital Library



Michael L. Nelson

2026-05-28


I serve on the ACM Digital Libraries Board, and we are navigating a number of changes to the ACM Digital Library, which is arguably the primary asset of the ACM as a professional society and memory organization. A recent article (March 2026) by Jack Davidson and Wayne Graves provides a status update on the ACM's move to open access, which includes establishing "basic" and "premium" service levels. Although there are some questions regarding the long-term implications of moving to open access, I, and presumably all authors, welcome the ACM's bold strategy for ensuring that our content reaches the widest possible audience.


Jack and Wayne's article also addressed the DL's recent experimentation with AI/LLM enrichment of articles, specifically of their landing pages. Unfortunately, the experimentation got off on the wrong foot. Just before the 2025 holidays, the DL's article landing pages added AI-generated summaries as a sort of alternate or rival abstract. To make matters worse, these summaries were shown by default, and users had to select a tab to see the original, author-supplied abstracts. The figure below is an example taken from Dr. Casey Fiesler (CU Boulder), whose social media post about the summaries being shown by default instead of the abstracts gained a lot of traction.


AI-generated summary shown by default (2025-12-16) for https://doi.org/10.1145/3706598.3713322 


Fortunately, the expected behavior of showing the authors' abstract by default returned very quickly, and the AI-generated summary is now clearly marked as such, including the date that the summary was generated:



Author-generated abstract is now shown by default https://doi.org/10.1145/3706598.3713322 



The AI-generated summary is now clearly marked as such, and includes the date the summary was generated https://doi.org/10.1145/3706598.3713322 


First, let me be clear: showing the AI-generated summary by default instead of the authors' abstract was a terrible idea and was uniformly rebuked.  The DL board was not informed that this was going to happen, and I can't recall anyone on the DL board even suggesting it; perhaps it was just an oversight by an ACM staff member or engineer at Atypon. I don't recall exactly when the expected default behavior was restored, but it was soon after the author community complained. 


My original suggestion at the DL board meetings (echoed by Dr. Fiesler) was to provide wiki-style editing on the AI-generated summaries, possibly limited to logged-in authors (a possible premium feature?).  One can make a good argument for either opt-out or opt-in, but neither option adequately addresses the problem of the sizable back catalog of unreachable authors (JACM began in 1954).  


But what I find interesting is the level of author backlash against AI-generated summaries, at least as I observed it on social media. This is all anecdotal, and I realize people don't post about things they feel neutral or even mildly positive about because, let's face it: carping is a lot more fun. But Dr. Fiesler and the others in the thread are all reasonable people and aren't just trolling. I think there's something more fundamental happening. I think our collective reaction (revulsion?) to AI-generated summaries can be explained by adapting two phenomena: the Uncanny Valley and the Gell-Mann Amnesia Effect.


The Uncanny Valley is a hypothesis that our emotional response to depictions of humans (expressions, speech, movement, etc.) initially rises as the likeness becomes more human-like, and then takes a sharp dive as the likeness becomes nearly human-like but not quite. Basically, most cartoon characters, anthropomorphized animals, etc. are "cute", but the more realistic animated humans in movies like "The Polar Express" (2004) are just creepy.



The Uncanny Valley (Source: Wikipedia)


I propose that something similar happens with text. Most authors have no problem with AI tools enriching their work, for example: language translation, extracting citations, repairing/rewriting hyperlinks, suggesting related works, suggesting/assigning keywords and ACM CCS values, and any number of other services and derived content. But generating a summary that rivals the abstract? Yuck. No thanks. An error in citation parsing or CCS assignment? Meh, who cares; either ignore it or fix it, but no one takes to social media to complain. A subtle but detectable (if only by the author) error in a summary? That's glaring and viscerally wrong. And even if we can find no substantive errors, knowing the text is AI-generated, we will find fault with the phrasing, the structure, and various minutiae (cf. humans' negative attitudes toward replicants in Blade Runner). Extracting keywords is what computers do. Writing abstracts is what we do. If LLMs can write abstracts, what's our job?


Those assessments inevitably derive from us reviewing AI-generated summaries of our own work. Presumably, no one knows the material better than we do, so the best anyone or anything else can do is be "as good as", certainly not "better". We're writing for our peers, and we share a nuanced, high-bandwidth vocabulary that outsiders just can't appreciate. On the other hand, when we have to read articles outside our area of expertise, we often wonder: why are the authors so obtuse? Why can't "those people" just write plainly?


gocomics.com/calvinandhobbes 


This is the essence of the Gell-Mann Amnesia Effect, a term coined by Michael Crichton to describe the phenomenon that the more you know about a topic, the more likely you are to see the flaws in a third-party analysis of it, while being far less critical when that same third party covers a topic on which you are not an expert. Anyone who has been interviewed by the media has experienced this: the reporters inevitably butcher your hour-long exposition, provided in painstaking detail and covering all the nuances, edge cases, historical review, and possible future directions, reducing it to a minute or less of decontextualized soundbites. But that news outlet suddenly becomes a trusted and valuable source when it covers a topic outside your expertise.


dilbert.com


I suspect the Gell-Mann Amnesia Effect applies to AI-generated summaries as well: they are an abomination when applied to my work, but a useful de-jargoning tool for exploring unfamiliar or even adjacent sub-fields. This even suggests that there should be multiple AI-generated summaries, aimed at different audiences (e.g., lay person, high school, undergraduate, researcher). In fact, the rival abstract in Dr. Fiesler's example might be the least useful summary, precisely because it does rival the author's abstract. But writing for audiences other than our own is a different skill set: writing for my fellow researchers at JCDL, Hypertext, Web Science, etc. is what I do, but writing for high schoolers is not. Casting my work into something appropriate for high schoolers would be a good use of LLMs, and simplifications (if not outright errors) are to be expected.


In summary, I think it's natural to feel revulsion when LLMs are used to rival our work: it falls into the textual uncanny valley in a way that other generative works, such as translation, do not (at least not currently). But at the same time, and consistent with the Gell-Mann Amnesia Effect, our harshest judgment of AI-generated summaries is reserved for areas in which we are experts, and our assessment of AI-generated summaries improves as we apply them to areas further from our own.


With that in mind, it would make sense for the ACM DL to enable wiki-style editing on summaries, move away from the model of a single summary that rivals the author's abstract in length and complexity, and introduce multiple summaries, tailored to audience and intended purpose. 



–Michael 


2026-05-29 Update: I was chatting with Martin Klein, and he informed me that in late 2023 bioRxiv introduced on-demand summaries at variable reading levels. bioRxiv is far from my field, so I'm not completely clear on whether this is a production service or just a prototype. For example, this recently published preprint doesn't show the option for AI-generated summaries:


 

Clicking on the "Automated Services" tab for the recently published https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1


…shows "There are no automated services for this paper."  


However, I was able to find this preprint from a year ago that does have that option available:


The "Automated Services" option is active for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 


Clicking it brings up the default AI-generated summary, which is for the "General" audience:


 

The "General" AI-generated summary for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 

The "Expert" AI-generated summary for https://www.biorxiv.org/content/10.1101/2025.05.23.655690v1 


Are these good summaries? I guess so, although I'm not sure what else to evaluate them against. I don't know the first thing about proteomics, so the "General" summary is certainly the most accessible to me. The "Expert" summary is more detailed than the "General" summary, but still more accessible to me than the authors' abstract. That's not a surprise because 1) I haven't studied biology or chemistry since high school, some 40 (!) years ago, so Schär et al. aren't writing for me, and 2) the summaries are both about half the length of the authors' abstract. I saved all three into separate files:


% wc -w bio-*txt | grep -v total
     219 bio-abs.txt
     107 bio-expert.txt
      88 bio-general.txt
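As a quick check on "about half the length", here is a minimal Python calculation of each summary's length as a fraction of the authors' abstract. It uses only the `wc -w` counts shown above; the `length_ratio` helper is just an illustrative name, not part of any tool mentioned here.

```python
# Word counts as reported by `wc -w` above.
counts = {"bio-abs.txt": 219, "bio-expert.txt": 107, "bio-general.txt": 88}


def length_ratio(summary_words: int, abstract_words: int) -> float:
    """Return a summary's length as a fraction of the abstract's length."""
    return summary_words / abstract_words


abstract = counts["bio-abs.txt"]
for name in ("bio-expert.txt", "bio-general.txt"):
    print(f"{name}: {length_ratio(counts[name], abstract):.2f} of the abstract")
```

This prints 0.49 for the "Expert" summary and 0.40 for the "General" one, so both land at roughly 40% to 50% of the abstract's length.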


Two hundred words is a good target for abstracts. I'm guessing the prompts for the AI-generated summaries had a target of about 100 words, so by design even the "Expert" summary will not rival the authors' abstract (though metadata and wiki-style editing would be nice). At the bottom of the "Automated Services" tab is a link to "Explore Further on ScienceCast":


The target of the "Explore Further on ScienceCast" link https://sciencecast.org/casts/jpdm4k710oet 



I don't have an account (yet) on ScienceCast, so that's the end of my exploration for now.  But there's clearly a bigger AI↔paper ecosystem to explore, for both me personally and the ACM DL.  


–Michael 


2026-06-02 Update: In another chat, Martin Klein pointed me to the institutional repository at Niigata University, which he had just discovered. It does not have a native English interface, so all of the translations shown below are via Chrome and thus a little clunky. When you first visit the repository, it asks you to choose a persona or level from three choices: "adult", "junior and senior High School students", and "Elementary school student".


Choosing a persona when visiting https://repolab.lib.niigata-u.ac.jp/ 


Selecting a persona brings up the search page (with the persona changeable via a dropdown menu in the upper right-hand side):


Search page for https://repolab.lib.niigata-u.ac.jp/ 


I did a search for "web archiving".  The hits are not especially relevant (perhaps no one at Niigata is active in the field), but they are sufficient to demonstrate the personas.

Result #1 in the SERP for https://repolab.lib.niigata-u.ac.jp/ 


Clicking on "View AI explanation" brings up three tabs corresponding to the three personas introduced previously:

AI Explanation for Middle and High School Students https://repolab.lib.niigata-u.ac.jp/records/record-2000416/  


AI Explanation for Adults https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 



AI Explanation for Elementary School Students https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 


Chrome's translation of the explanation for elementary school students is not smooth, but I'm guessing that's an issue with Chrome and not the LLM that Niigata is using; presumably there is less training data for translating "children's" Japanese?


The landing page in Niigata's institutional repository does have the regrettable "embedded PDF" interface, and it lists a truncated "AI Explanation" above the "Summary by the author" (to be fair, calling it "summary by the author" instead of "abstract" may be an artifact of the translation).

Top of the landing page for https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 


The bottom of the landing page for https://repolab.lib.niigata-u.ac.jp/records/record-2000416/ 



It is a little hard to evaluate this three-level approach, since there's the added dimension of language translation.  But it feels like an interesting application of LLMs, and aside from being listed at the top of the SERP, it does not seem to be in competition with the authors' abstract.  


Note that the landing page displayed above is likely an experimental and/or local UI since it is hosted at niigata-u.ac.jp, and it is very different from the more conventional-looking landing page for the associated handle, which resolves to nii.ac.jp:


The handle http://hdl.handle.net/10191/0002000416 resolves to https://niigata-u.repo.nii.ac.jp/records/2000416  



I appreciate all of Martin's suggestions and pointers, and welcome more from other readers.


–Michael



*Apologies for including Dilbert, but the options for Gell-Mann Amnesia Effect cartoons are limited. 



by Michael L. Nelson (noreply@blogger.com) at June 02, 2026 10:01 PM

David Rosenthal

AI's PR Problem

J.P. Morgan hits photographer with cane
This is just a brief post to explain to my old boss, Eric Schmidt, why he and his ilk are getting booed at college commencements, and why laws against data centers are getting passed. The explanation is below the fold.

Let us start from an under-appreciated fact. Paul Campos reports that:
The college wage premium, that is, the increased earnings associated with having a college degree as opposed to only being a high school graduate, hasn’t changed at all in the past 25 years, because median real wages have been flat as a pancake for everybody, no matter what their formal education level, for the past quarter century.
But:
I wonder what’s happened to capital over this time? Value of S & P 500, inflation-adjusted, 1/2000 to 9/2025 (same period as the wage data):

2000: $1,394

2025: $6,688
On average, for more than the students' entire lives, stock-owners like Schmidt and (to a much lesser extent) I have stolen every last drop of the productivity increase of US workers at every age and education level. (See the actual numbers in the appendix)

Now, the perpetrators of this theft are telling their victims, the students and the public at large, that whether they like it or not they will be subjected to AI because that will make the perpetrators even richer. The victims have been informed that this new technology will:
Nothing better illustrates the contempt of the Epstein class for the proletariat than that these oligarchs would expect the graduating class to enthusiastically accept this prospect.

Appendix

Here are Paul Campos' actual numbers, showing 25 years of flat wages and no increase in the college wage premium while the value of capital has skyrocketed:
I was fooling around with FRED this morning, as one does, and here are some stats: (The FRED numbers are presented in nominal dollars; I’ve converted them to CPI-adjusted dollars).

Median usual weekly earnings of workers with a high school degree only:

2000: $968

2025: $980

Median usual weekly earnings of workers with a bachelor degree only:

2000: $1,587

2025: $1,580
...
Median usual weekly earnings of people with a bachelor’s degree or higher:

2000: $1,705

2025: $1,747
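The contrast is easier to see as percentage changes. This minimal Python sketch computes them from the figures quoted above; it takes those figures as given (already CPI-adjusted, as the quote states) and adds no data of its own. The `pct_change` and `wage_premium` helpers are illustrative names.

```python
# Figures quoted above: (2000 value, 2025 value), inflation-adjusted dollars.
series = {
    "High school only (median weekly)": (968, 980),
    "Bachelor's only (median weekly)": (1587, 1580),
    "Bachelor's or higher (median weekly)": (1705, 1747),
    "S&P 500 (inflation-adjusted)": (1394, 6688),
}


def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100


def wage_premium(bachelor: float, high_school: float) -> float:
    """Bachelor's-only earnings as a multiple of high-school-only earnings."""
    return bachelor / high_school


for name, (y2000, y2025) in series.items():
    print(f"{name}: {pct_change(y2000, y2025):+.1f}%")

print(f"College wage premium 2000: {wage_premium(1587, 968):.2f}x")
print(f"College wage premium 2025: {wage_premium(1580, 980):.2f}x")
```

Run as-is, it reports changes of +1.2%, -0.4%, and +2.5% for the three earnings series against +379.8% for the S&P 500, and a college wage premium of 1.64x in 2000 versus 1.61x in 2025.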
Here is a short list of YouTube videos on this topic: As a boomer, I think this post might be the exception that proves Ms. Baba's rule.

Note that every single one of the ads that I saw watching these videos in an incognito window was advertising an AI company! As are 49% of all the billboards in the Bay Area. Read the room, guys!

by David. (noreply@blogger.com) at June 02, 2026 03:00 PM

Harvard Library Innovation Lab

An Automated Data Monitoring Toolkit and the AI Benchmarking Exercise at the Public Data Project

This post is being shared on both the dataindex.us newsletter and the Library Innovation Lab Blog.


“Is data changing? Is it being disappeared? How do we know? How can we know?” This interrogative refrain rang through just about every conversation I had when, almost a year ago, I came to Harvard Law School Library to lead the Public Data Project. Thanks to the dataindex.us Data Checkup, a plan is in place to do this complicated but essential work. Through the careful scaffolding dataindex.us has constructed and the assiduous research of its staff, more than a dozen federal datasets have “health assessments” and the team continues to add to this list.

In October 2025, the Public Data Project partnered with dataindex.us to develop a data monitoring toolkit that could both work at scale and be user-driven. In addition to creating an automated tool that can process large numbers of datasets, we also want users to determine which datasets they monitor. Let’s face it, when it comes to federal data, one person’s byzantine, inscrutable dataset is another person’s trove of invaluable ground truth. The anecdotes of data use collected by essentialdata.us offer varied examples of the ways people benefit from federal datasets. The range of uses is a clear indication that people need to be able to monitor the data that matters to them.

At the Public Data Project, we are creating a toolkit that will enable users to detect and monitor changes to federal datasets over time. It will enable users to select a dataset and track changes within the data itself, as well as to automate the monitoring of external sources that indicate whether the data might be changing. Indicators of change to a given dataset range from somewhat obvious sources, like major news sites, to more obscure sources, like the U.S. Code. At present, our tool development has produced two components.

First, Binoc is a command line tool and library to generate changelogs for datasets that don’t have them.

Advertising card for L. Srisheim, optician (ca. 1840), depicting a man made out of optical equipment. Source: American Antiquarian Society.

Unlike generic diffing utilities intended to describe line-level differences in plain-text content such as source code or Markdown, Binoc aims to efficiently summarize changes in real-world datasets, including file additions and deletions, row-level updates, and schema alterations. Given a series of dataset snapshots captured at different points in time, Binoc detects what changed, expresses any changes as a minimal structured diff, and produces a human-readable summary. Binoc is currently in a collaborative design phase of development, with new features being added regularly. We welcome feedback from early adopters.
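Binoc's own interface isn't described here, so the following is only a minimal Python sketch of the kind of change detection the paragraph describes, not Binoc's implementation or API. It assumes two CSV snapshots of a single table that share a unique key column; file additions and deletions across a multi-file dataset are out of scope. The names `load_snapshot`, `diff_snapshots`, and `summarize` are hypothetical.

```python
import csv
import io


def load_snapshot(csv_text: str, key: str) -> tuple[list[str], dict[str, dict[str, str]]]:
    """Parse a CSV snapshot into (column names, rows indexed by the key column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    rows = {row[key]: row for row in reader}  # assumes the key is unique
    return columns, rows


def diff_snapshots(old_csv: str, new_csv: str, key: str) -> dict:
    """Return a minimal structured diff: schema changes plus row-level changes."""
    old_cols, old_rows = load_snapshot(old_csv, key)
    new_cols, new_rows = load_snapshot(new_csv, key)
    # Only columns present in both snapshots are compared value-by-value.
    shared_cols = [c for c in old_cols if c in new_cols]
    changed = {}
    for k in old_rows.keys() & new_rows.keys():
        fields = {c: (old_rows[k][c], new_rows[k][c])
                  for c in shared_cols if old_rows[k][c] != new_rows[k][c]}
        if fields:
            changed[k] = fields
    return {
        "columns_added": [c for c in new_cols if c not in old_cols],
        "columns_removed": [c for c in old_cols if c not in new_cols],
        "rows_added": sorted(new_rows.keys() - old_rows.keys()),
        "rows_removed": sorted(old_rows.keys() - new_rows.keys()),
        "rows_changed": dict(sorted(changed.items())),
    }


def summarize(diff: dict) -> str:
    """Render the structured diff as a short human-readable changelog."""
    lines = [f"Column added: {c}" for c in diff["columns_added"]]
    lines += [f"Column removed: {c}" for c in diff["columns_removed"]]
    lines += [f"Row added: {k}" for k in diff["rows_added"]]
    lines += [f"Row removed: {k}" for k in diff["rows_removed"]]
    for k, fields in diff["rows_changed"].items():
        for col, (before, after) in fields.items():
            lines.append(f"Row {k}: {col} changed from {before!r} to {after!r}")
    return "\n".join(lines) or "No changes."


old = "id,name,count\n1,alpha,10\n2,beta,20\n"
new = "id,name,count,source\n1,alpha,12,x\n3,gamma,5,y\n"
print(summarize(diff_snapshots(old, new, key="id")))
```

Separating the structured diff from its rendering mirrors the two outputs the paragraph mentions (a minimal structured diff and a human-readable summary); the structured form could be stored or compared across many snapshots, while the summary is for people. Keys are compared and sorted as strings.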

We have also begun the research for a second component of the data monitoring toolkit development.

Cast bronze USGS benchmark. Source: United States Geological Survey.

We have created an AI benchmarking exercise to compare and to evaluate how well AI can monitor data and assess its risk when considered next to the processes and conclusions of a careful researcher. The goals of the exercise are to:

We have conducted an initial test run of this exercise with a group of 10 information professionals. After introducing the participants to the dataindex.us rubric for assessing the risk level of a given dataset, we assigned each participant a dataset and asked them to evaluate it across three of the six risk dimensions outlined in the rubric. Each participant was assigned either the first three dimensions (Historical Data Availability, Future Data Availability, and Data Quality) or the latter three (Statutory Context, Staffing and Funding, and Policy). For the first hour, participants more or less worked alone, diligently researching a subject in which they lacked expertise but with clear guidelines for the kind of information they sought. Participants then opened ChatGPT and fed it prompts that we had scripted and tailored for each dataset. They then reflected on their findings, first in a form that asked them specific questions and then as a group, comparing their results with ChatGPT’s. Going through their three assessment dimensions, participants compared their conclusions to the AI’s, reflecting on what the AI missed, what they missed, and what parts of the rubric may have led to confusion.

This exercise gave us early insight into the potential and pitfalls of AI’s ability to assess data risk, as well as ways in which we might tweak both the exercise and the assessment rubric. The participants in this group were information professionals, not policy wonks, and we are eager to see how area specialists’ experience might lead to different outcomes in this exercise. In addition, we want to experiment with prompt engineering and give participants more leeway in their interaction with AI. In the next iteration of the exercise, we will analyze transcripts of each participant’s interactions with AI rather than asking individuals to respond in a form.

What we liked most about this exercise, however, were the collective reflections not just on AI, but on public data more generally. One participant described it as an “excellent empathy-building exercise” because, through the work, both alone and as a group, participants became aware of the importance of, and perils to, public data. They reflected on whether and how to translate their own empathetic experience to AI.

by Molly Hardy at June 02, 2026 01:00 PM

June 01, 2026

LibraryThing (Thingology)

June 2026 Early Reviewers Batch Is Live!

Win free books from the June 2026 batch of Early Reviewer titles! We’ve got 251 books this month, and a grand total of 3,098 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing/email address and make sure they’re correct.

» Request books here!

The deadline to request a copy is Thursday, June 25th at 6PM EDT.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the UK, the US, Canada, Australia, Germany, New Zealand, Ireland, Malta, Italy, Latvia and more. Make sure to check the message on each book to see if it can be sent to your country.

Employee No. 9The Weight of AngelsFood Is Medicine: Healing Our Bodies, Nourishing Our Minds, and Transforming Our Food SystemA Voice Like Mine: A MemoirLast Seen in Sea IsleAmarisa's Cooking Pot: Tales of Life in All Its WondersMount MiseryFunjeepups: A Beautiful SongFunjeepups: A Star WishGood Families Don'tMy Best Friend Is a Butternut SquashToad on the GoThe Wise PickleWhale, That Was UnexpectedConfessions of the Green River Killer: A True Story of Manipulation, Madness, and a Search for JusticeThe Roman Holiday RuleThe Crazy TestI Know ThingsJonathan's JournalEvery Nanny Before MeAntitherapiesRedworkThe Rise and Fall of the Republic of West Delphi: A MemoirNo One Will Ever Hear You: StoriesA Day in the Life: An NPC LitRPG AnthologyThe Set of All SpiesThe Set of All SpiesSwitching SidesBubbles, Roses, and RumpIs It Poop?: A Guessing Game With Poop and Animals That Look Like PoopRuptured: Jewish Woman in Australia Reflect on Life Post - October 7DiodeDestiny or DefeatErkül Bwaroo, Elf Detective - At Your ServiceWoman Outside the CityShoulders of GiantsThe Very Unremarkable Life of Mrs. 
Etty BloomParallel CircuitsThe Durbar's ReckoningPadani: A Family StoryA Misfit's Guide to Magic and MayhemQuo Vadis, Jane Mitchell?Wave 2: The SequelEden at DawnThe Necklace of Seven SoulsLong WeekendThe Chumash of Heroes: Bereshit (Genesis)The Girl Who Watched the Trains DepartWriting Memoir in Flashes: Creative Ways to Tell Your True Stories, One Memory at a TimeExalted ObjectsCircus of the Vanishing ElephantTessa's LandingDispatches from Grief: A Mother's Journey Through the UnthinkableThe Curse of Teed HouseHeracles: The Hydra of LernaPhantom of the GalleriaThe Quaint Convictions of Kit BennetThe Timeless Teachings of Conny Mendez: An Essential Collection of Metaphysics in Plain EnglishOcean Animals: An Animal Guessing Game Book Full of Fun and FactsLove in the AbstractThe Only CatchBigger Than VersaceBrandedBareThe Shattered MirrorA Voice of WrathLike MagicDevoted to His SwordWoman Afraid of WaterHunter's BloodBrutal Country: Ten Short StoriesFinding My Way Through Cancer: A Gentle Journey Through Early-Stage Lung CancerWhere Water Meets Sky: An Isekai RomantasyScales of DestinyChange By Doing Nothing: The Hidden Science of Self-Sabotage and Why You Can Change Only When You Stop Forcing ItThe WindowZombies of the Upper East SideEncoded Minds: A Biological ThrillerNot All HandsirlGood Grooming and a Healthy Respect for AuthorityThe Ash Cycle: How the Trypillians Defeated Urbanization Through FireBrilliant Life: The 5 Science-Backed Pillars to Boost Energy, Improve Sleep, and Build Healthy Habits That LastOnly Breath and ShadowTo The Moon and BackRest For The Weary: Biblical Support for Autistic Burnout and RecoveryTakes One to Know OneFlash PointThe Relief of Not Knowing: Stop Overthinking Decisions Start Trusting YourselfThe Fifth SilenceAfter the Altar: Living the Promises of the Wedding DayMaillane: That Morning Sun Comes Rising UpDeath in the End ZoneThe Devil of Tarsyn ForestThe Agentic CMO: How Artificial Intelligence Is Rewriting the Rules of Marketing 
LeadershipTech Equity: Freedom Through Enabling Technology: A Dream Officer's Playbook for Tech Equity in Disability & Aging ServicesBio-Logic Herbalism: Evidence-Based Natural Herbal Remedies & Home Apothecary Protocols for the Whole FamilyHome Apothecary for Healthy Lifestyle: The Practical DIY Herbal Guide for Household Use, Immune Support, and Natural First AidGoodbyeThe Journey Beyond the MapI Know When You're AsleepChronicle of the Stellar BridgesThe Legend of KaaliThe Family LiarBlue Year: A Literary Lesbian Erotic NovelBlack Hole Guns For HireHow to Show Up for Your Life: Start Living the Life You were MEANT to LiveParents Praying… Heaven Answers: When God Hears the Cry of Parents and IntercessorsTinkerHow to Be Enough: Your Worth Isn't Up for NegotiationThat McKenzie GirlMurder on the Shuffleboard CourtThe Light That Devours the SkyFreedom Quest: A Love StoryA Touch of Magic & DesireThe Last Sethu QueenThe Ever-Changing SandsA Spark of Earth and FlameHealing Your Inner Bestie: A Practical Guide to Master Self-Love and Overcome Self-DoubtLebanon: A Country for No One & EveryoneOutsphereBuduneliEvery Last BoneIntrospection: Exploring the Racialized Politics and Conception of Ideal-Blackness Within African American CultureBetween Home and Silence: A Memoir of Family, Silence, Work, Migration, and Survival Between Two WorldsHold On to MeWhere the Willow BendsThe Redux of Sam MurdochArcadian AlcoveEveryone Kept Quiet So I Did Too: Tales of a Reluctant SoldierSire, Oleander Isn't Dead! 
(Yet)The AwakeningImago Nine: The Popstar ApocalypseBlood ForgedThe CriticWhispers on FlowersDead ExitDead ExitNursing FlagstopDon't Believe a WordUnshaken: A 30-Day Anxiety Management Workbook for High-Functioning MenThe RanchThe Soufflé Also RisesSketchGates of LoryndasTY, Thel: Films of Thelma RitterA Vamp RevampedThe Supreme LunaThe Three Creature CurseWhat Hears YouAltars in the Ruins: Twenty-Five Sermons from the Ruins Redeemed by GraceCathedral of Scars: Fifteen Sermons from the Ruins That Became SanctuaryWaterspoutThe 1-Thing Way: A Sustainable Path to Reach Your Goals Without BurnoutReclaim Your Body for Life: The 1-Thing Way to Sustainable Fat Loss, Metabolic Health, and EnergyRadical Son Back to RootsThe Man with the Blue Suede ShoesThe Divine Feminine ScentThe Hollow Gospel: Scripture of WoundsConspiracy In TimeA Small Tree in a Texas Hurricane: A MemoirWhat No One Tells You About Caring for an Aging Parent: Real-Life Lessons, Emotional Survival, and Practical Wisdom From 14 Years as My Mother’s CaregiverA Perfectly Normal Childhood (and other lies I tell myself)Sterne: ValerieWho Is Singing?The Devil You KnowBuying Wealth With Money: A Workbook On LegacyH. A. L. T. 
Own Your Emotions: A Workbook on Self-ControlTeen Slang for Parents: What Your Kids Are Actually SayingMy Voice: A Guide to Mastering Life, Truth, and PurposeThe Park RaceA Tale of Two Chinas: A Fifteen-Year Odyssey Through China's Cultural HeartlandsAngel's SalvationDiddly Duggins and the Great Memory MisplacementA Devil AmidstHarbinger of DarknessThe Next Hundred Years404 Love Not Found: The Story of Harper and JonahThe Resilience of Red ThreadMurder At The Radio StationThe Summer That Changed UsSultana: The Last Road Home: The Titanic of the MississippiThe FalsehoodThe Thirteenth DreamThe One24 Hours to ForgetPelagic ShoresAre We Friends Yet?: How to Deepen Your Relationships and Create the Community You NeedMAX and the Beanstalk!HeliumThe Dying TideA Selfless Marriage: How Mutual Service Rebuilds Love, Respect, and Emotional ConnectionAI Adventures with Maya and ByteThe Quiet Night HugWhere Does It Live?: Learn Where Emotions Live in Your Body and What to Do About ThemOrdinary SoulsCardboard SpaceshipThis Sea WithinThe Statistically Unlikely ReunionThe Statistically Unlikely ReboundTicket to MarsThe Moonscorn MandateAchieve Financial Peace Budget Planner: 12 Month Practical Debt Workbook for Beginners in Large SizeThe Shipton PrincipleCrown and ChronosWalking Along the Ancient Tokaido Road: A Pilgrim's Path: Adventures and Transformations (Vol. 1: Departure)Walking Along the Ancient Tokaido Road: A Pilgrim's Path: Adventures and Transformations (Vol. 
2: Insight and Memories)My Mother Said My NameWhispers on FlowersCome, Play with Me: Writer's Camp 3rd AnthologyReflections from a ShoeboxHaiku Redo: A Collection of Haiku, Companion Pieces, and Space for Your OwnHow to Conquer The BillionairesKink-Affirming Therapy Worksheets: A Clinician’s Guide to Sex-Positive and Consensually Non-Monogamous IntegrationIn the Flesh: Why Manifestation Fails in Your Head, and Works in Your BodyCatamorphosisThe Fortress of UsLeo and the Dragon of Sound: A Journey Through the Kingdom of NoiseNotes on HopeKevin The Werewolf: Shattered MoonThe 7-Day Dopamine Detox: A Beginner's Guide to Unplugging, Resetting, and Not Falling Apart OnlineAuthor and Finisher Volume IBecause I Deserve It: What Chronic Illness Taught Me about Finding My Voice in the Healthcare SystemCalling Out the Shadows: A Father's Stand Against the CurrentKiera and Lamby: TokyoCalling Out the Shadows: A Father's Stand Against the CurrentOdysseyThe Brink of Becoming: Designing a Future Beyond Zionism and Cultural ProgrammingThe Last Summer on Hawthorne StreetEternalWe Never Signed the ContractWildfire & The Sun PrinceShadow & The Air TricksterThe Cave of Past and PresentMary FalconDeadly GroundThe Vow RewrittenLost HeroRepatriated: Sons of the SoilSame IceOur Lady of the ArtilectsFart, Laugh, and Be Happy: Inspiring Bathroom Humor Stories to Uplift Your SpiritThe Great Bathroom Humor Cover-Up: An Investigation into the Lost History of Bodily Function ComedyThe Coin of ForeverA Literary Offering: Observations & CommentaryStriking JusticeThe Question of When: A Practical Guide to Knowing When It's Time for Assisted Living, Memory Care, or Skilled NursingThe Protector and the AnnihilationCornelius & The Sneak Goose AttackBlind ItemThe Echo She Left Behind

Thanks to all the publishers participating this month!

ALIO Publishing Group
Attwater Books
Autumn House Press
Bricolage Lit
Brother Mockingbird
City of Words
City Owl Press
Crooked Lane Books
Entrada Publishing
Flat Sole Studio
Gefen Publishing House
Haven
Henry Holt and Company
History Through Fiction
HTF Publishing
Inferno Books
Infinite Books
Inkd Publishing LLC
LaPuerta Books and Media
Learning Spark Educational Publishing
NeoParadoxa
Plexus Publishing, Inc.
Pocketbook Press
PublishNation
Restless Books
Riverfolk Books
RIZE Press
Rootstock Publishing
Running Wild Press, LLC
Somewhat Grumpy Press
Thinking Ink Press
Tundra Books
Tuxtails Publishing, LLC
Type Eighteen Books
University of Nevada Press
University of New Mexico Press
UpLit Press
WorthyKids

by Abigail Adams at June 01, 2026 06:15 PM

Digital Library Federation

DLF Digest: June 2026

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here.

Happy June, DLF community! Thanks to everyone who participated in Community Voting for the 2026 Virtual DLF Forum. We appreciate your input as we work with the Forum Planning Committee to build this year’s program.

Look out for updates this month: the program release, registration opening, and Digital Storytelling Fellows applications. We’re excited to share what’s next!

Warmly,

-Shaneé

This month’s news

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus conferences and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org.

Get Involved / Connect with Us

Below are some ways to stay connected with the digital library community and us: 

The post DLF Digest: June 2026 appeared first on DLF.

by swillis at June 01, 2026 12:00 PM

Ed Summers

Inked clouds


June 01, 2026 04:00 AM

Xe Iaso

"No way to prevent this" say users of only package manager where this regularly happens

In the hours following the news that Redhat Insights' JavaScript packages fell victim to a supply chain attack via NPM, developers and systems administrators scrambled to ensure all of their projects were unaffected by a supply chain attack that steals credentials for AWS, GCP, Azure, Kubernetes, HashiCorp Vault, npm, and CircleCI and then self-propagates via said stolen npm credentials and the bypass_2fa setting. The attack establishes persistence via Claude Code hooks and VS Code task injection. If you have installed the affected package, reprovision your development hardware. This is due to the affected dependencies being distributed via NPM, the only package manager where these supply-chain attacks regularly happen.

"This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Lady Eulah Howell, echoing statements expressed by hundreds of thousands of programmers who use the only package manager where 90% of the world's supply-chain attacks have occurred in the last decade, and whose projects are 20 times more likely to fall victim to supply chain attacks. "It's a shame, but what can we do? There really isn't anything we can do to prevent supply-chain attacks from happening if the maintainers don't want to secure access to their accounts in a robust manner."

At press time, users of the only package manager in the world where these vulnerabilities have regularly happened once or twice per week for the last year were referring to themselves and their situation as "helpless".

For more information, please see upstream documentation published by Redhat Insights' JavaScript packages at the following link: redhat-javascript-clients-06-2026.

June 01, 2026 12:00 AM

Library | Ruth Kitchin Tillman

Systems Life: Navigating the Distributed Database

This post is the first in a series in which I write about experiences or specific challenges from my day-to-day work. Planned posts include descriptions of a bug and how it impacted my coworkers, how I wrote a script to parse log data… I’m hoping that these will be interesting for other librarians who work in entirely different areas, for my colleagues who are solving different problems on different systems (or maybe eventually the same one after we migrate), and for those who are thinking about doing this kind of work in the future.

When we talk about the ILS or LSP, it can sound like we’re talking about a single system. And we are, some of the time. But just like our permissions shape what we can see and do, the ways we access the system and its data may lead to entirely different experiences. More importantly, if you don’t know how different tools and even databases work, you may end up with inaccurate results or may not realize that something is possible.

For example, our Sirsi ILS and reporting system(s) consist of two separate databases. These databases can be accessed one way by most folks (two ways for people using a BLUEcloud module), two to three ways by some, four ways if you’re special, and five ways if you’re one of two people.

Diffusion of Databases

The Sirsi Symphony Database, fka Unicorn,[1] underlies the whole thing. This Oracle database is the ultimate database of record. If we load MARC, it ends up in the Symphony database. If we place orders, they become entries across Symphony tables. If we loan materials, it triggers a series of updates in the Symphony database.

BLUEcloud Analytics runs off a separate database, also Oracle.[2] This separation is common and appropriate. Alma also uses a separate Oracle database and FOLIO has the option of Metadb built with PostgreSQL. The analytics databases don’t contain live data. Instead, they’re updated regularly overnight, based on things that have occurred in the primary database. Change a title? It’ll show up in analytics tomorrow.[3] Check out a book? That transaction will show up in circ stats tomorrow.

This is an appropriate choice for three reasons:

  1. It’s a bad idea to run large analytical queries on production. Plus, static indexes are much more efficient to search.
  2. The analytics system has no real demand overnight, so its server can do a full reindex before running any scheduled jobs.
  3. The analytics database can be designed differently.

Following that last point, the analytics database isn’t just a snapshot of production. It has a fundamentally different design. It anonymizes circulation transactions, but it also builds completely different indexes from the ones we need for daily work. For example, it indexes circulation data by hour, day, month, and year as well as by circulation desk. Sometimes we want big numbers. Sometimes we want to see which desks get the most traffic. Those aren’t the kinds of searches we need to do in day-to-day work. It indexes MARC as fields and subfields, including invalid ones like λ.
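The split between a production database and a nightly-rebuilt analytics database can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and is not how Symphony or BLUEcloud Analytics are actually built; the table names, columns, and the make_production and nightly_refresh functions are hypothetical. It shows two of the design differences described above: the analytics copy never receives patron identifiers, and it is pre-aggregated and indexed by day, hour, and desk so that "big number" questions never touch production.

```python
import sqlite3


def make_production(conn: sqlite3.Connection) -> None:
    """Hypothetical production table: one row per checkout, patron attached."""
    conn.execute(
        "CREATE TABLE circ_transactions ("
        " item_id TEXT, patron_id TEXT, desk TEXT, checked_out_at TEXT)"
    )


def nightly_refresh(prod: sqlite3.Connection, analytics: sqlite3.Connection) -> None:
    """Rebuild the hypothetical analytics table from scratch, as an overnight job might."""
    analytics.execute("DROP TABLE IF EXISTS circ_stats")  # also drops its indexes
    analytics.execute(
        "CREATE TABLE circ_stats (day TEXT, hour INTEGER, desk TEXT, checkouts INTEGER)"
    )
    # Aggregate on the production side and never select patron_id:
    # the analytics copy is anonymous because the identifier is never copied over.
    rows = prod.execute(
        """SELECT date(checked_out_at) AS day,
                  CAST(strftime('%H', checked_out_at) AS INTEGER) AS hour,
                  desk,
                  COUNT(*) AS checkouts
           FROM circ_transactions
           GROUP BY day, hour, desk"""
    ).fetchall()
    analytics.executemany("INSERT INTO circ_stats VALUES (?, ?, ?, ?)", rows)
    # Index the dimensions people report on, not the ones needed for daily work.
    analytics.execute("CREATE INDEX idx_circ_desk ON circ_stats (desk)")
    analytics.execute("CREATE INDEX idx_circ_day_hour ON circ_stats (day, hour)")
    analytics.commit()


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    analytics = sqlite3.connect(":memory:")
    make_production(prod)
    prod.executemany(
        "INSERT INTO circ_transactions VALUES (?, ?, ?, ?)",
        [
            ("i1", "p1", "Main", "2026-05-01 14:05:00"),
            ("i2", "p2", "Main", "2026-05-01 14:40:00"),
            ("i3", "p1", "Science", "2026-05-01 09:15:00"),
        ],
    )
    nightly_refresh(prod, analytics)
    print(analytics.execute(
        "SELECT desk, SUM(checkouts) FROM circ_stats GROUP BY desk ORDER BY desk"
    ).fetchall())  # [('Main', 2), ('Science', 1)]
```

Rebuilding from scratch each night, rather than updating in place, is one reason an analytics copy like this lags production by a day; it also means a failed refresh can simply be rerun.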

Accessing the Databases

Most of my coworkers only access Symphony using one tool: Workflows. A few also use BLUEcloud Circ.[4] Using the client, they look for records, update them, perform transactions, etc. We import single MARC records using Workflows wizards. We import batches of MARC records using Workflows reporting (and FTP). Global item updates are done in Workflows. The Workflows reporting module can be used to load, transform, or extract data, history, or (some) statistics.

Next, we have BLUEcloud Analytics. A much smaller set of people (but still plenty) have rights here. As described above, Analytics is a completely separate database. It’s also designed in a way that’s more oriented toward statistical work. Folks use it to extract shelf lists, acquisitions data, spreadsheets of MARC subfields, etc. The indexes are enormous and joined queries can take some time to run (and you can only run joined queries which are supported by the system), but you can get a lot of data and can’t accidentally bring down production.

About four years ago, we got access to Data Control. This is probably my favorite Sirsi product.[5] Unlike Analytics, Data Control gives you the power to query or even update the Symphony database itself. That means it doesn’t have some things that are in Analytics. You can’t see an item’s transaction history, for example, just its current data.[6] Even fewer people have access to this; most use it on our Stage server, and just a couple of us are allowed to run batch updates to production.[7]

seltools is like Data Control for the command line. More properly, Data Control is an interface that lets ordinary humans use seltools with enough scaffolding not to mess quite as many things up. seltools can do even more and can do it very quickly. It is a sysadmin tool and only two people here have rights to use it. It can do extraordinary work in seconds and could cause irreparable damage (or at least, damage that requires restoring from backup). AFAIK it dates back to the launch of Unicorn.

How I Access the Data

I have rights in Workflows, BLUEcloud, Analytics, and Data Control. I tend to use them as a kind of grab bag and often chain Analytics and Data Control in my work, sometimes performing interim steps with Python or OpenRefine.

Because Analytics isn’t querying live data, it’s a much better place to do initial MARC searches. If I want to find every record with a 699, for example, Analytics is the place to do that fast. Or I could look for every 100 or 700 with a subfield “e” or search for a particular piece of text in one or more fields.

But in terms of output, Analytics leaves a lot to be desired for MARC work. It shows a field’s subfields like a table. For example:

Field  Subfield  Data
264    a         New York
       b         Grosset & Dunlap
       c         [1972]

That’s fine if I only want to facet down to the subfield b in each row, but if I want to deal with the MARC data as a field it becomes a problem.

In the Analytics reports I use, it’s easy to add the bib key to a report if it wasn’t already in there. Before we got Data Control, my next step would be to actually switch to something like Z39.50 and download all the bibs manually, hoping I got everything (because our keys are not always in the 001, it’s a long story). I then had to do a delimited export in MarcEdit or write a pymarc script to get the fields I wanted.

Now, if I want to see a set of fields from the record, I simply upload that same set of bibkeys in Data Control.[8] I structure my query to include the tables I want and output the fields I need from each table. I can then export them into a much nicer spreadsheet with the MARC field (and indicators, if desired) printed the way it appears in the original MARC. I can also export the entire set of records as MARC.

264
|aNew York :|bGrosset & Dunlap,|c[1972]
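Going from the subfield-per-row layout to the field-per-line layout is mostly a grouping problem. A minimal sketch, assuming a hypothetical export in which each row is (bibkey, tag, occurrence, subfield code, data); neither that row layout nor the rows_to_fields function comes from Analytics or Data Control. The rebuilt field can only be as complete as the exported data: the Analytics table above, for instance, shows no ISBD punctuation, so neither will the output.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# One row per subfield, as in a table-style report (hypothetical layout):
# (bibkey, tag, occurrence, subfield_code, data). The occurrence number keeps
# repeated fields (say, two 700s on one record) apart.
Row = Tuple[str, str, int, str, str]


def field_key(row: Row) -> Tuple[str, str, int]:
    return row[0], row[1], row[2]


def rows_to_fields(rows: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Collapse subfield rows into (bibkey, tag, "|a...|b...") tuples."""
    # groupby only merges *adjacent* rows, so sort by field first; Python's
    # sort is stable, so subfields keep their original order within a field.
    ordered = sorted(rows, key=field_key)
    fields = []
    for (bibkey, tag, _occurrence), group in groupby(ordered, key=field_key):
        text = "".join(f"|{code}{data}" for *_, code, data in group)
        fields.append((bibkey, tag, text))
    return fields


if __name__ == "__main__":
    rows = [
        ("b1", "264", 1, "a", "New York"),
        ("b1", "264", 1, "b", "Grosset & Dunlap"),
        ("b1", "264", 1, "c", "[1972]"),
    ]
    for bibkey, tag, text in rows_to_fields(rows):
        print(bibkey, tag, text)  # b1 264 |aNew York|bGrosset & Dunlap|c[1972]
```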

An Example Update

But, even better, I have the rights to update the data. In most cases, I can even use regular expressions. For example, when we added a new ILLiad request placement module to our MyAccount app, we grabbed the 020 (ISBN field) straight from the Symphony API.[9] Unfortunately, about 600,000 of our 020 fields followed the pre-2013 structure, when qualifying information was still included in the subfield a. In 2013, subfield q was introduced to handle things like “(paperback)”. This unexpected data was messing with ILLiad’s automated processes. We could’ve changed the script, but it made more sense to fix the actual data, since we now had the tools.

First, I ran an Analytics query to find all records where the 020a contained “(”, “)”, or any letter other than x. I exported the data, extracted the bibkey column, and then broke it into batches of 25,000 bibkeys.
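Splitting an exported key list into fixed-size batches like this is easy to script. A minimal sketch, assuming the keys were exported one per line to a text file; the function names and the output file naming are hypothetical. It also drops blank lines and repeated keys, on the assumption that a record with more than one matching 020 could show up more than once in a field-level export.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def unique_keys(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blanks, and drop repeated keys, keeping first-seen order."""
    seen = set()
    keys = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def batches(keys: List[str], size: int = 25_000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def write_batches(src: Path, out_dir: Path, size: int = 25_000) -> List[Path]:
    """Write one file per batch; the bibkeys_NNN.txt naming is hypothetical."""
    keys = unique_keys(src.read_text(encoding="utf-8").splitlines())
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(batches(keys, size), start=1):
        path = out_dir / f"bibkeys_{n:03d}.txt"
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

For example, write_batches(Path("020_keys.txt"), Path("batches")), with a made-up input file name, would write batches/bibkeys_001.txt, batches/bibkeys_002.txt, and so on, each holding at most 25,000 keys.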

I spent a few weeks working on our stage server to develop the appropriate regex-based find and replace patterns to move qualifying data into a subfield q. I had to handle various edge cases: no parentheticals, only one half of the parenthetical, etc. Once I felt confident, I ran a batch of about 5000 on stage and QA’d my results thoroughly. I then spent the next month running batches in production. I limited batch sizes and chose days when we didn’t have other jobs which would trigger big reindexes (you can only do so many jobs in a night or the reindex will take forever and throw off all the other cron jobs).
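The heart of that cleanup is a find-and-replace that separates the ISBN from its qualifier. Below is a minimal Python sketch of the idea, not the actual Data Control patterns (those aren't given here). It assumes $a holds an ISBN optionally followed by qualifying text, handles the edge cases mentioned above (a qualifier with no parentheses, or with only one of them), and returns None for anything it can't parse confidently, such as two parenthetical qualifiers. It also assumes the qualifier goes into $q without its enclosing parentheses; check that against local practice before adopting it. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# An ISBN (digits, optional hyphens, optional final X check digit), then optional
# qualifying text that may be fully, partly, or not at all wrapped in parentheses.
ISBN_AND_QUALIFIER = re.compile(
    r"^(?P<isbn>[0-9][0-9-]*[0-9Xx])\s*\(?(?P<qual>[^()]*?)\)?\s*$"
)


def split_020a(value: str) -> Optional[Tuple[str, str]]:
    """Split an old-style 020 $a into (isbn, qualifier); None if it doesn't fit."""
    m = ISBN_AND_QUALIFIER.match(value.strip())
    if m is None:
        return None
    return m.group("isbn"), m.group("qual").strip()


def rebuild_020(value: str) -> Optional[str]:
    """Return the field in |a...|q... form, or None to flag it for manual review."""
    parts = split_020a(value)
    if parts is None:
        return None
    isbn, qual = parts
    return f"|a{isbn}|q{qual}" if qual else f"|a{isbn}"


if __name__ == "__main__":
    for old in ["0448021041 (pbk.)", "0448021041 (pbk.", "0448021041 (v. 1) (pbk.)"]:
        print(repr(old), "->", rebuild_020(old))
```

Returning None for anything unusual keeps the odd cases out of automated batches and in front of a human.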

Once the project was done, I was able to re-run queries in Analytics to ensure there weren’t any issues remaining.

I can also click into and update single records from the Data Control results page or set it to let me modify a particular field and paste repeating data into that field. The former is useful when there might be other related fields which need to be updated or I need more context. The latter is useful when only some of the results need to be updated or the person hasn’t yet got regex privileges on production.

Clashing Designs

So that’s what it looks like when things go well. Tech librarianship so often involves what Marshall Breeding called “Knitting Systems Together” that I almost don’t think about the ways I hop across tools. At most I feel a minor irritation. Recently, I ran across a case where the difference between system designs and who had permissions to access what was making a huge difference in my coworkers’ abilities to get their work done.

In theory, the data in Analytics should mirror what’s in Symphony, at most with a different structure. However, when a barcode is updated in Symphony (generally via Workflows), Analytics completely drops entries related to that barcode. The entries are not transferred to the new barcode. Data that’s still in the item record is retained, so we have the item last activity date, the circulation count (an incremented field), etc. But we can’t see the item transaction history.

Now, there were a couple things we could do about this… I’ll describe how system logs come into play in my next post!


  1. Still labeled Unicorn in some places.

  2. Specifically, it’s MicroStrategy, whose Wikipedia page starts off like any other data analytics software and then …pivots to Bitcoin. It’s Michael Saylor’s company, if that name means anything to you.

  3. Timing could be more frequent, but I believe most have daily updates.

  4. BLUEcloud is Sirsi’s next-gen browser client. To my knowledge, we still only use the circulation module and many people still use Workflows for circulation.

  5. It’s extremely powerful, though extremely fragile – but that could also describe me, so I can only be so annoyed by it.

  6. Transactions here meaning every time the item was scanned, some of which is available via Analytics. There is also transaction history in Symphony but it’s in logs.

  7. It also supports two kinds of batch updates – a batch modify which lets you edit fields individually in a browser interface and a batch substitute which lets you run updates on fields using regular expressions. If you wanted to update a MARC 500 field on a set of items, for example, someone with batch modify permissions could display all 500 fields on the records, click Modify, and then paste a new text into any field they wanted to replace (while skipping 500 fields which didn’t match). Someone with regex permissions could find all notes matching the old note and sub it with the new note.

  8. Why not do the whole search in Data Control? It is painfully slow compared to Analytics, especially for MARC searches. For the cases when Data Control is designed better for searching, I’ll export a set of keys for the overall records I want to search within and then perform it as a scoped search, which is much faster.

  9. We only use the APIs for integrations, not for reporting/updates/etc., so I didn’t list it above. Seltools are much faster and more powerful.

June 01, 2026 12:00 AM

May 31, 2026

Ed Summers

Untitled


by John Summers

May 31, 2026 04:00 AM

Weekly Bookmarks

These are some things I’ve wandered across on the web this week.

🔖 Software can be finished

One of the “25 lessons” I presented at LoopConf this year was “Software can be finished”.

I tried to remove this lesson from the talk to make room for something else. But it kinda took on a life of its own and wanted to be in the talk. It was an Important Lesson.

“Software can be finished” is a controversial statement. And I want to be clear, as I was in the talk, that in many cases “finished” software is not, and should not be, the goal.

But I present it as an idea. An ideal. A theory. Something to chew on and think about.

What would finished software be like? How would we write it? And what can we learn about the way that we create software by considering these other questions?

🔖 Current Rothko

A Rothko for the current weather.

🔖 Lament for the MIT Libraries

I write with dismay, grief and sorrow for the permanent closure of MIT Libraries Barker, Dewey and Rotch [not yet closed, but likely to suffer the same fate, ed. note], and termination of library staff in those libraries. For over half a century, I have learned, studied, researched, reflected, taught, created and rested in these libraries, as an MIT undergraduate, graduate and postdoc student, as an MIT researcher and instructor, and as an MIT alum. Across the diverse ever-changing areas of the studies and investigations that involve me and my students – from physics to poetry, from historical science to electrical engineering, from sculpture to photography, from philosophy to psychology – all the MIT Libraries have stimulated, opened, and connected us with human efforts, current and historical, to understand, express and learn in and with the world. Welcoming for me, and all students, the MIT Libraries were oases, spaces apart from the stresses, deadlines, demands of this school, where one could reflect apart, go to a familiar bookshelf, read in companionship with others and be challenged by human voices new, unexpected and concerned for nature, learning and truth. The MIT Librarians and MIT Libraries Circulation desk were available, interested and open to assist for whatever confused questions, incomplete references, tangential details or specific analyses we might be working on or stumped by [see MIT 1912, third quote below]. There were always other places to look, another staircase to climb, or resources to consider. MIT Libraries could open to anywhere and also facilitate rethinking of one’s own understanding and local contexts.

🔖 The Working Archivist’s Guide to Enthusiast CD-ROM Archiving Tools

I’ve seen a lot of professional archivists who use flux disc image archiving techniques for their collections—a technique in which a specialized floppy controller captures the raw signal coming from the floppy drive so that it can be preserved and decoded in software. I haven’t, however, seen many archivists using enthusiast-developed low-level reading techniques for CD-ROM. I’ve personally been making use of these techniques and I find them very helpful; I know that many other archivists and institutions could make great use of them. However, I know that information about enthusiast-developed tools are usually deeply embedded in those communities and can be hard to find for others. As someone with a foot in both worlds, I want to try to bridge the gap and make this information available a bit more widely. This post will summarize why archivists might be interested in these tools, what they can do, and how to make use of them.

🔖 palewire / fakethirtyeight

The old fivethirtyeight.com was taken offline by its corporate owners. This repo spiders the Wayback Machine (and a few adjacent sources) to build a comprehensive, deduplicated index of every editorial entry FiveThirtyEight ever published, then serves a browse + search UI at fivethirtyeightindex.com.

🔖 Maryland / CASA

These candidates had the CASA in Action seal of approval.

🔖 Preserving optical media from the command-line

The KB has quite a large collection of offline optical media, such as CD-ROMs, DVDs and audio CDs. We’re currently investigating how to stabilise the contents of these materials using disk imaging. During the initial phase of this work I did a number of tests with various open-source tools. It’s doubtful whether we’ll end up using these same tools in our actual workflows. The main reason for this is the sheer size of the collection, which we estimated at some 15,000 physical carriers; possibly even more. At those volumes we will need a solution that involves the use of a disk robot, and these often require dedicated software (we still need to investigate this more in-depth).

Nevertheless, throughout the initial testing phase I was surprised at the number of useful tools that are available in the open source domain. Since this will probably be of interest to others as well, I decided to polish a selection from my rough working notes into a somewhat more digestible form (or so I hope!). I edited my original notes down to the following topics:

May 31, 2026 04:00 AM

May 30, 2026

Ed Summers

Another moon


May 30, 2026 04:00 AM

May 28, 2026

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2026-05-26: URL Arguments in API Calls Can Cause Intermittent Temporal Violations While Replaying Archived Web Pages

URL Arguments in API Calls Can Cause

Intermittent Temporal Violations

While Replaying Archived Web Pages 


Michael L. Nelson

2026-05-26

Just over two months ago, I was at the Information Stewardship Forum 2026 at the Internet Archive, where I was fortunate enough to present a lightning talk about making copies of copies, entitled "The Disintegration Loops: Generational Loss in Web Archives".  During one of the breaks, Mark Graham asked Sawood Alam to take a look at a problem that had stumped the Wayback Machine support team.  I was sitting next to Sawood, and knowing my love for web archiving investigations, Mark invited me to take a look too.  The original inquiry:

Hi, everyone! Got a concerning report from a patron alleging that WBM "URLs were intermittently displaying the current version of the website instead of the archived version." The URLs in question are:

A quick check shows that when replaying these URLs, the content does resemble what is on the live web. For example, the text shown on the page references 2025 and 2026 updates, even though the captures are from 2024 - 2025. I've attached a screenshot of the 2025 capture appearing to show live web content as well as a printout/capture the patron provided of the same URL appearing to show the "actual" archive.


Sawood and I discovered that the problem is not that these URLs are sometimes displaying the live web (or at least not directly). The problem is that this seemingly simple "Terms of Use" page is unnecessarily complex, with the boilerplate legal text included via an API call.  The JavaScript that makes the call includes a number of superfluous URL arguments, including "screenWidth" and "screenHeight", which are probably appended to all API calls "just in case they are needed" (presumably the "Terms of Use" do not actually vary based on the size of the browser).  Thus, depending on the size of your browser, the legal text included in the page is potentially archived at different times, sometimes resulting in a temporal violation: a replay of an archived web page with subresources in a combination that did not exist at the time the top level page was archived.  


Although there are potentially a countably infinite number of archived "Terms of Use" pages, for the examples above there are two semantically interesting versions: one is marked (near the top, left-hand side) "Last Updated: January 18, 2024" and the other is marked "Last Updated: September 22, 2025".  Taking these "Last Updated" strings at face value, we would not expect the three URLs above (archived at "20240222221058" (February 22, 2024), "20241228224626" (December 28, 2024), and "20250531013827" (May 31, 2025)) to display "Last Updated: September 22, 2025". But sometimes they do – and sometimes they don't – and which archived version you get depends on the size of your browser.  
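Because the page carries its own "Last Updated" date, a replayed copy can be checked mechanically: if the date in the replayed text is later than the top-level page's capture time, the legal text cannot have been what the page showed when it was archived. A minimal sketch, assuming you already have the replayed page's text and the 14-digit Wayback Machine timestamp of the HTML page; the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Matches strings such as "Last Updated: September 22, 2025".
LAST_UPDATED = re.compile(r"Last Updated:\s*([A-Z][a-z]+ \d{1,2}, \d{4})")


def last_updated(text: str) -> Optional[datetime]:
    """Parse the first "Last Updated" date in the text, if any."""
    m = LAST_UPDATED.search(text)
    return datetime.strptime(m.group(1), "%B %d, %Y") if m else None


def is_future_violation(root_timestamp: str, replayed_text: str) -> bool:
    """True if the replayed text claims an update after the root page was captured."""
    captured = datetime.strptime(root_timestamp, "%Y%m%d%H%M%S")
    updated = last_updated(replayed_text)
    return updated is not None and updated > captured


if __name__ == "__main__":
    for ts in ["20240222221058", "20241228224626", "20250531013827"]:
        print(ts, is_future_violation(ts, "Last Updated: September 22, 2025"))
```

This only catches content from the future. A replay that shows an older "Last Updated" date passes the check even if a newer version already existed at capture time; catching that needs separate knowledge of when the newer version appeared.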


First, as of the time of this writing, the live web still has the "Last Updated: September 22, 2025" version:


https://www.victoriassecret.com/us/site-terms-and-notices


What appears to be a relatively simple HTML page is unnecessarily complex, with nearly 200 subresources. The figure below shows the relevant portion of the call stack: the HTML page calls the cheekily named JavaScript "brastrap.js", which in turn calls the API at "api.victoriassecret.com".  


https://api.victoriassecret.com/categories/v15/page?...


For me, right now, the full live web URL is (emphasis added):


https://api.victoriassecret.com/categories/v15/page?categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a&brand=vs&isPersonalized=true&activeCountry=US&platform=mobile&deviceType=phone&platformType=ios&perzConsent=true&cid=&tntId=&screenWidth=701&screenHeight=605


Guessing at the URL arguments: 


It's the last two arguments, "screenWidth" and "screenHeight", that cause the intermittent behavior the original users noticed.  
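Pulling the query string apart makes the problem easy to see. A minimal sketch using Python's urllib.parse on the live API URL above: it lists the arguments and then builds a lookup key with the two size arguments removed. The lookup_key function and the PRESENTATION_ONLY set are hypothetical; they illustrate how a replay system could, in principle, treat requests that differ only in screenWidth and screenHeight as the same resource. This is not a description of how the Wayback Machine matches URLs, and ignoring arguments blindly is risky in general, because some arguments really do change the response.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_URL = (
    "https://api.victoriassecret.com/categories/v15/page?"
    "categoryId=4b1ed4b3-5965-4a4d-a3d5-1e5ad379445a&brand=vs&isPersonalized=true"
    "&activeCountry=US&platform=mobile&deviceType=phone&platformType=ios"
    "&perzConsent=true&cid=&tntId=&screenWidth=701&screenHeight=605"
)

# Arguments assumed, for this sketch only, not to affect the response body.
PRESENTATION_ONLY = {"screenWidth", "screenHeight"}


def query_args(url):
    # keep_blank_values=True so empty arguments like cid= and tntId= are listed too.
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def lookup_key(url):
    """Drop presentation-only arguments and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in query_args(url) if k not in PRESENTATION_ONLY)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    for name, value in query_args(API_URL):
        print(f"{name} = {value!r}")
```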


First, let's consider the page archived on February 22, 2024 ("20240222221058"), which clearly shows the "Last Updated: September 22, 2025" string:


https://web.archive.org/web/20240222221058/https://www.victoriassecret.com/us/site-terms-and-notices


And since the live web still has "Last Updated: September 22, 2025", this is what caused people to think they were getting a live web version (more on that in a bit).  First of all, the Wayback Machine's "About this capture" link does not help; it shows only some of the subresources (improving its function is a task for another time):


"About this capture" lists only some of the subresources, and not the problematic api.victoriassecret.com page. 


Sawood discovered the API URL first. It's well-obfuscated, so it's not a surprise that tech support staff did not find it immediately.  We were sitting side by side, each using our own laptops, and he's much smarter than me and he's always going to win that race. But I noticed that for me, the page seemed to be saved right then, just a minute or two before, whereas he saw that it was archived a few days before (it was then March 19, 2026).  That was odd, but the next session started and I had to stop. 


The 2024 archived version of the page uses a "/v12/" version of the API endpoint (note: this is a common but wrong way to version an API), but it's similar to the 2026 live web example above:


https://web.archive.org/web/20260319160602/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=508&screenHeight=593


In particular, the "/v12/" endpoint remains functional, even though the live web HTML & brastrap.js access the "/v15/" version.  Checking the Wayback Machine directly confirmed that this was indeed the first time that URL had been archived:


https://web.archive.org/web/20260401000000*/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=508&screenHeight=593


Although Sawood found the problem URL, and we confirmed it was archived in March, 2026 (and thus displayed the "Last Updated: September 22, 2025" string), it bothered me that he had an earlier archival time than I did (March 14, 2026 vs. March 19, 2026).  After the next session ended, I returned to this problem.  I changed the size of my browser, and was able to force another new archived version (reproduced on March 22, 2026 below):


The highlighted text shows: 

https://web.archive.org/save/_embed/https://api.victoriassecret.com/categories/v12/page?categoryId=43b79880-5d66-4747-a553-65aeb3bd1876&brand=vs&isPersonalized=true&activeCountry=US&platform=web&deviceType=&platformType=&cid=&perzConsent=true&tntId=&screenWidth=565&screenHeight=605 


Although it's beyond the scope of this post, the Wayback Machine's Save Page Now has a "/save/_embed/" API that allows the Wayback Machine to "patch" the archive with missing URLs from the live web.  In this case, the version of the API response ending with "&screenWidth=565&screenHeight=605" was "missing" from the Wayback Machine, so it patched the archive from the live web, which still displays the "Last Updated: September 22, 2025" string, despite the main HTML page being archived in February, 2024.  So in essence, the Wayback Machine was displaying the live web version, immediately after saving it to the Wayback Machine.  Presumably the "Terms of Use" page changes slowly, but this behavior would be more noticeable if the "Last Updated" string were updated, say, every minute. 
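The effect of that patching can be shown with a toy model. This is not the Wayback Machine's code or lookup logic, just an illustration under simplifying assumptions: the index maps exact URLs to capture timestamps, and any embedded URL that isn't in the index is captured from the live web at request time. The resolve_embed function and its parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

TS = "%Y%m%d%H%M%S"  # 14-digit Wayback-style timestamp


def resolve_embed(
    index: Dict[str, str],
    url: str,
    root_ts: str,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Tuple[str, str, float]:
    """Toy model: return (source, capture timestamp, days between embed and root)."""
    if url in index:
        source, ts = "archive", index[url]
    else:
        # Not archived under this exact URL: "patch" it by capturing the live web now.
        ts = now().strftime(TS)
        index[url] = ts
        source = "live-patch"
    gap = datetime.strptime(ts, TS) - datetime.strptime(root_ts, TS)
    return source, ts, gap.total_seconds() / 86400
```

In the toy model, an unarchived screen size produces a "live-patch" capture stamped with the current date, more than two years after a February 2024 root page, and every later request with that same size gets the same too-new capture from the archive.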


A call to the CDX API confirmed that a variety of screenWidth and screenHeight combinations had been archived.
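A query along those lines can be scripted. A minimal sketch, assuming the Wayback Machine CDX server's JSON output, in which the first row is a header of field names. The exact query used in this investigation isn't given, so the parameters here are an assumption: a prefix match on the "/v12/" endpoint path, filtered client-side to the categoryId from above, then a tally of screenWidth × screenHeight pairs. The network call is kept separate from the parsing so the parsing can be checked offline.

```python
import json
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

CDX = "https://web.archive.org/cdx/search/cdx"
CATEGORY_ID = "43b79880-5d66-4747-a553-65aeb3bd1876"


def cdx_query_url(prefix="api.victoriassecret.com/categories/v12/page"):
    """Build a CDX query for every capture whose URL starts with `prefix`."""
    params = {"url": prefix, "matchType": "prefix", "output": "json",
              "fl": "timestamp,original"}
    return CDX + "?" + urlencode(params)


def size_combinations(rows, category_id=CATEGORY_ID):
    """Count (screenWidth, screenHeight) pairs among captures of one category's API URL."""
    if not rows:
        return Counter()
    header, *captures = rows
    i_orig = header.index("original")
    combos = Counter()
    for row in captures:
        qs = parse_qs(urlsplit(row[i_orig]).query)
        if qs.get("categoryId") != [category_id]:
            continue
        combos[(qs.get("screenWidth", ["?"])[0], qs.get("screenHeight", ["?"])[0])] += 1
    return combos


def fetch_rows(url):
    """Network call: CDX JSON output is a list of lists with a header row first."""
    with urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []
```

Running size_combinations(fetch_rows(cdx_query_url())) makes a live request, and the Wayback Machine may throttle heavy use.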



In fact, by inspection, there are at least two chances to get the wrong version. If your screen size is "screenWidth=1600&screenHeight=1000", you will get a version of the page that has the string "Last Updated: February 7, 2023", a temporal violation reaching into the past instead of the previously described version that is a temporal violation from the future.  A screen size of "screenWidth=1400&screenHeight=900" will produce the right result ("Last Updated: January 18, 2024"), and a screen size of "screenWidth=1440&screenHeight=900" will produce a different wrong result ("Last Updated: September 22, 2025"). And as shown above, a screenWidth and screenHeight combination not already archived will cause the Wayback Machine to be patched from the live web.  Furthermore, if/when the "/v12/" live web API endpoint is deprecated, then unarchived size combinations will just cause the page replay to silently fail, and most people won't understand why.   


In summary, this seemingly simple "Terms of Use" page is really quite challenging in practice:



We've encountered synchronization problems with HTML and JSON before (e.g., "Right HTML, Wrong JSON" (JCDL 2023), "Challenges in replaying archived Twitter pages" (IJDL 2024)), but the implementation complexity found in news outlets and social media was to be expected: the advanced UI features that make these sites engaging (e.g., auto-updating, infinite scroll, embedded media, personalized content) are the same features that make archival replay difficult. Without the "Last Updated: …" string, the problem would have been much harder to notice and diagnose. The seemingly intermittent nature, where you'd get a temporally coherent replay only if your browser was the same size as the previously archived responses, made the investigation especially challenging. 


Who pays attention to their browser's exact width and height? In this case, they were the keys to solving this puzzle.


–Michael 

Mark Graham welcoming attendees

My lightning talk

Me in front of the Internet Archive

Dr. Sawood Alam, me, Dr. Jian Wu

by Michael L. Nelson (noreply@blogger.com) at May 28, 2026 02:55 PM

Journal of Web Librarianship

Information Literacy and Social Media: Empowered Student Engagement with the ACRL Framework


by Alyshia Bagley, Georgia Southern University, Savannah, Georgia, USA at May 28, 2026 04:45 AM

Xe Iaso

Dancing mad with sandboxing

Cadey

What is an operating system, really?

Aoi is wut
Aoi

I mean, isn't it obvious? It's something like FreeBSD or Fedora that has a kernel, userspace, graphics stack, core set of programs, and everything else you need to be able to use a computer. Is this a trick question?

Numa is smug
Numa

Well, it depends. Is the Nintendo Switch OS an operating system? It doesn't have a shell in the same way FreeBSD does. Is seL4 an OS? It doesn't ship with core utilities. Is Linux an OS? Is Windows an OS?

Aoi is facepalm
Aoi

Oh gods here we go again…

The definition of an operating system gets really fuzzy when you start looking at the edges of it, but let's say that an operating system is any part of a computer system that doesn't involve pure math. When you print to the screen, render 3D graphics, connect to the internet, or write to files, your code calls into the underlying system to do that work. These system calls are defined by your operating system and are exposed as functions*.

Mara is hacker
Mara

Okay they're not actually functions, but they quack enough like functions that you can treat them like functions and not have to worry about the details too much.

System calls are injected into each operating system process, kinda like how you inject dependencies into your applications for database sessions or object storage operations.

Bashing your head into the wall

A while ago a new JavaScript package got into the meme sphere at work: just-bash. It's a sandboxed environment with a shell interpreter that was originally intended for use with AI agents after its author observed that AI agents know how to use a tool called bash a lot better than a tool called search_documentation. This is backed by a "fake" shell with "fake" core utilities (cat, ls, etc., hereinafter coreutils) so that when an agent decides to rm -rf /, nothing important actually leaves the room. One of my coworkers made @tigrisdata/agent-shell on top of this that uses Tigris as its storage layer.

This is great for people in the JavaScript ecosystem, but I am not mainly a JavaScript developer. I really wanted to play with it, so I started thinking about what it would take to have something like this in Go. mvdan's shell package makes this a heck of a lot easier, meaning that this "fake" shell would be powered by a real shell instead of either porting half of bash to JavaScript or making up hopefully-compatible behaviour.

After a bunch of thought, hacking, and a spot of vibe coding while I did some Dawntrail extreme mount farms, I ended up with Kefka, a "fake" shell with coreutils implementations that lets you put your programs in clown jail. This package lets you add a sandboxed-in-userspace shell to your existing projects without shelling out to the actual implementations of coreutils on your machine.

Mara is hacker
Mara

The name is inspired by Kefka Palazzo, the final boss of Final Fantasy VI. Need to chain uncontrollable demons? Use the power of a mad god driven to the brink of insanity with raw access to magic! What could possibly go wrong!

So I did that

So after some thought, I came up with this interface for the "commands" to use: Execer. This takes process context and passes it as an argument to a function named Exec. Exec then does whatever the process needs it to (list files, write to stdout, etc.) and returns an error if things went wrong and no error if things didn't.

package command

import (
	"context"
	"io"

	"github.com/go-git/go-billy/v5"
	"mvdan.cc/sh/v3/expand"
	"mvdan.cc/sh/v3/interp"
)

type ExecContext struct {
	Stdin          io.Reader
	Stdout, Stderr io.Writer
	Dir            string
	Environ        expand.Environ
	FS             billy.Filesystem
	// Runner is the active shell runner. Commands that need to dispatch a
	// child command (for example, `time CMD`) should call Runner.Subshell()
	// and re-enter the shell so the call goes through the same exec handler
	// chain instead of poking at the registry directly. May be nil in
	// embedders or tests that have not wired up a runner.
	Runner *interp.Runner
}

// Execer is implemented by every command: Exec returns nil on success and
// an error (such as interp.ExitStatus) when something went wrong.
type Execer interface {
	Exec(ctx context.Context, ec *ExecContext, args []string) error
}

This is where I started vibe coding things, mostly via a skill that ports a just-bash command to the Execer interface and filesystem in Go. just-bash itself looks vibe coded from help output and manpages; I tried to go further and stay POSIX compatible, down to matching flag syntax (and in some cases output formats). If your muscle memory fails you, it's a bug in my book.

Aoi is wut
Aoi

If I recall correctly, some POSIX utility names like false aren't usable as Go identifiers. How did you handle package names for that?

Cadey is aha
Cadey

By naming them things like falsecmd:

package falsecmd

// ...

type Impl struct{}

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	return interp.ExitStatus(1)
}

Honestly, the implementations of true and false are my favourite part of this project. Here's the implementation of true:

package truecmd

// ...

type Impl struct{}

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	return nil
}

This is a fully POSIX-compliant implementation of true! Here's the relevant part of the spec if you don't believe me:

true - return true value

SYNOPSIS

true

OPTIONS

None.

OPERANDS

None.

Really, check out the POSIX spec for true. It's trivial to implement; here's a one-liner to implement it on Linux:

touch ./true && chmod +x ./true
        

I made an operating system*

This is basically an operating system: it provides interfaces for programs (well, in this case functions) to get input from a user, send output to a user, interact with a filesystem, and more. Eventually I want to add networking via a network stack on ExecContext, probably with tsnet or wireguard-go's netstack package for the user-level side. Maybe there's room for adding CEL-based network filters there too.

Porting applications with WebAssembly

Once I got basic coreutils working, I thought it would be fun to get Python, jq, and ripgrep working. From previous experimentation back in the strawberry era of AI, I had already gotten Python running in WebAssembly via wazero. This used the stdlib io/fs#FS interface to allow me to inject virtual filesystems into the WebAssembly context. I used this to isolate my chatbot's filesystem state so that it (hopefully) wasn't able to delete anything important by accident.

io/fs#FS has methods for the important stuff, and runtime interface assertions let you bridge the gap for things like writes. But it was really designed for embedded filesystems, and writes get hairy fast once you're talking to object storage or anything that isn't a tree of bytes on disk.

At some point I hit a wall and had to switch from io/fs#FS to billy, another filesystem interface that I think predates the standard library one. This gives you a bunch more methods that map a lot closer to filesystem semantics in ways that coreutils crave. The interface was also mostly compatible with io/fs#FS so most of the hard part was really changing out the type and then chasing down compiler errors until I found enough of a pattern to have Opus automate the rest of it.

From there it was a matter of adapting billy's filesystem to wazero's experimental sys interface. It was mostly glue code, except where I had to translate Go errors into POSIX errno values. I had to read the POSIX spec, the WASI spec, and the wazero source to figure out how to map errors between the two worlds. I think I'm at least 95% correct, which is likely within the margin of porting error.

Adapting that codeinterpreter/python library to the new interface was mostly straightforward, and I ended up with a flow like this:

// from https://tangled.org/xeiaso.net/kefka/blob/main/command/internal/python3/python3.go

func (Impl) Exec(ctx context.Context, ec *command.ExecContext, args []string) error {
	fsConfig := wazero.NewFSConfig().(sysfs.FSConfig).
		WithSysFSMount(billyfs.New(ec.FS), "/")

	config := wazero.NewModuleConfig().
		// Pipe ExecContext stdio
		WithStdin(ec.Stdin).WithStdout(ec.Stdout).WithStderr(ec.Stderr).
		// Pipe argv
		WithArgs(append([]string{"python3"}, args...)...).
		WithName("python3").
		// Pipe filesystem
		WithFSConfig(fsConfig).
		// Pipe system time
		WithSysNanosleep().WithSysNanotime().WithSysWalltime()

	mod, err := runtime.InstantiateModule(ctx, compiled, config)
	if err != nil {
		// Fit the square peg into the round hole
		if exitErr, ok := errors.AsType[*wsys.ExitError](err); ok {
			if code := exitErr.ExitCode(); code != 0 {
				return interp.ExitStatus(uint8(code))
			}
			return nil
		}
		return err
	}
	return mod.Close(ctx)
}
Mara is aha
Mara

See? The dependencies such as stdin, stdout, and stderr get injected into the WebAssembly guest. Wazero also makes you inject the implementation of time for boring reasons involving deterministic computing, but I'm sure you can see the ways things hook in. This basic dependency injection flow is how things like the linuxulator in FreeBSD or the old version of the Windows Subsystem for Linux work (WSL1 before it was made into a Linux VM with WSL2). The table of system calls and filesystem context is effectively an argument to the process.

The same trick got me ripgrep and jq. jq was annoying because wasi-sdk doesn't love jq's (ab)use of cmake; however, 30 or so minutes of tweaking compiler flags got me a binary that works well enough.

I could see it being pretty easy to port arbitrary programs to Kefka using WebAssembly like this. There's just one small problem: WASI preview 0.1 doesn't allow you to open arbitrary network sockets. This has been a huge pain in practice (it means you can't make HTTP requests, open database connections, or do other common internet things from inside the WASM sandbox), and future work would probably include adapting wazero to use wasix instead of WASI 0.1.

Using filesystems that don't exist

OK, that handles filesystems that (arguably) exist, like the btrfs volume on my dev box. What about filesystems that don't? For the sake of argument, let's say you want this fake shell to interact with object storage as its main filesystem. At some level all you need to do is adapt the billy interface to object storage using something like storage-go.

Cadey is coffee
Cadey

Disclaimer: I work at Tigris and developed this library for them. It's basically the S3 client with more methods to handle additional Tigris features like forks and snapshots. I'll be writing more about it soon.

After finding a basic implementation of an S3 -> Billy adapter, I vendored it into the Kefka repo and swapped out the "real" filesystem in cmd/kefka for an s3fs implementation pointed at a sample Tigris bucket. From there it was down to an iterative process of running commands, finding feature gaps when errors showed up, implementing them, fuzzing, and making sure things work mostly the same against Tigris as they do against a local filesystem.

WASI is cursed: it has no process-level "current working directory," which most programs assume exists. You work around it by passing a CWD envvar or by using absolute paths. I haven't hit anything broken in casual use, but expect rough edges. Here be dragons, and this code may be known by the state of California to cause cancer.
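The envvar workaround boils down to resolving every relative path against a working directory before it reaches the filesystem. Here is a minimal sketch, assuming the cwd comes from something like a PWD variable. The helper is hypothetical and is not how Kefka does it.

```go
package main

import (
	"fmt"
	"path"
)

// resolve turns p into an absolute, cleaned path for a host that only
// understands absolute paths. Relative paths are joined onto cwd (for
// example, the value of PWD). Anchoring at "/" keeps the result absolute
// even if cwd is empty or relative, and path.Join's cleaning drops any ".."
// that would climb above the root.
func resolve(cwd, p string) string {
	if path.IsAbs(p) {
		return path.Clean(p)
	}
	return path.Join("/", cwd, p)
}

func main() {
	fmt.Println(resolve("/samples", "hello.js"))     // /samples/hello.js
	fmt.Println(resolve("/samples", "../README.md")) // /README.md
	fmt.Println(resolve("", "notes.txt"))            // /notes.txt
}
```

Using the slash-only `path` package rather than `path/filepath` is deliberate. Guest paths stay POSIX-style even when the host is Windows.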

Why does it have to use the command line?

Once everything got working with s3fs and a local shell, I wondered how hard it would be to make this work as an SSH server using the github.com/gliderlabs/ssh package. Hooking things up was pretty easy:

func HandleSSH(sess ssh.Session) error {
	// Convenience variables for SSH session values
	var stdout io.Writer = sess
	var stderr io.Writer = sess.Stderr()
	var stdin io.Reader = sess
	ctx := sess.Context() // cancelled when the user disconnects

	// Kefka command registry with coreutils/python/jq/etc
	commands := registry.New()
	coreutils.Register(commands)
	wasmprog.Register(commands)

	// Base envvars for all programs, needed by POSIX
	env := expand.ListEnviron(
		"HOME=/",
		"PWD=/",
		"IFS=\n",
		"HOSTNAME=localhost",
		"USER="+sess.User(),
		// not strictly required, but just-bash sets it
		"MACHTYPE=x86_64-pc-linux-gnu",
	)

	// Create shell engine
	sh, err := interp.New(
		// Set the "interactive" flag so the shell expands aliases
		interp.Interactive(true),
		// Forward our envvars
		interp.Env(env),
		// Wire up stdio
		interp.StdIO(stdin, stdout, stderr),
		// Change the shell exec handler such that it's constrained to the
		// Kefka registry.
		//
		// Strictly speaking you don't have to do this, but if you don't
		// then any time the registry doesn't have a command
		// implementation, interp falls back to its default ExecHandler that
		// executes the command as a subprocess. This is almost certainly
		// not what you want.
		interp.ExecHandlers(constrainToRegistry(commands)),
		// Wire up per-command pwd state to the filesystem implementation
		interp.CallHandler(billysh.CallHandler(commands, fsys, stdout, stderr)),
		// Handle shell-level filesystem I/O (redirects, glob expansion, etc)
		interp.StatHandler(billysh.FsysStatHandler(commands, fsys)),
		interp.FsysOpenHandler(billysh.FsysOpenHandler(commands, fsys)),
		interp.ReadDirHandler2(billysh.FsysReadDirHandler(commands, fsys)),
	)
	if err != nil {
		// A bad runner option means there is no shell to hand the user
		return err
	}

	// Read shell commands
	parser := syntax.NewParser()
	fmt.Fprintf(stdout, "$ ")

	// Split input into commands
	for stmts, err := range parser.InteractiveSeq(stdin) {
		if err != nil {
			return err
		}

		if parser.Incomplete() {
			// Continuation prompt for multi-line input
			fmt.Fprintf(stdout, "> ")
			continue
		}

		for _, stmt := range stmts {
			err := sh.Run(ctx, stmt)
			if sh.Exited() {
				return err
			}
		}

		// Show prompt
		fmt.Fprintf(stdout, "$ ")
	}

	return nil
}

The real handler is much messier because Python's REPL needs careful buffering, Ctrl-C has to actually cancel things, and pty wiring is its own can of cans of worms. None of that shows up if it's working. Tab completion and readline polish are easy enough; I'll let you wire those up as an exercise for the reader.

If you want to try it today, you can ssh into sophia.xeiaso.net:

$ ssh sophia.xeiaso.net
        

You'll get an isolated sandbox in your own bucket fork/branch. Every ls is a ListObjectsV2 against the bucket. Every qjs or python3 runs WebAssembly on the server, wired to that same bucket.

$ cat ./samples/hello.js
console.log("Hello, world!");
$ qjs ./samples/hello.js
Hello, world!

The demo bucket is seeded with examples. You'll probably have to poke around to find everything. Worst case, run help.

Cadey is coffee
Cadey

I should really hook up session recording to this.

I want more experimental WebAssembly hacks like this to exist. I'll keep poking at it.

Put your programs in clown jail

With some effort, yeet could use Kefka's shell utilities to run Anubis builds on Windows; and if management ever makes you babysit AI agents, clown jail is a decent answer.

The code lives on Tangled. I'm wiring it into an agent harness so I can automate small tools against a local model (I'm loving Qwen3-36B-A3B).

There's a sister post on the Tigris blog that goes deeper into the AI-agent angle and the porting work using Claude Code. If you want, you can check it out here:

Tigris Data: Give your agents disposable environments in Go. Kefka is a userspace shell sandbox in Go that gives every AI agent its own copy-on-write Tigris bucket fork plus Python, jq, and ripgrep via WebAssembly.

May 28, 2026 12:00 AM

May 27, 2026

In the Library, With the Lead Pipe

Moving Beyond Willpower: A New Direction for Media Literacy Instruction

In brief

Academic librarians and others often engage with media literacy instruction by promoting fact-checking strategies, such as lateral reading or Mike Caulfield’s SIFT. Evidence shows that these strategies are valuable and can be effective, but they all ultimately rely on individual students to use willpower to overcome cognitive habits, biases, strong parasocial relationships with content creators, the power of algorithms, and other challenges to fact-checking content in the moment. This paper offers an alternative approach that instead encourages librarians to support students in intentionally redesigning their information environments to improve the quality of information that they encounter in the first place.

By Mandi Goodsett

“The task of breaking a bad habit is like uprooting a powerful oak within us. And the task of building a good habit is like cultivating a delicate flower one day at a time.” – James Clear

In a 2024 study, the News Literacy Project found that 80% of the teen participants believed that journalists do not produce more impartial information than other online content creators, and 69% said that news organizations intentionally make their content biased to advance a particular viewpoint. When the News Literacy Project followed up with these young adults a year later, it found that most of them believe that trustworthy, unbiased news is rare or maybe doesn’t even exist (2025).

Pew Research found, through a series of focus groups, that Americans don’t always agree on what constitutes a “journalist” or “news media,” and young adults are more likely than older adults to call hosts on “new media” platforms, such as podcasters and social media creators, “journalists” (Eddy et al., 2025). Overall, younger participants were less likely than older adults to even care whether the news they consume comes from a journalist. The investigation found that Americans worry that, apart from perhaps a few reliable ones, journalists are concerned with “clicks, eyeballs, money, things like that, and they don’t necessarily mind tweaking the truth to suit their audience or their advertisers” (quoted in Pew Research, 2025).

These statistics are significant because cynicism about standards-based news and other traditionally authoritative institutions has many negative impacts. First, news cynicism can lead to news disengagement, which pushes information consumers to less reliable platforms (Ahmed et al., 2025; Fletcher, et al., 2024; Mont’Alverne, 2022) and contributes to a broader erosion of trust in institutions like voting (Park, et al., 2025; Raffio, 2025). When people disengage, news sources themselves face obsolescence, which undermines their role as a watchdog and a keystone of democratic societies (Haider & Sundin, 2022). News cynicism makes it difficult for accurate information to reach people and, paradoxically, makes people more vulnerable to misinformation (Ahmed et al., 2025; Hasell & Halversen, 2024). Individuals may feel anxious, depressed, and helpless about their world, leading to a spiral of disengagement (Hasell & Halversen, 2024). News cynicism also fuels societal division and threatens democracy (Cappella & Jamieson, 1996; Valgarðsson et al., 2025). Widespread distrust in institutions such as the government, science, public authorities, and the press is a risk to media literacy, democracy, civil discourse, and our sense of agency.

Academic Librarians and Media Literacy Instruction

One strategy for helping students and others improve their media consumption is to teach them media literacy skills. Media literacy is generally thought to be the ability to access, evaluate, analyze, and create media messages (Aufderheide, 1993), although definitions vary considerably between researchers and practitioners (Fleming, 2014; Hobbs, 1998). Media literate individuals have the skills to identify media sources and messages that are unreliable, and, perhaps more importantly, craft an overall media diet that is more likely to consist of reliable information.

Academic librarians are interested in, and have the expertise for, teaching students media literacy skills that apply in both academic and non-academic settings. Many librarians have explored tactics for teaching students source evaluation skills that move beyond the CRAAP test (Currency, Relevance, Authority, Accuracy, Purpose), such as the SIFT method (Stop, Investigate, Find, Trace), created by Mike Caulfield (2019), or lateral reading, popularized by the Civic Online Reasoning organization (Digital Inquiry Group, n.d.). Caulfield’s SIFT method provides a more up-to-date approach to source evaluation by offering strategies that are more efficient, straightforward, and applicable in a wide variety of contemporary information settings (Bull, 2021). “Lateral reading,” which is a key component of SIFT, involves leaving the source that is being evaluated and opening new browser tabs to investigate what other Internet sources report about the site and its claims (Wineburg & McGrew, 2019). Research has shown that the SIFT method and lateral reading result in more accurate student source evaluation (Bobkowski & Younger, 2020; Breakstone et al., 2021; Brodsky et al., 2021). These techniques reflect a better understanding of the modern online information environment than simplistic checklist strategies. However, they still expect students to avoid misinformation through careful self-control and self-monitoring.

Misinformation is an interdisciplinary problem with significant complexities. As Sullivan has argued, librarians have historically focused on media literacy instruction strategies that neglect the psychology of how people interact with information, and the field of library and information sciences is somewhat siloed in its exploration of source evaluation instruction (2019). For example, heuristics, systems thinking, mental models, and cognitive biases all play a role in how and why people adopt misinformed beliefs. Emotions also influence the ways that individuals evaluate information (Hewitt, 2023; Hicks & Lloyd, 2021), yet they play a minor role in most library source evaluation instructional strategies. Academic librarians may have a role in combatting misinformation, but we should proceed, as much as possible, guided by research conducted across disciplines (Saunders, 2025). As an example, academic librarians have often focused their source evaluation teaching on investigation strategies and fact-checking skills. These skills are very important, and we shouldn’t abandon them. But there are many reasons, informed by research outside of Library and Information Science (LIS), why reactive strategies that rely on individual willpower are destined to be difficult to maintain. 

Challenges of Fact-Checking and Other Traditional Source Evaluation Techniques

Evidence shows that, globally, trust in institutions is decreasing, including in democratic societies (Kavanagh & Rich, 2018; Gil de Zúñiga & Diehl, 2019). The consequences of this could be severe, as many scholars posit that trust in institutions is an important pillar of democracy (Haider & Sundin, 2022). There are also a number of well-studied examples of how bad actors can sow doubt in institutions, such as academia, to achieve their own ends (Haider & Sundin, 2022). This has played out in the case of the tobacco industry and fossil fuel companies; in both cases, the science is clear, but raising uncertainty can be enough to sway consumers to take actions that are not in their best interests (Oreskes & Conway, 2010). All of this said, when society’s institutions become corrupt or unreliable, or when institutions are systematically unfair to one’s group or identity, distrust in institutions is often justified (Haider & Sundin, 2022). So while dismissing institutionally-backed information in favor of persuasive individuals is risky, confidently pointing to institutions as always trustworthy is also unlikely to be effective. Easy-to-apply source evaluation checklists that are meant to be used across all contexts and blind trust in compelling individual voices both fail to reflect the complexity of information environments. 

While media literacy that relies on individual fact-checking skills is very important, there are many reasons why a willpower approach is likely to have limited success. The sections below explore these limitations, moving from internal factors to external factors and, finally, to systemic factors.

Limits of Fact-Checking: Internal Factors

The intuitive solution to the problem of misinformation is to let media consumers know that a piece of information is untrue. However, there is mounting evidence that retractions and corrections have little effect on whether someone will make decisions based on misinformation (Seifert, 2014; Thorson, 2016; Zhou & Shen, 2024). There are many potential reasons for this, but one that almost certainly plays a role is the effect of cognitive bias. For example, epistemic egocentrism is a cognitive bias that occurs when individuals fail to set aside their own privileged information when imagining the perspectives of others (Royzman et al., 2003; Zhou & Shen, 2024), which can cause people to judge their own source evaluation skills highly and blame the problem of misinformation’s spread on others. Closely related is blind spot bias, which is the belief that one is immune to bias (Pronin, et al., 2002). Confirmation bias is also relevant to the adoption and spread of misinformation; this bias is the tendency to seek out and remember information in ways that favor existing beliefs (Nickerson, 1998; Oswald & Grosjean, 2004). A consequence of confirmation bias is selective exposure, or a person’s proclivity to preferentially seek and engage with information that is in alignment with their existing values, beliefs, or attitudes (Zhou & Shen, 2024). These cognitive biases, which can occur whether or not the person has a pre-existing attitude about the misinformation, may lead people to dismiss corrections, assume they are correct in situations where there is substantial conflicting evidence, or, by consciously or subconsciously designing their information environment, rarely encounter threats to their existing worldview.

Research into the mechanisms that cause misinformation adoption to persist (sometimes called the “continued influence effect of misinformation”) shows that corrections can fail when they leave a gap in someone’s mental model, especially when the misinformation fills that gap in a more satisfying way (Johnson & Seifert, 1994, p. 1420). Retrieval errors can also contribute; for example, when misinformation is retrieved from memory without the “false” label, or when misinformation is retrieved more readily than its correction (Ecker et al., 2011; Gordon et al., 2017; Lewandowsky et al., 2012). Because the misinformation and correction both exist in memory, deliberate, effortful thinking is necessary to retrieve corrections from memory, and natural cognitive efficiency processes can make this retrieval difficult or unlikely (Kendeou & O’Brien, 2014; Pennycook & Rand, 2019). These cognitive processes make debunking misinformation incredibly challenging once it has been adopted into someone’s mental model.

Information consumers are also often very confident about their beliefs, even if their knowledge about the topic at hand is, upon investigation, quite shallow. While perceptions of widespread misinformation increase, Americans are confident that they have the skills to identify this unreliable content. In 2016, a study found that 84% of participants were confident in their ability to spot “fake news” and 64% of those same participants believed that fabricated news stories caused significant confusion for Americans (Barthel et al.). Who is being confused by these stories? Not them, the participants in the study seemed to say; it’s everyone else. This points to an overconfidence that individuals have in their own ability to detect false information, contributing to the problem of misinformation’s spread.

One cognitive bias that helps to explain this phenomenon is the Dunning-Kruger effect, whereby individuals with limited knowledge of a subject fail to accurately assess their own level of expertise (Dunning, 2011). For example, research has shown that overconfidence in news judgments is associated with higher susceptibility to false news across a variety of topics, from autism awareness to nutrition claims (Lyons, et al., 2021; Motta, Callaghan, & Sylvester, 2018; Peng & Shen, 2025). Along the same lines, the “nobody-fools-me perception” is a cognitive bias whereby someone is overconfident in their ability to detect misinformation, especially as compared to others (Martinez-Costa et al., 2022). This leads people to make claims like “Many people haven’t learned to check facts” but fail to recognize their own media literacy deficiencies (Martinez-Costa et al., 2022).

Relatedly, the illusion of explanatory depth occurs when people believe they understand a complex topic better than they actually do, as further probing reveals (Rozenblit & Keil, 2002; Sloman & Fernbach, 2017). Humans move through the complex, nuanced, and dangerous modern world by holding a naive intuition that they understand how the world around them works. This, combined with poor knowledge about the extent of our knowledge, causes a pervasive belief that we can explain the world around us even when we can’t (Bailey, 2021). The illusion of explanatory depth can cause people to adopt false beliefs confidently, not realizing their shallow understanding of the topic should cause them to question their self-assured stance.

It’s important to note that a 2025 study found that exposing participants to false news not only caused them to become overconfident in their judgments about whether news stories were true or false, it also fueled news mistrust (Altay et al.). This study demonstrates how news environments themselves contribute to issues that spur misinformation’s spread, such as overconfidence and cynicism. Along the same lines, some researchers worry that media literacy interventions that focus on “misinformation’s omnipresence” risk heightening the salience of misinformation as a threat to society and individuals, ultimately increasing news mistrust (van der Meer, Hameleers, & Ohme, 2023). Misinformation warnings alone can provoke a deception-bias, whereby people assume deception in news messages, rather than defaulting to a trust-bias as they often do in other contexts (van der Meer, Hameleers, & Ohme, 2023).

Limits of Fact-Checking: External Factors

While it’s clear that cognitive limitations make corrections to misinformation difficult or impossible, other researchers argue that misinformation itself is not as widespread a problem as is commonly believed. They argue that the current perceived prevalence of and “panic” about misinformation is a kind of “historical amnesia” (Stecula, 2025). The spread of misinformation is nothing new, and misleading messages have been created and spread for hundreds of years, from anti-vaccination movements of the early 1800s to disbelief about the real cause of JFK’s assassination, all of which occurred before the invention of social media (Stecula, 2025). What is different about the spread of false messages today is their overt support by important societal leaders and the new visibility their small groups of adherents have due to social media. These changes have allowed society to diverge into competing knowledge communities with unique standards for expertise, source evaluation, and, ultimately, defining truth (Stecula, 2025). These new, ideologically isolated communities with extreme views do not represent the majority of the population, but may seem to, given the way social media can amplify their messages. Fact-checking is likely to have limited reach and impact in these isolated, closely-knit communities.

Even in the rare cases when overtly false information spreads outside of isolated bubbles, fact-checking has limitations as a strategy for stopping it. Some argue that most fact-checking is ultimately reactive, constrained by scale and speed, and destined to always be catching up with rapidly changing misinformation messages (Wack, Duskin, & Hodel, 2024). Fact-checkers themselves worry that their work risks drawing additional attention to misinformation and has limited impact for cognitive reasons; as one put it, “I can only convince those already convinced” (Westlund et al., 2024).

Another assumption of fact-checking is that knowing the truth changes people’s behavior in positive ways. However, research on climate change misinformation, for example, found that even when people hold accurate beliefs about climate change, those beliefs have limited impact on their willingness to engage in pro-environmental behavior (Spampatti, 2025). Additional research has shown that, for some individuals, feeling and appearing independent of outside influence matters more than being correct; for these individuals, whether something is factual is irrelevant to whether it should be shared (Stein & Rutchick, 2025).

It’s also possible that the problem of misinformation has been mischaracterized because of how it is typically studied. Current misinformation research often focuses on issues that are likely to evoke false beliefs, and it rarely asks participants to indicate their confidence levels; both oversights may inflate the perception that people are deeply divided on many issues. In reality, participants may simply be uninformed about an issue rather than misinformed, a distinction most studies do not capture (Stecula, 2025). Similarly, many studies that rely on truth discernment tasks impose a false dichotomy between true and false statements, whereas real-world misinformation often straddles that line or combines true statements into an overall misleading message (Spampatti, 2025).

Limits of Fact-Checking: Systemic Factors

Research on the spread of misinformation has also frequently focused on individual-level susceptibility without addressing how structural inequities shape both exposure to misinformation and the capacity to resist it (Lin et al., 2022; Schirmer et al., 2025; Walter et al., 2020). Socioeconomic disparities limit who can access high-quality information; lack of broadband access, language differences, and gaps in digital literacy can all contribute to this problem (Schirmer et al., 2025). Systemic mistrust, justified by decades of historical injustice, can lead some people to seek out alternatives to mainstream information outlets, exposing them to misinformation (Jaiswal et al., 2020; Pew Research Center, 2024). Many marginalized communities, however, are actively working to understand the impacts of misinformation and are undertaking grassroots efforts to combat it (Schirmer et al., 2025). There are many ways to move beyond placing the responsibility for avoiding misinformation on individuals, and structural interventions have more potential to address the social disparities that shape misinformation adoption.

While fact-checking strategies in particular have limited utility, any misinformation intervention that expects individuals to exercise willpower in algorithmically-driven environments will face considerable difficulties. Algorithms have significant power to influence which information and voices individuals encounter. While evidence about the impact of “filter bubbles,” or isolated online spaces that perpetuate misinformation messages (Pariser, 2011), is mixed (Arguedas et al., 2022), there is some evidence that filter bubbles can limit users’ exposure to diverse points of view and increase their exposure to lower-quality content (Ciampaglia et al., 2018). In today’s algorithm-rich environment, it can be tempting to assume that news will “find” you rather than to seek out standards-based news intentionally (Skurka et al., 2025). American adults who think the news will “find” them are more likely to overestimate their ability to tell false from true political news and more likely to engage confidently with false news messages (Skurka et al., 2025).

One reason social media messages can be especially compelling has to do with influencers. Social media platforms allow individual voices to have outsized influence on large sections of the population. These individual voices, or “influencers,” do more than entertain people; they often drive the narrative around topics ranging from politics to economics to health (Thi & Ibrahim, 2025). While research shows that credibility, consistency, and transparency are important characteristics of influencers whom people trust, to truly appear “authentic,” influencers must also build an emotional connection with their audience by seeming relatable and “being real” (Thi & Ibrahim, 2025). A messenger’s accuracy, while not completely irrelevant, is not the most important factor when people decide whom to trust in social media settings.

The emotional bond that audience members form with influencers contributes to the rise of parasocial relationships, which are one-sided relationships in which someone develops a sense of closeness and intimacy with a media figure, usually a celebrity or influencer (Hoffner & Bond, 2022). The intensity of parasocial relationships is driven by the media figure’s moments of self-disclosure, glimpses into parts of the person’s life that are usually unknown, and momentary, technology-mediated interactions (e.g., reposting or liking a fan’s post) (Hoffner & Bond, 2022; Kim & Song, 2016; Kurtin, O’Brien, Roy, & Dam, 2018; Dai & Walther, 2018). Even though the influencer or celebrity does not know their fans, or necessarily have their best interests at heart, fans can feel otherwise because of the closeness and trust they feel toward that person.

Influencers are an important source of misinformation in the information ecosystem because of the scale of their impact. This is especially true for messages that are already viral or widespread; sharing such messages actually helps influencers gain more trust from their followers, regardless of the messages’ veracity (Mulcahy et al., 2025). At the same time, influencers face little to no accountability for sharing misinformation beyond possible damage to their reputation if they are found to have shared inaccurate information (Thi & Ibrahim, 2025). Unlike journalists, who receive training and commit to a code of ethics, social media creators generally operate outside any formal ethical framework.

Complicating the interplay among cognitive biases, algorithmically-driven online spaces, and persuasive social media personalities is the rise of generative artificial intelligence (AI). Although access to this technology is fairly recent, these systems contribute significantly to the existing problem of misinformation by enabling the easy creation and customized dissemination of misinformation at scale (Bontridder & Poullet, 2021). Even elected officials have shared AI-generated misinformation with wide audiences (Skau, 2026).

The widespread sharing of AI-generated misinformation has two main negative impacts. First, even when the content is fact-checked, it can continue to misinform because of the continued influence effect discussed earlier. Sandra Ristovska, an expert in visual evidence at the University of Colorado Boulder, described this challenge with false AI-generated images: “It lies deep in human nature and in the way we see and interpret images that it can be difficult to ‘un-see’ an image or a video once we have seen it” (Ristovska as cited in Skau, 2026, para. 10). Second, it can contribute to a sense that nothing online is real, or that we shouldn’t bother determining whether something is true or false; in other words, it deepens the cynicism many already feel. As Renee Hobbs, Professor of Communication at the University of Rhode Island, stated, “If we become indifferent to whether something is true or false, we risk losing many of the cooperative structures that make civilization possible” (as cited in Skau, 2026, para. 13).

Willpower and Habits

Clearly, many factors make fact-checking a challenging strategy to rely on for stopping the spread of misinformation and improving students’ media literacy. Importantly, whether individuals stumble upon someone else’s fact-check or consider fact-checking something themselves, they must have the willpower to take additional critical steps.

It could be argued that the most effective way to improve this situation is through systemic changes, such as adjusting social media and search engine algorithms to prioritize accuracy and flag misinformation, or requiring influencers to be more transparent about their motives or qualifications. But while we continue to push for these systemic changes, individuals must still make information choices every day, and this is what library instruction tends to focus on. With that in mind, how can we encourage individual actions that rely less on willpower?

What we are ultimately trying to accomplish is a habit change. Considerable research shows that changing habits through willpower is very challenging and often destined to fail (Bargh & Barndollar, 1996; Borland, 2013; Muraven, 2012; Wood et al., 2014). What is more effective is changing a person’s environment to encourage the desired behaviors (Bargh & Barndollar, 1996). In research on environmental versus willpower-based approaches to habit change, Duckworth et al. (2016) describe how “situational selection strategies,” such as putting a distracting device in another room during study time, spending time with friends who value studying, and telling someone else about a study goal for accountability, were the most effective at improving student study habits. These strategies outperformed “self control” strategies, which students described with statements like “Just deal with it and study” or “Just do it…I just focus and get my work done” (p. 334). This is just one of many studies showing that trying to stop a bad habit through sheer willpower, while leaving the rest of the environment unchanged, has limited success. In contrast, changing the environment so that bad habits become more difficult and good habits become easy and effortless has a much better chance of success.

The same is true of our information environments. When students spend considerable time in algorithmically-driven social media spaces, they may encounter more poor-quality information that requires fact-checking, and they may feel both cynicism about the information system more broadly and a lack of agency. However, when students spend less time being steered by algorithms through information spaces full of tempting, low-quality information, and more time consulting reliable, standards-based sources, they improve their information behavior and, importantly, gain a sense of agency over the information they encounter and consume.

Recommendations for Academic Librarians

Although structural changes are necessary to address many of the issues discussed here, academic librarians may be able to contribute by changing how we approach information literacy instruction. While fact-checking methods like SIFT and lateral reading are important skills (and convenient to fit into a 50-minute class period), librarians could instead (or in addition) address the importance of adopting new information habits. Rather than asking students to start by having the presence of mind and willpower to “stop,” as in SIFT, perhaps we should begin before that “stop” is even necessary, by intentionally designing the information environment in the first place.

“Lift Our Gaze”: Teach about Systemic Information Structures

One initial challenge that librarians must address is that it may require considerable motivation for students to take the initial steps to improve their information environments. If students believe that influencers are just as reliable as journalists (or more so), why would they change their habits? 

One strategy is to lean into the ACRL Frame “Information Creation is a Process” (2016). Librarians can help students better understand the systems that underlie the information they encounter through the concept of “infrastructural meaning-making” offered by Haider and Sundin (2023). They define infrastructural meaning-making as going “beyond examining the content’s sources, and even beyond evaluating the source’s content, to also be concerned with the institutions and systems, the platforms and algorithms that deliver it to us and onto our devices” (p. 2). To apply this concept, in addition to traditional source evaluation methods like CRAAP and SIFT, instructors can also encourage students to consider why a particular source appeared to them at that time – in other words, how do the conditions of access, along with the information and its source, help us understand the piece of information? (Haider & Sundin, 2019). Algorithmic literacy, situational awareness, and platform knowledge can all contribute to better decisions about whether to pay attention to a particular piece of information (Haider & Sundin, 2023). Fortunately, many simple and creative activities exist to help students understand how algorithms shape their information environments (Camarillo, 2025). While digital information infrastructures are often invisible to us (intentionally so, on the part of platform providers), we benefit from “lifting our gaze” to understand how networked environments affect what information we encounter (Haider & Sundin, 2023, p. 3).

With this strategy, it’s important to consider how affective or attitudinal factors might affect students’ approaches to source evaluation, and to pair typical source evaluation instruction with interventions that address these factors. For example, one study found that teaching algorithmic awareness alone was helpful but limited in its impact, because students felt powerless to shape their online experiences. However, by pairing algorithmic knowledge with activities that promote digital agency, we can help combat the significant cynicism students feel about their digital environments (Chung, 2025).

Along the same lines, helping students understand how standards-based news is created, especially compared with influencer-generated content, can help them take a broader view of the information landscape rather than focusing on fact-checking individual claims. In the field of communication, researchers have found that knowledge of how news is produced, disseminated, and consumed can improve misinformation detection (Ashley et al., 2023; Chan, 2024; Chan et al., 2024).

Deliberately Design a News and Information Landscape

Next, students should be encouraged to seek out reliable information intentionally rather than letting algorithms determine their information landscape. Research shows that young adults who are exposed to news-rich environments, especially in the classroom, are more likely to develop news consumption habits (Edgerly, 2025; York & Scholl, 2015). In general, people need more help accepting true news than rejecting false news (Pfänder & Altay, 2025), so deliberately seeking out reliable sources could be especially helpful. Researchers have also found that this approach – focusing on which sources to trust rather than on the small prevalence of misinformation – can increase trust in standards-based news instead of fueling cynicism about news (Altay, De Angelis, & Hoes, 2024). However, it’s important to incorporate instruction about negativity bias and clickbait into this process, because research shows that a pessimistic outlook is correlated with choosing more negative and episodic news when people are given the chance to select news outlets themselves (van der Meer & Hameleers, 2022). Encouraging students to deliberately select reliable information while also helping them break out of cynical outlooks may improve the effectiveness of this strategy. Recommending outlets such as the Good News Network that focus on positive stories may also help address the very real mental health concerns that come with spending more time on news.

Abstain from Unreliable Information Spaces

Finally, while it may not always be popular, teaching students why social media platforms are unreliable sources of information is essential. These platforms are “firmly grounded in beliefs about individualism, capitalism and consumerism,” not the pursuit of accuracy (Fister, 2021). Librarians might even encourage students to step away from these platforms when possible and to the extent they feel comfortable. This might mean deliberately limiting or deleting social media accounts, or setting aside phone-free time, which some college students are choosing to do for a variety of other reasons (Beres, 2025). In terms of the habit research discussed above, this is the step at which the triggers for the bad habit are removed from the environment, and it is essential to forming new habits. Helping students recognize which of the platforms they use deliver mostly low-quality information is an information literacy issue.

Conclusion

Media literacy skills are essential for today’s college students, and academic librarians are among the few on campus with the expertise to teach them. However, teaching students quick fact-checking strategies that they must remember and be motivated to use in the moment may not be effective in real-world environments for a variety of reasons, including the power of cognitive biases, the sway of parasocial relationships, the influence of algorithms and generative AI, and the systemic nature of many of these problems. To help students build new habits, we should rely less on willpower and more on proactively shaping information environments that help students feel empowered, informed, and positive (or at least realistic) about the information landscape.

It’s not as quick and easy as teaching a fact-checking strategy, but helping students understand the information landscape and set up a more reliable information environment may have longer-lasting positive impacts than hoping they will adopt habits that are difficult to put into practice. We are clearly facing more cynicism and disengagement from standards-based news and other authoritative information sources than ever before. Even with limited resources, we academic librarians can leverage our expertise to help address this major problem and move students toward a healthier relationship with online information. This foundational shift, from fact-checking individual claims to fostering a more intentional relationship with information, may be among the most valuable lessons college students can learn.


Acknowledgements

I would like to extend my sincere gratitude to editors Ian G. Beilin, Jess Schomberg, and, especially, Brittany Paloma Fiedler, for their invaluable feedback throughout the editing process. I would also like to thank Amber Willenborg for her thoughtful peer review of the manuscript. The input of these reflective, considerate people greatly improved the storytelling and flow of the paper and helped ensure that it was as inclusive as possible. Finally, I would like to thank Andrea Baer for significantly contributing to the ideas behind this manuscript through our engaging, helpful, and inspiring discussions.


Works Cited

Ahmed, S., Masood, M., Deng, R., & Malviya, S. (2025). Why cynics disengage: the nexus of political cynicism, misinformation, and online political participation. Asian Journal of Communication, 35(5), 381-402. https://www.tandfonline.com/doi/pdf/10.1080/01292986.2025.2538142 

Altay, S., De Angelis, A., & Hoes, E. (2024). Media literacy tips promoting reliable news improve discernment and enhance trust in traditional media. Communications Psychology, 2(1), 74. https://www.nature.com/articles/s44271-024-00121-5 

Altay, S., Lyons, B. A., & Modirrousta-Galian, A. (2025). Exposure to higher rates of false news erodes media trust and fuels overconfidence. Mass Communication and Society, 28(2), 301-325.  https://doi.org/10.1080/15205436.2024.2382776 

Ashley, S., Craft, S., Maksl, A., Tully, M., & Vraga, E. K. (2023). Can news literacy help reduce belief in COVID misinformation? Mass Communication and Society, 26(4), 695-719. https://doi.org/10.1080/15205436.2022.2137040 

Association of College & Research Libraries. (2016). Framework for information literacy for higher education. ACRL. https://www.ala.org/acrl/standards/ilframework 

Aufderheide, P. (1993). Media literacy. A report of the national leadership conference on media literacy. Aspen Institute, Communications and Society Program. https://eric.ed.gov/?id=ED365294 

Bailey, J. J. (2021). False beliefs and the illusion of explanatory depth. Journal of Business and Behavioral Sciences, 33(2), 54-64. https://asbbs.org/files/2021-22/JBBS_33.2_Fall_2021.pdf#page=55 

Bargh, J. A., & Barndollar, K. (1996). Automaticity in action. In The psychology of action (pp. 457-481).

Barthel, M., Mitchell, A., & Holcomb, J. (2016, December 15). Many Americans believe fake news is sowing confusion. Pew Research Center. https://www.pewresearch.org/journalism/2016/12/15/many-americans-believe-fake-news-is-sowing-confusion/ 

Beres, D. (2025, November 5). The age of anti-social media is here. The Atlantic. https://www.theatlantic.com/magazine/2025/12/ai-companionship-anti-social-media/684596/ 

Bobkowski, P. S., & Younger, K. (2020). News credibility: Adapting and testing a source evaluation assessment in journalism. College & Research Libraries, 81(5), 822. https://doi.org/10.5860/crl.81.5.822 

Bontridder, N., & Poullet, Y. (2021). The role of artificial intelligence in disinformation. Data & Policy, 3, e32. https://doi.org/10.1017/dap.2021.20 

Borland, R. (2013). Understanding hard to maintain behaviour change: a dual process approach. John Wiley & Sons.

Breakstone, J., McGrew, S., Smith, M., Ortega, T., & Wineburg, S. (2018, March). Why we need a new approach to teaching digital literacy. Phi Delta Kappan, 99(6), 27-32. https://doi.org/10.1177/00317217187624 

Brodsky, J. E., Brooks, P. J., Scimeca, D., Todorova, R., Galati, P., Batson, M., … & Caulfield, M. (2021). Improving college students’ fact-checking strategies through lateral reading instruction in a general education civics course. Cognitive Research: Principles and Implications, 6, 1-18. https://link.springer.com/article/10.1186/s41235-021-00291-4 

Bull, A. C. (2021). Dismantling the evaluation framework. In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2021/dismantling-evaluation/ 

Camarillo, L. A. (2025). Squinting through the dawn of AI: Embedding algorithmic literacy principles in library instruction. ACRL 2025 Proceedings. https://www.ala.org/sites/default/files/2025-03/SquintingThroughtheDawnofAI.pdf 

Cappella, J. N., & Jamieson, K. H. (1996). News frames, political cynicism, and media cynicism. The Annals of the American Academy of Political and Social Science, 546(1), 71-84. https://www.jstor.org/stable/pdf/1048171.pdf 

Caulfield, M. (2019, June 19). SIFT (The four moves). Hapgood. https://hapgood.us/2019/06/19/sift-the-four-moves/ 

Chan, M. (2024). News literacy, fake news recognition, and authentication behaviors after exposure to fake news on social media. New Media & Society, 26(8), 4669-4688. https://doi.org/10.1177/146144482211276 

Chan, M., Vaccari, C., & Yamamoto, M. (2024). Examining the relationship between dispositional news literacy and discernment of real and misleading news: Cross-national evidence. International Journal of Public Opinion Research, 36(2), edae020. https://doi.org/10.1093/ijpor/edae020 

Chung, M. (2025). When knowing more means doing less: Algorithmic knowledge and digital (dis) engagement among young adults. Harvard Kennedy School Misinformation Review. https://misinforeview.hks.harvard.edu/wp-content/uploads/2025/10/chung_algorithmic_literacy_youth_20251013.pdf 

Ciampaglia, G. L., Nematzadeh, A., Menczer, F., & Flammini, A. (2018). How algorithmic popularity bias hinders or promotes quality. Scientific Reports, 8(1), 1-7. https://doi.org/10.1038/s41598-018-34203-2  

Clear, J. (2018). Atomic habits: An easy & proven way to build good habits & break bad ones: tiny changes, remarkable results. Random House Business.

Dai, Y., & Walther, J. B. (2018). Vicariously experiencing parasocial intimacy with public figures through observations of interactions on social media. Human Communication Research, 44, 322–342. https://doi.org/10.1093/hcr/hqy003

Digital Inquiry Group. (n.d.). Teaching lateral reading | Civic online reasoning. Retrieved February 11, 2026, from https://cor.inquirygroup.org/curriculum/collections/teaching-lateral-reading/ 

Duckworth, A., White, R., Matteucci, A., Shearer, A., & Gross, J. (2016). A stitch in time: Strategic self-control in high school and college students. Journal of Educational Psychology, 108(3), 329-341. https://psycnet.apa.org/fulltext/2016-15978-003.pdf 

Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one’s own ignorance. In Advances in Experimental Social Psychology (Vol. 44, pp. 247-296). Academic Press.

Ecker, U. K., Lewandowsky, S., Swire, B., & Chang, D. (2011). Correcting false information in memory: Manipulating the strength of misinformation encoding and its retraction. Psychonomic Bulletin & Review, 18(3), 570-578. https://doi.org/10.3758/s13423-011-0065-1

Eddy, K., Lipka, M., Matsa, K. E., Forman-Katz, N., Liedke, J., St Aubin, C., & Wang, L. (2025, August 20). How Americans view journalists in the digital age. Pew Research Center. https://www.pewresearch.org/journalism/2025/08/20/how-americans-view-journalists-in-the-digital-age/ 

Edgerly, S. (2026). Developing the habit: The socialization of US teens into distinct repertoires of news consumption. Journal of Children and Media, 20(1), 132-150. https://doi.org/10.1177/14648849211012922 

Fister, B. (2021). Lizard people in the library. PIL Provocation Series, 1(1). Project Information Literacy. https://files.eric.ed.gov/fulltext/ED613472.pdf 

Fleming, J. (2014). Media literacy, news literacy, or news appreciation? A case study of the news literacy program at Stony Brook University. Journalism & Mass Communication Educator, 69(2), 146–165. https://doi.org/10.1177/1077695813517885 

Fletcher, R., Andı, S., Badrinathan, S., Eddy, K. A., Kalogeropoulos, A., Mont’Alverne, C., … & Nielsen, R. K. (2025). The link between changing news use and trust: longitudinal analysis of 46 countries. Journal of Communication, 75(1), 1-15. https://academic.oup.com/joc/article/75/1/1/7907139 

Gil de Zúñiga, H., & Diehl, T. (2019). News finds me perception and democracy: Effects on political knowledge, political interest, and voting. New Media & Society, 21(6), 1253-1271. https://journals.sagepub.com/doi/pdf/10.1177/1461444818817548 

Gordon, L. T., & Thomas, A. K. (2017). The forward effects of testing on eyewitness memory: The tension between suggestibility and learning. Journal of Memory and Language, 95, 190-199. https://doi.org/10.1016/j.jml.2017.04.004 

Haider, J., & Sundin, O. (2019). The fragmentation of facts and infrastructural meaning-making: new demands on information literacy. Information Research, 24(4), 24-4. https://papers.ssrn.com/sol3/Delivery.cfm?abstractid=4698023 

Haider, J., & Sundin, O. (2022). Paradoxes of media and information literacy: The crisis of information. Routledge.

Haider, J., & Sundin, O. (2023, September 21). What is infrastructural meaning-making and why do we need it? Information Matters. https://informationmatters.org/2023/09/what-is-infrastructural-meaning-making-and-why-do-we-need-it/ 

Hasell, A., & Halversen, A. (2024). Feeling misinformed? The role of perceived difficulty in evaluating information online in news avoidance and news fatigue. Journalism Studies, 25(12), 1441-1459. https://doi.org/10.1080/1461670X.2024.2345676 

Hewitt, A. (2023). What role can affect and emotion play in academic and research information literacy practices? Journal of Information Literacy, 17(1), 120-137. https://files.eric.ed.gov/fulltext/EJ1393880.pdf 

Hicks, A., & Lloyd, A. (2021). Deconstructing information literacy discourse: Peeling back the layers in higher education. Journal of Librarianship and Information Science, 53(4), 559-571. https://link.springer.com/chapter/10.1007/978-3-030-43687-2_28 

Hobbs, R. (1998). The seven great debates in the media literacy movement. Journal of Communication, 48(1), 16-32. https://mediaeducationlab.com/sites/default/files/Seven_Great_Debates_0.pdf 

Hoffner, C. A., & Bond, B. J. (2022). Parasocial relationships, social media, & well-being. Current Opinion in Psychology, 45, 101306. https://doi.org/10.1016/j.copsyc.2022.101306 

Jaiswal, J., LoSchiavo, C., & Perlman, D. C. (2020). Disinformation, misinformation and inequality-driven mistrust in the time of COVID-19: lessons unlearned from AIDS denialism. AIDS and Behavior, 24(10), 2776-2780. https://doi.org/10.1007/s10461-020-02925-y 

Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(6), 1420-1436. https://psycnet.apa.org/fulltext/1995-04372-001.pdf 

Kavanagh, J., & Rich, M. D. (2018). Truth decay: An initial exploration of the diminishing role of facts and analysis in American public life. https://www.rand.org/pubs/research_reports/RR2314.html 

Kendeou, P., & O’Brien, E. J. (2014). The Knowledge Revision Components (KReC) framework: Processes and mechanisms. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 353–377). MIT Press.

Kim, J., & Song, H. (2016). Celebrity’s self-disclosure on Twitter and parasocial relationships: A mediating role of social presence. Computers in Human Behavior, 62, 570–577. https://doi.org/10.1016/J.chb.2016.03.083

Kurtin, K. S., O’Brien, N., Roy, D., & Dam, L. (2018). The development of parasocial relationships on YouTube. The Journal of Social Media in Society, 7, 233–252.

Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106-131. https://doi.org/10.1177/1529100612451018 

Lin, F., Chen, X., & Cheng, E. W. (2022). Contextualized impacts of an infodemic on vaccine hesitancy: The moderating role of socioeconomic and cultural factors. Information Processing & Management, 59(5), 103013. https://doi.org/10.1016/j.ipm.2022.103013 

Lyons, B. A., Montgomery, J. M., Guess, A. M., Nyhan, B., & Reifler, J. (2021). Overconfidence in news judgments is associated with false news susceptibility. Proceedings of the National Academy of Sciences, 118(23), e2019527118. https://doi.org/10.1073/pnas.2019527118 

Martínez-Costa, M. P., López-Pan, F., Buslón, N., & Salaverría, R. (2023). Nobody-fools-me perception: Influence of age and education on overconfidence about spotting disinformation. Journalism Practice, 17(10), 2084-2102. https://www.tandfonline.com/doi/full/10.1080/17512786.2022.2135128 

Mont’Alverne, C., Badrinathan, S., Ross Arguedas, A., Toff, B., Fletcher, R., & Kleis Nielsen, R. (2022). The trust gap: how and why news on digital platforms is viewed more skeptically versus news in general. Reuters Institute. https://reutersinstitute.politics.ox.ac.uk/trust-gap-how-and-why-news-digital-platforms-viewed-more-sceptically-versus-news-general 

Motta, M., Callaghan, T., & Sylvester, S. (2018). Knowing less but presuming more: Dunning-Kruger effects and the endorsement of anti-vaccine policy attitudes. Social Science & Medicine, 211, 274-281.

Mulcahy, R., Barnes, R., de Villiers Scheepers, R., Kay, S., & List, E. (2025). Going viral: Sharing of misinformation by social media influencers. Australasian Marketing Journal, 33(3), 296-309.

Muraven, M. (2012). Ego depletion: Theory and evidence. In The Oxford handbook of human motivation (pp. 111–126). Oxford University Press.

News Literacy Project (2024). News literacy in America: A survey of teen information attitudes, habits and skills. NLP-Teen-Survey-Report-2024.pdf

News Literacy Project (2025). “Biased,” “boring” and “bad”: Unpacking perceptions of news media and journalism among U.S. teens. NLP-Teens-and-News-Media-Report-2025.pdf

Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175-220. https://doi.org/10.1037/1089-2680.2.2.175 

Oreskes, N., & Conway, E. M. (2010). Defeating the merchants of doubt. Nature, 465(7299), 686-687.

Oswald, M. E., & Grosjean, S. (2004). Confirmation bias. Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory. Psychology Press.

Pariser, E. (2011). The filter bubble: How the new personalized web is changing what we read and how we think. Penguin.

Park, S., Fisher, C., Fletcher, R., Tandoc Jr, E., Dulleck, U., Fulton, J., … & Yao, S. P. (2025). Exploring responses to mainstream news among heavy and non-news users: From high-effort pragmatic skepticism to low effort cynical disengagement. New Media & Society, 27(7), 4143-4163. https://journals.sagepub.com/doi/pdf/10.1177/14614448241234916 

Peng, R. X., & Shen, F. (2025). Why fall for misinformation? Role of information processing strategies, health consciousness, and overconfidence in health literacy. Journal of Health Psychology, 30(8), 2030-2045. https://doi.org/10.1177/13591053241273647

Pennycook, G., & Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39-50. https://doi.org/10.1016/j.cognition.2018.06.011 

Pew Research Center. (2024, June 15). Most Black Americans believe U.S. institutions were designed to hold Black people back. https://www.pewresearch.org/race-and-ethnicity/2024/06/15/most-black-americans-believe-u-s-institutions-were-designed-to-hold-black-people-back 

Pfänder, J., & Altay, S. (2025). Spotting false news and doubting true news: a systematic review and meta-analysis of news judgements. Nature Human Behaviour, 9(4), 688-699. https://www.nature.com/articles/s41562-024-02086-1 

Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perceptions of bias in self versus others. Personality and Social Psychology Bulletin, 28(3), 369-381. https://doi.org/10.1177/0146167202286008

Raffio, N. (2024, October 28). Trust in voting: How misinformation threatens democracy. USC Today. https://today.usc.edu/trust-in-voting-how-misinformation-threatens-democracy/ 

Ross Arguedas, A., Robertson, C., Fletcher, R., & Nielsen, R. (2022). Echo chambers, filter bubbles, and polarisation: A literature review. The Royal Society. https://doi.org/10.60625/risj-etxj-7k60 

Royzman, E. B., Cassidy, K. W., & Baron, J. (2003). “I know, you know”: Epistemic egocentrism in children and adults. Review of General Psychology, 7(1), 38-65. https://doi.org/10.1037/1089-2680.7.1.38 

Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521-562. https://doi.org/10.1207/s15516709cog2605_1

Saunders, L. (2025). Information literacy as part of an interdisciplinary approach to combat misinformation. Information Research: An International Electronic Journal, 30(CoLIS), 424-442. https://publicera.kb.se/ir/article/download/52318/43437

Schirmer, M., Walter, N., & Horvát, E. Á. (2025). Disparities by design: Toward a research agenda that links science misinformation and socioeconomic marginalization in the age of AI. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-178 

Seifert, C. M. (2014). The continued influence effect: The persistence of misinformation in memory and reasoning following correction. In D. N. Rapp & J. L. G. Braasch (Eds.), Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences (pp. 39–71). MIT Press.

Skau, M. (2026, February 24). AI challenges our relationship with truth. Media Education Lab. Retrieved March 2, 2026, from https://mediaeducationlab.com/index.php/blog/ai-challenges-our-relationship-truth

Skurka, C., Cheng, Z., Goyanes, M., & Gil de Zúñiga, H. (2026). News Finds Me as the illusion of competence: evidence for overconfidence in discernment of political misinformation. Human Communication Research, 52(1), 11-23. https://doi.org/10.1093/hcr/hqaf015

Sloman, S., & Fernbach, P. (2018). The knowledge illusion: Why we never think alone. Penguin.

Spampatti, T. (2025). Truth discernment may not help to overcome misinformation. Nature Climate Change, 15(10), 1006-1009. https://www.nature.com/articles/s41558-025-02426-7 

Stecula, D. A. (2025). Getting misinformation wrong: Why context fixes can’t solve structural problems [white paper]. University of Delaware Biden School of Public Policy & Administration; Stavros Niarchos Foundation Ithaca Initiative. https://udspace.udel.edu/server/api/core/bitstreams/3cac95b7-d97c-4929-8ec9-e88cbc76470a/content

Stein, R., Rutchick, A. M., Sin, A. Y., & Jarrin Rueda, L. F. (2025). Symbolic show of strength: a predictor of risk perception and belief in misinformation. The Journal of Social Psychology, 1-27. https://doi.org/10.1080/00224545.2025.2541206 

Sullivan, M. C. (2019). Why librarians can’t fight fake news. Journal of Librarianship and Information Science, 51(4), 1146-1156. https://doi.org/10.1177/0961000618764258 

Thi, P. V., & Ibrahim, A. (2025). Influencer credibility and authenticity in the fight against misinformation. Feedback International Journal of Communication, 2(3), 205-215. https://doi.org/10.62569/fijc.v2i3.199 

Thorson, E. (2016). Belief echoes: The persistent effects of corrected misinformation. Political Communication, 33(3), 460-480. https://repository.upenn.edu/bitstreams/fde2b15d-38dd-4d96-9205-6ca7bfb356e2/download 

Valgarðsson, V., Jennings, W., Stoker, G., Bunting, H., Devine, D., McKay, L., & Klassen, A. (2025). A Crisis of Political Trust? Global Trends in Institutional Trust from 1958 to 2019. British Journal of Political Science, 55, e15. https://doi.org/10.1017/S0007123424000498 

van der Meer, T. G., & Hameleers, M. (2022). I knew it, the world is falling apart! Combatting a confirmatory negativity bias in audiences’ news selection through news media literacy interventions. Digital Journalism, 10(3), 473-492. https://doi.org/10.1080/21670811.2021.2019074 

van der Meer, T. G., Hameleers, M., & Ohme, J. (2023). Can fighting misinformation have a negative spillover effect? How warnings for the threat of misinformation can decrease general news credibility. Journalism Studies, 24(6), 803-823. https://doi.org/10.1080/1461670X.2023.2187652

Wack, M., Duskin, K., & Hodel, D. (2024). Political fact-checking efforts are constrained by deficiencies in coverage, speed, and reach. arXiv preprint arXiv:2412.13280.

Walter, N., Cohen, J., Holbert, R. L., & Morag, Y. (2020). Fact-checking: A meta-analysis of what works and for whom. Political Communication, 37(3), 350-375. https://doi.org/10.1080/10584609.2019.1668894 

Westlund, O., Belair-Gagnon, V., Graves, L., Larsen, R., & Steensen, S. (2024). What is the problem with misinformation? Fact-checking as a sociotechnical and problem-solving practice. Journalism Studies, 25(8), 898-918. https://www.tandfonline.com/doi/pdf/10.1080/1461670X.2024.2357316 

Willenborg, A., & Detmering, R. (2025). “I don’t think librarians can save us”: The material conditions of information literacy instruction in the misinformation age. College & Research Libraries, 86(4), 534. https://doi.org/10.5860/crl.86.4.535

Wineburg, S. & McGrew, S. (2019). Lateral reading and the nature of expertise: Reading less and learning more when evaluating digital information. Teachers College Record: The Voice of Scholarship in Education, 121(11): 1-40. https://doi.org/10.1177/016146811912101102 

Wood, W., Labrecque, J. S., Lin, P. Y., & Rünger, D. (2014). Habits in dual process models. In Dual process theories of the social mind (pp. 371–385).

York, C., & Scholl, R. M. (2015). Youth antecedents to news media consumption: Parent and youth newspaper use, news discussion, and long-term news behavior. Journalism & Mass Communication Quarterly, 92(3), 681-699. https://doi.org/10.1177/1077699015588191 

Zhou, Y., & Shen, L. (2024). Processing of misinformation as motivational and cognitive biases. Frontiers in Psychology, 15, 1430953. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1430953/pdf 

by Mandi Goodsett at May 27, 2026 06:54 PM

Open Knowledge Foundation

We are learning AI by doing – OKFN Newsletter May 2026

AI Learning Labs, neurotech, public data, Open Technology Symposium, and more.

The post We are learning AI by doing – OKFN Newsletter May 2026 first appeared on Open Knowledge Blog.

by OKFN at May 27, 2026 03:13 PM

May 26, 2026

David Rosenthal

Wrench Attacks

XKCD #538
A year ago I wrote The Risks Of HODL-ing, sparked by Mitch Moxley's They Stole a Quarter-Billion in Crypto and Got Caught Within a Month. Moxley recounts the kidnapping of Veer Chetal's parents to persuade him to hand over his share of the loot:
the Lamborghini was suddenly rammed from behind by a white Honda Civic. At the same time, a white Ram ProMaster work van cut in front, trapping the Chetals. According to a criminal complaint filed after the incident, a group of six men dressed in black and wearing masks emerged from their vehicles and forced the Chetals from their car, dragging them toward the van’s open side door.
Below the fold I look at Bloomberg updates from last week on why the crypto-bros are having to spend vast sums on defending against the threat of HODL-ing.

First, Emily Nicolle's Crypto High-Rollers Go Big on Bodyguards to Deter Kidnappers reports on the aftermath of a serious security breach at Coinbase:
Coinbase has said that the leak affected less than 1% of its monthly transacting users. Yet for months, criminals had access to customer data that included their names, addresses, government-ID imagery, transaction history and account balances. Customer support workers in India were bribed to offer access to the company’s data.

Criminals have already used the information to trick some Coinbase customers into handing over access to their accounts or transferring their tokens. As with data leaks from traditional banks, personal information can be used for online fraud and identity theft. But the physical threats are of particular concern to crypto investors, many of whom have long operated anonymously to avoid threats.
The "crypto investors" who "operated anonymously" should have paid attention to the technology for deanonymization. Ghosh and Lee report:
In most documented cases, attackers have identified marks in advance. Public blockchain records, leaked exchange data and chain-analytic tools — available to both investigators and criminals — together produce a legible map of who holds what.
Because Coinbase is a US-based exchange, the investors had to undergo KYC/AML:
The concerns about physical safety have come to the fore after the Coinbase attack because the hackers who penetrated the cryptocurrency exchange gained access to data that could allow them to identify and track down customers with large holdings — a frightening prospect just a few days after the kidnapping attempt in France.
So the exchange had to have their personal information, which put their safety in the hands of Coinbase's employees and systems. But not to worry, because Coinbase is a high-tech company:
The industry’s massive investments in protecting online systems may even be fueling the offline risks. Rapid crypto innovation has meant cracking cyber defences has become so challenging that adversaries are resorting to physical attacks, according to Charles Marino, CEO of the security firm Sentinel, which provides intelligence reports about ongoing threats in the crypto industry.
The "industry’s massive investments" clearly didn't prevent low-paid "customer support workers in India" having access to personal information that placed their customers' lives at risk.

But it isn't the customers' safety that Coinbase is worried about:
The elevated concerns around the safety of crypto executives and their loved ones are illustrated by the amount of money that Coinbase spends to protect its own chief executive officer, Brian Armstrong.

The company spent $6.2 million in personal security costs for Armstrong last year, according to an April regulatory filing that detailed executive compensation. That’s more than the combined amount that JPMorgan Chase & Co., Goldman Sachs Group Inc. and Nvidia Corp. spent on their respective CEOs, similar filings showed.
Second, Suvashree Ghosh and Isabelle Lee report that Crypto Crime Escalates With Kidnappings, Cons and Human Coercion:
After a year of kidnappings, assaults and armed home invasions targeting cryptocurrency holders, the industry is racing to harden its defenses.

Conferences are beefing up security. Private firms serving crypto holders say demand has surged. Exchanges are protecting their executives.
...
The technology’s defining transparency, which its adherents have long celebrated as a structural improvement on the opaque plumbing of traditional finance, is the same feature that lets a criminal identify a target.
Be careful what you wish for. It is possible to maintain anonymity (or rather pseudonymity) despite the transparency of the infrastructure, but doing so requires an extraordinary level of operational security. You are in hand-to-hand combat with North Korean hackers, not to mention "the Com" and other assorted criminals.

Attacks by Month
And the bad guys are rampant:
Physical attacks on cryptocurrency holders rose 75% in 2025, reaching 72 confirmed incidents and $41 million in known losses, according to data compiled by the blockchain security firm CertiK. The figure is widely considered understated, with kidnappings and ransom demands often resolved privately. Jameson Lopp, co-founder of the Bitcoin custody firm Casa, maintains a separate public database that has tracked a roughly threefold increase of known so-called wrench attacks between 2023 and 2025.
Check out Jameson Lopp's Known Physical Bitcoin Attacks. He has a job for life.

The mantra of the hard-core crypto-bros is "not your keys, not your coins", meaning that for safety you shouldn't trust exchanges to hold your coins. But the whales are finding that they need to trust physical vaults:
The founder of a large crypto protocol said he has moved his digital-asset holdings out of self-custody on-chain wallets and into physical vaults at four separate institutions, splitting his crypto across them as an additional safeguard. Each requires him to physically sign and wait through a seven-day lock period before any withdrawal. To access the full sum now takes him a month. He declined to be named, citing the risk of being identified to kidnappers.
As Nicholas Weaver writes:
If Bitcoin is the "Internet of money," what does it say that it cannot be safely stored on an Internet connected computer?
Nakamoto's goal was trustlessness, but:
Crypto’s founding proposition was that financial sovereignty could be restored to individuals by removing intermediaries and anchoring wealth to cryptographic keys rather than institutional relationships. That proposition has held. The consequence is that the keys — and the people who hold them — are now the single point of failure. There is no bank branch to call and no regulator to appeal. A stolen key is a final transaction.

“Criminals follow where they believe the money is,” said Healy. “And many crypto-affiliated individuals combine significant wealth with a uniquely difficult threat landscape.”
Louis Ashworth follows up with A bitcoin miner spent $860k armouring vehicles for its bosses, a fact for which we have very little context:
Here are two notes from the latest DEF14A compensation tables for MARA Holdings, the bitcoin miner (our emphases):
(3) Amount reflects costs related to personal security for Mr. Thiel pursuant to MARA’s security program ($4,300,629), including a one-time expense for vehicle armoring ($430,780) and a one-time expense for home security installation ($58,810); the incremental cost to the Company associated with Mr. Thiel’s personal use of Company aircraft ($43,114); and a Company contribution under our 401(k) plan ($10,500)

(6) Amount reflects costs related to personal security for Mr. Khan pursuant to MARA’s security program ($3,946,398), including a one-time expense for vehicle armoring ($438,380) and a Company contribution under our 401(k) plan ($10,500).
We’ve never seen “vehicle armoring” disclosed as a perk before, and $869,160 of across two executives is quite a lot.
MARA noted:
As a result of the Company’s substantial and publicly disclosed bitcoin holdings, our executives face an elevated and distinctive threat profile that differs materially from that of executives at most other public companies. Our CEO, CFO and other employees have experienced, and continue to experience, direct security threats.
But this security is actually a good thing because across the entire cryptosphere there may be around $100M/year being siphoned from cryptocurrency users to lubricate the real economy of security companies, personal armored vehicles and bodyguards. Not to mention maybe half that being siphoned from HODL-ers via criminals to Lamborghini dealers. In the face of the looming recession, every bit of spending in the K-shaped economy helps to boost GDP.

by David. (noreply@blogger.com) at May 26, 2026 03:00 PM

Harvard Library Innovation Lab

Introducing the Law Skills Hub

A trusted advisor, someone with decades of experience, can help with both small things and big things. Often, the small things come first. Getting the structure of a document right, or unsticking an awkward passage, can clear space for the deeper thinking that follows.

The procedural knowledge of an experienced advisor lives in the space between what they say and what they ask, what they cross out and what they leave, what they teach explicitly and what they only ever model.

A great deal of writing, reformatting, and thinking-through is now happening inside AI agents. The agents are general by design. They start from an average of the public web, which means a student asking one to “fix my résumé” gets an average résumé. An advisor’s twenty years of experience is nowhere in that exchange.

We built the Law Skills Hub to see if it was possible to capture, preserve, and share relevant procedural expertise with others and with agents, to empower more meaningful work.

What it is

The Law Skills Hub is a curated, openly licensed collection of agentic skills. Skills are small, structured documents that a user can install in an AI agent, so that the agent has a procedure to follow, not just a prompt to react to. Each skill in the hub has been written or vetted by LIL, published in plain Markdown, and kept under public version control. You can find the hub at lil.law.harvard.edu/lawskills-hub and the underlying repository at github.com/harvard-lil/lawskills-hub. A human can read a skill like a recipe. An AI tool can read it too.
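To make "an AI tool can read it too" concrete, here is a minimal Python sketch of how a tool might read a skill file. It assumes the layout described at agentskills.io as we understand it: a Markdown file (conventionally `SKILL.md`) that opens with a `---`-delimited frontmatter block holding at least `name` and `description`, followed by free-form Markdown instructions. The parser only handles flat `key: value` lines rather than full YAML, and the example skill is hypothetical, not taken from the hub.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """A parsed skill: metadata from the frontmatter plus the Markdown body."""
    metadata: dict
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill document into frontmatter fields and Markdown instructions.

    Assumes the file starts with a '---' line, then flat 'key: value' lines,
    then a closing '---' line. Nested YAML is not supported.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' frontmatter block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter block is not closed") from None
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            metadata[key.strip()] = value.strip()
    for required in ("name", "description"):
        if required not in metadata:
            raise ValueError(f"missing required field: {required}")
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(metadata, body)


# Hypothetical skill, written for illustration only.
EXAMPLE = """---
name: rubric-feedback-formatter
description: Reorganize instructor comments so each one maps to a rubric criterion.
---
# Steps
1. Ask the user for the rubric and the raw comments.
2. Group each comment under the criterion it addresses.
3. Do not write new evaluative content; only reorganize.
"""

skill = parse_skill(EXAMPLE)
print(skill.metadata["name"])       # rubric-feedback-formatter
print(skill.body.splitlines()[0])   # # Steps
```

Because the body stays plain Markdown, the same file can be read on GitHub before anyone installs it, which is the property the hub relies on.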

Screenshot of the Law Skills Hub, highlighting the "Law Professor" or instructional category.

A skill is the codification of a process, a checklist of sorts for how to coach a student writing a public-interest resume, how to scaffold a syllabus around evidence-based learning, how to reformat instructor feedback so it tracks the rubric. The skill carries the steps, the values to check against, the templates the expert would reach for, and the things they would not do. A skill does not replace expertise. It tries to preserve and apply process.

We are launching with a small set of skills already in production, several more in progress, and a contributor guide for anyone who wants to add to the collection.

Why now

Three things became clear over the last several months, working with faculty, with Career Services, and inside the lab.

The first is that AI companies are starting to converge on a standard for agent skills. Anthropic and OpenAI have agreed on a common format and set of capabilities, which you can learn about at agentskills.io.

The second is that, like most things, it is best to meet users where they are. Many people are working in agent software to create and improve knowledge work. People are not, as a rule, going to abandon the agent and come to a library website or use a bespoke tool. If our procedural knowledge is going to be useful, it must travel into the agent the user is already in and ideally into more than one of them, because the agents are interchangeable and people switch between them.

The third is that there is a growing informal economy of skills shared in zip files, gists, and Discord threads. A non-technical user downloading one of these has no easy way to know what is inside, what values it encodes, or whether the code it runs has been read by anyone they trust. Some of those skills are excellent. Some of them quietly do things their users would not endorse.

We think there is room for a different kind of hub: one grounded in stewardship and reliability.

A Harvard Law–branded hub on the harvard.edu domain, with skills published as readable Markdown rather than zipped bundles, is our attempt to address both problems at once. The address tells you where the software comes from. The format lets you read it before you run it. We’ve also created “meta skills” that let people install one skill that helps them discover and install other skills based on their interests.

What we won’t do

Replace human cognition.

The hub has a clear scope, and the contributor guide names it. We are not building skills that produce essays, exam answers, or thought labor on the user’s behalf. The skills we publish coach, reformat, and scaffold—they presume the user has the source material, the question, the work, and that what they want help with is the procedural part. The mechanical, the administrative, the templated.

This is a values boundary, not a technical one. We are a library, and our work has always been about making people more capable of their own thinking, not less.

A librarian’s framing

There is an older form on campus this hub is descended from, even if it isn’t always recognized. The library guide, or LibGuide, is the genre librarians have used for a long time to compact the things people keep asking about, the workflows experts reach for, the curated path through a subject. A skill, in our reading, is a LibGuide an agent can execute.

This frames the work for us in a way we have found useful. We are not, primarily, building software. We are doing something closer to journalism, or to archival fieldwork—sitting with experienced practitioners, recording what they do and how they do it, and turning that record into a document a future user (human or otherwise) can consult. The output happens to be machine-readable.

Not every workflow wants to be a skill. Some procedural knowledge is inseparable from the relationship in which it is taught, and writing it down would flatten it. Part of the work is knowing the difference.

What remains uncertain

We do not yet know how far this approach scales, and we want to say so plainly.

We do not know how large a skill can be before its consistency degrades. Résumé coaching is a useful test case: the work for a private-sector clerkship and the work for a public-interest fellowship genuinely diverge. We are running both as a single skill with branching, and as two specialized skills, and we do not yet know which will produce better outcomes at scale.

We do not know how portable skills are across disciplines. A faculty-feedback skill that works for a 1L torts course may or may not work in a humanities seminar or a wet-lab science. We suspect some skills are portable and some are deeply local; we cannot yet tell you which are which.

These are open questions, not rhetorical ones. The hub is a hypothesis, and the next year of work is testing it.

An invitation

For now, we have a basic site, a public repository, and a small but growing set of example skills. We are continuing to refine what is already there, add new skills, and learn where the approach holds and where it begins to fray.

We are hoping to talk with more people who are willing to share procedural knowledge with us. Sometimes that means a formal contribution. Sometimes it means an issue or a pull request. Sometimes it just means a conversation where we record how someone thinks through a recurring task, what they notice, what they warn against, and what they have learned.

If you are an institution thinking about something like this on your own campus, we would rather collaborate than duplicate. The hub’s value grows if other libraries are stewarding their own skills.

We’d love for you to take part.

by Jenevieve Haggard at May 26, 2026 12:00 AM

May 24, 2026

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2026-05-24: Paper Summary: Context-Based URL Classification for Open Access Datasets and Software in Scholarly Documents


Figure 1: EnSU Architecture. This is Figure 2 in our paper.

This blog post summarizes our paper, "Context-Based URL Classification for Open Access Datasets and Software in Scholarly Documents," (preprint) published in the 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’25). Many scholarly papers include URLs that point to open-access datasets and software (OADS), but the URL string alone rarely tells us what the link refers to. In this paper, we present EnSU, an ensemble of three complementary models for classifying these URLs using the surrounding citation context. EnSU assigns each URL to one of six categories that jointly reflect both the resource type (dataset vs. software) and resource provider (authors vs. third parties), plus two catch-all categories for projects and general links. On our OADS-1K dataset, EnSU achieves a macro-average F1-score of up to 0.90 on a stratified 80/20 split and a mean macro-average F1-score of 0.89 across five-fold cross-validation. We also report that EnSU outperforms the best single-model classifier by 20%.

Introduction

Computational reproducibility depends on being able to access the same data and software used in a published study [1]. In practice, authors often share or cite such resources through URLs embedded in the paper text. These links can be valuable evidence for tracking and preserving OADS, but they are also challenging to index at scale.

A core obstacle is that URLs are semantically underspecified: a repository URL might host code, data, both, or something else, and the intended meaning is often expressed only in the nearby prose. We argue that moving from coarse URL detection to fine-grained classification is important for better metadata and discoverability, including distinguishing whether a resource is contributed by the paper’s authors or reused from elsewhere.

Problem Statement: What "Context-based URL Classification" Means

In this work, context-based URL classification means that given a URL that appears in a scholarly document, we classify it using the citation context around the URL, not the URL string alone.

Expanded Context Representation

We represent a URL’s textual context as a three-sentence window: the sentence immediately before the URL’s sentence, the target sentence that contains the URL, and the sentence immediately after.

If the preceding or trailing sentence is missing (for example, at a paragraph boundary), the “expanded context” reduces to the sentences that exist.
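The windowing rule can be sketched in a few lines of Python. This assumes the paragraph has already been split into sentences and that the index of the URL-bearing sentence is known; the function name and joining the window with single spaces are our assumptions, not details from the paper.

```python
def expanded_context(sentences: list, target_idx: int) -> str:
    """Return the preceding, target, and trailing sentences as one string.

    At a paragraph boundary the missing neighbour is simply omitted, so the
    window shrinks to the sentences that exist.
    """
    if not 0 <= target_idx < len(sentences):
        raise IndexError("target_idx is outside the paragraph")
    start = max(0, target_idx - 1)               # preceding sentence, if any
    end = min(len(sentences), target_idx + 2)    # trailing sentence, if any
    return " ".join(sentences[start:end])


# Hypothetical paragraph; the URL is a placeholder.
paragraph = [
    "We evaluate on three benchmarks.",
    "Our code is available at https://github.com/example/repo.",
    "All experiments use a single GPU.",
]
print(expanded_context(paragraph, 1))      # full three-sentence window
print(expanded_context(paragraph[1:], 0))  # URL sentence opens the paragraph
```

The same string can serve as the concatenated input for a transformer classifier such as the SciBERT model described below.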

Dataset: OADS-1K

For training and evaluation, we compile OADS-1K, which contains 1,129 manually annotated samples. Each sample includes a URL-containing target sentence together with its expanded context. The annotation process considered six categories, listed below.

Output Labels (Six Categories)

We classify each URL into one of six categories: 

  1. Third-Party Dataset: points to a dataset hosted by someone other than the paper’s authors.
  2. Third-Party Software: points to software, tools, or code hosted by someone other than the paper’s authors.
  3. Author Provided Dataset: points to a dataset created and shared by the paper’s authors. 
  4. Author Provided Software: points to software, tools, or code created and shared by the paper’s authors.
  5. Project: points to a project website or repository that contains both data and software/tools.
  6. General URL: points to something other than a dataset, software/tool, or project.
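The first four categories combine two facets, resource type and resource provider, while the last two sit outside that grid. One way to make that structure explicit in code is an enum whose members expose both facets; the class, attribute, and helper names here are illustrative, not taken from the EnSU code.

```python
from enum import Enum
from typing import Optional, Tuple


class UrlCategory(Enum):
    """The six URL labels used in OADS-1K."""
    THIRD_PARTY_DATASET = "Third-Party Dataset"
    THIRD_PARTY_SOFTWARE = "Third-Party Software"
    AUTHOR_PROVIDED_DATASET = "Author Provided Dataset"
    AUTHOR_PROVIDED_SOFTWARE = "Author Provided Software"
    PROJECT = "Project"
    GENERAL_URL = "General URL"

    @property
    def facets(self) -> Tuple[Optional[str], Optional[str]]:
        """(resource type, provider); (None, None) for the two catch-all classes."""
        return _FACETS.get(self, (None, None))


# Only the first four labels decompose into resource type x provider.
_FACETS = {
    UrlCategory.THIRD_PARTY_DATASET: ("dataset", "third-party"),
    UrlCategory.THIRD_PARTY_SOFTWARE: ("software", "third-party"),
    UrlCategory.AUTHOR_PROVIDED_DATASET: ("dataset", "author"),
    UrlCategory.AUTHOR_PROVIDED_SOFTWARE: ("software", "author"),
}


def author_contributed(category: UrlCategory) -> bool:
    """True when the label says the paper's own authors shared the resource."""
    return category.facets[1] == "author"


print(UrlCategory.AUTHOR_PROVIDED_SOFTWARE.facets)  # ('software', 'author')
print(UrlCategory.PROJECT.facets)                   # (None, None)
```

Keeping the facets separate from the label lets downstream metadata ask provider-only questions without re-listing category names.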


A graph with blue bars  AI-generated content may be incorrect.
Figure 2: Distribution of Samples Across Different Subject Categories from CORD-19, ETD, and arXiv in OADS-1K. This is Figure 1 in our paper.

We build OADS-1K from 1,574 scholarly documents published between 2016 and 2022, drawn from three publicly available sources: CORD-19 [3], Electronic Theses and Dissertations (ETDs) [2], and arXiv [4]. The sampling prioritizes documents that contain at least two URLs. We note that the resulting set contains many biomedical and computer science scholarly documents, which is consistent with the underlying corpora and with the prevalence of data and software links in those fields.
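The summary says sampling prioritizes documents with at least two URLs but does not spell out the mechanism. The sketch below shows one possible reading: count distinct URLs per document and put documents that meet the threshold ahead of the rest, keeping input order within each group. The regular expression is a deliberately simple assumption and will miss or over-match some URLs.

```python
import re

# Simplistic URL pattern (assumption): an http(s) scheme up to whitespace or a closing bracket.
URL_RE = re.compile(r"https?://[^\s)\]>]+")


def count_urls(text: str) -> int:
    """Count distinct URLs, ignoring trailing sentence punctuation."""
    return len({m.rstrip(".,;:") for m in URL_RE.findall(text)})


def prioritize(docs: dict, min_urls: int = 2) -> list:
    """Return document IDs with at least `min_urls` URLs first, then the rest."""
    counts = {doc_id: count_urls(text) for doc_id, text in docs.items()}
    preferred = [d for d in docs if counts[d] >= min_urls]
    fallback = [d for d in docs if counts[d] < min_urls]
    return preferred + fallback


# Hypothetical documents.
docs = {
    "a": "See https://example.org/data.",
    "b": "Code: https://github.com/x/y and data: https://zenodo.org/record/1.",
    "c": "No links here.",
}
print(prioritize(docs))  # ['b', 'a', 'c']
```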

Manual Extraction and URL Context Normalization

We extract contexts by visually inspecting each PDF and recording the target sentence and its expanded context. While many URLs appear inline, others show up in footnotes or reference sections. In those cases, we first substitute the citation marker with the full footnote or reference entry (including the URL), and then extract the surrounding sentences.
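The authors did this substitution by hand. If one wanted to automate it, a minimal sketch might look like the following. It assumes numeric bracketed markers such as `[12]` and an already-extracted mapping from marker number to note text; both are hypothetical simplifications, since real PDFs also use superscripts and author-year citations, and wrapping the note in parentheses is our choice.

```python
import re

MARKER_RE = re.compile(r"\[(\d+)\]")  # assumed marker style: [12]


def inline_footnotes(sentence: str, notes: dict) -> str:
    """Replace markers whose note contains a URL with the full note text.

    Markers whose note is unknown or has no URL are left untouched.
    """
    def substitute(match) -> str:
        note = notes.get(int(match.group(1)))
        if note and "http" in note:
            return f"({note})"
        return match.group(0)

    return MARKER_RE.sub(substitute, sentence)


# Hypothetical notes keyed by marker number.
notes = {3: "Dataset available at https://example.org/dataset", 4: "Smith et al., 2019."}
print(inline_footnotes("We use the corpus [3] as in prior work [4].", notes))
```

After substitution, the expanded-context window can be extracted as if the URL had appeared inline.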

Annotation Process

Two graduate student annotators label all samples and reach 92% consensus. When they disagree, a third annotator with relevant expertise helps adjudicate.
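The 92% figure reads most naturally as simple percent agreement (the share of samples on which both annotators chose the same label); the summary does not say whether a chance-corrected statistic such as Cohen's kappa was also computed. A minimal sketch of the agreement calculation and of routing disagreements to a third annotator, using made-up labels and a hypothetical `adjudicate` callback:

```python
def agreement_and_conflicts(a: list, b: list):
    """Return percent agreement between two annotators and the indices needing adjudication."""
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same, non-empty set of samples")
    conflicts = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    agreement = 1 - len(conflicts) / len(a)
    return agreement, conflicts


def resolve(a: list, b: list, adjudicate) -> list:
    """Keep agreed labels; ask the adjudicator (a callable on the sample index) otherwise."""
    return [x if x == y else adjudicate(i) for i, (x, y) in enumerate(zip(a, b))]


# Made-up labels for four samples.
ann1 = ["dataset", "software", "project", "general"]
ann2 = ["dataset", "software", "general", "general"]
agreement, conflicts = agreement_and_conflicts(ann1, ann2)
print(agreement, conflicts)  # 0.75 [2]
```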

When the target sentence and expanded context do not provide enough information to determine the category, annotators are instructed to follow the URL and inspect the linked content. We give an example where the context suggests the link is a dataset repository but does not reveal whether it is author-provided or third-party; the annotators then cross-reference paper authors with repository contributors to decide the final label.

Class Distribution and Examples

Table 1: Examples of URLs with target sentences and expanded contexts for each URL category. This is Table 1 in our paper. In this table, we represent the preceding sentence with <preceding>...</preceding>, the sentence containing the URL with <target>...</target>, and the trailing sentence with <trailing>...</trailing>

Table 1 provides both class proportions and representative examples. The dataset contains all six categories, but the "Project" class is notably smaller than the others.

Method

The central design choice is to ensemble complementary models rather than rely on a single classifier. We motivate this by noting that URL contexts can be subtle and that different modeling choices capture different signals.

We build EnSU, an ensemble of three classifiers: 

  1. Supervised Contrastive Learning (SCL) [6] classifier, 
  2. SciBERT-based classifier, 
  3. BertGCN [5] classifier 

We then combine their predictions through majority voting, with a deterministic tie-breaking rule (see Fig. 1). If two models agree on the category, we take that shared label. If all three disagree, we output the BertGCN prediction as it is the strongest individual model among the three.
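The voting rule is short enough to state directly. A minimal sketch, assuming each component has already produced one label per URL; the function name is hypothetical.

```python
from collections import Counter


def ensemble_vote(scl_label, scibert_label, bertgcn_label):
    """Majority vote over the three classifiers, with BertGCN as the tie-breaker."""
    label, count = Counter([scl_label, scibert_label, bertgcn_label]).most_common(1)[0]
    if count >= 2:
        return label  # at least two models agree on this label
    return bertgcn_label  # three-way disagreement: trust the strongest single model
```

With three voters and a fixed fallback, the rule is deterministic: every input yields exactly one label.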

1) Supervised Contrastive Learning (SCL) Model

The SCL component is motivated by data scarcity. In addition to the standard cross-entropy objective, we use supervised contrastive learning, which encourages representations of same-class examples to be closer in embedding space than representations from different classes.

In practice, we start from a pretrained encoder and optimize a weighted mix of cross-entropy and supervised contrastive objectives. We also discuss the temperature term in the contrastive loss, which influences how sharply the model separates hard negatives.
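A minimal sketch of the combined objective, assuming the formulation of Gunel et al. [6]: L = (1 − λ)·L_CE + λ·L_SCL, where L_SCL pulls same-class embeddings together using temperature-scaled cosine similarities. The values λ = 0.9 and τ = 0.3 are placeholders, not the settings used for EnSU, and this version averages the contrastive term over anchors instead of summing it. A real implementation would use tensors in a deep-learning framework; plain Python is used here only to show the arithmetic.

```python
import math

# Hypothetical helpers; lam=0.9 and tau=0.3 are placeholder hyperparameters.

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, tau=0.3):
    """Supervised contrastive loss, averaged over anchors that have a same-class partner."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity of anchor i to every other example.
        sims = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau
                for k in range(n) if k != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Mean negative log-probability assigned to each same-class example.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the gold class under the predicted probabilities."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def combined_loss(probs, embeddings, labels, lam=0.9, tau=0.3):
    """(1 - lam) * cross-entropy + lam * supervised contrastive loss."""
    return ((1 - lam) * cross_entropy(probs, labels)
            + lam * supcon_loss(embeddings, labels, tau))
```

Lowering τ sharpens the softmax over similarities, so the hardest negatives (different-class examples sitting closest to the anchor) dominate the denominator, which is the effect of the temperature term mentioned above.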

In the experiments, we compare several pretrained encoders within the SCL framework and select SPECTER because it outperformed other language models, such as BERT, SciBERT, and DistilBERT, in context-based URL classification (Table 2).

2) SciBERT-based Model

The SciBERT model is a conventional transformer classifier: we fine-tune SciBERT and add a linear classification head to predict the six URL categories from the concatenated context input.

3) BertGCN Model

BertGCN augments a BERT-style encoder with a graph convolutional network (GCN) over a corpus-level graph. We build a graph containing document nodes (one per OADS-1K sample), word nodes (vocabulary terms), word-word edges weighted by PPMI (positive pointwise mutual information), and document-word edges weighted by TF-IDF.
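The two edge types can be computed from tokenized documents roughly as in TextGCN, the graph construction BertGCN builds on. A minimal sketch, assuming already-tokenized inputs and a deliberately small sliding window; the window size and function names are hypothetical, and a real pipeline would also filter stopwords and rare terms.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical helpers; window=3 is an illustrative size, not the EnSU setting.

def ppmi_edges(docs, window=3):
    """Word-word edge weights: positive pointwise mutual information over sliding windows."""
    windows = []
    for tokens in docs:
        if len(tokens) <= window:
            windows.append(set(tokens))
        else:
            windows.extend(set(tokens[i:i + window]) for i in range(len(tokens) - window + 1))
    n_windows = len(windows)
    word_count = Counter(w for win in windows for w in win)  # windows containing each word
    pair_count = Counter(p for win in windows for p in combinations(sorted(win), 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        # PMI = log(p(a, b) / (p(a) p(b))), with probabilities estimated from window counts.
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # PPMI keeps only positive associations
            edges[(a, b)] = pmi
    return edges


def tfidf_edges(docs):
    """Document-word edge weights: term frequency times inverse document frequency."""
    n_docs = len(docs)
    df = Counter(w for tokens in docs for w in set(tokens))
    edges = {}
    for d, tokens in enumerate(docs):
        for w, count in Counter(tokens).items():
            weight = (count / len(tokens)) * math.log(n_docs / df[w])
            if weight > 0:  # words in every document get zero weight and no edge
                edges[(d, w)] = weight
    return edges
```

Word-pair keys are sorted, so each undirected word-word edge is stored once.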

On OADS-1K, we report a graph with 1,129 document nodes, 14,956 word nodes, and 969,423 edges. The adjacency matrix occupies 11.16 MB and is generated in 0.377 seconds on a server with 48 CPUs and 32 GB RAM.

We then fine-tune a BERT encoder and train a two-layer GCN to propagate label information across the graph structure, jointly optimizing the components with cross-entropy.
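For reference, a two-layer GCN in the standard form of Kipf and Welling computes the following, where A is the weighted adjacency matrix of the graph above (PPMI and TF-IDF weights), X holds the node features, and W_0, W_1 are learned weights. In the BertGCN paper [5], document-node features come from the BERT encoder and the final prediction interpolates the GCN and BERT outputs with a weight λ. This is shown as a reference point; the exact settings used for EnSU may differ.

$$
\tilde{A} = A + I, \qquad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, \qquad \hat{A} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}
$$

$$
Z_{\text{GCN}} = \operatorname{softmax}\!\big(\hat{A}\,\operatorname{ReLU}(\hat{A} X W_0)\,W_1\big)
$$

$$
Z = \lambda\, Z_{\text{GCN}} + (1-\lambda)\, Z_{\text{BERT}}
$$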

Experimental Setup

We evaluate on OADS-1K using a stratified 80/20 train-test split, and we also report five-fold cross-validation. The primary metric is macro-average F1, alongside precision and recall.
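Macro-averaging weights every class equally, which matters here because the Project class is small: a classifier that never gets one of the six classes right can score at most 5/6 macro F1, no matter how common the other classes are. A minimal stdlib sketch with a hypothetical function name; scikit-learn's `f1_score(..., average="macro")` should compute the same quantity.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count, gold_count = Counter(pred), Counter(gold)
    scores = []
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / gold_count[c] if gold_count[c] else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)
```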

We compare EnSU against several baselines: the individual ensemble components (SCL, SciBERT, BertGCN); an LLM-based few-shot classifier (using GPT-4 and Claude 3.7 Sonnet in the described setup); and OADSClassifier [7], a prior hybrid approach that combines heuristic and learning-based components, which we adapt here from binary detection to the six-way classification setting.

For the LLM baseline, we use a few-shot prompt with category definitions and labeled examples, set the temperature to 0, and generate five independent predictions per input. We then take a majority vote; if there is no majority, we fall back to the class with the highest averaged logit probability. We report 94% consensus across the five runs.
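The aggregation over the five runs can be sketched as follows, assuming "majority" means more than half of the runs and that each run also yields a probability per class (for example, derived from the model's token log-probabilities). Both the data layout and the function name are hypothetical.

```python
from collections import Counter


def aggregate_llm_runs(run_labels, run_probs):
    """Majority label across runs, else the class with the highest mean probability.

    run_labels: one predicted label per run.
    run_probs: one dict per run mapping class -> probability (assumed layout).
    """
    label, count = Counter(run_labels).most_common(1)[0]
    if count > len(run_labels) / 2:  # strict majority, e.g. 3 of 5 runs
        return label
    classes = set().union(*run_probs)
    mean_prob = {c: sum(p.get(c, 0.0) for p in run_probs) / len(run_probs) for c in classes}
    return max(mean_prob, key=mean_prob.get)
```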

Results

Table 2: F1-scores for SCL with different language models using the target sentence and expanded context as input. This is Table 2 in our paper.

We evaluated several pre-trained language models to choose the best encoder for the SCL classifier. As shown in Table 2, SPECTER performs best, achieving the highest macro-average F1 score of 0.85 and outperforming the other models.

Table 3: Performance metrics (Precision (P), Recall (R), and F1-score) for different input combinations evaluated with EnSU. Input 1: Target sentence. Input 2: Target sentence with expanded context. This is Table 3 in our paper.

To test whether surrounding sentences matter, we compare a target-only input against the expanded context window (Table 3). With expanded context, EnSU’s macro F1 increases from 0.88 to 0.90. This matches our observation that the cues needed to interpret a URL often sit just outside the sentence that contains it.

Table 4: Macro F1-scores for different URL classifiers evaluated on an 80/20 stratified split of the OADS-1K dataset. This is Table 4 in our paper. “Claude” refers to Claude 3.7 Sonnet, and “SCL” stands for Supervised Contrastive Learning.

Table 4 shows that the proposed EnSU classifier performs best overall, with a macro F1 score of 0.90. It consistently leads in key categories such as "Author Provided Software," "Project," and "Author Provided Dataset," outperforming the baseline methods, including the LLM-based approaches.

We report that EnSU’s improvement over the strongest individual model (BertGCN) is statistically significant under a paired Student’s t-test (t(4) = -4.8107, p = 0.0086).
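The statistic follows the textbook paired formula t = d̄ / (s_d / √n) with n − 1 degrees of freedom, so t(4) corresponds to five paired observations, presumably one pair of scores per cross-validation fold. A minimal stdlib sketch with a hypothetical function name computes the statistic; the p-value needs the t distribution, which SciPy's `scipy.stats.ttest_rel` provides together with the statistic.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired Student's t statistic and degrees of freedom (hypothetical helper)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Passing the baseline's scores first and EnSU's second gives a negative t when EnSU scores higher, which matches the sign of the reported value, though the argument order used in the paper is not stated here.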

Data Efficiency

To study data efficiency, we train with 25%, 50%, 75%, and 100% of the available training data and track how performance changes.
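The post does not say how the smaller training subsets were drawn. One reasonable choice, sketched here purely as an assumption, is to subsample each class separately so that the small Project class does not disappear at 25%; the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_fraction(samples, labels, fraction, seed=0):
    """Keep roughly `fraction` of each class, returning (sample, label) pairs."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    subset = []
    for label in sorted(by_class):
        members = by_class[label]
        k = max(1, round(fraction * len(members)))  # keep at least one example per class
        subset.extend((s, label) for s in rng.sample(members, k))
    return subset
```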

Figure 3: Test F1-scores of SciBERT, SCL, BertGCN, and EnSU across different training data sizes (25%, 50%, 75%, 100%). This is Figure 4 in our paper.

Figure 3 compares the F1 scores of SciBERT, SCL, BertGCN, and EnSU on the test set as the training set size increases from 25% to 100%.

Runtime

On the 230-sample test set, we report a total runtime of 37.85 seconds for EnSU, which works out to roughly 0.165 seconds per sample. We present this as evidence that the approach is practical for larger-scale processing.

Error Analysis

Figure 4: Confusion matrix showing the performance of EnSU on the OADS-1K dataset. This is Figure 5 in our paper.

Figure 4 shows the confusion matrix for EnSU on OADS-1K, summarizing which classes are most often confused.
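The most-confused class pairs can be read off a confusion matrix by counting (gold, predicted) pairs and ranking the off-diagonal cells. A small sketch with hypothetical function names; the labels in the check reuse category names from Table 4, but the counts there are made up for illustration.

```python
from collections import Counter


def confusion_counts(gold, pred):
    """Count (gold, predicted) label pairs; pairs with equal labels are correct predictions."""
    return Counter(zip(gold, pred))


def most_confused(gold, pred, top=3):
    """Most frequent off-diagonal (gold, predicted) pairs with their counts."""
    errors = Counter({pair: n for pair, n in confusion_counts(gold, pred).items()
                      if pair[0] != pair[1]})
    return errors.most_common(top)
```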

EnSU benefits from combining multiple models, which helps it handle difficult cases where individual classifiers fail. For example, when one model mislabels a project URL as author-provided software but the other two correctly identify it, the ensemble’s majority vote recovers the correct label. The confusion matrix shows strong overall performance, especially for "Project" and "Author Provided Software," but also reveals recurring challenges. The most common errors arise when author-created datasets are hosted on well-known repositories, or when URLs linking to instructional pages that mention software are mistaken for actual software links. These cases highlight how subtle language cues and overlapping mentions of data and tools can still confuse the model.

Limitations and Future Work

In our paper, we emphasize that OADS-1K is relatively small and that the Project category is underrepresented. In addition, the dataset excludes cases where the target sentence contains multiple URLs.

For future work, we plan to expand and balance the dataset, study URLs that appear with limited surrounding text, and explore LLM-agent approaches that inspect the linked content to help determine the URL type.


Availability

The dataset and source code are publicly available on GitHub: https://github.com/lamps-lab/EnSU

If you use this repository, please cite our JCDL ’25 paper: https://doi.org/10.1109/JCDL67857.2025.00031


Acknowledgment

This work was supported in part by the Institute of Museum and Library Services under grant LG-256694-OLS-24.

References:

[1] National Academies of Sciences, Engineering, and Medicine, “Reproducibility and Replicability in Science,” 2019.
[2] S. Uddin et al., “Building a large collection of multi-domain electronic theses and dissertations,” in 2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021, pp. 6043–6045.
[3] L. L. Wang et al., “CORD-19: The COVID-19 open research dataset,” in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, K. Verspoor et al., Eds., Online, Jul. 2020.
[4] M. Färber, “Analyzing the GitHub repositories of research papers,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020, pp. 491–492.
[5] Y. Lin, Y. Meng, X. Sun, Q. Han, K. Kuang, J. Li, and F. Wu, “BertGCN: Transductive text classification by combining GNN and BERT,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Online, Aug. 2021, pp. 1456–1462.
[6] B. Gunel, J. Du, A. Conneau, and V. Stoyanov, “Supervised contrastive learning for pre-trained language model fine-tuning,” arXiv preprint arXiv:2011.01403, 2020.
[7] L. Salsabil et al., “A study of computational reproducibility using URLs linking to open access datasets and software,” in Companion Proceedings of the Web Conference 2022, New York, NY, USA, 2022, pp. 784–788.

-- Lamia Salsabil (@liya_lamia)

by Lamia Salsabil (noreply@blogger.com) at May 24, 2026 04:41 PM

Ed Summers

Weekly Bookmarks

These are some things I’ve wandered across on the web this week.

🔖 sliver

An ‘archival sliver’ of the web. A bit like a ‘data lifeboat’ for making or replicating web archives of small sets of pages. Uses shot-scraper to drive a web browser that generates screenshots of your URLs, but runs it through a pywb web proxy so it can produce a high quality archival version of what you download.

As well as archiving live web pages, this tool can leverage pywb’s support for neatly extracting URLs from other web archives and recording items with all the appropriate provenance information (see below for an example). This means it can work like hartator/wayback-machine-downloader but retain the additional information that the WARC and WACZ web archiving formats support.

🔖 Justice Department deletes press releases on charges against Jan. 6 rioters

The Justice Department has removed press releases detailing the charges against hundreds of individuals who participated in the Jan. 6, 2021 Capitol riot from its website, the department confirmed Friday.

🔖 Meryl Kornfield on deletion of justice.gov web content

The Trump admin is quietly deleting info about the Capitol attack from the DOJ website as it prepares to give funds to J6ers. This week, DOJ deleted a press release about one man with an ongoing child solicitation case who came to the Capitol with bear spray.

🔖 On Tools and the Normalization of Evil

The scale of theft is unreal, if one person or company plagiarizes something, lawsuits and court filings often ensue, or at the very least some reputational damage to the perpetrators, but not for Anthropic or OpenAI, not for Google or Microsoft, they steal from all of us and then they sell our work back to us. They want to keep us dumb and uneducated, they want us to rely on them. Learning is power, learning is resistance, knowledge provides independence.

🔖 LLMs and Buttondown

Our month-over-month growth rate in Q1 2026 was double our growth rate in Q4 2025. Buttondown has, roughly, grown a little less than 2x every year of its existence; this — its eighth year — is poised to shatter that, if trends hold.

Almost all of that incremental growth, meaning the growth in addition to our historical trend, I attribute to LLMs. We ask people when they sign up what brought them here, and an answer that went from surprising to banal to overwhelming over the course of Q1 was: an LLM. Users of all stripes cite an LLM as the reason that they ended up at Buttondown’s front door.

🔖 GitHub Breach Traced to Malicious ‘Nx Console’ VS Code Extension

GitHub has confirmed that a recent breach into its internal repositories was caused by a vulnerability in a Microsoft Visual Studio Code (VS Code) extension called ‘Nx Console.’

The security team at the Microsoft-owned software developer platform warned on May 19 that an attacker gained unauthorized access to 3800 internal repositories via a “poisoned” VS Code extension found on an employee device.

It was later confirmed by Jeff Cross, CEO of Nx, that Nx Console, a popular VS Code extension, was the poisoned extension that resulted in the GitHub breach.

🔖 ReS Futurae

ReS Futurae is an international French-language journal devoted to the study of science fiction in all its forms: literature, film, graphic arts, video games, music, design, and various cultural phenomena. It is an academic, peer-reviewed journal, founded on a partnership with the journal Science Fiction Studies: cross-translations of articles accepted in either journal will be published regularly. In the French-language academic landscape, it will be the first journal of its kind.

🔖 Starlight

Starlight is a documentation website framework for Astro.

🔖 Tell New York Times, The Atlantic, and USA Today to keep the crucial work of journalists in the Wayback Machine!

The freedom of journalists isn’t only the freedom to write, it’s also the freedom to have your work read and remembered for generations to come. 2026 is the first World Press Freedom Day in 30 years that journalists’ work at major media outlets including New York Times, The Atlantic, and USA Today is not being preserved by the independent, nonprofit Internet Archive. We are calling on you and on all news outlets to publicly commit to working with the Internet Archive to keep the news in the Wayback Machine.

🔖 Hands-On Large Language Models

Through the visually educational nature of this book and with over 250 custom made figures, Python developers will learn the practical tools and concepts they need to use Large Language Models today.

🔖 Hiroshi Yoshimura

Hiroshi Yoshimura (吉村弘, Yoshimura Hiroshi; 22 October 1940 – 23 October 2003) was a Japanese musician and composer. He is considered a pioneer of ambient music in Japan. His music lies mostly in the minimalist genre of kankyō ongaku, or environment music—soft electronic melodies infused with the sounds of nature: babbling brooks, steady rain, and morning birds. However, not all Yoshimura’s work included nature sounds. His album Green (1986) only contained them in the United States release, as they were excluded in the Japanese version.

🔖 Yves Tanguy

Tanguy’s paintings have a recognizable style of nonrepresentational Surrealism. They show vast, abstract landscapes, mostly in a tightly limited palette of colors, occasionally showing flashes of contrasting color accents. Typically, these landscapes are populated with various abstract shapes, sometimes angular and sharp, sometimes with an organic look to them.

🔖 Announcing Web Serial Support in Firefox

Web Serial is a web API that allows a website to read and write to serial devices using JavaScript. See the MDN documentation for the details. While modern computers don’t typically include serial ports, serial devices connected to a USB port or paired via Bluetooth can advertise themselves as serial-capable devices so they appear as serial ports in the operating system.

The Web Serial API lets developers use the web platform to communicate with these devices. For example, websites can control devices or deliver firmware without requiring native applications or installers.

🔖 Weeds tend not to grow where they can’t take root

Destroying AI must include building counter-structures and nurturing a healthy, thriving social landscape that denies AI projects access to us in the first place. AI solutions like therapy & medical chatbots find space to thrive because of all the gaps in medical care we’ve normalized; we must make these interventions totally inscrutable in a future where care is always available, and people’s needs are not constantly being means-tested and scrutinized.

🔖 YesWeScan

Got an old USB scanner your computer can’t talk to? This web app is for you. Connect your scanner (see above) and get scanning.

🔖 Langfuse

Building AI applications and agents is very different from traditional software. Outputs are probabilistic, and teams need to reason about quality, cost, latency, and the tradeoffs between them. Langfuse Academy explains the AI engineering lifecycle to help you understand how the pieces fit together and what it takes to ship from prototype to production.

🔖 pocket_archive

Pocket Archive is a digital archival system and static site generator for small- to medium-(?) sized archives. It is designed to function in environments with unreliable connectivity and requires very low technical and human resources to set up, run, and use.

🔖 Memory in the Age of AI Agents

Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the proliferation of loosely defined memory terminologies has further obscured conceptual clarity. Traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. This work aims to provide an up-to-date landscape of current agent memory research. We begin by clearly delineating the scope of agent memory and distinguishing it from related concepts such as LLM memory, retrieval augmented generation (RAG), and context engineering. We then examine agent memory through the unified lenses of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained taxonomy that distinguishes factual, experiential, and working memory. From the perspective of dynamics, we analyze how memory is formed, evolved, and retrieved over time. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks. Beyond consolidation, we articulate a forward-looking perspective on emerging research frontiers, including memory automation, reinforcement learning integration, multimodal memory, multi-agent memory, and trustworthiness issues. We hope this survey serves not only as a reference for existing work, but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.

🔖 Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

This paper reports an empirical study organized into two experiments. Experiment 1 compares grep and vector retrieval on a 116-question sample from LongMemEval, using a custom agent harness (Chronos) and provider-native CLI harnesses (Claude Code, Codex, and Gemini CLI), for both inline tool results and file-based tool results that the model reads separately. Experiment 2 compares grep-only and vector-only retrieval while progressively mixing in additional unrelated conversation history, so that each query is embedded in more distracting material alongside the passages that matter. Across Chronos and the provider CLIs, grep generally yields higher accuracy than vector retrieval in our comparisons in experiment 1; at the same time, overall scores still depend strongly on which harness and tool-calling style is used, even when the underlying conversation data are the same.

🔖 Trevor Paglen and Holly Herndon on Making Art with AI and What the Discourse Is Missing

Neither Paglen nor Herndon are AI “skeptics”—they both use the various machine-learning technologies discursively bundled up as “AI” throughout their practices—but neither are they full-blown enthusiasts. So how is it changing their sense of what art is, and how we produce it? In the conversation that follows, I posed that question to them. “I think both Trevor’s practice and ours are looking at infrastructure in a really deep way,” Herndon said. “It was important in the early days, when we were beginning to experiment with this stuff, to see artists we had great respect for, like Trevor, working with it as well. It was like, OK, you’re not crazy—this is a really fruitful area to explore.”

🔖 Moving away from Tailwind, and learning to structure my CSS

I spent the last week or so migrating a couple of sites away from Tailwind and towards more semantic HTML + vanilla CSS, and it was SO fun and SO interesting, so here are some things I learned!

🔖 Wendt Center

Since 1975, the Wendt Center for Loss and Healing has helped people in the Washington metropolitan area rebuild a sense of safety and hope after experiencing the death of a loved one, life-threatening illness, violence, or other trauma. Nationally recognized for our expertise in grief, trauma, and mental health, we provide an array of holistic services for children, teens, adults, families, and our local communities.

🔖 OpenWebUI

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with built-in inference engine for RAG, making it a powerful AI deployment solution.

🔖 FaultLine

Most AI memory systems trust the LLM to write whatever it extracts. FaultLine doesn’t — every fact passes a validation gate before it touches storage. It’s the only system in the field that treats the model as an untrusted writer by design.

🔖 fastino-ai / GLiNER2

GLiNER2 unifies Named Entity Recognition, Text Classification, Structured Data Extraction, and Relation Extraction into a single 205M parameter model. It provides efficient CPU-based inference without requiring complex pipelines or external API dependencies.

May 24, 2026 04:00 AM

May 23, 2026

David Rosenthal

Talk for Stanford's EE 292J

Via John Markoff, I was invited to a conversation with Jonathan Dotan and the students of his EE292J course entitled Designing for Authenticity. Below the fold are my brief introductory remarks, and some notes for the discussion.

Thank you for inviting a relic from earlier days in the Valley. As with all my talks, the text of this brief introduction will go up at blog.dshr.org later this afternoon.

Sun GX version 1
It is just over 60 years since I wrote my first program, and just under 50 years since I started using Unix. I was part of the Andrew team at Carnegie-Mellon, then an early employee at Sun Microsystems. At Sun I was the operating system guy for Curtis Priem and Chris Malachowsky's team that built Sun's GX graphics chip. The GX was a big success but making it so was an extremely frustrating experience.

When Curtis and Chris quit Sun and started hanging out in the now legendary Denny's with Jen-Hsun Huang, I also quit to become Nvidia's employee #4.

Curtis and I designed UDA (Unified Device Architecture), the way programs talk to Nvidia's chips. More than 30 years later, that is still the way they do it - the best engineering of my career. After 3 years, in the throes of Nvidia's first near-death experience, I had a big argument with Curtis and quit. It turned out that he was right and I was wrong. I immediately did another startup that also IPO-ed and ended up extremely burnt out.

My wife was part of the team at Stanford Library's HighWire Press that pioneered the transition of academic publishing from paper to the Web in 1995. One effect of the transition was that preservation of the academic record went from a side-effect of distribution to being at the whim of the publishers, which made librarians uneasy.

Our idea for fixing this was for libraries to crawl the journals to which they subscribed and keep a copy, as they did on paper. One problem was that the oligopoly publishers had consumed almost all of the libraries' budget. Whatever we did had to be very cheap, and thus not reliable. My idea for making the system reliable was a permissionless, peer-to-peer system in which libraries audited their copies and used inter-library copy to repair damage.

This was the LOCKSS system, for Lots Of Copies Keep Stuff Safe. In some ways it was a success; nearly 28 years later it is still going. In other ways it was a failure; it is now mostly a centralized system controlled by the publishers. The lesson we learn from this is that decentralization is extremely hard because it is an economic not a technical problem.

Twelve years ago in Economies of Scale in Peer-to-Peer Networks I explained the problem. The TL;DR is that the advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Economies of scale mean the cost of participating will scale less than linearly. Unless the reward for participating decreases with scale faster than the cost, the profit (reward minus cost) will increase with scale, and economics will drive centralization. No-one has found a way to make the reward decrease with scale, let alone faster than the costs.
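A toy model makes the argument concrete. Assume, purely for illustration, that reward grows linearly with the resources a participant contributes while cost grows as scale^0.8; the coefficients and exponent are arbitrary and are not figures from the talk.

```python
# Toy model with arbitrary, illustrative parameters (not figures from the talk).

def profit(scale, reward_per_unit=1.0, cost_coeff=1.0, cost_exponent=0.8):
    """Reward grows linearly with resources contributed; cost grows as scale**cost_exponent."""
    reward = reward_per_unit * scale
    cost = cost_coeff * scale ** cost_exponent  # exponent < 1: economies of scale
    return reward - cost


def margin(scale, **params):
    """Profit per unit of resource contributed."""
    return profit(scale, **params) / scale


for scale in (1, 10, 100, 1000):
    print(f"scale={scale:>5}  profit per unit={margin(scale):.3f}")
```

Because per-unit profit rises with scale, a large participant earns more on every unit it contributes than a small one does, which is the pressure toward centralization; under these assumptions the trend would reverse only if the reward per unit fell with scale faster than the cost per unit.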

Decentralized systems necessarily incur coordination costs that centralized systems don't. Here is an example from the BBC of coordination costs from 750 years ago:
Merton College Library
At Merton College in Oxford, there is an antique chest. In the Middle Ages, three key-holders had to be summoned to reveal the riches within. But this treasure wasn't gold or jewels. It was books. ... Merton College insisted its 13th-Century fellows donated books. The Archbishop of Canterbury issued a decree in 1276 introducing this requirement, which marked the beginning of the library at Merton College.
The requirement for three keys is like the requirement for a majority in Byzantine Fault Tolerance or Ethereum. But:
Just a few years after the Archbishop's decree, several books were stored outside the chest for the first time. They were chained to a table in the college, making them available at any time.
It is possible to make decentralized, permissionless systems as reliable as, or even more reliable than, centralized systems that use Byzantine Fault Tolerance. But doing so requires much higher levels of replication, and thus cost, and a large performance penalty. Thus in practice permissionless systems will either centralize, or be out-competed by centralized, permissioned alternatives.

In effect both of these are what happened to LOCKSS:
Now that you have this background, let's have a discussion.

Thoughts that didn't fit

Why did the printed paper system work better?
Are blockchains useful?
Stuart Haber and W. Scott Stornetta patented blockchains in 1991
They are a Merkle tree with only one branch, a great and useful idea

Haber & Stornetta's company Surety time-stamps documents by publishing the hash of the head of the chain of document hashes weekly in the New York Times classified ads. This is a centralized blockchain, and the root of trust is the New York Times and write-once, durable, dispersed media
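The core idea of a single-branch chain can be sketched in a few lines: each new head hash commits to the previous head and the next document's hash, so publishing only the head is enough to make later tampering evident to anyone who recomputes the chain. This is a simplified illustration, not Surety's actual protocol; the genesis value, function name, and documents are hypothetical.

```python
import hashlib

# Simplified illustration only; not Surety's actual protocol. The genesis value is arbitrary.
GENESIS = bytes(32)


def chain_head(documents):
    """Fold each document's SHA-256 hash into a running head hash."""
    head = GENESIS
    for doc in documents:
        doc_hash = hashlib.sha256(doc).digest()
        # Each new head commits to the previous head, hence to every earlier document.
        head = hashlib.sha256(head + doc_hash).digest()
    return head.hex()


published = chain_head([b"contract v1", b"lab notebook p.12", b"press release"])
tampered = chain_head([b"contract v2", b"lab notebook p.12", b"press release"])
print(published == tampered)  # editing an early document changes the head
```

The trust still rests on wherever the head is published, which in Surety's case is the newspaper.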

But the crypto-bros didn't want to trust anyone, let alone the New York Times. Does the seductive idea of combining the concepts of a blockchain and decentralization deliver trustlessness?

The title of the DARPA-sponsored report from the Trail of Bits cybersecurity company conforms to Betteridge's Law of Headlines because the answer to Are Blockchains Decentralized? is "No". We now have almost 18 years of experience on which to base this conclusion. One of the most important reasons is software monoculture.

The intensity of the crypto-bros' gaslighting about the virtues of decentralization is made necessary by the fact that among the 7 cryptocurrencies with market cap above $50B, only 3 even claim to be decentralized and they aren't really. The crypto-bros want people to assume that cryptocurrencies have the theoretical advantages of decentralization, while insiders can exploit the absence of these advantages.

Links from the discussion

Jonathan and the students asked good questions. Here are some links to topics that came up:

by David. (noreply@blogger.com) at May 23, 2026 12:00 AM

May 21, 2026

Evergreen ILS

The Evergreen Project 2026 Annual Meeting: Mission and Vision

The 2026 Evergreen International Conference took place this past April at the Hyatt Regency Lake Washington at Seattle’s Southport in scenic Renton. Huge thanks to our hosts from the King County Library System and BJ Colvin and his team for all their work that made this event possible!

At the conference, the Evergreen Project had its annual meeting, the first annual meeting of the organization as a membership organization. The membership program has been off to a great start, with 36 individual members, plus 13 metal-level member organizations, and 2 sustaining member libraries, all having joined in just the first 6 months since the program began.

The agenda of the annual meeting included community reports and board committee reports, the election of the slate of board members as determined by the vote of the membership in March 2026, and a discussion of the Evergreen Project’s draft Mission and Vision statements.

While we’ve all groaned at this sort of thing at the office, these statements are part of the IRS’s reporting requirements for tax-exempt nonprofit organizations, so we need them, and they should ideally mean something to the community.

The board brought draft statements for discussion, which were then tweaked by the community and formally adopted by the project board at the regular meeting the following week.

Vision:

We envision a growing, healthy international community supporting Evergreen as a flexible, modern, and feature-competitive open-source library software platform suitable for use at many types of libraries and library consortia.

Mission:

The Evergreen Project exists to foster the open-source Evergreen Integrated Library System (ILS) software and its community of practice. The Evergreen Project engages in activities to promote, support, and advance the development of the Evergreen ILS software; support and facilitate the growth of the international community of Evergreen ILS software users; and to cultivate, manage, and protect the assets of the Project.

Also on the agenda for the annual membership meeting was a discussion of what sorts of projects the community would like to see the project board take on in pursuit of this now-articulated mission and vision for the project. A document was shared for feedback, and attendees threw out ideas and talked about what they’d like to see the project work on. A total of 23 ideas were proposed across the three main goals stated in the mission.

The next steps on this work will happen at the project board meetings over the next few months, as the community feedback from this session, and the discussion at the conference, and the support provided by our members, turns into action in pursuit of the mission of the Evergreen Project!

A big thank you to all of our users, contributors, supporters, question-askers, question-answerers, event planners, presenters, developers, administrators, collaborators, community members, project members, and most of all our library patrons for this beautiful, hopeful, extremely complicated thing we all do together, to help each other.

Stay tuned to board meetings and listservs for updates, and if you’re not already a member of the Evergreen Project, consider joining today!

 

by Eli Neiburger at May 21, 2026 04:44 AM

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2026-05-18: John Deasy (Computer Science PhD Student)

 


Hey WSDL!  I'm John (Jack) Deasy and I decided to commit to getting a Ph.D. in Computer Science, so here I am.  I received my B.S. in Physics from the University of Mary Washington and my M.S. in Computer Science from Old Dominion University.

After completing my undergraduate studies, I began my career as a physicist with the Naval Surface Warfare Center Dahlgren Division, where I currently apply my background in physics to the development of simulations for the Navy. Over the course of my career, I have experience across multiple disciplines, including electrical engineering, civil engineering, mechanical engineering, chemical and biological sciences, as well as simulation design and development. One of the highlights of my career was contributing to the advancement of defense technology through my work on the Remote Detection of Gun Projectiles, for which I hold a U.S. patent.  My research interests center on the use of synthetic data for machine learning development, with a particular focus on creating approaches that improve training, testing, and validation in environments where real-world data is limited or costly to obtain.

One of my many passions is taking a project from theory to a practical engineering product that can have a real-world impact and change how we interact with machines.  Speaking of machines, I have a passion for classic cars.  I'm currently restoring a 1978 Chevy Corvette, but with a twist.  I'm introducing the 1970s to the Body Control Module (BCM) and injecting Computer Science into the core of how this machine operates. 

I'm looking forward to my time at ODU and all the great people I'll work and interact with along the way.

Regards,

- John Deasy






by John Deasy (noreply@blogger.com) at May 21, 2026 02:52 AM

Xe Iaso

"No way to prevent this" say users of only package manager where this regularly happens

In the hours following the news that art-template fell victim to a supply chain attack via NPM, developers and systems administrators scrambled to ensure all of their projects were unaffected by a supply chain attack where attackers have controlled the repository since 2025 and are using it to load unauthorized JavaScript from third party domains, including but not limited to Baidu Analytics. This is due to the affected dependencies being distributed via NPM, the only package manager where these supply-chain attacks regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mrs. Macy Von, echoing statements expressed by hundreds of thousands of programmers who use the only package manager where 90% of the world's supply-chain attacks have occurred in the last decade, and whose projects are 20 times more likely to fall victim to supply chain attacks. "It's a shame, but what can we do? There really isn't anything we can do to prevent supply-chain attacks from happening if the maintainers don't want to secure access to their accounts in a robust manner." At press time, users of the only package manager in the world where these vulnerabilities regularly happen once or twice per week for the last year were referring to themselves and their situation as "helpless".

For more information, please see upstream documentation published by art-template at the following link: 2026-art-template.

May 21, 2026 12:00 AM

"No way to prevent this" say users of only language where this regularly happens

In the hours following the release of CVE-2026-45250 for the project FreeBSD, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a kernel stack overflow when validating permissions of the setcred(2) system call, allowing arbitrary code execution in the context of the kernel. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mrs. Gregoria Doyle, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

May 21, 2026 12:00 AM

May 20, 2026

Open Knowledge Foundation

Recap: What We Learned About Climate and AI from Our First Roundtable

The kick-off of AI Learning Labs brought the community together for a wide-ranging discussion lasting over two hours; here are the video recording and highlights

The post Recap: What We Learned About Climate and AI from Our First Roundtable first appeared on Open Knowledge Blog.

by Solana Larsen at May 20, 2026 09:33 PM

Evergreen ILS

Evergreen security releases: 3.15.13, 3.16.7, 3.17.1

The Evergreen Project has issued the following security releases: 3.15.13, 3.16.7, and 3.17.1.

This is a security release that fixes several vulnerabilities, including ones that allow the remote execution of arbitrary SQL statements in the Evergreen database as well as cross-site scripting vulnerabilities.

These releases are available on the downloads page. 

We strongly recommend immediate installation of these security releases.

The security bugs fixed in this release are:

These bugs will be made publicly visible after the security release is generally available.

If you are running a version of Evergreen earlier than 3.15, please consult with your service provider or review the fixes in Git to update your system.

We would like to thank Brian A. Egge for responsibly reporting the vulnerabilities included in this release.

These releases also include other bugfixes, which are detailed in the release notes available on the downloads page.

Thank you to the release teams: Galen Charlton (Equinox), Martha Driscoll (NOBLE), Gina Monti (Bibliomation), Sarah Moody (ECDI), Michele Morgan (NOBLE), and Andrea Buntz Neiman (Equinox).

by Andrea Buntz Neiman at May 20, 2026 09:26 PM

Ed Summers

distant haze


A view from Stanford Dish foothills

en.wikipedia.org/wiki/Stanford_Dish_%28Stanford_Radio_Tel…

May 20, 2026 04:00 AM

Xe Iaso

"No way to prevent this" say users of only language where this regularly happens

In the hours following the release of CVE-2026-45584 for the project Microsoft Windows, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a memory safety vulnerability resulting in arbitrary code execution inside the virus scanner Windows Defender. This is due to the affected components being written in C++, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Dr. Annabelle Connelly, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

May 20, 2026 12:00 AM

May 19, 2026

David Rosenthal

Flooded Zones Part 2

This is the promised follow-on to Flooded Zones Part 1, which discussed the Distributed Denial of Service (DDoS) attack being mounted by AI against the scholarly publication system. By reducing the cost of generating and submitting a paper or a review, AI has caused a massive increase in the quantity and a significant decrease in the quality of submissions to a system that was already vastly overloaded.

Below the fold I look at AI-enabled DDoS attacks against two other, even more important areas: software security and political discourse (as shown in the overview image).

Software Security

Last month, Raffi Krikorian's New York Times op-ed announced It’s the End of the Internet as We Know It:
Last week, Anthropic announced that its newest artificial intelligence model, Claude Mythos Preview, would not be released to the public, after the company learned it was capable of finding and exploiting vulnerabilities that have gone undetected in critical software systems for decades. Instead, Anthropic gave access to Mythos — and $100 million in credits to use it — to more than 50 of the world’s largest organizations, including Amazon, Apple, Microsoft, Google and JPMorgan Chase, as part of a defensive cybersecurity initiative called Project Glasswing.
It sounded like a double-edged sword, helping both the attackers and the defenders, with Anthropic claiming kudos for favoring the defenders. It is true that, once the maintainers of all the software in the world have used these tools and incorporated them into their build process, the world will be a safer place. Daniel Stenberg, who maintains curl, is among the maintainers who really care about security and were already using similar tools. In MYTHOS FINDS A CURL VULNERABILITY he reported that:
Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code. Apparently Mythos was so good at this that Anthropic would not release this model to the public yet but instead trickle it out to a selected few companies for a while to allow a few good ones(?) to get a head start and fix the most pressing problems first, before the general populace would get their hands on it.

The whole world seemed to lose its marbles. Is this the end of the world as we know it? An amazingly successful marketing stunt for sure.
Stenberg got access to Mythos' report on his codebase. It had found five "confirmed security vulnerabilities":
Five issues felt like nothing as we had expected an extensive list. Once my curl security team fellows and I had poked on the this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability. The other four were three false positives (they highlighted shortcomings that are documented in API documentation) and the fourth we deemed “just a bug”.

The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June. The flaw is not going to make anyone grasp for breath. All details of that vulnerability will of course not get public before then, so you need to hold out for details on that.
...
My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.
It seems Mythos isn't as revolutionary as Anthropic would like the world to believe as they head for an IPO. Nevertheless, Stenberg stresses that using tools like Mythos is an essential security practice.

To understand the DDoS facing software maintainers you need to understand how security vulnerabilities are found and fixed:
LLMs have made this part of the process much cheaper and quicker, so the flow of vulnerabilities reaching this point has greatly increased. Now the maintainer has a report of a vulnerability, hopefully a proposed fix and maybe exploit code confirming that it is real. What happens next?
LLMs don't help with any of these steps, so the zones of the humans who have to perform them are being flooded. Much of the flood is shit. To take perhaps the most critical example, Simon Sharwood reports that Linus Torvalds says AI-powered bug hunters have made Linux security mailing list ‘almost entirely unmanageable’:
“So just to make it really clear: If you found a bug using AI tools, the chances are somebody else found it too. If you actually want to add value, read the documentation, create a patch too, and add some real value on *top* of what the AI did. Don't be the drive-by ‘send a random report with no real understanding’ kind of person. OK?”
Jamie John provides another example in Bug bounty businesses bombarded with AI slop:
Businesses that run “bug bounty” schemes have long relied on independent security researchers to spot vulnerabilities. But the rise of AI tools is now overwhelming them with spurious submissions.

Bugcrowd, whose customers include OpenAI, T-Mobile, and Motorola, said the number of reports it received more than quadrupled over a three-week period in March, with most proving to be false.

Curl, a widely used tool to transfer data across the Internet, suspended its paid bug bounty program in January, citing an “explosion in AI slop reports” and lower-quality submissions.

Cyber security experts say advances in generative AI are reshaping the economics of bug bounty programs. While the tools allow experienced researchers to find flaws more quickly, they are also lowering the barrier to entry, triggering a flood of automated or erroneous submissions that companies must sift through.
The problems caused by DDoS-ing the patch develop-test-release-install cycle are vividly illustrated by recent vulnerabilities in the Linux kernel:
Note Hyunwoo Kim's assessment that it was the result of rushing the patch process:
According to researcher Hyunwoo Kim, who uncovered Dirty Frag, "Fragnesia" emerged as an unintended side effect of patches shipped to fix the original Dirty Frag vulnerabilities, adding yet another entry to the long tradition of security fixes accidentally creating new security problems.

As The Register previously reported, Dirty Frag followed hot on the heels of Copy Fail, another Linux kernel privilege escalation flaw that abused page cache handling to overwrite supposedly read-only files.
It doesn't appear that LLMs found any of these vulnerabilities, showing that even humans can overload the patch process to the point of failure. But the advent of LLMs means we can expect more and worse fiascos.

Political Discourse

How malicious AI swarms can threaten democracy is a paper in Science by Daniel Thilo Schroeder and 21 co-authors, published last January (preprint). They set out the problem thus:
Advances in AI offer the prospect of manipulating beliefs and behaviors on a population-wide level. Large language models (LLMs) and autonomous agents now let influence campaigns reach unprecedented scale and precision. Generative tools can expand propaganda output without sacrificing credibility and inexpensively create falsehoods that are rated as more human-like than those written by humans. Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can just as effectively be used to generate more convincing falsehoods. Enabled by these capabilities, a disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multi-agent architectures, these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus efficiently. By adaptively mimicking human social dynamics, they threaten democracy. Because the resulting harms stem from design, commercial incentives, and governance, we prioritize interventions at multiple leverage points, focusing on pragmatic mechanisms over voluntary compliance.

This risk compounds long-standing vulnerabilities in democratic information ecosystems, already weakened by erosion of rational-critical discourse and a lack of shared reality among citizens. AI swarms are a potent accelerant in this trajectory, though their ultimate impact is not predetermined. Their effects will be shaped by platform design, market incentives, media institutions, and political actors. Here, we distinguish documented trends from projections, indicate where uncertainty remains, and note countervailing dynamics, such as growing public skepticism toward unverified content and a renewed interest in institutional demand for accountable journalism
David Gilbert covered it for Wired in AI-Powered Disinformation Swarms Are Coming for Democracy, and some of the authors discuss the paper in AI bot swarms threaten to undermine democracy on Gary Marcus' Substack:
The unique danger of a swarm is that it acts less like a megaphone and more like a coordinated social organism. Earlier botnets were simple-minded, mostly just copying and pasting messages at scale—and in well-studied cases (including Russia’s 2016 IRA effort on Twitter), their direct persuasive effects were hard to detect. Today’s swarms, now emerging, can coordinate fleets of synthetic personas—sometimes with persistent identities—and move in ways that are hard to distinguish from real communities. This is not hypothetical: in July 2024, the U.S. Department of Justice said it disrupted a Russia-linked, AI-enhanced bot farm tied to 968 X accounts impersonating Americans. And bots already make up a measurable slice of public conversation: a 2025 peer-reviewed analysis of major events estimated roughly one in five accounts/posts in those conversations were automated. Swarms don’t just broadcast propaganda; they can infiltrate communities by mimicking local slang and tone, build credibility over time, and then adapt in real time to audience reactions—testing variations at machine speed to discover what persuades.
This is precisely the AI-based version of Steve Bannon's "flooding the zone with shit".

Unlike the case of scholarly publication, I have no idea how to mitigate the shit that is flooding this zone. Unlike the oligopoly publishers, who can act as partially effective gatekeepers and are somewhat motivated to improve things, in the political space there are no longer any effective gatekeepers. This entire zone is driven by measures of "engagement", and fanning the flames of outrage is the best way to drive up the numbers.

The authors propose five "pragmatic mechanisms", but I have issues with each of them:
  1. social media platforms must move away from the “whack-a-mole” approach they currently use:
    Right now, companies rely on episodic takedowns—waiting until a disinformation campaign has already gone viral and done its damage before purging thousands of accounts in a single wave. This is too slow. Instead, we need continuous monitoring that looks for statistically unlikely coordination. Because AI can now generate unique text for every single post, looking for copy-pasted content no longer works. We must look at network behavior instead: a thousand users might be tweeting different things, but if they exhibit statistically improbable correlations in their semantic trajectories or propagate narratives with a synchronized efficiency that defies organic human diffusion.
    Platforms are always going to be reluctant to kill off the users on whom their finances depend. When forced to, they would rather do it in large blocks.
  2. we need to stop waiting for attackers to invent new tactics before we build defenses:
    A defense that only reacts to yesterday’s tricks is destined to fail. We should instead proactively stress-test our defenses using agent-based simulations. Think of this like a digital fire drill or a vaccine trial: researchers can build a “synthetic” social network populated by AI agents, and then release their own test-swarms into that isolated environment. By watching how these test-bots try to manipulate the system, we can see which safeguards crumble and which hold up, allowing us to patch vulnerabilities before bad actors act on them in the real world.
    This is a very good idea, but it would need funding (see below).
  3. we must make it expensive to be a fake person:
    Policymakers need to incentivize cryptographic attestations and reputation standards to strengthen provenance. This doesn’t mean forcing every user to hand over their ID card to a tech giant—that would be dangerous for whistleblowers and dissidents living under authoritarian regimes. Instead, we need “verified-yet-anonymous” credentialing. Imagine a digital stamp that proves you are a unique human being without revealing which human you are. If we require this kind of “proof-of-human” for high-reach interactions, we make it mathematically difficult and financially ruinous for one operator to secretly run ten thousand accounts.
    This is the same problem that has bedevilled computer-mediated communication since the advent of spam. Cynthia Dwork and Moni Naor's Pricing via Processing or Combatting Junk Mail signally failed to stem the tide by making it costly to send bulk e-mail. But it did lead to Satoshi Nakamoto's solution to the Sybil problem, making it expensive to mine Bitcoin. The problem here is that making it "expensive to be a fake person" effectively means making it somewhat expensive to be a person. The overhead and cost of obtaining such a "digital stamp" would disincentivize participation, so the platforms wouldn't like it. See The Permissionless Catch-22.
  4. we need mandated transparency through free data access for researchers:
    We cannot defend society if the battlefield is hidden behind proprietary walls. Currently, platforms restrict access to the data needed to detect these swarms, leaving independent experts blind. Legislation must guarantee vetted academic and civil society researchers free, privacy-preserving access to platform data. Without a guaranteed “right to study,” we are forced to trust the self-reporting of the very corporations that profit from the engagement these swarms generate.
    The platforms depend upon monetizing the data they collect on users' behavior. So they are always going to be reluctant to give outsiders access to their key asset. And, in any case, any data to which they grant access will effectively be "self-reporting".
  5. we need to end the era of plausible deniability with an AI Influence Observatory:
    Crucially, this cannot be a government-run “Ministry of Truth.” Instead, it must be a distributed ecosystem of independent academic groups and NGOs. Their mandate is not to police content or decide who is right, but strictly to detect when the “public” is actually a coordinated swarm. By standardizing how evidence of bot-like networking is collected and publishing verified reports, this independent watchdog network would prevent the paralysis of “we can’t prove anything,” establishing a shared, factual record of when our public discourse is being engineered.
    Each of the "independent academic groups and NGOs" will need significant funding if they are to process information at the scale required. Where would this funding come from? Taxing the platforms to fund this is one answer, but it wouldn't motivate them to cooperate.
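The idea of making something costly to send but cheap to check can be illustrated with a hashcash-style proof-of-work stamp. This is a minimal Python sketch under stated assumptions, not Dwork and Naor's actual pricing functions: it assumes SHA-256 and a difficulty measured in leading zero bits (the approach of Adam Back's Hashcash, and similar in spirit to Bitcoin mining). The message string and difficulty value are hypothetical. Minting a stamp takes about 2^difficulty hash evaluations on average, while verifying it takes one.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # Zero bits above the highest set bit of this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(message: str, difficulty: int) -> int:
    """Sender's side: search for a nonce whose SHA-256 hash of
    "message:nonce" has at least `difficulty` leading zero bits.
    Each attempt succeeds with probability 2**-difficulty."""
    for nonce in count():
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(message: str, nonce: int, difficulty: int) -> bool:
    """Receiver's side: a single hash evaluation checks the stamp."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical recipient string; difficulty 16 means about 65,536 attempts on average.
stamp = mint("to:alice@example.org", difficulty=16)
print(verify("to:alice@example.org", stamp, difficulty=16))  # True
```

Each additional bit of difficulty doubles the sender's expected cost but leaves the receiver's cost unchanged. That asymmetry is what is meant to price out bulk senders, and it is also why the same cost lands on every legitimate participant.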
One of the most effective ways to use these disinformation swarms is to amplify pre-existing stereotypes, exploiting confirmation bias:
In social media, confirmation bias is amplified by the use of filter bubbles and "algorithmic editing", which display to individuals only information they are likely to agree with, while excluding opposing views.
Adam Kucharski shows how easy it is for AIs to build on such stereotypes in Real signals or artificial stereotypes?. He asked Copilot to analyze data that should have generated a null result:
First, I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the exact same 2000 responses but labelled these ‘US’. Finally, I combined them to create a dataset of 4000 total responses, and jumbled them up.

Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed.
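Kucharski's null construction is easy to reproduce. This is a minimal Python sketch, assuming hypothetical placeholder responses rather than his actual 2000 free-text answers: it duplicates every response under both labels, shuffles the combined rows, and then compares word counts per label. Because the "UK" and "US" subsets are the same multiset of texts, any statistic computed from the text alone must come out identical for the two labels.

```python
import random
from collections import Counter


def build_null_dataset(responses, seed=0):
    """Label every response 'UK', copy the same responses as 'US',
    combine them and shuffle, following the steps Kucharski describes."""
    rows = [("UK", r) for r in responses] + [("US", r) for r in responses]
    random.Random(seed).shuffle(rows)
    return rows


def word_counts_by_label(rows):
    """Tally lower-cased words separately for each label."""
    counts = {}
    for label, text in rows:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


# Hypothetical stand-in responses (the original experiment used real free text).
responses = [f"response {i} about prices and queues" for i in range(2000)]
rows = build_null_dataset(responses)
counts = word_counts_by_label(rows)
print(len(rows))                     # 4000
print(counts["UK"] == counts["US"])  # True: the labels carry no signal
```

Any UK/US "difference" a summarizer reports on such a dataset cannot come from the responses themselves, which is exactly what the experiment was designed to expose.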
Copilot's output
Note that Copilot's confident and detailed output was driven not by anything in the data but only by the stereotypes in its training data. There are two problems when applying this to real data:
The defense against DDoS at the network level is services like Cloudflare interposed between the bots and their target. There doesn't seem to be any way to replicate this at higher levels like these three zones. It is really hard to be optimistic about their future.

by David. (noreply@blogger.com) at May 19, 2026 03:00 PM

May 18, 2026

Open Knowledge Foundation

Announcing the Open Technology Research (OTR) Symposium 2026 in Barcelona

We hope you will join us from 26–27 October at the historic University of Barcelona as we translate evidence-based research into actionable policy.

The post Announcing the Open Technology Research (OTR) Symposium 2026 in Barcelona first appeared on Open Knowledge Blog.

by OKFN at May 18, 2026 06:26 PM

May 17, 2026

Ed Summers

None

There is no wind
blowing here
on my face
cool from the faraway sea.

I don’t see that tree
green against blue
blue behind green.

Boarding a train
leaving a train
no train.

May 17, 2026 04:00 AM

Weekly Bookmarks

These are some things I’ve wandered across on the web this week.

🔖 abcde - A Better CD Encoder

Ordinarily, the process of grabbing the data off a CD and encoding it, then tagging or commenting it, is very involved. abcde is designed to automate this. It will take an entire CD and convert it into a compressed audio format - Ogg/Vorbis, MPEG Audio Layer III, Free Lossless Audio Codec (FLAC), Ogg/Speex, MPP/MP+(Musepack), M4A (AAC) or Opus format(s).

🔖 The New York Good Times

An alternative home page for the NYT

🔖 Visualizing all books of the world in ISBN-Space

https://phiresky.github.io/blog/2025/visualizing-all-books-in-isbn-space/

🔖 DC Tabular Application Profiles (DC TAP) - Primer

This primer is the best starting point for understanding the Dublin Core Tabular Application Profiles (DCTAP) model. With just the primer you should be able to create your first DCTAP. DCTAP is the product of the DCMI Application Profiles Interest Group. This and other work products of the group can be found at the DC TAP github repository. Other documents in this project are:

🔖 feed-survey

A high-performance, distributed survey of RSS/Atom feed usage, autodiscovery, and quality in Common Crawl using AWS EMR.

🔖 Maintenance begins at creation, so why are we not creating better?

What digital substrate could we be using for the different categories of digital record out there? How can we take what we know about digital preservation and, instead of restricting ourselves to one format, embrace plurality to enable the creation of rich, flexible, preservable records?

🔖 Stuplimity: Shock and Boredom in Twentieth-Century Aesthetics

As “dispositions” which result in a fundamental displacement from secure critical positions, the shocking and the boring usefully prompt us to look for new strategies of engagement and to extend the circumstances under which engagement becomes possible. The phenomenon of the intersection of these affective dynamics, in innovative artistic and literary production, will thus be explored here as a way of expanding our notion of the aesthetic in general.

🔖 My Blog Principles

So over my (relatively) long blogging journey I’ve accumulated some crust of principles. Ensuring that what I’m doing is kind and useful to people. This led to some decisions. Including ones that make my own blog maintenance slightly harder. But I’m ready to suffer if this brings something good to others. Here are things I formulated that must be true for my blog…

🔖 Tyler State Park (Pennsylvania)

This is the park where John deployed his weather station.

🔖 Whole Health Veterinary Care

Penny’s new vet!

May 17, 2026 04:00 AM

May 15, 2026

David Rosenthal

Flooded Zones Part 1

Tom Cowap
CC-BY-SA 4.0
Three years ago in Flooding The Zone With Shit, my first post on the AI bubble, I wrote:
My immediate reaction to the news of ChatGPT was to tell friends "at last, we have solved the Fermi Paradox". It wasn't that I feared being told "This mission is too important for me to allow you to jeopardize it", but rather that I assumed that civilizations across the galaxy evolved to be able to implement ChatGPT-like systems, which proceeded to irretrievably pollute their information environment, preventing any further progress.
The post title was a notorious quote from Steve Bannon. Below the fold, I look into scholarly publication, the first of three areas whose zones are currently being flooded with AI output in what can be considered DDoS (Distributed Denial of Service) attacks:
A distributed denial-of-service (DDoS) attack occurs when multiple systems flood the bandwidth or resources of a targeted system, usually one or more web servers.
A subsequent post will examine two more flood zones, political discourse and software security.

Background

Spam
DDoS attacks work when it is cheaper for the attacker to consume the victim's resources than it is for the victim to supply them[1]. Everyone is familiar with this situation: their mail server has to use a machine-learning system to filter the small amount of ham from the vast flood of spam. This has been going on for more than three decades[2], in a continuous arms race between the spammers and the filters.

Scholarly Publication

The record of scholarship has been under attack for a long time; my "flooding" post included this example:
Open access with "author processing charges" out-competed the subscription model. Because the Web eliminated the article rate limit imposed by page counts and printing schedules, it enabled the predatory open access journal business model. So now it is hard for people "doing their own research" to tell whether something that looks like a journal and claims to be "peer-reviewed" is real, or a pay-for-play shit-flooder. The result, as Bannon explains in his context, is disorientation, confusion, and an increased space for bad actors to exploit.
Now AI makes it very cheap to consume resources in the system, as Elizabeth Gibney reports in How AI slop is causing a crisis in computer science:
Fifty-four seconds. That’s how long it took Raphael Wimmer to write up an experiment that he did not actually perform, using a new artificial-intelligence tool called Prism, released by OpenAI last month. “Writing a paper has never been easier. Clogging the scientific publishing pipeline has never been easier,” wrote Wimmer, a researcher in human–computer interaction at the University of Regensburg in Germany, on Bluesky.

Large language models (LLMs) can suggest hypotheses, write code and draft papers, and AI agents are automating parts of the research process. Although this can accelerate science, it also makes it easy to create fake or low-quality papers, known as AI slop.
It is expensive to run the filters to separate the scholarly ham from the AI spam:
AI slop is hard to spot by conventional means, says Paul Ginsparg, a physicist at Cornell University in Ithaca, New York, and a co-founder of the arXiv. Volunteer moderators can no longer use how well a paper engages with the relevant literature and methods to gauge its merit. “AI slop frequently can’t be discriminated just by looking at abstract, or even by just skimming full text,” he says. This makes it an “existential threat” to the system, he says.
How bad is the problem? In How much of the scientific literature is generated by AI?, Miryam Naddaf asks:
How much of the scientific literature is generated by AI? The first studies of the size of the AI footprint in scientific journals, preprint repositories and peer-review reports give a spread of answers — and indicate a rapidly evolving situation that it is difficult to get a handle on.

The fear of many in the research community is that poor-quality or entirely fabricated research produced by large language models (LLMs) could overwhelm the ability of current quality-control systems to detect it, thereby polluting the scientific canon.
The fear is justified. Can AI tools help reduce the cost of weeding out the AI slop? For example, Pangram is a service that detects AI-generated text. In Pangram Predicts 21% of ICLR Reviews are AI-Generated, Bradley Emi asks:
Are authors using LLMs to write AI research papers? Are peer reviewers outsourcing the writing of their reviews of these papers to generative AI tools? In order to find out, we analyzed all 19,000 papers and 70,000 reviews from the International Conference on Learning Representations, one of the most important and prestigious AI research publication venues. Thanks to OpenReview and ICLR's public review process, all of the papers and their reviews were made publicly available online, and this open review process enabled this analysis.
Pangram found that a significant proportion of the reviews were AI slop:
We found 21%, or 15,899 reviews, were fully AI-generated. We found over half of the reviews had some form of AI involvement, either AI editing, assistance, or full AI-generation.
There was less AI slop in the papers, but still significant AI use:
Paper submissions, on the other hand, are still mostly human-written (61% were mostly human-written). However, we did find several hundred fully AI-generated papers, though they seem to be outliers, and 9% of submissions had over 50% AI content.
Of course, just because Pangram flags a review or a paper as AI-generated doesn't mean it is wrong, just as a paper being human-written doesn't mean it is right. A decade ago, long before AI arrived, science was suffering a reproducibility crisis caused by Bad incentives in peer-reviewed science. Eleven years ago Arthur Caplan of the Division of Medical Ethics at NYU's Langone Medical Center predicted it would lead to a total loss of science's credibility:
The time for a serious, sustained international effort to halt publication pollution is now. Otherwise scientists and physicians will not have to argue about any issue—no one will believe them anyway.
No-one did anything effective, so Caplan's "otherwise" was what happened. Science has had a quality problem for a long time. The bad incentives have also caused a quantity problem, spawning pay-to-play predatory journals publishing garbage under the "peer-reviewed" brand.

Gartenberg Fig. 2
An even more comprehensive analysis for the journal Organization Science, also using Pangram, was reported in More Versus Better: Artificial Intelligence, Incentives, and the Emerging Crisis in Peer Review by Claudine Gartenberg et al. They find that AI's reduction in the cost of pumping up a researcher's publication count has caused a massive spike in submissions to an already overloaded system:
While there could be many reasons for the rise in submissions, including reduced backlogs, increased scholar productivity, or journal reputation, Figure 2 suggests that the disproportionate increase in submission volume is driven by AI use. Post-ChatGPT, we see a marked decline in submissions flagged at 0%–15% AI (little to no AI use) and a corresponding rise in all other categories that make up the difference between the decline in human-only submissions and the 42% increase in total submissions.
Gartenberg Fig. 3
The additional submissions were marked by heavy AI use:
Prior to the launch of ChatGPT, relative shares were flat. Nearly all submissions were classified as human (with some idiosyncratic noise). Immediately after the launch of the first commercial LLM chatbots, a precipitous decline in human-only submissions began. At the same time, we observe a steady rise in all categories of AI-supported or generated submissions. What is most striking is that by February 2026, the majority of submissions submitted to Organization Science use AI in their writing to some degree. The most striking trend is the rise of the 70%+ AI category, where text is mostly or entirely generated by AI.
Gartenberg Fig. 6
So much for quantity. There are no good automated tools to assess the quality of the research, but there is a wide range of automated tools to assess the quality of the writing. Applying them, the authors found a significant correlation between AI use and degraded readability:
We do not find much evidence that the writing quality of those manuscripts changed meaningfully between 2013 and November 2022, when ChatGPT was launched. In contrast, post-ChatGPT, we see a precipitous decline in the average manuscript’s Reading Ease score. Indeed, AI scores and Flesch Reading Ease are negatively correlated
...
We find strong evidence that AI use is associated with lower-quality writing across most of these traditional measures. This result is counterintuitive. Authors often assume that using AI will improve their writing. However, this is not the case, at least when authors substantially offload their writing to it.
...
AI prose is more difficult to read on several dimensions. Beyond substantially lower Flesch Reading Ease scores, the grade level required to understand the text is higher (more multisyllabic words); the FOG and SMOG indices increase, suggesting more complex text; and the use of jargon increases. We also find increased use of nominalizations (e.g., “conceptualization”, “operationalization”, or “contextualization”).
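The readability measures in that passage are simple formulas over counts of words, sentences and syllables. That is why a shift toward multisyllabic words and nominalizations pushes the scores in the direction the authors report. Below is a minimal Python sketch of the standard Flesch Reading Ease and Gunning Fog formulas. It assumes the counts are already in hand; syllable counting is itself heuristic and is left out. It is not the tooling Gartenberg et al. used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index: roughly the years of schooling needed to follow the text.

    'Complex' words are conventionally those with three or more syllables.
    """
    return 0.4 * (words / sentences + 100 * complex_words / words)


if __name__ == "__main__":
    # Two hypothetical passages with the same length and sentence count;
    # the second uses more multisyllabic words (e.g. nominalizations).
    plain = flesch_reading_ease(words=100, sentences=5, syllables=150)
    dense = flesch_reading_ease(words=100, sentences=5, syllables=180)
    print(round(plain, 3), round(dense, 3))  # 59.635 34.255
```

With words and sentences held fixed, adding 0.3 syllables per word costs 0.3 × 84.6 ≈ 25.4 points of Reading Ease. Sentence length barely moves, yet the score falls sharply.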
Whatever the quality of AI generated papers, they are massively aggravating the quantity problem. Samantha Cole reports on one approach to reducing the flow in ArXiv to Ban Researchers for a Year if They Submit AI Slop:
Late Thursday evening, Thomas Dietterich, chair of the computer science section of ArXiv, wrote on X: “If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper.”

Examples of incontrovertible evidence, he wrote, include “hallucinated references, meta-comments from the LLM (‘here is a 200 word summary; would you like me to make any changes?’; ‘the data in this table is illustrative, fill it in with the real numbers from your experiments’.”

“The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue,” Dietterich wrote.
I have two suggestions:

Footnotes

  1. This was a problem we addressed in the design of the LOCKSS protocol:
    Effort Balancing. If the effort needed by a requester to procure a service from a supplier is less than the effort needed by the supplier to furnish the requested service, then the system can be vulnerable to an attrition attack that consists simply of large numbers of ostensibly valid service requests. We can use provable effort mechanisms such as Memory-Bound Functions to inflate the cost of relatively “cheap” protocol operations by an adjustable amount of provably performed but otherwise useless effort. By requiring that at each stage of a multi-step protocol exchange the requester has invested more effort in the exchange than the supplier, we raise the cost of an attrition strategy that defects part-way through the exchange. This effort balancing is applicable not only to consumed resources such as computations performed, memory bandwidth used or storage occupied, but also to resource commitments. For example, if an adversary peer issues a cheap request for service and then defects, he can cause the supplier to commit resources that are not actually used and are only released after a timeout (e.g., SYN floods). The size of the provable effort required in a resource reservation request should reflect the amount of effort that could be performed by the supplier with the resources reserved for the request.
  2. I described the history in "Nobody cared about security":
    The first spam e-mail was sent in 1978 and evoked this reaction:
    ON 2 MAY 78 DIGITAL EQUIPMENT CORPORATION (DEC) SENT OUT AN ARPANET MESSAGE ADVERTISING THEIR NEW COMPUTER SYSTEMS. THIS WAS A FLAGRANT VIOLATION OF THE USE OF ARPANET AS THE NETWORK IS TO BE USED FOR OFFICIAL U.S. GOVERNMENT BUSINESS ONLY. APPROPRIATE ACTION IS BEING TAKEN TO PRECLUDE ITS OCCURRENCE AGAIN.
    Which pretty much fixed the problem for the next 16 years. But in 1994 lawyers Canter & Siegel spammed the Usenet with an advertisement for their "green card" services, and that December the first commercial e-mail spam was recorded.
  3. Credit for this idea goes to Vicky Reich.
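Footnote 1's effort-balancing rule can be illustrated with a toy exchange in which the supplier demands proof of work before committing any resources. This is a hypothetical Python sketch, not LOCKSS code, and it makes several substitutions:

- It swaps in a simple hashcash-style, CPU-bound puzzle where the footnote mentions Memory-Bound Functions.
- The function names are invented for illustration.
- The mapping from service cost to puzzle difficulty is an assumption.

The asymmetry is the point: finding a proof takes roughly 2^b hashes, while checking one takes a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def required_bits(service_cost_hashes: int) -> int:
    """Hypothetical mapping: the smallest difficulty b whose expected search
    cost (about 2**b hashes) is at least the supplier's cost for the service."""
    return (max(service_cost_hashes, 1) - 1).bit_length()


def make_proof(challenge: bytes, difficulty_bits: int) -> int:
    """Requester side: brute-force a nonce; expected cost ~2**difficulty_bits hashes."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify_proof(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Supplier side: checking a proof costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


if __name__ == "__main__":
    challenge = b"fresh-challenge-from-supplier"  # hypothetical per-request value
    bits = required_bits(1000)  # supplier's cost for the service, in hashes
    nonce = make_proof(challenge, bits)
    print(bits, verify_proof(challenge, nonce, bits))  # 10 True
```

In a real exchange the supplier would issue a fresh random challenge per request, so that a proof cannot be replayed. It would call the verification step before reserving anything. A flood of unproven requests then costs the supplier one hash each, and a valid request costs the requester at least as much as the service it buys.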

by David. (noreply@blogger.com) at May 15, 2026 09:00 PM

May 14, 2026

Harvard Library Innovation Lab

Libraries and Public Access to Federal Data: Chris Marcum Talks to the Public Data Project

On May 7, 2026, Molly Hardy, Project Lead for the Public Data Project, sat down for an interview with Chris Marcum, Senior Fellow for Data Policy at the Data Foundation and former Senior Statistician at the White House Office of Management and Budget. Please click the video above to listen and watch; the interview transcript below has been lightly edited for clarity.


Public Data Project:
Hello, my name is Molly Hardy, and I’m here at the Library Innovation Lab’s Public Data Project. I’m the director of the project. And I’m very pleased to be welcoming Senior Fellow for Data Policy at the Data Foundation and former Senior Statistician from the White House Office of Management and Budget, Chris Marcum. Chris and I are going to have a conversation that’ll go about 45 minutes. It centers around a report that Chris recently published, The Integrity of Public Access to Federal Data: Evaluating Disruptions to Open Government Data, 2025–2026.

Cover of Chris Marcum's report, 'The Integrity of Public Access to Federal Data: Evaluating Disruptions to Open Government Data, 2025–2026' Source: The Integrity of Public Access to Federal Data (2026).

And through his explanation of the flaws in the evidence cited to assess government data loss since 2025, Chris explains the complexities and intricacies of government data collection and distribution, offering those of us in the library community real insights into how we might move forward in our work to preserve and make accessible government data. Government documents and data librarians have been thinking about the preservation and access to government publications for decades. See, for example, James A. Jacobs and James R. Jacobs’s Preserving Government Information: Past, Present, and Future.

And as the Internet Archive’s recent Information Stewardship Forum 2026 on building shared practices for the preservation and access of government information highlighted, librarians, technologists, policymakers, and community advocates need to work together to address the fragmentation and challenges in preserving and accessing government information. And I want to add a quick plug here for the Preservation of Government Information call to action that came out of that meeting in San Francisco, which folks may want to check out and sign.

So in February 2025, the Library Innovation Lab announced its archive of the federal data clearinghouse, Data.gov, and our Public Data Project emerged from this effort. In October of last year, we shared Data.gov Archive Search, an interface for exploring this important collection of government datasets. This work builds on recent advancements in lightweight, browser-based querying to enable discovery of more than 311,000 datasets comprising almost 18 terabytes of data on topics ranging from automobile recalls to chronic disease indicators.

So, given his illustrious career in advocating for the preservation and access to government data, the Public Data Project has learned a lot from Chris. And we greatly value this recent report that he’s issued, again, called The Integrity of Public Access to Federal Data. And I’m so pleased today to have a chance to sit down with Chris and ask him to expand on areas of the report that might be of particular interest to the library community. So, welcome, Chris.

Chris Marcum:
Thanks so much, Molly. I’m super excited to be here. I’m just tickled that you all at the Public Data Project have asked me to come and speak with you today about the report. And I’m just really, really honored. Thank you.

Public Data Project:
Absolutely. Could you just tell our audience a little bit about your background? I think it’s really fascinating, and it would be helpful for folks to understand where you’re coming from.

Chris Marcum:
Yeah, sure. So first and foremost, I’m an open science advocate, and have been steeped in information policy in the U.S. federal government for the last five or six years.

But that’s not what I was trained in: I have a PhD in sociology, and I did a postdoc in economics and statistics at the RAND Corporation, where I was looking at vaccination uptake behavior during the H1N1 potential pandemic that didn’t turn out to be a pandemic thanks to high-quality data shared by the CDC. And the late Dr. Nancy Cox was able to share that data.

So eventually I ended up at the NIH. I was doing basic research as a methodologist on biobehavioral health and social networks in the context of heritable health disease. And I started getting this policy itch. I was like, we write really, really great research papers. We produce a lot of amazing data. But ultimately, the impact of that is pretty limited. We’re talking to a very narrow audience of other researchers. And I really wanted to have a broader impact.

And so I started looking for opportunities to do more policy-related work. And NIH is not a policy-setting agency outside of the NIH itself. And so I wanted to really think about how to cut my teeth in policy.

So I joined some committees in the intramural research program. We have a scientific review committee that’s like the Center for Scientific Review for extramural [research], where we were reviewing other intramural scientists’ research. And then I got involved in the data access committees. And that really accelerated my interest in information policy. I was able to go over to help set up a new program in the Office of the Director at NIAID — National Institute of Allergy and Infectious Diseases — called the Office of Data Science and Emerging Technologies. And that was done right at the start of the pandemic. So work there was done really in data sharing and training, and training people how to share effective data, standing up a new data access committee.

And that launched me into the national stage, where I ended up being invited to President Biden’s Fast Track Action Committee on Scientific Integrity. And that led Alondra Nelson at the White House Office of Science and Technology Policy to invite me to lead open science for the Biden-Harris administration, all the way before I got to OMB later on. So it’s been a long, winding career road.

Public Data Project:
That’s fascinating. It’s such an intersection of direct policy work, as you say, as well as the work that we in libraries are concerned with around preservation and access. It’s really great to have your perspective here.

And so if you don’t mind, we’ll just go ahead and jump into the report. And we librarians, we love lists. We love indexes. We love bibliographies. We love catalogs, right? And so a point that you repeatedly returned to in the report, one that I really took to heart, is that the Federal Data Catalog, often referred to as the FDC, is neither a repository nor is it a stable indicator of data accumulation or loss.

So I’m wondering: can you tell us what it is then? That is to say, how is it best understood? And if you could explain the relationship between the Federal Data Catalog and Data.gov, that would be really helpful.

Chris Marcum:
Yeah, this is a nuance in federal information policy that is not well understood or appreciated, even by the members of Congress who, ostensibly anyway, should have an interest or a stake here. So the Federal Data Catalog is a statutory requirement in the Foundations for Evidence-Based Policymaking Act. It’s in Title II, which is also known as the Open Government Data Act. And it basically establishes a centralized catalog or index of every agency’s federal data assets.

And previously, there had been an initiative started by the Obama administration that launched Data.gov that is hosted by GSA. Now, Data.gov did not serve as a repository. This is not where data is being deposited in the sense of, like, an institutional repository that many libraries are most familiar with. And instead, it just pulled in the information that agencies were indexing on their own inventories of data.

And so when the Foundations for Evidence-Based Policymaking Act was passed, it just made a lot of sense, right, to take advantage of the infrastructure that Data.gov provided. And so what we like to characterize it as is that Data.gov provides the Federal Data Catalog. And so the relationship is that Data.gov is the landing place for the Federal Data Catalog.

The Federal Data Catalog is comprised of an aggregation of what are known in the statute, in the Open Government Data Act, as agency comprehensive data inventories. This is just an index of every data asset they hold, but not the data themselves.

Public Data Project:
Okay, so I understand it’s not a repository, but I don’t understand completely why it’s not comprehensive. I mean, the words you just used would make me think that if every agency is submitting their indices, why isn’t it comprehensive?

Chris Marcum:
Yeah, this is a really good question. It comes down to the practicalities of implementation.

So today, there are over 500,000 datasets listed on Data.gov. Most of those are federal data assets. There are some data assets in there from state and local governments because Data.gov will index them if they’re supplied to the GSA, the General Services Administration, which administers Data.gov.

But then the question is why the Federal Data Catalog isn’t comprehensive when, in fact, the federal agencies are required by statute to have a comprehensive data inventory.

And if you think about that number, around 500,000, it’s probably an order of magnitude lower than the actual number of federal data assets that federal agencies hold. And if you go back and you think about the complexity of all of the types of data and what is defined as a data asset that an agency might hold, you have to think back over the course of the history of that agency, and they might hold on to datasets for a long time. It becomes just a huge challenge to be able to index them, to digitize those. Some of those data assets are probably still on paper. Many of them have probably ended up, to some extent, in the National Archives already. And so there has been a loss of the record of those data.

And so it’s a complicated problem. It’s really challenging for an agency to be able to do a comprehensive inventory.

But the hope is that after we, and when I say “we,” [I mean] the Office of Management and Budget — while I was there, I was one of the leads of the development of an implementation guidance memo known as M-25-05, which is where we’re trying to translate Congress’s intent into an implementation strategy for the agencies to comply with the law on comprehensive data inventories.

And what’s really interesting about that is that the hope was that it would guide agencies to make sure that they have a forward-looking perspective. So everything that comes in now should be open by default, and that you should prioritize existing data assets based off of some strategies that you and your privacy officials and your chief information officers might have, and the agencies and your stakeholders might have for all the past data.

And so really it’s a forward-thinking guidance document, and that’s because of the under-resourcing that agencies, and the chief data officers’ staff, are faced with. They’re under-resourced in terms of budget and staffing, and it would take an army for every agency to be able to do this comprehensively.

Public Data Project:
Yeah, that’s great. That’s really, really helpful and sobering to understand. Thank you for taking the time to explain that to us.

Another thing that really struck me in the report that really just rang true — my own background is in the history of bibliography. And you talk about a lack of consistent or transparent methodologies generally across the government and across the care for federal data assets. And one distinct part of that lack is in definitions — that is, clear definitions.

And you offer some helpful nuance when you distinguish between deletion, access removal, and discontinuation around federal data. That’s really important because when we’re talking about data rescue and things like that, those lines often get blurred. And it’s really important to remember how and why data might not be accessible.

But I was wondering, too, just at a very basic level, do we have a definition for federal data? Does it come down to who is collecting the data? Or because we know that contractors often do this work, is it who’s funding the collecting of the data? Something else? And then I guess I would just layer in, too, how and why might that definition matter? And I have some ideas myself related to what you were saying earlier, but I would love to hear if you had any additional thoughts on that.

Chris Marcum:
Yeah, so I wouldn’t say there’s a definition of federal data with the qualifier “federal,” but there is a definition of data in the Foundations for Evidence-Based Policymaking Act, as well as some other statutes.

And that definition is technical and a little bit boring, but — I’m going to use some acronyms — in 44 U.S.C. [3502], Congress has defined data as recorded information acquired or maintained by an agency, I believe. [Note: 44 U.S.C. 3502 defines “data” as “recorded information, regardless of form or the media on which the data is recorded”; related terms such as “data asset” refer to data maintained by an agency.] And so in the Open Government Data Act, there’s a provision that talks about recorded information, regardless of its form or the media on which the data is recorded, and that it’s acquired or maintained by the agency.

That is really important because in the modern age, we think of data as being digital, right? But this really gives a definition of data that is broader and that can include recorded information on paper, recorded information on [other media]. What I love to imagine is these new forms of data preservation where we have, like, crystals being inscribed. Or data being recorded in genomes, for example, has been a novel thing. So it’s a really broad definition.

And when you ask [about contractors], let’s say a contractor is working with the federal government and they’re collecting data. By statute, that data is owned by the federal government. It’s federal data. And so the Evidence Act, the Open Government Data Act, what is very clear is that those data assets do need to be inventoried. And any encumbrances on those data assets, let’s say that an agency partners with an organization that provides proprietary data for some services. If the agency is maintaining those data or it acquires them under whatever legal definition their lawyers can come up with, that has to be inventoried.

But the encumbrances on those data also need to be disclosed very transparently in the metadata. So the comprehensive data inventories have to say whether or not there’s copyright associated, and how the public can access it, if the public can access it, for example. I think the biggest component is transparency in that the agency has access to that data.
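Here is what disclosing encumbrances in the metadata can look like in practice. Agency inventories harvested by Data.gov are commonly published in the Project Open Data (DCAT-US v1.1) format. Its `accessLevel` field takes the values `public`, `restricted public`, or `non-public`, and a `rights` field explains any restrictions. The check below is a simplified, hypothetical sketch of that kind of rule, not the official validator. The sample record is invented for illustration.

```python
from typing import Any

# The three accessLevel values defined by the DCAT-US v1.1 schema.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def encumbrance_problems(record: dict[str, Any]) -> list[str]:
    """Return problems with how a record discloses access restrictions.

    Simplified, hypothetical rules (not the official Data.gov validator):
    every record declares an access level; anything not fully public
    explains its restrictions in `rights`; public records name a license.
    """
    problems: list[str] = []
    level = record.get("accessLevel")
    if level not in ACCESS_LEVELS:
        problems.append("missing or unknown accessLevel")
    elif level != "public" and not record.get("rights"):
        problems.append("restricted data must explain its encumbrances in 'rights'")
    elif level == "public" and not record.get("license"):
        problems.append("public data should state a license")
    return problems


# A hypothetical inventory entry for proprietary data an agency licenses from a
# partner: it is inventoried even though the public cannot simply download it.
partner_data = {
    "identifier": "example-agency-0001",
    "title": "Vendor-supplied traffic counts",
    "accessLevel": "restricted public",
    "rights": "Licensed from a commercial vendor; copyright retained by the vendor. "
    "Researchers may request access under the vendor's terms.",
}

if __name__ == "__main__":
    print(encumbrance_problems(partner_data))  # []
    print(encumbrance_problems({"accessLevel": "non-public"}))
```

The design choice mirrors the point in the interview: the restricted record still belongs in the inventory. What the rule enforces is that its restrictions are stated rather than left implicit.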

Public Data Project:
Right, so the agency has access to it. And this word “maintained,” I might postulate, is even more nebulous than the word “preserve.” What does that mean in this context — to maintain that data?

Chris Marcum:
Yes, so does it mean that the agency has ingested it into their institutional repository? Does it mean that it’s stored on a computer in just one person’s office?

The chief data officers have to all go through this exercise where they have to figure out what the definition means to the agency’s mission. And so “maintained” here, I think, encompasses deposit in repositories. So if data is put into an institutional repository, or is regularly used, or accessed via the cloud, there’s a good argument to say the federal agency is maintaining that data.

Certainly data that are being updated, or are being cleaned, or being processed or used are also being maintained. And so that’s been a very easy one to handle.

Where it becomes more nebulous is on derivative datasets. And so you can imagine that you have a large corpus of data where you’ll have a dataset that lots of agencies create sub-datasets from: maybe bespoke use cases, or little research projects. Are those data assets, and do they count as something being maintained?

And so that becomes more of a product-focused approach to data. Is the thing that needs to be inventoried the parent dataset or any of these child datasets that might propagate after them? And that’s more complicated.

Public Data Project:
And returning to this concept of parent and child datasets, am I right to say that that is part of the reason that the numbers of datasets in Data.gov can fluctuate so wildly?

Chris Marcum:
Yeah. So one of the things that happened early on last year that got a lot of press and got attention even by members of Congress was that there was a lot of fluctuation shortly after the inauguration through the month of February on the top-level counts. Data.gov provides a top-level count, the number of data assets indexed in the Federal Data Catalog, and it was bouncing around on the order of a few thousand datasets.

And it just so happens that one of the very mundane reasons that can happen is because Data.gov is dynamic. It pulls in information from the federal agencies. And so if federal agencies are updating their comprehensive data inventories, then that will be reflected on Data.gov.

One of the big ways that that number can change is when an agency decides to put a series of data into a collection. And then historically on Data.gov, the way they handled that is — instead of enumerating every single one of the child datasets, you can imagine that there might be a project that has five datasets and they get collected into a single collection or put into a single collection. And then the inventory goes down by four because only the collection is being counted.

Now, the new iteration of Data.gov, the new update, doesn’t do that. It actually counts the individual data assets inside a collection. So this has been something that’s been desired by the community for a long time, and GSA is finally being responsive by updating Data.gov to make a more accurate reflection of the true count of datasets.

But it can happen the other way, too. You can imagine that a collection is, well, these are no longer one entity. There might be separate datasets, but there are separate maintenance tracks and update tracks, and they get broken up from a collection. That can also happen.
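The "goes down by four" arithmetic can be reproduced by applying two counting rules to the same inventory. The interview does not describe how Data.gov's software actually counts, so this Python sketch rests on assumptions:

- A child dataset points to its collection through an `isPartOf` field. That is a DCAT-US v1.1 field name, used here as an assumption about how the grouping is recorded.
- The old rule counts only top-level records.
- The new rule counts individual assets.

```python
from typing import Any

Record = dict[str, Any]


def top_level_count(records: list[Record]) -> int:
    """Old-style count: records that name a parent via 'isPartOf' are
    folded into that collection, so only top-level records are counted."""
    return sum(1 for r in records if not r.get("isPartOf"))


def asset_count(records: list[Record]) -> int:
    """New-style count: every individual dataset counts, and a collection
    record that merely groups children is not counted itself."""
    parents = {r["isPartOf"] for r in records if r.get("isPartOf")}
    return sum(1 for r in records if r["identifier"] not in parents)


# Hypothetical project with five datasets, first listed separately, then
# grouped by the agency under a single collection record.
separate = [{"identifier": f"proj-ds-{i}"} for i in range(1, 6)]
grouped = [{"identifier": "proj-collection"}] + [
    {"identifier": f"proj-ds-{i}", "isPartOf": "proj-collection"} for i in range(1, 6)
]

if __name__ == "__main__":
    print(top_level_count(separate), top_level_count(grouped))  # 5 1
    print(asset_count(separate), asset_count(grouped))  # 5 5
```

Grouping five datasets under one collection takes the old-style count from 5 to 1, while the per-asset count stays at 5. Breaking a collection apart runs the same arithmetic in reverse. In neither case has any data appeared or disappeared.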

Public Data Project:
I just want to understand better. When you talk about Data.gov pulling in from federal agencies, is that automated? First question. And second question: how does it then relate to what you said earlier about it being statutory that this happens, that federal agencies contribute? So is it like there’s this automated process that you do or don’t sign up for? What is actually going on there with the vacuuming-in of data?

Chris Marcum:
Really good question. That is one of the mysteries in information policy.

So the way that this massive federated apparatus works — Cole Donovan and I recently wrote a piece for the Federation of American Scientists where we have a very simple sentence that I think has a lot of impact: “governing is hard.” And in this case, governing data is hard.

So I want to point your listeners to a resource, resources.data.gov, where they outline some of this process, to look at the information on data sources for Data.gov.

So what happens is the statute requires every agency to have a comprehensive data inventory, and these become the data sources that are harvested by Data.gov. Some agencies have more than one, even though the statute says they have to have one.

Again, the complexities of implementation mean that [there are exceptions]. Like, the Census’s TIGER files have their own inventory because they’re updated with some regularity and they’re complex. And these are the shapefiles that give us our maps, basically, for the country. They’re relied upon by pretty much everything and they’re taken for granted because we all use them on Google Maps and other platforms.

And so what will happen is these inventories are promulgated at the agency level. They sit on agency servers. And then GSA has a harvesting routine that happens pretty much daily that goes through, crawls those sources, and then pulls in the information, updating its master list, which is the Federal Data Catalog.
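A harvest pass like the one described can be sketched as a sync from each source into a master list. This is a hypothetical Python sketch. GSA's harvester is only described here as crawling the sources daily, so the following are all assumptions:

- Records are keyed by (source, identifier).
- Records a source no longer lists are dropped.
- The fetch step is stubbed out: inventories are passed in as lists rather than downloaded.

```python
from typing import Any

Record = dict[str, Any]
Catalog = dict[tuple[str, str], Record]


def harvest(catalog: Catalog, source: str, inventory: list[Record]) -> tuple[int, int]:
    """Sync one agency inventory into the master catalog, in place.

    Records are keyed by (source, identifier). Records the source no longer
    lists are dropped; new and changed records are written over the old ones.
    Returns (number added, number removed).
    """
    incoming = {(source, r["identifier"]): r for r in inventory}
    existing = {key for key in catalog if key[0] == source}
    added = set(incoming) - existing
    removed = existing - set(incoming)
    for key in removed:
        del catalog[key]
    catalog.update(incoming)
    return len(added), len(removed)


if __name__ == "__main__":
    catalog: Catalog = {}
    # Day 1: two hypothetical sources publish their inventories.
    print(harvest(catalog, "agency-a", [{"identifier": "a1"}, {"identifier": "a2"}]))  # (2, 0)
    print(harvest(catalog, "agency-b", [{"identifier": "b1"}]))  # (1, 0)
    # Day 2: agency A stops listing a2 and adds a3; the total is unchanged.
    print(harvest(catalog, "agency-a", [{"identifier": "a1"}, {"identifier": "a3"}]))  # (1, 1)
    print(len(catalog))  # 3
```

When a record drops out of the catalog, all the harvester has observed is that the source stopped listing it. That fits the report's point that the catalog is not a stable indicator of data loss. A missing entry does not, by itself, distinguish deletion from access removal or discontinuation.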

Public Data Project:
Okay, thank you. And so then to return to the Federal Data Catalog, that’s the lodestone, the cornerstone of all of this. Thinking back, just to return to our initial conversation about its incompleteness. Were you made information czar, what would you do to make it more complete?

If we were to say that it would be a civic good to have a complete catalog, what would we do to get to that completeness?

Chris Marcum:
So I would first and foremost recognize that it is an extremely difficult task for the agencies.

And so, as I’ve said, notwithstanding resource limitations, staffing limitations, if we had some statutory authority with an appropriated budget that is sufficient to accomplish this, it would be really helpful for every agency to establish a data governance board that then goes through all of the use cases with effectively every staff member.

And we did this exercise in the Office of Management and Budget, or started to before I departed last year. But our CIO, Chief Information Officer, brought us together, about 20 or 30 of the staff members, to just talk about — hey, what data do you use? What data is important to you? What data do you store in your computer? What data do you make derivative datasets from? What do you need from us that you don’t have access to?

And that started the process for establishing a comprehensive data inventory within that office.

Establishing a data governance board that then goes out and makes sure the staff are trained in data access and management best practices, but are also aware of the need for inventorying all the data assets and to make sure the definitions for those data assets are governed — that would be what I would do. And I would make that a requirement for every agency and have the agencies report back up to, say, the Office of Management and Budget or another appropriate office as things evolve in the government.

Public Data Project:
That’s great. Thank you. And in that work — you mentioned the National Archives, that some things go there. Of course, we’ve got our Library of Congress, which I realize has a somewhat complicated history when it comes to this kind of work. But I’m just wondering, are there library/archive institutions within the government already that would play a role here? Or is that a big lack?

Chris Marcum:
No, I think there are. I mean, it’s “yes and.” So, yes, there is a role for the National Archives. Obviously, the National Archives have to help agencies with their final disposition of all of their records and information that appear in datasets. Of course, those are records, and they are subject to the Federal Records Act for the most part.

So you have the National Archives, which has responsibilities on archiving information. They also have responsibilities for promulgating standards. They do the classification standards. And so it’s really helpful for agencies to be able to take advantage of this existing body of knowledge around, what is this? Is this controlled unclassified information? Is this secure information? And there’s already a lingua franca available.

There’s also, like you said, great expertise in the library community within the federal government. And one of the areas that I just love to talk about is that many agencies hold material collections. Obviously, we think of maybe the big ones, like the Smithsonian. There’s a huge material collection, huge libraries.

But then there are more nuanced cases like at NIST, the National Institute of Standards and Technology. They’ve got their reference materials database, a reference materials library. That is a licensed library that people pay to have access to. But they have a lot of knowledge on how to curate information in a structured manner for accessibility and preservation for the long term.

And so greater interagency coordination is absolutely necessary for the success of this. I even like to point to the fact that NIST a few years ago developed the Research Data Framework, where they provide a governance strategy for federally funded research data. And so this goes beyond just what the agencies themselves are requiring or producing, to that which their grantees produce.

Public Data Project:
I’m thinking, too, another example might be the NASA Library, which of course was recently in the news and in peril, right?

Chris Marcum:
Yeah, so not all of the NASA libraries; just the library at Goddard has been shuttered. [Note: Additional NASA library closures have been reported since 2022.] And that is a tragedy because Goddard represents a wealth of material and informational assets that really require librarian stewardship.

And to have those assets transferred either to the National Archives or probably, as the case may be now, shuttered and just locked behind a door while that process unfolds, really does not do a service to the public good. And it certainly doesn’t do a service to the researchers who rely on those resources at the lab.

Public Data Project:
For sure. Our conversation has naturally shifted from questions around basic preservation to access. I think it’s worth reflecting for a second on the ways in which the work of the government, when done best, is transparent. And that’s another way of saying it is accessible to all. And that is the goal. That is half of the reason that libraries exist: preservation and access, right? And so there’s a very natural connection and shared mission in terms of the public good. So, yeah, that just all makes a lot of sense to me.

I would be remiss were I not to bring up metadata because we always want to talk about metadata. All roads lead to metadata. You note in your report that inaccurate metadata is a major issue, and the misclassification of datasets, and also misleading and rotting URLs, the kind of maintenance work that librarians are quite familiar with.

So I was just curious, in terms of metadata standards — I know they exist. Is that the issue, that the standards aren’t hitting it quite right? Is it an implementation issue? Is it something that’s happening in the aggregation? Where does the inaccuracy creep in? And then also the misclassification, and this obviously missing maintenance work. Lots on the table there, if you’d be willing to pick up any of that.

Chris Marcum:
I’m going to answer you with an answer I think you’re really going to appreciate. I think that the amount of, let’s just call it error, in agencies’ comprehensive data inventories is a strong indication of the need for more information scientists in those agencies, like librarians, like repository experts, to help with the curation.

Because ultimately the information in the metadata catalogs is only as good as it is entered, typically by people. And so you get a lot of errors that can occur based on human input error. You also get errors that occur when, like, a CIO migrates a system to a new server. And then all of a sudden, the links for the data sources are all broken. We’ve seen that happen in the past. An API that might serve up information about data or serve data itself might change. It might change vendors. And then that API might have a different URL propagation system. And so that can change. And so it takes time, of course. But if they had good information systems experts and information scientists available before these decisions are made, that will help tremendously with reducing the amount of error in the future.

On classification, I found this really fascinating for data in the Federal Data Catalog, because the law is not clear. And I will say that having struggled for a long time with my colleagues at OMB on how to communicate what constitutes a data asset, a public data asset, an open government data asset — these things are all in statute, but the distinction between them is not as clear as Congress could have made it. And part [of it] is probably because there certainly wasn’t an MIS or someone with an information sciences background writing the law, per se.

And so what we found is that the interpretation historically has been left up to the agencies and left up to individual subject matter experts or individual staffers. And so you get this really interesting mosaic of what gets captured as a data asset. And so it can range anywhere from a PDF of an infographic to, you know, the Census. And the diversity of that is just wild.

I think that hopefully M-25-05, the implementation guidance, provides some additional clarity on the structured nature that we expect of data. It’ll provide agencies more clarity, but they’ll also exercise more care in classifying their assets as they go through their prioritization of which assets need to move from federal data assets to public data assets to open government data assets.

And again, it’s a tough problem. The other part of me is like, I love the fact that I can find, for example, CDC’s anti-smoking infographics on the Federal Data Catalog. But I just don’t think they belong there. And so it’s like, I love that they’re preserved and that they’re available. But are they data assets?

Public Data Project:
Right. You talk in the report in really helpful ways about the distinction between data tools and data sources. And what is it that we need to be advocating for? The tools are amazingly powerful and they’re wonderful. And yet without the data behind them, there’s no there there.

Chris Marcum:
Yeah, it’s so fascinating because what enables many of the tools that have been taken down by this administration and put back up by civic society organizations is the fact that the underlying data have remained publicly accessible and were publicly accessible, publicly available.

And so if you don’t preserve that data, then the tools, as you said, are kind of useless, right? Because they don’t have the high-quality information that you require.

On the other hand, I am a strong believer in democratizing data and making it accessible and approachable to people. Denice Ross and I recently produced and published a Federal Data Field Guide. It helps to make federal data just more approachable. And it is, in effect, a type of data tool because it’s like an aggregation of all of these different data types. It provides an ontology.

I really do have an appreciation for democratization. I think the data tools really do provide that accessibility. And I think the modus operandi of this administration is to increase friction in the approachability of publicly accessible data. And so if you take down the tools that help everyday people interpret federal data, I think that’s part of the goal — even if you maintain access to the online data itself. So I’m right there with you. And the distinction is really important and it needs to be emphasized. Ultimately, if we’re targeting preservation, we definitely have to handle the underlying data because without the data, you don’t have the tools.

And I’d also want to add another nuance, something I think a lot about from when I was a senior statistician and senior scientist at OMB: data reports. Data tools, typically, are interactive, and they help you interpret. But a lot of the economy relies on economic reports where the underlying data are confidential statistical data. They’re not readily publicly accessible. You have to go through a clearance process to get access to them, either through the Federal Statistical Research Data Center program or through the agency research data centers themselves. And there are costs associated with that. You have to be licensed and get clearance.

And so instead, what the agencies do is they create these wonderful aggregated quarterly, monthly, yearly reports that provide aggregated statistical data and information.

Many economists, many reporters, they consider that to be data, right? This is the federal economic data. It is not the dataset that underlies those data. It’s just the aggregations. And so that’s another really important nuance I didn’t talk about in my report, but is one that we have to really think about because these are costly. And the statistical agencies that produce them are under resource constraints and under threat.

Public Data Project:
Exactly. And the level of expertise it takes to produce them — the people who really know the data.

I’ll just ask one last question. What I’d love to close our conversation around is federal workers. And we’re not too far away from May Day to honor federal workers. As you know, I was DOGE’d myself a year ago, so this is a topic very close to my heart.

I’m going to embarrass you a little bit and quote from your own writing, because I was really struck by these sentences. You write, “By hollowing out subject matter experts and other critical staff across agencies, the administration reduced data integrity capacity in a systemic manner. Ultimately, this systemic disruption created lasting deficits in the nation’s ability to reliably collect, protect, and disseminate the vital data necessary for informed policymaking, economic forecasting, and scientific research.” I just thought that really summed it up in a lovely way.

So I wanted to see if you had any closing reflections on the relationship between the precarity of federal data and the slashing of the federal workforce.

Chris Marcum:
Not only are data about people, but the entire data infrastructure relies on people. And the reduction in workforce capacity? There is irreplaceable, non-AI-replaceable damage that has been done in this current administration to the federal workforce.

And you see some recalcitrance by the administration at this point in acknowledging that, where the Office of Personnel Management is touting that they’re going to hire thousands of tech workers. But they had just fired, like 300,000. Or 300,000 or so had departed.

So I would say, first and foremost, this is Public Service Recognition Week. And the public servants like you and myself, whether you have departed the federal workforce by your own volition, like myself, or not, like yourself, I think it’s incredibly important to recognize that subject matter expertise is absolutely essential for the integrity of federal data and for the integrity of maintaining public access to federal data.

Public Data Project:
That’s great. Thank you so much. And I think that’s the perfect place to end. And I just want to say thank you so much for your work.

And again, to give a shout out to The Integrity of Public Access to Federal Data, this fabulous report that Chris recently published. And we encourage everyone generally to pay attention to your work, because it’s just so valuable to all of us on so many levels. So thank you.

Chris Marcum:
Well, thank you, Molly, and thanks to your project and the great work that you all are doing with both the Data.gov project and everything that LIL is doing. I really appreciate it and really appreciate the opportunity to talk to you.

by Molly Hardy at May 14, 2026 04:00 PM

Journal of Web Librarianship

Algorithmic Literacy in LIS: A Systematic Literature Review

by Lateef Ayinde Ayansewa Adedeji Ayoola Oluwaseun Ajayi School of Information, Florida State University, Tallahassee, Florida, USA at May 14, 2026 05:35 AM

Developing a Framework and Wireframe for AI-Driven Personalization and Recommendation Systems in Library Management: A Design Thinking Approach

by Pitchai Arumugam (a), Singarayar Jayachristrayar (b), Rajendran Rega (c), and Jesus Rayar (d). (a) Indian Institute of Astrophysics, Bengaluru, Karnataka, India; (b) CHRIST (Deemed to be University), Bengaluru, India; (c) Government Law College, Theni, Tamil Nadu, India; (d) Department of Tamil, Central University of Tamil Nadu, Thiruvarur, Tamil Nadu, India.

Dr. Pitchai Arumugam is the Librarian at the Indian Institute of Astrophysics, where he leads transformative digital initiatives to modernize library and archival services. He has implemented key systems, including an integrated library management system, an institutional repository, collaborative knowledge platforms, and online finding aids. His current work focuses on research data management, digital preservation, and integrated discovery solutions to enhance research support infrastructure. With over 23 years of professional experience, his expertise spans digital libraries, knowledge management, library automation, artificial intelligence, and enterprise content management.

Singarayar Jayachristrayar is an independent researcher in Library and Information Science, affiliated with Christ University. His research interests include AI-driven personalization, recommendation systems, and digital library innovations, with a focus on user-centered design and intelligent information systems. In addition to his research, he has authored Tamil poetry collections and is actively involved in providing open-source library software services.

Rajendran Rega serves as Librarian at Government Law College, Theni, Tamil Nadu, India. She is actively engaged in academic librarianship, legal information services, library automation, and digital resource management. Her professional interests include cataloguing, classification, collection development, and promoting access to legal information resources for students and researchers. She has contributed to library development initiatives, promotion of e-resources, and strengthening academic support services in legal education. Her interests also extend to knowledge organization, digital libraries, and innovations in library and information science.

Jesus Rayar is a PhD scholar at Central University of Tamil Nadu. A gold medallist, he has published several research works in modern literature, literary criticism, creative writing, and Christian literature. He has qualified for the UGC National Eligibility Test (NET) with Junior Research Fellowship (JRF). His research interests include contemporary Tamil literature, critical theory, and Christian literature. at May 14, 2026 05:27 AM

Xe Iaso

Amazonbot is finally respecting robots.txt

I just got an email from Amazon saying they're finally going to respect robots.txt. Here's the verbatim email I got:

We are writing to inform you that starting Monday, June 15, 2026, crawl preferences for Amazonbot will be managed solely through the industry-standard directives. This gives you direct, ongoing control over how Amazonbot accesses your site, rather than relying on manual requests. If you do not implement robots.txt directives by that date, Amazonbot will follow standard web crawling practices when accessing your site.

How to maintain your current preferences: The robots.txt protocol allows you to control Amazonbot’s access at the page-, directory-, or site-level and update your preferences at any time. Please find detailed information on Amazonbots approach to these directives here: https://developer.amazon.com/amazonbot.

Best,

Amazon Publisher Support

amazonbot@amazon.com

Get Outlook for Mac

Numa

Amazing, they even kept the "sent from my iPhone" message proving they sent it from Outlook for Mac. Looking at the email headers, there are a bunch of Exchange-specific headers, so it's probably actually from Outlook for Mac. This timeline is absolutely wild.

This makes me feel kinda weird because Amazon's scraper is why Anubis exists.

I'm gonna make sure to merge these robots.txt changes into Anubis if they aren't already there.
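For site owners acting on the email, the directives are ordinary robots.txt groups keyed on the `Amazonbot` user-agent token. The following is a minimal sketch, assuming a hypothetical site that wants Amazonbot kept out of a `/private/` directory but allowed everywhere else. It embeds that robots.txt as a string and checks it with Python's standard-library `urllib.robotparser`. That parser only approximates how a real crawler reads the file. Amazon documents its own interpretation at the URL in the email, and the paths, URLs, and the `can_fetch` helper here are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep Amazonbot out of
# /private/ but let it (and every other crawler) fetch everything else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /private/
Allow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Uses Python's standard-library parser, which approximates (but is not)
    Amazonbot's own interpretation of the file. Pass the user-agent token
    (e.g. "Amazonbot"), not a full User-Agent header string.
    """
    parser = RobotFileParser()
    # parse() both loads the rules and marks them as fetched, so
    # can_fetch() will consult them instead of denying everything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Amazonbot", "https://example.com/private/notes.html"),
        ("Amazonbot", "https://example.com/blog/post"),
        ("SomeOtherBot", "https://example.com/private/notes.html"),
    ]:
        print(f"{agent:13} {url}: {can_fetch(agent, url)}")
```

Listing the narrower `Disallow: /private/` before the broad `Allow: /` keeps the result the same under two matching rules. One is a first-match parser like this one. The other is the longest-match rule from RFC 9309, which picks the most specific path.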

May 14, 2026 12:00 AM

May 13, 2026

Evergreen ILS

Evergreen 3.17.0 available

The Evergreen Project is pleased to announce the release of version 3.17.0. 3.17.0 is a feature release that includes

Evergreen 3.17.0 is available for download.

by Galen Charlton at May 13, 2026 09:43 PM

LibraryThing (Thingology)

Interview: Stacy Mitchell, Bold Magazine Shop

Stacy (Klingbeil) Mitchell

LibraryThing is pleased to sit down this month for a special interview with Stacy (Klingbeil) Mitchell, who recently opened Bold Magazine Shop in our own home town of Portland, Maine. Describing itself as a “new, modern magazine shop,” Bold offers a carefully curated collection of independent and international magazines, with an emphasis on good design and interesting content. Mitchell, who studied design herself, and who worked for a number of years as a graphic and program designer for city governments, opened Bold in November 2025.

What gave you the idea to open a magazine shop, at a time when subscriptions to print newspapers and other periodicals are on the decline? Are you going against the grain, or tapping into an interest that isn’t being satisfied elsewhere?

I’ve had the idea for a long time–I’ve always loved magazines. There were a couple of moments that made me take it more seriously. I’ve been a longtime reader of Monocle, and when they launched Konfekt, their women’s title, it felt like something was shifting. Then when i-D was revived, I had a sense of urgency. I didn’t want to miss the opportunity to be part of whatever was happening in print.

While subscriptions to legacy publications are declining, there’s also a new wave of independent titles emerging, and others thriving. Many of the magazines I wanted to read and hold simply weren’t available in Portland.

I also looked to a few shops for inspiration: Periodicals in Detroit, Fine Print in Dallas, Issues in Toronto, and of course, Mag Culture in London and Casa and Iconic in New York. They made it clear that there is an audience for print.

What do you look for, when deciding what kinds of magazines to offer in your shop?

I’m looking for magazines that do something really well—whether that’s writing, photography, design, or concept. That’s the baseline, but many of the titles in the shop do more than one of these things exceptionally well.

I want Bold to be a place where someone can come in for something familiar and leave with a new magazine to try. It’s not for everyone, and it can feel overwhelming at first—but curiosity is part of the experience.

Since day one, we’ve also been listening to recommendations from customers. They tell us what they’re missing or what they wish they could find, and we try to track it down. This feedback has helped us discover some really beautiful, niche titles.

We’re still very new, and I’m excited to keep learning, experimenting, and figuring out what feels most like us—and what resonates with people coming in.

Browsing at Bold Magazine Shop

You’ve described yourself as an “analog girl.” What does that mean, and how has it impacted your vision for Bold?

When social media was really taking hold, I wasn’t sure how to navigate it. It’s of course important for the shop, but it’s not our priority. I’m drawn to physical experiences and things you can hold, spend time with, and return to. That mindset definitely shapes Bold, from the magazines we carry to the way we think about the in-store experience.

What’s so great about magazines? Why do you love them so much, and what can they offer that other kinds of print media cannot?

Magazines feel especially meaningful now—they’re tangible, intentional, and full of possibility. You choose them; they don’t choose you. That creates a different kind of relationship than what we experience on our phones. I really believe they’re having a comeback moment.

They’re also lower-commitment than a book since you can flip through or read more deeply, depending on what you’re in the mood for. That flexibility is part of their appeal. A magazine can be something you browse, spend time with, display, or gift.

They let you go deep on a subject while still feeling accessible. There’s certainly a nostalgia factor for some, and for younger audiences, there’s a sense of discovery–a way to support someone they follow online or an artist they admire through a physical publication they can actually hold.

What are some of your own favorites, of the magazines you offer, and what makes them appealing?

Tools is one of my favorites. It’s a stunning, creative publication from Paris that explores a technique or craft in depth. The latest issue is themed around “spinning,” from the Earth’s rotation to profiles of glassblowers, potters, and ice skaters. It’s exactly the kind of magazine I imagined people discovering here and I love that others have been just as excited about it as I am.

Pencil, published in South Portland, is another standout. It’s done entirely in graphite pencil. It’s local, creative, and approachable. It expands what people think a magazine can be, which makes it a great introduction to the exciting world of indie print media and what this store is all about.

Night Event at Bold Magazine Shop

Tell us a little bit more about your shop, as a community space. What kinds of art exhibitions and other events do you host?

The people have been by far the best part. The customer who brings in an old issue to suggest we carry it. The publisher who stops by to chat. The friend putting together a stack for someone in the hospital. The magazine aficionado making sure we have the hardest-to-find issues. The person who says, “I’ve only ever seen this online.”

These people are the reason we’ve started thinking more intentionally about events and the shop as a place where people who love print can gather and connect.

On our very first day, the first woman to serve as Director of Photography at National Geographic came in. I followed up with her, and we hosted a small conversation to celebrate photography in print. More recently, we invited an artist who works with personal photographs and fragments of fashion magazines to show her work directly on our shelves. We’re still early, but we’re excited to see what takes shape next!

Bold Magazine Shop is located at 604 Congress St in Portland, Maine. Find their operating hours and select titles available online at boldmags.com.

by Abigail Adams at May 13, 2026 05:12 PM

In the Library, With the Lead Pipe

How do we write about this?: Reflections on scholarship about immigrants, data literacy, surveillance, and academic freedom in 2025

In brief

How do you write about immigrants, surveillance, and data literacy at a time when the political landscape around you is constantly shifting, immigrants are being targeted, and higher education is under threat? This question surfaced for us as we were writing an article for Library Trends in 2025 as a non-faculty doctoral student and a non-faculty academic librarian. This article explores the intersection of our personal experience in exercising our academic freedoms and the wider history and context of intellectual freedom in academia. As we worked on our Library Trends article, we confronted questions about our privileges and responsibilities, our concerns about safety, and fighting against self-censorship that grows out of the chilling effect of political pressures on justice-centered scholarship. We suggest risk management considerations and make an argument that collaborative writing can serve as one tool for resistance and empowerment when confronted with these challenges as librarians and library and information science scholars.

By Hayley Park and Negeen Aghassibake

Background

Writing about politically charged topics of the time forces scholars to weigh not only the costs and benefits of their work but also to consider risk management. Yet, what readers end up seeing is the published outcome, and rarely the decision-making process that scholars must navigate to get there. We recently wrote a Library Trends publication titled “Empowering Immigrant Library Users Through Personal Data Literacy Programming in U.S. Public Libraries” (Park & Aghassibake, 2026), which sheds light on the impact of data surveillance on immigrants’ safety and recommends that public libraries play an active role in personal data literacy programming where possible. As we wrote the article during the first year of Donald Trump’s second presidency, we had to consider the meaning and boundaries of academic freedom and intellectual freedom, along with questions around safety and potential risks amidst intensified anti-immigrant rhetoric.

This paper provides insight into the academic publication process, but it primarily contributes to the ongoing scholarly conversations around academic freedom in a time of anti-immigrant rhetoric and continued critical engagement in promoting library values and advocacy work. We argue that collaborative writing can serve as a form of resistance and a way to contend with uncertainty that enables scholars to critically engage in the process of navigating tensions, identifying values and power dynamics, and being vulnerable.

Key terms

For clarity, we offer descriptions of how we’re using related and often conflated terms. Academic freedom has a wide range of definitions across legal, political, and cultural contexts (Nelson, 2009). When we use the term “academic freedom,” we are largely aligning with the United Kingdom Education Reform Act of 1988 definition from Section 202 that refers to academics’ right to pursue research and lines of inquiry “without placing themselves in jeopardy of losing their jobs or privileges they may have at their institutions” (Education Reform Act 1988). We expand upon this definition to include freedom from institutional interference in academic activities with the purpose of reducing academic freedom.

We recognize that this definition varies from the 1940 Statement of Principles on Academic Freedom and Tenure by the American Association of University Professors (AAUP) that specifies academic freedom for faculty (1940). However, we are also in alignment with the following assertion in the AAUP Joint Statement on Faculty Status of College and University Librarians that recognizes academic librarians as members of the academic community and their need for academic freedom to conduct their work (American Association of University Professors, 1972): “Academic freedom is indispensable to librarians in their roles as teachers and researchers…as members of the academic community, librarians should have latitude in the exercise of their professional judgment within the library…” This also aligns with the ACRL Standards for Academic Librarians Without Faculty Status that says that “Academic Librarians are entitled to the protection of academic freedom” (Association of College & Research Libraries, 2011). Later in this article, we will discuss uneven protections of academic freedom for non-faculty academics in greater detail.

When we use the term “intellectual freedom,” we are generally referring to “the right of every individual to both seek and receive information from all points of view without restriction” (Garnar et al., 2022, p. 300). Intellectual freedom and academic freedom are related, though distinct, terms. Within our framework, academic freedom requires intellectual freedom to exist and is specific to institutions within the academy, whereas intellectual freedom is not restricted to academic life and applies to public life.

We also use the term “chilling effect” throughout this article. The definition we’re following here is based on Jonathan W. Penney’s explanation of chilling effect: “when a person, deterred by fear of some legal punishment or privacy harm, engages in self-censorship, that is, censors themselves and does not speak or engage in some activity, despite that activity being lawful or even desirable” (2022, pp. 1454-1455). As an example, within the context of this article, that could look like a researcher deciding not to pursue a politically charged research question due to a fear of harm to their career.

Another word that we refer to throughout this article is “safety.” When we use this word in this article, what we mean is safety from repercussions from our institutions and safety from impacts to our positions from within and outside of our institutions for exercising our academic freedom. This is not to say that we are opposed to critiques or debates, and as noted in the AAUP’s 1940 Statement of Principles on Academic Freedom and Tenure, we recognize that “the public may judge [our] profession and [our] institution[s] by [our] utterances” (1940). Rather, it is the acknowledgment that fear of reprisal, such as job loss, and harm, such as stalking and individualized threats, has a chilling effect on academic freedom.

Setting the scene: writing in 2025

As we worked on our Library Trends article, the implementation of anti-immigration-related policies resulted in a stream of news covering the arrests of immigrants. In the educational context, attacks on diversity, equity, and inclusion (DEI) programs in higher education also led academic institutions to modify their language and programs, if not entirely remove DEI initiatives and positions, following Executive Order 14151, “Ending Radical And Wasteful Government DEI Programs And Preferencing” (The White House, 2025a).

In fact, several institutions of higher education, including Brown University, Columbia University, the University of Pennsylvania, the University of Virginia, among many more, agreed to Trump’s priority funding offer for accepting the compact that effectively threatens academic freedom and intellectual freedom on campus (Uglesbee et al., 2025).

Additionally, the library world was shocked when the Institute of Museum and Library Services (IMLS) was shut down by Executive Order 14217 (“Commencing the Reduction of the Federal Bureaucracy”) in February 2025 (The White House, 2025b), and its future remains uncertain (Landgraf, 2026). In this rapidly changing context, as a new PhD student, Hayley had to consider whether she could continue conducting immigrant-centered research. Most directly, the immediate suspension of the IMLS grant that funded her work meant the potential suspension of her salary. In addition, DEI-related terms and programs were targeted at public institutions, colleges and universities (Kim, 2025). Negeen felt the impact directly when an individual involved with anti-DEI organizations asked for her and her organization to be disciplined as a result of a program that she led.

In 2025, the Trump administration released a list of more than 250 words deemed unacceptable, and “immigrants” is among them (Connelly, 2025). Assigning appropriate, representative keywords to a work is critical, and information science literature has long emphasized this point, from classification to the development of a scholarly identity (Bowker & Star, 1999; Inouye & McAlpine, 2019). Banning words such as “immigrants,” “advocacy,” and “equity” therefore directly impedes not only the discovery of a work but also the production and documentation of knowledge that enriches the scholarship on a topic.

Additionally, being unable to use specific words related to groups of people leads to an inability to discuss the harms that affect those groups. Political controversy can impede scientific research, leading to varying levels of self-censorship, from removing controversial words from research projects to leaving academia for an external position with greater job security (Kempner, 2009). Drawing a direct parallel between the 2025 list (Connelly, 2025) and the seven ‘forbidden words’ Trump banned at the Centers for Disease Control and Prevention (CDC) during his first presidency (“vulnerable, entitlement, diversity, transgender, fetus, evidence-based, and science-based”; Sun & Eilperin, 2017), Kronk and her colleagues analyzed the estimated impact of government-imposed restrictions on scientific language. They found that state and funder requirements to remove specific scientific terms could jeopardize not only research integrity but also the communication of findings (Kronk et al., 2025). Within the field of information, Kate Starbird, a Professor at the University of Washington who specializes in how information circulates in online spaces, has shared her personal account of being the subject of online harassment, her concerns about her own and her team’s safety, and the chilling effect she witnessed as a result of her scholarship on online mis- and disinformation related to the U.S. election (Starbird, 2023). Following the logic used by UC San Diego public health scientist Rebecca Fielding-Miller in an article about forbidden words (Sharma, 2025), if we are unable to use the word “immigrants” in our research, then we are unable to discuss the harms inflicted by the federal government on immigrants.

This move has forced scholars and institutions to reconsider their research directions and programs, leading to anticipatory self-censorship (The Lancet, 2025). This censorship of “‘disapproved subjects’” has led to a chilling effect on academics that has lasted well beyond the implementation of these restricted words (Blinder, 2026).

In this context, we had to take into consideration the changing political landscape affecting the conditions of our scholarly activities while writing about data and immigrants in libraries.

Positionality and privilege

We approached this experience and this subject with considerable privilege, given our positionalities. Hayley is a PhD student at a major research university, and Negeen is an academic librarian with permanent status at a different large research university. Both Hayley and Negeen generate scholarship in some form in order to progress in their careers. Hayley has some protections specific to graduate students at the University of Maryland (University of Maryland, n.d.), and Negeen has protections under Article 61 of the “Collective Bargaining Agreement By and Between the University of Washington and the Service Employees International Union Local 925 for Professional Libraries and Press Employees and Librarians” (2023). However, during the actual writing process, the application and limits of these terms were not clear, as discussed in greater detail below.

In addition, we are both women of color working at public institutions at a time when DEI-related activities in higher education are being scrutinized, and Hayley identifies as an immigrant working in this space at a time when immigrants are being targeted and attacked. We had to ask ourselves questions such as: What are the risks in writing this article? Can we weather those risks given our privileges? Are there risks we cannot see at the moment? Who may be surveilling our work? What is our responsibility given the privileges we do have? Are we overthinking this?

We experienced the tensions of these equally true realities and felt conflicting senses of vulnerability and privilege throughout the process of writing the original article and this one. We felt that one way to bring this conflict to the surface was through further collaborative reflection and writing. These conversations also helped us in our attempts to demystify academic freedom for ourselves and to consider the practical implications of exercising that freedom, given our positionalities.

In addition to the questions we asked above, we also uncovered questions about how academic freedom applies to library workers across the field. Neither of us is faculty, and much of the language we found as we explored the literature on academic freedom related directly to faculty. Leebaw and Logsdon (2020) argue that academic freedom for academic librarians needs further research. According to them, non-faculty academic librarians do not experience the same level of protection as tenured faculty, and one’s financial security and social identity affect one’s perception of that protection. This perceived lack of safety and protection (a result of not being faculty) may contribute to the chilling effect that some librarians experience.

Through our research, we concluded that the state of academic freedom for academic librarians is inconsistent across the field and institution-specific. Negeen’s union’s collective bargaining agreement (a legally binding contract), for instance, ensures academic freedom not only for librarians in the union but also for non-librarian staff in the union:

The University of Washington recognizes Librarians’ and Libraries and Press professionals’ right to academic freedom and the right to examine and communicate ideas by any lawful means, even if such activities should generate hostility or pressure against the Librarians, Professional Libraries and Press employees, or the University (Collective Bargaining Agreement SEIU 925 UW Libraries and Press Union, 2023).

University of California librarians also won academic freedom rights in their collective bargaining agreement (Carrillo, 2019), whereas “academic freedom” is not mentioned in the Northwestern University Library union contract (Collective Bargaining Agreement Between Service Employees International Union, Local No. 73 and Northwestern University, 2023).

The status of academic freedom for PhD students is similarly inconsistent and unclear. While Hayley was unable to find academic freedom protections that directly applied to her, she did find academic freedom protections for University of Maryland School of Law students (and librarians) specifically: “Our commitment to academic freedom extends to all members of the law school community. We recognize the need for academic freedom for students and teachers, in their, at times overlapping, roles as scholars, educators, clinicians, administrators and librarians” (University of Maryland Francis King Carey School of Law, n.d.).

Chick (2025) finds that PhD students felt their academic freedom was under attack, especially when their topics were related to DEI issues. These students were placed in the unique position of choosing between their commitment to equity and professional and academic risks. Chick further observes doctoral students’ self-censorship as a consequence of the chilling effect of external threats to DEI initiatives and academic freedom in the context of educational innovation at Hispanic-serving institutions. Doerfler et al. (2021) also documented the experiences of scholars who have faced Internet-facilitated harassment, including doxxing, Zoom-bombing, and threatened funding cuts. Given the abundance of evidence in the literature, as well as our own anecdotal experiences, we knew we had to be thoughtful in how we made decisions.

Problem statement and significance

Intellectual freedom and censorship under political and institutional pressure have a long history in LIS scholarship and practice. Yet, during our publication process, we found limited guidance on how to write academically about a topic that might be perceived as politically charged.

While Negeen has the protections in her union’s Collective Bargaining Agreement mentioned above (Collective Bargaining Agreement SEIU 925 UW Libraries and Press Union, 2023), the agreement’s academic freedom article also suggests some limitations to those protections: “The expression of dissent and the attempt to produce change may not be carried out in ways that…disrupt the work of other University personnel.” What counts as “disruptive” is not defined. Negeen was also unable to find specific guidance about how the university responds to threats to academic freedom, whether from within the university or from outside it, such as from the general public or the government. This lack of clarity led to uncertainty about the boundaries of her academic freedom.

For Hayley, understanding the boundaries of her academic freedom was both straightforward and ambiguous. While the University of Maryland grants academic freedom to all campus members, including students, as part of its intellectual freedom principle (University of Maryland, n.d.), there was no dedicated institutional document detailing how it applies to doctoral students, who may serve as researchers, students, and teachers depending on their funding and admission package, or the extent of protection and representation the university provides.

Similarly, while the AAUP’s 1940 Statement of Principles on Academic Freedom and Tenure could be interpreted as extending protection to students in their learning, its scope and application remain unclear (American Association of University Professors, 1940). This lack of clarity and practical guidance is further reflected in the absence of official institutional documents on this topic, whether in PhD orientation materials or in doctoral program policy at the Graduate School level. This leaves doctoral students uncertain as they exercise their intellectual freedom, navigate risks, and complete their doctoral programs.

Specific areas in need of further guidance include selecting publication venues, determining project scope, and deciding on author visibility, as well as clarifying the extent of author protection and academic freedom afforded to LIS PhD students and academic librarians; we had difficulty locating detailed accounts of these issues in existing LIS literature. In response to this gap, and in the spirit of peer-sharing, we describe the decision-making process that shaped our topic selection, contextual scope, and visibility as authors, including how we selected venues and considered our safety (e.g., understanding potential threats, developing a clearer sense of how institutional policies could be helpful or harmful) when advocating for immigrant rights, particularly from the perspective of scholars in academic but non-faculty positions like ours.

Our aim is to help other library scholars facing similar challenges think through the implications of exercising their freedoms and make informed decisions based on their own professional and personal circumstances. Our self-reflexive account of the immigrant-centered writing process not only contributes to the broader scholarship on academic freedom but also adds to the growing body of accounts of library workers’ lived experiences navigating challenging political environments. In the following sections, we outline our publication experience, discuss issues of academic freedom and intellectual safety (especially in relation to intersectional identities and institutional privileges), and argue that collaborative writing is one form of resistance.

Context and timelines

The anti-immigrant policies in effect while we wrote the Library Trends article prompted our discussions about academic freedom, which form the basis of this article. We submitted our Library Trends proposal in mid-December 2024, before Donald Trump took office; the proposal broadly focused on data literacy programming for immigrants in public libraries, with some focus on privacy. We received notice of its acceptance on January 6, 2025, during the transition between the 2024 presidential election and the inauguration on January 20, 2025. By the time we began writing, Trump had been inaugurated, and on his first day in office he signed about 34 immigration-related policies: 27 revived from his first term and seven new ones (Roesenberg et al., 2025).

We recognized that the political and social context around immigration had shifted and that our initial conception of an article about data literacy programming for immigrants through public libraries, with only a minimal focus on privacy, was no longer sufficient. Instead, we needed to expand the focus to include advocacy, as well as personal data literacy as the foundation of privacy literacy against surveillance. For example, rather than conceptualizing data literacy as a skill for better understanding areas such as financial literacy and entrepreneurship, we recognized a more urgent need to make information about how personal data could be used against immigrants more accessible, such as through programming that allows users to think through the implications of location sharing in mobile apps. This also entailed a shift from thinking only about programming to considering whether and how advocacy plays a role in this work. We discussed the need to redirect some attention to situating our recommendations in the history of advocacy in libraries and in the increased surveillance of scholars and the general public, reflecting historical and professional parallels in the rapidly changing political environment.

Our writing process spanned the entirety of 2025, with revisions and source updates responding to news and policy changes. The timeline of current events was closely intertwined with our publication timeline, prompting us to reflect on the challenges of writing about immigrant advocacy in academia amid a divisive political context. The article was published in February 2026.

Emerging concerns: Academic freedom and safety

Throughout the writing process, we confronted emerging concerns arising from the shifting political and social climate. These concerns forced us to face the public nature of our work, which instilled a conflicting sense of both vulnerability and renewed commitment to LIS values, and led us to consider academic writing, its consequences, and its associated risks. Banning specific language, pressuring institutions, and threatening personal safety produce a chilling effect in academia that is both physical and psychological. Schauer (1978) discusses threats and safety, vulnerability, and decisions about when and how to speak about a topic. Although the chilling effect was initially conceived as a legal framework, Schauer captures the affective factors related to legal uncertainty, especially in relation to the exercise of First Amendment rights. In this sense, the chilling effect is the avoidance of exercising one’s rightful freedom of speech due to fear of, or uncertainty about, potential legal consequences. Despite our best efforts, we often came up against the chilling effect and had to be intentional about not censoring ourselves in our writing and conversations.

Challenges to academic freedom are not new, and these recurring challenges share patterns that shape and constrain the conditions of academic work. Nelson (2009) outlines several threats to academic freedom, including authoritarian administration, unwarranted research oversight, political intolerance, legal threats, and claims of financial crisis. Many of these threats describe the current sociopolitical landscape under the Trump administration, as discussed in this article. Threats against academics and their scholarly activities can be so severe that dedicated organizations, such as the Scholars at Risk Network, support threatened scholars globally (Adebayo, 2022).

One example in the United States is that of Ricardo Dominguez, Professor of New Media, Performance Art, and a Principal Investigator at CALIT2 at the University of California, San Diego. Dominguez found himself at the center of the immigration and academic freedom debate for creating the Transborder Immigrant Tool, a mobile application designed to help migrants find water caches and to access and read poems. Dominguez was investigated for a potential misuse of funds and was under threat of termination from his position following pressure from three Republican members of Congress (Dominguez, 2014). The investigation found no evidence of a misuse of funds, and he did not lose his job. However, this example illustrates how a work of academic and artistic expression can be used to try to criminalize a scholar, and how political and ideological differences (specifically anti-immigrant sentiments) can lead to attacks on scholars (Warren & Warren, 2011). These types of attacks on academic freedom and scholars’ livelihoods force scholars to balance continuing their research against real threats to their individual and professional safety.

Decision-making process

To make informed decisions, we engaged in a collaborative assessment of our situation that involved discussions about scope and approach to writing, consultation with the editorial team and mentors, and regular team conversations to check in about our concerns.

Project scope and capturing the moment

As we worked on our article, we encountered a significant issue: a growing number of news reports related to our topic, which raised concerns about academic freedom and intellectual safety. To begin, we had to decide how extensively to document the current moment. Situating our inquiry in a still-unfolding context forced us to consider not only how current events connected to our arguments for personal data literacy programming in public libraries, but also how much detail to give about the political conditions that continue the thread of advocacy in the history of libraries. Ultimately, we decided that we could not discuss non-citizen immigrants and personal data without also documenting current events. The decision to capture the current moment led to a discussion about how to cite our sources.

This led to more questions. The year 2025 was marked by political, legal, and policy shifts. Many cases in the news were still developing, and we read updates as they progressed. It was difficult to reach a point where we felt we could stop updating our literature review. We had cited cases that were, as of that moment, unresolved and still shifting. While the purpose was to situate our article within our specific context, it was hard to resist waiting for a closure that we knew would never come. Ultimately, remembering the purpose of providing this background information (and quickly approaching deadlines) allowed us to pause.

Additionally, we believed it was important to capture the moment we found ourselves in through a less-than-traditional literature review and background that included newspaper articles and grey literature (such as reports from government agencies, policies from corporations, and presidential Executive Orders). We wanted to provide the context in which we were writing our article and to help preserve this snapshot in the scholarly record. Because the news events we referred to, and the sources documenting them, were likely to change, we employed a few strategies to maintain the rigor of the publication. First, we stated in the narrative the specific time period in which the writing took place, along with an acknowledgment that the landscape would likely look meaningfully different even before the final publication date. We also captured snapshots of many source webpages using the Wayback Machine in case they were modified or taken down. Our hope was to maintain an accurate description of the context at the time of writing amid increasing instances of data and information erasure.

Risk assessment

Writing about advocacy and social justice, and exercising intellectual freedom in doing so, is one of the strengths of our field. Given the challenges facing scholars at the time of writing, we had to consider the boundaries of exercising our intellectual freedom within our academic and institutional settings, especially given the public nature of our work. Other scholars have also felt compelled to change the direction of their work because of the threat of damaging emails or online bullying. Attacks following scholarly activities are also not unheard of in libraries.

In 2019, Professor Nicole Cooke and librarian Amy Koester hosted a conference called Defeating Bullies and Trolls in the Library: Developing Strategies to Protect our Rights and Personhood at the Skokie Public Library in Illinois. The conference featured several library scholars, including Nicole Cooke, Stacy Collins, Kristin Lansdown, Amy Koester, Dianae Foote, Emily Knox, Jamie Naidoo, and Aimee Strittmatter, who spoke about bullying and harassment at both personal and institutional levels (Peet, 2019). This conference built on a panel discussion titled “Bullying, Trolling, and Doxxing, Oh My! Protecting our Advocacy and Public Discourse around Diversity and Social Justice,” which took place at the 2018 American Library Association Annual Conference in New Orleans and discussed library workers’ experiences of being bullied. On that panel, Cooke and Miriam Sweeney specifically shared their experience of being harassed after Campus Reform (a conservative news source that covers higher education) published an article about their research on microaggressions in libraries (Peet, 2018).

Most recently, Oltmann and Dowell (2025) released a piece analyzing Professor Watchlist, a project created by the conservative organization Turning Point USA that surveils and “exposes” faculty for alleged discrimination against conservative students (Professor Watchlist, n.d.). They interviewed some faculty on the list about its impact on their work. While some interviewees reported that their inclusion in Professor Watchlist was positive in the sense that they received support from their peers and institutions, they also noted concerns about its impact on academic freedom.

These examples show the multilayered tensions involved in this work. While researchers engage in important work, they also have to weigh it against the potential for personal harm. Ideally, institutions would mitigate this risk; yet, as Cooke and Sweeney shared at the 2018 panel discussion, institutional responses may be delayed or may lack an established protocol that provides adequate protection, leaving scholars to handle and absorb the risk themselves (Peet, 2018).

We experienced what we see as a fundamental tension between academic freedom and safety. That is, although our institutions granted us academic freedom, our work would be publicly available, so we were not necessarily guaranteed academic safety. Academic institutions, like many organizations, must also weigh risks and manage their public reputations. We grew concerned that threats to our reputations, our institutions’ reputations, or our funding from external political forces could adversely affect our sense of academic freedom. These multilayered concerns stemmed from knowing how to identify threats to academic freedom but not knowing enough about how to counter them. To demystify the source of our uncertainty, we consulted multiple members of our support network, beginning with the editorial team.

Editorial support

It takes a team to make an informed decision. Once we decided to situate our inquiry in the context of ongoing advocacy work in libraries, we needed to make sure our intention aligned with the overall purpose and theme of the volume, “Data Literacy: Navigating the Shift from Hype to Reality” (Chiewphasa, 2026). Additionally, Library Trends is published by Johns Hopkins University Press (Hopkins Press, n.d.), and given the $800 million in federal funding cuts Johns Hopkins University was facing in March 2025 (Daniels et al., 2025), we also felt the need to check in with our editor, Ben Chiewphasa. This led to an honest and open dialogue about our reoriented research direction, as well as our concerns about the topic itself, our visibility, keywords, and potential challenges to our work. These conversations gave us not only a sense of clarity but also support and a sense of community with other scholars.

Lessons learned

Throughout this experience, we learned lessons that we will carry forward and that we hope will make a meaningful contribution to conversations on academic freedom and safety for library workers and scholars.

Practical resources (and their limitations)

Having practical tools can significantly reduce feelings of uncertainty and increase a sense of preparedness for potential threats. While writing both the Library Trends article and this article, we discovered that some academic and non-academic institutions have provided practical guidelines to protect researchers’ safety.

In recognition of the common tactics of intimidation and harassment that some researchers face in their public-facing work, the Researcher Support Consortium created a series of resources to help researchers navigate potential risks and responses, and to equip them with coping strategies when attacked (Researcher Support Consortium, 2024b). The recommendations range from removing personal information from public websites, including institutional directories, to password-protecting online spaces and preparing an organizational statement if necessary. The guide repeatedly emphasizes the importance of mitigating impacts on one’s psychological well-being and physical safety, and recommends seeking support in numbers and communicating with a support network. Along with these recommendations, the group also provides institutions with toolkits to protect and support their researchers, including step-by-step guidelines, a sample institutional policy, and certificates of confidentiality (Researcher Support Consortium, 2024a).

Several universities also provide researcher safety guidelines directly. The University of Colorado Boulder provides an extensive list of guidelines under Scholarship & Safety, including a checklist organized by professional role (e.g., researcher, administrator) for use in the event of an incident (Academic Affairs at the University of Colorado Boulder, n.d.). Similarly, York St. John University explicitly addresses “researcher vulnerability” and provides institutional guidance on addressing the physical and psychological vulnerabilities associated with research (York St. John University, n.d.). Both examples of institutional guidance address multiple types of research-related risk: not only online harassment and physical harm but also psychological risks, such as those arising from prolonged engagement with sensitive research topics and the potential for vicarious trauma.

While these documents provide a starting point for protecting oneself against potential attacks, they are few in number. The recommendations, while calling for institutional support, primarily ask individual researchers to take protective measures, thereby adding psychological and, at times, financial burdens (such as subscribing to personal data removal services). Additionally, these guidelines may not cover all categories of scholars at an institution (e.g., PhD students, teaching faculty, staff scientists, academic librarians). We recognize the value of these tools while acknowledging their limitations, and we suggest that institutions take steps to develop more comprehensive safety plans for all types of scholars.

Power of collaborative writing

One of the most important lessons we learned from writing about immigrants, data literacy, and surveillance in 2025 is that collaboration and care are critical to our decision-making process as well as to our well-being as we work through our concerns. In addition to the typical challenges we expected when writing on our topic, we encountered new challenges through renewed anti-immigrant sentiments and threats to academic freedom. These issues affected our psychological states more than we had initially anticipated. However, through our collaborative writing process and weekly check-ins, we supported one another and worked through our concerns collectively while critically interrogating the unfolding events. Exercising our library values and putting them into practice felt empowering and generative during a time that was otherwise demoralizing.

Additionally, we were grateful for the support we received from our editorial team. We were able to be open with them about our concerns and fears, and we had opportunities for low-stakes exploration of various options to minimize our exposure. We also received institutional support from our advisors and supervisors. These were key ingredients in our decision-making process, and we acknowledge that they were privileges not afforded to all academics and advocates writing in this space.

One of the most important aspects of our writing process was acknowledging the moment in history in the article itself. This helped ground us and provide context to our conversations about how to move forward with the article. Furthermore, it was a reminder that all articles are written under a particular context, whether stated or unstated, and we will consider including the social and political context in future articles we write.

Another lesson, not unique to us but one that each scholar must weigh individually, concerned whether to use our real names and institutions in our publication. We had several discussions about this issue, especially after one of us was targeted at work. Ultimately, we learned a lesson that other scholars may have already learned: we could be either completely anonymous or very visible.

Conclusion

We hope that sharing our experience and situating it in the wider context of academic freedom and intellectual freedom can contribute to discussions about the protections that library workers have (and lack) when exercising their freedoms. We felt that it was important to discuss not only the abstract, theoretical issues related to this topic, but also our lived experiences.

We are grateful to all of the scholars working on academic freedom, particularly the work of Leebaw and Logsdon (2020) and Nicole Cooke, Stacy Collins, Kristin Lansdown, Amy Koester, Dianae Foote, Emily Knox, Jamie Naidoo, and Aimee Strittmatter (Peet, 2019). One area for future research is updating the data on the state of academic freedom in libraries and for PhD students today, given the rise in new union contracts that we found throughout our research. Ultimately, a more systematic analysis of union contracts and institutional policies across libraries and doctoral programs will better prepare library workers and PhD students (and graduate students more broadly) to fight for the academic freedoms that protect faculty (while still recognizing their limitations).

One of the many inevitable consequences of writing about a topic that was personally and professionally challenging was that we experienced a range of emotions as we wrote: fear of being targeted, anger at the persecution of immigrants, and occasional moments of despair. We recognized that these emotions could lead to self-censorship and to watering down the realities of that moment in history. This led to regular conversations about how we were framing current events and about the urgency of our tone throughout the article, which needed to be balanced with practical suggestions for readers. Neither of us had experienced this kind of negotiation in writing an academic article, and our discussions, along with the support of our editor and peer reviewers, were essential in determining our next steps.

Of course, we recognize that while powerful, writing is not a substitute for other actions. However, our writing partnership has helped us push against feelings of fear and fatalism, and it has been a way to connect with others who also care about these issues. We hope that readers are inspired to find community through writing and use it as one tool (even if a small tool) of resistance.


Acknowledgments

We are grateful to our editor, Pam Lach, and our reviewers, Brea McQueen and Shawn(ta) Smith-Cruz, for their support in guiding us through this process. At a time when we are all experiencing pressures from many directions, their generosity, energy, and care are even more meaningful. This article is better having been shaped by their wisdom and thoughtfulness. We are also thankful for our partnership and to LIS scholars for motivating us to keep on writing.


References

Academic Affairs at the University of Colorado Boulder. (n.d.). Scholarship & Safety: A Guide for CU Boulder. Retrieved February 14, 2026, from https://www.colorado.edu/academicaffairs/about/academic-freedom/scholarship-safety-guide-cu-boulder

Adebayo, K. O. (2022). The state of academic (un)freedom and scholar rescue programmes: A contemporary and critical overview. Third World Quarterly, 43(8), 1817–1836. https://doi.org/10.1080/01436597.2022.2074829

American Association of University Professors. (1940). 1940 Statement of Principles on Academic Freedom and Tenure with 1970 Interpretive Comments. https://www.aaup.org/reports-publications/aaup-policies-reports/policy-statements/1940-statement-principles-academic

American Association of University Professors. (1972). Joint Statement on Faculty Status of College and University Librarians. https://www.aaup.org/reports-publications/aaup-policies-reports/policy-statements/joint-statement-faculty-status-college

Association of College & Research Libraries. (2011). ACRL Standards for Academic Librarians Without Faculty Status. https://www.ala.org/acrl/standards/guidelinesacademic

Blinder, A. (2026, March 16). Professors Are Changing What They Teach, Even Far From Trump’s Gaze. The New York Times. https://www.nytimes.com/2026/03/16/us/professors-change-teaching-trump.html

Bowker, G. C., & Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences. The MIT Press. https://doi.org/10.7551/mitpress/6352.001.0001

Carrillo, A. (2019, April 9). UC librarians conclude negotiations of salary increases and academic freedom protections. Daily Bruin. https://dailybruin.com/2019/04/09/uc-librarians-conclude-negotiations-of-salary-increases-and-academic-freedom-protections/

Chick, J. C. (2025). Navigating Academic Freedom and Student Concerns in Doctoral Education at Hispanic-Serving Institutions: A Faculty Perspective. Education Sciences, 15(10). https://doi.org/10.3390/educsci15101324

Chiewphasa, B. (Ed.). (2026). Library Trends, 74(3). https://muse.jhu.edu/issue/56412

Collective Bargaining Agreement Between Service Employees International Union, Local No. 73 and Northwestern University. (2023). https://seiu73.org/wp-content/uploads/2023/09/2023-2026.pdf

Collective Bargaining Agreement By and Between the University of Washington and the Service Employees International Union Local 925 for Professional Libraries and Press Employees and Librarians. (2023). https://www.seiu925.org/wp-content/uploads/2023/08/SEIU-925-UW-Libraries-CBA-2023-2026-updated-and-re-signed-for-corrections-July-2023.pdf

Connelly, E. (n.d.). Federal Government’s Growing Banned Words List Is Chilling Act of Censorship. PEN America. Retrieved February 16, 2026, from https://pen.org/banned-words-list/

Daniels, R., Jayawardhana, R., & Heller, L. (2025, June 2). Updates on federal actions and budget planning. Office of the President. https://president.jhu.edu/messages/2025/06/02/updates-on-federal-actions-and-budget-planning/

Doerfler, P., Forte, A., De Cristofaro, E., Stringhini, G., Blackburn, J., & McCoy, D. (2021). “I’m a Professor, which isn’t usually a dangerous job”: Internet-facilitated Harassment and Its Impact on Researchers. Proc. ACM Hum.-Comput. Interact., 5(CSCW2), 341:1-341:32. https://doi.org/10.1145/3476082

Dominguez, R. (2014). UCOP versus R. Dominguez: The FBI Interview. In P. Chatterjee & S. Maira (Eds.), The Imperial University (pp. 343–354). University of Minnesota Press. https://doi.org/10.5749/minnesota/9780816680894.003.0015

Education Reform Act 1988, Chapter 40, United Kingdom (1988). https://www.legislation.gov.uk/ukpga/1988/40/contents

University of Maryland. (n.d.). Freedom of Speech on Campus. Office of General Counsel. Retrieved March 30, 2026, from https://ogc.umd.edu/freedom-of-speech

Garnar, M., & Magi, T. J. (Eds.) (with American Library Association Office for Intellectual Freedom). (2022). Intellectual freedom manual (Tenth edition). ALA Editions.

Hopkins Press. (n.d.). Library Trends. Retrieved February 14, 2026, from https://www.press.jhu.edu/journals/library-trends

Inouye, K., & McAlpine, L. (2019). Developing Academic Identity: A Review of the Literature on Doctoral Writing and Feedback. International Journal of Doctoral Studies, 14, 1–31.

Kempner, J. (2008). The Chilling Effect: How Do Researchers React to Controversy? PLOS Medicine, 5(11), e222. https://doi.org/10.1371/journal.pmed.0050222

Kim, J. (2025, March 14). Over 50 universities are under investigation as part of Trump’s anti-DEI crackdown. NPR. https://www.npr.org/2025/03/14/g-s1-53831/dei-universities-education-department-investigation

Kronk, C., Keyes, O., & Marathe, M. (2025). Towards an estimate of the impact of censorship on biomedical literature. Journal of the American Medical Informatics Association, 32(7), 1199–1205. https://doi.org/10.1093/jamia/ocaf089

Landgraf, G. (2026, January 23). One Year of the Trump Administration. American Libraries Magazine. https://americanlibrariesmagazine.org/2026/01/23/one-year-of-the-trump-administration/

Leebaw, D., & Logsdon, A. (2020). Power and Status (and Lack Thereof) in Academe: Academic Freedom and Academic Librarians. In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2020/power-and-status-and-lack-thereof-in-academe/

Nelson, C. (2009). The Fate of Academic Freedom. South Atlantic Quarterly, 108(4), 689–699. https://doi.org/10.1215/00382876-2009-014

Oltmann, S., & Dowell, M. (2025). A Badge of Honor? Ongoing Threats to Academic Freedom. Journal of Intellectual Freedom & Privacy, 10(1), 9–22.

Park, H., & Aghassibake, N. (2026). Empowering Immigrant Library Users Through Personal Data Literacy Programming in Public Libraries. Library Trends, 74(3), 405–429. https://doi.org/10.1353/lib.2026.a983006

Peet, L. (2018, July 6). Protecting Library Workers’ Discourse around Social Justice | ALA Annual 2018. Library Journal. https://www.libraryjournal.com/story/protecting-library-workers-discourse-around-social-justice-ala18

Peet, L. (2019, April 25). Defeating Bullies and Trolls in the Library Conference Examines Harassment, Doxxing. Library Journal. https://www.libraryjournal.com/story/defeating-bullies-and-trolls-in-the-library-conference-examines-harassment-doxxing

Penney, J. (2022). Understanding Chilling Effects. Minnesota Law Review, 106(3), 1451. https://doi.org/10.24926/265535.4359

Professor Watchlist. (n.d.). About us. Retrieved February 13, 2026, from https://professorwatchlist.org/aboutus/

Researcher Support Consortium. (2024a, March 4). Institutions. https://researchersupport.org/institutions/

Researcher Support Consortium. (2024b, March 8). For Researchers: Mitigating Risk and Coping During an Attack. https://researchersupport.org/researchers-mitigate-risk-and-cope-during-an-attack/

Schauer, F. (1978). Fear, Risk and the First Amendment: Unraveling the Chilling Effect. Boston University Law Review, 58, 685–732. https://scholarship.law.wm.edu/facpubs/879

Sharma, A. (2025, February 25). Federal list of forbidden words may jeopardize research at UCSD. KPBS Public Media. https://www.kpbs.org/news/economy/2025/02/07/federal-list-of-forbidden-words-may-jeopardize-research-at-ucsd

Starbird, K. (2023). A Battle for Better Information. Lawfare. https://www.lawfaremedia.org/article/a-battle-for-better-information

Sun, L. H., & Eilperin, J. (2017, December 18). CDC gets list of forbidden words: Fetus, transgender, diversity. The Washington Post. https://www.washingtonpost.com/national/health-science/cdc-gets-list-of-forbidden-words-fetus-transgender-diversity/2017/12/15/f503837a-e1cf-11e7-89e8-edec16379010_story.html

The Lancet. (2025). American chaos: Standing up for health and medicine. The Lancet, 405(10477), 439. https://doi.org/10.1016/S0140-6736(25)00237-5

The White House. (2025a, January 29). Ending Radical and Wasteful Government DEI Programs and Preferencing. Federal Register. https://www.federalregister.gov/documents/2025/01/29/2025-01953/ending-radical-and-wasteful-government-dei-programs-and-preferencing

The White House. (2025b, February 25). Commencing the Reduction of the Federal Bureaucracy. Federal Register. https://www.federalregister.gov/documents/2025/02/25/2025-03133/commencing-the-reduction-of-the-federal-bureaucracy

Unglesbee, B., Spitalniak, L., & Schwartz, N. (2025, October 22). Tracking the Trump administration’s deals with colleges. Higher Ed Dive. https://www.highereddive.com/news/tracking-the-trump-administrations-deals-with-colleges/803434/

University of Maryland. (n.d.). Statement of Free Speech Values. University Policies | UMD. Retrieved February 16, 2026, from https://policies.umd.edu/statement-free-speech-values

University of Maryland Francis King Carey School of Law. (n.d.). Academic Standards & Honor Code. Retrieved April 28, 2026, from https://www.law.umaryland.edu/all-policies/academic-standards–honor-code/

Warren, L., & Warren, S. (2011). The Art of Crossing Borders: Migrant Rights and Academic Freedom. Boom, 1(4), 26–30. https://doi.org/10.1525/boom.2011.1.4.26

York St John University. (n.d.). Researcher vulnerability. Retrieved February 14, 2026, from http://www.yorksj.ac.uk/policies-and-documents/research/ethics-and-integrity/roles-and-responsibilities/researcher-vulnerability/

by Hayley Park at May 13, 2026 04:17 PM

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2026-05-13: Awarded Grant for AI-Driven Accessible Telehealth Support for Unhoused Populations


I am pleased to share that I have been awarded an Internal Research and Development Grant from the Office of Enterprise Research and Innovation at Old Dominion University in the amount of $50,000, beginning July 1, 2026. The project, titled “Trust-Centered AI Avatars for Accessible Telehealth Support Among Unhoused Populations,” is co-led with Drs. Ginger Watson and Tina Gustin.

This project aims to improve healthcare access for vulnerable and underserved populations by leveraging, enhancing, and evaluating TeleHSupport, an NSF-funded telehealth kiosk, adding an audio interface and an AI-powered interactive avatar to improve its usability. TeleHSupport is a secure, self-service kiosk that enables individuals with limited access to traditional healthcare to receive reliable AI-generated health information and guidance. It delivers safe, evidence-based guidance on chronic disease through text-based conversational AI, with clinically validated content curated by advanced practice nursing faculty. Individuals select symptoms through step-by-step prompts, which are processed using a retrieval-augmented generation (RAG) approach to assess symptom severity and likely conditions. The kiosk provides consistent guidance while prioritizing patient safety.

Ongoing deployment and focus group studies have identified low general and health literacy, along with limited trust, as barriers to adoption and effective use. To address these challenges, the proposed work integrates voice recognition with a human-AI avatar agent within an agentic AI framework to support multimodal, conversational interaction, reduce reliance on text, and foster trust, engagement, and comprehension. Leveraging generative AI, large language models (LLMs), RAG, human-computer interaction (HCI), and modeling and simulation, the project will design, implement, and evaluate avatar-mediated interactions tailored to the needs of unhoused populations.
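To make the retrieval-augmented generation step a little more concrete, here is a minimal Python sketch of its retrieval half, under loudly stated assumptions: the Snippet type, the retrieve and build_prompt helpers, and the sample content are all hypothetical and are not TeleHSupport's code, and simple keyword overlap stands in for the embedding search a real RAG system would more likely use. The idea it shows is only that the model is handed vetted, clinician-curated text to ground its answer instead of answering from memory.

```python
# Hypothetical sketch of the retrieval step in a RAG-style pipeline.
# Not TeleHSupport's implementation; keyword overlap replaces embedding search.
from dataclasses import dataclass

PUNCT = ".,;:!?"


@dataclass
class Snippet:
    snippet_id: str  # hypothetical ID for a curated guidance passage
    text: str        # clinician-curated guidance text (placeholder content)


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def retrieve(symptoms: list[str], corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Return up to k snippets sharing the most terms with the selected symptoms.

    Snippets with zero overlap are dropped; ties keep corpus order (stable sort).
    """
    query = tokenize(" ".join(symptoms))
    scored = [(len(query & tokenize(s.text)), s) for s in corpus]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(symptoms: list[str], context: list[Snippet]) -> str:
    """Assemble a prompt that restricts the model to the retrieved guidance."""
    ctx = "\n".join(f"[{s.snippet_id}] {s.text}" for s in context)
    return (
        "Answer using only the vetted guidance below. "
        "If it does not cover the symptoms, advise contacting a clinician.\n"
        f"Guidance:\n{ctx}\n"
        f"Reported symptoms: {', '.join(symptoms)}"
    )
```

The prompt string would then go to whatever language model the system uses; keeping retrieval separate from generation is what lets curated content, rather than the model, decide what guidance is eligible to appear.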

The work builds directly on my prior NSF-funded research and represents a critical next step toward scalable, trustworthy AI-enabled health support systems. The proposed study will provide the usability and adoption data needed for larger-scale deployment within and beyond Hampton Roads.

Faryaneh Poursardar (@Faryane)

*Picture: Patient signed the consent form.

by Faryaneh Poursardar (noreply@blogger.com) at May 13, 2026 01:49 PM

Digital Library Federation

Community Voting Now Open for the 2026 Virtual DLF Forum

Community voting is now open for the 2026 Virtual DLF Forum! Community voting lets DLF and the Program Committee know which proposals resonate with our community. Results are weighed when developing the final event programs. Anyone may participate, and you may vote for as many proposals as you’d like, but only once for each proposal.

How to participate in DLF Community Voting:

  1. Get a feel for your favorites by reading proposal abstracts, organized by event in Airtable or in our accessible Google Sheet.
  2. Navigate to the voting form to cast your votes.
  3. You’ll be asked to enter your email address. Email will only be used to ensure that each person votes just once, then will be de-coupled from the votes themselves.
  4. Click the +Add button under each event name to select your favorites for each event.
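For the curious, one way a "one vote per person, then decouple" process could work is sketched below in Python. This is a hypothetical illustration, not DLF's actual form logic: it assumes each submission arrives as an email address plus a list of proposal IDs, keeps only the first ballot per email (compared case-insensitively), counts each proposal at most once per ballot, and tallies only the anonymized ballots.

```python
# Hypothetical sketch: deduplicate ballots by email, then tally without emails.
# Not DLF's implementation; "first ballot wins" is an assumption.
from collections import Counter


def tally_votes(submissions: list[tuple[str, list[str]]]) -> Counter:
    """submissions: (email, [proposal ids]) pairs in the order received."""
    seen_emails: set[str] = set()
    anonymous_ballots: list[set[str]] = []
    for email, choices in submissions:
        key = email.strip().lower()
        if key in seen_emails:
            continue  # repeat submission from the same address is ignored
        seen_emails.add(key)
        anonymous_ballots.append(set(choices))  # set(): each proposal counted once
    # Only the anonymous ballots feed the tally; no email is kept with the counts.
    tally: Counter = Counter()
    for ballot in anonymous_ballots:
        tally.update(ballot)
    return tally
```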

Community voting is open through June 1.

To learn more about our events, visit our event website.

If you have questions, please let us know at forum@diglib.org. Thanks!

-Team DLF

The post Community Voting Now Open for the 2026 Virtual DLF Forum appeared first on DLF.

by swillis at May 13, 2026 12:00 PM

May 11, 2026

Open Knowledge Foundation

How can we build a responsible climate chatbot?

In one of our very first AI Learning Labs projects, we are teaming up to explore how to design a chatbot that draws on reliable science and systemic social analysis.

The post How can we build a responsible climate chatbot? first appeared on Open Knowledge Blog.

by Solana Larsen at May 11, 2026 07:03 PM

Xe Iaso

I'm really frustrated that GitLab is doing layoffs

GitLab announced layoffs today. They don't state how many people are affected, but honestly I find this really frustrating for several reasons:

  1. This is the one time where they could have won by doing relatively nothing. GitHub is having big outages on a daily cadence. All they have to do is market themselves as "we're the stable one" and maybe add tooling to run your existing GitHub Actions in GitLab to make the transition easier. They could have won so hard it's not even funny because GitLab makes it trivial to host it yourself.
  2. This is yet another case of "the stock price has gone down but we don't want to look bad to investors so we'll say that AI is going to help us more". I'm increasingly skeptical of this claim, but it's what makes the company look good to the people with the money sooo...
  3. They claim that one of their main goals is "Speed with Quality". Usually this is an "of two, pick one" type of scenario. I shudder to think what may happen when GitLab turns into a feature factory powered by something along the lines of Protos.

Maybe GitLab did need to trim the fat, maybe they will come out of this stronger, but damn I just can't help but think about a world where they could have won without AI and just by being more stable than GitHub.

Numa is smug
Numa

One small problem with that: what you are suggesting makes sense. We live in clown world with clown world logic. Why would we be allowed to have things that make sense in clown world?

Also yes, I do know that clowning is actually a very difficult art to pull off correctly. Humor is one of the most difficult things on the planet because if you do it wrong you offend people. People that are offended generally aren't people that are laughing. As a character that largely amounts to being a jaded contrarian comedian, this is something that comes up a lot when planning what I say.

Also maybe I'm just oversensitive to it at this point, but the layoff announcement really reads like Claude Opus output. What a fuckin' world.

Aoi is coffee
Aoi

I don't want to live on this planet anymore.

Apropos of nothing, I'm really enjoying my experimentation with Tangled. More to come soon when I have more to say.

May 11, 2026 12:00 AM

Library | Ruth Kitchin Tillman

When Everything You Have Looks Like OpenRefine

This is not an OpenRefine tutorial. As I was trying to explain the process I took to read and code my sabbatical research, I realized that it could be best described as getting galaxy brained with OpenRefine and pandoc. I thought it might make an interesting blog post – less as a how-to or advice and more as an example of what real-world problem-solving looks like in libraries, or at least in my life. The alternative title could be “What It’s Like Inside My Weird Brain.”

The Problem

I got 267 responses to my 2024 survey (if you’re wondering why outputs haven’t yet been published, that is one contributing factor). Because I was trying to understand morale and migration, I’d allowed for free text responses along with controlled value fields. And, if you’re not aware, people have a lot of feelings about migrations. As a researcher, this is great!

These long textual responses were a great way of developing a fuller understanding of why a person answered the way they did. But when it came to assessing the data, this meant I would need to code all 267 responses, or at least review them to see whether anything could be coded. Otherwise, I would be losing a ton of information in the statistical portions of the analysis. And reading these in a spreadsheet client sounded miserable. I experimented with importing the sheet into NVivo, which seemed like a natural fit for coding, but didn’t feel I could properly map in my data. That might well be a skill issue; I’ve only used NVivo for interview transcripts, which are a completely different thing.

The Approach

What I needed was to be able to turn each response into its own page. I’d set up the output to include each question so that I had context for each answer. For example:

Which ILS did you previously use? Sirsi Dynix Symphony

My first thought: “oh, I could do this with Python.” The original Qualtrics export was in CSV. I’ve processed CSVs in Python for ages. I could output each response in markdown and use pandoc to transform them into a PDF. I then planned to read that PDF with my ReMarkable and take notes both on the sheet and in a running document.
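For comparison, the Python route described above might look something like the minimal sketch below. It is not the script from this project: it assumes a CSV with a ResponseId column and question columns named like Q4 and Q13, the QUESTIONS mapping and responses_to_markdown function are hypothetical, and a real Qualtrics export may include extra header rows beneath the column IDs that would need to be skipped first.

```python
# Hypothetical sketch: one Markdown section per survey response.
# Assumes a header row with ResponseId and question IDs (Q4, Q13, ...).
import csv
import io

# Illustrative subset of the question key (column ID -> question text).
QUESTIONS = {
    "Q4": "How many years have you worked in libraries?",
    "Q13": "Which ILS did you previously use?",
}


def responses_to_markdown(csv_text: str) -> str:
    """Render '## ResponseId' plus one '**Question** answer' line per question."""
    sections = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = [f"## {row['ResponseId']}"]
        for col, question in QUESTIONS.items():
            answer = (row.get(col) or "").strip()  # missing/empty -> ""
            lines.append(f"**{question}** {answer}")
        sections.append("\n\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

The returned string could be written to a .md file and handed to pandoc; the extra work of mapping dozens of columns and their "other" text fields by hand is roughly where the Python approach starts to feel tedious.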

The second half worked great. Simple markdown is simple. Pandoc is easy enough to use, although I did have to fuss around a bit to get a font I liked working on my Linux machine. And ReMarkable remains the best way I have of reading and taking notes on PDFs.

But when I started messing with it in Python, I quickly got annoyed. It was doable, but it wasn’t easy. I felt like there should be something more straightforward.

And that’s when I remembered OpenRefine templates. Just before my sabbatical started, I’d been working on a project where I turned a spreadsheet into JSON through OpenRefine’s templating system. It’s a batch process I do once a year and I’ve put energy into, well, refining it. Because I only do it once a year, I also have enough documentation to go from my annual “wait how does this work again?” to getting it done. I imported the survey CSV into OpenRefine and started tackling it.

The Steps

When you go to the export templating area, OpenRefine helpfully outputs all existing fields in a sample template. So even though I’d have to hack it to pieces, I had a starting list to work from. I kept my key nearby so I had a clue what Q22 might mean. I used the value, split, replace, or forNonBlank functions on each line, grouping some lines together, and previewed the results periodically to make sure I was on the right track.
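For readers who don't speak GREL, here is a rough Python analogue of the two idioms the template leans on. It is an assumption-laden sketch, not OpenRefine's implementation: for_non_blank mimics forNonBlank by treating None or an empty string as blank, spaced_choices mimics .replace(",",", "), which adds a space after the commas joining multi-select answers, and answer_line is a hypothetical helper combining the two the way a "choice plus other text" line does.

```python
# Hypothetical Python analogue of two OpenRefine template idioms.
# Not OpenRefine code; blank is assumed to mean None or "".
from typing import Callable, Optional


def for_non_blank(value: Optional[str], fn: Callable[[str], str], default: str) -> str:
    """Apply fn to value when it is non-blank; otherwise return default."""
    if value is None or value == "":
        return default
    return fn(value)


def spaced_choices(value: str) -> str:
    """Multi-select answers arrive comma-joined; add a space after each comma."""
    return value.replace(",", ", ")


def answer_line(choice: str, other_text: Optional[str]) -> str:
    """Mirror {{cells["Q5"].value}} {{forNonBlank(cells["Q5_6_TEXT"],c,":" + c.value,"")}}."""
    return f"{spaced_choices(choice)} {for_non_blank(other_text, lambda t: ':' + t, '')}"
```

With a blank "other" field, answer_line("Academic", None) returns "Academic " with a trailing space, which would explain the trailing spaces visible in the sample output below.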

It wasn’t the fastest thing I’ve ever done, but it took less than an hour to write, revise, and put out a final export. Just so there’s something useful in this post, this is the template I ended up with.

## {{cells["ResponseId"].value}}

**How many years have you worked in libraries?** {{cells["Q4"].value}}

**What kind of library do you work in?** {{cells["Q5"].value}} {{forNonBlank(cells["Q5_6_TEXT"],c,":" + c.value,"")}}

**How is your job classified?** {{cells["Q6"].value}} {{forNonBlank(cells["Q6_3_TEXT"],c,":" + c.value,"")}}

**How would you describe your job status?** {{cells["Q7"].value}} {{forNonBlank(cells["Q7_3_TEXT"],c,":" + c.value,"")}}

**If you regularly use the ILS (back-end system), what kind of tasks do you use it to perform? (choose all that apply)** {{cells["Q8"].value.replace(",",", ")}}

{{forNonBlank(cells["Q8_10_TEXT"],c,":" + c.value,"")}}

**How much time each week do you estimate that you spend using the ILS?** {{cells["Q9"].value}}

**If you regularly use your library's public online catalog or discovery platform as part of your work, which kind of tasks do you perform or support using these?** {{cells["Q10"].value.replace(",",", ")}} {{forNonBlank(cells["Q10_10_TEXT"],c,":" + c.value,"")}}

**How much time each week do you estimate that you spend using the online catalog and/or discovery platform?** {{cells["Q11"].value}}

**Which ILS did you previously use?** {{cells["Q13"].value.replace(",",", ")}} {{forNonBlank(cells["Q13_12_TEXT"],c,":" + c.value,"")}}

**Which ILS do you now use?** {{cells["Q14"].value.replace(",",", ")}} {{forNonBlank(cells["Q14_10_TEXT"],c,":" + c.value,"")}}

**When did your ILS migration complete? (please specify the year or month/year):** {{cells["Q15"].value}}

**About how many years had you (personally, not your institution) used the previous system?** {{cells["Q16"].value}}

**How would you describe degree to which the workflows of your primary job responsibilities have changed since the migration:** {{cells["Q18"].value.replace(",",", ")}} {{forNonBlank(cells["Q18_4_TEXT"],c,":" + c.value,"")}}

**How do you feel overall about any changes to your workflows?** {{cells["Q19"].value.replace(",",", ")}} {{forNonBlank(cells["Q19_4_TEXT"],c,":" + c.value,"")}}

**Please describe some challenges you experienced during the first 6 months post-migration:** {{cells["Q20"].value}}

**Did these challenges continue to impact your work at the 18-month mark post-migration (as well as you remember)?** {{cells["Q21"].value}}

**How were these challenges resolved?** {{cells["Q22"].value.replace(",",", ")}} {{forNonBlank(cells["Q22_6_TEXT"],c,":" + c.value,"")}}

**If you have any comments or reflections on either the resolution or on challenges you still experience, please provide them below:** {{cells["Q23"].value}}

**Were there aspects of your library's new ILS which substantially improved your ability to get things done or were features you'd always wanted?** {{cells["Q24"].value}}

**Please describe any aspects of your new ILS which substantially improved your ability to get things done or were features you'd always wanted:** {{cells["Q25"].value}}

**How would you summarize the impact of the ILS migration on your ability to complete your work as of today:** {{cells["Q26"].value}}

**Do you feel that your direct supervisor has/had reasonable expectations of the work you'd accomplish during the first 6 months post-migration?** {{cells["Q27"].value}}

**How would you describe your unit's current morale compared to unit morale prior to the migration:** {{cells["Q28"].value}}

**Is there anything you'd like to add to your assessment of the migration's impact on your unit's morale:** {{cells["Q29"].value}}

**How would you describe your own current morale compared your morale prior to the migration:** {{cells["Q30"].value}}

**Is there anything you'd like to add to your assessment of the migration's impact on your own morale:** {{cells["Q31"].value}}

Because there’s a lot of personal stuff in it, I can’t share a full sample response, but it came out like:

## R_7M9zOZyDpTDwXnj

**How many years have you worked in libraries?** more than 10 years

**What kind of library do you work in?** Academic 

**How is your job classified?** Librarian 

**How would you describe your job status?** Full-time 

**If you regularly use the ILS (back-end system), what kind of tasks do you use it to perform? (choose all that apply)** Cataloging

Pandoc was a bit harder, perhaps partly due to some settings in Linux, but ended up being something like:

pandoc markdown_survey_export.md --pdf-engine=xelatex -V 'mainfont:DejaVuSans.ttf' -V 'mainfontoptions:Extension=.ttf, UprightFont=*, BoldFont=*-Bold' -o Survey_Responses.pdf

Results

The resulting PDF was 231 pages long. It made for well-formatted reading, with one header per entry and bolded questions. I spent a lot of time reading through it (twice), noting factors described in the free text, condensing them into a codebook, and then applying appropriate terms to each entry.
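The coding itself was done by hand, but once terms are assigned, tallying them is simple. Here is a minimal, hypothetical Python sketch: the code labels, response IDs, and both helper functions are invented for illustration, and cross-tabulating codes against a controlled-value answer (such as a morale question) is one assumed way coded free text could feed back into the statistical side of the analysis.

```python
# Hypothetical sketch: tally hand-applied codes and cross-tab them.
# Codes, IDs, and answer values are invented placeholders.
from collections import Counter


def code_frequencies(coded: dict[str, set[str]], codebook: set[str]) -> Counter:
    """Count how many responses received each code; reject codes not in the codebook."""
    counts: Counter = Counter()
    for response_id, codes in coded.items():
        unknown = codes - codebook
        if unknown:
            raise ValueError(f"{response_id}: codes not in codebook: {sorted(unknown)}")
        counts.update(codes)
    return counts


def codes_by_answer(coded: dict[str, set[str]], answers: dict[str, str]) -> dict[str, Counter]:
    """Group code counts by each response's answer to a controlled-value question."""
    table: dict[str, Counter] = {}
    for response_id, codes in coded.items():
        group = answers.get(response_id, "(no answer)")
        table.setdefault(group, Counter()).update(codes)
    return table
```

Checking every applied term against the codebook catches typos in hand-entered codes before they quietly split one category into two.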

Discussion

Ok, the header is a little tongue in cheek. This isn’t an article. I should be revising a nearly-complete article right now.

When I was trying to explain this, I realized it’s a good example of how I’ve experienced tech librarianship, from my early metadata days to things I do now. Can I write scripts? Yes, I do so all the time. I wrote a small processing script in Python earlier this week (last week now) and I can’t even recall what it was for because it was so ordinary and took maybe 10 minutes to get right. But I also use the things I know. Sometimes it’s easier to clean up a CSV by opening it in a text editor and performing a series of regular expression find and replace operations. Sometimes I use OpenRefine to … create PDFs? Sometimes I write journal articles and blog posts in Joplin.1
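As a sketch of that text-editor habit in Python, the block below applies an ordered list of regular-expression find-and-replace passes to raw CSV text. The patterns are illustrative assumptions, not ones from any real cleanup; like editor passes, they know nothing about CSV quoting, so they can change text inside quoted fields, and order matters (line endings are normalized first so that $ matches before every newline).

```python
# Sketch: successive regex find/replace passes over raw CSV text.
# Patterns are illustrative only; they are not CSV-aware.
import re

REPLACEMENTS = [
    (r"\r\n?", "\n"),           # Windows/old-Mac line endings -> \n (must run first)
    (r"[\u2018\u2019]", "'"),   # curly single quotes -> straight apostrophe
    (r"[ \t]+(?=,|$)", ""),     # spaces/tabs before a comma or at end of line
]


def clean_csv_text(text: str) -> str:
    """Apply each (pattern, replacement) in order, like repeated editor passes."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text
```

Curly double quotes are deliberately left alone here, since straightening them could collide with the double quotes CSV uses for field quoting.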

It’s possible to take this too far. Can you share an image or dataset via Word or PowerPoint? Yes. But please don’t. I’d encourage those who aren’t as comfortable with these wacky decisions to feel their way out and assess outputs using the following criteria:

…I was trying to think of a third to be traditional, but this is all that came to mind (I’m down to update it if someone thinks of a third or fourth). I’d say a loss in usability, but I think appropriate format (image should be JPG/PNG/TIFF/BMP/etc., dataset should be some kind of dataset format) and quality cover that.

Should I end this on an encouraging note? If you do this already, you’re not as weird as you think you are. If you’re not doing things because you don’t have time to buckle into the “right” tools, are there things which let you get the output you need while, taps the sign, don’t take the result off the rails too? Give it a go!


  1. this one’s not weird ↩︎

May 11, 2026 12:00 AM

May 10, 2026

Ed Summers

Weekly Bookmarks

These are some things I’ve wandered across on the web this week.

🔖 lowkey: critical coding club

lowkey: critical coding club is first and foremost a coding club - a series of hangouts for anyone who wants to write computer programs with other people. Whether you’re new or experienced, need inspiration, or just want to throw on headphones and hack, you are welcome to join. Expect it to be laid-back and self-organised outside usual hierarchies: study groups, spontaneous pair programming, parallel working/body doubling, or just folks with laptops.

In lowkey we are focused on creating a space for diverse coding practices and building a community outside the usual institutions and sites with temporary contracts. E.g., academic institutions, businesses and entrepreneurship with ethical compromises, or “expert” meetups. Through the lens of critical technical practice, we are also open to address politics and social issues tied to coding, but rather than discussing them in the abstract, we focus on integrating this into our practice of coding.

🔖 permacomputing principles

Commentary from the Hacker News community…

🔖 I Verified My LinkedIn Identity. Here’s What I Actually Handed Over.

I wanted the blue checkmark on LinkedIn. The one that says “this person is real.” In a sea of fake recruiters, bot accounts, and AI-generated headshots, it seemed like a smart thing to do.

So I tapped “verify.” I scanned my passport. I took a selfie. Three minutes later — done. Badge acquired. I felt a tiny dopamine hit of legitimacy.

Then I did what apparently nobody does. I went and read the privacy policy and terms of service.

Not LinkedIn’s. The other company’s.

🔖 Agentic Coding is a Trap

Despite the countless failed attempts at trying to democratize coding while not understanding coding, we’re faced with the reality that you cannot understand code without engaging with it. And it’s become clear that if you don’t keep engaging and writing it, you can lose touch with that understanding, which will in turn make you a less capable orchestrator in the first place, rendering this phase of AI coding a strange and needlessly stressful interlude.

🔖 Computing as a social activity

By using LLMs to reduce their dependence on other humans, developers risk abandoning that social foundation. They work through roadblocks not by interacting with other developers on forums, but by asking a model trained on a millions of StackOverflow posts. They review code that no one has written, or submit code that will be reviewed by no one. Until now, FOSS has been conducted as a conversation, humans responding to other humans in the formal languages of code and project management, but increasingly it’s a conversation conducted between chatbots.

May 10, 2026 04:00 AM

May 09, 2026

Journal of Web Librarianship

Peruvian National Current Research Information System: Implementation Status and Interoperability Between PeruCRIS and Institutional CRIS/RIM Systems


by Joel Alhuay-Quispe, Vanessa Brañes-Gutiérrez, Renato Velarde-Gutierrez, Hilda Zulueta-Rafael, and Karla Peña-Pineda (affiliations: a. Escuela de Postgrado, Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional Mayor de San Marcos, Lima, Perú; b. Facultad de Ciencias Administrativas, Universidad Nacional Mayor de San Marcos, Lima, Perú; c. Facultad de Letras y Ciencias Humanas, Pontificia Universidad Católica del Perú, Lima, Perú; d. Escuela de Bibliotecología y Ciencias de la Información, Facultad de Letras y Ciencias Humanas, Universidad Nacional Mayor de San Marcos, Lima, Perú; e. Subdirección de Promoción de la Innovación Agraria, Dirección de Gestión de la Innovación Agraria, Instituto Nacional de Innovación Agraria, Lima, Perú) at May 09, 2026 05:06 AM

May 08, 2026

Mita Williams

What if we used AI as an excuse to provide structured open data to our communities?

On Wednesday, April 8th, I gave a presentation called What if we used AI as an excuse to provide structured open data to our communities?

by Mita Williams at May 08, 2026 03:41 PM

May 07, 2026

Journal of Web Librarianship

A Streamlined AI-Powered Workflow for Enriching Library Newspaper Archives


by Evagelos Varthis (Laboratory of Information Technologies, Department of Libraries, Archives and Museums, Ionian University, Theotoki, Corfu, Greece) at May 07, 2026 03:20 AM