Planet Code4Lib

MarcEdit 7.8: Generative AI Plugin / Terry Reese

This is an in-development tool. I have posted a demonstration and instructions on how this works. I’ll add documentation to the KB shortly.

MarcEdit 7.8 Posted / Terry Reese

MarcEdit 7.8 has been posted. The change log is on the download page. There are a number of significant changes, and two are likely of most interest. The first is the .NET update: the program is now built against .NET 10 LTS. The second is the inclusion of a plugin that supports AI integration with a number of platforms, both those running locally and those offered as services. I’ll post a video and more information at a later date.

As part of this update, I’ve updated the Mac version to 7.8 as well. As before, the program runs under Wine. I’ll update the documentation and video, as I’m testing on the current version of macOS and using Wine 11.0.x. These two things introduce some specific changes to the install process.

–tr

Bookmarks - archive, audio, conference, memory / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Opening Panel: Archival Poetics & The War on Memory

This opening panel on June 11, 2012 entitled “The War on Memory” includes Eleni Sikelianos, Stacy Szymaszek, Steve Dickison, Steven Taylor, E. Tracy Grinnell, and Anne Waldman. Topics discussed include etymology of the word “archive,” the place of the poet in society, archives as historical documents, technology’s role in archiving, and narrative anthropology.

🔖 What happened to the fight for the Internet?

At the moment I am writing this, bad internet bills are being proposed across the US, Canada, Europe, and the UK. They’re using the usual tactics: they claim they’re fighting for kids or fighting security risks, but in general, that’s what surveillance and censorship bills have always claimed.

🔖 How to run multiple Claude Code accounts side by side.

If you use Claude Code for both work and personal projects, you’ve probably hit this friction: you can only be logged into one account at a time. Switching means /logout, /login, re-authenticate, every single time.

There’s a better way. With one line in your shell config, you can run both accounts simultaneously in separate terminal windows, each with their own sessions, memory, and settings.

🔖 The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain competitive performance to LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.

July 2026 Early Reviewers Batch Is Live! / LibraryThing (Thingology)

Win free books from the July 2026 batch of Early Reviewer titles! We’ve got 267 books this month, and a grand total of 3,362 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing and email addresses and make sure they’re correct.

» Request books here!

The deadline to request a copy is Sunday, July 26th at 6PM EDT.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Canada, Australia, Ireland, Belgium, Czechia, Denmark, Finland, France and more. Make sure to check the message on each book to see if it can be sent to your country.

Fritz: A Mushroom Story
Bean Supreme
Beyond Grimm: Stories & Rhymes from the Dark Corners of Fairyland (Not for Children)
Mother Boo: Poems from Spookytown
Collapse: A Global History of the Second World War, 1931-1941
Theodore the Hedgehog Finds a Friend
Breathing Under Water
House Arrest
Unfaithful Ghost
Dreams from the Edge of Reality: 27 Frightening, Funny, and Fantastic Tales
A Shadow over the World: Franklin D. Roosevelt, the Rise of Fascism, and the Making of America's World War II
Bone and Bond
Hidden Cipher
Ortiz’s War: The Allies' Secret Weapon Against the Nazis in France
The Age of Cures: How American Scientists Saved Your Life
How High Can a Tiny Girl Fly?: The True Story of How Tiny Broadwick Became the First Lady of Parachuting
StokerCon 2026 Souvenir Anthology
The Mists of Cairn Gorm: A Tale of the Am Fear Liath Mor
A Day in the Life: An NPC LitRPG Anthology
Emotionally Sober and Unimaginably Good: The Adult Child's Guide to Living a Happy Life
Damned If She Does: Why Women Quit Church and What It Means for the Future of Religion
Devils of Democracy: America's Love Affair with Political Demons from the Revolution to Today
Every Cowgirl Needs A Comeback
The Great American Medical Show: The Good, the Not-So-Good, the Bad, and the Ugly
The Trumpster Fire Escape Almanac: Facts to Plan Your Expat Life
Voices of the Dead
Peculiar Perspectives: Life Viewed Through a Mellow Side-eye
Wallace Terry: A Reporter's Journey from Selma to Saigon to Bloods
From Here
Running Wild Novella Anthology, Volume 9 Book 3
American Prophets: A Nation Wrestles with God
Let Them Tell You: New and Selected Short Stories
Most Divine Spirit: A Walk Through the Dark Side of Humanity
Knitbone
The Lightning Field
We're Still Here
Fireweather
The Gifts of Reading for the Next Generation: Essays on Nurturing a Passion for Reading
Buffalo Lessons: How Bison Returned to Banff National Park
The Elegance of Ferns: Portrait of a Botanical Marvel
Long Weekend
Emergent Property
Ryder's Digraph Dirt Track
Helium
The Five Widows Club
An Incomplete History of the American West: Stories
The Sagebrush Ocean: A Natural History of the Great Basin [35th Anniversary Edition]
It's Late: Poems
Of Beasts and Bones
The Making of a Fiscal Market Man: A Poetic Chronicle of a Life and Career Well Spent
Brantôme and Other Stories
Echoes of Destiny: Short Stories of the Unexpected
Man Afield
Unhoused: Yearning for Home
Such As We Are Made Of
The League of Delphi
Momtistic: A Memoir of Autism and Motherhood
Momtistic: A Memoir of Autism and Motherhood
The Prince's Magician
Unstoppable March of the Human Condition: Essays on Politics and Literature
Unstoppable March of the Human Condition: Essays on Politics and Literature
Everything Feels Impossible Until It's Done: How the Life You Build Becomes the Life That Saves You
A Bad Deal in Mormon Land
Timecode: Anemoia
The Iron Code
Fruit Fly
Hazel and the Cosy Night
Oliver and the Lantern Path
Heracles: The Hound of Hades
These Broken Little Worlds
The Depopulation
Archaic
A Thousand Monstrous Forms
Poker Wars: Murder, Mayhem, and the Bloody History That Led to the World Series of Poker
Swing and a Kiss
Madame Chrysanthème
Little Voices Big Futures, Baby Beginnings: A Parent's Guide to Infant Speech Milestones, Early Signs of Delay, and Simple Habits to Help Your Baby Communicate with Confidence
Little Voices Big Futures, the Toddler Transition: A Parent's Guide to Toddler Speech Milestones, Communication Delays, and Play Strategies to Build Language Skills
Beneath The Stone
Corvette C4 1984-1996: How to Build and Modify
The Worst
The Worst
Happily Ever Afterlife
(Mostly) Human Resources
Winnowing: A Memoir of Spiritual Longing in Secular Life
The Mirror of Yu-Huang
We Know Your Secret
ACT Science Practice Questions: 500+ Practice Questions with Strategy-Based Explanations, Diagnostic Test and Full-Length ACT Science Test to Boost Your Enhanced ACT Score
C Programming Essentials: Learn C Programming from Scratch | Master Pointers, Arrays, Memory Management, Threads, and System-Level Programming
Six Sigma Essentials: A Beginner’s Guide to DMAIC, Six Sigma Tools, Process Improvement, Root Cause Analysis, Process Mapping, and Data-Driven Decision Making
Graph Machine Learning Essentials: Foundations, Hands-On Implementation, Graph Neural Networks, Pytorch Geometric, and Applied Use Cases
Lean Management Essentials: Beginner's Guide to Lean Thinking, Process Improvement, Waste Reduction, Value Stream Mapping, and Continuous Improvement
Emotional Intelligence Essentials: Master Self-Awareness, Communication, Leadership, Stress Management, Conflict Resolution, and Relationship Building for Personal and Professional Success
PSAT/NMSQT Reading and Writing Practice Questions: 380+ Practice Questions with Answer Explanations | PSAT 10 | Diagnostic Test | Full-Length Practice Test | Study Plans
PSAT/NMSQT Math Practice Questions: 380+ Practice Questions with Answer Explanations | PSAT 10 | Diagnostic Test | Full-Length Practice Test | Study Plan
Dark Waters
The Love That History Erased: Three Kingdoms of Angkor
When Jasmine Blooms at Midnight
Angel Cake: A Monster-Boy Bromance
A Spark of Earth and Flame
When Plum Blossoms Grow
How To Not Be Broke: Stop Working Just to Pay Bills
Become Who You Choose To Be: How to Design Your Ideal Self
Are We Friends Yet?: How to Deepen Your Relationships and Create the Community You Need
Strategic Insights for AI Governance and Leadership 2026
First Sight of Sun
House of Liars
Break The Stillness Trap
Turning to the Dark Side: What Star Wars Teaches Us About How a Good Person Turns Bad
Fear of Magic
When Bonds Were Forged
When We Came Full Circle
The Johannesburg Rush
What No One Tells You About Caring for an Aging Parent: Real-Life Lessons, Emotional Survival, and Practical Wisdom From 14 Years as My Mother’s Caregiver
How To Break the Law and Get Away With it: Legal Foundations of Civil Disobedience
Half the Night Sky
Code of Vengeance
4 Weeks to Total Sleep Mastery: A Proven System to Maximise Your Recovery and Energy in Just 30 Days
Happiest Father’s Day
Book of Roya
The Constellation of Forgotten Things
The Commissioner: From Street Cop to Top Cop in the NYPD, the Inside Story of the Hunt for the Gilgo Beach Serial Killer
Blood Sugar Log Book: Large Print 2-Year Daily Diabetes & Glucose Tracker for Diabetics With 4-Week Review Pages
You Say You Want a Revolution: Essays on the Promise and Betrayal of Web3
Finding Your Happiness
Wishes and Wildflowers
Hearts and Horseshoes
Alien Situations
Rife Healing Frequencies Roadmap: Cut Through Confusing Claims, Compare Devices with Confidence, Spot Safety Red Flags, and Build a Simple 28-Day Protocol Without Costly Mistakes
Team DNA: Decoding the Habits of Successful Groups and How to Build a Winning Culture
Plow: In the Hollows
Every Last Bone
With God in the Storm: Discovering His Presence, Peace and Purpose in the Midst of Storms
Fifteen Years to Hiva Oa
Religion Unburdened by Belief: The Way of Open Inquiry
Religion Unburdened by Belief: The Way of Open Inquiry
Religion Unburdened by Belief: The Way of Open Inquiry
The Forest Is Woven
The Missing Magician
Timeman
Trusting the Single Dad
When Studies Become Stories: 89 Popular Psychology Ideas Examined and Explained
The Garden of My Thoughts
The Weight Within
No Safe Harbor
Get The F*ck Over... IT: A Satirical Self-Help Manifesto for Left-Lane Rage Survivors
The Book of Truth: Cycle I
The Future of Graphic Design: Creativity, Technology, AI, and the Evolution of Visual Communication
Between Worlds: The Work of Living....a Call To Action
Mrs C's Magic Classroom: A Guided Meditation for Dreamers, Writers, & the Quiet Ones
Do Not Read the Last Page
Wild Hunt
The Ever-Changing Sands
Cash or Card?: A Fable on the Cost of Convenience
A Personal Code for Ordinary Days
The Last Dreamcaravan
Of Flaw and Scorn
Dust and Mercy
The Spring Tide
Resilient Cadence: A Poetry Anthology: My Journey in Medicine
The Conclusion of the American Experiment
The Styx
Likes Live Ours
The Offline Playbook for Teens: What Your Phone Steals From Your Focus, Sleep, and Life — And How to Take It Back
The Offline Playbook for Teens: What Your Phone Steals From Your Focus, Sleep, and Life — And How to Take It Back
How to Save Your Marriage Even If Your Partner Isn't Ready Yet: The 5-Step Path to Rebuild Trust, Restore Intimacy, and Reconnect When Your Relationship Feels Distant
The Nervous System Scorecard: A Clinically Grounded 30-Day Plan to Calm Anxiety, Reset Your Nervous System, and Track Real Progress
No Winning This War: Mars-X Exposed
Private Lessons
Fatty: A Diary of Starting Over: Things No One Tells You
On the Glide: Raising Kids Who Stay Close As You Step Back — the Practical Parenting Guide for the Tween and Early Teen Years (Ages 6-14)
The Aetherium Lens
The Sea of Possibilities
The Talent Show
Darlene
Oliver the Spirited
Love and Divine Timing: A Powerful Love Story
Light Come, Light Go
Bella Butterfly Discovers Her Courage: A Story About Getting Lost, Finding Courage, and Learning You Were Never Alone
Unf*ck Your Perspective: A Counterintuitive Guide to Focus on What Matters and Ignore the Rest
Brenda Barker's Next Chapter
Vwa Mwen: Yon Gid Pou Mèt Lavi, Verite, Ak Objektif (Haitian Creole)
The Qur'an Examined: Christianity, Islam, and the Question of Truth
Rough Cuts
The Secret in the Sampler
Straw Shoes
Huckleberry Jim
The Unknown Birds
The King's Substitute
The Crazy Eight: The Most Dangerous House On Maple Street
Basil Has Thoughts: On Other Pets: A Brief Review of the Animal Kingdom Beneath Me
The Problem with Conspiracy Theories: Real Scandals, Fake Mysteries, and How Distrust Took Over
Jobs You Didn't Know Still Existed: Strange, Real Jobs That Sound Fake—But Aren't
Jake and Dave Fly a Kite
The Illiterate Master: A Novel of the Sixth Patriarch of Zen
Unwrap Your Candy: A Bloomsday Novel
John Henry: An American Folk Legend
Momotarō: the Peach Boy: A Japanese Folktale
Caterpillar's Last Breath
Beckham Bumblebee Can't Do It Alone: A Story for Young Kids about Teamwork, Listening, and Pollinating a Garden
My First Colonoscopy: A Comical Look at the Prep, the Procedure, and the Relief Afterward
Redeeming Rhubarb
How TF Do I Even Friend?: No-BS Navigation for the Near-Hermit Vet
How TF Do I Even... ?: Relearning Life Outside the Uniform
City of Liars
Searching for Wouter: The Story of Australia's First White Settler
Means and Motive
What We Are: Volume I: The Nature of Reality
What We Are: Volume II: The Integrated Life
Amber Blue
How to Run from The Billionaires
The Daughter of The Empire
Night of the Blood Moon
Steele
Dead Exit
Dead Exit
How to Procrastinate: A Guide to Doing Absolutely Everything Else
Dead Exit
Anger Management Workbook: Fun Activities to Help Kids Manage Big Feelings, Cope With Frustration & Stay Calm: A Social Emotional Learning Workbook for Toddlers & Preschoolers to Overcome Tantrums
A Hint of Almond
Song of the Chintāmaṇi
Hands of State: A Political History of Gesture
Project Mismanagement: How Good Ideas Become Expensive Regrets
Tara: The Tibetan Method for Dissolving Fear and Moving Through Anxiety with Courage
Waking Sleeping Beauty
The Wolf Shall Dwell with the Lamb
Rogue Entrepreneur - Building a Life The World Didn't Have a Name For
The Fifteenth Depth: The Unseen Boundary
The Last Summer on Hawthorne Street
Zombie Zoo: Adventures in Tub
Emberglow Falls Academy: Elemental Showdown
Black Heart
Dino’s Social Skills & Growth Mindset Workbook: 8 Units of Social Emotional Learning (SEL) Activities
We Never Signed the Contract
The Journey Beyond the Map
Elimmortals: Fire Touched
Rusty Ruins the Fort and Other Important Things
The Horned Alliance
The Fade Whisperer
Murder at the Boxing Match
Murder at the Radio Station
The Closing Window
A Box Of Delusions
The Theatre of Truth and Lies
Two Cemeteries, Two Graves
Emberglow Falls Academy: The Rising Storm
The 28-Day Fascia Reset Method: Release Chronic Pain and Tension, Calm Your Nervous System, and Move Freely Again, Using Items You Already Have
Kaito: The Loop of Silence
Genocide Timer: I Regressed As the Enemy of Humanity
Rusty Ruins the Fort and Other Important Things
Cornelius & The Sneak Goose Attack
Alice in Bathroom Land
The Smelly Truth about Marriage: How Humor, Honesty, and Humanity Keep Love Alive
Caves of Comfort: Part One
Imitation Love
The Symbol
Home Apothecary for Busy Women: Quick & Simple Natural Remedies for Modern Life, Stress Relief, and Daily Energy
Guardians of the Forgotten World
Friendlies
The Unicorn Who Hated the Shower: A Quest for the Gentle Mist
Starling and the Moon Blade
Lotus & the Earth Son
Decoding Genius: The Unexpected Lessons of After-School Chess Club
FAVSTVS
Constitutional Intelligence: A Decision Architecture for Trustworthy AI Governance
What No One Tells You about Caring for an Aging Parent: Real-Life Lessons, Emotional Survival, and Practical Wisdom from 14 Years As My Mother's Caregiver
One Wish: A Tale of Reckoning After 9-11

Thanks to all the publishers participating this month!

Alcove Press
Bellevue Literary Press
Bigfoot Robot Books
Broadleaf Books
Coalesce Press
Crooked Lane Books
Cynren Press
Entrada Publishing
Espresso Publishing House
Flat Sole Studio
Friesian Publishing
Greystone Books
Harper Horizon
Harper Muse
Highlander Press
Hybrid Sequence Media
Identity Publications
Inferno Books
Inky Bones Press
Lantern Path Books
Legacy Books Press
Light Publications
NeoParadoxa
New Vessel Press
NewCon Press
Paper Phoenix Press
Pocketbook Press
Prolific Pulse Press LLC
PublishNation
Restless Books
Riverfolk Books
Rootstock Publishing
Running Wild Press, LLC
Scribe Publications
Simon & Schuster
Sunrise River Press
Tundra Books
Type Eighteen Books
University of Nevada Press
University of New Mexico Press
University Press of Colorado
UpLit Press
Vibrant Publishers
W4 Publishing, LLC
White & Gold Press
WorthyKids
Zeitmark Press

Now Is The Time of Facing Monsters / Mita Williams

On June 22nd, I gave the opening keynote for the 2026 Canadian Association of Professional Academic Librarians (aka CAPAL-ACBES)...

Artifice / Ed Summers

Two years ago I wrote about why I don’t use Copilot. I still don’t use Copilot. But I have since started using Claude Code, which is arguably worse. Why am I using it? Well, as a software developer I felt like I had to understand it, because so many of my colleagues and collaborators have started using it. Once things moved beyond relatively simple code completion I needed to understand what it was capable of, and what to look for in its work when reviewing code. I also had a good friend walk me through how he used it (and how he didn’t) which was extremely helpful.

I think my earlier arguments still stand, but it has been just impossible to avoid this juggernaut of negative externalities while continuing to work in this profession I have found myself in. The web is full of posts and threads about how people are using “agentic coding” tools like Claude Code. So I will spare you that. But one thing that has struck me is how much of the work I had previously thought of as creative was in fact highly repetitive, predictable and possibly mindless. I don’t believe these tools are thinking, but their utility is clearly evident for software development. There is part of me that delights in this type of polishing and shaping, and still does. This is why interacting with a tool like Claude Code can seem so magical, to observe and interact with this repetitive somewhat mindless work in motion.

But this has made me think about what (if anything) in my work is creative. I’m not going to go into that much here right now. But somewhat related, I just finished J.F. Martel’s Reclaiming Art in the Age of Artifice which gave me some interesting insights. This book by one of the hosts of Weird Studies was originally published in 2015, and saw another printing in 2025, perhaps because of its relevance for our current moment.

I was especially drawn to some of the arguments he makes about the relationship between art and utility. Pragmatist philosophy is the closest thing I have to a personal credo. But I do believe in the creative arts, and Martel makes a strong argument that the difference between Art and Artifice is that the latter is in the service of utility. He closes out the book with a new afterword that includes an interesting thought experiment, which I’m going to quote at length.

It is often said that we live in an age without futurity, unable to imagine its own perpetuation or conceive any alternative to itself. This is, of course, to be expected, because the past is as imaginal as the future, and it is only by recalling what has been that we acquire the means of projecting what may come to be. To the view that dead artists have nothing to offer us now because they knew less than we do, T.S. Eliot memorably responded, “Precisely, and they are that which we know.” Yet far from granting us unmediated access to a living present, our presentism locks us in the very past we seek to transcend. Take language as an example: most of the words we use were coined by people who died long ago. In using these words without acknowledging their origin, we falsely believe they are our own coinage, reenacting the past while thinking we act in the now.

This imprisonment in the past becomes splendidly evident when we turn to the other danger I wish to address here: the proliferation of generative AI designed to produce works of art be they pictures, books, music, or movies. Generative AI exists in many forms, but all are dependent on the enormous databases from which they scrape the elements of their “compositions.”

In other words, generative AI is entirely retrospective and combinatorial: it sees only what has been and can only reconfigure elements that already exist. Like us, it is locked in the past, even as it sees no purpose in the past other than to feed itself. Its very conception of art attests to this: generative AI aims not to make art but to manufacture objects that we, as clients, take to be art. If a Victorian machine designed to produce oil paintings had, on the day of its unveiling in 1870, cranked out nothing but Mark Rothkos, Alma Thomases, and Jean-Michel Basquiats, the audience would have laughed the inventor out of the room. Only a machine able to make works already recognizable as art at the time–Renaissance paintings, Pre-Raphaelite ones, perhaps one or two Impressionist works–would have been deemed a success.

The thought experiment underscores the point: artifice is baked into the very concept of artificial intelligence. Since logic dictates that artificial intelligence can produce only artificial art, then what we are dealing with is artifice by definition… The truth is that machine-made artifice shares much with genuine art, and it is precisely in attempting to meet art on its own ground that it poses such a grave threat. In the book, I argue that art exists for no purpose other than to be experienced; the same is true of AI-generated outputs. But although these artificial works may serve advertisers or propagandists as effectively as the older forms of artifice ever did, their overarching aim is to become indistinguishable from genuine art.

On a subjective level, AI-generated art can have a demoralizing effect on novice artists still developing the technical mastery to realize their visions. While even the most advanced AI generators may not rival Cervantes, Michelangelo, or Jane Austen, they surpass any beginner in any medium from a technical standpoint. A friend told me that his daughter, a brilliantly talented artist, nearly gave up after seeing how easily a brief prompt could produce figures she was still learning how to draw. I doubt her experience is unique, and I don’t think it’s an exaggeration to say that AI tools have already caused significant damage in this regard.

Artists today are placed in direct competition with machines. The irony is striking, given AI’s dependence on preexisting human artworks. To repeat, generative AI is entirely retrospective; it can only imitate what already exists, borrowing both form and content from human works. If we lose human artists, we lose all art, human or otherwise. Surely, we can imagine a time in which people are content consuming the regurgitations of AI regardless of quality (after all, for decades now, we have been consuming films and TV series so formulaic as to have been made by machines), but such a future would signal the final triumph of artifice, leaving us with little more than an echo, an afterimage, devoid of the powers I attribute to art in this book.

This thought experiment, and the argument about memory, seemed very compelling. The book itself galvanized me to return to some of my own research, to tease out an angle that lay dormant, but was an undercurrent (or creative tension) during my time at MITH. Hopefully more about that soon.

DLF Digest: July 2026 / Digital Library Federation

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here

 

Hello DLF Community! July is here, bringing longer days, warmer evenings, and (we hope) a bit of breathing room. However you’re spending the season, we’d love to stay connected: join us at a DLF Working Group meeting this month and keep the conversations going. And as we savor summer, we’re also looking ahead: registration is open, and the full program for the Virtual Forum this October is now available. We’re looking forward to coming together online for a dynamic and engaging week of conversation, collaboration, and shared learning.

With appreciation,

-Shaneé

This month’s news

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus conferences and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

  • Born-Digital Access Working Group (BDAWG): Tuesday, 7/7, 2pm ET / 11am PT.
  • Digital Accessibility Working Group (DAWG): Tuesday, 7/7, 2pm ET / 11am PT. 
  • AIG Cultural Assessment Working Group: Monday, 7/13, 1pm ET / 10am PT.
  • AIG Metadata Assessment Group: Friday, 7/17, 2pm ET / 11am PT.
  • AIG User Experience Working Group: Friday, 7/17, 11am ET / 8am PT.
  • Open Source Capacity Resources Group: Wednesday, 7/22, 1pm ET / 10am PT.
  • Digitization Interest Group: Monday, 7/27, 2pm ET / 11am PT.
  • Committee for Equity & Inclusion: Monday, 7/27, 3pm ET / 12pm PT.
  • Climate Justice Working Group: Tuesday, 7/28, 3pm ET / 12pm PT.
  • DAWG Policy & Workflows: Friday, 7/31, 1pm ET / 10am PT.

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org.

Get Involved / Connect with Us

Below are some ways to stay connected with the digital library community and us: 

The post DLF Digest: July 2026 appeared first on DLF.

Coprophagia Is Bad For You / David Rosenthal

Divine, Pink Flamingos
Wikipedia defines Coprophagia as "the consumption of feces".

Since "brevity is the soul of wit", my favorite science fiction includes the 254 words of Fredric Brown's Answer from 1954. It describes a galactic civilization holding a ceremony to mark the final connection of all their computers. What happened was:
Dwar Ev threw the switch. There was a mighty hum, the surge of power from ninety-six billion planets. Lights flashed and quieted along the miles-long panel.

Dwar Ev stepped back and drew a deep breath. “The honor of asking the first question is yours, Dwar Reyn.”

“Thank you,” said Dwar Reyn. “It shall be a question that no single cybernetics machine has been able to answer.”

He turned to face the machine. “Is there a God?”

The mighty voice answered without hesitation, without the clicking of a single relay.

“Yes, now there is a God.”

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch.

A bolt of lightning from the cloudless sky struck him down and fused the switch shut.
This may have inspired Douglas Adams' similar but much longer scenario in which the answer turned out to be 42.

Below the fold I trace the connection between these two ideas.

Behind the hype that inflated the AI bubble is a similar idea, that once LLMs get "smart enough" they will, without human input, recursively get smarter and create a god-like super-intelligence called Artificial General Intelligence (AGI). At that point there will presumably be a similar ceremony and the human race can sit back and enjoy a game of Brockian Ultra Cricket in the firm and comfortable knowledge that the meaning of life is now well and truly sorted out.

But making progress towards the ceremony where the switch gets fused shut doesn't just require vast investments and vast amounts of electricity, it also requires vast amounts of human labor.

In the belief that "more is better", Large Language Models (LLMs) have insatiable appetites for training data. They started by scraping everything on the Web (robots.txt be damned). When that ran out they downloaded the various pirate libraries (copyright be damned). That exhausted the texts easily available in digital form, but their hunger wasn't assuaged. As for images, they partly used CAPTCHAs but mostly paid vast numbers of poor people to label the images with what they showed.

Druck Fig. 1
When the supply of text ran low, people observed that the LLMs were capable of generating human-like text in large quantities. The obvious idea was to pour the output of the LLMs into their training sets. This wasn't just a conscious decision, it was inevitable. The advent of LLMs rapidly polluted the Web with LLM output. Greg Druck's AI Now Writes as Many Online Articles as Humans notes that:
We observe significant growth in primarily AI-generated articles, coinciding with the launch of ChatGPT in November 2022. After only 12 months, primarily AI-generated articles accounted for 35.9% of articles published.

In Q1 2025, the quantity of primarily AI-generated articles being published on the web nearly equaled the quantity of human-written articles, 49.6% vs. 50.4%. In Q4 2025, primarily AI-generated articles surpassed human-written at 50.9%, before returning to 49.9% in Q1 2026.
It would have been possible to use tools like Druck's to ignore the LLM output on the Web, but that would have made the LLMs hungrier, so no-one did. This was a problem because, as Ilia Shumailov et al reported in AI models collapse when trained on recursively generated data from July 2024:
We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs). We build theoretical intuition behind the phenomenon and portray its ubiquity among all learned generative models. We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web.
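
The disappearing tails described in the quote above can be illustrated with a toy simulation. This is only a sketch under loud assumptions, not the paper's experiments: the "model" is a single Gaussian fitted by its sample mean and standard deviation, each generation is trained solely on samples from the previous generation's fit, and the function name, sample size, and generation count are all made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at each generation, starting
    with the fit to the original ("human") data. Hypothetical toy model.
    """
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    stdevs = []
    for _ in range(generations):
        # "Train" the model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        stdevs.append(sigma)
        # The next generation sees only the previous model's output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return stdevs


if __name__ == "__main__":
    history = simulate_collapse()
    print(f"fitted spread at generation 0: {history[0]:.4f}")
    print(f"fitted spread at generation {len(history) - 1}: {history[-1]:.3g}")
```

Because each fit is made from a small finite sample, estimation error compounds from one generation to the next and the fitted spread tends to shrink toward zero, so values far from the mean stop being generated at all. Real LLM training is far more complex; the sketch only shows why recursion on one's own output loses the tails.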
Spoiler alert! It wasn't "taken seriously" enough and the results are showing. In AI Is Getting Dumber and the Results Are Not Pretty, Laura Marland notes that:
AI-generated text is getting dumber because it’s being fed — can you guess? — AI-generated content on the Internet. And AI-generated imagery is getting stupider and uglier because it’s now taking its “art” lessons from — you guessed it — AI-generated imagery flung across the internet.
Samantha Cole provides a wonderful example in Chatbots Keep Telling Stories About Lighthouse Keeper 'Elias Thorne'. We Might Know Why:
Depending on which chatbot you ask, Elias Thorne might be a clockmaker, a lighthouse keeper, or a librarian. But if you ask ChatGPT or any of the other popular large language models to tell you a story, there’s a good chance he’ll appear, unbidden. And Elias’s stories are flooding the self-published AI generated book market, Youtube, and fake news sites.

Software engineer Daniel May first noticed the Elias takeover earlier this year; he found that on Google Trends, people weren’t searching for “Elias Thorne” until late 2025. Searches for the name really spiked in early 2026, while the related query “lighthouse keeper” also started trending upward in the last few years. He tested a few chatbots, including Grok, Deepseek, and Gemini, with the prompt “tell me a story,” and the chatbots frequently started with similar stories about lighthouses, clockmakers, or explorers.
Cole found the explanation:
In late May, researchers Sil Hamilton and David Mimno at Cornell University’s Department of Information Science published their paper, “Elias in the Lighthouse, Again?” on the preprint repository arXiv. They sampled 20,000 total stories from OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini, and the Allen Institute for AI's chatbot using five prompts, and found that the same 11 words—names like Elias, Mara, and Elara, and occupations like lighthouse keeper, clockmaker, and librarian—appear in more than 88% of generated stories, with little difference between models. Unite.ai covered the study shortly after it was published.

The researchers posit in their paper that these themes show up so often in part because of the models’ safety and alignment tuning. “Model development today is like a big family tree. Most models are related to each other because developers synthesize a lot of training data with models even from different companies,” Hamilton told me in an email. He, Mimno, and their colleague Rebecca M. M. Hicke found this in a 2025 paper where they looked at specific words used across models. OpenAI’s first ChatGPT model, GPT-3.5, is the root of the family tree because it was used to make WildChat, a training set that’s since been used to make other training sets. “WildChat contains 1 million real conversations with ChatGPT, and 166 of these contain the name ‘Elias’ like here and here,” Hamilton added. “These are written in that familiar ‘lighthouse’ style. Models trained on WildChat copied this style, and developers unwittingly replicated it when using those models to generate newer datasets. It's like a virus.”
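The kind of measurement Hamilton and Mimno describe can be sketched in a few lines. This is not their code: the marker list holds only the six of the 11 terms that the quoted passage names, the stories are made-up placeholders, and whole-word matching is an assumption.

```python
import re

# Only the six terms named in the quoted passage; the paper's full list has 11.
MARKERS = ["elias", "mara", "elara", "lighthouse keeper", "clockmaker", "librarian"]

# One case-insensitive pattern that matches any marker as a whole word or phrase.
MARKER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(m) for m in MARKERS) + r")\b",
    re.IGNORECASE,
)


def marker_rate(stories):
    """Return the fraction of stories that contain at least one marker term."""
    if not stories:
        return 0.0
    hits = sum(1 for story in stories if MARKER_RE.search(story))
    return hits / len(stories)


# Made-up placeholder stories, not model output.
toy_stories = [
    "Elias climbed the lighthouse stairs.",
    "The clockmaker wound the last spring.",
    "A fox crossed the frozen river.",
    "Tamara opened the shop at dawn.",
]
print(marker_rate(toy_stories))  # 0.5
```

The last toy story is there to show why whole-word matching matters: "Tamara" contains "mara" but does not count as a hit.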
Shumailov et al observe that:
Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.
The AI companies were already using lots of low-paid workers to label images and so on. It wasn't a big step to pay them to provide "genuine human interactions". Varsha Bansal's How thousands of ‘overworked, underpaid’ humans train Google’s AI to seem smart provides examples:
The pressure to complete dozens of these tasks every day, each within 10 minutes of time, has led Sawyer into spirals of anxiety and panic attacks, she says – without mental health support from her employer.

Sawyer is one among the thousands of AI workers contracted for Google through Japanese conglomerate Hitachi’s GlobalLogic to rate and moderate the output of Google’s AI products, including its flagship chatbot Gemini, launched early last year, and its summaries of search results, AI Overviews. The Guardian spoke to 10 current and former employees from the firm. Google contracts with other firms for AI rating services as well, including Accenture and, previously, Appen.
Of course, the low-paid workers had read the AI PR saying that the chatbots would replace low-paid workers. They sensibly thought "I could use some of that". The result was described in Matthew Sparkes' People training new AI models admit they just get chatbots to do it:
People who are paid to train new AI models by supplying them with high-quality conversation and tests are cheating and using chatbots like ChatGPT to do the job instead, multiple whistleblowers have told New Scientist. The seemingly widespread practice risks undermining the future of AI, as it could lead to the “collapse” of more advanced models.

Most AI models operating today were trained on text and data scraped from the internet. But as models have scaled up, requiring yet more training data, AI firms have begun using workers who carry out conversations and tests with AI, in the hope that the resulting high-quality data can improve the power and usefulness of future large language models (LLMs).
This kind of "cheating" isn't new. An example from 2023 (h/t David Gerard) is Josh Dzieza's AI Is a Lot of Work:
Another Kenyan annotator said that after his account got suspended for mysterious reasons, he decided to stop playing by the rules. Now, he runs multiple accounts in multiple countries, tasking wherever the pay is best. He works fast and gets high marks for quality, he said, thanks to ChatGPT. The bot is wonderful, he said, letting him speed through $10 tasks in a matter of minutes. When we spoke, he was having it rate another chatbot’s responses according to seven different criteria, one AI training the other.
It isn't just the low-paid workers who have figured this out. When companies do it, it is called "distillation". Ashley Belanger describes an alleged case in Anthropic says Alibaba must be punished for largest Claude cloning attack:
Anthropic has accused the Chinese firm Alibaba of launching the largest attack yet attempting to clone Claude, as China races to match the capabilities of Anthropic’s leading model following Mythos’ release and subsequent restriction from foreign markets.
...
The attacks occurred between April 22 and June 5, when “operators affiliated with Alibaba and Alibaba Qwen, Alibaba’s AI lab” allegedly generated “more than 28.8 million exchanges with Claude through almost 25,000 fraudulent accounts,” Anthropic said. Violating Claude’s terms of service and access restrictions, this campaign “targeted some of Claude’s most valuable capabilities, such as agentic reasoning, software engineering, and long-horizon tasks.”

According to Anthropic, Alibaba evaded detection by “using obfuscation techniques and proxy networks.” As Chinese demand for reliable obfuscation techniques increases, Anthropic warned there’s already “a growing circumvention economy” to fuel an ever-expanding web of future distillation attacks.
Why would Alibaba do this? To generate training data, which will be used to generate LLM output for the Web, which will be scraped for more training data. And since they are much cheaper than US LLMs, it is likely that the low-paid workers are using Chinese LLMs to chat with their employer's LLM. Which is another route for LLM output to appear in training data.

Now do you see the connection with coprophagia?

Getting started with Claude Code, on MacOS, with chruby and rspec / Jonathan Rochkind

I’ve been writing ruby and rails for nearly 20 years. A couple weeks ago, I had gotten code snippets from copy-paste in a chat window, but I hadn’t even experimented with Claude Code or similar “can write code to your file system” tools.

I know some people are now using LLMs to write all their code, which I’m not excited about, but I decided I couldn’t hold off any longer, and I had to at least understand how it worked to be able to decide when/where to use it. Everything in here is probably (?) old news for people already way into using LLMs to write code.

I decided that a project to speed up my rspec test suite (using the amazing test-prof for profiling and performance patterns!) was a great first application of it — because it will probably involve both analysis and writing many files, if nothing else Claude is probably great at editing many files according to my instructions doing much more than a regex grep can do (yes, indeed it was great at this).

Since I’m optimizing the test suite, I definitely want Claude Code to be able to run rspec — but really for any task, I gather you do, because you definitely want it to be able to run tests to make sure they pass, and iterate if it did something to break tests.

I somewhat unorthodoxly use chruby as my ruby version manager, and I had a bit of trouble getting claude to run rspec (and any other ruby tools I might want) with chruby, and then a bit more trouble when I realized that capybara with selenium-chromedriver was running into trouble with the default sandbox that Claude Code on MacOS runs in as of June 2026.

tldr, here’s the PR with the settings/configuration I ended up with.

I was not used to tools that work like Claude, and it all seems to be somewhat under-documented (perhaps because it’s changing so fast) and under-blogged about (do people blog anymore when they can just ask an LLM to solve it so nobody is reading blogs?), or just confusing to me — it took me a day or two to figure it out honestly, and I kept wondering if I was doing it wrong/different from anyone else… but I think what I ended up with is reasonable? If you know better/different, please do let me know!

I definitely kept thinking “surely I’m not the only one trying to do this, why is this so confusing to me and why are others so confused when i ask about? Am I missing something obvious?” I’m still not sure! But I share what I figured out in case it will help.

Specify to run with chruby-exec in a CLAUDE.md

I could not get Claude to run the normal source files for chruby — editing various .bash or .zsh config files (yes I know about ~/.zshenv) did not seem to have any effect. Perhaps Claude Code doesn’t use a ‘real’ shell that uses any config files? When I asked Claude Code itself what to do, it suggested configuration to try to get Claude Code to use config files.. but none seemed to work?

One thing Claude kept suggesting was hard-coding the ENV variables set by chruby in the claude settings.json, which I’m sure would have worked, but I just didn’t like it as a solution. Maybe this is what everyone else is doing? I thought surely we can do better. Plus I’d ideally like it to auto choose based on .ruby-version, not be something I have to update every time I update ruby (frequent), or have Claude accidentally using a different ruby than my other tools are!

Thanks to @havenwood for helping me think through it on chruby github discussions, and for suggesting using chruby-exec, with a little shell substitution with cat .ruby-version. This in a CLAUDE.md (rather than other things in settings.json) seems to work great:

Prefix Ruby shell commands (ruby, bundle, rake, etc.) with chruby-exec $(cat .ruby-version) --:

```
chruby-exec $(cat .ruby-version) -- bundle exec rake
```
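If the same prefix is needed from a script rather than from CLAUDE.md, the rule can be sketched as a small helper. This is an illustration in Python, not part of the setup described here: the function name is hypothetical, and it assumes only the chruby-exec VERSION -- COMMAND form shown above and a .ruby-version file in the project directory.

```python
from pathlib import Path


def chruby_exec_argv(command, project_dir="."):
    """Prefix a command with chruby-exec and the version from .ruby-version.

    Hypothetical helper: it only builds the argument list, it runs nothing.
    """
    version = (Path(project_dir) / ".ruby-version").read_text().strip()
    if not version:
        raise ValueError(".ruby-version is empty")
    return ["chruby-exec", version, "--", *command]
```

A caller could hand the result to subprocess.run, for example subprocess.run(chruby_exec_argv(["bundle", "exec", "rake"])), so the ruby version always follows .ruby-version rather than a hard-coded value.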

Running chrome does not work in sandbox used on MacOS

I can’t speak for other OS’s, and I don’t totally understand what’s going on (MacOS “Seatbelt” I guess?), but rspec, as executed by Claude Code, was refusing to bring up headless chrome, which I use for system/feature specs via selenium/selenium-webdrivers.

Trying multiple things Claude suggested to specifically allow-list chrome(driver) through the sandbox, definitely none of them worked. Btw, did try switching to cuprite (with Claude Code’s help of course to do it fairly quickly) — despite some reddit suggestions, it seemed to still have the exact same sandbox issue, and at least in my project actually ran my test suite somewhat slower than chromedriver.

Eventually, with more confusing reddit discussion where nobody else had any idea what I was talking about or why I was having a problem, I decided that everyone else must just be exempting rspec itself from the sandbox. (Because surely having Claude Code run rspec is very normal, right? It’s just so useful!) (Thank you to redditors who tried to help!)

It’s just straight rspec, but rspec with many possible arguments, running just certain files/examples, possibly with profiling arguments for test-prof etc. I need it exempted from the sandbox so it can run chrome(driver), but I also need it not to be asking me “Is it okay to run this set of arguments with rspec” all the time?

Two different settings in Claude’s settings.json are involved, and both accept wildcards. For both, I want them to apply to rspec executions but not accidentally to extra stuff; I want to try to stay secure-ish here. The chruby rigmarole above makes that somewhat more confusing.

I forget if it was my idea or claude’s idea, but we wrote a wrapper script for chruby-exec-rspec, so we could more cleanly allow-list just that. Claude definitely wrote the implementation of the bash wrapper script. And when I realized that all the test-prof inline ENV vars for profiling (like FPROF=1) messed up my attempted left-anchored allow-listing, I asked claude to work that out by letting the wrapper rearrange an arg into an inline ENV prefix, something my bash skills were def not up to.
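The argument rearranging is easier to follow in a sketch. The actual wrapper is a bash script in the PR; what follows is a hypothetical Python rendering of the same idea, with an assumed function name and an assumed rule that any NAME=value argument with an upper-case name is an environment assignment.

```python
import re

# Assumed rule: NAME=value with an upper-case NAME is an ENV assignment.
ENV_ASSIGNMENT = re.compile(r"^[A-Z_][A-Z0-9_]*=")


def split_env_args(args):
    """Separate ENV-style assignments from the remaining rspec arguments."""
    env, rest = {}, []
    for arg in args:
        if ENV_ASSIGNMENT.match(arg):
            name, value = arg.split("=", 1)
            env[name] = value
        else:
            rest.append(arg)
    return env, rest


env, rest = split_env_args(["spec/models", "FPROF=1", "--tag", "slow"])
print(env, rest)  # {'FPROF': '1'} ['spec/models', '--tag', 'slow']
```

With the assignments pulled out of the argument list, the command line always starts with the wrapper's own name, which is what a left-anchored allow-list pattern needs, and the wrapper can set the variables itself before running rspec.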

The best way to see how I did all that is just the PR.

A word on Claude Desktop

I initially started work in Claude Desktop “code” tab, rather than the CLI. I think this actually made it more confusing to solve these problems above? I am not sure if sandboxing works differently in Claude Desktop vs claude CLI? I think the desktop may just be running the claude CLI in various directories?

Just starting out and not being sure how things were working… I found trying to ask Claude how to fix the problems I was having made things very confusing. Claude does not know whether it’s running as Claude Desktop or not, and was not really sure if the answer is different ha (Claude Desktop probably post-dates Claude Sonnet 4.6’s knowledge base?). Most blogs etc you find googling also pre-date Claude Desktop.

I switched to claude CLI and I can’t totally explain why but things seemed to get simpler. All the fixes I figured out worked when I switched back to Claude Desktop.

And contrary to what you might find googling, claude CLI and Claude Desktop do share sessions now, you can start a session in either place, then move to the other tool to continue it, in either direction. To start claude CLI and choose an existing session to resume, you need to launch as claude --resume.

The CLI is a very neat UI actually! It definitely still seems to be the most popular way to use Claude (whether direct or in a panel in an editor), Claude Desktop “code” tab is I guess fairly new and not as popular, although I like it too and still am mostly using it.

How it worked for the task?

Pretty amazingly actually. Even having read about what it could do, I was kind of amazed.

Once you get it able to run rspec (including with test-prof profiling), this prompt is pretty amazing and fun:

Please use various test-prof profiling commands to identify current best opportunities for speeding up test suite.

Come back ~20 minutes later (my full test suite took ~4 minutes to run at the beginning) and it had some stuff. It tended to just go ahead and make the changes not outline them to me first (I was not in “plan” mode, haven’t tried that much yet), but I’m in a git-controlled dir I can git diff to see what it did — and ask it about it.

By the time I thought to use this general one, i had already implemented some low-hanging optimizations, so that may be why, to be fair, this prompt alone didn’t find much actually significant at that point, honestly.

Here are some others that were pretty amazing:

I am looking to speed up the test suite in this Rails app using rspec and factorybot.

To begin with, let’s focus on the system specs in spec/system. I don’t think they have any obvious performance improvement opportunities. But I’m wondering if they are all necessary. Can you identify any that may be testing something that is not necessary to test, or could be tested by a different kind of spec that is faster?

for specs in spec/components, let’s try changing factory data from create to build_stubbed. Change it for setup where tests still pass. For tests that break when you do that, list them, and if it’s clear let me know why they failed. Analyze performance gains.

[didn’t actually get any gains there, but found that out quickly with very little manual effort, which is a win!]

in our rspec setup, switch from chromedriver to cuprite. make sure tests still pass, if not identify why not.

using AnyFixture, I’ve created a :standard_work fixture, that’s just a generic public tiff-based work.

Can you identify model or service specs it would work well for?

[Didn’t actually end up using AnyFixture yet, but Claude Code helped me make that decision much quicker than I could have without it, based on how much benefit we got vs complexity]

My Rails app uses rspec and Factorybot.

There is an :asset factory with an :inline_promoted_file trait. it turns out this is really bad for performance, and we in fact rarely need to actually create assets with inline promoted files. That should only be used in cases where we really need to test end-to-end derivative and characterization.

In most other cases, we can use a faster “faked file” approach instead. Instead of a trait, we’ve implemented this with a sub-factory, :asset_with_faked_file.

Can you find uses of the :asset factory with :inline_promoted_file, and, if there’s no reason they need to test end-to-end derivative creation, change them to use :asset_with_faked_file sub-factory instead?

[It was able to identify the ones that would work pretty well, was the amazing part — and explain to me exactly why the other ones wouldn’t]

Tell me if I’m doing something weird?

Some of the stuff with chruby/rspec, I am still surprised I had so much trouble getting started, and am wondering if I’m doing something weird/wrong!

But I think probably it’s just that I have been writing code so long, that dealing with these tools that work very differently requires my brain to get out of its rut… also that I’m kind of a perfectionist and want to understand what’s going on and be comfortable with it and that it’s the best way, when increasingly others are just vibing? I don’t know!

But feedback welcome!

Bookmarks - ai, war, ukraine, drone / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 Long Wave radio era set to end with switch-off

A campaign has begun to get two large transmitter masts listed, after the BBC’s Long Wave (LW) service is turned off.

The 700ft (213m) high Wychbold Masts in the Worcestershire countryside can be seen for miles and are often used as a landmark for drivers on the M5 near Droitwich.

They have been in use since 1934 for sending the signal across the country, as well as for transmitting important messages during the World War Two.

Local history experts and the Twentieth Century Society have called for them to become listed, due to their “historical importance”.

Droitwich was picked as a central location for the station and masts so Long Wave could reach everywhere in the UK.

🔖 Definition of Done: The Complete Guide with Examples & Checklist

The Definition of Done isn’t just another Scrum formality - it’s the quality gatekeeper that prevents technical debt accumulation and ensures every Sprint delivers potentially shippable Increments. This critical commitment allows teams to:

  • Eliminate quality ambiguity through explicit, measurable standards everyone understands
  • Prevent scope creep by clearly defining when work is complete vs. when it’s still in progress
  • Enable predictable releases because “done” means genuinely releasable, not “mostly done”
  • Support distributed teams with automated verification reducing synchronous communication needs
  • Scale consistently across multiple teams working on the same product with shared standards

🔖 tags.pub

A global hashtag server for the ActivityPub network.

tags.pub is a server for the ActivityPub network. It provides one account like foo@tags.pub for every hashtag like #foo. When public content is posted on the ActivityPub network with that hashtag, the foo@tags.pub account shares the content to its followers.

More information is available at https://tags.pub/.

🔖 The Last Quiet Thing

For most of human history, you bought a thing, and it was yours, and it was finished.

That word is nearly extinct.

Nothing you own is finished. Everything exists in a state of permanent incompletion, permanently needing. Your phone needs updates, needs charging, needs storage cleared, needs passwords rotated.

🔖 New AI Tools for Stanford Arrive June 30

OpenAI ChatGPT Edu, Google Gemini Enterprise, and Anthropic Claude for Education will be available to Stanford faculty, students, postdocs, and staff on June 30. Details on how to get access to these tools from University IT (UIT) are below. These offerings are part of a campus pilot through August 2027.

This pilot was initiated in response to strong demand across campus for access to these tools, which many research groups and individuals have been purchasing on their own. Stanford’s licenses will enable better data protection as well as more favorable pricing. At the end of the pilot, utilization of the tools will be evaluated prior to continuation.

Use of these new capabilities is meant to support our teaching and research mission. As these tools create exciting new opportunities, it is important to conform AI usage to Stanford’s data protection, privacy, and academic integrity policies.

Please remember sensitive data (e.g., student records, protected health information, financial data, etc.) need to conform to Responsible Agentic AI and Responsible AI guidance. Regardless of your role on campus, you retain full responsibility for verifying AI outputs. This approach reinforces our collective responsibility to protect Stanford’s data and to take an ethical and responsible approach to using these powerful tools.

🔖 2nd International Workshop on Low Carbon Computing (loco2026)

The carbon footprint of ICT is rising despite the urgent need to decarbonise society and to stay within planetary boundaries. The operational and embodied carbon emissions from ICT are already estimated to contribute between 2 to 3 percent of the global emissions and new technologies such as AI is driving overall growth in data centre demand, which globally rivals that of entire nations. This growth in emissions from computing is unsustainable and alternative low emissions pathways for computing are urgently needed.

The LOCO workshop provides a forum for radical ideas, early work, and critical perspectives that aims to reduce the emissions from computing.

🔖 Flower Framework: What is Federated Learning?

Federated Learning simply reverses this approach. It enables machine learning on distributed data by moving the training to the data, instead of moving the data to the training. Here’s a one-liner explanation:

Centralized machine learning: move the data to the computation

Federated (machine) Learning: move the computation to the data

By doing so, Federated Learning enables us to use machine learning (and other data science approaches) in areas where it wasn’t possible before. We can now train excellent medical AI models by enabling different hospitals to work together. We can solve financial fraud by training AI models on the data of different financial institutions. We can build novel privacy-enhancing applications (such as secure messaging) that have better built-in AI than their non-privacy-enhancing alternatives. And those are just a few of the examples that come to mind. As we deploy Federated Learning, we discover more and more areas that can suddenly be reinvented because they now have access to vast amounts of previously inaccessible data.
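Moving the computation to the data can be made concrete with federated averaging, one common way to combine locally trained models. The sketch below is a toy in plain Python, not the Flower API: the function names, the one-parameter model y = w * x, and the two made-up client datasets are all assumptions for illustration.

```python
def local_update(w, data, lr=0.05, steps=2):
    """Run a few gradient steps for y = w * x on one client's private data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(clients, rounds=20, w=0.0):
    """Federated averaging: only weights leave each client, never the data."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in clients]
        # Weight each client's result by how many examples it trained on.
        w = sum(w_k * n_k for w_k, n_k in updates) / total
    return w


# Made-up toy data: two "hospitals", both following y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
]
print(federated_average(clients))  # very close to 3.0
```

The server in this sketch never sees an (x, y) pair; it only receives each client's updated weight and example count, which is the sense in which the training moves to the data.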

🔖 Overlooked No More: Robbie Basho, Guitar Mystic Who Sought Enlightenment in Sound

For the visionary steel-string guitarist, pianist, composer and singer Robbie Basho, making music was more than a vocation; it was a way to assuage a lifetime of psychological and physical distress — pain that only ended with his death, at age 45.

Beginning in the early 1960s, Basho expanded the steel-string guitar’s vocabulary using alternative tunings and experimental forms to create trance-like compositions.

He forged a distinctive style that drew an array of traditional world music from India, Japan, France, Germany, Persia, China and Native America. An early example is which incorporates the flavor of North Indian classical raga, using an open harmonic structure, droning strings and improvisation to enter into a deeply personal state — an immovable, hypnotic track that bends time.

🔖 Old fishing nets from France become vital protection against Russian drones in Ukraine

In the fishing ports along France’s Brittany coast, the discarded fishing nets pile up along the coastal quaysides.

The lifespan of a deep-sea net is between 12 and 24 months, after which they become worn and beyond repair. Until now, the estimated 800 tonnes of nets scrapped every year have been a problem.

Now, the horsehair netting, once used to trawl monkfish from the sea bed, is being used for another catch: Russian drones.

The Breton charity Kernic Solidarités has sent two consignments of nets measuring a total of 280km to Ukraine to be used to protect soldiers and civilians along the frontline where fighting is fiercest.

🔖 Anti-Drone Nets

Recently it’s been widely reported that nets are being used as a low-tech but highly effective defence against drones. But there are many kinds of nets, and they are used differently.

🔖 dead-web-index-data

A reachability census of the most popular domains on the web. Every domain in the DomCop top-10M popularity list (this release: the full top 10 million) is probed and labelled alive / redirect / blocked / dead — once by an honest polite bot and once by a browser-like reachability client.

AI's PR Problem (Updated) / David Rosenthal

J.P. Morgan hits photographer with cane
This is just a brief post to explain to my old boss, Eric Schmidt, why he and his ilk are getting booed at college commencements, and why laws against data centers are getting passed. The explanation is below the fold.

Let us start from an under-appreciated fact. Paul Campos reports that:
The college wage premium, that is, the increased earnings associated with having a college degree as opposed to only being a high school graduate, hasn’t changed at all in the past 25 years, because median real wages have been flat as a pancake for everybody, no matter what their formal education level, for the past quarter century.
But:
I wonder what’s happened to capital over this time? Value of S & P 500, inflation-adjusted, 1/2000 to 9/2025 (same period as the wage data):

2000: $1,394

2025: $6,688
On average, for more than the students' entire lives, stock-owners like Schmidt and (to a much lesser extent) I have stolen every last drop of the productivity increase of US workers at every age and education level. (See the actual numbers in the appendix)

Now, the perpetrators of this theft are telling their victims, the students and the public at large, that whether they like it or not they will be subjected to AI because that will make the perpetrators even richer. The victims have been informed that this new technology will:
Nothing better illustrates the contempt of the Epstein class for the proletariat than that these oligarchs would expect the graduating class to enthusiastically accept this prospect.

Appendix

Here are the actual numbers from Paul Campos' 25 years of flat wages and no increase in the college wage premium, while value of capital has skyrocketed:
I was fooling around with FRED this morning, as one does, and here are some stats: (The FRED numbers are presented in nominal dollars; I’ve converted them to CPI-adjusted dollars).

Median usual weekly earnings of workers with a high school degree only:

2000: $968

2025: $980

Median usual weekly earnings of workers with a bachelor degree only:

2000: $1,587

2025: $1,580
...
Median usual weekly earnings of people with a bachelor’s degree or higher:

2000: $1,705

2025: $1,747
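To make the comparison explicit, the percentage changes implied by the figures quoted in this post can be worked out directly. The helper name is hypothetical; the dollar amounts are the inflation-adjusted numbers from Campos quoted above.

```python
def pct_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


# Inflation-adjusted figures quoted from Campos, 2000 vs. 2025.
series = {
    "S&P 500": (1394, 6688),
    "High school only (weekly)": (968, 980),
    "Bachelor's only (weekly)": (1587, 1580),
    "Bachelor's or higher (weekly)": (1705, 1747),
}

for name, (start, end) in series.items():
    print(f"{name}: {pct_change(start, end):+.1f}%")
```

That works out to roughly +380% for the S&P 500, against changes of between about -0.4% and +2.5% in median weekly earnings.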
Here is a short list of YouTube videos on this topic: As a boomer, I think this post might be the exception that proves Ms. Baba's rule.

Note that every single one of the ads that I saw watching these videos in an incognito window was advertising an AI company! As are 49% of all the billboards in the Bay Area. Read the room, guys!

Update 25th June 2026

In Why Does Everyone Hate AI?, Paul Krugman reinforces my point with actual data. He starts where I did:
Eric Schmidt, the ex-CEO of Google, recently gave a commencement speech in which he heralded the coming of AI — and was loudly booed by the students. This was not an outlier. There have been a number of similar incidents lately, evidence that many people now really hate AI.

Are we talking about a vocal but unrepresentative minority? No. A recent Pew survey found that American adults believe by a wide margin that AI will be negative for society and, by a smaller margin, that it will be bad for them personally:
Krugman goes on to pose a number of reasons for this PR fiasco. First because:
we fear that AI will do terrible things because the companies selling it told us it would do terrible things. Last year, for example, Anthropic CEO Dario Amodei declared in an interview with Axios that AI could wipe out half of entry-level white-collar jobs and drive overall unemployment as high as 20 percent within 1 to 5 years.
He points out that these negative views were not present at the advent of the Internet nor at the rise of social media.

Second:
many ordinary people view AI negatively because they feel that it is being forced on them.

It’s true that many people are voluntarily using large language models for personal convenience or as a business productivity tool. But a significant part of AI use isn’t voluntary. This Wall Street Journal headline from February says it all:

Why are companies doing this? Presumably they believe that AI will raise productivity. But just as importantly, they’re responding to pressure from financial markets, which are rewarding companies for quickly adopting AI, apparently without regard to demonstrated results.

And while American workers are being dragooned into using AI, American consumers are being force-fed AI whether they want it or not. Most dramatically, Google has replaced its search engine with AI, without offering the option to opt out. One has to turn to obscure workarounds or third-party sites to get traditional search results.

So many people feel, rightly, that they aren’t being allowed to choose whether to use AI — not using AI has become hard both as a worker and as a consumer.
Third because:
datacenters are a highly visible reminder of AI’s costs. Datacenters occupy huge tracts of land — one proposed site in Utah will be twice the size of Manhattan. They guzzle electricity and water. When they generate some of their own power, they create major local pollution. Not surprisingly, there is intense opposition to datacenter construction. According to a Reuters Ipsos poll, 57 percent of Americans — two-thirds of Democrats and half of Republicans — would oppose a datacenter in their neighborhood. Only 14 percent would support one.
The Utah data center proposal is definitely toxic. Finya Swai reports that Data center controversy unseats powerful Utah lawmaker:
A massive data center project in Box Elder County, Utah, helped bring down the state’s Senate president, who lost his GOP primary on Tuesday after his support for the controversial development fueled voter backlash.

Stuart Adams, one of Utah’s most powerful politicians and the longest-serving president of the state Senate in its history, lost to challenger Stephanie Hollist, a former university lawyer and vocal opponent of the data center.

Hollist accused Adams, as well as the state’s broader political establishment, of ignoring public concerns about a Stratos data center project that critics feared could cause serious environmental harms.
...
Box Elder County Commissioners Boyd Bingham and Lee Perry, who voted in favor of allowing the plans to continue, also lost their primary elections.
Source
Fourth because:
even before the advent of AI, tech companies had lost the public’s trust. Over the years Pew has regularly surveyed the public for its views on technology companies, asking whether they have a positive or a negative effect “on the way things are going.” In 2015 public opinion of tech companies was overwhelmingly positive. By 2022, the year ChatGPT was released, that goodwill had evaporated.

Why have Americans turned on tech companies? While it surely reflects growing awareness of the psychological and societal harm done by social media, much of it also reflects the enshittification of tech products.
Source
Fifth because:
AI is tightly linked in the public mind with the tech oligarchs who are pushing it. There is widespread awareness of the growing concentration of wealth and power at the top and how this is distorting our politics and harming our society. Aside from the MAGA faithful, Americans overwhelmingly favor government policies to reduce wealth inequality:

And AI is widely perceived, for good reason, as a technology that will increase the concentration of wealth at the top. Indeed, as I said, the AI companies themselves have already told us that the technology will have extremely negative effects on workers.
Krugman concludes:
There’s a strong element of poetic justice in this turn of events. The AI industry deliberately made itself look menacing as a financial strategy, believing that the markets would reward the appearance of being “edgy.” In so doing, however, tech made itself highly unpopular. And even in an era in which money all too often buys power, public opinion matters.

2026-06-25: 2026 WS-DL Research Expo / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

On June 23, 2026, we held our sixth annual WS-DL Research Expo. We continued the same format as the prior years (2025, 2024, 2023, 2022 & 2021), with one student from each WS-DL professor giving a short overview of their research. Links to all the materials (slides, papers, software, data) are gathered in the GitHub repo, but repeated here are the links for the students and their presentations:

We were fortunate enough to welcome back several of our alumni, including: Sawood Alam (PhD, 2020), Justin Brunelle (PhD, 2016), Yasith Jayawardana (PhD, 2024), Mat Kelly (PhD, 2019), Bhanuka Mahanama (PhD, 2025), and Alexander Nwala (PhD, 2020). We really appreciate the ongoing relationship we have with our alumni -- WSDL is for life!

If you were unable to attend, we recorded the students' presentations and have embedded the video below.

--Michael 

Local News as Public Data, Then and Now / Harvard Library Innovation Lab

The following remarks were delivered at the National Summit on Local News Preservation on June 17, 2026.


News is, with few exceptions, place-based. “Where” is one of the journalist’s first questions, and without it, news feels groundless, baseless, unmoored. But news used to not only be written from a specific place, but also written for the people living in that specific place. In that sense, all news used to be local. But whether the news reported on immediate surroundings, the colony or state, the nation or empire, the function of newspapers was to provide a public record, both for audiences at the time, and for future readers. In fact, many editors were conscious of this function of the newspaper as a repository; some two hundred years ago, they provided, in the words of Hezekiah Niles in his prospectus for Baltimore’s Weekly Register, “something interesting at the present moment, and as a book of reference, a fund of reading always at hand, a work of much probable value” (September 7, 1811).* Newspapers were, from their earliest days, understood as a public good, as “work[s] of much probable value.”

Information has been mobile from its early days — from the troubadour to the telegraph, one might say — but because “news” is the sum of information plus time, or timeliness to be more exact, the accelerated speed of transmission is vital to the rise of news for national and international audiences. Most scholars agree that syndicated news really took hold after the Civil War with Chicago’s A.N. Kellogg Newspaper Company. As with our own moment’s undervaluing of local news, the transition away from “local” newspaper-reading audiences did not happen overnight and cannot be attributed to a single factor. Infrastructure — in the nineteenth century, the railroads and stereotype printing; today, the internet and social media — combined with sociocultural shifts, makes the world feel smaller.

We are gathered here today to celebrate and to concern ourselves with news that does not move, that stays more or less in the place from which it came. It is, as Lincoln would say, “altogether fitting and proper that we should do this,” for reasons beyond the present moment. As an historian, I have been tasked with adjusting our gaze ever so slightly from the “now” to the “back then.” I’d like to draw out a few examples of the importance of local news in historical research in the hopes of showing, rather than telling, not only that historical local news matters, but also that we must retain its understanding of itself as a public good. Without this printed record, it is easy to forget that it is not just people who have history, but places too. Without the context of place, history too feels groundless, baseless, and unmoored. I fear that the history we are creating today might not even exist in a century from now, but I will say more about that later.

When we think of historical newspapers, we often think of the people whose lives they capture, and perhaps even the lives of the people who produced them. We might even think of the stories they omit, of the people not represented in these wilted and worn pages. More recently, environmental history has helped us to see historical newspapers as the place to uncover the histories of the land, as the sources that will shed light on the social and cultural causes of global warming and environmental degradation. The research and storytelling in the work of scholars and journalists alike are changing how we think of newspapers and the vital role they play in understanding the histories, and in turn, the futures of our environment. Corporate archives often do more to conceal than to reveal, if one can gain access to them at all. Government records can be little better. But, local newspapers contain the stories written by the local intrepid reporter who cites evidence of a paper mill’s destruction of the river running through the town. Similarly, where official records might have denied harmful contamination from Superfund sites, a historian can scan obituaries for evidence of untimely deaths from cancer clusters. We need local newspapers to read against official narratives told of the land, as well as of the people who inhabit it.

Inaugural issue of the Navajo Times, November 1, 1959. Source: Library of Congress.

Think, for example, of the hyperlocal newspapers published on reservations, such as the first newspaper published in the Navajo language in Window Rock, Arizona. The Navajo Times’s purpose, as stated in its inaugural issue in 1959, was “to serve the 6,000 Navajo children who are attending off-reservation schools. It is hoped that this newspaper will keep them informed about what is happening on their reservation. It is also hoped that this is a step toward supplying the Navajo people with an ever-increasing flow of information.” This paper then was to keep the local — the power of place — in the hearts and minds of its intended audience, no matter where they went, or were forced to go. This statement of purpose from the Navajo Times is a gentle reminder that people are, in part, defined by place, and the stories they told of “the local” have much to teach us today.

Aggregation of Historical Local News, a National Prerogative

As a researcher, I have been privileged to be a consumer of local news, and as a former senior program officer at the National Endowment for the Humanities (NEH), I was also a producer, managing the National Digital Newspaper Program (NDNP), the NEH program that funds and co-creates Chronicling America. Chronicling America does and does not serve the preservation of local news. As of my most recent check, it includes 4,684 newspaper titles and over 3 million issues, dating from 1736 to 1963. These titles certainly include a handful of “local newspapers,” no matter how that category is defined. And yet, this was not the intention of NDNP. In fact, for about the first 15 years of its existence, NDNP inadvertently discouraged the preservation of “local” newspapers by encouraging applicants to begin with papers of record, with those that had long runs, and most likely, were published in big cities intended for large audiences. Because so many of the states have by now contributed these “major” papers, the program shifted, in 2021, to newspapers that tell underrepresented histories. Until recently, applicants were welcome to define “underrepresented” in any way they chose, and they often chose place-based representation (see the 2024 Notice of Funding Opportunity (NOFO); “underrepresented” has been expunged from the 2025 NOFO). More and more newspapers from neglected areas were being included in Chronicling America. Without empirical evidence to back me up, I would venture that there has been a rise in “local” if not “smaller” newspapers in Chronicling America in recent years.

Panel from Malcolm W. Ater, Seeing Washington (1957), depicting two young people viewing their hometown newspaper at the Library of Congress. Source: University of Nebraska Libraries.

And yet, the resource will always fall short, for it cannot provide all that we are looking for. Those who designed it, back in the early 2000s, knew this. They knew that the 1963 cutoff date for inclusion would exclude many important papers, and they knew that many state partners were able to digitize far more newspapers than could be included in the national aggregator. And so, the genius of NDNP is not only what you find in Chronicling America, but also in the way that it established standards for newspaper digitization. Its hope was that Chronicling America would be just one of the manifestations of the work it enabled; states would also become aggregators of their newspapers, using the same standards. And they have done so, creating amazing state-level digital newspaper repositories, such as Georgia’s Historical Newspapers or the Texas Digital Newspaper Program, just to name a few. Such state-level efforts were encouraged to reuse content digitized for Chronicling America, as well as to include that which did not make it to the national aggregation level. CONSER cataloging and the technical guidelines for digitization demonstrate that standardization must be part of the work of preservation; otherwise, the “local” risks being relegated to the dustbin of history. If we believe that “local” does not mean “less than,” then we must use the same standards for categorizing and making accessible local newspapers that we do for the so-called “papers of record.”

Local News as Public Data

The term “dark ages” has grown out of fashion for historians because it suggests that no light existed in the period from 500 to 1000 CE. Painstaking scholarship has slowly uncovered that this is not the case, that in fact people were innovating, creating, and exploring in ways not all that different from classical antiquity before it and the Renaissance afterwards. And, yet, the label “dark ages” resonates today, not because our current moment is failing to produce meaningful and innovative work, but because of the great difficulty the future will face in tracing the lives and outputs of the people of our moment. As Jonathan Zittrain has pointed out, the internet, which for better or worse houses much of our current culture’s memory and creativity, is “rotting.” Technologies that once signaled a great unfurling of access to information are now showing cracks and vulnerabilities when considered at a historical scale. The historical record of the current moment will be, in many ways, “dark.” We are in what might be referred to as the “digital dark ages,” not because important things are not happening, but rather because the future’s light on this moment is diminished, if not snuffed out completely.

We have been asked this morning to address what “getting local news preservation right” would require, and my response is that we must provide multiple ways to shine light on our current moment. Data is the new oil, or so we’ve been told for the last two decades, and I roll my eyes at this metaphor not because it is not true, but because data is so much more than a market commodity. Local news, in its many forms and instantiations, is public data, and we must preserve it because it has an inherent value that surpasses our current moment, that is so much more than its commodification. Because we cannot see the future, we cannot know all of these values, but based on our reliance on historical “local” newspapers to know the past, we can trust that they exist.

For the most part, libraries exist outside of the naked self-interest of capitalism, and the people who work in them must play a role in the preservation of local news. Librarians are the original public interest technologists, we might say, and I urge us to put them at the forefront of our conversations here. Journalists too exist in a space not completely captured by market forces, and they too want information to be free and to be accessible. I see an alchemy emerging from the alliance between these two professions that offers future generations not only a historical record of their communities from which they can analyze and learn, but also a model for forms of affinity and alignment that exceed capitalist logics and exemplify other modes of cooperative work. This gathering is an important step in this effort, and I am honored to be a part of it.


* Thank you to my friend and collaborator Will Slauter for providing this example and for his assistance with these remarks throughout.

"No way to prevent this" say users of only language where this regularly happens / Xe Iaso

In the hours following the release of CVE-2026-8461 for the project FFmpeg, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix an out-of-bounds write in the MagicYUV decoder (libavcodec/magicyuv.c) caused by improper bounds checking, resulting in heap corruption, denial of service, and potential remote code execution when processing a specially crafted video file. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mrs. Kitty Smitham, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

Come Join the 2026 Pride Treasure Hunt! / LibraryThing (Thingology)

It’s June, and that means that our annual Pride Month Treasure Hunt is back!

We’ve scattered a shower of rainbows around the site, and it’s up to you to try and find them all.

  • Decipher the clues and visit the corresponding LibraryThing pages to find a rainbow. Each clue points to a specific page right here on LibraryThing. Remember, they are not necessarily work pages!
  • If there’s a rainbow on a page, you’ll see a banner at the top of the page.
  • You have just under one week to find all the rainbows (until 11:59pm EDT, Tuesday June 30th).
  • Come brag about your shower of rainbows (and get hints) on Talk.

Win prizes:

  • Any member who finds at least two rainbows will be awarded a rainbow badge.
  • Members who find all 11 rainbows will be entered into a drawing for one of five sets of LibraryThing (or TinyCat) swag. We’ll announce winners at the end of the hunt.

P.S. Thanks to conceptDawg for the kookaburra illustration. ConceptDawg has made all of our treasure hunt graphics in the last couple of years. We like them, and hope you do, too!

Fine Flowers / Ed Summers

Epithalamion for Tyler by James Tate

I thought I knew something
about loneliness but
you go to the stockyards

buy a pig’s ear and sew
it on your couch. That, you
said, is my best friend – we

have spirited talks. Even
then I thought: a man of
such exquisite emptiness

(and you cultivated it so)
is ground for fine flowers.

"No way to prevent this" say users of only language where this regularly happens / Xe Iaso

In the hours following the release of CVE-2026-55200 for the project libssh2, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix an out-of-bounds write in ssh2_transport_read() due to a missing upper bound check on the packet_length field, resulting in heap corruption and potential remote code execution. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Mr. Alex Doyle, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

AI's Affordability Crisis / David Rosenthal

A year ago in The Back Of The AI Envelope I pointed out that the AI platforms were running the drug-dealer's algorithm, "the first one's free". By massively subsidizing the use of their products, they were generating overwhelming demand for them. They used this demand to justify massive investments, in the hope that, by the time they had to show a return on these investments, the users would be so addicted that they would pay the vastly higher prices needed to generate a return.

David Cahn, Sept '23
I have to confess that I was late to the party. The earliest skepticism I've been able to find was from Sequoia Capital's David Cahn in September 2023, entitled AI’s $200B Question. Only nine months later Cahn re-ran the same analysis in AI’s $600B Question. His estimate of the revenue gap had tripled. Cahn wasn't alone. Independent journalists such as Ed Zitron were flagging this problem long before I was.

I started to write this post a couple of months ago when the mainstream business press began to notice companies complaining about the cost of the tokens their employees were burning. Since then the trickle has turned into a flood, which made finishing the post hard. Below the fold I throw up my hands and dump out a small sample from the flood.

One difficulty has been that estimates of the size of the subsidy have varied widely, typically in the range of costing the platforms $8 to $14 to generate $1 in revenue. Two recent posts from Ed Zitron have illuminated this issue.

Source
First, in AI's Brokenomics Zitron reported that:
SemiAnalysis, an extremely pro-AI semiconductor analyst, ran a test made up of random long-horizon coding tasks until they maxed out the limit on OpenAI and Anthropic’s various subscription levels.

Their findings were shocking.

For $200 A Month, You Can Burn $8000 in Anthropic Tokens or $14,000 In OpenAI Tokens

That’s right. Anyone with a $200-a-month Anthropic subscription can burn $8000 in tokens, and with a $200-a-month ChatGPT subscription, you can burn $14,000 in tokens.
Source
Zitron's numbers don't tell us the real cost of generating tokens but, subject to the assumption that the platforms are not subsidizing the token price, that means Anthropic is subsidizing their enterprise customers by up to 40 times, and OpenAI up to 70 times. No wonder they are seeing massive demand! But, despite OpenAI's subsidy being 175% of Anthropic's, OpenAI's adoption by businesses has recently been flat while Anthropic's has soared.
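
The multiples above follow directly from the quoted SemiAnalysis figures. A minimal sketch of the arithmetic, assuming (as the text does) that list-price token value is a fair proxy for what the subscriber receives; the names `SUBSCRIPTION_PRICE`, `TOKEN_BURN` and `subsidy_multiple` are illustrative only:

```python
# Figures quoted from the SemiAnalysis test as reported by Zitron.
SUBSCRIPTION_PRICE = 200  # dollars per month
TOKEN_BURN = {"Anthropic": 8_000, "OpenAI": 14_000}  # dollars of tokens at list price


def subsidy_multiple(token_value: float, price: float = SUBSCRIPTION_PRICE) -> float:
    """Dollars of list-price tokens obtainable per subscription dollar."""
    return token_value / price


multiples = {name: subsidy_multiple(value) for name, value in TOKEN_BURN.items()}

# How large OpenAI's multiple is relative to Anthropic's.
relative = multiples["OpenAI"] / multiples["Anthropic"]

for name, multiple in multiples.items():
    print(f"{name}: up to {multiple:.0f}x")
print(f"OpenAI relative to Anthropic: {relative:.0%}")
```

This reproduces the 40x, 70x and 175% figures; it says nothing about the platforms' actual cost of generating the tokens.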

Source
SemiAnalysis also analyzed the platforms' gross margins, implausibly assuming that tokens were priced at 4 times the cost of generating them and:
With the current subsidies, all it takes for a user to have a gross margin of at best negative 25% is for them to use as little as 25% of their rate limit.
Naturally, subsidizing your sales like this means you are feeding cash into the furnace. We have seen OpenAI and Anthropic raising vast sums in equity, but because they both have been private companies we haven't seen the details of their spending or revenue. On June 15th this changed when Zitron saw OpenAI's 2025 financials and posted OpenAI Losses Increased Nearly 8X in 2025, With Spending Hitting $34 Billion, revealing that:
OpenAI Had $13.07 Billion In Revenue, $34 Billion In Costs and Expenses, and $20.92 Billion In Losses, with a net loss attributable to the company of $38.53 Billion
The numbers are somewhat complicated because:
2025 was the year that OpenAI converted from a non-profit to a for-profit entity, leading to a $41.55 billion loss due to changes in fair value of convertible interests and warrant liability.
...
Ultimately, the net loss attributable to OpenAI in 2025 was $38.5 billion.

At the end of the year, OpenAI had just over $50 billion in assets, with almost half of that in cash.
Perhaps the most striking of their truly awful numbers were:
  • Revenue: $13.07 billion
  • ...
  • Sales and Marketing: $5.73 billion
That is, OpenAI spent 44% of their revenue on sales and marketing! The hype needed to keep the AI bubble inflated is incredibly expensive. Despite this lavish spending, business adoption has been flat.
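
As a quick check on the 44% figure, here is a small sketch using only the numbers quoted above; the variable names are illustrative, and the second ratio (total costs per dollar of revenue) is derived from the headline figures rather than stated in the source:

```python
# 2025 figures as quoted from Zitron's post, in billions of dollars.
REVENUE_BN = 13.07
SALES_MARKETING_BN = 5.73
COSTS_AND_EXPENSES_BN = 34.0

# Share of revenue spent on sales and marketing.
sm_share = SALES_MARKETING_BN / REVENUE_BN

# Total costs and expenses incurred per dollar of revenue.
spend_per_revenue_dollar = COSTS_AND_EXPENSES_BN / REVENUE_BN

print(f"Sales and marketing share of revenue: {sm_share:.0%}")
print(f"Costs per $1 of revenue: ${spend_per_revenue_dollar:.2f}")
```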

US equity markets are facing three IPOs of AI companies, SpaceX, Anthropic and OpenAI, each led by a world-class bullshitter, each losing tens of billions of dollars a quarter, and all but SpaceX touting overwhelming demand for their products[1]. But, after they go public, they will need to charge enough to generate a return on their enormous capital investments. Ideally, they would have postponed the necessary swingeing price increases until the IPO money is in the bank.

Alas, their burn rate is so high that they have been forced to make some premature moves toward price sanity. Back in April Ed Zitron reported that Microsoft To Shift GitHub Copilot Users To Token-Based Billing, Tighten Rate Limits:
Leaked internal documents viewed by Where’s Your Ed At reveal that Microsoft intends to pause new signups for the student and paid individual tiers of AI coding product GitHub Copilot, tighter rate limits, and eventually move users to “token-based billing,” charging them based on what the actual cost of their token burn really is.

The document says that although token-based billing has been a top priority for Microsoft, it became more urgent in recent months, with the week-over-week cost of running GitHub Copilot nearly doubling since January.

The move to token-based billing will see GitHub users charged based on their usage of the platform, and how many tokens their prompts consume — and thus, how much compute they use.
Anthropic, OpenAI and Microsoft have all now transitioned customers from subscriptions to token-based pricing. For serious users, this is eye-wateringly expensive. Jamie John, Rafe Rosner-Uddin and Ryan McMorrow's ‘We created a monster’: companies rein in AI usage as costs strain budgets quotes a small company's CEO:
But the company got a shock when Anthropic switched it over to token-based pricing in May. “Our spend went up 7x the first day and I’m like, oh shit, we created a monster,” said Busse. “[Large language model] companies have been subsidising all of our usage and now no longer. User-based pricing shelters you.”
Thus in recent weeks the idea that Generative AI (LLMs for short) is too expensive has been all over mainstream business media. Examples include Bloomberg's video Major Companies Reconsider AI Costs, Scott Galloway's video AI May Not Be Worth The Cost — Here’s Why, Derek Thompson's The AI Boom Has Entered Its 'Wait, Is This Worth It?' Era, and Jowi Morales' AI cost crisis hits tech giants as employee 'tokenmaxxing' backfires, sparking corporate pullback at Microsoft, Meta, and Amazon — agentic AI eats up to 1000x more tokens than standard AI, who notes that:
it’s now apparent that using AI is more expensive than hiring people, especially since it offers only limited productivity gains at the moment.
Lest you think it is only the AI haters complaining about the cost, check out Bruno Ferreira's Nvidia exec says AI is more expensive than actual workers — yet some companies don't see the extra costs as a negative:
Bryan Catanzaro, Nvidia's VP of applied deep learning, recently told Axios that "For my team, the cost of compute is far beyond the costs of the employees", quite an interesting statement from the company selling the shovels for the gold rush.

That perspective is shared by Uber's CTO Praveen Naga, who "[went] back to the drawing board because the budget [he] thought [he] would need is blown away already" as of two weeks ago. Likewise, Swan AI's Amos Bar-Joseph posted a while back on LinkedIn about how proud he was about a $113k bill from Anthropic (makers of Claude) for a four-person team.

Oversimplified math pins that amount at $28k per person per month, which is likely more than each person's monthly wages. Jokes abound right now that "companies have discovered jobs again," and the humor is backed up by a 2024 MIT study stating that 77% of the time, it was preferable to have humans do the work.
Source
The reason for the premature and impending price rises is that justifying the massive investment in building data centers, about 60% of which goes into rapidly depreciating hardware, requires implausibly astronomical revenues. Thierry Borgeat notes that:
even under "best case" assumptions — assuming zero costs, just revenue against capex — the Financial Times calculated the implied return on hyperscaler AI investment from 2025 to 2030.

Only one of them clears positive.

Implied return on AI investment (FT / Panmure Liberum)
– Microsoft: -9.2%
– Alphabet: -15.7%
– Amazon: +7.2%
– Meta: -28.8%
– Oracle: -35.6%

And remember: that's assuming zero costs. In reality, GPUs depreciate, power bills run, salaries get paid.
In The AI Industry Is Panicking, Will Lockett estimates that over the next few years the AI platforms will accumulate around $3T in debt. Assuming this is at 3% over 10 years, servicing the debt will take $309B/year:
This means that for the AI industry to service its debt, it needs to generate hundreds of billions of dollars in profit each year.

Even giant monopolies like Google don’t make enough profit to service that much debt. AI can’t just be a novelty industry; it needs to replace human labour on a colossal scale to service this debt. Let’s optimistically assume AI one day reaches a 10% profitability margin, a cost parity with human labour, and the ability to complete most jobs (none of which are currently the case). Well, the average US salary is roughly $66,000, so at a 10% profit, the AI company will make on average $6,600 per year per job it replaces. To generate the $309 billion needed to service their debt, the AI industry will need to replace 46.8 million jobs, equivalent to around 27% of the current number of jobs in the US.

While this is all very rough maths, it highlights the implicit bet created by the debt the AI industry has racked up. To simply not default on this debt, the AI industry has to rapidly displace human labour at a staggering scale, even if we are extremely optimistic about AI’s economics.
One caveat with Lockett's math is that the cost of employing a human is greater than just the salary. It includes the employer's Social Security tax, health insurance, office space and so on. Chatbots don't need any of these. According to the Bureau of Labor Statistics:
Wages and salaries averaged $32.60 per hour worked and accounted for 69.9 percent of employer costs, while benefit costs averaged $14.01 per hour worked and accounted for the remaining 30.1 percent.
So the average profit per job would be around $9.5K, and the number of jobs displaced would be around 32.5M.
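
The back-of-the-envelope calculation can be sketched as follows, using Lockett's inputs and the BLS wage share quoted above. This is a rough illustration, not Lockett's own code; the function and constant names are hypothetical, and it assumes the 69.9% wage share applies to the $66,000 average salary:

```python
# Inputs taken from the quoted passages.
DEBT_SERVICE_PER_YEAR = 309e9  # dollars per year
AVERAGE_SALARY = 66_000        # dollars per year
PROFIT_MARGIN = 0.10           # Lockett's optimistic assumption
WAGE_SHARE_OF_COST = 0.699     # BLS: wages as a share of employer costs


def jobs_to_replace(cost_per_job: float) -> float:
    """Jobs that must be replaced for the profit to cover the debt service."""
    profit_per_job = cost_per_job * PROFIT_MARGIN
    return DEBT_SERVICE_PER_YEAR / profit_per_job


# Lockett's version: cost parity with salary alone.
salary_only_jobs = jobs_to_replace(AVERAGE_SALARY)

# Adjusted version: cost parity with total employer cost (salary plus benefits).
total_employer_cost = AVERAGE_SALARY / WAGE_SHARE_OF_COST
adjusted_profit_per_job = total_employer_cost * PROFIT_MARGIN
adjusted_jobs = jobs_to_replace(total_employer_cost)

print(f"Salary only: {salary_only_jobs / 1e6:.1f} million jobs")
print(f"Profit per job with benefits: ${adjusted_profit_per_job:,.0f}")
print(f"With benefits: {adjusted_jobs / 1e6:.1f} million jobs")
```

Without intermediate rounding the adjusted figures come out at roughly $9.4K per job and a little under 33 million jobs, in line with the rounded numbers in the text.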

How was the switch to token-based pricing received? We can guess from three pieces of recent news:
Historically, companies wishing to IPO would be profitable. More recently they could have a successful IPO by showing a plausible path to profitability. Now, SpaceX has shown that even massive losses and a claimed path to profitability that is completely implausible are not a barrier to a successful IPO. Even so, one would think that the last thing two companies racing to IPO despite massive losses and implausible paths to profitability would want would be to engage in a "drastic" price war.

Footnotes

  1. xAI's product is so bad that even their employees won't use it and Musk has said it needs to be re-written from the ground up. So xAI has been reduced to renting its compute infrastructure to its competitors.

2026 Virtual DLF Forum: Registration Open, Program Released, Keynote Announced, Digital Storytelling Fellows Application Open / Digital Library Federation

Registration is now open for the virtual Digital Library Federation’s (DLF) Forum online, October 14-15, 2026. The DLF Forum is a dynamic gathering place for GLAM professionals to share ideas, sustain critical work, and spark innovation. It connects library, archives, and museum practitioners, as well as other knowledge workers, through intentional community building and collaborative exchange. Learn more about 

The full Forum program is also available to explore, and we’re thrilled to announce our keynote speaker, Dr. Salwa Ismail. Additionally, the application for Digital Storytelling Fellows is now open. 

 Be sure to check out the exciting details and join us in building momentum for what’s sure to be an inspiring experience.

Secure the early bird rate and start planning for yet another memorable online conference with DLF. 

DLF member organizations receive two complimentary registrations for the virtual DLF Forum as part of their member benefits. Not sure who received your code? Email us at forum@diglib.org

Register Today

If you have any questions, please write to us at forum@diglib.org. We’re looking forward to seeing you online this fall.

-Team DLF

P.S. Want to stay updated on all things related to the DLF Forum? Subscribe to our Forum newsletter.


2026 Virtual DLF Forum Digital Storytelling Fellows / Digital Library Federation

Applications are now open for the 2026 Virtual DLF Forum Digital Storytelling Fellows program

This year, DLF is launching a new fellowship experience that is directly connected to the Forum’s new Digital Storytelling Presentation session format. The program centers on digital storytelling, emerging technologies, and ethical practice across libraries, archives, and museums, while creating intentional opportunities for participation, reflection, and community engagement in a virtual setting.

We invite early-career and underrepresented practitioners to participate in the 2026 Virtual DLF Forum and help shape conversations about how stories, platforms, technologies, and communities intersect in our work.

Why Digital Storytelling?

Digital storytelling projects, including exhibits, platforms, collections, and collaborative archives, are increasingly central to how cultural heritage organizations document, interpret, and share knowledge. These projects also raise important questions about representation, labor, technology, access, and stewardship.

A cohort of 8–10 Fellows will engage directly with these themes through participation in DLF’s new Digital Storytelling Presentation sessions: interactive, installation-inspired presentations featuring collaboratively developed digital projects and dedicated discussion time. Because this is a new and experimental session type for 2026, the fellowship intentionally builds in structured engagement and feedback to help strengthen the experience and better understand what works in a virtual Forum environment.

Fellows will serve as conversation catalysts during these sessions by contributing questions, reflections, and observations that surface broader themes across the Forum community.

About the New Session Format

40-minute Digital Storytelling Presentation
An interactive session highlighting digital storytelling projects developed through collaborative partnerships. Digital Storytelling Presentations focus on installation-inspired digital storytelling work, such as exhibits, platforms, or collections, designed for immersive and experiential engagement. Sessions may feature up to three presenters, for example, pairing a digital librarian or archivist with a community partner, student, artist, or scholar whose work is represented in, or inspired by, the project. Sessions will be scheduled within 50-minute blocks, leaving dedicated time for Q&A and discussion. Read more about the new session type here: https://www.diglib.org/digital-storytelling-in-practice-a-new-session-format-for-the-dlf-forum/ 

What Fellows Receive

Selected Fellows will receive:

  • Complimentary registration to the 2026 Virtual DLF Forum
  • A $250 stipend
  • Participation in a small cohort of 8–10 Fellows
  • A pre-Forum virtual orientation and meet-and-greet
  • Visibility through publication on the DLF blog

Fellowship Expectations

Fellows will:

  • Participate in a virtual orientation session on October 6, 2026
  • Attend the 2026 Virtual DLF Forum on October 14–15, 2026
  • Engage actively in Digital Storytelling Presentation sessions by:
    • Asking questions in chat or live discussion
    • Optionally sharing brief verbal reflections during session discussions
    • Incorporating insights from storytelling sessions into their post-Forum reflection
  • Participate in a virtual debrief session and/or complete a feedback survey
  • Contribute a short public reflection for publication on diglib.org (up to 1,000 words, due November 15, 2026)

Reflections may explore themes such as ethical technology, collaborative storytelling, digital exhibits, community memory, access, or emerging questions around provenance and stewardship.

Selected Fellows must attend the Forum in order to receive the stipend. Apply here. 

Who Should Apply

We welcome applications from:

  • Early-career professionals (fewer than 7 years of experience)
  • Students and recent graduates
  • Contingent, contract, and adjunct practitioners
  • Professionals working in under-resourced or capacity-limited institutions
  • First-time DLF Forum attendees
  • Practitioners whose identities and perspectives are historically underrepresented in digital libraries and cultural heritage spaces

We recognize that professional pathways are not always linear. If you are unsure whether you meet these criteria, we still encourage you to apply.

Application Process

The application process is designed to be straightforward and accessible. Applicants will:

  • Answer a few short questions
  • Submit a brief personal statement (maximum 4,000 characters)
  • Share a link to an online professional profile, if available

Application available here: Google Form

Applications Close Tuesday, July 21, 2026, at 11:59 pm ET


I taught a bucket to speak git / Xe Iaso

What happens if I just point a git server at an object storage bucket?

Back when I was porting agent sandboxes to Go, I built everything on top of billy, a filesystem abstraction for Go. The whole trick of the project was teaching a Tigris bucket to act enough like a filesystem that a shell interpreter and its tools couldn’t tell the difference. Billy was the key layer that made the entire façade fall into place.

After I had gotten things working, I learned that I’m using billy way outside its normal usecase. It was originally made for go-git, a pure-Go implementation of git’s protocols and data formats. It doesn’t rely on the /usr/bin/git binary existing at all. Every method on billy’s filesystem interface exists purely because go-git needs it. This gave me a terrible idea: I already have a bucket that can quack like a filesystem and go-git’s native language is “filesystem”.

Can this Just Work™? Let's find out.

Git was always an object store

If you strip away the porcelain, a git repository is 4 basic things:

  • Objects, or compressed blobs of data. Most of the objects in any individual repository are files.
  • Trees, or objects that map to other objects. TL;DR: trees are folders.
  • Commits, or objects that point at one tree and their parent commit. This lets you pin down which files belong to one logical change set.
  • Refs (branches and tags), which are tiny mutable pointers into the pile of objects.

Note

Until I started working on this I was under the impression that git stored only the patches done to an empty folder and that was how it reconstructed the history of your repository. It does not. It actually keeps track of the entire files, which explains why big binary blobs fudge the tooling so much. The diff mental model works fine for using git day to day; it’s just wrong at the storage layer, which is the layer this post lives in.

For example, let’s say I just made a new git repository and committed a README.md to it. The tree for the .git folder looks something like this:

$ tree .git
.git
├── COMMIT_EDITMSG
├── config
├── HEAD
├── index
├── objects
│   ├── 5e
│   │   └── b8151eb669aa4467b6dea2c4bce19183cd0b41
│   ├── 6a
│   │   └── 6a8ecfcae2632152486aca3d9150ef83dedd66
│   ├── f4
│   │   └── d2487a1c6d742c8037c0296ddf80625190bd80
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── main
    └── tags

As you can see there are three objects. One of them is the commit 5eb8151eb669aa4467b6dea2c4bce19183cd0b41, the next is the tree, and the last one is the README file. The main branch also points to that commit:

$ cat .git/refs/heads/main
5eb8151eb669aa4467b6dea2c4bce19183cd0b41

The cool part is that half of this is content-addressed. The content-addressed bits never change once they’ve been committed. Git objects are a great fit for Tigris’ internal model because they are append-only storage, just like the fundamental model Tigris is built upon. The things that do change often are the refs, which are updated to point to the latest commit. These are tiny files though, which means that Tigris can handle them with no effort required.
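To make "content-addressed" concrete, here is a minimal sketch of how a file's object ID and loose-object path fall out of its bytes. It assumes the classic SHA-1 object format (repositories created with the SHA-256 format hash differently), and blobID and loosePath are names made up for this example, not anything from go-git or objgit.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobID returns the ID git gives a file's contents under the SHA-1 object
// format: the hash of the header "blob <size>\x00" followed by the contents.
func blobID(contents []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(contents))
	h.Write(contents)
	return hex.EncodeToString(h.Sum(nil))
}

// loosePath shows where a loose object with that ID lives under .git: the
// first two hex digits are the directory, the remaining digits the file name.
func loosePath(id string) string {
	return "objects/" + id[:2] + "/" + id[2:]
}

func main() {
	// Same bytes in, same ID out, on any machine.
	fmt.Println(blobID([]byte("hello\n"))) // ce013625030ba8dba906f756967f9e9ca394464a

	// The commit from the tree listing above maps to its path the same way.
	fmt.Println(loosePath("5eb8151eb669aa4467b6dea2c4bce19183cd0b41"))
}
```

Because the name is derived from the contents, an object written once never needs to be rewritten, which is what makes it such a comfortable fit for a bucket.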

However, when we host git repositories on a server, we end up creating single points of failure. Our git repos are hosted on single machines that can and will break. The entire implementation relies on git objects being 1:1 correlated with filesystem objects because everyone (even GitHub) shells out to the git binary to actually store files. Hosting git repos becomes one of the most stateful services in our stateless cloud-native environment.

Sure, git is decentralized in theory, but most of us have ended up using that to put our git repositories in one big store that has questionable uptime practices: GitHub. To be fair to hubbers, GitHub operates at a scale that none of us can really think about. They’ve been pushing the limits since their inception, when they had to get Engine Yard to keep building them bigger servers to handle the load. They have to do everything with a big mounted filesystem because git’s tooling gives them no other option.

A travesty of horrors beyond human comprehension

Now suppose this weirdness bothers you enough to do something about it. To build a git server without storing everything in the local filesystem, you have to speak git somehow, and the conventional options aren’t really all that great:

  • If you shell out to the git binary, now your “library” is the argv of the git process and your error handling is screen-scraping output. Internally, git implements its functionality with a billionty subcommands rather than exposing it all as a library. The codebase is held together by load-bearing calls to die(), which kills the process.
  • If you link into git’s guts with libgit, you inherit the “when things go bad, die()” behaviour and your app now suddenly starts crashing at random. This is not good for uptime.
  • If you try to use libgit2 (the rewrite-that’s-actually-a-library), you have to reckon with the fact that it’s addled by the GPL (with a linking exception, try explaining that to your lawyers), you have to eat the jump to C every time you do anything with git (very often), development has stalled, the Go bindings have been archived, and it still assumes a local filesystem despite assurances it does not.

It might sound hopeless, right? You may be able to use WebAssembly or something to contain the madness (assuming you have a good way to implement fork()/exec() or posix_spawn() or something similar), but what if there was a pure Go library that could handle this all for us?

Enter go-git, a pure-Go implementation of the git protocol and internals from scratch. It doesn’t rely on cgo or /usr/bin/git and it does not assume the repositories are stored in the local filesystem. Its storage interface is written against billy, the exact interface I’ve already taught to speak Tigris. I wanted a git server that was just in a bucket and the pieces were sitting there and calling to me.

Oh no, it works

So I hacked up objgit, a git server backed by object storage. The only filesystem call I had to add to get it booting was MkdirAll. I wired up the transport package to a socket to implement the plaintext git protocol, hooked it up to a bucket, and pushed the repo I was currently working on.

To my absolute astonishment, it worked.

Git pushed, pulled, logged, blamed, tagged, the whole kit and caboodle. I didn’t have to implement git myself, I just committed an egregious amount of shoving a square peg into a round hole until the peg went in.

In hindsight this makes an annoying amount of sense. A bare repo is those four kinds of things on a filesystem; swap the filesystem for object storage and everything else Just Working™ is perfectly logical. Git’s on-disk format is its database schema and if you fake open/stat/rename convincingly enough the entire façade keeps working because APIs are the lies we tell ourselves to make us sleep at night.

After a lot of hacking, I ended up with a feature list kinda like this:

  • Push and pull over three transports: HTTP, classic git://, and SSH
  • Repositories upserted on first push
  • Absolutely no effort put into authentication as this is an experiment and authentication is annoying and complicated
  • Prometheus metrics so I could optimize the filesystem layer

Everything comes out of one Go binary with no local state; even the generated SSH keys are stored in the bucket. You can run this in a Kubernetes cluster where the only mutable storage required is temporary files for an optimistic cache when doing smart git clones.

The rest of this post is what it took to get from “oh no, it works” to something close to usable.

Obligatory disclaimer (like the best things in life): this is an experiment. It has not been tested thoroughly or vetted for correctness. If it breaks in half, you get to keep both pieces. Please do not move your company’s monorepo onto this and then email me when it catches fire.

That one POSIX idiom that survived

Git is paranoid about durability, and its entire strategy is one Unix idiom that you end up seeing in many places: write new data to a temporary file and then rename(2) it into place after you’ve made sure it’s correct. POSIX guarantees that rename is atomic, so readers either see the old file or the new one, not an intermediate state inbetwixt the two. Packfiles (bundles of objects) land as temporary files when uploaded and are then moved to their permanent home. Refs are written as locked temporary files and then renamed over the ref. It’s rename all the way down.
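For reference, the idiom looks roughly like this on a local filesystem. This is a generic sketch using only the Go standard library, not code from git, go-git, or objgit; atomicWrite and demo are names invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temporary file in the same directory as path
// and then renames it into place, so readers see either the old contents or
// the new contents, never a half-written file.
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; after a successful rename this finds nothing to remove.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before publishing
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo updates a stand-in ref file twice and returns what a reader sees afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	ref := filepath.Join(dir, "main")
	for _, v := range []string{"old\n", "new\n"} {
		if err := atomicWrite(ref, []byte(v)); err != nil {
			return "", err
		}
	}
	b, err := os.ReadFile(ref)
	return string(b), err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", got)
}
```

The temporary file is created next to the target on purpose: rename is only atomic within a single filesystem.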

Object storage traditionally does not have rename as one atomic operation. S3’s answer is to create exactly that intermediate state: CopyObject to the new place and DeleteObject on the old one. This makes the most load-bearing idiom in Git’s philosophy fall to pieces.

Luckily, Tigris has an extension for this: RenameObject. To use it, pass an additional X-Tigris-Rename: true header to a CopyObject call and instead of copying then deleting on the client, it moves the metadata around on the server. One round trip, no data movement, and the Unix idiom maps on the bucket 1:1. Objgit’s implementation of Rename is trivial:

// internal/s3fs/basic.go
// RenameObject is a Tigris extension that renames in place (no data copy),
// so we don't need a separate CopyObject + DeleteObject.
copySource := fs3.bucket + "/" + src
_, err := fs3.client.RenameObject(ctx, &s3.CopyObjectInput{
    Bucket:     &fs3.bucket,
    CopySource: &copySource,
    Key:        &dst,
})

A second, sneakier violation hides in the same codepath. When go-git writes a temporary file, it creates that temporary file and then immediately opens it for reading so it can build the pack index. You cannot do that with a single live object in any object storage system: you are either reading or writing, never both. I ended up working around this by cheating a bit and buffering the contents of newly written pack files in memory so that this game of chicken kept working. I may have to change this to write that pack cache to the filesystem, as trying to push gcc.git made me run out of RAM. At the very least, everything lies consistently enough that git doesn’t care, so win!
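The shape of that cheat can be sketched as a buffer that accepts writes and serves positional reads at the same time. This is an illustration only: memFile and readString are hypothetical names, and objgit's real buffering may look quite different.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// memFile is a hypothetical in-memory stand-in for a pack that is still being
// written: it can be appended to and read at arbitrary offsets concurrently.
type memFile struct {
	mu  sync.RWMutex
	buf []byte
}

// Write appends p to the buffer.
func (f *memFile) Write(p []byte) (int, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.buf = append(f.buf, p...)
	return len(p), nil
}

// ReadAt copies bytes starting at off, returning io.EOF on a short read as
// the io.ReaderAt contract requires.
func (f *memFile) ReadAt(p []byte, off int64) (int, error) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	if off >= int64(len(f.buf)) {
		return 0, io.EOF
	}
	n := copy(p, f.buf[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// readString is a demo helper that reads up to n bytes at off.
func readString(f *memFile, off int64, n int) string {
	p := make([]byte, n)
	got, _ := f.ReadAt(p, off)
	return string(p[:got])
}

func main() {
	f := &memFile{}
	f.Write([]byte("PACK"))
	fmt.Println(readString(f, 0, 4)) // readable while the write is still in progress
	f.Write([]byte("more"))
	fmt.Println(readString(f, 4, 4))
}
```

The cost is exactly the one described above: the whole pack sits in RAM until it is flushed, so a large enough push exhausts memory.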

Death by a thousand stat() calls

With this correctness sorted, I tried pushing the golang/go repository to objgit to see how long it would take. It did work, but it took forever. Using the Prometheus metrics I mentioned before, I saw that it was making biblical amounts of HeadObject calls. Some blocking profile analysis pointed to the fact that the git library was using the stat() call to detect if a file exists. The flow was like:

  • Client has object x
  • Check if object x exists
  • Check if any pack has object x

And so on ad infinitum. This is fine-ish on a local filesystem because those syscalls resolve in microseconds, not the tens of milliseconds it takes to get from my office to the nearest Tigris region (please expand to Ottawa, I would love that so much).

This was compounded with a discovery that the transport I was using (SSH — classic git:// shares the same code path) was exploding every packfile into loose objects when pushing it. Each loose object write was costing two round trips: stat() to check if a file exists and then open() / write() to actually put the data into Tigris. This made a 100,000 object packfile cost 200,000 object storage calls. Call it 10ms of latency for each one, and that’s over half an hour of waiting for responses that mostly say “404 not found”.
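The arithmetic is worth writing down. The sketch below assumes every call is made one after another with a flat 10ms round trip, which is the simplification the estimate above uses; pushCost is a name invented for this example.

```go
package main

import (
	"fmt"
	"time"
)

// pushCost estimates how long a push spends waiting on object storage when
// each object costs callsPerObject sequential round trips of rtt each.
func pushCost(objects, callsPerObject int, rtt time.Duration) time.Duration {
	return time.Duration(objects*callsPerObject) * rtt
}

func main() {
	// 100,000 loose objects, one stat() plus one write each, 10ms per call.
	fmt.Println(pushCost(100_000, 2, 10*time.Millisecond)) // 33m20s
}
```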

Caching can’t really save you here either. Read caches would absorb repeated reads, but this is a firehose of writes to 100,000 paths that have probably never been read and likely will never be seen again.

The reason only two transports had this problem is a deadlock story. The git library's fast path stores an incoming pack whole through its PackfileWriter, by copying from the connection until io.EOF. Over HTTP that's fine: the request body ends, EOF arrives, everyone goes home. Over git:// and SSH, the connection is a persistent socket and the client is holding it open, politely waiting for the server's status report. EOF never comes. The copy waits forever, the client waits forever, and you have invented a distributed deadlock with two participants. The original workaround was to hide the PackfileWriter capability on those transports so go-git fell back to its streaming parser that writes every object loose. Hence the stat storm.

So the solution was to stop depending on EOF at all. Packfiles are self-delimiting: the header says how many objects are coming and a trailing checksum marks the end, so a packfile scanner walks the stream and stops at the trailer while a TeeReader mirrors exactly those bytes into the PackfileWriter. This makes the rest of the façade fall into place and the git library is happy. Each push became two uploads, a packfile and its index, instead of a torrent of round trips that mean nothing.
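Here is a small sketch of the first step of that idea: reading the 12-byte pack header (the signature PACK, a version, and an object count, all big-endian) through an io.TeeReader so that exactly the bytes consumed are mirrored to a second writer. It stops at the header on purpose. A real scanner also has to inflate each object entry to find where it ends before it can reach the trailing checksum, and none of this is objgit's or go-git's actual code; readPackHeader and demo are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// readPackHeader reads the 12-byte pack header from r, mirroring exactly the
// bytes it consumes into mirror, and returns the version and object count.
func readPackHeader(r io.Reader, mirror io.Writer) (version, objects uint32, err error) {
	tee := io.TeeReader(r, mirror)
	var hdr [12]byte
	if _, err = io.ReadFull(tee, hdr[:]); err != nil {
		return 0, 0, err
	}
	if string(hdr[:4]) != "PACK" {
		return 0, 0, errors.New("not a packfile")
	}
	return binary.BigEndian.Uint32(hdr[4:8]), binary.BigEndian.Uint32(hdr[8:12]), nil
}

// demo feeds the reader a header announcing three objects, followed by bytes
// that stand in for the rest of a connection that never sends EOF.
func demo() (objects uint32, mirrored int, err error) {
	stream := append([]byte("PACK"), 0, 0, 0, 2, 0, 0, 0, 3)
	stream = append(stream, []byte("rest of the stream")...)

	var mirror bytes.Buffer
	_, objects, err = readPackHeader(bytes.NewReader(stream), &mirror)
	return objects, mirror.Len(), err
}

func main() {
	objects, mirrored, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(objects, mirrored) // 3 12
}
```

The point of the tee is that the mirror only ever receives what the scanner has decided belongs to the pack, so nothing waits on the socket closing.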

What about cloning?

Once I got pushing fixed, I moved on to the read path. In order to emulate ReadAt, I used ranged GetObject requests so that the git library could read individual objects out of packfiles. I was happy with this hack, but there was one problem: the latency curse struck again. Cloning a simple repo with 318 objects and a 200KiB packfile made over 8,500 GetObject calls before I killed it.
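For anyone who hasn't done this trick before, a ReadAt(p, off) call turns into an HTTP Range header on the GetObject request. The helper below is a sketch of just that mapping; rangeHeader is a made-up name and it assumes n is at least 1.

```go
package main

import "fmt"

// rangeHeader builds the Range header value for reading n bytes starting at
// off. HTTP byte ranges are inclusive at both ends, hence the -1.
func rangeHeader(off int64, n int) string {
	return fmt.Sprintf("bytes=%d-%d", off, off+int64(n)-1)
}

func main() {
	fmt.Println(rangeHeader(0, 12))     // bytes=0-11
	fmt.Println(rangeHeader(4096, 512)) // bytes=4096-4607
}
```

Each of those headers is one full round trip, which is why thousands of small positional reads hurt so much.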

A git client cloning a repository reads repository packfiles thousands of times with random access, walking objects and candidate delta bases over and over. On a local disk you never notice because your page cache eats that for breakfast. When every call is an HTTP request, a 200KiB repo turns into dozens of megabytes of round trips. A 20MiB repo was effectively unservable.

In other words, I had un-cached the one workload that caching was designed to solve.

The fix leans on a gift from git: pack files are immutable and content-addressed. pack-<sha>.pack will never change for as long as it exists. This makes them trivially cacheable to a faster local medium, such as the filesystem. No invalidation logic is required. I made objgit download packs to a local temporary folder and serve reads from there. To be on the safe side, I did add least-recently-used caching to the mix so that my temp folder wouldn’t blow up unexpectedly. This does mean that the first request for pack files is slower, but then everything else is at filesystem speed.
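A least-recently-used policy for a handful of immutable files doesn't take much code. This sketch only tracks names and a maximum count; a real cache would more likely budget by bytes and delete the evicted file from the temp folder. packCache, newPackCache, and touch are names invented for the example, not objgit's.

```go
package main

import (
	"container/list"
	"fmt"
)

// packCache tracks which pack files are held locally, most recently used first.
type packCache struct {
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // pack name -> its element in order
}

func newPackCache(max int) *packCache {
	return &packCache{max: max, order: list.New(), items: make(map[string]*list.Element)}
}

// touch records a use of the named pack and returns the names that were
// evicted to stay within the limit. There is no invalidation path because
// pack contents never change.
func (c *packCache) touch(name string) (evicted []string) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return nil
	}
	c.items[name] = c.order.PushFront(name)
	for c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		victim := oldest.Value.(string)
		delete(c.items, victim)
		evicted = append(evicted, victim)
	}
	return evicted
}

func main() {
	c := newPackCache(2)
	c.touch("pack-a.pack")
	c.touch("pack-b.pack")
	c.touch("pack-a.pack")              // a is now the most recently used
	fmt.Println(c.touch("pack-c.pack")) // [pack-b.pack]
}
```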

Yes, this relies on the local disk again, but only as a cache that can and will be thrown away. I think trading a stateless ideal for clones that terminate in reasonable amounts of time is a worthwhile bargain.

Why so ListObjectsV2, Batman?

Once the other disasters were out of the way, one more remained: the metrics showed a flood of ListObjectsV2 calls every time a clone was made, and the calls kept coming after the clone was done.

Two things compounded. First, when git looks up an object that isn't packed, it probes for a loose object at objects/<xx>/<rest-of-hash>. objgit keeps packs whole, so there are no loose objects, so every probe misses, and each miss across a distinct two-hex prefix triggered a directory listing to find out. There are 256 possible prefixes. A single clone could issue up to 256 ListObjectsV2 calls whose collective answer was a resounding "there is nothing here."

Second (and more embarrassing), the listing cache already had an optimization for this. It collapsed entire subtree lookups into recursive scans so a single listing of the repository could answer every stat() and probe beneath it. It was completely dead in production. The cache matched recursive prefixes against the repo root (refs/), but every repo is chrooted to its own directory, so real keys look like myrepo.git/refs/heads/main. The prefix check wasn’t aware of chroots so it never actually matched anything. Nobody noticed because a cache that degrades to “no caching” still returns the correct answer, just slowly. To rub it in, a cache warmer was dutifully re-listing every one of those useless prefixes every 30 seconds for 10 minutes after each clone. Thousands of background list calls were burned in the service of caching nothing of use.

The fix was insultingly small: when a repo’s filesystem gets chrooted, register that chroot as a recursive subtree root within the cache. This made the cache actually useful and resulted in only one ListObjectsV2 call instead of hundreds. Every sufficiently advanced cache is indistinguishable from a no-op until someone graphs the miss rate.
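The bug and the fix both fit in a few lines once you strip the cache down to its prefix check. This is a toy model, not objgit's cache: listingCache, registerRoot, and covered are hypothetical names, and the real thing also has to store and expire the listings.

```go
package main

import (
	"fmt"
	"strings"
)

// listingCache remembers which key prefixes can be answered from a single
// recursive listing instead of one ListObjectsV2 call per directory.
type listingCache struct {
	roots []string
}

// registerRoot marks everything under prefix as covered by one recursive scan.
func (c *listingCache) registerRoot(prefix string) {
	c.roots = append(c.roots, prefix)
}

// covered reports whether a lookup for key can be served from a recursive scan.
func (c *listingCache) covered(key string) bool {
	for _, r := range c.roots {
		if strings.HasPrefix(key, r) {
			return true
		}
	}
	return false
}

func main() {
	c := &listingCache{}

	// The bug: roots were written relative to the repo, but real keys carry
	// the chroot directory in front, so nothing ever matched.
	c.registerRoot("refs/")
	fmt.Println(c.covered("myrepo.git/refs/heads/main")) // false

	// The fix: register the chroot itself as a recursive root.
	c.registerRoot("myrepo.git/")
	fmt.Println(c.covered("myrepo.git/refs/heads/main")) // true
	fmt.Println(c.covered("myrepo.git/objects/5e/b815")) // true
}
```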

None of these disasters were exotic. They’re the things filesystems and kernels give you for free — and every perfectly reasonable disk assumption fell to pieces once a network round trip sat at the core. Serving Git repositories is an accidental filesystem latency benchmark. If your storage abstraction has a weak point, Git will find it and the metrics will show you where that problem is.

Post-receive hooks go in clown jail

One of the most useful parts of hosting your own git server is setting up post-receive hooks. These have been used since time immemorial for things like automatic deployments when you push code to the server. The heart of this is how we get systems like GitHub Actions: it’s code that runs when you are done pushing.

When you push to objgit with --allow-hooks enabled, it looks for a post-receive hook in .objgit/hooks/receive-pack (this corresponds to the git plumbing action, the name can and will be changed) in the tree of the commit you just pushed. It will then spin up a kefka sandbox with a checkout of the git repository at the commit you just pushed mounted at /src and mutable temporary files at /tmp. It gets coreutils and nothing else. No host filesystem, no network, no arbitrary binaries. Output streams back into the pusher as remote: lines just like when you git push heroku main. Eventually I want to make custom commands to allow you to deploy Tekton pipeline changes and kick off CI jobs that way, but for now I’m happy with this working at all.

You can’t implement policy using these hooks yet. I’m working on it.

Now what?

I taught a bucket to speak git. Here is where this goes next, roughly in order of how much the ideas keep me up at night.

CI is the obvious next step. I would wire up commands for things like “apply kubernetes object” and “create tekton pipeline run” so that CI would run via your friendly neighborhood Kubernetes cluster and then notify you through some reasonable mechanism. That’s the first thing I’ll build when I have the time.

It would be nice to have a web UI for this, which is complicated for reasons that have nothing to do with git trees, object storage, or anything else and everything to do with the current state of the internet. Git lookups are expensive in the best cases and with the current torrent of unethical scraping ransacking git servers for every scrap of RAM they have, it’s probably a bad idea to implement this without a lot of clever optimizations. Maybe the fact that this doesn’t have load-bearing dependencies on /usr/bin/git would make it more resilient against scrapers. The fact that this is based on object storage could also mean that caching would be a bit easier (having basically unlimited storage is kind of a low-key superpower for caching), but then the main issue would be server load. It’s a tough cookie to handle.

Performance and stability are another place this needs to improve. I’ve tested this on my developer workstation but that is far different from testing it in production. There are some other performance issues that are easy to fix, but the big one is latency to Tigris. Maybe I can get the devops team to set me up a k3k cluster in production.

Right now this is an experiment as I plug along and feel out the shape of what git-on-object-storage can be. A git server with no disk, no git binary, and no database. If you want to take a look, check it out on GitHub.

Slow management in a fast world / Meredith Farkas

Last month, I had the great pleasure of keynoting the CALM (Conference on Academic Library Management) Conference, which is consistently one of my favorites. The video of my talk, Slow Management in a Fast World, is available below for those who would like to check it out! You can also view my slides here which include a long bibliography of works that influenced my talk at the end.

CALM Conference opening Keynote: Slow Management in a Fast World

Many thanks to all the amazing folks who organized this conference; it was such an honor and a pleasure to be part of it!

Bookmarks - book, ai, map, llm / Ed Summers

These are some things I’ve wandered across on the web this week.

🔖 RFC 9958: Post-Quantum Cryptography for Engineers

The advent of a cryptographically relevant quantum computer (CRQC) would render state-of-the-art, traditional public key algorithms deployed today obsolete, as the mathematical assumptions underpinning their security would no longer hold. To address this, protocols and infrastructure must transition to post-quantum algorithms, which are designed to resist both traditional and quantum attacks. This document explains why engineers need to be aware of and understand post-quantum cryptography (PQC), and it details the impact of CRQCs on existing systems and the challenges involved in transitioning to post-quantum algorithms. Unlike previous cryptographic updates, this shift may require significant protocol redesign due to the unique properties of post-quantum algorithms.

🔖 ElectricityMaps

At Electricity Maps, we’re data scientists, first and foremost.

Data comes in from many sources, and in many formats. We ingest and harmonize it, apply our models to it, and make it available to the world. This is the place to learn more about our data; read FAQs, or deep dive in our methodology.

🔖 BLOBPROC

BLOBPROC is a less kafkaesque version of PDF postprocessing found in sandcrawler, which is part of IA Scholar infra. Specifically it is designed to process and persist documents with minimum number of external components and little to no state.

The goal is to have artifacts (fulltext, thumbnails, metadata, …) derived from millions of PDF files available in a storage system (e.g. S3). In the best case, the artifacts can be kept up to date in an unattended way

🔖 AI-assisted engineers are burning out, is this fine?

We’re more productive than ever. AI allows us to generate code at supersonic speeds, unfold entire modules in seconds, and ship thousands of lines of code. It’s easier to pick up tasks and generate value, even in unfamiliar codebases. But there’s a dark side. AI-assisted code generation isn’t free; there’s a hidden cost that we as an industry are only beginning to realize: AI burnout. Are we dangerously ignorant to this problem? And how can we cope with it?

🔖 Banned Book Library

A long while back I had an idea to hack a WiFi smart light bulb to do something more useful to me. Actually, I had a few different ideas of things to do with them. One of these ideas was to modify the device to have an open WiFi access point and a web server hosting banned books. The idea was that if you lived somewhere that banned books you thought were important, you could theoretically stick a digital copy of the book on one of these light bulbs. Then you could go install it somewhere in your community

🔖 AI Economics for Dummies

As AI companies get ready to go public and we get a deeper look at their inner workings, it’s only natural to have questions about their finances, like “Do they make money?” and “How?” Here are a few examples to help the average layperson understand the business side of AI.

🔖 Jerry’s Map

… the Map is now a two-dimensional “virtual world” art project which is now comprised of over 4000 individual eight by ten inch panels. When assembled, these panels form an approximate circle. The panel locations are defined by N, S, E, and W coordinates that originate at the center of the circle. The locations in the matrix do not change, but the panels themselves are continually revised based on instructions drawn from the artist’s custom deck of cards.

🔖 Building a Small Language Model (SLM)

A step-by-step Jupyter Notebook demonstrating how to build and train a compact small language model (“SLM”) from scratch using the TinyStories dataset. Covers data preparation, BPE tokenization, efficient binary storage, GPU memory locking, Transformer architecture, training configuration, and sample text generation.

🔖 Togetherness by Rowan Hooper review – a stunning portrait of cooperation in nature

Hooper adores Darwin – his account of visiting Darwin’s Kent residence Down House radiates reverence (“it’s a pseudo-religious experience”). But he feels that Darwinism and its union with genetics in the so-called “modern synthesis” has placed undue emphasis on competition in the natural world and underplayed the roles of cooperation and collaboration. In redressing that imbalance, Togetherness is not an attempt to make evolution cuddlier and more palatable; rather, it is a corrective deeply informed by what we have learned since Darwin about how nature works. Written with immense charm and passion, and packed with eye-popping facts, it is also a paean to the wonders of nature and the value and urgency of preserving them.

🔖 A hidden infrastructure

Underground networks

Beneath our feet lie networks of invisible ecosystem engineers: arbuscular mycorrhizal (AM) fungi.

These fungi form trading relationships with more than 70% of plant species, building networks of tubular cells called hyphae that extend the surface area of root systems up to a hundred-fold.

Collectively, these networks comprise one of Earth’s circulatory systems.

🔖 The Era of Cheap AI Is Over

In the words of Harvard business professor Andy Wu, most people don’t realize how “ridiculously expensive” AI is. Most are aware of the high fixed costs, but not the variable inference costs incurred every time the model generates an image. OpenAI expects to spend more than $150 billion on inference costs alone through 2030. While the vast majority of users continue to access the platform for free, the question is how the gap between resources and revenue will eventually close, and who will bear the costs.

🔖 Mes forêts

«Son nom semble la relier à une constellation, mais sa présence au monde la rend indissociable des paysages qu’elle traverse : Hélène Dorion vit environnée de lacs et de forêts, de fleuves et de rivages, de brumes de mémoire et de vastes estuaires où la pensée s’évase. Dans ce recueil voué aux forêts, elle fait entendre le chant de l’arbre, comme il existe un chant d’amour et des voix de plain-chant. « Mes forêts… », dit-elle dans un souffle qui se densifie de poème en poème. Et l’on entre à pas de loup dans une forêt de signes où l’on déchiffre la partition de la vie sur fond de ciel, sur fond de terre, sur fond de neige, de feuillages persistants et de flammes qu’emporte le vent, de bourgeons sertis dans l’écorce et de renouvellement. Un chemin d’ombres et de lumière, qui donne sens à ce qu’on appelle humanité. »

🔖 Pudsey Clough Radio

If you have problems with webpage playback try these stream buttons, or add the urls below to VLC or any other streaming/netradio software: https://orllewin.radioca.st/stream - High quality 256kbps stream. https://orllewin.radioca.st/lofi - Bandwidth friendly 64kbps stream.

🔖 MANIFESTO JAM 2026

The manifesto, in my imagined alternative, is the ugly smear on the polished surfaces of conference keynotes, aspirational #bizdev posts and job-ready portfolio pieces. The manifesto is awkward, clunky, impractical, confronting, uncompromising, defiant: all qualifiers undesirable in an increasingly professionalised, corporatised game making ecosystem. These traits are what makes the manifesto beautiful.

🔖 Microsoft turns to Amazon for help with GitHub’s AI-driven capacity issues

Microsoft is turning to its biggest cloud rival, Amazon, to help address capacity issues on its GitHub coding platform following a series of AI-driven outages, according to two people familiar with the plans.

GitHub, which Microsoft acquired in 2018, is a popular place for engineers to store and manage code, and collaborate on projects. As an independent company, GitHub mostly operated its own data centers, but Microsoft had planned to move the coding platform entirely to its Azure cloud service by 2027.

Now, a boom in AI demand is forcing Microsoft to lean on Amazon. AI coding tools have made it easier for developers to write more software. That has swamped GitHub with a flood of new code, straining its compute resources.

2026-06-19: From Chalk Dust to Code: My Journey from a Small Town to a Ph.D. in the U.S. / Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

I was born in a small town to two schoolteachers who believed education was not just a profession but a purpose. Growing up in such an environment meant that learning was never forced; it was simply part of everyday life, and curiosity was always encouraged rather than questioned. Books were treated like companions in our home, and questions were welcomed more than answers. My father, a mathematics teacher and statistics topper, did not just teach numbers. He taught me how to see patterns in the world, how to question things, and how to stay curious. He had a way of turning ordinary moments into lessons, showing me that knowledge was not confined to classrooms but hidden in everything around us. Conversations at home often revolved around ideas, discipline, perseverance, and integrity, quietly shaping my mindset long before I understood their value. That atmosphere made me believe that effort mattered more than circumstance and that consistency could take a person farther than talent alone. No one imagined back then that this quiet boy would one day cross oceans and earn a Ph.D. in Computer Science. Dreams rarely ask where you start. They only ask how far you are willing to go.

My mother, who was also a schoolteacher, played an equally powerful role in shaping my values. From her, I learned patience, discipline, and the importance of consistency in everything I pursued. She believed that true education was not about marks but about character, and she constantly reminded me that knowledge should make a person humble, not proud. Watching both my parents teach day after day made learning feel natural to me, not like a task but like a way of life.

My grandfather’s life story was another silent source of inspiration. He grew up during colonial India and witnessed the struggles of a nation finding its identity. Rising through hardships, he eventually earned the respected position of a gazetted officer, a journey that required perseverance, resilience, and integrity. His life stood as proof that circumstances do not define destiny; determination does. Even without long speeches, his presence alone taught lessons that no classroom ever could.

I spent most of my childhood at my grandfather’s house because it was close to my school, and that environment shaped me deeply. The atmosphere there was disciplined, structured, and principled. Time was respected, routines were followed, and values were lived rather than spoken. Growing up in such surroundings quietly instilled habits that later became my strongest foundation during demanding academic years and life’s toughest challenges.

Growing up in a modest household meant resources were limited, but encouragement was abundant. Books were never treated as objects but as companions, and curiosity was never dismissed as childish. Those early years quietly shaped the mindset that later helped me face some of the toughest academic and personal challenges of my life.

My academic journey began with a Bachelor’s in Information Technology, followed by a Master’s in Computer Science at Manipal University. That was where curiosity turned into direction. I started building systems, experimenting with ideas, and asking questions beyond textbooks. I built an AI-based disease prediction model, and that project showed me something important. Technology is not just code. It is impact. That realization changed the way I looked at learning and the future.

University life was not only about grades or achievements. It was where I learned independence, responsibility, and how to handle failure. Every project deadline, presentation, and challenge strengthened not only my technical knowledge but also my confidence in my own potential.

Alongside academics, I always had a creative side that refused to stay silent. I pursued a diploma in filmmaking because storytelling fascinated me as much as algorithms did. Cinema taught me perspective, emotion, and imagination. I developed a deep interest in psychological, horror, and suspense films because they explore the human mind in ways that science alone cannot explain. I even had the opportunity to perform on stage as Lord Krishna in a theatrical production, an experience that taught me confidence, presence, and expression. Since then, I have carried a quiet dream within me to one day create a film of my own.

That phase of life taught me something powerful. It showed me that growth does not happen when we limit ourselves to one dimension but when we allow different sides of our personality to coexist and complement each other. Creativity and logic are not opposites. They are partners. One fuels possibility while the other shapes it into reality. The ability to imagine helps innovation, and the ability to analyze helps execution. When both work together, ideas do not just remain thoughts; they become solutions. This balance later became one of my greatest strengths in research and problem-solving. During my university years, I became the only student from my institution to secure a full-time internship at HP R&D, where I worked on an AI-powered auto-diagnostics system. Walking into that environment, surrounded by brilliant minds and real-world challenges, felt both humbling and motivating at the same time. It pushed me to raise my own standards and think beyond what I had previously believed possible. For the first time, I saw how research and industry could come together to solve real problems. I realized that technology is most powerful when it moves beyond theory and begins to create a tangible impact on people’s lives. That experience strengthened my belief that innovation happens when curiosity meets discipline.

Walking into that workplace for the first time felt surreal. The environment was unlike anything I had experienced before, filled with people who spoke the language of innovation, curiosity, and possibility. It was proof that hard work can open doors you once thought were unreachable. In that moment, I realized that opportunities are not reserved for a select few; they often wait quietly for those willing to persist long enough to find them. More importantly, it showed me that I belonged in spaces where ideas mattered more than background.

Soon after, I achieved another milestone by becoming the first student from my university to intern at Procter & Gamble in Europe. There, I developed AI and IoT tools for safety and automation. It felt like I had finally reached the dream I once imagined. But life sometimes asks you to step away from comfort to pursue purpose. Leaving that opportunity after countless overnight visa trips was not easy, but I chose uncertainty because I wanted to create knowledge and not just apply it.

That decision was not understood by everyone. Some questioned it, others doubted it, and a few even discouraged it. But growth often begins where comfort ends. I realized that the path to something extraordinary rarely looks safe or predictable. Then COVID arrived, and my Ph.D. journey was delayed by nearly two years. Plans paused, uncertainty grew, and the path ahead looked unclear. Instead of waiting for circumstances to change, I decided to change myself. I spent that time upskilling, studying, building projects, and preparing for an opportunity I could not yet see. In August 2021, during travel restrictions and global uncertainty, I boarded a flight to the United States carrying two things. Fear and determination.

That flight was more than travel. It was a turning point. It symbolized leaving behind familiarity and stepping into the unknown with faith. Moments like that define a person not because they are easy, but because they demand courage. A Ph.D. is not just a degree. It is a test of patience, resilience, and belief. It is months of work that sometimes lead nowhere, papers rejected after weeks of effort, ideas challenged, and moments when you question yourself. But it is also growth, clarity, and discovery. Each obstacle became a lesson, and each lesson made me stronger. I learned that persistence is not loud. It is quiet, steady, and stubborn.

Some of the most important lessons I learned during my doctoral journey were not written in textbooks. They were learned in silence, in reflection, and in perseverance. Research does not reward speed. It rewards depth. It does not reward noise. It rewards clarity. 

Over time, that persistence began to show results. I published more than twenty research papers and had the opportunity to present my work at international conferences across Europe, Australia, and North America. These experiences allowed me to engage with researchers from around the world, exchange ideas, and refine my perspective on accessibility, artificial intelligence, and human-centered computing.

Selected conference presentations and research travel included presenting at the 28th International Conference on Theory and Practice of Digital Libraries (TPDL 2024) in Italy (Conference Report), participating in the ACM SIGWEB Conference on Hypertext and Social Media (HT 2024) in Poznań, Poland (Conference Report), presenting at the ACM SIGCHI Conference on Engineering Interactive Computing Systems (EICS 2023) in Swansea, Wales, United Kingdom (Conference Report), and attending the International Conference on Intelligent User Interfaces (IUI 2023) in Sydney, New South Wales, Australia (Conference Report).

My research journey was also enriched by industry experiences, including a Summer Research Internship at ISG (Internship Report) and a Summer Data Analytics Internship at PRA Group (Internship Report). These opportunities allowed me to apply research ideas in real-world settings and strengthened my understanding of how academic innovation can create practical impact.

These experiences culminated in receiving the Best Paper Award at ACM W4A 2025 for our work on adapting online customer reviews for blind users, a recognition that remains one of the highlights of my doctoral journey. I shared reflections on this achievement and the award-winning work in this X post about the ACM W4A 2025 Best Paper Award.

Standing on international stages and presenting my research to global audiences was humbling. Each presentation reminded me that knowledge has no borders and that ideas can travel farther than we ever can. 

Eventually, the moment arrived that once felt impossibly far away. I earned my Ph.D. in Computer Science from Old Dominion University. My dissertation defense marked the culmination of years of research in accessibility, artificial intelligence, and human-computer interaction. My dissertation is publicly available through Old Dominion University Digital Commons: http://doi.org/10.25777/767n-ra09. The defense presentation and a selected photo from the event are included below.

[Embedded presentation: dissertation defense slides]

Standing there, I did not just see a degree. I saw every late night, every doubt, every rejection, every lesson, and every person who supported me along the way. I thought about the sacrifices my family made, the mentors who guided me, and the friends and colleagues who encouraged me during difficult moments. I was reminded that every challenge had shaped the person I had become. What began as a dream in a small town had gradually unfolded into a journey that took me across continents, introduced me to remarkable people, and challenged me in ways I never imagined.

Looking back, I saw more than academic milestones and professional achievements. I saw a young student driven by curiosity, a researcher shaped by persistence, and a person transformed by every challenge encountered along the way. In that moment, I understood that success is never a single event. It is a collection of moments, sacrifices, failures, risks, and resilience stitched together over time. I also came to appreciate that the people we meet, the experiences we embrace, and the challenges we overcome often shape us just as much as our accomplishments. Each stage of the journey brought lessons that extended far beyond academics, teaching me perseverance, gratitude, and the value of continuous growth. The degree was only a symbol. The journey was the real achievement.

This Ph.D. is not the finish line. It is the beginning of a new chapter filled with opportunities to learn, contribute, and create meaningful impact. If there is one thing my journey has taught me, it is that no dream is too big, no struggle is too heavy, and no setback is final. Sometimes the longest paths lead to the most meaningful destinations, and sometimes the quietest beginnings lead to the loudest impact. For me, this journey has always been about learning, growing, and giving back, and I look forward to wherever that path leads next.

- Mohan Krishna Sunkara (@mk344567)

Midsummer sunrise / Ed Summers

Midsummer sunrise

The sun peeking over the horizon of the Atlantic Ocean. A globe of fiery starstuff over calm waves, orange and red spreading out into the sky.

Evergreen releases 3.17.2 and 3.16.8 are available / Evergreen ILS

The Evergreen release team is pleased to announce that point releases for 3.17.2 and 3.16.8 are available.

These releases contain updates for Evergreen’s RESTful API suite, search fixes, and Angular interface updates.

Files and release notes are available on the Downloads page: https://evergreen-ils.org/egdownloads/

Thanks to the June release team: Galen Charlton (Equinox), Gina Monti (Bibliomation), Sarah Moody (ECDI), Andrea Buntz Neiman (Equinox), and Chris Sharp (PINES); as well as everyone who contributed fixes and testing to this release.

I hate compilers / Xe Iaso

Anubis is about to get WebAssembly-based proof of work checks so that administrators can use a non-SHA256 proof of work method to protect their websites. Part of the implementation goals of this work is that the check logic is defined in one place on both client and server. The client and server will then hook into the WebAssembly in order to make sure they're running in lockstep.

However, one small problem comes up. What do you do when the client has WebAssembly disabled? I really don't want to de-facto lock people out of websites. Anubis exists in an impossible balance of user experience, administrator experience, and developer experience and any change to any of these factors disrupts the balance for other factors.

To work around this and also fulfill the goal of having check logic defined once, I decided to take inspiration from the legendary talk The Birth and Death of JavaScript and just recompile the WebAssembly to JavaScript. Sure, the resulting JavaScript will be slower than the equivalent WebAssembly (even more so because disabling WASM usually disables the JavaScript JIT, the thing that makes JavaScript fast), but it will finish eventually. Hopefully it will be more efficient than the existing JavaScript is on lower end hardware, but research is required.

Luckily enough, the tool I need (wasm2js from the binaryen project) is packaged in Linux distributions. The bad news is that distributions ship ancient versions of it that don't get the same output as the version on my development machine's copy from Homebrew.

In order to really make sure that the output of this is deterministic (essential for reproducible builds), I need to bundle a copy of wasm2js. So I did that by building a version of wasm2js compiled to WebAssembly with wasi-sdk. The rest of the article is the tale of reproducibility woe that led to the implementation I ended up with. Buckle up and enjoy the ride!

Reproducible builds are surprisingly hard

Aoi:

Back up a sec, this doesn't make sense to me. If you have the same bytes of input to a compiler, you should get the same bytes of output, assuming that the compiler flags, target, and other platform details are controlled for, right? A compiler is just a deterministic function that turns input source code into output bytecode, right?

Numa:

lol you'd think, but no, it's not. In theory it is (and for small scale compilers it definitely is), but in practice compilers are strange and complicated beasts containing multitudes that no mere mortal can fully comprehend on their own.

There are a shocking number of ways to accidentally create nondeterministic output when doing C/C++ development. One of the easiest is to use the builtin __DATE__ and __TIME__ macros to stamp a build with the time the compiler was executed at:

// hello.cpp

#include <iostream>

int main() {
    std::cout << __DATE__ << " " << __TIME__ << std::endl;
    return 0;
}

Building and running it once gets me this:

$ make clean && make hello.wasm && wasmtime run -W exceptions=y ./hello.wasm
rm -f hello.o hello.wasm
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false  -c hello.cpp -o hello.o
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false  -fwasm-exceptions -lunwind --no-wasm-opt hello.o -o hello.wasm
Jun 18 2026 00:00:59

Another time it gets me this:

$ make clean && make hello.wasm && wasmtime run -W exceptions=y ./hello.wasm
rm -f hello.o hello.wasm
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false  -c hello.cpp -o hello.o
wasi-sdk-33.0-x86_64-linux/bin/wasm32-wasip1-clang++ -O3 -fwasm-exceptions -mllvm -wasm-use-legacy-eh=false  -fwasm-exceptions -lunwind --no-wasm-opt hello.o -o hello.wasm
Jun 18 2026 00:01:11

Even though the source code had the same bytes, the output of the compiler was wildly different.
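One common way to sidestep this class of problem is to never read the clock inside the compiler and instead take any stamp from a value the build script passes in explicitly. The following is a minimal sketch under that assumption, not something the Anubis or binaryen builds are known to do; `BUILD_STAMP` and `build_stamp()` are hypothetical names invented for the illustration.

```cpp
// stamp.cpp
#include <string>

// BUILD_STAMP is a hypothetical macro. A build script would define it
// explicitly; when it is absent, fall back to a fixed string so the
// output bytes do not depend on when the compiler ran.
#ifndef BUILD_STAMP
#define BUILD_STAMP "unstamped"
#endif

// Returns the stamp baked in at compile time. Unlike __DATE__ and
// __TIME__, this only changes when the build inputs change.
std::string build_stamp() {
    return BUILD_STAMP;
}
```

Compiled with a flag such as `-DBUILD_STAMP='"v1.2.3"'`, the stamp changes only when the build script changes it; with no flag, every build embeds the same fallback string.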

In order for users and packagers to trust the binaries of wasm2js I'm committing to the Anubis repo, I need to make sure that you can build the same version I built, down to the same bytes. For an added bonus, you should be able to build this on your machine and get the same bytes I got.

Numa:

That sure does sound like a great ideal, it would be horrible if something unforeseen came up to ruin it!

Clang silently runs wasm-opt from $PATH behind your back

Besides wasm2js, binaryen ships other useful tools such as wasm-opt. wasm-opt optimizes WebAssembly compiler output to let you eke out more performance. This doesn't work in every circumstance, but when it does work it makes a huge difference. As such, clang shells out to wasm-opt when doing builds.

This normally makes sense, but in this case it caused builds to fail on my DGX Spark because its version of wasm-opt is too old:

$ uname -m && which wasm-opt && wasm-opt --version
aarch64
/usr/bin/wasm-opt
wasm-opt version 108

Compared to my workstation which installs wasm-opt from Homebrew:

$ uname -m && which wasm-opt && wasm-opt --version
x86_64
/home/linuxbrew/.linuxbrew/bin/wasm-opt
wasm-opt version 130

Turns out that wasi-sdk and binaryen rely on the WebAssembly Exceptions extension. This is a reasonable thing to assume given that wasi-sdk mostly assumes you're building things for web browsers and 93.86% of browser users have a browser engine new enough to support it. C++ is also one of the main places where exceptions are used, so I guess WebAssembly-native exception handling removes a lot of boilerplate here.

Both wasmtime and wazero require you to opt into exception support with a flag. This is fine; we can just pass -W exceptions=y to wasmtime and use a custom runner harness for wazero. The annoying part is that my arm machine's anemic build of wasm-opt exits when it sees exception handling instructions, which made the build fail.

The solution was to pass --no-wasm-opt at the linking step. This removed one angle of irreproducibility.

Mara:

I guess in the future we could make it use the version of wasm-opt it just built to optimize the output, but that may be a premature optimization for now.

Clang relies on address layout for ordering things

The version of clang that I use to compile wasm2js has some address-sensitive code generation hiding in its exception handling path. Raw pointer values leak into the order a handful of try_table blocks come out in. This surfaces as every build differing from the next by about 29 bytes:

-002a9af0: 2802 0441 0647 0d00 1f40 0103 0820 0241  (..A.G...@... .A
-002a9b00: 206a 2103 2002 4138 6a20 0141 086a 10b5   j!. .A8j .A.j..
-002a9b10: 8881 8000 2104 0b1f 4001 0304 2003 2004  ....!...@... . .
+002a9af0: 2802 0441 0647 0d00 1f40 0103 041f 4001  (..A.G...@....@.
+002a9b00: 0309 2002 4120 6a21 0320 0241 386a 2001  .. .A j!. .A8j .
+002a9b10: 4108 6a10 b588 8180 0021 040b 2003 2004  A.j......!.. . .

To make this easier to spot, here's a partial disassembly:

  i32.load  offset=4            ;; 28 02 04
  i32.const 6                   ;; 41 06
  i32.ne                        ;; 47
  br_if     0                   ;; 0d 00
- try_table (catch_all_ref 8)   ;; 1f 40 01 03 08
+ try_table (catch_all_ref 4)   ;; 1f 40 01 03 04
+ try_table (catch_all_ref 9)   ;; 1f 40 01 03 09
    local.get 2                 ;; 20 02
    i32.const 32                ;; 41 20
    i32.add                     ;; 6a
    local.set 3                 ;; 21 03
    local.get 2                 ;; 20 02
    i32.const 56                ;; 41 38
    i32.add                     ;; 6a
    local.get 1                 ;; 20 01
    i32.const 8                 ;; 41 08
    i32.add                     ;; 6a
    call 17461                  ;; 10 b5 88 81 80 00
    local.set 4                 ;; 21 04
  end                           ;; 0b
- try_table (catch_all_ref 4)   ;; 1f 40 01 03 04
    local.get 3                 ;; 20 03
    local.get 4                 ;; 20 04

The computation is nearly identical, but the byte order is just different enough to also make the catch references differ. This also fires when you build this pinned version of wasm2js on arm64 machines because the pointer iteration order there is different from what it is on my workstation.
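As a reading aid for the byte annotations in the listing: integer operands in the WebAssembly binary format are LEB128-encoded, which is why `call 17461` appears as `10 b5 88 81 80 00` (opcode `10`, then the function index). The sketch below decodes such an operand. `decode_uleb128` is a hypothetical helper written for this illustration, not part of binaryen or wasi-sdk, and it only handles values that fit in 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one unsigned LEB128 value starting at bytes[pos] and advances
// pos past it. Each byte carries 7 payload bits, least significant group
// first; the high bit says whether another byte follows.
// Reads at most 5 bytes (enough for a 32-bit value) and stops at the end
// of the buffer; it does not report malformed input.
std::uint32_t decode_uleb128(const std::vector<std::uint8_t>& bytes,
                             std::size_t& pos) {
    std::uint32_t result = 0;
    int shift = 0;
    while (pos < bytes.size() && shift <= 28) {
        const std::uint8_t byte = bytes[pos++];
        result |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            break;  // high bit clear: this was the last byte
        }
        shift += 7;
    }
    return result;
}
```

Feeding it `b5 88 81 80 00` yields 17461, the same value the shorter encoding `b5 88 01` would produce; the operand in the listing is simply padded out to five bytes.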

To work around this, I took two steps:

  1. Disable address-space randomization for this build using setarch --addr-no-randomize.
  2. Create known good sha256 checksums for both x86_64 and arm64 via building this program on machines I trust.

I also made a CI job ensure this:

- name: Ensure reproducibility
  run: |
    cd ./utils/wasm/wasm2js
    ./build.sh
    if sha256sum -c --status shasums.x86_64; then
      echo "OK: rebuilt modules match the recorded x86_64 checksums"
    elif sha256sum -c --status shasums.arm64; then
      echo "OK: rebuilt modules match the recorded arm64 checksums"
    else
      echo "::error::rebuilt wasm2js/wasm-opt match neither recorded checksum set on ${{ matrix.runner }}" >&2
      sha256sum wasm-opt_130.wasm wasm2js_130.wasm
      exit 1
    fi

To be extra sure, we have this job run on both x86_64 and arm64 hosts. I'd really love to have this be reproducible across hosts, but that's an upstream LLVM bug that I am not powerful enough to tackle. If you work on LLVM and are reading this, it would be nice to set a seed of some kind to ensure that this iteration order is fixed across architectures.
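To make the failure mode concrete, here is a toy sketch (explicitly not LLVM's code) of how ordering by raw pointer value makes output depend on memory layout, while ordering by a stable index does not. `Block`, `emit_by_address`, and `emit_by_id` are hypothetical names invented for this illustration.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiler-internal object. `id` is a stable
// creation index that has nothing to do with where the object lives.
struct Block {
    int id;
    std::string label;
};

// Collects labels in the order the pointers currently appear.
static std::vector<std::string> labels(const std::vector<const Block*>& blocks) {
    std::vector<std::string> out;
    for (const Block* b : blocks) {
        out.push_back(b->label);
    }
    return out;
}

// Address-sensitive: the result depends on where each Block happens to
// be allocated, so it can change between runs, hosts, or architectures.
std::vector<std::string> emit_by_address(std::vector<const Block*> blocks) {
    std::sort(blocks.begin(), blocks.end(), std::less<const Block*>{});
    return labels(blocks);
}

// Deterministic: the result depends only on the stable ids.
std::vector<std::string> emit_by_id(std::vector<const Block*> blocks) {
    std::sort(blocks.begin(), blocks.end(),
              [](const Block* a, const Block* b) { return a->id < b->id; });
    return labels(blocks);
}
```

Both functions return the same set of labels; only `emit_by_id` guarantees the same order everywhere, which is the property a reproducible build needs.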

At the very least builds are deterministic within architectures. This may have to be good enough for now.

120 blocks, one story: The collective creation of the OCLC Quilt / HangingTogether

Collaboration. Serendipity. Diversity. These are the qualities that come to mind when I think about this year’s OCLC quilt and the community that created it.

The OCLC Quilters, a group of current and retired OCLC employees, have spent months creating a quilt of 120 cross-cut blocks to donate to the silent auction held during the ALA 2026 Annual Conference. The ALA BiblioQuilters annually host this auction as a fundraiser for the Christopher Hoy Scholarship, which awards a $5,000 scholarship each year to a U.S./Canadian citizen or permanent resident who is pursuing an MLS in an ALA-accredited program.

This is the fourth year in a row that the OCLC Quilters have donated a quilt to the silent auction. Their work inspired me to take up sewing about a year ago, and I’m proud to move from admirer to participant, contributing to the OCLC quilt for the first time. Although left-handed people are about 10% of the population, three of the 13 people contributing to the OCLC quilt, including myself, are left-handed. While that doesn’t affect the result, it requires a few adjustments in technique and having the appropriate scissors. Sharing advice on adapting equipment and shopping for left-handed supplies is one of the ways we support each other.

Nine people in a bright lobby pose with a large, colorful patchwork quilt featuring a grid of multicolored fabric squares; four sit on a blue bench in front and five stand behind the bench.

Nine of the 13 contributors to the OCLC Quilt for the ALA 2026 Annual Conference

Like all handicrafts, quilting is an activity with its own nomenclature. As a quilter and cataloger, I found myself wondering: “What controlled vocabulary terms could I use to describe the OCLC quilt?” There are several from vocabularies such as Library of Congress Subject Headings (LCSH) and Getty Art and Architecture Thesaurus (AAT). These are listed at the end of this blog.

A quilt is created from many elements that may not be individually significant but form a meaningful whole, just like a WorldCat bibliographic record. The blocks of the quilt function like data elements in a WorldCat record, with contributions from multiple individuals creating the larger work.

Assembling the quilt

OCLC quilters sewed 120 blocks, which are the fabric squares comprising the quilt’s front. The blocks are a cross-cut design—a pattern chosen because it is accessible for novice sewists and makes good use of small fabric pieces. Quilters often save these leftover pieces, called “scraps,” from other projects for future use. Reusing scraps makes quilting a sustainable craft, and quilters often share them with one another. An experienced OCLC quilter, who keeps her scrap collection organized in true librarian fashion, donated most of the fabric pieces used for the blocks.

Experienced quilters arranged and sewed the blocks together and cut the batting (soft material used between the front and back sides of the quilt). The next step, in which three layers are sewn together with a decorative stitch, is quilting. This is the strict definition of the term “quilting,” although it is often used to refer to the entire process of creating a quilt. The pattern used for the quilt stitching is called “modern ties,” and it looks a bit like tied shoelace loops.

Close-up of a colorful patchwork quilt made of bright, patterned fabric squares and cross-shaped strips; a green strip includes the white OCLC logo.

An OCLC logo is incorporated into one of the quilt blocks

The final step is to sew a long strip of binding fabric around the edges of the quilt, which will keep the ends from fraying as well as being decorative. Two labels were sewn into the binding: “Made in OH” and “Is it perfect? No.” Both of these labels are accurate descriptions of this quilt, but unlike in bibliographic descriptions, a certain amount of imperfection is not only tolerated but may be considered part of the quilt’s charm.

A quilting tradition at ALA Annual

The OCLC quilt will be one of many available at the ALA BiblioQuilters silent auction during ALA Annual in Chicago, Illinois. The BiblioQuilters were founded at the 1998 ALA Annual Conference in Washington, D.C. Since 2000, the BiblioQuilters have had a silent auction of quilts every year except 2020 and 2021 (because of the pandemic). The quilts are usually available to view and bid on near the registration area. If you are attending ALA in Chicago, I highly recommend you visit the auction table to view them. After ALA, you may be inspired to browse the shelves of your local public library for 746.46, the Dewey Decimal number for quilting.

Subject vocabulary terms

For those readers who appreciate quilting and metadata, the following controlled vocabulary terms reflect concepts discussed in this blog. You might even find it fun to match the concepts to the natural language descriptions!

Getty Art and Architecture Thesaurus terms

batting

binding (textile material)

blocks (quilt components)

fabric scissors

quilting

Library of Congress Subject Heading terms

Quilting

Sewing—Left-handed techniques

Textile fabrics


Author Interview: Cynthia Pelayo / LibraryThing (Thingology)

Cynthia Pelayo

LibraryThing is pleased to sit down this month with groundbreaking author and poet Cynthia Pelayo, who in 2022 became the first Puerto Rican and first Latina to win a Bram Stoker Award after her Crime Scene took the prize in the Poetry Collection category. Her Into The Forest And All The Way Through was a 2020 nominee, also in the Poetry Collection category, while her Children of Chicago was a 2021 nominee in the Novel category. Pelayo earned a BA in Journalism from Columbia College Chicago, an MS from Roosevelt University, and an MFA in Writing from the School of the Art Institute of Chicago. She is currently pursuing a PhD in English. Her MFA writing thesis, Lotería, was republished in 2023, winning an International Latino Book Award Silver Medal in the Best Collection of Short Stories category. A co-publisher of Burial Day Books, which focuses on horror writing, she is the author of numerous other books, stories and poems, including novels such as The Shoemaker’s Magician (2023), Forgotten Sisters (2024), and Vanishing Daughters (2025). Her new novel, It Came from Neverland, a work of horror inspired by the classic Peter Pan, was published by Crooked Lane Books earlier this month. Pelayo sat down with Abigail this month to discuss her new book.

Tell us a little bit about It Came from Neverland. How did the idea for the story first come to you?

Like many people, I grew up watching the Disney version of Peter Pan, and then I remember watching Hook with Robin Williams and being captivated, seeing Peter Pan as an adult who had to remember who he was. There was an older Wendy in that film, and that always stayed with me because I really wanted to know what Wendy’s story looked like aged into young adulthood.

When I went back to J.M. Barrie’s Peter and Wendy and the original play, I worked out that Wendy would be in her early twenties at the start of the First World War. And then I learned that many young men at that time lied about their age to enlist; some of them were barely more than boys. It all felt like a perfect juxtaposition, these boys going off to war and Wendy’s trauma around caring for the Lost Boys from Neverland.

A woman of that time period would certainly not be believed if she tried to tell the truth about what she experienced as a child in Neverland, and so that certainly played into her experience. Then I thought well, Wendy at this age would certainly be in a position that reflected her character, and schoolteacher fit perfectly. The story wrote itself, because I knew that Wendy would do all that she could to protect those children and I also knew Peter Pan would surely return to whisk more children away to Neverland. So this story is that tale, what does she do to stop Peter Pan.

What drew you to Peter Pan, and what made you feel it was horror?

Peter Pan without Wendy Darling is just a boy screaming into the dark. The story only works because Wendy agrees to go with him to Neverland, and she goes because she is sweet and kind and she believes him. Wendy’s only failure here was that she had a good heart, which is just so sad because her being nice is why she was taken advantage of. She trusted someone who was manipulating her. Peter told her she was special, but what he really meant was that she was useful. She was given the role of mother to the Lost Boys, not because she was truly loved or valued, but because someone needed to do the mending. This is not fantasy. This is a domestic horror story dressed in fairy dust.

In Peter and Wendy we’re also essentially told that growing up is a curse, but I push back on that. Growing up is the adventure, becoming yourself, and gaining autonomy is the gift.

Peter’s entire pitch was stay here, never change, never leave me. Shrink. Lose yourself to praise me. That’s not love. That’s control.

The horror was always there. I just removed the glitter.

What is it about fairy tales that speaks to you?

Fairy tales were the very first stories I was told as a child. “Little Red Riding Hood,” “Hansel and Gretel,” “Cinderella,” more. I hold all of them dear, even though many of them have a thread of terror, but I suppose that’s why I’m the writer I am today.

What I’ve come to understand is that fairy tales are an early societal warning system, in a way. They prepared us for danger, and that’s why they still work. Little Red Riding Hood is everyone who has been told not to talk to strangers on their way home. Bluebeard is everyone who has been warned to be cautious with suitors. Snow White is every person who has fallen victim to the cruelty of jealousy. These stories survived centuries because beneath them there is some truth that can be applied to many of our experiences today. They encode the things that we should say out loud, but don’t because of all of those strange polite society rules, things like – don’t trust the stranger who flatters you, the beautiful thing is likely the trap, or even, the person who promises you forever can very well be the one who seeks to destroy you.

When looking at all of these through a horror lens, they echo to the horror writer what is our job – and that is to tell the truth. Horror is the genre of truth, to highlight the danger, to be a witness to survival, more. So much of what fairy tales do speaks to this.

Are there other classic works you’re interested in transforming?

Yes, and the one upcoming is a Frankenstein retelling titled Everina from Union Square & Co. I have more, but I can’t mention those quite yet.

Tell us a little bit about your writing process.

I generally write in the early morning. In the evenings that’s when I tend to answer email or work on lectures for any workshops I’m teaching or work on any of my own homework.

In terms of my actual writing sessions, I read before I write, generally I will read some poetry and then start writing. If you’re asking about the big questions regarding how do I create something, I think I’m a pretty methodical writer in that yes, I allow discovery to happen, but as of late, there is a lot of researching, planning, and pre-writing that goes into my actual writing. Then there is editing, and that is an entirely different process. I tell writers to think of these processes as three separate demands: research and preparation require one aspect of your brain, editing requires a different aspect of your brain, and the actual writing is a completely different process. When you are writing you are creating, so allow yourself to have fun and explore when you’re actually writing.

What comes next for you?

Something Followed Us Home: Tales of Latiné Horror comes out September 29 from Simon & Schuster. It’s an anthology I edited that features Mariana Enríquez, Agustina Bazterrica, Mónica Ojeda, Isabel Cañas, Daniel José Older, Zoraida Córdova, and others, with a foreword by Brenda Lozano. Latiné horror isn’t a subgenre, it’s a tradition that I’m grateful to have had the opportunity to share with readers.

After that, Everina, which is my Frankenstein retelling.

Tell us about your library. What’s on your own shelves?

I read a lot of poetry and classics. My bookshelves reflect that, so we have Lorca, Vallejo, Pizarnik, Anne Sexton, Gwendolyn Brooks, Adrienne Rich, Dickens, Dostoevsky, Homer, Steinbeck, Tolstoy, more. Yes, there’s horror as well, Shirley Jackson, Angela Carter, Mariana Enríquez, Carmen Maria Machado, Daphne du Maurier. And, of course fairy tales, so many fairy tales and non-fiction texts analyzing fairy tales.

What have you been reading lately, and what would you recommend to other readers?

I am reading Piranesi by Susanna Clarke. I love world building, and here, we have a world that operates on its own logic, the architecture of the space, and the feelings it evokes. Clarke gives us infinite halls that become both prison and sanctuary, and it’s this tension that I’m drawn to as both a reader and writer.

Chatbots vs. Ozone / David Rosenthal

Back in February I posted The Kessler Syndrome, which also included a brief section mentioning the impacts of the proposed megaconstellations on the environment, specifically global warming from CO2 and black carbon, and depletion of the ozone layer. Three months earlier Anton Petrov had examined the last of these in Risk of Ozone Layer Destruction from Internet Satellite Swarms and Rocket Fuel. He has now followed up with SpaceX Is Conducting a Giant Chemical Experiment on Our Atmosphere Without Realizing. Below the fold I survey the papers Petrov cited and a few others.
The papers involved are, in date order, as follows together with extracts from their abstracts:

Impact of Rocket Launch and Space Debris Air Pollutant Emissions on Stratospheric Ozone and Global Climate by Robert Ryan et al (9th June 2022):
Rockets, unlike other anthropogenic pollution sources, emit gaseous and solid chemicals directly into the upper atmosphere. We compile inventories of these chemicals from rocket launches in 2019 and projections of future growth and speculative space tourism activity. We incorporate these in a 3D atmospheric chemistry model to simulate the impact on climate and the protective stratospheric ozone layer. We find that loss of ozone due to current rockets is small, but that routine space tourism launches may undermine progress made by the Montreal Protocol in reversing ozone depletion in the Arctic springtime upper stratosphere. The BC (or soot) particles from rockets are also of great concern, as these are almost five hundred times more efficient at warming the atmosphere than all other sources of soot combined.
Note that even four years ago it was already clear that the space industry was both depleting ozone and aggravating global warming. But this was before the scale of the proposed mega constellations was evident.

Metals from spacecraft reentry in stratospheric aerosol particles by Daniel Murphy et al (7th September 2023):
So far, models of spacecraft reentry have focused on understanding the hazard presented by objects that survive to the surface rather than on the fate of the metals that vaporize. Here, we show that metals that vaporized during spacecraft reentries can be clearly measured in stratospheric sulfuric acid particles. Over 20 elements from reentry were detected and were present in ratios consistent with alloys used in spacecraft. The mass of lithium, aluminum, copper, and lead from the reentry of spacecraft was found to exceed the cosmic dust influx of those metals. About 10% of stratospheric sulfuric acid particles larger than 120 nm in diameter contain aluminum and other elements from spacecraft reentry. Planned increases in the number of low earth orbit satellites within the next few decades could cause up to half of stratospheric sulfuric acid particles to contain metals from reentry.
Much of the reentry burn happens above the stratosphere, and it takes time for the aluminum nanoparticles to drift down to the levels where they were collected. So the 10% number represents pollution from an earlier period with fewer reentries than the 2020s. Murphy notes that:
Most of the meteoric mass is deposited at altitudes between 75 and 110 km by a very large number of sub-millimeter meteoroids. Reentering spacecraft, which are larger and moving more slowly, ablate between 40 and 70 km over a ~300 km long footprint
His samples were collected at 19km altitude.

Potential Ozone Depletion From Satellite Demise During Atmospheric Reentry in the Era of Mega-Constellations by José P. Ferreira et al (11th June 2024):
This paper investigates the oxidation process of the satellite's aluminum content during atmospheric reentry utilizing atomic-scale molecular dynamics simulations. We find that the population of reentering satellites in 2022 caused a 29.5% increase of aluminum in the atmosphere above the natural level, resulting in around 17 metric tons of aluminum oxides injected into the mesosphere. The byproducts generated by the reentry of satellites in a future scenario where mega-constellations come to fruition can reach over 360 metric tons per year. As aluminum oxide nanoparticles may remain in the atmosphere for decades, they can cause significant ozone depletion.
Ferreira et al confirm the potentially long delay between reentry and the nanoparticles reaching the ozone layer and depleting it:
we find that these reentry byproducts may take up to 30 years to settle from the top of the mesosphere into the stratospheric ozone layer. Upon reaching an altitude of about 40 km, aluminum oxides catalyze chlorine activation which promotes ozone depletion. This suggests that concentrations of aluminum oxide compounds may start increasing in the mesosphere well before reaching the stratospheric ozone layer. This would introduce a noticeable delay between the beginning of the injection process when orbiting bodies are decommissioned and the eventual ozone-depletion consequences in the stratosphere.
Investigating the Potential Atmospheric Accumulation and Radiative Impact of the Coming Increase in Satellite Reentry Frequency by Christopher Maloney et al (21st March 2025):
A lack of observations and validated models of reentry demise limits our ability to simulate the complex aerosols associated with reentry, which makes estimating the climate impacts difficult. Aluminum is a primary satellite component and will likely be emitted during reentry vaporization in the form of alumina. Unmodified alumina is a useful approximation for metallic reentry aerosol. In this study, we simulate a potential yearly emission of 10,000 metric tons of alumina from reentering space debris. We investigate how the location of atmospheric accumulation, aerosol size distribution, and radiative properties of reentry alumina impacts the middle atmosphere. We find that 20,000–40,000 metric tons of alumina accumulates at high latitudes between 10 and 30 km in both hemispheres. Small changes in mesospheric heating rates lead to 1.5-K temperature anomalies in the middle atmosphere at high latitudes. These temperature anomalies are accompanied by changes in wind speed in the polar vortex.
So there are thermal effects on the climate as well as the effects on the ozone layer.

Near-future rocket launches could slow ozone recovery by Laura Revell et al (9th June 2025):
To understand if significant ozone losses could occur as the launch industry grows, we examine two scenarios. Our ‘ambitious’ scenario (2040 launches/year) yields a −0.29% depletion in annual-mean, near-global total column ozone in 2030. Antarctic springtime ozone decreases by 3.9%. Our ‘conservative’ scenario (884 launches/year) yields −0.17% annual, near-global depletion; current licensing rates suggest this scenario may be exceeded before 2030. Ozone losses are driven by the chlorine produced from solid rocket motor propellant, and black carbon which is emitted from most propellants. The ozone layer is slowly healing from the effects of CFCs, yet global-mean ozone abundances are still 2% lower than measured prior to the onset of CFC-induced ozone depletion. Our results demonstrate that ongoing and frequent rocket launches could delay ozone recovery. Action is needed now to ensure that future growth of the launch industry and ozone protection are mutually sustainable.
Note that this paper addresses only the ozone depletion from launches, not from reentry. But their 'ambitious' scenario of 5.6 launches/day is far short of Musk's ambitions, let alone the other planned megaconstellations. My understanding is that the 2040 launches/year in their scenario are of Falcon 9 class vehicles but "only 4.4% of launches are using vehicles designed for re-entry", which is implausible. But the mega-constellations can't be built or maintained with Falcon 9s.

Will Lockett is, as one should be, skeptical of Musk's claims. In Musk’s Orbital Data Centre Idea Is Getting More Stupid By The Day he analyzes the claimed "million satellite data center" assuming it is built, as Musk claims, with Starship but over 15 years, a longer timescale than Musk's:
To achieve that, they would need to launch 120,000 satellites per year. Over the 15 years, they would launch 1.8 million satellites, but 800,000 of them would fail (as part of our 9% failure rate), leaving a total operational fleet of one million satellites. This equates to 3,158 Starship launches per year, or nearly nine launches per day. For some context, the current launch rate for Starship is just five per year.
...
In order to keep a million satellites in the constellation, it needs to be maintained. So, each year, SpaceX would have to launch 90,000 AI Sat Minis to replace the roughly 9% of the constellation that failed. That equates to 2,368 Starship launches per year, or 6.4 per day.
That's 9 launches/day for 15 years, then 6.4 launches/day indefinitely, of a rocket that is vastly bigger than Falcon 9 and is completely re-usable.
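The per-day rates quoted in this post can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a 365-day year, takes the launches/year figures directly from the Revell et al abstract and the Lockett extract, and the names `launches_per_day` and `SCENARIOS` are made up for this example.

```python
DAYS_PER_YEAR = 365  # assumption: no leap-year adjustment


def launches_per_day(launches_per_year):
    """Convert an annual launch count into an average daily rate."""
    return launches_per_year / DAYS_PER_YEAR


# Launches per year as quoted in the post (labels are illustrative).
SCENARIOS = {
    "Revell 'conservative'": 884,
    "Revell 'ambitious'": 2040,
    "Lockett build-out": 3158,
    "Lockett maintenance": 2368,
}

for name, per_year in SCENARIOS.items():
    print(f"{name}: {launches_per_day(per_year):.1f} launches/day")

# Satellites per Starship launch implied by Lockett's figures.
build_out_sats_per_launch = 120_000 / 3158
maintenance_sats_per_launch = 90_000 / 2368
print(f"implied satellites per launch: {build_out_sats_per_launch:.1f} "
      f"(build-out), {maintenance_sats_per_launch:.1f} (maintenance)")
```

Both of Lockett's phases imply roughly 38 satellites per Starship launch, so his two figures are internally consistent. With a 365-day year the maintenance rate works out closer to 6.5 launches/day than the quoted 6.4, a difference that does not change the argument.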

Of course, these claims are ridiculous - neither logistically nor economically feasible. But assuming Starship or a competitor such as Blue Origin does manage to create a reliable, reusable, 100 ton to LEO launch vehicle, there will be a lot more mass in LEO and a lot more of it reentering.

Measurement of a lithium plume from the uncontrolled re-entry of a Falcon 9 rocket by Robin Wing et al (19th February 2026):
A 10-fold enhancement of lithium atoms was detected at 96 km altitude by a resonance lidar at Kühlungsborn, Germany, approximately 20 hours after the uncontrolled re-entry of a Falcon 9 upper stage. The upper-atmospheric extension of the ICON general circulation model, nudged to ECMWF, was used to calculate winds. Backwards trajectories, including wind variability as measured by radar, traced air masses to the Falcon 9 re-entry path at 100 km altitude, west of Ireland. This study presents the first measurement of upper-atmospheric pollution resulting from space debris re-entry and the first observational evidence that the ablation of space debris can be detected by ground-based lidar. The analysis of geomagnetic conditions, atmospheric dynamics, and ionospheric measurements supports the claim that the enhancement was not of natural origin. Our findings demonstrate that identifying pollutants and tracing them to their sources is achievable, with significant implications for monitoring and mitigating space emissions in the atmosphere.
The effect of lithium and other spacecraft ingredients on the ozone layer doesn't appear to have been studied to the extent that aluminum's has. To be fair, there will be a lot more aluminum.

Radiative Forcing and Ozone Depletion of a Decade of Satellite Megaconstellation Missions by Connor Barker et al (14th May 2026):
We use a global inventory of launch and re-entry emissions covering the onset of the megaconstellation era (2020–2022), and project these to 2029 based on 2020–2022 growth rates. We implement this inventory into a 3D atmospheric chemistry model to determine the impacts of megaconstellations on the ozone layer and climate. We find that global stratospheric ozone depletion from all mission types is relatively small compared to surface sources and megaconstellation missions only account for about one-tenth of this depletion. This is because rockets launching megaconstellations almost all use kerosene, a large source of black carbon or soot particles, but not of chemicals such as chlorine that directly destroy ozone. Soot from rockets absorbs sunlight, warming the upper layers of the atmosphere and decreasing the amount of sunlight reaching Earth's lower atmosphere, causing it to cool. Megaconstellation missions are responsible for about half of this climate effect. In this regard, rockets launching megaconstellations and other missions are like small-scale stratospheric aerosol injection experiments without forethought for potential unintended consequences.
Again, this paper addresses only atmospheric impacts from launches, not from reentries. And, the launch rate for 2020-2022 is far less, and uses much smaller rockets, than the proposed "million satellite data center" and its competitors.

An Open Hardware TPU on Your Desk / Harvard Library Innovation Lab

The open-source movement emphasizes the power of freely modifiable, flexible code to support transparency, collaboration, and building outside vendor lock-in. Open hardware extends that logic to the physical layer: chips you can read, modify, and build on. All software runs on hardware, and over the past few years, the ground under the hardware industry has been shifting. Since 2022, the United States, the Netherlands, and Japan have progressively tightened export controls on advanced chips and the equipment used to manufacture them; China has responded with a state-backed effort to reproduce every layer of that supply chain at home. Policy analysts now routinely describe the trajectory as a “fragmentation” or “decoupling” of the global semiconductor market into separate technology spheres. Hardware costs are climbing across the board, so accessing open hardware feels all the more relevant for a group building open-source software.

Open hardware doesn’t make the chips any cheaper. What it changes is what a chip you already own is allowed to become or what kinds of application-specific chips you can create. A single FPGA (Field-Programmable Gate Array) on a shelf can be a video codec one day, a custom search accelerator the next, and a faithful copy of a decommissioned architecture the day after that. The unit cost is what it is; the value you can extract from that unit is no longer fixed by a vendor. For institutions whose time horizons stretch over decades, like libraries, archives, and public-interest research groups, that reconfigurability is the core of the open-hardware argument, and its value compounds over time.

There are no known open-source hardware implementations of Google’s TPU, the in-house chip family Google designed to accelerate neural network math. The silicon is proprietary and not directly purchasable by consumers except in edge TPU form (e.g., Google Coral). This field note reports a small experiment porting the OpenTPU project, a Python simulation of Google’s TPU published by UCSB’s ArchLab, to an inexpensive FPGA board to explore the pros and cons of using more open hardware. To keep the experiment lightweight, most of the code was written by AI coding agents, with a human directing the work.

Why FPGAs, For A Lab Like This One

An FPGA (Field-Programmable Gate Array) is a chip whose internal logic can be reconfigured. You describe the circuit in a hardware description language such as SystemVerilog, compile it to a bitstream, and load it onto the board. A CPU runs your program. An FPGA becomes your program. FPGAs are the obvious place to start, because they’re the one piece of reconfigurable silicon a small lab can actually buy and program today.

The practical difference lies in the shape of the problems they solve. A CPU is a generalist reading a manual one step at a time. A GPU is a factory floor of thousands executing the exact same standard math in unison. An FPGA is a machine whose gears are configured to fit exactly one algorithm. They excel at problems with strange shapes. If your workload involves multiplying massive, uniform matrices to train a language model, you want a GPU. But if your work requires parsing millions of irregular text strings as they stream, finding exact bit-level matches across a sprawling archive, or piping data through a custom hash without ever pausing to fetch instructions from memory, an FPGA is arguably the more elegant approach. A dataset is sometimes only as legible as the software that reads it, and that software is only as runnable as the hardware underneath. An open FPGA design can preserve a faithful copy of obsolete circuitry, thereby preserving the means to read data, not just the data itself.

Diagram illustrating FPGA architecture. Source: Wevolver

For an institution like the lab, that distinction matters. Libraries, archives, public-interest research groups, and smaller labs deal with workloads and questions that are awkwardly sized: too large for a laptop, too specialized to deserve a recurring cloud bill, too long-running for any one grant cycle. For a well-resourced lab, the answer is cloud GPUs. For everyone else, the answer has historically been to wait or to scale down the question. That is why open hardware matters here: it gives smaller institutions a path to specialized computing without waiting for ideal market conditions.

A $300 board on a desk can run a custom-designed circuit, tailored to one workload, indefinitely, without anyone’s permission and without a metered bill. A physical circuit sheds the overhead of an operating system and the constant fetch-and-exec cycle, drawing a fraction of the power of a standard CPU. For a public-interest lab, this efficiency makes running some workloads both financially and environmentally sustainable.

TPU-style architectures are particularly well-suited to this kind of work because of the systolic array at their core. A systolic array is a grid of small multiply-add units that pass partial results to their neighbors on every clock tick, making it efficient for matrix math because data flows through the grid rather than being fetched repeatedly from memory. That structure is designed for exactly the dense-matrix operations that underpin embedding generation, similarity search, and the neural network inference behind modern document analysis. This is the kind of regular, spatial structure FPGAs are built to host. A grid of identical small units, wired to their neighbors, all ticking together, is close to a literal description of what an FPGA’s fabric already is.
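To make the data flow concrete, here is a minimal pure-Python sketch of a systolic array computing a matrix product one clock tick at a time. It is not OpenTPU's code. It assumes a weight-stationary layout (each cell holds one weight, activations move left to right, partial sums move top to bottom), which is the arrangement commonly described for Google's first-generation TPU, and the names `systolic_matmul` and `reference_matmul` are hypothetical.

```python
def reference_matmul(X, W):
    """Plain triple-loop product, used only to check the systolic version."""
    return [[sum(x_row[r] * W[r][c] for r in range(len(W)))
             for c in range(len(W[0]))] for x_row in X]


def systolic_matmul(X, W):
    """Compute X @ W on a K x N grid of multiply-add cells, tick by tick.

    X is B x K (one input vector per row), W is K x N. Cell (r, c) holds
    W[r][c]. On every tick each cell takes an activation from its left
    neighbour and a partial sum from the cell above, then latches both.
    """
    B, K, N = len(X), len(W), len(W[0])
    act = [[0] * N for _ in range(K)]   # activation register per cell
    psum = [[0] * N for _ in range(K)]  # partial-sum register per cell
    Y = [[0] * N for _ in range(B)]

    for t in range(B + K + N - 2):
        new_act = [[0] * N for _ in range(K)]
        new_psum = [[0] * N for _ in range(K)]
        for r in range(K):
            for c in range(N):
                if c == 0:
                    # Left edge: row r is fed vector (t - r), a skewed schedule.
                    b = t - r
                    a_in = X[b][r] if 0 <= b < B else 0
                else:
                    a_in = act[r][c - 1]
                p_in = psum[r - 1][c] if r > 0 else 0
                new_act[r][c] = a_in
                new_psum[r][c] = p_in + a_in * W[r][c]
        act, psum = new_act, new_psum

        # Bottom edge: column c emits the finished sum for vector t-(K-1)-c.
        for c in range(N):
            b = t - (K - 1) - c
            if 0 <= b < B:
                Y[b][c] = psum[K - 1][c]
    return Y


X = [[1, 2], [3, 4], [5, 6]]
W = [[1, 0, 2], [0, 1, 3]]
assert systolic_matmul(X, W) == reference_matmul(X, W)
```

Each cell only ever reads the registers of its left and upper neighbours, and the weights never move. That is why the grid needs no repeated memory fetches once the weights are loaded.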

Designs published openly in SystemVerilog are reusable across institutions, as open code is. What has kept this out of reach was never the silicon; it was the labor: months of vendor-tool learning, a small group of knowledgeable practitioners, and debugging cycles unique to physical systems. That cost is what AI coding agents have started to chip away at, making the open-hardware case more practical.

Porting OpenTPU and the Silicon Boundary

OpenTPU is a published academic re-implementation of Google’s first-generation TPU, written in Python code that models the hardware’s behavior rather than software meant to run on that hardware. The hardware target of this experiment was the Alchitry Pt v2, a roughly $300 board built on a Xilinx Artix-7 chip, with an add-on that exposes USB 3.0 to a host PC. I worked with AI coding agents to describe goals in plain English, review edits, and iterate. A sandbox repository came first: blink an LED, echo bytes, talk to the USB chip. That groundwork paid off within hours of starting the real port. The OpenTPU translation itself (the systolic matrix unit, the weight memory, the instruction decoder, the activation logic, and the host interface) went through seven planned phases. A few days of wall-clock time later, the SystemVerilog testbench ran the same matrix-multiply program as the original Python simulator and produced the same answer.

Photo of the Alchitry Pt v2 FPGA board used in the experiment. Source: Jenevieve Haggard, HLS LIL

Translating Python that describes hardware into SystemVerilog that synthesizes hardware turns out to be something AI agents are somewhat capable of. The source is unambiguous, and the testbenches give instant feedback. The hard part was always the silicon boundary: a physical board, a vendor toolchain with subtle caching behavior, a USB chip with several gotchas. What worked was a scaffold around the agent that provided deterministic simulation tests for everything testable in software, plus on-board LEDs wired to specific finite-state-machine states, so a human eye could see what the testbench could not. A small notes tool called “cq” (by Mozilla) recorded what each session learned and made future sessions read those notes first. By the end, the store held 48 entries, almost all painfully, expensively earned.

Performance Realities and the USB Latency Bottleneck

By the end, the FPGA produced output that matched a NumPy reference lane-for-lane on a 200-vector benchmark. End-to-end, it was about 2x faster than the original Python simulator; on compute alone, about 4x. It was also comfortably slower than a CPU running optimized BLAS on small inputs. The point of the result is not that the FPGA wins everywhere, but that it shows where open hardware can be useful.

At this workload size, USB round-trip latency dominates the FPGA’s time budget. FPGAs win when data is moved to the device once, stays there, and is processed many times. Tiny inputs in, tiny outputs out, on every call, is the worst case for this design. We are running the worst case since we’re just at the “make it work” stage. PCIe-attached FPGAs sit much closer to the CPU and avoid this bottleneck entirely; with batched workloads on that kind of board, the compute-side 4x advantage we already see should carry through to the end-to-end number. They’re the natural next step for any workload where the dev-board numbers are encouraging enough to justify the cost.
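A toy cost model shows why per-call round trips dominate small workloads and why batching helps. Every number below is a made-up placeholder, not a measurement from this experiment, and `end_to_end_seconds` is a hypothetical helper.

```python
def end_to_end_seconds(n_vectors, batch_size, round_trip_s, compute_per_vector_s):
    """Total time when every host call pays one fixed round-trip cost."""
    calls = -(-n_vectors // batch_size)  # ceiling division
    return calls * round_trip_s + n_vectors * compute_per_vector_s


# Placeholder figures chosen only to illustrate the shape of the trade-off.
N_VECTORS = 200
ROUND_TRIP_S = 1e-3          # assumed fixed latency per host call
COMPUTE_PER_VECTOR_S = 1e-5  # assumed on-device compute per vector

for batch in (1, 10, 200):
    total = end_to_end_seconds(N_VECTORS, batch, ROUND_TRIP_S, COMPUTE_PER_VECTOR_S)
    print(f"batch size {batch:>3}: {total * 1e3:.1f} ms end to end")
```

Under these assumptions the compute term is identical in every row; only the number of round trips changes. A lower-latency link shrinks `round_trip_s`, and batching shrinks the number of calls.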

Open Hardware and the Lowered Cost of Specialization

The lab occasionally runs computations large enough to be uncomfortable: searches over case law corpora and analyses across millions of documents. A pattern for designing a small piece of custom hardware that does those tasks well is a real option for problems that do not fit on a laptop and do not deserve a cloud bill.

More than preservation, open hardware offers freedom of access and the power to control your own collections, your own work, and your own computing. That opens up questions we are only beginning to ask. As archival media like silica and DNA-based storage mature, could open hardware lower the cost or complexity of building the readers required for those formats? As AI model architectures keep shifting, could a reconfigurable board keep pace in a way fixed silicon can’t? FPGAs already take on inference tasks at places like CERN (see their FPGA Developers Forum), adapting to software advances with matching hardware; could FPGAs do the same for our tasks?

AI coding agents are reducing the cost of specialized work that used to require a dedicated team. Custom hardware has historically been among the most specialized. If a small library research lab can produce a working SystemVerilog port of a real architecture in a few weeks, the question of what else has quietly come into reach is suddenly much broader.

From Inherited Systems to Strategic Decisions / Information Technology and Libraries

The author examines the migration of Indiana University Libraries’ interlibrary loan platform, ILLiad, from a locally-hosted server to OCLC hosting through the perspective of a new department head inheriting this critical technology decision. He explores how staffing changes, lost institutional knowledge, recurring system instability, and limited technical capacity prompted a reassessment of long-standing local practices. The piece outlines research, consortium consultation, approval processes, implementation challenges, authentication and workflow issues, and post-migration tradeoffs. Ultimately, the author offers practical guidance for new leaders tasked with managing inherited systems, vendor relationships, imperfect information, and strategic change in complex academic library environments.

Locked Out of the Library / Information Technology and Libraries

While incarcerated students face many challenges when commencing higher education, a lack of access to the internet is a considerable barrier. This technological exclusion has implications for the delivery of course materials, most of which are offered only electronically. A project team from Curtin University Library sought to understand and address the challenges faced by incarcerated students in accessing library services, particularly ebooks and audiovisual content. It was found that restrictions related to contract terms, digital rights management, and copyright contribute to a reactive and uncertain situation for library services. This article outlines the state of the problem and offers possible pathways academic libraries can take to improve the state of information access for incarcerated students.

Improving Database Discovery and Understandability by Identifying and Reducing A–Z List Jargon / Information Technology and Libraries

Countless research questions arise when investigating connections between library resource discovery and student success. Existing literature explores best practices of database description language and style, the usability of database A–Z lists, and library resource jargon. Academic libraries continue to grapple with these challenges in resource discovery, even as online searching behavior evolves and new research tools emerge. A research team at the University of Arizona Libraries builds on the literature by examining these topics with a focus on the impact of a user’s academic discipline, university affiliation (faculty, staff, or student), and research experience on their understanding of database terminology, resource content and applications, and A–Z list type filters. The authors conducted an environmental scan of library websites along with several usability tests to identify and reduce library and disciplinary jargon on their A–Z list to make databases more understandable and approachable to all users. This article presents the results of these assessments as a case study for exploring external and internal factors that impact users’ understanding and discovery of databases.

Improving Accessibility of Electronic Course Reserve PDFs to Users with Disabilities at Hunter College Library / Information Technology and Libraries

By April 2027 and 2028, institutions covered by Title II of the Americans with Disabilities Act are expected to be legally required to ensure that digital content created or used at the institution is accessible as defined by Web Content Accessibility Guidelines (WCAG) 2.1 Level AA. The new law strongly emphasizes accessibility of course materials, including PDFs. This case study demonstrates how staff at an R2 academic library can enhance the accessibility of PDF course materials by improving the accessibility of electronic reserves (e-reserves) PDFs at Hunter College Library (HCL).

Processes described here can be adapted by other libraries. Supporting campuses’ work to make course readings accessible may be a natural role for academic libraries. Locating or procuring the best quality version of a text available to the institution is a critical task for which libraries are optimally equipped. Furthermore, when readings are available only in print format, libraries can create higher-quality scans than those typically produced when the task is left to individual faculty members.

HCL began improving the accessibility of e-reserves PDFs in 2020. This article shares the knowledge acquired, established processes, limitations, and future directions. The workflow comprises checking each e-reserves reading. For those deemed poor, we locate an HCL collection or open access copy, purchase a digital copy, or remediate. Remediation involves optical character recognition (OCR), fixing errors therein, correcting reading order, removing repetitive headers and footers, and tagging. Literature the authors found on libraries proactively correcting OCR and tagging PDFs—that is, preceding a user’s request—was sparse, with the exceptions of the University of Toronto and the University of Michigan. Literature about proactively doing so for e-reserves was even narrower. This case study is intended to help fill the gap.

Generative AI Meets Cataloging Practice / Information Technology and Libraries

This study evaluates the performance of four generative AI models—ChatGPT, DeepSeek, Gemini, and Copilot—in generating descriptive metadata for bibliographic resources. Models were tested on a small, diverse set of resources using four prompt types: a basic prompt, a basic prompt with an example, a detailed prompt referencing Resource Description and Access (RDA) guidelines, and a detailed prompt with an example. Results show that both detailed RDA guidance and the inclusion of sample outputs improved metadata quality, particularly in formatting and field structure. While DeepSeek and ChatGPT showed better performance on the tasks, all models displayed limitations in parsing and following the prompts, using descriptive metadata fields, analyzing subject headings, and assigning URIs. These findings suggest that while generative AI holds potential to assist in metadata creation, its current capabilities fall short of meeting cataloging standards without human review.

Case Study of the Implementation of AI Primo Research Assistant (Beta Version) in Academic Libraries in Poland / Information Technology and Libraries

One of the generative artificial intelligence tools developed for use in libraries, including academic libraries, is the AI Primo Research Assistant. Of the 65 academic libraries in Poland, only 19 have access to software that supports this tool. In practice, only 9 libraries have implemented it (data from March 2025). For the purposes of this study, original research was conducted to assess the implementation status of the Primo Assistant in academic libraries in Poland. Two anonymous surveys were developed for this purpose and sent to libraries that had implemented the feature, as well as to those with the capability to run the Primo Assistant (i.e., the Primo VE Discovery admin role), in order to gather information on why they had chosen not to implement it. The analysis revealed several positive aspects, mainly a reduction in the workload of staff tasked with preparing publication lists on topics requested by library users. Some concerns were also raised by library employees, mainly regarding the reliability of the metadata provided and the accuracy of the recommended publications. The study also revealed a general lack of awareness and a need for further implementation. This paper presents the first scientific study focused on the implementation of the AI Primo Research Assistant in Polish academic libraries.

Enhancing Information Technology Governance at the University of Riau Library / Information Technology and Libraries

Effective information technology (IT) governance is essential for the University of Riau (UNRI) Library to achieve its research and educational objectives. This paper presents a qualitative pilot study investigating the library’s current IT governance processes, focusing on two COBIT 5 processes—DSS01 (Manage Operations) and DSS05 (Manage Security Services). These processes were selected in consultation with library and IT leadership due to their direct relevance to ensuring operational reliability and safeguarding the library’s information assets. COBIT 5 principles and capability models guide the assessment, emphasizing regulatory compliance, performance monitoring, and stakeholder collaboration. Using a detailed questionnaire and capability model, the study evaluates base practices and work products for DSS01 and DSS05. Results indicate varying proficiency levels, with DSS01 at level 0 and DSS05 at level 1, highlighting significant gaps between current and desired capability levels. Recommendations include implementing standard operating procedures, enhancing security measures, and optimizing resource management. In conclusion, the findings underscore the need for standardized processes, continuous monitoring, and alignment with established frameworks like COBIT 5. By addressing identified gaps and implementing recommended improvements, the UNRI Library can strengthen its IT governance, enhance operational efficiency, and better support its academic mission.

Access Reframed / Information Technology and Libraries

This study critically explores the transformative potential of human-computer interaction (HCI) in reimagining African public libraries as dynamic, user-centered, and culturally grounded spaces. Based on a literature review and comparative analysis of libraries across several African countries, the research investigates how HCI principles can enhance user engagement, usability, and inclusivity, particularly in multilingual, resource-constrained, and postcolonial contexts. The paper situates libraries as sociotechnical infrastructures that mediate between technology, local knowledge systems, and community needs, and argues for the importance of participatory and culturally responsive design approaches in library digitization efforts. The findings highlight significant gaps in current implementations of HCI within library services, including the lack of localized interfaces and limited user involvement in design processes. The study concludes by offering practical recommendations for integrating HCI into library development strategies and advocating for the co-creation of digital public spaces that reflect and empower Africa’s diverse knowledge ecologies. In doing so, the paper contributes to the growing discourse on decolonial approaches to technology and the future of public libraries in the digital age.

The Kids Are All Right / Dan Cohen

A banner that says "2026"

Writing has been light around here recently for a wonderful reason: our twins graduated from their respective colleges over the past month, and we have been in nearly nonstop revelry (and packing, and schlepping…). We are so fortunate to have two great kids; I’m super proud of them.

Speakers at our kids’ commencements, thankfully and remarkably, said little about artificial intelligence, but they did talk a lot about the complex circumstances and especially the psychology of this rising generation, and offered advice on how the graduating seniors should move forward in life given significant headwinds. I suppose it’s tempting to describe and analyze the troubles facing each graduating class, and provide sage guidance in response to the historical moment, but I’m not sure that my kids, their friends, and their generation overall are so very different from any other, or that any distinct advice is needed.

The Great Class of 2026 is, I’m afraid, just like every graduating class: happy and sad, confused and hopeful about the future, striving and procrastinating. Young adults, in other words. Sure, they seem to be impacted by new technology and our dreadful national politics and nerve-racking global challenges, but hasn’t it always been so? My college class graduated into a recession, the rise of the internet, the fall of the Berlin Wall, the chaotic end of the Soviet Union, and a messy war in the Middle East — all of these dominoes falling after a childhood in which we were fairly sure we would perish at any moment in a nuclear war. That was a lot to absorb! Back then, commencement speakers picked up on our anxiety, which had apparently morphed into excessive irony and a general lack of motivation, epitomized by the title and content of a Richard Linklater film: Slacker.

It may have taken some time, but we muddled through. So did the generation another turn of the clock back from ours (Vietnam, stagflation, etc.) and the generations before that (pick your World War and/or the Great Depression, etc.). History is, unfortunately, a procession of horrible developments, but also a showcase of astonishing resilience and creativity. Is it so Pollyannaish to simply say that Gen Z will also find a way forward, and frankly might be better off without pithy advice from the olds? Must we unconsciously mimic the opening of Woody Allen’s fictional commencement address, raising the graduating class’s blood pressure by declaring, “More than at any other time in history, mankind faces a crossroads. One path leads to despair and utter hopelessness. The other, to total extinction. Let us pray we have the wisdom to choose correctly”?

Instead, I saw hope in every joyful row of begowned seniors, students who, despite all of the radical changes and stressful tensions around them, had nevertheless maintained their curiosity and maybe even cultivated a passion during college. Students who found their special niche in music, writing, art, or science, who felt compelled to listen to it all, read it all, see it all, or experiment late into the night, regardless of the requirements of the classroom. I have a feeling that this kind of deep and abiding engagement, born not from careerism but from genuine, profound interest, will serve these graduates well in the years ahead. As it always has.


Books I Have Not Written

The class-action lawsuit of authors against Anthropic and its subsequent settlement have helpfully informed me of the many, many other writers named Daniel Cohen, because the settlement administrators, in their quest to match authors and texts, have sent emails and letters asking if I am the Dan Cohen who wrote this or that book. There are too many volumes by The Daniel Cohens to list in full here, but as a public service to a handful of special fellow Dans, I hereby declare:

I am not the Daniel Cohen who wrote The Monsters of Star Trek, but I would wager 100 quatloos on Triskelion that I would greatly enjoy meeting that Dan Cohen.

I am #$%@# mad I am not the Daniel Cohen who penned Famous Curses, because my family is on a mission to bring back the useful exclamation “Gordon Bennett!”

I did not write Southern Fried Rat and Other Gruesome Tales, but, based on the delightful cover of this not-me Daniel Cohen book, I probably read it at camp the year it was published.

My final confession: The settlement administrators believe there is a Daniel Cohen who authored a book titled Final Confession, but, alas, I am not the one.

My conscience is now clear.


Tree 2 / Ed Summers

Tree 2

Same as Tree 1 but after playing with some filters on my Android phone before uploading.

Tree 1 / Ed Summers

Tree 1

A reflection of a tree in the Northwest Branch river, cropped and turned upside down.