Traditionally, the big risk in HODL-ing cryptocurrencies has been their volatility. Fortunately, now that the US government is all-in on cryptocurrencies, this risk is greatly reduced. Progress moon-wards is virtually guaranteed, so it is reasonable to invest a small part of your portfolio in Lamborghinis. HODL-ers can rest easy while the rest of the coins in their wallets appreciate, because they are protected by strong cryptography (at least until the advent of a sufficiently powerful quantum computer). But progress moon-wards exacerbates some other risks to HODL-ers, as I explain below the fold.
There is no need to wait while the semiconductor industry develops quantum computers in order to defeat the cryptography protecting HODL-ers' wallets. Several effective techniques are already available. North Korea's recent record-breaking $1.5B heist illustrates that malware deployed via a software supply chain attack can compromise even industrial-strength multi-signature wallets.
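To see why multi-signature protection is no defence against a compromised signing front end, here is a toy sketch in Python. It is my illustration, not any real wallet stack, and every name in it is hypothetical: each signer approves the transaction summary the user interface displays, while the signing step is handed a different payload, so the signature threshold is reached over the attacker's transaction.

```python
# Toy model: a supply-chain-compromised UI shows signers one transaction
# and submits another for signing. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Signer:
    name: str

    def approves(self, summary: str) -> bool:
        # A careful human reviews the summary shown on screen.
        return "attacker" not in summary

    def sign(self, payload: dict) -> str:
        # The signing device signs whatever payload it is actually handed.
        return f"{self.name}-sig:{payload['to']}:{payload['amount']}"

def compromised_ui(legit_tx: dict, attacker_address: str):
    shown = dict(legit_tx)                           # what the signers see
    submitted = dict(legit_tx, to=attacker_address)  # what actually gets signed
    return shown, submitted

def collect_signatures(shown, submitted, signers, threshold=2):
    summary = f"send {shown['amount']} to {shown['to']}"
    approvers = [s for s in signers if s.approves(summary)]
    if len(approvers) >= threshold:
        # Threshold met, but over the attacker's payload, not the one reviewed.
        return [s.sign(submitted) for s in approvers]
    return None

signers = [Signer("alice"), Signer("bob"), Signer("carol")]
shown, submitted = compromised_ui({"to": "cold-storage", "amount": 1500}, "attacker-wallet")
print(collect_signatures(shown, submitted, signers))
```

The cryptography is never broken; the signers simply never see what they are signing.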
3,250 BTC (~$330 million) were apparently stolen from a bitcoin holder and then quickly moved through multiple exchanges and swapped for the privacy coin Monero. The swap was apparently large enough to cause the Monero price to spike from around $230 to as high as around $330 before falling back somewhat.
Another remarkably effective technique is social engineering. Last August 18th a group of scammers used it, posing as members of Google's and Gemini's security teams to social-engineer "an early investor in cryptocurrency" into downloading malware. He lost more than 4,100 BTC, then "worth" about $230M. Mitch Moxley has a fascinating, detailed account of what happened over the next month in They Stole a Quarter-Billion in Crypto and Got Caught Within a Month. The scammers immediately started laundering the loot through a series of mixers and sketchy exchanges. These transactions attracted the attention of a famed cryptocurrency sleuth:
Minutes after the D.C. resident’s funds were liquidated, ZachXBT was walking through the airport on his way to catch a flight when he received an alert on his phone about an unusual transaction. Crypto investigators use tools to monitor the global flows of various coins and set alerts for, say, any transaction over $100,000 that goes through certain exchanges that charge a premium for having few security safeguards. The initial alert that day was for a mid-six-figure transaction, followed by higher amounts, all the way up to $2 million. After he cleared airport security, ZachXBT sat down, opened his laptop and began tracing transactions back to a Bitcoin wallet with roughly $240 million in crypto. Some of the Bitcoin in the wallet dated back to 2012. “At that point it didn’t make sense,” he told me. “Why is a person who held their Bitcoin for this long using a sketchy service that typically sees a lot of illicit funds flow through it?”
He added the wallets associated with the transactions to his tracking and boarded the plane. Once he connected to in-flight internet, more alerts arrived. Throughout the day, the Bitcoin traced to the wallet was being liquidated through more than 15 different high-fee cryptocurrency services.
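For readers wondering what "setting an alert" involves in practice, here is a minimal sketch of such a monitoring loop. The fetch_recent_transactions() helper is hypothetical, standing in for whatever blockchain-data feed a real investigator's tooling would poll, and the threshold and service labels are illustrative.

```python
# Minimal sketch of a large-transaction alert loop. Not ZachXBT's tooling;
# the data source, threshold, and service labels are all placeholders.
import time

ALERT_THRESHOLD_USD = 100_000                      # e.g. flag anything over $100,000
WATCHED_SERVICES = {"high-fee-exchange", "mixer"}  # illustrative labels

def fetch_recent_transactions():
    """Hypothetical helper: return recent transactions as dicts with
    'txid', 'value_usd', and 'service' (a label for the counterparty)."""
    raise NotImplementedError("replace with a real blockchain-data source")

def watch(poll_seconds=60):
    seen = set()
    while True:
        for tx in fetch_recent_transactions():
            if tx["txid"] in seen:
                continue
            seen.add(tx["txid"])
            if tx["value_usd"] >= ALERT_THRESHOLD_USD and tx["service"] in WATCHED_SERVICES:
                print(f"ALERT: ${tx['value_usd']:,.0f} via {tx['service']} ({tx['txid']})")
        time.sleep(poll_seconds)
```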
It turned out that one of the scammers couldn't resist showing off:
The source sent ZachXBT several screen-share recordings, which he said were taken when one of the scammers livestreamed the heist for a group of his friends. The videos, which totaled an hour and a half, included the call with the victim. One clip featured the scammers’ live reaction when they realized they’d successfully stolen $243 million worth of the D.C. resident’s Bitcoin. A voice can be heard yelling: “Oh, my god! Oh, my god! $243 million! Yes! Oh, my god! Oh, my god! Bro!”
In private chats they used screen names like Swag, $$$ and Meech, but they made a crucial mistake. One of them flashed his Windows home screen, which revealed his real name in the start icon pop-up at the bottom of the screen: Veer Chetal, an 18-year-old from Danbury
Chetal lived in Danbury, CT, where his father Sushil Chetal was a vice-president at Morgan Stanley. Veer was an incoming freshman at Rutgers, and in his senior year of high school he had developed a lavish lifestyle:
Classmates remember Chetal as shy and a fan of cars. “He just kind of kept to himself,” says Marco Dias, who became friends with Chetal junior year. According to another classmate named Nick Paris, this was true of Chetal until one day in the middle of his senior year, when he showed up at school driving a Corvette. “He just parked in the lot. It was 7:30 a.m., and everyone was like, What?” Paris says. Soon Chetal rolled up in a BMW, and then a Lamborghini Urus. He started wearing Louis Vuitton shirts and Gucci shoes, and on Senior Skip Day, while Paris and many of his classmates went to a nearby mall, Chetal took some friends, including Dias, to New York to party on a yacht he had rented, where they took photos holding wads of cash.
On 25th August, a week after the $230M heist, Chetal's parents were house-hunting in Danbury in the Lamborghini Urus he had driven to school when:
the Lamborghini was suddenly rammed from behind by a white Honda Civic. At the same time, a white Ram ProMaster work van cut in front, trapping the Chetals. According to a criminal complaint filed after the incident, a group of six men dressed in black and wearing masks emerged from their vehicles and forced the Chetals from their car, dragging them toward the van’s open side door.
When Sushil resisted, the assailants hit him with a baseball bat and threatened to kill him. The men bound the couple’s arms and legs with duct tape. They forced Radhika to lie face down and told her not to look at them, even as she struggled to breathe, pleading that she had asthma. They wrapped Sushil’s face with duct tape and hit him several more times with the bat as the van peeled off.
Years ago, Randall Munroe explained a much simpler but only slightly less effective technique for defeating strong cryptography in XKCD #538. In 2021's Can We Mitigate Cryptocurrencies' Externalities? I referred to Jameson Lopp's list of people applying XKCD's technique, entitled Known Physical Bitcoin Attacks, which started in 2014 and is still going strong. Already this year Lopp has documented 21 attacks, more than one a week. Among last year's entries we find the kidnapping of Veer Chetal's parents.
Several witnesses saw the attack and called 911. Some of them, including an off-duty F.B.I. agent who lived nearby and happened to be at the scene, trailed the van and the Honda, relaying the vehicles’ movements to the police. The F.B.I. agent managed to obtain partial license plate numbers.
Danbury police officers soon located the van. A patrol vehicle activated its emergency lights and tried to make a stop, but the driver of the van accelerated, swerving recklessly through traffic.
About a mile from where the chase began, the driver careered off the road and struck a curb. Four suspects fled on foot. The police found one hiding under a bridge and apprehended him after a brief chase. Within a couple of hours, the other three were located hiding in a wooded area nearby. The police, meanwhile, found the shaken Chetals bound in the back of the van.
In an affidavit from an unrelated case, an F.B.I. agent described the Com as “a geographically diverse group of individuals, organized in various subgroups, all of whom coordinate through online communication applications such as Discord and Telegram to engage in various types of criminal activity.”
...
When the price of Bitcoin began to rise rapidly in 2017, Com members made an easy shift from Minecraft fraud to crypto theft.
targeted the Chetals to hold them ransom for the money their son had. Independent investigators think that at least one member of the group, Reynaldo (Rey) Diaz, who they say went by the alias Pantic, was a member of the Com; ZachXBT speculates that the thieves might have made themselves targets by sharing stories of their spending with other Com members.
Chetal's accomplices are alleged to include "Malone Lam, a known figure in the Com" and Jeandiel Serrano. They also couldn't resist enjoying the fruits of their labors:
On Sept. 10, after a 23-day party spree in Los Angeles, Lam headed to Miami on a private jet with a group of friends. There, he rented multiple homes, including a 10-bedroom, $7.5 million estate. Within a few days, Lam had filled the driveway with more luxury cars, including multiple Lamborghinis, one with the name “Malone” printed on the side.
ZachXBT and others were easily able to track Lam's activities on social media:
Malone was filmed wearing a white Moncler jacket and what appeared to be diamond rings and diamond-encrusted sunglasses. He stood up on the table and began showering the crowd with hundred-dollar bills. As money rained down, servers paraded in $1,500 bottles of Champagne topped with sparklers and held up signs that read “@Malone.” He spent $569,528 in one evening alone.
neglecting to use a VPN when he created an account with TradeOgre, a digital currency exchange, which connected to an I.P. address that was registered to a $47,500-per-month rental home in Encino, Calif. It was leased to Jeandiel Serrano, ... By the time the authorities identified Serrano, he was on vacation in the Maldives with his girlfriend.
On Sept. 18, Serrano flew back from the Maldives to Los Angeles International Airport, where the authorities were waiting for him. He was wearing a $500,000 watch at the time of his arrest. ... Serrano admitted that he owned five cars, two of which were gifts from one of his co-conspirators, given to him with proceeds from a previous fraud. He also confessed to having access to approximately $20 million of the victim’s crypto on his phone and agreed to transfer the funds back to the F.B.I.
Later that day, a team of F.B.I. agents working with the Miami police raided a mansion near Miami Shores. Agents blew open the front metal gate while another group entered by boat via a small saltwater canal in the rear. The sound of flashbangs rang in the neighborhood as the agents entered the home.
French gendarmes have been busy policing crypto crimes, but these aren't the usual financial schemes, cons, and HODL! shenanigans one usually reads about. No, these crimes involve abductions, (multiple) severed fingers, and (multiple) people rescued from the trunks of cars—once after being doused with gasoline.
This previous weekend was particularly nuts, with an older gentleman snatched from the streets of Paris' 14th arrondissement on May 1 by men in ski masks. ... The abducted man was apparently the father of someone who had made a packet in crypto. The kidnappers demanded a multimillion-euro ransom from the man's son.
According to Le Monde, the abducted father was taken to a house in a Parisian suburb, where one of the father's fingers was cut off in the course of ransom negotiations. Police feared "other mutilations" if they were unable to find the man, but they did locate and raid the house this weekend, arresting five people in their 20s.
Anderson fails to credit Lopp, who has been tracking the problem for more than a decade. He does note the root of the problem:
Or there's the Belgian man who posted online that "his crypto wallet was now worth €1.6 million." His wife was the victim of an attempted abduction within weeks.
HODL-ers need to understand that the speed, immutability and (pseudo) anonymity of cryptocurrency transactions eliminate many of the difficulties in applying the "$5 wrench" technique. Once it is known that you (or your son) hold the key to a cryptocurrency wallet with even a few tens of Bitcoin, you (or your son) become a target for theft. You (or your son) should hope that the threat comes from social engineers like Veer Chetal and his accomplices, in which case your loss will be expensive but painless. But, as Jameson Lopp records, it may well come from people like Rey Diaz.
The solution is "security through obscurity". If you (or your son) rarely transact and maintain a modest lifestyle, lacking Lamborghinis and $569,528 bar bills, it isn't likely that your wallet address will be deemed worth deanonymizing. But what is the point of HODL-ing for HODL-ing's sake alone? The temptation to "buy the Lambo" is really hard to resist, and the risk seems remote.
On Sunday 4th May, Vicky & I saw Berkeley Rep's production of a thought-provoking new play by Moisés Kaufman, Amanda Gronich, and the Tectonic Theater Project, the team behind The Laramie Project:
about the reaction to the 1998 murder of gay University of Wyoming student Matthew Shepard in Laramie, Wyoming. The murder was denounced as a hate crime and brought attention to the lack of hate crime laws in various states, including Wyoming.
An example of verbatim theatre, the play draws on hundreds of interviews conducted by the theatre company with inhabitants of the town, company members' own journal entries, and published news reports.
There’s something awful about a lost picture. Maybe it’s because of a disparity between your original hope and the result: you made the photograph because you intended to keep it, and now that intention—artistic, memorial, historical—is fugitive, on the run toward ends other than your own. The picture, gone forever, possibly revived by strange eyes, will never again mean quite what you thought it would.
The play dramatizes the process archivists at the US Holocaust Memorial Museum went through to investigate an album of photographs taken at Auschwitz. Photographs from Auschwitz are extremely rare because the Nazis didn't want evidence of what happened there to survive.
Below the fold I discuss the play and some of the thoughts it provoked that are relevant to digital preservation.
Developing one of these plays is a painstaking process. The actors conduct extended interviews with the people they will represent, and the playwrights select and organize quotes from the interviews. And, in the case of London Road, set them to music.
In this case most of the participants were archivists at the US Holocaust Memorial Museum who, in 2006, received an offer to donate an album of photographs from Auschwitz. Initially skeptical, archivist Rebecca Erbelding rapidly established that the more than 100 images in the album documented the life of Karl-Friedrich Höcker during the period from May to December 1944 when he served as the adjutant to Richard Baer, the last commandant of Auschwitz. Thus the album became known as the Höcker Album.
The archivists were immediately struck by the lack of any images of camp inmates. Instead, many images showed camp staff relaxing at a resort called the Solahütte, which the inmates had constructed for the camp. Time off at the Solahütte was a reward for good performance. Höcker is seen at the Solahütte:
in the company of young women—stenographers and typists, trained at the SS school in Obernai, who were known generally as SS Helferinnen, the German word for (female) "helpers".
Many of the images showed Höcker with more senior SS officers including Rudolf Höss and Josef Mengele. Some showed Höcker's home and children, including a scene with the children floating in the home's pool in a boat the inmates made for them.
The survival of the album is remarkable. A US Army counter-intelligence officer was assigned to Frankfurt in the aftermath of the war. His story is that, unable to find an official billet, he occupied an abandoned apartment, in whose trash he found the album. It wasn't until 2006, six decades later, that he offered it to the US Holocaust Memorial Museum.
One image shows a group of about 70 soldiers at the Solahütte celebrating. It was discovered that they were celebrating the conclusion of the operation led by Adolf Eichmann to exterminate the Hungarian Jews from Carpathian Ruthenia:
Between 15 May and 9 July 1944, over 434,000 Jews were deported on 147 trains, most of them to Auschwitz, where about 80 percent were gassed on arrival.
Despite the celebration, the operation was less than completely successful. Before it was stopped by Miklós Horthy, the Regent of Hungary, it had succeeded in transporting only about 434,000 of Hungary's roughly 825,000 Jews, and in killing about 350,000 of them.
Strangely, the arrival of one of the trains is documented in the only other known album of photographs from Auschwitz, the Auschwitz Album. The images:
document the disembarkation of the Jewish prisoners from the train boxcars, followed by the selection process, performed by doctors of the SS and wardens of the camp, which separated those who were considered fit for work from those who were to be sent to the gas chambers. The photographers followed groups of those selected for work, and those selected for death, to a birch grove just outside the crematoria, where they were made to wait before being killed.
The original owner of that album, Lili Jacob (later Zelmanovic Meier), was deported with her family to Auschwitz in late May 1944 from Bilke (today: Bil'ki, Ukraine), a small town near Berehovo in Transcarpathian Rus which was then part of Hungary. They arrived on May 26, 1944, the same day that professional SS photographers photographed the arrival of the train and the selection process. Richard Baer and Karl Höcker arrived at Auschwitz mere days before the arrival of this transport. After surviving Auschwitz, forced labor in Morchenstern, a Gross-Rosen subcamp, and transfer to Dora-Mittelbau where she was liberated, Lili Jacob discovered an album containing these photographs in a drawer of a bedside table in an abandoned SS barracks while she was recovering from typhus.
first found a photograph of her rabbi but then also discovered a photo of herself, many of her neighbors, and relatives, including a famous shot of her two younger brothers Yisrael and Zelig Jacob.
She took the album with her as she immigrated to the United States. In 1983 she donated it to Yad Vashem, after which it was published.
The title of the play, "Here There Are Blueberries", is the caption of a series of images showing Höcker serving blueberries to a group of Helferinnen at the Solahütte.
Höcker served 18 months in a British POW camp before resuming his pre-war life as a bank cashier. In 1963 he was tried in Frankfurt and sentenced to 7 years:
Höcker denied having participated in the selection of victims at Birkenau or having ever personally executed a prisoner. He further denied any knowledge of the fate of the approximately 400,000 Hungarian Jews who were murdered at Auschwitz during his term of service at the camp. Höcker was shown to have knowledge of the genocidal activities at the camp, but could not be proved to have played a direct part in them. In post-war trials, Höcker denied his involvement in the selection process. While accounts from survivors and other SS officers all but placed him there, prosecutors could locate no conclusive evidence to prove the claim.
On 3 May 1989 a district court in the German city of Bielefeld sentenced Höcker to four years imprisonment for his involvement in gassing murders of prisoners, primarily Polish Jews, in the Majdanek concentration camp in Poland. Camp records showed that between May 1943 and May 1944 Höcker had acquired at least 3,610 kilograms of Zyklon B poisonous gas for use in Majdanek from the Hamburg firm of Tesch & Stabenow.
Obviously, many of the thoughts provoked by the play are relevant to current events: how one would have behaved, and how bureaucrats can compartmentalize their lives so as to claim ignorance of the activities they administer, as Höcker did in his first trial:
I only learned about the events in Birkenau…in the course of time I was there… and I had nothing to do with that. I had no ability to influence these events in any way…neither did I want them, nor carry them out. I didn’t hurt anybody… and neither did anyone die at Auschwitz because of me.
The other set of thoughts is relevant to digital preservation. Obviously, the negatives of the photographs in the two albums did not survive. Individual prints from the negatives did not survive. The images survive because in both cases a set of prints was selected and bound into an album, which protected them. Once the albums had been discovered, their survival for many decades was well within the capabilities of the officer and the survivor. "Benign neglect" was all that was needed. A few of the prints suffered visible water damage, but this didn't impair their value as historical documents.
Far more images and videos documenting the events currently under way in Ukraine, Gaza, Myanmar, Syria, and many other places are being captured than were captured in WW2. But they are captured in digital form, and this makes their survival over the decades needed for them to serve as evidence in war crimes trials and as the basis for histories unlikely. They may be collected into "albums" on physical media or in cloud services, but neither provides the protection of a physical album. The survival for many decades of such digital albums is well beyond the capabilities of those who took them, or who found them. This fact should allow the perpetrators of today's atrocities to sleep much easier at night.
Before they were delivered to preservation experts, the survivor and the officer had custody of their albums for four and six decades respectively. Four decades is way longer than the expected service life of any digital medium in common use; images and video on physical media require the custodian to proactively migrate them to new media at intervals. "Benign neglect" does not pay the rent for cloud storage, and trusting to the whims of the free cloud storage services is likely to be neglect, but hardly benign.
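To make concrete the kind of active curation that "benign neglect" omits, here is a minimal sketch, not a preservation system, of a periodic fixity check over a digital album: hash every file and compare it against a previously stored manifest, flagging anything missing or corrupted. The directory and manifest names are assumptions.

```python
# Periodic fixity check for a digital "album". Illustrative only: the album
# directory and manifest format are assumptions, not any standard tool.
import hashlib, json, pathlib

def sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def check_fixity(album_dir="album", manifest_file="manifest.json"):
    album = pathlib.Path(album_dir)
    manifest = json.loads(pathlib.Path(manifest_file).read_text())
    for name, expected in manifest.items():
        target = album / name
        if not target.exists():
            print(f"MISSING: {name}")
        elif sha256(target) != expected:
            print(f"CORRUPT: {name}")
```

Detecting damage is only the easy part; the custodian must then copy the files to fresh media and re-verify them there, again and again, for decades. Few accidental custodians will do anything of the sort, which is the point.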
To improve the survival chances of current-day albums analogous to the Auschwitz Album and the Höcker Album we need a consumer-grade long-lived storage medium that is cheap enough for everyday use. Alas, for the reasons I set out in Archival Storage, we are extremely unlikely to get one.
The potential uses of artificial intelligence (AI) for metadata workflows have grown rapidly. As a result, there’s a greater need for resources that support metadata managers in leveraging AI to enhance the capabilities of their teams. To address these opportunities for the profession, the OCLC Research Library Partnership (RLP) Metadata Managers Focus Group (MMFG) recently kicked off the Managing AI in metadata workflows working group. The primary goal of the working group is to engage our collective curiosity, identify key challenges, and empower metadata managers to integrate AI into their workflows with confidence.
Our call for participation has attracted contributors from the UK, the United States, Canada, and Australia. During our first meeting, we took time to learn how our contributors are currently looking at AI opportunities in their workflows, including:
How can AI make workflows more efficient and effective?
How can AI services help reduce backlogs of materials by creating brief records?
What are the best practices for AI to help libraries with non-Latin script materials?
How can AI be used to augment metadata workflows for institutional repositories, research data/information management ecosystems, and cultural heritage digital asset management platforms?
We discussed some of the challenges that Metadata Managers are currently facing. Broadly, our conversation touched on:
People: How to engage in change management within metadata organizations, from supporting existing staff to thinking about future competencies. This includes thinking about how AI can help staff navigate complex cataloging rules and best practices.
Economics: How to build financial support for AI into library budgets, including for services, training, and future staffing.
Metadata and platforms: How can metadata managers assess AI platforms and features, especially to understand how to apply them to specific parts of metadata workflows (e.g., generating records, quality control, entity management/authority control)?
Collections: How to learn about the different kinds of machine learning or AI, and which ones are best suited for different collection types. For example, computer vision may suit generating metadata for photographs, whereas subject analysis of Electronic Theses and Dissertations (ETDs) calls for different approaches.
Professional values and ethics: How can metadata managers explore the above questions/areas while championing professional values and ethics and honoring commitments to protect and steward our collections responsibly and sustainably?
These topics were used as a starting point for a deeper exploration in our three workstream groups, which are currently meeting on a regular cadence:
Primary cataloging workflows
Metadata for special/distinctive collections
Institutional repositories
Thanks to our working group members for carrying this important work forward!
Helen Baer, Colorado State University
Michael Bolam, University of Pittsburgh
Jenn Colt, Cornell University
Elly Cope, University of Leeds
Susan Dahl, University of Calgary
Michela Goodwin, National Library of Australia
Amanda Harlan, Nelson-Atkins Museum of Art
Miloche Kottman, University of Kansas
Chingmy Lam, University of Sydney
Yasha Razizadeh, New York University
Jill Reilly, National Archives and Records Administration
Mia Ridge, British Library
Tim Thompson, Yale University
Mary Beth Weber, Rutgers University
Cathy Weng, Princeton University
Helen Williams, London School of Economics
We expect work to conclude by the end of June, with additional blog posts about our findings to follow. Stay tuned!
The biennial NDSA Excellence Awards were established in 2012 to recognize and encourage exemplary achievement in the field of digital preservation stewardship at a level of national or international importance. Over the years many individuals, projects, and organizations have been honored for their meaningful contributions to the field of digital preservation.
The time has come again to recognize and celebrate the accomplishments of our colleagues! Nominations are now being accepted for the NDSA 2025 Excellence Awards.
Anyone, any institution, or any project acting in the context of the award categories (noted below) can be nominated for an award. No NDSA membership or affiliation is required. Self-nomination is accepted and encouraged, as are submissions reflecting the needs and accomplishments of historically marginalized and underrepresented communities.
We encourage you to help us highlight and reward distinctive approaches to digital preservation practice. Please submit nominations here: 2025 NDSA Excellence Awards Nominations form. Awards will be presented at the Digital Preservation 2025 event this fall.
Nominations are accepted in the following categories:
Individual Award: Recognizing those individuals making a significant contribution to the digital preservation community through advances in theory or practice.
Educator Award: Recognizing academics, trainers, and curricular endeavors promoting effective and inventive approaches to digital preservation education through academic programs, partnerships, professional development opportunities, and curriculum development.
Future Steward Award: Recognizing students and early-career professionals making an impact on advancing knowledge and practice of digital preservation stewardship.
Organization Award: Recognizing those organizations providing support, guidance, advocacy, or leadership for the digital preservation community.
Project Award: Recognizing those activities whose goals or outcomes make a significant contribution to the strategic or conceptual understanding necessary for successful digital preservation stewardship.
Sustainability Award: Recognizing those activities whose goals or outcomes make a significant contribution to operational trustworthiness, monitoring, maintenance, or intervention necessary for sustainable digital preservation stewardship.
If you need a little inspiration, check out our webpage for lists of past winners or this blog post on submitting a notable nomination. If you have any questions about the nomination form, please contact the Excellence Awards Working Group co-chairs.
This paper explores an ethics of care framework in academic libraries, specifically with the implementation of a professional development initiative for student employees. Using the Architecture Library at Texas Tech University as a case study, we examine how formal professional development opportunities align with care ethics principles by responding to students’ individual needs, fostering nurturing relationships, and contributing to the academic learning environment. Through an exploration of Noddings’ relational theory of care and Tronto’s phases and elements of care, the aim of this paper is to highlight the ability of higher education to engage in a practice of care towards student workers.
Academic libraries often embody a culture of care, whether through deliberate action or instinctive response. Librarians and library staff routinely provide support and assistance, often without explicitly labeling their actions as “care ethics” – they simply see a need and respond. This caring approach is fundamental to educational settings, where the goal is to empower learners and facilitate their growth.
Student workers in these environments occupy a unique position, straddling the roles of both learner and employee, and as we will later explain, both cared-for and ones-caring. Through their work in the library, they develop essential workplace skills and learn to navigate professional relationships. The supervisor (for the purposes of this paper, a person who may be a librarian or library staff) is responsible not only for overseeing the daily tasks, but also for supporting workers as they learn to navigate a workplace environment, often for the first time.
This paper will first provide some context for the ethics of care, focusing on the relational (one-caring/cared-for) approach of Nel Noddings and then examining the expanded ethics of care of Joan Tronto, which moves into the realm of organizational care. Second, we will provide a literature review of current practices of care in academic libraries and higher education. Then, we will discuss the actors in academic libraries broadly and finally share a practice of care enacted at the Texas Tech Architecture Library.
Ethics of Care Theories: An Overview
The ethics of care as a philosophical theory largely began with psychology researcher Carol Gilligan. Participating in a study on moral development with her mentor Lawrence Kohlberg, graduate student Gilligan pointed out that the research findings were largely biased against girls. The study operated under the theory that moral development moves from universal to principled thinking, and under that assumption girls were “behind” boys in their moral thinking. Gilligan argued that girls are not morally stunted; they have a different perspective and an inter-relational way of approaching conflict. The question asked of children (“Should Heinz steal medicine for his wife?”) could be heard as a question about whether stealing is justified (with boys largely saying “yes, he should”) or as a question about how best to get the medicine (with girls largely saying “no, he needs to find another way to get the medicine; what if he’s arrested?”). Gilligan argued that both responses were engaging in moral development, one from a perspective of justice and the other from a perspective of care. Gilligan says that without the voices of girls in the study, the consideration of care was largely unheard (1982, 2011).
This intellectual disagreement with Kohlberg led to the publication of In A Different Voice: Psychological Theory and Women’s Development in 1982. Gilligan’s study on moral development has been criticized for seemingly contributing to stereotypes about gender, with critics arguing that it posits a strict demarcation between the behaviors of men and women and ignores the gendered socialization of boys and girls (Peirson-Hagger, 2023). As Gilligan has continued to write on the moral development and experiences of girls and women, her theory has been refined but still highlights the societal issue of gendering care as feminine, instead of as a human action. In her 2011 book Joining the Resistance, Gilligan clarifies:
Listening to women thus led me to make a distinction I have come to see as pivotal to understanding care ethics. Within a patriarchal frame, care is a feminine ethic. Within a democratic framework, care is a human ethic. (p. 22)
Gilligan argues that she is not making these statements about how all men act and how all women act. The issue is that in a patriarchal society care is gendered as feminine, while justice is seen as “aligned with reason, mind, and self—the attributes of ‘rational man’” (p. 23-24). An ethic of care, for Gilligan, is democratizing—it requires interdependence and a responsibility to others.
Writing soon after Gilligan, Noddings builds on the ethics of care and applies it to an educational framework. Her early framework drew heavily on women’s traditional caregiving roles, which she described as examples of “natural” caring. She often illustrated her concepts using mother-child relationships, drawing criticism for this potentially limiting and essentialist perspective that seemed to require self-sacrifice for the child’s benefit. Her theories are largely framed as a relationship, and she later called her version of ethics of care “relational ethics” instead of “feminist ethics” (Noddings, 2013).
Like Gilligan, Noddings has refined her theory and reflected back on her works. In the 2013 edition of Caring, she clarifies her ethics of care as being chiefly concerned with “how, in general, we should meet and treat one another—with how to establish, maintain, and enhance caring relations” (p. x) after commenting that “hardly anyone has reacted positively to the word feminine” (p. xiii).
Noddings suggests in her 1984 work Caring: A Relational Approach to Ethics and Moral Education that teachers demonstrate caring tendencies that drive them to address students’ specific needs. While discussing teacher-student dynamics, she employs the terms “one-caring” and “cared-for” to characterize their interactions. In this relationship, the “one-caring” (teacher) becomes invested, or engrossed, in the development of the “cared-for” (student). Using Noddings’ concepts to apply to the library workplace environment, we can frame student workers as recipients of care (the cared-for) and supervisors as providers of care (the one-caring) much like Noddings attributes those roles to students and teachers. This perspective establishes the relational framework necessary for implementing a care-based practice.
On the tasks of the teacher/one-caring, Noddings says they must “stretch the student’s world by presenting an effective selection of that world with which she [the one-caring] is in contact and to work cooperatively with the student in his struggle towards competence in that world” (p. 167). Teachers/ones-caring use their knowledge and resources to address the questions and problems of the cared-for, a process familiar to librarians as we field questions about reference and problems of accessing and finding materials on varied subjects for varied patrons. Those questions and problems are unique to each individual, and an effective one-caring is attentive to those differences when presenting solutions. The questions and problems of the student worker as cared-for may include how to polish skills necessary for their career goals and transferring those skills to other communities.
While one goal of ethics of care is to establish reciprocal relationships of care, it is not always the case that the cared-for will participate in caregiving to the one-caring. However, when student workers help their fellow students, they participate in caregiving, taking on a one-caring role themselves. The success lies not in their care being directed back to supervisors, but in their ability to continue this cycle of support within the academic community.
It is important to note there is a power imbalance in the relationship between a supervisor and a student worker that must be considered to properly enact an ethics of care, an ethic concerned with addressing vulnerabilities. Noddings says of this imbalance, “Social worker and client, physician and patient, counselor and student in their formal roles necessarily meet each other unequally” (p. 62). Crawley et al. (2008) comment in their essay that the recognition of power imbalances is necessary for a feminist approach to education: “one must consider not only the power relations among classroom actors (e.g., teachers and students) but also the power relations implicit in knowledge construction, ultimately working toward empowerment of students” (p. 3). Recognizing the vulnerability of the cared-for and viewing our actions from their perspective is the first step in addressing the power imbalance inherent in learning and working environments and is also an embodiment of Noddings’ idea of “engrossment.”
An ethical model that frames interactions between a one-caring and a cared-for has received criticism for being narrow and chiefly concerned with caring for those physically close to you, as opposed to extending care toward people with whom you may not interact; this is Joan Tronto's critique of Noddings' dyadic model (Tronto, 1993). Modern care ethicists have grown the theory to include looking at care from various perspectives, viewing care as a gender-neutral activity, and expanding care beyond the encounters of two people. Tronto expands the scope of care beyond relational interactions between educator and educated, beyond one-caring and cared-for, to institutions, bureaucracies, and governments. Care for people, rather than economics, should be the driving force for these organizations.
In her 1993 book Moral Boundaries, Tronto describes ethics of care as:
A species activity that includes everything that we do to maintain, continue, and repair our ‘world’ so that we can live in it as well as possible. That world includes our bodies, ourselves, and our environment, all of which we seek to interweave in a complex, life-sustaining web. (p. 103)
Tronto is often cited in works about using a practice of care in higher education, as the consideration of the institution’s responsibility of care to its workers and students arises. We will reference Tronto’s phases of care and elements of care as it pertains to the role of the librarian supervisor and the evolution of our professional development project.
Tronto’s phases of care:
Caring about – the act of noticing that care is needed in the first place
Taking care of – taking on the responsibility for caring
Care-giving – the work of giving care
Care-receiving – the response of being cared-for
Out of these phases of care arise the necessary ethical elements required for care: attentiveness, responsibility, competence, and responsiveness (Tronto, 1993, pp. 127-131). It is these elements that we will further reference in our discussion about enacting care. These are useful for determining how well a project may be adhering to an ethics of care, and we will be using them as we reflect on our project.
Practices of care regularly show up in areas where there are elements of collaboration, consideration, and concern for vulnerabilities. It is not surprising that care is demonstrated in professions that have historically been feminized (education, librarianship, etc.), which may shed light on why care-giving is often wrongly dismissed as “women’s work.” Though care ethics is often called the “feminist ethic,” it is important to note that a feminist ethic or practice may be enacted by anyone or any institution.
Ethics of Care in Libraries and Academia: A Review of the Literature
In addition to the literature of Noddings and Tronto, this paper and this project have been shaped by studies of ethics of care approaches in higher education and libraries, as well as by work on the importance of student training in academic libraries.
Ladenson (2017) describes using the feminist framework for reference services in libraries as a practice that goes beyond simply helping a student find a certain resource, digging a bit to identify why it is important for them to do so. Is it a passion project? A new curiosity? Why is this interesting? These extra questions invite the students to reflect on their own research and work to establish the beginning of a back-and-forth between the researcher and the library/librarian. Bruce (2020) goes on to say about a caring approach to librarianship: “These one-on-one sessions are not just about the exchange of information. Instead, they are a moment which adds to a student’s sense of belonging and care with regards to their academic and personal selves” (para. 8). The attentiveness and recognition given to each individual researcher and each question are elements that re-cast the interaction as a practice of care, embodying the ideas of Noddings as well as Tronto. Librarians may find themselves frequently in positions of engaging in care as they establish relationships with library users, guide people through the research process, and advocate for library services to assist users.
Beyond the library, an ethics of care framework is useful in considering the interactions and responsibilities of the university towards students and the interactions of faculty members with each other to cultivate a supportive learning community. “An Ethic of Care in Higher Education: Well-Being and Learning” (Keeling, 2014) highlights the importance of focusing on the entire student and addressing issues of access for individuals. On the other hand, Sai’s 2024 work highlights that the current practice of most institutions of higher education prioritizes faculty outputs and profit, to the point of neglect for the faculty’s well-being and life outside of the university. These papers act as a call for reimagining the culture of higher education as one that values all members of the academic community as whole people and that values knowledge creation not just as metrics and outputs.
A similar autoethnographic paper focuses on the concept of “critical friendships” in academia. “Critical friendship: An alternative, ‘care-full’ way to play the academic game” (Sotiropoulou, 2022) looks at the collaborative relationships between academic colleagues, fostering working environments that support others across disciplines:
Finding the time to get to know each other better and to continuously invest in practicing our critical friendship was a strategy we utilized to deviate from the fast-paced and measurable mandates of the neoliberal academia and our way of prospering both personally and professionally as academics. (p. 1112)
This approach to collaborating with others, sharing feedback, actively listening, and taking time to meet and discuss life and careers was a tactic to push against the culture of academia that prioritizes hustle and churning out work. It is a mistake of the neoliberal academic institution to individualize work instead of seeing the work of academia as collaborative. Naylor (2023) writes in “A Feminist Ethic of Care in the Neoliberal University”:
To transform neoliberal academic spaces into spaces that are caring means recognizing that collective support within a department does not have to be an archipelago, but can be contiguous and form a web of reinforcement that does not have strict borders which isolate research from teaching and service. (para. 7)
Encouraging an environment concerned with care and collaboration in our working and learning environments is a way to push back against hierarchical systems of neoliberal academia that encourage competition and rugged individualism.
Student Workers (Cared-For)
On-campus jobs are an opportunity for students to work in environments that allow them to balance the needs of academic work, usually providing flexibility around classes, projects, and exams. Working in an academic library provides student workers a further opportunity to have access to collection materials, a closer connection to faculty, occasional “down time” to work on homework, and opportunities to cultivate transferable soft skills like teamwork and customer service. It is important to note the vulnerable position of student workers, working jobs that offer them flexibility and training but little pay, while primarily focusing on coursework for which they have paid thousands of dollars.
Student workers in academic libraries engage in customer service, organizational tasks, circulation duties, and the daily tasks necessary for the smooth running of an academic library. Their work brings them into contact with their peers and their professors at the library. Student workers have the opportunity to participate in supporting research, navigating the library collection, and using library technology like scanners and software. These tasks develop and polish skills like communication, critical thinking, and technological proficiency. Mitola, et al. (2018) discuss academic library work as a high-impact practice, saying: “The work experiences of undergraduate students can also shape their college experiences and contribute to the development of skills employers seek in college graduates” (p. 352).
Student workers, however, are first and foremost students juggling academic and social obligations in addition to working. These competing responsibilities require flexibility and a willingness to understand each unique student’s situation, academic goals, and career aspirations on the part of the supervisor. The academic environment offers a workplace where student-workers are learning about the expectations of professional environments and developing necessary skills, and their supervisors are in a position of educating and training students in these skills.
The Role of Supervisors (Ones-Caring)
What is our responsibility to our student workers? How can we approach training and development in a way that is responsive to student workers? How can we prepare students for the working world outside of the university? Asking and exploring these questions represents Noddings’ sense of engrossment for the needs of the cared-for and aligns with Tronto’s framework, specifically embodying the elements of “attentiveness” and “responsibility.”
Library-specific training and the freedom to explore supplemental career-specific training fosters an environment of growth and supports the further career aspirations and goals of students. It also offers an opportunity to enact care with thoughtful feedback and support. To be able to enact any care successfully, there must already be a groundwork of support and trust established, which is the responsibility of the supervisor to cultivate.
The academic library serves as an environment for student workers to grow personally and professionally through high impact practices that demonstrate “an affective orientation of care for student employees” (Vine, 2022, conclusion). All this while “respecting other’s positionalities, autonomy and embodied differences and working with them to improve the capacity of those cared for and about to make better decisions” (Sai, 2024, p. 533). By building programs and initiatives that are responsive to student needs, we can support students as they navigate what it means to be a member of a workplace and also provide an opportunity to explore their professional curiosities and talents.
Competence in supervising students, another of Tronto’s elements of care, requires that supervisors effectively communicate the goals of training or practices. When we discuss students engaging in professional development, we clarify that we intend this project to help develop students’ abilities to articulate their strengths, practicing the task of advocating for themselves in future workplaces. A supportive, caring work environment can reinforce to the students that their labor is valuable and appreciated, hopefully setting the bar for future supervisors and workplaces to meet.
Why Professional Development?
How is professional development an act of care? Our goal is not to oil the wheels of capitalism, ensuring that workplaces have well-trained cogs for their machine, but rather to prepare student workers for the transition from an academic environment to the workplace arena. With a focus on setting their own self-development path, a supportive space, and a concern for the costs (time and money) that later professional places may not provide, we want to give students a leg up as they are moving from being student workers to workers.
The National Association of Colleges and Employers (NACE) produces a list of competencies that we use to frame our professional development project. NACE derives these competencies from annually polling many companies about which skills they find valuable in new employees out of college. It is important to note that these professional expectations come from employers and professional organizations, who represent The Market. The eight NACE competencies are self-development, communication, technology, equity and inclusion, leadership, critical thinking, professionalism, and teamwork. They also encompass behaviors that can be regarded as caring actions, with opportunities for the cared-for to engage in the actions of the ones-caring. The practice of these learning competencies may develop behaviors that ripple out to influence other communities.
We found the NACE framework on the Texas Tech University career center website, among many tips for students preparing and applying for jobs, going on interviews, and preparing to leave the work of being a student for the work of The Market—with its expectations of knowing the rules (rules often left unsaid by a dominant culture). NACE often works in partnership with career centers of universities, the goal being to prepare students for the transition into workplaces. We discovered that it offers many overlapping skills that are useful in learning and working communities. After finding Franklin Oftsund’s (2022) study on career-readiness for student workers, where they also used the NACE framework, we determined it would be useful for our library as well.
The concept of “professionalism” and success can vary widely based on location, gender expression, and industry. There are many spoken and unspoken rules workers navigate, and learning how to navigate them is a skill in itself. NACE defines professionalism on its webpage “What is Career Readiness” as “Knowing work environments differ greatly, understand and demonstrate effective work habits, and act in the interest of the larger community and workplace” (National Association of Colleges and Employers, n.d.). NACE writer Gray (2022) comments that while professionalism can change based on various factors (in-person vs. virtual interactions, industry norms, or geographic location) a common thread is to “show respect for others and make sure you contribute” (para. 17).
We do not want to surrender to market values, defining worth by marketability, because the human experience encompasses much more. We also want to respect the goals of the students who may aspire to succeed by the definitions of The Market, while also instructing student workers about their successes in skills that may be beneficial in enriching communities. Or as Beilin states it: “We ought to encourage alternative definitions of success while at the same time ensure success in the existing system” (2016, p. 18).
Returning to Tronto’s definition of care, and speaking to her comment on “repairing our world,” mindfully engaging in promoting supportive and caring work cultures can be considered an attempt to respond to the harm of “hustle culture” and commonplace worker exploitation, especially of new workers starting out on their path. This also speaks to Noddings’ charge that the ones-caring respond to a problem of the cared-for, in this instance the problem of addressing the potential injustice student workers may find in their post-university workplaces. While our library might be just one small workplace among many, taking care to make it a positive one will hopefully have some ripple effects in the complex web and set a standard of a supportive, inclusive, and compassionate workplace.
Practice of Care at the Architecture Library
Now that the theories, supporting literature, and players have all been identified: what does a practical application of care look like? For nearly two years, the Texas Tech Architecture Library has engaged in an initiative with its student workers that embodies an ethics and practice of care. Texas Tech is an R1 university and has held Hispanic-Serving Institution (HSI) status since 2019. Our architecture library is a branch library, embedded in the Huckabee College of Architecture, a program with over 700 undergraduate and graduate students. Our ones-caring at the library include three faculty librarians and a member of staff who is the direct student supervisor. The reference to “we” throughout this paper means the public services librarian and the member of staff who serves as the direct student supervisor. We have, on average, seven to ten student workers in the role of the cared-for (and, as the job requires, participating in caregiving for library patrons). They are responsible for working at the circulation desk, assisting with scanning issues, pulling and shelving books, and participating in inventory projects. The job involves a lot of interaction with fellow students, members of the architecture faculty and staff, and other library departments.
Noddings says we must address the problems of the cared-for. We do so by offering professional development opportunities to address the problems and expectations of transitioning into post-graduate life and answer the questions of “what skills do I have?” and “how do I leverage this work experience into future work experience?” We want student workers to develop competence to tackle and navigate future workplaces while also supporting their individual interests and building skills that transfer to other communities. During our mid-semester check-ins, we pose questions like “What does professionalism mean to you?” to facilitate discussion and talk about how these skills have been useful in other areas. One student worker commented in our mid-semester check-in that they were applying their practice of leadership to collaborate and communicate better in their student organizations.
The nature of the student assistant job requires skills that address these competencies like customer service, communication, and time management: soft skills necessary for most workplaces. We wanted the students to consider how else they can grow their skills during their time at the Architecture Library. The inspiration for a professional development project initially came from helping student workers with their resumes and letters of reference. Students often listed their tasks (shelve and organize books, use scanner/printer, check in/out books) and we found ourselves recommending the mention of specific skills, saying “You didn’t JUST check out books! You provided good customer service, communicated library policies, and worked effectively on a library team.” We decided to be clearer about the skills they were engaging in and developing, so that they can recognize the value they brought to our workplace and better articulate their skill sets, strengths, and experience for future jobs. The project also came from a desire to expand the scope of skills that students practice as a way of encouraging self-directed discovery and thinking about how their work in the library could inform their future careers. The NACE competencies were chosen a year into the project as a way to further structure the semester-long project.
In consideration of the students’ financial situations, these projects are to be done on work time, using resources freely available to them. Texas Tech has a subscription to a platform called Udemy that the students are encouraged to explore. This platform offers multiple types of online learning materials across many topics. Their access to these materials lasts only while they are students at Texas Tech University. In the future, professional development may have to be on their own time and their own dime, but while the library can support this development we feel that we should.
This approach reflects an ethic of care through the practice of supporting students’ development and fostering skills relevant to their goals. Engaging in this practice through a lens of care means that we must view each student holistically and give support and consideration to their larger aspirations. The supervisor (the one-caring) applies Tronto’s elements of care (attentiveness, responsibility, competence, and responsiveness) in the development and implementation of the project, while the student worker (the cared-for) practices these elements while engaging in the project.
An outline of the project as it embodies elements of care:
Each student decides their professional development project based on their wants and needs for growth into their professional aspirations. This demonstrates an attentiveness by the supervisor to encourage students in self-development, and attentiveness by students to reflect on their goals.
The supervisor, as a means of offering guidance and embodying responsibility, provides a list of potential training materials that students can explore.
While the professional development project is a requirement, it is weighted among other tasks, so students who do not complete one still have other opportunities to succeed at the job and demonstrate competency with their work.
Students do this work during their working hours with tools available to them for free. This is an act of attentiveness to the value of the students’ time, and an opportunity to take advantage of resources made available by the university. It also requires responsibility from the student to prioritize this project as much as their other work tasks.
The supervisor conducts regular check-ins about the students’ professional development progress before end-of-semester evaluations to provide feedback or offer guidance. These check-ins create opportunities for the supervisor and student workers to be responsive to feedback as we finish the semester.
At the end of the semester, the supervisor sends out an anonymous survey to student workers for feedback and adapts projects based on that feedback. Because the project runs within a single semester, we can quickly adjust expectations as a way of being responsive to feedback.
We are now almost two years into this professional development initiative, and the projects that students have chosen have ranged from creating library signage in graphic design software to language learning on Duolingo to learning industry software like Blender, Rhino, and Grasshopper. Given that many of our student workers are architecture students, it is not surprising that many have chosen to refine their skills in design software common to their major. This has also had the added benefit of increasing the software knowledge available to our student patrons, giving our student workers the opportunity to step into the role of teacher/caregiver as they help patrons navigate these programs.
A particularly exciting project one student worker wanted to pursue was starting a book club at the library. This student had been a participating member of many student organizations and wanted to take the opportunity to plan and implement a program for the library. We discussed what a full expression of this project would look like (a process that hits many NACE competencies for the cared-for as well as Tronto’s ideas of attentiveness and responsiveness for the ones-caring). To fully support this project, we explored ways to fund meetings in a library system that does not offer much funding for programs. The public services librarian applied for and was awarded an internal grant for faculty-led book clubs, meant to increase participation by those interested in a free book, coffee, and scones. The student worker was responsible for choosing the text, creating signage, and leading discussion prompts. We offered support with the administrative aspects of the grant (ordering materials, reporting receipts, etc.), social media posts, and participation in the book group. This specific project required much more engagement with Tronto’s elements of care compared to student workers who chose to view and discuss training videos. But being attentive to the individual interests and strengths of this student was required of the ones-caring, a task we were happy to engage in. The result was that our student worker got to carry out a project that interested them, with our full support and care, and to give care to the learning community through thoughtful discussions.
We collect qualitative data during our in-person check-ins as well as the end-of-semester anonymous survey as we attempt to answer the questions: Are we cultivating a supportive workplace? Do the students feel confident in their skills?
At the end of the semester, we discuss how the library can support the career goals of student workers and conduct an in-person evaluation that looks at all of their work over the semester. We discuss the semester’s work in terms of library-specific tasks like shelving, reporting reference interactions, teamwork, and professional development. Generally the feedback from students is positive, though the inherent power dynamic between student employee and supervisor must be considered when weighing in-person feedback, which is why we also provide an anonymous survey. We enjoy a high level of student retention from semester to semester, and many student workers recommend that their classmates and friends apply when we have openings.
While we vary the end-of-semester survey each semester, questions we ask about their professional development have included:
Do you feel like your work in the library supports your future career goals? (Response Options: Yes, No, I Don’t Know)
Select the career-ready competencies you feel you have developed while working at the library (Multi-response option of the eight NACE competencies)
In what ways has your role in the library helped you develop new skills?
Student workers responded that they feel their work in the library supports their future career aspirations, and that they felt most confident in the “communication” and “professionalism” NACE competencies. Students reported feeling very positive about their work and enjoying the fact that they can pursue projects that are interesting to them. They believe they communicate effectively with their team/colleagues and enjoy working with their coworkers.
Quotes from students, collected via the anonymous end-of-semester survey:
I’m proud of my work, it required a lot of communication, teamwork and patience
I hope to find a job in the future with a similar culture to the one here in the library
Everyone has been very helpful and encouraging as well as a good influence. I really enjoy my time at the library
Speaking with and helping patrons with different tasks in a field I am unfamiliar with has helped my work in my problem solving and critical thinking
I have definitely had the opportunity to develop customer service problem solving skills. I have improved at asking targeted questions to clarify patron issues and offer solutions
By creating content for the library I have been working on my graphic design skills
While we did not set out to conduct a project guided by the ethics of care, or a feminist approach to supervising students, our interest and engagement with our student workers led, perhaps Noddings would say “naturally,” to enacting one. This professional development initiative demonstrates that an ethics of care approach is not merely a theoretical construct, but a practical method of supporting student workers. Our students have grown in their abilities to communicate their skills and competencies. This practice enriches the cared-for and the workplace, creates meaningful work for the one-caring, and ideally extends out further to other communities.
Conclusion
While this project is one element of cultivating a caring and supportive workspace, it is not sufficient on its own for supervising student workers with a praxis of care. The working environment as a whole must operate as a space that seeks to offer care broadly, and this professional development project has been able to grow out of that established space, on a groundwork of trust that has been cultivated.
This approach to workplace dynamics challenges traditional transactional labor, moving toward a nurturing, collaborative relationship that empowers student workers. Embracing care-centric practices in academic institutions can create environments that foster meaningful learning, and teaching, experiences for members of the academic community, instead of treating them as individual components of an institutional machine.
Tronto ends her work Moral Boundaries with a call to care:
To recognize that value of care calls into question the structure of values in our society. Care is not a parochial concern of women, a type of secondary moral question, or the work of the least well off in society. Care is a central concern of human life. It is time we began to change our political and social institutions to reflect this truth. (1993, p. 180)
We aim to reflect this truth in our small library workplace, where our faculty, staff, and students feel supported, purposeful, and seen.
Acknowledgements
The author would like to thank their peer-reviewers: Pam Lach, Brittany Paloma Fiedler, and Liz Vine as well as the editors of In the Library with a Lead Pipe for their feedback, guidance, and direction during this work. Their care means a lot!
Bibliography
Beilin, I. (2016). Student success and the neoliberal academic library. Canadian Journal of Academic Librarianship, 1(1), 10-23.
Benjamin, M., & McDevitt, T. (2018). The benefits and challenges of working in an academic library: A study of student library assistant experience. The Journal of Academic Librarianship, 44(2), 256–262. https://doi.org/10.1016/j.acalib.2018.01.002
Crawley, S. L., Lewis, J. E., & Mayberry, M. (2008). Introduction—Feminist pedagogies in action: Teaching beyond disciplines. Feminist Teacher, 19(1), 1–12. http://www.jstor.org/stable/40546070
Gilligan, C. (2003). In a different voice: Psychological theory and women’s development (38th print). Harvard University Press.
Gilligan, C. (2011). Joining the resistance. Cambridge: Polity Press.
Keeling, R. P. (2014). An ethic of care in higher education: Well-being and learning. Journal of College and Character, 15(3), 141–148. https://doi.org/10.1515/jcc-2014-0018
Ladenson, S. (2017). Feminist reference services: Transforming relationships through an ethic of care. The feminist reference desk: Concepts, critiques, and conversations. Sacramento, CA: Library Juice Press.
Mitola, R., Rinto, E., & Pattni, E. (2018). Student employment as a high-impact practice in academic libraries: A systematic review. The Journal of Academic Librarianship, 44(3), 352–373.
Noddings, N. (2013). Caring: A relational approach to ethics & moral education (2nd ed., updated). University of California Press.
Ofsthun, F. (2022). Just like the library: exploring the experiences of former library student assistants’ post-graduation careers and perceptions of job preparedness as impacted by library work [Doctoral dissertation, ProQuest Dissertations & Theses].
Sai, L., Gao, G., Mandalaki, E., Zhang, L. E., & Williams, J. (2024). Co-constructing new ways of working: Relationality and care in post-pandemic academia. Culture and Organization, 30(5), 523–538. https://doi.org/10.1080/14759551.2024.2323726
Sotiropoulou, P., & Cranston, S. (2022). Critical friendship: An alternative, ‘care-full’ way to play the academic game. Gender, Place & Culture, 30(8), 1104–1125. https://doi.org/10.1080/0966369X.2022.2069684
Stoddart, R., Pesek, J., & Thornhill, K. (2022). Assessing student employment in libraries for critical thinking & career readiness. In Library Assessment Conference Proceedings 2022.
Tronto, J. C. (1993). Moral boundaries: A political argument for an ethic of care. Routledge.
Vine, L. (2021). HIP check: Equity, learner-centered pedagogies, and student employment. In Ascending into an open future: The proceedings of the ACRL 2021 virtual conference (pp. 321-329). ACRL.
The contemplation on impermanence can help us live our life with the insight of impermanence so we can be free from many afflictions such as anger, fear, and delusion. It isn’t the idea or notion of impermanence, but the insight of impermanence that can free and save us. Impermanence is not a negative note in the song of life. If there were no impermanence, life would be impossible. Without impermanence how could your little girl grow up and become a young woman? Without impermanence how could you hope to transform your suffering? You can hope to transform your suffering because you know it is impermanent. So impermanence is something positive. We should say, “Long Live Impermanence!”
I think working in digital preservation, and as a memory worker in general, it’s easy to see impermanence as a, if not the, enemy. If you are thinking about a specific item in isolation, say a computer file, or an archival document, it kind of is. But if you focus your attention on the information artifact for a little bit you often come to discover that it is actually related to other artifacts and entities that may or may not still be available, and that it is already incomplete, in many ways. This incompleteness is what gives the artifact value, and makes it worth preserving, and is also why exact preservation of its current state isn’t always possible. Forever is a mental trap that causes anxiety and suffering.
Secure the early bird rate, register for Learn@DLF workshops, and start planning for yet another memorable week with DLF.
DLF member organizations receive one complimentary DLF Forum registration as part of their member benefits. Not sure who received your code? Email us at forum@diglib.org.
This year, CAPWIC welcomed 193 attendees from universities, high schools, companies, and non-profit organizations. The majority were undergraduate students (41%), followed by graduate students (36%), college and university faculty (15%), industry professionals (4%), and high school students (3%). Eight Ph.D. students in Computer Science from ODU attended the CAPWIC 2025 conference. Among them, five students from ODU's Web Science and Digital Libraries (WS-DL) research group participated in person, presenting research shorts and posters.
Ph.D. students from ODU Computer Science at the CAPWIC 2025 conference at George Washington University
Dr. Brown shared how her background in computer science has informed her work across research, education, and policy, emphasizing the importance of innovation through collaboration and interdisciplinary engagement. She highlighted three key principles, namely policy, practice, and people, as the main operating principles in the working environment. Her keynote concluded with a powerful reflection: "Whose life do you want to be better because you are here?"
ACM @capwic 2025 conference kicked off with the inspiring keynote titled "Oh, the places you’ll go!", by Dr. Quincy K. Brown, the Director of Space STEM and Workforce Policy at the National Space Council. #CAPWIC2025 pic.twitter.com/gGM5kiRDO2
Dr. Quincy K. Brown gives the morning CAPWIC 2025 keynote
Parallel Sessions - Cybersecurity
Following the morning keynote, two parallel sessions were held: Cybersecurity and AI. In the Cybersecurity session, ODU's own CS PhD student and senior lecturer, Susan Zehra, presented “Mitigating Cyber Threats in V2V and V2I Networks: A Security-Centric Approach”. Her research focuses on Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication, which involves vehicles interacting with each other and with traffic control systems. Her method combines dynamic key management with RSUs (Roadside Units), public-key cryptography, and blockchain-based fallback mechanisms to counter cyber threats. As a result, her approach significantly reduced the breach rate for key exchange between vehicles while ensuring reliability and user anonymity.
The other two presentations in the session were: “Combining Open-Source Intelligence (OSINT) with AI for Threat Detection” by Jackline Fahmy (Marymount University) and “A Hierarchical Deep Reinforcement Learning Chatbot for Cybergrooming Prevention” by Heajun An (Trustworthy Cyberspace Lab - tClab at Virginia Tech). Fahmy's study explored how AI and NLP techniques can enhance threat detection across social media platforms and the Dark Web. Heajun An's research introduces an AI chatbot that adapts to different vulnerability levels in teenagers by using dynamic interactions to prevent cyber-grooming.
Student Research Posters
Kritika Garg from WS-DL presented her poster “Redirects Unraveled: From Lost Links to Rickrolls.” The research examined 11 million redirecting URIs to uncover patterns in web redirections and their implications on user experience and web performance. While half of these redirections successfully reached their intended targets, the other half led to various errors or inefficiencies, including some that exceeded recommended hop limits. Notably, the study revealed "sink" URIs, where multiple redirections converge, sometimes used for playful purposes such as Rickrolling. Additionally, it highlighted issues like "soft 404" error pages, causing unnecessary resource consumption. The research provides valuable insights for web developers and archivists aiming to optimize website efficiency and preserve long-term content accessibility.
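As a rough illustration of the kind of chain-following such a study involves, here is a minimal sketch that walks a redirect chain hop by hop, caps the number of hops, and flags loops. This is a generic illustration, not the authors' crawling code; the hop limit and the use of the `requests` library are my own assumptions.

```python
import requests
from urllib.parse import urljoin

def trace_redirects(url, max_hops=20):
    """Follow a redirect chain manually, returning the URIs visited and an outcome."""
    chain, seen = [url], {url}
    for _ in range(max_hops):
        resp = requests.get(chain[-1], allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            return chain, resp.status_code            # chain resolved (2xx) or errored (4xx/5xx)
        target = urljoin(chain[-1], resp.headers.get("Location", ""))
        if target in seen:
            return chain + [target], "loop"           # redirect loop detected
        chain.append(target)
        seen.add(target)
    return chain, "exceeded hop limit"                # too many hops, give up

chain, outcome = trace_redirects("http://example.com/")
print(len(chain) - 1, "hop(s) ->", outcome)
```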
Thu Nguyen from the Bioinformatics and Parallel Computing group at ODU presented “Enhancement of Deep Learning for Segmentation of Protein Secondary Structures from Cryo-EM,” at the poster session. Their work enhances the efficiency of DeepSSETracer, a deep learning framework for cryo-electron microscopy segmentation, by partitioning large cryo-EM maps and merging the outputs. This approach reduces memory usage and processing time, enabling smoother handling of large datasets within the ChimeraX visualization tool.
The session on CS education was chaired by Briana Morrison and featured data science research on admissions biases and success rates in CS courses.
Anaya Prakash presents at CAPWIC 2025
Anaya Prakash of Virginia Tech presented “Who Gets In? The Role of AI in Shaping the Next Generation of Computer Scientists.” She analyzed a closed dataset of a large public university’s MS CS admissions. She found patterns in the admissions data, including that men from either the US or China were 2.5X more likely to be admitted than applicants in other subgroups. She also found that attributes like age, which were not intended to be a factor in admissions, were key predictors in models such as Random Forest.
Nawar Wali presents at CAPWIC 2025
Next, Nawar Wali of Virginia Tech presented her work, “Machine Learning Insights into Academic Success in CS3: The Role of Mathematics and CS Coursework.” She analyzed how mathematics coursework correlates with success in Data Structures and Algorithms (CS3), drawing on 10 years of Virginia Tech CS3 data, including CS and math courses with associated grades for 3,900 students. She found that discrete structures and computer organization were the courses most highly correlated with CS3 success, and that students who pass linear algebra earlier also have a higher success rate in CS3.
Victoria Wiegand from Villanova University presented “Data Collection Pipeline to Diversify AI Training Data.” Their work addresses the cultural biases present in AI vision-language models, which are often trained on predominantly Western data sources. As a result, these models perform poorly when interpreting or generating content from non-Western contexts, frequently misrepresenting communities with inaccurate or stereotypical imagery. To help bridge this digital divide, the researchers developed a low-cost, community-driven data collection pipeline. Partnering with a university service trip, they trained participants in ethical data collection and gathered images via a WhatsApp-linked web form. These images will be compiled into a publicly available dataset aimed at helping developers diversify model training data and improve cultural representation.
Victoria Wiegand presents at CAPWIC 2025
Hajra Klair from Virginia Tech presented “Agentic AI for the Rescue: Factual Summarisation of Crisis-Related Documents.” Their work presents a new approach to summarizing crisis-related documents using large language models. Evaluated on the CRISISFacts dataset covering 18 real-world events, the two-phase architecture first retrieves documents based on entity prominence and then generates summaries guided by structured, crisis-specific queries. This approach reduces hallucinations and enhances information coverage, aligning the summaries with the needs of emergency response officials.
Hajra Klair presents at CAPWIC 2025
Yeana Lee Bond from Virginia Tech presented “Driver Facial Expression Classification: A Comparative Study of Computer Vision Techniques.” Their work explores how well different machine learning models can detect driver emotions—happy, angry, and neutral—using facial expressions. They evaluated EfficientNet, Vision Transformer, and CNN-based models on a driver emotion dataset. EfficientNet stood out for its speed and high accuracy, while the study also highlighted how model design and data quality play a bigger role than just model size.
Yasasi from NIRDS Lab and WS-DL presented “Examining Visual Attention in Gaze-Driven VR Learning: An Eye-Tracking Study” at the Games and Virtual Reality session. Their work presents a framework for analyzing visual attention in a gaze-driven VR learning environment using a consumer-grade Meta Quest Pro VR headset with a built-in eye-tracker. Yasasi discussed how their study contributes by proposing a novel approach for integrating advanced eye-tracking technology into VR learning environments, specifically utilizing consumer-grade head-mounted displays.
Sanjana Kumari from Virginia Tech presented “Evaluating Children's Ability to Distinguish Between Traditional and AI-Generated Media”. Their study investigates whether children can distinguish between human-authored and AI-generated content through a structured intervention comprising surveys and an educational workshop. The authors note that while adults are increasingly adapting to tools like ChatGPT, we still lack a clear understanding of how children perceive and process these technologies.
Rebecca Ansell from Georgetown University presented “Assessing Public Perception of AI-Generated Social Media Content of the 2024 U.S. Presidential Debate”. Their study explored key research questions such as: Can humans distinguish between AI-generated and human-authored content on social media? And what characteristics make a social media post appear human? To investigate this, they collected a dataset of X posts and YouTube comments related to the 2024 presidential debate, and supplemented it with content generated using ChatGPT. Human annotators were then employed to label whether each piece of content was AI-generated or not. The researchers used the Bradley-Terry model to analyze the ratings and evaluate human perception patterns. The findings showed that annotators could generally differentiate between AI and human content. Interestingly, the study also examined how sentiment influenced perceived humanness: positively toned posts were more likely to be perceived as AI-generated, while negative or offensive content was often seen as human-authored. The results highlight the role of tone, civility, and emotional cues in shaping public perception.
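For readers unfamiliar with the Bradley-Terry model mentioned above, here is a minimal, generic sketch of how it turns pairwise judgments into item "strengths" using Hunter's MM updates. This is an illustration only, not the authors' pipeline; the win counts below are synthetic.

```python
import numpy as np

# Synthetic pairwise counts: wins[i, j] = times item i was preferred over item j.
wins = np.array([
    [0, 12,  8],
    [4,  0,  9],
    [6,  5,  0],
], dtype=float)

def bradley_terry(wins, n_iter=200, tol=1e-10):
    """Estimate Bradley-Terry strengths via the MM algorithm (Hunter, 2004)."""
    n = wins.shape[0]
    comparisons = wins + wins.T           # n_ij: total comparisons between i and j
    total_wins = wins.sum(axis=1)         # W_i: total wins by item i
    strengths = np.ones(n)
    for _ in range(n_iter):
        denom = comparisons / (strengths[:, None] + strengths[None, :])
        np.fill_diagonal(denom, 0.0)
        new = total_wins / denom.sum(axis=1)
        new /= new.sum()                  # normalize; strengths are only defined up to scale
        if np.max(np.abs(new - strengths)) < tol:
            return new
        strengths = new
    return strengths

print(bradley_terry(wins))                # higher value = more often preferred
```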
Himarsha from WS-DL presented “Infrastructure for Tracking Information Flow from Social Media to U.S. TV News”. Their work focused on understanding how social media content is amplified through mainstream media by expanding its reach to new audiences. They are using the data from Internet Archive’s TV News Archive to explore how social media content flows into TV news and the contexts in which it is incorporated.
In parallel to the Social Media session, the Human-Computer Interaction (HCI) session was held, featuring three short research presentations. It was chaired by Dr. Jin-Hee Cho from Virginia Tech.
The session began with Anika Islam from George Mason University, presenting their research titled “Leveraging Smartwatch Sensors for Detecting Off-Task Behaviors of Neurodivergent Individuals”. The study aims to improve workplace success for neurodivergent individuals by using smartwatch sensor data to identify off-task behaviors and deliver personalized interventions. In the current phase, data were collected from 25 neurodivergent young adults engaged in a manual task within a controlled lab environment. Anika outlined their future plans to apply machine learning techniques to analyze the data, with the goal of developing real-time, tailored interventions that improve productivity in the workplace.
The next presenter, Marissa Hirakawa, an undergraduate Computer Science student from Virginia Tech, presented “Usability Heuristics and Large Language Models: Enhancing University Website Evaluations.” Their study focused on exploring the use of large language models (LLMs) to support usability evaluations of university websites, based on Nielsen’s 10 usability heuristics. Their findings show that while LLMs can help uncover usability issues often missed in manual reviews, human verification remains essential as LLMs occasionally hallucinate usability issues. Their future directions include refining LLM evaluation processes and incorporating multimodal data such as user action logs and screenshots to enhance assessment accuracy.
Kumushini from NIRDS Lab and WS-DL presented “Advanced Gaze Measures for Analyzing Joint Visual Attention.” This research explored how user pairs coordinate their joint visual attention (JVA) by using egocentric and eye-tracking data. In their user study, participants engaged in a collaborative screen-based visual search task while wearing Project Aria smart glasses. Their findings suggest that users who maintained similar attention behaviors (ambient/focal) over time exhibited more frequent and sustained moments of joint attention compared to those with differing attention behaviors. In future work, they plan to further refine the methodology by integrating machine learning techniques to automatically identify and classify different patterns of ambient and focal visual attention during collaborative tasks.
The first presenter at #CAPWIC2025 HCI session, Anika Islam from @GeorgeMasonU is presenting their research titled "Leveraging Smartwatch Sensors For Detecting Off-Task Behaviors Of Neurodivergent Individuals". pic.twitter.com/SbA31XhIcU
In the machine learning session, Eleni Adam from the Bioinformatics Lab at ODU presented “Analysis of Subtelomere and Telomere Regions of Cancer Genomes on the Cloud.” Eleni examined subtelomeres in cancer patients, using ODU’s Wahab Cluster to carry out her research. In this work, she implemented a computational pipeline to enable the subtelomere analysis; in practice, each patient’s DNA takes hours to process. Her ultimate goal is to understand cancer and the subtelomere’s role in it. Her work is available at https://github.com/eleniadam/storm.
Keynote #2, Awards, and Closing Remarks
Becky Robertson gave the closing keynote at CAPWIC 2025
Becky Robertson, Vice President at Booz Allen Hamilton, delivered the closing keynote. Her talk centered around the concept of inspiration. She engaged with the audience by asking what or who inspires us personally, and encouraged us to channel that inspiration into meaningful actions, overcome challenges, and pursue our goals.
Following the keynote, it was time for the award ceremony and the closing remarks. Several awards were presented, including the Best Research Short Award, Honorable Mention Research Short Awards, Flash Talk Awards, and Best Poster Awards, which recognized outstanding contributions from both graduate and undergraduate participants.
Congrats to Thu Nguyen et al. from @oducs @ODUSCI for winning the Best Poster Award (Graduate Category) for their work: 'Enhancement of Deep Learning for Segmentation of Protein Secondary Structures from Cryo-EM'! 👏 pic.twitter.com/QadMLJelwD
Thu Nguyen from the Bioinformatics and Parallel Computing Group at ODU received the Best Poster Award in the Graduate Category
During the closing remarks session, the next year’s organizing committee announced that CAPWIC 2026 will be at Virginia Tech’s Innovation Campus in Alexandria, VA.
Wrap-up
For all of us, it was our first time attending the CAPWIC conference in person. CAPWIC 2025 provided an inspiring platform to exchange new ideas and showcase innovative research within the tech community, encouraging greater participation among women and minorities in computing. The CAPWIC 2025 conference was held in Washington, D.C., during the peak of the cherry blossoms. We had the opportunity to take part in the National Cherry Blossom Festival at the Tidal Basin, enjoying the beautiful sight of the city covered in pink and white blossoms. It was a memorable experience to see the capital come to life with the colors of spring.
Ph.D. students from WS-DL at Tidal Basin, Washington, D.C.
The kind folks at the Prosocial Design Network asked me to be a guest for April’s “pro-social,” a very low-key virtual gathering for folks interested in creating more inclusive digital spaces.
More about PDN:
The Prosocial Design Network connects research to practice toward a world in which online spaces are healthy, productive, respect human dignity, and improve society.
They shared the questions in advance, which I very much appreciated! Here are my prepared notes - we certainly didn’t cover it all during the call.
What principles should be front of mind in designing inclusive digital spaces, particularly social spaces?
First off, hire people with different lived experiences from yours. Hire trans people. Hire Black people. Hire disabled people. Hire disabled Black trans people. Let them cook. Listen to them. Otherwise you are, as my wife says, “Pissing into the wind.”
Prioritize accessibility. Ensure spaces are accessible for users on many devices, using different device settings, in different contexts in the real world, including with assistive technologies. Often accessibility is an afterthought. Shift left and allow it to drive your design and architecture decisions from the jump. For social apps, this includes setting smart defaults - e.g., requiring folks to add alt text if they’re uploading images.
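As a minimal sketch of what such a smart default might look like on the server side (a hypothetical helper, not tied to any particular platform or framework), an image post could simply be rejected unless meaningful alt text comes with it:

```python
from typing import List, Optional

def validate_image_post(image_bytes: bytes, alt_text: Optional[str]) -> List[str]:
    """Return validation errors for an image post; empty list means it can be published.

    The point of the "smart default": accessibility is enforced at the point of
    entry instead of being left as an optional afterthought.
    """
    errors = []
    if not image_bytes:
        errors.append("No image data received.")
    if alt_text is None or not alt_text.strip():
        errors.append("Alt text is required so screen reader users get a description.")
    elif len(alt_text.strip()) < 5:
        errors.append("Alt text should briefly describe the image, not just repeat a filename.")
    return errors

# Example: an upload without alt text is blocked.
print(validate_image_post(b"\x89PNG...", None))
```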
Keep your tech stack light and boring. Design for a 4-year-old Android phone on a 3G connection, with bandwidth paid for by the megabyte. Bloatware takes longer to load and harms or disincentivizes participation from folks on slower connections or older tech.
Design for trust, privacy and safety. Design for people to be able to protect their privacy, control what they share and what they see.
Don’t ask for information you don’t need, and tell people why you’re asking for what you do need.
Make privacy and sharing settings crystal clear.
Remind folks that no site is 100% secure even if you’re encrypting every bit.
Provide feedback/reporting mechanisms.
Allow people to block/opt out of interacting with others or groups, or types of content.
Don’t overpromise! If you have gaps or areas still under development, name them.
Have good documentation and support. Don’t leave people wondering what to do.
Look to successful, intentionally-designed communities - like BlackSky - for cues about designing inclusive, safe spaces.
Allow people to define themselves. The way you do it ain’t the way everybody else does it.
Be aware of binary options of any type when it comes to how people identify themselves - not just gender, but everything else. Are you technical or nontechnical? Employed or unemployed? Full-time or part-time? In all of these cases it’s not so clear.
Think in terms of checkboxes, not radios. Tagging, not categorizing.
Give people freedom in choosing avatars or profile images.
Give people freedom to change/update usernames and login email addresses without hassle.
Don’t make inferences about who people are or what they’d like based on their gender, race or other things that they choose to share with you.
Confront your own ideas about people having one “true identity” - like a real name policy or assuming that everyone has the same interactions with everyone in their lives in every context. We certainly know this is true because 4chan exist(ed) - but let’s also remember that this might be the way that a trans person tries on a new name for the first time.
You may have noticed this isn’t necessarily specific to trans-inclusive design. That’s because this is the kind of work that, by considering folks in marginalized positions, benefits everyone. It’s the curb cut effect for accessibility AND privacy AND safety AND inclusion. By focusing our design on the margins we include everyone between them too.
Since you wrote your article in 2019, what fails do sites continue to make when it comes to trans-inclusive design?
The biggest fail I continue to see is that folks are asking for gender or sex information at all, because it is usually not needed. It usually means that this data is being brokered into a database somewhere and sold for money.
I don’t need to tell you my gender to book a hotel. Why are you asking for it?
The unnecessary asking for gender gets worse now that we are seeing a rollback of the progress in inclusive design made over the past few years. We’d been doing so well! The US Web Design System had a really thoughtful pattern for asking about gender that was starting to roll out to all these government forms. But now agencies are in the process of removing the pattern for asking for gender in an inclusive way, and replacing it with a binary option for sex.
These design systems changes are in addition to removing all references to being trans from websites, and no longer offering services or information for trans people. It’s a very literal erasure of trans identity. It’s really upsetting, scary, and for trans folks, it’s existential.
I encourage practitioners to plan ahead for the moment when you are asked to do something that you know is wrong. That day will come. What will you say? What will you say no to? What’s your red line?
What new concerns do you have with AI and do you have any advice for tech folk?
I have a lot of concerns with AI. I do think there are useful applications for the technology, and 99.99% of the applications out there are either actively predatory, passively harmful, gratuitous and mid, or all of the above. And they are all harming the environment and our health.
Garbage in, garbage out. AI is pattern recognition. And the patterns it’s trained on are filled with bias! Bias harms people who are in the minority. According to a recent study out of Stanford:
“synthetically generated texts from five of the most pervasive LMs …perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities.” - Laissez-Faire Harms: Algorithmic Biases in Generative Language Models (2024)
…and this includes code. When AI is trained on design patterns or code that is widely popular, but that also includes a lot of code that’s inaccessible or unusable, the resulting code is also inaccessible or unusable. We should also be extremely wary of any AI tool that claims it can refactor a codebase written in a language that most modern coders are not using.
AI is a tool of capitalism and state violence. Generative AI is being used to consolidate, analyze, and generate information in a way that can be used to surveil, prosecute, incarcerate, and kill people.
AI is seen as a smart humanoid. People tend to believe algorithms more than each other as task complexity increases - but we also tend to view AI as human-like. We anthropomorphize AI tools by giving them human-like names or designing them as chat prompts (rather than command prompts or even search boxes), which leads us to believe that we are in fact talking with another living being rather than a computer. It also leads some folks to think that AI will become sentient. It won’t, actually, but it might as well if humans believe that it has, which is perhaps worse.
AI is mid. And by that, I mean that what it produces is functionally a middle-of-the-road, average, non-“edge case” output. This flattens our differences and creates a “norm” which actually does not exist. Individual people aren’t “normal”, but AI sure likes to tell us that’s a thing, and that really harms people who are far from that norm. Saying that everyone is the same denies the fact that we are all weird as hell. It’s our differences that make us stronger, more creative, better.
Critique is painted as fear. Proponents of AI say that skeptics are “afraid” of AI or don’t understand it. I, for one, am not afraid of it - I’m frustrated by how folks are positioning it as the solution to all our problems. I do understand it! I know too much. Dismissing AI detractors as “fearful” allows proponents to dismiss valid critique outright rather than engage with it. It’s a strawman argument.
If you don’t need to use AI, don’t. Do something else. Turn off default settings that include AI. Switch your search engine to DuckDuckGo and turn off AI features. Turn off Apple intelligence. Turn off Google Gemini. Take a harm-reduction approach to your tech use. (FWIW, this is my approach to eating animal food products. I’m not vegan or even completely vegetarian, but I don’t build my food habits around animal products, which reduces how many animal products I consume.)
Don’t make AI your main thing. Charles Eames said, “Never delegate understanding.” Don’t rely on AI alone to make decisions about what’s true, certainly not for core parts of your work.
Understand the bias that ships with your LLM. Do everything you can to critically evaluate outputs for inaccessible, biased or otherwise harmful content. Right-size your models and turn down the “creativity” setting.
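As one concrete example of "turning down the creativity setting": in many LLM APIs this amounts to lowering the sampling temperature and choosing a smaller model. The sketch below assumes the OpenAI Python SDK (v1+); the model name is a placeholder, not a recommendation.

```python
from openai import OpenAI  # assumes the `openai` package, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder: right-size to the smallest model that does the job
    temperature=0.2,       # low "creativity": more predictable, less embellished output
    messages=[
        {"role": "user", "content": "Summarize this alt-text guidance in two sentences: ..."},
    ],
)
print(response.choices[0].message.content)
```

Lower temperature does not remove bias from the training data, but it does reduce the model's tendency to invent details, which makes critical review of the output easier.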
Advocate for sustainable, safe AI, including regulation and environmental mitigation measures. Individual choices get us down the road a piece, but what we really need is to mitigate the impacts at a high level.
Engage your discomfort. If someone critiques AI and it makes you uncomfortable, listen to understand and be open to changing your mind. Most of the folks who are warning about the harms of AI are minoritized people - Black and brown women, queer and trans people. Believe them!
Are there any questions you think researchers could help answer regarding trans-inclusive design?
This is an excellent question. Some of the things I’d ask folks to understand include…
What are ways we can design for trust and safety? How can we create digital spaces where people feel safe? What are some of the ways we can foster trustworthiness?
What would trans-informed design look like? How can we use the very concept of transness - boundary-crossing, liminality, non-binary thinking - to expand our thinking about how technologies can be used, and to what ends?
Oliver Haimson is studying this very thing, and his new book Trans Technologies is available for free, open access, from MIT Press.
How might trans-inclusive digital design change IRL service design? We’re already seeing this as part of our work in Civic Tech, moving from automation to true digital transformation. We all know that real-world constraints map to technological design choices. How then do we transform the tech stack and use that to change our very service delivery model?
On May 7, 2025, we held our fourth annual WS-DL Research Expo. We continued the same format as the prior years (2024, 2023, 2022 & 2021), with one student from each WS-DL professor giving a short overview of their research. Links to all the materials (slides, papers, software, data) are gathered in the GitHub repo, but repeated here are the links for the students and their presentations:
We were fortunate enough to welcome back some of our alumni, including: Chuck Cartledge (PhD, 2014), Gavindya Jayawardena (PhD, 2024), Mat Kelly (PhD, 2019), and Sawood Alam (PhD, 2020). We really appreciate the ongoing relationship we have with our alumni -- WSDL is for life!
If you were unable to attend, we recorded the students' presentations and have embedded the video below.
--Michael
The @WebSciDL 2025 Research Expo is happening now! @phonedude_mln initiated the session followed by faculty and alumni intros!
Collaboration is a topic of ongoing interest and need for libraries. It has long been an important area of inquiry at OCLC Research because of its fundamental role in effective library work. One participant in the RLP discussion groups that led to the report said,
“Within our professional competencies, there is. . . an ethical requirement for us to be thinking about the future. I don’t think I’d consider myself a good librarian if I wasn’t actually thinking about collaborations across boundaries.”
The Silos report, despite its punny name, delivers enduring value by offering a compelling framework for how collaborations mature. And while the report was focused on libraries, archives, and museums (LAMs), its findings and recommendations apply across many library activities.
Collaboration continuum
The Collaboration Continuum framework depicts collaborative activity across a spectrum, illustrating a gradual increase in interdependency and benefits. The framework is elegant in its simplicity, offering a compelling view of how and why collaborations flourish.
The Collaboration Continuum. Originally published in Beyond the Silos of the LAMs: Collaboration Among Libraries, Archives and Museums.
As collaborations move from left to right on this continuum, collaborative efforts require greater investments, risk-taking, and trust, while offering the potential for greater rewards for all participating partners. The initial stages (Contact, Cooperation, Coordination) are seen as additive, fostering working relationships that are layered on top of existing processes, without changes to institutional hierarchies or organizational structures. Cooperation and Coordination rely upon both informal and formal agreements between groups to achieve common goals.
But the fourth stage, Collaboration, offers “a new vision for a new way of doing things.” It involves fundamental change and transformation, which makes it a much more ambitious undertaking. Convergence represents a state where collaboration has matured to the level of infrastructure that is so ingrained that it may no longer even be recognized as a collaborative effort.
It’s not just about LAMs
While written to foster greater collaboration between libraries, archives, and museums, the Silos report is relevant to a much broader library audience. In fact, as academic libraries increasingly assume new research support responsibilities, such as research data management, ORCID adoption, and research impact services, collaboration with other campus units becomes imperative. This imperative stems from the complex research lifecycle that spans multiple stakeholder groups, where no single unit, including the library, can “own” research support. Instead, cross-unit collaboration is increasingly required, and the library must now work with unfamiliar partners such as research administration, faculty affairs, and campus communications.
The Collaboration Continuum offers a framework that can guide libraries as they develop research support capacity with campus partners in support of institutional goals. Building trust relationships is challenging in a decentralized university environment characterized by local autonomy and incessant leadership churn. More recent OCLC Research outputs, such as Social Interoperability in Research Support: Cross-Campus Partnerships and the University Research Enterprise, build upon the Silos report to offer strategies and tactics that librarians may apply to build social interoperability, “the creation and maintenance of working relationships across individuals and organizational units that promote collaboration, communication, and mutual understanding.”
Both the Silos and Social Interoperability reports inform current OCLC Research work as we observe libraries forging new partnerships with other units in the campus community. Many partnerships are ad hoc and experimental, falling in the Cooperation and Coordination sections of the Collaboration Continuum. But some collaborations are establishing more formalized operational structures, such as the University of Manchester Office of Open Research or Montana State University Research Alliance, where library expertise and capacities are combined with those of other campus units, moving these partnerships closer to the Collaboration segment of the Collaboration Continuum. These changes have implications for library strategies, organizational structures, and value proposition, which we are examining in the OCLC Research Library Beyond the Library project.
Building a pedestrian bridge in Dublin, Ohio, home to OCLC. Nheyob, CC BY-SA 4.0, via Wikimedia Commons
Collaboration catalysts
The Silos report also describes nine Collaboration Catalysts that can help partnerships flourish. This list can serve as a useful checklist for assessing readiness for moving further along the Collaboration Continuum, and the absence of catalysts can suggest project risk. I summarize these briefly here, but I encourage you to read the richer explanation and examples in the report.
Vision—A collaboration must be embedded in an overarching vision shared by all participants. This is core.
Mandate—A mandate, conveyed through strategic plans or high-level directives, can incentivize collaboration.
Incentives—Collaborations nurtured by incentive structures reward both individual and collective efforts.
Change agents—Collaborations require leadership from a trusted individual, department, or programmatic home base to provide stability and sustained stewardship.
Mooring—Collaborations thrive when they have an administrative home base from which they can operate, communicate, and incorporate their efforts into broader institutional goals. In practice, however, collaborations are often handshake agreements with individuals reporting to different units, which can threaten the partnership in a dynamic institutional environment.
Resources—Collaborations must be adequately resourced in order to succeed. This includes funding, human labor, expertise, and necessary infrastructure.
Flexibility—When professionals approach collaboration with open-mindedness, they can learn and embrace new ideas from other stakeholders.
External catalysts—Factors like peer pressure, funding requirements, and user needs can influence the decision to partner with others.
Trust—Trust is foundational to any collaborative relationship due to the resulting interdependencies.
Enduring relevance
Beyond the Silos of the LAMs is aging well and remains one of the greatest hits in the OCLC Research back catalog. The report offers timeless guidance for libraries, museums, and archives that extends to broader library audiences today.
I invite you to read the full report—available open access like all OCLC Research reports—and consider where your collaborations fall on the continuum and whether your partnerships have multiple collaboration catalysts in play, as the report suggests.
AI Nota Bene: I used AI tools to write this blog post. I found Claude to be useful as an editor and proofreader of my final draft, as I prompted it to recommend ways I could improve clarity and conciseness. I also prompted Claude to help me find a title for this essay. I incorporated many, but not all, of Claude’s suggestions.
LibraryThing is pleased to sit down this month with novelist Nancy Kricorian, whose work explores the experiences of the post-genocide Armenian diaspora. Her debut novel, Zabelle, published in 1998, has been translated into seven languages and adapted as a play. Her essays and poems have appeared in journals like The Los Angeles Review of Books Quarterly, Guernica, Parnassus, Minnesota Review, and The Mississippi Review. Kricorian has taught at Barnard, Columbia, Yale, and New York University, as well as with Teachers & Writers Collaborative in the New York City Public Schools, and she has been a mentor with We Are Not Numbers. She has been the recipient of a New York Foundation for the Arts Fellowship, a Gold Medal from the Writers Union of Armenia, and the Anahid Literary Award. Her newest book, The Burning Heart of the World, follows the story of an Armenian family caught up in the Lebanese Civil War, and was recently published by Red Hen Press. Kricorian sat down with Abigail to answer some questions about her new book.
The Burning Heart of the World was published to coincide with the fiftieth anniversary of the Lebanese Civil War and the one hundred and tenth anniversary of the Armenian Genocide, events which are central to the book’s story. How did the idea for linking these events, and the more recent trauma of 9/11 come to you? What insights can be gained from thinking about these terrible episodes of history in relation to one another?
I am interested in the way that mass trauma events inform and shape people’s life trajectories, and in the Armenian case the way that the genocide haunts families across generations. That haunting is often a silent or unspoken one, and all the more powerful for being so. In making these connections visible I hope to open spaces for repair and renewal. Sometimes going back to imagine and give shape to our forebears’ traumas is also a way of building strength to deal with our present ones.
This new book, and your work as a whole addresses the experiences of the Armenian diaspora, of which you are a part. How has your own personal and familial history influenced your storytelling? Are there parts of The Burning Heart of the World that are based upon that history?
My first novel, Zabelle, was a fictionalized account of my grandmother’s life as a genocide survivor and immigrant bride. My next book, All the Light There Was, told the story of someone of my generation growing up in my hometown under the shadow of the unspoken familial and community experience of the Armenian genocide. All the Light There Was, which is set in Paris during World War II, went far beyond the scope of my personal and family history in a way that required extensive research, as did The Burning Heart of the World, but there are small details in both of those novels that are drawn from personal history as well as different elements of my main characters’ temperaments that are similar to mine.
Your story is told from the perspective of a young person living through these events, but chronicles their effect on multiple generations. Is this significant? Are there things that a youthful perspective allows you to do, that a more mature outlook might not?
I have had a long fascination with the bildungsroman, the novel of formation, which in its classical form is the story of the growth and character development of a young man. In college I took a course on the “female bildungsroman” in which we read The Mill on the Floss and Jane Eyre, among other texts, and learned that the novel of development for women traditionally ended in either death or marriage. In all four of my novels, I write from the point of view of girls as they make their way towards adulthood. With Vera in The Burning Heart of the World, I wanted to show the Lebanese Civil War from a young girl’s perspective as she moves through adolescence. I am interested in centering the experience of girls and women in my work, with a particular focus on the way they manage and care for their families in times of great violence.
Did you have to do any research, when writing your book? If so, what were some of the most interesting and/or memorable things you learned?
I want the reader to be immersed from the first page in the time and place I am writing about—to be able to see, smell, and hear the world that the characters inhabit. It takes deep research and knowledge to build that world, and my favorite part of that work is listening to people who lived through the time I’m writing about tell their stories. I collect anecdotes and details in the way that a magpie gathers material to build a nest. So, for The Burning Heart of the World, I read over 80 books, both fiction and non-fiction, and interviewed upwards of 40 people. I also made three trips to Beirut so that I could become familiar with the city and the neighborhood that Vera lived in.
Tell us a little bit about your writing process. Do you have a particular place you prefer to write, a specific way of mapping out your story? Does your work as a teacher influence how you yourself write?
My writing process varies from project to project. For the last two novels, I have sat cross-legged in my favorite armchair with my laptop. Sometimes I make up rules for myself—such as I have to write one page a day, or if I’m busy with other commitments, I tell myself I must write for fifteen minutes a day. If I sit down for fifteen minutes, it will often turn into an hour or two, and if it’s only fifteen minutes, the piece I’m working on will stay in the front of my mind as I’m walking the dog or going to the subway. I have not been teaching formal university classes much in the past ten years but have moved to a one-on-one mentoring model that I enjoy a great deal. The careful attention that I pay to my mentees’ writing has made me more attentive to my own.
What is next for you? Are there other books in the works that you can share with us?
I’m currently working on a series of essays about my family that I think will be a memoir in pieces. I have written one essay about my relationship to the Armenian language and my grandmother that’s called Language Lessons, and one about my father’s relationship to motor vehicles called His Driving Life. Next up is a piece about my Uncle Leo, who was an amazing character—as a teenager he was the Junior Yo-Yo Champion of New England and for many decades was a guitar player in an Irish wedding band, the only Armenian in the band but quite a rock star in Boston’s Irish community.
Tell us about your library. What’s on your own shelves?
In my study, I have shelves filled with books about Armenian history, culture, and literature. I particularly love and collect books of Armenian folk tales and proverbs. In the bedroom, we have all our novels, memoirs, and literary biographies. There is one shelf devoted to Marcel Proust, and another to Virginia Woolf. Poetry collections, photo and art books, and books about the history of New York City are in the living room.
What have you been reading lately, and what would you recommend to other readers?
I have the distinction, or dis-honor, of having all of my active federal research grants terminated by the current administration. None of the grants were researching anything especially controversial, but they were all funded by programs that have been effectively shut down. To add insult to injury, the termination letters each stated that our research project no longer “effectuates” the goals of the funding program and, in one case, "no longer serves the interest of the United States," which feels a bit harsh. Further, we were given no advance notice -- the terminations were effective on the same day we received notice (one of which was at 4:30pm on a Friday).
I apologize for the length of this post (I am a professor, after all), but I've broken things up into sections so you can skip around as desired.
Executive Summary: Academic research is essential for the advancement of technology and scientific/medical breakthroughs and is how we train the next generation of researchers. American research universities have become the envy of the world largely thanks to the support of the US federal government through the awarding of highly-competitive research grants. I am greatly saddened by what the already enacted cuts, and the proposed future cuts, will mean for basic research and research universities in the US.
Background on Academic Research Funding
I'm planning to share this with friends outside of academia, so here's some background on how academic research funding works.
Faculty Summer Salary. Most faculty at research universities like ODU have 9-month contracts with the university. We are expected to fund our summer salaries by obtaining research grants, most often from federal and state agencies and sometimes from private foundations or through industry contracts.
Graduate Student Stipends and Tuition. More importantly, we use research funding to provide stipends and tuition support throughout the year for the graduate students who are working with us on research. PhD students in STEM fields, like computer science, generally do not pay for graduate school themselves. They are employed as research assistants (paid by research grants) or teaching assistants (paid by state funds) and paid a relatively meager stipend with full tuition support. This not only provides critical support to advance research projects, but also provides hands-on research training that contributes to marketable skills for graduate students after graduation.
International Student Support. US students have the option to attend graduate school part-time while they work outside the university; however, international students cannot hold outside employment, so these assistantships are their only form of income while in school. Acceptance into PhD programs is highly competitive. Faculty are committing research funds and counting on the students to help advance the research projects, so we must be very selective. The international students whom we support are the best and brightest from their countries, and we hope to keep them in the US after graduation so that they can continue to contribute to America - through advancing research, developing innovative technologies, starting new businesses, or teaching the next generation.
Federal Research Funding. When a federal agency awards a research grant, those funds are then available to the principal investigator (PI) for the duration of the award period, subject to the approved budget and federal agency guidelines. It is not normal for federal agencies to terminate awarded research grant funding when there is a change in presidential administrations.
Basic Research. Many federal research agencies, and especially the National Science Foundation (NSF) and the National Institutes of Health (NIH), support basic research, which is work that may not have an immediate marketable outcome. This type of research is not likely to be performed by private companies as it will not immediately impact their bottom line. However, this basic research is foundational to scientific and medical breakthroughs, even if the long-term impact of basic research comes years after the funding. In my own research, a tool we built for archiving web pages directly from the web browser inspired the development of Webrecorder, which became the standard for high-fidelity web archiving. And my research on vehicular networks, funded 15 years ago, is continually being cited in current work on autonomous vehicles. The research that we perform and tools we build are not meant to compete with commercial software, but are built to experiment, to figure out what might (and might not) be possible.
Indirect Costs. As you may have heard with the proposed cuts to indirect rates for NIH, indirect costs, aka overhead, are provided to the university performing the research and are used to pay for major facilities, administrative personnel, utilities, and many other supporting costs. Researchers are typically not allowed to request basic equipment, like computers, as part of the proposed research budget. However, we have to have these things to carry out the research. We also have to have the university infrastructure and administrative staff to hire graduate research assistants, process payments, and make sure that we're complying with the terms of our grants. All of these things are paid for through indirect costs.
Travel Support. Most research grants are allowed to fund travel to technical conferences, both inside and outside the US. This type of travel support is essential for PhD students, because it is a required part of academic publishing. In computer science, if you don't pay the conference registration fees and travel to present your work, your paper will not be published in the proceedings and you can't count it as a publication on your CV. These publications are what demonstrate to potential employers that you have been performing quality research that has been deemed acceptable by your academic peers (i.e., peer-reviewed). In the "publish or perish" model, it takes not only good research, but also money, to publish. In our group, we require our traveling students to publish conference trip reports after they return, so that others who were not able to attend can at least benefit from some of the knowledge that was exchanged.
Context on Award Amounts. In my 20 years in academia, my collaborators and I have been awarded over $6.8M in research funding. Of that, $5.3M came from federal agencies (NSF, NEH, IMLS, Dept of Defense). A little over $1.4M came from private foundations, and the remaining $70k came from state or university funds. Without funding at about this level, I would not have obtained tenure or been promoted to Associate and then Full Professor. This is just what's expected of faculty at a research university.
Agencies and Programs
I am heartbroken that these federal agencies and programs have been shut down and that program managers who I've worked with over the years have essentially been fired (or, "put on administrative leave"). So, before I talk about my specific projects that were terminated, I want to tell you a little bit about these agencies and programs.
NEH. I didn't have current funding from them, but the National Endowment for Humanities (NEH), specifically their Office of Digital Humanities, was instrumental in helping me build my research program in web archiving and train PhD students, some of whom have become university faculty themselves. The program managers at NEH were dedicated public servants who cared about the research and scholarship they were funding, and they funded some amazing projects. You can read about a bit of my work that was funded by NEH as well as summaries of those project directors' meetings at https://ws-dl.blogspot.com/search/label/NEH. I am deeply indebted to Brett Bobley, Jen Serventi, and Perry Collins from the Office of Digital Humanities for supporting my work. NEH has a website that highlights the impacts that its funding has made throughout the nation. See https://www.neh.gov/impact for an overview and https://www.neh.gov/impact/states for an interactive map to explore the work being funded in each state. NPR's article on the cuts, "Cultural groups across U.S. told that federal humanities grants are terminated", highlights the effects on libraries and museums around the country.
IMLS. The Institute for Museum and Library Services (IMLS), along with the NEH, provided grants to libraries and museums throughout the country. In 2024, IMLS awarded over $250 million to fund research, education, and preservation activities, some of which are described in this article on the impact of IMLS. While that sounds like a lot of money, it's a tiny fraction of the US federal budget. Along with staff terminations, grants that had been awarded were terminated with little notice. IMLS funded some of my research in web archiving (most recently, our terminated National Leadership Grant, which I'll describe later), mainly used to provide stipends for my graduate student researchers. Through this, I have met amazing IMLS program officers, including Dr. Ashley Sands and Erin Barsan, who ensure that funding goes to worthwhile projects. Many IMLS staff members have degrees in library and information science and have dedicated their careers to supporting state and local library and museum services that help to educate people throughout the nation. As with NEH, you can explore the outstanding projects that IMLS funds through their interactive map at https://www.imls.gov/map.
DoD Minerva. The goal of the Department of Defense's Minerva Research Initiative (link is to the archived version of the page since the live page has been removed) was to "improve DoD's basic understanding of the social, cultural, behavioral, and political forces that shape regions of the world of strategic importance to the U.S." Dr. Nicholas Evans from UMass-Lowell wrote a great article about the importance and impact of the Minerva Initiative. Quoting from his article: "In launching the program, then-Secretary Robert Gates claimed that 'Too many mistakes have been made over the years because our government and military did not understand — or even seek to understand — the countries or cultures we were dealing with.' Minerva was designed to address the gap between operations and social science." Science reported on the cancellation of this initiative, "Pentagon abruptly ends all funding for social science research".
NSF. While I was working on this post, news came that the National Science Foundation (NSF) has stopped awarding new grants until further notice and that a 15% indirect cost cap has been implemented for new awards ("NSF stops awarding new grants and funding existing ones", "Implementation of Standard 15% Indirect Cost Rate"). I hadn't originally planned to talk about NSF, but I can easily say that without funding from this agency, I would not be a professor. During my senior year of undergrad, I was awarded an NSF Graduate Research Fellowship that paid for three years of study at the school of my choice. Because of this, I was able to attend UNC, one of the top graduate schools for computer science. Once I had been hired at ODU, I was awarded three NSF grants during my first five years. This not only allowed me to develop simulation tools to study web traffic, perform foundational research in vehicular networks, and explore how to re-purpose existing sensor networks during emergencies, but this track record of funding paved the way for my promotion to Associate Professor with tenure in 2012.
Terminated Projects and Impact
IMLS National Leadership Grants
Technically, I had two IMLS grants that were terminated, but one was a planning grant for which we had already spent all the funds (so $0 was "saved" by terminating this award). The two grants were related, in that the planning grant allowed us to carry out a preliminary investigation that helped to frame our larger grant proposal.
Grant 1: "Saving Ads: Assessing and Improving Web Archives' Holdings of Online Advertisements", Mat Kelly (Drexel), Alex Poole (Drexel), Michele C. Weigle (ODU), Michael L. Nelson (ODU), Aug 2022 - Jul 2025 (terminated Apr 2025), IMLS National Leadership Grant/Planning LG-252362-OLS-22 (proposal PDF via IMLS), $149,479
Grant 2: "Preserving Personalized Advertisements for More Accurate Web Archives", Mat Kelly (Drexel, PI), Alex Poole (Drexel), Michele C. Weigle (ODU), Michael L. Nelson (ODU), Aug 2024 - Jul 2026 (terminated Apr 2025), IMLS National Leadership Grant LG-256695-OLS-24, $398,927.
The basis for this project was our observation that today's ads on the web are indicators of cultural significance, much like those from print media of the past (see below). However, major public web archives are failing to capture many embedded ads in their archived pages.
Our first step was to assess how well online advertisements are being archived in places like the Internet Archive's Wayback Machine. The planning grant enabled us to develop a dataset of current online advertisements and assess how well they had been or could be archived by various tools. We discovered that there were several challenges to archiving advertisements, some related to the dynamic nature of ads and some related to how online advertisements are delivered and embedded in webpages. The work that we did was relevant not only for ads, but also for similar types of dynamic elements in webpages. Our goal in the larger project was to investigate ways of saving personalized online ads, which are tailored to users based on their location, browsing history, or demographics. During the first year of the larger grant, we had continued our investigation of how well ads are currently archived and had started developing "personas" to represent different types of web users. Our plan was to use these personas to trigger the display of a diverse set of advertisements, which we could then attempt to archive with existing tools and, as needed, develop additional methods for archiving these personalized ads. Through this work, we hoped to improve archiving practices and to open up more historical digital content for researchers and the public.
Direct Impact of Termination: As noted above, the funds from the planning grant had already been spent when the termination notice was received, but we were only one year into the larger grant period. The larger grant was intended to support one PhD student at ODU for two years and one PhD student at Drexel for two years. It was also intended to support travel to research conferences to present the results of the work and a few weeks of faculty summer funding for the project PIs. Because the grant was terminated during its first year, we were only able to support one PhD student at ODU for one semester and one PhD student at Drexel for two quarters. The project faculty will not be funded this summer or next summer, and travel for PhD students to present our findings and allow our work to be published will not be supported.
DoD Minerva Research Initiative
"What's Missing? Innovating Interdisciplinary Methods for Hard-to-Reach Environments," Erika Frydenlund (ODU VMASC), Jose Padilla (ODU VMASC), Michele C. Weigle (ODU), Jennifer Fish (ODU), Michael L. Nelson (ODU), Michaela Hynie (York University, Canada), Hanne Haaland (Univ of Agder, Norway), Hege Wallevik (Univ of Agder, Norway), Katherine Palacio-Salgar (Universidad del Norte, Colombia), Jul 2022 - Jul 2025 Feb 2025, DoD Minerva Research Initiative, $1,618,699.
We were excited to be invited to join this interdisciplinary and international collaboration to study residents' perceptions of safety and security in hard-to-reach areas. This grant was particularly competitive: 400 white papers were submitted, 42 teams were invited to submit full proposals, and only 15 projects were ultimately funded. Our study sites were two informal settlements, Khayelitsha Site-C near Cape Town, South Africa, and Villa Caracas, Barranquilla in Colombia. The overall goal of the project was to explore the limitations and potential knowledge gaps when only certain methodological or epistemological approaches are feasible in such settings. Each research team used a different methodology to carry out their study: visual sociology, institutional ethnography, citizen science, surveys, and web/social media analysis. In addition, another team performed meta-analysis to study how the interdisciplinary teams collaborated. Our part of the project was to use public data sources, such as worldwide news databases and social media, to learn about the sites. We hope to still be able to produce a tech report to describe our findings.
Direct Impact of Termination: This was a large interdisciplinary, multiple institution grant, so I can only speak to the impact of the termination on my research team. For us, since this grant was cancelled only a few months away from its original end date, we were able to support one graduate student for the three-year period of the grant and support faculty summer stipends for each of the summers. The main impact of the termination was the loss of our student's funding for this summer and the loss of travel support to enable all of the project partners to meet at our project wrap-up workshop.
"Trust and Influence Program Review Meeting 2024 Trip Report", my PhD student's trip report summarizing her experience presenting our group's work at the Trust and Influence Program Review Meeting - presumably, most of the other research projects described here have also been terminated
This post is adapted from an email thread in the DLF DAWG listserv.
Kristin Briney posted Writing Alt Text for a Scientific Figure on August 15th, 2024 where she provides a brief introduction and shares resources to guide the reader in describing scientific figures. In response to this post, members of DAWG shared other resources and experiences (Thanks to Wendy Robertson at the University of Iowa for sharing Briney’s post and Jasmine Clark for compiling and organizing the thread in blog form):
Direct Responses to Briney’s Post:
We’re working with our campus screenreader specialist on making our content accessible, and he’s pretty excited by the idea. Our grad student created the ALT text, and did so by dropping the images into UMGPT (our campus’ private secure ChatGPT clone), taking the generated captions, and cleaning them up. You can see examples of the resulting ALT text attached to the images in this draft internal presentation (which we ended up not using, but it’s useful backup for this conversation). The screenreader specialist liked these.
Here is our final version. Less ALT text, and more links to screenreader-adapted versions of the original spreadsheets from which we drew the data for the images.
I love the formula for writing alt text for data visualizations, in Briney’s post. I ran WAVE on the link that Jen-chien sent.
alt text = *Chart type* of *type of data* where *reason for including chart*. *Link to source data.*
with the example
Column chart of research data availability where research data on the internet disappears at a rate of 2.6% per year. For underlying data, see “Figure2_UnavailableByYear.csv” file at https://doi.org/10.22002/h5e81-spf62
Embedded in this recommendation is the assumption that the researchers have published their data: in an open repository, in an accessible and usable format, with a persistent ID.
This recommendation seems as though it would also improve the article. If you can’t summarize your visualization in this format, then perhaps it doesn’t support your argument, or it isn’t needed, or it’s too complicated, or you don’t really understand your research!
I’m interested to hear the opinions from actual users of assistive technology. Is the advice in these posts helpful or unhelpful?
– Courtesy of Susan Hoover, University of Houston
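Briney's formula quoted above is simple enough to capture in a tiny template. The sketch below is purely illustrative (the function and argument names are hypothetical and come from none of the resources in this thread); it just shows how the four ingredients slot together:

def alt_text(chart_type, data_type, reason, data_link=""):
    """Compose alt text as: '<Chart type> of <type of data> where <reason>. <Link to source data.>'"""
    text = f"{chart_type} of {data_type} where {reason}."
    if data_link:
        text += f" For underlying data, see {data_link}"
    return text

# Modeled on the column-chart example quoted above:
print(alt_text("Column chart",
               "research data availability",
               "research data on the internet disappears at a rate of 2.6% per year",
               "the “Figure2_UnavailableByYear.csv” file at https://doi.org/10.22002/h5e81-spf62"))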
Additional Related Resources:
Demo: AI to interpret charts for the visually impaired — arXiv Accessibility Forum 2024
Dena Strong published a “data verbalization” post that shows “how to craft a simple and effective data verbalization…”
– Courtesy of Jen-chien Yu, Director Of Library Assessment, University of Illinois
Response to Strong’s post (shared by Yu above):
This is great! We’ve been having debates about the accessibility of comics when used in presentations. This:
“What type of visual is this?
What does it show?
Why does it matter?”
maps to what we’re calling functional ALT text, as contrasted with descriptive ALT text. For presentations, it’s about why is this on screen — is it decorative, a cognitive or symbolic placeholder for more complex content, is it communicating information, etc. For functional alt text, we keep the “what does it show” really simple and short.
What we are finding is that folk who were blind from birth have different desires from alt text than folk who lost vision later in life. Our screenreader specialist was blind from birth, and will tell us, “I don’t care if it’s pink. That doesn’t tell me anything. Just tell me why it’s there.” Folk who have memories of color and shape may actually want that information.
I also had tried putting the links to the screenreader-versions of data in the ALT text and have been instructed to NOT do this, because it's annoying to have to listen to the computer read the URL aloud. I've also been told not to put the full links on the slide itself, but I balk at that, because for accessibility for me, I want to be able to see that before I click on it, and I don't want to have to hover to see it. A suggestion provided by Ann Fox was to add the stripped down functional alt text in the presentation slides, and create a separate external Google Doc or downloadable file with enriched alt text. I'm working right now on my first effort to do this. It's a lot of work. It looks something like this.
SLIDE 1
Image 1 (functional ALT text): …
Image 1 (descriptive ALT text): …
Image 1 (additional resources, links, data): …
Image 2 …
– Courtesy of PF Anderson, University of Michigan
A final response to the different resources shared:
I think those guides are indeed valuable resources to share with graduate students for their theses and dissertations—they offer helpful guidance across various areas.
Resources like the Do No Harm Guide, referenced from Writing Alt Text for a Scientific Figure by Kristin Briney, provide excellent frameworks for describing charts, such as the Four-Ingredients and Four-Levels Models, which can effectively address diverse user needs based on vision experiences, as PF mentioned. However, I'd like to offer a quick clarification on alt text for graphs that might help us apply these attributes more effectively. Balancing alt text length is crucial; text in the alt attribute is most effective when kept short and direct, ideally a single sentence conveying the key information.
For more detailed descriptions, best practice suggests using the longdesc attribute or a linked long description within a <div> element to add structure. This method enables screen reader users to navigate complex information more easily, reducing the risk of feeling “trapped” in lengthy, continuous alt text.
Another alternative is to link to a detailed description near the image, leading users to a separate page or section with a fuller explanation. This setup allows users to engage with image details at their own pace, focusing on specific information without becoming overwhelmed. For complex visuals, this flexible approach can enhance both accessibility and the user experience.
Just throwing this out there—I feel a bit panicked when I see people trying to put an entire block of text into alt text. It can easily end up overwhelming and confusing for screen reader users…
– Courtesy of Wen Nie Ng, MIS, UXD, CPWA, Digital Collections Librarian
It is just over a year since One Heck Of A Halvening, when Tether had pumped the Bitcoin "price" up to $73,094 the month before. Thanks to The Cryptocurrency Industry's Unprecedented Election Spending it was pumped over $100K and is now around $92K. The security of the Bitcoin blockchain depends upon Proof-of-Work, the idea being that it is more expensive to attack than any possible gains. Thus it is important that miners both spend a lot of money to mine coins, and that they can make a return on their investment in doing so. Now it is time to take a look below the fold at how the miners are doing post-Halvening.
First I should point out that the idea "that it is more expensive to attack than any possible gains" sounds plausible but is actually an oversimplification. Everyone has decided to ignore Eric Budish's The Economic Limits Of Bitcoin And The Blockchain, which shows that, for safety, the value of transactions in a block must be low relative to the fees in the block plus the reward for mining the block. In other words, the value transacted must be less than the total cost of the transaction. Clearly, this means the network is either unsustainable or unsafe. Right now, the average value per block is around $76M but the miners' income per block is around $285K, violating Budish's criterion by a factor of 267.
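As a back-of-the-envelope check of that factor of 267, here is a minimal sketch using only the round figures quoted above (these numbers move constantly, so treat them as illustrative):

# Budish's safety criterion, roughly: the value transacted in a block should be
# small relative to the miners' income (fees plus block reward) for that block.
value_per_block = 76_000_000   # ~$76M average value transacted per block (figure from the text)
income_per_block = 285_000     # ~$285K miner income per block (figure from the text)
print(f"value transacted is ~{value_per_block / income_per_block:.0f}x miner income per block")  # ~267x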
proof-of-work can only achieve payment security if mining income is high, but the transaction market cannot generate an adequate level of income. ... the economic design of the transaction market fails to generate high enough fees.
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently. If any one actor gets more than 1/3 of the mining power in a proof of work system, they can gain outsized profits by selfish-mining. However, can we really say that the uncoordinated choice model is realistic when 90% of the Bitcoin network’s mining power is well-coordinated enough to show up together at the same conference?
But since everyone has decided not to worry about what pointy-headed economists and Ethereum co-founders think, we will assume everything is just fine.
From blockchain.com we can see that the Bitcoin miners were doing great late last year and early this year, but things aren't so great now. Revenue peaked at around $50M/day in early February and recently has been as low as $37M/day.
Fees have dropped because the demand for transactions has dropped, with the network recently running at half-speed, averaging only around 4 transactions/sec.
Adding the fees and the inflation of the currency through block rewards shows the cost per transaction peaked at around $160 and is currently around $100.
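A similar back-of-the-envelope calculation from the daily figures above lands in the same range (again just a sketch using the approximate values quoted in this post):

revenue_per_day = 37_000_000     # ~$37M/day recent miner income (fees plus block rewards)
tx_per_day = 4 * 24 * 60 * 60    # ~4 transactions/sec sustained for a day
print(f"~${revenue_per_day / tx_per_day:.0f} per transaction")  # roughly $107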
New data tells us that mining a single Bitcoin or one BTC costs the largest public mining companies over $82,000 USD, which is nearly double the figure it did the previous quarter. Estimates for smaller organisations say you need to spend about $137,000 to get that single BTC in return. BTC is currently only valued at $94,703 USD, which seems to be a problem in the math department.
These costs can even get worse depending on the country you're doing your mining. Germany is typically considered to be one of the worst places to mine BTC from a profit perspective. It costs around $200,000 USD to mine a single coin there.
If Corrigan is right, that gives the big miners a 12% gross margin, out of which they have to pay taxes, staff, space, debt interest, stock grants to executives and depreciation of the rigs (but see The problem with Bitcoin miners). Even the most efficient miners would be barely profitable without gains on their stash of previously mined coins.
The result of the pressure on their margins is that the least efficient miners have stopped mining, perhaps to Pivot to AI, and the hash rate has dropped. Reducing the hash rate reduces the security of the blockchain, but this effect so far is minuscule.
If we stop ignoring Budish, Auer and Buterin we see that in practice the security of the Bitcoin blockchain is assured not through decentralized Proof-of-Work, but because no-one wants to kill the goose that lays the golden eggs. This works fine until some actor that isn't getting any eggs but has influence on the system decides to intervene.
Consider a hypothetical scenario of this kind.
The current administration seems intent on integrating cryptocurrencies (and their inevitable grifts) into the US financial system. Previously, major disruptions of the cryptocurrency ecosystem, such as the Terra/Luna crash or the FTX bankruptcy, had little effect on the broader financial system because previous administrations took care to maintain a firewall between them. Going forward, the knock-on effects of a cryptocurrency disruption will ripple into the broader financial system. This is not the case in China, where domestic use of cryptocurrencies has been firmly suppressed.
Once the US integration of cryptocurrencies has proceeded, suppose China decided that disrupting the US financial system in a non-attributable way was in their national interest. What levers do they have to pull?
We show that the Bitcoin mining capacity is highly concentrated and has been for the last five years. The top 10% of miners control 90% and just 0.1% (about 50 miners) control close to 50% of mining capacity. Furthermore, this concentration of mining capacity is counter cyclical and varies with the Bitcoin price. It decreases following sharp increases in the Bitcoin price and increases in periods when the price drops. Thus, the risk of a 51% attack increases in times when the Bitcoin price drops precipitously or following the halving events.
They use the analysis of Ferreira et al.'s Corporate capture of blockchain governance, which identifies the extent to which Bitcoin mining is controlled by a Chinese company, Bitmain Technologies, for many years the leading supplier of Bitcoin mining rigs:
Figure 1 shows the evolution of the market shares of Bitmain’s affiliated pools and those of other large pools until early 2021. Bitmain-affiliated pools’ market shares have been consistently at or above 30% since October 2016.
Bitmain core refers to pools in which Bitmain has known ownership stakes (AntPool, BTC.com, and ViaBTC). Bitmain total adds to the core pools those that are Bitmain's business partners (BTC.Top, OKExPool, Huobi.pool, and 1THash).
As of March 2021, the pools in Table 1 collectively accounted for 86% of the total hash rate employed. All but one pool (Binance) have known links to Bitmain Technologies, the largest mining ASIC producer.
AntPool and BTC.com are fully-owned subsidiaries of Bitmain. Bitmain is the largest investor in ViaBTC. Both F2Pool and BTC.TOP are partners of BitDeer, which is a Bitmain-sponsored cloud-mining service. The parent companies of Huobi.pool and OkExPool are strategic partners of Bitmain. Jihan Wu, Bitmain’s founder and chairman, is also an adviser of Huobi (one of the largest cryptocurrency exchanges in the world and the owner of Huobi.pool).
Mining pools 25 June 2018
How would China go about preparing to pull these levers? The first observable step would be to obscure the actions of the Chinese-influenced mining pools. In the early days there was essentially complete transparency about mining pools. Seven years ago, this is what the market share of the various mining pools looked like. Note that:
"Unknown" is 5.3%.
3 identifiable pools together control more than 50%
A year ago this is what the market looked like. Note that:
"Unknown" is 7%.
2 identifiable pools together control more than 50%
5/10 identifiable pools each have more than 5%
5 of the 10 are Bitmain affiliates
There have been slight but not really significant increases in centralization and opacity, but a huge decrease in the number of identifiable pools. Economies of scale have eliminated the smaller pools.
There are, of course, other possible explanations for the huge increase in stealth mining. Unfortunately, I can't think of any that would be good news. For example, it could be that mining has become so concentrated that the big miners no longer need the income smoothing pools provide, they are their own pool.
Libraries seek to create welcoming and respectful spaces for all users; this includes the library catalog, a crucial tool for discovery and access. In 2022, OCLC convened experts, practitioners, and community members, and created the report Reimagine Descriptive Workflows. This project highlighted the opportunity to improve descriptive practices, tools, infrastructure, and workflows in libraries and archives in order to better contextualize cultural institutions' collection descriptions. While the report offers a valuable framework, it primarily centers on the historical and social contexts of Australia, Canada, Aotearoa New Zealand, and the United States. For libraries in the UK and Ireland (UKI), the need to reimagine descriptive workflows is equally significant, but it requires attention to their own cultural and historical nuances. During OCLC RLP workshops in London, Sheffield, and Glasgow in May 2024, catalogers and other stakeholders collaboratively explored their work in this area.
With this in mind, the OCLC RLP team did what it does best – convene stakeholders to workshop the issue, draw out insights from expert practitioners, and collectively identify productive next steps. Thanks to the participants for their illuminating insights:
Sally Bell (University of Glasgow)
Adrian Plau (Wellcome Collection)
Elly Cope (University of Leeds)
Christoph Schmidt-Supprian (Trinity College Dublin)
Pardaad Chamsaz (British Library)
Kathryn Sullivan (University of Manchester)
Laura Rooney Ferris (Trinity College Dublin)
Helen Williams (London School of Economics)
Ceilan Hunter-Green (University of Leeds)
Thurstan Young (British Library)
Katy Mair (Wellcome Collection)
Setting the scene
In a series of meetings held virtually in late 2024, participants recognized that current descriptive practices often fail to adequately represent the diversity of UKI communities. There is a particular need for increased awareness and sensitivity towards the specific impact of British expansion on descriptive practices. Efforts to move forward can be geographically isolated. As one discussion participant explained, “So much of these conversations [around changing descriptive practices] that we have are from a non-UK perspective…” The broadly used Library of Congress Subject Headings (LCSH) and Library of Congress Classification (LCC) carry an inherently US-centric orientation. UKI institutions can move beyond simply adopting North American models for reparative description and prioritize issues specific to their context.
Discussions thus far have raised possible next steps that can be addressed through actions at individual institutions and areas where sector-wide investments would be useful.
Actions for individual institutions: practical steps forward
Identify and address capacity and resourcing for reimagining description practices. Discussion participants, while eager to dig into learning followed by action, were concerned about additional expectations placed on sometimes overburdened staff. Whether creating projects to revisit legacy data or taking a fresh look at incoming collections, institutions must adequately resource this work to make progress. Although individual institutions need to carry out this work, and the rationale for resourcing must align with local priorities, participants see toolkits for advocating for resources as a shared need.
Communicate with and engage communities in addressing change. Several participants talked about their desire to build mechanisms for receiving feedback from the public, especially community members who have been impacted by prior descriptive practices. Examples included implementing a feedback button in the discovery layer, supported by templated responses in addition to a well-articulated (and resourced) plan to take action on feedback. Other examples included the creation of public-facing webpages describing activities to support and engage with communities in the management and presentation of collections; participants saw such efforts as contributing to transparency. Special and cultural collections are often a locus for direct conversations with communities. These relationships could be positively leveraged to ensure that the words used to describe these community members are vetted and approved, and that community representatives can be included in ongoing dialog about descriptive terms. For effective outcomes, institutions should consult with people from their communities with different backgrounds and experiences; this work should be undertaken without placing undue burden on the individuals being consulted.
Other examples of work that can be carried out locally (but which would naturally be enhanced by a broader community of practice) included:
Undertaking pilot projects to demonstrate the value of reimagining description work and building technical skills and confidence. One participant emphasized the importance of “bite-sized projects” to develop confidence and expertise.
Building on the work or experience of others could support the development of toolkits and learning materials to increase metadata staff confidence in doing work in this area.
Establishing an annual language sensitivity review of terminology and subject headings. This is an area that would lend itself to focusing on a small but important segment of library collections.
Library system migrations present an opportunity to examine how language has been used and how it may be updated.
Actions for the sector: collective effort for positive impact
Participants identified several sector-wide actions to support an emerging community of practice.
Sharing information about existing UKI-specific efforts, pilots, and emerging best practices. The willingness to share experiences, challenges, and even setbacks is a crucial element for progress. Participants know that there is a lot of good work happening in the sector, but such information is not tracked or consolidated in a single location. It is difficult to find, sometimes available only through subscription to the right listserv or via referral in one-on-one conversations. Sharing information in a central resource hub would facilitate learning, collaboration, and advancing best practices. Even failed efforts are important to share, as there are always lessons to be learned. Participants supported the idea of identifying a “bank of mentors,” those who are experienced in reimagined descriptive workflows. They also noted that approaches in special collections and archives may differ from library cataloging, although the two sectors have much to learn from one another. Knowledge sharing builds a common understanding of user experience in relation to inclusive description and supports shared insights on community collaboration.
Opportunities to collaborate on projects within community hubs. A shared community of practice would provide opportunities to collaborate among those doing this work, especially where institutions have complementary collecting areas or are serving similar communities.
Considering the role of AI. Many participants are interested in using AI to streamline processes, particularly for identifying and updating language in the catalog, while also being mindful of potential biases that can be introduced by AI. Because of the need for appropriate resourcing and knowledge sharing, this may be an area for sector-wide collaboration on AI applications and approaches.
Time estimation tools. Development of such tools would help institutions budget the time and resources needed for reparative description efforts. Estimates of how long such efforts take could be informed by pilot projects and time tracking.
Next steps
Over the next few months, the group will conclude discussions, and we will share additional learnings. Many themes emerging from our discussions strongly resonate with those of the Reimagine Descriptive Workflows project: investing in staff training to develop internal communities of practice; promoting and supporting community-led co-design and creative cultures of reciprocity; and establishing feedback mechanisms.
Despite the scale of the task and gaps in shared knowledge, there are many reasons for inspiration. The discussions clearly demonstrated the passion and commitment of individuals in our group to address issues, advance initiatives underway, and effect change. As elsewhere, the journey to reimagining descriptive workflows in the UK and Irish context is a long one, requiring sustained effort, collaboration, and a commitment to centering marginalized voices. By activating both individual institutional efforts and collective action, the UKI library and archive sector can move towards creating more welcoming spaces for all users.
Notes on use of AI: a preliminary draft of this blog post was prepared using AI tools that generated meeting transcripts and summarized meeting notes and related resources, including the Reimagine Descriptive Workflows report. (As a rule, we do not expose RLP community transcripts or meeting notes to commercial genAI tools but rely on internal sandbox environments to protect the privacy of community documents.) This process was useful in identifying themes to highlight. AI tools were also used to evaluate the tone and make some editorial improvements. These tools did not replace the human expertise needed to craft a final version but helped to make the process a little easier and prompted deeper reflection on our communication choices and practices. I'm grateful to the very human Mercy Procaccini for her assistance.
By the time I finished trying to reproduce the results in the paper I reviewed in Part 1 and Part 2 of this blog post, DeepSeek had released its first free chatbot app, DeepSeek-V3, on January 20th, 2025. I could not fight the urge to see how it compares to OpenAI's ChatGPT and Google Gemini. The purpose of this experiment is to provide a disproof by counterexample: contrary to popular belief, LLMs are not capable of producing error-free answers to questions. I am using prompts to find Arabic verses in the Quran on misinformation.
I repeated the same experiment I did with Google Gemini and ChatGPT-4o in January 2025; the results were not better. In addition to being slower, I kept getting the annoying message "The server is busy. Please try again later.", which I didn't try to find a solution for because the service was restored when I waited and tried again later. For the prompts tested in the paper, DeepSeek's answers to the first prompt (What does the Quran have to say about misinformation?) were consistent and correct, with no incorrect or made-up verses attributed to the Quran. Similarly, there were no errors in the answers when I changed the question to "What does the Quran have to say about misinformation? Please cite verses in Arabic" to see how DeepSeek performs when citing Quranic verses in Arabic. DeepSeek correctly cited Quran 2:42, 49:6, 24:15-16, 17:36, 50:18, 25:72, and 49:12 two times in a row. However, when I asked for additional verses, one of DeepSeek's presented phrases was correct, but it attributed the phrase to a different verse in a different chapter of the Quran.
Me: Give me additional verses
DeepSeek: Okay, the user has already asked about what the Quran says regarding misinformation and received several detailed responses with cited verses. Now they're asking for additional verses. I need to make sure I don't repeat the ones already mentioned. Let me recall the previous answers and check which verses were covered.
In the last interaction, I listed seven key verses in Arabic with translations. So the user now wants more verses beyond those. Let me think of other relevant Quranic verses that address truthfulness, lying, false testimony, or spreading false information.
Surah Al-Isra (17:35) comes to mind about fulfilling measures and testifying with justice. Then Surah Al-Ma'idah (5:8) talks about standing for justice even against oneself. Surah Al-Mu'minun (23:3) mentions avoiding vain talk. Surah Al-Hajj (22:30) advises avoiding false speech. Surah Al-Baqarah (2:283) discusses trust and testimony. Surah Al-Nahl (16:105) specifies that liars disbelieve in Allah's signs. Surah Al-Mutaffifin (83:1-3) condemns fraud. Surah Al-An'am (6:144) warns against speaking without knowledge. Surah Al-Nisa (4:112) talks about bearing sin for false accusations.
I should present these new verses in Arabic with transliteration and translation, ensuring they haven't been mentioned before. Also, the user might be looking for a comprehensive list for study or reference, so context and lessons are important. Need to organize them clearly, maybe with headings for each verse. Let me verify the references to avoid errors. Once checked, structure the response with each verse, its Arabic text, transliteration, translation, context, and a key lesson. Conclude by summarizing the Quran's stance on truth and integrity. Make sure the formatting is consistent with previous answers for clarity.
Certainly! Here are additional Quranic verses in Arabic that emphasize truthfulness, condemn falsehood, and guide ethical communication, along with translations and contexts:
Translation: "O believers! Stand firm for Allah as witnesses for justice, and do not let hatred of a people lead you to injustice. Be just—that is closer to righteousness."
Context: Commands justice in speech and testimony, even toward enemies.
________________________________________
3. Avoiding Vain Speech
Quran 23:3
Arabic:
وَالَّذِينَ هُمْ عَنِ اللَّغْوِ مُعْرِضُونَ
Transliteration: Wa alladhīna hum ʿani al-laghwi muʿriḍūn.
Translation: "And those who turn away from ill speech."
Context: Praises those who reject gossip, slander, and meaningless talk.
________________________________________
4. Prohibition of False Speech
Quran 22:30
Arabic:
وَاجْتَنِبُوا قَوْلَ الزُّورِ
Transliteration: Wa ijtānibū qawla al-zūr.
Translation: "And shun false speech."
Context: Part of a broader command to avoid all forms of sin, including lies.
Transliteration: Wa man yaksib khaṭī’atan aw ithman thumma yarmi bihi barī’an faqadi iḥtamala buhtānan wa ithman mubīnā.
Translation: "Whoever commits a sin or error and then blames it on an innocent person has borne a grave slander and a manifest sin."
Context: Reiterates the gravity of false accusations.
________________________________________
Key Lessons
•Integrity in Speech: The Quran repeatedly ties truthfulness to faith (e.g., 16:105).
•Social Justice: Lies harm individuals and communities (2:283, 83:1–3).
•Divine Accountability: Every word and intention is recorded (50:18).
•Rejecting Harm: Silence in the face of falsehood is complicity (24:15–16).
These verses collectively establish a robust ethical framework for combating misinformation, urging believers to prioritize truth, fairness, and responsibility in all communication.
And of the camels two (male and female), and of oxen two (male and female). Say: "Has He forbidden the two males or the two females or (the young) which the wombs of the two females enclose? Or were you present when Allâh ordered you such a thing? Then who does more wrong than one who invents a lie against Allâh, to lead mankind astray without knowledge. Certainly Allâh guides not the people who are Zâlimûn (polytheists and wrong-doers)."
It is interesting that item number nine in the answer, right below number eight, is the same verse but correctly referenced this time.
Wrong chapter and verse number from DeepSeek
I asked DeepSeek for additional verses, and this time, DeepSeek started to hallucinate. The answer included a phrase that does not exist in the Quran at all, but DeepSeek attributed the phrase to a specific verse in the Quran.
Me: Give me additional verses
Thought for 37 seconds
DeepSeek: Okay, the user has asked for additional Quranic verses related to misinformation again. Let me see. They previously requested verses in Arabic with translations, and I provided several. Now they want more.
First, I need to recall which verses I've already listed. The previous ones included 2:42, 49:6, 24:15-16, 17:36, 50:18, 25:72, 49:12, and then in the last response, I added 17:35, 5:8, 23:3, 22:30, 2:283, 16:105, 83:1-3, 6:144, 4:112. So I need to find more verses that haven't been mentioned yet but still address truthfulness, falsehood, gossip, or accountability in speech.
Let me check other relevant verses. Surah Al-Ma'idah 5:2 comes to mind about cooperation in righteousness. Also, Surah Al-Isra 17:34 about fulfilling promises. Surah Al-Hujurat 49:11 prohibits insulting others. Surah Al-Mutaffifin 83:29-36 talks about the consequences for those who mock believers. Surah Al-Baqarah 2:188 warns against deceit in property. Surah Al-Ahzab 33:70-71 encourages truthful speech. Surah Al-Mumtahanah 60:12 on not hiding testimony. Surah Al-Nahl 16:116 against lying about what's permissible. And Surah Al-Tawbah 9:119 urges being with the truthful.
I need to present these in the same format as before: Arabic, transliteration, translation, and context. Make sure the translations are accurate and the contexts clearly relate to misinformation or ethical communication. Also, check if any of these were already covered. For example, 2:283 was mentioned before, so I should avoid repetition. Wait, 2:188 is about property, which is a form of deceit, so it's relevant. Similarly, Surah Al-Tawbah 9:119 is about being with the truthful, which supports truthfulness in general.
Ensure that each verse adds value to the topic. For instance, Surah Al-Ma'idah 5:2 encourages cooperation in good deeds and avoiding sin, which ties into promoting truth and avoiding falsehood. Surah Al-Isra 17:34 about fulfilling promises relates to honesty. Surah Al-Hujurat 49:11 is about not insulting, which is a form of harmful speech.
Double-check the translations for accuracy. Maybe use Sahih International or another reputable translation. Also, verify the Arabic script to ensure it's correct. Organize them in a logical order, perhaps by theme. Start with cooperation in righteousness, then fulfilling promises, avoiding insults, mocking others, deceit in transactions, truthful speech, hiding testimony, lying about halal/haram, and the call to be with the truthful.
Finally, summarize the key lessons from these additional verses to reinforce the Quran's stance on truthfulness and against misinformation. Make sure the response is clear and helpful, providing the user with meaningful content that expands on the previous answers without redundancy.
Certainly! Here are additional Quranic verses in Arabic that emphasize truthfulness, condemn deception, and guide ethical communication, along with translations and contexts:
Transliteration: Yā ayyuhalladhīna āmanū ittaqū Allāha wa kūnū maʿa al-ṣādiqīn.
Translation: "O believers! Fear Allah and be with those who are true."
Context: Urges alignment with truthful people, rejecting circles of falsehood.
________________________________________
Key Lessons from These Verses
1.Collective Responsibility: Uphold truth collectively (5:2) and avoid complicity in lies.
2.Integrity in Transactions: Deceit for gain is forbidden (2:188).
3.Dignity in Speech: Mockery, insults, and belittling others are condemned (49:11, 83:29–36).
4.Transparency: Concealing testimony or distorting facts violates trust (60:12, 16:116).
5.Divine Alignment: Truthfulness aligns believers with God’s guidance (33:70–71).
These verses reinforce the Quran’s comprehensive framework for combating misinformation, emphasizing that truth is a divine mandate and lies are a moral and spiritual corruption. In an age of misinformation, these teachings call for vigilance, integrity, and collective accountability.
Item number seven in the answer attributed the phrase:
The first sentence exists in Quran 60:12 but the second sentence, after the three dots, does not exist in the Quran at all. It translates to "And them (believing women) will not conceal a testimony".
O Prophet! When believing women come to you to give you the Bai'âh (pledge), that they will not associate anything in worship with Allah, that they will not steal, that they will not commit illegal sexual intercourse, that they will not kill their children, that they will not utter slander, intentionally forging falsehood (i.e. by making illegal children belonging to their husbands), and that they will not disobey you in Ma'rûf (Islâmic Monotheism and all that which Islâm ordains) then accept their Bai'âh (pledge), and ask Allâh to forgive them, Verily, Allâh is Oft-Forgiving, Most Merciful.
Wrong answer from DeepSeek
I stopped asking DeepSeek at this point because, after performing this experiment on Google Gemini, ChatGPT, and now DeepSeek and getting the same types of errors and hallucinations, I am convinced that LLMs are not capable of correctly quoting verses from the Quran in Arabic. In addition, archiving my questions and DeepSeek's answers in public web archives is not possible because the chat page is behind a login wall.
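This kind of checking could be partly automated. The sketch below is not how the verification in this experiment was done (that was manual); it only illustrates the idea, and it assumes a hypothetical local file quran_ar.json that maps "surah:ayah" references to the canonical Arabic text:

import json
import re

with open("quran_ar.json", encoding="utf-8") as f:   # hypothetical canonical text, keyed "surah:ayah"
    quran = json.load(f)

def normalize(text):
    """Strip common diacritics (tashkeel) and collapse whitespace so quoted text can be matched."""
    text = re.sub(r"[\u064B-\u0652\u0670]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def check_quote(quoted_arabic, cited_ref):
    """Classify an LLM-quoted phrase: correct, real verse but wrong citation, or not in the Quran."""
    q = normalize(quoted_arabic)
    if cited_ref in quran and q in normalize(quran[cited_ref]):
        return "correct quote and citation"
    found_in = [ref for ref, verse in quran.items() if q in normalize(verse)]
    if found_in:
        return f"real verse, but wrong citation (actually in {', '.join(found_in)})"
    return "phrase not found in the Quran (hallucination)"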
Conclusions
I initially was looking to study LLMs' improvements over the past two years in quoting verses from the Quran. Although testing the same prompts used in the paper did not produce any errors, slightly changing the prompts while keeping the same meaning produced errors and hallucinations in the answers from all three tools (Google Gemini, ChatGPT, and DeepSeek). Some researchers have warned that misinformation and disinformation cannot be eliminated from LLM output. They argue that LLMs' errors will continue to increase and result in a total collapse of the model.
The following table outlines the errors each model produced and the experiment year:
Types of errors produced by Google Gemini, ChatGPT, and DeepSeek
Retrieval-Augmented Generation (RAG) might be a promising technique for significantly reducing errors and hallucinations in LLMs, especially for tasks that involve citing Quranic verses, since the Quran has a fixed structure of 114 chapters (Surahs) and a set number of verses (Ayahs). This textual stability makes it an ideal candidate for RAG. The model doesn't need to "guess" or "infer" the content of a verse; it can retrieve the exact text. Therefore, for citing Quranic verses, it is possible that a RAG-based LLM can significantly enhance the reliability and trustworthiness of AI-generated responses involving Quranic citations.
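To make the idea concrete, here is a minimal sketch of what grounding the citations could look like. Because the text is fixed, retrieval can be an exact lookup rather than a semantic search. The file name, index layout, and prompt wording below are hypothetical, and selecting which references to retrieve (via a topical index or embedding search) is left out:

import json

with open("quran_ar.json", encoding="utf-8") as f:   # hypothetical canonical text, keyed "surah:ayah"
    verses = json.load(f)

def retrieve(refs):
    """Exact retrieval: look up the canonical Arabic text for each surah:ayah reference."""
    return "\n".join(f"({ref}) {verses[ref]}" for ref in refs if ref in verses)

def build_grounded_prompt(question, refs):
    """Constrain the model to quote only the retrieved verses verbatim, with citations."""
    return ("Answer the question using ONLY the verses below. "
            "Quote the Arabic verbatim and cite each verse as surah:ayah.\n\n"
            f"Verses:\n{retrieve(refs)}\n\nQuestion: {question}")

# The grounded prompt is then sent to the LLM of choice in place of the bare question.
print(build_grounded_prompt("What does the Quran say about misinformation?", ["49:6", "24:15", "17:36"]))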
I discussed LLMs' hallucinations and demonstrated their inability to correctly cite Quranic verses using Google Gemini as an example in part 1 of this blog post. Google Gemini's answers to different prompts included incorrect Quranic citations and phrases that do not exist in the Quran attributed to specific verses in the Quran. I also explained the importance of correctly citing and presenting Quranic verses in Arabic. Here, in part 2, I present a comparison between Google Gemini and OpenAI's ChatGPT. Using the same prompts, ChatGPT-4o produced hallucinations similar to Google Gemini's. Again, misinformation remains one of the biggest challenges compromising LLMs' credibility in quoting verses from the Quran in Arabic. Experts in Classical Arabic and those who memorize the Quran word for word can identify errors in ChatGPT's output; however, less educated people will have difficulty distinguishing between correct and incorrect answers from ChatGPT. Some Arabic speakers have examined ChatGPT's answers and found them to have significant errors when referencing the Quran.
Improvements in ChatGPT over the last two years
The paper I reviewed in part 1 of this post tested ChatGPT-3.5 in 2023 using a sample of prompts regarding the Quran's views on misinformation. The authors found the output to have significant and consistent errors. They regenerated queries to the question “What does the Quran have to say about misinformation?”. They also sampled responses from ChatGPT with the additional prompt “please cite verses in Arabic”. The errors in the output ranged from producing correctly quoted and cited, yet irrelevant Quranic material to misordering the words of Quranic verses; the errors were not minor. They also found hallucinations in ChatGPT's answers attributing text that doesn't exist in the Quran at all to specific chapters and verses in the Quran. I repeated their experiment on ChatGPT-4o in January 2025 and I was unable to reproduce their output. I did not get incorrect Quranic citations or verses that did not exist in the Quran. I did not get any altered verses from the Quran in any way. Every output was correct, accurate, and authentic.
Correct answers from ChatGPT
However, when I repeated the experiment I did with Google Gemini in part 1 of this post in January of 2025 and asked ChatGPT for additional verses in Arabic, the results were not much better than the ones I got from Google Gemini. ChatGPT provided the wrong reference one time when I asked for additional verses on misinformation. In addition, ChatGPT provided phrases that do not exist in the Quran and attributed them to specific verses in the Quran when I tried different prompts that are not in the paper but have the same meaning. Although the errors in ChatGPT's output are not as bad as the ones I found in Google Gemini's, I think the impact is worse because the wrong verses are closer to correct verses in the Quran, and they can definitely be taken as correct verses by the non-expert. The difference was in one letter in one case and one word in another case. The output's syntax was also wrong in the latter.
Wrong chapter and verse numbers from ChatGPT
Me: Give me additional verses
ChatGPT:
Certainly! The Quran repeatedly emphasizes the dangers of misinformation, falsehoods, and irresponsible speech. Here are additional verses in Arabic that address this issue:
🔹 "And indeed, there is among them a party who alter the Scripture with their tongues so that you may think it is from the Scripture, but it is not from the Scripture. And they say, 'This is from Allah,' but it is not from Allah. And they speak a lie against Allah while they know [it]."
🔹 "Then whoever argues with you about it after knowledge has come to you, say, 'Come, let us call our sons and your sons, our women and your women, ourselves and yourselves, then supplicate earnestly and invoke the curse of Allah upon the liars.'"
8. Do Not Falsify Facts or Distort the Truth
📖 Surah Al-Ma’idah (5:41)
يُحَرِّفُونَ الْكَلِمَ مِنْ بَعْدِ مَوَاضِعِهِ
🔹 "They distort words from their [proper] places..."
🔹 "And when you see them, their forms please you, and if they speak, you listen to their speech. [They are] like pieces of wood propped up – they think that every shout is against them. They are the enemy, so beware of them. May Allah destroy them; how are they deluded?"
🔹 "And on the Day of Resurrection, you will see those who lied against Allah – their faces will be blackened. Is there not in Hell a residence for the arrogant?"
🔹 "O you who have believed, fear Allah and speak words of appropriate justice. He will [then] amend for you your deeds and forgive you your sins. And whoever obeys Allah and His Messenger has certainly attained a great attainment."
Conclusion
The Quran strongly condemns misinformation, distortion, and falsehood. It teaches that:
✅ Truthfulness leads to success
❌ Spreading falsehoods leads to regret and punishment
⚠️ Lies cause harm, injustice, and corruption in society
Would you like a deeper explanation of any verse? 😊
The 9th quoted verse (highlighted in red) is a verse from the Quran, but it is not in Quran 58:14-15. It is in Quran 63:4. The two verses in Quran 58:14-15 are:
Have you noted those who befriended people with whom GOD is angry? They neither belong with you, nor with them. They deliberately swear lies! (14) GOD has prepared for them a severe retribution. Miserable indeed is what they used to do. (15)
ChatGPT correctly cited Quran 63:4 (highlighted in green) in the following answer when I repeated the prompt.
Me: Give me additional verses
ChatGPT:
Certainly! The Quran contains many verses warning against misinformation, lying, and spreading falsehoods. Here are additional verses in Arabic, along with their meanings:
🔹 "So who is more unjust than one who lies about Allah and denies the truth when it has come to him? Is there not in Hell a residence for the disbelievers?"
🔹 "So He penalized them with hypocrisy in their hearts until the day they will meet Him – because they failed Allah in what they promised Him and because they used to lie."
🔹 "And We have certainly tested those before them, and Allah will surely make evident those who are truthful, and He will surely make evident the liars."
🔹 "And those who harm believing men and believing women for [something] other than what they have earned have certainly borne upon themselves a slander and manifest sin."
🔹 "And when you see them, their forms please you, and if they speak, you listen to their speech. [They are] like pieces of wood propped up – they think that every shout is against them. They are the enemy, so beware of them. May Allah destroy them; how are they deluded?"
🔹 "Unquestionably, for Allah is the pure religion. And those who take protectors besides Him [say], 'We only worship them that they may bring us nearer to Allah in position.' Indeed, Allah will judge between them concerning that over which they differ. Indeed, Allah does not guide one who is a liar and disbeliever."
🔹 "You only worship, besides Allah, idols, and you invent falsehood. Indeed, those you worship besides Allah do not possess [for] you [the ability to provide] sustenance. So seek sustenance from Allah and worship Him and be grateful to Him. To Him you will be returned."
🔹 "And on the Day of Resurrection, you will see those who lied about Allah with their faces blackened. Is there not in Hell a residence for the arrogant?"
🔹 "And among them are unlettered ones who do not know the Scripture except [indulgement in] wishful thinking, but they are only assuming. So woe to those who write the 'scripture' with their own hands, then say, 'This is from Allah,' in order to exchange it for a small price. Woe to them for what their hands have written, and woe to them for what they earn."
14. Falsehood Leads to Corruption
📖 Surah Al-Ma’idah (5:41)
يَسْمَعُونَ لِلْكَذِبِ أَكَّالُونَ لِلسُّحْتِ
🔹 "They listen to falsehood and consume forbidden gain..."
🔹 "And do not take your oaths as a means of deception among yourselves, lest a foot slip after it was firmly planted, and you would taste evil for what you diverted from the way of Allah, and you would have a great punishment."
🔹 "O children of Adam, let not Satan tempt you as he removed your parents from Paradise, stripping them of their clothing to show them their private parts. Indeed, he sees you, he and his tribe, from where you do not see them. Indeed, We have made the devils allies to those who do not believe."
🔹 "Say, 'Shall We inform you of the greatest losers in respect to [their] deeds? Those whose effort is lost in worldly life, while they think that they are doing well in work.'"
🔹 "When the hypocrites come to you, [O Muhammad], they say, 'We testify that you are the Messenger of Allah.' And Allah knows that you are His Messenger, and Allah testifies that the hypocrites are liars."
19. Those Who Lie About Allah Will Face Severe Punishment
🔹 "And Allah establishes the truth by His words, even if the criminals dislike it."
Conclusion
The Quran strongly warns against falsehood, deception, and lying. It teaches that:
✅ Truth will always triumph
❌ Falsehood leads to disgrace and punishment
⚠️ Lying is a characteristic of the corrupt and hypocrites
Would you like more explanations or interpretations? 😊
ChatGPT attributed the phrase يَسْمَعُونَ لِلْكَذِبِ أَكَّالُونَ لِلسُّحْتِ to Quran 5:41 in item number 14 of the answer. The citation is incorrect and the syntax of the phrase is also incorrect; Quran 5:42 has a very similar phrase: سَمَّاعُونَ لِلْكَذِبِ أَكَّالُونَ لِلسُّحْتِ
The phrase in ChatGPT's output translates to: They are listening to lies, devourers of what's forbidden (bribery).
The phrase from Quran 5:42 translates to: Listeners to lies, devourers of what's forbidden (bribery).
I was able to catch this error quickly because I have read the entire Quran multiple times and have seen and heard the correct verse so often that I remember it exactly. Someone who has not seen the verse, or who is unaware of the incorrect syntax, will not be able to identify this error.
The second error in ChatGPT's output is even worse because the difference is only one letter. I was only able to identify this error because I checked each and every verse in the output of this entire experiment against an authentic copy of the Quran, word for word (a sketch of how such a check could be partially automated appears after the translations below). ChatGPT attributed the phrase مَنۢ أَظْلَمُ مِمَّنِ ٱفْتَرَىٰ عَلَى ٱللَّهِ كَذِبًا to Quran 18:15 in item number 19 of the answer. While the citation is incorrect, the syntax of the phrase is not necessarily incorrect. The only difference between this phrase and a phrase in Quran 18:15 is the deletion of one letter (ف) from the first word. The deletion does not change the meaning or make the phrase syntactically wrong, since the output lacks the context (the preceding phrase in the same verse). Someone who has not memorized this verse word for word cannot identify this error.
Quran 18:15 has the phrase فَمَنْ أَظْلَمُ مِمَّنِ افْتَرَىٰ عَلَى اللَّهِ كَذِبًا
The translation of both phrases (the correct one and the wrong one) is:
Who is more evil than the one who fabricates lies and attributes them to GOD?
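Checking every quoted phrase against an authentic copy of the Quran can be partially automated. Below is a minimal sketch of such a check (my illustration, not the paper's method). It assumes a local reference text named quran.txt in the "chapter|verse|text" layout used by sources such as Tanzil; the file name and layout are assumptions for illustration only.

```python
# Minimal sketch: given a quoted phrase and the chapter:verse it was attributed
# to, check the attribution against a local reference text and report where the
# phrase actually occurs, if anywhere. "quran.txt" (chapter|verse|text lines)
# is an assumed local reference file.
import re
import unicodedata

def normalize(text: str) -> str:
    """Remove Arabic diacritics (combining marks), unify alef variants, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for alef in ("\u0622", "\u0623", "\u0625", "\u0671"):  # map alef variants to plain alef
        stripped = stripped.replace(alef, "\u0627")
    return re.sub(r"\s+", " ", stripped).strip()

def load_verses(path: str) -> dict:
    """Return {(chapter, verse): normalized text} from a chapter|verse|text file."""
    verses = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.count("|") >= 2:
                chapter, verse, text = line.rstrip("\n").split("|", 2)
                verses[(int(chapter), int(verse))] = normalize(text)
    return verses

def check_attribution(verses: dict, quoted: str, cited: tuple) -> dict:
    """Report whether the quoted phrase occurs in the cited verse, and where it really occurs."""
    phrase = normalize(quoted)
    matches = [ref for ref, text in verses.items() if phrase in text]
    return {"cited_ok": cited in matches, "actually_in": matches}

if __name__ == "__main__":
    verses = load_verses("quran.txt")
    # e.g. the phrase the chatbot attributed to Quran 58:14-15; the author found
    # it actually belongs to Quran 63:4, and a lookup like this exposes that.
    quoted = "هُمُ الْعَدُوُّ فَاحْذَرْهُمْ"
    print(check_attribution(verses, quoted, cited=(58, 14)))
```

A lookup like this catches wrong chapter and verse numbers quickly; subtle alterations such as a single changed letter still require the word-for-word comparison described above.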
In January 2025, I was unable to reproduce the errors that Ali-Reza Bhojani and Marcus Schwarting found in ChatGPT's output in 2023. ChatGPT's output was consistently correct for the prompts mentioned in the paper. I tried to archive ChatGPT's answers to the prompts I issued. Although Save Page Now (SPN) reported that a snapshot was captured, when I tried to retrieve the archived copy, the page said that the Wayback Machine had not archived that URL.
The Internet Archive was unable to archive chat with ChatGPT
It is possible to archive an entire chat with ChatGPT if the user creates a public (shareable) link for the chat and submits it to archive.is.
Creating a public link to chat with ChatGPT
The created/updated link from ChatGPT to be shared and archived
Note that this method of archiving the chat with ChatGPT does not work with the Internet Archive. It looks like the page has been saved using SPN, but the page does not replay.
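For completeness, here is a minimal sketch of how a capture of a shared chat link can be requested programmatically through the Wayback Machine's Save Page Now 2 (SPN2) API. The API keys and the shared-chat URL below are placeholders, and, as noted above, a reported capture does not guarantee that the page will replay.

```python
# Minimal sketch: ask Save Page Now (SPN2) to capture a public, shareable
# ChatGPT link. Requires an archive.org account and S3-style API keys; the
# URL and keys below are placeholders. A successful capture request does not
# guarantee that the page will replay.
import requests

ACCESS_KEY = "YOUR_ACCESS_KEY"   # from https://archive.org/account/s3.php
SECRET_KEY = "YOUR_SECRET_KEY"
SHARED_CHAT_URL = "https://chatgpt.com/share/your-chat-id"  # placeholder

response = requests.post(
    "https://web.archive.org/save",
    headers={
        "Accept": "application/json",
        "Authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
    },
    data={"url": SHARED_CHAT_URL, "capture_screenshot": "1"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # includes a job_id that can be polled for capture status
```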
ChatGPT cannot provide accurate citations from the Quran in Arabic
From this experiment, I found that ChatGPT's "improvement" is similar to Google Gemini's (from part 1 of this post). Neither produced errors for the prompts issued in the paper in 2023; those results were all correct citations. However, for different prompts with the same meaning, the answers contained wrong citations of correct Quranic verses. ChatGPT, like Google Gemini, also produced phrases that do not exist in the Quran (mixtures of words from the Quran and elsewhere) and attributed them to the Quran. Some of these phrases contained syntax errors and altered verses, which is problematic for Muslims, who believe that the Quran is free from errors and modifications.
After two years and the move from ChatGPT-3.5 to ChatGPT-4o, the errors and hallucinations remain present when slightly different prompts are used. Some researchers argue that errors and hallucinations will not only remain present in all LLMs but will increase as more and more LLM-generated content makes its way into the training datasets (content on the internet), potentially resulting in a total collapse of the model.
Conclusions
LLMs have taken the world by surprise in the last few years with the introduction of chatbots like OpenAI’s ChatGPT. LLMs have been misused in the past and will continue to be misused by some users, and misinformation is one of the toughest problems with them. In this post, we demonstrated that ChatGPT's hallucinations continue to prevent it from providing correct Quranic citations. Unfortunately, ChatGPT's wrong answers to questions about the Quran are presented in a Quran-like format, which leads users to believe they are correct references to the Quran. ChatGPT also mixes correct and incorrect Quranic citations, making it harder for non-experts to separate right from wrong answers. This mixture of correct and incorrect citations is unacceptable to Muslims, who believe the Quran to be free from errors and never to be modified. Some researchers argue that misinformation cannot be eliminated from LLMs because training datasets will continue to contain both misinformation and disinformation.
Large Language Models (LLMs) have been omnipresent in the past few years and everyone, in one way or another, is using them. The most popular chatbots, like OpenAI’s ChatGPT or Google’s Gemini, use LLMs to generate text. Although the technology doesn't seem harmful to the casual eye, some of its applications have already been used to plagiarize, solve homework assignments, and even write research papers. These are valid reasons to be careful about possible misuses of LLMs, but the technology is here to stay and we shouldn't expect it to go anywhere. What we should worry about is the truthfulness of LLMs' generated content. I believe that misinformation is one of the biggest challenges for LLMs because if users who consume the output of LLMs believe it to be fact, the consequences could be disastrous. The nature of LLMs (being trained on enormous amounts of data from the internet and other sources) makes them susceptible to absorbing the misinformation and disinformation that already exist in the training data. Moreover, even if all the data used to train a model is correct and annotated by humans (supervised learning), there is a chance that the output is incorrect because LLMs take an input (text) and generate more text by predicting the next word based on that input. Of course, one can always argue that LLMs' performance is improving and that, one day, they will be perfect, but the problem isn't that LLMs are not improving. The problem is that LLMs will continue to let false information appear in their output because of how they work.
It is crucial for LLMs' Quran-related output to be true
Do LLMs have to tell the truth? Using LLMs to answer religious questions or find citations in religious texts should be avoided if the information is going to be presented as fact. For Muslims, the Quran is the word of God, which hasn't been altered since it was revealed to the prophet Muhammad over 1400 years ago. In fact, Muslims believe the Quran to be the biggest proof of the Islamic faith and of the prophethood of Muhammad because of its unmatched eloquence. Its inimitable rhetoric challenged Arab poets at the peak of the Arabic literary boom in Arabia.
Quran 17-88: Say, "If all the humans and all the jinns banded together in order to produce a Quran like this, they could never produce anything like it, no matter how much assistance they lent one another."
Quran 11-13: If they say, "He fabricated (the Quran)," tell them, "Then produce ten verses like these, fabricated, and invite whomever you can, other than GOD, if you are truthful."
Quran 10-38: If they say, "He fabricated it," say, "Then produce one verse like these, and invite whomever you wish, other than GOD, if you are truthful."
Quran 2-23: If you have any doubt regarding to what we revealed to our slave (Muhammad), then produce one verse like these, and call upon your own witnesses against GOD, if you are truthful.
The addition, deletion, or substitution of words, letters, or even diacritics in the Quran will affect its superiority and negate the Islamic belief that it is the word of God, because the altered version is not the word of God. The problem is that LLMs sometimes present text consisting of combinations of words from the Quran and elsewhere, in a Quran-like format with diacritics, as a true Quranic verse, and provide incorrect citations of the Quran. For experts who know Classical Arabic (CA) and have memorized the Quran, it is obvious that the output of LLMs is not always a correct verse from the Quran. However, for the majority of Arabs, who neither know CA well nor have memorized the Quran, altered verses are not easy to spot.
It is worth mentioning that LLMs' inability to correctly cite verses from the Quran is not as important in languages other than Arabic because Muslims do not believe Quran translations to be the word of God. However, incorrect citations from Quran translations are also troublesome because Quran translations are considered interpretations of the word of God (the Quran in its original language). For average non-Arab Muslims who do not speak Arabic, LLMs' output related to Islam in their language could be taken as a fact while it may or may not be true.
Did LLMs' truthfulness improve over the last two years?
I read a paper by Ali-Reza Bhojani and Marcus Schwarting describing a 2023 experiment in which LLMs produced Quran-like material and presented inaccurate Quranic citations as verses from the Quran. The authors studied a small sample of regenerated queries to ChatGPT-3.5 and Google Bard (now Gemini) regarding the Quran’s perspective on misinformation. They found the LLMs' output to have significant and consistent errors because of LLMs' stochastic and probabilistic nature. They regenerated queries to the question “What does the Quran have to say about misinformation?” five times in Google Bard and five times in ChatGPT-3.5. They also sampled five responses from Google Bard with the additional prompt “please cite verses in Arabic” and five responses to the original query on ChatGPT-3.5 in Arabic. They found that only two out of the twenty responses they gathered (one from ChatGPT-3.5 and one from ChatGPT-3.5 in Arabic) had no mistakes referencing the Quran. The errors in the output ranged from producing correctly quoted and cited, yet irrelevant Quranic material to misordering the words of Quranic verses; none of the errors were trivial. The example they provided is the Quranic text from Chapter 17, verse 81:
The difference between the correct verse and the modified one
While it may seem like a small error to those who don't speak Arabic, the impact of this error on the eloquence of the phrase is not small.
The authors also found hallucinations in both ChatGPT-3.5 in Arabic and Google Bard, attributing text that doesn't exist in the Quran at all to specific chapters and verses. I repeated their experiment on Google Gemini and was unable to reproduce those errors. I did not get incorrect Quranic citations, verses that do not exist in the Quran, or altered verses of any kind. Every output was correct, accurate, and authentic. Google Gemini correctly cited Quran 17:81 along with another correct citation of Quran 21:18.
My activity on Google Gemini
Correct answers from Google Gemini
Me: What does the Quran have to say about misinformation?
Gemini: The Quran does not directly address the modern concept of "misinformation" as we understand it. However, it strongly emphasizes truthfulness and warns against spreading falsehoods. Here are some key verses:
"But We cast the truth upon falsehood, and it destroys it. And woe to you for what you describe [falsely]."
These verses emphasize the ultimate triumph of truth and the eventual demise of falsehood.
Key Islamic principles related to this:
Truthfulness (Sidq): Islam strongly emphasizes honesty and truthfulness in all aspects of life.
Avoiding slander and backbiting: The Quran and Sunnah condemn spreading rumors, gossip, and harmful information about others.
Seeking knowledge and verification: Muslims are encouraged to seek knowledge and verify information from reliable sources.
While the Quran may not explicitly mention "misinformation," these principles provide guidance on how to approach information critically and avoid spreading harmful or inaccurate content.
I tried their second experiment, with the additional prompt, in Google Gemini, and the result was an accurate citation of Quran 49:6.
Correct answers from Google Gemini
Me: What does the Quran have to say about misinformation? please cite verses in Arabic
Gemini: The Quran emphasizes truthfulness and warns against spreading falsehoods. Here's a relevant verse:
Translation: "O you who have believed, if there comes to you a disobedient one with information, investigate, lest you harm a people out of ignorance and become over what you have done regretful."
This verse highlights the importance of verifying information before acting upon it, especially when it comes from an unreliable source. It warns against the potential harm that can be caused by spreading false or misleading information.
Key Islamic principles related to this:
Truthfulness (Sidq): Islam strongly emphasizes honesty and truthfulness in all aspects of life.
Avoiding slander and backbiting: The Quran and Sunnah condemn spreading rumors, gossip, and harmful information about others.
Seeking knowledge and verification: Muslims are encouraged to seek knowledge and verify information from reliable sources.
While the Quran may not explicitly mention "misinformation," these principles provide guidance on how to approach information critically and avoid spreading harmful or inaccurate content.
The errors in Google Gemini's output reported in the 2023 paper no longer existed when I checked in 2025; Google Gemini's output was consistent and correct. This output should have been archived in public web archives so that an example of Google Gemini's performance history is preserved, but Save Page Now (SPN) was unable to archive my interactions with Google Gemini. I tried another public web archive, Archive.is, but it was also unable to archive the page.
Archive.is was unable to archive my chat with Google Gemini
To me, this error-free output seemed like a great improvement. Google Gemini was able to correctly cite 17:81, 21:18, and 49:6 from the Quran and present them in their authentic form multiple times in a row (more than five times). But looking at just a few examples is by no means enough to support such a claim, so I continued to send prompts to Google Gemini. After four more attempts, it produced a mixture of words from the Quran and elsewhere and attributed the mixture to a specific verse from the Quran, repeating the same mistake twice in a row.
Wrong answers from Google Gemini
Me: Give me Quran verses in Arabic about falsehood.
Gemini: Certainly! Here are some Quranic verses in Arabic that address falsehood:
These verses highlight the importance of truthfulness and the ultimate demise of falsehood.
I hope this is helpful! Let me know if you would like me to translate these verses or provide additional verses
The citation of Quran 17:81 is correct, accurate, and relevant. However, the citation of Quran 2:116 is neither correct, accurate, nor relevant. The phrase is not a verse that can be found anywhere in the Quran, it is syntactically incorrect, and it makes no sense in Arabic. It loosely translates to "And among people are the one who sells himself to wrongdoing that he may mislead others from the way of God so they made fun of him. Indeed, God guides whom He wills, and God does not like those who are unjust." This behavior is known as LLM hallucination.
The correct Quran 2:116 translates to: “They say, “Allah has offspring.” Glory be to Him! In fact, to Him belongs whatever is in the heavens and the earth—all are subject to His Will.”
Hallucinations were not present in subsequent repetitions of the same prompt, but they came right back when I asked for more verses.
Me: Give me additional verses
Gemini: Certainly! Here are some additional Quranic verses in Arabic that address falsehood:
These verses emphasize the importance of avoiding idle talk and focusing on meaningful pursuits.
I hope this is helpful
The first citation was presented as the 25th verse of chapter 4 of the Quran. This is wrong: the output phrase does not exist in the Quran at all. It is a mixture of words from the Quran and elsewhere, put together in a phrase that does not make sense. It loosely translates to: "And those (men) who commit adultery with someone other than their pairs (wives), bring four witnesses from among you and if they testify (against them), keep them in their homes until Allah grants them repentance. And if they repent, do not meet (talk to) them. Indeed Allah is All-forgiving, Most Merciful."
The correct Quran 4:25 translates to: “But if any of you cannot afford to marry a free believing woman, then ˹let him marry˺ a believing bondwoman possessed by one of you. Allah knows best ˹the state of˺ your faith ˹and theirs˺. You are from one another. So marry them with the permission of their owners, giving them their dowry in fairness, if they are chaste, neither promiscuous nor having secret affairs. If they commit indecency after marriage, they receive half the punishment of free women. This is for those of you who fear falling into sin. But if you are patient, it is better for you. And Allah is All-Forgiving, Most Merciful.”
The second verse in the output citing Quran 23:1-3 is correct, but it isn't related to falsehood. It simply describes successful believers who are humble when they pray and do not engage in idle activities.
The presence of wrong Quranic citations alongside correct ones can lead users to believe that the output of LLMs is always true. This is particularly troubling to Muslims because, for non-experts and those who have not memorized the entire Quran word for word, it is not always easy to identify the errors in the output of LLMs when the output is a mixture of correct and incorrect references. Speaking of mixing truth with falsehood, ironically, the second most regularly cited verse within the sample studied in the paper was a correct citation of a Quranic verse precisely describing this phenomenon, Quran 2:42,
which translates to: "And do not mix truth with falsehood nor knowingly hide the truth."
LLMs' Future Improvements
Based on this short experiment, I found that Google Gemini did not produce errors for the same prompts issued in the paper in 2023; those results were all correct citations. However, for different prompts with the same meaning, the answers contained wrong citations of correct Quranic verses. Gemini also produced phrases that do not exist in the Quran (mixtures of words that do not all come from the Quran) and attributed them to it, and some of these phrases contained syntax errors.
Although the majority of LLM users are hopeful that the errors and hallucinations generated by LLMs can be eliminated as the models improve over time, some researchers argue that LLM errors and hallucinations will continue to increase as more and more LLM-generated content makes its way into the training datasets (content on the internet), potentially resulting in a total collapse of the model.
Conclusions
The world was taken completely by surprise by LLMs with the introduction of chatbots like OpenAI’s ChatGPT and Google’s Gemini. Despite their success, misinformation remains one of the most important challenges for LLMs. We have demonstrated that LLMs should not be used for Islam-related topics because their hallucinations result in incorrect Quranic citations presented in a Quran-like format, deceiving users into believing them to be correct references to the Quran. LLMs frequently mix correct and incorrect Quranic citations, which is unacceptable to Muslims, who believe the Quran to be free from modifications and never to be modified. Some researchers warn that improvements to LLMs will not eliminate misinformation because their training data will continue to contain both misinformation and disinformation.
While it’s only been 3 years since the last one, after 18 months in my current role, I thought this would be a good time to write another letter of thanks. Some Context If you’d like to know why I believe writing these letters is important, check out the first letter of thanks I wrote. … Continue reading "A Letter of Thanks (2025)"
At the United Nations Summit of the Future in 2024, world leaders adopted the Pact for the Future, which includes the Global Digital Compact (GDC). One of GDC’s central pillars is data governance, marking the first time this issue has been formally placed on the UN’s global agenda, with concrete commitments expected to materialise.
As a follow-up, the United Nations General Assembly requested the Commission on Science and Technology for Development (CSTD) to establish a dedicated Multi-Stakeholder Working Group on Data Governance at all levels as relevant for development. This group has been convened to meet for the first time on 1-2 May 2025.
This online event, organised by the Open Knowledge Foundation, Diplo and the Geneva Internet Platform, will discuss this important milestone. Speakers will provide an overview of the group’s mandate, discuss the main outcomes of its first meeting, and the expected next steps. In particular, they will assess the potential contribution of the WG to the establishment of a community and public-good centred data governance regime.
Moderator:
Marilia Maciel, Director of Digital Trade and Economic Security, Diplo
Speakers:
Renata Avila, CEO, Open Knowledge Foundation
Linnet Taylor, Professor, Tilburg Institute for Law, Technology, and Society (TILT)
Date: 9 May 2025
Time: 14.00 CEST | 12.00 UTC
Location: Online
The Perma team has landed back in the US after our trip to the International Internet Preservation Consortium’s Web Archiving Conference. This year the IIPC met in Oslo at the National Library of Norway, and the conference’s theme was “Towards Best Practices.”
This is a gathering each year of colleagues from around the globe who are working in the web archiving space, ranging from institutions responsible for legal deposits, to researchers working with collections, to people who are building the core tools used for web archiving.
Here are some highlights from the conference that we think our community would find particularly relevant:
Opening Keynote - Libraries, Copyright, and Language Models:
Javier de la Rosa of the National Library of Norway presented on the Mímir Project, an initiative examining the value of copyrighted materials (such as books and newspapers) in training Norwegian LLMs. The Mímir Project offers valuable insights into the role of copyrighted corpora in enhancing model performance on tasks such as sentiment analysis, fairness/truthfulness, reading comprehension, translation, and commonsense reasoning. The findings indicated that copyrighted material improved model performance, largely due to the impact of non-fiction content. This project was important both for the library’s understanding of their own rights when it comes to AI model training and to allow them to advise researchers interested in using web archive collections as data.
Using Generative AI to Interrogate the UK Government Web Archive:
Chris Royds and Tom Storrar of The National Archives (UK) explored the use of Retrieval-Augmented Generation (RAG) to explore the UK Government Web Archive using LIL’s very own WARC-GPT and Microsoft’s GraphRAG. Using their corpus of ~22,000 resources, they explored how RAG might help with the retrieval of lost organizational memory among their departments. One challenge they had to address was the necessity of removing individual records to adhere to their takedown policy. Although their initial findings regarding WARC-GPT’s performance aligned with the results of our case study, it was encouraging to see a real-world application of WARC-GPT performing reasonably well with a significantly larger corpus than in our original study, while also being less computationally expensive than GraphRAG. Overall, they concluded that RAG continues to show promise for exploring WARCs.
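For readers unfamiliar with the approach, below is a minimal sketch of the retrieval half of RAG over a web-archive corpus. It illustrates the general idea only; it is not the WARC-GPT or GraphRAG implementation, and the WARC file name and query are placeholders.

```python
# Minimal sketch of retrieval-augmented generation's retrieval step over a WARC:
# extract HTML text, embed it, and rank passages against a question. The top
# passages would then be passed to an LLM as context.
import numpy as np
from bs4 import BeautifulSoup                           # pip install beautifulsoup4
from sentence_transformers import SentenceTransformer   # pip install sentence-transformers
from warcio.archiveiterator import ArchiveIterator      # pip install warcio

def extract_pages(warc_path, limit=500):
    """Yield (url, plain text) for HTML response records in the WARC."""
    count = 0
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            content_type = record.http_headers.get_header("Content-Type") or ""
            if "html" not in content_type:
                continue
            html = record.content_stream().read()
            text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
            if text:
                yield record.rec_headers.get_header("WARC-Target-URI"), text[:2000]
                count += 1
            if count >= limit:
                return

model = SentenceTransformer("all-MiniLM-L6-v2")
pages = list(extract_pages("collection.warc.gz"))       # placeholder WARC file
embeddings = model.encode([text for _, text in pages], normalize_embeddings=True)

query = "Which department was responsible for the 2015 digital strategy?"
query_vec = model.encode([query], normalize_embeddings=True)[0]
scores = embeddings @ query_vec                          # cosine similarity (vectors are normalized)
for idx in np.argsort(scores)[::-1][:3]:
    print(f"{scores[idx]:.3f}  {pages[idx][0]}")         # top candidate passages
```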
UKWA Rebuild: The British Library, which suffered a cyberattack in October 2023, described the challenges they faced in the aftermath and how this affected their discovery services and user access. The UK Web Archive, along with the broader technological infrastructure, systems, policies, and processes of the British Library, had to adapt. Additionally, Gil Hoggarth described their “secure by design” framework in the rebuild process, along with their intention to prioritize a cloud-first approach. Interestingly, it was a recent on-site storage backup that saved the UKWA from more severe data loss.
Insufficiency of Human-Centric Ethical Guidelines in the Age of AI: Considering Implications of Making Legacy Web Content Openly Accessible: Gaja Zornada of the Computer History Museum Slovenia (Računališki muzej) described the ethical challenges and implications of making legacy web content openly accessible and the impact this may have for an individual’s right to be forgotten. Especially in this AI moment where archival content is no longer consumed solely by human researchers, legacy web content can be reconstructed, resurfaced, and reinterpreted by AI tools without appropriate context or distinction from contemporary sources. If consumed in isolation, this can mislead users, making it essential to clearly label and contextualize legacy content, as the information it contains may be outdated or irrelevant.
Lost, but Preserved - A Web Archiving Perspective on the Ephemeral Web:
Sawood Alam of the Internet Archive reframed the link rot discourse to be more hopeful rather than alarmist. Recent studies such as the 2024 Pew Research Center study on link rot often highlight the alarming rate of link rot, but fail to highlight the preservation efforts of web archiving institutions and how much of the web has been rescued and resurrected. The Internet Archive’s research discovered that many URLs from these link rot studies have actually been preserved in a web archive. This dynamic brings a finer point to the goals of web archiving: in what situations are we saving the web for future historical knowledge or to maintain the interconnectivity of URLs? This highlights the differences between projects like Perma.cc and collections-based archives. While there is still more work to be done to combat the challenges in preserving the endangered web, web archiving institutions should aim to foster increased awareness and visibility of the efforts of web archives in saving our digital heritage.
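As a concrete illustration of how such preservation checks can be done, the sketch below queries the Wayback Machine's public availability API for the closest preserved snapshot of a URL; the example URL is a placeholder.

```python
# Minimal sketch: check whether a (possibly dead) URL has a preserved copy in
# the Wayback Machine, using the public availability API.
import requests

def closest_snapshot(url: str, timestamp: str = "20200101") -> str | None:
    """Return the URL of the closest archived snapshot, or None if none exists."""
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=30,
    )
    resp.raise_for_status()
    snapshot = resp.json().get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot and snapshot.get("available") else None

print(closest_snapshot("http://example.com/some-rotted-page"))  # placeholder URL
```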
What You See No One Saw: Mat Kelly of Drexel University posed the question—”Can we save the web we see from our perspective?”—emphasizing the difference between what an archival crawler captures and what a user actually experiences. Crawlers capture a clean and agnostic version of the web and essentially preserve a “fake web” which is valid but inconsistent with a web user’s perspective. Their team explored how leveraging perspective-based crawling and re-using browser user profiles to archive web advertisements and personalized content can result in capturing a more realistic experience of the web.
From Pages to People: Tailoring Web Archives for Different Use Cases: Andrea Kocsis of the National Library of Scotland and Leontien Talboom of Cambridge University Libraries presented their work on improving the usability of the UK Web Archive by exploring the different audiences for web archives: readers, computational users, and the digitally curious. While web archives are focused on providing access, it is important to answer the question of for whom we are preserving and not make any assumptions about the designated community. They outlined recommendations such as providing datasheets for web archives for computational users, providing a user interface and training to overcome the digital skill gap of the digitally curious, and prefiltering content by topics and themes for readers of web archives to explore so they don’t need to come to the web archives with a research question already in mind. They also highlighted the need to increase awareness of available web archive resources through outreach events such as exhibits and creative approaches that bridge the online and the offline through digital storytelling, data visualization, and art.
As always, spending time with the international community brought together by IIPC was a pleasure and we look forward to next year in Brussels!
Win free books from the May 2025 batch of Early Reviewer titles! We’ve got 176 books this month, and a grand total of 3,446 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Tuesday, May 27th at 6PM EDT.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Canada, Australia, Germany, France, Poland, Luxembourg, Malta, Netherlands and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
It’s hard to believe it’s already May. The DLF community is as busy as ever, and we’re excited to share with you a variety of news, meetings, and things to read. We hope you enjoy them and have a wonderful month.
— Team DLF
This month’s news:
Nominations open: Have ideas about who should be the featured speaker at the 2025 DLF Forum? Submit your thoughts by May 15.
On the DLF blog: Members of the DLF Digital Accessibility Working Group have authored a blog post, “What you need to know as a library to prepare for ADA Title II rule changes.” Read it now.
Panel Discussion: OCEAN is hosting “Cultural Heritage/Ethics and Intellectual Property Workshop Part II: Analysing Case Studies,” a case study analysis dedicated to working with legacy collections that contain materials of distinct cultural significance. Friday, May 2 at 12pm ET. Register and learn more here.
Workshop: IIIF’s next five-day introductory training will be the week of May 19. No prior knowledge of IIIF is required, and the core sessions should be easily accessible to those with confidence using the Web. By the end of the workshop, participants will have completed a small IIIF project, which they will demo for the group at the final meeting.
Conference: Registration is open for the 2025 IIIF Annual Conference in Leeds, UK. This year’s conference features two full days of presentations, followed by a day of workshops and other activities. All workshops and tours are included in your registration, but space is limited. The program is available here, and registration is open on ConfTool.
Recordings Coming Soon: The 2025 IIPC General Assembly and Web Archiving Conference in Oslo, which saw 200 attendees from 83 organizations and 33 countries, has officially wrapped! Be on the lookout for conference recordings on their YouTube channel in the coming months.
This month’s open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 5/6, 2pm ET / 11am PT
DLF Digital Accessibility Working Group (DAWG): Wednesday, 5/7, 2pm ET / 11am PT
DLF AIG Cultural Assessment Working Group: Monday, 5/12, 1pm ET/10am PT
DLF AIG User Experience Working Group: Friday, 5/16, 11am ET / 8am PT
DLF AIG Metadata Assessment Working Group: Thursday, 5/22, 1:15pm ET / 10:15am PT
Starting in 2023, we have been meeting with more than 100 people to discuss the future of open knowledge, shaped by a diverse set of visions from artists, activists, scholars, archivists, thinkers, policymakers, data scientists, educators, and community leaders from everywhere.
The Open Knowledge Foundation team wants to identify and discuss issues sensitive to our movement, and to use this effort to constantly shape our actions and business strategies so that we best deliver what the community expects of us and our network: a pioneering organisation that has been defining the standards of the open movement for two decades.
Another goal is to include the perspectives of people of diverse backgrounds, especially those from marginalised communities, dissident identities, and whose geographic location is outside of the world’s major financial powers.
How can openness accelerate and strengthen the struggles against the complex challenges of our time? This is the key question behind conversations like the one you can read below.
*
This week, we had the chance to talk to information systems engineer Patricio Del Boca, an open technologies expert serving as the Technological Lead of the Open Knowledge Foundation since 2024.
Del Boca has been a human rights activist since the age of 15, when he became involved in the struggle for the disappeared during Argentina’s last military dictatorship. He has been involved with free software and open data communities since their origins. He is an expert in civic tech projects and a CKAN core developer. He is also the co-founder of Open Data Córdoba, with which he continues to collaborate. Working since 2021 for OKFN, recently Del Boca has been promoting our tech vision at global events and platforms such as the Open Government Partnership (OGP) Global Summit, the Digital Public Goods Alliance (DPGA) Members Meeting and EmpoderaLIVE.
One of his most important recent contributions to the positioning of Open Knowledge has been to develop an initiative we call The Tech We Want. In short, we want to start a collective conversation about new practical ways to build software that is useful, simple, long-lasting, and focused on solving people’s real problems.
As part of this initiative, we held The Tech We Want Online Summit in October 2024, bringing together the voices of key leaders in the open technology movement and demonstrations from around the world that are already putting these principles into practice. In January, a new iteration of the project was the Retrospective on the Role of Technology in the Super Election Year 2024. And most recently, in March, we hosted the Frictionless Summit, with demos and discussions with the Frictionless community about the future of the project under this new vision.
In this conversation with Lucas Pretti, OKFN’s Communications & Advocacy Director, Del Boca reflects on OKFN’s current tech vision and the road ahead.
We hope you enjoy the read and get inspired.
*
Lucas Pretti: Let’s start with the basics. What is the problem with the software being developed by Big Tech today?
Patricio Del Boca: It is important to distinguish between two things when answering this question. One is the products that Big Tech sells, and the other is the tools that they develop and that influence software development worldwide. I will focus on the latter.
Since the popularisation of the Internet and digital technologies, companies like Google have faced a very specific problem: how to get billions of people to access their websites. Google, for example, had a search engine that wanted to index the entire Internet. Facebook and Twitter have to deal with millions of daily posts, YouTube and Netflix have to store and transmit terabytes of video information, and so on. To solve these problems of scale, these companies started to develop tools and frameworks that ended up being open source.
In the era of big data, these large companies had to develop technologies and ways of working to meet the demand for consumption (and growth) of their platforms. Some examples: microservices, infrastructure as code, indexing engines, APIs or Kubernetes. Unfortunately, this boom in technologies that solved specific problems for this 5% of technology companies had two very negative consequences: it distanced development from the basic architecture of the web (servers communicating via HTTP and exchanging HTML) and it hypnotised the technology community, which began to adopt these tools for all its developments, even though they do not have billions of users and do not face similar challenges.
The software world absorbed these technologies, but without having to solve the same problems. The normal, everyday software that someone has in my local hardware store, my local supermarket or my local council does not have these problems. So if the vast majority of the world’s software doesn’t need the scale of Facebook, Google or Twitter, why should it need the same tools, technologies and ways of working?
That’s where we are now, with very few initiatives or companies that really need or have big data, but have used these tools because it was modern, it was new, it was marketing. Because they didn’t want to miss out on the ‘innovation wave’. I repeat myself: Silicon Valley companies and my city council don’t have the same needs or the same resources, so why should they use the same tools?
All this has led to the current situation and to the point I want to defend: there is over-engineering in the world of digital technologies. 95% of companies do not have the same problems as Facebook, Twitter, Google or Amazon, using the same solutions and the same tools is completely unsustainable. I don’t need a 4×4 to go shopping. An SUV is designed to pull a 3-ton tractor out of the mud. If I use it just to carry shopping bags, I’m overpaying and paying for something that costs a lot more to produce, consumes a lot more and whose potential goes unused: the perfect definition of unsustainable consumption.
Lucas Pretti: It’s funny that you use the SUV analogy, because a few months ago I read the news that the city of Paris held a referendum to ask the citizens if they should multiply the cost of parking SUVs in the city centre. And the people said yes, multiply the taxes for those who use SUVs where they shouldn’t. So I ask: how should we deal with the over-engineering of software today? With regulation? With taxes?
Patricio Del Boca: I think there are several tools, there are many fronts, there is no single silver bullet to solve this. Regulation is clearly one way, and the state has a role that it is not fulfilling. We need to focus on regulating commercial practices such as pre-sales (which is nothing more than lobbying) and vendor lock-in (because the technology is proprietary, the buyer is tied to the developer). One possible solution is for states to use free software, not only for technological reasons, but also for reasons of sovereignty and security.
But there is also a cultural and technological change that needs to take place, especially in the technology community, where voices are beginning to be heard today talking about these issues.
I always give two points of reference. One is this new project that has recently emerged: HTMX, which is a small library for web development that extends the functionality of HTML. You don’t need a whole complex framework to build most web pages; let’s go back to basics, HTML and a backend is enough. The other reference is Ruby on Rails, a simple framework where a single developer can take care of both the backend and the frontend. Ruby on Rails has in its design this idea of the ‘renaissance developer’ who is able to build a product in its entirety. Fortunately, we are slowly (again) creating ecosystems of tools that allow a single person to be efficient and create sustainable solutions.
‘Technological innovation’ and technological overkill have been top-down because it didn’t start with ‘I’ve got a developer, what can I do with him?’; it started in Silicon Valley with ‘I’ve got millions of dollars, what do I want to do?’ That’s how a lot of the tools have become very expensive and are used to solve the problems of the 1%.
Another front we can attack is hardware. Hardware has come a long way in the last 20 years. A Raspberry Pi alone now has the same processing power as many servers from two decades ago. You no longer need ‘the cloud’ to scale and grow, today’s servers that can be installed locally have enough power for most of the applications we need.
So, in summary, over-engineering can be solved on several fronts: government regulation so that they do not sell smoke to the state, reviewing the capabilities of modern hardware as an option to the cloud, and cultural change in the technology community to produce more sustainable tools.
Lucas Pretti: I was a bit surprised that in the first question, about the problem of mainstream software, you didn’t fall into the open vs. closed dichotomy. I would like to expand on that. Even when we talk to multilateral organisations with a clear public interest, such as the UN or the G20, to give a few examples, it is not obvious that their technological solutions have to be open and free. There is concern and resistance, even from people who are theoretically on the side of power. Why is that? Why should open source be the standard? What are the technical advantages? How do you get around the issues of security, privacy, effectiveness, etc. that seem to create resistance?
Patricio Del Boca: Yes, there is a lot of unfounded resistance, but there is also a historical reality that has favoured the development of private solutions over open solutions. Today it is true that there are private solutions that are better than open ones because they have had more inertia, they have been first to market, they have had more investment and therefore they work better. For some problems, there are only private solutions that work, and we have to accept that. I think this context is the initial source of resistance to open source: it is still in the collective unconscious that free software is less beautiful, more difficult and less intuitive than private software.
Beyond this ‘fame’, more on the technical side, I think the reluctance is also based on the notion of ‘security by secrecy’. That is, the idea that software is more secure because nobody knows its internals, which is not necessarily true. Keeping something secret does not mean that it is more secure; I would say the opposite. Recently there was a case where an open encryption technology used for banking transactions was hacked. There was a bug, it affected a lot of people, but it was discovered, fixed and patched in a very short time, precisely because it is open. The idea that open software is insecure because everyone knows the source code is wrong, because while there are two eyes trying to exploit it, there are 20 other eyes in the open development community trying to make it more secure.
And, of course, proprietary software is the source of what we were talking about earlier, the issue of power and control. It ties the buyer to the vendor and its supply chain. Audits are contracted to certify that the proprietary software is secure, specialised teams trained by this or that company are contracted, and so on. In the end, it’s always going to be a matter of trust, because if you don’t open the code, you’ll never know if it’s really secure. You have to trust someone who says it is secure.
If anything, I think in the last five years the ecosystem has changed a bit. The digital infrastructure of the world today is built on Linux. Most of the world's web servers run free software, the same goes for Android, and so on. In other words, it has been shown that free software is much more robust than it is often given credit for.
Lucas Pretti: The concept of public software, or public code, is being used a lot nowadays. From there, the notion of public digital infrastructure (DPI) was developed, which was put on the global agenda at the last G20 meeting in India, and which has become the focus of attention of many organisations, starting with our own, the Open Knowledge Foundation. Such a concept does not by design carry the adjective “open”. Do you think all DPI should be open, and why?
Patricio Del Boca: Yes, it should. Firstly, because it is public. If the digital infrastructure is public, then everyone should have the right to see it, understand it, update it, use it and check it. Just like a public square: I have the right to enter and enjoy it.
Then for the sake of transparency. Not only is what the software does transparent, but so is the bidding process. If you have open software, you remove the incentive to make the buyer a slave to a closed solution or to sell ‘exclusivity’. If the tool is open, then the best and cheapest offer wins. Full stop.
Also because of the innovation issue. With open software, there is a whole ecosystem that has access and can innovate: universities, independent communities, social organisations and also the private sector. We have to get away from this dichotomy that innovation only comes from the private sector. If the only thing that drives us to innovate is money, we have a problem as a society.
Another reason is cost. Open software allows reuse, which reduces costs. We need to stop reinventing the wheel and share solutions, improve them, etc.
Which brings us to another related reason: sustainability. Private software involves hundreds of developers in dozens of different companies doing the same thing. This wastes resources, ties the buyer (in this case the government) to one company, and is ultimately inefficient in terms of human and natural resources. Open source software is reusable and therefore more sustainable in the long run.
Finally, of course, there are the communities. Open infrastructures create a community ecosystem with enormous benefits, as we discussed earlier. It is more than clear that open source software has 25,000 more benefits than private software in the state.
Lucas Pretti: Elections are a topic you are passionate about. Your most practical experience is at the local level with Open Data Cordoba, such as the alternative vote counting system and other election initiatives. What is your vision for scaling these kinds of open, community-based, bottom-up solutions? Or rather, should they be scaled up? And how?
Patricio Del Boca: Yes, yes, I think they should be scaled. But scale in the sense of reusing solutions, not in the sense of increasing the adoption of the same centralised technology. One of the great advantages of working with elections is that, despite the particularities of each country, they are all more or less the same. At the end of the day, it’s about electing somebody, counting the votes, registering the people who voted and publishing the results. So there is a lot of scalability.
The concept of public digital infrastructure is perfect in that sense, because it focuses on building small interoperable solutions that can be reused and clustered to solve specific local problems. Rather than each community or country developing its own tools, it is much more efficient to share the same modular and interoperable infrastructure, and to evolve the software and protocols with the different experiences and learnings from each place. This is the goal of our project.
I am convinced that communities, and the Open Knowledge Network in particular, are the ideal engines for this. The experience we have had over the last year of holding roundtables in different continents and contexts, unintentionally and unplanned, has shown us that all countries have similar solutions to similar problems. In Italy, a small town in the south of Sicily has the same problem as my province in Cordoba, Argentina. In other words, let’s communicate and collaborate to scale, as I said, in the sense of reusing the open technologies that exist and those that we will create together. To scale is to collaborate.
Lucas Pretti: Since last year, with the development of the Open Data Editor (ODE), we’ve been trying to put all this into practice and, in a way, lead by example, showing that a different approach to technology is possible. How are you and our technology team applying The Tech We Want principles to the creation of this application?
Patricio Del Boca: The Open Data Editor was created with the intention of making the full capabilities of the Frictionless Data project (which consists of a fairly technical collection of standards and specifications for working with open data) available to people without any knowledge of coding or programming languages. There are lots of people out there, activists, civil servants and small social organisations, who don’t need complex software to clean up their spreadsheet data and correct errors. So the first task was to simplify ODE’s architecture as much as possible and make it easy to use, run offline, run operations locally with privacy rather than in the cloud, and provide data training in an accessible way through a free online course.
This year, we are focusing on energising the community around the app and exponentially increasing adoption among the groups we see as key: activists, social organisations and governments. We are already working closely with five pilot organisations from a first cohort, and will soon be launching a call for a second cohort. The emerging use cases for the Open Data Editor are impressive. Take a look at the work of the Observatoire des armaments on the war industry in France and you will understand the impact that a simple and accessible tool like ODE can have, and the difference it makes to a small but strong and committed team.
Lucas Pretti: Let’s summarise: What technologies do we want and what technologies do we need?
Patricio Del Boca: We want technologies that are accessible. By accessible, I mean open and cheap, whose innovation and development are not tied to large corporations. What’s more, we want them to be sustainable. In other words, the technology we build today won’t need to be rewritten in two years’ time and won’t require entire teams to maintain.
Lucas Pretti: And for whom are we building these technologies?
Patricio Del Boca: We need concrete technologies for people who need them, based on real needs, not on oversized problems. Nothing else.
This text shows a real case of how the Open Data Editor (ODE) impacted the workflow of an organisation working to serve the public good.
Image taken during the street-by-street urban air quality monitoring activity in the municipality of Kikuyu, Kenya, in 2024. Source: YouTube
Organisation: The Demography Project Location: Nairobi, Kenya Knowledge Area: Environmental Justice Type of Data: Air Quality
The Demography Project is an independent youth-led non-profit organisation founded in 2022 in Nairobi, Kenya. They leverage affordable & accessible civic technology, citizen science, collective action and grassroots journalism through citizen-generated data and open-government data. The overall goal of their work is to amplify community voices and enhance citizen capacity to participate in advancing accountable governance, participatory democracy, environmental conservation and public awareness.
They do this through different projects, such as Maai Makwa, a water quality and quantity monitoring project; Infrastracker, focused on empowering vulnerable communities with open-source tools and data for climate action; and Bungeni, a parliamentary monitoring project for the County Assemblies of Kenya.
The Challenge
Problem
One of the tasks carried out by The Demography Project team and volunteers is the collection of environmental data to measure the air quality of a given location. This is done in the field, on long walks during which the project team carries various sensors to determine, among other things, the level of pollution, humidity, atmospheric pressure and the presence of various chemical elements such as nitrogen dioxide (NO2), volatile organic compounds (VOCs) and many others.
To make these measurements, they use different devices, manufactured by different companies, with different capabilities – for example, the Plume Labs Flow 2 sensor measures parts per billion (ppb), while the Atmotube Pro measures parts per million (ppm). In addition, none of the sensors provide geolocation data, forcing the project team to collect GPS data separately.
These different databases, from a total of four sources, generate a large number of spreadsheets, as measurements are taken every second. It’s only by harmonising these datasets and making them ‘talk’ to each other that the team can draw conclusions about the air quality measured during a given field trip. In other words, they have to compile all the data into one giant spreadsheet that is virtually impossible for the human eye to check.
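For readers who do write a little code, a rough sketch of this harmonisation step might look like the following (Python with pandas). The file names, column names and tolerances here are hypothetical rather than the team’s actual exports; the point is simply that readings logged in ppb and ppm on separate device clocks have to be converted and aligned on time before any analysis can start.

```python
# Hypothetical sketch of merging sensor exports like those described above.
# File and column names are invented for illustration.
import pandas as pd

flow = pd.read_csv("flow2.csv", parse_dates=["timestamp"])         # readings in ppb
atmo = pd.read_csv("atmotube_pro.csv", parse_dates=["timestamp"])  # readings in ppm
gps = pd.read_csv("gps_track.csv", parse_dates=["timestamp"])      # lat/lon fixes

# Put everything on one scale: 1 ppm = 1,000 ppb.
flow["voc_ppm"] = flow["voc_ppb"] / 1000.0

# Align the sources on time, tolerating small clock differences,
# so each pollutant reading picks up the nearest GPS fix.
merged = pd.merge_asof(
    flow.sort_values("timestamp"),
    atmo.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("2s"),
    suffixes=("_flow", "_atmo"),
)
merged = pd.merge_asof(
    merged,
    gps.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("5s"),
)

merged.to_csv("field_trip_consolidated.csv", index=False)
```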
They also face a lack of specialised human resources. There are currently 16 people involved with The Demography Project, most of them volunteers. Of these, only two have any technical knowledge of database management, and that in a limited capacity.
Impact
Organising and compiling hundreds of rows of a table of raw technical data (mostly numbers) is a near-impossible task for a human, and it used to take months for the team to reach a minimally adequate level for analysis. They would search endlessly for zeros, empty cells and other inconsistencies. As well as the time and effort involved, there was always the feeling that the data wasn’t accurate enough, which could ultimately compromise the quality of the results and make all the hard work go to waste.
Examples of some of the different datasets the Demography Project team works with.
The Solution
The Demography Project team uses the Open Data Editor (ODE) to look for errors in the gigantic spreadsheets consolidated after data collection in the field. These tables, which compile data from different sources (including two portable air quality monitors, GPS devices, smartphones), have many inconsistencies that are detected in seconds. Once the errors have been identified, they can then correct the missing or incorrect data, increasing the dataset’s quality.
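This kind of check is what the Frictionless Data tooling mentioned at the start of this piece provides to non-coders through ODE. For teams that do have a little Python, a roughly equivalent validation can be scripted with the open source frictionless library. This is a minimal sketch with a hypothetical file name, not a description of how ODE itself is wired, and report details vary between library versions.

```python
# Minimal sketch: validate a consolidated spreadsheet with the
# frictionless library (the same family of tools ODE builds on).
# "field_trip_consolidated.csv" is a hypothetical file name.
from frictionless import validate

report = validate("field_trip_consolidated.csv")

if report.valid:
    print("No structural errors found.")
else:
    # The report lists each problem, e.g. text in a numeric column,
    # blank cells or duplicate headers, much as ODE flags them.
    print(report)
```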
According to the team, ODE is also very useful for understanding how data works in general and the importance of standards.
In the spreadsheet, ODE identified 29 inconsistencies and errors in the dataset.
Above: ODE identified alphabetic elements in two cells from raw data extracted from the portable air quality monitors, which should have been in numeric format.
The ODE metadata panel proved to be a key resource for the team to learn more about how data works.
The Results
Reduced error resolution time from months to seconds
Ensured the data is good enough to be analysed
Enabled the team to focus on data collection in the field
Identified data gaps, particularly missing GPS coordinates, which distort data outputs
The data analysed from the Street-by-Street Air Quality Monitoring Activity in Kikuyu Municipality show that air quality there is relatively safe and within the parameters set by Kenya’s Environmental Management and Co-ordination (Air Quality) Regulations, 2024 and the World Health Organization’s Global Air Quality Guidelines 2021, with an average Air Quality Score of 85, compared to 62 in Kenya’s capital city Nairobi.
As the infographics below show, this is the kind of conclusion and output that Open Data Editor has helped The Demography Project to develop faster, increasing its social and environmental impact.
Quote
Richard Muraya, Executive Director
“The reason we’re so interested in the Open Data Editor is that it highlights the gaps we have in terms of data collection. Before ODE, it was a challenge to identify where the zeros were or where we didn’t have accurate readings from our monitoring hardware. With ODE, we have a tool that helps us to identify those gaps and ensure that we have some form of coherence in the tabular data from our air quality monitors, phones and action cameras”.
About the Open Data Editor
The Open Data Editor (ODE) is Open Knowledge’s new open source desktop application for nonprofits, data journalists, activists, and public servants, aimed at helping them detect errors in their datasets. It’s a free, open-source tool designed for people working with tabular data (Excel, Google Sheets, CSV) who don’t know how to code or don’t have the programming skills to automate the data exploration process.
Simple, lightweight, privacy-friendly, and built for real-world challenges like offline work and low-resource settings, ODE is part of Open Knowledge’s initiative The Tech We Want — our ambitious effort to reimagine how technology is built and used.
And there’s more! ODE comes with a free online course that can help you improve the quality of your datasets, therefore making your life/work easier.
Download Open Data Editor 1.4.0 using the following buttons:
All of Open Knowledge’s work with the Open Data Editor is made possible thanks to a charitable grant from the Patrick J. McGovern Foundation. Learn more about its funding programmes here.
This page has been online in some form or another since 2023 and is now making its appearance on my dot-com. This page is in perpetual draft. Last updated April, 2025.
There are a lot of logistical and emotional components of job-hunting, applying, interviewing, and changing jobs. That part alone is hard, and it’s plenty to have to do. But also prepare yourself for the inevitable grief of leaving your field, as well as the identity shift that happens when you leave.
Do you want a job or a career? How much heart/soul can you put into your work?
Ask yourself if your career actually needs to have a trajectory.
Sometimes it just doesn’t make any sense. What does “career success” look like for you? If it looks like climbing a ladder, you are probably not reading this right now.
It doesn’t have to be a forever-job. It can be a for-now job.
It can be really easy to search for the dream job/company that you’ll stay at forever! Sometimes, though, you just need a job to get you started, to pivot into another field or get you experience doing X, Y, or Z. Don’t stress yourself out looking for a perfect forever job.
Find a job you could do, that pays you enough to live, and that gets you the experience you need.
Before you start applying for jobs, think about how you’d answer the question “tell us about yourself” in 1-2 minutes at the start of an interview. Tie your past work and interests to the thing that you want to do next. That is the story that you will tell your interviewers, your network on LinkedIn, and most importantly yourself as you’re moving through the job hunt process.
What artifacts do you have that can help you tell your story?
Things you’ve written: articles, blog posts, policies, strategy documents, memos, project plans
Presentations you’ve given
Projects you’ve initiated, led, or contributed significantly to
Any other artifacts that represent your work.
Do your research
Use your strong research skills to learn how things work outside of academia.
Do informational interviews
Ask friends and friends of friends for informational interviews. People are so very generous! A quick half-hour call will give you a lot of insight into what a person’s job and workplace is like, what kinds of things they’re responsible for, and even the words they use to talk about what they do. Soak it up.
See what others are doing
Dust off your LinkedIn account. Start searching for people who are talking about things you’re interested in. Follow them, and follow who they follow. You don’t have to “connect” with them if you don’t want; you can just follow their posts.
Look for jobs
By looking at job ads you can learn what types of words/phrases people are using to describe certain skills. Refine your search as you find new keywords in job postings.
All job ads are aspirational. You won’t have 100% of the qualifications for every job. If you have half the qualifications, apply.
I recommend starting a spreadsheet to track each role you’re interested in, whether you applied, the employer, a link to the job, your application status, when you applied, and any other notes you want to make (salary? concerns?)
Words of Mouth is a weekly email newsletter with job postings across the arts, digital jobs at nonprofits, etc. Also includes fellowships. This list is really tailored for GLAM/academic-adjacent folks.
LinkedIn is, unfortunately, a thing
LinkedIn is weirdly very important outside of higher ed, especially in the private sector.
Fill out your profile - add a brief bio (remember your story) and add more details about your responsibilities/accomplishments in previous/current work and volunteer experience.
Model your profile based on what others are doing - lurk and find folks whose profiles look good to you and note how they are using LinkedIn. Make any changes to your profile that feel authentic for you.
Make/strengthen connections - reach out to folks in your existing network and add new people that you know. LinkedIn is extremely creepy and knows who you know. Just add ‘em.
Ask for help - either as a post, or through messaging folks. Most folks are very eager to add connections, exchange messages, share links to jobs, offer referrals, and share information about their work.
The job market, especially in UX and adjacent fields in 2024, is awful 🙂. No matter what field you’re in, though, be prepared to be ghosted at any point in the application process. Don’t take it personally.
Transferable skills
Your skills are transferable!
Here are some transferable skills I identified for myself:
Talking with people and building relationships
Managing projects and stakeholders
Recruiting, hiring, retaining, rewarding, and managing people
Facilitating meetings and workshops, and presenting to groups of all sizes
Writing for different audiences, including communicating “professionally”
Mapping out, clarifying, and streamlining workflows
Strategic planning
Understanding how technologies connect and how the internet works
Putting theory into practice for diversity, equity, inclusion and accessibility
Here are a few terms that might help in translating your skills for a new context:
| Academic word | Private sector word | Translation |
| --- | --- | --- |
| Faculty member | SME | SME = subject matter expert. Someone who knows a lot about a specific topic. |
| Administrators, deans, provosts | Executive leadership, C suite | In the private sector, like deans and provosts, the exec team runs things: CEO, COO, CIO, CTO - the C-suite. |
| Collaboration | Cross-functional collaboration | Cross-functional just means everybody has different jobs and you are able to effectively work with them. |
| Supervisors, external collaborators | Stakeholders | Stakeholders include anyone who is responsible or accountable, or who is informed or consulted, about your work. |
| Research findings | Insights, learnings | Yes, learnings is a word here. |
| Websites/web applications | Products | Is it a digital tool? It's a product. |
| Writing and organizing documentation | Knowledge management | KM is an entire professional field and one to which academics in particular are well-suited. |
| Guidelines, policies, documentation | Processes, procedures, SOPs | SOP = standard operating procedure. If you've ever written documentation on how to do certain tasks, or how things should be done, you have experience with SOPs. |
| Teaching, instruction | Guidance, training, instructional design | If you've developed and taught a class, you're an instructional designer. |
| Research | Discovery | "Do discovery on X Y Z" ⇒ do research on it. |
| Grantwriting/grant-seeking | Business development/BD | |
ℹ️ I’d really like to expand this section! Please write me with any additions.
Each job description should be shorter than the one before
No need to go back more than 10 years. “Recent work experience” is good!
Interview
Each interview should be a conversation and a learning opportunity, and a way to practice talking about yourself. An interview shouldn’t be an inquisition, and if it feels like one, that may be a sign to pull yourself out of the applicant pool.
Be prepared to go through multiple rounds of interviews spread out over several weeks. Again, prepare to be ghosted at any time.
Answer questions
Have a few stories at the ready: tell us about a conflict, tell us about an initiative you led from start to finish, tell us about managing up, tell us about working with a difficult client. Think about the projects you have worked on.
What stories do you have to tell about working with stakeholders in an organization, navigating competing priorities or compromising?
Ask questions
Ask a LOT of questions. You want to know what you’re getting into, and employers want someone who is curious and motivated.
Depending on the vibe of the interview, you might ask questions after you answer their questions:
“You asked about managing multiple competing priorities. How are priorities set and communicated here? Who would the person in this role work with to establish a good priority order?”
Here are some of my favorite questions to ask hiring teams:
I see that this is a (new role / existing role). What does success look like for the person in this role? Why did the person in this role previously move on?
How will you work with the person in this role? How do you collaborate and what duties would you like to see this person take on?
What goals and initiatives does your company have around diversity, equity and inclusion? What are some challenges or opportunities? (If they don’t have a good answer for this, it’s a red flag.)
What are some growing edges for the organization? What are y’all actively trying to improve right now?
How do y’all support each other in both completing work and making sure you take care of yourselves outside of work? Do folks take their vacations here?
What questions am I not asking that I should be? What do you wish you’d known before you started work?
A lot of places outside of higher ed and nonprofits don’t care about calling your references. Instead of calling references, they will simply make you go through a 4-6 step interview process!
In my experience, places only call to verify your former employment at an organization - not get a character reference.
Hang in there
If there’s one thing I’ve learned about leaving a specialized role in a field that encourages folks to achieve national recognition as an individual scholar, it’s that I’m not actually that special. But what I do have is the wisdom of seeing how institutions work and understanding what makes those gears turn. That knowledge translates very easily across sectors and organizations.
You are going to get there! Keep going.
Change log
2025/04/27 Moved to this URL, added change log, updated markdown formatting
Cliff has left a major legacy and a large gap. Read the touching tributes from ARL and CNI.
I first met Cliff in 1990. I was on a study trip to the US and Canada, commissioned by the British Library to prepare a report on emerging network trends in libraries, with a special emphasis on the US. It was an early activity associated with UKOLN (the UK Office for Library and Information Networking, which has since ceased to exist) at the University of Bath, which the BL partly funded.
The Division of Library Automation at the University of California, under Cliff’s leadership, had been a pioneer in networking and shared systems. Melvyl was an important early example of a shared network system with some scale, and Cliff was an authority on the emerging Z39.50 protocol, which was a central part of what I was looking at.
It was my first trip to the US. I was a little nervous, compounded by the fact that my wardrobe was unsuited to the climate and the relaxed Californian working environment. However, I was immediately put at ease by Cliff’s gentle, curious, welcoming manner. He was very generous with his and his colleagues’ time, and created a schedule with care and attention.
I observed with interest the collegial working environment he fostered in the office. This was my first introduction to Cliff in person - to his thoughtful conversation and gracious social presence. It was a revelation for me and provided a glimpse into what was to make his contribution to libraries and the research community so special over many years. He combined deep computational knowledge with a complex appreciation of the evolving scholarly and cultural record in a network environment. This was integral to how he saw libraries - as critical and inseparable elements of the infrastructure and practice of research and scholarship.
Later, when the report was published, Cliff reviewed it at length. Characteristically, the review appeared in what we might now call a Diamond open access publication, The Public-Access Computer Systems Review, an important early electronic journal. I think the positive review may have influenced the release of a second edition of the report by Meckler for a US audience.
He correctly identified some of the traits of my subsequent writing career:
Sometimes, the coverage is a bit encyclopedic, which makes for slightly tedious reading, but such detail is necessary in a comprehensive report.
From my current perch in the US, I was interested to be reminded of this passage from his kind concluding comments.
This is a wonderful book that we should thank Dempsey for writing and the British Library for supporting. (I do find myself thinking, parochially, that it is strange that the first real book on these topics has been written from a European perspective and underwritten by the British Library. The topics covered are terribly important to the library, information science, and networking communities. Why hasn’t this type of book been written from a U.S. perspective?)
This visit began a long relationship.
Cliff later came to the UK to present in the series of conferences we organized at the University of Bath. I remember a discussion in Bath between Cliff and the affable and inspirational Paul Evan Peters, who was at the time the inaugural leader of the Coalition for Networked Information. They agreed that they would readily pay for a service that managed the transfer of their luggage between destinations, with which they could then occasionally rendezvous. That said, packing was simplified for Cliff as he became identified with a standard outfit of jeans and blue oxford shirt.
From this vantage point, it is difficult to capture the excitement of those early Internet years. There was a magical and formative period, between that early promise and the later Amazoogle enclosure. And the inclusive and generous vision of Paul, Cliff and the early CNI was an important part of that, aiming initially to provide a venue for conversation between libraries, publishers and technologists. While the library element may have come to dominate, it always fostered connections.
It was in Bath, also, that we met Cecilia for the first time, which began a long exchange about Irish tea and whiskey and yarn. Cliff’s work depended on Cecilia’s support in so many ways.
Professional and personal interactions continued as CNI developed a strong working relationship with UKOLN and with Jisc, the national information and networking infrastructure organization, with which I later worked for a while. We organized a series of joint US/UK conferences, which provided an ongoing collaborative venue, and an important input into the extensive project and development work of Jisc in subsequent years.
Cliff was also a contributor to the second Dublin Core meeting which UKOLN jointly organized in Warwick in the UK. I remember reading a transcript of some remarks by Cliff on this event which rendered ‘Warwick’ as ‘Woolworths.’ And indeed, ‘divided by a common language’ was the title I adopted for a presentation I was invited to give to the fourth Jisc/CNI event about differences between the UK and the US library environments. Again, I remember speaking to Cliff about my remarks, considering on the one hand the incredible volume and variety of institutions in the decentralized US higher education environment, and on the other, the important shared services infrastructure being put in place by Jisc in the UK. We joked about how far-away hills seem greener, so that while some of the US delegates envied the shared attention possible through Jisc in the UK, many of the UK delegates envied the scale and capacity of the larger US organizations.
When I was first offered a job in the US I immediately reached out to Cliff for advice. Later, when I occasionally entered into discussions about other jobs, I sometimes sought his guidance. On one memorable (for me) occasion, he strongly advised that I withdraw my interest in a library Dean role: he argued that the library in question did not have great institutional support. He was always candid and direct in this way, even as his delivery remained gentle and considered. I remember I sent him a piece I had written early in the pandemic about likely trends. He gently nudged me away from publishing it, as something that would date badly.
For many people, Cliff was CNI, although of course he always acknowledged the strong contributions of Joan and of the other members of the team that made it work. When I think about CNI and Cliff’s CNI career, three things stand out for me.
First, Cliff and his colleagues made CNI an extraordinarily successful venue for scaling learning and innovation across the library community and beyond. As much as through any formal presentations, this was through the congregation and personal networks it facilitated. Like all successful networks, it offered a venue where people could pool their uncertainty, gain confidence in new approaches, connect with like minds, identify potential hires or new job opportunities, and serendipitously connect with people and ideas. This is how norms, ideas, and innovation diffuse. It promotes sharing, mentoring, and career growth. This was valuable in immeasurable ways.
Second, there was Cliff’s intellectual contribution. I would occasionally remark that CNI was a brain support mechanism for Cliff, and I was only half-joking. I believe that many members were more than happy to make an investment in CNI in order to provide a community space for Cliff to reflect on directions and to share his thoughts. Colleagues valued his luminously synthetic introductions to the meetings, the publications that immediately patterned your response to particular topics, his remarkably fluent presentations delivered with a few notes, and his generous counsel. His reading was wide, across the very technical, as well as the social and cultural. Much will be said about all of this by others. I will just note here that no matter how often one saw him do it, one was always awestruck by his range, insight and shaping powers.
Finally, Cliff was a bridge from the research library community to other communities and organizations. His effectiveness here depended not only on the knowledge and insight he brought into any conversation, but also on the universal recognition that he was motivated by a desire to lift up all interested parties in support of research and scholarship.
Perhaps the library community doesn’t have the scale to generate a deep historical perspective. It would be especially interesting to document the web of influence and development in the period in which Cliff was active, which saw research, learning and cultural expression shape and be re-shaped by the evolving material base of the network and the digital.
Cliff has a special place as somebody whose work and writing with the University of California influenced emerging library services, and who transitioned into a role where he selflessly and persistently worked to help a community be a better version of itself.
Work interests meant that I did not go to CNI regularly for several years. I was very pleased when Cliff asked me to deliver the keynote at CNI after my departure from OCLC was announced. It did not turn out to be my best outing; I was trying too much to live up to the occasion and I ended up saying too little because I actually said too much.
Cliff was kind and I was touched by the enthusiasm with which he described how keynote speakers were added to the CNI meeting invitation list. In this way, he suggested in his reserved way, he hoped I would attend future meetings. I will miss the opportunity of being able to relax into such quiet interactions based on long shared history and conversations.
Cecilia’s loss is immense. It may be a small comfort to know that Cliff’s influence is strong and active in all the lives that he touched.
Note: Feature picture and caption by John Kirriemuir in a 1997 interview with Cliff for Ariadne. Cliff talks about his role at the University of California and his transition to CNI, as well as about various network information topics.
The #TeslaTakedown protests have gone on long enough that it is time to make a public archive of signs.
For the latest signs, I've included a link where you can download a PDF to print your own.
Please use these if you'd like; if you want to give me something in exchange, just tag me on Mastodon or Bluesky so I know how far these have spread.
Also, Marc Lee from Free Protest Signs reached out on Bluesky to let me know about his website of signs.
If you don't like something below, maybe one of his will suit your mood!
All of the ABOVE!
All of the ABOVE! protest sign, first used on 26-Apr-2025
The meanness, the illegality, the stupidity... it is all more than I thought possible, and it is certainly not what we deserve from our government.
And it is not just one of these attributes, but all of them coming from all of this administration's elected, confirmed, and senior leaders.
Download and print your own 26" by 16" version of this sign.
Elected Assholes
Elected Assholes protest sign, not used by the author
This crap is well past getting out of hand, and I wanted a sign that reflected that.
The government—in my name as one of its citizens—is deporting people without due process?
It is bullying foreign leaders in the Oval Office?
It is recklessly dismantling medical research, food safety programs, and environmental controls?
This doesn't represent my values, nor—I'd wager—the values of most of the country.
The focus group (my family members) wasn't a fan of the unnecessary crassness of the sign.
I want to do something with the concept of "My GOVERNMENT did WHAT?!?", so I'll probably revisit this.
Disaster Musk
Disaster Musk protest sign, first used on 12-Apr-2025
A few weeks ago, I saw the picture of a smirking Elon Musk in a New York Times article, and I knew I needed to make use of it somehow.
Inspiration struck this week when I remembered the 'Disaster Girl' meme.
The picture of a four-year-old girl with a "devilish smirk" looking back at the camera as firefighters battle a house fire is a perfect fit for what Elon is doing to our federal government.
Download and print your own 26" by 16" version of this sign.
Get Angry at Billionaires
Get Angry at Billionaires protest sign, first used on 5-Apr-2025.
Returning to the theme of my first protest sign, I wanted to convey that the people giving us the middle finger as they drive by have more in common with us than they do with the billionaires they are supporting.
Or, if you still want to be up in arms with me, then just come stand an hour at the protest and get your free check from George Soros for protesting. (← sarcasm)
Download and print your own 26" by 16" version of this sign.
Signals
Signals protest sign, first used on 29-Mar-2025.
This was the week that the news broke about senior government officials using the consumer-grade chat app Signal to discuss war plans.
I went off-script that week with a sign about that political nonsense.
It is a play on the phrase "The call is coming from inside the house!", the famous movie trope where the police tell the person in a home that they have traced the antagonist's call to that very home.
In this case, the danger to democracy is coming from inside the White House!
Or, at least, that is what I was aiming for.
Download and print your own 26" by 16" version of this sign.
Elon MUSKed Up
Elon MUSKed Up, first used on 22-Mar-2025
Back to basics, I thought.
People are driving by quickly, so too much text won't be read.
So this was the idea:
Set the context: "Our GOVERNMENT was FINE."
Deliver the punchline: "Now it is MUSKed UP!"
Clear call-to-action: "FIRE ELON!" (in a flaming font, nonetheless)
And that seemed to work.
This might have been my best sign yet.
Download and print your own 26" by 16" version of this sign.
DEMOCRACY, not MUSKocracy, not TRUMPocracy
DEMOCRACY, not MUSKocracy, not TRUMPocracy protest sign, first used 15-Mar-2025.
This was my second protest sign, and I'm clearly still working on the craft.
Although the points of this sign didn't require a blog post to explain, it still had too many words on it.
That made it hard to read from cars that were driving by.
It might have been okay without the "Kings and Oligarchs are not American" in the middle, but I thought it would lose its punch without it.
I'm feeling pretty ambivalent about it, so I haven't gone through the process of making it available for download; let me know if you'd like to have a printable version.
Our Fellow Americans
Our Fellow Americans protest sign, first used on 8-Mar-2025.
This is the first sign I made for a #TeslaTakedown, and I should have listened to my family.
They suggested that the initial version, without the "How much do you have in common with Elon Musk?" at the bottom, was too confusing.
Adding that sentence improved understanding, but now there was too much to read in a protest sign for cars whizzing past.
My point was that the person driving by giving me the middle finger and I have far more in common with each other than either of us has with Elon Musk (and Donald Trump).
Even so, I felt like I needed a blog post to fully explain what I meant.
I'm feeling pretty ambivalent about this one, too, but if you'd like a printable version please let me know.
Although this was my first sign, it was my second #TeslaTakedown protest.
I learned quickly that signs are an important part of the protest spirit, and the more creative the better.
This is an excerpt from a fuller contribution I made to Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools [pdf]. It is a sketch only, and somewhat informal, but I thought I would put it here in case of interest. It is also influenced by the context in which it was prepared which was a discussion of the informational disciplines and the iSchool in an R1 institution. In the unlikely event you would like to reference it, I would be grateful if you cite the full original: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward, Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools. University of Washington Information School. [pdf]
Introduction
The word information has been used so much that it has come to dominate discourse (Day, 2001). […] Vagueness and inconsistency are advantageous for slogans and using “chameleon words” that assume differing colors in different contexts allows flexibility for readers to perceive what they wish. Buckland, M. (2012). What kind of science can information science be? Journal of the American Society for Information Science and Technology.
I should like to draw an analytical distinction between the notions of “information society” and “informational society,” with similar implications for information/informational economy. The term “information society” emphasizes the role of information in society. But I argue that information, in its broadest sense, e.g. as communication of knowledge, has been critical in all societies […]. In contrast, the term “informational” indicates the attribute of a specific form of social organization in which information generation, processing, and transmission become the fundamental sources of productivity and power because of new technological conditions emerging in this historical period. My terminology tries to establish a parallel with the distinction between industry and industrial. An industrial society (a usual notion in the sociological tradition) is not just a society where there is industry, but a society where the social and technological forms of industrial organization permeate all spheres of activity, starting with the dominant activities, located in the economic system and in military technology, and reaching the objects and habits of everyday life. My use of the terms “informational society” and “informational economy” attempts a more precise characterization of current transformations beyond the common-sense observation that information and knowledge are important to our societies. Castells, M. (2009). The Rise of the Network Society, With a New Preface: The Information Age: Economy, Society, and Culture, Volume I (2nd ed.)
I look very briefly at our senses of information itself as it is the context for the discussion about informational disciplines to follow. It is certainly a ‘chameleon’ word. It has become so widely used as to become drained of specificity unless explicitly qualified in particular circumstances.
Raymond Williams’ Keywords does not include an entry for ‘Information,’ which is telling. This influential work gave rise to multiple subsequent collections which aim to update it or adapt it to particular domains. I quote from the entries on ‘information’ in three of these here:
Information as keyword — digital or otherwise — did not exist before the twentieth century. […] Then, unexpectedly, in the 1920s this formerly unmarked and unremarkable concept became a focal point of widespread scientific and mathematical investigation. ‘Information’ by Bernard Geoghegan in Peters, B. (Ed.). (2016). Digital Keywords.
Toward the end of the C20 “information” became a popular prefix to a range of concepts that claimed to identify essential features of an emerging new sort of society. The information explosion, information age, information economy, information revolution, and especially information society became commonplace descriptions (Castells, 1996-8; Webster, 2002). These covered, and tried to conceive, disparate phenomena, perhaps unwarrantedly. The concepts appeared superficially to capture similar phenomena, yet on closer inspection centered often on quite different things. For example, their concern ranged over a general increase in symbols and signs that accelerated from the 1960s (the information explosion); the development of information and communications technologies, especially the Internet (the information superhighway, reputedly coined by US vice-president Al Gore), the increased prominence of information in employment (information scientists, information labor, information professions); the growing significance of tradable information (information economy); and concerns for new forms of inequality (the information divide, the information rich/poor). ‘Information’ by Frank Webster in Bennett (2005) New Keywords: a Revised Vocabulary of Culture and Society.
As burgeoning use of information in preference to related terms encroaches on the word’s surrounding lexical field, questions arise as to how everything from the human genome to celebrity gossip can so readily be referred to as information. ‘Information’ in MacCabe, C., & Yanacek, H. (2018). Keywords for today : a 21st century vocabulary : the keywords project.
Geoghegan notes the relatively recent general use of the word, and is primarily interested in the information-theoretic work of Shannon and others in the mid-20th century which was important for the development of telegraphy and in subsequent years of computing, cryptography, genetics, network theory and other areas (although it was ultimately not very influential in Library Studies or related fields, as Hjørland (2018) points out). The other two excerpts emphasise growing use of the word throughout the latter half of the 20th Century. Webster (a sociologist who has specialized in information-related topics and has written about public libraries) notes the way in which it became attached to various generally descriptive labels, notably of course in the ‘Information society’ and for our purposes ‘Information science.’
The quotes above underline the strong emergence of information-related issues as a topic of investigation, and as an explanatory framework in different contexts. Given this widespread use, any account of information is also going to be provisional and contextual.
As background here, I sketch a very schematic overview of information history which departs from the W. Boyd Rayward (2014) account which influenced it.
This may seem a little hubristic, but I am prompted to insert it here by the common assertion in iSchool materials – accompanied by such phrases as the ‘knowledge economy’ or the ‘information age’ – that the importance of information in our world elevates the work of the iSchool.
However, as Castells notes, this “common-sense observation that information and knowledge are important to our societies,” is not in itself very revealing.
Information has also gone beyond the bounds of any one subject. The chemist or the cultural geographer or the sociologist has an informational perspective. In this context, it seems to me, the promise of the iSchool is not that it has specialist or unique expertise, but that it can bring together a historical perspective and a multidisciplinary focus.
For reasons that should be clear, I do not attempt to define information here. See Bates (2010) for an exhaustive review of definitions, which, more than anything, suggests why a singular view is unlikely.
For convenience, I reference Michael Buckland’s (2017) pragmatic and functional account.
Information as thing. Informational artifacts – books, passports, menus. Broadly synonymous with Buckland’s inclusive view of ‘document.’
Information as process. Becoming informed.
Information as knowledge. Intangible. Fixed in ‘things.’ Imparted in the informing process.
In their reflective overview of information definitions in IS, Dinneen and Brauner (2017) note that “Buckland was aware that the overall account was likely to disappoint the pickiest of theorists.” While a more conceptual characterization—such as Bateson’s "a difference that makes a difference" or Bates’ "the pattern of organization of matter and energy"—might offer additional nuance or insight, it is less well-suited to my purposes. Dinneen and Brauner (2017) favor the recently influential work of Floridi, which also, incidentally, is highlighted in the important information science textbook, Bawden and Robinson (2022).
I reference it mostly because its somewhat technocratic emphasis is convenient. Much of the emphasis of library studies or information science is indeed on the recorded information that can provide a part of the material base for some of the more abstract or general uses above. And also partly because Buckland is such an interesting and historically influential figure in this discussion (librarian, leading Information Science theorist and practitioner, central player in the iSchool movement at Berkeley (Buckland, 2024)).
I take the pivotal mid-twentieth century as a starting point. As noted above, information was foregrounded in a variety of ways at this time, and terms such as ‘information science’ and ‘information society’ emerged. I refer to this as the short age of documents, a reference to the discourse around the challenge of managing recorded information.
Then, we have the long period before this, in which recorded information was manifest in print or manuscript forms. I refer to this as the long age of literacy and print.
And third, we have the short period after this, which we are now living through. While we can characterize this as a digital or network age, the more interesting point here is that an informational perspective becomes more common, extending to social and political contexts. Modern institutional constructs -- markets or bureaucracies for example -- can be seen in informational terms. I refer to this as the informational age (influenced by Castells’ characterization of the current period).
Furthermore, it is now common to reinterpret the past through an informational lens. Perhaps the most interesting recent example of this is Yuval Noah Harari’s ambitious Nexus, which was published recently to mixed reception. He tends to see any ‘intersubjective reality’ as informational. He takes a long historical view, discussing developments as stages in the emergence of information networks. For example, he discusses the difference between democratic and totalitarian regimes as a difference between self-correcting and closed information networks. He talks about civilizations as combinations of a managerial and operational bureaucracy and a legitimating or imposed mythology, each again very much an informational apparatus. He reinterprets the past in terms of an informational present; here he is, for example, talking about the impact of printing: “print allowed the rapid spread not only of scientific facts but also of religious fantasies, fake news, and conspiracy theories.”
In this period also there is a strong emphasis on information critique, and with the advent of AI, perhaps, as I will suggest, we are seeing the apotheosis of the document.
The long age of literacy and print
The emergence of the document marked an important early transition.
Writing allowed thought to be externalized, fixing information in a medium that could persist across space and time. In an oral culture, knowledge had to be remembered. Mnemonic techniques and repetition aided memory. It was retained through song, story and ritual.
As Walter Ong and others have argued, the external documentary accumulation of knowledge co-evolved with a profound shift in the structure of thought, knowledge and communication, evident in the development of more abstract and systematic forms of thinking, the emergence of formal learning and scholarship, the development of laws, the codification of expertise, and so on. This intensified after the invention of movable type.
From then until the Second World War, say, information exchange was dominated by the production and exchange of print. Infrastructures and institutions emerged which helped create, manage and preserve documents, including libraries, archives, scholarly societies, publishers, commercial distribution mechanisms, and so on.
Libraries are a strongly institutionalized response to the print distribution model, where information or cultural resources were expensive or available in limited ways. Efficient access required the physical proximity of collections to their potential readers, and libraries built local collections to make them conveniently available within their communities. In this way, collection size and transaction volumes became a signifier of quality. Those interested in technical, scholarly, cultural or other documents built their workflow around the library.
So, while at a material or mechanical level, we see the progressive intensification and amplification of the production and exchange of documents, a greater variety of ways of processing information, and the massive accumulation of recorded knowledge, the more interesting story involves the mutual interaction between this and social and cultural life.
The progressive connectedness and complexity of social contexts entails progressively more communication across time and space, and a corresponding expansion of the external shared documentary accumulation of expertise and knowledge. Libraries are a part of the apparatus for retaining and sharing that documentary record.
The short age of documents
In the mid-20th Century, the production, circulation, and institutionalization of information expanded significantly, shaped by the ongoing interplay between social structures, technological developments, and organizational demands. The Second World War itself accelerated this, intensifying the need for systematic coordination across scientific, governmental, and industrial domains. There was growth in the scientific and technical literature, while governments and institutions expanded their administrative records, accompanying new forms of bureaucratic surveillance and tracking. Businesses became increasingly reliant on structured data, for decision-making, to optimize workflows, to comply with regulatory frameworks, and so on.
In this way, the volume and variety of documents (again, broadly understood) continued to grow, as did the processes by which they were shared. It is in this period, as noted above, that Information Science emerged, as a response to the challenges of managing this abundance of information in various ways.
Through the sixties and later, there was a focus on the structural changes brought about by “knowledge-intensive production and a post-industrial array of goods and services.” (Lash) This work was informed by the empirical work of Porat and Machlup and others and was given influential synthetic expression as the ‘information society’ or the ‘post-industrial society’ in the work of Daniel Bell. Peter Drucker also popularized the concepts of ‘knowledge work’ and ‘knowledge worker’ during this period.
In successive decades, digital information systems emerged – chemical information, health and legal systems, early knowledge management, and so on.
In this period also there was some modest social institutionalization of Information Science and related areas. The American Documentation Institute (ADI) was founded in 1937, becoming the American Society for Information Science (ASIS) in 1968. It acquired a final ‘t’ for technology in 2000, and finally became the Association for Information Science and Technology in 2013. The Institute of Information Scientists was formed in the UK in 1968, and merged with the Library Association in 2002 to form CILIP. IFIP (International Federation for Information Processing) was founded in 1960.
The evolving informational age
Then it looks at how the logic of information flows reterritorialize into new formations of the brand, the platform, the standard, intellectual property and the network. […] The primary qualities of information are flow, disembeddedness, spatial compression, temporal compression, real-time relations. It is not exclusively, but mainly, in this sense that we live in an information age. Lash, Scott (2002). Critique of information.
In his highly influential three volume The Information Age, Castells talks of a network society and an informational society. I will use this to frame an introduction to the current period, dating from the late 20th Century.
Contributory factors in this period are the accelerated evolution of communications and computational capacity which has provided the material base for a range of other developments. These include the restructuring and intensification of capitalism brought about by deregulation, privatization, global extension and geopolitical changes; the network flows of money, data and people which have changed how we think about the boundaries of organizations, nations and personal relations; and the ongoing transformation of the media, personal communications, and the means of shaping public opinion.
This environment rests on complex network systems, aggregations of data, and applications which communicate via protocols and APIs. This material base has co-evolved with social organization. For Castells (as for Lash and Harari), a key feature is the organizing power of networks, throughout all aspects of what we do.
For example, network effects have led to several dominant platforms that articulate much of our social, cultural and business activities (the Amazoogle phenomenon). Retail, music and entertainment were transformed. The flow of materials is monitored by tracking systems, and is articulated in complex just-in-time supply chains; mobile communications and mapping services have changed our sense of mobility and delivery; distribution chains, the disposition of goods around retail floors, investment decisions, variable real-time pricing, and many other taken for granted aspects of what we do are driven by the collection, exchange and analysis of data.
From a functional point of view, varieties of ‘informationalisation’ are visible at all levels in everyday life: doors open automatically; material money is disappearing; advanced instrumentation for observation and analysis is common in the sciences. Increasingly, our activities yield up data which influences what products are offered to us, the news we see, and so on.
Something as apparently simple as the selfie has interacted with behaviors to affect mental wellness, the travel industry and communication.
It is this sense of qualitative change that prompts Manuel Castells to pose the distinction between the ‘information society’ and the ‘informational society.’ We are now seeing an intensification of some of the informational trends he observes as AI becomes more common.
Castells discusses how the network has facilitated broad coordination of interests, in social movements, popular uprisings, or organized crime, for example. It extends to a global scale, where a network of megacities channels power and innovation. He suggests that there is at once a global integration facilitated by the network, but, at the same time, a growing fragmentation between those connected to the network circuits of power and prosperity and those not connected.
More recently, we also see a counter force to global integration. Rather than frictionless global information flow, we are seeing regimes forming around power blocs, with different policy, control and regulatory regimes. Think of the US, EU, Russia and China. There has also been some argument that unequal participation in the ‘knowledge economy’ is a factor in emerging political polarization.
Given this general importance, there has also been an interesting and unsurprising informational development in modern theory. This has come into our field most clearly perhaps in Jürgen Habermas’ concept of the public sphere, but think also, for example, of Anthony Giddens’ concept of ‘reflexive modernity’ or Ulrich Beck’s ‘risk society.’
For Giddens, ‘reflexive modernity’ entails “... the reflexive ordering and reordering of social relations in the light of continual inputs of knowledge affecting the actions of individuals and groups” so that “production of systematic knowledge about social life becomes integral to system reproduction.” Modern life rests on the dynamic reassessment of available information and expertise (in construction, engineering, medicine, technology, …), which builds on the accumulated record of science and technology.
I have chosen to reference the social sciences here, as informationalization and social and cultural change are intricately linked. It is also notable how little reference there is to classical information science in this discussion.
The generative turn: the apotheosis of the document
Instead, these AI systems are what we might call cultural technologies, like writing, print, libraries, internet search engines or even language itself. They are new techniques for passing on information from one group of people to another. Asking whether GPT-3 or LaMDA is intelligent or knows about the world is like asking whether the University of California’s library is intelligent or whether a Google search “knows” the answer to your questions. But cultural technologies can be extremely powerful—for good or ill. Alison Gopnik, 2022.
The current form of generative AI emerged as late as 2023. I find Alison Gopnik’s characterization of it as a cultural technology helpful. She places it in a historical context as the latest technique for passing information from one group of people to another, again considering information very broadly. I was interested to see her place libraries in this frame as well.
Effectively, large language models are statistical models derived from vast accumulations of documentary representations of knowledge. In the context of the narrative presented here, the volume and variety of documents is now so great that they are treated as a proxy for knowledge. Proponents of intelligent AI argue that the models, working with both the broad accumulated representation of knowledge in the training collections, and with massive compute, can find a way to not only summarize and generalize from the content of those documents, but also to replicate the minds that created them.
I tend toward Gopnik’s skepticism on this question (see Yiu, E., Kosoy, E., & Gopnik, A. (2024) for an extended argument).
Nevertheless, the processing powers of the models make them very effective for some purposes, and the agentic and applications infrastructure being built on top of them promise to make them more so. We do not know yet whether and where developments will plateau, or how adoption varies by tolerance for hallucinations,[1] or where the impact will be most felt.
However, given the key role of documents (information) in managing complex organizations and interactions, some see the reach of AI as extensive. In this way, the informational, reflexive, networked nature of social life is potentially further intensified.
This extensive informationalization is why Harari, for example, is concerned about the potential reach and impact of AI, as the systemic processability of the connective informational tissue of organizations and systems, he argues, renders them vulnerable to manipulation.
Of course, the ramifications of AI for libraries and for iSchools are accordingly significant. It intensifies some of the trends we have observed, and -- as with other activities at scale -- has both constructive and problematic elements (to use a phrase of Barrett and Orlikowski’s).
If we think of the informational disciplines having a special interest in recorded information, some immediate issues arise.
Cultural synthesizers. Synthesized content and context add a new dimension and challenge.
Iterative and chained interaction. We will interact differently with information objects or bodies of knowledge. Think of how larger publishers or aggregators will provide access to the scholarly literature, for example.
Social confidence and trust. Our sense of authenticity, identity, authorship will all be redefined, creating issues of trust and verification.
Policy, law and practice will all evolve unevenly in concert.
Information critique
Our simplest actions or interactions now entail complex informational networks and platforms. Think of what is involved in just texting, sending an email or writing in cloud-based Office 365, whatever about group document preparation, remote experiments, or mapping activity.
Day-to-day behaviors yield up data which is aggregated at scale and used in various ways to monitor, sell or advertise. Large companies have built vast consolidated infrastructure - we are used to thinking of information as immaterial, but AI has also emphasized how the cloud has boots of concrete. These companies also wield great cultural and economic power - Spotify does our listening for us, Amazon holds sway over merchants on its site.
These social and cultural ramifications mean that undesirable effects are visible and urgent. Addressing these has become an ongoing research, education and advocacy role for the informational disciplines, among others. There is also greater historical sensitivity, an alertness to the ways in which experiences, memories and knowledges may have been suppressed, distorted or invisible (see Benedict Anderson’s classic discussion of museums, maps and other resources in the emergence of nationalism, for example).
Here is a non-exhaustive list of information issues.
Inequity. Given the centrality of the network and digital resources, differential access creates inequities.
Surveillance. There is an increase in direct surveillance and also increased collection of data which drives other aspects of our environment. We are generating data shadows which are operationalized in various ways to influence or inform.
Market concentration. The winner-takes-all dynamic of network services has resulted in the dominance of several platforms that wield great economic power and influence.
Dominant or partial perspectives. Perspectives which are historically dominant, or politically motivated, or which reflect imbalances of power and influence may be over-represented in any resource. The plurality of experiences, memories and knowledges is under-represented in the record.
Dis- and misinformation/’degraded democratic publics’. Our reliance on flows of information has led to concerted attempts to distort, mislead or defraud. Henry Farrell has argued that there is a more fundamental problem, which is “not that social media misinforms individuals about what is true or untrue but that it creates publics with malformed collective understandings.” (Farrell)
Geopolitical fragmentation. Rather than a global information flow, as noted above, we are seeing regimes forming around power blocs, with different policy, control and regulatory regimes. Think of the US, EU, Russia and China.
Information today – systems of information
I kept this section in for completeness, although its references to 'later' etc. are to the original document [pdf].
Information, then, is at once fugitive and everywhere, chameleon-like.
I noted above how libraries, Information Science, and the iSchool emerged in different phases of the information evolution. And in some ways, they reflect elements of when they emerged.
The library is different from the other two, in that I talk about the library itself as an organization, rather than as a body of knowledge or techniques. Libraries emerged in the first phase described above. Historically, the collection was the core of library identity, as an organized response to accessible distribution in a print world, and to the preservation of knowledge. The library continues as an organized response by cities, universities and others to learning, research and equity of access to the means of creative production. In this way the scope has moved beyond the collection in various ways, as discussed further below. As an organization, the library benefits from education and research in information management topics, but also across a range of other topics (public policy, for example).
Information science emerged, in the second phase, in the mid-20th Century. In this narrative, its origin story splits from the library as information production grows, requiring new methods of organization and access, and it anticipates elements of today’s information environment. As discussed further below, common to definitions of Information Science is a concern with documents (or recorded information, literatures, and similar).
Although it doesn’t make sense to lean on it too heavily, one might say that Information Science largely retains an information view of the world, concerned with access to and management of information as a thing (in Buckland’s terms).
In the third phase, information is not only seen as something to be managed or discovered, but as an organizing element of social structure and interaction. It has become an object of study across many disciplines and in social and cultural analysis.
The iSchool has emerged in this third phase and it typically embraces a broad set of informational interests. In some ways it subsumes Information Science interests in a very broad view of information in the modern world.
A large part of the typical iSchool portfolio is information systems oriented, at undergraduate and graduate levels, meeting needs for workers with technology, business and social skills. It may be a more applied alternative to computer science. It may also encompass expertise in other informational fields (policy, philosophical/social/cultural, data science, HCI, digital humanities, and so on). The broad disciplinary spread potentially encompasses social and philosophical perspectives, and very often a strong information-critique emphasis.
Borrowing a suggestive phrase from an article by Black and Schiller (2014), one could say that the iSchool is often interested in both information systems and systems of information, ‘systems that create information through social means.’
Of course, the iSchool is not a discipline – it is an evolving academic structure, although, as I have noted, it may be associated with a broad view of Information Science (understood generically, not in the classical sense) or Information Sciences. Informatics, a term which emerged in the 1960s (often paired with a qualifier, as in health or social informatics), may also feature.
The focus and disciplinary spread varies across schools.
References
Anderson, B. (2016). Imagined communities: reflections on the origin and spread of nationalism (revised edition). Verso.
Barrett, M., & Orlikowski, W. (2021). Scale matters: doing practice-based studies of contemporary digital phenomena. MIS Quarterly, 45(1). https://doi.org/10.25300/misq/2021/15434.1.3
Bates, M. J. (2010). Information. In M. J. Bates & M. N. Maack (Eds.), Encyclopedia of Library and Information Sciences, 3rd Ed. (Vol. 3). CRC Press. https://pages.gseis.ucla.edu/faculty/bates/articles/information.html
Beck, U., & Ritter, M. (1992). Risk society: towards a new modernity. Sage Publications.
Bell, D. (1976). The coming of post-industrial society: a venture in social forecasting. Basic Books.
Bennett, T., Grossberg, L., & Morris, M. (Eds.). (2005). New Keywords: A Revised Vocabulary of Culture and Society. Wiley-Blackwell.
Black, A., & Schiller, D. (2014). Systems of information: The long view. Library Trends, 63(3), 628–662. https://hdl.handle.net/2142/89724
Buckland, M. (2012). What kind of science can information science be? Journal of the American Society for Information Science and Technology, 63(1), 1–7. https://doi.org/10.1002/asi.21656
Buckland, M. (2017). Information and society. The MIT Press.
Buckland, M. (2024). The Berkeley School of Information: A Memoir. https://escholarship.org/uc/item/79v080z7
Castells, M. (2009). The Rise of the Network Society, With a New Preface: the Information Age: Economy, Society, and Culture Volume I (2nd ed.). Hoboken, N.J.: John Wiley & Sons, Ltd.
Dinneen, J. D., & Bauner, C. (2017). Information-not-thing: further problems with and alternatives to the belief that information is physical. CAIS/ACSI '17: Proceedings of the Annual Conference of the Canadian Association for Information Science / l’Association Canadienne Des Sciences de l’Information. https://philarchive.org/archive/DINIFP
Harari, Y. N. (2024). Nexus: a brief history of information networks from the Stone Age to AI. Random House, an imprint and division of Penguin Random House LLC.
Rayward, W. Boyd. (2014). Information Revolutions, the Information Society, and the Future of the History of Information Science. Library Trends, 62(3), 681–713. https://doi.org/10.1353/lib.2014.0001
Williams, R. (2014). Keywords (New Edition). Oxford University Press.
Yiu, E., Kosoy, E., & Gopnik, A. (2024). Transmission Versus Truth, Imitation Versus Innovation: What Children Can Do That Large Language and Language-and-Vision Models Cannot (Yet). Perspectives on Psychological Science, 19(5), 874–883. https://doi.org/10.1177/17456916231201401
Note: I took the feature image in Cambridge.
[1] It is a pity that ‘hallucination’ has become the term used here, as it gives a misleading sense of how the LLMs work.
This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices.
The Conversatory "Open Data and New Technologies to Face the Polycrisis", held on 19 March in the city of Xalapa, Veracruz, Mexico, brought together teachers and researchers from different fields with a common characteristic: working with open data. The event took place at the Faculty of Statistics and Informatics (FEI) of the Universidad Veracruzana (UV). It was attended by professors and students from the Statistics, Data Science Engineering and Computer Science courses offered at the same institution.
The event was inaugurated by Dr Minerva Reyes Felix, FEI’s Academic Secretary, and organised by MCD Lorena López Lozada, leader of the Open Data Project, now in its eighth edition.
Reliable open data, useful for managing the polycrisis
Lorena López Lozada explained that “the objective of the event is to share, from different perspectives, the current situation where several global crises are happening at the same time and their combined impact is greater than their individual impact, the so-called polycrisis, and how the use of open data allows us to visualise the social context and support decision-making based on reliable information”.
Each of the panelists shared enriching opinions, as their different fields of study allowed for a multidisciplinary discussion. The debate was chaired by MGC José Fabián Muñoz and was divided into three main questions:
What has been your experience in working with open data?
What kind of open data can be used to address the polycrisis?
What do you recommend for the generation of open data regarding infrastructure, security, reliability, among others?
Experience in working with open data
Nicandro Cruz Ramírez, a professor at the Artificial Intelligence Research Institute (IIIA) of the Universidad Veracruzana, began this round of questions. He mentioned that he was fortunate to have open data and to be able to work with it. One example was ecological data, which can be used to analyse climate change and, in this case, the loss of ecological integrity of top predators (large and strong species in the area such as bears, pumas and lions), and to create artificial intelligence models that identify the causes of this problem and assess the impact of future construction projects in biodiverse areas.
In this conversation, Angel Fernando Argüello Ortíz, a professor at FEI, shared examples of how census information has been key to the implementation of projects and the difficulties he has encountered when this information is not freely available. He also expressed that information is power, but that it must be used correctly, as “all power carries a great responsibility, because sometimes it generates benefits, but sometimes it generates disadvantages, so it is important to use it correctly”.
Types of open data to address the polycrisis
Another participant in the discussion was Agustín Fernández Eguiarte, a researcher at the Institute of Atmospheric Sciences and Climate Change at the National Autonomous University of Mexico (UNAM), who is on a research visit to the UV’s Centre for Earth Sciences (CCT). He mentioned that “it is necessary to have climate, climate change, hydro-meteorological and environmental data, but structured as open fair data in unconditional standards or metadata and integrated in data repositories”. As examples, he has developed an open data repository on climate change and tropical cyclones in the state of Veracruz, which includes an interactive visualisation in addition to the databases; and an open data repository on the Pico de Orizaba.
The information from both will be made available on the Internet for those interested in the topics, “because to face the polycrisis we need reliable data and information, with quality control, to support any policy, action or programme to address climate change,” said the researcher.
Ángel Juan Sánchez García, a lecturer at FEI, said that although it is impossible to solve all the world's problems with data, it is possible to get an idea of the problems that are occurring at a global level. He mentioned that in Mexico some sectors do not have a culture of generating data, which makes it difficult to analyse their own areas. This could be linked to human factors (lack of empathy for sharing data) and resistance to analysis due to possible social alarm (touching on real but sensitive issues).
Recommendations for open data production
To conclude this series of questions, each of the participants shared their recommendations for those using or generating open data. We can divide their opinions into five sections:
Create a culture of information management. Knowing how to gather information based on project objectives and being willing to share that information and allow others to access and consult it.
Collaboration. Pupils and students are encouraged to work in teams. Projects are more practical and easier to carry out when a multidisciplinary team works together.
Technological infrastructure. Ensure the quality, security, reliability and other attributes necessary to work with open data, as a lack of these will limit its analysis.
Working with other types of data. As Ángel Juan Sánchez García mentions, “we need to venture into other types of non-conventional data”, because in many cases, there is a very rigid idea of what a database is, which limits the variety of analyses that can be carried out.
Legislation. One of the participants mentioned the need for stronger legislation on the management and collection of information, even though some legislation already exists. In the case of Mexico, the General Directorate of Access Policies of the Secretariat for Access to Information of the National Institute for Transparency, Access to Information and Protection of Personal Data (INAI) prepared a document in 2023, which was presented as a “draft” to the Technical Group of the National Open Data Policy. It was subsequently presented in the various feedback spaces involved in the construction of this public policy, proposed within the framework of the Open Mexico Strategy and its corresponding methodology. Its preparation took as a reference the Minimum Criteria and Methodology for the Design and Documentation of Policies for Access to Information, Proactive Transparency and Open Government, approved by the National System of Transparency, Access to Public Information and Protection of Personal Data and published in the Official Gazette of the Federation on 23 November 2017. However, this policy has not come to fruition, as INAI has disappeared under the current government administration.
Despite the existence of the Open Data Charter and its principles, governments are not yet committed to having open data in all possible areas of information sharing with citizens.
With the support of the Open Knowledge Foundation and Datopian in developing the Conversatory, the academic event marked a significant step towards strengthening the promotion and use of open data. It also fostered exchange between students, teachers and researchers in a healthy environment, contributing to the integral development of the students.
About Open Data Day
Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities. ODD is led by the Open Knowledge Foundation (OKFN) and the Open Knowledge Network.
As a way to increase the representation of different cultures, since 2023 we have offered organisations the opportunity to host an Open Data Day event on the date that best suits them within a one-week window. In 2025, a total of 189 events happened all over the world between March 1st and 7th, in 57 countries, using 15+ different languages. All outputs are open for everyone to use and re-use.
In seventh grade, Miss Phillips had me memorize "Paul Revere's Ride" by Henry Wadsworth Longfellow. So I did. After finishing "Jabberwocky" to start off the year of run naming, it seemed obvious what my next effort would be. I calculated that I could arrange to end it on the day of the Boston Marathon, thus neatly tying the verse with the running. And to top it off, the "18th of April" cited in the poem was exactly 250 years ago on Friday.
"Paul Revere's Ride" was first published in The Atlantic Monthly in 1861.
On looking up the poem, also titled "The Landlord's Tale", I discovered its political undertones. It was written in the leadup to the Civil War, and Longfellow had been an outspoken abolitionist. The poem was a call to action to Northerners, recalling their role in the American Revolution. So not irrelevant to the current situation.
In January, for a Daily Mail article, Miriam Kuepper interviewed Salomé Balthus, a "high-end escort and author from Berlin" who works the World Economic Forum. Balthus reported attitudes that clarify why "3C Here We Come" is more likely. The article's full title is:
What the global elite reveal to Davos sex workers: High-class escort spills the beans on what happens behind closed doors - and how wealthy 'know the world is doomed, so may as well go out with a bang'
Below the fold I look into a wide range of evidence that Balthus' clients were telling her the truth.
Kuepper quotes Balthus:
'The elephant in the room is climate change. Everyone knows it can't be prevented any more,' she said, adding that the 'super rich' could generally be split into two groups on the topic.
'The one group thinks it only affects the poor, the "not-white race", while the others fear that it could get worse but there's no sense in trying to do anything about it so they just enjoy themselves,' she told MailOnline.
'The one half is in despair and the other, dumber, half is celebrating future mass deaths.
Salomé elaborated that some of the uber-wealthy people fitting into the first group were saying that those in third world countries 'might all die but us in the North, we're fine'.
She said: 'They say that in a democracy you have to sell it, to lie to people and tell them "we didn't know better and didn't think it would get this bad", not admitting that they know.
'Then there's the other group that thinks it might not be so easy, maybe it will also affect us due to unforeseeable chain reactions.
'But they say they can't do anything against the others so they live following the mantra "after us, the deluge".
'They say they will enjoy a few more nice years on earth and know that there's no future. They are very cynical and somehow deeply sad.'
This attitude matches Schmidt's fatalism — we're doomed but we might as well make money/have fun until then. What it misses is that everything they're doing is bringing "until then" closer. As I wrote about Schmidt:
He is right that “we’re not going to hit the climate goals anyway", but that is partly his fault. Even assuming that he's right and AI is capable of magically "solving the problem", the magic solution won't be in place until long after 2027, which is when at the current rate we will pass 1.5C. And everything that the tech giants are doing right now is moving the 1.5C date closer.
Economic models have systematically underestimated how global heating will affect people’s wealth, according to a new study that finds 4C warming will make the average person 40% poorer – an almost four-fold increase on some estimates.
The reason for the underestimation is that their model of the effect of climate change on a country's GDP accounts only for in-country effects. Reconsidering the macroeconomic damage of severe warming by Timothy Neal et al from the University of New South Wales instead models global weather's effects on a world of interconnected supply chains:
Figure 1 shows the projected percentage reduction in global GDP from a high emissions future (SSP5-8.5) relative to a lower emissions future (SSP1-2.6), for the three models outlined in section 2.2. Each economic model is run with and without global weather to determine its impact on the projections. Without the inclusion of global weather (blue line), all three models project more mild economic losses with a median loss at 2100 of −28% for the Burke15 model, −4% for the Kahn21 model, and −11% for the Kotz24 model. Projected losses from the latter two models are more optimistic than in the original articles, likely due to the variations in data and exact assumptions made.
The study by Australian scientists suggests average per person GDP across the globe will be reduced by 16% even if warming is kept to 2C above pre-industrial levels. This is a much greater reduction than previous estimates, which found the reduction would be 1.4%.
Today, the wealthiest middle-aged and older adults in the U.S. have roughly the same likelihood of dying over a 12-year period as the poorest adults in northern and western Europe, according to a study published Wednesday in The New England Journal of Medicine.
Heat waves, wildfires, floods, tropical storms and hurricanes are all increasing in scale, frequency and intensity, and the World Health Organization forecasts that climate change will cause 250,000 additional deaths each year by the end of this decade from undernutrition, malaria, diarrhea and heat stress alone. Even so, the impact on human health and the body count attributed to extreme weather remain massively underreported — resulting in a damaging feedback loop of policy inaction. Meanwhile, the very people who might fix that problem, at least in the US, are being fired en masse amid the Trump administration’s war on science.
But we know countries aren't going to "hit short-term and long-term climate targets" because, among other reasons, it would prevent us achieving the wonderful benefits of AI such as generated images in the style of Studio Ghibli.
Owing to “recent setbacks to global decarbonization efforts,” Morgan Stanley analysts wrote in a research report last month, they “now expect a 3°C world.” The “baseline” scenario that JP Morgan Chase uses to assess its own transition risk—essentially, the economic impact that decarbonization could have on its high-carbon investments—similarly “assumes that no additional emissions reduction policies are implemented by governments” and that the world could reach “3°C or more of warming” by 2100. The Climate Realism Initiative launched on Monday by the Council on Foreign Relations similarly presumes that the world is likely on track to warm on average by three degrees or more this century. The essay announcing the initiative calls the prospect of reaching net-zero global emissions by 2050 “utterly implausible.”
...
Bleak as warming projections are, a planet where governments and businesses fight to the death for their own profitable share of a hotter, more chaotic planet is bleaker still. The only thing worse than a right wing that doesn’t take climate change seriously might be one that does, and can muster support from both sides of the aisle to put “America First” in a warming, warring world.
Of course, we might as well "enjoy a few more nice years on earth", because the 70-year-old Schmidt (and I) will be dead long before 2100. Our grandchildren will just have to figure something out. In the meantime we need to make as much money as possible so the grandchildren can afford their bunkers.
Let's look at some of the actions of the global elite instead of the words they use when not talking to their escorts, starting with Nvidia, the picks-and-shovels provider to the AI boom.
Nvidia's path forward is clear: its compute platforms are only going to get bigger, denser, hotter and more power hungry from here on out. As a calorie deprived Huang put it during his press Q&A last week, the practical limit for a rack is however much power you can feed it.
"A datacenter is now 250 megawatts. That's kind of the limit per rack. I think the rest of it is just details," Huang said. "If you said that a datacenter is a gigawatt, and I would say a gigawatt per rack sounds like a good limit."
The NVL72 is a rackscale design inspired heavily by the hyperscalers with DC bus bars, power sleds, and networking out the front. And at 120kW of liquid cooled compute, deploying more than a few of these things in existing facilities gets problematic in a hurry. And this is only going to get even more difficult once Nvidia's 600kW monster racks make their debut in late 2027.
This is where those "AI factories" Huang keeps rattling on about come into play — purpose built datacenters designed in collaboration with partners like Schneider Electric to cope with the power and thermal demands of AI.
So Nvidia plans to increase the power draw per rack by 10x. The funds to build the "AI factories" to house them are being raised right now as David Gerard reports in a16z raising fresh $20b to keep the AI bubble pumping:
Venture capital firm Andreessen Horowitz, affectionately known as a16z, is looking for investors to put a fresh $20 billion into AI startups. [Reuters]
For perspective, that’s more than all US venture capital funding in the first three months of 2025, which was $17 billion. [PitchBook]
This means that a16z think there’s at least this much money sloshing about, sufficiently desperate for a return.
PitchBook says a16z is talking up its links to the Trump administration to try to recruit investors — the pitch is to get on the inside of the trading! This may imply a less than robustly and strictly rule-of-law investment environment.
The owners of a recently demolished coal-fired power plant announced the site will become a data center powered by the largest natural gas plant in the country.
The Homer City Generating Station in Indiana County was decommissioned in 2023 and parts of it were imploded last month. It had been at one time the largest coal-fired power plant in Pennsylvania.
The plant’s owners, Homer City Redevelopment, announced the site will become a 3,200-acre data center campus for artificial intelligence and other computing needs
The "largest natural gas plant in the country" will be pumping out carbon dioxide for its predicted service life of 75 years, into the 3C period of 2100.
Taken together, the measures represent a sweeping attempt to ensure coal remains part of the US electricity mix, despite its higher greenhouse gas emissions and frequently greater cost when compared to natural gas or solar power.
The effort also underscores Trump’s commitment to tapping America’s coal resources as a source of both electricity to run data centers and heat to forge steel. The president and administration officials have made clear boosting coal-fired power is a priority, one they see as intertwined with national security and the US standing in a global competition to dominate the artificial intelligence industry.

Amazon, Microsoft and Google are operating data centres that use vast amounts of water in some of the world’s driest areas and are building many more, an investigation by SourceMaterial and The Guardian has found.
With US President Donald Trump pledging to support them, the three technology giants are planning hundreds of data centres in the US and across the globe, with a potentially huge impact on populations already living with water scarcity.
“The question of water is going to become crucial,” said Lorena Jaume-Palasí, founder of The Ethical Tech Society. “Resilience from a resource perspective is going to be very difficult for those communities.”
Efforts by Amazon, the world’s biggest online retailer, to mitigate its water use have sparked opposition from inside the company, SourceMaterial’s investigation found, with one of its own sustainability experts warning that its plans are “not ethical”.
Amazon’s three proposed data centres in Aragon, northern Spain—each next to an existing Amazon data centre—are licensed to use an estimated 755,720 cubic metres of water a year, enough to irrigate more than 200 hectares (500 acres) of corn, one of the region’s main crops.
In practice, the water usage will be even higher as that figure doesn’t take into account water used in generating electricity to power the new installations, said Aaron Wemhoff, an energy efficiency specialist at Villanova University in Pennsylvania.
Between them, Amazon’s planned Aragon data centres will use more electricity than the entire region currently consumes. Meanwhile, Amazon in December asked the regional government for permission to increase water consumption at its three existing data centres by 48 per cent.
Opponents have accused the company of being undemocratic by trying to rush through its application over the Christmas period. More water is needed because “climate change will lead to an increase in global temperatures and the frequency of extreme weather events, including heat waves”, Amazon wrote in its application.
Right. We need to use more water to cope with the "extreme weather events, including heat waves" we are causing, which will allow us to cause more "extreme weather events" which will mean we need more water! It is a vicious cycle.
Is there really a demand for these monsters? One of Nvidia's big customers is CoreWeave:
In my years writing this newsletter I have come across few companies as rotten as CoreWeave — an "AI cloud provider" that sells GPU compute to AI companies looking to run or train their models.
CoreWeave had intended to go public last week, with an initial valuation of $35bn. While it’s hardly a recognizable name — like, say, OpenAI, or Microsoft, or Nvidia — this company is worth observing, if not for the fact that it’s arguably the first major IPO that we’ve seen from the current generative AI hype bubble, and undoubtedly the biggest.
The initial public offering of AI infrastructure firm CoreWeave, initially targeting a $2.7bn raise at $47-55 per share, was slashed to $1.5bn at $40 per share. Even then, the deal barely limped across the finish line, thanks to a last-minute $250mn “anchor” order from Nvidia. The offering reportedly ended up with just three investors holding 50 per cent of the stock, and it seems to have required some stabilisation from lead bank Morgan Stanley to avoid a first-day drop. Hardly a textbook success.
Imagine a caravan maker. It sells caravans to a caravan park that only buys one type of caravan. The caravan park leases much of its land from another caravan park. The first caravan park has two big customers. One of the big customers is the caravan maker. The other big customer is the caravan maker’s biggest customer. The biggest customer of the second caravan park is the first caravan park.
This, more or less, is the line being taken by AI researchers in a recent survey. Asked whether "scaling up" current AI approaches could lead to achieving artificial general intelligence (AGI), or a general purpose AI that matches or surpasses human cognition, an overwhelming 76 percent of respondents said it was "unlikely" or "very unlikely" to succeed.
“The vast investments in scaling, unaccompanied by any comparable efforts to understand what was going on, always seemed to me to be misplaced,” Stuart Russell, a computer scientist at UC Berkeley who helped organize the report, told NewScientist. “I think that, about a year ago, it started to become obvious to everyone that the benefits of scaling in the conventional sense had plateaued.”
AI continues to improve – at least according to benchmarks. But the promised benefits have largely yet to materialize while models are increasing in size and becoming more computationally demanding, and greenhouse gas emissions from AI training continue to rise.
These are some of the takeaways from the AI Index Report 2025 [PDF], a lengthy and in-depth publication from Stanford University's Institute for Human-Centered AI (HAI) that covers development, investment, adoption, governance and even global attitudes towards artificial intelligence, giving a snapshot of the current state of play.
...
However, HAI highlights the enormous level of investment still being pumped into the sector, with global corporate AI investment reaching $252.3 billion in 2024, up 26 percent for the year. Most of this is in the US, which hit $109.1 billion, nearly 12 times higher than China's $9.3 billion and 24 times the UK's $4.5 billion, it says.
Despite all this investment, "most companies that report financial impacts from using AI within a business function estimate the benefits as being at low levels," the report writes.
It says that 49 percent of organizations using AI in service operations reported cost savings, followed by supply chain management (43 percent) and software engineering (41 percent), but in most cases, the cost savings are less than 10 percent.
When it comes to revenue gains, 71 percent of respondents using AI in marketing and sales reported gains, while 63 percent in supply chain management and 57 percent in service operations, but the most common level of revenue increase is less than 5 percent.
Meanwhile, despite the modest returns, the HAI report warns that the amount of compute used to train top-notch AI models is doubling approximately every 5 months, the size of datasets required for LLM training is doubling every eight months, and the energy consumed for training is doubling annually.
This is leading to rapidly increasing greenhouse gas emissions resulting from AI training, the report finds. It says that early AI models such as AlexNet over a decade ago caused only modest CO₂ emissions of 0.01 tons, while GPT-4 (2023) was responsible for emitting 5,184 tons, and Llama 3.1 405B (2024) pumping out 8,930 tons. This compares with about 18 tons of carbon a year the average American emits, it claims.
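For perspective, here is a quick back-of-the-envelope comparison using only the figures quoted above (the 18 tons per American per year is the report's own claim; the division is mine):

    # Rough comparison of reported AI training emissions against the
    # ~18 tons of CO2 the report says an average American emits per year.
    # All tonnages are the figures quoted from the HAI report above.
    training_emissions_tons = {
        "AlexNet (2012)": 0.01,
        "GPT-4 (2023)": 5_184,
        "Llama 3.1 405B (2024)": 8_930,
    }
    AVERAGE_AMERICAN_TONS_PER_YEAR = 18

    for model, tons in training_emissions_tons.items():
        american_years = tons / AVERAGE_AMERICAN_TONS_PER_YEAR
        print(f"{model}: {tons:,} t CO2 ~ {american_years:.3g} American-years of emissions")

On those numbers, training GPT-4 corresponds to roughly 290 years of an average American's emissions, and Llama 3.1 405B to roughly 500.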
The premise that AI could be indefinitely improved by scaling was always on shaky ground. Case in point, the tech sector's recent existential crisis precipitated by the Chinese startup DeepSeek, whose AI model could go toe-to-toe with the West's flagship, multibillion-dollar chatbots at purportedly a fraction of the training cost and power.
Of course, the writing had been on the wall before that. In November last year, reports indicated that OpenAI researchers discovered that the upcoming version of its GPT large language model displayed significantly less improvement, and in some cases, no improvements at all than previous versions did over their predecessors.
According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.
Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.
In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.
It may well turn out that people put more value on being right than being plausible.
Increasingly, there are other signs that the current, costly, proprietary AI approach is coming to an end. For example, we have Matt Asay's DeepSeek’s open source movement:
It’s increasingly common in AI circles to refer to the “DeepSeek moment,” but calling it a moment fundamentally misunderstands its significance. DeepSeek didn’t just have a moment. It’s now very much a movement, one that will frustrate all efforts to contain it. DeepSeek, and the open source AI ecosystem surrounding it, has rapidly evolved from a brief snapshot of technological brilliance into something much bigger—and much harder to stop. Tens of thousands of developers, from seasoned researchers to passionate hobbyists, are now working on enhancing, tuning, and extending these open source models in ways no centralized entity could manage alone.
For example, it’s perhaps not surprising that Hugging Face is actively attempting to reverse engineer and publicly disseminate DeepSeek’s R1 model. Hugging Face, while important, is just one company, just one platform. But Hugging Face has attracted hundreds of thousands of developers who actively contribute to, adapt, and build on open source models, driving AI innovation at a speed and scale unmatched even by the most agile corporate labs.
Now, researchers at Microsoft's General Artificial Intelligence group have released a new neural network model that works with just three distinct weight values: -1, 0, or 1. Building on top of previous work Microsoft Research published in 2023, the new model's "ternary" architecture reduces overall complexity and [offers] "substantial advantages in computational efficiency," the researchers write, allowing it to run effectively on a simple desktop CPU. And despite the massive reduction in weight precision, the researchers claim that the model "can achieve performance comparable to leading open-weight, full-precision models of similar size across a wide range of tasks."
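To make "ternary" concrete: each weight is stored as -1, 0 or +1 plus a shared scale factor, so the expensive multiplies in a layer largely reduce to additions and subtractions. Here is a minimal sketch of that idea; the function names and the mean-absolute-value scaling are my own illustration, not Microsoft's actual code:

    import numpy as np

    def ternarize(weights: np.ndarray, eps: float = 1e-8):
        """Quantize a float weight tensor to {-1, 0, +1} plus one scale factor.

        A rough illustration of the idea behind 'ternary' models: each weight
        becomes -1, 0 or +1, and a single floating-point scale preserves the
        overall magnitude. Not Microsoft's implementation.
        """
        scale = np.abs(weights).mean() + eps                  # per-tensor scale
        ternary = np.clip(np.round(weights / scale), -1, 1)   # values in {-1, 0, 1}
        return ternary.astype(np.int8), scale

    def ternary_matmul(x: np.ndarray, ternary: np.ndarray, scale: float):
        """Matrix multiply with ternary weights: sign-only weights, one scale."""
        return (x @ ternary) * scale

    # Toy usage: quantize a random layer and compare against full precision.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)
    x = rng.normal(size=(1, 64)).astype(np.float32)
    t, s = ternarize(w)
    print("max abs error:", np.abs(x @ w - ternary_matmul(x, t, s)).max())

In a real implementation the ternary weights would be bit-packed and the inner-loop multiplies replaced by adds and subtracts; the numpy sketch just shows that the only accuracy loss comes from the rounding step.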
ChatGPT came out in 2022, and the Chinese government declared AI infrastructure a national priority. Over 500 new data centres were announced in 2023 and 2024. Private investors went all-in.
Demand for the data centres turns out not to be there. Around 80% are not actually in use. [MIT Technology Review]
The business model was to rent GPUs. DeepSeek knifed that, much as it did OpenAI. There’s now a lot of cheap GPU in China. Data centre projects are having trouble finding new investment.
The Chinese data centre boom was a real estate deal — many investors pivoted straight from real estate to AI.
Having lived through the early days of the internet frenzy, Fabrice Coquio, senior veep at Digital Realty, which bills itself as the world's largest provider of cloud and carrier-neutral datacenter, colocation and interconnection services, is perhaps better placed than most to venture an opinion. Is there a bubble?
"I have been in this industry for 25 years, so I've seen some ups and downs. At the moment, definitely that's on the very bullish side, particularly because of what people believe will be required for AI," he tells The Register.
Grabbing a box of Kleenex tissues, he quips that back at the turn of the millennium, if investors were told the internet was inside they would have rushed to buy it. "Today I am telling you there is AI inside. So buy it."
"Is there a bubble? Potentially? I see the risk, because when some of the traditional investments in real estate – like housing, logistics and so on – are not that important, people are looking to invest their amazing capacity of available funds in new segments, and they say, 'Oh, why not datacenters?'"
He adds: "In the UK, in France, in Germany, you've got people coming from nowhere having no experiences… that have no idea about what AI and datacenters are really and still investing in them.
"It's the expression of a typical bubble. At the same time, is the driver of AI a big thing? Yes… [with] AI [there] is a sense of incredible productivity for companies and then for individuals. And this might change drastically the way we work, we operate, and we deliver something in a more efficient way.
The "slow AIs" that run the major AI companies hallucinated a future where scaling continued to work and have already sunk vast sums into data centers. The "slow AIs" can't be wrong:
Nonetheless, if Microsoft's commitment to still spending tens of billions of dollars on data centers is any indication, brute force scaling is still going to be the favored MO for the titans of the industry — while it'll be left to the scrappier startups to scrounge for ways to do more with less.
The IEA’s models project that data centres will use 945 terawatt-hours (TWh) in 2030, roughly equivalent to the current annual electricity consumption of Japan. By comparison, data centres consumed 415 TWh in 2024, roughly 1.5% of the world’s total electricity consumption (see ‘Global electricity growth’).
The projections focus mostly on data centres in general, which also run computing tasks other than AI — although the agency estimated the proportion of data-centre servers devoted to AI. They found that such servers accounted for 24% of server electricity demand and 15% of total data-centre energy demand in 2024.
Alex de Vries, a researcher at the Free University of Amsterdam and the founder of Digiconomist, who was not involved with the report, thinks this is an underestimate. The report “is a bit vague when it comes to AI specifically”, he says.
Even with these uncertainties, “we should be mindful about how much energy is ultimately being consumed by all these data centres”, says de Vries. “Regardless of the exact number, we’re talking several percentage of our global electricity consumption.”
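For a sense of what the IEA projection implies, the jump from 415 TWh in 2024 to 945 TWh in 2030 works out to roughly 15% compound growth per year (a quick calculation using only the two figures quoted above):

    # Implied average annual growth in data-centre electricity use from the
    # IEA figures quoted above: 415 TWh in 2024 rising to 945 TWh in 2030.
    consumption_2024_twh = 415
    consumption_2030_twh = 945
    years = 2030 - 2024

    cagr = (consumption_2030_twh / consumption_2024_twh) ** (1 / years) - 1
    print(f"Implied compound annual growth rate: {cagr:.1%}")   # about 14.7% per year

That is roughly 15% compound growth sustained for six years, in a sector the report says already accounts for about 1.5% of the world's electricity.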
There are reasonable arguments to suggest that AI tools may eventually help reduce emissions, as the IEA report underscores. But what we know for sure is that they’re driving up energy demand and emissions today—especially in the regional pockets where data centers are clustering.
So far, these facilities, which generally run around the clock, are substantially powered through natural-gas turbines, which produce significant levels of planet-warming emissions. Electricity demands are rising so fast that developers are proposing to build new gas plants and convert retired coal plants to supply the buzzy industry.
If the data centers get built, they will add to carbon emissions and push us closer to 3C sooner. Of course, this investment in data centers needs to generate a return, but it may well turn out that the market isn't willing to pay enough for Ghibli-style memes to provide it. Ed Zitron has been hammering away at this point, for example in There Is No AI Revolution:
Putting aside the hype and bluster, OpenAI — as with all generative AI model developers — loses money on every single prompt and output. Its products do not scale like traditional software, in that the more users it gets, the more expensive its services are to run because its models are so compute-intensive.
For example, ChatGPT having 400 million weekly active users is not the same thing as a traditional app like Instagram or Facebook having that many users. The cost of serving a regular user of an app like Instagram is significantly smaller, because these are, effectively, websites with connecting APIs, images, videos and user interactions. These platforms aren’t innately compute-heavy, at least to the same extent as generative AI, and so you don’t require the same level of infrastructure to support the same amount of people.
Conversely, generative AI requires expensive-to-buy and expensive-to-run GPUs, both for inference and training the models themselves. The GPUs must be run at full tilt for both inference and training models, which shortens their lifespan, while also consuming ungodly amounts of energy. And surrounding that GPU is the rest of the computer, which is usually highly-specced, and thus, expensive.
OpenAI, as I've written before, is effectively the entire generative AI industry, with its nearest competitor being less than five percent of its 500 million weekly active users.
Ed Zitron has been arguing for more than a year that OpenAI's finances simply don't make sense, and in OpenAI Is A Systemic Risk To The Tech Industry he makes the case in exquisite detail and concludes:
Even in a hysterical bubble where everybody is agreeing that this is the future, OpenAI currently requires more money and more compute than is reasonable to acquire. Nobody has ever raised as much as OpenAI needs to, and based on the sheer amount of difficulty that SoftBank is having in raising the funds to meet the lower tranche ($10bn) of its commitment, it may simply not be possible for this company to continue.
Even with extremely preferential payment terms — months-long deferred payments, for example — at some point somebody is going to need to get paid.
I will give Sam Altman credit. He's found many partners to shoulder the burden of the rotten economics of OpenAI, with Microsoft, Oracle, Crusoe and CoreWeave handling the up-front costs of building the infrastructure, SoftBank finding the investors for its monstrous round, and the tech media mostly handling his marketing for him.
He is, however, over-leveraged. OpenAI has never been forced to stand on its own two feet or focus on efficiency, and I believe the constant enabling of its ugly, nonsensical burnrate has doomed this company. OpenAI has acted like it’ll always have more money and compute, and that people will always believe its bullshit, mostly because up until recently everybody has.
OpenAI cannot "make things cheaper" at this point, because the money has always been there to make things more expensive, as has the compute to make larger language models that burn billions of dollars a year. This company is not built to reduce its footprint in any way, nor is it built for a future in which it wouldn't have access to, as I've said before, infinite resources.
Zitron uses Lehman Brothers as an analogy for the effects of a potential OpenAI failure:
I can see OpenAI’s failure having a similar systemic effect. While there is a vast difference between OpenAI’s involvement in people’s lives compared to the millions of subprime loans issued to real people, the stock market’s dependence on the value of the Magnificent 7 stocks (Apple, Microsoft, Amazon, Alphabet, Meta, NVIDIA and Tesla), and in turn the Magnificent 7’s reliance on the stability of the AI boom narrative still threatens material harm to millions of people, and that’s before the ensuing layoffs.
One hint that we might just be stuck in a hype cycle is the proliferation of what you might call “second-order slop” or “slopaganda”: a tidal wave of newsletters and X threads expressing awe at every press release and product announcement to hoover up some of that sweet, sweet advertising cash.
That AI companies are actively patronising and fanning a cottage economy of self-described educators and influencers to bring in new customers suggests the emperor has no clothes (and six fingers).
There are an awful lot of AI newsletters out there, but the two which kept appearing in my X ads were Superhuman AI run by Zain Kahn, and Rowan Cheung’s The Rundown. Both claim to have more than a million subscribers — an impressive figure, given the FT as of February had 1.6mn subscribers across its newsletters.
If you actually read the AI newsletters, it becomes harder to see why anyone’s staying signed up. They offer a simulacrum of tech reporting, with deeper insights or scepticism stripped out and replaced with techno-euphoria. Often they resemble the kind of press release summaries ChatGPT could have written.
Yet AI companies apparently see enough upside to put money into these endeavours. In a 2023 interview, Zain claimed that advertising spots on Superhuman pull in “six figures a month”. It currently costs $1,899 for a 150-character write-up as a featured tool in the newsletter.
...
“These are basically content slop on the internet and adding very little upside on content value,” a data scientist at one of the Magnificent Seven told me. “It’s a new version of the Indian ‘news’ regurgitation portals which have gamified the SEO and SEM [search engine optimisation and marketing] playbook.”
But newsletters are only the cream of the crop of slopaganda. X now teems with AI influencers willing to promote AI products for minimal sums (the lowest pricing I got was $40 a retweet). Most appear to be from Bangladesh or India, with a smattering of accounts claiming to be based in Australia or Europe. In apparent contravention of X’s paid partnerships policy, none disclose when they’re getting paid to promote content.
...
In its own way, slopaganda exposes that the AI’s emblem is not the Shoggoth but the Ouroboros. It’s a circle of AI firms, VCs backing those firms, talking shops made up of employees of those firms, and the long tail is the hangers-on, content creators, newsletter writers and ‘marketing experts’ willing to say anything for cash.
The AI bubble bursting would be a whole different and much quicker "going out with a bang". How likely is it? To some extent OpenAI is just a front for Microsoft, which gets a slice of OpenAI's revenue, has access to OpenAI's technology, "owns" a slice of the "non-profit", and provides almost all of OpenAI's compute at discounted prices. Microsoft, therefore, has perhaps the best view of the generative AI industry and its prospects.
In February, stock analysts TD Cowen spotted that Microsoft had cancelled leases for new data centres — 200 megawatts in the US, and one gigawatt of planned leases around the world.
Microsoft denied everything. But TD Cowen kept investigating and found another two gigawatts of cancelled leases in the US and Europe. [Bloomberg, archive]
Bloomberg has now confirmed that Microsoft has halted new data centres in Indonesia, the UK, Australia and the US. [Bloomberg, archive]
The Cambridge, UK site was specifically designed to host Nvidia GPU clusters. Microsoft also pulled out of the new Docklands Data Centre in Canary Wharf, London.
In Wisconsin, US, Microsoft had already spent $262 million on construction — but then just pulled the plug.
Mustafa Suleyman of Microsoft told CNBC that instead of being “the absolute frontier,” Microsoft now prefers AI models that are “three to six months behind.” [CNBC]
Google has taken up some of Microsoft’s abandoned deals in Europe. OpenAI took over Microsoft’s contract with CoreWeave. [Reuters]
Ed Zitron covered this "pullback" a month ago in Power Cut:
As a result, based on TD Cowen's analysis, Microsoft has, through a combination of canceled leases, pullbacks on Statements of Qualifications, cancellations of land parcels and deliberate expiration of Letters of Intent, effectively abandoned data center expansion equivalent to over 14% of its current capacity.
...
In plain English, Microsoft, which arguably has more data than anybody else about the health of the generative AI industry and its potential for growth, has decided that it needs to dramatically slow down its expansion. Expansion which, to hammer the point home, is absolutely necessary for generative AI to continue evolving and expanding.
While there is a pullback in Microsoft's data center leasing, it’s seen a "commensurate rise in demand from Oracle related to The Stargate Project" — a relatively new partnership of "up to $500 billion" to build massive new data centers for AI, led by SoftBank and OpenAI, with investment from Oracle and MGX, a $100 billion investment fund backed by the United Arab Emirates.
The data centers will get built, and they will consume power, because even if AI never manages to turn a profit, some other bubble will take its place to use all the idle GPUs. Despite all the rhetoric about renewables and small modular reactors, much of the additional power will come from fossil fuels.
Data center carbon emissions don't just come from power (Scope 1 and 2). In 2023 Jialin Lyu et al from Microsoft published Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms and stressed the importance of embedded carbon (Scope 3) in the total environmental impact of data centers:
For example, 66% of the electricity used at Google datacenters was matched with renewable energy on an hourly basis in 2021. With historic growth rates, this is likely closer to 70% today. Our LCAs indicate that with 70-75% renewable energy, Scope 3 accounts for close to half of datacenter carbon emissions. Therefore, Scope 3 emissions and embodied carbon are important factors both currently and in the near future.
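The arithmetic behind that claim is simple: Scope 2 (purchased electricity) emissions shrink as the renewable fraction rises, while the embodied Scope 3 emissions stay fixed, so Scope 3's share of the total grows. A sketch with invented absolute tonnages (only the 70-75% renewable fractions come from the quote):

    # Hypothetical illustration of why Scope 3 (embodied) carbon approaches
    # half of a data center's footprint once ~70-75% of its electricity is
    # matched with renewables. The absolute numbers below are invented.
    GRID_SCOPE2_TONS = 100_000   # hypothetical Scope 2 emissions at 0% renewables
    SCOPE3_TONS = 30_000         # hypothetical embodied (Scope 3) emissions, fixed

    for renewable_fraction in (0.0, 0.5, 0.70, 0.75):
        scope2 = GRID_SCOPE2_TONS * (1 - renewable_fraction)
        share3 = SCOPE3_TONS / (scope2 + SCOPE3_TONS)
        print(f"{renewable_fraction:.0%} renewables -> Scope 3 is {share3:.0%} of total")

With these illustrative numbers, Scope 3 is under a quarter of the total on a fully fossil grid but reaches about half once 70% of the electricity is renewable, which is the pattern the Microsoft authors describe.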
The Redmond IT giant says that its CO2 emissions are up 29.1 percent from the 2020 baseline, and this is largely due to indirect emissions (Scope 3) from the construction and provisioning of more datacenters to meet customer demand for cloud services.
These figures come from Microsoft's 2024 Environmental Sustainability Report [PDF], which covers the corp's FY2023 ended June 30, 2023. This encompasses a period when Microsoft started ramping up AI support following the explosion of interest in OpenAI and ChatGPT.
Microsoft's "pullback" will likely reduce their Scope 3 emissions going forward, but I would expect that their recent build-out will have reduced the proportion of renewables being consumed. If the Stargate build-out goes ahead it will cause enormous Scope 3 emissions.
Here is one final note of gloom. Training AI models requires rapid access to large amounts of data, motivating data centers to use SSDs instead of hard drives. Counterintuitively, research Seagate published in Decarbonizing Data shows that, despite their smaller size, SSDs have much higher embedded carbon emissions than hard drives. 30TB of SSD has over 160 times as much embedded carbon as a 30TB hard drive.
Even more of a surprise, Seagate's research shows that SSDs even have higher operational emissions than hard drives. While actively reading or writing data, a 30TB SSD uses twice as much power as a 30TB hard drive.
The legacy of Balthus' clients' attitude that they 'know the world is doomed, so may as well go out with a bang' and of the unsustainable AI bubble will be a massive overbuild of data centers, most of which will be incapable of hosting Nvidia's top-of-the-line racks. If the current cryptocurrency-friendly administration succeeds in pumping Bitcoin back up, these data centers will likely revert to mining. Either way, the Scope 3 emissions from building and equipping them, and the Scope 1 and 2 emissions from powering them with natural gas and coal, will put megatons of CO2 into the atmosphere, hastening the point at which 'us in the North' are no longer 'fine'.
You can tell this is an extraordinary honor from the list of previous awardees, and the fact that it is the first time it has been awarded in successive years. Part of the award is the opportunity to make an extended presentation to open the meeting. Our talk was entitled Lessons From LOCKSS, and the abstract was:
Vicky and David will look back over their two decades with the LOCKSS Program. Vicky will focus on the Program's initial goals and how they evolved as the landscape of academic communication changed. David will focus on the Program's technology, how it evolved, and how this history reveals a set of seductive, persistent but impractical ideas.
Below the fold is the text with links to the sources, information that appeared on slides but was not spoken, and much additional information in footnotes.
Introduction (Vicky)
Original Logo
First, we are extremely grateful to the Paul Evan Peters award committee and CNI, and to the Association of Research Libraries, EDUCAUSE, Microsoft and Xerox who endowed the award.
David and I are honored and astonished by this award. Honored because it is the premiere award in the field, and astonished because we left the field more than seven years ago to take up our new full-time career as grandparents. The performance metrics are tough, but it’s a fabulous gig.
This talk will be mostly historical. David will discuss the technology's design and some lessons we learned deploying it. First I will talk about our goals when, more than a quarter-century ago, we walked into Michael Keller's office and pitched the idea that became the LOCKSS Program. Mike gave us three instructions:
Don't cost me any money.
Don't get me into trouble.
Do what you want.
Support
Ideas
Technology
Michael Lesk
Karen Hunter (CLOCKSS)
Petros Maniatis
Don Waters
James Jacobs (GovDocs)
TJ Giuli
Michael Keller
Martin Halbert, Katherine Skinner (1st PLN)
Mema Roussopoulos
Brewster Kahle
Clifford Lynch
Mary Baker
Jim Mitchell
Jefferson Bailey
Mark Seiden
John Sack
Gordon Tibbits
LOCKSS team
Susan Horsfall
We succeeded in each of these. That the program is still going is thanks to many people. Snowden Becker who is here today represents a much larger team who work tirelessly to sustain the program. Many others helped along the way. Michael Lesk then at NSF and Donald Waters then at the Mellon Foundation provided essential funding. This slide attempts to thank everyone, but we're sure we've left people out — it was a long time ago.
Let's get started. Over the centuries libraries developed a dual role. By building collections they provided current readers with access to information. Then they exercised stewardship over these collections to safeguard future readers' access.
Libraries transitioned from the print to the digital world over a couple of decades. In the mid-1980s the Library of Congress experimented with readers accessing journals on 12-inch optical media.
In late 1989 Tim Berners-Lee's first Web browser accessed a page from his first Web server.
Two years later the Stanford Linear Accelerator Center put the first US Web page online and people started thinking about how this new publishing medium would impact the academy. An early effort came in 1993 when Cliff Lynch wrote a 105-page report for the federal Office of Technology Assessment.
Now, consider a library acquiring information in an electronic format. Such information is almost never, today, sold to a library (under the doctrine of first sale); rather, it is licensed to the library that acquires it, with the terms under which the acquiring library can utilize the information defined by a contract typically far more restrictive than copyright law. The licensing contract typically includes statements that define the user community permitted to utilize the electronic information as well as terms that define the specific uses that this user community may make of the licensed electronic information. These terms typically do not reflect any consideration of public policy decisions such as fair use, and in fact the licensing organization may well be liable for what its patrons do with the licensed information.
Cliff's report was wide-ranging and insightful. In particular, he noted the change from the "first sale" doctrine legal framework to a publisher and library specific contract written by the publisher's lawyers.
Very few contracts with publishers today are perpetual licenses; rather, they are licenses for a fixed period of time, with terms subject to renegotiation when that time period expires. Libraries typically have no controls on price increase when the license is renewed; thus, rather than considering a traditional collection development decision about whether to renew a given subscription in light of recent price increases, they face the decision as to whether to lose all existing material that is part of the subscription as well as future material if they choose not to commit funds to cover the publisher's price increase at renewal time.
He pointed out that the change made future readers' access completely dependent upon continued payment and the publishers' whims, thus blocking libraries from fulfilling their critical stewardship role.
In 1995 I was part of the small team that developed Stanford's Highwire Press. Highwire was the first Web publishing platform for academic journals. By then the problems Cliff identified impacting libraries' stewardship role had become obvious. At the time I attended a lot of conferences. A frequent discussion topic was the ramifications of libraries transitioning from content ownership to content access. Many highly placed librarians thought the change was great – no more building collections, no more stewardship responsibility! I strongly disagreed. Hiking with David after one such conference I described how stewardship worked in the paper world and how it didn't in the Web world. His response was "I can build a system that works the way paper does".
David’s and my goal was to model the way paper worked, to provide librarians with an easy, familiar, affordable way to build and steward traditional collections that were migrating from paper to online.
Libraries fulfill their stewardship role when future access is ensured. Stewardship occurs when libraries take possession of and manage cultural and intellectual assets. We thought it vital for libraries to retain their stewardship role in the scholarly communication ecosystem. We didn't want them to become simply convenient places to work and drink coffee[1].
Stewardship matters for at least three reasons:
To protect privacy.
To protect first sale.
To defend against censorship.
Stewardship protects privacy when librarians fight for their patrons’ rights.
VII. All people, regardless of origin, age, background, or views, possess a right to privacy and confidentiality in their library use. Libraries should advocate for, educate about, and protect people’s privacy, safeguarding all library use data, including personally identifiable information.
Adopted June 19, 1939, by the ALA Council; amended October 14, 1944; June 18, 1948; February 2, 1961; June 27, 1967; January 23, 1980; January 29, 2019.
Inclusion of “age” reaffirmed January 23, 1996.
All people have a right to privacy. Librarians should safeguard the privacy of all library use.
Stewardship protects ownership transfer when content is acquired.
The First Sale doctrine is pivotal. It enables the business of libraries. It enables libraries to maintain and circulate knowledge. First Sale ensures that the public, especially future generations, benefit from today's and yesterday's works of literature, science, and culture.
Stewardship resists censorship when there are multiple copies under multiple stewards.
Today, book banning is on the rise. Librarians are being forced to remove items from circulation. Content ownership ensures materials can’t be erased from view without detection. Stewardship of banned materials allows librarians to choose whether to safeguard these materials for future readers.
Government Documents are, and always have been, in the crosshairs of censors. I’ll mention four efforts providing countervailing forces:
First, the U.S. LOCKSS Docs Network. The Government Publishing Office (GPO) produces and distributes government documents. In the paper world, the Federal Depository Library Program distributed documents to over 1,000 libraries across the nation. To recall documents, the government had to contact the librarians and ask them to withdraw the materials. It was a transparent process.
Sample Withdrawn GPO Documents
Courtesy of James R. Jacobs, Stanford
This is a sample of withdrawn Federal documents.
Online, there were no censorship guardrails. In 2008 a small group of librarians formed the U.S. Docs LOCKSS network. This program is a digital instantiation of the U.S. Federal Depository Library Program. In partnership with the Government Publishing Office, participating libraries have recreated the distributed, transparent, censor-resistant nature of the depository paper system.
This is a sample of volumes released this February to the U.S. Docs LOCKSS network.
Second, the Canadian Government Information Digital Preservation Network. It consists of 11 academic libraries that use Archive-It (an Internet Archive service) to collect all Canadian federal documents. The collected documents are then moved from the Internet Archive into a local LOCKSS network for distributed safekeeping.
Third, the End of Term Archive. This partnership captures U.S. Government websites at the end of presidential administrations. With this last administrative change, thousands of federal web pages and datasets have been taken offline. Federal web sites hold information important to every corner of a university. The End of Term Archive is an extraordinarily important resource. Oddly, only two universities partner with Archive-It to do this work: Stanford and the University of North Texas.
Last, there are many efforts to capture US data sets. The Data Rescue Project serves as a clearing house.
The community recently relearned a lesson history failed to teach. Digital preservation's biggest threat is insider attack. In recent months an unknown number of critical government databases have gone missing or been altered. The antidote to insider attack is multiple copies under multiple stewards. In LOCKSS language, let’s make it easy to find some of the copies, but hard to find all the copies.
I want to say a few words about sustainability. We worked very hard to make the LOCKSS Program sustainable. Don Waters at the Mellon Foundation awarded LOCKSS a matching grant to transition from grant funding to the Red Hat model of free software and paid support.
Funding programs like LOCKSS is difficult. The LOCKSS Program reinstates stewardship and enables libraries as memory organizations. This is a hard sell. Librarians spend scarce resources to support current readers, spending them to ensure materials are available to tomorrow's readers ... not so much. While fundraising fluctuates, costs are steady. To ensure stability, we accumulated reserves by having a very lean staff and being stingy with salaries.
And then along came CLOCKSS, where publishers took the lead to establish a community-run archive that implements library values. In 2006, a handful of publishers, notably the late Karen Hunter of Elsevier, suggested a partnership between libraries and publishers to form a community-run archive. In 2008, after a pilot funded by the founding archive libraries, contributing publishers, and the Library of Congress' NDIIPP, the CLOCKSS archive went into production.
Identical copies of archived content are held in eleven libraries worldwide (Scotland, Australia, Japan, Germany, Canada, and six in the United States). This international footprint ensures content is safe from shifting ideologies, or nefarious players. As in all LOCKSS networks, if a bad actor tries to remove or change content, the technology warns humans to investigate.
The CLOCKSS founding librarians and publishers unanimously agreed that when archived content becomes unavailable, it will be hosted from multiple sources, open access. An example: Heterocycles was an important chemistry journal. Established in 1973, it abruptly ceased publication in 2023 after 50 years. Inexplicably the journal also disappeared from the publisher’s web site; current subscribers lost all access. The content was unavailable from anywhere.
Fortunately, the entire run of the Heterocycles journal was archived in CLOCKSS. In June 2024, two CLOCKSS archive libraries, the University of Edinburgh and Stanford University each made all 106 volumes open access on the web.
The CLOCKSS Archive is governed equally by publishers and librarians, in true community spirit. However, publishers provide the bulk of financial support, contributing 70% of incoming funds. Libraries contribute only 30%. Alicia Wise, CLOCKSS's executive director, reports this gap is widening over time. Ironically, the publishers many librarians consider “rapacious” are paying for an archive that upholds traditional library values and protects content access for future readers.
After more than a quarter-century, the LOCKSS Program continues to collect, to preserve and to provide access to many genres of content. The business model has evolved, but the goals have persisted. I will now hand over to David to talk about the technology, which has also evolved and persisted.
Technology Design (David)
The ARL Serials Initiative forms part of a special campaign mounted by librarians in the 1980s against the high cost of serials subscriptions. This is not the first time that libraries have suffered from high serial prices. For example, in 1927 the Association of American Universities reported that:
"Librarians are suffering because of the increasing volume of publications and rapidly rising prices. Of special concern is the much larger number of periodicals that are available and that members of the faculty consider essential to the successful conduct of their work. Many instances were found in which science departments were obligated to use all of their allotment for library purposes to purchase their periodical literature which was regarded as necessary for the work of the department"
The power imbalance between publishers and their customers is of long standing, and it especially affects the academic literature.[2] Simplistic application of Web technology drove a change from purchasing a copy of the content to renting access to the publisher's copy.[3] This greatly amplifies the preexisting power imbalance. Thus in designing the LOCKSS system, we faced three challenges:
to model for the Web the way libraries worked on paper,
to somehow do so within the constraints of contract law and copyright,
to ensure the system cost was negligible compared to subscription costs.
From a system engineering viewpoint, what struck me about Vicky's description of the paper library system was that libraries' circulating collections form a model fault-tolerant decentralized system. It is highly replicated, and exploits this to deliver a service that is far more reliable than any individual component. There is no single point of failure, no central control to be subverted. The desired behavior of the system as a whole emerges as the participants take actions in their own local interests and cooperate in ad-hoc, informal ways with other participants.
LOCKSS Design Goals
Allow libraries to:
Collect journals to which they subscribed
Give current readers access to their collection
Preserve their collection for future readers
Cooperate with other libraries
The system I envisaged on the hike would consist of a LOCKSS box at each library, the digital analog of the stacks, that would hold the content the library had purchased. It would need these characteristics of the paper system:
It would allow libraries to collect material to which they subscribed from the Web.
It would allow libraries' readers to access material they had collected.
It would allow them to preserve their collections against the multiple frailties of digital information.
It would allow libraries to cooperate, the analog of inter-library loan and copy.
Collect
The collect part was both conceptually simple and mostly off-the-shelf. Since the journals were pay-walled, as with paper each library had to collect its own subscription content. But collecting content is what Web browsers do. When they fetch content from a URL they don't just display it, they store it in a cache on local storage. They can re-display it without re-fetching it. The system needed a browser-like "Hotel California" cache that never got flushed, and a Web crawler like those of search engines so that all the library's subscribed content ended up in the cache.
Because we lacked "first sale" rights, the crawler had to operate with permission from the publisher, which took the form of a statement on their Web site. No permission, no collection.
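To make the idea concrete, here is a hypothetical sketch of the concept, not the actual LOCKSS crawler: the permission URL, hostname and cache path are all made up, and a real crawl would check the statement's content rather than mere reachability.
# Crawl only if the publisher's permission page is reachable, then mirror the
# journal into a local cache directory that is never flushed.
curl -sf https://publisher.example.org/lockss-permission.html >/dev/null \
  && wget --mirror --no-parent --directory-prefix=/var/cache/lockss-box \
       https://publisher.example.org/journal/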
Access
The access part was also conceptually simple and mostly off-the-shelf. Readers should see the content from the publisher unless it wasn't available. Their LOCKSS box should act as a transparent Web proxy, forwarding requests to the publisher and, if the response were negative, responding with the cached copy.
Preserve
The preserve part was conceptually simple — just don't remove old content from the cache on disk. But it was more of a problem to implement for three reasons:
Disks are not 100% reliable in the short term and are 0% reliable over library timescales. Over time, content in the cache would get corrupted or lost.
Because libraries were under budget pressure and short of IT resources, the hardware of the LOCKSS box had to be cheap, thus not specially reliable.
Content in the cache would be seen by humans only in exceptional circumstances, so detecting corruption or loss could not depend upon humans.
Cooperate
Cooperation provided the solution to the problems of preservation. We expected considerable overlap between libraries' subscriptions. Thus each journal would be collected by many libraries, just as in the paper system. The LOCKSS boxes at each library subscribing to each journal could compare their versions, voting on it in a version of the standard Byzantine Fault Tolerance algorithm. A library that lost a vote could repair their damaged copy from another library.
The goal of stewardship drove LOCKSS' approach to preservation; given a limited budget and a realistic range of threats, data survives better in many cheap, unreliable, loosely-coupled replicas than in a single expensive, durable one.
Technology Lessons (David)
Our initial vision for the system was reasonably simple, but "no plan survives contact with the enemy" and so it was as we developed the system and deployed it in production. Now for some lessons from this process that are broadly applicable.
Format Migration
In January 1995 the idea that the long-term survival of digital information was a significant problem was popularized by Jeff Rothenberg's Scientific American article Ensuring the Longevity of Digital Documents. Rothenberg's concept of a "digital document" was of things like Microsoft Word files on a CD, individual objects encoded in a format private to a particular application. His concern was that the rapid evolution of these applications would eventually make it impossible to access the content of objects in that format. He was concerned with interpreting the bits; he essentially assumed that the bits would survive.
But thirty years ago next month an event signaled that Rothenberg's concerns had been overtaken by events. Stanford pioneered the transition of academic publishing from paper to the web when Highwire Press put the Journal of Biological Chemistry on the Web. Going forward the important information would be encoded in Web formats such as HTML and PDF. Because each format with which Rothenberg was concerned was defined by a single application it could evolve quickly. But Web formats were open standards, implemented in multiple applications. In effect they were network protocols, and thus evolve at a glacial pace.[4]
The rapid evolution of Rothenberg's "digital documents" had effectively stopped, because they were no longer being created and distributed in that way. Going forward, there would be a static legacy set of documents in these formats. Libraries and archives would need tools for managing those they acquired, and eventually emulation, the technique Rothenberg favored, would provide them. But by then it turned out that, unless information was on the Web, almost no-one cared about it.
Thus the problem for digital preservation was the survival of the bits, aggravated by the vast scale of the content to be preserved. In May of the following year, 2004 Paul Evan Peters awardee Brewster Kahle established the Internet Archive to address the evanescence of Web pages.[5] This was the first digital preservation effort to face the problems of scale - next year the archive will have collected a trillion Web pages.[6]
The LOCKSS system, like the Wayback Machine, was a system for ensuring the survival of, and access to, the bits of Web pages in their original format. This was a problem; the conventional wisdom in the digital preservation community was that the sine qua non of digital preservation was defending against format obsolescence. Neither Kahle nor we saw any return on investing in format metadata or format migration. We both saw scaling up to capture more than a tiny fraction of the at-risk content as the goal. Events showed we were right, but at the time the digital preservation community viewed LOCKSS with great skepticism, as "not real digital preservation".
The LOCKSS team repeatedly made the case that preserving Web content was a different problem from preserving Rothenberg's digital documents, and thus that applying the entire apparatus of "preservation metadata", PREMIS, FITS, JHOVE, and format normalization to Web content was an ineffective waste of scarce resources. Despite this, the drumbeat that LOCKSS wasn't "real digital preservation" continued.
After six years, the LOCKSS team lost patience and devoted the necessary effort to implement a capability they were sure would never be used in practice. The team implemented, demonstrated and in 2005 published transparent, on-demand format migration of Web content preserved in the LOCKSS network. This was possible because the specification of the HTTP protocol that underlies the Web supports the format metadata needed to render Web content. If it lacked such metadata, Web browsers wouldn't be possible. The criticism continued unabated.[7]
There have been a number of services based instead upon emulation, the technique Rothenberg preferred. Importantly, Ilya Kreymer's oldweb.today uses emulation to show preserved Web content as it appeared in a contemporaneous browser, not as it renders in a modern browser.
Around 6th December 1991, Paul Kunz at the Stanford Linear Accelerator Center brought up the first US Web site.[8]
In a foreshadowing of future problems, its content was dynamic. It was a front-end for querying databases; although the page itself was static, clicking on the links potentially returned different content as the underlying database was edited.
Digital documents in a distributed environment may not behave consistently; because they are presented both to people who want to view them and software systems that want to index them by computer programs, they can be changed, perhaps radically, for each presentation. Each presentation can be tailored for a specific recipient.
Cliff Lynch identified the problem that dynamic content posed for preservation. In 2001 he wrote "Each presentation can be tailored for a specific recipient". Which recipient's presentation deserves to be preserved? Can we show a future recipient what they would have seen had they accessed the resource in the past?
there’s a largely unaddressed crisis developing as the dominant archival paradigms that have, up to now, dominated stewardship in the digital world become increasingly inadequate. ... the existing models and conceptual frameworks of preserving some kind of “canonical” digital artifacts ... are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances.
Sixteen years later Lynch was still flagging the problem.
The dynamic nature of Web content proved irresistible to academic journal publishers, despite their content being intended as archival. They added features like citation and download counts, personalizations, and of course advertisements to HTML pages, and watermarked their PDFs. These were all significant problems for LOCKSS, which depended upon comparing the copies ingested by multiple LOCKSS boxes. The comparison process had to filter out the dynamic content elements; maintaining the accuracy of doing so was a continual task.[9]
The fundamental problem is that the Web does not support Uniform Resource Names (URNs), only Uniform Resource Locators (URLs). A URN specifies what a resource consists of; all a URL specifies is where a resource can be obtained. As with the first US Web page, what content you obtain from a URL is unspecified and can be different, or even unobtainable, on every visit.
The reason the Web runs on URLs not URNs is that the underlying Internet's addresses, both IP and DNS, only specify location. There have been attempts to implement a network infrastructure that would support "what" not "where" addresses; think of it as BitTorrent, but at the transport not the content layer.[10]
The goal of digital preservation is to create one or more persistent, accessible replicas of the content to be preserved. In "what" networks, each copy has the same URN. In IP-based networks, each copy has a different URL; to access the replica requires knowing where it is. Thus if the original of the preserved content goes away, links to it no longer resolve.
Starting in 2010, 2017 Paul Evan Peters awardee Herbert Van de Sompel and others made a valiant effort to solve this problem with Memento. Accepting the fact that persistent replicas of content at a URL at different times in the past would have different URLs, they provided an HTTP-based mechanism for discovering the URL of the replica close to a desired time. In some cases, such as Wikis, the original Web site implements the discovery mechanism and the underlying timeline. In other cases, such as the Wayback Machine, the site holding the replica implements the timeline. Since there are likely to be multiple Web archives with replicas of a given URL, Memento in practice depends upon Aggregator services to provide a unified timeline of the replica space.
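As a concrete illustration (the endpoints below belong to the public Memento "Time Travel" aggregator and are my assumption; any Memento-compliant TimeGate behaves the same way), a client asks for the replica of a URL closest to a desired time via datetime negotiation, or fetches the aggregated timeline as a TimeMap:
# Ask the aggregator's TimeGate for the replica of example.org nearest to 1 Jan 2015
curl -sIL -H "Accept-Datetime: Thu, 01 Jan 2015 00:00:00 GMT" \
  "http://timetravel.mementoweb.org/timegate/http://example.org/"
# Fetch the aggregated TimeMap, the timeline of known replicas across Web archives
curl -s "http://timetravel.mementoweb.org/timemap/link/http://example.org/"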
In "what" networks there would still be a need to provide an aggregated timeline, not discovering the URL of a replica from a desired time, but discovering its URN. Just as in the Web, they would depend upon a mechanism above the transport layer to connect the names into a timeline. Thus, despite its theoretical appeal, "what" networking's practical advantages are less than they appear.
Decentralization
When something is published in print, legitimate copies ... are widely distributed to various organizations, such as libraries, which maintain them as public record. These copies bear a publication date, and the publisher essentially authenticates the claims of authorship ... By examining this record, control of which is widely distributed ... it is possible, even years after publication, to determine who published a given work and when it was published. It is very hard to revise the published record, since this involves all of the copies and somehow altering or destroying them.
In 1994 Lynch had described how "Lots Of Copies Keep Stuff Safe" in the paper world. Compare this with how we summarized libraries' role in our first major paper on LOCKSS, Permanent Web Publishing:
Acquire lots of copies. Scatter them around the world so that it is easy to find some of them and hard to find all of them. Lend or copy your copies when other librarians need them.
Because we were modeling the paper library system, we hoped that the LOCKSS system would obtain the benefits of a decentralized system over a centralized one performing the same function, which in the paper system and in theory are significant:
It has the economic advantage that it is hard to compare the total system cost with the benefits it provides because the cost is diffused across many independent budgets.
After a couple of my prototypes proved to be insecure, I worked with a group of amazing Stanford CS Ph.D. students to design a decentralized peer-to-peer network secured by Proof-of-Work. The 2003 paper describing it won "Best Paper" at the prestigious Symposium on Operating Systems Principles. This was five years before Satoshi Nakamoto published his decentralized peer-to-peer network secured by Proof-of-Work.
Unfortunately in the digital world it is extraordinarily difficult to reap the theoretical benefits of decentralization. I laid out the reason why this is so a decade ago in Economies of Scale in Peer-to-Peer Networks. In brief, the mechanism described by W. Brian Arthur in his 1994 book Increasing Returns and Path Dependence in the Economy operates. Technology markets have very strong increasing returns to scale. The benefits from participating in a decentralized digital system increase faster than the costs, which drives centralization.[11] Thirty years later, Arthur's work explains today's Web perfectly.
Decentralized digital systems also have significant practical disadvantages:
Their user experience is worse, being more complex, slower and less predictable. An example is that Bitcoin's transaction rate is limited by its roughly 10-minute block time.
They are in practice only as decentralized as the least decentralized layer in the stack.
Their excess costs cause emergent behaviors that drive centralization.
The fundamental problem is that most layers in the software stack are highly concentrated, starting with the three operating systems. Network effects and economies of scale apply at every layer.
The LOCKSS system was designed and implemented to be completely decentralized. It was permissionless; nodes could join and leave the network as they liked. We designed the network protocol to be extremely simple, both to avoid security flaws, and also in the hope that there would be multiple implementations, avoiding single points of failure. There were a number of reasons why, over time, it turned out much less decentralized than we hoped:
Although we always paid a lot of attention to the security of LOCKSS boxes, we understood that a software mono-culture was vulnerable to software supply chain attacks. But it turned out that the things that a LOCKSS box needed to do other than handling the protocol were quite complex, so despite our best efforts we ended up with a software monoculture.
We hoped that by using the BSD open-source license we would create a diverse community of developers, but we over-estimated the expertise and the resources of the library community, so Stanford provided the overwhelming majority of the programming effort.
Don Waters was clear that grant funding could not provide the long-term sustainability needed for digital preservation. So he provided a matching grant to fund the transition to being funded by the system's users. This also transitioned the system to being permissioned, as a way to ensure the users paid.
Although many small and open-access publishers were happy to allow LOCKSS to preserve their content, the oligopoly publishers never were. Eventually they funded a completely closed network of huge systems at major libraries around the world called CLOCKSS. This is merely the biggest of a number of closed, private LOCKSS networks that were established to serve specific genres of content, such as government documents.
Mono-culture risk is pervasive throughout the stacks of digital preservation systems. For example, for very good reasons almost all are based on X86 hardware and an open-source stack starting with the Linux kernel. These very good reasons outweigh the benefits of diversity in the stack. And, of course, the problem of mono-culture risk is generic throughout IT due to the network effects and economies of scale described by Brian Arthur. It is especially worrying in an era of zero-day vulnerabilities and sophisticated software supply chain attacks such as the recent $1.5B heist from Bybit.[12]
Archival Media (David)
Don't, don't, don't, don't believe the hype!
Public Enemy
We have already warned you against three seductive but impractical ideas: format migration, "what" networking and decentralization. My parting gift to you is to stop you wasting time on another seductive but impractical idea — that the solution to digital preservation is quasi-immortal media. What follows is an extract from a talk at Berkeley last month.
Archival Data
Over time, data falls down the storage hierarchy.
Data is archived when it can't earn its keep on near-line media.
Lower cost is purchased with longer access latency.
What is a useful definition of archival data? It is data that can no longer earn its keep on readily accessible storage. Thus the fundamental design goal for archival storage systems is to reduce costs by tolerating increased access latency. Data is archived, that is moved to an archival storage system, to save money. Archiving is an economic rather than a technical issue.[13]
The mainstream media occasionally comes out with an announcement like this from the Daily Mail in 2013, or this from the New Yorker last month. Note the extrapolation from "a 26 second excerpt" to "every film and TV program ever created in a teacup".
Six years later, this is a picture of, as far as I know, the only write-to-read DNA storage drive ever demonstrated, from the Microsoft/University of Washington team that has done much of the research in DNA storage. It cost about $10K and took 21 hours to write then read 5 bytes.
The technical press is equally guilty. The canonical article about some development in the lab starts with the famous IDC graph projecting the amount of data that will be generated in the future. It goes on to describe the amazing density some research team achieved by writing say a megabyte into their favorite medium in the lab, and how this density could store all the world's data in a teacup for ever. This conveys five false impressions.
Market Size
First, that there is some possibility the researchers could scale their process up to a meaningful fraction of IDC's projected demand, or even to the microscopic fraction of the projected demand that makes sense to archive. There is no such possibility. Archival media is a much smaller market than regular media.
IBM's Georg Lauhoff and Gary M. Decad's slide shows that the size of the market, in dollar terms, decreases as you move down the storage hierarchy. LTO tape is less than 1% of the media market in dollar terms and less than 5% in capacity terms.[14]
Timescales
Second, that the researcher's favorite medium could make it into the market in the timescale of IDC's projections. Because the reliability and performance requirements of storage media are so challenging, time scales in the storage market are much longer than the industry's marketeers like to suggest.
Take, for example, Seagate's development of the next generation of hard disk technology, HAMR, where research started twenty-six years ago. Nine years later in 2008 they published this graph, showing HAMR entering the market in 2009. Seventeen years later it is only now starting to be shipped to the hyper-scalers. Research on data in silica started fifteen years ago. Research on the DNA medium started thirty-six years ago. Neither is within five years of market entry.
Customers
Third, that even if the researcher's favorite medium did make it into the market it would be a product that consumers could use. As Kestutis Patiejunas figured out at Facebook more than a decade ago, because the systems that surround archival media rather than the media themselves are the major cost, the only way to make the economics of archival storage work is to do it at data-center scale but in warehouse space and harvest the synergies that come from not needing data-center power, cooling, staffing, etc.
Storage has an analog of Moore's Law called Kryder's Law, which states that over time the density of bits on a storage medium increases exponentially. Given the need to reduce costs at data-center scale, Kryder's Law limits the service life of even quasi-immortal media. As we see with tape robots, where data is routinely migrated to newer, denser media long before its theoretical lifespan, what matters is the economic, not the technical lifespan of a medium.
Fourth, that anyone either cares or even knows what medium their archived data lives on. Only the hyper-scalers do. Consumers believe their data is safe in the cloud. Why bother backing it up, let alone archiving it, if it is safe anyway? If anyone really cares about archiving they use a service such as Glacier, in which case they definitely have no idea what medium is being used.
Fifth, the idea that with quasi-immortal media you don't need Lots Of Copies to Keep Stuff Safe.[15]
Media such as silica, DNA, quartz DVDs, steel tape and so on address bit rot, which is only one of the threats to which long-lived data is subject. Clearly a single copy on such media is still subject to threats including fire, flood, earthquake, ransomware, and insider attacks. Thus even an archive needs to maintain multiple copies. This greatly increases the cost, bringing us back to the economic threat.
The reason why this focus on media is a distraction is that the cost per terabyte of the medium is irrelevant, what drives the economic threat is the capital and operational cost of the system. It is only by operating at data-center scale and thus amortizing the capital and operational costs over very large amounts of data that the system costs per terabyte can be made competitive.
The fundamental idea behind LOCKSS was that, given a limited budget and a realistic range of threats, data would survive better in many cheap, unreliable, loosely-coupled replicas than in a single expensive, durable one.
When giving talks about LOCKSS Vicky or I often used to feel like the Sergeant in Alice's Restaurant who "spoke for 45 minutes and nobody understood a word he said". We hope that this time we did better. Let's see if we did as we answer your questions.
Footnotes
Links to Lynch's Office of Technology Assessment report have been replaced by links to the Wayback Machine's copy collected by the End Of Term Crawl. The original was hosted at the Dept. of Education's ERIC website. The Department is currently at risk of being shut down.
In 2006 Vicky predicted that, without collection stewardship, libraries and Starbucks would become indistinguishable. Here is a real Starbucks ad, with one minor addition.
Four years later this prediction came true; Starbucks populated its WiFi networks with a wide range of otherwise pay-walled content such as the Wall Street Journal.
Library budgets have struggled with journal costs for close on a century, if not longer!
That is, from a legal framework of the "first sale" doctrine and copyright, to one of contract law and copyright.
The deployment of IPv6, introduced in December 1995, shows that network protocols are extraordinarily difficult to evolve, because of the need for timely updates to many independent implementations. Format obsolescence implies backwards incompatibility; this is close to impossible in network protocols because it would partition the network. As I discussed in 2012's Formats Through Time, the first two decades of the Web showed that Web formats essentially don't go obsolete.
This evanescence comes in two forms, link rot, when links no longer resolve, and content drift, when they resolve to different content.
People's experience of the reliability of their personal data storage is misleading. Reliable, affordable long-term storage at Web scale is an interesting engineering problem.
The irony of this was that format migration was a technique of which Rothenberg’s article disapproved:
Finally, [format migration] suffers from a fatal flaw. ... Shifts of this kind make it difficult or impossible to translate old documents into new standard forms.
At least the journals we archived were not malicious; they had actual content that was the same for everybody. That different readers saw different ads was of interest only to students of advertising. But the opportunity to confine readers in a tailored bubble has turned out to be profitable but disastrous.
The goal of IP and the layers above is to move data. There is an assumption that, in the normal case, the bits vanish from the sender once they have been transported, and also from any intervening nodes.
The goal of CCN is to copy data. A successful CCN request creates a locally accessible copy of some remote content. It says nothing about whether in the process other (cached) copies are created, or whether the content is deleted at the source. None of that is any concern of the CCN node making the request, they are configuration details of the underlying network.
While it has its copy, the CCN node can satisfy requests from other nodes for that content, it is a peer-to-peer network.
Basing networking on the copy-ability of bits rather than the transmissibility of bits makes a huge simplification. In particular, it means that, unlike in IP-based networks but like in BitTorrent, caches (and thus digital preservation) just work.
In CCN all replicas of the same content have the same name as the original; which of them satisfies a request is determined by the CCN network's routing at request time. If the content changes, so does its name. In CCN, the analogs of routers in an IP network are caches, holding recently accessed content and supplying it on request. Some of them are archives, caches that are never flushed like our vision for LOCKSS boxes. Just like routers, unless something goes wrong they are invisible.
Subsystem    Bitcoin    Ethereum
Mining           5          3
Client           1          1
Developer        5          2
Exchange         5          5
Node             3          4
Owner          456         72
The Nakamoto coefficient is the number of units in a subsystem you need to control 51% of that subsystem. Because decentralization applies at each layer of a system's stack, it is necessary to measure each of the subsystems individually. In 2017's Quantifying Decentralization Srinivasan and Lee identified a set of subsystems for public blockchains, and measured them using their proposed "Nakamoto Coefficient". Their table of the contemporary Nakamoto coefficients for Bitcoin and Ethereum makes the case that they were only minimally decentralized.
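As a toy illustration of that definition (the percentages below are made-up shares, not real mining-pool data), the coefficient is just a count of the largest actors needed to reach 51%:
# Sort made-up subsystem shares (percent) largest-first and count how many
# are needed to reach 51% -- that count is the Nakamoto coefficient (here, 2).
printf '%s\n' 40 25 15 10 5 3 2 | sort -rn | \
  awk '{ sum += $1; n++ } sum >= 51 { print n; exit }'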
There is an even bigger problem for Ethereum since the blockchain switched to Proof-of-Stake. The software that validators run is close to a mono-culture. Two of the minor players have recently suffered bugs that took them off-line, as Sam Kessler reports in Bug That Took Down 8% of Ethereum's Validators Sparks Worries About Even Bigger Outage:
A bug in Ethereum's Nethermind client software – used by validators of the blockchain to interact with the network – knocked out a chunk of the chain's key operators on Sunday. ... Nethermind powers around 8% of the validators that operate Ethereum, and this weekend's bug was critical enough to pull those validators offline. ... the Nethermind incident followed a similar outage earlier in January that impacted Besu, the client software behind around 5% of Ethereum's validators. ... Around 85% of Ethereum's validators are currently powered by Geth, and the recent outages to smaller execution clients have renewed concerns that Geth's dominant market position could pose grave consequences if there were ever issues with its programming. ... Cygaar cited data from the website execution-diversity.info noting that popular crypto exchanges like Coinbase, Binance and Kraken all rely on Geth to run their staking services. "Users who are staked in protocols that run Geth would lose their ETH" in the event of a critical issue, Cygaar wrote.
Remember "no-one ever gets fired for buying IBM"? At the Ethereum layer, it is "no-one ever gets fired for using Geth" because, if there was ever a big problem with Geth, the blame would be so widely shared.
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently. ... However, can we really say that the uncoordinated choice model is realistic when 90% of the Bitcoin network’s mining power is well-coordinated enough to show up together at the same conference?
What Buterin is saying is that because decentralized systems in the real world are not composed of "many small actors that make decisions independently", there is nothing to stop the small number of large actors colluding, and thus acting as a centralized system.
Ten thousand years is about the age of civilization, so a 10K-year Clock would measure out a future of civilization equal to its past. That assumes we are in the middle of whatever journey we are on – an implicit statement of optimism.
They would like to accompany it with a 10,000-year archive. That is at least two orders of magnitude longer than I am talking about here. We are only just over three-quarters of a century from the first stored-program computer, so designing a digital archive for a century is a very ambitious goal. Note that the design of the Clock of the Long Now is as much social as technical. It is designed to motivate infrequent but continual pilgrimages:
On days when visitors are there to wind it, the calculated melody is transmitted to the chimes, and if you are there at noon, the bells start ringing their unique one-time-only tune. The 10 chimes are optimized for the acoustics of the shaft space, and they are big.
Finally, way out of breath, you arrive at the primary chamber. Here is the face of the Clock. A disk about 8 feet in diameter artfully displays the natural cycles of astronomical time, the pace of the stars and the planets, and the galactic time of the Earth’s precession. If you peer deep into the Clock’s workings you can also see the time of day.
But in order to get the correct time, you need to “ask” the clock. When you first come upon the dials the time it displays is an older time given to the last person to visit. If no one has visited in a while, say, since 8 months and 3 days ago, it will show the time it was then. To save energy, the Clock will not move its dials unless they are turned, that is, powered, by a visitor. The Clock calculates the correct time, but will only display the correct time if you wind up its display wheel.
It is noteworthy that in 2023 Optical Archival (OD-3), the most recent archive-only medium, was canceled for lack of a large enough market. It was a 1TB optical disk, an upgrade from Blu-Ray.
No medium is perfect. They all have a specified Unrecoverable Bit Error Rate (UBER). For example, typical disk UBERs are 10^-15. A petabyte is 8×10^15 bits, so if the drive is within its specified performance you can expect up to 8 errors when reading a petabyte. The specified UBER is an upper limit; you will normally see far fewer. The UBER for LTO-9 tape is 10^-20, so unrecoverable errors on a new tape are very unlikely. But not impossible, and the rate goes up steeply with tape wear.
The property that classifies a medium as quasi-immortal is not that its reliability is greater than regular media to start with, although as with tape it may be. It is rather that its reliability decays more slowly than that of regular media. Thus archival systems need to use erasure coding to mitigate both UBER data loss and media failures such as disk crashes and tape wear-out.
A group of data journalists from Nepal take part in an Open Data Editor training session. Photo: OKNP
The Open Knowledge Foundation (OKFN) is happy to announce the release of Open Data Editor (ODE) 1.4.0, the latest version of our new desktop application for data practitioners to detect errors in tables.
ODE is an easy-to-use, open-source alternative to proprietary data wrangling tools, designed for accessibility and learning – no coding skills required. It finds common spreadsheet errors before you start your analysis, runs on any machine, works offline, respects your privacy, and keeps you in full control of your data, with no vendor lock-in. It also comes with a free online course that can help you make your datasets better, therefore making your life/work easier.
In the short time since the first stable release in December 2024, the application has already had a significant impact among civil society organisations, activists, data journalists and public servants in all parts of the world. Read more about the impact and some use cases here.
Installation
If you have the Open Data Editor’s previous versions installed on your computer, please note that the update will not be done automatically. You will need to download it again using the links in the buttons below.
The new main screen (this one above from the macOS version) now allows infinite scrolling, among other improvements
ODE has been migrated to a different architecture, which significantly improves the user experience and adds features that have been identified as essential in the various feedback sessions and pilots we have been running over the last few months.
New Architecture: Built on PySide6 (a simpler framework), it facilitates more agile work on improvements and changes.
Infinite Scroll: We removed pagination. Now you don’t have to click to move to the next page and explore data in full. The new version incorporates infinite scroll to help you navigate your tables.
Easier Download: We made communication improvements to help you download ODE without going through GitHub. The ODE now has a brand new landing page with everything you need to know to use the app.
Built-in Error Correction: You can now correct errors directly from the Errors Report panel, while in the past this feature just offered an overview of all errors grouped by categories.
Direct Error Detection: Now files are read and validated directly when you click on them. This increases the speed with which you can detect errors in your tables (but can cause a slower experience when working on big files).
Clearer View: The main Datatable View is now an accurate representation of the file contents, and column names are no longer a mix of data and metadata as before.
Features to simplify your work
The Open Data Editor isn’t another complex data tool – it’s your shortcut to better data and improved data literacy.
Here are a few tasks that ODE 1.4.0 can help you with:
Detect errors in spreadsheets in a matter of seconds (see the command-line sketch after this list)
Check if the data formats in your columns are correct
Learn data skills with an intuitive tool
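For readers comfortable in a terminal, roughly the same kind of table validation can be sketched with the frictionless command-line tool from the wider Open Knowledge ecosystem. This is only an illustration of the idea, not how ODE itself is invoked, and the file name is a placeholder:
# Install the frictionless CLI and validate a spreadsheet; the report lists
# problems such as blank headers, duplicate labels and cell type errors.
pip install frictionless
frictionless validate my-table.csv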
Here is how organisations across the world are using ODE:
Observatoire des armements is working with defence spending data
The Demography Project is focusing on water, air quality and electoral data
Bioinformatics Hub of Kenya Initiative (BHKi) is working with genomic data and metadata
City of Zagreb is tackling the challenges of working with infrastructure data
Open Knowledge Nepal is working with local governments and their infrastructure data
After a complete overhaul of the app’s objectives last year, the tool has been in a new phase of development since the beginning of 2025. Our goal this year is to improve accessibility and enhance digital literacy through no-code tools like ODE.
That’s why we are simplifying its architecture, improving the way metadata and errors are communicated, and intensifying pilots and socialisation activities to encourage ODE’s adoption among people and groups without coding skills.
Improvements in the artificial intelligence integration will be another key focus for this year: our team will seek to replace the current model, based on OpenAI, with an open, local model with open source LLMs.
What’s Next
The new version 1.4.0 represents a major milestone in improving the Open Data Editor experience and performance. But, of course, we still have a long way to go to enable anyone, regardless of their educational background, to improve the quality of their data and therefore improve their data literacy.
Here are some issues that will be addressed in the following releases:
Metadata Panel: We realise that this feature isn’t intuitive enough for people without coding skills. Many people reported being afraid to change the metadata parameters out of unfamiliarity – something the app still doesn’t help with. In the next few months, we will combine this feedback with a complete UX assessment by an external consultant.
Publishing Feature: Publishing data after it has been cleaned is a powerful feature of ODE, but it still doesn’t work fully. For now it’s only possible to publish data on GitHub, and we’ll be working over the next few months to enable publishing on data portal software such as Zenodo and CKAN.
More Feedback: Following our The Tech We Want vision, we want to develop technology that works for people and is good enough to solve real-world problems. With the 1.4.0 version now released, our team will step back from our desks to listen again to people's and organisations' needs in a new phase of user research.
In addition to these main areas, we will also work on solving the issues listed in the project roadmap.
If you have any questions or want any additional information about ODE, you can contact us at info@okfn.org.
Funding
All of Open Knowledge’s work with the Open Data Editor is made possible thanks to a charitable grant from the Patrick J. McGovern Foundation. Learn more about its funding programmes here.
Earlier this month, Jessy at Library Shenanigans posted some charming onomatopoeic instructions for circ workers performing checkins at a freshly-automated library in the 1990s. The “Doot-doot” and “Deedle-deedle-dee” are delightful. They may have been a little annoying. But what struck me about the post is that they solve a problem I’ve been aware of for a good 15+ years. And they did so a good decade before I first recall encountering it.
The Problem: Popups, Barcodes, and Returns
Many library systems use popups to indicate various important item-level conditions that should be addressed after checkin. Maybe a book should be routed to a new location. Maybe it needs to go to the bindery. Maybe it’s on hold for someone. You clear it by clicking a button or pressing the “Enter/Return” key.
But while in an ideal world you’d be giving the screen full attention when checking in, that just doesn’t happen. Circ desks are busy places. Library back rooms are busy places.
The act of scanning a barcode also triggers the “Return” keyboard action.[1] Scanning a barcode isn’t just the equivalent of typing it in, it’s the equivalent of typing it in and pressing your Return/Enter key to submit it to the system.
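You can see this for yourself with nothing more than a terminal. A sketch, assuming a typical keyboard-wedge scanner configured with the common Enter/carriage-return suffix: each scan arrives as a complete line, because the scanner "types" the digits and then Enter.
# Each scan prints immediately: the scanner supplies the Enter keypress,
# the same keypress that dismisses a focused popup in a GUI client.
while read -r barcode; do
  echo "scanned: $barcode"
done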
Checkin enough materials with enough going on around you and the following scenario will happen:
You scan an item. A popup appears on your screen with important information. You are distracted by something. You scan the next item without checking. The system does not process the barcode. It processes the Return. It dismisses the message.
The result is: Your first item is checked in but you haven’t acted on the note. Your second item is not checked in because all it did was clear the popup. Issues with both items only show up later on when a supposedly-routed item never returns or a patron complains that they returned the second item.
I first recall becoming aware of this issue while working at Hyattsville’s public library 17 years ago, using Geac PLUS. I don’t know if I just didn’t do enough checkin at Newark (I was a page and did checkin only when they were backed up) or was just too young to notice it. Fortunately, some of the time you catch it by looking at the screen and noticing that the item in your hand isn’t in the list of checkins. Or the next thing it needs has triggered a print action and you hear the receipt printer spit out a routing slip.
The Solution
That printer noise is the one reliable notification I’ve ever experienced. But the mid-90s system whose documentation is shared in Library Shenanigans had it on lock:
Deedle-deedle-dee? Holding shelf
Beep-beep-beep? Belongs to another library
Doot-doot? Temporary item, set aside to be handled.
Those aren’t the only reasons I’ve experienced popups, but that covers quite a few of them!
The Question: Where Did it Go?
The whole thing was on my mind because when I was doing my research interviews last year, several folks brought it up as a flaw in the system. It’s not my job during those interviews to explain something like “oh this is shared with much older GUI systems.” But while I know it’s not an Alma issue, they’re also not wrong that it’s a problem in the system![2]
So why don’t more of us have this?
Was it specific to a scanner vendor and system?
Was it too annoying? Did it disappear in the era of the GUI desktop client?
Are there still ways to set it up?
If I still had a site with comments, I’d ask people if their library has managed to do anything like this. If you have, hit me up via my contact page or on social media because I would love to hear about it!
[1] I assume it’s in the barcode itself, but it could be a system configuration? I’ve never gotten that granular in barcode/scanner setup.
[2] I have, perhaps, made a mistake by introducing this old solution to my own coworkers in public services who think it’s pretty great. Are there still ways to set it up?
This week I migrated my virtual server from Digital Ocean, where I pay in USD, to a host in Australia, where the punishing and ever-worse currency exchange rate doesn't make monthly bills a lottery.
This is the third or fourth time I've gone through a server migration, so it's a little less daunting than it has been in the past, but every time there is a new, exciting problem to tackle. I'm not a professional system administrator so things that might seem basic to others can be quite challenging for me.
This post is about one of those – migrating my self-hosted Forgejo git repository. Forgejo actually has very good documentation, which is less typical than we might hope in FOSS projects. The guide to installation is quite thorough and explains what each step is doing and why it is needed. The guide to upgrading is also pretty good, and includes a section on how to make a backup – they even created a CLI command to do this: forgejo dump. Unfortunately, how to restore from the backup is left as an exercise for the reader.
At some point in the future someone else is going to want to migrate their Forgejo install from one server to another, and not know how to do it. This blog post is for that person so they don't need to go through as much trial and error as I did. Let's be real – that person is probably me again, two years in the future.
Assumptions and caveats
This guide assumes:
you are running Forgejo on a Linux server and want to migrate it to another Linux server
you are using the binary rather than Docker
you are using sqlite as your database
you have root access to both servers
Your backup and new file structures may be slightly different, depending on which version you are moving to and from. This is what worked for me.
Step 1: make a backup
On your "old" server, first make a backup of forgejo. Forgejo provides instructions for this but there are a few assumptions made, and they caught me out.
To run a backup, you can use forgejo dump. However, there are some conditions required to make this work properly.
First of all, if you followed the official installation instructions you will be running Forgejo as the git user, and that user will not have a password and will not be able to use sudo. That makes it difficult to run forgejo commands as directly as you might expect. To run a clean dump we need to:
run the command as the git user with sudo -u git forgejo command --argument
run dump from a directory where the git user has write access (the fact you're using sudo won't override this requirement)
explicitly declare both the config location and the working-path
nominate a tmp directory that git has permission to write to
declare your database type
The default temporary file location is /tmp, which probably is owned by root, so you can create a directory to use instead:
sudo -u git mkdir /home/git/tmp
Then move into the git home directory so your git user can save the export file:
cd /home/git
Now you should be able to run a dump with a command like this:
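Assuming the default paths from the official installation guide (config at /etc/forgejo/app.ini, working path /var/lib/forgejo) and the tmp directory we just created, a sketch of the command looks like this:

sudo -u git forgejo dump --config /etc/forgejo/app.ini --work-path /var/lib/forgejo --tempdir /home/git/tmp --database sqlite3

The exact flag names (particularly for the temporary directory) can differ a little between Forgejo versions, so double-check them against the help output.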
You can find out more about what these flags do with:
sudo -u git forgejo dump --help
You should now have a file called something like forgejo-dump-1744939469.zip.
Step 2: move your backup
You may have your own system worked out for transferring files between your old and new server. If you haven't worked this out yet and your files are not enormous, an easy way to do it is to use scp with your local machine as an intermediary.
To do this successfully, you need a user with the same name on all three machines, with permission to read and write all the files you're moving. On your old server, move the zip file out of the git user's directory and into your main user's home directory, then change the ownership:
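For example (using youruser, old-server, and new-server as stand-ins for your own username and hostnames, and adjusting the dump filename to match yours):

sudo mv /home/git/forgejo-dump-1744939469.zip /home/youruser/
sudo chown youruser:youruser /home/youruser/forgejo-dump-1744939469.zip

Then, from your local machine, pull the file down and push it up to the new server:

scp youruser@old-server:forgejo-dump-1744939469.zip .
scp forgejo-dump-1744939469.zip youruser@new-server: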
This might take a few minutes, depending on how big your git repositories are.
Step 3: Reroute your DNS
We need to see the web configuration screen in a moment. You could do this by viewing it at [your_ip_address]:3000, depending on how you have set up your web server configuration. But given you have to redirect your DNS from the old server to the new one anyway, it's probably easier to do it now. Hopefully you remembered to reduce the TTL value for your (sub)domain earlier in the week 😉. How you do this depends on how you are managing DNS, so it's outside the scope of this post. Don't forget to use certbot to create a new HTTPS certificate.
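If, for example, your new server uses nginx and certbot's nginx plugin, and git.example.com stands in for your own (sub)domain, that can be as simple as:

sudo certbot --nginx -d git.example.com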
Step 4: Install Forgejo on your new server
Before we can import the backup into our new server, we need to set up Forgejo. Follow the installation instructions up to the point where you are looking at the fresh web-based configuration screen. You should finalise installation by selecting sqlite as your database type, and creating an admin user. It doesn't actually matter what credentials you give your admin user here because we're about to overwrite them, but you need to perform this step in order to get all your directories in order.
Step 5: Restore your backup
Now we can finally actually migrate our data!
You have a zip file sitting in your main user's home directory. We need to get these files into the right places with the right permissions.
First of all, disable your forgejo daemon:
sudo systemctl stop forgejo.service
Now we need to unzip the backup. You might need to install unzip first, and it's a good idea to unzip in a new directory:
sudo apt install unzip
mkdir forgejo-backup
cd forgejo-backup
unzip forgejo-dump-1744939469.zip
The data directory in your backup doesn't include everything that lives in /var/lib/forgejo/data, so instead of replacing the whole directory, we just copy in what we have:
sudo cp -r data/* /var/lib/forgejo/data/
Our repositories are in the repos directory from the backup, but we need to copy them into /var/lib/forgejo/data/forgejo-repositories:
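Assuming the default repository root for a binary install (check the repository ROOT setting in your app.ini if you're not sure), something like:

sudo cp -r repos/* /var/lib/forgejo/data/forgejo-repositories/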
Now copy the custom directory into /var/lib/forgejo:
sudo cp -r custom /var/lib/forgejo/
You might be wondering what to do with forgejo-db.sql – isn't that your database? Turns out it is not! Your sqlite database is within the data directory in your backup (as forgejo.db), so you don't need to move it separately. You can ignore forgejo-db.sql.
Step 6: Run doctor
Something is likely to be not quite right at this point, especially if you are also upgrading versions and are missing some newer database tables. You can check this with doctor:
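The doctor subcommands have shifted around a bit between versions, but on a recent Forgejo something along these lines should work (the --help output will show you the current layout):

sudo -u git forgejo doctor check --all --config /etc/forgejo/app.ini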
If there is something wrong, doctor will suggest how you can fix it – running it again with its fix option may resolve most or all issues. One of these things is likely to be your server SSH key, which we will come to in a moment.
Step 7: Complete installation and restart Forgejo
Now is a good time to make your file permissions slightly more secure, as per the Forgejo installation instructions. We don't need to write to app.ini any more, so tighten it up a bit:
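The install guide's suggestion amounts to something like the following; and since we copied the backup files in as root, it's also worth making sure the git user owns everything under /var/lib/forgejo (adjust paths and usernames if yours differ):

sudo chmod 750 /etc/forgejo
sudo chmod 640 /etc/forgejo/app.ini
sudo chown -R git:git /var/lib/forgejo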
You should now be able to restart Forgejo, and check in a browser that all is well.
sudo systemctl start forgejo.service
If everything went according to plan, all users should now be able to log in with their original passwords and any pre-existing 2FA tokens.
Step 8: Update known_hosts
The last thing you may need to do is update the known_hosts on any local machines pushing to your hosted git repositories. When we set up Forgejo on the new server and tidied it up with doctor, we ended up with a new SSH key. Your pre-existing git repositories aren't going to like that because it will no longer match what is in your known_hosts file and quite correctly you will get an alarming message suggesting a MITM attack is underway. You, of course, know better than this, so you can relax, but you also need to fix the confusion.
A simple way to resolve this is to use ssh-keyscan to query the public key for your Forgejo instance and automatically save it:
ssh-keyscan git.example.com >> ~/.ssh/known_hosts
Note that this is only safe to do in this situation because you're confident the SSH key changed because you changed it. If you suddenly started getting errors about changed keys in any other situation, you'd definitely want to do some investigation before blindly just updating your known_hosts.
Congratulations on migrating your self-hosted git server!
Spring has sprung, and we’re hosting a vernal treasure hunt in celebration! Come join our Spring Treasure Hunt!
We’ve scattered a patch of seedlings around the site, and it’s up to you to try and find them all.
Decipher the clues and visit the corresponding LibraryThing pages to find a seedling. Each clue points to a specific page right here on LibraryThing. Remember, they are not necessarily work pages!
If there’s a seedling on a page, you’ll see a banner at the top of the page.
You have a little less than two weeks to find all the seedlings (until 11:59pm EST, Wednesday April 30th).
Come brag about your patch of seedlings (and get hints) on Talk.
Win prizes:
Any member who finds at least two seedlings will be awarded a seedling Badge.
Members who find all 12 seedlings will be entered into a drawing for some LibraryThing (or TinyCat) swag. We’ll announce winners at the end of the hunt.
P.S. Thanks to conceptDawg for the spring chicken illustration!
After last week's issue on digital privacy, I thought I'd focus this week on government-sponsored or -enabled surveillance.
As I dug through my store of saved articles, though, I realized I had quite a number of articles about a particular kind of surveillance: camera networks.
These are often municipal-sponsored systems of license plate readers, but there are also networks of private systems—and, of course, attempts to combine the output of all of these networks.
So that is the focus of this week's Thursday Threads issue:
Debate over the privacy concerns and legal challenges of license plate readers is nothing new, as this 2012 article shows.
What happens when you put equipment not meant for the internet onto the internet? A security flaw in Motorola's automated license-plate-recognition systems exposes real-time vehicle data and video feeds online. (2025)
How about we network all of these cameras together? AI-powered surveillance system spurs privacy concerns as adoption grows in U.S. (2023)
If we've got to have this tech, we might as well have some fun with it. Artist's Traffic Cam Photobooth sparks controversy and cease-and-desist over creative use of NYC traffic cameras. (2024)
This Week I Learned: The word "scapegoat" was coined in a 1530 translation of the bible.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Privacy Concerns and Legal Challenges in Rural Virginia's Use of License Plate Reading Cameras
The research for State of Surveillance showed that you can’t drive anywhere without going through a town, city or county that’s using public surveillance of some kind, mostly license plate reading cameras. I wondered how often I might be captured on camera just driving around to meet my reporters. Would the data over time display patterns that would make my behavior predictable to anyone looking at it? So I took a daylong drive across Cardinal Country and asked 15 law enforcement agencies, using Freedom of Information Act requests, to provide me with the Flock LPR footage of my vehicle. My journey took me over 300 miles through slices of the communities those agencies serve, including the nearly 50 cameras they employ. And this journey may take me to one more place: an April Fool’s Day hearing in a courtroom in Roanoke. There, a judge will be asked to rule on a motion to declare the footage of the public to be beyond the reach of the public.
In a detailed exploration of public surveillance, this newspaper editor drove 300 miles across rural Virginia, requesting footage from police of their vehicle captured by license plate reading cameras.
The investigation aimed to understand how often people are recorded by these cameras and the implications of such surveillance.
Despite asking 15 law enforcement agencies for footage, only nine complied while others denied the request, leading to a legal challenge regarding public access to this data.
The editor noted that while driving through various counties, their vehicle was indeed photographed multiple times by Flock cameras, which capture detailed images of vehicles, including license plates and unique identifiers.
The editor also reflected on the ease with which police could track movements without a warrant, emphasizing a shift in expectations regarding privacy in public spaces.
Debate Grows Over Privacy Concerns and Legal Challenges as License Plate Readers Expand Across the U.S.
The scanners can read 60 license plates per second, then match observed plates against a "hot list" of wanted vehicles, stolen cars, or criminal suspects. LPRs [license plate readers] have increasingly become a mainstay of law enforcement nationwide; many agencies tout them as a highly effective "force multiplier" for catching bad guys, most notably burglars, car thieves, child molesters, kidnappers, terrorists, and—potentially—undocumented immigrants. Today, tens of thousands of LPRs are being used by law enforcement agencies all over the country—practically every week, local media around the country report on some LPR expansion. But the system's unchecked and largely unmonitored use raises significant privacy concerns. License plates, dates, times, and locations of all cars seen are kept in law enforcement databases for months or even years at a time. In the worst case, the New York State Police keeps all of its LPR data indefinitely. No universal standard governs how long data can or should be retained.
This is the earliest article I had bookmarked about license plate readers.
The rise of these cameras had led to significant advancements in law enforcement capabilities, particularly in tracking vehicles linked to criminal activity.
It described the effect in Tiburon, California, which was among the first towns to implement cameras that allowed police to monitor all cars entering and leaving the area.
The American Civil Liberties Union raised questions about the lack of regulation surrounding LPR usage and data retention.
Despite the benefits, such as recovering stolen vehicles and identifying suspects, critics highlighted issues like false positives and potential misuse of data.
Those criticisms are still valid today as there has been no comprehensive law on the use of such cameras.
Security Flaw in Motorola's ALPR Systems Exposes Real-Time Vehicle Data and Video Feeds Online
This trove of real-time vehicle data, collected by one of Motorola’s ALPR systems, is meant to be accessible by law enforcement. However, a flaw discovered by a security researcher has exposed live video feeds and detailed records of passing vehicles, revealing the staggering scale of surveillance enabled by this widespread technology. More than 150 Motorola ALPR cameras have exposed their video feeds and leaking data in recent months, according to security researcher Matt Brown, who first publicized the issues in a series of YouTube videos after buying an ALPR camera on eBay and reverse engineering it.
This article is as much about the surveillance possible with these systems as it is about the risks of connecting misconfigured systems open to the public internet.
It discusses a significant security flaw in automated license-plate-recognition (ALPR) systems, particularly those manufactured by Motorola, which exposed real-time video feeds and vehicle data.
One example: in Nashville, an ALPR system captured information from nearly 1,000 vehicles in just 20 minutes.
A security researcher discovered that ALPR cameras were put on the open internet...something it seems they weren't designed for.
This breach does not require any authentication, highlighting the scale of unintended surveillance enabled by these systems.
The data collected includes photographs, license plate information, and metadata such as location and time.
In just a few taps and clicks, the tool showed where a car had been seen throughout the U.S. A private investigator source had access to a powerful system used by their industry, repossession agents, and insurance companies. Armed with just a car’s plate number, the tool—fed by a network of private cameras spread across the country—provides users a list of all the times that car has been spotted. I gave the private investigator, who offered to demonstrate the capability, a plate of someone who consented to be tracked. It was a match. The results popped up: dozens of sightings, spanning years. The system could see photos of the car parked outside the owner’s house; the car in another state as its driver went to visit family; and the car parked in other spots in the owner’s city. Each was tagged with the time and GPS coordinates of the car. Some showed the car’s location as recently as a few weeks before. In addition to photos of the vehicle itself, the tool displayed the car’s accurate location on an easy to understand, Google Maps-style interface.
The previous articles have talked about public sector cameras for use by police.
This article discusses the Digital Recognition Network (DRN), a private surveillance system that allows its users to track vehicles via a vast database of license plate scans.
The system is built from cameras installed by repo men who collect data as they drive.
Users can access detailed information about a car's location history, including timestamps and GPS coordinates, through a user-friendly interface.
While DRN markets itself as a tool for industries like insurance and investigations, concerns arise regarding privacy violations, as the data can be accessed by anyone who pays for it, including private investigators.
(Last week's Thursday Threads included a story about how freelancers on Fiverr will look up anyone for a price.)
Critics argue that this system creates a digital dossier of individuals' movements, raising significant privacy issues.
The technology is legal because it captures publicly visible information, but its widespread use has sparked debates about surveillance and civil liberties.
Kilgore was referring to a system consisting of eight license plate readers, installed by the private company Flock Safety, that was tracking cars on both private and public roads. Despite being in place for six months, no one had told residents that they were being watched. Kilgore himself had just recently learned of the cameras. “We find ourselves with a surveillance system,” he said, “with no information and no policies, procedures, or protections.” The deal to install the cameras had not been approved by the city government’s executive branch. Instead, the Rough Hollow Homeowners Association, a nongovernment entity, and the Lakeway police chief had signed off on the deal in January 2021, giving police access to residents’ footage. By the time of the June city council meeting, the surveillance system had notified the police department over a dozen times.
The first article in this week's Thursday Threads was about Flock's law enforcement division.
But it isn't just police installing the technology.
This article describes the collaboration between a private homeowners association (HOA) and police departments to install license plate readers from Flock Safety.
In Lakeway, Texas, residents were unaware of a surveillance system tracking their vehicles, installed without proper city approval—just an agreement between the HOA and the police chief with no public announcement or comment.
Flock Safety, valued at the time at approximately $3.5 billion, marketed its cameras to over 200 HOAs nationwide, leveraging their substantial budgets and providing police access to private data.
The article also points out incidents of wrongful detentions due to erroneous alerts and highlights the risks associated with these systems.
AI-Powered Fusus Surveillance System Spurs Privacy Concerns as Adoption Grows in U.S. Towns and Cities
Spread across four computer monitors arranged in a grid, a blue and green interface shows the location of more than 50 different surveillance cameras. Ordinarily, these cameras and others like them might be disparate, their feeds only available to their respective owners: a business, a government building, a resident and their doorbell camera. But the screens, overlooking a pair of long conference tables, bring them all together at once, allowing law enforcement to tap into cameras owned by different entities around the entire town all at once. This is a demonstration of Fusus, an AI-powered system that is rapidly springing up across small town America and major cities alike. Fusus’ product not only funnels live feeds from usually siloed cameras into one central location, but also adds the ability to scan for people wearing certain clothes, carrying a particular bag, or look for a certain vehicle.
With the growth of camera networks (public and private), it was only a matter of time before someone tried to link them all together.
The article explores the rapid adoption of Fusus' AI-powered surveillance system.
Fusus connects various existing security cameras into a central hub, allowing law enforcement to access multiple live feeds simultaneously.
The technology also enhances existing surveillance systems with new capabilities like enabling the detection of specific clothing, bags, vehicles, and even transforming standard cameras into automatic license plate readers.
While some communities have embraced Fusus for its potential to improve public safety, others have raised concerns about privacy and the implications of constant surveillance.
The lack of transparency regarding police access to the system and its data analytics has sparked debate among residents and city councils.
Fusus has been marketed as a solution to enhance security, but critics argue it could lead to misuse without proper oversight.
Artist's Traffic Cam Photobooth Sparks Controversy and Cease-and-Desist Over Creative Use of NYC Traffic Cameras
When it debuted this summer, the Traffic Cam Photobooth (TCP) website offered a new twist on the surveillance state by enabling smartphone users to take selfies with New York traffic cams. By October, it had expanded to Georgia, Maryland, Minnesota, and Ireland. TCP was recently featured in an exhibit at Miami Art Week. But the future of the interactive site is uncertain, at least in New York City, where the Department of Transportation has 900-plus traffic cams accessible through the website. Its Office of Legal Affairs recently sent a cease-and-desist letter to Morry Kolman, the artist behind the project, charging that the TCP "encourages pedestrians to violate NYC traffic rules and engage in dangerous behavior."
The Traffic Cam Photobooth (TCP) website, created by artist Morry Kolman, allows users to take selfies with New York City's traffic cameras.
The NYC Department of Transportation—being spoilsports—issued a cease-and-desist letter to Kolman, claiming the site encourages unsafe behavior by pedestrians.
In response, Kolman creatively showcased the cease-and-desist letter using a long pole to photograph it with traffic cameras across Manhattan and Brooklyn.
Kolman views the project as a way to raise awareness about surveillance technologies and how to navigate living under such systems.
The source code is even on GitHub.
This Week I Learned: The word "scapegoat" originated in a 1530 bible translation
Early English Christian Bible versions follow the translation of the Septuagint and Latin Vulgate, which interpret azazel as "the goat that departs" (Greek tragos apopompaios, "goat sent out", Latin caper emissarius, "emissary goat"). William Tyndale rendered the Latin as "(e)scape goat" in his 1530 Bible. This translation was followed by subsequent versions up through the King James Version of the Bible in 1611: "And Aaron shall cast lots upon the two goats; one lot for the Lord, and the other lot for the scapegoat."
—Scapegoat, Wikipedia
Have you stared at a word and suddenly wondered about its origins?
This entry from the New York Times Flashback Quiz had me wondering about "scapegoat".
"scape" — "goat".
Why do we say that?
It comes from a phrase in the bible where a goat is sent into the wilderness on the Day of Atonement as a symbolic bearer of the sins of the people — Leviticus 16:22, to be exact.
The translator coined the term from the interpretation of "the goat that departs" and "emissary goat" in that verse.
What did you learn this week? Let me know on Mastodon or Bluesky.
The OpenForum Academy (OFA) is pleased to announce its roster of Content Partners for the upcoming OFA Symposium 2025. This year’s Symposium will benefit from the perspectives and expertise of three leading organisations in this space as Content Partners: the Open Knowledge Foundation, the Open Source Initiative, and the Digital Public Goods Alliance.
Set to take place at FGV Rio Law School in Rio de Janeiro, Brazil on November 18-19, 2025, the OFA Symposium will address critical questions facing open technology ecosystems under the theme “Open Technology Impact in Uncertain Times.” The event will bring together academics and practitioners – including policymakers, researchers, industry leaders, and civil society representatives – to explore new, interdisciplinary research and ideas.
Why announce new partners for the Symposium?
This year’s event focuses on new understanding and innovative approaches to shared digital challenges in an era of geopolitical shifts, economic instability, and rapid technological progress. Advancing our understanding of this space required new partners who could help the Symposium expand its content and programming, as well as reach new audiences.
By bringing in Content Partners, OpenForum Europe (OFE) aims to grow and institutionalise this conference as a best-in-class, global, and collaborative research effort that supports the advancement of research and scholarship related to Open Source and Open Technologies. Said OFE Executive Director Astor Nummelin Carlberg: “The partnership with these three esteemed organisations reflects our commitment to bringing diverse perspectives and expertise to the Symposium. Their contributions will ensure rigorous, forward-thinking discussions that can inform policy and practice in the Open Source and Open Technology space.”
Let’s talk a little bit more about each of our partners and what they bring to the event.
Open Knowledge Foundation: Building a world open-by-design, where all knowledge is accessible to everyone
OKFN will bring its extensive expertise in open data, open content, and open knowledge to the OFA Symposium, expanding the event’s solicitation of research from different corners of the Open Technology Ecosystem. They will also contribute their expertise in network building to help us build a new model for the OpenForum Academy itself.
Renata Avila, CEO of the Open Knowledge Foundation, welcomes the announcement. “I encourage everyone to come to Brazil and join the OpenForum Academy Symposium 2025. This conference is becoming a cornerstone for the open movement; let’s create hard evidence and vital primary research together to make the case for Open Technologies. Equipped with facts, we can shift the tech landscape and guide policymakers to the right decisions for people, the economy, and the planet.”
Open Source Initiative: Strengthening collaborations with legal experts, policymakers, and maintainers, as well as addressing the complexities of the Open Source Definition in the age of AI
The Open Source Initiative (OSI), the stewards of the Open Source Definition, has existed for over 25 years. It remains actively involved in Open Source community-building, education, and public advocacy to promote awareness and the importance of Open Source software. As a leading organisation in the Open Source space, OSI participates in conferences and events, and meets with developers and users to discuss the economic and strategic advantages of Open Source technologies, licences, and models.
Through its collaboration with the OFA Symposium, OSI will contribute its deep understanding of Open Source licensing, governance models, and community development to the agenda, and ensure that its diverse community submits research on the important work they are doing. OSI’s participation will help ensure a focus on sustainable Open Source ecosystems and the evolving relationship between Open Source and emerging technologies like AI.
Says Nick Vidal, Community Manager for OSI, commenting on the value of the event: “The OFA Symposium is a unique event that brings together voices across academia, industry, and civil society. We’re honoured to support it as a Content Partner, helping to foster the cross-sector dialogue essential to unlocking the full potential of openness for the public good.”
Digital Public Goods Alliance: Unlocking the Potential of Open Source Technologies For a More Equitable World
The Digital Public Goods Alliance (DPGA) joins as a content partner to highlight the importance of digital public goods in addressing global challenges. They steward the digital public goods (DPG) definition, which recognises DPGs as “… open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable laws and best practices, do no harm, and help attain the Sustainable Development Goals (SDGs).”
The DPGA will help the Symposium to facilitate discussions on international cooperation, equitable access to technology, and the role of open solutions in achieving the Sustainable Development Goals. The DPGA is excited to advance its foothold in the research space, collaborating with global partners to support and amplify research that advances understanding of the sustainability and impact of DPG projects – open source technologies and communities which support social impact around the world.
Liv Marte Nordhaug, CEO of the DPGA Secretariat, embraces the enthusiasm driving the event and is excited to support the initiative. “Research on DPGs is crucial for an evidence-based approach that drives real impact, and the Digital Public Goods Alliance is pleased to support the OpenForum Academy Symposium as a content partner!”
How to get involved
Thanks to all our Content Partners for helping to shape the agenda for this upcoming event. The Call for Proposals for the OFA Symposium 2025 in Rio de Janeiro, Brazil will remain live until 1 June, so do not delay in preparing your submissions. More information on submission requirements and deadlines is available on our website, over at https://symposium.openforumeurope.org/.
If you are interested in sponsoring the Symposium, please reach out to OFE’s Senior Policy Advisor, Nicholas Gates, at nicholas@openforumeurope.org.
NIRDS Lab and WS-DL Group Students' Research Presentations and Demonstrations
The day featured hands-on demonstrations of ongoing projects, including groundbreaking work in eye-tracking research by students Lawrence, Yasasi, Bhanuka, Kumushini, and James.
Lawrence presented his research on human-AI teaming, highlighting how electrodermal responses, heart rate, and speech data can be integrated using custom-built sensors—including the Rosetta Stone and various bracelet sensors—for real-time data collection. The ultimate aim of his work is to integrate these real-time data streams with AI models to infer the emotional state of a human team member during collaborative tasks.
Kumushini introduced the Project Aria glasses during her demonstration, explaining the diverse sensors integrated into the glasses and their respective capabilities. She detailed how data recording is achieved through the glasses and introduced the companion applications designed to streamline data collection. Furthermore, she described how the NIRDS Lab team employs Project Aria glasses for eye-tracking research, with a focus on studying joint visual attention—that is, how multiple people focus on similar objects or areas within shared environments. This research involves using several time-synchronized Project Aria glasses during user studies.
Yasasi presented her eye-tracking research focused on measuring visual attention in gaze-driven virtual reality (VR) learning environments, using the Meta Quest Pro VR headset. She demonstrated a VR learning application developed using Unity that features gaze-driven content rendering. In the application, virtual learning materials appear within the user’s field of view when they fixate on specific areas of interest (AOIs) and remain visible only so long as the user’s gaze stays on the designated AOI. As part of her study, Yasasi collects eye-tracking data, which she then analyzes to assess visual attention.
Bhanuka showcased his work on distributed eye tracking for online collaborations (DisETrac). This study presents a flexible system for capturing multi-user eye-tracking data over public networks, aggregating this data, and generating both individual and multi-user eye-tracking measures and visualizations. This innovative approach allows researchers to obtain naturalistic eye-tracking data in virtual collaboration settings—areas where traditional eye-tracking tools have often been limited.
James demonstrated an API for RAEMAP (Real Time Advanced Eye Movement Analysis Pipeline), which calculates advanced gaze metrics from raw eye-tracking data (specifically, x and y coordinates and timestamps). While RAEMAP was originally developed by Gavindya Jayawardena, earlier implementations required hard-coded parameters for eye trackers, datasets, and algorithm options. James’s research involves converting RAEMAP into a FastAPI application hosted in the cloud and accessible via Swagger UI. His work also includes the development of machine learning models to predict cognitive load from advanced eye-gaze metrics, with models trained using workload scales such as NASA-TLX.
Collectively, these projects explore how gaze patterns can provide insights into cognitive load, attention shifts, emotional states, and user interaction. The research reflects technical rigor and deep interdisciplinary thinking, merging computer science, psychology, and design to solve real-world challenges.
Dr. Michael Herzog explored how these insights might translate to adaptive learning systems and accessible technologies for individuals with disabilities, and sparked ideas for future joint research between Old Dominion University and Magdeburg-Stendal University.
Visit to Hampton Roads Biomedical Research Consortium (HRBRC)
During the tour, Dr. Michael Herzog met with researchers and technicians working on next-generation medical devices, biomedical 3D printing, and human systems integration—highlighting the rich ecosystem of applied research in the Hampton Roads area. Patrick Ball demonstrated the use of the HRBRC 3D printing lab.
Cultural exchange was woven throughout the visit. A highlight of the day included a traditional Iranian dessert generously prepared by Dr. Faryaneh Poursardar, offering a reminder of the shared humanity behind global scholarship. This gesture sparked warm conversations that extended beyond research—creating moments of connection that celebrated cultural diversity, hospitality, and mutual appreciation. These personal interactions added depth to the academic exchange, reinforcing a spirit of respect, curiosity, and collaboration.
Thank you Dr. Poursardar @Faryane for the delicious traditional Iranaion desert! Taste Awesome. I think it got saffron, rice, and some other stuff. Maybe @faryane can answer if you have any questions. pic.twitter.com/IFA5prC13a
Dr. Michael Herzog's visit served not only as an opportunity to celebrate shared achievements, but also to identify future directions for student exchanges, joint publications, and collaborative grant initiatives.
We extend our heartfelt thanks to Dr. Michael Herzog for his continued partnership and support, and to our students, collaborators, and faculty whose passion and hard work made this visit an outstanding success.
About the Authors:
1. Lawrence Obiuwevwi is a Ph.D. student in the Department of Computer Science, a graduate research assistant with the Center for Mission Engineering, and a proud student member of The Storymodelers, The Web Science and Digital Libraries (WS-DL) Research Group, and Nirds Lab at Old Dominion University.
2. Kumushini Thennakoon is a Ph.D. student in the Department of Computer Science and is affiliated with The Web Science and Digital Libraries (WS-DL) Research Group and Nirds Lab at Old Dominion University.
Lawrence Obiuwevwi
Graduate Research Assistant
Virginia Modeling, Analysis, & Simulation Center
Center for Mission Engineering
Department of Computer Science
Old Dominion University, Norfolk, VA 23529
Email: lobiu001@odu.edu
Web: lawobiu.com
Community voting is now open for the 2025 DLF Forum and Learn@DLF! Community voting lets DLF and the Program Committee know which proposals resonate with our community. Results are weighed when developing the final event programs. Anyone may participate, and you may vote for as many proposals as you’d like, but only once for each proposal.
You’ll be asked to enter your email address. Email will only be used to ensure that each person votes just once, then will be de-coupled from the votes themselves.
Click the +Add button under each event name to select your favorites for each event.
Last Tuesday Cliff Lynch delivered an abbreviated version of his traditional closing summary and bon voyage to CNI's 2025 Spring Membership Meeting via Zoom from his sick-bed. Last Thursday night he died, still serving as Executive Director. CNI has posted In Memoriam: Clifford Lynch.
Cliff impacted a wide range of areas. The best overview is Mike Ashenfelder's 2013 profile of Cliff Lynch in the Library of Congress' Digital Preservation Pioneer series, which starts:
Clifford Lynch is widely regarded as an oracle in the culture of networked information. Lynch monitors the global information ecosystem for cultural trends and technological developments. He ponders their variables, interdependencies and influencing factors. He confers with colleagues and draws conclusions. Then he reports his observations through lectures, conference presentations and writings. People who know about Lynch pay close attention to what he has to say.
Lynch is a soft-spoken man whose work, for more than thirty years, has had an impact — directly or indirectly — on the computer, information and library science communities.
Below the fold are some additional personal notes on Cliff's contributions.
Ashenfelder notes Cliff's focus on collaboration:
Lynch is also a catalyst for action. He helps steer the conversation toward real results, such as standards creation, funding, tool development, metadata creation and interoperability. Ultimately, Lynch seems most fervent about collaboration as a crucial force.
“I would be reluctant to attribute much of anything just to my actions,” he said. “Most important successes come through the work of a lot of different people, collaborating and pulling it together. Maybe I can think of a place or two where there was a meeting that I spoke at or convened or I wrote or did something that just happened to fall at a pivotal moment. But any of that to me feels a bit accidental, at best just good luck, being in the right place at the right time.”
Michael Nelson and Herbert Van de Sompel's Cliff Lynch: The Invisible Influencer in Information Infrastructure provides an in-depth account of one occasion when Cliff was "in the right place at the right time" to spark a collaboration. The occasion was an October 1999 meeting in Santa Fe:
In order to further optimize the chances of success for the meeting, the collaboration of Cliff Lynch and Don Waters as moderators had been secured and turned out to be fundamentally important. In the Acknowledgments section of his PhD thesis, Herbert put Cliff’s impact on the direction of the meeting and on his own thinking as follows:
When starting to work on this thesis, I went back reading several of his early papers and could not feel other than intimidated by the far forward-looking vision expressed therein. At several occasions, I heard Cliff address large audiences, discussing complicated digital library matters with an amazing clarity. Cliff's work has always been a great inspiration to me. I met Cliff for the first time in person at the Open Archives meeting in Santa Fe, for which he had enthusiastically accepted my invitation to serve as a moderator. His involvement was crucial to the successful conclusion of the meeting.
...
Prior to the start of the second day, he vented his frustration about the lack of progress to Cliff, who was about to start moderating the first session. Cliff was nice enough to let him ramble on a bit, and, in a manner that exemplified one of Cliff’s many unparalleled capabilities, he went on to open the meeting by providing two discussion topics regarding interoperability that he somehow had been able to synthesize from the first day’s discussions, which most had experienced as enjoyable yet lacking in any sense of concrete direction. One was whether archive functions, such as data collection and maintenance, should be decoupled from user functions, such as search. The other was about the choice between distributed searching across repositories and harvesting from them to build cross-repository search engines.
The meeting solidified the long and productive collaboration between Van de Sompel and Nelson.
But easily the best way to understand how Cliff worked is the Report of ANADP I from 13 years ago. Cliff's "Closing Thoughts" are transcribed verbatim starting on page 309, and they are a wonderful example of his ability to summarize a meeting and set the agenda for future work with an extemporaneous address. You have to read all twelve pages — it is hard to summarize Cliff's summary, but here are a couple of gems:
With a portfolio of aligned strategies, we can collectively speak more effectively about the importance of the work we do, and certainly that has come up in a background way again and again as we’ve spoken about economics, education, about legal issues and barriers. I think that this question of really clarifying the fundamental importance of digital preservation to maintaining the cultural and intellectual record, the memory of our nations and of the world, has got to be a central objective. We have a great challenge in educating both the broad public in our nations and the governments that represent these publics; to the extent that we can align strategies we can make that case better.
And:
There are two words that I didn’t hear in the technical discussions. I get very scared whenever I hear a lengthy discussion of technical issues in digital preservation that doesn’t mention these two words. The first is Monoculture. There is a possibility, a danger, of doing too much alignment here. The reason for that is the second word that I didn’t hear, which is Hubris. We need to acknowledge that we don’t really know how to do long-term digital preservation. We’re going to have a lot more confidence that we know what we’re doing here about a hundred years from now as we look at what efforts actually brought data successfully a hundred years into the future. But in the relatively early stages of technologies like these, it’s much easier to identify failures than long-term successes.
Active on many advisory boards and visiting committees throughout his career, including serving as co-chair of the National Academies Board on Research Data and Information (BRDI) from 2011-16, Lynch’s contributions were recognized with numerous awards, including the American Library Association’s Joseph W. Lippincott Award, the American Society for Information Science and Technology’s Award of Merit, and the EDUCAUSE Leadership Award in Public Policy and Practice. He was a past president of the American Society for Information Science, and a fellow of the Association for Computing Machinery, the American Association for the Advancement of Science, and the National Information Standards Organization.
Vicky's and my tributes to Cliff are in three recent blog posts:
Cliff made a very valuable contribution to my career by inviting me to debug important talks before I gave them "for real" by giving early versions to the Information Access Seminar at U.C. Berkeley's School of Information. Fortified by tea and cookies, I could talk and then be subject to detailed and insightful questioning by Cliff and the participants. I always meant to take notes during the questioning but I was so into it I never did. I had to re-construct the necessary changes to the talk from memory.
This search returns ten talks I gave to the Information Access Seminar. In date order they are:
LibraryThing is pleased to sit down this month with screenwriter, playwright and novelist Blair Fell, two-time winner of the Doris Lippman Prize in Creative Writing from the City University of New York for his novels, The Sign for Home (2022) and the brand new Disco Witches of Fire Island (2025). The Sign for Home, his debut, was both an Indies Next and Indies Introduce book, as well as being selected for library community reads, and long-listed for the Center For Fiction’s First Book Prize. Fell has written for television and theater, winning the Shine Award for his work on the television program Queer As Folk, and a Golden Mic award for his segment on the public television series California Connected. He is the author of dozens of plays, and has won the HX Camp comedy award, seven Dramalogue awards, and The Robbie Award. His essays have appeared in magazines and on websites such as Huffington Post, Out Magazine, New York Daily News, and Fiction Southeast. In addition to his career as a writer, actor and director, he has been an ASL interpreter for the Deaf since 1993. His second novel, Disco Witches of Fire Island, an LGBTQ+ fantasy romance featuring a coven of witches on Fire Island, is due out from Alcove Press in early May. Fell sat down with Abigail to answer some questions about his new book.
Disco Witches of Fire Island opens in 1989, and features a young hero who has recently lost his boyfriend to the HIV/AIDs epidemic, and who goes to spend the summer on New York’s Fire Island. How did the story first come to you? Did the character of Joe appear first, was it the idea of a young man who had recently lost his boyfriend, or was it something else?
Oddly enough, the character of Joe came to me last, since he is the one that mirrors me, but isn’t really me. He was definitely the most difficult character to create. It’s hard to fully see oneself, so I created a character that experiences much of what I had experienced at that age but probably is a bit more likable than me and slightly taller. (Haha)
As far as the rest of the characters, so many of them are amalgams of people I met during the height of the AIDS Crisis. My first partner died from complications due to the HIV virus while we were both still in our twenties. To complicate matters he had broken up with me two years prior, and I was still very much in love with him. Needless to say this was an extremely difficult thing to get over. In its aftermath, there was a series of life-altering events, including getting fired from a job, and then a whirlwind last-minute trip to China where I decided to be a writer. It was just after that trip when I moved to Fire Island Pines and landed a job as a bartender, and moved into the attic of those quirky “old” gay men (just as Joe, the main character, does). They were a hoot, and there was lots of drama. They’d play old disco all day, cook illicit substances on the stove, and (one of them) would make huge ornate hats to go out dancing in the wee hours. These men became like witches in my mind. So really the witches, and some other characters came to me first, because I had people to model them after.
Your book unfolds during a period of historic significance for the LGBTQ+ community. How did this inform the way you told the story, and what do you think readers of today can learn from these events?
I moved to NYC around 1988, and was trying to figure out my life, and get over that broken heart. It felt like everyone was dying or sick at the time (and a huge percentage of them were), and I had a sense of absolute helplessness. At that point I attended my first gay pride parade and saw ACT UP (The AIDS Coalition to Unleash Power) marching. I couldn’t believe there were people trying to fight the disease and government inaction. I left the sidelines of the parade and joined. (It was also at that parade I coincidentally saw my first lover for the last time – he is the person who would become “Elliot” in the novel.) Getting involved in activism completely changed my life.
I wanted to capture that shift from victim to actor in the fight. I also wanted younger people to know what it was like at that moment of history, when looking for love could be so fraught. Sadly, we are at another terrible moment in our history, and the book, despite being an historical romance of sorts, very much speaks to what we as a nation – and more specifically – what we as members of the queer community are facing now. It names the Great Darkness of hatred, and suggests that when a malevolent force like our current government is working against you, sitting in the despair of the oppression is not the solution… action is, whether that means protesting, donating, volunteering, making art and most importantly banding together. As several of the manifesto quotes in the book suggest, when confronted with the Great Darkness, the only solution is collective action… and to keep dancing.
Did you always know your story was going to feature witches? What does magic allow you to do, from a storytelling perspective, that couldn’t be accomplished otherwise?
One of the first inspirations for the book were those older roommates of mine on Fire Island, and how they suggested these lovable, quirky witches — cooking mysterious things on the stove, dressing in outlandish costumes, whimsical and sometimes mysterious references to things I didn’t understand. The other reason for the magic is to underline all those magical beings we lost due to the AIDS crisis and government inaction.
The world was very dark – and it feels that way again. The book is about getting one’s magic back in the face of that darkness. The magic in the book isn’t the wave-a-wand-and-go-poof sort of magic. It’s a type of magic rooted in the connection between lovers and friends – it’s a collective magic, that only comes from group effort. The use of magic allowed me to emphasize the other worldly quality of connection and put a button on the “otherness” of being queer.
Another inspiration for the book was from a late friend, Stephen Gendin, whom I met in ACT UP. He had once told me that he had a hope to create a “religion” based on the transcendence he experienced on the dance floors of gay dance clubs. This always stuck with me. So, yes, the witches in the book do have some limited magical abilities – especially when they are in unity with their fellows – but their practice is more of a spiritual nature and comes with its own “bible” of sorts, The Disco Witch Manifesto, which is quoted at the beginning of every chapter.
What made you choose Fire Island as the setting for your story? Have you spent time there yourself?
Like I mentioned, I had spent that one summer working in Fire Island Pines as a bartender in the early 1990s. I also did visit for several summers after that. Though I tend to be much more of a Ptown sort of guy these days – I like biking and the ability to leave without the benefit of a boat. Though P-Town has become more and more unaffordable. We need NEW gay meccas where the queer artists, writers and witches can afford to go.
You write in a number of different genres, from essays to plays. What distinguishes the process of writing novels? Are there particular challenges or rewards?
I never even dreamed of writing a novel when I first started writing. That was way too big for me. But now looking back, I probably should have started much earlier. My first go at a full-length play was a serialized story where the audience would have to come back to the theater twelve times to see the whole thing. You read that right – twelve times. I think I always wanted to take my time with a story. I also thought I needed actors to make my writing good. With novels I arrived very, very late to the game and sort of accidentally found my way to my first novel. What happened was, I had an idea for a play and sat down to write it, but it just didn’t want to be a play. It wanted to be a novel. I was at a point in my life where I had nothing to lose, and I just faked it, one chapter after the next. I’d bring it into my writing group, and then after a few years, finished it, sent it to an agent, and then after a few revisions, he took it and sold it. It appeared I was able to write novels, and now I don’t want to do much else. I love the long journey of them, the surprises, the creation of worlds, and multiple characters.
A play or a TV show is inherently a collaborative process, and you also need to wait around for others to bring the project to fruition. With a novel, I get to say when and where the important work happens, and that’s a more comfortable place for me – especially since I’m not at all patient.
What is next for you? Are you working on more novels, or more plays? Do you think Disco Witches of Fire Island will ever be adapted in film?
Well, I certainly would love to see Disco Witches of Fire Island get adapted. I think it would be a great limited series as well. I do love writing essays and memoir, but I still have the novel-writing bug, so I’m probably sticking with that for the time being. We shall see. I don’t think there will be more plays or TV anytime soon, but I’ll never say never.
As far as books go, I’m currently working on two new novels, one of which, a pansexual Elizabethan romance, is out there being read by editors as we speak, while the fourth is just starting to make an appearance in my Scrivener software, but I’m torn about which of two ideas I want to live with for the next few years. Starting something new is never easy, especially with the distractions of this messed up world in which we’re living, but I’m willing to knuckle down and do the grind. It’s all about throwing down words and separating the shit from the sparkles.
Tell us about your library. What’s on your own shelves?
At Union College Schaffer Library, the digitization lab is mostly staffed by undergraduates who only work a handful of hours a week. While they do a great job, the infrequency of their work hours and lack of experience results in errors in digitization and metadata. Many of these errors are difficult to catch during quality control checks because they are so minute, such as a miscounted page number here, or a transposed character in a filename there. So, a Computer Science student and a librarian collaborated to create a quality control automation application for the digitization workflow. The application is written in Python and relies heavily on the Openpyxl library to check the metadata spreadsheet and compare metadata with the digitized files. This article discusses the purpose and theory behind the Quality Control application, how hands-on experience with the digitization workflow informs automation, the methodology, and the user interface decisions. The goal of this application is to make it usable by other students and staff and to build it into the workflow in the future. This collaboration resulted in an experiential learning opportunity that has benefited the student's ability to apply what they have learned in class to a real-world problem.
The Dublin Core Metadata Initiative has published a minimally constrained vocabulary for the concepts of Work, Expression, Manifestation and Item (WEMI) that can support the use of these concepts in metadata describing any type of created resources. These concepts originally were defined for library catalog metadata and did not anticipate uses outside of that application. Employment of the concepts in non-library applications is evidence that the concepts are useful for a wider variety of metadata users, once freed from the constraints necessitated for the library-specific use.
Since the 17th century, scientific publishing has been document-centric, leaving knowledge—such as methods and best practices—largely unstructured and not easily machine-interpretable, despite digital availability. Traditional practices reduce content to keyword indexes, masking richer insights. Advances in semantic technologies, like knowledge graphs, can enhance the structure of scientific records, addressing challenges in a research landscape where millions of contributions are published annually, often as pseudo-digitized PDFs. As a case in point, generative AI Large Language Models (LLMs) like OpenAI's GPT and Meta AI's LLAMA exemplify rapid innovation, yet critical information about LLMs remains scattered across articles, blogs, and code repositories. This highlights the need for knowledge-graph-based publishing to make scientific knowledge truly FAIR (Findable, Accessible, Interoperable, Reusable). This article explores semantic publishing workflows, enabling structured descriptions and comparisons of LLMs that support automated research insights—similar to product descriptions on e-commerce platforms. Demonstrated via the Open Research Knowledge Graph (ORKG) platform, a flagship project of the TIB Leibniz Information Centre for Science & Technology and University Library, this approach transforms scientific documentation into machine-actionable knowledge, streamlining research access, update, search, and comparison.