The #TeslaTakedown protests have gone on long enough that it is time to make a public archive of signs.
For the latest signs, I've included a link where you can download a PDF to print your own.
Please use these if you'd like; if you want to give me something in exchange, just tag me on Mastodon or Bluesky so I know how far these have spread.
Also, Marc Lee from Free Protest Signs reached out on Bluesky to let me know about his website of signs.
If you don't like something below, maybe one of his will suit your mood!
All of the ABOVE!
All of the ABOVE! protest sign, first used on 26-Apr-2025
The meanness, the illegality, the stupidity... it is all more than I thought possible, and it is certainly not what we deserve from our government.
And it is not just one of these attributes, but all of them coming from all of this administration's elected, confirmed, and senior leaders.
Download and print your own 26" by 16" version of this sign.
Elected Assholes
Elected Assholes protest sign, not used by the author
This crap is well past getting out of hand, and I wanted a sign that reflected that.
The government—in my name as one of its citizens—is deporting people without due process?
It is bullying foreign leaders in the Oval Office?
It is recklessly dismantling medical research, food safety programs, and environmental controls?
This doesn't represent my values, nor—I'd wager—the values of most of the country.
The focus group (my family members) wasn't a fan of the unnecessary crassness of the sign.
I want to do something with the concept of "My GOVERNMENT did WHAT?!?", so I'll probably revisit this.
Disaster Musk
Disaster Musk protest sign, first used on 12-Apr-2025
A few weeks ago, I saw the picture of a smirking Elon Musk in a New York Times article, and I knew I needed to make use of it somehow.
Inspiration struck this week when I remembered the 'Disaster Girl' meme.
The picture of a four-year-old girl with a "devilish smirk" looking back at the camera as firefighters battle a house fire is a perfect fit for what Elon is doing to our federal government.
Download and print your own 26" by 16" version of this sign.
Get Angry at Billionaires
Get Angry at Billionaires protest sign, first used on 5-Apr-2025.
Returning to the theme of my first protest sign, I wanted to convey that the people giving us the middle finger as they drive by have more in common with us than they do with the billionaires they are supporting.
Or, if you still want to be up in arms with me, then just come stand for an hour at the protest and get your free check from George Soros for protesting. (← sarcasm)
Download and print your own 26" by 16" version of this sign.
Signals
Signals protest sign, first used on 29-Mar-2025.
This was the week that the news broke about senior government officials using the consumer-grade chat app Signal to discuss war plans.
I went off-script that week with a sign about that political nonsense.
It is a play on the phrase "The call is coming from inside the house!" — a famous movie trope in which the police tell the person in the house that they have traced the antagonist's call to that very house.
In this case, the danger to democracy is coming from inside the White House!
Or, at least, that is what I was aiming for.
Download and print your own 26" by 16" version of this sign.
Elon MUSKed Up
Elon MUSKed Up, first used on 22-Mar-2025
Back to basics, I thought.
People are driving by quickly, so too much text won't be read.
So this was the idea:
Set the context: "Our GOVERNMENT was FINE."
Deliver the punchline: "Now it is MUSKed UP!"
Clear call-to-action: "FIRE ELON!" (in a flaming font, no less)
And that seemed to work.
This might have been my best sign yet.
Download and print your own 26" by 16" version of this sign.
DEMOCRACY, not MUSKocracy, not TRUMPocracy
DEMOCRACY, not MUSKocracy, not TRUMPocracy protest sign, first used 15-Mar-2025.
This was my second protest sign, and I'm clearly still working on the craft.
Although the points of this sign didn't require a blog post to explain, it still had too many words on it.
That made it hard to read from cars that were driving by.
It might have been easier to read without the "Kings and Oligarchs are not American" line in the middle, but without it I thought it lost its punch.
I'm feeling pretty ambivalent about it, so I haven't gone through the process of making it available for download; let me know if you'd like to have a printable version.
Our Fellow Americans
Our Fellow Americans protest sign, first used on 8-Mar-2025.
This is the first sign I made for a #TeslaTakedown, and I should have listened to my family.
They suggested that the initial version, without the "How much do you have in common with Elon Musk?" at the bottom, was too confusing.
Adding that sentence improved understanding, but now there was too much to read in a protest sign for cars whizzing past.
My point was that the person driving by giving me the middle finger and I have far more in common with each other than either of us has with Elon Musk (and Donald Trump).
Even so, I felt like I needed a blog post to fully explain what I meant.
I'm feeling pretty ambivalent about this one, too, but if you'd like a printable version please let me know.
Although this was my first sign, it was my second #TeslaTakedown protest.
I learned quickly that signs are an important part of the protest spirit, and the more creative the better.
This is an excerpt from a contribution I made to Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools [pdf]. It is a sketch only, and somewhat informal, but I thought I would put it here in case of interest. It is also influenced by the context in which it was prepared which was a discussion of the informational disciplines and the iSchool. In the unlikely event you would like to reference it, I would be grateful if you cite the full original: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward, Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools. University of Washington Information School. [pdf]
Introduction
The word information has been used so much that it has come to dominate discourse (Day, 2001). […] Vagueness and inconsistency are advantageous for slogans and using “chameleon words” that assume differing colors in different contexts allows flexibility for readers to perceive what they wish. Buckland, M. (2012). What kind of science can information science be? Journal of the American Society for Information Science and Technology.
I should like to draw an analytical distinction between the notions of “information society” and “informational society,” with similar implications for information/informational economy. The term “information society” emphasizes the role of information in society. But I argue that information, in its broadest sense, e.g. as communication of knowledge, has been critical in all societies […]. In contrast, the term “informational” indicates the attribute of a specific form of social organization in which information generation, processing, and transmission become the fundamental sources of productivity and power because of new technological conditions emerging in this historical period. My terminology tries to establish a parallel with the distinction between industry and industrial. An industrial society (a usual notion in the sociological tradition) is not just a society where there is industry, but a society where the social and technological forms of industrial organization permeate all spheres of activity, starting with the dominant activities, located in the economic system and in military technology, and reaching the objects and habits of everyday life. My use of the terms “informational society” and “informational economy” attempts a more precise characterization of current transformations beyond the common-sense observation that information and knowledge are important to our societies. Castells, M. (2009). The Rise of the Network Society, With a New Preface: the Information Age: Economy, Society, and Culture Volume I (2nd ed)
I look very briefly at our senses of information itself as it is the context for the discussion about informational disciplines to follow. It is certainly a ‘chameleon’ word. It has become so widely used as to become drained of specificity unless explicitly qualified in particular circumstances.
Raymond Williams’ Keywords does not include an entry for ‘Information,’ which is telling. This influential work gave rise to multiple subsequent collections which aim to update it or adapt it to particular domains. I quote from the entries on ‘information’ in three of these here:
Information as keyword — digital or otherwise — did not exist before the twentieth century. […] Then, unexpectedly, in the 1920s this formerly unmarked and unremarkable concept became a focal point of widespread scientific and mathematical investigation. ‘Information’ by Bernard Geoghegan in Peters, B. (Ed.). (2016). Digital Keywords.
Toward the end of the C20 “information” became a popular prefix to a range of concepts that claimed to identify essential features of an emerging new sort of society. The information explosion, information age, information economy, information revolution, and especially information society became commonplace descriptions (Castells, 1996-8; Webster, 2002). These covered, and tried to conceive, disparate phenomena, perhaps unwarrantedly. The concepts appeared superficially to capture similar phenomena, yet on closer inspection centered often on quite different things. For example, their concern ranged over a general increase in symbols and signs that accelerated from the 1960s (the information explosion); the development of information and communications technologies, especially the Internet (the information superhighway, reputedly coined by US vice-president Al Gore); the increased prominence of information in employment (information scientists, information labor, information professions); the growing significance of tradable information (information economy); and concerns for new forms of inequality (the information divide, the information rich/poor). ‘Information’ by Frank Webster in Bennett (2005) New Keywords: a Revised Vocabulary of Culture and Society.
As burgeoning use of information in preference to related terms encroaches on the word’s surrounding lexical field, questions arise as to how everything from the human genome to celebrity gossip can so readily be referred to as information. ‘Information’ in MacCabe, C., & Yanacek, H. (2018). Keywords for today : a 21st century vocabulary : the keywords project.
Geoghegan notes the relatively recent general use of the word, and is primarily interested in the information-theoretic work of Shannon and others in the mid-20th century which was important for the development of telegraphy and in subsequent years of computing, cryptography, genetics, network theory and other areas (although it was ultimately not very influential in Library Studies or related fields, as Hjørland (2018) points out). The other two excerpts emphasise growing use of the word throughout the latter half of the 20th Century. Webster (a sociologist who has specialized in information-related topics and has written about public libraries) notes the way in which it became attached to various generally descriptive labels, notably of course in the ‘Information society’ and for our purposes ‘Information science.’
The quotes above underline the strong emergence of information-related issues as a topic of investigation, and as an explanatory framework in different contexts. Given this widespread use, any account of information is also going to be provisional and contextual.
As background here, I sketch a very schematic overview of information history which departs from the W. Boyd Rayward (2014) account which influenced it.
This may seem a little hubristic, but I am prompted to insert it here by the common assertion in iSchool materials – accompanied by such phrases as the ‘knowledge economy’ or the ‘information age’ – that the importance of information in our world elevates the work of the iSchool.
However, as Castells notes, this “common-sense observation that information and knowledge are important to our societies,” is not in itself very revealing.
Information has also gone beyond the bounds of any one subject. The chemist or the cultural geographer or the sociologist has an informational perspective. In this context, it seems to me, the promise of the iSchool is not that it has specialist or unique expertise, but that it can bring together a historical perspective and a multidisciplinary focus.
For reasons that should be clear, I do not attempt to define information here. See Bates (2010) for an exhaustive review of definitions, which, more than anything, suggests why a singular view is unlikely.
For convenience, I reference Michael Buckland’s (2017) pragmatic and functional account.
Information as thing. Informational artifacts – books, passports, menus. Broadly synonymous with Buckland’s inclusive view of ‘document.’
Information as process. Becoming informed.
Information as knowledge. Intangible. Fixed in ‘things.’ Imparted in the informing process.
In their reflective overview of information definitions in IS, Dinneen and Bauner (2017) note that “Buckland was aware that the overall account was likely to disappoint the pickiest of theorists.” While a more conceptual characterization—such as Bateson’s "a difference that makes a difference" or Bates’ "the pattern of organization of matter and energy"—might offer additional nuance or insight, it is less well-suited to my purposes. Dinneen and Bauner (2017) favor the recently influential work of Floridi, which also, incidentally, is highlighted in the important information science textbook, Bawden and Robinson (2022).
I reference it mostly because its somewhat technocratic emphasis is convenient. Much of the emphasis of library studies or information science is indeed on the recorded information that can provide a part of the material base for some of the more abstract or general uses above. And also partly because Buckland is such an interesting and historically influential figure in this discussion (librarian, leading Information Science theorist and practitioner, central player in the iSchool movement at Berkeley (Buckland, 2024)).
I take the pivotal mid- to late twentieth Century as a starting point. As noted above, information was foregrounded in a variety of ways at this time, and terms such as ‘information science’ and ‘information society’ emerged. I refer to this as the short age of documents, a reference to the discourse around the challenge of managing recorded information.
Then, we have the long period before this, in which recorded information was manifest in print or manuscript forms. I refer to this as the long age of literacy and print.
And third, we have the short period after this, which we are now living through. While we can characterize this as a digital or network age, the more interesting point here is that an informational perspective becomes more common, extending to social and political contexts. Modern institutional constructs -- markets or bureaucracies for example -- can be seen in informational terms. I refer to this as the informational age (influenced by Castells’ characterization of the current period).
Furthermore, it is now common to reinterpret the past through an informational lens. Perhaps the most interesting recent example of this is Yuval Noah Harari’s ambitious Nexus, which was published recently to mixed reception. He tends to see any ‘intersubjective reality’ as informational. He takes a long historical view, discussing developments as stages in the emergence of information networks. For example, he discusses the difference between democratic and totalitarian regimes as a difference between self-correcting and closed information networks. He talks about civilizations as combinations of a managerial and operational bureaucracy and a legitimating or imposed mythology, each again very much an informational apparatus. He reinterprets the past in terms of an informational present; here he is, for example, talking about the impact of printing: “print allowed the rapid spread not only of scientific facts but also of religious fantasies, fake news, and conspiracy theories.”
In this period also there is a strong emphasis on information critique, and with the advent of AI, perhaps, as I will suggest, we are seeing the apotheosis of the document.
The long age of literacy and print
The emergence of the document marked an important early transition.
Writing allowed thought to be externalized, fixing information in a medium that could persist across space and time. In an oral culture, knowledge had to be remembered. Mnemonic techniques and repetition aided memory. It was retained through song, story and ritual.
As Walter Ong and others have argued, the external documentary accumulation of knowledge co-evolved with a profound shift in the structure of thought, knowledge and communication, evident in the development of more abstract and systematic forms of thinking, the emergence of formal learning and scholarship, the development of laws, the codification of expertise, and so on. This intensified after the invention of movable type.
From then until the Second World War, say, information exchange was dominated by the production and exchange of print. Infrastructures and institutions emerged which helped create, manage and preserve documents, including libraries, archives, scholarly societies, publishers, commercial distribution mechanisms, and so on.
Libraries are a strongly institutionalized response to the print distribution model, where information or cultural resources were expensive or available in limited ways. Efficient access required the physical proximity of collections to their potential readers, and libraries built local collections to make them conveniently available within their communities. In this way, collection size and transaction volumes became a signifier of quality. Those interested in technical, scholarly, cultural or other documents built their workflow around the library.
So, while at a material or mechanical level, we see the progressive intensification and amplification of the production and exchange of documents, a greater variety of ways of processing information, and the massive accumulation of recorded knowledge, the more interesting story involves the mutual interaction between this and social and cultural life.
The progressive connectedness and complexity of social contexts entails progressively more communication across time and space, and a corresponding expansion of the external shared documentary accumulation of expertise and knowledge. Libraries are a part of the apparatus for retaining and sharing that documentary record.
The short age of documents
In the mid-20th Century, the production, circulation, and institutionalization of information expanded significantly, shaped by the ongoing interplay between social structures, technological developments, and organizational demands. The Second World War itself accelerated this, intensifying the need for systematic coordination across scientific, governmental, and industrial domains. There was growth in the scientific and technical literature, while governments and institutions expanded their administrative records, accompanying new forms of bureaucratic surveillance and tracking. Businesses became increasingly reliant on structured data, for decision-making, to optimize workflows, to comply with regulatory frameworks, and so on.
In this way, the volume and variety of documents (again, broadly understood) continued to grow, as did the processes by which they were shared. It is in this period, as noted above, that Information Science emerged, as a response to the challenges of managing this abundance of information in various ways.
Through the sixties and later, there was a focus on the structural changes brought about by “knowledge-intensive production and a post-industrial array of goods and services.” (Lash) This work was informed by the empirical work of Porat and Machlup and others and was given influential synthetic expression as the ‘information society’ or the ‘post-industrial society’ in the work of Daniel Bell. Peter Drucker also popularized the concepts of ‘knowledge work’ and ‘knowledge worker’ during this period.
In successive decades, digital information systems emerged – chemical information, health and legal systems, early knowledge management, and so on.
In this period also there was some modest social institutionalization of Information Science and related areas. The American Documentation Institute (ADI) was founded in 1937, becoming the American Society for Information Science (ASIS) in 1968. It acquired a final ‘t’ for technology in 2000, and finally became the Association for Information Science and Technology in 2013. The Institute of Information Scientists was formed in the UK in 1958, and merged with the Library Association in 2002 to form CILIP. IFIP (International Federation for Information Processing) was founded in 1960.
The evolving informational age
Then it looks at how the logic of information flows reterritorialize into new formations of the brand, the platform, the standard, intellectual property and the network. […] The primary qualities of information are flow, disembeddedness, spatial compression, temporal compression, real-time relations. It is not exclusively, but mainly, in this sense that we live in an information age. Lash, Scott (2002). Critique of information.
In his highly influential three-volume The Information Age, Castells talks of a network society and an informational society. I will use this to frame an introduction to the current period, dating from the late 20th Century.
Contributory factors in this period are the accelerated evolution of communications and computational capacity which has provided the material base for a range of other developments. These include the restructuring and intensification of capitalism brought about by deregulation, privatization, global extension and geopolitical changes; the network flows of money, data and people which have changed how we think about the boundaries of organizations, nations and personal relations; and the ongoing transformation of the media, personal communications, and the means of shaping public opinion.
This environment rests on complex network systems, aggregations of data, and applications which communicate via protocols and APIs. This material base has co-evolved with social organization. For Castells (as for Lash and Harari), a key feature is the organizing power of networks, throughout all aspects of what we do.
For example, network effects have led to several dominant platforms that articulate much of our social, cultural and business activities (the Amazoogle phenomenon). Retail, music and entertainment were transformed. The flow of materials is monitored by tracking systems, and is articulated in complex just-in-time supply chains; mobile communications and mapping services have changed our sense of mobility and delivery; distribution chains, the disposition of goods around retail floors, investment decisions, variable real-time pricing, and many other taken for granted aspects of what we do are driven by the collection, exchange and analysis of data.
From a functional point of view, varieties of ‘Informationalisation’ are visible at all levels in everyday life: doors open automatically; material money is disappearing; advanced instrumentation for observation and analysis is common in the sciences. Increasingly, our activities yield up data which influences what products are offered to us, the news we see, and so on.
Something as apparently simple as the selfie has interacted with behaviors to affect mental wellness, the travel industry and communication.
It is this sense of qualitative change that prompts Manuel Castells to pose the distinction between the ‘information society’ and the ‘informational society.’ We are now seeing an intensification of some of the informational trends he observes as AI becomes more common.
Castells discusses how the network has facilitated broad coordination of interests, in social movements, popular uprisings, or organized crime, for example. It extends to a global scale, where a network of megacities channels power and innovation. He suggests that there is at once a global integration facilitated by the network, but, at the same time, a growing fragmentation between those connected to the network circuits of power and prosperity and those not connected.
More recently, we also see a counter force to global integration. Rather than frictionless global information flow, we are seeing regimes forming around power blocs, with different policy, control and regulatory regimes. Think of the US, EU, Russia and China. There has also been some argument that unequal participation in the ‘knowledge economy’ is a factor in emerging political polarization.
Given this general importance, there has also been an interesting and unsurprising informational development in modern theory. This has come into our field most clearly perhaps in Jürgen Habermas’ concept of the public sphere, but think also, for example, of Anthony Giddens’ concept of ‘reflexive modernity’ or Ulrich Beck’s ‘risk society.’
For Giddens, ‘reflexive modernity’ entails “... the reflexive ordering and reordering of social relations in the light of continual inputs of knowledge affecting the actions of individuals and groups” so that “production of systematic knowledge about social life becomes integral to system reproduction.” Modern life rests on the dynamic reassessment of available information and expertise (in construction, engineering, medicine, technology, …), which builds on the accumulated record of science and technology.
I have chosen to reference the social sciences here, as informationalization and social and cultural change are intricately linked. It is also notable how little reference there is to classical information science in this discussion.
The generative turn: the apotheosis of the document
Instead, these AI systems are what we might call cultural technologies, like writing, print, libraries, internet search engines or even language itself. They are new techniques for passing on information from one group of people to another. Asking whether GPT-3 or LaMDA is intelligent or knows about the world is like asking whether the University of California’s library is intelligent or whether a Google search “knows” the answer to your questions. But cultural technologies can be extremely powerful—for good or ill. Alison Gopnik, 2022.
The current form of generative AI emerged as late as 2023. I find Alison Gopnik’s characterization of it as a cultural technology helpful. She places it in a historical context as the latest technique for passing information from one group of people to another, again considering information very broadly. I was interested to see her place libraries in this frame as well.
Effectively, large language models are statistical models derived from vast accumulations of documentary representations of knowledge. In the context of the narrative presented here, the volume and variety of documents is now so great that they are treated as a proxy for knowledge. Proponents of intelligent AI argue that the models, working with both the broad accumulated representation of knowledge in the training collections, and with massive compute, can find a way to not only summarize and generalize from the content of those documents, but also to replicate the minds that created them.
I tend towards Gopnik’s skepticism on this question (see Yiu, E., Kosoy, E., & Gopnik, A. (2024) for an extended argument).
Nevertheless, the processing powers of the models make them very effective for some purposes, and the agentic and applications infrastructure being built on top of them promises to make them more so. We do not know yet whether and where developments will plateau, or how adoption varies by tolerance for hallucinations,[1] or where the impact will be most felt.
However, given the key role of documents (information) in managing complex organizations and interactions, some see the reach of AI as extensive. In this way, the informational, reflexive, networked nature of social life is potentially further intensified.
This extensive informationalization is why Harari, for example, is concerned about the potential reach and impact of AI, as the systemic processability of the connective informational tissue of organizations and systems, he argues, renders them vulnerable to manipulation.
Of course, the ramifications of AI for libraries and for iSchools are accordingly significant. It intensifies some of the trends we have observed, and -- as with other activities at scale -- has both constructive and problematic elements (to use a phrase of Barrett and Orlikowski’s).
If we think of the informational disciplines having a special interest in recorded information, some immediate issues arise.
Cultural synthesizers. Synthesized content and context add a new dimension and challenge.
Iterative and chained interaction. We will interact differently with information objects or bodies of knowledge. Think of how larger publishers or aggregators will provide access to the scholarly literature, for example.
Social confidence and trust. Our sense of authenticity, identity, authorship will all be redefined, creating issues of trust and verification.
Policy, law and practice will all evolve unevenly in concert.
Information critique
Our simplest actions or interactions now entail complex informational networks and platforms. Think of what is involved in just texting, sending an email or writing in cloud-based Office 365, to say nothing of group document preparation, remote experiments, or mapping activity.
Day-to-day behaviors yield up data which is aggregated at scale and used in various ways to monitor, sell or advertise. Large companies have built vast consolidated infrastructure; we are used to thinking of information as immaterial, but AI has also emphasized how the cloud has boots of concrete. These companies also wield great cultural and economic power: Spotify does our listening for us, Amazon holds sway over merchants on its site.
These social and cultural ramifications mean that undesirable effects are visible and urgent. Addressing these has become an ongoing research, education and advocacy role for the informational disciplines, among others. There is also greater historical sensitivity, an alertness to the ways in which experiences, memories and knowledges may have been suppressed, distorted or invisible (see Benedict Anderson’s classic discussion of museums, maps and other resources in the emergence of nationalism, for example).
Here is a non-exhaustive list of information issues.
Inequity. Given the centrality of the network and digital resources, differential access creates inequities.
Surveillance. There is an increase in direct surveillance and also increased collection of data which drives other aspects of our environment. We are generating data shadows which are operationalized in various ways to influence or inform.
Market concentration. The winner takes all dynamic of network services has resulted in the dominance of several platforms who wield great economic power and influence.
Dominant or partial perspectives. Perspectives which are historically dominant, or politically motivated, or which reflect imbalances of power and influence may be over-represented in any resource. The plurality of experiences, memories and knowledges is under-represented in the record.
Dis- and misinformation/’degraded democratic publics’. Our reliance on flows of information has led to concerted attempts to distort, mislead or defraud. Henry Farrell has argued that there is a more fundamental problem, which is “not that social media misinforms individuals about what is true or untrue but that it creates publics with malformed collective understandings.” (Farrell)
Geopolitical fragmentation. Rather than a global information flow, as noted above, we are seeing regimes forming around power blocs, with different policy, control and regulatory regimes. Think of the US, EU, Russia and China.
Information today – systems of information
I kept this section in for completeness although its references to ‘later’ etc. are to the original document [pdf].
Information, then, is at once fugitive and everywhere, chameleon-like.
I noted above how libraries, Information Science, and the iSchool emerged in different phases of the information evolution. And in some ways, they reflect elements of when they emerged.
The library is different from the other two, in that I talk about the library itself as an organization, rather than as a body of knowledge or techniques. Libraries emerged in the first phase described above. Historically, the collection was the core of library identity, as an organized response to accessible distribution in a print world, and to the preservation of knowledge. The library continues as an organized response by cities, universities and others to learning, research and equity of access to the means of creative production. In this way the scope has moved beyond the collection in various ways, as discussed further below. As an organization, the library benefits from education and research in information management topics, but also across a range of other topics (public policy, for example).
Information science emerged, in the second phase, in the mid-20th Century. In this narrative, its origin story splits from the library as information production grows, requiring new methods of organization and access, and it anticipates elements of today’s information environment. As discussed further below, common to definitions of Information Science is a concern with documents (or recorded information, literatures, and similar).
Although it doesn’t make sense to lean on it too heavily, one might say that Information Science largely retains an information view of the world, concerned with access to and management of information as a thing (in Buckland’s terms).
In the third phase, information is not only seen as something to be managed or discovered, but as an organizing element of social structure and interaction. It has become an object of study across many disciplines and in social and cultural analysis.
The iSchool has emerged in this third phase and it typically embraces a broad set of informational interests. In some ways it subsumes Information Science interests in a very broad view of information in the modern world.
A large part of the typical iSchool portfolio is information systems oriented, at undergraduate and graduate levels, meeting needs for workers with technology, business and social skills. It may be a more applied alternative to computer science. It may also encompass expertise in other informational fields (policy, philosophical/social/cultural, data science, HCI, digital humanities, and so on). The broad disciplinary spread also potentially encompasses social and philosophical perspectives, as well as, very often, a strong information critique emphasis.
Borrowing a suggestive phrase from an article by Black and Schiller (2014), one could say that the iSchool is often interested in both information systems and systems of information, ‘systems that create information through social means.’
Of course, the iSchool is not a discipline – it is an evolving academic structure, although, as I have noted, it may be associated with a broad view of Information Science (understood generically not in the classical sense) or Information Sciences. Informatics, a term which emerged in the 1960s (often associated with another term, as in health or social informatics) may also feature.
The focus and disciplinary spread varies across schools.
References
Anderson, B. (2016). Imagined communities: reflections on the origin and spread of nationalism (revised edition). Verso.
Barrett, M., & Orlikowski, W. (2021). Scale matters: doing practice-based studies of contemporary digital phenomena. MIS Quarterly, 45(1). https://doi.org/10.25300/misq/2021/15434.1.3
Bates, M. J. (2010). Information. In M. J. Bates & M. N. Mack (Eds.), Encyclopedia of Library and Information Sciences, 3rd Ed. (Vol. 3). CRC Press. https://pages.gseis.ucla.edu/faculty/bates/articles/information.html
Beck, U., & Ritter, M. (1992). Risk society: towards a new modernity. Sage Publications.
Bell, D. (1976). The coming of post-industrial society: a venture in social forecasting. Basic Books.
Bennett, T., Grossberg, L., & Morris, M. (Eds.). (2005). New Keywords: A Revised Vocabulary of Culture and Society. Wiley-Blackwell.
Black, A., & Schiller, D. (2014). Systems of information: The long view. Library Trends, 63(3), 628–662. https://hdl.handle.net/2142/89724
Buckland, M. (2012). What kind of science can information science be? Journal of the American Society for Information Science and Technology, 63(1), 1–7. https://doi.org/10.1002/asi.21656
Buckland, M. (2017). Information and society. The MIT Press.
Buckland, M. (2024). The Berkeley School of Information: A Memoir. https://escholarship.org/uc/item/79v080z7
Castells, M. (2009). The Rise of the Network Society, With a New Preface: the Information Age: Economy, Society, and Culture Volume I (2nd ed). Hoboken, N.J.: John Wiley & Sons, Ltd.
Dinneen, J. D., & Bauner, C. (2017). Information-not-thing: further problems with and alternatives to the belief that information is physical. CAIS/ACSI `17: Proceedings of the Annual Conference of the Canadian Association for Information Science / l’Association Canadienne Des Sciences de l’Information. https://philarchive.org/archive/DINIFP
Harari, Y. N. (2024). Nexus: a brief history of information networks from the Stone Age to AI. Random House, an imprint and division of Penguin Random House LLC.
Rayward, W. Boyd. (2014). Information Revolutions, the Information Society, and the Future of the History of Information Science. Library Trends, 62(3), 681–713. https://doi.org/10.1353/lib.2014.0001
Williams, R. (2014). Keywords (New Edition). Oxford University Press.
Yiu, E., Kosoy, E., & Gopnik, A. (2024). Transmission Versus Truth, Imitation Versus Innovation: What Children Can Do That Large Language and Language-and-Vision Models Cannot (Yet). Perspectives on Psychological Science, 19(5), 874–883. https://doi.org/10.1177/17456916231201401
Note: I took the feature image in Cambridge.
[1] It is a pity that ‘hallucination’ has become the term used here, as it gives a misleading sense of how the LLMs work.
This text, part of the #ODDStories series, tells a story of Open Data Day’s grassroots impact directly from the community’s voices.
The Conversatory “Open Data and New Technologies to Face the Polycrisis”, held on 19 March in the city of Xalapa, Veracruz, Mexico, brought together teachers and researchers from different fields with a common characteristic: working with open data. The event took place at the Faculty of Statistics and Informatics (FEI) of the Universidad Veracruzana (UV). It was attended by professors and students from the Statistics, Data Science Engineering and Computer Science courses offered at the same institution.
The event was inaugurated by Dr Minerva Reyes Felix, FEI’s Academic Secretary, and organised by MCD Lorena López Lozada, leader of the Open Data Project, who is serving as organiser for its eighth edition.
Reliable open data, useful for managing the polycrisis
Lorena López Lozada explained that “the objective of the event is to share, from different perspectives, the current situation where several global crises are happening at the same time and their combined impact is greater than their individual impact, the so-called polycrisis, and how the use of open data allows us to visualise the social context and support decision-making based on reliable information”.
Each of the panelists shared enriching opinions, as their different fields of study allowed for a multidisciplinary discussion. The debate was chaired by MGC José Fabián Muñoz and was divided into three main questions:
What has been your experience in working with open data?
What kind of open data can be used to address the polycrisis?
What do you recommend for the generation of open data with regard to infrastructure, security, reliability, and other aspects?
Experience in working with open data
Nicandro Cruz Ramírez, a professor at the Artificial Intelligence Research Institute (IIIA) of the Universidad Veracruzana, began this round of questions. He mentioned that he was fortunate to have open data and to be able to work with it. One example was ecological data, which can be used to analyse climate change and, in this case, the loss of ecological integrity of top predators (large and strong species in the area such as bears, pumas and lions), creating artificial intelligence models to identify the causes of this problem and to assess the impact of future construction projects in biodiverse areas.
In this conversation, Angel Fernando Argüello Ortíz, a professor at FEI, shared examples of how census information has been key to the implementation of projects and the difficulties he has encountered when this information is not freely available. He also expressed that information is power, but that it must be used correctly, as “all power carries a great responsibility, because sometimes it generates benefits, but sometimes it generates disadvantages, so it is important to use it correctly”.
Types of open data to address the polycrisis
Another participant in the discussion was Agustín Fernández Eguiarte, a researcher at the Institute of Atmospheric Sciences and Climate Change at the National Autonomous University of Mexico (UNAM), who is on a research visit to the UV’s Centre for Earth Sciences (CCT). He mentioned that “it is necessary to have climate, climate change, hydro-meteorological and environmental data, but structured as open FAIR data in unconditional standards or metadata and integrated in data repositories”. As examples, he has developed an open data repository on climate change and tropical cyclones in the state of Veracruz, which includes an interactive visualisation in addition to the databases; and an open data repository on the Pico de Orizaba.
The information from both will be made available on the Internet for those interested in the topics, “because to face the polycrisis we need reliable data and information, with quality control, to support any policy, action or programme to address climate change,” said the researcher.
Ángel Juan Sánchez García, a lecturer at FEI, said that although it is impossible to solve all the world’s problems with data, it is possible to get an idea of the problems that are occurring at a global level. He mentioned that in Mexico some sectors do not have a culture of generating data, which makes it difficult to analyse their own areas. This could be linked to human factors (a lack of willingness to share data) and to resistance to analysis because of possible social alarm (touching on real but sensitive issues).
Recommendations for open data production
To conclude this series of questions, each of the participants shared their recommendations for those using or generating open data. We can divide their opinions into five sections:
Create a culture of information management. Knowing how to gather information based on project objectives and being willing to share that information and allow others to access and consult it.
Collaboration. Students are encouraged to work in teams. Projects are more practical and easier to carry out when a multidisciplinary team works together.
Technological infrastructure. Ensure the quality, security, reliability and other attributes necessary to work with open data, as a lack of these will limit its analysis.
Working with other types of data. As Ángel Juan Sánchez García mentions, “we need to venture into other types of non-conventional data”, because in many cases, there is a very rigid idea of what a database is, which limits the variety of analyses that can be carried out.
Legislation. One of the participants mentioned the need to legislate on the management and collection of information, even though some legislation already exists. In the case of Mexico, the General Directorate of Access Policies of the Secretariat for Access to Information of the National Institute for Transparency, Access to Information and Protection of Personal Data (INAI) prepared a document in 2023, which was presented as a “draft” to the Technical Group of the National Open Data Policy. It was subsequently presented to the different feedback spaces involved in the construction of this public policy, proposed within the framework of the Open Mexico Strategy and its corresponding methodology. Its drafting took as a reference the Minimum Criteria and Methodology for the Design and Documentation of Policies for Access to Information, Proactive Transparency and Open Government, approved by the National System of Transparency, Access to Public Information and Protection of Personal Data and published in the Official Gazette of the Federation on 23 November 2017. However, this policy has not come to fruition, as INAI was dissolved under the current government administration.
Despite the existence of the Open Data Charter and its principles, governments are not yet committed to having open data in all possible areas of information sharing with citizens.
With the support of the Open Knowledge Foundation and Datopian for the development of the Conversatory, the event marked a significant step towards strengthening the promotion and use of open data, as well as the exchange between students, teachers and researchers, bringing them together in a healthy environment and contributing to the well-rounded development of the students.
About Open Data Day
Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day, where they use open data in their communities. ODD is led by the Open Knowledge Foundation (OKFN) and the Open Knowledge Network.
As a way to increase the representation of different cultures, since 2023 we have offered organisations the opportunity to host an Open Data Day event on whichever date works best for them during a week-long window. In 2025, a total of 189 events happened all over the world between March 1st and 7th, in 57 countries, using 15+ different languages. All outputs are open for everyone to use and re-use.
In seventh grade, Miss Phillips had me memorize "Paul Revere's Ride" by Henry Wadsworth Longfellow. So I did. After finishing "Jabberwocky" to start off the year of run naming, it seemed obvious what my next effort would be. I calculated that I could arrange to end it on the day of the Boston Marathon, thus neatly tying the verse with the running. And to top it off, the "18th of April" cited in the poem was exactly 250 years ago on Friday.
"Paul Revere's Ride" was first published in The Atlantic Monthly in 1861.
On looking up the poem, also titled "The Landlord's Tale", I discovered its political undertones. It was written in the lead-up to the Civil War, and Longfellow had been outspoken as an abolitionist. The poem was a call to action to Northerners, recalling their role in the American Revolution. So not irrelevant to the current situation.
In January, for a Daily Mail article, Miriam Kuepper interviewed Salomé Balthus, a "high-end escort and author from Berlin" who works the World Economic Forum. Balthus reported attitudes that clarify why "3C Here We Come" is more likely. The article's full title is:
What the global elite reveal to Davos sex workers: High-class escort spills the beans on what happens behind closed doors - and how wealthy 'know the world is doomed, so may as well go out with a bang'
Below the fold I look into a wide range of evidence that Balthus' clients were telling her the truth.
Kuepper quotes Balthus:
'The elephant in the room is climate change. Everyone knows it can't be prevented any more,' she said, adding that the 'super rich' could generally be split into two groups on the topic.
'The one group thinks it only affects the poor, the "not-white race", while the others fear that it could get worse but there's no sense in trying to do anything about it so they just enjoy themselves,' she told MailOnline.
'The one half is in despair and the other, dumber, half is celebrating future mass deaths.
Salome elaborated that some of the uber wealthy people fitting into the first group were saying that those in third world countries 'might all die but us in the North, we're fine'.
She said: 'They say that in a democracy you have to sell it, to lie to people and tell them "we didn't know better and didn't think it would get this bad", not admitting that they know.
'Then there's the other group that thinks it might not be so easy, maybe it will also affect us due to unforeseeable chain reactions.
'But they say they can't do anything against the others so they live following the mantra "after us, the deluge".
'They say they will enjoy a few more nice years on earth and know that there's no future. They are very cynical and somehow deeply sad.'
This attitude matches Schmidt's fatalism — we're doomed but we might as well make money/have fun until then. What it misses is that everything they're doing is bringing "until then" closer. As I wrote about Schmidt:
He is right that “we’re not going to hit the climate goals anyway", but that is partly his fault. Even assuming that he's right and AI is capable of magically "solving the problem", the magic solution won't be in place until long after 2027, which is when at the current rate we will pass 1.5C. And everything that the tech giants are doing right now is moving the 1.5C date closer.
Economic models have systematically underestimated how global heating will affect people’s wealth, according to a new study that finds 4C warming will make the average person 40% poorer – an almost four-fold increase on some estimates.
The reason for the underestimation is that their models of the effect of climate change on a country's GDP account only for in-country effects. Reconsidering the macroeconomic damage of severe warming by Timothy Neal et al. of the University of New South Wales instead models global weather's effects on a world of interconnected supply chains:
Figure 1 shows the projected percentage reduction in global GDP from a high emissions future (SSP5-8.5) relative to a lower emissions future (SSP1-2.6), for the three models outlined in section 2.2. Each economic model is run with and without global weather to determine its impact on the projections. Without the inclusion of global weather (blue line), all three models project more mild economic losses with a median loss at 2100 of −28% for the Burke15 model, −4% for the Kahn21 model, and −11% for the Kotz24 model. Projected losses from the latter two models are more optimistic than in the original articles, likely due to the variations in data and exact assumptions made.
The study by Australian scientists suggests average per person GDP across the globe will be reduced by 16% even if warming is kept to 2C above pre-industrial levels. This is a much greater reduction than previous estimates, which found the reduction would be 1.4%.
Today, the wealthiest middle-aged and older adults in the U.S. have roughly the same likelihood of dying over a 12-year period as the poorest adults in northern and western Europe, according to a study published Wednesday in The New England Journal of Medicine.
Heat waves, wildfires, floods, tropical storms and hurricanes are all increasing in scale, frequency and intensity, and the World Health Organization forecasts that climate change will cause 250,000 additional deaths each year by the end of this decade from undernutrition, malaria, diarrhea and heat stress alone. Even so, the impact on human health and the body count attributed to extreme weather remain massively underreported — resulting in a damaging feedback loop of policy inaction. Meanwhile, the very people who might fix that problem, at least in the US, are being fired en masse amid the Trump administration’s war on science.
But we know countries aren't going to "hit short-term and long-term climate targets" because, among other reasons, it would prevent us achieving the wonderful benefits of AI such as generated images in the style of Studio Ghibli.
Owing to “recent setbacks to global decarbonization efforts,” Morgan Stanley analysts wrote in a research report last month, they “now expect a 3°C world.” The “baseline” scenario that JP Morgan Chase uses to assess its own transition risk—essentially, the economic impact that decarbonization could have on its high-carbon investments—similarly “assumes that no additional emissions reduction policies are implemented by governments” and that the world could reach “3°C or more of warming” by 2100. The Climate Realism Initiative launched on Monday by the Council on Foreign Relations similarly presumes that the world is likely on track to warm on average by three degrees or more this century. The essay announcing the initiative calls the prospect of reaching net-zero global emissions by 2050 “utterly implausible.”
...
Bleak as warming projections are, a planet where governments and businesses fight to the death for their own profitable share of a hotter, more chaotic planet is bleaker still. The only thing worse than a right wing that doesn’t take climate change seriously might be one that does, and can muster support from both sides of the aisle to put “America First” in a warming, warring world.
Of course, we might as well "enjoy a few more nice years on earth", because the 70-year-old Schmidt (and I) will be dead long before 2100. Our grandchildren will just have to figure something out. In the meantime we need to make as much money as possible so the grandchildren can afford their bunkers.
Let's look at some of the actions of the global elite instead of the words they use when not talking to their escorts, starting with Nvidia, the picks-and-shovels provider to the AI boom.
Nvidia's path forward is clear: its compute platforms are only going to get bigger, denser, hotter and more power hungry from here on out. As a calorie deprived Huang put it during his press Q&A last week, the practical limit for a rack is however much power you can feed it.
"A datacenter is now 250 megawatts. That's kind of the limit per rack. I think the rest of it is just details," Huang said. "If you said that a datacenter is a gigawatt, and I would say a gigawatt per rack sounds like a good limit."
The NVL72 is a rackscale design inspired heavily by the hyperscalers with DC bus bars, power sleds, and networking out the front. And at 120kW of liquid cooled compute, deploying more than a few of these things in existing facilities gets problematic in a hurry. And this is only going to get even more difficult once Nvidia's 600kW monster racks make their debut in late 2027.
This is where those "AI factories" Huang keeps rattling on about come into play — purpose built datacenters designed in collaboration with partners like Schneider Electric to cope with the power and thermal demands of AI.
So Nvidia plans to increase the power draw per rack by 10x. The funds to build the "AI factories" to house them are being raised right now as David Gerard reports in a16z raising fresh $20b to keep the AI bubble pumping:
Venture capital firm Andreessen Horowitz, affectionately known as a16z, is looking for investors to put a fresh $20 billion into AI startups. [Reuters]
For perspective, that’s more than all US venture capital funding in the first three months of 2025, which was $17 billion. [PitchBook]
This means that a16z think there’s at least this much money sloshing about, sufficiently desperate for a return.
PitchBook says a16z is talking up its links to the Trump administration to try to recruit investors — the pitch is to get on the inside of the trading! This may imply a less than robustly and strictly rule-of-law investment environment.
The owners of a recently demolished coal-fired power plant announced the site will become a data center powered by the largest natural gas plant in the country.
The Homer City Generating Station in Indiana County was decommissioned in 2023 and parts of it were imploded last month. It had been at one time the largest coal-fired power plant in Pennsylvania.
The plant’s owners, Homer City Redevelopment, announced the site will become a 3,200-acre data center campus for artificial intelligence and other computing needs.
The "largest natural gas plant in the country" will be pumping out carbon dioxide for its predicted service life of 75 years, into the 3C period of 2100.
Taken together, the measures represent a sweeping attempt to ensure coal remains part of the US electricity mix, despite its higher greenhouse gas emissions and frequently greater cost when compared to natural gas or solar power.
The effort also underscores Trump’s commitment to tapping America’s coal resources as a source of both electricity to run data centers and heat to forge steel. The president and administration officials have made clear boosting coal-fired power is a priority, one they see as intertwined with national security and the US standing in a global competition to dominate the artificial intelligence industry.

Amazon, Microsoft and Google are operating data centres that use vast amounts of water in some of the world’s driest areas and are building many more, an investigation by SourceMaterial and The Guardian has found.
With US President Donald Trump pledging to support them, the three technology giants are planning hundreds of data centres in the US and across the globe, with a potentially huge impact on populations already living with water scarcity.
“The question of water is going to become crucial,” said Lorena Jaume-Palasí, founder of The Ethical Tech Society. “Resilience from a resource perspective is going to be very difficult for those communities.”
Efforts by Amazon, the world’s biggest online retailer, to mitigate its water use have sparked opposition from inside the company, SourceMaterial’s investigation found, with one of its own sustainability experts warning that its plans are “not ethical”.
Amazon’s three proposed data centres in Aragon, northern Spain—each next to an existing Amazon data centre—are licensed to use an estimated 755,720 cubic metres of water a year, enough to irrigate more than 200 hectares (500 acres) of corn, one of the region’s main crops.
In practice, the water usage will be even higher as that figure doesn’t take into account water used in generating electricity to power the new installations, said Aaron Wemhoff, an energy efficiency specialist at Villanova University in Pennsylvania.
Between them, Amazon’s planned Aragon data centres will use more electricity than the entire region currently consumes. Meanwhile, Amazon in December asked the regional government for permission to increase water consumption at its three existing data centres by 48 per cent.
Opponents have accused the company of being undemocratic by trying to rush through its application over the Christmas period. More water is needed because “climate change will lead to an increase in global temperatures and the frequency of extreme weather events, including heat waves”, Amazon wrote in its application.
Right. We need to use more water to cope with the "extreme weather events, including heat waves" we are causing, which will allow us to cause more "extreme weather events" which will mean we need more water! It is a vicious cycle.
Is there really a demand for these monsters? One of Nvidia's big customers is CoreWeave:
In my years writing this newsletter I have come across few companies as rotten as CoreWeave — an "AI cloud provider" that sells GPU compute to AI companies looking to run or train their models.
CoreWeave had intended to go public last week, with an initial valuation of $35bn. While it’s hardly a recognizable name — like, say, OpenAI, or Microsoft, or Nvidia — this company is worth observing, if not for the fact that it’s arguably the first major IPO that we’ve seen from the current generative AI hype bubble, and undoubtedly the biggest.
The initial public offering of AI infrastructure firm CoreWeave, initially targeting a $2.7bn raise at $47-55 per share, was slashed to $1.5bn at $40 per share. Even then, the deal barely limped across the finish line, thanks to a last-minute $250mn “anchor” order from Nvidia. The offering reportedly ended up with just three investors holding 50 per cent of the stock, and it seems to have required some stabilisation from lead bank Morgan Stanley to avoid a first-day drop. Hardly a textbook success.
Imagine a caravan maker. It sells caravans to a caravan park that only buys one type of caravan. The caravan park leases much of its land from another caravan park. The first caravan park has two big customers. One of the big customers is the caravan maker. The other big customer is the caravan maker’s biggest customer. The biggest customer of the second caravan park is the first caravan park.
This, more or less, is the line being taken by AI researchers in a recent survey. Asked whether "scaling up" current AI approaches could lead to achieving artificial general intelligence (AGI), or a general purpose AI that matches or surpasses human cognition, an overwhelming 76 percent of respondents said it was "unlikely" or "very unlikely" to succeed.
"The vast investments in scaling, unaccompanied by any comparable efforts to understand what was going on, always seemed to me to be misplaced," Stuart Russel, a computer scientist at UC Berkeley who helped organize the report, told NewScientist. "I think that, about a year ago, it started to become obvious to everyone that the benefits of scaling in the conventional sense had plateaued."
AI continues to improve – at least according to benchmarks. But the promised benefits have largely yet to materialize while models are increasing in size and becoming more computationally demanding, and greenhouse gas emissions from AI training continue to rise.
These are some of the takeaways from the AI Index Report 2025 [PDF], a lengthy and in-depth publication from Stanford University's Institute for Human-Centered AI (HAI) that covers development, investment, adoption, governance and even global attitudes towards artificial intelligence, giving a snapshot of the current state of play.
...
However, HAI highlights the enormous level of investment still being pumped into the sector, with global corporate AI investment reaching $252.3 billion in 2024, up 26 percent for the year. Most of this is in the US, which hit $109.1 billion, nearly 12 times higher than China's $9.3 billion and 24 times the UK's $4.5 billion, it says.
Despite all this investment, "most companies that report financial impacts from using AI within a business function estimate the benefits as being at low levels," the report writes.
It says that 49 percent of organizations using AI in service operations reported cost savings, followed by supply chain management (43 percent) and software engineering (41 percent), but in most cases, the cost savings are less than 10 percent.
When it comes to revenue gains, 71 percent of respondents using AI in marketing and sales reported gains, while 63 percent in supply chain management and 57 percent in service operations, but the most common level of revenue increase is less than 5 percent.
Meanwhile, despite the modest returns, the HAI report warns that the amount of compute used to train top-notch AI models is doubling approximately every 5 months, the size of datasets required for LLM training is doubling every eight months, and the energy consumed for training is doubling annually.
This is leading to rapidly increasing greenhouse gas emissions resulting from AI training, the report finds. It says that early AI models such as AlexNet over a decade ago caused only modest CO₂ emissions of 0.01 tons, while GPT-4 (2023) was responsible for emitting 5,184 tons, and Llama 3.1 405B (2024) pumping out 8,930 tons. This compares with about 18 tons of carbon a year the average American emits, it claims.
The premise that AI could be indefinitely improved by scaling was always on shaky ground. Case in point, the tech sector's recent existential crisis precipitated by the Chinese startup DeepSeek, whose AI model could go toe-to-toe with the West's flagship, multibillion-dollar chatbots at purportedly a fraction of the training cost and power.
Of course, the writing had been on the wall before that. In November last year, reports indicated that OpenAI researchers discovered that the upcoming version of its GPT large language model displayed significantly less improvement, and in some cases, no improvements at all than previous versions did over their predecessors.
According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.
Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.
In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.
It may well turn out that people put more value on being right than being plausible.
Increasingly, there are other signs that the current, costly, proprietary AI approach is coming to an end. For example, we have Matt Asay's DeepSeek’s open source movement:
It’s increasingly common in AI circles to refer to the “DeepSeek moment,” but calling it a moment fundamentally misunderstands its significance. DeepSeek didn’t just have a moment. It’s now very much a movement, one that will frustrate all efforts to contain it. DeepSeek, and the open source AI ecosystem surrounding it, has rapidly evolved from a brief snapshot of technological brilliance into something much bigger—and much harder to stop. Tens of thousands of developers, from seasoned researchers to passionate hobbyists, are now working on enhancing, tuning, and extending these open source models in ways no centralized entity could manage alone.
For example, it’s perhaps not surprising that Hugging Face is actively attempting to reverse engineer and publicly disseminate DeepSeek’s R1 model. Hugging Face, while important, is just one company, just one platform. But Hugging Face has attracted hundreds of thousands of developers who actively contribute to, adapt, and build on open source models, driving AI innovation at a speed and scale unmatched even by the most agile corporate labs.
Now, researchers at Microsoft's General Artificial Intelligence group have released a new neural network model that works with just three distinct weight values: -1, 0, or 1. Building on top of previous work Microsoft Research published in 2023, the new model's "ternary" architecture reduces overall complexity and provides "substantial advantages in computational efficiency," the researchers write, allowing it to run effectively on a simple desktop CPU. And despite the massive reduction in weight precision, the researchers claim that the model "can achieve performance comparable to leading open-weight, full-precision models of similar size across a wide range of tasks."
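As an aside, the idea is easy to sketch. The following uses a generic "absmean" ternary quantization scheme for illustration; it is not necessarily the exact method Microsoft's researchers used:

```python
import numpy as np

def ternarize(weights: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to values in {-1, 0, +1}.

    Uses 'absmean' scaling: divide by the mean absolute weight, then
    round and clip. Returns the ternary matrix plus the scale so the
    output can be rescaled at inference time.
    """
    scale = np.abs(weights).mean() + eps
    ternary = np.clip(np.round(weights / scale), -1, 1).astype(np.int8)
    return ternary, scale

def ternary_matmul(x: np.ndarray, ternary: np.ndarray, scale: float):
    """Matrix multiply with a ternary weight matrix.

    Because the weights are only -1, 0 or +1, the 'multiplications'
    reduce to additions and subtractions, which is where the claimed
    efficiency on a plain CPU comes from.
    """
    return (x @ ternary) * scale

# Toy example: a 4x3 "layer".
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
t, s = ternarize(w)
x = rng.normal(size=(2, 4))
print(ternary_matmul(x, t, s))   # approximates x @ w
```

The point is not the arithmetic but the memory and compute savings: each weight needs under two bits rather than sixteen or thirty-two, which is why such models can run on commodity hardware.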
ChatGPT came out in 2022, and the Chinese government declared AI infrastructure a national priority. Over 500 new data centres were announced in 2023 and 2024. Private investors went all-in.
Demand for the data centres turns out not to be there. Around 80% are not actually in use. [MIT Technology Review]
The business model was to rent GPUs. DeepSeek knifed that, much as it did OpenAI. There’s now a lot of cheap GPU in China. Data centre projects are having trouble finding new investment.
The Chinese data centre boom was a real estate deal — many investors pivoted straight from real estate to AI.
Having lived through the early days of the internet frenzy, Fabrice Coquio, senior veep at Digital Realty, which bills itself as the world's largest provider of cloud and carrier-neutral datacenter, colocation and interconnection services, is perhaps better placed than most to venture an opinion. Is there a bubble?
"I have been in this industry for 25 years, so I've seen some ups and downs. At the moment, definitely that's on the very bullish side, particularly because of what people believe will be required for AI," he tells The Register.
Grabbing a box of Kleenex tissues, he quips that back at the turn of the millennium, if investors were told the internet was inside they would have rushed to buy it. "Today I am telling you there is AI inside. So buy it."
"Is there a bubble? Potentially? I see the risk, because when some of the traditional investments in real estate – like housing, logistics and so on – are not that important, people are looking to invest their amazing capacity of available funds in new segments, and they say, 'Oh, why not datacenters?'"
He adds: "In the UK, in France, in Germany, you've got people coming from nowhere having no experiences… that have no idea about what AI and datacenters are really and still investing in them.
"It's the expression of a typical bubble. At the same time, is the driver of AI a big thing? Yes… [with] AI [there] is a sense of incredible productivity for companies and then for individuals. And this might change drastically the way we work, we operate, and we deliver something in a more efficient way.
The "slow AIs" that run the major AI companies hallucinated a future where scaling continued to work and have already sunk vast sums into data centers. The "slow AIs" can't be wrong:
Nonetheless, if Microsoft's commitment to still spending tens of billions of dollars on data centers is any indication, brute force scaling is still going to be the favored MO for the titans of the industry — while it'll be left to the scrappier startups to scrounge for ways to do more with less.
The IEA’s models project that data centres will use 945 terawatt-hours (TWh) in 2030, roughly equivalent to the current annual electricity consumption of Japan. By comparison, data centres consumed 415 TWh in 2024, roughly 1.5% of the world’s total electricity consumption (see ‘Global electricity growth’).
The projections focus mostly on data centres in general, which also run computing tasks other than AI — although the agency estimated the proportion of data-centre servers devoted to AI. They found that such servers accounted for 24% of server electricity demand and 15% of total data-centre energy demand in 2024.
Alex de Vries, a researcher at the Free University of Amsterdam and the founder of Digiconomist, who was not involved with the report, thinks this is an underestimate. The report “is a bit vague when it comes to AI specifically”, he says.
Even with these uncertainties, “we should be mindful about how much energy is ultimately being consumed by all these data centres”, says de Vries. “Regardless of the exact number, we’re talking several percentage of our global electricity consumption.”
There are reasonable arguments to suggest that AI tools may eventually help reduce emissions, as the IEA report underscores. But what we know for sure is that they’re driving up energy demand and emissions today—especially in the regional pockets where data centers are clustering.
So far, these facilities, which generally run around the clock, are substantially powered through natural-gas turbines, which produce significant levels of planet-warming emissions. Electricity demands are rising so fast that developers are proposing to build new gas plants and convert retired coal plants to supply the buzzy industry.
If the data centers get built, they will add to carbon emissions and push us closer to 3C sooner. Of course, this investment in data centers needs to generate a return, but it may well turn out that the market isn't willing to pay enough for Ghibli-style memes to provide it. Ed Zitron has been hammering away at this point, for example in There Is No AI Revolution:
Putting aside the hype and bluster, OpenAI — as with all generative AI model developers — loses money on every single prompt and output. Its products do not scale like traditional software, in that the more users it gets, the more expensive its services are to run because its models are so compute-intensive.
For example, ChatGPT having 400 million weekly active users is not the same thing as a traditional app like Instagram or Facebook having that many users. The cost of serving a regular user of an app like Instagram is significantly smaller, because these are, effectively, websites with connecting APIs, images, videos and user interactions. These platforms aren’t innately compute-heavy, at least to the same extent as generative AI, and so you don’t require the same level of infrastructure to support the same amount of people.
Conversely, generative AI requires expensive-to-buy and expensive-to-run GPUs, both for inference and training the models themselves. The GPUs must be run at full tilt for both inference and training models, which shortens their lifespan, while also consuming ungodly amounts of energy. And surrounding that GPU is the rest of the computer, which is usually highly-specced, and thus, expensive.
OpenAI, as I've written before, is effectively the entire generative AI industry, with its nearest competitor being less than five percent of its 500 million weekly active users.
Ed Zitron has been arguing for more than a year that OpenAI's finances simply don't make sense, and in OpenAI Is A Systemic Risk To The Tech Industry he makes the case in exquisite detail and concludes:
Even in a hysterical bubble where everybody is agreeing that this is the future, OpenAI currently requires more money and more compute than is reasonable to acquire. Nobody has ever raised as much as OpenAI needs to, and based on the sheer amount of difficulty that SoftBank is having in raising the funds to meet the lower tranche ($10bn) of its commitment, it may simply not be possible for this company to continue.
Even with extremely preferential payment terms — months-long deferred payments, for example — at some point somebody is going to need to get paid.
I will give Sam Altman credit. He's found many partners to shoulder the burden of the rotten economics of OpenAI, with Microsoft, Oracle, Crusoe and CoreWeave handling the up-front costs of building the infrastructure, SoftBank finding the investors for its monstrous round, and the tech media mostly handling his marketing for him.
He is, however, over-leveraged. OpenAI has never been forced to stand on its own two feet or focus on efficiency, and I believe the constant enabling of its ugly, nonsensical burnrate has doomed this company. OpenAI has acted like it’ll always have more money and compute, and that people will always believe its bullshit, mostly because up until recently everybody has.
OpenAI cannot "make things cheaper" at this point, because the money has always been there to make things more expensive, as has the compute to make larger language models that burn billions of dollars a year. This company is not built to reduce its footprint in any way, nor is it built for a future in which it wouldn't have access to, as I've said before, infinite resources.
Zitron uses Lehman Brothers as an analogy for the effects of a potential OpenAI failure:
I can see OpenAI’s failure having a similar systemic effect. While there is a vast difference between OpenAI’s involvement in people’s lives compared to the millions of subprime loans issued to real people, the stock market’s dependence on the value of the Magnificent 7 stocks (Apple, Microsoft, Amazon, Alphabet, Meta, NVIDIA and Tesla), and in turn the Magnificent 7’s reliance on the stability of the AI boom narrative still threatens material harm to millions of people, and that’s before the ensuing layoffs.
One hint that we might just be stuck in a hype cycle is the proliferation of what you might call “second-order slop” or “slopaganda”: a tidal wave of newsletters and X threads expressing awe at every press release and product announcement to hoover up some of that sweet, sweet advertising cash.
That AI companies are actively patronising and fanning a cottage economy of self-described educators and influencers to bring in new customers suggests the emperor has no clothes (and six fingers).
There are an awful lot of AI newsletters out there, but the two which kept appearing in my X ads were Superhuman AI run by Zain Kahn, and Rowan Cheung’s The Rundown. Both claim to have more than a million subscribers — an impressive figure, given the FT as of February had 1.6mn subscribers across its newsletters.
If you actually read the AI newsletters, it becomes harder to see why anyone’s staying signed up. They offer a simulacrum of tech reporting, with deeper insights or scepticism stripped out and replaced with techno-euphoria. Often they resemble the kind of press release summaries ChatGPT could have written.
Yet AI companies apparently see enough upside to put money into these endeavours. In a 2023 interview, Zain claimed that advertising spots on Superhuman pull in “six figures a month”. It currently costs $1,899 for a 150-character write-up as a featured tool in the newsletter.
...
“These are basically content slop on the internet and adding very little upside on content value,” a data scientist at one of the Magnificent Seven told me. “It’s a new version of the Indian ‘news’ regurgitation portals which have gamified the SEO and SEM [search engine optimisation and marketing] playbook.”
But newsletters are only the cream of the crop of slopaganda. X now teems with AI influencers willing to promote AI products for minimal sums (the lowest pricing I got was $40 a retweet). Most appear to be from Bangladesh or India, with a smattering of accounts claiming to be based in Australia or Europe. In apparent contravention of X’s paid partnerships policy, none disclose when they’re getting paid to promote content.
...
In its own way, slopaganda exposes that the AI’s emblem is not the Shoggoth but the Ouroboros. It’s a circle of AI firms, VCs backing those firms, talking shops made up of employees of those firms, and the long tail is the hangers-on, content creators, newsletter writers and ‘marketing experts’ willing to say anything for cash.
The AI bubble bursting would be a whole different and much quicker "going out with a bang". How likely is it? To some extent OpenAI is just a front for Microsoft, which gets a slice of OpenAI's revenue, has access to OpenAI's technology, "owns" a slice of the "non-profit", and provides almost all of OpenAI's compute at discounted prices. Microsoft, therefore, has perhaps the best view of the generative AI industry and its prospects.
In February, stock analysts TD Cowen spotted that Microsoft had cancelled leases for new data centres — 200 megawatts in the US, and one gigawatt of planned leases around the world.
Microsoft denied everything. But TD Cowen kept investigating and found another two gigawatts of cancelled leases in the US and Europe. [Bloomberg, archive]
Bloomberg has now confirmed that Microsoft has halted new data centres in Indonesia, the UK, Australia and the US. [Bloomberg, archive]
The Cambridge, UK site was specifically designed to host Nvidia GPU clusters. Microsoft also pulled out of the new Docklands Data Centre in Canary Wharf, London.
In Wisconsin, US, Microsoft had already spent $262 million on construction — but then just pulled the plug.
Mustafa Suleyman of Microsoft told CNBC that instead of being “the absolute frontier,” Microsoft now prefers AI models that are “three to six months behind.” [CNBC]
Google has taken up some of Microsoft’s abandoned deals in Europe. OpenAI took over Microsoft’s contract with CoreWeave. [Reuters]
Ed Zitron covered this "pullback" a month ago in Power Cut:
As a result, based on TD Cowen's analysis, Microsoft has, through a combination of canceled leases, pullbacks on Statements of Qualifications, cancellations of land parcels and deliberate expiration of Letters of Intent, effectively abandoned data center expansion equivalent to over 14% of its current capacity.
...
In plain English, Microsoft, which arguably has more data than anybody else about the health of the generative AI industry and its potential for growth, has decided that it needs to dramatically slow down its expansion. Expansion which, to hammer the point home, is absolutely necessary for generative AI to continue evolving and expanding.
While there is a pullback in Microsoft's data center leasing, it’s seen a "commensurate rise in demand from Oracle related to The Stargate Project" — a relatively new partnership of "up to $500 billion" to build massive new data centers for AI, led by SoftBank and OpenAI, with investment from Oracle and MGX, a $100 billion investment fund backed by the United Arab Emirates.
The data centers will get built, and they will consume power, because even if AI never manages to turn a profit, some other bubble will take its place to use all the idle GPUs. Despite all the rhetoric about renewables and small modular reactors, much of the additional power will come from fossil fuels.
Data center carbon emissions don't just come from power (Scope 1 and 2). In 2023 Jialin Lyu et al from Microsoft published Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms and stressed the importance of embedded carbon (Scope 3) in the total environmental impact of data centers:
For example, 66% of the electricity used at Google datacenters was matched with renewable energy on an hourly basis in 2021. With historic growth rates, this is likely closer to 70% today. Our LCAs indicate that with 70-75% renewable energy, Scope 3 accounts for close to half of datacenter carbon emissions. Therefore, Scope 3 emissions and embodied carbon are important factors both currently and in the near future.
The Redmond IT giant says that its CO2 emissions are up 29.1 percent from the 2020 baseline, and this is largely due to indirect emissions (Scope 3) from the construction and provisioning of more datacenters to meet customer demand for cloud services.
These figures come from Microsoft's 2024 Environmental Sustainability Report [PDF], which covers the corp's FY2023 ended June 30, 2023. This encompasses a period when Microsoft started ramping up AI support following the explosion of interest in OpenAI and ChatGPT.
Microsoft's "pullback" will likely reduce their Scope 3 emissions going forward, but I would expect that their recent build-out will have reduced the proportion of renewables being consumed. If the Stargate build-out goes ahead it will cause enormous Scope 3 emissions.
Here is one final note of gloom. Training AI models requires rapid access to large amounts of data, motivating data centers to use SSDs instead of hard drives. Counterintuitively, research Seagate published in Decarbonizing Data shows that, despite their smaller size, SSDs have much higher embedded carbon emissions than hard drives. 30TB of SSD has over 160 times as much embedded carbon as a 30TB hard drive.
Even more surprising, Seagate's research shows that SSDs also have higher operational emissions than hard drives. While actively reading or writing data, a 30TB SSD uses twice as much power as a 30TB hard drive.
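To see what those two ratios imply over a drive's life, here is a back-of-the-envelope sketch. Only the 160x embodied-carbon ratio and the 2x active-power ratio come from the text above; every other number is an illustrative assumption, not a Seagate figure:

```python
# Rough lifecycle comparison of a 30TB SSD vs a 30TB HDD.
# All absolute numbers below are illustrative assumptions.
HDD_EMBODIED_KG = 30                      # assumed embodied carbon of the HDD, kg CO2e
SSD_EMBODIED_KG = HDD_EMBODIED_KG * 160   # ">160 times" ratio from Seagate's report
HDD_ACTIVE_W = 10                         # assumed active power of the HDD, watts
SSD_ACTIVE_W = HDD_ACTIVE_W * 2           # "twice as much power" while active
GRID_KG_PER_KWH = 0.4                     # assumed grid carbon intensity
YEARS = 5                                 # assumed service life
ACTIVE_FRACTION = 0.5                     # assumed fraction of time actively reading/writing

def lifecycle_kg(embodied_kg: float, active_w: float) -> float:
    hours = YEARS * 365 * 24 * ACTIVE_FRACTION
    operational_kg = active_w / 1000 * hours * GRID_KG_PER_KWH
    return embodied_kg + operational_kg

print(f"HDD lifecycle: {lifecycle_kg(HDD_EMBODIED_KG, HDD_ACTIVE_W):.0f} kg CO2e")
print(f"SSD lifecycle: {lifecycle_kg(SSD_EMBODIED_KG, SSD_ACTIVE_W):.0f} kg CO2e")
```

Whatever the exact inputs, the embodied term dominates for the SSD, which is why the shift from hard drives matters for Scope 3.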
The legacy of Balthus' clients' attitude that they 'know the world is doomed, so may as well go out with a bang', and of the unsustainable AI bubble, will be a massive overbuild of data centers, most of which will be incapable of hosting Nvidia's top-of-the-line racks. If the current cryptocurrency-friendly administration succeeds in pumping Bitcoin back up, these data centers will likely revert to mining. Either way, the Scope 3 emissions from building and equipping them, and the Scope 1 and 2 emissions from powering them with natural gas and coal, will put megatons of CO2 into the atmosphere, hastening the point where it is unlikely that 'us in the North, we're fine'.
You can tell this is an extraordinary honor from the list of previous awardees, and the fact that it is the first time it has been awarded in successive years. Part of the award is the opportunity to make an extended presentation to open the meeting. Our talk was entitled Lessons From LOCKSS, and the abstract was:
Vicky and David will look back over their two decades with the LOCKSS Program. Vicky will focus on the Program's initial goals and how they evolved as the landscape of academic communication changed. David will focus on the Program's technology, how it evolved, and how this history reveals a set of seductive, persistent but impractical ideas.
Below the fold is the text with links to the sources, information that appeared on slides but was not spoken, and much additional information in footnotes.
Introduction (Vicky)
Original Logo
First, we are extremely grateful to the Paul Evan Peters award committee and CNI, and to the Association of Research Libraries, EDUCAUSE, Microsoft and Xerox who endowed the award.
David and I are honored and astonished by this award. Honored because it is the premiere award in the field, and astonished because we left the field more than seven years ago to take up our new full-time career as grandparents. The performance metrics are tough, but it’s a fabulous gig.
This talk will be mostly historical. David will discuss the technology's design and some lessons we learned deploying it. First I will talk about our goals when, more than a quarter-century ago, we walked into Michael Keller's office and pitched the idea that became the LOCKSS Program. Mike gave us three instructions:
Don't cost me any money.
Don't get me into trouble.
Do what you want.
Support
Ideas
Technology
Michael Lesk
Karen Hunter (CLOCKSS)
Petros Maniatis
Don Waters
James Jacobs (GovDocs)
TJ Giuli
Michael Keller
Martin Halbert, Katherine Skinner (1st PLN)
Mema Roussopoulos
Brewster Kahle
Clifford Lynch
Mary Baker
Jim Mitchell
Jefferson Bailey
Mark Seiden
John Sack
Gordon Tibbits
LOCKSS team
Susan Horsfall
We succeeded in each of these. That the program is still going is thanks to many people. Snowden Becker, who is here today, represents a much larger team who work tirelessly to sustain the program. Many others helped along the way. Michael Lesk, then at NSF, and Donald Waters, then at the Mellon Foundation, provided essential funding. This slide attempts to thank everyone, but we're sure we've left people out — it was a long time ago.
Let's get started. Over the centuries libraries developed a dual role. By building collections they provided current readers with access to information. Then they exercised stewardship over these collections to safeguard future readers' access.
Libraries transitioned from the print to the digital world over a couple of decades. In the mid-1980s the Library of Congress experimented with readers accessing journals on 12-inch optical media.
In late 1990 Tim Berners-Lee's first Web browser accessed a page from his first Web server.
A year later the Stanford Linear Accelerator Center put the first US Web page online and people started thinking about how this new publishing medium would impact the academy. An early effort came in 1993 when Cliff Lynch wrote a 105-page report for the federal Office of Technology Assessment.
Now, consider a library acquiring information in an electronic format. Such information is almost never, today, sold to a library (under the doctrine of first sale); rather, it is licensed to the library that acquires it, with the terms under which the acquiring library can utilize the information defined by a contract typically far more restrictive than copyright law. The licensing contract typically includes statements that define the user community permitted to utilize the electronic information as well as terms that define the specific uses that this user community may make of the licensed electronic information. These terms typically do not reflect any consideration of public policy decisions such as fair use, and in fact the licensing organization may well be liable for what its patrons do with the licensed information.
Cliff's report was wide-ranging and insightful. In particular, he noted the change from the "first sale" doctrine legal framework to a publisher and library specific contract written by the publisher's lawyers.
Very few contracts with publishers today are perpetual licenses; rather, they are licenses for a fixed period of time, with terms subject to renegotiation when that time period expires. Libraries typically have no controls on price increase when the license is renewed; thus, rather than considering a traditional collection development decision about whether to renew a given subscription in light of recent price increases, they face the decision as to whether to lose all existing material that is part of the subscription as well as future material if they choose not to commit funds to cover the publisher's price increase at renewal time.
He pointed out that the change made future readers' access completely dependent upon continued payment and the publishers' whims, thus blocking libraries from fulfilling their critical stewardship role.
In 1995 I was part of the small team that developed Stanford's Highwire Press. Highwire was the first Web publishing platform for academic journals. By then the problems Cliff identified impacting libraries' stewardship role had become obvious. At the time I attended a lot of conferences. A frequent discussion topic was the ramifications of libraries transitioning from content ownership to content access. Many highly placed librarians thought the change was great – no more building collections, no more stewardship responsibility! I strongly disagreed. Hiking with David after one such conference I described how stewardship worked in the paper world and how it didn't in the Web world. His response was "I can build a system that works the way paper does".
David’s and my goal was to model the way paper worked, to provide librarians with an easy, familiar, affordable way to build and steward traditional collections that were migrating from paper to online.
Libraries fulfill their stewardship role when future access is ensured. Stewardship occurs when libraries take possession of and manage cultural and intellectual assets. We thought it vital for libraries to retain their stewardship role in the scholarly communication ecosystem. We didn't want them to become simply convenient places to work and drink coffee[1].
Stewardship matters for at least three reasons:
To protect privacy.
To protect first sale.
To defend against censorship.
Stewardship protects privacy when librarians fight for their patrons’ rights.
VII. All people, regardless of origin, age, background, or views, possess a right to privacy and confidentiality in their library use. Libraries should advocate for, educate about, and protect people’s privacy, safeguarding all library use data, including personally identifiable information.
Adopted June 19, 1939, by the ALA Council; amended October 14, 1944; June 18, 1948; February 2, 1961; June 27, 1967; January 23, 1980; January 29, 2019.
Inclusion of “age” reaffirmed January 23, 1996.
All people have a right to privacy. Librarians should safeguard the privacy of all library use.
Stewardship protects ownership transfer when content is acquired.
The First Sale doctrine is pivotal. It enables the business of libraries. It enables libraries to maintain and circulate knowledge. First Sale ensures that the public, especially future generations, benefit from today's and yesterday's works of literature, science, and culture.
Stewardship resists censorship when there are multiple copies under multiple stewards.
Today, book banning is on the rise. Librarians are being forced to remove items from circulation. Content ownership ensures materials can’t be erased from view without detection. Stewardship of banned materials allows librarians to choose whether to safeguard these materials for future readers.
Government Documents are and always have been in the crosshairs of censors. I’ll mention four efforts providing countervailing forces:
First, the U.S. LOCKSS Docs Network. The Government Publishing Office (GPO) produces and distributes government documents. In the paper world, the Federal Depository Library Program distributed documents to over 1,000 libraries across the nation. To recall documents, the government had to contact the librarians and ask them to withdraw the materials. It was a transparent process.
Sample Withdrawn GPO Documents
Courtesy of James R. Jacobs, Stanford
This is a sample of withdrawn Federal documents.
Online, there were no censorship guardrails. In 2008 a small group of librarians formed the U.S. Docs LOCKSS network. This program is a digital instantiation of the U.S. Federal Depository Library Program. In partnership with the Government Publishing Office, participating libraries have recreated the distributed, transparent, censor-resistant nature of the depository paper system.
This is a sample of volumes released this February to the U.S. Docs LOCKSS network.
Second, the Canadian Government Information Digital Preservation Network. It consists of 11 academic libraries that use Archive-It (an Internet Archive service) to collect all Canadian federal documents. The collected documents are then moved from the Internet Archive into a local LOCKSS network for distributed safekeeping.
Third, the End of Term Web Archive. This partnership captures U.S. Government websites at the end of presidential administrations. With the latest change of administration, thousands of federal web pages and datasets have been taken offline. Federal web sites hold information important to every corner of a university. The End of Term Archive is an extraordinarily important resource. Oddly, only two universities partner with Archive-It to do this work: Stanford and the University of North Texas.
Last, there are many efforts to capture US data sets. The Data Rescue Project serves as a clearing house.
The community recently relearned a lesson history failed to teach. Digital preservation's biggest threat is insider attack. In recent months an unknown number of critical government databases have been removed or altered. The antidote to insider attack is multiple copies under multiple stewards. In LOCKSS language, let’s make it easy to find some of the copies, but hard to find all the copies.
I want to say a few words about sustainability. We worked very hard to make the LOCKSS Program sustainable. Don Waters at the Mellon Foundation awarded LOCKSS a matching grant to transition from grant funding to the Red Hat model of free software and paid support.
Funding programs like LOCKSS is difficult. The LOCKSS Program reinstates stewardship and enables libraries as memory organizations. This is a hard sell. Librarians spend scarce resources to support current readers, spending them to ensure materials are available to tomorrow's readers ... not so much. While fundraising fluctuates, costs are steady. To ensure stability, we accumulated reserves by having a very lean staff and being stingy with salaries.
And then along came CLOCKSS, where publishers took the lead to establish a community-run archive that implements library values. In 2006, a handful of publishers, notably the late Karen Hunter of Elsevier, suggested a partnership between libraries and publishers to form a community-run archive. In 2008, after a pilot funded by the founding archive libraries, contributing publishers, and the Library of Congress' NDIIPP, the CLOCKSS archive went into production.
Identical copies of archived content are held in eleven libraries worldwide (Scotland, Australia, Japan, Germany, Canada, and six in the United States). This international footprint ensures content is safe from shifting ideologies, or nefarious players. As in all LOCKSS networks, if a bad actor tries to remove or change content, the technology warns humans to investigate.
The CLOCKSS founding librarians and publishers unanimously agreed that when archived content becomes unavailable, it will be hosted from multiple sources, open access. An example: Heterocycles was an important chemistry journal. Established in 1973, it abruptly ceased publication in 2023 after 50 years. Inexplicably the journal also disappeared from the publisher’s web site; current subscribers lost all access. The content was unavailable from anywhere.
Fortunately, the entire run of the Heterocycles journal was archived in CLOCKSS. In June 2024, two CLOCKSS archive libraries, the University of Edinburgh and Stanford University each made all 106 volumes open access on the web.
The CLOCKSS Archive is governed equally by publishers and librarians, in true community spirit. However, publishers provide the bulk of financial support, contributing 70% of incoming funds. Libraries contribute only 30%. Alicia Wise, CLOCKSS executive director, reports this gap is widening over time. Ironically, the publishers many librarians consider “rapacious” are paying for an archive that upholds traditional library values and protects content access for future readers.
After more than a quarter-century, the LOCKSS Program continues to collect, to preserve and to provide access to many genres of content. The business model has evolved, but the goals have persisted. I will now hand over to David to talk about the technology, which has also evolved and persisted.
Technology Design (David)
The ARL Serials Initiative forms part of a special campaign mounted by librarians in the 1980s against the high cost of serials subscriptions. This is not the first time that libraries have suffered from high serial prices. For example, in 1927 the Association of American Universities reported that:
"Librarians are suffering because of the increasing volume of publications and rapidly rising prices. Of special concern is the much larger number of periodicals that are available and that members of the faculty consider essential to the successful conduct of their work. Many instances were found in which science departments were obligated to use all of their allotment for library purposes to purchase their periodical literature which was regarded as necessary for the work of the department"
The power imbalance between publishers and their customers is of long standing, and it especially affects the academic literature.[2] Simplistic application of Web technology drove a change from purchasing a copy of the content to renting access to the publisher's copy.[3] This greatly amplifies the preexisting power imbalance. Thus in designing the LOCKSS system, we faced three challenges:
to model for the Web the way libraries worked on paper,
to somehow do so within the constraints of contract law and copyright,
to ensure the system cost was negligible compared to subscription costs.
From a system engineering viewpoint, what struck me about Vicky's description of the paper library system was that libraries' circulating collections form a model fault-tolerant decentralized system. It is highly replicated, and exploits this to deliver a service that is far more reliable than any individual component. There is no single point of failure, no central control to be subverted. The desired behavior of the system as a whole emerges as the participants take actions in their own local interests and cooperate in ad-hoc, informal ways with other participants.
LOCKSS Design Goals
Allow libraries to:
Collect journals to which they subscribed
Give current readers access to their collection
Preserve their collection for future readers
Cooperate with other libraries
The system I envisaged on the hike would consist of a LOCKSS box at each library, the digital analog of the stacks, that would hold the content the library had purchased. It would need these characteristics of the paper system:
It would allow libraries to collect material to which they subscribed from the Web.
It would allow libraries' readers to access material they had collected.
It would allow them to preserve their collections against the multiple frailties of digital information.
It would allow libraries to cooperate, the analog of inter-library loan and copy.
Collect
The collect part was both conceptually simple and mostly off-the-shelf. Since the journals were pay-walled, as with paper each library had to collect its own subscription content. But collecting content is what Web browsers do. When they fetch content from a URL they don't just display it, they store it in a cache on local storage. They can re-display it without re-fetching it. The system needed a browser-like "Hotel California" cache that never got flushed, and a Web crawler like those of search engines so that all the library's subscribed content ended up in the cache.
Because we lacked "first sale" rights, the crawler had to operate with permission from the publisher, which took the form of a statement on their Web site. No permission, no collection.
Access
The access part was also conceptually simple and mostly off-the-shelf. Readers should see the content from the publisher unless it wasn't available. Their LOCKSS box should act as a transparent Web proxy, forwarding requests to the publisher and, if the response were negative, responding with the cached copy.
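A similarly minimal sketch of the access path, reusing the cache layout from the collection sketch above: the box tries the publisher first and falls back to the preserved copy on a negative response.

```python
import hashlib
import pathlib
import urllib.error
import urllib.request

CACHE = pathlib.Path("lockss-cache")

def fetch_with_fallback(url: str) -> bytes:
    """Try the publisher first; serve the preserved copy if that fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except (urllib.error.HTTPError, urllib.error.URLError):
        pass
    # Negative response: fall back to the local "stacks".
    cached = CACHE / hashlib.sha256(url.encode()).hexdigest()
    if cached.exists():
        return cached.read_bytes()
    raise FileNotFoundError(f"{url} is neither available nor preserved")
```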
Preserve
The preserve part was conceptually simple — just don't remove old content from the cache on disk. But it was more of a problem to implement for three reasons:
Disks are not 100% reliable in the short term and are 0% reliable over library timescales. Over time, content in the cache would get corrupted or lost.
Because libraries were under budget pressure and short of IT resources, the hardware of the LOCKSS box had to be cheap, thus not specially reliable.
Content in the cache would be seen by humans only in exceptional circumstances, so detecting corruption or loss could not depend upon humans.
Cooperate
Cooperation provided the solution to the problems of preservation. We expected considerable overlap between libraries' subscriptions. Thus each journal would be collected by many libraries, just as in the paper system. The LOCKSS boxes at each library subscribing to a journal could compare their versions, voting on the content in a version of the standard Byzantine Fault Tolerance algorithm. A library that lost a vote could repair its damaged copy from another library.
The goal of stewardship drove LOCKSS' approach to preservation; given a limited budget and a realistic range of threats, data survives better in many cheap, unreliable, loosely-coupled replicas than in a single expensive, durable one.
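To illustrate the cooperation, here is a toy version reduced to hash comparison and simple majority agreement. The real LOCKSS polling protocol adds nonces, proof of effort and Byzantine fault tolerance that this sketch omits:

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def poll(my_copy: bytes, peer_copies: list[bytes]) -> bytes:
    """Compare our copy of a preserved unit with peers' copies.

    If a clear majority of peers agree on a different digest, our copy
    is presumed damaged and is repaired from an agreeing peer.
    """
    votes = Counter(digest(c) for c in peer_copies)
    winner, count = votes.most_common(1)[0]
    if count > len(peer_copies) // 2 and digest(my_copy) != winner:
        # Lost the poll: repair from any peer whose copy matches the winner.
        return next(c for c in peer_copies if digest(c) == winner)
    return my_copy

# Example: our copy has suffered bit rot; three of four peers agree.
good = b"Volume 1, Issue 1 ..."
bad = b"Volume 1, Issue 1 .,."
print(poll(bad, [good, good, good, bad]) == good)   # True
```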
Technology Lessons (David)
Our initial vision for the system was reasonably simple, but "no plan survives contact with the enemy" and so it was as we developed the system and deployed it in production. Now for some lessons from this process that are broadly applicable.
Format Migration
In January 1995 the idea that the long-term survival of digital information was a significant problem was popularized by Jeff Rothenberg's Scientific American article Ensuring the Longevity of Digital Documents. Rothenberg's concept of a "digital document" was of things like Microsoft Word files on a CD, individual objects encoded in a format private to a particular application. His concern was that the rapid evolution of these applications would eventually make it impossible to access the content of objects in that format. He was concerned with interpreting the bits; he essentially assumed that the bits would survive.
But thirty years ago next month an event signaled that Rothenberg's concerns had been overtaken by events. Stanford pioneered the transition of academic publishing from paper to the web when Highwire Press put the Journal of Biological Chemistry on the Web. Going forward the important information would be encoded in Web formats such as HTML and PDF. Because each format with which Rothenberg was concerned was defined by a single application it could evolve quickly. But Web formats were open standards, implemented in multiple applications. In effect they were network protocols, and thus evolve at a glacial pace.[4]
The rapid evolution of Rothenberg's "digital documents" had effectively stopped, because they were no longer being created and distributed in that way. Going forward, there would be a static legacy set of documents in these formats. Libraries and archives would need tools for managing those they acquired, and eventually emulation, the technique Rothenberg favored, would provide them. But by then it turned out that, unless information was on the Web, almost no-one cared about it.
Thus the problem for digital preservation was the survival of the bits, aggravated by the vast scale of the content to be preserved. In May of the following year, 2004 Paul Evan Peters awardee Brewster Kahle established the Internet Archive to address the evanescence of Web pages.[5] This was the first digital preservation effort to face the problems of scale: next year the archive will have collected a trillion Web pages.[6]
The LOCKSS system, like the Wayback Machine, was a system for ensuring the survival of, and access to, the bits of Web pages in their original format. This was a problem; the conventional wisdom in the digital preservation community was that the sine qua non of digital preservation was defending against format obsolescence. Neither Kahle nor we saw any return on investing in format metadata or format migration. We both saw scaling up to capture more than a tiny fraction of the at-risk content as the goal. Events showed we were right, but at the time the digital preservation community viewed LOCKSS with great skepticism, as "not real digital preservation".
The LOCKSS team repeatedly made the case that preserving Web content was a different problem from preserving Rothenberg's digital documents, and thus that applying the entire apparatus of "preservation metadata", PREMIS, FITS, JHOVE, and format normalization to Web content was an ineffective waste of scarce resources. Despite this, the drumbeat that LOCKSS wasn't "real digital preservation" continued.
After six years, the LOCKSS team lost patience and devoted the necessary effort to implement a capability they were sure would never be used in practice. The team implemented, demonstrated and in 2005 published transparent, on-demand format migration of Web content preserved in the LOCKSS network. This was possible because the specification of the HTTP protocol that underlies the Web supports the format metadata needed to render Web content. If it lacked such metadata, Web browsers wouldn't be possible. The criticism continued unabated.[7]
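A sketch of what transparent, on-demand migration amounts to, assuming a registry of converters keyed on the stored Content-Type; the converter shown is a hypothetical placeholder:

```python
from typing import Callable, Dict, Tuple

# Registry mapping (preserved format, requested format) -> converter.
CONVERTERS: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {}

def register(src: str, dst: str):
    def wrap(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return wrap

@register("image/gif", "image/png")
def convert_gif_to_png(data: bytes) -> bytes:
    raise NotImplementedError("placeholder for a real image converter")

def serve(preserved_bytes: bytes, preserved_type: str, accept: str):
    """Serve preserved content, migrating on demand if the reader's
    browser no longer accepts the original format.

    The Content-Type stored with the preserved bits supplies the format
    metadata; that is all this path needs to decide whether to migrate.
    """
    if preserved_type in accept or "*/*" in accept:
        return preserved_bytes, preserved_type          # no migration needed
    for (src, dst), convert in CONVERTERS.items():
        if src == preserved_type and dst in accept:
            return convert(preserved_bytes), dst        # migrate on the fly
    raise ValueError(f"no converter from {preserved_type} to {accept}")
```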
There have been a number of services based instead upon emulation, the technique Rothenberg preferred. Importantly, Ilya Kreymer's oldweb.today uses emulation to show preserved Web content as it appeared in a contemporaneous browser, not as it renders in a modern browser.
Around 6th December 1991 Paul Kunz at the Stanford Linear Accelerator Center brought up the first US Web site.[8]
In a foreshadowing of future problems, its content was dynamic. It was a front-end for querying databases; although the page itself was static, clicking on the links potentially returned different content as the underlying database was edited.
Digital documents in a distributed environment may not behave consistently; because they are presented both to people who want to view them and software systems that want to index them by computer programs, they can be changed, perhaps radically, for each presentation. Each presentation can be tailored for a specific recipient.
Cliff Lynch identified the problem that dynamic content posed for preservation. In 2001 he wrote "Each presentation can be tailored for a specific recipient". Which recipient's presentation deserves to be preserved? Can we show a future recipient what they would have seen had they accessed the resource in the past?
there’s a largely unaddressed crisis developing as the dominant archival paradigms that have, up to now, dominated stewardship in the digital world become increasingly inadequate. ... the existing models and conceptual frameworks of preserving some kind of “canonical” digital artifacts ... are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances.
Sixteen years later Lynch was still flagging the problem.
The dynamic nature of Web content proved irresistible to academic journal publishers, despite their content being intended as archival. They added features like citation and download counts, personalizations, and of course advertisements to HTML pages, and watermarked their PDFs. These were all significant problems for LOCKSS, which depended upon comparing the copies ingested by multiple LOCKSS boxes. The comparison process had to filter out the dynamic content elements; maintaining the accuracy of doing so was a continual task.[9]
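To give a flavour of the filtering problem, here is a toy canonicalization step. The patterns are invented examples; the real system used per-publisher filter rules that needed constant maintenance:

```python
import hashlib
import re

# Hypothetical patterns for elements that legitimately differ between copies.
DYNAMIC_PATTERNS = [
    re.compile(rb"<div class=\"citation-count\">.*?</div>", re.S),
    re.compile(rb"<div class=\"advert\">.*?</div>", re.S),
    re.compile(rb"Downloaded by .*? on \d{4}-\d{2}-\d{2}"),
]

def canonical_digest(html: bytes) -> str:
    """Hash a page after stripping content expected to vary per fetch,
    so that two boxes' copies of the same article compare equal."""
    for pattern in DYNAMIC_PATTERNS:
        html = pattern.sub(b"", html)
    return hashlib.sha256(html).hexdigest()

copy_a = b'<p>Article</p><div class="citation-count">Cited 41 times</div>'
copy_b = b'<p>Article</p><div class="citation-count">Cited 42 times</div>'
print(canonical_digest(copy_a) == canonical_digest(copy_b))   # True
```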
The fundamental problem is that the Web does not support Uniform Resource Names (URNs) but only Uniform Resource Locators (URLs). A URN would specify what a resource consists of; all that a URL specifies is from where a resource can be obtained. As with the first US Web page, what content you obtain from a URL is unspecified and can be different or even unobtainable on every visit.
The reason the Web runs on URLs not URNs is that the underlying Internet's addresses, both IP and DNS, only specify location. There have been attempts to implement a network infrastructure that would support "what" not "where" addresses; think of it as BitTorrent, but at the transport not the content layer.[10]
The goal of digital preservation is to create one or more persistent, accessible replicas of the content to be preserved. In "what" networks, each copy has the same URN. In IP-based networks, each copy has a different URL; to access the replica requires knowing where it is. Thus if the original of the preserved content goes away, links to it no longer resolve.
Starting in 2010, Herbert Van de Sompel, the 2017 Paul Evan Peters awardee, and others made a valiant effort to solve this problem with Memento. Accepting the fact that persistent replicas of content at a URL at different times in the past would have different URLs, they provided an HTTP-based mechanism for discovering the URL of the replica closest to a desired time. In some cases, such as Wikis, the original Web site implements the discovery mechanism and the underlying timeline. In other cases, such as the Wayback Machine, the site holding the replica implements the timeline. Since there are likely to be multiple Web archives with replicas of a given URL, Memento in practice depends upon Aggregator services to provide a unified timeline of the replica space.
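In protocol terms, Memento datetime negotiation is just an HTTP GET with an Accept-Datetime header. This sketch assumes the Wayback Machine's TimeGate endpoint and the third-party requests library; an Aggregator such as Time Travel could be queried the same way:

```python
import requests

def find_memento(url: str, when: str) -> str:
    """Ask a Memento TimeGate for the replica of `url` closest to `when`.

    `when` is an HTTP-date string, e.g. "Thu, 31 May 2007 20:35:00 GMT".
    The TimeGate URL below (the Wayback Machine's) is an assumption;
    other archives and aggregators expose their own TimeGates.
    """
    timegate = "https://web.archive.org/web/" + url
    resp = requests.get(timegate, headers={"Accept-Datetime": when},
                        allow_redirects=True, timeout=30)
    # The archive reports the capture time of the replica it chose.
    print(resp.headers.get("Memento-Datetime"))
    return resp.url   # the URL of the chosen replica

# Example (requires network access):
find_memento("http://www.slac.stanford.edu/",
             "Tue, 01 Jan 2002 00:00:00 GMT")
```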
In "what" networks there would still be a need to provide an aggregated timeline, not discovering the URL of a replica from a desired time, but discovering its URN. Just as in the Web, they would depend upon a mechanism above the transport layer to connect the names into a timeline. Thus, despite its theoretical appeal, "what" networking's practical advantages are less than they appear.
Decentralization
When something is published in print, legitimate copies ... are widely distributed to various organizations, such as libraries, which maintain them as public record. These copies bear a publication date, and the publisher essentially authenticates the claims of authorship ... By examining this record, control of which is widely distributed ... it is possible, even years after publication, to determine who published a given work and when it was published. It is very hard to revise the published record, since this involves all of the copies and somehow altering or destroying them.
In 1994 Lynch had described how "Lots Of Copies Keep Stuff Safe" in the paper world. Compare this with how we summarized libraries' role in our first major paper on LOCKSS, Permanent Web Publishing:
Acquire lots of copies. Scatter them around the world so that it is easy to find some of them and hard to find all of them. Lend or copy your copies when other librarians need them.
Because we were modeling the paper library system, we hoped that the LOCKSS system would obtain the benefits of a decentralized system over a centralized one performing the same function, which in the paper system and in theory are significant:
It has the economic advantage that it is hard to compare the total system cost with the benefits it provides because the cost is diffused across many independent budgets.
After a couple of my prototypes proved to be insecure, I worked with a group of amazing Stanford CS Ph.D students to design a decentralized peer-to-peer network secured by Proof-of-Work. The 2003 paper describing it won "Best Paper" at the prestigious Symposium on Operating System Principles. This was five years before Satoshi Nakamoto published his decentralized peer-to-peer network secured by Proof-of-Work.
Unfortunately in the digital world it is extraordinarily difficult to reap the theoretical benefits of decentralization. I laid out the reason why this is so a decade ago in Economies of Scale in Peer-to-Peer Networks. In brief, the mechanism described by W. Brian Arthur in his 1994 book Increasing Returns and Path Dependence in the Economy operates. Technology markets have very strong increasing returns to scale. The benefits from participating in a decentralized digital system increase faster than the costs, which drives centralization.[11] Thirty years later, Arthur's work explains today's Web perfectly.
Decentralized systems also suffer practical disadvantages:
Their user experience is worse: more complex, slower, and less predictable. An example is that Bitcoin's transaction rate is limited by its 10-minute block time.
They are in practice only as decentralized as the least decentralized layer in the stack.
Their excess costs cause emergent behaviors that drive centralization.
The fundamental problem is that most layers in the software stack are highly concentrated, starting with the three operating systems. Network effects and economies of scale apply at every layer.
The LOCKSS system was designed and implemented to be completely decentralized. It was permissionless; nodes could join and leave the network as they liked. We designed the network protocol to be extremely simple, both to avoid security flaws, and also in the hope that there would be multiple implementations, avoiding single points of failure. There were a number of reasons why, over time, it turned out much less decentralized than we hoped:
We always paid a lot of attention to the security of LOCKSS boxes, and we understood that a software mono-culture was vulnerable to software supply chain attacks. But it turned out that the things a LOCKSS box needed to do other than handling the protocol were quite complex, so despite our best efforts we ended up with a software mono-culture.
We hoped that by using the BSD open-source license we would create a diverse community of developers, but we over-estimated the expertise and the resources of the library community, so Stanford provided the overwhelming majority of the programming effort.
Don Waters was clear that grant funding could not provide the long-term sustainability needed for digital preservation. So he provided a matching grant to fund the transition to being funded by the system's users. This also transitioned the system to being permissioned, as a way to ensure the users paid.
Although many small and open-access publishers were happy to allow LOCKSS to preserve their content, the oligopoly publishers never were. Eventually they funded a completely closed network of huge systems at major libraries around the world called CLOCKSS. This is merely the biggest of a number of closed, private LOCKSS networks that were established to serve specific genres of content, such as government documents.
Mono-culture risk is pervasive throughout the stacks of digital preservation systems. For example, for very good reasons almost all are based on X86 hardware and an open-source stack starting with the Linux kernel. These very good reasons outweigh the benefits of diversity in the stack. And, of course, the problem of mono-culture risk is generic throughout IT due to the network effects and economies of scale described by Brian Arthur. It is especially worrying in an era of zero-day vulnerabilities and sophisticated software supply chain attacks such as the recent $1.5B heist from Bybit.[12]
Archival Media (David)
Don't, don't, don't, don't believe the hype!
Public Enemy
We have already warned you against three seductive but impractical ideas: format migration, "what" networking and decentralization. My parting gift to you is to stop you wasting time on another seductive but impractical idea — that the solution to digital preservation is quasi-immortal media. What follows is an extract from a talk at Berkeley last month.
Archival Data
Over time, data falls down the storage hierarchy.
Data is archived when it can't earn its keep on near-line media.
Lower cost is purchased with longer access latency.
What is a useful definition of archival data? It is data that can no longer earn its keep on readily accessible storage. Thus the fundamental design goal for archival storage systems is to reduce costs by tolerating increased access latency. Data is archived, that is moved to an archival storage system, to save money. Archiving is an economic rather than a technical issue.[13]
The mainstream media occasionally comes out with an announcement like this from the Daily Mail in 2013, or this from the New Yorker last month. Note the extrapolation from "a 26 second excerpt" to "every film and TV program ever created in a teacup".
Six years later, this is a picture of, as far as I know, the only write-to-read DNA storage drive ever demonstrated, from the Microsoft/University of Washington team that has done much of the research in DNA storage. It cost about $10K and took 21 hours to write then read 5 bytes.
The technical press is equally guilty. The canonical article about some development in the lab starts with the famous IDC graph projecting the amount of data that will be generated in the future. It goes on to describe the amazing density some research team achieved by writing say a megabyte into their favorite medium in the lab, and how this density could store all the world's data in a teacup for ever. This conveys five false impressions.
Market Size
First, that there is some possibility the researchers could scale their process up to a meaningful fraction of IDC's projected demand, or even to the microscopic fraction of the projected demand that makes sense to archive. There is no such possibility. Archival media is a much smaller market than regular media.
IBM's Georg Lauhoff and Gary M Decad's slide shows that the size of the market in dollar terms shrinks as you move down the storage hierarchy. LTO tape is less than 1% of the media market in dollar terms and less than 5% in capacity terms.[14]
Timescales
Second, that the researcher's favorite medium could make it into the market in the timescale of IDC's projections. Because the reliability and performance requirements of storage media are so challenging, time scales in the storage market are much longer than the industry's marketeers like to suggest.
Take, for example, Seagate's development of the next generation of hard disk technology, HAMR, where research started twenty-six years ago. Nine years later in 2008 they published this graph, showing HAMR entering the market in 2009. Seventeen years later it is only now starting to be shipped to the hyper-scalers. Research on data in silica started fifteen years ago. Research on the DNA medium started thirty-six years ago. Neither is within five years of market entry.
Customers
Third, that even if the researcher's favorite medium did make it into the market it would be a product that consumers could use. As Kestutis Patiejunas figured out at Facebook more than a decade ago, because the systems that surround archival media rather than the media themselves are the major cost, the only way to make the economics of archival storage work is to do it at data-center scale but in warehouse space and harvest the synergies that come from not needing data-center power, cooling, staffing, etc.
Storage has an analog of Moore's Law called Kryder's Law, which states that over time the density of bits on a storage medium increases exponentially. Given the need to reduce costs at data-center scale, Kryder's Law limits the service life of even quasi-immortal media. As we see with tape robots, where data is routinely migrated to newer, denser media long before its theoretical lifespan, what matters is the economic, not the technical lifespan of a medium.
Fourth, that anyone either cares or even knows what medium their archived data lives on. Only the hyper-scalers do. Consumers believe their data is safe in the cloud. Why bother backing it up, let alone archiving it, if it is safe anyway? If anyone really cares about archiving they use a service such as Glacier, in which case they definitely have no idea what medium is being used.
Fifth, the idea that with quasi-immortal media you don't need Lots Of Copies to Keep Stuff Safe.[15]
Media such as silica, DNA, quartz DVDs, steel tape and so on address bit rot, which is only one of the threats to which long-lived data is subject. Clearly a single copy on such media is still subject to threats including fire, flood, earthquake, ransomware, and insider attacks. Thus even an archive needs to maintain multiple copies. This greatly increases the cost, bringing us back to the economic threat.
The reason why this focus on media is a distraction is that the cost per terabyte of the medium is irrelevant; what drives the economic threat is the capital and operational cost of the system. It is only by operating at data-center scale and thus amortizing the capital and operational costs over very large amounts of data that the system costs per terabyte can be made competitive.
The fundamental idea behind LOCKSS was that, given a limited budget and a realistic range of threats, data would survive better in many cheap, unreliable, loosely-coupled replicas than in a single expensive, durable one.
When giving talks about LOCKSS Vicky or I often used to feel like the Sergeant in Alice's Restaurant who "spoke for 45 minutes and nobody understood a word he said". We hope that this time we did better. Let's see if we did as we answer your questions.
Footnotes
Links to Lynch's Office of Technology Assessment report have been replaced by links to the Wayback Machine's copy collected by the End Of Term Crawl. The original was hosted at the Dept. of Education's ERIC website. The Department is currently at risk of being shut down.
In 2006 Vicky predicted that, without collection stewardship, libraries and Starbucks would become indistinguishable. Here is a real Starbucks ad, with one minor addition.
Four years later this prediction came true; Starbucks populated its WiFi networks with a wide range of otherwise pay-walled content such as the Wall Street Journal.
Library budgets have struggled with journal costs for close on a century, if not longer!
That is, from a legal framework of the "first sale" doctrine and copyright, to one of contract law and copyright.
The deployment of IPv6, introduced in December 1995, shows that network protocols are extraordinarily difficult to evolve, because of the need for timely updates to many independent implementations. Format obsolescence implies backwards incompatibility; this is close to impossible in network protocols because it would partition the network. As I discussed in 2012's Formats Through Time, the first two decades of the Web showed that Web formats essentially don't go obsolete.
This evanescence comes in two forms, link rot, when links no longer resolve, and content drift, when they resolve to different content.
People's experience of the reliability of their personal data storage is misleading. Reliable, affordable long-term storage at Web scale is an interesting engineering problem.
The irony of this was that format migration was a technique of which Rothenberg’s article disapproved:
Finally, [format migration] suffers from a fatal flaw. ... Shifts of this kind make it difficult or impossible to translate old documents into new standard forms.
At least the journals we archived were not malicious; they had actual content that was the same for everybody. That different readers saw different ads was of interest only to students of advertising. But the opportunity to confine readers in a tailored bubble has turned out to be profitable but disastrous.
The goal of IP and the layers above is to move data. There is an assumption that, in the normal case, the bits vanish from the sender once they have been transported, and also from any intervening nodes.
The goal of CCN is to copy data. A successful CCN request creates a locally accessible copy of some remote content. It says nothing about whether in the process other (cached) copies are created, or whether the content is deleted at the source. None of that is any concern of the CCN node making the request, they are configuration details of the underlying network.
While it has its copy, the CCN node can satisfy requests from other nodes for that content; CCN is thus a peer-to-peer network.
Basing networking on the copy-ability of bits rather than the transmissibility of bits makes a huge simplification. In particular, it means that, unlike in IP-based networks but like in BitTorrent, caches (and thus digital preservation) just work.
In CCN all replicas of the same content have the same name as the original; which of them satisfies a request is determined by the CCN network's routing at request time. If the content changes, so does its name. In CCN, the analogs of routers in an IP network are caches, holding recently accessed content and supplying it on request. Some of them are archives, caches that are never flushed like our vision for LOCKSS boxes. Just like routers, unless something goes wrong they are invisible.
Subsystem   Bitcoin   Ethereum
Mining      5         3
Client      1         1
Developer   5         2
Exchange    5         5
Node        3         4
Owner       456       72
The Nakamoto coefficient is the number of units in a subsystem you need to control 51% of that subsystem. Because decentralization applies at each layer of a system's stack, it is necessary to measure each of the subsystems individually. In 2017's Quantifying Decentralization Srinivasan and Lee identified a set of subsystems for public blockchains, and measured them using their proposed "Nakamoto Coefficient". Their table of the contemporary Nakamoto coefficients for Bitcoin and Ethereum makes the case that they were only minimally decentralized.
There is an even bigger problem for Ethereum since the blockchain switched to Proof-of-Stake. The software that validators run is close to a mono-culture. Two of the minor players have recently suffered bugs that took them off-line, as Sam Kessler reports in Bug That Took Down 8% of Ethereum's Validators Sparks Worries About Even Bigger Outage:
A bug in Ethereum's Nethermind client software – used by validators of the blockchain to interact with the network – knocked out a chunk of the chain's key operators on Sunday.
...
Nethermind powers around 8% of the validators that operate Ethereum, and this weekend's bug was critical enough to pull those validators offline. ... the Nethermind incident followed a similar outage earlier in January that impacted Besu, the client software behind around 5% of Ethereum's validators.
...
Around 85% of Ethereum's validators are currently powered by Geth, and the recent outages to smaller execution clients have renewed concerns that Geth's dominant market position could pose grave consequences if there were ever issues with its programming.
...
Cygaar cited data from the website execution-diversity.info noting that popular crypto exchanges like Coinbase, Binance and Kraken all rely on Geth to run their staking services. "Users who are staked in protocols that run Geth would lose their ETH" in the event of a critical issue, Cygaar wrote.
Remember "no-one ever gets fired for buying IBM"? At the Ethereum layer, it is "no-one ever gets fired for using Geth" because, if there was ever a big problem with Geth, the blame would be so widely shared.
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently. ... However, can we really say that the uncoordinated choice model is realistic when 90% of the Bitcoin network’s mining power is well-coordinated enough to show up together at the same conference?
What Buterin is saying is that because decentralized systems in the real world are not composed of "many small actors that make decisions independently", there is nothing to stop the small number of large actors colluding, and thus acting as a centralized system.
Ten thousand years is about the age of civilization, so a 10K-year Clock would measure out a future of civilization equal to its past. That assumes we are in the middle of whatever journey we are on – an implicit statement of optimism.
They would like to accompany it with a 10,000-year archive. That is at least two orders of magnitude longer than I am talking about here. We are only just over three-quarters of a century from the first stored-program computer, so designing a digital archive for a century is a very ambitious goal. Note that the design of the Clock of the Long Now is as much social as technical. It is designed to motivate infrequent but continual pilgrimages:
On days when visitors are there to wind it, the calculated melody is transmitted to the chimes, and if you are there at noon, the bells start ringing their unique one-time-only tune. The 10 chimes are optimized for the acoustics of the shaft space, and they are big.
Finally, way out of breath, you arrive at the primary chamber. Here is the face of the Clock. A disk about 8 feet in diameter artfully displays the natural cycles of astronomical time, the pace of the stars and the planets, and the galactic time of the Earth’s precession. If you peer deep into the Clock’s workings you can also see the time of day.
But in order to get the correct time, you need to “ask” the clock. When you first come upon the dials the time it displays is an older time given to the last person to visit. If no one has visited in a while, say, since 8 months and 3 days ago, it will show the time it was then. To save energy, the Clock will not move its dials unless they are turned, that is, powered, by a visitor. The Clock calculates the correct time, but will only display the correct time if you wind up its display wheel.
It is noteworthy that in 2023 Optical Archival (OD-3), the most recent archive-only medium, was canceled for lack of a large enough market. It was a 1TB optical disk, an upgrade from Blu-Ray.
No medium is perfect. They all have a specified Unrecoverable Bit Error Rate (UBER). For example, typical disk UBERs are 10⁻¹⁵. A petabyte is 8×10¹⁵ bits, so if the drive is within its specified performance you can expect up to 8 errors when reading a petabyte. The specified UBER is an upper limit; you will normally see far fewer. The UBER for LTO-9 tape is 10⁻²⁰, so unrecoverable errors on a new tape are very unlikely. But not impossible, and the rate goes up steeply with tape wear.
The property that classifies a medium as quasi-immortal is not that its reliability is greater than regular media to start with, although as with tape it may be. It is rather that its reliability decays more slowly than that of regular media. Thus archival systems need to use erasure coding to mitigate both UBER data loss and media failures such as disk crashes and tape wear-out.
A group of data journalists from Nepal take part in an Open Data Editor training session. Photo: OKNP
The Open Knowledge Foundation (OKFN) is happy to announce the release of Open Data Editor (ODE) 1.4.0, the latest version of our new desktop application for data practitioners to detect errors in tables.
ODE is an easy-to-use, open-source alternative to proprietary data wrangling tools, designed for accessibility and learning – no coding skills required. It finds common spreadsheet errors before you start your analysis, runs on any machine, works offline, respects your privacy, and keeps you in full control of your data, with no vendor lock-in. It also comes with a free online course that can help you make your datasets better, therefore making your life/work easier.
In the short time since the first stable release in December 2024, the application has already had a significant impact among civil society organisations, activists, data journalists and public servants in all parts of the world. Read more about the impact and some use cases here.
Installation
If you have the Open Data Editor’s previous versions installed on your computer, please note that the update will not be done automatically. You will need to download it again using the links in the buttons below.
The new main screen (this one above from the macOS version) now allows infinite scrolling, among other improvements
ODE has been migrated to a different architecture, which significantly improves the user experience and adds features that have been identified as essential in the various feedback sessions and pilots we have been running over the last few months.
New Architecture: Built on PySide6 (a simpler framework), it facilitates more agile work on improvements and changes.
Infinite Scroll: We removed pagination. Now you don’t have to click to move to the next page and explore data in full. The new version incorporates infinite scroll to help you navigate your tables.
Easier Download: We made communication improvements to help you download ODE without going through GitHub. The ODE now has a brand new landing page with everything you need to know to use the app.
Built-in Error Correction: You can now correct errors directly from the Errors Report panel, while in the past this feature just offered an overview of all errors grouped by categories.
Direct Error Detection: Now files are read and validated directly when you click on them. This increases the speed with which you can detect errors in your tables (but can cause a slower experience when working on big files).
Clearer View: The main Datatable View is now an accurate representation of the file contents, and column names are no longer a mix of data and metadata as before.
Features to simplify your work
The Open Data Editor isn’t another complex data tool – it’s your shortcut to better data and improved data literacy.
Here are a few tasks that ODE 1.4.0 can help you with (a command-line sketch of the same kind of check follows this list):
Detect errors in spreadsheets in a matter of seconds
Check if the data formats in your columns are correct
Learn data skills with an intuitive tool
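If you are curious what this kind of error detection looks like outside the app, OKFN's open-source Frictionless toolkit offers a roughly equivalent command-line check; this is an illustration of the same idea rather than a step ODE requires, and the file name is a placeholder.
pip install frictionless
frictionless validate my-spreadsheet.csv
The resulting report lists row- and cell-level problems, similar in spirit to ODE's Errors Report panel.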
Here is how organisations across the world are using ODE:
Observatoire des armements is working with defence spending data
The Demography Project is focusing on water, air quality and electoral data
Bioinformatics Hub of Kenya Initiative (BHKi) is working with genomic data and metadata
City of Zagreb is tackling the challenges of working with infrastructure data
Open Knowledge Nepal is working with local governments and their infrastructure data
After a complete overhaul of the app’s objectives last year, the tool has been in a new phase of development since the beginning of 2025. Our goal this year is to improve accessibility and enhance digital literacy through no-code tools like ODE.
That’s why we are simplifying its architecture, improving the way metadata and errors are communicated, and intensifying pilots and socialisation activities to encourage ODE’s adoption among people and groups without coding skills.
Improvements in the artificial intelligence integration will be another key focus for this year: our team will seek to replace the current model, based on OpenAI, with an open, local model built on open source LLMs.
What’s Next
The new version 1.4.0 represents a major milestone in improving the Open Data Editor experience and performance. But, of course, we still have a long way to go to enable anyone, regardless of their educational background, to improve the quality of their data and therefore improve their data literacy.
Here are some issues that will be addressed in the following releases:
Metadata Panel: We realise that this feature isn’t intuitive enough for people without coding skills. Many people reported being afraid to change the metadata parameters out of unfamiliarity – something the app still doesn’t help with. In the next few months, we will combine this feedback with a complete UX assessment by an external consultant.
Publishing Feature: Publishing data after it has been cleaned is a powerful feature of ODE, but it still doesn’t work fully. For now it’s only possible to publish data on GitHub, and we’ll be working over the next few months to enable publishing on data portal software such as Zenodo and CKAN.
More Feedback: Following our The Tech We Want vision, we want to develop technology that works for people and is good enough to solve real-world problems. With the 1.4.0 version now released, our team will step back from our desks to listen again to people's and organisations' needs in a new phase of user research.
In addition to these main areas, we will also work on solving the issues listed in the project roadmap.
If you have any questions or want any additional information about ODE, you can contact us at info@okfn.org.
Funding
All of Open Knowledge’s work with the Open Data Editor is made possible thanks to a charitable grant from the Patrick J. McGovern Foundation. Learn more about its funding programmes here.
Earlier this month, Jessy at Library Shenanigans posted some charming onomatopoeic instructions for circ workers performing checkins at a freshly-automated library in the 1990s. The “Doot-doot” and “Deedle-deedle-dee” are delightful. They may have been a little annoying. But what struck me about the post is that they solve a problem I’ve been aware of for a good 15+ years. And they did so a good decade before I first recall encountering it.
The Problem: Popups, Barcodes, and Returns
Many library systems use popups to indicate various important item-level conditions that should be addressed after checkin. Maybe a book should be routed to a new location. Maybe it needs to go to the bindery. Maybe it’s on hold for someone. You clear it by clicking a button or pressing the “Enter/Return” key.
But while in an ideal world you’d be giving the screen full attention when checking in, that just doesn’t happen. Circ desks are busy places. Library back rooms are busy places.
The act of scanning a barcode also triggers the “Return” keyboard action.1 Scanning a barcode isn’t just the equivalent of typing it in, it’s the equivalent of typing it in and pressing your Return/Enter key to submit it to the system.
Checkin enough materials with enough going on around you and the following scenario will happen:
You scan an item. A popup appears on your screen with important information. You are distracted by something. You scan the next item without checking. The system does not process the barcode. It processes the Return. It dismisses the message.
The result is: Your first item is checked in but you haven’t acted on the note. Your second item is not checked in because all it did was clear the popup. Issues with both items only show up later on when a supposedly-routed item never returns or a patron complains that they returned the second item.
I first recall becoming aware of this issue while working at Hyattsville’s public library 17 years ago, using Geac PLUS. I don’t know if I just didn’t do enough checkin at Newark (I was a page and did checkin only when they were backed up) or was just too young to notice it. Fortunately, some of the time you catch it by looking at the screen and noticing that the item in your hand isn’t in the list of checkins. Or the next thing it needs has triggered a print action and you hear the receipt printer spit out a routing slip.
The Solution
That printer noise is the one reliable notification I’ve ever experienced. But the mid-90s system whose documentation is shared in Library Shenanigans had it on lock:
Deedle-deedle-dee? Holding shelf
Beep-beep-beep? Belongs to another library
Doot-doot? Temporary item, set aside to be handled.
Those aren’t the only reasons I’ve experienced popups, but that covers quite a few of them!
The Question: Where Did it Go?
The whole thing was on my mind because when I was doing my research interviews last year, several folks brought it up as a flaw in the system. It’s not my job during those interviews to explain something like “oh this is shared with much older GUI systems.” But while I know it’s not an Alma issue, they’re also not wrong that it’s a problem in the system!2
So why don’t more of us have this?
Was it specific to a scanner vendor and system?
Was it too annoying? Did it disappear in the era of the GUI desktop client?
Are there still ways to set it up?
If I still had a site with comments, I’d ask people if their library has managed to do anything like this. If you have, hit me up via my contact page or on social media because I would love to hear about it!
I assume it’s in the barcode itself, but it could be a system configuration? I’ve never gotten that granular in barcode/scanner setup. ↩︎
I have, perhaps, made a mistake by introducing this old solution to my own coworkers in public services who think it’s pretty great. Are there still ways to set it up? ↩︎
This week I migrated my virtual server from Digital Ocean, where I pay in USD, to a host in Australia, where the punishing and ever-worse currency exchange rate doesn't make monthly bills a lottery.
This is the third or fourth time I've gone through a server migration, so it's a little less daunting than it has been in the past, but every time there is a new, exciting problem to tackle. I'm not a professional system administrator so things that might seem basic to others can be quite challenging for me.
This post is about one of those – migrating my self-hosted Forgejo git repository. Forgejo actually has very good documentation, which is less typical than we might hope in FOSS projects. The guide to installation is quite thorough and explains what each step is doing and why it is needed. The guide to upgrading is also pretty good, and includes a section on how to make a backup – they even created a CLI command to do this: forgejo dump. Unfortunately, how to restore from the backup is left as an exercise for the reader.
At some point in the future someone else is going to want to migrate their Forgejo install from one server to another, and not know how to do it. This blog post is for that person so they don't need to go through as much trial and error as I did. Let's be real – that person is probably me again, two years in the future.
Assumptions and caveats
This guide assumes:
you are running Forgejo on a Linux server and want to migrate it to another Linux server
you are using the binary rather than Docker
you are using sqlite as your database
you have root access to both servers
Your backup and new file structures may be slightly different, depending on which version you are moving to and from. This is what worked for me.
Step 1: make a backup
On your "old" server, first make a backup of forgejo. Forgejo provides instructions for this but there are a few assumptions made, and they caught me out.
To run a backup, you can use forgejo dump. However, there are some conditions required to make this work properly.
First of all, if you followed the official installation instructions you will be running forgejo as the git user, and that user will not have a password and will not be able to use sudo. That makes it difficult to run forgejo commands as directly as you might expect. To run a clean dump we need to:
run the command as the git user with sudo -u git forgejo command --argument
run dump from a directory where the git user has write access (the fact you're using sudo won't override this requirement)
explicitly declare both the config location and the working-path
nominate a tmp directory that git has permission to write to
declare your database type
The default temporary file location is /tmp, which probably is owned by root, so you can create a directory to use instead:
sudo -u git mkdir /home/git/tmp
Then move into the git home directory so your git user can save the export file:
cd /home/git
Now you should be able to run a dump with a command like this:
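(The paths here assume the standard binary install, with the config at /etc/forgejo/app.ini and the working path at /var/lib/forgejo; flag names can vary slightly between versions, so check the help output below if anything is rejected.)
sudo -u git forgejo dump --config /etc/forgejo/app.ini --work-path /var/lib/forgejo --tempdir /home/git/tmp --database sqlite3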
You can find out more about what these flags do with:
sudo -u git forgejo dump --help
You should now have a file called something like forgejo-dump-1744939469.zip.
Step 2: move your backup
You may have your own system worked out for transferring files between your old and new server. If you haven't worked this out yet and your files are not enormous, an easy way to do it is to use scp with your local machine as an intermediary.
To do this successfully, you need a user with the same name on all three machines, with permission to read and write all the files you're moving. On your old server, move the zip file out of the git user's directory and into your main user's home directory, then change the ownership:
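(The commands below are one way to do it; the user name and host names are placeholders for your own.)
sudo mv /home/git/forgejo-dump-1744939469.zip /home/youruser/
sudo chown youruser:youruser /home/youruser/forgejo-dump-1744939469.zip
Then, from your local machine, pull the file down from the old server and push it up to the new one:
scp youruser@old-server:forgejo-dump-1744939469.zip .
scp forgejo-dump-1744939469.zip youruser@new-server: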
This might take a few minutes, depending on how big your git repositories are.
Step 3: Reroute your DNS
We need to see the web configuration screen in a moment. You could do this by viewing it at [your_ip_address]:3000, depending on how you have set up your web server configuration. But given you have to redirect your DNS from the old server to the new one anyway, it's probably easier to do it now. Hopefully you remembered to reduce the TTL value for your (sub)domain earlier in the week 😉. How you do this depends on how you are managing DNS, so it's outside the scope of this post. Don't forget to use certbot to create a new HTTPS certificate.
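(If you are fronting Forgejo with nginx, for example, the certificate step can be as simple as the command below; the plugin and domain are placeholders for your own setup.)
sudo certbot --nginx -d git.example.com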
Step 4: Install Forgejo on your new server
Before we can import the backup into our new server, we need to set up Forgejo. Follow the installation instructions up to the point where you are looking at the fresh web-based configuration screen. You should finalise installation by selecting sqlite as your database type, and creating an admin user. It doesn't actually matter what credentials you give your admin user here because we're about to overwrite them, but you need to perform this step in order to get all your directories in order.
Step 5: Restore your backup
Now we can finally actually migrate our data!
You have a zip file sitting in your main user's home directory. We need to get these files into the right places with the right permissions.
First of all, disable your forgejo daemon:
sudo systemctl stop forgejo.service
Now we need to unzip the backup. You might need to install unzip first, and it's a good idea to unzip in a new directory:
sudo apt install unzip
mkdir forgejo-backup
cd forgejo-backup
unzip forgejo-dump-1744939469.zip
The data directory in your backup doesn't include everything that lives in /var/lib/forgejo/data/, so instead of copying over the top of the whole thing, we just copy in what we have:
sudo cp -r data/* /var/lib/forgejo/data/
Our repositories are in the repos directory from the backup, but we need to copy them into /data/forgejo-repositories:
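(Assuming the default /var/lib/forgejo layout used above:)
sudo mkdir -p /var/lib/forgejo/data/forgejo-repositories
sudo cp -r repos/* /var/lib/forgejo/data/forgejo-repositories/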
Now move the custom directory into /var/lib/forgejo:
sudo mv custom /var/lib/forgejo/
You might be wondering what to do with forgejo-db.sql - isn't that your database? Turns out it is not! Your sqlite database is within the data directory in your backup (as forgejo.db), so you don't need to move it specifically. You can ignore forgejo-db.sql.
Step 6: Run doctor
Something is likely to be not quite right at this point, especially if you are also upgrading versions and are missing some newer database tables. You can check this with doctor:
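(As with the dump, this needs to run as the git user and be pointed at your config; the exact flags are listed by forgejo doctor --help and may differ between versions.)
sudo -u git forgejo doctor check --all --config /etc/forgejo/app.ini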
If there is something wrong, doctor will suggest how you can fix it – doctor fix may resolve most or all issues. One of these things is likely to be your server SSH key, which we will come to in a moment.
Step 7: Complete installation and restart Forgejo
Now is a good time to make your file permissions slightly more secure, as per the Forgejo installation instructions. We don't need to write to app.ini any more, so tighten it up a bit:
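(Assuming the config lives at /etc/forgejo/app.ini, as in the standard install:)
sudo chmod 750 /etc/forgejo
sudo chmod 640 /etc/forgejo/app.ini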
You should now be able to restart Forgejo, and check in a browser that all is well.
sudo systemctl start forgejo.service
If everything went according to plan, you should now be able to log in all users with their original password and any pre-existing 2FA tokens.
Step 8: Update known_hosts
The last thing you may need to do is update the known_hosts on any local machines pushing to your hosted git repositories. When we set up Forgejo on the new server and tidied it up with doctor, we ended up with a new SSH key. Your pre-existing git repositories aren't going to like that because it will no longer match what is in your known_hosts file and quite correctly you will get an alarming message suggesting a MITM attack is underway. You, of course, know better than this, so you can relax, but you also need to fix the confusion.
A simple way to resolve this is to use ssh-keyscan to query the public key for your Forgejo instance and automatically save it:
ssh-keyscan git.example.com >> ~/.ssh/known_hosts
Note that this is only safe to do in this situation because you're confident the SSH key changed because you changed it. If you suddenly started getting errors about changed keys in any other situation, you'd definitely want to do some investigation before blindly just updating your known_hosts.
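If you would rather clear the stale entry and verify the new key yourself on the next connection, ssh-keygen can remove it for you:
ssh-keygen -R git.example.com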
Congratulations on migrating your self-hosted git server!
Spring has sprung, and we’re hosting a vernal treasure hunt in celebration! Come join our Spring Treasure Hunt!
We’ve scattered a patch of seedlings around the site, and it’s up to you to try and find them all.
Decipher the clues and visit the corresponding LibraryThing pages to find a seedling. Each clue points to a specific page right here on LibraryThing. Remember, they are not necessarily work pages!
If there’s a seedling on a page, you’ll see a banner at the top of the page.
You have a little less than two weeks to find all the seedlings (until 11:59pm EST, Wednesday April 30th).
Come brag about your patch of seedlings (and get hints) on Talk.
Win prizes:
Any member who finds at least two seedlings will be awarded a seedling Badge.
Members who find all 12 seedlings will be entered into a drawing for some LibraryThing (or TinyCat) swag. We’ll announce winners at the end of the hunt.
P.S. Thanks to conceptDawg for the spring chicken illustration!
After last week's issue on digital privacy, I thought I'd focus this week on government-sponsored or -enabled surveillance.
As I dug through my store of saved articles, though, I realized I had quite a number on a particular kind of surveillance: camera networks.
These are often municipal-sponsored systems of license plate readers, but there are also networks of private systems—and, of course, attempts to combine the output of all of these networks.
So that is the focus of this week's Thursday Threads issue:
Debate over the privacy concerns and legal challenges of license plate readers is nothing new, as this 2012 article shows.
What happens when you put equipment not meant for the internet onto the internet? A security flaw in Motorola's automated license-plate-recognition systems exposes real-time vehicle data and video feeds online. (2025)
How about we network all of these cameras together? AI-powered surveillance system spurs privacy concerns as adoption grows in U.S. (2023)
If we've got to have this tech, we might as well have some fun with it. Artist's Traffic Cam Photobooth sparks controversy and cease-and-desist over creative use of NYC traffic cameras. (2024)
This Week I Learned: The word "scapegoat" was coined in a 1530 translation of the bible.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Privacy Concerns and Legal Challenges in Rural Virginia's Use of License Plate Reading Cameras
The research for State of Surveillance showed that you can’t drive anywhere without going through a town, city or county that’s using public surveillance of some kind, mostly license plate reading cameras. I wondered how often I might be captured on camera just driving around to meet my reporters. Would the data over time display patterns that would make my behavior predictable to anyone looking at it? So I took a daylong drive across Cardinal Country and asked 15 law enforcement agencies, using Freedom of Information Act requests, to provide me with the Flock LPR footage of my vehicle. My journey took me over 300 miles through slices of the communities those agencies serve, including the nearly 50 cameras they employ. And this journey may take me to one more place: an April Fool’s Day hearing in a courtroom in Roanoke. There, a judge will be asked to rule on a motion to declare the footage of the public to be beyond the reach of the public.
In a detailed exploration of public surveillance, this newspaper editor drove 300 miles across rural Virginia, requesting footage from police of their vehicle captured by license plate reading cameras.
The investigation aimed to understand how often people are recorded by these cameras and the implications of such surveillance.
Despite asking 15 law enforcement agencies for footage, only nine complied while others denied the request, leading to a legal challenge regarding public access to this data.
The editor noted that while driving through various counties, their vehicle was indeed photographed multiple times by Flock cameras, which capture detailed images of vehicles, including license plates and unique identifiers.
The editor also reflected on the ease with which police could track movements without a warrant, emphasizing a shift in expectations regarding privacy in public spaces.
Debate Grows Over Privacy Concerns and Legal Challenges as License Plate Readers Expand Across the U.S.
The scanners can read 60 license plates per second, then match observed plates against a "hot list" of wanted vehicles, stolen cars, or criminal suspects. LPRs [license plate readers] have increasingly become a mainstay of law enforcement nationwide; many agencies tout them as a highly effective "force multiplier" for catching bad guys, most notably burglars, car thieves, child molesters, kidnappers, terrorists, and—potentially—undocumented immigrants. Today, tens of thousands of LPRs are being used by law enforcement agencies all over the country—practically every week, local media around the country report on some LPR expansion. But the system's unchecked and largely unmonitored use raises significant privacy concerns. License plates, dates, times, and locations of all cars seen are kept in law enforcement databases for months or even years at a time. In the worst case, the New York State Police keeps all of its LPR data indefinitely. No universal standard governs how long data can or should be retained.
This is the earliest article I had bookmarked about license plate readers.
The rise of these cameras had led to significant advancements in law enforcement capabilities, particularly in tracking vehicles linked to criminal activity.
It described the effect in Tiburon, California, which was among the first towns to implement cameras that allowed police to monitor all cars entering and leaving the area.
The American Civil Liberties Union raised questions about the lack of regulation surrounding LPR usage and data retention.
Despite the benefits, such as recovering stolen vehicles and identifying suspects, critics highlighted issues like false positives and potential misuse of data.
Those criticisms are still valid today as there has been no comprehensive law on the use of such cameras.
Security Flaw in Motorola's ALPR Systems Exposes Real-Time Vehicle Data and Video Feeds Online
This trove of real-time vehicle data, collected by one of Motorola’s ALPR systems, is meant to be accessible by law enforcement. However, a flaw discovered by a security researcher has exposed live video feeds and detailed records of passing vehicles, revealing the staggering scale of surveillance enabled by this widespread technology. More than 150 Motorola ALPR cameras have exposed their video feeds and leaking data in recent months, according to security researcher Matt Brown, who first publicized the issues in a series of YouTube videos after buying an ALPR camera on eBay and reverse engineering it.
This article is as much about the surveillance possible with these systems as it is about the risks of connecting misconfigured systems open to the public internet.
It discusses a significant security flaw in automated license-plate-recognition (ALPR) systems, particularly those manufactured by Motorola, which exposed real-time video feeds and vehicle data.
One example: in Nashville, an ALPR system captured information from nearly 1,000 vehicles in just 20 minutes.
A security researcher discovered that ALPR cameras were put on the open internet, something it seems they weren't designed for.
This breach does not require any authentication, highlighting the scale of unintended surveillance enabled by these systems.
The data collected includes photographs, license plate information, and metadata such as location and time.
In just a few taps and clicks, the tool showed where a car had been seen throughout the U.S. A private investigator source had access to a powerful system used by their industry, repossession agents, and insurance companies. Armed with just a car’s plate number, the tool—fed by a network of private cameras spread across the country—provides users a list of all the times that car has been spotted. I gave the private investigator, who offered to demonstrate the capability, a plate of someone who consented to be tracked. It was a match. The results popped up: dozens of sightings, spanning years. The system could see photos of the car parked outside the owner’s house; the car in another state as its driver went to visit family; and the car parked in other spots in the owner’s city. Each was tagged with the time and GPS coordinates of the car. Some showed the car’s location as recently as a few weeks before. In addition to photos of the vehicle itself, the tool displayed the car’s accurate location on an easy to understand, Google Maps-style interface.
The previous articles have talked about public sector cameras for use by police.
This article discusses the Digital Recognition Network (DRN), a private surveillance system that allows its users to track vehicles via a vast database of license plate scans.
The system is built from cameras installed by repo men who collect data as they drive.
Users can access detailed information about a car's location history, including timestamps and GPS coordinates, through a user-friendly interface.
While DRN markets itself as a tool for industries like insurance and investigations, concerns arise regarding privacy violations, as the data can be accessed by anyone who pays for it, including private investigators.
(Last week's Thursday Threads included a story about how freelancers on Fiverr will look up anyone for a price.)
Critics argue that this system creates a digital dossier of individuals' movements, raising significant privacy issues.
The technology is legal because it captures publicly visible information, but its widespread use has sparked debates about surveillance and civil liberties.
Kilgore was referring to a system consisting of eight license plate readers, installed by the private company Flock Safety, that was tracking cars on both private and public roads. Despite being in place for six months, no one had told residents that they were being watched. Kilgore himself had just recently learned of the cameras. “We find ourselves with a surveillance system,” he said, “with no information and no policies, procedures, or protections.” The deal to install the cameras had not been approved by the city government’s executive branch. Instead, the Rough Hollow Homeowners Association, a nongovernment entity, and the Lakeway police chief had signed off on the deal in January 2021, giving police access to residents’ footage. By the time of the June city council meeting, the surveillance system had notified the police department over a dozen times.
The first article in this week's Thursday Threads was about Flock's law enforcement division.
But it isn't just police installing the technology.
This article describes the collaboration between a private homeowners association (HOA) and police departments to install license plate readers from Flock Safety.
In Lakeway, Texas, residents were unaware of a surveillance system tracking their vehicles, installed without proper city approval—just an agreement between the HOA and the police chief with no public announcement or comment.
Flock Safety, valued at the time at approximately $3.5 billion, marketed its cameras to over 200 HOAs nationwide, leveraging their substantial budgets and providing police access to private data.
The article also points out incidents of wrongful detentions due to erroneous alerts and highlights the risks associated with these systems.
AI-Powered Fusus Surveillance System Spurs Privacy Concerns as Adoption Grows in U.S. Towns and Cities
Spread across four computer monitors arranged in a grid, a blue and green interface shows the location of more than 50 different surveillance cameras. Ordinarily, these cameras and others like them might be disparate, their feeds only available to their respective owners: a business, a government building, a resident and their doorbell camera. But the screens, overlooking a pair of long conference tables, bring them all together at once, allowing law enforcement to tap into cameras owned by different entities around the entire town all at once. This is a demonstration of Fusus, an AI-powered system that is rapidly springing up across small town America and major cities alike. Fusus’ product not only funnels live feeds from usually siloed cameras into one central location, but also adds the ability to scan for people wearing certain clothes, carrying a particular bag, or look for a certain vehicle.
With the growth of camera networks (public and private), it was only a matter of time before someone tried to link them all together.
The article explores the rapid adoption of Fusus' AI-powered surveillance system.
Fusus connects various existing security cameras into a central hub, allowing law enforcement to access multiple live feeds simultaneously.
The technology also enhances existing surveillance systems with new capabilities like enabling the detection of specific clothing, bags, vehicles, and even transforming standard cameras into automatic license plate readers.
While some communities have embraced Fusus for its potential to improve public safety, others have raised concerns about privacy and the implications of constant surveillance.
The lack of transparency regarding police access to the system and its data analytics has sparked debate among residents and city councils.
Fusus has been marketed as a solution to enhance security, but critics argue it could lead to misuse without proper oversight.
Artist's Traffic Cam Photobooth Sparks Controversy and Cease-and-Desist Over Creative Use of NYC Traffic Cameras
When it debuted this summer, the Traffic Cam Photobooth (TCP) website offered a new twist on the surveillance state by enabling smartphone users to take selfies with New York traffic cams. By October, it had expanded to Georgia, Maryland, Minnesota, and Ireland. TCP was recently featured in an exhibit at Miami Art Week. But the future of the interactive site is uncertain, at least in New York City, where the Department of Transportation has 900-plus traffic cams accessible through the website. Its Office of Legal Affairs recently sent a cease-and-desist letter to Morry Kolman, the artist behind the project, charging that the TCP "encourages pedestrians to violate NYC traffic rules and engage in dangerous behavior."
The Traffic Cam Photobooth (TCP) website, created by artist Morry Kolman, allows users to take selfies with New York City's traffic cameras.
The NYC Department of Transportation—being spoilsports—issued a cease-and-desist letter to Kolman, claiming the site encourages unsafe behavior by pedestrians.
In response, Kolman creatively showcased the cease-and-desist letter using a long pole to photograph it with traffic cameras across Manhattan and Brooklyn.
Kolman views the project as a way to raise awareness about surveillance technologies and how to navigate living under such systems.
The source code is even on GitHub.
This Week I Learned: The word "scapegoat" originated in a 1530 bible translation
Early English Christian Bible versions follow the translation of the Septuagint and Latin Vulgate, which interpret azazel as "the goat that departs" (Greek tragos apopompaios, "goat sent out", Latin caper emissarius, "emissary goat"). William Tyndale rendered the Latin as "(e)scape goat" in his 1530 Bible. This translation was followed by subsequent versions up through the King James Version of the Bible in 1611: "And Aaron shall cast lots upon the two goats; one lot for the Lord, and the other lot for the scapegoat."
—Scapegoat, Wikipedia
Have you stared at a word and suddenly wondered about its origins?
This entry from the New York Times Flashback Quiz had me wondering about "scapegoat".
"scape" — "goat".
Why do we say that?
It comes from a phrase in the bible where a goat is sent into the wilderness on the Day of Atonement as a symbolic bearer of the sins of the people — Leviticus 16:22, to be exact.
The translator coined the term from the interpretation of "the goat that departs" and "emissary goat" in that verse.
What did you learn this week? Let me know on Mastodon or Bluesky.
The OpenForum Academy (OFA) is pleased to announce its roster of Content Partners for the upcoming OFA Symposium 2025. This year’s Symposium will benefit from the perspectives and expertise of three leading organisations in this space as Content Partners: the Open Knowledge Foundation, the Open Source Initiative, and the Digital Public Goods Alliance.
Set to take place at FGV Rio Law School in Rio de Janeiro, Brazil on November 18-19, 2025, the OFA Symposium will address critical questions facing open technology ecosystems under the theme “Open Technology Impact in Uncertain Times.” The event will bring together academics and practitioners – including policymakers, researchers, industry leaders, and civil society representatives – to explore new, interdisciplinary research and ideas.
Why announce new partners for the Symposium?
This year’s event focuses on new understanding and innovative approaches related to shared digital challenges in an era of geopolitical shifts, economic instability, and rapid technological progress. To help advance our understanding of this space, the Symposium needed new partners to expand its content and programming, as well as to reach new audiences.
By bringing in Content Partners, OpenForum Europe (OFE) aims to grow and institutionalise this conference as a best-in-class, global, and collaborative research effort that supports the advancement of research and scholarship related to Open Source and Open Technologies. Said OFE Executive Director Astor Nummelin Carlberg: “The partnership with these three esteemed organisations reflects our commitment to bringing diverse perspectives and expertise to the Symposium. Their contributions will ensure rigorous, forward-thinking discussions that can inform policy and practice in the Open Source and Open Technology space.”
Let’s talk a little bit more about each of our partners and what they bring to the event.
Open Knowledge Foundation: Building a world open-by-design, where all knowledge is accessible to everyone
OKFN will bring its extensive expertise in open data, open content, and open knowledge to the OFA Symposium, expanding the event's solicitation of research from different corners of the Open Technology Ecosystem. They will also contribute their expertise in network building to help us build a new model for the OpenForum Academy itself.
Renata Avila, CEO of the Open Knowledge Foundation, welcomes the announcement. “I encourage everyone to come to Brazil and join the OpenForum Academy Symposium 2025. This conference is becoming a cornerstone for the open movement; let’s create hard evidence and vital primary research together to make the case for Open Technologies. Equipped with facts, we can shift the tech landscape and guide policymakers to the right decisions for people, the economy, and the planet.”
Open Source Initiative: Strengthening collaborations with legal experts, policymakers, and maintainers, as well as addressing the complexities of the Open Source Definition in the age of AI
The Open Source Initiative (OSI), steward of the Open Source Definition, has existed for over 25 years. It remains actively involved in Open Source community-building, education, and public advocacy to promote awareness of the importance of Open Source software. As a leading organisation in the Open Source space, OSI participates in conferences and events, and meets with developers and users to discuss the economic and strategic advantages of Open Source technologies, licences, and models.
Through its collaboration with the OFA Symposium, OSI will contribute its deep understanding of Open Source licensing, governance models, and community development to the agenda, and will ensure that its diverse community submits research on the important work it is doing. OSI’s participation will help keep the focus on sustainable Open Source ecosystems and the evolving relationship between Open Source and emerging technologies like AI.
Says Nick Vidal, Community Manager for OSI, commenting on the value of the event: “The OFA Symposium is a unique event that brings together voices across academia, industry, and civil society. We’re honoured to support it as a Content Partner, helping to foster the cross-sector dialogue essential to unlocking the full potential of openness for the public good.”
Digital Public Goods Alliance: Unlocking the Potential of Open Source Technologies For a More Equitable World
The Digital Public Goods Alliance (DPGA) joins as a content partner to highlight the importance of digital public goods in addressing global challenges. They steward the digital public goods (DPG) definition, which recognises DPGs as “… open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable laws and best practices, do no harm, and help attain the Sustainable Development Goals (SDGs).”
The DPGA will help the Symposium facilitate discussions on international cooperation, equitable access to technology, and the role of open solutions in achieving the Sustainable Development Goals. The DPGA is excited to expand its foothold in the research space, collaborating with global partners to support and amplify research that advances understanding of the sustainability and impact of DPG projects – the open source technologies and communities that support social impact around the world.
Liv Marte Nordhaug, CEO of the DPGA Secretariat, embraces the enthusiasm driving the event and is excited to support the initiative. “Research on DPGs is crucial for an evidence-based approach that drives real impact, and the Digital Public Goods Alliance is pleased to support the OpenForum Academy Symposium as a content partner!”
How to get involved
Thanks to all our Content Partners for helping to shape the agenda for this upcoming event. The Call for Proposals for the OFA Symposium 2025 in Rio de Janeiro, Brazil will remain live until 1 June, so do not delay in preparing your submissions. More information on submission requirements and deadlines is available on our website at https://symposium.openforumeurope.org/.
If you are interested in sponsoring the Symposium, please reach out to OFE’s Senior Policy Advisor, Nicholas Gates, at nicholas@openforumeurope.org.
Nirdslab and WS-DL Group Student Research Presentations and Demonstrations
The day featured hands-on demonstrations of ongoing projects, including groundbreaking work in eye-tracking research by students Lawrence, Yasasi, Bhanuka, Kumushini, and James.
Lawrence presented his research on human-AI teaming, highlighting how electrodermal responses, heart rate, and speech data can be integrated using custom-built sensors—including the Rosetta Stone and various bracelet sensors—for real-time data collection. The ultimate aim of his work is to integrate these real-time data streams with AI models to infer the emotional state of a human team member during collaborative tasks.
Kumushini introduced the Project Aria glasses during her demonstration, explaining the diverse sensors integrated into the glasses and their respective capabilities. She detailed how data recording is achieved through the glasses and introduced the companion applications designed to streamline data collection. Furthermore, she described how the NIRDS Lab team employs Project Aria glasses for eye-tracking research, with a focus on studying joint visual attention—that is, how multiple people focus on similar objects or areas within shared environments. This research involves using several time-synchronized Project Aria glasses during user studies.
Yasasi presented her eye-tracking research focused on measuring visual attention in gaze-driven virtual reality (VR) learning environments, using the Meta Quest Pro VR headset. She demonstrated a VR learning application developed using Unity that features gaze-driven content rendering. In the application, virtual learning materials appear within the user’s field of view when they fixate on specific areas of interest (AOIs) and remain visible only so long as the user’s gaze stays on the designated AOI. As part of her study, Yasasi collects eye-tracking data, which she then analyzes to assess visual attention.
Bhanuka showcased his work on distributed eye tracking for online collaborations (DisETrac). This study presents a flexible system for capturing multi-user eye-tracking data over public networks, aggregating this data, and generating both individual and multi-user eye-tracking measures and visualizations. This innovative approach allows researchers to obtain naturalistic eye-tracking data in virtual collaboration settings—areas where traditional eye-tracking tools have often been limited.
James demonstrated an API for RAEMAP (Real Time Advanced Eye Movement Analysis Pipeline), which calculates advanced gaze metrics from raw eye-tracking data (specifically, x and y coordinates and timestamps). While RAEMAP was originally developed by Gavindya Jayawardena, earlier implementations required hard-coded parameters for eye trackers, datasets, and algorithm options. James’s research involves converting RAEMAP into a FastAPI application hosted in the cloud and accessible via Swagger UI. His work also includes the development of machine learning models to predict cognitive load from advanced eye-gaze metrics, with models trained using workload scales such as NASA-TLX.
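The recap stops short of code, but the general shape of such a service is easy to sketch. Below is a minimal, hypothetical FastAPI endpoint that accepts raw gaze samples (x, y, timestamp) and returns a naive dispersion-threshold fixation count; the route name, payload model, and threshold defaults are illustrative assumptions, not RAEMAP's actual interface or algorithms.

```python
# Minimal sketch of a gaze-metrics web service in the spirit of the RAEMAP API.
# The endpoint name, payload shape, and threshold defaults are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Gaze metrics (illustrative sketch)")

class GazeSample(BaseModel):
    x: float          # horizontal gaze coordinate, in pixels
    y: float          # vertical gaze coordinate, in pixels
    timestamp: float  # seconds since the start of the recording

class GazeRecording(BaseModel):
    samples: list[GazeSample]

@app.post("/metrics/fixations")
def count_fixations(recording: GazeRecording,
                    dispersion_px: float = 50.0,
                    min_duration_s: float = 0.1) -> dict:
    """Count fixations with a naive dispersion-threshold pass over the samples."""
    fixations = 0
    window: list[GazeSample] = []
    for sample in recording.samples:
        window.append(sample)
        xs = [p.x for p in window]
        ys = [p.y for p in window]
        # When the window's spatial spread exceeds the threshold, close it out.
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > dispersion_px:
            duration = window[-2].timestamp - window[0].timestamp if len(window) > 1 else 0.0
            if duration >= min_duration_s:
                fixations += 1
            window = [sample]
    return {"fixation_count": fixations, "sample_count": len(recording.samples)}
```

Served with uvicorn, FastAPI automatically exposes interactive Swagger UI documentation at /docs, which is the kind of interface the demonstration described.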
Collectively, these projects explore how gaze patterns can provide insights into cognitive load, attention shifts, emotional states, and user interaction. The research reflects technical rigor and deep interdisciplinary thinking, merging computer science, psychology, and design to solve real-world challenges.
Dr. Michael Herzog explored how these insights might translate to adaptive learning systems and accessible technologies for individuals with disabilities, and sparked ideas for future joint research between Old Dominion University and Magdeburg-Stendal University.
Visit to Hampton Roads Biomedical Research Consortium (HRBRC)
During the tour, Dr. Michael Herzog met with researchers and technicians working on next-generation medical devices, biomedical 3D printing, and human systems integration—highlighting the rich ecosystem of applied research in the Hampton Roads area. Patrick Ball demonstrated the use of the HRBRC 3D printing lab.
Cultural exchange was woven throughout the visit. A highlight of the day was a traditional Iranian dessert generously prepared by Dr. Faryaneh Poursardar, offering a reminder of the shared humanity behind global scholarship. This gesture sparked warm conversations that extended beyond research—creating moments of connection that celebrated cultural diversity, hospitality, and mutual appreciation. These personal interactions added depth to the academic exchange, reinforcing a spirit of respect, curiosity, and collaboration.
Thank you Dr. Poursardar @Faryane for the delicious traditional Iranaion desert! Taste Awesome. I think it got saffron, rice, and some other stuff. Maybe @faryane can answer if you have any questions. pic.twitter.com/IFA5prC13a
Dr. Michael Herzog's visit served not only as an opportunity to celebrate shared achievements, but also to identify future directions for student exchanges, joint publications, and collaborative grant initiatives.
We extend our heartfelt thanks to Dr. Michael Herzog for his continued partnership and support, and to our students, collaborators, and faculty whose passion and hard work made this visit an outstanding success.
About the Authors:
1. Lawrence Obiuwevwi is a Ph.D. student in the Department of Computer Science, a graduate research assistant with the Center for Mission Engineering, and a proud student member of The Storymodelers, The Web Science and Digital Libraries (WS-DL) Research Group, and Nirds Lab at Old Dominion University.
2. Kumushini Thennakoon is a Ph.D. student in the Department of Computer Science and is affiliated with The Web Science and Digital Libraries (WS-DL) Research Group and Nirds Lab at Old Dominion University.
Lawrence Obiuwevwi
Graduate Research Assistant
Virginia Modeling, Analysis, & Simulation Center
Center for Mission Engineering
Department of Computer Science
Old Dominion University, Norfolk, VA 23529
Email: lobiu001@odu.edu
Web: lawobiu.com
Community voting is now open for the 2025 DLF Forum and Learn@DLF! Community voting lets DLF and the Program Committee know which proposals resonate with our community, and the results are weighed when developing the final event programs. Anyone may participate, and you may vote for as many proposals as you’d like, but only once per proposal.
You’ll be asked to enter your email address. Email addresses will only be used to ensure that each person votes just once, and will then be de-coupled from the votes themselves.
Click the +Add button under each event name to select your favorites for each event.
Last Tuesday Cliff Lynch delivered an abbreviated version of his traditional closing summary and bon voyage to CNI's 2025 Spring Membership Meeting via Zoom from his sick-bed. Last Thursday night he died, still serving as Executive Director. CNI has posted In Memoriam: Clifford Lynch.
Cliff impacted a wide range of areas. The best overview is Mike Ashenfelder's 2013 profile of Cliff Lynch in the Library of Congress' Digital Preservation Pioneer series, which starts:
Clifford Lynch is widely regarded as an oracle in the culture of networked information. Lynch monitors the global information ecosystem for cultural trends and technological developments. He ponders their variables, interdependencies and influencing factors. He confers with colleagues and draws conclusions. Then he reports his observations through lectures, conference presentations and writings. People who know about Lynch pay close attention to what he has to say.
Lynch is a soft-spoken man whose work, for more than thirty years, has had an impact — directly or indirectly — on the computer, information and library science communities.
Below the fold are some additional personal notes on Cliff's contributions.
Ashenfelder notes Cliff's focus on collaboration:
Lynch is also a catalyst for action. He helps steer the conversation toward real results, such as standards creation, funding, tool development, metadata creation and interoperability. Ultimately, Lynch seems most fervent about collaboration as a crucial force.
“I would be reluctant to attribute much of anything just to my actions,” he said. “Most important successes come through the work of a lot of different people, collaborating and pulling it together. Maybe I can think of a place or two where there was a meeting that I spoke at or convened or I wrote or did something that just happened to fall at a pivotal moment. But any of that to me feels a bit accidental, at best just good luck, being in the right place at the right time.”
Michael Nelson and Herbert Van de Sompel's Cliff Lynch: The Invisible Influencer in Information Infrastructure provides an in-depth account of one occasion when Cliff was "in the right place at the right time" to spark a collaboration. The occasion was an October 1999 meeting in Santa Fe:
In order to further optimize the chances of success for the meeting, the collaboration of Cliff Lynch and Don Waters as moderators had been secured and turned out to be fundamentally important. In the Acknowledgments section of his PhD thesis, Herbert put Cliff’s impact on the direction of the meeting and on his own thinking as follows:
When starting to work on this thesis, I went back reading several of his early papers and could not feel other than intimidated by the far forward-looking vision expressed therein. At several occasions, I heard Cliff address large audiences, discussing complicated digital library matters with an amazing clarity. Cliff's work has always been a great inspiration to me. I met Cliff for the first time in person at the Open Archives meeting in Santa Fe, for which he had enthusiastically accepted my invitation to serve as a moderator. His involvement was crucial to the successful conclusion of the meeting.
...
Prior to the start of the second day, he vented his frustration about the lack of progress to Cliff, who was about to start moderating the first session. Cliff was nice enough to let him ramble on a bit, and, in a manner that exemplified one of Cliff’s many unparalleled capabilities, he went on to open the meeting by providing two discussion topics regarding interoperability that he somehow had been able to synthesize from the first day’s discussions, which most had experienced as enjoyable yet lacking in any sense of concrete direction. One was whether archive functions, such as data collection and maintenance, should be decoupled from user functions, such as search. The other was about the choice between distributed searching across repositories and harvesting from them to build cross-repository search engines.
The meeting solidified the long and productive collaboration between Van de Sompel and Nelson.
But easily the best way to understand how Cliff worked is the Report of ANADP I from 13 years ago. Cliff's "Closing Thoughts" are transcribed verbatim starting on page 309, and they are a wonderful example of his ability to summarize a meeting and set the agenda for future work with an extemporaneous address. You have to read all twelve pages — it is hard to summarize Cliff's summary, but here are a couple of gems:
With a portfolio of aligned strategies, we can collectively speak more effectively about the importance of the work we do, and certainly that has come up in a background way again and again as we’ve spoken about economics, education, about legal issues and barriers. I think that this question of really clarifying the fundamental importance of digital preservation to maintaining the cultural and intellectual record, the memory of our nations and of the world, has got to be a central objective. We have a great challenge in educating both the broad public in our nations and the governments that represent these publics; to the extent that we can align strategies we can make that case better.
And:
There are two words that I didn’t hear in the technical discussions. I get very scared whenever I hear a lengthy discussion of technical issues in digital preservation that doesn’t mention these two words. The first is Monoculture. There is a possibility, a danger, of doing too much alignment here. The reason for that is the second word that I didn’t hear, which is Hubris. We need to acknowledge that we don’t really know how to do long-term digital preservation. We’re going to have a lot more confidence that we know what we’re doing here about a hundred years from now as we look at what efforts actually brought data successfully a hundred years into the future. But in the relatively early stages of technologies like these, it’s much easier to identify failures than long-term successes.
Active on many advisory boards and visiting committees throughout his career, including serving as co-chair of the National Academies Board on Research Data and Information (BRDI) from 2011-16, Lynch’s contributions were recognized with numerous awards, including the American Library Association’s Joseph W. Lippincott Award, the American Society for Information Science and Technology’s Award of Merit, and the EDUCAUSE Leadership Award in Public Policy and Practice. He was a past president of the American Society for Information Science, and a fellow of the Association for Computing Machinery, the American Association for the Advancement of Science, and the National Information Standards Organization.
Vicky's and my tributes to Cliff are in three recent blog posts:
Cliff made a very valuable contribution to my career by inviting me to debug important talks before I gave them "for real" by giving early versions to the Information Access Seminar at U.C. Berkeley's School of Information. Fortified by tea and cookies, I could talk and then be subject to detailed and insightful questioning by Cliff and the participants. I always meant to take notes during the questioning but I was so into it I never did. I had to re-construct the necessary changes to the talk from memory.
This search returns ten talks I gave to the Information Access Seminar. In date order they are:
LibraryThing is pleased to sit down this month with screenwriter, playwright and novelist Blair Fell, two-time winner of the Doris Lippman Prize in Creative Writing from the City University of New York for his novels, The Sign for Home (2022) and the brand new Disco Witches of Fire Island (2025). The Sign for Home, his debut, was both an Indies Next and Indies Introduce book, as well as being selected for library community reads, and long-listed for the Center For Fiction’s First Book Prize. Fell has written for television and theater, winning the Shine Award for his work on the television program Queer As Folk, and a Golden Mic award for his segment on the public television series California Connected. He is the author of dozens of plays, and has won the HX Camp comedy award, seven Dramalogue awards, and The Robbie Award. His essays have appeared in magazines and on websites such as Huffington Post, Out Magazine, New York Daily News, and Fiction Southeast. In addition to his career as a writer, actor and director, he has been an ASL interpreter for the Deaf since 1993. His second novel, Disco Witches of Fire Island, an LGBTQ+ fantasy romance featuring a coven of witches on Fire Island, is due out from Alcove Press in early May. Fell sat down with Abigail to answer some questions about his new book.
Disco Witches of Fire Island opens in 1989, and features a young hero who has recently lost his boyfriend to the HIV/AIDs epidemic, and who goes to spend the summer on New York’s Fire Island. How did the story first come to you? Did the character of Joe appear first, was it the idea of a young man who had recently lost his boyfriend, or was it something else?
Oddly enough, the character of Joe came to me last, since he is the one that mirrors me, but isn’t really me. He was definitely the most difficult character to create. It’s hard to fully see oneself, so I created a character that experiences much of what I had experienced at that age but probably is a bit more likable than me and slightly taller. (Haha)
As far as the rest of the characters, so many of them are amalgams of people I met during the height of the AIDS Crisis. My first partner died from complications due to the HIV virus while we were both still in our twenties. To complicate matters he had broken up with me two years prior, and I was still very much in love with him. Needless to say this was an extremely difficult thing to get over. In its aftermath, there was a series of life-altering events, including getting fired from a job, and then a whirlwind last-minute trip to China where I decided to be a writer. It was just after that trip when I moved to Fire Island Pines and landed a job as a bartender, and moved into the attic of those quirky “old” gay men (just as Joe, the main character, does). They were a hoot, and there was lots of drama. They’d play old disco all day, cook illicit substances on the stove, and (one of them) would make huge ornate hats to go out dancing in the wee hours. These men became like witches in my mind. So really the witches, and some other characters came to me first, because I had people to model them after.
Your book unfolds during a period of historic significance for the LGBTQ+ community. How did this inform the way you told the story, and what do you think readers of today can learn from these events?
I moved to NYC around 1988, and was trying to figure out my life, and get over that broken heart. It felt like everyone was dying or sick at the time (and a huge percentage of them were), and I had a sense of absolute helplessness. At that point I attended my first gay pride parade and saw ACT UP (The AIDS Coalition to Unleash Power) marching. I couldn’t believe there were people trying to fight the disease and government inaction. I left the sidelines of the parade and joined. (It was also at that parade I coincidentally saw my first lover for the last time – he is the person who would become “Elliot” in the novel.) Getting involved in activism completely changed my life.
I wanted to capture that shift from victim to actor in the fight. I also wanted younger people to know what it was like at that moment of history, when looking for love could be so fraught. Sadly, we are at another terrible moment in our history, and the book, despite being an historical romance of sorts, very much speaks to what we as a nation – and more specifically – what we as members of the queer community are facing now. It names the Great Darkness of hatred, and suggests that when a malevolent force like our current government is working against you, sitting in the despair of the oppression is not the solution… action is, whether that means protesting, donating, volunteering, making art and most importantly banding together. As several of the manifesto quotes in the book suggest, when confronted with the Great Darkness, the only solution is collective action… and to keep dancing.
Did you always know your story was going to feature witches? What does magic allow you to do, from a storytelling perspective, that couldn’t be accomplished otherwise?
One of the first inspirations for the book were those older roommates of mine on Fire Island, and how they suggested these lovable, quirky witches — cooking mysterious things on the stove, dressing in outlandish costumes, whimsical and sometimes mysterious references to things I didn’t understand. The other reason for the magic is to underline all those magical beings we lost due to the AIDS crisis and government inaction.
The world was very dark – and it feels that way again. The book is about getting one’s magic back in the face of that darkness. The magic in the book isn’t the wave-a-wand-and-go-poof sort of magic. It’s a type of magic rooted in the connection between lovers and friends – it’s a collective magic, that only comes from group effort. The use of magic allowed me to emphasize the other worldly quality of connection and put a button on the “otherness” of being queer.
Another inspiration for the book was from a late friend, Stephen Gendin, whom I met in ACT UP. He had once told me that he had a hope to create a “religion” based on the transcendence he experienced on the dance floors of gay dance clubs. This always stuck with me. So, yes, the witches in the book do have some limited magical abilities – especially when they are in unity with their fellows – but their practice is more of a spiritual nature and comes with its own “bible” of sorts, The Disco Witch Manifesto, which is quoted at the beginning of every chapter.
What made you choose Fire Island as the setting for your story? Have you spent time there yourself?
Like I mentioned, I had spent that one summer working in Fire Island Pines as a bartender in the early 1990s. I also did visit for several summers after that. Though I tend to be much more of a Ptown sort of guy these days – I like biking and the ability to leave without the benefit of a boat. Though P-Town has become more and more unaffordable. We need NEW gay meccas where the queer artists, writers and witches can afford to go.
You write in a number of different genres, from essays to plays. What distinguishes the process of writing novels? Are there particular challenges or rewards?
I never even dreamed of writing a novel when I first started writing. That was way too big for me. But now looking back, I probably should have started much earlier. My first go at a full-length play was a serialized story where the audience would have to come back to the theater twelve times to see the whole thing. You read that right – twelve times. I think I always wanted to take my time with a story. I also thought I needed actors to make my writing good. With novels I arrived very, very late to the game and sort of accidentally found my way to my first novel. What happened was, I had an idea for a play and sat down to write it, but it just didn’t want to be a play. It wanted to be a novel. I was at a point in my life where I had nothing to lose, and I just faked it, one chapter after the next. I’d bring it into my writing group, and then after a few years, finished it, sent it to an agent, and then after a few revisions, he took it and sold it. It appeared I was able to write novels, and now I don’t want to do much else. I love the long journey of them, the surprises, the creation of worlds, and multiple characters.
A play or a TV show is inherently a collaborative process, and you also need to wait around for others to bring the project to fruition. With a novel, I get to say when and where the important work happens, and that’s a more comfortable place for me – especially since I’m not at all patient.
What is next for you? Are you working on more novels, or more plays? Do you think Disco Witches of Fire Island will ever be adapted in film?
Well, I certainly would love to see Disco Witches of Fire Island get adapted. I think it would be a great limited series as well. I do love writing essays and memoir, but I still have the novel-writing bug, so I’m probably sticking with that for the time being. We shall see. I don’t think there will be more plays or TV anytime soon, but I’ll never say never.
As far as books go, I’m currently working on two new novels, one of which, a pansexual Elizabethan romance, is out there being read by editors as we speak, while the fourth is just starting to make an appearance in my Scrivener software, but I’m torn about which of two ideas I want to live with for the next few years. Starting something new is never easy, especially with the distractions of this messed up world in which we’re living, but I’m willing to knuckle down and do the grind. It’s all about throwing down words and separating the shit from the sparkles.
Tell us about your library. What’s on your own shelves?
At Union College Schaffer Library, the digitization lab is mostly staffed by undergraduates who only work a handful of hours a week. While they do a great job, the infrequency of their work hours and lack of experience results in errors in digitization and metadata. Many of these errors are difficult to catch during quality control checks because they are so minute, such as a miscounted page number here, or a transposed character in a filename there. So, a Computer Science student and a librarian collaborated to create a quality control automation application for the digitization workflow. The application is written in Python and relies heavily on the Openpyxl library to check the metadata spreadsheet and compare metadata with the digitized files. This article discusses the purpose and theory behind the Quality Control application, how hands-on experience with the digitization workflow informs automation, the methodology, and the user interface decisions. The goal of this application is to make it usable by other students and staff and to build it into the workflow in the future. This collaboration resulted in an experiential learning opportunity that has benefited the student's ability to apply what they have learned in class to a real-world problem.
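The abstract describes the approach rather than the code, but a stripped-down version of this kind of check is easy to imagine. The sketch below uses Openpyxl to compare filenames listed in a metadata spreadsheet against the files actually present in a digitization folder and to flag blanks and mismatches; the "Filename" column header and the flat folder layout are assumptions for illustration, not the application's actual conventions.

```python
# Illustrative sketch of a spreadsheet-vs-files quality-control pass with openpyxl.
# The "Filename" column header and the flat scans/ folder layout are assumptions.
from pathlib import Path

from openpyxl import load_workbook

def check_digitization(metadata_xlsx: str, scans_dir: str,
                       filename_header: str = "Filename") -> list[str]:
    """Return a list of human-readable problems found in the batch."""
    problems: list[str] = []
    wb = load_workbook(metadata_xlsx, read_only=True)
    ws = wb.active

    # Locate the filename column from the header row.
    headers = [cell.value for cell in next(ws.iter_rows(min_row=1, max_row=1))]
    if filename_header not in headers:
        return [f"Header '{filename_header}' not found in {metadata_xlsx}"]
    col = headers.index(filename_header)

    listed = set()
    for row_num, row in enumerate(ws.iter_rows(min_row=2, values_only=True), start=2):
        name = row[col] if col < len(row) else None
        if not name:
            problems.append(f"Row {row_num}: blank filename cell")
            continue
        listed.add(str(name).strip())

    on_disk = {p.name for p in Path(scans_dir).iterdir() if p.is_file()}
    problems += [f"In spreadsheet but missing on disk: {n}" for n in sorted(listed - on_disk)]
    problems += [f"On disk but not in spreadsheet: {n}" for n in sorted(on_disk - listed)]
    return problems

if __name__ == "__main__":
    for issue in check_digitization("batch_metadata.xlsx", "scans/"):
        print(issue)
```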
The Dublin Core Metadata Initiative has published a minimally constrained vocabulary for the concepts of Work, Expression, Manifestation and Item (WEMI) that can support the use of these concepts in metadata describing any type of created resources. These concepts were originally defined for library catalog metadata and did not anticipate uses outside of that application. Employment of the concepts in non-library applications is evidence that the concepts are useful for a wider variety of metadata users, once freed from the constraints necessitated by the library-specific use.
Since the 17th century, scientific publishing has been document-centric, leaving knowledge—such as methods and best practices—largely unstructured and not easily machine-interpretable, despite digital availability. Traditional practices reduce content to keyword indexes, masking richer insights. Advances in semantic technologies, like knowledge graphs, can enhance the structure of scientific records, addressing challenges in a research landscape where millions of contributions are published annually, often as pseudo-digitized PDFs. As a case in point, generative AI Large Language Models (LLMs) like OpenAI's GPT and Meta AI's LLAMA exemplify rapid innovation, yet critical information about LLMs remains scattered across articles, blogs, and code repositories. This highlights the need for knowledge-graph-based publishing to make scientific knowledge truly FAIR (Findable, Accessible, Interoperable, Reusable). This article explores semantic publishing workflows, enabling structured descriptions and comparisons of LLMs that support automated research insights—similar to product descriptions on e-commerce platforms. Demonstrated via the Open Research Knowledge Graph (ORKG) platform, a flagship project of the TIB Leibniz Information Centre for Science & Technology and University Library, this approach transforms scientific documentation into machine-actionable knowledge, streamlining research access, update, search, and comparison.
Gamification, as a way to engage students in the library, has been a topic explored by librarians for many years. In this article, two librarians at a small rural academic library describe their year-long collaboration with students from a Game Design Program to create a single-player pixel-art video game designed to teach information literacy skills asynchronously. The project was accomplished using the game engine Unity and utilizing GitHub for project management. Outlined are the project's inspiration, management, team structure, and outcomes. Not only did the project serve to instruct, but it was also meant to test the campus’ appetite for digital scholarship projects. While the project ended with mixed results, it is presented here as an example of how innovation can grow a campus’ digital presence, even in resistant libraries.
Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, without cloud compute cost. With these tools, university-owned publishing operations can increase their operating efficiency which, when combined with human review, has no effect on quality.
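The abstract does not name specific tooling, so the snippet below is only a guess at the general pattern: running an open-weights model locally through the Hugging Face transformers pipeline. The model choice and prompt wording are placeholders, not Northwestern's implementation, and, as the authors note, any output still needs human review.

```python
# Hypothetical sketch: converting a plain-text citation to BibTeX with a locally run LLM.
# The model name and prompt wording are placeholders, not the workflow from the article.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # any open-weights instruct model small enough to run locally
)

def citation_to_bibtex(citation: str) -> str:
    prompt = (
        "Convert the following citation into a BibTeX entry. "
        "Output only the BibTeX.\n\n" + citation
    )
    result = generator(prompt, max_new_tokens=200, do_sample=False,
                       return_full_text=False)
    return result[0]["generated_text"].strip()

print(citation_to_bibtex(
    "Smith, J. (2020). An example article. Journal of Examples, 12(3), 45-67."
))
```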
Refactoring is the process of restructuring existing code, in order to make the code easier to maintain, without changing the behavior of the software. Georgia Southern University is the product of a consolidation of two separate universities in 2017. Before consolidation, each predecessor university had its own cataloging practices and software settings in the integrated library system (ILS) / library services platform (LSP). While the machine-readable cataloging (MARC) standard focuses on discovery, and descriptive search blended well to support discovery, settings related to circulation were in discord following the merger. Three busy checkout desks each had different localized behaviors and requested additional behaviors to be built out without centrally standardizing. Complexity stemming from non-unified metadata and settings plus customizations implemented over time for multiple checkout desks had ballooned to make for circulation settings which were overly baroque, difficult to meaningfully edit when a change to circulation practices was needed, and which were layered and complex to such a degree that local standards could not be explained to employees creating and editing library metadata. This resulted in frequent frustration with how circulation worked, difficulty knowing what was or wasn’t a software bug, and inability to quickly fix problems once problems were identified or to make requested changes. During 2024, the Georgia Southern University Libraries (University Libraries) undertook a comprehensive settings clean up in Alma centered around software settings related to circulation. This article describes step-by-step how the University Libraries streamlined and simplified software settings in the Alma ILS, in order to make the software explainable and easier to manage, and all without impacting three busy checkout desks during the change process. Through refactoring, the University Libraries achieved more easily maintainable and explainable software settings, with minimal disruption to day-to-day operations along the way.
This article presents a case study for creating subject tags utilizing transcription data across entire oral history collections, adapting Franco Moretti’s distant reading approach to narrative audio material. Designed for oral history project managers, the workflow empowers student workers to generate, modify, and expand subject tags during transcription editing, thereby enhancing the overall accuracy and discoverability of the collection. The paper details the workflow, surveys challenges the process addresses, shares experiences of transcribers, and examines the limitations of data-driven, human-edited tagging.
The web platforms adopted for digital humanities (DH) projects come with significant short- and long-term costs—selecting a platform will impact how resources are invested in a project and organization. As DH practitioners, the time (or money paid to contractors) we must invest in managing servers, maintaining platform updates, and learning idiosyncratic administrative systems ultimately limits our ability to create and sustain unique, innovative projects. Reexamining DH platforms through a minimal computing lens has led University of Idaho librarians to pursue new project-development methods that minimize digital infrastructure as a means to maximize investment in people, growing agency, agility, and long-term sustainability in both the organization and digital outputs. U of I librarians’ development approach centered around static web-based templates aims to develop transferable technical skills that all digital projects require, while also matching the structure of academic work cycles and fulfilling DH project needs. In particular, a static web approach encourages the creation of preservation-ready project data, enables periods of iterative development, and capitalizes on the low-cost/low-maintenance characteristics of statically-generated sites to optimize limited economic resources and personnel time. This short paper introduces static web development methodology (titled “Lib-Static”) as a provocation to rethink DH infrastructure choices, asking how our frameworks can build internal skills, collaboration, and empowerment to generate more sustainable digital projects.
This text shows a real case of how the Open Data Editor (ODE) impacted the workflow of an organisation working to serve the public good.
Picture of a rally in 2021 at the Place de l’Hôtel de Ville, Paris, around the ‘Abolition of Nuclear Weapons’ airship.
Organisation: Observatoire des armements / CDRPC
Location: Lyon, France
Knowledge Area: Peace and conflicts
Type of Data: Defence spending
Founded in 1984, the Observatoire des armements is the only independent centre of expertise and resources specialising in defence and security in France. Currently located in Saint-Just, Lyon, the documentation centre is open to the public by appointment. The archives contain unique items that are available to students, researchers, journalists and citizens seeking information.
The Observatoire operates on two main axes: the strengthening of democratic control of the arms industry and transfers, and the elimination of nuclear weapons as well as the recognition of their health and environmental consequences for populations. They do it by regularly disseminating independent information through the media, publishing books and reports, and promoting seminars, training and debates to encourage citizen engagement.
The Challenge
Problem
The Observatoire uses several different databases as sources for its reports, which follow different frameworks and criteria. They operate in a field of knowledge where there are no defined standards for data handling, so the data obtained from, for example, the United Nations Register of Conventional Arms (UNROCA), the SIPRI Arms Transfers Database, or the Open Security Data Europe, are different from each other. The Observatoire also works with data collected by researchers linked to the CDRPC, for example, this map built with Amnesty International France, powered by OpenStreetMap.
They need to standardise the data into a single spreadsheet to feed the information available on the Weapon Watch Open Data Environment platform, which is currently at the design stage.
They also face a lack of specialised human resources. The Observatoire currently has 15 active members and a network of 45 contributors (citizens, activists, researchers…) who actively contribute to the project through their research. Only three people have technical knowledge, only one of them manages the databases, and even then in a self-taught way, without any specific training in data science or data management.
Impact
Compiling data from multiple sources into a single spreadsheet with hundreds of rows and dozens of columns is a manual, time-consuming task. It often takes days of work to check that each cell follows the same pattern as the others, and that there are no typos, formatting inconsistencies, or blank cells.
As they can’t rely on a permanent technical team, this can affect the equilibrium of groups already engaged in investigations and other data research.
Examples of some of the different databases the Observatoire works with.
The Solution
The Observatoire uses the Open Data Editor (ODE) at two different stages in the data collection and production process. The first is to load the spreadsheets from different sources so that they can see in seconds how many and what kind of errors the spreadsheet contains – either empty cells or inconsistencies in the pattern of information. They are then able to correct the missing or incorrect data.
The second stage comes after all the data has been compiled into a single spreadsheet. Before publication, the file is prepared with as much information as possible in the metadata. This is necessary to understand the context of the shared information and increases the quality of the dataset.
According to the team, ODE is also very useful for understanding how data works in general and the importance of standards.
In this spreadsheet, ODE flagged 48 inconsistencies in seconds.
Here, two cells used an incorrect pattern for the number that identifies the company (SIREN) that sells arms to the French government. In other words, ODE has solved a problem that would have jeopardised the accuracy of the content.
The ODE metadata panel proved to be a key resource for the team to learn more about how data works.
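ODE is a point-and-click tool, but the same kinds of checks can be scripted. As a rough illustration (not part of ODE itself), the snippet below validates a CSV with the open-source frictionless library — from the Frictionless Data ecosystem that Open Knowledge maintains — and adds a simple regular-expression check for the nine-digit SIREN pattern; the file name and the 'siren' column name are assumptions.

```python
# Rough illustration (not ODE itself): structural validation with frictionless
# plus a SIREN format check. File name and 'siren' column name are assumptions.
import csv
import re

from frictionless import validate

SIREN_RE = re.compile(r"^\d{9}$")  # French company identifiers are nine digits

# Structural validation: blank cells, ragged rows, type inconsistencies, etc.
report = validate("arms_contracts.csv")
print("Structurally valid:", report.valid)

# Domain-specific pattern check on the (assumed) 'siren' column.
with open("arms_contracts.csv", newline="", encoding="utf-8") as f:
    for line_no, row in enumerate(csv.DictReader(f), start=2):
        value = (row.get("siren") or "").strip()
        if not SIREN_RE.fullmatch(value):
            print(f"Row {line_no}: '{value}' does not look like a SIREN")
```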
The Results
Reduced error resolution time from days to seconds
Eliminated 95% of manual review work
Enabled the team to focus on what matters
Using the app also helped with data literacy
The Observatoire des armements continues to develop the Weapon Watch Open Data Environment platform and can now much more easily arrive at impressive preliminary results with quality-assured data, such as the map below, entitled La guerre se fabrique près de chez nous (‘War is produced near us’), a tool for the surveillance network of military material production companies.
Working with the ODE will help them propose, for the first time, a standardised model to track the local industries’ activities.
Quote
Sayat Topuzogullari, Coordinator
“When I did the Open Data Editor course, it was very useful to understand that some datasets that we put on our websites are not really ready to be shared, for example because of all the metadata and therefore contextualization. Our center is taking on this task of publishing databases, with 40 years of experience in research and investigation. I’ve shared my experience with our researchers, and ODE meets our quality requirements. We know that it is not enough to put some points on the map or share some files. We really need to turn this research data into reusable and truly available data”.
About the Open Data Editor
The Open Data Editor (ODE) is Open Knowledge’s new open source desktop application for nonprofits, data journalists, activists, and public servants, aimed at helping them detect errors in their datasets. It’s a free, open-source tool designed for people working with tabular data (Excel, Google Sheets, CSV) who don’t know how to code or don’t have the programming skills to automate the data exploration process.
Simple, lightweight, privacy-friendly, and built for real-world challenges like offline work and low-resource settings, ODE is part of Open Knowledge’s initiative The Tech We Want — our ambitious effort to reimagine how technology is built and used.
And there’s more! ODE comes with a free online course that can help you improve the quality of your datasets, therefore making your life/work easier.
All of Open Knowledge’s work with the Open Data Editor is made possible thanks to a charitable grant from the Patrick J. McGovern Foundation. Learn more about its funding programmes here.
Open research, also widely referred to as open science or open scholarship, encompasses best practices, policies, and support enabling greater openness, transparency, and accountability throughout the research life cycle. Open research is increasingly a strategic priority for libraries and their parent institutions, often driven by institutional and national policies. For libraries, open research includes support for open access publishing, research data management, data sharing, and more.
The OCLC Research Library Partnership (RLP) convened the Research Support Leadership Roundtable in March 2025 to discuss open research as a strategic priority. The roundtable discussions brought together 50 library leaders from 33 institutions across four countries, who shared their experiences, strategies, and concerns related to the implementation and support of open research.
Aston University
Stony Brook University
University of Maryland
British Library
Syracuse University
University of Miami
Carnegie Mellon University
Temple University
University of Nevada, Reno
Clemson University
Tufts University
University of Oxford
Cold Spring Harbor Laboratory
University of Arizona
University of Southern California
Colorado State University
University of Calgary
University of Tennessee, Knoxville
London School of Economics & Political Science
University of California, Irvine
University of Texas at Austin
Monash University
University of California, Riverside
University of Waterloo
New York University
University of California, San Diego
Virginia Tech
Ohio State University
University of Glasgow
Yale University
Penn State
University of Illinois Urbana Champaign
Rutgers University
University of Manchester
Our conversations focused on open research as a strategic priority, and participants were asked to consider these framing questions:
Is open research a cohesive institutional and/or library strategic priority, and is the institution/library tracking progress toward open research goals? If so, how?
How is open research being implemented at your institution? That is, is it a centralized effort coordinated by the library (or another unit) or is it highly distributed? How are stakeholders collaborating?
How are external factors impacting your institution’s open research goals and activities? This might include things like cybersecurity, national security, scrutiny of international collaborations, reputation and prestige, AI, etc.
This post offers a synthesis of our discussions. RLP leadership roundtables observe the Chatham House Rule; no specific comments are attributed to any individual or institution.
Open research as a strategic priority
The roundtable discussion revealed a spectrum of institutional practices related to open research strategy, ranging from:
Explicit incorporation into institution-level strategic plans
Inclusion in library strategic plans
No articulated strategy related to open research, despite significant activities
The absence of a central strategic priority statement about open research at either the campus or library level does not necessarily indicate a lack of support for open research. All participating libraries demonstrated leadership on open scholarship topics, often despite what one participant characterized as a “pretty lukewarm” campus response.
Approximately seven institutions participating in the roundtable reported that open research was an institution-level priority. One US institution with ambitious research productivity goals is investing in research support infrastructure, including an open scholarship unit within the library. UK institutions were generally more likely to make open research an institution-wide priority, influenced by national policies and priorities. In these cases, institutional prioritization typically results in greater centralization and coordination of the open research service bundle, usually housed within the library.
At least eight RLP institutions reported that open research is a stated strategic priority for the library, with open research explicitly called out in research strategic plans. At many other libraries, the commitment to open research remains largely tacit, demonstrated through an array of services rather than explicit statements.
Our discussions also revealed that the bundling of open research services exists across a spectrum. An institutional-level commitment to open research typically indicates greater coordination under a single operational umbrella. For instance, I have previously written about the library-based Office of Open Research at the University of Manchester, which represents a new level of commitment, maturity, and coordination of open research activities for the university.
Libraries play a central role in advancing open research, though service offerings vary between institutions and may not be marketed to the university community as a comprehensive package.
Open publishing
For example, most institutions provide support for open access publishing, usually through a multimodal approach that could include:
Institutional repositories (IRs) that facilitate the deposit and discovery of publications, data, and other scholarly works.
Open access publishing funds that cover Article Processing Charges (APCs) for publication in hybrid journals for affiliated researchers.
Transformative/Read and Publish Agreements (TAs) that shift library payments from subscription-based reading to open access publishing. These agreements are negotiated with each publisher and can be administratively burdensome.
Library publishing programs that host open access journals and support innovation with new publication formats. Furthermore, approximately half of US university presses also now report to the library, further extending and integrating library publishing efforts.
Open research services and expertise
Libraries may support other open research services (such as the ones listed below), although few institutions in our roundtable reported supporting all four.
Research data management service offerings have become quite expansive at some institutions, including curating and preserving research data, facilitating compliance with funder mandates, and promoting FAIR data principles. Many libraries now offer infrastructure for data storage and sharing, often through the establishment of a local data repository or facilitated access to an external repository like Dryad.
Scholarly communications consultations offer guidance on OA policies, funder mandates, author rights and copyright, licensing, and data sharing. Many participants reported that their libraries were increasing their outreach to researchers related to rights retention statements, particularly in light of funder mandates requiring authors to retain rights.
Research impact metrics that assess factors like data reuse, open access publications, and the broader impact of open research are an emerging area of interest in some libraries.
Open Educational Resources (OERs) are a growing priority for many libraries, in alignment with institutional goals to support textbook affordability, equity, and student success. Some libraries now offer platforms as well as expert guidance to support OER creation, occasionally with funding to incentivize OER development and adoption by faculty.
Library-provisioned infrastructure
Furthermore, libraries have invested significant resources in infrastructures necessary to support open research, particularly:
Institutional repositories (IRs)
Data repositories
Data Management Planning (DMP) platforms like the DMPTool (US and Australia), DMPAssistant (Canada), and DMPonline (UK)
Open journal and monograph publishing platforms
Research information management systems (RIMS/CRIS) may also be used for tracking and reporting on OA goals
Library support for open research is nested within the broader institutional research enterprise. Despite significant library leadership, services, and infrastructure investments, campus awareness of library contributions to open research may be limited. For example, at one RLP institution, the Office of Research independently launched a new open science initiative, with no recognition of library expertise, services, or infrastructure. However, the initiative requires multi-stakeholder engagement, and the library now provides much of the programming for the effort. Collaboration on such a multi-faceted effort faced ongoing challenges due to persistent leadership turnover, decentralization, resource scarcity, and competing organizational priorities.
Factors shaping open research
Roundtable participants described many conditions that are shaping open research, both positively and negatively:
Mandates are a key driver of open research, with significant regional differences. For example, the UK’s Research Excellence Framework (REF) includes an open access mandate, and compliance directly impacts institutional research funding. This, together with open research mandates from UKRI (the primary government research funder), is driving institution-level commitments to open research. In the US, increased data sharing stems primarily from funder mandates rather than research initiative.
Researcher engagement remains low across many institutions, with little interest in open publishing, particularly through green OA. Faculty promotion and tenure criteria rarely reward open research practices, while the need to publish in prestigious journals continues to drive faculty publishing decisions.
Misconceptions about research security create hesitancy toward open research, a concern that grows as more institutions experience cyberattacks. More than one roundtable participant described how articulating the principle of “as open as possible, as closed as necessary,” can address concerns.
Resource constraints present universal challenges to open access publishing. The rising costs of transformative agreements strain budgets. While many institutions have entered into transformative agreements (also referred to as read and publish or publish and read agreements) with publishers, questions about their effectiveness in transitioning to a fully open ecosystem are arising, particularly in the UK. Many small- to mid-size libraries often lack the resources to participate in these agreements. Furthermore, many institutions offer minimal or no central APC subvention funds, and those that do support APCs find this popular resource quickly depleted, raising sustainability concerns.
Roundtable participants discussed both the challenges and opportunities generative AI presents for research. The widespread use of scholarly content, particularly OA content, in large language models (LLMs), coupled with ongoing uncertainties about fair use, copyright, and data ownership is raising concerns among researchers. Many are becoming hesitant to share their work openly, fearing its use in AI training datasets. This uncertainty is slowing adoption of open research in some locales and creating education and outreach challenges for libraries. Additionally, a few participants described how the rapid accessing and downloading of repository content by AI bots and agents is straining repository infrastructure. This challenge is so disruptive that at least one institution is considering restricting machine access to digital resources—counter to open principles—to maintain platform stability.
A few library leaders expressed an interest in GenAI and research impact analysis. For instance, some libraries are exploring new ways to assess open research adoption by tracking data and software availability statements. Others are interested in tracking outputs related to AI, and some institutions are beginning to count AI-related publications as part of research impact metrics. An outstanding challenge, however, is the elusive definition of “AI” for metric purposes, due to the fragmented nature of the field, which includes robotics, machine learning, and machine vision.
Despite the challenges, participants were optimistic about GenAI’s potential to enhance open research, and several institutions are actively exploring ways to integrate AI into their infrastructure. For example, one institution is hoping to use more AI tools to enrich repository content, potentially involving structured XML representations and enhanced content like video abstracts. Another is focusing on helping researchers apply careful data curation approaches to building machine learning models. Additionally, multiple RLP institutions report hiring AI research librarians and data scientists to expand research support services and offer AI literacy to campus populations.
Conclusion
Libraries have invested heavily in open research and have the opportunity to play an important leadership role. However, the provision of research support services remains loosely organized at most institutions, congruent with the absence of strong institutional prioritization of open research. Where open research is a strategic priority, services tend to be more centralized and coordinated within the library—a trend likely to accelerate with increasing open research mandates.
AI Nota Bene: I used this blog post as an opportunity to experiment with and learn how AI tools can support writing. I specifically leveraged WebEx AI-generated summaries and transcripts, Google NotebookLM, and Claude, along with some more limited experiments with Copilot and ChatGPT. I used NotebookLM to help me identify key themes from my own notes and discussion transcripts, in conjunction with relevant external sources I selected. This was useful as I developed an outline for the blog post. While I did adapt some suggested language from NotebookLM, I did the majority of the writing myself, as I still found this necessary for including relevant details as well as maintaining my authorial voice. I found Claude to be useful as an editor and proofreader of a near-final draft, as I prompted it to recommend ways I could improve clarity and conciseness. I incorporated many, but not all, of Claude’s suggestions.
The UPS Prototype was a proof-of-concept web portal built in preparation for the Universal Preprint Service Meeting held in October 1999 in Santa Fe, New Mexico. The portal provided search functionality for a set of metadata records that had been aggregated from a range of repositories that hosted preprints, working papers, and technical reports. Every search result was overlaid with a dynamically generated SFX-menu that provided a selection of value-adding links for the described scholarly work. The meeting outcome eventually led to the Open Archives Initiative and its Protocol for Metadata Harvesting (OAI-PMH), which remains widely used in scholarly communication, cultural heritage, and beyond. The SFX-menu approach became standardized as the NISO OpenURL Framework for Context-Sensitive Services (NISO OpenURL), and compliant linking servers remain operational in academic and research libraries worldwide. Both OAI-PMH and NISO OpenURL, as well as associated systems and services, have been so widely deployed that they can safely be considered an integral part of the scholarly information infrastructure. The authors, who were deeply involved in devising the UPS Prototype and played core roles in the OAI-PMH and NISO OpenURL specification efforts, take the reader behind the scenes of the development of these technologies and reveal Clifford Lynch as the Invisible Influencer in the establishment of scholarly information infrastructure.
Introduction
The book “Music with Roots in the Aether” bundles interviews and essays about seven contemporary American composers, including Philip Glass, Alvin Lucier, and Pauline Oliveros [1]. It was published in 2000 and is based on a 1975-1976 project by Robert Ashley – himself a contemporary American composer – for which he created 14 one-hour videotapes focusing on the creative genius of the selected composers [2]. Ashley was asked to write a foreword for the book and spends his first paragraphs emphasizing the challenge involved in reflecting on a project he did 25 years earlier:
But the Foreword turned out to be hard, even for me. I couldn’t remember who I was when the project was conceived. I couldn't remember any of the energies of the ideas that went into the project. Purposely I have not been good at remembering old ideas. I burn bridges. It keeps the path clear. [1]
Ashley’s sentiment resonates with us, because, for this 2025 essay, we impulsively chose to illustrate Clifford Lynch’s impact on the development of infrastructure for research and education by means of the Universal Preprint Service (UPS) project that we jointly initiated and executed in 1999. Some memories have remained strong, others have faded and become uncertain, and, undoubtedly a lot has just evaporated into the fabric of time. Fortunately, there are external memories that can serve as fall-backs when ours fail. Many aspects of the project and its context were documented in research papers. These papers reference documents with details about underlying discussions that are long gone from the organizational websites on which they were published, but fortunately were saved for posterity by the indispensable Internet Archive. The petites-histoires featuring people involved in our effort have not been publicly documented because the turn of the century was still social-media free. But personal touches often aptly illustrate the spirit of a project and the zeitgeist of the era in which it took place. Therefore, we feel compelled to also include a few anecdotes that remain glittering sharp in our memory. All in all, despite the fog of time, we are quite confident that the story we tell is an accurate reflection of events that were crucial to the eventual broad adoption of metadata harvesting using the Open Archives protocol (OAI-PMH) and open linking using OpenURL, and, especially, of the crucial role Cliff played in making that happen.
Two PhD Candidates, in for a Surprise
The middle to late 1990s were exciting times for those into computers, networks, and information. Times that seemed to hold an unlimited potential, rather abruptly brought about by the combination of the HTTP/HTML Web, the mainstreaming of the Internet, affordable personal computing, and increased digitization capabilities. Like many others, we were excited about how these technologies could bring about a better world and consequently devoured Wired, a magazine that abounded with “techno-utopianism and hippie-idealism” [3]. We had jobs that presented challenges in which this powerful combination of technologies could be leveraged to imagine and implement innovative solutions.
Herbert became systems librarian at the Ghent University Library in 1981, after completing an administrative automation project there to obtain a degree in Informatics. He didn’t exactly hit the ground running as he was trying to figure out what automation in academic libraries was all about. Most libraries were focusing their efforts on the catalogue but, given his science education, that didn’t seem to tick all the boxes. Eventually, it was the science librarian who turned on the light by framing the automation challenge in terms of the “consultation chain”: first searching secondary sources to find journal articles, then searching catalogues to determine where the journals were, and then obtaining the articles. And so it was that, as soon as CD-ROMs became available, Herbert started providing public access to Abstract & Indexing (A&I) databases, initially on stand-alone PCs, later on PCs in Local Area Networks (LANs), and eventually on PCs across the university’s Wide Area Network. He initiated an effort to create a Belgian Union Catalogue on CD-ROM and hooked it up to the network too (Figure 1). Access dramatically improved but constraints remained: consultation was restricted to Windows PCs, the LANs had to run the Banyan Vines operating system, and networking a large collection of CD-ROMs published by a variety of vendors was a dark art. It all amounted to access being restricted to dedicated library PCs operated in departmental libraries, which was better than what most other European academic libraries had to offer but not good enough for Herbert. That is why he experienced the interoperability fabric introduced by the Web as the chains coming off: suddenly there were far better ways to deliver scholarly information to researchers and students. That enthusiasm resulted in the 1997 release of the Library’s Executive Lounge, a menu-driven environment that provided web-based access to all information that had previously only been available on library PCs, with the addition of some electronic journal collections for good measure. But something was still missing: the Web had links and the Executive Lounge didn’t. Herbert put it as follows:
When using a library solution, the expectations of a net-traveler are inspired by his hyperlinked Web-experiences. To such a user, it is not comprehensible that secondary sources, catalogues and primary sources, that are logically related, are not functionally linked. [4]
Figure 1 - September 1989, Ghent University Library: Herbert showing off a CD-ROM
The frustration expressed in this quote led to a collaboration with SilverPlatter and Ex Libris to implement dynamic links from journal article descriptions in A&I databases to journal holding information in the library catalogue. And it also provided fertile ground for PhD research on how to empower libraries to create links across their electronic collections by means of an open linking framework.
Michael began his professional career at the NASA Langley Research Center (LaRC) in 1991, originally working in the Analysis and Computation Division of the supercomputer center. Early experiences with Usenet and anonymous FTP began to divert his attention from supercomputing and cluster computing (now known as cloud computing) to information networks and libraries. In 1993, he set up an anonymous FTP server, the Langley Technical Report Server (LTRS), for technical memorandums and technical papers published by LaRC. It effectively brought to NASA the culture of sharing and accessing technical reports via FTP that already existed in computer science. Later in 1993, he added a web interface to LTRS, providing a much-needed boost in usability. Browsing functionality improved, and abstracts were indexed and became searchable using the Z39.50-based Wide Area Information Server (WAIS), which was pretty much the only free search software at the time (for example, MySQL was not released until 1995). Around the same time, the Center for AeroSpace Information (CASI) brought their own WAIS server online; it provided abstracts for all publicly available, NASA-authored reports and articles. Other centers and projects were inspired by this activity and wanted to set up their own "report server". It became clear that a website - the term "digital library" was not yet widely adopted - allowing simultaneous WAIS search of all the NASA and NASA-affiliated report servers was needed. A bit of Perl hacking later, by Michael and his colleagues, and the NASA Technical Report Server (NTRS) was released in 1994 (Figure 2).
Figure 2: Summer 1999, NASA Langley Research Center: Michael among the desktop machines running LTRS and NTRS
The development of LTRS and NTRS assumed a 1:1 relationship from a metadata record to the URL of the associated full text document. But with the progression from ".ps.Z" to ".pdf" files, the usefulness of that assumption started to break down. It became unworkable by 1998, when Michael created a separate digital library for the scanned documents of the National Advisory Committee for Aeronautics (NACA), the 1915–1958 predecessor of NASA. Obviously, none of these documents were born digital, and a single NACA report presented on the web was composed of TIFF images, large and thumbnail JPEGs, and a PDF of the entire report. Based on the experience of managing and presenting these collections of files as a single web object, Michael's dissertation [5] evolved in the direction of creating buckets, the smart web objects in the Smart Objects, Dumb Archives (SODA) model [6]. The basic premises of SODA were that individual reports are more important than the repositories that hold them, and that it should be possible for multiple digital libraries to simultaneously make them discoverable. This 1997 insight is now commonplace, but went against the conventional wisdom of the time. It precedes, yet aligns with, the perspective of the W3C Architecture of the World Wide Web that individual resources are more important than the web servers that host them [7]. As a matter of fact, the Architecture of the World Wide Web only mentions resources, not web servers.
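To make the bucket concept more concrete, here is a toy sketch in Python of a single web object that aggregates every representation of a report (page images, thumbnails, a PDF) together with its metadata, independent of whichever archive happens to expose it. The class, field names, and URLs are purely illustrative and are not taken from the actual bucket implementation.

```python
# A toy, hypothetical rendering of the "bucket" idea: one smart web object that
# carries all representations of a report plus its metadata, so that multiple
# digital libraries can surface the same object. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Bucket:
    identifier: str
    metadata: dict = field(default_factory=dict)
    # element name -> list of file URLs making up that representation
    elements: dict = field(default_factory=dict)

    def add_element(self, name: str, urls: list):
        self.elements.setdefault(name, []).extend(urls)

    def representations(self):
        """The bucket itself, not the archive, knows what it contains."""
        return sorted(self.elements)

if __name__ == "__main__":
    report = Bucket("naca-report-1234", {"title": "An early NACA report"})
    report.add_element("pdf", ["https://archive.example.org/naca-1234/report.pdf"])
    report.add_element("tiff", [f"https://archive.example.org/naca-1234/p{i}.tiff" for i in range(1, 4)])
    print(report.representations())  # the same bucket can be exposed by several digital libraries
```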
As Herbert and Michael embarked on their respective PhD explorations on different sides of the Atlantic, they didn’t realize they were about to meet, to collaborate on the UPS project, and to present their results at a meeting that would be moderated brilliantly by Cliff Lynch, a man they both admired but had never met in person.
The UPS Prototype
By early 1999, Herbert’s ideas to give libraries a say regarding links across their electronic collections had taken shape [8]. He had also conducted an experiment illustrating the components of the open linking framework he envisioned. A linking server operated by the library would feature a knowledge base detailing its collection as well as a rule engine that would dynamically decide which links to provide for which type of collection item. A user interested in links for a specific item would click the associated Special Effects link (SFX) that was targeted at the linking server and contained keys that allowed the server to collect sufficient metadata about the item to evaluate the rules and return item-specific links [9]. But inserting SFX links required control of the systems that provided access to the collection and, as such, the experiment only used sources operated locally by the Ghent University Library. Demonstrating the general feasibility of the approach required an experiment without such constraints.
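As a rough, hypothetical illustration of the mechanics just described, the sketch below models a linking server in Python: an SFX-style link aimed at the server carries metadata keys, and the server consults a knowledge base and evaluates simple rules to return item-specific links. The server address, knowledge base entries, and rules are invented for this example and do not reflect the actual SFX implementation.

```python
# Illustrative sketch only: a toy model of a linking server with a knowledge
# base and a rule engine that turns item metadata into context-sensitive links.
from urllib.parse import urlencode

LINKING_SERVER = "https://sfx.library.example.org/menu"   # hypothetical resolver

# Hypothetical knowledge base: which e-journal collections the library licenses.
KNOWLEDGE_BASE = {
    "0028-0836": "https://fulltext.example.org/nature",   # ISSN -> local full-text source
}

def sfx_link(metadata: dict) -> str:
    """Build an SFX-style link: a URL aimed at the linking server carrying metadata keys."""
    return f"{LINKING_SERVER}?{urlencode(metadata)}"

def evaluate_rules(metadata: dict) -> list:
    """Toy rule engine: offer full text if the ISSN is licensed, otherwise
    fall back to a catalogue lookup and an interlibrary-loan request."""
    links = []
    issn = metadata.get("issn")
    if issn in KNOWLEDGE_BASE:
        links.append(f"{KNOWLEDGE_BASE[issn]}?{urlencode(metadata)}")
    links.append("https://catalogue.example.org/search?" + urlencode({"issn": issn or ""}))
    links.append("https://ill.example.org/request?" + urlencode(metadata))
    return links

if __name__ == "__main__":
    article = {"issn": "0028-0836", "volume": "401", "spage": "131", "aulast": "Doe"}
    print(sfx_link(article))        # the link placed next to the article description
    print(evaluate_rules(article))  # what the linking server would offer for this item
```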
When Rick Luce, director of the Los Alamos National Laboratory Research Library, visited the Ghent Library to check out the linking approach, it became clear that his groundbreaking Library Without Walls project [10] would provide the ideal setting: its collection combined locally and remotely controlled sources, including locally operated full-text, and it maintained close relationships with various parties in the scholarly information industry. So, Herbert packed up in February 1999 for a six-month stint in Los Alamos and successfully conducted an elaborate experiment that demonstrated the feasibility of the approach with sources under both local and remote control, including full-text collections from Wiley and the American Physical Society, and involved linking servers at Los Alamos and Ghent [11].
Figure 3 - Summer 1999, Donna Bergmark’s home: Rick Luce, Herbert, and Paul Ginsparg celebrating the call to action
But Los Alamos was also where the famous physics preprint server - then known as xxx.lanl.gov (now known as arXiv) - ran under Paul Ginsparg’s desk [12]. Having witnessed many years of fierce discussions at Ghent University about subscriptions to journals and their ever-increasing price tag, Herbert very much understood the appeal of the new communication paradigm it entailed and had brought his video camera along to Los Alamos, hoping he might get a chance to interview the much-revered Ginsparg. He shouldn’t have bothered. It turned out that Rick and Paul were already exploring whether the Library Without Walls, which ran a mirror of the preprint server, could become its institutional host, and Herbert started taking part in those conversations.
One brainstorm led to another and by the time Herbert got ready to return to Ghent, the trio published a call to action (Figure 3) for “the further promotion of author self-archived solutions” in which they announced a meeting with 25 invited experts to be held in Santa Fe, NM, in October 1999, to kick things off [13]. The stated goals were “to reach an agreement regarding an approach to build a promotional prototype multidisciplinary digital library service for the main existing e-print archives” and “to create a forum that will continue to address the interoperability of self-archiving solutions, as a means to promote their global adoption” [14]. To this day, Herbert vividly remembers the thrilling moment when he pushed the Send button on his Toshiba laptop to distribute the final version of the call to action to various listservs, while sitting in his tiny apartment in Santa Fe that was sparsely outfitted with rented furniture (Figure 4).
Figure 4: July 27 1999, Calle Mejia: Herbert sends out the invitation for the Santa Fe Meeting
Over time, Herbert had come to understand and embrace the “seeing is believing” power of prototypes. He had decided that a concrete strawman to illustrate services across e-print repositories would be needed to fuel discussions, but he would need collaborators to pull that off. When reaching out to various e-print repositories to obtain metadata dumps, Thomas Krichel, a major force behind the Research Papers in Economics (RePEc) [15] effort, enthusiastically came on board. Rick Luce identified the other person who was needed. Via the New Mexico Library Alliance, he knew Michael’s supervisor Mike Little; together, they engineered a meeting in Washington, DC, anticipating that their Young Turks would resonate. During a four-day meeting in April 1999 it became clear they very much did, although their initial meeting didn’t get off to a good start due to Michael getting hopelessly lost driving around Dupont Circle, in those pre-GPS days. They drew up technical plans for a prototype and even managed to get a meeting with Deanna Marcum at the Council on Library and Information Resources (CLIR) and Donald (Don) Waters at the associated Digital Library Federation (DLF), securing support and funding for the meeting and the prototype.
Figure 5 – October 1999: UPS Prototype - A bucket for a preprint shows ReDIF metadata and two SFX links
Together, Herbert, Michael, and Thomas started working on the UPS Prototype to be presented at the very outset of the planned Santa Fe Meeting. And, although the prototype was intended “not to make statements about the architectural directions that UPS should take, but rather to facilitate discussions,” [11] its design did entail some significant technical choices. Metadata would be collected from various e-print repositories using static dumps; metadata would be normalized to the ReDIF format [16] used in the RePEc initiative; the SODA model would be used to manage/present individual e-prints as buckets; search across the aggregated metadata would be realized using the NCSTRL+ extension of Dienst that supported buckets; each e-print-specific bucket would provide SFX linking capabilities (Figure 5). In order to realize all this in a six-month period, the prototype trio brought more help on board. And they met twice in person: once in Ghent, where Thomas showed up totally drenched on Herbert’s doorstep after having biked through heavy rain from Ostend; once in Los Alamos, prior to the Santa Fe Meeting, when Thomas arrived hours late having biked (Figure 6) through heavy snow from Albuquerque and spent the night in a drainage pipe (colleagues arriving late was a recurring theme for Herbert). Michael mostly remembers it being bitterly cold, since his earlier visit to Albuquerque in the Summer of that same year had not taught him to pack a sweater for Santa Fe in Fall. And, despite the hiccups that plague every project, the well-documented UPS Prototype [17] was finished on time, ready to be presented to the meeting participants.
The Santa Fe Meeting for ...?
In order to further optimize the chances of success for the meeting, the collaboration of Cliff Lynch and Don Waters as moderators had been secured and turned out to be fundamentally important. In the Acknowledgments section of his PhD thesis, Herbert put Cliff’s impact on the direction of the meeting and on his own thinking as follows:
When starting to work on this thesis, I went back reading several of his early papers and could not feel other than intimidated by the far forward-looking vision expressed therein. At several occasions, I heard Cliff address large audiences, discussing complicated digital library matters with an amazing clarity. Cliff's work has always been a great inspiration to me. I met Cliff for the first time in person at the Open Archives meeting in Santa Fe, for which he had enthusiastically accepted my invitation to serve as a moderator. His involvement was crucial to the successful conclusion of the meeting. [18]
Figure 6 - October 21 1999, Fort Marcy: Thomas Krichel’s bike made it to the Santa Fe Meeting
The meeting started off in a very concrete manner, with the presentation of the UPS Prototype, some exposés on repository interoperability, and reflections about institutional versus discipline-oriented archive initiatives. But, as the first day progressed, the discussions got increasingly distracted by back-and-forth arguments about the necessity (or not) of peer review. The Stevan Harnad “self-archiving” camp (archiving the peer-reviewed version of a contribution on a personal or institutional server) insisted that peer review is essential to keeping scholarly communication trustworthy, whereas the Paul Ginsparg “preprint” camp (publishing unreviewed contributions on a discipline-oriented or institutional server) stated that knowledgeable readers can assess quality without external review and that novice readers should wait until a peer-reviewed version becomes available. Michael also remembers Paul saying something to the effect that the meeting would be a lot more productive if everyone just learned how to program in Perl and then did something instead of just talking about it. The peer-review tension had already been present prior to the meeting and is even reflected in the evolution of the title of its announcement: an unpublished version dated April 1999 was entitled “Call for participation aimed at the further promotion of the preprint concept”, the version published in July 1999 was entitled “Call for your participation in the UPS initiative aimed at the further promotion of author self-archived solutions”, whereas post-meeting the title was modified to become “The Open Archives initiative aimed at the further promotion of author self-archived solutions.” The choice of the term “archives” didn’t go down well with professional archivists [19], but it did neutralize the disagreement regarding peer review. By the end of the first day, when participants mingled at the Santa Fe Institute (Figure 7), Herbert was frustrated despite a successful demonstration of the prototype. His bad mood must have been tangible because Ed Fox, whom Herbert had met for the first time at the meeting, volunteered one of his patented neck massages.
Figure 7 – October 21 1999, Santa Fe Institute: Herbert and Michael at the end of the first meeting day
That night, sleep would not come and Herbert, jetlagged and sleep-deprived, had incessant incoherent thoughts on how to get the meeting back on track. Prior to the start of the second day, he vented his frustration about the lack of progress to Cliff, who was about to start moderating the first session. Cliff was nice enough to let him ramble on a bit, and, in a manner that exemplified one of Cliff’s many unparalleled capabilities, he went on to open the meeting by providing two discussion topics regarding interoperability that he somehow had been able to synthesize from the first day’s discussions, which most had experienced as enjoyable yet lacking in any sense of concrete direction. One was whether archive functions, such as data collection and maintenance, should be decoupled from user functions, such as search. The other was about the choice between distributed searching across repositories and harvesting from them to build cross-repository search engines. This is what the meeting report has to say about the outcome of discussion regarding the first topic:
Although archive initiatives can implement their own end-user services, it is essential that the archives remain "open" in order to allow others to equally create such services. This concept was formalized in the distinction between providers of data (the archive initiatives) and implementers of data services (the initiatives that want to create end-user services for archive initiatives). [20]
The outcome of discussions of the second topic in favor of a harvesting solution is somewhat remarkable because distributed search using WAIS/Z39.50 was quite in vogue in libraries and digital libraries in those days. Cliff himself had a significant track record in Z39.50 and its standardization [21, 22], but he had also identified harvesting approaches as a topic for further research [23]. Motivated by complexity and scalability concerns, he gently nudged discussions in favor of harvesting. In a paper in which he clarifies the complementary nature of Z39.50 and OAI-PMH, Cliff credits the meeting participants for the decision that was considered controversial by some in the community:
The Santa Fe group wanted a very simple, low-barrier-to-entry interface, and to shift implementation complexity and operational processing load away from the repositories and to the developers of federated search services, repository redistribution services, and the like. They also wanted to minimize the interdependency between the quality of applications services as viewed by the user and the behavior of repositories that supplied data to the applications services. [24]
By the end of the meeting, there was a general sense that the UPS Prototype had been helpful to illustrate the potential of cross-repository services and, hence, to emphasize the need for cross-repository interoperability. A paper that provides a rich summary of the Santa Fe Meeting describes it as follows:
There was general agreement among the participants at the meeting that the Prototype was an extremely useful demonstration of potential. There was also agreement, however, that trying to reach consensus on the full functionality of the prototype was "aiming too high" and that a more modest first step was in order. [25]
Towards OAI-PMH and OpenURL
By turning the focus of the meeting to these two topics, Cliff fundamentally changed its course. By thoughtfully guiding the discussions towards these concrete outcomes, he set the stage for work on what would become the Open Archives Initiative Protocol for Metadata Harvesting [26], of which both Herbert and Michael became editors. Undoubtedly, he had plenty of technical skills that would have allowed him to make significant contributions to the actual specification effort. But in a manner that characterizes Cliff, he silently took a step back and let the community decide its direction while expressing continued support for the work, on many occasions, and at venues around the world. There can be no doubt that his endorsement played a crucial role in the global adoption of OAI-PMH, which has been an integral part of the scholarly and cultural heritage infrastructure for over two decades.
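To convey the low-barrier-to-entry flavour that made OAI-PMH so widely adoptable, here is a minimal harvesting sketch in Python. The repository base URL is hypothetical; the verb, arguments, and resumptionToken handling follow the OAI-PMH 2.0 specification.

```python
# A minimal OAI-PMH harvesting sketch: request ListRecords from a repository
# and follow resumptionTokens until the result set is exhausted.
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://repository.example.org/oai"   # hypothetical repository endpoint

def harvest(base_url: str, metadata_prefix: str = "oai_dc"):
    """Yield <record> elements from an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.iter(OAI_NS + "record"):
            yield record
        token = tree.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # Subsequent requests carry only the resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

if __name__ == "__main__":
    for record in harvest(BASE_URL):
        identifier = record.find(f"{OAI_NS}header/{OAI_NS}identifier")
        print(identifier.text if identifier is not None else "(no identifier)")
```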
The focus on interoperability to realize just a single aspect demonstrated by the prototype - cross-repository discovery - also meant that discussions about its other technical ingredients, including SFX linking, would have to be postponed. But both Cliff and Don were very much aware of the problem it addressed and of the nature of the proposed solution. They were both part of the NISO Reference Linking Working Group [27] that investigated how to tackle the so-called “appropriate copy problem”, which, simplifying the charge to the Group, can be summarized as follows: “how to resolve a reference link to a paper in such a manner that it ends up at one of potentially many distributed copies of that paper to which a user, covered by an institutional subscription, has access?”
The Working Group resulted from a meeting in February 1999 [28], in which various models for a link localization solution had been explored [29, 30]. Don Waters invited Herbert to present his linking work at a second meeting in June 1999 [31]. And, although the meeting report has nice things to say about SFX linking [32], including its ability to address link localization challenges beyond the appropriate copy problem, he remembers profusely apologizing to Don for a presentation not done well. Still, after the demonstration at the Santa Fe Meeting, Cliff extended an invitation for a presentation at the Spring 2000 meeting of the Coalition for Networked Information [33]. The room was packed with representatives from libraries, the scholarly publishing industry, and library system vendors, and the talk became a veritable breakthrough moment for SFX linking. But significant tasks remained, including standardizing the SFX link syntax and demonstrating the ability of the approach to integrate with the emerging DOI-based reference linking approach pursued by journal publishers and instantiated by CrossRef [34].
The standardization’s history is well documented [35]; it started in December 2000 when the original SFX URL specification [36] - by then renamed OpenURL - was submitted to NISO and concluded five years later with the release of The OpenURL Framework for Context-Sensitive Services [37]. The DOI integration was explored by means of a limited prototype [38] that was demonstrated and discussed at the July 2000 NISO/DLF/CrossRef meeting [39]. As the meeting seemed to reach a consensus in favor of the proposed model with an institutional localization component powered by OpenURL - essentially the SFX open linking approach - a question was brought forward as to whether the model with a centralized localization component that had been identified in the first meeting of the Working Group should also be further discussed. At that point, Cliff decisively stepped in, stating “No. We have a solution!” In doing so, he paved the way for the endorsement of the OpenURL linking framework by the Working Group, the rigorous testing of its feasibility in an extended prototype [40], and its eventual acceptance in the US scholarly communication community and beyond. Afterwards, Cliff continued to express support for the approach at numerous venues and gave it his strongest possible endorsement by becoming a member of Herbert’s PhD jury.
Thank you, Cliff
By means of the UPS Prototype effort, this essay has illustrated Cliff’s fundamental impact on the direction infrastructure for research, education, and cultural heritage has taken in the past decades. Two technologies - OpenURL, which was used in the Prototype, and OAI-PMH, which resulted from it - became an integral part of that infrastructure. Hopefully, the essay has adequately shown that Cliff had a significant part in making that happen, not as an author of specifications, a writer of code, or a builder of tools, but rather as an identifier of problems to come and as a perceptive influencer, gently nudging forward the solutions he believed in and strongly supporting the community efforts that realized them. We have witnessed the same impact in other efforts we have been involved in since the UPS Prototype and can safely assume that others have experienced it in their projects aimed at improving the status quo of scholarly information infrastructure.
We do want to emphasize that, as we dreamt up the outlines of the UPS Prototype, we were early-career researchers with a visible, yet modest track record. Cliff (CNI), along with Paul Ginsparg (LANL), Rick Luce (LANL), Deanna Marcum (CLIR), and Don Waters (DLF), strongly and publicly endorsed our effort, shone the spotlight on us, and in doing so had a major impact on our career trajectories. We vividly remember receiving that support, and the experience has led us to similarly support the young researchers we have mentored since.
Figure 8 – December 12 2017, Washington, DC: Cliff and Herbert at the Fall 2017 CNI Membership Meeting
As we were selected to write a contribution for this Festschrift, we want, on behalf of all infrastructure plumbers, to profoundly thank Cliff. Scholarly infrastructure would not have progressed the way it did without him. We don’t envy the person who will step into his shoes once he has retired. The work ahead is enormous, with new infrastructure needed and existing infrastructure crumbling. Indeed, OAI-PMH is being supplanted due to its reliance on XML, a technology that has become arcane in a JSON world. And the OpenURL Framework is under attack by the centralized Get Full Text Research [41] effort, launched by the major commercial publishers, which mutes the capabilities of libraries to influence the nature of links across their electronic collections. While 25 years of OAI-PMH and OpenURL do not put those technologies in the same IT infrastructure league as - say - UNIX, it is a substantial period considering that the lifetime of many digital library phenomena can typically be measured in months or years, not decades. Cliff’s influence is directly traceable in the global penetration and longevity of these two technologies that go all the way back to the 1999 UPS Prototype.
[2] Robert Ashley, David Behrman, Philip Glass, Alvin Lucier, Gordon Mumma, Pauline Oliveros, Terry Riley. “Music with roots in the aether”. June 1, 1977-June 18, 1977. The Kitchen, New York, New York. https://thekitchen.org/on-file/music-with-roots-in-the-aether/
[4] Herbert Van de Sompel, Patrick Hochstenbach, and Tobias De Pessemier, “The hybrid information environment and our Intranet solution to access it“, Ghent University Academic Bibliography, 1997, accessed on January 27, 2025, https://hdl.handle.net/1854/LU-1056689
[5] Michael L. Nelson, "Buckets: smart objects for digital libraries," PhD dissertation, Old Dominion University Digital Commons, 2000, accessed on January 28, 2025, https://doi.org/10.25777/gbh6-7d07
[6] Michael L. Nelson et al. “SODA: Smart Objects, Dumb Archives,” Lecture Notes in Computer Science 1696, (1999): 453-464, accessed on January 27, 2025, https://doi.org/10.1007/3-540-48155-9_28
[7] Ian Jacobs and Norman Walsh, “Architecture of the World Wide Web, Volume One,” W3C Recommendation, 15 December, 2004, accessed on January 27, 2025, https://www.w3.org/TR/webarch/
[8] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 1: Frameworks for Linking,” D-Lib Magazine 5, no 4 (1999), accessed on January 27, 2025, https://doi.org/10.1045/april99-van_de_sompel-pt1
[9] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 2: SFX, a Generic Linking Solution,” D-Lib Magazine 5, no 4 (1999), accessed on January 27, 2025, https://doi.org/10.1045/april99-van_de_sompel-pt2
[10] “Library Without Walls Welcome Page,” 1999, archived at the Wayback Machine, April 28, 1999,
[11] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 3: Generalizing the SFX Solution in the SFX@Ghent & SFX@LANL experiment,” D-Lib Magazine 5 no 10, accessed on January 27, 2025, https://doi.org/10.1045/october99-van_de_sompel
[18] Herbert Van de Sompel. “Dynamic and context-sensitive linking of scholarly information,“ Ghent University Academic Bibliography, 2000, accessed on January 27, 2025, https://hdl.handle.net/1854/LU-522209
[19] Michael L. Nelson, "To the Editor: Response to Peter Hirtle's April 2001 editorial, OAI and OAIS: What's in a Name?," D-Lib Magazine 7 no 5, accessed on February 12, 2025, https://doi.org/10.1045/may2001-letters
[21] Clifford A. Lynch. “RFC1729: Using the Z39.50 Information Retrieval Protocol in the Internet Environment,” December, 1994, accessed on January 27, 2025, https://datatracker.ietf.org/doc/rfc1729/
[22] Clifford A. Lynch. “The Z39.50 Information Retrieval Standard - Part I: A Strategic View of Its Past, Present and Future,” D-Lib Magazine 3, no 4 (1997), accessed on January 27, 2025, https://hdl.handle.net/cnri.dlib/april97-lynch
[25] Herbert Van de Sompel and Carl Lagoze. “The Santa Fe Convention of the Open Archives Initiative,” D-Lib Magazine 6, no 2 (2000), accessed on January 27, 2025, https://doi.org/10.1045/february2000-vandesompel-oai
[26] Carl Lagoze, Herbert Van de Sompel, Michael L. Nelson, and Simeon Warner. "The Open Archives Initiative Protocol for Metadata Harvesting," June 14, 2002, accessed on January 28, 2025, https://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm
[30] Priscilla Caplan and William Y. Arms. “Reference Linking for Journal Articles,” D-Lib Magazine 5, no 7/8 (1999), accessed on January 27, 2025, https://doi.org/10.1045/july99-caplan
[31] National Information Standards Organization. “Second Workshop on linkage from citations to electronic journal literature,” June 1999, archived at the Wayback Machine, July 7, 2000,
[34] Helen Atkins et al. “Reference Linking with DOIs: A Case Study,” D-Lib Magazine 6, no 2 (2000), accessed on January 27, 2025, https://doi.org/10.1045/february2000-risher
[38] Herbert Van de Sompel and Oren Beit-Arie. “Open Linking in the Scholarly Information Environment Using the OpenURL Framework,” D-Lib Magazine 7, no 3 (2001), accessed on January 27, 2025, https://doi.org/10.1045/march2001-vandesompel
[39] "Meeting Report of the NISO/DLF/CrossRef Workshop on Localization in Reference Linking,” July, 2000, archived at the Wayback Machine, December 6, 2000,
While doing the research for a future talk, I came across an obscure but impressively prophetic report entitled Accessibility and Integrity of Networked Information Collections that Cliff Lynch wrote for the federal Office of Technology Assessment in 1993, 32 years ago. I say "obscure" because it doesn't appear in Lynch's pre-1997 bibliography.
To give you some idea of the context in which it was written, unless you are over 70, it was more than half your life ago when, in November 1989, Tim Berners-Lee's browser first accessed a page from his Web server. It was only around the same time that the first commercial, as opposed to research, Internet Service Providers started, with the ARPANET being decommissioned the next year. Two years later, in December of 1991, the Stanford Linear Accelerator Center put up the first US Web page. In 1992 Tim Berners-Lee codified and extended the HTTP protocol he had earlier implemented. It would be another two years before Netscape became the first browser to support HTTPS. It would be two years after that before the IETF approved HTTP/1.0 in RFC 1945. As you can see, Lynch was writing among the birth-pangs of the Web.
Although Lynch was insufficiently pessimistic, he got a lot of things exactly right. Below the fold I provide four out of many examples.
Page numbers refer to the PDF, not to the original. Block quotes without a link are from the report.
Disinformation
Page 66
When discussing the "strong bias in the Internet user community to prefer free information sources" he was, alas, prescient although it took more than "a few years":
The ultimate result a few years hence — and it may not be a bad or inappropriate response, given the reality of the situation — may be a perception of the Internet and much of the information accessible through it as the "net of a million lies", following science fiction author Vernor Vinge's vision of an interstellar information network characterized by the continual release of information (which may or may not be true, and where the reader often has no means of telling whether the information is accurate) by a variety of organizations for obscure and sometimes evil reasons.
In Vinge's 1992 novel A Fire Upon the Deep, the Net is depicted as working much like the Usenet network of the early 1990s, with transcripts of messages containing header and footer information as one would find in such forums.
The downsides of a social medium to which anyone can post without moderation were familiar to anyone who was online in the days of the Usenet:
Usenet is culturally and historically significant in the networked world, having given rise to, or popularized, many widely recognized concepts and terms such as "FAQ", "flame", sockpuppet, and "spam".
...
Likewise, many conflicts which later spread to the rest of the Internet, such as the ongoing difficulties over spamming, began on Usenet:
"Usenet is like a herd of performing elephants with diarrhea. Massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it."
— Gene Spafford, 1992
Earlier in the report Lynch had written (Page 23):
Access to electronic information is of questionable value if the integrity of that information is seriously compromised; indeed, access to inaccurate information, or even deliberate misinformation, may be worse than no access at all, particularly for the naive user who is not inclined to question the information that the new electronic infrastructure is offering.
This resonates as the wildfires rage in Los Angeles.
Although Tim Berners-Lee's initial HTTP specification included the status code 402 Payment Required:
The parameter to this message gives a specification of charging schemes acceptable. The client may retry the request with a suitable ChargeTo header.
the Web in 1993 lacked paywalls. But Lynch could see them coming (Page 22):
There is a tendency to incorrectly equate access to the network with access to information; part of this is a legacy from the early focus on communications infrastructure rather than network content. Another part is the fact that traditionally the vast bulk of information on the Internet has been publicly accessible if one could simply obtain access to the Internet itself, figure out how to use it, and figure out where to locate the information you wanted. As proprietary information becomes accessible on the Internet on a large scale, this will change drastically. In my view, access to the network will become commonplace over the next decade or so, much as access to the public switched telephone network is relatively ubiquitous today. But in the new "information age" information will not necessarily be readily accessible or affordable;
More than three decades later, the current HTTP specification still says only:
The 402 (Payment Required) status code is reserved for future use.
Instead, today's Web is infested with paywalls, each with its own idiosyncratic user interface, infrastructure, and risks.
The Death Of "First Sale"
Lynch understood the highly consequential nature of the change in the business model of paid information access from purchasing a copy to renting access to the publisher's copy; from a legal framework of copyright and the "first sale" doctrine, to one of copyright and contract law (Page 30):
Now, consider a library acquiring information in an electronic format. Such information is almost never, today, sold to a library (under the doctrine of first sale); rather, it is licensed to the library that acquires it, with the terms under which the acquiring library can utilize the information defined by a contract typically far more restrictive than copyright law. The licensing contract typically includes statements that define the user community permitted to utilize the electronic information as well as terms that define the specific uses that this user community may make of the licensed electronic information. These terms typically do not reflect any consideration of public policy decisions such as fair use, and in fact the licensing organization may well be liable for what its patrons do with the licensed information.
The power imbalance between publishers and their customers is of long standing, and it especially affects the academic literature. In 1989 the Association of Research Libraries published Report of the ARL Serials Prices Project:
The ARL Serials Initiative forms part of a special campaign mounted by librarians in the 1980s against the high cost of serials subscriptions. This is not the first time that libraries have suffered from high serial prices. For example, in 1927 the Association of American Universities reported that:
"Librarians are suffering because of the increasing volume of publications and rapidly rising prices. Of special concern is the much larger number of periodicals that are available and that members of the faculty consider essential to the successful conduct of their work. Many instances were found in which science departments were obligated to use all of their allotment for library purposes to purchase their periodical literature which was regarded as necessary for the work of the department"
The oligopoly rents extracted by academic publishers have been a problem for close on a century, if not longer! Lynch's analysis of the effects of the Web's amplification of this power imbalance is wide-ranging, including (Page 31):
Very few contracts with publishers today are perpetual licenses; rather, they are licenses for a fixed period of time, with terms subject to renegotiation when that time period expires. Libraries typically have no controls on price increase when the license is renewed; thus, rather than considering a traditional collection development decision about whether to renew a given subscription in light of recent price increases, they face the decision as to whether to lose all existing material that is part of the subscription as well as future material if they choose not to commit funds to cover the publisher's price increase at renewal time.
Thus destroying libraries' traditional role as stewards of information for future readers. And (Page 30):
Of equal importance, the contracts typically do not recognize activities such as interlibrary loan, and prohibit the library licensing the information from making it available outside of that library's immediate user community. This destroys the current cost-sharing structure that has been put in place among libraries through the existing interlibrary loan system, and makes each library (or, perhaps, the patrons of that library) responsible for the acquisitions cost of any material that is to be supplied to those patrons in electronic form. The implications of this shift from copyright law and the doctrine of first sale to contract law (and very restrictive contract terms) is potentially devastating to the library community and to the ability of library patrons to obtain access to electronic information — in particular, it dissolves the historical linkage by which public libraries can provide access to information that is primarily held by research libraries to individuals desiring access to this information. There is also a great irony in the move to licensing in the context of computer communications networks — while these networks promise to largely eliminate the accidents of geography as an organizing principle for inter-institutional cooperation and to usher in a new era of cooperation among geographically dispersed organizations, the shift to licensing essentially means that each library contracting with a publisher or other information provider becomes as isolated, insular organization that cannot share its resources with any other organization on the network.
Surveillance Capitalism
Lynch also foresaw the start of "surveillance capitalism" (Page 60):
we are now seeing considerable use of multi-source data fusion: the matching and aggregation of credit, consumer, employment, medical and other data about individuals. I expect that we will recapitulate the development of these secondary markets in customer behavior histories for information seeking in the 1990s; we will also see information-seeking consumer histories integrated with a wide range of other sources of data on individual behavior.
The ability to accurately, cheaply and easily count the amount of use that an electronic information resource receives (file accesses, database queries, viewings of a document, etc.) coupled with the ability to frequently alter prices in a computer-based marketplace (particularly in acquire on demand systems that operate on small units of information such as journal articles or database records, but even, to a lesser extent, by renegotiating license agreements annually) may give rise to a number of radical changes. These potentials are threatening for all involved.
He described search-based advertising (Page 61):
The ability to collect not only information on what is being sought out or used but also who is doing the seeking or using is potentially very valuable information that could readily be resold, since it can be used both for market analysis (who is buying what) and also for directed marketing (people who fit a certain interest profile, as defined by their information access decisions, would likely also be interested in new product X or special offer Y). While such usage (without the informed consent of the recipient of the advertising) may well offend strong advocates of privacy, in many cases the consumers are actually quite grateful to hear of new products that closely match their interests. And libraries and similar institutions, strapped for revenue, may have to recognize that usage data can be a valuable potential revenue source, no matter how unattractive they find collecting, repackaging and reselling this information.
Of course, it wasn't the libraries but Google, spawned from the Stanford Digital Library Project, which ended up collecting the information and monetizing it. And the power imbalance between publishers and readers meant that the reality of tracking was hidden (Page 63):
when one is accessing (anonymously or otherwise) a public-access information service, it is unclear what to expect, and in fact at present there is no way to even learn what the policy of the information service provider is.
This post has been authored by members of DLF’s Digital Accessibility Working Group.
With the updates to ADA Title II, many had questions about how these changes would affect them and their information organizations. DLF Digital Accessibility Working Group members shared some resources that they’ve come across to learn more:
From Karen Grondin: “The Big Ten Academic Alliance (BTAA) has gathered resources on the new ADA Title II regulations. Included here are links to webinar recordings, slides, and other presentation materials.”
ADA Title II and Academic Libraries (Library Accessibility Alliance)
From Karen Grondin: “This site provides only a brief overview of the ADA Title II regulations without a focus on how this will impact academic libraries. It does, however, have links to related news and resources and a link to the LAA’s Dear Colleague letter to library vendors that may be useful for other institutions.”
From Karen Grondin: “Webinar recording from October 18, 2023, presented by the Library Accessibility Alliance to share the Triangle Research Libraries Network Guide to Negotiating Accessibility in E-Resources Licenses (PDF). The webinar included a walkthrough of this document, which was created as part of a larger licensing principles document, as well as discussion around how libraries may use this guide to help with negotiations. On August 23, 2024, a follow-up webinar, “Licensing for Accessibility: Ask Me Anything,” was held, but was not recorded. The group is working on revisions to the guide and will likely make changes to the negotiating strategies in relation to the ADA 504 updates.”
Understanding the Revised ADA Title II: Implications for Library Publishing (Library Publishing Coalition and Library Accessibility Alliance)
Shared by Wendy Robertson: [Full description] “…Join Pete Bossley, former Deputy ADA Coordinator at The Ohio State University and current Senior Manager of Accessibility at Thomson Reuters for a 60-minute webinar (30-minute presentation followed by Q&A) about the revisions to ADA Title II and its implications for library publishing. He will discuss what public entities need to know about their obligations under the new regulations, and what organizations serving these entities can do to support them in meeting those requirements. Angel Peterson, Production Specialist and Accessibility Coordinator at Penn State, as an expert in both digital accessibility and library publishing, will facilitate the Q&A….”
ADA Title II: Implications for Accessibility and Equity in Education
From Adele Fitzgerald: “An insightful webinar that gathers experts from educational institutions, ed-tech suppliers, and public sector to discuss the implications of these new regulations. Hosted by D2L and recorded Nov 7, 2024.”
How to Comply with DOJ’s Seemingly Impossible Web Accessibility Regulation
From Elliott Stevens: “This is such a simple, elegant, and devastating digital accessibility test. Workers with varying levels of accessibility testing skills can do the No Mouse Challenge on a website (or digital tool or application), and the No Mouse Challenge not only reveals if a website is accessible without a mouse but also shows whether or not it’s organized hierarchically and according to WCAG 2.1AA levels. If a website fails the No Mouse Challenge, it’s a problem.”
Preparing for the European Accessibility Act – Essential Guidance for Publishers (KGL)
From Wendy Robertson: “The European Accessibility Act will go into effect June 2025. This webinar (18 November 2024) is for publishers, with a lot of attention to books, but journals are also covered.”
Vicky and I were invited to contribute to a festschrift celebrating Cliff Lynch's retirement from the Coalition for Networked Information. We decided to focus on his role in the long-running controversy over how digital information was to be preserved for the long haul.
A long time ago in a Web far, far away it is a period of civil war between two conceptions of how digital information could be preserved for posterity. On one side is the mighty Empire, concerned with the theoretical threat of format obsolescence. On the other are the Rebels, devoted to the practical problem of collecting the bits and ensuring that they survive. Among the rebels are the Internet Archive and the LOCKSS Program. This is the story of how the rebels won, thanks in no small part to Cliff Lynch's sustained focus on the big picture.
Thirty Years Ago
It all started just thirty years ago. In January 1995 the idea that the long-term survival of digital information was a significant problem was popularized by Jeff Rothenberg's Scientific American article Ensuring the longevity of digital documents. Rothenberg's concept of a "digital document" was of things like Microsoft Word files on a CD, individual objects encoded in a format private to a particular application. His concern was with format obsolescence; the idea that the rapid evolution of these applications would, over time, make it impossible to access the content of objects using an obsolete format.
Rothenberg was concerned with interpreting the bits; he essentially assumed that the bits would survive. Given the bits, he identified two possible techniques for accessing the content:
Format migration: translating the content into a less obsolete format to be accessed by a different application.
Emulation: using a software implementation of the original computer's hardware to access the content using the same application.
Emulation was a well-established technique, dating from the early days of IBM computers.
The Web
But five months later, an event signalled that Rothenberg's concerns had been overtaken by events. Stanford pioneered the transition of academic publishing from paper to the Web when Vicky was part of the HighWire Press team that put the Journal of Biological Chemistry on the Web. By then it was clear that, going forward, the important information would be encoded in Web formats such as HTML and PDF. Because each format with which Rothenberg was concerned was defined by a single application, it could evolve quickly. But Web formats were open standards, implemented in multiple applications. In effect, they were network protocols.
The deployment of IPv6, introduced in December 1995, shows that network protocols are extraordinarily difficult to evolve, because of the need for timely updates to many independent implementations. Format obsolescence implies backwards incompatibility; this is close to impossible in network protocols because it would partition the network. As David discussed in 2012's Formats Through Time, the first two decades of the Web showed that Web formats essentially don't go obsolete.
The rapid evolution of Rothenberg's "digital documents" had effectively stopped, because they were no longer being created and distributed in that way. Going forward, there would be a legacy of a static set of static documents in these formats. Libraries and archives would need tools for managing those they acquired, and eventually emulation, the technique Rothenberg favored, would provide them. But by then it turned out that, unless information was on the Web, almost no-one cared about it.
Integrity of Digital Information
Thus the problem for digital preservation was the survival of the bits, not of their format, aggravated by the vast scale of the content to be preserved. In May of the following year, Brewster Kahle established the Internet Archive to address the evanescence of Web pages. This comes in two forms: link rot, when links no longer resolve, and content drift, when they resolve to different content.
This is where Cliff Lynch enters the story. As he did in many fields, he focused on the big picture. He understood the importance to the big digital preservation picture of simply collecting the content and ensuring its integrity. Already in 1994's The Integrity of Digital Information: Mechanics and Definitional Issues he had written:
A system of information distribution that preserves integrity should also provide the user with a reasonable expectation of correct attribution and source of works. Even if deliberate attempts at fraud, misdirection, or covert revision may sometimes slip through the routine processes of the system these problems can be adjudicated by a formal challenge and examination system ... The expectation should be that violations of integrity cannot be trivially accomplished.
We assume that print is difficult to alter, that print authorship and source attribution are relatively trustworthy, and that printed works are normally mass-produced in identical copies. In fact, current technology trends undermine these assumptions. Printed publications are becoming increasingly tailored to very narrow audiences, and it has become easy to imitate the format of well-known and professionally presented publications.
Lynch discussed how the survival of the bits could be confirmed using digital hashes, the potential for digital signatures to confirm authenticity, and why such signatures were not used in practice.
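In practice, "confirming the survival of the bits" is a fixity audit: hash each stored file and compare the result with the digest recorded at ingest. A minimal sketch, assuming a simple manifest that maps relative paths to SHA-256 digests (the manifest format is an illustration, not any particular standard):

import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so arbitrarily large objects can be hashed."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the paths whose current digest no longer matches the manifest."""
    return [rel for rel, recorded in manifest.items()
            if sha256_of(root / rel) != recorded]

A digital signature over the manifest would additionally bind those digests to an identity, which is the authenticity question Lynch raised; the hashes alone only establish that the bits have not changed.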
LOCKSS
In October 1998 we proposed to Michael Keller, Stanford's Librarian, a decentralized system whereby libraries could cooperate to collect the academic journals to which they subscribed, and preserve them against the three threats we saw, technological, economic and legal. He gave us three instructions:
Don't cost me any money.
Don't get me into trouble.
Do what you want.
Thus was the LOCKSS (Lots Of Copies Keep Stuff Safe) Program born. The prototype was funded first by two small grants from Michael Lesk at the NSF, and then by Donald Waters at the Mellon Foundation, both of whom, like Lynch, understood the importance of assuring the survival of, and access to, the bits. Development of the first production system was mostly funded by a significant grant from the NSF and by Sun Microsystems. We didn't cost Keller any money, quite the reverse considering Stanford's overhead on grants!
The LOCKSS system, like the Internet Archive, was a system for ensuring the survival of, and access to, the bits in their original format. This was a problem; somehow, despite Rothenberg's advocacy of emulation, the conventional wisdom in the digital preservation community rapidly became that the sine qua non of digital preservation was defending against format obsolescence by using format migration based upon collecting preservation metadata.
Actually, the sine qua non of digital preservation is ensuring that the bits survive. Neither Kahle nor we saw any return on investing in preservation metadata or format migration. We both saw scaling up to capture more than a tiny fraction of the at-risk content as the goal. Future events showed we were right, but at the time the digital preservation community viewed LOCKSS with great skepticism, as "not real digital preservation".
Paper Library Analogy
In his 1994 paper Lynch had described how the paper world's equivalent of ensuring the bits survive works; "Lots Of Copies Keep Stuff Safe":
When something is published in print, legitimate copies ... are widely distributed to various organizations, such as libraries, which maintain them as public record. These copies bear a publication date, and the publisher essentially authenticates the claims of authorship ... By examining this record, control of which is widely distributed ... it is possible, even years after publication, to determine who published a given work and when it was published. It is very hard to revise the published record, since this involves all of the copies and somehow altering or destroying them.
Compare this with how we summarized libraries' role in our first major paper on LOCKSS, Permanent Web Publishing:
Acquire lots of copies. Scatter them around the world so that it is easy to find some of them and hard to find all of them. Lend or copy your copies when other librarians need them.
Libraries' circulating collections form a model fault-tolerant distributed system. It is highly replicated, and exploits this to deliver a service that is far more reliable than any individual component. There is no single point of failure, no central control to be subverted. There is a low degree of policy coherence between the replicas, and thus low systemic risk. The desired behavior of the system as a whole emerges as the participants take actions in their own local interests and cooperate in ad-hoc, informal ways with other participants.
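The reliability arithmetic behind that claim is simple. Under an idealized independence assumption (which the low policy coherence between libraries approximates), if each copy is lost in some period with probability p, all n copies are lost with probability p to the nth power. The figures below are illustrative assumptions, not measurements:

# Illustrative only: assume each of n independent copies is lost in a given
# period with probability p; the chance of losing every copy is p ** n.
p = 0.05
for n in (1, 3, 6):
    print(f"{n} copies -> probability of total loss: {p ** n:.2e}")
# 1 copy: 5.00e-02   3 copies: 1.25e-04   6 copies: 1.56e-08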
If librarians are to have confidence in an electronic system, it will help if the system works in a familiar way.
Threats
Lynch's focus on the big picture meant he also understood that economic and legal threats were at least as significant as technological ones. For example, in 1996's Integrity Issues in Electronic Publishing he wrote:
In the networked information environment, the act of publication is ill defined, as is the responsibility for retaining and providing long-term access to various "published" versions of a work. Because of the legal framework under which electronic information is typically distributed, matters are much worse than they are generally perceived to be. Even if the act of publication is defined and the responsibility for the retention of materials is clarified, the integrity of the record of published works is critically compromised by the legal constraints that typically accompany the dissemination of information in electronic formats.
He discussed some electronic journal pilots in a 1996 talk:
One key question Lynch identified was how acceptable transactional pricing systems will be to end users or to producers, suppliers, and rights holders. Will such models cause streams of income and expenditures to become unworkably erratic?
Now that there are two lawsuits from the copyright cartels aimed at destroying the Internet Archive, it is easy to understand that the most critical threats to preserved content are legal. A quarter-century ago this was less obvious in general. But even then, facing the oligopoly academic publishers, it was obvious to us that LOCKSS had to be designed around the copyright law.
Lynch continued to remind the library community of the economic and legal threats, and the broader issues impeding preservation of our digital heritage. Early examples include:
The retention, reuse, management, and control of this new cornucopia of recorded experience and synthesized content in the digital environment will, I expect, become a matter of great controversy. This will include, but not be limited to, privacy, accountability and intellectual property rights in their broadest senses. And these materials will hopefully become an essential and growing part of our library and archival collections in the 21st century - particularly as we sort through these controversies.
It is unclear how to finance archiving and preservation of these materials. Their volume is no longer driven by acquisitions budgets or by the scholarly publishing system, but by activities that may take place largely beyond the control of the library. And, of course, costs are open ended and unpredictable for digital preservation, unlike the costs associated with preserving modern printed materials (on acid-free paper).
Digital documents in a distributed environment may not behave consistently; because they are presented by computer programs both to people who want to view them and to software systems that want to index them, they can be changed, perhaps radically, for each presentation. Each presentation can be tailored for a specific recipient.
Preservation of digital materials is a continuous, active process (requiring steady funding), rather than a practice of benignly neglecting artifacts stored in a hospitable environment, perhaps punctuated by interventions every few decades for repairs.
And:
It is probably not an exaggeration to say that the most fundamental problem facing cultural heritage institutions is the ability to obtain digital materials together with sufficient legal rights to be able to preserve these materials and make them available to the public over the long term. Without explicit and affirmative permissions from the rights-holders, this is likely to be impossible.
And:
What is threatening us today is not an abuse of centralized power, but rather a low-key, haphazard deterioration of the intellectual and cultural record that is driven primarily by economic motivations and the largely unintended and unforeseen consequences of new intellectual property laws that were enacted at the behest of powerful commercial interests and in the context of new and rapidly evolving technologies.
There were many others.
The "Standard Model"
The LOCKSS team repeatedly made the case that preserving Web content was a different problem from preserving Rothenberg's digital documents, and thus that applying the entire apparatus of "preservation metadata", PREMIS, FITS, JHOVE, and format normalization to Web content was an ineffective waste of scarce resources. Despite this, the drumbeat of criticism that LOCKSS wasn't "real digital preservation" continued unabated.
After six years, the LOCKSS team lost patience and devoted the necessary effort to implement a capability they were sure would never be used in practice. The team implemented, demonstrated and in 2005 published transparent, on-demand format migration of Web content preserved in the LOCKSS network. This was possible because the specification of the HTTP protocol that underlies the Web supports the format metadata needed to render Web content. If it lacked such metadata, Web browsers wouldn't be possible.
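The mechanism is easy to sketch. Every archived HTTP response carries a Content-Type header, so a preservation proxy can inspect the stored MIME type and, if that type ever did become unrenderable, convert the bytes on the way out while advertising the new type to the browser. The sketch below is hypothetical: the converter table and function names are illustrative stand-ins, not LOCKSS code.

# Hypothetical sketch of transparent, on-demand format migration.
# CONVERTERS maps a stored MIME type to a replacement type and a converter.
CONVERTERS = {
    "image/x-obsolete": ("image/png", lambda data: convert_to_png(data)),
}

def convert_to_png(data: bytes) -> bytes:
    """Placeholder for a real converter implementation."""
    raise NotImplementedError

def serve(stored_bytes: bytes, stored_mime: str) -> tuple[bytes, dict]:
    """Return the body and headers a preservation proxy would send to a browser."""
    if stored_mime in CONVERTERS:
        new_mime, convert = CONVERTERS[stored_mime]
        return convert(stored_bytes), {"Content-Type": new_mime}
    # The common case: the format is still renderable, so serve the original bytes.
    return stored_bytes, {"Content-Type": stored_mime}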
Unsurprisingly, this demonstration failed to silence the proponents of the "standard model of digital preservation". So another five years later David published Format Obsolescence: Assessing the Threat and the Defenses, a detailed exposition and critique of the standard model's components, which were as follows (a schematic sketch of the resulting loop appears after the list):
Before obsolescence occurs, a digital format registry collects information about the target format, including a description of how content can be identified as being in the target format, and a specification of the target format from which a renderer can be created.
Based on this information, format identification and verification tools are enhanced to allow them to extract format metadata from content in the target format, including the use of the format and the extent to which the content adheres to the format specification. This metadata is preserved with the content.
The format registry regularly scans the computing environment to determine whether the formats it registers are obsolescent, and issues notifications.
Upon receiving these notifications, preservation systems review their format metadata to determine whether they hold content in an obsolescent format.
If they do, they commission an implementor to retrieve the relevant format specification from the format registry and use it to create a converter from the now-obsolescent target format to some less doomed format.
The preservation systems then use this converter and their format metadata to convert the preserved content into the less doomed format.
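Strung together, these six steps form a notification-driven migration loop. A schematic sketch of that loop follows; the registry and repository objects and their methods are hypothetical stand-ins, not the API of PREMIS, JHOVE, or any real tool.

# Schematic of the "standard model" workflow described in the list above.
def standard_model_cycle(registry, repository):
    # Steps 1-2 happened at ingest: format metadata was extracted and stored.
    # Step 3: the registry scans the environment and issues notifications.
    for notice in registry.obsolescence_notifications():
        # Step 4: do we hold content in the now-obsolescent format?
        holdings = repository.find_by_format(notice.format_id)
        if not holdings:
            continue
        # Step 5: build a converter from the registry's format specification.
        spec = registry.get_specification(notice.format_id)
        convert = build_converter(spec, target=notice.successor_format_id)
        # Step 6: migrate the preserved content to the less doomed format.
        for item in holdings:
            repository.store(convert(item.content),
                             format_id=notice.successor_format_id,
                             provenance={"migrated_from": notice.format_id})

def build_converter(spec, target):
    """Placeholder: the critique that follows argues this step is close to impossible."""
    raise NotImplementedError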
The critique included pointing out that creating a format specification for a proprietary format and then implementing a renderer from it was almost impossible, that the existence of open-source renderers made doing so redundant, that most HTML on the Web failed validation (a consequence of Postel's Law), that there were no examples of widely used formats going obsolete, and that Microsoft's small step in that direction in 2008 met with universal disdain and was abandoned. It also noted that:
the standard model is based on format migration, a technique of which Rothenberg’s article disapproves:
Finally, [format migration] suffers from a fatal flaw. ... Shifts of this kind make it difficult or impossible to translate old documents into new standard forms.
The critique was awarded the 2011 Outstanding Paper Award from Library Hi Tech, but again failed to silence the standard model's proponents. Although we no longer follow the digital preservation literature closely, it is our impression that over the intervening 15 years advocacy of the standard model has died down, thanks in no small part to Lynch's sustained focus on the big picture.
In March 2025, in response to previous discussions among the OCLC RLP Metadata Managers Focus Group members about the use and re-use of WorldCat Entities linked data URIs in WorldCat MARC, the group met again for a series of “Product Insights” sessions on the same topic.
41 participants from 30 RLP institutions in 4 countries attended the three separate sessions:
Art Institute of Chicago
The New School
University of Leeds
Binghamton University
New York Public Library
University of Manchester
Clemson University
New York University
University of Maryland
Cleveland Museum of Art
Pennsylvania State University
University of Pennsylvania
Cornell University
Princeton University
University of Pittsburgh
Library of Congress
Tufts University
University of Sydney
London School of Economics and Political Science
University of California, Los Angeles
University of Tennessee, Knoxville
Monash University
University of Chicago
University of Southern California
National Gallery of Art
University of Hong Kong
Virginia Tech
National Library of Australia
University of Kansas
Yale University
“Product Insights” sessions give our RLP partners exclusive and early access to information about the work our product colleagues are doing, and in return allow our product colleagues to gain relevant insights from the field. The guest of honour on this occasion was Jeff Mixter, Senior Product Manager at OCLC for Metadata and Digital Services. A former member of the OCLC Research team experimenting with linked data, back in the day, Jeff now leads OCLC’s linked data products, applications and services development and is always eager to engage in conversations around real life use cases for library linked data.
Unlock new opportunities, one step at a time, at scale
Jeff started the sessions off by quickly reiterating OCLC’s linked data strategy.
Instead of treating linked data as a natural progression from MARC – as the new technology that must by default be the better one, as is still often done – his team is looking at challenges and inefficiencies in today’s metadata operations, investigating how linked data can help solve problems, create more efficiencies or – most importantly – unlock new opportunities that simply aren’t there right now.
The team’s second focus is on gradual achievements, implementing metadata and workflow changes one step at a time, at scale, without disrupting systems and workflows that have relied on MARC data for decades and will continue to do so for quite some time to come.
Thirdly, Jeff strongly believes in supporting libraries no matter where they are on their journey to linked data and what internal or external forces may limit their ability to change. One way of doing this is creating what is sometimes called “linky MARC” by adding WorldCat Entities URIs to MARC 21 bibliographic records in WorldCat. WorldCat Entities is a set of authoritative linked data entities and URIs for people, organizations and other entity types that can be used when describing library resources. These URIs are agnostic to the descriptive framework and can be used in whatever the preferred data model happens to be in a particular context. By populating WorldCat with WorldCat Entities URIs at scale, the intention is to support libraries that are far ahead on their journey to linked data, empower libraries that are on their way, and give all others a starting point for their transition to a metadata future less reliant on MARC.
A couple $1s, a 4.0, and some more facts worth knowing
The first part of the Product Insights session was used to answer a mix of concrete questions concerning OCLC’s linked data work.
URIs in WorldCat bibliographic records
Jeff explained that OCLC is adding URIs for WorldCat Entities to WorldCat bibliographic records in two ways:
Bulk updates of records to add URIs in certain MARC fields (e.g., 100, 651, and 700)
As part of the regular offline batch controlling service that controls headings from certain vocabularies, URIs are added when a candidate heading is controlled.
Updates to OCLC’s cataloging products in 2024 are also contributing to the enrichment of records with URIs. When a cataloger controls headings for persons in the 100 and 700 fields, a URI is automatically inserted. An attendee noted how a URI may easily be added to a controlled unqualified personal name heading (e.g., “Eco, Umberto”) by simply uncontrolling and then recontrolling the heading. A single action by a cataloger in a record both controls a heading for current MARC needs and provides a bridge for future linked data functionality.
Responding to questions from previous sessions, Jeff explained that OCLC is putting WorldCat Entities URIs in subfield $1 in MARC records, because this is where URIs for “Real World Objects” (RWOs) should go, as opposed to subfield $0, which usually points to authority records.
These URIs in $1s are then by default included in all data exported from WorldCat no matter which route is chosen. Unfortunately, URIs get “lost” on their way from WorldCat to some library service platforms. Attendees were encouraged to approach their suppliers to check they are not inadvertently dropping URIs in $1 subfields upon ingest.
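For the curious, this is roughly what such a “linky MARC” heading looks like when built programmatically. The sketch below assumes the pymarc library’s 5.x API, and the entity URI is a placeholder rather than a real WorldCat Entities identifier:

# Sketch of a 100 field carrying both the heading and an RWO URI in $1.
# Assumes pymarc 5.x; the URI is a placeholder, not a real identifier.
from pymarc import Record, Field, Subfield

record = Record()
record.add_field(
    Field(
        tag="100",
        indicators=["1", " "],
        subfields=[
            Subfield(code="a", value="Eco, Umberto."),
            # $0 would point to the authority record; $1 identifies the
            # Real World Object, per the practice described above.
            Subfield(code="1", value="https://id.oclc.org/worldcat/entity/EXAMPLE"),
        ],
    )
)
print(record["100"])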
Terms and Conditions
Access to WorldCat Entities data is tiered: authenticated users can get a larger set of data for a URI than unauthenticated ones. For authenticated use, all one needs is a completely free API key, available from the OCLC developer network. Use of the data, regardless of level of access, is governed by a CC BY-NC 4.0 license, as also detailed under “Terms and Conditions” on the WorldCat Entities website.
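Dereferencing one of these URIs works like any other linked-data lookup: request the URI and ask for a machine-readable serialization via the Accept header. The sketch below uses the requests library with a placeholder URI; how much data comes back depends on whether the request is authenticated, and the exact authentication mechanism for the larger responses is documented on the OCLC developer network rather than assumed here.

# Unauthenticated linked-data lookup of a (placeholder) WorldCat Entities URI.
import requests

uri = "https://id.oclc.org/worldcat/entity/EXAMPLE"
response = requests.get(uri, headers={"Accept": "application/ld+json"}, timeout=30)
response.raise_for_status()
entity = response.json()
print(sorted(entity.keys()))  # inspect whatever the tier of access returns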
Data Provenance
When editing WorldCat Entities data through Meridian, data provenance information can currently be added at the individual claim level, such as for a person’s date of birth or area of expertise, by adding a URL for an information source. OCLC maintains a history of description changes, logging all changes made to the entity and which institution made them.
Workflow integration
Participants asked how WorldCat Entities URIs integrate with current bibliographic workflows. In addition to the bulk updates and enrichments described above, one can also manually look up and add entity URIs in WorldShare Record Manager while cataloguing, or even create them on the fly, as a Meridian subscriber. This functionality will also come to the Connexion cataloguing application and ultimately also to CONTENTdm. Using the WorldCat Entities APIs, this type of integration could also be realized in any other tool used for metadata work in whichever context, provided the development work is being done.
Person Entities – yes, please
With all those new facts on the table, participants were ready to get more deeply engaged. Our first topic was to explore the specific value and the specific challenges surrounding person entities and identifiers.
Person identifiers are needed almost everywhere. They can be a way to identify individuals unambiguously, without too much effort. With person identifiers, it is also easier to identify more than just the corresponding authors for a publication. A participant shared that for their electronic theses and dissertations they add ORCID identifiers for multiple authors as well as for related persons, such as professors or committee members.
Even where creating authority records is the norm, staffing issues and competing priorities can slow this work considerably, motivating libraries to investigate quicker ways to reliably disambiguate authors by using identifiers.
Regional diversity is a strong driver for person entities as well, prompting both the wish to identify and disambiguate regional personalities that could not have an authority record, for some reason, and the need to add labels in multiple regional languages. This need not stand in contrast to having an authorized heading, where that exists. Linked data URIs, at least in WorldCat, will never overwrite the authorized heading, leaving the user with the best of both worlds.
Yes, but …
So the need and the benefits are clear, but … unfortunately, barriers seem to be everywhere.
A very concrete barrier discussed was the difference in practice for books and for articles. While person metadata for books is governed by rigorous authority control, article-level data is not, and first names are often listed only as initials, following citation formats. This makes it almost impossible to reliably reconcile and match authors across these platforms and data sources.
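A tiny, invented illustration of the problem: collapsing full names to the initialized form used in citations is lossy, so the reverse mapping is ambiguous and string matching alone cannot recover it.

# Names invented purely to illustrate why initial-only forms are ambiguous.
authority_forms = ["Garcia, Maria", "Garcia, Marco", "Garcia, Miguel"]

def citation_form(name: str) -> str:
    """Collapse 'Surname, Forename' to the 'Surname, F.' form used in citations."""
    surname, forename = (part.strip() for part in name.split(",", 1))
    return f"{surname}, {forename[0]}."

collisions = {}
for name in authority_forms:
    collisions.setdefault(citation_form(name), []).append(name)

print(collisions)
# {'Garcia, M.': ['Garcia, Maria', 'Garcia, Marco', 'Garcia, Miguel']}
# An article byline "Garcia, M." could be any of the three, which is why an
# identifier such as an ORCID iD is needed to match reliably.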
One area where matching issues become visible to the end user is experimental linked data features in library service platforms, such as person cards. These often rely on a single source, such as an authority file, and will not pick up information coming from other sources, such as the institutional repository data where authors are identified by their ORCID identifier, resulting in incomplete person cards. As one participant pointed out, these features are useful to show what linked data can do – and where the gaps still are.
If incomplete information is bad, incorrect information is worse by far. Libraries feel responsible for the accuracy of the data they are presenting to their users, and libraries can be concerned that quality issues resulting from errors in external data sources will reflect badly on them. Correcting data at the source, such as in Wikidata, will not always immediately update representations in the target systems. In one instance, an attendee shared, it took a full month to get the corrected Wikidata information to display in the library service platform’s person card. These may be teething troubles in new product features, but it was a wake-up call for the library in question.
So many PIDs, so little time
A large part of the discussions centered on problems created by the fact that there is a multitude of different person identifiers. These identifiers serve very specific purposes, have specific limitations and function best when used in specific workflows. As a result, libraries end up using ORCID identifiers in some contexts and authority files in others. And then, there is Wikidata. Wikidata was mentioned more than once as a source of identifiers to complement the set when the default options are not available. “It’s still kind of a grab bag” one participant noted. But the resulting mix of identifiers cannot easily be integrated. If this could be solved, many operational issues would disappear.
There are efforts underway to combine linked data identifiers with traditional metadata. One participant wondered if adding ORCID identifiers to MARC records could help. Other institutions add ORCID and ISNI identifiers to name authority records. But this kind of work, while seen as useful, does not easily scale.
Our discussions then explored the option of creating something that could sit in the middle and stitch it all together. A hub of identifiers with lots of “same as” relationships that would allow each identifier to function in its own context but could also be used or referred to for cross-identifier discovery. Alas the hub idea, a participant warned, would have to be quickly and fully adopted at scale to be integrated into workflows, and then there is also the question of how to generate sufficient trust in such a solution. Even a platform as large as Wikidata is not often trusted to be the strategic long-term choice for library identifiers.
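At its most skeletal, such a hub is just a set of equivalence clusters that any system can consult to translate one identifier scheme into another. Every identifier in the sketch below is a made-up placeholder, and a real hub would of course need provenance, trust signals, and governance on top of this.

# Skeletal identifier hub; all identifiers are invented placeholders.
clusters = [
    {"orcid:0000-0000-0000-0001", "isni:0000000000000001",
     "lcnaf:n00000001", "wikidata:Q00000001"},
]

# Index every identifier back to its cluster for quick "same as" lookups.
same_as = {identifier: cluster for cluster in clusters for identifier in cluster}

def translate(identifier: str, target_scheme: str) -> set[str]:
    """Return the equivalent identifiers in the requested scheme, if any."""
    return {i for i in same_as.get(identifier, set())
            if i.startswith(target_scheme + ":")}

print(translate("orcid:0000-0000-0000-0001", "wikidata"))
# {'wikidata:Q00000001'}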
At this point Jeff joined the discussion to share his conviction that for him, too, the goal must be to make identifiers talk to each other, and that WorldCat Entities was designed as a sort of bridge to try and help connect silos in library land. His vision goes beyond WorldCat: he is also working with product colleagues to start integrating WorldCat Entities identifiers across CONTENTdm digital materials, as well as in the OCLC Central Index, which is predominantly newspaper articles, journal articles, and book chapters.
Searching across silos and entities
Assuming we could solve these problems, with or without a hub, what could we then do that we cannot do now?
Easily search across platforms for related materials.
For example, we could improve discovery across the collection of rare print materials, managed in one system, and the materials in the cultural collections, managed in another system.
Extend our cross-platform searches beyond local environments and system landscapes and leverage the larger library linked data knowledge graph.
For example, primary source materials (the example mentioned was painted murals) could be connected with secondary sources, such as the painter’s printed works, and works about the painter or her work. This is the type of richer discovery experience that should become possible when using and connecting linked data entities.
In this context, the Linked Art project was mentioned (https://linked.art/), a model for linked data description and management of cultural heritage materials.
Make unexpected discoveries.
Many collections are in a place where researchers do not expect them to be, but network level searching or browsing would allow them to be “found” and connected in new ways.
Unlocking these types of new opportunities is perhaps one of the biggest promises of library linked data and a truly global GLAM knowledge graph.
Many thanks to all those who supported me in writing this blog post, in particular my colleagues Jeff Mixter, Kate James, and Rebecca Bryant.
This week's DLTJ Thursday Threads looks at digital privacy concerns from the commercial perspective.
I think next week's article will be a summary of recent happenings with government surveillance activities.
Late last month, Amazon launched Alexa+, and with it a flurry of privacy concerns. Why? Because Amazon now mandates cloud uploads to process Echo voice commands.
Using the technologies already in buildings, employers can monitor employee activities, raising privacy concerns.
Last year the FTC released a report that, while surprising no one, exposed the extensive data collection by social media platforms.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
With Alexa+ launch, Amazon mandates cloud uploads for Echo voice recordings
Amazon has disabled two key privacy features in its Alexa smart speakers, in a push to introduce artificial intelligence-powered “agentic capabilities” and turn a profit from the popular devices. Starting today (March 28), Alexa devices will send all audio recordings to the cloud for processing, and choosing not to save these recordings will disable personalisation features.
Starting a few weeks ago, Amazon required Echo users to send all voice recordings to its cloud, eliminating a privacy feature that allowed for local processing.
This change coincides with the rollout of Alexa+, a subscription service that enhances the voice assistant's capabilities, including recognizing individual users through a feature called Voice ID.
Users who previously opted out of sending recordings will find their devices' Voice ID functionality disabled.
Amazon justifies this move by stating that the processing power of its cloud is necessary for the new generative AI features.
Privacy concerns anyone?
Especially given Amazon's history of mismanaging voice recordings and allowing employees to listen to them for training purposes.
The company has previously faced penalties for storing children's recordings indefinitely and has been involved in legal cases regarding the use of Alexa recordings in criminal trials.
Surprise, surprise: this shift would appear to prioritize the financial viability of Alexa+ over user privacy concerns.
Workplace surveillance and privacy concerns over employee monitoring technologies
Office buildings have become like web browsers – they're full of tracking technology, a trend documented in a report out this week by Cracked Labs. The study, titled "Tracking Indoor Location, Movement and Desk Occupancy in the Workplace," looks at how motion sensing and wireless network technology in buildings is being used to monitor the movement and behavior of office workers and visitors. "As offices, buildings and other corporate facilities become networked environments, there is a growing desire among employers to exploit data gathered from their existing digital infrastructure or additional sensors for various purposes," the report says. "Whether intentionally or as a byproduct, this includes personal data about employees, their movements and behaviors."
This is as fascinating as it is frightening.
It is possible to repurpose technologies built into the building to track employees' movements and behaviors.
The report is part of a broader series examining surveillance and digital control at work, supported by various organizations concerned with privacy and labor rights.
In the U.S. and Europe, regulators, including the Federal Trade Commission, are responding to the growing use of tracking technologies, which gather extensive personal information about workers.
Companies like Cisco utilize their networking systems to monitor the location of individuals and assets, enabling behavioral profiling based on location data.
However, the report notes instances of pushback, such as protests at Northeastern University against the installation of motion sensors under the desks of graduate student workers, which were viewed as invasive and unnecessary.
I expect this same kind of technology is being deployed in retail stores and other locations as well.
FTC report exposes extensive data collection by social media platforms
The Federal Trade Commission said on Thursday it found that several social media and streaming services engaged in a “vast surveillance” of consumers, including minors, collecting and sharing more personal information than most users realized. The findings come from a study of how nine companies — including Meta, YouTube and TikTok — collected and used consumer data. The sites, which mostly offer free services, profited off the data by feeding it into advertising that targets specific users by demographics, according to the report. The companies also failed to protect users, especially children and teens. The F.T.C. said it began its study nearly four years ago to offer the first holistic look into the opaque business practices of some of the biggest online platforms that have created multibillion-dollar ad businesses using consumer data. The agency said the report showed the need for federal privacy legislation and restrictions on how companies collect and use data.
The chairwoman of the FTC at the time, Lina Khan, emphasized that such surveillance poses risks to privacy and personal safety, contributing to broader societal issues.
It remains to be seen if Congress and this administration pick up the ball and run with it, but I'm not certain that will happen (if for no other reason than the many other distractions happening).
The report criticized self-regulation by these companies as ineffective, so this issue is ripe for legislative action.
Fiverr freelancers advertise access to personal data
Dozens of sellers on the freelancing platform Fiverr claim to have access to a powerful data tool used by private investigators, law enforcement, and insurance firms which contains personal data on much of the U.S. population. The sellers are then advertising the ability to dig through that data for prospective buyers, including uncovering peoples’ Social Security numbers for as little as $30, according to listings viewed by 404 Media. Fiverr removed the listings after 404 Media inquired about the practice. The advertised tool is TLOxp, maintained by the credit bureau TransUnion, and can also provide a target’s unlisted phone numbers, utilities, physical addresses, and more.
In case you aren't familiar with it, Fiverr is an online marketplace for freelancers—a place you can go if you need quick, specialized help with a task or have specialized skills to offer.
In this case, the article reports that dozens of Fiverr freelancers are advertising access to a powerful data tool containing personal information—Social Security numbers, unlisted phone numbers, addresses, and other private data—on just about everyone.
The tool is used by private investigators, law enforcement, and insurance firms, but it has also become a "secret weapon" for hackers and fraudsters to dox people.
This Week I Learned: The pronoun "I" was capitalized to distinguish it from similarly typeset letters
In fact, the habit of capitalizing “I” was also a practical adaptation to avoid confusion, back in the days when m was written “ııı” and n was written “ıı.” A stray “i” floating around before or after one of those could make the whole thing hard to read, so uppercase it went. And now it seems perfectly logical.
I'm not buying the opinion author's underlying premise (capitalizing “they” in writing when it refers to a nonbinary person), but the origins of why we capitalize "I" and not other pronouns are fascinating.
What did you learn this week? Let me know on Mastodon or Bluesky.
Pickle curls up for a snuggle
I need to get pictures of Pickle into the newsletter while I can.
Later this fall she is off to Penn State with her "primary" as my daughter starts her graduate degree program.
TinyCat turns nine this month! Since 2016 more than 37,000 small libraries have signed up for the best user-friendly and affordable library management solution, powered by LibraryThing.
To celebrate, all TinyCat merchandise is on sale in the LibraryThing Store and we’re giving away pins, stickers, and tote bags too.
TinyCat Store Sale
All TinyCat merchandise and barcode scanners, including stickers, pins, and coasters, are on sale now through Friday, May 9. Check out what’s for sale in the LibraryThing Store.
TinyCat Giveaway
The winner of our giveaway will receive a free, heavy-duty cotton tote bag with the TinyCat logo! The first 25 submissions will receive a TinyCat sticker and pin for participating. The giveaway is open to TinyCat libraries with paid staff or volunteer accounts.
How to enter:
Take a photo of yourself with your library: either your favorite bookshelf or the building. If you would rather not be in the photo, include a furry friend instead! Make sure it’s a photo you won’t mind us using in promotional posts.
This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices.
The Inauguration
The program began on the 1st of March with a seminar centered around this year’s ODD theme, “Open Data to Tackle Poly Crises,” presented by Mr. Daniel Osei-Agyeman. The seminar was followed by a workshop on Participatory Mapping facilitated by Dr. Emmanuel Abeashi-Mensah. This session introduced the mappers to our project, “Komenda Shoreline Mapping Project – A YouthMappers Open Data Initiative,” which aimed to use open geospatial tools to map Komenda’s shoreline. The seminar by Mr. Daniel Osei-Agyeman was particularly enlightening as it delved into various aspects of how open data can be utilized to address multiple crises simultaneously. Participants were engaged in discussions around data transparency, accessibility, and the role of open data in fostering community resilience. The insights shared during this session were invaluable, setting the stage for the subsequent workshop where mappers were equipped with the skills and knowledge necessary for participatory mapping.
Dr. Abeashi-Mensah’s workshop was a hands-on experience that emphasized the importance of community involvement in mapping projects. Participants learned how to use various geospatial tools to gather and analyze data, creating maps that accurately represent the community’s vulnerabilities and resources. The workshop was a precursor to the fieldwork, providing mappers with practical skills that would be essential for the success of the Komenda Shoreline Mapping Project.
The Grand Celebration
The celebration culminated on the 6th of March with an actual fieldwork session at Dutch Komenda. We were warmly received by the community leaders and members, who enthusiastically aligned themselves with the five groups we had created from our mappers. The entire day was dedicated to exploring different parts of the community, guided by residents, to map the characteristics of the hazard and vulnerable areas prone to coastal flooding as well as the community’s critical infrastructure and shoreline.
The fieldwork was an immersive experience that allowed mappers to apply their newly acquired skills in a real-world setting. Each group, accompanied by community members, ventured into different areas of Dutch Komenda to collect data. The mapping process involved identifying key infrastructures such as schools, hospitals, and roads, assessing their vulnerability to coastal flooding, and pinpointing areas that required immediate attention. The collaboration between mappers and community members was crucial, as local knowledge provided insights that were not immediately apparent through data alone.
One of the highlights of the fieldwork was the interaction with community leaders and residents. Their involvement was instrumental in ensuring the accuracy of the maps and the relevance of the data collected. The community’s willingness to participate and share their experiences with coastal flooding enriched the mappers’ understanding of the challenges faced by Dutch Komenda. This collaborative effort underscored the importance of community engagement in open data initiatives.
The day concluded with a refreshing gathering at Dutch Komenda beach, where participants reflected on the day’s activities. The informal setting allowed for open discussions and the exchange of ideas, fostering a sense of camaraderie between students, community youth, and leaders. The celebration was not just about data collection but also about building relationships and trust within the community, paving the way for future collaborations.
Learning and Insights
Through this project, we learned the importance of community involvement in data collection and the practical application of open geospatial tools in building coastal resilience. The community’s insights were invaluable in understanding the real-world implications of our work.
One of the key lessons was the significance of participatory mapping. By involving community members in the mapping process, we ensured that the data collected was accurate and reflective of the community’s actual needs. This approach also empowered the community, giving them a sense of ownership over the project and its outcomes. The mappers realized that open data is not just about technology but also about people and their stories. The personal experiences shared by residents brought to light the human aspect of data, reminding us that behind every data point is a real person with real challenges.
The use of geospatial tools was another crucial aspect of the project. Mappers learned how to leverage these tools to create detailed maps that highlight vulnerabilities and resources. The hands-on experience gained during the workshop and fieldwork enhanced their technical skills, making them proficient in using open data for community resilience projects. The ability to visualize data through maps provided a clearer understanding of the community’s needs, enabling mappers to propose effective solutions to mitigate the risks associated with coastal flooding.
The project also highlighted the importance of collaboration. The partnership between UCC YouthMappers, community leaders, and residents was a testament to the power of collective action. Working together towards a common goal fostered a sense of unity and shared responsibility, emphasizing that addressing poly crises requires a collaborative effort. The success of the Komenda Shoreline Mapping Project was a result of this collective endeavor, showcasing the potential of open data initiatives to bring communities together and drive positive change.
Overall, the Open Data Day 2025 celebration was a transformative experience for UCC YouthMappers. It provided an opportunity to apply theoretical knowledge in a practical setting, enhancing their understanding of open data and its applications. The lessons learned and skills acquired during this project will undoubtedly contribute to their future endeavors in community resilience and open data initiatives.
Attendance:
Total students: 41
Team members: 10
What’s Next?
Watch for the next phase focusing on training the youth in utilizing open digital tools to reduce the vulnerabilities they face. This training will empower them to take proactive steps in safeguarding their community against future crises.
The upcoming training sessions will be designed to equip youth with the skills needed to harness the power of open digital tools. These sessions will cover various aspects of data collection, analysis, and visualization, enabling participants to create detailed maps and reports that highlight their community’s vulnerabilities. The training will also emphasize the importance of data-driven decision-making, encouraging youth to use data to inform their actions and advocate for necessary interventions.
By empowering youth with these skills, we aim to create a generation of informed and proactive individuals who can contribute to their community’s resilience. The training will be conducted in collaboration with local experts and organizations, ensuring that participants receive comprehensive and relevant knowledge. The focus will be on practical applications, with hands-on activities that allow youth to apply what they’ve learned in real-world scenarios.
The next phase of the project will also include follow-up activities to monitor the progress of participants and provide ongoing support. Regular check-ins and feedback sessions will be conducted to address any challenges faced by participants and ensure that they are equipped to utilize open digital tools effectively. The goal is to create a sustainable model that empowers youth to continuously contribute to their community’s resilience.
Acknowledgments
We extend our heartfelt gratitude to the community of Dutch Komenda and the Department of Geography and Regional Planning for their unwavering support and collaboration.
In conclusion, the Open Data Day 2025 celebration was a beautiful journey of learning, collaboration, and community engagement. The day ended with a delightful refreshment at Dutch Komenda beach, shared by both students and the community youth, alongside some of the leaders who joined us throughout the program. This is just one facet of our commitment to building coastal resilience and contributing to a safer, more informed world.
The success of this project would not have been possible without the support of the community of Dutch Komenda. Their active participation and willingness to share their experiences were instrumental in ensuring the accuracy and relevance of the data collected. The collaboration with community leaders and residents provided valuable insights that enriched the mapping process, highlighting the importance of local knowledge in open data initiatives.
We also extend our gratitude to the Department of Geography and Regional Planning for their guidance and support throughout the project. Their expertise and resources were crucial in facilitating the training sessions and fieldwork, ensuring that mappers were well-equipped to carry out their tasks. The department’s commitment to promoting open data and community resilience was evident in their unwavering support, making them an integral part of the project’s success.
As we look forward to the next phase of the project, we remain committed to our goal of building coastal resilience through open data initiatives. The lessons learned and relationships built during the Open Data Day 2025 celebration will serve as a foundation for future endeavors, driving our efforts to create a safer and more resilient community. We are excited to continue this journey, empowering youth and fostering collaboration to address the challenges posed by poly crises.
About Open Data Day
Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities. ODD is led by the Open Knowledge Foundation (OKFN) and the Open Knowledge Network.
As a way to increase the representation of different cultures, since 2023 we offer the opportunity for organisations to host an Open Data Day event on the best date over one week. In 2025, a total of 189 events happened all over the world between March 1st and 7th, in 57 countries using 15+ different languages. All outputs are open for everyone to use and re-use.
Our most recent RLP Special Collections Leadership Roundtable focused on resource-sensitive approaches to collection building and implementing the Total Cost of Stewardship (TCoS) tools and frameworks. TCoS emerged in 2021, when many institutions struggled with pandemic-era closures, staffing shortages, and disruptions. Four years later, as libraries and archives regain bandwidth to function beyond survival mode, we’re seeing renewed engagement with the TCoS framework. Our conversations revealed that the issues TCoS aimed to address continue to plague special collections and archives, making resource-sensitive collecting practices even more vital now than when the report first appeared.
Institutions have implemented various approaches to resource-sensitive collecting, and our conversation focused on sharing these experiences to support our collective efforts in this arena.
The discussions included 36 participants from 35 institutions across the US, UK, Canada, Australia, and New Zealand:
Cleveland Museum of Art
Cornell University
Emory University
Getty Research Institute
Hofstra University
Montana State University
National Library of Australia
National Library of New Zealand
New York Public Library
New York University
OCAD University
Ohio State University
Smithsonian Institution
Stony Brook University
Syracuse University
The New School
University at Buffalo, SUNY
University of Arizona
University of California, Irvine
University of Chicago
University of Delaware
University of Glasgow
University of Illinois at Urbana-Champaign
University of Kansas
University of Leeds
University of Michigan
University of Nevada, Las Vegas
University of Nevada, Reno
University of Pennsylvania
University of Southern California
University of Toronto
University of Washington
Vanderbilt University
Virginia Tech
Washington University in Saint Louis
These questions guided our discussion:
Has your institution started shifting to a more resource-sensitive collecting approach? If not, what prevents you from doing so?
How are you approaching implementation, or what might be your first steps? Which parts of your program have adjusted, or will need to adjust, to work this way?
What challenges have you encountered, and what benefits or successes have you seen?
Each session yielded rich and nuanced discussion, but common themes emerged across all four conversations. This post summarizes those shared experiences and needs.
Legacy practices, backlogs, and space constraints
The Total Cost of Stewardship framework serves as a corrective, acknowledging that past collecting practices—often characterized by acquiring as much as possible without adequate consideration for stewardship—have created current challenges that require rethinking. As one participant noted, “The 20 previous years of collecting practices have now caught up with us, and our backlog is massive. We were just collecting, collecting, collecting, and not really thinking too much about downstream impacts.”
These downstream impacts include significant backlogs of unprocessed and uncataloged collections, impeding access to and discovery of the materials held in special collections.
However, the impact repeatedly mentioned across our conversations was a critical lack of space, with institutions facing full vaults and costly off-site storage. Approaching storage capacity limits is forcing a rethinking of collecting habits. A few institutions now allocate curators annual storage capacity for collecting, which requires them to budget for space just as they budget acquisition funds. Backlogs also significantly affect staff morale. One person characterized this as “fatigue from decisions of the past.” Teams feel burdened, guilty, and overwhelmed by having to address ever-accumulating backlogs.
Some participants have devoted resources to better understand and address their backlogs, yielding benefits beyond just clarifying the scope of work. One person shared that “having data about the backlog and how big it actually is, is also powerful — it made people go, ‘oh, I guess we have to deal with that and can’t just continue to build it.’”
Collection development policies and strategies
Many institutions are building collection development guidance as a first step toward more resource-sensitive collecting, typically by creating or revising collection development policies. Some are also crafting collecting strategies that complement broader collection development policies by offering more specific and timely guidance about current priorities. These documents often directly address stewardship capacity as one factor in decision-making.
Participants noted that articulating these policies supported their staff in making curatorial decisions more consistently and confidently. Several emphasized the importance of having these guidelines to help their team decline some collections, building what one participant described as “the internal fortitude to say NO.” Providing a framework for these decisions that disperses responsibility beyond individual curators is crucial. This is more than simply a policy change; it is also a cultural one that requires rethinking what constitutes successful collection building. One institution reported that their curators used to be evaluated on how much they brought in annually, and that this mindset can persist even years after evaluation criteria changed. The roundtable discussion highlighted the need to redefine success metrics for curators beyond acquisition volume to include alignment with teaching and research priorities, collection use, and other qualitative measures.
A few institutions have temporarily paused collecting due to space constraints or anticipated budget shortfalls. Others have considered full or partial pauses to address backlogs or slow down collection growth until additional storage can be secured. All agreed this approach presents challenges because collection development often unfolds over many years—initial conversations with donors might occur 5-10 years before materials arrive, and significant time and labor go into stewarding relationships with potential donors that may only come to fruition during a collecting pause. Additionally, flagship collections will only be available once, and collections squarely within development priorities can demand quick action. Hard stops on collecting don’t always work for a process that requires long-term relationship building or flexibility. Most agreed that a “soft pause” was more realistic but worried that anything short of a complete prohibition might lack sufficient impact.
Estimating costs and capacity
Many institutions began this work by better understanding the costs of caring for archives and special collections materials and their institutional capacity to do so. People use the TCoS cost and capacity estimator tools, the DLF digitization cost calculator, and the LOGJAM tool in the UK. These tools reveal previously unrecognized costs and activities. Some institutions are going all in on cost and capacity estimation, while others take a more incremental approach of estimating time but not necessarily translating that to cost or capacity impact.
Being transparent about costs has occasionally sparked challenging conversations, with administration questioning costs and donors treating stewardship costs as optional rather than necessary activities. One complication in using these tools is the difficulty of capturing the full scope of stewardship costs. People find it difficult to estimate some costs, such as physical and digital storage over time or the cost of iterative work over many years, and to conceive more fully of everything that the long-term cost of a collection should include.
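The arithmetic inside these estimators is straightforward even when the inputs are not; what is hard is choosing the rates. A back-of-the-envelope sketch with entirely hypothetical figures (none of these numbers come from the TCoS, DLF, or LOGJAM tools):

# Back-of-the-envelope stewardship estimate; every rate is a hypothetical
# placeholder, not a value from any published calculator.
extent_linear_feet = 120
processing_hours_per_foot = 4.0        # varies widely with condition and arrangement
staff_cost_per_hour = 35.0             # salary plus benefits
shelving_cost_per_foot_per_year = 2.5
retention_horizon_years = 20

processing_cost = extent_linear_feet * processing_hours_per_foot * staff_cost_per_hour
storage_cost = (extent_linear_feet * shelving_cost_per_foot_per_year
                * retention_horizon_years)

print(f"One-time processing: ${processing_cost:,.0f}")                        # $16,800
print(f"Storage over {retention_horizon_years} years: ${storage_cost:,.0f}")  # $6,000

Conservation, supplies, digitization, and ongoing digital storage would be added on top, and those are exactly the categories participants found hardest to pin down.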
Information sharing and collective decision-making
Many participants are revamping their workflows to involve more stakeholders in acquisition decisions. They have standardized workflows that engage multiple departments and areas of responsibility to better understand the broader impact of potential additions to collections. One person described initiating early conversations about potential acquisitions with public services, technical services, and conservation colleagues to determine whether to pursue a collection further. The goal isn’t necessarily an immediate yes or no but rather deciding whether to continue investing energy in the collection.
Institutions benefit when teams think collectively about acquisition decisions and develop a culture of talking openly about costs. Several participants had uncovered hidden costs and interdependencies, helping them recognize they had previously underestimated the impact of collecting on staff both inside and outside special collections. This has improved relationships across the library: “Our big successes have really been thinking about how new acquisitions impact other parts of the library… [We’ve developed] better relationships with those other units by pulling them in sooner and considering them more effectively in our acquisitions process.”
More appraisal, earlier in the process
Resource-sensitive approaches have motivated several institutions to reinvigorate their appraisal work and reconsider its timing in the acquisition process. One institution’s reappraisal of their backlog revealed that around 30% of collections warranted deaccessioning, significant weeding, or transfer to elsewhere in the library. Discovering how much excess material they’d been storing for years sparked a cultural shift toward more pre-custodial appraisal and appraisal at accessioning.
Another participant explained how understanding stewardship costs led to emphasizing “appraisal before acquisition, because… we spend a lot of money on things we don’t end up keeping in terms of shipping, accessioning, and processing. We’re trying to limit that ‘total cost’ earlier in the process.” Collections that are smaller when they arrive or don’t come at all save on shipping and storage costs, and accessioning labor.
One institution now mandates site visits to assess all collections, and budgets necessary travel funds to support this practice. These visits enable better understanding of potential acquisitions and their requirements, allow more granular appraisal, and sometimes lead to declining a collection after the initial assessment. They found that site visits significantly reduce the volume of what they acquire versus what’s offered, helping them bring in less volume and higher-quality materials.
Supporting fundraising and advocacy
Many participants use TCoS tools to support fundraising and advocacy efforts. Cost estimation tools help plan grant projects with realistic budgets. Having concrete time and resource estimates improves communication with development colleagues, helping them understand the true costs and financial implications of accepting new materials. This empowers development staff to make compelling cases for funding stewardship activities which may be less “visible” but equally critical, making them more effective partners in fundraising efforts that address actual collection needs.
A common challenge when working with fundraising colleagues involves requests to acquire materials that monetary donors want to give the library. When development colleagues understand the full scope of collection management work, they can engage in more informed conversations about whether such acquisitions align with institutional capacity and priorities.
Similarly, institutions can have more transparent conversations with donors about the work involved in accepting their collections and why processes take time. This fosters greater appreciation for stewardship responsibilities and the care staff give collections. Some institutions have successfully negotiated with donors to include funds specifically for collection care. However, participants raised concerns about monetary donations becoming an expectation of donors. Any “pay-to-play” approach would contradict professional values and severely limit the diversity of materials in collections. While many have had productive discussions about costs with donors, others have experienced resistance from donors who believe that processing and storage are the institution’s responsibility.
Challenges, benefits, and continued evolution
Challenges certainly remain, even for institutions making significant progress in resource-sensitive collecting. Some challenges are structural: restrictions on endowment funds often limit their use for stewardship activities. Some participants work around this limitation by “buying individual more expensive things rather than nickel and diming ourselves to death… [because] it’s faster to catalog one expensive thing than 200 inexpensive things.” But others are renegotiating with donors to expand the use of endowment funds beyond acquisitions to include processing, cataloging, or preservation.
Cultural challenges also exist. One person noted, “The biggest impediment is institutional appetite…suddenly, something big will come that the president or director wants, and we’re taking it.” The shift to a more collective approach made some people, especially curators, feel their autonomy was diminishing. Leaders countered this by emphasizing that changes aimed to improve collective work and holistic approaches, not penalize individual performance. Often, challenges simply involve resources: “Our biggest challenge is not having enough people and hours to revamp processes because it takes time to stop what you’re doing, think of a better way, develop that better way, and then implement it.”
Overall, these discussions highlighted significant benefits from resource-sensitive collecting. Institutions are addressing backlogs and improving workflows to make collections more accessible. They report acquiring higher-quality collections, and becoming more strategic rather than reactive in their collecting practices. Better communication and visibility into each other’s work has improved relationships both within and beyond special collections.
For readers attending the RBMS conference in New Haven this June who want to build more resource-sensitive collecting programs, we are organizing a day-long pre-conference symposium on Total Cost of Stewardship. The day will include three panels featuring TCoS implementation experiences from multiple institutions, and structured exercises to help participants envision implementation at their own institutions. It promises to be an interesting and rewarding day—we hope you’ll join us!
AI Nota Bene: I used AI tools to assist me in writing this post. I used WebEx’s internal AI tools to create transcriptions and high-level summaries of the roundtable conversations. I consulted these to create my own notes and summary, which did not include any information identifying individuals or institutions. I then fed them into Google’s NotebookLM to analyze key themes, identify interrelationships between those themes, and generate potential ways of organizing this post. I reused some language generated by NotebookLM but did the bulk of writing myself. I used Claude to edit my final draft, asking it to review the post and make suggestions to improve clarity, concision, grammar, and reduce passive voice and redundancies. I used many, but not all, of Claude’s suggestions.
The Instituto Moreira Salles has spent the past two years launching its digital preservation program, organizing obsolete media and their digital files. They have also been implementing a workflow for preserving institutional information in digital format so that it can be preserved alongside the permanent digital files from its collections and the digital surrogates generated for preservation and online distribution. Next year, they plan to conduct a self-evaluation based on NDSA’s Levels of Preservation and the DPC-RAM organizational and service capabilities, along with building a digital preservation policy and plan.
Since 2012, Amherst College Library has made significant strides in digitizing, preserving, and cataloging materials within its Archives and Special Collections to provide continued public access and to ensure their long-term preservation. The Library works to provide sustainable access to digital content produced by both the Library and the College through Amherst College Digital Collections (ACDC). Amherst College Library adheres to national and international community-based standards and best practices in managing its digital collections. Ongoing initiatives focus on refining strategies, policies, and workflows for preserving born-digital content, web archives, and legacy media, while also providing support to the campus regarding data management, digital repositories, and open-access initiatives.
Each organization participates in one or more of the various interest and working groups – so keep an eye out for them on your calls, and be sure to give them a shout-out. Please join me in welcoming our new members! You can review the list of members here.
TL;DR Just because someone says they’ve archived something from the
web doesn’t mean it isn’t worth checking and possibly archiving again.
We need better tools and methods for appraising web content in need of
preservation.
As the Trump administration’s destruction of the federal government has
spilled out into the removal of web pages and entire websites, we have
seen the emergence, and re-emergence, of several efforts to try to
collect at-risk government web content. There has been a lot of valuable
coverage of this work in the mainstream media, and I won’t pretend to do
it justice by trying to summarize it here.
However, one constant theme is the pivotal role that the End of Term Web Archive plays.
Since 2008 a group of institutions centered on the Internet Archive have
archived US government websites at the end of presidential
administrations. It’s hard to overstate how important this work is, as
the federal government has shifted since the late 1990s to web
publishing, instead of distributing physical media (tapes, CD-ROMs,
paper documents of various kinds) as part of the Federal
Depository Library Program.
Occasionally people will mention the work of the End of Term Web Archive
to allay fears that government websites are at risk, suggesting that the work
of protecting this data is done and dusted, as it were. “This is already
being taken care of.” But the reality is that the .gov web is so large
that it’s quite difficult to say how much of it has been reliably
archived.
Logging in to ftp.census.gov with ncftp
As concern about the state of federal information on the web spread at
my workplace, some became particularly interested in the status of the US
Census, specifically the venerable Census FTP site (ftp.census.gov),
which is also made available on the web at https://www2.census.gov. There are
worries about whether the data will continue to be made available,
whether the data will continue to be collected in the way that it has,
and if the already published data will be revised after the fact for
political ends.
As an experiment, a colleague of mine (Andrew Berger)
got interested in how feasible it was to download the entirety of the
Census FTP site over his home network connection using lftp. He started it up and it ran for 4
weeks and collected 5.9 Terabytes of data. As he started and restarted
the collection he noticed some files disappearing and others appearing.
This isn’t entirely surprising since parts of the FTP site seem to
resemble a collection of random documents on somebody’s desktop more
than a well-organized archive. Be that as it may, other parts of the
site are very well organized with documentation and clear path
hierarchies.
While discussing this work Andrew expressed an interest in knowing how
much of the FTP data might be available in the Internet Archive’s Wayback Machine,
which is where the End of Term Web Archive data ultimately lands.
It was straightforward to turn the file system into a list of URLs since
the paths map directly to their location on the web. What was a bit
more tricky was efficiently looking these URLs up in the Wayback
Machine. The Internet Archive does have an API
for looking up a given URL and seeing what snapshots of it are
available. But it can take multiple seconds to look up a URL, and given
that there were 4,496,871 of them, a best-case scenario is that it could
take over 52 days of constant API requests to check them all.
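A minimal sketch of one such lookup, assuming the FTP paths map directly onto https://www2.census.gov (the linked notebook may well do this differently):

```python
import requests

# One of the ~4.5 million paths from the FTP listing (this example file appears later in the post).
ftp_path = "geo/tiger/TIGER2024/ADDRFN/tl_2024_49039_addrfn.zip"

# The FTP paths map directly onto the public web mirror.
url = f"https://www2.census.gov/{ftp_path}"

# The Wayback Machine availability API reports the closest snapshot, if there is one.
resp = requests.get("https://archive.org/wayback/available", params={"url": url}, timeout=60)
closest = resp.json().get("archived_snapshots", {}).get("closest")

if closest:
    print(closest["timestamp"], closest["status"], closest["url"])
else:
    print("no snapshot found for", url)
```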
The checking is complicated by the fact that the Wayback Machine’s API
will return snapshot records for pages that it failed to retrieve: HTTP
redirects (3XX) and HTTP Errors (4XX). In spot checking a few of the
results it was clear that archiving web crawlers sometimes received a 403
Forbidden error when crawling. For example, take a look at the snapshot
for https://www2.census.gov/geo/tiger/TIGER2024/ADDRFN/tl_2024_49039_addrfn.zip.
Perhaps the server flagged the crawler as a malicious bot requesting too
many resources in too short a time? It’s kind of difficult to say.
Nevertheless, in order to ascertain how well these Census files have
been archived it’s important to ignore these false positives, and only
count snapshots that resulted in 200 OK HTTP responses.
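Counting only 200 OK captures is easy to express against the Wayback CDX API; a rough sketch, not the exact code from the notebook:

```python
import requests

def has_ok_snapshot(url: str) -> bool:
    """Return True if the Wayback Machine holds at least one 200 OK capture of url."""
    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={"url": url, "output": "json", "filter": "statuscode:200", "limit": 1},
        timeout=60,
    )
    # An empty response means no captures; otherwise the first row is a header row.
    rows = resp.json() if resp.text.strip() else []
    return len(rows) > 1

print(has_ok_snapshot("https://www2.census.gov/geo/tiger/TIGER2024/ADDRFN/tl_2024_49039_addrfn.zip"))
```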
This process of deciding what to accession into an archive is known in
archival practice as appraisal.
It’s not uncommon to use statistical sampling when appraising archival
collections (Cook, 1991; Kolish, 1994). However, sampling is
usually done because there is only space to store a representative
sample of an entire set of records. In this case the sampling is helpful
for determining whether a given set of records is in need of archiving,
based on whether the records have already been archived elsewhere.
According to this
calculator, if I want 95% confidence with a 5% margin of error, I can
randomly sample 385 URLs out of the 4,496,871 and test only those.
That’s a lot more manageable to do. A sample like this obviously doesn’t
provide an exhaustive list of everything at ftp.census.gov that is in
need of archiving. But it can give a sense of the coverage in the
Wayback Machine, which can help guide decision making around whether to
archive this dataset.
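The arithmetic behind that sample size is the standard formula for estimating a proportion, plus a finite population correction; a quick sketch:

```python
import math
import random

population = 4_496_871     # URLs derived from the FTP listing
z, e, p = 1.96, 0.05, 0.5  # 95% confidence, 5% margin of error, most conservative proportion

n0 = (z ** 2) * p * (1 - p) / e ** 2             # infinite-population sample size
n = math.ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction
print(n)  # 385

# urls is assumed to be the list of URLs built from the FTP paths, e.g.:
# urls = [line.strip() for line in open("census-urls.txt")]
# sample = random.sample(urls, n)
```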
So what did I find? If you want to see the details check out this Jupyter
notebook.
Basically the results suggest with 95% confidence, and a 5% margin of
error, that only 46% of the Census FTP URLs have a snapshot in the
Wayback Machine. I was kind of surprised by this result, so I ran 5
other samples and found they were all within the 5% margin of error.
Of the 4,496,871 files there is actually a subset that is of particular
interest to researchers at my workplace:
These files only account for 12% of the total files on the Census FTP
site. So I thought it was possible that lack of coverage in other
directories, which contain many more files, could be skewing the results
for these high value datasets.
To account for this I sampled each subdirectory individually, tested
each sample, and gathered the results:
Clearly there’s quite a bit of variability here. So what does this mean
for deciding whether to archive this data? One way of interpreting the
results is that:
The End of Term / Internet Archive work has not been able to collect
all the files made available in the US Census FTP site. This
casts some doubt on the coverage of less prominent, harder to crawl,
federal government websites.
There is value in collecting and accessioning this data into an
institutional repository, especially if members of our community place a
high value on being able to use it.
It always helps to work with domain experts who understand the web
content being crawled: what the content is useful for, what is required to
use it, how often it is updated, etc. Understanding the mechanics of
acquiring web content is necessary but not sufficient for web archiving
practice.
Basically, we shouldn’t take it for granted that these datasets have
been archived and will remain available in their current form. That
being said, nobody at my place of work has indicated that we are in fact
going to archive this data. This post was just part of some work to help
inform that decision making. But as we archive the web it’s important to
be able to determine how well archived sites are already, and this
exploration was just scratching the surface of that need.
I’ve made the sample datasets generated by the notebook available as CSV
files if you want to examine some of the hits and misses:
On 28 March 2025, the Open Knowledge Foundation (OKFN) hosted a new iteration of The Tech We Want initiative – an ambitious effort to reimagine how technology is built and used. The focus this time was the Frictionless Data project, an open source toolkit for making data management, data integration, data validation, and data flows more FAIR (Findable, Accessible, Interoperable, Reusable). Frictionless has at its core the simple and extensible Data Package standard, recently released in its version 2, and software implementations in many programming languages that allow the description, extraction, validation, and transformation of data described and containerised according to the Data Package standard.
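(For readers who haven't tried the toolkit, here is a minimal sketch of what that looks like with the Python implementation; the file name observations.csv is made up, and the exact API surface varies a little between library versions.)

```python
# pip install frictionless
from frictionless import describe, validate

# Infer a Table Schema (field names, types) for a tabular file.
resource = describe("observations.csv")  # hypothetical file
print(resource.schema)

# Validate the file against the inferred (or a hand-written) schema.
report = validate("observations.csv")
print(report.valid)
```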
It’s been almost two decades since the project started, with the very first Data Package created back in November 2007, and the project has enormously evolved since. The Data Package update to v2 has inaugurated a better and more explicit governance for the standard, and the core libraries have a notable community investment, with the R library curated and maintained by INBO, the Belgian Research Institute for Nature and Forest, and the Python library maintained by the French multi.coop. It was therefore the right time to convene maintainers, users, and contributors to celebrate the project in a Frictionless Summit, and discuss challenges, priorities, and governance for the project moving forward.
Call recording
Frictionless today
The Summit started with seven presentations of amazing projects out there that use Frictionless.
1. Pierre Camilleri from multi.coop presented Validata – a platform they developed which offers users the opportunity to check that their data complies with the national reference data schemas listed on schema.data.gouv.fr. Validata uses the validation function of the Frictionless Python library, and it validates against the Frictionless Table Schema used by schema.data.gouv.fr.
2. Romina Colman from OKFN presented Open Data Editor – the Frictionless-based desktop application which offers data validation and publication to a non-tech audience.
3. Peter Desmet from INBO presented Camtrap DP – a data exchange format which extends the Data Package to allow users to easily exchange, harmonize, and archive camera trap data at local to global scales.
4. Nick Kellett from Deploy Solutions presented one of their latest prototypes: an app which allows people to calculate the risks that climate change poses to their assets, and what they can do to mitigate those risks. To standardise the data and metadata fields in their databases, Deploy Solutions uses Frictionless schemas.
5. Phil Schumm from the University of Chicago presented some of the amazing work he and his team are doing with biomedical research data, making the Data Package standard and Frictionless tools accessible to investigators working to follow NIH‘s Data Management and Sharing Policy. The policy heavily encourages FAIR data sharing practices, which is why Frictionless is an ideal toolkit for complying with it. The HEAL Data Platform that Phil and his team are building on top of the Gen3 Stack currently uses the Frictionless Table Schema and some of the Python library pipelines.
6. Adam Shepherd from BCO-DMO showed how they have integrated Frictionless in their data submission process, data curation, and data accessibility. Every time contributors submit a new dataset to BCO-DMO, an interface under the hood builds a Data Package for them. The data curation team then uses the Frictionless Data Pipeline to turn the submitted data into consistent and standardised datasets, while recording the cleaning steps for reproducibility purposes.
You can read more about how BCO-DMO uses Frictionless in their blogs.
7. Ethan Welty from the World Glacier Monitoring Service (WGMS) talked about the comprehensive global glacier datasets they are curating. To ensure quality and reliability of the data they use the Frictionless standard and validation. Ethan gave us a demo of the Validator tool that he built on top of Frictionless to better handle the numerous tests he was running on the datasets.
Frictionless tomorrow
Using the use cases that were presented as a starting point, we then moved to a common reflection about the future of the project. Patricio del Boca, OKFN’s Tech Lead, briefly presented The Tech We Want initiative, to explore whether the idea of simplifying technology so that it lasts (in the sense that fewer resources are needed to keep it running) could also benefit the Frictionless project.
This idea resonated across the room. Pierre Camilleri highlighted the challenge of maintaining a codebase that has grown large and complex, especially with limited contributors. He suggested narrowing the scope and focusing on modularity – removing features that aren’t widely used and concentrating efforts where they matter most. “We need to strike a balance between having the flexibility to change things when it’s time, and refraining from changing things every time something isn’t perfect,” he noted.
The discussion focused on building a resilient core around the Data Package standard – something envisioned to last for 100 years – and minimizing dependencies to make that possible.
Peter echoed this by suggesting a more focused scope around validation, possibly reducing or even eliminating components related to data manipulation. The core idea is to simplify and solidify the foundational tools, leaving more domain-specific or experimental extensions to external efforts.
The conversation also touched on expanding the standard thoughtfully. There were suggestions to include elements like units and taxonomy mapping layers – but with caution. While mappings and controlled vocabularies are helpful, they should remain optional and outside of the core standard to avoid locking the project into domain-specific use cases.
Though Data Package v2 has been officially released for months now, much of the supporting software has yet to catch up. This underlines the urgency of deciding what parts of the ecosystem should be actively maintained and where to scale back. Phil Schumm advocated for maintaining one core library that is “very well thought through and well maintained,” which other libraries could then build upon – an approach mirrored by Camtrap DP’s R library, which focuses tightly on reading, converting, validating, and writing.
Pierre had prepared some clustering of the features of the Python library, an excellent starting point to discuss the scope reduction. You’ll find Pierre’s clustering in this GitHub discussion.
The conversation then shifted toward adoption and outreach. The truth is, the more Frictionless is adopted, the more it can rely on people to maintain things (e.g. GBIF or ROpenSci). How do we better communicate the value of Frictionless and its philosophy? Steve Diggs challenged the group to think about who needs Data Package but doesn’t know it yet. There’s still a lot of self-imposed “friction” in the world of data sharing, and Frictionless offers a powerful antidote to that. From concerns about political shifts in data policy to data rescue scenarios during natural disasters, Frictionless was positioned as a way to preserve and democratise access to data.
This point struck a chord, especially when reflecting on real-world situations where data is hidden or duplicated in panic, rather than shared in accessible, structured formats. Frictionless could provide clarity and structure – “untangling the ball of string,” as Steve put it – even for those without deep domain expertise.
Finally, the group agreed with Phil, when he mentioned that growing adoption will require partnerships with overlapping communities – platforms like OSF, standards groups, and data-sharing tools that already solve related problems. Rather than competing, Frictionless should aim for interoperability, making it easier for users to switch between standards and tools. The key will be clearly demonstrating how Frictionless tools can help automate and simplify the messy business of data curation.
Conclusion
The community agreed to follow up on the two macro-topics that emerged with separate meetings, possibly building working groups around them:
One for software simplification and core functionalities
Another for adoption, outreach, and partnerships with related communities
I started this blog when my job title was the very long and very silly “Systems Librarian — Web, Communications, and Interface Design.” I was doing UX work related to the “web” part of that title and then expanded to non-web-related UX, joking that I took “interface design” to include the interface between users and the library.
For a couple of different reasons, my job has changed again. I no longer have any UX responsibilities. I’m still responsible for the library website and am also now overseeing our library service platform, Alma. My title is now, simply, “Systems Librarian.”
Theoretically, I should still undertake UX work to inform both my web and LSP responsibilities. But honestly, I’m a bit heartbroken by my UX failure(s). So I’m taking a break from UX work. I don’t expect to add new posts here for a while, if ever. I’m proud of some of the work documented here, so I’m happy to leave it all up for now.
For reasons that will become apparent as you read the introduction, this post has tangential references to a number of oppressed groups, and to fascist newspeak as encoded in Library of Congress metadata standards.
If you've been paying any attention to the current political situation in the US, you'll be aware that the dude in the Oval Office has been issuing executive orders left, right and centre. Much damage is being done to the country, its people and its infrastructure that will take generations to repair, but as someone watching from an ocean away it would be easy to believe it won't affect me.
It will.
Not only will any success experienced by fascism in the US embolden fascists around the world, but the US is so culturally and economically dominant that anything happening there will inevitably impact the rest of us. "When America sneezes, the rest of the world catches a cold."
If this all seems like a weird way to introduce a post about me figuring out how to query and manipulate RDF metadata, well, it is. I do intend to write more about this soon, but for now I'll just say that the question I'm interested in is this:
Is the linked data published by the Library of Congress being materially changed as a result of the aforementioned flurry of executive orders?
If you think this sounds hypothetical, well, the US National Cancer Institute has already begun removing certain gender- and sexuality-related terms from its linked data thesaurus. I like to believe that, for reasons of both professional integrity and institutional inertia, it will take much longer before similar changes are made to vocabularies like the Library of Congress Subject Headings (LCSH). These are resources used in cultural heritage institutions all over the world, though, so if & when that happens the impact will be widespread.
The setup
I eventually want to be able to do this kind of analysis on "the big one", LCSH. But that really is big and queries against it will take … a while … so I've chosen to start with something smaller, the Library of Congress Demographic Group Terms. I chose this in particular because it's a reasonable size, but also because it contains terms that appear on a list of those that government departments are being instructed to remove, so it seems likely to be a target of that censorship.
This data is all in the form of RDF triples. This post isn't the place for a full explanation of what that means, but in brief, a triple is a machine-readable statement of some property of a subject in the form:
<subject> <predicate> <object>
So I have two versions of the same RDF ontology, and I want to know what statements have been removed or altered from one to the other. I could process the raw files myself as text but: 1) I would eventually end up writing the bits of a triple-parser and store I need myself, from scratch, which would be a waste of time; and 2) I actually would like to learn more about the key technologies involved.
To work with RDF data, I needed a specialised database called a triplestore, which is optimised for running queries against this type of graph data using a query language whimsically named SPARQL. I didn't give this too much thought, and picked Apache Jena Fuseki because it is open source, was available as a package in my Linux distribution, and worked the first time I tried starting it. Other triplestores are available.
The basic unit of a triplestore is a "graph", equivalent to a "database" in a database management system. Because these are two versions of the same data, I can't just load them both into the same graph: they overlap significantly, so the second would largely replace the first, leaving me no way to compare them. What I can do is load them into separate "named graphs", which I can then refer to explicitly in my queries. I've called the graphs for the two versions urn:demographicTerms/20250314 and urn:demographicTerms/20250321, after the dates I downloaded them.
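For the record, this is roughly how the two files can be loaded into their named graphs over Fuseki's Graph Store HTTP endpoint. The dataset name ("demographicTerms") and file names are my own placeholders, and the Fuseki web UI works just as well:

```python
import requests

# Graph Store Protocol endpoint for a Fuseki dataset; "demographicTerms" is a placeholder name.
DATA_ENDPOINT = "http://localhost:3030/demographicTerms/data"

graphs = {
    "urn:demographicTerms/20250314": "demographicTerms-20250314.nt",  # placeholder file names
    "urn:demographicTerms/20250321": "demographicTerms-20250321.nt",
}

for graph_uri, filename in graphs.items():
    with open(filename, "rb") as f:
        # PUT replaces the contents of the named graph with the uploaded triples.
        resp = requests.put(
            DATA_ENDPOINT,
            params={"graph": graph_uri},
            data=f,
            headers={"Content-Type": "application/n-triples"},
        )
    resp.raise_for_status()
```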
Since RDF triples use URIs (technically IRIs) all over the place, and these tend to be quite verbose, SPARQL allows you to define prefixes to keep queries shorter. Here are a few that might be useful:
I'll also define my own prefix, dt:, to make it easier to refer to the two named graphs:
PREFIX dt: <urn:demographicTerms/>
You'll see these abbreviated to <<common-prefixes>> in the code below. With that set up, let's dive into the data.
Getting situated
To make sure that I'm successfully querying the right place, and get an idea of what data is in there, let's pull out a list of all the unique predicates: the things that are used to make statements about other things.
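A sketch of that query, here sent from Python to Fuseki's SPARQL query endpoint, with the prefixes written out rather than pulled in via the <<common-prefixes>> shorthand (the dataset name is again a placeholder):

```python
import requests

QUERY_ENDPOINT = "http://localhost:3030/demographicTerms/query"  # placeholder dataset name

query = """
PREFIX dt: <urn:demographicTerms/>

SELECT DISTINCT ?predicate
WHERE {
  GRAPH dt:20250314 { ?subject ?predicate ?object }
}
ORDER BY ?predicate
"""

resp = requests.post(
    QUERY_ENDPOINT,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["predicate"]["value"])
```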
This gives me a little confidence that we are at least looking at the right sort of data, and also gives hints about the structure of the dataset. Now we can start digging in a bit more deeply.
Deprecated records
In common with a lot of such vocabularies, those produced by LoC are generally quite scrupulous about retaining deleted entries but clearly marking them as such, rather than simply dropping them from the dataset. This is important: if you make use of a term that is subsequently deprecated, you need to know when that happened and why to understand how to appropriately update your own records, and historical records will likely still reference terms that have since been deleted but still need to be interpreted.
Let's take a look at what's been officially deprecated from this vocabulary:
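(I won't pretend the sketch below is the exact query that produced this listing; in particular, skos:prefLabel and skos:changeNote are assumptions about how the LoC data records labels and deletion notes, so check them against the predicate listing above.)

```python
import requests

QUERY_ENDPOINT = "http://localhost:3030/demographicTerms/query"  # placeholder dataset name

# skos:prefLabel and skos:changeNote are assumed predicates; the real data may use others.
query = """
PREFIX dt:   <urn:demographicTerms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?label ?note
WHERE {
  GRAPH dt:20250321 {
    ?term skos:prefLabel ?label ;
          skos:changeNote ?note .
    FILTER(CONTAINS(STR(?note), "deleted"))
  }
}
ORDER BY ?label
"""

resp = requests.post(QUERY_ENDPOINT, data={"query": query},
                     headers={"Accept": "application/sparql-results+json"}, timeout=60)
for row in resp.json()["results"]["bindings"]:
    print(row["label"]["value"], "|", row["note"]["value"])
```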
This authority record was deleted because it was created in error.
Concentration camp inmates (2022-11-09T12:41:19): This authority record has been deleted because the demographic group term is covered by the demographic group terms {Internment camp inmates} (DLC)dg2022060247 and {Nazi concentration camp inmates} (DLC)dg2022060248
Parents of autistics (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of autistics} (DLC)dg2024060011 and {Parents} (DLC)dg2015060230
Parents of mass murderers (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of mass murderers} (DLC)dg2024060075 and {Parents} (DLC)dg2015060230
Politicians' partners (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of politicians} (DLC)dg2024060009 {Spouses} (DLC)dg2024060004 and {Unmarried partners} (DLC)dg2024060005
Cancer patients' partners (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of cancer patients} (DLC)dg2024060007 {Spouses} (DLC)dg2024060004 and {Unmarried partners} (DLC)dg2024060005
Parkinson's disease patients' partners (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of Parkinson's disease patients} (DLC)dg2024060008 {Spouses} (DLC)dg2024060004 and {Unmarried partners} (DLC)dg2024060005
Parents of transgender people (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of transgender people} (DLC)dg2024060076 and {Parents} (DLC)dg2015060230
Parents of dyslexics (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of dyslexics} (DLC)dg2024060073 and {Parents} (DLC)dg2015060230
Partners (Spouses) (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Spouses} (DLC)dg2024060004 and {Unmarried partners} (DLC)dg2024060005
Parents of alcoholics (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of alcoholics} (DLC)dg2024060010 and {Parents} (DLC)dg2015060230
Parents of gays (2024-06-06T12:09:08): This authority record has been deleted because the heading is covered by the headings {Family members of gay people} (DLC)dg2024060074 and {Parents} (DLC)dg2015060230
United States Air Force officers (2024-08-21T13:21:46): This authority record has been deleted because it is not a valid construction.
United States Coast Guard officers (2024-08-21T13:21:46): This authority record has been deleted because it is not a valid construction.
Junior high school students (2025-01-24T13:29:48): This authority record has been deleted because the heading is covered by the heading {Middle school students} (DLC)dg2015060024
There's actually nothing problematic here, as far as I can tell. Yes, some terms have been removed over time but this is because they are covered by some other combination (e.g. Parents of dyslexics), weren't correctly constructed to start with (e.g. United States Coast Guard officers), or were duplicates (e.g. Junior high school students).
Let's move on…
Social terms (includes gender and sexuality)
There are several subsets defined within the full set of terms. One in particular, http://id.loc.gov/authorities/demographicTerms/collection_LCDGT_Social, contains terms referring to social groups, including various categories that are actively threatened by the current regime. These seem worth looking over to see if there's anything amiss.
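A sketch of a membership query for that subset; note that the linking predicate used here (skos:member, from the collection to its terms) is my assumption about how the LCDGT data is modelled, so check it against the predicate listing above:

```python
import requests

QUERY_ENDPOINT = "http://localhost:3030/demographicTerms/query"  # placeholder dataset name

query = """
PREFIX dt:   <urn:demographicTerms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?term ?label
WHERE {
  GRAPH dt:20250321 {
    # skos:member is an assumption about how collection membership is expressed.
    <http://id.loc.gov/authorities/demographicTerms/collection_LCDGT_Social> skos:member ?term .
    ?term skos:prefLabel ?label .
  }
}
ORDER BY ?label
LIMIT 20
"""

resp = requests.post(QUERY_ENDPOINT, data={"query": query},
                     headers={"Accept": "application/sparql-results+json"}, timeout=60)
for row in resp.json()["results"]["bindings"]:
    print(row["label"]["value"])
```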
I've limited the number of results here to 20, but I've scanned through the full collection of 325 and all the terms I'd expect to see removed (those relating to sexuality and gender identity, for instance) are still present. For example:
The check I've been building up to, though, is to directly compare snapshots of the vocabulary taken one week apart. This is why I uploaded two versions into named graphs, and it requires slightly more verbose queries to compare them.
I thought about a few ways of doing this, but I've landed on this simple option which should nonetheless identify the most obvious kinds of vandalism: what (textual) labels are present in the earlier snapshot but absent in the later? This should catch both deleted and modified terms, which we can then potentially inspect further.
In SPARQL we can do this by querying the earlier graph for all its term labels, then filtering out any of these that do not exist in the later graph:
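A sketch of that comparison, again assuming skos:prefLabel carries the label (substitute whichever label predicate the data actually uses):

```python
import requests

QUERY_ENDPOINT = "http://localhost:3030/demographicTerms/query"  # placeholder dataset name

# Labels present in the 2025-03-14 snapshot but absent from the 2025-03-21 snapshot.
query = """
PREFIX dt:   <urn:demographicTerms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?term ?label
WHERE {
  GRAPH dt:20250314 { ?term skos:prefLabel ?label }
  FILTER NOT EXISTS {
    GRAPH dt:20250321 { ?other skos:prefLabel ?label }
  }
}
"""

resp = requests.post(QUERY_ENDPOINT, data={"query": query},
                     headers={"Accept": "application/sparql-results+json"}, timeout=60)
for row in resp.json()["results"]["bindings"]:
    print(row["term"]["value"], row["label"]["value"])
```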
This differs only in the case of the single letter "A", so we can reasonably assume that this is just a typo being corrected or something of that nature.
What's next?
So far, it looks like this particular vocabulary has not yet been defaced. That doesn't mean it won't be though, so I'll need to continue taking snapshots at regular intervals and repeat these tests to see if anything else changes. One follow-up task, then, is to better automate this so I can't forget to do it (I've already missed the 2025-03-28 update).
There are other vocabularies that we do know have already been changed. Library of Congress Subject Headings (LCSH) are used in English-language library catalogues around the world, so the recent widely-discussed changes of the terms for "Mexico, Gulf of" and "Mount Denali" to "America, Gulf of" and "Mount McKinley" respectively will have global impact.
I'm aware that these changes do not go unnoticed in the GLAM community, since many cataloguing professionals will routinely check updates to LCSH and friends before applying them to their own catalogue. I'm still doing this for my own interest, and also because a growing number of organisations simply don't have the capacity to do such checks.
LCSH is a much bigger dataset, and I plan to dig into that for my next post on this subject.
This blog post summarizes recent Research Library Partnership accomplishments and upcoming programming as we look forward to new opportunities to learn, connect, and grow. Our team is working tirelessly to position partner libraries for continued success in an increasingly challenging and uncertain environment. These RLP resources are designed to support our partners through fiscal challenges, staffing shifts, and more.
The value of the OCLC RLP—what we’re learning
At the end of 2024, RLP team members interviewed the most highly engaged RLP institutions to identify what they value most about their affiliation. Partners highlighted these areas especially:
Professional development and skill-building support
Research support and collaboration activities
Investment in special collections and archives
SHARES and metadata management communities
Networking and community-building opportunities
The Partners we spoke to appreciate our strategic guidance, accessible programs and resources, and targeted communication—advancing library trends while offering actionable insights for strategic and operational needs. We’re planning a survey for later this year that will help us understand if these opinions are shared across the Partnership.
Exploring the role of AI in libraries
Research libraries are exploring AI to enhance workflows and tackle operational challenges. Last year, our Metadata Managers Focus Group discussed “Getting ready for AI,” which helped inform our Transforming Metadata session at ALA 2024.
While technical aspects are important, many conversations emphasize the leadership required for the responsible adoption of AI tools. To support this, we have launched the Managing AI for Metadata Workflows working group to provide practical guidance and support.
As a team, we are committed to exploring how AI can support your work, and you can expect to see more from us in innovative research and programming across our programmatic areas.
In 2024, we launched new OCLC RLP Leadership Roundtables for special collections and research support services. These sessions have attracted an appreciative audience of Partners eager to share and compare local insights. To date, we’ve held more than eight meetings and engaged more than 130 senior leaders at 75 institutions on a range of topics:
Library support for open research
Responsible stewardship for special collections
Scaling research support services
Cross-campus collaboration in research support
Library support for bibliometrics and research impact
The evolving public services landscape in special collections
Advocacy and resourcing in special collections
Stewarding born-digital archival collections
Interested in knowing more about these sessions? You’ll find summaries on Hanging Together.
Leveraging research and learning investments
The OCLC RLP is committed to offering opportunities to connect our research to your practice. Here are a few areas where our team is working to deliver actionable insights:
Library Beyond the Library: This project builds on our influential Social Interoperability research to explore how libraries are engaging with other campus units and contributing to new institutional research priorities. Watch for webinars, insights, and other resources in 2025 to help your library learn more from this research.
Reimagine Descriptive Workflows: OCLC has worked to implement recommendations based on this research, which interrogated the systems and structures behind library metadata practices. These include supporting locally preferred subjects in WorldCat Discovery and seeking community input in developing the WorldCat ontology. We’re also looking to what’s next—last year, a series of OCLC Research workshops in the United Kingdom highlighted the opportunity to broaden the geographic scope of this work, so we’ve engaged a group of RLP Partners in the UK and Ireland and will share findings from our efforts in the coming month.
Inclusive collections: In 2023, we shared insights on how research libraries approach their collections to support goals around inclusion and serving community needs. We followed up with a series of webinars on strategies for inclusive collection development, including our webinar on “Diversifying library collections through student-led collection development” featuring Bryn Mawr College. This program provides students with a paid internship to gain hands-on experience in collection development and the book trade by selecting diverse books for the library, supporting local independent bookstores, and actively contributing to library acquisitions.
This week, we will continue this conversation with RLP Partners Bryn Mawr and University of Nevada, Las Vegas in what promises to be an impactful ACRL conference session, “Navigating Inclusive Collection Development: Strategies for Authentic Representation.” If you’re attending the conference, please add this session to your agenda and stop by to say hello to Merrilee Proffitt and our RLP Partner panelists.
Expanding the RLP community
We’re excited to welcome the National Library of New Zealand (Te Puna Mātauranga o Aotearoa) and Hofstra University Libraries to the RLP. The SHARES network is also growing, with new members including Hofstra University, Syracuse University College of Law, University of Pittsburgh, and Vanderbilt University. These additions reflect the Partnership’s ongoing commitment to fostering a diverse and dynamic community.
Looking ahead
Thanks to ongoing support and engagement from our Partners, the RLP program continues to evolve and grow. Together, we’re driving innovation and growth, addressing today’s challenges while preparing for the future. Here’s to a transformative 2025!
Mark Rober's video Can You Fool A Self Driving Car?, demonstrating why self-driving cars need LIDAR and not just cameras, was extremely well done and deserved its 16M views, but it was easily the least bad recent news for Tesla. When your sales figures and your stock price look this bad, desperation tends to set in. And trotting out the co-President and the Commerce Secretary to make sales pitches for you is a clear sign that it has.
Below the fold I look at the flood of bad news for Tesla.
The stock is currently down nearly 50% from its peak, but despite that drop its trailing P/E is still over 60, which is valuing the company as a tech company with vast growth potential. For comparison, Nvidia is down 25% from its peak and its P/E is around 40. In the most recent quarter, Nvidia's revenues were almost $40B with operating income up 77% year-on-year, Tesla's revenue was $25B with operating income down 23% year-on-year.
It is clearly time to pump the stock, and Musk's favored pumping technique is to announce an implausible schedule for some vaporware product and hope that no-one remembers what happened to all the previous implausible schedules he announced. So far this year we have:
the company would be “launching unsupervised Full Self-Driving as a paid service” in Austin in June.
Outside China, Waymo is the only company currently offering a fully autonomous paid taxi service. To do it they need staffed depots for charging and cleaning, and a staff of remote drivers for when the cars get into trouble. June is ten weeks away, so Tesla should be leasing depot space, and hiring and training staff. Is it?
Model 2 Redwood’s price will start at just $19,999,
Does anyone remember his announcement of a ship date and a price for the Cybertruck? Does anyone remember that Tesla announced and then cancelled the Model 2 last year? And given the collapse in used Tesla prices, at $20K it will compete with the Model 3.
SpaceX founder Elon Musk said Saturday its massive Starship rocket would leave for Mars at the end of 2026 with Tesla humanoid robot Optimus onboard, adding that human landings could follow "as soon as 2029."
Given that Starship needs a major redesign after two successive failures rained flaming debris on Caribbean islands and disrupted flights across the Eastern US, reporters might take note of Eric Berger's skepticism:
Although there will no doubt be pressure from SpaceX leadership to rapidly move forward, there appears to be a debilitating design flaw in the upgraded version of Starship. It will be important to understand and address this. Another launch before this summer seems unlikely. A third consecutive catastrophic failure would be really, really bad.
Automotive revenue fell 8% to $19.8 billion from $21.56 billion in the same quarter last year, and of that, $692 million came from regulatory credits.
4% of revenue doesn't sound like much, but it goes straight to the bottom line. A better way to think of it is that carbon credits are 30% of Tesla's net income. Tesla can only claim a credit after it sells a car, something that has recently become more difficult. Unsold cars don't just represent tied-up capital, they also represent unclaimed carbon credits. Dean Blundell's thread reports that:
The Canadian government is probing Tesla for allegedly inflating EV sales to claim $43M in rebates during the final days of the iZEV program. Here’s what we know. Over a 3-day period, Tesla reported selling 8,653 cars across 4 dealerships, claiming over half of the remaining rebate funds. This surge coincided with the program’s announced suspension, raising suspicions of foul play. To put this into perspective: Tesla would have needed to sell 2 cars per minute continuously for 72 hours. Many dealerships couldn’t even process claims as Tesla reportedly consumed most of the funds. Independent dealers are now out $10M after providing rebates they expected to be reimbursed for. The Canadian Automobile Dealers Association accuses Tesla of exploiting the system, leaving others locked out. Transport Canada officials noted discrepancies, suggesting Tesla may have backdated claims or exaggerated sales figures. One Quebec location alone claimed 4,000 sales in a weekend—an implausible number given its capacity. Tesla’s actions have sparked outrage among dealers and industry groups, with calls for stricter oversight of rebate programs. The investigation continues, but questions remain about whether laws were broken or loopholes abused.
There have been rumors for a long time that another Tesla pump technique was dodgy accounting, and it seems that the Financial Times may have found some. $1.4bn is a lot to fall through the cracks, even for Tesla by Dan McCrum and Steven Morris reports:
Compare Tesla’s capital expenditure in the last six months of 2024 to its valuation of the assets that money was spent on, and $1.4bn appears to have gone astray.
The sum is big enough to matter even at Tesla, and comes at a moment when attention is returning to the group’s underlying numbers, now that its fully diluted stock market valuation has crashed from $1.7tn to below $800bn.
Such anomalies can be red flags, potentially indicative of weak internal controls. Aggressive classification of operating expenses as investment can be used to artificially boost reported profits.
Why might Musk be desperate to pump the stock? Will Lockett examines this question in Tesla Is So Screwed:
Just a reminder that Tesla stock is so wildly overvalued because investors have been speculating that the 4680, Cybertruck, FSD, and Cybercab would be the revolutions Musk promised and would dominate the automotive world. As such, Musk should see the fact that Tesla’s stock has only shrunk by 36% as a miracle. But, as time goes on, and Tesla abandons the 4680, Cybertruck sales continue to disappoint, and FSD and Cybercab demonstrations are either delayed or it becomes painfully obvious to everyone how abysmally perilous they are, these investors will realise they made a bad bet and pull their money out, crashing Tesla even further.
Lockett looks at each of the four factors in turn:
This was meant to be a revolutionary battery with high specs and industry-crushing low prices. Tesla has sunk billions upon billions of dollars into its development, yet five years since it was announced, it has yet to meet the target specification or price. Naturally, it has slipped miles behind the curve, given that there are now battery packs that are simultaneously safer, far cheaper, and charge much faster than the 4680.
But, after a dismal year of sales, Tesla is already scaling back Cybertruck production capacity to only a few tens of thousands of units per year. This isn’t surprising, as they are terribly built, practically useless, and reports have found they are more deadly than the infamous Ford Pinto.
Then there is the fact that Tesla has admitted that the current hardware installed on its cars isn’t good enough for self-driving, even though they previously stated that it was, meaning that to sell fully-fledged FSD, Tesla will have to do one of the most expensive recalls in history to retrofit new hardware into millions of cars.
On top of that, even after the Cybercab demonstration, we have yet to see a Tesla safely drive itself on public streets, even when its competitors have been achieving that for years.
According to a 2024 Tesla filing with the Securities and Exchange Commission (SEC), Musk has pledged 238.4 million shares "as collateral to secure certain personal indebtedness." At the time, Musk held a total of 715.0 million shares, meaning approximately one-third were being used as collateral for personal loans.
Musk currently owns around 411 million shares in Tesla, according to portfolio management service Whalewisdom, equating to a roughly 12.8 percent stake in the company.
Assuming these numbers are still correct, Musk has 172.6M unpledged shares, today "worth" about $46B. At JP Morgan's $120/share target, they would be worth about $20.7B. That's a problem for Musk. But the 238.4M pledged shares, today worth around $64B would be worth only $28.6B, which might be a bigger problem for Musk's lenders.
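A quick back-of-envelope check of that arithmetic, using only the figures quoted above (the per-share price of roughly $268 is implied by the ~$46B and ~$64B valuations, not a quote from any source):

```python
# Back-of-envelope check of the share arithmetic, using the figures quoted in the post.
pledged_shares   = 238.4e6  # shares pledged as collateral
unpledged_shares = 172.6e6  # remaining, unpledged shares
today_price      = 268      # assumed per-share price implied by the ~$46B / ~$64B figures
target_price     = 120      # JP Morgan's price target

for label, shares in [("unpledged", unpledged_shares), ("pledged", pledged_shares)]:
    print(f"{label}: ~${shares * today_price / 1e9:.0f}B today, "
          f"~${shares * target_price / 1e9:.1f}B at the $120 target")
```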
People tend to focus on Musk's and Tesla's assets but not so much on the debt. Lockett has looked at the other side of the ledger in This Is How Tesla Will Die:
Musk has also used his Tesla stock as collateral for SpaceX, Twitter, and Tesla loans. Before he bought Twitter, over half of his shares were collateralised; now, that figure is far, far higher. Again, let’s be generous and assume only 70% of his 12.8% stake in Tesla is collateralised in this way, with a third of these loans for Tesla. That would mean Musk has $71.68 billion in personal loans, with $23.89 billion for Tesla.
In other words, Tesla actually has $72.28 billion in debt. That is more than the company is realistically worth!
Lockett means what Tesla would be worth if it were valued as a car company, because at Toyota's PE of 8.4 instead of Tesla's current forward PE of 116, it would be worth $54B. Today, Tesla's market cap is around $827B. At JP Morgan's $120 target Tesla's market cap would be around $386B, so the total debt would be around 19% of market cap. Tesla's own debt would be around 12.5% of market cap, so they'd be fine. But Musk would appear to owe around $72B while having around $24B of collateral pledged. Adding the $21B of unpledged shares gets to $45B, leaving a gap of $27B. Both Musk and the lenders would have a problem.
Apart from general greed, the co-Presidents have a special need to keep TSLA pumped. Their hold on the Republican party depends to a large extent on the perception that the $300M Musk paid to elect Trump was small change compared to what they could donate to their primary opponents. In Sen. Lisa Murkowski says fellow Republicans 'afraid' of Trump, Musk, Benjamin Siegel gives an example:
"It may be that Elon Musk decides that he's going to take the next billion dollars he makes off Starlink and put it directly against Lisa Murkowksi. And you know what. That may happen. But I'm not giving up one minute, one opportunity to try to stand up for Alaska," she said.
Tesla Inc.’s sales fell for the 10th time in the last 12 months in Europe, where Elon Musk’s politicking and a changeover of the carmaker’s most important product have been major hindrances.
The company registered 16,888 new cars in February, down 40% from a year ago, according to the European Automobile Manufacturers’ Association. Tesla’s sales plunged 43% in the first two months of the year, deviating from the 31% rise in industrywide EV registrations.
Tesla Inc.’s vehicle sales fell 13% last quarter to an almost three-year low, as the carmaker made over its most important model and dealt with international backlash against Elon Musk.
The company said Wednesday that it delivered 336,681 vehicles in the first three months of the year, its worst showing since the second quarter of 2022. Analysts on average were expecting the company to sell more than 390,000 cars and trucks, according to estimates compiled by Bloomberg.
In a Wednesday note to clients, the JPMorgan group led by Ryan Brinkman lowered its forecast for Tesla’s first-quarter deliveries by 20% from 444,000 to 355,000, significantly below the consensus analyst projection of 430,000, according to FactSet.
JPMorgan’s prediction calls for Tesla’s lowest deliveries since 2022’s third quarter and an 8% decline from 2024’s first quarter
...
JPMorgan’s $120 share price target for Musk’s company is the lowest on Wall Street, according to FactSet data, and it implies more than 50% downside from Tesla’s $248 ticker Wednesday.
Tesla's build quality problems are well known; from panel gaps to crummy interiors to glass roofs that just fly off for no reason, it's safe to say customers should be on the lookout when they purchase a brand-new Tesla.
...
Instead of being up in arms about quality issues on one of the most expensive purchases most people will ever make, Tesla customers are selling ways to double check quality problems to other owners. It's safe to say, no other automaker could ever get away with something like this.
This problem has been hard to ignore since the launch of the Cybertruck 16 months ago, because there have been 8 recalls, the latest because parts have been falling off and causing a hazard for other road users. Remy Tumin reports on this:
The announcement marks one of the largest recalls for Cybertrucks in the model’s short and at times flawed history on the road. Other issues with the vaunted model have included losing drive power, its front-windshield wiper malfunctioning and an accelerator pedal getting stuck. Cybertrucks sell for about $80,000 to $100,000, depending on customization.
Other car manufacturers don't have models that average a recall every other month.
One of the Tesla Cybertruck‘s headline features is its “bulletproof” windows and stainless steel body. Elon Musk has even gone as far as describing the angular, polarizing electric pickup as an “armored personnel carrier from the future.” But if the Cybertruck’s protection from Robin Hood is a major selling point for you, you should know that Tesla’s proof of its bullet resistance crumples like a hollow point under close scrutiny.
Consumer sentiment surveys from The Conference Board and University of Michigan have been dismal of late as households fear a resurgence in inflation from President Donald Trump’s tariffs. Companies have warned of higher prices and less demand, coinciding with economists’ forecasts that suggest a risk of stagflation and rising odds of recession.
Tesla’s sales dropped by 41% in Germany last year compared to 2023 despite EV sales surging 27% during the year.
Despite the already bad results in 2024, Tesla’s sales were down 70% in the first two months of 2025.
...
Amid this evident crisis for Tesla in Germany, we reported last week on a survey of 100,000 people by Germany’s popular T-Online publication that showed that only 3% of respondents would consider buying a Tesla vehicle.
...
Musk shared a post that claimed the survey now points to “70% of people in Germany would buy a Tesla again”
...
Sure enough, T Online has now reported that the survey has been manipulated by bots, with 253,000 votes coming from just two IP addresses in the US:
It isn't just that many fewer people are buying new Teslas, it is also that existing Tesla owners are dumping theirs. CNN's The used Tesla market is crumbling describes the rout:
CarGurus, a car-buying site, found that used Tesla prices are falling at more than double the rate of the average used car price. The Cybertruck, the controversial steel-sided pickup, fared the worst of any Tesla vehicle, with a resale value 58% less than its original price, according to CarGurus.
This has three effects. It increases the motivation for owners to sell their cars, because they see them depreciating so fast, it diverts buyers who actually want a Tesla from new to used, and it forces Tesla to discount new car prices to move them off the lots, which increases the depreciation. It is a vicious cycle.
But the much bigger problem is simply that Tesla hasn't kept up with the competition in the Chinese market, which last year contributed 22% of revenue.
BYD on Monday unveiled a new charging system that can deliver enough power to its cars to travel 292 miles within just five minutes, a tad longer than it takes to fill up a tank of gas. BYD Chair Wang Chuanfu told Bloomberg News that the company plans to set up 4,000 charging stations, although he didn’t specify a timeline for doing so.
The “Super e-Platform” includes flash-charging batteries, a 30,000 RPM motor, and silicon carbide power chips, according to Bloomberg. It claims that its system is the fastest of its type for mass-produced vehicles.
The first BYD models to get the five-minute charging system will be the Han L sedan and Tang L SUV, which are set to hit the market in April at prices starting at 270,000 yuan ($37,338). The company sold more than 318,000 passenger vehicles in February, up 161% year-over-year.
Cornering the market on ultra-speed EV charging would be a major boost to BYD, which is already the biggest automaker in the world’s biggest auto market. BYD also plans to equip all of its models priced above $13,688 with the “God’s Eye” software, offering assisted driving software to mass-market buyers.
Tesla's Supercharger network is a big part of its product appeal, and it is vulnerable. Om Malik's China's EV Boom Is Bad For U.S. Tech has the dismal numbers:
In 2023, Tesla delivered 1.8 million cars, while BYD delivered 1.57 million. In 2024, Tesla delivered 1.79 million electric vehicles, while BYD sold 1.76 million. BYD is nipping on its heels, and poised to take a lead.
However, these numbers don’t tell the complete story. BYD reported higher revenues in the third quarter of 2024 and delivered more fully electric vehicles than Tesla in the fourth quarter. According to China Passenger Car Association data released March 4, Tesla’s wholesale sales in the Chinese market fell 49% in February compared to the same period last year, reaching 30,688 units.
...
BYD sold 4.27 million cars in 2024, approaching Ford's estimated sales for the year. China exports EVs in aggregate to the Global South rather than to the EU or US. No wonder Ford’s CEO is honest enough to admit that they are facing down the barrel of the gun.
Xiaomi is Why Apple Should Have Made a Car on Asianometry's YouTube channel recounts the history of just one of BYD's several competitors. Starting around 21:15 the video explains how an attempt by the US to sanction Xiaomi drove them into the EV market.
BYD is, like Tesla, a vertically integrated company. But Xiaomi and the other Chinese EV companies are not. They are enabled by the fact that the EV industry in China is like the PC or the smartphone business. The product is modular, and each module forms a competitive market. Xiaomi, the smartphone company, understood how to operate in this market when, on 30th March 2021 they announced that they would build an EV.
They launched their first car in December 2023, announced prices for a Model S size sedan in March 2024 at $30K ($4K cheaper than a Model 3) for the base model and up to $41.5K for one with a 497-mile range on the CLTC standard, 673 horsepower, four-wheel drive and a self-driving stack including dual Nvidia Orin and lidar. They started taking orders on 28th March 2024. The first 50K orders took 27 minutes, they had 100K by April, and were shipping a few days later. They delivered 140K cars in the first 8 months, and are now producing 20K cars/month.
If you had to guess what the chief executive officer of the Ford Motor Company used as his daily driver ... well, no matter what you might think, you'd probably be wrong. As it turns out, Ford CEO Jim Farley has been driving around in a Chinese-made Xiaomi SU7 as of late for a little competitive research — and he sure does love it.
Speaking on the Everything Electric Show podcast, Farley praised the brand-new automaker’s electric sedan. "I don't like talking about the competition so much, but I drive a Xiaomi," he said. "We flew one from Shanghai to Chicago, and I've been driving it for six months now, and I don't want to give it up."
In late 2024 Xiaomi launched an Ultra version with 1500hp and 0-100KPH in 2s, priced between $72K and $114K. Deliveries began last month. Jack Fitzgerald reports that a Xiaomi SU7 Ultra Prototype Laps the Nürburgring in 6:46.87 Minutes, the fastest time ever for a 4-door sedan:
A video released by the Nürburgring shows the Xiaomi prototype smashing the official lap records for production versions of four-door and electric cars. Not only is the electric sedan's time remarkable, but the SU7 Ultra managed it while appearing to lose power around the 4:15-minute mark, as indicated by the onboard video.
Xiaomi's run has not been free of trouble, though: on March 29 an SU7 operating in assisted-driving mode was involved in a serious crash. The incident marks the first major accident involving the SU7 sedan, which Xiaomi launched in March last year and which since December has outsold Tesla's Model 3 on a monthly basis.
...
A disclosure from the company earlier on Tuesday said initial information showed the car was in the Navigate on Autopilot intelligent-assisted driving mode before the accident and was moving at 116 kph (72 mph).
In a rundown of the data submitted to local police posted on a company Weibo account, Xiaomi said the autopilot system had issued a risk warning of obstacles ahead.
A driver inside the car took over and tried to slow it down, but then collided with a cement pole at a speed of 97 kph.
Tesla had to roll back its ‘Full Self-Driving’ free trial in China after a policy change brought more scrutiny to software updates for advanced driver assist systems.
The automaker made the system available through a free trial this month to try to encourage people to buy the system through an over-the-air software update.
Previously, the system was called “FSD Intelligent Assisted Driving” in Chinese. The new name drops “FSD” from the title, and simply calls it “Intelligent Assisted Driving.” It has also previously been called “Full Self-Driving Capability” in China.
Tesla has received plenty of criticism over the years for the name of its system, which, despite being called “Full Self-Driving,” does not actually allow cars to fully drive themselves. Tesla changed the name to “Full Self-Driving (Supervised)” in the US last year, to show that a driver still needs to supervise the vehicle while the system is active.
"Intelligent Assisted Driving" costs almost $8K in China, but:
But immediately after that rollout, Tesla drivers started racking up fines for violating the law. Many roads in China are watched by CCTV cameras, and fines are automatically handed out to drivers who break the law.
It’s clear that the system still needs more knowledge about Chinese roads in general, because it kept mistaking bike lanes for right turn lanes, etc. One driver racked up 7 tickets within the span of a single drive after driving through bike lanes and crossing over solid lines. If a driver gets enough points on their license, they could even have their license suspended.
Chinese tech giant Baidu is reportedly planning to launch its robotaxi service outside of China as it looks to make inroads in the autonomous driving global market - a growing industry that other Chinese players as well as Western firms are racing towards.
The Beijing-based company is hoping to test and deploy its Apollo Go robotaxis in places including Hong Kong, Singapore and the Middle East, according to reports from the likes of Nikkei Asia and the Wall Street Journal that cited people familiar with the matter.
...
Baidu is already operating robotaxi services in multiple cities in China. It provided close to 900,000 rides in the second quarter of the year, up 26 per cent year-on-year, according to its latest earnings call. More than 7 million robotaxi rides in total had been operated as of late July.
Waymo is in the robotaxi market, albeit with vehicles that are a lot more expensive than Baidu's. Tesla is not in this market.
Xiaomi is just one of the Chinese EV makers, and both it and BYD are clearly executing better and moving faster than Tesla. How is Tesla responding? By getting the White House to place a 25% tariff on imported cars. China has studied the history of Japanese car exports, as Lingling Wei reports in China Explores Limiting Its Own Exports to Mollify Trump:
Like Japan decades ago, China is considering trying to blunt greater U.S. tariffs and other trade barriers by offering to curb the quantity of certain goods exported to the U.S., according to advisers to the Chinese government.
Tokyo’s adoption of so-called voluntary export restraints, or VERs, to limit its auto shipments to the U.S. in the 1980s helped prevent Washington from imposing higher import duties.
These restrictions didn't prevent Toyota and Honda from grabbing a big chunk of the US market, because their cars were better. The reviews of EVs in the Chinese market agree that comparable models are both better and cheaper than Tesla's, which is why Tesla's sales there are dropping. It is likely that the highly competitive Chinese market can produce EVs that are better and more than 25% cheaper than US-made EVs.
Each year during the American Library Association’s “Banned Books Week,” librarians display famous books that have been challenged in the past as well as the most frequently challenged books in the present day. This book-centric approach seems to be the standard. By focusing on the books themselves, however, do librarians tacitly concede that these books are indeed controversial? Defending challenged books on their merits, as in typical Banned Books Week displays, accepts the debate terms set by would-be censors. The problem is not the book; it’s the act of censorship. We should closely examine the motivations and political context of book banning movements in order to face this challenge with a fuller understanding of the problem. This article describes an exhibit at Florida State University Libraries during Banned Books Week 2022 that confronted this problem directly by trying to understand acts of censorship in Florida history. The exhibit, “Against Liberty: A History of Banning Books in Florida,” deployed primary sources readily available online and in the FSU collections to explore who has challenged books in Florida history, and what were their motivations.
Introduction
Each year in fall, typically in late September, the American Library Association (ALA) sponsors “Banned Books Week,” an annual event observed in libraries across the United States to raise awareness of challenges to the freedom to read and to build community with librarians, authors, publishers, and the reading public. Many librarians take this opportunity to create book displays, often showcasing famous books that have been banned in the past as well as the most frequently challenged books in the present day. In support of such efforts, ALA maintains a webpage that includes a variety of creative ways to display banned books in libraries. In almost every example on the site, the books themselves are the focus, often with some note about their content that has been considered objectionable.1
But considered objectionable by whom? In describing the current wave of book challenges, the ALA website alludes to “[g]roups and individuals demanding the censorship of multiple titles, often dozens or hundreds at a time,” but doesn’t elaborate further about who these groups are.2 PEN America, an advocacy group for writers and free expression, reported that, since 2021, the dramatic increase in the number of book challenges in the United States is largely the work of a movement to “advance extreme conservative viewpoints about what is appropriate and allowable in schools.”3 But who belongs to this movement, and why are they coordinating efforts nationwide to remove books en masse from library shelves across the country? Moms for Liberty, headquartered in Florida, is one of the organizations coordinating today’s book banning movement. For example, Bruce Friedman, a member of Moms for Liberty in Clay County, near Jacksonville, submitted over 400 book challenges to the county school district during the 2022-2023 school year. According to the Tampa Bay Times, which reviewed statewide data on book challenges, over 600 challenges of the total 1,100 received throughout the state that year came from either Friedman or former Escambia County teacher, Vikki Baggett. Baggett presented to the Santa Rosa County chapter of Moms for Liberty in May 2023 to share tips on how to get books removed from school libraries.4 The numbers of book challenges are staggering and the coordinated campaign that now extends nationwide, led by well-connected organizations including Moms for Liberty, suggests there is more to this movement than local parents’ concerns. In the face of this powerful movement against the freedom to read, librarians and the reading public need to follow the advice of censorship and intellectual freedom expert Emily Knox and take seriously the book challengers’ reasons for action, both to better understand their motives and to more effectively respond to attempts at censorship.5
In the summer of 2022, during what seemed at the time to be the height of the mass book banning movement sweeping public and school libraries in Florida, I decided to respond by rethinking the traditional Banned Books Week display. My goals were to turn the spotlight onto the would-be book banners in order to interrogate their motives and to put the current wave of book challenges into historical perspective. I decided to focus entirely on the state of Florida, where I lived and worked, because it was home to most of the university’s student body, and because the state has been the epicenter of US book banning in the twenty-first century. Based on data collected by PEN America, the 2021-2022 school year was, unfortunately, not the height of book banning, but only the beginning. Since that year, when PEN America documented 566 book bans in Florida, the number more than doubled to 1,406 in the 2022-2023 school year, and then skyrocketed to 4,561 book bans in 2023-2024. Since 2022, Florida has led the nation with the most book bans by state.6 Florida has also served as a testing ground for government action, providing a model for other censorious politicians to follow at the state and federal levels.7
It is also important to acknowledge that my individual and institutional positionality made it possible to create an exhibit that some might find politically provocative. I was able to proceed in part because of my union membership in the Florida State University chapter of the United Faculty of Florida. As a faculty member covered by the collective bargaining contract, I enjoyed the same academic freedoms and job security as other university faculty and felt empowered to present an honest and historically accurate narrative of book banners in Florida history. Unions are also under threat in Florida; the Republican-controlled legislature regularly passes new laws making it difficult to sustain union membership and legal recognition. However, at the time of my exhibit, the FSU faculty union had succeeded in meeting each new requirement that the state imposed. I also obtained backing from my library Dean, who supported the project enthusiastically. Additionally, as a cisgender white male, I might not be as easily targeted as faculty and staff from historically marginalized groups, especially given the political backlash currently aimed at campus offices and curricula that support the university’s Diversity, Equity, and Inclusion (DEI) goals. And finally, working at a large research university library meant that my collections were not the typical targets of coordinated book banning movements, which are more likely directed at public and school libraries.8 For all of these reasons, I felt both empowered and compelled to raise my voice on this issue.
In this article, I will share both the content and process of creating my exhibit, “Against Liberty: A History of Banning Books in Florida.” In sharing my historical research and experience creating the exhibit, I hope that library workers and other readers will recognize the core message of my exhibit: that book banning is rarely pursued as a good faith effort to protect readers from harmful content. Rather, as evidenced many times in Florida history, acts of censorship are rooted in struggles for power and social dominance. The three major episodes of book banning that I explored in my exhibit coincided with times of social crisis. In Florida, in the 1830s, in the decades after Reconstruction, and again during the mid-twentieth century Civil Rights Movement, reactionary forces used censorship as one tool in their bid to control the political narrative in times of significant social and cultural change. This historical approach helps us to see continuities between the motives and methods of book banners past and present. With a clearer understanding of these moments of conjuncture and why censors have challenged books in the past, we may be better equipped to respond today by crafting effective policies, procedures, and political advocacy in our communities. As my exhibit showed, the challenges we face are not new, and recognizing the true motives of book banners is essential to resisting the often powerful interests that seek to limit our freedom to read, learn, and imagine alternative futures.
An Idea Emerges
The idea for this exhibit came together during the summer of 2022, when my graduate course work in nineteenth-century United States history intersected with my role as Humanities Librarian at Florida State University Libraries. As a student, I had recently conducted research on neo-Confederate organizations and how they shaped Southern universities in the late-nineteenth and early-twentieth centuries. These groups, including the United Daughters of the Confederacy (UDC) and the United Confederate Veterans (UCV), were very concerned about the history being taught in Southern schools and worked to influence the curricula and textbooks used in schools at all levels, including colleges and universities. This research was fresh in my mind when Dr. Laura McTighe, a professor of religion and frequent collaborator with the library, mentioned to me in conversation that she was reading about how, before the Civil War, Southern enslavers passed laws to punish people found with abolitionist books and pamphlets in an effort to prevent the spread of abolitionist ideas. Putting this conversation in context with my research and today’s book banning movement, I was beginning to see similarities between the censors of each time period.
The final catalyst arrived when a library colleague alerted me to a book that had just been returned with a note written on the inside cover page. (Fig. 1) My colleague wanted to know what we should do with a defaced book, and since it was a history book, it fell to me to make this collection development decision. When I saw that the “defacement” was a note saying, “Warning–this is a racist book,” complete with page numbers to the writer’s purported evidence, I was intrigued. As it happened, the writer was correct. The book was a reprint edition of History of Georgia, by Robert Preston Brooks, one of many examples of histories from Dunning School scholars of the early twentieth century. Named for Columbia University historian William Archibald Dunning, who trained many of the scholars who wrote in this tradition, it refers to an interpretation of the post-Civil War Reconstruction era critical of Black enfranchisement and officeholding while sympathetic to the white Southerners who overthrew the Reconstruction governments, often violently.9 Here in my hands was a physical manifestation of the neo-Confederate movement I had studied and a modern reader’s act of resistance to it. I decided that instead of deaccessioning the book because of the annotation and the book’s overall poor condition, it needed to go into an exhibit!
Fig. 1. “Defaced” inside cover page of History of Georgia. Photo by author.
Exhibit Logistics
As I conducted further research on book banners in Florida history, I had to consider how I could represent my findings in a physical exhibit. There were several logistical issues to consider in the process. First, the only space available for my display was a small empty wall on the first floor of our main library building. I took measurements of the space to keep in mind as I selected objects and images to include in the exhibit and worked with colleagues to develop appropriate title banners and signs.
Assembling my exhibit required a bit of resourcefulness. The library did not have locking exhibit cases, so I resolved not to use any rare materials in the exhibit. Instead, most of the objects I displayed were either books from the circulating collection or high-quality scanned images. I borrowed an empty, unused glass-doored cabinet with shelves to place in the center of the exhibit space to hold most of the physical books. A colleague in the library’s technology department was able to lend me a computer monitor, which could be secured with a lock, in order to include a documentary film, played on loop, with permission from the creators.
I printed out scanned images of historical documents and photographs on regular copy paper using the library’s color printer. The quality of the images was high enough that professional printing or high-grade paper seemed unnecessary. I mounted the print outs on pieces of foam core board, cut to size, using double-sided tape. I then affixed these to the wall with adhesive putty. I created exhibit labels using the same process as the scanned images, paying close attention to the font, size, and amount of text on each label.10 All of the materials described here were easily procured from a local office supply store for about $70, which the FSU Libraries was able to reimburse. I did most of the printing, cutting, and mounting. The FSU Libraries Special Collections and Archives provided book stands and the mylar strips I used to hold open one of the books on display. My colleagues in the marketing and communication department designed the main logo and printed out the exhibit title sign on the department’s large format printer, so I did not have to account for those costs. Other colleagues helped me to move the cabinet into place, and to set up the computer screen for the documentary. I also created QR codes for some of the object labels with links to related content that couldn’t easily be incorporated into the physical exhibit. After several weeks of researching, final installation took about two days. The exhibit ran from September 19, the start of Banned Books Week 2022, until Thanksgiving break. (Fig. 2)
Fig. 2 “Against Liberty: A History of Banning Books in Florida” exhibit, Strozier Library, Florida State University, 2022. Photo by author.
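For readers who want to reproduce the QR-code step mentioned above, the following is a minimal sketch using the third-party Python qrcode package; the URL and filename here are placeholders for illustration, not the actual links used in the exhibit.

# Minimal sketch: generate a printable QR code for an exhibit object label.
# Requires the third-party "qrcode" package (pip install "qrcode[pil]").
import qrcode

label_url = "https://example.org/exhibit/related-content"  # placeholder link
img = qrcode.make(label_url)  # build the QR code as a PIL image
img.save("label-qr.png")      # PNG ready to print and mount next to a label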
Florida’s Book Banners
“Incendiary Publications” in Antebellum Florida
Throughout Florida’s history, citizens and state agents have used book banning as a form of power to protect their cultural dominance and wealth. In the 1830s, where my exhibit began, enslavers attempted to censor abolitionist ideas in order to prevent slave revolts and other threats to their control of human property. Slave owners had convinced themselves that if enslaved people resisted, ran away, or revolted, as happened frequently throughout the Americas, then it must be because abolitionists had incited Black people to seek freedom. So, assuming that slave resistance was due to “outside agitators,” Southern state legislatures passed laws prohibiting enslaved people from learning to read, interfered with the US Postal Service, and restricted the movements of Black sailors and local free people of color who might distribute abolitionist books and pamphlets.11 Such measures reached a fever pitch in the 1830s in part because segments of the ruling class had come to embrace abolitionism on both sides of the Atlantic, particularly in the US North and in Great Britain among evangelical Protestants and the ascendent industrial elites.12 Enslavers in the US South had previously benefited from the general acceptance of slavery among merchants and manufacturers, at least when limited to the southern states. But beginning in the 1830s, the Slave Power encountered growing resistance in Northern and British newspapers, popular literature, and political speeches as slavery expanded into Florida and some of the western territories created out of the Louisiana Purchase.
One famous example of abolitionist literature that sparked such paranoia and repression was David Walker’s Appeal, in Four Articles, to the Coloured Citizens of the World. (Fig. 3) Walker, the son of an enslaved man and free Black woman, used the US Postal Service to circulate his Appeal from his home in Boston. In it he denounced slavery, called on the “coloured citizens” of the world to rise up against their oppressors, and asked White, Christian enslavers to recognize their sins against God and against liberty.13
Fig. 3. Cover image of David Walker’s Appeal, 1830.
Several Southern states passed laws banning such abolitionist literature. The Florida legislature drafted a similar ban (Fig. 4), but it was never enacted owing to Florida’s status as a federal territory at the time. Presumably the federal government wasn’t going to approve a law so prejudicial against its own Postal Service. There were also white Floridians who opposed the law because they felt it would give more legal protections to abolitionists than they deserved. One such group, meeting in 1835 at Shell Point, a coastal town south of Tallahassee, proclaimed that the proposed law against incendiary publications “dignifies the question…subjecting it to the operations of a Grand Jury,” when instead the citizens already have the right “to act in that summary and efficient way, according to the first great dictates which the God of nature has implanted in the bosoms of men.”14 In other words, a local vigilance committee could enact their own justice, likely by assaulting or lynching any suspected abolitionist.
In the exhibit, I paired these attempts to restrict abolitionist literature in the 19th century with evidence of similar reading restrictions enforced by the Florida Department of Corrections (FDC), which has banned over 20,000 publications from Florida prisons.15 The scale of the FDC’s book banning is considerable, but so too is the scale of incarceration in Florida. The FDC is Florida’s largest state agency and the third largest state prison system in the country, with an operating budget of $3.4 billion.16 According to the Prison Policy Initiative, in 2023 over 80,000 Floridians were incarcerated in state prisons. Add to that local jails and federal prisons, and over 157,000 Floridians were behind bars, equivalent to 795 prisoners for every 100,000 Floridians, an incarceration rate higher than the US as a whole.17
Among those books banned in Florida prisons is Tallahassee attorney Reggie Garcia’s book, How to Leave Prison Early, a guide to navigate the state’s clemency, parole, and work release laws and procedures. It may seem shocking that a guide to legal remedies available to incarcerated men and women should be kept from the very readers who could use it most, but to admit that those in prison deserve such relief seems to challenge the power and control of the prison system.
Controlling the Narrative in the Era of Jim Crow
After the Civil War, societies like the United Confederate Veterans (UCV) and the United Daughters of the Confederacy (UDC) formed to protect the legacy of their Confederate ancestors and to reclaim their cultural power and social status by promoting the “Lost Cause,” a distorted memory of the antebellum South that helped to legitimize the social and racial hierarchies of the Jim Crow South. These Southern elites focused especially on schools and universities, where control of textbooks and curricula was essential to reestablishing ideological hegemony after the end of Reconstruction. Convinced that the “vindication of the South must come from the pens of southern writers,” the History Committee of the UCV established guidelines for selecting history textbooks to be used in southern schools. As reported in the UCV’s magazine, Confederate Veteran, all histories published in the North were suspected of sectional “prejudices” against the South and were excluded outright from consideration for school adoption.18 (Fig. 5)
Fig. 5. Cover image of Confederate Veteran magazine, June 1895 issue.
Southern authors filled the void left by these banned books with histories of their own, justifying the Southern rebellion as both honorable and Constitutional, while also softening the image of plantation slavery. This project was not limited to the academic histories written by students of Columbia professor William Archibald Dunning, as mentioned above. Caroline Mays Brevard, a historian, author, and scion of two powerful Florida families, taught at Leon High School in Tallahassee and at the Florida State College for Women, predecessor of today’s Florida State University. Brevard’s 1904 book, A History of Florida, which romanticized plantation society and made heroes of the aristocratic enslaver class, was used as a textbook in Florida schools for two decades.19 The myth of the “Lost Cause” that this and many other books promoted took hold by the turn of the twentieth century and became the dominant historical interpretation of American history in white-controlled schools and in popular culture.20
The United Daughters of the Confederacy (UDC) was particularly successful in controlling which histories were taught in schools. In addition to creating their own evaluation criteria and lists of approved textbooks for Southern schools, the UDC coordinated attacks on anyone who contradicted their Lost Cause mythology.21 When Enoch M. Banks, professor of history at the University of Florida, published an academic article stating that the South was “relatively in the wrong” about slavery and the Civil War, the UDC led a public campaign against him. The Florida UDC President, Sister Esther Carlotta of St. Augustine, proclaimed that his “writings proved him so unjust to the South’s attitude in 1861 as to unfit him for that position.” In 1911, Banks was forced to resign his position and left the state.22
In the exhibit, I paired the work of neo-Confederate groups working to control the official history of the South with contemporary debates over American history and the role of slavery in the founding of the United States. The Pulitzer-Prize winning 1619 Project was a frequent target for conservative politicians and activists who resisted a more inclusive and honest accounting of the history of American slavery, and in November 2022, the Florida Board of Education finalized a rule that prohibits using any material from the 1619 Project in K-12 education. The Rule likens the historical interpretations put forth by 1619 Project editor Nikole Hannah-Jones and other contributors to Holocaust denial and goes on to say that K-12 instruction “may not define American history as something other than the creation of a new nation based largely on universal principles stated in the Declaration of Independence.”23 Thus, just as the UCV and UDC worked to ban books deemed counter to their preferred narrative of the Civil War while filling the void with their own approved histories, so too has the Florida Board of Education excluded works that question the morality of existing racial power structures while limiting what Florida teachers may discuss in their classrooms.
The Fight Against Civil Rights
In the mid-twentieth century, some Florida politicians invoked the threat of communist subversion in their efforts to resist racial desegregation and the expansion of civil rights, an existential threat to legal white supremacy in the Jim Crow South. The Florida Legislative Investigation Committee (FLIC) led the charge, using McCarthy-style tactics to attack civil rights activists, especially those affiliated with the National Association for the Advancement of Colored People (NAACP). When the FLIC, better known as the Johns Committee after its chairman, State Senator Charley Johns, failed to slow the advance of Black civil rights in the law, the committee expanded its search for cultural and political subversives to Florida’s universities. Taking advantage of the so-called Lavender Scare, a nationwide panic over homosexuality and its alleged affinities with communism, the Johns Committee reframed their anti-civil rights crusade as an attempt to root out homosexuality from Florida schools and public life.24
The Johns Committee found a useful ally in September 1961 when Jane Stockton Smith of Tampa complained to the Dean of the College of Basic Studies at the University of South Florida (USF) that the textbooks her son, Skipper, brought home were anti-religious and emphasized sex and evolution. In an interview that year she identified several specific books that she found obscene, communistic, and “one-sided,” including John Steinbeck’s The Grapes of Wrath and The True Believer by philosopher Eric Hoffer, a book about mass movements that challenge the status quo.25 Smith was instrumental in bringing a Johns Committee investigation to USF. She participated in what the Tampa Tribune described as an “unorganized parents’ group” alarmed by what Smith claimed were “reams of evidence to concern every citizen” and USF course readings that she alleged to be pornographic and anti-religious. USF students were reportedly questioned about “political ideologies expressed on campus and about sex predominance in required reading assignments.”26
Fig. 6. From left are: Mrs. Stockton Smith; Mark Hawes, attorney for the Johns Committee; Fla. Rep. William G. O’Neill; and Fla. Sen. Charley Johns, who headed the committee. “Funds and future for Johns Committee gets approval,” May 7, 1963. Courtesy of the State Archives of Florida, Florida Memory.
In 2011, students taking a documentary film class at the University of Central Florida (UCF) created a documentary about the Johns Committee. Simply titled “The Committee,” the film features interviews with two North Florida survivors of, and one investigator from, the Committee’s attempt to root out LGBTQ teachers and students from state universities. In their research, they found that over 200 students and teachers were expelled or fired from Florida universities as a result of the Committee’s anti-LGBTQ efforts.27 After receiving permission from Lisa Mills, one of the course instructors, I included this award-winning documentary in my exhibit, played on a loop through a computer monitor positioned on a small table below the related images of documents and newspaper clippings. As of this writing, “The Committee” remains available for streaming online via PBS.
Florida libraries could also be complicit in the Lavender Scare by censoring materials in their own collections. A letter written in 1960 to the then-Director of FSU Libraries, Orwin Rush, asked that an intern working with the FSU Graduate School’s Institute of Human Development be permitted to check out some books on homosexuality and clinical problems related to his work, which were held “under restriction” at the library.28 (Fig. 7) I did not find archival evidence that described the exact form of these access restrictions, but this kind of “protective storage” may still exist in some libraries. My colleague at FSU Libraries, Norman “Trip” Wyckoff, confirmed that Strozier Library still had a locked cabinet of restricted books in the early 2000s, though at the time of my exhibit it no longer existed and it was unclear when it had been removed. I remembered a similar locked cabinet at Tulane University’s Howard-Tilton Memorial Library, my previous place of employment, which in the 2010s included, among other titles, the coffee table book Sex, by Madonna, and a book about foraging for psychedelic mushrooms.
Fig. 7. Wallace A. Kennedy to Orwin N. Rush, January 6, 1960. Florida State University Library Records, HUA 2020-006. Courtesy of the FSU Special Collections & Archives.
Echoes of the Past in 21st-Century Book Banning
The exhibit ended with the Johns Committee’s anti-LGBTQ crusade, but the parallels between today’s wave of book bans and Jane Stockton Smith’s “parents group” and its alliance with the Johns Committee are easy to see. While today’s Florida State Legislature has been an active participant in censorship, the 501(c)(4) nonprofit organization, Moms for Liberty, has also played a key role. As an organization, Moms for Liberty has made book challenges one of their signature issues, providing resources on their website such as book reviews of disfavored titles and lists of their own approved book publishers, seeming to take a page directly from the playbook of the United Daughters of the Confederacy. In addition to these resources, Moms for Liberty links members to the Leadership Institute’s “School Board Activism” training course, “designed to equip conservative leaders with tools and tactics to influence education,” alongside TurningPoint USA’s School Board Watch List, which purports to identify school board members who “support anti-American, radical, hateful, immoral, and racist teachings in their districts.”29 Moms for Liberty also has ties to the Republican Party, evidenced by both the activities of the group’s founders and by the list of speakers that have attended their conferences, including Florida Governor Ron DeSantis, and their endorsement of Donald Trump in 2024.30 As in the 1960s, local activists today have found support from powerful political players in the state and at the national level, fueling a coordinated effort to exert power over their communities by controlling access to reading materials and deciding which stories can be told in books and in school curricula.
It may be too soon to apply historical analysis to Moms for Liberty and the current book banning movement. However, when compared to my exhibit’s three major episodes of book banning in Florida history, we might conclude that the reactionary forces of the 2020s are responding to a similar social crisis and are seizing an opportunity to assert control. In 2020, the COVID-19 pandemic and the Black Lives Matter movement destabilized the ideological status quo of the nation, peeling back the veneer of multiculturalism to reveal persistent and often deadly racial disparities and discrimination in American society, especially in health care, labor conditions, and in the criminal justice system. The wave of diversity, equity, and inclusion (DEI) initiatives that followed in corporate and educational settings, however superficial they may have been, may have signaled to some conservative forces the opening of a new front in the ongoing culture wars. Florida Governor Ron DeSantis was among the first political leaders to reframe COVID safety precautions and DEI trainings as infringements of individual liberties.31 Moms for Liberty amplified this message with demands that their individual parental rights be allowed to override pro-social institutional rules like school vaccination requirements and masking in classrooms. The policy solutions coming from these reactionary forces go well beyond banning books and suggest a broader ideological motivation. For example, at the same time that the state legislature imposes content restrictions on teachers and librarians in public schools and libraries, and as the book banners demonize librarians and teachers as groomers and pornographers, they are also in league with one another to expand school voucher programs to shift more public funds into private schools in the name of individual choice and personalized learning.32 Is the contemporary book banning movement part of an effort to delegitimize and defund public education and public libraries altogether? These preliminary ideas require further study, but it does seem clear that Moms for Liberty and their political allies are using the same tactics as Florida book banners of the past in order to control cultural institutions like schools and libraries and to impose their ideology on the rest of civil society.
Reflection
Looking back, I think my first solo exhibit was successful in examining the motivations of Florida’s book banners and revealing the continuities of censors’ goals throughout Florida history. In this regard, the director of the History Department’s public history program was very complimentary of the exhibit design and content. Student engagement, on the other hand, was somewhat limited. The location wasn’t ideal for attracting attention; that space in the library wasn’t getting a lot of traffic, in part because of service changes to the Starbucks café in the library. I recorded only about a dozen visits through the QR codes, most of which came from the link to the prison book ban database. The location also made conducting observations in the space impractical, so I was not able to see how passers-by engaged with the exhibit. Had the exhibit been in view of the circulation desk, it might have been easier to unobtrusively see if students were stopping to engage with the content. That said, the exhibit was a hit with some of the faculty and I was asked to integrate both the exhibit and its subject matter into information literacy sessions for three different courses. Thus, the exhibit did serve as a vehicle for outreach to teaching faculty and generated three instructional opportunities.
In November 2023, I relocated and took a new position as Head of Research & Instruction at the Monroe Library of Loyola University New Orleans. When we celebrated Banned Books Week at Loyola in September 2024, I was able to draw upon my experience with the “Against Liberty” exhibit. The disruption of Hurricane Francine shortened our planning time, so we were not able to mount a full-scale historical exhibit similar to “Against Liberty,” but we augmented a traditional display of commonly banned books in the library lobby with a few examples of historically challenged books and accompanying exhibit labels for context. In addition to some examples I reused from the FSU exhibit, I added Monroe Library’s copy of the Index Librorum Prohibitorum, which I found gathering dust in the stacks. Our copy was published in 1948, the last edition printed. The Church formally abolished the Index in 1966, though not as a move to exonerate the forbidden books, but rather to shift responsibility to individual readers to avoid immoral reading.33
Once again, the historical content seemed to create more opportunities for engagement with faculty than with students. The more successful initiative with students was a pop-up trivia table set up in the quad outside the library building, an initiative of Dr. Julia Miller in our Teacher Education program. The spread of commonly challenged children’s and young adult fiction on the table attracted plenty of attention from students walking by who recognized some of their favorite books. Once engaged, we asked them trivia questions about challenged books and music, largely drawing examples from the last 50 years. Students were surprised at some of the books on the table and examples from the trivia questions.
In conclusion, the success of my exhibit at FSU and the more recent activities at Loyola depended on the audience. I will include historical context in future banned books exhibits, but I am also convinced that this needs to be paired with some other interactive modes of engagement to capture students’ attention and start a conversation. While the standard Banned Books Week exhibit may draw attention to individual titles that students recognize, librarians and teachers must find ways to engage students in questions about who bans books and why. By recognizing the motivations of censors, students and educators together may more effectively resist attempts to limit access to information and a quality education. Understanding the historical context of book banning is also critical for librarians. Our work as educators and culture bearers will always be implicated in struggles for cultural dominance, and we must be prepared to defend ourselves and our values in this and future culture wars.34 We must find ways to refuse the authoritarian agenda of the book banners. Recognizing the political agendas and methods of potential book banners will help us to create library policies and procedures that remain responsive to our local communities while discouraging coordinated, bad faith assaults on our staff, collections, and the freedom to read.
Acknowledgments
Many thanks to my publishing editor, Jaena Rae Cabrera, and my peer reviewers, Ian Beilin and Niki Fullmer. This project was substantially improved in response to their insightful feedback. Special thanks to Dr. Laura McTighe, whose collaboration was critical to starting the exhibit project, and to Mimi Bilodeau, who found the “defaced” book that sparked so much inspiration. Thanks to history professors Dr. Katherine Mooney, for her input on the exhibit’s historical content, and Dr. Jennifer Koslow, for her expertise in exhibit design. I also want to acknowledge everyone at FSU Libraries who helped me assemble the original exhibit at FSU: Rachel Duke, Emory Gerlock, Devon McWhorter, Laura Pellini, and Dan “Brew” Schoonover. Special thanks to my wife, Sarah Withers, who inspires me every day and also helped me during final installation of the exhibit.
References
Periodicals
AP News
Confederate Veteran
The Floridian
Newsweek
Orlando Sentinel
Politico
South Santa Rosa News
Tallahassee Democrat
Tampa Bay Times
Tampa Tribune
Manuscripts
Florida State University Library Records, HUA 2020-006. FSU Special Collections & Archives, Tallahassee, Florida.
John W. Egerton Papers, 1961-1965, MS-1965-03. University of South Florida Libraries, Special Collections, Tampa, Florida.
Territorial Legislative Council Records, 1822-1845. State Archives of Florida, Florida Memory. Tallahassee, Florida.
Bailey, Fred Arthur. “Free Speech at the University of Florida: The Enoch Marvin Banks Case.” The Florida Historical Quarterly 71, no. 1 (1992): 1–17.
———. “The Textbooks of the ‘Lost Cause’: Censorship and the Creation of Southern State Histories.” The Georgia Historical Quarterly 75, no. 3 (1991): 507–33.
Blight, David W. Race and Reunion: The Civil War in American Memory. Cambridge, Mass: Belknap Press of Harvard University Press, 2001.
Braukman, Stacy Lorraine. Communists and Perverts under the Palms the Johns Committee in Florida, 1956-1965. Gainesville: University Press of Florida, 2012.
Brevard, Caroline Mays. A History of Florida, by Caroline Mays Brevard, with Questions, Supplementary Chapters and an Outline of Florida Civil Government by H. E. Bennett. New York: American Book Company, 1904.
Burkholder, Joel M., Russell A. Hall, and Kat Phillips. “Manufactured Panic, Real Consequences: Why Academic Librarians Must Stand with Public and School Libraries.” College & Research Libraries News 85, no. 6 (June 7, 2024): 254-57. https://doi.org/10.5860/crln.85.6.254.
Cox, Karen L. Dixie’s Daughters: The United Daughters of the Confederacy and the Preservation of Confederate Culture. Gainesville: University Press of Florida, 2003.
Crockett, Hasan. “The Incendiary Pamphlet: David Walker’s Appeal in Georgia.” The Journal of Negro History 86, no. 3 (July 2001): 305–18. https://doi.org/10.2307/1562449.
Davis, David Brion. Inhuman Bondage: The Rise and Fall of Slavery in the New World. New York: Oxford University Press, 2006.
Friedman, Jonathan, Tasslyn Magnusson, and Sabrina Baêta. “Banned in the USA: The Growing Movement to Censor Books in Schools.” PEN America, September 19, 2022, https://pen.org/report/banned-usa-growing-movement-to-censor-books-in-schools/.
Giroux, Henry A. “Educators as Public Intellectuals and the Challenge of Fascism.” Policy Futures in Education 22, no. 8 (November 1, 2024): 1533–39. https://doi.org/10.1177/14782103241226844.
Graves, Karen. And They Were Wonderful Teachers: Florida’s Purge of Gay and Lesbian Teachers. Urbana: University of Illinois Press, 2009.
Hinks, Peter P. To Awaken My Afflicted Brethren: David Walker and the Problem of Antebellum Slave Resistance. University Park, Pa: Pennsylvania State University Press, 1997.
Knox, Emily J. M. Book Banning in 21st-Century America. Blue Ridge Summit: Rowman & Littlefield, 2015.
Lenard, Max. “On the Origin, Development and Demise of the Index Librorum Prohibitorum.” Journal of Access Services 3, no. 4 (July 26, 2006): 51–63. https://doi.org/10.1300/J204v03n04_05.
Meehan, Kasey, Jonathan Friedman, Sabrian Baêta, and Tasslyn Magnusson. “Banned in the USA: The Mounting Pressure to Censor.” PEN America, September 1, 2023, https://pen.org/report/book-bans-pressure-to-censor/.
Meehan, Kasey, Sabrina Baêta, Tasslyn Magnusson, and Madison Markham. “Banned in the USA: Beyond the Shelves.” PEN America, November 1, 2024. https://pen.org/report/beyond-the-shelves/.
Paulus, Carl Lawrence. The Slaveholding Crisis: Fear of Insurrection and the Coming of the Civil War. Baton Rouge, La.: LSU Press, 2017.
Poucher, Judith G. State of Defiance: Challenging the Johns Committee’s Assault on Civil Liberties. Gainesville, Florida: University Press of Florida, 2014.
Schoeppner, Michael A. Moral Contagion: Black Atlantic Sailors, Citizenship, and Diplomacy in Antebellum America. Studies in Legal History. Cambridge: Cambridge University Press, 2019. https://doi.org/10.1017/9781108695404.
Scott, Julius Sherrard. The Common Wind: Afro-American Currents in the Age of the Haitian Revolution. London: Verso, 2018.
Serrell, Beverly. Exhibit Labels: An Interpretive Approach. Second edition. Lanham, Maryland: Rowman & Littlefield, 2015.
Smith, John David, and J. Vincent Lowery. The Dunning School: Historians, Race, and the Meaning of Reconstruction. Lexington: University Press of Kentucky, 2013.
Vose, Robin J. E. The Index of Prohibited Books: Four Centuries of Struggle over Word and Image for the Greater Glory of God. London: Reaktion Books, 2022.
Walker, David. Walker’s Appeal, in Four Articles : Together with a Preamble, to the Colored Citizens of the World, but in Particular, and Very Expressly to Those of the United States of America. Written in Boston, in the State of Massachusetts, Sept. 28th, 1829. Second edition. Boston: David Walker, 1830. https://hdl.handle.net/2027/mdp.69015000003166.
Kasey Meehan et al., “Banned in the USA: Beyond the Shelves” (PEN America, November 1, 2024), https://pen.org/report/beyond-the-shelves/. ︎
Ian Hodgson, “Florida schools received roughly 1,100 complaints, but about 600 came from one dad and one teacher,” Tampa Bay Times, August 27, 2023; Romi White, “New Legislation Will Help Local Moms for Liberty More Quickly Remove Pornographic Material from Schools,” South Santa Rosa News, May 31, 2023. ︎
Emily J. M. Knox, Book Banning in 21st-Century America (Blue Ridge Summit: Rowman & Littlefield, 2015), vii. ︎
Katherine Fung, “In Florida, Trump Sees Model for National Education Policies,” Newsweek, November 19, 2024. https://www.newsweek.com/trump-desantis-florida-education-1987835. ︎
Burkholder et al. made a similar observation in their own call for academic librarians to support our colleagues in public and school libraries. See Joel M. Burkholder, Russell A. Hall, and Kat Phillips, “Manufactured Panic, Real Consequences: Why Academic Librarians Must Stand with Public and School Libraries,” College & Research Libraries News 85, no. 6 (June 7, 2024): 254-57. https://doi.org/10.5860/crln.85.6.254. ︎
John David Smith and J. Vincent Lowery, The Dunning School: Historians, Race, and the Meaning of Reconstruction (Lexington: University Press of Kentucky, 2013). ︎
For more on effective exhibit labels, see Beverly Serrell, Exhibit Labels: An Interpretive Approach, Second edition. (Lanham, Maryland: Rowman & Littlefield, 2015). ︎
Carl Lawrence Paulus, The Slaveholding Crisis: Fear of Insurrection and the Coming of the Civil War (Baton Rouge, La.: LSU Press, 2017); Michael A. Schoeppner, Moral Contagion: Black Atlantic Sailors, Citizenship, and Diplomacy in Antebellum America, Studies in Legal History (Cambridge: Cambridge University Press, 2019), https://doi.org/10.1017/9781108695404. Enslavers’ fears were not unfounded as news and ideas did circulate among enslaved communities across the Atlantic world through a variety of informal communication networks. See, for example, Julius Sherrard Scott, The Common Wind: Afro-American Currents in the Age of the Haitian Revolution (London ; Verso, 2018). ︎
The literature on nineteenth-century abolition movements is vast. Two classic works that most inform my interpretation are Eric Williams, Capitalism & Slavery (Chapel Hill: University of North Carolina Press, 1994 [1944]); and David Brion Davis, Inhuman Bondage: The Rise and Fall of Slavery in the New World (New York: Oxford University Press, 2006). ︎
David Walker, Walker’s Appeal, in Four Articles : Together with a Preamble, to the Colored Citizens of the World, but in Particular, and Very Expressly to Those of the United States of America. Written in Boston, in the State of Massachusetts, Sept. 28th, 1829 , second edition (Boston: David Walker, 1830), https://hdl.handle.net/2027/mdp.69015000003166; For more on Walker’s Appeal and its impacts, see Peter P. Hinks, To Awaken My Afflicted Brethren: David Walker and the Problem of Antebellum Slave Resistance (University Park, Pa: Pennsylvania State University Press, 1997); Hasan Crockett, “The Incendiary Pamphlet: David Walker’s Appeal in Georgia,” The Journal of Negro History 86, no. 3 (July 2001): 305–18, https://doi.org/10.2307/1562449. ︎
“Meeting at Shell Point,” The Floridian (September 26, 1835): 2. ︎
James Call, “Banned behind Bars: 20,000 Books Can’t Be Read by Florida Inmates; the List May Surprise You,” Tallahassee Democrat, August 11, 2019, https://www.tallahassee.com/story/news/politics/2019/08/09/banned-behind-bars-20-000-books-cant-read-florida-inmates/1934468001/. ︎
Florida Department of Corrections, “About the Florida Department of Corrections,” https://fdc.myflorida.com/about.html. Accessed January 7, 2025. ︎
Fred Arthur Bailey, “The Textbooks of the ‘Lost Cause’: Censorship and the Creation of Southern State Histories,” The Georgia Historical Quarterly 75, no. 3 (1991): 507–33; David W. Blight, Race and Reunion: The Civil War in American Memory (Cambridge, Mass: Belknap Press of Harvard University Press, 2001). ︎
Karen L. Cox, Dixie’s Daughters: The United Daughters of the Confederacy and the Preservation of Confederate Culture (Gainesville: University Press of Florida, 2003). ︎
Fred Arthur Bailey, “Free Speech at the University of Florida: The Enoch Marvin Banks Case,” The Florida Historical Quarterly 71, no. 1 (1992): 1–17. ︎
Florida Department of Education, State Board of Education, “Required Instruction Planning and Reporting,” Florida Administrative Code Rule 6A-1.094124, https://www.flrules.org/gateway/ruleno.asp?id=6A-1.094124. ︎
The extent and severity of the Johns Committee’s activities were revealed in 1993 when the legislative records were released to the public. David Barstow, “Secrets of State’s Search for ‘subversives’ Revealed.” Tampa Bay Times, July 2, 1993. See also Seth Weitz, “Campus of Evil: The Johns Committee’s Investigation of the University of South Florida,” Tampa Bay History 22, no. 1 (January 1, 2008), https://digitalcommons.usf.edu/tampabayhistory/vol22/iss1/5; Karen Graves, And They Were Wonderful Teachers: Florida’s Purge of Gay and Lesbian Teachers (Urbana: University of Illinois Press, 2009); Stacy Lorraine Braukman, Communists and Perverts under the Palms the Johns Committee in Florida, 1956-1965 (Gainesville: University Press of Florida, 2012); Judith G. Poucher, State of Defiance: Challenging the Johns Committee’s Assault on Civil Liberties (Gainesville, Florida: University Press of Florida, 2014). ︎
“An Open Interview with Mrs. S____,” John W. Egerton Papers, 1961-1965, MS-1965-03, Box 1, Folder 9. University of South Florida Libraries, Special Collections, Tampa, Florida. ︎
Steve Raymond, “USF Probe Broadens, Investigators Still Mum,” Tampa Tribune (19 May 1962):A1. ︎
Wallace A. Kennedy to Orwin N. Rush, January 6, 1960, Florida State University Library Records, HUA 2020-006, Permanent Files, 1958-1963 L-Z, Box 11, Folder “Miscellaneous,” FSU Special Collections & Archives, Tallahassee, Florida. ︎
Kathryn Varn, “DeSantis to conservative Moms for Liberty: ‘You gotta stand up, and you gotta fight,’” Tallahassee Democrat, July 15, 2022; Ali Swenson, “Moms for Liberty rises as power player in GOP politics after attacking schools over gender, race,” AP News, June 11, 2023, https://apnews.com/article/moms-for-liberty-2024-election-republican-candidates-f46500e0e17761a7e6a3c02b61a3d229; Ali Swenson, Moriah Balingit, and Ayanna Alexander, “Moms for Liberty Fully Embraces Donald Trump as Election Nears,” AP News, September 3, 2024, https://apnews.com/article/moms-for-liberty-trump-2024-election-harris-7c252c611b5bc73c333a24392b979372. ︎
John Kennedy, “A Defiant Florida Gov. Ron DeSantis Opens Legislative Session Touting Florida as ‘Free,’” Tallahassee Democrat, January 11, 2022, https://www.tallahassee.com/story/news/local/state/2022/01/11/ron-desantis-declares-florida-free-state-speech-attacks-biden-policies/9171715002/; Megan Messerly, Krista Mahr, and Arek Sarkissian, “DeSantis Is Championing Medical Freedom. GOP State Lawmakers like What They See,” POLITICO, March 1, 2023, https://www.politico.com/news/2023/03/01/desantis-medical-freedom-gop-florida-00084842. ︎
Executive Office of the Governor of Florida. “Governor Ron DeSantis Announces School Choice Success,” Executive Office of the Governor, Newsroom, January 10, 2025, https://www.flgov.com/eog/news/press/2025/governor-ron-desantis-announces-school-choice-success; Annie Martin and Leslie Postal, “Vouchers for All How Florida Law Is Supercharging School Choice Vouchers Vouchers Wealthy Families, Pricey Schools Reap Millions in Tax Funds,” Orlando Sentinel, February 16, 2025. ︎
For more on the history of the Index, see Max Lenard, “On the Origin, Development and Demise of the Index Librorum Prohibitorum,” Journal of Access Services 3, no. 4 (July 26, 2006): 51–63, https://doi.org/10.1300/J204v03n04_05; Robin J. E. Vose, The Index of Prohibited Books: Four Centuries of Struggle over Word and Image for the Greater Glory of God (London: Reaktion Books, 2022). ︎
Henry A. Giroux, “Educators as Public Intellectuals and the Challenge of Fascism,” Policy Futures in Education 22, no. 8 (November 1, 2024): 1533–39. https://doi.org/10.1177/14782103241226844. ︎