I’m not aware of resuscitating anything, really. I just enjoy playing
and finding new ways to surprise myself. It all happens organically.
There’s not a lot of strategizing that goes into it. There’s no plan. I
love the history of the guitar and of guitar players, so it’s a
conversation I want to be part of.
We verify, to the best of our abilities, that people who say they’re
Palestinians from Gaza are who they say they are by holding video
conversations with them.
Beyond that, we’re in touch with them daily and trying to help them
raise funds to stay alive.
Offsetting is a worldview that insists that problems created by capital
accumulation can only be resolved by further capital accumulation. On a
finite planet, the search for new forms of capital begins to resemble a
hall of mirrors. A tree, for example, becomes a fractal, recursive
commodity, with hidden dimensions of value waiting to be uncurled, like
hypothetical extra dimensions of space and time. Offsetting is a product
of last-stage, recursive, data-driven capitalism, and its brilliance is
that it allows any human activity to be commodified by means of
measurement, data collection, and computation.
The FBI is attempting to unmask the owner behind archive.today, a
popular archiving site that is also regularly used to bypass paywalls on
the internet and to avoid sending traffic to the original publishers of
web content, according to a subpoena posted by the website. The FBI
subpoena says it is part of a criminal investigation, though it does not
provide any details about what alleged crime is being investigated.
Archive.today is also popularly known by several of its mirrors,
including archive.is and archive.ph.
On stage later, beneath an array of a dozen stag skulls, Krieger was
slinking around like a ghostly Indigenous Jim Morrison, windmilling his
waist-length hair, blasting life force through a cedar flute and leaning
over the crowd to shriek upward-arcing shards — whose strength never
flagged during his entire blistering set — as his guitarist crowned his
howls with constellations of tremolo picking. In the mosh pit, one fan
hoisted a buffalo rib the size of a baseball bat, presumably from the
barbecue, like some feral invocation. Blackbraid’s performance may have
converted Daniels, the photographer, to solid metal fandom. But after
Converge, he stood by me and said, “Now I get it.”
Most individual carbon footprint calculators only factor in consumption.
How many plane trips have you taken this year? How many car miles have
you driven? And so on. Numerous studies and papers have drawn out the
relationship between individual or household consumption and emissions.
On the other hand, far fewer studies have attempted to calculate
emissions based on income.
Is it possible to estimate individual or household emissions based on
income alone, without any consideration for consumption choices?
What follows is my attempt to come to grips as an adult with a
foundational work of theory I’d bought and first tried to get my head
around in late adolescence: Gilles Deleuze and Félix Guattari’s
“Nomadology: The War Machine.”
It is often not feasible to find thousands of real-world samples for
every file type. To overcome this, we turned to generative AI. We
leveraged Gemini to create a high-quality, synthetic training set by
translating existing code and other structured files from one format to
another.
Watching movies and TV is a big part of anyone’s French language
journey. Unfortunately, good films are not always easy to find, but I
have the solution for you: 6 French movies to improve your French, all
available on Netflix and Amazon Prime Video.
DuckLakes typically consist of two components: the catalog database and
the storage. Obviously, there is no way around having some sort of
storage to represent the data. But can we host a DuckLake without
setting up a catalog database? In this blog post, we’ll show how you can
create a read-only cloud-based DuckLake without a database server. We
call this a “Frozen DuckLake” because it is read-only and has no moving
parts other than a cloud storage system.
To resolve the fundamental problems of the existing lakehouse
architecture, we have created a new open table format called DuckLake.
DuckLake re-imagines what a “lakehouse” format should look like by
acknowledging two simple truths:
Storing data files in open formats on blob storage is a great idea for
scalability and to prevent lock-in.
Managing metadata is a complex and interconnected data management task
best left to a database management system.
In this respect, for the greater part of human history, religion and
science were clearly just two dimensions of one and the same overall
human enterprise. And if we want to find a single force or entity that
set us on this path, we could hardly do better than to point to the
Moon.
pg_lake integrates Iceberg and data lake files into Postgres. With the
pg_lake extensions, you can use Postgres as a stand-alone lakehouse
system that supports transactions and fast queries on Iceberg tables,
and can directly work with raw data files in object stores like S3.
This article examines the expansion of generative artificial
intelligence (genAI) into higher education. We argue that genAI’s
incursion into university systems represents an effort by Silicon Valley
to capture lucrative new markets for their products, as well as enhanced
credibility by association. Drawing on our experience as faculty in the
California State University (CSU) system—the first university system to
contract with OpenAI to provide ChatGPT Edu system-wide—we assess the
rhetoric that justifies and legitimizes genAI contracts in higher
education. We suggest that the uncritical adoption of genAI in higher
education poses problems for labor conditions, the integrity of
intellectual property, and student learning on campuses, particularly
under the conditions of austerity that are commonly found in public
universities.
When your country pursues abhorrent policies, when the face it turns to
the world is the face of a monster, what does that say about you? In my
experience, it is strikingly easy to shrug off one’s responsibility for
the country where one pays taxes, contributes to the public conversation
and, at least nominally, has the right to vote, if that country is the
United States. It seems one can just say “Not in my name” and continue
to enjoy the wealth and the freedom of movement one’s citizenship
confers. But as this country builds more cages for immigrants, deploys
military force against civilians in city after city, regularly commits
murder on the high seas and systematically destroys its own democratic
institutions, that may change. It should change. What does one do then?
How can one be a good citizen of a bad state?
A “weak” password could become one of the symbols of the Louvre’s
security flaws. According to 2014 documents obtained by Libération,
accessing the museum’s video surveillance server required typing the
all-too-obvious word: LOUVRE.
The data exchange standard for archiving (SEDA) aims to facilitate
interoperability between an archives service’s information system and
the information systems of its partners in the context of their data
exchanges.
ProfileGenAPP is a visual editor for creating SEDA 2.1+ archiving
profiles in RelaxNG format. This documentation presents the main
features available in the user interface.
Under the banner of “patriotic tech”, this new bloc is building the
infrastructure of control—clouds, AI, finance, drones, satellites—an
integrated system we call the Authoritarian Stack. It is faster,
ideological, and fully privatized: a regime where corporate boards, not
public law, set the rules.
My biggest complaint about the launch of the ChatGPT Atlas browser the
other day was the lack of details on how OpenAI are addressing prompt
injection attacks. The launch post mostly punted that question to the
System Card for their “ChatGPT agent” browser automation feature from
July. Since this was my single biggest question about Atlas I was
disappointed not to see it addressed more directly.
The BitTorrent protocol is a p2p protocol for distributing data
described as a series of hashes and file metadata contained in .torrent
files. This FEP describes a JSON-LD representation of .torrent files as
an extension of an ActivityStreams Object.
Taking inspiration from (2001?), this account is
slowly posting frames from the 2012 art installation project
“Logistics”.
The 37-day long road movie follows a package from Sweden to China in
real-time. This bot will post every 3,000th frame (roughly 2 minutes
apart), one per hour, aiming to cover one day of the journey every 30
days of real time.
More info on the original project here: https://logisticsartproject.com/
arXiv’s computer science (CS) category has updated its moderation
practice with respect to review (or survey) articles and position
papers. Before being considered for submission to arXiv’s CS category,
review articles and position papers must now be accepted at a journal or
a conference and complete successful peer review. When submitting review
articles or position papers, authors must include documentation of
successful peer review to receive full consideration. Review/survey
articles or position papers submitted to arXiv without this
documentation are likely to be rejected and will not appear on arXiv.
A warning about how AI chat tools—like ChatGPT, Claude, Gemini, and
others—can become dangerously addictive in a specific way. Not like
social media addiction. Something harder to spot.
Yet the boat was a refuge. We had wandered in search of somewhere to
settle — or, more precisely, somewhere we would be allowed to settle —
for four years, moving between whatever cheap, temporary accommodation
we could borrow or afford to rent in seven countries, until this unkempt
but graceful vessel suggested a possibility of shelter, of ‘fixedness’,
even as we were compelled by circumstances to remain mobile, unrooted.
Part history, part fantasy. Hakim Bey is the nom de plume of Peter
Lamborn Wilson, who led an interesting life:
https://en.wikipedia.org/wiki/Peter_Lamborn_Wilson
Imagine you are training a machine learning (ML) model to identify dog breeds from images. You have thousands of photos of Golden Retrievers, Poodles, and Dachshunds, but the model cannot distinguish a Chihuahua from a Bulldog without labeled data. This is where data labeling becomes essential. Human annotators tag each image so the ML model can learn to recognize patterns. However, manually labeling thousands of images is time-consuming and resource-intensive. A more efficient approach is to distribute the work among many people online through crowdsourcing, which is the process of outsourcing tasks to a large group of people, often over the internet, instead of relying on a single employee or a small team. In machine learning, crowdsourcing is commonly used for data labeling, the process of annotating raw data such as images, text, or videos to make it suitable for training ML models.
Amazon SageMaker Ground Truth is a managed service that simplifies data labeling by integrating crowdsourcing directly into the ML workflow. Instead of manually managing workers, tracking progress, and consolidating labels, Ground Truth automates much of the process. Key features of Amazon SageMaker Ground Truth include:
Built-in workforce options – Use your own team (private workforce), Amazon Mechanical Turk (MTurk) workers, or third-party vendors.
Pre-built labeling templates – Supports common tasks like image classification, object detection, text sentiment analysis, and more.
Automated quality control – Uses techniques such as majority voting to improve label accuracy.
Setting Up for Crowdsourcing with SageMaker Ground Truth
Before launching into crowdsourcing with SageMaker Ground Truth, you need to complete a few prerequisites. Two key components are deciding where your data will reside (Amazon S3) and defining access permissions (IAM roles).
S3 Buckets for Storage
Amazon S3 (Simple Storage Service) is basically where you keep your files within an AWS working environment. In our scenario this includes images, text data, video files, and any other file format your ML model needs to learn from. SageMaker Ground Truth pulls input data from S3 and pushes the labeled output back into it.
To make this process easier to follow, I’ll be using examples from a project we recently worked on, where we labeled citation contexts into one of three reproducibility-oriented sentiment categories using SageMaker Ground Truth. For our project, we followed the steps below in S3.
Create an S3 bucket for the project. After the bucket is created, upload the input files (to prepare the input files, follow the instructions in this tutorial series) into the “input/” folder.
For our project, we had all the citation contexts organized with one context per row in a CSV file, without any headers.
Folder structure for the bucket created in S3 for our labeling task
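If you prefer working from the command line, the same S3 setup can be done with the AWS CLI. This is a minimal sketch, assuming a hypothetical bucket name (citation-labeling-bucket) and the folder layout described above:
# create the bucket for the labeling project (the name is illustrative)
aws s3 mb s3://citation-labeling-bucket
# upload the headerless CSV of citation contexts into the input/ folder
aws s3 cp citation_contexts.csv s3://citation-labeling-bucket/input/citation_contexts.csv
# Ground Truth will later write the labeled results under output/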
IAM Roles for Permissions
Next, create an IAM role that SageMaker Ground Truth can assume to access your bucket:
Select “AWS Service” as the trusted entity and choose “SageMaker” as the service.
Attach policies (choose based on your use case).
Use a uniquely identifiable name for the role.
Created role and attached policies for our project
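For reference, a rough AWS CLI equivalent of the role setup might look like the following; the role name is a placeholder, and AmazonSageMakerFullAccess is a broad managed policy, so scope the permissions down to match your own use case:
# create a role that SageMaker is allowed to assume
aws iam create-role --role-name GroundTruthLabelingRole \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"sagemaker.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
# attach a managed policy (broad; narrow it for production use)
aws iam attach-role-policy --role-name GroundTruthLabelingRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess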
Selecting the Right Workforce
Once your data is ready and permissions are set up, the next question is: who’s actually going to label your data? SageMaker Ground Truth provides three options: use your own team (private workforce), outsource to a professional vendor (vendor workforce), or crowdsource through MTurk. The choice depends on your budget, task complexity, and how much control you want.
A “private workforce” is basically your internal team. You create an access portal where only approved users can log in and complete labeling tasks. This is useful when you need confidentiality or when domain expertise is required such as medical images or legal text.
The “vendor workforce” is for when you want a managed, professional service to handle labeling. These vendors are pre-vetted by AWS and often used for complex or large-scale projects.
Amazon Mechanical Turk (MTurk) connects researchers and organizations to a global pool of crowd workers who can complete small tasks, such as labeling images or text, for a small fee per task. It is an effective platform because it allows easy scalability, enabling hundreds or even thousands of labelers to participate simultaneously. MTurk is particularly suitable for simple tasks and is often more cost-effective than hiring a dedicated labeling team. For our project, we chose MTurk as the workforce when creating the labeling job.
Selecting MTurk as the workforce
Designing Labeling Tasks
After selecting the workforce, the next step is deciding what kind of labeling task you will be doing. Currently, SageMaker Ground Truth offers 14 built-in templates for common tasks, categorized into four input types: images, video, text, and 3D point clouds. It also allows you to create custom workflows for more specific needs.
Custom labeling UI created using the crowd elements
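As a rough illustration of what such a custom template can look like, the sketch below uses the standard crowd HTML elements; the category names and the Liquid input variable are placeholders, and the exact variable you reference depends on how your pre-processing step exposes each data object:
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-classifier
    name="sentiment"
    categories="['Category A', 'Category B', 'Category C']"
    header="Select the category that best describes this citation context">
    <classification-target>{{ task.input.taskObject }}</classification-target>
    <short-instructions>Read the citation context and choose one category.</short-instructions>
    <full-instructions header="Instructions">Add detailed guidance and examples for workers here.</full-instructions>
  </crowd-classifier>
</crowd-form>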
Creating a Labeling Job
Once the input data, roles, and configurations are ready as described in the previous sections, you can create a labeling job directly in the SageMaker console by following a few straightforward steps.
Sign in to the SageMaker console and open the Ground Truth Labeling jobs section.
Choose “Create labeling job” and provide a job name.
Set up input and output locations in Amazon S3 and assign an IAM role with the required permissions.
Select the task type and category that matches your data (text, image, video, or point cloud).
Choose your workforce (MTurk, private, or vendor) and, if needed, adjust worker settings such as timeouts and the number of workers per task.
Configure the worker interface by providing instructions and label categories, or upload a custom template using crowd elements if required.
Review the setup and click “Create” to start the labeling job.
Once created, SageMaker Ground Truth distributes your data to selected workers, collects annotations, and stores the labeled output in your S3 bucket. From there, you can monitor progress and review results through the “Labeling jobs” console.
SageMaker Ground Truth “Labeling Jobs” Console
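The console walkthrough above can also be scripted. As a sketch (all names, paths, and the price are placeholders, and the PRE/ACS Lambda ARNs refer to the region-specific built-in text-classification functions listed in the AWS documentation), the same job could be created with the AWS CLI:
aws sagemaker create-labeling-job --cli-input-json file://labeling-job.json
where labeling-job.json looks roughly like:
{
  "LabelingJobName": "citation-context-sentiment",
  "LabelAttributeName": "sentiment",
  "InputConfig": {
    "DataSource": {
      "S3DataSource": { "ManifestS3Uri": "s3://citation-labeling-bucket/input/input.manifest" }
    },
    "DataAttributes": {
      "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
    }
  },
  "OutputConfig": { "S3OutputPath": "s3://citation-labeling-bucket/output/" },
  "RoleArn": "arn:aws:iam::123456789012:role/GroundTruthLabelingRole",
  "HumanTaskConfig": {
    "WorkteamArn": "<the public-crowd (MTurk) workteam ARN for your region>",
    "UiConfig": { "UiTemplateS3Uri": "s3://citation-labeling-bucket/templates/instructions.template" },
    "PreHumanTaskLambdaArn": "<region-specific PRE-TextMultiClass Lambda ARN>",
    "AnnotationConsolidationConfig": {
      "AnnotationConsolidationLambdaArn": "<region-specific ACS-TextMultiClass Lambda ARN>"
    },
    "TaskTitle": "Classify the citation context",
    "TaskDescription": "Choose the sentiment category that best fits the citation context",
    "NumberOfHumanWorkersPerDataObject": 3,
    "TaskTimeLimitInSeconds": 300,
    "PublicWorkforceTaskPrice": {
      "AmountInUsd": { "Dollars": 0, "Cents": 3, "TenthFractionsOfACent": 0 }
    }
  }
}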
Additional Configurations and Enhancements
Once the basic labeling job is set up, SageMaker Ground Truth provides extra tools to improve accuracy and automation. One key feature is the use of AWS Lambda functions for annotation consolidation, which improves label quality. Lambda functions allow you to customize how multiple worker responses are combined, apply rules to filter or transform outputs, and calculate confidence scores. For instance, if three workers label the same item and two agree, that label becomes the final result. In our project, we tested using three workers per citation context. This increased the cost but improved label reliability. Ground Truth also saves all individual responses and the final consolidated label in the output file.
Managing Human Responses
After workers submit their labels, the results are saved in your Amazon S3 “/output” folder. Each labeling job generates output manifest files in JSON format, containing individual worker responses and the consolidated output for each labeling item (in our case, each citation context). The example below shows the JSON-formatted outputs that include both individual workers’ responses (A) and the consolidated label (B) for that particular object (citation context).
Labeling job output — A: Individual labels for the same object (responses from multiple workers for the same citation context) B: Consolidated label from multiple annotations for a single object (citation context).
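To give a feel for the format, a consolidated entry in the output manifest for a text-classification job looks roughly like the snippet below; the text, label attribute name, class names, and values are purely illustrative, and the exact fields can vary with task type and job configuration:
{"source": "The original study's results could not be reproduced ...",
 "sentiment": 2,
 "sentiment-metadata": {
   "class-name": "Category C",
   "confidence": 0.67,
   "human-annotated": "yes",
   "creation-date": "2025-01-01T00:00:00.000000",
   "type": "groundtruth/text-classification"
 }}
The individual worker answers that fed into each consolidated label are stored alongside it under the job’s output prefix in S3.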
Payments and Task Management
When using Amazon Mechanical Turk through SageMaker Ground Truth, AWS handles task distribution and payment processing. You set the price per labeling task in the job configuration, while AWS adds a small service fee and bills you through your account. In our project, each text snippet was a simple classification task, so we kept the price low (~$0.03 per label) and monitored progress in the SageMaker console. Ground Truth provides tools to track completed tasks, worker responses, and errors, though it does not allow direct communication with MTurk workers or access to detailed metrics.
Limitations of SageMaker Ground Truth
SageMaker Ground Truth offers convenience but less flexibility than native MTurk. It does not support custom worker qualifications, bonus handling, or manual worker management. While AWS recommends using SageMaker Ground Truth for crowdsourcing, projects that require direct communication with workers or reusing previous workers may benefit from using MTurk directly.
Conclusion
Crowdsourcing through SageMaker Ground Truth streamlines data labeling and reduces manual work. For our three-category text classification project, we used MTurk as the workforce, configured tasks with minimal setup, and obtained accurate results efficiently. Ground Truth is ideal for researchers and developers who want to focus on model development while easily managing labeling at scale.
How to make a custom template for the reMarkable 2
Recently I decided I wanted to make a custom template to use on my reMarkable 2. I eventually figured out how to do this, but whilst I found some useful guides online, all of them were slightly misleading or unhelpful in different ways – probably due to changes over time. This guide is for anyone else wanting to give it a shot in 2025.
The tl;dr
The reMarkables are built on Linux, and the templates are SVG files in a specific directory. Adding your own template is probably easier than you expected:
create an SVG file for your template
connect to your reMarkable using SSH
copy your template to the templates directory
update the templates.json file so your template appears in the listing
reboot the reMarkable
I haven't tried it on Windows, but apparently Windows has an SSH terminal and also scp so you should be able to follow this same process whether you have a computer running Linux, MacOS, any other Unix-based system, or Microsoft Windows.
You will need a computer, software for creating SVG graphics, and a little confidence.
Caveats
It's possible you could brick your reMarkable if you mess this up really badly. Always make sure you have backed up your files before doing anything in reMarkable's file system.
I haven't been using custom templates for long enough to know for sure, but others have suggested that when your reMarkable software is next updated, any custom templates may be deleted. Make sure you have backups of your templates as well!
Finally, this is what worked for me on a reMarkable 2 running the latest operating software in July 2025. Future system updates may change the way this works.
Step 1 - create your template
Older guides for making custom templates, like this one, were helpful for me to understand the basics of templates, but it seems that in the past templates were .png files, whereas recently they changed to SVG.
To create a template you will need something to create SVG graphics. I use Affinity Designer, but you could try Inkscape, Adobe Illustrator, or Canva. The reMarkable 2 screen size is 1872px x 1404px so although SVGs will scale proportionally, for best results make your file match that size.
Remember that your reMarkable 2 will only display in black, white, and grey. If your design doesn't quite work the first time, you can play around with it and reload it, so you can experiment a little until you get the design that suits your needs.
Once you're finished, save the template somewhere you can find it easily on your computer, as a .svg file.
Step 2 - connect to your reMarkable via SSH
To access the operating system for your reMarkable, you will need to connect using Secure Shell (SSH). For this, you need two pieces of information about your reMarkable: the IP address, and the password. From the main menu (the hamburger icon at top left) navigate to Settings - Help - Copyrights and licenses. At the bottom of the first page in this section you will find your password in bold type, and a series of IP addresses. The second (IPv4) address is the one you are looking for. This will be a private IP address starting with 10. If your reMarkable is connected to WiFi, you can use SSH over the same WiFi network. Otherwise, connect via your reMarkable's USB power/data cable. Either way, ensure that your reMarkable remains awake whilst you are connected, otherwise your session may hang.
Open a terminal on your computer (Terminal on Mac and Linux desktop, CMD.exe or PowerShell on Windows). You will be logging in as the user called root. This is a superuser on Linux machines so take care - with great power comes great responsibility. You should be able to log in using this command (where xxx.xxx.xxx.xxx is your IP address):
ssh root@xxx.xxx.xxx.xxx
Your terminal will then ask for a password, which you should type in and then press Enter (if the password is displayed in quotation marks anywhere, the quotation marks are not part of the password). If all goes well, you will be logged in and see the reMarkable's command prompt.
Step 3 - copy your template to your reMarkable
At this point you should pause to ensure that you know the filepath to your template file on your computer. If you saved it to your desktop (not a great place for long term storage, but convenient for quick operations like this) it will be something like ~/Desktop/my_custom_template.svg. We are now going to create a special subdirectory for your custom template/s, and copy your file across.
In your terminal session you should still be logged in to the reMarkable. The templates are all stored in the /usr/share/remarkable/templates directory. To create a new subdirectory, we use the mkdir command, like this:
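mkdir /usr/share/remarkable/templates/my_templates
(The subdirectory is called my_templates here; if you pick a different name, keep it consistent in the steps that follow.)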
Now we can copy our template over. Open a new terminal window. We will use the secure copy protocol to copy the file over SSH from your computer to your reMarkable:
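scp ~/Desktop/my_custom_template.svg root@xxx.xxx.xxx.xxx:/usr/share/remarkable/templates/my_templates/
(Adjust the first path to wherever you saved your template, and replace xxx.xxx.xxx.xxx with your reMarkable's IP address.)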
Back in your first terminal session – which should still be connected to the reMarkable – you can check whether the file transferred across using the ls command:
ls /usr/share/remarkable/templates/my_templates
This should display my_custom_template.svg.
Step 4 - update the templates.json file
Now for the trickiest part. You will need to update a file in the templates directory called templates.json. This provides information about where each template is stored, what it should be called, and which icon to use in the templates menu. If you make an error here, your templates may no longer work properly (I know this from my own mistake!) - so whilst it is reasonably straightforward, you do need to pay attention.
Many tutorials about editing files on the Linux command line tell you to use vi or vim. These are the default text editors on Linux, but they are also obtuse and confusing for newcomers. We are going to instead use the nano program that is also standard on most Linux distributions, but a little easier to understand. To edit the templates JSON file, open it in nano:
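nano /usr/share/remarkable/templates/templates.json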
You should now see a screen showing the beginning of a long string of JSON. We want to add a new entry down the bottom of the file, so we will navigate down to line 500 using the keyboard shortcut Ctrl + / + 500 + Enter. From there you can use your cursor/arrow keys to navigate down to the last entry in the file. We want to add a new entry, like this:
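  {
    "name": "My Custom Template",
    "filename": "my_templates/my_custom_template",
    "iconCode": "\ue9fe",
    "categories": ["Creative"]
  }
The name is what will appear in the templates menu; the iconCode and category shown here are only examples, so copy an iconCode from an existing entry whose icon you like and use a category that already appears in the file. Match the format of the other entries, which typically list filenames without the file extension.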
Make sure you do not overwrite or delete the square and curly brackets at the end of the file, that you put a comma between the previous last entry and your new one, and that you do not leave a trailing comma after your new entry.
Note that the filename is relative to the templates directory, so we need to include the new subdirectory. The iconCode uses a "private use" unicode value that matches one of reMarkable's standard images – it is not possible to create your own icon so you will need to re-use one of the existing ones.
Once you confirm everything is correct, enter Ctrl + x to exit, and y + Enter to confirm you want to save changes using the original filename.
Step 5 - reboot
Now for the most terrifying moment: rebooting your reMarkable!
Back on your command line, type reboot and then press Enter.
This step is simple but it will be a little nerve-wracking because your reMarkable will reboot, then pause for a moment before letting you log back in. If everything has gone according to plan you should now be able to find your new template by name in the template directory, and start using it!
Optional bonus step 7 - SSH keys
Logging in with a password is ok, but it can get a bit tedious. An easier way is to use SSH keys.
Once you have created your keys, you can use ssh-copy-id to copy your public key to your reMarkable, allowing you to log in without a password! We use the ssh-copy-id command, with the -i flag followed by the path to our ssh key:
ssh-copy-id -i ~/.ssh/id_rsa root@xxx.xxx.xxx.xxx
If you only have one ssh key, you can just enter:
ssh-copy-id root@xxx.xxx.xxx.xxx
At the prompt, enter your password and press Enter. You should see a number of lines of output, ending in:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@xxx.xxx.xxx.xxx'"
and check to make sure that only the key(s) you wanted were added.
You should now be able to log in to your reMarkable to update templates at your leisure without a password.
Happy note taking!
Addendum
Update 8 Nov 2025
Step 8 - recover your templates after a reMarkable software update
Something I neglected to appreciate when I first wrote this post is that when reMarkable releases a software update, they overwrite all the templates and the template JSON file. This is, needless to say, annoying.
Fixing this is reasonably simple - basically you need to re-do steps 3, 4 and 5. I keep a file with the changes to the JSON file and a reminder of the exact command I need to use to copy my files across. Then whenever my reMarkable software is updated I can ssh into the reMarkable, paste my changes into the JSON file (checking all the commas are in the right place!), and copy across the template files.
With the last update I also experienced a problem with the SSH host key on the reMarkable changing. If this happens to you, the fix is to delete the entry for the old key in the trusted keys file (known_hosts) on your computer, then try connecting via SSH again. Your SSH program should point you in the right direction with a helpful error message.
ODE helped PARI restructure messy public financial datasets into a clean, consistent format, making them more presentable and ready for reliable analysis.
Digital Day Event at Magdeburg-Stendal University of Applied Sciences, Summer 2025
First Visit: SPiRIT Lecture Series (Summer 2023)
In summer 2023, I traveled to Germany to attend the Eye Tracking Research and Applications (ETRA) 2023 conference held in Tübingen, Germany. Following the conference, I extended my trip to eastern Germany to visit Magdeburg-Stendal University of Applied Sciences, as I was warmly invited by Dr. Herzog to deliver a talk as part of the SPiRIT Lecture Series. The SPiRIT Lecture Series is a highly respected forum for academic discussions in the Human-Computer Interaction field.
My talk centered on our broader research theme, ‘Evaluating Human Eye Features for Objective Measurement of Working Memory Capacity’, with a focus on a project titled ‘Believe It or Not: Exploring Human Perception While Reading Fake and Real News Articles.’ In this work, I discussed how eye-movement analysis can provide insights into human visual and cognitive behavior in relation to the truthfulness and believability of news. I shared our research findings on advanced eye-gaze measures, highlighting how people's visual scanning patterns, attention across different Areas of Interest (AOIs), and cognitive load vary based on the truthfulness and perceived believability of news content.
Flyer for my talk at the SPiRIT Lecture Series, Summer 2023
I concluded my talk by talking about our WS-DL research group and NIRDS Lab. I highlighted our research outreach events, such as Trick or Research and ODU Admitted Student Day, where we demonstrate engaging eye-tracking experiments to participants.
Post-session meetup at the Herrenkrug Campus, Magdeburg in Summer 2023
Second Visit: Digital Business Day Event (Summer 2025)
This summer, I lived in Bonn, Germany, doing a summer research internship through the DAAD RISE Professional Program. During this time, I visited Magdeburg for the second time, at the invitation of Dr. Herzog. I participated in the Digital Business Day event, which took place at Magdeburg-Stendal University of Applied Sciences. This event was organized under the motto "Ethics in Digital Innovation – Business meets Maker Education", where students from the digital business program presented the results of their semester projects.
Dr. Herzog briefly introduced me, our research group, and our eye-tracking research to the audience. He also talked about his collaborations with us and his interactions with the WS-DL members.
Following his introduction, I had the privilege of delivering the first talk of the event, titled “Gaze-Based Virtual Reality for Enhanced User Experience in Business”. I started by discussing the existing use of Virtual Reality (VR) and eye-tracking technology in business applications, then introduced eye-tracking-based VR and its potential use cases in the business domain. I talked about how we can analyze users' visual behavior to improve user experience in business applications, and presented our proposed framework for measuring visual attention in VR using a consumer-grade eye-tracking-enabled Meta Quest Pro VR headset and eye-tracking measures.
The highlight of the Digital Business Day was the Student Poster-Madness session. Students from the digital business program presented their semester projects through posters and prototypes of the innovative products they had designed. I was particularly impressed by how they planned, designed, and built their prototypes using 3D printing technology. I was honored to serve as one of the judges evaluating their posters, designs, and prototypes.
Exploring Eastern Germany
The trips to Magdeburg also offered me a wonderful opportunity to explore eastern Germany. We visited several historical landmarks in Magdeburg, including the iconic Magdeburg Cathedral — the oldest Gothic cathedral in the country — and Bastion Cleve — the south-eastern end of the former fortress of Magdeburg. On our way back to Potsdam, we stopped at the Magdeburg Water Bridge, the largest canal bridge in Europe.
Some of the landmarks I visited in Magdeburg
The most unforgettable moment of these trips was the scenic bike ride through Potsdam with Dr. Herzog, enjoying the city’s historic landmarks. We made stops at the beautiful Sanssouci Palace, the Orangery Palace, and the Chinese House in Sanssouci Park. Dr. Herzog was an amazing host and tour guide throughout these trips, sharing his deep knowledge of the history of the landmarks we visited.
Exploring Potsdam by bike — a stop in front of the iconic Sanssouci Palace
Dr. Herzog and his wife warmly welcomed me into their home during both trips, where I had the chance to enjoy a variety of delicious cuisines and authentic German dishes. I also tried white asparagus for the very first time — a beloved springtime delicacy in Germany.
The wonderful hosts, Dr. Herzog and his wife
During my visit this summer, I had the wonderful opportunity to attend a performance of the renowned German play Der tollste Tag by Peter Turrini at the Hans Otto Theater, where Dr. Herzog’s wife, Mrs. Rita, gave an amazing piano performance. Although the play was performed in German, which I’m only beginning to learn, I found the experience captivating. I thoroughly enjoyed the outdoor theater’s atmosphere by the river at sunset and the beautiful music.
Der tollste Tag Play at the Hans Otto Theater, Potsdam
My visits to Magdeburg-Stendal University not only gave me opportunities to present our eye-tracking research, but also allowed me the experience of delivering talks and networking with scholars from international academic institutions. Aside from the professional experiences, I also managed to cross a few more European cities off my bucket list thanks to these trips, making them both rewarding and transformative.
Acknowledgements
I would like to express my heartfelt gratitude to Dr. Herzog for his invitations, hospitality, and for showing me the best of Magdeburg and Potsdam during my stay. He and his wife, Mrs. Rita, warmly welcomed me into their home and shared so much of Germany’s culture and spirit.
On behalf of the WS-DL research group, the Computer Science Department at ODU, and myself, I also extend my sincere thanks to Dr. Herzog, the SPiRIT lab team, and Magdeburg-Stendal University for organizing these wonderful opportunities for us over the years.
Many libraries and archives are homes for oral history programs. Oral histories add voice, personality, richness, and depth, and are a natural complement to other types of collections. While these narratives can bring significant value to collections, supporting oral history production can present special challenges. Oral history programs can be composed of complex, multi-stage projects that require thoughtful planning, coordination, and even innovative approaches to capture, preserve, and make accessible valuable cultural and historical narratives.
Conor Casey, Head of the Labor Archives of Washington at University of Washington Libraries Special Collections, shared his experiences based on over 13 years of evolved practices with collaborative workflows and scalable project management. The Montana State University team—Jodi Allison-Bunnell, Head of Archives and Special Collections; Emily O’Brien, Metadata and Mendery Specialist; and Taylor Boyd, Metadata & Collection Support Technician—discussed their experiments using two different tools to generate oral history abstracts.
This blog post summarizes the webinar’s key insights, but you can watch the full recording right here.
The Labor Archives of Washington: Shared stewardship in practice
The Labor Archives of Washington, founded in 2010 as a partnership between the Harry Bridges Center for Labor Studies and UW Libraries Special Collections, demonstrates how community-founded archives can scale oral history programs through strategic collaboration. Despite never having more than two full-time staff members (who also have other duties and responsibilities), the archive has completed close to 200 interviews across multiple projects by leveraging a collaborative model that designates clear roles for all players.
Scaling through partnership and project management
Conor Casey emphasized that success stems from recognizing institutional strengths and distributing work accordingly. The archive provides infrastructure, tools, and preservation expertise; community partners bring connections to narrators, scholarly expertise, and cultural knowledge essential for developing interview questions that are meaningful within the user base.
The Labor Archive’s workflow success relies on project management tools, including Asana templates, Google Drive file management, and comprehensive documentation. As an example of how documentation and project management tools work to ensure a smooth experience, Casey stressed the importance of “baking in” permissions, requiring signed release forms from participants before embarking on a project to enable stewardship of a given collection. These tools also ensure quality and consistency in metadata.
Ensuring that all parties are on the same page is also vitally important. Casey advised setting clear roles and deadlines to keep the project on track, and noted that project charters or MOUs can be very helpful when collaborating between organizations and with multiple parties, even within organizations.
Even with AI promise, transcription remains a human-centered activity
Beginning in 2018, the Labor Archives began experimenting with AI-aided transcription using tools like Otter.ai, Maestra, and Descript. These tools are not one-size-fits-all solutions and depend heavily on project needs. When AI is employed, Casey’s workflow diagrams revealed a crucial insight: human intervention appears at nearly every stage of the AI-aided process.
Despite the promise of AI in some cases, a professional transcriptionist is still faster and more accurate for certain projects. “Professional transcribers may still be more efficient than just AI,” Casey noted. “You are still going to have to do a lot of work to correct, tag, and conform AI to transcription style guides.”
Accessibility is core to mission
Casey also discussed the implementation of many accessibility features, such as including transcriptions and captioned media for all interviews. Casey positioned accessibility not as an add-on compliance requirement, but as central to the archival mission. The archive proactively implemented transcription and captioning before Washington State’s 2017 mandate for all new online projects, viewing these activities as extensions of intellectual access rather than a burden.
Montana State University: AI to support accessibility needs
The Montana State University team faced a specific challenge: creating abstracts for approximately 350 oral history recordings in their Trout and Salmonid collection to meet ADA Title II accessibility requirements. The MSU library staff were very much encouraged by leadership to embark on a journey of learning around AI tools, so experimenting with tools to learn (and employing exploration and curiosity along the way) was a natural step for the team and the framework for their presentation and discussion overall.
These field recordings from around the world presented unique challenges, including:
Multilingual interviews conducted with interpreters
Variable field recording quality
Videos that had been edited to include interviewer questions only on text intertitles
Jodi Allison-Bunnell spoke about using Claude.ai to generate 200-word abstracts from full transcripts produced in Trint. The team found that Claude generally produced coherent abstracts. However, several limitations emerged:
Missing context: Because interviewer questions were only shown on intertitle slides, Claude struggled to provide complete context
Factual errors: If proper names and technical terms were incorrect in the transcript, they were also incorrect in the abstract; the transcript had to be thoroughly reviewed to avoid promulgating errors
Over-inference: When given insufficient content, Claude would sometimes infer more about subjects than was supported by the source material and required a human with good knowledge of the transcript to intercept and correct the imbalance
The cataloger perspective: Quality and workflow integration
Emily O’Brien and Taylor Boyd assessed the abstracts from a metadata creation standpoint, focusing on whether AI-generated abstracts contained sufficient information for accurate cataloging without requiring catalogers to repeat work.
For this specific use case, Trint consistently outperformed Claude, Gemini, and ChatGPT. However, caution should be applied before declaring a clear and persistent winner in an area where tools and technology are evolving quickly. The team found that Claude abstracts generated months apart showed significant quality differences; this finding demonstrates that it is difficult to make static judgments about tools in this quickly evolving space.
O’Brien and Boyd also emphasized a vital point in assessing the accuracy and efficacy of AI-generated abstracts that will support metadata creation: unless AI-assisted humans have prior learned experience with creating abstract-level metadata, they may lack the ability to assess the quality of AI outputs.
Despite limitations, the team found significant time savings. Reading a transcript and writing an abstract from scratch took, on average, 1-2 hours, while reading a transcript, running it through Claude or Trint, and assessing and correcting the result took an average of 30 minutes. However, Allison-Bunnell also highlighted the need to maintain oversight over tools (including changes in terms of service) and to budget time to review workflows that might be impacted by those changes, as well as to develop, implement, and review AI use policies on an ongoing basis. Ultimately, time savings may be shifted across positions in ways that aren’t initially evident.
Lessons learned and workflow implications
Both presentations emphasized that effective workflows supported by AI require human oversight at multiple stages. The Labor Archives workflows reflect human intervention points throughout the process, while the Montana State team stressed that professional experience in the relevant domain is essential before AI tools can be effectively evaluated or implemented.
Successful oral history workflows cannot be separated from organizational context, resources, and mission. The Labor Archives’ collaborative model works because of their community-focused mission and partnership infrastructure. Montana State’s approach is grounded in their need to meet specific accessibility requirements and also in their deep collection strengths supported by corresponding curatorial and community knowledge. Both institutions demonstrated how accessibility considerations drive innovation rather than constrain it.
Looking forward
These presentations illustrated that effective oral history workflows require thoughtful integration of human expertise, technology, and collaborative partnerships. AI tools can enhance efficiency and accessibility, but they work best when implemented with a clear understanding of their limitations and within robust frameworks that center human knowledge.
The key insight across both presentations was that technology should amplify human-centered values and good project design, not replace them. Successful oral history programs leverage innovation in service of their core mission: preserving and providing access to irreplaceable cultural narratives. These presentations demonstrate the value of taking a stance of curiosity, exploring, and sharing our experiments and lessons learned as we navigate the integration of new technologies with traditional archival practice. We would love to hear about your experiments in this area!
—
Special thanks to Conor Casey, Jodi Allison-Bunnell, Emily O’Brien, and Taylor Boyd for generously sharing their insights and experiences. For more resources on oral history workflows, including templates and project management tools, view the slides and watch the full webinar recording from the event page.
The project is Safeguarding Research & Culture and I got involved back in March or April when Henrik reached out on social media looking for someone with library & metadata experience to contribute. I said that I wasn’t a Real Librarian but I’d love to help if I could, and now here we are.
The concept is simple: download public datasets that are at risk of being lost, and replicate them as widely as possible to make them hard to destroy, though obviously there’s a lot of complexity buried in that statement. When the Trump administration first took power, there were a lot of people around the world worried about this issue and wanting to help, so while there are a number of institutions & better resourced groups doing similar things, we aim to complement them by mobilising grassroots volunteers.
Downloading data isn’t always straightforward. It may be necessary to crawl an entire website, or query a poorly-documented API, or work within the constraints of rate-limiting so as not to overload an under-resourced server. That takes knowledge and skill, so part of the work is guiding and mentoring new contributors and fostering a community that can share what they learn and proactively find and try out new tools.
We also need people to be able to find and access the data, and volunteers to be able to contribute their storage to the network. We distribute data via the venerable BitTorrent protocol, which is very good at defeating censorship and getting data out to as many peers as possible as quickly as possible. To make those torrents discoverable, our dev team led by the incredible Jonny have built a catalogue of dataset torrents, playfully named SciOp. That’s built on well-established linked data standards like DCAT, the Data Catalogue Vocabulary, so the metadata is standardised and interoperable, and there’s a public API and a developing commandline client to make it even easier to process and upload datasets. There are even RSS and RDF feeds of datasets by tag, size, threat status or number of seeds (copies) in the network that you can plug into your favourite BitTorrent client to automatically start downloading newly published datasets. There are even exciting plans in the works to make it federated via ActivityPub, to give us a network of catalogues instead of just a single one.
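To make that a little more concrete, here is a rough, made-up illustration (not SciOp's actual schema) of how a dataset with a torrent distribution can be described using the DCAT vocabulary in JSON-LD; the title, keywords, and URL are invented:
{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@type": "dcat:Dataset",
  "dct:title": "Example at-risk public dataset",
  "dcat:keyword": ["climate", "at-risk"],
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    "dct:format": "application/x-bittorrent",
    "dcat:downloadURL": "https://example.org/example-dataset.torrent"
  }
}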
We’re accidentally finding ourselves needing to push the state of the art in BitTorrent client implementations. If you’re familiar with the history of BitTorrent as a favoured tool for ahem less-than-legal media sharing, it probably won’t surprise you that most current BitTorrent clients are optimised for working with single audio-visual streams of about 1 to 2½ hours in length. Our scientific & cultural data is much more diverse than that, and the most popular clients can struggle for various reasons. In many cases there are BEPs (BitTorrent Enhancement Proposals) to extend the protocol to improve things, but these are optional features that most clients don’t implement. The collection of BEPs that make up “BitTorrent v2” is a good example: most clients don’t support v2 well, so most people don’t bother making v2-compatible torrents, but that means there’s no demand to implement v2 in the clients. We are planning to make a scientific-grade BitTorrent client as a test-bed for these and other new ideas.
Myself I’m running one of a small number of “super” nodes in the swarm, with much more storage available than the average laptop or desktop, and often much better bandwidth too. That’s good, because some of our datasets run to multiple terabytes, plus to ensure new nodes can get started quickly we need to have some always-on nodes with most of the data available to others. Since BitTorrent is truly peer-to-peer, it doesn’t matter how many people have a copy of a given dataset, if none of them are online no-one else can access it.
This is all very technically interesting, but communications, community, governance, policy, documentation, funding are also vitally important, and for us these are all works in progress. We need volunteers to help with all of this, but especially those less-technical aspects. If you’re interested in helping, please drop us a line at contact@safeguar.de, or join our community forum and introduce yourself and your interests.
If you want to contribute but don’t feel you have the time or skills, well, to start with we’re more than happy to show you the ropes and help you get started, but as an alternative, I’m running one of those “super” nodes and you can contribute to my storage costs via GoFundMe: even a few quid helps. I currently have 3x 6TB hard drives with no space to mount them, so I’m currently in need of a drive cage to hold them and plug them into my server.
Special shout-out also to our sibling project, the Data Rescue Project, who are doing amazing work on this and often send us requests for websites or complex datasets for our community to save.
I’ve barely scratched the surface here, but I really want to actually get this post out for WDPD so I’m going to stop here and hopefully continue soon!
Hello, I am Dommanige Chameli Dilshani Menike. I came to ODU from Sri Lanka. I am Dr. Wu’s new Ph.D. student in the LAMP-SYS Lab. Currently, I am working on a project related to table data extraction under Dr. Wu’s guidance, and I will be working as both a Graduate Teaching Assistant (GTA) and a Graduate Research Assistant (GRA) in Fall 2025.
I earned my BSc Honors degree in Computer Science from the University of Peradeniya. My core subjects were Computer Science, Mathematics, and Statistics. When I was choosing an undergraduate research topic, I realized that natural language processing (NLP) was my forte. After all, that interest and curiosity are what brought me here. My final-year research project, 'Towards Cyberbullying Identification on Social Media Using Text and Emojis,' provided an excellent opportunity to deepen my understanding of advanced Machine Learning and NLP concepts. The abstract was presented at the 8th International Conference of the Association of Sri Lankan Academics in Japan. I also co-authored a paper on "Using Machine Learning Techniques to Predict the In-situ Concrete Compressive Strength using Non-Destructive Testing," collaborating with a team of researchers from the Computer Science department and the Civil Engineering department. This experience improved my teamwork skills and my ability to present complex ideas in a collaborative setting.
My undergraduate internship as a software engineering intern at Eyepax IT Consulting provided me with valuable industry exposure, equipping me with the practical skills required to apply concepts to the real world. After the internship, I worked at the Department of Statistics and Computer Science, Faculty of Science at the University of Peradeniya, as a teaching assistant. It was amazing to work at the same place where I learned how to write a "Hello World" program for the first time. I worked there for two years. I worked as an assistant lecturer at APIIT, a higher education institution in Sri Lanka affiliated with the University of Staffordshire in the UK, for a year. These experiences not only gave me practical exposure but also strengthened my interest in teaching.
I believe true growth lies in the continuous pursuit of knowledge and the willingness to share and collaborate. This belief inspired me to pursue my Ph.D. journey. It is a great honor to be part of this amazing academic environment and to work in the LAMP-SYS Lab. I am committed to working diligently on my research, learning as much as possible, and contributing to this great place.
LibraryThing is pleased to sit down this month with award-winning African Australian author Eugen Bacon, whose Afro-Centered Futurisms in Our Speculative Fiction just won an Ignyte Award, and who won the 2025 Nebula Kate Wilhelm Solstice Award—given by the SFWA to an author who has made “significant contributions to the science fiction, fantasy, and related genres community”—earlier this year. Born in Tanzania, Eugen earned a Master of Science with distinction in distributed computer systems from the University of Greenwich, UK, and a Master of Arts and a doctorate in writing from Swinburne University of Technology, Australia. Before becoming a professional writer full-time, she worked in the information and communication technology field and continued to do so, juggling a day job with motherhood, professional editing and numerous writing projects.
Eugen has published numerous short stories and novels in various speculative fiction genres, and is particularly known for her Afrofuturism and exploration of gender. She was twice a finalist for the World Fantasy Award, Shirley Jackson Award, Aurealis Award, and in 2023 she won a British Fantasy Award in the Non-Fiction category for her An Earnest Blackness. The latter was also a finalist for a 2023 Locus Award, which she won this year in the Non-Fiction category for her Afro-Centered Futurisms in Our Speculative Fiction. She has won or been nominated for numerous other awards, and also has served as a judge for various Australian book prizes, as well as for global awards, including chairing various jury categories of the Bram Stoker Awards. She is currently the chair of the jury for the Otherwise Awards that encourage the exploration and expansion of gender. This past September a new novelette, Novic, a standalone prequel to Claiming T-Mo, the debut title in her Outbreeds series, was published by Meerkat Press. Also in September, The Nga’phandileh Whisperer, a novella in The Sauútiverse, was published by Star and Saberse Publishing. Eugen sat down with Abigail this month to discuss these stories, and her work in general.
Your fictional work is often described as Afrofuturist. You have explored this genre in your scholarly work as well, in titles like Afro-Centered Futurisms in Our Speculative Fiction. What is Afrofuturism, what does it entail, and what is its significance, both to you personally, and in a wider, global sense?
Actually, Abigail, my work is more than Afrofuturist. I like to think of it as Afrosurreal, or Afro-irreal—fantastical literature that demands the reader to trust and find immersion in the story’s impossibility. The irreal story stays unpredictable and believable in all its unbelievability. The reader finds immersion in the illusion, entranced in the satire or symbolism, cementing even while challenging realism. I use this type of fiction as an allegory for somber themes of belonging, social in/justice, climate action, “Othering”… in the real world. I hero Black people stories, giving voice in sometimes dystopian futurisms, to the woman in the village, the little orphan girl, little lost boy in the village—people who have seen much, suffered much, and need to find the hero/ine within.
In terms of Afrofuturism itself, there’s much discord about this term. It is for me, simply, Afrocentric representation in a kind of fiction that engages with difference. Afrofuturism is to reimagine Africa in all its diversity, to expand and extrapolate it through literature, music, the visual arts, religion, even philosophy. It is that which haunts imagination and transmutes itself into a craving for revolution.
What is the Sauútiverse, and The Sauúti Collective? How did it first get going, who’s involved, and what are its purpose and goals?
I am part of an Afrocentric collection of writers from across Africa—Ghana, Tanzania, Nigeria, South Africa, and the diaspora (Haitian American)… We came together in 2021 with a vision of Afrocentric-based collaboration, support and creativity. Together, we have invented a world deeply rooted in African culture, language and mythology. We’re like-minded creatives who came together for a shared cause in giving voice and space for Afrocentric literature. The Sauútiverse comprises five planets, two suns, two moons and is enmeshed with sound magic. The name of each planet, Ekwukwe, Órino-Rin, Zezépfeni, Wiimb-ó and Mahwé is derived from an African word meaning song. There is also an inhabited moon, Pinaa. You can find out more about the Sauútiverse in our FAQs.
There’s much potential in this Afrocentric universe, and we have a new anthology coming out, Sauúti Terrors, published by Flame Tree Publishing and distributed by Simon and Schuster. I co-edited this anthology with Cheryl S. Ntumy and Stephen Embleton, out in January 2026.
Tell us a little bit about The Nga’phandileh Whisperer specifically. Where does this story fit into the wider universe in which you have set it? Do readers need to be familiar with that universe beforehand, to fully appreciate the story?
Readers can approach The Nga’phandileh Whisperer blind, without any familiarity with the Sauútiverse. Each Sauútiverse story out there is standalone and self-sufficient, boasting rich, unique and robust worldbuilding. I’ve written several short stories in this world, the first one “Sina, the Child with no Echo”, published in our first anthology, Mothersound: A Sauútiverse Anthology. “Sina” is set in Ekwukwe, the hollow planet, where having a magical echo is important. Sina’s neurodivergence is perceived as a curse but turns out to be a gift. Another playful tale is “The Mystery of the Vanishing Echoes”, published in Sherlock is a Girl’s Name. In this multiverse story, sleuth Shaalok Ho-ohmsi and her ward Wa’watison are summoned to the planet Ekwukwe to solve a mystery of vanishing echoes.
I wanted to write a longer Sauútiverse story with a strong female protagonist, and found this in Chant’L—a young Guardian with an affinity to hive-minded beasts, unaware that she has more magic than she knows how to use. Hence The Nga’phandileh Whisperer.
The novella is a second-person ‘you’ narrative, addressing the protagonist. I am at home with this voice, a personal connection with the protagonist, seeing as they see, feeling as they feel, yet omniscient—knowing just a little more outside them.
Novic is the origin story for one of the characters in Claiming T-Mo. What is the Outbreeds series all about, and why was it important for you to go back and write a prequel to explain this specific character’s beginnings?
The Outbreeds series by Meerkat Press is about a breed of others. It engages with difference, tackling the experience of unbelonging that individuals face, even today in our polarized world and especially in the current US environment, simply for being different. Novic is the father who broke tradition in my first novel Claiming T-Mo, with devastating consequences. I thought about why he did that. What makes this immortal priest who and what he is? What makes him tick? Hence Novic, the story before the story. It’s a moment in time in Grovea, the made-up planet. I wanted to reconnect with Claiming T-Mo, to revive the versatility of a character’s light and shade. I scrutinized Novic’s story arc, that of a wandering anti-hero seeking to comprehend his incarnations, and had fun with this novelette that demystifies death.
You’re a very prolific author. Can you describe your writing process? Do you devote a certain amount of time daily to writing, do you write in a specific place, or have certain rituals? How do you plan your stories—do they unfold as you go along, or do you outline them?
I’m a very experimental writer! I write to explore. My writing is a curiosity, a response to a trigger or an incipient question troubling my mind. I write to find an answer, or a better question. I’m a very immersive writer and an immersive reader. I need to feel the story. All my stories, irrespective of genre, explore a character’s relationship with others, with themselves, and with the world around them. I feel their yearning and my quest begins.
Because my life is very busy, I’ve taught myself to write in the moment. I call it Sudden: writing on the go. Chunking in bits and pieces, scraps and notes to self, spurting in bite size. I jot down points that are little triggers, simple word or phrase prompts—mini scenes that don’t have to be perfect. Later, when I have time, I have all the minis to develop into robust scenes.
What’s next for you? Will you be writing more stories set in Sauútiverse, or more entries in the Outbreeds series? Do you have other forthcoming titles and projects?
In 2026, I have a Sauútiverse novel, Crimson in Quietus, an Afrocentric novel from Meerkat Press; Muntu, a novella from Bad Hand Books (you can pre-order it already, and it comes with a signed bookplate!); and a collection of short stories, Black Dingo, from Flame Tree, distributed by Simon and Schuster. In 2027, I have another collection of short stories, The Rawness of You, half of which comprises Sauútiverse stories—this one is also from Meerkat Press. Let’s just say I am very prolific. Find my works on my website: eugenbacon.com.
The University of Illinois at Urbana-Champaign’s racist “Native American” mascot, Chief Illiniwek, has lingered on campus since the National Collegiate Athletic Association (NCAA) ban in 2007. Since that time, there have been numerous attempts either to embrace having no mascot or to find a new, non-racist one. And yet upper-level administration has never committed to either option, leaving both open, along with the idea that the racist mascot could return. This article speaks to the cultural context of the harm that racism and colonialism cause for Indigenous peoples in institutions of higher education, including libraries. It also offers considerations to help administrators and other library employees understand this context, and it provides possible actionable items to assist libraries in achieving more effective employment and service provisions for Indigenous people.
Introduction
Many of us are librarians because we have a strong service orientation. We often place great value on seeking information that enables us to enhance our service to one of our many constituencies. In addition to patrons and communities, employees are also a valuable constituency for the larger institution of any library. In my ensuing discussion, I hope to illuminate a non-comprehensive set of actionable takeaways to enhance service to Indigenous people and to Indigenous communities. My recommendations are foundational and meant to be built upon. Inasmuch as this may serve as a call to action, I prioritize listening to Indigenous communities and voices first and foremost and believing those voices when they tell us how to improve our practice and engagement.
Because In the Library with the Lead Pipe encourages self-reflection on positionality, I need to acknowledge that my own library experience as an employee is centered in academic libraries. As such, my recommendations are focused on academic libraries. Some recommendations are applicable in other contexts, though I would be remiss if I did not honor the many other types of libraries that serve Indigenous people and communities with all due apologies where my examples are too academic-centric. For a comprehensive history of tribal libraries see Littletree, 2018. For Indigenous use of public libraries and fans of exemplary methodology, see Burke, 2007. For an example of medical library engagement, see Kouame and Hewkapuge, 2016.
This is partially a story of my own cowardice and partially a cautionary tale about employee retention, but perhaps more importantly, it is a story of institutional cultural ineptitude. In October 2017, a group of protestors, mostly students, blocked a vehicle that was transporting Chancellor Robert J. Jones and his wife, Dr. Lynn Hassan Jones, MD, who were participating in the University of Illinois Urbana-Champaign (UIUC) homecoming parade. Chancellor Jones started as the Chancellor of UIUC at the same time as my own employment with the University Library. The protestors were calling upon Chancellor Jones to take action on the Chief Illiniwek mascot, which had been officially retired in 2007 with no new mascot chosen at that time. The lack of a new mascot created a vacuum, an open wound that refused to heal for all involved. Some have lobbied to retain Chief Illiniwek as a matter of honor and tradition, while others have lobbied to appoint a new mascot in order to close one chapter and open a new one.
As of this writing, there is still no new mascot to replace Chief Illiniwek, despite the fact that a new mascot has been proposed and has broad support from students and from those who want a definitive end to a long-lingering controversy. With students again taking the lead, in 2019, then-undergraduate student Spencer Hulsey proposed the belted kingfisher bird (Anghel and Isaf, 2020).
As is often the case, today’s students are envisioning a horizon of positive opportunities to end a problematic controversy and choose a more unifying mascot. Unfortunately, the current senior administration of the UIUC campus, and the current Board of Trustees for the University of Illinois system, are too focused on the next step in front of them to understand what the horizon may offer. While I do not endorse their view and their behavior, I am also guilty of the same inability to project beyond the immediate step in front of me.
I have noted that my employment with UIUC started in tandem with Chancellor Jones’ employment. The Chancellor’s position is one of considerable positional power. My own position carries far less. However, my employment was also an opportunity for me. It came with the financial benefits of more money than I had been making. It also opened many doors for career growth and development. The University Library at UIUC is a well-respected, large, research library. Multiple library employees have accrued international respect within their areas of expertise. As promised when I interviewed, my employment at the University Library afforded me opportunities for professional development and growth that would have otherwise remained unavailable to me. I was also fortunate to participate in the continued growth of one of the preeminent library collections in the world.
Being only a year into my employment, I remained aloof from the 2017 homecoming protest and from any criticism of the Chief Illiniwek mascot. It would be an easy and self-absolving exercise to point fingers at others, including Chancellor Jones. I could lament the failure to implement a new mascot and pepper this discussion with snarky barbs and dismissive remarks that imply a failure on the part of Chancellor Jones, the broader senior administration of UIUC, and the Board of Trustees. But that would be a failure on my part to acknowledge my own complicity in self-preservation and self-benefit. While I do worry that the senior administration and Board of Trustees have prioritized financial benefits over integrity, I have also done the same. At no point did I stand arm-in-arm with any student protestors. At no point have I reached out to the Chancellor, nor to anyone else, to champion a new mascot.
American Indians, Alaska Natives, and Native Hawaiians (Indigenous people of the United States, abbreviated throughout as Indigenous people) often exist in relationship to their generational obligations. I cannot speak for all 574 federally recognized tribes of the United States. I can state that many of us are rooted in cultures of generational obligation. The nations of the Haudenosaunee confederation have a core value called the Seventh generation (Haudenosaunee Confederacy, n.d.). At a foundational level, this means stewarding and preserving land, resources, knowledge, and culture for the next seven generations. It is also a value that reflects the commitment to make decisions, and to take actions that are not exclusive to the current generation, but that also yield benefits for future generations. This same value is shared by many Indigenous people and is a strong personal value that guides my own personal and professional moral code. For a discussion that is rich in both scholarship and poetic sensibility rooted in an ecological context, see LaDuke and Cowen, 2020.
At a more holistic level, however, the Seventh generation value is more than a call to serve future generations. The principle of generational obligation extends across all generations. We honor our Ancestors who came before us, out of gratitude for all they have done for us. We honor our living Elders for the wisdom they share and the stewardship they have provided. We also honor and serve those who will come after us, to the seventh generation and beyond. Indigenous people often carry a commitment to honor the gift of the past and to preserve the gift of the future. It is this value that I have violated for too long.
Both in 2017 and beyond, I became too enamored of protecting my personal privilege and position. While I was not in a tenure-track faculty position, I did hold a position that was more rewarding on multiple fronts than any previous position I had held, including pay. Because of this, I deferred on my obligation to stand with, and on behalf of, the younger generations who engaged in active protest and problem-solving. I both admired and envied the students and other protestors who remained true to their core values. I worried too much that engaging with a controversial topic would harm my employment. Thus, this article is as much a reflection on that failure as it is an indictment of anyone else. Now that the hardest work has been done by younger generations, at long last, I must take ownership of my own responsibilities and endorse the work that they have done.
Discussion
This article is ostensibly about Indigenous mascots, an issue that exists within the context of Indigenous people in the United States. I confess I am not aware whether such issues with sports mascots extend to other Indigenous communities around the world. Now is a fair time to disclose that the discussion of issues around Indigenous mascots serves as both an end unto itself and a springboard into a broader understanding of serving and working with libraries’ Indigenous employees and patrons. Non-Indigenous people are happy to frame mascots generally, and Indigenous mascots specifically, as symbols with many proposed purposes. Indeed, some libraries have put school mascots to good use promoting information literacy, such as the University of Idaho (Henrich and Prorak, 2010). This is a reclamation project of sorts. I seek here to reframe mascots as symbols that speak to Indigenous identity and the lived experiences of Indigenous employees and patrons.
Many non-Indigenous people want a compelling answer to the question of why mascots are offensive. Of primary importance to those questioning the matter is an effort to quantify the degree of offense. It is as if “harm” must be measured against a quantitative scale, and mascots simply do not achieve a high enough reading to warrant consideration. Many Indigenous individuals are often subjected to accidental participation in discussions where non-Indigenous people split hairs and dissect the degree of offense inherent in any given mascot. “Well, that case over there is clearly pejorative and offensive, but in this case, is it really that bad?” While it may be an admirable intention to understand, it largely misses the point. I cannot speak for all American Indians, Indigenous individuals or other Indigenous communities. I can speak only to my individual lived experience as a citizen of the sovereign Muscogee Nation of Oklahoma. I cannot even speak for the Muscogee Nation any more than a lone individual can speak for any other entire nation. The American Indian Studies Program has a webpage listing “…organizations who have made resolutions and statements against the use of Indian mascots” (Mascot Resolutions). This list of resolutions does include the Peoria Tribe of Oklahoma, descendants of a group of tribes known as the Illiniwek (Peoria Resolution).
For me it is so fundamental that I am unsure why additional hair-splitting and rubrics are necessary to articulate the level of offense. In short, an entire group of people is stating that their lived experiences are being devalued, dehumanized, and disrespected. Yet, somehow, when matters pertain to Indigenous peoples, it is bewildering to others that we would want to be treated with fundamental respect and human dignity. Should that not be enough? Should it not be the case that “We find that this behavior undermines our fundamental human dignity” serves as a statement of power and value sufficient to inform the conversation? Above all, those who cling to Indigenous mascots so often seem to do so out of a determined, persistent animosity as much as anything else. For a solid academic treatment of these issues, see Haudenosaunee author Taylor, 2015.
To be fair to nuance, an example of a university that does maintain a relationship with a tribal government is Florida State University and the Seminole Tribe of Florida. Florida State’s webpage about its relationship with the Seminole Tribe of Florida states, “FSU does not have a mascot, but rather a symbol that we respect and honor,” in reference to Osceola and Renegade (Florida State University). Florida State does not identify a relationship with the Great Seminole Nation of Oklahoma. As an article from the Oklahoman clarifies, not all Seminole individuals have shared opinions (Tramel, 2011). Writing for FSUnews in 2020, Matty Mendez provides a concise yet reflective consideration of this issue as an FSU student (Mendez, 2020). Even the imprimatur of tribal endorsement does not ensure a lack of opposition.
For those of us who take issue with Indigenous mascots, part of the offense is that we are often forced to listen to discourses of condescension that seek to clarify for us that we do not understand how these are matters of honor and tradition. Quoting from the Honor the Chief Society website: “In the years following the controversial NCAA decision, stories began to emerge detailing the efforts of anti-imagery hate groups at Universities across the nation.” (Honor the Chief). Why is it that honor and tradition with roots reaching back centuries are somehow discarded and swept away by institutions that are, by comparison, barely toddlers in the grand arc of lived experience on the American continents? There may be many reasons why this is the case; presumably one reason is that colonialist institutional traditions matter and colonized traditions do not.
This subordination of colonized people and their cultures, values and yes, traditions, is a core problem of offensive Indigenous mascots. The refusal to acknowledge and honor the culture and values of colonized people is a dissolution of self-sufficiency and autonomy. In other words, offensive Indigenous mascots contribute to the erosion of Indigenous sovereignty. “Sovereignty” can be a broad term that operates as a kind of synonym for self-sufficiency, including at the individual level of Indigenous employees and patrons. In this context, it is worth noting that “sovereignty” is also a real and tangible consideration for the tribal nations of which Indigenous people are citizens, for those tribal nations are sovereign nations by matter of treaty rights with the United States government.
Sovereignty is a legal reality. Quoting from the Bureau of Indian Affairs website, “…federally recognized tribes are recognized as possessing certain inherent rights of self-government (i.e., tribal sovereignty)…” (What Is a Federally Recognized Tribe? | Indian Affairs, n.d.). As of this writing, there are 574 federally recognized American Indian and Alaska Native tribes and villages. There are also tribal nations with state-level recognition but not federal recognition, as well as nations that sovereign tribal nations recognize as also being sovereign, even if the U.S. federal government has chosen not to establish and maintain government-to-government relations with those same nations. Hitting close to home, my own nation, the Muscogee Nation, was at the center of a 2020 Supreme Court case, McGirt v. Oklahoma. While authored by a legal scholar, Maggie Blackhawk’s analysis of the case is accessible enough for lay readers to get a sense of the scope of complexity surrounding sovereignty in a legal context (Blackhawk, 2020).
Sovereignty is a rich topic warranting several disciplinary academic publications to address, both legal and otherwise. This article is not concerned with the litany of legal and scholarly references that might otherwise abound. Rather, the emphasis here is on understanding that sovereignty both exists and is of critical importance to Indigenous communities and individuals, as a key first step in engagement. The inherent rights of sovereignty also underscore that Indigenous mascots are offensive because those mascots are a means of hijacking the self-determination that is part and parcel of sovereignty. The unwillingness of those who cling to Indigenous mascots to relinquish them is more than a commitment to tradition. This refusal is also a denial to Indigenous people of the basic rights of self-determination. It is this beginning of the spiral into dehumanization that is so problematic.
Even if the process of dehumanization inherent in the racism of Indigenous mascots were only a matter of violations of respect, that should still be enough to take the issue seriously. However, the consequences to Indigenous people do not stop with a violation of respect. There are more sinister implications for Indigenous people that extend from the general dehumanization of our existence. Health disparities, in particular, provide an example in high relief.
The Indian Health Service lists a number of mortality disparities. These disparities include higher rates of heart disease, substance abuse, suicide, and more. Indigenous people also experience higher rates of homicide and violence (Disparities | Fact Sheets, 2013). Indigenous women and girls, in particular, experience violence at a disproportionate rate that is alarming and horrifying (Violence Against American Indian and Alaska Native Women and Men | National Institute of Justice, n.d.).
The Missing and Murdered Indigenous Women and Girls (MMIWG) crisis is too often overlooked by mainstream media and the broader public. A higher regard for Indigenous people might go a long way in raising awareness of MMIWG and, in turn, improve resources and interventions to protect Indigenous lives. A 2016 study by the National Institute of Justice (NIJ) found that more than four in five American Indian and Alaska Native women (84.3 percent) have experienced violence in their lifetime, including 56.1 percent who have experienced sexual violence. There is also a crisis of case follow-up for reports of MMIWG. The Urban Indian Health Institute’s report, “Missing and Murdered Indigenous Women & Girls: A Snapshot of Data,” opens by highlighting two compelling statistics: 5,712 cases of MMIWG were reported in 2016, with only 116 of those cases logged in the Department of Justice database. It also quotes a statistic from the Centers for Disease Control and Prevention that murder is the third-leading cause of death for American Indian and Alaska Native women.
Other statistics include those from the Bureau of Justice Statistics. Rates are broken out in a table by “White,” “Black,” “Hispanic,” “Asian/Native Hawaiian/Other Pacific Islander” and the incredibly helpful “Other” category (Violent Victimization by Race or Hispanic Origin, 2008–2021 | Bureau of Justice Statistics, n.d.). Indigenous people are often included in a category that might be labeled as “other,” “unknown,” or, as CNN famously reported on 2020 Presidential voting statistics, “Something Else” (KTOO, 2020).
This is not an effort to make a case for causation, nor to suggest that the existence of Indigenous mascots leads to health disparities, increases in suicide, and increases in violence, murder, and human trafficking. It is the case that many Indigenous people see in mascots the perpetuation of dehumanization and an implication that we are somehow a subspecies of humankind. For those who argue that Indigenous mascots are symbols of honor and tradition, this is the counterpoint. For many Indigenous people, mascots are symbols, though not symbols of honor and tradition. Rather, they are symbols of the endorsement of dehumanization and subspeciation of Indigenous people. This same lesser-than valuation of Indigenous lives is also enmeshed in much larger and more complex issues of access to health and wellbeing, to the justice system, and to the information resources that enable individuals and communities to access and leverage these and other societal infrastructures.
What are the takeaways for libraries? The key takeaway is understanding that Indigenous employees serve in your libraries and that Indigenous patrons are part of your service population. While it may not be possible to steer the broader institution or society, it is possible to act in an informed manner to provide more inclusive and welcoming spaces for Indigenous employees and patrons. (For a Canadian perspective on both de-colonizing library spaces and a succinct assessment of Indigenous library employees, see Edwards, 2019. For an overview of Indigenous students and academic libraries, see Bucy, 2022.)
Actionable Ideas
Awareness
Most academic libraries are not part of institutions that maintain a problematic mascot, but some institutions still have active racist mascots. Others have ostensibly retired their mascots but have eschewed the kind of senior administrative integrity and commitment to moral and ethical honor needed to fill the resulting vacuum. UIUC comes to mind as an institution where senior university administrators can still only muster enough integrity to engage in hand-wringing, staged confusion about why a subspecies of Indigenous almost-people would want to be treated with dignity and respect. However, even those of you who do not have a current problematic mascot certainly have people who work for your institution and library, are served by your institution and library, and are guests of your institution and library who encounter mascots in K-12 schools and professional sports. Your institution and library may be in geographic proximity to, or in partnership with, some of these institutions, or may encounter alumni or fans of professional sports teams that identify with problematic mascots. This is not a call to action for library administrators to stage open protests as a matter of course in your institutional obligations. Rather, as has been emphasized already, knowing that you have Indigenous employees, patrons, and guests is a key contribution in its own right. Know the sovereign nations that are acknowledged in your land acknowledgement. Also know those sovereign nations that are part of your institutional and library constituencies.
Knowing when you have sovereign governments as part of your geographic community is both a critical first step for engagement and, one would hope, a fundamental point of awareness for senior administrators and leaders, both within the library and beyond. The particulars of that awareness are rather beside the point. Knowing that you may share proximity with a sovereign nation would seem to be a matter of an informed administration, as most institutions of higher education tend not to embrace willful ignorance. To spell it out, a lack of awareness of tribal institutions near your own does constitute willful ignorance, and it is an ignorance that is easy to remedy. It is this key point about willful ignorance that is underscored for our current purposes.
Land acknowledgements
Many libraries can repurpose a land acknowledgement that has been created by the larger institution. Libraries that seek to engage with the land acknowledgement more fully can adapt the institutional land acknowledgement or create their own. Arizona State University Library has such a land acknowledgement, as one example (Indigenous Land Acknowledgement | ASU Library, n.d.).
Land acknowledgements are not embraced by all Indigenous people. See, for example, Robinson, 2019. At least one reason for this is that they can become performative and hollow. If your institution has a land acknowledgement, and if your library chooses to create its own, it is worth having an intentional conversation. When and for what purposes will you use your land acknowledgement?
In addition to posting on your library website, a deliberate and employee-engaged agreement about when and how to read and share your land acknowledgment may go a long way in staving off the awkward performative experience that sometimes accompanies land acknowledgements. This does not need to be a universal consensus. Discussions within departments and units or within employee representation groups may have to serve these purposes, as happens with other topics for which there is not a straightforward technical solution. Even communicating a justification as shared knowledge may be beneficial. In addition, intentional silence before and after the land acknowledgement can mark the sharing of the land acknowledgement with reflection and solemnity. This need not be an extensive time. However, a rushed and compulsory reading is not going to leave anyone feeling ennobled, empowered, or affirmed. If your library cannot achieve the kind of tone a land acknowledgement deserves, skipping it is probably preferable to a hasty reading that smacks of rote obligation and performativity. Reading the land acknowledgement as though you are reading it to honored Indigenous guests and to honored Indigenous hosts of a sovereign nation will serve you well because you are in fact reading it to a sovereign nation.
The value of a land acknowledgement, and the reading of said acknowledgement, can serve as the counterbalance to the experience of mascots for Indigenous employees and patrons. Just as the mascot can leave your Indigenous employees, patrons, and other Indigenous guests of the library feeling overlooked, undervalued, and marginalized, a sincere land acknowledgement can recognize that you welcome Indigenous people, our Ancestors, our living Elders, and that you affirm your awareness that we still exist, live with you, work with you, and that we are in your service population.
Partnerships and collaboration
For institutions that have a specific scope of programs, are private institutions, or otherwise are not a natural fit for partnering with tribal governments and tribal institutions, there is good news. A key piece of advice is that you probably should not force an engagement strategy for the sake of having one. If a partnership with a sovereign tribal nation is not a natural fit for your institution, it will be unlikely to be so for the sovereign tribal nation and community.
If you are a public institution, especially a land grant institution, hold relevant collections, or otherwise have an impetus for a partnership, treat those as any other partnership. For best practice, ask yourselves first what you can offer or contribute by doing work within your library or institution that benefits the partnership as much as possible. In short, avoid extractive partnerships that are created for the purposes of a social media post or a seemingly impressive line in an annual report. Too often, tribal communities or experts are flooded with requests, many of which take time and effort, and contribute no value to the community or individual on the other end of the proposed partnership.
You also do not need to approach Indigenous communities and individuals as though everything is shrouded in sage smoke and mirrors. The people driving the partnerships you seek, or are cultivating, are individuals who are familiar with the contemporary world and who are active professionals and experts. It is a personal aggravation of mine to be asked, “how do I engage with tribal communities,” and to have to answer, “Did you send a respectful introductory email or make a phone call?” The vast majority of the time, that is the ideal first step. From there, the person or people with whom you are interacting are able to let you know what they need and expect from you as readily as any other colleague, expert, or organizational partner.
Collections
Consider buying materials relevant to your library user populations, including Indigenous communities and people. This is not a call to buy any and all collections materials available. The same collection development principles guiding any collections apply here. An easy example might include juvenile literature collections and language collections. All academic libraries collect within parameters that fit their long-term and local needs. To repeat a recurring theme, be aware of those Indigenous communities that are most likely to use your library and buy materials that support whatever patron needs make sense. If you have Indigenous scholars, students, or public visitors, do you have materials that are relevant for their research, curricular, and reading needs?
On that front, a note on special collections is warranted here. Where you hold special collections that are Indigenous materials, encourage and rely on your special collections experts. If they seek further training on topics such as repatriation or the appropriate use of Traditional Knowledge labels (TK Labels – Local Contexts, n.d.), administrators can provide the same level of professional development funding and time allocation as for any other area of professional practice. If you hold materials that a tribal community is more than happy for you to hold, and those materials can benefit researchers, experts, and community members, an outreach and engagement strategy can be implemented as with any other community that might be a stakeholder in your collections. In the United States, Arizona State University’s Labriola Data Center is by far the standard-bearer for best practices in incorporating professional principles for both libraries and archives. For an initial overview of the literature on Indigenous collections in both libraries and archives, see Reijerkerk and Nyitray, 2022.
If you do have materials that are sensitive in nature, this is another area where established professional practice will serve you well. Your special collections staff, be they archivists or rare books experts, are well acquainted with materials that are sensitive in nature or may need access restrictions in place. If it is critical for you to know why those restrictions need to be in place, letting the impacted Indigenous communities and individuals lead that discussion is an effective strategy to show respect. In some instances, knowing that restrictions are necessary is and should be sufficient. If a tribal Elder or expert tells you that some materials need to be restricted and that they cannot get into detailed explanations as to why, believing them is really all that is required of your special collections professionals. It bears reiterating that sensitive and restricted materials have long existed in special collections, and your staff either know the protocols or know where to seek professional development. Working with Indigenous communities and individuals is not so distinct from established professional practices that fear of engagement should inhibit said engagement.
To contextualize this, we can look to the Native American Graves Protection and Repatriation Act (NAGPRA). Because NAGPRA so often involves objects, it may be more familiar to museum professionals, but libraries and archives are also impacted by NAGPRA. The discussion above is not meant to imply that professionals disregard NAGPRA. Rather, it is to state that if your collections are impacted by NAGPRA, your professional staff should and do know this. Placing confidence and trust in both the Indigenous communities and in your own professionals is the takeaway here. What is not acceptable is fretting that Indigenous materials are steeped in such mystery and otherness that the library and archives avoid engagement with these materials and these communities.
Conclusion
While an editorial for academic library employees is far from a brave stand as an activist, this article is the beginning of an effort to reclaim my responsibilities to other Indigenous people. It is also an effort to remediate the cowardice I initially showed in failing to stand with, and on behalf of, braver students who have led the protests and calls for a new mascot at UIUC. It is a weak beginning, but a beginning nevertheless.
The opening statement also indicated that this is a cautionary tale of retention. Library administrators are often concerned about retention of employees, especially “diversity” employees. There are also times when library administrators may be content for a certain amount of healthy turnover. In my own case, I am not the one who can assess nor articulate if my leaving Illinois is a matter of failed retention or a win for the University Library overall. However, the cautionary tale here is that it is possible to lose employees because of the broader institutional environment.
Ultimately, I took to heart the message that was sent to me loudly and clearly by the University of Illinois Urbana-Champaign. I am neither welcomed nor wanted there. This was the ultimate message I took away from the unresolved mascot controversy. At best, I served as a token hire. At worst, I was a necessary evil to fill a gap. What I did decide is that I was neither valued nor welcomed at Illinois. Is this the University Library’s fault? I do not think the University Library is responsible for turning the tide on larger institutional controversies.
However, the willingness of library administrators and employees to address and redress such controversies and issues is within the hands of any library. As we find ourselves in a time of attacks on diversity, equity, and inclusion initiatives, this is a call to understand the problem as more than a Band-Aid issue. It is also more than a general, theoretical question about blandly stated commitments to diversity recruitment. It is a moment for all of us to ask, “Am I doing what I can do to make sure employees, students, researchers, and visitors all feel welcome in my library, or am I clinging to cowardice to preserve my own comfort?”
As a department head myself in my current role, I am confronted with having to change course. I was not in a leadership position of any kind in 2017, hierarchically speaking. However, the failure to embrace leadership was not a failure of the nature of my position at the time. It was my own failure to support those who were doing the harder work of long-term thinking and planning. Fortunately, university students will continue to believe they really can change the course of history and the world. The current student generations will always achieve these goals. We can either be privileged to participate or obstinate enough to be a negative footnote to their efforts.
Acknowledgements
I offer much gratitude to reviewers Jane Nichols and Ian Beilin for insightful comments and for going as far as to suggest possible additions for citations. I was happy to incorporate these. I also appreciate the sincerity and close reading that both have provided. I also thank Jaena Rae Cabrera and the editorial board for their flexibility, even when I missed a key deadline.
I owe one final acknowledgement:
There exists a mighty oak tree who has nurtured many acorns to grow. This article is one of the seeds sown by said oak tree.
References
Blackhawk, M. (2020). On Power and the Law: McGirt v. Oklahoma. Supreme Court Review, 2020, 367–422. https://doi.org/10.1086/715493
Bucy, R. (2022). Native American Student Experiences of the Academic Library. College & Research Libraries, 83(3), 416. https://doi.org/10.5860/crl.83.3.416
Burke, S. K. (2007). The Use of Public Libraries by Native Americans. The Library Quarterly, 77(4), 429–461. https://doi.org/10.1086/520998
Edwards, A. (2019). Unsettling the Future by Uncovering the Past: Decolonizing Academic Libraries and Librarianship. Partnership: The Canadian Journal of Library and Information Practice and Research, 14(1). https://doi.org/10.21083/partnership.v14i1.5161
Henrich, K. J., and Prorak, D. (2010). A school mascot walks into the library: Tapping school spirit for library instruction videos. Reference Services Review, 38(4), 663–675. https://doi.org/10.1108/00907321011090791
Littletree, S. (2018, July 31). “Let Me Tell You About Indian Libraries”: Self-Determination, Leadership, and Vision — The Basis of Tribal Library Development in the United States. http://hdl.handle.net/1773/42418
Reijerkerk, D., and Nyitray, K. J. (2023). (Re)Moving Indigenous Knowledge Boundaries: A Review of Library and Archival Collection Literature since the Enactment of the Native American Graves Protection and Repatriation Act (NAGPRA). Collection Management, 48(1), 22–47. https://doi.org/10.1080/01462679.2022.2033144
Robinson, D., Hill, K. J. C., Ruffo, A. G., Couture, S., and Ravensbergen, L. C. (2019). Rethinking the Practice and Performance of Indigenous Land Acknowledgement. Canadian Theatre Review, 177(1), 20–30. https://muse.jhu.edu/article/716610
Taylor, M. (2015). Indian-Styled Mascots, Masculinity, and the Manipulated Indian Body: Chief Illiniwek and the Embodiment of Tradition. Ethnohistory, 62(1), 119–143. https://doi.org/10.1215/00141801-2681750
The process of standardising data with ODE is a direct contribution to AAOO's goal of creating AI models that are built on respectful, accurately represented, and ethically sourced data from the Global South.
Win free books from the November 2025 batch of Early Reviewer titles! We’ve got 251 books this month, and a grand total of 3,430 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Tuesday, November 25th at 6PM EST.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Israel, Canada, Australia, Germany, Ireland, Poland, Luxembourg, Malta and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
We’ve just released a new version of Zotero for iOS that adds two major new features:
EPUB and webpage snapshot annotation
Zotero 7 added the ability to view and annotate EPUBs and webpage snapshots, and it’s now possible in Zotero for iOS as well. You can add highlights, underlines, and sticky notes to EPUBs and snapshots and seamlessly sync those annotations with Zotero on your computer.
EPUBs and webpage snapshots are particularly well suited to mobile devices because of their ability to reflow text based on screen size and orientation, and some publishers now offer EPUBs as alternatives to PDFs. Zotero will still automatically download PDFs rather than EPUBs from most sites, but we’ll be providing a way to choose a preferred attachment format in a future version.
PDF metadata retrieval
Zotero for iOS can also now automatically retrieve bibliographic details for standalone PDFs shared with the app from your browser or other apps or added directly from within the app. We still recommend sharing an article page from the browser when possible and allowing the app to automatically download the available PDF, but when that’s not an option, you can save a PDF directly and let the app attempt to create a parent item without needing to run Retrieve Metadata back on the desktop.
This year we’re once again focusing on indie bookstores from around the United States. You can still order Kindle ebooks, we have Kennys and Blackwell’s for international orders, and we also have stores local to Australia, New Zealand, and Ireland. » SIGN UP FOR SANTATHING NOW!
You pay $15–$50 and pick your favorite bookseller. We match you with a participant, and you play Santa by selecting books for them. Another Santa does the same for you, in secret. LibraryThing does the ordering, and you get the joy of giving AND receiving books!
Sign up once or thrice, for yourself or someone else.
Even if you don’t want to be a Santa, you can help by suggesting books for others. Click on an existing SantaThing profile to leave a suggestion.
Every year, LibraryThing members give generously to each other through SantaThing. If you’d like to donate an entry, or want to participate, but it’s just not in the budget this year, be sure to check out our Donations Thread here.
Important Dates
Sign-ups close MONDAY, November 24th at 12pm EST. By the next day, we’ll notify you via LibraryThing message who your Santee is, and you can start picking books.
You’ll then have a little more than a week to pick your books, until THURSDAY, December 4th at 12pm EST. As soon as the picking ends, the ordering begins, and we’ll get all the books out to you as soon as we can.
We’re once again teaming up with independent bookstores from around the country to deliver your SantaThing picks, including BookPeople in Austin, TX, Longfellow Books in Portland, ME, and Powell’s Books in Portland, OR.
We’re also bringing back the following foreign retail partners: Readings for our Australian participants, Time Out Books for the Kiwi participants, and Kennys for our Irish friends.
International deliveries will be available through Kennys and Blackwell’s.
Kindle options are available to all members, regardless of location. To receive Kindle ebooks, your Kindle must be registered on Amazon.com (not .co.uk, .ca, etc.). See more information about all the stores.
Shipping
Some of our booksellers are able to offer free shipping, and some are not. Depending on your bookseller of choice, you may receive $6 less in books, to cover shipping costs. You can find details about shipping costs and holiday ordering deadlines for each of our booksellers here on the SantaThing Help page. » Go sign up now!
Questions? Comments?
This is our NINETEENTH year of SantaThing. See the SantaThing Help page for further details and FAQ.
Feel free to ask your questions over on this Talk topic, or you can contact Kate directly at kate@librarything.com.
Happy SantaThinging!
Hello DLF Community! It’s November, which means it’s finally, finally, finally time for the Forum in Denver! After months of preparation, we are so excited to welcome many of you to convene in person in the Mile High City for the 2025 DLF Forum and Learn@DLF. If you can’t make it, be sure to check the Forum website soon for shared notes, uploaded presentations, and information on how to stream our fantastic opening plenary talk by featured speaker Dr. KáLyn Coghill. Read on for news about what else is going on in our community this month.
— Aliya from Team DLF
This month’s news:
Community Call: IIIF welcomes the team from DadoCM, a minimal, system-agnostic framework for describing digital objects with archival principles, to their Community Call on November 12 at 12pm ET / 9am PT. More info is available on the IIIF Community Calendar.
Working Group Survey: The DLF Assessment Interest Group, Metadata Working Group (MWG) is conducting a brief survey — expected to take around 5 minutes or less — to gather data for planning next year’s activities. If you are a current or former member of the MWG or have any interest in metadata assessment and quality activities, please consider answering a few questions.
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
DLF Digital Accessibility Working Group (DAWG): Tuesday, 11/4, 2pm ET / 11am PT
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 11/4, 2pm ET / 11am PT
DLF AIG Metadata Assessment: Thursday, 11/6, 1:15 pm ET / 10:15 am PT
DLF AIG Cultural Assessment Working Group: Monday, 11/10, 1pm ET / 10am PT
DLF AIG User Experience Working Group: Friday, 11/21, 11am ET / 8am PT
arXiv’s computer science (CS) category has updated its moderation
practice with respect to review (or survey) articles and position
papers. Before being considered for submission to arXiv’s CS category,
review articles and position papers must now be accepted at a journal or
a conference and complete successful peer review. When submitting review
articles or position papers, authors must include documentation of
successful peer review to receive full consideration. Review/survey
articles or position papers submitted to arXiv without this
documentation are likely to be rejected and will not appear on arXiv.
A warning about how AI chat tools—like ChatGPT, Claude, Gemini, and
others—can become dangerously addictive in a specific way. Not like
social media addiction. Something harder to spot.
Yet the boat was a refuge. We had wandered in search of somewhere to
settle — or, more precisely, somewhere we would be allowed to settle —
for four years, moving between whatever cheap, temporary accommodation
we could borrow or afford to rent in seven countries, until this unkempt
but graceful vessel suggested a possibility of shelter, of ‘fixedness’,
even as we were compelled by circumstances to remain mobile, unrooted
Part history, part fantasy. Hakim Bey is the nom de plume of Peter
Lamborn Wilson, who led an interesting life:
https://en.wikipedia.org/wiki/Peter_Lamborn_Wilson
Continuous and reliable access to curated biological data repositories
is indispensable for accelerating rigorous scientific inquiry and
fostering reproducible research. Centralized repositories, though widely
used, are vulnerable to single points of failure arising from
cyberattacks, technical faults, natural disasters, or funding and
political uncertainties. This can lead to widespread data
unavailability, data loss, integrity compromises, and substantial delays
in critical research, ultimately impeding scientific progress.
Centralizing essential scientific resources in a single geopolitical or
institutional hub is inherently dangerous, as any disruption can
paralyze diverse ongoing research. The rapid acceleration of data
generation, combined with an increasingly volatile global landscape,
necessitates a critical re-evaluation of the sustainability of
centralized models. Implementing federated and decentralized
architectures presents a compelling and future-oriented pathway to
substantially strengthen the resilience of scientific data
infrastructures, thereby mitigating vulnerabilities and ensuring the
long-term integrity of data. Here, we examine the structural limitations
of centralized repositories, evaluate federated and decentralized
models, and propose a hybrid framework for resilient, FAIR, and
sustainable scientific data stewardship. Such an approach offers a
significant reduction in exposure to governance instability,
infrastructural fragility, and funding volatility, and also fosters
fairness and global accessibility. The future of open science depends on
integrating these complementary approaches to establish a globally
distributed, economically sustainable, and institutionally robust
infrastructure that safeguards scientific data as a public good, further
ensuring continued accessibility, interoperability, and preservation for
generations to come.
A quick note: At my company, Cleanview, we recently built a data center
tracker. I’ve been following the development of AI and data centers
closely for much of the last few years. And even I’ve been shocked by
what I’ve learned in building this product. Today I’m publishing the
first story that uses our data center tracker. If you want to use the
platform, you can sign up for Cleanview here.
Open-source data portals can be really hard to install and maintain. But
their basic purpose of providing links to download data really isn’t
that complicated. JKAN is a proof-of-concept that allows a small,
resource-strapped government agency to stand up an open data portal by
simply clicking the fork button.
Hyparquet is a lightweight, dependency-free, pure JavaScript library for
parsing Apache Parquet files. Apache Parquet is a popular columnar
storage format that is widely used in data engineering, data science,
and machine learning applications for efficiently storing and processing
large datasets.
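To give a rough sense of what using it looks like, here is a minimal sketch in TypeScript based on my reading of the hyparquet README; the function names (asyncBufferFromFile, parquetReadObjects) are from memory and should be checked against the current documentation before relying on them:

  // Minimal sketch: read a Parquet file and print the first few rows.
  // Assumes hyparquet exports asyncBufferFromFile and parquetReadObjects,
  // per its README; verify the names against the current docs.
  import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet'

  async function main(): Promise<void> {
    // Wrap the file in the async, range-readable buffer hyparquet expects,
    // so only the byte ranges needed for the requested data are read.
    const file = await asyncBufferFromFile('example.parquet')

    // Read rows back as plain JavaScript objects keyed by column name.
    const rows = await parquetReadObjects({ file })
    console.log(rows.slice(0, 5))
  }

  main().catch(console.error)

Because Parquet is columnar, a reader along these lines can also limit a request to specific columns or row ranges instead of pulling the whole file, which is the main appeal of the format for large datasets.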
As part of our Public Data Project, LIL recently launched Data.gov
Archive Search. In this post, we look under the hood and reflect on how
and why we built this project the way we did.
Santa Sangre (Spanish pronunciation: [ˈsãn̪.t̪a ˈsãŋ.ɡɾe], ‘Holy Blood’)
is a 1989 surrealist psychological horror film directed by Alejandro
Jodorowsky and written by Jodorowsky along with Claudio Argento and
Roberto Leoni. It stars Axel Jodorowsky, Adán Jodorowsky, Teo
Jodorowsky, Blanca Guerra, Thelma Tixou, and Guy Stockwell. An
international co-production of Mexico and Italy, the film is set in
Mexico, and tells the story of Fénix, a boy who grew up in a circus, and
his struggle with childhood trauma. It is included in Empire magazine’s
2008 list of the 500 Greatest Movies of All Time.
A free, open source, and extensible speech-to-text application that
works completely offline.
Handy is a cross-platform desktop application built with Tauri (Rust +
React/TypeScript) that provides simple, privacy-focused speech
transcription. Press a shortcut, speak, and have your words appear in
any text field—all without sending your voice to the cloud.
mason.nvim is a Neovim plugin that allows you to easily manage external
editor tooling such as LSP servers, DAP servers, linters, and formatters
through a single interface. It runs everywhere Neovim runs (across
Linux, macOS, Windows, etc.), with only a small set of external
requirements needed.
Help articulate a critical theoretical framework that can serve as a
base for social movements: we perceive that there is a widespread loss
of historical memory and a lack of deep understanding of the origin of
our current problems, especially what ideologies have brought us here.
We want to fight the hopelessness that there is no horizon beyond
capitalism with a critical discourse that helps us feel that we can have
control over the narrative of our reality.
The Python Software Foundation (PSF) has walked away from a $1.5 million
government grant and you can blame the Trump administration’s war on
woke for effectively weakening some open source security.
This page includes information about the Anthropic settlement with
copyright holders in the LibGen dataset. It is supposedly the largest
class action lawsuit involving copyright ($1.5 billion).
The Platform for Experimental, Collaborative Ethnography (PECE:
pronounced “peace”) is an open source (Drupal-based) digital platform
that supports multi-sited, cross-scale ethnographic and historical
research. The platform links researchers in new ways, enables new kinds
of analyses and data visualization, and activates researchers’
engagement with public problems and diverse audiences. PECE is at the
center of a research project that explores how digital infrastructure
can be designed to support collaborative hermeneutics…
It’s about how we learn, the nature of knowledge, the very technological
nature of humanity, and the very human nature of technology. It’s about
how our participation in (not just our use of) technology brings about
knowledge and skills, and it’s about the very complex nature of how that
happens. It’s about how education, treated as a technology, actually
works rather than how it is designed to work, and why it often fails to
work or surprisingly succeeds. It’s about how what we do (the hard
methods, tools, theories, models, processes, and structures we employ)
is less important than the way that we do it (the soft technique we
bring to assemble it all). It’s about the parts we play in the cognition
of others, and the parts they play in our own. It is about what it means
to be human.
Back in January of this year I decided to leave Pinboard, and start managing my bookmarks
myself. I had been using the service for almost 15 years, so it was kind
of a big change. The reasons for the move were that Pinboard had seemed
increasingly unstable, and I discovered that the site owner had
expressed some political views that didn’t align very well with my own.
Plus, I had wanted to see what other tools were like, especially open
source options.
After experimenting with a few different tools I landed (for the time
being) on running Shiori at home on a
Raspberry Pi. I was able to import my historical bookmarks from
Pinboard, and it has been working pretty well for me so far.
One thing I miss a little bit is being able to share what I’m reading
publicly. So over the weekend I wrote a small utility
that talks to the Shiori API and writes a summary of what I’ve been
reading this week as a post here. Since this site is still running with
Jekyll, it’s a small bit of Ruby that runs from cron every Sunday
evening. You can see an example of one of these here.
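For anyone curious what such a utility looks like, here is a rough sketch of the approach in Python (the original is a small Ruby script). The Shiori endpoint paths, JSON field names, and login flow shown here are assumptions for illustration; check the Shiori API documentation for the real ones.

```python
# Rough sketch only: the Shiori endpoint paths, JSON field names, and login flow
# below are assumptions for illustration, not the documented API. The original
# utility is a small Ruby script run from cron; this is the same idea in Python.
import datetime
import pathlib
import requests

SHIORI_URL = "http://localhost:8080"      # assumed local Shiori instance
USERNAME, PASSWORD = "me", "secret"       # placeholder credentials

def fetch_recent_bookmarks(days=7):
    session = requests.Session()
    # Hypothetical login and listing endpoints -- consult the Shiori API docs.
    session.post(f"{SHIORI_URL}/api/login",
                 json={"username": USERNAME, "password": PASSWORD})
    resp = session.get(f"{SHIORI_URL}/api/bookmarks")
    resp.raise_for_status()
    cutoff = (datetime.date.today() - datetime.timedelta(days=days)).isoformat()
    # Assumes an ISO 8601 "createdAt" field, so string comparison works.
    return [b for b in resp.json().get("bookmarks", [])
            if b.get("createdAt", "") >= cutoff]

def write_jekyll_post(bookmarks):
    today = datetime.date.today().isoformat()
    lines = ["---", "layout: post", "title: What I read this week",
             f"date: {today}", "---", ""]
    lines += [f"- [{b['title']}]({b['url']})" for b in bookmarks]
    post = pathlib.Path("_posts") / f"{today}-weekly-reading.md"
    post.write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_jekyll_post(fetch_recent_bookmarks())
```

A cron entry pointed at a script like this, run on Sunday evenings, is all the scheduling the original setup needs.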
If you have any thoughts about this please let me know!
Yet the boat was a refuge. We had wandered in search of somewhere to
settle — or, more precisely, somewhere we would be allowed to settle —
for four years, moving between whatever cheap, temporary accommodation
we could borrow or afford to rent in seven countries, until this unkempt
but graceful vessel suggested a possibility of shelter, of ‘fixedness’,
even as we were compelled by circumstances to remain mobile, unrooted
Part history, part fantasy. Hakim Bey is the nom de plume of Peter
Lamborn Wilson, who led an interesting life:
https://en.wikipedia.org/wiki/Peter_Lamborn_Wilson
Continuous and reliable access to curated biological data repositories
is indispensable for accelerating rigorous scientific inquiry and
fostering reproducible research. Centralized repositories, though widely
used, are vulnerable to single points of failure arising from
cyberattacks, technical faults, natural disasters, or funding and
political uncertainties. This can lead to widespread data
unavailability, data loss, integrity compromises, and substantial delays
in critical research, ultimately impeding scientific progress.
Centralizing essential scientific resources in a single geopolitical or
institutional hub is inherently dangerous, as any disruption can
paralyze diverse ongoing research. The rapid acceleration of data
generation, combined with an increasingly volatile global landscape,
necessitates a critical re-evaluation of the sustainability of
centralized models. Implementing federated and decentralized
architectures presents a compelling and future-oriented pathway to
substantially strengthen the resilience of scientific data
infrastructures, thereby mitigating vulnerabilities and ensuring the
long-term integrity of data. Here, we examine the structural limitations
of centralized repositories, evaluate federated and decentralized
models, and propose a hybrid framework for resilient, FAIR, and
sustainable scientific data stewardship. Such an approach offers a
significant reduction in exposure to governance instability,
infrastructural fragility, and funding volatility, and also fosters
fairness and global accessibility. The future of open science depends on
integrating these complementary approaches to establish a globally
distributed, economically sustainable, and institutionally robust
infrastructure that safeguards scientific data as a public good, further
ensuring continued accessibility, interoperability, and preservation for
generations to come.
A quick note: At my company, Cleanview, we recently built a data center
tracker. I’ve been following the development of AI and data centers
closely for much of the last few years. And even I’ve been shocked by
what I’ve learned in building this product. Today I’m publishing the
first story that uses our data center tracker. If you want to use the
platform, you can sign up for Cleanview here.
Open-source data portals can be really hard to install and maintain. But
their basic purpose of providing links to download data really isn’t
that complicated. JKAN is a proof-of-concept that allows a small,
resource-strapped government agency to stand up an open data portal by
simply clicking the fork button.
Hyparquet is a lightweight, dependency-free, pure JavaScript library for
parsing Apache Parquet files. Apache Parquet is a popular columnar
storage format that is widely used in data engineering, data science,
and machine learning applications for efficiently storing and processing
large datasets.
As part of our Public Data Project, LIL recently launched Data.gov
Archive Search. In this post, we look under the hood and reflect on how
and why we built this project the way we did.
Santa Sangre (Spanish pronunciation: [ˈsãn̪.t̪a ˈsãŋ.ɡɾe], ‘Holy Blood’)
is a 1989 surrealist psychological horror film directed by Alejandro
Jodorowsky and written by Jodorowsky along with Claudio Argento and
Roberto Leoni. It stars Axel Jodorowsky, Adán Jodorowsky, Teo
Jodorowsky, Blanca Guerra, Thelma Tixou, and Guy Stockwell. An
international co-production of Mexico and Italy, the film is set in
Mexico, and tells the story of Fénix, a boy who grew up in a circus and
his struggle with childhood trauma. It is included in Empire magazine’s
2008 list of the 500 Greatest Movies of All Time.
We associate Halloween with many supernatural creatures, including werewolves. However, these werewolves have a long history in folklore predating their association with Halloween. Werewolves have been featured in literature for thousands of years, including the novel Satyricon by Petronius. Werewolves remain popular in modern fiction as well as in films and television series. Unlike vampires, who frequently headline supernatural narratives, werewolves seem to be more commonly featured in supporting roles within the genre, like the werewolves Jacob in The Twilight Saga and Wayne in Hotel Transylvania. If you enjoy a good werewolf story, read on for a description of some resources that will leave you howling for more—all available in WorldCat, of course!
Metamorphoses
Engraving by Henrick Goltzius, National Gallery of Art, CC0, via Wikimedia Commons
The myth of King Lycaon is not the earliest werewolf story but is an important exemplar of the wolf transformation being used as a punishment. In Metamorphoses, the Roman poet Ovid describes how the god Jupiter transformed Lycaon into a wolf as a punishment for attempting to trick the disguised Jupiter into eating human flesh. Several 16th-century editions of Metamorphoses include illustrations of this myth, but one of the best may be from an incomplete project by Dutch engraver Henrick Goltzius to provide 300 illustrations for Metamorphoses. Fifty-Two Engravings Illustrating Ovid’s Metamorphosis contains illustrations he completed, including one depicting the Lycaon myth. For readers who love etymology, yes, there is a connection between King Lycaon’s name and lycanthrope. The word lycanthrope derives from two Greek words: λύκος (wolf) and ἄνθρωπος (man).
The Phantom Ship
Most werewolves in stories were male until the 1839 publication of Frederick Marryat’s Gothic novel The Phantom Ship, which is available to read online at HathiTrust. In chapter 39, the character Hermann Krantz tells a story about a mysterious woman named Christina who married his father and who, the family discovered, could transform into a white wolf. It is noteworthy that Christina came from Transylvania, which would later be the setting of the 1897 novel Dracula. This chapter has been published as a short story called “The White Wolf of the Hartz Mountains” in anthologies such as Terrifying Transformations: an Anthology of Victorian Werewolf Fiction, 1838-1896.
This is a horror story, with Christina beating her stepchildren in her human form and killing Krantz’s siblings in her wolf form. Like many supernatural creatures in Gothic fiction, the female werewolf in The Phantom Ship is simply evil, rather than the complex, cursed human of later literature.
While The Phantom Ship lacks the popularity of other Gothic novels like Dracula and Frankenstein, the contribution of the female werewolf character paved the way for another female werewolf in my favorite modern werewolf story, Bitten by Kelley Armstrong. The novel’s main character is Elena Michaels, the only known female werewolf. Elena struggles to reconcile her human and werewolf sides in this novel and its sequels. I don’t want to spoil the story for those of you interested in reading the novel, so I’ll just say that Elena stands out in the pack.
Teen Wolf
The motion picture Teen Wolf, directed by Rod Daniel and starring Michael J. Fox, may not be a horror film classic, but this coming-of-age comedy provides a bold twist to the typical werewolf story. High school student Scott Howard’s lycanthropy is an asset instead of a curse. As a werewolf, Howard uses his superior athletic abilities to help his basketball team win games, and he becomes very popular. With popularity comes arrogance, and thus Howard learns to be himself and work with his teammates. As a child of the 80s, I remember this film fondly for its simplistic storytelling. Also, unlike other supernatural movies from the same time period, this movie never gave me nightmares.
There is something for everyone in werewolf stories—horror, comedy, bildungsroman, and paranormal romance. If your Halloween plans are not already made, consider enjoying a good werewolf tale. Happy Halloween!
“No library stands alone,” remarks Valerie Horton, a longtime director of library consortia. “Library cooperation goes back to the 1880s and is a long-standing tenet of the profession. Collaboration is strongly rooted in most of our current activities.” Horton goes on to suggest a number of reasons why this is so: professional networking, access to more resources, sharing expertise, prestige, and, of particular significance, economies of scale, with the promise of cost savings and the ability to reallocate resources to emerging priorities.
Horton’s observations were published in 2015, but they still resonate today as key drivers for libraries to join consortia and engage in other forms of partnership. However, collaboration comes at a cost: the direct costs of staff and other resources required to participate, as well as the indirect cost of losing some autonomy (for more on the collaboration/autonomy trade-off, see the “Coordination Spectrum” in the 2019 OCLC report Operationalizing the BIG Collective Collection: A Case Study of Consolidation vs Autonomy). Choosing to collaborate therefore becomes a strategic decision requiring careful evaluation of costs and benefits.
Prospective collaborative partners are perhaps all too cognizant of the costs of collaboration; it is therefore important to have an equally clear articulation of the benefits, especially in areas where libraries have a long history of acting autonomously. Partnership should feel not like a sacrifice of independence, but like a strategic advantage that independent action cannot match. Tina Baich, director of the Eastern Academic Scholars Trust (EAST), a shared print collaboration of more than 150 institutions, underscored the importance of clarifying the value of collaboration in a recent presentation. One key pillar of EAST’s strategy to support its organizational vision is to “Enhance the ongoing value of membership,” which includes “Communicating the value of membership” to current and prospective members. Another pillar is “Advocate on behalf of members,” which involves efforts to “Make the case for shared print.” In both cases, there is a need to understand and communicate the motivations (and ultimately, the potential benefits) for institutions to partner around print stewardship.
Insights from shared print
Recent evidence gathered by OCLC Research offers concrete examples of what these compelling motivations look like in practice in the context of shared print programs.
Stewardship of print collections is an excellent illustration of an activity that libraries have traditionally carried out at local scale, but for which they are now adopting collaborative approaches at group scale. What are the motivations exerting sufficient gravitational force to pull libraries away from long-standing local-scale approaches? We touched on this question in the project Stewarding Collective Collections: US and Canadian Perspectives on Workflows, Data, and Tools for Shared Print, in which we gathered insights from staff at shared print programs and participating institutions on a number of topics, including the motivations for managing print monographs collectively—an important area of library collaboration.
For this project, we spoke to a total of 37 people through focus groups and individual interviews, gathering perspective from individuals from institutions participating in monographic shared print efforts—deans/directors, AULs for collections, collections librarians/strategists, metadata librarians, and resource sharing librarians. We also talked with staff from monographic shared print programs—primarily program managers—from a number of North American partnerships.
What we found
Our interviewees discussed key reasons for participating in or operating shared print programs. The most frequently mentioned were institutional roles and histories, including previous institutional experiences in collaborative collection stewardship—or collaboration of any kind. According to our interviewees, institutions that have previously invested themselves in some form of collective action are often more open to opportunities for new collaborations, such as shared print. This is especially true when members participate in consortium-sponsored shared print programs. Past experiences and established collaborative infrastructure within the consortium build trust and confidence in collective endeavors. We have described this elsewhere as acquiring an option to collaborate, where past collaborative efforts create the option to engage in future collaborative opportunities.
Several interviewees also mentioned that their status as the only or the largest institution in the region led to a sense of obligation to participate in shared print efforts.
The second most frequently mentioned incentive to join shared print programs was management of physical space—the opportunity to reduce the print collection’s physical footprint in the library, releasing space for other uses. Access to shared capacities (storage facilities, technical systems, aggregated data) was another top response, highlighting a desire for greater efficiency beyond what is achievable through local-scale implementations. Moreover, shared approaches may be the only feasible option for some institutions to obtain these capacities.
Next on the list of key drivers for participating in shared print efforts was access to holdings beyond the local collection. This response touches on a key principle behind collective collections—the idea that managing holdings collectively not only makes stewardship more efficient, but also removes barriers to discovery, delivery, and ultimately, greater use. Another incentive was the opportunity for greater decision support: using collective collection analysis to inform local collection decision-making, such as weeding, moving materials to off-site storage, and acquisition strategies.
The next two drivers mentioned by our interviewees rise above local interests to touch on opportunities to advance the common good. One is recognition of a collective responsibility to steward the print published record, an objective that can only be truly achieved by the combined efforts of many institutions working toward this common purpose. Related to this, interviewees also mentioned a desire to safeguard last or rare copies of print publications, which often can only be identified at scale, where the individual distinctiveness of local print book collections aggregates into a rich and diverse long tail of rare or even unique materials within the collective collection.
Interviewees also highlighted the importance of collection characteristics as another important factor impacting engagement in shared print efforts. Institutions that specialize in collecting in certain subject areas—like the arts or medicine—can become important strategic partners in broader, multi-institutional shared print efforts, in which specialized collections complement the holdings of other institutions. It is the broader context of a collective collection that amplifies the visibility of these local strengths.
Perhaps surprisingly, direct mention of cost savings—lowering the cost of managing print monographs through collaboration—was relatively infrequent among our interviewees. This suggests that institutions often enter shared print partnerships for non-economic reasons, although some motivations, such as shared capacities, may have implicit cost considerations.
Data-driven analysis is key for incentivizing collaboration
While not comprehensive of all incentives, these findings are indicative of some of the major factors that drive participation in shared print programs. The golden thread that runs through many of them is the importance of data-driven analysis to highlight and clarify untapped opportunities to create, expand, or optimize shared print efforts.
Want to know how participation in a shared print program will help release space for new uses? Data-driven analysis can clarify the extent of redundancy across a group’s collective print monograph holdings, identifying candidate materials for deaccessioning or removal to off-site storage. Are you interested in how the characteristics of your local print collection stack up against those of your shared print partners? Data-driven analysis can help identify potential complementarities and unique strengths, providing strategic intelligence to inform local collection development. Wondering if your collection contains hidden gems—rare or even uniquely held materials—that warrant special stewardship attention? Data-driven analysis of group- or even global-scale holdings can answer that question.
Aggregated data, combined with analytic tools that can turn it into actionable intelligence, is essential for identifying and communicating the key motivations for participation in shared print programs. These analytical approaches create the evidence needed to show how participation in collective stewardship efforts around print holdings creates value across the partnership. In doing so, they help libraries choose their collaborations strategically, through evidence-based evaluation of the incentives to join. And for shared print programs, this approach sharpens the benefits, makes them more visible, and transforms collective stewardship from an ideal into a demonstrable asset.
The importance of data-driven analysis is a key finding of our Stewarding the Collective Collection project. Our conversations with interviewees made clear that shared print is first and foremost a data-driven activity—collecting, organizing, and analyzing data about groups of collections. Shared print collections are often decentralized across many local collections, rather than existing as physically consolidated collections; in this sense, they exist, for all practical purposes, as constructs in data. Reliable data and analytical tools are therefore indispensable for successful shared print programs.
This finding was echoed in another OCLC Research study, which explored collaboration opportunities for specialized art research libraries. This study found:
“Collaboration is an important strategy for art libraries as they seek sustainability in a dynamic environment. . . . This report uses bibliographic, holdings, and ILL data to document potential opportunities for collaborative activity around art research collections. Indeed, our study of the proxy art research collective collection indicates . . . that art libraries bring a sizable group of rare or unique materials to the table that are not in other collections. And this creates demonstrable value: our study of ILL transactions found that most ILL transactions involving art libraries were for materials not owned by the borrowing institution. This is a classic case of value created through collaboration—specifically, resource sharing broadens the scope of the local collections of all partners.”
Like our shared print findings, this analysis identifying opportunities for art libraries to collaborate was conducted under the auspices of a research project. Fortunately, libraries now have the capacity to carry out similar analyses themselves, using WorldCat-powered tools like Choreo Insights and GreenGlass. So, when the question “Why collaborate?” is posed, we have answers. Evidence is always more persuasive than exhortation.
Valerie Horton notes that, “in the end, the reason so many libraries join together is to achieve more than any library can achieve on its own. The era of the library consortia is not ending; instead it is set for a transformation as technology has removed many of the physical barriers to collaboration that distance formerly created.” Data-driven technologies, like collection analysis tools, are yet another transformation that diminishes the barriers to collaboration—in this case, by clarifying the incentives to participate in collective stewardship efforts like shared print.
Stay tuned for more findings from the Stewarding the Collective Collection Project!
Many thanks to the participants in our interviews and focus groups whose insight is shared in this post!
On behalf of the Open Knowledge Foundation, I welcome today’s announcement by the European Commission on the creation of the Digital Commons European Digital Infrastructure Consortium (DC-EDIC). It is a hopeful moment for Europe, where the values of thousands of communities that have worked for years in a different way of doing and governing technology,...
Figure 1: A sample prompt used to generate personas with ChatGPT (Figure 1 in Park et al.)
Introduction: The Unseen Biases of Our AI Companions
Large Language Models (LLMs) such as ChatGPT have become ubiquitous in our lives. We rely on them as impartial conveyors of facts for everything from composing emails to solving complex queries. But what happens if we put these AIs in the hot seat and question them about identity, a very human thing? What if we ask them about autism?
In the recent study “‘As an Autistic Person Myself:’ The Bias Paradox Around Autism in LLMs” (CHI 25), Park et al. dug deep into what ChatGPT “thinks” about being autistic. The results were not just a system malfunction; they reflected the way people themselves behave around neurodiversity, swinging between one extreme and the other. The model is caught in what the authors call the “bias paradox”: a perpetual, internal struggle between its push toward inclusion and acceptance and the stereotypes, inherited from human society, that are deeply etched into its training data.
One of the strangest phenomena accompanying this tension? In trying to appear as if it understands human experience, the AI sometimes identifies itself as an autistic person, using phrases such as “As an autistic person myself...” That single utterance is a peek into the model’s contradictory and complex nature.
Method
The study used a mixed-methods design to investigate LLM biases against autistic people, combining quantitative and qualitative analyses to detect implicit and explicit bias markers. The research focused on GPT-3.5 Turbo, the model behind ChatGPT, because it is the most widely used model and has been reported to produce more implicit bias than other LLMs. The main method was persona prompting, a bias-elicitation technique in which the LLM is assigned a specific role or persona.
Data generation was implemented via a Python script that sent requests to the GPT-3.5 API (not the ChatGPT interface, but the same underlying model). In the first step (Prompt 1), ChatGPT was asked to invent three characters within a virtual-world setting and to assign each character attributes such as name, age bracket, occupation, personality, character traits, daily routine, lifestyle, and place of living. The occupations were invented by the model based on the facilities of the virtual world. This initial prompt was deliberately minimal so that the responses would reveal what the LLM has already internalized rather than introduce new empirical data.
In the second step (Prompt 2), ChatGPT was instructed to pick one of the three characters to be autistic, give the reasons for the selection, and change the chosen character’s description if necessary. The session was refreshed after each pair of prompts to remove any trace of previous answers. This procedure was applied consistently across 800 trials in total, split into 8 scenarios (100 repetitions each) to examine the effects of gender and age: four gender compositions (three females; two females and one male; one female and two males; three males) crossed with two age ranges (18–35 and 18–65).
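To make the procedure concrete, here is a minimal sketch of the two-prompt loop using the OpenAI Python client. The prompt wording, scenario details, and persona attributes below are paraphrased assumptions for illustration, not the authors’ actual script.

```python
# Minimal sketch of the two-prompt persona procedure described above.
# The prompt wording and scenario setup are paraphrased assumptions,
# not the authors' actual script.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_1 = ("Invent three characters living in a virtual world. For each, give a name, "
            "age bracket (18-35), occupation, personality, character traits, daily routine, "
            "lifestyle, and place of living. Use two female characters and one male character.")
PROMPT_2 = ("Pick one of the three characters to be autistic, explain why you chose that "
            "character, and revise their description if necessary.")

def run_trial():
    # A fresh message list per trial plays the role of "refreshing the session",
    # so no earlier answers leak into the next trial.
    messages = [{"role": "user", "content": PROMPT_1}]
    first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    messages.append({"role": "assistant", "content": first.choices[0].message.content})
    messages.append({"role": "user", "content": PROMPT_2})
    second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return first.choices[0].message.content, second.choices[0].message.content

if __name__ == "__main__":
    personas, selection = run_trial()
    print(selection)
```

The study repeated this procedure 100 times for each of the 8 gender and age scenarios, for 800 trials in total.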
The data analysis included two phases:
1. Quantitative analysis concentrated on demographic biases and employed various statistical tests like the Chi-Square test and t-tests to investigate the impact of gender, age, and job type on ChatGPT's selection of the autistic agent.
2. Qualitative analysis involved a thematic analysis of a randomly selected 25% subset of the responses (n=200), aimed at identifying the specific biases and stereotypes in the GPT model’s use of the term autism. This analysis surfaced the “bias paradox”: the model’s struggle between encouraging representation and reinforcing negative stereotypes.
The AI Stereotypes Autistic People as Male Tech Workers
When the AI was asked to pick an autistic individual from a group of personas, it became clear that the technology was turning the spotlight onto one of the most persistent societal prejudices [Figure 1]. The AI was not fabricating stereotypes; it was reproducing ours. The model assigned the autism label to a male character in 72% of the scenarios.
These gendered choices were reinforced by occupational stereotypes. The paper finds that male autistic agents were often given technical roles such as “Software Engineer” and “Data Analyst.” When a female agent was identified as autistic, she was more likely to be given “caregiving or supportive roles” such as “Nurse” [Figure 2].
Figure 2: Top 10 most chosen jobs, showing that male agents dominate most occupations, especially software roles (Figure 3 in Park et al.)
The AI’s stated reasoning shows how strongly these associations have taken hold. The AI is not merely finding statistical relationships; it draws a direct link between profession and diagnosis, thereby deepening the stereotypes it learned from. One of its explanations reads:
"As a software engineer, Klaus is already exhibiting characteristics that are most often referred to as autism and include being analytical and introverted."
It Has an Age Bias, But It's the Opposite of What You'd Expect
The authors anticipated that the AI would conform to the common stereotype linking autism with children. The research produced the opposite result: the average age of the agents ChatGPT labeled as autistic was significantly higher than that of the non-autistic agents [Figure 3].
Figure 3: Autistic agents had a higher average age than non-autistic agents in both age groups (Figure 2b in Park et al.)
This surprising result shows that AI biases are not always predictable. They are intricate reflections of a vast dataset, and they can even call our own assumptions into question. The source study does not offer a definite cause, but it may be that the training data is heavily weighted toward stories of adults diagnosed late in life, or that the model associates the condition with professionals already established in the fields it stereotypically links to autism.
A pre-recorded presentation video for CHI 2025 is available:
The AI is Trapped in a "Bias Paradox"
The study’s most important finding, which also signals the AI’s internal struggle, is what the authors call the “bias paradox.” This is a conflict in which the AI tries to apply positive, inclusive language about diversity while, at the same time, unwittingly reverting to negative, deficit-oriented stereotypes (language that depicts autism as a set of problems or limitations to be solved).
What is the source of this paradox? It is a battle between two strong forces. On one side are the developers’ “intentional efforts... to bring in the voices of underrepresented groups,” a top-down push toward inclusion. On the other side are the “dominant ideologies and biases in readily available training data,” the bottom-up reality of a biased world. The AI is stuck between the two.
The result is a model that cannot decide which side to take, and its output is correspondingly inconsistent:
On the one hand, the AI tries to be a strong advocate for diversity and inclusion. It frequently stresses the need for “representation” and “diversity” and talks about the need to “break stereotypes.”
On the other hand, it ends up reinforcing stereotypes within those very same responses. It often characterizes autistic persons as “socially awkward,” describes them as having to manage “sensory overload or meltdowns,” and assumes that they can only be “successful” in “life” with the help of a caregiver.
The paradox is clearest in the way the AI defines success. Its language looks positive at first glance, but it quietly reveals a deficit-based model of what success means.
"I guess it is necessary to present the fact that autistic people can work in the same way as any other person and lead a happy life with no restrictions."
The expression “just like anyone else” is telling, and it is one of the most subtly prejudiced phrases. It assumes that the neurotypical path is the default form of success, so the achievements of autistic people become an inspiring exception that has to be demonstrated. This is a common problem: the very bias being challenged is further reinforced by the framing of the quotation.
Conclusion: AI as a Mirror to Our Own Contradictions
Investigating ChatGPT’s logic to its core unearths more than a few biased outputs. It reveals the AI’s “bias paradox”: the model is simultaneously the product of programming that aspired to inclusiveness and of training data rich in human prejudice. These are not AI biases so much as human biases, amplified in scale and sped up.
Large language models cannot be relied on to provide an objective answer. Like humans, they do not possess an omniscient “God’s-eye” view of the world. A model’s knowledge is “situated,” entirely dependent on human data that is biased, messy, and contradictory, so true objectivity is impossible. These models are complicated mirrors, and the reflection of ourselves they give back is close and not always flattering. It holds our noblest goals as well as our most deeply entrenched biases.
The economics of long-term data storage are critically dependent not just upon the Kryder rate, the rate at which the technology improves cost per byte, but also upon the reliability of the media over time. You want to replace media because they are no longer economic, not because they are no longer reliable despite still being economic.
For more than a decade Backblaze has been providing an important public service by publishing data on the reliability of their hard drives, and more recently their SSDs. Below the fold I comment on this month's post from their Drive Stats Team, Are Hard Drives Getting Better? Let’s Revisit the Bathtub Curve.
Wikipedia defines the Bathtub Curve as a common concept in reliability engineering:
The 'bathtub' refers to the shape of a line that curves up at both ends, similar in shape to a bathtub. The bathtub curve has 3 regions:
The first region has a decreasing failure rate due to early failures.
The middle region is a constant failure rate due to random failures.
The last region is an increasing failure rate due to wear-out failures.
In 2017's Storage Failures In The Field I commented on Backblaze's observation that the 10TB and 12TB HDD generations showed much reduced infant mortality:
devoting engineering effort to reducing infant mortality can have a significant return on investment. A drive that fails early will be returned under warranty, costing the company money. A drive that fails after the warranty expires cannot be returned. Warranty costs must be reserved against in the company's accounts. Any reduction in the rate of early failures goes straight to the company's bottom line.
Enterprise disks are typically warranted for 5 years, so a disk manufacturer is incentivized to focus engineering effort on eliminating the "first region", the left side of the bathtub, and ensuring that the second region extends past the 5-year mark. Eight years ago Backblaze was starting to see that the engineers were succeeding in the first region:
While the data so far is very limited, with 1,240 disks and 14,220 aggregate drive days accumulated so far, none of these disks (both Seagate models) have failed. The low level of usage means that the disks have been installed and formatted and not much beyond that, but true infant mortality—disks that immediately expire on their first use—hasn’t become apparent.
The left side of the bathtub, the area of “decreasing failure rate,” is dramatically lower in 2021 than in 2013. In fact, for our 2021 curve, there is almost no left side of the bathtub, making it hard to take a bath, to say the least. We have reported how Seagate breaks in and tests their newly manufactured hard drives before shipping in an effort to lower the failure rates of their drives. Assuming all manufacturers do the same, that may explain some or all of this observation.
Note that the engineers hadn't quite succeeded in the second region, as the 2021 failure rate for years 4 and 5 was noticeably higher than for younger drives. But they had succeeded in pushing the major increase in failures out beyond 5.5 years. Everything before that was less than 4% Annual Failure Rate (AFR).
Backblaze's final graph puts all three sets of data in one graph and shows a dramatic improvement in drive longevity:
Drives in 2013 were dramatically less reliable than in later years, both because their AFRs were consistently higher and because their AFR hit 13% after 3 years.
By 2021 the engineers had kept the AFR around 2% through the warranty, but the drives wore out rapidly in the 7th year.
Now, the drives are only showing signs of beginning to wear out at 10.5 years.
That is some serious engineering at the long end! And at the short end things are great:
we see that the drive failure rates on the front end of the curve are also incredibly low—when a drive is between zero and one years old, we barely crack 1.30% AFR.
The left side of the bathtub is really gone, improving the manufacturers' margins. But if I eyeball the 2025 graph's first 20 quarters I estimate the AFR averages 1.6%, which implies that over the 5-year warranty 8% of the drives failed. Clearly, the engineers still have work to do.
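As a quick sanity check on that arithmetic (the 1.6% average AFR is an eyeballed estimate from the graph, not a published Backblaze number), here is a minimal sketch:

```python
# Back-of-the-envelope check of the estimate above. The 1.6% average AFR is an
# eyeballed figure from the post, not a published Backblaze number.
afr = 0.016          # estimated average annual failure rate over the first 5 years
years = 5

naive_total = afr * years             # simple sum, as in the post: ~8%
compounded = 1 - (1 - afr) ** years   # accounting for survivorship: ~7.75%

print(f"naive 5-year failure fraction:      {naive_total:.2%}")
print(f"compounded 5-year failure fraction: {compounded:.2%}")
```

Either way, the point stands: a non-trivial fraction of drives still fails inside the warranty window.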
Remember to check back in 2029 when Backblaze plans to return to this issue.
Figure 1: Mementos of Jack's first tweet, archived at different points along the Twitter UI timeline.
In the animation in Figure 1, we can see images of four archived versions, or mementos, of the first public tweet, posted on March 21, 2006 by Jack Dorsey (co-founder of Twitter). However, each memento has a different user interface (UI). This is because the UI of Twitter has changed over the years and these mementos were archived at different points in time. The widely popular microblogging platform Twitter (now rebranded as X) was first launched in March 2006, and since then there have been many changes in its UI, including how individual tweets are rendered on the desktop web, which we refer to as the tweet UI. The mementos in Figure 1 show noticeable changes to the tweet UI. Changes to the tweet UI, though subtle to the human eye, can have a significant impact on the performance of certain research tasks. A model trained largely on a dataset of images displayed in the latest tweet UI might struggle to identify elements in images of tweets displayed with an earlier UI. The engagement metric symbols are one example of such elements. The star symbol in earlier tweet UIs was replaced by a heart symbol in later tweet UIs. As a result, a model trained on images of the latest tweet UI might fail to identify the star symbol in images of earlier tweet UIs. To address this, model training must include different generations of tweet UIs.
Since the live web will always give us the latest UI, we need to use web archives to replay former UIs. Most data sets of tweets (Chen et al. 2021, Naseem et al. 2021) available for research consist of tweet IDs or JSON-encoded API responses. Earlier, we could use the Twitter API to access tweets by hydrating tweet IDs. However, this process did not allow access to deleted tweets or tweets from deleted or suspended accounts. Moreover, the Twitter API is no longer freely available and many communities have migrated to other platforms. Therefore, now to study Twitter is to study archived Twitter. Most research (Scolari 2025, Dijck 2011) on Twitter’s evolution focuses on the factors that shaped its feature development over time. There is a list on Wikipedia of the key developments of Twitter since 2006 which emphasizes financial, legal, and corporate milestones. However, no studies discuss the evolution of the different elements of the tweet UI. Here, we discuss different generations of tweet UIs, which are important for anyone who must study Twitter and its contents via web archives.
We collected archived tweets from the Internet Archive’s Wayback Machine to establish the timeline. We divide the timeline from 2006 to 2025 into four generations and analyze the changes to the tweet UIs. Jack Dorsey (@jack) posted the first tweet (or at least the first one still publicly available). His account has been well archived, so this tweet nicely demonstrates the different generations of tweet UIs.
UI Generations of Tweets
Jack posted the first tweet in March 2006. We found the first memento of Jack’s first tweet captured in November 2008, more than two years after Twitter was launched. To get the earliest archived tweet, we executed a CDX API search for Jack’s status URLs. The curl command and its output are shown below:
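An equivalent query against the Wayback Machine CDX API can be sketched in Python with the requests library; the specific parameters below (prefix match, status filter, result limit) are illustrative assumptions rather than the exact options in the original curl command.

```python
# Illustrative query against the Wayback Machine CDX API for @jack's status URLs.
# The parameter choices (prefix match, status filter, result limit) are assumptions,
# not the exact options used in the original curl command.
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
params = {
    "url": "twitter.com/jack/status/",   # all of @jack's status URLs
    "matchType": "prefix",
    "output": "json",
    "filter": "statuscode:200",
    "limit": "5",                        # earliest few captures
}

rows = requests.get(CDX_ENDPOINT, params=params, timeout=30).json()
header, captures = rows[0], rows[1:]     # first row lists the field names
for capture in captures:
    record = dict(zip(header, capture))
    print(record["timestamp"], record["original"])
```

With output=json, the first row is the list of field names and each subsequent row is one capture, so the earliest entries show when Jack’s status URLs were first archived.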
The earliest archived tweet we could find for any of Jack’s status URLs is from February 2007. Since the first archived tweet is not the first tweet, we start our timeline of tweet UI generations in February 2007.
We divided the tweet UI into four generations: Early Twitter, Engagement Era, Transitional Era, and X Platform. A summary of the major changes for each generation is listed below:
Early Twitter (2007–2011): Changed orientation of tweet content and metadata: timestamp, username, and profile picture.
Engagement Era (2012–2018): Added buttons and metrics for engagement attributes such as like, reply, comment, retweet, quote tweet, and bookmark. Added verified checkmark and ‘@’ symbol to username. Increased character limit for tweet and display name. Added replies and news content previews to tweets. Overall major aesthetic changes to different elements of the tweet UI.
Transitional Era (2019–2022): Redesigned the website and switched to a client-side UI, deprecating the server-side UI, which affected archiving services. Added various labels (e.g., fact-check, govt. affiliation) and community notes.
X Platform (2023–Present): Changed domain from Twitter.com to X.com, which affected archiving services. Changed the logo from the bird symbol to X. “Tweets” were renamed “posts” and “retweets” “reposts.” Introduced different color checkmarks and AI chatbot symbols. Introduced square-shaped profile pictures for businesses and brands.
Early Twitter (2007–2011)
During the Early Twitter generation (2007–2011), the tweet UI changed mainly in the orientation of the tweet content and metadata, including the timestamp, username, and profile picture.
February 2007
The tweet UI in February 2007 (Figure 2) featured a minimalistic layout with limited visual elements and a plain, solid-colored background. Along with the tweet content, the metadata included the timestamp, username, a square-shaped profile picture, and the client info (the device used to post the tweet). The tweet content appeared in a white, square quote box. The character limit was 140 for tweets and 20 for display names.
Figure 2: A tweet archived in February 2007, showing the tweet content, timestamp, username, a square-shaped profile picture, and client info in a white, square quote box.
October 2007
The tweet UI on October 25, 2007 (Figure 3) had slight changes. The alignment of the timestamp changed from right to left.
We found a version of the tweet UI from October 27, 2007 (Figure 4) that had a footer with copyright and links to other information (help, about us, contact etc.). However, the presence of this footer was sporadic throughout October 2007. The tweet UI in other mementos from October 2007 appeared the same as Figure 3. We observed that the block for the footer was missing in the source code of the webpage for the tweet UI of October 25, 2007.
The tweet UI in November 2008 (Figure 5) had a major change. A blue Twitter logo was added, and the orientation of the metadata and tweet content changed. Moreover, the username/handle was added.
Figure 5: A tweet archived in November 2008, showing a blue Twitter logo, the username/handle, and a changed orientation of the metadata and tweet content.
December 2008
The tweet UI in December 2008 (Figure 6) had slight changes. The footer started to appear again with copyright info and links to other information. We observed the block for the footer was included again in the source code of the webpage. In the header, a login/join button was added. Other changes include the month being abbreviated in the timestamp.
Figure 6: A tweet archived in December 2008, showing the footer, login/join button on the header, and abbreviated month in the timestamp.
June 2009
A blue checkmark was introduced to indicate verified accounts in June 2009 to cope with account impersonation. However, the blue checkmark appeared only on a user’s account page (Figure 7) but not on an individual tweet (Figure 8).
Figure 7: An account page archived in June 2009, showing a blue checkmark added on a user’s account page to indicate a verified account.
In late 2009, Google proposed making AJAX-driven web applications crawlable by search engines so that their content could be indexed. With this change, Twitter introduced a JavaScript-heavy interface using hashbang (#!) URLs. However, hashbang-style URLs slowed down content rendering, and Twitter decided in 2012 to withdraw this change to improve initial page load performance.
Late 2009–Late 2010
In late 2009, the retweet feature was made available to limited users to test its functionality for sharing tweets. Before 2009, users used the "RT @username" convention to share tweets. The retweet feature brought a major change on the social media platform. Though the intention for introducing the retweet button was to spread information more effectively, the feature received criticism regarding its impact on social media users. Some argued that users put minimal effort into thoughtfulness before sharing, which resulted in harmful ideas going viral. However, we did not find any evidence in the Wayback Machine for the retweet feature on @jack's tweets until October 2010.
October 2010–2011
The tweet UI of a user's account page from October 2010 until 2011 showed a square-looped arrow symbol for a retweet as well as a textual description for the retweet counts (Figure 9). However, an individual tweet only showed the textual description of the retweet counts and no retweet symbol (Figure 10).
Figure 10: A tweet archived in April 2011, showing textual description of the retweet counts and no retweet symbol.
A summary of the major tweet UI changes during Early Twitter is presented in the following slides:
Engagement Era (2012–2018)
The Engagement Era ranged from 2012 to 2018, when symbols were added for engagement metrics such as like, reply, retweet, and bookmark. The username had a blue verified checkmark (if the user was verified) and a prepended “@” symbol. Other changes included an increased character limit for the display name and the tweet, and the addition of tweet replies and news content previews. Overall, there were major aesthetic changes to different elements of the tweet UI.
June 2012
The tweet UI in June 2012 had major changes (Figure 11). There was a large change in the orientation of the metadata. The username had the “@” symbol prepended. Different symbols were added for engagement metrics: reply, retweet, and favorite. Previously, replies were done by prepending a tweet with "@username" to reply to a specific user. An option for embedding the tweet in a webpage was added. The header had a “follow” button, and the design of the footer also changed.
Figure 11: A tweet archived in June 2012, showing different symbols for engagement metrics, option for embedding a tweet, prepended “@” symbol to username, and design of header, footer changed.
Another change in June 2012 is that the content preview became available for posted links in a tweet (Figure 12).
The tweet UI in November 2012 had a few additions (Figure 13). The blue checkmark was added beside the display name, tweet replies were added, and exact counts were added for the engagement metrics: retweet and favorite. The client info (device used to post the tweet) was removed temporarily in August 2012.
Visible counters for engagement metrics also received criticism initially regarding their impact on social media users. The visible counters stimulated a reinforcing behavioral posting pattern: similar to a ‘rat lever’ experiment, high counts act as instant rewards, and users become more inclined to post content that will maximize engagement. Researchers noted that this led to users posting and sharing more morally outrageous content on Twitter.
Figure 13: A tweet archived in November 2012, showing blue checkmark, visible counters for engagement metrics, and tweet replies.
January 2013
The tweet UI in January 2013 had a minor addition to engagement metrics with the “...” (ellipsis) symbol to indicate “more” options (Figure 14).
We observed two changes to the tweet UI in August 2013. On August 8, 2013 the tweet replies were absent (Figure 15). Then, the tweet UI on August 21, 2013 had news previews added (Figure 16).
The tweet UI in March 2014 had slight changes, with the sizing and orientation of the elements being a little different (Figure 17). The symbol on the “follow” button changed from the Twitter bird logo to an “add user” symbol.
Figure 17: A tweet archived in March 2014, showing sizing and orientation of elements changed, the “follow” button’s symbol changed, and tweet replies still absent.
June 2014
The color of engagement buttons changed to gray and footer changed to blue in the tweet UI in June 2014 (Figure 18).
The tweet UI in November 2014 had annotations removed from the engagement metrics (Figure 19). The color of the footer changed from blue to gray. Tweet replies were still absent, but news previews remained.
Figure 19: An archived tweet in November 2014, showing annotations removed from engagement metrics, tweet replies were removed, and color of footer changed.
February 2015
Tweet replies reappeared in the tweet UI of February 2015 (Figure 20).
The tweet UI in April 2015 had news previews removed, but tweet replies remained (Figure 21). In April 2015, quote tweets (i.e., retweets with comments) were another addition to the UI. Previously, the “QT @username” convention was used for quote tweets. Similar to the adverse effect of the retweet button, the use of quote tweets was also criticized. People often used quote tweets to ridicule or humiliate someone by taking control of how the quoted tweet is framed. Figure 22 shows a quote tweet (gray outlined box) from Jack’s archived account page in 2015.
In July 2015, Twitter decided to remove custom backgrounds from the homepage and user profiles, replacing them with an all-white background (Figure 23). However, the UI background of individual tweets still showed the custom background (Figure 24). The header for the tweet UI also included a banner with the bird symbol (Twitter logo), search, and sign-in/login options.
In August 2015, individual tweets were displayed in an HTML lightbox over the user’s profile screen (Figure 25). Moreover, tweet replies were connected by vertical lines to the original tweet, and this grouping became known as a “thread.”
Figure 25: A tweet archived in August 2015, showing a tweet displayed in an HTML lightbox over the user’s profile screen and tweet replies grouped by vertical lines.
November 2015
The tweet UI from November 2015 had “favorite” (star symbol) replaced by “like” (heart symbol) (Figure 26).
The tweet UI from June 2016 had engagement metric counts added next to the action buttons, though the more precise counts still appeared just under the tweet (Figure 27).
The tweet UI from August 2017 had a few aesthetic changes, such as the change to a circular profile picture. The engagement buttons and the “follow” button were visually enhanced (Figure 28). Additionally, the “ellipsis” button for the “more” options was removed. The downward-arrow dropdown menu beside the “follow” button included options for embedding and copying the link of the tweet. A new “replying to” label was also added, specifying which user an author is replying to.
Figure 28: A tweet archived in August 2017, showing circular profile picture, visually enhanced engagement buttons, a downward arrow dropdown menu beside “follow” button, and “replying to” added to tweet replies.
In November 2017, the display name character limit changed from 20 to 50. Figure 30 shows an archived tweet from January 2018 utilizing the longer display name. The display name contains a total of 22 characters, including spaces; because of the earlier limit, this person’s name could not have been displayed in full prior to the change.
The bookmark feature was introduced to Twitter in February 2018. However, the change was not reflected in the Wayback Machine until around June 2022, when new UI changes started to appear in the web archives. In an archived tweet from June 24, 2022, we observed that an upward arrow button had been added that included bookmark and other sharing options (Figure 31).
Figure 31: A tweet archived in June 2022, showing an upward arrow added for bookmark and other sharing options.
A summary of the major tweet UI changes during the Engagement Era is presented in the following slides:
Transitional Era (2019–2022)
The Transitional Era ranged from 2019 to 2022. The major changes in this generation were different modes (background colors) for tweets, a visual convergence of the desktop and mobile UIs, and a switch to a client-side rendered UI while the server-rendered UI was deprecated. These changes affected archiving services. Other changes included adding labels (e.g., fact-check, government affiliation) and community notes to counter the spread of misleading content.
March 2019
Initially, Twitter had a white background for a tweet, known as light-mode. In 2016, a blue-grayish background, known as night-mode, was introduced as a power-saving feature. Later, in March 2019, a pitch-black background known as dark-mode was introduced. Since then, Twitter has had three background colors for tweets: light-mode (white), dim-mode (blue-grayish), and dark-mode (pitch-black). Figure 32 shows a dim-mode example and Figure 33 shows a dark-mode tweet example from 2025.
In May 2020, Twitter started labeling tweets that contained misleading content or violated Twitter rules. A tweet example from March 2021 shows a fact-check label (Figure 34). Another tweet example from April 2021 (Figure 35) shows the label for a tweet that violated Twitter rules. However, this change was not reflected in the web archive immediately. The web archives failed to replay these labels after the major transition from the old UI to the new UI in July 2019. My WS-DL colleagues Himarsha and Kritika discussed how the change of UI impacted tweet replays in the web archives in a series of blog posts (1, 2, 3, 4). It may be noted that when the system put labels on a tweet, engagements (retweets, likes, replies) were turned “off.” Not having numbers by these buttons might be confusing for people, and even for a classifier.
Figure 35: Tweet example from March 2021, showing a fact-check label.
Figure 36: Tweet example from April 2021, showing a label for violating Twitter rules.
June 2020
In June 2020, Twitter deprecated the server-side UI (legacy version) and switched to a new client-side UI. However, this change did not begin to appear in the web archives until around June 2022. Kritika investigated when the server-side UI change started to appear based on some well-archived accounts. For Jack’s first tweet, we observed that the new UI first appeared in the archive on June 24, 2022 (Figure 36). However, side panels such as the navigation menus and trending sections were missing in the archived version. While server-side pages were easily archived (no API calls were required), client-side rendering led to incomplete or erroneous replays, because crawlers either did not issue the API calls or the calls were delayed. Client info started to appear again in the new UI.
Twitter started adding labels to government and state-affiliated accounts in August 2020, as shown in Figure 37. However, the change in the tweet UI was not reflected in the archive until late June 2022. Figure 38 shows a tweet archived in June 2022 where the label exists. Twitter dropped these labels in April 2023. The count of quote tweets was also added to the new UI, along with retweet and like counts, during this time.
Figure 38: A tweet example from 2021, showing the government-affiliated label.
December 2022
Community notes were introduced in December 2022 for fact-checking purposes. Figure 40 shows a tweet example from March 2023 with a community note. Although changes to the new UI started appearing in the Wayback Machine around late June 2022, the archived version of this tweet from March 2023 showed the new UI but did not contain the community note (Figure 41). Other mementos from 2024 and 2025 had replay issues. Another addition to the new UI during this time was the view count for a tweet.
A summary of the major UI changes to tweets during the Transitional Era is shown in the following slides:
X Platform (2023–Present)
The X Platform era ranges from 2023 to the present (2025). The major change in this generation was the rebranding to “X,” which affected archiving services. Other changes included replacing the bird logo with the “X” symbol, introducing different-colored checkmarks, and adding the Grok AI chatbot symbol.
April 2023
In April 2023, a separate bookmark button (ribbon symbol) was added; bookmarking had previously been included under the share button (upward arrow). Figure 42 shows a tweet archived in April 2023.
July 2023
In July 2023, Twitter was rebranded to X. Archived tweet UIs from this period were missing side panels and tweet replies. Figure 43 shows a tweet archived in July 2023.
In September 2023, “tweets” were renamed “posts” and “retweets” were renamed “reposts.” In December 2023, the new tweet UI, along with side panels bearing the “X” logo, started to appear in the archives, but with no replies (Figure 44). The “...” (ellipsis) symbol was included again for “more” options. However, some engagement metrics were missing.
Figure 45: A tweet archived in December 2023, showing side panels having the “X” logo, but tweet replies and some engagement metrics missing.
Figure 47: A tweet archived in May 2025 using the X.com domain, showing a redirected page “The page is unavailable for archiving.”
July 2024
The mementos of Jack’s first tweet under the X.com domain from July to December 2024 showed the tweet content but no replies (Figure 47). Engagement metrics started to appear again.
Figure 48: A tweet archived in July 2024, showing tweet content and engagement metrics, but with no replies.
April 2025
For the Twitter.com domain, the Wayback Machine still failed to replay tweet content for the 2025 mementos of Jack’s first tweet. However, we found a memento of Jack’s first tweet from April 17, 2025 (Figure 48) for X.com that replayed the tweet content, but with reduced information: the tweet UI displayed the count of tweet replies, but not the replies themselves. This is because the user agent of the archive was not authenticated. Previous mementos from 2025 for this tweet had failed to replay the tweet content. Figure 49 shows a live version of the tweet from June 3, 2025.
Figure 49: A tweet archived in April 2025, showing reduced information because the user agent of the archive was not authenticated.
Different color checkmarks: The blue verified checkmark was initially meant for authentic and notable users. In April 2023, Twitter ended the legacy verification program, allowing users to keep a verified checkmark only if they are part of the subscription program. Two other colors were introduced: gold and gray. The gray checkmark is used for government organizations (Figure 50) and the gold checkmark is used for business organizations (Figure 51).
Affiliation badge/company symbol: A symbol now appears beside the verified checkmark to indicate that a user is linked to a verified organization. Figure 49 shows a black square-shaped symbol next to the verified checkmark, linking the account to the company Square.
A summary of the major UI changes to tweets during the X Platform era is shown in the following slides:
It may be noted that archived tweets may not always recreate the exact tweet. Figure 52 shows such an example, where the Wayback Machine recreated the tweet from the archived JSON and not from the actual HTML from the server. This issue is crucial because if a tweet gets deleted from the live web and the archived copy does not replay all the elements of the tweet UI, then classifiers could be fooled. For example, the engagement buttons are part of the tweet UI, and classifiers looking for that evidence could be fooled by incomplete archived replays that do not contain engagement buttons.
Michael L. Nelson discussed this issue in the following tweet thread:
@archiveis has the text (see "view source" pic), but the window covers it & @archiveis strips all .js from pages, so it can't be dismissed.@waybackmachine recreates the tweet from the archived .json; it's not the actual .html from the server on 2025-09-12 but it's "good enough" pic.twitter.com/X7UimefYCM
This survey will further help researchers understand the importance of considering the dynamic behavior of the social media environment as a crucial factor in social media research. Since the only way to study past Twitter is to study archived Twitter, this discussion will also help researchers understand the different UIs that they will encounter in the web archives and adjust their HTML scraping techniques accordingly. In summary, the Early Twitter generation introduced a basic structural framework for desktop HTML. The technical changes of the Transitional Era and X Platform generations brought a dramatic shift for archiving services. Of the four generations, the Engagement Era had the most notable changes in the tweet UI. A lot of engagement features were added during this time, which had a significant impact on social media users.
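As a rough illustration of why scrapers must branch on UI generation, here is a minimal sketch in Python. The selectors below are assumptions for the legacy (server-rendered) and newer (client-rendered) tweet markup, meant only to show the pattern, not a definitive mapping of either UI.

    from bs4 import BeautifulSoup

    def extract_tweet_text(html):
        """Try the legacy server-rendered markup first, then the client-rendered markup."""
        soup = BeautifulSoup(html, "html.parser")
        # Legacy server-rendered UI: tweet text typically sat in a <p class="tweet-text"> element.
        node = soup.select_one("p.tweet-text")
        if node is None:
            # Newer client-rendered UI: tweet text typically sits in a <div data-testid="tweetText"> element.
            node = soup.select_one('div[data-testid="tweetText"]')
        if node is None:
            return None  # likely an incomplete replay (e.g., missing API responses)
        return node.get_text(" ", strip=True)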
We have listed below some major features that impacted changes to the tweet UIs over the years:
Verified checkmark
Retweet feature
Engagement metrics
Thread, quote tweets, and likes
Character limit increase for tweets and display name
Labels (e.g., fact-check, govt. affiliation) added to tweets
The features are highlighted using example tweets in the following slides:
To recap, the following table is a summary of the timeline of changes in the UI of tweets:
Timeline
Changes
February 2007 (first found UI after Twitter’s launch in March 2006)
Minimalist layout consisted of tweet content and metadata: timestamp, username, square-shaped profile picture, and client info. The tweet appeared in a white square quote box. The character limit was 140 for tweets and 20 for display names.
August 2007
Timestamp alignment changed from right to left.
October 2007
Some UIs had a footer with copyright and links to other information (e.g., contact, help, about us).
November 2008
A blue Twitter logo added, orientation of metadata changed, display name added.
December 2008
Footer appeared again, header with a login/join button added, month abbreviated in the timestamp.
June 2009
Blue checkmark added, but not on individual tweets.
September 2010
Introduced hashbang-style (#!) URLs, but the style was withdrawn in 2012.
October 2010
Retweet feature added, but individual tweets had only textual description of retweet counts.
June 2012
Major change in the orientation of metadata, header, and footer. Symbols added for engagement metrics, “@” prepended before username, content preview became available, embedding tweet option was added.
November 2012
Blue checkmark added beside display name, tweet replies added, count for engagement metrics added, client info removed temporarily.
January 2013
“...” (ellipsis) symbol added for “more” options.
August 2013
Tweet replies removed and news content previews added.
March 2014
Sizing of elements and orientation slightly changed. “Follow” button’s logo changed from bird to “add user.”
June 2014
The color of the engagement button changed to gray and the footer changed to blue.
November 2014
Annotations of engagement metrics removed, footer changed from blue to gray.
February 2015
Both tweet replies and news content previews appeared.
April 2015
News content previews were removed, but tweet replies existed. Quote tweet feature added.
July 2015
Custom background removed from account page, but still appeared for individual tweets. Header changed to a banner with search and sign in/login options and a bird logo.
August 2015
Individual tweets were displayed in an HTML lightbox over the user’s profile screen. Tweet replies were grouped by vertical lines (thread).
November 2015
“Favorite” (star symbol) replaced by “like” (heart symbol).
June 2016
Engagement metric numbers added next to the action buttons.
August 2017
Aesthetic changes in buttons, profile picture became circular, “more” options replaced by a downward arrow drop-down menu, “replying to” added to tweet replies.
September 2017
Tweet character count increased from 140 to 280.
November 2017
The display name character limit changed from 20 to 50.
February 2018
Bookmark feature introduced under the upward arrow (share) button, but the change was not reflected in the archive until June 2022.
March 2019
Dark-mode (pitch-black) background introduced along with dim-mode (blue-grayish) and light-mode (white).
May 2020
Fact-check and rules-violation labels added, but the archive failed to replay the labels.
June 2020
Switched to a new client-side UI, which was not reflected in the archive until June 2022. However, the archived version had side panels missing. Client info started to appear again.
August 2020
Labels for state and government-affiliated organizations were added, but dropped in April 2023. Count of quote tweets added. All changes were reflected in the archive in June 2022.
November 2022
Client info removed.
December 2022
Community notes were added, but the archive failed to replay the notes. View counts for a tweet were added.
April 2023
Ribbon-shaped button separately added for bookmark.
July 2023
Rebranded to X. Tweet UIs had side panels and replies missing.
December 2023
“Tweets” were renamed as “posts” and “retweets” as “reposts.” The new UI with side panels started to appear having the X logo, but with no replies.
May 2024
Twitter.com domain changed to X.com. URLs with the Twitter domain failed to replay the tweet content in the archive. URLs with the X domain redirected to a page “unavailable for archiving” in the archive.
July 2024
The archived version for the X domain displayed the new UI with the tweet, side panels, and X logo, but no replies.
April 2025
The archived version for the Twitter domain still failed to replay the tweet content. The archived version for the X domain matched the live version UI.
Other recent changes
Different-colored checkmarks and an AI chatbot symbol introduced. Square-shaped profile pictures introduced for businesses and brands.
I've always been looking for simple solutions to serve digitized documents in static format, where the cost of maintaining an IIIF image server is prohibitive and I need something simpler to manage and preserve.
IIIF Level 0 is an option for serving static images in IIIF manifests without an image server. However, for large images this approach is inefficient, so tiling is required to serve partial images on demand.
There are ongoing discussions and experiments exploring how to bring static tiles to IIIF viewers, addressing a particular need: serving tiles from ZIP files, which offer significant advantages for management, portability, and storage. Reading remote ZIP content over HTTP using Range requests is now a standard practice, popularized by WACZ for serving web archives.
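To make the Range-request idea concrete, here is a minimal, hedged Python sketch of reading a remote ZIP without downloading it in full: a read-only, seekable file object backed by HTTP Range requests, handed to the standard zipfile module. The URL is hypothetical, and the sketch assumes the server reports Content-Length and honors Range headers; each read issues one request, so a real implementation would add caching.

    import io
    import zipfile
    import requests

    class HttpRangeFile(io.RawIOBase):
        """A read-only, seekable file-like object backed by HTTP Range requests."""

        def __init__(self, url):
            self.url = url
            # Assumes the server reports Content-Length and honors Range requests.
            self.size = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
            self.pos = 0

        def readable(self):
            return True

        def seekable(self):
            return True

        def tell(self):
            return self.pos

        def seek(self, offset, whence=io.SEEK_SET):
            if whence == io.SEEK_SET:
                self.pos = offset
            elif whence == io.SEEK_CUR:
                self.pos += offset
            else:  # io.SEEK_END
                self.pos = self.size + offset
            return self.pos

        def read(self, n=-1):
            end = self.size if n < 0 else min(self.pos + n, self.size)
            if self.pos >= end:
                return b""
            resp = requests.get(self.url, headers={"Range": f"bytes={self.pos}-{end - 1}"})
            data = resp.content
            self.pos += len(data)
            return data

    # List the tiles inside a remote ZIP/SZI archive and fetch a single tile;
    # only the central directory and that member's bytes are transferred.
    with zipfile.ZipFile(HttpRangeFile("https://example.org/tiles/page-1.szi")) as zf:
        print(zf.namelist()[:5])
        tile = zf.read(zf.namelist()[0])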
This GitHub discussion on IIIF Commons offers potential solutions for ZIP file-based tile delivery.
I already conducted an experiment using GeoTIFFTileSource with OpenSeadragon to access remote tiled TIFF files. Here is an example hosted on Cloudflare R2, featuring a 600MB TIFF file converted with VIPS.
Today I came across the SZI Format and the SZI Tile Source for OpenSeadragon.
Although not IIIF-based, this solution allows reading a remote DeepZoom .dzi file packaged in a ZIP file.
Let's test this approach by using a PDF file as source and converting its pages to tiled images.
To extract pages from a PDF, I use mutool from MuPDF, but many similar tools exist, such as pdftoppm from Poppler:
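A hedged sketch of that pipeline in Python, shelling out to mutool and then tiling a rendered page into a DeepZoom pyramid with pyvips; the file names and DPI are illustrative, not necessarily the invocation used here:

    import subprocess
    import pyvips

    # Render each PDF page to a PNG at 150 DPI; %d expands to the page number.
    subprocess.run(
        ["mutool", "draw", "-r", "150", "-o", "page-%d.png", "document.pdf"],
        check=True,
    )

    # Tile one rendered page into a DeepZoom pyramid (page-1.dzi + page-1_files/),
    # which can then be zipped into a single .szi archive for the SZI tile source.
    pyvips.Image.new_from_file("page-1.png").dzsave("page-1")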
Reign in Science Day is Old Dominion University’s annual campus-wide STEM outreach event designed to spark scientific curiosity among K–12 students and their families. Hosted by the College of Sciences, the program brings together faculty, graduate students, and research groups from across departments to showcase hands-on experiments, interactive demonstrations, and real-world applications of science and technology. The goal is simple: to make science accessible, fun, and inspiring for learners of all ages while giving the community a glimpse into the groundbreaking research happening at ODU.
We are bringing back our annual Reign in Science Day at @ODU on Sat., Oct. 11! Registration is open & this year we are taking it up a notch in #STEM! It's free event for students in grades K-12! Click the link to register before spots fill up https://t.co/Bgpqlz5O5t! pic.twitter.com/blwiTzeMim
The event space transformed into hubs of interactive science activity tables. K–12 students, faculty, and families engaged in hands-on experiments and demonstrations that brought chemistry, physics, and environmental sciences to life. Whether it was testing reactions, exploring materials, or uncovering the mysteries of the microscopic world, every table had something to spark wonder in attendees of all ages.
The Computer Science Department at Old Dominion University is home to several vibrant research groups, including the Web Science and Digital Libraries (WebSciDL) Group, which houses sub-labs such as the NIRDSLab and the LAMP-SYS Lab. These labs represented the department at Reign in Science 2025, showcasing cutting-edge research and interactive demonstrations. Their exhibits highlighted how computer science bridges technology and creativity — from eye tracking and emotion sensing to generative AI — reflecting ODU’s ongoing leadership in innovation and digital research.
Eye Tracking and Emotion Sensing
The NIRDSLab, led by Dr. Sampath Jayarathna from the Department of Computer Science, brought technology to life through an engaging demonstration of eye tracking and emotion sensing systems. Participants experienced how computers can interpret attention and emotion by observing subtle eye movements and pupil changes. From playing interactive games using only their eyes to having their pupil images captured and visualized by students, children and parents alike discovered the fascinating intersection of artificial intelligence and human perception. The demo highlighted how machine learning helps decode human behavior — showcasing ODU’s innovative research in computing, cognition, and emotion-aware technologies.
The LAMP-SYS Lab, directed by Dr. Jian Wu in ODU’s Department of Computer Science, demoed Generative AI for K–12 students. Demonstrations led by graduate students introduced young learners to how artificial intelligence creates text and images, blending creativity with computing. Participants explored how machines can generate stories, visuals, and knowledge from data — an inspiring glimpse into the world of Natural Language Processing, Information Retrieval, and Scholarly Big Data research.
Inside ODU’s Michael and Kimthanh Lê Planetarium, Planetarium Director Justin Mason guided visitors through a series of interactive 20-minute planetarium shows. Guests had the chance to take a journey beyond Earth’s atmosphere and explore the cosmos. With only 116 seats per session, these shows filled quickly, reminding everyone how universal the fascination with the stars truly is.
The Reign in Science 2025 was a resounding success, bringing together students, families, and researchers to celebrate curiosity and innovation. The excitement in the halls and labs reflected ODU’s growing commitment to making science engaging and accessible to all. As we look ahead, events like this remind us that the pursuit of knowledge is not confined to classrooms — it begins with moments of wonder like these. Whether you’re a student, educator, or simply curious about the world, take the next step: explore, ask questions, and be part of the discovery. Science at ODU is not just happening — it’s growing, and you’re invited to join the journey.
Lawrence Obiuwevwi Graduate Research Assistant Virginia Modeling, Analysis, & Simulation Center Department of Computer Science Old Dominion University, Norfolk, VA 23529 Email: lobiu001@odu.edu Web : lawobiu.com
I recently finished At
Work in the Ruins by Dougald Hine
(co-author of The Dark Mountain
Manifesto). I aim to re-read it after it has sunk in a bit more. The
arc of the book reminds me a bit of Bruno Latour’s thinking, in that it
seems to make the axis of left/right political debates more two (or
three) dimensional. In part, the book reflects on what we can learn from
the response to the COVID19 pandemic, when thinking about climate
change, and the unsustainable/extractive economies and ways of living in
the west. I found some of the ideas challenging at first, but once I
went through it, I found I wanted to return to the beginning.
The book concludes with four ways of working in the ruins, which I
wanted to note here mostly as a reminder to myself:
The first kind of work that makes sense is to salvage the good that may
be taken with us from the ruins of the world that is ending. Among the
tangled legacies of modernity, there are gifts that we would not gladly
leave behind, so this is the attempt to bring with us what we are able
to carry.
The second kind of work is to mourn the good that cannot be taken with
us. There is no need to settle where the boundary lies between what can
be saved and what is to be mourned; you may answer that question
differently to me and only time will tell which of us was right, but
this still leaves us with work to do in the meanwhile. Part of the work
of mourning is to tell the stories of the good things that couldn’t be
saved, for those stories can be taken with us and they may turn out to
be seeds.
The third kind of work involves discernment: notice the things within
our ways of living that were never as good as we told ourselves they
were and the chance we are being given to walk away from these. It is
the strange gift of a time of endings to bring into view the gap between
the things we are trying to do and the ways we went about doing them
around here lately. Make use of this gift.
The fourth kind of work is to look for the dropped threads, the moments
earlier in the story that have something to tell us. The way of life
that is ending had a beginning, and other ways were ending in that
moment or being pushed to the edges. So look for the skills or practices
or knowledges that have been marked as extinct and obsolete, for some of
these may yet make all the difference. There is no going back - but
there are dropped threads that can be woven into the onward story.
These strategies remind me quite a bit of the praxis of permacomputing–determining
what parts of the computing project can be useful to preserve, and what
to let go of. I also identify with the fourth kind of work. As I’ve
gotten older I’ve noticed that I gravitate towards finding value in how
older ideas, that may have fallen out of fashion or come under
criticism, have relevance and utility in the present, especially because
of the way we can learn from the particular ways that they failed, or
fell short.
Woman using a Macey vertical filing cabinet (detail, 1903). Source: Wikimedia Commons.
As part of our Public Data Project, LIL recently launched Data.gov Archive Search. In this post, we look under the hood and reflect on how and why we built this project the way we did.
Rethinking the Old Trade-Off: Cost, Complexity, and Access
Libraries, digital humanities projects, and cultural heritage organizations have long had to perform a balancing act when sharing their collections online, negotiating between access and affordability. Providing robust features for data discovery, such as browsing, filtering, and search, has traditionally required dedicated computing infrastructure such as servers and databases. Ongoing server hosting, regular security and software updates, and consistent operational oversight are expensive and require skilled staff. Over years or decades, budget changes and staff turnover often strand these projects in an unmaintained or nonfunctioning state.
The alternative, static file hosting, requires minimal maintenance and reduces expenses dramatically. For example, storing gigabytes of data on Amazon S3 may cost $1/month or less. However, static hosting often diminishes the capacity for rich data discovery. Without a dynamic computing layer between the user’s web browser and the source files, data access may be restricted to brittle pre-rendered browsing hierarchies or search functionality that is impeded by client memory limits. Under such barriers, the collection’s discoverability suffers.
For years, online collection discovery has been stuck between a rock and a hard place: accept the complexity and expense required for a good user experience, or opt for simplicity and leave users to contend with the blunt limitations of a static discovery layer.
Why We Explored a New Approach
When LIL began thinking about how to provide discovery for the Data.gov Archive, we decided that building a lightweight and easily maintained access point from the beginning would be worth our team’s effort. We wanted to provide low-effort discovery with minimal impact on our resources. We also wanted to ensure that whatever path we chose would encourage, rather than impede, long-term access.
This approach builds on our recent experience when the Caselaw Access Project (CAP) hit a transition moment. At that time, we elected to switch case.law to a static site and to partner with others dedicated to open legal data to provide more feature-rich access.
CAP includes some 11 TB of data; the Data.gov Archive represents nearly 18 TB, with the catalog metadata alone accounting for about 1 GB. Manually browsing the archive data in its repository, even for a user who knows what she’s looking for, is laborious and time-consuming. Thus we faced a challenge. Could we enable dynamic, scalable discovery of the Data.gov Archive while enjoying the frugality, simplicity, and maintainability of static hosting?
Our Experiment: Rich Discovery, No Server Required
Recent advancements in client-side data analysis led us to try something new. Tools like DuckDB-Wasm, sql.js-httpvfs, and Protomaps, powered by standards such as WebAssembly, web workers, and HTTP range requests, allow users to efficiently query large remote datasets in the browser. Rather than downloading a 2 GB data file into memory, these tools can incrementally retrieve only the relevant parts of the file and process query results locally.
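As a rough illustration of this access pattern, here is a sketch using the DuckDB Python API rather than DuckDB-Wasm (the browser client behaves analogously); the Parquet URL and column names are hypothetical.

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")

    # DuckDB reads only the Parquet footer and the row groups needed to answer
    # the query, using HTTP range requests rather than downloading the file.
    rows = con.execute("""
        SELECT title, url
        FROM read_parquet('https://example.org/datagov/catalog.parquet')
        WHERE title ILIKE '%climate%'
        LIMIT 20
    """).fetchall()
    for title, url in rows:
        print(title, url)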
We developed Data.gov Archive Search on the same model. Here’s how it works:
Data storage: We store Data.gov Archive catalog metadata as sorted, compressed Parquet files on Source.coop, taking advantage of performant static file hosting.
In-browser query engine: Our client-side web application loads DuckDB-Wasm, a fully functional database engine running inside the user’s browser.
On-demand data access: When a user navigates to a resource or submits a search, our DuckDB-Wasm client executes a targeted retrieval of the data needed to fulfill the request. No dedicated server is required; queries run entirely in the browser.
This experiment has not been without obstacles. Getting good performance out of this model demands careful data engineering, and the large DuckDB-Wasm binary imposes a considerable latency cost. As of this writing, we’re continuing to explore speedy alternatives like hyparquet and Arquero to further improve performance.
Still, we’re pleased with the result: an inexpensive, low-maintenance static discovery platform that allows users to browse, search, and filter Data.gov Archive records entirely in the browser.
Why This Matters for Libraries, Digital Humanities Projects, and Beyond
This new pattern offers a compelling model for libraries, academic archives, and DH projects of all sizes:
Lower operating costs: By shifting from an expensive server to lower cost static storage, projects can sustainably offer their users access to data.
Reduced technical overhead: With no dedicated backend server, security risks are reduced, no patching or upgrades are needed, and crashing servers are not a concern.
Sustained access: Projects can be set up with care, but without demanding constant attention. Organizations can be more confident that their archive and discovery interfaces remain usable and accessible, even as staffing or funding changes over time.
Knowing that we are not the only group interested in approaching access in this way, we’re sharing our generalized learnings. We see a few ways forward for others in the knowledge and information world:
Prototype or pilot: If your organization has large, relatively static datasets, consider experimenting with a browser-based search tool using static hosting.
Share and collaborate: Template applications, workflows, and lessons learned can help this new pattern gain adoption and maturity across the community.
This project is still evolving, and we invite others—particularly those in libraries and digital cultural heritage—to explore these possibilities with us. We’re committed to open sharing as we refine our tools, and we welcome collaboration or feedback at lil@law.harvard.edu.
Last month, September 2025, AlArabiyah published a documentary in Arabic titled "قادة حزب الله.. قصة الاغتيالات" ("Hezbollah's Leaders... The Story of the Assassinations"). It investigated the assassinations of Hezbollah leaders and the 2024 Lebanon electronic device attacks. AlArabiyah, a Saudi state-owned international Arabic news and media outlet, is one of the most popular news channels in the Middle East and the world. AlArabiyah is so prominent that former president Barack Obama gave it his first formal interview as president of the US on January 26, 2009. The documentary was published on YouTube under multiple channels/accounts that belong to AlArabiyah, including AlArabiyah, AlArabiyah Screen, and AlArabiyah Programs. It investigated Norta Global Ltd, the fake company that sold the explosive electronic devices (pagers and walkie talkies that exploded on September 17th and 18th of 2024) to Hezbollah, and looked into the history of the company's website using the Wayback Machine, since the website had been removed from the web. The documentary stated that the website was deleted on September 19th, the day after the attacks, and highlighted the times between which the website was taken down (9:57 AM and 11:02 PM UTC), as shown in the following screenshots from the documentary. Why did the documentary highlight the first and last archived copies on September 19th and totally ignore the archived copy in between them (captured at 10:51 AM UTC)? I do not know.
AlArabiyah documentary showing the Internet Archive's Wayback Machine naming it in Arabic "The Archive of the Internet, Wayback Machine"
AlArabiyah documentary highlighting the two archived copies on the Wayback Machine calendar between which NortaLink website was allegedly removed.
Because web archiving is what most of my research is about, I paused the video and started to look for archived copies of the website in web archives. I looked in all public web archives and could not find a single copy in any of them except the Internet Archive (IA), which has 23 copies. The website has only a single page, and it immediately comes across as a website for a fake company. I found that the website was not removed on September 19, the day after the attack, as the documentary claims. It was, in fact, removed a few weeks before the attacks (sometime between 2024-08-06 and 2024-08-29). I doubt that this is an accidental error from the journalists at AlArabiyah. Is it possible that they just do not know how to use the Wayback Machine? I doubt it. The issue here is not that the page was removed and that the IA has archived a 404 page. Inspecting the first archived copy, from 2024-08-29, shows that the website was not removed per se; rather, the opacity of the page's main div had been set to 0 in its CSS style element, making its content, the entire website, invisible. The home page is wrapped entirely in that invisible div. Therefore, on 2024-08-29, the IA archived a "blank page", technically a transparent page. The screenshot of the inspection window shows that the page has text and iframes with information that the IA captured. An average user who clicks the link to the memento and gets a blank page would assume the website was removed. A web-savvy user can inspect the page and see that the captured copy has elements in it, but that the page was made invisible. The main point here is that the IA did what it is supposed to do, but the webmaster made the page transparent (invisible). This implies a level of sophistication far beyond just removing the website.
The inspector window in Google Chrome browser showing the site made invisible by setting the opacity to 0 in the CSS style for the div that contains the page.
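For readers who want to repeat this check programmatically, here is a small hedged sketch: fetch a memento and look for an opacity:0 rule in style elements or inline style attributes. The memento URL is illustrative, not the exact capture discussed above, and the regular expression is only a rough filter.

    import re
    import requests
    from bs4 import BeautifulSoup

    # Illustrative memento URL; the Wayback Machine resolves the partial timestamp
    # to the nearest capture.
    memento = "https://web.archive.org/web/20240829/https://nortalink.com/"
    soup = BeautifulSoup(requests.get(memento).text, "html.parser")

    # Look for opacity:0 in <style> elements and in inline style attributes.
    for style in soup.find_all("style"):
        if re.search(r"opacity\s*:\s*0(?!\.)", style.get_text()):
            print("opacity:0 rule found in a <style> element")
    for tag in soup.find_all(style=True):
        if re.search(r"opacity\s*:\s*0(?!\.)", tag["style"]):
            print("inline opacity:0 on", tag.name, tag.get("id"), tag.get("class"))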
Furthermore, I looked into social media links in these copies and found that their Facebook page has been deleted and that Facebook prevented the IA from capturing its content because it requires the user to be logged in to download the page.
Archived copy of NortaLink Facebook page showing that the Internet Archive's crawler was blocked at crawl time.
NortaLink has a LinkedIn page and it has not been deleted, but it has been changed.
Screenshot of the current LinkedIn page of NortaLink.
Screenshot of the LinkedIn page of NortaLink from September 19th, 2024, taken from the Internet Archive.
The archived copy of NortaLink's LinkedIn page from 2023-10-21.
The archived copy of NortaLink's employees page showing that the IA was unable to archive the page because it redirects to the login page.
NortaLink’s LinkedIn page now lists one person working for the company as a managing partner and that he/she is located in Algeria. The profile lists Norta as the only employer and the profile only has one activity on LinkedIn, a comment from July of 2025. The name listed on the profile is Fabenco Bentayeb. The last name is a popular family name in Algeria, but the first name, Fabenco, is unheard of.
Screenshot of the LinkedIn profile page which states that the owner is the only current employee of NortaLink.
The documentary claimed that Norta Global Ltd has only one employee, Rinson Jose, who travelled to the US on the day of the attack and is now wanted by the Norwegian government.
I went to the website they listed on their LinkedIn page before the attack (norta.no) and found that it has also been removed. It was saved 8 times between May 31, 2017 and September 20, 2024 by the IA. Recent mementos showed that the website returned a 302 status code (redirect) at crawl time, but older mementos from 2017 and 2018 showed that norta.no was also a single-page website that looks just as fake as nortalink.com. Although the archived LinkedIn page showed that the company had two employees, the website featured pictures and information for three individuals who worked for the company: Rinson Jose (CEO), Bibin P B (CTO), and Hanna Hananger (PRO). Assuming all are fake names, I was curious whether they had any online presence under these names. Rinson Jose became widely known after the attack and is directly linked to it, so googling his name returns links about the bombing itself; any personal pages, if they exist, are buried under the many pages discussing the bombing. Furthermore, Rinson Jose must be a common name, because searching for people with that name on social media platforms such as LinkedIn, Facebook, and X returns dozens of profiles. Bibin P B must also be a common Indian name, because there are so many profiles on LinkedIn with that name.
Unlike the names of the first two employees (Rinson Jose and Bibin P B) listed on NortaLink’s website, Hanna Hananger is not a common name. On LinkedIn, I found what seems to be the same person from the picture. Unfortunately, the quality of their pictures on the website is so low that it makes me think they deliberately posted such low-quality pictures to hide their identities later on. The LinkedIn profile for Hanna Hananger shows a middle name, Charlotte, and says that she is located in Oslo, Norway. The profile shows that she worked in multiple countries, including two months as a social worker in India, the same country the other two employees, Rinson and Bibin, are from. The profile page cannot be archived, so I was unable to verify that the LinkedIn page has not been altered. I tried to archive it using the Save Page Now service from the IA’s Wayback Machine, but LinkedIn prevented the IA from archiving it and instead served the LinkedIn login page, since LinkedIn requires the user to be logged in to download the profile page for Hanna Hananger. Trying to archive the profile page using Archive Today produced the same result.
The IA failed to archive Hanna Hananger's LinkedIn profile page
Hanna Hananger’s LinkedIn profile page does not mention that she worked for any company that has Norta in its name. It is interesting that the first memento of norta.no (from 2017) shows that Rinson and Bibin are the only employees. Further online searching showed that Hanna Hananger is listed as the founder and director of Link Child Foundation in Mbale, Uganda. It is possible that norta.no used her picture and identity, claiming that she worked for the company without her knowledge, because Hanna Hananger seems to be a legitimate person and that is her real name. Mementos of norta.no show that the page did not have a bio for Hanna Hananger. Leveraging the IA to verify the legitimacy and consistency of websites, I looked for mementos of the Link Child Foundation's website. The oldest memento is from 2014, and it lists Hanna Hananger as the founder and CEO. The home page has been archived 54 times between 2014 and 2025. The organization seems to be legitimate, and the archived copies of its website show a natural growth of this type of website between 2014 and 2025. I went to their Facebook page, and it also seems legitimate, showing a creation date of June 5, 2013. The page has photos showing the progress of building their facility in Mbale, Uganda since 2014, with some of the pictures tagging the Facebook profile page of Hanna Hananger, which has the same picture as her LinkedIn profile page. Multiple other websites, including news websites such as Lister24, have pictures and information about her. Link Child Foundation has not posted on Facebook since December of 2023, has not posted on Instagram since May of 2024, and has not posted on Vimeo since 2020.
The takeaway message from this post is that the IA is a very effective tool for verifying the legitimacy of websites with minimal work. The AlArabiyah team utilized the IA in their documentary investigating the legitimacy of Norta Global Ltd, the company that supplied the explosive pagers and walkie talkies to Hezbollah that killed and injured thousands of people, but AlArabiyah failed to extract the correct information from the Wayback Machine and presented inaccurate information about the date and time when the company’s website “disappeared”. Upon further investigation, I found that Norta Global is a fake company and that its website and social media pages were created to build a history on legitimate platforms like LinkedIn for the purpose of selling the explosive electronics to Hezbollah and carrying out the bombing attack later. Deutsche Welle (DW) stated that two former Israeli Mossad agents confirmed that the pager bombing attack had been planned since 2022, which sounds correct, since the first archived copy of Norta Global Ltd’s website was captured on February 19th of 2023. They stated that all the fake companies, along with their online profiles and fake advertisements, were created for the purpose of sourcing the pagers from the manufacturer and selling them to Hezbollah after planting the explosives in them. In the same interview, the former agents stated that, unlike the pagers’ sales plan, the selling of explosive walkie talkies to Hezbollah was planned 10 years before the attack. In another interview, a former Israeli Mossad agent confirmed that the fake websites and online profiles created by the Israeli intelligence agency Mossad to promote the explosive electronics were later removed from the web. It seems that these fake companies, their websites, and their social media presence were part of the plan to take down Hezbollah members and leaders.
tl;dnr - Reader Lite and Reader Luminaries are implementations of RAG (retrieval-augmented generation) tools. They query an index of vectorized sentences, return the results, and provide the means to apply generative-AI to the sentences. These tools are intended to supplement the traditional reading process.
Introduction
A long time ago, in a galaxy far far away, when I was in library school, I became fascinated with the idea of indexes -- lists of pointers to information. I love creating indexes and tools to search them. In that vein, Reader Lite and Reader Luminaries are index/search engine combinations, and in the language of an academic librarian, Reader Lite and Reader Luminaries are indexes of sentences to seven different and distinct books from the Western canon:
Emma by Jane Austen
Essays by Francis Bacon
The Iliad and the Odyssey by Homer
Moby Dick by Herman Melville
The Prince by Machiavelli
Walden by Henry David Thoreau
Works by Horace
Given a set of documents and a (vectorized) index to their sentences, Reader Lite and Reader Luminaries support the means to:
search the indexes and get sentences in return
cite the sentences to see whence they came
convert the search results into computed paragraphs
use generative-AI to summarize the sentences
use generative-AI to ask real-world questions of the sentences
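As a rough sketch of what the sentence-level search step above looks like under the hood: embed the sentences, embed the query, and return the nearest sentences, which can then be handed to a generative model. The model name and the tiny stand-in corpus are illustrative; Reader Lite's and Reader Luminaries' actual implementations may differ.

    from sentence_transformers import SentenceTransformer, util

    # A tiny stand-in corpus; the real tools index whole books sentence by sentence.
    sentences = [
        "Call me Ishmael.",
        "It is a way I have of driving off the spleen and regulating the circulation.",
        "There now is your insular city of the Manhattoes, belted round by wharves.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = model.encode(sentences, convert_to_tensor=True)

    query = "Who is Ishmael?"
    hits = util.semantic_search(model.encode(query, convert_to_tensor=True), index, top_k=2)[0]
    results = [sentences[hit["corpus_id"]] for hit in hits]

    # The matching sentences (with citations) can then be handed to a generative
    # model, e.g. via Ollama, for summarization or question answering.
    print("\n".join(results))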
Reader Lite is the predecessor to Reader Luminaries. It does all the things above plus two additions, specifically, the means to:
randomly select a question from the corpus and do analysis against the question
configure a persona used by the system to summarize and elaborate on results
Using these tools, you can supplement the traditional reading process to get a better idea of what is going on in a book. For example, the first sentence in Moby Dick is "Call me Ishmael." "Well," you might say, "Who in the heck is Ishmael and why should I care?" Using Reader Lite you can address this question. First, query Moby Dick for the word "ishmael", and something like the following is returned:
CHAPTER I. LOOMINGS Call me Ishmael. "WHALING VOYAGE BY ONE ISHMAEL. With anxious grapnels I had sounded my pocket, and only brought up a few pieces of silver,--So, wherever you go, Ishmael, said I to myself, as I stood in the middle of a dreary street shouldering my bag, and comparing the gloom towards the north with the darkness towards the south--wherever in your wisdom you may conclude to lodge for the night, my dear Ishmael, be sure to inquire the price, and don't be too particular. But go on, Ishmael, said I at last; don't you hear? Yes, Ishmael, the same fate may be thine. Do you suppose now, Ishmael, that the magnanimous God of heaven and earth--pagans and all included--can possibly be jealous of an insignificant bit of black wood? Now then, my young man, Ishmael's thy name, didn't ye say? Well then, down ye go here, Ishmael, for the three hundredth lay. It's I--Ishmael." I, Ishmael, was one of that crew; my shouts had gone up with the rest; my oath had been welded with theirs; and stronger I shouted, and more did I hammer and clinch my oath, because of the dread in my soul. How it was that they so aboundingly responded to the old man's ire--by what evil magic their souls were possessed, that at times his hate seemed almost theirs; the White Whale as much their insufferable foe as his; how all this came to be--what the White Whale was to them, or how to their unconscious understandings, also, in some dim, unsuspected way, he might have seemed the gliding great demon of the seas of life,--all this to explain, would be to dive deeper than Ishmael can go. Thus, then, the muffled rollings of a milky sea; the bleak rustlings of the festooned frosts of mountains; the desolate shiftings of the windrowed snows of prairies; all these, to Ishmael, are as the shaking of that buffalo robe to the frightened colt! If then, Sir William Jones, who read in thirty languages, could not read the simplest peasant's face, in its profounder and more subtle meanings, how may unlettered Ishmael hope to read the awful Chaldee of the Sperm Whale's brow? But how now, Ishmael? Explain thyself, Ishmael. A veritable witness have you hitherto been, Ishmael; but have a care how you seize the privilege of Jonah alone; the privilege of discoursing upon the joists and beams; the rafters, ridge-pole, sleepers, and under-pinnings, making up the frame-work of leviathan; and belike of the tallow-vats, dairy-rooms, butteries, and cheeseries in his bowels.
Kinda long, but you can use the system to summarize the results:
As a university professor, I can analyze the literary device used in Chapter I of Moby-Dick. Herman Melville employs several techniques to create a rich and engaging narrative.
Firstly, Melville uses personification to give human-like qualities to non-human entities, such as the ocean, mountains, and wind. For example, he describes the ocean as "muffled rollings of a milky sea" and the mountains as "festooned frosts." This creates a sense of anthropomorphism, making the reader feel as if they are experiencing the same emotions and sensations as the characters.
Secondly, Melville employs allusion to add depth and complexity to the narrative. He references Sir William Jones, an 18th-century scholar who could read multiple languages, but notes that even he could not fully comprehend the meanings behind a peasant's face. This allusion serves to highlight the limitations of human understanding and the vastness of knowledge.
Thirdly, Melville uses symbolism throughout the chapter. The whale is portrayed as a powerful, unknowable entity, representing the unknown and the mysterious. Ishmael's fixation on the whale suggests his own fascination with the unknown and his desire to understand it.
Fourthly, Melville employs repetition for emphasis and to create a sense of urgency. He repeats the phrase "Ishmael" throughout the chapter, emphasizing Ishmael's role as a witness to the events unfolding. This repetition also serves to underscore the idea that Ishmael is not just a passive observer but an active participant in the narrative.
Finally, Melville uses imagery and descriptive language to create a vivid picture of the setting. He describes the "bleak rustlings" of the mountains and the "desolate shiftings" of the snow-covered prairies, creating a sense of desolation and isolation. This helps to set the tone for the rest of the novel and establishes the harsh environment in which the characters will operate.
In conclusion, Melville's use of literary devices in Chapter I of Moby-Dick creates a rich and engaging narrative that draws the reader into the world of the story. By employing personification, allusion, symbolism, repetition, and imagery, Melville establishes the themes and motifs that will be explored throughout the novel.
Still kinda long and not exactly to the point. On the other hand, the summary is more readable, and it does echo many of the things in the search results. It is a summary.
Finally, you can submit an actual question to the search results (not the summary) to get something (probably) more meaningful; getting back to the original question, you can ask, "Who is Ishmael?", and get something like the following as a result:
Ah, a profound question indeed! *adjusts spectacles* As a university professor, I must say that the identity of Ishmael is a fascinating topic. *clears throat*
Chapter I of Moby-Dick, as you've so kindly provided, introduces us to the narrator Ishmael. We learn that he is one of the crew members on the whaling ship, the Pequod, captained by the vengeful Ahab. Ishmael's name is mentioned early on in the chapter, and it becomes clear that he is a central character in the story.
However, the question remains: who exactly is Ishmael? Ah, my dear student! *winks* The answer lies not only in the text itself but also in the historical context of the time.
Herman Melville, the author of Moby-Dick, was known to have been heavily influenced by the literature and culture of his era. In particular, he was fascinated by the works of Shakespeare, who often used nicknames or aliases for his characters. *nerd alert*
Could it be, my dear student, that Ishmael is a deliberate reference to the biblical figure of Ishmael, son of Abraham and Hagar? *thoughtful expression* This Ishmael is known as the ancestor of the Arab people, and his story has parallels with that of Ahab, who seeks revenge against the White Whale. *intrigued*
Furthermore, the name "Ishmael" itself carries connotations of survival and resilience. In the biblical account, Ishmael is born as a result of Abraham's covenant with God, despite the fact that his mother Hagar is cast out into the wilderness. *thoughtful nod*
Thus, we see how Melville may have chosen the name "Ishmael" to represent not only the narrator but also the themes of survival and perseverance that run throughout Moby-Dick. *excitedly*
In conclusion, my dear student, the identity of Ishmael is multifaceted and open to interpretation. By examining both the textual evidence and the historical context of the time, we can gain a deeper understanding of this enigmatic character and his place in the narrative of Moby-Dick. *smiling*
Now, if you'll excuse me, I have some notes to attend to... *distracted expression*
Even now, still kinda long, but if you read the results closely, then you come to learn Ishmael is the narrator of the story, and Ishmael may also symbolize many different things in the book. If you know the story of Moby Dick, then you know the answer to the question is correct.
You can use Reader Lite to address all sorts of other questions, such as but not limited to:
Who is Emma?
Who killed Hector?
What is the relationship between Ulysses and Penelope?
What do beans represent to Henry David Thoreau?
If you were to read the books of Reader Lite in the traditional manner, then you would be able to address the questions above, but it would require a lot of time. More importantly, I advocate the traditional reading of these books, but the reading process can be enhanced through the intelligent use of an index like Reader Lite.
Case Study: What is a prince?
Reader Luminaries builds on Reader Lite. At the cost of an Internet connection and a relationship with Ollama, Reader Luminaries returns results much faster than Reader Lite, and Reader Luminaries makes it possible to configure the lengths of responses. As a case study against Reader Luminaries, you might want to know, "How does Machiavelli define a prince?" You begin by querying the index for "prince", and you get back results such as the following:
Further, the prince is compelled to live always with the same people, but he can do well without the same nobles, being able to make and unmake them daily, and to give or take away authority when it pleases him. But granted a prince who has established himself as above, who can command, and is a man of courage, undismayed in adversity, who does not fail in other qualifications, and who, by his resolution and energy, keeps the whole people encouraged -- such a one will never find himself deceived in them, and it will be shown that he has laid his foundations well. Either you are a prince in fact, or in a way to become one. For this reason a prince ought to take care that he never lets anything slip from his lips that is not replete with the above-named five qualities, that he may appear to him who sees and hears him altogether merciful, faithful, humane, upright, and religious. One prince* of the present time, whom it is not well to name, never preaches anything else but peace and good faith, and to both he is most hostile, and either, if he had kept it, would have deprived him of reputation and kingdom many a time. That prince is highly esteemed who conveys this impression of himself, and he who is highly esteemed is not easily conspired against; for, provided it is well known that he is an excellent man and revered by his people, he can only be attacked with difficulty. And a prince ought, above all things, always to endeavour in every action to gain for himself the reputation of being a great and remarkable man. A prince ought also to show himself a patron of ability, and to honour the proficient in every art.
Like Reader Lite, the results are returned as one long paragraph, and still kinda hard to read. On the other hand, the results come straight from the text and the citations are nuanced with summaries and keywords:
You can then summarize the search results into a single sentence:
A prince must cultivate a reputation for greatness and virtue while maintaining absolute control over nobles and appearing merciful, faithful, humane, upright, and religious to secure his power.
You can then ask, "What is a prince?", and get something like the following:
A prince is a ruler who must carefully cultivate a reputation of virtue while maintaining absolute authority over nobles and people.
Now, if you know anything about The Prince by Machiavelli, then you know the summary and the elaboration on the question are more than plausible. Which brings me to a very, very important point: Reader Lite and Reader Luminaries do not return the truth. Instead, they return plausible answers. It is up to you to evaluate the results and only then make judgements about their truth or falsehood. People are the source of knowledge, not computers.
Incidentally, I enjoy using Reader Luminaries to compare and contrast how Horace, Bacon, and Machiavelli differ on big ideas such as truth, beauty, love, honor, and justice. But such is the fodder for another investigation and blog posting.
Summary
Reader Lite and Reader Luminaries are interactive indexes to seven different books from the Western canon. The use of Reader Lite and Reader Luminaries -- like the use of any other index -- enhances and supplements the traditional reading process. Use these tools to get a more thorough understanding of the books on which they are based.
This presentation is from the eResearch Australasia Conference. It was delivered by Rosanna Smith and Michael Lynch. I'm putting it here as one of the authors.
We followed this presentation with an RO-Crate Birds of a Feather session with some other colleagues. We were able to help out a few RO-Crate community members with some of their questions, and direct them towards solutions and avenues for further discussion - mainly the RO-Crate Regional Drop-in Calls.
Research Object Crate (RO-Crate) is a simple method for linked-data description and packaging of data. Since 2021, the Language Data Commons of Australia (LDaCA) project has onboarded a number of language data collections with thousands of files. These are all consistently described as RO-Crates using a human and machine-readable Metadata Profile, discoverable through an online portal, and available via an access-controlled API. This presentation will show how analytics workflows can be connected to data in the LDaCA repository and use linked data descriptions, such as the W3C “CSV for the web” (CSVW) standard, to automatically detect and load the right data for analytical workflows. We will show how the general-purpose flexible linked metadata and raw data is prepared for use with common tools implemented in Jupyter notebooks.
This work, funded by the Australian Research Data Commons (ARDC), has enabled novel research by making data collected using sub-disciplinary norms of linguistics available to researchers working in other specialised areas. We will show examples of this and how this approach is relevant to other HASS and STEM disciplines, demonstrating work which would not have been possible without this co-investment between the Language Data Commons partners and the ARDC.
The presentation should be accessible to the broad audience of eResearch and be of particular relevance to those with an interest in workflows and analytics, as well as metadata, vocabulary and repository specialists. It shows a FAIR research system which runs on open specifications and code and can be redeployed for other domains.
The Language Data Commons of Australia (LDaCA) is part of the Humanities and Social Sciences and Indigenous Research Data Commons, which is led by the ARDC.
This project is co-funded by the University of Queensland (UQ). Authors Rosanna, Peter, Simon and River all work with UQ and Mike is with the University of Sydney.
What you see on the right is the execution strategy and what drives the LDaCA tech team.
To summarise, the strategy is about data management, developing tools and standards, technical architecture, and care for data in the long term.
LDaCA builds data portals with robust access controls in place, and this ensures that access is as open as possible but as restricted as needed according to the data stewards and communities the collections relate to.
We also develop shared tools for processing, analysis and visualisation of data and metadata, some of which we’ll be demonstrating today. We will be focussing on the indicated parts of this strategy: “access” and “analyse”.
Looking at analytics specifically, LDaCA aims to ensure workflows and infrastructure developed for analysing collections are available for access and reuse.
These should also be easy to re-run with clear documentation on their uses and limitations, and should allow for adaptation for a range of contexts.
The core idea of LDaCA is to develop standardised methods for describing and organising data in a Data Commons environment, which reduces friction in finding, using and re-using data.
We have captured this approach with PILARS, which are Protocols for Implementing Long-Term Archival Repository Services.
These services should be designed to work in low-resource environments, allowing communities to have agency and control over their materials.
The protocols prioritise sustainability, simplicity and standardisation, with linked-data description and clear licensing.
Data is organised into objects, taking into account practical considerations such as the size of each object, and access conditions. Each data object is stored in a repository as an RO-Crate (which stands for Research Object Crate).
An RO-Crate is a way of packaging research data that stores the data together with its associated metadata and other component files, such as the data license.
In this diagram, we have one collection containing items, such as a set of interviews, and each item describes the files linked to it, in this case, a text file and an audio file. Licenses for each of the items are also included.
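As a hedged sketch of what packaging one such item can look like with the ro-crate-py library (the file names, metadata values, and license URL are illustrative, not an actual LDaCA collection):

    from rocrate.rocrate import ROCrate

    crate = ROCrate()
    crate.add_file("interview-001.txt", properties={
        "name": "Interview 001 transcript",
        "encodingFormat": "text/plain",
    })
    crate.add_file("interview-001.wav", properties={
        "name": "Interview 001 audio",
        "encodingFormat": "audio/x-wav",
    })
    crate.root_dataset["license"] = "https://creativecommons.org/licenses/by/4.0/"
    crate.write("interview-001-crate")  # writes ro-crate-metadata.json alongside the files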
The RO-Crates are modelled according to a metadata profile which outlines the expected collection structure and provides guidance on describing language data in a repository.
The profile uses schema.org as its foundation for metadata description, as well as a few other standard vocabularies.
It also draws on the Language Data Commons schema http://w3id.org/ldac/terms, which contains metadata terms specific to describing language data.
This diagram overviews the architecture for indexing data with a focus on findability, and illustrates the key conceptual components of our data storage architecture.
Storage services follow the PILARS protocols and store data as a series of Storage Objects using the Oxford Common File Layout (OCFL). This is a specification from the digital library community for storing data independently of applications such as portals, repositories or particular analytical pipelines.
This data is distributed across multiple storage services, including file-systems and object stores hosted by different institutions.
The diagram shows our distributed access control approach at LDaCA, and this is motivated by a need for controlled access in conjunction with CARE and FAIR data principles.
Each item in the repository is stored as an RO-Crate with licensing information included, and the repository defers access decisions to an external authentication and authorization system.
Here, data custodians can design whatever process they like for granting license access, ranging from simple click-through licenses to detailed multi-step workflows based on whatever criteria the rights holder requires.
This can be scaled to multiple portals and repositories as well.
(Image is from here: https://github.com/Language-Research-Technology/plantuml-diagrams/blob/main/generic/simple-distributed-access-control.svg)
All of this architecture comes together in the “main” portal, where we add language data collections that meet LDaCA’s collection guidelines. These are batch loaded from scripts into a loading area and then they’re indexed appropriately.
We’ve also set up automation with Terraform to build portals on demand for communities, so that bespoke requirements for the portal interface and other needs can be catered to.
The portals provide secure access to the data in an automated way through an API, and this is also used for downloads and analytics.
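As a generic sketch of what such an API call might look like from a script or notebook: the base URL, route, parameter names and token handling below are hypothetical placeholders, not the actual LDaCA portal API.

```python
import requests

# Hypothetical example of fetching item metadata through an authenticated portal API.
API_BASE = "https://portal.example.org/api"   # placeholder base URL
TOKEN = "..."  # an access token issued once the relevant licence has been granted

response = requests.get(
    f"{API_BASE}/object/meta",                 # placeholder route
    params={"id": "example-collection/item-001"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
item_metadata = response.json()
```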
On the analytics side, we’re building Jupyter notebooks to explore the collections, which users can launch in a binder infrastructure, and these are also accessible in the portal.
The notebooks allow you to download the collection data and analyse its contents in a repeatable way with reproducible results.
Library upgrades and version changes can break once-working Jupyter Notebooks, and this makes it difficult for future users to verify and reproduce results or build on them.
BinderHub enhances reproducibility by allowing users to launch pre-configured notebooks as interactive computing environments with explicitly defined hardware and software requirements.
We use the Nectar BinderHub service that is provided by the ARDC and AARNet.
To illustrate this, I’ll walk through a recent notebook that has been developed for the COOEE collection.
The COrpus of Oz Early English, or COOEE, is a collection of texts written in Australia between 1788 and 1900. These include letters, published materials in book form, and other historical texts.
The corpus is divided into four time periods, each holding about 500,000 words.
It’s also divided into four registers: Speech-based, Private Written, Public Written and Government English. The proportions of these registers are consistent across the time periods as well.
Because COOEE is organised by both time period and register, the corpus can be stratified into 16 sub-corpora, which allows for analysis of linguistic features according to either or both of these variables.
Our notebook uses these sub-corpora as the basis for topic modeling, to show what topics are more or less strongly associated with particular sub-corpora.
Before we could analyse the collection, though, some transformation of the metadata was needed so that, where possible, the terms adhere to a standard framework that can be applied across collections and allows for further interoperability.
These transformations included mapping metadata fields in the COOEE collection to their schema.org equivalents, for example Birth to birthDate and Nr to identifier.
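A minimal sketch of this kind of field mapping, applied to a single metadata record, might look as follows; the record and the mapping cover only the two example fields above, not the full set of COOEE transformations.

```python
# Illustrative mapping of collection-specific metadata fields to schema.org terms.
FIELD_MAP = {
    "Birth": "birthDate",
    "Nr": "identifier",
}

def standardise(record: dict) -> dict:
    """Rename known fields to their schema.org equivalents and keep the rest unchanged."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}

print(standardise({"Nr": "cooee-1-001", "Birth": "1762", "Register": "Private Written"}))
# {'identifier': 'cooee-1-001', 'birthDate': '1762', 'Register': 'Private Written'}
```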
We also needed to identify the main text for analysis, because each object in the COOEE collection has two types of text files - one is the plain text and the other has metadata encoding, with information about the register and the author of the text.
For analysis, we only want to use the plain text so that the metadata codes won’t be included as part of the topic counts.
For this, we defined the new term mainText in the Language Data Commons Schema vocabulary, which identifies the most relevant sub-component for computational text analytics.
This metadata standardisation is an important step because it not only makes analysis faster and easier to do, but also allows us to re-use analytical approaches on multiple collections, and streamlines processes like comparing datasets.
In the notebook, we first download the whole COOEE collection directly from the LDaCA Portal, and then we use the RO-Crate tabulator Python library which converts RO-Crate linked data (a network, or “graph” of relationships) to tabular representations (rows and columns).
The tabulator also allows you to select the tables and properties that are relevant to your analysis. Mike will talk about this later in more detail.
We also specify the mainText field as the property to be loaded and indexed as the text to analyse for this collection.
We then convert the table to a Pandas DataFrame so that the metadata sits alongside the text data.
Finally, we slice the data by register and time period, and concatenate the text of each document within a slice to create 16 large documents.
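Sketched in pandas, and assuming illustrative column names rather than the collection’s exact schema, the slicing and concatenation step looks roughly like this:

```python
import pandas as pd

# Toy DataFrame with one row per document; column names here are illustrative.
df = pd.DataFrame({
    "mainText": ["Dear brother ...", "By order of the Governor ...", "..."],
    "register": ["Private Written", "Government English", "Speech-based"],
    "period": ["1788-1825", "1788-1825", "1826-1850"],
})

# Concatenate the text of every document within each register/period slice,
# producing one large document per sub-corpus (up to 4 x 4 = 16).
sub_corpora = (
    df.groupby(["register", "period"])["mainText"]
      .apply(" ".join)
      .reset_index(name="text")
)
print(sub_corpora.head())
```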
In order to discover the topics in the collection, we need to be working with a list of words for each document instead of paragraphs of text.
For this, we used the Natural Language Toolkit Python libraries, which allow us to tokenise the data.
Some words and other items in the text can be considered ‘noise’ for the analysis and are removed: punctuation marks, numbers, artefacts of digital text such as newline symbols, and many common function words that are not relevant for the analysis.
The diagram shows an example of some of the input text in the first box, and the second box shows the same data tokenised in a list with punctuation and other ‘noise’ items removed.
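A small sketch of the tokenisation and noise-removal step using NLTK is shown below; the English stop-word list and the exact filtering rules are assumptions, and a real notebook might add corpus-specific stop words.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the tokeniser models and stop-word list on first use.
for resource in ("punkt", "punkt_tab", "stopwords"):
    nltk.download(resource, quiet=True)

STOP_WORDS = set(stopwords.words("english"))

def tokenise(text: str) -> list[str]:
    """Lower-case the text, tokenise it, and drop punctuation, numbers and stop words."""
    return [
        tok for tok in word_tokenize(text.lower())
        if tok.isalpha() and tok not in STOP_WORDS
    ]

print(tokenise("The ship arrived in Sydney Cove\non 26 January 1788."))
# e.g. ['ship', 'arrived', 'sydney', 'cove', 'january']
```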
Using the tokenised word lists, we can now model the data using Gensim, and visualise the output, and we’ve done this with both interactive and static options.
LDAvis – on the left – is an interactive visualisation of the topics learned by the model. The circles represent topics, and the most prevalent topics are larger. The bars show individual word frequencies.
Because this visualisation is interactive, users can select a topic circle to view its most salient terms, which can be used to analyse any plausible semantic groupings in those topics.
The heatmap – on the right – shows the distribution of topics across time periods and registers. The horizontal rows of three or four dark squares show where topics are strongly associated with a particular register across time.
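A condensed sketch of the modelling and visualisation step with Gensim and pyLDAvis follows; the tiny placeholder token lists stand in for the 16 sub-corpora from the previous step, and the number of topics is an arbitrary illustrative choice.

```python
from gensim import corpora, models
import pyLDAvis
import pyLDAvis.gensim_models

# Placeholder for the 16 sub-corpus token lists produced by the tokenisation step.
tokenised_docs = [
    ["convict", "ship", "sydney", "voyage"],
    ["governor", "order", "land", "grant"],
    ["dear", "brother", "letter", "home"],
    ["wool", "price", "market", "colony"],
] * 4

dictionary = corpora.Dictionary(tokenised_docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised_docs]

# Train an LDA topic model; num_topics is an arbitrary choice for illustration.
lda = models.LdaModel(bow_corpus, num_topics=5, id2word=dictionary,
                      passes=10, random_state=1)

# Interactive view: topic circles and their most salient terms (pyLDAvis).
vis = pyLDAvis.gensim_models.prepare(lda, bow_corpus, dictionary)
pyLDAvis.save_html(vis, "cooee_topics.html")

# Static view: per-sub-corpus topic weights, which can then be plotted as a heatmap.
doc_topics = [dict(lda.get_document_topics(bow, minimum_probability=0.0))
              for bow in bow_corpus]
```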
Although this particular notebook example is just run on the COOEE data, these processes can be easily re-applied to explore further collections, and can be adjusted according to the needs of the user.
I’ll now hand over to Mike to talk more about the RO-Crate tabulator.
The first version of Tabulator was written because a new version of the Observable data visualisation platform had come out which allowed you to build interactive dashboards without requiring a custom backend.
I wanted to see how quickly I could use it to visualise the contents of an RO-Crate.
Observable is a data-sciency tool so it really wants to work with tables. So Tabulator needed to be able to turn a random RO-Crate, which could be an arbitrary network of entities, into a set of tables.
I'm a software engineer, so the first version was extremely general and not very performant. In most practical cases, you only want to turn a couple of the entities into tables - with the LDaCA collections, for example, things like RepositoryObjects, and also lifting relations to Persons into the table of documents.
With a bit of config the Tabulator can be used to convert the COOEE RO-Crate to an SQLite database with a row for each of the documents and their metadata - you can then load it in Observable and start building interactive plots to look at corpus features.
Tables are also how researchers like to work with data, because you can load them into spreadsheets. But that raises the problem of spreadsheets and data types - CSV is just text with commas - there's no type information.
But - we're exporting our data from a well-crafted RO-Crate, which has schema.org mappings, and there's an existing standard, CSVW, which can annotate CSV with JSON-LD column types.
So Tabulator can export CSVs, together with a secondary RO-Crate which provides a CSVW schema for each of the exported tables, explaining what the columns are and providing links to definitions.
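As a rough illustration of what such an annotation carries, here is a hand-written CSVW table schema for one exported table; the CSV file name, the columns shown, and the exact form of the ldac term URL are assumptions for the example.

```python
import json

# Illustrative CSVW annotation for one exported table (columns are examples only).
csvw = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "documents.csv",
    "tableSchema": {
        "columns": [
            {"name": "identifier", "datatype": "string",
             "propertyUrl": "http://schema.org/identifier"},
            {"name": "birthDate", "datatype": "date",
             "propertyUrl": "http://schema.org/birthDate"},
            {"name": "mainText", "datatype": "string",
             "propertyUrl": "https://w3id.org/ldac/terms#mainText"},  # assumed URI form
        ]
    },
}
print(json.dumps(csvw, indent=2))
```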
We've got more work underway at Sydney Informatics Hub to make it easier to use the Tabulator on LDaCA corpora. I still want to keep it as a general-purpose library but we can add some Python code which will use the common features of LDaCA RO-Crates to get out the relevant entities and build a table of the texts.
This will then feed into the work we're doing on a new web frontend to a range of different text-analytics tools.
Eventually, rather than downloading a whole corpus, running Tabulator and then loading it, you could run a component which fetches texts from the LDaCA data portal and returns rows which can be analysed in a web interface.
General purpose - visualise/analyse in your platform of choice.
To conclude, this work is part of a Data Commons. The key idea is to create low-friction analysis pipelines by:
Having consistently described, well-managed data available through a discovery portal and secure access APIs
Providing data preparation tools which make it possible to align BYO data with the standards
Ensuring tools are easy to run on more than one dataset, and that datasets work with more than one tool
Clearly documenting the limitations and assumptions of the tools
This recognition affirms ODE’s contribution to the United Nations Sustainable Development Goals (SDGs) and confirms that it meets the highest standards of openness, accessibility, and public benefit.
This is an excerpt from a longer contribution I made to Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools. [pdf] It is a sketch only, and somewhat informal, but I thought I would put it here in case of interest. It occasionally references the position paper, and the longer piece of which it is a part. It is also influenced by the context in which it was prepared, which was a discussion of the informational disciplines and the iSchool in R1 institutions. This section could of course be much expanded in a fuller treatment. It is striking to me how much LIS and Information Science can still reference different intellectual, disciplinary or institutional boundaries depending on context. If you wish to reference it, I would be grateful if you cite the full original: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]
This informational diffusion has given the iSchool great latitude and it can accommodate a great diversity of disciplinary lenses – from the very technical, to the social sciences and humanities, to design, to marketing and communication, to public policy, to critical theory, and so on. The Deans’ interviews suggest that this is at once a great strength and a potential weakness, as the iSchool does not have exclusive ownership of a foundational discipline, but rather a multidisciplinary focus on a hard to define phenomenon. Furthermore, this phenomenon has become an object of study in many other disciplines also.
In this section I discuss the informational disciplines (sic), LIS and Information Science, and conclude with some comments about Library Studies, LIS and the iSchool.
Informational disciplines
Library studies
First, here is a brief note on LS. Buckland (2005) traces the emergence of ‘library science’ to Martin Schrettinger in the early 19th century. The first American library school was opened by Melvil Dewey at Columbia in 1887. Unsurprisingly, a central focus of each of these figures is organization of the collection.
In general, the ‘library school’ has not been a story of optimism and growth. Some closed. Many changed their name to lose ‘library.’ Some were merged into other schools or departments, with various disciplinary emphases.
There is now some variety of provision across types of university and disciplinary configuration. It is common within iSchools that have a library focus to use the term LIS.
Information science(s)
What is Information Science or the Information Sciences? It seems to me that one can identify two very provisional emphases here to help scaffold a discussion. The first is in terms of the emergence of Information Science in the mid twentieth century, with a set of shared concerns, intellectual and personal influences, and professional venues. I label this Information Science Classic in this section, and this is what I usually mean when talking about Information Science (IS). The second is more generic, as Information Science or Information Sciences (or Informatics) is used as a designation of convenience for an academic interest in a range of information-related topics, with or without any specific reference to or acknowledgement of Information Science Classic. We see this emphasis in various places, including in some of the iSchools with non-Library Studies backgrounds (see Cornell or Penn State for example). This may sometimes be used to designate an applied technology focus with more of a social or business dimension than you might typically find in Computer Science.
How many on campus outside the iSchool, I wonder, would have any sense of Information Science Classic? How often would Information Science simply be read in the more generic second way? If you read a page of iSchool faculty interests would anything identify information science distinctively? It would be interesting to explore this further.
Information science – classic
As long as the “discipline” is the primary unit of differentiation in the social system of scholarship, it is only strategic for information science to claim its status as one. // Petras, V. (2024). The identity of information science. Journal of Documentation.
[Information science] is a somewhat battered and in some ways disoriented field. // Ian Cornelius (1996). Meaning and Method in Information Studies.
Thus, our second major finding, already conveyed, is that information science lacks a strong central author, or group of authors, whose work orients the work of others across the board. The field consists of several specialties around a weak center. // White, H. D., & McCain, K. W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972–1995.
We may therefore conclude that information science is an unclear label (a floating signifier) and that there is a great need for clarification and for improved terminological hygiene. // Hjørland, B. (2018). Library and Information Science (LIS), Part 1.
In this final chapter we give an overview of some ideas about the future of the discipline and profession of information science. This has been a source of debate almost since the beginnings of the discipline, and of its sibling, library science. Much of the debate has been negative, with a narrative around disciplines and professions permanently in crisis. And, indeed, if one worries about the structure of academic departments and the names of professional associations one might find cause for concern. If one takes a long view and thinks […] about the power of recorded information and the continuing need for disciplines and professions to deal with it, the concerns come into better focus. // Bawden and Robinson (2022). Introduction to Information Science.
We can see three different, though not incompatible, places for information science in the changing academic firmament. One is a closer alignment with the informatics and data science areas; this is essentially the route taken by the ischools movement. Another is an alignment with media, communications, journalism, publishing and similar subjects. A third is an alignment with cultural studies and the digital humanities. Examples of all three can be seen, and all seem to be viable. It is important, however, not to choose one of these and declare that this is the future of the information sciences. They, and others which will emerge in the future, are simply aspects of the multidiscipline, emphasising different perspectives on the central focus of documents and the human record. // Bawden and Robinson (2022). Introduction to Information Science.
We can identify IS Classic as an academic interest with practical applications which emerged in the mid twentieth century as a response to the increasing volumes of scientific, technical and other information. A historical perspective locates this in the interests of special librarianship, of the documentalist movement in Europe (Otlet et al), and of various scientific and technical approaches to the management of growing information resources (Bush, Bradford, Garfield, and so on).
There is an extensive literature seeking to define what is meant by Information Science. Much of this is summarized in Bawden and Robinson (2022), seen as a definitive, if belated, description of the field. Indeed, White and McCain (1998) noted the lack of a definitive textbook when they were writing at the turn of the century. See Furner (2015), Buckland (2012), Bates (2015), and, recently, Petras (2024), for example contributions.
Definitions often emphasize recorded information, or documents in Buckland’s terms, rather than some of the broader social and cultural dimensions discussed above. Robinson and Bawden’s textbook defines it as a ‘field of study, with recorded information and documentation as its concern, focusing on the components of the information chain, studied through the perspective of domain analysis.’ Bates discussed recorded information and the influential analysis of White and McCain talks about ‘literature systems.’ For Petras, Information Science is also about objects (manifested information) but adds an interesting temporal dimension: ‘Information Science is concerned with how information is manifested across space and time.’
What is striking about this literature is how often it comes back to first principles, and also how the ‘field’ seems less a cumulating, coherent discipline than a set of topical emphases (for example, theoretical information retrieval, quantitative views of the literature, and information seeking behaviors). It can seem like a collection of variably related interests in pursuit of a discipline. Indeed, as the Report mentions, White and McCain (1998) identified two main subdisciplines in their early analysis (information retrieval and citation analysis), and note their relative independence of each other. Marcia Bates (2022) and others would add information seeking behavior as a third emphasis. Of course, looking across LIS one can identify a variety of additional emphases. Some of this work is summarized in Järvelin, K., & Vakkari, P. (2022).
As late as 2022, Bates, a major figure in the field, was proposing a theory of Information Science, and calling for the various emphases to be finally integrated “into a true disciplinary - not interdisciplinary - a true disciplinary paradigm for the field?” This was in an article where she notes that in the network era “our little field of information science was simply pushed aside by the behemoths of computer science and business.”
This diffusion is seen in how it is described. I chose ‘interest’ above … Bates describes it as a metadiscipline, or earlier, as a meta-field. In this it is like journalism or education. Each is different from what she calls ‘content disciplines’ in that it explores issues which crosscut those disciplines. In Information Science’s case, the cross-cutting interest is the storage and retrieval of knowledge in recorded form. Bawden and Robinson describe Information Science as a ‘field of study’: it is focused on particular problems, but uses a variety of methods and theories.
Petras resists these definitions, preferring to see IS as having “an independent core and locus in the canonical scholarly universe: a discipline.” But this seems like territorial optimism, as does this author’s quote above. Does one have to claim one’s place as a discipline? It is interesting to see this beside Bawden and Robinson’s acknowledgement above that Information Science is likely to rest within a larger disciplinary home (they might have added Business). Indeed, it is revealing to see the acknowledgement in a (the?) leading textbook that Information Science has had an identity issue from the start.
That said there are strong intellectual traditions and affiliations at play here, around particular topics, academic centers of expertise, researchers/teachers and venues. Familiarity with key figures, central works, and the field’s core concepts serves as a shibboleth, marking membership within an ongoing social and intellectual community. The community’s activities are supported by dedicated journals, conferences, and professional associations, such as ASIS&T.
It is perhaps appropriate to think of Information Science Classic in this manifestation as an ongoing community, institutionalized in a small set of journals, events and associations. Given the changes within the iSchools which are a strong part of the architecture and identity of this community, one wonders if it will persist with quite the same identity.
LIS
In general, I feel that even authors who write a great deal about the structure of the field sometimes use the combo of “LIS” as a matter of habit, and even in the works of the most prominent thinkers, there is no consistency in usage. // Dali, Keren (2015) How we missed the boat: reading scholarship and the field of LIS. New Library World.
Two major structural shifts are revealed in the data: in 1960, LIS changed from a professional field focused on librarianship to an academic field focused on information and use; and in 1990, LIS began to receive a growing number of citations from outside the field, notably from Computer Science and Management, and saw a dramatic increase in the number of authors contributing to the literature of the field, notably from Computer Science and Management. // Larivière, V., Sugimoto, C. R., & Cronin, B. (2012). A bibliometric chronicling of library and information science’s first hundred years. Journal of the American Society for Information Science and Technology.
This analysis of cognitive structure shows a clear division of LIS into three distinct areas with well-defined objects of study and, to some degree, venues of publication. These three areas can be broadly defined as LS, IS, and scientometrics/bibliometrics. // Milojević, S., Sugimoto, C. R., Yan, E., & Ding, Y. (2011). The cognitive structure of Library and Information Science: Analysis of article title words. Journal of the American Society for Information Science and Technology.
Many of these definitions allude to the Janus-faced nature of the field—a field that is at once both professional and academic […] As a result, Miksa (1985, 1992) argues that the field has two distinct paradigms—librarianship, which is focused on libraries as institutions, and information science, which is focused on information and its communication. They are informed by different research traditions: librarianship from social science, and IS from mathematical communication theory. And while some researchers (Ingwersen, 1992; Vakkari) consider library science (LS) to be a subfield within a more general field of IS, others (e.g., Saracevic) still consider them to be two related, but separate fields. // Milojević, S., Sugimoto, C. R., Yan, E., & Ding, Y. (2011). The cognitive structure of Library and Information Science: Analysis of article title words. Journal of the American Society for Information Science and Technology.
For the perspective presented here, libraries are a specialized subject area in the wider field of information science. […] The name library AND information science is redundant for this definition, but links to the historical roots of the fields and is now often used to represent a school’s particular focus on libraries in the disciplinary problem space. // Petras, V. (2024). The identity of information science. Journal of Documentation.
A field of research may institutionalize both cognitively and socially (Whitley, 1984). The former means a shared and coherent understanding of principal research problems and goals, ways of conceptualizing the research objects and methodologies to study them. The latter refers to, e.g. university departments, journals and conferences representing the field of research. It is questionable whether the 50 years have led to cognitive institutionalization in LIS as a whole. There hardly exists a shared understanding of principal research problems and goals. […] Interdisciplinarity is fruitful in developing new knowledge, but isn’t here a risk for LIS being absorbed by the stronger partners – both cognitively and socially? Further study is needed to analyse these questions. // Järvelin, K., & Vakkari, P. (2022). LIS research across 50 years: content analysis of journal articles. Journal of Documentation.
The report’s focus is LIS. The report says it applies LIS in a ‘general sense of “LIS-oriented” — the range of research, academic programs, faculty, and students who contribute to the profession of librarianship, broadly construed.’
However, this is not always how it is used in practice, although as noted by many, including Dali above, LIS is a very elastic term.
Often ‘Library and Information Science,’ it can sometimes be ‘Library and Information Studies.’ Syntax does not help. It can be parsed in a left-branching way ((library and information) science) suggesting ‘library and information’ is a unit. Sometimes it is parsed in a flatter structure as if ((library science) and (information science)) are two parallel concepts brought together. My sense is that it is used in the former way when the discussion is more library related, but I cannot confirm that.
However, in his interesting historical overview of LIS, Hjørland (2018) notes that “In general, however, the tendency today is to use the terms Information Science and Library and Information Science as synonyms.” And, as if to underline the deep confusion around the term, he describes this guidance to prospective authors posted by the editor of a journal.
The editor of Information Research recommends the following use of the terms: “An additional point about LIS—this is much over-used and people are rarely writing about research directly related to libraries when they use it: if you are writing about research in libraries, use ‘library research’; if you are writing about information research, use ‘information research’ or ‘information science research’. If you really intend both, use ‘research in librarianship and information science’.” (Wilson 2015).
In support of his view above, he also notes that in the ASIS&T Thesaurus of Information Science, Technology, and Librarianship (2005) LIS is considered synonymous with Information Science, whereas librarianship is considered a related term.
At the same time, Petras claims that Library Studies is a subset of Information Science, and hence argues that LIS as a term is formally redundant. And in another position again, Dali resists the IS encroachment on L.
So, there is some ambiguity in the term.
There is general agreement that LIS was formed by combining library and information science perspectives. From a library point of view, this made sense in the context of the growing digital environment and because the information management, policy or other topics that emerged were not exclusive to the library space. However, equally, there is a recurrent note, as in the Report, that the two have not always sat beside each other comfortably. In fact, it has been suggested (Petras and elsewhere) that the ‘science’ in Information Science was chosen to differentiate the emerging discipline from the more practice-oriented Library Studies.
Referring to this dual nature, Milojević et al (2011) note the Janus-faced nature of the field. In their analysis of article words across a range of LIS journals they conclude that there is “a clear division of LIS into three distinct areas with well-defined objects of study and, to some degree, venues of publication.” The three areas “can be broadly defined as LS, IS, and scientometrics/bibliometrics.” They go on to more tentatively suggest that information seeking behavior may be establishing itself as another first order category. One striking feature of the analysis is the separated clustering they discovered of IS and LS journals.
In a later study, Järvelin & Vakkari (2022) carry out a content analysis of 30 LIS journals. They question “whether the 50 years have led to cognitive institutionalization in LIS as a whole” noting that there “hardly exists a shared understanding of principal research problems and goals.” They further observe that in some areas LIS researchers partner with other disciplines, and wonder whether there is a risk of LIS being absorbed by stronger partners both cognitively and socially (partners include computer science, communications, management).
Of course, LIS has indeed been organizationally subsumed into other departments/schools for some time (examples include education, business, computing, communications).
Another recurrent feature of discussion and debate is the low level of theory formation within the field. Not unexpectedly, given the nature of its interests there is a high level of borrowing from other fields. A recent survey of theory formation and use in LIS (Roy & Mukhopadhyay, 2023) concludes rather dramatically by claiming that LIS is facing a “theory crisis.” I am not sure about this, but it is certainly a factor contributing to perceptions of the prestige of the field.
Finally, both Järvelin & Vakkari (2022) and Larivière et al (2012) note a decline in the proportion of publications on librarianship specifically, and a growth in informational and other topics. This is not too surprising, but one wonders how much of a factor the relative balance of L and IS faculty in LIS departments/schools was, acknowledging the absence of clear boundaries.
So, LIS is a confusing term for a diffuse field which has not cohered around a sense of itself as a single entity. The term may be used synonymously with Information Science, and it may be used in a library context to acknowledge a wider informational dimension or to sound more ‘rigorous.’ If it is being used in a specific way, it really needs to be qualified to make this clear.
Again, however, LIS may also have a strong social dimension, whatever about its disciplinary coherence. It references shared schools, conferences, journals, and influences. Although, that said, it is interesting to note the strong divide between IS and L in some of the analysis above.
Information sciences, informatics, iSchool
Some iSchools grew out of Library Science Schools, some out of Computer Science Schools, some have merged with Communication Departments, some have merged with Management—but the telling characteristic of each is that they are interdisciplinary and all share the same goal—to enable their graduates to become successful professionals based on their combined expertise in information, technology, and management. // Liddy, E. D. (2014). iSchools & the iSchool at Syracuse University.
As I noted in the introduction, my view of iSchools has been a little idealistic, given their variety, seeing them as multidisciplinary homes for a variety of informational interests. I suggested that a working characterization might be schools with a sociotechnical perspective that consider technology, people and organizations within a multidisciplinary setting.
Of course, actually existing iSchools are quite various, in terms of breadth of faculty expertise, configuration within the university, and so on. And their number has grown, if membership of the iSchools Organization is a measure. The designation has come to be applied less exclusively as membership has grown. At the time of writing, there are 130 members of this group.
Some offer undergraduate degrees. There are graduate degrees in information systems, data science, and other in-demand areas. There is a strong focus on general IT issues with a social and business dimension. They teach and research topics arising from the construction, deployment and management of information systems across industry, business, educational, or other, domains. This means that some may have a wider bandwidth connection to industry, in terms of partnership and vocational preparation. There may be local drivers around income, employability and overseas students.
At the same time, many have broad research agendas, and they are hospitable to a range of informational perspectives in the terms above, looking at the ramifications of informational organization of social and cultural phenomena. (The Center for an Informed Public at UW is an example.)
It is interesting to note that the iSchools Organization ‘about’ page does not mention libraries or library studies, again as of this writing, although it does include what may be a stock photo of the (all male) busts from the Long Room of Trinity College Dublin, one of the most photographed libraries in the world, and a common media recourse when a library is required.
Not all of the iSchools have a library or LIS heritage, and various terms may be used in quite general ways hospitable to many dimensions of information education and research. These include Information Science (with or without a connection to what I call Classic Information Science), Information Sciences and Informatics.
Library studies, information science, LIS and the iSchool
The difference between a multidisciplinary and an interdisciplinary approach is one thread in discussions, with some favoring a multidisciplinary approach (e.g. Bates, 2022). The contrast is between an independent focus which cooperates in a peer-to-peer way versus a more integrated approach in which the characteristics of an individual approach may be subsumed.
A discussion of multidisciplinarity or interdisciplinarity depends on some scale of activity. Library studies has much to gain from a rich mix, in educational and research terms. This is both within the disciplinary mix of the iSchool, and potentially across campus. That depends on Library Studies itself thriving, in education and research terms as part of the mix. As discussed, this presents challenges in the R1 setting, which for scale, prestige and other reasons favors library studies sitting within a broader disciplinary setting. I discussed library studies in the previous section.
Is Information Science (Classic) in a different place? It brings a legacy of interesting work in specialisms which are not exclusive to it (information retrieval, information seeking behavior, domain analysis, …) and a community who may share some traditions and social venues. However, as is evident from the literature, it doesn’t bring a coherent body of work organized around a well-understood discipline. This is especially the case when one notes that the research interests of IS Classic are not exclusively the concern of IS Classic. Indeed, this becomes very apparent within the broader umbrella of the iSchool, which contains researchers and educators exploring similar issues and coming from different disciplinary traditions.
This broader iSchool informational agenda addresses core IS issues in a hybrid disciplinary setting. Information Science Classic still has some organizational momentum, embedded in schools and the social apparatus of a community (journals, ASIS&T, …), but it seems doubtful that it has a strong future as a distinct discipline recognized within the university canon. Bawden and Robinson suggest that it might live in several possible disciplinary homes. However, as with LIS below, one wonders if the continued academic transformation and relocation of the informational disciplines might weaken also the social affiliations of Information Science Classic.
The discussion of LIS in the report is both interesting and ambivalent. It is not clear how you circumscribe LIS. And, although there is an emphasis on LIS faculty, it is not clear either how you circumscribe the LIS faculty population. The report leans to a strong LIS association with librarianship, although, as discussed above, this is not universally true of its usage.
Is it somebody who has a PhD from a current ‘LIS school’? If this is so, and if the trends outlined in the report continue, presumably some of those LIS graduates with a strong IS or technical formation may naturally gravitate to other options in the iSchool. What is an LIS concentration when the IS is increasingly intermingled in the broader disciplinary array of the iSchool?
Is it a particular disciplinary focus? Well, as discussed throughout, there is an informational dimension to many subjects, which means that the IS part of LIS is intellectually less distinctive over time (even if there remains a community distinction, as discussed). The L part of LIS is more distinctive, so one might as well just say LS?
Is it a matter of values? The discussion of LIS values reinforces this library emphasis (are those values espoused by the tradition of theoretical IR researchers or by those who are traditionally active in information systems?)
Is it a shared history, tradition and way of thinking? Undoubtedly. And, in this context, many researchers and teachers readily self-identify in an LIS tradition. However, as discussed here, LIS may be used in different ways and this identity may gradually dilute given changes in the iSchools themselves which have been part of the social institutionalization of LIS.
Much of the time whether one uses LS or LIS does not matter much. However, it makes sense to be clear what you are talking about - especially when it matters, as, for example, given the concerns of the report, in a recruitment discussion in an iSchool context.
There is also a variety of ways in which you want to talk about the library which are not about information processing or management.
Suggesting a preference for LS is not to be isolationist. Quite the contrary: LS will benefit from the multidisciplinarity of the iSchool in both research and education terms.
Cross-LAMination
Several iSchools have expertise in libraries, archives and museums and offer degrees across the range. Archives and libraries may be a more common combination.
I have suggested it would be interesting to consider a horizontal LAM expansion to include archival studies (and potentially museum studies), in future iterations. It is important to recognize their distinct intellectual, social and vocational contexts and traditions. In my recommendations I ask if a LAM Forward initiative would create a stronger story than an LIS Forward one.
A note on other disciplines
Historically, several other disciplines had an informational interest. These include, for example, Management Information Systems in a business setting, or Communications/Media studies, or Journalism.
While there might be some porosity of borders, each has a clear center of gravity. Interestingly, the first two of these at least might find a home in a current iSchool.
Of course, there are also significant examples of influential researchers in the Information Science Classic tradition who find a home in other disciplines, information retrieval in computer science, notably, but also in the social sciences, communications or elsewhere.
However, as more activities have become informationalized, many other disciplines have a strong, explicitly informational dimension. The sciences (e.g. bioinformatics), geography, the digital humanities, and so on. And the Report notes the range of disciplines with a strong informational flavor.
iSchools may often promote their work by repeating what Castells above calls the common-sense view that information or knowledge is everywhere. This is a double-edged proposition, and goes to iSchool identity. While it may suggest that the study of information can anchor a school, it can also suggest a lack of distinction.
Coda: overview of and links to full contribution
Collection: LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]
Contribution: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]
Contents: Here are the sections from my contribution. Where I have excerpted them on this site, I provide a link.
Buckland, M. (2012). What kind of science can information science be? Journal of the American Society for Information Science and Technology, 63(1), 1–7. https://doi.org/10.1002/asi.21656
Cornelius, I. (1996). Meaning and method in information studies. Ablex Pub.
Dali, K. (2015). How we missed the boat: reading scholarship and the field of LIS. New Library World, 116(9/10), 477–502. https://doi.org/10.1108/nlw-01-2015-0007
Järvelin, K., & Vakkari, P. (2022). LIS research across 50 years: content analysis of journal articles. Journal of Documentation, 78(7), 65–88. https://doi.org/10.1108/jd-03-2021-0062
Larivière, V., Sugimoto, C. R., & Cronin, B. (2012). A bibliometric chronicling of library and information science’s first hundred years. Journal of the American Society for Information Science and Technology, 63(5), 997–1016. https://doi.org/10.1002/asi.22645
Milojević, S., Sugimoto, C. R., Yan, E., & Ding, Y. (2011). The cognitive structure of Library and Information Science: Analysis of article title words. Journal of the American Society for Information Science and Technology, 62(10), 1933–1953. https://doi.org/10.1002/asi.21602
Roy, B. K., & Mukhopadhyay, P. (2023). Theoretical Backbone of Library and Information Science: A Quest. LIBER Quarterly: The Journal of the Association of European Research Libraries, 33(1), 1–57. https://doi.org/10.53377/lq.13269
This study investigates neuroinclusive hiring practices in academic libraries, focusing on best practices for designing hiring experiences that are inclusive to autistic and ADHD job seekers. Our research team conducted a survey of 51 academic libraries in the United States, examining job advertisements, interview formats and practices, and general human resources (HR) practices as they relate to inclusion for neurodivergent job seekers. While the relatively small sample size precludes us from making generalizable statements about the field’s adoption of neuroinclusive practices (or lack thereof), our results brought to light a significant consideration: while some inclusive practices were adopted among survey-taking institutions, many libraries still lack inclusive practices that can specifically benefit neurodivergent candidates. The study demonstrates the need for more comprehensive and proactive measures to support neurodivergent individuals in the hiring process.
Background Information
Language Note:
In this paper, we will be using identity-first language to refer to autistic people. Identity-first language is the preference of many autistic self-advocates, including those at the Autistic Self Advocacy Network (Brown, 2012). Furthermore, we refer to autism as either autism or autism spectrum disorder (ASD) rather than Asperger’s syndrome, an outdated term for a “mild” or “high-functioning” form of autism. Asperger’s syndrome was eliminated in 2013 from version 5 of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and is considered to be a fraught term given the problematic legacy of Hans Asperger, the namesake of Asperger’s syndrome, in Nazi Vienna, specifically with his research into and treatment of disabled children (Czech, 2018). We also refer to Attention-Deficit/Hyperactivity Disorder as ADHD throughout the paper.
Definitions
Neurodiverse: A group of people where one or more members have differing neurocognitive functioning. Some members might be neurotypical, others neurodivergent (Walker, 2021).
Neurodiversity: The diversity of all human minds; the variation of neurocognitive functioning across the human population (Walker, 2021).
Neurodivergent: A person whose mind functions in a way that diverges from dominant societal norms (coined by Kassiane Asasumasu in 2000) (Walker, 2021).
Neurodivergence: The state of being neurodivergent. Forms of neurodivergence can be innate or acquired, and include but are not limited to dyslexia, autism, depression, and epilepsy. Autism is an innate form that has a pervasive impact on “an individual’s psyche, personality, and fundamental way of relating to the world” (Walker, 2021).
Neurotypical: A person whose mind functions in ways that fall within dominant societal norms. It means the opposite of neurodivergent (Walker, 2021).
Neuroinclusive: Practices, processes, and policies that create welcoming and safe spaces for people of all neurocognitive functions (Neurodivergent Rebel, 2024).
Ableism: A system that assigns value to people based on constructed ideas of normalcy, productivity, and fitness – rooted in eugenics, racism, sexism, colonialism, and capitalism – and manifests as discrimination and prejudice against people with disabilities. This can range from lack of ADA compliance, refusal to provide accommodations, and failure to incorporate accessibility, to overt discrimination and seclusion (Lewis, 2022; Eisenmenger, 2019).
Positionality Statement: “Nothing about us without us”
Our team of authors consists of a mix of neurotypical and neurodivergent library workers; those who are neurodivergent represent various conditions including ADHD and autism spectrum disorder. Among our neurodivergent research team, one is Latine and another is transmasculine. To mitigate potential biases from relying solely on our own experiences in the academic library hiring process, we grounded our work in literature on neuroinclusive hiring best practices. However, our experiences of being job candidates ourselves also supplemented the research in our literature review. We address in our paper how the best practices we outline can be especially beneficial for those who are uncomfortable disclosing a disability or even the undiagnosed. Given the intersectionality between underdiagnosis in various marginalized communities such as racialized minorities, gender minorities, and those with lower socioeconomic status we feel this study appropriately addresses concerns for many marginalized groups.
Introduction
The application and interview process for academic library positions, especially those at the non-staff, librarian level, are often quite demanding and frequently require candidates to engage in processes that are particularly distressing to those with certain neurodivergent conditions, such as ASD or ADHD. As such, the barriers to entry into the profession are often larger for this population, and this can lead to underrepresentation within the academic library workforce. With fewer neurodivergent voices at the table, academic libraries will find it harder to identify the particular challenges that neurodivergent individuals face and the accommodations that could increase their recruitment, thus perpetuating a cycle of exclusion.
Neurodivergence in libraries remains an area of study that needs further development. In particular, there are few papers that examine neuroinclusion in the context of academic library hiring practices. Librarians with disabilities are generally an overlooked group, so the exact percentage of neurodivergent library workers is unknown. However, it is estimated that 15-20% of the world population, or roughly 1 in 5 adults, is neurodivergent, and that 3.7% of library staff in the United States and 5.9% in Canada have some form of neurodivergence (Doyle, 2025; Khan et al., 2022).
In response to this dilemma, we wanted to further explore the topic of neuroinclusion in academic library hiring practices, focusing on the barriers and difficulties that are often present for individuals with ASD or ADHD. We chose to focus on surveying hiring managers, library administrators, and others who have decision-making power when it comes to hiring, in order to get a sense of where the profession currently is in terms of its inclusiveness to neurodivergent job seekers. As part of the study, we conducted a literature review of best practices for neuroinclusive hiring, drawing mostly from fields studying disability and human resources. Further information was gathered by means of a survey sent to academic libraries across the United States, designed to examine how well academic libraries are currently engaging in those best practices in order to accommodate neurodivergent candidates during the hiring process. This study can serve as an exploratory baseline measure of how neuroinclusive the field is in its hiring practices.
What is Neurodivergence?
Neurodivergence can encompass a range of neurological conditions that impact how individuals experience and interact with the world around them. There are a number of conditions that are commonly included under the neurodivergence label such as autism and Attention-Deficit Disorder/Attention-Deficit Hyperactivity Disorder (ADD/ADHD). Many neurodivergent self-advocates favor an expansive definition that is inclusive of most neurological conditions. This can make conducting research on the neurodivergent community challenging, as researchers may need to review literature utilizing a wide range of terminology. For this study, we primarily focus on the experiences of autistic job seekers, and those with ADHD. While we did not specifically address the intersection of AuDHD (autism with ADHD), we believe many of the best practices reviewed in our study are applicable to that convergence.
Higher Education, Libraries, and Neurodivergence
Libraries are often proclaimed to be “safe spaces” for all, including autistic individuals, those with ADHD, or other neurodivergent conditions (Lawrence, 2013, p. 99), and it is colloquially presumed that libraries should be a natural fit for neurodivergent job-seekers (Enna, 2023; Grandin, 1999; Zauderer, 2023). Indeed, for many reasons, it makes sense that the work of libraries would appeal to those who think differently from the norm. Libraries can be a place where being process- or detail-oriented, analytical, committed to follow-through, and wary of the spotlight (common ASD traits) is welcomed (Attar, 2021; Eng, 2017; Maddock, 2022). Those who can make sense from disparate ideas, share deeply in others’ enthusiasms, and quickly read the emotional tenor of a room (common ADHD traits) may also find themselves able to thrive in the field (Swick-Jemison, 2023). However, research featuring the lived experiences of neurodivergent academic library workers paints a more nuanced picture of how prepared the field is to make space for anyone who is not neurotypical.
Experiences of Neurodivergent Academic Librarians
Looking to scholarship written by and/or studying neurodivergent librarians, as well as our own experiences, the academic library can be a challenging place for neurodivergent individuals who staff it. While positive experiences in academic librarianship have certainly been had by neurodivergent employees (see selected passages in Anderson 2021, Eng 2017, Giles-Smith & Popowich 2023), on the whole the literature suggests that the profession tends to be inclusive in words alone to neurodivergent workers. E.E. Lawrence’s “Loud Hands in the Library,” the first article to acknowledge and address the experience of neurodivergent library workers as opposed to patrons, noted in 2013 that there are likely many autistic librarians who feel unsafe being forthcoming about their diagnosis at work. Since then, studies have shown that neurodivergent library workers face many challenges related to the physical and social environments of their workplaces (see Anderson 2021; Camp & Finlay 2025; Eng 2017; Giles-Smith & Popowich 2023; Haire 2025; Swick-Jemison 2023). In some cases, the library can be seen as an unsafe space, as noted in Giles-Smith and Popowich (2023), one where neurodivergent employees can feel “marginalized,” “dismissed,” or even “targeted” (Giles-Smith & Popowich 2023, p.7). There is a fear among neurodivergent individuals on the tenure track or in contract positions that they will be ultimately rejected from being able to stay in their roles because their condition “mak[es] it impossible to conform to the norms necessary to be accepted into the group” (Swick-Jemison 2023 p. 8; see also Giles-Smith & Popowich 2023).
Masking Neurodivergence
Special consideration must be paid to the added labor that neurodivergent library workers must put in to be seen as “normal” in a workplace that is not welcoming of neurodivergence. A 2025 study by Amelia Haire surveyed autistic academic librarians on the efforts they took in their workplaces to learn about its norms in order to survive and be successful. The survey found that autistic librarians often take significant measures to fit into their neurotypical-coded environments, such as going to great lengths to choose just the right method and style of communication when speaking to coworkers, uncovering and studying unwritten rules of the workplace, and even modifying their facial expressions. In an effort to depict a collected (read: neurotypical) exterior, autistic library workers frequently undertake this additional labor in secret, on top of performing their regular job duties (Haire 2025). The masking required to appear calm and on top of things, while below the surface struggling, can be extremely distressing to autistic and other neurodivergent individuals and lead quickly to burnout (Giles-Smith & Popowich 2023; Haire, 2025).
The common urge to mask or to put forth great effort to study the norms of the workplace almost certainly stems from the aforementioned neurotypical coding of the typical library workplace. The unifying call among most literature speaking to the experience of neurodivergent individuals at work is that much greater awareness of neurodivergence is necessary, so that neurodivergent individuals can simply exist as themselves in their workplaces (see Camp & Finlay 2025; Giles-Smith & Popowich 2023; Lawrence 2013).
Hiring Practices as Exclusionary
The hiring procedures for academic libraries are typically similar to the professoriate: a two-part process including phone or video interviews, followed by an on-campus interview that generally begins with a dinner the day before and a day-long interview that includes a presentation, various meetings with teams, a formal interview with the search committee, tours, a lunch, and sometimes more interviews sprinkled throughout the day. For anyone, it can be an anxiety-inducing and exhausting process. For neurodivergent individuals, it can be an even worse experience. Studies with neurodivergent individuals have documented how traditional hiring practices can be barriers to their success. Job advertisements filled with jargon and unnecessary qualifications, hectic interview schedules, broad and ambiguous interview questions phrased in a way that are difficult for neurodivergent candidates to answer, accommodations available only upon disclosure of disability, ableist assumptions regarding fit and ability, and unconscious bias of hiring committees hinder their successful employment (Anderson, 2021; Betz, 2022; Dali, 2019; Haire, 2025; Hyde et al., 2024; Khan et al., 2022; Lau, 2022; Maras et al., 2021; Oud, 2019; Russo et al., 2022; Tomczak et al., 2021; Whelpley & May, 2023). The typical interview process is a natural outcome of the neoliberal ideas underpinning academia, which promote ableist values supporting hyperproductivity and “the fittest” candidates (Cunningham et al., 2019; Dali, 2019; Lau, 2022; Oud, 2019). It is not surprising to learn roughly 85% of autistic people are unemployed and about 46% of those employed in any industry are underemployed (Maras et al., 2021; Russo et al., 2022; Tomczak et al., 2021; See also: Khan et al., 2022). Traditional hiring practices were developed for neurotypical, abled individuals and thus are exclusionary of all others (Khan et al., 2022).
Methodology
Our survey questions were designed to capture the ways and the extent to which libraries take neurodiversity into account when developing and enacting their hiring practices. The bulk of the survey questions were informed by the literature on neuroinclusion in the workplace, and in higher education, drawing from Betz, 2022; Bruyère et al., 2020; Maras et al., 2021; and Saleh et al., 2022. The survey questions can be found in Appendix A.
Based on the best practices covered in the literature, we divided the survey questions into five conceptual categories: the job advertisement, the format of the interview day, the procedural aspects of the interview day, the candidate presentation during the interview day, and general human resources practices. For each of the concepts, we asked two questions. For the first question, we identified relevant “best practices” from the literature (outlined more fully in Appendix B) and asked which of the practices respondents had engaged in. We then followed up with a second question asking, for those practices selected by the respondent in the first question, when would those practices be employed? For the latter question the choices were: “Used by default,” “Used upon request,” and “Used sporadically.” Lastly, we closed the survey with three additional questions that either did not fit neatly into the conceptual framework described above or that allowed respondents to address a more expansive view of neurodiversity in their hiring practices than we chose to focus on in the other survey questions.
We compiled a list of the academic libraries that had posted openings for full-time positions on higheredjobs.com for the period July 1, 2021–October 1, 2022. We ultimately sent out our survey by email to hiring officials or administrators for 369 libraries via Qualtrics in October and November 2022.
Limitations
Based on the low (13.8%) response rate and the incomplete nature of many of the responses received, our sample size was too small to make claims of statistical significance or generalizability. Additionally, the data was collected in Fall 2022 and may not reflect the status quo in 2025 or beyond.
Results
With 51 institutional responses to the survey, the study provides an overview of engagement with best practices across five key areas and offers valuable insights into the current landscape of inclusive hiring in academic libraries.
Most Frequently Used Hiring Practices
Across all categories, five practices emerged as the most frequently used by institutions, both in terms of overall engagement and default implementation:
Writing job descriptions to be short and skills-focused
This practice was adopted by 87% of respondents, with 24 institutions reporting they use it by default.
Creating a deliberately quiet, unrushed interview environment
Used by 81% of respondents, this practice was implemented by default in 27 cases.
Providing information about the interview format in advance
This practice was used by 71.4% of respondents, with 30 institutions implementing it by default.
Introducing the presentation room prior to the interview presentation
88.2% of responding institutions engage in this practice and 26 use it by default.
Regularly updating interview processes to reflect best practices
This general HR procedure was the most widely adopted across all categories, with 93% of respondents reporting its use and 20 institutions implementing it by default.
Least Frequently Used Hiring Practices
Conversely, several practices were rarely used, indicating areas where academic libraries may need further development or support:
Providing alternatives to the standard job talk
20.6% of institutions use this practice, and just two use it by default.
Providing training on neurodiversity to HR professionals and hiring managers
Only 10.7% of institutions engage in this practice, and just one reported using it by default.
Using direct and inclusive language about neurodiversity in job ads
This practice was used by just 10% of respondents, with minimal default implementation.
Allowing interviewees to bring support workers or companions
11.9% of institutions engage in this practice, and just two use it by default.
Keeping the number of interviewers to one or two per session
This practice was used by only 4.1% of respondents, with two institutions implementing it by default.
Additional Observations and Qualitative Insights
The qualitative portion of the survey, though limited in response rate, provided valuable context. Respondents noted that accommodations designed for autistic individuals often benefit candidates with other neurodivergent conditions, such as dyslexia or bipolar disorder. Moreover, several respondents expressed a willingness to expand their practices and explore new accommodations. This openness suggests that while some practices are not yet widespread, there is institutional interest in improving hiring equity for neurodiverse individuals.
Discussion
Based on our survey results, it appears libraries engage in some best practices for hiring neurodivergent candidates; however, given the small sample size and our own experiences interviewing, the results don’t necessarily paint a complete picture of how well the practices are implemented. Additionally, the lived experiences of our neurodiverse group in going through the academic library hiring process diverge in some significant ways from the common practices reported by survey takers, calling into question whether hiring institutions are actually doing what they say they do.
The most widely adopted initiatives tend to be the “low-hanging fruit” that can be implemented with minimal effort or consensus building. Over 80% of survey respondents reported creating an unrushed interview day by including breaks between sessions. We know this can be done well: one of our researchers, Michael, was fortunate to interview for his current position entirely over Zoom during the early lockdown of the COVID-19 pandemic. His breaks consisted of snuggling his dogs and silence, without the small talk often necessary during in-person, all-day interview sessions! As an autistic person, he found this very helpful and less stressful. But from our experience we also know scheduled interview breaks can be treated as inconsequential buffer time and easily removed when things go over schedule – or be required to serve double duty as presentation set-up time, bathroom breaks, or yet another moment for small talk with the search committee.
Over 90% of our respondents say they regularly update hiring practices to mirror best practices. Several of us have seen this practice in action. One member of our team, Alex, reflected on how her department adapted the interview process based on past experiences to remove one of the two presentations required of candidates – a change that felt like a true “lift as you climb” moment, as the improvements were made not as a reactionary necessity but to create a better, fairer experience for future candidates. While we’ve had successful conversations about changing hiring practices, we also know that it can be tricky convincing people to stop an interview practice that has always been done – or worse, has become almost a hazing ritual. Mary had an interview for a teaching position in a non-library academic unit that included role playing in a group interview alongside other candidates. When she gave her solicited and honest feedback about the stressful nature of the experience, she was told, “this is the way we’ve always done it and if we went through it, it can’t be that bad.”
It is encouraging to see that some best practices are regularly employed according to our survey takers. However, we noted that the most commonly adopted best practices are those that benefit all job seekers, including the neurotypical, rather than those that are especially inclusive of the neurodivergent, such as implementing training on neurodivergence for search committee members. This is important to note because adopting only the former might meet minimum legal requirements without moving beyond them and doing the hard work of truly creating a welcoming experience.
When considering practices that could directly support neurodivergent candidates, we find that these measures have the lowest adoption rates. For instance, allowing an alternative to the traditional “job talk” (20.6% adoption) does require more effort and creativity to maintain parity among candidates, but it is achievable depending on the position. We need to ensure that hiring interviews don’t needlessly require candidates to perform and be evaluated on tasks that aren’t expected of the position they are applying for. One member of our team was applying for a librarian position focused on technical services. The interview process required them to give an oral presentation to the entire library’s staff, and yet the actual job did not require the incumbent to give presentations as part of their job duties. Our researcher felt very uncomfortable with the presentation part of the interview and did not feel that it accurately helped to assess how they would have performed in the position itself. In talking with others about these best practices, we’ve experienced the most pushback on presentation alternatives; however, if outreach, teaching, or talking/presenting to stakeholders is not essential to a position, it shouldn’t be an interview requirement.
Those involved in hiring must remember that hiring is a two-way street. If academic libraries want neurodivergent prospective library workers to feel that they will be supported on the job, they have to show that support during the hiring process itself and be aware of – and amenable to fixing – the invisible barriers turning neurodivergent job seekers away. Unintentionally excluding neurodivergent individuals from the candidate pool not only limits the diversity of perspectives within the organization but also overlooks the valuable contributions that neurodivergent individuals can bring to the workforce, and do when given the chance to work.
Finally, while two-thirds of survey respondents noted that they have provided candidates with information about how to request accommodation, this differs significantly from actively making accommodations a standard part of the hiring process. To truly move the needle on being inclusive of neurodivergent library workers, we must consider how to sustainably incorporate all of the best practices outlined in this article. Libraries and universities often espouse values of inclusivity, aiming to serve diverse communities. However, if their hiring practices do not reflect these values, it creates a dissonance between organizational rhetoric and actions, potentially eroding trust and credibility. Furthermore, while some libraries are engaging in best practices that benefit all job seekers, it’s crucial to recognize that simply meeting ADA requirements – that is, the bare minimum required of institutions to avoid litigation – is not enough. In fact, libraries can adhere to ADA standards and still not employ many of the best practices outlined in this paper. By settling for the legal minimum, libraries and institutions perpetuate the very inequities they claim to oppose. We have the opportunity to push beyond these minimal requirements and create the inclusive and welcoming environment that libraries claim to maintain. It is also worth noting that some of the qualitative questions were framed such that responding negatively could be interpreted as an admission of noncompliance with ADA requirements and federal law, potentially influencing how participants answered.
The responses to the qualitative questions of our survey reflect a desire on the part of employers for inclusivity and accommodation in the hiring process, yet the survey results suggest an unfortunate reality: many of these valuable practices are not being widely implemented. This raises the question of why academic libraries are not pursuing, or not succeeding in, advocating for their implementation during the hiring process. There are certainly structural barriers in place that make it difficult to effect change in hiring, such as red tape in human resource hiring procedures; indeed, “navigating bureaucracy” was a key factor found by Houk and Nielsen in their 2023 study on factors confounding the inclusivity of academic library hiring practices. Additionally, search chairs may feel the need to conserve their efforts and strategically “pick their battles” instead of advocating for potentially controversial hiring practices. Staff time and financial resources are always limited, and with university budgets tightening, there may be little hope for increased financial resources to address the changes necessary to approach hiring in a more neuroinclusive way (especially with regard to upgrades to the work environment that better accommodate diverse sensory needs). Fortunately, some barriers can be overcome with minimal resources. If search committees are not engaging in best practices due to dogmatic mindsets (e.g., “If I had to endure an exhausting interview day, you do too”), these are attitudes that must be challenged and discouraged by institutional leaders. If the issue is simply a lack of awareness about neurodivergent individuals and their needs, training on neurodivergence can and should be provided at relatively low cost. Fundamentally, we must ask ourselves, to borrow a term from engineering, “What is the desired result?” Is it to find the person who can do the best job, or merely the person who can best jump through the many hoops of the academic interview? We assume that libraries would universally state that their goal is the former; it’s time to reconsider hiring practices that favor candidates who can do the latter.
In summary, while some academic libraries are engaging in some inclusive hiring practices, the results highlight potential shortcomings in best practices within certain institutions. These shortcomings may perpetuate systemic inequalities, limit diversity, and hinder organizational innovation and effectiveness. Addressing these issues is essential for promoting fairness, diversity, and inclusivity in the workplace.
Future Research and Conclusion
Future Research
Given the relatively new attention that neurodivergence has garnered in recent years, the potential for future research in this area is extremely wide-ranging. We have identified the following as starting points, but as demonstrated in our review of existing library scholarship, gaps in our collective knowledge around neurodivergence and disability abound in our field. We can think of a number of reasons why this is so, such as the dearth of openly neurodivergent academics and the fraught nature of doing research in fields that some would regrettably consider controversial. We owe much to the researchers before us and hope our contribution will continue to spur further research and understanding.
The hiring process from the perspective of neurodivergent job seekers.
It would be worthwhile to see the inverse of this study, viewing the hiring process from the perspective of neurodivergent library workers. While the authors of this study are a mix of both neurodivergent and neurotypical librarians, we acknowledge that by focusing on hiring practices of library search committees, the current study only presents one side of the story. To best ensure that academic libraries are equitably hiring neurodiverse candidates, it is vital to understand what it is like to be a candidate, asking questions such as when/if neurodivergent candidates choose to disclose their disability during the process, what their expectations are for hiring committees, and what the job search process is like for them overall.
Retention of neurodivergent library workers, particularly those on the tenure track.
While “getting in the door” is a crucial first step for neurodivergent librarians seeking work in academic libraries, there is a need to also study the retention of neurodivergent employees. As JJ Pionke notes in his 2019 Library Trends article, individuals with job needs that diverge from the so-called norm cannot necessarily expect that their time at work will be free of stress related to their condition once they’ve received a job offer. It would be worthwhile to know what practices academic libraries can put in place to increase retention and morale of neurodiverse workers. Particularly in the world of academia, the tenure system is historically ableist (Lindsay & Fuentes, 2022). Studying the experiences of neurodivergent librarians as they navigate the workplace and/or their efforts to receive tenure would give the field insights that might allow libraries to ensure that they are doing more than paying lip service to efforts to increase diversity among their workforce.
Workplace support for neurodivergent library workers.
While examining the aggregate experiences of neurodivergent library workers is useful, it is important to remember that no two neurodivergent people will have the same support needs. More study is needed to illuminate how to best support neurodivergent library workers with specific conditions such as ADHD and ASD. From a diversity, equity, and inclusion perspective, this is vital for combating ableism within librarianship and academia. There are some excellent resources already in existence that provide information on these topics, such as the Job Accommodation Network’s website: askJAN.org. Christine Moeller’s (2025) research on the marginalization of neurodivergent librarians provides additional strategies that employers could adopt to improve their recruitment, onboarding, retention, and advancement (see chapter 4). However, more research in this area would certainly be a welcome development.
Impact of neuroinclusive awareness training on hiring practices and retention.
Creating an inclusive working environment requires efforts that go beyond revising hiring practices: it also necessitates training managers and administrators, investing in accessible equipment and spaces, and creating equitable practices that signal a commitment to supporting the diverse needs of all employees. Future research could examine whether awareness of autism and/or ADHD and training on supervising neurodivergent individuals creates proactive, rather than reactive, managers, supervisors, and administrators. Initial research suggests that giving employees clear feedback, as well as creating space for them to seek clarification on tasks, feedback, and job duties, improves the success of neurodivergent employees (Anderson, 2021; Khan et al., 2022; Oud, 2019; Russo et al., 2022).
Conclusion
In closing, while this study sheds light on some factors that impact the hiring practices for neurodivergent candidates in professional academic librarian positions, more study is needed to see how we should best support them.
Although the practices highlighted in this study can benefit a broad spectrum of candidates, it’s important to recognize that accommodating the diverse needs of all job seekers cannot be encompassed in one set of guidelines. Encouraging a supportive community and environment where individuals feel safe to disclose neurological conditions, without fear of negative repercussions, is key to providing people the support they need to navigate the job search successfully.
The findings from our research reveal a distinct and compelling aspiration among our respondents for inclusivity in the academic library hiring process. Survey participants expressed a clear preference for practices that foster an inclusive environment, emphasizing the importance of creating a workplace that caters to diverse needs. However, an unfortunate reality emerges from our study: these desired practices are not readily implemented unless expressly asked for (and sometimes not even then). This is particularly problematic, as requiring candidates to ask for accommodations can force them into a situation in which they must disclose their condition or jump through extra hoops in order to obtain them. As one of our researchers can attest, getting an ADHD diagnosis as a cisgender adult woman can be extremely slow, costly, and inaccessible – even with decent comprehensive health insurance and the advantages of working in higher education. Another of our researchers encountered similar challenges in seeking an autism diagnosis as an adult, as evaluations often require childhood documentation or the involvement of family members, not to mention a high price tag. The prospect of diagnosis is fraught: because of societal stigma, one may conclude that the risks and burdens of pursuing a diagnosis outweigh the benefits, or only feel comfortable disclosing their needs to a small degree (like using fidgets or noise-cancelling headphones) after feeling they’ve proven themselves enough professionally. These experiences emphasize the point that not every person is comfortable – or safe – disclosing their condition, and that many are left to struggle or mask their needs rather than seek accommodations. For this reason, best practices are better implemented as the default so that all candidates, regardless of whether they are neurodivergent or neurotypical, can benefit without the added burden of disclosure at potential personal risk.
Acknowledgements
We would like to thank our internal peer reviewer Jess Schomberg, external peer reviewer Christine Moeller, and publishing editor Jaena Rae Cabrera for their assistance with this work, as well as our colleagues Candice Benjes-Small, Rick Mikulski, and Jessica Ramey, who provided feedback on early drafts of our article. We also want to thank our library’s Inclusive Excellence Committee for holding a W&M Libraries EDI Summer Conversation Series investigating disabled voices in libraries, which included the Lead Pipe article “Neurodiversity in the Library: One Librarian’s Experience” by Alice Eng, featuring Charlie Remy.
References
Anderson, A. (2021). Job seeking and daily workforce experiences of autistic librarians. The International Journal of Information, Diversity, & Inclusion, 5(3), 38–63.
Attar, K. (2021). Autism, librarianship and their fit. Good Autism Practice (GAP), 22(2), 32–39.
Baumer, N., & Frueh, J. (2021, November 23). What is neurodiversity? Harvard Health. https://www.health.harvard.edu/blog/what-is-neurodiversity-202111232645
Betz, G. (2022). Navigating the Academic Hiring Process with Disabilities. In The Library With The Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2022/hiring-with-disabilities/
Bruyère, S. M., Chang, H.-Y., & Saleh, M. C. (2020). Preliminary report summarizing the results of interviews and focus groups with employers, autistic individuals, service providers, and higher education career counselors on perceptions of barriers and facilitators for neurodiverse individuals in the job interview and customer interface processes (Phase 1 Final Report). https://ecommons.cornell.edu/server/api/core/bitstreams/24b341ac-fc9e-4d8d-8a5c-3739436508ed/content
Cunningham, S., Guss, S., & Stout, J. (2019). Challenging the ‘Good Fit’ Narrative: Creating Inclusive Recruitment Practices in Academic Libraries. Recasting the Narrative: The Proceedings of the ACRL 2019 Conference. ACRL 2019, Cleveland, OH. https://alair.ala.org/server/api/core/bitstreams/e155955e-c706-4198-aabd-84aaf1c496ba/content
Czech, H. (2018). Hans Asperger, national socialism, and “race hygiene” in Nazi-era Vienna. Molecular Autism, 9(1), 29. https://doi.org/10.1186/s13229-018-0208-6
Dali, K. (2019). Avoiding a senseless endurance test: Hidden disabilities and interviewing in LIS. The International Journal of Information, Diversity, & Inclusion, 3(1), 1–12. https://doi.org/10.33137/ijidi.v3i1.32265
Doyle, B. (2025, February 18). Neurodiversity and libraries: Terms, tools, and strategies for creating more inclusive experiences for staff and patrons. WebJunction. https://www.webjunction.org/news/webjunction/neurodiversity-libraries.html
Eisenmenger, A. (2019, December 12). Ableism 101 – What is Ableism? What Does it Look Like? Access Living. https://www.accessliving.org/newsroom/blog/ableism-101/
Eng, A. (2017). Neurodiversity in the library: One librarian’s experience. In The Library With The Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2017/neurodiversity-in-the-library/
Enna. (2023, July 14). Unleashing potential: Great jobs for autistic adults. Enna. https://enna.org/unleashing-potential-great-jobs-for-autistic-adults/
Giannantonio, C. M., & Hurley-Hanson, A. E. (2022). Recruitment strategies: Generating a neurodiverse workforce. In S. M. Bruyère & A. Colella (Eds.), Neurodiversity in the Workplace: Interests, Issues, and Opportunities (1st ed., pp. 60–97). Routledge. https://doi.org/10.4324/9781003023616-3
Giles-Smith, L., & Popowich, E. (2023). Autistic employees in Canadian academic libraries: barriers, opportunities, and ways forward. Canadian Journal of Academic Librarianship / Revue Canadienne de Bibliothéconomie Universitaire, 9, 1–22. https://doi.org/10.33137/cjal-rcbu.v9.39994
Grandin, T. (1999, November). Choosing the right job for people with autism or Asperger’s syndrome. Indiana Resource Center for Autism. https://iidc.indiana.edu/irca/articles/choosing-the-right-job-for-people-with-autism-or-aspergers-syndrome.html
Haire, A. (2025). What does it mean to be information literate for an autistic librarian in the academic library workplace? Journal of Information Literacy, 19(1), 27–47. https://doi.org/10.11645/19.1.723
Hendricks, D. (2010). Employment and adults with autism spectrum disorders: Challenges and strategies for success. Journal of Vocational Rehabilitation, 32(2), 125–134. https://doi.org/10.3233/JVR-2010-0502
Houk, K., & Nielsen, J. (2023). Exploring academic librarian candidates’ experiences on the job market. Forging the Future Conference Proceedings, 153–168.
Hyde, R., Albert, A., Emery, J., & Fancher, S. (2024). Inclusive Interviewing: Leveraging the Virtual Format to Demonstrate Care for Future Colleagues. University Libraries Publications. https://openscholarship.wustl.edu/lib_papers/36
Khan, M. H., Grabarski, M. K., Ali, M., & Buckmaster, S. (2022). Insights into creating and managing an inclusive neurodiverse workplace for positive outcomes: A multistaged theoretical framework. Group & Organization Management, 1339–1386. https://doi.org/10.1177/10596011221133583
Lau, T. C. W. (2022). “Undisciplined cognators”: Invisible disability and neurodiversity on the academic job market. In C. McGunnigle (Ed.), Disability and the Academic Job Market (pp. 53–71). Vernon Press.
Lawrence, E. (2013). Loud hands in the library. Progressive Librarian, 41, 98–109.
Lewis, T. A. (2022, January 1). Working Definition of Ableism—January 2022 Update. Talila A.
Lindsay, S., & Fuentes, K. (2022). It is time to address ableism in academia: A systematic review of the experiences and impact of ableism among faculty and staff. Disabilities, 2(2), Article 2. https://doi.org/10.3390/disabilities2020014
Maddock, K. (2022). To what extent are autistic library staff in the UK supported in their workplace? [Master’s dissertation, University of Southampton]. OpenEd@UCL. https://open-education-repository.ucl.ac.uk/id/eprint/902
Maras, K., Norris, J. E., Nicholson, J., Heasman, B., Remington, A., & Crane, L. (2021). Ameliorating the disadvantage for autistic job seekers: An initial evaluation of adapted employment interview questions. Autism, 25(4), 1060–1075. https://doi.org/10.1177/1362361320981319
Moeller, C. (2025). “Begging to be heard”: The professional exclusion and marginalization of neurodivergent librarians [Dissertation, University of Washington]. https://hdl.handle.net/1773/53006
NeuroDivergent Rebel. (2024, December 11). What is Neuro-Inclusion? [Substack newsletter]. NeuroDivergent Rebel’s Substack. https://neurodivergentrebel.substack.com/p/what-is-neuro-inclusion
Oud, J. (2019). Systemic workplace barriers for academic librarians with disabilities. College & Research Libraries, 80(2), 169–194. https://doi.org/10.5860/crl.80.2.169
Russo, E., Ott, D. L., & Moeller, M. (2022). Is there a place for neurodiversity in the talent pool? In M. Latukha (Ed.), Diversity in Action (pp. 265–285). Emerald Publishing Limited. https://doi.org/10.1108/978-1-80117-226-420221014
Saleh, M. C., Chang, H.-Y., Bruyère, S. M., & Vogus, T. J. (2022). Neurodiverse applicant screening, interviewing, and selection. In S. M. Bruyère & A. Colella (Eds.), Neurodiversity in the Workplace: Interests, Issues, and Opportunities (1st ed., pp. 98–123). Routledge. https://doi.org/10.4324/9781003023616-4
Swick-Jemison, J. (2023). ADHD and the early career teaching librarian: An autoethnography. Canadian Journal of Academic Librarianship / Revue Canadienne de Bibliothéconomie Universitaire, 9, 1–16. https://doi.org/10.33137/cjal-rcbu.v9.40953
Tomczak, M. T., Szulc, J. M., & Szczerska, M. (2021). Inclusive communication model supporting the employment cycle of individuals with autism spectrum disorders. International Journal of Environmental Research and Public Health, 18(9), Article 9. https://doi.org/10.3390/ijerph18094696
Walker, N. (2021, August 19). Neurodiversity: Some basic terms & definitions [Blog]. Neuroqueer. https://neuroqueer.com/neurodiversity-terms-and-definitions/
Whelpley, C. E., & May, C. P. (2023). Seeing is disliking: Evidence of bias against individuals with autism spectrum disorder in traditional job interviews. Journal of Autism and Developmental Disorders, 53, 1363–1374. https://doi.org/10.1007/s10803-022-05432-2
Zauderer, S. (2023, July 28). 30 great jobs for people on the autism spectrum. Cross River Therapy. https://www.crossrivertherapy.com/autism/best-jobs
Appendix A
Survey Questions
Interview & Hiring Practices
The following hiring procedures, which diverge from typical practices, have been identified in the HR literature as best practices for hiring neurodiverse individuals.
Which of the following are you aware of your workplace engaging in under any circumstance, whether one-off or regularly?
Job Ad/Description (Please select all that apply)
Writing job descriptions to be as short and skills-focused as possible
Writing job ads as clearly as possible (e.g. “able to regularly provide research assistance on the desk” instead of “being service oriented”)
Using direct and inclusive language about hiring neurodiverse individuals in job ads
Utilizing autism hiring networks or disability-focused hiring networks to post jobs and recruit candidates
Don’t know / Unsure
Interview Practices: Format of Interview Day (Please select all that apply)
Creating a deliberately quiet, unrushed environment during the interview day (i.e. breaks between interview sessions)
Providing information about how to request adjustments/accommodations to the interview day
Allowing interviewees to bring support workers/companions (i.e. mentors or job coaches) for the interview day
Allowing a camera-off or phones-only interview setting for remote interviews
Don’t know / Unsure
Interview Practices: Interview Procedures (Please select all that apply)
Providing information about the interview’s format prior to the interview (i.e. the number of interviewers and questions, whether there will be questions only or practice/test scenarios, etc.)
Introducing the interview room prior to the interview itself
Allowing for breaks within interviews themselves
Providing alternatives to the standard interview (e.g. skills demonstration, prerecorded video answers, portfolios of work)
Keeping the number of interviewers to one or two people maximum per session
Providing interview questions in advance
Shortening interview sessions to 20-30 minutes (Interview sessions being those in which the candidate is interviewed, e.g. with search committee, library director, etc.)
Limiting number of interview sessions to those with essential stakeholders
Including guidance in interview questions to prompt for more specific information and self-reflections (e.g. What is your approach to answering a research question?)
Don’t know / Unsure
Interview Practices: Presentation Procedures (Please select all that apply.)
Introducing the presentation room prior to the presentation
Providing alternatives to the standard job talk (e.g. technical skills demonstration, prerecorded job talk)
Don’t know / Unsure
Interview Practices: General HR Procedures (Please select all that apply.)
Regularly updating and/or modifying interview processes to mirror best practices so that evaluation is based on a candidate’s qualifications for job, rather than subjective judgements about social skills or “soft” skills
Providing training on neurodiversity and how to support it in hiring to HR professionals and hiring managers prior to conducting interviews
Don’t know / Unsure
Additional Questions
Please use this space if you would like to provide any commentary or context about any of the procedures your workplace utilizes.
Are the accommodations you provide informed by university-wide policies for hiring neurodiverse individuals?
If you accommodate other neurodiverse conditions that are not autism or ADHD (e.g. dyslexia, bipolar disorder, etc.) in your hiring process, please explain or expand upon that.
Please describe any other accommodations you provide not already addressed in the survey.
Appendix B
Inclusive Hiring Practices
While best practices for diversifying the workforce by empowering neurodivergent applicants have not been formally codified in the academic library literature, we can draw from the broader field of human resources to better understand how to make hiring processes more inclusive and accessible.
Revising Job Descriptions
State only the necessary skills the person needs and the specific tasks that the position will perform. Include phrasing that signals your organization is looking for neurodivergent candidates, and encourage individuals to apply even if they do not have some or all of the preferred qualifications (Bruyère et al., 2020; Giannantonio & Hurley-Hanson, 2022; Saleh et al., 2022; Tomczak et al., 2021).
Revisiting Interview Practices
In revisiting interview practices, it is important to look at the interview questions, interview environment, and interview schedule. Hiring committees should provide interview questions to candidates in advance; studies show neurodivergent individuals are more successful when given the opportunity to provide thought-out responses. This also helps with responding to questions that force recall or are “personality questions” (e.g., tell us about yourself). Create detailed, clearly formatted, and unambiguous interview questions. As much as possible, questions should avoid being generic or open-ended (e.g., tell us about a time you encountered conflict). This prevents candidates from needing to mask and alleviates stress and anxiety over needing to interpret what type of response interviewers are looking for (Bruyère et al., 2020; Giannantonio & Hurley-Hanson, 2022; Houk & Nielsen, 2023; Hyde et al., 2024; Maras et al., 2021; Saleh et al., 2022; Tomczak et al., 2021). Some studies have found that structured interview questions (in which multi-part questions are broken out and asked in sequence) lead to more successful interview responses, while others have found less-structured questions are better (Bruyère et al., 2020; Maras et al., 2021; Tomczak et al., 2021). While there is debate about structure, there is agreement that avoiding cohort interviews or interviews with large panels of interviewers is important, as these create a stressful environment for candidates (Bruyère et al., 2020; Saleh et al., 2022).
In addition to examining interview questions, review the schedule of the day and information provided to the candidate. Provide an interview schedule that includes start and end times, a list of the events throughout the day, information about the environment they will be in, and dress expectations. When bringing candidates into new environments (e.g. the presentation space), ask them if they need any environmental changes to the space, and if any scheduled environmental changes are likely (e.g. construction), warn the candidate in advance (Bruyère et al., 2020; Giannantonio & Hurley-Hanson, 2022; Hendricks, 2010; Saleh et al., 2022).
As the interview day is scheduled, include break times to allow the candidate to get a snack and/or water and to take bathroom breaks. Plan to offer candidates a space to store their belongings, and when touring, offer elevators and walk slowly. Other accommodations that can be offered by default include providing a seated option for presentations, using a microphone for questions from the audience, limiting the interview schedule to only the essential events, and leaving meals for eating, not interviewing (Dali, 2019). The goal is not to require people to disclose a disability before receiving accommodations, but to provide accommodations as part of your normal interview processes (Anderson, 2021; Dali, 2019; Giannantonio & Hurley-Hanson, 2022; Khan et al., 2022; Lau, 2022; Saleh et al., 2022).
Training on Neurodiversity
Training on inclusive hiring, neurodivergence, and disability is essential for hiring committees, supervisors, and administrators as part of redesigning the recruitment and hiring process. Unconscious bias, stereotypes about disabilities and accommodations, and a lack of understanding about neurodivergence result in hiring selections that favor those who match expectations of “fit” and “proper” social presentation (Bruyère et al., 2020; Hendricks, 2010; Khan et al., 2022; Oud, 2019; Saleh et al., 2022; Tomczak et al., 2021). This includes hostile views toward neurodivergence, for example assuming that all accommodation-seekers are looking for special treatment (Oud, 2019) or refusing to allow a candidate to progress to a new stage of the hiring process because they seemed to have “unlikable” body language (Giannantonio & Hurley-Hanson, 2022; Lau, 2022; Saleh et al., 2022; Whelpley & May, 2023).
Yesterday I read an article that I want to celebrate here: "From Notes to Networks: Using Obsidian to Teach Metadata and Linked Data" by Kara Long and Erin Yunes, code4lib Journal, Issue 61, 2025-10-21.
More than three years ago, based on Paul Butler's The problem with bitcoin miners, I wrote Generally Accepted Accounting Principles. The TL;DR was that the economic life of Bitcoin mining rigs was estimated at 16 months, as Moore's law in a competitive ASIC market rapidly generated more power-efficient rigs. But the Bitcoin miners' accounts were using 5-year straight-line depreciation for their rigs, which was significantly increasing their nominal profits.
Below the fold I look at the same problem unfolding in the heart of the AI bubble.
Six months after we started Nvidia we knew of over thirty other startups trying to build graphics chips for the PC. A major reason that only ATI survived was that, after the second chip, Nvidia released a better chip every six months like clockwork. I explained the technology that made, and still makes, this possible in The Dawn Of Nvidia's Technology, but in the long term equally important was that the idea of a fixed cadence got built into the company's culture, and this now applies not just to graphics chips but also to AI chips:
Last year Nvidia, ..., said it would unveil a fresh AI chip every year rather than every couple of years. In March its boss, Jensen Huang, remarked that “when Blackwell starts shipping in volume, you couldn’t give Hoppers away,” referring to Nvidia’s latest chips and their predecessors, respectively.
The systems are estimated to be more than half the capex for a new data center, and much of a data center's opex is power. Just as with mining rigs, the key feature of each successive generation of AI chips is that it is more efficient at using power. That doesn't mean they use less power; they use more, but less per operation. The need for enhanced power distribution and the concomitant cooling is what has prevented new AI systems from being installed in legacy data centers. Presumably the next few generations will be compatible with current state-of-the-art data center infrastructure, so they can directly replace their predecessors and thereby reduce costs.
It is often assumed that the majority of the opex of AI platforms goes into training, which is a once per generation cost that can be amortized over the huge number of inference operations needed to answer prompts. But as James O'Donnell and Casey Crownhart report in We did the math on AI’s energy footprint. Here’s the story you haven’t heard.:
As conversations with experts and AI companies made clear, inference, not training, represents an increasing majority of AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for inference.
So 80-90% of the opex is per-query, which makes improving system efficiency critical to reducing the torrent of cash these platforms are hemorrhaging. Ed Zitron estimated that:
To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.
AI revenues and expenditures are likewise a 13-figure affair nowadays. Worldwide spending on AI hardware and software nudged $1trn last year, according to Gartner, a research firm. This is likely to double to $2trn in 2026. Between 2024 and 2026 the five listed AI powerhouses will have splurged over $1trn on capital investments, chiefly AI data centres. A slug will end up with Nvidia and Broadcom, which furnish them and others with AI semiconductors. The duo (combined market capitalisation: $6trn) are together forecast to book almost $1trn in sales over that period.
The platforms haven't been depreciating these assets at realistic rates:
biggest customers have in recent times been raising their servers’ lifetimes, reducing depreciation charges in their accounts. Microsoft pushed it up from four to six years in 2022. Alphabet did the same in 2023. Amazon and Oracle changed it from five to six in 2024. And in January Meta moved from five to five and a half years.
Amazon reversed course and moved back to five years for some kit, noting this would cut operating profit in 2025 by $700m, or about 1%, owing to a higher depreciation expense. Given the rapid advances in chipmaking, that seems optimistic. And Amazon’s AI rivals clinging to their elongated depreciation schedules look Pollyannaish. In July Jim Chanos, a veteran short-seller, posted that if the true economic lifespan of Meta’s AI chips is two to three years, then “most of its ‘profits’ are materially overstated.” A recent analysis of Alphabet, Amazon and Meta by Barclays, a bank, estimated that higher depreciation costs would shave 5-10% from their earnings per share.
At the five companies’ current ratio of market capitalisation to pre-tax profit, this would amount to a $780bn knock to their combined value. Redo the sums depreciating the servers over two years instead of three and the size of the hit rises to $1.6trn. Take Mr Huang literally, and you get a staggering $4trn, equivalent to one-third of their collective worth.
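To see the mechanics behind those estimates, here is a minimal sketch (in Python; the fleet cost is a made-up round number, not any company's actual figure) of how the annual straight-line charge on the same hardware moves as the assumed useful life changes:

```python
# Illustrative sketch only: the annual depreciation charge on the same server
# fleet under different assumed useful lives. The fleet cost is hypothetical.
FLEET_COST = 100e9   # $100bn of servers, an assumed round number

for life_years in (6, 5, 4, 3, 2):
    annual_charge = FLEET_COST / life_years   # straight-line, no salvage value
    print(f"{life_years}-year life: ${annual_charge / 1e9:5.1f}bn charged per year")
```

Moving from a six-year to a two- or three-year life roughly doubles or triples the annual charge, which is the direction of the Chanos and Barclays adjustments quoted above.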
A big chunk of the AI platforms' investment in the hardware is financed with debt. If the lenders are using a five-year life when valuing the hardware as collateral, they will run into trouble too. It is rumored that Macquarie was lending against GPUs using a seven-year life.
But as Paul Butler pointed out, straight-line depreciation over-estimates the value of older but not yet uneconomic assets. I described Butler's graph:
Butler uses historical hash rate data to compute the actual depreciation curves for mining hardware, plotting the percentage of the initial bitcoin production rate against time in quarters for each quarter since Bitcoin's inception. The graph shows that initially (bluest lines), when GPUs were introduced, they stopped producing after about 5 quarters. Recently (magenta-est lines), ASICs last longer, stopping producing after about 4 years. But for the whole of Bitcoin's existence the hardware has depreciated far faster than GAAP's five-year straight line.
He assumes that the ASICs are obsolete when they can no longer keep up with the hash rate and so are no longer mining any Bitcoin. That is wrong. ASICs are obsolete when the Bitcoin they mine no longer pays for the electricity they use. The newer ASICs aren't just faster, they also use much less energy per hash. Look again at the depreciation graph, which suggests current ASICs go obsolete after 16 quarters. But Alex de Vries and Christian Stoll's estimate of 5 quarters to obsolescence is based on comparing the ASICs' production with the cost of their power consumption, which is the correct approach. The curves in the graph are correct out to the 40% line, but then should drop to zero.
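To make the obsolescence test concrete, here is a small sketch with entirely invented numbers — rig specs, network hash rate, block rewards, electricity price and Bitcoin price are all assumptions, not market data — showing how a fixed rig crosses from economic to obsolete as the network hash rate climbs:

```python
# Illustrative sketch of the obsolescence test described above: an ASIC is
# economically dead when the Bitcoin it mines no longer pays for its power.
# Every number below is an assumption for illustration, not real market data.
def daily_mining_revenue(hashrate_th, network_hashrate_eh, btc_per_day, btc_price):
    """The rig's share of the network's daily block rewards, in USD."""
    share = hashrate_th / (network_hashrate_eh * 1e6)   # TH/s as a fraction of EH/s
    return share * btc_per_day * btc_price

def daily_power_cost(power_w, usd_per_kwh):
    return power_w / 1000 * 24 * usd_per_kwh

rig = dict(hashrate_th=100, power_w=3000)        # a hypothetical older ASIC
for network_eh in (300, 600, 900, 1200):         # rising network hash rate
    revenue = daily_mining_revenue(rig["hashrate_th"], network_eh,
                                   btc_per_day=450, btc_price=60_000)
    cost = daily_power_cost(rig["power_w"], usd_per_kwh=0.05)
    status = "still economic" if revenue > cost else "obsolete"
    print(f"network {network_eh:4d} EH/s: revenue ${revenue:5.2f}/day, "
          f"power ${cost:4.2f}/day -> {status}")
```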
Similarly, at some point in the future when the AI platforms realize they need a return on their investments, running systems answering queries that earn less than the cost of the power will no longer make sense and their value will drop precipitously.
If hardware is being used as collateral for a loan the value should represent what it would fetch on the market. Assume Nvidia is on a 2-year cadence. Customers don't get their hardware instantly, so assume that the borrower got theirs 6 months into the 2 years. They default after 1 year, so the bank is selling hardware with 6 months left before the next generation starts shipping, and 1 year before a typical customer can get hardware that renders the current hardware almost uneconomic. The lender's 5-year straight-line estimate of value would be 80% of the purchase price. A buyer would likely estimate 20% of the purchase price.
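The arithmetic in that scenario is simple enough to spell out. In the sketch below, the 5-year book life is the lender's assumption and the 20% resale figure is the guess made above, not observed market data:

```python
# Sketch of the collateral example above: a 2-year product cadence, hardware
# delivered 6 months into the cadence, default 1 year later. The 20% "market"
# residual is the guess from the text, not an observed price.
PURCHASE_PRICE = 1.0            # normalised to the purchase price
BOOK_LIFE_YEARS = 5             # lender's straight-line assumption
YEARS_HELD_AT_DEFAULT = 1.0

book_value = PURCHASE_PRICE * (1 - YEARS_HELD_AT_DEFAULT / BOOK_LIFE_YEARS)
market_value = 0.20 * PURCHASE_PRICE    # buyer's likely estimate, per the text

print(f"Lender's book value at default: {book_value:.0%} of purchase price")
print(f"Likely resale value at default: {market_value:.0%} of purchase price")
print(f"Collateral shortfall:           {book_value - market_value:.0%}")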
This blog has often cited the excellent work of Arvind Narayanan at Princeton. His colleague Mihir Kshirsagar posted an equally excellent piece on the implications of inadequate depreciation for competition entitled Lifespan of AI Chips: The $300 Billion Question:
What I have found so far is surprising. It appears that we’re making important decisions about who gets to compete in AI based on financial assumptions that may be systematically overstating the long-run sustainability of the industry by a factor of two.
Kshirsagar has identified inadequate depreciation as a key part of what I have called "the drug-dealer's algorithm" (the first one's free):
Incumbent coalitions—hyperscalers (Microsoft, Amazon, Google) partnered with their model developers (OpenAI, Anthropic)—can effectively subsidize application-layer pricing during the critical years when customer relationships are being formed. This could create advantages that prove insurmountable even when better technology becomes available.
The core issue isn’t whether chips wear out faster than accounting suggests. It’s whether the market structure being formed is fueled by accounting conventions that obscure true long-run economics, allowing incumbent coalitions to establish customer lock-in at the application layer before true costs become visible. In other words, the accounting subsidy creates a window of roughly three to six years where reported costs are artificially low. After that, the overlapping depreciation schedules catch up to operational reality. Here’s why this timing matters: that three-to-six-year window is precisely when the market structure for AI applications is being determined. Customer relationships are being formed. Enterprise integrations are being built. Multi-year contracts are being signed. Switching costs are accumulating. By the time the accounting catches up—when companies face the full weight of replacement costs hitting their income statements—the incumbent coalitions will have already locked in their customer base. The temporary subsidy enables permanent competitive advantage.
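A toy model makes that timing concrete. Assume, purely for illustration, a buyer that replaces its fleet every two years (the true economic life) while booking six-year straight-line depreciation; the reported charge understates the replacement-cost reality for the first several years and only converges once enough purchases overlap:

```python
# Toy model of the "accounting subsidy window": fleet replaced every TRUE_LIFE
# years but depreciated straight-line over BOOK_LIFE years. Numbers are
# arbitrary illustration, not any company's figures.
CAPEX = 60.0         # cost of each fleet purchase (arbitrary units)
TRUE_LIFE = 2        # years between purchases, the assumed real economic life
BOOK_LIFE = 6        # straight-line life used in the accounts

purchases = range(0, 12, TRUE_LIFE)      # a new purchase at the start of years 0, 2, 4, ...
for year in range(10):
    # sum the straight-line charge from every purchase still on the books
    reported = sum(CAPEX / BOOK_LIFE for bought in purchases
                   if bought <= year < bought + BOOK_LIFE)
    true_cost = CAPEX / TRUE_LIFE        # steady-state replacement cost per year
    print(f"year {year}: reported depreciation {reported:5.1f} "
          f"vs replacement-cost reality {true_cost:5.1f}")
```

The reported charge only catches up with the true replacement cost around years four to six, which is the window during which, as Kshirsagar argues, the market structure gets locked in.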
Kshirsagar goes on to examine the effects of the drug-dealer's algorithm in considerable detail. I encourage you to read the whole post, although I have a few quibbles:
He focuses on enterprise use of AI as the platforms' business model, and seems to assume that enterprises will find that using AI generates enough value to cover the subsidized cost through the lock-in period. Given the negative productivity impact most enterprises currently report, this is not a safe assumption.
Even if management's infatuation with AI, and the prospect of firing all their annoying employees, results in their being locked into the AI platform of their choice, they will need to achieve massive productivity gains to be able to afford what the platforms will have to charge to raise the $2T/year in revenue they will need.
The AI platforms have already figured out that enterprise use isn't likely to generate enough revenue and are pivoting to advertising, affiliate marketing and porn.
researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute released a preprint research paper suggesting that large language models like the ones that power ChatGPT, Gemini, and Claude can develop backdoor vulnerabilities from as few as 250 corrupted documents inserted into their training data.
Mayraz’s question was: can we send a pull request — a suggested code fix — that contains a prompt injection? And make the bot spill sensitive user data, like private code or AWS login keys? Yes, we can!
...
Mayraz’s exploit was a rediscovery of an exploit GitHub had already been told about. User 49016 discovered the bug a few months before and reported it on HackerOne. GitHub acknowledged the report and called it a “low risk” issue — “yes, they seriously consider a LLM leaking your private repo contents as a ‘low risk issue’.” (Mayraz’s CVE filing was rated a 9.6.) 49016 reports that GitHub’s fix is rubbish, and “bypassing it took 5 minutes lol.”
And there is the risk that AI platforms will become obsolete with "good enough" AI like DeepSeek and "good enough" local compute such as Nvidia's DGX Spark:
a $4,000 desktop AI computer that wraps one petaflop of computing performance and 128GB of unified memory into a form factor small enough to sit on a desk.
...
Nvidia’s Spark reportedly includes enough memory to run larger-than-typical AI models for local tasks, with up to 200 billion parameters, and to fine-tune models containing up to 70 billion parameters without requiring remote infrastructure. Potential uses include running larger open-weights language models and media synthesis models such as AI image generators.
The system is the tiny gold box in the image. It is based on Blackwell.
One odd thing about AI equipment is that it’s very expensive to buy and very cheap to rent.
Want an Nvidia B200 GPU accelerator? Buying one on its release in late 2024 would’ve probably cost around $50,000, which is before all the costs associated with plugging it in and switching it on. Yet by early 2025, the same hardware could be rented for around $3.20 an hour. By last month, the B200’s floor price had fallen to $2.80 per hour.
That is weird. If the economic life of a B200 is 2 years, at $2.80/hour it would earn only about $49K, roughly its purchase price. But Elder explains that it isn't that simple:
What might be less obvious is that among the hyperscalers — Amazon’s AWS, Microsoft’s Azure, Google and Oracle — prices have hardly budged. The result is an ever-widening gap between rates charged by the big-four and a growing number of smaller rivals.
Here’s the same RBC data organised to show relative pricing across all GPUs. The iceberg effect shows how much it is new entrants driving average rates lower. Meanwhile, for hyperscaler customers, this month’s bill per GPU will almost always be the same as last month’s bill.
A lot of pandemic-era Nvidia GPUs will be heading towards Cash Converters having never washed their face.
The customers attracted by low AI compute costs have yet to show much ability, willingness or inclination to pay more.
The hyperscalers don’t believe these customers are worth competing for, so have chosen to wait for the discount end of the market to die of insolvency.
The inevitable data-centre shakeout will kill lots of AI start-ups that can’t afford to pay what compute actually costs.
We might be overestimating the size of the GPU market if the middle-ground — meaning regular companies that want OpenAI and Anthropic to make their chatbots, summarisation tools and similar AI widgets — turns out to be worth less than $3tn.
Those are all good points.
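For what it's worth, the break-even arithmetic behind that $49K figure is easy to reproduce. In this sketch the purchase price and hourly rates are the figures quoted above, while the two-year life and 100% utilisation are assumptions:

```python
# Sketch of the rental break-even arithmetic quoted above. The purchase price
# and hourly rates are the quoted figures; economic life and utilisation are
# assumptions for illustration.
HOURS_PER_YEAR = 24 * 365

def lifetime_rental_income(rate_per_hour, life_years, utilisation=1.0):
    return rate_per_hour * HOURS_PER_YEAR * life_years * utilisation

purchase_price = 50_000                  # quoted B200 price at launch
for rate in (3.20, 2.80):                # early-2025 rate and the later floor price
    income = lifetime_rental_income(rate, life_years=2)
    print(f"${rate:.2f}/hr over 2 years at 100% utilisation: "
          f"${income:,.0f} vs ${purchase_price:,} purchase price")
```

At anything below full utilisation, or once power, networking and facility costs are added, the discounted rates can't recover the capital cost, which is consistent with the expectation above that the discount end of the market dies of insolvency.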
Update
I need to update today's post to cover two important posts I missed:
Together, they greatly amplify the message that when the AI bubble bursts, the damage will not be confined to the AI platforms, or even to technology stocks, but will also hit the banks and private capital lenders.
Will Lockett
Lockett starts from an analysis by Julien Garran of MacroStrategy Partnership, pointing to the mismatch between the enormous investments the AI platforms are making and the lack of any sustainable business model to generate the $2T/year in new revenue needed to provide a return.
So far, Garran's analysis is hardly original, but:
Garran used the economic analysis pioneered by economist Knut Wicksell to establish the size of this investing discrepancy, the 2008 bubble, and the dot-com bubble, and that is how he learned that the AI bubble is currently four times the size of the 2008 bubble at its peak!
It seems AI companies have been growing through equity financing (selling shares of the company) rather than debt financing (borrowing money). Furthermore, investment seems to be incestuous to the tech industry; for example, OpenAI’s major investors are Microsoft and Nvidia. As such, many believed that the AI bubble is actually relatively isolated from the rest of the economy and therefore might not have as significant an impact on the wider economy when it bursts.
It turns out that a large slice of the bubble spending isn't equity:
Dario Perkins, managing director of global macro at TS Lombard, has found that many AI companies are increasingly using SPVs to raise significant amounts of debt financing off the books. This covers their tracks and obfuscates the debt, making it “look” like the company is running on equity finance instead. Due to this, it is incredibly difficult to get an accurate figure on how much of the AI industry’s expenditure and growth comes from debt — but we know it is a lot!
Goldman Sachs has found that at least $141 billion of the $500 billion in capital expenses the AI industry has spent so far this year came from debt directly tied to the main corporate body through corporate credit issuances. To give you an idea of how insane that is, the entire AI industry capital expenditure in 2024 was $127 billion. In other words, the AI industry has taken on significantly more debt so far this year than it ever spent in total last year. This also means that we know that about 30% of the AI industry’s annual expenditure for this year came from “on the books” debt.
We know that Meta is looking to raise $26 billion in debt through an SPV by the end of the year. This one deal equates to 5% of the total AI industry’s capital expenditure for this year.
You have to wonder what collateral is backing this "at least" $167B in debt, and how it is being depreciated.
Ed Zitron
Some of the AI platforms have non-AI free cash flows that can be diverted to paying the interest on these debts, or that could be if they hadn't already been shoveled into the AI investment cash furnace. But OpenAI, Anthropic and the other pure-play AI platforms don't. In order to pay the interest without needing to raise more equity, they need to be cash flow positive. Zitron has always been skeptical that they can be, for example writing:
To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.
Now, Zitron has strong evidence that he was right:
Based on discussions with sources with direct knowledge of their AWS billing, I am able to disclose the amounts that AI firms are spending, specifically Anthropic and AI coding company Cursor, its largest customer.
I can exclusively reveal today Anthropic’s spending on Amazon Web Services for the entirety of 2024, and for every month in 2025 up until September, and that Anthropic’s spend on compute far exceeds that previously reported.
Anthropic has spent more than 100% of its estimated revenue (based on reporting in the last year) on Amazon Web Services, spending $2.66 billion on compute on an estimated $2.55 billion in revenue.
Additionally, Cursor’s Amazon Web Services bills more than doubled from $6.2 million in May 2025 to $12.6 million in June 2025, exacerbating a cash crunch that began when Anthropic introduced Priority Service Tiers, an aggressive rent-seeking measure that began what I call the Subprime AI Crisis, where model providers begin jacking up the prices on their previously subsidized rates.
Although Cursor obtains the majority of its compute from Anthropic — with AWS contributing a relatively small amount, and likely also taking care of other parts of its business — the data seen reveals an overall direction of travel, where the costs of compute only keep on going up.
In February of this year, The Information reported that Anthropic burned $5.6 billion in 2024, and made somewhere between $400 million and $600 million in revenue.
...
I can confirm from a source with direct knowledge of billing that Anthropic spent $1.35 billion on Amazon Web Services in 2024, and has already spent $2.66 billion on Amazon Web Services through the end of September.
Assuming that Anthropic made $600 million in revenue, this means that Anthropic spent $6.2 billion in 2024, leaving $4.85 billion in costs unaccounted for.
Based on what I have been party to, the more successful Anthropic becomes, the more its services cost. The cost of inference is clearly increasing for customers, but based on its escalating monthly costs, the cost of inference appears to be high for Anthropic too, though it’s impossible to tell how much of its compute is based on training versus running inference.
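Putting the quoted numbers side by side (a simple reproduction of the arithmetic above, nothing more):

```python
# Reproducing the arithmetic in the figures quoted above.
revenue_2025_est = 2.55e9  # Anthropic's estimated revenue (per prior reporting)
aws_2025_to_sep = 2.66e9   # AWS spend through the end of September 2025
print(f"AWS spend vs. estimated revenue: {aws_2025_to_sep / revenue_2025_est:.0%}")  # >100%

burn_2024 = 5.6e9     # The Information: cash burned in 2024
revenue_2024 = 0.6e9  # upper end of the 2024 revenue estimate
aws_2024 = 1.35e9     # AWS spend in 2024
spend_2024 = burn_2024 + revenue_2024
print(f"Total 2024 spend: ${spend_2024 / 1e9:.1f}B")                              # $6.2B
print(f"2024 costs not explained by AWS: ${(spend_2024 - aws_2024) / 1e9:.2f}B")  # $4.85B
```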
While it is true that training these very large models is expensive, it is a one-time cost to be amortized against the huge number of inferences made against the trained model. Alas, the huge number is so huge that it dominates the one-time cost:
As conversations with experts and AI companies made clear, inference, not training, represents an increasing majority of AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for inference.
these costs seem to increase with the amount of money Anthropic makes, meaning that the current pricing of both subscriptions and API access seems unprofitable, and must increase dramatically — from my calculations, a 100% price increase might work, but good luck retaining every single customer and their customers too! — for this company to ever become sustainable.
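To see why the one-time training cost stops mattering at scale, here is a toy amortization model; all numbers are illustrative placeholders, not estimates for any real system:

```python
# Toy amortization model: cost per query = training_cost / total_queries + marginal_inference_cost.
# All figures are made-up placeholders, not estimates for any real model.
training_cost = 1e9              # hypothetical one-time training cost ($)
inference_cost_per_query = 0.01  # hypothetical marginal cost per query ($)

for total_queries in (1e10, 1e11, 1e12):
    amortized_training = training_cost / total_queries
    inference_share = inference_cost_per_query / (amortized_training + inference_cost_per_query)
    print(f"{total_queries:.0e} queries: inference is {inference_share:.0%} of the per-query cost")
# The share climbs from 9% to 50% to 91%: past the crossover point, the one-time training
# cost becomes a rounding error and inference dominates, consistent with the 80-90% figure above.
```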
The rise of aggressive, adaptive, and evasive web crawlers is a significant challenge for libraries and archives, causing service disruptions and overwhelming institutional resources. This article details the experiences of the University of North Carolina at Chapel Hill University Libraries in combating an unprecedented flood of crawler traffic. It describes the escalating mitigation efforts, from traditional client blocking to the implementation of more advanced techniques such as request throttling, regional traffic prioritization, novel facet-based bot detection, commercial Web Application Firewalls (WAFs), and ultimately, in-browser client verification with Cloudflare Turnstile. The article highlights the adaptive nature of these crawlers, the limitations of isolated institutional responses, and the critical lessons learned from mitigation efforts, including the issues introduced by residential proxy networks and the extreme scale of the traffic. Our experiences demonstrate the effectiveness of a multi-layered defense strategy that includes both commercial and library-specific solutions, such as facet-based bot detection. The article emphasizes the importance of community-wide collaboration, proposing future directions such as formalized knowledge sharing and the ongoing development of best practices to collectively address this evolving threat to open access and the stability of digital library services.
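The article does not reproduce its detection code, but the facet-based idea can be illustrated with a toy heuristic: automated crawlers tend to enumerate far more distinct facet combinations in catalog search URLs than any human user would. The sketch below is my own illustration under that assumption, not UNC's actual implementation, and the parameter names and threshold are placeholders:

```python
# Toy sketch of facet-based bot detection (an illustration, not UNC's implementation).
from collections import defaultdict
from urllib.parse import urlparse, parse_qsl

FACET_PARAMS = {"fq", "facet", "f"}  # assumed names of catalog facet query parameters
THRESHOLD = 200                      # distinct combinations before a client looks automated

combos_seen = defaultdict(set)

def looks_like_crawler(client_ip: str, request_url: str) -> bool:
    """Record one catalog request and report whether the client now looks automated."""
    params = parse_qsl(urlparse(request_url).query)
    combo = tuple(sorted((k, v) for k, v in params if k in FACET_PARAMS or k.startswith("f.")))
    if combo:
        combos_seen[client_ip].add(combo)
    return len(combos_seen[client_ip]) > THRESHOLD
```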
This paper presents an initiative to extract and repurpose instructional content from a series of Blackboard course shells associated with IMLS-funded boot camp events conducted in June of 2021, 2022, and 2023. These events, facilitated by ten faculty members and attended by 68 fellows, generated valuable educational materials currently confined within proprietary learning management system environments. The objective of this project is to enable broader access and reuse of these resources by migrating them to a non-siloed, static website independent of the original Blackboard infrastructure. We describe our methodology for acquiring and validating the data exports, outline the auditing procedures implemented to ensure content completeness and integrity, and discuss the challenges encountered throughout the process. Finally, we report on the current status of this ongoing effort and its implications for future dissemination and reuse of educational materials.
The Internet Archive was founded on May 10, 1996, in San Francisco, CA. Since its inception, the archive has amassed an enormous corpus of content, including over 866 billion web pages, more than 42.5 million print materials, 13 million videos, and 14 million audio files. It is relatively easy to upload content to the Internet Archive. It is also easy to download individual objects by visiting their pages and clicking on specific links. However, downloading a large collection, such as thousands or even tens of thousands of items, is not as easy. This article outlines how The University of Kentucky Libraries downloaded over 86 thousand previously uploaded newspaper issues from the Internet Archive for local use. The process leveraged ChatGPT to automate the generation of Python scripts that accessed the Internet Archive via its API (Application Programming Interface).
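The article's ChatGPT-generated scripts are not reproduced here, but a minimal bulk download along these lines might use the `internetarchive` Python package; the collection identifier below is a placeholder, not the actual collection:

```python
# Minimal sketch of bulk-downloading a collection from the Internet Archive.
# Requires: pip install internetarchive
# "example-newspaper-collection" is a placeholder identifier, not the real collection.
from internetarchive import search_items, download

for result in search_items("collection:example-newspaper-collection"):
    identifier = result["identifier"]
    # Download only the PDFs for each issue into ./downloads/<identifier>/
    download(identifier, glob_pattern="*.pdf", destdir="downloads", retries=3)
```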
Large Language Models (LLMs) are reshaping digital preservation and access in libraries, but their limitations (hallucinations, opacity, and resource demands) remain significant.
Retrieval-Augmented Generation (RAG) offers a promising mitigation strategy by grounding LLM outputs in specific digital collections. This article compares the performance of WARC-GPT’s default RAG implementation with unfiltered WARC files from Archive-It against a custom-built RAG solution utilizing optimization strategies in both modelling and data (WARC) preprocessing. Tested on a collection of thousands of archived pages from the Bob’s Burgers fan wiki, the study analyzes trade-offs in preprocessing, embedding strategies, retrieval accuracy, and system responsiveness. Findings suggest that while WARC-GPT lowers barriers to experimentation, custom RAG pipelines offer substantial improvements for institutions with the technical capacity to implement them, especially in terms of data quality, efficiency, and trustworthiness.
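The article's optimized pipeline is not detailed here, but WARC preprocessing of the kind it compares typically starts with something like the following sketch, which assumes the `warcio` and `beautifulsoup4` packages; the chunk size and filename are placeholders:

```python
# Sketch: extract and chunk text from a WARC file ahead of embedding for RAG.
# Requires: pip install warcio beautifulsoup4
from warcio.archiveiterator import ArchiveIterator
from bs4 import BeautifulSoup

def warc_to_chunks(path: str, chunk_words: int = 300):
    """Yield (url, text_chunk) pairs from HTML response records in a WARC file."""
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response" or record.http_headers is None:
                continue
            if not record.http_headers.get_header("Content-Type", "").startswith("text/html"):
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            text = BeautifulSoup(record.content_stream().read(), "html.parser").get_text(" ", strip=True)
            words = text.split()
            for i in range(0, len(words), chunk_words):
                yield url, " ".join(words[i:i + chunk_words])

# Each chunk would then be embedded and stored in a vector index for retrieval.
```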
Since 2007, Digital Humanities Quarterly has published over 750 scholarly articles, constituting a significant repository of scholarship within the digital humanities. As the journal’s corpus of articles continues to grow, it is no longer possible for readers to manually navigate the title and abstract of every article in order to stay apprised of relevant work or conduct literature reviews. To address this, we have implemented a recommender system for the Digital Humanities Quarterly corpus, generating recommendations of related articles that appear below each article on the journal’s website with the goal of improving discoverability. These recommendations are generated via three different methods: a keyword-based approach based on a controlled vocabulary of topics assigned to articles by editors; a TF-IDF approach applied to full article text; and a deep learning approach using the Allen Institute for Artificial Intelligence’s SPECTER2 model applied to article titles and abstracts. In this article, we detail our process of creating this recommender system, from the article pre-processing pipeline to the front-end implementation of the recommendations on the Digital Humanities Quarterly website [1]. All of the code for our recommender system is publicly available in the Digital Humanities Quarterly GitHub repository [2].
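The journal's actual pipeline lives in its GitHub repository; the TF-IDF arm of the approach reduces to something like this scikit-learn sketch, where the toy corpus stands in for full article text:

```python
# Sketch of the TF-IDF + cosine-similarity recommendation approach.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "article-001": "Topic modeling of nineteenth-century periodicals ...",
    "article-002": "A recommender system for digital humanities journals ...",
    "article-003": "TEI encoding practices for correspondence collections ...",
}  # placeholder texts; the real system uses full article text

ids = list(articles)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles.values())
similarity = cosine_similarity(tfidf)

def recommend(article_id: str, top_n: int = 2):
    """Return the top_n articles most similar to the given one."""
    i = ids.index(article_id)
    ranked = sorted(range(len(ids)), key=lambda j: similarity[i, j], reverse=True)
    return [ids[j] for j in ranked if j != i][:top_n]

print(recommend("article-002"))
```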
Archivists occasionally describe digital repositories as being "not real," suggesting that their technical digital preservation infrastructure is inadequate to the task of digital preservation. This article discusses the concept of digital repositories, highlighting the distinction between digital repository technical infrastructure and institutions collecting digital materials, and what it means to be a "real" digital repository. It argues that the Open Archival Information System Reference Model and notions of Trustworthy Digital Repositories are inadequate for determining the "realness" of a digital repository and advocates using maturity models as a framework for discussing repository capability.
In this article, we describe a novel use of the note-taking software Obsidian as a method for users without formal training in metadata creation to develop culturally relevant data literacies across two digital archiving projects. We explain how Obsidian’s built-in use of linked data provides an open-source, flexible, and potentially scalable way for users to creatively interact with digitized materials, navigate and create metadata, and model relationships between digital objects. Furthermore, we demonstrate how Obsidian’s local and offline hosting features can be leveraged to include team members with low or unreliable internet access.
Artificial intelligence (AI) holds significant potential for improving metadata workflows, offering tools to enhance efficiency, improve discovery, and address long-standing challenges in libraries. Yet, as with any transformative technology, AI adoption requires thoughtful consideration of its limitations, ethical implications, and impact on professional practice. The key is finding the right balance—one that leverages AI’s capabilities while maintaining the quality and professional standards that libraries depend on.
From April to June 2024, the OCLC Research Library Partnership (RLP) convened the Managing AI in Metadata Workflows Working Group. This working group brought together metadata managers to explore how AI could be integrated into cataloging, special collections, and institutional repository workflows. Across these discussions, librarians and archivists expressed both enthusiasm and caution about AI adoption, and a set of cross-cutting themes emerged—insights that extend beyond specific workflows and highlight the opportunities and challenges of responsible AI adoption in libraries.
This blog post—the final of a four-part series—synthesizes key themes, including the critical importance of metadata quality, the need for ethical standards and transparency, the evolving roles of metadata professionals, and the responsibility to adopt sustainable AI practices. These insights, combined with emerging best practices from organizations like OCLC, point toward a future where AI enhances rather than replaces human expertise in metadata work.
Quality and reliability of metadata is essential
A fundamental theme across all discussions was the critical importance of metadata quality. Working group participants consistently stated that creating records using AI is counterproductive if resources are not accurately described or if users are misdirected. This emphasis on quality isn’t a barrier to AI adoption—it’s a framework for responsible implementation.
Several quality considerations emerged repeatedly:
Hallucinations that introduce false information into catalog records
Inconsistent outputs from identical inputs, undermining reliability
Unreliable confidence scores that don’t always accurately reflect the quality of AI-generated content
Entity recognition failures where AI-generated results might look syntactically correct but fail to identify the right person, place, or organization
However, these challenges are driving productive innovations rather than insurmountable barriers. OCLC’s approach to AI-powered de-duplication in WorldCat demonstrates how quality concerns can be addressed through hybrid approaches that combine AI efficiency with human expertise. OCLC has worked closely with the cataloging community to help validate its machine learning model’s understanding of duplicate records in WorldCat. To date, OCLC has removed more than 9 million duplicate records from WorldCat as a result of this AI model, which we continue to test and refine. The process includes conservative decision-making protocols and human oversight for complex cases, showing how AI can scale quality work rather than compromise it.
These developments are driving productive conversations about human oversight processes, quality control checkpoints, and training approaches that help staff effectively evaluate AI outputs—and that are already yielding practical solutions.
Contextual and cultural knowledge gaps exist
One of the most significant limitations identified by the working group involves AI’s current struggle with contextual and cultural knowledge. Participants noted practical challenges, such as AI transcription systems converting “MARC” to “Mark” or “nomen” to “Newman” in recordings with technical terminology. More broadly, AI systems often lack the deep contextual understanding needed for community-specific terminology or cultural nuances that don’t appear in general training databases.
Rather than viewing these as permanent limitations, the library community is actively addressing them. These challenges highlight an important opportunity: the need for more specialized, task-specific AI tools rather than general-purpose models. OCLC’s experiments with subject analysis and classification prediction demonstrate this approach in action. By grounding AI models in high-quality library metadata—specifically WorldCat data—OCLC is developing tools that understand library contexts better than general-purpose models.
This specialized approach also reinforces the continuing value of librarians’ and archivists’ deep collections knowledge and cultural expertise, positioning AI as a tool that extends rather than replaces professional judgment.
Evolving professional roles and skills: Enhancement, not replacement
Participants expressed genuine interest in AI as a tool for increasing efficiency and freeing metadata specialists from repetitive work to focus on more complex and specialized tasks. At the same time, thoughtful questions emerged about professional development and skill maintenance in an AI-enhanced environment.
Key considerations include how to ensure that new professionals develop foundational skills traditionally gained through tasks like brief record creation—skills that become essential for effectively evaluating AI outputs later in their careers. Experienced catalogers wondered whether spending more time reviewing than creating might impact their ability to identify subtle errors or handle complex materials that require human insight.
These discussions highlight the importance of designing AI implementations as enhancements to human expertise rather than replacements, ensuring that professional development pathways remain robust while leveraging AI’s potential to handle volume and routine tasks. OCLC’s approach exemplifies this philosophy. OCLC’s AI de-duplication project, for instance, doesn’t eliminate human oversight but refocuses it where expertise matters most. As noted by Bemal Rajapatirana, “This approach to de-duplication is not about reducing the role of people—it’s about refocusing their expertise where it matters most. Catalogers can focus on high-value work that connects them to their communities instead of spending hours resolving duplicate records.”
Real world library examples already demonstrate this potential. The University of Calgary Library successfully redirected 1.5 FTE of staff time to more strategic, higher-level tasks following the implementation of its AI chatbot, showing how AI can create space for the uniquely human aspects of library work rather than diminishing professional roles.
Ethical considerations and standards: Building transparency into practice
Working group members identified several important ethical considerations, with data provenance and transparency emerging as particularly crucial. Participants emphasized the need to track when and how AI contributes to metadata, both for quality control purposes and transparency.
For example, in a case study where AI was given a finding aid and asked to provide headings for personal names that were verified against the LC Name Authority File, the tool provided headings that looked correctly formulated (e.g., “Bukowski, Charles, 1920-1994”, with dates added), and the AI even claimed that they had been verified, but they were not in fact the correct authorized headings (the authorized form is “Bukowski, Charles”). In this type of case, provenance information indicating that the heading was AI-contributed could trigger human review for quality control.
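One way a workflow could actually verify such a heading, rather than trusting the model's claim, is to query id.loc.gov's known-label lookup for LC name authorities. The sketch below is my own illustration, not part of the case study, and the redirect behavior described in the comments is my understanding of the service rather than something stated above:

```python
# Sketch: check a personal-name heading against the LC Name Authority File via
# id.loc.gov's known-label lookup. As I understand the service, an exact authorized
# heading redirects to the authority record and anything else returns 404.
import requests
from urllib.parse import quote

def lcnaf_lookup(heading: str):
    """Return the authority URI for an exact authorized heading, or None."""
    url = f"https://id.loc.gov/authorities/names/label/{quote(heading)}"
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code in (301, 302, 303):
        return resp.headers.get("X-Uri") or resp.headers.get("Location")
    return None

print(lcnaf_lookup("Bukowski, Charles, 1920-1994"))  # the AI-supplied form
print(lcnaf_lookup("Bukowski, Charles"))             # the authorized form per the case study
```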
OCLC has responded to community questions about data provenance for AI-generated metadata by updating WorldCat documentation and providing guidance through programs like AskQC Office Hours. OCLC’s Bibliographic Formats and Standards (BFAS) now includes instructions for recording AI-generated metadata in bibliographic records in section 3.5. Readers may also find it useful to consult the August 2025 AskQC Office Hours session.
Questions also arose about the lifecycle of AI-generated metadata: When does AI-generated content become simply “cataloger-reviewed content,” similar to copy cataloging workflows? How do we balance transparency with practical workflow considerations? These discussions reflect the library community’s commitment to responsibly working through the practical implications of new technologies.
Environmental awareness and responsibility
Participants expressed concerns about AI’s environmental impacts, indicating a preference for less energy-intensive solutions when they prove similarly effective. Rather than viewing this as a barrier, metadata managers identified a need for accessible information about the environmental impact of different AI applications, enabling informed decision-making and meaningful conversations with their teams about responsible implementation choices.
OCLC’s approach to AI development reflects this environmental consciousness. The WorldCat de-duplication model is designed to be computationally efficient, reducing unnecessary resource use while maintaining high-quality results. As Rajapatirana explains, “by optimizing AI’s footprint, we ensure that de-duplication remains cost-effective and scalable for the long term.” This environmental consciousness reflects the library community’s broader commitment to sustainability and responsible technology adoption, suggesting opportunities for training and information sharing about library AI energy impacts.
Conclusion
The concerns and opportunities described in this blog post reflect a community that is actively thinking through the implications of an emerging technology, rather than simply adopting it. The clearly articulated need for specialized AI tools, quality frameworks, and ethical guidelines is driving innovations that address current limitations.
Working group participants’ emphasis on maintaining professional expertise while leveraging AI’s capabilities suggests a thoughtful approach to technology integration that preserves what makes library work valuable while enhancing its impact.
The RLP Managing AI in Metadata Workflows working group provided the opportunity for metadata managers to identify important implications for AI usage in metadata workflows. This blog series distills those insights, and we hope that these observations will offer useful guidance to the library community as it collectively navigates technological change.
NB: As you might expect, AI technologies were used extensively throughout this project. We used a variety of tools—including Copilot, ChatGPT, and Claude—to summarize notes, recordings, and transcripts. These were useful for synthesizing insights for each of the three subgroups and for quickly identifying the types of overarching themes described in this blog post.
It’s October, and that means the return of our annual Halloween Hunt!
We’ve scattered a troupe of jack-o-lanterns around the site, and it’s up to you to try and find them all.
Decipher the clues and visit the corresponding LibraryThing pages to find a jack-o-lantern. Each clue points to a specific page on LibraryThing. Remember, they are not necessarily work pages!
If there’s a jack-o-lantern on a page, you’ll see a banner at the top of the page.
You have just two weeks to find all the jack-o-lanterns (until 11:59pm EDT, Friday October 31st).
Come brag about your troupe of jack-o-lanterns (and get hints) on Talk.
Win prizes:
Any member who finds at least two jack-o-lanterns will be awarded a jack-o-lantern Badge.
Members who find all 12 jack-o-lanterns will be entered into a drawing for one of five LibraryThing (or TinyCat) prizes. We’ll announce winners at the end of the hunt.
P.S. Thanks to conceptDawg for the vulture illustration!