Creating synthesized instances with SMOTE (from Figure 3 in Wongvorachan et al.)
Jason Brownlee defined feature engineering as follows: "feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data". In other words, feature engineering is manually designing what the input Xs should be.
Feature engineering has many application scenarios, including fraud detection and prevention for loan applications, user behavior modeling in recommendation systems, and disease diagnosis or risk prediction. In a loan application fraud prevention program, data scientists can decide whether a user is reliable using features based on the user's basic information, credit history, and other data. A recommendation system can analyze a user's behavioral features, such as the materials clicked in the past few months, positive or negative reactions, and user type, to infer the topics the user is most interested in.
A feature is an attribute useful for the modeling task, but not all attributes can be used as features. In general, for most industrial modeling, expert knowledge is important for feature creation. For example, in a loan application fraud prevention program, experiences from the risk control department will be very helpful. There are potentially hundreds of features based on a user's basic information, credit report, and assets, but not all of them will be used in modeling. Expert knowledge can help data scientists quickly perform feature construction and screening.
1. Feature Describing
This step provides a general understanding of the dataset. We examine the max, min, mean, and standard deviation of each feature; understand its central tendency, dispersion, and distribution; and find missing values, outliers, and duplicate values. This step serves as the preparatory work for the steps that follow.
2. Feature Processing
The foundation of feature engineering is feature processing, which is time-consuming and directly associated with data quality. It includes operations such as data-cleaning, standardization, and resampling, and aims to transform raw data into a format suitable for model training.
2.1 Data-cleaning
Data-cleaning generally processes missing values, outliers, and inconsistencies to ensure the accuracy of data.
Some features may contain missing values because of the lack of observations. Missing values are typically processed in the following ways:
Drop directly. We can choose to drop the whole sample (row) or the feature (column) containing the missing value.
Fill with other values. We can fill the missing values with a constant such as 0, 9999, -9999, or -99.
data['feature'] = data['feature'].fillna(-99)  # fill missing entries with the sentinel value -99
Or we can fill the missing values with the mean, mode, previous value, or next value.
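For example, each of these strategies is a one-liner in pandas. This is a minimal sketch: data is assumed to be a DataFrame with a numeric column named 'feature', and each line is an alternative strategy rather than a sequence.
import pandas as pd
data['feature'] = data['feature'].fillna(data['feature'].mean())     # fill with the mean
data['feature'] = data['feature'].fillna(data['feature'].mode()[0])  # fill with the mode
data['feature'] = data['feature'].ffill()                            # fill with the previous value
data['feature'] = data['feature'].bfill()                            # fill with the next value
More sophisticated imputers, such as the KNN imputer below, estimate each missing entry from similar rows: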
from fancyimpute import KNN
# impute each missing entry from its 3 nearest-neighbor rows
# (.complete() in older fancyimpute releases; .fit_transform() in current ones)
dataset = KNN(k=3).fit_transform(dataset)
The most frequently used method is to drop directly or fill with mean values.
Outliers can be identified using the interquartile range or the mean and standard deviation. In addition, points whose distance from most other points exceeds a certain threshold are considered outliers. The main distance measures used are absolute (Manhattan) distance, Euclidean distance, and Mahalanobis distance.
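As an illustration, the interquartile-range rule can be written in a few lines. This is a sketch assuming a pandas Series; the 1.5 multiplier is the conventional default, not a value dictated by the data.
import pandas as pd

def iqr_outliers(series, k=1.5):
    # flag points outside [Q1 - k*IQR, Q3 + k*IQR]
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

outlier_mask = iqr_outliers(data['feature'])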
We need to process outliers to reduce noise and improve data quality. Typical strategies include: deleting outliers directly when they have a significant impact on the analysis results, treating outliers as missing values and filling them in with the methods described above, or keeping the outliers when they are considered important.
Duplicate values refer to identical samples from different sources; they waste storage space and reduce data processing efficiency. The most common approach is to drop duplicates, completely or partially, based on experience.
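In pandas this is again a one-liner; the key columns in the second example are purely hypothetical.
data = data.drop_duplicates()                                  # drop rows that are exact duplicates
data = data.drop_duplicates(subset=['user_id', 'event_time'])  # or judge duplicates on key columns only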
2.2 Resampling
Class imbalance refers to the situation where the number of samples in different categories of the training set differs significantly. Machine learning methods generally assume that the numbers of positive and negative samples are close. However, in the real world we often observe class imbalance, sometimes in extreme forms: 2% of credit card accounts are fraudulent every year, online advertising conversion rates are in the range of 10^-3 to 10^-6, and so on. Class imbalance can bias a model's predictions towards the majority class and thus weaken its predictive power.
We can mitigate class imbalance by oversampling the minority class or undersampling the majority class. When the original dataset is huge, undersampling is a good choice: it randomly deletes samples from the majority class until the two classes contain equal numbers of samples. When the dataset is small, we prefer oversampling. One practice is to repeatedly resample the minority class until it matches the size of the majority class, but this carries a high risk of over-fitting. A better way is to use SMOTE (Synthetic Minority Over-sampling Technique), in which synthetic instances of the minority class are generated by interpolating feature vectors of neighboring instances, effectively increasing their representation in the training data. To be specific, SMOTE picks a sample point x in the minority class and randomly picks a point x' from its k nearest neighbors. The synthetic instance is then created by the formula x_new = x + (x' - x) * d, where d is in the range [0, 1]. Three figures from Wongvorachan et al. shown below demonstrate the three methods more intuitively.
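A minimal sketch with the imbalanced-learn library, assuming X and y are the training features and labels; k_neighbors is the k used to pick the neighbor x'.
from imblearn.over_sampling import SMOTE

X_resampled, y_resampled = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)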
Table 1 shows the operating principle, advantages, and drawbacks of each resampling technique. These methods are commonly used, but we also need to emphasize their disadvantages: random oversampling increases the likelihood of overfitting, random undersampling keeps only part of the information in the original dataset, and SMOTE can introduce noise into the dataset.
Table 1. The comparison of resampling techniques (Table 1 in Wongvorachan et al.)
A more straightforward way to mitigate class imbalance is class weights, which assigns a weight to each class in the training set: classes with many samples receive low weights, and classes with few samples receive high weights. There is no need to generate new samples with this method; we just adjust the weights in the loss function.
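With scikit-learn, for instance, most classifiers accept a class_weight argument, and the weights can also be computed explicitly to plug into a custom loss. This is a sketch assuming X_train and y_train already exist.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

clf = LogisticRegression(class_weight='balanced').fit(X_train, y_train)  # weights inversely proportional to class frequency
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)  # or compute them explicitly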
2.3 Feature Transformation
Different features have different scales and ranges. Eliminating scale differences between different features can put data on the same scale and make them numerically comparable.
StandardScaler transforms data into a distribution with a mean of 0 and a standard deviation of 1 by Z-score normalization. Similarly, MinMax scaling normalizes all features to be within 0 and 1. To be specific, StandardScaler obtains the mean and standard deviation of the training data, and then uses these statistics to make a Z-score normalization with the following formula:
z = (x - μ) / σ, where μ is the mean and σ is the standard deviation.
MinMaxScaler obtains the maximum and minimum values of the training data, and then transforms data with the following formula:
x_scaled = (x - x_min) / (x_max - x_min), where x_min and x_max are the minimum and maximum values of the training data.
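In scikit-learn both transformations follow the same fit-on-train, apply-everywhere pattern; this sketch assumes X_train and X_test are already split.
from sklearn.preprocessing import StandardScaler, MinMaxScaler

scaler = StandardScaler().fit(X_train)   # learns μ and σ from the training data
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)    # reuses the training statistics

minmax = MinMaxScaler().fit(X_train)     # learns x_min and x_max from the training data
X_train_mm = minmax.transform(X_train)
X_test_mm = minmax.transform(X_test)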
Feature transformation has different impacts on different models. It has a great impact on SVM (support vector machine) and NN (nearest neighbor) which are based on distances in a Euclidean space, but has little impact on tree models such as random forest or XGBoost.
With a broader definition of feature engineering, the generation of embeddings which represent latent features is also regarded as feature engineering. Latent features follow a different set of methodologies. In this article, we only focus on the narrow definition of feature engineering, where shallow features are selected based on expert knowledge, and data is processed with the methodology discussed above. In real-world practice, especially in industry, successful feature engineering is essential for the model's performance.
There will be fewer than usual posts to this blog for a while. I have to write another talk for an intimidating audience, similar to the audience for my 2021 Talk at TTI/Vanguard Conference. That one took a lot of work but a few months later it became my EE380 Talk. That in turn became by far my most-read post, having so far gained 522K views. The EE380 talk eventually led to the invitation for the upcoming talk. Thus I am motivated to focus on writing this talk for the next few weeks.
On my recent visit to London I was struck by how many of the advertisements in the Tube were selling AI. They fell into two groups, one aimed at CEOs and the other at marketing people. This is typical; the pitch for AI is impedance-matched to these targets:
The irresistible pitch to CEOs is that they can "do more with less", or in other words they can lay off all these troublesome employees without impacting their products and sales.
Marketing people value plausibility over correctness, which is precisely what LLMs are built to deliver. So the idea that a simple prompt will instantly generate reams of plausible collateral is similarly irresistible.
This is why Sam Altman et al. are so desperate to run the "drug-dealer's algorithm" (the first one's free) and get the world hooked on this drug so they can supply a world of addicts.
You can see how this works for the two targets. Once a CEO has addicted his company to AI by laying off most of the staff, there is no way he is going to go cold turkey by hiring them back even if the AI fails to meet his expectations. And once he has laid off most of the marketing department, the remaining marketeer must still generate the reams of collateral even if it lacks a certain something.
“I’ll go down this thread with [Chat]GPT or Grok and I’ll start to get to the edge of what’s known in quantum physics and then I’m doing the equivalent of vibe coding, except it’s vibe physics,” Kalanick explained. “And we’re approaching what’s known. And I’m trying to poke and see if there’s breakthroughs to be had. And I’ve gotten pretty damn close to some interesting breakthroughs just doing that.”
Then there are the programmers extolling "vibe coding" and how it increases their productivity. CEOs who buy this pitch are laying off staff left and right. For example, Jordan Novet reports in "Microsoft laying off about 9,000 employees in latest round of cuts":
Microsoft said Wednesday that it will lay off about 9,000 employees. The move will affect less than 4% of its global workforce across different teams, geographies and levels of experience, a person familiar with the matter told CNBC.
...
Microsoft has held several rounds of layoffs already this calendar year. In January, it cut less than 1% of headcount based on performance. The 50-year-old software company slashed more than 6,000 jobs in May and then at least 300 more in June.
Asked to explain the ABAB rhyming scheme, OpenAI's GPT-4o did so accurately, responding, "An ABAB scheme alternates rhymes: first and third lines rhyme, second and fourth rhyme."
Yet when asked to provide a blank word in a four-line poem using the ABAB rhyming scheme, the model responded with a word that didn't rhyme appropriately. In other words, the model correctly predicted the tokens to explain the ABAB rhyme scheme without the understanding it would have needed to reproduce it.
The problem with potemkins in AI models is that they invalidate benchmarks, the researchers argue. The purpose of benchmark tests for AI models is to suggest broader competence. But if the test only measures test performance and not the capacity to apply model training beyond the test scenario, it doesn't have much value.
16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early 2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%--AI tooling slowed developers down.
Even the devs who liked the AI found it was bad at large and complex code bases like these ones, and over half the AI suggestions were not usable. Even the suggestions they accepted needed a lot of fixing up.
Julia Liuson, president of the Microsoft division responsible for developer tools such as AI coding service GitHub Copilot, recently sent an email instructing managers to evaluate employee performance based on their use of internal AI tools like this.
"AI is now a fundamental part of how we work," Liuson wrote. "Just like collaboration, data-driven thinking, and effective communication, using AI is no longer optional — it's core to every role and every level."
Liuson told managers that AI "should be part of your holistic reflections on an individual's performance and impact."
If the tools were that good, people would use them without being threatened. If the tools were that good, people would pay for them. But Menlo Ventures found that only 3% of consumers pay anything. They are happy to use free toys but they have other spending priorities. Other surveys have found numbers up to 8%, but as Ted Gioia notes in The Force-Feeding of AI on an Unwilling Public:
Has there ever been a major innovation that helped society, but only 8% of the public would pay for it?
There’s another reason why huge tech companies do this—but they don’t like to talk about it. If they bundle AI into other products and services, they can hide the losses on their income statement.
That wouldn’t be possible if they charged for AI as a standalone product. That would make its profitability (or, more likely, loss) very easy to measure.
Shareholders would complain. Stock prices would drop. Companies would be forced to address customer concerns.
But if AI is bundled into existing businesses, Silicon Valley CEOs can pretend that AI is a moneymaker, even if the public is lukewarm or hostile.
Yesterday Salesforce announced that prices on a pile of their services are going up around 6% — because AI is just that cool.
Salesforce’s stated reason for the price rise is “the significant ongoing innovation and customer value delivered through our products.” But you know the actual reason is because f- you, that’s why. What are you gonna do, move to SAP? Yeah, didn’t think so.
One problem is that the technology Salesforce is charging its customers for doesn't work well in Salesforce's application space. Salesforce's own researchers developed a new benchmark suite called CRMArena-Pro:
CRMArena-Pro expands on CRMArena with nineteen expert-validated tasks across sales, service, and 'configure, price, and quote' processes, for both Business-to-Business and Business-to-Customer scenarios. It distinctively incorporates multi-turn interactions guided by diverse personas and robust confidentiality awareness assessments. Experiments reveal leading LLM agents achieve only around 58% single-turn success on CRMArena-Pro, with performance dropping significantly to approximately 35% in multi-turn settings. While Workflow Execution proves more tractable for top agents (over 83% single-turn success), other evaluated business skills present greater challenges. Furthermore, agents exhibit near-zero inherent confidentiality awareness; though targeted prompting can improve this, it often compromises task performance.
The agent bots had 58% success on tasks that can be done in one single step. That dropped to 35% success if they had to take multiple steps. The chatbot agents are also bad at confidentiality:
Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance. These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios.
Despite the fact that most consumers won't pay the current prices, it is inevitable that once the customers are addicted, prices will go up spectacularly. But the wads of VC cash may not last long enough, and things can get awkward with the customers who are paying the current prices, as David Gerard reports:
You could buy 500 Cursor requests a month for $20 on the “Pro” plan. People bought a year in advance.
In mid-June, Cursor offered a new $200/month “Ultra” plan. But it also changed Pro from 500 requests to $20 of “compute” at cost price — the actual cost of whichever chatbot vendor you were using. That was a lot less than 500 requests.
You could stay on the old Pro plan! But users reported they kept hitting rate limits and Cursor was all but unusable.
The new plan Pro users are getting surprise bills, because the system doesn’t just stop when you’ve used up your $20. One guy ran up $71 in one day.
Anysphere has looked at the finances and stopped subsidising the app. Users suddenly have to pay what their requests are actually costing.
Anysphere says they put the prices up because “new models can spend more tokens per request on longer-horizon tasks” — that is, OpenAI and Anthropic are charging more.
The CEO who laid off the staff faces another set of "business risks". First, OpenAI is close to a monopoly; it has around 90% of the chatbot market. This makes it a single point of failure, and it does fail:
On June 9 at 11:36 PM PDT, a routine update to the host Operating System on our cloud-hosted GPU servers caused a significant number of GPU nodes to lose network connectivity. This led to a drop in available capacity for our services. As a result, ChatGPT users experienced elevated error rates reaching ~35% errors at peak, while API users experienced error rates peaking at ~25%. The highest impact occurred between June 10 2:00 AM PDT and June 10 8:00 AM PDT.
Second, the chatbots present an attractive attack surface. David Gerard reports on a talk at Black Hat USA 2024:
Zenity CEO Michael Bargury spoke at Black Hat USA 2024 on Thursday on how to exploit Copilot Studio:
Users are encouraged to link “public” inputs that an attacker may have control over.
An insider — malicious or just foolish — can feed their own files to the LLM.
If you train the bot on confidential communications, it may share them with the whole company.
63% of Copilot bots are discoverable online, out on the hostile internet. Bargury fuzzed these bots with malformed prompts and got them to spill confidential information.
Bargury demonstrated intercepting a bank transfer between a company and their client “just by sending an email to the person.”
So the technology being sold to the CEOs isn't likely to live up to expectations and it will cost many times the current price. But the way it is being sold means that none of this matters. By the time the CEO discovers these issues, the company will be addicted.
Early this spring, Ex Libris released the Summon “Research Assistant.” This search tool is Retrieval Augmented Generation, using an LLM tool (OpenAI’s GPT–4o mini at time of writing) to search and summarize metadata in their Summon/Primo Central Discovery Index.
We did a library-wide test mid-semester and decided that it’s not appropriate to turn it on now. We may do so when some bugs are worked out. Even then, it is not a tool we’d leave linked in the header, promote as-is, or teach without significant caveats (see Reflection).
Brief Overview of the Tool
This overview is for the Summon version, though I believe that the Primo version is pretty similar and it has some of the same limitations.
Query Conversion – The user’s question is sent to the LLM, where it is converted to a Boolean query that contains a number of variations of the query, connected with an OR. If the query is non-English, some of the variations will be in the query language, and the other variations will be in English.
Results Retrieval – The Boolean query is sent to CDI to retrieve the results.
Re-ranking – The top results (up to 30) are re-ranked using embeddings to identify five sources that best address the user’s query.
Overview Creation – The top five results are sent to the LLM with the instructions to create the overview with inline references, based on the abstracts.
Response Delivery – The overview and sources are returned to the user in the response.
There is one major caveat to the above, also in the documentation, which is the content scope. Once you get through all the exceptions,[1] only a slice of the CDI could make it into the top 5 results. Most notably, records from any of the following content providers are not included:
APA,
DataCite,
Elsevier,
JSTOR, and
Conde Nast.
These would be in the results you get when clicking through to “View related results,” but they could not make it into the “Top 5.”
Positive Findings
I would summarize the overall findings as: extremely mixed. As I said up front, we had enough concerns that we didn’t want to simply turn on the tool and encourage our wider base to try it out.
Sometimes, people got really interesting or useful results. When it worked well, we found the query generation could come up with search strings that we wouldn’t have thought of but got good results. I found some electronic resources about quilts that I didn’t know we had – which is saying something!
Some of the ways the tool rephrased research questions as suggested “related research questions” were also useful. A few people suggested that this could be used to help students think about the different ways one can design and phrase a search.
The summaries generally seemed accurate to the record abstracts. I appreciated that they were cited in a way that let me identify which item was the source of which assertion.[2]
We also had many concerns.
Massive Content Gaps (and Additions)
The content gaps are a dealbreaker all on their own. No JSTOR? No Elsevier? No APA? Whole disciplines are missing. While they do show up in the "View related results," those first 5 results matter a lot in a user's experience and shape expectations of what a further search would contain. If someone is in a field for which those are important databases, it would be irresponsible to send them to this tool.
The need for abstracts significantly limits which kinds of results get included. Many of our MARC records do not have abstracts. For others, one may infer the contents of the book from a table of contents note, but this requires levels of abstraction and inference which a human can perform but this tool doesn’t.
Then there’s the flip side of coverage. This is based on the Ex Libris CDI (minus the massive content gaps), which includes everything that we could potentially activate. At time of writing, it still doesn’t seem possible to scope to just our holdings (and include our own MARC). This means results include not only the good stuff we’d be happy to get for a patron via ILL but also whatever cruft has made its way into the billion+ item index. And that’s not a hypothetical problem. In one search we did during the session, so much potential content was in excluded JSTOR collections that a top 5 result on the RAG page was an apparently LLM-generated Arabic bookseller’s site.[3]
LLM Parsing / Phrasing
The next issue we encountered was that sometimes the LLM handled queries in unexpected[4] ways.
Unexpected Questions
First, the Research Assistant is built to only answer a specific type of question. While all search tools can be described that way, anyone who’s worked for more than 30 seconds with actual humans knows that they don’t always use things in the way we intend. That’s why we build things like “best bet” style canned responses to handle searches for library hours or materials with complicated access (like the Wall Street Journal).
It was not programmed to do anything with single-word searches. A search for “genetics,” for example, got the “We couldn’t generate an answer for your question” response. There wasn’t any kind of error-handling on the Ex Libris side to turn it into some kind of “I would like to know about [keyword],” even as a suggestion provided in the error message. For all my critiques of LLMs themselves, sometimes it’s just poor edge case handling.
Then there were the meta questions. Colleagues who staff our Ask-a-Librarian brought in a few that they’ve gotten: “Do you have The Atlantic?” or “What is on course reserve for XXXXX?” In both of those cases, the tool was not able to detect that this was not the kind of question it was programmed to answer. In both cases, it returned a few random materials and generated stochastic responses which were, of course, completely inaccurate.
LLM-Induced Biases
Then there were issues introduced by the nature of LLMs – how they tokenize and what kind of data they’re trained on:
A liaison librarian reported asking about notable authors from Mauritius and being given results for notable authors from Mauritania. I would guess this is a combination of stemming and lack of responses for Mauritius. But they are two very distinct countries, in completely different regions of a continent (or off the continent).
Another bias-triggering question related to Islamic law and abortion. The output used language specific to 20th/21st-century evangelical Christianity. Because the LLM’s output is non-deterministic, we could not replicate it, but instead got a variety of different phrasings of results of varying quality. This is a (not-unexpected) bias introduced by the data the LLM was trained on. Notably, it was not coming from the language of the abstracts.
Balancing Safety and Inquiry
Note: While I was finishing this blog post, the ACRLog published a post going into more detail about topics blocked by the “safeguards”. I brought this to our library-wide discussion but I’m going to refer readers to the above. Basically, if you ask about some topics, you won’t get a response. Even though some of these are the exact kind of thing we expect our students to be researching.[5]
When the Summon listserv was discussing this issue in the spring, I went and found the OpenAI Azure documentation for content filtering. They have a set of different areas that people can configure:
Hate and fairness
Sexual
Violence
Self-Harm
Protected material
Copyrighted text (actual text of articles, etc.) which can be output without proper citation
Code harvested from repositories and returned without citation
User prompt attacks
Indirect attacks
Groundedness (how closely it sticks to training data and how much it goes into statistically probable text output)
Configuration levels can be set at low, medium, and high for each. I shared the link and list of areas on the listserv and asked about which the Research Assistant uses but did not get an answer from Ex Libris.
Steps to Delivery
This next part relates to the idea of the Research Assistant itself, along with Ex Libris’s implementation.
Very, very few of our patrons need just a summary of materials (and, again, only of materials which happen to have an abstract, and only of the abstract, not the actual materials). Plenty of our patrons don’t need that at all. Unless they’re going to copy-paste the summary into their paper and call it a day, they actually need to get and use the materials.
So once they’ve found something interesting, what are their next steps?
Well, first you click the item.
Then you click Get It.
Then THAT opens a Quick Look view.
Then you click the title link on the item in the Quick Look View.
And oh look this was in the CDI but not in our holdings, so it’s sent me to an ILL page (this was not planned, just how it happened).
Maybe ExLibris missed the memo, but we’ve actually been working pretty hard to streamline pathways for our patrons. The fewer clicks the better. This is a massive step backward.
Reflection
I doubt this would be of any utility for grad students or faculty except as another way of constructing query strings. I do think it’s possible to teach with this tool, as with many other equally but differently broken tools. I would not recommend it at a survey course level. Is it better than other tools they’re probably already using? Perhaps, but the bar is in hell.
Optimal use requires:
Students to be in a discipline where there’s decent coverage.
Students to know that topical and coverage limitations exist.
Students to understand the summaries are the equivalent of reading 5 abstracts at once and that there may be very important materials in the piece itself.
Students to actually click through to the full list of results.
Ex Libris to let us search only our own index (due to the cruft issue).
Ex Libris to redesign the interface with a shorter path to materials.
Its greatest strength as a tool is probably the LLM-to-query translation and the recommendations for related searches. When it works. But with all those caveats?
I am not optimistic.
FWIW, I totally understand and support not including News content in this thing. First, our researchers are generally looking for scholarly resources of some kind. Second, bias city. ↩︎
These citations are to the abstract vs. the actual contents. This could cause issues if people try to shortcut by just copy-pasting, since we’re reliant on the abstract to reliably represent the contents (though there’s also no page # citation). ↩︎
A colleague who is fluent in Arabic hypothesized that was not a real bookstore because many small things about the language and site (when we clicked through) were wrong. ↩︎
Ben Zhao’s closing keynote for OpenRepositories goes into how these kinds of issues could be expected. So I’ll say “unexpected” from the user’s POV but also I cannot recommend his talk highly enough. Watch it. ↩︎
Whether ChatGPT can appropriately summarize 5 abstracts from materials related to the Tulsa Race Riots or the Nakba is a whole separate question. ↩︎
In 2020, the Emory Center for Digital Scholarship in Atlanta, Georgia reached out to the University of Michigan to contribute to the Sounding Spirit Digital Library (SSDL). They asked the Bentley Historical Library, the U-M Library, and the William L. Clements Library to contribute titles in our collections that would expand their digital collection. This post looks at the range of titles contributed, discusses the equipment used to digitize the titles, and analyzes the ways that SSDL and U-M Library align and vary in their digitization efforts.
I always feel like I have so much to write,
so much that I want to write.
So why is it that anytime I sit down to actually write my mind goes entirely blank?
Even if I have a list of intended topics right in front of me,
or a partial draft to work on,
I start feeling like I have nothing interesting to say.
The only thing I have much success in writing is
these long, boring, self indulgent walls of text
about how I feel about not being able to write.
I don’t know, maybe I should just publish this.
At least it will be some of my thoughts out in the world again.
Back in the early days of my blog it seemed to flow quite easily.
I was confident,
not really that what I had to say was correct,
or insightful,
or in any way important,
but that it was not going to get me into trouble
or draw criticism that I couldn’t handle.
I naively thought that I could say whatever I wanted without fear.
Since I am a white, well-educated, straight cis man,
I was largely right.
The only time I felt any discomfort was when I posted a paraphrased recipe from a book
and was threatened with legal action by the writer’s agent.
While I was certain that legally I was in the clear
(you can’t copyright a recipe),
and felt that since I was strongly recommending people buy the book it was OK morally too,
I deleted the post.
Of course I did.
I’ve spent my whole life learning over and over that conflict is bad,
it’s my responsibility to resolve it,
and it probably is me being unreasonable anyway.
Like a lot of ND kids,
I learned early on that all these intense feelings and sense impressions
(that I believed everyone felt)
were not to be acted upon because doing so only brought trouble.
So I think after that experience,
benign though it was,
I started to doubt whether I could speak freely after all.
At the same time I was awakening politically
and learning that some of the beliefs I thought were obvious
(e.g. that all humans had rich inner lives that affected how they thought/acted)
were in fact not that common,
and that stating them in particular ways could be seen as somehow controversial.
I was also succeeding in my career and starting to internalise what I was told
about what I said reflecting on my employer and colleagues.
However it happened,
I had lost my voice.
Yes, in the sense of being literally unable to express myself in certain ways,
but also in that I felt there was something I once had but had misplaced
and desperately wanted to find again without knowing where to look.
First it only affected my personal writing,
but as the years went by it crept into my professional work too.
Working in a large organisation is never not political;
what you say affects not only your own standing and influence
but also that of your team and department in the wider organisation,
which is a heck of a responsibility.
By the time I ended up in the public sector
I was being regularly reminded that it was my job to remain neutral,
impartial,
disinterested.
Eventually I stopped saying much at all,
except to a few trusted friends and colleagues.
This is not my fault, exactly,
but it is partly related to some very core parts of my personality.
I know that breaking rules feels bad,
feels dangerous.
So when I’m told implicitly that
regardless of the depth of my knowledge or experience
my word isn’t good enough
—that only things supported by concrete evidence are OK to say—
I play it safe.
I see that not everyone plays by those rules,
and that some don’t suffer any consequences for breaking them,
but I’m unable to discern to my own satisfaction why that is or how to emulate it.
Is it because they are brighter or more experienced than I?
Are they party to information that explains how they are not,
in fact, breaking rules?
Do they have better understanding of what rules can or cannot be broken?
Are they more mature and self-confident through experience?
Are they simply confident because they have the privilege never to have been challenged
and the power to carry it off through confidence alone?
Maybe some of these are people I don’t want to emulate after all…
Still,
there are glimmers of hope.
I’m starting to become more aware of contexts
where I feel less shackled and more able to express myself.
Unsurprisingly,
it’s usually when I’m under less pressure (external or internal)
to “deliver” some “output” that meets some vague criteria,
and when I’m working with people I know well
and trust not to judge me personally if we disagree.
It’s also easier when I retreat to that shrinking zone
where I still feel like I can speak with some authority.
I’m looking for ways to put myself in that context more often.
Right now I think that means identifying a small group of trusted colleagues at work
that I can bounce ideas around with,
and doing more writing in the various communities I find myself in.
There was a long while when I didn’t feel I had the authority to speak
even about my own lived experience.
Two years of therapy, a lot of introspection, and the love of friends and family
have brought me to a place where I no longer doubt my own experience of the world
(well, not so much as I did — it’s a work in progress),
which gives me a place of solid ground to build out from
as I re-establish my faith in my skills, experience and judgement in other areas.
Well, this wasn’t the thing I was expecting to write when I started,
but here we are.
I guess we’ll see how it goes!
On the first day of the conference, a series of workshops and tutorials were held on cutting-edge topics such as Generative AI, the Human-Centric Web, and Information Security. Tutorials included sessions on using the National Internet Observatory for collecting web data for research and exploring the Meta Content Library as a research tool. We had to choose one workshop or tutorial to attend.
Tutorial: Beyond APIs: Collecting Web Data for Research using the National Internet Observatory
The first workshop session was “Beyond APIs”, where members of the National Internet Observatory (NIO) at Northwestern University discussed many of the current issues in interfacing with the Web, collecting data, and ethical concerns of data usage. We at WS-DL often face many of these same challenges when working with APIs of various sites, such as the deprecation of the original Twitter API discussed in the workshop. In the NIO program, users opt into the study and can both voluntarily donate their data and utilize mobile apps and browser extensions which monitor their Web activity and allow researchers to find interesting patterns in user behavior and the interconnectedness of the Web.
At #WebSci2025? Join our "Beyond APIs: Collecting Web Data for Research using the National Internet Observatory" tutorial that addresses the critical challenges of web data collection in the post-API era. 📍Where: ABE 2400 (15 Seminary Place) ⏰When: Tue, May 20, 9-12.
Lydia Manikonda from Rensselaer Polytechnic Institute presented work on characterizing linguistic differences between human and LLM-generated text using Reddit data from r/explainlikeimfive. They prompted ChatGPT with the same questions as those posed on the subreddit, then compared the top-voted human responses with the AI-generated ones, asking whether readers could distinguish between them and identify the author.
Celia Chen and Alex Leitch from the University of Maryland discussed “Evaluating Machine Expertise,” focusing on how graduate students develop frameworks to assess GenAI content. They noted that LLM-generated content often appears authoritative even without domain expertise. Their research examines whether students build mental models to decide when and how to use LLMs and how these frameworks shift across disciplines. They found that students protect work central to their professional identity, are skeptical of academic LLM content, but trust machine outputs when they can be tested. International students often verify results across languages, such as checking first in English and then confirming in Chinese.
Alexander Bringsjord from Rensselaer Polytechnic Institute explored GenAI’s dual deception based on content and perceived intelligence, highlighting LLM hallucinations and how LLMs blend prior conversation into answers rather than accurately interpreting new documents.
Lydia Manikonda also spoke about the importance of privacy and ethical practices as more companies integrate AI into customer experiences.
Finally, Eni Mustafaraj’s reflections on the Semantic Web and the current state of AI, along with her work on Credbot, left me reflecting on how we might engage with the web and information in the future. The discussion about whether we will continue to visit web pages or shift to new modes of communication felt especially relevant and worth pondering.
How is GenAI reshaping the web and our behavior online?@maidylm, @oshaniws, and Rui Fan are leading a #WebSci25@WebSciConf workshop on Human–GenAI Interactions: exploring ethical, social, and technical impacts on the web and its users.
The conference kicked off on Wednesday with opening remarks from General Chair Matthew Weber of Rutgers University. He welcomed attendees to New Jersey and introduced the other chairs. He shared that this year there were 149 submissions from 519 authors across 29 countries, with 59 papers accepted, resulting in an acceptance rate of 39.6%.
Mor Naaman from Cornell Tech delivered the first keynote of the conference. His talk was titled “AI Everywhere all at Once: Revisiting AI-Mediated Communication”. He reflected on how, when the concept of AI-Mediated Communication (AIMC) was first introduced in 2019, it seemed mostly theoretical and academic. However, in just a few years, AI has become deeply embedded in nearly every aspect of human communication, from personal conversations to professional work and online communities. Mor revisited key studies from the AIMC area, highlighting findings such as how suspicion of AI can undermine trust in interpersonal exchanges, and how AI assistants can subtly influence not only the language and content of our communication but even our attitudes. Given the rapid growth of AI technologies like ChatGPT, he proposed an updated understanding of AIMC’s scope and shared future research directions, while emphasizing the complex challenges we face in this evolving landscape. His talk highlighted the profound and often subtle ways AI is transforming our communication, not just in what we say, but how we think and connect with one another. It made me wonder about the future of communication as AI becomes increasingly integrated into our daily interactions, raising important questions about how we can preserve authenticity and trust amid this rapid technological rise.
After lunch, there were two parallel sessions and we attended Session 2, which seemed more aligned with our interests. Jessica Costa started the session with “Characterizing YouTube’s Role in Online Gambling Promotion: A Case Study of Fortune Tiger in Brazil”, which examines how YouTube facilitates the promotion of online gambling, highlighting its societal impact and providing a robust methodology for analyzing similar platforms. Next, Aria Pessianzadeh presented “Exploring Stance on Affirmative Action Through Reddit Narratives”. This study analyzes narratives on Reddit to explore public opinions on affirmative action, revealing how users express support or opposition through personal stories and thematic framing. Ashwin Rajadesingan presented “How Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions”, a study showing how sharing personal stories can motivate people who typically avoid politics to join conversations on Reddit, as these stories resonate more with people and tend to receive more positive engagement than other types of comments. Wolf-Tilo Balke concluded the session with "Scientific Accountability: Detecting Salient Features of Retracted Articles". This study identifies key characteristics of retracted scientific articles, such as citation patterns, language features, and publication metadata, to better understand their impact and improve detection of problematic research. This work offers a new lens to think critically about the credibility of scientific literature, especially in an era of information overload.
Day 2 at #WebSci25! Session 2: Content Analysis & User Narratives
Jessica Costa from @UFOP is presenting “Characterizing YouTube’s Role in Online Gambling Promotion: A Case Study of Fortune Tiger in Brazil”
Dr. Lee Giles delivered an excellent keynote on the operation and infrastructure of Web crawlers as well as search engines, both general and those created by him. These included numerous *Seer-variant engines, such as ChemXSeer and CiteSeerX. Being a friend of the WS-DL research group, this talk was a nice treat as a current WS-DL student and an incredible resource for other conference participants interested in Web crawlers. Through discussions with other students there, many had attempted to work with or create Web crawlers in the past without realizing the complexity and challenging hurdles they needed to overcome in the process of navigating the modern Web.
The WebSci ‘25 Lightning Talks were brief presentations meant to advertise and attract audience members to the large selection of posters being presented. As with the session and keynote talks, there was no shortage of interesting work on display.
Great posters and drinks reception @WebSciConf 2025 @RutgersU. Spot two web science founders in this photo. It’s great to see the younger generation here picking up the challenge that we laid down 20 years ago and running with it. pic.twitter.com/Rk00D6EQGB
I, David Calano, presented the poster "GitHub Repository Complexity Leads to Diminished Web Archive Availability", which highlighted the limited availability of Web hosted (i.e., GitHub) software repositories archived to the Wayback Machine. We looked at the page damage of archived repository landing pages and the availability of the archived source files themselves to assess the viability of potentially rebuilding archived software projects.
All of the papers in this talk presented interesting information and findings. For example, in the case of Kai-Cheng Yang and Filippo Menczer’s paper, it is interesting to note the left-wing bias inherent in LLMs and what effect such biases might have. As many Web users, particularly those of younger generations, default to consulting an LLM chatbot for information and rarely conduct further searches or analysis of sources, what happens to an already polarized society? Likewise, Chau Tong’s paper explored the topic of polarization in search engine results. The DocNet paper by Zhu et al. also provided a good technical exploration of bias detection systems leveraging AI and Python.
Session 7: Online Safety & Policy
Deanna Zarrillo presented “Facilitating Gender Diverse Authorship: A Comparative Analysis of Academic Publishers’ Name Change Policies”, which examines the publicly available name change policies of nine academic journal publishers through thematic content analysis, providing insights into how publishers manage rebranding and transparency during transitions. Tessa Masis presented her work, “Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement”, which examines how the #StopAsianHate movement used multilingual posts and K-pop fan culture to build global solidarity and amplify messages against anti-Asian hate across different countries and communities online. I, Kritika Garg, had the pleasure of presenting our work, “Not Here, Go There: Analyzing Redirection Patterns on the Web”. Our research examined 11 million redirecting URIs to uncover patterns in web redirections and their implications for user experience and web performance. While half of these redirections successfully reached their intended targets, the other half led to various errors or inefficiencies, including some that exceeded recommended hop limits. Notably, the study revealed "sink" URIs, where multiple redirections converge, sometimes used for playful purposes such as Rickrolling. It also highlighted issues like "soft 404" error pages, which cause unnecessary resource consumption. The research provides valuable insights for web developers and archivists aiming to optimize website efficiency and preserve long-term content accessibility.
Mohammad Namvarpour presented the last presentation of the session, “The Evolving Landscape of Youth Online Safety: Insights from News Media Analysis”, which examines how news stories about keeping kids safe online have changed over the past 20 years, showing that recent coverage focuses more on tech companies and government rules. The authors studied news articles to understand how the conversation about youth online safety has evolved.
Session 9: Contemporary Issues in Social Media
The papers from Session 9 explored a wide range of topics across social media from war, news, and even mental health and safety. “A Call to Arms: Automated Methods for Identifying Weapons in Social Media Analysis of Conflict Zones” by Abedin et al. presented an interesting framework for analyzing and tracking weapons of war and ongoing conflicts in active regions of war through social media platforms, such as Telegram. Their work heavily utilized computer vision and open-source datasets and provides a window into the scale and lethality of ongoing conflicts. The paper by Saxena et al., “Understanding Narratives of Trauma in Social Media”, was incredibly valuable in discussing the effects of trauma and social media on mental health.
Web Science Panel
The Web Science panel consisted of Dame Wendy Hall, Dr. Jim Hendler, Dr. Matthew Weber, Dr. Wolf-Tilo Balke, Dr. Marlon Twyman II, and Dr. Oshani Seneviratne. While the panel ran a little over time and not many questions could be asked in session, many were asked at the reception afterwards. It was a treat to hear from some of the key founders of the field of Web Science and core creators of the World Wide Web at large. The panel topics and moderated questions covered a broad range of subjects across the spectrum of Web Science, and it was great to hear the thoughts of such key figures on issues related to social media, AI, political governance, the Semantic Web, and the broad applications of communication and social science to the World Wide Web. Dame Hall and Dr. Hendler also discussed the Web Science Trust, which seeks to advance the field of Web Science and bring together researchers from across the globe.
Web Science Panel responding to attendee questions on a wide range of Web Science topics
Friday, May 23, 2025
Session 10: Platform Governance & User Safety
Session 10 also had a decent variety in terms of content. Two of our favorite papers presented were “Decentralized Discourse: Interaction Dynamics on Mastodon” by Brauweiler et al. and “Is it safe? Analysis of Live Streams Targeted at Kids on Twitch.tv” by Silva et al. Many of the WS-DL members are fans of new, unique, experimental, and decentralized Web tools and social platforms. Some of our members are active in various Mastodon communities and have even run their own instances. It was exciting to hear that some researchers are utilizing Mastodon and other social platforms, and how they tackled many of the technical challenges present among them. Like the work of Saxena et al. from Session 9, the work by Silva et al. on child safety on the popular streaming platform Twitch is of great importance for the health and wellbeing of the younger Web population. They found that Twitch currently has only minimal options in place for marking and filtering adult content, and only for select forms of media; such channels are self-reported as being for an adult audience, not automatically tagged as such. Furthermore, even if content is not marked for an adult audience, or is explicitly marked for kids or a younger audience, there is no guarantee that the language used by the streamer or the topics discussed in chat will be suitable for younger audiences, except through voluntary moderation.
Day 4 at #WebSci25! Final session of the @WebSciConf, “Platform Governance & User Safety,” is happening now!
Dame Wendy Hall’s closing keynote was an excellent look through the history of Artificial Intelligence and its relation to the Web. It served as an excellent reminder that progress is not always constant and that we tend to alternate between periods of uncertainty and rapid progress that can often blind us to potential hazards. It was also a reminder of how much Artificial Intelligence relies on the World Wide Web, its users surfing the waves of hyperspace, and the information they share along the way. The collective information of the Web is what comprises AI; without the input of billions of people around the world, there would be no substance to it. Some other great points from the talk were on the dangers and politics surrounding AI research, development, and utilization, and importantly, how much power and control we allow AI to have in our global society, and global cooperation (or lack thereof) in regards to AI regulation. The points of this keynote were extremely relevant given the simultaneous release of Anthropic’s Claude 4 LLM model, which in testing was found to engage in blackmail, whistleblowing, and other interesting behaviors.
Despite the week’s rainy weather, the conference was well-organized, stimulating, and rewarding. For some, this was a return to a familiar community, while for us it was a valuable first in-person conference experience. The opportunity to exchange ideas with colleagues from industry and academia worldwide was truly worthwhile. The dinner at the Rutgers Club was a fitting conclusion, providing space to connect before departing. With the next conference scheduled for Germany, we look forward to continuing these conversations there. Many thanks to the organizers for putting together an excellent event.
Open Data Editor is evolving into a key companion tool designed to support organisations in the early and critical stages of their AI journey. Today, we are announcing a series of four online meetings to reflect together with our communities on the project's present and future.
People want technology for people. Last week, the Open Knowledge Foundation hosted The Tech People Want Summit, bringing together 34 speakers from 19 countries to rethink how technology supports our work in conversational sessions. This wasn’t a summit about coding, but focused on non-technical professionals – including data practitioners, communicators, project managers, and advocates. Once...
We wanted to add transcripts/captions to these videos, for accessibility to those who are hearing impaired, for searchability of video transcript content, and for general usability. We do not have the resources to do any manual transcription or even really Quality Assurance, but we decided that OpenAI whisper automated transcription software was of sufficient quality to be useful.
We have implemented whisper-produced transcriptions. We use them for on-screen text track captions; for an accompanying on-the-side transcript; and for indexing for searching in our collection.
Putting Scientific Information to Work (1967) (of interest to my librarian colleagues, an early marketing video for ISI (then the “Institute for Scientific Information”) cutting edge citation indexing database. (Go to 31:53 for a weird 60s style recap of important events of the 60s?)
Many of our library/museum/archives peers use the open source Whisper implementation, or a fork/variation of it, and at first I assumed I would do the same. However, we deploy largely on heroku, and I quickly determined that the RAM requirements (at least for medium and above models) and the disk space requirements (a pip install openai-whisper added tens of gigs) were somewhere in between inconvenient and infeasible on the heroku cedar platform, at least for our budget.
This is not exactly the same product as OpenAI whisper, and exactly how it differs is not public. The hosted whisper does not let (or require?) you to choose a model, it just uses whatever it uses. It has fewer options — and in the open source realm, there are forks or techniques with even more options and features, like diarization or attempting to segment multi-lingual recordings by language. With the hosted commercial implementation, you just get what you get.
But on the plus side, it’s of course convenient not to have to provision your own resources. It is priced at $0.006 per minute of source audio, so that’s only around $25 to transcribe our meager 70 hour corpus, no problem, and no problem if we keep adding 70-200 hours of video a year as currently anticipated. If we start adding substantially more, we can reconsider our implementation.
Details of whisper API usage implementation
Whisper hosted API has a maximum filesize of 25 MB. Some of our material is up to two hours in length, and audio tracks simply extracted from this material routinely exceeded this limit. But by using ffmpeg to transcode to the opus encoding in an ogg container, using the opus voip profile optimized for voice, at a 16k bitrate — even 2 hours of video is comfortably under 25MB. This particular encoding was found often recommended on forums, with reports that downsampling audio like this can even result in better whisper results; we did not experiment, but it did seem to perform adequately.
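For illustration, the transcode described above amounts to roughly the following ffmpeg invocation (shown here wrapped in Python, with hypothetical filenames):
import subprocess

subprocess.run([
    "ffmpeg", "-i", "source_video.mp4",  # hypothetical input file
    "-vn",                               # drop the video track
    "-c:a", "libopus",                   # opus codec in an ogg container
    "-b:a", "16k",                       # 16 kbps keeps ~2 hours of audio well under 25MB
    "-application", "voip",              # opus profile optimized for voice
    "audio_for_whisper.ogg",
], check=True)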
Whisper can take a single source language argument — we have metadata already in our system recording the language of source material, so if there is only one listed, we supply that. Whisper can’t really handle multi-lingual content. Almost all of our current video corpus is English only, but we do have one video that is mixed English and Korean, with fairly poor audio quality — the whisper API refused to transcribe that, returning an error message (after a wait). When I tried it with open source whisper just out of curiosity, it did transcribe it, very slowly — but all the Korean passages were transcribed as “hallucinated” English. So erroring out may actually be a favor to us.
You can give Whisper a "prompt" — it's not conversational instructions, but is perhaps treated more like a glossary of words used. We currently give it our existing metadata "description" field, and that resulted in successful transcription of a word that never caught on, "zeugmatography" (the inventor of MRI initially called it that), as well as correct spelling of "Eleuthère Irénée". If it's really just a glossary, we might do even better by taking all metadata fields and listing each unique word once (or even focusing on less common words). But for now the description as-is works well.
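To make that concrete, here is a hedged sketch of such an API call using OpenAI's Python SDK. Again, our real code is Ruby; the metadata value, file name, and choice of the "vtt" response format are assumptions for illustration.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical description field pulled from the catalog record for this item.
item_description = "Paul Lauterbur discusses zeugmatography, his early name for MRI."

with open("two_hour_lecture.ogg", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",         # the hosted Whisper model identifier
        file=audio_file,
        language="en",             # supplied only when exactly one language is cataloged
        prompt=item_description,   # treated more like a glossary than instructions
        response_format="vtt",     # ask for WebVTT output, ready to serve as captions
    )

print(transcript)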
I had at one point wanted to stream my audio, stored on S3, directly to an HTTP POST to the API, without having to download the whole thing to a local temporary copy first. But Ruby's lack of a clear contract/API/shape for a "stream" object strikes again, making interoperability painful. This fairly simple incompatibility was just the first of many I encountered; patching this one locally just led me to the next one, etc. One of my biggest annoyances in Ruby, honestly!
Results?
As others have found, the results of Whisper are quite good, better than any other automated tool our staff had experimented with, and we think the benefits to research and accessibility remain despite what errors do exist. There isn't much to say about all the things it gets right; by listing only the things it gets wrong you might get the wrong idea, but it really does work quite well.
As mentioned, it can't really handle multi-lingual content.
Errors and hallucinations were certainly noticed. In one case it accurately transcribed a musical passage as simply ♪, but oddly labelled it as “Dance of the Sugar Plum Fairies” (it was not). An audience clapping was transcribed as repeated utterances of “ok”. This example might be more troubling: some totally imaginary dialog replacing what is pretty unintelligible dialog in the original.
There are wide differences in how long the cues are, although cue length is consistent within a piece: some pieces are transcribed with long paragraph-sized cues, others just phrase by phrase. I am considering post-processing to join tiny phrase cues into sentences, up to a maximum number of words (see the sketch just after these notes).
It not infrequently, well into a video, starts losing timing synchronization, getting 5, 10, or even 15 seconds behind. This is weird and I haven't seen it commented upon before. The text is still as correct as ever, so it's mostly an inconvenience. See for instance 9:09 in Baseline: The Chemist, definitely annoying. By 10:23 it has caught up again, but it quickly falls behind again, etc.
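Here is the rough cue-joining post-processing mentioned above, as an illustrative Python sketch. It operates on simple (start, end, text) tuples rather than a real WebVTT parser, and the word budget is an arbitrary assumption.

def merge_cues(cues, max_words=40):
    # Join adjacent short cues until one ends in sentence punctuation
    # or the accumulated text hits the word budget.
    merged, buffer = [], None
    for start, end, text in cues:
        if buffer is None:
            buffer = [start, end, text]
        else:
            buffer[1] = end
            buffer[2] = f"{buffer[2]} {text}"
        ends_sentence = buffer[2].rstrip().endswith((".", "?", "!"))
        too_long = len(buffer[2].split()) >= max_words
        if ends_sentence or too_long:
            merged.append(tuple(buffer))
            buffer = None
    if buffer is not None:
        merged.append(tuple(buffer))
    return merged

cues = [
    ("00:00:01.000", "00:00:02.200", "So today we are going"),
    ("00:00:02.200", "00:00:03.900", "to talk about citation indexing."),
]
print(merge_cues(cues))
# [('00:00:01.000', '00:00:03.900', 'So today we are going to talk about citation indexing.')]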
We don't really have the resources to QA even our fairly small collection, so we are choosing to follow in the footsteps of WGBH and their American Archive of Public Broadcasting, and publish anyway, with a warning influenced by theirs.
I think in the post-pandemic zoom world, most users are used to automatically generated captions and all their errors, and understand the deal.
WGBH digitizes around 37K items a year, far more than we do. They also run an instance of FixIt+ for crowd-sourced transcription corrections contributed by the public. While I believe FixIt+ is open source (or at least a much older version of it is?), and some other institutions may run it, we don't think we'd get enough public attention given our small number of videos, and we can't really afford to stand up our own FixIt+ even if it is available. But it does seem like there is an unfilled need for someone to run a shared, crowd-hosted FixIt+ and charge a reasonable rate to institutions that will only need a handful of corrections a year.
We did implement an admin feature to allow upload of corrected WebVTT, which will be used in preference to the direct ASR (Automated Speech Recognition) ones. As we don't anticipate this being done in bulk, right now staff just download the ASR WebVTT, use the software of their choice to edit it, and then upload a corrected version. This can be done for egregious errors as noticed, or using whatever policy/workflow our archival team thinks appropriate. We also have an admin feature to disable transcription for material it does not work well for, such as multi-lingual or silent videos, or other problem cases.
Text Track Captions on Video
We were already using video.js for our video display. It provides APIs based on the HTML5 video APIs, in some cases polyfilling/ponyfilling, in some cases just delegating to the underlying APIs. It has good support for text tracks. At present, by default it uses 'native' text tracks instead of its own emulated implementation on Safari (and maybe only there?) — you can force emulated text tracks, but it seemed advisable to stick with the default native behavior. This does mean it's important to test on multiple browsers; there were some differences in Safari that required workarounds (more below).
So, for text tracks we simply provide a WebVTT file in a <track> element under the <video> element. Auto-generated captions (ASR, or "Automated Speech Recognition", compare to OCR) don't quite fit the existing categories of "captions" vs "subtitles" — we label them as kind captions and give them an English label "Auto-captions", which we think/hope is a common short name for these.
Safari adding extra “Untitled” track for untagged HLS
For the most part, this just works, but there was one idiosyncrasy that took me a while to diagnose and find an appropriate fix for. We deliver our video as HLS with an .m3u8 playlist. There is a newer metadata element in the .m3u8 playlist that can label the presence or absence of subtitles embedded in the HLS. But in the absence of this metadata, Safari (both macOS and iOS, I believe) insists on adding a text caption track called "Untitled", which in our case will be blank. This has been noticed by others, but with less discussion on the internet than I'd expect, to be honest!
One solution would be adding metadata saying no text track is present embedded in the HLS (since we want to deliver text tracks externally in a <track> element instead). Somewhat astoundingly, simply embedding an EXT-X-MEDIA tag with a fixed static value of CLOSED-CAPTIONS=NONE on AWS Elemental MediaConvert (which I use) seems to take you into the "Professional Tier", costing 60% more! I suppose you could manually post-process the .m3u8 manifests yourself… including my existing ones…
Instead, our solution is simply, when on Safari, to hook into events on the video element and remove any text track with an empty-string language and title, which is what characterizes these. I adapted this from a similar solution in ramp, which chose the same direction. They wrote theirs to apply to "mobile which is not android"; I found it was actually needed on Safari (both iOS and macOS), and indeed not on Android Chrome (or iOS Chrome!).
I lost at least a few days figuring out what was going on here and how to fix it, hopefully you, dear reader, won’t have to!
Synchronized Transcript on page next to video
In addition to the text track caption in the video player, I wanted to display a synchronized transcript on the page next to/near the video. It should let you scroll through the transcript independent of the video, and click on a timestamp to jump there.
Unsure how best to fit this on the screen, and with what UX, I decided to look at YouTube and base my design on what they did. (On YouTube, you need to expand the description and look for a "show transcript" button at the bottom of it — I did make my 'show transcript' button easier to find!)
It shows up next to the video, or on a narrow screen right below it, in a 'window in window' internal scrolling box. I used some CSS to try to make the video and the transcript fit wholly on the screen at any screen size — an inner scrolling window that's taller than the parent window is a UX nightmare I wanted to avoid!
Looking at YouTube, I realized the feature that highlights the current cue as the video plays was also one I wanted to copy. That was the trickiest thing to implement.
I ended up using the HTML5 media element API and the events emitted by it and its associated child objects, based on the text track with cues I had already loaded into my video.js-enhanced HTML5 video player. I can let the browser track cue changes and listen for events when they change, in order to highlight the current cue.
If a track is set to mode hidden, the user agent will still track the text cues and emit events when they change, even though they aren't displayed. Video.js (and probably native players) by default has UI that toggles between shown and disabled (which does not track cue changes), so I had to write a bit of custom code to switch non-selected text tracks to hidden instead of disabled.
(Some browsers and/or video.js polyfill code may have been emitting cueChange events even on disabled tracks, contrary to or not required by spec — important to test on all browsers!)
At first I didn't realize I could use the user agent's own cue-tracking code, and was trying to catch every timeupdate event and calculate which cues included that timestamp myself. In addition to being way more work than required (the HTML5 video API has this feature for you to use), Safari wasn't emitting timeupdate events unless the status bar with the current time was actually on screen!
In general, the media element API and events seemed to be an area with, for 2025, an unusual level of differences between browsers — or at least between more-native Safari and more-emulated video.js in other browsers. It definitely is important to do lots of cross-browser testing. While I use it rarely, when I do I couldn't do without BrowserStack and its free offerings for open source.
Improved Video Controls
The default video.js control bar has, to my eye, undesirably small buttons and text, and is just not quite right in several ways. And there don't seem to be very many alternative open source themes or skins (video.js seems to use both words for this), and the ones that do exist often push "interesting" aesthetics instead of being neutral/universal.
Adding the caption button was squeezing the default control bar tight, especially on small screens. With that and the increased attention to our videos that transcripts would bring, we decided to generally improve the UX of the controls, but in a neutral way that was still generic and non-branded. Again, I was guided by both youtube and the ramp player (here’s one ramp example), and also helped by ramp’s implementation (although beware some skin/theme elements are dispersed in other CSS too, not all in this file).
Before (default video.js theme)
After (locally tweaked)
Scrubber/progress bar extends all the way across the screen, above the control bar (à la YouTube and ramp)
Making sure captions stay above the now-higher controls was tricky. I think this approach using translateY works pretty well, but I hadn't seen it done before? It also required a bit of Safari-specific CSS for Safari's "native text tracks". And a slide up/down animation on control bar show/hide, matching YouTube, is a nice touch.
Buttons split between right and left, again like both YouTube and ramp. Volume on the right only because it was somewhat easier.
Current time showing as current / total instead of the default elapsed-only, now matching YouTube and what some of our users asked for. (Default video.js has some weird spacing that you have to trim down once you show both current and total.)
Newer CSS @container queries make buttons smaller and/or remove some buttons when the screen is smaller (though I had some weird problems with this actually crashing the video player in my actual markup).
While these are fairly minor changes, I think the result is a much better look and usability for a general-purpose neutral theme/skin than video.js ships with out of the box. While relatively simple, it still took me a week or so to work through.
If there's interest, I could find time to polish it up further and release it as a more easily re-usable open source product; let me know!
Google indexable transcripts
One of the most exciting things about adding transcripts for our videos is that the text is now searchable and discoverable in our own web app.
It would be awfully nice if Google would index it too, so people could find otherwise hidden mentions of things they might want in videos. In the past, I've had trouble getting Google to index other kinds of transcripts and item text like OCR. While hypothetically Google crawls with JavaScript enabled and can click on things like tabs or disclosure "show" buttons, conventional wisdom seems to be that Google doesn't like to index things that aren't on the initial page and require a click to see — which matches my experience, although others have had other experiences.
In an attempt to see if I could get Google to index it, I made a separate page with just transcript text — it links back to the main item page (with video player), and even offers clickable timecodes that link back to the player at that time. This transcript-only page is the href on the "Show Transcript" button, although a normal human user clicking that link will ordinarily get JS that shows the transcript on the same page instead; you can right-click "open in new tab" to get the standalone page if you want. These extra transcript pages are also listed in my sitemap.
That doesn't say anything about how much SEO juice they have, but the first step is getting them in the index, which I had trouble doing before with content that required a tab or 'show' click to be visible. So we'll keep an eye on it! Of course, another option is putting the transcript on the page right from the start without requiring a click to show it, but I'm not sure that really serves the user.
We also marked up our item pages with schema.org content for video, including tags around the transcript text (which is initially in the DOM, but requires a 'show transcript' click to be visible). I honestly would not expect this to do much for increasing indexing of transcripts… I think according to Google this is intended to give you a "rich snippet" for video (but not to change indexing)… but some people think Google doesn't do too much of that anyway, and to have any chance I'd probably have to provide a persistent link to the video as a contentUrl, which I don't really do. Or maybe it could make my content show up in Google "Video" tab results… but no luck there yet either. Honestly I don't think this is going to do much of anything, but it shouldn't hurt.
Acknowledgements
Thanks to colleagues in Code4Lib and Samvera community slack chats, for sharing their prior experiences with whisper and with video transcripts — and releasing open source code that can be used as a reference — so I didn’t have to spend my time rediscovering what they already had!
Especially generous were Mason Ballengee and Dananji Withana who work on the ramp project. And much thanks to Ryan “Harpo” Harbert for two sequential years of Code4Lib conference presentations on whisper use at WGBH (2024 video, 2025 video), and also Emily Lynema for a 2025 whisper talk.
I hope I have helped pass on a portion of their generosity by trying to share all this stuff above to keep others from having to re-discover it!
Fig. 1 from Rawte et al.: Taxonomy for Hallucination in Large Foundation Models
The rise of large language models (LLMs) has brought about accelerated advances in natural language processing (NLP), enabling powerful results in text generation, comprehension, and reasoning. However, alongside these advancements comes a persistent and critical issue: hallucination. Defined as the generation of content that deviates from factual accuracy or the provided input, hallucination presents a multifaceted challenge with implications across various domains, from journalism to healthcare. This blog post presents insights from three recent comprehensive surveys on hallucination in natural language generation (NLG) and foundation models to provide an understanding of the problem, its causes, and ongoing mitigation efforts.
"Survey of Hallucination in Natural Language Generation" by Ji et al. (2022) provides a foundational exploration of hallucination in various NLG tasks, including abstractive summarization, dialogue generation, and machine translation. It defines key terminologies, identifies contributors to hallucination, and outlines metrics and mitigation strategies, making it an essential reference for understanding the broad implications of hallucination across NLP applications.
"A Survey of Hallucination in 'Large' Foundation Models" by Rawte et al. (2023) extends the discussion of hallucination into the realm of large foundation models (LFMs), such as GPT-3 and Stable Diffusion. This paper categorizes hallucination phenomena across modalities (text, image, video, audio), evaluates current detection and mitigation efforts, and highlights the challenges posed by the inherent complexity and scale of LFMs.
Finally, in "A Survey on Hallucination in Large Language Models", Huang et al. (2024) offer an in-depth examination of hallucination specific to LLMs, introducing a nuanced taxonomy that includes factuality and faithfulness hallucinations. This work delves into the causes of hallucination during data preparation, training, and inference, while also addressing retrieval-augmented generation (RAG) and future directions for research.
Hallucination in Language Models
WSDL member Hussam Hallak recently wrote a three-part blog post about LLM hallucinations in the Quran which does an excellent job of illustrating ways that LLMs can introduce problematic, erroneous information into their responses. Part 1 shows how Google Gemini hallucinates information. Part 2 compares Google Gemini's hallucinations with ChatGPT's. Finally, part 3 reviews DeepSeek and how it performs on the same prompts.
Hallucination in LLMs can be categorized into two primary types:
Intrinsic Hallucination: This occurs when the generated content contradicts the source input. For instance, in machine translation, a model might produce translations that directly conflict with the original text.
Extrinsic Hallucination: This involves the generation of unverifiable or fabricated content not grounded in the input. While such outputs might sometimes provide helpful additional context, they frequently introduce inaccuracies, posing risks in applications demanding high factual fidelity.
Table 1 from Ji et al., showing various types of hallucination in generated text relative to the source data
Fig. 2 from Ji et al.: Examples of hallucination in image captioning
Figure 2 from Ji et al.'s survey provides both intrinsic and extrinsic examples of object hallucination in image captioning, showcasing the challenges in multimodal tasks. For instance, when describing an image, a model might generate captions referencing objects that are not present in the visual content. This serves as a concrete illustration of how hallucination manifests beyond textual outputs, underscoring the need for robust grounding techniques in multimodal systems.
Recent research extends these definitions to include distinctions between factuality hallucinations (deviation from real-world facts) and faithfulness hallucinations (deviation from user instructions or input context).
Causes of Hallucination
Hallucination arises from multiple interconnected factors throughout the lifecycle of LLMs, encompassing data, training, and inference processes.
Data-Related Causes:
Source-Reference Divergence: According to Ji et al., in tasks like abstractive summarization, the reference text might include information absent from the source, leading models to generate outputs based on incomplete grounding.
Misinformation and Bias: Both Rawte et al. and Huang et al. explain that training datasets often contain erroneous or biased information, which models inadvertently learn and reproduce.
Training-Related Causes:
Pretraining Limitations: During pretraining, models optimize for next-token prediction, potentially leading to overgeneralization or memorization of inaccuracies.
Fine-Tuning Misalignment: Misaligned training objectives during supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF) can exacerbate hallucination tendencies.
Inference-Related Causes:
Imperfect Decoding Strategies: Techniques such as beam search may prioritize fluency over factuality, amplifying hallucination.
Overconfidence: Models often assign high probabilities to fabricated outputs, further complicating detection and mitigation.
Metrics for Evaluating Hallucination
Evaluating hallucination requires a nuanced approach:
Statistical Metrics: These include precision and recall for factuality and alignment.
Model-Based Metrics: Leveraging pretrained models to assess the veracity of generated content.
Human Evaluation: Employing expert annotators to identify and classify hallucinatory outputs, though this is resource-intensive.
Recent efforts also emphasize the development of task-specific benchmarks, such as TruthfulQA and HaluEval, which test models against curated datasets designed to probe their factual consistency and faithfulness. These benchmarks are instrumental in driving standardized comparisons across models.
Mitigation Strategies
Efforts to mitigate hallucination span data curation, training modifications, and inference adjustments:
Data-Centric Approaches:
Data Filtering: Removing noisy or biased samples from training datasets.
Augmentation with External Knowledge: Incorporating structured knowledge bases or retrieval-augmented generation (RAG) to ground outputs in verifiable facts.
One notable example of augmentation is the use of retrieval mechanisms that fetch relevant external documents during inference, providing models with updated and accurate information to support generation.
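As a rough illustration of that retrieval step (not code from any of the surveyed papers), the sketch below retrieves the documents most similar to a question and prepends them to the prompt. Here embed() and llm_generate() are hypothetical stand-ins for an embedding model and an LLM call, and corpus is a toy in-memory document store.

import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question, corpus, embed, k=3):
    # Rank documents by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def answer(question, corpus, embed, llm_generate):
    # Ground the model's answer in the retrieved context.
    context = "\n\n".join(retrieve(question, corpus, embed))
    prompt = (
        "Answer using only the context below. If the context does not "
        "contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_generate(prompt)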
Training Enhancements:
Faithfulness Objectives: Adjusting training loss functions to prioritize adherence to source content.
Contrastive Learning: Encouraging models to distinguish between grounded and ungrounded content.
Knowledge Injection: Embedding domain-specific or real-time data updates into the model during training to reduce reliance on potentially outdated pretraining data.
Inference Techniques:
Iterative Prompting: Refining outputs through step-by-step guidance.
Factuality-Enhanced Decoding: Modifying decoding algorithms to prioritize accuracy over fluency.
Self-Verification Mechanisms: Employing models to cross-check their outputs against trusted sources or re-evaluate their answers iteratively.
Applications and Implications
The implications of hallucination extend far beyond theoretical concerns and carry real-world consequences. One such case happened when Air Canada's customer support chatbot hallucinated a policy that doesn't exist. When a passenger asked Air Canada's website chatbot about bereavement fares, the bot invented a rule saying he could buy the ticket at full price and claim the discount retroactively within 90 days. After the trip, the airline refused the refund, only to have British Columbia's Civil Resolution Tribunal order it to pay C$650 (plus interest and fees) and reject Air Canada's claim that the chatbot was "a separate legal entity." One stray hallucination turned into real legal and financial liability, illustrating why faithful generation matters.
Another case arose in Mata v. Avianca, Inc., a personal injury suit in the U.S. District Court for the Southern District of New York. Two attorneys used ChatGPT to draft part of a motion, and it hallucinated six precedent cases that never existed. U.S. District Judge P. Kevin Castel spotted the fabrications, dismissed the citations as "gibberish," and fined the attorneys and their firm Levidow, Levidow & Oberman $5,000 for "conscious avoidance and false statements." The sanction shows how un-vetted AI hallucinations can turn into real ethical and financial consequences.
These real-world examples illustrate how, in domains like healthcare, hallucination can lead to life-threatening misinterpretations of medical advice. In journalism, it risks eroding public trust by disseminating misinformation. Similarly, legal contexts demand utmost accuracy, where any hallucinatory output could have grave consequences.
Fig. 3 from Ji et al.: Examples of hallucination in visual question answering. The bold text is the output generated by the model; the part before it is the input prompt.
Figure 3 further elaborates on this by depicting scenarios where hallucinated captions create misleading narratives about visual data. These examples highlight the significant risks in real-world applications like autonomous systems or assistive technologies, emphasizing the urgency for improved evaluation and mitigation strategies.
Fig. 5 from Rawte et al.: A video featuring three captions generated by various captioning models, with factual errors highlighted in red italics.
Figure 5 from “A Survey of Hallucination in “Large” Foundation Models” adds another layer to this discussion by presenting video captioning outputs with factual errors highlighted in red italics. These examples reveal the complexities of grounding outputs in video data, where temporal and contextual nuances often exacerbate hallucination issues. They underscore the necessity for domain-specific training and evaluation techniques in applications like autonomous systems or assistive technologies.
However, hallucination is not universally detrimental. In creative applications like storytelling, art generation, or even protein discovery, "controlled hallucination" may enhance innovation by producing imaginative and novel content. Balancing these use cases with robust mitigation strategies is critical.
Future Directions
Dynamic Evaluation: Developing adaptive evaluation frameworks that account for diverse tasks and domains.
Contextual Understanding: Enhancing models’ ability to reconcile disparate contexts and avoid contradictory outputs.
Knowledge Boundaries: Establishing clear delineations of models’ knowledge limitations to improve trustworthiness.
Explainability and Transparency: Understanding why hallucinations occur and providing users with insights into the reasoning behind model outputs.
Cross-Modal Hallucination: Exploring hallucination across modalities, such as text-image generation systems, to develop unified mitigation strategies.
Conclusions
Addressing hallucination in LLMs is paramount as these models become integral to decision-making systems in high-stakes domains. By tackling the root causes through better data practices, refined training paradigms, and innovative inference strategies, the NLP community can enhance the reliability and trustworthiness of LLMs. Collaborative, interdisciplinary efforts will be essential to navigate the complexities of hallucination and unlock the full potential of generative AI.
Moreover, as LLMs continue to evolve, fostering ethical and responsible AI practices will be crucial. Researchers and developers must work together to ensure that these powerful tools serve as reliable partners in human endeavors, minimizing risks while amplifying benefits for society at large.
Jim
Works Cited
Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Chen, Delong; Dai, Wenliang; Shu Chan, Ho; Madotto, Andrea; Fung, Pascale. "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys, vol. 1, no. 1, Feb. 2022. DOI: https://doi.org/10.48550/arXiv.2202.03629.
Rawte, Vipula; Sheth, Amit; Das, Amitava. "A Survey of Hallucination in 'Large' Foundation Models." arXiv preprint, Sept. 2023. DOI: https://doi.org/10.48550/arXiv.2309.05922.
Huang, Lei; Yu, Weijiang; Ma, Weitao; Zhong, Weihong; Feng, Zhangyin; Wang, Haotian; Chen, Qianglong; Peng, Weihua; Feng, Xiaocheng; Qin, Bing; Liu, Ting. "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions." ACM Transactions on Information Systems, Jan. 2024. DOI: https://doi.org/10.48550/arXiv.2311.05232.
The infrastructure researcher and CSO at Nym joins us for the seventeenth #OKFN100, a series of conversations with over 100 people about the challenges and opportunities facing the open movement.
#ODDStories – USTP YouthMappers conducted a workshop seminar to celebrate Open Data Day, focused on QGIS as a spatial analysis and visualisation tool for disaster preparedness.
#ODDStories – In their Open Data Day event, Spatial Girls Network empowered university students with geoskills and knowledge in open data and open-source data collection tools.
TinyCat District is here! Libraries with multiple locations or branches can now manage their accounts in one place. We created new tools and features for TinyCat District that will simplify your setup, billing, and reporting.
TinyCat District offers:
Synced collections. Copy your catalog across branches with a few clicks.
Shared branding. Customize your brand across branches.
Group billing. Simplify your payment process.
Grouped reports. Review information about your district and individual branches.
Each TinyCat District “superadmin” can manage the source collections, branch memberships, grouped reports, shared branding, and billing across the district.
You can choose your billing setup. With regular “branch billing,” each library branch pays for their own account. If you choose “group billing,” your district’s superadmin can submit a single payment for all branches. Accounts with group billing may be eligible for a discounted rate!
Currently, TinyCat District is offered to libraries with five or more branches. If your library is close to five branches but not quite there, reach out anyway to see if you’re eligible!
Why is it that these two worlds have similar values but barely seem to connect? While both aim to promote free access to information, they seem to do so through different types of people associated with the separate movements.
Since the public debut of ChatGPT in November 2022, the calls for librarians to adopt and promote generative AI (GenAI) technologies and to teach “AI literacy” have become part of everyday work life. For instruction librarians with reservations about encouraging widespread GenAI use, these calls have become harder to sidestep as GenAI technologies are rapidly integrated into search tools of all types, including those that libraries pay to access. In this article, I explore the dissonance between, on the one hand, instruction librarians’ pedagogical goals and professional values and, on the other, the capacities, limitations, and costs of GenAI tools. Examining discourse on GenAI and AI literacy, I pay particular attention to messages we hear about the appropriate ways to think and feel about GenAI. These “feeling rules” often stand in the way of honest and constructive dialogue and collective decision making. Ultimately, I consider work from within and outside librarianship that offers another view: that we can slow down, look honestly at GenAI capacities and harms, take seriously the choice some librarians may make to limit their GenAI use, and collectively explore the kinds of futures we want for our libraries, our students, fellow educators, and ourselves.
At the April 2025 Association of College & Research Libraries Conference, academic library workers gathered in person and online to explore the theme “Democratizing Knowledge + Access + Opportunity.” Before sessions about how to integrate generative AI (GenAI) tools into essential public services like teaching and research services, sociologist and professor of African American Studies Ruha Benjamin offered the opening keynote. Articulating the resonance of the conference theme for her, Benjamin reflected, “One way to understand the stakes of this conference, … why it’s so vital that we work in earnest to democratize knowledge, access, and opportunity at a moment when powerful forces are working overtime to monopolize, control, and ration these social goods, is that this is a battle over who gets to own the future, which is also a battle over who gets to think their own thoughts, who gets to speak and express themselves freely, and ultimately who gets to create” (Benjamin, 2025). Noting that technologies are never neutral but rather reflect “the values or lack thereof of their creators,” Benjamin drew a connection between current attacks on libraries and higher education and a category of technology that was prominent throughout the conference program: artificial intelligence. “[I]t should give us pause,” she asserted, “that some of the same people hyping AI as the solution to all of our problems are often the ones causing those problems to begin with.” Applause followed.
Though Benjamin did not name the prominence of AI across conference sessions, I was probably not the only person to notice the contrast between Benjamin’s critique of AI hype and the prevalence of conference sessions about promoting AI technologies and AI literacy in libraries.
As Benjamin continued, she turned to the chilling words of JD Vance at the 2025 Paris AI Summit: “Our schools will teach students how to manage, how to supervise, and how to interact with AI-enabled tools as they become more and more a part of our everyday lives.” As I listened, I thought these words could easily be mistaken as part of a talk on AI literacy by an academic librarian or educator with better intentions. I wondered how many others were thinking the same thing. Benjamin then reminded the audience of Vance’s ideological perspective, as she observed that in January 2021 Vance gave a speech at the National Conservatism Conference entitled, “The Universities are the Enemy,” in which he argued that universities must be aggressively attacked to accomplish his and his audience’s goals for the country (Vance, 2021).
It’s worth taking a brief step away from Benjamin’s keynote to point out that a couple of weeks after her talk, on April 23, President Donald Trump issued an executive order to promote AI literacy through a new White House Task Force on AI Education that will “establish public-private partnerships to provide resources for K-12 AI education, both to enhance AI-related education but also to better utilize AI tools in education generally.” The executive order’s Fact Sheet states that “AI is rapidly transforming the modern world, driving innovation, enhancing productivity, and reshaping how we live and work.” Thus, “[e]arly training in AI will demystify this technology and prepare America’s students to be confident participants in the AI-assisted workforce, propelling our nation to new heights of scientific and economic achievement” (The White House, 2025). This laudatory language about AI is perhaps unsurprising for an administration that established the Department of Government Efficiency (DOGE). DOGE purportedly aims to reduce “government waste, fraud, and abuse,” largely through eliminating government jobs and replacing workers with a combination of automation and tech workers who have been directed to violate digital privacy rights and regulations (Klein, 2025; Salvaggio, 2025).
What is perhaps more striking is the similarity between the White House’s rhetoric and that of many educators in universities and academic libraries. Can the difficulty of distinguishing between the dominant AI rhetoric in higher education and that from political leaders who have explicitly named universities as the enemy be a wake-up call for people in higher education and in libraries, a message that we need to give more weight to the ethical concerns surrounding GenAI technologies?[1]
Benjamin did not dwell long in her ACRL keynote on Vance’s vision for GenAI and AI literacy. Instead, she devoted most of her time to exploring imagination as a powerful means through which to envision the kinds of worlds we want to live in and to begin building. As she noted, imagination can envision dystopian futures, but it can also open more hopeful possibilities for the futures we want. “What if we took imagination seriously?” she asked. “Not as flights of fancy, but imagination as a resource, a capacity, a muscle? How might the powers of our collective imagination begin to transform the world around us?” (Benjamin, 2025). Here Benjamin articulated what I believe many in the academic library community have been thinking and feeling in the last few years, as pressure to integrate GenAI tools into library systems and library work has intensified, often accompanied by brief and perfunctory acknowledgements of GenAI’s present and potential harms that are then set aside.
Benjamin was inviting us to imagine alternatives to the narrative that GenAI technologies are the inevitable future of nearly all intellectual work. As I will explore, this process of imagining can include critically examining discourses about GenAI and AI literacy, as well as being curious about and attentive to our own affective experiences in response to GenAI technologies and discourses about them. If we accept this invitation to imagine, we might (re)discover what becomes out of view when so much of our attention is focused on a particular vision of the future of GenAI proliferation. We might widen our ideas of what is possible and nurture some sense of collective agency to work for the kinds of futures we want.
Of course, our individual imaginings and feelings don’t always match what a majority (real or imagined) appear to share. My own conceptions of, approaches to, and feelings about GenAI and AI literacy usually seem out of sync with the dominant discourse in higher education and librarianship (though with time I learned I have some company). Like many others, I am deeply concerned about the real and present costs of GenAI technologies that are rapidly being integrated into search and library tools. I am also unsettled by widespread overconfidence in these technologies’ abilities to generate mostly reliable information and to support research and learning. Both as a librarian and more recently as a professor of practice, I have struggled with how to understand and respond to the enthusiastic calls in higher education and academic librarianship for teaching a version of AI literacy which requires that educators and students use these tools, while giving limited attention to ethical questions surrounding GenAI. So often calls to teach AI literacy contribute to AI hype by misrepresenting GenAI’s capacities, minimizing acknowledgement of its harms, and implying that critique of GenAI stands in the way of human progress. We frequently hear that GenAI technologies are the inevitable future of the world and of libraries and that the only viable option is to embrace them, and quickly, before we fall behind. This message of urgency fits into an older narrative that libraries must embrace technological change or otherwise become obsolete (Birdsall, 2001; Glassman, 2017; Espinel and Tewell, 2023).
Through this article, I hope to further encourage what Benjamin recommends: rather than rushing to adopt and promote new technologies whose ethical implications raise major questions, we might slow down and claim more time and space for considering the present and potential implications of GenAI adoption and use. That time and space is necessary for the more expansive collective imagining that Benjamin proposes, imagining that takes into consideration the power and social structures that currently exist and those that we want to exist.
Of course, what we imagine to be desirable or possible is heavily shaped by our environments, social relationships and interactions, and the ideas and messages we encounter every day. Making space for imagination therefore also means making space for individual and collective inquiry that includes establishing agreed-upon facts about GenAI, rhetorical analysis of GenAI discourses, and critical reflection on our own thoughts and feelings about GenAI technologies and discourses. Being inclusive and expansive in imagining the futures we want also requires investigating the social expectations and pressures that often influence what we do and do not say in various professional circles.
With these things in mind, in this article I consider the dissonances between what we know about the limitations and harms of GenAI technologies and the imperatives we hear to adopt them. Can we reconcile the tensions between, on the one hand, the harms of GenAI technologies and, on the other, professional values like those articulated in the ALA Core Values of Librarianship, which include equity, intellectual freedom and privacy, public good, and sustainability (American Library Association, 2024)? And if we could magically resolve many of those tensions through a radically transformed AI infrastructure that is environmentally sustainable and does not depend on the exploitation of human labor, what might we lose when we offload cognitive tasks like searching for, selecting, reading, or synthesizing sources to GenAI technologies? What do we value about education and information literacy practices that need to be preserved with foresight and intention? Because my work as a librarian and as an educator centers on teaching and learning, I am especially interested in how we conceptualize and approach teaching what is often called AI literacy.
A necessary step in this process of imagining is investigating the messages embedded in much of academic and library discourse about GenAI technologies and the appropriate ways to think and feel about them (what sociologist Arlie Hochschild might call “feeling rules”). For instruction librarians, this process includes examining conceptions and framings of AI literacy and its role in information literacy education. A critical analysis of this discourse can help us open conversations about what we want information literacy instruction and library search tools to look like and do. This inquiry can also help us identify ways we have choice and agency in our own use of and teaching about GenAI tools. After an initial consideration of the feeling rules of GenAI and dominant discourse on AI literacy, I finally consider alternative ways to think about GenAI and to respond to calls for widespread adoption. Looking to work from within and outside librarianship, I consider another view: that we can slow down; take time to look honestly and critically at what we know, think, and feel about GenAI and its impacts; and consider ways to work toward the kinds of futures that align with our professional values. Part of this process is allowing space for more critical and skeptical perspectives on and feelings about GenAI, including nuanced arguments for AI refusal, a term I unpack in more detail later.
Feeling Rules
An impetus for my writing is that I see in much of our professional discourse and interactions a tendency to dismiss or minimize critiques of GenAI technologies, and sometimes even a shaming of those “Luddites” who do not embrace the technological changes of the day.[2] As I have argued elsewhere and further consider in this article, an especially powerful strategy for shutting down critique of GenAI is the construction and imposition of “feeling rules”: social expectations about the appropriate ways to feel and display emotion in a given context (Hochschild, 1979, 1983; Baer, 2025).
Feeling rules, as first described by sociologist Arlie Hochschild, are social norms that prescribe what feelings are and are not appropriate to have and express (Hochschild, 1979). Though feeling rules are not confined to the workplace, they are a powerful part of the emotional labor we do in our places of employment (Hochschild, 1983).[3] Feeling rules are typically discussed in the context of specific moments of social interaction among individuals, while in this article I apply them to our social relationships on both a micro- and a macro-level – that is, as evident not only in discrete individual social interactions but also in discourse about GenAI technologies that is informed by social relationships.
While feeling rules are usually established by those in positions of power, they are often internalized by those for whom the feeling rules are intended (Hochschild, 1979, 1983). In the case of GenAI, messages that librarians should be enthusiastic and optimistic about technological changes, which are frequently described as inevitable, often imply or outright assert that those who question or resist certain technological developments are simply overwhelmed by irrational fear and anxiety that they need to overcome. Giving too much attention to those unpleasant emotions or their underlying thoughts, the discourse often goes, risks making the profession obsolete.
Many of the feeling rules we observe in librarianship or higher education, of course, are influenced by social conditions and norms that extend beyond them. Search for the term “AI anxiety” and you will find articles explaining it as a psychological condition to be overcome by integrating AI technologies into your everyday life (Comer, 2023; Cox, 2023; Okamoto, 2023). The antidote to AI anxiety, according to its experts: accept and embrace the technology. For example, in the BBC article “AI Anxiety: The Workers who Fear Losing their Jobs to AI,” PricewaterhouseCoopers (PwC) Global AI and Innovation Technology Leader, Scott Likens, explains, “In order to feel less anxious about the rapid adoption of AI, employees must lean into the technology. … Instead of shying away from AI, employees should plan to embrace and educate” (Cox, 2023).
But what if emotional responses like “AI anxiety” are in large part deeply intelligent, a recognition of the unsettling facts about how most GenAI tools currently are built and deployed and what harmful impacts they already have? What if the cognitive dissonance that many of us experience when reading articles about AI anxiety or the necessity of AI adoption is worth our attention and curiosity? There is a stark mismatch between, on the one hand, imperatives to rapidly adopt and promote GenAI technologies and, on the other, the extensive documentation of the unethical labor practices upon which GenAI is built, as well as GenAI’s detrimental impacts on the environment, local communities, and society more broadly (Bender et al., 2021; Crawford, 2024; Electronic Privacy Information Center, 2023; Nguyen & Mateescu, 2024; Shelby et al., 2023). Despite this, librarians who are reluctant to adopt GenAI are frequently described as regressive and even harmful to a profession that must adapt to remain relevant. This shaming closes off open dialogue and critical thought.
For many librarians who teach, the calls to adopt GenAI, promote its use, and teach a kind of AI literacy that encourages others to do the same add to this dissonance. We repeatedly hear that GenAI is the future of work and the university and that we must therefore embrace it in our own work and teaching, regardless of our own views. Projects and initiatives at our places of employment and in our professional associations urge us to use these tools robustly, partly so we can help students, faculty, and community members keep up and succeed in our ever-changing world. Library vendors and the technology companies that libraries and universities pay for services and subscriptions continue to integrate GenAI tools into their platforms, usually offering people little to no choice in whether they use these extractive tools (though perhaps it’s time that libraries demand more choice from our vendors). The apparent lack of criticality toward these vendors and companies is further perpetuated by the refrain that librarians must teach the AI literacy skills that students and researchers will inevitably need. When we do hear about the problems with GenAI technologies, like the persistent inaccuracies in information generated from large language models (LLMs), or the extensive list of GenAI’s environmental and societal harms, reservations are usually a short footnote, followed by a call for ethical GenAI use that sidesteps the fact that using GenAI technologies in their current forms inevitably means adding to their harmful impact.
While some AI technologies may be beneficial in specific domains and justified for narrow use cases – for example, machine learning in some instances of medical diagnosis and drug discovery (Ahmad et al., 2021; Suthar et al., 2022) – they are now being integrated widely and indiscriminately across domains, including in areas where they often hinder human thought more than they support it. As Shah and Bender argue, the LLMs being integrated into library search systems that supposedly save people precious time may actually prevent the exploration, discovery, and information literacy development that these resources have long been meant to enable (Shah & Bender, 2024). Their argument is further supported by accumulating research on the detrimental effects of the cognitive offloading of tasks to GenAI (Gerlich, 2025; Shukla et al., 2025).
I see detrimental impacts of GenAI reliance directly in my own work teaching academic research and information literacy. Increasingly, a large portion of students have turned to GenAI to do nearly all the cognitive work that previously would have taken them so much time and effort. In the process, many if not most of these students are not developing the critical thinking and writing skills that have long been considered foundational to higher education. I also see a smaller group of students who are deeply concerned about the costs of GenAI and who are choosing the more labor-intensive path of developing and articulating their own thinking, rather than immediately turning to chatbots. The latter group is learning far more, and is far better prepared for the workplace and meaningful participation in society more broadly. The contrasting perspectives and behaviors of my students reflect that students’ views and uses of GenAI are, like ours, not monolithic. And also like us, students hear many of the same simplistic messages: that GenAI is an amazing technology that will make work faster and easier and that the only way to be prepared for the workplace and relevant in the world is to embrace GenAI.
In academic libraries, those who want to take a slower and more cautious approach to GenAI are frequently criticized as holding the profession back, resisting the inevitability of technological change, inhibiting progress, neglecting to prepare students for the future, and denying reality. Such criticisms have a silencing effect, discouraging people from expressing their legitimate concerns about a technology that in the widest circulating discussions is surrounded by more hype than critical investigation.
But when we can free ourselves of shaming rhetoric, we are better positioned to both support one another as respected colleagues and to think critically, and imaginatively, about how we want to engage with and teach about GenAI technologies. Given the prevalence of hype and misunderstandings surrounding GenAI, unpacking discourse on GenAI and AI literacy is a powerful and necessary part of this work.
Rhetorics of AI Literacy
Calls for embracing GenAI in higher education and academic librarianship are frequently accompanied by declarations that AI literacy is one of the most essential skills that students must now develop to be prepared for the workforce and for the future in general. Definitions of AI literacy and related competencies regularly add to the AI hype that Benjamin cautions against, as they repeatedly misrepresent GenAI’s abilities, mandate GenAI adoption, and reinforce the message that GenAI is the inevitable future which we must therefore embrace through adoption and active use. Like GenAI discourse more broadly, AI literacy rhetoric often includes brief asides to consider the potential risks of AI technologies to ensure they are used ethically and responsibly. Like a perfunctory checklist, these acknowledgements rarely offer a meaningful examination of the extensive harms of GenAI, nor do they confront the reality that more ethical use will only be possible with radical changes to GenAI technologies and their infrastructures. With the emphasis on adoption and use, this discourse leaves little to no room for considering the possibility of non-use or critical examination of use cases that might not warrant AI use.
Consider, for example, the AI Literacy Framework developed by academic and technology teams at Barnard College. Based on Bloom’s taxonomy, it is composed of four levels: 1) Understand AI, 2) Use and Apply AI, 3) Analyze and Evaluate AI, and 4) Create AI. Here, using AI precedes considering critical perspectives on AI, such as ethical concerns. After students have engaged with level 3, where they “Analyze ethical considerations in the development and deployment of AI,” the next level (4) mandates creating more of these technologies (Hibbert et al., 2024). Stanford University Teaching Commons’ AI literacy framework, which emphasizes “human-centered values,” similarly begins with developing a basic understanding of AI tools, in part through AI use (“functional literacy”). Following functional literacy is “ethical AI literacy,” which involves “understanding ethical issues related to AI and practices for the responsible and ethical use of AI tools.” Again, non-use is not presented as an option. Instead, the framework authors explain, “You and your students can identify and adopt practices that promote individual ethical behavior and establish structures that promote collective ethical behavior” (Teaching Commons, Stanford University, n.d.).[4] As these AI literacy frameworks suggest, much of the literature on AI literacy reflects a strange mixture of the AI inevitability narrative, superficial acknowledgement of ethical concerns, and AI hype that frames GenAI as a transformative force that will better society.
AI literacy frameworks created within librarianship frequently share these characteristics. ACRL President Leo Lo’s 2025 “AI Literacy: A Guide for Academic Libraries” is one such influential document. It is described as “a guide to AI literacy that addresses technical, ethical, critical, and societal dimensions of AI, preparing learners to thrive in an AI-embedded world.” In this new world, librarians can “become key players in advancing AI literacy as technology shapes the future” (Lo, 2025, p. 120). What that future looks like, or what we want it to look like, is not discussed.
Like other AI literacy frameworks, Lo’s guide predicates AI literacy on AI use, as the document defines AI literacy as “the ability to understand, use, and think critically about AI technologies and their impact on society, ethics, and everyday life” [my emphasis] (Lo, 2025, p. 120). As with the previously mentioned AI literacy frameworks, this document presents AI as pervasive and socially beneficial, while omitting a meaningful examination of the material conditions on which creating and using these technologies currently rests. At various points, the guide briefly notes the need to consider the limitations and ethics of GenAI tools, statements that are quickly followed by an emphasis on AI adoption and promotion that supports the common good, social justice, and empowerment. Consider, for example, the section on the societal impact of AI on the environment and sustainability:
While AI remains resource-intensive with a notable environmental footprint, discussions on sustainability should encompass more than just reducing consumption. The real potential lies in using AI to drive systemic changes that promote social and environmental well-being. For example, AI can optimize energy management in cities, creating smarter, more sustainable urban environments. It also has the capacity to revolutionize agricultural supply chains, increasing efficiency, reducing waste, and supporting sustainable practices across production and distribution. By integrating sustainability into the societal dimension of AI literacy, we can better understand AI’s role not just as a technological advancement, but as a force capable of reshaping our economic, social, and environmental landscapes for the better. [my emphasis] (Lo, 2025, p. 122)
Here, a minimization of the costs of AI coexists with an idealization of a future made possible by AI. No references are made to the water-thirsty and energy-hungry data centers rapidly being built to power GenAI, or how these data centers disproportionately harm economically disadvantaged communities and areas that are especially prone to drought (Barringer, 2025). If such harms seem like a distant problem that does not affect most of us, we are likely to be proven wrong. For example, in my current home of Austin, Texas, which is prone to both drought and power grid failures, data centers are big business (Buchele, 2024).
The influential role of Lo’s AI Literacy Guide is further reflected in another key ACRL effort to promote the integration of AI in academic libraries: the ACRL AI Competencies for Academic Library Workers (“AI Competencies”) (ACRL AI Competencies for Library Workers Task Force, 2025). The first draft, published online this past March, builds on Lo’s AI Literacy Guide. Like Lo’s AI Literacy Framework, AI Competencies does not consider whether GenAI tools are the optimal technologies for information literacy education, library research, or critical inquiry.
While Lo’s aforementioned AI Literacy Guide is apparently designed for library instruction, the AI Competencies document concentrates on the abilities that library workers should possess. Despite this different focus, the task force also associates their work with information literacy and notes early in the document that while developing the competencies, they “recognized significant parallels between responsible AI use and the principles of critical information literacy, as outlined in documents like the ACRL Framework for Information Literacy for Higher Education” (p. 1). This suggests the potential relevance of the document to librarians’ instructional work.
Before engaging in a closer examination of the AI Competencies first draft, I should stress that upon releasing the document the authors solicited feedback from the library community to inform future revisions. At the Generative AI in Libraries (GAIL) Conference this past June, the task force co-chairs shared the feedback they received and the kinds of revisions they plan to make (Jeffery and Coleman, 2025). Much of that feedback mirrors my own concerns about common conceptions of AI literacy that I have discussed thus far, conceptions that are reflected in the AI Competencies first draft as well. A considerable number of responses challenged the implications that library workers must use AI, that AI literacy necessitates AI use, and that responsible GenAI use is possible. Some also commented that the document did not adequately acknowledge GenAI technologies’ harms and that the description of AI dispositions (which I discuss in more detail momentarily) was not appropriate for a competencies document. The task force’s receptiveness to this input – which contrasts with the professional discourse about GenAI that I previously observed – suggests that many in our profession may be eager and now better positioned for more open and honest conversations about GenAI technologies than in the earlier days of learning about them.
Regardless of how the final draft of the AI Competencies document develops, the dispositions outlined in the first draft are worth closer attention because of the feeling rules about GenAI that they imply (for example, the expectation that competent library workers will embrace GenAI technologies and feel positively about them).[5] As the AI Competencies task force explains, the document’s dispositions “highlight the importance of curiosity, adaptability, and a willingness to experiment with AI tools” (pp. 2-3). Library workers who demonstrate the appropriate AI literacy dispositions: “Are open to the potential of responsible human-AI collaboration to unlock a future of greater equity and inclusion,” “Seek uses of AI that center and enhance human agency rather than displace and inhibit it,” and “Pursue continuous professional reflection and growth, especially concerning ethical and environmental responsibilities” (p. 3). Implicit within these dispositions is the belief that use of AI tools in their current form can lead to greater equity and can enhance human agency rather than displacing it. The document does not discuss actions or responses one might take in light of the harmful impacts of GenAI technologies. Instead, questioning whether AI tools should be used appears antithetical to the AI competencies articulated in the document. Like many other AI literacy frameworks and guides, this document implies that reflection is sufficient for demonstrating the correct AI competency dispositions. Such rhetoric, not unique to this document, obfuscates the reality that people have limited control over or insight into what the AI companies that own most AI tools do to build and maintain them.
When AI literacy documents assume GenAI use and come to dominate conversations about GenAI in academic libraries and higher education, or even become codified through formal adoption by institutions or organizations, how does this position library workers and educators who disagree with the assumptions embedded within those documents? Should these individuals be considered “AI illiterate,” in need of developing proper GenAI practices, attitudes, and dispositions? Through the lens of these documents, resisting rapid adoption of GenAI tools or questioning their value might be considered incompetence, regardless of how well informed or thoughtful someone’s perspective on GenAI is.
The AI Competencies first draft provides a window into many of the feeling rules about GenAI currently circulating in academic librarianship. Fortunately, they may ultimately not be codified in the final version. The task force’s honesty and critical reflection about the academic library community’s feedback, including questions about the appropriateness of including AI dispositions, is evidence that feeling rules and the narratives that help to drive them are never fully solidified and are rarely universally accepted. Feeling rules are often sites of contestation. Moreover, they can shift and change as we learn more and as we engage in critical reflection and dialogue.
New Imaginings for Responding to GenAI
As the critical feedback on the AI Competencies suggests, alternatives to the dominant AI literacy discourse and its implied feeling rules exist, even when those different viewpoints are harder to find. As some educators demonstrate, when we challenge the feeling rules embedded in much of the higher education and library GenAI discourse, we can open new possibilities for thinking about and responding to calls for GenAI adoption and AI literacy instruction that promote this adoption. We can begin to imagine ways of acting that might be out of view when we are mired in a particular set of feeling rules about GenAI (rules that have largely been constructed by the tech companies that stand to profit from the continued use and growth of their data-extracting products).
Charles Logan is among the educators going against the grain of AI enthusiasm and inviting us to think differently about common conceptions of AI literacy. Building on Nichols et al.’s (2022) work on the limits of digital literacy, Logan interrogates the extent to which AI literacy is even possible, given GenAI’s opaqueness and the hegemonic systems on which these technologies are built (Logan, 2024; Nichols et al., 2022). Noting the assumption of AI use in AI literacy discussions, Logan cautions, “An AI literacy devoid of power analysis and civic action risks becoming a talking point for Big Tech, and … a means for corporations like OpenAI and Google to set the terms of how educators and students think about and use their chatbots” (Logan, 2024, p. 363). Instead, Logan proposes a “more heterogeneous approach to generative AI” that allows room for non-use and critical inquiry into GenAI. One pedagogical response is “mapping ecologies of GenAI” that illuminate “its social, technical, and political-economic relations” (Logan, 2024, p. 362). For example, Logan describes a classroom mapping activity developed by Pasek (2023), in which students locate a nearby data center and investigate questions such as, “What potential land use, energy, or water conflicts might exist because of the data center?” and “Who benefits from the data center being here? Who loses?” (Pasek, 2023, cited in Logan, 2024, p. 366).
Drawing from the work of educators and scholars like Logan, librarian Joel Blechinger pays particular attention to dominant framings of AI literacy, which are connected to a longer tradition of presenting literacy as an antidote to intractable social issues and structural problems. Reiterating the question of whether AI literacy is possible, Blechinger asks librarians, “to what extent are efforts to theorize—and proclaim a new era of—AI Literacy premature? Do these efforts instead reflect our own professional investment in the transcendent power of literacy—what Graff & Duffy (2014) have termed ‘the literacy myth’—more than the applicability of literacy to GenAI?” Similar to Logan, Blechinger proposes that one alternative pedagogical approach could be to draw from a politics of refusal, rather than assuming AI use (Blechinger, 2024).
While some may have a knee-jerk negative response to the term refusal, the concept is more nuanced than one might first think. Writing and rhetoric scholars and teachers Jennifer Sano-Franchini, Megan McIntyre, and Maggie Fernandes, who authored “Refusing GenAI in Writing Studies: A Quickstart Guide,” describe GenAI refusal as encompassing “the range of ways that individuals and/or groups consciously and intentionally choose to refuse GenAI use, when and where we are able to do so.” Such refusal, they write, “is not monolithic,” nor does it “imply a head-in-the-sand approach to these emergent and evolving technologies.” Moreover, “refusal does not necessarily imply the implementation of prohibitive class policies that ban the use of GenAI among students” (Sano-Franchini et al., 2024).
This conception of GenAI refusal is aligned with the work of scholars like Carole McGranahan, who explains in “Theorizing Refusal: An Introduction” (2016) that “[t]o refuse can be generative and strategic, a deliberate move toward one thing, belief, practice, or community and away from another. Refusals illuminate limits and possibilities, especially but not only of the state and other institutions.” Such a politics of refusal, embedded in the fields of critical and feminist data studies, can be a source for imagining new possibilities, while being informed about the material conditions that underlie and shape technologies and technological use (D’Ignazio, 2022; Garcia et al., 2022; Zong & Matias, 2024).
Sano-Franchini, McIntyre, and Fernandes’s act of refusal, supported by an extended analysis of GenAI’s material impacts on society in general and on writing studies and higher education more specifically, can also be understood as a refusal to accept the feeling rules implied in so much of the discourse on AI literacy. The authors present ten premises on which they ground refusal as a reasoned disciplinary response to GenAI technologies. The first of these – “Writing studies teacher-scholars understand the relationship between language, power, and persuasion”– is especially relevant to considering the feeling rules that drive much of generative AI discourse in higher education and in libraries. The authors observe that the metaphors often applied to these technologies obscure the human labor that goes into GenAI training and ascribe human abilities to these technologies in ways “designed to cultivate trust in corporate, exploitative, and extractive technologies.” I would add that the messages we hear from our employers and other educators positioned as experts in GenAI and AI literacy further encourage us to trust these technologies over our reservations about them. Instead, Sano-Franchini, McIntyre, and Fernandes write, “We must be critical of the ways that these metaphors and affective associations are used to exaggerate the abilities of these products in ways that strengthen the marketing efforts of Big Tech corporations like OpenAI.” With this criticality, writing studies scholars can “use language that most accurately—and transparently—reflects the actual technology and/or … [highlight] the discursive limitations of the language … [they] commonly use to describe these products.” The authors draw attention to the economics behind GenAI and the ways it is promoted and marketed. Asking us to examine who truly benefits from the increased use of GenAI in higher education, they note that people in the EdTech industry have largely shaped this discourse (for example, articles in InsideHigherEd and The Chronicle of Higher Education written by individuals with close ties to the EdTech industry).
Such examinations of the language used to discuss GenAI in higher education help to illuminate what usually goes unspoken in that discourse. Sano-Franchini, McIntyre, and Fernandes’s critical examination of GenAI can be seen not just as a refusal to adopt GenAI technologies into their teaching. It is also a refusal to follow the feeling rules behind much of GenAI discourse. It is refusing to be shamed or to doubt oneself for having concerns about the value, ethics, and potential impacts of the GenAI technologies being so heavily promoted at our institutions. The authors choose critical thought over compliance with mandates that stifle critical inquiry and dialogue. Regardless of whether an individual or a group adopts a stance of GenAI refusal (a position that the authors stress looks different in practice for each individual in their context), examining and questioning the feeling rules implicit in much of GenAI discourse better enables us to make more intentional and informed choices about how we do or do not use these technologies and how we teach about them.
Examples of librarians challenging the feeling rules of dominant GenAI discourse exist, even when they are outliers. ACRL’s invitation to Ruha Benjamin to give the 2025 conference keynote is just one example of an interest within our profession in hearing more critical perspectives. Library workers’ feedback on the ACRL AI Competencies for Academic Library Workers is another. Some librarians are also vocalizing a need for slower and more critical investigations into GenAI tools, even when doing so risks social ostracism.
In the April 2025 issue of College & Research Libraries News, Ruth Monnier, Matthew Noe, and Ella Gibson candidly discuss their concerns about the GenAI tools that are increasingly being used and promoted in their organizations. Drawing attention to both the hype and the many ethical questions surrounding GenAI, they note the unpopularity of expressing reservations about adopting GenAI into libraries. Noe reflects, “the hype cycle is real and here it often feels like the choices are to get on board or lay down on the tracks to become part of a philosophy joke.” Monnier concurs: “I agree it is weird how fast universities, corporations, and individuals have pushed for the adoption and usage of generative AI, especially in the context of the rhetoric about ‘how bad’ social media and cellphones are within a K-12 environment. What makes this technology so unique or special that we as a society feel the immediate need to use and adopt it compared to other previous technologies?” (Monnier et al., 2025). The scope of this article does not allow for a close examination of such work, but additional resources in which library workers challenge the dominant feeling rules of GenAI include Joel Blechinger’s “Insist on Sources: Wikipedia, Large Language Models, and the Limits of Information Literacy Instruction” (2024), Violet Fox’s zine “A Librarian Against AI” (2024), and Matthew Pierce’s “Academic Librarians, Information Literacy, and ChatGPT: Sounding the Alarm on a New Type of Misinformation” (2025). Such work does not deny the fact that GenAI tools exist, nor does it suggest we can or should ignore these tools’ existence. It does open space for thinking more critically about the actual capacities and impacts of GenAI and making more intentional and informed choices about how we (dis)engage with GenAI technologies.
Many in our profession likely will not agree with much of what I have said here, but regardless of our individual views of GenAI technologies, I hope we can all agree that we value critical inquiry, and that an essential part of that process is making space for a consideration of varied perspectives and experiences. Critical inquiry and dialogue become possible and richer when we investigate the feeling rules that may be shaping, and sometimes limiting, professional discourse and practice. As we expand critical conversations about GenAI, we have more power to imagine the futures we want to build and cultivate, as Ruha Benjamin invites us to do.
In the spirit of collective imagining, I close with some questions I would like to collectively explore in and beyond libraries. I have organized these questions into two main areas: professional conversations and interactions and our teaching practices.
Professional conversations:
How can we be more inclusive of varied perspectives in our conversations about GenAI and related work, as we acknowledge the challenge of speaking honestly when one disagrees with dominant framings of GenAI and AI literacy?
How can we more critically examine our discourses and dialogues about GenAI, as we identify areas that may be unclear, inaccurate, or based on assumptions that need further investigation?
How do we practice a culture of care in these dialogic spaces and engage in constructive critique of ideas, not a critique of individuals?
How do we align our discourse about GenAI and related work with our professional and personal values, including those articulated in the ALA Core Values of Librarianship and the ALA Ethics of Librarianship?
How do we preserve time and energy for valuable work that may not be centered on GenAI, and that has been deprioritized because of the presently dominant focus on GenAI?
Teaching practices:
Historically, what have we valued about librarianship and information literacy education that still remains vital to us? How do we continue our engagement with those dimensions of our work?
What agency do students, faculty, and library workers have in whether/how they choose to use GenAI tools? What might it look like for teaching about GenAI technologies to allow for choice in whether and when to use GenAI tools? How can opting for non-use be respected as a choice that may be well-informed and even strategic?
What skills, understandings, and practices are prioritized or deprioritized in our teaching? What might be gained and what might be lost through our different prioritizations of pedagogical content and learning experiences? What guides our decisions about what to teach and how?
Many of the resources referenced in this article’s section on alternative imaginings can be springboards for further dialogue and for imagining the futures we want to have and to help build.
In closing, I return now to the end of Ruha Benjamin’s 2025 ACRL keynote. Ultimately, Benjamin revised her opening question “Who owns the future?” to “Who shares the future?” This reframing invites us to imagine collectively. That imagining will inevitably include differing views and beliefs, and it will not always be comfortable. But it can be more generative (in the human sense) and more inclusive when we consider questions like those above, and when we remember that most of us want a future in which people and communities can pose and explore their own questions, find sources of information worth their trust, and work together to actively make informed choices that support the common good. Most of us will hopefully also agree that this collective work is worth the discomfort of looking honestly at the feeling rules embedded in much of GenAI discourse and librarianship. We may be better able to discover and work toward the futures we want when we break those rules in ways that are kind and affirmative of everyone’s humanity, and that prioritize human thought and action over automation.
Acknowledgements
Though this work lists one author, the reality is that many people helped shape it.
My sincere thanks to external reviewer Joel Blechinger and Lead Pipe internal reviewers Ryan Randall and Pamella Lach for the time, thought, and care they gave to providing constructive feedback on the various stages of this article. Thank you also to Pamella, as Publishing Editor, for facilitating all steps of the publishing process, and to all members of the Lead Pipe Editorial Board for their attention to this article, the opportunity to publish it here, and all the work that goes into sustaining this volunteer-driven, open access publishing venue. I also want to express my appreciation to Melissa Wong, who provided writing feedback on a separate article on dominant narratives about generative AI in librarianship and encouraged me to further develop that article’s discussion of GenAI and feeling rules.
Ahmad, Z., Rahim, S., Zubair, M., & Abdul-Ghafar, J. (2021). Artificial intelligence (AI) in medicine, current applications and future role with special emphasis on its potential and promise in pathology: Present and future impact, obstacles including costs and acceptance among pathologists, practical and philosophical considerations. A comprehensive review. Diagnostic Pathology, 16(1), 24. https://doi.org/10.1186/s13000-021-01085-4
Baer, A. (2025). Unpacking predominant narratives about generative AI and education: A starting point for teaching critical AI literacy and imagining better futures. Library Trends, 73(3), 141-159. https://doi.org/10.1353/lib.2025.a961189
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922
Benjamin, R. (2025, April 2). Opening keynote. ACRL 2025, Minneapolis, MN.
Blechinger, J. (2024, June 7). Insist on sources: Wikipedia, Large Language Models, and the limits of information literacy instruction. CAPAL 2024 (Canadian Association of Professional Academic Libraries), Online. https://doi.org/10.60770/36Y6-3562
Espinel, R., & Tewell, E. (2023). Working conditions are learning conditions: Understanding information literacy instruction through neoliberal capitalism. Communications in Information Literacy, 17(2), 573–590. https://doi.org/10.15760/comminfolit.2023.17.2.13
Garcia, P., Sutherland, T., Salehi, N., Cifor, M., & Singh, A. (2022). No! Re-imagining data practices through the lens of critical refusal. Proceedings of the ACM on Human-Computer Interaction, 6 (CSCW2, Article no. 315), 1–20. https://doi.org/10.1145/3557997
Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
Graff, H. J., & Duffy, J. (2014). Literacy myths. In B. V. Street & S. May (Eds.), Literacies and Language Education. Encyclopedia of Language and Education. Springer. https://doi.org/10.1007/978-3-319-02321-2_4-1
Hochschild, A. R. (1979). Emotion work, feeling rules, and social structure. American Journal of Sociology, 85(3), 551–575. https://doi.org/10.1086/227049
Jeffery, K. & Coleman, J. (2025, June 16). ACRL AI competencies for library workers. Generative AI in Libraries (GAIL) Conference. Online. https://www.youtube.com/watch?v=PLvf_OhaWZg
Lo, L. (2025). AI literacy: A guide for academic libraries. College & Research Libraries News, 86(3), 120-122. https://doi.org/10.5860/crln.86.3.120
Logan, C. (2024). Learning about and against generative AI through mapping Generative AI’s ecologies and developing a Luddite praxis. ICLS 2024 Proceedings (International Society of the Learning Sciences). https://repository.isls.org//handle/1/11112
Merchant, B. (2023). Blood in the machine: The origins of the rebellion against Big Tech. Little, Brown and Company.
Monnier, R., Noe, M., & Gibson, E. (2025). AI in academic libraries, part one: Concerns and commodification. College & Research Libraries News, 86(4), Article 4. https://doi.org/10.5860/crln.86.4.173
Nichols, T. P., Smith, A., Bulfin, S., & Stornaiuolo, A. (2022). Critical literacy, digital platforms, and datafication. In R. A. Pandya, J. H. Mora, N. A. Alford, & R. S. de R. Golden (Eds.), The Handbook of Critical Literacies (pp. 345–353). Routledge. https://doi.org/10.4324/9781003023425-40
Pierce, M. (2025). Academic librarians, information literacy, and ChatGPT: Sounding the alarm on a new type of misinformation. College & Research Libraries News, 86(2), Article 2. https://doi.org/10.5860/crln.86.2.68
Sano-Franchini, J., McIntyre, M., & Fernandes, M. (2024). Refusing GenAI in writing studies: A quickstart guide. Refusing GenAI in Writing Studies. https://refusinggenai.wordpress.com
Selber, S. A. (2004). Multiliteracies for a digital age. Southern Illinois University Press.
Shah, C., & Bender, E. M. (2024). Envisioning information access systems: What makes for good tools and a healthy web? ACM Transactions on the Web, 18(3), 33:1-33:24. https://doi.org/10.1145/3649468
Shelby, R., Rismani, S., Henne, K., Moon, Aj., Rostamzadeh, N., Nicholas, P., Yilla-Akbari, N., Gallegos, J., Smart, A., Garcia, E., & Virk, G. (2023). Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 723–741. https://doi.org/10.1145/3600211.3604673
Shukla, P., Bui, P., Levy, S. S., Kowalski, M., Baigelenov, A., & Parsons, P. (2025, April 25). De-skilling, cognitive offloading, and misplaced responsibilities: Potential ironies of AI-assisted design. CHI EA ’25: Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (Article No.: 171), 1-7. https://doi.org/10.1145/3706599.3719931
Shuler, S., & Morgan, N. (2013). Emotional labor in the academic library: When being friendly feels like work. The Reference Librarian, 54(2), 118–133. https://doi.org/10.1080/02763877.2013.756684
Sloniowski, L. (2016). Affective labor, resistance, and the academic librarian. Library Trends, 64(4), 645–666. https://doi.org/10.1353/lib.2016.0013
Suthar, A. C., Joshi, V., & Prajapati, R. (2022). A review of generative adversarial-based networks of machine learning/artificial intelligence in healthcare. In S. Suryanarayan Iyer, A. Jain, & J. Wang (Eds.), Handbook of Research on Lifestyle Sustainability and Management Solutions Using AI, Big Data Analytics, and Visualization. IGI Global Scientific Publishing. https://doi.org/10.4018/978-1-7998-8786-7.ch003
Zong, J., & Matias, J. N. (2024). Data refusal from below: A framework for understanding, evaluating, and envisioning refusal as design. ACM Journal on Responsible Computing, 1(1), 1–23. https://doi.org/10.1145/3630107
[1] For those who would argue we should not conflate the extreme views of a few politicians with those of the AI industry, it is worth noting statements by tech leaders who have argued AI can replace education (Ivanova, 2025; Sam Altman [@sama], 2022).
[2] For an exploration of who the Luddites actually were and why the term’s pejorative use is misplaced, see Brian Merchant’s book Blood in the Machine (2023).
[3] Hochschild’s initial research on emotional labor focused on the experiences of flight attendants and debt collectors (Hochschild, 1983). Subsequent research by others building on Hochschild’s work examined the emotional labor of numerous caring professions, including librarianship, where workers are often expected to consistently display a friendly and cheerful demeanor (Evans & Sobel, 2021; Shuler & Morgan, 2013; Sloniowski, 2016; Sobel & Evans, 2020).
[4] The Stanford University Teaching Commons AI Literacy Framework is based partly on Selber’s 2004 multiliteracy framework, which includes three main dimensions of literacy: functional literacy, critical literacy (related to social and ethical issues), and rhetorical literacy.
[5] The choice to include dispositions in the AI Competencies for Academic Library Workers was likely inspired by the ACRL Framework for Information Literacy, which lists dispositions for each of its six conceptual frames.
I recently spent some time looking up metadata for DOIs in the Crossref API at work. Here are a few takeaways in case you ever need to do it too:
It looks like the most you can look up at a time is 49 using the https://api.crossref.org/works/?filter=doi1,doi2,… endpoint.
When looking up 40 at a time, the responses take on average 8.08 seconds to return (std dev 5.97).
If any of your DOIs fail to match this regex, the entire request will fail with an HTTP 400 error: r"^doi:10.)/.+$"
The server can return spurious HTTP 500 errors, which succeed just fine when retried.
Apparently you can do up to 50 requests per second to the API?! I only did one at a time, and put a one second sleep in between, since I wasn't in a particular rush and wanted to look up 500,000 DOIs without pissing anyone off.
Your mileage will vary, as these are just observations at a point in time, and the state of the service is guaranteed to change over time.
Of course you could choose to bypass the REST API altogether and work with the bulk download. But in the situation we have at work we want to be able to update our data on a weekly basis, so the API makes more sense.
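For what it's worth, the batched lookup described above can be sketched in a few lines of Python. This is a rough sketch only, assuming the requests library and Crossref's doi: filter on the /works endpoint; the batch size, one second sleep, and retry-on-500 handling mirror the observations above, but the details are illustrative rather than a drop-in tool.

import time
import requests

CROSSREF_WORKS = "https://api.crossref.org/works"

def lookup_dois(dois, batch_size=40, pause=1.0, retries=3):
    # Look up DOI metadata in batches, sleeping between requests
    # and retrying the occasional spurious HTTP 500.
    found = []
    for i in range(0, len(dois), batch_size):
        batch = dois[i:i + batch_size]
        params = {
            "filter": ",".join("doi:" + d for d in batch),
            "rows": batch_size,
        }
        for attempt in range(retries):
            resp = requests.get(CROSSREF_WORKS, params=params)
            if resp.status_code >= 500:
                time.sleep(pause)  # spurious 500s tend to succeed on retry
                continue
            resp.raise_for_status()
            found.extend(resp.json()["message"]["items"])
            break
        time.sleep(pause)  # stay well under any rate limit
    return found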
The goal was to show participants how to upload a dataset, check for mistakes in the data, and add helpful information (metadata) that explains the dataset clearly.
The final CNI membership meeting of Cliff’s tenure, held April 7–8, 2025, in Milwaukee, was to include a surprise presentation of the Festschrift’s table of contents. Though Cliff’s health prevented him from attending in person, he participated virtually and heard readings of excerpts from each contribution. Clifford Lynch passed away shortly after, on April 10, 2025. Authors completed their essays before his passing, and the original text remains unchanged.
Below the fold is a brief snippet of each of the invited contributions and some comments.
The festschrift's contents are:
Don Waters's contribution provides a comprehensive overview of Cliff's achievements. He stresses the continuity of Cliff's major themes:
In his opening address as executive director at the October membership meeting, Clifford promised to continue many of CNI's ongoing initiatives. However, he urged the membership to recast these activities and add others under three broad themes: (1) developing networked information content; (2) transforming organizations, professions, and individuals; and (3) building technology, standards, and infrastructure. In CNI's program plan, he suggested that these themes captured "the essential foundations of the vision of advancing scholarship and intellectual productivity." In other words, the scholarly work of research and teaching in the emerging networked environment depended on—and would surely collapse if sufficient attention was not given to—key cultural (content) and social (organization and skill) factors as well as the underlying technology.
Dan Cohen discusses Cliff's influence on the Google Books project and, more broadly, the impact of access to multiple forms of data online:
Cliff was saying something radical: networked digital collections transcend the benefits of mere remote access, the ability to read at a distance. They will inevitably lead to utterly new approaches and uses. Books would, of course, continue to serve as a rich narrative form of knowledge. But by becoming digital, books could mingle with other forms of information and data to enable research we had not previously considered or been able to do.
For instance, Cliff noted in his forum talk, the value of quotidian humanities reference works, such as gazetteers and ontologies, would grow enormously in the future as aids to search the large corpora that were coming online. These highly structured works could be fruitfully joined with less structured text, unlocking previously hidden knowledge and enabling new forms of analysis. At the same time, free-form text corpora could generate or supplement new kinds of reference works.
Kevin Guthrie and Roger Schonfeld describe how Cliff's focus on infrastructure influenced the development of such important platforms as JSTOR:
He saw long before others that infrastructure would be needed to enable the development of new forms of scholarship, research, and learning at the intersection of technology and education. For example, at his first meeting as CNI executive director in the spring of 1997, he highlighted the need for authorization and authentication infrastructure in his meeting road-map. He wrote that "authentication and authorization is emerging as a central issue, and we need your participation in developing these issues." This technique was characteristic of Clifford's approach; he would highlight an important need and appeal to those in the community to address it.
...
Clifford and CNI provided the venue for the development of many of the services that emerged to facilitate access, such as Shibboleth and OpenAthens. Over the course of his long tenure, Clifford and CNI have had similar impact on the development of many components of scholarly communications infrastructure, including model licenses, digital object identifiers, standards for reporting usage data, and institutional repositories, among others.
Christine Borgman describes four areas to which Cliff and CNI made major contributions, but where the fundamental problems remain unsolved:
A defining characteristic of infrastructure is that it is invisible until it breaks down. Invisibility is a sign of success, in that libraries were serving their communities so effectively that their existence was receding into the background. Invisibility is also a sign of failure, in that the substantial expertise, labor, and resources necessary to provide those services goes unrecognized. In times of shrinking budgets, invisibility is especially dangerous.
The loss of libraries' position as an intermediary in the access pipeline:
Libraries catalog materials they acquire to support the information needs of their communities. They have far less control over how remotely accessed content is organized. Library catalogs increasingly are subsumed under content management systems, and users find their way into many of these resources via search engines external to library control.
The lack of coherent economic and policy models for preserving and sustaining access to digital collections remains a massive problem for libraries—past, present, and future.
Lack of clarity as to the roles of different institutions in the access pipeline over time:
Digitization has blurred the boundaries between types of artifacts, such as books, records, and objects, and between institutions. Determining who collects what kinds of content and who sustains access for future generations becomes even more complex when publishers lease access to content, and as research data repositories are hosted by funding agencies or as independent entities.
Borgman also discusses the related problem of open data:
Making research data FAIR (findable, accessible, interoperable, and reusable) remains complex, labor-intensive, and expensive in most disciplines. Making research data sustainable for the long term involves an array of economic, social, institutional, and infrastructural challenges. The lack of sufficient data management workforce comprised of data librarians and archivists is central to these challenges.
Michael Buckland describes Cliff's pioneering work at Berkeley on library automation, which was truly ahead of its time:
From 1979 until he moved to CNI in 1997, Clifford was responsible for developing and implementing library infrastructure for the multicampus University of California system, including MELVYL, a highly innovative, user-oriented online replacement for card catalogs and its extension to provide access to medical and other bibliographical resources. To support it and other applications, he and others built an intercampus network that evolved into the university's Internet node.
Marjory Blumenthal describes Cliff's contributions to the Computer Science and Telecommunications Board (CSTB) at the National Academies:
Cliff contributed to multiple CSTB activities. He spoke at events, wrote material for CSTB reports, and participated in committee deliberations on such topics as information infrastructure, literacy (both information and information-technology literacy), and intellectual property rights. More often than not, those topics intersected, and Cliff was perceptive in acknowledging and assessing their interactions. This integrative outlook was (and is) a differentiator; so many experts focus on their primary expertise, whether as technologists, social scientists, lawyers, or something else.
Judy Ruttenberg covers Cliff's contributions to the Association of Research Libraries:
Many people have remarked over the years that Cliff is wonderful at making sense of the moment and synthesizing information in real time—and he is regularly asked to play that summation role at the end of conferences. What has been more remarkable for me is Cliff's ability to identify early signals and drivers of change across the research, policymaking, and cultural heritage sectors. Cliff specializes in directing our collective attention to significant trends that will affect the impact, stewardship, and trustworthiness of networked scholarly information, and the capacity of institutions to fulfill that mission.
Herbert Van de Sompel and Michael L. Nelson's contribution is a condensed version of their blog post. You should definitely read that in preference to the version in the festschrift. Herbert and Michael tell the fascinating story of a 1999 meeting in Santa Fe that featured a demonstration of their prototype Universal Preprint Service, which became influential in the development of OAI-PMH and OpenURL. The meeting started out badly but ended well thanks to Cliff. Herbert describes a typical Cliff intervention:
Prior to the start of the second day, he vented his frustration about the lack of progress to Cliff, who was about to start moderating the first session. Cliff was nice enough to let him ramble on a bit, and, in a manner that exemplified one of Cliff’s many unparalleled capabilities, he went on to open the meeting by providing two discussion topics regarding interoperability that he somehow had been able to synthesize from the first day’s discussions, which most had experienced as enjoyable yet lacking in any sense of concrete direction. One was whether archive functions, such as data collection and maintenance, should be decoupled from user functions, such as search. The other was about the choice between distributed searching across repositories and harvesting from them to build cross-repository search engines.
This is a great example of Cliff's ability to isolate the signal from the noise, amplify it, and persuade the community to focus on it. Most people who have been in discussions with Cliff have a similar story.
In 1986, I was invited to visit the University of California, Berkeley, to review their campus network. One of the numerous people there who needed to be interviewed was a young librarian who was said to have a good sense of the future. My prepared questions were oriented toward the future networking requirements for libraries. The interviewee, Cliff Lynch, took the conversations to several other places, however, including topics of privacy, access control, and security. Our conversation wandered this landscape far longer than the time set aside, leaving the rest of my site visit schedule in shambles. At the end, Cliff and I agreed that we would find some way to continue the "interview." For almost 40 years, we have.
Howard Besser's list of Cliff's contributions to the stewardship of digital images starts by "noting that Cliff is responsible for popularizing the term stewardship to reflect management across the life cycle of a digital work". The first item shows how far ahead of his time Cliff was:
In 1986, the Office of Information Systems and Technology at the University of California, Berkeley began work on a project to deliver high-quality digital images from its art museum, architecture slide library, and geography department. The developers believe that this software (eventually called ImageQuery) was the first deployed multiuser networked digital image database system. The software was first shown publicly in June 1987 at the conferences of the American Association of Museums (now the American Alliance of Museums) and the American Library Association. For most attendees, this was their first time viewing a high-resolution image on a computer screen.
ImageQuery was an X-Windows-based system with several features that were relatively new for the time: a graphical user interface, point-and-click searching, thumbnail images to permit browsing and sorting, tools for annotation of images, and the linking of images to locations on maps. In addition, ImageQuery was designed for networked accessibility, had client-server features, and permitted Boolean searches.
Vicky's and my contribution is the penultimate one, and again I suggest you read Lots Of Cliff Keeps Stuff Safe in the draft we posted back in April.
This text reports on the impact of a training course offered by a multiplier trainer from the School of Data network. A one-day workshop on Ensuring FAIR and High Quality Data using Open Data Editor (ODE) was hosted at Department of Library and Information Science, Swami Vivekanand Subharti University (SVSU), Meerut, India, on 28 June...
This text reports on the impact of a training course offered by a multiplier trainer from the School of Data network. With support from the Open Knowledge Foundation, the Technology for Inspiration Initiative led by Nuela Ada Ononiwu organised a two-day training workshop on the Open Data Editor (ODE) developed under the School of Data...
Someone in a Slack I’m in asked for advice on how to understand classes in programming, and I wrote a mini-novel about how I understand them, so I figured it might as well be a blog post! Here we go —
A class is a noun. That is to say:
it is a Specific Thing
which I can name.
And I make a class when I have a noun that needs to know things about itself.
What do I mean by “needs to know things about itself”? I mean that this Specific Thing has associated behavior and/or data. For instance, if I’m writing a circulation system for a library, an important noun might be a Checkout.
Checkouts have data they need to know about themselves: for instance, what time was the object checked out, what object is it, when is it due back, what user ID has it. Every Checkout needs to know this kind of data about itself, but the specific values of the data will differ for each Checkout.
Similarly, Checkouts have behavior — that is, verbs that happen to this noun! For instance, a Checkout might know how to resolve itself: add a checkin timestamp, remove the user ID for privacy purposes, et cetera.
And now the sweet part: Checkouts need to know the mechanics of how to resolve themselves, but code outside of Checkouts only needs to know that Checkouts can be resolved. This makes code a lot more readable — if you give the methods on Checkout good names, then you see something like checkout.resolve() and you understand what’s happening; you don’t need to get bogged down in the details of how it happens. You can even update the mechanics of how Checkout resolution works later, and the rest of the code doesn’t need to change. Only the Checkout class itself needs to keep track of the specifics. This principle of encapsulation is fundamental to software design and makes it easier to maintain code over time.
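To make that concrete, here is a minimal sketch of what such a Checkout might look like in Python. The field names and loan period are made up for illustration; they are not from any particular circulation system.

from datetime import datetime, timedelta

class Checkout:
    """One Specific Thing: a single item checked out to a single user."""

    def __init__(self, item_id, user_id, loan_days=21):
        # data every Checkout knows about itself
        self.item_id = item_id
        self.user_id = user_id
        self.checked_out_at = datetime.now()
        self.due_back = self.checked_out_at + timedelta(days=loan_days)
        self.checked_in_at = None

    def resolve(self):
        # behavior: add a checkin timestamp and drop the user ID for privacy
        self.checked_in_at = datetime.now()
        self.user_id = None

Code elsewhere can create one with Checkout(item_id="b123", user_id="u456") and later call checkout.resolve(), without ever caring how resolution works inside.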
OK, so how do I know when to write a class? I mean, at first, I usually don’t! I write some functions to handle behavior, and then over time I realize one or more of the following:
There’s a bunch of data that tends to travel together. Maybe it wants to be different properties of one thing.
I keep seeing functions that need more than 4 arguments. Maybe several of these arguments want to be properties of a common thing. (This is the most common way that I notice that I have a bunch of data traveling together; see the sketch after this list.)
I realize that there’s an important object or idea in my problem domain, and a big part of the cognitive work I’m doing is modeling its data and behavior, and my life will be easier if I just slap a name on that and treat it as a Specific Thing.
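As a small, made-up illustration of the first two signals: a pile of arguments that keeps traveling together can be promoted to one Specific Thing.

# before: the same values get passed around everywhere
def send_overdue_notice(item_id, user_id, due_back, email):
    print(f"Item {item_id} (user {user_id}) was due {due_back}; emailing {email}")

# after: they become properties of one thing, and functions take that thing
class Loan:
    def __init__(self, item_id, user_id, due_back, email):
        self.item_id = item_id
        self.user_id = user_id
        self.due_back = due_back
        self.email = email

def send_overdue_notice(loan):
    print(f"Item {loan.item_id} (user {loan.user_id}) was due {loan.due_back}; emailing {loan.email}")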
All of this, of course, was a learning journey I had to go on as a baby programmer. It’s easiest to get started by just writing functions that Do The Thing! That’s definitely where I got started, and it took me a while to build an actionable mental model of classes. For me, I think the first thing that got me over the cognitive hurdle was when Django introduced class-based views. I’d been working in Django but it had only had function-based views, and then I had to adapt to this new way of thinking. It was a challenge at first, but Django did such a phenomenal job of writing class-based views as repeating patterns — so the same sorts of methods tended to be exposed in different ones, and you could just grab and overwrite what you needed — that I started to understand their utility. Also, of course, the documentation was spectacular.
I also found it really helpful to watch a whole bunch of Sandi Metz’s Smalltalk-inflected Ruby conference talks. She has incredibly lucid things to say about object-oriented programming; in particular, the way she talks about message-passing really influenced how I think about writing tests, hiding vs showing information, and different objects’ scopes of responsibility. I don’t think you can go wrong with any of them, but the ones that have stuck with me are Nothing is Something, All the Little Things, and Polly Want a Message.
LibraryThing is pleased to sit down this month with bestselling author Susan Wiggs, whose prolific body of work—including more than fifty novels—has been translated into more than twenty languages, and is available in more than thirty countries. Known for such popular series as The Lakeshore Chronicles and the Bella Vista Chronicles, and for stand-alone bestsellers like The Oysterville Sewing Circle and Family Tree, she has been described by the Salem Statesman Journal as a writer who is “one of our best observers of stories of the heart [who] knows how to capture emotion on virtually every page of every book.” A former teacher, a Harvard graduate, and an avid hiker, she lives on an island in Washington’s Puget Sound. Wiggs sat down with Abigail this month to discuss her new novel, Wayward Girls—due out from William Morrow later this month, it is currently on offer through the Early Reviewer program—a tale of six teenage girls confined to a Catholic institution in the 1960s for being “gay, pregnant or unruly.”
Set in Buffalo, NY in 1968, Wayward Girls is described by the publisher as being based on a true story. Tell us about that story—how did you discover it, and what made you want to write about it?
I grew up in a small town in western New York, not far from Buffalo, but we moved overseas when I was a child. I never went back until 2021, when my big brother and I embarked on a journey to revisit our childhood haunts. Jon was facing a terminal diagnosis, and this bittersweet, nostalgic trip was an item on his bucket list.
When we visited the church of our youth, vivid memories of Jon as an altar boy flooded back—especially the time his sleeve caught fire from the incense thurible. You might notice a dramatized version of this incident early in the novel! (There’s a photo of Jon and me at St Mary’s Catholic Church in Olean, NY: photo here)
This moment sparked a deeper exploration into the impact of the Catholic Church in the 60s and 70s. My research led me to a forbidding stone complex at 485 Best Street in Buffalo that had once been a Magdalene Laundry—a place where “wayward girls” were sent to be “reformed” by strict nuns. Teenage girls were forced into slave labor and some delivered babies without proper medical care–babies that were sometimes stolen from them and placed for adoption. Though vaguely aware of the “laundries” in Ireland, I was shocked to learn they existed throughout the U.S. as well.
As a child, I remember more than one babysitter who “went away,” a euphemism for girls sent into hiding when they became pregnant. The more I learned, the more deeply I felt the helpless pain and rage of these young women. Their stories ignited my imagination, and Wayward Girls became one of my most personal and involving novels to date. I hope my passion for this topic touches readers’ hearts and inspires important conversations about our past treatment of young women, and–as Jodi Picoult points out–is a cautionary tale for today.
What kind of research did you need to do, while writing? Were there things you learned that surprised you, or that you found particularly disturbing or noteworthy?
Well, you won’t be surprised to know that I started at the library. The public library, of course, and I also had help from the librarian of the Buffalo History Museum. Every book I write begins with a visit to the library, and that has never varied in my 35+ years of writing fiction.
The former Good Shepherd facility in Buffalo still exists, although it’s no longer a Magdalene laundry. The atrocities committed there have been verified by accounts in scholarly and court documents, and by anecdotal evidence from former inmates. Currently there are multiple lawsuits involving the Good Shepherd, brought by individuals who suffered harm at the hands of the Catholic organizations responsible for operating them.
But as my editor pointed out in her letter to early readers, the novel is about the irrepressible spirit of women, and it’s not all doom and gloom. And in order to bring the story to life for readers, it’s a vivid snapshot of the world in 1968: the war in Vietnam, protests around diversity and women’s rights…eerily not so different from the world today. The race riot in Buffalo that was quelled in part by Jackie Robinson actually did occur. And Niagara Falls was actually “shut off” as depicted in the novel. The nuns characterized it as a “miracle,” although the real explanation is more prosaic and scientific.
Although Magdalen asylums or laundries operated throughout the Anglophone world, revelations regarding the abuses perpetrated in these institutions were particularly explosive in Ireland from the 1990s through the 2010s. Did this history inform your story, set in the states?
Like many readers, I was aware of (and horrified by) the Magdalen asylums in Ireland, thanks to news reports, books like Small Things Like These and films like The Magdalene Sisters and Philomena. There’s even a song called “The Magdalene Laundries” by Joni Mitchell. Probably the most moving and disturbing account I read during my research was Girl in the Tunnel by Maureen Sullivan.
I learned that in the United States, there were at least 38 such institutions. Women and girls, most from poor homes, were regularly sentenced to religious-run, but state-sanctioned prison systems of slave labor and abuse.
How do you approach disturbing topics in general, when writing a book? Is there anything in particular you hope readers will take away from Wayward Girls?
I’ve never shied away from dealing with controversial subjects in my books. I believe fiction can be a safe space to explore difficult realities that many people face. My approach is always to ask whether including disturbing content serves the story and characters in a meaningful way. I hope readers come away with insights about the enduring resilience of the human spirit rather than just feeling shocked.
For me as an author, the most gratifying feedback from a reader is to hear that not only were they transported and entertained, but that they gained something of lasting value from reading my book. Just last week, I received this moving note from a reader who is looking forward to Wayward Girls:
I have a personal history and I am still uncomfortable at times “coming out”, so to speak. While I am not in the book it was my experience in 197* ….At 16 I was sent to the Zoar Home for Unwed Mothers through the Catholic Diocese of Steubenville Ohio.
It is a VERY emotional continued lifelong journey – but healing to read, expose and work through all the trauma that comes back to the surface when faced with others’ stories or historical revelations.
I look forward to your beautiful writing portraying this story and adding continued society enlightenment of the traumatic experiences and shame those of us suffered, as we are still bearing the pain while continuing to navigate this life and memories….
Thank you Susan
Well. When an author gets a note like this from a reader, she has no higher calling. I only hope this reader will feel seen by Wayward Girls.
Tell us a little bit about your writing process. Do you have a particular place or time you like to write, or a specific routine you follow? Do you have any advice for aspiring writers?
Thank you for this question. I love talking shop! I’ve written 50+ novels, and no matter how the world (and technology) changes, my process is always the same. I write my first draft in longhand, using a fountain pen with peacock blue ink, in a Clairefontaine or Leuchtturm 1917 grid-ruled notebook. People who are left-handed know why—it helps me avoid dragging my sleeve through the wet ink, because the fountain pen ink dries instantly.
This process keeps me away from the computer, which I find to be a delightful distraction. Writing a novel of 130,000 words requires enormous focus. Eventually, I type it up (these days by dictation) in proper manuscript format and revise it about eleven hundred seventy-seven times, and send it to my literary agent and editor for notes. Then I revise it about eight hundred more times until I’m convinced that I’ve done my very best for my readers.
Instead of “aspiring writers,” I like the term “emerging writers.” They are already writing. They are just learning and polishing their craft as their stories emerge. I wish I could say there’s a magic formula to vault you into success, but there isn’t. The first job is to define what success looks like for you, personally. A traditional commercial publisher like HarperCollins? An indie-published book? Or just the sense of accomplishment that you’ve put your heart on paper?
Once you know what you want as a writer, find the journey that will take you there. It probably won’t be easy, but what goal worth having is? For traditional publishing, you need to find a literary agent (never ever ever pay a fee to an agent!) who will place your book with a publisher.
I don’t even know what to say about AI. A novel is meant to be a personal artistic expression–the author’s unique perspective, experiences, and voice. Having AI generate text defeats the purpose of creative writing. If you want to write a novel, the struggle and growth that comes from doing it yourself is actually the point.
In terms of WHAT to write, please write the story you’re dying to tell, not the thing that’s hot in bookstores and on Booktok right now. Those books were conceived and published long ago, so by the time you jump on the bandwagon, it has already left. Make yourself happy with your writing. Reach out to a writers’ group in your community. The local library is a good place to connect. Take a class. Write together, trade chapters, talk shop.
Writing fiction is like being the ultimate master of your own personal universe. There’s something deeply satisfying about finally having complete control over something, even if it’s just whether your protagonist gets coffee or gets hit by a bus.
It’s also the socially acceptable way to have elaborate conversations with imaginary people. You can kill off that annoying character who’s clearly based on your ex, give yourself superpowers through a thinly veiled alter ego, and resolve conflicts in ways that would never work, or win every argument with a perfectly timed witty comeback.
Plus, fiction lets you experience the rare joy of creating problems on purpose just so you can solve them. It’s like being a chaos agent and a benevolent fixer all at once. Where else can you ruin someone’s entire life in chapter three and then feel genuinely proud of yourself for it?
But I’ve strayed from the question! The answer is, READ. Read new books hot off the press. Read beloved older titles. Read the classics, the ones you thought were so boring when you were a kid in school. Because chances are, these books mean something to you now.
And at the end of the day, the very long writing day, that’s all an author can hope for—that readers were willing to spend their time reading a book filled with the deepest secrets of her heart.
Don’t shy away from your writing dreams. Tell your family/partner/friends that you have two sacred hours every day you’re going to devote to writing. And then write. WRITE.
What’s next for you? Do you have any books in the offing that you can share with us?
This is my second-favorite place to be in the writing journey. I have a blank page in front of me and I get to start something fresh!
In the meantime, there will be lots of editions of my books coming out—new paperback versions, new audiobooks, interesting new formats to explore.
Tell us about your library. What’s on your own shelves?
My library grows and changes over time. While books come and go, there are a few permanent fixtures: the first book I read (The Carrot Seed by Ruth Krauss), the first book I bought (Yertle the Turtle by Dr Seuss), the first book I took from the library (You Were Princess Last Time by Laura Fisher), and the first long book I read in one sitting and immediately reread (Diary of Anne Frank).
I do love my keepers, but I tend to give books away after I’ve read them. I’m always sending books I’ve loved to my reader friends and family. But I always keep the books signed by the author, because that signature makes me feel like I’m a member of an exclusive club. Although it’s bittersweet, I am especially fond of my books signed by authors who aren’t with us anymore—Madeleine L’Engle, Anne Rice, Sir Roger Bannister, Ray Bradbury, Crosby Bonsall.
What have you been reading lately, and what would you recommend to other readers?
I am on a mission to read all the books I can get my hands on by the people who read the early draft of Wayward Girls, because I know what a time commitment it is, how busy we all are, and what fantastic writers they are. So I’ve been reading books by Jodi Picoult, Adriana Trigiani, Patti Callahan Henry, Robert Dugoni, Kristina McMorris, and Shana Abé. All of these authors remind me of why I decided to write in the first place—to transport, entertain, surprise, and delight the reader.
If I could choose one poem of mine to explain my stance, or my artistic position, it would be “The One Thing in Life,” which appears in Lucky Life. In this poem I stake out a place for myself, so to speak, that was overlooked or ignored or disdained, a place no one else wanted. I mean this in a psychological and metaphorical and philosophical sense. When I think about the place “no one else wanted,” I think of an abandoned or despised area. I think of weeds, a ruin, a desert, but I think of these things not as remote in time or place from that which is familiar and cherished and valuable—our civilization—but as things that lie just under the surface and just out of eyesight. (Bradish, pp. 1486-1487).
I wonder, "What does Plato have to say about knowledge, and to what degree is knowledge an absolute?"
Size and Scope of the Corpus
To address this question, I amassed twenty-six English-language items authored or attributed to Plato. All of these items were garnered from Project Gutenberg. (See the cache, the rudimentary bibliography, and the computed summary.) Because the word "knowledge" occurs as one of the more frequent computed keywords in the corpus, I can assume, at least to some degree, that the concept of knowledge is discussed:
Word cloud illustrating the frequency of computed keywords in the Plato corpus
Traditional Close Reading Process
I can then search the collection for items where "knowledge" is a computed keyword. There are thirteen, and the traditional way to address my question would be to apply close reading to the following items:
But call me lazy. I want to address my question. I want to get answers, not pointers.
Knowledge Is Using Grammar
One of the most rudimentary ways to extract definitions is to concordance for something like "knowledge is". Here are some of the more meaningful results of such a process when applied to the corpus:
qualities? socrates: why, you know that knowledge is the first qualification of any teacher?
wledge of what we do not know? besides, knowledge is an abstraction only, and will not infor
each. we seem to find that the ideal of knowledge is irreconcilable with experience. in huma
ignored, and the certainty of objective knowledge is transferred to the subject; while absol
lace, they are abiding. and this is why knowledge is more honourable and excellent than true
wledge is power? protagoras agrees that knowledge is certainly a governing power. this, howe
s the food of the soul? surely, i said, knowledge is the food of the soul; and we must take
?' 'i think we must admit that absolute knowledge is the most exact knowledge, which we must
edge, one of two things follows--either knowledge is not to be attained at all, or, if at al
your favorite doctrine, socrates, that knowledge is simply recollection, if true, also nece
her incline to think that the method of knowledge is inseparable from actual knowledge, and
e measure of all things,' and that 'all knowledge is perception.' this was the subjective wh
ion; but i will venture to assert, that knowledge is true opinion: let this then be my reply
the doves, and say that the chase after knowledge is of two kinds? one kind is prior to poss
e fancied to be a perfect definition of knowledge is a dream only. but perhaps we had better
nce or of anything! and so, theaetetus, knowledge is neither sensation nor true opinion, nor
n the other hand, we are conscious that knowledge is independent of time, that truth is not
combine, and he is of opinion that such knowledge is granted to the gods only. to have seen
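For readers who want to reproduce this step, it amounts to a simple keyword-in-context (KWIC) search. Below is a minimal sketch in Python; the file name plato.txt, standing in for the concatenated corpus, is my assumption and not part of the original study:

import re

def kwic(text, phrase="knowledge is", width=40):
    # Return every occurrence of the phrase with `width` characters of context on each side.
    results = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        results.append(" ".join(text[start:end].split()))
    return results

with open("plato.txt", encoding="utf-8") as handle:  # hypothetical concatenated corpus
    corpus = handle.read()

for line in kwic(corpus):
    print(line)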
A similar way to extract such sentences is to find all sentences with the form subject-predicate-object, and then retain only the ones where the word "knowledge" is the subject and the predicate is a form of the verb "to be". Below is a sample of the results of such a process:
Why, you know that knowledge is the first qualification of any teacher
Not power but knowledge is the characteristic of a king or royal person.
Besides, knowledge is an abstraction only, and will not inform us of any particular subject, such as medicine, building, and the like.
Still at the bottom of the arguments there remains a truth, that knowledge is something more than sensible perception;--this alone would not distinguish man from a tadpole.
But the thoughts of men deepened, and soon they began to be aware that knowledge was neither sense, nor yet opinion--with or without explanation; nor the expression of thought, nor the enumeration of parts, nor the addition of characteristic marks.
And what we fancied to be a perfect definition of knowledge is a dream only.
And so, Theaetetus, knowledge is neither sensation nor true opinion, nor yet definition and explanation accompanying and added to true opinion?
Surely, I said, knowledge is the food of the soul; and we must take care, my friend, that the Sophist does not deceive us when he praises what he sells, like the dealers wholesale or retail who sell the food of the body; for they praise indiscriminately all their goods, without knowing what are really beneficial or hurtful: neither do their customers know, with the exception of any trainer or physician who may happen to buy of them.
Now the rest of the world are of opinion that knowledge is a principle not of strength, or of rule, or of command: their notion is that a man may have knowledge, and yet that the knowledge which is in him may be overmastered by anger, or pleasure, or pain, or love, or perhaps by fear,--just as if knowledge were a slave, and might be dragged about anyhow.
And what did you think, he said, of that part of the argument in which we said that knowledge was recollection, and hence inferred that the soul must have previously existed somewhere else before she was enclosed in the body?
'My answer is, that knowledge is perception.'
But if knowledge is perception, how can we distinguish between the true and the false in such cases?
Theaetetus offers a definition which he has heard: Knowledge is true opinion accompanied by definition or explanation.
And so, Theaetetus, knowledge is neither perception nor true opinion, nor yet definition accompanying true opinion.
But if, said Socrates, you are still incredulous, Simmias, I would ask you whether you may not agree with me when you look at the matter in another way;--I mean, if you are still incredulous as to whether knowledge is recollection.
And so, Theodorus, we have got rid of your friend without assenting to his doctrine, that every man is the measure of all things--a wise man only is a measure; neither can we allow that knowledge is perception, certainly not on the hypothesis of a perpetual flux, unless perchance our friend Theaetetus is able to convince us that it is.
I cannot say, Socrates, that all opinion is knowledge, because there may be a false opinion; but I will venture to assert, that knowledge is true opinion: let this then be my reply; and if this is hereafter disproved, I must try to find another.
Apparently, according to Plato, knowledge has something to do with perception and true opinion.
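As an aside, the subject-predicate-object filter described above can be approximated with a dependency parser. The sketch below uses spaCy; the corpus file name (plato.txt) and the choice of the small English model are my assumptions, not necessarily the tools used here:

import spacy

nlp = spacy.load("en_core_web_sm")  # a small English pipeline that includes a dependency parser

with open("plato.txt", encoding="utf-8") as handle:  # hypothetical concatenated corpus
    paragraphs = [p for p in handle.read().split("\n\n") if p.strip()]

for doc in nlp.pipe(paragraphs):
    for sentence in doc.sents:
        for token in sentence:
            # Keep sentences whose grammatical subject is "knowledge"
            # and whose governing verb is a form of "to be".
            if (token.dep_ == "nsubj"
                    and token.lemma_.lower() == "knowledge"
                    and token.head.lemma_ == "be"):
                print(" ".join(sentence.text.split()))
                break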
Knowledge Is Using Semantic Indexing
Yet another way to identify sentences alluding to knowledge is to apply semantic indexing against them and then query for the word "knowledge". I did just this, and the results were returned in the form of one long paragraph:
But even if knowledge can know itself, how does the knowledge of what we know imply the knowledge of what we do not know? But of what is this knowledge? And that knowledge which is nearest of all, I said, is the knowledge of what? But what is this knowledge then, and of what? LACHES: And the reason of this is that they have knowledge? Would not knowledge?--a knowledge of measuring, when the question is one of excess and defect, and a knowledge of number, when the question is of odd and even? 'Then you learn what you know.' What knowledge is there which has such a nature? And knowing is having knowledge at the time? And not knowing is not having knowledge at the time? But then what is this knowledge, and what are we to do with it? Certainly, of the knowledge which I have. Knowledge is prior to any particular knowledge, and exists not in the previous state of the individual, but of the race. And what is the nature of this knowledge or recollection? And we have not got the idea of knowledge? We cannot define knowledge until the nature of definition has been ascertained. And so we must ask again, What is knowledge? What then is knowledge? question--"What is knowledge?" 'My answer is, that knowledge is perception.' But if so, knowledge is not perception. What then is knowledge? All this time we have been repeating the words 'know,' 'understand,' yet we do not know what knowledge is. Is there some other form of knowledge which distinguishes them? What then is knowledge? But when the word 'knowledge' was found how was it to be explained or defined? It may be regarded as a higher degree of knowledge when we not only know but know that we know. Herein lies the difficulty which I can never solve to my satisfaction--What is knowledge? And so, when the question is asked, What is knowledge? That which is known is affected by knowledge, and therefore is in motion. Knowledge, like the other, is one; and yet the various parts of knowledge have each of them their own particular name, and hence there are many arts and kinds of knowledge. Not power but knowledge is the characteristic of a king or royal person.
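One way to approximate this semantic indexing step is to build a small latent semantic index from TF-IDF vectors and query it for "knowledge". The sketch below is only illustrative; the file name, the number of dimensions, and the number of hits returned are my assumptions:

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

with open("plato.txt", encoding="utf-8") as handle:  # hypothetical concatenated corpus
    corpus = handle.read()

# Naively split the corpus into sentences.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(sentences)
lsi = TruncatedSVD(n_components=100, random_state=0).fit(tfidf)  # the latent semantic index
sentence_vectors = lsi.transform(tfidf)
query_vector = lsi.transform(vectorizer.transform(["knowledge"]))

scores = cosine_similarity(query_vector, sentence_vectors)[0]
top_hits = sorted(scores.argsort()[::-1][:50])  # fifty best hits, kept in document order
print(" ".join(sentences[i] for i in top_hits))  # one long paragraph of "knowledge" sentences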
The paragraph can be transformed into a set of smaller paragraphs by comparing each sentence to the sentence that follows it; if the two are significantly different, then a new paragraph is begun. Here is the result of such a process:
But even if knowledge can know itself, how does the knowledge of what we know imply the knowledge of what we do not know? But of what is this knowledge? And that knowledge which is nearest of all, I said, is the knowledge of what?
But what is this knowledge then, and of what? LACHES: And the reason of this is that they have knowledge? Would not knowledge?--a knowledge of measuring, when the question is one of excess and defect, and a knowledge of number, when the question is of odd and even?
'Then you learn what you know.' What knowledge is there which has such a nature? And knowing is having knowledge at the time? And not knowing is not having knowledge at the time? But then what is this knowledge, and what are we to do with it?
Certainly, of the knowledge which I have. Knowledge is prior to any particular knowledge, and exists not in the previous state of the individual, but of the race. And what is the nature of this knowledge or recollection? And we have not got the idea of knowledge? We cannot define knowledge until the nature of definition has been ascertained. And so we must ask again, What is knowledge? What then is knowledge? question--"What is knowledge?" 'My answer is, that knowledge is perception.' But if so, knowledge is not perception. What then is knowledge? All this time we have been repeating the words 'know,' 'understand,' yet we do not know what knowledge is. Is there some other form of knowledge which distinguishes them? What then is knowledge? But when the word 'knowledge' was found how was it to be explained or defined? It may be regarded as a higher degree of knowledge when we not only know but know that we know. Herein lies the difficulty which I can never solve to my satisfaction--What is knowledge? And so, when the question is asked, What is knowledge? That which is known is affected by knowledge, and therefore is in motion. Knowledge, like the other, is one; and yet the various parts of knowledge have each of them their own particular name, and hence there are many arts and kinds of knowledge. Not power but knowledge is the characteristic of a king or royal person.
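The splitting heuristic just illustrated can be sketched in a few lines: vectorize the sentences, measure the similarity between each sentence and the next, and begin a new paragraph whenever the similarity falls below a threshold. The threshold value here is an arbitrary assumption of mine:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def split_into_paragraphs(sentences, threshold=0.1):
    # Begin a new paragraph whenever adjacent sentences are sufficiently dissimilar.
    vectors = TfidfVectorizer().fit_transform(sentences)
    paragraphs, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = cosine_similarity(vectors[i - 1], vectors[i])[0, 0]
        if similarity < threshold:
            paragraphs.append(" ".join(current))
            current = []
        current.append(sentences[i])
    paragraphs.append(" ".join(current))
    return paragraphs

# A tiny demonstration using sentences taken from the paragraph above.
print(split_into_paragraphs([
    "But what is this knowledge then, and of what?",
    "And the reason of this is that they have knowledge?",
    "Then you learn what you know.",
]))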
Many people will now say, "Can you cite from whence these sentences came?" And my answer is, "Yes, see the following list of items and the number of identified sentences coming from each":
Notice how this list echoes the list identified via the keyword search, above. Moreover, if I were to apply close reading to my question, then I'd spend my time against Theaetetus.
Knowledge Is Supplemented with Large-Language Models
The long paragraph of extracted sentences can then be used as input to a large-language model for summarization. Such is what I did:
In this context, the question "What is knowledge?" is posed multiple times, with various responses and discussions that revolve around the nature of knowledge. Here's a brief summary of the main points:
1. Knowledge is prior to any particular knowledge, and exists in the race rather than the individual.
2. The nature of knowledge or recollection is not fully understood, and so the question "What is knowledge?" remains difficult to define.
3. Knowledge is perception, but this definition raises further questions about what constitutes knowledge.
4. Knowledge is a higher degree of understanding when one not only knows but also recognizes that they know.
5. The concept of knowledge is complex and multifaceted, encompassing various arts and kinds of knowledge.
6. Power is not the defining characteristic of a king or royal person, but rather knowledge.
Overall, the conversation highlights the challenges of defining knowledge and the various ways in which it can be understood and approached.
Somewhat helpful.
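For the record, this summarization step can be scripted against any OpenAI-compatible chat endpoint. In the sketch below, the endpoint (a local Ollama server), the model name, and the file holding the extracted sentences are all my assumptions, not necessarily what produced the summary above:

from openai import OpenAI

with open("extracted-sentences.txt", encoding="utf-8") as handle:  # hypothetical file holding the long paragraph
    long_paragraph = handle.read()

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")  # assumed local endpoint

response = client.chat.completions.create(
    model="llama3",  # hypothetical local model
    messages=[
        {"role": "system", "content": "Summarize the main points of the following text."},
        {"role": "user", "content": long_paragraph},
    ],
)
print(response.choices[0].message.content)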
The same long paragraph can be used as input into a large-language model and queried. This process is called "retrieval-augmented generation" (RAG). I "ragged" my paragraph and asked, "What is knowledge?". Here is the complete response:
As a college professor with a limited capacity for words, I must concisely address the question "what is knowledge?" Knowledge, in essence, refers to the understanding or awareness of something, whether it be a fact, concept, or experience. However, the nature of this understanding is not straightforward, as it implies an acknowledgment of what one does not know just as much as what one does know.
The closest knowledge we have is the knowledge of what? The answer lies in the realization that knowledge is prior to any particular knowledge and exists within the collective race rather than the individual. This collective knowledge is a recollection of experiences and perceptions that transcend individual understanding.
The question remains, "What is knowledge?" Is it perception? No, for knowledge encompasses more than just perceiving something. It is a higher degree of awareness, acknowledging both what we know and what we do not know. The difficulty lies in defining this concept, as it eludes our understanding despite repeated attempts.
In essence, knowledge is a complex and multifaceted construct that cannot be reduced to a single definition. It encompasses various forms of understanding, including power, which distinguishes a king or royal person from others. Ultimately, the nature of knowledge remains a mystery, and we are left with more questions than answers.
Again, somewhat helpful.
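A bare-bones version of this "ragging" step looks like the sketch below: retrieve the sentences most similar to the question and hand only those to the model as context. Again, the endpoint, the model name, the file name, and the number of retrieved sentences are my assumptions:

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI

question = "What is knowledge?"

with open("extracted-sentences.txt", encoding="utf-8") as handle:  # hypothetical file of extracted sentences
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", handle.read()) if s.strip()]

# Retrieval: rank the extracted sentences against the question.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(sentences)
scores = cosine_similarity(vectorizer.transform([question]), matrix)[0]
context = " ".join(sentences[i] for i in scores.argsort()[::-1][:25])

# Generation: ask the model to answer from the retrieved context only.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")  # assumed local endpoint
answer = client.chat.completions.create(
    model="llama3",  # hypothetical local model
    messages=[
        {"role": "system", "content": "Answer the question using only the supplied context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)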
Conclusion
So, what is knowledge according to Plato? Quite frankly, I am unable to answer the question definitively, but I am going away with a better and more nuanced understanding of the topic. And I am now equipped with a vocabulary for discussion. Yet, a sophomoric answer to the question might be, "Knowledge, according to Plato, has something to do with perception, 'true opinion', and recollection, but then again, the question may be moot or unanswerable."
For extra credit, I might try to understand the definition of "true opinion". Concordance snippets begin to address the question:
true opinion is as good a guide to correct action as knowledge
true opinion is no matter of conjecture with me
true opinion is proved by the rhetoric of the law courts, which cannot give knowledge
true opinion is surely unerring, and the results which follow from it are all noble and good
For extra extra credit, I might apply this same process against the works of some other philosopher and then compare and contrast. Hmmm...
Interesting fact: It took me longer to write up what I learned and format it for the Web than it did to do the actual work. I believe that says something about the process. After all, as a librarian, one of my goals is to "Save the time of the reader."
Epilogue
This missive and all of the supporting data ought to be available as a Distant Reader study carrel ("data set") at the following URL:
The UPS Prototype was a proof-of-concept web portal built in preparation for the Universal Preprint Service Meeting held in October 1999 in Santa Fe, New Mexico. The portal provided search functionality for a set of metadata records that had been aggregated from a range of repositories that hosted preprints, working papers, and technical reports. Every search result was overlaid with a dynamically generated SFX-menu that provided a selection of value-adding links for the described scholarly work. The meeting outcome eventually led to the Open Archives Initiative and its Protocol for Metadata Harvesting (OAI-PMH), which remains widely used in scholarly communication, cultural heritage, and beyond. The SFX-menu approach became standardized as the NISO OpenURL Framework for Context-Sensitive Services (NISO OpenURL), and compliant linking servers remain operational in academic and research libraries worldwide. Both OAI-PMH and NISO OpenURL, as well as associated systems and services, have been so widely deployed that they can safely be considered an integral part of the scholarly information infrastructure. The authors, who were deeply involved in devising the UPS Prototype and played core roles in the OAI-PMH and NISO OpenURL specification efforts, take the reader behind the scenes of the development of these technologies and reveal Clifford Lynch as the Invisible Influencer in the establishment of scholarly information infrastructure.
Introduction
The book “Music with Roots in the Aether” bundles interviews and essays about seven contemporary American composers, including Philip Glass, Alvin Lucier, and Pauline Oliveros [1]. It was published in 2000 and is based on a 1975-1976 project by Robert Ashley – himself a contemporary American composer – for which he created 14 one-hour videotapes focusing on the creative genius of the selected composers [2]. Ashley was asked to write a foreword for the book and spends his first paragraphs emphasizing the challenge involved in reflecting on a project he did 25 years earlier:
But the Foreword turned out to be hard, even for me. I couldn’t remember who I was when the project was conceived. I couldn't remember any of the energies of the ideas that went into the project. Purposely I have not been good at remembering old ideas. I burn bridges. It keeps the path clear. [1]
Ashley’s sentiment resonates with us, because, for this 2025 essay, we impulsively chose to illustrate Clifford Lynch’s impact on the development of infrastructure for research and education by means of the Universal Preprint Service (UPS) project that we jointly initiated and executed in 1999. Some memories have remained strong, others have faded and become uncertain, and, undoubtedly a lot has just evaporated into the fabric of time. Fortunately, there are external memories that can serve as fall-backs when ours fail. Many aspects of the project and its context were documented in research papers. These papers reference documents with details about underlying discussions that are long gone from the organizational websites on which they were published, but fortunately were saved for posterity by the indispensable Internet Archive. The petites-histoires featuring people involved in our effort have not been publicly documented because the turn of the century was still social-media free. But personal touches often aptly illustrate the spirit of a project and the zeitgeist of the era in which it took place. Therefore, we feel compelled to also include a few anecdotes that remain glittering sharp in our memory. All in all, despite the fog of time, we are quite confident that the story we tell is an accurate reflection of events that were crucial to the eventual broad adoption of metadata harvesting using the Open Archives protocol (OAI-PMH) and open linking using OpenURL, and, especially, of the crucial role Cliff played in making that happen.
Two PhD Candidates, in for a Surprise
The middle to late 1990s were exciting times for those into computers, networks, and information. Times that seemed to hold an unlimited potential, rather abruptly brought about by the combination of the HTTP/HTML Web, the mainstreaming of the Internet, affordable personal computing, and increased digitization capabilities. Like many others, we were excited about how these technologies could bring about a better world and consequently devoured Wired, a magazine that abounds with “techno-utopianism and hippie-idealism” [3]. We had jobs that presented challenges in which this powerful combination of technologies could be leveraged to imagine and implement innovative solutions.
Herbert became systems librarian at the Ghent University Library in 1981, after completing an administrative automation project there to obtain a degree in Informatics. He didn’t exactly hit the ground running as he was trying to figure out what automation in academic libraries was all about. Most libraries were focusing their efforts on the catalogue but, given his science education, that didn’t seem to tick all the boxes. Eventually, it was the science librarian who turned on the light by putting the automation challenge in terms of the “consultation chain”: first searching secondary sources to find journal articles, then searching catalogues to determine where the journals were, and then obtaining the articles. And so it was that, as soon as CD-ROMs became available, Herbert started providing public access to Abstract & Indexing (A&I) databases, initially on stand-alone PCs, later on PCs in Local Area Networks (LAN), and eventually on PCs across the university’s Wide Area Network. He initiated an effort to create a Belgian Union Catalogue on CD-ROM and hooked it up to the network too (Figure 1). Access dramatically improved but constraints remained: consultation was restricted to Windows PCs, the LANs had to run the Banyan Vines operating system, and networking a large collection of CD-ROMs published by a variety of vendors was a dark art. It all amounted to access being restricted to dedicated library PCs operated in departmental libraries, which was better than what most other European academic libraries had to offer but not good enough for Herbert. That is why he experienced the interoperability fabric introduced by the Web as the chains coming off regarding ways to deliver scholarly information to researchers and students. That enthusiasm resulted in the 1997 release of the Library’s Executive Lounge, a menu-driven environment that provided web-based access to all information that had previously only been available on library PCs, with the addition of some electronic journal collections for good measure. But something was still missing: the Web had links and the Executive Lounge didn’t. Herbert put it as follows:
When using a library solution, the expectations of a net-traveler are inspired by his hyperlinked Web-experiences. To such a user, it is not comprehensible that secondary sources, catalogues and primary sources, that are logically related, are not functionally linked. [4]
Figure 1 - September 1989, Ghent University Library: Herbert showing off a CD-ROM
The frustration expressed in this quote led to a collaboration with SilverPlatter and Ex Libris to implement dynamic links from journal article descriptions in A&I databases to journal holding information in the library catalogue. And it also provided fertile ground for PhD research on how to empower libraries to create links across their electronic collections by means of an open linking framework.
Michael began his professional career at the NASA Langley Research Center (LaRC) in 1991, originally working in the Analysis and Computation Division of the supercomputer center. Early experiences with Usenet and anonymous FTP began to divert his attention from supercomputing and cluster computing (now known as cloud computing) to information networks and libraries. In 1993, he set up an anonymous FTP server, the Langley Technical Report Server (LTRS), for technical memorandums and technical papers published by LaRC. It effectively brought the culture of sharing and accessing technical reports via FTP, which already existed in computer science, to NASA. Later in 1993, he added a web interface to LTRS, providing a much-needed boost in usability. Browsing functionality improved, and abstracts were indexed and became searchable using the Z39.50-based Wide Area Information Server (WAIS), which was pretty much the only free search software at the time (for example, MySQL was not released until 1995). Around the same time, the Center for AeroSpace Information (CASI) brought their own WAIS server online; it provided abstracts for all publicly available, NASA-authored reports and articles. Other centers and projects were inspired by this activity and wanted to set up their own "report server". It became clear that a website - the term "digital library" was not yet widely adopted - allowing simultaneous WAIS search of all the NASA and NASA-affiliated report servers was needed. A bit of Perl hacking later, by Michael and his colleagues, and the NASA Technical Report Server (NTRS) was released in 1994 (Figure 2).
Figure 2: Summer 1999, NASA Langley Research Center: Michael among the desktop machines running LTRS and NTRS
The development of LTRS and NTRS assumed a 1:1 relationship from a metadata record to the URL of the associated full text document. But with the progression from ".ps.Z" to ".pdf" files, the usefulness of that assumption started to break down. It became unworkable by 1998, when Michael created a separate digital library for the scanned documents of the National Advisory Committee for Aeronautics (NACA), the 1915–1958 predecessor of NASA. Obviously, none of these documents were born digital, and a single NACA report presented on the web was composed of TIFF images, large and thumbnail JPEGs, and a PDF of the entire report. Based on the experience of managing and presenting these collections of files as a single web object, Michael's dissertation [5] evolved in the direction of creating buckets, the smart web objects in the Smart Objects, Dumb Archives (SODA) model [6]. The basic premises of SODA were that individual reports are more important than the repositories that hold them, and that it should be possible for multiple digital libraries to simultaneously make them discoverable. This 1997 insight is now commonplace, but went against the conventional wisdom of the time. It precedes, yet aligns with, the perspective of the W3C Architecture of the World Wide Web that individual resources are more important than the web servers that host them [7]. As a matter of fact, the Architecture of the World Wide Web only mentions resources, not web servers.
As Herbert and Michael embarked on their respective PhD explorations on different sides of the Atlantic, they didn’t realize they were about to meet, to collaborate on the UPS project, and to present their results at a meeting that would be moderated brilliantly by Cliff Lynch, a man they both admired but had never met in person.
The UPS Prototype
By early 1999, Herbert’s ideas to give libraries a say regarding links across their electronic collections had taken shape [8]. He had also conducted an experiment illustrating the components of the open linking framework he envisioned. A linking server operated by the library would feature a knowledge base detailing its collection as well as a rule engine that would dynamically decide which links to provide for which type of collection item. A user interested in links for a specific item would click the associated Special Effects link (SFX) that was targeted at the linking server and contained keys that allowed the server to collect sufficient metadata about the item to evaluate the rules and return item-specific links [9]. But inserting SFX links required control of the systems that provided access to the collection and, as such, the experiment only used sources operated locally by the Ghent University Library. Demonstrating the general feasibility of the approach required an experiment without such constraints.
When Rick Luce, director of the Los Alamos National Laboratory Research Library, visited the Ghent Library to check out the linking approach, it became clear that his groundbreaking Library Without Walls project [10] would provide the ideal setting: its collection combined locally and remotely controlled sources, including locally operated full-text, and it maintained close relationships with various parties in the scholarly information industry. So, Herbert packed up in February 1999 for a six-month stint in Los Alamos and successfully conducted an elaborate experiment that demonstrated the feasibility of the approach with sources under both local and remote control, including full-text collections from Wiley and the American Physical Society, and involved linking servers at Los Alamos and Ghent [11].
Figure 3 - Summer 1999, Donna Bergmark’s home: Rick Luce, Herbert, and Paul Ginsparg celebrating the call to action
But Los Alamos was also where the famous physics preprint server - then known as xxx.lanl.gov (now known as arXiv) - ran under Paul Ginsparg’s desk [12]. Having witnessed many years of fierce discussions at Ghent University about subscriptions to journals and their ever-increasing price tag, Herbert very much understood the appeal of the new communication paradigm it entailed and had brought his video camera along to Los Alamos, hoping he might get a chance to interview the much-revered Ginsparg. He shouldn’t have bothered. It turned out that Rick and Paul were already exploring whether the Library Without Walls, which ran a mirror of the preprint server, could become its institutional host, and Herbert started taking part in those conversations.
One brainstorm led to another and by the time Herbert got ready to return to Ghent, the trio published a call to action (Figure 3) for “the further promotion of author self-archived solutions” in which they announced a meeting with 25 invited experts to be held in Santa Fe, NM, in October 1999, to kick things off [13]. The stated goals were “to reach an agreement regarding an approach to build a promotional prototype multidisciplinary digital library service for the main existing e-print archives” and “to create a forum that will continue to address the interoperability of self-archiving solutions, as a means to promote their global adoption” [14]. To this day, Herbert vividly remembers the thrilling moment when he pushed the Send button on his Toshiba laptop to distribute the final version of the call to action to various listservs, while sitting in his tiny apartment in Santa Fe that was sparsely outfitted with rented furniture (Figure 4).
Figure 4: July 27 1999, Calle Mejia: Herbert sends out the invitation for the Santa Fe Meeting
Over time, Herbert had come to understand and embrace the “seeing is believing” power of prototypes. He had decided that a concrete strawman to illustrate services across e-print repositories would be needed to fuel discussions; but he would need collaborators to pull that off. When reaching out to various e-print repositories to obtain metadata dumps, Thomas Krichel, a major force behind the Research Papers in Economics (RePEc) [15] effort, enthusiastically came on board. Rick Luce identified just the other person who was needed. Via the New Mexico Library Alliance, he knew Michael’s supervisor Mike Little; together, they engineered a meeting in Washington, DC, anticipating that their Young Turks would resonate. During a four-day meeting in April 1999 it became clear they very much did, although their initial meeting didn’t get off to a good start due to Michael getting hopelessly lost driving around Dupont Circle, in those pre-GPS days. They drew up technical plans for a prototype and even managed to get a meeting with Deanna Marcum at the Council on Library and Information Resources (CLIR) and Donald (Don) Waters at the associated Digital Library Federation (DLF), securing support and funding for the meeting and the prototype.
Figure 5 – October 1999: UPS Prototype - A bucket for a preprint shows ReDIF metadata and two SFX links
Together, Herbert, Michael, and Thomas started working on the UPS Prototype to be presented at the very outset of the planned Santa Fe Meeting. And, although the prototype was intended “not to make statements about the architectural directions that UPS should take, but rather to facilitate discussions,” [11] its design did entail some significant technical choices. Metadata would be collected from various e-print repositories using static dumps; metadata would be normalized to the ReDIF format [16] used in the RePEc initiative; the SODA model would be used to manage/present individual e-prints as buckets; search across the aggregated metadata would be realized using the NCSTRL+ extension of Dienst that supported buckets; each e-print-specific bucket would provide SFX linking capabilities (Figure 5). In order to realize this all in a six-month period, the prototype trio brought more help on board. And they met twice in person: once in Ghent, where Thomas showed up totally drenched on Herbert’s doorstep after having biked through heavy rain from Ostend; once in Los Alamos, prior to the Santa Fe Meeting, when Thomas arrived hours late having biked (Figure 6) through heavy snow from Albuquerque and spent the night in a drainage pipe (colleagues arriving late was a recurring theme for Herbert). Michael mostly remembers it being bitterly cold, since his earlier visit to Albuquerque in the Summer of that same year had not taught him to pack a sweater for Santa Fe in Fall. And, despite hiccups that plague every project, the well-documented UPS Prototype [17] was finished on time, ready to be presented to the meeting participants.
The Santa Fe Meeting for ...?
In order to further optimize the chances of success for the meeting, the collaboration of Cliff Lynch and Don Waters as moderators had been secured and turned out to be fundamentally important. In the Acknowledgments section of his PhD thesis, Herbert put Cliff’s impact on the direction of the meeting and on his own thinking as follows:
When starting to work on this thesis, I went back reading several of his early papers and could not feel other than intimidated by the far forward-looking vision expressed therein. At several occasions, I heard Cliff address large audiences, discussing complicated digital library matters with an amazing clarity. Cliff's work has always been a great inspiration to me. I met Cliff for the first time in person at the Open Archives meeting in Santa Fe, for which he had enthusiastically accepted my invitation to serve as a moderator. His involvement was crucial to the successful conclusion of the meeting. [18]
Figure 6 - October 21 1999, Fort Marcy: Thomas Krichel’s bike made it to the Santa Fe Meeting
The meeting started off in a very concrete manner, with the presentation of the UPS Prototype, some exposes on repository interoperability, and reflections about institutional versus discipline-oriented archive initiatives. But, as the first day progressed, the discussions got increasingly distracted by back-and-forth arguments about the necessity (or not) of peer-review. The Stevan Harnad “self-archiving” camp (archiving the peer-reviewed version of a contribution on a personal or institutional server) insisted it is essential to keep scholarly communication trustworthy, whereas the Paul Ginsparg “preprint” camp (publishing unreviewed contributions on a discipline-oriented or institutional server) stated that knowledgeable readers can assess quality without external review and that novice readers should wait until a peer-reviewed version becomes available. Michael also remembers Paul saying something to the effect that the meeting would be a lot more productive if everyone just learned how to program in Perl and then do something instead of just talking about it. The peer-review tension had already been present prior to the meeting and is even reflected in the evolution of the title of its announcement: an unpublished version dated April 1999 was entitled “Call for participation aimed at the further promotion of the preprint concept”, the version published in July 1999 was entitled “Call for your participation in the UPS initiative aimed at the further promotion of author self-archived solutions”, whereas post-meeting the title was modified to become “The Open Archives initiative aimed at the further promotion of author self-archived solutions.” The choice of the term “archives” didn’t go down well with professional archivists [19], but it did neutralize the disagreement regarding peer-review. By the end of the first day, when participants mingled at the Santa Fe Institute (Figure 7), Herbert was frustrated despite a successful demonstration of the prototype. His bad mood must have been tangible because Ed Fox, whom Herbert had met for the first time at the meeting, volunteered one of his patented neck massages.
Figure 7 – October 21 1999, Santa Fe Institute: Herbert and Michael at the end of the first meeting day
That night, sleep would not come and Herbert, jetlagged and sleep-deprived, had incessant incoherent thoughts on how to get the meeting back on track. Prior to the start of the second day, he vented his frustration about the lack of progress to Cliff, who was about to start moderating the first session. Cliff was nice enough to let him ramble on a bit, and, in a manner that exemplified one of Cliff’s many unparalleled capabilities, he went on to open the meeting by providing two discussion topics regarding interoperability that he somehow had been able to synthesize from the first day’s discussions, which most had experienced as enjoyable yet lacking in any sense of concrete direction. One was whether archive functions, such as data collection and maintenance, should be decoupled from user functions, such as search. The other was about the choice between distributed searching across repositories and harvesting from them to build cross-repository search engines. This is what the meeting report has to say about the outcome of discussion regarding the first topic:
Although archive initiatives can implement their own end-user services, it is essential that the archives remain "open" in order to allow others to equally create such services. This concept was formalized in the distinction between providers of data (the archive initiatives) and implementers of data services (the initiatives that want to create end-user services for archive initiatives). [20]
The outcome of discussions of the second topic in favor of a harvesting solution is somewhat remarkable because distributed search using WAIS/Z39.50 was quite in vogue in libraries and digital libraries in those days. Cliff himself had a significant track record in Z39.50 and its standardization [21, 22], but he had also identified harvesting approaches as a topic for further research [23]. Motivated by complexity and scalability concerns, he gently nudged discussions in favor of harvesting. In a paper in which he clarifies the complementary nature of Z39.50 and OAI-PMH, Cliff credits the meeting participants for the decision that was considered controversial by some in the community:
The Santa Fe group wanted a very simple, low-barrier-to-entry interface, and to shift implementation complexity and operational processing load away from the repositories and to the developers of federated search services, repository redistribution services, and the like. They also wanted to minimize the interdependency between the quality of applications services as viewed by the user and the behavior of repositories that supplied data to the applications services. [24]
By the end of the meeting, there was a general sense that the UPS Prototype had been helpful to illustrate the potential of cross-repository services and, hence, to emphasize the need for cross-repository interoperability. A paper that provides a rich summary of the Santa Fe Meeting describes it as follows:
There was general agreement among the participants at the meeting that the Prototype was an extremely useful demonstration of potential. There was also agreement, however, that trying to reach consensus on the full functionality of the prototype was "aiming too high" and that a more modest first step was in order. [25]
Towards OAI-PMH and OpenURL
By turning the focus of the meeting on these two topics, Cliff fundamentally changed its course. By thoughtfully guiding the discussions towards these concrete outcomes, he set the stage for work on what would become the Open Archives Initiative Protocol for Metadata Harvesting [26], of which both Herbert and Michael became editors. Undoubtedly, he had plenty of technical skills that would have allowed him to make significant contributions to the actual specification effort. But in a manner that characterizes Cliff, he silently took a step back and let the community decide its direction while expressing continued support for the work, on many occasions, and at venues around the world. There can be no doubt that his endorsement played a crucial role in the global adoption of OAI-PMH, which has been an integral part of the scholarly and cultural heritage infrastructure for over two decades.
The focus on interoperability to realize just a single aspect demonstrated by the prototype - cross-repository discovery - also meant that discussions about its other technical ingredients, including SFX linking, would have to be postponed. But both Cliff and Don were very much aware of the problem it addressed and about the nature of the proposed solution. They were both part of the NISO Reference Linking Working Group [27] that investigated how to tackle the so-called “appropriate copy problem”, which, simplifying the charge to the Group, can be summarized as follows: “how to resolve a reference link to a paper in such a manner that it ends up at one of potentially many distributed copies of that paper to which a user, covered by an institutional subscription, has access?”
The Working Group resulted from a meeting in February 1999 [28], in which various models for a link localization solution had been explored [29, 30]. Don Waters invited Herbert to present his linking work at a second meeting in June 1999 [31]. And, although the meeting report has nice things to say about SFX linking [32], including its ability to address link localization challenges beyond the appropriate copy problem, he remembers profusely apologizing to Don about a presentation not done well. Still, after the demonstration at the Santa Fe Meeting, Cliff extended an invitation for a presentation at the Spring 2000 meeting of the Coalition for Networked Information [33]. The room was packed with representatives from libraries, the scholarly publishing industry, and library system vendors, and the talk became a veritable breakthrough moment for SFX linking. But significant tasks remained, including standardizing the SFX link syntax and demonstrating the ability of the approach to integrate with the emerging DOI-based reference linking approach pursued by journal publishers and instantiated by CrossRef [34].
The standardization’s history is well documented [35]; it started in December 2000 when the original SFX URL specification [36] - by then renamed OpenURL - was submitted to NISO and concluded five years later with the release of The OpenURL Framework for Context-Sensitive Services [37]. The DOI integration was explored by means of a limited prototype [38] that was demonstrated and discussed at the July 2000 NISO/DLF/CrossRef meeting [39]. As the meeting seemed to reach a consensus in favor of the proposed model with an institutional localization component powered by OpenURL - essentially the SFX open linking approach - a question was brought forward as to whether the model with a centralized localization component that had been identified in the first meeting of the Working Group should also be further discussed. At that point, Cliff decidedly stepped in stating “No. We have a solution!” In doing so, he paved the way for the endorsement of the OpenURL linking framework by the Working Group, the rigorous testing of its feasibility in an extended prototype [40], and its eventual acceptance in the US scholarly communication community and beyond. Afterwards, Cliff continued to express support for the approach at numerous venues and gave it his strongest possible endorsement by becoming a member of Herbert’s PhD jury.
Thank you, Cliff
By means of the UPS Prototype effort, this essay has illustrated Cliff’s fundamental impact on the direction infrastructure for research, education, and cultural heritage has taken in the past decades. Two technologies, OpenURL that was used in the Prototype and OAI-PMH that resulted from the Prototype, became an integral part of that infrastructure. Hopefully, the essay has adequately shown that Cliff had a significant part in making that happen, not as an author of specifications, a writer of code, or a builder of tools, but rather as an identifier of problems to come and as a perceptive influencer, gently nudging forward the solutions he believed in and strongly supporting the community efforts that realized them. We have witnessed the same impact in other efforts we have been involved in since the UPS Prototype and can safely assume that others have experienced it in their projects aimed at improving the status quo of scholarly information infrastructure.
We do want to emphasize that, as we dreamt up the outlines of the UPS Prototype, we were early career researchers with a visible, yet modest track record. Cliff (CNI), along with Paul Ginsparg (LANL), Rick Luce (LANL), Deanna Marcum (CLIR), and Don Waters (DLF) strongly and publicly endorsed our effort, shone the spotlight on us, and in doing so had a major impact on our career trajectories. We vividly remember receiving that support and the experience has led us to similarly support the young researchers we have mentored since.
Figure 8 – December 12 2017, Washington, DC: Cliff and Herbert at the Fall 2017 CNI Membership Meeting
As we were selected to write a contribution for this Festschrift, on behalf of all infrastructure plumbers, we want to profoundly thank Cliff. Scholarly infrastructure would not have progressed the way it did without him. We don’t envy the person who will step into his shoes once he has retired. The work ahead is enormous, with needs for new infrastructure and existing infrastructure crumbling. Indeed, OAI-PMH is being supplanted due to its reliance on XML, a technology that has become arcane in a JSON world. And the OpenURL Framework is under attack by the centralized Get Full Text Research [41] effort, launched by the major commercial publishers, which mutes the capabilities of libraries to influence the nature of links across their electronic collections. While 25 years of OAI-PMH and OpenURL do not put those technologies in the same IT infrastructure league as - say - UNIX, it is a substantial period considering that the lifetime of many digital library phenomena can typically be measured in terms of months or years, not decades. Cliff’s influence is directly traceable in the global penetration and longevity of these two technologies that go all the way back to the 1999 UPS Prototype.
[2] Robert Ashley, David Behrman, Philip Glass, Alvin Lucier, Gordon Mumma, Pauline Oliveros, Terry Riley. “Music with roots in the aether”. June 1, 1977-June 18, 1977. The Kitchen, New York, New York. https://thekitchen.org/on-file/music-with-roots-in-the-aether/
[4] Herbert Van de Sompel, Patrick Hochstenbach, and Tobias De Pessemier, “The hybrid information environment and our Intranet solution to access it“, Ghent University Academic Bibliography, 1997, accessed on January 27, 2025, https://hdl.handle.net/1854/LU-1056689
[5] Michael L. Nelson, "Buckets: smart objects for digital libraries,", PhD dissertation, Old Dominion University Digital Commons, 2000, accessed on January 28, 2025, https://doi.org/10.25777/gbh6-7d07
[6] Michael L. Nelson et al. “SODA: Smart Objects, Dumb Archives,” Lecture Notes in Computer Science 1696, (1999): 453-464, accessed on January 27, 2025, https://doi.org/10.1007/3-540-48155-9_28
[7] Ian Jacobs and Norman Walsh, “Architecture of the World Wide Web, Volume One,” W3C Recommendation, 15 December, 2004, accessed on January 27, 2025, https://www.w3.org/TR/webarch/
[8] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 1: Frameworks for Linking,” D-Lib Magazine 5, no 4 (1999), accessed on January 27, 2025, https://doi.org/10.1045/april99-van_de_sompel-pt1
[9] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 2: SFX, a Generic Linking Solution,” D-Lib Magazine 5, no 4 (1999), accessed on January 27, 2025, https://doi.org/10.1045/april99-van_de_sompel-pt2
[10] “Library Without Walls Welcome Page,” 1999, archived at the Wayback Machine, April 28, 1999,
[11] Herbert Van de Sompel and Patrick Hochstenbach. “Reference Linking in a Hybrid Library Environment. Part 3: Generalizing the SFX Solution in the SFX@Ghent & SFX@LANL experiment,” D-Lib Magazine 5, no 10 (1999), accessed on January 27, 2025, https://doi.org/10.1045/october99-van_de_sompel
[18] Herbert Van de Sompel. “Dynamic and context-sensitive linking of scholarly information,“ Ghent University Academic Bibliography, 2000, accessed on January 27, 2025, https://hdl.handle.net/1854/LU-522209
[19] Michael L. Nelson, "To the Editor: Response to Peter Hirtle's April 2001 editorial, OAI and OAIS: What's in a Name?," D-Lib Magazine 7, no 5 (2001), accessed on February 12, 2025, https://doi.org/10.1045/may2001-letters
[21] Clifford A. Lynch. “RFC1729: Using the Z39.50 Information Retrieval Protocol in the Internet Environment,” December, 1994, accessed on January 27, 2025, https://datatracker.ietf.org/doc/rfc1729/
[22] Clifford A. Lynch. “The Z39.50 Information Retrieval Standard - Part I: A Strategic View of Its Past, Present and Future,” D-Lib Magazine 3, no 4 (1997), accessed on January 27, 2025, https://hdl.handle.net/cnri.dlib/april97-lynch
[25] Herbert Van de Sompel and Carl Lagoze. “The Santa Fe Convention of the Open Archives Initiative,” D-Lib Magazine 6, no 2 (2000), accessed on January 27, 2025, https://doi.org/10.1045/february2000-vandesompel-oai
[26] Carl Lagoze, Herbert Van de Sompel, Michael L. Nelson, and Simeon Warner. "The Open Archives Initiative Protocol for Metadata Harvesting," June 14, 2002, accessed on January 28, 2025, https://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm
[30] Priscilla Caplan and William Y. Arms. “Reference Linking for Journal Articles,” D-Lib Magazine 5, no 7/8 (1999), accessed on January 27, 2025, https://doi.org/10.1045/july99-caplan
[31] National Information Standards Organization. “Second Workshop on linkage from citations to electronic journal literature,” June 1999, archived at the Wayback Machine, July 7, 2000,
[34] Helen Atkins et al. “Reference Linking with DOIs: A Case Study,” D-Lib Magazine 6, no 2 (2000), accessed on January 27, 2025, https://doi.org/10.1045/february2000-risher
[38] Herbert Van de Sompel and Oren Beit-Arie. “Open Linking in the Scholarly Information Environment Using the OpenURL Framework,” D-Lib Magazine 7, no 3 (2001), accessed on January 27, 2025, https://doi.org/10.1045/march2001-vandesompel
[39] "Meeting Report of the NISO/DLF/CrossRef Workshop on Localization in Reference Linking,” July, 2000, archived at the Wayback Machine, December 6, 2000,
Witt's account of the business side of the early history is much less detailed and some of the details don't match what I remember.
But as regards the technical aspects of this early history, it appears that neither author really understood the reasons for the two kinds of innovation we made: the imaging model and the I/O architecture. Witt writes (Page 31):
The first time I asked Priem about the architecture of the NV1, he spoke uninterrupted for twenty-seven minutes.
Below the fold, I try to explain what Curtis was talking about for those 27 minutes. It will take me quite a long post.
The opportunity we saw when we started Nvidia was that the PC was transitioning from the PC/AT bus to version 1 of the PCI bus. The PC/AT bus' bandwidth was completely inadequate for 3D games, but the PCI bus had considerably more. Whether it was enough was an open question. We clearly needed to make the best possible use of the limited bandwidth we could get.
We had two basic ways of making "the best possible use of the limited bandwidth":
Reduce the amount of data we needed to ship across the bus for a given image.
Increase the amount of data shipped in each cycle of the bus.
Imaging Model
A triangle is the simplest possible description of a surface. Thus almost the entire history of 3D computer graphics has modeled the surfaces of 3D objects using triangles. But there is a technique, dating back at least to Robert Mahl's 1972 paper Visible Surface Algorithms for Quadric Patches, for modeling curved surfaces directly. It takes a lot more data to describe a quadric patch than a triangle. But to achieve equivalent realism you need so many fewer patches that the amount of data for each frame is reduced by a significant factor.
Virtua Fighter on NV1
As far as I know at the time only Sega in the video game industry used quadric patches. When we launched NV1 at Comdex we were able to show Sega arcade games such as Virtua Fighter running on a PC at full frame rate, a first for the industry. The reason was that NV1 used quadric patches and thus made better use of the limited PCI bus bandwidth.
At Sun, James Gosling and I built the extremely sophisticated and forward-looking but proprietary NeWS window system. At the same time, I also worked with engineers at competitors such as Digital Equipment to build the X Window System. One of my many learning experiences at Sun came early in the long history of the X Window System. It rapidly became obvious to me that there was no way NeWS could compete with the much simpler, open-source X. I argued for Sun to open-source NeWS and failed. I argued for Sun to drop NeWS and adopt X, since that was what application developers wanted. Sun wasted precious time being unable to decide what to do, finally deciding not to decide and wasting a lot of resource merging NeWS and X into a kludge that was a worse NeWS and a worse X than its predecessors. This was just one of a number of fights at Sun I lost (this discusses another).
Once Microsoft announced Direct X it was obvious to me that Nvidia was doomed if the next chip did quadric patches, because the developers would have to work with Direct X's triangles. But, like Sun, Nvidia seemed unable to decide to abandon its cherished technology. Time for a decision to be effective was slipping away. I quit, hoping to shake things up so as to enable a decision to do triangles. It must have worked. The books recount how close Nvidia was to bankruptcy when RIVA 128 shipped. The rest is history for which I was just an observer.
I/O Architecture
In contrast the I/O architecture was, over time, the huge success we planned. Kim writes (Page 95):
Early on, Curtis Priem had invented a "virtualized objects" architecture that would be incorporated in all of Nvidia's chips. It became an even bigger advantage for the company once Nvidia adopted the faster cadence of chip releases. Priem's design had a software based "resource manager", essentially a miniature operating system that sat on top of the hardware itself. The resource manager allowed Nvidia's engineers to emulate certain hardware features that normally needed to be physically printed onto chip circuits. This involved a performance cost but accelerated the pace of innovation, because Nvidia's engineers could take more risks. If the new feature wasn't ready to work in the hardware, Nvidia could emulate it in software. At the same time, engineers could take hardware features out when there was enough leftover computing power, saving chip area.
For most of Nvidia's rivals, if a hardware feature on a chip wasn't ready, it would mean a schedule slip. Not, though, at Nvidia, thanks to Priem's innovation. "This was the most brilliant thing on the planet," said Michael Hara. "It was our secret sauce. If we missed a feature or a feature was broken, we could put it in the resource manager and it would work." Jeff Fisher, Nvidia's head of sales, agreed: "Priem's architecture was critical in enabling Nvidia to design and make new products faster."
Context
Nvidia is just one of the many, many startups that Sun Microsystems spawned. But at the time what made Nvidia unique among the competing graphics startups was the early engineers from the team at Sun that built the GX series of graphics chips. We went through an intensive education in the techniques needed to implement graphics effectively in Unix, a multi-process, virtual memory operating system. The competitors all came from a Windows background, at the time a single-process, non-virtual memory system. We understood that, in the foreseeable future, Windows would have to evolve multi-processing and virtual memory. Thus the pitch to the VCs was that we would design a "future-proof" architecture, and deliver a Unix graphics chip for the PC's future operating system.
The GX team also learned from the difficulty of shipping peripherals at Sun, where the software and hardware schedules were inextricable because the OS driver and apps needed detailed knowledge of the physical hardware. This led to "launch pad chicken", as each side tried to blame schedule slippage on the other.
Write-mostly
Here is how we explained the problem in US5918050A: Apparatus accessed at a physical I/O address for address and data translation and for context switching of I/O devices in response to commands from application programs (inventors David S. H. Rosenthal and Curtis Priem), using the shorthand "PDP11 architecture" for systems whose I/O registers were mapped into the same address space as system memory:
Not only do input/output operations have to be carried out by operating system software, the design of computers utilizing the PDP11 architecture usually requires that registers at each of the input/output devices be read by the central processing unit in order to accomplish any input/output operation. As central processing units have become faster in order to speed up PDP11 type systems, it has been necessary to buffer write operations on the input/output bus because the bus cannot keep up with the speed of the central processing unit. Thus, each write operation is transferred by the central processing unit to a buffer where it is queued until it can be handled; other buffers in the line between the central processing unit and an input/output device function similarly. Before a read operation may occur, all of these write buffers must be flushed by performing their queued operations in serial order so that the correct sequence of operations is maintained. Thus, a central processing unit wishing to read data in a register at an input/output device must wait until all of the write buffers have been flushed before it can gain access to the bus to complete the read operation. Typical systems average eight write operations in their queues when a read operation occurs, and all of these write operations must be processed before the read operation may be processed. This has made read operations much slower than write operations. Since many of the operations required of the central processing unit with respect to graphics require reading very large numbers of pixels in the frame buffer, then translating those pixels, and finally rewriting them to new positions, graphics operations have become inordinately slow. In fact, modern graphics operations were the first operations to disclose this Achilles heel of the PDP11 architecture.
We took two approaches to avoiding blocking the CPU. First, we implemented a queue in the device, a FIFO (First In First Out), that was quite long, and we allowed the CPU to read from the FIFO the number of free slots, the number of writes it could do and be guaranteed not to block. When the CPU wanted to write to NV1 it would ask the FIFO how many writes it could do. If the answer were N, it would do N writes before asking again. NV1 would acknowledge each of those writes immediately, allowing the CPU to proceed to compute the data for the next write. This was the subject of US5805930A: System for FIFO informing the availability of stages to store commands which include data and virtual address sent directly from application programs (inventors David S. H. Rosenthal and Curtis Priem), the continuation of an application we filed 15th May 1995. Note that this meant the application didn't need to know the size of the device's FIFO. If a future chip had a bigger or smaller FIFO, the unchanged application would use it correctly.
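To make the protocol concrete, here is a minimal sketch in Python. It is purely illustrative: the real interface was a hardware register read over the bus, and all class and method names below are hypothetical.

from collections import deque

class DeviceFifo:
    def __init__(self, depth):
        self.depth = depth          # size of the hardware FIFO
        self.queue = deque()        # commands waiting for the device

    def free_slots(self):
        # The CPU reads this count once, then may do that many writes
        # without any risk of blocking.
        return self.depth - len(self.queue)

    def write(self, command):
        assert len(self.queue) < self.depth, "caller violated the protocol"
        self.queue.append(command)  # acknowledged immediately

    def drain(self, n=1):
        # The device consumes commands at its own pace.
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()

def submit(fifo, commands):
    # The application never needs to know the FIFO's depth: it just
    # alternates "ask how many free slots" and "do that many writes".
    pending = deque(commands)
    while pending:
        n = fifo.free_slots()
        if n == 0:
            fifo.drain()            # in reality: wait for the device to catch up
            continue
        for _ in range(min(n, len(pending))):
            fifo.write(pending.popleft())

The point of the design is visible in submit(): the application adapts automatically to whatever FIFO depth the current chip happens to have.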
Second, we tried as far as possible not to use the CPU to transfer data to and from NV1. Instead, whenever we could we used Direct Memory Access, in which the I/O device reads and writes system memory independently of the CPU. In most cases, the CPU instructed NV1 to do something with one, or a few writes, and then got on with its program. The instruction typically said "here in memory is a block of quadric patches for you to render". If the CPU needed an answer, it would tell NV1 where in system memory to put it and, at intervals, check to see if it had arrived.
Remember that we were creating this architecture for a virtual memory system in which applications had direct access to the I/O device. The applications addressed system memory in virtual addresses. The system's Memory Management Unit (MMU) translated these into the physical addresses that the bus used. When an application told the device the address of the block of patches, it could only send the device one of its virtual addresses. To fetch the patches from system memory, the DMA engine on the device needed to translate the virtual address into a physical address on the bus in the same way that the CPU's MMU did.
So NV1 didn't just have a DMA engine, it had an IOMMU as well. We patented this IOMMU as US5758182A: DMA controller translates virtual I/O device address received directly from application program command to physical i/o device address of I/O device on device bus (inventors David S. H. Rosenthal and Curtis Priem). In 2014's Hardware I/O Virtualization I explained how Amazon ended up building network interfaces with IOMMUs for the servers in AWS data centers so that multiple virtual machines could have direct access to the network hardware and thus eliminate operating system overhead.
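As a rough illustration of what any IOMMU does (this is not NV1's actual page-table format; the 4 KB page size and the names are assumptions), the translation step looks something like this:

PAGE_SIZE = 4096

def translate(io_page_table, virtual_address):
    # Map a virtual page number to a physical frame number, mirroring
    # the entries the driver copied from the CPU's MMU.
    vpn = virtual_address // PAGE_SIZE          # virtual page number
    offset = virtual_address % PAGE_SIZE        # offset within the page
    try:
        pfn = io_page_table[vpn]                # physical frame number
    except KeyError:
        raise RuntimeError("IOMMU fault: page not mapped for DMA")
    return pfn * PAGE_SIZE + offset

# Example: once one page-table entry is in place, the DMA engine can
# fetch the block of patches the application named by its virtual address.
io_page_table = {0x12345: 0x00042}
physical = translate(io_page_table, 0x12345678)   # -> 0x42678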
Context switching
The fundamental problem for graphics support in a multi-process operating system such as Unix (and later Linux, Windows, MacOS, ...) is that of providing multiple processes the illusion that each has exclusive access to the single graphics device. I started fighting this problem in 1983 at Carnegie-Mellon. James Gosling and I built the Andrew Window System, which allowed multiple processes to share access to the screen, each in its own window. But they didn't have access to the real hardware. There was a single server process that accessed the real hardware. Applications made remote procedure calls (RPCs) to this server, which actually drew the requested graphics. Four decades later the X Window System still works this way.
RPCs imposed a performance penalty that made 3D games unusable. To allow, for example, a game to run in one window while a mail program ran in another, we needed the currently active process to have direct access to the hardware, and if the operating system context-switched to a different graphics process, give that process direct access to the hardware. The operating system would need to save the first process' state from the graphics hardware, and restore the second process' state.
Our work on this problem at Sun led to a patent filed in 1989, US5127098A: Method and apparatus for the context switching of devices (inventors David S. H. Rosenthal, Robert Rocchetti, Curtis Priem, and Chris Malachowsky). The idea was to have the device mapped into each process' memory but to use the system's memory management unit (MMU) to ensure that at any one time all but one of the mappings was invalid. A process' access to an invalid mapping would interrupt into the system's page fault handler, which would invoke the device's driver to save the old process' context and restore the new process' context. The general problem with this idea is that, because the interrupt ends up in the page fault handler, it requires device-dependent code in the page fault handler. This is precisely the kind of connection between software and hardware that caused schedule problems at Sun.
There were two specific Nvidia problems with this idea. First that Windows wasn't a virtual memory operating system so you couldn't do any of this. And second that even once Windows had evolved into a virtual memory operating system, Microsoft was unlikely to let us mess with the page fault handler.
As you can see in Figure 6 of the '930 patent, the I/O architecture consisted of an interface between the PCI bus and an internal bus that could implement a number of different I/O devices. The interface provided a number of capabilities:
It implemented the FIFO, sharing it among all the devices on the internal bus.
It implemented the DMA engine and its IOMMU, sharing it among all the devices on the internal bus.
Using a translation table, it allowed applications to connect to a specific device on the internal bus via the interface using a virtual name.
It ensured that only one application at a time could access the interface.
The difference between the PCI and PC/AT buses wasn't just that the data path grew from 16 to 32 bits, but also that the address bus grew from 24 to 32 bits. The address space was 256 times bigger, thus Nvidia's devices could occupy much more of it. We could implement many virtual FIFOs, so that each application could have a valid mapping to one of them. The device, not the operating system, would ensure that only one of the virtual FIFOs was mapped to the single physical FIFO. A process accessing a virtual FIFO that wasn't mapped to the physical FIFO would cause an interrupt, but this time the interrupt would go to the device's driver, not the page fault handler. The driver could perform the context switch, and re-assign the physical FIFO to the new virtual FIFO. It would also have to copy page table entries from the CPU's MMU into the IOMMU to reflect the placement of the new process' pages in physical memory. There would be no page fault so no knowledge of the device in the operating system's page fault handler. As we wrote in the '050 patent:
the use of many identically-sized input/output device address spaces each assigned for use only by one application program allows the input/output addresses to be utilized to determine which application program has initiated any particular input/output write operation.
Because applications each saw their own virtual FIFO, future chips could implement multiple physical FIFOs, allowing the virtual FIFO of more than one process to be assigned a physical FIFO, which would reduce the need for context switching.
One of the great things about NeWS was that it was programmed in PostScript. We had figured out how to make PostScript object-oriented, homomorphic to SmallTalk. We organized objects in the window system in a class hierarchy with inheritance. This, for example, allowed Don Hopkins to implement pie menus for NeWS in such a way that any user could replace the traditional rectangular menus with pie menus. This was such fun that Owen Densmore and I used the same technique to implement object-oriented programming for the Unix shell.
At a time when PC memory maxed out at 640 megabytes, the fact that the PCI bus could address 4 gigabytes meant that quite a few of its address bits were surplus. So we decided to increase the amount of data shipped in each bus cycle by using some of them as data. IIRC NV1 used 23 address bits, occupying 1/512th of the total space. 7 of the 23 selected one of the 128 virtual FIFOs, allowing 128 different processes to share access to the hardware. We figured 128 processes was plenty.
The remaining 16 address bits could be used as data. In theory the FIFO could be 48 bits wide, 32 from the data lines on the bus and 16 from the address lines, a 50% increase in bits per bus cycle. NV1 ignored the byte part of the address so the FIFO was only 46 bits wide.
So we organized the objects in our I/O architecture in a class hierarchy, rooted at class CLASS. The first thing an application did was to invoke the enumerate() method on the object representing class CLASS. This returned a list of the names of all the instances of class CLASS, i.e. all the object types this instance of the architecture implemented. In this way the capabilities of the device weren't wired into the application. The application asked the device what its capabilities were. In turn, the application could invoke enumerate() on each of the instances of class CLASS in the list, which would get the application a list of the names of each of the instances of each class, perhaps LINE-DRAWER. Thus the application would find out, rather than know a priori, the names of all the resources (virtual objects) of all the different types that the device supported.
The application could then create objects, instances of these classes, by invoking the instantiate() method on the class object with a 32-bit name for the newly created object. The interface was thus limited to 4B objects for each application. The application could then select() the named object, causing an interrupt if there was no entry for it in the translation table so the resource manager could create one. The 64Kbyte address space of each FIFO was divided into 8 8K "sub-areas". The application could select() an object in each, so it could operate on 8 objects at a time. Subsequent writes to each sub-area were interpreted as method invocations on the selected object, the word offset from the base of each sub-area within the 8Kbyte space specifying the method and the data being the argument to the method. The interface thus supported 2048 different methods per object.
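Putting the numbers above together, a write address decodes into a virtual FIFO, a sub-area and a method roughly as follows. The exact bit positions are my reconstruction from the description, not taken from any hardware documentation.

def decode(address):
    # 23 address bits in use: 7 select one of 128 virtual FIFOs, the
    # remaining 16 address the 64 KB space of that FIFO.
    assert 0 <= address < (1 << 23)
    fifo = (address >> 16) & 0x7F          # one of 128 virtual FIFOs
    offset = address & 0xFFFF              # 64 KB space per FIFO
    subarea = offset >> 13                 # 8 sub-areas of 8 KB each
    method = (offset & 0x1FFF) >> 2        # word offset: 2048 methods
    return fifo, subarea, method

# A write of a 32-bit argument to this address invokes method 3 on the
# object currently select()ed in sub-area 5 of virtual FIFO 17.
print(decode((17 << 16) | (5 << 13) | (3 << 2)))   # -> (17, 5, 3)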
In this way we ensured that all knowledge of the physical resources of the device was contained in the resource manager. It was the resource manager that implemented class CLASS and its instances. Thus it was that the resource manager controlled which instances of class CLASS (types of virtual object) were implemented in hardware, and which were implemented by software in the resource manager. It was possible to store the resource manager's code in read-only memory on the device's PCI card, inextricably linking the device and its resource manager. The only thing the driver for the board needed to be able to do was to route the device's interrupts to the resource manager.
The importance of the fact that all an application could do was to invoke methods on virtual objects was that the application could not know whether the object was implemented in hardware or in the resource manager's software. The flexibility to make this decision at any time was a huge advantage. As Kim quotes Michael Hara as saying:
"This was the most brilliant thing on the planet. It was our secret sauce. If we missed a feature or a feature was broken, we could put it in the resource manager and it would work."
Conclusion
As you can see, NV1 was very far from the "minimum viable product" beloved of today's VCs. Their idea is to get something into users' hands as soon as possible, then iterate rapidly based on their feedback. But what Nvidia's VCs did by giving us the time to develop a real chip architecture was to enable Nvidia, after the failure of the first product, to iterate rapidly based on the second. Iterating rapidly on graphics chips requires that applications not know the details of successive chips' hardware.
I have been privileged in my career to work with extraordinarily skilled engineers. Curtis Priem was one, others included James Gosling, the late Bill Shannon, Steve Kleiman, and Jim Gettys. This search returns 2 Sun patents and 19 Nvidia patents for which both Curtis Priem and I are named inventors. Of the Nvidia patents, Curtis is the lead inventor on 9 and I am the lead inventor on the rest. Most describe parts of the Nvidia architecture, combining Curtis' exceptional understanding of hardware with my understanding of operating systems to redefine how I/O should work. I rate this architecture as my career best engineering. It was certainly the most impactful. Thank you, Curtis!
How to tell if someone's bullshitting: watch for them to give a deadline that they repeatedly push back.
This was apropos of Donald Trump's approach to tariffs and Ukraine, but below the fold I apply the criterion to Elon Musk basing Tesla's future on its robotaxi service.
Jonathan V. Last's "A Song of “Full Self-Driving”: Elon Isn't Tony Stark. He's Michael Scott." shows that Musk's bullshitting started almost a decade ago:
For years, Elon Musk has been promising that Teslas will operate completely autonomously in “Full Self Driving” (FSD) mode. And when I say years, I mean years:
December 2015: “We’re going to end up with complete autonomy, and I think we will have complete autonomy in approximately two years.”
January 2016: “In ~2 years, summon should work anywhere connected by land & not blocked by borders, eg you’re in LA and the car is in NY.”
June 2016: “I really would consider autonomous driving to be basically a solved problem. . . . I think we’re basically less than two years away from complete autonomy, complete—safer than a human. However regulators will take at least another year.”
October 2016: By the end of 2017 Tesla will demonstrate a fully autonomous drive from “a home in L.A., to Times Square . . . without the need for a single touch, including the charging.”
March 2018: “I think probably by end of next year [end of 2019] self-driving will encompass essentially all modes of driving”
February 2019: “I think we will be feature complete—full self-driving—this year. Meaning the car will be able to find you in a parking lot, pick you up, take you all the way to your destination without an intervention, this year."
@motherfrunker" tracks this BS, and the most recent entry is:
January 2022: I will be shocked if we don't achieve FSD safer than a human this year
But finally, on June 22nd, Tesla's robotaxi revolution arrived. Never one to miss an opportunity to pump the stock with bullshit, Musk:
envisions a future fleet, including a new “Cybercab” and “Robovan” with no steering wheels or pedals, that could boost Tesla’s market value by an astonishing $5 trillion to $10 trillion. On June 20, Tesla was worth $1.04 trillion
“My view is the golden age of autonomous vehicles starting on Sunday in Austin for Tesla,” said Wedbush analyst Dan Ives. “I believe it’s a trillion dollar valuation opportunity for Tesla.”
Dan Ives obviously only sipped 10-20% of Musk's Kool-Aid. Others drank deeper:
Investor Cathie Wood’s ARK Invest predicts robotaxis could account for 90% of Tesla’s profits by 2029. If they are right, this weekend’s launch was existential.
Tesla's net income from the trailing 12 months is around $6.1B and falling. Assuming, optimistically, that they can continue to sell cars at the current rate, Cathie Wood is assuming that robotaxi profits would be around $60B. Tesla's net margin is around 6%, so this implies revenue of almost $1T in 2029. Tesla charges $4.20/ride (ha! ha!), so this implies that they are delivering 231B rides/year, or around 23,000 times the rate of the entire robotaxi industry currently. Wood is projecting that in four years' time Tesla's robotaxi business will have almost as much revenue as Amazon ($638B), Microsoft ($245B) and Nvidia ($130B) combined.
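A minimal sketch of that back-of-the-envelope arithmetic, using only the figures above; the rounding here differs slightly from the post's rounded ~$60B, ~$1T and 231B figures, but the chain of reasoning is the same.

other_profit = 6.1e9              # trailing-12-month net income, non-robotaxi
robotaxi_share = 0.90             # ARK: 90% of profits from robotaxis by 2029
robotaxi_profit = other_profit * robotaxi_share / (1 - robotaxi_share)
                                  # ~$55B, rounded up to "around $60B" in the text
net_margin = 0.06                 # Tesla's current net margin
revenue = robotaxi_profit / net_margin          # ~$0.9T, "almost $1T"
price_per_ride = 4.20
rides_per_year = revenue / price_per_ride       # a couple of hundred billion rides
print(f"profit ${robotaxi_profit/1e9:.0f}B, revenue ${revenue/1e12:.2f}T, "
      f"rides {rides_per_year/1e9:.0f}B/year")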
"On generous assumptions, Tesla’s core EV business, generating 75% of gross profit but with falling sales, might be worth roughly $50 per share, only 15% of the current price. Much of the remainder relates to expectations around self driving. RBC Capital, for example, ascribes 59% of its price target, or $181 per share, to robotaxis and a further $53 to monetizing Full Self Driving technology. Combined, that is a cool $815 billion based on double-digit multiples ascribed to modeled revenue — not earnings — 10 to 15 years from now because, after all, it relates to businesses that barely make money today."
Tesla’s much-anticipated June 22 “no one in the vehicle” “unsupervised” Robotaxi launch in Austin is not ready. Instead, Tesla is operating a limited service with Tesla employees on board the vehicle to maintain safety.
...
Having an employee who can intervene on board, commonly called a safety driver, is the approach that every robocar company has used for testing, including testing of passenger operations. Most companies spend many years (Waymo spent a decade) testing with safety drivers, and once they are ready to take passengers, there are typically some number of years testing in that mode, though the path to removing the safety driver depends primarily on evaluation of the safety case for the vehicle, and less on the presence of passengers.
In addition to Musk’s statements about the vehicle being unsupervised, with nobody inside, in general the removal of the safety driver is the biggest milestone in development of a true robotaxi, not an incremental step that can be ignored. As such, Tesla has yet to meet its goals.
Seven-and-a-half years after Musk's deadline for "complete autonomy" the best Tesla can do is a small robotaxi service for invited guests in a geofenced area of Austin with a safety driver in daylight. Waymo has 100 robotaxis in service in Austin. Three months ago Brad Templeton reported that:
Waymo, the self-driving unit of Alphabet, announced recently that they are now providing 200,000 self-driving taxi rides every week with no safety driver in the car, only passengers.
...
In China, though, several companies are giving rides with no safety driver. The dominant player is Baidu Apollo, which reports they did 1.1 million rides last quarter, which is 84,000 per week, and they now are all no-safety-driver. Pony.AI claims 26,000 per week, but it is not clear if all are with no safety driver. AutoX does not report numbers, but says it has 1,000 cars in operation. WeRide also does not report numbers.
US auto safety regulators are looking into incidents where Tesla Inc.’s self-driving robotaxis appeared to violate traffic laws during the company’s first day offering paid rides in Austin.
...
In one video taken by investor Rob Maurer, who used to host a Tesla podcast, a Model Y he’s riding in enters an Austin intersection in a left-turn-only lane. The Tesla hesitates to make the turn, swerves right and proceeds into an unoccupied lane meant for traffic moving in the opposite direction.
A honking horn can be heard as the Tesla re-enters the correct lane over a double-yellow line, which drivers aren’t supposed to cross.
In two other posts on X, initial riders in driverless Model Ys shared footage of Teslas speeding. A vehicle carrying Sawyer Merritt, a Tesla investor, reached 35 miles per hour shortly after passing a 30 miles per hour speed limit sign, a video he posted shows.
But immediately after that rollout, Tesla drivers started racking up fines for violating the law. Many roads in China are watched by CCTV cameras, and fines are automatically handed out to drivers who break the law.
It’s clear that the system still needs more knowledge about Chinese roads in general, because it kept mistaking bike lanes for right turn lanes, etc. One driver racked up 7 tickets within the span of a single drive after driving through bike lanes and crossing over solid lines. If a driver gets enough points on their license, they could even have their license suspended.
Why did Tesla roll out their $8K "Intelligent Assisted Driving" in China? It might have something to do with this:
There are already many competing robotaxi services in China. For example:
Baidu is already operating robotaxi services in multiple cities in China. It provided close to 900,000 rides in the second quarter of the year, up 26 per cent year-on-year, according to its latest earnings call. More than 7 million robotaxi rides in total had been operated as of late July.
That was a year ago. It isn't just Waymo that is in a whole different robotaxi league than Tesla. And let's not talk about the fact that BYD, Xiaomi and others outsell Tesla in China because their products are better and cheaper. Tesla's response? Getting the White House to put a 25% tariff on imported cars.
I should have done more research for this post, then I would have found out that, unlike Tesla's CyberCab, Waymo's custom robotaxi is real. This morning I drove alongside a test vehicle in Palo Alto. Wikipedia has the details:
In July 2024, Waymo began testing its sixth-generation robotaxis which are based on electric vehicles by Chinese automobile company Zeekr, developed in a partnership first announced in 2021.
The "Waymo Driver Integration Plant," a 239,000 square foot facility outside of Phoenix, will assemble more than 2,000 Jaguar I-PACE robotaxis, the Alphabet company said in a statement. Waymo will add those self-driving vehicles to its existing fleet that already includes around 1,500 robotaxis.
The plant will be "capable of building tens of thousands of fully autonomous Waymo vehicles per year," when it is fully built out, Waymo said. The company also said it plans to build its more advanced Geely Zeekr RT robotaxis that feature its "6th-generation Waymo Driver" technology later this year at the plant.
It seems likely that Waymo will be offering unsupervised rides in a custom robotaxi before Tesla can do it in a Model Y.
The rise of the technology industry over the last few decades has been powered by its very strong economies of scale. Once you have invested in developing and deploying a technology, the benefit of adding each additional customer greatly exceeds the additional cost of doing so. This led to the concept of "blitzscaling", that it makes sense to delay actually making a profit and devote these benefits to adding more customers. That way you follow the example of Amazon and Uber on the path to a monopoly laid out by Brian Arthur's Increasing Returns and Path Dependence in the Economy. Eventually you can extract monopoly rents and make excess profits, but in the meantime blitzscale believers will pump your stock price.
This is what the VCs behind OpenAI and Anthropic are doing, and what Google, Microsoft and Oracle are trying to emulate. Is it going to work? Below the fold I report on some back-of-the-envelope calculations, which I did without using AI.
David Gerard notes that:
Microsoft is forecast to spend $80 billion on AI in 2025.
Let's try to figure out the return on this investment. We will assume that the $80B is split two ways: $40B to Nvidia for hardware and $40B on building data centers to put it in. Depreciating the $40B of hardware over five years is very optimistic; it is likely to be uneconomic to run after 2-3 years. But that's what we'll do. So that is minus $8B/year on the bottom line over the next five years. Similarly, depreciating the data centers over 20 years is likely optimistic, given the rate at which AI power demand is increasing. But that's what we'll do, giving another minus $2B/year on the bottom line.
Microsoft could argue that some of the $80B is the cost of training the models. But since the models will depreciate even faster than the hardware used to train them, this doesn't make things look better.
Microsoft's gross margin for cloud services is about 70%, so they will be expecting this $10B/year cost to generate $33B/year in revenue, or about 13% of Microsoft's total. Of course, there will be some ramp up in the revenue, but Microsoft is planning to keep investing, so next year's investment will need to generate a return too. We will thus ignore the ramp.
Microsoft is today promoting the pay-as-you-go pricing model of Copilot Studio as the preferred sales motion. The list price of one message is $0.01. While enterprise clients may get discounts, there’s also the chance of prepaid message capacity being unused, so things may even out. With this price point, Copilot Studio usage generates $2.5M revenue per month, and $30M per year.
So Microsoft is processing about 3B messages/year. It needs adoption to be so fast that next year's revenue will be around 1,100 times its current rate. They will need next year's customers to generate about 3.3T messages/year.
160k organizations using Copilot, this translates to around 1.5K messages per org per month. Or 52 messages per day. Now, we have to remember that one action in a Copilot Studio agent often consumes more than one message. ...
If those 52 messages were only about regular GenAI usage without any business process logic, that would mean 26 responses from Copilot Studio agents per day. If they were to include things like agent actions (meaning, AI does something more than just chatting back at you) or AI tools, we’re quickly at a point where the average Copilot Studio customer organization does a couple of agent runs per day.
This is shockingly low. It is plain and obvious that most customers are merely experimenting with trying to build agents. Hardly anyone is running it in production yet. Which wouldn’t be that bad if this was a new 2025 product. But Copilot Studio has been out since November 2023.
The back of my envelope says that Microsoft's AI business needs to grow customers like no business (even OpenAI) has ever grown customers if it is not to be a huge drag on the bottom line.
If this were a traditional technology business with very strong economies of scale growing customers incredibly fast would be good, because the incremental revenue from each new customer vastly outweighs the incremental cost of supporting them. This is where Microsoft's 70% gross margin comes from.
OpenAI lost $5B on $4B in revenue, so each $1 of revenue cost them $2.25. Ed Zitron had a more detailed estimate:
To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.
As conversations with experts and AI companies made clear, inference, not training, represents an increasing majority of AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for inference.
If we assume unrealistically that training is a one-time cost and they don't need to retrain for next year, training cost them say 15% of $45M, or about $6.75M, and answering the 3B messages cost them $38.25M. Scaling up by a factor of 1,100 means answering the messages would cost them $42B plus the $10B depreciation, so $52B. But it would only generate $33B in revenue, so each $1 of revenue would cost about $1.58. Scaling up would make the losses worse.
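For anyone who wants to check the arithmetic, here is the whole envelope in a few lines of Python. The $45M serving cost and the 15% training share are the assumptions stated above, not reported figures.

capex_hw, capex_dc = 40e9, 40e9
depreciation = capex_hw / 5 + capex_dc / 20         # $10B/year

gross_margin = 0.70
required_revenue = depreciation / (1 - gross_margin)    # ~$33B/year

price_per_message = 0.01
current_revenue = 30e6                               # Copilot Studio, $30M/year
current_messages = current_revenue / price_per_message   # ~3B messages/year
scale_factor = required_revenue / current_revenue        # ~1,100x
needed_messages = current_messages * scale_factor        # ~3.3T messages/year

current_cost = 45e6                # assumed cost of serving today's load
training = 0.15 * current_cost               # ~$6.75M
inference = current_cost - training          # ~$38.25M
scaled_cost = inference * scale_factor + depreciation    # ~$52B
cost_per_dollar = scaled_cost / required_revenue         # ~$1.58

print(f"needed revenue ${required_revenue/1e9:.0f}B/yr, "
      f"{needed_messages/1e12:.1f}T messages/yr, "
      f"${cost_per_dollar:.2f} per $1 of revenue at scale")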
There are only two possibilities. Either inference gets at least an order of magnitude cheaper than training instead of around 6 times more expensive, or the price of using AI goes up by at least an order of magnitude. Now you see why Sam Altman et al are so desperate to run the "drug-dealer's algorithm" (the first one's free) and get the world hooked on this drug so they can supply a world of addicts.
In this series, we wanted to take a few moments to celebrate some great tools that have been developed for and by librarians (and other library pros) to solve a particular problem they’ve encountered in their work. You can read all about cataloguing tools in the first part of the series. In this part two, [...]
The Civil Association for Equality and Justice (ACIJ) managed to guarantee the quality of decades of housing survey data and consolidate the results even with changes in the collection methodology.
ODE in action: A new cohort of pilot organisations will start testing ODE with new responsible AI-powered features on issues as varied as data journalism, academic research, cultural heritage, and public sector data.
While working on the Saving Ads project, we identified problems with replaying ads (technical report: “Archiving and Replaying Current Web Advertisements: Challenges and Opportunities”) that used JavaScript code to dynamically generate URLs. These URLs included random values that differed during crawl time and replay time, resulting in failed requests upon replay. Figure 2 shows an example ad iframe URL that failed to replay, because a dynamically generated random value was used in the subdomain. URL matching approaches like fuzzy matching could resolve these problems by matching the dynamically generated URL with the URL that was crawled.
Goel, Zhu, Netravali, and Madhyastha’s "Jawa: Web Archival in the Era of JavaScript" involved identifying sources of non-determinism that cause replay problems when dynamically loading resources, removing some of the non-deterministic JavaScript code, and applying URL matching algorithms to reduce the number of failed requests that occur during replay.
Video 1: Presentation video
Sources of Non-Determinism
Non-determinism can cause variance in dynamically generated URLs (e.g., the same resource referenced by multiple URLs with different query string values, such as https://www.example.com/?rnd=4734 and https://www.example.com/?rnd=7765). This variance can result in failed requests (like the example shown in Figure 2) if the replay system does not have an approach for matching the requested URL with one that was successfully crawled. The sources of non-determinism that cause problems with replaying archived web pages are server-side state, client-side state, client characteristics, and JavaScript's Date, Random, and Performance (DRP) APIs. When replaying web pages client browsers do not maintain server-side and client-side state. The other sources of non-determinism (client characteristics and DRP APIs) are present during replay and impact JavaScript execution.
When a web page’s functionality requires dynamically constructed server responses (e.g., posting comments, push notifications, and login), the functionality can be impacted if the web page requires communication with a website’s origin servers. When an archived web page is loaded, the functionality would also be impacted if more resources were requested that were not archived during the crawling session. For client characteristics and DRP APIs, the authors ensured that all APIs would return the same value during replay time as they did during crawl time. For DRP APIs, they also used server-side matching of requested URLs to crawled URLs.
Reducing Storage When Archiving Web Pages
Goel et al. created a web crawler named Jawa (JavaScript-aware web archive) that removes non-deterministic JavaScript code so that the replay of an archived web page does not change if different users replay it. Since Jawa removes some third party scripts, the preservation style is in-between Archival Caricaturization and the Wayback style. Archival Caricaturization is a term created by Berlin et al. (“To Re-experience the Web: A Framework for the Transformation and Replay of Archived Web Pages”) to describe a type of preservation that does not preserve the web page as it originally was during crawl time. Archive.today is an example of Archival Caricaturization where all the original JavaScript is removed during replay. In contrast, archives that use the Wayback style archive and replay all of the resources of a web page and only make minimal rewrites during replay.
Jawa reduced the storage necessary for their corpus of 1 million archived web pages by 41% when compared to techniques that were used by Internet Archive during 2020 (Figure 3). This storage savings occurred because they discarded 84% of JavaScript bytes (Figure 4). During their presentation (https://youtu.be/WdxWpGJ-gUs?t=877), the authors mentioned that the 41% reduction in storage also includes other resources (e.g., HTML, CSS, and images) that would have been loaded by the excluded JavaScript code. Jawa saves storage by not archiving non-functional JavaScript code and removing unreachable code. When removing JavaScript code, they ensured that the removed code does not affect the execution of the rest of the code.
Brunelle et al.’s “Archival Crawlers and JavaScript: Discover More Stuff but Crawl More Slowly” also involved measuring JavaScript’s impact on storage when archiving web pages. They found that using a browser-based crawler that executes JavaScript during the crawling session resulted in 11.3 times more storage for all 303 web pages in their collection and 5.12 times more storage per URI (approximately 413.2 KB/URI). If we take this KB per URI measurement and multiply it with the number of URIs in Jawa’s corpus of 1 million web pages, it is expected for browser-based crawlers to require approximately 413.2 GB of storage to archive all web pages. When Goel et al. used techniques similar to the Internet Archive (which the authors referred to as IA*), it required 535 GB to archive the web pages in the corpus, while 314 GB of storage was required for Jawa. Since the amount of JavaScript (and the resources dynamically loaded by this code) has increased (Figure 5), the IA* approach required more storage than previously expected by Brunelle et al. Even though the amount of storage required to archive web pages has increased, Jawa achieved enough storage savings to go below the previously expected storage for browser-based crawlers.
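The quoted figures are easy to sanity-check with a few lines of Python; this just reruns the numbers stated above (decimal KB-to-GB conversion assumed).

uris = 1_000_000
kb_per_uri = 413.2                       # Brunelle et al., browser-based crawling
expected_browser_gb = uris * kb_per_uri / 1e6    # KB -> GB: ~413 GB for the corpus

ia_star_gb = 535                         # IA*-style techniques on Jawa's corpus
jawa_gb = 314                            # Jawa
reduction = 1 - jawa_gb / ia_star_gb     # ~0.41, the 41% storage reduction
print(round(expected_browser_gb), round(reduction, 2), jawa_gb < expected_browser_gb)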
Brunelle et al. also compared the crawl throughput and showed that using browser-based crawlers significantly increased the amount of time (38.9 times longer than using Heritrix) it takes to crawl web pages when compared to traditional web archive crawlers that do not execute the JavaScript during crawl time. Since Jawa can reduce the amount of JavaScript archived, it was able to improve the crawling throughput by 39% when it archived web pages from Goel et al.’s corpus (Figure 6).
Removing Non-Functional Code From JavaScript Files
Their approach for removing non-functional code is based on two observations about JavaScript code that will not work on archived web pages because it relies on interacting with origin servers:
Most non-functional JavaScript code is compartmentalized in a few files and is not included in all JavaScript files.
The execution of third-party scripts will not work when replaying archived web pages.
To identify non-functional JavaScript, they created filter lists, instead of using complex code analysis. Their filter lists contain rules that were created based on manual analysis of the scripts from their corpus.
Every rule was based on domain, file name, or URL token:
For domain rules, they removed some URLs associated with a third-party service.
For file name rules, they would identify files like “jquery.cookie.js” (which is used for cookie management) from any domain and not archive them.
For URL token rules, if a keyword such as “recaptcha” was found in the URL, they would not archive the resource.
The filter lists can be used to exclude URLs during crawl time so that JavaScript files that are not needed are never archived. They removed third-party scripts whose removal would not prevent post-load interactions from working during replay time. They also removed scripts that were on EasyList, which is an ad-blocking filter list.
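A toy sketch of how such filter rules might be applied at crawl time; the file-name and token values are the examples from the text, the domain rule is a hypothetical EasyList-style entry, and none of this is Jawa's actual filter-list code.

from urllib.parse import urlparse

DOMAIN_RULES = {"google-analytics.com"}          # hypothetical third-party service
FILENAME_RULES = {"jquery.cookie.js"}            # cookie management, any domain
TOKEN_RULES = {"recaptcha"}                      # keyword anywhere in the URL

def should_archive(url):
    parsed = urlparse(url)
    host = parsed.hostname or ""
    filename = parsed.path.rsplit("/", 1)[-1]
    if any(host == d or host.endswith("." + d) for d in DOMAIN_RULES):
        return False                             # domain rule
    if filename in FILENAME_RULES:
        return False                             # file name rule
    if any(token in url.lower() for token in TOKEN_RULES):
        return False                             # URL token rule
    return True

print(should_archive("https://cdn.example.com/js/jquery.cookie.js"))  # False
print(should_archive("https://www.example.com/article.html"))         # True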
They checked whether the code removed by Jawa visually or functionally impacted the replay of archived web pages. For visual impact, they compared screenshots of each web page archived by Jawa with and without filtering. They then viewed the web pages whose screenshots differed and found only insignificant differences, such as different timestamp information on the page and different animations due to JavaScript’s Date, Random, and Performance (DRP) APIs.
For functional impact, they checked if the post-load interactions will work on the archived web page. They found that removing the files that matched their filter lists did not negatively impact the navigational and informational interactions.
Garg et al. (“Caching HTTP 404 Responses Eliminates Unnecessary Archival Replay Requests”) identified cases where archived web pages that require regular updates (e.g., news updates, new tweets, and sports score updates) would repeatedly make failed requests to an origin server during replay time, resulting in unnecessary traffic. Jawa could resolve this problem because it removes code that communicates with an origin server, which should reduce the number of failed requests that occur during replay for these types of archived web pages.
Removing Unreachable Code
With this approach, they removed unreachable code, that is, code that can never be executed during replay. This unreachable code is associated with sources of non-determinism that are absent during replay time and with non-determinism caused by asynchronous execution and APIs for client characteristics.
The code that is executed when an event handler is invoked can differ depending on the order the user interacts with elements on the page (Figure 7), the inputs the user provides to the events, and the values returned by the browser’s APIs:
Order of user interaction: They focused on read-write dependency and found that the event handlers would not impact the replay since these events were used for user analytics that would track user interaction.
User input: None of the event handlers that work at replay time would read inputs that impact which code gets executed.
Browser APIs: Jawa removes APIs for client characteristics so only DRP APIs would be executed during replay time. When they checked the web pages in their corpus, the DRP APIs did not impact the reachable code for any event handler.
For the resources not filtered out by their filter lists, Jawa injects code (Figure 8) to identify which code was executed and in what order and then triggers every registered event handler using default input values and identifies which code was executed. The code that was executed gets stored. They then ensure that the browser will follow the same execution schedule and use the same client characteristics.
Utilizing URL Matching Algorithms to Handle Non-Determinism
When non-determinism causes variance in web resources’ URLs it results in failed requests, which prevents the web resources from loading. To mitigate this replay problem, Jawa uses approximate URL matching at the backend and intercepts calls to DRP APIs, supplying the return value of the API that was previously saved during crawl time. Goel et al. used two URL matching algorithms, querystrip and fuzzy matching, to match the requested URL with a crawled URL.
Querystrip removes the query string from a URL before initiating a match. This approach can help with cases where the query string is updated for a resource based on the server-side state. Figures 9 and 10 show an example where querystrip would be useful. We identified a replay problem (that resulted in a failed request for most replay systems except for ReplayWeb.page) with Amazon ad iframes that used a random value in the query string. If the query string is removed from this URL and a search is performed for the base URL in the WACZ file, then we could match the URL that was dynamically generated during replay with the URL that was crawled.
Figure 9: Example URI for an Amazon ad iframe. The rnd parameter in the query string contains a random value that is dynamically generated when loading an ad.
Figure 10: When replaying an Amazon ad iframe, the rnd parameter is not the same as the original value that is in the URI-R. Even though an incorrect URI-M is generated, ReplayWeb.page is able to load the ad. WACZ | URI-R: https://aax-us-east.amazon-adsystem.com/e/dtb/admi?b=...
Goel et al.’s fuzzy matching approach used Levenshtein distance to find the best match for a URL. An example of fuzzy matching for a replay system is pywb’s rules.yaml and pywb’s fuzzymatcher.py script that uses these rules. According to their presentation (https://youtu.be/WdxWpGJ-gUs?t=931), Jawa eliminated failed network fetches on around 95% of the pages from their corpus of 3,000 web pages (Figure 11). Their paper reported eliminating failed network fetches on 99% of pages, but the figure is listed as 95% in the more recent slides.
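A from-scratch sketch of the two matching strategies (not Jawa's or pywb's actual code): querystrip drops the query string and looks for an exact match, and fuzzy matching falls back to the crawled URL with the smallest Levenshtein distance.

from urllib.parse import urlsplit, urlunsplit

def querystrip(url):
    scheme, netloc, path, _query, _frag = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def match(requested, crawled_urls):
    # 1. Exact match after stripping the query string.
    stripped = querystrip(requested)
    for c in crawled_urls:
        if querystrip(c) == stripped:
            return c
    # 2. Fall back to the crawled URL with the smallest edit distance.
    return min(crawled_urls, key=lambda c: levenshtein(requested, c))

crawled = ["https://www.example.com/?rnd=4734"]
print(match("https://www.example.com/?rnd=7765", crawled))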
This group’s continued work (“Detecting and Diagnosing Errors in Replaying Archived Web Pages”) involves identifying URL rewriting problems caused by JavaScript that impacts the quality of the archived web page during replay. Their goal is to create a new approach for verifying the quality of an archived web page that is better than comparing screenshots and viewing failed requests. Their approach involves capturing (during crawl and replay time) each visible element in the DOM tree, the location and dimensions of the elements, and the JavaScript that produces visible effects. Their approach was able to reduce false positives while detecting low fidelity during replay when compared to using only screenshots and runtime and fetch errors.
Summary
Goel et al. created a web crawler named Jawa (JavaScript-aware web archive) that removes some non-deterministic JavaScript code so that the replay of an archived web page does not change if different users replay it. Their crawler also reduces the amount of storage needed when archiving web pages by removing non-functional and unreachable JavaScript code.
The sources of non-determinism that cause problems with replaying archived web pages are server-side state, client-side state, client characteristics, and JavaScript's Date, Random, and Performance APIs. When non-determinism caused variance in a dynamically generated URL during replay, they used two URL matching algorithms, querystrip and fuzzy matching, to match the requested URL with a crawled URL. These URL matching algorithms can reduce the number of failed requests and could resolve replay problems associated with random values in dynamically generated URLs, a problem we encountered during the Saving Ads project while replaying ads.
Win free books from the July 2025 batch of Early Reviewer titles! We’ve got 191 books this month, and a grand total of 3,477 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Friday, July 25th at 6PM EDT.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Canada, Germany, Australia, Greece, Cyprus, Czechia, Spain, Denmark and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
Happy July, DLF Community! We hope you’re having a pleasant summer so far and that you’re staying cool however you can. Between vacations and time to relax, we hope to catch you at a DLF Working Group meeting sometime this month. And, because the fall will be here before we know it, we hope you’re making plans to come to the Forum and Learn@DLF this November – registration is open and all programs have been released for what’s sure to be a wonderful week in colorful Colorado.
See you soon!
— Aliya from Team DLF
This month’s news:
Forum program announced, registration open: The program for the 2025 DLF Forum is now available. Register at the earlybird rate to join us in Denver in November.
Opportunity: H-NET Spaces invites applications for its Spaces Cohort Program, which supports early-stage projects and/or scholars in need of support and hands-on training in DH methods. Applications due July 1.
Call for climate-conscious bookworms: DLF’s Climate Justice Working Group summer book group is meeting Tuesday, July 29 at 1pm ET. They’ll be discussing chapters 5&6 of After Disruption: A Future for Cultural Memory by Trevor Owens, which is available open access. All are welcome to join, even if you’re not a regular participant in the working group and/or missed the first discussions. Register here to join.
Office closure: CLIR offices will be closed July 3-4 in observance of Independence Day.
This month’s open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 7/1, 2pm ET / 11am PT
DLF Digital Accessibility Working Group (DAWG): Tuesday, 7/1, 2pm ET / 11am PT
DLF AIG Cultural Assessment Working Group: Monday, 7/14, 1:00pm ET /10am PT
DLF AIG Metadata Assessment Working Group: Thursday, 7/17, 1:15pm ET / 10:15am PT
DLF AIG User Experience Working Group: Friday, 7/18, 11am ET / 8am PT
DLF Digital Accessibility Policy & Workflows Subgroup: Friday, 7/25, 1pm ET / 10am PT
DLF Digital Accessibility Working Group IT & Development (DAWG-IT) Subgroup: Monday, 7/28, 1:15pm ET / 10:15am PT
Libraries play a crucial role in ensuring equitable access to information, yet many collected materials remain inaccessible to patrons with disabilities. How can resource-sharing practitioners leverage their expertise and systems to bridge this gap? This question was at the heart of a recent OCLC Research Library Partnership (RLP) Works in Progress Webinar, Increasing the accessibility of library materials—Roles for ILL, where experts from three large academic libraries shared their practices and insights for improving collection accessibility for users who need accommodations.
Accessibility has long been a topic of interest and action for the SHARES resource sharing consortium, and all three speakers were from SHARES institutions. As the consortium’s coordinator, I introduced the session by highlighting some of the group’s previous work on accessibility, which included surveying members about current practices, challenges, and aspirations around accessibility, creating a resource document on accessibility and ILL, and drafting accessibility provisions that were incorporated into the latest revision of the US Interlibrary Loan (ILL) code in 2023.
Next our three distinguished presenters took to the virtual stage:
Clara Fehrenbach, Document Delivery Services Librarian at the University of Chicago Library
Ronald Figueroa, Resource Sharing and Facility Manager at Syracuse University Libraries
Brynne Norton, Head of Resource Sharing & Reserves at the University of Maryland Libraries
Key takeaways
The session highlighted various models and strategies for enhancing accessibility in library collections. Here are key insights shared by the presenters:
Making procurement seamless for qualified patrons: Clara Fehrenbach discussed the University of Chicago’s partnership with Student Disability Services (SDS). This collaboration allows students to request alternate formats directly through the library catalog, ensuring privacy and streamlined access to necessary materials. The library scans materials that SDS can’t source elsewhere, while SDS evaluates and authorizes patron eligibility and does the actual document remediation.
Providing PDFs of hard-to-get material still in copyright: Brynne Norton outlined the Accessible Library Text Retrieval Program (ALTR) at the University of Maryland. This program provides text-searchable PDFs of in-copyright library materials for students with visual impairments and other disabilities, serving as a last resort when other accessible formats are unavailable. Accessibility and Disability Service (ADS) staff determine who qualifies for this service as an accommodation.
Offering multiple levels of remediation: Ronald Figueroa outlined Alternate Format Services (AFS) at Syracuse University, which provides alternate formats for items owned, licensed, or obtained via ILL for qualified patrons. Service eligibility is determined by the Center for Disability Resources for students and by the ADA Coordinator for faculty and staff. AFS remediates for magnification, text-to-speech, or screen readers, according to need, and outsources jobs over 200 pages.
Practical tips for ILL practitioners
The presenters shared practical advice for libraries looking to start offering accessibility services or enhance an existing service:
Start small: Begin with basic services and gradually expand capabilities based on available resources.
Understand patron needs: Tailor services to meet the specific needs of patrons, whether it’s OCR documents, accessible PDFs, or other formats.
Leverage partnerships: Collaborate with Disability Services Offices (DSOs) on campus to determine eligibility and streamline the provision of accessible materials.
Maintain communication: Keep in close touch with partners to ensure ongoing support and address any changes in staff or procedures.
They also offered guiding principles for those who might be feeling overwhelmed by the prospect of starting up a new accessibility service:
Let those who are already good at it do it: ILL = scanning; Student Disability Services = eligibility.
Don’t overpromise: Understand what you actually have the bandwidth to offer before partnering.
Don’t be shy: Know that Student Disability Services folks are eager to partner.
Tap into your ILL community: Ask your peers for help.
Looking ahead
The webinar also looked to the horizon for upcoming developments in accessibility, including the integration of Optical Character Recognition (OCR) into OCLC’s Article Exchange document delivery application and burgeoning efforts by the ALA RUSA STARS Codes and Guidelines Committee to establish scanning standards, with a focus on improving scanning for accessibility. These initiatives are crucial for ensuring that all patrons have equitable access to library resources.
This webinar provided valuable insights and practical strategies for improving accessibility in library collections. By leveraging collaborative efforts, specialized programs, and efficient workflows, libraries can make significant strides in ensuring that all patrons, regardless of their abilities, have access to the information they need. We invite you to learn more by watching the recorded webinar, and exploring the wealth of resources shared on the recording webpage.
The training, offered by Public Lab Mongolia, attracted the interest of Mongolian open data enthusiasts and practitioners from various fields, including sociology students and practitioners, data analysts, health researchers, university lecturers, civil society, and private sector professionals.
In our 2023 blog post, “Machine Learning and WorldCat,” we described for the first time how we use machine learning to detect and merge duplicate records in WorldCat.
Removing duplicate records has always been central to the quality of WorldCat. It makes cataloging more efficient and improves overall quality. With bibliographic data flowing in faster than ever, we have to keep records accurate, connected, and accessible, and we have to do it quickly.
AI lets us scale deduplication quickly and efficiently, but human knowledge and experience remain essential to its success. At OCLC we have invested in a hybrid approach: we use AI to process enormous volumes of data, while catalogers and OCLC experts continue to make the key decisions.
From paper slips to machine learning
Before I joined OCLC, I already worked on improving bibliographic data. Back then, merging duplicate records was done entirely by hand. Libraries sent us paper slips listing possible duplicates, often with an explanation from a cataloger.
We sorted thousands of slips into filing cabinets: green slips for books, blue for non-book materials, and pink for serials. The volume of slips was so large that we even had to repurpose office furniture to store them; in the end you couldn’t find a pen or notepad anywhere.
This image was generated with AI to give an impression of the cluttered aisles where we kept the duplicate slips. It looks much tidier here than it actually was.
Looking back, I can see how forward-thinking that collective effort was. The work was slow and methodical, but it showed how carefully we operated at the time. Every slip represented a decision, a piece of human judgment that determined whether records in our system were merged or kept separate. Despite our hard work, we could never quite keep up. The pile of duplicates kept growing, and we were always behind.
The difference today is enormous. Since I started working on AI-driven deduplication at OCLC, I have come to appreciate how much more efficiently we can tackle this now. What used to take years we now do in weeks, with greater accuracy and across more languages, scripts, and material types than ever before. Yet the core of the work remains the same: human expertise is indispensable. AI is not a magic fix; it learns from our cataloging standards, our professional judgment, and our corrections.
By using a hybrid approach, in which machine learning does the heavy lifting while human review guides and refines the process, we can strike a balance between speed and precision and build on the best of both worlds.
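To make this hybrid approach concrete, below is a minimal sketch, in Python, of confidence-based routing: pairs a model is very sure about are merged automatically, doubtful pairs go to a cataloger, and the rest are left alone. The thresholds, field names, and the stand-in similarity function are illustrative assumptions, not OCLC’s actual pipeline.

import difflib

# Hypothetical sketch of confidence-based routing in a hybrid deduplication
# workflow. Thresholds, field names, and the stand-in similarity score are
# illustrative assumptions only.

AUTO_MERGE_THRESHOLD = 0.98   # very confident: merge automatically
REVIEW_THRESHOLD = 0.80       # uncertain: queue for a cataloger

def score_similarity(rec_a: dict, rec_b: dict) -> float:
    """Stand-in for a learned model that scores how likely two bibliographic
    records describe the same resource (0.0 to 1.0). Here: a simple title
    comparison, purely for illustration."""
    return difflib.SequenceMatcher(
        None, rec_a["title"].casefold(), rec_b["title"].casefold()).ratio()

def route_candidate_pair(rec_a: dict, rec_b: dict) -> str:
    """Decide what happens to a candidate duplicate pair."""
    score = score_similarity(rec_a, rec_b)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge"          # the machine does the heavy lifting
    if score >= REVIEW_THRESHOLD:
        return "human-review-queue"  # a cataloger makes the final call
    return "keep-separate"

# Example: an identical title is merged; a doubtful pair goes to review.
print(route_candidate_pair({"title": "War and Peace"},
                           {"title": "War and peace"}))   # auto-merge
print(route_candidate_pair({"title": "War and Peace"},
                           {"title": "War & Peace"}))     # human-review-queue

The design point is simply that automation handles the clear-cut cases, while human judgment stays in the loop for anything uncertain.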
Balancing innovation and care in WorldCat
For decades, catalogers, metadata managers, and OCLC teams have worked together to safeguard the reliability of WorldCat, keeping it a high-quality, trusted resource for libraries and researchers. Removing duplicate records is an important part of that effort: it keeps everything better organized, easier to search, and easier to exchange between systems.
AI lets us approach duplicate records in a new way, allowing us to detect and merge far more duplicates than ever before. The main challenge is to apply AI responsibly and transparently, so that it aligns with professional cataloging standards.
This scalable approach is a natural extension of our long-standing role as stewards of shared bibliographic data. AI gives us the opportunity to strengthen human expertise, not replace it.
A new take on deduplication
Until now, we relied on fixed algorithms and a great deal of manual work to merge duplicate records. That worked, but it had clear limits.
With OCLC’s AI-driven deduplication methods, we can now accomplish much more:
More languages and scripts: Our machine learning algorithm efficiently handles non-Latin scripts and records in a wide range of languages, which lets us find duplicates in global collections faster.
More types of records: AI recognizes duplicates across a broader range of bibliographic data and also helps with material types that were previously difficult to deduplicate.
Protection for rare and special collections: We do not apply AI to rare and unique materials, so distinctive items in archives and special collections remain well protected.
These improvements let us keep raising the quality of WorldCat’s metadata across more materials and languages, and to do so responsibly.
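As a rough illustration of the shift away from fixed rules, the sketch below contrasts an exact-match rule with graded pair features that a trained model could weigh, plus a guard that keeps rare and special-collection records out of the AI pipeline. All field names and helpers here are hypothetical; real WorldCat records and OCLC’s actual features are far richer.

import difflib
import unicodedata

def normalize(text: str) -> str:
    """Unicode-normalize, lowercase, and trim a field for comparison."""
    return unicodedata.normalize("NFKC", text).casefold().strip()

def deterministic_match(rec_a: dict, rec_b: dict) -> bool:
    """Old-style fixed rule: a match only when normalized title and author
    are exactly equal. Variant forms and typos slip through."""
    return (normalize(rec_a["title"]) == normalize(rec_b["title"])
            and normalize(rec_a["author"]) == normalize(rec_b["author"]))

def eligible_for_ai_dedup(record: dict) -> bool:
    """Safeguard from the list above: rare or special-collection items stay
    out of the AI pipeline. The flag name is a hypothetical field."""
    return not record.get("is_rare_material", False)

def pair_features(rec_a: dict, rec_b: dict) -> list[float]:
    """Graded similarities a trained model could weigh instead of one hard
    rule; small differences lower the score but do not veto a match."""
    def sim(field: str) -> float:
        return difflib.SequenceMatcher(
            None, normalize(rec_a.get(field, "")),
            normalize(rec_b.get(field, ""))).ratio()
    same_year = float(rec_a.get("year") == rec_b.get("year"))
    return [sim("title"), sim("author"), sim("publisher"), same_year]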
What “responsible AI” means in practice
The term “AI” is broad and meets with some skepticism, understandably so. Different AI applications raise questions about bias, accuracy, and reliability.
Our approach is built on a few key principles:
AI complements human expertise: AI is meant to support people, not replace them. We have built in human review and data labeling, so that our AI models learn according to cataloging best practices.
Efficiency without compromising quality: Our AI is designed to use computing capacity wisely, without sacrificing the accuracy and quality of the records.
Sustainability: We make sure our systems use no more computing power than necessary, so results stay high-quality without waste. By deploying AI thoughtfully, deduplication remains affordable and future-proof as we continue to grow.
The goal is not to replace people but to make better use of their knowledge and time. Catalogers can focus on work that truly adds value for their users, rather than endlessly cleaning up duplicate records.
In addition, catalogers and our experienced OCLC staff play an active role in this process. By labeling data and providing feedback, they help the AI get steadily better at recognizing and handling duplicates.
AI as a collaborative effort, and the road ahead
I don’t miss the stacks of paper slips or the quarterly purges of filing cabinets, but I do appreciate what they stood for: care and dedication. AI does not replace that care; it builds on it and takes it further.
As the tools keep evolving, our principles stay the same. OCLC has long used technology to help libraries manage their catalogs and collections. We are now applying that same approach to AI: purposeful, effective, and rooted in our shared commitment to metadata quality.
This way of innovating enables libraries to meet changing needs and keep delivering value to their users.
Join OCLC’s data labeling initiative and help refine AI’s role in deduplication.
AI-driven deduplication is a collaborative effort that is continually refined through community input and professional oversight. Your contribution has a direct impact on the quality and efficiency of WorldCat and benefits the entire library community.