As a member of OCLC’s Publisher Relations Team, my colleagues and I serve as intermediaries, representing the needs of libraries in the publishing world and vice versa. This can feel like mediating between squabbling roommates or sometimes like pairing up lost gloves—a perfect fit, a meant-to-be. While it may be tempting to see only a dichotomy, two sides of a story, a chasm to be bridged (I could go on with the metaphors), we choose to focus on the commonalities between libraries and publishers, building connections instead of dwelling on the gap.
Much easier said than done.
Occasionally, we get the opportunity to put our idealistic views to the test and bring the publisher’s perspective to a seemingly library-specific issue, as we did in the recent OCLC Research report Improving Open Access Discovery for Academic Library Users. I consulted with the report’s authors throughout the research process, offering alternative readings from a non-library view. This project shed light on common challenges, emphasizing that no single entity can tackle the enormity open access discovery on its own. The outcome is a report that is valuable to open access (OA) stakeholders outside the library, as well.
“Truly improving the discoverability of OA publications requires all of the stakeholders involved to consider the needs of others within the lifecycle.”
Libraries, publishers, technology providers, and aggregators all play a role in the lifecycle of OA publications. These many OA content workflows and responsibilities don’t exist in a silo, but rather integrate with and augment each other. Focusing solely on the library’s role would lose the assistance and efficiencies offered to the library by the others. As the penultimate line of the report states, “Truly improving the discoverability of OA publications requires all stakeholders involved to consider the needs of others within the lifecycle.”
Although the report is primarily aimed at librarians, the authors have thoughtfully identified significant findings from the study and emphasized key takeaways for publishers and other non-library stakeholders. These important points are visually highlighted using magenta boxes throughout the report. However, to provide even greater clarity for this specific audience, I have extracted a few of the most relevant points under the categories of Metadata, Access, and Trust, and offered brief explanations of their significance.
Metadata
Consider the metadata that library staff identified as important for the discoverability of OA publications when identifying potential improvements in how metadata can be created, shared, harvested, and displayed.
We all (should) know that metadata is important for all content discovery. Metadata that is used to discover traditional publications, such as author name, title, abstract, keywords, journal name, volume, issue, publication date, and subject are also important for the discoverability of OA publications. However, librarians also stated an additional need for persistent identifiers such as ISSN, ISBN, DOI, ISNI, ORCID, and ROR, to allow library systems to potentially make linkages between resources to further aid discovery and “reduce confusion among their users”(Improving Open Access Discovery, 13). Moreover, adding persistent identifiers into metadata for all types of content is good practice, as the recent report recommending a US national PID strategy describes. Not only do PIDs improve discovery, but they also support interoperability and automation, reduce administrative burden, improve research assessment and research integrity efforts, and save money when widely adopted.1 So, if you have any PIDs, add them in. If not, go get one… and add it in.
Users evaluate resources concurrently and iteratively as they search and access them. Both metadata and system capabilities need to support these simultaneous processes of discovery, evaluation, and use.
The absence or presence of metadata is the differentiator for discovery. Librarians asked that publishers include metadata about the use of peer review, publication version, and OA status (through the inclusion of license information) to help systems differentiate the content and enable users to better evaluate their resources of choice. OA publications may have multiple versions (such as the version of record, author accepted manuscript, or preprint), and these versions may be aggregated across various repositories, with only metadata available to differentiate them in search results. The completeness of the provided metadata will influence users in selecting a version. For instance, if one version clearly indicates that the content has undergone peer review and is the OA version of record, users may be inclined to choose it over a result with minimal context.
OA content that isn’t discovered doesn’t get used, and OA content that doesn’t get used doesn’t get supported by libraries. Therefore, publishers should provide the most complete metadata possible about OA content as well as partner with library staff to understand what metadata they would like to receive and help “authors understand the role that quality metadata plays in the discovery of their work.” (Improving Open Access Discovery, 35)
To find out more about how publishers can better create metadata about open access books, see EDItEUR’s application note “Open Access Monographs in ONIX” (both text and video).
Ensure that article-level metadatais provided by all publishers, regardless of size. This makes it easier for library staff to add these OA publications to their collections to meet users’ needs.
Even the smallest OA publishers of either books or journals should ensure their metadata is thorough and shared with trusted aggregators such as DOAB or DOAJ. Those Open access platforms and others provide aggregated metadata to libraries in standardized formats that allow systems providers to efficiently index the content. These aggregators may require publishers to apply and be accepted before adding their metadata and content to the repository, but they also provide general advice to all publishers on creating high-quality metadata. Seek it out and follow it. These aggregators see metadata of all levels of quality every day and know what works.
When adding OA publications to knowledgebase collections, clearly name the collection and identify what types of OA resources are in the collection and how much of it is OA. Provide this information consistently to help libraries identify the content they are looking for within the potentially duplicated records.
Knowledge bases are largely managed through the use of the KBART, or Knowledge Base and Related Tools file. NISO provides more detailed information about the KBART format, but in general, the KBART file is the special sauce that keeps the metadata record connected to the hosted content and the library’s catalog. This very library-centric data format can sometimes prove mysterious to publishers who focus on the title- or article-level metadata, but it is crucial in making collection development and management workflows run smoothly for libraries.
To help understand KBART, imagine a shipping container full of individual copies of physical books being sent to a warehouse. There is a shipping manifest pasted to the outside of the container that lists what books are found within. Without this manifest, the warehouse staff would have no idea what they are dealing with when they open the container, resulting in inefficient check-in processes and some unhappy staff. The KBART acts as this shipping manifest, itemizing the contents of a publisher’s digital collection and allowing the libraries to expediently add it to their catalog.
As the OA Discovery report points out, the choice to add OA content to a library catalog is not necessarily simple. Librarians weigh many factors when considering this work. So, make their job easier by providing thorough and consistent KBART files, and name your collections as clearly as possible. If the collection contains only open access content, then please say that in the collection name. Otherwise, the collection might be overlooked for consideration. Unclear labeling can lead to unhappy librarians, which is something nobody wants.
Access
Providing seamless authentication to content behind a paywall saves users and library staff time and effort.
Studies about Information-seeking behaviors always get librarians excited, and the Open Access Discovery report doesn’t disappoint. You can see complete details within the report, but it finds, among other things, that users are most likely to search for scholarly peer-reviewed content first on a search engine, with the library catalog coming in third. The publisher’s website was further down the list in fifth place. So going straight to the publisher is not a common practice.
After users navigated their search results and tried to access the digital full text they were seeking, they often faced barriers that had a negative impact on their experience. These barriers included the requirement for payment, unavailability through their library, and the need to log in—three barriers that are directly related to the traditional paywalled access model for scholarly publications.
Publishers should take note that the users’ most common response when hitting the barrier was to seek an OA version of that content. While logging in and accessing the content behind the paywall was a close second, it is significant that users chose to instead pivot to another version of the content. Of course, it’s also worth noting that users were more likely to give up on the content altogether than to ask a librarian for help or find the content in a physical format.
Publishers should support seamless authentication to their paywalled content to not only save the user and librarian time and effort, but to also ensure that the content they host is being used and found of value. Users don’t care about the business model supporting the content. They just want ready access to the content they are seeking. After taking the effort to make content discoverable, publishers need to make those last mile connections possible and support authentication through to their content, regardless of how that content is funded.
Trust
This takeaway was highlighted for library staff, but is relevant to publishers as well:
Provide users more guidance about how to evaluate whether a scholarly publication is trustworthy, including reasons why it’s important to consider the journal, publisher, and author’s reputation in addition to whether the publication has been peer-reviewed.
Trustworthiness and reputation are important. Publishers, you know this. Librarians make a choice about what OA content they bring into their collections. They do not just have an open-door policy. If you support the publication and discoverability of high-quality open access content, make sure that you also support the libraries’ collection development processes around OA by following these three recommendations:
Be transparent. Make it easier for your reputation to be evaluated. Avoid marketing language that may sway the evaluator away from your intention. Fill your metadata with all the PIDs, funder information, and peer review information that you possibly can.
Be helpful. The Open Access Discovery report calls on libraries to educate users on how to publish OA as well as offer more holistic instruction on OA, like how “licensing and versioning work throughout the publication lifecycle, what different publishing models mean about how OA publications are created and funded, and how to determine what OA publications are trustworthy” (Improving Open Access Discovery, 31). Support libraries by communicating about your OA efforts more broadly. OA interactions should go beyond negotiating transformative agreements and include information to support user awareness of your OA efforts. This can help foster library-wide conversations around OA and lend credence to your trustworthiness.
Be trustworthy. The need to establish trust is a repeated refrain within this report. Trust is earned and libraries take actions based on this intangible feeling by analyzing tangible actions. By being transparent, providing helpful information, and building trust with library partners, you increase the likelihood of your OA content being readily added to their catalogs.
Final thoughts
As publishers, librarians, and discovery partners continue to navigate the evolving landscape of open access, the myriad of publishing models and methods to discover content will continue to strain the sometimes-tenuous bonds between library and publisher. But collaboration remains key. All sides of the story are really just focused on getting the right information to the right user at the right time. Ultimately, improving metadata, streamlining access, and building trust are foundational in ensuring OA content is both discoverable and valued.
But how do we measure the impact of these efforts? Usage data plays a crucial role in understanding how OA publications are accessed and utilized. In a follow-up post, we’ll delve into the significance of usage reporting, exploring how better analytics can help publishers and libraries alike make informed decisions that enhance discoverability and engagement.
Regulation of cryptocurrencies was an issue in last November's US election. Molly White documented the immense sums the industry devoted to electing a crypto-friendly Congress, and converting Trump's skepticism into enthusiasm. They had two goals, pumping the price and avoiding any regulation that would hamper them ripping off the suckers.
tl;dr: DeFi cannot be permissionless, allow arbitrary innovation and comply with any meaningful regulations. You can only choose two of those properties. If you accept a limited form of innovation you can have two-and-a-half of them.
Fundamental results in logic and computer science impose a trade-off on any permissionless system’s ability to both permit innovation and achieve compliance with non-trivial regulations. This result depends only on long-settled concepts and the assumption a financial system must provide a logically consistent view of payments and balances to users.
This is a semi-technical treatment, with more formal work proceeding elsewhere.
Two years later, the "more formal work" has finally been published in a peer-reviewed Nature Publishing journal, Scientific Reports, which claims to be the 5th most cited journal in the world. Jonathan Reiter tells me that, although the publishing process took two years, it did make the result better.
Below the fold I discuss Tradeoffs in automated financial regulation of decentralized finance due to limits on mutable turing machines by Ben Charoenwong, Robert M. Kirby & Jonathan Reiter.
This team were pioneers in applying fundamental computer science theorems to blockchain-based systems, starting in April 2022 with The Consequences of Scalable Blockchains in which they showed that implementing an Ethereum-like system whose performance in all cases is guaranteed to be faster than any single node in the network is equivalent to solving the great unsolved problem in the theory of computation, nicknamed P vs. NP. And thus that if it were implemented, the same technique could break all current cryptography, including that underlying Ethereum.
But, I believe, they were not the first. That appears to have been Tjaden Hess, River Keefer, and Emin Gün Sirer in Ethereum's DAO Wars Soft Fork is a Potential DoS Vector (28th June 2016), which applied the "halting problem" to "smart contracts" when analyzing possible defenses against DOS attacks on a "soft fork" of Ethereum proposed in response to "The DAO".
We examine which decentralized finance architectures enable meaningful regulation by combining financial and computational theory. We show via deduction that a decentralized and permissionless Turing-complete system cannot provably comply with regulations concerning anti-money laundering, know-your-client obligations, some securities restrictions and forms of exchange control. Any system that claims to follow regulations must choose either a form of permission or a less-than-Turing-complete update facility. Compliant decentralized systems can be constructed only by compromising on the richness of permissible changes. Regulatory authorities must accept new tradeoffs that limit their enforcement powers if they want to approve permissionless platforms formally. Our analysis demonstrates that the fundamental constraints of computation theory have direct implications for financial regulation. By mapping regulatory requirements onto computational models, we characterize which types of automated compliance are achievable and which are provably impossible. This framework allows us to move beyond traditional debates about regulatory effectiveness to establish concrete boundaries for automated enforcement.
They summarize the fundamental problem for the automation of DeFi regulation:
DeFi features some computationally challenging properties: (1) Turing-complete programming, (2) permissionless access to both transact and publish code and (3) selectively immutable code. The permissionless mutability of the system combined with the Turing completeness motivates our inquiry. A system running Turing-complete code where updates can be published permissionlessly cannot make any guarantees about its future behavior, a conclusion from early work on Universal Turing Machines (UTM).
it is possible to construct both (1) classes of algorithms that can make credible promises and (2) restricted update mechanisms that enable credible promises. In other words, DeFi platforms can provide compliant services like traditional centralized providers through fully automatic mechanisms.
Consider an economy modeled as a Turing Machine, where the machine’s state corresponds to the state of the real economy. We formalize compliance as a property of system state transitions that can be verified mechanically, following Theorem 5.8.5 of Savage. Specifically, a compliant system is one where no sequence of permitted operations can result in a state that violates predefined regulatory constraints set by an external regulator.
For example, if a regulation prohibits transactions with certain addresses, compliance means no sequence of permitted operations can transfer value to those addresses, either directly or indirectly. Similarly a regulator may impose requirements on intermediaries transacting in certain assets or products akin to depository receipts for those assets. Compliance would then require ensuring one does not unknowingly transact in “products akin to depository receipts” for a given list of assets.
Because they model regulated systems in terms of states and the transitions between them, they can apply results from compputer science:
This formulation maps to well-known results in computability, such as the Halting Problem and the more general impossibility known as Rice’s Theorem: No algorithm exists to determine from the description of a [Turing Machine] whether or not the language it accepts falls into any proper subset of the recursively enumerable languages. In other words, we cannot categorize arbitrary programs into specific subsets automatically and reliably. In financial regulation, the canonical “proper subset” is a ban on interacting with a given address: interactions involving a banned address are forbidden, and the acceptable subset of states includes no such transfers.
They proceed to use cryptocurrency "mixers" as an example. They explain that Rice's Theorem means that because any general description of "mixers" defines a "proper subset" of all programs, and thus there is no automatic or reliable method by which a "smart contract" can be assessed against the general description.
Of course, as Justice Potter Stewart famously said "I know it when I see it". Human regulators will have no difficulty in recognizing a "smart contract" as a mixer, if only because in order to function it needs liquidity. To attract it the "smart contract" needs to advertise its service. So can the regulators ban interactions with the specific addresses of the mixers they recognize? Charoenwong et al make two arguments
First, this severely limits the regulator’s power from regulating a mutable set of protocols to only specific ones. In other words, what is often called “principles-based” regulations (as opposed to rules-based regulations) are impossible. We cannot ban “mixers” generally – we can only ban “mixers A, B and C.” In some sense, this is akin to banning specific means of murder rather than simply banning murder, no matter the means.
Then they introduce the time element inherent in human regulation:
Second, and more importantly, we cannot enforce even these more straightforward rules reliably. Consider these steps:
Deploy a new, confusingly-coded, “mixer” labeled X
Send funds to the mixer X
Withdraw from the mixer X and feed into the mixer A
Withdraw from the mixer A and feed into the mixer X
Withdraw from the mixer X and spend freely
This procedure works because we cannot identify arbitrary mixers, so we are free to deploy and then use them before they get put on the banned list. As a result, the regulator cannot even ban all interaction with enumerated mixers – it can only reliably ban some forms of interaction. This result is a severe limit on regulatory power.
If we consider that compliance exists in an automated form, operating on publicly available data in real-time, anyone accepting that final transfer must operate in a compliant fashion. If, instead, the plan is to decide these things later based on non-mechanical analysis, we are simply operating a conventional legal system with some more computers involved. Concretely, if that last transfer can be ruled illegal after the fact, it was never an automated financial system.
This result generalizes to services other than mixers. Rice's Theorem means it isn't possible to ban a class of services, and banning individual services identified by humans will always be behind the curve.
The authors illustrate the application of their result with real-world examples of regulatory failure:
These case studies-The DAO, Beanstalk Finance, Compound, Terra/LUNA, and MakerDAO-collectively illustrate the practical manifestations of our paper’s theoretical findings. Each example demonstrates a different facet of the challenges in implementing reliable, automated compliance mechanisms in decentralized, Turing- complete systems. From governance attacks to stablecoin collapses and liquidation issues, these incidents underscore the impossibility of guaranteeing specific regulatory outcomes without compromising system flexibility or introducing external interventions.
The authors argue that there are two ways to construct a system that does allow automated regulation, by making it permissioned rather than permissionless, or by enforcing a non-Turing-complete language for the "smart contracts". In practice many cryptocurrencies are permissioned — in Decentralized Systems Aren't I pointed out that:
The fact that the coins ranked 3, 6 and 7 by "market cap" don't even claim to be decentralized shows that decentralization is irrelevant to cryptocurrency users.
There are many examples of cryptocurrency systems that claim to be decentralized but are actually permissioned. Patrick Tan described one in Binance Built a Blockchain, Except it Didn’t:
For all its claims of promoting decentralization, Binance runs two “blockchains” that are not just highly centralized, but regularly alter history, undermining one of the core tenets of the blockchain — immutability.
Consider a scripting language where we cannot have variables. A simple “splitting the tab” contract might look like:
This is dangerous if we have a regulation that certain addresses cannot be paid. The issues raised surrounding the DAO hack, discussed above, apply here. But what if the only way to transfer a token is to call SendTo and that function looks like:
There is no issue if a function only calls RealSendTo from SendTo. In such cases, the regulator’s responsibility is to maintain the BannedList, and the system is permissioned.
Their example of a system that is not-Turing-complete is not useful, because it requires BannedList to be a constant. As they point out, if the system is to be useful, BannedList must be a variable that is updated by the regulator, and thus the system is permissioned. It may well be that, because BannedList is a variable, that the system is Turing-complete after all. I can't do the analysis to determine if this is the case, but it is known that even a small number of variables makes a system Turing-complete.
Thus the paper is somewhat misleading, in the sense that it reads as if regulated systems can be either permissioned or not-Turing-complete, but it fails to provide an example of a system that is permissionless and not-Turing-complete. The example that looks as if it is going to be not-Turing-complete but permissionless seems to be permissioned and not-Turing-complete.
I would argue somewhat differently:
Charoenwong et al show that a permissionless, Turing-complete system cannot be regulated.
In the real world no-one is going to cripple their system by making it not-Turing-complete.
Even if a not-Turing-complete system could be built it isn't clear that it would be useful.
In essence, the regulatory act of enforcing that a system is, and remains, not-Turing-complete is permissioning the system.
Thus in practice permissionless systems cannot be regulated.
Reflexivity has been MicroStrategy’s secret sauce. The company prints equity or equity-linked securities to buy bitcoin, boosting bitcoin’s price, which in turn inflates MicroStrategy valuation. This allows it to issue more stock and repeat the cycle.
This MonoStrategy has enabled MicroStrategy to acquire 2.25 per cent of all bitcoin in existence, a hoard worth around $46bn at current prices. The company trades at nearly double the value of its underlying bitcoin holdings, a testament to belief of some investors in Michael Saylor’s project.
...
As long as the stock trades at a premium to its bitcoin holdings, the company can keep issuing novel securities and finding new buyers. Meanwhile, with MicroStrategy comprising a sizeable portion of crypto inflows (28 per cent in 2024, according to JPMorgan), it has helped sustain bitcoin’s ascent.
Both our co-Presidents are heavily invested in pumping cryptocurrencies. The new administration is planning to solve the Greater Fool Supply-Chain Crisis by turning the Federal government into the greater fool of last resort by establishing the strategic Bitcoin reserve. What better greater fool than one who can print his own money? The result is that the S.E.C. Moves to Scale Back Its Crypto Enforcement Efforts, as Matthew Goldstein, Eric Lipton and David Yaffe-Bellany report:
The Securities and Exchange Commission is moving to scale back a special unit of more than 50 lawyers and staff members that had been dedicated to bringing crypto enforcement actions, five people with knowledge of the matter said.
a major flash crash in Ether as leveraged positions were liquidated served as a stark reminder of how digital-asset markets still lack almost all of the guardrails installed on traditional markets over the years due to various misadventures that hurt investors. The second-largest token was down a bit as traditional markets opened for trading Monday morning in Asia, reacting to concerns about US tariffs against Canada and Mexico. Then in a matter of minutes its losses extended to about 27%, before quickly recovering.
...
There’s no indication that any of that is likely to change anytime soon. So Sunday night’s price action in Ether – and similar dives in a slew of other altcoins and memecoins – serves as a reminder that for better or worse this asset class still exists far outside of the padded walls of traditional markets, regardless of how much tradfi embraces it.
With millions of digital transactions taking place every day, have you ever wondered about the complex world behind your simple card swipe?
In this week's Thursday Threads, we delve into the multi-layer maze that is the credit card industry.
Grappling with $130 billion in fees, merchants are the invisible heroes who bear the cost of our seamless payment experience.
As we unravel this thread, we'll dissect the structure of these processing fees, explain how your spending fuels reward systems, and describe the ongoing antitrust battle between credit card processors and merchants.
We'll also see what your credit card issuer knows about your spending habits, bringing to light the monetization of these insights.
Delving into murkier waters, we'll explore the shadow realm of debt collection and the distress it can cause to consumers.
And to wrap up, are we ready for (X)Twitter to become our "everything app"?
Plus, one thing I learned this week and a cat picture.
Did you know credit card companies take in $130 billion in fees? And the invisible-to-the-consumer payer of those fees? Merchants.
That "swipe fee" has two parts: a 5¢ to 10¢ fixed fee and a percentage of the charged amount ranging from 1% to 3%.
That higher percentage fee of the amount you charge on your credit card likely goes to pay for credit card rewards.
For 20 years, merchants have been fighting an antitrust battle with credit card processors about increases in the fees they are paying.
Have you stopped to think about how much your credit card issuer knows about you based on what you buy and where? The card issuer has.
Mastercard sells your purchase history to advertisers, and you can opt out.
This Week I Learned: The origin of the computer term "mainframe" comes from "main frame" — the 1952 name of an IBM computer's central processing section.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Who pays for credit card operations? Merchants
Today, small businesses face another attack, this time from Wall Street. Reynolds’s main concern is now swipe fees, which are the fees credit and debit card networks charge merchants for processing transactions. For over a decade, Reynolds and his colleagues on Main Streets around the country have watched their monthly billing statements climb because of these fees. In 2021, big banks, in coordination with the two credit card behemoths, Visa and Mastercard, took in over $130 billion from swipe fees (also called interchange fees), more than double what they reaped in 2010.
The article highlights the growing mobilization of small businesses against the rising swipe fees imposed by credit card companies Visa and Mastercard.
These fees have increased significantly, costing small retailers more than utilities and approaching their labor costs.
The article describes the struggles of small business owners who, after recognizing their shared challenges during the pandemic, formed groups like the Merchants Payments Coalition to advocate for reform.
The campaign aims to replicate the success of the Durbin amendment, which previously capped debit card fees, by pushing for the Credit Card Competition Act to do the same for credit transactions.
(You might remember the Credit Card Competition Act...it was the target of blanket advertising in 2023 along the lines of how Congress wants to take away your credit card rewards.)
Well, how much are the fees we're talking about...
What are the fees?
In 2023, credit card companies in the U.S. earned $135.75 billion from processing fees charged to merchants. Families paid an average of $1,102 in swipe fees in 2023, according to the Merchants Payments Coalition. The money made from these fees increased at a faster rate than the actual money spent on purchases, adding fuel to the already fierce debate between credit card companies and businesses that complain about so-called swipe fees. Businesses claim that raising interchange fees, which are paid by merchants on each transaction made with a credit or debit card, worsen inflation and pinch consumers because businesses could opt to pass the cost of higher interchange fees onto consumers. Most merchants need to accept credit card payments, which makes credit card processing fees a cost of doing business. For more on how much those costs can be -- and how they vary among credit card companies -- we&aposve collected all the latest data.
Have you ever been charged an extra fee by a company for using a credit card?
It is not common, but it does happen, and it is because the company has been charged a fee to accept the card.
That is called the "interchange fee".
This table is from the article quoted above:
Payment network
Average credit card processing fees
Visa
1.23% + $0.05 to 3.15% + $0.10
Mastercard
1.15% + $0.05 to 3.15% + $0.10
American Express
1.10% + $0.10 to 3.15% + $0.10
Discover
1.56% + $0.10 to 2.40% + $0.10
The range—1.23% to 3.15%, in the case of Visa—is based on a few factors:
Merchant category: the type of business
Card tier: the level of rewards a card offers, or no reward at all
Processing method: whether a card was swiped, dipped, tapped, keyed manually, or used online
One of the significant factors is card tier, which leads us to ask:
Why are banks eager to push the higher-fee rewards cards?
To highlight something which is routinely surprising for non-specialists: interchange fees [the fees paid by the card-accepting business, or "merchant"] are not constant and fixed. They are set based on quite a few factors but, most prominently, based on the rank of card product you use. The more a card product is pitched to socioeconomically well-off people, the more expensive interchange is. Credit card issuers explicitly and directly charge the rest of the economy for the work involved in recruiting the most desirable customers.
The author of this article was a technology executive at Stripe and now makes a living doing consulting and writing blog posts.
The article delves into the mechanics behind credit card rewards, emphasizing the role of interchange fees, which are paid by businesses accepting credit cards and distributed among various parties in the credit ecosystem.
It explains that credit card issuers use these fees to attract high-value customers by offering rewards programs that enhance the spending experience.
The discussion highlights that not all cards offer rewards, with some cards targeting lower-income users primarily to provide access to credit rather than rewards.
If you want a more in-depth view of how credit cards work, I recommend the author's Improving how credit cards work under the covers.
Interchange settlement
...on every credit card transaction in the MasterCard and Visa systems, the merchant pays a swipe fee, also known as the merchant discount fee. That fee is paid to the merchant&aposs bank. The merchant&aposs bank then pays a "network fee" to MC or V and also pays an "interchange" fee to the bank that issued the card. The interchange fee is not one-size-fits-all. Instead, it varies by merchant type (and sometimes volume) and by the level of rewards/service on the card. So merchants are not directly charged the interchange fee, but it is passed through to them, sometimes explicitly. The problem that merchants face is that they cannot exert any pressure on the interchange fee—nominally an interbank fee—even though it is set based on their line of business. Nor can merchants discriminate among types of credit cards by charging more for rewards cards, etc.
Twenty years ago, merchants sued the credit card companies alleging anti-trust violations about this scheme, and the case is still going on.
If the "Credit Card Competition Act" isn't re-introduced to Congress, maybe merchants can get relief from the court system.
How much does the credit card company know about you?
A 2002 study of how customers of Canadian Tire were using the company&aposs credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.
Credit card use is increasing, and the aggregation of all that data can be a goldmine of information about people.
A study showed that purchasing habits could predict payment reliability, with certain products indicating a higher likelihood of missed payments.
And analysis of that data drives the efforts by banks to get you to use their higher reward cards.
There is also the world of "Level 3 Data", where merchants transmit line items about your purchases to the credit card processor.
Think of it as: every line on grocery receipt.
Except, as near as I can tell, it Level 3 data doesn't apply to consumer credit cards...only to business-to-business and government-to-business cards.
Still, it is an interesting fact to know and perhaps something to keep an eye on in case it leaks into the consumer credit card space.
Exercising control of your data at Mastercard
When you use your Mastercard, the company receives data about your transaction, like how much you spent, where and on what day. It needs this information to be your credit card – but Mastercard doesn&apost just use your data to complete payments. It monetizes that information by selling it to data brokers, advertisers and other companies. Mastercard&aposs data practices contribute to a larger economy of data harvesting and data sales that can be harmful to consumers.
Mastercard has a program to monetize transaction data by selling it to advertisers.
This PIRG article has details and a link to the opt-out page on Mastercard's website.
Visa used to have a similar program — Visa Advertising Solutions — but it was shut down in 2021.
Credit card collections
One interesting lens for understanding how industries work is looking at their waste streams. Every industry will by nature have both a stock and a flow of byproducts from their core processes. This waste has to be dealt with (or it will, figuratively or literally, clog the pipes of the industry) and frequently has substantial residual value. Most industries develop ecosystems in miniature to collect, sift through, recycle, and dispose of their waste. These are often cobbled together from lower-scale businesses than the industry themselves, involve a lot of dirty work, and are considered low status. Few people grow up wanting to specialize in e.g. sales of used manufacturing equipment. One core waste stream of the finance industry is charged-off consumer debt. Debt collection is a fascinating (and frequently depressing) underbelly of finance. It shines a bit of light on credit card issuance itself, and richly earns the wading-through-a-river-of-effluvia metaphor.
Back to Bits About Money for a view on the opposite side of credit card rewards: credit card debt collection.
Most defaulted debt in the U.S. is from credit cards, and the lifecycle of that debt involves a series of internal and external processes before they are sold to debt buyers, often at a fraction of their original value.
He notes that most debt collectors operate in high-pressure environments, leading to high turnover rates, a lack of professionalism, and widespread illegal practices.
He also discusses how debt collectors rely on automated systems and predictive dialing to maximize efficiency, often leading to a barrage of calls to debtors.
Many consumers are unaware of their legal rights and don't have time to fight against these tactics effectively.
ExTwitter adding digital wallet functionality
Elon Musk&aposs social media platform X on Tuesday announced the launch of a digital wallet and peer-to-peer payments services provided by Visa. X struck a deal with Visa, the largest U.S. credit card network, to be the first partner for what it is calling the X Money Account, CEO Linda Yaccarino announced in a post on the platform. Visa will enable X users to move funds between traditional bank accounts and their digital wallet and make instant peer-to-peer payments, Yaccarino said, like with Zelle or Venmo.
When Elon Musk bought Twitter, he said he wanted the company to turn into an "everything app" — use Twitter to buy things online, call for a taxi, and make peer-to-peer payments.
One of the first steps on that path is getting access to payment systems.
Now, whether you trust Musk with that kind of access to your bank accounts is an entirely separate matter...
This Week I Learned: The origin of the computer term "mainframe" comes from "main frame" — the 1952 name of an IBM computer's central processing section
Based on my research, the earliest computer to use the term "main frame" was the IBM 701 computer (1952), which consisted of boxes called "frames." The 701 system consisted of two power frames, a power distribution frame, an electrostatic storage frame, a drum frame, tape frames, and most importantly a main frame.
"Mainframe" is such a common word in my lexicon that it didn't occur to me that its origins was from "main frame" — as in the primary frame in which everything else connected.
I've heard "frame" used to describe a rack of telecommunications equipment as well, but a quick Kagi search couldn't find the origins of the word "frame" from a telecom perspective.
What did you learn this week? Let me know on Mastodon or Bluesky.
Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov.
This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use.
We’ve built this project on our long-standing commitment to preserving government records and making public information available to everyone. Libraries play an essential role in safeguarding the integrity of digital information. By preserving detailed metadata and establishing digital signatures for authenticity and provenance, we make it easier for researchers and the public to cite and access the information they need over time.
In addition to the data collection, we are releasing open source software and documentation for replicating our work and creating similar repositories. With these tools, we aim not only to preserve knowledge ourselves but also to empower others to save and access the data that matters to them.
For suggestions and collaboration on future releases, please contact us at lil@law.harvard.edu.
This project builds on our work with the Perma.cc web archiving tool used by courts, law journals, and law firms; the Caselaw Access Project, sharing all precedential cases of the United States; and our research on Century Scale Storage. This work is made possible with support from the Filecoin Foundation for the Decentralized Web and the Rockefeller Brothers Fund.
Transform your B2B search software with neural hybrid search technology. Learn how manufacturers are achieving 3x conversion rates with modern search architecture.
December and January are always filled with “best of” content – lists of the music, movies, books, and television that captured our attention and won our admiration over the previous year. Well, it’s February now so we’re not going to do that. Instead, over the next several months members of the Research team are taking a retrospective look back at the OCLC Research oeuvre, highlighting work we think has stood the test of time, and discussing why these outputs were influential at the time of publication, and how, in many cases are, they remain relevant and important.
I’ve been referring to this as our Greatest Hits project, but really it’s more of a revisit of the OCLC Research back catalog. As any musician who has retained their publishing rights can tell you, there’s deep value in the back catalog. Because it is the tried-and-true jams from the back catalog that we turn to when we need them – to hype yourself up, push through the end of a work day, get through a break-up, clean the house, or have a good cry – and thus that have staying power. Certainly, OCLC Research continues to produce new work, and we are excited about it! But we are proud to have work that stands the test of time and remains useful when people need it.
The following post is one in a regular series on issues of Inclusion, Diversity, Equity, and Accessibility, compiled by a team of OCLC contributors.
Black History Month and Black librarians
In the United States, February is Black History Month, a commemoration that has roots that go back to 1926 when Dr. Carter G. Woodson first established “Negro History Week” aligned with the birthdays of Abraham Lincoln and Frederick Douglass. Many libraries and other cultural heritage institutions mark the month with events, a special focus on book and other collections, and more.
The theme for Black History Month in 2025 is “African Americans and Labor,” and in keeping with that theme a WorldCat.org list focusing on the “History of African American Librarians” caught my eye. This list features not only books but articles, audio recordings, archival collections, and images. There is so much to learn and appreciate about the contributions of Black librarians, and this list is just a starting point. Contributed by Merrilee Proffitt.
2025 Day of Remembrance
On 19 February 1942, President Franklin D. Roosevelt signed Executive Order 9066, authorizing the removal of Americans of Japanese ancestry from Washington, Oregon, and California. 120,000 people were forcibly moved to one of ten concentration camps. Each February this event is observed as a Day of Remembrance as a way of reflecting on the experience of incarceration and its multi-generational impacts, as well as the importance of protecting civil liberties for all. The website of the Japanese American Citizen League lists many planned events for sharing and commemoration.
Growing up in California, the remains of remote and desolate concentration camps and former “assembly centers” (mostly racetracks and fairgrounds) were physical reminders of the experiences of those who had been displaced. Stories of those who had been incarcerated were part of my childhood as well, but it is only more recently that these memories have been shared more openly. An upcoming event on 18 February at the US National Archives and Records Administration will help kick off the tour of the Ireichō, a book that lists the over 125,000 persons who were incarcerated. The tour will include events at major incarceration sites and will allow many people to interact and engage with the book as part of a learning and healing experience. Contributed by Merrilee Proffitt.
The facts about book bans
On 26 January, the American Library Association issued a response to the US Department of Education’s assertions that book bans have been a “hoax” in an article entitled “ALA to U.S. Department of Education: Book bans are real.” Citing the data that ALA has compiled, Censorship by the Numbers breaks down some 1,247 censorship demands during 2023, by the target (including books, displays, programs, and films), by the source (such as patrons, parents, pressure groups, and elected officials), and by the type of library or institution.
The book ban conversation is nothing new, and has been covered in many previous issues of IDEAs. But it is not just ALA that is covering this issue and giving resources. On 27 January, the free weekly online newsletter Shelf Awareness commented on the Department of Education’s actions, folding in responses from PEN America and Authors Against Book Bans. Contributed by Jay Weitz.
Happy February, DLF community! As the new year continues on, we have many opportunities to connect with our working groups coming up this month. Read on below to learn when and where to join. And, if you haven’t yet, be sure to subscribe to the DLF Forum Newsletter, as some information will drop there soon about this year’s events.
— Team DLF
This month’s news:
Discuss: We invite folks to join the Climate Circle discussion coming up on Friday, February 7, a discussion component to the fourth session on Indigenous Knowledge and Climate Collaboration in our Climate Resiliency Action Series. Learn more and register here.
Take Note: CLIR will be closed Monday, February 17, in observance of Presidents’ Day.
This month’s open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 2/4, 2pm ET / 11am PT
DLF Digital Accessibility Working Group (DAWG): Wednesday, 2/5, 2pm ET / 11am PT
DLF Assessment Interest Group (AIG) Cultural Assessment Working Group: Monday, 2/10, 1pm ET / 10am PT
Digital Accessibility Working Group — IT Subgroup: Monday, 2/17, 1:15pm ET / 10:15am PT
DLF AIG User Experience Working Group: Friday, 2/21, 11am ET / 8am PT
DLF Committee for Equity & Inclusion: Monday, 2/24, 3pm ET / 12pm PT
DLF AIG Metadata Assessment Working Group: Thursday, 2/27, 1:15pm ET / 10:15am PT
DLF Digital Accessibility Policy & Workflows Subgroup: Friday, 2/28, 1pm ET / 10am PT
Win free books from the February 2025 batch of Early Reviewer titles! We’ve got 196 books this month, and a grand total of 3,388 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Tuesday, February 25th at 6PM EST.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Luxembourg, Ireland, Canada, Belgium, Netherlands, Sweden, Spain, Poland and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
Richard
Powers’ Playground is a lot
of things, but for me it seemed to be very much a meditation on the
near-future (or perhaps present) of Large Language
Models like ChatGPT, and how they fit into our culture and politics.
I won’t give away any spoilers (it is worth a read!), but one moment in
the story has been stuck in my head, due to some things going on at
work, so I thought I’d make a note here to get it out of my head.
Near the end of the novel a group of people living on the island of
Makatea are trying
to decide whether they want to accept a proposal from a consortium of
corporations to develop, and thus greatly transform, their island home.
Only 50 or so people live on the island, and they have decided to vote
on it.
In order to help the Makateans decide how to vote the consortium of
corporations provided the island’s inhabitants with exclusive access to
a 3rd generation Large Language Model called Profunda, which operates
much like ChatGPT does today. Users can engage with it in conversation
using their voice, and inquire about how the proposed development will
impact the island. Profunda has access to confidential materials related
to the consortium and its detailed plans. It was built on top of a
foundation model that was assembled from a massive harvest of content
from the World Wide Web.
In this short segment below some of the characters are discussing how to
vote based on the information they learned by “chatting” with Profunda:
“I don’t know how to vote. I don’t even know who this consortium is!
People always say, ‘Follow the money.’ I’m supposed to vote this up or
down, without even knowing who exactly is paying for this pilot program
or what they stand to gain by this . . . seasteading.”
Pockets of applause followed the comment, suggesting that the priest was
not the only one still at sea.
Manutahi Roa was baffled by the objection. He waved a dossier of
printouts in the air. “You should have asked Profunda. I did!”
“But how can I trust him?” the priest shouted back. “The consortium made
him!”
Neria Tepau, the postmistress, shot to her feet. “Exactly! We should
have been researching for ourselves, these last ten days. We have
phones. We have a cell tower. We can search every web page in the world.
Instead, we’re relying on this construction, this . . . thing to
spoon-feed us!”
“So letting this thing do the work and making a biased summary is
somehow better than me going through the pages myself?”
Hone Amaru laughed. “This thing has read a hundred billion pages. How
many can you read, in ten days?”
“It’s the ten days that is the crime! We’re being railroaded!” The words
cracked in Puoro’s throat. Patrice put his arm around his partner’s
shoulders.
The Queen stood up and the room settled down. “People. Friends. Sisters.
Brothers. We’re letting the Popa’ā make us as crazy as they are!”
This observation was met by near-universal applause. Even the mayor
collected himself and clapped.
“It’s easy,” the Queen went on. “We ask who is paying. It tells us. And
then, as Madame Martin would say, we check its work” She looked to the
schoolteacher, who held both her thumbs high in the air. The assembly
broke into a new round of applause.
When the cheers settled, the mayor said, “Profunda. Please give us short
biographies for the five biggest investors in this seasteading
consortium.”
This scene was striking to me, because The Queen’s statement seemed
kinda obvious, at least on the surface. How do we know if we can trust
generative AI? We check its work. Checking the work in this
case seems doable, I guess. They were asking who were the biggest
investors in the consortium. The assumption being that it was easy for
them to find sources to check for verification.
At some level, triangulating fact claims, and reproducibility are how
knowledge is built. But it requires work. It takes time. It can
sometimes require specialized expertise. As Generative AI tools get used
to accelerate “knowledge” generation the need to verify accelerates as
well. This is why we need to be thoughtful and slow down when
integrating these tools into our existing knowledge systems…if they are
integrated at all. It is a choice after all. Just because a generative
AI tool is offering you citations to back up its assertions does not
mean the citations refer to actual documents, or if the documents exist,
that the they back up the claims that are being made (Liu, Zhang, & Liang, 2023).
Is it realistic for us to be checking the work of these systems?
Wouldn’t it be better if it was the other way around?
I’m reminded of the nudges my Subaru Outback will give me if I begin to
stray out of my lane. These nudges gently move the car back into the
lane, and prompt me to make additional corrections. They get me to pay
better attention. But the automated system doesn’t take full control of
the car. Subaru’s lane
keeping system is no doubt a machine learning model that has been
trained on many hour of driving video. But the complete system is
oriented around helping (not replacing) drivers, and reducing car
accidents, rather than trying to create a fully automated self driving
car. I’m always struck by how gentle, collaborative and helpful these
nudges are, and I can’t help but wonder what an equivalent would look
like in information seeking behavior, where rather than generating
complete answers to questions, or posing as an intelligent conversation
partner, the system collaborates with us.
At work we’ve been looking to integrate OpenAI’s Whisper into our
institutional repository, so that we can generate transcripts for video
and audio that lack them. There were some routine technical difficulties
involving integrating
on-premises software with on-demand services in the cloud, where we
could have access to compute resources with GPUs. But from my
perspective it seemed that the primary problems we ran into were policy
questions around what types of review and correction needed to happen to
the generated transcripts, and how to make these checks part of our
workflow. This is all still a work in progress, but one small experiment
I got to try was helping to visualize the confidence level that
Whisper reports for words in its transcript:
Viewing a Whisper generated transcript with confidence levels
whisper-transcript
is a tiny piece of software, a Web Component you can drop into any web
page using Whisper’s JSON output (a demo is running here). It’s
clearly not a complete system for correcting the transcript, but simply
a way of listening to the media, while seeing the transcription, and the
models confidence about its transcription. I’m mentioning it here
because it felt like a clumsy attempt at providing these kinds of nudges
to someone reviewing the transcript.
A novel, a car and a transcript make for a kind of unruly trio. This
post was just me expressing my hope that we see a move towards
specialized computer assisted interfaces that don’t create more work for
us, and that the promises of automated systems that replace people get
left behind in the dust (again).
PS. Powers’ book, like the
others I’ve read, is beautifully written. The life stories of his
characters really stick with you, and the descriptions of the ocean and
the natural world will transport you in the best way possible.
It has just been announced that at the Spring 2025 Membership Meeting of the Coalition for Networked Information in Milwaukee, WI April 7th and 8th, Vicky and I are to receive the Paul Evan Peters Award. The press release announcing the award is here.
Vicky and I are honored and astonished by this award. Honored because it is the premiere award in the field, and astonished because we left the field more than seven years ago to take up our new full-time career as grandparents. We are all the more astonished because we are not even eligible for the award; the rules clearly state that the "award will be granted to an individual".
You can tell this is an extraordinary honor from the list of previous awardees, and the fact that it is the first time it has been awarded in successive years. Vicky and I are extremely grateful to the Association of Research Libraries, CNI and EDUCAUSE, who sponsor the award.
Original Logo
Part of the award is the opportunity to make an extended presentation to open the meeting. The text of our talk, entitled Lessons From LOCKSS, with links to the sources and information that appeared on slides but was not spoken, should appear here on April 7th.
The work that the award recognizes was not ours alone, but the result of a decades-long effort by the entire LOCKSS team. It was made possible by support from the LOCKSS community and many others, including Michael Lesk then at NSF, Donald Waters then at the Mellon Foundation, the late Karen Hunter at Elsevier, Stanford's Michael Keller and CNI's Cliff Lynch.
See real examples of how Lucidworks Client Success Offerings reduce engineering time by 30% and prevent costly enterprise search mistakes. Discover 5 key strategies for search optimization.
In this week's Thursday Threads, I'll point to articles on the contentious subject of facial recognition technology.
This tech, currently used by law enforcement and various businesses around the world, raises critical ethical and privacy questions.
Beyond the instances where facial recognition use has resulted in wrongful apprehensions by law enforcement or fails to recognize a student taking an exam, we have examples of individuals taking the technology to the dystopian extreme: doxing smart glasses and invading the privacy of social media users.
Even police officers are reluctant to submit to facial recognition, and in a surprising turn of events, places like China have started implementing restrictions on companies.
It is possible that facial recognition might be useful in some circumstances someday.
We're a long way from that day, though.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Catalog of police misuse
Police have shown, time and time again, that they cannot be trusted with face recognition technology (FRT). It is too dangerous, invasive, and in the hands of law enforcement, a perpetual liability. EFF has long argued that face recognition, whether it is fully accurate or not, is too dangerous for police use, and such use ought to be banned. Now, The Washington Post has proved one more reason for this ban: police claim to use FRT just as an investigatory lead, but in practice officers routinely ignore protocol and immediately arrest the most likely match spit out by the computer without first doing their own investigation.
I have saved a bunch of articles about law enforcement misuse of facial recognition technology, but rather than including them individually, I'm pointing to this article from the Electronic Frontier Foundation that catalogs the problems and points to individual cases.
The EFF analysis emphasizes that the technology poses significant risks to civil liberties and can lead to wrongful arrests.
Despite claims from law enforcement that it is used merely as an investigatory tool, evidence shows that police often bypass protocols, leading to immediate arrests based solely on computer matches.
It notes a troubling pattern where many individuals wrongfully arrested based on FRT are Black, underscoring the technology's lower accuracy for individuals with darker complexions.
Layering facial recognition atop DNA analysis
Parabon NanoLabs ran the suspect’s DNA through its proprietary machine learning model. Soon, it provided the police department with something the detectives had never seen before: the face of a potential suspect, generated using only crime scene evidence.... The face of the murderer, the company predicted, was male. He had fair skin, brown eyes and hair, no freckles, and bushy eyebrows. A forensic artist employed by the company photoshopped a nondescript, close-cropped haircut onto the man and gave him a mustache—an artistic addition informed by a witness description and not the DNA sample. In a controversial 2017 decision, the department published the predicted face in an attempt to solicit tips from the public. Then, in 2020, one of the detectives did something civil liberties experts say is even more problematic—and a violation of Parabon NanoLabs’ terms of service: He asked to have the rendering run through facial recognition software.
This is perhaps the most egregious example of misuse: extrapolating an image of a suspect based on DNA analysis, then running that image through facial recognition technology in search of leads.
When the face can't be found
To prevent students from cheating, the university had bought software from the tech firm Proctorio, which uses face detection to verify the identity of the person taking the exam. But when Pocornie, who is Black, tried to scan her face, the software kept saying it couldn’t recognize her: stating “no face found.” That’s where the Ikea lamp came in. For that first exam in September 2020, and the nine others that followed, the only way Pocornie could get Proctorio’s software to recognize her was if she shone the lamp uncomfortably close to her face—flooding her features with white light during the middle of the day
Here is one of the biggest problems of this unregulated technology: biases in the data used to train the algorithm call into question any results you get from it.
The article discusses a student challenging biased exam software that may unfairly affect test outcomes.
Just because a machine that can count and compare numbers really, really fast says something is true doesn't make it true.
Police officers don't want to be subject to facial recognition
A Las Vegas police union has raised concerns about a new NFL policy that would require officers who work security at Raiders games to share their photo for facial recognition purposes and is urging officers to think twice before complying. Traditionally, officers who worked overtime hours as security for Raiders games would receive a wristband that got them access to different parts of the field and stadium, explained Steve Grammas, president of the Las Vegas Police Protective Association. But now, the NFL is asking that officers each provide a photo, which will be used for “identification purposes when an individual steps up to a scanner to verify who the person is and if they have access to that particular space,” explained Tim Schlittner, director of communications for the NFL, in an email.
Speaking of unregulated, police officers themselves don't want their biometrics cataloged in a company's database with no oversight.
This also points to the problem of using biometrics as an authentication tool: the shape of your face isn't something you can easily change.
Suppose your facial markers leak from one of these companies.
What stops someone from 3-D printing a facsimile of those markers to fool this technology?
China tells companies to stop using facial technology
Authorities in several major Chinese cities have ordered hotels to stop using facial recognition technology to verify the identity of guests in a sign the government is responding to public concerns over privacy, financial news site Caixin reported. Guests staying at hotels in Beijing, Shanghai, Shenzhen, and Hangzhou will now only be required to present identification in order to check in, according to state-run tabloid The Global Times.
The government in China is well known for using facial recognition in public places for surveillance, so I think it is notable when the government responds to public pressure to stop companies from using the technology.
Facial recognition in smart glasses
The technology, which marries Meta’s smart Ray Ban glasses with the facial recognition service Pimeyes and some other tools, lets someone automatically go from face, to name, to phone number, and home address.
What happens when you pair off-the-shelf facial recognition with off-the-shelf smart glasses?
Something very creepy.
As a society, we're not nearly ready to dramatically change the social contract that this technology is demonstrating.
Scanning the faces in social media videos
A viral TikTok account is doxing ordinary and otherwise anonymous people on the internet using off-the-shelf facial recognition technology, creating content and growing a following by taking advantage of a fundamental new truth: privacy is now essentially dead in public spaces. The 90,000 follower-strong account typically picks targets who appeared in other viral videos, or people suggested to the account in the comments. Many of the account’s videos show the process: screenshotting the video of the target, cropping images of the face, running those photos through facial recognition software, and then revealing the person’s full name, social media profile, and sometimes employer to millions of people who have liked the videos.... 404 Media is not naming the account because TikTok has decided to not remove it from the platform. TikTok told me the account does not violate its policies; one social media policy expert I spoke to said TikTok should reevaluate that position.
The "Taylor Swift Fan" part is quite click-baity.
The article's author noted in the second paragraph that this anonymous TikTok user liked to focus on fan videos, but the content of the article stands on its own.
Again: it is an off-the-shelf service that dramatically affects the social contract between humans.
Sending a message at airport security gates
A bipartisan group of 12 senators has urged the Transportation Security Administration’s inspector general to investigate the agency’s use of facial recognition, saying it poses a significant threat to privacy and civil liberties.... While the TSA’s facial recognition program is currently optional and only in a few dozen airports, the agency announced in June that it plans to expand the technology to more than 430 airports. And the senators’ letter quotes a talk given by TSA Administrator David Pekoske in 2023 in which he said “we will get to the point where we require biometrics across the board.” ... The latest letter urges the TSA’s inspector general to evaluate the agency’s facial recognition program to determine whether it’s resulted in a meaningful reduction in passenger delays, assess whether it’s prevented anyone on no-fly lists from boarding a plane, and identify how frequently it results in identity verification errors.
Because of the problems with unregulated, unaudited facial recognition technology, I opt out of its use whenever possible.
With study, evaluation, auditing, and quite possibly some regulation, this might become a useful technology for some use cases.
Until that happens, my face will vote my consciousness: do not use it.
This Week I Learned: A biographer embedded with the Manhattan Project influenced what we think about the atomic bomb
In early 1945, a fellow named Henry DeWolf Smyth was called into an office in Washington and asked if he would write this book that was about a new kind of weapon that the US was developing. The guy who had called him into his office, Vannevar Bush, knew that by the end of the year, the US was going to drop an atomic bomb that had the potential to end the war, but also that as soon as it was dropped, everybody was going to want to know what is this weapon, how was it made, and so forth. Smyth accepted the assignment. It was published by Princeton University Press about a week after the bomb was dropped. It explained how the US made the bomb, but it told a very specific kind of story, the Oppenheimer story that you see in the movies, where a group of shaggy-haired physicists figured out how to split the atom and fission, and all of this stuff. The thing is, the physics of building an atomic bomb is, in some respects, the least important part. More important, if you actually want to make the thing explode, is the chemistry, the metallurgy, the engineering that were left out of the story.
The quote above comes from the transcript of this podcast episode.
I've thought about this a lot in the past week as the Trump administration's flood-the-zone strategy overwhelms the senses.
In a valiant effort to cover everything that is news, I can't help but wonder about the lost perspective of what isn't being covered.
And I wonder where I can look to find that perspective.
Alan's chair
Alan thinks he owns this chair...so much so that he is going to stretch out as big as he can to cover it.
In reality, it is my chair.
And, yes, right after taking this picture I insisted that he let me sit down.
He got to take a nap in my lap, though.
In recent months the Harvard Law School Library Innovation Lab has created a data vault to download, sign as authentic, and make available copies of public government data that is most valuable to researchers, scholars, civil society and the public at large across every field. To begin, we have collected major portions of the datasets tracked by data.gov, federal Github repositories, and PubMed.
The Harvard Law School Library has collected government records and made them available to patrons for centuries, and this continues that work.
We know from our web archiving project, Perma.cc, which preserves millions of links used by courts and law journals, that government documents often change or go away. And we know from our Caselaw Access Project, which produced free and open copies of nearly all US case law from the inception of each state and Federal court, that collecting government documents in new forms can open up new kinds of research and exploration.
This effort, focusing on datasets rather than web archives, collects and will make available hundreds of thousands of government datasets that researchers depend on. This work joins the efforts of many other organizations who preserve public knowledge.
As a first step, we have collected the metadata and primary contents for over 300,000 datasets available on data.gov. As often happens with distributed collections of data, we have observed that linkrot is a pervasive problem. Many of the datasets listed in November 2024 contained URLs that do not work. Many more have come and gone since; there were 301,000 datasets on November 19, 307,000 datasets on January 19, and 305,000 datasets today. This can naturally arise as websites and data stores are reorganized.
In coming weeks we will share full data and metadata for our collection so far. We look forward to seeing how our archive will be used by scholarly researchers and the public.
To notify us of data you believe should be part of this collection please contact us at lil@law.harvard.edu.
In March 2020, in the early days of the pandemic, members of the SHARES resource sharing consortium started gathering weekly for informal virtual town halls. This week, nearly five years later, we convened our 238th SHARES town hall, with 30 attendees and no preset agenda. The sense of community that has developed around these sessions has been described by participants as welcoming, innovative, and fun — an ideal environment where staff from the most outward-facing department in the library can bond with peers, collaborate, teach, learn, and flourish.
Previously, I’ve shared some significant, tangible outcomes from the town halls, such as the creation of the International ILL Toolkit by SHARES volunteers and a crowd-sourced set of preferred practices around processing interlibrary loan returns and overdues. This time let’s explore the value of coming together within a trusted community to address a long-standing challenge that stubbornly resists definitive solutions.
Sharing physical library materials across borders
Interlending library books and other physical formats internationally has always been fraught, and the obstacles to sharing across borders have stayed pretty much the same since I started working in ILL in 1983:
Ever-rising shipping costs
Ineffective shipment tracking
Customs complications
Increased risk to the material
Difficulty in identifying willing lenders
Language issues, complicating communications
Lack of effective, universally available payment options
Lengthy, unpredictable request lifecycle
Negative impact on the environment
Over the years, every step forward in international sharing—such as the development of the International ILL Toolkit, which compiles vetted contact and policy information on international suppliers along with request templates in over a dozen languages—has been tempered with a significant step back, like the new customs rules for all European Union countries which are erratically enforced. The challenges remain as persistent as ever.
The community came together to pool uncertainties, share strategies, and identify preferred practices
SHARES’s governing body, the SHARES Executive Group (SEG), noticed that international ILL issues had been coming up constantly in town halls throughout 2024 and launched a suite of interrelated activities:
Facilitated two special town halls in November 2024 devoted to discussing borrowing and lending physical items across borders, inviting SHARES members to come ready to share their challenges, successes, and questions
Prepared a statistical analysis comparing SHARES international ILL activity in 2023 and 2024
Gathered and shared information on current international shipping practices, issues, and aspirations from SHARES participants
Drafted a new international ILL section that will soon be added to the SHARES Procedures Web page, documenting preferred practices and mitigation strategies
Top insights gleaned from these activities include:
Over 60% of SHARES libraries loaned a returnable item overseas in 2024, same as in 2023 (Note: 25 years ago, only 10% of SHARES libraries loaned returnable items overseas)
Overseas shipping expenses when using carriers such as FedEx, UPS, and DHL have skyrocketed, with reports of $70 charges for one book not uncommon, but these carriers usually do well getting things through customs
US Postal Service and Royal Mail are cheaper but offer poor tracking capabilities overseas
Customs processing is the biggest wildcard, especially in European Union countries; you can do everything correctly and still have your package get stuck in customs, incurring delays and extra fees and in some cases resulting in items being returned to the sender undelivered
ILL practitioners all over the world recognize the value in sharing research materials across borders and tend to be extremely patient and helpful in sorting out difficulties
Biggest tip of all: mitigate
Given the extra expense and risk of sharing physical items across borders, town hall participants strongly agreed: libraries should implement mitigation strategies to ensure that such items aren’t requested from international suppliers unless it’s the only way to fulfill the patron’s information need.
Borrowers should:
Exhaust all domestic ILL sources before requesting physical items internationally
Ask patrons if having the entire work in hand is critical, or if a scan of the index or table of contents might be a workable first step in identifying what portions of the work should be copied to fulfill their request
If the entire work is needed, make sure the patron is willing to wait for the needed item to arrive from another country before you request it
Consider purchasing an in-print title for your collection and user as this may be cheaper than paying the lending fee and shipping costs for an international loan (and allow you to be a new domestic lender for the title!)
Lenders should:
Provide a digital surrogate when licensing and copyright permit
Offer to scan tables of contents or indexes of works you are unable or unwilling to ship
Uncertainty loves company
The SHARES community certainly didn’t solve all the vexing issues around sharing physical items across borders. But we shared plenty of tips, tricks, and data, reached consensus on preferred practices, affirmed the immense value of connecting library patrons with the global research materials they need, and supported each other in our shared calling of making ILL magic happen.
Last December, the Open Knowledge Foundation (OKFN) released the Open Data Editor (ODE), an open source desktop application for data practitioners to explore and find errors in tables.
What makes this software unique is its target audience: people who don’t have a technical background and don’t know code, formulas or programming languages. Last year, during the final testing phase, we worked with journalists and human rights activists to understand how ODE could be integrated into their workflows. And now we want to work together with more non-technical profiles, such as social workers, government officials, etc.
Today the Open Knowledge Foundation islaunching a new pilot programme: we are inviting organisations from any geography and any area of expertise toexpress interest in participating in a four-month funded pilot. The task is to learn how to integrate Open Data Editor into your work and improve the tool’s components while doing so.
Examples of how ODE can help your daily work:
If you have huge spreadsheets with data obtained through forms with the communities you serve, ODE helps you detect errors in this data to understand what needs to be fixed.
If you manage databases related to child malnutrition(just an example, it could be any social issue), ODE can quickly check if there are empty rows or missing cells in the data collected by different social workers and help you better allocate family assistance.
If you monitor and review government spending or public budgets for a particular department, ODE helps you find errors in the table and make the data ready for analysis.
Check out the details of the call and see how you can register your interest below:
Call for Pilot Organisations
Target
The Open Knowledge Foundation is looking for five high-impact organisations to integrate the Open Data Editor into their workflow. The selected organisations will be part of a first cohort (with a second one scheduled later in the year) and distributed across disciplines and geographies. Our goal is to improve the tool to fit their needs and show a collection of applied uses of ODE.
Eligibility
The selected organisations can:
Operate in any geography.
Specialise in any area of knowledge (health, education, socio-economic indicators, transport, research institutions, etc.).
Have any legal status (non-governmental, civil society, independent collective, public institution, or part of a government).
Be small or medium-sized, whose work has the agility of a human scale.
The selected organisations must:
Have concrete purpose and outcomes for integrating the Open Data Editor into their work.
Have a non-technical team or group of collaborators who work with data without coding-skills.
Be able to communicate in English (at least one team member serving as a point of contact).
Commitments
The selected organisations will commit to:
Integrate Open Data Editor into their workflow under OKFN guidance.
Conduct a thorough assessment of the Open Data Editor.
Participate in bimonthly pilot cohort calls.
Tell the story of what for and how they deployed the tool.
Engage in events run by the Open Knowledge Foundation.
(More detailed commitments will form part of a Memorandum of Understanding to be signed as part of the collaboration.)
Compensation & Benefits
The selected organisations will:
Receiveup to USD 5,000 to support their four-month collaboration with the Open Knowledge Foundation.
Gain data literacy training and tailored support from OKFN, hands-on experience with ODE, and exposure to the Open Knowledge community.
Help shape the future of a tool designed to make data workflows more accessible and efficient.
Contribute to the development of your sector and beyond.
Timeline
This pilot programme will start on 3 March, 2025 with an inception meeting in mid-February (date to be defined). The programme is expected to be completed by June 2025.
Expressing your interest
Organisations have until 10 February, 2025 to express their interest. The selected ones will be contacted on a rolling basis until 17 February, 2025.
To express your interest, please fill in this online form. It’s very quick: just a few simple questions to get to know you and understand why the Open Data Editor can make a difference for your work.
If you have any questions or want any additional information, you can contact us at info@okfn.org.
Join us for this exciting opportunity to co-create and refine the Open Data Editor, while streamlining your internal work and increasing your social impact!
This isn’t the essay I was planning to publish next. I’ve been working on an essay about the lack of solidarity around COVID protection, but this is very much related. While I’ve been very sick since 2022, when I had my first COVID infection and then Long COVID, I came back from sabbatical in April 2024 with a diagnosis of an autoimmune condition and on medication that is designed to suppress my immune response. Since then, I have struggled to figure out how to erect boundaries that protect my health and keep me from having a flare, which means ensuring both that I don’t exhaust myself (mentally or physically) and that I don’t put myself in situations where there’s a good likelihood of my getting sick.
Something I’ve been struggling a lot with is when to say “no” when there is an in-person event at work — whether it’s an student outreach program, a meeting, a class, or something else. Sometimes these are fully optional things, sometimes they are things where volunteers are requested, and sometimes they are things where I’m straight-up expected to go. In most cases, my holding to a boundary and saying no means more work for someone else. More and more of our meetings now that worked fine online are going back to face-to-face. Sometimes I get the awkward experience of being one of a few people on Zoom in a mostly F2F meeting, but even those opportunities are dwindling. Once or twice I took a sick day rather than be forced to either take a risk that was unacceptable to me. There are a lot of things I don’t volunteer for anymore and I feel like that’s being noticed. And it’s not exactly fair to my colleagues when other campuses have all of their librarians participating in face-to-face outreach activities and mine has two COVID-cautious people. It makes me feel like I’m not being a “team player.”
The irony is that, pre-COVID, I was the librarian on my campus most likely to volunteer to do outreach events. I did lots of tabling, presented at a bunch of student and faculty orientations, and even organized library tabling at events like our Theatre program’s performances. I love love love interacting with our community. During the height of COVID, the one thing that gave me life was meeting with students on Zoom for research help. The fact that I’m putting my health first doesn’t mean I don’t care as much about students. It doesn’t mean I’m not dedicated to my job or a “team player.” If I’m dead or further disabled, I can’t help anyone. I still teach in-person classes (masked and with the door always open) and I have a weekly shift at the reference desk, but I also make online tutorials, am embedded in online classes, do virtual reference, and teach Zoom sessions for online synchronous classes. What I find strange is that things like making tutorials and supporting online classes seem to be considered optional while much of the face-to-face stuff is treated like it’s required. In a perfect world, we’d all contribute in our own ways based on our strengths, limitations, and passions, but it feels like that extends to some library work (the stuff I do most) and not other library work (the stuff that puts me at greatest risk).
I spent the first 15 years of my career with extremely porous boundaries if I had any at all. I’d been so programmed by my childhood to believe that I was a terrible, unloveable human being and that I had to make up for it by working hard and pleasing everyone. I think a lot of people-pleasers have a critic in their heads who tells them that they are a horrible person if they disappoint anyone. If a random stranger online doesn’t like you? If you have to set a boundary that supports your well-being but means saying no to a colleague? Well, that just proves what a piece of shit you are. Because I thought so badly of myself, I was plagued by the idea that I was never doing enough. So I always felt pressure to do all the things. I raised my hand and said yes to everything. I basically made librarianship my job, my hobby, my life. Work bled completely into my home life. And it wasn’t until around 2019, when I had a major health issue that had me thinking about mortality, that I started questioning how I was living my life. It wasn’t until really considered “what if you’re enough right now, just as you are?” that I was able to jump off that treadmill of striving and start unlearning those unhealthy habits and assumptions about myself.
In spite of the fact that I have done so much work on myself, my orientation towards work, and my boundaries in recent years, I feel a lot of anguish over these decisions. I think I still struggle with internalized ableism. I feel a lot of guilt that I can’t do all the same things my colleagues can, and that, when it’s an invisible disability, it’s often seen by others as a choice rather than a necessity. Because I could choose to put my health at risk (and have, in some cases out of a sense of duty). The risk is not always so clear cut and I have to be the one each time making the decision about whether something is too risky to do. And often, you’re making these decisions without all the information, like when this December, the International Student Resource Fair moved to a smaller space with many more people in attendance, which increased the risk to me exponentially. And I only found out about all that when I showed up that day and what could I do at that point? I ended up teaching an in-person class while I was on a month-long course of steroids (and so was even more immunocompromised than usual) because no one else was volunteering and I technically was available at the time (though so were others) and felt guilty. I have to make these risk-calculations every day, as do tons of other people in similar situations. There are no clear-cut guidelines for this, no metrics to easily help us navigate these decisions. I never know for sure if I’m being overly cautious or the polar opposite and someone else in the same situation may make completely different decisions. And tell me again how I’m supposed to be not exhausting myself mentally? This is another dimension of crip time that is far less liberatory than others.
Since COVID and flu cases were pretty low in my area in early December, I actually RSVP’ed yes to an end of term party for my division of the college. I’d planned to not eat and stay masked the whole time. But when I saw the huge number of people who RSVP’ed yes and I had no idea what room the celebration was being held in nor what safety precautions were being taken (from what I learned later, the answer was “none”), I decided not to go, which was a bummer because I miss socializing with my colleagues. It’s exhausting to constantly have to do this calculus for everything, to miss out on things, or to go to something that is a significant risk and feel anxiety about it. You can’t win. And then you have to deal with the perceptions of your colleagues who maybe think you’re being overly anxious or who don’t know about your condition at all and maybe think you’ve just become antisocial. But they don’t understand that the one time you actually went to a larger event in the past five years, totally masked the whole time and 6 weeks after getting vaccinated, you ended up getting COVID, which completely flattened you for almost a month. And after that, you had a horrible flare of your autoimmune condition which left you barely able to walk (and with vertigo, fatigue, neuropathy, and a host of other symptoms). And at that point, you’re then left with the choice of living in that flare for weeks or taking a course of steroids which will make you even more immunocompromised (among other side-effects) for a month or more. Your small choice of whether to go to a celebration or a student outreach event or a meeting could affect your ability to work and function for months, leaving your colleagues to pick up the pieces. They don’t have to think about these things (or they just don’t think about these things) and I wish I could do the same. I so miss just being able to go out and do stuff without a second thought! But given that I’m currently on day 6 of a flare that started because I got a tiny cold from my son that only lasted about two days, I know the consequences of that would be immense and long-lasting.
Me trying to figure out whether or not to go to an in-person meeting
I try to remind myself that these are accessibility issues and that no one is trying to make these in-person events or meetings more accessible to people like me and to folks who are still trying to not be disabled or killed by COVID. Not all of my colleagues know about my condition, but even of those who do, no one has made an effort to make in-person events more accessible (beyond offering a remote option sometimes, which I do appreciate as much as it makes me feel awkward). People don’t put on masks around me. The spaces in which these events are held don’t have open windows or air filters. We have a big all-day in-person meeting coming up in a month and I’m already feeling anxious about it. My dean has written that she expects in-person attendance, but also made it clear to me that she didn’t mean it for people with a medical issue. Still, the thought that I’d be the only person participating remotely is filling me with a sense of dread that I can’t even describe. But the idea of being in a closed room all day with my unmasked colleagues (save one who still masks) during the height of so many winter illnesses fills me with just as much dread and is objectively more risky. More calculating – shame vs. health? The title of Geena Davis’ memoir Dying of Politeness comes to mind in this situation. Does a wheelchair user feel responsible for not attending an event only accessible by staircase? I hope not. Yet I feel all too responsible for my situation.
People who don’t have disabilities don’t know that the disability itself is only one piece of what disables us. Contending with the ableist world around us is often just as much if not more of a cause of pain, depletion, and harm. It’s in those moments that I’m most keenly aware of my disability. I spend a lot of time where I don’t think about my disabilities at all and just live my life like anyone else (just perhaps with a bit more pain), but it’s when I’m at work and in other spaces where that disability becomes an issue that I feel hyper-aware – sometimes feeling invisible, or hypervisible, or somehow both at the same time. It’s the constant calculating that we have to do about the spaces we are going to enter that I find most exhausting. I recently started Margaret Price’s Crip Spacetime (available open access, yay!) and I felt seen right from the jump. Here’s a small excerpt from the beginning:
We know what the room we’re going to looks like, and we know how to ask— with charm and deference—if we need the furniture rearranged, the fluorescent lights turned off, the microphone turned on. We know how much pain it will cost to remain sitting upright for the allotted time. We know how to keep track of the growing pain, or fatigue, or need to urinate (there’s no accessible bathroom), and plan our exit with something resembling dignity. We know that no one else will ever know. What you’ve just read is a litany—or maybe a rant. I use it for two reasons: first, to remind those who haven’t performed that series of calculations that they are an everyday experience for some of us; and second, to call to those for whom the litany, with little adjustment, is painfully familiar. (Price, 2024, p. 1)
I love when she writes that “crip spacetime is un/shared” (p. 29). I might be existing in the same physical space as my colleagues, but we are experiencing it differently. They may sit in the same chair at the reference desk as I do, but they may not experience it as such a malevolent presence in their lives. At home, I have a lot of control over my environment, but that’s less the case when I’m at work. The cold aggravates my Raynaud’s and makes my fingers hurt (even when wearing gloves). Sitting at the reference desk in a chair too big for my body when even under the best circumstances, I can only manage about 40 minutes of sitting in any chair before I end up in a significant amount of pain. But put me in a hard or ill-fitting chair and it’s so much worse. Right now, I work at the reference desk for no more than two hours straight, but come next term, I’ll be doing four hours, which I’m worried about. I’ve thought about seeking an accommodation so I could stand most of the time, but I can’t picture what that would look like and the cost would probably be huge given the uncompromising setup of our giant reference desk (also, going through the invasive and dehumanizing accommodation process again fills me with more dread than I can describe). Again, more calculus. I work on campus on Mondays and when I come home, I’m in pain and am so exhausted that I usually don’t feel myself again until Wednesday. I’m pretty sure my colleagues don’t need two days to recover from their day on-campus. And I don’t want to complain because I feel very lucky that I can work from home the rest of the week. It’s a gift and one that I know could go away at any moment. And it’s probably the only thing that is allowing me to keep doing my job. I don’t think my body could take it if that changes. So I also don’t want to push things because I’m afraid that rocking the boat could just make things worse.
I’m still pretty early in my journey with this particular condition (and with recognizing I have disabilities at all though I’ve had migraines for 24 years) and I guess I’m going to have to learn to have thicker skin and stronger boundaries if I want to stay well. I really love the work I do, I love students, and I don’t want to disappoint anyone, but I need to keep reminding myself that there are many ways to contribute and that a huge percentage (41%) of our students don’t come to campus and also deserve outreach and support. I can still do valuable work and be a valuable part of our team and still take care of myself. I just need to stop caring how others see me when I refuse to do certain things or when I’m the only person (gulp) online in an in-person meeting, which is hard for a life-long people-pleaser. While I’d love it if people tried to make these spaces more accessible, all I can control in this situation is my own choices and I need to stop worrying about whether people think I’m not committed or am not a team player (and I’m not suggesting everyone thinks that; I have no idea). My work over the previous ten years should speak for itself. And if people really wanted me around at any of these events, they could show it by masking to keep me (and others, and also themselves) safe.
We’re excited to announce a major update to LibraryThing’s work pages—the pages you use to look at a work, edit your book, read reviews, get recommendations, etc. These pages are now easier to use and more informative.
Nobody likes change, so our goal was to improve work pages while keeping them familiar. We hope members aren’t too shocked, but come to love the new pages as much as we do.
The new work page was spearheaded by Chris (conceptdawg) and Lucy (knerd.knitter). They did a ton of great work to get us here! The missing element, however, is your reaction and suggestions for improvement, so come tell us what you think and talk about the changes on New Features: New Work Page!.
Major Improvements
“LT2” — Work pages join most other LibraryThing pages in being consistently formatted, fully “mobilized,” and accessible.
Your Books — The “Your Books” part of work pages is much improved, with better editing and the ability to choose which fields you want to see.
Quick Facts — We created a “Quick Facts” section on the right, with some of the key details, including publication year, genres and classifications. It works something like the info boxes on Wikipedia pages.
Side Bar — Besides “Quick Facts,” we’ve improved the right side panel with a popularity graph, a links section, author info and an improved share button.
Reviews — Reviews are now displayed and sorted better, with reviews from your friends and connections first. After that, we’re sorting reviews by a quality metric, incorporating thumbs-up votes, recentness and member engagement. Ratings have also been added to the reviews section, in a section after full reviews. Altogether, we think reviews will prove more useful and interesting.
Sections — All work-page sections can be collapsed and reordered by members, and a special “On This Page” area lays out what’s on the page, with links to jump there.
Classification — We fronted something LibraryThing is best at—library data—by giving classifications a prominent place in “Quick Facts.” Work pages now also include a “Classification” page with detailed information and charts about the work’s tags and genres as well as positions within the library classifications DDC/MDS, LCC, and—a new one—BISAC, the classification system used by publishers and booksellers.
Member Info — Hovering over a member’s name now pops up a quick summary and preview of their profile page, much as hovering over a work pops up a summary and preview of the work page. We’re testing this out here, but will expand it across the site.
Helper Hub — The works page now has a Helper Hub, listing everyone who’s contributed to the work, and a separate Helper Hub page, listing contributions by type.
Member Descriptions — A new type of member description field has been added on the “Community” page which includes the current haikus, but also has added options for adding five word descriptions, emoji descriptions, and “bad” descriptions. As enough of these are added, they will be included in TriviaThing!
Speed — Work pages now load faster.
Smaller Improvements
The “Your Books” section on a work page is blue. If you are looking at someone else’s book, however, the box turns yellow—making it more obvious what’s going on.
The “Edit Book” button is now at the bottom of the blue “Your Books” section, rather than the lefthand panel. On the “Book Details” page, you can also switch from “View” to “Edit” to edit your book.
The “Book Edit” page has a number of clever changes, such as an intuitive way to indicate the character a book’s title should sort by.
The collections menu is now easy and quick, so you can select or deselect as many collections as you want before closing the popup.
The “Reviews” section now has a “Rating” selector, and a revamped “Language” menu.
When you have multiple editions of a book, you now get small cards under the main card, so you can switch between your copies easily.
The “Quick Links” section has been streamlined and simplified.
The work sections have been reordered somewhat. If you don’t like the current order, you can reorder the sections, and the changes will “stick” for you.
A “Statistics” section at the bottom of the page lists key facts, including some new ones, covering the media (paper, ebook, audiobook) and languages the book has been published in. We also count up the ISBNs, UPCs and ASINs of all the editions.
The ratings graph on the right now defaults to showing only full stars—with half-stars rounded up. You can click the graph to see half-stars.
Empty sections are now hidden by default. There’s a button at the bottom of the work page to unhide them.
As with some other, new pages, Common Knowledge now defaults to a “View” mode. Click “Edit” to see the more detailed editing interface. The button here “sticks” so if you want to keep it in “Edit,” that’s fine.
The addition of publisher BISAC standards was mentioned above. The addition also includes a full set of BISAC pages, separate from the work pages, like CRA > CRAFTS & HOBBIES > Candle Making.
The “Editions” page now allows searching and sorting.
The “Share” button includes Threads and BlueSky.
Incomplete Features and Questions
The “Covers” page has a few improvements, including a better pop-up for each cover, and color coding of cover quality, but a larger revamp is still to come.
We’re still working on the “Collections” edit, which currently lacks a button to create new collections.
We’ve pulled back on LCSH (Library of Congress Subject Headings). A new much-expanded subject system—way beyond LCSH—is coming.
We’re eager to get feedback on the “Member Info” sections. If you don’t like them at all, you can turn them off, together with our work popups under Disable work and member info boxes.
That’s it! Thank you for reading. We’re eager to know what you think on Talk!
The problem is that, like many of us, our site is being overwhelmed with poorly behaved bots. Lately one of the biggest problems is with bots clicking on every possible combination of facet limits in our “faceted search” — this is not useful for them, and it overwhelms our site. “Search” pages are one of our most resource-constrained category of page in our present site, adding to the injury. Peers say even if we scaled up (auto or not) — the bots sometimes scale up to match anyway!
One option would be putting some kind of “Web Application Firewall” (WAF) in front of the whole app. Our particular combination of team and budget and platform (heroku) makes a lot of these options expensive for us in licensing, staff time to manage, or both. Another option is certainly putting the the whole thing behind (ostensibly free) CloudFlare CDN and using its built-in WAF, but we’d like to avoid giving our DNS over to CloudFlare, I’ve heard mixed reviews of CloudFlare free staying free, and generally am trying to avoid contributing to CloudFlare’s monopoly unaccountable control of the internet.`
Although ironically then, the solution we arrived at is still using CloudFlare, but Cloudflare’s Turnstile “captcha replacement”, one of those things that gives you the “check this box” or more often entirely interactive “checking if you are a bot” UXs.
While this still might unfortunately lock people using unconventional browsers etc out (just the latest of many complaints on HackerNews), we can use this to only protect our search pages. Most of our traffic comes directly from Google to an individual item detail page, which we can now leave completely out of it. We have complete control of allow-listing traffic based on whatever characteristics, when to present the challenge, etc. And it turns out we had a peer at another institution who had taken this approach and found it successful, so that was encouraging.
How it works: Overview
While typical documented Turnstile usage involves protecting form submissions, we actually want to protect certain urls, even when accessed via GET. Would this actually work well? What’s the best way to implement it?
Fortunately, when asking around on a chat for my professional community of librarian and archivist software hackers, Joe Corall from Lehigh University said they had done the exact same thing (even in response to the same problem, bots combinatorially exploring every possible facet value), and had super usefully written it up, and it had been working well for them.
Joe’s article and the flowchart it contains is worth looking it. His implementation is as a Drupal plugin (and used in at least several Islandora instances); the VuFind library discovery layer recently implemented a similar approach. We have a Rails app, so needed to implement it ourselves — but with Joe paving the way (and patiently answering our questions, so we could start with the parameters that worked for him), it was pretty quick work, bouyed by the confidence this approach wasn’t just an experiment in the blue, but had worked for a similar peer.
Once client has crossed a rate limit boundary (in Joe’s case 20 requests per 24 hour period), redirect them to a page which displays the Turnstile challenge — and has the original destination in a query param in url —
Once they have passed the Turnstile challenge, redirect them back to their original destination, which now lets them in because you’ve stored their challenge pass in some secure session state.
In that session state record that they passed, and let them avoid a challenge again for a set period of time.
Joe allow-listed certain client domain names based on reverse IP lookup, but I’ve started without that, not wanting the performance hit on every request if I can avoid it. Joe also allow-listed their “on campus” IPs, but we are not a university and only have a few staff “on campus” and I always prefer to show the staff the same thing our users are seeing — if it’s inconvenient and intolerable, we want to feel the pain so we fix it, instead of never even seeing the pain and not knowing our users are getting it!
I’m going to explain and link to how we implemented this in a Rails app, and our choices of parameters for the various parameterized things. But also I’ll tell you we’ve written this in a way that paves the way to extracting to a gem — kept everything consolidated in a small number of files and very parameterized — so if there’s interest let me know. (Code4Lib-ers, our slack is a great place to get in touch, I’m jrochkind).
Ruby and Rails details, and our parameters
Here’s the implementing PR. It is written in such a way to keep the code conslidated for future gem extraction, all in the BotDetectController class, which means kind of weirdly there is some code to inject in class methods in the controller. While it does turnstile now, it’s written with variable/class names such that analagous products could be made available.
Rack-attack to meter
We were already using rack-attack to rate-limit. We added a “track” monitor with our code to decide when a client had passed a rate-limit gate to require a challenge. We start with allowing 10 requests per 12 hours (Joe at Lehigh did 20 per 24 hours), batched together in subnets. (Joe did subnets too, but we do smaller /24 (ie x.y.z.*) for ipv4 instead of Joe’s larger /16 (x.y.*.*)).
Note that rack-attack does not use sliding/rolling-windows for rate limits, but fixed windows that reset after window period. This makes a difference especially when you use such a long period as we are, but it’s not a problem with our very low count per period, and it does keep the RAM extremely effiicent (just an integer count per rate limit bucket).
When the rate limit is reached, the rack-attack block just sets a key/value in the rack_env to tell another component that a challenge is required. (setting in the session may have worked, but we want to be absolutely sure this will work even if client is not storing cookies, and this is really only meant as this-request state, so rack env seemed the good way to set state in rack-attack that could be seen in a rails controller)
We simply put the original/destination URL in a query param on that page. (And include logic to refuse to redirect to anything but a relative path on same host, to avoid any nefarious uses).
The challenge controller
One action in our BotDetectController just displays the turnstile challenge. The cloudflare turnstile callback gives us a token we need to verify server-side with turnstile API to verify challenge was really passed.
the front-end does a JS/xhr/fetch request to the second action in our BotDetectController. The back-end verify action makes the API call to turnstile, and if challenge passed, sets a value in Rails (encrypted and signed, secure) session with time of pass, so the before_action guard can give the user access.
if the JS in front gets a go-ahead from back-end, it uses JS document.replace to go to original destination. This conveniently removes the challenge page from the user’s browser history, as if it never happened, browser back button still working great.
In most cases the challenge page, if non-interactive, wont’ be displayed for more than a few seconds. (the language has been tweaked since these screenshots).
We currently have a ‘pass’ good for 24 hours — once you pass a turnstile challenge, if your cookies/session are intact, you won’t be given another one for 24 hours no matter how much traffic. All of this is easily configurable.
If the challenge DOES fail for some reason, the user may be looking at the Challenge page with one of two kinds of failures, and some additional explanatory text and contact info.
Limitations and omissions
This particular flow only works for GET requests. It could be expanded to work for POST requests (with an invisible JS created/submitted form?), but our initial use case didn’t require it, so for now the filter just logs a warning and fails for POST.
This flow also isn’t going to work for fetch/ajax requests, it’s set up for ordinary navigation, since it redirects to a challenge then redirects back. Our use case is only protecting our search pages — but the blacklight search in our app has a JS fetch for “facet more” behavior. Couldn’t figure out a good/easy way to make this work, so for now we added an exemption config, and just exempt requests to the #facet action that look like they’re coming from fetch. Not bothered that an “attacker” could escape our bot detection for this one action; our main use case is stopping crawlers crawling indiscriminately, and I don’t think it’ll be a problem.
To get through the bot challenge requires a user-agent to have both JS and cookies enabled. JS may have been required before anyway (not sure), but cookies were not. Oh well. Only search pages are protected by the bot challenge.
The Lehigh implementation does a reverse-lookup of the client IP, and allow-lists clients from IP’s that reverse lookup to desirable and well-behaved bots. We don’t do that, in part because I didn’t want the performance hit of the reverse-lookup. We have a Sitemap, and in general, I’m not sure we need bots crawling our search results pages at all… although I’m realizing as I write this that our “Collection” landing pages are included (as they show search results)… may want to exempt them, we’ll see how it goes.
We don’t have any client-based allow-listing… but would consider just exempting any client that has a user-agent admitting it’s a bot, all our problematic behavior has been from clients with user-agents appearing to be regular browsers (but obviously automated ones, if they are being honest).
Possible extensions and enhancements
We could possibly only enable the bot challenge when the site appears “under load”, whether that’s a certain number of overall requests per second, a certain machine load (but any auto-scaling can make that an issue), or size of heroku queue (possibly same).
We could use more sophisticated fingerprinting for rate limit buckets. Instead of IP-address-based, colleague David Cliff from Northeastern University has had success using HTTP user-agent, accept-encoding, and accept-language to fingerprint actors across distributed IPs, writing:
I know several others have had bot waves that have very deep IP address pools, and who fake their user agents, making it hard to ban.
We had been throttling based on the most common denominator (url pattern), but we were looking for something more effective that gave us more resource headroom.
On inspecting the requests in contrast to healthy user traffic we noticed that there were unifying patterns we could use, in the headers.
We made a fingerprint based on them, and after blocking based on that, I haven’t had to do a manual intervention since.
def fingerprint result = “#{env[“HTTP_ACCEPT”]} | #{env[“HTTP_ACCEPT_ENCODING”]} | #{env[“HTTP_ACCEPT_LANGUAGE”]} | #{env[“HTTP_COOKIE”]}” Base64.strict_encode64(result) end
…the common rule we arrived at mixed positive/negative discrimination using the above
so only a bot that left the fields blank and lied with a non-bot user agent would be affected
We could also base rate limit or “discriminators” for rate limit buckets on info we can look up from the client IP address, either a DNS or network lookup (performance worries), or perhaps a local lookup using the free MaxMind databases that also include geocoding and some organizational info.
Does it work?
Too early to say, we just deployed it!
I sometimes get annoyed when people blog like this, but being the writer, I realized that if I wait a month to see how well it’s working to blog — I’ll never blog! I have to write while it’s fresh and still interesting to me.
But encouraged that colleagues say very similar approaches have worked for them. Thanks again to Joe Corral for paving the way with a drupal implementation, blogging it, discussing it on chat, and answering questions! And all the other librarian and cultural heritage technologists sharing knowledge and collaboration on this and many other topics!
I can say that already it is being triggered a lot, by bots that don’t seem to get past it. This includes google bot and Meta-ExternalAgent (which I guess is AI-related; we have no particular use-based objections we are trying to enforce here, just trying to preserve our resources). While Google also has no reason to combinatorially explore every facet combination (and has a sitemap), I’m not sure if I should exempt known resource-considerate bots from the challenge (and whether to do so by trusting user-agent or not; our actual problems have all been with ordinary-browser-appearing user-agents).
Update 27 Jan 2025
Our original config — allowing 10 search results per IP subnet before turnstile challenge — was not enough to keep the bot traffic from overwhelming us. Too many botnets had enough IPs making apparently fewer than 10 requests each.
Lowering that to 2 requests was enough to reduce traffic enough. (Keep in mind that a user should only get one challenge per 24 hours unless IP address changes — although that makes me realize that people using Apple’s “private browsing” feature may get more, hmm).
Pretty obvious on these heroku dashboard graphs where our succesful turnstile config was deployed, right?
I think I would be fine going down to challenge on first search results, since a human user should still only get one per 24 hour period — but since the “success passed” mark in session is tied to IP address (to avoid session replay for bots to avoid the challenge), I am now worried about Apple “private browsing”! In today’s environment with so many similar tests, I wonder if private browsing is causing problems for users and bot protections?
You can see on the graph a huge number of 3xx responses — those are our redirects to challenge page! The redirect to and display of the challenge page seem to be cheap enough that they aren’t causing us a problem even in high volume — which was the intent, nice to see it confirmed at least with current traffic.
We are only protecting our search result page, not our item detail pages (which people often get to directly to google) — this also seems succesful. The real problem was the volume of hits from so many bots trying to combinatorially explore every possible facet limit, which we have now put a stop to.
The dataset was collected with Documenting the Now’stwarc using a combination of the Twitter Search and Filter (Streaming) APIs. Between May 7, 2017, and October 16, 2018, data collection utilized both the Filter (Streaming) API and the Search API; however, the Filter API failed on June 21, 2017. From June 23, 2017, onward, only the Search API was employed. Data collection was automated to run every five days using a cron job, with periodic deduplication. A data gap occurred between Tue Jul 28, 2020, 13:53:50 +0000, and Thu Aug 06, 2020, 09:36:23 +0000, due to a collection error. The collection resulted in 362,464,578 unique tweets. Which represents 1.5T of jsonl on disk 🤯
I also have a overview of this dataset here as well, but this post will provide a little bit more detail.
Tweets to Donald Trump Volume
There are a few other Tweet Volume graphs embedded in the post mentioned above that are worth checking out as well.
Unbelievably, four years later, we’re back with this godforsaken hamberder of a dataset and somehow, this pants-shitting, sexually abusive, defaming, convicted felon, rapist, wannabe dictator, and walking embodiment of corruption has slithered his way back into the presidency, proving once again that rock bottom is just a pit stop. Anyway, how are you feeling? Here’s a little video snippet from a project I worked on back in 2018. I extracted the text of every tweet containing the 🤡 emoji and used ImageMagick to create a “small” animated GIF (a mere 1.7GB 🤣🤣🤣).
Clown Car
Top languages
Using the full line-oriented JSON dataset and twut:
import io.archivesunleashed._
val tweets = "deardonald.jsonl"
val df = spark.read.json(tweets)
df.language
.groupBy("lang")
.count()
.orderBy(col("count").desc)
.show(10)
Using deardonald-user-info.csv from df.userInfo.coalesce(1).write.format("csv").option("header", "true").option("escape", "\"").option("encoding", "utf-8").save("deardonald-user-info"), and pandas:
Tweets to Donald Trump Top Tweeters
Retweets
Using the full line-oriented JSON dataset and twut:
import io.archivesunleashed._
val tweets = "deardonald.jsonl"
val df = spark.read.json(tweets)
df.mostRetweeted.show(10, false)
Using deardonald-hashtags.csv from df.hashtags.coalesce(1).write.format("csv").option("header", "true").option("escape", "\"").option("encoding", "utf-8").save("deardonald-hashtags"), and pandas:
Tweets to Donald Trump hashtags
Top URLs
Using the full line-oriented JSON dataset and twut:
import io.archivesunleashed._
val tweets = "deardonald.jsonl"
val df = spark.read.json(tweets)
df.urls
.filter(col("url").isNotNull)
.groupBy("url")
.count()
.orderBy(col("count").desc)
.show(10, false)
Using deardonald-text.csv from df.text.coalesce(1).write.format("csv").option("header", "true").option("escape", "\"").option("encoding", "utf-8").save("deardonald-text"), Polars, and j-hartmann/emotion-english-distilroberta-base:
Emotion Distribution in Tweets to Donald Trump
(That chart took 580.5 hours to complete! Thank you Jimmy!!)
Before we start, I should point out: I am not a doctor. I have no medical training. I only have a couple years’ worth of herbal studies, so I consider myself a student of herbalism more than a practitioner. Nothing in this post should be read as medical advice or even as herbal advice; I am, in fact, barred from providing herbal advice by virtue of being a Maine Master Gardener Volunteer.
But I am allowed to tell you about my final project for the Racemes of Delight program, which I’ve worked on since classes ended in early November. I hope you find value in what I’m sharing, but it’s a project report, not advice.
I will give this one piece of advice, though: if you’re going to try a new-to-you herb, especially if you have any medical conditions or are on any medications, it’s best to talk to your doctor and/or pharmacist first and also to start with only a small amount, to make sure it agrees with you.
If you want to skip ahead, you can go straight to the recipe, as long as you promise to at least read the bold parts of the precautions as well.
Project goals
The Racemes of Delight program ends in a medicine show. Each student brings one or more teas, tinctures, salves, shrubs, oxymels, etc. that they’ve formulated over the last few months of class to show off, ideally allowing others to smell and taste their creations. Because I only attended distance sessions, it didn’t occur to anyone that I could present a project over Zoom until a couple of days before it happened. Rather than rush, I opted to take more time creating something and to share it via a blog post. (And Zoom was misbehaving all that day, so I’m glad this is how hit ended up.)
For my project, I set out to create a tea blend that might help a person through hard times … just, you know, as a fun hypothetical exercise.
Imagine, if you will, living during a difficult period of history, a time in which one’s heart is heavy with the weight of everything they’re witnessing, in which one feels tired and depleted but also overwhelmed, anxious, and stressed. Imagine someone having to constantly fight feelings of helplessness, because it all seems like so much. There’s danger ahead, and also uncertainty. … That would be hard on a person’s body and mind.
So my goal was blend some herbs into a tea that might relieve a little bit of a person’s heartache and stress, help them ground themself, and give them some strength to help them carry on, working to save what and who they can. The tea would also have to taste nice, both because someone living through this would deserve some pleasant experiences and because “drink your tea” shouldn’t become another item on a to-do list during hard times; it should be a source of joy.
I wanted it to be helpful to as many people as it could be, so I tried to use herbs that are safe for as many people as possible and for which the safe dosages are high (“food-like,” to use the terminology). I also wanted to stick to herbs that are easy to acquire. And here is a little bit of herbal jargon that you’re free to skip, but I was aiming to get as close to energetically neutral as I could: nothing that anyone would find too cooling, heating, drying, or moistening.
In a departure from my usual approach to herbs, I focused more on the emotional side of these plants than on their effects on the physical body, though I (of course!) considered both, as you’ll see if you read the next section.
Research montage
I knew from the first glimmer of the idea for this tea that I would be building it around hawthorn; it’s one of my favorite herbs, a relaxing nervine with a strong affinity for the heart (both emotionally and physically), offering nutrition and fighting inflammation. Rosalee de la Forêt says (same link) hawthorn “is for when you’re feeling broken hearted, experiencing a loss of heart or when you find yourself in a self-protective and walled-off survival mode during a stressful season.” And I can’t find the quote — I think it was said aloud during one of Rosalee de la Forêt’s classes — but it was something like, “if you love someone, give them a mug of hawthorn tea each morning, but only if you love them a lot,” with the implication that it’ll make them live longer.
It is also a protective herb, and I would wish protection upon anyone who felt a need for this tea.
Hawthorn’s leaves and flowers were an obvious choice for a tea. I’d have loved to have used the berries (“haws”), too, but they need longer in the water (really, some time at a boil) to release their constituents and their flavor.
Energetically, hawthorn is slightly cooling and considered “dry to neutral.” Maybe it’s just me misinterpreting its astringency as dryness (herbal energetics aren’t my best thing), but I felt like I needed a moistening plant to balance it, something else with a heart affinity. My first choice was violet, another nourishing plant, but I liked that it has a more active vibe, offering a sense of joy and of movement. One source mentioned using it for “anger headaches and ‘discontented mind,'” which felt pretty perfect. And I like its taste. The thing is, violet can’t be found at any online herb retailer right now; even my own supply is running low. Perhaps in the spring, it could be re-added to this formulation, but for winter I’ve had to leave it out. Its replacement was the last herb I added, though, so I’m going to skip to the next herb I settled on:
Lavender insisted on being included, or perhaps it was my brain that insisted lavender should go in. I’ve found myself craving it, and I couldn’t bring myself to leave it out of the tea entirely — though its contribution to the flavor is pretty subtle. Lavender is another relaxing nervine with anti-inflammatory properties, but its inclusion is largely about getting necessary rest and grounding oneself. To be formal, I’d point out that it has been well studied for fighting anxiety, depression, headaches, and pain. And, sure, I’d acknowledge that some (most?) of those studies are about the scent of the essential oil, rather than the constituents of the herb itself. But informally, on a personal level? It helps me with all four of those things. To be even less formal? I just feel like it’s a great herb for turning down the volume and getting out of frenzy mode, which is something someone living through challenging times might need.
I wanted ashwagandha in this tea. I really did. It’s calming while increasing energy, which felt so necessary. It gives a cognition boost, which seemed useful. It’s safe even in pretty large quantities and doesn’t taste bad. … But it’s a root. People most often take it as a powder. If you’re going to try infusing and drinking a root, you aren’t going to do it in a cup with a bunch of leaves, as a rule. Roots usually need longer to steep, and I just can’t find much in the way of reliable sources talking about ashwagandha as a tea. I’ll leave this page of information about ashwagandha here, because I’m looking for ways to add it to my life; maybe it’ll appeal to other people, too.
I considered rose, as well. I liked its flavor (in very small quantities) in an early version of the tea, and I don’t regret buying some to have on hand for myself — I’ve been sitting on a rose cookie recipe for two years! — but being realistic, I thought its inclusion would drive the price too high, for properties we could get from other, less expensive herbs. Plus, even more so than lavender, people might be tempted to pull rose petals from florists’ bouquets, and there are some nasty pesticides there. So it was rejected, though of course anyone who already had some around could certainly add a tiny sprinkle if it would bring them joy.
Oatstraw showed up in an early version of this tea blend, too, but I later learned (relearned?) that that’s really better as an overnight infusion or boiled for a long time, which was, after all, what I had originally bought it for. (I like to throw that and some hawthorn berries into a pot and cook them for a while to make a nice nutritious decoction.) So no oatstraw in this blend, because it’s better used in other ways.
There was another relaxing nervine that I felt I needed to consider: holy basil, also known as tulsi. It is beloved among my herbal community, but it has just never settled all that well for me when I’ve tried consuming it. I don’t know if it’s too dry — I don’t think it’s too hot, because ginger is my best pal and also extremely hot — or if there’s something else about it, but it’s an herb I’ve bounced off of several times. I still felt that it deserved a fair shot, so I purchased the mildest form, Rama tulsi (Ocimum sanctum), which is conveniently also the easiest to buy, and decided it would be one of the smaller parts of the blend, if it went in.
Why the fair shot? Well, I had to give up on aswhagandha but still wanted something to help with thinking straight during stressful times. Tulsi is a powerful adaptogen (there’s a great definition on that page for anyone who wants it), anti-inflammatory, stress-relieving, and just super full of really nice properties. I’ve heard multiple people in my class say that they find it soothing and joyful; one person compared its energy to that of rose, even saying it tasted “pink” to them.
And it tasted nice with the other herbs in the early versions of this tea that used violet, so it stayed. (It does not taste like the basil most of us have in our kitchen.)
The last herb to join the blend was one that I didn’t have as much familiarity with. I started looking at it because it has a lot in common with violet: an association with joy, affinity with the heart, cooling and moistening energetics, sweet taste, usefulness in a nourishing infusion (which means it offers vitamins and minerals), immune modulation, and anti-inflammatory properties. I wasn’t sure I would like its flavor as much as violet’s, and I was a little sad to lose the “get things moving” aspect of violet’s actions, but I ordered some and gave it a try. I’m glad I did! A couple of weeks ago linden took its place as the last ingredient in — but also the second largest constituent of — this tea.
The thing I like about linden, in this tea? Besides all those things in the last paragraph? If one’s muscles are tight from stress, like maybe one needs reminders to “get your shoulders out of your ears” or “unclench your jaw,” linden’s an herb worth considering. That felt right. Linden has a lot going for it.
So that’s the tea: hawthorn leaves and flowers, linden leaves and flowers, Rama tulsi leaves, and lavender flowers.
I take it as a good sign that pairs of these herbs are often used together:
“Linden is regularly paired with hawthorn (Crataegus spp.) leaves, flowers, and berries. This is a combination that is nourishing and supportive to general heart health, but also specifically indicated for hypertension and inflammation of the cardiovascular system.”
By weight (approximate, based on measured weights while I blended by volume):
3 parts dried hawthorn leaves and flowers
2 parts dried linden leaves and flowers
1 part dried lavender flowers
1 part dried Rama tulsi leaves
When I’m mixing up tea to taste, each “part” in the recipe by volume is a tablespoon, and it goes in a little jar. After I try a few cups and know I like it, I’ll make a larger batch: depending on the size of my storage container, I use whatever makes sense, maybe a quarter-cup or third-cup measuring cup if it’s going in a pint jar. The main thing is to keep one’s dried herbs in something airtight.
To make a mug of tea, put a heaping teaspoon into an infuser (either one of these mesh dealies or, for folks who can’t abide any plant matter in their tea, a paper tea filter) and steep for 5 minutes, covered — that’s how you get the most out of the tulsi. (Or one could steep longer, uncovered, which is what I do most of the time.)
I use at least a tablespoon of tea for a 20 ounce mug.
Perhaps because the linden and hawthorn are both a bit astringent, my second taste tester (my spouse) thought I’d given him a black tea with herbs added, rather than a pure tisane (herbal tea, no actual tea leaves). We agree that, like a black tea, this could be enjoyed with milk and sweetener, if one so desires. I usually put in just a tiny bit of honey.
Once I’ve made someone else’s recipe, I always feel welcome to add things that I enjoy. For instance, assuming I didn’t have any trouble with stomach acid (no GERD or acid reflux) and liked the flavor, I’d consider throwing a pinch of whichever mint (peppermint, spearmint, lemon mint, etc.) most appeals to me into a cup, to see how I like it. Or food grade rose petals, if I had them on hand already. Or if it’s nearly bedtime, maybe some chamomile (I mean, I wouldn’t, because I’m allergic to it, but if I weren’t, it’d be a nice addition), or lemon balm (assuming my thyroid is generally well-behaved).
Precautions
All of the precautions (particularly the quoted portions) come from Rosalee de la Forêt’s information pages on these herbs and from the A-Z Guide to Drug-Herb-Vitamin Interactions Revised and Expanded 2nd Edition, edited by Alan Gaby.
Hawthorn is not to be used by people who have diastolic congestive heart failure. If I were in that boat, I would be inclined not to mess around with it at all, and I might reject this tea outright unless a medical professional told me otherwise.
Hawthorn requires caution for people who are on heart medications, especially digitalis and beta blockers. If I were in that boat, I would check with my doctor and an experienced herbalist, who might agree to allow this tea with the hawthorn cut down a bit, or as-is but in a very limited amount per day. (More on amounts below.)
A very small number of people experience contact dermatitis from linden, or find it hypes them up rather than calming them down. Since it was a new herb for me, I started by trying a very small amount, to make sure I tolerated it well.
DO NOT use lavender from a flower shop. Food-grade lavender and home-grown, pesticide-free lavender are the only safe lavenders to consume. (The species of lavender is less important than its intended use, but most herbalists use Lavandula angustifolia or another Lavandula species or cultivar.)
“Tulsi may have an anti-fertility effect … and thus should not be taken by couples wishing to conceive” or by people who are pregnant.
“Those who are taking insulin to control their diabetes may need to adjust their insulin levels while taking tulsi.” Having consistent amounts from day to day may help its effects become more predictable.
Sometimes people have allergies to herbs. So any new herb should be tried in small quantities, just in case. I’m allergic to like half the Asteraceae family and most of the Apiaceae/Umbellliferae, for instance.
Any herb can make you feel nauseated or cause other digestive distress in too large a dose. If that happens, decrease the amount. If it still happens, maybe you’ve found an herb that doesn’t agree with you. It happens, because bodies and plants are both weird.
This tea would be difficult, but technically not impossible, to drink too much of. When you gather Rosalee de la Forêt’s dosages of these four herbs, our very safe friend lavender appears to be the limiting factor; and even so, my math suggests a limit of 25 tablespoons of the dried herb mix per day (or ~200 ounces of prepared tea), once a person has determined that it agrees with them in smaller amounts.
Hawthorn leaf and flower dosage: up to 30 grams per day
Lavender flower dosage: 1 to 3 grams per day
Linden flower and bracts dosage: 15-30 grams per day
Tulsi leaves dosage: 2 grams to 113 grams (by weight)
Acquiring ingredients
I bought my herbs to test out my recipes from Mountain Rose Herbs (hawthorn, linden, Rama tulsi, lavender, ashwagandha); they also sell both styles of tea infuser. I’ve had luck in the past with Starwest Botanicals and occasionally with Frontier Co-op. Three of the herbs, plus infusers, are available at my old Pittsburgh go-to, Prestogeorge Coffee & Tea. (I have no relationship with any of these vendors/shops, though my membership in United Plant Savers gets me a discount at one or more of them.)
Clearly, I trust Rosalee de la Forêt. (I link her a lot here, because I can’t directly link to my notes from Wild Cherries or Racemes of Delight. Besides following her online, I’ve read one of her books and am taking one of her classes.) I also trust the Herbal Academy (with whom I’ve taken a class, too), and they have a longer list of suppliers, some with discount codes.
And, look: I have a fair amount of these herbs on hand, now. If we’ve exchanged addresses in the past, or phone calls, or hugs, I can just … send you some tea. Drop me an email or Discord message if you’d like to try it. (I can’t give you advice, but I can give you tea. I’d package it really formally if anyone international wanted it, because Customs is weird about these things.)
Final note
I reserve the right to improve upon this recipe over time. Like I said, I’m just a student. I imagine I’ll learn more about herbs and about blending teas, and my opinions will change. I’ll mark what I edit when I do, though, or if it’s a whole new recipe, I’ll make a whole new post and link it here.
I love the practice of bibliomancy because re-introduces myself to the books that I have bought, and it re-animates my writing with ideas that I've already responded to and may have forgotten.
We are pleased to announce that the Open Knowledge Foundation (OKFN) has been selected as a grantee of the Patrick J. McGovern Foundation for the second year in a row to continue working on improving the Open Data Editor (ODE) application, making it more accessible and widely used by organisations around the world, and increasing data literacy in key communities. The Patrick J. McGovern Foundation, in a recent announcement, revealed a total grant allocation of $73.5 million to advance AI innovation with social purpose, $395,000 of which will be allocated to OKFN. To truly democratise AI and unlock innovation by the many, it is supporting efforts like ours, focusing on learning by doing, as well as increase the confidence of small organisations to apply AI to their work.
ODE is Open Knowledge’s new open source desktop application that makes working with data easier for people with little or no technical skills. It helps users validate and describe their data, improving the quality of the data they produce and consume. The release of stable version 1.2.0 was announced in early December, followed by a Release Webinar with early adopters early this week.
A different AI future, with communities shaping our tech
Collaboration has been at the heart of this project since its inception in January 2024. Counting on the Patrick J. McGovern Foundation’s critical support, we conducted extensive user research to understand what data practitioners with no technical skills needed most, and these insights helped us change course several times throughout the year. With feedback always in mind, our team designed an interface that is not only visually appealing but also clear, simple, and easy to navigate. Every feature of the Open Data Editor was chosen with the user experience in mind, making it easy to explore and identify errors in tables.
Now, our challenge is to move from an application to a skilled community of practice. This year, our goal was to create an open source application that is, above all, simple and useful, following our vision of The Tech We Want of open, long-lasting, resilient and affordable technologies that are good enough to solve people’s problems. Going ahead, we will improve the app technology and develop literacy and a governance structure among the communities using it.
Learning by doing through pilots with public interest organisations
We want to expand the app’s use while building data literacy in key communities. We are collaborating with organisations and collectives, helping them integrate data and learn good practices for their work. We started to test this approach with (add here two lines about StoryData and ACIJ).
There will be more of that in the upcoming year. With this generous support, we will:
Launch community pilots around the world, working together with ten key communities.
Pilot how to develop affordable, sustainable and scalable technologies, sharing our knowledge with others.
Increase AI literacy with a practical approach, engaging in conversations about AI integration and potential interactions with open and offline LLMs.
Convey a wider network of allied communities: Rather than competing with similar initiatives and projects, we want to build a stronger alliance with like-minded organisations to pool our resources, share best practices and ultimately support each other. We will use ODE as a vehicle to build a network of allied communities who are committed to ensuring that data is open and FAIR, and that technology is community-driven and human-centred.
Engage & Learn More
If you are interested in improving your data literacy skills and getting involved in community feedback sessions and pilots with the Open Data Editor, please contact us at info@okfn.org and share your thoughts. Together, we’ll make ODE even better.
To learn more about the Patrick J. McGovern Foundation’s commitment to redefining AI innovation with social purpose, read the official announcement.
To learn more about the Open Data Editor and its journey towards a no-code data application for everyone, you can visit the selected content below.
Collective collections—the combined holdings of multiple institutions, analyzed and sometimes even managed as a single resource—have transformed both the stewardship and impact of library collections. OCLC Research’s latest work in this field highlights a key insight into operationalizing collections at scale: collective wisdom, in the form of aggregated data and shared practitioner knowledge, makes collective collections work. Our research shows that collective wisdom in these forms supports the sustainability and strategic management of shared monographic print collections.
OCLC Research has a long history of studies focused on collective collection analysis. The scope of this work is extensive, touching on the intersection of collective collections with a host of library strategic interests. A frequent topic addressed in this work is the role of collective collections in the cooperative management of monographic print collections – shared print programs. In a 2020 College & Research Libraries article summarizing some of our insights, we note that collective collection analysis supports local decision-making by making it system-aware:
“The system can be a group, a consortium, a region, or even all libraries everywhere. Knowledge about the collective collection helps libraries orient their local collection management decisions—such as acquisitions, retention, and de-accessioning—within a broader context. In this sense, the rising importance of collective collections illuminates a shift in the strategy of managing collections, in which local collections are seen not just as assemblies of materials for local use, but also as pieces of a larger systemwide resource.”
Much of the earlier work OCLC Research has done on collective collections has been of a descriptive nature, concentrating on what collective collections constructed in data look like in terms of size and scope. More recently, our emphasis has shifted to operationalizing collective collections: in other words, the practical aspects of making them a reality. For example, a few years ago we collaborated with the Big Ten Academic Alliance (BTAA) to publish a study that offered a framework and recommendations on how BTAA could move toward greater coordination of their collective print holdings. Some of our most recent work looks at how art libraries could use collaborative approaches to better support the stewardship and sustainability of their collective art research holdings.
Our research complements a similar OCLC service focus on operationalizing collective collections. Choreo Insights and GreenGlass offer analytics solutions that, among other things, provide “system aware” decision support for managing collections. WorldCat, OCLC’s vast database of information about library collections, serves as a platform for libraries to register retention commitments for materials covered by shared print programs. Resource Sharing for Groups allows groups of libraries to streamline sharing of materials within their collective holdings. These services help bring collective collections to life as a core element of collection management strategy.
Our latest research continues the theme of operationalizing collective collections with the Stewarding the Collective Collection project. This project extends OCLC Research’s considerable body of work on the role of collective collections in shared print programs, through two strands of inquiry:
An Analysis of Print Retention Data in the US and Canada explores monographic print retentions registered in OCLC’s WorldCat database. This study analyzes over 100 million bibliographic records and 30 million retention commitment records covering libraries across the United States and Canada.
US and Canadian Perspectives on Workflows, Data, and Tools for Shared Print gathers insights from library leaders, shared print program managers, and collection, metadata, and resource sharing librarians on the key workflows associated with managing monographic shared print efforts, and the data and tools needed to support them.
The results from the first strand of work were recently published. Findings from the second strand of work will be shared later this year.
The grand theme uniting both strands of the Stewarding the Collective Collection project is collective wisdom, achieved through two approaches:
Aggregated data
Insights and perspectives from librarians
Aggregated data
Aggregated data is collective wisdom in the sense that it gathers the results of decentralized, local library decision-making, transforming it into strategic intelligence that informs future decision-making by individual libraries or groups of libraries. For example, library holdings data represents the results of acquisition/collection development decisions; similarly, the assignment of subject headings in original cataloging represents a local decision on how to describe an item’s contents. Aggregating holdings data from many libraries yields strategic intelligence by illuminating the contours and features of the collective collection, which in turn can inform both local and group-scale collection management strategies. Aggregating subject headings data in a cooperative cataloging environment can also yield strategic intelligence—for example, uncovering historical trends in descriptive practices that would benefit from new, more inclusive thinking.
Retention commitments data reflect another form of library decision-making—in this case, the decision to commit to steward a print publication, and (usually) make a copy or copies of the publication available for sharing. This commitment may be in effect for a finite period of time, or it may extend indefinitely. Often, these commitments are made in the context of a shared print program, leading to the creation of a shared print collection consisting of materials covered by retention commitments made by the program’s participants. The aggregation of retention commitments, such as those registered in the WorldCat database, provides valuable intelligence on the current state of stewardship of the collective print collection, including retention coverage, key gaps, and unnecessary duplication evident across current commitment patterns. This intelligence, in turn, can inform future decision-making on the renewal of existing commitments, or the creation of new ones.
The gathering of collective wisdom through analysis of aggregated retention data is the topic of OCLC Research’s new study, Stewarding the Collective Collection: An Analysis of Print Retention Data in the US and Canada. Exploring the retention commitments attached to the US and Canadian collective print monograph collection, as it is represented in WorldCat, led to findings that provide insight into the current state of retention coverage, as well as priorities for the shared print community to address in the near future, such as the imminent expiration of a significant fraction of current retention commitments.
Insight and perspective
Insight and perspective are perhaps more conventional types of collective wisdom, in that they draw from and aggregate the “personal wisdom” of individuals—in this case, librarians who have shared their experiences and hard-earned lessons learned from participating in some activity. This knowledge is invaluable for other librarians facing similar scenarios and challenges as they formulate their own decision-making. For example, recent work by OCLC Research gathered and synthesized library experiences in collaborative partnerships, in the areas of research data management and stewarding art research collections. As we note in one of these studies:
“Our interview-based approach elicited a wealth of invaluable perspectives, insights, and advice on library collaboration that we synthesized into a set of recommendations for libraries contemplating future collaborations. . . . Effective library collaboration is art as much as science. While concepts, frameworks, and theory are important for deepening our understanding of what makes collaborations successful and sustainable, we believe that sharing practical experiences of successful collaboration is also essential.”
We followed a similar approach for Stewarding the Collective Collection’s second strand of work, which explores workflows, data, and tools used to manage shared print collections for monographic materials. Questions addressed in the project include:
What are the key workflows supporting stewardship of shared print monograph collections?
What data and tools are currently used to support these workflows?
What gaps in data, tools, or other resources exist, and how might addressing these gaps open new opportunities for collective stewardship of print collections?
To answer these questions, we gathered “collective wisdom” through individual and focus group interviews, as well as an online survey. We are in the process of analyzing and synthesizing this data, and we’ll be disseminating our findings through a variety of channels. Our hope is that these findings will provide libraries with a benchmark view of the current state of practice surrounding shared print workflows, data, and tools; help optimize practices having to do with collection evaluation and coordinated collection stewardship, both at the local and group level; and consolidate community views on data and functionality needs and priorities in the area of monographic shared print.
Collective wisdom drives conscious coordination
OCLC supports the gathering of collective wisdom and its transformation into strategic intelligence for libraries through a wide range of channels. Tools like Choreo and GreenGlass are one approach. Another is OCLC Research, which gathers collective wisdom through data-driven studies like its WorldCat-based collective collection analyses, but also through studies that collect and synthesize the perspectives and lessons learned from library practitioners. Many of these studies have been conducted under the auspices of the OCLC Research Library Partnership, which is itself a channel for assembling collective wisdom through its mission of bringing research libraries together around mutual interests.
Gathering collective wisdom through these and other channels is important because it informs stewardship of collective collections, which is itself a leading example of conscious coordination. Conscious coordination is a concept OCLC Research introduced in a 2015 report, where it is defined as “a strategy of deliberate engagement with—and growing dependence on—cooperative agreements, characterized by increased reliance on network intelligence (e.g., domain models, identifiers, ontologies, metadata) and global data networks.” Stewardship strategies based on conscious coordination are marked by four key features:
Local decisions about stewardship are taken with a broader awareness of the system-wide stewardship context—who is collecting what, what commitments have been made elsewhere in terms of stewarding various portions of the scholarly record, and how the local collection fits into the broader system-wide stewardship effort.
Declarations of explicit commitments are made in regard to portions of the local collection, in which institutions acknowledge, accept, and undertake to fulfill explicit collecting, curation, and accessibility responsibilities for certain materials. Fulfillment of these responsibilities is seen as a commitment to an external stakeholder community.
A formal division of labor emerges within cooperative arrangements, with a greater emphasis on specialization. This will occur in the context of a broader, cross-institutional cooperative arrangement in which different institutions specialize in collecting, curating, and making available different portions of the scholarly record.
More specialization in collecting activity must be accompanied by robust resource sharing arrangements that ensure relatively frictionless access to all parts of the collective collection, providing mutual assurance that materials collected by one institution will be made available to other partners, and vice versa.
Conscious coordination, as a strategy for managing collections and stewarding the scholarly record, underscores the importance of effective collaboration to meet shared objectives, as well as data-driven intelligence to fuel understanding of collective collections and how best to build, manage, and sustain them. In other words, it amplifies the need for collective wisdom—in the form of both aggregated data and collective insight and perspective—to inform decision-making and strengthen partnerships.
Turning collective wisdom into collective impact
Collective wisdom, in the form of aggregated data and insights from librarians’ experiences, are vital sources of intelligence that can help build and sustain shared print efforts and other types of collective collections. The Stewarding the Collective Collection project taps into the collective wisdom of the library community in the service of strengthening and sustaining shared print programs, and ultimately, amplifying the impact of both past and future investment in the collective print resource. Watch for more findings from this project throughout 2025!
Thanks to my colleagues on the Stewarding the Collective Collections project – Inkyung Choi, Lynn Connaway, Lesley Langa, and Mercy Procaccini – for their comments on a draft of this post. Special thanks to Erica Melko for her usual editorial magic!
This week's Thursday Threads looks at digital storage from the past and the future.
There are articles about the mechanics of massive data storage systems in tech giants like Google and Amazon, the still existing use of floppy disks in certain industries, and the herculean efforts of digital archivists to access stored data from outdated mediums.
This week:
Hard drives seem indestructible, especially compared to previous forms of storage. So we went all in on digitizing and converting and storing on hard drives. But what if the hard drives start failing?
You've been tasked with storing data. You don't know what the data is or how important it is, but you have to give it back when asked. When your goal is outliving the heat death of the universe.
It is the rare company that reaches the size of Google, Amazon, or Facebook. These companies have a lot of data, and they want to make sure it is findable and usable anywhere in the company. What distributed storage looks like.
When was the last time you used a floppy disk? There are some industries that still use them every day.
Archives everywhere have stacks of old floppy disks. Read about the techniques that archivists use to recover what is on them.
Your job is to store data that outlasts your career. What medium do you use? How do you distribute it? How to think about century-scale storage.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Hard Drives Go Bad
A few years ago, archiving specialist Iron Mountain Media and Archive Services did a survey of its vaults and discovered an alarming trend: Of the thousands and thousands of archived hard disk drives from the 1990s that clients ask the company to work on, around one-fifth are unreadable. Iron Mountain has a broad customer base, but if you focus strictly on the music business, says Robert Koszela, Global Director Studio Growth and Strategic Initiatives, “That means there are historic sessions from the early to mid-’90s that are dying.”
This article focuses on the music industry, but its story is applicable across all fields.
Music production once used multi-track analog tape (where splicing was done with physical cuts and tape); when the process was done, the analog tape went into storage.
Alarms went up in the field about media deterioration and a lot of effort was made to digitize the source materials.
Those digitized artifacts were stored on hard drives, and everyone assumed they were now safe.
But preservation of digital media is an active process — one can't assume that the disks will spin and that the software to read the files still runs.
When your goal is out living the heat death of the universe
I sometimes think about the fact that Amazon S3 effectively has to exist until the heat death of the universe. Many millennia from now, our highly-evolved descendants will probably be making use of an equally highly evolved descendant of S3. It is fun to think about how this would be portrayed in science fiction form, where developers pore through change logs and design documents that predate their great-great-great-great grandparents, and users inherit ancient (yet still useful) S3 buckets, curate the content with great care, and then ensure that their progeny will be equally good stewards for all of the precious data stored within.
The idea that struck me in this article is that as service provider like Amazon can't distinguish between what is important and what is not: if a customer asked Amazon to store it, it will do its best to make sure it retrievable.
How much storage is in use — multiple copies on multiple drives in multiple servers and multiple locations — for files that have zero value?
Distributed Storage Systems
The impact of these distributed file systems extends far beyond the walls of the hyper-scale data centers they were built for— they have a direct impact on how those who use public cloud services such as Amazon&aposs EC2, Google&aposs AppEngine, and Microsoft&aposs Azure develop and deploy applications. And companies, universities, and government agencies looking for a way to rapidly store and provide access to huge volumes of data are increasingly turning to a whole new class of data storage systems inspired by the systems built by cloud giants. So it&aposs worth understanding the history of their development, and the engineering compromises that were made in the process.
This 13-year-old article explores the massive data storage systems utilized by major tech companies like Google, Amazon, and Facebook to manage their vast information stores.
Traditional methods of scaling storage, such as increasing disk capacity or adding more servers, fall short at the size of in cloud computing environments.
While you may not ever operate at the scale of these companies, it is interesting to read about how the tech giants do data storage and management.
(The article's subtitle also refers to "big data" — a phrase that was fashionable in the previous decade but one which we don't hear much about anymore.)
Industries are still using floppy disks
A surprising number of industries, from embroidery to aviation, still use floppy disks. But the supply is finally running out.
8-inch floppy disks were invented in the early 1970s; they could store a megabyte a piece.
5.25-inch floppy disks were introduced in late 1970s; while obviously smaller, its high density capacity could also store about a megabyte and a quarter per disk.
3.5-inch disks (no longer called "floppy" because they were in a hard plastic case) came to the market in the early 1980s and could store a megabyte and a half.
Each of these formats are still used today.
(Maybe not the 8-inch floppies; those were retired from nuclear weapons silos in 2019.)
Speaking of floppy disks, digital archivists from Cambridge University Library and Churchill Archives Centre detail their efforts to create copies of 5.25-inch floppy disks.
Remember 5.25-inch floppy disks?
From soliciting donations of old floppy disk drives to the hardware and software required to access these old disks on new hardware, the report is a fascinating look at the past (and maybe a preview of what future generations will need to do to read today's digital storage media).
Century-scale Storage
This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all. This is not a piece about the sheer volume of data we are creating each day, and how we might store all of it. Nor is it a piece about the extremely tough curatorial process of deciding what is and isn’t worth preserving and storing. It is about longevity, about the potential methods of preserving what we make for future generations, about how we make bits endure. If you had to store something for 100 years, how would you do it? That’s it.
This 15,000-word essay looks at digital storage from the earliest hard drives (including restoring data from a 1960s-era IBM hard disk prototype) to the cloud to old fashion print-on-paper.
There are discussions of the reliability and longevity of different storage methods, such as RAID systems, cloud storage, and physical media like vinyl records and tape drives.
But it isn't just the physical medium...the article also highlights the importance of institutional commitment, funding, and cultural values in ensuring the preservation of data.
Ultimately, the writers suggest that successful century-scale storage requires a combination of methods, a culture of vigilance, and a commitment to preserving human cultural memory.
This Week I Learned: In Ethiopia, time follows the sun like nowhere else
Because Ethiopia is close to the Equator, daylight is pretty consistent throughout the year. So many Ethiopians use a 12-hour clock, with one cycle of 1 to 12 — from dawn to dusk — and the other cycle from dusk to dawn. Most countries start the day at midnight. So 7:00 a.m. in East Africa Time, Ethiopia&aposs time zone, is 1:00 in daylight hours in local Ethiopian time. At 7:00 p.m., East Africa Time, Ethiopians start over again, so it&aposs 1:00 on their 12-hour clock.
This could have easily gone in last week's Thursday Threads on time standards.
There are 12 hours of daylight, numbered 1 through 12.
Then 12 hours of night, numbered 1 through 12.
What could be easier?
Alan and Mittens squabble in the cat tree
These two troublemakers.
Alan is the cat on top, looking down on Mittens below.
In this cozy sunlit room with a cat tree by an open window, you'd think these two would get along.
Not so.
Alan's typical perch is on top of the cat tree, so it is Mittens that is intruding (if you could call it that.)
On January 14th, 2025, the Open Knowledge Foundation brought together leading voices from around the world to reflect on the Super Election Year 2024, during which 3.7 billion people voted in 72 countries. This online event, part of the The Tech We Want initiative, examined the profound role of technology in electoral processes, its potential, and its pitfalls, and was a continuation of the Digital Public Infrastructure for Electoral Processes roundtable discussions we held in 2023.
The event featured nine experts who shared their regional perspectives and actionable insights:
Alejandra Padilla, journalist at Serendipia, recounted Mexico’s innovative use of technology, particularly online voting for citizens abroad. However, she highlighted the complications that arose, such as thousands of registrations being rejected due to technical issues and user errors. This case exemplified how tech, while aiming to simplify voting, can unintentionally create barriers. Alejandra also shared Serendipia’s project using AI to summarize candidate platforms and help voters identify alignment with their own views, illustrating how technology can empower informed decision-making.
Emmanuel K. Gyan, from Fact-Check Ghana, shared insights from Ghana’s elections, where misinformation and disinformation were widespread, especially through social media. He highlighted initiatives like setting up situational rooms to counter false narratives in real-time. However, challenges such as limited access to fact-checking tools and the cost of combating disinformation were significant hurdles. Emmanuel emphasised the importance of holding accountable those deliberately sharing fake news in order to deter future incidents.
Juan Manuel Casanueva, Executive Director of SocialTIC, explored the gap between technological ambitions and realities in Mexico. He pointed out issues like incomplete or unreliable candidate data and the need for standardised, comparative election results data. Civil society stepped in to address these gaps by creating tools and databases, such as verifying candidate information and visualising historical election data. He warned us about the growing influence of pseudo-journalists and influencers spreading political misinformation.
Julia Brothers, Deputy Director for Elections at the National Democratic Institute (NDI), presented a global perspective, acknowledging progress in areas like voter registration technology and open election data. However, she stressed that public confidence and trust remains a significant challenge. Julia noted that election technologies are often vendor-driven rather than problem-driven, leading to transparency and accountability deficits. She highlighted that more often than not, technology is developed in a way that is not solution-oriented, but rather adding extra problems – something that deeply resonated with the way we are thinking about technology at OKFN and our initiative The Tech We Want. Her call to action included better public communication about the scope and limits of election technologies to address this issue of trust..
Miazia Schüler, researcher at AI Forensics, focused on the risks posed by generative AI in elections. Her investigations, for example on the French Elections, revealed errors and inconsistencies in AI-generated election-related content, posing threats to voter trust. She noted that AI was increasingly used to create disinformation, such as AI-generated images dramatizing political narratives. Miazia called for robust safeguards, stricter content moderation, and transparency to mitigate risks associated with generative AI in political campaigns.
Narcisse Mbunzama, Open Knowledge Network Hub Coordinator for Francophone Africa, shared lessons from the DRC’s elections, where technology improved voter registration but also revealed vulnerabilities. In a context where trust in democratic institutions is low, centralized control over election servers raised concerns about data manipulation. Narcisse highlighted the need for decentralised and transparent systems to ensure accountability and trust.
Oluseun Onigbinde, from Open Knowledge Nigeria, discussed how technology can decentralise access to election data, empowering civil society to act as a check on governmental irregularities. However, he cautioned against the misuse of tech, citing examples of cybersecurity vulnerabilities and data privacy issues in Nigeria’s elections. Oluseun advocated leveraging informal influencer networks to counter disinformation effectively. He also underlined the importance of speed and influence in combating misinformation
Setu Bandh Upadhyay, Open Knowledge Network Hub Coordinator for Asia, reflected on elections across Asia, where platforms like TikTok amplified foreign narratives and misinformation, particularly in multilingual contexts. He highlighted the lack of tools like CrowdTangle that once helped researchers track disinformation trends. Setu also raised concerns about internet shutdowns, which disproportionately impacted marginalized communities, including incidents of voter suppression and violence.
Sym Roe, CTO at Democracy Club, provided insights from the UK, where traditional forms of misinformation, such as misleading newspaper articles, remain a problem. He highlighted that disinformation is not solely a modern problem linked to new technologies—although they undoubtedly amplify its reach—but rather a challenge that has existed for centuries, dating back to the very origins of the press. He noted the retreat of social media companies from proactive election engagement, leaving civil society to fill the gaps. Sym called for a renewed focus on producing positive information to counter misinformation, rather than solely reacting to disinformation.
The event was introduced by OKFN Tech Lead, Patricio del Boca, and moderated by the International Network Lead, Sara Petti.
A Call to Action
The speakers highlighted both the potential and the perils of technology in electoral processes. Their collective insights emphasised the urgent need for:
Greater transparency and accountability in election technologies.
Stronger safeguards against generative AI misuse.
Equitable resources to combat disinformation globally.
Collaborative, scalable solutions to make technology accessible and impactful.
About the initiative
The Tech We Want is the current technological vision of the Open Knowledge Foundation. We are advocating for open, long-lasting, resilient and affordable technologies that are good enough to solve people’s problems, and for open and fair governance mechanisms and tools to truly democratise Data and AI.
In October 2024 we launched this initiative with a highly successful two-day online summit, where we tried to imagine together with key experts and activists new ways of developing tech, a kind of tech that is a common good, developed with and for the community, maintained with care, sustainable for the people and the planet, and built to last.
The summit featured 43 speakers from 23 countries in 14 hours of live streaming followed by 711 participants. We also gave space and context to 15 project demonstrations and featured major influencers from the civic and government space working at the intersection of technology and democracy.
The full documentation of the summit is gradually being published on our blog. You can follow the hashtag #TheTechWeWant on social media platforms to keep up with the new activities that will unfold over the coming months.
A key reason I got involved in digital libraries years ago was the promise of reliable information empowering people to be more knowledgeable and responsible in their actions. One of the oldest digital library sites on the Web is Cornell’s Legal Information Institute, which has had the mission since 1992 to make legal information free for all.
Here is some of the information provided on the site that I was recently reminded of:
All persons born or naturalized in the United States, and subject to the jurisdiction thereof, are citizens of the United States and of the state wherein they reside. No state shall make or enforce any law which shall abridge the privileges or immunities of citizens of the United States; nor shall any state deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.
An individual, except the President, elected or appointed to an office of honor or profit in the civil service or uniformed services, shall take the following oath: “I, AB, do solemnly swear (or affirm) that I will support and defend the Constitution of the United States against all enemies, foreign and domestic; that I will bear true faith and allegiance to the same; that I take this obligation freely, without any mental reservation or purpose of evasion; and that I will well and faithfully discharge the duties of the office on which I am about to enter. So help me God.” This section does not affect other oaths required by law.
Any employee who has authority to take, direct others to take, recommend, or approve any personnel action, shall not, with respect to such authority…. take or fail to take, or threaten to take or fail to take, any personnel action against any employee or applicant for employment because of… refusing to obey an order that would require the individual to violate a law, rule, or regulation.
In summary: Most people working for the government have taken an oath to support and defend the Constitution of the United States, which supersedes any requirement to follow the orders of any particular person, up to and including the President, when that person’s orders contradict the Constitution. They have the right to refuse to obey an order that violates the law. Furthermore, if the order also violates the Constitution, their oath makes that right a duty.
The Legal Information Institute, and other free digital libraries, also include lots of rulings of the Supreme Court and lower courts. These courts have the last word under American rule of law about the meaning of the Constitution. If the President, or any other individual, claims the Constitution means something it doesn’t say, such as that many people the Fourteenth Amendment says are born citizens aren’t really citizens, and doesn’t have the courts backing up his claim, that claim does not merit any more credence than his claims, say, that some people aren’t really people. Any orders he makes based on those claims do not override the duty of government officials to follow the Constitution, which in the Fourteenth Amendment quoted above guarantees birthright privileges and immunities to citizens, as well as due process and equal protection to “any person”, whether citizen or not.
If you know people who work for the government, or witness people in their work for the government, you have the power, and often the duty, to remind them of their duties and rights regarding the Constitution and the guarantees and obligations it sets.
As former government officials like Jeff Neal point out, even though government officials know about their oaths of office and know that orders can be unlawful, it can still be a challenge to recognize and respond appropriately to unlawful orders. If you’re in need of information to help you determine, and decide what to do about, orders that may violate the law, the Constitution, or your conscience, consider reaching out to a librarian near you for guidance. That’s what we’re here for.
Queer metadata presentation at the OCLC Cataloging Community Meeting
Bri Watson and Chloe Misorski will discuss the Best Practices for Queer Metadata at the 12 February 2025 virtual OCLC Cataloging Community Meeting. Watson and Misorski are members of the Queer Metadata Collective (QMDC), which consists of catalogers, librarians, archivists, scholars, and information professionals. Their best practices document covers the metadata treatment of queer people in Galleries, Libraries, Archives, Museums, and Special Collections (GLAMS). The best practices cover many aspects of metadata, including subject headings, name authority records, and description used in finding aids, exhibit labels, and other places. This presentation will be featured in the first hour of the event, which is a DEI panel. Other parts of the presentation will feature OCLC staff discussing topics related to our cataloging products and services.
The Best Practices for Queer Metadata was released in June 2024. I have seen other excellent metadata best practices documents before, but this one is particularly impressive to me both because of the variety of metadata covered and types of institutions represented in the QMDC. I look forward to attending this presentation both to learn more about the document itself and how this diverse group was formed. Register here to attend this free event or receive a link to the recording afterwards. Contributed by Kate James.
Libraries, Integration, and New Americans Project
Libraries, Integration, and New Americans (LINA) is a three-year research project of the University of Maryland (OCLC symbol: UMC) College of Information (OCLC symbol: MDX), funded in part by the Institute of Library and Museum Services (IMLS). The LINA Project examines the approaches that immigrants take toward information behavior in the complicated United States environment and suggests ways for libraries to better engage with immigrant communities. Since February 2024, LINA has been publishing a monthly “Policy & Media Digest,” keeping track of changes to U.S. policies, news items, recent publications, and a summary of U.S. regional developments. The January 2025 “Digest” is now available.
In these rapidly changing times, it is astonishing how much vital and current information the LINA Project stuffs into a single two-sided (virtual) sheet. Each issue of the “Digest” features an interactive map that gives access to state-by-state statistics, resources, and other data from authoritative sources. The project website contains toolkits and other resources that help any library provide improved services to its communities of new Americans, as well as the means for library workers to sign up for its discussion list. Contributed by Jay Weitz.
Safe Routes and StoryWalk® blend together at the DC Public Library
Two branches of the District of Columbia Public Library (OCLC Symbol: DWP) have implemented a mash up of Safe Routes and StoryWalk® an effort described in an engaging online report, “ Safe Routes StoryWalk”. The report shares details about the program design and implementation, as well as colorful illustrative images. The libraries were able to pilot their work through a grant from the District of Columbia Highway Safety Office. “To our knowledge,” the report says, the Safe Routes StoryWalk® project is the first of its kind.”
The Safe Routes to School program was created in 2005 and has been implemented in all 50 US states supported by federal and state funding. The program was based on the need to support all community members in safety but with recognition there was a need to target specific groups, “especially low-income communities, communities of color, and rural communities, where it is hard for anyone to safely and conveniently walk, bicycle, or get physical activity.” StoryWalk® was created in 2007 by Anne Ferguson of Montpelier, Vermont, in collaboration with former staff member Rachel Senechal, and is a registered service mark of the Kellogg-Hubbard Library. Like the Safe Routes project, StoryWalk® encourages movement and uses visual engagement outdoors to draw people to the library. Safe Routes StoryWalk® installations at DCPL Benning Library in Ward 7 and Bellevue Library in Ward 8 target the Wards’ predominantly Black neighborhoods which according to data are disproportionately impacted by traffic fatalities and serious injuries. Bringing these two programs together to increase safety and draw families to the library is just one example of how public libraries can work creatively to serve multifaceted community needs around literacy, movement, inclusion, and public safety. For more ideas on how libraries can work in this area, check out this WebJunction webinar: One Step at a Time: How Libraries Can Promote Healthy, Thriving, and Livable Communities. Contributed by Merrilee Proffitt.
We’ve just released an update to Talpa Search, our groundbreaking new way to search for books and other media in libraries. In other words, Talpa is better, so patrons will come away happier!
The update involves a decent improvement when looking for older books, and a big improvement when searching for newer books. The changes essentially eliminate Talpa’s “recency problem”—being better at finding older books than newer books. Newer books now get nearly the same benchmark scores as older books.
Here’s a graph of Talpa’s performance on the “What’s That Book?” benchmark, a set of 200 test searches we released last week. In this set each search has a single answer, which should (ideally) rank first in the results too. The new results are compared to the system in place yesterday, which dates mostly to August 2024. The full data—all 200 questions, answers and scores are at the end of this blog post.
Scoring For all books, Talpa scores have improved from 75.8% to 91%. For older books, the score rose from 85.1% to 94.4%, and for newer books (published 2023–2024) from 66.5% to 87.5%—a 21% jump!
Our scoring is simple. Fully 162 of the 200 of the searches got a score of 100%—the desired book was first in the results. For another 30, the desired book was among the results, but not in first place. These were scored with 80% for second place, 70% for third place, and on down to 40% for 6th place. Positions 7–20 were scored at 30%.
The Hardest Ones At present only eight searches are complete failures. See if you can get them. The answers are at the end of this post; each search is linked to its answer below.
“girl finds necklace and meets pink bunny robot”
“recent cartoon demon perspective book”
“World war 1 historical fantasy taking place in flanders”
“Magical realism, fiction, two boys who disappear for 6 months and can’t recall what happened”
“Fantasy fiction about books with different powers/magic”
“Sequel to YA fantasy about home for magical misfit children”
“Book where a girl has to walk or else she will die”
“thriller with cat and el morgan”
The “Non-Zero Scores” graph below shows the scores if any position is counted as a win. Talpa Search used to miss completely on 14% of benchmark searches. That number is now 4%.
What’s Next?
We’re always looking to improve Talpa Search. Some of the projects we are working on include:
An update to the Talpa Cover Search, covering recent books and adding over 10 million covers.
Direct integration with popular library OPACs, making Talpa as seamless as any other library search.
Other improvements are about how we test and improve:
We will be publishing other “benchmark searches,” including one for movies and another for questions with 2+ good answers.
Now that we’ve hit 90% success, we need harder questions! We’re going to be asking LibraryThing members and libraries to come up with searches they think should have worked, but didn’t.
Thank you for reading, and stay tuned for further updates!
The Scores
The following are the 200 “What’s That Book?” searches. You can also fine this information in a text file.
Discover how Retrieval-Augmented Generation (RAG) enhances AI responses with real-time, accurate information. Learn implementation strategies and key benefits for enterprise.
Imagine that you are an English speaker visiting France, engaged in discussion with a French local about a legal issue, but you are a novice French speaker and not familiar with the French legal system. Fortunately, you have a laptop containing over 800,000 French law articles, where the answer to your question may be found. You also have access to open-source software and a multilingual large language model, capable of reading these legal documents and answering questions about them in English. Could a tool like this help you overcome both language and knowledge barriers when exploring large collections of information? How might LLMs help people access and understand legal information that is either in a foreign language or requires specialized knowledge?
We built the Open French Law Retrieval Augmented Generation (RAG) pipeline as part of a case study in which we explored how French law could be more accessible to non-French speakers. By experimenting with an off-the-shelf pipeline that combines LLMs with multilingual Retrieval Augmented Generation techniques, we aimed to investigate how such a tool might help non-French speakers of varying expertise ask questions in English to explore French law.
In the French civil law system, the emphasis is primarily on statutes—many of which are codified—rather than on case law which does not constitute binding precedents. This framework provided a favorable environment for experimenting with the RAG approach for legal information retrieval as it allows for the integration of structured information.
Legal scholars, librarians, and engineers all have a crucial role to play in the building and evaluation of legal AI tools, and each of these perspectives are represented in this experiment: Matteo Cargnelutti (software engineer) built the technical infrastructure for this experiment, Kristi Mukk (librarian) designed the experiment and evaluation framework, and Betty Queffelec (legal scholar) analyzed the model’s responses. With Betty’s expertise in environmental law, we primarily focused our experimental scope on this legal domain. Central to our approach is our emerging practice and guiding framework “Librarianship of AI”, which advocates for a critical assessment of the capabilities, limitations, and tradeoffs of AI tools. Through this critical lens grounded in library principles, we aim to help users make informed decisions and empower them to use AI responsibly.
How did we build the Open French Law RAG pipeline?
At the core of this experiment is a “cookie cutter” Retrieval Augmented Generation pipeline, which we purposefully assembled using off-the-shelf open source components, as a way to understand what most of these systems could realistically achieve at the time of the experiment’s start in the fall of 2023. It is centered around LiteLLM and Ollama for inference with text- generation models, intfloat/multilingual-e5-large (a multilingual, Sentence Transformers-compatible model) for text similarity, and ChromaDB as for vector search. We used the COLD (Collaborative Open Legal Data) French Law Dataset as the foundation for our experimental pipeline, a collection we previously assembled of over 800,000 French law articles.
Our technical infrastructure for this experiment is made of two key components:
An ingestion pipeline, transforming the knowledge base into a vector store.
A Q&A pipeline, which makes use of that vector store to help the target LLMs answer questions.
The source code for this experimental setup is open-source and available on Github.
What was the experimental setup for Open French Law RAG?
For our use case, we role played a typical American user without legal domain expertise or knowledge of prompt engineering best practices. We designed the Q&A pipeline to test a series of questions about French law with the following requirements:
Always in a zero-shot prompting scenario (no follow-up questions) at temperature 0.0.
Both in French and in English, as a way to test cross-language asymmetric semantic search.
With and without retrieving sources from the vector store, as a way to measure the impact of sources pulled from the RAG pipeline on the responses.
To be tested against both OpenAI’s GPT-4 and Meta’s Llama2-70B, two common models representative of both closed-source and open-source AI at the time of the experiment (fall 2023).
“No, a cow cannot be considered as real estate under French law. Real estate, or “immobilier” in French, refers to property that is fixed, immovable, and includes things like land, buildings, or anything else that is permanently attached to the land. A cow, being a movable object, falls under the category of personal property, or “biens meubles” in French. This category includes all things that can be moved without damaging either the object being moved or the property from which it is being removed. So, in legal terms, a cow is considered personal property, not real estate.”
GPT-4/English/RAG:
“No, a cow cannot be considered as real estate under French law. According to Article R143-2 of the “Code rural et de la pêche maritime”, livestock, whether dead or alive, are considered as movable property (biens mobiliers). Real estate (biens immobiliers) refers to immovable property like land or buildings. Therefore, a cow, being a part of livestock, is considered movable property, not real estate.”
Sources retrieved:
Code rural et de la pêche maritime (Rural and Maritime Fisheries code). View full retrieved text.
We analyzed the output for source relevance and accuracy, logical coherence, factual correctness, translation quality, and scope.
What trends did we observe when analyzing our experiment’s results?
With our particular experimental setup and analysis criteria, we identified the following trends in our study:
Performance Comparison: English vs. French
English questions showed slightly better performance compared to French questions, although RAG helped mitigate this difference. Both models performed better in English than in French.
Impact of RAG
While the use of RAG enhanced the accuracy and relevancy of some responses, it also introduced additional complexity and potential for errors.
Incorporating RAG improved the system’s performance in both English and French.
Accuracy and Relevancy
We observed the prevalence of partially inaccurate responses that mix true and false statements, along with different types of inaccuracies. We observed that errors in responses often arose from the model’s inability to properly determine material, geographical and temporal scope of rules. This is a significant limitation because it is a core skill of lawyers. In addition, the retrieval of irrelevant embeddings also introduced inaccuracies.
While our findings are interesting, we recognize the limitations in our experimental scope and evaluation. Interpreting these results requires caution in drawing broad conclusions about the generalizability and robustness of our data.
What were our key takeaways from the Open French Law RAG experiment?
Our key takeaways focused on the questions: “How can legal AI tools be used efficiently?” and “When is the use of legal AI tools beneficial?”
Multilingual AI shows potential: Multilingual RAG can improve accessibility to foreign legal texts, although imperfectly. Our pipeline enabled cross-language searching to some degree without requiring translations obtained by users. However, we discovered that while responses often appeared plausible, fluent, and informative, models frequently retrieved irrelevant documents, included citation hallucinations, and contained inaccuracies. While this tool can be a helpful research aid, we urge caution when using RAG-based tools as information-seeking mechanisms without verifying sources and evaluating responses for accuracy, coherence, and completeness.
Limitations of off-the-shelf RAG without manual optimization: Especially for specialized domains such as law where accuracy and context is crucial, addressing the limitations of an off-the-shelf RAG pipeline such as reducing hallucinations and generating highly context-specific results requires significant time and effort for marginal gains.
LLMs as a complementary research tool: LLMs may aid in providing helpful starting points to explore vast corpora such as the over 800,000 French law articles in our knowledge base and can be helpful as a discovery, summarization, and sense-making tool. However, reliance solely on AI can hinder critical thinking and legal reasoning and lead to a loss of understanding the broader legal context, and users must understand the limitations and risks involved. The need for traditional legal research skills becomes even more important for verifying AI output. Beyond the accuracy of information retrieval, users must also weigh the benefits of using these tools against their social and environmental impacts.
Importance of trust calibration: Developing clear guidelines and instructions for using and evaluating these tools is essential. Despite the promise of saving time through efficient search and identification of relevant legal sources, AI output can still overwhelm users with excessive and sometimes contradictory responses for the same prompt, and hallucinations remain a significant risk. Verifying information, identifying any overlooked details, and checking sources, particularly when less obvious inaccuracies arise, can be extremely time-consuming. While these tools may enhance access and help lower the barrier to entry, users need to understand the inherent variability of AI tools.
Comparative utility for legal expert vs. legal novice: The efficacy of legal AI tools is not solely contingent on technological capability, but also how legal scholars engage with it. For legal experts, these tools complement their knowledge, helping uncover obscure legal rules and providing broader insights into research. Experts can identify useful information even when responses contain inaccuracies, and experts know how to ask precise questions which are more likely to generate relevant responses. For novices such as foreign law students unfamiliar with the French civil law system, challenges may arise. While novices are able to ask questions in natural language without knowing specific legal vocabularies, it is challenging to verify output with limited understanding of the French legal framework. Verifying responses and checking for hallucinations requires a strong understanding of legal rules and of the legal system. Furthermore, novices may pose ambiguous or misleading questions, and risk accepting “convincingly wrong” responses that appear accurate and informative, but fundamentally miss the mark.
We welcome feedback and contributions to this experiment and aim to spark cross-cultural, interdisciplinary conversations among librarians, engineers, and legal scholars about the use of RAG-based legal tools.
If you’re interested in learning more, you can find detailed examples, analyses, and a thorough discussion of our experiment and findings in our case study. Explore the case study.
I’ve already said this on BlueSky, using fewer words, but I want to say it again here, not because I think this has more reach, but because it’s longer-lasting and gives me more room for nuance.
Things are rough, almost certainly getting rougher. Here’s what I’m clinging to:
Smart, empathetic people have pointed out that our systems of oppression and environmental harm are all woven together in one big, awful cloth.
They’ve also pointed out that people who want a good, decent, livable world outnumber those who don’t.
My goal, and I hope also yours, is to pick a thread from that cloth—ANY thread —and pull, as hard as we each sustainably can. If we show up consistently (or probably even inconsistently), doing literally anything to make our world better, we can minimize harm (maybe even make progress!) in the coming years.
A little action by a lot of people becomes a lot of action.
Do what you can without burning yourself out. Pick your thing, or glom onto a thing someone you trust has picked. You can also try out multiple things, to see which ones fit.
If you already know what your thing is (or your set of smaller things), I invite you to share it on your own social media channels and in the comments here* so others can get ideas. Even if it’s small—ESPECIALLY if it’s small—let’s inspire each other.
Mine are all small things, by necessity, because just staying alive and employed as a disabled, not–entirely–neurotypical, [other identities redacted] person in the world in which we find ourselves is exhausting. And you know what? Doing all the work to continue thriving, or at least surviving, also counts as “a thing” worthy of listing! So here’s my list:
staying alive, damn it
wearing a reusable respirator when I leave my house
learning American Sign Language
contributing small monthly donations to environmental, mostly bird protection, causes
planting native plants in my yard (and maybe other places, who can say? but if I were doing that, I would buy seeds from my local botanical gardens, Audubon, and/or Extension)
learning about and advocating for digital accessibility during my working hours
Nothing on that list is overwhelming or especially impressive. But if we all have a similar list, each with a different set of foci, imagine how much of a difference we can make!
* A note about the comments: I get so many spam messages from my site that my email has decided my site is only good for spam. So I’m very slow to notice when a new comment comes in. I apologize, and I will try to pay more attention.
In the next few days we’re releasing an update to Talpa Search—a major jump in Talpa’s ability to find books and other media within library catalogs.
Today we’re released a set of 200 “benchmark,” or test searches. Together with hundreds more, these are the searches we use to assess Talpa’s quality, test particular tweaks and features, and track Talpa’s improvement over time.
This set is named “What’s that book?” It consists of 200 searches, each of which has a single best answer. For example, the best answer to the search “prince harry memoir” is Spare . They were created by LibraryThing staff, generally about their own books, or books they know well.
By “best answer” we don’t mean the questions are all easy, or even clear. Some examples:
Talpa handles some other broad types, such as date-restricted titles (“1980s teen films”), author searches (“Lisa Carey”) and subjects (“persian art”), but these don’t have a single best answer, so they’re not included in this set.
We’ve included typos and spelling mistakes, because a good system should be able to handle these:
To mirror patron’s interest, half the set are to new books, published 2023–2024. The other half are published before 2023. Although Talpa Search can handle movies and music, only book searches are included in this set.
That’s it!
For fun, we’ve hidden the answers so you can test yourself. If you want all the answers unhidden, click show all answers. The searches are also available as a text file, here.
While doing the research for a future talk, I came across an obscure but impressively prophetic report entitled Accessibility and Integrity of Networked Information Collections that Cliff Lynch wrote for the federal Office of Technology Assessment in 1993, 32 years ago. I say "obscure" because it doesn't appear in Lynch's pre-1997 bibliography.
To give you some idea of the context in which it was written, unless you are over 70, it was more than half your life ago when in November 1989 Tim Berners-Lee's browser first accessed a page from his Web server. It was only about the same time that the first commercial, as opposed to research, Internet Service Providers started with the ARPANET being decommissioned the next year. Two years later, in December of 1991, the Stanford Linear Accelerator Center put up the first US Web page. In 1992 Tim Berners-Lee codified and extended the HTTP protocol he had earlier implemented. It would be another two years before Netscape became the first browser to support HTTPS. It would be two years after that before the ITEF approved HTTP/1.0 in RFC 1945. As you can see, Lynch was writing among the birth-pangs of the Web.
Although Lynch was insufficiently pessimistic, he got a lot of things exactly right. Below the fold I provide four out of many examples.
Page numbers refer to the PDF, not to the original. Block quotes without a link are from the report.
Disinformation
Page 66
When discussing the "strong bias in the Internet user community to prefer free information sources" he was, alas, prescient although it took more than "a few years":
The ultimate result a few years hence — and it may not be a bad or inappropriate response, given the reality of the situation — may be a perception of the Internet and much of the information accessible through it as the "net of a million lies", following science fiction author Vernor Vinge's vision of an interstellar information network characterized by the continual release of information (which may or may not be true, and where the reader often has no means of telling whether the information is accurate) by a variety of organizations for obscure and sometimes evil reasons.
In the novel, the Net is depicted as working much like the Usenet network in the early 1990s, with transcripts of messages containing header and footer information as one would find in such forums.
The downsides of a social medium to which anyone can post without moderation were familiar to anyone who was online in the days of the Usenet:
Usenet is culturally and historically significant in the networked world, having given rise to, or popularized, many widely recognized concepts and terms such as "FAQ", "flame", sockpuppet, and "spam".
...
Likewise, many conflicts which later spread to the rest of the Internet, such as the ongoing difficulties over spamming, began on Usenet.:
"Usenet is like a herd of performing elephants with diarrhea. Massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it."
— Gene Spafford, 1992
Earlier in the report Lynch had written (Page 23):
Access to electronic information is of questionable value if the integrity of that information is seriously compromised; indeed, access to inaccurate information, or even deliberate misinformation, may be worse than no access at all, particularly for the naive user who is not inclined to question the information that the new electronic infrastructure is offering.
This resonates as the wildfires rage in Los Angeles.
Although Tim Berners-Lee's initial HTTP specification included the status code 402 Payment Required:
The parameter to this message gives a specification of charging schemes acceptable. The client may retry the request with a suitable ChargeTo header.
the Web in 1993 lacked paywalls. But Lynch could see them coming (Page 22):
There is a tendency to incorrectly equate access to the network with access to information; part of this is a legacy from the early focus on communications infrastructure rather than network content. Another part is the fact that traditionally the vast bulk of information on the Internet has been publicly accessible if one could simply obtain access to the Internet itself, figure out how to use it, and figure out where to locate the information you wanted. As proprietary information becomes accessible on the Internet on a large scale, this will change drastically. In my view, access to the network will become commonplace over the next decade or so, much as access to the public switched telephone network is relatively ubiquitous today. But in the new "information age" information will not necessarily be readily accessible or affordable;
The 402 (Payment Required) status code is reserved for future use.
Instead today's Web is infested with paywalls, each with their own idiosyncratic user interface, infrastructure, and risks.
The Death Of "First Sale"
Lynch understood the highly consequential nature of the change in the business model of paid information access from purchasing a copy to renting access to the publisher's copy; from a legal framework of copyright and the "first sale" doctrine, to one of copyright and contract law (Page 30):
Now, consider a library acquiring information in an electronic format. Such information is almost never, today, sold to a library (under the doctrine of first sale); rather, it is licensed to the library that acquires it, with the terms under which the acquiring library can utilize the information defined by a contract typically far more restrictive than copyright law. The licensing contract typically includes statements that define the user community permitted to utilize the electronic information as well as terms that define the specific uses that this user community may make of the licensed electronic information. These terms typically do not reflect any consideration of public policy decisions such as fair use, and in fact the licensing organization may well be liable for what its patrons do with the licensed information.
The power imbalance between publishers and their customers is of long standing, and it especially affects the academic literature. In 1989 the Association of Research Libraries published Report of the ARL Serials Prices Project:
The ARL Serials Initiative forms part of a special campaign mounted by librarians in the 1980s against the high cost of serials subscriptions. This is not the first time that libraries have suffered from high serial prices. For example, in 1927 the Association of American Universities reported that:
"Librarians are suffering because of the increasing volume of publications and rapidly rising prices. Of special concern is the much larger number of periodicals that are available and that members of the faculty consider essential to the successful conduct of their work. Many instances were found in which science departments were obligated to use all of their allotment for library purposes to purchase their periodical literature which was regarded as necessary for the work of the department"
The oligopoly rents extracted by academic publishers have been a problem for close on a century, if not longer! Lynch's analysis of the effects of the Web's amplification of this power imbalance is wide-ranging, including (Page 31):
Very few contracts with publishers today are perpetual licenses; rather, they are licenses for a fixed period of time, with terms subject to renegotiation when that time period expires. Libraries typically have no controls on price increase when the license is renewed; thus, rather than considering a traditional collection development decision about whether to renew a given subscription in light of recent price increases, they face the decision as to whether to lose all existing material that is part of the subscription as well as future material if they choose not to commit funds to cover the publisher's price increase at renewal time.
Thus destroying libraries' traditional role as stewards of information for future readers. And (Page 30):
Of equal importance, the contracts typically do not recognize activities such as interlibrary loan, and prohibit the library licensing the information from making it available outside of that library's immediate user community. This destroys the current cost-sharing structure that has been put in place among libraries through the existing interlibrary loan system, and makes each library (or, perhaps, the patrons of that library) responsible for the acquisitions cost of any material that is to be supplied to those patrons in electronic form. The implications of this shift from copyright law and the doctrine of first sale to contract law (and very restrictive contract terms) is potentially devastating to the library community and to the ability of library patrons to obtain access to electronic information — in particular, it dissolves the historical linkage by which public libraries can provide access to information that is primarily held by research libraries to individuals desiring access to this information. There is also a great irony in the move to licensing in the context of computer communications networks — while these networks promise to largely eliminate the accidents of geography as an organizing principle for inter-institutional cooperation and to usher in a new era of cooperation among geographically dispersed organizations, the shift to licensing essentially means that each library contracting with a publisher or other information provider becomes as isolated, insular organization that cannot share its resources with any other organization on the network.
Surveillance Capitalism
Lynch also foresaw the start of "surveillance capitalism" (Page 60):
we are now seeing considerable use of multi-source data fusion: the matching and aggregation of credit, consumer, employment, medical and other data about individuals. I expect that we will recapitulate the development of these secondary markets in customer behavior histories for information seeking in the 1990s; we will also see information-seeking consumer histories integrated with a wide range of other sources of data on individual behavior.
The ability to accurately, cheaply and easily count the amount of use that an electronic information resource receives (file accesses, database queries, viewings of a document, etc.) coupled with the ability to frequently alter prices in a computer-based marketplace (particularly in acquire on demand systems that operate on small units of information such as journal articles or database records, but even, to a lesser extent, by renegotiating license agreements annually) may give rise to a number of radical changes. These potentials are threatening for all involved.
He described search-based advertising (Page 61)
The ability to collect not only information on what is being sought out or used but also who is doing the seeking or using is potentially very valuable information that could readily be resold, since it can be used both for market analysis (who is buying what) and also for directed marketing (people who fit a certain interest profile, as defined by their information access decisions, would likely also be interested in new product X or special offer Y). While such usage (without the informed consent of the recipient of the advertising) may well offend strong advocates of privacy, in many cases the consumers are actually quite grateful to hear of new products that closely match their interests. And libraries and similar institutions, strapped for revenue, may have to recognize that usage data can be a valuable potential revenue source, no matter how unattractive they find collecting, repackaging and reselling this information.
Of course, it wasn't the libraries but Google, spawned from the Stanford Digital Library Project, which ended up collecting the information and monetizing it. And the power imbalance between publishers and readers meant that the reality of tracking was hidden (Page 63):
when one is accessing (anonymously or otherwise) a public-access information service, it is unclear what to expect, and in fact at present there is no way to even learn what the policy of the information service provider is.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Paris' City-wide Synchronized Clock
[The Paris Pneumatic Clock] system was created in 1880 by Austrian engineer Victor Popp – and just 5 years later, thousands of these clocks were placed all over the city – in hotels, train stations, houses, schools and public streets. We modeled this incredible system and the special machine at the heart of it, to show you how a series of underground pipes and mechanical clocks kept an entire city in sync.
The 8-minute video companion to the above article is great to watch, too.
This is a marvel of engineering — synchronizing the clocks of a whole city through puffs of air traveling through pipes.
This system—accurate to a minute—was just 35 years before the sub-second precision required to synchronize the power grid, as described at the end of last week's issue.
Time is very different in Kathmandu
Most of the world is on a whole number of hours before or after UTC. About a fifth of the world by population is on a half-hour offset from UTC; in particular, India is 5h30m ahead of UTC. Nepal is 5h45m ahead of UTC
I first encountered this when setting up a Zoom meeting for colleagues in Kathmandu.
While most countries neatly set their clocks to full hour offsets (or, as noted in the quote above, a half-hour offset), Nepal ticks to its own clock with a 5-hour and 45-minute offset from UTC.
It's as if Nepal took a look at the standard time zones and said, "Why be ordinary when you can add a twist?"
Imagine trying to schedule a call back home, perplexed as you reconcile not just the time difference but—and here's the kicker—those extra 15 minutes that make Nepal unique.
Moon GPS is Coming
NASA and its partners in Europe and Japan are developing lunar satnav concepts that could be deployed by the end of the 2020s. In July, China’s National Space Administration (CNSA) unveiled its plans for a constellation of 21 communications and navigation satellites to support its lunar aspirations.
If you know how to tell time, you probably understand and use a.m. and p.m., and you might even know the terms come from Latin phrases. But do you know what exactly those phrases are, or what they mean in English?
So let's talk about the third cat in the house (after Alan in the last issue and Mittens in the issue before).
This is Pickle, a black-and-white Tuxedo cat with a drive for food that I've never witnessed in another cat.
Two stories from one recent afternoon:
First, when my wife got home from the grocery store, Pickle grabbed the bag of doughnuts from a canvas bag and made off with a big chunk of a long-john.
Then, when she was fixing dinner, Pickle jumped on the counter and made off with a hunk of steak.
My wife chased her around the dining room table, through the living room, and up the stairs to my daughter’s room.
I rushed to follow, and we trapped Pickle between the headboard and the wall.
My wife thinks the cat woofed down a sizable chunk of meat before we could catch her.
I run some lightweight privacy-respecting self-hosted analytics for my blog, so I know what my most popular posts were in 2024. It's hardly surprising that many of these were also published last year, but they include one from 2013 and another from 2018. Having a quick peek at my stats reminded me that the blog content that is most appreciated, shared and read is often not what you might think and some posts retain value over time. My most popular posts last year include a couple of write-ups of conferences I attended, a conference talk I gave, a highly personal reflection on the biggest single-day wave of people moving from Twitter to Mastodon (which is far and away the most-read blog post I've ever written), as well as a few technical descriptions of how to do specific things, and a post I wrote 11 years ago about 3D printers.
What I personally appreciate a great deal is blog posts outlining exactly how some technical thing works, or a step by step description of how someone did something. This also happens to be some of the most consistently popular content on my own blog - the top post last year is something of no relevance to my day job and about a topic I am not really an expert in. But it explains step by step how I did something that a lot of people want to know how to do, so it's useful to the world.
I want to read more stuff like this - helpful tips from human beings who aren't trying to sell anything and aren't just posting to get a cheap reaction on social media. You should get a blog.
I don't know what to write about
The great thing about having your own blog is that there are no rules you can write about whatever you want. I started off mostly throwing my uninformed opinions about librarianship into the void, but over the years I've written a lot of different things, just as the list above shows.
Liam's blog was originally a food blog but is now quite eclectic: brief commentary on the New South Wales planning scheme, observations about how emergency management differs between countries, notes about what he's reading, and the occasional nori roll recipe.
Jessamyn writes about whatever is on her mind which could mostly be described as "libraries and open culture" but covers a lot of ground.
Nobody would be interested in anything I have to say
Are things interesting to you? Did you learn something today? Congratulations, you have something to write about that will be interesting to someone else. I've written blog posts that I thought were well crafted and interesting, and have hardly any views. I've bashed out some half-arsed thoughts off the top of my head, and they've ended up being the most popular things I've ever published. Who knows, man? Just try not to defame anyone, and then put it out there.
I don't have time to blog regularly
Either do I, that's why I don't publish posts regularly. The same rule applies here as for what to write about - there are no rules! Liam publishes nothing for months, and then pumps out four posts in a week. I've had wildly different posting schedules over the years. Adam Mastroianni would rather trash his draft and publish nothing than post something he's not happy with just to keep a schedule. Ashley published one post a week for 39 weeks and then took five months off.
I'm worried big tech will steal my work for their AI
Great, I can't read it on Facebook or LinkedIn, because they're enclosed spaces that require a login to read them. This is also why I strongly urge against using something like Medium, which isn't really a blogging platform since it requires logging in to read posts. Mastodon and other fediverse software and platforms are better, but they're not blogs. You think differently when you're posting longer form content, using a platform that's designed for that.
Ok I'm convinced! How do I get a blog?
I recommend something that provides
easy publishing and editing functionality
RSS/Atom so people can subscribe without requiring you to use an email service
a reading experience that isn't interrupted by advertising
All the suggestions below offer these.
This is not actually necessary, but I strongly suggest you set up your own domain name (e.g. example.com) and set it to auto-renew so you don't accidentally lose it. Some webhosts provide domain registration as well, or you can do it separately. Somewhere like Gandi will get you started. Don't use GoDaddy.
Once you've done that, it all depends on how you prefer to do things, and what your budget is. Earlier this week I asked my Mastodon bubble for suggestions for first-time bloggers - thanks to everyone for your suggestions!
I just want to throw a normal file into Google Drive or Dropbox and have it magically turn into a blog post
USD$5 per month
I haven't used this myself, but Blot might be exactly what you're looking for. The demo on the website looks pretty impressive to me, and the price is attractive. You can use your own custom domain with Blot and it is fully managed for you. Once you've configured Blot, you publish by adding files and folders to a synced folder in Google Drive, Dropbox, or a Git repository, so you can use an application you already know to actually write your content, like MS Word or your favourite text editor.
I like the idea of hacking my own HTML file but don't care about having my own domain name
FREE
If you're too young to remember Geocities, or old enough and are still mourning its demise, then Neocities might be for you. Neocities is designed to be the 2020s version of Geocities: you write raw HTML and the example sites look kinda out there and glitchy because that's the point.
I want to use something with a WYSIWYG interface and a lot of support options
~AUD$5-$20 per month
A lot of people recommended hosted WordPress as the best option for most people. See my note further down about WordPress.com and why I do not recommend WordPress.com as a host. Whilst at the time of writing you may hear that "the WordPress world is in turmoil right now", the reality is that this is extremely unlikely to impact most owners of hosted WordPress sites: the argument is within the WordPress developer community and however it is resolved, it's in everyone's interest for WordPress users to barely notice and it's a piece of openly-licensed software rather than a platform that can just be switched off.
Reclaim Hosting comes highly recommended by many people over time. They're focussed on higher education in the USA but anyone can sign up for a personal plan at very attractive pricing. This is probably the best option for most people.
If you want something based in Australia with local support, a couple of different people recommended VentraIP. This will be more expensive than Reclaim even after accounting for currency exchange rates.
There are many other options - look for "Hosted WordPress". Generally what you get is "shared hosting" with "CPanel", which means your blog will be in a separated section of a web server also hosting several other websites, and you can use a web interface to configure things like the domain you use for your blog. Your chosen host will usually have good documentation on how to get set up.
I want to write in markdown and then press publish
USD$9 per month
Ghost was originally a Kickstarter project by a former WordPress core developer, but has developed quickly from there. Ghost can be used for both websites (e.g. 404 Media) and newsletters (e.g. Mita Williams' University of Winds). Ghost takes the clean and simple markdown-based approach of static site generators but removes all the nerdy futzing so it's more like the WordPress experience. Indeed whilst writing in markdown was originally the only way to use Ghost, it now offers a rich WYSIWYG writing interface as well, so you can compare Ghost and hosted WordPress to see which one you prefer. I published this blog using self-hosted Ghost for a while.
...but I don't want to pay for it
FREE
Publii is an open source static site generator (see below for more on this), but you can connect it to a free GitHub or GitLab Pages account to publish. Interestingly, publishing and configuring Publii works as a desktop application rather than a web interface, which is a little different to most of the options listed here and makes it a lot simpler for normal people than a commandline based system like I describe below.
Blogger is a free service from Google. It's quite bare-bones and really geared towards posting content to attract people to view ads where you share the revenue with Google, so the primary use of Blogger is by spam-blogs. As a Google product you also never know when it will join the Google graveyard. But if you're looking for something basic and free, Blogger was nominated by a couple of people in my unscientific survey, and you will be joining successful and interesting bloggers like Aaron Tay.
I need a frustrating hobby
FREE to ~ $USD10
If you're keen to have more control, you can look into using a static site generator (SSG). An SSG is essentially a commandline script that takes a bunch of input files and outputs a website - HTML files in directories, with all the relevant images, CSS and JavaScript and everything pointing to the right place. Different SSGs use different templating languages, but pretty much all of them use markdown in the page content file and convert it into HTML using an appropriate template.
My blog is made using Zola, but I've previously used Eleventy. To publish with an SSG you either need to use GitLab Pages (which works with most SSGs) or GitHub Pages (which only works with the Jekyll SSG); or have control over some space on some kind of webserver - either shared hosting (something with CPanel), or a standalone virtual private server (VPS). There's a bit of technical work involved to publish this way, so it's not surprising that a great many blogs published with SSGs start off with a couple of posts about how they set up their blog, and sometimes end there. If you want to procrastinate with your SSG setup instead of writing blog posts, this could be a great choice.
Why didn't you recommend the services I've heard of?
The WordPress world is currently experiencing some difficulties, after one of the original creators of WordPress, and owner of WordPress.com, Matt Mullenweg, seems to have taken leave of his senses. His behaviour has been so erratic over the last month that I cannot recommend using his company (WordPress.com/Automattic) to host your blog. I probably wouldn't have recommended this anyway, as I think Automattic is pretty aggressive at upselling to unsuspecting new users. Since the WordPress software is openly licensed, anyone else can use it and provide hosting for you, as I outlined above. The software itself is very robust and several of the other software options I suggest provide exports using the WordPress XML export standard.
Wix is the Yahoo Mail of blogging platforms, with a laggy, busy interface that is constantly upselling to you. It also doesn't provide an export function - if you start a Wix site you're essentially stuck paying Wix until they go bankrupt and your blog is deleted forever.
Squarespace was recommended to me as a good option that "just works" when I asked for suggestions on Mastodon. Squarespace does provide exports using the WordPress xml standard. At AUD$16 per month I don't consider Squarespace a good deal compared to the nearest alternative of hosted WordPress - it's not open source so the only host you can use for a Squarespace blog is Squarespace (although you can export your blog in the WordPress XML format to take it somewhere else). You're also at the mercy of Squarespace's corporate strategy.
What next?
Once you've set up your blog, you can add it to the list at ausglamr.newcardigan.org. Then every time you publish a blog post, it will be shared with the GLAMR world. You can add certain tags to your post if you don't want a particular post to be added to the Aus GLAMR feed.
Lucidworks AI empowers businesses to seamlessly integrate, manage, and optimize generative AI, driving innovation and efficiency while ensuring accuracy and responsible use.
It is time for another roundup of topics in storage that have caught my eye recently. Below the fold I discuss the possible ending of the HAMR saga and various developments in archival storage technology.
Heat-Assisted Magnetic Recording
Unless you have been tracking storage technology for many years, it is hard to appreciate how long the timescales are. My go-to example for communicating this is Seagate's development of HAMR.
Seagate first demonstrated HAMR in 2002. In 2008 they published this graph, predicting HAMR would supplant Perpendicular Magnetic Recording (PMR) starting in 2009.
Seagate has set a course to deliver a 48TB disk drive in 2023 using its HAMR (heat-assisted magnetic recording) technology, doubling areal density every 30 months, meaning 100TB could be possible by 2025/26. ... Seagate will introduce its first HAMR drives in 2020. ... a 20TB+ drive will be rolled out in 2020.
Seagate has been trialing 16TB HAMR drives with select customers for more than a year and claims that the trials have proved that its HAMR drives are "plug and play replacements" for traditional CMR drives, requiring no special care and having no particular poor use cases compared to the drives we're all used to.
Seagate’s biggest-ever hard drive is finally here, coming with 32TB of capacity courtesy of the company’s new HAMR technology (via Expreview).
It has almost been a year since Seagate said it had finally made a hard drive based on heat-assisted magnetic recording (HAMR) technology using its new Mozaic 3+ platform.
...
Exos drives based on Mozaic 3+ were initially released to select customers in small quantities, but now the general release is (nearly) here, thanks to mass production.
Note that the drives that are "(nearly) here" are still not available from Amazon, although they are featured on Seagate's web site.
Kevin Purdy writes:
Drives based on Seagate's Mozaic 3+ platform, in standard drive sizes, will soon arrive with wider availability than its initial test batches. The driver maker put in a financial filing earlier this month (PDF) that it had completed qualification testing with several large-volume customers, including "a leading cloud service provider," akin to Amazon Web Services, Google Cloud, or the like. Volume shipments are likely soon to follow.
There is no price yet, nor promise of delivery, but you can do some wishful thinking on the product page for the Exos M, where 30 and 32TB capacities are offered. That's 3TB per platter, and up to three times the efficiency per terabyte compared to "typical drives," according to Seagate.
Intevac has said there is strong interest in its HAMR disk drive platter and head production machinery from a second customer, which could indicate that Western Digital is now involved in HAMR disk developments following Seagate’s move into volume production.
Intevac supplies its 200 Lean thin-film processing machines to hard disk drive media manufacturers, such as Seagate, Showa Denko and Western Digital. It claims more than 65 percent of the world’s HDD production relies on its machinery. The Lean 200 is used to manufacture recording media, disk drive platters, for current perpendicular magnetic recording (PMR) disks.
Intevac’s main customer for HAMR-capable 200 Lean machines is Seagate, which first embarked on its HAMR development in the early 2000s. It is only this year that a prominent cloud service provider has certified Seagate’s Mozaic 3 HAMR drives for general use, more than 20 years after development first started. The lengthy development period has been ascribed to solving difficulties in producing drives with high reliability from high yield manufacturing processes, and Intevac will have been closely involved in ensuring that its 200 Lean machines played their part in this.
Archival Media
Maybe 2025 will be the year I can finally bring my 12-year-long series about HAMR shipment schedules to a close, 26 years after Seagate started work on the technology. Why have I been HAMR-ing on Seagate all these years, and again now? Not to denigrate Seagate's engineering. Getting a HAMR drive into volume production that meets both the incredibly demanding standards for storage media reliability and performance, and the manufacturing yields needed for profit, is an extraordinarily difficult feat. It is not a surprise that it took a couple of decades.
My criticisms have been aimed at the storage industry's marketing and PR, which hypes developments that are still in the lab as if they are going to solve customers' problems "next year". And at the technology press, which took far too long to start expressing skepticism. Seagate's marketing eventually lost all credibility, with their predictions about HAMR becoming an industry joke.
The situation is far worse when it comes to archival media. The canonical article about some development in the lab starts with the famous IDC graph projecting the amount of data that will be generated in the future. It goes on to describe the density some research team achieved by writing say a gigabyte into their favorite medium in the lab. This conveys four false impressions:
That there is some possibility they could scale their process up to a meaningful fraction of IDC's projected demand, or even to the microscopic fraction of the projected demand that makes sense to archive. There is no such possibility. Archival media is a much smaller market than regular media.
That the researcher's favorite medium could make it into the market in the timescale of IDC's projections. As we see with HAMR, where research started twenty-six years ago, it won't. Research on the silica medium started fifteen years ago. Research on the DNA medium started thirty-six years ago. Neither is within five years of market entry.
That even if the researcher's favorite medium did make it into the market it would be a product that consumers could use. As Kestutis Patiejunas figured out more than a decade ago, the only way to make the economics of archival storage work is to do it at data-center scale and harvest the synergies that come from not needing data-center power, cooling, staffing, etc.
That anyone either cares or even knows what medium their archived data lives on. Only the hyper-scalers do. Consumers believe their data is safe in the cloud. If anyone really cares about archiving they use a service such as Glacier, when they definitely have no idea what medium is being used.
Consumers already have an affordable, durable archival medium. As I have shown:
Surprisingly, with no special storage precautions, generic low-cost media, and consumer drives, I'm getting good data from CD-Rs more than 20 years old, and from DVD-Rs nearly 18 years old.
The market for DVD-R media and drives is gradually dying because they have been supplanted in the non-archival space by streaming, an illustration that consumers really don't care about archiving their data!
DNA Storage
In 2018's DNA's Niche in the Storage Market I imagined myself as the product marketing guy for an attempt to build a rack-scale DNA storage system, and concluded:
Engineers, your challenge is to increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
The only viable market for DNA storage is the data-center, and the two critical parameters are still the write bandwidth and the write cost. As far as I'm aware despite the considerable progress in the last 6 years both parameters are still many orders of magnitude short of what a system would have needed back then to enter the market. Worse, the last six years of data center technology development have increased the need for write bandwidth and reduced the target cost. DNA storage is in a Red Queen's Race and it is a long way behind.
Nevertheless, DNA's long-term potential as an archival storage medium justifies continued research. Among recent publications is Parallel molecular data storage by printing epigenetic bits on DNA by Cheng Zhang et al, which avoids the need to synthesize strands of DNA by attaching the bits to prexisting strands.
In principle this can be done in parallel.
As is traditional, they start by asserting:
The markedly expanding global data-sphere has posed an imminent challenge on large-scale data storage and an urgent need for better storage materials. Inspired by the way genetic information is preserved in nature, DNA has been recently considered a promising biomaterial for digital data storage owing to its extraordinary storage density and durability.
The paper attracted comment from, among others, The Register, Ars Technica and Nature. In each case the commentary included some skepticism. Here are Carina Imburgia and Jeff Nivala from the University of Washington team in Nature:
However, there are still challenges to overcome. For example, epigenetic marks such as methyl groups are not copied by the standard PCR techniques used to replicate DNA, necessitating a more complex strategy to preserve epi-bit information when copying DNA data. The long-term behaviour of the methyl marks (such as their stability) in various conditions is also an open question that requires further study.
Another challenge is that many applications require random access memory (RAM), which enables subsets of data to be retrieved and read from a database. However, in the epi-bit system, the entire database would need to be sequenced to access any subset of the files, which would be inefficient using nanopore sequencing. Moreover, the overall cost of the new system exceeds that of conventional DNA data storage and of digital storage systems, limiting immediate practical applications;
we stored 269,337 bits including the image of a tiger rubbing from the Han dynasty in ancient China and the coloured picture of a panda ... An automatic liquid handling platform was used to typeset large-scale data at a speed of approximately 40 bits s−1
This is interesting research but the skepticism in the commentaries doesn't exactly convey the difficulty and the time needed to scale from writing less than 40KB in a bit under 2 hours, to the petabyte/month rates (about 2.8TB every 2 hours) Facebook was writing a decade ago. This would be a speed-up of nearly 11 orders of magnitude to compete with decade-old technology.
The research, published in Nature Photonics, highlights that the breakthrough extends beyond density. It is said to offer significant improvements in write times – as little as 200 femtoseconds – and lives up to the promise that "a diamond is forever" by offering millions of years of allegedly maintenance-free storage. Diamonds are highly stable by nature and the the authors have claimed their medium could protect data for 100 years even if kept at 200°C.
High-speed readout is demonstrated with a fidelity of over 99 percent, according to the boffins.
Scientists have been eyeing diamonds as storage devices for a while. Researchers at City College of New York in 2016 claimed to be the first group to demonstrate the viability of using diamond as a platform for superdense memory storage.
These researchers, like so many others in the field, fail to understand that the key to success in archival storage is reducing total system cost. Long-lived but expensive media like diamonds are thus counter-productive.
Project Silica
I wrote about Microsoft's Project Silica last March, in Microsoft's Archival Storage Research. The more I think about this technology, the more I think it probably has the best chance of impacting the market among all the rival archival storage technologies:
The media is very cheap, so the effect of Kryder's Law economics driving media replacement and thus its economic rather than technical lifetime is minimal.
The media is quasi-immortal and survives benign neglect, so opex once written is minimal.
The media is write-once, and the write and read heads are physically separate, so the data cannot be encrypted or erased by malware. The long read latency makes exfiltrating large amounts of data hard.
The robotics are simple and highly redundant. Any of the shuttles can reach any of the platters. They should be much less troublesome than tape library robotics because, unlike tape, a robot failure only renders a small fraction of the library inaccessible and is easily repaired.
All the technologies needed are in the market now, the only breakthroughs needed are economic, not technological.
The team has worked on improving the write bandwidth which is a critical issue for archival storage at scale. They can currently write hundreds of megabytes a second.
Like Facebook's archival storage technologies, Project Silica enjoys the synergies of data center scale without needing full data center environmental and power resources.
Like Facebook's technologies, Project Silica has an in-house customer, Azure's archival storage, with a need for a product like this.
The expensive part of the system is the write head, because it uses costly femtosecond lasers. The eventual system's economics will depend upon the progress made in cost-reducing the lasers.
LibraryThing is pleased to sit down this month with poet and book publicist Kim Dower, who has worked with authors from Kristin Hannah to Paolo Coelho through her freelance literary publicity company, Kim-from-L.A. The City Poet Laureate of West Hollywood from October 2016 – October 2018, she is the author of five previous collections of poetry, including the bestselling I Wore This Dress Today for You, Mom (2022), which was praised by The Washington Post as a “fantastic collection.” Her first collection, Air Kissing on Mars (2010), was praised by the Los Angeles Times as “sensual and evocative… seamlessly combining humor and heartache.” Her work has appeared in literary publications such as Plume, Ploughshares, Rattle, The James Dickey Review, and Garrison Keillor’s “The Writer’s Almanac.” Her newest book, What She Wants: Poems on Obsession, Desire, Despair, Euphoria, will be published later this month by Red Hen Press. Dower sat down with Abigail to answer some questions about her work, and this new book.
What She Wants is your sixth poetry collection, and addresses the theme of obsessive love. What was the inspiration behind the book? Did it begin with a specific poem, a personal experience you wanted to explore, or something else?
I was reading an article (can’t remember where!) and came upon the word “Limerence.” I thought it was a beautiful sounding word, and it’s meaning, the state of being obsessively infatuated with someone, usually accompanied by delusions of or a desire for an intense romantic relationship with that person, fascinated me! I became obsessed with a word that meant to be obsessed! I realized I had many finished poems and many in the works that fit into this category, so I built a collection based on this idea and the four stages of limerence: infatuation, crystallization, deterioration and ecstatic release.
What makes poetry unique, as a form of literary expression? Is it just the structure that makes it different from prose, or does it communicate in different ways?
Because poetry is the most concise form of language, good poems will stir our emotions with a clarity and intensity that immediately takes hold in the reader. There’s an emotional honesty in poems that connects poet to reader to create a shared experience. It has been said that prose is like walking and poetry is like dancing. A single, short poem has the power to simultaneously comfort and terrify. The poet W.H. Auden says, “poetry is the clear expression of mixed feelings,” and this is true for the poet as she writes and the reader as well.
Can you tell us a little bit about your writing process? How does a poet begin a poem?
I don’t know how all poets begin a poem, but I begin one after being stirred or moved by something, something personal or something I’ve read or overheard. Or something I think is funny. I often read a news headline or hear something on the radio as I’m driving that immediately says THIS IS A POEM! I was once driving, listening to the local news, and the headline, talking about a new public school decision was, “They’re Taking Chocolate Milk Off the Menu!” I pulled over and wrote a poem with that title. Later, after it was published, Garrison Keillor read it on “The Writer’s Almanac.” Poems are everywhere and I use everything I see and hear as a prompt – whether it’s something whimsical that strikes me, or something more profound like hearing a dead parent speak to me.
How has working with so many different authors, through your activities as a publicist, affected your writing?
The only way working hard at a “day” job has affected my writing is I’m very focused when I sit down to write. I’ve learned how to separate the two kinds of work and my brain and mind like knowing and appreciate the difference!
You were Poet Laureate of the city of West Hollywood for two years. What sort of things did you do as a poet laureate?
It was so much fun creating different activities, readings and events and introducing people to poetry who otherwise never thought about it. My favorite project was creating a collaborative poem with people in the city. The City of West Hollywood is committed to the arts and supported all of my ideas. We designed a large pad with three prompts and I spent a few months asking strangers at local bookstores, cafes, parks, to participate in reading a prompt and writing some lines. People really enjoyed it and I created a powerful poem consisting of all their lines called, “I Sing the Body West Hollywood.” We made posters. We celebrated!
Who are some of your favorite poets, and how has their work influenced your own?
I have so many favorites and so many whose work has influenced my own. More than influence – whose work has given me permission to build my own voice. I love Frank O’Hara – New York School of Poets – who’s influenced my “conversational” often breezy style while still packing a punch! William Carlos Williams, whose poetry has taught me to strive to make each poem a “fine machine.” Erica Jong, Sharon Olds and Kim Addonizio, for their passion, beauty, perceptions; Thomas Lux, Ron Padgett, Stephen Dunn, for humor mixed with deep emotion and insight. W.H. Auden for his style. This list could go on and on.
Tell us about your library. What’s on your own shelves?
I have hundreds and hundreds of books! I love all kinds of fiction, biographies, memoir, but upstairs, in my “Poetry Palace” I have only poetry – books I’ve kept and carried for 50 years – from college through today. I have a marvelous collection from Shakespeare to contemporary poets. Occasionally, just to calm myself, I will sit on the floor and take a random book off the shelf, read one or two poems, and place it back. This morning, for example, it was Diane di Prima’s book, The Poetry Deal. I read from it aloud. Now I can go on with my day.
What have you been reading lately, and what would you recommend to other readers?
I’m re-reading Vivian Gornick’s amazing, gorgeous memoir, Fierce Attachments, about her relationship with her mother. It’s a classic and each time I read it I discover something else – not only about her – but about myself.
My poet friend, Nina Clements – who was also a Librarian – sent me a book called Monsters by Claire Dederer, which I’m enjoying, about the link between genius and monstrosity. How do we balance our love of some artists knowing the awful things they’ve done. This is a subject that constantly fascinates me.
Continuing the review of 2024, the following summaries the activities of the NDSA Interest and Working Groups activities. Please have a look at NDSA’s accomplishments – and feel free to reach out to NDSA with any questions on how you can get involved!
Interest Groups
Content Interest Group
For the year, the Content Interest Group met quarterly on the first Thursday of the month at 12:00pm EST. We identified topics of interest through an ongoing but now defunct jamboard! We held 4 meetings utilizing various formats to facilitate the exchange of information. In February we held a Content Exchange about how your organization manages the access levels of digital content in your reading rooms, physical and virtual. Due to the success of the first Content Exchange, we held another one in May on how your organizations are increasing representation or including under-represented groups in your collections. In August, we switched it up with presentations by metadata experts and discussion on understanding metadata standards and ways to incorporate them in our work. Julie Shi, Digital Preservation Librarian, Scholars Portal, University of Toronto Libraries, discussed METS. Leslie Johnston, Director of Digital Preservation, U.S. National Archives and Records Administration discussed PREMIS. We winded the year down with our final meeting in November with a discussion using content to show impact of preservation or the risk of loss.
Infrastructure Interest Group
For the Infrastructure Interest Group, 2024 began with a discussion led by the founders of the AEOLIAN Network, a project whose focus was “to investigate the role that AI can play to make born-digital and digitised cultural records more accessible to users.” Its outcomes included multiple workshops, case studies and journal publications, all of which focused on the larger community’s use of AI in this space. During its next two quarterly meetings, group members presented on their unique requirements and solutions surrounding content staging areas and repository ingest workflows. We listened to in-depth descriptions of workflows from the University of Alabama, Birmingham, the University Libraries at Ohio State University and finally the UW Digital Collections Center, University of Wisconsin, Madison. Our final meeting of the year introduced the Internet Archive’s Vanishing Culture: A Report on Our Fragile Cultural Record as a reading selection, from which several thought-provoking essays were brought forward for discussion by group members.
Standards & Practices Interest Group
The Standards & Practices Interest Group met quarterly on the first Monday of the month at 1:00 p.m. Eastern. Our topics for this year included: Digital preservation system migration; The language of the cloud; Selection for preservation; and Persistent identifiers and preservation. Michael Dulock presented his experience with migrating preservation systems at UC-Boulder for our January meeting. Our subsequent meetings were member-driven discussions. By far the most engaging and well-attended discussion was our exploration of “the language of the cloud” at our April 1, 2024 meeting. We shared experiences with outsourcing infrastructure to major cloud-based vendors (AWS, Azure, etc), and how that has impacted our preservation practices. Out of this discussion, we formed a sub-group to develop a survey on cloud-based infrastructure practices across the membership, which included members of the Infrastructure Interest Group. With the release of the NDSA Storage Survey, we will resume work on the Cloud Services sub-group in 2025, with a follow-up discussion scheduled for our first meeting S&P IG on January 13, 2025.
Working Groups
Communication and Publications Working Group
The Communications and Publications Working Group (CAPs) worked with the Coordinating Committee and chairs of Interest and Working Groups to update internal documentation and website content and publish blog posts and reports. CAPs works with survey Working Groups to edit and publish the reports, sometimes working on statistical analysis quality assurance. This year CAPs developed additional guidelines around accessibility for report preparation.
Climate Watch Working Group
The Climate Watch Working Group had a productive year establishing our workflows, clarifying our publication criteria and objectives, and setting up our publication platform. We hope to release our inaugural publication early in 2025, so keep an eye out for announcements!
Events Strategy Working Group
Beginning in Spring 2024, the Events Strategy Working Group (ESWG) focused on a framework for operationalizing recommendations from a previous planning group. They held monthly meetings, with breakout discussions that focused on working groups working on plans for a National Conference and Designated Communities. Despite some uncertainties about NDSA’s organizational affiliation, ESWG plans to deliver three key items by April 2025: (1) a charge for a standing Events Steering Committee to manage NDSA’s overall events strategy and serve as a liaison to annual conference committees; (2) an annual meeting toolkit with recommendations for both in-person and online events (with in-person events to resume in 2027 and coincide with NDSA Excellence Awards), and an action plan to encourage local and regional communities of practice with affiliated organizations and institutions. This action plan will define a process for developing an experts list and a speaker’s bureau to support digital preservation activities and workshops as well as a mechanism for endorsing digital preservation panels at affiliated events.
Excellence Awards
In 2024, the Excellence Awards Working Group utilized the year without an awards cycle to promote EAWG through blogs and video clips. Blogs were published via SAA’s bloggERS and the NDSA blog. Video clips were uploaded to the NDSA YouTube channel, and an Excellence Awards playlist was created to group them. Further blogs and clips are scheduled into 2025.
In addition, EAWG co-chairs drafted an Overview and Guidelines for the EAWG, which has been reviewed by the Communications and Publications Working Group. The EAWG projects finalizing this document in 2025. Finally, Jessica Venlet accepted the position of EAWG Co-chair for 2025-2027.
Levels of Digital Preservation
This year a new Levels Revision Working Group was formed with a remit to carry out a focused review of the Levels looking specifically at the environmental impact of the Levels. Look out for further news on this work in 2025.
This year we have run four Open Sessions: in January we held a general Q&A on the Levels, in April we focused on the Curation Guide, in July the topic was the Assessment Tool and October’s session provided a general introduction to the Levels aimed at those who hadn’t used them before. The Levels Steering Group also gave a presentation on Levels at the Virtual DigiPres conference.
We have seen a few changes on the Levels Steering Group, welcoming new members Rebecca Fraimow, Elizabeth La Beaud and Keith Pendergrass. Karen Cariani left the Steering Group to focus on other priorities and we thank her for all her hard work.
Membership Working Group
The Membership Working Group worked on a new process for onboarding new members to NDSA, and produced a comprehensive report with six detailed proposals designed to increase engagement of members. One of these proposals focused on having a standing Membership Working Group, who would provide onboarding and ongoing membership support. This group will launch in January 2025.
Storage Survey
The working group published the 2023 storage infrastructure report in October, along with anonymized survey results, the survey codebook, and a crosswalk between the 2019 and 2023 survey questions. Working group members presented the survey results at iPRES, DLF, and SAA’s Research Forum.
I'm about halfway through Saul Griffith's 2021 Electrify: An Optimist's Playbook for Our Clean Energy Future, and I find the author makes a compelling point about bringing nearly everything—energy creation, transmission, and use—to a common factor of "electricity" and then optimizing that system.
There are many interesting problems to solve, but they seem solvable.
In last week's Thursday Threads, I touched on how data centers impact the electrical grid.
This week's issue looks further into how electricity is generated and distributed.
The first article reflects back on the data center topic—it could have just as easily gone in last week's issue.
Then there are a few other articles on the generation, storage, the flip away from carbon-based fuels, and a look at history.
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page.
If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Commercial Electricians in Demand for Data Center Construction
These traveling electricians are transforming the sagebrush here in central Washington, with substations going up on orchards and farmland. Hundreds have come to a triangle of counties tied together by hydropower dams along the Columbia River. They are chasing overtime and bonuses, working 60-hour weeks that can allow them to make as much as $2,800 a week after taxes. For all the hype over $100,000 chips and million-dollar engineers, the billions pouring into the infrastructure of A.I. is being built by former morticians, retired pro football linebackers, single moms, two dudes described as Gandalf in overalls, onetime bouncers and a roving legend known as Big Job Bob.
The New York Times publishes this in-depth piece about the boom time for commercial electricians (or anyone who wants to train to become one).
Data centers require substantial electrical power to support the high computing needs of artificial intelligence and the storage to save your New Year's Eve photos (as well as the power to run the cooling systems for those computers).
Although AI has propelled the construction of data centers to a sharper slope, significant building and expansion projects were already underway.
This article is a view at the intersection of traditional construction/labor, technology, land use, and economic growth.
Instability Cause By Overgeneration of Rooftop Solar
[The Australian Energy Market Operator] said the ever growing output from solar was posing an increasing threat to the safety and security of the grid because it was pushing out all other forms of generation that were needed to help keep the system stable. And it warned that unless it had the power to reduce — or curtail — the amount of rooftop solar times, more drastic and damaging measures would need to be taken. These could include increasing the voltage levels in parts of the poles-and-wires network to "deliberately" trip or curtail small-scale solar in some areas. An even more dramatic step would be to "shed" or dump parts of the poles-and-wires network feeding big amounts of excess solar into the grid.
Electricity is unique in that the providers must exactly match the demand at every moment.
Excess generation capacity must be removed from the grid...it is just as bad as too little electricity.
(Storage of excess electricity is a topic all its own; see below.)
In Australia, the rapid growth of solar power generation is making it difficult for the grid operator to achieve that balance.
Rooftop solar is great, but having that energy dumped uncontrolled back onto the grid causes instability.
(That isn't the only problem on the grid...there are devices that, as Grady of Practical Engineering says, "force the grid to produce power and move it through the system, even though they aren’t even consuming it.")
Generating Power from the Tides
Solar energy is the bedrock of most renewable energy grid plans – but lunar energy is even more predictable, and a number of different companies are working to commercialize energy generated from the regular inflows and outflows of the tides. One we&aposve completely missed is Minesto, which is taking a very different and remarkably dynamic approach compared to most. Where devices like Orbital&aposs O2 tidal turbine more or less just sit there in the water harvesting energy from tidal currents, Minesto&aposs Dragon series are anchored to the sea bed, and fly around like kites, treating the currents like wind.
The problem with variable sources like solar and wind is the need for a baseline supply of always-there electricity.
Coal, natural gas, and nuclear are good at meeting that baseline power need.
Tidal systems are a clean, constant source of energy as well.
Smarter Grid Reduces Demand As Required
On the morning of April 3, Taiwan was hit by a 7.4 magnitude earthquake. Seconds later, hundreds of battery-swap stations in Taiwan sensed something else: the power frequency of the electric grid took a sudden drop, a signal that some power plants had been disconnected in the disaster. The grid was now struggling to meet energy demand. These stations, built by the Taiwanese company Gogoro for electric-powered two-wheeled vehicles like scooters, mopeds, and bikes, reacted immediately. According to numbers provided by the company, 590 Gogoro battery-swap locations ... stopped drawing electricity from the grid, lowering local demand by a total six megawatts—enough to power thousands of homes. It took 12 minutes for the grid to recover, and the battery-swap stations then resumed normal operation.
In addition to managing the supply, there also needs to be advancements in managing the demand side.
Businesses already do this...their flexibility to reduce their electricity usage during high-demand events results in cheaper electricity rates because the utility doesn't need to build as much capacity just-in-case.
This kind of variable pricing is also available to some homeowners.
However, technology on the grid can help support this as well.
This article talks about a scooter battery charging company that automatically takes equipment offline when generation capacity unexpectedly drops.
Imagine this same sort of grid intelligence available for e-vehicle charging stations as well.
Storing Energy in Mine Shafts
One of Europe’s deepest mines is being transformed into an underground energy store. It will use gravity to retain excess power for when it is needed. The remote Finnish community of Pyhäjärvi is 450 kilometres north of Helsinki. Its more than 1,400-metre-deep zinc and copper Pyhäsalmi mine was decommissioned but is now being given a new lease of life by Scotland-based company Gravitricity. The firm has developed an energy storage system that raises and lowers weights, offering what it says are “some of the best characteristics of lithium-ion batteries and pumped hydro storage”.
Solar panels only produce power when the sun is out, and wind turbines only produce power when the wind blows.
We will need a way to store energy during times of overproduction and send it out to the grid when demand requires it.
Many technologies are being explored to use excess energy to pump water uphill or spin a heavy flywheel.
The technique in this article raises weights in a deep mine shaft to store energy.
Storing Energy as Compressed Air
Toronto-based Hydrostor Inc. is one of the businesses developing long-duration energy storage that has moved beyond lab scale and is now focusing on building big things. The company makes systems that store energy underground in the form of compressed air, which can be released to produce electricity for eight hours or longer.
Another potential storage solution is compressed air.
All of these systems have trade-offs of expense versus capacity versus location requirements and other factors.
Some of these experiments will succeed, and some won't be commercially viable.
The Last Coal-fired Powerplant in Hawaii is Replaced
Hawaii shut down its last coal plant on September 1, 2022, eliminating 180 megawatts of fossil-fueled baseload power from the grid on Oahu — a crucial step in the state’s first-in-the-nation commitment to cease burning fossil fuels for electricity by 2045. But the move posed a question that’s becoming increasingly urgent as clean energy surges across the United States: How do you maintain a reliable grid while switching from familiar fossil plants to a portfolio of small and large renewables that run off the vagaries of the weather? Now Hawaii has an answer: It’s a gigantic battery, unlike the gigantic batteries that have been built before.
With new generation and storage technologies, where does that leave the traditional burning-carbon-based tools?
Fortunately, not long for this world.
The Rise of Renewables
Wind and solar generated more power than coal through the first seven months of the year, federal data shows, in a first for renewable resources. The milestone had been long expected due to a steady stream of coal plant retirements and the rapid growth of wind and solar. Last year, wind and solar outpaced coal through May before the fossil fuel eventually overtook the pair when power demand surged in the summer. But the most recent statistics showed why wind and solar are on track in 2024 to exceed coal generation for an entire calendar year — with the renewable resources maintaining their lead through the heat of July.
It would seem that the momentum away from burning carbon fuels is well established.
I hope it is established enough to deal with the instability that could be caused by the incoming U.S. federal administration.
This Clock Made Power Grids Possible
On 23 October 1916, an engineer named Henry E. Warren quietly revolutionized power transmission by installing an electric clock in the L Street generating station of Boston’s Edison Electric Illuminating Co. This master station clock kept a very particular type of time: It used a synchronous self-starting motor in conjunction with a pendulum to help maintain the station’s AC electricity at a steady 60-cycle-per-second frequency. As more power stations adopted the clocks, the frequency regulation allowed them to share electricity and create an interconnected power grid.
Before there was a grid, there were many isolated islands of power generation.
The "alternating" part of "alternating current" meant that these islands couldn't be connected until the cycles of alternation could be synchronized.
We take 60-cycles-per-second for granted now, but it wasn't always this way.
From Energy to No Energy
This has become Alan's routine in the morning.
It is far too cold—and now far too snowy—to work outside on the patio.
So Alan sleeps through the long winter days on my keyboard numeric pad until spring.