
Are we walking the talk? A snapshot of how academic LIS journals are (or aren’t) enacting disciplinary values / In the Library, With the Lead Pipe

By Rachel Borchardt, Symphony Bruce, Amanda Click, and Charlotte Roh

In Brief 

The academic library field claims to value social responsibility, open access, equity, diversity, and inclusion (EDI). But academic library journal practices do not always reflect these values. This article describes a mixed-method study designed to operationalize and measure these values in practice. We found that many of the journals we examined have open access policies and practices but fall short in providing for accessibility and ensuring EDI in their publication processes. In short, library journals are not meeting the ideals that our own field has defined.  

Introduction

There is a great deal of discussion about our disciplinary values in the library and information science (LIS) field. We prioritize values like access to information, diversity and democracy, and lifelong learning and social responsibility (American Library Association, 2019). Librarians urge our professor colleagues to publish their scholarship in open access journals and teach with open educational resources. We form diversity committees and inclusive reading groups. We talk about pushing back on the systems (e.g., scholarly publishing) and institutions (e.g., universities) that perpetuate inequity. There are many ways to explore whether the LIS field is living up to these ideals; this multi-method study focuses on these values in LIS scholarly communications, particularly academic librarianship journals. It attempts to summarize the general state of academic library journals, including trends in publishers, indexing, and metrics. In addition, it explores whether these journals are demonstrating a commitment to the values of the LIS profession, namely open access and equity, diversity, and inclusion (EDI).1 

Our research questions are as follows:

  • What is the general state of academic library journals, including trends in publishers, indexing, and metrics? 
  • Are the core academic library journals demonstrating a commitment to the values of the LIS profession, namely openness, inclusion, and equity? 

Literature Review

The LIS Scholarly Landscape

The research that explores the state of the LIS scholarly landscape often focuses on bibliometrics. For example, Sin (2011) investigated the relationship between international coauthorship and citation impact. Walters and Wilder (2015) explored “the contributions of particular disciplines, countries, and academic departments” to the LIS literature. Bibliographic analysis of specific journals is popular, including Library Management Journal (Singh & Chander, 2014), JASIST (Bar-Ilan, 2012), and Journal of Documentation (Tsay & Shu, 2011). 

Another common theme is LIS research content analysis. Kim and Jeong (2006) analyzed the development and use of theory in LIS research. A recent paper summarized research topics and methods across 50 years of LIS research (Järvelin & Vakkari, 2021). Onyancha (2018) used author-supplied keywords to map the evolution of LIS research. 

Only a few attempts have been made to identify and/or evaluate the LIS journal corpus. As Nixon observed in 2014, “In library and information science (LIS) there is no professionally accepted tiered or ranked list of journals in the United States,” which “creates a dilemma for librarian-authors who wish to expand the literature in librarianship, write about successful programs, or report on research findings” (p. 66). She proposes a methodology that takes into account expert opinions, circulation and acceptance rates, impact factors, and h-indexes.

In 2005, Nisonger and Davis replicated a 1985 study by Kohl and Davis that identified and ranked 71 LIS journals by surveying LIS program deans and Association of Research Libraries (ARL) library directors. The resulting list demonstrates a “hierarchy of prestige” based on the perceptions of a small, albeit knowledgeable, group of people in the LIS field at that time.

EDI & the Professional Values of Academic Librarians 

Recent documents published by ACRL, a division of the American Library Association (ALA) and the professional association for academic librarians, clearly encourage a commitment to EDI. The ACRL Plan for Excellence includes seven core organizational values, one of which is equity, diversity, and inclusion. The original document, approved in 2011, included simply diversity. The first version to replace diversity with equity, diversity, and inclusion was approved in November 2018. Following an ALA task force report in 2017, equity, diversity, and inclusion recommendations were approved for implementation in February 2018, which led to the inclusion of EDI in the ALA Policy Manual.

In 2019, ACRL published an updated scholarly communications research agenda. This report, Open and equitable scholarly communications: creating a more inclusive future, calls for the scholarly communications and research environment to be more open, inclusive, and equitable and defines these concepts:  

Open… refers to removing barriers to access and encouraging use and reuse, especially of the tools of production of scholarly content and of the outputs of that work.

Inclusive… refers to (1) creating opportunities for greater participation in systems, institutions, and processes involved in creating, sharing, and consuming research; (2) removing barriers that can hinder such participation; and (3) actively encouraging and welcoming people to participate, particularly those whose voices have often been marginalized.

Equitable… refers to ensuring that systems, institutions, and processes function in a way that gives everyone what they need in order to successfully participate (ACRL, 2019b).

In a statement updated in 2019 and focused specifically on open access, ACRL recommends that “academic librarians publish in open access venues, deposit in open repositories, and make openly accessible all products across the lifecycle of their scholarly and research activity, including articles, research data, monographs, presentations, digital scholarship, grant documentation, and grey literature.” In addition, the organization urges that “librarians who are editors, reviewers, authors, grantees, or digital scholars should advocate open models of creation and dissemination with publishers, funding agencies, and project or program managers” (ACRL, 2019a). 

Open Access in LIS 

Open access (OA) is “the free, immediate, online availability of research articles coupled with the rights to use these articles fully in the digital environment” (SPARC, n.d.). In 2016, Ollé Castellà, López-Borrull, and Abadal surveyed the editors of 212 LIS journals indexed in Scopus and the Web of Knowledge. While only 10% of the journals in the sample were fully OA at the time, the respondents believed that OA funded via institutional support would become the most common model in the short term. A 2016 article examined the open access status of five ALA division peer-reviewed journals. Three of the five were fully OA, providing “unrestricted access to published content” (Hall, Arnold-Garza, Gong, & Shorish, 2016, p. 659). The fourth was reportedly transitioning from a green to a gold OA model (although it does not appear to have done so yet), and the fifth had no plans to adopt an OA model (see the Defining Openness, Inclusion, and Equity Practices subsection of the Method section for definitions of OA types). 

Mercer (2011) found that 49% of articles written by academic librarians and appearing in LISA in 2008 were available OA. Authors in the study categorized as Other (e.g., public librarians, LIS faculty) published OA 37% of the time. In a more recent survey of academic librarians, 50% of respondents indicated that they considered open access status when selecting a journal for publication. However, only 6% named OA as their top consideration, and many expressed concern about funding for article processing charges (APCs)2 and promotion and tenure expectations (Neville & Crampsie, 2019). 

Journals’ open data policies and practices have also been examined in several disciplines, including LIS. Jackson (2021) studied the strength, level of detail, and compliance of open data policies in over 200 LIS journals and found that the strongest policies came from independent publishers. Commercial publishers had more uniform but relatively weaker, vaguer, and less comprehensive policies, and in many instances LIS journals chose to adopt weaker policies from the range of options their commercial publisher made available.

Inclusion & Equity in Publishing

The Committee on Publication Ethics (COPE) is a non-profit that supports publication ethics, including guidance on topics such as transparency and open access (n.d.). COPE recently surveyed members regarding EDI and publication ethics to help identify priority areas for support. In 2021, COPE hosted an EDI-focused webinar for its members and released updated, freely available guidance on diversifying editorial boards. COPE membership is voluntary; the organization has over 13,000 members, primarily individual publishers.

In 2018, the Library Publishing Coalition released An Ethical Framework for Library Publishing, a document designed to center publishing practices around library values. Each section of the Framework includes an introduction, scope statement, existing resources, and recommendations. Practical recommendations in the section on diversity, equity, and inclusion include:

  • Create a diversity statement for the publishing program or point to the library’s diversity statement;
  • Educate graduate students and faculty on systemic biases in academic publishing and strategies to dismantle barriers;
  • Provide access to your publications to diverse audiences through direct promotion in diverse communities and open or reduced cost to access content (Library Publishing Coalition, 2018). 

The Coalition for Diversity and Inclusion in Scholarly Communications (C4DISC) was founded in 2019 by trade and professional associations that represent organizations and individuals working in scholarly communications, in order to address issues of diversity, equity, and inclusion. Its work thus far includes a joint statement of principles, two anti-racist toolkits, and several webinar workshops. Involvement is tiered, with formal members, sponsoring partners, and individual donors. 

Charlotte Roh, who is a co-author of this article, has written extensively on the lack of diversity in scholarly publishing, librarianship, and academia in general. Her work has been cited broadly in the last few years in the publishing industry. In an early C&RL News scholarly communications column, she observed:

As librarians who are engaging more directly with scholarly publishing, we must ask ourselves: Are we perpetuating the biases and power structures of traditional scholarly publishing? Or are we using library publishing to interrogate, educate, and establish more equitable models of scholarly communication? As librarians, we can be explicit about inequalities in scholarly publishing. We can take action to avoid reproducing them in our unique roles as publishers, scholarly communication experts, and information literacy providers (Roh, 2016, p. 85).

Inefuku and Roh (2016) argue that librarians can play an important role in advocating for social justice and diversity in scholarly communications. Libraries can host and publish new journals that specifically include marginalized voices, in an effort to disrupt traditional academic publishing. As journal authors, editors, and reviewers, librarians can push for open access policies and editorial board diversity. And librarians can educate and advocate, cultivating “an open access-oriented mind-set in the next generation of scholars” and addressing information access disparities.  

Research Metrics and Academic Librarianship

In November 2020, the ACRL Executive Board approved the ACRL Framework for Impactful Scholarship and Metrics, which establishes a suggested framework for the evaluation of academic librarian scholarship (Borchardt et al., 2020). This framework includes a variety of article-level metrics, including not only citations as an indicator of scholarly impact but also downloads, views, shares, mentions, and comments as potential indicators of practitioner impact. Librarians wishing to use this framework to demonstrate the impact of their scholarly output need access to these metrics, many of which are typically provided directly from publishers.

Method

Our study design involved a collection and analysis of content from several sources, including academic library journal websites, followed by a survey of journal editors. Prior to data collection, we compiled a comprehensive list of relevant journals and operationalized the concepts of open, equitable, and inclusive principles. In the first stage of the study, we collected non-exhaustive information about inclusive and equitable practices from each journal’s website but did not seek additional information from sources such as social media or journal editorials. In the second stage, we sent a survey to all journal editors, asking them to elaborate on relevant practices and policies. 

Journal Selection

The goal of this study was to analyze the state of academic library journals, with a focus on the EDI values of the subfield. Unfortunately, there is no universally agreed-upon corpus of scholarly journals for the academic library field. We considered building a journal list using Nixon’s (2014) methodology for ranking LIS journals, or Nisonger and Davis’s (2005) ranking of journals based on the perceptions of ARL library directors and LIS deans. However, we determined that San Jose State University’s LIS Publications Wiki (n.d.) offered a more inclusive set of journals. The Wiki’s inclusion criteria are not published, and the list is global in scale but not comprehensive. Despite these limitations, other scholarly communications researchers have also found this wiki to be an appropriate and thorough resource (Vandegrift & Bowley, 2014). We began with the Wiki’s journal list and applied the inclusion and exclusion criteria in Table 1.

Inclusion criteria | Exclusion criteria
Scholarly journal | Not scholarly (e.g., trade publication)
Peer reviewed | Not peer reviewed
Academic library focus, demonstrated by any (but not necessarily all) of the following: academic librarians are a primary, though not necessarily exclusive, audience; the publication is owned by ACRL; academic librarianship is in the title | Primary focus is something other than academic libraries (e.g., information science, public libraries)
Issue(s) published in the last 12 months | No issue published in the last 12 months

Table 1. Inclusion and exclusion criteria for journals selected for this study. 

One title, Journal of Librarianship and Scholarly Communication, was manually added to the list. The final list of 78 journals is included in Appendix A.

Defining Openness, Inclusion, and Equity Practices

Based on the 2019 ACRL Open and equitable scholarly communications report’s definitions of open, inclusive, and equitable, we developed a checklist of practices and policies that demonstrate an ongoing commitment to these practices, as shown in Table 2. Open access categories were defined as gold, green, platinum, hybrid, and bronze, based on commonly used criteria (“Open access,” 2021). 

Characteristic: Open
Definition: “removing barriers to access and encouraging use and reuse, especially of the tools of production of scholarly content and of the outputs of that work.”
Demonstrated in journals:
  • Does the journal offer OA publishing options? What type?
  • Does the journal encourage the open sharing of data?
  • Does the journal apply Creative Commons licenses and/or confer ownership to the author(s) by default?

Characteristic: Inclusive
Definition: “(1) creating opportunities for greater participation in systems, institutions, and processes involved in creating, sharing, and consuming research; (2) removing barriers that can hinder such participation; and (3) actively encouraging and welcoming people to participate, particularly those whose voices have often been marginalized.”
Demonstrated in journals:
  • Does the journal actively and continuously recruit reviewers or editors from underrepresented groups?
  • Does the journal actively and continuously encourage authors from underrepresented groups to submit manuscripts?
  • Does the journal waive publishing fees for demonstrated need?
  • Does the journal demonstrate flexibility in accepted research processes and scholarly output formats?
  • Does the journal ensure that the journal website is accessible for all users (e.g., ADA compliant, all article formats compatible with assistive technology)?
  • Does the journal ensure that the journal backend is accessible for all reviewers, authors, and editors (e.g., WCAG compliant)?
  • Does the journal provide professional development for journal workers to ensure inclusive practices (e.g., anti-bias training)?
  • Do the journal’s author guidelines encourage inclusive language (e.g., they/them) and a variety of writing styles?

Characteristic: Equitable
Definition: “ensuring that systems, institutions, and processes function in a way that gives everyone what they need in order to successfully participate.”
Demonstrated in journals:
  • Does the journal provide additional assistance to authors (e.g., language support, proofreading, mentoring, alternate contacts in case of problems)?
  • Does the journal formally recognize the work of everyone who has contributed to scholarly output (e.g., open peer review, crediting contributors such as research assistants)?
  • Does the journal pay editors, authors, and/or reviewers for their labor?

Table 2. Open, inclusive, and equitable practices checklist of questions for journal practice.

Database Data Collection

We used Ulrich’s Periodicals Directory to collect indexing information for the journals and Cabell’s to collect acceptance rate information. Open access status data was collected from both Cabell’s and publisher websites because Cabell’s only lists one open access category per journal.

Unpaywall Data Collection

We contacted Unpaywall to request repository data for the past three years for all journals included in our study. This data included repository rates above 0% for several journals that we had not identified as green journals. We excluded one journal with a 0% repository rate that we had not classified as green, but otherwise assumed that repository rates for these journals were due either to incomplete access to data or to university mandates overriding journal-level green OA policies.
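For readers who want to run a similar check themselves, below is a minimal sketch (in Python) of pulling per-article OA locations from Unpaywall’s public REST API and tallying a repository deposit rate. We received our data directly from the Unpaywall team rather than through this route, and the DOI list and contact email in the sketch are placeholders, not values from this study.

```python
# Minimal sketch: query the public Unpaywall REST API for a list of DOIs and
# tally how many articles have a copy deposited in a repository (green OA).
# The DOIs and contact email below are placeholders, not data from this study.
import requests

DOIS = ["10.XXXX/example-doi-1", "10.XXXX/example-doi-2"]  # hypothetical DOIs
EMAIL = "you@example.edu"  # Unpaywall asks for a contact email on each request

def repository_rate(dois, email):
    deposited = 0
    for doi in dois:
        resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": email})
        resp.raise_for_status()
        record = resp.json()
        # An article counts as repository-deposited if any OA location is
        # hosted by a repository rather than by the publisher.
        locations = record.get("oa_locations") or []
        if any(loc.get("host_type") == "repository" for loc in locations):
            deposited += 1
    return deposited / len(dois) if dois else 0.0

if __name__ == "__main__":
    print(f"Repository deposit rate: {repository_rate(DOIS, EMAIL):.1%}")
```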

Journal Website Data Collection

Statements, policies, or other documentation of open, equitable, and/or inclusive practices were collected from each journal’s website, along with the metrics available for individual articles. Interpretation of this information, such as whether a policy on a journal’s website demonstrated open, inclusive, and/or equitable practice, was generally conducted by one member of the research team at a time, with two team members consulting as needed on unusual or unclear cases. In the case of OA classification and copyright, some journal websites clearly stated that articles carried Creative Commons or similar licensing but did not clearly state that authors could deposit their work in repositories. If the repository allowance was not clear (e.g., “we allow authors to self-archive their publications freely on the web” [Marketing Libraries Journal]), we did not categorize the journal as green unless the journal editor’s survey response clarified the green OA policy. 

Survey Data Collection

A survey asking about the demonstrated open, equitable, and inclusive practices in Table 2 was sent to editors, editors-in-chief, and/or editorial boards for all 78 journals. We received 40 responses from 38 journals, for an overall response rate of 48.7% (38 of 78 journals). 

Openness, Inclusion, and Equity Analysis

Information collected from journal websites was combined with survey results to categorize the openness, equity, and inclusion practices for all journals. In most cases, survey responses were taken at face value, but in a few cases they were replaced with data from other sources when available. For example, one journal self-reported as gold OA, but neither its website nor Cabell’s nor Sherpa Romeo showed evidence of gold status and/or APC charges.
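To make this reconciliation step concrete, here is a minimal sketch under the assumption of a simple tabular workflow; the column names, sample journals, and the two-source agreement rule are illustrative simplifications, not our actual procedure, which was handled case by case.

```python
# Minimal sketch of reconciling self-reported survey answers with evidence
# gathered from journal websites and databases. Column names ("journal",
# "survey_oa", "website_oa", "cabells_oa") are hypothetical; the real
# reconciliation in this study was done manually.
import pandas as pd

survey = pd.DataFrame({
    "journal": ["Journal A", "Journal B"],
    "survey_oa": ["gold", "green"],
})
evidence = pd.DataFrame({
    "journal": ["Journal A", "Journal B"],
    "website_oa": ["platinum", "green"],
    "cabells_oa": ["platinum", None],
})

merged = survey.merge(evidence, on="journal", how="left")

def reconcile(row):
    # Take the survey response at face value unless two independent sources
    # (website and Cabell's) agree on a different category.
    if (
        pd.notna(row["website_oa"])
        and row["website_oa"] == row["cabells_oa"]
        and row["website_oa"] != row["survey_oa"]
    ):
        return row["website_oa"]
    return row["survey_oa"]

merged["final_oa"] = merged.apply(reconcile, axis=1)
print(merged[["journal", "survey_oa", "final_oa"]])
```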

Survey comments were analyzed for common trends, including practices not originally part of the survey multiple-choice responses. The survey instrument and anonymized data are available at https://doi.org/10.6084/m9.figshare.c.5744702.v1.

Findings

We collected data from 78 journal websites and received survey responses from 38 of those journals. The mixed-methods design represents our effort to ensure accuracy. We felt an obligation to accurately represent the efforts of these publications during a time in which organizations are attempting to enact change around EDI. However, because the field is in a time of transition, these results may not align exactly with the current state of journal policy and practice at the time of publication. For example, between the beginning and end of data collection, Communications in Information Literacy unveiled its new Statement of Values, which addresses OA and EDI issues. College & Research Libraries has recently adopted and will soon implement a data policy. The Canadian Journal of Academic Librarianship is developing a name change policy, which supports inclusivity for transgender researchers by allowing for “rapid and discreet author name changes to be made on digital editions of published works” (DePaul, 2021). Thus, these results could more accurately be labeled a “snapshot.”

Publishers

For the sake of simplicity, we itemized only the for-profit publishers in Figure 1 and aggregated the non-profit publishers/hosting platforms, which vary widely across the majority of journals, into a single category. Among for-profit publishers, Taylor & Francis is by far the most common, publishing nearly a quarter of the journals (19 of 78), followed by Emerald, which publishes seven. 

Journal publisher prevalence. Non-profit 59%, Taylor & Francis 24.4%, Emerald 9%, Elsevier 3.8%, De Gruyter 1.3%, Litwin Books 1.3%, Sage 1.3%, Springer 1.3%, Wiley 1.3%

Figure 1. Prevalence of journal publishers 


Indexing and Journal-Level Metrics

As demonstrated in Figure 2, indexing of the academic librarianship literature is varied and inconsistent, with no single database including more than 70% of the journals in the study. Notably, 13% of the journals are not indexed by any LIS database (or by Web of Science or Scopus), according to Ulrich’s. This demonstrates something of a disconnect between the more traditional ways in which a journal is acknowledged within academia, in this case indexing, and the corpus of the academic library field. The disconnect became even more apparent when we considered the two databases responsible for journal-level metrics: Web of Science (owned by Clarivate, which produces Journal Impact Factors) and Scopus (owned by Elsevier, which produces CiteScore and whose data are also used to calculate the SCImago Journal Rank, or SJR). While 48% of the journals studied are included in Web of Science, only 18 of them (22.7%) have impact factors, ranging from 0 to 3.18; the average impact factor for these 18 academic library journals is 1.39. This demonstrates the dearth of journal-level metrics for academic librarianship and is one of many arguments against using journal-level metrics, which are demonstrably limited as proxies for the quality and impact of scholarly publications, to meaningfully evaluate academic librarian scholarship (Borchardt et al., 2020; Davies et al., 2021).

Percentage of journal titles indexed by database: LISTA 67.2%, LISA 65.4%, LISS 62.8%, LLIS 62.8%, Scopus 52.6%, Web of Science 47.4%, ProQuest Library Science 37.2%, ProQuest L&IS 35.9%, Gale IS&L 10.3%, No indexing 12.8%

Figure 2. Percentage of journal titles indexed by database


Acceptance rate is another commonly used journal-level metric. Cabell’s reported acceptance rates for 43 of the 78 journals; among those, the average rate was 46.2%, with a standard deviation of 17.4%. We believe that these rates are self-reported, as the range of acceptance rates was quite wide, with “5-10%” reported as the lowest rate, for Catholic Library World. However, this figure conflicted with information given to us by the Catholic Library World editor in the survey: when we reached out, she suggested that whoever had reported that figure had probably reversed the percentage and that a rate of 90-95% was more plausible, given her experience.

Article-Level Metrics

Article-level metrics are provided for 67.9% of the journals in the study. Citations were the most common (though citation counts provided by a publisher will vary according to the source from which citations are drawn), as shown in Figure 3. Downloads and page views, perhaps the most useful unique indicators, are included in less than one-third of the journals. Altmetric and PlumX, two companies that collate a variety of “alternative metrics”, are also represented. Some of the metrics reflect the journal’s platform: comments, refbacks, and pingbacks are all more commonly associated with blogging platforms, but they can provide unique qualitative insight into engagement with a particular article. The 32.1% of journals with no article-level metrics indicates significant room for improvement, as article-level metrics are key for academic librarians wishing to demonstrate the scholarly and practitioner impact of their publications.

Availability of article-level metrics and measures. Citations 44.9%, Altmetric 41%, Downloads 32.1%, Page views 29.5%, PlumX 12.8%, Refbacks 3.8%, Comments 2.6%, Accesses 1.3%, Pingbacks 1.3%, None of these 32.1%

Figure 3. Prevalence of article-level metrics


Openness

Looking at open access categories (Figure 4), green OA is by far the most common form of open access, followed by platinum and hybrid; together these account for the vast majority of journals in our dataset. Four journals, or 5.1% of those studied, are bronze, indicating that only some content is openly accessible, and no open options were found for one journal. As expected, every journal offering hybrid publication was hosted by a major (for-profit) publisher. We can conclude from these results that open access is an accepted norm in academic library scholarship. Since open access to information is a platform espoused by major library associations, this is both unsurprising and gratifying: in this area, librarianship is practicing what it preaches. In the case of the Journal of Creative Library Practice, the editor explains:

We created this journal in order to promote and experiment with open access publishing — publishing articles on acceptance rather than bundling into issue, being open-minded about citation styles, and allowing for open peer review if wanted. We also wanted to validate the open sharing of practice in library and information science, given the nature of our field in which most practitioners don’t have the luxury of time and resources for large research projects.

However, it should be noted that journal-level embrace of open access policies does not necessarily translate into author-level compliance with the green OA practice of making publications and/or preprints available. Unpaywall’s data reported a 15.3% average rate of repository deposit for the published articles, with a 10.6% average rate of deposit for preprint versions of publications. This shows that authors also bear responsibility for ensuring that they are contributing to the equitable practice of making publications accessible to all when possible.

According to the respondents, on the question of copyright and Creative Commons licensing (Figure 5), authors retain copyright at just under half of the journals studied. The journal retains copyright at 17.9% of journals. The “sometimes” copyright retention is usually associated with hybrid journals, where authors presumably retain copyright but pay to have their article published open access. Finally, no easily discernible copyright or Creative Commons information was available for 6.4% of journals.

Nearly half of the journals studied do not have any explicit policies in place on data sharing (Figure 6). Several journal editors provided more complicated responses: two journals indicated that they did not regularly deal with data and thus the question was not applicable; one indicated that they were in the process of adopting such a policy; and three noted that they encourage data sharing, although they do not have specific policies in place. For example, the Journal of Librarianship and Scholarly Communication stated, “We encourage data sharing, but do not require it. We provide suggestions for places to store the data.” 

Does this journal offer open access options? Green 71.8%, Platinum 47.4%, Hybrid 42.3%, Bronze 5.1%, None 1.3%

Figure 4. Prevalence of open access options


Do authors retain copyright or does the journal use a CC-BY license? yes 47.4%, no 17.9%, sometimes 28.2%, unclear 6.4%

Figure 5. Status of copyright ownership and Creative Commons licensing


Does your journal have policies in place to encourage the open sharing of data? No 48.7%, Yes 43.6%, N/A 2.6%, encouraged 3.8%, in process 1.3%

Figure 6. Status of open data sharing policies


Inclusivity

In order to investigate inclusive journal practices, we asked survey respondents whether their journal engaged in any of the following eight practices:

  • actively and continuously recruits reviewers or editors from underrepresented groups
  • actively and continuously encourages authors from underrepresented groups to submit manuscripts
  • waives publishing fees for demonstrated need
  • demonstrates flexibility in accepted research processes and scholarly output format
  • ensures that the journal website is accessible for all users (e.g., ADA compliant, all article formats compatible with assistive technology)
  • ensures that the journal backend is accessible for all reviewers, authors, and editors (e.g., WCAG compliant)
  • provides professional development for journal workers to ensure inclusive practices (e.g., anti-bias training)
  • author guidelines encourage inclusive language (e.g., they/them) and variety of writing styles

The prevalence of these practices is shown, in the order listed above, in Figure 7.

The most common practices dealt with recruiting reviewers, editors, and/or authors from underrepresented groups and flexibility with the research processes and scholarly output format. The least common practice was providing EDI professional development opportunities for journal workers (e.g., anti-bias training). Five respondents indicated that their journals waived publishing fees, but upon further scrutiny, we believe that these responses were given in error. The waiving of article processing charges (APCs) is associated with gold OA, an OA format that did not appear in this dataset, and we are unaware of any other mandatory fees for authors (as opposed to voluntary, as in the case of hybrid publications). These responses have been removed from the dataset. 

In our survey, all but one of the respondents indicated that they did have inclusive practices in place. However, the content analysis of journal websites showed no evidence of these inclusive practices for more than half of the journals (see Figure 7). This suggests that respondents believe they have committed to inclusive practices that are not visible on their websites, which makes it difficult to ascertain the veracity of their responses.

Percent of journals with demonstrated or self-reported inclusive practices: actively and continuously recruits reviewers or editors from underrepresented groups 42.3%; actively and continuously encourages authors from underrepresented groups to submit manuscripts 34.6%; demonstrates flexibility in accepted research processes and scholarly output format 33.3%; ensures that the journal website is accessible for all users (e.g., ADA compliant, all article formats compatible with assistive technology) 19.2%; ensures that the journal backend is accessible for all reviewers, authors, and editors (e.g., WCAG compliant) 14.1%; provides professional development for journal workers to ensure inclusive practices (e.g., anti-bias training) 9.0%; author guidelines encourage inclusive language (e.g., they/them) and variety of writing styles 23.1%; none of these, according to website evidence 51.3%

Figure 7. Prevalence of inclusive practices


Some participants noted specific examples of inclusive practices, including those focused on language, peer-review, and other policies. Library Philosophy and Practice “embraces the concept of World Englishes and International English, and welcomes well-written articles in any variety of academic English.” The Canadian Journal of Academic Librarianship notes that their “internal style guide encourages inclusive language and in particular, language that is inclusive of Indigenous peoples and knowledges.” The name change policy for the Marketing Libraries Journal states that:

If authors wish to change their names following publication, we will update the manuscript with the name changes and/or pronoun changes, with no legal documentation required. Upon receiving the name change request, we will update all metadata, published content, and associated records under our control to reflect the requested name change. Further, we respect the privacy and discretion in an author’s request for a name change. To protect the author’s privacy, we will not publish a correction notice to the paper, and we will not notify co-authors of the change.

Inclusive peer review practices included open and flexible peer review, tailored to the wishes of the authors and reviewers (Journal of Radical Librarianship), and specialized reviewers for authors who speak English as an additional language (Communications in Information Literacy). 

Equity

We asked about three categories of equity practice in our survey:

  • provides additional assistance to authors (e.g., language support, proofreading, mentoring, alternate contacts in case of problems)
  • formally recognizes the work of everyone who has contributed to scholarly output (e.g., open peer review, crediting contributors such as research assistants)
  • pays editors, authors, and/or reviewers for their labor

Equitable practices were less prevalent than other values measured, with the majority of journals not having any demonstrable or self-reported equitable practices. Of these practices, payment was particularly low, with just 10.3% of journals reporting financial compensation for editors, authors, and/or reviewers. However, it’s worth noting that payment is not common practice in academic publishing. The editorial board of Lead Pipe has had multiple discussions about providing payment but has “always chosen to remain independent of sponsors or other types of fundraising.” One journal editor who is paid a small stipend reports regularly using this money to pay a specialized editor “to help international authors with language support and proofreading.” This is a unique situation for two reasons: first that the editor is receiving payment in the first place, which is not typical, and second that this editor is using their own money to pay for a special editor. Presumably this is an individual decision rather than a policy one, so we don’t know whether this practice would continue. Since none of the journals mentioned payment for labor on their websites, all of the compensation data was collected via survey responses. 

Many journals outlined standards for establishment of authorship, as well as standards for including other non-author support in acknowledgments, but few provided a structure such as CRediT, the taxonomy for contributor roles. Peer review practices varied widely across journals, including the ways that reviewers are credited or acknowledged. Some journals publicly thank reviewers in editorials; others provide formal letters of support for those seeking tenure and/or promotion. A number have registered for Publons, a platform that tracks peer review and editing work, so that their reviewers can more easily quantify and demonstrate their labor. 

Percent of journals with demonstrated or self-reported equitable practices: provides additional assistance to authors (e.g., language support, proofreading, mentoring, alternate contacts in case of problems) 41%; formally recognizes the work of everyone who has contributed to scholarly output (e.g., open peer review, crediting contributors such as research assistants) 17.9%; pays editors, authors, and/or reviewers for their labor 10.3%; none of these 57.7%

Figure 8. Prevalence of equitable practices


Discussion

Emerging Best Practices

The journals that demonstrated the highest levels of open, inclusive, and equitable practices tended to be those on more independent publishing platforms, which increases the likelihood that they are truly working toward a less harmful experience for readers, authors, reviewers, and editors alike. These journals generally do not have to answer to commercial publishers whose policies can be prioritized over the will or interest of the editorial board.

In this section, we would like to highlight some of the journals that demonstrate emerging best inclusive and equitable practices, based on the self-reported survey data. When interested librarians respond to the Journal of the Canadian Health Libraries Association’s calls for new editors, they are required to include an EDI statement that describes relevant experience and training. This journal also includes the following language in their author guidelines: “Authors submitting to the journal must strive to use language that is free of bias and avoid perpetuating prejudicial beliefs or demeaning attitudes in their writing. Please consult the APA guidelines and recommendations available for Bias-Free Language.”

The style guide for Weave: Journal of Library User Experience contains an Inclusive Language section that covers using pronouns, writing about disability, and avoiding harmful language. Calls for editors and board members explicitly encourage applications from the BIPOC community, people with disabilities, and people who identify as LGBTQIA+. The journal’s Dialog Box, a feature that expands the concept of the book review, specifically invites content outside of traditional scholarly formats.

More than half of In the Library with the Lead Pipe board members are women of color, an intentional shift. This journal stands out in our dataset for not only engaging in the majority of inclusive and equitable practices listed in the survey, but also codifying their commitment with language, policies, and procedures. In addition to the options listed on the survey instrument, the editors’ responses indicate a culture of care: “…during the pandemic we collectively took steps to address our own mental health and decided to pause submissions for three months.”   

Communications in Information Literacy has made recent strides in inclusive and equitable practice and policy. Their recently published Statement of Values addresses authors’ ownership of their work, EDI and social justice, and “practicing care in our relationships with authors, peer reviewers, readers, and colleagues.” Journal editors developed a new policy on inclusive language, with input from the transgender and non-binary library community. In addition, they make a concerted effort to engage the international scholarly community via targeted outreach with organizations like IFLA and specialized manuscript reviewers for articles written by authors who speak English as an additional language. 

Room for Improvement

We see similar trends across many of the values-based practices we measured: there is a lot of room for improvement. Overall, we found no inclusive practices for 51% of journals and no equitable practices for 57%. Some survey responses made clear that journal policies were outside editorial control. As one respondent observed, “Many of our policies are dictated by our publisher, so we (as editors) don’t really get the option of doing things unless we bring it up.” Notably, a few editors referred us to their publisher as the owner of diversity policies, eschewing responsibility and knowledge. We found this “passing of the buck” particularly troubling. Many large publishers have announced EDI initiatives, whether performative or sincere. For example, Taylor & Francis includes EDI as a “Corporate Responsibility,” but it seems that these types of efforts are rarely operationalized at the journal level. With only 9% of journals reporting professional development opportunities such as bias training, there is little to no evidence of real change for scholars who might be experiencing structural and individual barriers. This may be particularly true for scholars with disabilities, since only 14% of survey respondents replied that they ensure that the backend is accessible for reviewers, authors, and editors. While this may be due to a lack of knowledge on the part of the editors who completed the survey, one summed up the situation nicely, saying “The backend is barely usable for me, so I hope it is at least [ADA/WCAG] compliant.”

The decision of where to host content is one with far-reaching implications, and it is concerning that more than 40% of the journals in our study are hosted by a for-profit publisher and therefore captive to commercial priorities that can be at odds with library values. Starting with the open access status of these journals, we noted the strong and enduring impact publishers have had on both setting and limiting the degree to which our collective values are embodied in our journals. As one respondent noted, “As an editor, the things I can do are limited by the choice of publisher for our journal. We’ve gone through a publisher change recently, and that limited our open access options (but increased the support we are able to provide for authors of accepted work).” Journals held by traditional, for-profit publishers are more likely to have more restrictive open access options and are bound to publisher policies, and the editor’s quote hints at some of the complicated trade-offs involved in choosing a journal’s publisher. These findings are consistent with Jackson’s (2021) finding that commercial publishers had weaker overall open data policies.

Even among journal editors who stated in our survey that they have inclusive and equitable practices in place, many could not provide documentation for that work, indicating that much of this is done editorial board to editorial board and is not codified. One editor shared with us that they “don’t have good enough written policies and procedures; we rely too much on the ongoing knowledge and goodwill of editors. This needs to be improved.” Documenting and codifying practices are likely among the best actions toward creating more open, accessible, equitable, and inclusive publishing environments for library journals, as this would help hold editors accountable for the experiences authors have with them. The problem that we heard many times from editors was that they wanted to make changes but did not have the time, funding, or support from their publisher to do so. While for some these may be excuses for not taking on necessary change, for others they are legitimate barriers. 

For example, the author guidelines for one journal advise that “Editors and reviewers often judge misspellings, typos, and grammatical errors harshly; they can undermine a good first impression. Such flaws raise concern about the overall quality of the submission and the meticulousness of the author. Use your software’s spell check, but remember that it will not catch every error. Review your manuscript carefully.” While at first glance this seems like practical advice rather than a legitimate barrier, this tells us that language is being explicitly tied to the quality of the content. The authors and editors are not receiving proper training on the politics of academic English and its biases. 

Similarly, in the survey responses one journal confirmed that “Any training for journal workers (on the publisher end) is on the publisher. We have not provided formal training for reviewers, though we have done some reviewer mentorship and advising.” It is clear that thoughtful training is missing from the process, which is unfortunate since so many careers rely on the outcomes. 

It’s difficult to draw meaningful conclusions about the less visible practices because the extent to which journals are enacting policies or encouraging practice is still relatively unknown. But if we extrapolate from the survey responses received, it’s reasonable to assume that some journals have taken some action, while others have done little or have only made performative declarations without the accompanying actions necessary to transform or improve practice. For example, nearly half of survey respondents claimed to actively and continuously recruit reviewers or editors from underrepresented groups, but this does not necessarily speak to the actual success of those recruitment efforts, since a cursory glance at the current demographics of most library journal editorial boards shows a lack of representative diversity. We made a conscious decision not to count special issues on EDI topics as evidence of inclusive practices in this study, because it would be difficult to claim that a special issue is evidence of lasting change. In the past year alone, we witnessed the harm that is caused when journals and their editors take on “diversity topics” without proper internal training and culture shift: five Black librarians pulled an editorial from the Journal of the Medical Library Association (JMLA) after the copy editor made changes that softened the message of their piece on anti-Blackness in librarianship. We commend the bravery of these librarians for speaking up and initiating discussions of the meaningful incorporation of inclusive journal practice, even as we note that two of the librarians have since left the profession, and we wonder about the prevalence of similar experiences that have caused harm and contributed to toxicity in librarianship. 

What is clear from our evaluation of journals’ current, established, and stated policies and practices is that most academic library journals are not prepared for or capable of creating truly equitable and inclusive scholarly experiences, outside of open access (which has its own acknowledged EDI issues). The focus on EDI issues at the highest levels of the scholarly communications industry represents one of the most likely opportunities for large-scale change within the traditional academic publishing model, but it is ultimately up to motivated individuals and groups to commit to meaningful rather than performative change. This over-reliance on small-scale commitments is heavily influenced by the degree to which journal workers are supported in their work: as our survey indicates, payment for this work is not a widely accepted norm, which means that a journal worker’s commitment can only be as strong as their internal motivation or whatever acknowledgement or reward for their labor may exist at their institution. 

Future Research

Our methodology relied heavily on the information that was available to us, through licensed resources, journal websites, and voluntary survey responses, but we know that there is still a lot we do not know about how well journals are able to successfully implement sustainable open, equitable, and inclusive practices. One source of information that would create a more accurate picture is that of researcher experience. This idea builds on Kaetrena Davis Kendrick’s work on Twitter to gather information about journals that others “perceived had reviewers who were constructive, measured, thoughtful, mentoring-minded & positive in their feedback” (2021). We propose building a database that could include the stated and self-reported practices we have collected, along with the lived experiences of researchers, both positive and negative, as they engage with these practices as potential authors, reviewers, editors, or other journal workers. Such a resource would not only help establish the degree to which journals have incorporated open, equitable, and inclusive practice, but also help researchers choose publications with which they would like to establish or continue relationships. On a larger scale, a database that can help identify EDI practice in journals could be used as an evaluation tool using a value-based model entirely separate from, or perhaps complementary to, the citation-based journal metrics like impact factor that commonly form the basis of research evaluation. However, misuse, credibility, gamification, performative rather than meaningful adoption, and threats of retaliation are concerns that exist even if EDI practice becomes a commonly accepted measure that academic institutions and journals adopt for the purposes of research evaluation. As the “Shitty Media Men” list has demonstrated, the collection of information that has the potential to damage reputations is not always suitable for public consumption (Donegan, 2018).
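As a rough illustration of what such a database might record, the sketch below pairs journal-level practice records with researcher-reported experiences. All field names and example entries are hypothetical assumptions for the sake of illustration, not a proposed standard or an existing schema.

```python
# Hypothetical sketch of the kinds of records a journal-practices database
# could hold, combining documented/self-reported practices with
# researcher-reported experiences. Field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class JournalPractices:
    journal: str
    oa_types: List[str] = field(default_factory=list)          # e.g., ["green", "platinum"]
    inclusive_practices: List[str] = field(default_factory=list)
    equitable_practices: List[str] = field(default_factory=list)
    evidence_sources: List[str] = field(default_factory=list)   # e.g., ["website", "editor survey"]

@dataclass
class ResearcherExperience:
    journal: str
    role: str          # author, reviewer, editor, or other journal worker
    experience: str    # free-text account, positive or negative
    year: int

# Example entries (illustrative only)
records = [
    JournalPractices(
        journal="Example Journal of Academic Librarianship",
        oa_types=["platinum"],
        inclusive_practices=["inclusive language guidelines"],
        evidence_sources=["website"],
    ),
    ResearcherExperience(
        journal="Example Journal of Academic Librarianship",
        role="author",
        experience="Constructive, mentoring-minded peer review.",
        year=2021,
    ),
]
```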

Ultimately, these measures are all attempts to drive change. The end goal is full incorporation and embodiment of meaningful open, inclusive, and equitable practice as the norm in academic journals. But what does achieving this goal actually look like? The practices explored in this study are steps along the road to open, inclusive, and equitable journals but not necessarily the end state. For example, sustained attempts to improve editorial diversity are important for inclusive excellence in scholarly publishing. But realistically, diversity in editorial boards is only the first step toward the incorporation of meaningful inclusive practice, since diverse perspectives can and should call attention to bias in journal policy and practice. Without follow-up, diverse editorial boards may simply lead to tokenization and exhaustion. Future research might consider what an “end game” for the incorporation of open, inclusive, and equitable practice looks like; it will likely focus on sustainable practice rather than one-time initiatives, and on a shared understanding of values in addition to demographic diversity. 

Conclusion

Capitalism is bullshit (Chan, 2019; McMillan Cottom, 2020; Horgan, 2019), and our journals cannot adequately support our values given the current journal financial models and weak institutional incentives for the scholars who support journal infrastructure. Until these systems change, we can reasonably expect publishers like Taylor & Francis to continue dictating the values of academic librarianship literature. Changing or evolving the values and practices of a journal requires time, care, and investment; a major commitment is necessary for success. Journal reputation, citation-based journal metrics (e.g., impact factor), and rank/tenure/promotion expectations all serve to support and reinforce existing power structures within many journals and represent major barriers to operationalization of these values as well. Exclusion through manuscript rejection, unnecessarily strict adherence to writing structure and language, and bias in peer review are all hallmarks of traditional scholarly publications and even serve to enhance the reputation of a journal. While these dynamics are in place, meaningful change will be difficult to enact in many journals that pride themselves on their ‘rigor’.

However, academic librarians can advocate for change in many ways, including:

  • pushing for new policies and procedures at the journal level
  • publishing with journals that are in line with our stated values
  • increasing awareness by discussing these issues with colleagues and peers
  • working to update tenure/promotion/rank guidance to prioritize journal work and to value journals that embody our values.

For example, Rachel, with the co-authors’ permissions, has adapted this survey instrument into a “Checklist for evaluating DEI practices of journals.” She developed this document for a faculty committee focused on incorporating EDI principles into scholarly evaluation in the promotion and tenure process. ACRL has published a statement of support for open access publication, but similar support for other equitable and inclusive practices would likely galvanize current publishing practice. Our survey provides a kind of framework for the translation of ideals into meaningful practice, one that we hope more journals will consider and integrate. Scholarly communications ideals and library values matter only if we are moving toward them. Library journals and the broader academic publishing industry must incorporate open, inclusive, and equitable policies and practices in order to move away from systemic oppression and toward a more representative knowledge landscape. 


Acknowledgments

We are grateful to Richard Orr, Jason Priem, and Heather Piwowar at Unpaywall for providing us with the green OA repository and preprint data for the journals in our study. Thank you to Ikumi Crocoll and Ian Beilin, In the Library with the Lead Pipe editors, and Yasmeen Shorish, peer reviewer, for your insightful comments. Your expert feedback made this a much stronger paper.


References

American Library Association [ALA]. (2019). Core values of librarianship. https://www.ala.org/advocacy/intfreedom/corevalues

Association of College & Research Libraries [ACRL]. (2019a). ACRL policy statement on open access to scholarship by academic librarians. https://www.ala.org/acrl/standards/openaccess  

Association of College and Research Libraries [ACRL]. (2019b). Open and Equitable Scholarly Communications: Creating a More Inclusive Future. Prepared by N. Maron and R. Kennison with P. Bracke, N. Hall, I. Gilman, K. Malenfant, C. Roh, and Y. Shorish. Chicago, IL: Association of College and Research Libraries. https://doi.org/10.5860/acrl.1

Bar-Ilan, J. (2012). JASIST 2001–2010. Bulletin of the American Society for Information Science and Technology, 38, 24-28. https://doi.org/10.1002/bult.2012.1720380607 

Borchardt, R., Bivens-Tatum, W., Boruff-Jones, P., Chin Roemer, R., Chodock, T., DeGroote, S., … & Matthews, J. (2020). ACRL framework for impactful scholarship and metrics. https://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/impactful_scholarship.pdf

Chan, L. (2019, April 30). Platform capitalism and the governance of knowledge infrastructure [Keynote presentation]. Digital Initiative Symposium, San Diego, CA, United States. https://doi.org/10.5281/zenodo.2656601 

Committee on Publication Ethics [COPE]. (n.d.) Guidelines. Retrieved October 20, 2021 from https://publicationethics.org/guidance/Guidelines 

Davies, S. W., Putnam, H. M., Ainsworth, T., Baum, J. K., Bove, C. B., Crosby, S. C., … & Bates, A. E. (2021). Promoting inclusive metrics of success and impact to dismantle a discriminatory reward system in science. PLoS Biology, 19(6), e3001282. https://doi.org/10.1371/journal.pbio.3001282

DePaul, A. (2021, July 21). Scientific publishers expedite name changes for authors. Nature. https://www.nature.com/articles/d41586-021-02014-7 

Donegan, M. (2018, January 10). I started the Media Men List – My name is Moira Donegan. The Cut. https://www.thecut.com/2018/01/moira-donegan-i-started-the-media-men-list.html

Hall, N., Arnold-Garza, S., Gong, R., & Shorish, Y. (2016). Leading by example? ALA Division publications, open access, and sustainability. College & Research Libraries, 77(5), 654–667. https://doi.org/10.5860/crl.77.5.654 

Horgan, J. (2019, February 18). Revolt against the rich. Scientific American Blog Network. https://blogs.scientificamerican.com/cross-check/revolt-against-the-rich/ 

Inefuku, H., & Roh, C. (2016). Agents of diversity and social justice: Librarians and scholarly communication. In K. Smith & K. A. Dickson (Eds.), Open access and the future of scholarly communication: Policy and infrastructure (pp. 107-127). Rowman and Littlefield.

Jackson, B. (2021). Open Data Policies among Library and Information Science Journals. Publications, 9(2), 25. https://doi.org/10.3390/publications9020025

Järvelin, K., & Vakkari, P. (2021). LIS research across 50 years: Content analysis of journal articles. Journal of Documentation, ahead-of-print. https://doi.org/10.1108/jd-03-2021-0062  

Kendrick, K. D. [@Kaetrena]. (2021, August 12). Would you share with me journals that you perceived had reviewers who were constructive, measured, thoughtful, mentoring-minded & positive in their feedback? [Tweet]. Twitter. https://twitter.com/kaetrena/status/1425817950930542594 

Kim, S. J., & Jeong, D. Y. (2006). An analysis of the development and use of theory in library and information science research articles. Library & Information Science Research, 28(4), 548-562. https://doi.org/10.1016/j.lisr.2006.03.018  

Kohl, D. F., & Davis, C. H. (1985). Ratings of journals by ARL library directors and deans of library and information science schools. College & Research Libraries 46(1), 40–47. https://doi.org/10.5860/crl_46_01_40 

Library Publishing Coalition. (2018). An ethical framework for library publishing. https://librarypublishing.org/resources/ethical-framework/ 

McMillan Cottom, T. (2020). Where platform capitalism and racial capitalism meet: The sociology of race and racism in the digital society. Sociology of Race and Ethnicity, 6(4), 441–449. https://doi.org/10.1177/2332649220949473

Mercer, H. (2011). Almost halfway there: An analysis of the open access behaviors of academic librarians. College & Research Libraries, 72(5), 443-453. https://doi.org/10.5860/crl-167 

Neville, T., & Crampsie, C. (2019). From journal selection to open access: Practices among academic librarian scholars. portal: Libraries and the Academy, 19(4), 591–613. https://doi.org/10.1353/pla.2019.0037 

Nisonger, T. E., & Davis, C. H. (2005). The perception of library and information science journals by LIS education deans and ARL library directors: A replication of the Kohl–Davis study. College & Research Libraries, 66(4), 341–377. https://doi.org/10.5860/crl.66.4.341 

Nixon, J. M. (2014). Core journals in library and information science: Developing a methodology for ranking LIS journals. College & Research Libraries, 75(1), 66-90. https://doi.org/10.5860/crl12-387 

Ollé Castellà, C., López‐Borrull, A., & Abadal, E. (2016). The challenges facing library and information science journals: Editors’ opinions. Learned Publishing, 29(2), 89-94. https://doi.org/10.1002/leap.1016  

Onyancha, O. B. (2018). Forty-five years of LIS research evolution, 1971–2015: An informetrics study of the author-supplied keywords. Publishing Research Quarterly, 34(3), 456-470. https://doi.org/10.1007/s12109-018-9590-3  

Open Access. (2021, October 1). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Open_access&oldid=1047593461 

Roh, C. (2016). Library publishing and diversity values: Changing scholarly publishing through policy and scholarly communication education. College & Research Libraries News, 77(2), 82-85. https://crln.acrl.org/index.php/crlnews/article/view/9446/10680 

San Jose State University School of Information. (n.d.) LIS publications wiki – LIS scholarly journals. Retrieved August 7, 2020 from https://ischoolwikis.sjsu.edu/lispublications/wiki/lis-scholarly-journals/ 

Sin, S. C. J. (2011). International coauthorship and citation impact: A bibliometric study of six LIS journals, 1980–2008. Journal of the American Society for Information Science and Technology, 62(9), 1770-1783. https://doi.org/10.1002/asi.21572 

Singh, K. P., & Chander, H. (2014). Publication trends in library and information science: A bibliometric analysis of Library Management journal. Library Management, 35(3), 134–149. https://doi.org/10.1108/lm-05-2013-0039  

Scholarly Publishing and Academic Resources Coalition [SPARC]. (n.d.). Open access. Retrieved October 7, 2021, from https://sparcopen.org/open-access/ 

Tsay, M., & Shu, Z. (2011). Journal bibliometric analysis: A case study on the Journal of Documentation. Journal of Documentation, 67(5), 806–822. https://doi.org/10.1108/00220411111164682 

Vandegrift, M., & Bowley, C. (2014). Librarian, heal thyself: A scholarly communication analysis of LIS journals. In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2014/healthyself/ 

Walters, W. H., & Wilder, E. I. (2016). Disciplinary, national, and departmental contributions to the literature of library and information science, 2007–2012. Journal of the Association for Information Science and Technology, 67(6), 1487-1506. https://doi.org/10.1002/asi.23448 


Appendix A – All Journals Included in the Study

  1. American Archivist, The
  2. Archival Science
  3. Canadian Journal of Academic Librarianship
  4. Canadian Journal of Information and Library Science
  5. Cataloging & Classification Quarterly
  6. Catholic Library World
  7. Collaborative Librarianship
  8. Collection and Curation
  9. Collection Management
  10. College & Undergraduate Libraries
  11. College and Research Libraries
  12. Communications in Information Literacy
  13. Education Libraries
  14. Evidence Based Library and Information Practice
  15. Georgia Library Quarterly
  16. Global Knowledge, Memory and Communication
  17. Health Information and Libraries Journal
  18. IFLA Journal
  19. In the Library With a Lead Pipe
  20. Information Discovery and Delivery
  21. Information Technology and Libraries
  22. International Information & Library Review
  23. International Journal of Librarianship
  24. International Journal of Library Science
  25. Issues in Science and Technology Librarianship
  26. Italian Journal of Library, Archives and Information Science
  27. Journal of Academic Librarianship
  28. Journal of Archival Organization
  29. Journal of Creative Library Practice, The
  30. Journal of Critical Library and Information Studies
  31. Journal of Documentation
  32. Journal of Education for Library and Information Science
  33. Journal of Electronic Resources in Medical Libraries
  34. Journal of Electronic Resources Librarianship
  35. Journal of eScience Librarianship
  36. Journal of Information Literacy
  37. Journal of Intellectual Freedom & Privacy
  38. Journal of Interlibrary Loan, Document Delivery & Electronic Reserve
  39. Journal of Librarianship and Scholarly Communication
  40. Journal of Library & Information Services in Distance Learning
  41. Journal of Library Administration
  42. Journal of Library and Information Science
  43. Journal of Library Metadata
  44. Journal of New Librarianship
  45. Journal of Radical Librarianship
  46. Journal of Research on Libraries and Young Adults
  47. Journal of the Australian Library and Information Association
  48. Journal of the Canadian Health Libraries Association
  49. Journal of the Medical Library Association
  50. Journal of Web Librarianship
  51. Judaica Librarianship
  52. Law Library Journal
  53. Library & Information History
  54. Library & Information Science Research
  55. Library and Information Research
  56. Library Hi Tech
  57. Library Management
  58. Library Philosophy and Practice (LPP)
  59. Library Quarterly, The
  60. Library Resources & Technical Services
  61. Library Trends
  62. LIBRES: Library and Information Science Research e-journal
  63. LIBRI: International Journal of Libraries and Information Studies
  64. Marketing Libraries Journal
  65. Medical Reference Services Quarterly
  66. New Review of Academic Librarianship
  67. Notes: The Quarterly Journal of the Music Library Association
  68. Partnership: The Canadian Journal of Library and Information Practice and Research
  69. Pennsylvania Libraries: Research and Practice
  70. portal: Libraries and the Academy
  71. Practical Academic Librarianship
  72. Public Services Quarterly
  73. Reference & User Services Quarterly
  74. Reference Services Review
  75. Serials Librarian, The
  76. Technical Services Quarterly
  77. Urban Library Journal
  78. Weave: Journal of Library User Experience (Weave UX)

Endnotes

  1. The Association of College & Research Libraries (ACRL) Plan for Excellence directly addresses open access, equity, diversity, and inclusion.
  2. APCs are any fees charged to authors related to publication — in this context, it refers to the fee charged by journals to make the authors’ research freely available and accessible via open access.
  3. This language was used in the survey sent to journal editors, but a more accurate phrasing might have been “Do authors retain copyright or does the journal acquire copyright and apply a CC-BY license?”

Progress On Storage Class Memory / David Rosenthal

Storage Class Memory (SCM) is fast enough to be used as RAM, but persistent through power loss like flash. The idea of returning to the days of core storage, when main memory was in effect a storage layer, is attractive in theory. There have been many proposed SCM technologies, such as resistive RAM, magnetoresistive RAM and phase-change memory. Intel and Micron got Optane, based on their 3D XPoint technology, into production, but it wasn't a great success: it wasn't as fast as RAM, and it was a lot more expensive than flash. The main barrier to adoption was that exploiting its properties required major system software changes.

Now, Mark Tyson's UltraRAM Breakthrough Brings New Memory and Storage Tech to Silicon reports on a promising SCM development published by Peter Hodgson and a team from Lancaster University in ULTRARAM: A Low-Energy, High-Endurance, Compound-Semiconductor Memory on Silicon. Below the fold, I comment briefly.

Their abstract reads in part:
ULTRARAM is a nonvolatile memory with the potential to achieve fast, ultralow-energy electron storage in a floating gate accessed through a triple-barrier resonant tunneling heterostructure. Here its implementation is reported on a Si substrate; a vital step toward cost-effective mass production. ... Fabricated single-cell memories show clear 0/1 logic-state contrast after ≤10 ms duration program/erase pulses of ≈2.5 V, a remarkably fast switching speed for 10 and 20 µm devices. Furthermore, the combination of low voltage and small device capacitance per unit area results in a switching energy that is orders of magnitude lower than dynamic random access memory and flash, for a given cell size. Extended testing of devices reveals retention in excess of 1000 years and degradation-free endurance of over 107 program/erase cycles, surpassing very recent results for similar devices on GaAs substrates.
Note that the advance here isn't the design of the cell, the same team had earlier published similar cells on gallium arsenide. It is that it has been implemented in silicon with promising performance. Hodgson et al write:
Incorporation of ULTRARAM onto Si substrates is a vital step toward realizing low-cost, high-volume production. Si substrates offer several advantages over III–Vs, including mechanical strength and large wafer sizes, thereby allowing fabrication of more devices in parallel and reducing production cost. Moreover, Si is the preferred material for digital logic and has a highly mature fabrication route. In contrast, III–V substrates are fragile, expensive, and generally only available in much smaller wafer sizes, making them less suitable for high-volume production. But III–V semiconductors do provide advantages such as high electron mobilities, superior optoelectronic properties and a greater degree of bandgap engineering, making them the preferred material for LEDs, laser diodes, infrared detectors and for power, radio frequency and high electron mobility transistors.
This work is at a very early stage, testing large single cells. The cell design is complex, but appears to be manufacturable, and the initial performance is encouraging. Nevertheless, any SCM technology faces the same barriers as Optane; it has to be as fast as RAM and cheap enough compared to flash to motivate making the substantial system software changes it requires. Given high manufacturing costs before high volumes can be achieved, this is difficult.

Inclusive hiring – there’s a lot more than posting on the right job board / Tara Robertson

“Which job board do I post on to reach diverse* candidates?” is a question I’m asked on a weekly basis. There’s more to attracting underrepresented talent than simply where you advertise job opportunities. Attracting new talent starts with having an honest look inward at your current culture. How are you growing and developing your team now? Who is rising up the ranks? Who is represented in leadership (and in the succession plans for leadership)? Who is choosing to leave, and why? Take that honest look inward before you identify external places to post a job ad.

I have a lot to say on inclusive hiring–this is the first in a series of five posts that I’ll be posting over the next couple of weeks. 

Assess your culture today

person cleaning a hardwood floor with a red vacuum cleaner

I try to clean (or at least tidy) my apartment before guests come over. At minimum, I make sure there’s a place for them to hang their coat, that there’s toilet paper in the bathroom and that I have a clean glass or mug to offer guests something to drink. Thinking along similar lines–does your internal culture need a little bit of a tidy up before you invite new people to join your company? 

Different kinds of data can be very helpful in understanding the big picture. 

Do a voluntary self identification project so you can look at aggregate data about gender, race/ethnicity, disability status, caregiver status, veteran status, etc.

You can use this data to help you understand who is represented in your company now and see how your diversity efforts change over time.

Look at your engagement survey data to understand where there are differences between majority and marginalized demographic groups.


For example, do women, non-binary folks and men all report feeling the same degree of safety to report harassment? Do Asian, Black, Indigenous, Latinx, and white staff all report they have managers who are invested in their growth? Is there a specific office, region, or business unit that has outlier high scores for belonging that you can learn from? Culture Amp has a list of common demographics that might be useful for planning your engagement survey.

Use focus groups to dig deeper to understand why you’re seeing gaps between groups.


After analyzing your engagement survey, you might see that there’s a big gap between Asian and white staff on a question about your manager showing genuine interest in your career aspirations. You may know that, overall, Asian staff’s scores are 20 points lower, but until you understand why, you can’t change things. Focus groups are a useful way to probe deeper, to understand how staff are really experiencing the workplace, and to hear about systemic issues.

Audit your voluntary attrition data and exit survey data to understand the patterns of who is leaving and why.


I love Dr. Erin Thomas’ wisdom on this topic. She suggests talking to the departing employee’s manager, HR Business Partner, closest collaborators, and work friends. Her unconventional advice is to circle back with the departing employee 6-12 months after they’ve left, when they’re more likely to have perspective on what wasn’t working and to be willing to share it.

Representation is affected by hiring AND attrition. If a demographic group is leaving faster than you’re hiring, you will never shift representation. Attrition is an important, and often overlooked, part of this equation.

Listen to what current and former staff are saying about their experience on social media


What are current employees saying about you on Blind? What are former employees saying about you on Glassdoor? What are employees saying about you on Twitter, LinkedIn, Instagram? The Instagram account Change the Museum posts anonymous quotes detailing “tales of unchecked racism”. 

Lego minifig holding a balloon in one hand and flowers in the other, on a stack of bricks including Megan Hall, Chocolate Soup, VP Operations, CS 2020, Late Founder, 1 year. From Chocolate Soup’s Instagram. Used with permission.

Of course, it’s not just negative experiences that people share. When I see people at various companies posting personalized Lego blocks from Chocolate Soup on my LinkedIn feed, it always makes me smile. I love reading people’s work stories where they’ve been encouraged to stretch and grow, or where a team was more than the sum of its parts and did something new and innovative.

By now you should have a better idea of where your culture is awesome and where it needs work. You need to fix the problems, whether it’s a culture that celebrates brilliant assholes or the practice of sweeping bullying and harassment under the carpet. If you don’t do the work to fix what’s not working, you may be successful in hiring underrepresented talent, but they’re not going to stay long. Also: staff morale and engagement will tank. When staff take the time and risk to share their experience, you need to hear them, and then act to fix those problems. 

Before you invite guests over for a party, fix the wobbly handrail at your front steps, clean the bathroom and make sure you’ve vacuumed the dog hair off the couch. 

*A note on language: “diverse” is an adjective that can describe a group of people. A singular person can not be diverse. People use “diverse” as a shorthand for saying what they actually mean because it’s short, sweet and many of us have been taught that it’s impolite to be specific.  If you’re looking to hire women for engineering manager roles, say that. If your university internship program is focused on university students with disabilities, say so. If you’re looking specifically to hire Black senior leaders, then say that. In some places you can’t be this explicit  in the job ad, but you should be clear about who you’re looking to attract. Using vague language like “diverse” allows us to hire someone who is different in a small way (like, someone who has a university degree, but it’s not from an Ivy League school). This means we dodge accountability for prioritizing historically marginalized people and moving the needle in meaningful and necessary ways. 

The post Inclusive hiring – there’s a lot more than posting on the right job board appeared first on Tara Robertson Consulting.

Product Management advice from Ford / Casey Bisson

People can’t always tell you why they like something. And when they do, they often point to things they can describe—or feel comfortable describing—rather than the real reasons.

The numbers / Casey Bisson

Why is Trump suddenly a public advocate of vaccines and booster shots? Somebody did the math for him, according to Donald G.

Router Behind a Uverse/Pace 5268ac Gateway Loses its Mind Every 10 Minutes / Peter Murray

Late last year, I had my AT&T Uverse residential gateway replaced. For reasons that truly baffle me, AT&T has decided that they are going to run unsupported equipment on their residential customer network. When the replacement was swapped in, my family noticed that video conference calls—Zoom and Facetime and Slack—would occasionally drop out for about 10 seconds before continuing. After much frustration, I started timing the outages and found that they were happening at roughly 10-minute intervals (plus or minus just a few seconds).

Some internet searching led to a forum post (page 1, page 2) on AT&T’s customer site. As it turns out, there is a conflict with the DHCP address assignment messages when the residential gateway is in DMZplus mode.[1]

Forum user “weshunt” had the right solution:

I’m not a network configuration expert, but it bothered me that the Pace [residential gateway] and the USG both wanted to use 192.168.1.x for DHCP allocations. I noticed that even after putting the USG into the DMZPlus, I could connect a wireless device and it would get an address in the Pace’s default 192.168.1.x range, which conflicted with the IP range the USG was trying to manage. And of course the Pace answered to 192.168.1.254, which was also in the default allocation range of the USG.

So I changed the DHCP settings on the Pace to answer to a different subnet (192.168.100.1 with a DHCP allocation range inside 192.168.100.x as well). Like magic, the USG immediately picked up the DHCP assignment from the Pace and got the public IP exactly like I wanted. Now the networks don’t seem to want to fight each other. I can still access the Pace from the wired network via the new gateway IP (192.168.100.1), and also connect to the Pace wirelessly using the old SSID if I need to, though I’m shutting that down to alleviate unnecessary wireless congestion.
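
The conflict weshunt describes is simply two identical 192.168.1.x ranges colliding. If it helps to see that concretely, here is a minimal sketch in Ruby (used only because it is the one language that appears elsewhere in this digest) with the standard IPAddr library; the 192.168.100.0/24 value is the new range configured in the steps below.

require 'ipaddr'

# Default LAN ranges before the change: both devices try to manage 192.168.1.x.
pace_lan   = IPAddr.new('192.168.1.0/24')   # Pace 5268ac default DHCP range
router_lan = IPAddr.new('192.168.1.0/24')   # the inner router's default LAN range

# CIDR blocks either nest or are disjoint, so containment in either
# direction is enough to detect a collision.
def overlap?(a, b)
  a.include?(b) || b.include?(a)
end

puts overlap?(pace_lan, router_lan)                        # => true: the conflict described above
puts overlap?(IPAddr.new('192.168.100.0/24'), router_lan)  # => false: after moving the Pace to 192.168.100.x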

Step by step, this is what you need to do.

Change the LAN DHCP Range

With a web browser, go to your residential gateway advanced device configuration page. The link for this will be printed on the bottom of the gateway and is probably http://192.168.1.254. You will also need the “Device Access Code” that is printed just below that web address. I’m using a hardwired ethernet connection between my desktop and the residential gateway, but this will probably also work over wireless, too.

  1. Click on Settings
  2. … then LAN
  3. … … then DHCP.
  4. In the “DHCP Configuration”→”DHCP Network Range” section, select “Configure manually” and enter these values:
    • Router Address: 192.168.100.1
    • Subnet Mask: 255.255.255.0
    • First DHCP Address: 192.168.100.100
    • Last DHCP Address: 192.168.100.200
    • DHCP Lease Time: 24
  5. At the bottom, click “Save”. You’ll need your Device Access Code at this point to save your changes.
Screen capture of the DHCP Configuration page of a Pace 5268ac residential gateway

The IP address ranges on the LAN side of the residential gateway have now changed, so the browser’s computer is going to need a new IP address. Unplug the ethernet cable and plug it back in to get a DHCP IP address assignment in the 192.168.100.x block; if using wifi, turn it off and turn it back on.

Set DMZplus Mode for Your Router

Connect to the residential gateway advanced device configuration page again. Also, make sure your router is plugged into the residential gateway. In these examples “SONATA” is the name of my desktop computer, and my home router is called “Gateway”. Yes, I know that is confusing. Sorry ‘bout that.

  1. Click on Settings
  2. … then Firewall
  3. … … then Applications, Pinholes and DMZ.
  4. Under “1) Select a computer” pick your router (not your desktop computer).
  5. Under “2) Edit firewall settings for this computer” pick Allow all applications (DMZ plus mode)
  6. Click “Save”
Screen capture of the Pace 5268ac residential gateway configuration page

Ensure the Router’s Network Address is Correct

I think this section is redundant—it should be set this way as a combination of the two changes above—but you can check it to be sure.

  1. Click on Settings
  2. … then LAN
  3. … … then LAN IP Address Allocation.
  4. Verify the settings for your router:
    • Current address: your assigned IP address from AT&T
    • Device status: DMZ device
    • Firewall: Disabled
    • Address Assignment: Public (select WAN IP Mapping)
    • WAN IP Mapping: Router WAN IP Address (default)
    • Cascade Router: no
  5. Select “Save”
Screen capture of the Pace 5268ac residential gateway configuration page
[1] Aside—my residential gateway is in DMZplus mode because:

    • my home network gear—in particular the wireless access points—is much better than what is in the residential gateway; and
    • I trust AT&T’s network about as far as I can throw that residential gateway…apparently for good reason, since AT&T thinks it is okay for its customers to have unsupported routers on their networks.

2021 DigiPres Recordings Available… and More! / Digital Library Federation

The recordings of the 2021 Digital Preservation Conference: Embracing Digitality are now available as a playlist on the NDSA YouTube Channel! If you missed a session during the conference – or perhaps didn’t have the opportunity to attend – now is your chance to catch up on all the great stuff you may have missed. Many thanks to our DLF conference counterparts for posting these.  

Additionally, a page has been created on the NDSA OSF site called NDSA’s Digital Preservation Conference Links. This will provide a centralized location to find links to past years’ conference materials. This page uses the wiki to link to the slides and recordings of available DigiPres conference sessions.  

Much gratitude to our NDSA community and beyond for making #DigiPres21 one for the books. Stay tuned for more information about the 2022 conference!

 

Onwards and upwards,

Tricia Patterson (2021 Chair) 

The post 2021 DigiPres Recordings Available… and More! appeared first on DLF.

Mainstream Media Catching On / David Rosenthal

Two of the externalities of cryptocurrencies I discussed in my Talk at TTI/Vanguard Conference were the way decentralization and immutability work together to enable crime, and their environmental impact. How well are mainstream media doing at covering these problems? The picture is mixed, as the two examples below the fold show.

Centralization

First, a good effort by Edward Ongweso Jr. in ‘All My Apes Gone’: NFT Theft Victims Beg for Centralized Saviors:
On the eve of the new year, tragedy struck in Manhattan: Chelsea art gallery owner Todd Kramer had 615 ETH (about $2.3 million) worth of NFTs, primarily Bored Apes and Mutant Apes, stolen by scammers and listed on the peer-to-peer NFT marketplace OpenSea.

Kramer quickly took to Twitter and begged for help from OpenSea and the NFT community for help regaining his NFTs. Unsurprisingly, he was ripped to shreds by others in the community for not storing his valuable JPEGs in an offline wallet; however, OpenSea froze trading of the stolen NFTs on its platform.

More than a few commentators pointed out that OpenSea's intervention here—and especially Kramer's pleas for a centralized response—seemed to go against a key tenet of the industry that often bumps up against usability: the idea that "code is law," and once your tokens are in someone else's digital wallet, that's the end of the game. While OpenSea did not actually reverse the transaction on the blockchain, it did block the stolen NFT's sale on its own platform, which is the most popular marketplace for NFTs.
Of course, you can bet the "commentators" would change their tune if it was their $2.3M "worth" of NFTs that had been stolen. Ongweso understands the gap between ideology and the real world:
OpenSea's interventions in the cases of stolen NFTs show how centralized intermediaries often have an important role wherever the decentralized world of the blockchain meets the real world. It's also not the first time that similar moves have happened elsewhere in crypto, even though they break from the core dogma of immutability and self-sovereignty.
The entire cryptocurrency ecosystem is based upon inequality of participants, the existence of the "greater fool" to whom tokens can be sold at a profit. Thus the importance of preserving the sanctity of "wealth transfer". If your whole purpose is to rip off the "greater fool" the idea that the "greater fool" might be able to do something about the rip-off is a big problem. Ongweso understands this:
It increasingly feels like the inconsistent application of rules in this space more often results in protecting wealth transfer schemes than protecting all users equally, and obscuring the deep centralization already present: less than one percent of users (institutional investors) account for 64 percent of Coinbase’s trading volume, and 10 percent of traders account for 85 percent of NFT transactions and trade 97 percent of all NFTs at least once.

It’s not clear how this contradiction will be resolved. Uncritically believing decentralization is a salve that immediately transforms something’s politics endangers not only users but crypto’s fever dream of disruption. Take the adoption of blockchain-based technology by investment and central banks. One way to look at this is as a sign of crypto’s inevitability. If you look under the hood, however, it’s more plainly a move by financial institutions to reinforce fiat and further centralize the global financial system.

So long as the contradiction persists and the uncritical belief is held, crypto will find itself in an increasingly weaker position to do anything about any of these concerns.
The contradiction must persist, because the ideology of decentralization is fundamental to cryptocurrencies, but in the real world they are not, and cannot be, decentralized.

Greenwashing

Second, a curate's egg from Gillian Tett in Crypto cannot easily be painted green:
Hence that tricky question for mainstream investors fretting about inflation in 2022: can you dabble in crypto without getting your hands dirty in the real world?

The short answer is “yes — but not easily”. First, the good news: in 2021 this once-anarchic corner of finance started to organise itself to go greener. Most notably, a coalition of 200-odd crypto entities recently joined forces with the Rocky Mountain Institute, a Colorado-based environmental lobby, to create a Crypto Climate Accord.

The accord signatories have apparently agreed to cut the carbon emissions from electricity use to net zero by 2030, partly via carbon offsets but also by switching all blockchain technology to renewable energy sources by 2025 and using energy tracking tools such as so-called green hashtags.
David Gerard illustrates the credibility of these "commitments":
Crypto miner Luxxfolio has signed up to the Crypto Climate Accord “committing” them to power all their mining with 100% renewable electricity. How are they implementing this? By buying 15 megawatts of coal-fired power from the Navajo Nation! They’re paying less than a tenth what other Navajo pay for their power — and 14,000 Navajo don’t have any access to electricity. The local Navajo are not happy.
Tett gives far too much credit to the alternatives:
Many of the newer digital assets — such as cardano or solana — have therefore embraced a different process, created in 2012, known as “proof of stake”.

Purists argue that PoS might be less secure than PoW. But it is also far less energy intensive. And some digital assets, such as chia, have cut energy usage even more by adopting a “proof of space and time” algorithm. Taken together, these moves could further reduce the sector’s carbon footprint, particularly since Joe Lubin, a leader in ethereum (the second biggest digital asset) says ethereum will move from PoW to PoS in the coming months.
Chia hasn't exactly set the world on fire, but not for the reason Tett suggests. And Julia Magas writes in When will Ethereum 2.0 fully launch? Roadmap promises speed, but history says otherwise:
Looking at how fast the relevant updates were implemented in the previous versions of Ethereum roadmaps, it turns out that the planned and real release dates are about a year apart, at the very minimum.
There are many problems with Proof-of-Stake, but the main one isn't that it is "less secure" (which, given its massive extra complexity, it may well be). It is that the Gini coefficient of cryptocurrency holdings is extreme, which means Proof-of-Stake isn't effective at decentralization.
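
For readers who haven’t met the term, the Gini coefficient measures how unequally something is distributed, from 0 (everyone holds the same amount) to 1 (one holder has everything). Here is a small, purely illustrative Ruby sketch with made-up stake numbers; under Proof-of-Stake the chance of producing the next block is proportional to stake, so a high Gini means a handful of whales effectively control block production.

# Illustrative only: Gini coefficient of a hypothetical token-stake distribution.
def gini(stakes)
  n = stakes.size
  total = stakes.sum.to_f
  pairwise_diffs = stakes.sum { |a| stakes.sum { |b| (a - b).abs } }
  pairwise_diffs / (2 * n * total)
end

whale_heavy = [5000, 3000, 1000] + Array.new(97, 10)  # three whales, 97 small holders
puts gini(whale_heavy).round(2)          # roughly 0.88: a few holders dominate
puts gini(Array.new(100, 100)).round(2)  # 0.0: perfectly equal stakes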

But Tett does identify three real problems with the frantic greenwashing of cryptocurrencies, she just under-estimates them:
Yet, as Goldman says, obstacles remain. One big problem is that bitcoin remains wedded to PoW consensus, and it accounts for about half of the $2tn crypto universe.
There is no possibility that Bitcoin miners will abandon their investment in equipment and migrate away from PoW:
A second problem is that the industry is so murky that it remains to be seen how much transparency the CCA can really create, particularly among non-signatories. Or, as the Sustainable Funds Monitor observes: “In the end, the lack of transparency and data make it exceedingly difficult to point to any one currency being ‘greener’ than others.”
The great mining migration shows that mining will move quickly to wherever electricity is cheapest, so no pledges will last long:
Finally, basket products created by financial institutions could make the ESG challenge considerably worse by jumbling assets together.
Tett misses the bigger point, which is that even if mining used 100% renewable power, it would compete for that power in a market that is supply constrained, displacing more beneficial uses, and increasing prices for everyone. This point isn't lost on, for example, the residents of Zeewolde. Morgan Meaker reports on this in Facebook’s data center plans rile residents in the Netherlands:
Like Schaap, other residents of Zeewolde are outraged that Meta has chosen their town for its first gigantic data center in the Netherlands. They claim the company will be allowed to syphon off a large percentage of the country’s renewable energy supply to power porn, conspiracy theories, and likes on Meta’s social platforms.

Issue 80: Cryptocurrency’s Wasteful Energy Consumption and an Ode to Interlibrary Loan / Peter Murray

Welcome to issue 80 of Thursday Threads. I’m so happy many of you chose to stick around, and greetings to all of the new subscribers. To those who received my email last Thursday giving a heads-up that a new issue would be coming to your inbox but then didn’t receive that issue: check your spam folder. Over the course of the week, I’ve learned a great deal more about the spam-prevention mechanisms that keep our inboxes as clean as they are. I highly recommend the interactive ‘Learn and Test DMARC’ site sponsored by URIPorts; it was useful to see several standards come together to ensure email senders are who they say they are. (If you find this issue in your spam folder, please reply so I can track down more of the causes.)
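
If you haven’t bumped into DMARC before: a domain publishes its policy as a DNS TXT record at _dmarc.<domain>, and receiving mail servers look it up when deciding what to do with mail that fails SPF or DKIM checks. A quick Ruby sketch with the standard Resolv library shows the lookup (gmail.com is used here only because it is a well-known domain that publishes a policy):

require 'resolv'

# Fetch the published DMARC policy for a domain; receiving mail servers do
# essentially this before deciding how to treat unauthenticated mail.
records = Resolv::DNS.open do |dns|
  dns.getresources('_dmarc.gmail.com', Resolv::DNS::Resource::IN::TXT)
end
records.each { |r| puts r.strings.join }
# Prints something like: v=DMARC1; p=none; sp=quarantine; rua=mailto:...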

Two threads this week:

  • Cryptocurrency’s Energy Consumption
  • An Ode to Interlibrary Loan

On a professional note, my employer is looking for a FOLIO Services Analyst to join our growing effort bringing the FOLIO open source platform to libraries around the world. If getting in on the ground floor of a revolution in library technology sounds appealing, check out the job description at the link above.

Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ’s Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Twitter where I tweet the bookmarks I save. Comments and tips, as always, are welcome.

Cryptocurrency’s Energy Consumption

Kosovo's government on Tuesday introduced a ban on cryptocurrency mining in an attempt to curb electricity consumption as the country faces the worst energy crisis in a decade due to production outages.
Kosovo bans cryptocurrency mining to save electricity Reuters, 5-Jan-2022
An army of cryptocurrency miners heading to the state for its cheap power and laissez-faire regulation is forecast to send demand soaring by as much as 5,000 megawatts over the next two years. The crypto migration to Texas has been building for months, but the sheer volume of power those miners will need — two times more than the capital city of almost 1 million people consumed in all of 2020 — is only now becoming clear.
Texas Plans to Become the U.S. Bitcoin Capital. Can Its Grid, Ercot, Handle It? Bloomberg, 19-Nov-2021
Jumble of calculator tape. Tape Pile, by SidewaysSarah, CC-By

One thread that I already anticipate will be covered on many Thursdays is the growing cryptocurrency problem. In this edition: how cryptocurrencies are a waste of resources. A brief introduction, in case you haven’t encountered this technology yet, goes like this: cryptocurrencies are tokens of value that are exchanged on a “blockchain”. A blockchain, in turn, is like a strip of calculator tape…once something is printed on it, it doesn’t come off and it is there for everyone to see. Cryptocurrencies need “miners” to do the calculations that print something on the tape. Miners race each other to solve complex mathematical problems to be the first to reach the right answer, and when a miner has the answer, it prints it on the tape and all of the other miners check the winner’s work. When the work is accepted, the winning miner gets a little bit of cryptocurrency as a reward and everyone’s transactions that were included in what the winner printed on the tape are considered “confirmed”.
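
To make those “complex mathematical problems” a bit more concrete, here is a toy Ruby sketch of the proof-of-work idea. It is a deliberate simplification, not how any real network implements mining: miners hunt for a nonce that makes a hash start with enough zeros, and anyone can check the winner’s work with a single hash.

require 'digest'

# Toy proof-of-work: find a nonce so that SHA-256(data + nonce) starts with
# `difficulty` zero hex digits. Real networks tune the difficulty so that the
# whole network finds a winner only every few minutes.
def mine(data, difficulty)
  target = '0' * difficulty
  nonce = 0
  nonce += 1 until Digest::SHA256.hexdigest("#{data}#{nonce}").start_with?(target)
  nonce
end

nonce = mine('transactions printed on the tape', 4)
puts nonce                                                                 # finding this took tens of thousands of hashes, on average
puts Digest::SHA256.hexdigest("transactions printed on the tape#{nonce}")  # checking it takes just one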

Racks of computers in an industrial warehouse. Cryptocurrency Mining Farm, from Wikimedia Commons, CC-BySA

In the early days of cryptocurrencies, ordinary people used their computers to run the cryptocurrency algorithms. But the algorithms were structured in such a way that as more miners started working, the mathematical problems got harder. It is no longer feasible for individuals to make any money using their computer’s idle time (that isn’t stopping some companies from trying, though—a Thread for another day). Instead, warehouses of highly specialized computers are doing this work, and are consuming vast quantities of electricity to do so. The world’s leading cryptocurrency mining country was China, but last year China banned miners because of the air pollution coming from the power plants (primarily coal) that were generating electricity for the miners. So the miners left China for politically unstable countries like Kosovo and deregulated states like Texas.

That’s just the start of the cryptocurrency problem. We’ll return to this thread often.

An Ode to Interlibrary Loan

My book arrived a few weeks later, a gift from a mysterious library in the Midwest. Here was knowledge on loan from afar. I loved books, but my local libraries had finite stacks—but now I could get books from anywhere. I have used this service an obnoxious amount of times since. Books about UFOs in France, folklore, saints, basketball, poetry, and propulsion system dissertations—anything.
InterLibrary Loan Will Change Your Life Nick Ripatrazone on Literary Hub, 7-Aug-2019

You know an article is good when it makes the rounds again years later. This 2019 article recently came around again in my Twitter feeds, this time originating with Brewster Kahle. It is a wonderful article about the wonder of discovery and the ethos of libraries to share resources with each other for the benefit of our patrons. Starting in the fifth paragraph, the author takes us on a whirlwind tour of interlibrary loan through the ages.

Interlibrary Loan is a thing of wonder. Modern ILL is chock full of standards, overlapping pools of cooperating libraries, and automation that with each iteration attempts to smooth the process for patrons and librarians. A worthy article if you are in need of an example of why libraries do what they do.

DLF Forum Community Journalist Reflection: Marwah Ayache / Digital Library Federation

This post was written by Marwah Ayache (@cutiewithecards), who was selected to be one of this year’s DLF Forum Community Journalists.

My name is Marwah Ayache (she/her). I am a first generation, Muslim Lebanese-American student librarian. As of now, I work at the University of Michigan – Dearborn Mardigan Library as the User Services Supervisor. Additionally, I am working towards the joint-master’s degree at Wayne State University in Library and Information Sciences and Public History with a School Library Media Specialist Endorsement to my Michigan Secondary Education Social Studies and English Teaching Certificate. I have been featured on the @lifeoflibrarians Instagram. In my free time, I enjoy writing, reading, playing on the Nintendo Switch, doing yoga, listening to podcasts, tarot cards, hanging out in coffee shops, and catching up on TV.


It has been a pleasure to attend the 2021 DLF Forum as one of DLF’s Community Journalists. There were many sessions I enjoyed, and I learned so much as an attendee of this conference. The session that stood out to me the most was the keynote address, “1619 to 2021: A Black Journalist Turns the Light of Truth on the History of American Race”, which featured Nikole Hannah-Jones, a journalist and the creator of The 1619 Project, in conversation with Stacy Patton. It was a pleasure hearing from Hannah-Jones on why she decided to start this project, along with her thoughts on race in America, including critical race theory and the notion of diversity and inclusion.

As an educator turned librarian, I was especially interested in this session because it touched on projects being discussed in the world of education. Critical Race Theory and The 1619 Project are both deemed controversial by conservative political leaders and by many families across the country, and have even been banned from being taught in schools. Teachers have been targeted for teaching these topics, from being told they cannot include the projects in the classroom to being terminated. It was insightful to hear Hannah-Jones discuss how both projects come to be viewed as controversial simply for telling the truth: for addressing and focusing on the issues of race and racism in America. I also appreciated the support for archiving voiced during the keynote address. Patton and Hannah-Jones first discussed how sources are sometimes withheld from people looking for information in archive spaces, out of fear of what they will find. This was shocking to me, because it seems unethical to withhold sources from a person just to cover up the information that is there. Both women explained that this is common in certain situations, even for them as people who write about race and racial issues. They also argued that there needs to be more funding and support for archiving so that information can be preserved properly. It felt great to hear this support, especially after hearing how institutions across the country do not put in the effort to archive resources and information on a variety of topics. 

Finally, the keynote address touched on how diversity and inclusion, as a notion, is not working in corporate America. Many companies will do the most basic thing, like holding a diversity and inclusion training, to try to resolve any race issues in the company, then move on without doing much else. This is not effective at all; in the corporate world, the language of diversity and inclusion is too often thrown around to make it look like companies care about social issues. In reality, if companies truly cared, they would make better efforts to address the issues they are facing.

The post DLF Forum Community Journalist Reflection: Marwah Ayache appeared first on DLF.

2021 CLIR Events Recordings Now Available + Join the 2022 Planning Committees! / Digital Library Federation

Join us in Baltimore! October 9-13, 2022

2021 CLIR Events Recordings Now Available

Recordings of all live-streamed 2021 CLIR Events sessions, including the DLF Forum and affiliated events, are now freely and publicly available.

Watch and share all of last year’s recordings today!

Join the 2022 CLIR Events Planning Committees

Time is running out to sign up to join the planning committees for 2022’s CLIR Events! This year’s events will take place in Baltimore, Maryland in October.

To apply to join the DLF Forum planning committee, complete the volunteer form here by this Friday, January 14, 2022: https://forms.gle/2ajaM8zsruL3ctfw8
To apply to join the NDSA Digital Preservation 2022 planning committee, complete the volunteer form here by this Friday, January 14, 2022: https://forms.gle/UaqqJYwhwBrFQvBH9
To apply to join the Digitizing Hidden Collections Symposium planning committee, complete the volunteer form here by next Friday, January 21, 2022: https://forms.gle/KaqfZm37oZpdmoAp7

The post 2021 CLIR Events Recordings Now Available + Join the 2022 Planning Committees! appeared first on DLF.

Another Layer Of Centralization / David Rosenthal

Moxie Marlinspike tried building "web3" apps and reports on the experience in his must-read My first impressions of web3. The whole post is very perceptive, but the most interesting part reveals yet another way the allegedly decentralized world of cryptocurrencies is centralized.

Below the fold, I explain the details of yet another failure of decentralization.

Marlinspike starts with his explanation of why although "web1" was decentralized, "web2" ended up centralized:
People don’t want to run their own servers, and never will. The premise for web1 was that everyone on the internet would be both a publisher and consumer of content as well as a publisher and consumer of infrastructure.

We’d all have our own web server with our own web site, our own mail server for our own email, our own finger server for our own status messages, our own chargen server for our own character generation. However – and I don’t think this can be emphasized enough – that is not what people want. People do not want to run their own servers.

Even nerds do not want to run their own servers at this point. Even organizations building software full time do not want to run their own servers at this point. If there’s one thing I hope we’ve learned about the world, it’s that people do not want to run their own servers. The companies that emerged offering to do that for you instead were successful, and the companies that iterated on new functionality based on what is possible with those networks were even more successful.
This is partly an example of Economies of Scale in Peer-to-Peer Networks; massive economies of scale make running services "in the cloud" enormously cheaper. But it is also an issue of skill. I've been running my own servers for decades, so I can testify that the skills needed to do so now are exponentially greater than when I started. Not that they were trivial back then, but I was a professional in the technology. There are two main reasons:
  • The environment in which servers run these days is extremely hostile, keeping them reasonably secure demands constant attention.
  • The devoted efforts of thousands of programmers over the decades have made the software the servers run much more complex.
Delegating the task of baby-sitting servers to the paid professionals who run cloud systems just makes sense.

His key observation is:
When people talk about blockchains, they talk about distributed trust, leaderless consensus, and all the mechanics of how that works, but often gloss over the reality that clients ultimately can’t participate in those mechanics. All the network diagrams are of servers, the trust model is between servers, everything is about servers. Blockchains are designed to be a network of peers, but not designed such that it’s really possible for your mobile device or your browser to be one of those peers.

With the shift to mobile, we now live firmly in a world of clients and servers – with the former completely unable to act as the latter – and those questions seem more important to me than ever. Meanwhile, ethereum actually refers to servers as “clients,” so there’s not even a word for an actual untrusted client/server interface that will have to exist somewhere, and no acknowledgement that if successful there will ultimately be billions (!) more clients than servers.
Ethereum nodes need far more resources than a mobile device or a desktop browser can supply. But a mobile device or a desktop browser is where a "decentralized app" needs to run if it is going to interact with a human. So:
companies have emerged that sell API access to an ethereum node they run as a service, along with providing analytics, enhanced APIs they’ve built on top of the default ethereum APIs, and access to historical transactions. Which sounds… familiar. At this point, there are basically two companies. Almost all dApps use either Infura or Alchemy in order to interact with the blockchain. In fact, even when you connect a wallet like MetaMask to a dApp, and the dApp interacts with the blockchain via your wallet, MetaMask is just making calls to Infura!
So once again we see that "decentralized" is just a marketing buzzword implying "not controlled by big corporations you can't trust", obscuring the fact that each layer of the system is controlled by a few not-yet-as-big corporations that are actually far less trustworthy than the big corporations that centralized "web2".

How do we know that the two companies centralizing this layer of the "decentralized" stack aren't trustworthy? Marlinspike looked at their APIs:
These client APIs are not using anything to verify blockchain state or the authenticity of responses. The results aren’t even signed. An app like Autonomous Art says “hey what’s the output of this view function on this smart contract,” Alchemy or Infura responds with a JSON blob that says “this is the output,” and the app renders it.

This was surprising to me. So much work, energy, and time has gone into creating a trustless distributed consensus mechanism, but virtually all clients that wish to access it do so by simply trusting the outputs from these two companies without any further verification.
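
To see how little verification is involved, here is a minimal sketch in Ruby of the kind of JSON-RPC call a dApp or wallet makes to a hosted node. The endpoint URL and project ID are placeholders; eth_blockNumber is a standard Ethereum JSON-RPC method. Whatever JSON comes back is simply believed.

require 'net/http'
require 'json'
require 'uri'

# Placeholder hosted-node endpoint; real dApps point at Infura or Alchemy URLs
# of roughly this shape.
uri = URI('https://mainnet.infura.io/v3/YOUR-PROJECT-ID')

request = { jsonrpc: '2.0', id: 1, method: 'eth_blockNumber', params: [] }
response = Net::HTTP.post(uri, request.to_json, 'Content-Type' => 'application/json')

# Nothing here is signed or checked against the chain: the client just trusts
# whatever this one company's server returns.
puts JSON.parse(response.body)['result']
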
One of the major reasons advanced for why centralization of "web2" in the hands of huge corporations is bad is that they can censor the Web. Marlinspike built an NFT to demonstrate the fragile nature of NFTs. It looked different depending on which NFT service you used to view it:
but when you buy it and view it from your crypto wallet, it will always display as a large 💩 emoji
How did OpenSea react to this demonstration of the problems with NFTs?
After a few days, without warning or explanation, the NFT I made was removed from OpenSea (an NFT marketplace)
...
The takedown suggests that I violated some Term Of Service, but after reading the terms, I don’t see any that prohibit an NFT which changes based on where it is being looked at from, and I was openly describing it that way.

What I found most interesting, though, is that after OpenSea removed my NFT, it also no longer appeared in any crypto wallet on my device.
The reason is that the wallets simply call OpenSea's API, so OpenSea can simply decide to refuse to display NFTs they don't like. Russell Brandom reports on another instance of OpenSea's censorship in Messy NFT drop angers infosec pioneers with unauthorized portraits:
Released on Christmas Day by a group called “ItsBlockchain,” the “Cipher Punks” NFT package included portraits of 46 distinct figures, with ten copies of each token. Taken at their opening price, the full value of the drop was roughly $4,000. But almost immediately, the infosec community began to raise objections — including some from the portrait subjects themselves.
...
Tuesday morning, the ItsBlockchain team announced in a Medium post that it would be “shutting down” the collection in response to the backlash, offering full refunds to any purchasers and covering any gas fees involved in the transfer.
...
In the wake of the post, OpenSea appears to have taken central action to remove the collection, which is no longer visible on the platform.
Censorship can also be useful in the wake of thefts, as Edward Ongweso Jr. reports in ‘All My Apes Gone’: NFT Theft Victims Beg for Centralized Saviors:
Chelsea art gallery owner Todd Kramer had 615 ETH (about $2.3 million) worth of NFTs, primarily Bored Apes and Mutant Apes, stolen by scammers and listed on the peer-to-peer NFT marketplace OpenSea.
...
"We take theft seriously and have policies in place to meet our obligations to the community and deter theft on our platform. We do not have the power to freeze or delist NFTs that exist on these blockchains, however we do disable the ability to use OpenSea to buy or sell stolen items. We've prioritized building security tools and processes to combat theft on OpenSea, and we are actively expanding our efforts across customer support, trust and safety, and site integrity so we can move faster to protect and empower our users.”

OpenSea did not answer, however, why it had frozen the trading of these NFTs and not others stolen just weeks ago that were announced on Twitter by Bored Ape Yacht Club and Jungle Freak NFT owners.
More than seven years ago I provided a detailed description of the economic forces driving centralization. Why has there been almost no progress since then in developing ways to push back against these forces? No-one cares that their "decentralized" system isn't actually decentralized because, even if it isn't, they can still use the buzzword to ensure their "number go up".

Crypto Crazy / Casey Bisson

The conclusion from Garbage Day: We have an entire class of wealth holders now who have assets that are not connected to any particular country.

Finding source install location of a loaded ruby gem at runtime / Jonathan Rochkind

Not with a command line, but from within a Ruby program that has gems loaded … how do you determine the source install location of such a loaded gem?

I found it a bit difficult to find docs on this, so documenting for myself now that I’ve figured it out. Eg:

Gem.loaded_specs['device_detector'].full_gem_path
# => "/Users/jrochkind/.gem/ruby/2.7.5/gems/device_detector-1.0.5"

Service Opportunities within NDSA / Digital Library Federation

Are you looking to get more involved with NDSA? Do you need to ramp up your service? NDSA is currently recruiting new co-chairs for the Standards & Practices and Content Interest Groups.

Interest Group Co-chairs host regular virtual meetings around topics of interest to NDSA members and the digital curation community, participate in monthly Leadership calls, and contribute to the strategic direction of NDSA. Interest groups may also engage in other activities of interest to members, such as developing educational materials. The NDSA Leadership team works collaboratively and provides support for interest group development. Terms are typically two years and are renewable. You can find out more about being a co-chair at Co-Chairing an NDSA Group.

The Standards and Practices Interest Group works to facilitate a broad understanding of the role and benefit of standards in digital preservation and how to use them effectively to ensure durable and usable collections. The Content Interest Group explores a broad array of topics related to preserving digital content, including: viable approaches to institutional collaboration; sharing strategies to address issues of scale and complexity; and the development of policies, practices, and community action to promote ethical and sustainable labor for digital stewardship.

If you’re interested in becoming a co-chair, please contact the current co-chairs to let them know. Any member of NDSA Leadership would be happy to answer any questions!

Standards & Practices: Felicity Dykas (DykasF [at] MISSOURI [dot] EDU)

Content: Brenda Burk (bburk [at] clemson [dot] edu) and Deb Verhoff (deb [dot] verhoff [at] nyu [dot] edu)

~ Nathan Tallman
NDSA Chair

The post Service Opportunities within NDSA appeared first on DLF.

Frictionless Planet – Save the Date / Open Knowledge Foundation

We believe that an ecosystem of organisations combining tools, techniques and strategies to transform datasets relevant to the climate crisis into applied knowledge and actionable campaigns can get us closer to the Paris Agreement goals. Today, scientists, academics and activists are working against the clock to save us from the greatest catastrophe of our times. But they are doing so under-resourced, siloed and disconnected, sometimes facing physical threats or achieving only very local, isolated impact. We want to reverse that by activating a cross-sectoral sharing process of tools, techniques and technologies to open the data and unleash the power of knowledge in the fight against climate change. We have already started with the Frictionless Data process – collaborating with research groups to better manage ocean research data and openly publish cleaned, integrated energy data – and we want to expand it into an action-oriented alliance leading to cross-regional, cross-sectoral, sustainable collaboration. We need to use the best tools and the best minds of our times to fight the problems of our times. 

We consider you and your organisation to be leading thinkers, doers and communicators, leveraging technology and creativity in a unique way with the potential to lead to meaningful change. We would love to invite you to an initial brainstorming session as we think about common efforts, a sustainability path and a road of action for the next three years and beyond. 

What will we do together during this brainstorming session? Our overarching goal is to make open climate data more useful. To that end, during this initial session we will conceptualise ways of cleaning and standardising open climate data, create more reproducible and efficient methods of consuming and analysing that data, and focus on ways to put this data into the hands of those who can truly drive change. 

WHAT TO BRING?

  • An effective effort or idea at the intersection of digital and climate change that you feel proud of.
  • A data problem you are struggling with.
  • Your best post-holidays smile.

When?

13:30 GMT – 20 January – Registration open here.

20:30 GMT – 21 January – Registration open here.

Limited slots, 25 attendees per session. 

100+ Conversations to inspire our new Direction / Open Knowledge Foundation

It has been almost two decades since OKF was founded. Back then, the open movement was navigating uncharted waters, with hope and optimism. We created new standards, engaged powerful actors and achieved change in government, science and access to knowledge and education, unleashing the power of openness, collaboration and community in the early digital days. You were a key mind in shaping the movement with your ideas and contributions.

Now the world has changed again. Digital power structures are in the hands of a few corporations, controlling not only the richest datasets but also what we see, read and interact with. The climate crisis is aggravated by our digital dependencies. Inequality is rampant and the benefits of the digital transition are, once again, unevenly distributed. We transferred the racism and prejudices of the past to the technologies of the future, and the permissionless openness we enabled and encouraged has, in some cases, led to new forms of extractivism and exploitation.

What is the role of the Open Knowledge Foundation in facing the new challenges of “open” and the new threats to a “knowledge society and economy”? What are the most urgent and important areas of action? Who are the partners we need to bring in to gain relevance and traction? Who are the allies we need to get closer to? Priorities? Areas of opportunity? Areas of caution?

We are meeting 100+ people to discuss the future of open knowledge as we write our new strategy. It will be shaped by a diverse set of visions from artists, activists, academics, archivists, thinkers, policymakers, data scientists, educators and community leaders from all over the world, updating and upgrading our path of action and direction to meet the complex challenges of our times.

We want these conversations to reflect the diversity in our societies and the very diverse challenges we will need to face. We are therefore gathering suggestions on people we should talk to, from as many allies as possible. Who do you think would make a difference in this conversation? Who should we go and talk to? Please let us know your suggestion via this form.

Stay tuned to know more about these conversations and the outcome they will have on our strategy ahead.

The collaborative strategy will be validated by our board of directors and network, and it will be launched this year.

Forcing the TCP/IP TTL on GL-iNet devices / Casey Bisson

I sometimes use a GL-iNet AR750S Slate instead of tethering to my phone. It’s a nice, compact device, but I needed to change the TCP/IP TTL for one application.

Working with UNHCR to better collect, archive and re-use data about some of the world’s most vulnerable people / Open Knowledge Foundation

Since 2018, the team at Open Knowledge Foundation has been working with the Raw Internal Data Library (RIDL) project team at UNHCR to build an internal library of data to support evidence-based decision making by UNHCR and its partners.

What’s this about? 

The United Nations High Commissioner for Refugees (UNHCR) is a global organisation ‘dedicated to saving lives, protecting rights and building a better future for refugees, forcibly displaced communities and stateless people’.

Around the world, at least 82 million people have been forced to flee their homes. Many of these people are refugees and asylum seekers. Over half are internally displaced within the borders of their own country. The vast majority of these people are hosted in developing countries. Learn more here.

UNHCR has a presence in 125 countries, with 90%+ of staff based in the field. An important dimension of their work involves collecting and using data – to understand what’s happening, to which people, where it’s happening and what should be done about it. 

In the past, managing this data was a huge challenge. Data was collected in a decentralised manner. It was then stored, archived and processed in a decentralised manner. This meant that much of the value of this data was lost. Insights went undiscovered. Opportunities were missed. 

In 2019, the UNHCR released its Data Transformation Strategy 2020 – 2025 – with the vision of UNHCR becoming ‘a trusted leader on data and information related to refugees and other affected populations, thereby enabling actions that protect, include and empower’.

The Raw Internal Data Library (RIDL) supports this strategy by creating a safe, organised place for UNHCR to store its data, with metadata that helps staff find the data they need and enables them to re-use it in multiple types of analysis. 

Since 2018, the team at Open Knowledge Foundation have been working with the RIDL team to build this library using CKAN –  the open source data management system. 

OKF spoke with Mariann Urban at UNHCR Global Data Service about the project to learn more. 

Here is an extract of that interview, which has been edited for length and clarity.


Hi Mariann. Can you start by telling us why data is important for UNHCR?

MU/UNHCR: That’s a great question. Pretty much everyone at UNHCR now recognises that good data is the key to achieving meaningful solutions for displaced people. It’s important to enable evidence-based decision making and to deliver our mandate. And also, it helps us raise awareness and demonstrate the impact of our work. Data is at the foundation of what UNHCR does. It’s also important for building strong partnerships with governments and other organisations. When we share this data, anonymised where necessary, it allows our partners to design their programmes better. Data is critical to generate better knowledge and insights. Secondary usage includes indicator baseline analysis, trend analysis, forecasting, modeling etc. Data is really valuable!

What kinds of datasets does UNHCR collect and use?

MU/UNHCR: We have people working in countries all over the world, most of them in the field. Every year UNHCR spends a huge amount of money collecting data. It’s a huge investment. Much of this data collection happens at the field level, organised by our partners in operations. They collect a multitude of operational data each year.

You must have lots of interesting data. Can you give us an example of one important dataset?

MU/UNHCR: One of the most valuable datasets is our registration data. Registering refugees and asylum seekers is the primary responsibility of governments. But if they require help, UNHCR provides support in that area.

In the past, how was data collected, archived and used at UNHCR?

MU/UNHCR: Let me give you an example of how it used to be. In the past, let’s imagine, there was a data collection exercise in Cameroon. Our colleagues finished the exercise, and the data stayed with the partner organisation, or sometimes with the actual person collecting the data. It was stored on hard drives, shared drives, email accounts and so on. Then the next person who wanted to work with the data, or a similar dataset, probably had no access to it to use as a baseline or for trend analysis.

That sounds like a problem.

MU/UNHCR: Yes! This was the problem statement that led to the idea of the Raw Internal Data Library (RIDL). Of course, we already have corporate data archiving solutions. But we realised we needed something more.

Tell us more about RIDL

MU/UNHCR: The main goal of RIDL is to stop data loss. We know that the organisation cannot capitalise on data that is lost or forgotten, not stored in an interoperable, machine-readable format, or missing the minimum set of metadata needed to ensure appropriate further use.

RIDL is built on CKAN. Why is that?

MU/UNHCR: Our team had some experience with CKAN, which is already used in the humanitarian data community. UNHCR has been an active user of OCHA’s Humanitarian Data Exchange (HDX) platform to share aggregate data externally, and we closely collaborate with its technical team. After some market research, we realised that CKAN was also a good solution for an internal library – the data is internal, but it needs to be visible to a lot of people inside the organisation. 

What about external partners and the media? Can they access RIDL datasets?

MU/UNHCR: There are some complicated issues around privacy and security. Some of the data we collect is extremely sensitive. We have to be strong custodians of this data to ensure it is used appropriately. Once we analyse the data, we can take the next step and share it externally, of course. Sometimes our data includes personal identifiers, so it must be cleaned and anonymised to ensure that data subjects are not identifiable. Once we have a dataset that is anonymised, we use our Microdata Library to publish it externally. Thus RIDL is the first step in a long chain of sharing our data with partners, governments, researchers and the media. 

RIDL is a technological solution. But I imagine there is some cultural change required for UNHCR to reach its vision of becoming a data-enabled organisation.

MU/UNHCR: Yes of course, achieving these aspirations is not just about getting the technology right. We also have to make cultural, procedural and governance changes to become a data-enabled organisation. It’s a huge project. It needs a culture shift in UNHCR – because even if it’s internal, it’s a bit of work to convince people to upload. The metadata is always visible for everyone internally, but the actual data itself can be restricted and only visible following a request and evaluation. We want to be a trusted leader, but we also want to use that data to arrive at a better solution for refugees, to enrich our partnerships, and to enable evidence-based decision making – which is what we always aim to do.

Thanks for sharing your insights with us today Mariann. 

MU/UNHCR: No problem. It’s been a pleasure. 


Find out more

Open Knowledge Foundation is working with UNHCR to deliver the Raw Internal Data Library (RIDL). If you work outside of UNHCR, you can access UNHCR’s Microdata Library here. Learn more about CKAN here. 

If your organisation needs a Data Library solution and you want to learn more about our work, email info@okfn.org. We’d love to talk to you!

Fix the docs - SQL Server and Docker on a Mac / Ted Lawless

I recently set up Microsoft SQL Server on a MacBook and found a simple error in Microsoft's documentation. Since I didn't quickly find an answer in the normal places (Google -> Stack Overflow), I thought I would post it here in case it saves someone a few minutes. From Microsoft's "Get started with SQL Server" documentation, which is otherwise quite good and nicely organized, Step 1.1 (https://www.microsoft.com/en-us/sql-server/developer-get-started/python/mac/) lists two commands for pulling a Docker image from Docker Hub and running it...

Mutable Mobiles / Ed Summers

Based on Calder Sculpture 2/4 by Bolumena

One of the topics we’ve been discussing while thinking about the design of the WACZ format and how it is used in tools like ArchiveWeb.page and ReplayWeb.page is the issue of trust. This post isn’t about all the angles on this issue, because it’s a huge topic that will be addressed more fully as the work progresses. It’s just a quick experiment to illustrate how distributed web archives could be (will be? are?) used as a tool in disinformation and deep(ish) fakes, and to encourage some creative thinking on how to mitigate some of these potential harms, or design with them in mind.

ArchiveWeb.page makes it easy to record a web page (or pages), export as a WACZ, and embed that WACZ anywhere on the web using the ReplayWeb.page web component (just a bit of HTML, JavaScript and the WACZ file). This is amazing when you consider the amount of infrastructure that is needed to do this with a classical web archive setup (e.g. Heritrix + Wayback Machine).

But because the traditional web archive stack is so expensive to run and keep online it is typically paid for by organizations that naturally have some level of trust associated with them (Internet Archive, Library of Congress, British Library, Bibliothèque nationale de France, etc).

When web archives are easy to create and distribute as a media type on the web outside of traditional “web archives” (e.g. on wikis, in CMS systems, social media, etc.) they theoretically become what Bruno Latour calls an immutable mobile: a snapshot of what some region of the web looked like at a particular point in time. Latour details lots of characteristics of immutable mobiles, but his basic definition is that they are:

A general term that refers to all the types of transformations through which an entity becomes materialized into a sign, an archive, a document, a piece of paper, a trace. Usually but not always inscriptions are two-dimensional, superimposable, and combinable. They are always mobile, that is they allow for new translations and articulations while keeping some types of relations intact. (Latour, 1999)

Maybe this is a bit circular saying that WACZ files are immutable mobiles because they are archives, but I think the idea is useful because it gets us thinking about how WACZs can move around while retaining particular intrinsic properties.

But the question is, how immutable are these WACZ archives? One approach that is under discussion for verifying the authenticity of a WACZ is to generate a content hash of each HTTP response, record it in the WARC file, and then create a content hash for all the WARC files, and sign it. This way you can know as you are replaying content from an archive that it hasn’t been interfered with, as long as you trust the WACZ creator. This information would then be surfaced in the player in some fashion, perhaps not unlike how you trust a website by examining the lock to the left of the URL location in your browser.
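
To make that concrete, here is a minimal Python sketch of the general idea, not the actual WACZ signing specification (which was still being worked out): hash each WARC file, combine the per-file hashes into a single package digest, and sign that digest so a player can later verify it.

# A minimal sketch of the general idea, not the WACZ spec itself:
# hash each WARC file, then hash the sorted list of per-file digests
# to get one package digest that the archive's creator could sign.
import hashlib
from pathlib import Path

def sha256_file(path):
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def package_digest(warc_dir):
    per_file = sorted(sha256_file(p) for p in Path(warc_dir).glob("*.warc.gz"))
    return hashlib.sha256("\n".join(per_file).encode()).hexdigest()

# package_digest("archive/") would then be signed with the creator's key and
# shipped alongside the WACZ so a player could verify integrity and surface
# who vouches for the capture.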

Trust in the creator of a WACZ (not only the publisher) is an essential component, because with a little bit of effort it’s possible to create a web archive that looks authentic but was actually recorded in a way that transformed the original content. For example:

You can also see this WACZ embedded in a larger page here, which allows some additional UI elements to appear. The WACZ file being viewed here demonstrates a potential attack vector for distributed web archives, where archived content has been altered as it was being recorded.

In this experiment a man-in-the-middle proxy (mitmproxy) was set up on the recording user’s computer, and their browser was instructed to use the proxy and trust the certificate it generated. mitmproxy was given a small script to rewrite the content of the page:

# rewrite.py
#
# mitmproxy addon: the response() hook is called for every HTTP response
# passing through the proxy, so the page text can be altered before it
# reaches the browser (and the web archive being recorded).

def response(flow):
    flow.response.text = flow.response.text.replace(
        'Members of extremist Oath Keepers group planned attack',
        'Members of extremist Oath Keepers group planned to prevent the attack'
    )

The script is used by simply telling mitmproxy to load it when starting up:

$ mitmproxy -s rewrite.py

Is there anything that can be done at the interface level to surface any information in the underlying data (e.g HTTP trace data) that could provide a clue that some breach of trust has occurred? Or perhaps the only viable way forward is for the ReplayWeb.page interface to negotiate trust between the viewer and the creator of the archive: to allow the viewer to see who created the archive, and potentially who else trusts the creator?

Trust on the web is a bit of a fraught problem space, and the techniques for signing WACZs are by no means finalized, but if you have thoughts or ideas about how a WACZ viewer could speak to issues of trust please send them via the WACZ issue tracker or to info [at] webrecorder.net.

References

Latour, B. (1999). Pandora’s hope: Essays on the reality of science studies. Harvard University Press.

Refactoring DLTJ, Winter 2021 Part 4: Thursday Threads Newsletter Launches / Peter Murray

Success! Four parts plus a half (or a “re-do” of part 2):

  1. Ramp up automation for adding reading sources to Obsidian
  2. Refactor the process of building this static website on AWS
  3. Recreate the ability for readers to get updates by email
  4. Turn the old DLTJ “Thursday Threads” concept into a newsletter (this post)

Earlier today, the newsletter launched with issue 79. It wasn’t without hiccups, but I don’t think any of the problems leaked out to the subscribers. I started with a list of 286 email addresses that were subscribed to the 2015 edition. This morning I sent an email to all of them on the blind-carbon-copy line from my regular email. That way I could see which addresses bounced back as undeliverable (94 addresses) before loading the list into the newsletter database. (Undeliverable email counts as a strike against you when using Amazon’s Simple Email Service, so I didn’t want to start with a bad reputation with them.)

One of the issues I ran into was with the multiprocessing code that I found on the web. It didn’t work as claimed, and when I tried to adjust it, the loop to process email stalled, so I ripped out that code. In the end, with about 200 email addresses, it took just a minute or two of single-threaded, sequential sending to get them all out. Perhaps I won’t need that multi-threaded capability until Thursday Threads gets much bigger.

How the Newsletter is Put Together

Like everything on this static site blog, an issue starts as a Markdown file. Markdown is a light-weight markup language that translates very easily into HTML and makes it easy for a writer to create valid HTML. It is also possible to mix HTML inside a Markdown file and have the right thing happen. The Jekyll processor (the program that turns a folder of Markdown files into a folder of HTML files) has a mechanism for including macros in the markup, and each “thread” in the issue is a macro file. If you look at the Markdown source for issue 79, you’ll see that each heading (marked with ##) has a {% include thursday-threads-quote.html … %} macro with parameters like these:

blockquote="The EDUCAUSE 2022 Top 10 IT Issues take an optimistic view of how technology can help make the higher education we deserve—through a shared transformational vision and strategy for the institution, a recognition of the need to place students’ success at the center, and a sustainable business model that has redefined 'the campus.'" 
url="https://er.educause.edu/articles/2021/11/top-10-it-issues-2022-the-higher-education-we-deserve" 
versiondate="2021-11-12"
versionurl="https://web.archive.org/20211127031010/https://er.educause.edu/articles/2021/11/top-10-it-issues-2022-the-higher-education-we-deserve"
anchor="Top 10 IT Issues, 2022: The Higher Education We Deserve" 
post="EDUCAUSE"

Each of those variables is used in the include template, which at the moment looks like this (see current version):

<figure class="quote thursdaythread">
  <blockquote>
{{ include.blockquote }}
  </blockquote>
  <figcaption>&mdash;
{% if include.pre %}{{ include.pre }}{% endif %}
{% if include.url %}<a href="{{ include.url }}"{% if include.versionurl %} data-versionurl="{{ include.versionurl }}"{% endif%}{% if include.versiondate %} data-versiondate="{{ include.versiondate }}"{% endif %}{% if include.title %} title="{{ include.title }}"{% endif %}>{{ include.anchor}}</a>{% endif %}
{% if include.post %}{{ include.post }}{% endif %}
  </figcaption>
</figure>

That is some semantically-appropriate HTML that, with some CSS, makes the nice layout on the page. (And it should be accessible to screen readers, too.) The content of the “blockquote” variable is inserted at the {{ include.blockquote }} spot. There are also some conditional statements ({% if include.pre %} ... {% endif %}) that will include markup when a variable has a value assigned to it. The best part of these include blocks is that I can save them as separate files in my Obsidian database with links and tags to the places where I got the content. In fact, I expect my writing workflow will start with creating these include fragments in my Obsidian database throughout the week, and then when Wednesday night rolls around I’ll pick some to drop into an issue. (Over time, I aim to convert all 650 previous blog posts into Markdown and add them to my Obsidian database as well. That will make it even easier to draw threads from the past.)

So that is where we are: some revitalized technology backing DLTJ and a strong intention to write more in the new year. Thanks for everyone’s interest along the way, and please get in touch if you have any questions or comments.

Refactoring DLTJ, Winter 2021 Part 1: Picking up Obsidian / Peter Murray

As 2021 comes to a close, I’ve been thinking about this blog and my own “personal knowledge management” tools. It is time for some upgrades to both. The next few posts will be about the changes I’m making over this winter break. Right now I think the updating will look something like this:

  • Ramp up automation for adding reading sources to Obsidian
  • Refactor the process of building this static website on AWS
  • Recreate the ability for readers to get updates by email
  • Turn the old DLTJ “Thursday Threads” concept into a newsletter

I’ll go back and link the bullet points above when (if?) I create the corresponding blog posts.

I’ve been using Obsidian for about six months as a place to note and link ideas on stuff I’m reading and watching. In case you haven’t run across it yet, Obsidian is a personal wiki of sorts. It is software that sits atop a folder of Markdown files to provide indexing as well as inter-page linking and graph views of the folder’s contents. Most people use it to build up their own personal knowledge management (PKM) database. You can make notes for the sources you are reading, then build knowledge by linking sources together using keywords and adding commentary at the intersection of related ideas.

Before Obsidian, I was using the Pinboard service to store bookmarks of interesting sources and using the paid subscription search engine and my own memory to find stuff. I’ve found that this setup works okay for retrieval—I can usually find things that I know I’ve read about before—but doesn’t do so well for making new connections or creating new knowledge. The Thursday Threads series on this blog years ago was, in part, a way to find those connections and explore them a little bit in writing. I’m expecting Obsidian to help improve this area.

The start of the knowledge curation process is creating pages in Obsidian for the important/useful things I’m reading—each of these is a “source”. I like the idea of having a bookmark service as the start of the queue of sources feeding into the PKM; it is a universal tool that is available from a wide variety of entry points. In my desktop browser, I use the Pinboard bookmarklet to add new sources. On iOS, I use the Pins app on the share sheet to add things. The Pins app works not only in Safari but also in other places like the New York Times and Twitter apps.

To get sources from Pinboard into my Obsidian PKM database, I wrote a Python script that uses the Pinboard API to copy bookmarks into an intermediate SQLite3 database, and then every morning creates a page in the Obsidian database for each new source. Please note that this Python script is quite the mess; it started simple but has had functionality grafted into it a dozen times now, and it is in need of a serious rewrite. For better or for worse, it is out there for others to inspect and get ideas from.
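
For readers curious what that pipeline looks like, here is a rough sketch of the Pinboard-to-SQLite step; it is not the author’s script, and it assumes a Pinboard API token in a PINBOARD_TOKEN environment variable and a local bookmarks.db file.

# Rough sketch of the Pinboard-to-SQLite step; not the author's script.
# Assumes a Pinboard API token in PINBOARD_TOKEN and a local bookmarks.db.
import os
import sqlite3
import requests

resp = requests.get(
    "https://api.pinboard.in/v1/posts/recent",
    params={"auth_token": os.environ["PINBOARD_TOKEN"], "format": "json", "count": 50},
)
conn = sqlite3.connect("bookmarks.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS bookmarks (url TEXT PRIMARY KEY, title TEXT, tags TEXT, saved TEXT)"
)
for post in resp.json()["posts"]:
    conn.execute(
        "INSERT OR IGNORE INTO bookmarks VALUES (?, ?, ?, ?)",
        (post["href"], post["description"], post["tags"], post["time"]),
    )
conn.commit()
# A second pass would then render one Markdown page per new row into the
# Obsidian vault folder.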

For the sources I add to my PKM, I’m also concerned about link rot (web resources that go missing) and content drift (resources that change in between the time you first read them and when you or someone else goes back to them). To combat this, the script sends an API call to the Internet Archive’s Wayback Machine to save the contents of a web page. I’m able to retrieve the Wayback archive URL and save that in the SQLite3 database. This is useful not only for my own reference but also for when I publish blog posts. You’ve probably noticed the link symbol to the right of hyperlinks on this page; those are robust links in practice—it opens a drop-down menu that takes you to the archived version of the linked webpage. (The robust links concept probably deserves a blog post all its own.)
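
The Wayback Machine part of that step can be as simple as the sketch below (again an illustration, not the author’s code): trigger a Save Page Now capture, then ask the public availability API for the closest snapshot URL to store as the robust link.

# Illustration only: trigger a Save Page Now capture, then look up the
# closest archived snapshot to use as the robust link. Both endpoints are
# public but rate-limited, and a fresh capture may take a while to appear.
import requests

def wayback_snapshot(url):
    requests.get(f"https://web.archive.org/save/{url}", timeout=60)
    resp = requests.get(
        "https://archive.org/wayback/available", params={"url": url}, timeout=30
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest", {})
    return closest.get("url")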

Another side effect that I wrote into the script was to post my public bookmarks to Twitter and Mastodon. (For a long time I used the Buffer service to do the same thing, but over the years I had less and less control over how and when Buffer posted links.) I hope that posting these sources publicly will generate more conversation on the topic that I can add to my notes.

Each bookmark on the Pinboard service has a field for a description and a field for tags, and those are okay as far as they go. For some sources, though, I found myself wanting to comment in a more structured way, and so I reintroduced myself to the Hypothes.is service. Hypothes.is allows you to comment on any web page or PDF, and share those comments with others. More importantly, Hypothes.is lets you comment on selected portions of a document, and stores enough context to find that same location even when the underlying document changes. Hypothes.is as a service is embedding itself into learning management systems as a way for students to collaboratively critique content on the internet, but it is also useful to average folks. My Python script uses the Hypothes.is API to read my stored annotations, then gathers all of the annotations for one source onto its own page in the PKM database.
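
A minimal sketch of that Hypothes.is step might look like the following; it is an illustration rather than the author’s script, and it assumes an API token in HYPOTHESIS_TOKEN and a username in HYPOTHESIS_USER.

# Illustration: pull recent annotations from the Hypothes.is API and group
# them by the URL they annotate, one group per future PKM source page.
import os
from collections import defaultdict
import requests

resp = requests.get(
    "https://api.hypothes.is/api/search",
    headers={"Authorization": f"Bearer {os.environ['HYPOTHESIS_TOKEN']}"},
    params={"user": f"acct:{os.environ['HYPOTHESIS_USER']}@hypothes.is", "limit": 200},
)
notes = defaultdict(list)
for row in resp.json()["rows"]:
    notes[row["uri"]].append(row.get("text", ""))
# Each key in notes becomes (or is appended to) a source page in the PKM.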

So that is what I’ve been running with for a number of months. This week I’ve been adding some enhancements to the Python script. The first change was to make each Pinboard bookmark its own page in the PKM database. When I started out months ago, I thought my “Sources” area of my Obsidian PKM would get polluted with small, stub pages because the only content was a link to the source and the topical keywords. So initially the script just added the links to the sources on a “daily notes” page. I found this ended up polluting the PKM’s knowledge graph, though, because unrelated daily note pages would be linked through topical keywords from sources that were only related because I happened to read them on the same day.

Second, to add more “heft” to these source pages in the PKM, the script now adds a summary paragraph. It does this by scraping the main content of the webpage (using trafilatura) and picking the most important sentences (using the Natural Language Toolkit (NLTK) and the technique described by Ekta Shah). I’m expecting that no matter what I’ve written about the source or the keywords I’ve assigned to it, these summaries will provide another valuable way to retrieve pages and concepts. (The NLTK toolkit has other text processing features—entity recognition, sentiment analysis, etc.—and I might explore adding more information to the Obsidian PKM pages with those tools.)
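
As a sketch of that summarization step (not the author’s code), trafilatura can pull out the main text and a simple word-frequency score can pick the top sentences; this assumes the NLTK punkt and stopwords data have been downloaded.

# Sketch of a frequency-based extractive summary, assuming
# nltk.download("punkt") and nltk.download("stopwords") have been run.
from collections import Counter
import nltk
from nltk.corpus import stopwords
import trafilatura

def summarize(url, n_sentences=3):
    html = trafilatura.fetch_url(url)
    text = trafilatura.extract(html) or ""
    sentences = nltk.sent_tokenize(text)
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in nltk.word_tokenize(text) if w.isalpha()]
    freq = Counter(w for w in words if w not in stop)
    # Score each sentence by the summed frequency of its words, keep the top few.
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w.lower()] for w in nltk.word_tokenize(s)),
        reverse=True,
    )
    return " ".join(ranked[:n_sentences])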

Third, I started adding more metadata to the top of each PKM page. I’m expecting this metadata will be useful in the future, especially for functionality like reminding myself of saved sources after six months or a year.

The result of all of this work is an Obsidian page that looks like this:

Screen capture of an example Obsidian page: a Markdown-formatted page with metadata at the top, an automated summary in the middle, and links to the original source and topical keywords.

I think I’m done for now with this Python script that injects new sources into my Obsidian PKM database. The next big thing is some kind of topical keyword management…a personal ontology service of sorts. (If you know of any software like that—particularly something that works with Obsidian—please let me know.) Eventually, I’d like to add a mechanism that pulls annotated text from Kindle books as new sources. I’d also like to find a way to get a list of podcast episodes that I’ve listened to and add those as well. But for now…until that rewrite…good enough.

Refactoring DLTJ, Winter 2021 Part 3: ‘Serverless’ Newsletter System / Peter Murray

So it has been quiet here for a couple of days. Rest assured: the quietness comes from heads-down work, not from giving up. Here are the refactor-DLTJ activities so far:

Since New Year’s Day, I’ve been working on a way to send the contents of blog posts by email…commonly known nowadays as a newsletter. Years ago, I was using the Feedburner service to do that. Then Feedburner was bought by Google, and things were mostly okay for a while. Which is to say that most everything was working, and the things that weren’t—like HTTPS for custom RSS domain names—had workarounds. But last summer Feedburner-Google discontinued the distribution of blog posts by email, which meant I needed to buy or build my own email distribution system.

There are certainly “buy” options. For instance, one might use Medium for writing and distribution. But I’ve seen too many services come and go to rely on a business to be a good steward of my content. The Substack service has the same problem. For a while I considered the follow.it service as an alternative to Feedburner that included a newsletter-like add-on, but its “white label” service inserts the “follow.it” domain name in critical places where I would lose control over my list of subscribers. (After all, I’m only able to do this cleanly because I kept control over my RSS feed by using “feeds.dltj.org” as a hostname.)

So I’m running it myself. I briefly considered listmonk, but I don’t know the Go programming language, so that would make troubleshooting and enhancing more of a challenge. Not readily spotting other alternatives, I created my own system using AWS tools, the Serverless.com framework, and the Python programming language. Thanks to a great outline by Marco Lancini and ideas from Victoria Drake.

The newsletter infrastructure software is on GitHub. It deserves a decent README file and some documentation to help others use it if they are so inclined. There are also a number of hard-coded areas that would need to be made more general. (See, for instance, these couple of lines that are used to pull out the body of the blog post for inclusion into the newsletter email.)

But Why

I’ve been asked, why do you go through all of this work instead of just hosting your blog on Wordpress.com? That is a reasonable question and it deserves a thoughtful response.

  1. I like control of my content. My writings have always been stored on devices that I have a moderate amount of control over—first WordPress on a personal server in a co-location space, then WordPress on an Amazon Web Services (AWS) server, then as static files created by the Jekyll program and served up by AWS. (Side note, AWS isn’t the only place my stuff lives—I’ve always kept a copy on my local machine with backups held off-site.)
  2. To keep my tech skills sharp. With a computer science undergrad degree, and as a self-described old-school hacker, I’d like to think I could dive into any system and figure out how to run it. I’ve about given up on the physical and data link layers (there was a time I made my own cables and configured building network equipment) and my skills in the network and transport layers are getting quite stale (heard of the new QUIC protocol?). In my day job, the newish always-on, internet-grade infrastructure tools are becoming ever more mysterious. I want to learn a few new things just to keep up the practice of learning.
  3. Privacy and the Common Good still matter. My blog and this newsletter technology use no tracking technology. Aside from comments, I can’t tell who or how many are reading my blog. With the newsletter system, there are no tracking pixels or link shorteners that are detecting what you read. And this content is offered for free. Beyond the technical expertise, the technology running the blog and newsletter is really cheap. The blog looks to be about 50¢ a day—much lower than expected; we’ll see about the newsletter, but I don’t expect it to be more than a couple of dollars a month.

So that’s my thinking at this point. The technology surrounding DLTJ has certainly changed over the years, and I don’t expect it will remain static for decades on end. Time, technology advancement, and life choices can certainly change the calculus in the future.

Keep an eye out for the newsletter tomorrow, posted here on DLTJ and sent by email. If you’d like to subscribe to DLTJ Thursday Threads by email, head on over to newsletter.dltj.org.

Refactoring DLTJ, Winter 2021 Part 2: Adopt AWS Amplify / Peter Murray

Look at that! Progress is being made down the list of to-dos for this blog in order to start the new year on a fresh footing. As you might recall from the last blog post, I set out to do some upgrades across the calendar year boundary:

DLTJ is a “static site” blog, meaning that the page you are reading right now is a straight-up HTML file. This page is converted from the simple Markdown format to HTML by the Jekyll program. The DLTJ blog used to be based on WordPress, which meant a server was always running to dynamically generate each webpage out of a database. (If you go back in the DLTJ archives you’ll see notes on top of pages that were part of the automatic conversion from WordPress to Markdown.) That WordPress server was quite costly to have constantly run for a small blog. (Yes, it is possible to pay someone a small amount to host your WordPress blog for you, but I’m a do-it-yourself kind of person.) So at the end of 2017 I migrated the site to Markdown stored in a GitHub repository with the Jekyll conversion and content delivery through Amazon Web Services (AWS).

Serving up static web pages from AWS S3/CloudFront is really simple. Processing the Markdown on GitHub into HTML via Jekyll on AWS is more complicated, and that process was something that I wanted to happen automatically every time I published a change to GitHub. I ended up hand-crafting about 650 lines of an AWS CloudFormation configuration file plus a few dozen lines of Python in some AWS Lambda functions. It worked, but it was fragile and very hard to maintain.

That was in 2017 and technology marches on; now AWS has a service that does all of the automation for you. Called Amplify, it bundles together a bunch of other AWS tools to help developers to create “full-stack web and mobile apps.” The Amplify tools are really quite overkill for a static website, but building a static website is one of the hands-on “Getting Started” examples that AWS offers. For a static website, Amplify handles:

  1. creating an S3 bucket and CloudFront distribution to store and serve up the content
  2. provisioning a webhook API that notifies AWS to start the content building process and adds that webhook to the GitHub repository
  3. setting up the CodeBuild process for Jekyll to generate the static web pages
  4. creating the HTTPS security certificate and adding the appropriate DNS entries to the domain

All of the stuff I was doing in that 650-line CloudFormation file. (Plus Amplify has a lot more interesting features built into the service.)

Screen capture of the AWS Console page for the Amplify service (AWS Amplify Console).

One Problem: Getting the Correct Version of Ruby

Now for the two-hour detour. At least one of the Jekyll Gems I’m using to build this site requires Ruby version 2.6 or higher. The AWS CodeBuild container used by Amplify was defaulting to Ruby version 2.4, though, and my initial attempts to configure Amplify to use the higher version (setting an environment variable) didn’t work. One answer that I found seemed to imply that I needed to build my own custom Docker image, so I started to make one using the AWS instructions. I got pretty far down that path before discovering that this blog needs not only the Ruby runtime but also a JavaScript runtime. (Another one of the Jekyll Gems calls out to a JavaScript function.) At that point, I went back to searching for an answer that used the AWS-supplied Docker image.

The real solution came in an answer about using a different version of Node in Amplify, which was to add a command to the preBuild and build steps to switch Ruby versions:

preBuild:
  commands:
    - rvm use $VERSION_RUBY_2_6
    - bundle install --path vendor/bundle
build:
  commands:
    - rvm use $VERSION_RUBY_2_6
    - bundle exec jekyll build --trace

After that, everything built perfectly.

Note! Jumping in here a day later to say there was another problem…the webmention cache was left behind in the old CodeBuild configuration, so I had to fix it.

Downsides: Lots of Invisible AWS Services and Poor Pricing Comparison

One problem with using AWS Amplify is that the underlying AWS services—S3 bucket, CloudFront distribution, CodeBuild instance, etc.—are not visible in the AWS Console. In other words, you can’t go to the CloudFront console page and see the configuration. More to the point, the cost of the underlying services seems to be aggregated into a relatively flat billing structure; Amplify’s costs are:

  • 1¢ per minute it takes to build the site (the blog takes between 2 and 3 minutes to build for each change)
  • 2.3¢ per GB of data stored per month (this whole blog is about 300MB)
  • 15¢ per GB of data served to readers (it looks like my blog is about 100GB/month)

So all told I think this is going to be $15 to $20 per month, with the biggest piece of that being the outbound bandwidth. Under the old system, last month my outbound CloudFront bandwidth cost $0.59, and this month will be zero because Amazon announced that the first 1TB of CloudFront data is now free. Hmmm—now that I’m pricing this out, maybe I’ll need to go back to the old way. I asked a question on AWS’ forums to see if this really holds true. Of course, I can also just wait a little while and see what my AWS bills look like.
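
As a rough check of that estimate (with an assumed 30 builds per month; the other figures come from the price list above), the arithmetic works out like this:

# Rough check of the estimate above; 30 builds/month is an assumption.
builds_per_month, minutes_per_build = 30, 2.5
storage_gb, egress_gb = 0.3, 100

monthly_cost = (
    builds_per_month * minutes_per_build * 0.01  # build minutes at 1¢/minute
    + storage_gb * 0.023                         # storage at 2.3¢/GB-month
    + egress_gb * 0.15                           # data served at 15¢/GB
)
print(round(monthly_cost, 2))  # about 15.76 -- outbound bandwidth dominates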

$10 to $15 a month is quite a lot—I’ll keep that old 650-line CloudFormation file around to see if I end up reverting to that method. Overall, though, I’m pleased with how this turned out. When I have a better answer on the costs associated with Amplify, I’ll try to remember to come back here and update this post.

Tomorrow…with luck…recreating the email notification/delivery service!

Issue 79: Educational Technology Futures, Social Media Legislation, Apollo 11 Launch at 50 / Peter Murray

Welcome to the re-inaugural issue of DLTJ Thursday Threads. Counting backward, there were 78 previous issues (all but the most recent still need to be converted from the old WordPress style of formatting) with—all told—several hundred references and commentary. Here at the start of 2022, I’m making a resolution to restart Thursday Threads with links and thoughts about library technology, general technology trends, and internet culture.

Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ’s Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Twitter where I tweet the bookmarks I save. Comments and tips, as always, are welcome.

What EDUCAUSE’s 2022 Top 10 IT Issues Mean for Libraries

The EDUCAUSE 2022 Top 10 IT Issues take an optimistic view of how technology can help make the higher education we deserve—through a shared transformational vision and strategy for the institution, a recognition of the need to place students’ success at the center, and a sustainable business model that has redefined 'the campus.'
Top 10 IT Issues, 2022: The Higher Education We Deserve EDUCAUSE

Let’s start with this report from EDUCAUSE from a panel of its members that reviewed survey results on what they see as the big educational technology issues for the year. I cover this report in more depth in a separate DLTJ article, but I think it is useful to provide some of the headline commentaries here. First, these IT leaders anticipate an acceleration of the role of technology in teaching and learning. The pandemic has spawned a new recognition of how big the cohort of “non-traditional” students is—part-time learners, remote learners, asynchronous learners, etc. Instructional technologists will certainly be called upon to support new tools and new roles; the academic librarian’s instructional experience and traditional “high-touch” approach to supporting users can be an asset for institutions that choose to tap that capability. There is recognition that we are all tired and stretched as well as the reality that one-time emergency money is drying up. Still, there is room for growth for academic libraries seeking to re-form their mission for a new era.

Legislation in the Works for Social Media Regulation

Washington is awash in proposals for reforming social media, but in a narrowly divided Congress, it’s little surprise that none have passed. Many Democrats believe that social media’s core problem is that dangerous far-right speech is being amplified. Many Republicans believe that the core problem is that the platforms are suppressing conservative political views. The new Senate legislation, which was introduced by two Democrats, Chris Coons and Amy Klobuchar, and a Republican, Rob Portman, may have a path toward passage because it doesn’t require taking a side in that argument.
A Former Facebook Executive Pushes to Open Social Media’s ‘Black Boxes’ New York Times, 2-Jan-2022

I haven’t heard anyone say recently that the ills of social media are a matter of information literacy. It seems like the world has recognized that social media algorithms prey upon socioeconomic standing and addictive human psychology to drive engagement in negative feedback loops and that no amount of education can combat the power of the algorithm. I don’t expect “social media curriculum” to come out of any legislative effort—particularly in an environment that is as polarized as the one we are now in. But I wonder if there is a role for library programming and library services in helping citizens understand the effects of social media algorithms, should new regulations provide the public data and research about how these companies are affecting our social relationships.

Relive the 50th Anniversary of the Apollo 11 Launch…Projected onto the Washington Monument!

"Apollo 50: Go for the Moon," recreated the launch of Apollo 11 and told the story of the first Moon landing through full-motion projection mapping artwork on the Washington Monument. Over a half-million people joined us July 16 to 20, 2019, to celebrate the 50th anniversary of Apollo 11 on the National Mall.
Apollo 50 Launch in 4k: Washington Monument Projection Mapping, Vimeo (16 minute video)

This is nearly three years old—pre-pandemic times—and it is still worth a quarter of an hour of your time to watch. The creative production and the technical execution of this performance must have been spectacular in person, because it is mesmerizing to watch even on a flat, two-dimensional screen. Details about the Smithsonian Institution collaboration behind it can be found in a press release from the time.

Counterpoint on Venture Capital / David Rosenthal

My personal experience working with VCs was very positive, but it was (a) a long time ago and (b) they were top-flight firms (Sutter Hill and Sequoia). I've been very skeptical of the current state of the VC industry in Venture Capital Isn't Working and Venture Capital Isn't Working: Addendum. Stephen J. Dubner's Is Venture Capital the Secret Sauce of the American Economy? presents a far more optimistic view, as does The Economist's The bright new age of venture capital. On my side of the argument are Fred Wilson's Seed Rounds At $100mm Post Money and the Wall St. Journal's The $900 Billion Cash Pile Inflating Startup Valuations.

Below the fold, some discussion of these opposing views.

Pro

Stephen J. Dubner's Is Venture Capital the Secret Sauce of the American Economy? is a positive view of the contribution of VC-funded companies to the overall US economy. It is based on research published in Synergizing Ventures by Ufuk Akcigit et al., who summarize it in Synergising ventures: The impact of venture capital-backed firms on the aggregate economy:
In a recent paper, we study both empirically and theoretically the role of venture capital (VC) funding – a key source of startup financing – in identifying promising startups and turning them into engines of growth (Akcigit et al. 2019). In particular, we examine the types of startups that get funded by venture capitalists, study the extent to which synergies between venture capitalists and startups and venture capitalist experience matter for firm growth and innovation, and evaluate how critical VC is for growth in the US economy.
Figure 5
They ask whether VC funding improves a company's outcome:
To assess the magnitude of VC treatment effects, the selection of startups by venture capitalists must be taken into account. To control for selection based on observables, we match VC-funded firms with observationally similar non-funded firms along key dimensions, including year of initial funding, industry, state, age, and employment. Figure 5 plots the evolution of (ln) average employment for VC-funded firms and their matched counterparts over the period spanning three years prior to initial funding to 10 years afterwards. Among firms that patent, Figure 6 plots the evolution of (ln) citation-adjusted patent stock of VC-funded firms and their matched counterparts.
Figure 6
The answer isn't a surprise — VC funding is good for a company:
In both figures, the VC-funded and non-funded groups exhibit virtually identical trajectories before VC funding. However, subsequently VC-funded firms grow and innovate much more. Average employment increases by approximately 475% by the end of the horizon for VC-funded firms, whereas growth is much more modest for the control group (230%). Similarly, the average patent stock of VC-funded firms grows by about 1,100% over the 10-year horizon, as opposed to 440% for the control group. These results suggest that venture capitalists play an important role in the making of successful firms.
It has long been known that there are many mediocre VCs and a few really good ones. Here, for example, is Bill Janeway's take:
let me just say one thing about venture capital that’s really different. It’s not the extremely skewed returns: we see that across various asset classes. Rather, it’s the persistence of a firm’s returns over several decades, as seen in the US and documented with the data provided not by venture capitalists themselves, but by their limited partners!
Figure 7
Akcigit et al investigated this effect:
To understand a potential source of these treatment effects, we examine the heterogeneous impact on firm outcomes of being funded by more experienced versus less experienced venture capitalists. To do so, we first divide venture capitalists into two groups. Venture capitalists in the top decile of the ‘total deals’ distribution are labelled as “high quality” (high experience), and the remaining venture capitalists are labelled as “low quality” (low experience). Then, VC-funded firms are separated into those funded by high-quality versus low-quality venture capitalists. Figure 7 plots the evolution of (ln) average employment of firms in each of these categories, and Figure 8 plots the evolution of their (ln) average quality-adjusted patent stock.
Figure 8
Again, the result is as expected:
While firms backed by high- and low-quality venture capitalists are similar prior to VC involvement, the average employment and average patent stock of startups funded by high-quality venture capitalists is higher after VC involvement, and the gap between the two groups widens over the 10-year horizon. By the end of the horizon, average employment grows by about 400% in the high-quality group, versus 320% in the low-quality group. Similarly, by the end of the horizon, the average patent stock grows nearly 50-fold for the high-quality group, and only 19-fold for the low-quality group. We confirm that startups funded by high-quality VCs have better employment outcomes through a regression analysis that controls for both startup characteristics and initial funding infusion. These findings suggest that factors beyond funding, such as expertise and management quality associated with high-quality venture capitalists, matter for subsequent firm outcomes.
Source
The Economist's The bright new age of venture capital is a similarly optimistic take:
YOUNG COMPANIES everywhere were preparing for doomsday in March 2020. Sequoia Capital, a large venture-capital (VC) firm, warned of Armageddon; others predicted a “Great Unwinding”. Airbnb and other startups trimmed their workforces in expectation of an economic bloodbath. Yet within months the gloom had lifted and a historic boom had begun. America unleashed huge stimulus; the dominance of tech firms increased as locked-down consumers spent even more of their time online. Many companies, including Airbnb, took advantage of the bullish mood by listing on the stockmarket. The market capitalisation of American VC-backed firms that went public last year amounted to a record $200bn; it is on course to reach $500bn in 2021.

With their pockets full, investors are looking to bet on a new generation of firms. Global venture investment—which ranges from early “seed” funding for firms that are only just getting going to funding for more mature startups—is on track to hit an all-time high of $580bn this year, according to PitchBook, a data provider. That is nearly 50% more than was invested in 2020, and about 20 times that in 2002.
The money isn't just the result of previous successes by the VCs; it is coming from investors new to the space:
The frenzy is a result of both the entrance of new competitors and greater interest from end-investors. That in turn reflects the fall in interest rates across the rich world, which has pushed investors into riskier but higher-return markets. It has no doubt helped that VC was the highest-performing asset class globally over the past three years, and has performed on a par with bull runs in private equity and public stocks over the past decade.

End-investors who previously avoided VC are now getting involved. In addition to alluring returns, picking out the star funds may be easier for VC than for other types of investment: good performance tends to be more persistent, according to research in the Journal of Financial Economics published last year.
Source
Increased demand from VCs for deals has had the inevitable effect:
The rush of capital has pushed up prices. Venture activity for seed-stage startups today are close to those of series A deal sizes (for older companies that may already be generating revenue) a decade ago. The average amount of funding raised in a seed round for an American startup in 2021 is $3.3m, more than five times what it was in 2010
The biggest VC firms, as usual, have outsized wallets:
The result is that the industry has become more unequal: although the average American VC’s assets under management rose from $220m in 2007 to $280m in 2020, that is skewed by a few big hitters. The median, which is less influenced by such outliers, fell from $70m to $48m. But this is not to say that the industry has become dominated by a few star funds. Market shares are still small. Tiger Global, for instance, led or co-led investments worldwide worth $5bn in 2020, just 1.3% of total venture funding.
...
Company founders, for their part, have gained bargaining power as investors compete. “There’s never been a better time to be an entrepreneur,” says Ali Partovi of Neo, a VC firm based in San Francisco.
Note, for example, Andreessen Horowitz' $2.2B fund devoted solely to cryptocurrency startups such as Chia.

The Economist notes that the proportion of startups exiting via the preferred route, an IPO, has increased:
The big-tech firms used to gobble up challengers: acquisitions by Amazon, Apple, Facebook, Google and Microsoft rose after 2000 and hit a peak of 74 in 2014. But they have fallen since, to around 60 a year in 2019 and 2020, perhaps owing to a fear of antitrust enforcement (see United States section). More startups are making it to public markets. Listings, rather than acquisitions or sales, now account for about 20% of “exits” by a startup, compared with about 5% five years ago.

Con

First, note that although the rate of acquisitions by the FANGs has dropped, it is only from 74 to around 60. On average, they are each buying a company a month. Second, note that acquisitions still outnumber listings by 4 to 1. Third, let's look at how well the companies that did list are doing.
WSJ via Barry Ritholtz
This graph is from a Wall St. Journal article entitled IPOs Had a Record 2021. Now They Are Selling Off Like Crazy by Corrie Driebusch and Peter Santilli:
Looming behind a record-breaking run for IPOs in 2021 is a darker truth: After a selloff in high-growth stocks during the waning days of the year, two-thirds of the companies that went public in the U.S. this year are now trading below their IPO prices.
Although the investors who bought in the IPOs are likely under water, the VCs who sold in the IPOs did well. As The Economist wrote:
With their pockets full, investors are looking to bet on a new generation of firms.
The Wall Street Journal focuses on The $900 Billion Cash Pile Inflating Startup Valuations:
Investors are defying a share-price slump for newly public companies to make hundreds of billions of dollars available to startups, a cash pile that promises to inject a torrent of money into early-stage firms in 2022 and beyond.
The South Sea Company Prospectus (1711)
Some of this money comes in the form of a SPAC:
Special-purpose acquisition companies, which take startups public through mergers, raised about $12 billion in each of October and November, roughly doubling their clip from each of the previous three months, Dealogic data show. So far in December, three SPACs a day are being created. While that is below the first quarter’s record pace, it brings the total amount held by the hundreds of SPACs seeking private companies to take public in the next two years to roughly $160 billion.
Most of it still comes from VCs, but with an increasing proportion from private-equity firms:
The cash committed to venture-capital firms and private-equity firms focused on rapidly growing companies but not yet spent also is ballooning. So-called dry powder hit about $440 billion for venture capitalists and roughly $310 billion for growth-focused PE firms earlier this month, according to Preqin.
Just as the WSJ headline suggests, startup valuations have become inflated. In Seed Rounds At $100mm Post Money, Fred Wilson writes "We have been seeing quite a few seed rounds getting done in and around $100mm". He runs the numbers on a hypothetical $100M fund making 100 seed investments of $1M each, every one buying 1% of a company at a $100M post-money valuation. There are three key parameters of his model:
  • 2/3 dilution from seed to exit, so the fund ends up at exit with only 0.3% of the company.
  • A 0.75 power-law of returns among the 100 companies, as shown in the graph.
  • The best performing among the companies exits at a $10B valuation.
The result is:
a $100mm seed fund that makes all of its investments at $100mm post-money will barely return the fund. And that number is gross, before fees and carry.
For each $1M the fund put in, it got out just $1.333M gross. That isn't a viable VC fund. Wilson points out that the valuation is the problem:
If you run that same model with a $20mm post-money value, you get a 6.667x fund before fees and carry. That’s a strong seed fund, probably a tad better than 4x to the LPs, after fees and carry. If you think you can get one of your hundred seed investments to a $10bn outcome, then paying $20mm post-money in seed rounds seems to make a lot of sense.
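Those multiples are easy to check. The exact shape of the "0.75 power-law" isn't spelled out here, but one reading that reproduces the multiples quoted above is a geometric decline in which each company exits at 75% of the value of the one above it, with the best exit at $10B. A minimal sketch under that assumption:

BEST_EXIT = 10_000_000_000.0 # $10B best outcome
DECAY     = 0.75             # assumed: each company worth 75% of the one above it
COMPANIES = 100
FUND_SIZE = 100_000_000.0    # $100M fund, 100 seed checks of $1M each
CHECK     = FUND_SIZE / COMPANIES
DILUTION  = 2.0 / 3.0        # seed stake diluted by 2/3 between seed and exit

def gross_multiple(post_money)
  ownership_at_exit = (CHECK / post_money) * (1 - DILUTION) # e.g. 1% -> 0.333%
  total_exit_value  = (0...COMPANIES).sum { |i| BEST_EXIT * DECAY**i }
  (total_exit_value * ownership_at_exit) / FUND_SIZE
end

puts format("$100mm post-money: %.2fx gross", gross_multiple(100_000_000.0)) # ~1.33x
puts format("$20mm post-money:  %.2fx gross", gross_multiple(20_000_000.0))  # ~6.67x

Run as written, this prints roughly 1.33x and 6.67x, matching the multiples quoted above; both depend on the assumed $10bn best outcome.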
But Joseph Flaherty tweets that $10B valuations are rare:

It is important to note that the data underlying Ufuk Akcigit et al's analysis is entirely from the period before the most recent tsunami of money hit startups. It largely reflects the VC ecosystem back in the days when I worked at startups, with the balance of power in favor of the VCs. They write:
The analysis is carried out by combining data on all employer businesses in the US from the Census Bureau’s Longitudinal Business Database (LBD), with data on patenting from the USPTO, and deal-level data on firms receiving VC funding from VentureXpert for the period 1980-2012. A key advantage of the data is that it enables us to track the evolution of employment and patenting for all employer businesses in the US, and, critically, differentiate between the experience of VC-funded firms and other firms in the economy.
The current balance of power is in favor of founders like Travis Kalanick, Adam Neumann, Justin Zhu, and Elizabeth Holmes. The Economist notes this change:
The shift has also weakened governance. As the balance of power tilts away from them, VCs get fewer board seats and shares are structured so that founders retain voting power. Founders who make poor chief executives—such as Travis Kalanick, the former boss of Uber, a ride-hailing firm—can hang on for longer than they should.
What Akcigit et al don't investigate is the extent to which their earlier results, showing the beneficial effects of VC funding, are due not to the overall effect of VC funding, but to the large effect of high-quality VC funding on a small proportion of the funded companies. Nor do they investigate whether the more recent flood of money, the resulting proliferation of VCs, and the shift in the balance of power toward founders have made the small proportion of companies funded by high-quality VCs even smaller, thus dragging down the aggregate benefit of VC funding to the economy.

As I wrote in Venture Capital Isn't Working:
I'm a big believer in Bill Joy's Law of Startups — "success is inversely proportional to the amount of money you have". Too much money allows hard decisions to be put off. Taking hard decisions promptly is key to "fail fast".
"Fail fast" used to be the mantra in Silicon Valley. In most of the world success was great, failure was bad, not doing either was OK. In Silicon Valley success was great, failure was OK, and not doing either was a big problem. Trying something and failing fast meant that you had avoided wasting a lot of money and time on an idea that wasn't great, and you had freed up money and time to try something else. Now the mantra seems to be "fake it till you make it", which pretty much guarantees a waste of time and money.

Connecting EAD headings to controlled vocabularies / HangingTogether

Are EAD content headings associated with controlled vocabularies, or can they be?

The EAD Tag Analysis conducted by OCLC Research in 2021, as part of our contribution to the Building a National Finding Aid Network (NAFAN) project, describes how frequently the content tags for names of people, families, organizations, subjects, places, and genre forms are used. A closer look at their element and attribute values can reveal how much work has been done to associate those values with identifiers for the entity in a controlled vocabulary, and can also provide a testbed for evaluating the potential of automated tools for further reconciliation.

As a warning to the reader: this post delves deeply into EAD elements and attributes and assumes some familiarity with the encoding standard. For those wishing to learn more about the definitions and structure, we recommend the Library of Congress EAD website and the highly readable and helpful EADiva website.

Inclusion of authority file numbers is infrequent, but identification of controlled vocabulary sources is more common

Figure 1: Top 10 persname source attribute values

The EAD elements describing people, families, organizations, subjects, places, and genre form terms were extracted from the corpus of over 145 thousand finding aids provided by NAFAN participants. All of these EAD documents used the EAD2002 DTD or Schema; no EAD3 documents were provided for the NAFAN data analyses.

The tabulation of elements that included values for the authfilenumber (a number that identifies the authority file record for an access term) and source (indicating the controlled vocabulary source of the heading) attributes indicates that fewer than 10% of these elements provided an authfilenumber, but 40% or more of the elements included a source value. For example, in the extraction of 1,092,209 persname elements, 94,365 (8.6%) included an authfilenumber attribute value while 595,048 (54.5%) included a source attribute value. Establishing the source can improve the efficiency and accuracy of reconciling headings with a controlled vocabulary.
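As an illustration (and not the actual NAFAN analysis pipeline), a tabulation like the one in Table 1 below could be produced by walking a directory of EAD2002 files with a standard XML parser and counting attribute usage. In this sketch the ead_files/ path and the flat directory layout are placeholder assumptions:

require "nokogiri"

ELEMENTS = %w[corpname famname genreform geogname persname subject]
counts = Hash.new { |h, k| h[k] = { total: 0, authfilenumber: 0, source: 0 } }

Dir.glob("ead_files/**/*.xml") do |path|
  doc = Nokogiri::XML(File.read(path))
  doc.remove_namespaces! # EAD2002 documents may or may not declare a namespace

  ELEMENTS.each do |name|
    doc.xpath("//#{name}").each do |node|
      counts[name][:total] += 1
      counts[name][:authfilenumber] += 1 if node["authfilenumber"]
      counts[name][:source] += 1 if node["source"]
    end
  end
end

counts.each do |name, c|
  next if c[:total].zero?
  pct = ->(n) { (100.0 * n / c[:total]).round(1) }
  puts "#{name}: #{c[:total]} extracted, #{pct.call(c[:authfilenumber])}% with authfilenumber, #{pct.call(c[:source])}% with source"
end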

Not surprisingly, a small number of widely used controlled vocabularies predominate. For example, the Library of Congress Name Authority File (LC NAF) and Virtual International Authority File (VIAF) controlled vocabulary sources were the most commonly referenced shared vocabularies in the aggregated EAD finding aids for personal names, as shown in the chart depicting the top 10 source values found (figure 1).

Element | # of elements extracted | % with authfilenumber | % with source
corpname | 504,438 | 5.2% | 46.4%
famname | 21,993 | 4.6% | 79.2%
genreform | 482,955 | 4.5% | 67.1%
geogname | 392,387 | 3.0% | 57.9%
persname | 1,092,209 | 8.6% | 54.5%
subject | 844,872 | 4.9% | 83.2%
Table 1: Tabulation of authfilenumber and source attributes

Thank you to EADiva for providing the excellent tag library that is linked to from the EAD element names in this post.

Clustering personal name elements could provide a path to enriching finding aids with controlled vocabulary identifiers

This study also focused on how identity can be established for personal names in EAD finding aids. The persname element describes people who owned or created the materials in collections and others who are noted in or related to the collection materials, as well as archivists and others associated with the stewardship of the collection and creation of the finding aid. It was expected that, in comparison with access terms for organizations, places, subjects, and genres, the people that finding aids refer to may not be as widely represented in shared controlled vocabularies from the library domain, if they are not otherwise associated with published works as creators, contributors, or subjects. The EAD Tag Library provides guidance on the use of the persname element:

All names in a finding aid do not have to be tagged. One option is to tag those names for which access other than basic, undifferentiated keyword retrieval is desired. Use of controlled vocabulary forms is recommended to facilitate access to names within and across finding aid systems.

To what extent has authority control been applied in finding aids to access terms for people?

Figure 2: persname clusters with authfilenumber attribute values

The personal name investigation began by using the 1,092,209 persname element values that were extracted from the aggregation of NAFAN EAD documents, noted above. The persname element values were then normalized (converted to lower case, extraneous spaces and punctuation removed) and de-duplicated into clusters of matching headings. To produce a more manageable dataset for cleanup and analysis, a cutoff point for the frequency of occurrence of the heading was applied to only include clusters in which the name occurred in 5 or more finding aids. This resulted in a dataset of 20,767 personal name clusters, representing de-duplicated and merged headings from 496,340 persname elements (45% of all the extracted persnames), and including their associated EAD attribute values. This dataset was further modified using the data cleanup tools available in OpenRefine. Those changes included resolving the variant forms of the LC NAF source attribute value (described below) to a single consistent form, converting LC NAF and VIAF identifier numbers in the authfilenumber attribute to a full URL, and de-duplicating LC NAF and VIAF URLs.
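Outside of OpenRefine, the normalize-and-cluster step might look something like the sketch below. The sample headings array is a placeholder: in practice there would be one hash per persname element extracted from the EAD corpus, carrying the heading string, the finding aid it came from, and any authfilenumber value.

headings = [
  { value: "Obama, Barack",  finding_aid: "ead001.xml", authfilenumber: nil },
  { value: "Obama, Barack.", finding_aid: "ead002.xml", authfilenumber: nil },
  { value: "obama, barack",  finding_aid: "ead003.xml", authfilenumber: nil },
  { value: "Obama, Barack",  finding_aid: "ead004.xml", authfilenumber: "placeholder-lcnaf-id" },
  { value: "Obama, Barack",  finding_aid: "ead005.xml", authfilenumber: nil }
] # illustrative placeholder rows

def normalize(name)
  name.downcase
      .gsub(/[[:punct:]]/, " ") # drop punctuation
      .squeeze(" ")             # collapse runs of spaces
      .strip
end

clusters = headings.group_by { |h| normalize(h[:value]) }

# Keep only names that occur in 5 or more finding aids, as in the study.
frequent = clusters.select do |_key, rows|
  rows.map { |r| r[:finding_aid] }.uniq.size >= 5
end

# The clustering "network effect": a cluster pools identifiers contributed by
# any of its member elements, so sparsely described headings can inherit them.
frequent.each do |key, rows|
  ids = rows.map { |r| r[:authfilenumber] }.compact.uniq
  puts "#{key}: #{rows.size} occurrences, identifiers: #{ids.inspect}" if ids.any?
end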

As described in the section below on positive network effects that can be observed in an aggregation, the clustering of personal name elements amplifies the effect of some finding aids having used the source and authfilenumber attributes. While only 8.6% of the total extraction of persname elements included an authfilenumber attribute, in the OpenRefine project based on the 20,767 personal name clusters, 5,837 clusters (28%) included an authfilenumber for the LC NAF, VIAF, or both (figure 2). Since the clusters are based on individual persname elements, some of which may not have had an authfilenumber provided in the source finding aid, the clustering process increased the potential availability of applicable authfilenumber values from 8% to 11%, without any additional reconciliation work.

Automated and manual reconciliation of personal names to a controlled vocabulary can further enrich the clustered elements

A key advantage of working with the OpenRefine tool is that, in addition to providing ways to clean up, transform, and sort data, it can connect to external controlled vocabulary systems and reconcile strings to find matching authorized headings and their persistent identifiers. This OpenRefine reconciliation feature was used to look for matches in LC NAF for the 11,439 persname cluster names that did not already have an LC NAF or VIAF authfilenumber attribute value in the cluster.

The OpenRefine reconciliation feature can be configured to point to a compatible “endpoint” which uses the OpenRefine Reconciliation API to convert requests into searches sent to the target controlled vocabulary’s system, typically using that system’s API or similar machine-readable data service. For this study, OCLC hosted an instance of a Library of Congress Reconciliation Service for OpenRefine, which is made available under an open source BSD license in the GitHub software repository. Its documentation provides more details on how it interacts with the LC Name Authority file and ranks its results.
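Under the hood, a Reconciliation API endpoint takes a batch of named queries and returns scored candidates for each. The sketch below shows the shape of such an exchange; the localhost endpoint URL is a placeholder for wherever a Library of Congress reconciliation service happens to be running, and in practice OpenRefine constructs these requests for you.

require "net/http"
require "json"
require "uri"

ENDPOINT = URI("http://localhost:8000/reconcile") # placeholder endpoint URL

# A batch of named queries, serialized as JSON in the "queries" parameter.
queries = {
  "q0" => { "query" => "Hamilton, Alexander, 1757-1804", "limit" => 3 }
}

response = Net::HTTP.post_form(ENDPOINT, "queries" => queries.to_json)
results  = JSON.parse(response.body)

results.fetch("q0", {}).fetch("result", []).each do |candidate|
  # Each candidate carries an identifier, a label, a relevance score, and a
  # flag indicating whether the service considers it a confident match.
  puts [candidate["id"], candidate["name"], candidate["score"], candidate["match"]].join("  ")
end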

Figure 3: persname clusters with authfilenumber attributes or exact recon matches

The OpenRefine settings for reconciliation include an option for the system to automatically assign a match for any results that are returned from the endpoint with a high confidence. With this setting, the first pass at reconciling the 11,439 persname clusters lacking an authfilenumber automatically matched 3,491 clusters to a LC NAF heading. The percentage of clusters with an authority file number increased after the automated reconciliation and matching, from 28% to 44% (figure 3), and the total number of persname elements that have, or could inherit from the cluster, an authority file identifier increased from 11% to 17%.

The real work of reconciliation is more painstaking and careful, as the personal names that returned one or more potential matches from the LC NAF need to be evaluated, at times consulting their context within the original finding aid, to select or reject suggested matches from the authority file. This work requires diligence, time, and domain expertise. For this study, after clustering similar personal names, finding VIAF and LC NAF identifiers when available from one or more persname elements in the clusters, and looking for automated exact matches using the OpenRefine reconciliation service, there were still over 11,000 persname clusters that would need manual reconciliation and review.

Figure 4: persname clusters with authfilenumber attribute values or either exact or manual recon matches

To evaluate the impact of manual reconciliation, the top 500 persname clusters (ranked by the number of finding aids in which the personal name element was found) that lacked either an authfilenumber from the finding aid or an exact match from the first pass of the automated reconciliation process were reviewed. Just 66 of those names were manually reconciled to corresponding LC NAF records, though matches were set only if there was very high confidence in the relationship, without evaluating the name in its finding aid source to obtain more context. The tactic of working with clusters for names that appear many times in many finding aids meant that the 66 manual matches provided identifiers that had an outsized potential impact on finding aids (figure 4), as the total number of persname elements that have, or could inherit from the reconciled cluster, an authority file identifier increased substantially, from 17% to 22%.

Personal name elements in the aggregation represent a long tail and will require substantial resource commitments to establish their identity

This reconciliation study focused on just a subset of the persname element values by working with clusters of names that occurred 5 or more times, ignoring 55% of the extracted values. There is a long tail of infrequently occurring personal names, some of which include too little data to support effective reconciliation (i.e., only providing a surname) and some representing people who are not likely to be found in authority files if they lack a type of literary warrant, not having been a creator of, contributor to, or subject of a published work. They may be accurately tagged as a personal name, but their authority and identity may either be only established locally or not at all. A source attribute value of “local” (or a similar designation) was found in 15.8% of the extracted persname elements.

Variations in source attribute values impede reconciliation

The source attribute is an optional method of identifying the controlled vocabulary source for an element value. When analyzing values in persname elements, 266 unique source attribute values were found, a surprisingly high number given the expected range of controlled vocabularies used for creating archival collection descriptions. But there can be multiple distinct representations of the same vocabulary. For example, the Library of Congress Name Authority File appears to be represented by these distinct source attribute values in the EAD finding aids evaluated for this study:

lc, LC Name Authority File, lca, lcaf, lcanaf, lcanf, lccn, lchs, lcna, lcnaf , lcnaflocal, lcnag, LCNAH, lcnameauthorityfile, lcnat, lcnf, lcnnaf, lcsnaf, library of congress name authority file, library_of_congress_name_authority_file, Library_of_Congress_Name_Authority_File, lnaf, lncaf, lnnaf, naf

Some of these variants may be the result of typographic data entry errors, while others may originate in the finding aid editing interfaces and conversion tools used to create the EAD.
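One small, concrete piece of cleanup is collapsing those variant spellings to a single token before reconciliation. The sketch below does this for the LC NAF variants listed above; the choice of "lcnaf" as the canonical value is just an assumption for illustration.

LCNAF_VARIANTS = %w[
  lc lca lcaf lcanaf lcanf lccn lchs lcna lcnaf lcnaflocal lcnag lcnah
  lcnameauthorityfile lcnat lcnf lcnnaf lcsnaf lnaf lncaf lnnaf naf
] + [
  "lc name authority file",
  "library of congress name authority file",
  "library_of_congress_name_authority_file"
]

def canonical_source(raw)
  return nil if raw.nil?
  key = raw.strip.downcase
  LCNAF_VARIANTS.include?(key) ? "lcnaf" : key
end

canonical_source("Library_of_Congress_Name_Authority_File") # => "lcnaf"
canonical_source("lcnaf ")                                   # => "lcnaf"
canonical_source("viaf")                                     # => "viaf"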

The Pareto chart in figure 5 visualizes all the unique source attribute values for the persname elements. As also shown in figure 1, a few attribute values make up the majority of the uses, but (after some additional normalization and clustering of typographically different terms) there are 119 unique source values, some potentially representing the same controlled vocabulary. The infrequently occurring sources likely have important advantages for data management in their “local” context, but in cross-institution aggregation their functional benefits are less clear.

Figure 5: Pareto chart of persname source attribute values

This level of variation can present a barrier to cross-document and cross-aggregation data analysis, as the source attribute value is important for determining what systems to use for reconciliation of headings to persistent identifiers. If a taxonomy of controlled vocabulary source values could be agreed upon and used across finding aid creation tools, the interoperability of this EAD attribute would improve.

There are potential network effects for name reconciliation in aggregated finding aids

The same person’s name may be found within persname elements in finding aids from multiple repositories. While not all occurrences of that name will have been described with a source or an authfilenumber attribute, in some cases they may be. When many finding aid sources are brought together in a single aggregation, by deduplicating and clustering persname values, there is the potential for enhancing all of the finding aids by inheriting the authfilenumber and source attributes from more completely described names, representing a positive network effect. For example, the personal name string “Obama, Barack” can be found in persname elements in 17 finding aids across the aggregation, but only a few occurrences make use of the authfilenumber attribute. By clustering this data, links to the Library of Congress Name Authority file and VIAF can be derived, and potentially applied to less fully described persname elements in other finding aids, avoiding duplicative or repetitive reconciliation of the access term by each repository.

A positive network effect created by aggregating multiple finding aid sources can also be seen when a personal name cluster is associated with more than one unique identifier from the same controlled vocabulary. For example, in the study of persname values, the cluster for the name “Hamilton, Alexander, 1757-1804” was associated with two LC NAF identifiers. One identifier was correct, the other was not. Discrepancies like these can rise to the surface when multiple sources are aggregated, allowing for detection of the issue and potentially correction.

The aggregation can also help to surface inconsistencies in the controlled vocabulary sources. In this persname study, the cluster for the personal name heading “Parker, Quanah, 1845?-1911” was found to be associated with two different VIAF clusters, which can be reported and likely merged.
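Detecting these kinds of discrepancies can itself be partly automated once the elements are clustered. Continuing the clustering approach sketched earlier, and assuming each row in a cluster carries its source and authfilenumber attributes (the cluster below is an illustrative placeholder), this flags clusters whose members point at more than one identifier from the same vocabulary:

clusters = {
  "hamilton alexander 1757 1804" => [
    { source: "lcnaf", authfilenumber: "placeholder-id-1" },
    { source: "lcnaf", authfilenumber: "placeholder-id-2" },
    { source: "viaf",  authfilenumber: nil }
  ]
}

clusters.each do |name, rows|
  ids_by_vocab = rows.group_by { |r| r[:source] }
                     .transform_values { |v| v.map { |r| r[:authfilenumber] }.compact.uniq }

  ids_by_vocab.each do |vocab, ids|
    # Two or more identifiers from one vocabulary in one cluster deserve a
    # human look: one may be wrong, or the vocabulary has duplicate records.
    puts "Check #{name.inspect}: #{vocab} has #{ids.size} identifiers #{ids.inspect}" if ids.size > 1
  end
end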

The post Connecting EAD headings to controlled vocabularies appeared first on Hanging Together.

NDSA Year in Review / Digital Library Federation

Happy New Year! 

As we begin 2022 we wanted to take a moment to look back at what we’ve done over the past year.  One of the big shifts the Leadership team made was starting to review membership applications quarterly – welcoming a total of 23 new members in January, March, June, September, and December.  More about this process is available on the NDSA membership website, along with details on how existing members can keep their contact information up to date.

Interest Groups

The three NDSA Interest Groups continued to meet on a monthly basis.  A summary of their activities follows: 

Content Interest Group

The Content Interest Group hosts a monthly call for NDSA members who are interested in preservation activities broadly related to content. Recent calls focused on topics around ethical work practices, email preservation, and making a shift from bit-level preservation to content-based decision making.

This past year, we’ve been experimenting with the format for these calls. For Content Exchanges, we invite members to lead us in a conversation about a topic of interest to the group. Sometimes they have experience to draw from and sometimes, we are exploring new territory together. We look for ways to engage in dialogue in order to learn from one another and make connections that might further our work or scholarship. In another series, Sharing Scholarship, we invite members to present their research in early stages in order to gather feedback and identify partners when possible.

Infrastructure

The Infrastructure Interest Group spent 2021 both continuing as we have in the past by hosting presentations on topics of interest, and investigating new ways to meet group members’ needs. We held several meetings where we invited attendees to bring their challenges and questions to the group to be discussed and to get suggestions for solutions.  These topics included datasets, shifting to cloud infrastructure, collection policies, and dynamically adjusting collection storage infrastructures. In our more topic-focused meetings, the group covered:

  • Fixity in the cloud
  • The Oxford Common File Layout (OCFL)
  • Information security
  • The environmental impact of digital preservation
  • Exploring characterization
  • Experience with RDM (research data management) systems
  • DNA storage for digital preservation

Standards and Practices Group

The Standards and Practices Interest Group focused on standards for policies for digital preservation during 2021. We reviewed policies libraries and other cultural heritage institutions have posted online, identified categories of topics covered, and developed an outline of those. Following that, we worked as a group to create a template that includes sample texts taken from the policies and added sample text to fill in gaps. 

In addition to that focused work, meetings were used for round robin updates, and questions from attendees. The networking aspect of meetings is useful in sharing Covid protocols from different institutions, as well as modifications made to workflows. Also, questions and subsequent discussions provided continuing education as we heard how different organizations handle their work. The mix of new and more experienced members from a variety of institutions provided for a rich experience.

Working Groups

NDSA Working groups were once again busy this year with work being conducted by the Digipres Conference Organizing Committee and Excellence Awards Committee, the Fixity Survey Working Group, the publishing of the Good Migrations Checklist, additional translations of the Levels of Digital Preservation, a Membership Task Force Survey and Report, the Staffing Survey Working Group, and the formation of the Web Archiving Working Group.  Highlights of the work of these groups can be found below: 

DigiPres Conference Organizing Committee

  • The 2021 DigiPres conference was hosted virtually on Thursday, November 4th 2021 – on World Digital Preservation Day! Since 2016, the conference has been held in conjunction with DLF Forum and Learn@DLF, but this was the second year our community convened online, paving the way for record numbers of proposal submissions and registrants. A recap of the event, including links to the proceedings, can be found on the DigiPres 2021 NDSA web page.  Session recordings will be out shortly.
  • Recordings from the 2020 conference were made available in early 2021

NDSA Excellence Awards Committee 

  • In 2021 the awards were renamed as the Excellence Awards to highlight and commend all forms of creative and meaningful contributions by individuals, projects, sustainability activities, organizations, future stewards, and educators to the field of digital preservation.  Awards focus on recognizing excellence in the following areas: Individual Award, Project Award, Sustainability Award, Organization Award, Future Steward Award and the Educator Award.  Information about the awards and current and past winners is available on the NDSA website.  

Fixity Survey

  • The NDSA Fixity Survey, originally run in 2017, was rerun in 2021.  A call for participation went out in March, the survey was available to the public in May, and the survey results were published in November.  The Fixity Survey is expected to be completed every four years.

Good Migrations Checklist Published

  • The Good Migrations Checklist provides a checklist for what you will want to do or think through before and after moving digital materials and metadata forward to a new digital preservation system/infrastructure.  The checklist also uses the technical framework of the functional areas of the Levels of Digital Preservation.  

Levels of Digital Preservation

  • There were three new translations (Arabic, Finnish, and German) done for the Levels of Digital Preservation in the past year, bringing the total number of languages into which the matrix and other Levels documents have been translated to seven.
  • The Levels Steering group also assisted the Teaching Advocacy and Outreach subgroup in publishing a set of teaching materials about the Levels.  

Membership Task Force

  • NDSA Leadership initiated a Membership Task force in early 2021 in order to learn more about how members felt about their membership and what they may be expecting or interested in for the future.  A call for participation was made in February and the report based on the survey responses was published in December.  

Staffing Survey

  • The NDSA Staffing Survey was run in 2013 and 2017. A call for participation in the next iteration was made in March of 2021. The Staffing Survey working group spent much of the year discussing and revising the survey questions and goals, before releasing the survey for responses in November. In the past, the survey requested only one response per organization. This time, in order to get a wider set of responses and a new view of staffing situations, any individual could answer the survey and multiple people from the same organization could participate. The survey has been closed and the group will be analyzing the results in early 2022.

Web Archiving Survey 

  • Upon the NDSA Leadership’s review of its surveys as a whole, the Web Archiving survey was also due for its next iteration in 2021.  A request for participation in this group was sent out in October, and the group has formed and begun working on the survey.  Please keep your eyes out for the survey in 2022.

 

Thank you!

Thank you to all of our members for taking time to participate in the many surveys and activities over the years.  We appreciate the time spent and look forward to continuing to work together.

 

The post NDSA Year in Review appeared first on DLF.

DLF Forum Community Journalist Reflection: Anna Sofia Lippolis / Digital Library Federation

This post was written by Anna Sofia Lippolis (@dersuchende_), who was selected to be one of this year’s DLF Forum Community Journalists.

Anna Sofia Lippolis is a Research Fellow at Italy’s National Research Council (CNR)’s Institute for Cognitive Sciences and Technologies (ISTC) and a Communications fellow for the Alliance of Digital Humanities Organizations (ADHO). After her BA in Philosophy, she received her MA in Digital Humanities and Digital Knowledge at the University of Bologna with a project on the quantitative analysis and visualization of authorial corrections. She has also worked as an IT consultant for projects on cultural heritage enhancement and took part in organizing two conference cycles about the present and future of Digital Humanities, “Digital WHOmanities”. Her interests include cultural analytics, quantitative approaches in literature studies and semantic technologies for knowledge organization.


I am deeply thankful for the opportunity to have helped cover this year’s conference as one of 10 Community Journalists. Being a recent graduate in Digital Humanities and a research fellow in Semantic Technologies, I was eager to attend the virtual DLF Forum 2021 to find out what best practices digital GLAM practitioners have been engaging in.

Truth of information has been a crucial theme during this pandemic and its value has been highlighted during the whole conference. As noted by Dr. Nikole Hannah-Jones and Dr. Stacey Patton when discussing the “1619” project, this means on the one hand researchers, in the broad sense of the term, need to be informed by history for their work to be held accountable in the present, while at the same time we have a responsibility towards future projects, as the infrastructures we build now will be the basis for tomorrow’s historians. Assigning values of truth to data, directly or indirectly, is also a matter of giving power to certain elements over others. It is in this sense that being informed by history reduces the risk of inequality through collaboration and inclusivity, one of the main takeaways from the DLF conference. 

In the last two years, there has been increased attention to the need for online access and discoverability, and thus more consideration of digitizing material that reflects under-represented communities. Online findability of resources, which depends on the correct handling of metadata, implies the existence of a representational framework that organizes real-world information in the best way possible and proves necessary for long-term preservation of cultural heritage. Thus, it is only by understanding why data is not neutral that we can really appreciate the value of community engagement.

Jamboard titled: What are the biggest challenges you're encountering in your work with digital collections?

Let’s take for example the project Stolen Relations, a community-centered database project, housed at Brown University, seeking to illuminate and recognize the role the enslavement of Indigenous peoples played in settler colonialism over time. As the archival documents about Indigenous enslavement were written by the colonizer, the brilliant idea of an ever-present context bar in the user interface of the database allows users to fully understand the data. For instance, rather than simply list “tribe” affiliations, as is sometimes listed on the original document, the database aims at providing information on how archival documents often include terms that diminish the nationhood and sovereignty of Indigenous peoples (such as the word “tribe”). And in many cases, the tribal/national affiliation of enslaved Natives was completely erased.

Another enlightening talk on the matter for me was A look into Latinx archival praxis. Mexican American Art Since 1848 is a digital portal that will aggregate Mexican American art and related documentation from dozens of digital collections in libraries, archives, and museums located throughout the United States, in order to enable users to search through digitized collections of art that, as of today, are hard to find. Through culturally-informed search strategies, the team intended to offer a solution to the unidirectional understanding of art and to the mainstream metadata conventions for classifying artwork, which make art created, exhibited, and collected by Mexican Americans harder to digitize. Eurocentric standards and the formal imperative to select a single definitive statement for each metadata-represented concept, even when it could have multiple interpretations, actually mean promoting a single narrative if appropriate context is not given. The perfect example of this is the cataloguing of the piñata as a "vessel".

A very useful visual from the Latinx Archival Praxis, with the question "How would the Met feel if I swung a stick at their vessels?" #DLFforum

Mexican American Art Since 1848 ensures that all the works involved in the project are discoverable because of culturally-informed descriptors added to the metadata used by the partner institutions. 

It is in this same sense that initiatives like the Wikidata Edit-A-Thon mentioned in the talk The Marmaduke Problem: Comics, Linked Data, and Community-led Authority Control, which aimed at connecting the metadata associated with a subset of the Michigan State University Comics Collection to entries on Wikidata in order to make the Collection much more digitally and publicly accessible to folks outside of MSU, are important milestones to foster discussion about addressing the gaps in digital records and the creation of linked knowledge through community engagement.

There are many other challenges concerning history-informed metadata decisions that have been presented during the Forum. They include data-rich but interdisciplinary archival materials, as shown by the The Bending Water Project at the Claremont Colleges Library, and metadata mislabelling, as the project Climbing Metadata has indicated.

screenshot of slide titled Digital Libraries - Mislabelled

On the bright side, I think doubt can be considered a catalyst for change, and the discussions with the Metadata Working Group and the Metadata Support Group have confirmed the need to build a community that is able to discuss everyday issues and long-term data preservation. The joint effort towards the development of a repository for metadata assessment tools, the collection of metadata application profiles, mappings, and practices, and blog posts that disseminate the results of this ongoing work is a model for effective knowledge sharing. The BOAF session about developing a metadata interview was, in the same way, about discussing a common framework on the matter. Questions were focused on useful interview techniques, on the support of DEI work in metadata, and on the general establishment of guidelines, including project planning templates and data sheets for datasets.

Although it is obvious there are different ways to approach digital projects, the capacity to preserve them depends upon the reliability and the possibility to reuse technical infrastructures—it is a joint, long-term strategy. Along these lines Shannon O’Neill, in the panel Open to All? Creatively Imagining, Realizing, and Defending the Commons in Libraries and Archives, stated that sustainability has a much longer vision than productivity. It is why talking about the future of digital cultural heritage preservation from the point of view of GLAM practitioners, the communities involved and worldwide users has been the focus of attention during the DLF, and why doing it now is more important than ever. 

The post DLF Forum Community Journalist Reflection: Anna Sofia Lippolis appeared first on DLF.

Blacklight: Automatic retries of failed Solr requests / Jonathan Rochkind

Sometimes my Blacklight app makes a request to Solr and it fails in a temporary/intermittent way.

  • Maybe there was a temporary network interruption, resulting in a failed connection or timeout
  • Maybe Solr was overloaded and being slow, and timed out
    • (Warning: Blacklight by default sets no timeouts, and is willing to wait forever for Solr, which you probably don’t want. How to set a timeout is under-documented, but set a read_timeout: key in your blacklight.yml to a number of seconds; or if you have RSolr 2.4.0+, set key timeout. Both will do the same thing, pass the value timeout to an underlying faraday client).
  • Maybe someone restarted the Solr being used live, which is not a good idea if you’re going for zero downtime, but maybe you aren’t that ambitious, or if you’re me maybe your SaaS solr provider did it without telling you to resolve the Log4Shell bug.
    • And btw, if this happens, can appear as a series of connection refused, 503 responses, and 404 responses, for maybe a second or three.
  • (By the way also note well: Your blacklight app may be encountering these without you knowing, even if you think you are monitoring errors. Blacklight by default will take pretty much all Solr errors, including timeouts, and rescue them, responding with an HTTP 200 status page with a message “Sorry, I don’t understand your search.” And HoneyBadger or other error monitoring you may be using will probably never know. Which I think is broken and would like to fix, but have been having trouble getting consensus and PR reviews to do so. You can fix it with some code locally, but that’s a separate topic, ANYWAY…)

So I said to myself, self, is there any way we could get Blacklight to automatically retry these sorts of temporary/intermittent failures, maybe once or twice, maybe after a delay? So there would be fewer errors presented to users (and fewer errors alerting me, after I fixed Blacklight to alert on em), in exchange for some users in those temporary error conditions waiting a bit longer for a page?

Blacklight talks to Solr via RSolr — can use 1.x or 2.x — and RSolr, if you’re using 2.x, uses faraday for its Solr HTTP connections. So one nice way might be to configure the Blacklight/RSolr faraday connection with the faraday retry middleware. (1.x rubydoc). (moved into its own gem in the recently released faraday 2.0).

Can you configure custom faraday middleware for the Blacklight faraday client? Yeesss…. but it requires making and configuring a custom Blacklight::Solr::Repository class, most conveniently by sub-classing the Blacklight class and overriding a private method. :( But it seems to work out quite well after you jump through some slightly kludgey hoops! Details below.

Questions for the Blacklight/Rsolr community:

  • Is this actually safe/forwards-compatible/supported, to be sub-classing Blacklight::Solr::Repository and over-riding build_connection with a call to super? Is this a bad idea?
  • Should Blacklight have its own supported and more targeted API for supplying custom faraday middleware generally (there are lots of ways this might be useful), or setting automatic retries specifically? I’d PR it, if there was some agreement about what it should look like and some chance of it getting reviewed/merged.
  • Is there anyone, anyone at all, who is interested in giving me emotional/political/sounding-board/code-review support for improving Blacklight’s error handling so it doesn’t swallow all connection/timeout/permanent configuration errors by returning an http 200 and telling the user “Sorry, I don’t understand your search”?

Oops, this may break in Faraday 2?

I haven’t actually tested this on Faraday 2.0, which was released right after I finished working on this. :( If faraday changes something that makes this approach infeasible, that might be added motivation to make Blacklight just have an API for customizing faraday middleware without having to hack into it like this.

The code for automatic retries in Blacklight 7

(and probably many other versions, but tested in Blacklight 7).

Here’s my whole local pull request if you find that more convenient, but I’ll also walk you through it a bit below and paste in frozen code.

There were some tricks to figuring out how to access and change the middleware on the existing faraday client returned by the super call; and how to remove the already-configured Blacklight middleware that would otherwise interfere with what we wanted to do (including an existing use of the retry middleware that I think is configured in a way that isn’t very useful or as intended). But overall it works out pretty well.

I’m having it retry timeouts, connection failures, 404 responses, and any 5xx response. Nothing else. (For instance it won’t retry on a 400 which generally indicates an actual request error of some kind that isn’t going to have any different result on retry).

I’m at least for now having it retry twice, waiting a fairly generous 300ms before the first retry, then another 600ms before a second retry if needed. Hey, my app can be slow, so it goes.

Extensively annotated:

# ./lib/scihist/blacklight_solr_repository.rb
module Scihist
  # Custom sub-class of stock blacklight, to override build_connection
  # to provide custom faraday middleware for HTTP retries
  #
  # This may not be a totally safe forwards-compat Blacklight API
  # thing to do, but the only/best way we could find to add-in
  # Solr retries.
  class BlacklightSolrRepository < Blacklight::Solr::Repository
    # this is really only here for use in testing, skip the wait in tests
    class_attribute :zero_interval_retry, default: false

    # call super, but then mutate the faraday_connection on
    # the returned RSolr 2.x+ client, to customize the middleware
    # and add retry.
    def build_connection(*_args, **_kwargs)
      super.tap do |rsolr_client|
        faraday_connection = rsolr_client.connection

        # remove if already present, so we can add our own
        faraday_connection.builder.delete(Faraday::Request::Retry)

        # remove so we can make sure it's there AND added AFTER our
        # retry, so our retry can successfully catch its exceptions
        faraday_connection.builder.delete(Faraday::Response::RaiseError)

        # add retry middleware with our own configuration
        # https://github.com/lostisland/faraday/blob/main/docs/middleware/request/retry.md
        #
        # Retry at most twice, once after 300ms, then if needed after
        # another 600ms (backoff_factor set to result in that).
        # Slow, but the idea is slow is better than an error, and our
        # app is already kinda slow.
        #
        # Retry not only the default Faraday exception classes (including timeouts),
        # but also Solr returning a 404 or a 5xx. Which gets converted to a
        # Faraday error because RSolr includes raise_error middleware already.
        #
        # Log retries. I wonder if there's a way to have us alerted if
        # there are more than X in some time window Y…
        faraday_connection.request :retry, {
          interval: (zero_interval_retry ? 0 : 0.300),
          # exponential backoff 2 means: 1) 0.300; 2) .600; 3) 1.2; 4) 2.4
          backoff_factor: 2,
          # But we only allow the first two before giving up.
          max: 2,
          exceptions: [
            # default faraday retry exceptions
            Errno::ETIMEDOUT,
            Timeout::Error,
            Faraday::TimeoutError,
            Faraday::RetriableResponse, # important to include when overriding!
            # we add some that could be Solr/jetty restarts, based
            # on our observations:
            Faraday::ConnectionFailed, # nothing listening there at all,
            Faraday::ResourceNotFound, # HTTP 404
            Faraday::ServerError # any HTTP 5xx
          ],
          retry_block: -> (env, options, retries_remaining, exc) do
            Rails.logger.warn("Retrying Solr request: HTTP #{env["status"]}: #{exc.class}: retry #{options.max - retries_remaining}")
            # other things we could log include `env.url` and `env.response.body`
          end
        }

        # important to add this AFTER retry, to make sure retry can
        # rescue and retry its errors
        faraday_connection.response :raise_error
      end
    end
  end
end

Then in my local CatalogController config block, nothing more than:

config.repository_class = Scihist::BlacklightSolrRepository

I had some challenges figuring out how to test this. I ended up testing against a live running Solr instance, which my app’s test suite does sometimes (via solr_wrapper, for better or worse).

One test is just a simple smoke test that this thing still seems to function properly as a Blacklight::Solr::Repository without raising. And one set of tests exercises some stubbed sample error responses to check the retry behavior:

require "rails_helper"

describe Scihist::BlacklightSolrRepository do
  # a way to get a configured repository class…
  let(:repository) do
    Scihist::BlacklightSolrRepository.new(CatalogController.blacklight_config).tap do |repo|
      # if we are testing retries, don't actually wait between them
      repo.zero_interval_retry = true
    end
  end

  # A simple smoke test against live solr hoping to be a basic test that the
  # thing works like a Blacklight::Solr::Repository, our customization attempt
  # hopefully didn't break it.
  describe "ordinary behavior smoke test", solr: true do
    before do
      create(:public_work).update_index
    end

    it "can return results" do
      response = repository.search

      expect(response).to be_kind_of(Blacklight::Solr::Response)
      expect(response.documents).to be_present
    end
  end

  # We're actually going to use webmock to try to mock some error conditions
  # to actually test retry behavior, not going to use live solr.
  describe "retry behavior", solr: true do
    let(:solr_select_url_regex) { /^#{Regexp.escape(ScihistDigicoll::Env.lookup!(:solr_url) + "/select")}/ }

    describe "with solr 400 response" do
      before do
        stub_request(:any, solr_select_url_regex).to_return(status: 400, body: "error")
      end

      it "does not retry" do
        expect {
          response = repository.search
        }.to raise_error(Blacklight::Exceptions::InvalidRequest)

        expect(WebMock).to have_requested(:any, solr_select_url_regex).once
      end
    end

    describe "with solr 404 response" do
      before do
        stub_request(:any, solr_select_url_regex).to_return(status: 404, body: "error")
      end

      it "retries twice" do
        expect {
          response = repository.search
        }.to raise_error(Blacklight::Exceptions::InvalidRequest)

        expect(WebMock).to have_requested(:any, solr_select_url_regex).times(3)
      end
    end
  end
end

Blockchain Gaslighting / David Rosenthal

In Web3/Crypto: Why Bother? Albert Wenger draws an analogy between the PC and "web3" as platforms for innovation:
The late Clayton Christensen characterized this type of innovation as being worse at everything except for one dimension, but where that dimension really winds up mattering a lot (and then over time everything else gets better also as the innovation is widely adopted).

The canonical example here is the personal computer (PC). The first PCs were worse computers than every existing machine. They had less memory, less storage, slower CPUs, less software, couldn’t multitask, etc. But they were better at one dimension: they were cheap. And for those people who didn’t have a computer at all that mattered a great deal.
...
A blockchain is a worse database. It is slower, requires way more storage and compute, doesn’t have customer support, etc. And yet it has one dimension along which it is radically different. No single entity or small group of entities controls it – something people try to convey, albeit poorly, by saying it is “decentralized.”
Below the fold I explain why this is typical blockchain gaslighting.

The cheapness of the PC was something users experienced directly, but the "decentralized" nature of blockchains and cryptocurrencies is an abstract quality. The experience of using them is just like using conventional centralized systems, only worse. Promoters of these technologies thus need a constant propaganda marketing campaign explaining that they're better because they're "decentralized" and thus not controlled by big corporations and governments that you can't trust. They can then fantasize, as Wenger does, about the golden future of innovation that awaits because they are "decentralized" and thus aren't controlled by big corporations:
And if widely adopted Web3/crypto technology will also start to improve along other dimensions. It will become faster and more efficient. It will become easier and safer to use. And much like the PC was a platform for innovation that never happened on mainframes or mini computers, Web3 will be a platform for innovation that would never come from Facebook, Amazon, Google, etc.
The infrastructure of the Internet (IP/DNS/HTTP and so on) is decentralized, but that hasn't stopped the actual Internet that everyone uses being centralized — the problem "web3" claims it will solve precisely because its infrastructure is decentralized. Wikipedia defines gaslighting as:
The term may also be used to describe a person (a "gaslighter") who presents a false narrative to another group or person which leads them to doubt their perceptions and become misled
The reason I tag authors like Wenger as gaslighters is that, at every level, blockchains and cryptocurrencies are not actually decentralized. I've been pointing this out since 2014's Economies of Scale in Peer-to-Peer Networks, most recently in my Talk at TTI/Vanguard Conference. Here are a few examples:
Blockchain-based technologies promise liberation from control by big corporations. But what they actually deliver is control by a different set of as yet somewhat smaller corporations, which are much less accountable, transparent, or regulated than the FAANGs. Thus the need for continual gaslighting.

Update 5th January: I should have made it explicit that the systems Wenger and I are discussing are based on permissionless blockchains. Permissioned systems are necessarily centralized.

One final thought. If a system is to be decentralized, it has to have a low barrier to entry. If it has a low barrier to entry, competition will ensure it has low margins. Low margin businesses don't attract venture capital. VCs are pouring money into cryptocurrency and "web3" companies. This money is not going to build systems with low barriers to entry and thus low margins. Thus the systems that will result from this flood of money will not be decentralized, no matter what the sales pitch says.

Compliance monitoring in RIM systems: not a US thing? Think again. / HangingTogether

This blog post is jointly authored by Rebecca Bryant from OCLC Research and Jan Fransen from the University of Minnesota Libraries, and is part of a series based upon the OCLC Research Information Management in the US reports.

Monitoring research policy compliance is a major driver of research information management activities. That was a key finding of the 2018 report, Practices and Patterns in Research Information Management: Findings from a Global Survey, collaboratively prepared by OCLC Research and euroCRIS, which describes the global research information management (RIM) landscape, from a survey of more than 380 respondents in 44 countries.

The report found that 79% of responding institutions described monitoring and supporting institutional compliance with external mandates as an extremely important or important driver of RIM activities. One Australian respondent commented, “Compliance with government regulations is the main driver behind RIM activities.”

But not here in the United States

The majority of US survey respondents indicated that compliance monitoring was not important or even applicable. This finding was further confirmed in the recent OCLC Research report, Research Information Management in the United States, which examines the US landscape through close examination of the RIM practices at five case study institutions:

  • Penn State University
  • Texas A&M University
  • Virginia Tech
  • UCLA (including University of California system-wide practices)
  • University of Miami.

In these case studies, we found that while most US institutions support multiple RIM use cases, especially public portals and faculty activity reporting workflows, compliance monitoring was by far the least important. In stark contrast to other locales, we found only limited compliance monitoring activities at one case study institution, the University of California, and only for certain units, particularly the Lawrence Berkeley National Laboratory (LBNL), which is an entity under the US Department of Energy and managed by the University of California.

Why?

A major reason is that the US doesn’t have an external research assessment exercise like the Research Excellence Framework (REF) in the United Kingdom or the Excellence in Research for Australia (ERA), which require institutions to collect information about research outputs and measure the impact of sponsored research activities. Failure to accurately track and report outputs may have significant economic and reputational repercussions.


Slide from Russell R (2013) “CERIF CRIS UK landscape study: work in progress report”.
euroCRIS Membership Meeting Spring 2013 (DFG, Bonn, May 13-14, 2013), http://hdl.handle.net/11366/75

For example, beginning in the 1990s, Dutch research universities and institutes have been assessed every six years. In Finland there are strong incentives to track each and every publication, as 13% of institutional support to universities from the Ministry of Education and Culture is directly linked to publication output. These requirements have led to the earlier development and maturity of RIM infrastructures in Europe.

Open Science may change this

Today an ever-increasing number of funders, research organizations, and regional bodies are requiring public access to scholarly or scientific research, including publications and/or data. For example, UK Research and Innovation (UKRI) seeks to make all UKRI-funded research outputs openly available, with policies impacting publications and research data. These efforts are strong in Europe, particularly as the cOAlition S consortium of research organizations and funders are enacting mandates requiring immediate open access to scholarly publications.

Here in the US we are also seeing an increase in public access mandates, such as the 2013 policy memo from the White House Office of Science and Technology Policy applying to federally-sponsored research, as well as mandates from private funders like the Bill and Melinda Gates Foundation. The failure to comply with an open access mandate potentially risks economic losses for the investigator and the institution.

RIM systems offer a useful infrastructure for monitoring compliance with open science mandates by offering an entity-based structure that can link publications, datasets, and other research outputs with a specific grant (and with the investigators, institutions, equipment, and more).

RIM systems can help make sense of a complex regulatory environment by facilitating a more granular view of the institution’s research.

Specifically, they can enable tracking of subcomponents of this research: only some publications, resulting from specific grants and by specific principal investigators, need to be monitored, each in relation to specific funder requirements.

Opportunities for US institutions

European institutions are using RIM systems to track compliance with open access mandates, but it doesn’t appear that this functionality is widely used in the United States[1]. Or at least not yet. Since US RIM systems aren’t used for compliance monitoring for national research assessment exercises, their potential for open science compliance monitoring may not be as apparent to US users, but we believe there are significant opportunities for improved processes and efficiencies.[2]

We encourage readers at institutions with an existing RIM system to share this potential use with others at their institutions, particularly those with grant awards from NIH (which requires PubMed Central deposit) and NSF (NSF-PAR deposit).[3]

RIM systems populated with high-quality data can offer a time-saving alternative to laborious spreadsheets used today for monitoring and reporting on compliance with these mandates.

RIM systems can support multiple uses. Institutions will see the greatest return on their investments when using these systems to support and streamline as many processes as possible.

Rebecca Bryant, PhD, serves as Senior Program Officer at OCLC Research where she leads investigations into research support topics such as research information management (RIM).  Janet (Jan) Fransen is the Service Lead for Research Information Management Systems at University of Minnesota Libraries. In that role, she works across divisions and with campus partners to provide library systems and data that save researchers, students, and administrators time and highlight the societal and technological impacts of the University’s research. The most visible system in her portfolio is Experts@Minnesota.


[1] The Practices and Patterns report documented that compliance and open access to publications were an extremely important or important function in locales like the UK and the Netherlands, but it was of less or no importance in the US (and Canada).

[2] P De Castro (2018), “The Role of Current Research Information Systems (CRIS) in Supporting Open Science Implementation: the Case of Strathclyde”. ITlib. Informačné technológie a knižnice Special Issue 2018: pp 21–30, https://doi.org/10.25610/itlib-2018-0003.  

[3] The NIH Public Access Policy requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funding to PubMed Central immediately upon acceptance for publication. https://publicaccess.nih.gov/. The NSF also has a public access policy, effective 25 January 2016, with details at https://www.nsf.gov/news/special_reports/public_access/index.jsp. It offers NSF awardees the opportunity to deposit their work in the NSF Public Access Repository (NSF-PAR), https://www.nsf.gov/news/special_reports/public_access/about_repository.jsp.

The post Compliance monitoring in RIM systems: not a US thing? Think again. appeared first on Hanging Together.

How to implement remote following for your ActivityPub project / Hugh Rundle

Recently I contributed an addition to the Bookwyrm social reading software to enable "remote following". Bookwyrm uses the ActivityPub protocol for decentralised online social interaction. The most well-known ActivityPub software is Mastodon, but there are many other implementations, including Pixelfed (for sharing photos), Funkwhale (music), write.as (long form articles) and Pleroma (general). This blog post is primarily for a very niche audience: software developers who want to implement "remote following" for their own decentralised social software. Whilst the technique I discuss here is not restricted to (nor even part of) the ActivityPub specification, it is likely that you would be doing this for an ActivityPub implementation. I received some very helpful pointers from people via my Mastodon account when I was looking into how to do this, but I could not find anything that explained the whole process, and some of the relevant documents appear to have disappeared from the web, so I figured it may be helpful to write up the process for other people who want to do the same thing.

What do we mean by "remote following?"

It's probably helpful to define our terms up front. ActivityPub is a protocol enabling social interaction between "actors" on multiple (theoretically infinite) platforms. So for example, a Mastodon account hosted at aus.social can follow a Mastodon bot at botsin.space. However because there are multiple ways to implement ActivityPub, it is not merely limited to different servers running the same software interacting: we can also interact with actors who use different implementations. So our aus.social user can also follow someone on pixelfed.social and view their photos from the Mastodon software, and get status updates from a bookwyrm.social user directly to their Mastodon feed.

By "remote following" I primarily mean following an account hosted on one implementation from an account using a different implementation; however, the technique also works between servers running the same implementation, and can improve the "workflow" when the remote user does not yet have any presence on your home server. From a user point of view, remote following looks like this:

  1. View another user's profile page on their chosen platform
  2. Click a button to "remote follow" the user from another platform
  3. Enter your username for your platform and click a button to follow
  4. Be redirected to your home server, with a prompt to log in if necessary
  5. Confirm the request to follow the user

In some respects this is a similar workflow to OpenID: we need to identify the remote service, authenticate with it, and confirm our choice.

In the following examples we are going to use two ActivityPub actors:

  • @molly@example.social, a user on a new ActivityPub platform
  • @hugh@remote.social, a user on a Mastodon server

For a full implementation of remote-following, both users need to be able to "remote follow" the other user via the five steps listed above. The first thing we likely want to do is to allow our users to share their content outside of the walls of our own implementation: to put the social in "social media". So we should start by helping users of other ActivityPub software to follow our own users.

Steps 1 and 2 - Identifying the user to follow

The first step is fairly straightforward and even a minimal ActivityPub implementation is likely to already have it: a user profile page. Different implementations use different URL schemes. For our imaginary new platform we will use https://example.social/user/{local_username}.
So @molly@example.social's profile page would be https://example.social/user/molly.

Step 2 is also pretty straightforward: you just need to add a button somewhere on the page to allow people to remote follow. You may choose to make this visible only to users who are not logged in, or to everyone; it's up to you. But of course you need this button to do something. Exactly how you implement this (whether with a pop-up window like Mastodon and Bookwyrm use, or just by directing the user to a new page in the existing window) is your choice, but essentially this button should behave like an anchor element and open a new page with a form where the requesting user can enter their own (remote) username. For our example we will use a page at the URL https://example.social/remote_follow.

Molly profile page

You must pass information to this page about the user to follow. How this is done is up to you: for Bookwyrm I simply used a query parameter, but you could use an HTTP header, as Mastodon appears to do.
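As a minimal sketch (with hypothetical names, using Python's standard library rather than any particular web framework), the button's link and the receiving page might look like this:

from urllib.parse import urlencode, parse_qs, urlparse

def remote_follow_url(local_username: str) -> str:
    """Build the link behind the 'remote follow' button on a profile page."""
    return f"https://example.social/remote_follow?{urlencode({'user': local_username})}"

def user_to_follow(request_url: str) -> str:
    """On the /remote_follow page, read the query parameter back out to work
    out which local user the visitor wants to follow."""
    return parse_qs(urlparse(request_url).query)["user"][0]

link = remote_follow_url("molly")   # https://example.social/remote_follow?user=molly
print(user_to_follow(link))         # molly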

Step 3 - Identifying the remote follower

So far our remote user Hugh has browsed to https://example.social/user/molly, clicked on a button to follow Molly, and been directed to another page at https://example.social/remote_follow asking him to enter the account from which he wants to follow @molly@example.social.

Remote follow page

So ...now what? Hugh enters his username, and clicks the Follow button. But Hugh doesn't have an account with example.social—he's not even using the same software. In ActivityPub terms what we need here is for remote.social to send a FOLLOW request to example.social, asking it to add @hugh@remote.social to @molly@example.social's followers collection. But we only control example.social, and we're not authorised to do anything on behalf of @hugh@remote.social. We need to send Hugh to a page on remote.social from which he can send a "remote" follow request: but how do we know where to send him?

Webfinger to the rescue

The webfinger protocol can be used "to discover information about people or other entities on the Internet using standard HTTP methods". Sounds perfect! Webfinger in turn follows the "Well known" URI structure. Don't worry, you don't actually need to read either of these IETF documents, all you need to know is that it has become standard practice in ActivityPub implementations to provide information about users with this URL structure:

/.well-known/webfinger/?resource=acct:USERNAME@DOMAIN.TLD

With webfinger, as long as you know the username and the domain for a remote user, you can discover everything else you need to enable them to follow one of your local users. In our example, remote.social is a Mastodon server, so if we make an HTTP GET request to https://remote.social/.well-known/webfinger/?resource=acct:hugh@remote.social (note that we drop the initial @) we will get a JSON reply like this:

{
  "subject": "acct:hugh@remote.social",
  "aliases": [
    "https://remote.social/@hugh",
    "https://remote.social/users/hugh"
  ],
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://remote.social/@hugh"
    },
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://remote.social/users/hugh"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe",
      "template": "https://remote.social/authorize_interaction?uri={uri}"
    }
  ]
}

We are interested in the last two links. The link with a rel of self tells us the canonical user URI. This can differ between implementations, so Webfinger tells us the exact URI we need for this particular implementation.

The last link has a rel of http://ostatus.org/schema/1.0/subscribe. You might be thinking this looks like a linked data source, and that if you follow the link it will provide a specification for subscribe in the ostatus schema. I mean, that would be a reasonable assumption. However, this is simply a reference indicating that this link has the meaning of a subscribe template link as defined in the ostatus schema. This is useful to know, because at the time of writing the link will simply send you to some kind of advertising page for online slot machines. Don't worry though, you can find the relevant bit of the ostatus specification on GitHub Pages, at least for now.

In any case, the webfinger response tells us what we need to do, which is redirect Hugh to the template url, replacing the {uri} section with the URI of our local user Molly. The template URI must always have a reference to {uri} which refers to the local user's URI, and this must be URL encoded.
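As a sketch, assuming the requests library and deferring error handling to the Errors section below, the webfinger lookup and link extraction might look like this:

import requests

def webfinger_lookup(handle: str) -> dict:
    """Fetch webfinger data for a handle like 'hugh@remote.social'
    (leading '@' already stripped)."""
    domain = handle.split("@")[-1]
    resp = requests.get(
        f"https://{domain}/.well-known/webfinger",
        params={"resource": f"acct:{handle}"},
        headers={"Accept": "application/jrd+json"},
    )
    resp.raise_for_status()
    return resp.json()

def subscribe_template_and_actor(handle: str) -> tuple:
    """Return the ostatus subscribe template and the canonical actor URI."""
    links = {link["rel"]: link for link in webfinger_lookup(handle)["links"]}
    template = links["http://ostatus.org/schema/1.0/subscribe"]["template"]
    actor_uri = links["self"]["href"]
    return template, actor_uri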

Confirming the follow request on the remote server (steps 4 and 5)

Your head may be spinning a little at this point, so let's recap. Remote user Hugh wants to follow local user Molly. Hugh clicks on the Remote Follow button on Molly's profile. We present Hugh with a form to tell us his remote account name and he enters @hugh@remote.social and submits the form. We send a GET request from our server to https://remote.social/.well-known/webfinger/?resource=acct:hugh@remote.social, and in return we get sent some JSON conforming to the Webfinger protocol. If it uses ostatus subscribe, we can pull out the template value from the link where rel is http://ostatus.org/schema/1.0/subscribe.

We are now ready to respond to Hugh's button pressing!

First we URL-encode local user Molly's canonical user URI, then use this in place of the {uri} provided in the template value:

https://remote.social/authorize_interaction?uri=https%3A%2F%2Fexample.social%2Fuser%2Fmolly

Now we redirect Hugh to this URL. At this point, there is nothing more for us to do. The remote.social server will ensure Hugh is actually logged in, ask him to confirm the request, and then send an ActivityPub FOLLOW request to our server, and if we have set it up correctly, it will respond just the same as any other ActivityPub compliant follow request.
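Putting those two steps together, a sketch of building the redirect URL (the function name is hypothetical) looks like this:

from urllib.parse import quote

def build_redirect(template: str, local_actor_uri: str) -> str:
    """Substitute the URL-encoded local actor URI into the subscribe template."""
    return template.replace("{uri}", quote(local_actor_uri, safe=""))

template = "https://remote.social/authorize_interaction?uri={uri}"
print(build_redirect(template, "https://example.social/user/molly"))
# https://remote.social/authorize_interaction?uri=https%3A%2F%2Fexample.social%2Fuser%2Fmolly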

Hugh following Molly from Mastodon

What about the other way around?

So far so good, but what if Molly wants to follow Hugh using the same logic? At the moment, there is no way for this to happen. Molly will go to Hugh's profile page on Mastodon, click "Follow", fill in her username in the pop-up window, and ...get an error. This is because remote.social doesn't know how to find our ostatus subscribe template. There's a good reason for that: we don't have one yet.

To put the final pieces into place we need to do two things: create the template page, and add a reference to it at a .well-known/webfinger URI.

Setting up your remote follow template

To allow our local users to follow remote users, we need a template page that can display basic information about the user they want to follow, and require them to take some kind of action to confirm they want to remote follow. Typically this means we display the avatar and username of the remote user, and ask our local user to press a button to confirm.

Our page logic should also check that the user is actually logged in! You have probably already set up some templating logic for this across your application: it should be used here too so that our local user is prompted to log in but then returned to the remote follow page.

So in this example we need a route like this:

https://example.social/ostatus_subscribe?acct={uri}

The exact path and query param aren't important: Bookwyrm uses ostatus_subscribe and acct to make clear the connection with ostatus and webfinger; Mastodon uses authorize_interaction and uri, which makes just as much sense. As we discussed above, the {uri} will be the URL-encoded canonical URI for the remote user. What I didn't make clear in the earlier sections is that if you add .json to the end of this user URI and GET it, the request should send back all the ActivityPub actor information you need for this user. This is helpful because you can use that to get the username, user icon (avatar), and anything else you want to display to your local user.

Now when we get a request to this path, it's a matter of grabbing the information we want from their canonical URI, presenting a confirmation screen to our user, and then sending a normal follow request to the remote server once our user (in this case, Molly) confirms.
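As a sketch of that flow (names are hypothetical; fetching the actor document with an Accept: application/activity+json header is a common alternative to appending .json as Bookwyrm does):

import requests
from urllib.parse import parse_qs, urlparse

def remote_actor(actor_uri: str) -> dict:
    """Fetch the ActivityPub actor document for a remote user URI."""
    resp = requests.get(actor_uri, headers={"Accept": "application/activity+json"})
    resp.raise_for_status()
    return resp.json()

def confirmation_details(request_url: str) -> dict:
    """Gather what we want to show on the 'confirm remote follow' screen."""
    actor_uri = parse_qs(urlparse(request_url).query)["acct"][0]
    actor = remote_actor(actor_uri)
    return {
        "name": actor.get("preferredUsername"),
        "avatar": (actor.get("icon") or {}).get("url"),
        "inbox": actor.get("inbox"),   # where the Follow activity is POSTed after confirmation
    }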

Molly following Hugh

Telling other servers where to find your template

As I just noted, the path to your template is arbitrary and you can use whatever path you like. However, you need to tell other servers where to find it. As we saw above, we can do this using webfinger. You need to implement a webfinger route as follows:

https://example.social/.well-known/webfinger/?resource=acct:{username}@{domain}

Requests to this route should check for a local user matching the account named in the resource parameter (e.g. molly@example.social) and return a JSON response with that user's webfinger data. A fairly minimal but compliant webfinger response would look like this:

{
  "subject": "acct:molly@example.social",
  "links": [
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://example.social/users/molly"
    },
    {
      "rel": "http://ostatus.org/schema/1.0/subscribe",
      "template": "https://example.social/ostatus_subscribe?acct={uri}"
    }
  ]
}

Now other ActivityPub services like Mastodon, Bookwyrm and Pleroma can find your remote subscribe template, and your users will be able to take advantage of the remote following buttons on user profiles on those services. 🎉
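A minimal, framework-agnostic sketch of that route's logic (hypothetical names; a Django or Flask view would wrap this, returning the dictionary as JSON and a 404 for unknown accounts):

from typing import Optional

LOCAL_DOMAIN = "example.social"
LOCAL_USERS = {"molly"}   # in real code, a database lookup

def webfinger_response(resource: str) -> Optional[dict]:
    """resource looks like 'acct:molly@example.social'."""
    if not resource.startswith("acct:"):
        return None
    username, _, domain = resource[len("acct:"):].partition("@")
    if domain != LOCAL_DOMAIN or username not in LOCAL_USERS:
        return None
    return {
        "subject": resource,
        "links": [
            {
                "rel": "self",
                "type": "application/activity+json",
                "href": f"https://{LOCAL_DOMAIN}/users/{username}",
            },
            {
                "rel": "http://ostatus.org/schema/1.0/subscribe",
                "template": f"https://{LOCAL_DOMAIN}/ostatus_subscribe?acct={{uri}}",
            },
        ],
    }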

Errors

During these communications between servers, there are all sorts of errors you can get, and it's helpful to your users if you deal with them separately and display a different message for each one, so that they know why their request failed. Here is a non-exhaustive list of reasons the request might fail (a sketch of distinguishing some of them follows the list):

  • the remote user doesn't exist
  • one of the users has blocked the other one
  • the remote user is already following the local user
  • the remote server has not implemented webfinger
  • the remote server has implemented webfinger but does not define an ostatus subscribe template
  • the remote server fails to respond for some reason
  • the remote server experiences an error
  • the local server experiences an error

Conclusion

Hopefully this explanation will be helpful to others who want to implement remote following for software using the ActivityPub protocol. As you can see, this feature really has nothing to do with ActivityPub itself, relying instead on the webfinger protocol and a remnant of the ostatus protocol, which is not well documented or easy to find. If you spot any errors or have any suggestions for improvement or additions to this post, get in touch via @hugh@ausglam.space.


DLF Digest: January 2022 / Digital Library Federation

DLF Digest

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation


This month’s news:

  • Join us in Baltimore! The call for the 2022 DLF Forum volunteer planning committee is open now through Friday, January 14. Next year’s event will take place in Baltimore, Maryland, in October 2022. For more information about committee roles and time commitments, and to apply to join the planning committee, head to the volunteer form here: https://forms.gle/2ajaM8zsruL3ctfw8 
  • Applications are now being accepted for the 2022 Leading Change Institute (LCI), to be held July 10–15 in Washington DC. Cosponsored by CLIR and EDUCAUSE, LCI is designed for leaders in higher education—including CIOs, librarians, information technology professionals, and administrators—who are interested in working collaboratively to promote and initiate change on critical issues affecting the academy. Applications are due by January 10.
  • CLIR recently announced award recipients for its new “Pocket Burgundy” publication series. The series, which derives its name from the deep red covers of CLIR’s traditional research reports, will focus on shorter pieces—20 to 50 pages—addressing current topics in the information and cultural heritage community. View the awardees here.
  • CLIR offices will be closed Monday, January 17, in observance of Martin Luther King, Jr. Day.

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Can’t find meeting call-in information? Email us at info@diglib.org.

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member institution. Learn more about our working groups and how to get involved on the DLF website. Interested in starting a new working group or reviving an older one? Need to schedule an upcoming working group call? Check out the DLF Organizer’s Toolkit to learn more about how Team DLF supports our working groups, and send us a message at info@diglib.org to let us know how we can help. 

The post DLF Digest: January 2022 appeared first on DLF.

Work / Ed Summers

Today I’m starting a new job working as a software developer at Stanford University as part of their Digital Library Systems and Services team. For at least the last decade DLSS has been building standards, technology and practices for digital library, archives and repository systems–not only for Stanford, but for the wider open source and GLAM communities. I’m really excited to have the opportunity to join their infrastructure team, and to work with some new and familiar faces, on maintaining and further developing their systems and services.

The maintenance angle is something that specifically drew me to this job. DLSS’ prolific output and impact is the result of many skilled people working together for a sustained period of time. The more software that is written and deployed, the more challenging maintenance work can become. Knowing when to refactor, replace or retire applications and services can be extremely tricky, especially when some of those applications are interdependent, and/or are open source and used in other contexts. The longer I’ve worked as a software developer the more interesting these questions of maintenance of existing systems, as opposed to greenfield development, have become.

When I joined MITH back in 2014 they had an extensive back catalog of projects and applications that needed to be cared for. I helped migrate many of these to AWS where they became more legible (resource utilization, cost) compared to the server that they had accreted on at the University of Maryland. Over the course of my stay there I gradually also migrated these “sites” from dynamic database backed applications to static sites, sometimes with degraded functionality that was negotiated with their original creators. Upgrading these applications was increasingly untenable, and they often posed security problems. At the end of the day many of these projects remain online as a record of research activity that MITH was engaged in. It feels good to be leaving MITH having helped them manage this transition, which I think also created some breathing room for embarking on new projects.

As part of my PhD research I focused on how maintenance work and archival appraisal fit together. I think it’s useful to see decisions about what records to keep and which ones to discard as acts of maintenance and repair, because it highlights how these efforts are more than acts of preservation–they are acts of transformation. Sites of maintenance and repair are very often places where values get expressed. What things actually get attention over others is not simply a question of managing priorities, but is a real material expression of individual, organizational and cultural values. This was especially true when looking at web archiving practices, where born digital records and the systems that are used to manage (crawl, index, replay) them are intertwined and interdependent (Hedstrom, 2002).

Another dimension that my research explored was how maintenance work can sometimes operate under the radar as a conservative force, preserving established interests, sustaining the status quo, and serving the purposes of vested power, instead of allowing for needed and necessary change. One way of working against this tendency is to understand maintenance and repair work not only as expressions of value, but as a pivot point for transformative (not only incremental) change (Jackson, 2014; A. Russell & Vinsel, 2016; A. L. Russell & Vinsel, 2018) as well as a practice of care (Maintainers et al., 2019).

Hopefully this orientation won’t make my work more confusing, because there will be a lot to understand. Maybe too much. I’m actually excited about this new role at Stanford because it shifts my footing back to working as a software developer on a team of software developers. For the last eight years I’ve worked as a software developer, but this has largely been solo or paired work, and mostly inflected towards research. The aspects of research that sent me back to school, and that have always interested me the most, were ones that engaged with practice (rather than ideation and theory), so I’m very grateful for the opportunity to re-engage with practice after some time away. Look for some of my notes to pop up here as I get into it.

References

Hedstrom, M. (2002). Archives, memory, and interfaces with the past. Archival Science, 2(1-2), 21–43.
Jackson, S. J. (2014). Rethinking repair. In P. Boczkowski & K. Foot (Eds.), Media technologies: Essays on communication, materiality and society (pp. 221–239). MIT Press. Retrieved from http://sjackson.infosci.cornell.edu/RethinkingRepairPROOFS(reduced)Aug2013.pdf
Maintainers, T. I., Olson, D., Meyerson, J., Parsons, M. A., Castro, J., Lassere, M., … Acker, A. (2019). Information Maintenance as a Practice of Care. https://doi.org/10.5281/zenodo.3251131
Russell, A. L., & Vinsel, L. (2018). After innovation, turn to maintenance. Technology and Culture, 59(1).
Russell, A., & Vinsel, L. (2016). Hail the maintainers. Retrieved from https://aeon.co/essays/innovation-is-overvalued-maintenance-often-matters-more

Issue 79 - The Placeholder Issue / Peter Murray

This is just a placeholder as I develop the newsletter framework. I went back and counted—there were 78 Thursday Threads posts before I stopped in 2015. That will make the next issue #79!

Public Domain Day 2022: Trespassers Will / John Mark Ockerbloom

The Piglet lived in a very grand house in the middle of a beech-tree, and the beech-tree was in the middle of the forest, and the Piglet lived in the middle of the house. Next to his house was a piece of broken board which had “TRESPASSERS W” on it. When Christopher Robin asked the Piglet what it meant, he said it was his grandfather’s name, and had been in the family for a long time, Christopher Robin said you couldn’t be called Trespassers W, and Piglet said yes, you could, because his grandfather was, and it was short for Trespassers Will, which was short for Trespassers William. And his grandfather had had two names in case he lost one–Trespassers after an uncle, and William after Trespassers.

— A. A. Milne, “Winnie-the-Pooh”, now in the US public domain

It’s good to be celebrating another Public Domain Day. It’s especially good this year in the US, where we get an especially rich set of arrivals to the public domain. They include as many as 400,000 sound recordings from the invention of recording through the end of 1922. They also include many thousands of publications from 1926, including classics like Winnie-the-Pooh, quoted above. (See my just-finished public domain countdown for a selection of other interesting works joining the US public domain.) Most other countries have public domain arrivals to celebrate as well. In countries like those in Europe with “life+70 years” copyright terms, those include works of authors who died in 1951, and in countries like Canada that still have “life+50 years” terms, they include works of authors who died in 1971. (Some notable authors who died in those years are featured in the Public Domain Review’s Public Domain Day 2022 article.)

As I’ve thought about Winnie-the-Pooh today, I’ve been drawn back to the quote from it above. Much of Milne’s dry humor in it (and elsewhere in the book) was aimed at older readers, and flew over my head when I first read it as a child. I didn’t know that the “Trespassers W” sign was the remnant of someone trying to claim ownership over the land the characters inhabit. All of the characters in the book, likewise, are completely oblivious to that claim. They freely wander over the Hundred Acre Wood and surrounding countryside without any regard to private property claims. Instead, as seen above, they assume a completely different and absurd reason for the claim-staking sign.

Now that the book is in the public domain, we too can revisit and reuse the setting and characters of the book as we like. But the value of the intellectual property claims related to Winnie-the-Pooh make it important for us to watch our steps carefully. Pooh is one of the most valuable characters in the portfolio of the Walt Disney Company, and this book’s entry to the public domain was delayed 39 years, in part because of Disney’s lobbying of Congress. They still control rights to the designs of Pooh and friends in their animated cartoons (recognizably different from the original designs by Ernest Shepard), to the characters that don’t appear in the 1926 book (including Tigger, who shows up in its sequel), and to trademarks covering all manner of Pooh-related merchandise.

In a blog post at Duke’s Center for the Study of the Public Domain, Jessica Jenkins discusses the rights Disney can still assert over Winnie the Pooh and friends. She also discusses how other rightsholders have wielded control over any use of characters that they claim is “too close” to their own expressions. For instance, the estate of Arthur Conan Doyle got Netflix to make a deal with them for the use of Sherlock Holmes, a character who’s long been in the public domain, over claims that their movie Enola Holmes reused copyrightable aspects of Holmes that only appear in the last few stories that were still under US copyright. The character copyright claims of the estate are dubious, but the estate’s been litigious enough that I could easily see how a filmmaker would prefer to settle with the estate rather than undergo a long and costly lawsuit, even if it were likely to eventually get a favorable ruling.

Similarly, it’s possible that Disney and other rightsholders could chill reuse of public domain works by making legal threats against anything they claim is “too similar” to their own products. Similar concerns over the character Bambi are a reason that new translations of Felix Salten’s original novel are only being released today, when it is finally unquestionably in the public domain in all major global markets. There are arguments to be made that Bambi has already been in the public domain for a few years at least, but due to the complicated litigation history, many have been reluctant to make use of the character until its 1926 US registration expired and made those arguments moot.

As I went walking I saw a sign there
And on that sign it said “No Trespassing.”
But on the other side it didn’t say nothing.
That side was made for you and me.

— Woody Guthrie, “This Land Is Your Land”, as published on his official website

What can we do to protect the public’s right to enjoy and reuse what’s rightfully theirs to use, when others want to monopolize them? For one, we can boldly and publicly make full use of the parts of our cultural heritage that are not truly covered by “No Trespassing” claims, the sides that, as the verse above says, were “made for you and me”. We can assert and exercise the right to use the public domain, even in the face of challenges to it. Whether those assertions result in court victories (which blunted even more expansive claims over Sherlock Holmes by the Conan Doyle estate in 2014), or in implicit peace treaties (like the willingness of Guthrie’s rightsholders to liberally license “This Land is Your Land” without admitting to public domain status), our affirmations of the public domain make it safer not only for us to use, but for others to use as well.

Affirming the public domain motivates projects like HathiTrust’s Copyright Review Program, the New York Public Library’s U.S. Copyright History Project, as well as Penn’s Deep Backfile Project for serials that I manage. We’re all trying to make it easier to identify and open access to works newer than 95 years old that are in the public domain (but not obviously so) so that people can feel more secure reproducing and reusing them.

Fair use is important to defend as well. I can quote the verse above from “This Land is Your Land”, despite it not being in the 1945 publication of the song that EFF has convincingly argued is in the public domain, and despite often-repeated folk advice to never quote song lyrics without permission. I’m using a limited portion of the song analytically to help make a point in my argument about public rights, and my use is not likely to substitute for, or hurt the market for, the song itself. (Guthrie’s own words also suggest he’d be happy with my use.) In other words, I’m exercising fair use. And when we exercise fair use, we keep it from atrophying, and preserve it as an “essential part of our political and cultural life”, and an important protection of free expression.

We also may need to work to protect all the sound recordings that just entered the public domain from claims that now have no more validity than the remnants of the sign outside Piglet’s door. Many of the major Internet platforms have mechanisms to automatically flag audio that seems to be derived from commercial recordings, and block them, demonetize them, and impose copyright strikes against the people who post them. Up until now, platforms could generally safely assume that recordings that have ever had a valid copyright claimant always have one. But now there are hundreds of thousands of recordings that once had a claimant, but now belong to the public at large. We need to make sure that platforms recognize and respect those new public rights. If past experience is any guide, public vigilance and outcry over improper takedowns will help ensure that happens.

This Public Domain Day gives us much to celebrate, and to use, in all kinds of educational, entertaining, and creative ways. To make the most of it, we have to resolve and work to protect it from those who would try to monopolize public rights for themselves. While some may still call us “trespassers” when we make full use of public domain, fair use, and other public rights, our will to persist in those uses helps bring the copyright system into a healthier balance, promoting the well-being of creators and audiences alike.

Talk at TTI/Vanguard Conference / David Rosenthal

I was invited to present to the TTI/Vanguard Conference. The abstract of my talk, entitled Can We Mitigate Cryptocurrencies' Externalities? was:
Bitcoin is notorious for consuming as much electricity as the Netherlands, but there are around 10,000 other cryptocurrencies, most using similar infrastructure and thus also in aggregate consuming unsustainable amounts of electricity. This is far from the only externality the cryptocurrency mania imposes upon the world. Among the others are that Bitcoin alone generates as much e-waste as the Netherlands, that cryptocurrencies enable a $5.2B/year ransomware industry, that they have disrupted supply chains for GPUs, hard disks, SSDs and other chips, that they have made it impossible for web services to offer free tiers, and that they are responsible for a massive crime wave including fraud, theft, tax evasion, funding of rogue states such as North Korea, drug smuggling, and even armed robbery. In return they offer no social benefit beyond speculation. Is it possible to mitigate these societal harms?
The text with links to the sources is below the fold.

[Slide 1: Title]
I'd like to thank John Markoff for inviting me to present to this amazing conference. You don't need to take notes; when I stop talking the text of my talk with links to the sources and much additional material will be at blog.dshr.org. I'm aiming for 10 minutes for questions at the end. Before I start talking about cryptocurrencies, I should stress that I hold no long or short positions in cryptocurrencies, their derivatives or related companies; I am long Nvidia. Unlike most people discussing them, I am not "talking my book".

Cryptocurrencies' roots lie deep in the libertarian culture of Silicon Valley and the cypherpunks. Libertarianism's attraction is based on ignoring externalities, and cryptocurrencies are no exception.

[Slide 2: Externalities]
Bitcoin is notorious for consuming as much electricity as the Netherlands, but there are around 10,000 other cryptocurrencies, most using similar infrastructure and thus also in aggregate consuming unsustainable amounts of electricity. This is far from the only externality the cryptocurrency mania imposes upon the world. Among the others are that Bitcoin alone generates as much e-waste as the Netherlands, that cryptocurrencies suffer an epidemic of pump-and-dump schemes and wash trading, that they enable a $5.2B/year ransomware industry, that they have disrupted supply chains for GPUs, hard disks, SSDs and other chips, that they have made it impossible for web services to offer free tiers, and that they are responsible for a massive crime wave including fraud, theft, tax evasion, funding of rogue states such as North Korea, drug smuggling, and even as documented by Jameson Lopp's list of physical attacks, armed robbery, kidnapping, torture and murder.

[Slide 3: Alecus]
Alecus, El Diario de Hoy
The attempt to force El Salvador's population to use cryptocurrency is a fiasco. Cryptocurrencies offer no significant social benefit beyond speculation; Igor Makarov and Antoinette Schoar write:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity. ... the vast majority of Bitcoin transactions between real entities are for trading and speculative purposes
...
exchanges play a central role in the Bitcoin system. They explain 75% of real Bitcoin volume
...
Our results do not support the idea that the high valuation of cryptocurrencies is based on the demand from illegal transactions. Instead, they suggest that the majority of Bitcoin transactions is linked to speculation.
[Slide 4: "Transaction" Rate]
Source
Bitcoin is only processing around 27K "economically meaningful" transactions/day. And 75% of those are transactions between exchanges, so only 2.5% of the "transactions" are real blockchain-based transfers involving individuals. That's less than 5 per minute.
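The arithmetic behind those figures, under the assumptions stated above (roughly 10% of on-chain volume is economically meaningful, and 75% of that is exchange-to-exchange), works out as follows:

meaningful_tx_per_day = 27_000      # "economically meaningful" transactions per day
exchange_share = 0.75               # fraction of those that are between exchanges

individual_tx_per_day = meaningful_tx_per_day * (1 - exchange_share)
print(individual_tx_per_day)                # 6750.0 transfers/day involving individuals
print(individual_tx_per_day / (24 * 60))    # about 4.7 per minute, i.e. fewer than 5
print(0.10 * (1 - exchange_share))          # about 2.5% of all "transactions"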

What are the causes of these costs that cryptocurrency users are happy to impose on the rest of us? Fundamentally, they arise from four attributes that cryptocurrencies promise, but in practice don't guarantee:
  • Decentralization
  • Immutability
  • Trustlessness
  • Anonymity

Decentralization

Nakamoto's motivation for Bitcoin was distrust of institutions, especially central banks. When it launched in the early stage of the Global Financial Crisis, this had resonance. The key to a system that involves less trust is decentralization.

[Slide 5: Resilience]
Source
Why do suspension bridges have stranded cables not solid rods? The major reason is that solid rods would fail suddenly and catastrophically, whereas stranded cables fail slowly and make alarming noises while they do. We build software systems out of solid rods; they fail abruptly and completely. Most are designed to perform their tasks as fast as possible, so that when they are compromised, they perform the attacker's tasks as fast as possible. Changing this, making systems that are resilient, ductile like copper not brittle like glass, is an extraordinarily difficult problem in software engineering.

I got interested in it when, burnt out after three startups all of which IPO-ed, I started work at the Stanford Library on the problem of keeping digital information safe for the long term. This work won my co-authors and me a "Best Paper" award at the prestigious 2003 Symposium on Operating System Principles for a decentralized consensus system using Proof-of-Work. When, five years later, Satoshi Nakamoto published the Bitcoin protocol, a cryptocurrency based on a decentralized consensus mechanism using proof-of-work, I was naturally interested in how it turned out.

Decentralization is a necessary but insufficient requirement for system resilience. Centralized systems have a single locus of control. Subvert it, and the system is at your mercy. It only took six years for Bitcoin to fail Nakamoto's goal of decentralization, with one mining pool controlling more than half the mining power. In the seven years since, a majority of the mining power has always been controlled by no more than five pools.

[Slide 6: Economies of Scale]
Source
In 2014 I wrote Economies of Scale in Peer-to-Peer Networks, explaining the economic cause of this failure. Briefly, this is an example of the phenomenon described by W. Brian Arthur in 1994's Increasing returns and path dependence in the economy. Information technologies have strong economies of scale, so the larger the miner the lower their costs, and thus the greater their profit, and thus the greater their market share.

"Blockchain" is unfortunately a term used to describe two completely different technologies, which have in common only that they both use a data structure called a Merkle Tree, commonly in the form patented by Stuart Haber and Scott Stornetta in 1990. This is a linear chain of blocks each including the hash of the previous block. Permissioned blockchains have a central authority controlling which network nodes can add blocks to the chain, and are thus not decentralized, whereas permissionless blockchains such as Bitcoin's do not; this difference is fundamental:
  • Permissioned blockchains can use well-established and relatively efficient techniques such as Byzantine Fault Tolerance, and thus don't have significant carbon footprints. These techniques ensure that each node in the network has performed the same computation on the same data to arrive at the same state for the next block in the chain. This is a consensus mechanism.
  • In principle each node in a permissionless blockchain's network can perform a different computation on different data to arrive at a different state for the next block in the chain. Which of these blocks ends up in the chain is determined by a randomized, biased election mechanism. For example, in Proof-of-Work blockchains such as Bitcoin's a node wins election by being the first to solve a puzzle. The length of time it takes to solve the puzzle is random, but the probability of being first is biased, it is proportional to the compute power the node uses. Initially, because of network latencies, nodes may disagree as to the next block in the chain, but eventually it will become clear which block gained the most acceptance among the nodes. This is why a Bitcoin transaction should not be regarded as final until it is six blocks from the head.
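As a toy illustration of the shape of this mechanism (not of Bitcoin's actual block format), a hash chain with a Proof-of-Work puzzle can be sketched in a few lines of Python: each block commits to its predecessor's hash, and a miner "wins election" by finding a nonce whose block hash falls below a difficulty target, with expected work proportional to the difficulty.

import hashlib, json

DIFFICULTY = 2 ** 240   # toy target: a smaller value means more expected work per block

def block_hash(block: dict) -> int:
    return int.from_bytes(
        hashlib.sha256(json.dumps(block, sort_keys=True).encode()).digest(), "big"
    )

def mine(prev_hash: int, transactions: list) -> dict:
    nonce = 0
    while True:
        block = {"prev": prev_hash, "txs": transactions, "nonce": nonce}
        if block_hash(block) < DIFFICULTY:   # puzzle solved: this block is a valid candidate
            return block
        nonce += 1

genesis = mine(0, ["coinbase"])
next_block = mine(block_hash(genesis), ["A pays B"])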
[Slide 7: Blockchain Patent Filed 1990]
Source
Discussing "blockchains" and their externalities without specifying permissionless or permissioned is meaningless, they are completely different technologies. One is 30 years old, the other is 13 years old.

Why are economies of scale a fundamental problem for decentralized systems? Because there is no central authority controlling who can participate, decentralized consensus systems must defend against Sybil attacks, in which the attacker creates a majority of seemingly independent participants which are secretly under his control. The defense is to ensure that the reward for a successful Sybil attack is less than the cost of mounting it. Thus participation must be expensive, and so will be subject to economies of scale. They will drive the system to centralize. So the expenditure in attempting to ensure that the system is decentralized is a futile waste.

Most cryptocurrencies impose these costs, as our earlier system did, using Proof-of-Work. It was a brilliant idea when Cynthia Dwork and Moni Naor originated it in 1992, being both simple and effective. But when it is required to make participation expensive enough for a trillion-dollar cryptocurrency it has an unsustainable carbon footprint.

[Slide 8: Bitcoin Energy Consumption]
Source
The leading source for estimating Bitcoin's electricity consumption is the Cambridge Bitcoin Energy Consumption Index, whose current central estimate is 117TWh/year.

Adjusting Christian Stoll et al's 2018 estimate of Bitcoin's carbon footprint to the current CBECI estimate gives a range of about 50.4 to 125.7 MtCO2/yr for Bitcoin's opex emissions, or between Portugal and Myanmar. Unfortunately, this is likely to be a considerable underestimate. Bitcoin's growing e-waste problem by Alex de Vries and Christian Stoll concludes that:
Bitcoin's annual e-waste generation adds up to 30.7 metric kilotons as of May 2021. This level is comparable to the small IT equipment waste produced by a country such as the Netherlands.
That's an average of one whole MacBook Air of e-waste per "economically meaningful" transaction.

[Slide 9: Facebook & Google Carbon Footprints]
Source
The reason for this extraordinary waste is that the profitability of mining depends on the energy consumed per hash, and the rapid development of mining ASICs means that they rapidly become uncompetitive. de Vries and Stoll estimate that the average service life is less than 16 months. This mountain of e-waste contains embedded carbon emissions from its manufacture, transport and disposal. These graphs show that for Facebook and Google data centers, capex emissions are at least as great as the opex emissions[1].

Cryptocurrencies assume that society is committed to this waste of energy and hardware forever. Their response is frantic greenwashing, such as claiming that because Bitcoin mining allows an obsolete, uncompetitive coal-burning plant near St. Louis to continue burning coal it is somehow good for the environment[2].

But, they argue, mining can use renewable energy. First, at present it doesn't. For example, Luxxfolio implemented their commitment to 100% renewable energy by buying 15 megawatts of coal-fired power from the Navajo Nation!

Second, even if it were true that cryptocurrencies ran on renewable power, the idea that it is OK for speculation to waste vast amounts of renewable power assumes that doing so doesn't compete with more socially valuable uses for renewables, or indeed for power in general.

[Slide 10: Energy Return on Investment]
Delannoy et al Fig 2
Right now the world is short of power; one major reason that China banned cryptocurrency mining was that they needed their limited supplies of power to keep factories running and homes warm. Shortage of energy isn't a short-term problem. This graph is from Peak oil and the low-carbon energy transition: A net-energy perspective by Louis Delannoy et al showing that as the easiest deposits are exploited first, the Energy Return On Investment, measuring the fraction of the total energy extracted delivered to consumers, decreases.

[Slide 11: Oil Energy Gross vs. Net]
Delannoy et al Fig 1
Delannoy et al's Figure 1 shows the gross and net oil energy history and projects it to 2050. The gross energy, and thus the carbon emission, peaks around 2035, but because the energy used in extraction (the top yellow band) increases rapidly, the net energy peaks in about 5 years.


[Slide 12: CO2 Emission Trajectories]
Source
This is a problem for two reasons. If society is to survive:
  • Carbon emissions need to start decreasing now, not in a decade and a half.
  • Renewables need to be deployed very rapidly.
Deploying renewables consumes energy, which is paid back over their initial years of operation. Thus during the transition to renewable power, deployment itself consumes energy, reducing the amount available for other uses[3]. The world cannot afford to waste a Netherlands' worth of energy on speculation when that energy could instead be deploying renewables.


If cryptocurrency speculation is to continue, it needs to vastly reduce its carbon footprint by eliminating Proof-of-Work. The two major candidates are Proof-of-Space-and-Time and Proof-of-Stake. Unfortunately, as I detail in Alternatives To Proof-of-Work, both lack the simplicity and effectiveness of Proof-of-Work.

Proof-of-Space-and-Time attempts to make participation expensive by wasting storage instead of computation. The highest-profile such effort is Bram Cohen's Chia, funded by Andreesen Horowitz, the "Softbank of crypto". Chia's "space farmers" create and store "plots" consisting of large amounts of otherwise useless data.

[Slide 13: Chia]
Source
The software was ingenious, but the design suffered from naiveté about storage media and markets. When it launched in May, gullible farmers rushed to buy hard disks and SSDs. By July, the capital tied up in farming hardware was around six times the market cap of the Chia coin. Chia's CEO described the result:
"we've kind of destroyed the short-term supply chain"
Disk vendors were forced to explain that Chia farming voided the media's warranty. Just as with GPUs, the used market was flooded with burnt-out storage. Chia's coin initially traded at $1934 before dropping more than 90%; last I looked it was $109. I expect A16Z made money, but everyone else had to deal with the costs. Chia doesn't use much electricity, which has more to do with its failure to take off than with the technology, but it does have a major e-waste problem.

[Slide 14: Proof of Stake Sucks]
Bram Cohen's opinion
The costs that Proof-of-Stake imposes to make participation expensive are the risk of loss of and the foregone interest on the "stake", an escrowed amount of the cryptocurrency itself. This has two philosophical problems:
  • It isn't just that the Gini coefficients of cryptocurrencies are extremely high[4], but that Proof-of-Stake makes this a self-reinforcing problem. Because the rewards for mining new blocks, and the fees for including transactions in blocks, flow to the HODL-ers in proportion to their HODL-ings, whatever Gini coefficient the system starts out with will always increase. Proof-of-Stake isn't effective at decentralization.
  • Cryptocurrency whales are believers in "number go up". The eventual progress of their coin "to the moon!" means that the temporary costs of staking are irrelevant.
There are also a host of severe technical problems. The accomplished Ethereum team have been making a praiseworthy effort to overcome them for more than 7 years and are still more than a year away from being able to migrate off Proof-of-Work. Among the problems is that at intervals Proof-of-Stake blockchains need to achieve consensus on checkpoints, using a different consensus mechanism from that used to add blocks. I discuss 16 of these problems in Alternatives To Proof-of-Work.

[Slide 15:Centralization Risk]
Yulin Cheng wrote:
According to the list of accounts powered up on March. 2, the three exchanges collectively put in over 42 million STEEM Power (SP).

With an overwhelming amount of stake, the Steemit team was then able to unilaterally implement hard fork 22.5 to regain their stake and vote out all top 20 community witnesses – server operators responsible for block production – using account @dev365 as a proxy. In the current list of Steem witnesses, Steemit and TRON’s own witnesses took up the first 20 slots.
Vitalik Buterin pointed out that lack of decentralization was a security risk in 2017, and this was amply borne out last year when Justin Sun conspired with three exchanges, staking their customers' coins, to take over the Steem Proof-of-Stake blockchain. Pushing back against the economic forces centralizing these systems is extremely difficult.

The last time Ethereum attempted to migrate the mining technology, in 2016 to fix the bug that enabled the DAO disaster, a fraction of the miners refused the upgrade[5]. The great block-size debate showed how resistant Bitcoin is to technical change. Even if a low-carbon alternative to Proof-of-Work were as effective it would likely not be adopted in the face of sunk costs and risk-averse investors.

[Slide 16: Top 2 ETH Pools = 53.9%]
Source
The advantage of permissionless over permissioned blockchains is claimed to be decentralization. How has that worked out in practice?

As has been true for the last seven years, no more than five mining pools control the majority of the Bitcoin mining power and last month two pools controlled the majority of Ethereum mining. Makarov and Schoar write:
Six out of the largest mining pools are registered in China and have strong ties to Bitmain Techonologies, which is the largest producer of Bitcoin mining hardware, The only non-Chinses [sic] pool among the largest pools is SlushPool, which is registered in the Czech Republic.
[Slide 17: Centralized Mining]
Makarov and Schoar write:
Bitcoin mining capacity is highly concentrated and has been for the last five years. The top 10% of miners control 90% and just 0.1% (about 50 miners) control close to 50% of mining capacity. Furthermore, this concentration of mining capacity is counter cyclical and varies with the Bitcoin price. It decreases following sharp increases in the Bitcoin price and increases in periods when the price drops ... the risk of a 51% attack increases in times when the Bitcoin price drops precipitously or following the halving events.
It isn't just the mining pools that are centralized. The top 10% of miners control 90% and just 0.1% (about 50 miners) control close to 50% of mining capacity. This centralization doesn't just increase the system's technical risk, but also its legal risk. The reason is that in almost all cryptocurrencies a transaction wishing to be confirmed is submitted to a public "mempool" of pending transactions. The mining pools choose transactions from there to include in the blocks they attempt to mine. This, as Nicholas Weaver points out, means that mining pools are providing money transmission services under US law:
[Slide 18: 31 CFR § 1010.100]
The term "money transmission services" means the acceptance of currency, funds, or other value that substitutes for currency from one person and the transmission of currency, funds, or other value that substitutes for currency to another location or person by any means.
Thus, in the US, they are required to follow the Anti-Money Laundering/Know Your Customer (AML/KYC) rules as enforced by the Financial Crimes Enforcement Network (FinCEN)[6]. The only pool to try following them:
stopped doing this because the larger Bitcoin community objects to the idea of attempting to restrict Bitcoin to legal uses!
Most countries follow FinCEN's lead because the penalty for not doing so can be loss of access to the Western world's banking system.

As Adem Efe Gencer et al pointed out:
a Byzantine quorum system of size 20 could achieve better decentralization than proof-of-work mining at a much lower resource cost.
Thus the only reason for the massive carbon footprint of Proof-of-Work and the complexity and risk of the alternatives is to maintain the illusion of decentralization. Alas, it is unlikely that any alternative defense against Sybil attacks will be widely enough adopted to mitigate Proof-of-Work's carbon emissions.

Immutability

[Slide 19: Immutability]
Source
Immutability is one of the two things that make the cryptocurrency crime wave so effective. These systems are brittle: make a single momentary mistake and your assets are irretrievable.

Immutability sounds like a great idea when everything is going to plan, but in the real world mistakes are inevitable. Let's take a few recent examples — the $23M fee Bitfinex paid for a $100K transaction, or the $19M oopsie at Indexed Finance, or the $31M oopsie at MonoX, or the $90M oopsie at Compound and the subsequent $67M oopsie, all of which left the perpetrators pleading with the beneficiaries to return the loot. And in Compound's case threatening its customers with the ultimate crypto punishment, reporting them to the IRS. $12B in DeFi thefts so far, or about 5% of all the funds[7].

[Slide 20: Trammell Hudson]
Source
Vulnerabilities are equally inevitable, as we see with the $38M, $19M and $130M hacks of Cream Finance this year, last week's $115M hack of BadgerDAO, Sunday's $196M hack of BitMart, and of course the $600M hack of Poly Network. As Trammell Hudson says, "Smart contracts should be considered self-funded bug-bounty platforms".

The centralization of Ethereum's mining pools and exchanges allowed Poly Network to persuade them to blacklist the addresses involved. This made it very difficult for the miscreant to escape with the loot, much of which was returned. But it also vividly demonstrated that in most blockchains it is the mining pools that decide which transactions make it into a block, and are thus executed. The small number of dominant mining pools can effectively prevent addresses from transacting, and can prioritize transactions from favored addresses. They can also allow transactions to avoid the public mempool, to prevent them being front-run by bots. This turned out to be useful when a small group of white hats discovered a vulnerability in a smart contract holding $9.6M.

The key point of Escaping the Dark Forest, Samczsun's account of their night's work, is that, after the group spotted the vulnerability and built a transaction to rescue the funds, they could not put the rescue transaction in the public mempool because it would have been front-run by a bot. They had to find a miner who would put the transaction in a block without it appearing in the mempool. In other words, their transaction needed a dark pool. And they had to trust the cooperative miner not to front-run it.

[Slide 21: Ether Mining Pools 11/02/20]
Ether miners 11/2/20
Ethereum is, fortunately, very far from decentralized, being centralized around a small number of large pools. Thus, the group needed a trusted pool not an individual miner. At the time, the three largest pools mined more than half the blocks between them, so only three calls would have been needed to have a very good chance that the transaction would appear in one of the next few blocks.

Trustlessness

Just as economics forces theoretically decentralized blockchain-based systems in practice to be centralized, economics forces theoretically trustless blockchain-based systems in practice to require trusting third parties. As with the equity markets trusted third parties are needed to run "dark pools" to prevent trades being front-run by bots. The lure of profit means that sometimes this trust will be misplaced. For example, Barclays was fined $70M for selling high-frequency traders access to its dark pool.

Although there are informal methods like these of recovering from mistakes, they aren't very effective in general, and hardly effective at all in case of crime. Implementing mutability at the blockchain level requires trust, and trust requires a reliable identity for the locus of trust. Most activity in cryptocurrencies actually uses trusted third parties, exchanges, that are layered above the blockchain itself. These use conventional Web-based identities and are routinely compromised. In most cases immutability means the pilfered funds are not recovered.

But, more fundamentally, the entire cryptocurrency ecosystem depends upon a trusted third party, Tether, which acts as a central bank issuing the "stablecoins" that cryptocurrencies are priced against and traded in[8]. This is despite the fact that Tether is known to be untrustworthy, having consistently lied about its reserves.

Anonymity

[Slide 22: Anonymity]
Makarov and Schoar write[9]:
First, non-KYC entities serve as a gateway for money laundering and other gray activities.
...
Second, even if KYC entities were restricted to deal exclusively with other KYC entities, preventing inflows of tainted funds would still be nearly impossible, unless one was willing to put severe restrictions on who can transact with whom
...
Finally, notice that while transacting in cash and storing cash involve substantial costs and operational risks, transacting in cryptocurrencies and storing them are essentially costless (apart from fluctuation in value).
The other main enabler of the cryptocurrency crime spree is the prospect of transactions that aren't merely immutable but are also anonymous. Anonymity for small transactions is important, but for large transactions it provides the infrastructure for major crime. In the physical world cash is anonymous, but it has the valuable property that the cost and difficulty of transacting increase strongly with size. KYC/AML and other regulations leverage this. Cryptocurrencies lack this property. The ease with which cryptocurrency can be transferred between institutions that do, and do not, observe the KYC/AML regulations means that absent robust action by the US, the KYC/AML regime is doomed.

[Slide 23: The Coming Ransomware Storm]
Stephen Diehl writes in The Oncoming Ransomware Storm:
Go to your local bank branch and try to wire transfer $200,000 to an anonymous stranger in Russia and see how that works out. Modern ransomware could not exist without Bitcoin, it has poured gasoline on a fire we may not be able to put out.

When you create a loophole channel (however flawed) for parties to engage in illicit financing of anonymous entities beyond the control of law enforcement, it turns out a lot of shady business models that are otherwise prevented move from being impractical and risky to perversely incentivized. Ransomware is now very lucrative to the point where there is a whole secondary market of vendors selling Ransomware as a Service picks and shovels to the criminals.
The most serious crime enabled by anonymity is ransomware, which is regularly crippling essential infrastructure such as oil pipelines and hospital systems, to say nothing of the losses to businesses large and small. This business is estimated to gross $5.2B/year and is growing rapidly, aided by a network of specialist service providers. And this is just the ransom payments; the actual externalities include the much larger costs of recovering from the attacks.

There are cryptocurrencies that provide almost complete anonymity using sophisticated cryptography[10]. For example Monero:
Observers cannot decipher addresses trading monero, transaction amounts, address balances, or transaction histories.
Monero has become the cryptocurrency of choice for major ransomware gangs, who charge 10% extra for payment in Bitcoin, and plan to insist on Monero in future. It is also the coin of choice for crypto-mining malware, since it is ASIC-resistant.

Bitcoin and similar cryptocurrencies are pseudonymous not anonymous. Anyone can create and use an essentially unlimited number of pseudonyms (addresses), but transactions and balances using them are public. A newly minted pseudonym cannot be deanonymized, but as it becomes enmeshed in the public web of transactions maintaining anonymity takes more operational security than most users can manage.

Users are aware of the risk that their transactions can be traced, so many engage in wash transactions between addresses they control, and use mixers and tumblers to mingle their coins with those of other miscreants. Because it is almost impossible to actually buy legal goods with Bitcoin, at some point a HODL-er needs to use an exchange to obtain fiat currency[11]. This risks having their identity connected into the web of transactions on the blockchain. Makarov and Schoar conclude:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity.
In other words, 90% of Bitcoin's carbon footprint is used in a partially successful attempt to compensate for its deficient anonymity.

Because there are existing alternatives that provide greatly increased anonymity, attempts to mitigate the externalities of pseudonymous cryptocurrencies are likely to be self-defeating. As the ransomware industry shows, users will migrate to these alternatives, reducing the effectiveness of chain analysis.

Conclusion

The prospects for mitigating each of the four attributes are dismal:
  • Decentralization: although the techniques used to implement decentralization are effective in theory, at scale emergent economic effects render them ineffective. Despite this, decentralization is fundamental to the cryptocurrency ideology, making mitigation of its externalities effectively impossible.
  • Immutability: although mutability is necessary in the real world of mistakes and crime, implementing it in a decentralized system and thereby mitigating its externalities is an unsolved problem.
  • Trustlessness: even if you think this would be a good thing, it is impractical[12].
  • Anonymity: attempts to mitigate its externalities are likely to be self-defeating.
Thus it seems highly unlikely that any effort to mitigate cryptocurrencies' externalities would succeed[13].

[Slide 24: Conclusions]
Thus we can conclude that:
  1. Permissioned blockchains do not need a cryptocurrency to defend against Sybil attacks, and thus do not have significant externalities.
  2. Permissionless blockchains require a cryptocurrency, and thus necessarily impose all the externalities I have described except the carbon footprint.
  3. If successful, permissionless blockchains using Proof-of-Work, or any other way to waste a real resource as a Sybil defense, have unacceptable carbon footprints.
  4. Whatever Sybil defense they use, economics forces successful permissionless blockchains to centralize; there is no justification for wasting resources in a doomed attempt at decentralization.
I've talked for about half an hour, but the answer to the question "Can We Mitigate The Externalities Of Cryptocurrencies?" could have been immediately deduced from Betteridge's Law of Headlines, which states:
Any headline that ends in a question mark can be answered by the word no.
Given this, and the fact that cryptocurrencies are designed to resist harm reduction by regulation, the correct policy response is to follow the Chinese example and make cryptocurrencies illegal.

Thank you for your attention, I'm ready for questions.

Update December 9th: I realized too late that I omitted an important step in the argument. Let me expand upon it here.

Because participation in a permissionless blockchain must be expensive to ensure that the reward for a Sybil attack is less than the cost of mounting it, miners have to be reimbursed for their expensive efforts. There is no central authority capable of collecting funds from users and distributing them to the miners in proportion to their efforts. Thus miners' reimbursement must be generated organically by the blockchain itself. Thus a permissionless blockchain needs a cryptocurrency to be secure.

Because miners' opex and capex costs cannot be paid in the blockchain's cryptocurrency, exchanges are required to enable the rewards for mining to be converted into fiat currency to pay these costs. Someone needs to be on the other side of these sell orders. The only reason to be on the buy side of these orders is the belief that "number go up". Thus the exchanges need to attract speculators in order to perform their function.

Thus a permissionless blockchain requires a cryptocurrency to function, and this cryptocurrency requires speculation to function.

My apologies for leaving this out.

Update December 24th: Nicholas Weaver tweeted this:
[Embedded tweets]
Update January 1st: The Economist's Holiday Double Issue features The most powerful people in crypto. It profiles Sam Bankman-Fried of FTX, Changpeng Zhao of Binance, Arthur Hayes of BitMEX, and Brian Armstrong of Coinbase, and includes revealing graphs demonstrating that, in practice, if you use cryptocurrencies you trust these four and their exchanges: they control most of the trading. This once again illustrates the lack of decentralization and trustlessness in cryptocurrencies. The Economist writes:
All four have amassed multi-billion-dollar fortunes, and huge influence, in just a few years. In conventional finance, where money is commonly borrowed, spent or saved, the most powerful intermediaries are bankers, payment firms and asset managers. But private currencies today are mostly used to speculate, which makes exchange bosses, who provide punters with the tools and venues to trade, the kings of a world whose raison d’être, paradoxically, is to do away with mighty middlemen.

End Notes

  1. Ethereum mining adds another 23.7TWh/yr (16.5 to 32 range) for about 6.9MtCO2/yr, according to Kyle McDonald.

    Doubling the carbon footprint to account for embedded emissions would put Bitcoin between Zimbabwe and Thailand. It would put Ethereum between Uruguay and Yemen, but it is likely that this would be an over-estimate, since GPUs are likely to have a somewhat longer economic life.

    [Graphs]
    Note the hockey-stick on these graphs. I wrote:
    In 2017 Facebook and Google changed their capex footprint disclosure practice, resulting in an increase of 7x for Google and 12x for Facebook. It is safe to assume that neither would have done this had they believed the new practice greatly over-estimated the footprint.
    If Google and Facebook are correctly measuring their capex emissions, and if they are representative of miners' capex emissions, cryptocurrencies' carbon footprints are vastly more than double that from their opex emissions alone.
  2. And lobbying. See, for example, the way the climate aspects of "Build Back Better" were crippled so that the plant that is the sole customer of the company that pays Joe Manchin $500K/year could transition to burning Manchin's waste coal to mine cryptocurrency.
  3. Sweden's regulators make this point in an open letter to the EU:
    Sweden needs the renewable energy targeted by crypto-asset producers for the climate transition of our essential services, and increased use by miners threatens our ability to meet the Paris Agreement. Energy-intensive mining of crypto-assets should therefore be prohibited. This is the conclusion of the director generals of both the Swedish Financial Supervisory Authority and the Swedish Environmental Protection Agency.
    And the Norwegians agree.
  4. Makarov and Schoar write:
    We show that the balances held at intermediaries have been steadily increasing since 2014. By the end of 2020 it is equal to 5.5 million bitcoins, roughly one-third of Bitcoin in circulation. In contrast, individual investors collectively control 8.5 million bitcoins by the end of 2020. The individual holdings are still highly concentrated: the top 1000 investors control about 3 million BTC and the top 10,000 investors own around 5 million bitcoins.
  5. Five years after Ethereum Classic became the remainder of the vulnerable currency, the result was:
    from the beginning of March to the beginning of May, the value of Ethereum Classic had shot up by over 1,000 percent. It jumped from about $12 a token to over $130.
  6. David Gerard provides a comprehensive overview of the latest "regulatory clarity" on cryptocurrencies from the international and US government agencies:
    • The Financial Action Task Force issued Updated Guidance for a Risk-Based Approach for Virtual Assets and Virtual Asset Service Providers. Gerard writes:
      The October 2021 revision is to clarify definitions, give guidance on stablecoins, note the issues of peer-to-peer transactions, and clarify the travel rule, which requires VASPs to collect and pass on information about their customers.

      VASPs include crypto exchanges, crypto transfer services, crypto custody and financial services around crypto asset issuance (e.g., ICOs). VASPs must do full Know-Your-Customer (KYC), just like any other financial institution.
      As regards peer-to-peer transactions, Gerard writes:
      Jurisdictions should assess the local risks from peer-to-peer transactions, and possibly adopt optional provisions, such as restricting direct deposit of cryptos with VASPs (paragraphs 105 and 106) — Germany and Switzerland have already considered such rules.
    • The US Office of Foreign Assets Control's Sanctions Compliance Guidance for the Virtual Currency Industry explains that:
      Members of the virtual currency industry are responsible for ensuring that they do not engage, directly or indirectly, in transactions prohibited by OFAC sanctions, such as dealings with blocked persons or property, or engaging in prohibited trade- or investment-related transactions.
      In particular, US miners are required to blacklist wallets suspected of being owned by sanctioned entities. Gerard writes:
      Sanctions are strict liability — you can be held liable even if you didn’t know you were dealing with a sanctioned entity. Penalties can be severe, but OFAC recommends voluntary self-disclosure in case of errors, and this can mitigate penalties. You will be expected to correct the root cause of the violations.
    • The US Financial Crimes Enforcement Network issued Advisory on Ransomware and the Use of the Financial System to Facilitate Ransom Payments. Gerard writes:
      Insurers and “digital forensic and incident response” companies have been getting more directly involved in ransomware payments — even paying out the ransoms. FinCEN expects such companies to: (a) register as money transmitters; (b) stop doing this.

      A lot of ransomware gangs are sanctioned groups or individuals. Payments to them are sanctions violations.
      The Federal Reserve, the FDIC and the OCC have joined the party with a Joint Statement on Crypto-Asset Policy Sprint Initiative and Next Steps. They:
      plan to provide greater clarity on whether certain activities related to crypto-assets conducted by banking organisations are legally permissible, and expectations for safety and soundness, consumer protection, and compliance with existing laws and regulations
  7. In Really stupid “smart contract” bug let hackers steal $31 million in digital coin, Dan Goodin reports that:
    blockchain-analysis company Elliptic said so-called DeFi protocols have lost $12 billion to date due to theft and fraud. Losses in the first roughly 10 months of this year reached $10.5 billion, up from $1.5 billion in 2020.
    That is ~5% of the $237B locked up in DeFi.
  8. But the only significant use of cryptocurrencies is rampant speculation, mostly in an enormous Bitcoin futures market using up to 125x leverage, based on a Bitcoin-Tether market about one-tenth the size, based on a Bitcoin-USD market about one-tenth the size again. The Bitcoin-Tether market is highly concentrated, easily manipulated and rife with pump-and-dump schemes.

    A New Wolf in Town? Pump-and-Dump Manipulation in Cryptocurrency Markets by Anirudh Dhawan and Tālis J. Putniņš finds:
    Combining hand-collected data with audited data from a pump-and-dump aggregator, we identify as many as 355 cases of pump-and-dump manipulation within a period of six months on two cryptocurrency exchanges. Up to 23 million individuals are involved in these manipulations. We estimate that the 355 pumps in our sample are associated with approximately $350 million of trading on the manipulation days, and that manipulators extract profits of approximately $6 million from other participants. In all, 197 distinct cryptocurrencies or “coins” are manipulated, which implies that approximately 15% of all coins in our sample of exchanges are targeted by manipulators at least once in the six-month period. There are, on average, two pumps per day. This rate of manipulation is considerably higher than pump-and-dump manipulation in stock markets in recent decades.
    See also this post on the strange fact that:
    The futures curve for Bitcoin has been permanently upward sloping in Contango pretty much since inception, back in 2017 meaning that the price of the future asset is higher than the spot price of the asset for pretty much 4 years
    ...
    The implication that this arbitrage opportunity persistently exists and is not hammered by investors until it closes, is that there is some form of market dislocation or systemic credit risk that cannot be properly quantified or hedged.
    And on Celsius' offer of 17% interest on BTC loans, which clearly indicates a high degree of risk. Note that:
    Yaron Shalem, the chief financial officer of cryptocurrency lending platform Celsius, was one of the seven people arrested in Tel Aviv this month in connection with Israeli crypto mogul Moshe Hogeg
  9. Transaction fees make Makarov and Schoar's claim that "transacting in cryptocurrencies and storing them are essentially costless" false. The demand for transactions is variable, but the supply is fixed. Pending transactions bid their fees in a blind auction for inclusion in a block. The result is that when no-one wants to transact fees are low and when everyone does they spike enormously.

    [Graph: BTC transaction fees]
    The graph shows that as the Bitcoin "price" spiked to $63K in April, the frenzy drove the average fee per transaction over $60. Users' lack of understanding of transaction fees is illustrated by Jordan Pearson and Jason Koebler's ‘Buy the Constitution’ Aftermath: Everyone Very Mad, Confused, Losing Lots of Money, Fighting, Crying, Etc.:
    The community of crypto investors who tried and failed to buy a copy of the U.S. Constitution last week has descended into chaos as people are realizing today that roughly half of the donors will have the majority of their investment wiped out by cryptocurrency fees.
    Apparently, fees averaged $50/transaction, and the $40M raised paid about $1M in fees. That is 2.5%, very similar to the "extortionate" fees charged by credit card companies that cryptocurrency enthusiasts routinely decry.

    Vitalik Buterin has a proposal that attempts to paper over the fundamental problem of fixed supply and variable demand, as Ruholamin Haqshanas reports in Vitalik Buterin Proposes New EIP to Tackle Ethereum’s Sky-High Gas Fees:
    Vitalik Buterin has put forward a new Ethereum Improvement Proposal (EIP) that aims to tackle the network's gas fee problems by adding a limit on the total transaction calldata, which would, in turn, reduce transaction gas cost.

    Since Ethereum can only process 15 transactions per second, gas fees tend to spike at times of network congestion. On November 9, the average transaction network fee reached USD 62 per transaction. As of now, Ethereum transactions cost around USD 44,
  10. With the Taproot soft fork, explained in WHY YOU SHOULD CARE ABOUT TAPROOT, THE NEXT MAJOR BITCOIN UPGRADE, Bitcoin is making transactions slightly more difficult to trace, but still not offering the anonymity of Monero:
    The Taproot upgrade improves this logic by introducing Merklelized Abstract Syntax Trees (MAST), a structure that ultimately allows Bitcoin to achieve the goal of only revealing the contract's specific spending condition that was used.

    There are two main possibilities for complex Taproot spending: a consensual, mutually-agreed condition; or a fallback, specific condition. For instance, if a multisignature address owned by multiple people wants to spend some funds programmatically, they could set up one spending condition in which all of them agree to spend the funds or fallback states in case they can't reach a consensus.

    If the condition everyone agrees on is used, Taproot allows it to be turned into a single signature. Therefore, the Bitcoin network wouldn't even know there was a contract being used in the first place, significantly increasing the privacy of all of the owners of the multisignature address.

    However, if a mutual consensus isn't reached and one party spends the funds using any of the fallback methods, Taproot only reveals that specific method. As the introduction of P2SH increased the receiver's privacy by making all outputs look identical — just a hash — Taproot will increase the sender's privacy by restricting the amount of information broadcast to the network.

    Even if you don't use complex wallet functionality like multisignature or Lightning, improving their privacy also improves yours, as it makes chain surveillance more difficult and increases the broader Bitcoin network anonymity set.
  11. Whales can't get the face value of their HODL-ings. Last Friday the price crashed 20% in minutes. David Gerard writes:
    Someone sold 1,500 BTC, and that triggered a cascade of sales of burnt margin-traders’ collateral of another 4,000 BTC. The Tether peg broke too.
    That is 0.03% of the stock of BTC. Gerard writes:
    The real story is that the whales — “large institutional trading firms,” ... want (or need) to realise the face value of their bitcoins, and they can’t, because there just aren’t enough actual dollars in the market. This is the same reason miners are keeping a “stockpile” of unsaleable bitcoins, as I’ve noted previously.

    So the whales are going to Goldman Sachs to ask for a loan backed by their unsaleable bitcoins, even though the collateral can’t possibly cover for the value of the loan even if Bitcoin doesn’t crash.
  12. Here is a list of institutions that a real-world user of cryptocurrencies as they actually exist has to trust:
    • The owners and operators of the dominant mining pools not to collude.
    • The operators of the exchanges not to manipulate the markets or to commit fraud.
    • The core developers of the blockchain software not to write bugs.
    • The developers of your wallet software not to write bugs.
    • The developers of the exchanges not to write bugs.
    And, if your cryptocurrency has Ethereum-like "smart contracts":
    • The developers of your "smart contracts" not to write bugs.
    • The owners of the smart contracts to keep their secret key secret.
    Every one of these has examples where trust was misplaced.
  13. In the medium term, Bitcoin and many other cryptocurrencies face two technological threats that might disrupt them and thus provide partial mitigation:
    • Quantum computing. Quantum attacks on Bitcoin, and how to protect against them by Divesh Aggarwal et al describes two threats they pose in principle:
      • They can out-perform existing ASICs at Proof-of-Work, but it is likely to be many years before this threat is real.
      • They can use Shor's algorithm to break the encryption used for cryptocurrency wallets, allowing massive theft. Aggarwal et al track the likely date for this, currently projecting between 2029 and 2044. When it happens there will be an estimated 4.6 million Bitcoins up for grabs.
    • The halvening. At regular intervals Bitcoin's mining rewards are halved, with the goal that the currency eventually become fee-only. Alas, Raphael Auer shows that a fee-only system is insecure.

New Year, New Cycles, New Platform / In the Library, With the Lead Pipe

Update from the In the Library with the Lead Pipe Editorial Board

In 2022, we will be refreshing, upgrading, and relaunching on a new platform and with new cycles for article submission and publication. To get ready for these big changes, we will pause submissions on January 15, 2022 at 11:59pm Hawaii Standard Time. (That means January 16, 2022 at 1:59am Pacific, 4:59am Eastern, and 9:59am Greenwich Mean Time.)

Accepted articles submitted by this deadline will continue to be published on a rolling basis while we move to the new platform.

Beginning in late spring 2022, we will again accept new submissions. Instead of accepting rolling submissions, we’ll move to a more structured cycle of three submission windows a year.

In the Library with the Lead Pipe’s Editorial Board has a longstanding practice of transparently documenting our journal’s procedures and any changes. We’re making these upcoming changes in order to streamline our editorial processes, making them more sustainable for both editors and authors. We’re very excited that the new platform will provide additional features.

Watch this space for updates and news about our relaunch and publication timelines!

Refactoring DLTJ, Winter 2021 Part 2.5: Fixing the Webmentions Cache / Peter Murray

Okay, a half-step backward to fix something I broke yesterday. As I described earlier this year, this static website blog uses the Webmention protocol to notify others when I link to their content and receive notifications from others. Behind the scenes, I’m using the Jekyll plugin called jekyll-webmention_io to integrate Webmention data into my blog’s content. Each time the contents of this site are built, that plug-in contacts the Webmention.IO service to receive its Webmention data. (Webmention.IO holds onto it between Jekyll builds since there is no always-on “dltj.org” server to receive notifications from others.) The plug-in caches that information to ease the burden on the Webmention.IO service.

The previous CloudFormation-based process was using AWS CodeBuild natively, and the Webmention cache was stored in CodeBuild’s caching function. CodeBuild automatically downloads the previous cache into the working directory for each build iteration and then automatically uploads the cache as the build is completed. Handy, right?

Well, AWS Amplify simplifies some of the setup of working with the underlying CodeBuild tool. One of the configuration options that is no longer available is the ability to specify which S3 bucket to use as the CodeBuild cache, so I couldn’t point it at the previous cache files, and all of the previous Webmention entries no longer appeared on the blog pages. Fortunately, I hadn’t decommissioned the CloudFormation stuff, so I still had access to the old cache; I was able to extract the four webmention files (but see below for a discussion about that).

Since Amplify doesn’t allow me to have direct access to the CodeBuild cache, I decided it was high time to use a dedicated cache location for these webmention files. To do that took three steps:

  1. Create the S3 bucket (with no public access)
  2. Add read/write policy for that bucket to the AWS role assigned to the Amplify app
  3. Add lines to the amplify.yml file to copy files from the S3 bucket into and out of the working directory
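
For step 1, here is a minimal sketch of creating the private bucket with boto3 (the bucket name matches the policy below; the console or AWS CLI works just as well):

    import boto3

    s3 = boto3.client("s3")

    # Create the cache bucket; outside us-east-1 you also need a
    # CreateBucketConfiguration with a LocationConstraint.
    s3.create_bucket(Bucket="org.dltj.webmentions-cache")

    # Block every form of public access to the bucket.
    s3.put_public_access_block(
        Bucket="org.dltj.webmentions-cache",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )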

For step 2, the IAM policy for the Amplify role:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "s3:DeleteObject",
            "s3:PutObject",
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": "arn:aws:s3:::org.dltj.webmentions-cache"
        },
        {
          "Sid": "VisualEditor1",
          "Effect": "Allow",
          "Action": [
            "s3:ListAllMyBuckets"
          ],
          "Resource": "*"
        }
      ]
    }

For the amplify.yml file:

    version: 1
    frontend:
      phases:
        preBuild:
          commands:
            - aws s3 cp s3://org.dltj.webmentions-cache webmentions-cache --recursive
            - rvm use $VERSION_RUBY_2_6
            - bundle install --path vendor/bundle
        build:
          commands:
            - rvm use $VERSION_RUBY_2_6
            - bundle exec jekyll build --trace
        postBuild:
          commands:
            - aws s3 cp webmentions-cache s3://org.dltj.webmentions-cache --recursive
      artifacts:
        baseDirectory: _site
        files:
          - '**/*'
      cache:
        paths:
          - 'vendor/**/*'

And the webmentions part of the Jekyll _config.yml file:

    webmentions:
      cache_folder: webmentions-cache

Contents of the AWS CodeBuild Cache File

Can we do a quick sidebar on the AWS CodeBuild caching mechanism? Because I was not expecting what I saw. The CodeBuild cache S3 bucket contains one file with a UUID as its name. That file is a tar-gzip’d archive of a flat directory containing sequentially numbered files (0 through 8099 in my case) and a codebuild.json table of contents:

    {
      "version": "1.0",
      "content": {
        "files": [
          { "path": "vendor/s3deploy.tar.gz", "rel": "src" },
          { "path": "vendor/s3deploy", "rel": "src" },
          { "path": "vendor/LICENSE", "rel": "src" },
          { "path": "vendor/README.md", "rel": "src" },
          { "path": "vendor/webmentions", "rel": "src" },
          { "path": "vendor/webmentions/received.yml", "rel": "src" },
          { "path": "vendor/webmentions/lookups.yml", "rel": "src" },
          { "path": "vendor/webmentions/bad_uris.yml", "rel": "src" },
          { "path": "vendor/webmentions/outgoing.yml", "rel": "src" },
          ...

Each item in the files array corresponded to the numbered filename in the directory. (In the case of the 4th item in the array—a directory—there was no corresponding file in the tar-gzip archive.) Fortunately, the four files I was looking for were near the top of the list and I didn’t have to go hunting through all eight-thousand-some-odd files to find them. (The s3deploy program is one that I found to intelligently copy modified files from the CodeBuild working directory to the S3 static website bucket.)
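
If you find yourself doing the same archaeology, something like this sketch would map the table of contents back to the numbered files (the extracted/ directory name is hypothetical and stands for the unpacked cache archive):

    import json
    import shutil
    from pathlib import Path

    # Hypothetical layout: "extracted/" holds the un-tar-gzip'd cache archive,
    # i.e. codebuild.json plus the sequentially numbered files.
    cache_dir = Path("extracted")
    dest_dir = Path("webmentions-cache")
    dest_dir.mkdir(exist_ok=True)

    manifest = json.loads((cache_dir / "codebuild.json").read_text())

    # Assumes entry N in the "files" array corresponds to the file named N;
    # directory entries simply have no file of their own.
    for index, entry in enumerate(manifest["content"]["files"]):
        path = entry["path"]
        if path.startswith("vendor/webmentions/") and path.endswith(".yml"):
            numbered = cache_dir / str(index)
            if numbered.is_file():
                shutil.copy(numbered, dest_dir / Path(path).name)
                print(f"{index} -> {path}")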

I’m really wondering about the engineering requirements for all of this overhead. Why not just use a native tar-gzip archive without the process of parsing the table of contents and renaming the files?

Coming soon to the public domain in 2022 / John Mark Ockerbloom

One of the blessings of what’s been a rough couple of years is that Public Domain Day is now a routine cause for celebration in the United States. For 20 years up to 2019, very little entered the public domain due to a 20-year copyright extension enacted in 1998. But beginning in 2019, we started getting large numbers of works joining the public domain again, and every year from then on I’ve posted here about what’s about to join the public domain in the new year.

As I did last year, I’m posting to Twitter, making one tweet per day featuring a different work about to enter the public domain in the US, using the #PublicDomainDayCountdown hashtag. Most of these works were originally published in 1926. But this year we’ll also have a large number of sound recordings, published in 1922 and before, joining the public domain for the first time.

Since not everyone reads Twitter, and there’s no guarantee that my tweets will always be accessible on that site, I’ll reproduce them here. (This post will be updated to include new tweets as they appear leading up to 2022, and may be further updated later on to link to copies of some of the featured works, or for other reasons.) The tweet links have been reformatted for the blog, and a few tweets have been recombined or otherwise edited.

If you’d like to comment yourself on any of the works mentioned here, or suggest others I can feature, feel free to reply here or on Twitter. (My account there is @JMarkOckerbloom. You’ll also find some other people tweeting on the #PublicDomainDayCountdown hashtag, and you’re welcome to join in as well.)

I’m not the only one doing a public domain preview like this. The Public Domain Review has an Advent-calendar style countdown going as well through the month of December, complete with artwork and information about the featured works longer than what will fit in a single tweet. (They’re featuring works entering the public domain in other countries as well as in the US.) A few other organizations also often publish posts about what’s coming to the public domain, and I may add links to some of their posts as they appear.

Here’s my countdown for 2022:

October 15: A.A. Milne and E. H. Shepard’s book Winnie-the-Pooh was published this week in 1926. It is one of the best-known works set to join the US public domain in 78 days. Follow #PublicDomainDayCountdown to hear about many others from now till January.

October 16: I never read Agatha Christie’s Murder of Roger Ackroyd after reading a spoiler about who infamously did it. But when it joins the US public domain in 77 days, anyone could write new variants, and those could well keep me guessing. #PublicDomainDayCountdown

October 17: “Have had ample time for serious thought and it is my ambition to follow up on my art,” resolved Will James on his release from prison. His Newbery-winning first novel Smoky the Cowhorse joins the public domain in 76 days. #PublicDomainDayCountdown

October 18: Harry Woods’s “When the Red, Red Robin Comes Bob, Bob, Bobbin’ Along” has been performed memorably by many artists (including Al Jolson, Lillian Roth, Doris Day, & the Nields) over the last 95 years. It bobs along to the public domain in 75 days. #PublicDomainDayCountdown

October 19: Don Juan, the first Vitaphone feature film, premiered in 1926, with sound effects and music synchronized to the visuals. It raised expectations of the prospect of talking pictures: The film joins the public domain in 74 days. #PublicDomainDayCountdown

October 20: Giacomo Puccini’s Turandot, finished by Franco Alfano with libretto by Giuseppe Adami & Renato Simoni, has been an operatic staple since its debut. (See e.g. the Met’s current production.) It joins the public domain in 73 days. #PublicDomainDayCountdown (Thanks to @abzeronow for suggesting this work, along with its famous aria “Nessun Dorma” that will join the public domain at the new year with the rest of the opera. Further suggestions for my #PublicDomainDayCountdown, which will continue to the start of 2022, are welcome!)

October 21: This coming Public Domain Day is extra special. In 72 days, every sound recording published from the invention of records through 1922 joins the US public domain. Like these recordings of Enrico Caruso. #PublicDomainDayCountdown

October 22: With a multiracial, multigenerational cast of characters, Edna Ferber’s novel Show Boat topped weekly bestseller lists in 1926, and was adapted for radio, films, and an even more famous Broadway musical. The book’s public domain opening is in 71 days. #PublicDomainDayCountdown

October 23: Sometimes uncertainty over a US copyright’s start makes it hard to tell when it ends. I had Hitchcock’s The Pleasure Garden in last year’s #PublicDomainDayCountdown, but that may have been premature. Its 1926 screenings clarify its PD status in 70 days.

October 24: “Among the biographers I am a first-rate poet,” said Carl Sandburg. Many reviewers found his biography of Abraham Lincoln’s early life, The Prairie Years, first-rate in both poetic style and its panoply of facts. It joins the public domain in 69 days. #PublicDomainDayCountdown

October 25: It’s 68 days more till issues of the Journal of Biological Chemistry as old as November 1925 go public domain. But this year, the entire run was made open access. Why wait to share your research freely with the world? #PublicDomainDayCountdown #OAWeek

October 26: The Academy of American Poets has an illuminating profile of The Weary Blues, Langston Hughes’s first book of poetry, which includes often-anthologized poems on African American life. The book joins the public domain in 67 days. #PublicDomainDayCountdown

October 27: “There must be more money!” whispers throughout D. H. Lawrence’s short story “The Rocking-Horse Winner”, first published in the July 1926 Harper’s Bazar. When it joins the public domain in 66 days, the need for money to copy or adapt it will finally end. #PublicDomainDayCountdown

October 28: Some years back, my wife and I (both fans of Dorothy L. Sayers’s Lord Peter Wimsey and Harriet Vane) put online Whose Body?, her first Wimsey novel. We’re looking forward to her second, Clouds of Witness, joining it in the public domain in 65 days. #PublicDomainDayCountdown (Some copyright nerdery: I found no © renewal for Clouds, but its 1926 British publication preceded the 1st US edition, which came out in 1927, by more than 30 days, making its copyright likely revived by URAA. I’m assuming that 1926 publication marks the restored term start.)

October 29: Donna Scanlon reviews Lord Dunsany’s The Charwoman’s Shadow, a classic fantasy novel, set in “a mythical medieval Spain”, that joins the public domain in 64 days. (Thanks to @MrBeamJockey for suggesting this one!) #PublicDomainDayCountdown

October 30: Mari Ness describes Lucy Maud Montgomery’s The Blue Castle as a book about “a Sleeping Beauty trapped in Canada”, and an escape from her popular but constraining Anne of Green Gables books. It joins the public domain in 63 days. #PublicDomainDayCountdown

October 31: Anne M. Pillsworth and Ruthanna Emrys discuss Abraham Merritt’s creepy tale “The Woman of the Wood”, which appeared in the August 1926 issue of Weird Tales, and which reaches the public domain in 62 days. (Beware spoilers!) #PublicDomainDayCountdown

November 1: Like many academics, church historian F. J. Foakes-Jackson aimed to publish on a grander scale than he managed, but still had an impressive career output. His book on the life of St. Paul joins the public domain in 61 days. #PublicDomainDayCountdown

November 2: Arthur Conan Doyle’s spiritualist interest was increasingly visible in 1926. While it didn’t figure in his 4 1926 Sherlock Holmes stories, it did in his novel The Land of Mist, and his 2-volume history of spiritualism. All go public domain in 60 days. #PublicDomainDayCountdown

November 3: Infantilizing romantic songs have largely fallen out of favor, but “Baby Face” retains staying power by also being singable, in excerpted form, to actual babies. In 59 days, it will be more freely adaptable in that and other ways. #PublicDomainDayCountdown

November 4: Raped by a traveling preacher, then cast out by her family and town, the protagonist of The Unknown Goddess avoids ruination, spurns redemption by marriage, and becomes a healer. Ruth Cross’s 1926 novel, now hard to find, goes public domain in 58 days. #PublicDomainDayCountdown

November 5: With Seneca and white ancestry, Arthur C. Parker (Gawaso Wanneh) wrote to promote mutual cultural understanding. His collection of folklore for children, Skunny Wundy and other Indian Tales, joins the public domain in 57 days. #PublicDomainDayCountdown

November 6: The 1926 musical Oh, Kay! had a book by Guy Bolton and P.G. Wodehouse, and songs by George and Ira Gershwin, including the enduring standard “Someone to Watch Over Me”. It joins the public domain in 8 weeks. #PublicDomainDayCountdown

November 7: Copies of the book The Great Gatsby, new to the public domain this year, can easily be found and read. Its first film adaptation is not so fortunate. Its copyright still has 55 days left, but almost all of it is now deemed lost. #PublicDomainDayCountdown

November 8: Margaret Evans Price was a writer, artist, and toy designer, one of the co-founders of Fisher-Price Toys in 1930. She both wrote and illustrated Enchantment Tales for Children, which joins the public domain in 54 days. #PublicDomainDayCountdown #WomensArt

November 9: 53 days remain on the copyright of George Clason’s 1926 personal finance guide The Richest Man in Babylon, & the main Dow Jones average is over 200 times what it was in 1926. Well-invested early royalties usually way outearn new royalties 95 years out. #PublicDomainDayCountdown

November 10: Dorothy Parker was famous in the 1920s for sardonic verse and commentary in magazines like Vanity Fair and The New Yorker. Her first book of collected verse, Enough Rope, joins the public domain in 52 days. So do the 1926 issues of those two magazines. #PublicDomainDayCountdown

November 11: After World War I ends, three veterans & a war widow come home, and they find their relationships with those they return to permanently changed. Soldiers’ Pay, William Faulkner’s debut novel, joins the public domain in 51 days. #PublicDomainDayCountdown

November 12: The Stratemeyer syndicate released many series books in 1926, including debuts of the X Bar X Boys & Bomba the Jungle Boy. Many didn’t age well in their original form, but in 50 days the public domain could spur creative reboots. #PublicDomainDayCountdown

November 13: It’s best known now as an Elvis song, but “Are You Lonesome To-night?” has been sung and recorded by many singers since Lou Handman & Roy Turk wrote it in 1926. It won’t be lonesome in the public domain, which it joins in 49 days. #PublicDomainDayCountdown

November 14: Seventy Negro Spirituals was William Arms Fisher’s response to Antonín Dvořák’s call for American “serious” (& mostly white) composers & musicologists to respect and draw on Black music. Still referred to today, it joins the public domain in 48 days. #PublicDomainDayCountdown

November 15: “There is pleasure in philosophy,” says Will Durant at the start of The Story of Philosophy, which has made western philosophy pleasurable & accessible to a general audience for decades. It joins the public domain in 47 days. #PublicDomainDayCountdown

November 16: An attractive young cat burglar gets out of prison, but she can’t escape getting caught up in international intrigue. No, it’s not the latest streaming blockbuster; it’s Baroness Orczy’s The Celestial City, coming to the public domain in 46 days. #PublicDomainDayCountdown

November 17: Would HG Wells or Hilaire Belloc have bothered to renew copyrights on their extended flame war in print over The Outline of History? Wells’ heirs renewed his side; GATT in effect renewed Belloc’s. They finally expire in 45 days. #PublicDomainDayCountdown (Note: As in general with my other posts under this hashtag, I’m referring to US copyrights here. In Britain, where Wells and Belloc published their argument, Wells’s copyrights expired a few years ago, while Belloc’s will last a couple more years after this January 1.)

November 18: When Richard Scarry was growing up, Helen Cowles LeCron and Maurice Day published an illustrated children’s book which, like his, featured animals in human dress behaving badly and well. Their Animal Etiquette Book joins the public domain in 44 days. #PublicDomainDayCountdown

November 19: Franz Kafka died before finishing Das Schloss (The Castle), a novel in which “K” is frustrated dealing with an inflexible, arbitrary & uncaring governing system. Its first published edition joins the US public domain in 43 days. #PublicDomainDayCountdown

November 20: In 1926, Sinclair Lewis saw the film adaptation of his Canada-set novel Mantrap, then told the audience he was glad he’d read the book because he didn’t recognize it from the movie. They’ll both become comparable in the public domain in 42 days. #PublicDomainDayCountdown

November 21: Enthusiastic about astronomy, photography, nature, and bibliography, Florence Armstrong Grondal combined many of her passions in The Music of the Spheres: A Nature Lover’s Astronomy. Her book rises over the public domain horizon in 41 days. #PublicDomainDayCountdown

November 22: “As a narrative of war and adventure it is unsurpassable,” Winston Churchill said of Seven Pillars of Wisdom, T. E. Lawrence’s classic personal account of the Arab revolt against the Ottoman Empire. Its first edition joins the public domain in 40 days. #PublicDomainDayCountdown

November 23: There’s often extra drama in a Yale-Harvard football game. The one in Brown of Harvard includes John Wayne, Grady Sutton, & Robert Livingston, all uncredited, in their screen debuts. The film joins the public domain in 39 days. #PublicDomainDayCountdown

November 24: The US copyright status of Felix Salten’s 1923 book Bambi has long been controversial. (See e.g. William Patry’s post on the Twin Books case.) The arguments will be moot in 38 days, though, when its 1926 US registration expires. #PublicDomainDayCountdown

November 25: Big dinner parties can be stressful for many concerned about doing them Correctly. Isabel Cotton Smith’s Blue Book of Cookery and Manual of House Management, introduced by Emily Post, was meant for such concerns. It joins the public domain in 37 days. #PublicDomainDayCountdown

November 26: My high school library displays a larger than life painting of its namesake Katharine Brush, a writer who had much fame and fortune from the 1920s through the 1940s. Her first novel, Glitter, joins the public domain in 36 days. #PublicDomainDayCountdown

November 27: Seán O’Casey completed his Dublin trilogy with The Plough and the Stars, a play whose unsentimental portrayal of the Easter Rising prompted riots at an early performance. It joins the US public domain in 35 days. #PublicDomainDayCountdown

November 28: Christopher Vecsey profiles liberal Protestant minister Harry Emerson Fosdick, focusing on Adventurous Religion and Other Essays, which he says “illuminates his faith best of all.” That book joins the public domain in 34 days. #PublicDomainDayCountdown

November 29: Willa Cather’s enigmatic novel My Mortal Enemy makes some wonder if it’s based on her personal life or inner circle. (Here’s Charles Johanningsmeier’s take.) It becomes public domain in 33 days, enabling wider analysis. #PublicDomainDayCountdown

November 30: Giving workers not just wages, but also a stake in their company, has a long history. Profit Sharing and Stock Ownership for Employees by Gorton James et al. is a detailed study of its use and consideration up to 1926. It goes public domain in 32 days. #PublicDomainDayCountdown

December 1: Topper: An Improbable Adventure relates the misadventures of a mild-mannered banker who starts seeing ghosts. Thorne Smith’s novel, which spawned sequels, films, and radio and TV series, levitates to the public domain in 31 days. #PublicDomainDayCountdown

December 2: S. Ansky’s The Dybbuk, based on Jewish folklore, is one of the best-known Yiddish plays, still frequently staged: The first English version, translated by Henry G. Alsberg and Winifred Katzin, joins the public domain in 30 days. #PublicDomainDayCountdown

December 3: Sherwood Anderson’s 1926 magazine pieces include the 1st version of one of his most famous stories “Death in the Woods”, as well as others that went into his book Tar: A Midwest Childhood. They join the public domain in 29 days. #PublicDomainDayCountdown

December 4: An Englishwoman moves to the country to escape her relatives, and takes up witchcraft. Robert McCrum in 2014 counted Sylvia Townsend Warner’s Lolly Willowes among the “100 best novels”. It joins the US public domain in 4 weeks. #PublicDomainDayCountdown

December 5: In 1926, US Baptist churches were divided not only between north and south, but also in music styles. William H. Main and I. J. Van Ness’s 1926 New Baptist Hymnal, meant to bring them together, becomes public domain in 27 days. #PublicDomainDayCountdown

December 6: The 1926 musical The Girl Friend boosted the careers of writers Richard Rodgers, Lorenz Hart, & Herbert Fields, on their way to being Broadway legends, and introduced the song “Blue Room”. It joins the public domain in 26 days. #PublicDomainDayCountdown

December 7: Charles E. King did much to raise awareness of Hawaiian music in the rest of the world. The 6th edition of his Hawaiian Melodies, which includes “Ke Kali Nei Au” (aka the Hawaiian Wedding Song) becomes public domain in 25 days. #PublicDomainDayCountdown

December 8: Thanks to @OnlineCrsLady for suggesting a #PublicDomainDayCountdown book that was itself built from the public domain of the time: Muriel St. Clare Byrne edited The Elizabethan Zoo, a collection drawn from early 17th century books, and published it in 1926. Its US copyright ends in 24 days.

December 9: Joseph Conrad died in 1924, but some of his posthumously published work remains copyrighted for 23 more days. That includes the collection Last Essays, edited by Richard Curle, which has a reprint of his Congo diaries behind his Heart of Darkness. #PublicDomainDayCountdown

December 10: Readers keep rediscovering Hope Mirrlees’s unconventional fantasy Lud-in-the-Mist. First published in 1926, republished in 1970 by Lin Carter, and more recently brought back to wide attention by Neil Gaiman, it’ll reach the US public domain in 22 days. #PublicDomainDayCountdown (Copyright nerdery: The only US Ⓒ registration I can find for Lud-in-the-Mist is dated 1927, and that registration is unrenewed. I’m assuming GATT restored its copyright based on its prior 1926 UK publication, and that it will expire at the same time as other 1926 copyrights.)

December 11: In 3 weeks, Moana (not the Disney movie, but a feature film by Robert J. Flaherty that was the first to be called a “documentary”) will join the public domain. Wikipedia describes some of its innovations and fictionalizations. #PublicDomainDayCountdown See also this longer profile of the movie by Nathanael Hood, including more on a restored version with added sound made by the original director’s daughter Monica Flaherty.

December 12: “The Birth of the Blues” by Ray Henderson, B. G. DeSylva, & Lew Brown debuted in George White’s Scandals of 1926, but has been recorded many times since, including in the 1941 Bing Crosby movie of the same title. It joins the public domain in 20 days. #PublicDomainDayCountdown

December 13: The University of Virginia Library has the papers of Rosalie Caden Evans, a prominent figure in land disputes in Mexico who was killed in 1924. A book of her letters, published by her sister, joins the public domain in 19 days. #PublicDomainDayCountdown The copyright status of her other letters is trickier to determine. Any not published before 2003 are in the public domain now. Any that were first published before then, but after 1926, such as in this book, may remain copyrighted for years to come.

December 14: Not just a wisecracker, WC Fields was an accomplished juggler and physical comedian. Steve Massa calls So’s Your Old Man “the best” of Fields’ surviving silent films. This #NatFilmRegistry film joins the public domain in 18 days. #PublicDomainDayCountdown Another 1926 film was added to the #NatFilmRegistry today: The Flying Ace. As far as I can tell, this one’s already public domain; I’m not finding a copyright renewal for it. (Lots of 1926 African American works are public domain due to nonrenewal.)

December 15: C. K. Scott Moncrieff, the first translator of Proust’s Remembrance of Things Past (as he titled it), also did what’s probably the most-read English translation of Stendhal’s classic novel The Red and the Black. It joins the public domain in 17 days. #PublicDomainDayCountdown

December 16: Arturo Toscanini was already a world-renowned conductor when he made his first orchestral recordings for Victor in 1920 and 1921. Peter Gutmann calls them still “highly listenable today”: They join the public domain in 16 days. #PublicDomainDayCountdown

December 17: When Georgette Heyer started on a sequel to her first novel The Black Moth, she realized she could do better, and reworked the characters to produce These Old Shades. It was a career-making bestseller, and in 15 days it’ll be public domain in the US. #PublicDomainDayCountdown

December 18: Written for Betsy, a musical few now remember, Irving Berlin’s “Blue Skies” has since been featured in The Jazz Singer, White Christmas, and more than one Star Trek production. It’s nothing but public domain in 2 weeks. #PublicDomainDayCountdown

December 19: The Poetry Foundation profiles Hart Crane’s too-short poetic career, which has garnered lasting interest in modernist, Romantic, and queer circles. His first collection, White Buildings, joins the public domain in 13 days. #PublicDomainDayCountdown

December 20: Enid Blyton’s children’s books have been reissued and revised many times, but older editions are starting to join the public domain in the US. The 1st edition of her Book of Brownies, illustrated by Ernest Aris, does in 12 days. #PublicDomainDayCountdown

December 21: In 1926, not only was Pluto undiscovered, but astronomers like Harlow Shapley had only just come around to the idea of other galaxies. You can see how far astronomy’s come since, when his popular science book Starlight goes public domain in 11 days. #PublicDomainDayCountdown

December 22: American amateur detective Philo Vance debuts in The Benson Murder Case, based loosely on a real-life case. The novel, by S. S. Van Dine (pen name of art critic Willard Huntington Wright), debuts in the public domain in 10 days. #PublicDomainDayCountdown

December 23: “Crazy Blues”, recorded by Mamie Smith and Her Jazz Hounds in 1920, was a smash hit that persuaded big record companies to promote Black singers and genres. It, and other pre-1923 blues records, join the public domain in 9 days. #PublicDomainDayCountdown

December 24: English Christmas carol fans will appreciate William Adair Pickard-Cambridge’s 1926 Collection of Dorset Carols, which first published “Shepherds Arise” and popularized “Angels We Have Heard on High” in English. It joins the US public domain in 8 days. #PublicDomainDayCountdown

December 25: Barbara Dyer writes of her great-aunt Mary Christmas, an immigrant from Syria: Fictionalized, she was the title character of a book by Mary Ellen Chase described here. It joins the public domain in 7 days. #PublicDomainDayCountdown

December 26: Greta Garbo began her film career in Sweden, but became a movie star in the US in 1926. Her first US film, Torrent, looks to be in the public domain for lack of a copyright renewal. Her second, The Temptress, joins the public domain in 6 days. #PublicDomainDayCountdown

December 27:
for life’s not a paragraph
And death i think is no parenthesis
– from E. E. Cummings’ poem “since feeling is first”, in his poetry collection is 5, which today is 5 days from the public domain. More on his work here. #PublicDomainDayCountdown

December 28: One feature of the US’s older publication-based copyright terms is that book texts & illustrations usually go public domain at the same time. So we get Olive Miller’s and Maud and Miska Petersham’s work in Tales Told in Holland all at once in 4 days. #PublicDomainDayCountdown

December 29: Fritzi Kramer writes on The Winning of Barbara Worth, a silent western featuring Gary Cooper’s first major role, and a memorable climactic flood sequence. In 3 days it’ll be in a flood of arrivals to the public domain from 1926. #PublicDomainDayCountdown

December 30: If you’re celebrating on New Year’s Eve, consider playing the Peerless Quartet’s “Auld Lang Syne” just after midnight. This recording will have just entered the public domain, along with an estimated 400,000 more pre-1923 records. #PublicDomainDayCountdown

December 31: Joan Didion said Ernest Hemingway “changed the rhythms of the way both his own and the next few generations would speak and write and think.” His first full-length novel The Sun Also Rises will be ours in the public domain when the sun rises tomorrow. #PublicDomainDayCountdown

DMCA To The Rescue! / David Rosenthal

In NFTs and Web Archiving I pointed out that the blockchain data representing an NFT of an image such as this CryptoPunk is typically a link to a Web URL containing metadata that includes a link to the Web URL of the image. That post was about one of the problems this indirect connection poses, that since both the metadata and the data are just ordinary Web URLs they are subject to "link rot"; they may change or vanish at any time for a wide variety of reasons.
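
In code terms the indirection looks roughly like this (a sketch with a made-up token URI; "image" is the conventional field name in NFT metadata):

    import json
    from urllib.request import urlopen

    # Hypothetical metadata URL of the kind an NFT's tokenURI points at.
    token_uri = "https://example.com/metadata/1234.json"

    # Hop 1: the blockchain stores only this URL; the metadata lives on the Web.
    metadata = json.load(urlopen(token_uri))

    # Hop 2: the metadata in turn links to the image, another ordinary Web URL.
    image_url = metadata["image"]
    print(image_url)  # if either URL rots, the NFT points at nothing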

I Confess To Right-Clicker-Mentality discusses another of the problems this indirect connection causes, namely that trying to create "ownership", artificial scarcity, of an image represented by a Web URL is futile. Anyone can create their own copy from the URL. Miscreants are now exploiting en masse the inverse of this. Because art images on the Web are URLs, and thus easy to copy, anyone can make a copy of one and create an NFT for it. No "ownership" of the image needed. Liam Sharp suffered this way:
Yet another externality of cryptocurrencies!

Follow me below the fold for an explanation of how the DMCA was used to fix the problem.

The Digital Millennium Copyright Act (DMCA) was passed in 1998. It mandates a "notice and takedown" process for copyrighted content online, which has ever since been problematic:
First, fair use has been a legal gray area, and subject to opposing interpretations. This has caused inequity in the treatment of individual cases. Second, the DMCA has often been invoked overbearingly, favoring larger copyright holders over smaller ones. This has caused accidental takedowns of legitimate content, such as a record company accidentally removing a music video from their own artist. Third, the lack of consequences for perjury in claims encourages censorship. This has caused temporary takedowns of legitimate content that can be financially damaging to the legitimate copyright holder, who has no recourse for reimbursement. This has been used by businesses to censor competition.
2016's Notice and Takedown in Everyday Practice by Jennifer M. Urban, Joe Karaganis & Brianna Schofield reported on the problems:
It has been nearly twenty years since section 512 of the Digital Millennium Copyright Act established the so-called notice and takedown process. Despite its importance to copyright holders, online service providers, and Internet speakers, very little empirical research has been done on how effective section 512 is for addressing copyright infringement, spurring online service provider development, or providing due process for notice targets.
...
The findings suggest that whether notice and takedown “works” is highly dependent on who is using it and how it is practiced, though all respondents agreed that the Section 512 safe harbors remain fundamental to the online ecosystem. Perhaps surprisingly in light of large-scale online infringement, a large portion of OSPs still receive relatively few notices and process them by hand. For some major players, however, the scale of online infringement has led to automated, “bot”-based systems that leave little room for human review or discretion, and in a few cases notice and takedown has been abandoned in favor of techniques such as content filtering. The second and third studies revealed surprisingly high percentages of notices of questionable validity, with mistakes made by both “bots” and humans.
Now, via one of David Gerard's invaluable news posts, we find this innovative solution to Liam Sharp's problem.

The leading site for creating and trading NFTs is OpenSea. Because they are focused on making a quick buck from the muppets, not the long-term future of their NFTs, they store the images in Google's cloud. This provides victims like Sharp with an opportunity. Because major corporations like Google operate an automated DMCA takedown mechanism, victims can send Google a takedown notice. The result is that the fraudulent NFT ends up pointing to "404 page not found". @CoinersTakingLs explains how.

Github Action setup-ruby needs to quote ‘3.0’ or will end up with ruby 3.1 / Jonathan Rochkind

You may be running builds in Github Actions using the setup-ruby action to install a chosen version of ruby, looking something like this:

    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: 3.0   # unquoted: YAML parses this as a number

A week ago, that would have installed the latest ruby 3.0.x. But as of the Christmas release of ruby 3.1, it will install the latest ruby 3.1.x.

The workaround and/or correction is to quote the ruby version number. If you actually want the latest ruby 3.0.x, say:

      with:
        ruby-version: '3.0'   # quoted: stays the string "3.0"

This is reported here, with reference to this issue on the Github Actions runner itself. It is not clear to me that this is any kind of bug in the Github Actions runner, rather than just an unanticipated consequence of using a numeric value in YAML here. 3.0 is, of course, the same number as 3; it’s not obvious to me that it’s a bug for the YAML parser to treat them as such.
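
To make the difference concrete, here is a small side-by-side sketch of my own (not taken from the linked reports); the exact stringification happens inside the Actions runner, but the upshot is:

    # Unquoted: YAML reads 3.0 as a number, the runner passes it to the
    # action as the string "3", and setup-ruby resolves "3" to the newest
    # 3.x release, which is now 3.1.x.
    ruby-version: 3.0

    # Quoted: the value stays the literal string "3.0", and setup-ruby
    # installs the newest 3.0.x release.
    ruby-version: '3.0'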

Perhaps it’s a bug or mis-design in the setup-ruby action. But in the absence of any developers deciding it’s a bug, quote your 3.0 version number, or perhaps just quote all ruby version numbers with the setup-ruby task.

If your 3.0 builds started failing and you have no idea why, this could be it. It can be a bit confusing to diagnose, because I’m not sure anything in the Github Actions output normally echoes the ruby version in use. I guess there’s a clue in the “Installing Bundler” sub-head of the “Setup Ruby” task.
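
One way to remove the guesswork is to add a step that echoes the version explicitly (my addition, not from the original post):

    - name: Show ruby version
      run: ruby -v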

Of course it’s possible your build will succeed anyway on ruby 3.1 even if you meant to run it on ruby 3.0! Mine failed with LoadError: cannot load such file -- net/smtp, so if yours happened to do the same, maybe you got here from Google. :) (net/smtp was moved from a default gem to a bundled gem in ruby 3.1; I’m not dealing with this further because I wasn’t intentionally supporting ruby 3.1 yet.)

Note that if you are building with a Github Actions matrix of ruby versions, the same issue applies. Maybe something like:

    strategy:
      matrix:
        include:
          - ruby: '3.0'

    steps:
      - uses: actions/checkout@v2

      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
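
For completeness, a minimal job sketch of my own (not from the original post; the job name, version list, and rake task are illustrative) with every version quoted might look like:

    jobs:
      test:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            ruby: ['2.7', '3.0']        # quoted, so '3.0' stays "3.0"
        steps:
          - uses: actions/checkout@v2

          - name: Set up Ruby ${{ matrix.ruby }}
            uses: ruby/setup-ruby@v1
            with:
              ruby-version: ${{ matrix.ruby }}
              bundler-cache: true       # optional: runs 'bundle install' and caches gems

          - run: bundle exec rake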