August 20, 2025

Harvard Library Innovation Lab

Live and Let Die: Rethinking Personal Digital Archiving, Memory, and Forgetting Through a Library Lens

[Visual: a collage of overlapping images and textures, including a close-up of a human eye, hands handling archival materials, a person sitting in water, vintage books, a computer file folder icon, flowers, clouds, and a grid sketch of a building. The word "More" repeats along the left, and the years 2004 to 2022 are listed vertically in the center, in muted tones with pastel highlights of blue, purple, yellow, and olive green.]

In today’s world, each moment generates a digital trace. Between the photos we take, the texts we send, and the troves of cloud-stored documents, we create and accumulate more digital matter each day. As individuals, we hold immense archives on our personal devices, and yet we rarely pause to ask: What of this is worth keeping? And for how long? These traces quietly accumulate in the digital margins of our daily routines. Almost always, we intend to return to them later. Almost never do we.

Libraries do not collect and store everything indiscriminately. They are bastions of selection, context, and care. So why don’t we do the same when managing our personal digital archives? How can library principles inform personal archiving practices when memory becomes too cheap, too easy, and too abundant to manage? What does meaningful digital curation look like in an age of “infinite” storage and imperfect memory? How might we better navigate the tension between memory and forgetting in the digital age? At LIL, we’re interested in holding space for these tensions and exploring the kinds of tools and frameworks that help communities navigate these questions with nuance, care, and creativity. We explored what it could look like to give individuals new kinds of tools and frameworks that support a more intentional relationship with their digital traces. What emerged is less a single solution and more a provocation about curation, temporality, and what it means to invite forgetting as part of designing for memory.

This blog post sketches some of our ideas and questions informed by the work of archivists, librarians, researchers, coders, and artists alike. It is an invitation to rethink what it means to curate the digital residue of our everyday lives. Everyone, even those outside of libraries, archives, and museums (LAMs), should engage in memory work with their own personal digital archives. How might we help people rigorously think through the nature of digital curation, even if they aren’t already thinking of themselves as archivists or librarians of their personal collections? We hope what follows offers a glimpse into our thinking-in-progress and sparks broader conversation about what communities and individuals should do with the sprawling, often incoherent archives our digital lives leave behind.

Our premise: overaccumulation and underconsideration

We live in a time of radical abundance when it comes to digital storage. Cloud platforms promise virtually unlimited space. A single smartphone can hold thousands of photos. Machines never forget (at least, not by default), and so we hold on to everything “just in case,” unsure when or why we might need it. Often, we believe we are preserving our emails, messages, and files simply because we are not deleting them.

But this archive is oddly inhospitable. It’s difficult to find things we didn’t intentionally label or remember exist. Search is great for pinpointing known items like names or keywords, but it struggles with the forgotten: buried among our folders and data dumps are materials we never deliberately cataloged for the long term (like the screenshots in our photos apps). One distinction that emerged in our work is the difference between long-term access and discovery, or searchability. You might have full-text search over an inbox or drive, but without a memory of what you’re looking for or why it mattered, the material won’t surface. And even when content does resurface through algorithmic recommendation, it often lacks appropriate context.

And so, we are both overwhelmed and forgetful. We save too much, but know too little about what we’ve saved. Digital infrastructure has trained many of us to believe that “saving” is synonymous with “remembering,” but this is a design fiction. People commonly assume that “they can keep track of everything,” “they can recognize the good stuff,” and most of all, “they’re going to remember what they have!” But in practice, these assumptions rarely hold true. The more we accumulate, the less we can truly remember. Not because the memories aren’t saved, but because they are fundamentally disconnected from context.

A library lens on everyday personal digital archives

“Not everything that is dead is meant to be alive.”

When it comes to our digital lives, we often feel pressure to rescue every bit of data from entropy. But what if some data is just refuse, never meant to be remembered? In libraries and archives, we don’t retain every book, document, or scrap of marginalia. We acquire with purpose, weed our collections with care, organize them thoughtfully, and provide access with users in mind. Digitally, this process is much harder to implement because of the sheer volume of material: everything is captured, whether texts, searches, or half-finished notes. Some of it may be precious, some useful, and some exploitable.

The challenge is thus cultural as much as technical. What deserves preservation? Whose job is it to decide? And how can we create tools that align with people’s values, rather than simply saving everything? Libraries and archives are built on principles of deliberate acquisition, thoughtful organization, and selective retention. What if we followed those same principles in our personal digital ecosystems? Can we apply principles like curation, appraisal, and mindful stewardship from library science to personal digital archives? What if, instead of saving everything permanently by default, we adopted a mode of selective preservation rooted in intention, context, and care?

Integral to memory work is appraisal, deciding what is worth keeping. In archival theory, this is a complex, value-laden practice. As the Society of American Archivists (SAA) notes, archivists sometimes use the term “enduring value” rather than “permanent value” with intention, signaling that value may persist for a long time, but not necessarily forever. Notions of “enduring value” can shift over time and vary in different communities.

On forgetting (and why it’s valuable)

In digital systems, forgetting often has to be engineered. Systems are designed to store and resurface, not to decay. But decay, entropy, and obsolescence are part of the natural order of memory. If we accept that not everything needs to be held forever, we move into the realm of intentional digital gardening.

“What if forever isn’t the goal? What’s the appropriate level of preservation for a given context?”

Preservation need not be permanent in all cases. It can be revisited, adjusted, revised with time as people, contexts, and values change. Our tools should reflect that. What if temporary preservation was the more appropriate goal? What if the idea of a time capsule was not just about novelty and re-surfacing memory, but instead core to a practice of sustainable personal archiving, where materials are sealed for a time, viewed in context, then allowed to disappear?

“The memory needs to be preserved, not necessarily the artifact.”

There’s a growing recognition in library and archival science that resurfacing content too easily, and out of context, can be damaging, especially in an era where AI searches can retrieve texts without context. Personal curation tools should assist in the caretaking of memory, not replace it with AI. Too often, we see narratives that frame technology as a substitute for curation. “Don’t worry about organizing,” we’re told, “We’ll resurface what you’ll want to remember.” But this erases the intentionality fundamental to memory-making. Sometimes, forgetting protects. Sometimes, remembering requires stewardship, not just storage.

Designing for memory: limits as creative force

Designing for memory is ultimately a human-centered challenge. Limitations can be a tool, not a hindrance, and constraints can cultivate new values, behaviors, and practices that prioritize deliberate choice and intentional engagement.

Imagine creating a digital time capsule designed for memory re-encountering, temporality, and impermanence. You can only choose 10 personal items to encapsulate for future reflection. What would you choose? What story would those items tell? Would they speak to your accomplishments? Your values? Your curiosities? Would they evoke joy or loss?

Capsules could be shaped around reflective prompts to aid selection and curation.

Engaging in reflection like this can help individuals perform the difficult and deeply human work of curating a personal digital archive without being overwhelmed by the totality of their digital footprint. Making this kind of digital housekeeping part of an established maintenance routine (like spring cleaning) turns memory work into an intentional, active process that encourages curation and self-reflection, and aids in choosing what not to keep. It is memory with intention.

Memory craft: a call to action

In every era, humans have sought ways to preserve what’s vital, and let the nonessential fall away. In our current digital context, that task has become harder, not because of lack of space, but because of lack of frameworks. Your life doesn’t have to be backed up in its entirety. It only needs to be honored in its essentials. Sometimes, that means creating a space in which to remember. Sometimes, that means creating a ritual in which to let go.

At the Library Innovation Lab, we are continuing to explore what it means to help people preserve with intention. Becoming active memory stewards means moving beyond default accumulation and choosing with care and creativity what stories and traces to carry forward. We want to make memory, not just data, something people can shape and steward over time. Not everything needs to be preserved forever, and our work is to provide people with the frameworks and tools to make these decisions.

Resources

The following resources helped shape our thinking and approach to intentional curation of personal archives in the digital age:

Acknowledgements

We would like to thank our colleagues Clare Stanton, Ben Steinberg, Aristana Scourtas, and Christian Smith for the ideas that emerged from our conversations together.

Visual by Jacob Rhoades.

 

August 20, 2025 11:06 PM

In the Library, With the Lead Pipe

Interest Convergence, Intersectionality, and Counter-Storytelling: Critical Race Theory as Practice in Scholarly Communications Librarianship

In Brief: Despite the ever-increasing presence of diversity, equity, and inclusion (DEI) rhetoric in librarianship, library workers who are Black, Indigenous, and people of color (BIPOC) are still underrepresented and marginalized. Critical race theory (CRT) offers the tools necessary to understand why the underlying racial power dynamics of our profession remain unchanged and to generate new ideas to move toward true equity and inclusion. This article presents applications of the theoretical frameworks of interest convergence, intersectionality, and counter-storytelling to the authors’ work with users and to our collegial relationships. As scholarly communications and information policy specialists of color at a predominantly white academic library, these three frameworks inform how we teach about scholarly practices, such as copyright and citation, as well as how we analyze and educate about the open access landscape. We also argue that a critical race theory lens can provide useful analytical tools to inform practice in other types of libraries and different kinds of library work, and encourage all library workers to engage with it as they seek meaningful change in their work settings and the profession more broadly.

By Maria Mejia and Anastasia Chiu

Introduction

As scholarly communications practitioners of color located in an academic library of a predominantly white[1] institution (PWI), we find that critical race theory serves as a cornerstone for how we relate to each other and to the profession. Multiple theoretical frameworks in this movement give name and shape to our approaches and to the racialized phenomena that we seek to resist. The themes of counter-storytelling, intersectionality, and a problematized approach to interest convergence speak most closely to the ways in which we practice CRT in our relationships and our work. We are members of a department consisting entirely of librarians who are Black, Indigenous, and people of color—a somewhat uncommon occurrence in our PWI and in librarianship more broadly—and this dynamic has shaped our CRT-informed practice. Collectively, as a department, we seek to set our own terms around what it means to be a good library worker and a good colleague. We work together to advocate for communities that are systematically excluded in scholarship and librarianship because our librarianship is for those communities. Yet, we must also contend with the fact that our institution’s support for this work is mainly a matter of interest convergence. To paraphrase Derrick Bell (1980), PWIs value and promote racial progress and racial justice work only insofar as it serves their political interest to do so. In our case, our institution benefits from the optics of our intersectionality as a woman and a non-binary person of color, taking on the labor of building inclusive services and an inclusive workplace. With this in mind, we take advantage of interest convergence as it suits us, while also prioritizing ourselves and each other in this environment. We empower each other to recognize when our institution is pushing us to do too much of the labor of inclusion and support each other in setting strong boundaries.

At the core of our scholarly communications work is providing services to researchers to help them navigate their scholarship and academic communities. These services include teaching scholars about common conventions that exist in the scholarly lifecycle, such as publication and citation practices. Many researchers are taught to see scholarly practices as pro forma requirements devoid of politics, but we seek to trouble this assumption. We recognize that scholarly practices exist in a capitalist, heteropatriarchal, white supremacist framework that reinforces the marginalization of BIPOC scholars and creators. With this understanding, we work with researchers to push back against the mainstream narrative to surface the counter-stories of those silenced in scholarly discourse. Relying on CRT as a frame, we attempt to build expansive conversations that recognize the racialized, gender essentialist, ableist, and capitalist politics of knowledge production, while making space for more liberatory, critically open, and equitable practices.

Critical race theory provides us with vocabulary, theoretical frameworks, and tools for many aspects of the collegial relationships and the services that we are building together. Our article will explore how we find our work and our working selves in CRT, applying the concepts of interest convergence, intersectionality, and counter-storytelling. We hope that our lived experiences in bringing CRT to practice will serve as an example for others looking to build environments where BIPOC library workers can thrive.

Definitions of Interest Convergence, Intersectionality, and Counter-Storytelling

We look at CRT and apply it in our practice with recognition that it is not a single monolithic model of race, racism, and anti-racism, but a fully-fledged social and scholarly movement with many strands and tenets, some of which differ significantly in their emphases. The ways that we inform our approaches to scholarly communications librarianship with CRT are not the only or definitive ways to apply CRT. Rather, they form a concrete example of approaching library work with an understanding of racism as an everyday phenomenon that is structurally embedded in individual interactions, institutional relationships, and macro-level policy. Three commonly used frameworks of CRT that we use as lenses for our work are the interest convergence dilemma, intersectionality, and counter-storytelling.

Interest convergence is a theory that posits that racial progress only happens when the political interests of BIPOC and white people converge. Derrick Bell (1980) originated this theory in his critical analysis of the Supreme Court decision in Brown v. Board of Education. He saw the Brown decision not as a result of change in legal reasoning or social norms around race, but as a result of temporary common ground between the political need of white people for optics of racial equity during the Cold War and the enduring equality needs of Black people. Bell concludes that although we can sometimes use interest convergence to push for useful changes, this approach also has serious shortcomings in fostering change in foundational racial power dynamics. Interest convergence does not only appear at the macro level of national policy, as in Bell’s case analysis; it also shows up in organizations and the labor conditions within them, including libraries and library work. Library organizations have been building rhetorical commitments to diversity, equity, and inclusion for decades in response to BIPOC-led calls for change in professional organizations, and though these may be seen as progress, Bell’s articulation of interest convergence offers an explanation for the observable shortcomings of these organizational statements.

Despite the growth of DEI rhetoric, libraries as organizations nevertheless continue to enact racial domination through our work and working conditions. As Victor Ray (2019) points out, race is constitutive of organizations; organizations “help launder racial domination by obscuring or legitimating unequal processes” (35). We see this in action when “diversity work” is coded in libraries as something to be done primarily by BIPOC separately from, and in addition to, everyday organizational functions, resulting in disproportionate demands on, and control over, our time and labor. Moreover, interest convergence also encourages us to notice that racialized groups often do not reap the benefits of policies and measures for inclusion and equity, and in fact, racialized communities can be harmed in the halfhearted enactment of those policies and measures. Just as Bell points out that this was the case in school desegregation under Brown v. Board, Hathcock and Galvan point out that this is the case in libraries’ DEI efforts, such as the use of temporary job appointments as diversity hiring initiatives (Galvan 2015; Hathcock 2015; Hathcock 2019). These temporary job appointments increase staff diversity in the short term but do not disrupt the racial dynamics of predominantly white libraries in the long term, demonstrating the limits of interest convergence. We use this framework to approach librarianship with a critical understanding of where our interests truly converge and diverge with our organization’s, and to inform how we situationally advocate for equitable practices and policies.

One of the key steps in our advocacy for equity is understanding the multiple identities we embody and, therefore, the overlapping marginalizations that we face in a predominantly white profession and institution. We look to intersectionality as a framework for understanding the compounding effects of racial marginalization with other forms of marginalization (such as gender, class, etc.). Intersectionality rejects the tendency of institutions and individuals to treat these forms of marginalization as entirely separate spheres. The concept was originally coined by legal scholar Kimberlé Crenshaw (1991) to highlight the shortcomings of workplace discrimination law in addressing discrimination that appears specifically at the intersection of race and gender. Sociologist Patricia Hill-Collins (2000) expanded upon it, identifying the organized interconnections between multiple forms of oppression and the experience of multiple marginalizations as having a compounding effect that constitutes a “matrix of domination” (43). Although the term “intersectionality” has become a buzzword in DEI rhetoric, it remains a useful theoretical framework for analyzing the experiences of BIPOC library workers who may also be queer, working class, disabled, and hold other marginalized identities. Examining our day-to-day experiences through an intersectional lens allows us to understand how the dilemma of interest convergence manifests itself in our work and professional relationships. When our interests do not align with our organization’s interests, developing and sharing counter-stories can be a powerful and necessary tool.

Counter-storytelling refers to the practice of telling stories that reflect marginalized experiences, histories, and knowledge in a way that challenges mainstream narratives or commonly-taught histories. In their foundational Critical Race Theory: An Introduction, Delgado and Stefancic (2023) characterize narrative and storytelling as having the “valid destructive function” of illuminating preconceptions and myths about race and racialization that form the contextual backdrop against which BIPOC are marginalized and dehumanized (76). In our practice, we tell counter-stories that highlight how white supremacy is enacted in underlying philosophies and common practices of scholarship and librarianship. We use these counter-stories to move ourselves and others to center feminist and BIPOC knowledge and scholarship, as well as to resist and relinquish the everyday practices that serve white interests at the expense of BIPOC humanity. We also use counter-stories to develop counter-spaces where BIPOC library workers and users can build community with each other in honest, safe, and liberatory ways that resist the dominant gaze.

In summary, we use the frameworks of interest convergence and intersectionality to understand the conditions of our workplace and of academia in general, and we apply those understandings to construct counter-stories, with which to empower each other and the scholars we serve.

Applying CRT in Our Collegial Relationships

We are BIPOC library workers with many interrelated marginalized identities that affect how we approach our work. That being the case, applying the frameworks of CRT to our relationships with each other, with our entire department, and with colleagues across our library system is key to helping us understand our work environment and mitigate the effects of that environment on our minds and bodies. We work in academic spaces that are predominantly white, ableist, heteronormative, cisgender, and patriarchal, as information professionals who do not fit many or any of those dominant identities. It can be exhausting, but we lean on our CRT praxis to help ourselves and each other not only survive in this environment but also find moments to thrive and experience joy. By employing intersectionality and interest convergence to recognize and call out the white supremacist environment in which we work, we can build our counter-stories, and even counter-spaces, to ensure that we set healthy boundaries and find fulfillment in our work together.

As we apply CRT in our relationship with each other, we explicitly acknowledge and push back against the white supremacist culture that permeates every aspect of our workplace and our work (Quiñonez, Nataraj, and Olivas 2021). This requires that we recognize vocational awe and neutrality as key components of everyday white supremacy in library work. Vocational awe encourages us to sacrifice ourselves for a so-called sacred profession dominated by a white womanhood that neither of us can (or wants to) achieve (Ettarh 2018). Library neutrality is a myth that serves to disguise white supremacist ideas as normal (Chiu, Ettarh, and Ferretti 2021). Our very racialized presence in a white profession necessitates a radical pushback against vocational awe and neutrality for us to survive, much less thrive. Thus, part of our work as colleagues is to help each other call out those instances of white supremacy that would seek to appropriate our work and, in many ways, our very selves (Brown et al. 2021).

Part of the white supremacist culture that we seek to name and dismantle is the widespread practice in academic—and by extension academic library—spaces to co-opt, tokenize, and undervalue our work as marginalized people (Brown, Cline, and Méndez-Brady 2021). More often than not, BIPOC academics are relegated to what Dr. Chanda Prescod-Weinstein (2021) refers to as the “emotional housework” of the academy, where we are expected to meet general academic standards while also providing much of the support work for our colleagues and students (189). We are not only responsible for completing the standard service, research, and librarian duties that are expected of all faculty members. Our institutions also expect us to take on the additional labor of building more inclusive spaces through mentoring, educating colleagues from dominant groups, and shouldering the burden on diversity committees and in diversifying faculty governance and search committees (Brown, Cline, and Méndez-Brady 2021).

This additional burden is true for academics like us who are tenure-track but has even weightier implications for those in more precarious positions, including contract and adjunct workers. As Dr. Prescod-Weinstein (2021) notes: “Researchers from minoritized groups, including gender minorities and especially those of us of color, face an extraordinary burden in academic spaces” (188). We find that to be the case for us at our institution, where our small department has often been overrepresented on search committees, diversity work, and other activities where the optics of a BIPOC perspective are seen as beneficial. These requests come fast and furious with no additional compensation for the extra work and little to no acknowledgement of the inequitable distribution of this labor. Using the CRT frameworks of intersectionality, interest convergence, and counter-storytelling/counter-spaces, we gauge each request as it comes in, and push ourselves and each other to consider what opportunity the request presents to make changes that benefit BIPOC in our library, weighed against how it will affect our individual and departmental capacity, as well as the precedent that it may set for other BIPOC in our library. We refuse and accept requests judiciously, and thus, can carve out joy and fulfillment in our relationships with each other and our work amidst the additional burdens a white supremacist environment places on us.

Applying intersectionality requires that we exercise care in acknowledging each other’s labor, respect each other’s boundaries, and give each other space to fully inhabit our intersectional identities at work as we see fit. We explicitly recognize that we are full human beings with interlocking identities that we cannot and often do not wish to leave at the proverbial door when we enter our workspaces. We each have different ways of embodying our marginalized identities at work and deserve to be treated with respect to our varying needs. We also acknowledge that our lives involve more than the work we do at our institution. We are caregivers and receivers; we are family and friends; we are whole people with ideas, desires, and commitments that go beyond our workplace. With this in mind, the intersectional care that we bring to the profession takes many forms, including building flexibility into our work schedules, using multiple modes of communication for meetings and information-sharing, or taking full advantage of the hybrid virtual and in-person work environment that surfaced during the earlier days of the pandemic. Our intersectional care also extends to a sensitivity to our feelings and focus as we approach our work; there have often been moments when we have had to postpone job activities to take time and space to check in with each other about heightened circumstances in our personal lives or the world around us.

Another way we employ intersectionality in our collective work is through intentionally surfacing and valuing that “emotional housework” which often gets invisibilized and devalued by our white institution. Together, we call out and celebrate the achievements we make in mentoring BIPOC students, teaching our white colleagues through our lived experiences, and all the labor we put into helping to make our institution a viable place for marginalized workers and learners. We not only surface this invisibilized work among ourselves, but we also do so with our colleagues across the institution, both formally and informally. We take the time to mention the emotional labor we are engaging in, especially when new requests for such labor come in, and we make a point of adding narratives of that labor to our formal documentation for promotion and tenure.

Finally, we apply intersectionality as we support one another in exercising boundaries when the institution wishes to exploit our labor and intersectional identities (Brown, Cline, and Méndez-Brady 2021). We keep each other informed as necessary about our schedules, workload, and personal life loads, and make adjustments as needed to allow any one of us to step back or forward as they are able. The purpose of this practice is never to force disclosure beyond what either of us wishes to disclose at work. Rather, it is to encourage each other to reflect holistically and to empower ourselves to decline new requests if needed to secure and maintain healthy boundaries in our work. We seek to interrupt the professional librarian norm that we must be selective in refusing requests for ad hoc or invisibilized labor; instead, we encourage each other to be selective in accepting these requests, particularly in times when we are stretched thin due to staffing shortages or simply because it is a busy time of the academic term. Overall, our intersectional approach to empowered boundary-setting is a form of “transformative self-care” where we build a supportive community to affirm our intersectional identities, validate our lived experiences, and push back against coerced assimilation to the surrounding white supremacist norms of our institution (Baldivia, Saula, and Hursh 2022, 137; see also Moore and Estrellado 2018).

In addition to intersectionality, we also use a critical approach to interest convergence to help us call out oppressive systems in our workplace and call forth more liberatory possibilities. From the CRT framework of interest convergence, we learn that antiracist work is only supported in white supremacist culture to the extent that it also provides a benefit to white supremacy (Delgado and Stefancic 2023). Employing this framework can, therefore, be a powerful way of pushing forward initiatives and changes that benefit us and our fellow racialized colleagues and do so with the support of our predominantly white academic library (Brown, Cline, and Méndez-Brady 2021; Aguirre 2010). However, we also recognize Derrick Bell’s original framing of the interest convergence dilemma, which teaches us that basing our pursuit of racial justice solely on the interests of white supremacy is no way to build toward liberation. We must instead take a critical approach to interest convergence that allows us to move forward an antiracist agenda within our white-dominated workplace without losing sight of our own ultimate goals and needs as BIPOC workers.

With this tension in mind, we make use of interest convergence only when it best suits our needs of building an antiracist workspace while continuing to make material changes to the white supremacist culture that surrounds us. For example, when one of us served alongside our department’s senior leader on a committee tasked with providing recommendations for initiatives to aid new faculty with integration to the library, it was a priority to provide recommendations with a clear benefit to new BIPOC faculty. This approach was informed by a recognition that benefits to BIPOC faculty would also benefit white faculty. One recommendation was to create a scheduling system for more equitable sharing of service duties, such as committee work, informed by the heavy service loads of every member of our department at the time. For the sake of interest convergence, one explanation of the recommendation was that it would maximize faculty productivity since productivity is a priority for our white-dominated institution. However, with the antiracist goals and praxis of our department in mind, the recommendation also served as a means of helping to alleviate the excessive service load often placed on racialized academics, particularly BIPOC women, who formed a substantial contingent of our library’s new hires at the time. Thus, interest convergence helped to get the point across, but did not overtake our particular anti-oppression agenda. We saw some of the initial advantages of leaning into interest convergence as an active strategy when our senior leadership group accepted the recommendation. However, we also see some of its drawbacks, as we witnessed the recommendation come to faculty governance for consensus without success. To this day, we have not yet seen a scheduling system take shape, though we have certainly learned more about whose interests have influence in implementing various types of recommendations.

As we apply the CRT frameworks of intersectionality and interest convergence to name and push back against white supremacy in our workplace, we support one another in crafting counter-stories of what it means to be an academic library worker at a private PWI. We craft our counter-stories as a means of making our department what Solorzano, Ceja, and Yosso (2000) describe as a “counter-space” where we can find some reprieve as racialized people working in a white supremacist environment (70-71; Masunaga, Conner-Gaten, Blas, & Young 2022, 16-17). In crafting our departmental counter-space within our academic library, we build a more liberatory community together and extend that to others within our institution and beyond.

Highlighted CRT-Informed Relational Practices

Applying CRT in Our Scholarly Communications Work

Academic librarians are crucial in educating researchers about the importance of scholarly practices such as citation and copyright. As the scholarly communications and information policy department at our academic library, we regularly teach library information sessions and consult one-on-one with users about the myriad ways citation norms and copyright law can help scholars share their research with the public, thereby challenging a common misconception that these are perfunctory steps in the research process. However, we are also scholars in our own right and, as such, have a stake in challenging a conventional approach to these scholarly practices that privileges white, able-bodied, cisgender, and heteronormative perspectives over others.

In the digital zine “How to Cite Like a Badass Tech Feminist Scholar of Color,” Rigoberto Lara Guzmán and Sareeta Amrute (2019) challenge the established rules of citation in which the “scholar” and the “research subject” both contribute to the production of knowledge but only the contribution of the “scholar” merits acknowledgment through citation. The idea that citing is the key to engaging in scholarly conversation privileges writing over other forms of communication and assumes that all perspectives are given equal value in academia (Okun 2021). In reality, many academic authors cite marginalized scholars less often than scholars from dominant groups, erase the scholarly contributions of those outside academia, and deem certain kinds of knowledge unworthy of citation.

Instead of upholding a hierarchy of knowledge with “research subjects” at the bottom, library workers can challenge researchers to think beyond the mechanics of citation and to contend with the socioeconomic structures upholding this scholarly practice – who does academia cite and who does it exclude when it comes time to name the creators and keepers of knowledge? One way our department addresses this issue in instruction sessions is by discussing who the students cite in their research papers and asking if they are citing, or otherwise acknowledging, classmates with whom they have been in conversation during the semester. Since the answer is usually “no,” this creates an opportunity for a larger discussion about who else the students might be omitting from their list of citations and why they should expand their understanding of who is worthy of being cited.

The Cite Black Women Collective is an example of how scholars across the social sciences and humanities have put Black feminist theories into practice and applied those critical theoretical frameworks by disrupting “hegemonic citational politics” (Smith et al. 2021, 14). Citing Black women is at the core of the Collective’s mission, but their work extends further. The Cite Black Women Collective explains: “Our politics of collectivity demand that we strive to embody a particular kind of Black feminist thought, one that rejects profitability, neoliberalism, self-promotion, branding, and commercialization. We make intentional decisions to upturn neoliberal values of hyper-individualistic profiteering” (Smith et al. 2021, 13). As librarians who are also tenure-track faculty, we must publish and be cited by other scholars to achieve tenure and promotion. However, these expectations reinforce the very profiteering we are trying to challenge.

Instead of succumbing to the pressures of a process shaped by white supremacist values, we approach our tenure and promotion portfolios from an intersectional, collective perspective by publishing in journals that prioritize BIPOC and other marginalized voices and citing scholars who challenge the status quo, regardless of academic status or institutional affiliation. Librarian Harrison W. Inefuku (2021) identified academic publishing as one of the main stumbling blocks for BIPOC scholars and argued that “the persistence of racism in the process creates a negative feedback loop that suppresses the number of faculty of color, resulting in a scholarly record that continues to be dominated by whiteness” (211). Publishing in an established academic journal and being cited as the first author carries prestige that increases our chances of achieving tenure and promotion, even though these do not directly correlate to the effort or creativity that goes into creating a work or the impact a work has on the larger scholarly field. Approaching the tenure and promotion process from a viewpoint shaped by CRT, particularly a problematized approach to interest convergence, allows us to make sense of these contradictions. Our tenure and promotion materials put interest convergence into practice by emphasizing how our institution has benefited from our CRT-informed work and, at the same time, making it clear that our work goes far beyond the DEI initiatives we may be involved with. We can then focus on creating a new path forward instead of trying to meet professional benchmarks that may not align with our values as BIPOC academic librarians informed by Black feminist practices. We might not be able to escape the hyper-individualistic, commercialized expectations of our PWI completely during the tenure and promotion process. 
Still, we can choose to deprioritize those expectations, pursue work that fulfills us, and share counter-stories in our tenure and promotion materials that highlight our contributions to the collective good over individual achievements.

In our roles as scholarly communication and information policy specialists, we encourage users to question how they consume information and how they create it. One of the ways we do this is by framing copyright as a mechanism to expand the reach of scholarship, not just to monetize creative works. In the age of the Internet, many students first encounter copyright when they learn that they have shared copyrighted material in violation of a social media platform’s rules and have their content taken down. Others may see how large corporations and individuals intentionally use copyright infringement claims to remove content and censor conversations on social media. Our job is to educate scholars on the basic principles of copyright. However, we take it a step further by challenging the use of copyright in ways that prioritize profit over sharing information.

Our department utilizes counter-storytelling when teaching copyright law to empower our students and colleagues to disrupt the power dynamics inherent in the academic publishing process. In doing so, the scholars we train learn how to maintain some legal control over their work and to prioritize the freedom to share their scholarship with communities on the margins of or outside academia over making a profit. In past workshops about research ethics, we asked graduate students to imagine a scenario in which they submitted their thesis for publication in an academic journal and had to negotiate the terms of their publishing agreement. One of the questions we posed to the students was whether they wanted to transfer their copyright to the publisher. The students unanimously responded that they wanted to keep their copyright. Their reasoning for keeping their copyright varied. A common thread, however, was that the students wanted the freedom to make their work widely available instead of having the piece they created stuck behind a publisher’s paywall. Even though it was an imaginary scenario, the students touched on an important reality for many scholars who want to share their work freely but ultimately have to compromise to meet the publication expectations of their institutions.

Authors who seek to make their research widely available for free while retaining their copyright may find traditional academic publishing challenging and instead turn to open access publishing. “Open is an expansive term and encompasses a series of initiatives, policies, and practices that broadly stand for ideals of transparency, accessibility, and openness” (Yoon 2023, “Introduction;” italics in the original). By publishing open access, scholars can reach readers unaffiliated with academic institutions or whose institutions have limited access to publications due to budget restrictions. In theory, authors who publish open access would not have to choose between reaching a large audience, one of the appeals of publishing in established journals, or keeping their copyright. However, we have seen how traditional publishers have co-opted the open access publishing model for their own profit-making goals by charging exorbitant fees and pushing authors to give up their copyright if they want to publish open access (Maron et al. 2019). We encourage authors who consult with us to be aware of this tendency toward co-optation of equity-oriented movements by for-profit publishers, and to think critically about whether the value of working with those publishers measures up to what the publishers will extract from them (and other scholarly authors) in the process.

Despite these challenges, open access has significant implications beyond intellectual property rights for BIPOC scholars, particularly those who engage in research about race, gender, sexuality, and class. Inefuku (2021) wrote: “With the growth of open-access publishing and library publishing programs, there has been a growth of opportunities to create journals dedicated to publishing subjects and methodologies that are ignored by the dominant publications” (205). While open access journals provide necessary platforms for scholars researching marginalized people or topics, Inefuku explained, these publications may not be given the same weight as traditional journals during the tenure and promotion process. Traditional journals still outrank open access journals, which limits the options available to authors who want to pursue open access publishing exclusively and allows cynics to dismiss open access publications as inferior. In addition, publications that rely on steep publication fees to publish open access reproduce the inequities of traditional publishing by making it harder for scholars with less funding to share their research.

Open access exists within the capitalist landscape of academic publishing, an industry that, like librarianship, is predominantly white and gatekeeps scholarship that does not uphold whiteness (Inefuku 2021). Our department promotes open access and sees its potential to make academic publishing more accessible to scholars that publishers have historically excluded. When publishing our own research, we prioritize open access publications and publishers that support self-archiving (or green open access), which allows us to easily share our research with the public for free. At the same time, we recognize that open access is not intrinsically equitable. Analyzing open access, and scholarly communications more broadly, from a critical and intersectional perspective allows us to simultaneously imagine the possibilities and see the stumbling blocks in the path toward equitable academic publishing.

Highlighted CRT-Informed ScholComm Practices

Looking Forward: A Case for CRT Application Across Library Settings

One of the most insidious outcomes of stock stories of diversity is the assumption that the plain and simple presence of Black people, Indigenous people, and people of color represents justice and the dismantlement of white supremacy. But white supremacy is not merely manifested in rare, egregious incidents in environments where BIPOC are minoritized; it is normalized across everyday life, and can often operate and have impacts on BIPOC even without a predominating presence of white people within an organization (Delgado and Stefancic 2023). Shaundra Walker (2017) gives an example of this by examining the patronizing and white-supremacy-serving history of 19th-century white industrialist philanthropy toward Historically Black Colleges and Universities (HBCUs) and its lasting impacts. From Walker’s incisive critique, we draw an understanding that white supremacy thrives best in the absence of interrogation of norms and practices, and indeed, some of the most hurtful forms of white supremacy are the ones that BIPOC may enact on each other when we have not questioned the everyday norms and practices of our fields. That understanding deeply informs our efforts to go beyond the confines of what is typically defined as scholarly communications work in libraries and to work with library users to challenge traditional views of scholarship and open access. A CRT-informed analysis of library work and workplace culture is applicable across all types of departments and institutions. Although our application of CRT is in the particular setting of a large private PWI, a CRT lens can also inform work at institutions with numerically significant populations of BIPOC employees and users, particularly students of color. It can also inform others’ work in different functional areas and institutions with different racial dynamics from our own.

Another facet of our collective application of CRT that may be useful to others is our use of it as a lens to understand our services for users and our collegial relationships within our organization. Because a common pitfall in DEI work is to focus on departmental services and neglect internal workplace culture, it is particularly significant that library workers tend to both of these aspects carefully and intentionally, rather than choosing one at the expense of the other. Library workers can approach work with an underlying awareness that organizations enact racialization (knowingly or not) through practices like heightened surveillance of how BIPOC workers spend their time, heightened expectations of care labor from BIPOC women without recognition of it as added workload, and more. These are normalized “business as usual” practices. We recognize that library workers’ prioritization of each other as BIPOC colleagues must be expressed first and foremost by pushing back against these “business as usual” approaches. Among library workers at large, one way to enact equity values is to help each other protect work-life boundaries by judiciously refusing requests for labor that cannot be accommodated without compromising capacity for self-care. This is not a way of justifying deprioritizing collegiality, nor is it refusal simply to exercise the power to refuse, both of which are common narratives that implicitly penalize BIPOC employees’ refusal to perform demanded labor. Rather, it is a way of recognizing that white supremacy deeply informs the cultural expectations of library work to provide service even when it compromises our physical and mental selves (Okun 2021). These cultural expectations will not graciously recede as a result of institutional commitment without action, nor even as a result of incremental and partial action toward values alignment. As Espinal, Hathcock, and Rios (2021) point out: “It is clear that we need new approaches. 
It is not enough to continuously demonstrate and bemoan the state of affairs [of the profession’s racial demographics]; we need to take action, another tenet of CRT” (232). These actions must encompass library workers’ relationships with each other as colleagues in addition to our services to users.

Many other themes that appear in our application of CRT in our services and working relationships can apply to different library settings as well. For example, informed by the primary CRT tenet that racism is normalized across everyday norms and practices, library workers can strive to take an expansive approach to the practices we encourage our users to engage in, to avoid reproducing the same racialized dynamics that have always existed in each sector of library services. In our context as scholarly communications practitioners, we recognize that many scholarly communications departments focus primarily on open access and authors’ rights, often with an orientation toward open access at any cost, and we encourage scholars who work with us to approach open access critically, and to also consider racialized dynamics in their citation practice. We encourage practitioners in other sectors of library work to consider how common practices in those sectors reproduce racialized inequity, and redefine their services and approaches accordingly.

Although the practices of another library department in a different library setting from our own might be completely different, the same CRT analysis could also lead to a new imagination of the underlying values, scope, and practices of that department’s services. We do not simply share space and work together as a department of BIPOC colleagues; as Nataraj et al. (2020) encourage, we also work together to recognize how racialized organizational norms and bureaucratic standards impact us. An overall goal in this work of callout and pushback is to support each other in resistance through rest (Hersey 2022). How might CRT inform your understanding of your work and your organization? How can you use it to interrupt “business as usual?”

Conclusion

We write this against the backdrop of persistent attacks on critical race theory, a term that right-wing politicians co-opted and turned into a white supremacist dog whistle for any effort to educate about race or address systemic racism. We also write this at a time when genocides are being openly perpetrated in Palestine, Sudan, the Democratic Republic of the Congo, and other places in the world, while those in power violently suppress vocal opposition to these egregious acts of oppression. These bans—on teaching CRT, on calling out deadly oppression across the globe, on sharing counter-stories in solidarity—cannot be separated from the everyday marginalization we experience as library workers who are Black, Indigenous, and people of color.

Our systemic exclusion from and marginalization in librarianship, a predominantly white profession, means that BIPOC employees face inherent risk when challenging standard scholarly and cultural practices. Even libraries that profess to value marginalized perspectives in their DEI statements fail to translate these words into action and, instead, shift the burden of DEI work onto the relatively few BIPOC library workers that they hire (Brown, Cline, and Méndez-Brady 2021). This burden increased significantly after the 2020 Black Lives Matter protests against police brutality and anti-Black racism, which motivated many institutions to create additional DEI committees and working groups (Rhodes, Bishop, and Moore 2023). Like other academic libraries, our employer released a “commitment to anti-racism” statement in the aftermath of the Black Lives Matter protests, but the statement was never updated after 2020 to outline what concrete actions the organization took, if any, to enact meaningful change for its BIPOC users and employees. The statement was eventually removed from the library website in 2025. Much as the burden of DEI work increased after 2020, it has become even more complex as libraries have begun hiding, watering down, distancing themselves from, or retracting these statements since 2024, often while maintaining the same expectation that equity-oriented labor is still necessary for organizational optics and that BIPOC will carry it out.

The tools of interest convergence, intersectionality, and counter-storytelling shape how we interact with each other and our communities, ultimately helping us navigate the profession in ways that resist white supremacy, capitalism, and individualism. We hope that libraries move beyond the existing model of hiring token BIPOC library workers and expecting them to diversify overwhelmingly white workplaces and instead question why the profession remains so white despite decades of DEI work. In an environment that is at best resistant and at worst actively hostile to any disruption of the status quo, placing the onus of diversity work on BIPOC library workers is ineffective and violent. Although our experience working in a department consisting entirely of BIPOC is rare, we believe that others can learn from how we have carved out our own space and see the potential for building communities of library workers who prioritize living over simply surviving. Our mere presence at a predominantly white institution is not enough to dismantle the racism that thrives in academic libraries such as ours; CRT provides us with frameworks and inspiration to enact meaningful change at our institution and in the field of librarianship more broadly. We call on all library workers to do the same. With the tools that CRT has to offer, we can build new visions of librarianship that benefit everyone and work toward them together.


Acknowledgements

Thank you to our publishing editor, Jess Schomberg, and the editorial board for their flexibility, guidance, and expertise throughout the publication process. We would also like to thank our reviewers, Brittany Paloma Fiedler and Charlotte Roh, for their invaluable feedback and enthusiasm. This project would not have been possible without the encouragement of our manager and associate dean, April Hathcock, who has built a rare departmental culture that deeply supports our efforts to build community and create a healthier work environment. Many thanks to her!


References

Aguirre, Adalberto, Jr. 2010. “Diversity as Interest-Convergence in Academia: A Critical Race Theory Story.” Social Identities 16, no. 6: 763-774. https://doi.org/10.1080/13504630.2010.524782.

Baldivia, Stefani, Zohra Saulat, and Chrissy Hursh. 2022. “Creating More Possibilities: Emergent Strategy as a Transformative Self-Care Framework for Library EDIA Work.” In Practicing Social Justice in Libraries, edited by Alyssa Brissett and Diana Moronta, 133-144. Routledge.

Bell, Derrick A. 1980. “Brown v. Board of Education and the Interest-Convergence Dilemma.” Harvard Law Review 93 (3): 518–33. https://harvardlawreview.org/print/no-volume/brown-v-board-of-education-and-the-interest-convergence-dilemma/.

Brown, Alexandria, James Cheng, Isabel Espinal, Brittany Paloma Fiedler, Joyce Gabiola, Sofia Leung, Nisha Mody, Alanna Aiko Moore, Teresa Y. Neely, and Peace Ossom Williamson. 2021. “Statement Against White Appropriation of Black, Indigenous, and People of Color’s Labor.” WOC + Lib. https://www.wocandlib.org/features/2021/9/3/statement-against-white-appropriation-of-black-indigenous-and-people-of-colors-labor?rq=fiedler.

Brown, Jennifer, Nicholae Cline, and Marisa Méndez-Brady. 2021. “Leaning on Our Labor: Whiteness and Hierarchies of Power in LIS Work.” In Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, edited by Sofia Y. Leung and Jorge R. López-McKnight, 95-110. MIT Press. https://doi.org/10.7551/mitpress/11969.003.0007.

Chiu, Anastasia, Fobazi M. Ettarh, and Jennifer A. Ferretti. 2021. “Not the Shark, but the Water: How Neutrality and Vocational Awe Intertwine to Uphold White Supremacy.” In Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, edited by Sofia Y. Leung and Jorge R. López-McKnight, 49-71. MIT Press. https://doi.org/10.7551/mitpress/11969.003.0005.

Collins, Patricia Hill. 2000. Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment, rev. 10th anniversary ed. Routledge.

Crenshaw, Kimberlé W. 1991. “Mapping the Margins: Intersectionality, Identity Politics, and Violence Against Women of Color.” Stanford Law Review 43, no. 6: 1241-1299.

Crenshaw, Kimberlé, Neil Gotanda, Gary Peller, and Kendall Thomas, eds. 1995. Critical Race Theory: The Key Writings that Formed the Movement. The New Press.

Delgado, Richard, and Jean Stefancic. 2023. Critical Race Theory: An Introduction, 4th ed. New York University Press.

Espinal, Isabel, April M. Hathcock, and Maria Rios. 2021. “Dewhitening Librarianship: A Policy Proposal for Libraries.” In Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, edited by Sofia Y. Leung and Jorge R. López-McKnight, 223-240. MIT Press. https://doi.org/10.7551/mitpress/11969.003.0017.

Ettarh, Fobazi. 2018. “Vocational Awe and Librarianship: The Lies We Tell Ourselves.” In the Library with the Lead Pipe, January 10. https://www.inthelibrarywiththeleadpipe.org/2018/vocational-awe/.

Galvan, Angela. 2015. “Soliciting Performance, Hiding Bias: Whiteness and Librarianship.” In the Library with the Lead Pipe, June 3. https://www.inthelibrarywiththeleadpipe.org/2015/soliciting-performance-hiding-bias-whiteness-and-librarianship/.

Guzmán, Rigoberto Lara, and Sareeta Amrute. 2019. “How to Cite Like a Badass Tech Feminist Scholar of Color.” Points (blog). Data & Society, August 22. https://medium.com/datasociety-points/how-to-cite-like-a-badass-tech-feminist-scholar-of-color-ebc839a3619c.

Hathcock, April. 2015. “White Librarianship in Blackface: Diversity Initiatives in LIS.” In the Library with the Lead Pipe, October 7. https://www.inthelibrarywiththeleadpipe.org/2015/lis-diversity/.

Hathcock, April. 2019. “Why Don’t You Want to Keep Us?” At the Intersection (blog), January 18. https://aprilhathcock.wordpress.com/2019/01/18/why-dont-you-want-to-keep-us/.

Hersey, Tricia. 2022. Rest is Resistance: A Manifesto. Little, Brown Spark.

Inefuku, Harrison W. 2021. “Relegated to the Margins: Faculty of Color, the Scholarly Record, and the Necessity of Antiracist Library Disruptions.” In Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, edited by Sofia Y. Leung and Jorge R. López-McKnight, 197-216. MIT Press. https://doi.org/10.7551/mitpress/11969.001.0001.

Laws, Mike. 2020. “Why We Capitalize ‘Black’ (and Not ‘White’).” Columbia Journalism Review (blog), June 16. https://www.cjr.org/analysis/capital-b-black-styleguide.php.

Maron, Nancy, Rebecca Kennison, Paul Bracke, Nathan Hall, Isaac Gilman, Kara Malenfant, Charlotte Roh, and Yasmeen Shorish. 2019. Open and Equitable Scholarly Communications: Creating a More Inclusive Future. Association of College and Research Libraries. https://doi.org/10.5860/acrl.1.  

Masunaga, Jennifer, Aisha Conner-Gaten, Nataly Blas, and Jessea Young. 2022. “Community Building, Empowering Voices, and Brave Spaces Through LIS Professional Conferences.” In Practicing Social Justice in Libraries, edited by Alyssa Brissett and Diana Moronta, 14-27. Routledge.

Moore, Alanna Aiko, and Jan Estrellado. 2018. “Identity, Activism, Self-Care, and Women of Color Librarians.” In Pushing the Margins: Women of Color and Intersectionality in LIS, edited by Rose L. Chou and Annie Pho, 349-390. Library Juice Press.

Nataraj, Lalitha, Holly Hampton, Talitha R. Matlin, and Yvonne Nalani Meulemans. 2020. “Nice White Meetings: Unpacking Absurd Library Bureaucracy through a Critical Race Theory Lens.” Canadian Journal of Academic Librarianship 6: 1-15. https://www.erudit.org/en/journals/cjalib/2020-v6-cjalib05325/1075449ar.pdf.

Okun, Tema. 2021. “White Supremacy Culture Characteristics.” White Supremacy Culture. https://www.whitesupremacyculture.info/characteristics.html.

Prescod-Weinstein, Chanda. 2021. The Disordered Cosmos: A Journey into Dark Matter, Spacetime, and Dreams Deferred. Bold Type Books.

Quiñonez, Torie, Lalitha Nataraj, and Antonia Olivas. 2021. “The Praxis of Relation, Validation, and Motivation: Articulating LIS Collegiality through a CRT Lens.” In Knowledge Justice: Disrupting Library and Information Studies through Critical Race Theory, edited by Sofia Y. Leung and Jorge R. López-McKnight, 197-216. MIT Press. https://doi.org/10.7551/mitpress/11969.003.0018.

Ray, Victor. 2019. “A Theory of Racialized Organizations.” American Sociological Review 84, no. 1: 26–53. https://doi.org/10.1177/0003122418822335.

Rhodes, Tamara, Naomi Bishop, and Alanna Aiko Moore. 2023. “The Work of Women of Color Academic Librarians in Higher Education: Perspectives on Emotional and Invisible Labor.” up//root, February 13. https://www.uproot.space/features/the-work-of-women-of-color.

Smith, Christen A., Erica L. Williams, Imani A. Wadud, Whitney N.L. Pirtle, and The Cite Black Women Collective. 2021. “Cite Black Women: A Critical Praxis (A Statement).” Feminist Anthropology 2, no. 1: 10–17. https://doi.org/10.1002/fea2.12040.

Solorzano, Daniel, Miguel Ceja, and Tara Yosso. 2000. “Critical Race Theory, Racial Microaggressions, and Campus Racial Climate: The Experiences of African American College Students.” The Journal of Negro Education 69, no. 1/2: 60–73. http://www.jstor.org/stable/2696265.

Walker, Shaundra. 2017. “A Revisionist History of Andrew Carnegie’s Library Grants to Black Colleges.” In Topographies of Whiteness: Mapping Whiteness in Library and Information Science, edited by Gina Schlesselman-Tarango, 33–53. Library Juice Press. https://kb.gcsu.edu/lib/3.

Yoon, Betsy. 2023. “A Genealogy of Open.” In the Library with the Lead Pipe, March 1. https://www.inthelibrarywiththeleadpipe.org/2023/genealogy-of-open/.


[1] We choose to capitalize the “B” in Black and to lowercase the “W” in white. We capitalize Black in recognition of its use to describe shared struggle, identity, and community, including a history of slavery that erased many Black people’s knowledge of their specific ethnic heritage (Laws 2020). We do not capitalize white because we see whiteness as a construct that only exists in opposition to racialized communities, and is only used to claim shared identity and community in white supremacist contexts.

by Maria Mejia at August 20, 2025 03:15 PM

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-19: Paper Summary: Reproducibility Study on Network Deconvolution

 

The “reproducibility crisis” in scientific research refers to growing concerns over the reliability and credibility of published findings in many fields, including the biomedical, behavioral, and life sciences (Laraway et al. 2019, Fidler et al. 2021). Over the past decade, large-scale reproducibility projects have revealed failures to replicate findings. For example, in 2015 the Open Science Collaboration reported that a large portion of replicated studies produced weaker evidence for the original findings despite using the same materials. Similarly, in fields like machine learning, researchers may publish impressive new methods, but if others can’t reproduce the results, progress is hindered. Therefore, reproducibility matters. It’s science’s version of fact-checking.

In this blog, we’ll break down our recent effort to reproduce the results of the paper Network Deconvolution by Ye et al. (hereafter, "original study"), published in 2020, which claimed that replacing Batch Normalization (BN) layers with “deconvolution layers” boosts model performance. During Spring 2024 we began this work as our “CS895 Deep Learning Fundamentals” class project, then extended and published it as a journal paper in ReScience C, a venue dedicated to making science more reliable through reproducibility.

What is ReScience C?

ReScience C (initiative article) is a platinum open-access, peer-reviewed journal dedicated to promoting reproducibility in computational science by explicitly replicating previously published research. The journal was founded in 2015 by Nicolas Rougier, a team leader at the Institute of Neurodegenerative Diseases in Bordeaux, France, and Konrad Hinsen, a researcher at the French National Centre for Scientific Research (CNRS). It addresses the reproducibility crisis by encouraging researchers to independently reimplement computational studies using open-source software to verify and validate original results.

Unlike traditional journals, ReScience C operates entirely on GitHub, where submissions are managed as issues. This platform facilitates transparent, collaborative, and open peer-review processes. Each submission includes the replication code, data, and documentation, all of which are publicly accessible and subject to community scrutiny.

The journal covers a wide range of domains within computational science, including computational neuroscience, physics, and computer science. ReScience C provides valuable insights into the robustness of scientific findings and promotes a culture of open access and critical evaluation by publishing both successful and unsuccessful replication attempts.

Our approach in reproducing “Network Deconvolution”

We evaluated the claim that the Network Deconvolution (ND) technique improves deep learning model performance when compared with BN. We re-ran the paper’s original experiments using the same software, datasets, and evaluation metrics to determine whether the reported results could be reproduced. Out of 134 test results, 116 (87%) successfully reproduced the original findings within a 10% margin, demonstrating good reproducibility. Further, we examined the consistency of reported values, documented discrepancies, and discussed the reasons why some results could not be consistently reproduced.

Introduction

BN is a widely used technique in deep learning that accelerates training and enhances prediction performance. However, recent research explores alternatives to BN to further improve model accuracy. One such method is Network Deconvolution. In 2020, Ye et al. compared model performance using both BN and ND and found that ND can serve as an alternative to BN while improving performance. This technique replaces BN layers with deconvolution layers that aim to remove pixel-wise and channel-wise correlations in input data. According to their study, these correlations cause a blur effect in convolutional neural networks (CNNs), making it difficult to identify and localize objects accurately. By decorrelating the data before it enters convolutional or fully connected layers, network deconvolution improves the training of CNNs.
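Conceptually, the decorrelation step is a whitening transform: center the inputs, then multiply by the inverse square root of their covariance so the resulting features are uncorrelated. Below is a minimal NumPy sketch of ZCA-style whitening on a flat feature matrix. This is an illustration of the idea only, not the authors' implementation (which operates on im2col patches inside the network); all names and numbers here are ours.

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-style whitening of a feature matrix x (n_samples, n_features):
    multiply by the inverse square root of the covariance so the output
    features are decorrelated with roughly unit variance."""
    x = x - x.mean(axis=0)                        # center each feature
    cov = (x.T @ x) / len(x)                      # feature covariance
    vals, vecs = np.linalg.eigh(cov)              # symmetric eigendecomposition
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return x @ inv_sqrt

rng = np.random.default_rng(0)
# correlated "activations": independent noise mixed by a random matrix
raw = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
out = zca_whiten(raw)
cov_out = (out.T @ out) / len(out)                # near-identity after whitening
print(np.allclose(cov_out, np.eye(4), atol=0.05))
```

After whitening, the sample covariance of the output is approximately the identity matrix, which is exactly the "remove correlations before the layer" effect the paper describes.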

Figure 1: Performing convolution on this real world image using a correlative filter, such as a Gaussian kernel, adds correlations to the resulting image, which makes object recognition more difficult. The process of removing this blur is called deconvolution. Figure 1 in Ye et al. 2020.

In the original study, Ye et al. evaluated the method on 10 CNN architectures using the benchmark datasets CIFAR-10 and CIFAR-100, and later validated the results on the ImageNet dataset. They report consistent performance improvements in ND over BN. Motivated by the potential of this method, we attempted to reproduce these results. We used the same datasets and followed the same methods, but incorporated updated versions of software libraries when necessary to resolve compatibility issues, making this a “soft-reproducibility” study. Unlike “hard reproducibility”, which replicates every detail exactly, soft reproducibility offers a practical approach while still assessing the reliability of the original findings.

Methodology

The authors of the original study reported six values per architecture for the CIFAR-10 dataset: three for BN (at the 1-, 20-, and 100-epoch settings) and three for ND (at the same settings). They reported the CIFAR-100 results similarly. To assess reproducibility as well as consistency, we conducted three runs for each reported value for both the CIFAR-10 and CIFAR-100 datasets. For instance, we repeated the experiment using the same hyperparameter settings (Table 1) for batch normalization at 1 epoch for a specific architecture three times, recording the outcome of each run. We then calculated the average of these three results and compared it to the corresponding value from the original study.
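The bookkeeping for each (architecture, method, epoch) cell can be sketched as follows. This is a hypothetical helper with our own names and made-up accuracy numbers, not the paper's code; it averages the repeated runs and applies the 10% relative margin we use later as the reproducibility criterion.

```python
def compare_to_original(original_acc, run_accs, threshold=0.10):
    """Average the repeated runs and check whether the average falls
    within a 10% relative margin below the originally reported accuracy
    (an average above the original always counts as reproduced)."""
    avg = sum(run_accs) / len(run_accs)
    reproduced = avg >= original_acc * (1 - threshold)
    return avg, reproduced

# three runs of one setting, e.g. BN at 1 epoch for one architecture
avg, ok = compare_to_original(92.0, [91.1, 92.3, 91.8])
print(f"{avg:.2f} reproduced={ok}")  # 91.73 reproduced=True
```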

Table 1: Hyperparameter settings that we used to reproduce results in Ye et al.

Results

Table 2 shows the results from Table 1 in Ye et al. (Org. Value) and the reproduced averaged values from our study (Rep. Avg) for the CIFAR-10 dataset with 1, 20, and 100 epochs. Architectures: (1) VGG-16, (2) ResNet-18, (3) Preact-18, (4) DenseNet-121, (5) ResNext-29, (6) MobileNet v2, (7) DPN-92, (8) PNASNet-18, (9) SENet-18, (10) EfficientNet (all values are percentages). Red indicates that the reproduced result is lower than the original value by more than 10%, green indicates that the reproduced value is greater than the original value, and black indicates that the reproduced value is lower than the original value but within 10% of it.

Table 2: Reproduced results for CIFAR-10 dataset.

Table 3 shows the results from Table 1 in Ye et al., and the reproduced averaged values from our study for CIFAR‐100 dataset with 1, 20, and 100 epochs. Architectures: (1) VGG‐16, (2) ResNet‐18, (3) Preact‐18, (4) DenseNet‐121, (5) ResNext‐29, (6) MobileNet v2, (7) DPN‐92, (8) PNASNet‐18, (9) SENet‐18, (10) EfficientNet (all values are presented as percentages). The color codes are the same as Table 2.

Table 3: Reproduced results for CIFAR-100 dataset.

The results indicate that although network deconvolution generally enhances model performance, there are certain cases where batch normalization performs better. To assess reproducibility, we applied a 10% threshold for accuracy as our evaluation criterion. On the CIFAR-10 and CIFAR-100 datasets, 36 out of 60 values (60%) were successfully reproduced with improved outcomes. For the ImageNet dataset, 9 out of 14 values showed better reproduced performance. We identified a few instances, particularly in CIFAR-10 and CIFAR-100, where the reproduced accuracy was lower than the original report, mostly occurring when models were trained for just 1 epoch. However, for models trained over 20 and 100 epochs, the reproduced results were generally higher, closely aligning with the original study’s accuracy. One exception is the PNASNet-18 architecture, which demonstrated relatively poor performance across both batch normalization and network deconvolution methods.

We reported the reproduced top‐1 and top‐5 accuracy values for BN and ND for the VGG‐11, ResNet‐18, and DenseNet‐121 using the ImageNet dataset. All the reproduced results fall within our reproducibility threshold, and they confirm the main claim in the original study (Tables 4 and 5).

Table 4: Accuracy values reported by the original study Table 2 and the reproduced values for VGG‐11 with 90 epochs (Rep.: Reproduced value).


Table 5: Accuracy values reported by the original study’s Table 2 and the reproduced values for the architectures ResNet‐18 and DenseNet‐121 with 90 epochs (Rep.: Reproduced value).

During our reproducibility study, we observed a noticeable difference in training time between BN and ND, which was not reported in the original paper. Training time is a critical factor when building a deep learning architecture and deciding on computing resources. Therefore, we compared the training times of BN and ND observed across the 10 deep learning architectures (Figures 2 and 3).

Figure 2: Training times for each CNN architecture with CIFAR‐10 dataset: (a) with 1 epoch, (b) with 20 epochs, (c) with 100 epochs

Figure 3: Training times for each CNN architecture with CIFAR‐100 dataset: (a) with 1 epoch, (b) with 20 epochs, (c) with 100 epochs

There are large training-time gaps between BN and ND in DenseNet-121 and DPN-92. The shortest gap is seen in EfficientNet for CIFAR-10. A similar trend can be observed for the CIFAR-100 dataset, except that ResNet-18 has the shortest time difference at 20 epochs.

Discussion

The reproducibility study on network deconvolution shows that, out of 134 test results, 116 (87%) successfully reproduced the original findings within a 10% margin, demonstrating good reproducibility. Surprisingly, 80 results actually performed better than in the original study, with higher accuracy scores across different models and datasets. These improvements likely stem from updated software libraries, better optimization algorithms, improved hardware, and more numerically stable calculations in the newer library versions. While network deconvolution generally outperformed batch normalization, there were exceptions where batch normalization was superior, such as with ResNet-18 on certain datasets. Although network deconvolution requires longer training times, the performance gains typically justify the extra computational cost. The technique has gained adoption in modern applications, including image enhancement, medical imaging, and AI-generated images, confirming its practical value in real-world scenarios.

Behind the Scenes: Challenges and Solutions

Reproducing the study was not without obstacles. On the software side, we encountered PyTorch compatibility issues, dependency conflicts, and debugging challenges. Working with the ImageNet dataset also proved demanding due to its size of over 160 GB and changes in its folder structure, which required additional investigation to resolve. Hardware constraints were another factor: we had to upgrade from 16 GB to 80 GB GPUs to handle ImageNet training efficiently. These experiences emphasize that reproducibility is not only about having access to the original code, but also about adapting to evolving tools, datasets, and computational resources.

Why this matters for the Machine Learning Community

Reproducibility studies such as this one are essential for validating foundational claims and ensuring that scientific progress builds on reliable evidence. Changes in software or hardware can unintentionally improve performance, which highlights the importance of providing context when reporting benchmarking results. By making our code and data openly available, we enable others to verify our findings and extend the work. We encourage researchers to take part in replication efforts or contribute to venues such as ReScience C to strengthen the scientific community.

Conclusions

Our study finds that the accuracy results reported in the original paper are reproducible within a 10% threshold. This verifies the authors’ primary claim that network deconvolution improves the performance of deep learning models compared with batch normalization.

Important Links




– Rochana R. Obadage, Kumushini Thennakoon

by Rochana Obadage (noreply@blogger.com) at August 20, 2025 12:14 AM

August 19, 2025

John Mark Ockerbloom

Meet the people behind the books

Today I’m introducing new pages for people and other authors on The Online Books Page. The new pages combine and augment information that’s been on author listings and subject pages. They let readers see in one place books both about and by particular people. They also let readers quickly see who the authors are and learn more about them. And they encourage readers to explore to find related authors and books online and in their local libraries. They draw on information resources created by librarians, Wikipedians, and other people online who care about spreading knowledge freely. I plan to improve on them over time, but I think they’re developed enough now to be useful to readers. Below I’ll briefly explain my intentions for these pages, and I hope to hear from you if you find them useful or have suggestions for improvement.

Who is this person?

Readers often want to know more about the people who created the books they’re interested in. If they like an author, they might want to learn more about them and their works – for instance, finding out what Mark Twain did besides creating Tom Sawyer and Huckleberry Finn. For less familiar authors, it helps to know what background, expertise, and perspectives the author brings to a particular subject. For instance, Irving Fisher, a famous economist in the early 20th century, wrote about various subjects, not just economics but also health and public policy. One might treat his writings on these various topics differently knowing in which areas he was trained and in which he was an interested amateur. (And one might also reassess his predictive abilities even in economics after learning from his biography that he famously failed to anticipate the 1929 stock market crash just before it happened.)

The Wikipedia and the Wikimedia Commons communities have created many articles, and uploaded many images, of the authors mentioned in the Online Books collection, and they make them freely reusable. We’re happy to include their content on our pages, with attribution, when it helps readers better understand the people whose works they’re reading. Wikipedia is of course not the last word on any person, but it’s often a useful starting point, and many of its articles include links to more authoritative and in-depth sources. We also link to other useful free references in many cases. For example, our page on W. E. B. Du Bois includes links to articles on Du Bois from the Encyclopedia of Science Fiction, the Internet Encyclopedia of Philosophy, BlackPast, and the Archives and Records center at the University of Pennsylvania, each of which describes him from a different perspective. Our goal in including these links on the page is not to exhaustively present all the information we can about an author, but to give readers enough context and links to understand who they are reading or reading about, and to encourage them to find out more.

Find more books and authors

Part of encouraging readers to find out more is to give them ways of exploring books and authors beyond the ones they initially find. Our page on Rachel Carson, for example, includes a number of works she co-wrote as an employee of the US Fish and Wildlife Service, as well as a public domain booklet on her prepared by the US Department of State. But it doesn’t include her most famous works like Silent Spring and The Sea Around Us, which are still under copyright without authorized free online editions, as are many recent biographies and studies of Carson. But you can find many of these books in libraries near you. Links we have on the left of her page will search library catalogs for works about her, and links on the bottom right will search them for works by her, via our Forward to Libraries service.

Readers might also be interested in Carson’s colleagues. The “Associated authors” links on the left side of Carson’s page go to other pages about people that Carson collaborated with who are also represented in our collection, like Bob Hines and Shirley Briggs. Under the “Example of” heading, you can also follow links to other biologists and naturalists, doing similar work to Carson.

Metadata created with care by people, processed with care by code

I didn’t create, and couldn’t have created (let alone maintained), all of the links you see on these pages. They’re the work of many other people. Besides the people who wrote the linked books, collaborated on the linked reference articles, and created the catalog and authority metadata records for the books, there are lots of folks who created the linked data technology and data that I use to automatically pull together these resources on The Online Books Page. I owe a lot to the community that has created and populated Wikidata, which much of what you see on these pages depends on, and to the LD4 library linked data community, which has researched, developed, and discussed much of the technology used. (Some community members have themselves produced services and demonstrations similar to the ones I’ve put on Online Books.) Other crucial parts of my services’ data infrastructure come from the Library of Congress Linked Data Service and the people that create the records that go into that. The international VIAF collaboration has also been both a foundation and inspiration for some of this work.

These days, you might expect a new service like this to use or tout artificial intelligence somehow. I’m happy to say that the service does not use any generative AI to produce what readers see, either directly, or (as far as I’m aware) indirectly. There’s quite a bit of automation and coding behind the scenes, to be sure, but it’s all built by humans, using data produced in the main by humans, who I try to credit and cite appropriately. We don’t include statistically plausible generated text that hasn’t actually been checked for truth, or that appropriates other people’s work without permission or credit. We don’t have to worry about unknown and possibly unprecedented levels of power and water consumption to power our pages, or depend on crawlers for AI training so aggressive that they’re knocking library and other cultural sites offline. (I haven’t yet had to resort to the sorts of measures that some other libraries have taken to defend themselves against aggressive crawling, but I’ve noticed the new breed of crawlers seriously degrading my site’s performance, to the point of making it temporarily unusable, on more than one occasion.) With this and my other services, I aim to develop and use code that serves people (rather than selfishly or unthinkingly exploiting them), and that centers human readers and authors.

Work in progress

I hope readers find the new “people” pages on The Online Books Page useful in discovering and finding out more about books and authors of interest to them. I’ve thought of a number of ways we can potentially extend and build on what we’re providing with these new pages, and you’ll likely see some of them in future revisions of the service. I’ll be rolling the new pages out gradually, and plan to take some time to consider what features improve readers’ experience, and don’t excessively get in their way. The older-style “books by” and “books about” people pages will also continue to be available on the site for a while, though these new integrated views of people may eventually replace them.

If you enjoy the new pages, or have thoughts on how they could be improved, I’d enjoy hearing from you! And as always, I’m also interested in your suggestions for more books and serials — and people! — we can add to the Online Books collection.

by John Mark Ockerbloom at August 19, 2025 07:22 PM

Meredith Farkas

Rest as a productive act

I’m a member of an online support group for the autoimmune condition I have and one of the recently diagnosed people wrote a post about how hard it is to cope with the pain and fatigue alongside their job, parenting, and housework and sometimes they have to “give in” and rest. They made giving in sound so negative and you could tell that they were filled with shame about it. Like giving in was giving UP and that was unacceptable. My response was to suggest to them that they might consider reframing rest as an active treatment for their condition… namely because it is. I know that I can’t go full-on with anything the way I used to. I need more sleep, I need more rest, even mental exertion sometimes becomes too much. Along with the many meds I take, I see rest as an essential treatment that I need every day, and some days more than others. Given the unpredictable nature of this disease and its flares, I just don’t take on as much as I used to professionally. And when I look at what my Spring and Summer have looked like, as I developed a totally new condition out of nowhere that is still not fully understood or definitively diagnosed (after seeing eight different medical professionals – though at least I’m now under the care of two good specialists), I feel very prescient for having decided not to pursue several opportunities that I wanted to do, but I would absolutely have had to drop.

For those of you with disabilities, spoon theory is probably quite familiar. We only have so many spoons each day – so much capacity for deep thought, stress, physical exertion, and even social interaction before we crash. And crashing often leads to further disability – for example, overexerting myself one day could lead (and has led) to a flare of pain, fatigue, and a host of other symptoms that lasts weeks or even a month. So we try to plan our lives around leaving a few spoons in reserve each day, because stuff comes up, right? Our kid tells us as we’re going to bed that they need help with a project that’s due tomorrow. Our colleague is unable to do their part for a presentation we’re supposed to give together tomorrow and we need to figure out how to deliver their part as well. Our spouse gets sick and we have to take care of everything at home on our own. You can’t plan for everything and it’s inevitable that there will be times when you’re going to use up all your spoons and then some, but learning to plan around your capacity and leave some in reserve is a critical skill for those of us with disabilities. And learning how many spoons we have for different types of activities is a process, one that feels like building a sandcastle next to the water at low tide. It’s an ever-changing, endless process.

Even if you don’t live with disabling conditions, I can promise you that you only have so many spoons for each day. If you have a bad tension headache at the end of a workday, if your mind is racing when you try to go to sleep, if your shoulders are knotted and tight, if you’re snapping at the people you love because you’re all out of patience when you get home, if you’re so mentally exhausted that you can’t even make a simple decision like what to eat… those (and others) are signs that you have pushed yourself too hard. Even if you’re not disabled, pushing yourself beyond your capacity disables you, at least temporarily. It makes you less capable of reflection, attention, patience, and solid decision-making. As I’ve mentioned in the past, having too much on our plates (called “time poverty” in the literature) has been shown to increase our risk of anxiety and depression. And repeatedly pushing yourself too hard puts you at much greater risk of burnout. Whether you are disabled or not, there are consequences for working beyond your capacity.

And yet, so many of us overwork. For some of us, that’s more the norm than the rare exception, to the point where we see doing our contracted amount of work as underperforming, as lazy, as letting people down. Instead of looking at our to-do lists and seeing that we’re being asked to do way more than is reasonable, we assume that we just need to find new ways to become more productive. Because the failure must be ours, not the system of work that keeps intensifying, and asking us to do more and become expert in more and more things. 

And being productive is a seductive thing, especially for people who have self-esteem issues. If you feel you’re not enough, meeting deadlines and getting things done can make you feel good about yourself temporarily. But it can easily become more about chasing the dopamine hit that comes from completing a task than about doing something meaningful. I think a lot of productivity is that way — feeling busy and getting things done can make us feel useful. If we’re busy, we must be worthwhile, right? It’s sort of a hedge against our existential worries. I must be a good person if I’m getting all these things done on time!

I’ve come to recognize that I feel a strong need to show people that I’m a person who lives up to their commitments and respects other people’s schedules and needs. Basically, I want to be liked, probably (definitely) to an unhealthy extent, and I spend a lot of time worrying that I’m inconveniencing or pissing off others. A library is very much an interdependent ecosystem where one person’s failure to complete a task can impact the timelines and workloads of others. For example, I’ve seen the negative impact that waiting until the end of the fiscal year to do the bulk of one’s book ordering has on our Technical Services staff. I don’t want to be the sort of person who causes stress for another colleague. That said, I think I’m a bit compulsive about my reliability to the point where I put completing tasks on time (even relatively unimportant ones) over my own wellbeing. 

I think how we treat productivity comes from the stories we tell ourselves about who we are. I grew up hearing that I was a uniquely terrible kid and thinking I was inherently unlovable, and while I’ve become much more confident in myself, that assumption still hangs over me and colors my interpretation of everything. I think it’s very hard to feel deserving of rest when you are worried about what your colleagues will think if you have to rely on them because you can’t get x or y done. If you think you’re an inherently good person who is just as deserving of kindness and grace as anyone else at work, I imagine it would be a lot easier to do what you need to stay well.

Sleep by Eugène Carrière (1897)

I was extremely sick and was barely sleeping from March through July and I only took one day of sick leave because I felt like I needed to get all the tasks done before the end of the academic year (I’m on a 9-month contract). And it truly did take me every single one of those days I had left to complete everything. Would it have been the end of the world if I’d taken some sick leave to rest and came back to some of the projects in September? No. But I was also feeling a lot of guilt about needing people to cover for me for certain parts of my job in the Spring due to my new illness and felt the need to overcompensate by being super-productive. 

I want to feel comfortable not completing things. I want to feel ok looking at to-do lists that I know I won’t complete at the end of the term or the year. I want to feel like I can take a sick day if I’ve slept only a few hours, even if it’s last minute and means my reference shift may not be covered. I want to be ok with letting people down if it means safeguarding my well-being. What’s really stunning is that I have much better boundaries than I used to and they are still fairly pathetic. I don’t take on nearly as much as I used to. I’m ok with saying no. I’m far better at conserving my energy and paying attention to my capacity on any particular day. And yet, I have so far to go, especially as I get sicker.

Last week, my family was visiting colleges in the Northeast. On Thursday, I had to wake up around 2am East Coast time, fly all the way back to the West Coast, and, since I arrived home around 10:30am Pacific time, I felt like I had a whole day to get household chores done, in spite of the fact that I was absolutely wrecked (that productivity urge is really ingrained in me). Instead, I spent the bulk of the day on the couch watching TV, went to sleep at 6:45pm, and have no regrets. It wasn’t laziness that kept me on the couch; it was the right treatment for my body and mind. We need to stop feeling guilt for giving our bodies and minds the comfort and rest they need. 

Do we call taking a medication that we need for our survival “giving in?” What if we treated rest as a productive act like exercise? What if we saw rest as protecting our capacity; our overall ability to show up at work and in our lives? What if we saw it as being as integral to our health as the medications we take? And why are we so willing to cheat ourselves out of rest, often for things that in the long-run are not that important?

It’s one thing for me to take the rest I need at home, another entirely to do it when it will impact my colleagues (and, to be clear, I have amazingly lovely and generous colleagues who all support and cover for one another when life inevitably smacks us in the face). I need to keep reminding myself that it’s better for my workplace to have a healthy, happy colleague who is committed to the work and sometimes needs to take time off to stay healthy than a burnt out husk of a colleague. I need to remind myself that I won’t be able to be reflective, creative, or a solid decision-maker if I am too depleted. In the end, rest is integral to my doing my job well, as it is for all of you. You’re doing a service to your place of work when you take the time you need to rest and get/stay healthy because it makes you better at your job.

If you feel like you’re overworking, that you can’t slow down when you need rest, if you feel guilty for taking sick days when you need them, if you rely on getting things done for your self-worth, it’s worth interrogating the stories you tell about yourself and your work. Is whatever you’re going to do that day really more important than your health and, if so, are you really the only one who can do it? Do you give your colleagues grace when they are sick and take the time they need or if they miss a deadline because they have too many competing demands? Why can’t you extend that same grace to yourself? Why do you think you’re not deserving? (I find it sometimes helps to think of myself in the third person and imagine how I’d feel if my colleague needed whatever it is I do.) And if your workplace sucks and someone is going to resent you for doing what you need to do to take care of yourself, the problem is with them, not you. You deserve rest. We all do.

by Meredith Farkas at August 19, 2025 06:47 PM

David Rosenthal

2025 Optical Media Durability Update

Seven years ago I posted Optical Media Durability and discovered:

Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Here are the subsequent annual updates:
It is time once again for the mind-numbing process of feeding 45 disks through the readers to verify their checksums, and yet again this year every single MD5 was successfully verified. Below the fold, the details.


Month   Media   Good  Bad  Vendor
01/04   CD-R    5x    0    GQ
05/04   CD-R    5x    0    Memorex
02/06   CD-R    5x    0    GQ
11/06   DVD-R   5x    0    GQ
12/06   DVD-R   1x    0    GQ
01/07   DVD-R   4x    0    GQ
04/07   DVD-R   3x    0    GQ
05/07   DVD-R   2x    0    GQ
07/11   DVD-R   4x    0    Verbatim
08/11   DVD-R   1x    0    Verbatim
05/12   DVD+R   2x    0    Verbatim
06/12   DVD+R   3x    0    Verbatim
04/13   DVD+R   2x    0    Optimum
05/13   DVD+R   3x    0    Optimum
The fields in the table are as follows: the month the disks were written, the media type, the number of disks that verified (Good), the number that failed (Bad), and the media vendor. The drives I use from ASUS and LG report read errors from the CDs and older DVDs but verify the MD5s correctly.

Surprisingly, with no special storage precautions, generic low-cost media, and consumer drives, I'm getting good data from CD-Rs more than 21 years old, and from DVD-Rs nearly 19 years old. Your mileage may vary. Tune in again next year for another episode.

Previously I found a NetBSD 1.2 CD dating from October 1996. Each directory has checksums generated by cksum(1), all but one of which verified correctly despite a few read errors. So some of the data on that CD is bad after nearly 29 years.

by David. (noreply@blogger.com) at August 19, 2025 03:00 PM

August 18, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-18: Eight WSDL Classes Offered for Fall 2025

 

https://xkcd.com/2237/

Eight courses from the Web Science and Digital Libraries (WS-DL) Group will be offered for Fall 2025. The classes will be a mixture of online, face-to-face (f2f), and hybrid.

Dr. Michael L. Nelson will not be teaching in Fall 2025.  Non-WSDL faculty member Ms. Nasreen Muhammad Arif will also teach a section of  CS 624, and current WSDL PhD student and newly appointed lecturer Bhanuka Mahanama will teach a section of CS 620.

 The Spring 2026 semester is still undecided, but the following courses are likely:

Previous course offerings: S25, F24, S24, F23, S23, F22, S22, F21, S21, F20, S20, F19, S19, and F18.


--Michael


by Michael L. Nelson (noreply@blogger.com) at August 18, 2025 11:43 PM

Open Knowledge Foundation

From Chaos to Order: A Workshop with ODE to Uncomplicate Data in Brazil

The three-hour online session focused on teaching the fundamentals of data usage and quality to professionals from various fields with different levels of familiarity with open data.

The post From Chaos to Order: A Workshop with ODE to Uncomplicate Data in Brazil first appeared on Open Knowledge Blog.

by Anicely Santos at August 18, 2025 07:56 PM

August 17, 2025

Ed Summers

CD Tree

CD Tree

as seen through a window screen.

August 17, 2025 04:00 AM

August 16, 2025

Ed Summers

Sequential

We are sequential beings. Actions cannot be undone; life, as we experience it, cannot be reversed. The irreversibility of human life is the source of our pain and also our wonder.

Spinoza’s Rooms by Madeleine Thien.

August 16, 2025 04:00 AM

August 15, 2025

Lorcan Dempsey

On the dissemination of ideas and innovation

This is an excerpt from a longer contribution I made to Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools. [pdf] It is a sketch only, and somewhat informal, but I thought I would put it here in case of interest. It occasionally references the position paper. It is also influenced by the context in which it was prepared which was a discussion of the informational disciplines and the iSchool in R1 institutions.
It would be interesting to take a fuller discussion in one of two directions, which would fill in more individual or organizational names. The first is empirical, based on survey, citations, and other markers of influence. A second would be to be more opinionated, which would be partial (in more than one sense) but might prompt some reflection about emphasis and omission.
If you wish to reference it, I would be grateful if you cite the full original: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]
On the dissemination of ideas and innovation

The diffusion of ideas

As numerous critics beyond Kristof have observed, the professionalization of the academy prioritizes peer-reviewed publications over other forms of writing. Professors allocate the bulk of their efforts to researching, writing, and publishing in their field journals. The first task of any professor—particularly junior professors—is to publish in prestigious peer-reviewed outlets. Even scholars who have some facility with engaging a wider audience have warned that it takes time away from research. It is great when academics also express their ideas to a wider audience. Professional incentives dictate, however, that this will always be the hobby and not the job. // Daniel W. Drezner (2017) The Ideas Industry.

The university plays an important role in the generation and diffusion of ideas and innovation. The Report does not focus on this area. However, informing practice and influencing policy is an important part of what the university does, especially in a practice-oriented discipline. As noted, in a period of change, libraries benefit from data-based frameworks, evidence, and arguments to support advocacy work, or to think about new service areas. In a related context, being interviewed on NPR when expertise is required or writing an op-ed in a leading newspaper are markers of esteem (see the discussion of symbolic capital in the next section).


Drezner’s typology of sources

Dan Drezner writes about the dissemination of ideas in The Ideas Industry. Drezner is interested in how ideas are diffused and taken up in political and policy contexts, and how they lead to action or practical decisions. He discusses the evolving sources of ideas in the policy arena.

How does this play out in the library field?

The incentives Drezner mentions are strongly at play in R1 schools and may not be aligned with broader community engagement. This is evident in the comments of the Early Career Researchers. Of course, taken collectively iSchools do work which influences both practice and policy, and there are some notable connections (Sheffield and exploration of open access, for example). There are also some high-profile iSchool faculty members who make important and visible contributions to broader debate outside the library context.

While they are not think-tanks as such, one can point to Ithaka S&R and OCLC Research, divisions, respectively, of large not-for-profit service organizations, each of which is quite active in working with groups of libraries to develop applied R&D outputs.[1] They tend to focus on areas of topical interest, such as collections, collaboration, research infrastructure and scholarly communication. Over the years, they have worked on a variety of topics (including, for example, metadata and protocols, research support, library collaboration, and user behavior in the case of OCLC Research). Ithaka S&R has an academic and cultural focus. OCLC Research works with academic and public libraries (notably through WebJunction, a learning platform for libraries). In each case, there is definitely an interest in providing knowledge, evidence and models that help influence practice or inform policy.

This interest is also evident in the output of professional associations and others which produce outputs on behalf of members. While different from Drezner’s consultancy category, there are some parallels in terms of providing value to members. Here one might point to the Urban Libraries Council or to the Association of Research Libraries and the Coalition for Networked Information, or to the divisions of ALA. ARSL is another example.

Advocacy and other groups also produce materials to inform and guide. Helping with evidence and arguments is important here. SPARC and EveryLibrary are examples.

An important element of what associations and membership groups do is to provide venues for networking and to support communities of practice. They aim to scale learning and innovation within their constituencies.


One can also see that vendors produce occasional reports, as a value-add to customers. Think of Sage or Clarivate for example. In some cases, these may not be seen as more than elevated marketing.

Finally, there is a variety of individual practitioner voices that are quite influential.

I have not given a lot of examples above, because without some analysis, it would be very subjective. However, some exploration of the diffusion of ideas and innovation in this space would be interesting, acknowledging that it is a smaller, more tight-knit community than some of the areas Drezner (who is a scholar of and commentator on International Relations) discusses.

Public intellectuals and thought leaders

Public intellectuals delight in taking issue with various parts of the conventional wisdom. By their very nature, however, they will be reluctant to proffer alternative ideas that appeal to any mass audience. Thought leaders will have no such difficulty promising that their ideas will disrupt or transform the status quo. And the shifts discussed in this chapter only increase the craving for clear, appealing answers. // Daniel W. Drezner (2017) The Ideas Industry.
This is an inherent tension between scholarship and communication, one that breeds resentment for academics trying to engage a wider audience as well as readers who have to wade through complex, cautious prose. // Daniel W. Drezner (2017) The Ideas Industry.

In Drezner’s terms, thought leaders favor large explanatory ideas, and deliberately aim to influence policy and practice. They value clear communication, may view the world through a single frame, and evangelize their ideas actively. Thomas Friedman is an example in the book. Public intellectuals promote critical thought across different arenas, may not offer easy solutions or answers, and emphasize complexity and questioning. Francis Fukuyama and Noam Chomsky are cited examples here.

Drezner notes that the current climate favors thought leaders because their ideas are easier to consume: their big idea can be delivered in a Ted Talk. Perhaps the library community hit peak thought leadership in the heyday of the personal blog, where several influential librarians achieved large audiences.

Platform publications

Computing has Communications of the ACM. Engineering has IEEE Spectrum. Business readers turn to the Harvard Business Review. The higher education (HE) technology community has Educause Review.

These are what I have called in the past ‘platform’ publications (Dempsey and Walter). They aggregate the attention of an audience within a particular domain, including leadership, practice and research. They provide a platform for their authors, who may be reasonably assured of a broad engaged audience.

The library community does not have such a publication, which could provide a venue for research and practice to co-exist in dialog.  

iSchools and influence on policy and practice

What is the role of the iSchool in influencing policy and informing practice? More specifically, how important is it for Library Studies to visibly do this?

The bilateral research-practice connection is of course much discussed, and I wondered above about the gap here. Is the influence on policy at various levels perhaps less discussed?

This works in a variety of ways, not least through participation in professional venues – membership of associations, presentation where practitioners congregate to learn about direction, partnership with library organizations and libraries.

Again, without supporting evidence, my impression is that there may be a higher level of engagement with policy and practice in Archival Studies than in Library Studies when measured against overall research and education capacity.

I believe that markers of influence are important for elevating the overall profile of Library Studies, and that the initiative should look at this engagement in a further iteration of this work. A comparative perspective would be interesting, thinking firstly of the LAM strands, and then of other practice-oriented disciplines. How do library studies perform in terms of impact on policy/practice compared to other comparable disciplines?

This seems especially important now, given the importance of evidence and arguments in a time of contested value and values.

Collection: LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]

Contribution: Dempsey, L. (2025). Library Studies, the Informational Disciplines, and the iSchool: Some Remarks Prompted by LIS Forward. In LIS Forward (2025) Responses to the LIS Forward Position Paper: Ensuring a Vibrant Future for LIS in iSchools, The Friday Harbor Papers, Volume 2. [pdf]

Contents: Here are the sections from my contribution. Where I have excerpted them on this site, I provide a link.

 


[1] Buschman (2020) talks about ‘white papers’ in the library space, and, focusing attention on the outputs of Ithaka S&R, describes them as ‘empty calories.’

References

Buschman, J. (2020). Empty calories? A fragment on LIS white papers and the political sociology of LIS elites. The Journal of Academic Librarianship, 46(5), 102215. https://doi.org/10.1016/j.acalib.2020.102215

Dempsey, L., & Walter, S. (2014). A platform publication for a time of accelerating change. College and Research Libraries, 75(6), 760–762. https://doi.org/10.5860/crl.75.6.760

Drezner, D. W. (2017). The Ideas Industry: How Pessimists, Partisans, and Plutocrats are Transforming the Marketplace of Ideas. Oxford University Press.

by Lorcan Dempsey at August 15, 2025 01:17 AM

August 14, 2025

David Rosenthal

The Drugs Are Taking Hold

cyclonebill CC-BY-SA
In The Selling Of AI I compared the market strategy behind the AI bubble to the drug-dealer's algorithm, "the first one's free". As the drugs take hold of an addict, three things happen: the price rises, bigger doses are needed, and the deleterious effects kick in. As expected, this is what is happening to AI. Follow me below the fold for the details.

The price rises

Ethan Ding starts tokens are getting more expensive thus:
imagine you start a company knowing that consumers won't pay more than $20/month. fine, you think, classic vc playbook - charge at cost, sacrifice margins for growth. you've done the math on cac, ltv, all that. but here's where it gets interesting: you've seen the a16z chart showing llm costs dropping 10x every year.

so you think: i'll break even today at $20/month, and when models get 10x cheaper next year, boom - 90% margins. the losses are temporary. the profits are inevitable.

it’s so simple a VC associate could understand it:

year 1: break even at $20/month

year 2: 90% margins as compute drops 10x

year 3: yacht shopping

it’s an understandable strategy: "the cost of LLM inference has dropped by a factor of 3 every 6 months, we’ll be fine”
The first problem with this is that only 8% of the users will pay the $20/month; the other 92% use it for free (Menlo Ventures thinks it is only 3%). Indeed it turns out that an entire government agency only pays $1/year, as Samuel Axon reports in US executive branch agencies will use ChatGPT Enterprise for just $1 per agency:
The workers will have access to ChatGPT Enterprise, a type of account that includes access to frontier models and cutting-edge features with relatively high token limits, alongside a more robust commitment to data privacy than general consumers of ChatGPT get. ChatGPT Enterprise has been trialed over the past several months at several corporations and other types of large organizations.

The workers will also have unlimited access to advanced features like Deep Research and Advanced Voice Mode for a 60-day period. After the one-year trial period, the agencies are under no obligation to renew.
Did I mention the drug-dealer's algorithm?

But that's not one of the two problems Ding is discussing. He is wondering why instead of yacht shopping, this happened:
but after 18 months, margins are about as negative as they’ve ever been… windsurf’s been sold for parts, and claude code has had to roll back their original unlimited $200/mo tier this week.

companies are still bleeding. the models got cheaper - gpt-3.5 costs 10x less than it used to. but somehow the margins got worse, not better.
What the a16z graph shows is not just the rapid reduction in cost per token of each specific model, but also the rapid pace at which each specific model is supplanted by a better successor. Ding notes that users want the current best model:
gpt-3.5 is 10x cheaper than it was. it's also as desirable as a flip phone at an iphone launch.

when a new model is released as the SOTA, 99% of the demand immediately shifts over to it. consumers expect this of their products as well.
Which causes the first of the two problems Ding is describing. His graph shows that the cost per token of the model users actually want is approximately constant:
the 10x cost reduction is real, but only for models that might as well be running on a commodore 64.

so this is the first faulty pillar of the “costs will drop” strategy: demand exists for "the best language model," period. and the best model always costs about the same, because that's what the edge of inference costs today.
...
when you're spending time with an ai—whether coding, writing, or thinking—you always max out on quality. nobody opens claude and thinks, "you know what? let me use the shitty version to save my boss some money." we're cognitively greedy creatures. we want the best brain we can get, especially if we’re balancing the other side with our time.
So the business model based on the cost of inference dropping 10x per year doesn't work. But that isn't the worst of the two problems. While it is true that the cost in dollars of a set number of tokens is roughly constant, the number of tokens a user needs is not:
while it's true each generation of frontier model didn't get more expensive per token, something else happened. something worse. the number of tokens they consumed went absolutely nuclear.

chatgpt used to reply to a one sentence question with a one sentence reply. now deep research will spend 3 minutes planning, and 20 minutes reading, and another 5 minutes re-writing a report for you while o3 will just run for 20-minutes to answer “hello there”.

the explosion of rl and test-time compute has resulted in something nobody saw coming: the length of a task that ai can complete has been doubling every six months. what used to return 1,000 tokens is now returning 100,000.
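Ding's two curves can be sketched numerically: hold the frontier's per-token price flat and let tokens-per-task double every six months, and the dollar cost of a task explodes even though "cost per token" never moves. The starting size and price below are illustrative assumptions, not Ding's data:

```python
# Illustrative assumption: the frontier price per token stays flat while the
# tokens a task consumes double every six months (Ding's observation).
FRONTIER_PRICE_PER_MTOK = 10.0   # dollars per million tokens -- assumed figure

def tokens_per_task(months: int, start: int = 1_000) -> int:
    """Tokens consumed by a typical task, doubling every six months."""
    return start * 2 ** (months // 6)

def task_cost(months: int) -> float:
    """Dollar cost of one task at the flat frontier price."""
    return tokens_per_task(months) * FRONTIER_PRICE_PER_MTOK / 1_000_000

# Seven doublings (42 months) turn a 1,000-token task into a 128,000-token
# task: roughly the 1,000 -> 100,000 jump described above, at 128x the cost.
```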
Users started by trying fairly simple tasks on fairly simple models. The power users, the ones in the 8%, were happy with the results and graduated to trying complex questions on frontier models. So their consumption of tokens exploded:
today, a 20-minute "deep research" run costs about $1. by 2027, we'll have agents that can run for 24 hours straight without losing the plot… combine that with the static price of the frontier? that’s a ~$72 run. per day. per user. with the ability to run multiple asynchronously.

once we can deploy agents to run workloads for 24 hours asynchronously, we won't be giving them one instruction and waiting for feedback. we'll be scheduling them in batches. entire fleets of ai workers, attacking problems in parallel, burning tokens like it's 1999.

obviously - and i cannot stress this enough - a $20/month subscription cannot even support a user making a single $1 deep research run a day. but that's exactly what we're racing toward. every improvement in model capability is an improvement in how much compute they can meaningfully consume at a time.
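The arithmetic behind that claim is easy to check, using the post's own figures (a $20/month subscription, roughly $1 per deep-research run, roughly $72 for a projected 24-hour agent run):

```python
# Figures from the post: a $20/month subscription, ~$1 per 20-minute
# deep-research run, ~$72 for a projected 24-hour agent run.
SUBSCRIPTION = 20.0   # dollars per month
DAYS = 30

def monthly_compute_cost(run_cost: float, runs_per_day: float = 1.0) -> float:
    """Provider-side compute cost of serving one subscriber for a month."""
    return run_cost * runs_per_day * DAYS

deep_research = monthly_compute_cost(1.00)   # one $1 run a day: $30/month
agent = monthly_compute_cost(72.00)          # one 24-hour agent run a day: $2,160/month
margin = SUBSCRIPTION - deep_research        # already -$10 per user per month
```

Even the mildest case loses money on every subscriber, before any free-tier users are counted.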
The power users were on Anthropic's unlimited plan, so this happened:
users became api orchestrators running 24/7 code transformation engines on anthropic's dime. the evolution from chat to agent happened overnight. 1000x increase in consumption. phase transition, not gradual change.

so anthropic rolled back unlimited. they could've tried $2000/month, but the lesson isn't that they didn't charge enough, it’s that there’s no way to offer unlimited usage in this new world under any subscription model.

it's that there is no flat subscription price that works in this new world.
Ed Zitron's AI Is A Money Trap looks at the effect of Anthropic figuring this out on Cursor:
the single-highest earning generative AI company that isn’t called OpenAI or Anthropic, and the highest-earning company built on top of (primarily) Anthropic’s technology.
When Anthropic decided to reduce the rate at which they were losing money, Cursor's business model collapsed:
In mid-June — a few weeks after Anthropic introduced “priority tiers” that required companies to pay up-front and guarantee a certain throughput of tokens and increased costs on using prompt caching, a big part of AI coding — Cursor massively changed the amount its users could use the product, and introduced a $200-a-month subscription.
Cursor's customers weren't happy:
Cursor’s product is now worse. People are going to cancel their subscriptions. Its annualized revenue will drop, and its ability to raise capital will suffer as a direct result. It will, regardless of this drop in revenue, have to pay the cloud companies what it owes them, as if it had the business it used to. I have spoken to a few different people, including a company with an enterprise contract, that are either planning to cancel or trying to find a way out of their agreements with Cursor.
So Cursor, which was already losing money, will have less income and higher costs. They are the largest company built on the AI majors' platforms, despite only earning "around $42 million a month", and Anthropic just showed that their business model doesn't work. This isn't a good sign for the generative AI industry and thus, as Zitron explains in detail, for the persistence of the AI bubble.

Ding explains what OpenAI's $1/year/agency deal is all about by pointing to similar deals at the big banks:
this is what devin's all in on. they’ve recently announced their citi and goldman sachs partnerships, deploying devin to 40,000 software engineers at each company. at $20/mo this is a $10M project, but here’s a question: would you rather have $10M of ARR from goldman sachs or $500m from prosumer developers?

the answer is obvious: six-month implementations, compliance reviews, security audits, procurement hell mean that that goldman sachs revenue is hard to win — but once you win it it’s impossible to churn. you only get those contracts if the singular decision maker at the bank is staking their reputation on you — and everyone will do everything they can to make it work.
Once the organization is hooked on the drug, they don't care what it costs because both the real and the political switching costs are intolerable.

Bigger doses are needed

Anjli Raval reports that The AI job cuts are accelerating:
Even as business leaders claim AI is “redesigning” jobs rather than cutting them, the headlines tell another story. It is not just Microsoft but Intel and BT that are among a host of major companies announcing thousands of lay-offs explicitly linked to AI. Previously when job cuts were announced, there was a sense that these were regrettable choices. Now executives consider them a sign of progress. Companies are pursuing greater profits with fewer people.

For the tech industry, revenue per employee has become a prized performance metric. Y Combinator start-ups brag about building companies with skeleton teams. A website called the “Tiny Teams Hall of Fame” lists companies bringing in tens or hundreds of millions of dollars in revenue with just a handful of employees.
Brandon Vigliarolo's IT firing spree: Shrinking job market looks even worse after BLS revisions has the latest data:
The US IT jobs market hasn't exactly been robust thus far in 2025, and downward revisions to May and June's Bureau of Labor Statistics data mean IT jobs lost in July are part of an even deeper sector slowdown than previously believed.

The Bureau of Labor Statistics reported relatively flat job growth last month, but unimpressive payroll growth numbers hid an even deeper reason to be worried: Most of the job growth reported (across all employment sectors) in May and June was incorrect.

According to the BLS, May needed to be revised down by 125,000 jobs to just 19,000 added jobs; June had to be revised down by even more, with 133,000 erroneous new jobs added to company payrolls that month. That meant just 14,000 new jobs were added in June.
...
Against that backdrop, Janco reports that BLS data peg the IT-sector unemployment rate at 5.5 percent in July - well above the national rate of 4.2 percent. Meanwhile, the broader tech occupation unemployment rate was just 2.9 percent, as reported by CompTIA.
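The originally reported figures can be recovered from the revised numbers plus the size of each revision, which makes plain how much of the reported growth evaporated:

```python
# Numbers from the article: each month's originally reported payroll gain is
# the revised figure plus the size of the downward revision.
revisions = {
    "May": {"revised": 19_000, "cut": 125_000},
    "June": {"revised": 14_000, "cut": 133_000},
}

initial = {m: v["revised"] + v["cut"] for m, v in revisions.items()}
# May was first reported as 144,000 new jobs and June as 147,000, so about
# 87% and 90% of the originally reported growth disappeared in revision.
overstated = {m: v["cut"] / initial[m] for m, v in revisions.items()}
```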
Note these points from Janco's table:
The doses are increasing, but their effect in pumping the stock isn't keeping pace; the NDXT index of tech stocks hasn't been heading moonwards over the last year.

CEOs have been enthusiastically laying off expensive workers and replacing them with much cheaper indentured servants on H-1B visas, as Dan Gooding reports in H-1B Visas Under Scrutiny as Big Tech Accelerates Layoffs:
The ongoing reliance on the H-1B comes as some of these same large companies have announced sweeping layoffs, with mid-level and senior roles often hit hardest. Some 80,000 tech jobs have been eliminated so far this year, according to the tracker Layoffs.fyi.
Gooding notes that:
In 2023, U.S. colleges graduated 134,153 citizens or green card holders with bachelor's or master's degrees in computer science. But the same year, the federal government also issued over 110,000 work visas for those in that same field, according to the Institute for Sound Public Policy (IFSPP).

"The story of the H-1B program is that it's for the best and the brightest," said Jeremy Beck, co-president of NumbersUSA, a think tank calling for immigration reform. "The reality, however, is that most H-1B workers are classified and paid as 'entry level.' Either they are not the best and brightest or they are underpaid, or both."
While it is highly likely that most CEOs have drunk the Kool-Aid and actually believe that AI will replace the workers they fired, Liz Fong-Jones believes that:
the megacorps use AI as pretext for layoffs, but actually rooted in end of 0% interest, changes to R&D tax credit (S174, h/t @pragmaticengineer.com for their reporting), & herd mentality/labour market fixing. they want investors to believe AI is driving cost efficiency.

AI today is literally not capable of replacing the senior engineers they are laying off. corps are in fact getting less done, but they're banking on making an example of enough people that survivors put their heads down and help them implement AI in exchange for keeping their jobs... for now.
Note that the megacorps are monopolies, so "getting less done" and delivering worse product by using AI isn't a problem for them — they won't lose business. It is just more enshittification.

Presumably, most CEOs think they have been laying off the fat, and replacing it with cheaper workers whose muscle is enhanced by AI, thereby pumping the stock. But they can't keep doing this; they'd end up with a C-suite surrounded by short-termers on H-1Bs with no institutional memory of how the company actually functions. This information would have fallen off the end of the AIs' context.

The deleterious effects kick in

The deleterious effects come in three forms. Within the companies, as the hype about AI's capabilities meets reality. For the workers, and not just those who were laid off. And in the broader economy, as the rush to build AI data centers meets limited resources.

The companies

But Raval sees the weakening starting:
But are leaner organisations necessarily better ones? I am not convinced these companies are more resilient even if they perform better financially. Faster decision making and lower overheads are great, but does this mean fewer resources for R&D, legal functions or compliance? What about a company’s ability to withstand shocks — from supply chain disruptions to employee turnover and dare I say it, runaway robots?

Some companies such as Klarna have reversed tack, realising that firing hundreds of staff and relying on AI resulted in a poorer customer service experience. Now the payments group wants them back.
Of course, the tech majors have already enshittified their customer experience, so they can impose AI on their customers without fear. But AI is enshittifying the customer experience of smaller companies who have actual competitors.

The workers

Shannon Pettypiece reports that 'A black hole': New graduates discover a dismal job market:
NBC News asked people who recently finished technical school, college or graduate school how their job application process was going, and in more than 100 responses, the graduates described months spent searching for a job, hundreds of applications and zero responses from employers — even with degrees once thought to be in high demand, like computer science or engineering. Some said they struggled to get an hourly retail position or are making salaries well below what they had been expecting in fields they hadn’t planned to work in.
And Anjli Raval notes that The AI job cuts are accelerating:
Younger workers should be particularly concerned about this trend. Entire rungs on the career ladder are taking a hit, undermining traditional job pathways. This is not only about AI of course. Offshoring, post-Covid budget discipline, and years of underwhelming growth have made entry-level hiring an easy thing to cut. But AI is adding to pressures.
...
The consequences are cultural as well as economic. If jobs aren’t readily available, will a university degree retain its value? Careers already are increasingly “squiggly” and not linear. The rise of freelancing and hiring of contractors has already fragmented the nature of work in many cases. AI will only propel this.
...
The tech bros touting people-light companies underestimate the complexity of business operations and corporate cultures that are built on very human relationships and interactions. In fact, while AI can indeed handle the tedium, there should be a new premium on the human — from creativity and emotional intelligence to complex judgment. But that can only happen if we invest in those who bring those qualities and teach the next generation of workers — and right now, the door is closing on many of them.
In Rising Young Worker Despair in the United States, David G. Blanchflower & Alex Bryson describe some of the consequences:
Between the early 1990s and 2015 the relationship between mental despair and age was hump-shaped in the United States: it rose to middle-age, then declined later in life. That relationship has now changed: mental despair declines monotonically with age due to a rise in despair among the young. However, the relationship between age and mental despair differs by labor market status. The hump-shape in age still exists for those who are unable to work and the unemployed. The relation between mental despair and age is broadly flat, and has remained so, for homemakers, students and the retired. The change in the age-despair profile over time is due to increasing despair among young workers. Whilst the relationship between mental despair and age has always been downward sloping among workers, this relationship has become more pronounced due to a rise in mental despair among young workers. We find broad-based evidence for this finding in the Behavioral Risk Factor Surveillance System (BRFSS) of 1993-2023, the National Survey on Drug Use and Health (NSDUH), 2008-2023, and in surveys by Pew, the Conference Board and Johns Hopkins University.
History tends to show that large numbers of jobless young people despairing of their prospects for the future is a pre-revolutionary situation.

The economy

Source
Bryce Elder's What’ll happen if we spend nearly $3tn on data centres no one needs? points out the huge size of the AI bubble:
The entire high-yield bond market is only valued at about $1.4tn, so private credit investors putting in $800bn for data centre construction would be huge. A predicted $150bn of ABS and CMBS issuance backed by data centre cash flows would triple those markets’ current size. Hyperscaler funding of $300bn to $400bn a year compares with annual capex last year for all S&P 500 companies of about $950bn.

It’s also worth breaking down where the money would be spent. Morgan Stanley estimates that $1.3tn of data centre capex will pay for land, buildings and fit-out expenses. The remaining $1.6tn is to buy GPUs from Nvidia and others. Smarter people than us can work out how to securitise an asset that loses 30 per cent of its value every year, and good luck to them.
Brian Merchant argues that this spending is so big it is offsetting the impact of the tariffs in The AI bubble is so big it's propping up the US economy (for now):
Over the last six months, capital expenditures on AI—counting just information processing equipment and software, by the way—added more to the growth of the US economy than all consumer spending combined. You can just pull any of those quotes out—spending on IT for AI is so big it might be making up for economic losses from the tariffs, serving as a private sector stimulus program.
Source

Noah Smith's Will data centers crash the economy? focuses on the incredible amounts the big four — Google, Meta, Microsoft, and Amazon — are spending:
For Microsoft and Meta, this capital expenditure is now more than a third of their total sales.
Smith notes that, as a proportion of GDP, this roughly matches the peak of the telecom boom:
That would have been around 1.2% of U.S. GDP at the time — about where the data center boom is now. But the data center boom is still ramping up, and there’s no obvious reason to think 2025 is the peak,
The fiber optic networks that, a quarter-century later, are bringing you this post were the result of the telecom boom.

Source
Over-investment is back, but might this be a good thing?
I think it’s important to look at the telecom boom of the 1990s rather than the one in the 2010s, because the former led to a gigantic crash. The railroad boom led to a gigantic crash too, in 1873 ... In both cases, companies built too much infrastructure, outrunning growth in demand for that infrastructure, and suffered a devastating bust as expectations reset and loans couldn’t be paid back.

In both cases, though, the big capex spenders weren’t wrong, they were just early. Eventually, we ended up using all of those railroads and all of those telecom fibers, and much more. This has led a lot of people to speculate that big investment bubbles might actually be beneficial to the economy, since manias leave behind a surplus of cheap infrastructure that can be used to power future technological advances and new business models.

But for anyone who gets caught up in the crash, the future benefits to society are of cold comfort.
Source
How likely is the bubble to burst? Elder notes just one reason:
Morgan Stanley estimates that more than half of the new data centres will be in the US, where there’s no obvious way yet to switch them on:

America needs to find an extra 45GW for its data farms, says Morgan Stanley. That’s equivalent to about 10 per cent of all current US generation capacity, or “23 Hoover Dams”, it says. Proposed workarounds to meet the shortfall include scrapping crypto mining, putting data centres “behind the meter” in nuclear power plants, and building a new fleet of gas-fired generators.
Good luck with that! It is worth noting that the crash has already happened in China, as Caiwei Chen reports in China built hundreds of AI data centers to catch the AI boom. Now many stand unused.:
Just months ago, a boom in data center construction was at its height, fueled by both government and private investors. However, many newly built facilities are now sitting empty. According to people on the ground who spoke to MIT Technology Review—including contractors, an executive at a GPU server company, and project managers—most of the companies running these data centers are struggling to stay afloat. The local Chinese outlets Jiazi Guangnian and 36Kr report that up to 80% of China’s newly built computing resources remain unused.
Elder also uses the analogy with the late 90s telecom bubble:
In 2000, at the telecoms bubble’s peak, communications equipment spending topped out at $135bn annualised. The internet hasn’t disappeared, but most of the money did. All those 3G licences and fibre-optic city loops provided zero insulation from default:

Peak data centre spend this time around might be 10 times higher, very approximately, with public credit investors sharing the burden more equally with corporates. The broader spread of capital might mean a slower unwind should GenAI’s return on investment fail to meet expectations, as Morgan Stanley says. But it’s still not obvious why creditors would be coveting a server shed full of obsolete GPUs that’s downwind of a proposed power plant.
When the bubble bursts, who will lose money?
A data center bust would mean that Big Tech shareholders would lose a lot of money, like dot-com shareholders in 2000. It would also slow the economy directly, because Big Tech companies would stop investing. But the scariest possibility is that it would cause a financial crisis.

Financial crises tend to involve bank debt. When a financial bubble and crash is mostly a fall in the value of stocks and bonds, everyone takes losses and then just sort of walks away, a bit poorer — like in 2000. Jorda, Schularick, and Taylor (2015) survey the history of bubbles and crashes, and they find that debt (also called “credit” and “leverage”) is a key predictor of whether a bubble ends up hurting the real economy.
The Jorda et al paper is When Credit Bites Back: Leverage, Business Cycles, and Crises, and what they mean by "credit" and "leverage" is bank loans.

Smith looks at whether the banks are lending:
So if we believe this basic story of when to be afraid of capex busts, it means that we have to care about who is lending money to these Big Tech companies to build all these data centers. That way, we can figure out whether we’re worried about what happens to those lenders if Big Tech can’t pay the money back.
And so does The Economist:
During the first half of the year investment-grade borrowing by tech firms was 70% higher than in the first six months of 2024. In April Alphabet issued bonds for the first time since 2020. Microsoft has reduced its cash pile but its finance leases—a type of debt mostly related to data centres—nearly tripled since 2023, to $46bn (a further $93bn of such liabilities are not yet on its balance-sheet). Meta is in talks to borrow around $30bn from private-credit lenders including Apollo, Brookfield and Carlyle. The market for debt securities backed by borrowing related to data centres, where liabilities are pooled and sliced up in a way similar to mortgage bonds, has grown from almost nothing in 2018 to around $50bn today.

The rush to borrow is more furious among big tech’s challengers. CoreWeave, an ai cloud firm, has borrowed liberally from private-credit funds and bond investors to buy chips from Nvidia. Fluidstack, another cloud-computing startup, is also borrowing heavily, using its chips as collateral. SoftBank, a Japanese firm, is financing its share of a giant partnership with Openai, the maker of ChatGPT, with debt. “They don’t actually have the money,” wrote Elon Musk when the partnership was announced in January. After raising $5bn of debt earlier this year xai, Mr Musk’s own startup, is reportedly borrowing $12bn to buy chips.
Smith focuses on private credit:
These are the potentially scary part. Private credit funds are basically companies that take investment, borrow money, and then lend that money out in private (i.e. opaque) markets. They’re the debt version of private equity, and in recent years they’ve grown rapidly to become one of the U.S. economy’s major categories of debt:
Source
Are the banks vulnerable to private credit?
Private credit funds take some of their financing as equity, but they also borrow money. Some of this money is borrowed from banks. In 2013, only 1% of U.S. banks’ total loans to non-bank financial institutions was to private equity and private credit firms; today, it’s 14%.

BDCs are “Business Development Companies”, which are a type of private credit fund. If there’s a bust in private credit, that’s an acronym you’ll be hearing a lot.

And I believe the graph above does not include bank purchases of bonds (CLOs) issued by private credit companies. If private credit goes bust, those bank assets will go bust too, making banks’ balance sheets weaker.
The fundamental problem here is that an AI bust would cause losses that would be both very large and very highly correlated, and thus very likely to be a tail risk not adequately accounted for by the banks’ risk models, just as similarly large, highly correlated losses forced the bank bail-outs of the Global Financial Crisis of 2008.

by David. (noreply@blogger.com) at August 14, 2025 03:00 PM

Ed Summers

The Book of Records

I recently finished Madeleine Thien’s The Book of Records and found this in the acknowledgements at the end:

The Book of Records, guided by histories, letters, philosophies, poetry, mathematics and physics, is a work of the imagination. I am indebted to the library, and to librarians, archivists and translators, for their companionship and light–they are the steadfast keepers of the building made of time.

I like how this blends the people and infrastructure of libraries and record keeping, and recognizes them as partners in imagination. Reading and writing are the central themes of this beautiful book, which David Naimon describes well in the opening to his extended interview with her:

The Book of Records is many things: a book of historical fiction and speculative fiction, a meditation on time and on space-time, on storytelling and truth, on memory and the imagination, a book that impossibly conjures the lives and eras of the philosopher Baruch Spinoza, the Tang dynasty poet Du Fu and the political theorist Hannah Arendt not as mere ghostly presences but portrayed as vividly and tangibly as if they lived here and now in the room where we hold this very book. But most of all this is a book about books, about words as amulets, about stories as shelters, about novels as life rafts, about strangers saving strangers, about friendships that defy both space and time, about choosing, sometimes at great risk to oneself, life and love.

I will add that the underlying theme of being a refugee from various forms of fascism and totalitarianism amidst a catastrophically changing climate really speaks to our moment–especially considering that the book took her ten years to write.

I heard in the interview that Thien worked through copying Spinoza’s Ethics as an exercise while writing The Book of Records. I don’t know if I’m going to do this, but I did enjoy the sections on Spinoza a lot, and previously enjoyed reading about how his philosophy informed Joyful Militancy, so I got a copy too. Fun fact: George Eliot (Mary Ann Evans) wrote the first English translation of Ethics in 1856, but it sat unpublished until 1981.

August 14, 2025 04:00 AM

Library | Ruth Kitchin Tillman

Deeper Dive into Estimating BTAA Sociology Serials Holdings with WMS APIs, Z39.50, and Spreadsheets

Two years ago, my colleague Stephen Woods approached me about collaborating on an article1 extending research he’d already performed about serials use in doctoral sociology work. He and another colleague, John Russell, had developed a methodology for determining “CDRank” based on the number of times a journal was cited across a dissertation dataset and the number/% of dissertations it was cited in.2

On his sabbatical, Stephen had mined citations in 518 sociology dissertations from Big Ten schools. He planned to perform a CDRank analysis and determine the most influential journals by school and see where they overlapped (or didn’t). He had a spreadsheet of titles and ISSNs for the 5,659 distinct journals cited and a related question: What did holdings for these look like across the Big Ten Libraries?

He was interested in whether the highest-ranked journals were more universally held, whether there were any noticeable gaps, and, more basically, whether any patterns would emerge if we looked at the Big Ten’s holdings for these journals. And then, at an institution-by-institution level, were any of the most-used journals for that institution not held by its library?

As anyone who works with it knows, holdings data is notoriously difficult. But I was interested in it as a challenge: could I combine a variety of resources to come up with a reasonable assessment of which libraries had some holdings of the serial in question?

Obtaining Library Holdings: The Summary

The journal article was focused on the outcome, so it wasn’t a place for me to write a deep dive of the process I used for identifying holdings. This is the summarized version from the article:

The 57,777 citations were condensed to a list of 5,659 distinct journal title/ISSN entries. Holdings data from across the BTAA was then queried to determine the extent to which these journals are held by BTAA institutions. It was first necessary to obtain all representative ISSNs for each journal. The WorldCat Metadata API 2.0 was queried by ISSN and results were processed to identify additional ISSNs. These additional ISSNs were used in subsequent queries. During this process, 25 titles were identified that did not include ISSNs and the list was reduced to 5,634 unique pairings.

Holdings data was obtained from the WorldCat Search API v.2 and Z39.50 services. First, the WorldCat Search API’s bibliographic holdings endpoint was queried by ISSN and results limited to a list of OCLC symbols representing the BTAA institutions. However, an institution’s WorldCat holdings may not be up-to-date and are unlikely to represent electronic-only items. In the second step, MarcEdit software was used to query each institution’s Z39.50 service by ISSN for any of the 5,634 entities not found at that institution during the WorldCat API phase. This combined holdings data was saved to a JSON database.

In limitations, I addressed some of the challenges I ran into:

Holdings represent those found in WorldCat and respective library ILSes during November 2023. Several factors limit the effectiveness of representing libraries’ journal holdings. Coverage is not recorded in ways which can be easily machine-parsed at scale to determine whether holdings represent part or all of a publication run. E-journal records are often updated on a monthly basis, resulting in varying results by month. Additionally, if a library does not have sufficient staffing to perform updates, their WorldCat holdings statements may not reflect recent weeding. The presence of WorldCat holdings or of a record in the library’s ILS (queried by Z39.50) indicates, at minimum, that the library has held some coverage of this journal at some point in time.

University of Nebraska-Lincoln’s Z39.50 documentation was not available online and email inquiries were not answered, so the Z39.50 phase could not be run. Gaps in Nebraska’s WorldCat holdings for the combined list of top 161 journals were manually queried by title and ISSNs using the library’s Primo discovery search. As indicated in Results, all but two of these journals were found.

Obtaining Library Holdings: The Whole Story

Even for our original intended audience at Serials Review,3 a full writeup would’ve been too deep a dive (buckle in, this is 2500 words), but I really enjoyed (and hated) the challenge of figuring out how to even tackle the project and solving problems along the way (except when I tore my hair out). So I thought I’d share it here.

I need to preface by noting again that my research question was not whether an institution had complete holdings or precisely which holdings they had. It’s challenging to do that at the scale of one’s own institution. My question was:

Does this institution appear to hold print or electronic versions of some segment of this serial?

Processing ISSNs for Siblings, Mostly

First, I evaluated my starting data. I had journal titles and ISSNs. A quick check confirmed my hypothesis that some were for print materials and some were for e-journals. I wanted to check for both kinds, of course.

Because library records don’t yet have rich cluster ISSNs and I didn’t have an API subscription to the ISSN Portal,4 I decided to use the next best thing – WorldCat. I searched brief bibs in the WorldCat Search API v.2 using the ISSN to obtain all records. I used a function to run through the list of ISSNs in a brief bib, clean each one up if needed, and append all ISSNs found to a new list. So my output was the title, the original ISSN, and a list of all ISSNs found.

{ "title": "journal of loss and trauma",
"original_issn": "1532-5024",
"all_issns": ["1532-5024", "1532-5032"] }
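The post doesn’t include the script itself, so here is a minimal sketch of just the aggregation step described above (the function name and shapes are mine; fetching the brief bibs from the WorldCat Search API v.2 is omitted):

```python
def build_issn_record(title, original_issn, brief_bib_issn_lists):
    """Collapse the ISSN lists from several WorldCat brief bibs into
    one record of the shape used throughout the project.

    brief_bib_issn_lists is a list of lists -- one per brief bib
    returned for the original ISSN query.
    """
    all_issns = []
    for issn_list in brief_bib_issn_lists:
        for issn in issn_list:
            if issn not in all_issns:  # preserve order, skip duplicates
                all_issns.append(issn)
    return {"title": title,
            "original_issn": original_issn,
            "all_issns": all_issns}
```

Run against the example above, two brief bibs that each repeat the e-ISSN would still collapse to the two-ISSN list shown.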

Challenges

The first challenge I ran into was that I was relying on data which came from a field originally intended for recording misprints. However, it had been repurposed to do what I wanted – record the ISSN for the other (print / electronic) version of a serial. Frustration with this dual purpose sparked my MARC Misconceptions post re: the 022$y, which explores the problem further. After some rabbit holes attempting to find ways to identify these and some spot checks to identify how often the problem was happening, I finally accepted that the data was only ever going to be generally, rather than perfectly, accurate. I also decided that I would allow for continuations if they showed up in the ISSN data because when a record had more than 2 ISSNs, my spot checking determined it was almost always for one or more continuations vs. another work entirely.

A more concrete problem was that sometimes ISSN data was recorded with hyphens and sometimes it wasn’t. Sometimes it even contained parentheticals. I developed some rather complex processing logic, including regular expressions and substring slices, to turn a field into just the ISSN, formatted as 1234-5678. Using Regex, I reviewed my data and manually corrected the few errors, most of which were caused by a cataloger typoing a 7-digit ISSN, e.g. 234-5678 and 1234-567.
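The exact patterns aren’t reproduced in the post, so this is a hedged sketch of the kind of normalization logic described: strip parentheticals, drop stray punctuation, and flag anything that isn’t 8 characters (such as the 7-digit typos) for manual review rather than silently accepting it.

```python
import re

def normalize_issn(raw):
    """Reduce a messy 022-style value to a bare ISSN as 1234-5678,
    or return None when it can't be salvaged automatically."""
    # Drop parenthetical qualifiers, e.g. "1234-5678 (electronic)"
    cleaned = re.sub(r"\(.*?\)", "", raw)
    # Keep only digits and the X check character
    cleaned = re.sub(r"[^0-9Xx]", "", cleaned).upper()
    # A valid ISSN is 7 digits plus a digit or X; typos like
    # 234-5678 or 1234-567 fall through for manual correction
    if not re.fullmatch(r"\d{7}[\dX]", cleaned):
        return None
    return f"{cleaned[:4]}-{cleaned[4:]}"
```

Anything the function returns as None would land in the manual-review pile described above.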

I also used this phase to manually review a small handful of ISSNs which showed up in two records. In most cases, they were for the same serial. The original citations had used slight title variants (author error) and then the print and electronic ISSNs (natural variance), leading to a database duplication. A few were continuations. I resolved all of these, referring to the ISSN Portal when needed. I also informed Stephen so he could recalculate CDRank for the merged records.

25 of the journals in the original dataset simply had no ISSNs. While I was running my big scripts to gather sibling ISSNs, I used the ISSN Portal to confirm that they really had no ISSN. Fortunately, all had extremely low CDRanks representing one-off citations.

Querying Holdings: WorldCat

Next, I needed to get actual holdings. The one place I could think of to get holdings in aggregate was, again, WorldCat. I used the bibs holdings API for this one.

First, I created a list of the institutional identifiers for each school. For each record in my database, I ran the list of its potential ISSNs (most often just a pair) through the API using the “heldBySymbol” limiter and grabbed a list of institutions with some holding for this ISSN. It output these to a JSON file/database of records consisting of: title, the original ISSN, the list of ISSNs, the list of holding institutions.

{ "title": "journal of loss and trauma",
"original_issn": "1532-5024",
"holdings": ["MNU", "UPM","IPL","UMC","OSU","GZM","NUI","LDL","IUL","EYM","EEM"],
"all_issns": ["1532-5024", "1532-5032"] }
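Again as a sketch under my own naming (the API calls themselves are omitted): once the heldBySymbol-limited responses come back per ISSN, the per-record step is just a deduplicating union across the sibling ISSNs.

```python
def merge_holdings(record, holdings_by_issn):
    """Union the OCLC symbols found for each sibling ISSN into the
    record's holdings list.

    holdings_by_issn maps an ISSN to the list of symbols the
    bib-holdings endpoint returned for it (already limited to the
    BTAA symbols). A symbol is added once even if several sibling
    ISSNs report it.
    """
    holdings = []
    for issn in record["all_issns"]:
        for symbol in holdings_by_issn.get(issn, []):
            if symbol not in holdings:
                holdings.append(symbol)
    record["holdings"] = holdings
    return record
```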

However, my years of experience working in cataloging departments and with library data meant I also knew that WorldCat holdings are unreliable. Worst case for this research, an institution had weeded the journal and not updated its holdings. And, conversely, libraries likely hadn’t provided holdings information for their e-journal records at all.

Sampling the results I got at this phase, I knew I wasn’t getting the whole picture…

Querying Holdings: Z39.50

So far, I’d been able to work on the whole thing as a single batch – one big batch of ISSN sibling hunts, one big batch where I queried all the library codes at once.5 But now, it was time to get targeted.

I wrote a script to check each entry in the database for which institutions were not present. It wrote all the ISSNs from these entries to a line break-separated text file of ISSNs. I saved these by symbol, so UPM.txt, EEM.txt, etc. Some of these files were 3000 ISSNs long (but keep in mind that, in most cases, several ISSNs represent the same journal).
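That gap-file step might look roughly like this (a sketch with my own names, assuming the JSON database is a list of the record dicts shown earlier):

```python
import os

def write_gap_files(database, symbols, out_dir="."):
    """For each institution symbol, write a line break-separated text
    file (UPM.txt, EEM.txt, ...) of every ISSN from entries where that
    symbol is missing from the holdings list, ready to feed to a
    Z39.50 batch search."""
    for symbol in symbols:
        missing = []
        for record in database:
            if symbol not in record.get("holdings", []):
                missing.extend(record["all_issns"])
        with open(os.path.join(out_dir, symbol + ".txt"), "w") as fh:
            fh.write("\n".join(missing))
```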

I then used MarcEdit to query each institution’s Z39.50 for the whole thing.

Now, in addition to writing MARC files, MarcEdit provides a handy log of your query and results:

Searching on: 1942-4620 using index: ISSN
0 records found in database University of Michigan
Searching on: 1942-4639 using index: ISSN
0 records found in database University of Michigan
Searching on: 1942-535X using index: ISSN
1 records found in database University of Michigan

I saved these as text files6 and then ran a Python script over them to process the search key and results. It read through each pair of lines, made a dict of the results {"1942-4620" : 0, "1942-4639" : 0, "1942-535X" : 1}, then opened the JSON database and updated the holdings. I used an “if value not in” check so that an entry’s holdings would only update once even if the Z39.50 output matched 3 sibling ISSNs from that entry.
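A sketch of that log-processing script, using the sample log above (function names are mine; the alternating two-line structure is as MarcEdit prints it):

```python
def parse_marcedit_log(log_text):
    """Turn MarcEdit's Z39.50 log into {issn: hit_count}.

    The log alternates 'Searching on: <issn> using index: ISSN'
    lines with '<n> records found in database <name>' lines."""
    results = {}
    lines = [l for l in log_text.splitlines() if l.strip()]
    for search_line, result_line in zip(lines[::2], lines[1::2]):
        issn = (search_line.split("Searching on:")[1]
                           .split("using index:")[0].strip())
        count = int(result_line.split()[0])
        results[issn] = count
    return results

def apply_z3950_results(record, symbol, results):
    """Add the institution's symbol if any sibling ISSN got a hit;
    the 'if not in' check keeps the symbol from being added more
    than once even when several sibling ISSNs matched."""
    if any(results.get(issn, 0) > 0 for issn in record["all_issns"]):
        if symbol not in record["holdings"]:
            record["holdings"].append(symbol)
    return record
```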

…this was one of those moments in coding where you feel like an utter genius but worry that you might be a little unhinged as well.

Querying Holdings: Shared Print

In some cases, the reason an institution didn’t have a journal any more was that they’d contributed it to the BTAA Shared Print Repository. This project specifically targeted journals, so it was entirely possible that one of these institutions had eased its shelf space by sending a run to Shared Print.

Using my contacts at the BTAA, I got emails for the people at Indiana and Illinois who actually managed the projects. Fortunately, both had holdings spreadsheets, including ISSNs, and were willing to share them.

I wrote a Python script to take these (as CSVs) and check for the presence of an ISSN in each spreadsheet. If it found the ISSN, it would write the OCLC library code (UIUSP or IULSP) to the output database. I wrote and ran this while gathering Z39.50 data, since that took several weeks.
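A hedged sketch of that CSV check (column name and codes keyed by file path are my assumptions; the real spreadsheets’ layouts may differ):

```python
import csv

def shared_print_symbols(record, csv_paths):
    """Return the shared-print codes (e.g. UIUSP, IULSP) whose
    holdings spreadsheets contain any of the record's ISSNs.

    csv_paths maps a code to a CSV file path; each CSV is assumed
    to have an 'ISSN' column."""
    found = []
    for code, path in csv_paths.items():
        with open(path, newline="") as fh:
            issns = {row["ISSN"].strip() for row in csv.DictReader(fh)}
        if issns & set(record["all_issns"]):
            found.append(code)
    return found
```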

This turned out to be a non-issue for the overall project, since almost all of the top 161 journals were held at all the institutions. If contributions to shared print were partly judged on the basis of usage, this would make sense. Still, it might be interesting to look at shared print coverage of the database as a whole.

Minding the Gaps

There was one big gap in the whole thing – University of Nebraska-Lincoln. They had somewhat recently migrated to Alma, their systems librarian left when they migrated, and they had not yet filled the position. I contacted several people there asking about Z39.50 access for a research project but didn’t hear anything. (Fortunately, they’ve now got a new and engaged systems librarian whom I met at ELUNA this year.)

Anyway, this posed a challenge. If they had Z39.50 turned on, it wasn’t showing up in any of the ways I could think of. I made several attempts, mimicking the many other Alma schools I had queried. Nothing worked.

By this point, we had a combined list of the top 161 journals. We also had partial holdings data for Nebraska from the WorldCat query. So I sat down and did this one manually. I think I searched ~30 journals by hand in their Primo front-end, using advanced search by ISSN and then by title/material type if the ISSN didn’t come up (and then double-checking to ensure it was the right journal). I marked all the ones I found and used this data to update the top 161.

Because there weren’t many, I decided to be as thorough as possible and manually check each institution’s catalog/discovery/e-journal finders for remaining gaps in the top 161.

Observations

In some ways, my findings were not very exciting: BTAA schools widely hold (at least some of) the journals most commonly used in sociology dissertations. Either that or the commonality of these holdings means that they’re the most widely used. (But many others were just as widely held and not as widely used, so I suspect the former, with internal usage data playing a role in retention.)

Ultimately, my process got far more data than we actually used. I could’ve just run the queries for the top 161. That would’ve been a much smaller project and I could’ve thoroughly validated my results. For example, I would’ve checked any instances where the ISSN list contained more than 2, using the ISSN Portal to be sure these were cases of a journal continuation vs. an actual incorrect ISSN. But when we started, Stephen was still working on his own analysis of the data. And while it was an enormous job, it yielded a really interesting database of results, something I might be able to revisit in the future. It was also a fascinating challenge.


  1. Woods, Stephen, and Ruth Kitchin Tillman. “Supporting Doctoral Research in Sociology in the BTAA.” Pennsylvania Libraries: Research & Practice, 13, no. 1 (2025). 10.5195/palrap.2025.303↩︎

  2. Woods, Stephen and John Russell. “Examination of journal usage in rural sociology using citation analysis.” Serials Review, 48, no. 1–2 (2022), 112–120. 10.1080/00987913.2022.2127601 ↩︎

  3. We’d intended this for Serials Review, like the other articles Stephen had published in this vein, but they did not respond to our submission for more than 6 months (they did finally acknowledge some time after we pulled it from consideration) and failed to publish an issue, so we pulled it. ↩︎

  4. Though I sure did use it manually throughout the project. ↩︎

  5. Lest this sound smooth on its own, it required a lot of iterative scripting and testing, followed by running them in batches, and dealing with occasional errors which ground things to a halt. It was sometimes exciting and sometimes stressful. At one point, I got snippy with OCLC support for an issue that was on my code’s end (though I still think it should have given a different error message). ↩︎

  6. After spending a day or more running a Z39.50 query, I always felt so nervous at this stage, paranoid that I would close the log while I was attempting to copy it. ↩︎

August 14, 2025 12:00 AM

August 13, 2025

In the Library, With the Lead Pipe

Going around in Circles: Interrogating Librarians’ Spheres of Concern, Influence, and Control

In Brief: The practice of placing one’s anxieties into circles of concern, influence, and control can be found in philosophy, psychology, and self-help literature. It is a means of cultivating agency and preventing needless rumination. For librarians, however, it is often at odds with a profession that expects continuous expansion of responsibilities. To reconcile this conflict, it is useful to look back at the original intent of this model, assess the present library landscape through its lens, and imagine a future in which library workers truly feel in control of their vocation.

By Jordan Moore

Introduction

It is a beautiful experience when you discover something that reorients your entire outlook on life. This happened to me during one of my first therapy sessions after being diagnosed with Generalized Anxiety Disorder. My therapist gave me a piece of paper and a pencil and instructed me to draw a large circle. Next, they told me to imagine that circle was full of everything I was anxious about, all the real and hypothetical problems that stressed me out. We labeled that circle “concern.” Then, they asked me to draw a much smaller circle in the middle of it. I would say it was one-tenth the size of the first circle. “That,” they said, “represents what you can control.”

A small circle labeled control, within a large circle labeled concern.
Figure 1: My first model

I felt disheartened while looking at that picture, as if it spelled out a grave diagnosis. The second circle was already so small, and I could have sworn it was even tinier when I looked back at the page and compared it to the first circle. Then, we began to populate the circle of control with what was in my power to determine – how much sleep I got, how often I reached out to loved ones, how many hours I spent doomscrolling, and so on. Finally, my therapist asked, “How much time do you spend thinking about things in the outer circle?” If I didn’t answer 100%, the number was close. They tapped a finger on the inner circle and, in the way that therapists often phrase advice as a question, asked “What if you concentrated on what is in your control instead?” What if indeed.

That conversation occurred over a decade ago. Since then, I have grown accustomed to categorizing my anxieties into ones of concern or control. If something is weighing on me, but is outside of my circle of control, I do my best not to ruminate on it, or at least redirect my thoughts back to what I, as a single person, can do. I try to devote most of my energy to practices that keep me in good health and good spirits. This has done wonders for my mental health. It has also proven beneficial in my professional life, keeping me focused on the aspects of my job that fulfill me. It has become so integral to my way of thinking that I have even discussed the concept (and the context I learned it from) at work. Naturally, I was at first hesitant to bring “therapy talk” into work. However, it has proven to be a catchy idea. I have been at numerous meetings where someone describes a situation, often the behavior of patrons or administrators, as “outside of our circle,” with a nod in my direction.

Sometimes, though, instead of accepting the situation for what it is, we discuss what adjustments we need to make to our practice or policy to fix the situation. When these types of conversations occur, I think back to that original drawing of two circles. Suddenly, another circle appears between the circle of concern and control. It is the circle of influence. It’s something that wasn’t in my initial understanding of the model, but is in plenty of other illustrations. It is a place meant for one to use tools in their circle of control to enact a small, person-sized amount of impact to their circle of concern. An example of this would be a librarian informing a lingering patron that the library is closing soon. They are not going to pick the patron up and toss them out the door, but they can encourage them to exit promptly. That is a reasonable expectation of influence. An unreasonable expectation would be if that librarian felt the need to make sure that that patron, or any patron, never had a negative thing to say about the library. In my experience, it appears that librarians and libraries seem to have high expectations of influence. I began to wonder why that is, and what could be done to alleviate that burden. To start, I decided to learn more about the model that had been so life-changing for me. That inquiry would take me back further than I expected.

An Unexpected Literature Review

Because I need to find a new therapist every time my health insurance changes – Great job, American Healthcare system! – I unfortunately could not ask the therapist who introduced me to the model of circles how they learned about it. Fortunately, looking for answers is part of my job, and I was able to play both parts of a reference interview. One of the first websites I visited was “Understanding the Circles of Influence, Concern, and Control,” written by Anna Katharina Schaffner. I noticed that Schaffner’s qualifications include “Burnout and Executive Coach,” which let me know others were thinking about this concept in the workplace. I also noticed that Schaffner’s model includes a sphere of influence. In her description of that area, she writes, “We may or may not have the power to expand our influence… We can certainly try. It is wise to spend some of our energy in that sphere, bearing in mind that we can control our efforts in this sphere, but not necessarily outcomes.”

A circle containing 3 rings: the innermost ring is labeled "circle of control: things I can control," the middle ring is labeled "circle of influence: things I can influence," and the outer ring is labeled "circle of concern: things that are outside of my control." Figure 2: Schaffner’s model

As I continued reading interpretations of the circles model, I noticed references to other concepts that I only had passing familiarity with. The oldest among these was Stoicism. To learn more, I decided to speak with my brother-in-law, a Classical Studies professor. After I told him about what I was researching, he said it had a lot in common with the Stoics’ quest to lead a virtuous life by valuing logic and self-possession. At the root of Stoicism is the recognition of the difficult truth that humans cannot control much – neither the whims of capricious gods, nor the actions of flawed human beings. The Greek philosopher Epictetus states in the opening lines of his Enchiridion,

Some things are in our control and others not. Things in our control are opinion, pursuit, desire, aversion, and, in a word, whatever are our own actions. Things not in our control are body, property, reputation, command, and, in one word, whatever are not our own actions (I).

Later, the Roman emperor and philosopher Marcus Aurelius writes in his Meditations, “If thou art pained by any external thing, it is not this that disturbs thee, but thy own judgment about it. And it is in thy power to wipe out this judgment now” (VII. 47).

As unfamiliar and phonetically challenging as these authors and texts were at first glance, I was quickly able to make connections between them and literature in my own frame of reference. I recalled the line in Hamlet, “There is nothing either good or bad, but thinking makes it so” (II.ii). I thought back to reading Man’s Search for Meaning by Viktor Frankl, which I picked up on the recommendation of another therapist. I remembered being particularly moved by the line, “Everything can be taken from a man but one thing: the last of the human freedoms – to choose one’s attitude in any given set of circumstances, to choose one’s own way” (75). It turns out I was a fan of Stoicism without knowing it.

Speaking of ideas I learned about in therapy – and you can tell I constantly am – the next concept I came across was cognitive behavior therapy (CBT). Having engaged in CBT work throughout my time in therapy, I was familiar with its thesis that maladaptive behaviors stem from “cognitive distortions,” thoughts and feelings about ourselves and our experiences that do not reflect reality. CBT posits that by challenging these distortions, one can think, feel, and act in a healthier way. What I did not know was that Aaron Beck, one of the pioneers of CBT, was a student of Stoicism. In Cognitive Therapy of Depression, he credits Stoicism as “the philosophical origins of cognitive therapy” (8). The connection made sense once I realized how much of my time with that first therapist was spent battling the cognitive distortion that I could control any situation if I worried about it hard enough.

I still wanted to learn more about the in-between space of influence, and why it seems particularly vast for librarians. As I continued to search for literature about the circle of influence, my references became less tied to philosophy and psychology and closer to self-help and business. One title that kept popping up, and one that I had heard before, was The 7 Habits of Highly Effective People by Stephen Covey. When I started reading it, I felt like I was in familiar territory. Covey supplies anecdotes of people benefitting from concentrating on the elements of their life that they can control, even referencing Viktor Frankl as an example. However, Covey later diverges from the Stoic belief that there are limits to our control. He combines the spheres of control and influence into one circle and instructs readers to pour their energy into it, not necessarily for the sake of their sanity, but for the opportunity to gain more influence. He calls this being “proactive” and writes, “Proactive people focus their efforts in the Circle of Influence. They work on the things they can do something about. The nature of their energy is positive, enlarging, and magnifying, causing their Circle of Influence to increase.” This idea of ever-increasing influence allows Covey to claim, “We are responsible for our own effectiveness, for our own happiness, and ultimately, I would say, for most of our circumstances” (96-98).

A circle within a larger circle. The inner circle is labeled circle of influence, the outer circle is labeled circle of concern. The inner circle has arrows pointing outward, to indicate that the inner circle (circle of influence) is growing. The image is labeled Proactive focus: positive energy enlarges the circle of influence. Figure 3: Covey’s model

Applications in Librarianship

Thinking about Covey’s advice in the context of my job made me uneasy. His model, with its arrows pushing ever-outward, gave me the same sense of pressure I got from conversations about how my library or librarianship in general needs to do more to meet that day’s crisis. I also suspected that Covey’s argument for power over all circumstances ignores some basic truths that people, especially those without societal privilege, must face. I knew 7 Habits was popular, with dozens of reprints and special editions since its original publication. However, I was able to find critical voices who shared my skepticism. For instance, in “Opening Pandora’s Box: The Unintended Consequences of Stephen Covey’s Effectiveness Movement,” Darren McCabe writes, “Covey preaches freedom, but he fails to acknowledge the constraints on freedom that operate within a capitalist system,” and notes that Covey’s outlook “may be acceptable in a utopian society, but not when one faces inequality, pay freezes, work intensification, monotonous working conditions, autocratic management, or redundancy” (186-187). I also recalled how Schaffner, the burnout specialist, advises against devoting too much energy to the circle of influence, saying we can only control our efforts, not our outcomes. Having brought my research of the history of the spheres model up to the present, I was ready to turn to library literature to see how the circles play out in the profession.

Giving until it hurts

Since the topic of burnout was fresh on my mind, I began by revisiting Fobazi Ettarh’s “Vocational Awe and Librarianship: The Lies We Tell Ourselves.” In it, she characterizes librarianship’s inflated sense of responsibility and influence like this: “Through the language of vocational awe, libraries have been placed as a higher authority and the work in service of libraries as a sacred duty.” Ettarh describes how this can cause librarians to be underpaid, overworked, and burnt out. After all, it is much more difficult to negotiate the terms of a sacred duty than an ordinary job.

Ettarh is also quoted in Library Journal and School Library Journal’s 2022 Job Satisfaction Survey by Jennifer A. Dixon titled, “Feeling the Burnout: Library Workers Are Facing Burnout in Greater Numbers and Severity—And Grappling with It as a Systemic Problem.” In it, Ettarh states, “One of the biggest system-wide problems, when it comes to librarianship, is job creep.” This term describes the continual addition of responsibilities librarians are expected to perform. The report also describes “mission creep,” where libraries, particularly public ones, become response centers for issues that are far afield from routine services. This results in librarians being responsible for assisting patrons experiencing drug overdoses, mental health crises, and homelessness. In these situations, librarians are rarely given additional training or resources, and are indeed dealing with these crises exactly because society at large does not give them adequate attention or funding. In summary, job creep and mission creep cause librarians’ circle of concern to expand, and, as the report illustrates, attempting to exert control or influence over all that new territory can spell disaster. Dixon puts it this way: “With institutions continually cutting budgets without actually reducing their expectations of what library workers can accomplish, those who are committed to service and to their profession will continue pushing themselves to the point of burnout.”

Feeling the disparity

The job satisfaction survey points to another source of discontent for librarians: the cognitive dissonance caused by the gulf between their perceived level of influence and their actual level of influence. For academic librarians, the issue can be seen in the lack of recognition of their expertise in comparison to other professionals on campus. The ambiguous status of academic librarians is also cited as a contributor to low morale in a 2021 Journal of Library Administration review. This review cites Melissa Becher’s “Understanding the experience of full-time nontenure-track library faculty: Numbers, treatment, and job satisfaction,” which illustrates how academic librarians enjoy less autonomy and less professional courtesy than traditional faculty. I could very much relate to the sentiments expressed in these articles. It is a classic academic librarian conundrum to be expected to be in constant contact with faculty, yet unable to get them to reply to emails.

For public librarians in the Library Journal survey, the issue can be seen in the disconnect between their perceived status as “heroes” or “essential workers” and the antagonism they face from patrons, particularly while attempting to enforce masking during COVID. The Journal of Library Administration review also notes that physical safety is of particular concern to public librarians, stating “It is important to note that morale in libraries can be impacted not only by theoretical or conceptual concerns, but also by qualms about basic physical safety from surrounding communities.” Since hostility toward public libraries has only increased since the report’s publication, owing to their vilification by federal, state, and local powers, its words are prescient.

Because my experience is limited to academia, I wanted to get a public librarian’s take on the Library Journal job satisfaction survey. When I brought it up to a friend, they were kind enough to share their thoughts, though they wished to remain anonymous. They wrote that during COVID,

“There were a few especially egregious instances of aggression due to patrons’ unwillingness to wear a mask that still affects how I view these folks today. Management was unsupportive, did not originally handle these volatile encounters by stopping people at the door, and expected other staff members lower on the hierarchy to handle these issues.”

In those instances, management had something that was in their control (whether patrons could enter the building without masks) and chose instead to leave it up to librarians to influence patrons’ behaviors.

My friend also provided examples of how management used both vocational awe and job creep to overload staff. They summed up the situation like this:

“Workloads are never analyzed before staff members are given even more tasks, and if there is any sort of push back, you are viewed as not being a team player. People who speak up are used as examples, and the rest of the staff stays quiet because they fear similar retaliation… I’m always like, ‘OMG, if you don’t like something, please speak up so I’m not constantly viewed as causing trouble and it’s not only me who has the issue.’”

Starting from the top

The stories featured in these articles about job satisfaction and morale, as well as my friend’s account, reminded me of Anne Helen Petersen’s 2022 talk, “The Librarians Are Not Okay,” which appeared in her newsletter, Culture Study. In it, she lays out the necessity of institutional guardrails to do the work that individual boundaries cannot accomplish alone. She explains that in today’s parlance, “Boundaries are the responsibility of the worker to maintain, and when they fall apart, that was the worker’s own failing.” Guardrails, on the other hand, are “fundamental to the organization’s operation, and the onus for maintaining them is not on the individual, but the group as whole.” An individual’s boundaries can be pushed for many reasons. They could be trying to live up to the ideal that their vocational awe inspires, as Ettarh puts it. Their management may be using that vocational awe to turn any pushback into accusations of betrayal, as both Ettarh and my friend describe. Petersen shows how guardrails can remedy those internal and external pressures by creating a shared understanding of expectations. Those expectations play a critical role in preventing burnout. She gives the example, “an email is not a five-alarm fire, and you shouldn’t train yourself to react as if it was, because that sort of vigilance is not sustainable.” Petersen’s piece caused me to reflect on times that I have talked about the circles of concern, influence, and control in the workplace. I appreciated the moments when all of us, including administration, agreed that something was outside of our responsibility, and we would breathe a sigh of relief. And those occasions when a supervisor or administrator told me not to worry about something I was convinced I needed to worry about? Heaven.

In the interest of exploring what the circles of concern, influence, and control may look like for administrators, I read the most recent Ithaka S+R Library Survey, published in 2022. This survey of academic library leadership offered interesting examples of administrators grappling with the breadth of their concern and the limits of their influence. The report explains,

“Convincing campus leaders of the library’s value proposition remains a challenge. While over 72 percent of library deans and directors report high levels of confidence in their own ability to articulate their library’s value proposition in a way that aligns with the goals of the institution, only 51 percent are confident other senior administrators believe in this alignment.”

The study also lists several key issues, such as creating impactful Diversity, Equity, Inclusion and Accessibility (DEIA) initiatives, hiring and retaining staff in technology roles, and supporting Open Access, that leaders categorize as high priorities, yet express a low level of confidence in their organization’s strategies to address these concerns. (This was even before the federal government and the Department of Education began attacking DEIA measures and threatening institutional funding.) At the same time, the survey offers examples of administrators resisting mission creep and focusing their efforts on library services inside their control. The report states, “Deans and directors see the library contributing most strongly to increasing student learning and helping students develop a sense of community, rather than to other metrics such as addressing student basic needs or improving post-graduation outcomes.” Survey results about budgetary considerations also demonstrate the leaders’ commitment to recruiting and retaining positions with high customer-service impact. All in all, the survey shows that these leaders recognize that their library cannot do it all. Because of that, they make strategic choices about where to allot resources and, just as importantly, where not to. Being in charge of their institution, that is their prerogative. But what if that kind of decision-making was available to individuals, as well?

Taking it slow

There is an existing philosophy in our field that complements the philosophy of circles very nicely – slow librarianship. On her blog, Information Wants to be Free, in a post titled “What is Slow Librarianship,” Meredith Farkas describes what slow librarianship values. She writes, “Workers in slow libraries are focused on relationship-building, deeply understanding and meeting patron needs, and providing equitable services to their communities. Internally, slow library culture is focused on learning and reflection, collaboration and solidarity.” In describing what slow librarianship opposes, she writes, “Slow librarianship is against neoliberalism, achievement culture, and the cult of productivity.” Like Petersen, Farkas describes how sticking to these principles requires not just boundaries, but guardrails. She writes,

“One of the most important pieces of the slow movement is the focus on solidarity and collective care and a move away from the individualism that so defines the American character. If you’re only focused on your own liberation and your own well-being, you’re doing it wrong.”

What I appreciate about this picture of slow librarianship is that it gives librarians a useful framework to decide whether they should dedicate time and energy to a task. It must be meaningful to both the patrons and themselves, and it must support the relationship between them. Better yet, when they identify such a task, they are not going it alone, but working with the community they have developed. Even better still, slow librarianship demands that librarians use their influence not to expand what they control, but to protect what is important to themselves and others.

Another benefit of slow librarianship is that it can alleviate some of the causes of burnout. In “Rising from the Flames: How Researching Burnout Impacted Two Academic Librarians,” Robert Griggs-Taylor and Jessica Lee discuss the changes they have made to their management style after studying and experiencing different factors of burnout. Although the authors do not call their approach slow librarianship, several of their adjustments align with its tenets. This includes encouraging staff to pursue avenues of interest during the workday and to take earned time away from work without overdrawn explanation or guilt. The article is another example of how administrative influence can allow librarians to maintain a healthy circle of control.

I’ve spent the majority of this article using circular imagery to get my point across, but let me offer two more ways of thinking about slow librarianship. In “The Innovation Fetish and Slow Librarianship: What Librarians Can Learn from the Juicero,” Julia Glassman uses flowers, specifically the jacaranda, as a metaphor for the importance of rest and reflection. She explains how in order to bloom in one season, flowers go dormant in others. She writes, “It’s supremely unhealthy, for both individuals and organizations, to try to be in bloom all the time.” I am more of an indoor person, so what comes to my mind is The Fellowship of the Ring and Bilbo Baggins’ description of exhaustion as feeling “like butter that has been scraped over too much bread” (40). When I shared this line with my current therapist, they pointed out that the problem in that scenario is not a lack of butter, but an excess of bread. Librarians have enough butter. We are talented, motivated, and knowledgeable people. There is just too much bread to be concerned about! We can continue to spread ourselves thin, or we can take on only what we can manage without scraping.

Conclusion

If this article were a therapy session – and it may as well be – now would be when the therapist says, “we’re just about out of time,” and we would take stock of what we’ve learned. So, we know librarians are being pressured by patrons, administrators, and their own sense of duty to overextend themselves. Even librarians in leadership positions seem to recognize that pouring time, energy, or money into a concern does not guarantee influence over it. This may sound like a sad state of affairs, but I still believe in the philosophy of circles, because it was always meant to cultivate agency in the face of adversity. For librarians and libraries, being cognizant and honest about which aspects of the profession fall inside each circle is a start. The next challenge is to maintain those distinctions in the face of internal and external pressures to exert influence over every concern, risking job creep, mission creep, and burnout. Even if one’s work environment is not conducive to such thinking, the beauty of this concept is that it starts with the individual. If it remains an internal process to keep anxiety in check? Great! If it ends up being discussed in staff meetings? Also great! I did not begin talking about it with colleagues in a Covey-esque maneuver to increase my influence in the workplace. In the same vein, I did not write this article with the idea that librarians everywhere will suddenly be free of outsized expectations. Though the idea certainly is appealing. It would mean not being seen as the last bastion of intellectual freedom or the single remaining thread of a ruined social safety net. Librarians would be able to go slower, grow stronger roots, and not try to cover so much ground (or bread). All that would be lovely, but this exercise has taught me to start small.
So I will pose this last question: What would happen if one librarian was empowered to reconsider one of their expectations and nurture one part of their practice that is truly in their control? And yes, that was advice phrased as a question.


Acknowledgements

Thank you to my reviewers, Brea McQueen and Patrice Williams. Thank you to my publishing editor, Jessica Schomberg. Thank you to Alexander Hall, Teaching Professor of Classical Studies & Latin at Iowa State University, for talking shop during family time. Thank you to the public librarian who shared their challenges in trying to create a healthier environment for themself and their colleagues. Thank you to the mental health professionals who have given me advice throughout the years. I’m glad I wrote so much of it down!


Works Cited

Anonymous. Personal interview. 13 March 2025.

Aurelius, Marcus. The Meditations, translated by George Long, 1862. https://classics.mit.edu/Antoninus/meditations.html.

Becher, Melissa. “Understanding the Experience of Full-time Nontenure-track Library Faculty: Numbers, Treatment, and Job Satisfaction.” The Journal of Academic Librarianship 45, no. 3 (2019): 213-219. https://doi.org/10.1016/j.acalib.2019.02.015.

Beck, Aaron T. Cognitive Therapy of Depression. Guilford Press, 1979.

Covey, Stephen R. The 7 Habits of Highly Effective People. 1989. RosettaBooks LLC, 2012.

Dixon, Jennifer A. “Feeling the Burnout: Library Workers Are Facing Burnout in Greater Numbers and Severity—And Grappling with It as a Systemic Problem.” Library Journal 147, no. 3 (2022): 44. https://www.proquest.com/trade-journals/feeling-burnout/docview/2634087993/se-2.

Epictetus. The Enchiridion, translated by Elizabeth Carter, 1807. https://classics.mit.edu/Epictetus/epicench.html.

Ettarh, Fobazi. “Vocational Awe and Librarianship: The Lies We Tell Ourselves.” In the Library With the Lead Pipe, 10 Jan. 2018. https://www.inthelibrarywiththeleadpipe.org/2018/vocational-awe.

Farkas, Meredith. “What is Slow Librarianship?” Information Wants to Be Free, 18 October 2021. https://meredith.wolfwater.com/wordpress/2021/10/18/what-is-slow-librarianship.

Frankl, Viktor E. Man’s Search for Meaning. Translated by Ilse Lasch, Boston: Beacon Press, 2006.

Glassman, Julia. “The Innovation Fetish and Slow Librarianship: What Librarians Can Learn from the Juicero.” In the Library With the Lead Pipe, 18 Oct. 2017. https://www.inthelibrarywiththeleadpipe.org/2017/the-innovation-fetish-and-slow-librarianship-what-librarians-can-learn-from-the-juicero.

Griggs-Taylor, Robert, and Jessica Lee. “Rising from the Flames: How Researching Burnout Impacted Two Academic Librarians.” Georgia Library Quarterly 59, no. 4 (2022). https://doi.org/10.62915/2157-0396.2539.

Hulbert, Ioana G. “US Library Survey 2022: Navigating the New Normal.” Ithaka S+R. Last modified 30 March 2023. https://doi.org/10.18665/sr.318642.

McCabe, Darren. “Opening Pandora’s Box: The Unintended Consequences of Stephen Covey’s Effectiveness Movement.” Management Learning 42, no. 2 (2011): 183. https://doi.org/10.1177/1350507610389682

Petersen, Anne Helen. “The Librarians Are Not Okay.” Culture Study, 1 May 2022. https://annehelen.substack.com/p/the-librarians-are-not-okay.

Schaffner, Anna Katharina. “Understanding the Circles of Influence, Concern, and Control.” Positive Psychology. Last modified 13 March 2023. https://positivepsychology.com/circles-of-influence.

Shakespeare, William. Hamlet from The Folger Shakespeare. Ed. Barbara Mowat, Paul Werstine, Michael Poston, and Rebecca Niles. Folger Shakespeare Library, https://folger.edu/explore/shakespeares-works/hamlet.

Weyant, E. C., R. L. Wallace, and N. J. Woodward. “Contributions to Low Morale, Part 1: Review of Existing Literature on Librarian and Library Staff Morale.” Journal of Library Administration 61, no. 7 (2021): 854–868. https://doi.org/10.1080/01930826.2021.1972732.

by Jordan Moore at August 13, 2025 04:04 PM

August 12, 2025

Open Knowledge Foundation

[Announcement] Open Data Editor 1.6.0 AI-enhanced Version Release

We are glad to announce today the release of ODE's new version. The app is now evolving into a key companion tool in the early and critical stages of your AI journey.

The post [Announcement] Open Data Editor 1.6.0 AI-enhanced Version Release first appeared on Open Knowledge Blog.

by OKFN at August 12, 2025 08:12 PM

August 10, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-03: The Wayback Machine Has Archived at Least 1.3M goo.gl URLs

The interstitial page for https://goo.gl/12XGLG, telling the user that Google will soon abandon this shortened URL.  



Last year, Google announced it intended to deprecate its URL shortener, goo.gl, and just last week they released the final shutdown date of August 25. I was quoted in Tech Friend, a Washington Post newsletter by Shira Ovide, joking that the move "would save Google dozens of dollars."  Then last Friday, Google announced a slight update: links that have had some activity in "late 2024" will continue to redirect.


To be sure, the shutdown isn't about saving money, or at least not about the direct cost of maintaining the service. goo.gl stopped accepting new shortening requests in 2019, but continued to redirect existing shortened URLs, and maintaining the server with a static mapping of shortened URLs to their full URLs has a negligible hardware cost. The real reason is likely that nobody within Google wants to be responsible for maintaining the service.  Engineers in tech companies get promoted for innovating on new and exciting projects, not for maintaining infrastructure and sunsetted projects.  URL shorteners are largely a product of the bad old days of social media, and the functionality has largely been supplanted by the companies themselves (e.g., Twitter's t.co service, added ca. 2011).  URL shorteners still have their place: I still use bitly's custom URL service to create mnemonic links for Google Docs (e.g., https://bit.ly/Nelson-DPC2025 instead of https://docs.google.com/presentation/d/1j6k9H3fA1Q540mKefkyr256StaAD6SoQsJbRuoPo4tI/edit?slide=id.g2bc4c2a891c_0_0#slide=id.g2bc4c2a891c_0_0). URL shorteners proliferated for a while, and most of them have since gone away.  The 301works.org project at the Internet Archive has archived a lot, but not all, of the mappings.


When Shira contacted me, one of the things she wanted to know was the scale of the problem. A Hacker News article had various estimates: 60k articles in Google Scholar had the string "goo.gl", and another person claimed that a Google search for "site:goo.gl" returned 9.6M links (but my version of Google no longer shows result set size estimates).


2025-08-03 Google Scholar search for "goo.gl"


2025-08-03 Google search for "goo.gl"



Curious and not satisfied with those estimates, I started poking around to see what the Internet Archive's Wayback Machine has.  These numbers were taken on 2025-07-25, and will surely increase soon based on Archive Team's efforts.  


First, not everyone knows that you can search URL prefixes in the Wayback Machine with the "*" character.  I first did a search for "goo.gl/a*", then "goo.gl/aa*", etc. until I hit something less than the max of 10,000 hits per response. 


https://web.archive.org/web/*/goo.gl/a* 


https://web.archive.org/web/*/goo.gl/aa* 


https://web.archive.org/web/*/goo.gl/aaa* 

https://web.archive.org/web/*/goo.gl/aaaa* 


We could repeat with "b", "bb", "bbb", "bbbb", etc. but that would take quite a while. Fortunately, we can use the CDX API to get a complete response and then process it locally.   


The full command line session is shown below, and then I'll step through it:


% curl "http://web.archive.org/cdx/search/cdx?url=goo.gl/*" > goo.gl

% wc -l goo.gl

 3974539 goo.gl

% cat goo.gl | awk '{print $3}' | sed "s/https://" | sed "s/http://" | sed "s/?.*//" | sed "s/:80//" | sed "s/www\.//" | sort | uniq > goo.gl.uniq

% wc -l goo.gl.uniq

 1374191 goo.gl.uniq


The curl command accesses the CDX API, searching for all URLs prefixed with "goo.gl/*", and saves the response in a file called "goo.gl".  


The first wc command shows that there are 3.9M lines in a single response (i.e., pagination was not used).  Although not listed above, we can take a peek at the response with the head command:


% head -10 goo.gl

gl,goo)/ 20091212094934 http://goo.gl:80/ text/html 404 2RG2VCBYD2WNLDQRQ2U5PI3L3RNNVZ6T 298

gl,goo)/ 20091217094012 http://goo.gl:80/? text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1003

gl,goo)/ 20100103211324 http://goo.gl/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1166

gl,goo)/ 20100203080754 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1010

gl,goo)/ 20100207025800 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1006

gl,goo)/ 20100211043957 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1001

gl,goo)/ 20100217014043 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 999

gl,goo)/ 20100224024726 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1000

gl,goo)/ 20100228025750 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1003

gl,goo)/ 20100304130514 http://goo.gl:80/ text/html 200 HLSTSF76S2N6NDBQ4ZPPQFECB4TKXVCF 1008


The file has seven space-separated columns. The first column is the URL in SURT format (a normalized form of the URL), the second column is the datetime of the capture, and the third column is the actual URL encountered; the remaining columns are the MIME type, HTTP status code, content digest, and record length. The above response shows that the top-level URL, goo.gl, was archived many times (as you would expect), and the first time was on 2009-12-12 at 09:49:34 UTC.
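As a quick sanity check on that layout, a single CDX line (the first one from the head output above) can be pulled apart with awk:

```shell
# Label a few columns of one CDX line, copied from the head output above.
line='gl,goo)/ 20091212094934 http://goo.gl:80/ text/html 404 2RG2VCBYD2WNLDQRQ2U5PI3L3RNNVZ6T 298'
echo "$line" | awk '{print "surt: " $1; print "datetime: " $2; print "url: " $3; print "status: " $5}'
# prints:
# surt: gl,goo)/
# datetime: 20091212094934
# url: http://goo.gl:80/
# status: 404
```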


The third command listed above takes the 3.9M-line output file and uses awk to select only the third column (the URL, not the SURT). The first two sed commands remove the scheme (https and http) from the URL, the third sed command removes any URL arguments, the fourth removes any port 80 remnants, and the fifth removes any unnecessary "www." prefixes. The result is then sorted (even though the input should already be sorted, we sort it again just to be sure) and run through the uniq command to remove duplicate URLs.
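To watch those substitutions fire in order, the same sed steps can be chained into one expression and run on a single made-up capture URL (the URL below is hypothetical, for illustration only):

```shell
# Walk one made-up capture URL through the same normalization steps:
# strip the scheme, drop arguments, drop :80, drop "www."
echo "http://www.goo.gl:80/0003bR?d=1" \
  | sed "s/https://; s/http://; s/?.*//; s/:80//; s/www\.//"
# prints //goo.gl/0003bR
```

Note that the leading "//" survives, matching the pipeline above: only the "http:"/"https:" prefix is stripped, not the slashes.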


We process the URLs and not the SURT form of the URLs because in short URLs, capitalization in the path matters.  For example, "goo.gl/003br" and "goo.gl/003bR" are not the same URL – the "r" vs. "R" matters. 


goo.gl/003br --> http://www.likemytweets.com/tweet/217957944678031360#217957944678031360%23like 


and


goo.gl/003bR --> http://www.howtogeek.com/68999/how-to-tether-your-iphone-to-your-linux-pc/ 


We remove the URL arguments because although they are technically different URLs, the "?d=1" (show destination) and "?si=1" (remove interstitial page) arguments shown above don't alter the destination URLs. 


% grep -i "003br" goo.gl | head -10

gl,goo)/0003br 20250301150956 https://goo.gl/0003bR application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 239

gl,goo)/0003br 20250301201105 https://goo.gl/0003BR application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 328

gl,goo)/0003br?d=1 20250301150956 https://goo.gl/0003bR?d=1 text/html 200 YS7M3IHIYA4PGO37JKUZBPMX3WDCK5QW 591

gl,goo)/0003br?d=1 20250301201104 https://goo.gl/0003BR?d=1 text/html 200 GSJJBSKEC2AULCMM3VLZZ4R7L37X65T7 718

gl,goo)/0003br?si=1 20250301150956 https://goo.gl/0003bR?si=1 application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 237

gl,goo)/0003br?si=1 20250301201105 https://goo.gl/0003BR?si=1 application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 325

gl,goo)/003br 20250228141837 https://goo.gl/003br application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 273

gl,goo)/003br 20250228141901 https://goo.gl/003bR application/binary 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ 281

gl,goo)/003br2 20250302155101 https://goo.gl/003BR2 text/html 200 JO23EZ66WLVAKLHZQ57RS4WEN3LTFDUH 587

gl,goo)/003br2?d=1 20250302155100 https://goo.gl/003BR2?d=1 text/html 200 IQ6K5GU46N3TY3AIZPOYP4RLWZC4GEIT 623



The last wc command shows that there are 1.3M unique URLs, after the URL scheme and arguments have been stripped.  


If you want to keep the arguments to the goo.gl URLs, you can do:


% cat goo.gl | awk '{print $3}' | sed "s/https://" | sed "s/http://" | sed "s/:80//" | sed "s/www\.//" | sort | uniq > goo.gl.args

% wc -l goo.gl.args 

 3518019 goo.gl.args


And the Wayback Machine has 3.5M unique goo.gl URLs if you include arguments (3.5M is, not surprisingly, nearly 3X the original 1.3M URLs without arguments).  


Not all of those 1.3M (or 3.5M) URLs are syntactically correct. A sharp eye will catch that in the first screenshot for https://web.archive.org/web/*/goo.gl/a* there is a URL with an emoji:



That URL is obviously not syntactically correct; it does not actually exist and is thus not archived:


https://web.archive.org/web/20240429092824/http://goo.gl/a%F0%9F%91%88 does not exist. 



Still, even with a certain number of incorrect URLs, they are surely a minority and would not meaningfully change the count of 1.3M (or 3.5M) unique goo.gl URLs archived at the Wayback Machine. 


Shira noted in her article that Common Crawl (CC) told her that they estimated 10M URLs were impacted. I'm not sure how they arrived at that number, especially since the Wayback Machine's number is much lower. Perhaps there are CC crawls that have yet to be indexed, or are excluded from replay by the Wayback Machine, or they were including arguments ("d=1", "si=1"), or something else that I haven't considered. Perhaps my original query to the CDX API contained an error or a paginated response that I did not account for. 


In summary, thankfully the Internet Archive is preserving the web, which includes shortened URLs. But also, shame on Google for shutting down a piece of web infrastructure that they created, walking away from at least 1.3M URLs they created, and transferring this function to a third party with far fewer resources. The cost to maintain this service is trivial, even in terms of engineer time; the real cost is intra-company prestige, which is a terrible reason to deprecate a service. And I suppose shame on us, as a culture and more specifically a community, for not valuing investments in infrastructure and maintenance.


Google's concession of maintaining recently used URLs is not as useful as it may seem at first glance.  Yes, surely many of these goo.gl URLs redirect to URLs that are either now dead or are/were of limited importance.  But we don't know which ones are still useful, and recent usage (i.e., popularity) does not necessarily imply importance.  In my next blog post, I will explore some of the shortened URLs in technical publications, including a 2017 conference survey paper recommended by Shira Ovide that used goo.gl URLs, presumably for space reasons, to link to 26 different datasets.  



–Michael





by Michael L. Nelson (noreply@blogger.com) at August 10, 2025 06:33 PM

2025-08-10: Who Cares About All Those Old goo.gl Links Anyway?


11 of the 26 goo.gl URLs for data sets surveyed in Yin & Berger (2017)


In a previous post, I estimated that when Google turns off its goo.gl URL shortening service, at least 1.3M goo.gl URLs are already saved by the Internet Archive's Wayback Machine.  Thanks to the efforts of Archive Team and others, that number will surely grow in the coming weeks before the shutdown.  And Google has already announced plans to keep the links that have recently been used. But all of this raises the question: "who cares about all those old goo.gl links anyway?"  In this post, I examine a single technical paper from 2017 that has 26 goo.gl URLs, one (1/26) of which is scheduled to be deprecated in two weeks.  Assuming this loss rate (1/26) holds for all the goo.gl URLs indexed in Google Scholar, then at least 4,000 goo.gl URLs from the scholarly record will be lost.


In our discussions for the Tech Friend article, Shira Ovide shared with me "When to use what data set for your self-driving car algorithm: An overview of publicly available driving datasets", a survey paper published by Yin & Berger at ITSC 2017 in Japan (preprint at ResearchGate).  I can't personally speak to the quality of the paper or its utility in 2025, but it's published at an IEEE conference and according to Google Scholar it has over 100 citations, so for the sake of argument I'm going to consider this a "good" paper, and that as a survey it is still of interest some 8 years later.  


109 citations for Yin & Berger on 2025-08-09 (live web link). 


The paper surveys 27 data sets that can be used to test and evaluate self-driving cars.  Of those 27 data sets, 26 of them are directly on the web (the paper describing the BAE Systems data set has the charming chestnut "contact the author for a copy of the data").  For the 26 data sets that are on the web, the authors link not to the original URL, such as:


http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/daimler_pedestrian_benchmark_d.html


but to the much shorter:


https://goo.gl/l3U2Wc


Presumably, Yin & Berger used the shortened links for ease and uniformity of typesetting.  Especially in the two-column IEEE conference template, it is much easier to typeset the 21-character goo.gl URL than the 98-character gavrila.net URL. But the convenience of the 77-character reduction comes with a loss of semantics: if the gavrila.net URL rotted (e.g., became a 404, or the domain was lost), then by visual inspection of the original URL we know to do a search engine query for "daimler pedestrian benchmark", and if the page is still on the live web at a different URL, we have a very good chance of (re)discovering its new location (see Martin Klein's 2014 dissertation for a review of techniques).  But if goo.gl shuts down, and all we're left with in the 2017 conference paper is the string "l3U2Wc", then we have neither the semantic clues we need to find the new location, nor the original URL with which to discover the page in a web archive, such as the Internet Archive's Wayback Machine. 


Fortunately, http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/daimler_pedestrian_benchmark_d.html is still on the live web. 



Let's consider another example that is not on the live web. The short URL:


https://goo.gl/07Us6n


redirects to:


https://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/ 


That URL is currently 404:

https://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/ (via https://goo.gl/07Us6n) is now 404 on the live web. 


From inspection of the 404 URL, we can guess that "Caltech Pedestrians" is a good search engine query, and the data appears to be available from multiple locations, including the presumably now-canonical URL https://data.caltech.edu/records/f6rph-90m20.  (The webmaster at vision.caltech.edu should use mod_rewrite to redirect to data.caltech.edu, but that's a discussion for another time.) 
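The kind of redirect suggested above might look like the following Apache configuration (a sketch; mapping the old path to this specific record, rather than to a records search, is an assumption):

```apache
# In the vision.caltech.edu virtual host: permanently redirect the old
# dataset path to its presumed new home at data.caltech.edu.
RewriteEngine On
RewriteRule ^/Image_Datasets/CaltechPedestrians/?$ https://data.caltech.edu/records/f6rph-90m20 [R=301,L]
```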



The Google SERP for "Caltech Pedestrians": it appears the data set is in multiple locations on the live web.  


 

https://data.caltech.edu/records/f6rph-90m20 is presumably now the canonical URL and is still on the live web.


Even if all the caltech.edu URLs disappeared from the live web, fortunately the Wayback Machine has archived the original URL.  The Wayback Machine has archived the new data.caltech.edu URL as well, though it appears to be far less popular (so far, only 8 copies of the data.caltech.edu URL vs. 310 copies of the original vision.caltech.edu URL). 



https://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/ is well archived at the Wayback Machine. 


This 2017-03-29 archived version is probably close to the state of the page at the time as it was cited by Yin & Berger in 2017. 


The new data.caltech.edu URL is archived, but less so (so far). 


Of the 26 goo.gl URLs, 18 successfully terminate in an HTTP 200 OK when resolved.  The eight that did not had the following response codes or conditions:



Although marked as "404" above, goo.gl/0R8XX6 resolves to an HTTP 200 OK, but that 200 response is actually an interstitial page saying that this URL was not accessed in late 2024 and thus will be sunset on 2025-08-25.  Appending the argument "?si=1" to bypass the interstitial page results in a redirection to the 3dvis.ri.cmu.edu page, and that URL is 404.  Fortunately, the page is archived at the Wayback Machine.  For those in the community, perhaps there is enough context to rediscover this data set, but the first several hits for the query "CMU Visual Localization Dataset" do not return anything that is obvious to me as the right answer (perhaps the second hit subsumes the original data set?). 
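A sketch of how such a check might be scripted (the short URL is the one discussed above; printing only the status and Location lines from the redirect chain):

```shell
# Follow redirects from the short URL, bypassing the interstitial with
# "?si=1", and print the HTTP status lines and Location headers.
# (-s silent, -I HEAD requests only, -L follow redirects)
curl -sIL "https://goo.gl/0R8XX6?si=1" | grep -iE "^(HTTP|Location)"
```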



The reference to http://goo.gl/0R8XX6 in Yin & Berger (2017). 




A Google query for "CMU Visual Localization Dataset" on 2025-08-10; perhaps the data set we seek is included in the second hit? 


https://goo.gl/0R8XX6 did not win the popularity contest in late 2024, and will cease working on 2025-08-25. It appears that dereferencing the URL now (August 2025) will not save it. 



Dereferencing https://goo.gl/0R8XX6?si=1 yields http://3dvis.ri.cmu.edu/data-sets/localization/, which no longer resolves (which is technically not an HTTP event, since there is not a functioning HTTP server to respond). 


https://3dvis.ri.cmu.edu/data-sets/localization/ was frequently archived between 2015 and 2018.



https://3dvis.ri.cmu.edu/data-sets/localization/ as archived on 2015-02-19.


So under the current guidance, one of the 26 goo.gl URLs (https://goo.gl/0R8XX6) in Yin & Berger (2017) will cease working in about two weeks, and it's not immediately obvious that the paper provides enough context to re-find the original data set. This is compounded by the fact that the original host, 3dvis.ri.cmu.edu, no longer resolves.  Fortunately, the Wayback Machine appears to have the site archived (I have not dived deeper to verify that all the data has been archived; cf. our Web Science 2025 paper).  


2025-08-03 Google Scholar search for "goo.gl"


Here, we've only examined one paper, so the next natural question would be "how many other papers are impacted?"  A search for "goo.gl" at Google Scholar a week ago estimated 109,000 hits. Surely some of those hits include simple mentions of "goo.gl" as a service and don't necessarily have shortened links.  On the other hand, URL shorteners are well understood and probably don't merit extended discussion, so I'm willing to believe that nearly all of the 109k hits have at least one shortened URL in them; the few that do not are likely balanced by papers like Yin & Berger (2017), which has 26 shortened URLs.


For simplicity, let's assume there are 109,000 shortened URLs indexed by Google Scholar.  Let's also assume that the sunset rate (1/26, or about 4%) for the URLs in Yin & Berger (2017) also holds for the collection.  That would yield 109,000 * 0.04 = 4,360 shortened URLs to be sunset on 2025-08-25.  Admittedly, these are crude approximations, but saying there are "at least 4,000 shortened URLs that will disappear in about two weeks" passes the "looks right" test, and if forced to guess, I would bet that the actual number is much larger than 4,000.  Are all 4,000 "important"? Are all 4,000 unfindable on the live web? Are all 4,000 archived?  I have no idea, and I suppose time will tell.  As someone who has devoted much of their career to preserving the web, especially the scholarly web, deprecating goo.gl feels like an unforced error in order to save "dozens of dollars".  


–Michael 





A gist with the URLs and HTTP responses is available.


by Michael L. Nelson (noreply@blogger.com) at August 10, 2025 06:32 PM

August 09, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-09: ODU's Strategic Research Thrust Areas Uniquely Describe ODU


https://www.odu.edu/research/strategic-research-areas 


ODU published its "Strategic Research Areas" last April, and I've been meaning to comment on it for a while. First, it hasn't been well messaged (yet), as reflected by an informal poll of my WSDL peers.  Second, I'm reaching the stage in my career where I read with enthusiasm "strategy" and "policy" documents.


So what makes this strategy statement different from previous iterations?  First, it uniquely describes ODU.  Instead of a collection of generic terms like "impact", "development", "innovation", etc., the four thrusts identified capture many of ODU's current activities and are areas of ODU's comparative and competitive advantage. The official document has more eloquent language justifying the four thrusts, but mostly it comes down to the geography and resulting industry and demographics of the Hampton Roads region:



Certainly other institutions are active in some of these areas, but I can't think of another institution more centered at the intersection of the four.  In addition to the main areas, there are five cross-cutting research areas that, while not unique to ODU, are critically important and enabling to nearly every research pursuit: 



Happily, I find myself, and WSDL, in most if not all of the five cross-cutting areas. 


This "4+5" model does not exhaustively catalog all of ODU's research areas, but it is a helpful descriptive and prescriptive model for informing future resource investments. All institutions have to choose what they are going to be good at, and in this case, we've chosen to be good at the things that are unique to ODU and Hampton Roads.  These are difficult times for university research, and the nation seems to have lost sight of the economic impact of funding higher education.  Hopefully ODU's alignment of research thrusts to this unique combination of the region's strengths – and weaknesses – will allow us to demonstrate that higher education is a public good. 



–Michael



by Michael L. Nelson (noreply@blogger.com) at August 09, 2025 08:09 PM

August 08, 2025

Journal of Web Librarianship

Seeing No Red Flags: Why Do Authors Ignore Journal Titles?


by Chitnarong Sirisathitkul, Division of Physics, School of Science, Walailak University, Nakhon Si Thammarat, Thailand. Chitnarong Sirisathitkul obtained his D.Phil. in 2000 from the University of Oxford, UK. He has been working at Walailak University, Thailand, since 2001, where he was awarded the Best Teacher in 2005 and the Best Researcher in 2017. He is currently an associate professor and head of the Division of Physics at the School of Science. His publications in Scopus-indexed journals cover topics such as magnetic materials, traditional ceramics and artificial intelligence in education. Serving as the editor of Area Based Development Research Journal and Thai Journal of Physics, he is also interested in scholarly publication ethics. at August 08, 2025 02:36 AM

August 07, 2025

LibraryThing (Thingology)

Author Interview: Joanne Harris

Joanne Harris

LibraryThing is pleased to sit down this month with bestselling Anglo-French author Joanne Harris, whose 1999 novel, Chocolat—shortlisted for the Whitbread Award—was made into a popular film of the same name. She is the author of over twenty novels, including three sequels to Chocolat, as well as novellas, short stories, game scripts, screenplays, the libretti for two operas, a stage musical, and three cookbooks; her work has been published in over fifty countries and has won numerous awards. She was named a Member of the Order of the British Empire (MBE) in 2013 and an Officer of the Order of the British Empire (OBE) in 2022, for services to literature. A former teacher, Harris is deeply involved in issues of author rights, serving two terms as Chair of the UK’s Society of Authors (SOA) from 2018 to 2024. She is a patron of the charity Médecins Sans Frontières (Doctors Without Borders), to which she donated the proceeds from sales of her cookbooks. Cooking and food are consistent themes in her work, and she returns to the story of her most famous culinary character in her newest novel, Vianne, a prequel to Chocolat that is due out from Pegasus Books in early September. Harris sat down with Abigail this month to discuss this new book.

Set six years before the events of Chocolat, your new book is actually the fifth novel about Vianne Rocher to be released. What made you decide you needed to write a prequel? Did any of the ideas for the story come to you as you were writing the other books about Vianne, or was it all fresh and new as you wrote?

Vianne and I have travelled together for over 25 years, and although we’re different in many ways, I think we have some things in common. When I wrote Chocolat, I was the mother of a small child, and wrote Vianne’s character from a similar perspective. I left her in 2021 as the mother of two children, both young adults, and I realized that both Vianne and I needed to look back in order to move forward. Hence Vianne, my protagonist’s origin story, which answers a number of questions left unanswered at the end of Chocolat, and hopefully gives some insights into her journey. Most of it was new; I found a few references in Chocolat to work from, but until now I’ve had very little idea of what Vianne’s past might have been, which made the writing of this book such an interesting challenge.

Food and cooking are important themes in your work. Why is that? What significance do they have for you, and what can they tell us about the characters in your stories, and the world in which they live?

Food is a universal theme. We all need it, we all relate to it in different, important ways. It’s a gateway to culture; to the past; to the emotions. In Vianne it’s also a kind of domestic magic, involving all the senses, and with the capacity to transport, transform and touch the lives of those who engage with it.

Talk to us about chocolate! Given its importance in some of your best-known fiction, as well as the fact that you published The Little Book of Chocolat (2014), I think we can assume you enjoy this treat. What are your favorite kinds? Are there real life chocolatiers you would recommend, or recipes you like to make yourself? (Note: the best chocolate confections I myself ever tasted came from Kee’s Chocolates in Manhattan).

As far as chocolate is concerned, my journey has been rather like Vianne’s. I really didn’t know much about it when I wrote Chocolat, but since then I’ve been involved with many artisanal chocolatiers, and I’ve travelled to many chocolate producing countries. Some of my favourites are Schoc in New Zealand, and Claudio Corallo in Principe, who makes single-origin bean to bar chocolate on location from his half-ruined villa in the rainforest. And David Greenwood-Haigh, a chef who incorporates chocolate into his recipes much as Vianne does in the book (and who created the “chocolate spice” to which I refer in the story.)

Like its predecessors (or successors, chronologically speaking), Vianne is set in France. As the daughter of an English father and French mother, what insight do you feel you bring to your stories, from a cultural perspective? Do you feel you are writing as an insider, an outsider, perhaps both— and does it matter?

I think that as a dual national, there’s always a part of me that feels slightly foreign, which is why Vianne, too, is a perpetual outsider. But I do know enough about France to write with authority and affection – and maybe a little nostalgia, too. The France of my books is a selective portrait, based on the places and people I love, some of which have disappeared. These books are a way of making them live again.

Tell us a little bit about your writing process. Are you someone who maps out your story beforehand, or do you like to discover where things are going as you write? Do you have a particular writing routine? What advice would you give young writers who are just getting started?

My process varies according to the book, but as a rule I don’t map out the story in its entirety: I usually start with a voice, and a mission, and a number of pivotal scenes, and I see where that takes me. I write where I can: if I’m at home, I prefer my shed in the garden, but I can make do with any quiet space. My process involves reading aloud, so it’s best if I’m alone. And I use scent as a trigger to get me into the zone: a trick borrowed from Stanislavski’s An Actor Prepares, which I’ve been using for 30 years. In the case of Vianne I used Chanel’s Coromandel, partly because it’s an olfactory relative of Chanel No. 5, which I used when I was writing Chocolat. (And on the same theme, I’ve created a scent of my own with the help of perfumier Sarah McCartney of 4160 Tuesdays: it’s called Vianne’s Confession, and it illustrates a passage from the book.)

As for my advice to young writers: just write. You get better that way. And if you are indeed just getting started, don’t be in a hurry to publish or to share your work if you don’t feel ready. You have as long as you like to write your first book, and only one chance at making a first impression. So take it slow, let yourself grow, and enjoy the process, because if you don’t enjoy what you do, why should anyone else?

What’s next for you? Do you have further books in the pipeline? Do you think Vianne, or any of the sequels to Chocolat, will also be made into a film?

I always have more books in the pipeline: the next one is very different; it’s a kind of quiet folk-horror novel called Sleepers in the Snow. As for films, it’s too early to say, but it would be nice to see something on screen again – though preferably as a series, as I really think these books, with their episodic structure, would probably work better that way.

Tell us about your library. What’s on your own shelves?

At least 10,000 books in French, English, German. I find it hard to give books away, so I’ve accumulated quite a library of all kinds of things, in many different genres.

What have you been reading lately, and what would you recommend to other readers?

Right now I’m reading a proof of Catriona Ward’s new book, Nowhere Burning, which is terrific: so well-written, and like all her books, quite astonishingly creepy.

by Abigail Adams at August 07, 2025 03:03 PM

Ed Summers

Dune Path

dune path

Early morning in Villas, NJ

August 07, 2025 04:00 AM

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-06: Paper Summary: "ETD-MS v2.0: A Proposed Extended Standard for Metadata of Electronic Theses and Dissertations"

Our paper, “ETD-MS v2.0: A Proposed Extended Standard for Metadata of Electronic Theses and Dissertations,” was accepted at the 27th International Symposium on Electronic Theses and Dissertations (ETD 2024), held in Livingstone, Zambia. ETD 2024 welcomed contributions on a wide range of topics related to Electronic Theses and Dissertations (ETDs), including digital libraries, institutional repositories, graduate education and training, open access, and open science. The symposium brought together global researchers, practitioners, and educators dedicated to advancing the creation, curation, and accessibility of ETDs. 


As the number of ETDs in digital repositories continues to grow, the need for a metadata standard that aligns with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles becomes increasingly important. Dublin Core and ETD-MS v1.1 are widely used metadata schemas for scholarly documents and ETDs. However, we identified several gaps that limit their ability to fully represent the structure and content of ETDs. In particular, content-level metadata, such as the individual components or “objects” within an ETD, has become increasingly important. This level of detail is essential for supporting machine learning applications that extract scientific knowledge and for enabling large-scale scholarly data services.

In this paper, we present ETD-MS v2.0, an extended metadata schema developed to address these limitations. ETD-MS v2.0 provides a comprehensive description of ETDs by representing both document-level and content-level metadata. The proposed schema includes a Core Component building on the existing ETD-MS v1.1 schema, and an Extended Component that captures objects, their provenance, and user interactions for ETDs.

Motivation

The motivation for ETD-MS v2.0 arises from three major limitations observed in current metadata standards. First, existing metadata standards lack the metadata elements to describe access rights and ETD file formats in detail. For example, the dc.rights field in ETD-MS v1.1 offers only three preset values for access. The dc.format field assumes a single MIME type per ETD, which is inadequate for ETDs that include multiple file types. Second, current standards lack metadata elements for representing internal components of ETDs such as chapters, figures, and tables. In our schema, these are referred to as “objects,” and they often have rich attributes of their own that require structured representation. Third, existing schemas do not support metadata originating from sources outside the original ETD submission, such as those generated by human catalogers or AI models. The absence of provenance information for such metadata further limits its utility.

Schema Design

ETD-MS v2.0 is composed of two main components: the Core Component and the Extended Component.

Figure 1: Relationships among Entities in the Core and Extended Components of ETD-MS v2.0. Blue represents Extended Components, and green represents core components.

Core Component

The Core Component focuses on document-level metadata and was developed using a top-down approach. We analyzed 500 ETDs sampled from a collection of over 500,000 ETDs (Uddin et al., 2021) spanning various disciplines and publication years. The Core Component comprises 10 entities and 73 metadata fields.

Some key improvements include the transformation of dc.rights into a dedicated “Rights” entity, with attributes such as rights_type, rights_text, and rights_date. Another major addition is the “ETD_File” entity, which captures metadata related to multiple file types, file descriptions, generation methods, and checksums. We also introduced a new “References” entity, missing in earlier schemas, to capture structured metadata for cited works, including the fields reference_text, author, title, year, and venue.

The Core Component entities are categorized into two types: those that describe the ETD itself, such as “ETDs,” “Rights,” “Subjects,” “ETD_classes,” and “ETD_topics,” and those that capture relationships between ETDs or collections of ETDs, such as “References,” “ETD-ETD_neighbors,” “Collections,” and “Collection_topics”.

Extended Component

Figure 2: Relationships among Entities in the Extended Components of ETD-MS v2.0. Blue represents Category E.1, red represents Category E.2, and orange represents Category E.3.

The Extended Component focuses on content-level metadata and was developed using a bootstrap approach. It introduces 18 entities with 87 metadata fields, grouped into three categories:

  1. Category E.1: Includes entities such as “Objects,” “Object_metadata,” “Object_summaries,” “Object_classes,” and “Object_topics” to describe individual components such as figures, tables, and sections.
  2. Category E.2: Entities such as “Classifications,” “Classification_entries,” “Classifiers,” “Topic_models,” and “Summarizers” store metadata about how certain content was generated or classified.
  3. Category E.3: Captures metadata about user behaviors and preferences using entities such as “Users,” “User_queries,” “User_queries_clicks,” “User_topics,” “User_classes,” and “User-user_neighbors”.

Implementation

To evaluate the feasibility of ETD-MS v2.0, we implemented the schema using a MySQL database and populated it with data from a separate collection of 1,000 ETDs (distinct from the 500 ETDs used for schema development). These ETDs, sourced from 50 U.S. universities and published between 2005 and 2019, were used to simulate real-world metadata extraction. We used OAI-PMH APIs and HTML scraping to gather document-level metadata, and employed PyMuPDF and Pytesseract for text extraction from born-digital and scanned ETDs, respectively. We developed a GPT-3.5 based prompt to classify ETDs using the ProQuest subject taxonomy, and applied summarization models such as T5-Small and Pegasus to generate chapter and object summaries. For topic modeling, we used LDA, LDA2Vec, and BERT, while CNNs and YOLOv7 were used to detect and classify visual elements such as figures and tables. User interaction data was populated with dummy data. The full process of extracting, processing, and inserting metadata for all 1,000 ETDs was completed in approximately 11 minutes on a virtual machine with 32 CPU cores and 125 GB RAM, demonstrating the scalability of our approach.
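The document-level harvesting step above uses standard OAI-PMH requests; a hedged sketch of the kind of request involved (the repository base URL below is a hypothetical placeholder, not one of the actual repositories used):

```shell
# Build an OAI-PMH ListRecords request for Dublin Core metadata
# and fetch it. The base URL is a hypothetical placeholder; real
# repositories publish their own OAI-PMH base URLs.
BASE="https://repository.example.edu/oai"
URL="${BASE}?verb=ListRecords&metadataPrefix=oai_dc"
curl -s "$URL"
```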

Interoperability and Mapping

To ensure interoperability and mitigate schema adoption challenges, we created a detailed mapping between ETD-MS v2.0 and the existing standards Dublin Core and ETD-MS v1.1. For example, the new field ETDs.owner_and_statement aligns with dc.rights, and ETDs.discipline maps to thesis.degree.discipline in ETD-MS v1.1. In some cases, our schema introduces new metadata fields with no equivalents in older standards, such as the detailed “References,” “ETD_File,” and “Object_metadata” entities.

Limitations and Future Work

The current version of the schema was developed using a sample of 500 ETDs, which may not fully capture the metadata needs of ETDs outside that sample. For example, some ETDs contain multiple date fields, such as submission date and public release date, or include metadata such as a “peer reviewed” status. These elements are not represented in our current schema.

We view ETD-MS v2.0 as an evolving framework. In the future, we will refine the schema by including additional metadata elements. We will also collect feedback from ETD repository managers, librarians, and other stakeholders.

Conclusion

ETD-MS v2.0 is a comprehensive and extensible metadata schema developed to align ETD metadata with the FAIR principles. Our proposed schema extends existing standards by providing a more complete and detailed description and integrating content-level metadata. The proposed ETD-MS v2.0 schema, along with its mappings to both Dublin Core and ETD-MS v1.1, is available at the following GitHub link: https://github.com/lamps-lab/ETDMiner/tree/master/ETD-MS-v2.0.

References

Salsabil, L., Wu, J., Ingram, W. A., & Fox, E. (2024). ETD-MS v2.0: A Proposed Extended Standard for Metadata of Electronic Theses and Dissertations. In Proceedings of the 27th International Symposium on Electronic Theses and Dissertations (ETD 2024).

Uddin, S., Banerjee, B., Wu, J., Ingram, W. A., & Fox, E. A. (2021, December). Building A large collection of multi-domain electronic theses and dissertations. In 2021 IEEE International Conference on Big Data (Big Data) (pp. 6043-6045). IEEE. https://doi.org/10.1109/BigData52589.2021.9672058


-- Lamia Salsabil (@liya_lamia)

by Lamia Salsabil (noreply@blogger.com) at August 07, 2025 03:30 AM

August 06, 2025

Open Knowledge Foundation

Open Data and ODE in Bangladesh: Students and Researchers Step into a New World of Openness

During our ODE workshop, many participants had an eye-opening moment. At first, some didn’t quite get how open source or open data related to their work. But once we introduced them to tools like the Data Package and its connection with ODE, it all clicked.

The post Open Data and ODE in Bangladesh: Students and Researchers Step into a New World of Openness first appeared on Open Knowledge Blog.

by MRB Rafi at August 06, 2025 08:17 PM

August 05, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-08-04: Trip Report: 2025 AIAA SciTech Forum

 

The 2025 AIAA SciTech Forum in Orlando served as a seminal meeting point for researchers, practitioners, and policy influencers from across the aerospace spectrum. Throughout the week, a series of plenaries, panel discussions, and technical sessions provided a multifaceted view of contemporary challenges and future directions in aerospace research and development. This report reviews key themes, from resilient space systems and exascale computing to transformative applications of artificial intelligence (AI) and NASA's evolving strategies for continuous human presence in low Earth orbit (LEO), along with my own takeaways from the talks I attended.


I. Opening Plenary and Community Building

In his opening address, Clay Mowry, the newly appointed CEO of the American Institute of Aeronautics and Astronautics (AIAA), set the stage by emphasizing the forum’s role in fostering technical exchange and innovation. Speaking to an audience exceeding 6,000 attendees, including over 2,000 students and young professionals, Mowry underscored the Institute’s dual commitment to honoring its long-standing heritage and charting a forward-looking course for aerospace. He highlighted several strategic priorities:



JPL’s Vision and the Future of Planetary Exploration

In the opening keynote by Dr. Lori Leshin of NASA’s Jet Propulsion Laboratory (JPL), the forum’s attention shifted to the challenges of planetary exploration. Dr. Leshin’s presentation covered topics such as:

Dr. Leshin’s keynote underscored that the future of planetary exploration will be defined by the capacity to execute “ridiculously hard” missions, an endeavor that demands the convergence of technical innovation, rigorous testing, and robust international cooperation. The focus on complex mission research and execution drives home the point that we cannot remain a leader in space exploration through commercial space alone.



II. Advancing Resilient Space Systems

A key theme at the forum was the redefinition of resiliency in space systems. In a panel discussion featuring Dr. Deborah Emmons from the Aerospace Corporation and other experts, participants examined the limitations of traditional point-to-point resiliency models and advocated for a distributed, holistic approach. The session presented a rigorous analysis of emergent threats, including:

Such deliberations reinforce the imperative for next-generation space systems to incorporate resiliency as an emergent, distributed property — a concept that will shape both technical R&D and national security policy.


III. Exascale Computing and Its Impact on Aerospace Research

Dr. Bronson Messer’s keynote presentation on Oak Ridge National Laboratory’s Frontier supercomputer provided a technical deep-dive into the transformative potential of exascale computing. Frontier, with its reported peak performance of 2.1 exaflops in double precision, exemplifies the convergence of advanced hardware, optimized interconnectivity, and innovative cooling solutions. Key points included:

Dr. Messer’s presentation illustrates that advances in computational infrastructure are pivotal to solving complex aerospace problems, thereby fostering breakthroughs in both fundamental science and applied engineering.


IV. AI as a Catalyst for Transformative Aerospace Applications

The session titled “The AI Future Is Now,” led by Alexis Bonnell, CIO of the Air Force Research Lab at the time of the presentation, offered a forward-looking perspective on the integration of AI into aerospace systems. Moderated by Dr. Karen Wilcox, Bonnell’s presentation addressed several critical issues:

The session’s exploration of AI underscores its role as both a technical enabler and a transformative force that reshapes the dynamics of human-machine collaboration in aerospace.


V. NASA’s Evolving Vision for Low Earth Orbit (LEO)

NASA’s strategic vision for continuous human presence in LEO was articulated by Associate Administrator Jim Free in a session that presented the agency’s new LEO microgravity strategy. Free’s remarks provided a comprehensive overview of NASA’s long-term objectives, which build on the legacy of the International Space Station (ISS) while charting a course for future exploration. Salient aspects included:

This session not only provided a detailed roadmap for future LEO operations but also highlighted the importance of consultation and iterative strategy development in addressing the multifaceted challenges of space exploration.



VI. NASA Langley Specific Talks

Dr. Danette Allen


The presentation by Dr. B. Danette Allen, titled "Teaming with Autonomous Systems for Persistent Human-Machine Operations in Space, on the Moon, and on to Mars," explored the critical role of autonomous systems in NASA’s long-term Moon-to-Mars strategy. The discussion emphasized the need for reliable, resilient, and responsible autonomy to support human-machine teaming in deep space exploration.

Dr. Allen framed the talk around the question of whether autonomy should be "irresponsible"—a rhetorical setup that presented the challenges of ensuring trust, safety, and effectiveness in autonomous robotic systems. The presentation aligned with NASA’s broader Moon-to-Mars architecture, which envisions integrated human and robotic operations to maximize scientific and engineering productivity. The emphasis was on creating autonomous systems that can function effectively in harsh, time-critical environments while maintaining transparency, explainability, and human oversight.

A key focus was the concept of Human-Machine Teaming (HMT) which involves the integration of human cognition with robotic efficiency to optimize exploration activities. NASA’s strategy aims to balance supervised autonomy with trusted, independent robotic operations that extend the reach of human explorers. This approach ensures that, even during uncrewed mission phases, habitation systems, construction equipment, and surface transportation can function autonomously while still allowing human intervention when necessary.

The presentation detailed how autonomous systems will contribute to NASA’s Lunar Infrastructure (LI) and Science-Enabling (SE) objectives. These include autonomous site surveying, sample stockpiling, and in-situ resource utilization (ISRU) to prepare for crewed missions. Autonomous construction techniques will be crucial for building long-term infrastructure, such as power distribution networks and surface mobility systems, while robotic assistants will help optimize astronaut time by handling routine or high-risk tasks.

One of the central challenges discussed was trust in autonomous systems. Dr. Allen highlighted that autonomy in space is not merely about function allocation but about fostering justifiable trust, which ensures that robots make decisions in a way that humans can understand and rely on — especially in safety-critical scenarios. The talk addressed different levels of autonomy, ranging from supervised to fully autonomous systems, and how human explorers will interface with these technologies through natural interaction methods such as gestures, gaze tracking, and speech.

From an in-space assembly perspective, this research is vital. As NASA moves toward constructing large-scale space infrastructure, ranging from modular lunar habitats to Martian research stations, robotic autonomy will be essential in assembling, repairing, and maintaining these structures. Autonomous systems capable of adapting to dynamic conditions will reduce reliance on Earth-based control, allowing for more resilient and self-sustaining operations.

The Moon-to-Mars strategy’s emphasis on interoperability and maintainability also ties into the need for autonomous systems that can adapt to different mission phases. Whether constructing habitats, assisting in scientific research, or supporting crew logistics, autonomy must be integrated seamlessly across NASA’s exploration objectives. By leveraging artificial intelligence and robotic automation, NASA is setting the foundation for a future where in-space assembly and long-term space habitation become feasible and sustainable.

Ultimately, the idea that autonomy in space must be trustworthy, explainable, and mission-critical is fundamental to the development of reliable human-machine teams. These teams will be a cornerstone of NASA’s efforts to establish a persistent human and robotic presence on the Moon and Mars, paving the way for deeper space exploration and long-term space infrastructure development.


Dr. Natalia Alexandrov



The presentation by Natalia M. Alexandrov and colleagues, "MISTRAL: Concept and Analysis of Persistent Airborne Localization of GHG Emissions," explored an innovative approach to tracking and mitigating methane (CH₄) emissions using persistent airborne monitoring. Funded by NASA’s Convergent Aeronautic Solutions (CAS) initiative, the project sought to develop a scalable, low-cost solution for real-time methane detection, with a particular focus on high-emission regions like the Permian Basin.

The presentation emphasized the urgency of methane reduction by highlighting that the global temperature increase had surpassed the critical 1.5°C threshold in 2024. This warming has exacerbated environmental, economic, and health crises, with methane playing a significant role due to its potency as a greenhouse gas. The discussion also addressed the direct health effects of methane emissions, which displace oxygen and contribute to respiratory, cardiovascular, and neurological conditions. Studies cited in the talk estimated that emissions from oil and gas industry operations contribute to 7,500 excess deaths and a $77 billion annual public health burden in the U.S. alone.

Initially, the research team explored airborne CO₂ removal but pivoted toward methane due to its greater short-term climate impact. The final concept emphasized persistent localization and reporting rather than scrubbing, as some experts raised concerns that removal technologies might unintentionally encourage more emissions. Instead, MISTRAL proposed a decentralized approach in which fleets of commercial off-the-shelf (COTS) drones would conduct continuous monitoring and reporting of methane leaks, allowing for timely intervention and mitigation.

The design reference mission (DRM) centered on the Permian Basin, one of the largest methane super-emitters in the world. The project proposed partitioning the observation area into units, each operating a fleet of drones for continuous surveillance of emissions from production sites, pipelines, and storage facilities. The study also explored different operational strategies, such as distributed battery hot swapping and chase vehicle-based battery replacements, to maximize efficiency and minimize downtime.

A key aspect of the analysis was its feasibility assessment. The team evaluated the economic viability of the system, modeling costs under pessimistic assumptions. Even in worst-case scenarios, the study found that small municipalities could afford to implement and maintain a localized monitoring network. The project also aligned with existing Environmental Protection Agency (EPA) third-party reporting initiatives, empowering local governments, first responders, and communities to take direct action in holding polluters accountable.

From an Earth science and conservation perspective, MISTRAL represented a major step forward in environmental monitoring and climate change mitigation. Persistent airborne surveillance of greenhouse gases could provide critical data for climate researchers, regulatory agencies, and policymakers, improving the accuracy of emissions inventories and facilitating more effective enforcement of environmental regulations. The ability to track methane emissions in near-real-time also complemented broader conservation efforts by helping to identify and address sources of ecosystem degradation, such as habitat loss due to oil and gas extraction.

Furthermore, MISTRAL’s model of community-driven, low-cost, technology-enabled environmental oversight offered a scalable blueprint for other regions grappling with industrial pollution. By decentralizing environmental monitoring and making it more accessible, the project aligned with global efforts to use technology for conservation, supporting initiatives like methane reduction pledges under the Global Methane Pledge and broader climate resilience strategies.

Ultimately, the presentation concluded that the MISTRAL concept was not only technically and economically viable but also a transformative tool for conservation and environmental protection. By leveraging autonomous aerial systems for persistent methane tracking, the project offered a pragmatic, actionable solution for reducing greenhouse gas emissions and mitigating climate change at a critical time for global climate action.


Dr. Javier Puig-Navarro


Dr. Javier Puig-Navarro’s talk, “Performance Evaluation of a Cartesian Move Algorithm for the LSMS Family of Cable-Driven Cranes,” presented the performance of a novel algorithm designed for the Lightweight Surface Manipulation System (LSMS).

The LSMS crane operates through multiple cable actuators that provide both support and control of the payload, enabling large workspaces with lightweight hardware. However, the system’s complex nonlinear dynamics, coupled actuator paths, and lack of traditional joint sensors pose significant challenges to motion planning, especially in the precise manipulation required for autonomous or teleoperated operations on the Moon or Mars.

The Cartesian Move Algorithm: A Simpler Path to Precision

Puig-Navarro’s team developed the Cartesian Move Algorithm to simplify these challenges by shifting control focus from joint space to task space. The algorithm's objective is to drive the crane’s end effector (e.g., hook or gripper) to a desired 3D location, maintaining position even in the face of actuation delays, feedback uncertainty, and mechanical compliance.

Inputs to the algorithm include:

Instead of prescribing precise joint motions, the algorithm computes control signals that move the end effector directly along a Cartesian path. This approach abstracts the operator or control planner away from the complexities of cable tensioning, kinematic switching, and nonlinear coupling, common obstacles in cable-driven robotic systems.
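As a generic illustration of task-space control (my own sketch, not the LSMS algorithm itself), a simple proportional law can drive an end effector along a straight Cartesian path toward its target, with no reference to joint space at all:

```python
# Illustrative task-space ("Cartesian move") control loop: a proportional
# law steers the end effector straight toward the goal position.
# A generic sketch only, not the LSMS flight algorithm.

def cartesian_move(position, target, gain=0.5, tol=1e-3, max_steps=1000):
    """Iteratively step a 3D end-effector position toward `target`.

    Returns the final position and the number of steps taken.
    """
    for step in range(max_steps):
        error = [t - p for p, t in zip(position, target)]
        if max(abs(e) for e in error) < tol:
            return position, step
        # Command a displacement proportional to the Cartesian error.
        position = [p + gain * e for p, e in zip(position, error)]
    return position, max_steps

final, steps = cartesian_move([0.0, 0.0, 0.0], [1.0, 2.0, 0.5])
```

The appeal of formulating control this way is that the error signal is defined where the task lives (the hook's 3D position), so the loop converges to the commanded point even when the plant underneath is messy.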

In practice, the Cartesian move operates during several key motion phases:

For initial alignment (approach), a separate joint-space trajectory tracking algorithm is used, which ensures smooth transition into Cartesian control when precision is most critical.

Performance Insights from Hardware and Simulation

Puig-Navarro reported on a rigorous evaluation of the algorithm using 49 real-world trials on LSMS testbeds. The results were impressive:

Moreover, the team benchmarked algorithm performance across both hardware and simulation environments. While both platforms showed excellent convergence behavior, physical hardware introduced subtle differences in path curvature and command saturation—attributable to real-world constraints like cable elasticity and latency.

Practical Implications for Planetary Robotics

The key takeaway from Puig-Navarro’s talk is that the Cartesian move algorithm is a powerful and practical solution for tasks requiring final-position accuracy in environments where traditional robot arms are impractical or infeasible. For operations where the path shape is also critical (e.g., obstacle avoidance or coordination with other manipulators), the team recommends using trajectory-tracking or path-following algorithms instead.



Dr. Joshua Moser


Dr. Joshua Moser’s talk, "Bridging the Gap Between Humans and Robotic Systems in Autonomous Task Scheduling," explored the integration of human decision-making with autonomous task scheduling to enhance operational efficiency in space environments. The core focus was on the sequencing and allocation of tasks, crucial elements in ensuring the smooth execution of autonomous operations, particularly in scenarios involving data collection, mining, offloading, assembly, repair, maintenance, and outfitting.

Moser discussed various approaches to task sequencing, emphasizing the importance of dependencies and workflow constraints. He introduced Mixed Integer Programming (MIP) and Genetic Algorithms as computational techniques for optimizing task execution order, ensuring efficiency and feasibility in robotic operations. Similarly, task allocation was analyzed through the lens of an agent’s capabilities, location, and travel constraints — highlighting the necessity of considering independence, dependencies, and failure probabilities when assigning work to robotic systems.
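To make the sequencing problem concrete, here is a toy single-agent example of my own (not Moser's formulation): brute-force search over orderings that respect precedence constraints, minimizing the sum of task completion times. Real systems reach for MIP solvers or genetic algorithms precisely because brute force only scales to a handful of tasks.

```python
from itertools import permutations

# Toy task-sequencing example (an illustration, not the formulation from
# the talk): find an ordering that respects precedence constraints and
# minimizes the sum of task completion times for a single agent.

durations = {"survey": 2, "mine": 4, "offload": 1, "assemble": 3}
# (a, b) means task a must finish before task b starts.
deps = [("survey", "mine"), ("mine", "offload")]

def is_valid(order):
    return all(order.index(a) < order.index(b) for a, b in deps)

def total_completion(order):
    t, total = 0, 0
    for task in order:
        t += durations[task]
        total += t  # each task contributes its completion time
    return total

best = min(
    (o for o in permutations(durations) if is_valid(o)),
    key=total_completion,
)
# best == ('survey', 'mine', 'offload', 'assemble'), total 25
```

Even this tiny instance shows why dependencies matter: without the precedence constraints, the shortest-task-first order would win, but the survey-before-mine-before-offload chain forces a different optimum.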

A significant aspect of the presentation was the role of human-autonomy collaboration. Moser distinguished between "human-in-the-loop" and "human-on-the-loop" frameworks, where humans either actively direct autonomous systems or oversee their operations with minimal intervention, respectively. The key challenge lies in creating interfaces that enable intuitive human interaction with autonomy, leveraging graphical representations, large language models (LLMs), and interactive visualization tools.

Moser used the LSMS (Lightweight Surface Manipulation System) as an example of a real-world application, illustrating how autonomous scheduling can optimize payload offloading using a cable-driven crane and rover system. The emphasis on graphical task-agent visualization and intuitive user inputs (such as click-and-drag interfaces) reflected an effort to make autonomy more interpretable and manageable for human operators.

In the broader context of NASA’s in-space assembly efforts, Moser’s work aligns with ongoing initiatives aimed at enabling autonomous robotic construction and maintenance of space infrastructure. As NASA pushes toward large-scale space structures—such as modular space habitats, solar power stations, and next-generation observatories—intelligent task scheduling and allocation mechanisms become increasingly critical. Bridging the gap between human cognition and robotic automation will be essential to achieving scalable and resilient in-space assembly systems, reducing reliance on direct human intervention while ensuring mission success in unpredictable environments.

Me:

I presented on "Trust-Informed Large Language Models via Word Embedding-Knowledge Graph Alignment," exploring innovative methods to enhance the reliability and accuracy of large language models. The central theme of my presentation was the critical challenge of hallucinations: instances where LLMs generate plausible yet incorrect information, which are particularly problematic in high-stakes fields such as aerospace, healthcare, and financial services.

My research investigates the integration of LLMs with knowledge graphs, structured representations of real-world knowledge, to foster intrinsic evaluation of information credibility without external verification sources. Specifically, I discussed aligning word embeddings, mathematical representations of words capturing semantic relationships, with knowledge graph embeddings which encode entities and their interconnections. By merging these two types of embeddings into a unified vector space, the resultant model significantly improves its ability to evaluate the plausibility of generated content intrinsically, thus reducing its dependence on external systems and mitigating the risk of hallucinations.

During the presentation, I provided a comprehensive survey of existing alignment methods, including mapping-based approaches, joint embedding techniques, and the application of graph neural networks. Additionally, I outlined key applications where this methodology could significantly enhance trust in AI systems, particularly in safety-critical decision-making environments such as aerospace operations.
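As a toy illustration of the mapping-based family (my own sketch, not the method I presented), a linear map from word-embedding space into knowledge-graph-embedding space can be fit by least squares over a set of anchor pairs:

```python
# Tiny "mapping-based" alignment sketch: learn a linear map W carrying
# word-embedding vectors into the knowledge-graph embedding space via
# ordinary least squares, W = (X^T X)^-1 X^T Y. Pure Python, 2-D toy data;
# real systems use higher dimensions and constrained (e.g. orthogonal) maps.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inv2(m):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def fit_linear_map(X, Y):
    """Solve the normal equations for 2-dimensional embeddings."""
    Xt = transpose(X)
    return matmul(inv2(matmul(Xt, X)), matmul(Xt, Y))

# Anchor pairs: word vectors and their KG-entity vectors (toy data).
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kg_vecs = [[2.0, 1.0], [0.0, 3.0], [2.0, 4.0]]

W = fit_linear_map(word_vecs, kg_vecs)
mapped = matmul([word_vecs[0]], W)[0]  # lands near kg_vecs[0]
```

Once such a map exists, a generated statement can be scored by how close its mapped word vectors land to the corresponding KG entities, which is the intuition behind intrinsic plausibility checks.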

Lastly, I addressed the technical, methodological, and ethical challenges that accompany this integration, offering insights into future research directions to further develop robust, trustworthy AI. My work aims not only to advance understanding of language models but also to contribute practically to developing safer, more reliable AI systems that can independently discern truth from misinformation.

VII. Conclusion

The 2025 AIAA SciTech Forum exemplified the integration of cutting-edge technology with strategic foresight in the aerospace domain. Several overarching themes recurred throughout the conference: the imperative to develop space systems that are resilient by design, capable of a dynamic, distributed response to emergent threats through targeted research and development in distributed resilience; the transformative role of exascale computing in enabling high-fidelity simulations that drive both fundamental research and applied technology development; and the promise of artificial intelligence to not only optimize operational efficiency but also fundamentally alter the relationship between human decision-making and information processing.

As I return to my work at NASA Langley, I'm reminded that innovation often happens at the boundaries between fields. The conversations in hallways, the unexpected connections between presentations, and the diverse perspectives of over 6,000 attendees all contribute to pushing aerospace forward. In an era where the challenges are "ridiculously hard" (to borrow Dr. Leshin's phrase), our solutions must be equally ambitious—and thoroughly collaborative.

The path from Earth to a sustained presence on the Moon and Mars will require not just technological breakthroughs, but a fundamental shift in how we approach complex systems. The 2025 SciTech Forum showed that the aerospace community is ready for this challenge, armed with distributed thinking, unprecedented computational tools, and a commitment to building AI systems worthy of our trust.


- Jim


by Jim E. (noreply@blogger.com) at August 05, 2025 03:09 AM

Hugh Rundle

How to make a custom template for the Remarkable 2

Recently I decided I wanted to make a custom template to use on my reMarkable 2. I eventually figured out how to do this, but whilst I found some useful guides online, all of them were slightly misleading or unhelpful in different ways – probably due to changes over time. This guide is for anyone else wanting to give it a shot in 2025.

The tl;dr

The reMarkables are built on Linux, and the templates are SVG files in a specific directory. Adding your own template is probably easier than you expected:

  1. create an SVG file for your template
  2. connect to your reMarkable using SSH
  3. copy your template to the templates directory
  4. update the templates.json file so your template appears in the listing
  5. reboot the reMarkable

I haven't tried it on Windows, but apparently Windows has an SSH terminal and also scp so you should be able to follow this same process whether you have a computer running Linux, MacOS, any other Unix-based system, or Microsoft Windows.

You will need a computer, software for creating SVG graphics, and a little confidence.

Caveats

It's possible you could brick your reMarkable if you mess this up really badly. Always make sure you have backed up your files before doing anything in reMarkable's file system.

I haven't been using custom templates for long enough to know for sure, but others have suggested that when your reMarkable software is next updated, any custom templates may be deleted. Make sure you have backups of your templates as well!

Finally, this is what worked for me on a reMarkable 2 running the latest operating software in July 2025. Future system updates may change the way this works.

Step 1 - create your template

Older guides for making custom templates, like this one, were helpful for me to understand the basics of templates, but it seems that in the past templates were .png files, whereas recently they changed to SVG.

To create a template you will need something to create SVG graphics. I use Affinity Designer, but you could try Inkscape, Adobe Illustrator, or Canva. The reMarkable 2 screen size is 1872px x 1404px so although SVGs will scale proportionally, for best results make your file match that size.

Remember that your reMarkable 2 will only display in black, white, and grey. If your design doesn't quite work the first time, you can play around with it and reload it, so you can experiment a little until you get the design that suits your needs.

Once you're finished, save the template somewhere you can find it easily on your computer, as a .svg file.
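If you would rather generate the template than draw it, a short script can emit the SVG directly. This is my own sketch, using the screen dimensions quoted above in portrait orientation (1404 wide by 1872 tall); adjust the grid spacing to taste:

```python
# Generate a simple grid template as an SVG sized for the reMarkable 2
# screen (1404 x 1872 px in portrait, per the dimensions above).
# A sketch only -- tweak spacing, stroke colour, and layout to suit.

WIDTH, HEIGHT, SPACING = 1404, 1872, 78

def grid_svg(width=WIDTH, height=HEIGHT, spacing=SPACING):
    lines = []
    # Vertical rules.
    for x in range(spacing, width, spacing):
        lines.append(f'<line x1="{x}" y1="0" x2="{x}" y2="{height}" '
                     f'stroke="#bbbbbb" stroke-width="1"/>')
    # Horizontal rules.
    for y in range(spacing, height, spacing):
        lines.append(f'<line x1="0" y1="{y}" x2="{width}" y2="{y}" '
                     f'stroke="#bbbbbb" stroke-width="1"/>')
    body = "\n  ".join(lines)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}" viewBox="0 0 {width} {height}">\n'
            f'  {body}\n</svg>')

with open("my_custom_template.svg", "w") as f:
    f.write(grid_svg())
```

Greys between #000000 and #ffffff will be dithered to the device's grey levels, so a light stroke like the one above works well for unobtrusive ruling.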

Step 2 - connect to your reMarkable via SSH

To access the operating system for your reMarkable, you will need to connect using Secure Shell (SSH). For this, you need two pieces of information about your reMarkable: the IP address, and the password. From the main menu (the hamburger icon at top left) navigate to Settings - Help - Copyrights and licenses. At the bottom of the first page in this section you will find your password in bold type, and a series of IP addresses. The second (IPv4) address is the one you are looking for. This will be a private IP address starting with 10. If your reMarkable is connected to WiFi, you can use SSH over the same WiFi network. Otherwise, connect via your reMarkable's USB power/data cable. Either way, ensure that your reMarkable remains awake whilst you are connected, otherwise your session may hang.

Open a terminal on your computer (Terminal on Mac and Linux desktop, CMD.exe or PowerShell on Windows). You will be logging in as the user called root. This is a superuser on Linux machines so take care - with great power comes great responsibility. You should be able to log in using this command (where xxx.xxx.xxx.xxx is your IP address):

ssh root@xxx.xxx.xxx.xxx

Your terminal will then ask for your password; type it in and press Enter. If all goes well, you should see something like this:

reMarkable
╺━┓┏━╸┏━┓┏━┓   ┏━┓╻ ╻┏━╸┏━┓┏━┓
┏━┛┣╸ ┣┳┛┃ ┃   ┗━┓┃ ┃┃╺┓┣━┫┣┳┛
┗━╸┗━╸╹┗╸┗━┛   ┗━┛┗━┛┗━┛╹ ╹╹┗╸
reMarkable: ~/

~ hacker voice ~ You're in 😎.

Step 3 - copy your template to the reMarkable

At this point you should pause to ensure that you know the filepath to your template file on your computer. If you saved it to your desktop (not a great place for long term storage, but convenient for quick operations like this) it will be something like ~/Desktop/my_custom_template.svg. We are now going to create a special subdirectory for your custom template/s, and copy your file across.

In your terminal session you should still be logged in to the reMarkable. The templates are all stored in the /usr/share/remarkable/templates directory. To create a new subdirectory, we use the mkdir command, like this:

mkdir /usr/share/remarkable/templates/my_templates

Now we can copy our template over. Open a new terminal window. We will use the secure copy protocol to copy the file over SSH from your computer to your reMarkable:

scp ~/Desktop/my_custom_template.svg root@xxx.xxx.xxx.xxx:/usr/share/remarkable/templates/my_templates/

Back in your first terminal session – which should still be connected to the reMarkable – you can check whether the file transferred across using the ls command:

ls /usr/share/remarkable/templates/my_templates

This should display my_custom_template.svg.

Step 4 - update the templates.json file

Now for the trickiest part. You will need to update a file in the templates directory called templates.json. This provides information about where each template is stored, what it should be called, and which icon to use in the templates menu. If you make an error here, your templates may no longer work properly (I know this from my own mistake!) - so whilst it is reasonably straightforward, you do need to pay attention.

Many tutorials about editing files on the Linux command line tell you to use vi or vim. These are the default text editors on Linux, but they are also obtuse and confusing for newcomers. We are going to instead use the nano program that is also standard on most Linux distributions, but a little easier to understand. To edit the templates JSON file, open it in nano:

nano /usr/share/remarkable/templates/templates.json

You should now see a screen showing the beginning of a long string of JSON. We want to add a new entry down the bottom of the file, so we will navigate down to line 500 using the keyboard shortcut Ctrl + / + 500 + Enter. From there you can use your cursor/arrow keys to navigate down to the last entry in the file. We want to add a new entry, like this:

    {
      "name": "Hexagon small",
      "filename": "P Hexagon small",
      "iconCode": "\ue98c",
      "categories": ["Grids"]
    },
    {
      "name": "My Daily Schedule",
      "filename": "my_templates/my_custom_template.svg",
      "iconCode": "\ue9ab",
      "categories": ["Planners"]
    }
  ]
}

Make sure you do not overwrite or delete the square and curly brackets at the end of the file, that you put a comma between the second-last entry and your new one, and that you do not leave a trailing comma after your new entry.

Note that the filename is relative to the templates directory, so we need to include the new subdirectory. The iconCode uses a "private use" unicode value that matches one of reMarkable's standard images – it is not possible to create your own icon so you will need to re-use one of the existing ones.
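If hand-editing JSON in nano feels risky, you can make the same change programmatically on a copy of the file pulled to your computer with scp, then push it back. This sketch assumes the file's top-level key is "templates", which matches the bracket structure shown above; the sample data here just simulates that layout:

```python
import json

def add_template(path, entry):
    """Append `entry` to the templates array in templates.json.

    json.load/json.dump cannot emit a trailing comma or unbalanced
    brackets, which avoids the hand-editing mistakes described above.
    Assumes the top-level key is "templates" (an assumption matching
    the bracket structure in the excerpt).
    """
    with open(path) as f:
        data = json.load(f)
    data["templates"].append(entry)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)

# Simulate a local copy pulled from the device with scp.
sample = {"templates": [{"name": "Hexagon small", "filename": "P Hexagon small",
                         "iconCode": "\ue98c", "categories": ["Grids"]}]}
with open("templates.json", "w") as f:
    json.dump(sample, f)

add_template("templates.json", {
    "name": "My Daily Schedule",
    "filename": "my_templates/my_custom_template.svg",  # relative to templates dir
    "iconCode": "\ue9ab",
    "categories": ["Planners"],
})
```

After editing the local copy, scp it back over the original on the device before rebooting.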

Once you confirm everything is correct, enter Ctrl + x to exit, and y + Enter to confirm you want to save changes using the original filename.

Step 5 - reboot

Now for the most terrifying moment: rebooting your reMarkable!

Back on your command line, type reboot and then press Enter.

This step is simple but it will be a little nerve-wracking because your reMarkable will reboot, then pause for a moment before letting you log back in. If everything has gone according to plan you should now be able to find your new template by name in the templates menu, and start using it!

Optional bonus step 6 - SSH keys

Logging in with a password is ok, but it can get a bit tedious. An easier way is to use SSH keys.

You can set up an SSH "key pair" on Linux and MacOS and also now natively on Windows.

Once you have created your keys, you can use ssh-copy-id to copy your public key to your reMarkable, allowing you to log in without a password! We use the ssh-copy-id command, with the -i flag followed by the path to our SSH key:

ssh-copy-id -i ~/.ssh/id_rsa root@xxx.xxx.xxx.xxx

If you only have one ssh key, you can just enter:

ssh-copy-id root@xxx.xxx.xxx.xxx

At the prompt, enter your password and press Enter. You should see a number of lines of output, ending in:

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh 'root@xxx.xxx.xxx.xxx'"
and check to make sure that only the key(s) you wanted were added.

You should now be able to log in to your reMarkable to update templates at your leisure without a password.
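If you'd rather not type the IP address every time, you can also add a host alias to your SSH config file (a sketch: "remarkable" is just a name you choose, and the placeholder address should be replaced with your device's actual IP):

```
# ~/.ssh/config
Host remarkable
    HostName xxx.xxx.xxx.xxx
    User root
```

With that in place, `ssh remarkable` logs you straight in, and scp works with the same alias.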

Happy note taking!


by Hugh Rundle at August 05, 2025 12:00 AM

August 04, 2025

Ed Summers

Tide Out


Looking south along Villas, NJ at the Delaware bay.

August 04, 2025 04:00 AM

August 01, 2025

Open Knowledge Foundation

‘Educating the Gaze’ with Open Data Editor at Abrelatam/Condatos Bolivia 2025

Workshop participants were able to identify problems associated with working with data. Some common scenarios were addressed, which were linked to a fear of working with data or dealing with databases without knowing what to do.

The post ‘Educating the Gaze’ with Open Data Editor at Abrelatam/Condatos Bolivia 2025 first appeared on Open Knowledge Blog.

by Omar Luna at August 01, 2025 09:45 PM

Mita Williams

How I use Zotero + OpenRefine + QuickStatements to create Scholia profiles from Wikidata

Let's make scholarly profiles for our colleagues. Together.

by Mita Williams at August 01, 2025 06:16 PM

LibraryThing (Thingology)

August 2025 Early Reviewers Batch Is Live!

Win free books from the August 2025 batch of Early Reviewer titles! We’ve got 232 books this month, and a grand total of 4,410 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.

If you haven’t already, sign up for Early Reviewers. If you’ve already signed up, please check your mailing/email address and make sure they’re correct.

» Request books here!

The deadline to request a copy is Tuesday, September 2nd at 6PM EDT.

Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to the US, the UK, Canada, Germany, Australia, New Zealand, Spain, Denmark, France, Belgium and more. Make sure to check the message on each book to see if it can be sent to your country.

How to Dodge a CannonballThe WhisperingsDeeply Loved: Receiving and Reflecting God's Great Empathy for YouA Killer in the FamilyThe Pursuit of Liberty: How Hamilton vs. Jefferson Ignited the Lasting Battle Over Power in AmericaVianneThe Missionary Kids: Unmasking the Myths of White EvangelicalismGirl LostBlind DevotionDeath on Omega StationIt Will Be God: Live in the Jaw-Dropping Realities of God's GoodnessCozy Foodies: Cute and Comfy Coloring BookBeing Middle (Is a Great Place To Be)Liberals with Attitude: The Rodney King Beating and the Fight for the Soul of Los AngelesSwizThe Man Who Shot Liberty ValanceBroken ArrowCanyon and Cosmos: Searching for Human Identity in the Grand CanyonSnow Fleas and Chickadees: Everyday Observations in the SierraWhere the North EndsSoulslayersAhhh, Milk!: The Many Ways Milk Is MadeMeowsterpieces: Cozy Cat x Famous Paintings Coloring BookAnne DreamsA Skeleton in the ClosetLights at NightThe Little Ghost Quilt's Winter SurpriseThe World's EndThe Sleeping GiantTroop EsmeJourneys to the Nearby: A Gardener Discovers the Gentle Art of UntravellingThe Death KnotNarrowlyOld Man Evil & Other StoriesThe Director's CutIn the Low: Honest Prayers for Dark SeasonsThe Weight of Snow and RegretLight and Shadow: Minds Shine Bright: International Creative Writing AnthologyTrue BlueNo Is a Full F*cking Sentence: How to Stop Saying Yes When You Want to Say F*ck NoPirkei Hallel: A Shared Journey for Bat Mitzvah Girls and Their MothersDivrei Halev: Thoughts of Rabbi Professor David Weiss Halivni on the Weekly Torah PortionWine Journey: An Israeli AdventureLilacs and Roses: A Poetry CollectionEchoes of the EastOnce and FutureAnything For A Quiet Life: The Autobiography of Jack HawkinsThe Joy of Solitude: How to Reconnect with Yourself in an Overconnected WorldAutumn's GraceStillwaterBreach the HullWrecking Ball: Race, Friendship, God, and FootballThe Ghost Who Wouldn't Leave: And Neither Would SheThe Last Adieu: Lafayette's Triumphant Return and the Grand 
Celebration That United a Grateful AmericaWhat the Water RemembersBut The Wicked Shall PerishIncurable: Stories from the World of CureBorn Lucky: A Dedicated Father, a Grateful Son, and My Journey on the SpectrumWork How You Are Wired: 12 Data-Driven Steps to Finding a Job You LoveJoan Crawford: A Woman's FaceDecades of Nostalgia Coloring Book: Relive Iconic Moments, Relax Your Mind, and Rekindle Memories with Vintage Scenes of American Pop Culture from 1950s To 1990sHedge WitchNiki UndercoverBad Indians Book Club: Reading at the Edge of a Thousand WorldsThe Student Debt Crisis: America's Moral UrgencyInevitable: Inside the Messy, Unstoppable Transition to Electric VehiclesWith My People: Life, Justice, and Activism Beyond the UniversityHow Daddy Lost His Ear and Other StoriesThe 21-DAY Herbal Antiviral & Detox Guide: A Healing Book of Medicinal Herbs, Natural Remedies, and DIY Apothecary Rituals to Cleanse, Boost Immunity & Rejuvenate Health in Just 3 WeeksThe Herbal Apothecary Book & Natural Medicine Guide: 600+ DIY Medicinal Herbs, Healing Recipes & Remedies for Immunity, Infection, Wellness & Everyday Health—A Holistic Reference for Beginners & ExpertsPink Salt Trick for Weight Loss: The 60-Second Morning Electrolyte Reset That Melts Belly Fat, Balances Hormones, and Ignites All-Day EnergyPink Salt Trick for Weight Loss Plan: The 28-Day Hormone-Balancing Challenge with Sunrise Tonics, Real-Food Recipes, and a Printable Success JournalCortisol Detox Diet Plan: The 28-Day Fat-Loss Reset to Balance Hormones, Reduce Stress Weight, and Restore Energy with Science-Backed Cortisol Support for WomenFire and IvyHarriet Tubman, Force of Nature: A Biography in PoemsThis Book Is Too Quiet!: You Add the Noise...Pwcca Pure SoulsThe Rules of Falling for YouThe Infinite GladeWonder and Joy for the Wired and Tired: A Guide to Finding Inspiration and Well-Being in a Wonder-Filled WorldStrange Shape of LoveBurning HeartsDoris the DragonDead Man BluesCooking As Therapy: How to 
Improve Mental Health Through CookingAn Amateur Witch's Guide to MurderThe Violin Family Plays New MusicFollow Me To The EndMaria la DivinaBond Keeper: The Watcher's GiftAttack of the HangriesIn Search of a Mennonite Imagination: Key Texts in Mennonite Literary CriticismWearing a Broken Indigene Heart on the Sleeve of Christian MissionThe Curious Cliche of the Black ScarabCrocoVenomous River: Changing Climate, Imperiled Forests, and a Scientist's Race to Find New Species in the CongoAn Offer He Can't RefuseGone CountryShattered Peace: A Century of SilenceFinal LapMenagerie in the Dark: StoriesThe Gunslinger's WidowPirate Cuisine: A Children's CookbookRookVividwaterSmart Money MovesSpiders on a ShipThe Mosaic Way: Evolve Your Emotional Intelligence for Inclusive Leadership in a Changing WorldWild Oz: Hilariously Unfiltered Backpacking StoriesMoonlight JusticeDoctor WitchRun Away to MarsFrom Pain to Power: A Guide for Women Finding Strength Through Life's StormsStronger Than Fragile: A Mother's Journey Through Preterm Birth, Osteogeneses Imperfecta and GriefOrphanlandHow I Hacked the MoonDeadly SanctuaryDaughter of DreamGang Way: The BrotherhoodMaya's Diary: The Lost JournalDark Sky Full of StarsMy Digital SoulCherish or Perish: Strengthen Your Intellect; Save the World!Who Banged the Big Bang!: Trans-Scientific Mysteries: Decoded, Simplified!A. R. I. S. 
E.: The Hummingbird Strategy for Mental Strength and Sustainable SuccessShut Me up in Prose: StoriesIntersecting PathsTo Gaze Upon a Darkened CloudUltimartPulstar III - The Cracked Mirror of the CosmosThe Arts — A Practical Approach: A Textbook for ChildrenThe Cancer Diet: A Memoir on Resilience and RedemptionA Bridesmaid's Guide to MurderIntersectionsOss'stera: An Epic FantasyBefore the Next Crisis: Untold Stories of Public Health and Why They MatterThe Phoenix Gene: Origins of DarknessA Nest of MagicThe Joyful Guide To Retirement—A Simple Roadmap to Happy, Healthy and a Thriving LifeA Stroke of Luck: My Journey Through a Traumatic Brain InjuryMonsterlandHigh Hopes: A MemoirWhere Did Rocky Pee?The Marfa Blues: Searching for TreasureA Vow in the VeilBetween the Lines: A Memoir in the Quiet SpacesThis Small Moment's ShelterTime Travelers: Minecraft Meets Civil War: An Unofficial Minecraft AdventureGreatnessMurder, Magic, and Maple FudgeDown to UrsaHow to Leave Preterism: Escaping the Anti-Communal Theology of Private InterpretationNight of the Living Toilet PaperFuture XOctavoSplitA Brush with FateChemical SoulsA Horse Drawn Sick Bar Cutter: Finding My Road to FelicityBraxton and Booger: Surviving Space SchoolDevil's HandThree Faces of Noir Curse Crime CringeThe Way of LucheriumWhen Secrets BloomBreaking the Chains: Your Path to Financial Independence and Freedom1970: Year of Tragedies and TransitionsArisingBroken Meditations: An Unhinged CollectionGod's Gonna Cut You DownLet Bygones Be BygonesDawn of ShadowsOnce Twice ThriceIs This the Way Home?A Tea for TwoBecoming PeterBiology Made Simple: An Introduction for Kids19 Doors19 DoorsBehind Closed Doors: Memoirs of an American Call GirlThe Founding SyndicateLove and Nature / Amor y NaturalezaThe Aquamarine NecklaceSabrina Tells Maddie the Truth About Her PastMillion Dollar MadnessPrompted: A Journey Into Yourself: 20 Conversations That Awaken Your Deepest SelfThe Art & Practice of Living WondrouslyPeace as a Cosmic 
Law: Aligning with the Universe’s Guiding PrincipleAtomic Impact: Systems for Transformative ProductivityAfterDarkThe Flown Bird Society: An Illuminated StoryThe Floating Lake of Dressa MoorePhantom AlgebraBehind the MirrorVisage of MorosTapoutWe The People: A PremonitionDefending the Middle Ages: Little Known Truths About the Crusades, Inquisitions, Medieval Women, and MoreThe Intelligent Age: A Field Guide to the FutureBroke, Driven, and Building: A True Story from Rock Bottom to Financial IndependenceThe UmpDescansoA Complete History of the European Figure Skating ChampionshipsA Complete History of the World Figure Skating ChampionshipsYou Had Me at Waffles: The Story of Emily and AdamMAX's Mashed Marble!Plundering PicassoMalariaEmmelyn's Book ClubRed Sun RisingWishstone: Chains That BreakFuneral of Secrets and Other StoriesTread Dead RedemptionThe CampaignDancing in the Aisle: Spiritual Lessons We've Learned from Children (25th Anniversary Edition)The Belly-Up Code: Your Blueprint for Startup SuccessVan Gogh's LoverHearts Beneath The Broken SkyCut Off from Sky and EarthThe Dragonkin Legacy: The Last War & Dragon GuardiansRoots Beyond Borders: Navigating Identity and Belonging in a Multicultural WorldFuturumBeyond the PalePointless: Revenge Is a Rationalization, Not a MotivationThe CollectorsChildren of the Fire MoonUnspokenBear Witness: A Crusade for Justice in a Violent LandBear Witness: A Crusade for Justice in a Violent LandThe Beginner's Guide To Psychedelics: How to Reduce Risk, Deepen Connection, and Maximize InsightsThe Beginner's Guide To Psychedelics: How to Reduce Risk, Deepen Connection, and Maximize InsightsSpiritual Edge: Exploring the Boundaries and Evolution of ReligionSpiritual Edge: Exploring the Boundaries and Evolution of Religion

Thanks to all the publishers participating this month!

Akashic Books, Akashic Media Enterprises, Alcove Press,
Anchorline Press, Awaken Village Press, Baker Books,
Bear Paw Press, Bellevue Literary Press, Bethany House,
Bigfoot Robot Books, Broadleaf Books, Castle Bridge Media,
Chosen Books, Cinnabar Moth Publishing LLC, CMU Press,
Consortium Book Sales and Distribution, Crooked Lane Books, eSpec Books,
Gefen Publishing House, Gnome Road Publishing, Harbor Lane Books, LLC.,
Harper Horizon, HarperCollins Leadership, Harvard Business Review Press,
HB Publishing House, Henry Holt and Company, Heritage Books,
Mayobook, Minds Shine Bright, Muse Literary Publishing,
Paul Stream Press, Pegasus Books, PublishNation,
Purple Moon Publishing, Revell, RIZE Press,
Ronsdale Press, Rootstock Publishing, Running Wild Press, LLC,
Seerendip Publishing, Simon & Schuster, Sunrise Publishing,
Tapioca Stories, Tundra Books, University of Nevada Press,
University of New Mexico Press, UpLit Press, What on Earth!,
Wolf’s Echo Press, WorthyKids, Yorkshire Publishing

by Abigail Adams at August 01, 2025 06:13 PM

Digital Library Federation

DLF Digest: August 2025

A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation. See all past Digests here

 

Greetings, DLF community! August can seem like a quiet, slow-going month, but it’s also the time when so much behind-the-scenes work is being done to prepare for the busy events, meetings, and schedules in the months ahead. If you’re like much of our community, quietly working hard now with an eye toward the fall, just remember to take a beat and soak up a few sunny warm moments here and there – you deserve rest too.

— Aliya from Team DLF

 

This month’s news:

 

This month’s DLF group events:

Arts and Cultural Heritage Working Group Special Session: Reimagining Digital Library Software for Arts & Cultural Heritage Collections

Thursday, August 28 2025 at 1pm ET / 10am PT; https://clirdlf.zoom.us/meeting/register/BcLPpWNkQQubOtFBKN2oLg 

 

Join the Arts and Cultural Heritage Working Group for a special session presented by Kendra Bouda.

Metavus (https://metavus.net) is a free, open source digital collections platform developed by Internet Scout Research Group at the University of Wisconsin–Madison. Originally designed to support STEM repositories with no physical holdings, Scout is now exploring how the platform might be adapted for small to mid-sized libraries, museums, historical societies, and archives that manage both digital and physical collections.

Join presenter Kendra Bouda for this one-hour session introducing the Metavus for Museums project—a customized installation of Metavus tailored to the needs of arts and cultural heritage institutions. Kendra will share project goals, demo the software and its features, and discuss key challenges encountered in adapting the platform. 

Participants will be invited to share their own experiences and expectations—whether by discussing functionality they value in a digital collections platform or reflecting on challenges they’ve encountered in their own environments. This session aims to spark conversation, surface shared needs, and explore ideas that may lead to more adaptable and user-informed tools. Participants of all backgrounds and levels of expertise are welcome to attend.

Register to attend this session: https://clirdlf.zoom.us/meeting/register/BcLPpWNkQQubOtFBKN2oLg 

 

This month’s open DLF group meetings:

For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.

 

DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member organization. Learn more about our working groups on our website. Interested in scheduling an upcoming working group call or reviving a past group? Check out the DLF Organizer’s Toolkit. As always, feel free to get in touch at info@diglib.org

 

Get Involved / Connect with Us

Below are some ways to stay connected with us and the digital library community: 

The post DLF Digest: August 2025 appeared first on DLF.

by Aliya Reich at August 01, 2025 01:00 PM

Artefacto

Five for Friday – Interesting things about mobile libraries

This week we have put together five interesting things about mobile libraries. While these aren’t necessarily new, some are definitely new to us. And others are just neat things that should be celebrated. We are always impressed by the important work that mobile libraries do – outreach, community building, digital equity and more.  The thing [...]

Continue Reading...

Source

by Artefacto at August 01, 2025 08:01 AM

July 31, 2025

Open Knowledge Foundation

The Road to a Press Freedom AI Commons

Opening the International Press Institute (IPI) archive is not just for preservation, but the first step toward creating a digital press freedom commons for the public good.

The post The Road to a Press Freedom AI Commons first appeared on Open Knowledge Blog.

by Renata Ávila & Scott Griffen at July 31, 2025 12:40 PM

July 30, 2025

Open Knowledge Foundation

Unlocking Transparency with Open Data Editor: Join Us at DRAPAC25 and Open Tech Camp Kuala Lumpur 2025

The Digital Rights Asia-Pacific Assembly returns for its third edition on August 26 - 28, 2025, in Kuala Lumpur, Malaysia, bringing together diverse stakeholders to combat rising digital authoritarianism through collaborative, rights-based digital governance.

The post Unlocking Transparency with Open Data Editor: Join Us at DRAPAC25 and Open Tech Camp Kuala Lumpur 2025 first appeared on Open Knowledge Blog.

by Nikesh Balami at July 30, 2025 07:54 PM

Training Indian Researchers on Data Literacy with Open Data Editor

The offline session discussed the different features of ODE using public datasets downloaded from www.data.gov.in.

The post Training Indian Researchers on Data Literacy with Open Data Editor first appeared on Open Knowledge Blog.

by Mohit Garg at July 30, 2025 07:39 PM

Empowering Life Scientists in Africa through Open Data: Highlights from the ODE Training

Using data such as COVID-19 genomic metadata from GISAID and custom datasets like dog genomic metadata, participants explored how to detect inconsistencies and errors in datasets.

The post Empowering Life Scientists in Africa through Open Data: Highlights from the ODE Training first appeared on Open Knowledge Blog.

by Seun Olufemi at July 30, 2025 07:23 PM

ODE Training in Germany and Argentina: A Hands-on Experience with Data Journalism

The sessions were designed to equip participants with essential skills needed in today's data-driven journalism landscape and incorporating the Open Data Editor (ODE) in the data pipeline for journalists.

The post ODE Training in Germany and Argentina: A Hands-on Experience with Data Journalism first appeared on Open Knowledge Blog.

by Gibran Mena at July 30, 2025 07:09 PM

Our Network Contribution to Shape the European Union’s Data Policy

To create a more coherent and effective legal framework, we propose the following 5 pillars as foundations for the European Data Union Strategy.

The post Our Network Contribution to Shape the European Union’s Data Policy first appeared on Open Knowledge Blog.

by OKFN at July 30, 2025 03:22 PM

Andromeda Yelton

In which we ask Copilot to do the team a solid

So, I wrote an alien artifact no one else on my team understood. (I know, I know.) I’m not a monster — it has documentation and tests, it went through code review for all that that didn’t accomplish its usual knowledge-transfer goals — and there were solid business reasons the alien artifact had to exist and solid skillset reasons I had to write it, and yet. There we were. With an absolutely critical microservice that no one understood except me.

One day someone reported a bug and my creative and brilliant coworker Ivanok Tavarez was like, you know, I’m pretty sure I know where this bug is in the code. I have no idea what’s going on there, but I asked Copilot to fix it. Also I have no idea what Copilot did. But it seems to have fixed it. Knowing that I’m rather more of an AI skeptic than he is, he asked, would I entertain this code?

And you know what? Let’s do it.

the Mean Girls girls-in-car meme template captioned 'get in loser, we're reviewing code live'

I mean obviously we’ve gotta have a human review this code before it lives in main and there isn’t an option besides me. But suddenly we have an opportunity, because if I turn this code review into a Twitch-meets-Webex livestreaming party, my whole team can watch me talk my way through it, interrupt with questions, and hear my whole mental model of this section of the code, right? Hear my reasons why this code works or doesn’t, fits with the existing structure or doesn’t?

It turns out I just needed some code from outside of myself to make this possible. And the only way to get code for our alien artifact was from an alien author.

And it was great.

I could see the gears turning, the lightbulbs flipping on, the “+1 XP”s pinging up. I think it was the first time anyone else on the team got real mental traction on this code. Actually it was so great I’m now doing a series of walkthroughs where I just narrate my way through the code with the team until we all get tired of it or feel adequately enlightened. And for the first time, I feel like if we need to add more functionality to this microservice, I might actually be able to assign someone else to do it — with handholding and consultation, yeah, but without me being that guy from Nebraska for my team.

https://xkcd.com/2347/ , the one with the Jenga tower labeled 'all modern digital infrastructure' and the tiny yet load-bearing element labeled 'a project some random person in Nebraska has been thanklessly maintaining since 2003'

So…yeah. I’m pretty interested in machine learning (I did the original Andrew Ng Coursera course and several of its followups, on my own time! I did a whole experimental neural net interface (now offline) to MIT’s thesis collection in 2017! I taught an AI course at the SJSU iSchool! I was an ML researcher for the Library of Congress!). But I’m also reflexively skeptical of it (I use it in my own coding for autocomplete, but I’ve never described code and had the LLM write it! that AI course sure did talk a lot about algorithmic bias! I went to a college whose actual mission statement is about the impact of technology on society! I believe in the importance of real human thought and kinda want LLMs to just get off my lawn!). This use case captivated me because it genuinely surprised me. I hadn’t thought about it as a way to potentially expand the capacities of my team — not in some hackneyed capitalist-grind productivity way, but by getting us outside my own head (the limiting feature in this case) and giving us a shared artifact that we could use as the basis of a conversation to genuinely advance our own skills.

I can hear you asking whether the code was any good. For the purposes of this use case, the great thing is it doesn’t matter; it just has to be relevant enough to support substantive review. But fine, fine, I won’t leave you hanging: I had minor stylistic quibbles but in fact it correctly identified a semantic bug I had totally missed and fixed it in a way which fit cleanly into the existing structure, and I merged it without qualms.

And yesterday Ivanok came up with a clever new way to leverage AI for teambuilding and skill-building purposes, so I’m gonna tell you about that too! But in the interests of word count, you’ll have to wait for part two :).

by Andromeda at July 30, 2025 02:03 PM

In the Library, With the Lead Pipe

Book Club Pláticas: Reflexiones on Culturally-centered Methodologies

In Brief In spring 2024, two Latinx colleagues at California State University, East Bay, developed a pilot program focused around hosting a book club which has evolved into a larger exploration of plática methodology. This article explores culturally sustaining co-curricular collaborations and spaces on a university campus through the use of book club pláticas and PRAXISioner reflexiones (Reyes, 2021). The authors reflect on their roles as PRAXISioners, plática as methodology and practice, and engage on the value of self-sustaining practices as Latine educators.

By Daisy Muralles and Vanessa Varko Fontana

“This pedagogy makes oppression and its causes objects of reflection by the oppressed, and from that reflection will come their necessary engagement in the struggle for their liberation. And in the struggle this pedagogy will be made and remade.”

Paulo Freire (1921-1997)

Introduction

In spring 2024, we took a popular model often used in American libraries, the book club, and added a cultural and community-building lens as part of that experience. In this article, we will share how we came to this work as PRAXISioners, and the barriers we aim to break down through our collaborative work. We will also describe how our collaboration on the book club project acted as a vehicle to hold culturally informed pláticas and what they looked like; and, finally, we also reflect on how this work allows us the space to come together with our own experiences as teachers and learners. The book club gave us an opportunity to explore the works of Latine scholars and authors, to engage in pláticas, allowing us to dive into new concepts and ideas about our culture that we had not discussed before–the unnamed things that somehow we understood as being part of our cultural identities but were not always sure of where they came from or why they existed. Throughout this article we will use the gender-inclusive “Latine” in place of the plural Latinx or Latina or Latino or Latin@, or its many variations. Created by feminist and nonbinary communities in both Latin America and the United States in the 2000s, Latine aims to describe all people, not just men or women (Guzmán, 2023).

We hope readers will walk away knowing the importance of culturally-sustaining co-curricular programs. We hope readers feel empowered to lean into their cultural-sustaining pedagogies to inform practices that are by and for BIPOC communities. We hope to inspire or mostly affirm for librarians who are already doing this cultural work, that this is important work for ourselves, our students, and campus communities.

Some of the content of this article was originally presented as, “Praxisioners Platicando: Fostering Belonging Through Culturally Centered Learning,” for Case Studies In Critical Pedagogy hosted by the Metropolitan New York Library Council (Muralles & Varko Fontana, 2024). The “Case Studies in Critical Pedagogy” event was a primer for learning about and thinking about anti-colonial theory and pedagogy. This article hopes to expand the reflective process of that presentation.

Context and Positionality

We are both Latine educators at California State University – East Bay, a higher education institution. Cal State East Bay is one of 23 campuses in the California State University (CSU) System. In 2024, it was ranked 13th in the USA Today Most Diverse Universities, and listed in the top 2% by CollegeNet.com in the Social Mobility Index, among other awards and recognition (Cal State University East Bay, 2024). Additional facts from the 2024 report include that Cal State East Bay has a female population of 59.3%, 57.1% are first generation college students, and 40.9% of undergraduates identify as Latinx. Cal State East Bay is also considered a commuter campus. In an internal Parking and Transportation Services report (FY 22-23), of the 581 undergraduates who responded (65%), most commuted more than 5 miles to campus, with 119 reporting commutes of more than 30 miles (Parking and Transportation, 2023). These demographics are important to keep in mind, as our aim was to reach Latine women on our campus.

Specifically, we are a Latinx student success coordinator and a Latinx librarian navigating relationships, politics, changing leadership, ongoing financial hardship, and more at a Hispanic-Serving Institution (HSI) that has received the Seal of Excelencia. Institutions that receive this national certification are recognized for their direct service to Latino students, unlike the HSI designation, which is based primarily on enrollment numbers (Excelencia in Education, 2024). However, we started working with each other in part because we were frustrated that “HSI” and “Seal of Excelencia” still felt like lip service because of the ongoing financial cuts to our programs, including hearing that some of the programs that led to receiving the Seal of Excelencia were no longer going to be supported by the university.

Barriers Latine Students Face

The ongoing budget cuts, consistent news of low enrollment, and recent executive orders, such as Ending Radical and Wasteful Government DEI Programs and Preferencing (Exec. Order No.14151, 2025), and Ending Illegal Discrimination and Restoring Merit-Based Opportunity (Exec. Order No.14173, 2025) exacerbate issues of discrimination, bias, and exclusion experienced by immigrant and Latine communities pursuing higher education. And despite there being efforts to promote diversity and inclusion in higher education prior to these executive orders, Latine students continue to face barriers to fully engaging and feeling a sense of belonging on their college campuses (Manzano-Sanchez, et al., 2019; Dueñas & Gloria, 2020). This impacts the psychological well-being, academic performance, and overall college experience of Latine students (Manzano-Sanchez, et al., 2023; Fernandez, et al., 2022). Additionally, Latine undergraduates are likely the first in their families to attend college (Postsecondary National Policy Institute, 2022). Our students share with us that they often feel that they can’t fully express their problems or college experiences in their home setting, while also not being able to share the challenges they have at home with their peers.

The book club became a way of meeting these problems, seeing them, and learning directly from students about how these problems impact them and their sense of belonging. Hearing our students’ stories and connecting with those stories highlighted another ongoing problem in higher education. We hear these individual stories and recognize them as stories of resilience and hope. We see the problem-solving from our students and recognize that we have a shared struggle with them, that as Latine women in higher education, we still share these similar experiences in our academic journeys as well. We are reminded that before March 2022, there was no Latinx Student Success Center. We are reminded that there had never been a Latinx librarian at this institution before I was hired in 2020. But we are here now, and what our roles as librarians and student success center coordinators afford us is to bring these students together, not only to us but to connect them with other Latine staff and faculty, to learn from our shared experiences and support each other. We also recognize that as Latinx educators we hear ourselves saying similar things. Both of us have experienced imposter syndrome (Brown, 2023), vocational awe (Ettarh, 2018), or the feeling of being “the only one” (Pierre, 2024). But the book club brought us together.

Where we are coming from

In our work, we remind ourselves and each other that we need to keep supporting our cultural selves, as Latine educators and cultural workers in higher education, to be able to welcome and support our racially and ethnically diverse students. Below we provide more information about where we are coming from to help readers understand how we came to this point in our work. We hope to share a bit more about how together, through the book club, our exploration of plática methodology, and our ongoing reflective practice, we have begun to more systematically explore how our partnership with the University Library and the Latinx Center can work towards building culturally sustaining, co-curricular spaces for our students.

Daisy: Born and raised in a primarily immigrant community in a Black and Hispanic neighborhood in Central Los Angeles, I was one of four daughters to Guatemalan immigrants. My parents ingrained in me that education was a priority. For this reason, my father, in an effort to improve our educational attainment and social welfare (there was a lot of substance abuse and gang violence in our neighborhood), moved us to the San Fernando Valley. We were no longer in a familiar environment of primarily Black and brown folks, but as the demographics of our new community changed, and our field of view grew, we were exposed to more diverse communities. This continued throughout my life and several years later, while working as a staff member at an archive, I was introduced to the California Ethnic and Multicultural Archives, curated by Salvador Güereña. I did not find an exact cultural representation of my experience in the different collections within this archive, but I did see something that went beyond. I began to expand my understanding about the shared immigrant experience, about being poor, and coming from a working class family. I felt the “ni de aqui, ni de alla” feeling across silkscreens, postcards, and art work from Chicanx/Latinx, Native American, Asian American and African American artists but something else as well. My cultural self was not necessarily exactly represented but there was some other visual articulation that represented the neighborhoods I grew up in, the food we ate, the familiar figures in our lives.

Fast forward to my current position in the East Bay, where I am the only Latina in my department of 12: I was feeling the cultural disconnect again. Despite being at an HSI/Seal of Excelencia receiving institution, I felt very much like the only one. It was not until a presentation with the Chicanx/Latinx Staff and Faculty Association that I was able to connect with other Latine faculty members. It was through a relationship with one very special Latina faculty member (thank you, Professor Crystal Perez), through her sisterhood, that I was able to meet and befriend other Latine folks on campus, which, thank goodness, also included Vanessa. What started off as meeting at events and attending each other’s workshops and programs turned into hanging out to de-stress, in addition to recommending courses and making student referrals to each other. And through these encounters we finally realized we could be working more closely together, because being together encouraged us to do the outreach and programming we wanted to do to serve our Latine student community with our authentic selves.

Vanessa: I was born and raised in Los Angeles county in the 1980s and ‘90s to a Salvadoran mother and Guatemalan father. Within four years my father died, which began my experience being raised by a single mother and later embracing my identity as a fatherless child. Along with this identity shift came the new reality of balancing suburban living and inner-city experience. This looked like adjusting to a private school culture as a Spanish-speaking child during the week and staying connected to family and culture headquarters that remained in the heart of an urban mecca on the weekends. Simultaneously, my family chose the route of acculturation and silencing the painful history of our roots in El Salvador, which included poverty, civil war, and violence. While the awareness came later, it is clear that these early experiences became a foundation for the career in education that I later embarked on and have embraced for over twenty years. What I came to learn is that this duality mirrored the journey unfolding as a first-generation Latina in the US. The feeling of “ni de aqui, ni de alla” explained my struggles as a daughter of an immigrant, a fatherless child, and the choice to develop as a social justice educator and academic scholar. I realized that my liberation would come with learning my history and asking questions to better understand my ancestors, their pain, and their radical resistance. This realization led me to pursue a degree focused on activism and social change at New College of California and the space to learn from various social movements, including those in El Salvador. While my undergraduate studies offered me the historical and theoretical framework for my early experiences and observations, it was the community work I delved into while studying that provided the additional layer of self-awareness and commitment to social justice education.

Personally, my first-generation experience meant working 40+ hours a week while maintaining my path towards graduation. These experiences included the typical retail student jobs as well as entry-level jobs in education and youth development programs. The non-profits that I connected with offered me additional knowledge and theories such as harm reduction, youth development, positive sexuality, and anti-oppression. As I earned my bachelor’s degree and continued my professional career in non-profits and schools, I recognized the importance of mentorship and guidance. It also became clear that when this connection was created with fellow Latine professionals, it added a unique layer of support and understanding that has been essential to my professional and personal goals towards healing and liberation. Hearing their stories, feeling their support, and creating communities helped redirect my professional development to healing-centered engagement instead of the typical burnout path that many of us educators experience as we navigate the bureaucracy and institutional oppression that exist at every level in education (Ginwright, 2018). When I arrived at CSUEB as the inaugural Latinx Student Success Center (LSSC) Coordinator, I approached the role with the intention to build community throughout campus and find like-minded individuals. This mindset led me to connect with Latinx professors in the English department with the intention to collaborate on campus events. These successful partnerships led to professional collaborations that would become friendships. Naturally, this camaraderie and safety created a space to share ideas and thought partnership on continuing to build together as individuals and as professionals on campus. When I met Daisy and learned of our shared Guatemalan and San Fernando Valley roots, it felt like a familiar and comfortable space with a cosmic push to create a collaborative project with the library and LSSC. This was proven by the ease and natural flow with which this project came to be, and through the healing, powerful conversations with each other and student participants.

Becoming PRAXISioners

In our work, we have begun to adopt the term PRAXISioner in referring to our efforts to address systemic problems experienced by the individuals we work with, the communities we aim to uplift and support, and ourselves. A PRAXISioner reframes the practitioner through praxis. A PRAXISioner is thus embedded in, and concerned with, the history, needs, and aspirations of the community towards self-determination and actualization. The PRAXISioner is continuously studying and sharpening their analysis by deepening their ongoing learning and self-reflection on critical theories; the PRAXISioner understands the affirming and healing potential of their work, especially through historizing, problematizing, and reframing (Reyes, 2021). Reyes (2024) further shares that to be a PRAXISioner:

… is to be of concrete help to my local and global community in our struggle for community preservation and liberation…rooted in [Paulo] Freire’s conception of praxis, which involves engaging the language of critique to problem-pose one’s material conditions within a cycle that includes engagement with critical knowledge/theory, self-reflection, dialogue, and action.

When the opportunity for our departments to collaborate came up, we discussed how to bridge our work and the importance of including self-reflection. It was through our conversations that we saw the term PRAXISioner as incredibly reflective and applicable to us, in both aspirational and inspirational ways. Therefore, we’ll be using PRAXISioner to describe ourselves in this process as researcher-scholars. In this process, we continuously recognize that we need to go through the cycle of problematizing, visualizing, reframing, and reimagining the ways in which we want to lead our practice in supporting our students and ourselves in academic spaces. We hope our reflective article describes how pláticas allow us to do this, and we are affirmed by the growing educational research around pláticas (Bernal, et al., 2023). We also hope that by introducing this idea of PRAXISioner to our readers, we can be more critical about how we thread the way we show up for our students in both professional and personal ways.

On Plática Methodology

In this article, we go in a few circles about our reflective process. This is primarily because it has been that type of cycle, an iterative process that we repeat. As we both learn more about plática methodology, we find ourselves reading about what we are doing. As PRAXISioners, this process, the relationality, and the theorizing that takes place feel familiar. Learning about plática affirms us in so many ways; we recognize that this methodology has been enacted for centuries–in our communities–in our pedagogies of the home (Fierros & Bernal, 2016; Garcia & Bernal, 2021). In this process, we engage in reflexiones as part of an autoethnographic approach. Autoethnography seeks to describe and systematically analyze personal experience in order to understand cultural experience, which we aim to do with our reflexiones (Ellis, Adams & Bochner, 2011). We analyze our personal experiences in order to understand the broader experience of Latine cultural workers and information educators, and to further our understanding of our Latine students.

To clarify, pláticas are informal conversations that take place in one-on-one or group spaces (Bernal, 2006). As Fierros & Bernal articulate, “…family pláticas allow us to witness shared memories, experiences, stories, ambiguities, and interpretations that impart us with a knowledge connected to personal, familial, and cultural history” (2016, p. 99). We believe that through our book club pláticas we were able to engage in familial pláticas, communicating thoughts, memories, ambiguities, and interpretations of our own experiences as Latine individuals in higher education through the discussion of culturally relevant readings, songs, and artworks. By allowing our conversations to shift where they needed to go, we opened ourselves up to explore those familial spaces. The shared readings brought out personal, family, and community stories from participants. These stories about lived experiences, family traditions, and community connections were our book club anchors. We began to realize we were engaging in the practice of plática because we were building relationships with each other, honoring each other’s stories, and “find[ing] that pláticas beget other pláticas” (Guajardo & Guajardo, 2013).

A vivid example of engaging in the practice of plática emerged from the reading of “My mother is the first one in her family to leave the Dominican Republic for the United States” (Acevedo, 2023, p. 181). We started by sharing reactions and first impressions, and quickly discovered that all twelve of us connected and had stories about preparing plátanos and eating different plantain dishes. We quickly began to discuss the recipes, but primarily what sides we ate with our plantains. We shared the regions of our ancestors and the impact they had on the plantain dishes we enjoyed, while never really knowing their history. As the plática continued, the topics organically flowed into a conversation about poverty and the socioeconomics of food. These conversations taught us about our different lived experiences and how we all came to be in this same room together. This realization was a reminder of the power in our individual and shared history as Latine folks in the United States.

These moments occurred throughout all of our sessions. The sessions happened each week, and our conversations evolved to deeper topics as the weeks progressed. We recognized that our pláticas were both a teaching and learning process and a teaching and learning tool. We were “contributors and co-constructors of the meaning-making process” in our pláticas (Fierros & Bernal, 2016, p. 111). The pláticas allowed us all to engage in a hopefully familiar way of learning, to collect and synthesize data from and with each other by way of reflection, and to extend our ways of knowing through and with each other. In our book club pláticas, we exchanged knowledge, and sometimes it felt like we were building new knowledge of navigating that crosswalk between home and higher education. This meant that our pláticas were healing; they were open and vulnerable spaces that allowed us to look at ourselves individually and collectively as Latine folks in academia. Much of this was because of the location of our conversations: scholars in higher education coming together, opening up, and relating through a shared experience.

Project Overview

From creating community to developing a research idea

In the fall of 2023, Daughters of Latin America (2023), edited by Sandra Guzmán, was going viral on social media. Our personal excitement to share Daughters of Latin America (DOLA), combined with the interest we heard throughout the semesters from students looking to build a community connected to their academics, gave us the idea of a book club. The book club format felt like the right vehicle to lead us to our destination of community building, a sense of belonging, and learning more about our culture. We used the basic structure of a book club: choosing a book, creating a meeting schedule, and reading and discussing the works presented in it. The book club really was the container we were looking for to bring us together. Using our connections with various students and student organizations, we anticipated forming an inclusive environment where folks could come together, meet other students, and make new connections.

The anthology is a great introductory work for learning about women authors and different styles of the written word, from poetry to essay to short story, ranging throughout history and the Latin American diaspora. The book also aligned with Women’s History Month and the work we were each already doing in our primary roles on campus. But we knew we were going to host a different kind of book club. Guzmán encourages readers to “read from front to back, back to front, or open… at any page. It’s also meant to be read while listening to songstresses of the Americas – from salsa Queen Celia Cruz to Toto La Momposina, from Ile and Elza Soares to La Lupe…” (2023, p. 17). And this we did. The songstress list she provided inspired us to create a supplemental YouTube playlist, which we used to play songs as ambience before our book talk sessions while folks got settled into the Zoom room, and then to highlight the artists during the sessions so we could revel in their stories for inspiration.

Our approach and focus felt aligned with our values. Much like the Puente Programs, GANAS, or other culturally-based college programs, we saw that a book club would allow us a collective space to use culturally relevant pedagogies and practices, such as pláticas and PRAXISioner work (Castillo et al., 2023). This collective space where students engage in learning outside the classroom is the kind of co-curricular environment we aim to create in all of our programming. The culturally-relevant content of the book was an important component of the book club because it featured Latine authors, creators, scholars, activists, and more, who served to inspire, provide examples, and offer representation. The diversity of these authors and literary styles provided a welcoming introduction to writers that most of us had never had the chance to explore until now. Additionally, the supplemental content, which included music, sounds, visual art, and short videos to build community in the group, helped us reimagine the book club experience. Much of the supplemental content we shared can be viewed on our public LibGuide. Through dialogue and thought-partnership around the content covered, this community saw, heard, and experienced how the personal is political. While our planning was student-centered, we were also able to recognize the impact that the book club had on us personally as Latinx professionals on campus.

Our first barrier was the financial aspect of purchasing book copies. This was important for us because we knew we wanted to use the book as an incentive for participants. Also, giving participants a book they could touch, hold, and have on their shelves to return to after our book club sessions felt much more meaningful. We looked for grants outside the institution, not trusting that we would be able to use money in our programs for a one-off book club. However, in talking to colleagues about the project, and recognizing that this could be more than just hosting a book club, we were able to purchase 15 copies of the book through a Library Faculty Research Support Grant established to help faculty in our Information Studies Department pursue their research. It was in fact at this juncture of applying for the faculty research grant that we came to realize this was something we could more directly study, practice, and learn from in our work as educators. We also needed funds for food, decorations, and small tokens of appreciation. For this, the Latinx Student Success Center was able to cover food for the in-person events as long as we posted the event on our campus’ public events website. We also created bookmarks in-house to accompany the book, knowing that these small details centered on celebrating Latine women were components of creating a culturally-sustaining environment for our book club.

Engaging students in co-curricular learning

It was important for us to customize the book club format to the needs of the students at our college campus. We promoted the event on the campus’ official student clubs and organizations events page, the library’s official Instagram page, as well as on the Latinx Student Success Center’s Instagram page. A total of 16 individuals expressed interest, primarily Latine women (and one male participant who showed up consistently!); 14 participants attended, with seven undergraduates attending every event. Some of those who had expressed interest were in fact faculty, but unfortunately most of the interested faculty were not able to make the book club meetings. We did, however, check in with them about the book club, and their encouragement of the work we were doing also fueled our excitement.

One of the ways in which we updated the format was in scaffolding the meeting process by using a hybrid approach. Our first three meetings were online to better accommodate the schedules of our largely commuter campus. We also decided to meet in the evening, most likely after work and school for most folks, to give everyone (including us) the flexibility to come together “after hours.” We also believed that this would influence the energy of the group, since we would presumably be more comfortable and cozy in our own spaces (i.e., in bed with a cup of tea, lounging in the living room), hopefully allowing us to be more open to sharing. Our approach was hybrid because we held our last two sessions in person during U-Hour, a time when students are not scheduled for any classes and would most likely be available if on campus. The first in-person session was at the university library and the second was at the Latinx Student Success Center. This was intentional: it gave students who rarely come to campus the chance to visit the university library and the Latinx Student Success Center.

We saw that coming together in person after meeting online deepened the connections and trust established in the Zoom sessions. Students had either already met us or had gone to the Latinx Student Success Center to meet us casually before our official in-person meetings. We were excited by the high attendance at our first in-person meeting because we had thought folks would not want to come to an on-campus event. And we were beyond thrilled to hear that the attendance was largely due to the relationship-building process that was part of our Zoom sessions prior to meeting in person. In some cases, this was the initial connection that helped them continue to find ways to build community on campus.

We did not want to follow typical book club conventions (i.e., read the chapter, have comments/feedback ready, expectations to share), but we also did not want it to feel like a typical online classroom (i.e., assigning chapters to read, feeling behind if you did not understand or read the poem or essay). We tried to address the things that made us think of “stale” book club sessions or a classroom setting by reminding folks that they did not have to read the content in advance and verbally stating that we did not want to be a classroom. But we also changed the vibe by including multimedia (i.e., playing music) and casual interactive elements (i.e., talking about our day) in our virtual sessions. Not only was the content culturally-relevant to the student population we were engaging with, but we also wanted to make sure our content could be experienced in a variety of ways beyond the text. Along with our weekly reminders that the readings were voluntary and that the only requirement was to show up and tune in, we shared the selected readings as PDFs both before and during the sessions. We shared our screen to show a quote or passage and highlighted things that stood out to us, which meant that a participant could still review the content during the group’s discussion and provide their reflections and ideas without having read it ahead of time. We believe that this also helped us with attendance and participation as we continued the series, because the barriers to being ready for book club were removed and we were able to provide multiple ways of engagement. Building our book club in this way allowed us to go a step further into our pláticas.

Reflecting back on how we implemented the pilot, we recognize that even without funding, we can still do this type of book club. We can also imagine bringing it into a more formal classroom setting. It feels doable to either scale it up or simplify it, because the important components were making sure that the environment was culturally-sustaining through the content shared, that the vibe was casual (a low/no-stakes commitment and preparation), and that it focused on engaging in meaningful relationship-building pláticas. As we write this, we can imagine making a mini zine with just the readings, incorporating lyrics from the songs we heard together, to take it offline. We can help a student organization do this for their own club members. There are many ways we can imagine scaffolding this book club.

Ongoing Reflexiones

The focus of this article was to dive into our reflective process as PRAXISioners while engaging in our understanding of plática methodology. We hope that we have provided some examples of how plática emerged as central to our planning, how it appeared during book club, and now, how we continue to use it in the aftermath as well. Here we see how the cycle of learning about plática methodology was actually a return to our cultural history, one that allows us to affirm community learning practices and to understand the critical power of the practical and theoretical tools we already carry from our homes. The affirming and healing potential of this work helps us understand our own histories and helps us reframe the problems and issues we experience–in this case, within higher education. Engaging in reflection has helped us understand our experiences in this process and has ultimately given us the confidence to continue engaging in plática methodology. We continue to hear ourselves say we know this and we’ve been doing this. These reflective conversations have been a source of nourishment for our personal and professional practice. As practitioners, we recognize that creating these spaces benefits our students and ourselves–academically, personally, and professionally–both in the classroom and beyond. To anchor our planning, process, and praxis, it is critical that we are intentional about how we continue to connect with plática as a transformative qualitative inquiry process and methodology (Carmona, et al., 2021).

Reflexiones post book club pilot

Below are excerpts from a plática we held on September 3, 2024, five months after our book club experience but before our preparation for the 2025 book club. We asked each other questions (Appendix A) to give us some structure and to allow us to tap into our recent and past experiences as educators in framing our book club experience. We reflected on the book club pilot from earlier in the year and realized that we would soon start planning our next iteration. Our conversation is reminiscent of the dialogue in bell hooks’ “Building a Teaching Community” (1994, p. 129). hooks has been a guiding light in our educator journeys. The questions we asked each other helped us bring our personal strengths and interests into the session, but ultimately, we wanted to reflect and prioritize a sense of belonging over the planned content.

Vanessa Varko Fontana: I think for me what was cool was that it was something that started as personal excitement. Right? I got this book. Personally, I was like, “Oh, my gosh!” I was so excited to read it and share it. And I just kept thinking, “Wow! What would have happened if I read these stories when I was an undergrad?” “What would have happened if I was able to talk about stories like this with my peers – when I was going through school?” I think it also speaks to us when we start from a place of our own personal excitement, and how our work integrates with our hobbies or passions.

Daisy Muralles: Yeah, as you were sharing that, it reminded me of one of the moments that I won’t forget. It was one of our last sessions. And there was a student that was like, “it is really amazing to see you two like these two women basically collaborating and working together and bringing us together in this way.” I forget how she phrased it, but that’s what I took away, and I was like damn. I have a few questions about mentorship and like us as [Latine] women in these academic spaces, but, like her words right there at that moment just made me feel like this was so necessary. This was so needed.

VVF: This was one of those few projects that I’ve been able to share more sides of me than I can in different academic spaces like, the creative and goofy and loving to read. I think in academia, sometimes we only tap into a couple of our sides at once, and I think that for me that was really meaningful… from the art, the music, like I felt like I could bring in all these different parts, share different parts of me in the space that I don’t really get to always.

DM: And that makes me think very seriously about myself in/at our campus, and how important it is to be myself in front of our Latine students. But at the same time how necessary it was for me to have that space too with those students and with you, and so it really reminded me about what education can look like and what learning from each other outside the classroom looks like. I think it was a very important moment, and I don’t think I’ll ever forget that.

VVF: And after starting Book Club I noticed them coming by to the point where some maybe just came in once or twice, but a few more started to become regulars, and I would see them multiple times during the week. And you know that definitely helped with the relationship building, but like the resource sharing and like building, you know, community on campus. And I was able to help students or talk to them about their upcoming graduation, include them in the Chicanx Latinx Grad, talk to them about life after graduation, and I think, even hook them up with some other campus jobs. So I really appreciated how the book club allowed me to, you know, get to know students that I wouldn’t have before, and that we were able to because we met each other through the book club.

DM: We were learning so much more about that individual person. And we just got to hear different parts of who those folks were in that Zoom room and then in-person with us. And so it made it intimate. Kind of like quickly. We were able to feel intimate more readily because we were already learning so much from each other. And they didn’t have to. That was the other part. They really didn’t have to [share]. But people opened up. It wasn’t like people were giving out their whole life stories… but there was trust, and you could feel it….

We left this plática feeling refreshed and empowered to continue this project. At the end of our plática, we asked ourselves, “What happens when people feel affirmed and seen?” From our own experience, and from the feedback we heard from the participants, we learned that the book club helped us bring Latine students together to engage in plática. It also showed us that by creating opportunities to engage with culturally-sustaining content alongside other Latine communities on campus, Latine students can become more comfortable using campus resources, like visiting the Latinx Student Success Center, engaging in campus events, and creating opportunities to talk about their academic progress, such as sharing more information about their classes with us. When we bridge personal connection with academic experiences, we are able to demonstrate the intersectionality of culture in our work, no matter the major or industry. This type of information allows us to connect our students with our network of Latine scholars, instructors, advisors, and other folks we know can provide culturally-responsive communication. We were also able to connect students with new Latine mentors and scholars, not only the ones on our campus, but also the inspirational scholars, authors, and creators showcased in our book club sessions. It was a powerful experience to witness the ripple effect on the group’s consciousness and personal reflection–how a poem led to a song, which led to a painting connected to our cultural roots and families, and how it allowed us to see the impact of our goals and agency to succeed and thrive. All of these were positive outcomes of the book club and motivation to continue this project.

Reflexiones a year later

Below we take another look back at our first book club experience, a full year later. We respond to the following prompts: how we felt after the first Zoom session, after the first in-person session, and after the last one; our interpretations of the students’ feelings throughout the process; and how this affected our relationships with our students, each other, and our Latine identity.

DM: I am so glad that Vanessa and I were homies first before we became collaborators. When we first got started, I knew that our energy together would be good for a book club or podcast (this got confirmed by students as well). I feel like Vanessa gives me a different kind of confidence, and I am able to be myself more readily. This is because I know Vanessa; she has my back. We have shared cultural experiences that we know about each other. During that first book club session, I was definitely nervous and had some technical difficulties, because of course I don’t know how to use slide presentations when I am presenting. But knowing that Vanessa was there filled me with excitement and courage. I think in that first session, I did not consider myself a PRAXISioner; it was something in the back of my mind, but it was probably not until after our second online Zoom session that I recognized my role in that space. Yes, it was one part moderator, helping navigate through the content, but I was also a collaborator, not only with Vanessa but with all of the participants. We had to share and be vulnerable together.

As mentioned earlier, we were surprised by the high attendance at the first in-person meeting, but by then (three sessions in) we had already formed relationships with each other. As an advisor to an academic Latina sorority, I want to see how I can continue to build that sisterhood by leaning into the academic, professional, and lifelong learning support I aim to provide as a Latinx Librarian. I think I can continue doing that by engaging in plática in multiple spaces, being intentional about it. This also extends into the teaching that I do. In one of my courses, students complete a class podcast focusing on visual and auditory experiences. Learning from book club, I hope I can adjust the podcast for students to create their own types of liberatory teaching spaces. This process has allowed me to find myself as a scholar, recognizing that I want to learn and teach in these ways.

VVF: It was definitely a benefit that we established a rapport with each other but also we trusted our professional work ethic and followed through. In terms of the preparation we were intentional about what we could do or what each of us could offer from our departments. It was refreshing to have built an equitable collaboration that was uplifting and motivating, versus draining or unsustainable.

I was really proud we created a way to address the post-COVID low engagement on campus while introducing a new text and introducing writers throughout the Latine diaspora. As Central American Latinas, we wanted the chance to offer scholars we wish we learned about earlier in our lives. It was exciting to feel innovative and part of the solution. At the same time, I was really nervous about the Zoom factor and staring at gray screens and the impact that has on the vibe and my personal facilitation and the ability to stay engaged. I am so happy I was able to share that plan around that. Not only did we offer a way to share an image with the group while staying off screen, we were able to sprinkle a little tech tool too. I always remember when we asked folks to share images of their cultural foods, we got to see a screen of amazing delish dishes and stories. It was really cool for students not only to humor us with those asks, but really lean into it and like to appreciate it because we were not forcing folks to be on camera but we did offer a way to participate. 

I remember being nervous about reading the readings. We knew we did not want to demand reading the weekly selections but that’s the typical book club format we were accustomed to. This is when I recognized that the anthology format would help us with this. Curating each week with a combination of shorter pieces, essays, excerpts, made it less intimidating to read some or all of the selections. We also had the readings scanned and ready to share with the Zoom room. Whether students had read it before the session or during, they had the opportunity to share their reflections. Creating a familiar sequence from the first session allowed the group to build rapport through their reflections and vulnerability. I know that I felt that when students began to share their faces voluntarily, came to meet us at the library and LSSC between book club meetings, and kept showing up, reading more, and sharing. We both recognized that by the last session we barely spoke, because the conversation just flowed and the connections were sparking with minimal facilitation.

This experience has reminded me of the power of weaving cultural identities into our work as a tool to connect with students. These connections are part of their experience toward graduation and building a fulfilling life, and of the positive change that flows into our communities and beyond.

Conclusion

As we learn about plática methodology, and build on the work of others (Fierros & Bernal, 2016; Carmona, et al., 2021; Bernal, et al., 2023), we reflect on our process. We reflect on our shared work, and what it means to develop, create, and consider new practices and methods that honor our cultural selves. These conversations help us name and be honest about how we are engaging and learning not only about plática methodology but about being PRAXISioners. We are finding the words to affirm our academic experiences and scholarship by validating our cultural practices from home. What these pláticas are doing is teaching us a new language, providing us with this opportunity to uplift our practices, blending our cultural education with this academic journey we have chosen to be on. We are reminded of the value of this process when we create these spaces, not just for ourselves but with other Latine scholars.

This process of learning from our students, learning from each other as PRAXISioners, and learning from other Latine scholars has been an affirming and empowering experience. Responses from our pilot survey reminded us of the importance of holding space for Latine students on our college campus. We heard the healing, the need to share, and the need to hear and learn from each other. This is particularly important now in this current political climate. We know we need to do this work even more because it means more, it is necessary. As we work to problematize, visualize, reframe, and reimagine our culturally-sustaining work between sessions to help us navigate this academic landscape together, we are able to see how this collaboration between the university library and the Latinx Student Success Center is beneficial to us. The collaboration allows us to center a cultural and community-building lens to a library’s book club event that we know is more than representation–it is about empowerment. Together, we are able to explore our culture and rediscover these research practices that we learned at home with our mothers, siblings, and other family. We explore these ideas in a familial space, in a culturally-affirming way, where we can laugh, cry, and be vulnerable about our various identities that are between home and school. Again, we are reminded that we need to keep supporting ourselves as Latine scholars, educators, and cultural workers, to be able to show up for our racially and ethnically diverse students. We are reminded that we need to keep creating these spaces that support and retain us because most places do not acknowledge these parts of us.

We will continue our work in understanding how we can dive deeper into the impact of culturally sustaining co-curricular spaces, pláticas, and strengthening our PRAXISioner skills. We hope that this reflection inspires other educators, librarians, and practitioners to incorporate culturally relevant practices like pláticas and reflexiones as part of their anti-colonial practices in instruction, outreach, etc. Though some methodologies are specific to certain groups, we hope it inspires folks to learn more about cultural learning practices.

We believe that through this work we are building a network with and for ourselves and our students. We are building a support system that not only touches on academic struggle but the personal as well. We remind ourselves we are not foreigners and that we can build our own spaces. We listen to each other and figure out what our students need, and also what we need. The semesters will speed up and slow down, but our personal experiences in academia allow us to connect and empathize with our students’ experiences. So we have to take the time and encourage self reflection in ourselves, and share with each other our aspirations and goals. In addition to providing guidance through a praxis framework, we have the chance to introduce organizational efforts and institutional resources to students, which means that we need to make time for our pláticas.


Acknowledgements

We would like to acknowledge our internal peer-reviewer, Brittany Paloma Fiedler, and our external peer-reviewer, Veronica A. Douglas, as well as the Lead Pipe Editors, including Jessica Schomberg, Publishing Editor.


References

Acevedo, A. (2023). My mother is the first one in her family to leave the Dominican Republic for the United States. In S. Guzmán (Ed.), Daughters of Latin America: An international anthology of writing by Latine women (pp. 181-182). HarperCollins Publishers.

Bernal, D. D. (2006). Learning and living pedagogies of the home: The mestiza consciousness of Chicana students. In D. D. Bernal, C. A. Elenes, F. E. Godinez, & S. Villenas (Eds.), Chicana/Latina education in everyday life: Feminista perspectives on pedagogy and epistemology (pp. 113–132). State University of New York Press. https://doi.org/10.2307/jj.18255003.15

Bernal, D.D., Flores A.I., Gaxiola Serrano, T.J., & Morales, S. (2023). An introduction: Chicana/Latina feminista pláticas in educational research. International Journal of Qualitative Studies in Education. https://doi.org/10.1080/09518398.2023.2203113

Brown, O. (2023). BIPOC women and imposter syndrome: Are we really imposters? Urban Institute of Mental Health. https://www.urbanmh.com/uimhblog/bipoc-women-and-imposter-syndrome-are-we-really-imposters

California State University, East Bay. (2024). 2024 Facts. https://www.csueastbay.edu/about/files/docs/2024-factsbk.pdf

Carmona, J.F., Hamzeh, M., Delgado Bernal, D., & Hassan Zareer, I. (2021). Theorizing knowledge with pláticas: Moving toward transformative qualitative inquiries. Qualitative Inquiry, 27(10), 1213–1220. https://doi.org/10.1177/10778004211021813  

Castillo, F. E., García, G., Rivera, A. M., Hinostroza, A. P. M., & Toscano, N. M. (2023). Collectively building bridges for first-generation working-class students: Pláticas centering the pedagogical practices of convivencia in El Puente Research Fellowship. Teaching Sociology, 51(3), 288–300. https://doi.org/10.1177/0092055X231174511

Dueñas, M., & Gloria, A. M. (2020). Pertenecemos y tenemos importancia aquí! Exploring sense of belonging and mattering for first-generation and continuing-generation Latinx undergraduates. Hispanic Journal of Behavioral Sciences, 42(1), 95–116. https://doi.org/10.1177/0739986319899734 

Ellis, C., Adams, T. E. & Bochner, A. P. (2010). Autoethnography: An overview. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 12(1). https://www.qualitative-research.net/index.php/fqs/article/view/1589

Ettarh, F. (2018, January 10). Vocational awe and librarianship: The lies we tell ourselves. In The Library With The Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2018/vocational-awe/

Exec. Order No. 14151, 90 FR 8339 (2025). https://www.federalregister.gov/d/2025-01953

Exec. Order No. 14173, 90 FR 8633 (2025). https://www.federalregister.gov/d/2025-02097

Excelencia in Education (2024). Why does the Seal of Excelencia matter? https://www.edexcelencia.org/seal-excelencia

Fernandez, L. R., Girón, S. E., Killoren, S. E., & Campione-Barr, N. (2023). Latinx college students’ anxiety, academic stress, and future aspirations: The role of sibling relationship quality. Journal of Child and Family Studies, 32(7), 1936–1945. https://doi.org/10.1007/s10826-022-02474-z

Fierros, C. O., & Bernal, D. D. (2016). Vamos a platicar: The contours of pláticas as Chicana/Latina feminist methodology. Chicana/Latina Studies, 15(2), 98–121. http://www.jstor.org/stable/43941617

Freire, P. (2014). Pedagogy of the oppressed (M. B. Ramos, Trans.; 30th anniversary edition.). Bloomsbury Academic, an imprint of Bloomsbury Publishing.

Garcia, N. M., & Bernal, D. D. (2021). Remembering and revisiting pedagogies of the home. American Educational Research Journal, 58(3), 567–601. https://doi.org/10.3102/0002831220954431

Ginwright, S. (2018, May 31). The future of healing: Shifting from trauma informed care to healing centered engagement. Medium. https://ginwright.medium.com/the-future-of-healing-shifting-from-trauma-informed-care-to-healing-centered-engagement-634f557ce69c

Guajardo, F., & Guajardo, M. (2013). The power of plática. Reflections (Baltimore, Md.), 13(1). https://reflectionsjournal.net/archive/

Guzmán, S. (2023). Daughters of Latin America: An international anthology of writing by Latine women (First edition). Amistad, an imprint of HarperCollins Publishers.

hooks, b. (1994). Teaching to transgress: Education as the practice of freedom. Routledge.

Manzano-Sanchez, H., Matarrita-Cascante, D., & Outley, C. (2019). Barriers and supports to college aspiration among Latinx high school students. Journal of Youth Development (Online), 14(2), 25–45. https://doi.org/10.5195/jyd.2019.685

Manzano Sanchez, H., Outley, C., Matarrita-Cascante, D., & Gonzalez, J. (2023). Personal and contextual variables predicting college aspirations among Latinx high school students. Voces y Silencios, Revista Latinoamericana de Educación, 13(2), 88–114. https://doi.org/10.18175/VyS13.2.2022.10

Muralles, D.C. & Varko Fontana, V. (2024, Dec. 13). PRAXISioners platicando: Fostering belonging through culturally centered learning. [Presentation]. METRO’s Reference and Instruction Interest Group.

Latino students in higher education (2022, September). Postsecondary National Policy Institute. PNPI.org. https://pnpi.org/wp-content/uploads/2022/09/LatinoStudentsFactSheet_September_2022.pdf

Parking and Transportation Services (2023). FY 2022-2023 CSUEB Survey results. Internal CSUEB report: unpublished.

Pierre, E. (2023, August 4). Starting your career as the only BIPOC on your team. HBR. https://hbr.org/2023/08/starting-your-career-as-the-only-bipoc-on-your-team

Reyes, G. T. (2021, December 3). Critical race, decolonial, culturally sustaining pedagogy. [Gathering]. California State University, East Bay.

Reyes, G. T. (2024). Nice for whom? A dangerous, not-so-nice, critical race love letter. Education Sciences, 14(5), 508. https://doi.org/10.3390/educsci14050508

Appendix A

Questions

The following were question prompts we brainstormed to help us engage in a reflection process that occurred in the summer after the original pilot project was completed.

  1. How would you rate the experience you had at these events?
  2. Which parts of our book club did you find meaningful?
  3. What inspired this idea of book club? The book selected?
  4. How did you find yourself preparing for the book club sessions?
  5. Did what you read or our conversations prompt any questions? How did you follow-up, did you investigate your questions further? (i.e. on your own, with others)
  6. Please share something that you walked away with that really resonated with you and your cultural identity.
  7. What new skills or knowledge did you gain from this experience?
  8. How did the book club increase visibility of the LSSC, the library?
  9. How did you connect to the culture (music, art, literature) celebrated and presented in the topics covered during book club sessions?
  10. To what extent did you feel comfortable being yourself during book club? What contributed to your comfort level and sharing about yourself during book club sessions?
  11. Is there anything you learned from each other? Is there anything that stood out/learned from the participants? What was your impression of how the participants experienced the book club?
  12. Is there anything we learned about ourselves and our role as educators, role models, individuals in these bodies that are placed as mentors in this environment? In our roles as women in academia? In our roles as Latinx/Brown/Indigenous women?
  13. How do we want to approach the next book club? What are some lessons learned?
  14. How can someone reproduce something like this in their environment?

by Daisy Muralles at July 30, 2025 02:01 PM

Harvard Library Innovation Lab

LIL Awarded AALL Public Access to Government Information Award

This past week members of the Library Innovation Lab team traveled to Portland, Oregon to receive the Public Access to Government Information Award from AALL for our data.gov archive.

This award is given every year at the American Association of Law Libraries’ annual meeting to “recognize persons or organizations that have made significant contributions to protect and promote greater public access to government information.”

Image of LIL team members and other HLSL colleagues at the AALL Annual Meeting

The Harvard Law School Library has collected government records and made them available to patrons for centuries, and we are proud to have our contribution to this work recognized by our colleagues at AALL.

by LIL Team at July 30, 2025 12:00 AM

July 29, 2025

Open Knowledge Foundation

Open Data Editor Training in Brazil: Empowering Transparency and Innovation

The in-person training at the Federal University of Ceará (UFC) aimed to ensure adherence to the Brazilian Access to Information Law, promoting both active and passive transparency.

The post Open Data Editor Training in Brazil: Empowering Transparency and Innovation first appeared on Open Knowledge Blog.

by Juliana S. Lima at July 29, 2025 08:46 PM

July 25, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-07-25: Feature Engineering with Shallow Features and Methods

 

Creating synthesized instances with SMOTE (from Figure 3 in Wongvorachan et al.)


Jason Brownlee gave the definition of feature engineering as follows: "feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data". In other words, feature engineering is manually designing what the input Xs should be.

Feature engineering is regarded as key to success in applied machine learning. "Much of the success of machine learning is actually success in engineering features that a learner can understand", as pointed out by Scott Locklin.

There are many application scenarios for feature engineering, including fraud detection in loan applications, user behavior modeling in recommendation systems, and disease diagnosis/risk prediction. In a loan application fraud prevention program, data scientists can decide whether a user is reliable using features based on the user's basic information, credit history, and other data. A recommendation system can analyze a user's behavior features, such as materials clicked in the past few months, positive or negative reactions, and user type, to infer the topics the user is most interested in.

A feature is an attribute useful for the modeling task, but not all attributes can be used as features. In general, for most industrial modeling, expert knowledge is important for feature creation. For example, in a loan application fraud prevention program, experiences from the risk control department will be very helpful. There are potentially hundreds of features based on a user's basic information, credit report, and assets, but not all of them will be used in modeling. Expert knowledge can help data scientists quickly perform feature construction and screening.


1. Feature Describing

This step provides a general understanding of the dataset. We explore the max, min, mean, and standard deviation of each feature; understand its central tendency, dispersion, and distribution; and find missing values, outliers, and duplicate values. This step serves as preparatory work for the steps that follow.
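In pandas, for instance, most of this first pass takes only a few calls. The toy DataFrame and column names below are illustrative, not from the post:

```python
import pandas as pd

# Toy dataset for illustration: one missing value in each column
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],
    "income": [40000, 52000, 61000, 61000, None],
})

summary = df.describe()          # count, mean, std, min, quartiles, max per numeric column
n_missing = df.isna().sum()      # missing values per column
n_dupes = df.duplicated().sum()  # count of fully duplicated rows
```

`describe()` silently skips NaNs, so comparing its `count` row against `len(df)` is a quick way to spot columns with missing observations.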


2. Feature Processing

The foundation of feature engineering is feature processing, which is time-consuming and directly associated with data quality. It includes operations such as data-cleaning, standardization, and resampling, and aims to transform raw data into a format suitable for model training.


2.1 Data-cleaning

Data-cleaning generally processes missing values, outliers, and inconsistencies to ensure the accuracy of data. 

Some features may contain missing values because of the lack of observations. Missing values are typically processed in the following ways:

# Fill with a fixed sentinel value (use a numeric sentinel for numeric columns)
data['feature'] = data['feature'].fillna(-99)

# Fill with the column mean
data['feature'] = data['feature'].fillna(data['feature'].mean())

# Interpolate from neighboring observations
data['feature'] = data['feature'].interpolate()

# Model-based imputation with k-nearest neighbors
# (newer fancyimpute versions use fit_transform() in place of complete())
from fancyimpute import KNN

dataset = KNN(k=3).complete(dataset)

The most frequently used methods are to drop the affected samples directly or to fill with mean values.

Outliers are identified based on interquartile range, mean and standard deviation. In addition, points whose distance from most points is greater than a certain threshold are considered outliers. The main distance measurement methods used are absolute distance (Manhattan distance), Euclidean distance, and Mahalanobis distance. 

We need to process outliers to reduce noise and improve data quality. Typical strategies for processing outliers include: directly deleting outliers when they have a significant impact on the analysis results, treating outliers as missing values and using previous methods for missing values to fill them out, or keeping the outliers when they are considered to be important. 
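A minimal sketch of the interquartile-range approach, with illustrative data, showing two of the strategies above (deleting outliers, or treating them as missing values). The helper name `iqr_outliers` is ours, not a library function:

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

s = pd.Series([10, 11, 12, 11, 10, 12, 11, 200])   # 200 is the outlier
mask = iqr_outliers(s)
dropped = s[~mask]                                  # strategy 1: delete outliers
filled = s.mask(mask).fillna(s[~mask].mean())       # strategy 2: treat as missing, fill with mean
```

The multiplier k = 1.5 is the conventional default; larger values flag fewer points as outliers.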

Duplicate values refer to identical samples from different sources, which waste storage space and reduce data processing efficiency. The most common remedy is to drop rows that are duplicated across all columns, or across a subset of columns chosen based on experience.
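In pandas both cases are one-liners; the subset argument handles the partial-match case (column names below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 2, 2], "amount": [5.0, 5.0, 7.0, 9.0]})

full_dedup = df.drop_duplicates()                              # rows identical in every column
by_key = df.drop_duplicates(subset=["user_id"], keep="first")  # keep one row per user_id
```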


2.2 Resampling

Class imbalance refers to the situation where the numbers of samples in different categories of the training set differ significantly. Machine learning methods generally assume that the numbers of positive and negative samples are close. In the real world, however, class imbalance is common and sometimes extreme: roughly 2% of credit card accounts are fraudulent every year, and online advertising conversion rates fall in the range of 10^-3 to 10^-6. Class imbalance can bias a model's predictions towards the majority class and thus lower its predictive power.

We can mitigate class imbalance by oversampling the minority class or undersampling the majority class. When the original dataset is huge, undersampling is a good choice that randomly deletes samples in the majority class to make the number of samples in the two classes equal. When the dataset is small, we prefer to use oversampling. One practice is to resample repeatedly from the minority class until its size equals that of the majority class, which carries a high risk of over-fitting. A better way is to use SMOTE (Synthetic Minority Over-sampling Technique), in which synthetic instances of the minority class are generated by interpolating feature vectors of neighboring instances, effectively increasing their representation in the training data. To be specific, SMOTE picks a sample point x in the minority class, and randomly picks a point x' from its k nearest neighbors. The synthetic instance is then created by the formula x_new = x + (x'-x)*d, where d is in the range [0,1]. Three figures from Wongvorachan et al., shown below, demonstrate the three methods more intuitively.
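The interpolation formula x_new = x + (x'-x)*d can be sketched directly in NumPy. The helper `smote_sample` and its parameters are illustrative, not from any library; in practice imbalanced-learn's SMOTE implementation is the usual choice:

```python
import numpy as np

def smote_sample(X_min: np.ndarray, k: int = 3, n_new: int = 5, seed: int = 0) -> np.ndarray:
    """Synthesize minority-class points via x_new = x + (x' - x) * d, d ~ U[0, 1]."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        x = X_min[rng.integers(len(X_min))]
        # k nearest neighbors of x within the minority class (index 0 is x itself)
        order = np.argsort(np.linalg.norm(X_min - x, axis=1))
        x_prime = X_min[rng.choice(order[1:k + 1])]
        out.append(x + (x_prime - x) * rng.random())   # interpolate along the segment
    return np.array(out)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(X_min, k=2, n_new=3)
```

Because each synthetic point lies on the segment between two existing minority points, it always falls inside the convex hull of the minority class.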

Figure 1. Random oversampling (Figure 1 in Wongvorachan et al.)


Figure 2. Random undersampling (Figure 2 in Wongvorachan et al.)


Figure 3. SMOTE (Figure 3 in Wongvorachan et al.)

Table 1 shows the operating principle, advantages, and drawbacks of each resampling technique. These methods are commonly used, but their disadvantages deserve emphasis: random oversampling increases the likelihood of overfitting, random undersampling keeps only part of the information in the original dataset, and SMOTE potentially introduces noise into the dataset.

Table 1. The comparison of resampling techniques (Table 1 in Wongvorachan et al.)


A more straightforward way to mitigate class imbalance is class weighting, which assigns a weight to each class in the training set: classes with many samples get low weights, and classes with few samples get high weights. There is no need to generate new samples with this method; we just adjust the weights in the loss function.
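scikit-learn exposes this via the class_weight parameter; its "balanced" heuristic sets each weight to n_samples / (n_classes * n_class_samples). A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)   # 9:1 class imbalance
X = rng.normal(size=(100, 3))
X[y == 1] += 1.5                    # shift the minority class so it is learnable

# Inverse-frequency weights: majority gets 100/(2*90), minority gets 100/(2*10) = 5.0
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

# Misclassifying a minority sample now costs 9x more in the loss
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```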


2.3 Feature Transformation

Different features have different scales and ranges. Eliminating scale differences between different features can put data on the same scale and make them numerically comparable.

StandardScaler transforms data into a distribution with a mean of 0 and a standard deviation of 1 by Z-score normalization. Similarly, MinMax scaling normalizes all features to be between 0 and 1. To be specific, StandardScaler obtains the mean and standard deviation of the training data, and then uses these statistics to perform Z-score normalization with the following formula:

z = (x - μ) / σ

where μ is the mean and σ is the standard deviation.

MinMaxScaler obtains the maximum and minimum values of the training data, and then transforms data with the following formula:

x' = (x - x_min) / (x_max - x_min)

Feature transformation has different impacts on different models. It greatly affects models based on distances in Euclidean space, such as SVM (support vector machine) and NN (nearest neighbor), but has little impact on tree models such as random forest or XGBoost.
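Both transforms are available in scikit-learn. A minimal sketch on a toy matrix whose columns differ in scale by two orders of magnitude:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_std = StandardScaler().fit_transform(X)   # z = (x - mu) / sigma, per column
X_mm = MinMaxScaler().fit_transform(X)      # x' = (x - min) / (max - min), per column
```

In a real pipeline the scaler should be fit on the training split only and then applied to the test split, so that test statistics do not leak into training.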


With a broader definition of feature engineering, the generation of embeddings which represent latent features is also regarded as feature engineering. Latent features follow a different set of methodologies. In this article, we only focus on the narrow definition of feature engineering, where shallow features are selected based on expert knowledge, and data is processed with the methodology discussed above. In real-world practice, especially in industry, successful feature engineering is essential for the model's performance.


- Xin 

by Xin (noreply@blogger.com) at July 25, 2025 04:25 AM

July 24, 2025

David Rosenthal

Meta: Slow Blogging Ahead

Source
There will be fewer than usual posts to this blog for a while. I have to write another talk for an intimidating audience, similar to the audience for my 2021 Talk at TTI/Vanguard Conference. That one took a lot of work but a few months later it became my EE380 Talk. That in turn became by far my most-read post, having so far gained 522K views. The EE380 talk eventually led to the invitation for the upcoming talk. Thus I am motivated to focus on writing this talk for the next few weeks.

Wikipedia's description of the image is:
Titivillus, a demon said to introduce errors into the work of scribes, besets a scribe at his desk (14th century illustration)

by David. (noreply@blogger.com) at July 24, 2025 03:00 PM

July 22, 2025

David Rosenthal

The Selling Of AI

Not AI, just a favorite
On my recent visit to London I was struck by how many of the advertisements in the Tube were selling AI. They fell into two groups, one aimed at CEOs and the other at marketing people. This is typical; the pitch for AI is impedance-matched to these targets:
In The Back Of The AI Envelope I explained:
why Sam Altman et al are so desperate to run the "drug-dealer's algorithm" (the first one's free) and get the world hooked on this drug so they can supply a world of addicts.
You can see how this works for the two targets. Once a CEO has addicted his company to AI by laying off most of the staff, there is no way he is going to go cold turkey by hiring them back even if the AI fails to meet his expectations. And once he has laid off most of the marketing department, the remaining marketeer must still generate the reams of collateral even if it lacks a certain something.

Below the fold I look into this example of the process Cory Doctorow called enshittification.

The first thing to note is that the pitch is working. The discourse is full of CEOs talking their book. For example we have Matt Novak's Billionaires Convince Themselves AI Chatbots Are Close to Making New Scientific Discoveries recounting the wisdom of Travis Kalanick:
“I’ll go down this thread with [Chat]GPT or Grok and I’ll start to get to the edge of what’s known in quantum physics and then I’m doing the equivalent of vibe coding, except it’s vibe physics,” Kalanick explained. “And we’re approaching what’s known. And I’m trying to poke and see if there’s breakthroughs to be had. And I’ve gotten pretty damn close to some interesting breakthroughs just doing that.”
Then there are the programmers extolling "vibe coding" and how it increases their productivity. CEOs who buy this pitch are laying off staff left and right. For example, Jordan Novet reports that Microsoft is laying off about 9,000 employees in its latest round of cuts:
Microsoft said Wednesday that it will lay off about 9,000 employees. The move will affect less than 4% of its global workforce across different teams, geographies and levels of experience, a person familiar with the matter told CNBC.
...
Microsoft has held several rounds of layoffs already this calendar year. In January, it cut less than 1% of headcount based on performance. The 50-year-old software company slashed more than 6,000 jobs in May and then at least 300 more in June.
How well is this likely to work out? Evidence is accumulating that AI's capabilities are over-hyped. Thomas Claburn's AI models just don't understand what they're talking about is an example:
Asked to explain the ABAB rhyming scheme, OpenAI's GPT-4o did so accurately, responding, "An ABAB scheme alternates rhymes: first and third lines rhyme, second and fourth rhyme."

Yet when asked to provide a blank word in a four-line poem using the ABAB rhyming scheme, the model responded with a word that didn't rhyme appropriately. In other words, the model correctly predicted the tokens to explain the ABAB rhyme scheme without the understanding it would have needed to reproduce it.

The problem with potemkins in AI models is that they invalidate benchmarks, the researchers argue. The purpose of benchmark tests for AI models is to suggest broader competence. But if the test only measures test performance and not the capacity to apply model training beyond the test scenario, it doesn't have much value.
Source
As far as I know the only proper random controlled trial of AI's productivity increase comes from Model Evaluation and Threat Research entitled Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity:
16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early 2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%--AI tooling slowed developers down.
David Gerard notes:
Even the devs who liked the AI found it was bad at large and complex code bases like these ones, and over half the AI suggestions were not usable. Even the suggestions they accepted needed a lot of fixing up.
This might be why Ashley Stewart reported that Microsoft pushes staff to use internal AI tools more, and may consider this in reviews. 'Using AI is no longer optional.':
Julia Liuson, president of the Microsoft division responsible for developer tools such as AI coding service GitHub Copilot, recently sent an email instructing managers to evaluate employee performance based on their use of internal AI tools like this.

"AI is now a fundamental part of how we work," Liuson wrote. "Just like collaboration, data-driven thinking, and effective communication, using AI is no longer optional — it's core to every role and every level."

Liuson told managers that AI "should be part of your holistic reflections on an individual's performance and impact."
Source
If the tools were that good, people would use them without being threatened. If the tools were that good, people would pay for them. But Menlo Ventures found that only 3% of consumers pay anything. They are happy to use free toys but they have other spending priorities. Other surveys have found numbers up to 8%, but as Ted Gioia notes in The Force-Feeding of AI on an Unwilling Public:
Has there ever been a major innovation that helped society, but only 8% of the public would pay for it?
Gioia didn't want AI but as an Office 365 user he didn't have that option:
AI is now bundled into all of my Microsoft software.

Even worse, Microsoft recently raised the price of its subscriptions by $3 per month to cover the additional AI benefits. I get to use my AI companion 60 times per month as part of the deal.
Source
Microsoft didn't ask their customer whether they would pay for AI, because the answer would have been no. Gioia writes:
This is how AI gets introduced to the marketplace—by force-feeding the public. And they’re doing this for a very good reason.

Most people won’t pay for AI voluntarily—just 8% according to a recent survey. So they need to bundle it with some other essential product.
As I discussed in The Back Of The AI Envelope, the AI giants running the drug-dealer's algorithm are losing money on every prompt. Gioia has noticed this:
There’s another reason why huge tech companies do this—but they don’t like to talk about it. If they bundle AI into other products and services, they can hide the losses on their income statement.

That wouldn’t be possible if they charged for AI as a standalone product. That would make its profitability (or, more likely, loss) very easy to measure.

Shareholders would complain. Stock prices would drop. Companies would be forced to address customer concerns.

But if AI is bundled into existing businesses, Silicon Valley CEOs can pretend that AI is a moneymaker, even if the public is lukewarm or hostile.
Salesforce is another company that has spotted this opportunity:
Yesterday Salesforce announced that prices on a pile of their services are going up around 6% — because AI is just that cool.

Salesforce’s stated reason for the price rise is “the significant ongoing innovation and customer value delivered through our products.” But you know the actual reason is because f- you, that’s why. What are you gonna do, move to SAP? Yeah, didn’t think so.
One problem is that the technology Salesforce is charging its customers for doesn't work well in Salesforce's application space. Salesforce's own researchers developed a new benchmark suite called CRMArena-Pro:
CRMArena-Pro expands on CRMArena with nineteen expert-validated tasks across sales, service, and 'configure, price, and quote' processes, for both Business-to-Business and Business-to-Customer scenarios. It distinctively incorporates multi-turn interactions guided by diverse personas and robust confidentiality awareness assessments. Experiments reveal leading LLM agents achieve only around 58% single-turn success on CRMArena-Pro, with performance dropping significantly to approximately 35% in multi-turn settings. While Workflow Execution proves more tractable for top agents (over 83% single-turn success), other evaluated business skills present greater challenges. Furthermore, agents exhibit near-zero inherent confidentiality awareness; though targeted prompting can improve this, it often compromises task performance.
Huang et al Table 2
To summarize the results:
The agent bots had 58% success on tasks that can be done in a single step. That dropped to 35% success if they had to take multiple steps. The chatbot agents are also bad at confidentiality:
Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance. These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios.
Despite the fact that most consumers won't pay the current prices, it is inevitable that once the customers are addicted, prices will go up spectacularly. But the wads of VC cash may not last long enough, and things can get awkward with the customers who are paying the current prices, as David Gerard reports:
You could buy 500 Cursor requests a month for $20 on the “Pro” plan. People bought a year in advance.

In mid-June, Cursor offered a new $200/month “Ultra” plan. But it also changed Pro from 500 requests to $20 of “compute” at cost price — the actual cost of whichever chatbot vendor you were using. That was a lot less than 500 requests.

You could stay on the old Pro plan! But users reported they kept hitting rate limits and Cursor was all but unusable.

The new plan Pro users are getting surprise bills, because the system doesn’t just stop when you’ve used up your $20. One guy ran up $71 in one day.

Anysphere has looked at the finances and stopped subsidising the app. Users suddenly have to pay what their requests are actually costing.

Anysphere says they put the prices up because “new models can spend more tokens per request on longer-horizon tasks” — that is, OpenAI and Anthropic are charging more.
The CEO who laid off the staff faces another set of "business risks". First, OpenAI is close to a monopoly; it has around 90% of the chatbot market. This makes it a single point of failure, and it does fail:
On June 9 at 11:36 PM PDT, a routine update to the host Operating System on our cloud-hosted GPU servers caused a significant number of GPU nodes to lose network connectivity. This led to a drop in available capacity for our services. As a result, ChatGPT users experienced elevated error rates reaching ~35% errors at peak, while API users experienced error rates peaking at ~25%. The highest impact occurred between June 10 2:00 AM PDT and June 10 8:00 AM PDT.
Second, the chatbots present an attractive attack surface. David Gerard reports on a talk at Black Hat USA 2024:
Zenity CEO Michael Bargury spoke at Black Hat USA 2024 on Thursday on how to exploit Copilot Studio: Bargury demonstrated intercepting a bank transfer between a company and their client “just by sending an email to the person.”
So the technology being sold to the CEOs isn't likely to live up to expectations and it will cost many times the current price. But the way it is being sold means that none of this matters. By the time the CEO discovers these issues, the company will be addicted.

by David. (noreply@blogger.com) at July 22, 2025 03:00 PM

Library | Ruth Kitchin Tillman

Testing the Summon Research Assistant

Early this spring, Ex Libris released the Summon “Research Assistant.” This search tool uses Retrieval Augmented Generation, with an LLM (OpenAI’s GPT-4o mini at time of writing) to search and summarize metadata in their Summon/Primo Central Discovery Index.

We did a library-wide test mid-semester and decided that it’s not appropriate to turn it on now. We may do so when some bugs are worked out. Even then, it is not a tool we’d leave linked in the header, promote as-is, or teach without significant caveats (see Reflection).

Brief Overview of the Tool

This overview is for the Summon version, though I believe that the Primo version is pretty similar and it has some of the same limitations.

From the documentation:

  1. Query Conversion – The user’s question is sent to the LLM, where it is converted to a Boolean query that contains a number of variations of the query, connected with an OR. If the query is non-English, some of the variations will be in the query language, and the other variations will be in English.
  2. Results Retrieval – The Boolean query is sent to CDI to retrieve the results.
  3. Re-ranking – The top results (up to 30) are re-ranked using embeddings to identify five sources that best address the user’s query.
  4. Overview Creation – The top five results are sent to the LLM with the instructions to create the overview with inline references, based on the abstracts.
  5. Response Delivery – The overview and sources are returned to the user in the response.
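The five steps above can be sketched in miniature. Everything in this sketch is a toy stand-in invented for illustration (the tiny index, the bag-of-words "embedding", and the naive query expansion), not Ex Libris or OpenAI code; it only shows the shape of the convert–retrieve–rerank–summarize flow:

```python
import math

# Stand-in record index; real CDI records number over a billion.
RECORDS = [
    {"title": "Quilt History", "abstract": "history of american quilts"},
    {"title": "Genetics Primer", "abstract": "introduction to genetics"},
    {"title": "Quilting Patterns", "abstract": "patterns for quilts and textiles"},
]

def embed(text):
    # Stand-in embedding: bag-of-words counts over a tiny fixed vocabulary.
    vocab = ["quilts", "genetics", "patterns", "history", "textiles"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def research_assistant(question, k=2):
    # 1. Query Conversion: an LLM would OR together paraphrases of the question.
    boolean_query = " OR ".join([question, question.replace("about ", "")])
    # 2. Results Retrieval: match any query term against the index.
    terms = set(boolean_query.lower().replace(" or ", " ").split())
    results = [r for r in RECORDS if terms & set(r["abstract"].split())]
    # 3. Re-ranking: keep the k records whose abstracts are nearest the question.
    q = embed(question)
    top = sorted(results, key=lambda r: cosine(q, embed(r["abstract"])),
                 reverse=True)[:k]
    # 4-5. Overview Creation and Response Delivery: an LLM would summarize the
    # abstracts with inline references; here we just join the citations.
    return {"overview": "; ".join(r["title"] for r in top), "sources": top}

resp = research_assistant("about quilts")
```

Note that steps 4 and 5 only ever see the top-k abstracts, which is exactly why the content scope caveat below matters so much: anything filtered out before re-ranking can never appear in the overview.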

There is one major caveat to the above, also in the documentation, which is the content scope. Once you get through all the exceptions,1 only a slice of the CDI could make it into the top 5 results. Most notably, records from any of the following content providers are not included:

These would be in the results you get when clicking through to “View related results,” but they could not make it into the “Top 5.”

Positive Findings

I would summarize the overall findings as: extremely mixed. As I said up front, we had enough concerns that we didn’t want to simply turn on the tool and encourage our wider base to try it out.

Sometimes, people got really interesting or useful results. When it worked well, we found the query generation could come up with search strings that we wouldn’t have thought of but got good results. I found some electronic resources about quilts that I didn’t know we had – which is saying something!

Some of the ways the tool rephrased research questions as suggested “related research questions” were also useful. A few people suggested that this could be used to help students think about the different ways one can design and phrase a search.

The summaries generally seemed accurate to the record abstracts. I appreciated that they were cited in a way that let me identify which item was the source of which assertion.2

We also had many concerns.

Massive Content Gaps (and Additions)

The content gaps are a dealbreaker all on their own. No JSTOR? No Elsevier? No APA? Whole disciplines are missing. While they do show up in the “View related results,” those first 5 results matter a lot in a user’s experience and shape expectations of what a further search would contain. If someone is in a field for which those are important databases, it would be irresponsible to send them to this tool.

The need for abstracts significantly limits which kinds of results get included. Many of our MARC records do not have abstracts. For others, one may infer the contents of a book from its table of contents note, but that requires levels of abstraction and inference which a human can perform and this tool cannot.

Then there’s the flip side of coverage. This is based on the Ex Libris CDI (minus the massive content gaps), which includes everything that we could potentially activate. At time of writing, it still doesn’t seem possible to scope to just our holdings (and include our own MARC). This means results include not only the good stuff we’d be happy to get for a patron via ILL but also whatever cruft has made its way into the billion+ item index. And that’s not a hypothetical problem. In one search we did during the session, so much potential content was in excluded JSTOR collections that a top 5 result on the RAG page was an apparently LLM-generated Arabic bookseller’s site.3

LLM Parsing / Phrasing

The next issue we encountered was that sometimes the LLM handled queries in unexpected4 ways.

Unexpected Questions

First, the Research Assistant is built to only answer a specific type of question. While all search tools can be described that way, anyone who’s worked for more than 30 seconds with actual humans knows that they don’t always use things in the way we intend. That’s why we build things like “best bet” style canned responses to handle searches for library hours or materials with complicated access (like the Wall Street Journal).

  1. It was not programmed to do anything with single-word searches. A search for “genetics,” for example, got the “We couldn’t generate an answer for your question” response. There wasn’t any kind of error-handling on the Ex Libris side to turn it into some kind of “I would like to know about [keyword],” even as a suggestion provided in the error message. For all my critiques of LLMs themselves, sometimes it’s just poor edge case handling.
  2. Then there were the meta questions. Colleagues who staff our Ask-a-Librarian brought in a few that they’ve gotten: “Do you have The Atlantic?” or “What is on course reserve for XXXXX?” In both of those cases, the tool was not able to detect that this was not the kind of question it was programmed to answer. In both cases, it returned a few random materials and generated stochastic responses which were, of course, completely inaccurate.

LLM-Induced Biases

Then there were issues introduced by the nature of LLMs – how they tokenize and what kind of data they’re trained on:

  1. A liaison librarian reported asking about notable authors from Mauritius and being given results for notable authors from Mauritania. I would guess this is a combination of stemming and lack of responses for Mauritius. But they are two very distinct countries, in completely different regions of a continent (or off the continent).
  2. Another bias-triggering question related to Islamic law and abortion. The output used language specific to 20th/21st-century evangelical Christianity. Because LLMs are configured not to output the same result twice, we could not replicate it, but instead got a variety of different phrasings of results of varying quality. This is a (not-unexpected) bias introduced by the data the LLM was trained on. Notably, it was not coming from the language of the abstracts.

Balancing Safety and Inquiry

Note: While I was finishing this blog post, the ACRLog published a post going into more detail about topics blocked by the “safeguards”. I brought this to our library-wide discussion but I’m going to refer readers to the above. Basically, if you ask about some topics, you won’t get a response. Even though some of these are the exact kind of thing we expect our students to be researching.5

When the Summon listserv was discussing this issue in the spring, I went and found the OpenAI Azure documentation for content filtering. They have a set of different areas that people can configure:

Configuration levels can be set at low, medium, and high for each. I shared the link and list of areas on the listserv and asked about which the Research Assistant uses but did not get an answer from Ex Libris.
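To make the unanswered question concrete: blocking behavior depends entirely on which per-category severity threshold a deployment picks. Here is a toy illustration of threshold-based filtering; the category and severity names follow Azure's documented content filter vocabulary, but the functions, dicts, and scores are invented for illustration, not Azure's API:

```python
# Severity scale used by Azure's content filter categories, lowest to highest.
SEVERITY = {"safe": 0, "low": 1, "medium": 2, "high": 3}

def is_blocked(scores, config):
    """Block when any category's rated severity meets or exceeds
    that category's configured threshold."""
    return any(
        SEVERITY[scores.get(category, "safe")] >= SEVERITY[threshold]
        for category, threshold in config.items()
    )

# A strict deployment blocks anything rated "low" or worse per category;
# a lax one blocks only "high".
strict = {c: "low" for c in ("hate", "sexual", "violence", "self_harm")}
lax = {c: "high" for c in ("hate", "sexual", "violence", "self_harm")}

# The same medium-severity content passes one filter and not the other.
scores = {"violence": "medium"}
```

This is exactly why the configuration matters: a query about, say, the Tulsa Race Riots could sail through one threshold setting and be silently refused under another, and without knowing which setting the Research Assistant uses, there is no way to predict which.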

Steps to Delivery

This next part relates to the idea of the Research Assistant itself, along with Ex Libris’s implementation.

Very, very few of our patrons need just a summary of materials (and, again, a summary of only the materials which happen to have an abstract, and of only the abstract, not the actual materials). Plenty of our patrons don’t need that at all. Unless they’re going to copy-paste the summary into their paper and call it a day, they actually need to get and use the materials.

So once they’ve found something interesting, what are their next steps?

Well, first you click the item.

Search results with 5 item citations above a summary

Then you click Get It.

The first item citation has been expanded and shows a Get It button

Then THAT opens a Quick Look view.

A sidebar has opened on the right of the screen with a full citation. There is no clear place to click but the title is a link

Then you click the title link on the item in the Quick Look View.

A results page which says you are looking for the book and offers a button to get it via Interlibrary Loan

And oh look this was in the CDI but not in our holdings, so it’s sent me to an ILL page (this was not planned, just how it happened).

Maybe Ex Libris missed the memo, but we’ve actually been working pretty hard to streamline pathways for our patrons. The fewer clicks the better. This is a massive step backward.

Reflection

I doubt this would be of any utility for grad students or faculty except as another way of constructing query strings. I do think it’s possible to teach with this tool, as with many other equally but differently broken tools. I would not recommend it at a survey course level. Is it better than other tools they’re probably already using? Perhaps, but the bar is in hell.

Optimal use requires:

  1. Students to be in a discipline where there’s decent coverage.
  2. Students to know that topical and coverage limitations exist.
  3. Students to understand the summaries are the equivalent of reading 5 abstracts at once and that there may be very important materials in the piece itself.
  4. Students to actually click through to the full list of results.
  5. Ex Libris to let us search only our own index (due to the cruft issue).
  6. Ex Libris to redesign the interface with a shorter path to materials.

Its greatest strength as a tool is probably the LLM to query translation and recommendations for related searches. When it works. But with all those caveats?

I am not optimistic.


  1. FWIW, I totally understand and support not including News content in this thing. First, our researchers are generally looking for scholarly resources of some kind. Second, bias city. ↩︎

  2. These citations are to the abstract vs. the actual contents. This could cause issues if people try to shortcut by just copy-pasting, since we’re reliant on the abstract to reliably represent the contents (though there’s also no page # citation). ↩︎

  3. A colleague who is fluent in Arabic hypothesized that was not a real bookstore because many small things about the language and site (when we clicked through) were wrong. ↩︎

  4. Ben Zhao’s closing keynote for OpenRepositories goes into how these kinds of issues could be expected. So I’ll say “unexpected” from the user’s POV but also I cannot recommend his talk highly enough. Watch it. ↩︎

  5. Whether ChatGPT can appropriately summarize 5 abstracts from materials related to the Tulsa Race Riots or the Nakba is a whole separate question. ↩︎

July 22, 2025 12:00 AM

Library Tech Talk (U of Michigan)

Collaborating on a Digital Music Archive: U-M's contributions to the Sounding Spirit Digital Library

In 2020, the Emory Center for Digital Scholarship in Atlanta, Georgia, reached out to the University of Michigan to contribute to the Sounding Spirit Digital Library (SSDL). They asked the Bentley Historical Library, the U-M Library, and the William L. Clements Library to contribute titles in our collections that would expand their digital collection. This post looks at the range of titles contributed, discusses the equipment used to digitize the titles, and analyzes the ways that SSDL and U-M Library align and vary in their digitization efforts.

by Jeremy Evans, Larry Wentzel at July 22, 2025 12:00 AM

July 21, 2025

Jez Cope

Trying to rediscover my voice

I always feel like I have so much to write, so much that I want to write. So why is it that any time I sit down to actually write, my mind goes entirely blank? Even if I have a list of intended topics right in front of me, or a partial draft to work on, I start feeling like I have nothing interesting to say. The only thing I have much success in writing is these long, boring, self-indulgent walls of text about how I feel about not being able to write. I don’t know, maybe I should just publish this. At least it will be some of my thoughts out in the world again.

Back in the early days of my blog it seemed to flow quite easily. I was confident, not really that what I had to say was correct, or insightful, or in any way important, but that it was not going to get me into trouble or draw criticism that I couldn’t handle. I naively thought that I could say whatever I wanted without fear.

Since I am a white, well-educated, straight cis man, I was largely right. The only time I felt any discomfort was when I posted a paraphrased recipe from a book and was threatened with legal action by the writer’s agent. While I was certain that legally I was in the clear (you can’t copyright a recipe), and felt that since I was strongly recommending people buy the book it was OK morally too, I deleted the post.

Of course I did. I’ve spent my whole life learning over and over that conflict is bad, it’s my responsibility to resolve it, and it probably is me being unreasonable anyway. Like a lot of ND kids, I learned early on that all these intense feelings and sense impressions (that I believed everyone felt) were not to be acted upon because doing so only brought trouble.

So I think after that experience, benign though it was, I started to doubt whether I could speak freely after all. At the same time I was awakening politically and learning that some of the beliefs I thought were obvious (e.g. that all humans had rich inner lives that affected how they thought/acted) were in fact not that common, and that stating them in particular ways could be seen as somehow controversial. I was also succeeding in my career and starting to internalise what I was told about what I said reflecting on my employer and colleagues.

However it happened, I had lost my voice.

Yes, in the sense of being literally unable to express myself in certain ways, but also in that I felt there was something I once had but had misplaced and desperately wanted to find again without knowing where to look. First it only affected my personal writing, but as the years went by it crept into my professional work too. Working in a large organisation is never not political; what you say affects not only your own standing and influence but also that of your team and department in the wider organisation, which is a heck of a responsibility. By the time I ended up in the public sector I was being regularly reminded that it was my job to remain neutral, impartial, disinterested. Eventually I stopped saying much at all, except to a few trusted friends and colleagues.

This is not my fault, exactly, but it is partly related to some very core parts of my personality. I know that breaking rules feels bad, feels dangerous. So when I’m told implicitly that regardless of the depth of my knowledge or experience my word isn’t good enough —that only things supported by concrete evidence are OK to say— I play it safe.

I see that not everyone plays by those rules, and that some don’t suffer any consequences for breaking them, but I’m unable to discern to my own satisfaction why that is or how to emulate it. Is it because they are brighter or more experienced than I? Are they party to information that explains how they are not, in fact, breaking rules? Do they have better understanding of what rules can or cannot be broken? Are they more mature and self-confident through experience? Are they simply confident because they have the privilege never to have been challenged and the power to carry it off through confidence alone? Maybe some of these are people I don’t want to emulate after all…

Still, there are glimmers of hope. I’m starting to become more aware of contexts where I feel less shackled and more able to express myself. Unsurprisingly, it’s usually when I’m under less pressure (external or internal) to “deliver” some “output” that meets some vague criteria, and when I’m working with people I know well and trust not to judge me personally if we disagree. It’s also easier when I retreat to that shrinking zone where I still feel like I can speak with some authority.

I’m looking for ways to put myself in that context more often. Right now I think that means identifying a small group of trusted colleagues at work that I can bounce ideas around with, and doing more writing in the various communities I find myself in. There was a long while when I didn’t feel I had the authority to speak even about my own lived experience. Two years of therapy, a lot of introspection, and the love of friends and family have brought me to a place where I no longer doubt my own experience of the world (well, not so much as I did — it’s a work in progress), which gives me a place of solid ground to build out from as I re-establish my faith in my skills, experience and judgement in other areas.

Well, this wasn’t the thing I was expecting to write when I started, but here we are. I guess we’ll see how it goes!

July 21, 2025 08:13 PM

July 19, 2025

Ed Summers

DC wall


July 19, 2025 04:00 AM

July 18, 2025

Web Science and Digital Libraries (WS-DL) Group at Old Dominion University

2025-07-19: 17th ACM Web Science Conference (WebSci) 2025 Trip Report

 

The 17th ACM Web Science Conference (WebSci 2025) took place at Rutgers University in New Brunswick, New Jersey


The 17th ACM Web Science Conference (WebSci 2025) was held from May 20–23 at Rutgers University in New Brunswick, New Jersey. The theme was "Maintaining a Human-Centric Web in the Era of Generative AI" and highlighted the interdisciplinary nature of Web Science, which examines the complex, reciprocal relationship between the Web and society. This trip report is authored by Kritika Garg and David Calano from the Web Science and Digital Libraries (WSDL) research group at Old Dominion University, who had the pleasure of attending and presenting at the conference. 



Tuesday, May 20, 2025

On the first day of the conference, a series of workshops and tutorials were held on cutting-edge topics such as Generative AI, the Human-Centric Web, and Information Security. Tutorials included sessions on using the National Internet Observatory for collecting web data for research and exploring the Meta Content Library as a research tool. We had to choose one workshop or tutorial to attend.


Tutorial: Beyond APIs: Collecting Web Data for Research using the National Internet Observatory 

The first workshop session was “Beyond APIs”, where members of the National Internet Observatory (NIO) at Northwestern University discussed many of the current issues in interfacing with the Web, collecting data, and ethical concerns of data usage. We at WS-DL often face many of these same challenges when working with APIs of various sites, such as the deprecation of the original Twitter API discussed in the workshop. In the NIO program, users opt into the study and can both voluntarily donate their data and utilize mobile apps and browser extensions which monitor their Web activity and allow researchers to find interesting patterns in user behavior and the interconnectedness of the Web.


 


Workshop: HumanGenAI Interactions: Shaping the Future of Web Science


I, Kritika Garg, participated in the workshop “Human-GenAI Interactions: Shaping the Future of Web Science,” which showcased several fascinating studies.

Lydia Manikonda from Rensselaer Polytechnic Institute presented work on characterizing linguistic differences between human and LLM-generated text using Reddit data from r/explainlikeimfive. They prompted ChatGPT with the same questions as those posed on the subreddit, then compared the top-voted human responses with the AI-generated ones, asking whether readers could distinguish between them and identify the author.

Celia Chen and Alex Leitch from the University of Maryland discussed “Evaluating Machine Expertise,” focusing on how graduate students develop frameworks to assess GenAI content. They noted that LLM-generated content often appears authoritative even without domain expertise. Their research examines whether students build mental models to decide when and how to use LLMs and how these frameworks shift across disciplines. They found that students protect work central to their professional identity, are skeptical of academic LLM content, but trust machine outputs when they can be tested. International students often verify results across languages, such as checking first in English and then confirming in Chinese.

Alexander Bringsjord from Rensselaer Polytechnic Institute explored GenAI’s dual deception based on content and perceived intelligence, highlighting LLM hallucinations and how LLMs blend prior conversation into answers rather than accurately interpreting new documents.

Lydia Manikonda also spoke about the importance of privacy and ethical practices as more companies integrate AI into customer experiences.

Finally, Eni Mustafaraj’s reflections on the Semantic Web and the current state of AI, along with her work on Credbot, left me reflecting on how we might engage with the web and information in the future. The discussion about whether we will continue to visit web pages or shift to new modes of communication felt especially relevant and worth pondering.

 



Wednesday, May 21, 2025

The conference kicked off on Wednesday with opening remarks from General Chair Matthew Weber of Rutgers University. He welcomed attendees to New Jersey and introduced the other chairs. He shared that this year there were 149 submissions from 519 authors across 29 countries, with 59 papers accepted, resulting in an acceptance rate of 39.6%.


Session 1: Digital Identity & Social Systems 


Ines Abbes opened Session 1 with “Early Detection of DDoS Attacks via Online Social Networks Analysis”. They proposed a BERT-based approach for early detection of DDoS attacks by analyzing user reports on Twitter, demonstrating high accuracy and outperforming existing methods. Next, Sai Keerthana Karnam presented “Social Biases in Knowledge Representations of Wikidata separates Global North from Global South”. Their work investigates social biases embedded in Wikidata’s knowledge representations, showing that geographic variations in bias reflect broader socio-economic and cultural divisions worldwide. Xinhui Chen presented “Unpacking the Dilemma: The Dual Impact of AI Instructors’ Social Presence on Learners’ Perceived Learning and Satisfaction, Mediated by the Uncanny Valley”, which explores how adding social presence to AI instructors boosts learners’ perceived learning and satisfaction but also risks triggering uncanny-valley reactions. Lastly, Ben Treves presented “VIKI: Systematic Cross-Platform Profile Inference of Tech Users”. Their work introduces VIKI, a method that analyzes and compares users’ displayed personas, like personality traits, interests, and offensive behavior, across platforms such as GitHub, LinkedIn, and X, revealing that 78% of users significantly alter how they present themselves depending on the context.



Keynote: Mor Naaman 


Mor Naaman from Cornell Tech delivered the first keynote of the conference. His talk was titled “AI Everywhere all at Once: Revisiting AI-Mediated Communication”. He reflected on how, when the concept of AI-Mediated Communication (AIMC) was first introduced in 2019, it seemed mostly theoretical and academic. However, in just a few years, AI has become deeply embedded in nearly every aspect of human communication, from personal conversations to professional work and online communities. Mor revisited key studies from the AIMC area, highlighting findings such as how suspicion of AI can undermine trust in interpersonal exchanges, and how AI assistants can subtly influence not only the language and content of our communication but even our attitudes. Given the rapid growth of AI technologies like ChatGPT, he proposed an updated understanding of AIMC’s scope and shared future research directions, while emphasizing the complex challenges we face in this evolving landscape. His talk highlighted the profound and often subtle ways AI is transforming our communication, not just in what we say, but how we think and connect with one another. It made me wonder about the future of communication as AI becomes increasingly integrated into our daily interactions, raising important questions about how we can preserve authenticity and trust amid this rapid technological rise.



Session 2: Content Analysis & User Narratives 


After lunch, there were two parallel sessions and we attended Session 2, which seemed more aligned with our interests. Jessica Costa started the session with “Characterizing YouTube’s Role in Online Gambling Promotion: A Case Study of Fortune Tiger in Brazil”, which examines how YouTube facilitates the promotion of online gambling, highlighting its societal impact and providing a robust methodology for analyzing similar platforms. Next, Aria Pessianzadeh presented “Exploring Stance on Affirmative Action Through Reddit Narratives”. This study analyzes narratives on Reddit to explore public opinions on affirmative action, revealing how users express support or opposition through personal stories and thematic framing. Ashwin Rajadesingan presented “How Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions”, a study showing how sharing personal stories can motivate people who typically avoid politics to join conversations on Reddit, as these stories resonate more with people and tend to receive more positive engagement than other types of comments. Wolf-Tilo Balke concluded the session with “Scientific Accountability: Detecting Salient Features of Retracted Articles”. This study identifies key characteristics of retracted scientific articles, such as citation patterns, language features, and publication metadata, to better understand their impact and improve detection of problematic research. This work offers a new lens to think critically about the credibility of scientific literature, especially in an era of information overload.


 


Keynote: Lee Giles 


Dr. Lee Giles delivered an excellent keynote on the operation and infrastructure of Web crawlers and search engines, both general-purpose engines and those he created himself, including the numerous *Seer-variant engines such as ChemXSeer and CiteSeerX. Since Dr. Giles is a friend of the WS-DL research group, the talk was a nice treat for me as a current WS-DL student, and it was an incredible resource for other conference participants interested in Web crawlers. In discussions afterward, many students said they had attempted to work with or build Web crawlers in the past without realizing the complexity and challenging hurdles involved in navigating the modern Web.


 


Lightning Talks & Poster Session


The WebSci ‘25 Lightning Talks were brief presentations meant to advertise and attract audience members to the large selection of posters being presented. As with the session and keynote talks, there was no shortage of interesting work on display. 



I, David Calano, presented the poster "GitHub Repository Complexity Leads to Diminished Web Archive Availability", which highlighted the limited availability of Web-hosted (e.g., GitHub) software repositories archived in the Wayback Machine. We examined the page damage of archived repository landing pages and the availability of the archived source files themselves to assess the viability of rebuilding archived software projects.



Thursday, May 22, 2025

Session 4: Media Credibility & Bias


The talks from Session 4 were all keenly relevant to today’s evolving political climate.


All of the papers in this session presented interesting information and findings. For example, Kai-Cheng Yang and Filippo Menczer’s paper notes the left-wing bias inherent in LLMs and raises the question of what effect such biases might have. As many Web users, particularly those of younger generations, default to consulting an LLM chatbot for information and rarely conduct further searches or analysis of sources, what happens to an already polarized society? Likewise, Chau Tong’s paper explored polarization in search engine results. The DocNet paper by Zhu et al. also provided a solid technical exploration of bias detection systems leveraging AI and Python.


Session 7: Online Safety & Policy 

Deanna Zarrillo presented “Facilitating Gender Diverse Authorship: A Comparative Analysis of Academic Publishers’ Name Change Policies”, which examines the publicly available name change policies of nine academic journal publishers through thematic content analysis, providing insight into how publishers handle author name changes and how transparent those policies are. Tessa Masis presented her work, “Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement”, which examines how the movement used multilingual posts and K-pop fan culture to build global solidarity and amplify messages against anti-Asian hate across different countries and communities online.

I, Kritika Garg, had the pleasure of presenting our work, “Not Here, Go There: Analyzing Redirection Patterns on the Web”. Our research examined 11 million redirecting URIs to uncover patterns in web redirections and their implications for user experience and web performance. While half of these redirections successfully reached their intended targets, the other half led to various errors or inefficiencies, including some that exceeded recommended hop limits. Notably, the study revealed "sink" URIs, where multiple redirections converge, sometimes for playful purposes such as Rickrolling. It also highlighted issues like "soft 404" error pages, which cause unnecessary resource consumption. The research provides valuable insights for web developers and archivists aiming to optimize website efficiency and preserve long-term content accessibility.
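The core ideas of chain-following with a hop limit and "sink" detection can be sketched in a few lines of Python. This is a toy illustration over a hard-coded redirect mapping, not the methodology or code from the paper; the example URIs and the `max_hops` value are illustrative assumptions.

```python
# Toy sketch: follow redirect chains in a precomputed {source: target}
# mapping, flag chains that loop or exceed a hop limit, and find
# "sink" URIs where multiple chains converge.
from collections import Counter

def follow_chain(uri, redirects, max_hops=5):
    """Follow a redirect chain; return (final_uri, hops, problematic)."""
    seen = set()
    hops = 0
    while uri in redirects:
        if uri in seen:                  # redirect loop detected
            return uri, hops, True
        seen.add(uri)
        uri = redirects[uri]
        hops += 1
    return uri, hops, hops > max_hops    # flag overly long chains

def find_sinks(uris, redirects, min_converging=2):
    """Return final URIs where at least `min_converging` chains end."""
    finals = Counter(follow_chain(u, redirects)[0] for u in uris)
    return {uri for uri, n in finals.items() if n >= min_converging}

# Two chains converging on one target make it a "sink".
redirects = {
    "http://a.example/": "http://b.example/",
    "http://b.example/": "http://final.example/",
    "http://c.example/": "http://final.example/",
}
final, hops, problematic = follow_chain("http://a.example/", redirects)
sinks = find_sinks(["http://a.example/", "http://c.example/"], redirects)
```

In a real crawl the mapping would come from observed HTTP 3xx responses rather than a dictionary, but the bookkeeping (loop detection, hop counting, convergence counting) is the same.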


Not Here, Go There: Analyzing Redirection Patterns on the Web from Kritika Garg


Mohammad Namvarpour gave the final presentation of the session, “The Evolving Landscape of Youth Online Safety: Insights from News Media Analysis”, which examines how news stories about keeping kids safe online have changed over the past 20 years. By studying two decades of news articles, the authors show that recent coverage focuses more on tech companies and government regulation.


Session 9: Contemporary Issues in Social Media 


The papers from Session 9 explored a wide range of social media topics, from war and news to mental health and safety. “A Call to Arms: Automated Methods for Identifying Weapons in Social Media Analysis of Conflict Zones” by Abedin et al. presented an interesting framework for identifying and tracking weapons in active conflict zones through social media platforms such as Telegram. Their work relied heavily on computer vision and open-source datasets and provides a window into the scale and lethality of ongoing conflicts. The paper by Saxena et al., “Understanding Narratives of Trauma in Social Media”, was an incredibly valuable discussion of trauma, social media, and their effects on mental health.


Web Science Panel  


The Web Science panel consisted of Dame Wendy Hall, Dr. Jim Hendler, Dr. Matthew Weber, Dr. Wolf-Tilo Balke, Dr. Marlon Twyman II, and Dr. Oshani Seneviratne. While the panel ran slightly over time, leaving little room for questions during the session, many were asked at the reception afterward. It was a treat to hear from some of the key founders of the field of Web Science and core contributors to the World Wide Web at large. The moderated questions spanned the spectrum of Web Science, and it was great to hear such key figures' thoughts on issues related to social media, AI, political governance, the Semantic Web, and the broad applications of communication and social science to the World Wide Web. Dame Hall and Dr. Hendler also discussed the Web Science Trust, which seeks to advance the field of Web Science and bring together researchers from across the globe.

Web Science Panel responding to attendee questions on a wide range of Web Science topics



Friday, May 23, 2025

Session 10: Platform Governance & User Safety 

Session 10 also offered a good variety of content. Two of our favorite papers were “Decentralized Discourse: Interaction Dynamics on Mastodon” by Brauweiler et al. and “Is it safe? Analysis of Live Streams Targeted at Kids on Twitch.tv” by Silva et al. Many WS-DL members are fans of new, experimental, and decentralized Web tools and social platforms; some are active in various Mastodon communities and have even run their own instances. It was exciting to hear how researchers are utilizing Mastodon and other social platforms, and how they tackled the many technical challenges those platforms present. Like the work of Saxena et al. from Session 9, the research by Silva et al. on child safety on the popular streaming platform Twitch is of great importance for the health and wellbeing of younger Web users. They found that Twitch currently offers only minimal options for marking and filtering adult content, and only for select forms of media; channels are self-reported as being for an adult audience rather than automatically tagged. Furthermore, even when content is not marked for an adult audience, or is explicitly marked for kids or a younger audience, there is no guarantee that the streamer's language or the topics discussed in chat will be suitable for younger viewers, except through voluntary moderation.

 


Closing Keynote: Dame Wendy Hall 

Dame Wendy Hall’s closing keynote was an excellent look at the history of Artificial Intelligence and its relation to the Web. It served as a reminder that progress is not constant: we alternate between periods of uncertainty and rapid advancement, and the rapid periods can blind us to potential hazards. It was also a reminder of how much Artificial Intelligence relies on the World Wide Web, its users surfing the waves of hyperspace, and the information they share along the way. The collective information of the Web is what AI is built from; without the input of billions of people around the world, it would have no substance. Other great points from the talk concerned the dangers and politics surrounding AI research, development, and deployment: in particular, how much power and control we allow AI to have in our global society, and the state of global cooperation (or lack thereof) on AI regulation. These points were especially relevant given the simultaneous release of Anthropic’s Claude 4 LLM, which in testing was found to engage in blackmail, whistleblowing, and other concerning behaviors.


Conference closing 

Despite the week’s rainy weather, the conference was well-organized, stimulating, and rewarding. For some, this was a return to a familiar community, while for us it was a valuable first in-person conference experience. The opportunity to exchange ideas with colleagues from industry and academia worldwide was truly worthwhile. The dinner at the Rutgers Club was a fitting conclusion, providing space to connect before departing. With the next conference scheduled for Germany, we look forward to continuing these conversations there. Many thanks to the organizers for putting together an excellent event.

Snapshots from our trip — Kritika and David presenting at WebSci 2025, meeting Dame Wendy Hall and Dr. Jim Hendler, the must-have ODU WSDL group photo with our alumnus Dr. Sawood Alam, and a scenic drive back to Virginia



- Kritika Garg (@kritika_garg) and David Calano (@lifefromalaptop)


by KritikaGarg (noreply@blogger.com) at July 18, 2025 10:37 PM