The Learnovation Foundation Network organized the Wikidata Loves SDGs 2024 event on 7 March 2024 at the Mustapha Akanbi Library and Resource Centre in Kwara, Nigeria, to celebrate Open Data Day with mini-grant support. The event focused on enhancing and updating Wikidata items related to Sustainable Development Goals (SDGs) in Nigeria, fostering collaboration and awareness among Wikidatans, SDG advocates, and data enthusiasts.
The event boasted the presence of experienced facilitators and Wikimedia project organizers, including Barakat Adegboye, Blessing Linason, and Miracle James. The theme, âOpen data for advancing sustainable development goals,â set the stage for a day of insightful presentations and hands-on activities.
Kehinde Akinsola, the Programs Lead from The Wellbeing Foundation Africa, represented by Miss Jimoh Zainab, delivered an engaging talk on the intersection of open data and SDGs, emphasizing the role of accurate data in achieving sustainable development. Hafisat Ige, a renowned Data Scientist and Women Techstar Fellow â23, provided a deep dive into data-driven strategies for SDG advancement.
Participants, including Undergraduate Students, Educators, Librarians, Mass media and Medical professionals, engaged actively in the sessions. Despite the limitation of resources restricting invitations to only 20 out of 39 registered attendees, the event was a resounding success, with contributions spanning across various SDG-related topics on Wikidata.
The event utilized the Outreach Dashboard to track participant contributions, which included the creation of 2 new items, the editing of 12 items, and a total of 87 edits by 36 editors. The efforts of participants led to the addition of 28 new references, enhancing the reliability and depth of Wikidataâs SDG-related content.
In conclusion, Wikidata Loves SDGs 2024 not only highlighted the critical role of open data in sustainable development but also demonstrated the power of community collaboration in enriching the global data repository for the greater good. The event set a precedent for future initiatives aimed at leveraging open data for societal progress.
The challenges of transitioning to new metadata workflows have long been a concern to OCLC RLP Metadata Managers Focus Group members (What should metadata managers be learning?, Filling the bench, New skill sets for metadata managers). Recently, the group has asked me to facilitate deeper conversations about how to address these challenges. For the January 2024 session, I contacted Crystal Goldman, General Instruction Coordinator for the UC San Diego Library. Crystalâs research examines how staff in research libraries understand and apply succession planning. She notes that although there is some literature about the potential benefits of succession planning (and a call for more among library leaders/HR professionals), no comprehensive studies have been conducted across different libraries. In both her interviews and surveys, she has focused on three areas of activities (based on a framework from the Society of Human Resource Managers (SHRM)):
training and development
career planning and management
replacement planning or formal succession planning
To help us understand where Metadata Managers stand, we asked for responses to an informal survey using some of the questions from a previous instrument used in Crystalâs study of succession planning in ARL libraries.
Among both ARL libraries and Metadata Managers, formal succession planning (i.e. planning/preparing multiple individuals to potentially step into leadership roles) happens (if it happens at all) mostly at senior leadership levels. Like other ARL respondents, Metadata Managers were more likely to know about formal succession planning in their organizations if they were already managers in a leadership role. Metadata Managers identified that they engaged in replacement planning, often around key life events like expected temporary parental/medical leave and/or retirements. Even in these cases, identifying staff to fill gaps may happen in informal discussions with other managers while not directly engaging with staff who might see themselves in new roles. In the worst-case scenarios, Metadata Managers found themselves with unexpected vacancies, forcing them to promote âaccidental managersâ into leadership roles.
Metadata Managers reported slightly higher activity than most ARL respondents around training and development. Participants in our session felt this was unsurprising given the nature of metadata work and the changing landscape of technical developments that have been occurring. Similarly, Metadata Managers participate in some career planning and management, especially thinking about what kinds of competencies will be needed in the next five years. Forecasting those skills can inform decisions about hiring new staff members and/or providing opportunities for staff willing to seek new challenges.
When the topic of succession planning has come up in the past, I sensed that Metadata Managers were responding to broad calls to do better in this area â and perhaps felt guilty that they hadnât made more progress. One of the most valuable things I walked away from the sessions with was a better way to tease apart the challenges we are all facing into structural, cultural, and agentive issues.  Â
Structure
In both our sessions, Metadata Managers acknowledged the challenges of working within organizational contracts, collective bargaining agreements, or other job classification criteria. At a time when metadata is changing, these structures can require additional effort to redefine a positionâs required skills and experience. This may not be feasible due to time limitations and/or limited availability from human resources staff that are trying to fill multiple open positions. In these scenarios, it can help to focus energies toward longer-range thinking about competencies.
Several Metadata Managers noted that these structures can be especially frustrating in places where metadata is transitioning. Moving away from cataloging to other kinds of next-generation metadata work can be inhibited by structural agreements that classify staff differently. As hiring managers are already struggling against economic forces to attract people into libraries with the needed computer/data science expertise, this can require additional effort to navigate. Structures also limited Metadata Managersâ agency to provide professional development opportunities to staff with aptitude/attitude for new challenges because they fall outside narrowly defined positions.
Institutional policies requiring searches to be conducted in a specific way (e.g. external national searches) can also make it hard to elevate staff with an aptitude for leadership within the organization. In Crystalâs research and in our discussions, examples surfaced of promising leaders needing to leave their organizations to advance their careers. For other types of libraries, transitioning into a management role may come with risks due to the loss of contract protections.
Culture
In many ways, succession planning in academic libraries reflects the culture of academic institutions more broadly. In principle, these are organized around merit-based systems of advancement (i.e. tenure) that find corporate-style succession planning distasteful. In these contexts, seeking external candidates holds more value than advancing staff internally. These aspects of culture are often reified into structural policies that are difficult to change (either through practice or contractual obligations).
While there is value in adding new views and voices to an organization, this practice of preferring external hires can inhibit investments in developing staff leadership skills that are key to succession planning. This approach can also create self-fulfilling feedback loops, i.e. current leadership is reluctant to invest in leadership training for non-management staff because they will not be able to advance within the organization. This is reinforced by a fear that when staff do get this training, they are likely to find it easier to leave with their new skills to another organization. These kinds of cultural attitudes are also in operation around technical skills that create a Catch-22 for both managers and staff.
Agency
Within these kinds of structures and cultures, Metadata Managers have some opportunities to exercise their agency:
How can you embed future staffing needs into other strategic planning? Rather than focusing on the advancement of an individual (i.e. traditional succession planning), how can you have transparent conversations about how to advance as a group? In the process, you may find individuals who also want to advance their leadership/technical skills. This longer-range planning can also provide the time needed to navigate structural barriers and provide opportunities to redefine job descriptions that allow for growth with the right attitude.
As a Metadata Manager, you can cultivate a climate that supports discussions about career planning beyond immediate skills development. Even having a basic discussion with your team about planning can be a good way to start the ball rolling.
It may also be helpful to have a conversation within your organization about what it means to be successful regarding the different activities that make up succession planning. Is developing staff who leave to be successful elsewhere a win or a loss? If this is not the outcome youâre hoping for, how can you change the structural/cultural roadblocks to success?
An area that would be worth additional follow-up discussion is the relationship between diversity, equity, and inclusion (DEI) efforts in libraries and succession planning activities. This intersection was outside the scope of Crystalâs work and only briefly discussed during our sessions. On one hand, formal succession planning has been viewed as a detriment to DEI because it can reinforce systemic bias about who can advance in an organization. On the other hand, conscientious use of succession planning activities can help clear away these same obstacles. In our discussion, it was noted that the culture of external searches has been tied to DEI recruitment goals. As noted, this already creates tension when successful leaders need to change institutions to advance, potentially having a detrimental effect on the retention of diverse staff. If this is a topic that youâre currently working on in your library, please reach out about how we could facilitate a future conversation among the Metadata Managers Focus Group.
This column outlines how libraries can add value to their both their digital offerings and programming while providing local music artists with a curated, low-barrier entrance into streaming media. Library-hosted digital music collections give up-and-coming artists increased exposure and credibility to listeners and open a wealth of opportunities to engage with their communities.
This paper summarizes librarian research on information visualization as well as general trends in the broader field, highlighting the most recent trends, important journals, and which subject disciplines are most involved with information visualization. By comparing librarian research to the broader field, the paper identifies opportunities for libraries to improve their information visualization support services.
Digital heritage portal interfaces are generally similar to digital library and search engine interfaces in displaying search results as a list of brief metadata records. The knowledge organization and search result display of these systems are item-centric, with little support for identifying relationships between items. This paper proposes a knowledge graph system and visualization interface as a promising solution for digital heritage systems to support users in browsing related items, understanding the relationships between items, and synthesizing a narrative on an issue. The paper discusses design issues for the knowledge graph, graph database, and graph visualization, and offers recommendations based on the authorsâ experience in developing three knowledge graph systems for archive and digital humanities resources: the Zubir Said personal archive collection at the Nanyang Academy of Fine Arts, Singapore; Singapore Pioneers social network; and Polyglot Medicine knowledge graph of Asian traditional and herbal medicine. Lessons learned from a small user study are incorporated in the discussion.
To study library guides, as published on Springshareâs LibGuides platform, new approaches are needed to expand the scope of the research, ensure comprehensiveness of data collection, and reduce bias for content analysis. Computational methods can be utilized to conduct a nuanced and thorough evaluation that critically assesses the resources promoted in library guides. Web-based library guides are curated by librarians to provide easy access to high-quality information and resources in a variety of formats to support the research needs of their users. Recent scholarship considers library guides as valuable resources and as de facto publications, highlighting the need for critical study. In this article, the authors present a novel model for comprehensively gathering data about a specific genre of books from individual LibGuide pages and applying computational methods to explore the resultant data. Beginning with a pre-selected list of 159 books, we programmatically queried the titles using the LibGuides Community search engine. After cleaning and filtering the resultant data, we compiled a list of 20,484 book references (of which 6,212 are unique) on 1,529 LibGuide pages. By testing against inclusion and exclusion criteria to ensure relevancy, we identified a total of 281 titles relevant to our topic. To gain insights for future study, citation analysis metrics are presented to reveal patterns of frequency, co-occurrence, and bibliographic coupling of books promoted in LibGuides. This proof-of-concept could be adopted for a variety of applications, including assessment of collections, public services, critical librarianship, and other complex questions to enable a richer and more thorough understanding of the information landscape of LibGuides.
Despite ongoing efforts to improve database accessibility, aggregated database vendors concede that they do not have complete control over document accessibility. Instead, they point to the responsibility of journal publishers to deliver articles in an accessible format. This may increase the likelihood that users with disabilities will encounter articles that are not compatible with a screen reader. To better understand the extent of the problem, a document accessibility audit was conducted of randomly selected articles from EBSCOâs Library & Information Source database. Full-text articles from 12 library science journals were evaluated against two measures of screen reader compatibility: HTML format (the optimal format for screen readers) and PDF accessibility conformance. Findings showed inconsistencies in HTML format availability for articles in the selected journals. Additionally, the entire sample of PDF articles failed to meet the minimum standard of PDF Universal Accessibility of containing a tagged structure. However, all PDF articles passed accessibility permissions tests, so could be made accessible retroactively by a third party.
This paper investigates the potential of the Gamified Metaverse as a platform for promoting library services. The study compares the effectiveness of a traditional library program with a Metaverse-based library program in terms of knowledge acquisition and library anxiety. The research also examines studentsâ perceptions of implementing gamification within the context of the Gamified Metaverse platform. A mixed-methods approach was adopted, including pre- and post-test analysis, statistical analysis, and qualitative data collection. The results indicate that both the traditional and Metaverse-based library programs effectively increased the participantsâ knowledge, with no significant difference between the two approaches. However, the Metaverse-based program was found to be less effective in facilitating interaction with librarians and reducing library anxiety. Additionally, students expressed positive perceptions of implementing gamification in the Gamified Metaverse platform, finding it engaging and motivating. These findings contribute to the understanding of the effect of the Metaverse as a tool for promoting library services and enhancing knowledge acquisition. However, it is not as effective in reducing library anxiety, particularly in terms of interaction with librarians and staff. It should be noted that the platform may have limitations such as high costs and potential side effects of virtual reality, making it more suitable as an additional tool for promoting library services, taking into account its feasibility and potential benefits for specific student populations and larger libraries.
Technology in libraries has played an essential role in serving todayâs communities. This study provides an overview of the integrated library systems/software (ILSs) used in libraries in South Sulawesi, Indonesia. It aims to highlight the strengths and possibilities of ILSs and briefly explain their advantages and disadvantages along with the cost of implementation. The data was gathered from questionnaires sent via an online survey and from direct interviews with certain academic libraries over the period of 2019 to 2020. Fifty-three of 67 libraries that fulfilled the study have implemented an ILS. To deeply understand the application, a direct interview with some libraries was conducted to learn the advantages and disadvantages. The result of the study showed that the most used ILSs are SLiMS and INLISlite and other programs like Apollo, Athenium Light, Simpus, Spektra, Jibas, KOHA, and Openlibrary. The budget spent is an average of 300 USD. While the ILSs have helped these libraries improve services, IT expertise and adequate resources are needed, especially when the systems present problems. An easy-to-use system that costs less will potentially be used in this area of research. This study will be particularly helpful for any library in Indonesia. These findings may also be generalized to libraries in other countries facing economic and technological similarities.
On the one hand, all the models that are available for
download on Hugging Face seem pretty much like programming language
compilers and interpreters that we download and use to write software.
You donât try to open and read /usr/bin/python3 in your
text editor. You trust that it works. Simon Willison says
these models are like an âopaque blob that can do weird and interesting
thingsâ, and the same analogy seems to hold for the binary executables
we run too.
But the big difference is that, once you get various dependencies
assembled correctly, you can build
the Python binary. The build depends on other opaque blobs being set up,
like gcc, which in turn can be built by bootstrapping
using a lower level language. There are layers of abstraction at work
that can be tested, and reasoned about, which lead us to having some
confidence that things are working correctly. It might get complicated
but we can debug them when they donât work correctly.
This is not true of the opaque blob Large Language Model (LLM). We donât
have access to the source code that was used to create it. Compiling it
can require a huge investment in time and resources. Thereâs no way to
debug its logic. If youâre lucky there may be a paper about how it was
built, but some donât because it is deemed too dangerous.
So while it feels the same, itâs really not. I just donât understand why
people would like to integrate LLMs into applications, for generating
database queries, or API calls. It seems to me like we would want to be
able to reason about these things, and that we lose the ability to do
that when using an LLM. Why does anyone think this is a good idea?
And if this style of programming were really to catch on with a new
generation of programmers, would we lose our ability to understand SQL
or REST? Are these really useless abstractions like Assembler
that we want to forget? Wonât our ability to reason about our
applications atrophy? The state of software is already kind of bad, and
it seems like some people are dreaming up ways of making it even worse.
These customer experience basics will actually make a meaningful impact on your bottom line. AR and VR, blockchain, and 3D models wonât help you get there.
we increase the capacity of [optical data storage] to the petabit level by extending the planar recording architecture to three dimensions with hundreds of layers, meanwhile breaking the optical diffraction limit barrier of the recorded spots. We develop an optical recording medium based on a photoresist film doped with aggregation-induced emission dye, which can be optically stimulated by femtosecond laser beams. This film is highly transparent and uniform, and the aggregation-induced emission phenomenon provides the storage mechanism. It can also be inhibited by another deactivating beam, resulting in a recording spot with a super-resolution scale. This technology makes it possible to achieve exabit-level storage by stacking nanoscale disks into arrays, which is essential in big data centres with limited space.
Below the fold I discuss this technology.
What the authors mean by "petabit level" is:
The ODS has a capacity of up to 1.6 Pb for a DVD-sized disk area through the recording of 100 layers on both sides of our ultrathin single disk.
1.6 petabit is 200TB per disk, which is 2,000 times the capacity of triple-level Blu-ray media. So this is a big increase. But weirdly, the caption to their Figure 1 claims that:
The capacity of a single 3D nanoscale disk is approximately equivalent to that of a petabit-level Blu-ray library (15.2 Pb, DA-BH7010, Hualu, China) or an HDD data array (12.64 Pb, EMC PowerVault ME5084, Dell, USA).
A decade ago, Facebook's Blu-ray library put 10,000 100GB disks in a single rack for 1 Petabyte or 8 Petabit capacity. This is 5 times as much as the authors' claim for a single disk. The caption's claim of 15.2Pb for the DA-BH7010 is 9.5 times their claim of the capacity of a single disk. Note also that they compare the volume of a single disk to the volume of complete read-write systems, which is comparing apples to oranges. I guess if your meaning of "approximately" is "within an order of magnitude" that makes sense.
The recording material on the disk has three states, as shown in the schematic Figure 3a:
The transition from the second to the third state is initiated by the 515-nm femtosecond Gaussian-shaped laser beam and deactivated by the 639-nm CW doughnut-shaped laser beam.
I assume that because this transition involves polymerization it is irreversible, making the media write-once. Comparing the dark blue line (second state) with the yellow and pink lines (third state) in Figure 3c shows that the second and third states are readily distinguishable by their emissions when illuminated by >1mw 480nm.
There are a number of reasons to be less enthusiastic about the potential of this technology than Hossenfelder. It is true that they have demonstrated the ability to read and write petabit-scale data on a CD-sized medium. To do the reading they use two lasers, a 480nm pulsed and a 592nm continuous laser. To do the writing they used two lasers, a 515nm femtosecond laser and a 639nm continuous-wave laser. I haven't been able to find a price for a 515nm femtosecond laser, but here is a 1550nm femtosecond laser for $48,880. The femtosecond laser they actually (Acculasers ACL-AFS-515-CUS) used is a substantial box with fans and an AC power input.
The authors make claims of the density of the medium but not of the system. Clearly, current femtosecond lasers are too expensive and too large to use in equivalents of the decade-old Facebook Blu-Ray technology. Something like Microsoft Research's system that uses femtosecond lasers to write in Silica allows the cost of the lasers to be amortized over an entire data-center aisle of media. If you are going to build something like this, there is no reason to use the CD form factor.
The repetition rate of the femtosecond laser was 42MHz. I believe it writes one bit per pulse, so the write bandwidth is limited to around 5MB/sec, meaning that writing an entire disk would take around 10.5 10,000 hours. A system using this technology would be write-once, and have a long read latency while the robot fetched the needed disk. It would thus only be suitable for the niche archival market, and in this market the slow write rate would require many drives writing in parallel. This all makes this claim by the authors somewhat hyperbolic:
the development of next-generation industry-oriented nanoscale ODS that is much less expensive than state-of-the-art optical disk libraries and HDD data arrays will fulfil the vast data storage requirements of the big-data era.
Six years on flash has finally impacted the bulk storage market, but it isn't predicted to ship as many bits as hard disks for another four years, when it will be a 40-year-old technology. Actual demonstrations of DNA storage are only 12 years old, and similar demonstrations of silica media are 15 years old. History suggests it will be decades before these technologies impact the storage market.
Hossenfelder makes several mistakes in her report:
"new disk memory that could bring disk memory into the Petabyte range" - no, that is the Petabit range.
Optical disks "were outcompeted by hard disks". - no, write-once removable media and on-line storage are two completely different markets. Optical disks lost out to the cloud and to a lesser extent by flash.
"the information density on compact disks or any optical storage is ultimately limited by the frequency of the laser light" - well yes, but she is talking about a paper describing a 2000-times increase in capacity using laser light.
"in modern flash drives the information is stored in little magnetizable cells that are a few atoms in size" - no, flash isn't a magnetic technology. She also misses that modern flash is a volumetric not a planar technology, just like the technology in the paper.
"figured out how to write data in multiple layers" - no, Blu-ray is a multi-layer technology more than a decade old. They figured out how to write a lot more layers of much smaller bits.
"this could work up to hundreds of layers" - well, they only demonstrated 100 layers, so hundreds plural is speculation. To get to the petabyte range needs at least 500 layers or much smaller bits. Note that modern flash has over 100 layers.
âWhat we think time is, how we think it is shaped, affects how we are able to move through it.â
-Jenny Odell Saving Time, p. 270
What I love about reading Jenny Odellâs work is that I often end up with a list of about a dozen other authors I want to look into after I finish her book. She brings such diverse thinkers beautifully into conversation in her work along with her own keen insights and observations. One mention that particularly interested me in Odellâs book Saving Time (2023) was What Can a Body Do(2020) by Sara Hendren. Her book is about how the design of the world around us impacts us, particularly those of us who donât fit into the narrow band of what is considered ânormal,â and how we can build a better world that goes beyond accommodation. Her book begins with the question âWho is the built world built for?â and with a quote from Albert Camus: âBut one day the âwhyâ arises, and everything begins in that weariness tinged with amazementâ (1).
âWhyâ is such a simple world, but asking it can completely alter the way we see the world. Thereâs so much in our world that we simply take for granted or assume is the only way because some ideology (like neoliberalism) has so deeply limited the scope of our imagination. Most of what exists in our world is based on some sort of ideological bias and when we ask âwhyâ we crack the world open and allow in other possibilities. Before I read the book Invisible Women (2021) by Caroline Criado Perez, I already knew that there was a bias towards men in research and data collection as in most things, but I didnât realize the extent to which the world was designed as if men were the only people who inhabited it and how dangerous and harmful it makes the world for women. What Can a Body Do similarly begins with an exploration of the construction of ânormalâ and how design based on that imagined normal person can exclude and harm people who arenât considered normal, particularly those with disabilities. The book is a wonderful companion to Invisible Women in looking at why the world is designed the way it is and how it impacts those who it clearly was not built for. Iâll explore that more in a later essay in this series.Â
One thing I took for granted for a very long time was time itself. I thought of time in terms of clocks and calendars, not the rhythms of my body nor the seasons (unless you count the start and end of each academic term as a season). I believed that time was scarce, that we were meant to use it to do valuable things, and that anything less was a waste of our precious time. I would beat myself up when, over Spring Break, I didnât get enough practical home or scholarship projects done or if I didnât knock everything off my to-do list at the end of a work week. I would feel angry and frustrated with myself when my bodily needs got in the way of getting things done (Iâm writing this with ice on both knees due to a totally random flare of tendinitis when Iâd planned to do a major house cleaning today so Iâm really glad I donât fall into that shooting myself with the second arrow trap as much as I used to). I looked for ways to use my time more efficiently. I am embarrassed to admit that I owned a copy of David Allenâs Getting Things Done and tried a variety of different time management methods over the years that colleagues and friends recommended (though nothing ever stuck besides a boring, traditional running to-do list). Iâd often let work bleed into home time so I could wrap up a project because not finishing it would weigh on my mind. I was always dogged by the idea that I wasnât getting enough done and that I could be doing things more efficiently. It felt like there was never enough time all the time.Â
From Harold Lloydâs Safety Last (1923)
I didnât start asking questions about time until I was 40 and the first one I asked was a big one âwhat is the point of our lives?â Thinking about that opened a whole world of other questions about how we conceive of time, what kinds of time we value, to what end are we constantly trying to optimize ourselves, what is considered productive vs. unproductive time, why we often value work time over personal time (if not in word then in deed), why time often requires disembodiment, etc. The questions tumbled out of me like dominoes falling. And with each question, I could see more and more that the possibility exists to have a different, a better, relationship with time. I feel Camusâ âweariness, tinged with amazement.â
This is an introduction to a series of essays about time: how we conceive of it, how it drives our actions, perceptions, and feelings, and how we might approach time differently. Iâll be pulling ideas for alternative views of time from a few different areas, particularly queer theory, disability studies, and the slow movement. Iâm not an expert in all these areas, but Iâll be sure to point you to people more knowledgeable than me if you want to explore these ideas in more depth.
How many of you feel overloaded with work? Like youâre not getting enough done? How many of you are experiencing time poverty: where your to-do list is longer than the time you have to do your work? How many of you feel constantly distracted and/or forced to frequently task-switch in order to be seen as a good employee? How many of you feel like youâre expected to do or be expert in more than ever in your role? How many of you feel like itâs your fault when you struggle to keep up? More of us are experiencing burnout than ever before and yet we keep going down this road of time acceleration, constant growth, and continuous availability that is causing us real harm. People on the whole are not working that many more hours than they used to, but we are experiencing time poverty and time compression like never before, and that feeling bleeds into every other area of our lives. If you want to read more about how this is impacting library workers, Iâll have a few article recommendations at the end of this essay.
My exploration is driven largely by this statement from sociologist Judy Wajcmanâs (2014) excellent book Pressed for Time: âHow we use our time is fundamentally affected by the temporal parameters of work. Yet there is nothing natural or inevitable about the way we workâ (166). We have fallen into the trap of believing that the way we work now is the only way we can work. We have fallen into the trap of centering work temporality in our lives. And we help cement this as the only possible reality every time we choose to go along with temporal norms that are causing us harm. In my next essay, Iâm going to explore how time became centered around work and how problematic it is that we never have a definition of what it would look like to be doing enough. From there, Iâm going to look at alternative views of time that might open up possibilities for changing what time is centered around and seeing our time as more embodied and more interdependent. My ideas are not the be-all end-all and Iâm sure there are thinkers and theories Iâve not yet encountered that would open up even more the possibilities for new relationships with time. To that end, Iâd love to get your thoughts on these topics, your reading recommendations, and your ideas for possible alternative futures in how we conceive of and use time.Â
Works on Time in Libraries
Bossaller, Jenny, Christopher Sean Burns, and Amy VanScoy. âRe-conceiving time in reference and information services work: a qualitative secondary analysis.â Journal of Documentation 73, no. 1 (2017): 2-17.
Brons, Adena, Chloe Riley, Ean Henninger, and Crystal Yin. âPrecarity Doesnât Care: Precarious Employment as a Dysfunctional Practice in Libraries.â (2022).
Drabinski, Emily. âA kairos of the critical: Teaching critically in a time of compliance.â Communications in Information Literacy 11, no. 1 (2017): 2.
Kendrick, Kaetrena Davis. âThe public librarian low-morale experience: A qualitative study.â Partnership 15, no. 2 (2020): 1-32.
Kendrick, Kaetrena Davis and Ione T. Damasco. âLow morale in ethnic and racial minority academic librarians: An experiential study.â Library Trends 68, no. 2 (2019): 174-212.
Lennertz, Lora L. and Phillip J. Jones. âA question of time: Sociotemporality in academic libraries.â College & Research Libraries 81, no. 4 (2020): 701.
McKenzie, Pamela J., and Elisabeth Davies. âDocumenting multiple temporalities.â Journal of Documentation 78, no. 1 (2022): 38-59.
Mitchell, Carmen, Lauren Magnuson, and Holly Hampton. âPlease Scream Inside Your Heart: How a Global Pandemic Affected Burnout in an Academic Library.â Journal of Radical Librarianship 9 (2023): 159-179.
Nicholson, Karen P. âBeing in Timeâ: New Public Management, Academic Librarians, and the Temporal Labor of Pink-Collar Public Service Work.â Library Trends 68, no. 2 (2019): 130-152.
Nicholson, Karen. âOn the space/time of information literacy, higher education, and the global knowledge economy.â Journal of Critical Library and Information Studies 2, no. 1 (2019).
Nicholson, Karen P. ââTaking backâ information literacy: Time and the one-shot in the neoliberal university.â In Critical library pedagogy handbook (vol. 1), ed. Nicole Pagowsky and Kelly McElroy (Chicago: ACRL, 2016), 25-39.
Awesome Works on Time Cited Here
Hendren, Sara. What Can a Body Do?: How We Meet the Built World. Penguin, 2020.
Odell, Jenny. Saving Time: Discovering a Life Beyond Productivity Culture. Random House, 2023.
Wajcman, Judy. Pressed for time: The acceleration of life in digital capitalism. University of Chicago Press, 2020.
This yearâs Open Data Day (ODD) was a huge success. Almost 300 events registered worldwide, with 60 countries participating in 15+ different languages.
Before starting the #ODDStories 2024 series, with reports from events around the world, weâve just finalised a report with the main figures and data on the 2024 edition and we canât say it any other way: A HEARTFELT THANKS to everyone in the open data community!
Some lessons learned from this yearâs data:
Open Data Day goes far beyond the days of the event. Our community continues to promote open data beyond the official dates.
Communities and countries in the Global South have shown a great appetite for open data and a growing mobilisation for open data for development.
Our community members prioritise real interactions at face-to-face and both hyperlocal and global events.
The global open data community is growing: +55.9% events in 2024 and +9.4% members compared to last year.
Open Data Day is a truly diverse initiative in terms of gender, power, levels of knowledge and geography.
Letâs move on to 2025 with an even bigger, more diverse and impactful event!
About Open Data Day
Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities.
As a way to increase the representation of different cultures, since 2023 we offer the opportunity for organisations to host an Open Data Day event on the best date within a one-week period. In 2024, a total of 287 events happened all over the world between March 2nd-8th, in 60+ countries using 15 different languages.
All outputs are open for everyone to use and re-use.
In 2024, Open Data Day was also a part of the HOT OpenSummit â23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement
For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.
I was asked to participate in a panel at work about AI. I initially
declined, but once it became clear that I would be allowed to get on my
soapbox and rant for 15 minutes I agreed. Below are my notes and some slides.
This was not a fun post to write or present. Iâm sure it rubbed some
people the wrong way, and I am genuinely sorry for that.
Iâve done a little bit of work with AI, like downloading some models
from Hugging Face as part of named entity recognition experiments,
running Whisper on some interviews that I wanted a transcript for, testing out Googleâs
new file identification tool, and writing a bot to use the OpenAI API to
generate some fake diary
entries from some random words a friend of mine was publishing.
But as I listened to my CPU fan spinning, all this experience has really
done is reinforce some concerns I have as a software developer about the
AI industry, and the application of these technologies in libraries and
archives.
Itâs not that I donât think these tools and methods have some use in the
cultural heritage sector, but I do think we need to think carefully and
critically about them. Iâm sure you will be familiar with at least a few
of these topics, but I thought it could be useful to bring them
together, with links to learn more, and also close each one out with
some tactics for addressing them.
If you take nothing else from this presentation Iâd like it to be that
despite what the âboomersâ and âdoomersâ would like you to believe, the
ascendency of AI is not inevitable, and we have decisions to make. What
we decide to do will have a big impact on how these technologies get
deployed.
Much of this perspective is informed by my own interest in Science and
Technology Studies, which encourages an understanding of technology in
its social and historical context, and to remember that âit could be
otherwiseâ (Woolgar, 2014).
It was also informed by reading Dan McQuillanâs book Resisting
AI (highly recommended).
Despite the recent surge of interest in Large Language Models and
Generative AI tools (ChatGPT, DALL-E, etc), AI is part of long history
of computer automation, which is continuing to transform our work, and
our lives. I have tended to prefer the term Machine Learning
(ML) to AI, because it has more specificity when discussing the
recent application of statistical algorithms to increasingly large
datasets, using increasingly large computing environments. But Iâve also
come to appreciate that the term AI is actually useful for talking about
this longer trajectory of automation, stretching back to the beginnings
of modern computing. Looking at this technology as part of a very long
project, involving a shifting set of actors is important.
However the areas that Iâm going to touch on here refer mostly to recent
developments with Large Language Models, although some of them are
relevant for more specialized forms machine learning as well. There are
five points of concern, and for each area Iâll include a tactic
for addressing it in libraries and archives.
Bias
ML models are built using data. Recent advances in Deep Learning have
largely been the result of applying decades old algorithms to
increasingly large amounts of data collected from the web. The data that
is used to train these models is significant because the models
necessarily reflect the data that was used to create them.
Unfortunately corporations are increasingly tight lipped about the data
that has been used to train these models (more on that next).
Some commonly used datasets like CommonCrawl represent significantly
large collections of web data, but the web is a big place, and decisions
have gone into what
websites were collected. CommonCrawl is not representative of the
web as a whole. Furthermore LLMs encode biases that are present in
todayâs society. Blindly using and becoming dependent on LLMs risks
further encrusting these biases and participating in systemic racism.
As LLMs are used to generate more and more web content there is also a
risk that this data is again collected and used to train future models.
This process has been called Model
Collapse and has been shown to lead to a process of forgetting.
OpenAI launched a tool for identifying content generated with an LLM and
had to shut it down 6 months later because it didnât work, and its not
clear that it can even be done with reliability. What would it mean to
only train these models with pre-2023 data?
Tactic: When evaluating an AI tool always see if you can
identify what data has been used to train the model(s). How has it been
âcleanedâ or shaped? How is it updated?
Intellectual Property
Since LLMs have been built with data collected from the web this
includes many types of content, from openly licensed datasets designed
to be shared, to copyrighted books like those found in the
books1,2,3 datasets, which are rumored to have been assembled
from shadow libraries like Library Genesis and SciHub. Over the last
year weâve seen several lawsuits including from the Authors Guild
challenging OpenAIâs use of copyrighted materials in building their GPT
models.
In some ways these types of lawsuits are not new to the web. Napster was
challenged by the Recording Industry Association of American; Google
Books was sued by the Authors Guild in the mid 2000s; the Internet
Archive has been recently sued over its Open Library platform. But what
makes LLMs a bit different is the way they transform the content theyâve
collected, rather than making it available verbatim. The US Copyright
Office published a notice of
inquiry last year to gather information about the use of copyrighted
materials in AI tools, which we can expect to hear more about this year.
But this is not just an issue for blatantly pirated material.
The
New York Times is also suing because of how millions of their openly
published news stories were used by OpenAI to train their models,
without a license. OpenAI is in the midst of trying
to negotiate licensing contracts after the fact with many big
players.
The way LLMs function represents a big shift in how the web ecosystem
has evolved. Web search engines like Google crawl web pages to index
them, and provide users with search results that link back to the
original website. Similarly, social media platforms have provided a
place to discuss web content by sharing links to it, driving other users
to the web publisher.
In the LLM paradigm users never leave the ChatGPT interface, and the
original publisher is completely cut out of the virtuous circle. LLMs
are enclosing the web commons, and threaten to choke off the very
sources of content that they used. Web publishers will lose the ability
to understand how their content is being used.
Some web publishers have chosen to tell LLM bots to stop using
robots.txt. Not all the bots collecting data from the web for LLMs will
respect robots.txt files. In one experiment Ben
Welsh found that 54% of news publishers (628 out of 1156) have
decided to block OpenAI, Google AI, or CommonCrawl.
Tactic: What content should we make available to Generative AI
tools. What would our donors want?
Verifiability
One of the reasons why ChatGPT doesnât link to websites as citations is
that it doesnât know what to link to. In LLMs the neural
network doesnât record information about where a particular piece of
data came from. As LLMs get integrated into more traditional search
tools the challenge is to make
generated text verifiable in the sense that the results include
in-line citations, which should support the statement that they are used
in.
Verifiability is important for understanding when generated content is
out of alignment with the world, a so called âhallucinationâ. Itâs also
important for explaining why the model generated the response it did,
when trying to debug why some interaction went wrong. Explainability is
an active research area
in the ML/AI community, and itâs not clear that given the model size and
the size of the training data, whether the models can be made
explainable, because at a fundamental level we
donât understand why they work. Generative AI applications that
include citations have been shown to be unreliable, and
provide a false sense of security.
The lack of explainability in LLMs presents real problems for libraries
and archives whose raison dâĂȘtre is to provide users with documents,
whether they are books, maps, photographs, sound recordings, films,
letters, etc. We describe these documents, and preserve these documents,
in order to provide access to them, so that users can derive meaning
from them. If we use an LLM to generate a response to a query or prompt,
and we canât back up the response with citations to these documents,
this a problem.
This lack of verifiability is starting to be a problem for Wikipedia
too.
Tactic: Library and archives professionals have a role in
evaluating how AI tools cite documents as evidence.
Work
Part of the value proposition behind recent AI tools like GitHubâs
Copilot, ChatGPT or DALL-E is that they democratize access to
some skill whether it be writing code, authoring news stories, or
creating illustrations. But is it democratic to systematically undermine
creative workers, by stealing their content without having asked to use
it in the first place?
When you make a decision to use these tools you are potentially replacing
a personâs skill with a service. Furthermore you are binding your own
organization to the whims of a corporation which would like nothing
better than for you to divest
of your organizationâs expertise and become completely dependent on
their service. Itâs a trap.
If the past is any guide, we can also expect that skilled creative jobs
will be replaced with lower paid jobs that involve mundane cleaning
of the messes that have been made by automation. Or in the words of
screenwriter C. Robert Cargill (quoted in the previous link):
The immediate fear of AI isnât that us writers will have our work
replaced by artificially generated content. Itâs that we will be
underpaid to rewrite that trash into something we could have done better
from the start. This is what the WGA is opposing and the studios want.
LLMs like ChatGPT are built using a technique called Reinforcement
Learning with Human Feedback (RLHF). The important part here is the
human feedback. Who
is providing this feedback? Are they users of the system? What types
of systematic biases does this training introduce? Are they lower paid
âghost workersâ?
Tactic: When evaluating the use of AI tools involve the people
whose work is impacted in the decision making and
implementation.
Sustainability
Probably the most troubling
aspect to the latest wave of AI technology is their
environmental impact. Recent advances in LLMs were not achieved
through a better understanding of how neural networks work, but by using
existing algorithms with massive amounts of data and compute resources.
This training can takes months of time, and needs to be repeated to keep
models up to date.
Apparently the initial training of GPT-4 took $100 million. The training
relies on Graphical Processor Units (GPU) which are faster than CPUs for
the types of computation that LLMs demand, but require up to four times
as much energy to run. Data centers require waterto
cool, sometimes in environments where it is scarce. This isnât just
a problem for training models, itâs a bigger problem for querying
them which has been estimated to be 60-100 times more in terms of
energy utilization. Another problem lurking here is the lack of data
from data centers that provides transparency about what is going on.
Is this the really the right direction for us to be headed as we are
trying to reduce energy costs globally to limit global warming?
The tech industry is incentivized to try to make AI infrastructures more
efficient. But Jevons Paradox
will likely hold: technological progress increases the efficiency with
which a resource is used, but the falling cost of use induces increases
in demand enough that the resource use is increased.
Tactic: Libraries and archives should be looking for ways to
reduce
energy consumption not increase it.
Security and Privacy
Generative AI is a dual use technology. Experts are increasingly worried
that it will be used to create disinformation as well as fake
interactions online. Weâve had court cases where filings made by lawyers
contained citations
to cases that didnât exist. AI generated voice robo-calls
have been made illegal because of how AI tools were used to impersonate
Bidenâs voice. Bad actors can manipulate images and video to target
specific groups because the tools are more powerful and accessible.
There are possible ways to mitigate this by using trusted sources of
information and provable ways of sharing the provenance of media.
Since the mechanics of how LLMs generate content are not explainable
they are susceptible to attacks like what Simon Willison calls prompt
injection. This is where a prompt is crafted to subvert the original
design of the system to generate an intended response. This has serious
ramifications for the use of LLM technology as glue between other
automated systems. Indeed this was recently demonstrated by researchers
using OpenAI and Google APIs to execute arbitrary code, and exfiltrate
personal information.
While its not great to conflate privacy with security, Iâm running out
of time, and itâs important to note that privacy is also a problem. As
LLM APIs are deeply integrated into applications, data will flow from
one context into another. For example Docusign
and Dropbox recently announced that they were integrated OpenAI into
their products. When enabled your data will flow to OpenAI who may or
may not use it to further train their models.
Tactic: support legislation
that gives users agency over their data and practices that help ensure
authenticity and provenance.
The good news for Tether is shown in this graph, with two huge surges in "market cap" this year. One of about $15B early in the year, and another of about $6B recently. It looks like the euphoria over the prospect of spot Bitcoin ETFs has solved the Greater Fool Supply-Chain Crisis with the cryptosphere experiencing a massive inflow of around $20B actual dollars. As one might expect from injecting $20B whose only uses are to HODL or to buy cryptocurrency into the market, the result has been a massive bubble in cryptocurrency "prices".
Bitcoin has gone from about $16K at the start of the year to around $42K recently. Ethereum has merely doubled, from about $1.2K to about $2.4K.
So all is well with the world; Tether gets to keep the interest on another $20B, which at say 4% is an extra $800M/year on their bottom line, and the Bitcoin HODL-ers see their investment gamble return a 160% gain. Is all really well with the world? Follow me below the fold.
Did hordes of retail investors gamblers really send $20B in cash to Tether? No, because Tether only deals with authorized institutions, and only in amounts over $100K, Did institutions send Tether $20B in cash? It seems unlikely. At various times over its history Tether has been caught minting USDT in return for things other than cash, such as loans to mysterious Chinese companies, or even thin air.
It has never been audited, and has been described as being "practically quilted out of red flags". Matt Levine says "I feel like eventually Tether is going to be an incredibly interesting story, but I still donât know what it is."
When people talk about the leaders of Tether, a few names typically occupy the conversation. Paolo Ardoino, recently promoted from CTO to CEO of the company, is the public face of Tether on Twitter and in the media. Giancarlo Devasini, the failed plastic surgeon, then failed software pirate, now billionaire CFO (howâs that for a career trajectory?), is widely regarded as the de facto leader of the company. And people often joke about Tetherâs absentee former CEO, JL van der Velde, asking whether he even exists (he does, and he also was a serial failure before joining Tether).
...
We have been puzzled by Tetherâs leadership for some time. The massive success of the company and the complexity of its operations seem beyond the abilities of a small-time scam operator and repeat failure (Giancarlo), an inexperienced and sweaty front man (Paolo), and another failed businessman and absentee executive (JL). Notably, none of these guys had any significant prior experience in finance, and it is unlikely that any of them had the sorts of business and political ties that would be essential for keeping a controversial company like Tether afloat.
His name is Christopher Charles Sherriff Harborne, AKA Chakrit Sakunkrit. Styling himself a âdigital nomad,â Mr. Harborneâs busy hands reach across continents, industries, and political movements. The scope of Mr. Harborneâs activities and the apparent wealth backing his activities is staggering. Among those many diverse interests, it appears that Tether has become one of his most important interests. And Mr. Harborne has been far more than a passive investor in the company. Indeed, the available evidence suggests that Mr. Harborneâs involvement in Tether is far more significant than generally recognizedâŠ
Mr. Harborne has major ownership interests in other notable companies. Through a Delaware corporation he is the largest minority shareholder in QinetiQ, a major British defense contractor; this stake is worth around $200 million at present. He is also the sole owner of IFX Payments, a British fintech company specializing in moving large sums of money around the globe (hm). In total, over a dozen corporate entities spread across the globe have been linked to Mr. Harborne, with many more likely still hidden.
Note: On February 28, 2024, Mr. Harborne sued the Wall Street Journal regarding alleged defamation. Mr Harborne alleges that the Journal misrepresented his opening of a bank account at Signature Bank as attempting to assist Bitfinex during a time of crisis. The Journal removed this portion of the article several days prior to the suit being filed. Mr. Harborneâs attorneys subsequently contacted Dirty Bubble Media requesting parts of the article referring to the Journalâs story be removed; while the case is pending we have acquiesced to their request. Additionally, we have added clarification regarding Mr. Harborneâs role as a âprincipalâ at Bitfinex.
Bitcoin is over $44,000! In just the last week, the invisible hand of the market suddenly decided that bitcoins are really good now!
By complete coincidence, Tether has printed five billion USDT stablecoins in the past month out of thin air as âloansâ â backed in the Tether reserve only by the âloansâ themselves.
How high can you pump a number with five billion fake dollars to deploy?
...
In just one month, from November 5 to December 5, Tetherâs issuance climbed from 85 billion to 90 billion.
On December 25, Tether minted 1 billion of its USDT dollar-pegged stablecoin. CEO Paolo Ardoino announced on Twitter that the mint was an "authorized but not issued transaction, meaning that this amount will be used as inventory for next period issuance requests and chain swaps". This seems to be a recent trend for Tether, as similar language was used for a $1 billion mint in September.
The activity has raised more questions around where the real money backing Tether is coming from, and if it even exists at all. Some have argued that these recent Tether mints are being used to artificially inflate the price of Bitcoin, which has been on an upward trend since mid-October.
You would think, with that kind of totally genuine and organic market demand for stablecoins, USDCâs issuance would also be going up â but no. USDCâs issuance is 24.4 billion, having seen a steady decrease from 44 billion in March 2023.
So where is Tether getting all the dollars to back these tethers?
It isnât. Tetherâs printing press is not fueled by demand. This is Tether issuing loans to some of its biggest customers â printing pseudo-dollars out of thin air, with the only âbackingâ being the loan itself, counted as an asset. The loans are secured by cryptos held as collateral â not as reserves. No actual dollars flow into the system this way.
Why would crypto-bros prefer USDT to USDC if it is doubtful that it is fully backed? It might have something to do with USDT lacking "regulatory clarity". How do we know Tether is printing USDT out of thin air? Because:
Tether spent years denying that they issued tethers from thin air as loans â then Alex Mashinsky of Celsius Network confirmed in October 2021 that Celsius had been taking out such loans from Tether. It came out in the CFTC settlement later that month that they had been doing this precise thing for a while.
Tether admitted in September that it was making âsecuredâ loans again â after saying in December 2022 that it would reduce its secured loans to zero. [WSJ]
Presumably, during the "crypto winter", there would have been a need for customers to exchange USDT for USD. But did they?:
In mid-2022, after the Terra-Luna collapse, Tether bragged that it had âredeemedâ 16 billion USDT. We would assume most or all of that was loans being canceled and the tethers burned. We certainly donât know of any independently verifiable evidence that a single actual dollar was transferred in return.
For comparison, USDC reserves are held in short-term treasuries and cash in US bank accounts. A USDC appears to have an actual dollar backing it â and now that interest rates are up, Circle has been making a ton of money.
If Tether had billions of real dollars backing its tethers â as it claims â then the folks running Tether could also make a ton of money simply by putting the reserve into Treasury bills. They do not need to be making loans.
In late 2022, CZ from Binance was deeply upset that Sam Bankman-Fried from FTX might destabilize tethers by trying to cash out ⊠$250,000 worth. Thatâs out of a supposed reserve in the billions. This brings into serious question how many actual dollars are anywhere near Tether â clearly not enough.
Crypto institutions â exchanges, hedge funds â use the tethers to buy leverage and pump the price. They post their inflated crypto as collateral to borrow more USDT and keep pumping. [Dirty Bubble]
These crypto-backed loans are the fuel for the pump inflating the cryptocurrency bubble. Dirty Bubble Media concludes:
Based on the current dataset, we can estimate that Tether issued many billions of USDT backed by crypto collateral. The impact is far larger than one might assume just from looking at the loan balances. For example, Tether lent Celsius Network just over $4 billion in total. Our data indicates that other parties like Amber and 3AC similarly received billions in loans, which round-tripped their way through the crypto-conomy without ever touching the real financial system or ever being backed by real moneyâŠ.
Many questions remain unanswered:
What percentage of Tetherâs âredemptionsâ are actually loan repayments?
What is the impact of cycling billions of crypto-backed USDT through the crypto markets?
And, why did Tetherâs reported secured loans massively diverge from this data starting in May/June 2022, around the same time as Terra, Celsius Network, and Three Arrows Capital collapsed?
The US government isnât entirely happy with Tetherâs financial shenanigans. But theyâre really unhappy about sanctions violations, especially with whatâs going on now in the Middle East.
So Tether has announced that it will now be freezing OFAC-sanctioned blockchain addresses â and itâs onboarded the US Secret Service and FBI onto Tether! [Tether, archive; letter, PDF, archive]
Tether doesnât do anything voluntarily. We expect they were told that they would allow this or an extremely large hammer would come down upon them.
It looks like Tether has achieved some "regulatory clarity". It isn't just US authorities who can get Tether to freeze wallets. Patrick Tan asks What happens when Tether âfreezesâ your Tether?, and recounts the tale of The Victim, whose wallet was frozen at the request of Indian law enforcement. Tan concludes:
The Victimâs transactions are at very most 3 hops away from known bad actors, so itâs not entirely unreasonable for Indian authorities to require more information and detailed documents, not to mention the backdrop of ongoing scams abusing already stretched Indian law enforcement agencies.
For Tetherâs part, it looks as though they received a request from Indian law enforcement and followed it.
But perhaps, and somewhat more significantly, there is also the risk that Tether blacklists the USDT in your wallet, in response to government requests, regardless if those requests are lawful or not.
Itâs entirely possible for government officials or authorities with a personal vendetta, to target causes, or political opponents who receive donations or are known to transact in USDT, and for Tether to err on the side of caution and comply.
Itâs entirely possible that many âinnocentâ addresses are blacklisted in such opaque processes, collateral damage in purges.
The Victim might well be collateral damage, but it was essentially impossible to supply the "more information and detailed documents" that law enforcement required to lift the freeze. Such are the risks of operating without "regulatory clarity". But the kind of "regulatory clarity" the US has imposed on Tether isn't likely to help The Victim or others who become collateral damage, it is likely to increase their numbers. An increasing number of users unable to access their funds is a double-edged sword for Tether; the good news is that Tether gets to keep the interest on frozen funds, the good news is that more and more people figure out how risky USDT is.
The British Library suffered a major cyber attack in October 2023 that encrypted and destroyed servers, exfiltrated 600GB of data, and has had an ongoing disruption of library services after four months. Yesterday, the Library published an 18-page report on the lessons they are learning. (There are also some community annotations on the report on Hypothes.is.)
Their investigation found the attackers likely gained access through compromised credentials on a remote access server and had been monitoring the network for days prior. The attack was a typical ransomware job: get in, search for personal data and other sensitive records to copy out, and encrypt the remainder while destroying your tracks. The Library did not pay the ransom and has started the long process of recovering its systems.
The report describes in some detail how the Library recognized that its conglomeration of disparate systems over the years left them vulnerable to service outages and even cybersecurity attacks. They had started a modernization effort to address these problems, but the attack dramatically exposed these vulnerabilities and accelerated their plans to replace infrastructure and strengthen processes and procedures.
The report concludes with lessons learned for the library and other institutions to enhance cyber defenses, response capabilities, and digital modernization efforts. The library profession should be grateful to the British Library for their openness in the report, and we should take their lessons to heart.
The Attack
The report admits that some information needed to determine the attackersâ exact path is likely lost. Their best-effort estimate is that a set of compromised credentials was used on a Microsoft Terminal Services server (now called Remote Desktop Services). Multi-factor authentication (MFA, sometimes called 2FA) was used in some areas of the network, but connections to this server were not covered. The attackers tripped at least one security alarm, but the sysadmin released the hold on the account after running malware scans.
Starting in the overnight hours from Friday to Saturday, the attackers copied 600GB of data off the network. This seems to be mostly personnel files and personal files that Library staff stored on the servers. The network provider could see this traffic looking back at network flows, but it is unclear whether this tripped any alarms itself. Although their Integrated Library System (an Aleph 500 system according to Marshall Breedingâs Library Technology Guides site) was affected, the report does not make clear whether patron demographic or circulation activity was taken.
RecoveryâRebuild and Renew
Reading between the lines a little bit, it sounds like the Library had a relatively flat network with few boundaries between systems: âour historically complex network topology ⊠allowed the attackers wider access to our network than would have been possible in a more modern network design, allowing them to compromise more systems and services.â Elevated privileges on one system lead to elevated privileges on many systems, which allowed the attacker to move freely across the network. Systems are not structured like that todayânow tending to follow the model of âleast privilegesââand it seems like the Library is moving away from the flat structure towards a segmented structure.
As the report notes, recovery isnât just a matter of restoring backups to new hardware. The system canât go back to the vulnerable state it was in. It also seems like some software systems themselves are not recoverable due to age. The British Libraryâs program is one of âRebuild and Renewâ â rebuilding with fresh infrastructure and replacing older systems with modern equivalents. In the never-let-a-good-crisis-go-to-waste category, âthe substantial disruption of the attack creates an opportunity to implement a significant number of changes to policy, processes, and technology that will address structural issues in ways that would previously have been too disruptive to countenance.â
The report notes âa risk that the desire to return to âbusiness as usualâ as fast as possible will compromise the changesâ, and this point is well taken. Somewhere I read that the definition of âpersonal characterâ is the ability to see an action through after the emotion of the commitment to action has passed. The British Library was a successful institution, and it will want to return to that position of being seen as a thriving institution as quickly as possible. This will need to be a continuous process. What is cutting edge today will become legacy tomorrow. As our layers of technology get stacked higher, the bottom layers get squeezed and compressed into thin slivers that we tend to assume will always exist. We must maintain visibility in those layers and invest in their maintenance and robustness.
Backups
They also found âviable sources of backups ⊠that were unaffected by the cyber-attack and from which the Libraryâs digital and digitised collections, collection metadata and other corporate data could be recovered.â That is fortunateâeven if the older systems have to be replaced, they have the data to refill them.
They describe their new model as âa robust and resilient backup service, providing immutable and air-gapped copies, offsite copies, and hot copies of data with multiple restoration points on a 4/3/2/1 model.â Iâm familiar with the 3/2/1 strategy for backups (three copies of your data on two distinct media with one stored off-site), but I hadnât heard of the 4/3/2/1 strategy. Judging from this article from Backblaze, the additional layer accounts for a fully air-gapped or unavailable-online copy. An example is the AWS S3 âObject Lockâ service, a cloud version of Write-Once-Read-Many (WORM) storage. Although the backed-up object is online and can be read (âRead-Manyâ), there are technical controls that prevent its modification until a set period of time elapses (âWrite-Onceâ). Presumably, the time period is long enough to find and extricate anyone who has compromised the systems before the object lock expires.
Improved Processes
The lessons include the need for better network monitoring, external security expertise retention, multi-factor authentication, and intrusion response processes. The need for comprehensive multi-factor authentication is clear. (Dear reader: if you donât have a comprehensive plan to manage credentialsâincluding enforcement of MFAâthen this is an essential takeaway from this report.)
Another outcome of the recovery is better processes for refreshing hardware and software systems as they age. Digital technology is not static. (And certainly not as static as putting a printed book on a climate-controlled shelf.) It is difficult (at least for me) to envision the kind of comprehensive change management that will be required to build a culture of adaptability and resilience to reduce the risk of this happening again.
Some open questionsâŠ
I admire the British Libraryâs willingness to publish this report that describes in a frank manner their vulnerabilities, the impacts of the attack, and what they are doing to address the problems. I hope they continue to share their findings and plans with the library community. Here are some things I hope to learn:
To what extent was the patron data (demographic and circulation activity) in the integrated library system sought and copied out?
How will they prioritize, plan, and create replacement software systems that cannot be recovered or are deemed too insecure to put back on the network?
Describe in greater detail their changes to data backup plans and recovery tests. What can be taught to other cultural heritage institutions with similar data?
This is about as close to âgreen-fieldâ development as you can get in an organization with many existing commitments and requirements. What change management exercises and policies helped the staff (and public) through these changes?
Cyber security is a group effort. It would be easy to pin this chaos on the tech who removed a block on the account that may have been the beachhead for this attack. As this report shows, the organization allowed this environment to flourish, culminating in that one bit-flip that brought the organization down.
Iâve never been in that position, but I am mindful that I could someday be in a similar position looking back at what my actions or inactions allowed to happen. Iâll probably be at risk of being in that position until the day I retire and destroy my production work credentials. I hope the British Library staff and all involved in the recovery are treating themselves well. Those of us on the outside are watching and cheering them on.
We can recognize the importance of &apossoft skills&apos while acknowledging that &apossoft&apos has multiple associations that may be misleadingly at variance with the importance of those skills.
In this post, I look at some contexts which are making such skills both more visible and more important before making some general concluding observations. It is clear that such skills are core needs for the library which is increasingly networked, relational and community focused.
We can recognize the increasing importance of &apossoft skills&apos while acknowledging that &apossoft&apos has multiple associations that may be misleadingly at variance with the importance of those skills.
Soft skills, and the contributions of the (often female) library workers who demonstrate them, have often been undervalued or gone unobserved. However, the value and visibility of this work is increasingly recognised, and indeed acknowledged as central.
For example, this recognition is a major focus of the important collectionThe Social Future of Academic Libraries: new perspectives on communities, networks and engagement. It raises up the sometimes invisible relational work of library workers, and places it very much at the center of library operations and value.
What are soft skills? Examples are emotional and social intelligence, persuasion, team-working, empathy, communication, negotiation, networking, maintaining boundaries and self-care, conflict resolution, cultural awareness, advocacy, relationship-building. I have deliberately presented these as a random list to emphasize that there is no neat definition or bounded category here.
Seth Godin contrasts them with &aposvocational&apos skills and suggests calling them &aposreal&apos skills:
Vocational skills can be taught: Youâre not born knowing engineering or copywriting or even graphic design, therefore they must be something we can teach. But we let ourselves off the hook when it comes to decision-making, eager participation, dancing with fear, speaking with authority, working in teams, seeing the truth, speaking the truth, inspiring others, doing more than weâre asked, caring and being willing to change things. We underinvest in this training, fearful that these things are innate and canât be taught. Perhaps theyâre talents. And so we downplay them, calling them soft skills, making it easy for us to move on to something seemingly more urgent. // Seth Godin
The phrase &apossoft skills&apos is quite problematic in several ways, including its gendered reception. I discuss some issues below. It is tempting to pre-empt that discussion and refer to such skills as CORE skills, where CORE stands for COmmunication, Relational and Empathetic, or some-such [although see my note about CORE below]. But although it sends out the right message â that what we have come to call &apossoft&apos skills are in fact &aposcore&apos skills â it is probably too confusing.
Similarly, although I think that they might be better, I don&apost use &aposstrategic&apos or &aposrelational&apos or &apossocial&apos in place of &apossoft.&apos Even though it is clear that &apossoft&apos is hopelessly inadequate when it comes to describing some of the social/relational/strategic skills required in today&aposs library.
Four contexts
There is a variety of contexts where the importance of soft skills is increasingly highlighted. I describe four here.
1 The relational library
A frame is always useful. I find it helpful to think of an evolution from a library which is configured around the collection, to one which is configured around the civic, learning or research needs of the people it serves. The relational library is a helpful label in this context.
One could also point to David Lankes&apos succinct provocation: &aposBad libraries build collections, good libraries build services, great libraries build communities.&apos
In an academic setting, the library is interested in more directly supporting research and learning workflows, building connections with departments, instructional practice and research groups. It is also developing richer partnerships with other campus agents, which might include, for example, the center for teaching and learning, the office of sponsored research, and student success initiatives. The library, like other campus units, will be in ongoing interaction with core university functional units such as facilities or communications. What Rebecca Bryant calls social interoperability is key.
In a public library setting, the library is welcoming the community to its space with a growing variety of creative activities and events. It is partnering with social and educational services, with local charities or cultural institutions, with schools and colleges. It is reaching previously overlooked or marginalised populations, it is developing special programs for particular language groups, it is providing services for immigrants. The awareness of the role of the public library as critical social infrastructure has been elevated by the publication of Eric Klinenberg&aposs Palaces for the People.
The library provides access to the means of creative production.
This deeper community engagement requires what we have called soft skills - communication, relational skills, cultural competence. Working with partners also involves these skills, along with persuasion, trust-building and negotiation. And given that library communities may vary quite a bit, there is a premium on flexibility and adaptability.
The use of &apossoft&apos seems especially misleading in the context of the current realities of public library work. Bringing the community into the library means bringing the whole community into the library. And so, library workers have often to engage with community issues, including mental health, homelessness, food insecurity, and a range of social circumstances for which they may be not very well prepared.
2 Challenges to value and values
In an important article about library value, Eleanor Jo Rodger characterises the public library in this way.
Similarly, public libraries are society&aposs way of paying attention to learning and equity. In the United States we hold both in high esteem, so we fund public libraries with tax revenues. // WebJunction
This was written over twenty years ago. We are in a different political environment now. What happens when learning and equity are not held in high esteem, or their value is questioned? Or when the longstanding values or policies around collections are challenged?
At the same time value questions may be asked of all types of libraries given competing financial demands, mistaken impressions about the digital environment, and so on.
There are very big issues here, but in the context of this post, I want to note again the importance of advocacy, persuasion, storytelling, communication, and relationship building. In any discussion, it is important to understand the position of others, especially if one wants to persuade them of a particular course of action.
Such &apossoft&apos skills don&apost seem very soft in this context, as the librarian interacts with the mayor, the library board, the provost or the faculty committee.
Again, the recent environment in public libraries underlines this as library workers have to be prepared to talk about collection development, events or other policies in the face of organized and persistent challenges.
3 Collaboration and partnership
Collaboration is central to library operations. Libraries also scale learning, innovation and advocacy through collaborative work. And as discussed above, libraries partner with others, whether these are other units within the parent municipality or college, or are outside.
It would actually be interesting to see some research into how much time library workers spend on collaboration and partnership. My sense is that this considerable, and that it is also growing. At the same time, libraries are looking more critically at collaboration, assessing the level of investment it requires.
At the University of Washington, I have been developing a course on library collaboration and partnerships. It has seemed to me that this is a somewhat overlooked area in library education, but also in strategy and planning, given how central it is to library operations and thinking.
In working through the course, I have been struck by how much this kind of library work also depends on what we call soft skills.
Collaboration involves communication, relationship-building, negotiation, teamwork. It involves building the trust that allows candid conversations about priorities to happen. But it is much more. Within these collaborative settings, librarians mentor colleagues from other institutions, develop confidence in committee work and presentation, and advocate for their library&aposs interests. The social and political environments that collaborative working involves depend on &apossoft&apos skills to work well and in turn they allow people to develop those soft skills.
Library workers have been stretched and stressed in very challenging ways in recent years through the pandemic, social unrest, and the real impacts of the culture wars. The murder of George Floyd made libraries and library workers recognise the need to more purposefully identify and repair harm. Librarians have had to step up to additional roles and to handle difficult situations in the workplace. Organizational and hierarchical issues have been emphasised, as, for example, in the uneven need to be physically present during the pandemic. Some may now be concerned about the uncertain impact of AI on their work, and the potential dilution of social trust and confidence as synthesized communication or creation spreads. This is especially so in this year of elections. The cumulative emotional attrition has been draining, and empathy can be difficult.
This may be leading to a change in sensibilities and expectations around libraries and the roles of library workers:
This may be unevenly manifested, but underlines the need for the library to recognize the importance of equity and empathy, in terms of both value created and values embraced. We know that libraries are social organizations supporting mental wellness, social cohesion, and personal and community development. Current experience has foregrounded these roles.
It has also foregrounded the importance of empathy and understanding in the workplace, as managers and as colleagues. Empathy, transparency and communication are central.
Writing about soft skills, Emy Nelson Decker [2020] notes a claim that "many professions that rely heavily upon empathy or listening to the needs of others (i.e. soft skills) are also notorious for burn out or "compassion fatigue."
The work of Kaetrena Davis Kendrick on empathy and self-preservation is very relevant here. See for example this WebJunction webinar with a recording and related resources.
The paragraphs above draw on a fuller discussion of equity and empathy in this post, from where the quote above also comes
Discussion - the new core
There has been some research about soft skills in the library field, notably by Laura Saunders. She has explored the knowledge, skills and attributes (KSAs) reported as necessary to library work. She identified eleven KSAs deemed core, and of those seven (and possibly eight depending on categorization) were general and could be considered &apossoft skills.&apos
She makes this interesting note about importance:
This emphasis on interpersonal and communication skills seems to align with the idea of the information professions as user-centered and customer-service oriented. Partridge, Menzies et al. asserted from their findings that âpersonality traits, not just qualifications, were critical to be a successful librarian or contemporary information workerâ (2010, p. 271), and Saunders (2015) found that some focus-group participants said they would prioritize soft skills over hard skills or content knowledge when hiring. // Saunders, 2019
This article dates from 2019 based on survey work carried out in 2017. Given the contexts described above, I imagine that soft skills would be found to be even more urgently important today.
So soft skills are critical to library work in multiple ways and to positioning the library effectively in the community it serves. However &aposSoft&apos is potentially misleading or unhelpful in several ways. In fact the use of the word may actually be damaging in particular settings, or it may suggest something that is directly counter to the actual situation.
It can suggest that such skills cannot be learned or taught. However, they can be, and how they should be tackled is an intriguing question for library education. In fact, mentoring is an important skill, as is a disposition to learning.
It can suggest that such skills are less important than so-called &aposhard&apos skills. Or are somehow easier to develop. However, as noted above, and as can be easily seen elsewhere, soft skills are very important in the workplace. And they need to be learned and practiced.
It may mean that those who have good soft skills are less valued than those with putatively harder skills, that their opinions are less valued, or that they are pushed to the sidelines of decision-making.
The language may reflect or support prejudices about gender, given the association that may be made between hard and soft, respectively, and masculine and feminine. Emy Nelson Decker notes: "This is particularly worrisome in a historically feminized field as is the case with library science."
It can suggest that so-called soft and hard skills are mutually exclusive, and that one can only optimise for one. However, consider how we want to be treated in medical settings. Soft skills are important to technology workers, say, as much as to others.
A recent Journal of Engineering Education editorial argued that it may reduce emphasis on equity and inclusion. &aposUltimately, using the term "soft skills" pushes individuals that advocate for and excel at human-focused competencies to the margins of engineering. While these skills have typically involved communication and interpersonal skills, they also involve commitments to equity and inclusion."
Dissatisfaction with the term is common. I was interested to read these two comments in the context of library collaboration.
The abilities required for collaboration tend to be poorly covered in professional competency statements, scattered around under multiple headings and also inaccurately labelled in the library literature as âsoft skillsâ, indicating fundamental gaps in understanding that need to be addressed as a matter of urgency. // Sheila Corrall, Foreword
Most people think of collaboration as a soft skill and dismiss it with a shrug. Hereâs the thing: You donât collaborate to make people feel okay, because itâs expected of you, or to earn brownie points. You collaborate because on large-scale projects, you have no choice. // Valerie Horton, American Libraries Magazine
However, a quick search shows that this dissatisfaction spreads to the general business press and elsewhere.
While technical skills are vital, "soft skills" are the glue that holds people, teams, and business units together. These skills encompass a wide range of abilities, including communication, problem-solving, critical thinking, emotional intelligence, and teamwork. They are the foundation upon which leaders and their teams develop trust, cooperation, and high performance. // It&aposs about time we abandoned the term &apossoft skills&apos, Forbes
The author argues for the adoption of &aposprofessional skills&apos (which itself might be challenged by some). The piece by Seth Godin I reference above is worth reading in full in this context.
That said, changing the name is unlikely to happen quickly or universally. I think we would benefit from an alternative term which is generally understood. However, the bigger immediate issue is value, recognition and preparation.
For the library community, soft skills are core, and critical to the success of the relational, community focused library. They are essential and valuable skills for all library workers. And what we might have called soft skills are now needed in some very difficult front-line situations where the values of the library are challenged or where library workers are engaging with social issues and stressed populations.
This importance should be recognised ... and not only performatively. But recognised in reality, with focused systematic attention when it comes to review, professional development, promotion, recruitment, human resourcing plans, strategies, job boundaries, and so on.
And it should be recognized that soft skills are not only essential, but they are very hard.
CORE. After suggesting CORE skills above (to refer to Communication, Relational and Empathy), I read Emy Nelson Decker&aposs interesting article on soft skills in academic library settings [Decker 2020]. She references the use of CORE to designate Competence in Organizational and Relational Effectiveness.
Some references
Berdanier, C. G. P. (2022). A hard stop to the term âsoft skills.â Journal of Engineering Education, 111(1), 14â18. https://doi.org/10.1002/jee.20442
Corrall, S. (2024). Foreword: The Network is the Message. In S. Pavey, The Networked Librarian: The School Librarianâs Role in Fostering Connections, Collaboration and Co-creation Across the Community. Facet.
Decker, E. N. (2020). The X-factor in academic libraries: the demand for soft skills in library employees. College & Undergraduate Libraries, 27(1), 17â31. https://doi.org/10.1080/10691316.2020.1781725
Saunders, L. (2019). Core and More: Examining Foundational and Specialized Content in Library and Information Science. Journal of Education for Library and Information Science, 60(1), 3â34. https://doi.org/10.3138/jelis.60.1.2018-0034
Saunders, L., & Bajjaly, S. (2022). The Importance of Soft Skills to LIS Education. Journal of Education for Library and Information Science, 63(2), 187â215. https://doi.org/10.3138/jelis-2020-0053
Photograph: I took the picture at the University of Washington, where I am currently based in the Information School.
Acknowledgements: I am grateful to Sari Feldman, Alicia Salaz, and Sharon Streams who generously commented on a draft.
This post is part of the Library Innovation Labâs announcements in the context of Transform: Justice, celebrating the full, unqualified release of the data from the Caselaw Access Project.
When the Lexis corporation first launched legal research terminals in the 1970s it hoped to âcrack the librarian barrier,â allowing lawyers to do their own legal research from their desks instead of sending law firm librarians through paper search indexes. Today something larger is possible: we may be able to âcrack the justice barrier,â allowing people to answer a larger and larger number of legal questions for themselves. According to the Legal Services Corporation, low-income Americans do not receive any or enough legal help for 92% of their civil legal problems, so there would be a huge public benefit to making legal resources more widely available.
We want academics and nonprofits at the table in discovering the next generation of legal interfaces and helping to close the justice gap. It is not at all clear yet which legal AI tools and interfaces will work effectively for people with different levels of skill, what kind of guardrails they need, and what kind of matters they can help with. We need to try a lot of ideas and effectively compare them to each other.
Thatâs why weâre releasing a common framework for scholarly researchers to build novel interfaces and run experiments: the Open Legal AI Workbench (OLAW). In technical terms, OLAW is a simple, well-documented, and extensible framework for legal AI researchers to build services using tool-based retrieval augmented generation.
Weâre not done building this yet, but we think itâs time to share with the legal technology and open source AI communities for feedback and collaboration.
Out of the box, OLAW looks like this:
Video: OLAWâs chatbot retrieving court opinions from the CourtListener API to help answer a legal question. Information is interpreted by the AI model, which may make mistakes.
What is OLAW for?
OLAW itself is not a useful legal AI tool, and we didnât build it to be used as-is. Instead, OLAW is intended to rapidly prototype new ideas for legal tools. OLAW is an excellent platform for testing questions like:
How are legal AI tools affected by the use of different prompts, models, or finetunings?
How can legal AI tools best incorporate different data sets, such as caselaw, statutes, or secondary sources?
What kind of search indexes are best for legal AI tools (boolean, semantic search etc.)?
How can users be best instructed to use legal AI tools? What interface designs cause users at different skill levels to engage with the tool effectively and manage its limitations?
What kind of safety guardrails and output filters are most effective and informative for legal AI tools?
What kind of information about the toolâs internal processes should be exposed to users?
What kind of questions are better or worse suited for legal AI tools, and how can tools help guide users toward effective uses and away from ineffective ones?
⊠and many others. If you want to experiment with legal AI search tools, and you have a programmer who can write some basic Python, OLAW will give you all the knobs to turn when you get started.
Why is OLAW needed now?
Legal AI tooling is a wide-open design space with the potential to help a lot of people. We want to make it easier for the academic and open source communities to get involved in exploring the future of these tools.
The commercial legal research industry is undergoing the fastest period of exploration since the invention of the internet. While there has been incremental progress, the boolean search techniques still used by lawyers today would be recognizable to lawyers using LEXIS terminals in the 1970s. But now, everything is changing: commercial vendors like Westlaw, LexisNexis, and vLex all introduced novel AI-based search interfaces in the last year.
We want to support research that happens outside the legal industry as well as inside, and research that is published publicly and peer-reviewed as well as proprietary. Thatâs needed because lots of people who need legal help may never be profitable to serve; because lots of novel tools are now possible beyond the ideas any one company can explore; and because everyone will be better off if there is rigorous, public research available on what works and what doesnât.
Whatâs next?
We currently have the core concept implemented: a simple, well documented testbed using tool-based retrieval augmented generation that is easy to modify. These are some directions we would like to explore next:
Automatic benchmarking frameworks. OLAW currently requires manual testing to evaluate the impacts of design experiments. Some impacts may be testable automatically; we would like feedback on the best way to design effective benchmarks.
Additional tools. OLAW ships with just one tool, which runs searches against the CourtListener API. We would welcome additions of default tools that search other legal resources.
Structured extension points. We have a standard plugin-based approach to adding tools, but other extensions such as output filters or display methods require patches to the underlying source code. We would like help identifying other extension points that would benefit from standardized interfaces for testing.
We welcome the communityâs input on these and other areas for improvement.
How do I get involved?
OLAW is currently best suited for programmers who can host their own web software and make their own modifications. To get started, head over to our GitHub repo to get installation instructions, file issues, send pull requests, or comment in the discussion area.
Credits
Thanks to Jeremiah Milbauer and Tom Zick for their input on this effort; all mistakes are by Jack and Matteo.
Research information management systems (RIMS) are an area of growth and investment for US libraries, which OCLC Research has explored in several previous research reports. Recently the OCLC Research Library Partnership hosted a webinar where we learned about the RIMS implementations at three partner institutions through presentations from:
Jason Glenn, Program Director for Research Information Management Services, Carnegie Mellon University Libraries
Brian Mathews, Associate Dean, Research & Innovation, Carnegie Mellon University Libraries
Laura Simon, Research Support Librarian, Bernard Becker Medical Library, Washington University School of Medicine in St. Louis Missouri
Mark Zulauf, Researcher Information Systems Coordinator, University of Illinois Urbana-Champaign
Each presenter shared about the origin, history, and current status of their RIMS implementation, along with information about system scope, uses, and institutional partners. While Iâm providing a high level synthesis in this post, I encourage you to review the publicly available video recording and slides.
RIMS support multiple use cases
In the 2021 OCLC Research report, Research Information Management in the United States, we identified six discrete use cases for US RIMS systems, and the webinar presenters shared about the four use cases currently being supported at their institutions:
Public portals that feature profiles of individual researchers affiliated with the institution, to support expertise discovery and institutional reputation management
Metadata reuse through repurposing of RIMS data for dynamic updates to faculty or unit web pages and directories
Strategic reporting and decision support through reports and visualizations, often in response to queries about research collaboration and impact
Faculty activity reporting, to support annual academic progress reviews and/or tenure and promotion workflows
Public portals
The need to support reputation management and expertise discovery through a public portal was the impetus for RIMS adoption at both the University of Illinois and Washington University School of Medicine. Both institutions license the Pure system from Elsevier, and each institution has about 3,000 public faculty and researcher profiles. At Carnegie Mellon, which licenses Symplectic Elements as part of the broader Digital Science suite utilized there, about one third of 1,500 faculty profiles are now publicly available.
The expertise discovery portals support campus users in many ways at all three institutions. For instance, Laura described how Research Profiles is used at WashU to promote mentors and enhance recruitment for students, postdocs, and residents and fellows. Today Illinois Experts, which has been live since 2016, has about 40,000 visitors/month, and is used to find research collaborators, support media requests and links to research outputs, and identify reviewers for fellowship and award committees.
These public portals promote a consistent brand image, which is explicitly leveraged in Washington University School of Medicine marketing materials, particularly as a single, aggregated referral site about the schoolâs research productivity.
Metadata reuse
Colleges, departments, labs, and faculty members have long maintained their own web pages. However, by leveraging the Pure API, many Illinois units now receive dynamic updates from Illinois Experts, reducing burdensome data reentry and ensuring that information is current and synchronized with other campus pages. Similarly, the WashU Department of Medicine utilizes RSS feeds to maintain a current list of publications for each of its divisions. RSS feeds are also used to maintain publication lists for laboratory or individual web pages at Illinois.
Strategic reporting and decision support
RIMS are part of the toolkit that libraries are increasingly utilizing to support data-driven decision making. Both Illinois and Carnegie Mellon use RIMS data to support timely and accurate decision support for campus needs such as accreditation, bibliometric and research impact analysis, and grant proposal preparation. RIMS data has been leveraged at both institutions to answer questions about the breadth of research in areas such as food scarcity or AI, revealing expertise spread across many campus units. Illinois has also used RIMS to explore collaboration networks, by quantifying institutional collaborations with external industry partners and identifying units where researchers have co-authored with other researchers at minority-serving institutions.
Carnegie Mellon, in particular, is investing in this area, working to build expertise and capacity to support data visualization and reporting for campus users, similar to the type of library-based research analytics and decision support resources in place at Virginia Tech.
Faculty activity reporting (FAR)
At many (and probably most) US research institutions, RIMS facilitating public profiles are separate from platforms that support faculty activity reporting, annual performance reviews, and tenure and promotion processes. Furthermore, these faculty information system (FIS) processes are often still decentralized at the college level, although campus centralization of these workflows is trending upward, as seen at institutions like UCLA, Penn State, and Texas A&M.
Neither Illinois nor the Washington University School of Medicine are currently supporting the faculty activity reporting (FAR) use case. However, Carnegie Mellon is supporting FAR in a limited way by leveraging Elements data to develop standardized CVs for reappointment, promotion, and tenure processes for College of Fine Arts faculty.
Greater interoperability between these systems offers significant potential to reduce redundant data entry practices for faculty and staff, and I see growth as likelyâparticularly as more institutions seek to centralize FIS workflows for increased efficiency and cost savings.
Unsupported use cases
Unsurprisingly, given weaker national mandates in the United States, the presenters didnât mention leveraging the other two use cases described in the 2021 report:
Open access workflows that simplify researcher deposit processes into institutional repositories
Compliance monitoring through the tracking and reporting of information about research activities or open research, in response to external mandates.
These use cases dominate in other national environments such as the UK, Australia, Belgium, Netherlands, and Finland, and offer potentialities for future US uses as well.
Challenges
Each of the presentations made it clear that successfully implementing a RIMS is extremely challenging, including such things as:
Faculty skepticism
No mandates for unit or researcher buy-in
Churn in campus leadership, resulting in uneven support (or even awareness), which can put a RIMS program at risk of losing support
Tensions between institutional and researcher needs
Decentralization
Resource limitations
The absence of any implemented use cases to build upon (i.e., you are starting from scratch)
The necessity of building trust-based collaborative relationships with other campus units
There are also specific limitations related to the data in the RIMS:
Need to enrich with data from a broad range of internal and external sources
Limitations of scope and usefulness of local HR data
Gaps in coverage for humanities, arts, and social sciences
Data enrichment and expanding uses over time
Mark Zulaufâs slides provide a powerful visual representation of maturation of the Illinois Experts system since its launch in 2016. At that time, it primarily aggregated publications for affiliated researchers, harvested from Scopus, with the public portal as the only user of that data. Today, the aggregated dataset is much more robust and useful for facilitating campus insights, as it also includes patents, press/media reports, honors and awards, and researcher datasets ingested from multiple campus sources.
The 2023 slide also visualizes the increase in use cases and data consumers. In addition to the expertise discovery portal, the data is shared via API with campus web pages and the ORCID registry. And it is also available for use for institutional analysis and reporting. The system is now widely used across campus, providing previously unavailable insights and saving time in many ways, the result of a sustained investment by the University of Illinois Library and the Office of the Vice Chancellor for Research.
Strategies for success
The presenters described some of the strategies they applied to make forward progress with their RIMS, despite the challenges.
Build a richer dataset. Most RIMS implementations begin with metadata harvesting from external sources like Scopus, but, as Mark described, this is really just the starting place. By partnering with other campus units, the RIMS can include local data like patents and academic honors, making for a more robust view of campus research activities. Carnegie Mellon shares this vision, with a view of adding institutional facilities and equipment to their RIMS, to provide additional insights about the connections and ROI of these resources. WashU adds local membership data from on-campus centers and institutes to showcase relationships to help support buy-in.
Directly engage with campus units. To support metadata reuse in other campus systems, the Illinois Library worked with the web development teams in the colleges of Liberal Arts and Sciences and Education on API integration into their websites. This investment has played a critical role in securing campus buy-in for Illinois Experts, first with administrators and later with faculty.
Tailor solutions to unit needs. Operating in a decentralized, even federated campus environment, Carnegie Mellon has worked to identify the pain points of individual colleges and develop a plan for each unit. For the Tepper School of Business, this has meant leveraging Elements data to support accreditation reporting, while the library has provided aggregated publications for analysis to the College of Engineering.
Stay laser focused. Related to their RIMS effort that began in 2019, Laura Simon emphasized the need to stay focused on the core objective of expertise discovery (the public portal use case). For Washington University School of Medicine, a conservative approach to deliver on this goal, despite other interesting opportunities, has helped them succeed.
This is a marathon, not a sprint
A major takeaway from this webinar was that achieving success with RIMS in the US takes time. It furthermore requires focus, investment, collaboration, and commitment to the project despite the significant challenges. Libraries are achieving success as campus leaders in these implementations, by leveraging library expertise with publications metadata, scholarly communications, persistent identifiers, publications indexes, open research, bibliometrics, and much more. I hope you will take the time to watch the full webinar presentation.
Since 2023, we are meeting with more than 100 people to discuss the future of open knowledge, shaped by a diverse set of visions from artists, activists, scholars, archivists, thinkers, policymakers, data scientists, educators, and community leaders from everywhere.
The Open Knowledge Foundation team wants to identify and discuss issues sensitive to our movement and use this effort to constantly shape our actions and business strategies to deliver best what the community expects of us and our network, a pioneering organisation that has been defining the standards of the open movement for two decades.
Another goal is to include the perspectives of people of diverse backgrounds, especially those from marginalised communities, dissident identities, and whose geographic location is outside of the worldâs major financial powers.
How openness can accelerate and strengthen the struggles against the complex challenges of our time? This is the key question behind conversations like the one you can read below.
*
This week we had the opportunity to speak with Rebecca Firth, Executive Director of Humanitarian OpenStreetMap Team (HOT), an international team dedicated to humanitarian action and community development through open mapping.
Rebecca joined HOT in 2016 after working in digital and innovation consulting. She holds a Bachelorâs and Masterâs degree in Geography from the University of Cambridge, UK, where she focused on international development. Before taking on the role of Executive Director, Rebecca served as Interim Executive Director and Senior Director of Strategy & Programme. She has worked to improve HOTâs ability to provide longer-term capacity building to OpenStreetMap communities through training and micro-grants, to increase the use of OpenStreetMap by NGOs and other partners, and to spread HOTâs message globally to new volunteers and partners. Rebecca also led HOTâs application for the 2020 Audacious Project. She has lived and worked in Borneo, Japan, Colombia and Peru, focusing on public health, education, disaster risk reduction and organisational management. Rebecca is currently based in London, UK.
This conversation took place online on 27 February 2024 and was moderated by Renata Ăvila, CEO of OKFN, and Lucas Pretti, OKFNâs Communications & Advocacy Director.Â
We hope you enjoy reading it.
*
Renata Ăvila: For a lawyer like me, itâs always been very clear what the barriers to openness are â inaccessible laws, closed databases, proprietary licences and so on. Youâre a geographer by training, and Iâm curious about the geographical perspective on openness. What role does openness play in your practice and work?
Rebecca Firth: I love that, Iâve never been asked that question before. Iâm sure every geographer you talk to would have a different opinion. For me, thereâs something about geography: in theory, everyone can see it, but in practice, not everyone can. So what we do with HOT is sort of map places that might not be visible in other data sources, but they are very visible to the people who live in a particular place and are aware of the challenges that they face.
Thereâs this kind of strange intersection between place, which is obviously an intimate and local thing, and openness when itâs opened up to millions and millions of people. The very local nature of geography kind of collapses with the global appetite that we all have for open data. Because weâre not mapping anything that people donât already know. Itâs just that we have to put it somewhere where people can access it so that it can be used in the best possible way. Obviously, openness is a huge lever to achieve that.
Renata Ăvila: I find it very interesting how open mapping can become an infrastructure in places that lack it. I come from Guatemala â in places like this, sometimes you have the map, but the social layer is completely missing. You have lived in so many places. Based on this experience, please tell us a little bit about the perspective of community participation in mapping and the role of open mapping in critical moments.
Rebecca Firth: Thereâs a lot of representation and justice that happens not just in mapping a place, but in people from that place being the ones to map that place. Because often data is something thatâs used as a tool by one person against another. A good example would be indigenous land: there are a lot of people who do have data â like mining companies or resource companies â and a lot of people who donât, like local communities. Thereâs a clash there.
So the practice of community mapping is about trying to get not just the data into the hands of people who havenât traditionally had access to it, but the power to create it, update it, manipulate it, and figure out how to use it for their purposes. Thatâs only really possible through techniques that lower the barrier to entry to mapping and participating in data as much as possible. It has to happen through open source, because obviously these are communities where proprietary tools arenât going to reach the lateral scale that we hope they will.
The thing that keeps me passionate about the work that HOT is doing is the premise that data shouldnât be a cause of human suffering. All the people who are working on human suffering need access to information that is very difficult and expensive for them to get. So if we can be a part of solving that problem, thatâs amazing.
Renata Ăvila: Weâve spent many years in the open movement discussing licensing, standards, interoperability and so on. And I think there are two missing layers, two unfinished pieces. One is crowdsourced participation and the community component of that â thereâs always a blurred line between exploitative extractivism and meaningful participation and collaboration. The other is the governance structure. We would like to know more about how HOT is organised, how you work locally and globally, and how you connect with communities. Because I think the wider open movement has a lot to learn from this.
Rebecca Firth: In terms of our structure, HOT is run by a group of a few hundred voting members â these are super dedicated volunteers or participants in past projects. One of the most important things they do is elect our board of directors. Weâre very fortunate to be one of the few multi-million dollar non-profits in the world to have a board that is 100% elected by the community. The benefits of that are that the community is really at the forefront of all the major decisions of the organisation. And the board represents the community. Thatâs one of the unique things about HOT.
The voting members also coordinate and lead activities in a number of working groups, which are really great spaces where the board, the community and the staff can interact. Itâs a meeting place where the community can engage with the staff, sharing their ideas and needs in a formalised way. So the staff are there to serve the community. That is a big part of their role as staff.
Of course, as a growing organisation, there are tensions. One thing thatâs really hard about being an open community is also being an organisation. As an organisation, we have to meet deadlines for proposals, projects, funding, budgets and so on, which obviously donât work on the same timescales as communities. Also, the organisation canât grow enormously, but the community can. So our goal is not just to hire infinitely more people, but to grow the community exponentially, which is a challenge in terms of managing the different dynamics and tensions that thatâs going to create.
As an open source movement, the sky is the limit. You can have an infinite mission, but what is your ability to actually achieve it? We set a goal a few years ago to map an area where 1 billion people live, and that was going to be the most vulnerable billion people in the world, those who are either at very high risk of disaster or experiencing very high levels of multidimensional poverty.
But how do you set up your organisation to do that? No one in the world can get close to 1 billion people. So we have a very decentralised structure where most of our work is done through four regional hubs â Latin America and the Caribbean, Western North Africa, Eastern Southern Africa and Asia Pacific. Each of these hubs serves about 20 to 25 countries with a staff of about 15 people. Their aim is to develop leaders in the countries they serve who are experiencing the problems and have a deep understanding of the solutions to those problems.
I think itâs a really nice system. Iâm really proud of it, but itâs also incredibly difficult.
Lucas Pretti: Could you give us some concrete examples off the top of your head of recent open mapping projects that inspire you? I mean, what is the work that HOT is doing at the end of the day?
Rebecca Firth: A really good example of local and global coordination working really well was the response to the earthquake in Turkey and Syria in February last year, just over a year ago. This was a community led disaster response and a good example of the power of local communities really changing the way disaster response happens.Â
This response was in collaboration with and led by a Turkish open mapping community called Yer Ăizenler. They sprang into action very quickly when this event happened in collaboration with us. The affected areas were very densely populated and only partially mapped. So we did remote mapping of the affected areas in both Turkey and Syria, and we had almost 7,000 volunteers around the world who joined forces to map about 1.5 million homes and 66,000 kilometres of roads.
It was great in terms of the amount of mapping that was done, but of course mapping is pointless if itâs not used. So the key role of HOT was to make sure that we had partnerships with responding organisations, including government and local communities. I was really proud of this case because maps were used at every single stage, including search and rescue, which is often the hardest thing to get maps used for because you need them in the right hands incredibly quickly. We also had individual doctors, medics, people facilitating the delivery of medical care, people setting up infrastructure for temporary shelters, and a story of someone trying to get electricity to a tent city.
Today, OpenStreetMap is the standard expectation of any humanitarian responder in a crisis. I got a figure the other day that there have been 330,000 downloads of HOT data for humanitarian and development interventions in the last three years, so our data has been used for impact 330,000 times. Itâs really amazing the scale weâve reached, something Iâm sure the early dreamers who created HOT out of the Haiti earthquake response in 2010 would be very proud of.Â
Renata Ăvila: At OKFN weâve been working hard on standards and data interoperability in projects like Frictionless Data. This year we are developing the Open Data Editor, which will be a very simple, no-code solution for data manipulation and publishing. Since you specifically mentioned data, Iâm curious about the friction you face when working with data. When I say friction, I also mean social friction, institutional friction and so on.
Rebecca Firth: Indeed, we face many of them. On the institutional side, things are getting better. Thereâs an expectation in almost every part of our global economy that decisions are going to be based on data, and good data is going to be required, thatâs a trend thatâs happened in the world over the last 10 years. And I think thatâs helpful for us.
In terms of social friction, obviously how you map and who decides how you map is a really contentious issue and one that a lot of people would have different perspectives on. I can think of some personal experiences Iâve had with this. We once did some local mapping with a community in Peru, and we were tagging the village houses made with adobe, which is the name of the local mud bricks that the houses are made of. At some point, that got changed back on the map by someone in another country saying that the buildings should be brick or whatever. So we are standing in this village at this moment and we can see that this is made of adobe. Who has the power in this interaction? The indigenous person in the community or the person who knows how to do the mass undoing of edits? Open communities also generally reflect (and sometimes amplify) the power dynamics of the world. I think part of HOTâs role is really important to help navigate that.
On the technical side, what weâre trying to work on with our team is how to lower the barrier to entry for mapping and using maps. When I started, the tools were just incredibly difficult. They were all open, but itâs not really open if you canât figure out how to use it. One of the parts of our vision is that everyone can access and contribute to the map, and that open map data is available and used for impact.
So one of the frictions we have is making sure that the process is really open. Iâm not as well placed as most members of your community to debate the exact meaning of the word open, but for me, openness is not just about open data, itâs about open processes, itâs about open policies, itâs about making sure that everything is actually accessible and freely usable. We see so many examples of open data that is still impossible for people to use, whether itâs because itâs ridiculously large and you have to pay for cloud hosting, or because itâs a PDF that you canât really work with. Itâs a huge frustration. I wish the open community would take more seriously the definition of openness as accessibility rather than availability, because thereâs such a big difference between the two.
Renata Ăvila: Absolutely. Iâm glad you mentioned that. We recently joined the Digital Public Good Alliance (DPGA), and that brings me to the importance of standards and the horizontal effects they can have on communities. In particular, the importance of fighting to get those standards adopted by big players, big governments, big aid agencies and so on.Â
Some of the communities that intersect with both openness and maps are those working to mitigate the climate crisis. Of course, there is an element of unpredictability in natural disasters like earthquakes. Coming from Guatemala, I can tell you that you can never predict when the next big one is going to hit. But we can certainly predict other impending disasters. How can members of the open knowledge community work better with members of your communities to join our efforts in trying to solve the most pressing problems of our time, such as the climate crisis?
Rebecca Firth: I didnât think the conversation would go in that direction, and Iâm glad it did because thematically I think itâs so important for people in open communities to engage with why climate change is such a big deal. Like all NGOs, we are used to working on impact areas by categories, such as public health, disaster response, gender equality, displacement, safe migration, climate resilience and sustainability. But what is happening now is that we have climate-related disasters, which lead to displacement, which leads to disease outbreaks, which disproportionately affect women and girls, and so on. So all of these impacts are now completely overlapping and cross-cutting. We need open data highways behind and above all of these areas. Never before has there been a greater need to have a panacea of data that touches on all of these issues, rather than siloed efforts.
Climate is a very, very local thing. The way you experience climate change is going to be radically different depending on where you live. At the moment there are climate models produced by scientists and universities that show a whole country as red, amber or green, based on a rating that is simply not the experience of the people who live there. Even at the city level, there are huge differences for people who live in vulnerable housing, or at the bottom of a hill, or in places where there is no shade, and so on. Experts are looking at this problem at a global level, but the point is how to visualise it locally and add some truth to these reports.
Sometimes I have conversations with people who donât want to fund climate mitigation work because theyâre interested in funding emissions reductions. They say âWeâre not there yet, itâs 10, 20 years awayâ. And that is not true! We are working with communities that are affected by climate change right now. Itâs just that these funders donât know about it because they see a global model that turns a particular country or locality green.
I really think thereâs an important role for open communities. The thing that would help us collaborate and get there faster is a really honest commitment to a minimum-viable product. Hereâs an example. Thereâs this amazing project in Liberia with iLab Liberia where theyâre trying to map the resilience of buildings in coastal cities to flash floods. And they did it by mapping how deep the foundations are by the number of fingers. The map shows where all the buildings are with foundations that are one finger deep, two fingers deep, three fingers deep and so on. This has a huge impact on how resilient that building will be before the next flood.
Something similar happened in Tanzania, where they tried to record historical flash floods according to each residentâs memory of how far the water reached their body. I would consider that really good data.Â
Thatâs the kind of thing our communities need to work together on the most. What is the minimum data needed to solve this problem? If we can get that, weâll be fine. But if weâre arguing about data models and schemas and not everything being perfect, then weâre never going to get out of that conversation.
Lucas Pretti: I really like that. I think we are on the right path of collaboration, starting with a very minimum viable product, which is Open Data Day 2024 as part of HOT OpenSummit â23-24. Your sponsorship through mini-grants focused on open mapping activities was a game changer this year. I think our two organisations share a recent practice of moving away from centralised, self-focused events towards supporting community events. Iâd like you to talk about that. Do you think itâs a trend among global organisations that are as embedded in communities as we are?
Rebecca Firth: You are right, we used to have this thing called the HOT Summit, which was a wonderful event, but it was a conference for us, limited to 200 people who could attend. I think it basically did not do what it was supposed to do. So, thanks to the community working group and the community staff at HOT, we took a completely different approach and asked ourselves, where are people talking about open mapping and open data and how can we support them to do that better?
So they came up with the idea of the OpenSummit. The idea is that HOT can support a range of different global events, from conferences to workshops to just hosting a session at another big event, and so on. It really opens up who can participate. Last year we supported 13 events. There were 113 sessions on open mapping attended by 300 people. And we also had 122 scholarships for community mappers to go to those events. So itâs been amazing in terms of really opening us up and getting out of ourselves.Â
I think it has a similar ethos to Open Data Day. I mean, we both want people to build partnerships and networks and collaborations to do their own thing. What weâve learned from this is yet another example of how sticky the community is. Leaders are nurtured through these events because itâs the relationships they spark that keep them going. The more events we can support, the better chance we have of finding people who want to lead this mapping work in their countries.
Renata Ăvila: Beyond Open Data Day, the same thing is happening with the Open Knowledge Network and our Global Directory. It is almost like having different red phones everywhere that you can just call when something is happening in a country. Exponential change happens when local community members take action and share their knowledge to help the wider community.
Rebecca Firth: Building on that, I think one mistake weâve made in the past is communicating only in terms of big numbers: weâre going to cover an area of âone billion peopleâ, we need âone million volunteersâ, weâre going to work in â94 countriesâ. These huge top lines are obviously important when youâre defining a key overarching mission for an organisation, and we are doing them. But, the reality is often that the majority of local mapping is done by less than 10 people.Â
So practically, in the work we do each day, we donât need to bring 3,000 people to a conference about HOT. We need to turn those three local mappers into six local mappers, and that will lead us to double the amount of local data available. And that will get us to one billion! One thing Iâve learned from the community working groups is that mass global campaigns may reach a lot of people, but not in a way that deeply nourishes a local community, and thatâs what we need to focus on.
The value that can be extracted from data decays rapidly with time.
Thus companies would rather invest in current than archival data.
Thus archival media and systems are a niche market.
Thus archival media and systems lack the manufacturing volume to drive down prices.
Thus although quasi-immortal media have low opex (running cost), they have high capex (purchase cost).
Especially now interest rates are non-zero, the high capex makes the net present value of their lifetime cost high.
Archival media compete with legacy generic media, which have mass-market volumes and have already amortized their R&D, so have low capex but higher opex through their shorter service lives.
Because they have much higher volumes and thus much more R&D, generic media have much higher Kryder rates, meaning that although they need to be replaced over time, each new unit at approximately equal cost replaces several old units, reducing opex.
Especially now interest rates are non-zero, the net present value of the lower capex but higher opex is likely very competitive.
Below the fold I look into why, despite this, Microsoft has been pouring money into archival system R&D for about a decade.
Background
Eleven years ago Facebook announced they were building entire data centers for cold storage. They expected the major reason for reads accessing this data would be subpoenas. Eighteen months later I was finally able to report on Kestutis Patiejunas' talk explaining the technology and why it made sense in More on Facebook's "Cold Storage". They used two different technologies, the first exploiting legacy generic media in the form of mostly-powered-down hard drives. Facebook's design for this was:
aimed at limiting the worst-case power draw. It exploits the fact that this storage is at the bottom of the storage hierarchy and can tolerate significant access latency. Disks are assigned to groups in equal numbers. One group of disks is spun up at a time in rotation, so the worst-case access latency is the time needed to cycle through all the disk groups. But the worst-case power draw is only that for a single group of disks and enough compute to handle a single group.
Why is this important? Because of the synergistic effects knowing the maximum power draw enables. The power supplies can be much smaller, and because the access time is not critical, need not be duplicated. Because Facebook builds entire data centers for cold storage, the data center needs much less power and cooling. It can be more like cheap warehouse space than expensive data center space. Aggregating these synergistic cost savings at data center scale leads to really significant savings.
Patiejunas figured out that, only at cloud scale, the economics of archival storage could be made to work by reducing the non-media, non-system costs. This insight led to the second technology, robots full of long-lived optical media:
It has 12 Blu-ray drives for an entire rack of cartridges holding 10,000 100TB Blu-ray disks managed by a robot. When the robot loads a group of 12 fresh Blu-ray disks into the drives, the appropriate amount of data to fill them is read from the currently active hard disk group and written to them. This scheduling of the writes allows for effective use of the limited write capacity of the Blu-ray drives. If the data are ever read, a specific group has to be loaded into the drives, interrupting the flow of writes, but this is a rare occurrence. Once all 10,000 disks in a rack have been written, the disks will be loaded for reads infrequently. Most of the time the entire Petabyte rack will sit there idle.
In theory Blu-ray disks have a 50-year life, but this is irrelevant:
No-one expects the racks to sit in the data center for 50 years, at some point before then they will be obsoleted by some unknown new, much denser and more power-efficient cold storage medium
Every few months there is another press release announcing that some new, quasi-immortal medium such as 5D quartz or stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate could easily make disks with archival life, but they did a study of the market for them, and discovered that no-one would pay the relatively small additional cost. The drives currently marketed for "archival" use have a shorter warranty and a shorter MTBF than the enterprise drives, so they're not expected to have long service lives.
The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center racks or even at Iron Mountain isn't free, this is a powerful incentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store 30 times as much data in the same space.
The key parameter of these archival storage systems isn't the media life, it is the write bandwidth needed to keep up with the massive flow of data to be archived:
While a group of disks is spun up, any reads queued up for that group are performed. But almost all the I/O operations to this design are writes. Writes are erasure-coded, and the shards all written to different disks in the same group. In this way, while a group is spun up, all disks in the group are writing simultaneously providing huge write bandwidth. When the group is spun down, the disks in the next group take over, and the high write bandwidth is only briefly interrupted.
The lesson Facebook taught a decade ago was that the keys to cost-effective archival storage were first, massive scale, and second, high write bandwidth. Microsoft has learned the lesson and has been working to develop cloud-scale, high write bandwidth systems using two quasi-immortal media, DNA and silica.
This single-channel device, which occupied a tabletop, had a throughput of 5 bytes over approximately 21 hours, with all but 40 minutes of that time consumed in writing âHELLOâ into the DNA.
Rob Carlson's The Quest for a DNA Data Drive provides a useful overview of the current state of the art. Alas he starts by using one of my pet hates, the graph showing an immense gap between the "requirements" for data storage and the production of storage media. Carlson captions the graph:
Prior projections for data storage requirements estimated a global need for about 12 million petabytes of capacity by 2030. The research firm Gartner recently issued new projections, raising that estimate by 20 million petabytes. The world is not on track to produce enough of todayâs storage technologies to fill that gap.
<rant> Carlson's point is to suggest that there is a huge market for DNA storage. But this ignores economics. There will always be a "requirement" to store more data than the production of storage media, because some data is not valuable enough to justify the cost of storing it. The "gap" could only be filled if media were free. Customers will buy the storage systems they can afford and prioritize the data in them according to the value that can be extracted from it. </rant>.
The size of the market for DNA storage systems depends upon their cost. In 2018's DNA's Niche in the Storage Market I imagined myself as the marketing person for a DNA storage system and posed these challenges:
Engineers, your challenge is to increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Finance team, your challenge is to persuade the company to spend $24M a year for the next 10 years for a product that can then earn about $216M a year for 10 years.
For a DNA drive to compete with todayâs archival tape drives, it must be able to write about 2 gigabits per second, which at demonstrated DNA data storage densities is about 2 billion bases per second. To put that in context, I estimate that the total global market for synthetic DNA today is no more than about 10 terabases per year, which is the equivalent of about 300,000 bases per second over a year. The entire DNA synthesis industry would need to grow by approximately 4 orders of magnitude just to compete with a single tape drive.
One of our goals was to build a semiconductor chip to enable high-density, high-throughput DNA synthesis. That chip, which we completed in 2021, demonstrated that it is possible to digitally control electrochemical processes in millions of 650-nanometer-diameter wells.
The main problem is that it employs a volatile, corrosive, and toxic organic solvent (acetonitrile), which no engineer wants anywhere near the electronics of a working data center.
Moreover, based on a sustainability analysis of a theoretical DNA data center performed my colleagues at Microsoft, I conclude that the volume of acetonitrile required for just one large data center, never mind many large data centers, would become logistically and economically prohibitive.
Although it is the industry standard, this isn't the only way to write DNA:
Fortunately, there is a different emerging technology for constructing DNA that does not require such solvents, but instead uses a benign salt solution. Companies like DNA Script and Molecular Assemblies are commercializing automated systems that use enzymes to synthesize DNA. These techniques are replacing traditional chemical DNA synthesis for some applications in the biotechnology industry. The current generation of systems use either simple plumbing or light to control synthesis reactions. But itâs difficult to envision how they can be scaled to achieve a high enough throughput to enable a DNA data-storage device operating at even a fraction of 2 gigabases per second.
The University of Washington and Microsoft team, collaborating with the enzymatic synthesis company Ansa Biotechnologies, recently took the first step toward this device. Using our high-density chip, we successfully demonstrated electrochemical control of single-base enzymatic additions.
Novel enzymatic methods are poised to become the dominant processes for de novo synthesis of DNA, promising functional, economic, and environmental advantages over the longstanding approach of phosphoramidite synthesis. Before this can occur, however, enzymatic synthesis methods must be parallelized to enable production of multiple DNA sequences simultaneously. As a means to this parallelization, we report a polymerase-nucleotide conjugate that is cleaved using electrochemical oxidation on a microelectrode array. The developed conjugate maintains polymerase activity toward surface-bound substrates with single-base control and detaches from the surface at mild oxidative voltages, leaving an extendable oligonucleotide behind. Our approach readies the way for enzymatic DNA synthesis on the scale necessary for DNA-intensive applications such as DNA data storage or gene synthesis.
This has, as the article points out, potential for dramatically reducing the current cost of writing DNA, but it is still many orders of magnitude away from being competitive with a tape drive. The good Dr. Pangloss can continue to enjoy the optimism for many more years.
Silica
The idea of writing data into fused silica with a femtosecond laser is at least a decade and a half old
In 2009, Hitachi focused on fused silica known for its excellent heat and water resistance as a recording medium for long-term digital storage. After proposing the use of computed tomography to read data recorded with a femtosecond pulse laser, fused silica glass was confirmed as an effective storage medium. Applying this technology, it is possible to achieve multi-layer recording by changing the laserâs focal point to form microscopic regions (dots) with differing refractive indices. In 2012, a method was developed with Kyoto University to read the recorded dots in 4 layers using an optical microscope (recording density equivalent to a CD), and in 2013, 26 layer recording was achieved (recording density equivalent to a DVD). In order to increase recording density for practical applications, one means is to increase the number of recording layers. At the 100-layer level of recording density equivalent to a Blu-ray disc however issues arose in dot quality degradation and read errors resulting from crosstalk of data recorded in other layers.
But in the last few years Microsoft Research has taken this idea and run with it, as they report in a 68-author paper at SOSP entitled Project Silica: Towards Sustainable Cloud Archival Storage in Glass. It is a fascinating paper that should be read by anyone interest in archival storage.
Like me, the authors are skeptical of the near-term prospects for DNA storage:
Technologies like DNA storage offer the promise of an extremely dense media for long-term data storage. However, the high costs and low throughputs of oligonucleotide synthesis and sequencing continue to hamper the technologyâs feasibility. The total amount of data demonstrably stored in DNA remains on the order of MBs, and building a functional storage system that can offer reasonable SLAs underpinned by DNA is a substantial challenge. Alternative DNA-based technologies like dNAM attempt to bypass costly sequencing and synthesis steps, sacrificing density down to densities comparable with magnetic tape.
Hence their focus on a medium with, in theory, a somewhat lower volumetric density. From their abstract:
This paper presents Silica: the first cloud storage system for archival data underpinned by quartz glass, an extremely resilient media that allows data to be left in situ indefinitely. The hardware and software of Silica have been co-designed and co-optimized from the media up to the service level with sustainability as a primary objective. The design follows a cloud-first, data-driven methodology underpinned by principles derived from analyzing the archival workload of a large public cloud service.
Their analysis of the workload of a tape-based cloud archival storage system in Section 2 shows that:
on average for every MB read there are 47 MBs written, and for every read operation there are 174 writes. We can see some variation across months, but writes always dominate by over an order of magnitude.
...
Small files dominate the workload, with 58.7% of the reads for files of 4 MiB or smaller. However, these reads only contribute 1.2% of the volume of data read. Files larger than 256 MiB comprise around 85% of bytes read but less than 2% of total read requests. Additionally, there is a long tail of request sizes: there is ⌠10 orders of magnitude between the smallest and largest requested file sizes.
...
We observe a variability in the workload within data centers, with up to 7 orders of magnitude difference between the median and the tail, as well as large variability across different data centers.
...
At the granularity of a day, the peak daily [ingress] rate is âŒ16x higher than the mean daily rate. As the aggregation time increases beyond 30 days, the peak over mean ratio decreases significantly down to only âŒ2, indicating that the average write rate is similar across different 30-day windows.
...
To summarize, as expected for archival storage, the workload is heavily write-dominated. However, unexpectedly, the IO operations are dominated by small file accesses.
Subramanian Muralidhar and a team from Facebook, USC and Princeton had a paper at the 2014 OSDI that described Facebook's warm layer above the two cold storage layers and below Haystack, the hot storage layer. Section 3 of f4: Facebook's Warm BLOB Storage System provides workload data for this layer, which filters out most of the IOPS before they hit the archival layers. I explained in Facebook's Warm Storage that:
A BLOB is a Binary Large OBject. Each type of BLOB contains a single type of immutable binary content, such as photos, videos, documents, etc.
...
Figure 3 shows that the rate of I/O requests to BLOBs drops rapidly through time. The rates for different types of BLOB drop differently, but all 9 types have dropped by 2 orders of magnitude within 8 months, and all but 1 (profile photos) have dropped by an order of magnitude within the first week.
The vast majority of Facebook's BLOBs are warm, as shown in Figure 5 - notice the scale goes from 80-100%. Thus the vast majority of the BLOBs generate I/O rates at least 2 orders of magnitude less than recently generated BLOBs.
Note these important differences between Microsoft's and Facebook's storage hierarchies:
Microsoft stores generic data and depends upon user action to migrate it down the hierarchy to the archive layer, whereas Facebook stores 9 specific types of application data which is migrated automatically based upon detailed knowledge of the workload for each of the types.
Because Facebook can migrate data automatically, it can interpose a warm layer above the archive layer of the hierarchy, and because it has detailed knowledge about the behavior of each of the data types it can make good decision about when to move each type down the hierarchy.
Because the warm layer responds to the vast majority of the read requests and schedules the downward migrations, Facebook's archive layer's IOPS are a steady flow of large writes with very few reads, making efficient use of the hardware.
Contrast Facebook's scheduled ingest flow with the bursty ingest rate shown in Figure 2 of the Silica paper, which finds that:
At the granularity of a day, the peak daily rate is âŒ16x higher than the mean daily rate.
Another interesting aspect of the Silica design is that the technologies, and thus the hardware, used for writing and reading are completely different. The authors point out the implications for the design:
As different technologies are used to read and write, after a platter is written it must be fully read using the same technology that will be used to read it subsequently. This happens before a platter is stored in the library and any staged write data is deleted.
This design has an interesting consequence: during the period when user data is being written into the library, the workload is going to become read-dominated. Every byte written must be read to be verified, in addition to the user reads. The read bandwidth has to be provisioned for peak user read rate, however the read workloads are very bursty, so read drive utilization is extremely low on average. Thus, the verification workload simply utilizes what would otherwise be idle read drives.
Using separate write and read drives has two advantages:
This allows independent scaling of read and write throughput. Additionally, this design allows us to create the first storage system that offers true air-gap-by-design: the robotics are unable to insert a glass platter into a write drive once the glass media has been written.
Since the majority of reads are for verification, the design needs to make specific provision for user reads:
To enable high drive efficiency, two platters can be mounted simultaneously in a read drive: one undergoing verification, and one servicing a customer read. Customer traffic is prioritized over verification, with the read drive switching away when a platter is mounted for a customer read. As soon as the customer platter stops being accessed, the read drive has the ability to quickly switch to the other platter and continue verification. This is similar to avoiding head-of-line blocking of mice flows by elephant flows in networked systems.
Like Facebook's, the prototype Silica systems are data-center size:
A Silica library is a sequence of contiguous write, read, and storage racks interconnected by a platter delivery system. Along all racks there are parallel horizontal rails that span the entire library. We refer to a side of the library (spanning all racks) as a panel. A set of free roaming robots called shuttles are used to move platters between locations.
...
A read rack contains multiple read drives. Each read drive is independent and has slots into which platters are inserted and removed. The number of shuttles active on a panel is limited to twice the number of read drives in the panel. The write drive is full-rack-sized and writes multiple platters concurrently.
Their performance evaluation focuses on the ability to respond to read requests within 15 hours. Their cost evaluation, like Facebook's, focuses on the savings from using warehouse-type space to house the equipment, although is isn't clear that they have actually done so. The rest of their cost evaluation is somewhat hand-wavy, as is natural for a system that isn't yet in production:
The Silica read drives use polarization microscopy, which is a commoditized technique widely used in many applications and is low-cost. Currently, system cost in Silica is dominated by the write drives, as they use femtosecond lasers which are currently expensive and used in niche applications. This highlights the importance of resource proportionality in the system, as write drive utilization needs to be maximized in order to minimize costs. As the Silica technology proliferates, it will drive up the demand for femtosecond lasers, commoditizing the technology.
I'm skeptical of this last point. Archival systems are a niche in the IT market, and one on which companies are loath to spend money, The only customers for systems like Silica are the large cloud providers, who will be reluctant to commit their archives to technology owned by a competitor. Unless a mass-market application for femtosecond lasers emerges, the scope for cost reduction is limited.
Six years on flash has finally impacted the bulk storage market, but it isn't predicted to ship as many bits as hard disks for another four years, when it will be a 40-year-old technology. Actual demonstrations of DNA storage are only 12 years old, and similar demonstrations of silica media are 15 years old. History suggests it will be decades before these technologies impact the storage market.
Although February is designated in the United States as âBlack History Monthâ these resources provide content of continuing interest and value. To borrow playwright Tony Kushnerâs phrase, I am âan intensely secular Jewâ who has had a lifelong interest in and fascination with Jewish kinship to other groups. The TPS album brings together resources on topics ranging from entertainer Sammy Davis Jr. to Black Jewish rabbis, cantors, and congregations. Contributed by Jay Weitz.
ALA CORE Interest Group week sessions on DEI
ALA CORE Interest Group week takes place the first week in March and features 30 different programs that are free for anyone to attend. The following IG sessions are particularly interesting to those working on DEI efforts:
5 March 2024 2:00 p.m. CST: Conducting a Pilot for Library of Congress Demographic Group Terms Elizabeth Hobart, Interim Head of Cataloging and Metadata Services, Penn State (OCLC Symbol: UPM) â Register
8 March 2024 â 10:00 a.m. CST: Homosaurus Usage in the OCLC Database: an Exploratory Analysis â Paromita Biswas (Continuing Resources Metadata Librarian), Amanda Mack (Cataloger in the Film & Television Archive), and Erica Zhang (Metadata Librarian for Open Access), UCLA  (OCLC Symbols: CLU & UCFTA) â Register
8 March 2024 â 3:00 p.m. CST Diversity Audits and the Role of Technical Services Staffâ, presented by Rachel Fischer, Member Services Librarian for Technical Services, Cooperative Computer Services (CCS) (OCLC Symbol: JED) â Register
IG Week provides a good resource for everyone working in cataloging & metadata areas. In addition to the DEI-focused presentations, there will be lots of good learning opportunities around next-generation metadata, workflows, and professional development. Contributed by Richard J. Urban.
Applying toponymic justice to library spaces
Authors Natalia FernĂĄndez, Jane Nichols, and Diana Park explain toponymic justice was enacted in the renaming of a library classroom at the Oregon State University (OCLC Symbol: ORE)âs Valley Library. In the article, âEngaging in Toponymic Justice: Proactively Naming the Nishihara Family Classroomâ (posted 7 February 2024 in In the Library with the Lead Pipe), they characterize the renaming of the OSU library classroom as âproactive naming,â that results in a name reflecting values, an inspiring person or other meaningful name, regardless of funding. (The classroom was temporarily named â2nd Floor West Classroom.â) The name âNishihara Family Classroom,â primarily after Janet Nishihara, Director of OSUâs Education Opportunities Program, but including âfamilyâ for Nishihara siblings who had been student workers at the OSU Libraries. The classroomâs door contains text that gives context to the classroomâs name within the library space: âThis room is named in honor of the Nishihara Family for the dedication to student learning and success.â
The authors place this OSU example in the context of the trend for many institutions to reevaluate their naming policies and change the names of existing spaces named after controversial people. However, their article is one part of a much larger and complex conversation about toponymic justice. The library classroom had a temporary generic name in 2019 when it opened. The renaming of the Louisiana State University (OCLC Symbol: LUU) main library building in 2020 from âMiddleton Libraryâ to âLSU Libraryâ (a temporary name) is a more complex case. The library was named for Middleton in 1978 because of his accomplishment of having the new library built while he was university president. However, Middletonâs pro-segregationist views made keeping that name untenable. LSUâs decision to temporarily rename the library âLSU Libraryâ may have been an imperfect solution, but it reflects the reality that removing a controversial name may be easier than providing a meaningful one. Contributed by Kate James.
A new year always offers an opportunity to reflect on the past and plan intentionally for the year to come. Lately, my mind has been on the art of gathering, thanks to this inspiring book by Priya Parker.
In looking forward and back, I recognize the value of connecting both in-person and virtually. Whether we gather in the same physical space or online, I always appreciate the unstructured moments where we check in with one another, sharing our current perspectives and pathways. Having the opportunities to learn from each other and experiencing the shared desire to care and connect creates positive energy.
The RLP team works to center the human connection in our programming and bring that flow of energy to our partner network.
Metadata Managers: new vision and energy
The power of community and connection was a theme in our January kickoff meetings of the Metadata Managers Focus Group. Senior Program Officer Richard Urban has actively stewarded this group, and its Planning Group, which is tasked with re-envisioning focus and activities, honoring the expertise and time commitment of all those involved. Weâre excited to welcome these new members:
Liz Bodian, Metadata Technologies Librarian at Brandeis University
Susan Dahl, Director, Content Services at University of Calgary
Chingmy Lam, Manager, Metadata Services at University of Sydney Library
Chloe Misorski, Cataloging Librarian at Cleveland Museum of Art
Helen Williams, Metadata Manager at LSE Library
Look for summaries coming soon from our January meetings, as well as announcements about upcoming focus group meetings, and related programming. See the most recent activities and full list of the Metadata Managers Planning Group.
SHARES delivers value
The SHARES Executive Group is another dynamic group in the OCLC RLP, lending expertise and strategic vision to our efforts. We are pleased to welcome four new SHARES Executive Group members:
Marilyn Creswell, University of Michigan Law School
Vicky Flood, University of Manchester
Sylvie Larsen, University of Pennsylvania
Kerry Kristine McElrone, Swarthmore College
Senior Program Officer Dennis Massie continues to energize the SHARES community with his weekly town halls (190 and counting). Recently, he engaged specifically with SHARES institutions in the UK and Ireland to gain insights into collection sharing challenges there and to discuss patterns, trends, and opportunities indicated by analysis of their FY23 ILL activity. The numbers demonstrate that each of these UK and Irish institutions draws significant value from SHARES participation, especially borrowing physical items; the numbers also reveal there are multiple viable approaches to utilizing SHARES for maximum benefit depending upon your situation.Â
Itâs all about the data
In each of our areas of focus, we see the growing need to use data to better understand impact, recognize trends, and make better decisions. Earlier this month, we hosted a facilitated discussion on data-driven decision making in libraries, exploring how insights can support research information management, collection management, and the libraryâs value proposition to institutional stakeholders. This session was part of the OCLC and LIBER joint series, âBuilding for the future: Opportunities and responsibilities for state-of-the-art services.â
Our next facilitated discussion session will take place Wednesday, 17 April 2024, focusing on AI, machine learning, and data science, and you can join us.
RLP Leadership Roundtables
Iâve mentioned our exemplary Metadata Managers and SHARES groups, which offer a welcoming venue for connection and deep participation and the opportunity to influence RLP and OCLC Research programming. Iâm excited to share that weâre launching new leadership roundtable discussions that will address challenges in the key areas of research support and special collections, particularly as they relate to the large ecosystem of changes we face today: staffing demographics, impact assessment, and collaboration opportunities.
The RLP draws from a unique, international mix of independent, academic, national, and museum libraries, and we know there is value in bringing our partners together to learn from each other. Senior Program Officers Rebecca Bryant (research support) and Chela Scott Weber (archives and special collections) moderate the discussions.
Benefits of participation
Itâs a great privilege to meet with peers. Whether an event is virtual or in person, some of the most compelling moments occur during the more social, less structured times where we can reconnect individually with peers and colleagues and see the world from their unique perspectives. Learning about the wide variety of local challenges and opportunities that you face is essential for us to synthesize the common themes and recognize âscalable momentsâ where we can devote energy and attention to shed light on the current landscape.
Our leadership networks, workshops, and other virtual convenings that center person-to-person interaction and leverage our peer network are what make RLP programs so vibrant and enriching. In our busy world, it can be hard to extend beyond a familiar professional peer group. Let RLP help ease the process of expanding your organizational networks through a variety of programming and interest groups, each with different levels of time and attention investment.
Looking forward
The talented RLP team of program officers, along with the broader OCLC Research team, are eager to extend our work, hosting conversationsâand gatheringâlearning as we go and growing as a community.
Thank you for your time and attention, but most of all, your continued support.
We love seeing case studies, posts and articles from libraries and library staff celebrating their wins, whether big and small. But we also learn a lot from libraries sharing their experiences when things donât necessarily go to plan. And weâve learnt from publishing more than 230 issues of our library newsletter that library professionals also [...]
Hello DLF community â itâs March! Last month we had the pleasure of opening and receiving a lot of proposals for our in-person DLF Forum happening in Michigan in July. Keep your eye out for community voting opening early next week, where you can see submitted proposals and vote for your favorites. DLF working groups were also busy last month and made some great plans for the future. Make sure to stay up to date on group meetings by reviewing the DLF Community Calendar.
-Team DLF
This monthâs news:
Applications Open: CLIR invites applications from collecting organizations for the digital reformatting of audio and audiovisual materials for the Recordings at Risk grant program now through April 17, 2024.
Learn IIIF Basics: An upcoming 5-day workshop (March 18-22) will teach attendees the basics about the International Image Interoperability Framework (IIIF). No prior knowledge of IIIF is required. Learn more about the workshop and register.
Register: Registration is now open for the IIIF Annual Conference, to be held in Los Angeles, June 4-7.Â
Register:Registration is now open for the International Internet Preservation Consortiumâs (IIPC) General Assembly and Web Archiving Conference in Paris, April 24-26.
This monthâs open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Canât find meeting call-in information? Email us at info@diglib.org. Reminder: Team DLF working days are Monday through Thursday.
Born-Digital Access Working Group (BDAWG): Tuesday, March 5, 2pm ET / 11am PT.
Digital Accessibility Working Group (DAWG): Wednesday, March 6, 2pm ET / 11am PT.Â
Assessment Interest Group (AIG) Cultural Assessment Working Group (CAWG): Monday, March 11, 2pm ET/11am PT.
AIG Cost Assessment Working Group: Monday, March 11, 3pm ET/12pm PT.
AIG Metadata Assessment Working Group: Thursday, March 14, 1:15pm ET / 10:15am PT.
AIG User Experience Working Group: Friday, March 15, 11am ET / 8am PT.Â
Committee for Equity and Inclusion (CEI): Monday, March 25, 3pm ET / 12pm PT.Â
Climate Justice Working Group: Wednesday, March 27, 12pm ET / 9am PT.Â
AIG Metadata Working Group: Thursday, March 28, 1:15pm ET / 10:15am PT.
DAWG Policy & Workflows subgroup: Friday, March 29, 1pm ET / 11am PT.Â
How do you make the invisible visible? This is the central premise of The Cloud, a project that Iâve been working on as a technologist in residence at LIL. The idea originated as a visual joke about the cloud: the vaporous metaphor we use to describe the distributed servers that host remotely-run software and infrastructure.
Initially, I was curious about the effects of the applications and services LIL runs (see this post for a more detailed look at the cost of Perma.cc), and thought it would be interesting to visualize when users were interacting with one of our apps. From there the idea grew and melded with my other interests: what if, rather than just notifying us that users were interacting with us, we could be reminded of something specific, such as the carbon emissions of an action? What would that look like?
Obviously, it had to include a cloud. And when do you see clouds? In a thunderstorm! What if a cloud emitted little lightning strikes every time someone created a Perma Link?
That seemed a little too hard.
What if instead, it rained?
I pursued this idea a bit, finding resources to create a smart water pump that I could program to respond to our API. But I hesitatedâall that water around all those devicesâthe idea seemed too wet to execute.
What if we found a way to represent rain that wasnât literal rain? That could work. We could use LEDs, and program them to look like rain was falling every time someone created a new link.
This seemed like a physically reasonable project, but it wasnât quite hitting the mark. Why was the cloud raining every time someone hit our API? What did that have to do with carbon emissions and climate change?
Instead, I decided to simplify the conceit. A user scrolls through various digital activities (such as a Google search or mining bitcoin) and their associated carbon footprints, culled from various sources.
Four screenshots from the Cloud app
The cloud responds by growing at each step, until it ultimately explodes.
I was lucky enough to be able to try it out on students, staff, and faculty at Harvard Law Schoolâs Caspersen Student Center.
Running the pop-up events provided invaluable insights. Some users were shocked and said that they had never previously considered the carbon impact their digital lives might have. There were those who were simply delighted by the project and found the cloud itself a relatively benign presence. Others found the cloud and its conceit somewhat terrifyingâespecially as it grew larger and appeared to be on the edge of bursting. Many wanted to touch it, and most wanted to know what steps they could take to lower their personal footprints. This is a tough question to answer given that the sources of impact are so diffuse, but some suggestions include extending the life of your electronics and avoiding passive consumption. Putting pressure on companies to reveal the environmental impacts of their products could also be an effective tactic because having that information would enable users to make more informed choices about their purchases. The takeaways from the pop-ups confirmed my suspicion: that this is an area ripe for further education.
I hope to bring the Cloud to other locations (such as libraries), to continue to use it as an education tool. The code itself is available in a GitHub repo with instructions for how to build the cloud attachment for anyone who wants to create their own.
We at the Open Knowledge Foundation (OKFN) are excited to announce the list of organisations that have been awarded mini-grants to help them host Open Data Day (ODD) events and activities across the world.Â
Our team received a total of 305 applications and was greatly impressed by the quality of the event proposals. In 2024, we are running two separate calls to accommodate the diverse interests in our community. The first call was for the general community, and the second was specifically for activities related to open mapping.
General Mini-Grant Winners
This call was open to any practices and disciplines carried out by open data communities around the world â such as hackathons, tool demos, artificial intelligence, climate emergency, digital strategies, open government, citizen participation, automation, monitoring, etc. A total of 18 events will receive a grant amount of USD 300 each, thanks to the sponsorship of Jokkolabs Banjul (Gambia), Open Knowledge Foundation (OKFN), Open Knowledge Germany, Datopian and Link Digital.
Here are the winning proposals by country, in alphabetical order:
âOpen Data for climate risk-informed societiesâ â Leveraging open data to mitigate adverse effects of lakes and sea level rise in the African great lakes region.
âEcoSolutions: Harnessing Tools for Climate Resilienceâ â To empower participants with the knowledge and skills to navigate the diverse array of tools available for addressing climate challenges.
âData and Drinks: Girls at the tableâ â To âput on the tableâ the importance of data with a gender perspective to move towards gender equality, contribute to closing gender gaps and overcoming gender stereotypes.
âVillage Leaders Conclave: Navigating the Climate Crisis with Open Dataâ â To empower 100 elected village-level leaders in Cuddalore district, Tamil Nadu, India, with the knowledge and tools to address the climate emergency through open data-driven decision-making and collaborative local action.
âNoCode-LowCode GeoAI Workshop for Sustainable Climate Actionâ â To empower participants with tools for low carbon economy and meaningful climate action, fostering innovation and collaboration through multi-modal open data and open source software such as KNIME.
âMapping Climate Change in 4D: Belvedere Glacierâs Open Geo Data for Education and Researchâ â To conduct an innovative teaching workshop dedicated at familiarizing students of the GIS course with raster data and point cloud processing using real data from an alpine glacier, which is experiencing an extreme retreat due to climate change.
âEmpowering Young Changemakers: Harnessing Open Data & Indigenous Knowledge for Climate Actionâ â To empower young people from Kenyan pastoralist communities in Marsabit County to address the climate emergency using open data and indigenous knowledge.
âOpen Data for Environmental Monitoringâ â To celebrate and promote the impact of open data in environmental monitoring, through showcasing the impact of sensors.AFRICAâs citizen science initiative and inspiring participants to explore and innovate with open data for climate-resilient cities in Africa.
âOpen Data for Green & Circular Economyâ â To discuss and create visualisations/stories regarding the use of open data in green and circular economy. The event is going to be in a workshop style where there will be participants from college/university clubs.
âLetâs Count 4SDGsâ â To enhance community awareness and engagement in best practices for Open Data in achieving the Sustainable Development Goals (SDGs).
âClimate-Induced Displacement: Understanding Impacts on African Women through Open Dataâ â To raise awareness, facilitate informed discussions, and propose data-driven strategies to address the unique challenges faced by African women affected by climate-induced displacement, leveraging open data for better understanding and sustainable solutions.
âWikidata Loves SDGs Nigeriaâ â To harness the power of Wikidata to open up and update data related to SDGs, including fields of work, targets, indicators, organisations working on SDG topics, organisations whose field of work encompasses any of the SDGs, and SDG advocates in Nigeria.
âHacking for Healthy Food & Green Futures: An Open Data Challenge for Ningi Youthâ â To empower Ningi youth to use open data to develop innovative solutions for food security, mental health, and climate change, contributing to SDGs 2, 3, and 13.
âOpen Data as a Human Right Workshop: Empowering Law Students for Sustainable Developmentâ â Empower law students by framing open data as a fundamental human right, exploring its intersection with digital rights, and highlighting its role in advancing sustainable development.
âEmpowering migrant and refugee women to use open data to hold duty bearers accountable for quality sexual reproductive health servicesâ â Mobilize and orient a pool of 60 rural migrant refugee women and girls and 10 women-led community organisations on the concept of open data and how to use open data to hold duty bearers accountable in providing quality SRHR services.
âNeighborhood Data Discoveryâ â The event will focus on presenting neighborhood-level data on SDG indicators for participants to learn about and explore as well as an opportunity to provide feedback and future direction to measuring SDGs at the neighborhood level.
âBootcamp INFOTOPIA version 2.0: Learning how to monitor and infographics gender-based violenceâ â Training for organizations to monitor and visualise open data on gender-based violence in the Capital District.
Open Mapping Mini-Grant Winners
This call was specifically seeking to promote events related to open mapping â such as the use and promotion of geodata, mapathons, environmental monitoring, disaster response, community mapping, land productivity analysis, etc. A total of 8 open mapping events will receive a grant amount of USD 300 each, thanks to the sponsorship of Humanitarian OpenStreetMap (HOT).
Here are the winning proposals by country, in alphabetical order:
âCommunities Mapping Communities: Brazil-Africa Connectionâ â Empower vulnerable communities in Brazil and Africa through the exchange of knowledge facilitated by open mapping data.
âEnvironmental Mapping: Collecting Colombiaâs biodiversity data through urban treesâ â Introduce new mappers to open data collection applied to urban biodiversity with OSM notes.
âWaterPointMappingâ â To produce a participatory map of the areas where agro-pastoralists have access to water in the dry season, in order to improve the OpenStreetMap database quantitatively and qualitatively.
âBus-friendly: mapping participation guiding blocks & halt for equality of public transportation users and engage disabled voicesâ â Evaluate the condition of the Non-BRT Trans Metro Pasundan halt in terms of accessibility for people with disabilities and marginalised people (SDGs 11 & 10) using OSM & Wikipedia.
âPedalMap: Engaging Biking Communities in Open Mapping for Sustainable Development Goalsâ â Through a combination of remote mapping and field sessions, we plan to collaborate with biking communities to collect 360 satellite images through Mapillary and enhance biking-related OpenStreetMap (OSM) data.
âMapping Nyarugenge High-Risk Zone for Disaster Preparednessâ â To map disaster-prone areas and provide training to youth on mapping tools for effective disaster response.
As a way to increase the representation of different cultures, since 2023 we offer the opportunity for organisations to host an Open Data Day event on the best date between March 2nd and 8th. All outputs are open for everyone to use and re-use.
In 2024, Open Data Day is also a part of the HOT OpenSummit â23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement
For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.
Met dank aan Vincent Jordaan, OCLC, voor het vertalen van de oorspronkelijke Engelstalige blogpost.
Het OCLC Research Library Partnership (RLP) en LIBER (een vereniging van Europese onderzoeksbibliotheken) organiseerden op 7 februari 2024 een begeleide discussie over besluitvorming op basis van data. De bijeenkomst maakte deel uit van de lopende reeks Building for the future (Bouwen aan de toekomst), waarin we onderzoeken hoe bibliotheken werken aan innovatieve dienstverlening, zoals beschreven in de LIBER-strategie 2023-2027.
Het OCLC RLP-team werkte samen met leden van de LIBER-werkgroepen Research Data Management en Data Science in Libraries om de discussievragen op te stellen. Net zoals in ons vorige gesprek over research data management, hebben we geprobeerd de discussie praktisch te houden. We vroegen de deelnemers om hun huidige en toekomstige inspanningen te delen. Ook wilden we graag hun gedachten horen over de rol en waarde van de bibliotheek bij het ondersteunen van datagedreven besluitvorming. De discussies in kleine groepen werden gefaciliteerd door enthousiaste vrijwilligers van LIBER-werkgroepen en OCLC.
De virtuele bijeenkomst werd bijgewoond door deelnemers van 35 instellingen in 15 landen uit Europa, Noord-Amerika en AziĂ«. Ondanks de vele regionale en nationale verschillen waren er verschillende hoofdthemaâs die in de zeven discussiegroepen naar voren kwamen.
Wat betekent datagedreven besluitvorming voor bibliotheken?
We stelden deelnemers deze vraag in een online poll. We bereikten een vrij sterke consensus dat datagedreven besluitvorming betekent âhet gebruik van bewijs om beslissingen te onderbouwen en de resultaten ervan te evaluerenâ. Hoewel we in deze discussie de term âdatagedrevenâ hebben gebruikt, erkennen we dat anderen de voorkeur geven aan âdatagestuurdâ of âdatabewustâ.
In de gesprekken kwam naar voren dat kwaliteitsdata belangrijk zijn voor het onderbouwen van beslissingen. Ook kwam naar voren dat we voorzichtig moeten zijn en data alleen als hulpmiddel moeten gebruiken bij besluitvorming. Het is belangrijk om data te begrijpen binnen de juiste context. Deze data moeten niet enkel worden beschouwd als vervanging voor andere kwalitatieve onderzoeksmethoden om kennis te vergaren.
Reacties op een online poll met een vraag over de betekenis van âdatagedrevenâ besluitvorming
Hoe maken bibliotheken gebruik van datagedreven besluitvorming?
Er zijn talloze manieren waarop bibliotheken datagedreven besluitvorming gebruiken. We hoorden van deelnemers die gezamenlijke inspanningen beschreven voor collectiebeheer. Hierbij werkt een groep bibliotheken samen om hun collecties samen te beheren, beslissingen over collectiebehoud te ondersteunen, en nog veel meer. Ook kunnen uitleenstatistieken worden gebruikt voor collectieontwikkeling en het saneren van collecties.
De deelnemers spraken ook over het analyseren van data met betrekking tot het gebruik van de bibliotheekgebouwen. Denk daarbij aan het automatisch registreren van hoeveel mensen een ruimte binnenkomen en verlaten of het analyseren van Wi-Fi-gebruik. Op deze manier kunnen ze de drukte in elke ruimte meten en onderbouwen ze beslissingen over ruimtebeheer.
De aanwezigen benadrukten ook de groeiende rol van de bibliotheek in onderzoeksanalyse, ter ondersteuning van institutionele doelstellingen. In het Verenigd Koninkrijk is de bibliotheek vaak verantwoordelijk voor het beheer van data over de wetenschappelijke kennis van de instelling, voor rapportage aan het nationale Research Excellent Framework (REF)[i].
Ook op andere plekken ondersteunen bibliotheekmedewerkers institutionele inspanningen om inzicht te krijgen in de onderzoeksproductiviteit, de voortgang naar open onderzoeksdoelen en het identificeren van mogelijke samenwerkingsverbanden. Bibliotheken creĂ«ren ook specifieke functies om een breed scala aan onderzoeksdata te beheren en beschikbaar te maken voor hergebruik. Een onderwerp wat ook terugkwam in een recent LIBER-interview met Matthias Töwe, Data Curator bij de ETH ZĂŒrich Library.
Datagedreven besluitvorming ondersteunen is een uitdaging
Bibliotheken worden overspoeld met data. Verschillende deelnemers beschreven het gevoel overweldigd te worden door alle beschikbare gegevens. Alleen al de hoeveelheid maakt het lastig om data effectief te beheren, op te schonen en te gebruiken. Ook is het soms moeilijk om te weten welke data beschikbaar zijn, omdat ze verspreid zijn over vele siloâs binnen de organisatie. Daarom is meer organisatie en transparantie noodzakelijk.
Samenwerking is vereist, ongeacht de omvang. Het analyseren van collecties van meerdere instellingen vergt aanzienlijke investeringen en betrokkenheid van diverse belanghebbenden in verschillende instellingen en bibliotheekafdelingen. Zelfs bij het oplossen van lokale operationele vraagstukken moeten bibliotheekmedewerkers sociale interoperabiliteit toepassen. Een deelnemer merkte op: âwe hebben data van anderen nodigâ, om eigen taken te volbrengen.
Gebruikers die om data en rapporten vragen, kunnen vaak niet helder uitleggen wat ze precies nodig hebben. Dit blijkt een veelvoorkomend probleem te zijn, zoals uit onze online poll over de spanningen en uitdagingen van samenwerking rond datagedreven besluitvorming bleek. Een kleine groep besprak de noodzaak om âreferentie-interviewsâ toe te passen bij het praten met gebruikers van data, om zo beter te begrijpen welke vragen ze willen beantwoorden en deze te verduidelijken.
Wat biedt de bibliotheek aan meerwaarde voor datagedreven besluitvorming?
We vroegen de deelnemers om in kleine groepen te discussiëren over de algehele waarde die de bibliotheek biedt bij het faciliteren van beslissingen op basis van data:
Bibliotheken weten alles over metadata. De vaardigheden en kennis van bibliotheekmedewerkers over bibliotheekdata zijn van onschatbare waarde voor het beheren van collecties en meer. Deze deskundigheid op het gebied van metadata is duidelijk een sterk punt, maar wordt vaak over het hoofd gezien. Een deelnemer uitte zijn zorgen over het feit dat bibliotheekexpertise te gemakkelijk wordt afgedaan als âalleen boekenâ. Dit gebeurt zonder de overdraagbaarheid en waarde van deze vaardigheden te erkennen, zoals ervaring met complexe bedrijfssystemen, vaardigheid met databeheer en de consistente toepassing van regels, standaarden en beleid.
Bibliotheken gebruiken data om op een verantwoorde manier bronnen te beheren. Activiteiten zoals gedeelde drukwerkcollecties en andere collectieve collecties vertrouwen op verzamelde data over bibliotheekcollecties. Deze data worden gebruikt om beslissingen te nemen over collectieontwikkeling, retentie en langdurig en kosteneffectief beheer van wetenschappelijke data. Verschillende deelnemers beschreven ook hoe data over collecties en het gebruik van bibliotheekgebouwen werden gebruikt. Dit gebeurde om beslissingen te nemen over toekomstig ruimtegebruik. Bibliotheken moeten namelijk aantonen dat ze hun middelen goed benutten, zodat ze blijvende financiering ontvangen.
Ondersteunende diensten voor onderzoek gaan verder dan de traditionele bibliotheek en zijn zeer zichtbaar voor andere belanghebbenden op de campus. Bibliotheekondersteuning op het gebied van bijvoorbeeld research data management, onderzoeksinformatie en het beheer van data voor nationale rapportagevereisten, in lijn met de strategische prioriteiten van de campus, trekt vaak de meeste aandacht van niet-bibliotheekbelanghebbenden.  Deelnemers uit het Verenigd Koninkrijk en Hong Kong benadrukten bijvoorbeeld de centrale rol van de bibliotheek bij het verzamelen van de wetenschappelijke data van de instelling. Dit ter ondersteuning van nationale rapportageverplichtingen en om analyses te maken van de output en impact van institutioneel onderzoek. Een Canadese deelnemer beschreef de aanstelling van een bibliometrische bibliothecaris, die nu leiding geeft aan een informeel netwerk van business intelligence officers binnen de universiteit. De groep biedt nu ondersteuning bij het nemen van beslissingen over naleving, beoordeling en financiering.  Bibliotheken onderzoeken ook hoe ze een reeks indicatoren kunnen definiëren die inzicht geven in open onderzoeksactiviteiten, zoals beschreven in een recente RLP-webinarpresentatie door Scott Taylor van de Universiteit van Manchester.
Hoe kunnen bibliotheken hun waarde beter communiceren en welke strategieën kunnen ze hiervoor gebruiken?
Het is belangrijk dat bibliotheekbestuurders actief pleiten voor de rol en toegevoegde waarde van de bibliotheek. We hebben veel voorbeelden gehoord van bibliotheken die ondersteuning bieden bij institutionele besluitvorming. Maar het blijft nog steeds een uitdaging om niet-bibliotheekbelanghebbenden ervan te overtuigen dat de bibliotheek een waardevolle bijdrage levert. Een terugkerende zorg is dat mensen simpelweg niet aan de bibliotheek denken. Deze zorg kwam ook naar voren in de vorige begeleide discussie over research data management. Daarom is het van belang dat bibliotheekleiders vastberaden zijn in het benadrukken van de kennis en vaardigheden van bibliotheekmedewerkers. Ze moeten partners begeleiden om de bibliotheek op nieuwe en eigentijdse manieren te zien en te waarderen.
Gebruik een âdoelenboomâ om duidelijk te maken wat belangrijk is en om dit intern en extern te communiceren. Een Britse deelnemer aan een van de kleine groepsdiscussies vertelde hoe ze zoân doelenboom had gemaakt voor haar metadatateam. Daarin stond in grote lijnen wat ze wilden bereiken en hoe dit past binnen de doelen van de bibliotheek en de universiteit. Zo liet ze zien dat de catalogiseerders niet alleen maar âin de hoek zaten en boeken doornamenâ, maar een belangrijke rol hebben in het beheren van kwaliteitsmetadata, wat weer helpt bij allerlei bedrijfsbehoeften. De andere deelnemers aan de discussie waren enthousiast over dit idee. Het lijkt een goede manier om het team te versterken en ervoor te zorgen dat iedereen dezelfde doelen voor ogen heeft.
Visualisaties en data storytelling zijn nodig. Een sterk thema in de gesprekken in kleine groepen was dat de data alleen niet genoeg zijn. Bibliotheekmedewerkers moeten ook vaardigheden ontwikkelen in het presenteren van data. Ze moeten het gebruik van visualisaties beheersen om effectief te communiceren en enthousiasme te wekken voor de resultaten.
Word cloud gemaakt tijdens de bijeenkomst
Bibliotheekmedewerkers moeten hun kennis bijspijkeren, zowel individueel als in teamverband. Ze zijn goed in het beheren van data, maar ze missen vaak training in data-analyse met programmaâs zoals Power BI en Tableau. Verschillende deelnemers hebben verteld over het aanleren van deze vaardigheden. Iemand uit Hong Kong beschreef bijvoorbeeld hoe haar instelling een groep heeft opgericht om samen data-analyse te verkennen, waardoor ze van elkaar kunnen leren. Een andere deelnemer uit Nederland doet iets vergelijkbaars met een lokale werkgroep die vaardigheden in datavisualisatie aanleert en een breder praktijknetwerk opbouwt. Over het algemeen zeiden de deelnemers dat er behoefte is aan bijscholing van het huidige personeel en dat er in de toekomst ook nieuwe mensen met goede technische vaardigheden nodig zijn voor data-analyse.
Doe mee aan de volgende begeleide discussie over AI, Machine Learning en Data Science
De volgende sessie in deze reeks over geavanceerde diensten vindt plaats op 17 april. Tijdens deze bijeenkomst verkennen we gezamenlijk de uitdagingen en kansen van AI, machine learning en data science. De focus ligt op de manieren waarop onderzoeksbibliotheken vooruitstrevende technologieën gebruiken, of willen gebruiken, om werkprocessen in de bibliotheek, metadata en meer te verbeteren. Door gestructureerde discussies in kleine groepen te faciliteren, nodigen we deelnemers uit om ideeën te delen en op te doen over hun visies op de toekomst van AI en datawetenschap. Tegelijkertijd willen we gericht de uitdagingen onderzoeken waarmee bibliotheken worden geconfronteerd bij het verantwoord toepassen van opkomende technologieën. Meld je vandaag nog aan om een plekje te bemachtigen.
[i] Het Research Excellence Framework (REF) is een systeem dat wordt gebruikt in het Verenigd Koninkrijk om de kwaliteit van het onderzoek aan universiteiten en onderzoeksinstellingen te evalueren en te beoordelen.
Ok, this is a blast from the past. Here are some rough notes for a
conversation today with Ernesto about how congressedits
worked, and Wikipedia data more generally. I couldnât summon the will to
create a slidedeck, but I was taking some notes, and it seemed easiest
to just drop them here. Ernesto does some amazing work studying the political
dimensions of the web.
Shira Peltzman, in her second year as a member of the NDSA Coordinating Committee, has been elected by the NDSA Leadership as its 2024 Vice Chair and 2025 Chair. The Vice Chairâs duties include:Â
Managing the annual process to elect new CC members.
Facilitating the new member application process.
Convening quarterly meetings for the Co-Chairs of Working Groups and Interest Groups.
Participating in quarterly meetings between NDSA and CLIR.
Along with the Chair, ensuring the NDSA Code of Conduct is carried out.
Shira Peltzman (1st CC term, 2023-2025) is the Associate Director for Preservation Digital Strategies at Yale University Library where she provides leadership and direction for digital preservation, media preservation, and preservation imaging. In her role she serves as an advocate for sustainable stewardship and works with stakeholders across campus to champion ambitious preservation initiatives that support enduring access to Yaleâs digital collections.
Please join me in congratulating Shira on this new role!
This dog has the option to chomp the hand poking his nose. So far, he has chosen not to exercise it. Photo by Benjamin Williams on Unsplash
I really like the concept of options as a way of thinking about future opportunity. In this post, Iâd like to make a case that adopting an âoptions perspectiveâ can strengthen library decision-making in a range of scenarios â including the decision whether to collaborate. But let me start with a few brief remarks about options that will help clarify application of this concept to libraries.
An option bestows upon its owner the right, but not the obligation to do something. For example, in finance, a call option grants the right, but not the obligation, to purchase a security (say, a particular stock) at a specified price at any time before the option expires. Why is a call option valuable? If the market price of the stock rises above the price specified in the option contract (the âstrike priceâ), you can exercise the option, purchase the security at the strike price, and then sell it on the open market for a profit. But there is no obligation to exercise the option; if the market price remains below the strike price, then the option will likely be allowed to expire unexercised.
If you boil away the details of particular types of options and the specific features of the option contract, I think you are left with two general points:
Options represent an opportunity to do something in the future.
Possessing that opportunity is valuable.
Call options â and other types of financial options â are not free. In fact, option values are monetized and traded in financial markets. This leads to a third general point about options:
There is a cost to acquire an option.
In other words, you pay an upfront cost to acquire an option, in the hopes of exercising it for a larger return sometime in the future. So you must expend resources to get an option; itâs not just a choice that already exists on its own.
The options perspective
Enough about financial options. What does this have to do with libraries? Let me illustrate the connection through examples having to do with digital preservation and research data management.
As I mentioned at the outset, I like the concept of options, and the reason is because I have found that it pops up in all sorts of contexts, and often provides some unexpected insights. In 2010, I wrote an appendix for the report Sustainable Economics for a Digital Planet: Ensuring Long-term Access to Digital Information in which I made the case that the concept of options deepens our understanding of investing in digital preservation. Preserving a digital object is an uncertain enterprise â in particular, it is often unknown whether future usage of the object will justify the expense of preserving it. But this decision need not be made once and for all at the outset; by making initial commitments to retain the object for a finite period â say a few years â the repository has for all intents and purposes âpurchasedâ the option to preserve the object for a longer period, a decision which can then be made at a later time after reevaluating usage patterns for the object over the initial retention period.
The example of digital preservation satisfies the three basic features of options enumerated above:
Opportunity: the opportunity to preserve the digital object long term
Value: the potential benefits of ongoing usage of the object
Cost: the expense of initial preservation actions
The scenario laid out above is not merely a thought exercise, but a useful framework for addressing a real-world problem for many academic libraries: data set retention. For example, the Illinois Data Bank is a âpublic access repository for publishing research data from the University of Illinois at Urbana-Champaign.â The preservation policy associated with a deposited data set includes a commitment to preserve the data set for a minimum of five years. After this period, the Data Bank reserves the right to review the data set and determine if it will be retained or deaccessioned. This is implicitly an option-based approach to data curation: an initial investment to retain the data set for a limited period sets up an option to continue to preserve it long term. The review described in the preservation policy is, essentially, a decision whether or not to exercise that option.
Contrast this to an extreme case: a once-and-for-all decision at the time the data is ready for deposit to either accept the data set and commit to retaining it indefinitely, or not accept it at all. In either eventuality, an element of choice, or flexibility, is lost: either the ability to deaccession a data set if its predicted future value does not warrant its preservation cost, or the ability to resolve some of the uncertainty over the data setâs future value by preserving it for a limited time, and then make a more informed decision about long term retention later. And of course, if the object is not retained at the outset â if the option to preserve is not created through the initial investment in curation â a potentially valuable data set could be lost forever.
Our report documents several case studies of multi-institutional collaborations in the RDM space. One theme running through these case studies was the importance of trust among collaborating partners. Trusted partners not only improve the chances of success for current collaborative efforts, but also open up opportunities for expanding collaboration into new areas. As we observe in the report:
âAnother example of an intangible benefit is the accumulation of trust through the shared experience of collaboration. Trust, in turn, is important to the success and the prospect of future partnerships. . . . In this sense, collaborating is itself a benefit of collaboration, building up a shared foundation that can create an âoption to collaborateâ for the future.â
In other words, investing in a collaboration â even a small-scale effort with limited objectives â cultivates among the partners a shared experience of working together, which can be leveraged as other opportunities for collaboration arise. This intangible benefit â the creation of an option to collaborate in the future â sits alongside any direct, transactional benefit an institution receives from participating in the partnership, but is probably rarely accounted for in any cost-benefit analysis of participating in a collaboration.
The report illustrates this idea with the example of the Texas Data Repository (TDR), a service that allows researchers affiliated with a member institution of the Texas Digital Library (TDL) to publish their data sets. As the report notes, an important incentive to participate in the TDR was that it âwas designed and operated by the TDL, a trusted entity with a track record of building community and shared capacities among its membership.â The experience of working with TDL partners on past collaborative efforts created a viable âoption to collaborateâ on future joint endeavors â an option that was indeed exercised when the opportunity to build shared data repository capacity arose. Â
âPart of the value of collaboration is collaboratingâ
A similar observation is found in another recent OCLC Research report, Sustaining Art Research Collections: Case Studies in Collaboration. This report explores the experiences of several art museum libraries partnering with academic libraries as part of a strategy for achieving long-term sustainability for their collections. The report notes:
â. . . [A]n important intangible benefit of any partnership is the creation of a shared history of collaboration between partners that can be leveraged in the future. As staff from different institutions accumulate experience working together, a measure of trust and confidence in the relationship grows. This ends up representing an âoption to collaborateâ that can be exercised in the futureâeither on an entirely new effort, or on extending existing collaborations into new activities. Part of the value of collaboration is collaborating, and this should not be overlooked when assessing the benefits returned from working with other institutions.â
For example, in a case study detailing the partnership between the Hirsch Library at the Museum of Fine Arts, Houston and the nearby Fondren Library at Rice University, we found that individuals we spoke to at both institutions emphasized the value of the relationshipas distinct from the value of the current collaboration. They believed that âregardless of the benefits perceived from the original agreement, the relationship between the two institutions is valuable and should be protected and preserved.â These partnering institutions saw that the full value of a collaboration goes beyond the immediate transactional benefits, to include the value created by cultivating an option to work together in the future.
In short, collaboration shares the same option-like features we saw with the digital preservation and data set retention examples mentioned earlier: a value in investing in something that creates the opportunity to make choices at a later time. And like those examples, there is insight to be gained about collaboration from thinking about it from an options-based perspective.
In making the case for the option value in collaboration, I am not suggesting that libraries should enter into every collaborative opportunity that comes along in the expectation of creating valuable options to collaborate that can be exercised in the future. Instead, the potential value of a trusted relationship with a partnering institution that can catalyze future collaborations should be considered alongside the many other factors that play into treating collaboration as a strategic choice. In doing so, libraries will be addressing the recommendation put forward in Building Research Data Management Capacity: value the intangible benefits of collaboration.
Insights from options improve decision making
More generally, the findings from our research on library collaboration, as well as our work in other areas of strategic interest to libraries, suggest that decision making can be improved by adopting an options perspective. When confronted with decisions that involve future opportunity, it is valuable to factor that opportunity into assessments of costs and benefits. Can a modest investment now lead to an expansion of future choices or flexibility?
Many decision makers probably consider this option value at least implicitly as part of their decision making process: for example, repository managers know that if effort is not made to retain and curate a data set now, the ability to use the data set later may be irrevocably lost. But the existence of an option value may be less clear in considering collaborative opportunities and the value that flows from them. In these situations, we may need to look a little harder for the option value. But as the findings from Building Research Data Management Capacity indicate, the option value is often there, and it can be significant.
Our systematic search for the Empirical Retraction Lit bibliography EXCLUDES retraction notices or retracted publications using database filters. Still, some turn up. (Isnât there always a metadata mess?)
While most retraction notices and retracted publications can be excluded at the title screening stage, a few make it through to the abstract screening, and, for items with no abstracts, to the full-text screening. Todayâs example is âRetraction of unreliable publicationâ. Kept at the title-screening stage**; no abstract; so itâs part of the full-text screening. PubMed metadata would have told us itâs a âRetraction of Publicationâ â but this particular record came from Scopus.
The Zotero-provisioned article, âClinical guidelines: too much of a good thingâ, had nothing to do with retraction so I went back to the record (which had this link with the Scopus EID). To see what went wrong, I searched Scopus for EID(2-s2.0-84897800625) which finds the Scopus record, complete with an incorrect DOI: 10.1308/xxx which today takes me to a third article with another DOI.***
Scopus Preview is even more interesting because it shows the EMTREE terms ânoteâ and âretracted articleâ (which are not so accurate in my opinion):
In my 2020 Scientometrics article, I cataloged challenges in getting to the full-text retraction notice for a single article. Itâs not clear how common such errors are, nor how to systematically check for errors.
Iâm continuing to think about this, since, for RISRS II, Iâm on the lookout for metadata disasters (in research-ese: What are the implications of specific instances of successes and failures in the metadata pipeline, for designing consensus practices?)
This particular retrieval error is due to the wrong DOI â which could affect any article (not just retraction notices). Iâve reported the DOI error to the Scopus document correction team.
**Keeping âRetraction of unreliable publicationâ for abstract screening may seem overgenerous. But consider the title âRetractionsâ. Surely âRetractionsâ is the title of a bulk retraction notice! Nope, itâs a research article in the Review of Economics and Statistics by Azoulay, Furman, Krieger, and Murray. Thanks, folks. While plurals are more likely than singulars to signal research articles and editorials I try to keep vague/ambiguous titles for a closer look.
***For 10.1308/xxx Crossref just lists this latest article. Same with Scopus.
But my university library system has multiple results â a mystery!
Apart from getting started in the midst of one of Silicon Valley's regular downturns, another great thing about the beginnings of Nvidia was that instead of insisting on the "minimum viable product" our VCs, Sutter Hill and Sequoia, gave us the time to develop a real architecture for a family of chips. It enabled us to get an amazing amount of functionality into a half-micron gate array; I/O virtualization, a DMA engine, a graphics processor that rendered curved surfaces directly, not by approximating them with triangles, a sound engine and support for game controllers. As I write, after a three decade-long history of bringing innovations to the market, Nvidia is America's third most valuable company.
I've written several times about how in pursuit of a quicker buck, VCs have largely discarded the slow process of building an IPO-ready company like Nvidia in favor of building one that will be acquired by one of the dominant monopolists. These VCs don't support innovation. Even if their acquisition-bound companies do innovate in their short lives, their innovations are rarely tested in the market after the acuisition.
Below the fold I discuss a new paper that presents a highly detailed look at the mechanisms the dominant companies use to neutralize the threats startups could pose to their dominance.
In Coopting Disruption law professors Mark Lemley (Stanford) and Matthew Wansley (Cardozo) ask a good question:
Our economy is dominated by five aging tech giantsâAlphabet, Amazon, Apple, Meta, and Microsoft. Each of these firms was founded more than twenty years ago: Apple and Microsoft in the 1970s, Google and Amazon in the 1990s, and Facebook in 2004. Each of them grew by successfully commercializing a disruptive technologyâpersonal computers (Apple), operating systems (Microsoft), online shopping (Amazon), search engines (Google), and social networks (Facebook). Each of them displaced the incumbents that came before them. But in the last twenty years, no company has commercialized a new technology in a way that threatens the tech giants. Why?
The TL;DR of Lemley and Wansley's answer to their question is:
While there are many reasons for the tech giantsâ continued dominance, we think an important and overlooked one is that they have learned how to coopt disruption. They identify potentially disruptive technologies, use their money to influence the startups developing them, strategically dole out access to the resources the startups need to grow, and seek regulation that will make it harder for the startups to compete. When a threat emerges, they buy it off. And after they acquire a startup, they redirect its people and assets to their own innovation needs.
a company that is started with the goal of being swallowed by a tech giant probably isnât contributing much to society.
Introduction
They start by identifying the advantages and disadvantages the incumbents possess in their efforts to monetize innovations. Their list of advantages is:
"large incumbents can take advantage of economies of scale" not just in manufacturing, but also in marketing an distribution by exploiting their existing customer relationships.
"Large incumbents can also take advantage of economies of scope. Innovation creates âinvoluntary spilloversâânew knowledge that has economic value beyond the specific product that the firm was developing."
"large incumbents can access capital at a lower cost" for example from retained earnings from their cash cows.
"large incumbents may have another potential advantageâa longer investment time horizon" even more so now with the compression of VC time horizons.
Their list of incumbents disadvantages in innovation is more interesting:
"their success will cannibalize their own market share" or "More generally, a monopolist has diminished incentives to introduce new products, improve product quality, or lower prices because any new sales generated replace its existing sales." Economists call this "Arrow's replacement effect"; more specifically: "The general lesson is, all else equal, the larger a firmâs market share and the less it is threatened by competition, the weaker its incentives to innovate. So we should expect large incumbents to not innovate much. And if they can dispense with the competitors rather than have to compete with them, they will do that."
"their managers prefer to deliver incremental innovations to their existing customers". Unlike Arrow's theory, "Christensenâs theory of disruptive innovation, ... focuses on the career incentives of middle managers ... Incumbent managers have an incentive to deliver sustaining innovationsâincremental improvements in quality to the firmâs existing products that will please its existing customers. But they have substantial disincentives to pursue projects that upset the apple cart, even if doing so would bring new customers to the firm" The fundamental problem is that "Housing an innovation project inside a firm with diverse lines of business creates conflict with those other businesses. Some firm assetsâcash, cloud computing, equipment, facilities, and engineersâ timeâare rivalrous and finite, so executives must be willing to fight internal constituencies to devote those resources to innovation." Ingenuity, NASA's wildly successful Mars helicopter is a good example, as Eric Berger reports in Before Ingenuity ever landed on Mars, scientists almost managed to kill it. It was competing for cost, weight and risk with Perseverance's primary mission.
"their single veto point decision-making structure encourages risk-aversion" More specifically: "Inside a large incumbent, decisions about whether to fund an innovative project must pass through one veto point. In the venture capital market, many competing investors independently decide whether to finance an innovative idea. Inside a firm, an employee with an innovative idea must pitch an idea to managers who ultimately report to one executive gate-keeper. In the venture capital market, if a would-be startup founder pitches an idea to ten VC firms, and nine of them are not persuaded, the idea gets funded." The advantage of market-based finance over internal finance applies not just to the initiation but also the continuation of an innovation project. Inside a firm, an executive who has soured on a project can terminate it. In the venture capital market, when a startupâs initial investors grow skeptical, the company can still pitch outsiders on infusing more cash." The authors make this important point (my emphasis): "And while economists often describe markets as efficient, there is no reason to believe individual corporate executives make efficient (or even rational) decisions. Just ask Twitter. Markets work not because private executives make good decisions but because the ones who make bad decisions get driven out. But that dynamic only works with competition."
"they cannot appropriately compensate employees working on innovation projects." The reason they cannot is that: "Startups solve this problem by giving employees stock options. Every employee with significant equity knows that if the startup successfully exits, they will be rewarded. Stock in a large, diversified public company does not create similar incentives. The incentives are diluted because the value of the stock will be affected by too many variables unrelated to the success of the specific innovation project." And that: "large firms do not recognize internal âproperty rightsâ to innovations that employees develop. If they did, employees might become reluctant to share information. But not protecting internal property rights gives innovative employees incentive to leave. If employees at a large firm found their own startup and raise venture capital to fund it, they will earn a much greater share of the profits of the innovation."
The authors go on to describe five techniques incumbents use to neutralize the threat of disruption that innovative startups might pose; network effects, self-preferencing, paying for defaults, cloning, and coopting the disruptor. They claim other have described the first four, but they don't amount to an adequate explanation for why the tech giants haven't been disrupted. I will summarize each of the four in turn..
Network effects
Nearly three decades ago W. Brian Arthur, in Increasing Returns and Path Dependence in the Economy explained how increasing returns to scale, or network effects, of technology markets typically led to them being dominated by one player. Consider a new market opened up by a technologcal development. Several startups enter, for random reasons one gets bigger then the others, network effects amplify its advantage in a feedback loop.
This effect is more important now, as the the giants' business models have evolved to become platforms:
The tech giantsâ core businesses are built on platforms. A platform is an intermediary in a two-sided market. It connects users on one side of the market with users on the other side for transactions or interactions.
...
Platforms tend to exhibit network effectsâthe addition of a new user increases the value of a platform to existing users and attracts new users.
This is precisely the mechanism Brian Arthur described, but applied to a business model that has since been enabled by the Internet.
Self-preferencing
Self-preferencing happens when a platform isn't just a two-sided market, but one in which the platform itself is a vendor:
Amazon, for example, both invites third party vendors to sell their products in its online marketplace and sells its own house brands that compete with those vendors. Amazon has a powerful advantage in that competition. It has access to data on all of its competitorsâwho their customers are, which products are selling well, and which prices work best. And it controls which ads consumers see when they search for a specific product. Assuming Amazon uses that information to prefer its own products to those of its competitors (either by pricing strategically or by promoting its own products in search results) â something alleged but not yet proven in a pending antitrust case -- the result is to bias competition. Vendors cannot realistically protest Amazonâs self-preferencing (or just go elsewhere) because Amazon has such a dominant share in the online retail market.
Paying for defaults
The value of the default position is notorious because:
Alphabet pays Apple a reported $18 billion (with a b) each year for Google to be the default search engine on iOS devices. Android and iOS together account for 99% of the U.S. mobile operating system market. Consequently, almost everyone who uses a smartphone in America is accustomed to Google search. Alphabet claims that âcompetition is just a click away.â But research and experience have shown that defaults can be somewhat sticky. So controlling the default position can give Alphabet (or whoever wins the Apple bid) an advantage. That said, someone has to be the default, and it might be better for consumers if the default is the search engine most users already prefer. The real problem might be the idea of paying for placement, whoever wins the bidding war.
Cloning
There are many examples of a tech giant tryng to neutralize the threat from a startup by using the threat of cloning their product to force the startup to sell itself, or of actually cloning the product and using their market power to swamp the startup. Meta's addition of Reels to Instagram in response to Tik Tok is an obvious example. But the authors make two good points: First:
Cloning is only objectionable if the tech giant wins out not by competition on the merits, but by exclusionary conduct.
Google+, Googleâs effort to build a social media service that combined the best of Facebook and Twitter was an abject failure. Appleâs effort to control the music worldâs move to streaming by offering its own alternative to Spotify hasnât prevented Spotify from dominating music streaming and eclipsing the once-vibrant (and Apple-dominated) market for music downloads. Metaâs effort to copy Snap, then TikTok, by introducing Stories and Reels has not proven terribly successful, and certainly has not prevented those companies from building their markets.
The fact that the giants can clone a startup's product leads the authors to ask:
If the product is cloneable, then why would you buy the company and burn cash paying off its VCs?
There are two possible answers. It may be faster and easier, though likely not cheaper, to "acquihire" the startup's talent than to recruit equivalent talent in the open market. Or it may be faster and easier, though likely not cheaper, to acquire the company and its product rather than cloning it.
Microsoft enjoyed strong network effects in the 1990s as the dominant maker of operating system software â far more dominant than it is today. It cloned internet browser technology from upstarts like Netscape, and it engaged in anticompetitive conduct designed to ensure that it, not Netscape, became the browser of choice.82 But Microsoftâs victory over Netscape was short-lived. New startups â Mozilla and then Google â came out of nowhere and took the market away from it. Microsoft still benefits from network effects, and it still uses cloning and self-preferencing to send users to its Edge browser. But it doesnât work. Microsoft employed all the tools of a dominant firm in a network market, but it still faced disruption.
So these four techniques aren't an explanation for the recent dearth of disruption.
Coopting disruption
The authors imagine themselves as a tech giant, asking what else they would do to prevent disruption, and coming up with four new techniques:
"First, you would learn as much as you can about which companies had the capability to develop disruptive innovations and try to steer them away from competing with you â perhaps by partnering with them, or perhaps by investing in them."
"Second, you would make sure that those companies could not access the critical resources they would need to transform their innovation into a disruptive product."
"Third, you would tell your government relations team to seek regulation that would build a competitive moat around your position and keep disruption out."
"Fourth, if one of the companies you were tracking nevertheless did start to develop a disruptive product, you would want extract that innovationâand choke off the potential competitionâin an acquisition."
These are the techniques they call "coopting disruption", pointing out that the tech giants have:
"built a powerful reconnaissance network covering emerging competitive threats by investing in startups as corporate VCs and by cultivating relationships with financial VCs."
"accumulated massive quantities of data that are essential for many software and AI innovations, and they dole out access to this data and to their networks selectively."
"asked legislators to regulate the tech industryâin a way that will buttress incumbents."
"repeatedly bought potentially competitive startups in a way that has flownâuntil a few years agoâbelow the antitrust radar."
The authors detail many examples of each of these techniques, for example Facebook conditioning access to user data on the purchase of advertising, and Google's purchase of DoubleClick and YouTube. Interestingly, they contrast the recent purchasing of the tech giants with Cisco's famously successful purchases in the 90s:
The Cisco story exemplifies how the venture capital market, as a market, is better at exploring a series of risky ideas than a firm with a single risk-averse gatekeeper. It also illustrates how the advantages of a large incumbentâin this case access to markets and existing customer relationshipsâcan sometimes extract more market value out of a technology than a new entrant.
The rapid evoluution of networking technology at the time meant that even Cisco, the largest company in the market, didn't have the R&D resources to explore all the opportunities. They depended upon VCs to fund the initial explorations, rewarding them by paying over the odds for the successes. Their market power then got the successes deployed much faster than a startup could.
Our claim here is that the same dynamics that inhibit disruptive innovation by longstanding employees of large incumbents inhibit disruptive innovation by new employees from acquired startups.
...
The tech giants win from coopting disruption even though it destroys social value. In fact, they benefit in two ways. They make faster incremental progress on the sustaining innovations that they want. They get the new code, the valuable intellectual property, and the fresh ideas of the startup. And, critically, they also kill off a competitor. They no longer have to worry about the startup actually developing the more disruptive innovation and leapfrogging them or other tech giants acquiring the startup and using its assets to compete with them.
And, by making the innovators from the startup rich, the acquirer greatly reduces their incentives for future innovation. Andy Bechtolsheim is an outlier.
Remedies?
Lemley and Wansley, who seem to think in fours, make a set of four proposals for how these harms might be reduced:
Unlocking Directorates â under the Clayton Act "interlocking officers and directors between companies that compete, even in part, are illegal per se â without any inquiry into whether the companies in fact restrained competition because of their overlapping interests or whether the conduct offered procompetitive benefits." Companies with less than $4.1M in revenue are exampt, which excludes most startups; this should be revised.
Limiting Leveraging of Data and Networks â "we would impose on incumbent tech monopolists a presumptive duty of nondiscrimination in access where the defendant (1) provides or sells data or network access to at least some unaffiliated companies and (2) refuses to provide or sell the same data or network access to the plaintiff company on comparable terms, but (3) the plaintiff does not operate a competing network or otherwise compete with the defendant in the market from which it collected the relevant data."
Regulating Regulation â "Done right, regulation of technology can be beneficial and even necessary to the development of that technology, minimizing the risk of harm to third parties and ensuring that the world views the technology as safe and trustworthy. But all too often regulation has become a way to insulate incumbents from competition, with predictable results." The authors' suggestions exemplify this difficulty, being rather vague and aspirational.
Blocking Cooptive Acquisitions â this is the most complex of the four proposals, and builds on Nascent Competitors by C. Scott Hemphill & Tim Wu, who write:
We favor an enforcement policy that prohibits anticompetitive conduct that is reasonably capable of contributing significantly to the maintenance of the incumbent s market power. That approach implies enforcement even where the competitive significance of the nascent competitor is uncertain.
Justifying blocking mergers because of a nascent threat that might never materialize is problematic. But it is only more so than the current way anti-trust works, by projecting likely harm to consumer welfare, which also might never materialize (although it almost always does). Lemley and Wansley explain the dilemma:
antitrust enforcers need a strategy for blocking cooptive acquisitions that works within existing case law (or plausible improvements to that law) and is surgical enough to avoid chilling investment.
For cooptive acquisitions like Facebook/Instagram deal, we think Hemphill and Wuâs strategy makes sense. Zuckerbergâs email arguing for acquiring startups like Instagram because âthey could be very disruptive to usâ is a smoking gun of anticompetitive intent.
But Lemley and Wansley go further, arguing for blocking megers based the startup's ability to innovate distruptive technology:
Of course, an approach to policing startup acquisitions based on innovation capabilities need limits. Many startups have some innovation capabilities that could have a significant effect on competition. We can cabin enforcement in three waysâby focusing on specific technologies and specific firms and by looking at the cumulative effects of multiple acquisitions.
Their examples of technologies include generative AI and virtual and augmented reality, both cases where it is already too late. The companies they identify "Alphabet, Amazon, Apple, Microsoft, and Meta" are all veterans of multiple acquisitions in these areas. But they argue that committing to
challenge fuure mergers:
would create socially desirable incentives for startups. A startup developing one of the listed technologies would gain stronger incentives to turn its innovations into the products that its management team believed would garner the highest value on the open marketârather than the one most valuable to the tech giants. They would also gain stronger incentives to build a truly independent business and go public since an acquisition by the tech giants would be a less likely exit.
I think these would all be worthwhile steps, and I'm all in favor of updating anti-trust law and, even better, actually enforcing the laws on the books. But I am skeptical that the government can spot potentially disruptive technologies before the tech giants spot and acquire them. Especially since the government can't be embedded in the VC industry the way the tech giants are. Note that many of the harms Lemley and Wansley identify happen shortly after the acquisition. Would forcing Meta to divest Instagram at this late date restore the innovations the acquisition killed off?
Equity, Diversity, and Inclusion (EDI) are essential to the preservation of intellectual freedom (American Library Association). Yet some Library and Information Science scholars argue that EDI work within libraries is not evolving significantly or rapidly enough. Using our work in building the Diverse BookFinder Community of Practice as an example, we highlight overarching principles that can guide EDI professional development towards greater effectiveness and sustainability. Sharing concrete strategies and examples of how to keep community and the communityâs shared purpose at the forefront of a volunteer-based EDI program, we posit that with only minimal adjustments, our model can be adapted to fit other EDI work, whether itâs focused on sustaining the efforts of an external volunteer group or on supporting and sustaining the crucial and everyday EDI work of librarians in collection development, programming, and committee building. We end the manuscript with a checklist designed to support the development of EDI training.Â
At the Diverse BookFinder (DBF), we work to move the diverse books discussion beyond increasing the number of books (see Aronson et al.) to a deeper consideration of how Black and Indigenous people and People of Color (BIPOC) are represented within diverse books. To accomplish this change, weâve cataloged and analyzed thousands of trade picture books published or distributed in the United States (including various Canadian publishers) since 2002 to surface and create a one-of-a-kind resource.Â
In 2020, the DBF received funding from the Institute of Museum and Library Services (IMLS) to âreach upâ and include all of childrenâs literature in our work: picture books (generally ages 3-8), early readers (5-9), middle grade (7-12), and young adult books (12-18). As we worked to expand, it became clear that we would require an exponentially larger group of people to read and analyze texts. Practically, this meant that the DBF needed to establish a volunteer-based community of learners who could be trained in the specific methods of DBF book analysis and who would be invested enough in the project to provide continued participation. In response, we created and sustained a Community of Practice (CoP).Â
As we worked through the process of creating, training, and sustaining a diverse, volunteer-based community of learners, we discovered that we were creating a model for sustainable Equity, Diversity, and Inclusion (EDI) work that is missing in the field of librarianship: a model that chips away at traditional systemic and institutional barriers to create truly inclusive and collaborative working partnerships. Using our work in building the DBF CoP as an example and drawing together interdisciplinary research and practice, we highlight overarching principles that can guide EDI professional development towards greater effectiveness and sustainability. We share concrete strategies and examples of how to keep community and the communityâs shared purpose at the forefront of a volunteer-based EDI program. Furthermore, we posit that with only minimal adjustments, our model can be adapted to fit other EDI work, whether itâs focused on sustaining the efforts of an external volunteer group or on supporting and sustaining the crucial and everyday EDI work of library professionals in collection development, programming, and committee building.
Why is the DBF Work Important to Libraries?Â
Providing patrons with access to diverse books is central to librarianship and the role of library professionals who have an obligation to maintain collections that represent the experiences, interests, and needs of historically marginalized communities (âDiverse Collectionsâ). Furthermore, longstanding disparities in publishing have created a need for library professionals to be more intentional about their selection process (see Cummins). Library professionals must identify and select books that provide visual and textual representations of diverse characters across various forms and genres. They must also identify and select books that depict diverse characters in culturally relevant ways. This task is further complicated by recent attempts to ban library books that highlight the unique experiences of BIPOC and LGBTQIA+ communities. In 2022, the American Library Association (ALA) tracked 1,269 book challenges, the highest number yet, mostly aimed at removing diverse books (âAmerican Libraryâ). These challenges are harmful to EDI work in libraries because they can exacerbate existing inequities present within library collections.Â
In light of these challenges, ALA leaders have taken the position that equity, diversity, and inclusion are essential to the preservation of intellectual freedom. Despite this position, some Library and Information Science scholars argue that EDI work within libraries is not evolving significantly or rapidly enough, due to the lack of diversity thatâs prevalent within the library workforce andlibrary collections (Dali and Caidi). Additionally, libraries articulate diversity as a core value but have not developed methodologies that would align practice with professional values (see Espinosa de los Monteros and Enimil; Dali and Caidi).
We therefore argue that in order to maintain the cadence of EDI work, library professionals must be intentional about their approach to collection development and management. Without intentionality, we fear that EDI work may continue to evolve slowly. According to Dr. Martin Luther King Jr., âJustice too long delayed is justice deniedâ (839).Â
The work of the DBF is beneficial to libraries in two ways. At a granular level, the DBF can support selection decisions related to diverse content. Through the DBF Collection Analysis Tool (CAT), libraries can access a snapshot of their collections in order to determine where diversity gaps exist in terms of the representation and presentation of BIPOC communities. Also, the metadata undergirding the DBF work provides a shared EDI language specific to childrenâs literature, potentially making it easier for all library professionals, BIPOC community members, and allies to talk about EDI in childrenâs literature writ large.
Second, the DBF CoP serves as a model for recruiting, engaging, and sustaining large groups of library professionals in diverse collection development practices and other EDI activities. Our model achieves measurable outcomes and encourages collaboration among library workers and/or volunteers from different cultures, ethnicities, and backgrounds, with varying levels of professional experience and types of expertise.Â
Literature Review and Theoretical Framework for Our Work
Our leadership group grounded the design and implementation of the CoP in feminist pedagogical theories that have been developing for over 25 years. In a feminist classroom, students and teachers work together to achieve mutual goals through âcollaboration, community building and validating knowledge based on experienceâ (McCusker 445). In developing our training plan and throughout the training program, we placed significant emphasis on personal lived experiences and translating those experiences into learning opportunities to effect social change. Rather than insisting on a traditional academic model that centers expertise with a clear head of the classroom, we chose to create an environment with a shared responsibility for learning between facilitators and learners since we had much to learn from one anotherâs unique positionalities (Tedesco-Schneck 267, Grissom-Broughton 166). Having multiple voices involved in planning, during the training program, and in interpreting DBFâs metadata allowed us to decenter authority and power, a necessary condition for EDI work. We also provided ample opportunities for continual reflexive practices in order to analyze intersections of oppression and how these intersections play out in our reading of texts (Grissom-Broughton 171, McCusker 456).Â
Another crucial aspect to our model was a pedagogy of care, which involves âan approach based on an ethic of care as both a moral imperative and pedagogical necessity (Gay, 2018)â (Barek et al). Pedagogy of care theories stress the relationship between teacher and student with an emphasis on mutual respect and authentic dialogue with compassion, reciprocity, and positionality. Focusing on our learners from a perspective of radical compassion, in which we try to relieve causes of distress and discomfort, allowed all of us, facilitators and learners alike, to center radical self-care (Ravitch 6). In doing so, we could âlovingly revise parts of ourselves as a necessary dimension of our work to re-envision and reconstruct the world from a perspective of equity, social identity, and liberationâ (Ravitch 6).Â
Upon reflection after the initial training program was complete, we saw clear connections between meaningful learning and our initial roots, intellectual partners, and intentions. After all, meaningful learning occurs when âlearners are active, constructive, intentional, cooperative, and working on authentic tasksâ (Jonassen 49). In particular, our focus was intentional and goal-oriented with an authentic task of coding approximately 2,380 books within a year, so that library professionals and readers could make more informed decisions about book selection. In order to support meaningful learning, our training program was inherently cooperative to help all of us, facilitators and learners, solve problems and generate new knowledge. Like a feminist classroom, âthese characteristics are interrelated, interactive, and interdependentâ (Jonassen 51).Â
However, we also felt that meaningful learning and some of the other frameworks we drew upon were more limited in their depictions of the relational aspects of learning and teaching than what we were striving for in our model. Living up to the expectations of facilitators and learners and developing authentic, honest, and caring relationships are essential to the reparative work often involved in EDI projects and partnerships. Traditionally, student-teacher relationships have been viewed from a binary positive-negative emotional response, but studies relying on this binary âdo not consider the interplay between the emotions of student-teacher relationship and the cultural and social organization of interactionâ (Tormey 994). Our relationships to others are influenced by the implicit biases we all carry with us and bring to our perceptions of others. Thus, in preparing an EDI focused training program, itâs necessary to fully understand the relational aspects of how people learn and how people teach. Relationally, there are multiple levels of engagement in a training program: between facilitators; between learners; between learners and facilitators; and between learners, facilitators, and the materials under analysis.
Each of these relational levels requires attention not only to others but also to ourselves. Kathleen M. Quinlan outlined different relational levels in the classroom and noted that âeducation is relational, and emotions are central to relationships. ⊠how we feel with and about others are central to the quality of our relationshipsâ (102). In maintaining expectations and authentic, caring relationships, we create a relational third space for action, thought partnership, and empathy (Ravich 4).
Planning and Recruitment
Leadership TeamÂ
When developing an EDI project in librarianship, the leadership team is responsible for the planning, promotion, and implementation of the projectâs objectives, so determining the members of this team is crucial. While the DBF has multiple teams working on various aspects of the project, the CoP Advisory Group (CoP AG) is a team of seven (originally eight) members from a variety of professional backgrounds, lived experiences, and positionalities. Two of the original DBF founders and a former DBF project manager are a part of our group, each bringing significant experience in applying the specific methods of DBF book analysis to picture books. Four new members joined the group as part of the expansion into early readers, middle grade, and young adult literature. This combination not only allowed for cohesion between the two phases of the database but also for flexibility in considering new interpretations of the DBFâs metadata and diverse childrenâs literature and audiences.Â
Academically, our group members are experts in psychology, librarianship, childrenâs literature, and gender and sexuality studies, and one is an award-winning author-illustrator of childrenâs books. Most importantly, however, each member brought experience in and with EDI work from various vantage points, whether as BIPOC and/or with expertise in working with minoritized populations. Collectively, we were grounded in interdisciplinary feminist, critical race, anti-racist, gender, and sexuality theories. Since our individual thought partners and lived experiences varied, each member brought an invaluable, unique perspective to EDI work and a shared respect for discussion, collaboration, and willingness to learn and work by consensus.Â
Planning for a Virtual Experience
After creating the leadership team, an integral part of the planning process included preparing a program that would function well as an entirely virtual experience. We knew we would be recruiting volunteers from across the United States and Canada and that our volunteers would be coming from a variety of backgrounds with a wide array of scheduling needs. This meant that we wanted to be as intentional as possible in creating a training structure that would cater to the greatest number of participants. Planning for this reality involved three major components: creating a flexible training structure; encouraging consistency and active learning; and creating and providing easily accessible training materials.Â
To allow for a variety of schedules and time zones, we focused on building a training course structure that was intentionally flexible. Dividing the training sessions into Large Core Classes (facilitator-led instruction) and Small Group Sessions (group-led discussion) allowed us to provide a training that could be both asynchronous (the recorded Large Core Classes) and synchronous (Small Group Sessions). In addition to providing flexibility for scheduling, dividing the training into these two types of experiences also furthered our goal of following a feminist pedagogical model that decenters expertise by allowing us to place equal importance on both facilitator-imparted knowledge (Large Core Classes) and group learning through more personal interactions and discussions (Small Group Sessions).1
Furthermore, the synchronous sessions were offered on various days and times throughout the week, and participants were invited to select which session would work best for them rather than being assigned a session. This attention to flexible scheduling encouraged engagement and prevented scheduling conflicts from prohibiting participation, which made the involvement of such a large group of volunteers sustainable. In October 2022, we started our first CoP with 76 volunteers and by August 2023, we retained 63 volunteers, with the departing volunteers leaving for various personal reasons rather than for reasons related to our shared work. Of the remaining 63 volunteers, 37 expressed interest in continuing work with the DBF even after their initial one-year term was completed.Â
Throughout this process, it was necessary for the CoP facilitators to support the work of incoming coders and to ensure that the overall scope of our work was meaningful and produced tangible results. One way in which we fostered these goals was through an emphasis on consistency. With such a large group of learners being split among various Small Groups centered on discussion, we wanted to make sure that each learner received consistent messaging and training. To this end, facilitators met weekly. During these weekly meetings, facilitators reviewed how the prior weekâs class and discussion sessions had gone and considered how we could best create ongoing active learning opportunities that would enrich and reinforce the knowledge constructed during the Large Core Classes. This constant loop of feedback between the facilitators, as well as between the facilitators and learners, helped us to create spaces where participants had consistent structure and support and thus felt comfortable engaging in sensitive dialogue around topics of race, culture, and identity.
Once the course structure and schedule were finalized, it was also important to provide training materials and instructions in such a way that they would be available to all participants, regardless of when they were working on their assigned tasks. Thus, we developed our training materials using the Google Suite of products which suited our need for software that provides user friendly and accessible collaboration at no cost. With Google Sites, we created both an online instructional manual and a âTraining Base Camp,â which served as a resource center for learners and facilitators. The base camp stored all the important documents, links, and forms that a learner or facilitator might need to access during the training program and their year of coding. We also used Google Forms to create âQuestion Submission Formsâ so that participants had the opportunity to submit questions on a rolling basis without having to wait for the next session. This created a loop of constant feedback between the learners and facilitators that was both cooperative and flexible. Â
Recruitment
Once a flexible and adaptive course structure was established, we turned our attention to volunteer recruitment. Given our focus on diversity and inclusion and our goal of creating a widely diverse CoP, we guided our recruitment efforts towards library organizations that included participants who already had some experience with diversity in childrenâs literature and with metadata. We also focused our efforts on recruiting volunteers from professional organizations that already had a stated diversity focus.2
Our strategic goal of recruiting diversity-minded individuals was reflected in the creation of our application materials, as well as where we shared them. The short-answer questions on the application asked potential participants to reflect on the importance of diversity in childrenâs literature and on how their own identities and positionalities might influence how they interact with literature and other participants. By intentionally directing our recruitment strategy and materials towards library professionals already engaged in EDI work, we aimed to create a group that was both diverse and already familiar with some of the concepts addressed in our training. We asked applicants to provide a resume, and the application included an optional Lived Experience survey through which they could disclose their races and ethnicities, as well as a number of other identities, such as gender, sexuality, religion, and ability status.Â
The success of our recruitment efforts can be seen through both the professional and demographic diversity that was achieved within our participant cohort. Our first cohort from 2022-2023 included a wide array of academic, public, and school librarians, as well as students working towards a Masterâs degree in Library and Information Sciences, hailing from 31 states and Canada. Of our 63 remaining CoP members, 48 completed the Lived Experience survey. Of these individuals, 26 self-identified as BIPOC (41%), whereas among US credentialed librarians as a whole, the percentage identifying as BIPOC is only 12% (âDiversity Countsâ).Â
Implementation and Responsiveness
Facilitators and Learners
We intentionally created a community of learning in which learning was an authentic, active, social process for all of us. As members of the CoP AG, we chose to call ourselves facilitators because we wanted to emphasize that none of us are â or can be â experts in all aspects of EDI work, particularly as EDI work is constantly evolving. Just as members of our advisory group learned continually from one another, we knew we would learn from the cohort members, too, particularly if we invited them to bring their whole selves to the program and share their insights and expertise. We referred to the new cohort members as learners, but they quickly became co-facilitators, helping to shape the training program and some of the coding work itself.
From our own experiences, we knew that learning all the DBF terminology and identifying the categories and tags in books takes time and can initially feel overwhelming, so we scaffolded the training, with the overall coding process broken down into smaller, more manageable sections. Learners developed their coding skills and knowledge over multiple weeks, with new sections of coding added in consecutive weeks and plenty of time for questions and connection building between sections. We also practiced coding with multiple books of varying genres, formats, and intended audiences. We provided the reading assignments in advance, so learners could incorporate the work into their already busy schedules.
Within this carefully designed training structure, we also incorporated flexibility. As anticipated, the work evolved based on our learning communityâs feedback and needs. We provided written and oral feedback opportunities, both through formal surveys and informal discussions in Small Group Sessions, and we quickly responded to feedback. For example, several weeks into the first training, we centered the Small Group Sessions even more fully on discussions of book coding and specific questions that arose rather than reviewing material from the Large Core Classes, and we lengthened the training program for the second cohort based on input from the first cohort.Â
As learners expressed their fears of coding âincorrectly,â we increased our refrain about there being no one ârightâ way to approach EDI issues through book coding; we all code based on the evidence we find in the books and our lived experiences. Our training focused on providing consistent information, guidance, and messaging, and part of our recurrent messaging was that our diversity of experiences and lenses would lead to some valid, different interpretations of material. Through discussions about how and why we coded a book, we learned from one anotherâs positionalities and perspectives and were able to perceive new ways of interpreting the books and the DBF terminology.
Responsivity
At the start, we sought to create an inclusive, compassionate, affirming, and humanizing learning environment. Initiating our work with a growth mindset, we discussed the difference between safe spaces and brave spaces (see Arao and Clemens), inviting everyone to embrace the challenges inherent to EDI work and consider what they needed to do so. As a first step, we introduced the community agreement created by the Association for Library Service to Children (ALSC) in the first Large Core Class, asking learners to consider its elements and propose revisions, additions, and/or amendments that would allow them to step into a brave space. The ALSC agreement and revision suggestions were reviewed in Small Group Sessions the following week. One suggestion was to clarify how we would identify and manage observations of something oppressive being said or done in the group. We discussed this revision as a leadership team, suggested language, then brought it back to the small groups for further discussion and elaboration before sharing it again in the Large Core Class. Once affirmed in the large group, the agreement was finalized and posted on our shared online training base camp. (See Appendix for the agreement.) The whole process was completed in two weeks. This important community building exercise let our learners understand that we saw them as partners, experts capable of contributing to our shared work. It also provided support for our shared goal of authentic, active participation. It is interesting to note that we never had to return to or invoke the community statement, even though we engaged in numerous conversations about hot topics in childrenâs books, diversity, and librarianship together.
We also received feedback from those coding stories featuring Indigenous characters, including from those who identified as Indigenous, reporting that they werenât able to generate what felt like complete and accurate summaries of books featuring Indigenous people. For instance, the metadata meant to capture religious or spiritual experiences was lacking. Using this feedback, we engaged in a larger project, inviting tribal librarians and others with knowledge, skill, and/or lived experience to participate in metadata revisions. The result was the conceptualization and vetting of multiple new tags in collaboration with CoP members and other experts nationally who were part of their networks. Moreover, the experience further conveyed our commitment to shared expertise, authenticity, and active participation, deepening our knowledge and relationships within and beyond the DBF group, leading to one CoP learner agreeing to become a co-facilitator in the 2023-2024 training program.Â
Conclusion
Developing a sustained volunteer-based EDI program or sustained committee work related to EDI requires a multi-theoretical and dimensional approach that questions and begins to erode systemic and institutional barriers to integrative and collaborative working partnerships. In our approach, we used pedagogies of feminism and care and meaningful learning that allowed us to translate theory into practical application and move beyond the conversational and performative aspect of EDI work often seen in libraries. The success of our program, as well as the theories through which we formulated our training goals and structure, is exemplified through the continued involvement and commitment of our first volunteer cohort and their expressed comfort in communicating and learning with our facilitators.
As we build on our success and begin our training program with a new cohort, we continue to add greater structure to our CoP AG conversations and practices and focus on the most essential elements of our program.
Integrate personal lived experiences into the learning environment and consider all the relational levels present to avoid imposing an artificial boundary between professional and self-knowledge.
When learners and facilitators express their needs, listen and respond carefully, trusting that people have good intentions and know what will most benefit and support them.
Allow flexibility for shifts in response to the cohortâs needs and use the ongoing reflection to intentionally and steadily move from contemplation to action.
We hope that reflecting on the following questions will guide and enhance your work as you consider your next steps in creating and/or sustaining intentional and authentic EDI programs that challenge the status quo.
Guiding Questions for a Community of Practice for Volunteer-Based EDI Work
Forming and Supporting Your Leadership Team
What knowledge and positionalities do members of your leadership team have? What knowledge and positionalities are lacking? Are you being honest about what you, as individuals and a group, know and what you donât know? How will you fill in any identified gaps?
Have you built in time for facilitators to reflect individually and check in with each other throughout the program to ensure connection, alignment, and consistency?
Recruiting ParticipantsÂ
How will you recruit participants? How will your methods ensure a diverse group?
How will you invite program participants to engage fully and authentically?
What special considerations are needed to sustain participants from marginalized communities?
Creating Your Program StructureÂ
How will you design your program to avoid life/logistical barriers to participation?Â
What methods for decentering authority and power between facilitators and learners and between learners are you utilizing? How are you making your intentions around this practice clear?Â
Who are your thought partners? What guiding frameworks will you use to inform program development?
How will you celebrate different positionalities, which are key components of a successful program?
How will you build in the flexibility to readily adjust your program, based on your learnersâ and facilitatorsâ needs?
Implementing and Evaluating Your Program
How are you listening and responding to feedback? Are your learners able to see how you are listening and responding? Can they see their real-time impact on the work you are doing together?
What kinds of collection methods will inspire the most honest and complete feedback from participants to allow a full assessment of your program? How and when will you collect this feedback?
We would like to thank the other current members of the Diverse BookFinder Community of Practice Advisory Group, Anne Sibley OâBrien and Andrea Breau, as well as past member Marianne Williams, for envisioning and building this community of practice with us. We would also like to thank the other DBF team members and Community of Practice cohort members for all the work they do to make the DBF possible and accessible to users. Finally, many thanks to our reviewers Ikumi Crocoll and LaKeshia Darden and our editor Jaena Rae Cabrera for helping to shape our writing. We appreciate the time and labor that went into improving this article and connecting it with readers.Â
References
âALSC Community Agreements.â Association for Library Service to Children, 2020,Â
âAmerican Library Association reports record number of demands to censor library books and materials in 2022. â American Library Association, March 22, 2023, www.ala.org/news/press-releases/2023/03/record-book-bans-2022. Accessed 8 Sept. 2023.
Arao, Brian and Kristi Clemens. âFrom Safe Spaces to Brave Spaces: A New Way to Frame Dialogue around Diversity and Social Justice.â The Art of Effective Facilitation: Reflections from Social Justice Educators, edited by Lisa M, Landreman, Stylus, 2013, pp. 135-150.
Aronson, Krista Maywalt, Breanna D. Callahan, and Anne Sibley OâBrien. âMessages Matter: Investigating the Thematic Content of Picture Books Portraying Underrepresented Racial and Cultural Groups,â Sociological Forum, vol. 33, no. 1, 2018, pp. 165-185, https://onlinelibrary.wiley.com/doi/10.1111/socf.12404. Accessed 8 Jan. 2024.
Cummins, June. âThe Still Almost All-White World of Childrenâs Literature: Theory, Practice, and Identity-Based Childrenâs Book Awards.â Prizing Childrenâs Literature: The Cultural Politics of Childrenâs Book Awards, edited by Kenneth B. Kidd and Joseph T. Thomas, Jr., Routledge, 2017, pp. 87-103.
Dali, Keren and Nadia Caidi. âDiversity by Design,â The Library Quarterly, vol.87, no. 2, 2017, pp. 88-98.
Davis, Angela et al. Abolition. Feminism. Now. Chicago: Haymarket, 2022.Â
Espinosa de los Monteros, Pamela and Sandra Enimil. âDiversity, Equity, and Inclusion in Action: Designing a Collective DEI Strategy with Library Staff.â Diversity, Equity, and Inclusion in Action: Planning, Leadership, and Programming, edited by Christine Bombaro, ALA Editions, 2020, pp. 13-27. Â
Grissom-Broughton, Paula A. âA Matter of Race and Gender: An Examination of an Undergraduate Music Program Through the Lens of Feminist Pedagogy and Black Feminist Pedagogy.â ResearchStudiesinMusicEducation, vol. 42, no. 2, 2020, pp. 160-176.
Jonassen, D. H. âExternally Modeling Mental Models.â LearningandInstructionalTechnologiesforthe21stCentury: VisionsfortheFuture, edited by Leslie Moller, Jason Bond Huett, and Douglas M. Harvey, Springer, 2009, pp. 49-74.
McCusker, Geraldine. âA Feminist Teacherâs Account of Her Attempts to Achieve the Goals of Feminist Pedagogy.â GenderandEducation, vol. 29, no. 4, 2017, pp. 445-460.
Quinlan, Kathleen M. âHow Emotion Matters in Four Key Relationships in Teaching and Learning in Higher Education.â College Teaching, vol. 64, no. 3, 2016, pp. 101-111, DOI: 10.1080/87567555.2015.1088818.Â
Tedesco-Schneck, Mary. âClassroom Participation: A Model of Feminist Pedagogy.â NurseEducator, vol. 43, no. 5, 2018, pp. 267-271.
Tormey, Roland. âRethinking Student-Teacher Relationships in Higher Education: A Multidimensional Approach.â Higher Education, vol. 82, 2021, pp. 993-1011, DOI: 10.1007/s10734-021-00711-w.
Appendix
Community Agreements for the Diverse BookFinder Community of Practice
Thank you to the Association for Library Service to Children (ALSC) for widely sharing their community agreements, which we have drawn on heavily in this document.
Diverse BookFinder Community Agreements:
These community agreements were developed so that all meetings/classes convened by Community of Practice members/facilitators of the Diverse BookFinder (DBF) are spaces where meaningful and respectful conversations are held. The agreements outline best practices to ensure that everyone has an opportunity for expression, accountability, and growth.Â
They provide a guide to how topics are discussed, the language used, and how our different experiences, identities, and knowledge are reflected in our thought processes, discussions, and decisions. As you participate in discussions, meetings, presentations, etc. please use these guidelines as a starting point and as a group; add additional agreements if necessary.
â Speak for yourself. Use âIâ and be aware that your perspective is not everyoneâs perspective or the ânormalâ perspective.
â Embrace multiple perspectives to engage in curiosity-driven dialogue (not debate or argument). Have compassion for and honor peopleâs varied journeys while respecting their humanity. The goal of dialogue should not be to change anyoneâs mind but to offer and receive a perspective for consideration and curiosity. Even if your every cell feels in disagreement with someoneâs perspective, right and wrong binaries rarely build connection and understanding. Do note that racism, bigotry, and all other forms of oppression are not a difference of opinion and will not be tolerated.
â Be aware of the privilege, oppressions, and life experiences you carry and how they might impact your discussion process.
â Listen to and use peopleâs correct names and pronouns. Let people know how you would like to be addressed during introductions and include pronouns if you would like. If pronouns are not shared or if you are unsure of someoneâs pronouns, refer to the person by their name.
â Share the air. Be aware of how much you are talking versus listening. Challenge yourself to invite others into the conversation and âstep upâ if you are prone to not participating. We all have something to bring to the discussion.
â Interrupt attempts to derail. Oftentimes, discomfort is so great that we immediately attempt to change the conversation to something that feels more comfortable. Before you know it, the conversation is about the weather when we were talking about equity. Work to stay engaged when you feel uncomfortable and make mistakes (this is when learning happens).
â Acknowledge intent while addressing impact. Work to not personalize the responses of others while taking care to be mindful of the impact of our words and our actions on others. Understand that intent does not equal impact and acknowledge the impact of something that was said or done during the conversation (or break) by criticizing ideas and not individuals.
â Interrupt bias and take feedback. We want to cultivate a space for everyone to learn, to be wrong and unlearn, to be accountable and change. We recognize that this process always happens in relation to each other and so can and will be hard. Itâs also important to us that the necessary labor of creating this space does not fall on the same bodies. In order to hold the systems and structures of power that create harmful ways of relating to each other accountable, this work requires careful intention, thoughtfulness, creativity, and experimentation. We will not always get it right, but if we do this work collectively, we can move forward together.**
Self and community/collective accountability are essential to our work together. If you observe something oppressive being said or done (by yourself or others), please acknowledge it. [For example, âouchâ and âoopsâ and âohâ are words that can be used to acknowledge moments when you recognize something oppressive is said (âouch,â âoh,â or another term) or you notice a mistake that youâve made (âoops,â âoh,â or another term). Of course, you can raise a topic for discussion without using these terms as well.]
During Large Group Classes: A facilitator will be identified as âChat Moderatorâ for the evening. If you would like to bring an experience or example of bias to the attention of the group and have it addressed in some way, please use the chat to privately message the Chat Moderator and let them know.
During Small Group Discussions:Â If you would like to bring an experience or example of bias to the attention of the group and have it addressed in some way, please use the chat to privately message your facilitator and let them know.
In either case, facilitators may address the moment immediately or they may ask for some grace and the opportunity to further reflect (and receive guidance) on how to best address the situation.
Please remember that everyone (your facilitator included) is human. As we experience feedback about bias, it is our personal responsibility to keep learning. However, that learning may require deeper dialog, reflection, and/or time.
 **Ideas adapted from Angela Davis et alâs Abolition. Feminism. Now. (which centers the tools/strategies of transformative justice and community accountability).
â Remember that we all have opportunities to grow. Feedback is a gift of experience and expertise, and it acknowledges that learning is complex and never-ending. Receive it and consider systems of dominance and power at play in community conversations and interactions. Be aware of the lenses you do and do not have as a result of your identities and experiences.
â Whatâs said here, stays here. Whatâs learned here, leaves here. DBF meetings should be a safe place where people can feel free to be vulnerable and share things about their identities. No one should have to worry about these things being discussed outside of DBF. But take other DBF knowledge and learning with you!
â On Cameras: Connection is crucial. As we move through the process of training and discussion, we will be interrogating challenging and sensitive topics. We aim to provide the most open, productive, and engaged spaces in which to do this while still considering the flexibilities often required by real life. We believe that being on camera allows us to best build the connections and trust required to fully engage in these conversations. While we suggest that your camera remain on during all of your DBF sessions, cameras will be required during small group sessions.
If you need to turn your camera off temporarily, please turn it on as soon as possible.
If you are having technical difficulties or need to leave your camera off for an entire small group session, please communicate that with your facilitator.
Occasional interruptions from guest stars such as dogs, cats, other furry/feathered/scaly friends, children, roommates, partners, parents, coworkers, doorbells, and food deliveries are a normal part of virtual meetings and working from home and will be expected.
â Reach out before conflicts get worse. All of our facilitators are skilled educators, whether in higher education or as professional speakers, with particular expertise with facilitating conversations about race, ethnicity, and culture. Every effort has been made to create a solid footing for this work, with the goal of creating a brave space where we can all learn and grow together. Even so, conflicts may emerge during the course of our work. When that happens we would first like you to reach out to your small group facilitator. Our hope is that together you can discuss the matter and work towards resolution.
If you experience conflict with your small group leader, please reach out to Krista A. or Lisely L. for assistance.
Sources:
Much of the language above is borrowed from the following organizations/documents:
During the seven-week training program, learners and facilitators participated in weekly Large Core Classes and Small Group Sessions. After the training program ended, we continued to hold monthly Small Group Sessions to discuss new coding questions and share insights and approaches. The monthly Small Group Sessions provided ongoing consistency and community and continued our collaborative approach to learning from one another.
In our recruitment efforts, we promoted the DBF CoP to members of the following library organizations:Â the American Library Associationâs Ethnic and Multicultural Information Exchange Round Table (EMIERT); the Black Caucus of ALA (BCALA); the Association for Library Service to Childrenâs Equity, Diversity, and Inclusion Implementation Task Force; the American Association of School Librariansâ Diversity, Equity and Inclusion Community of Practice; REFORMA: The National Association to Promote Library and Information Services to Latinos and the Spanish Speaking; the Asian Pacific American Library Association (APALA); the American Indian Library Association (AILA); the Rainbow Roundtable (RRT); and the Association of Jewish Libraries (AJL). We also shared the opportunity with the Association for Library Service to Children (ALSC), the Young Adult Library Services Association (YALSA), and divisions of the National Council of Teachers of English (NCTE).
The followingâŻpost is part of an ongoing series about the OCLC-LIBER âBuilding for the futureâ program. A Dutch version of this blog post is also available.
The OCLC Research Library Partnership (RLP) and LIBER (Association of European Research Libraries) hosted a facilitated discussion on the topic of data-driven decision making on 7 February 2024. This event was a component of the ongoing Building for the future series exploring how libraries are working to provide state-of-the-art services, as described in LIBERâs 2023-2027 strategy.
The OCLC RLP team worked collaboratively with members of the LIBER Research Data Management  and Data Science in Libraries working groups to develop the discussion questions. Like our earlier discussion on research data management, we tried to keep things practical, asking participants to share about current and future efforts, and to contribute their thoughts on the role and value of the library in supporting data-driven decision making. Small group discussions were facilitated by generous volunteers from LIBER working groups and OCLC.
The virtual event was attended by participants from 35 institutions across 15 countries from Europe, North America, and Asia. Despite many regional and national differences, there were several key themes that surfaced across the seven breakout discussion groups, which is synthesized below.
What does âdata-driven decision makingâ mean for libraries?
We asked participants this question in a virtual poll, and we reached fairly strong consensus that data-driven decision making means âusing evidence to inform decisions and evaluate their outcomes.â While we framed this discussion using the phrase âdata-driven,â we recognize that others prefer âdata-informedâ or âdata-conscious.â
Indeed, while the conversations recognized the value of using quality data to inform decisions, we also heard cautionary comments that data should be considered as a decision support tool. Data should be used within context, and users should not use data to the exclusion of other qualitative ways of knowing.
Online poll responses to question about the meaning of âdata-drivenâ decision making
How are libraries supporting data-driven decision making?
There are dozens of ways that libraries are supporting data-driven decision making. We heard from participants who described collective collections efforts, where a group of libraries is working together to manage their combined holdings, to support collection retention decisions, and more. Additionally, borrowing statistics can be used to inform both collection development and weeding decisions.
Beyond collections, participants described analyzing library building usage data (such as gate traffic and wifi usage) to measure the busyness of spaces, to inform space management decisions.
Participants also described the growing role of the library in research analytics, in support of institutional goals. In the UK, the library is usually responsible for managing data about the institutional scholarly record, for reporting to the national Research Excellent Framework (REF) assessment exercise. Elsewhere, library workers are supporting institutional efforts to understand research productivity, progress toward open research goals, and identify potential collaborations. And, of course, libraries are creating specific roles to manage a wide variety of data and make it available for reuse, the topic of a recent LIBER interview with Matthias Töwe, Data Curator at ETH Zurich Library.
Supporting data-driven decision making is challenging
Libraries are awash in data. Several participants described the feeling of being overwhelmed by all the data available, with the sheer volume making it challenging to manage, clean, and use effectively. At the same time, it can be difficult to even know what data is available, because it is spread across many silos within the organization. Greater organization and transparency are necessary.
Collaboration is required, regardless of scale. Multi-institutional collective collections analyses demand significant investment and commitment from a wide variety of stakeholders across many institutions and library units. Even when seeking an answer to local operational questions, where , as one participant noted, âwe need certain bits of data from other people,â library workers must apply social interoperability to get work done.
Users asking for data and reports are often unable to clearly articulate what they need. This is apparently such a widely felt pain point that it was the #1 response to our online poll about the tensions and challenges of collaboration around data-driven decision making. One small group discussed the need to repurpose âreference interviewâ skills to interview data consumers in order to clarify the questions they are seeking to answer.
What is the value proposition of the library for data-driven decision making?
We asked the small groups to discuss the overarching value proposition of the library in supporting data-informed decisions, and several themes emerged across the group discussions:
Libraries know metadata. The skills and knowledge that metadata librarians hold about library data is invaluable for managing collections. . . and more. This metadata expertise is clearly a strength, but one that may be easily overlooked, requiring improved messaging to non-library audiences. One participant expressed concern that library expertise is too easily dismissed because it was seen as âjust books,â without recognizing the transferability and value of these skills, such as experience with complex enterprise systems, proficiency with data management, and the consistent application of rules, standards, and policies.
Libraries use data to responsibly steward resources. Shared print and collective collections activities rely upon aggregated library holdings data to make decisions about collections development, retention, and long term and cost effective stewardship of the scholarly record. Several participants also described how data about both collections and library building usage has been leveraged to make decisions about future space utilization. Libraries also need âto show that we are making good use of [campus] resources, so that they will continue to fund us.â
Research support services that extend beyond the library are highly visible to other campus stakeholders. Library support in areas like research data management, research intelligence, and managing data for national reporting requirements, in alignment with campus strategic priorities, often offer the greatest visibility to non-library stakeholders. For example, participants from the UK and Hong Kong described the central role of the library in collecting the scholarly record of the institution, to support national reporting requirements and provide analysis of the output and impact of institutional scholarship. A Canadian participant described their creation of a bibliometrics librarian who now leads an informal network of business intelligence officers across the university, providing decision support about compliance, assessment, and funding. Libraries are also exploring how they can define a set of indicators that will provide insights into open research activities, as described in a recent RLP webinar presentation by Scott Taylor at the University of Manchester.
What are some strategies libraries can use to demonstrate this value proposition?
Library leaders must advocate for the libraryâs role. We heard many examples of libraries providing institutional decision support. However, it can still be a challenge for non-library stakeholders to recognize the library as strong contributor, and participants echoed a concern we heard in the previous facilitated discussion on research data management: âPeople donât think of the library.â Library leaders should be relentless in advocating for the knowledge and skills of library workers, guiding campus partners to conceptualize the library in new and modern ways.
Use a âpurpose treeâ to codify value and communicate internally and externally. A UK participant in one small group discussion shared how she had created a purpose tree for her metadata team, which included a high level vision and strategy statement about their activities and how they contribute to library and university strategy. The document helped demonstrate that catalogers werenât just âsitting in the corner and going through books,â but that they played a vital role in stewarding quality metadata, supporting an array of business needs. The other small group participants expressed sincere enthusiasm for this idea, and it seems to offer a framework for team building and strategic goal alignment.
Visualizations and data storytelling is required. A strong theme throughout the small group discussions was that the data itself is not enough. Librarians must also develop data storytelling skills and leverage visualizations in order to effectively communicate and create enthusiasm for the data findings.
Library workers must upskill, both individually and in teams. Library workers bring significant skills to managing data, but they often lack training in data analysis, including tools like PowerBI and Tableau. Participants shared many stories of how they are acquiring these skills. For example, a Hong Kong participant described how her institution formed an interest group to explore data analysis and build skills, enabling participants to learn from each other in a supportive environment. Another participant from the Netherlands described a similar effort, where their local working group is learning data visualization skills and building a broader community of practice. In general, participants expressed the need not only for the upskilling of existing staff, but the future onboarding of staff members with mature technical data analysis skills.
Word cloud summary from event polling
We concluded the event by inviting participants to share one word about how they felt, and they reported feeling inspired, informed, and encouraged.
Join us for the upcoming facilitated discussion on AI, Machine Learning, and Data Science
The next discussion in this multi-part series on state-of-the-art services will take place on 17 April, where we will collectively explore the challenges and opportunities of AI, machine learning, and data science. The session will focus on the ways that research libraries are using (or want to use) advancing technologies to improve library workflows, metadata, and more. By facilitating structured small group discussions, we are inviting participants to ideate and share about their future visions for AI and data science, while also purposefully exploring the challenges libraries face in leveraging emerging technologies responsibly. Register today to save your spot.
The news
about Google open sourcing its new âAIâ driven file format
identification tool magika made a splash in the
usual tech places recently. This post provides a very quick look at just
one file through the lens of three file format identification tools, and
gestures a bit about what we are giving up when we give in to big techâs
machine learning models.
Andy Jackson
from the Digital Preservation Coalition has a good
post about how magika is quite limited in terms of the
formats it identifies, and the types of information it reports. He also
points out that itâs important to remember that Google created magika to
help route files to specialized security scanners in Gmail and Google
Drive, which is quite different from digital preservation use cases. In
digital preservation the concern is usually around mitigating perceived
obsolescence of file formats, and also determining what applications can
be used to render the file, both of which require knowledge not just of
the format but also its version.
So, hereâs a quick comparison of looking at one TIFF file using
magika, the venerable file Unix
command, and siegfried, which is
a specialized tool developed by and for the digital preservation
community. Think of this as a close reading of
tools for file format identification, to try to discover or illustrate
something significant in the details of their output, rather than a
statistical overview of what the tools do more generally.
$ magika MCE_AF2G_2010.tif
MCE_AF2G_2010.tif: TIFF image data
Awesome file, thanks for the extra information about the
dimensions, compression, and colors.
$ sf MCE_AF2G_2010.tif
---
siegfried : 1.11.0
scandate : 2024-02-20T18:01:54-05:00
signature : default.sig
created : 2023-12-17T15:54:41+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V116.xml; container-signature-20231127.xml'
---
filename : 'MCE_AF2G_2010.tif'
filesize : 1484121
modified : 2024-02-08T10:07:46-05:00
errors :
matches :
- ns : 'pronom'
id : 'fmt/155'
format : 'Geographic Tagged Image File Format (GeoTIFF)'
version :
mime : 'image/tiff'
class : 'GIS, Image (Raster)'
basis : 'extension match tif; byte match at 0, 186 (signature 1/4)'
warning :
Niiice siegfried, this is important! The TIFF file isnât just
an any image file, itâs a GeoTIFF file. If we
were to open the TIFF file in a regular image viewer like Preview on
MacOS weâd see this:
But since we know itâs a GeoTIFF we can also view it in GIS software
like QGIS:
And we can use other tools like gdalinfo to look at
metadata in the file:
â tmp gdalinfo x.tif
Driver: GTiff/GeoTIFF
Files: x.tif
Size is 7460, 3724
Coordinate System is:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (56.249996410171626,38.166360030600401)
Pixel Size = (0.002160683455290,-0.002160683455290)
Metadata:
AREA_OR_POINT=Area
DataType=Thematic
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 56.2499964, 38.1663600) ( 56d14'59.99"E, 38d 9'58.90"N)
Lower Left ( 56.2499964, 30.1199748) ( 56d14'59.99"E, 30d 7'11.91"N)
Upper Right ( 72.3686950, 38.1663600) ( 72d22' 7.30"E, 38d 9'58.90"N)
Lower Right ( 72.3686950, 30.1199748) ( 72d22' 7.30"E, 30d 7'11.91"N)
Center ( 64.3093457, 34.1431674) ( 64d18'33.64"E, 34d 8'35.40"N)
Band 1 Block=7460x1 Type=Byte, ColorInterp=Red
Mask Flags: PER_DATASET ALPHA
Band 2 Block=7460x1 Type=Byte, ColorInterp=Green
Mask Flags: PER_DATASET ALPHA
Band 3 Block=7460x1 Type=Byte, ColorInterp=Blue
Mask Flags: PER_DATASET ALPHA
Band 4 Block=7460x1 Type=Byte, ColorInterp=Alpha
So if we used magika we never would have known we could put the
image on a map.
⊠dramatic pause âŠ
But perhaps even more important is what we are giving up when
we rely entirely on a machine learning model, like what comes with
magika, instead of the hand crafted rules used by file
and siegfried.
We lose the ability to reason about the output. Why was one
format chosen and not the other?
We lose the ability to update the tool to recognize new formats or to
correctly choose other ones.
When you install the Python version of magika you get a few
files added to your Python environment:
âââ __init__.py
âââ cli
â  âââ magika.py
âââ colors.py
âââ config
â  âââ content_types_config.json
â  âââ magika_config.json
âââ content_types.py
âââ logger.py
âââ magika.py
âââ models
â  âââ standard_v1
â  âââ model.onnx
â  âââ model_config.json
â  âââ model_output_overwrite_map.json
â  âââ thresholds.json
âââ prediction_mode.py
âââ strenum.py
âââ types.py
The model.onnx file is what is used to determine what
format a file is in. It was generated by developers at Google who used
large amounts of data that they have, because theyâre Google, and
(presumably) a compute cluster, because theyâre Google. Itâs a binary
blob that you canât edit, or read as a human.
file and siegfried on the other hand use hand crafted
databases of âmagic codesâ to look for in order to determine what format
is likely for a file. Lots of time and effort have gone into creating
and maintaining them. The rules arenât perfect, or complete, but we do
know how to update and fix them.
We donât know all the details yet about how Google built their model,
but itâs quite unlikely that the dataset they used is going to be made
publicly available. I guess itâs possible (go on Google I dare you), but
even if they did you will need (potentially a lot) of compute resources
to be able to run the modeling itself. This means that not anyone can
build this âopensourceâ tool. The model data is available, but the way
to create it is not. Itâs basically like making binary executables
available without the source code.
If you find a new file format, or notice that a file format isnât being
recognized correctly you wonât be able to fix it, because fixing it
involves tuning the machine learning algorithm that was used, and
running it on an augmented dataset, which you donât have access to.
Funnily enough, siegfried and file have no idea what
the file format for model.onxx is. But magika says
itâs a âPython compiled bytecode (executable)â. After experimenting a
little bit it does seem like magika is able to distinguish
programming language source code and executables quite a bit better than
traditional tools.
So perhaps the reality we are in is that it might be useful to
have multiple perspectives on file formats, and that running multiple
tools could have uses. However the digital preservation community should
be careful not to throw the baby out with the bath water. Itâs important
that we are able to maintain our tools, and be able to understand why
they behave the way they do.
Update 2024-02-22: V pointed out to
me that it should be possible to fine
tune to the magika model, without requiring access to the original
corpus that it was trained on, and the compute infrastructure that was
used. This sounds promising, and I would actually really like to be
proven wrong here. But I remain concerned that while fine tuning might
be achievable, adding new file formats could prove
difficult, or at least beyond the ken of digital preservationists. Iâm
not against learning new things (this old dog can still be taught new
tricks), but replacing domain expertise in peopleâs brains for whatâs in
ML engineerâs brains is a real transfer of power that is underway at the
moment. Perhaps it has been underway for decades under the guise of
automation more generally (not just machine learning), but that is a
topic for another postâŠ
In early December 2022 when I wrote skeptically about the economics of Bitcoin mining in Foolish Lenders the Bitcoin "price" was around $17K. It has now climbed 153% to around $43K and, below the fold, I am still posting skeptically about the economics of mining.
Bitcoin miners are getting a jump on an anticipated decline in revenue from the so-called halving in April, when the blockchainâs network protocol will reduce rewards for verifying transactions by half.
Miner reserves â unsold Bitcoin held in digital wallets associated with the companies â have dropped by 8,400 tokens since the start of 2024 to 1.8 million, a level last seen in June 2021, according to data compiled by CryptoQuant. Analysts said the decrease indicates miners are selling tokens.
The somewhat misleading graph of the miners HODL-ings actually shows only a 2% drop in the number of BTC from the peak in August 2022. But the "value" of those HODL-ings has risen 75% from $44.8B to $78.5B.
One way of looking at it is that the mining industry started August 2022 with 1.865M BTC and, in the 18 months since mined 821,250 BTC for a total of 2,686,250 but they now have only 1.825M BTC so they must have sold 861,250, or about 5% more than they mined.
Nevertheless:
âMiners have begun to sell more of their coins to bolster balance sheets and fund growth capex ahead of tougher times for margins when block rewards are halved in April,â said Matthew Sigel, head of digital-asset research at VanEck. âAfter the halving, scale will matter even more.â
Before the great EFT pump, it was generally believed that the frantic efforts to pump BTC over $30K suggested that the mining industry's break-even point was around $30K. On 6th October BTC was $26K and the hash rate was around 412M TH/s. Lets assume that the industry was breaking even at BTC=$30K and a hash rate of 412MTH/s, so the industry's costs were covered by 45K BTC/month or $1.35B/month. Assuming the increase in efficiency is roughly cancelled by less efficient operators entering, the hash rate is now 527 TH/s and so costs are around $1.73B. But income is around $1.94B/month so margins are around 12%.
After the halvening, income is 22.5K BTC/month. At BTC=$43K, this is $968M/month or 56% of current costs. To maintain a 12% margin, costs need to be cut to $852M/month, or 49% of current costs. Alternatively, if costs stay the same, the Bitcoin "price" needs to increase to $86K in April.
The industry isn't going to halve its costs in the next three months, and even massive printing of unbacked Tether isn't going to double the BTC "price", so "tougher times for margins" are certainly among the clouds overhead.
Source
I am not the only skeptic. Short-seller J Capital Research investigated Hut 8, a large public miner, and on January 18th published The Coming HUT Pump and Dump. As the chart shows, it was effective. The report featured a list of 22 different red flags, focused mainly on the recent merger between Hut 8 (HUT) and U.S. Bitcoin Corp. (USBTC) and asking:
Why then did HUT pay $745 mln to acquire this company and its planned payments?
Hut 8âs North Bay mining facility has been non-operational for an extended period of time, and problems at its Drumheller facility âhave been causing miners to fail.â
has an industry-low efficiency rate and, post halving, will produce Bitcoin at a loss of close to $20,000 per Bitcoin at current spot prices.
In other words, the merged company can barely make money now and cannot survive when the industry's income is halved in less than 3 months. This all looks like rats leaving the sinking ship with whatever they can carry. An impression reinforced by David Pan in Bitcoin Miner Hut 8 CEO Exits Three Weeks After Short-Seller Allegations :
Hut 8 Corp., one of the largest publicly traded Bitcoin mining companies, named Asher Genoot to succeed Jaime Leverton as chief executive officer, three weeks after a short-seller released a report critical of its recent merger.
The transition is effective immediately. Genoot served as the chief operating officer and the president of US Bitcoin Corp. Miami-based US Bitcoin, which has large-scale mining facilities across the US including Texas, completed its merger with then-Canadian miner Hut 8 in late 2023.
The leadership transition comes amid increasing competition among the miners, a Bitcoin code update set to drastically reduce mining revenue in two months as well as the Jan. 18 report from short-seller J Capital Research alleging the merged company was a âpump and dumpâ waiting to happen. Hut 8 has disputed the claim.
His co-founder was USBTC's CEO and is now HUT's CSO/director and:
is a 30-year-old used-car salesman from Vancouver whose history is littered with involvement in SEC-defined pump-and-dumps, sporting share-price declines of 83%,
The key inputs for profitable mining are low-cost power and state-of-the-art chips to use it as effficently as possible. Clouds are gathering over both of them.
A cryptocurrency firm has lost a bid to force BC Hydro to provide the vast amounts of power needed for its operations, upholding the provincial government's right to pause power connections for new crypto miners.
Conifex Timber Inc., a forestry firm that branched out into cryptocurrency mining, had gone to the B.C. Supreme Court to have the policy declared invalid.
But Justice Michael Tammen ruled Friday that the government's move in December 2022 to pause new connections for cryptocurrency mining for 18 months was "reasonable" and not "unduly discriminatory."
BC Hydro CEO Christopher O'Riley had told the court in an affidavit that the data centres proposed by Conifex would have consumed 2.5 million megawatt-hours of electricity each year.
Bitcoin miners in the US are consuming the same amount of electricity as the entire state of Utah, among others, according to a new analysis by the US Energy Information Administration. And thatâs considered the low end of the range of use.
Electricity usage from mining operations represents 0.6% to 2.3% of all the countryâs demand in 2023, according to the report released Thursday. It is the first time EIA has shared an estimate. The mining activity has generated mounting concerns from policymakers and electric grid planners about straining the grid during periods of peak demand, energy costs and energy-related carbon dioxide emissions.
âThis estimate of U.S. electricity demand supporting cryptocurrency mining would equal annual demand ranging from more than three million to more than six million homes,â the report said.
Global electricity demand from data centers, cryptocurrencies and artificial intelligence could more than double over the next three years, adding the equivalent of Germanyâs entire power needs, the International Energy Agency forecasts in its latest report.
There are more than 8,000 data centers globally, with about 33% in the US, 16% in Europe and close to 10% in China, with more planned. In Ireland, where data centers are developing rapidly, the IEA expects the sector to consume 32% of the countryâs total electricity by 2026 compared to 17% in 2022. Ireland currently has 82 centers; 14 are under construction and 40 more are approved.
Overall global electricity demand is expected to see a 3.4% increase until 2026, the report found. The increase, however, will be more than covered by renewables, such as wind, solar and hydro, and all-time high nuclear power.
The Biden administration is now requiring some cryptocurrency producers to report their energy use following rising concerns that the growing industry could pose a threat to the nationâs electricity grids and exacerbate climate change.
The Energy Information Administration announced last week that it would start collecting energy use data from more than 130 âidentified commercial cryptocurrency minersâ operating in the US. The survey, which started this week, aims to get a sense of how the industryâs energy demand is evolving and where in the country cryptocurrency operations are growing fastest.
âAs cryptocurrency mining has increased in the United States, concerns have grown about the energy-intensive nature of the business and its effects on the US electric power industry,â the EIA said in a new report, following the announcement. âConcerns expressed to EIA include strains to the electricity grid during periods of peak demand, the potential for higher electricity prices, as well as effects on energy-related carbon dioxide emissions.â
A sudden freeze in Texas may have contributed to a 34% drop in the Bitcoin hash rate, as some miners were forced to curtail operations amid demand on the stateâs energy grid.
Beginning on Jan. 14, temperatures in many parts of Texas dropped below freezing for one of the first times since a massive ice storm in February 2023. According to data from YCharts, the total Bitcoin network hash rate fell from more than 629 exahashes per second (EH/s) on Jan. 11 to roughly 415 EH/s on Jan. 15 â a 34% drop. The analytics site reported the hash rate increased to more than 454 EH/s on Jan. 16 as temperatures in Austin briefly rose above freezing during the day.
Ethiopia has emerged as a rare opportunity for all firms that mine the original cryptocurrency, as climate change and power scarcity fuel a backlash against the $16 billion-a-year industry (at Bitcoinâs current price) elsewhere. But it holds special appeal for Chinese companies, which once dominated Bitcoin mining but have struggled to compete with local rivals in Texas, the current hub.
It is also a risky gamble, for the companies and Ethiopia alike. A succession of developing countries like Kazakhstan and Iran initially embraced Bitcoin mining, only to turn on the sector when its energy use threatened to fuel domestic discontent. Chinaâs reign as the epicenter of Bitcoin mining came to an abrupt end in 2021, when the government banned it. Dozens of companies were forced to leave.
Ethiopian officials are wary of the controversy that accompanies Bitcoin mining, according to industry executives who spoke on condition of anonymity to avoid jeopardizing government relations. Even after new generation capacity came online, almost half the population live without access to electricity, making mining a delicate topic. At the same time, it represents a potentially lucrative source of foreign-exchange earnings.
...
The reliance on abundant power is also a major vulnerability because it can put miners in competition for electricity with factories and households, exposing them to political backlash.
When Kazakzstan imposed fresh curbs and taxes on miners, âit basically killed the industry,â said Hashlabs co-founder Alen Makhmetov. Two years after the clampdown, his 10-megawatt facility there is still sitting idle.
And in an era when rising temperatures wreak havoc around the world, Bitcoin mining is increasingly seen as a contributor to global warming that doesnât serve any productive purpose â even though miners have claimed theyâre increasingly tapping clean energy. A study by United Nations University published in October estimated that two-thirds of the electricity used for Bitcoin mining in 2020 and 2021 was generated using fossil fuels.
The Arkansas Data Centers Act, popularly called the Right to Mine law, offers Bitcoin miners legal protections from communities that may not want them operating nearby. Passed just eight days after it was introduced, the law was written in part by the Satoshi Action Fund, a nonprofit advocacy group based in Mississippi whose co-founder worked in the Trump administration rolling back Obama-era climate policies.
Despite efforts to build bipartisan support, the Satoshi fund has succeeded predominantly in red states. But in Arkansas, where the state legislature is dominated by Republicans, it is conservatives who have led calls to repeal the law, including Senator Bryan King, a poultry farmer whose district includes a property purchased by one of the companies tied to the Chinese government. He said it was not fair that the Bitcoin operators received special protections under the law, which shields them from âdiscriminatory industry specific regulations and taxes,â including noise ordinances and zoning restrictions.
At least the Ethiopian mines don't emit much CO2, they run on hydropower:
The opening of the GERD project increased Ethiopiaâs installed generation capacity to 5.3 gigawatt, 92% of which comes from hydropower, a renewable energy source.
Once GERD is fully completed, Ethiopiaâs generation capacity will double, according to Ethiopian Electric Power. It charges Bitcoin miners a fixed rate of 3.14 US cents per kilowatt hour for electricity drawn from substations, Marketing and Business Development Director Hiwot Eshetu said in an interview.
While thatâs similar to the average in Texas, rates in the Lone Star State can swing wildly, Luxorâs Vera said, making profits there less predictable. In Ethiopia, the price will fall once miners connect directly to power plants, according to Hiwot.
But if the utility can make money selling power to the mines right by the dam they have little incentive to build out the grid than could get the power to the unserved half of the population.
As regards the longer-term issue of access to state-of-the-art chips, it is important to note that the best mining chips are sold by Bitmain, a Chinese company, but manufactured at TSMC in Taiwan. There are two major risks here. The first is that the US appears determined to prevent China importing leading-edge chips and the equipment to make them. China remains at least a generation and a half behind TSMC and Samsung, and reportedly has poor yields on its leading-edge process. These restrictions could well prevent Chinese mining companies acquiring leading-edge rigs, and might cause TSMC problems in fab-ing Bitmain products.
Second, there is the looming threat of a Chinese blockade or even invasion of Taiwan. Of course, difficulties for Bitcoin miners are hardly the major impact if these threats are made good. One might think that, even if supplies of new mining chips were cut off, existing rigs would continue working. In the short term they would, but there is a long history of rigs being obsolete after about 18 months. So they aren't designed or operated for longevity.
Andrew Lelandâs memoir documents his journey dealing with a genetic disorder that is slowly diminishing his sight. As a voracious reader, writer, and editor, Leland discusses the challenges he has faced in adapting to assistive technologies. His journey isnât just about learning new technologies but about dealing with intersectional identities as a sighted person learning to integrate into the culture of people with low vision. If you want a quick hit, check out Roman Marsâ 99% Invisible podcast interview with Leland. Â
Listening to Lelandâs interview, I was struck when he asked many of the questions Iâve asked myself. As my vision diminishes, should I learn assistive technologies while Iâm still sighted? Despite a life of wearing corrective glasses, I have not necessarily identified myself as a member of the low-vision community â although it animates my interest in accessibility. Lelandâs discussion of his own journey here was particularly poignant to me. Contributed by Richard J. Urban.Â
Intellectual Freedom Round Table virtual book clubÂ
The IFRT Reads discussion group of ALAâs Intellectual Freedom Round Table will host the second installment of its 2024 series in the form of a free hour-long webinar on 27 February 4:00 p.m. Eastern Time. During the first session held on January 24, Chapter 2 (âUnderstanding the Library Bill of Rights and its Significance to Diversity in Collection Developmentâ) of Decentering Whiteness in Libraries: A Framework for Inclusive Collection Management Practices, by Dr. Andrea Jamison, assistant professor of school librarianship at Illinois State University (OCLC Symbol: IAI), was discussed. Dr. Jamison will be present and answering questions for the February 27 installment, registration for which is now open.Â
As chair of the working group responsible for the 2019 âDiverse Collections:Â An Interpretation of the Library Bill of Rights,â Dr. Jamison stands in a unique position to talk authoritatively about building diverse collections according to the core principles of intellectual freedom. Currently, Dr. Jamison serves as Chair of ALAâs Ethnic and Multicultural Information Exchange Round Table (EMIERT), which is specifically charged with promoting services âfor all ethnolinguistic and multicultural communities in general.â Contributed by Jay Weitz.Â
Academic libraries leading the way in access and diversityÂ
Insight Into Diversity magazine, the largest and oldest diversity and inclusion publication in higher education, has awarded 56 academic libraries the inaugural 2024 Library Excellence in Access and Diversity (LEAD) Award for their outstanding programs and initiatives promoting diversity, equity, and inclusion (DEI). The LEAD Award highlights initiatives in areas such as research, technology, accessibility, exhibitions, and community outreach. âAs higher education institutions provide more than just legally required accessibility and disability services, they could find guidance from their own academic libraries, who are often at the forefront of this field. From digital resources and sensory spaces to personalized assistance, many academic libraries prioritize creating an environment where all members of the academic community can thrive, ensuring equal access to information.â Out of nearly 150 applicants, the 56 winners will be featured in the March 2024 issue of Insight Into Diversity, with the outstanding work of ten libraries featured in a preview.Â
This welcome recognition of libraries leading the way highlights the range of initiatives and programs implemented across libraries. From prioritizing diverse hiring practices to an embedded Equity and Engagement Librarian, and from fellowship programs for underrepresented groups, to providing sensory spaces and adaptive computing labs, their successes provide the field with models and inspiration for libraries to prioritize this work locally. Contributed by Jennifer Peterson.Â
University of North Texas libraries advised to suspend Pride Week eventsÂ
On 15 February 2024, the KERA News website reported that the University of North Texas legal counsel advised UNT Libraries (OCLC Symbol: INT) to suspend planned events for Pride Week. In an email sent to library employees on 9 February, university administration stated that âusing staff and faculty time on the activities we were planning around Pride Week would be in violation of SB17,â a bill passed by the Texas State Legislature and signed by Governor Greg Abbott on 14 June 2023 that prohibits publicly funded colleges and universities from conducting âtrainings, programs, or activities that advocate for or give preferential treatment on the basis of race, sex, color, ethnicity, gender identity, or sexual orientation.â Melisa Brown, senior director of UNT Relations, stated that the ârecognition of commemorative months [such as Black History Month, Pride Month, International Womenâs Month, Asian American and Pacific Islander Heritage Month, Disability Pride Month and the like] is something the university has celebrated for years, and UNT plans to continue this. What is changing in the universityâs recognitions is that any event the university funds must focus on the history of the culture being celebrated in order to be compliant with the law.âÂ
Recognizing diverse groups through various library events is a common method of promoting inclusion on college campuses. Although the article does not describe the events that had to be suspended, it is unfortunate that Texas librarians find themselves under threat of losing funding for developing programming that celebrates diversity. Contributed by Morris Levy
Welcome to our Forum Feedback series, a space dedicated to gathering insights from our vibrant community. Here we delve into the ever-changing conference landscape, exploring themes such as health, safety, accessibility, affordability, and sustainability. Follow along as we share data, insights, and thought-provoking discussions aimed at shaping the future of gatherings with inclusivity at the forefront. We encourage you to actively participate by sharing your own valuable feedback. Together, letâs shape the landscape of conferences for the better.Â
Team DLF has been looking forward to 2024 for a couple of years because we knew it would be the best time to start our journey of evaluating how we gather for the DLF Forum. In preparation, we asked the registrants of the 2023 CLIR Events (DLF Forum, Learn@DLF, and NDSAâs Digital Preservation) the question, âWhen attending a conference, what is most important to you?â While we make it a regular part of our workflow to send surveys to attendees after an event, we wanted to take advantage of the opportunity to ask every person who would be joining us in St. Louis this question. Itâs important to note the language of this question and those that followed (reviewed below). Our question asks what folks are looking for from a conference, not for feedback on past DLF Forum events. This was purposeful as we wanted the opportunity to think beyond what has already been done. This is an opportunity to talk about the larger context of the academic conference.
We received over 400 responses and categorized them into four major categories: networking, event venue, event sessions, and opportunities to present.Â
Word cloud showing major themes of responses to the question, âWhen attending a conference, what is most important to you?â
After collecting this data from registrants, we turned what we learned into a participatory closing plenary session in St. Louis. In this session, we shared the four major categories we saw in the registrant responses and asked participants to quietly reflect on prompts related to each category. For this update, weâre sharing and reflecting on what folks had to say about the event sessions category.Â
The questions developed for the event sessions category included:Â
What determines a quality conference session or workshop?Â
What determines a quality featured speaker (keynote) session?Â
Are there any specific session types you particularly like?Â
We provided the prompts on the digital screens as well as half sheets of printed paper on each table. Photo by Tyler Small.
We provided these questions as a way to get started, but folks were also empowered to freestyle. The questions were developed out of the desire for folks to elaborate on their responses to the 2023 CLIR Events registration question. Many people responded to the question âWhen attending a conference, what is most important to you?â by saying âquality conference sessions.â âQualityâ can mean different things to different people, so we wanted to give folks a chance to reflect on what this means.Â
After quiet reflection, another pivotal aspect of the plenary unfolded: the sharing and exchange among community members. Participants were seated at round tables, equipped with large sticky pads, smaller sticky pads, pens, and markers, facilitating discussions and the recording of their responses and reflections. Providing a platform for our individual experiences to be heard is invaluable. Engaging with fellow community members serves as a reminder of the diverse perspectives present at the conference, enriching our collective understanding. Following the small group discussions, we opened the floor for anyone wishing to share insights or resonant thoughts with the larger group. This structured approach, akin to the think, pair, share method, evoked nostalgic memories of my days in library instruction and provided an enjoyable conclusion to the 2023 DLF Forum.Â
In an effort to include folks who may not have been able to attend the in-person plenary session, we offered a virtual session with the same guidelines and utilized Padlet for folks to record group discussions.Â
After independent reflection, folks participated in small group discussions of their responses. Photo by Tyler Small.
Insights and Desires for Conference Sessions
The word âsessionsâ was used 369 times in the 2023 CLIR Events registration responses, and other words such as âquality,â âpractical,â âdiverse,â âengaging,â and ârelevantâ were used as descriptors of what folks are looking for.Â
At the in-person closing plenary session, we heard responses such as requests for more working sessions, combination sessions (for DLF this typically includes 2-3 15-minute presentations organized around similar topics) and roundtable discussions. Traditionally, working sessions have included time for DLF working groups that already exist to meet or for new ones to kick off around a topic of interest. Other folks want to see accessibility emphasized, including the use of microphones, coaching on how to project oneâs voice, and requiring accessible slides. In virtual trainings leading up to our events, in our email communications, and in our opening plenary, Team DLF does make sure to emphasize the importance of microphone use by all presenters and attendees, and we offer resources for creating accessible presentations, thanks to the work of Debbie Krahmer and DLFâs Committee for Equity and Inclusion. However, this doesnât mean we canât expand our emphasis on accessibility for in-person and virtual events and meetings.Â
Folks also want to see a good balance between the technical and the practical aspects of digital library work. We heard feedback from participants from all Forum Feedback modalities that they are seeking practical and applicable tips and workflows to incorporate at their home institutions. Folks also want to see diverse, friendly and engaging presenters on the conference program.Â
After independent reflection, folks participated in small group discussions of their responses. Photo by Tyler Small.
Navigating Diverse Perspectives
As expected, individuals offered varying perspectives throughout all modalities of Forum Feedback. Some prefer detailed, granular sessions, while others seek broader discussions. Preferences also diverge regarding session lengths, with some advocating for longer sessions and others for shorter ones. Navigating through this qualitative data, albeit occasionally contradictory, can be enlightening and worthwhile. The discernible patterns weâve identified are invaluable to us as conference organizers, aiding in the deliberation of an optimal conference format for both in-person and virtual events. Itâs essential to recognize that an in-person conference doesnât automatically translate to a virtual one simply because itâs livestreamed online.Â
With these considerations in mind, we invite continued engagement and feedback from our community as we collectively shape the future of gatherings. Thank you so much to those who have made invaluable contributions to Forum Feedback!Â
Enhancing the customer experience is the heart of product discovery. Here's the strategic approach brands should take and the technological prowess that can drive success.