Planet Code4Lib

The ILS without patron data: a thought experiment / Peter Murray

Library systems hold significant information about patrons, including their search and reading histories. For librarians, ensuring the privacy and confidentiality of this data is an essential component of professional ethics. In the United States, for example, the third point in the American Library Association Code of Ethics is “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”

To understand this better, consider how the Video Privacy Protection Act of 1988 arose in the U.S. after the controversy surrounding the publication of Robert Bork’s video rental history. A year earlier, Robert Bork had been nominated to the U.S. Supreme Court. In the course of his confirmation hearing, a reporter published Bork’s video rental history. Although the list of videos was not a factor in his rejected nomination, its publication was considered outrageous enough to spur Congress to pass the law. Similarly, if your library records were made public, it could well be embarrassing and intrusive. (Side note: While there is no federal protection for personal library records like there is for video rental records, state laws offer a patchwork of protections.)

Library systems, like the video rental systems of old, tie personally identifiable details to patron activity. So, what if we could separate the two? Before we delve into that, let’s define some terms. Skip the next sections if you already know how Federated Identity systems work.

Federated Identity Systems: Identity Providers and Service Providers

In our complex world, library services often come from multiple providers. Rather than deal with the hassle of separate logins and passwords, it is common for these providers to call back to a central service where people can prove they are who they say they are. The place where people log in is called an Identity Provider (IdP). The place where people want to go is called a Service Provider (SP). A Federated Identity System is a trust relationship and a set of agreements/technologies that enable the sharing of identity information and authorizations across systems. It allows people to access resources and services across different systems using a single set of credentials, typically managed by their Identity Provider. (IdPs are sometimes called Asserting Parties because they are the software systems in the trust relationship that make assertions about who a user is; SPs are sometimes called Relying Parties because they rely on the IdP’s assertions.) Federated Identity systems exchange attributes about someone. Those attributes can be specific to a person, like “name” and “email address”, or general categories, like “student” or “community-member”. Attributes can also have special meanings to the IdP and SP, like Pairwise-Subject-ID.
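
To make this concrete, here is a sketch (in Python, with invented names and values rather than any particular IdP's schema) of the kind of attribute bundle a library SP might receive after a successful login:

    # Hypothetical attribute assertion as seen by the SP after login.
    # Attribute names loosely follow the eduPerson convention; the
    # values are invented for illustration.
    assertion = {
        "displayName": "Pat Example",
        "mail": "pat@example.edu",
        "eduPersonAffiliation": ["student", "community-member"],
        "pairwise-id": "uGDJVRxK48E@example.edu",
    }

In the thought experiment that follows, the library system ignores everything here except the status attributes and the pairwise-id.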

Pairwise Subject Identifier

An identifier that is specific to a user is called a “subject identifier”. These typically look somewhat like an email address, with parts specific to both the user and the organization. For example, in murraype@dltj.org, murraype is specific to me and dltj.org gives the identifier context within my organization. In a Federated Identity system, the same subject identifier is given to every SP that asks for it.

However, if we don’t want multiple SPs correlating a user’s activities, we can use a “pairwise-subject-identifier”. In this workflow, the IdP sends different identifiers to different SPs for the same person, making the identifiers unique to each IdP-SP pair. More formally, a pairwise-subject-identifier (“pairwise-id”) is defined this way:

This is a long-lived, non-reassignable, uni-directional identifier suitable for use as a unique external key specific to a particular relying party. Its value for a given subject depends upon the relying party to whom it is given, thus preventing unrelated systems from using it as a basis for correlation.

Typically opaque, these identifiers offer no extra information to SPs trying to correlate a user’s activities. For instance, suppose the pairwise-id between the IdP and SP#1 is uGDJVRxK48E@dltj.org and the pairwise-id between the IdP and SP#2 is T6vNM9v5tUna@dltj.org. Not only are the two SPs unable to figure out whether this is the same person, there is also no meaning in the identifier that would reveal who this person is in the first place.
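
How might an IdP mint such identifiers? One common approach is to derive them deterministically from the IdP's internal user ID and the SP's entity ID using a keyed hash, so the result is stable over time but different at every SP. Here is a minimal sketch in Python, assuming a secret salt held only by the IdP; the names and the truncation length are invented for illustration:

    import base64
    import hashlib
    import hmac

    # Hypothetical secret known only to the IdP; never shared with SPs.
    IDP_SECRET = b"long-random-salt-known-only-to-the-idp"

    def pairwise_id(internal_user_id: str, sp_entity_id: str, scope: str = "dltj.org") -> str:
        """Derive a stable, opaque identifier unique to this user/SP pair."""
        message = f"{internal_user_id}|{sp_entity_id}".encode()
        digest = hmac.new(IDP_SECRET, message, hashlib.sha256).digest()
        token = base64.urlsafe_b64encode(digest[:9]).decode()
        return f"{token}@{scope}"

    # The same user yields unrelated identifiers at different SPs, e.g.:
    #   pairwise_id("murraype", "https://sp1.example.org")  ->  uGDJVRxK48E@dltj.org
    #   pairwise_id("murraype", "https://sp2.example.org")  ->  T6vNM9v5tUna@dltj.org

Because the secret never leaves the IdP, no SP (or coalition of SPs) can recompute the mapping or reverse it.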

Pairwise-ID as THE library system ID

In our ideal library system aiming to minimize personal data collection, the pairwise-id becomes the unique identifier in the library system. (There are some drawbacks to using the pairwise-id as the unique identifier…we’ll get to those later.) The first time the library system’s SP gets a new pairwise-id, it creates a new user record in the system. The system uses other attributes from the IdP to determine privileges for this new record - for instance, a “student” status gets a normal loan period, a “faculty” status gets an extended loan period, and a “conference visitor” status gets blocked from borrowing.
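
Here is a sketch of that first-contact logic in Python, with invented status names and loan rules standing in for whatever policy table a real system would use:

    # Hypothetical mapping from IdP-asserted status to borrowing privileges.
    LOAN_RULES = {
        "student": {"can_borrow": True, "loan_days": 28},
        "faculty": {"can_borrow": True, "loan_days": 120},
        "conference-visitor": {"can_borrow": False, "loan_days": 0},
    }

    def ensure_patron(patrons: dict, pairwise_id: str, status: str) -> dict:
        """Create a patron record keyed only by pairwise-id the first time it is seen."""
        record = patrons.get(pairwise_id)
        if record is None:
            rules = LOAN_RULES.get(status, {"can_borrow": False, "loan_days": 0})
            # Note what is *not* stored: no name, no email, no address.
            record = {"id": pairwise_id, "status": status, **rules}
            patrons[pairwise_id] = record
        return record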

The library SP trusts the attributes received from the IdP—see the discussion above about the trust relationship behind the assertions—so it does not need prior knowledge about the patron. Other than knowing that the person is a specific individual with a recognized status in the organization, the library system knows nothing about the patron. If the patron’s borrowing and search history are leaked from the library system, the leaked records have nothing else to offer that would tie them to a person. (Again, there are de-anonymizing nuances, but those are for a later discussion.)

…but I need to send overdue notices to the patron

Let’s consider some operational aspects that usually require personal data: sending overdue notices, applying fees to a patron, and handling patron requests. The library system knows enough about its patron community to check out books to authorized users—people with attributes, coming from the IdP, that we trust and use to set how long the loan should be. But what if a user keeps a book too long? We need a way to send a notice asking the person to return the book, and to bill them when they don’t. And the only thing the library system has is an opaque identifier that only has meaning at the IdP.

Library systems are typically self-contained: they send their own email messages and have their own billing systems for keeping track of patron charges. In a library system without patron data, though, we need to rely on others with more information about the person to handle those tasks.

Let’s take the example of sending notices to the patron. Rather than sending the notice itself, our system tells another system to do it. The group that runs the IdP has a service that, when given a pairwise-id and the content of a message, will send that message to the patron for us.

Another example: billing the patron when they say they’ve lost the item or the library declares it missing. The IdP group has another service that takes in the pairwise-id, a currency amount, and a description, then adds that information to the person’s central account. The library keeps track of the fact that a pairwise-id has been billed, but it never knows the person behind that identifier. If the item turns up again, our library system reverses the charge: it sends the pairwise-id, a credit amount, and a credit description.
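
Sketched in Python as calls to hypothetical IdP-side services (the endpoints and payload fields are invented for illustration; a real deployment would define its own API):

    import requests

    IDP_SERVICES = "https://idp.example.edu/api"  # hypothetical endpoint

    def send_notice(pairwise_id: str, subject: str, body: str) -> None:
        """Ask the IdP's messaging service to contact the patron on our behalf."""
        payload = {"pairwise_id": pairwise_id, "subject": subject, "body": body}
        requests.post(f"{IDP_SERVICES}/notify", json=payload, timeout=10).raise_for_status()

    def bill(pairwise_id: str, amount: str, description: str) -> None:
        """Post a charge (or, with a negative amount, a credit) to the central account."""
        payload = {"pairwise_id": pairwise_id, "amount": amount, "description": description}
        requests.post(f"{IDP_SERVICES}/charges", json=payload, timeout=10).raise_for_status()

The library system logs only that a given pairwise-id was notified or billed; the mapping from that identifier to a mailbox or bursar account lives entirely with the IdP.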

Library patrons also request items be held for them; what do we do in this case? When someone requests an item, the library system prints a “paging slip” that is used to get the item from the shelf. The paging slip has information about the item—its title, author, and shelving location—as well as information about the person who requested it. The paging slip usually turns into the hold pick-up slip; it is taped to the outside of the book and shelved alphabetically by the patron’s last name. There is a serious privacy downside to this workflow, though: everyone from the staff member pulling the item to the other users browsing the hold-pickup shelf can see the name of the person who asked for it. Instead, our library-system-with-no-names prints a random three-word phrase to stand in for the name of the person who asked for the item. This same three-word phrase is sent in the hold-pickup message to the library patron so they can find the item on the hold-pickup shelf.
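
Generating such a phrase is simple. A minimal Python sketch, with a stand-in word list (a real deployment would use a much larger curated list and check that a phrase is not already in use on the hold shelf):

    import secrets

    # Illustrative word list; a production list would be much larger.
    WORDS = ["maple", "river", "comet", "anchor", "violet", "pebble", "falcon", "ember"]

    def holdslip_phrase() -> str:
        """Return a random three-word phrase to stand in for the patron's name."""
        return "-".join(secrets.choice(WORDS) for _ in range(3))

    # e.g. "river-ember-maple" is printed on the slip and sent in the pickup notice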

But could we build it?

While this thought experiment is theoretical, could a real-world library system actually function this way? In the next post, we’ll explore possible adaptations for the FOLIO Library Services Platform to turn theory into practice.

Blog On Vacation / David Rosenthal

This blog will be taking a break for a couple of weeks.

NDSA Updates Strategic Activities / Digital Library Federation

As part of the NDSA’s broader organizational alignment activities taking place over the last year, the NDSA Coordinating Committee recently charged a small group of Leadership members to review and update its foundational strategy, which had previously been published in 2019.

The updated NDSA Strategy retains the NDSA’s mission, vision, and values. The three top-level goals of the organization remain the same, too:

  1. Convening and sustaining an engaged community to advance digital stewardship theory and practice.
  2. Identifying, communicating, and advocating for the common needs, concerns, standards, and good practices of the community.
  3. Providing outreach, resources, training, and professional development opportunities to bolster the effectiveness, productivity, and continuity of the community.

To provide a meaningful, actionable roadmap towards achieving each of these goals, the NDSA Leadership has outlined specific activities and initiatives to be completed in the next three to five years. Some of these activities include strengthening and stabilizing the NDSA’s shared governance, enhancing membership services with improved outreach and new groups, and increasing transparency through new communication methods and channels.

Please check out the “Goals and Strategies” section of the NDSA 2024 Strategy for the full list of activities and initiatives that you can look forward to in the coming years!

One of the first activities that the NDSA Leadership will begin working towards is investigating avenues to develop a sustainable funding model, including but not limited to restructuring membership options, hosting events, and seeking sponsorships. Towards that end, in the coming weeks we will be sending out a brief survey about funding the NDSA work. Keep an eye out and respond to the survey to make your voice heard! And as always, feel free to reach out to the NDSA Leadership with your thoughts and feedback at ndsa.digipres@gmail.com.

– Bethany Scott, 2024 Coordinating Committee Chair

The post NDSA Updates Strategic Activities appeared first on DLF.

What’s Missing in Conversations about Libraries and Mental Illness / In the Library, With the Lead Pipe

In Brief

It is inevitable that public librarians interact with mentally ill patrons daily. We do our best to help find information and connect patrons to resources, where appropriate. What is missing from these conversations is that mentally ill librarians exist too. We often mask our own mental health struggles for the sake of helping patrons, or maybe because of sanism and stigma amongst coworkers. Mentally ill librarians deserve to be open about their experiences if they desire. It is not two absolute groups of the well versus the “crazy.” Mental health is fluid, and sometimes even the helpers struggle too.

By: Morgan Rondinelli

Introduction

I am a mentally ill library professional, and every day I interact with mentally ill patrons. Outwardly, especially to my coworkers, I don’t seem mentally ill. I am exceptionally “high functioning,” though I know many, especially in the autistic community, dislike that term. I am intelligent, productive, and organized. Others prefer the term high-masking, and that more accurately applies to my experience as well. I may be struggling on the inside, but I mask and hide these symptoms externally. I am often pretending to not be mentally ill. Even therapists, doctors, and other mental health professionals frequently fail to recognize when I am suffering. Sometimes the more I am struggling, the more outwardly fine I seem. This was especially true in high school when I earned perfect grades as a result of my obsessive-compulsive disorder (OCD). It seemed like I was thriving, but in reality, I was incredibly stressed and felt like I had to be perfect. Masking for me is a major symptom.

My own mental illnesses, especially anxiety and OCD, still come up in hidden ways while I’m at work. I have written about this more extensively on my own blog, My OCD Voice, but to summarize, I fear doing something morally or ethically wrong. What if I say something that offends a patron? What if I am careless and incorrectly handle personal patron information? I greatly fear accidentally harming others by doing something wrong. Responsibility terrifies me, such as when I am the only full-time staff member for the Adult Services floor on certain evenings. This is even though I have shown time and time again that I am more than capable of handling responsibility, and handling it well at that. I’m open to talking about these fears and experiences with anyone who wants to know more, but if I don’t bring them up, no one would know.

You cannot tell from my outwardly calm demeanor whether I am actually calm that day or I am having an incredibly anxious day. I’ve had a few panic attacks at work, and no one would know, unless I told them. You cannot see my history of severe depression on my face. You cannot see the medications I rely on taking each day to function. You cannot see my OCD, and though I consider myself in recovery now, at its worst it required temporarily leaving school for residential treatment. You cannot see that in college I was hospitalized four times on inpatient psychiatric units for suicidal ideation. You cannot see these things because I consistently seem fine. Unless I voice these aspects of my past and present, no one would know I am mentally ill. Coworkers do not know. Patrons do not know. So the assumption is I am not part of that group.

Working with the Public

I work at a public library. We are open and available to, of course, the public. Every day, it is statistically highly likely that I interact with patrons who, like me, are very good at masking or otherwise hiding symptoms. I also interact with patrons who display more socially obvious signs of mental illness. I have presented at the International OCD Foundation Annual Conferences in person and online several times. Six years ago, I founded Not Alone Notes, a nonprofit mailing free, encouraging notes to others with OCD. I also served an AmeriCorps term in 2019 teaching Mental Health First Aid courses to rural communities in Central Illinois. Though I am not a mental health professional and cannot diagnose or treat someone, I’ve had enough experience in the field to recognize signs that someone may be experiencing mental illness.

Library professionals interact every day with patrons externally demonstrating symptoms of mental illness. There is compassion. There are attempts to connect patrons to resources. And there is sometimes also stigma and bias, even when there are good intentions. I define stigma and bias as these negative perceptions we may hold, without even realizing we hold them. They can come from a variety of places. What have we been taught by our own families or backgrounds? What have we seen on television? Are we afraid of the mentally ill? Have we ever knowingly met someone who identifies as, and who we would consider, mentally ill? We all have met someone, but knowingly is the keyword.

I’ve heard the word “crazy” used in the workroom after a particularly difficult interaction. “Crazy” is a common word in everyday conversation. I don’t think we will rid the English language of that colloquialism. It’s hurtful though because it is dehumanizing and othering to people who experience mental illness. To me, even once is one too many times to use this word towards a person. I’ve also watched staff groan at answering certain phone calls because a patron may ramble about non-library-related business for twenty minutes. I’ve seen staff baffled or even scared by someone talking to themselves, even though they weren’t harming anyone else around them. 

 I know my behavior with patrons is not perfect or even prejudice-free either. We all make mistakes due to the biases and stereotypes we have learned from society. I bring these instances up because I want to continually improve the culture of library professionals towards those with mental illnesses. There is always room for more education and training. 

These moments are examples of how prejudice prevented library workers from focusing on what the patron specifically needed. Did they need to be firmly, but respectfully, reminded of library use policies? Did they need to make a phone call to a friend or shelter? Did they just need to talk to someone for a few minutes? When we focus on the need, then we can focus on ways to address it effectively and efficiently, or recognize that we are not able to fill it and refer the patron to someone else or another organization who can.

A False Dichotomy

There seems to be an assumption that there are people like me who are fine and passing and probably just overly anxious. We can work and function in society, and you wouldn’t even know we have a psychiatric diagnosis. We aren’t who most people mean when they say mental illness. We aren’t part of that group.

Then there are stereotypes of the people who are really “sick” and “crazy” and “seriously ill.” We see people mumbling or shouting or carrying a dozen bags. With these people, the symptoms are “obvious,” and the mental illness is “real.” This is who society often means as part of that group.

What this dichotomy misses though is that mental illness is a spectrum and fluid, with potential symptoms extending in all directions. Some psychiatric disorders are visible, and others are invisible. The visibility of symptoms does not reflect the severity of the illness, someone’s level of internal struggle, or how much help or accommodations they may need. You often cannot tell if someone is ill, unless they tell you.

Individual experiences can also change by the day. Many people who are disabled identify as dynamically disabled. Symptoms change every day, if not every hour. There is not a line to cross where before you were actually fine, and now you are actually mentally ill. I too have been seriously mentally ill in my past. I hope not, but maybe someday I will be that ill again. Not to mention, mental illness is incredibly common. According to a study published in The Lancet Psychiatry by Dr. John J. McGrath et al., half of the world’s population will experience a mental health disorder at some point in their lifetime. In the United States alone, according to the Centers for Disease Control and Prevention (CDC), more than one in five adults live with a mental illness. How can I convince those around me that I am mentally ill too, that I count, while at the same time advocating that others are not “crazy”? How can I be open about my experiences with mental illness, to validate my experiences of mental illness, while also maintaining that it is not my whole identity?

How library staff treat mentally ill patrons matters. All patrons deserve our assistance and service. They have to follow our user conduct policies, and sometimes failing to follow them, like stealing or repeatedly creating messes in bathrooms, can lead to temporary suspensions. But these instances are rare. Even in these cases, the patrons still deserve respect and dignity when those policies are enforced. Most of the time, I see that respect and dignity from colleagues. I see the good intentions and the service. But sometimes there is still prejudice. The frustration and overwhelm of the day or the week can allow words like “crazy” to slip out.

How library staff treat mentally ill patrons matters also for the mentally ill coworkers who are watching. Mentally ill patrons exist. Mentally ill staff exist. Perhaps under different circumstances, we could easily be in each other’s shoes. We shouldn’t take that lightly. I very much see myself in these struggling patrons. In a different world, that could have been me. I can mask, but that doesn’t negate my mentally ill status and identity. We are on the same spectrum. So when you call them “crazy,” you are saying that to me too.

I know I am privileged to have had access to therapy, intensive treatment, hospitalizations, support from family and friends, and continued access to medication. These factors are a huge part of how I am able to cope and function so well, like having a full-time job and attending graduate school in library science, or other activities like participating in community theater and running my own mental health nonprofit. Without these supports, would I be the library patron having a panic attack while on a public computer instead of privately in the workroom?

What’s Missing

Conversation exists about how library staff can best help and support mentally ill or even just mentally struggling patrons, though even that literature is limited. For example, what training can librarians undergo to learn more about helping patrons? My library has implemented the Ryan Dowd Homeless Training, which isn’t exactly mental health focused, but it is definitely adjacent. I’ve seen positive outcomes from this training, such as staff focusing on getting what Dowd calls “pennies in the cup.” Introduce yourself by name, ask the patron’s name, make eye contact, smile. Get to know them, so if there is conflict, you already have had several positive interactions banked up.

I wonder if we should hire social workers in public libraries. I’m hoping to take the Library Social Work class in my MLIS graduate program at the University of Illinois before I graduate next spring. What programming can be organized to provide connections to resources? And on a day-to-day basis, what are the best ways to, for example, ask someone who is loudly shouting at themselves to tone down their volume a touch?

The former Association of Specialized and Cooperative Library Agencies (ASCLA) division of the American Library Association (ALA) published Guidelines for Library Services for People with Mental Illnesses in 2007 and a revised tip sheet in 2010, though both are no longer available online. These publications were helpful steps towards addressing these questions and helping mentally ill library patrons.

What I am not seeing, though, is much literature about mentally ill library staff and librarians themselves. I recently read the anthology LIS Interrupted: Intersections of Mental Illness and Library Work, edited by Miranda Dube and Carrie Wade. It was validating to see others write openly about working in a library with a mental illness. In it, Stephanie S. Rosen wrote about library work as a “caring profession,” and how to do that care in a way that cares for yourself too. Alice Bennett wrote about disclosing mental illness as a privilege, since there can be discriminatory consequences in some contexts. Separately, JJ Pionke has written about working in a library with post-traumatic stress disorder (PTSD), a largely invisible disability for him, and trying to seek accommodations. There has also been some research about perceptions of mental illness among academic library staff, revealing that stigma can prevent disclosing mental illness.

Mentally ill library staff exist, yet vocational awe has created a version of libraries where we can only be the “sane helpers.” We are expected to be put together and to give our all to help. And there is a sense, whether recognized or not, that because as librarians we can help someone, we are better than them or not mentally ill. This all feeds into that false dichotomy.

I thankfully don’t experience this pressure to always help at my workplace. Taking care of yourself is heavily encouraged. This includes taking breaks, not checking emails when off work, and staying home if you are sick, physically or emotionally. I have taken a mental health day before, and no one batted an eye when I said that as my reason for calling out sick. As library professionals, of course we want to help patrons, but we also must take care of ourselves. 

Being solely viewed as the helpers is unrealistic and unfair. It forces masking of our own symptoms, always pretending to be fine. It erases our experiences and reduces our comfort in seeking accommodations we may need at work. OCD is protected under the Americans with Disabilities Act (ADA). I personally don’t seek accommodations at work, but I could, and maybe someday I will. Ignoring that OCD and other psychiatric disorders are protected under the ADA, like other aspects of being seen as just the helpers, furthers the illusion that there is a dichotomy: “we are totally well and you are mentally ill.” In reality, it is not us versus them, or even us helping them. Mental illness affects both patrons and librarians, visibly and invisibly. Perceptions of mental health are ever changing, and hopefully the conversations about it can keep changing too.

Perhaps because of our similar experiences, mentally ill staff are better able to help mentally ill patrons. With or without more training, we already have a deeper understanding of the symptoms and experiences of mentally ill patrons. Of course, no two peoples’ experiences are exactly the same, but we can relate to one another. We have been there too. I also wonder: if I were to disclose my mentally ill identity and more obviously wear it “on my sleeve,” could that make a patron feel more comfortable asking for resources? Does my disclosing help coworkers feel more comfortable accessing resources for themselves as staff? I have seen at least some of this play out. When I mention I have OCD, some coworkers ask thoughtful questions. Some go on to say “me too.”

There are many library professionals ready to be open about their stories of mental illness, either with coworkers in person or more anonymously online. Either avenue is acceptable, and not sharing your story is also a valid choice. Disclosure should always be optional, and disclosure is not necessary for someone’s own journey or recovery. But those of us who want to share are ready and able to speak about these topics and our personal experiences. We are ready to have these conversations in the workroom, to openly take mental health days when we need them, and to better help patrons because of these lived experiences. The real question is how we can create a space where our mentally ill coworkers, and we ourselves, feel we have room to share our voices.


Acknowledgments

I would like to thank Internal Peer Reviewers, Jessica Schomberg and Brea McQueen; External Peer Reviewer, Alice Bennett; and Publishing Editor, Jaena Rae Cabrera for their thoughtful and thorough work in helping revise this piece. I would also like to thank Laura Golaszewski and Rachel Park, for providing feedback on this piece before submission, and Professor Katie Chamberlain Kritikos for introducing me to In the Library with the Lead Pipe in class.


References

About mental health. U.S. Centers for Disease Control and Prevention. https://www.cdc.gov/mentalhealth/learn/ 

Alvares, G. (2019, July 11). Why we should stop using the term “high functioning autism.” Autism Awareness Australia. https://www.autismawareness.com.au/aupdate/why-we-should-stop-using-the-term-high-functioning-autism 

Burns, E., & Green, K.E.C. (2019). Academic librarians’ experiences and perceptions on mental illness stigma and the workplace. College & Research Libraries, 80(5), 638. https://doi.org/10.5860/crl.80.5.638

Dube, M., & Wade, C. (Eds.). (2021). LIS interrupted: Intersections of mental illness and library work. Litwin Books: Library Juice Press. 

Ettarh, F. (2018, January 10). Vocational awe and librarianship: The lies we tell ourselves. In The Library With The Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2018/vocational-awe/ 

Dowd, R. Homeless Training. https://homelesslibrary.com/ 

IOCDF Conference Series. International OCD Foundation. https://iocdf.org/programs/conferences/ 

McGrath, J.J., Al-Hamzawi, A., Alonso, J., Altwaijri, Y., Andrade, L.H., Bromet, E.J., Bruffaerts, R., Caldas de Almeida, J.M., Chardoul, S., Chiu, W.T., Degenhardt, L., Demler, O.V., Ferry, F., Gureje, O., Haro, J.M., Karam, E.G., Karam, G., Khaled, S.M., Kovess-Masfety, V.,…Zaslavsky, A.M. (2023). Age of onset and cumulative risk of mental disorder: A cross-national analysis of population surveys from 29 countries. The Lancet Psychiatry, 10(9), 668-681. https://doi.org/10.1016/S2215-0366(23)00193-1 

Not Alone Notes. https://notalonenotes.org/ 

Pionke, JJ. (2019). The impact of disbelief: On being a library employee with a disability. Library Trends, 67(3), 423-435. https://doi.org/10.1353/lib.2019.0004

Rondinelli, M. (2023, October 17). How OCD affects me at work. My OCD Voice. https://myocdvoice.com/2023/10/17/how-ocd-affects-me-at-work/ 

Sarmento, I.M. Dynamic disability. DisArt. https://www.disartnow.org/journal/dynamic-disability/ 

Spencer, M.M. Americans with Disabilities Act: The law and tips for working people with OCD. International OCD Foundation. https://iocdf.org/expert-opinions/expert-opinion-americans-with-disabilities-act/ 



In MedPage Today – Retract Now: Negating Flawed Research Must Be Quicker / Jodi Schneider

Check my latest piece, Retract Now: Negating Flawed Research Must Be Quicker — Incentives and streamlined processes can prevent the spread of incorrect science in “Second Opinions”, the editorial section of MedPage Today.

I argue that

“It is urgent to be faster and more responsive in retracting publications.”

Retract Now: Negating Flawed Research Must Be Quicker, by Jodi Schneider in MedPage Today

Thanks to The OpEd Project, the Illinois Public Voices Fellowship, and my coach Michele Weldon (whose newest book is out in July). Editorial writing is part of my NSF CAREER project: Using Network Analysis to Assess Confidence in Research Synthesis. The Alfred P. Sloan Foundation funds my retraction research in Reducing the Inadvertent Spread of Retracted Science, including the NISO Communication of Retractions, Removals, and Expressions of Concern (CREC) Working Group.

#ODDStories 2024 @ Nairobi, Kenya 🇰🇪 / Open Knowledge Foundation

Sub-Saharan Africa bears an immense burden of disease and premature deaths attributed to environmental pollution. Recent studies suggest that escalating pollution levels could undermine efforts to improve health and economic efforts across the continent and slow progress towards achieving sustainable development goals (SDGs). However, lack of sufficient and reliable data and research hinders the development of policies aimed at reducing environmental pollution.

This year’s Open Data Day, celebrated on 7 March, placed special emphasis on “Open Data for Advancing Sustainable Development Goals.” sensors.AFRICA celebrated the day by organising an event dubbed “Open Data for Environmental Monitoring.”

The event attracted participants from various fields, including data analysts, health practitioners, environmental and atmospheric scientists, urban planners, reporters, fact-checkers, software engineers, as well as undergraduate and graduate students. The primary objective was to bring together stakeholders committed to tackling environmental issues and promoting sustainable urban development. The event was also aimed at showcasing the advantages and potential of open data in environmental monitoring, and in advancing SDGs.

Underscoring the role of open data in promoting transparency, accountability, and innovation, Laura Mugeha, the community coordinator at sensors.AFRICA and africanDRONE, stressed the importance of data being findable, accessible, interoperable, and reusable, qualities that encompass the openness of data. “Open data should be as open as possible and as closed as necessary,” she added.

sensors.AFRICA is a citizen science initiative by Code for Africa that locally develops and assembles low-cost sensors to give citizens and civic watchdogs actionable information about their cities. The initiative provides open air quality data and has deployed sensors in different cities in sub-Saharan Africa, including Accra, Abuja, Kumasi, Lagos and Nairobi.

“Our air quality datasets are utilised by a wide range of stakeholders, including journalists who report impactful stories on the detrimental effects of air pollution in marginalised communities,” said Alicia Olago, senior product manager at sensors.AFRICA.

She further emphasised the critical role of citizen science and participatory mapping in pinpointing pollution sources, noting: “The added sense of ownership by communities who participate in data collections often leads to informed decision-making and social change in African communities.”

Gideon Maina, the senior IoT engineer at sensors.AFRICA, delved into the technical aspects, showcasing the hardware and software components, along with the design principles that guide the development of the low-cost sensors: “We want anyone with the skills and capacity to develop a sensor to use our design and build their own sensor.”

The sensors have been upgraded to include a solar panel to cope with power fluctuations, and feature a dashboard crucial for tracking deployment and maintenance schedules, over-the-air (OTA) updates, device configurations, and mapping the sensor network.

The air quality Datathon: Use cases for open air quality data

The main event was the air quality datathon. Participants were asked to divide into groups of five, with each group nominating a leader, presenter, notetaker, timekeeper, and a researcher. Teams chose from Nairobi, Accra, and Abuja to develop a use case for the selected city’s dataset. With just an hour at their disposal, groups harnessed open air quality data from sensors.AFRICA to craft their innovative use cases.

Smart mobility solutions in Abuja

The Abuja group aimed to create smart mobility solutions. Using a predictive model, they sought to guide commuters towards the most convenient and eco-friendly transport options. Their analysis – which connected poor air quality directly to transportation choices – drew on sensors.AFRICA air quality datasets, travel data from the Abuja Metropolitan Authority, and demographic data from the Nigerian National Bureau of Statistics. The group envisioned providing air quality information to facilitate informed decisions.

Non-motorised transportation

Focussing on reducing emissions, the Accra team’s use case was centred around advocating for non-motorised transportation to minimise future health impacts on the students at the University of Ghana, where the sensor is located. Their concern arose from the university’s proximity to a major highway, hence the elevated air pollution levels. With particulate matter readings exceeding WHO standards, and health data from Ghana’s Ministry of Health showing rising cases of respiratory diseases such as asthma and pneumonia, the group advocated for legal reforms. Their goal: to leverage air quality data in supporting community-led initiatives for cleaner air.

The Nairobi group proposed transforming a waste disposal site in Mathare, an informal settlement, into a vibrant green space. The area suffers from poor air quality and high temperatures caused by open waste burning and deforestation. By correlating air quality and temperature data with environmental degradation indicators, the group aimed to use placemaking as a tool. Their goal was to foster community engagement in revitalising Mathare into a healthier, greener neighbourhood.

Reflecting on the Datathon

The event concluded with a panel discussion, led by Alicia Olago, highlighting the need for inclusive dialogue on environmental monitoring involving governments, NGOs, and media. Challenges with data analysis and constraints in data collection were also discussed. “One of the greatest challenges for environmental reporting is data analysis, interpretation and understanding science jargon,” said Jackline Lidubwi, project coordinator at the Internews Earth Journalism Network. “We should not start at data and end at data,” added Victor Indasi, Breathe Cities Lead (Kenya) at the Clean Air Fund.

Highlighting the benefits of open data, Maurice Kavai of Green Nairobi said: “Open data has supported us in advocating for more resources from the legislative assembly for air quality interventions in Nairobi County.” On open data for the sustainable development goals, Dr Andriannah Mbandi, Waste Lead for the United Nations High-Level Climate Champions, said: “I always say atmospheric science is a creed; this event has provided me with a new audience to work with, one that resonates with the commitment to leave no-one behind.”


About Open Data Day

Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities.

As a way to increase the representation of different cultures, since 2023 we offer the opportunity for organisations to host an Open Data Day event on the best date within a one-week period. In 2024, a total of 287 events happened all over the world between March 2nd-8th, in 60+ countries using 15 different languages.

All outputs are open for everyone to use and re-use.

In 2024, Open Data Day was also a part of the HOT OpenSummit ’23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement.

For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.

Distant Reader Index / Distant Reader Blog

Abstract

This posting outlines how I implemented an index to Distant Reader content through the use of Koha and a protocol called SRU (Search/Retrieve via URL). Because of this implementation it is easy for me (or just about anybody else) to search the collection for content of interest and create Distant Reader data sets ("study carrels"). TLDNR: Implement a Koha catalog, turn on SRU, patch the SRU server, and write XSL stylesheets.

Introduction

I have cached a collection of about 0.3 million plain text, JSON, PDF, and TEI/XML files. These files are etexts/ebooks or journal articles. Everything dates from Ancient Greece to the present. For the most part, the subject matter lies squarely in Western literature, but there is a fair amount of social science material as well as an abundance of COVID-related articles. Everything is written in English and everything is open access. This is my library.

I wanted to make my library available to a wider audience, and consequently I implemented a traditional library catalog against the collection -- The Distant Reader Catalog. I outlined the implementation of this catalog -- a Koha instance -- in a different posting. The result works very much like a library catalog. Patrons. Libraries. Circulation. Items. MARC records. Etc. But I also desired a machine-readable way to search the catalog and programmatically process the results. Fortunately, Koha supports a standard API called "SRU" (Search/Retrieve via URL) which is intended for exactly this purpose. Thus, this posting outlines how I implemented SRU against my collection of 0.3 million open access items.

Implementing SRU

SRU is a protocol for querying remote indexes. Like OAI-PMH, it defines a set of standardized name/value pairs to be included in the query string of a URL. Unlike OAI-PMH, it is intended for search, whereas OAI-PMH is used for browse and harvest.

Fortunately -- very fortunately -- Koha supports SRU out-of-the-box. All one has to do is turn it on from the Koha administrative interface. Sent from the command line, the following URL ought to return an SRU Explain response outlining the functionality of the underlying server:

        http://catalog.distantreader.org:2100/biblios
      

After reading between the lines of the Explain response, one can then search. Here is a query for the word "love", and it will only return the number of records found. For the sake of readability, carriage returns have been added to the query, and the query is linked to the XML response:

        http://catalog.distantreader.org:2100/biblios?
version=2.0&
operation=searchRetrieve&
query=love
      

This URL will not only search but also retrieve four records in the given MARCXML schema:

        http://catalog.distantreader.org:2100/biblios?
version=2.0&
operation=searchRetrieve&
query=love&
maximumRecords=4&
recordSchema=marcxml
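
Because the response is plain XML over HTTP, scripting against the interface is straightforward. Here is a minimal sketch in Python; the request parameters mirror the examples above, and the hit-count lookup is done namespace-agnostically for brevity:

    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    BASE = "http://catalog.distantreader.org:2100/biblios"

    def search(query: str, maximum_records: int = 4) -> ET.Element:
        """Run an SRU searchRetrieve operation and return the parsed XML response."""
        params = urlencode({
            "version": "2.0",
            "operation": "searchRetrieve",
            "query": query,
            "maximumRecords": maximum_records,
            "recordSchema": "marcxml",
        })
        with urlopen(f"{BASE}?{params}") as response:
            return ET.fromstring(response.read())

    root = search("love")
    # Find numberOfRecords without hard-coding the SRU response namespace
    hits = next(e.text for e in root.iter() if e.tag.endswith("numberOfRecords"))
    print(hits)  # total number of matching records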
      

I was very impressed with this functionality, but two things were lacking. The first and most important was support for faceted results. I wanted to group my results by author, collection, data type, etc., and the out-of-the-box interface did not support this. After shaking the bushes, so to speak, a person named Andreas Roussos voluntarily stepped up and wrote a patch to Koha's SRU implementation for me. I applied the patch and the faceting worked seamlessly. "Thank you, Andreas!" I can now facet the previous query:

        http://catalog.distantreader.org:2100/biblios?
version=2.0&
operation=searchRetrieve&
query=love&
maximumRecords=4&
recordSchema=marcxml&
facetLimit=32
      

The second thing lacking was robust support for XSL stylesheets. Raw SRU/XML streams make things easier for computer post-processing but difficult for people. XSL stylesheets are intended to overcome this problem by transforming the SRU/XML into HTML that can be rendered in a Web browser. The Koha SRU implementation allows for stylesheets, but there was no place to save them without the Koha user interface getting in the way. And while one is able to denote the location of a stylesheet on a different computer, one's Web browser will complain because of the risks of CORS (cross-origin resource sharing). Again, after shaking the bushes a bit, I learned of an Apache Web server configuration directive called ProxyPass. Using this directive in an Apache virtual host definition, I was able to denote a simplified name for the SRU interface, map the name to the canonical host/port combination, and specify the location of any stylesheets. The whole thing was extraordinarily elegant:

  <VirtualHost index.distantreader.org:80 >
    ServerAdmin emorgan@nd.edu
    DocumentRoot /var/www/index
    ServerName index.distantreader.org
    ProxyPass "/style-searchRetrieve.xsl" "!"
    ProxyPass "/style-explain.xsl" "!"
    ProxyPass / http://catalog.distantreader.org:2100/
  </VirtualHost>

I can now search, retrieve, and render a query:

        http://index.distantreader.org/biblios?
version=2.0&
operation=searchRetrieve&
query=love&
maximumRecords=4&
recordSchema=marcxml&
facetLimit=32&
stylesheet=style-searchRetrieve.xsl
      

While the ProxyPass configuration is essential, the stylesheets make the whole thing come to life. There are two of them: 1) explain to HTML, and 2) search/retrieve to HTML.

But wait. There's more!

Like any library catalog or bibliographic index, one can do the perfect search and get back dozens, if not hundreds or thousands, of relevant search results. Don't let them fool you. Yes, the items at the top of the list are the most statistically relevant, but the differences between the relevancy ranking scores are tiny; for all intents and purposes, each item is equally relevant. Consequently, one is left with the problems of getting the articles, reading them, and coming to an understanding of what they say.

To address this problem -- the problem of consuming a large corpus of materials -- the Index sports a function labeled "Automatically build a data set". Here's how it works:

  1. enter a simple query - a single word or quoted phrase
  2. use the faceted search results to refine the query
  3. when you are satisfied with the results, click "Automatically build a data set"

You will then be asked to authenticate against ORCID. If successful, a data set will be created from the search results, and one can apply natural language processing, machine learning, generative-AI, and general text and data mining to the collection. Heck, one can even print the items in the collection and consume them in the traditional manner of reading. No click, save, click, save, click, save, click, save, etc.

Screenshot of the Distant Reader Index

Summary

Koha is a first-class citizen when it comes to open source software, especially in the library domain. Feed it MARC records, the records get indexed, and the search interface is more than functional. Additionally, one can turn on both OAI-PMH and SRU interfaces making a Koha catalog's content machine readable. With the help of the community, I have done this work, and I have taken it one step further. Not only can one search the collection and get results, but one can also create a data set from the results for the purposes of reading at scale.

ANZREG 2024 / Hugh Rundle

On 18 June I delivered a keynote address to the ANZREG 2024 conference. This was a pretty amazing privilege - talking for nearly an hour to a group of Australian and Kiwi systems and technical services librarians.

A huge thanks to Alissa McCulloch who provided invaluable advice about the earlier drafts of this talk–though of course responsibility for any errors is entirely mine.


Hello. I live in Naarm/Melbourne, just a short walk from the Birrarung where it curves around in a series of turns, folding and looping back on itself. Even though I live in one of the city's oldest inner suburbs, the river is surprisingly peaceful here. In the summer herons dry their wings in the sun and lorikeets shriek as they zoom through the river redgums. On winter mornings like today, a low fog often hangs over the water. The place talks to me about the Wurundjeri people who have lived on this river and this land with the mists and the birds and the redgums for tens of thousands of years.

Traditional Wurundjeri culture, like the culture of other Indigenous people in what is now called Australia, is inseparable from the maintenance of knowledge and learning. Some was embodied, like making a canoe, a coolamon or a net. Some was more cerebral - passed on through stories and mnemonics carved on everyday or sacred objects. And some was experiential, like star maps and Ceremony.

So I want to pay respect to the knowledge and the wisdom and the elders of the Wurundjeri, and Indigenous peoples on all the lands we are gathered on today - to those who lived here many years ago, to those who are passing on their knowledge today, and to the next generation of elders who are emerging.

When Ella asked me to deliver a keynote address at this ANZREG conference, I was quite startled. It's a great honour. Ella told me that I could talk about "Anything–because you have opinions". I think we need more opinions in librarianship. I'd love to provoke you to share some of yours. Hopefully we'll have time at the end of this session to hear from some of you as well.

So friends today we're going to think about libraries, and systems, and library systems.

The description in the program said I'd be using six of my favourite books, but I don't actually have enough time for that, so we'll talk about five instead. Funnily enough, time is one of the things I want to talk about. We'll come back to these themes at the end:

  • Lots of things can be true at the same time
  • Lots of times can be true at the same thing
  • Know where you are
  • Share what you know

Sand Talk

Tyson Yunkaporta is an Aboriginal scholar and founded the Indigenous Knowledge Systems Lab at Deakin University. His book Sand Talk changed how I see the world, so I always want to tell people about it. But the thing about Sand Talk is that it's a book about Indigenous thinking that helps us to think through Indigenous thinking. The whole thing is a series of interconnected stories. This makes it very difficult for me to just pick out a nice quote that summarises everything, which I suspect would make Tyson happy.

So instead I'm going to keep coming back to some of the concepts he shares, as we explore the other books. But briefly, for Yunkaporta everything is about relationships and flows. We need to seek out different people and experiences, and learn what we can. In our interactions the knowledge, energy and resources should flow through us and out to others. And all of these connections and interactions should transform us, not just transform the things and people we interact with and act upon.

In Sand Talk there is also a discussion of time. In Indigenous understanding and many Indigenous languages, something can be both a time and a place. A time can be both future and past. This is part of what is sometimes called the Dreaming, though I personally find it easier to think about it using another term: everywhen. We'll come back to this a few times but let's move on to our next book.

Seeing like a state

James C Scott is an American political scientist and anthropologist. His book Seeing like a state explores how states try to enforce "legibility" on their subjects and in the process ignore or override local and contextual knowledge.

What he means is that for big states and empires, it's impossible to govern a large group of people or a large area of land unless you can understand how the people are behaving and what is happening on the land. States need to "see" the people and the land. But it's too hard to govern if you have to understand all of the details about every person and every tree. So large states simplify things so that they can see what they need to see in order to maintain control. That's what he means by making things "legible".

When this is combined with large scale "high modernist" projects like building cities from scratch, or nation-wide agricultural reforms, the conflict between the clarity of the plan and the messiness of the real world usually results in catastrophic failure.

There's a lot packed into 360 pages, but I want to read along with a particular section for a moment, because it gets to the heart of Scott's argument and has a lot of relevance for our own work. A note here–when Scott writes about "bracketing contingency", what he means is assuming that anything the creators haven't considered is basically irrelevant and won't impact on their plan.

Scott writes in part of his conclusion:

The power and precision of high-modernist schemes depended not only on bracketing contingency but also on standardizing the subjects of development... such subjects... have no gender, no tastes, no history, no values, no opinions or original ideas, no traditions, and no distinctive personalities to contribute... The lack of context and particularity is not an oversight: it is the necessary first premise of any large-scale planning exercise.

James C Scott is writing here about huge state projects like the construction of the city of Brasilia, and the vast and catastrophic changes to agriculture and social life in Stalinist Russia. But I think there are lessons here for those of us working in libraries today as well.

I see most theory in librarianship, and much of its practice, as fitting squarely within the high modernism that Scott writes about. After all–what is classification, cataloguing, and ontological mapping really all about, if not an attempt to render the messiness of the world "legible", as Scott puts it? Every controlled vocabulary flattens reality into a list of predetermined categories and definitions. To an extent this is inescapable, but we should at the very least be more mindful about which perspectives we are including and which we are excluding.

Most weeks I think about how absurd it is that the overwhelming majority of libraries of all types in Australia and New Zealand use Library of Congress Subject Headings as the main, or only, controlled vocabulary in our systems. As if the interests and needs of the United States government are somehow universal and relevant enough that we don't need to bother thinking about what local people, local concepts, and local worldviews need in and from the collections we manage and the systems we use to tell the world about them.

We all know how unhelpful it is to shelve physical items based on a classification system designed by an American racist sexual predator from the nineteenth century. Yet the Dewey Decimal system is the one thing most people think they know about libraries.

Even when we "genrefy" collections, you can usually scratch the surface and see DDC underneath.

Books in a "genre" section that also have Dewey Decimal call number spine labels

And whilst there are discipline-specific thesauri and classification systems, and there's been a lot of work happening in the last few years to add things like AIATSIS headings and Homosaurus terms, it would be very rare to find an original catalogue record in a general-collection library that was built on those from scratch and doesn't use LCSH at all.

And this kind of flattening vision happens all the time in other parts of our systems. In recent months I've gone back and forth with a vendor in an ever more frustrating series of support tickets. Needless to say, support, such as it is, is offered in the middle of the Australian night.

Some of our students haven't been able to sign in to Lexis Nexis using single sign on. After some troubleshooting, it became clear that the problem was that the students in question all have only one name each, and the Lexis Nexis platform assumes that they have two different names: a "first name" and a "surname". When we pointed out this flaw in their system, Lexis Nexis insisted that we need to change things at our end so that we sent them two names for each user.

Lexis Nexis sees like a state. If users don't have surnames, we're expected to create one for them. James C Scott tells us in his book that the Spanish colonial powers did exactly the same thing in the Philippines, giving each regional governor a couple of pages from an alphabetically ordered list of approved names. They then went from village to village and assigned family names from the list so that the government could identify people more effectively. Here we are in 2024 expected to do the same thing.

Our support ticket with Lexis Nexis remains open.

This tendency to only see certain things was obvious when the State Library of Queensland launched their Virtual Veteran project just before Anzac Day this year. The Virtual Veteran is a generative-AI chatbot trained on a selection of texts the library holds about the First World War.

Unfortunately the project team at SLQ had a hard time imagining the sort of people I immediately thought of when I first found out about it. Though to be fair, that's probably because I first found out about it not from the library, but from someone who had rather different ideas about how the Virtual Veteran might be used:

The Queensland Government has spent money on an LLM chatbot "trained" on World War I lore in order to "celebrate" ANZAC Day. Please note that it has specifically been given guardrails to not respond to questions about Ben Roberts-Smith, do not under any circumstances try to get around them for comedic effect:

I'm sure you can imagine the sorts of questions this chat bot was responding to pretty soon afterwards.

In a webinar the next week, one of the people involved in building the Virtual Veteran chatbot confirmed that no testing had been done to check whether the guardrails put in place would work. It simply hadn't occurred to them that anyone would ask the Virtual Veteran about anything other than ...well, I don't know. How brave the soldiers had been, or what they thought about mateship? Who did SLQ have in mind as the people who might ask questions of the Virtual Veteran? Maybe they were people like Scott's "human subjects", people with:

no gender, no tastes, no history, no values, no opinions or original ideas, no traditions, and no distinctive personalities.

Well, it turns out people do have tastes, values, and original ideas:

screenshot of chatbot conversation where the bot is tricked into providing a recipe for Turkish Delight

Despite claiming that Charlie has "the persona of a World War I soldier", his responses have the blandly safe tone you might expect from an official government spokesperson. The bot was trained on the 12-volume Official History of Australia in the War of 1914-1918; on newspaper articles from the time that were subject to government censorship; and on war correspondence donated to the library. Generative AI by its nature provides confident responses to any prompt based on the content it was trained on. When it was trained on the official narrative of a government that declared war in support of the British Empire, it's hardly surprising that the result is a sanitised imperial view of the world.

What does Charlie have to say to Australians of Turkish, German, or Arab descent? How does the chat bot provide insight into the multiplicity of experiences Australians had during the first world war?

I think we can do better than this.

I can see what the librarians at SLQ were probably trying to do. The Virtual Veteran was an interesting idea that wanted to help Queenslanders engage with their history in a more interactive and intuitive way. Popping down to the library to read all twelve of Bean's volumes about the war in an uncomfortable chair isn't for everyone.

It might have worked with a different framing. The Library could have been explicit about the absences and biases in the collections their chatbot drew from. There could maybe have been different personas to answer different types of questions. Or even better: different personas to answer every question, each having a different perspective.

Because there are always different perspectives. The one thing every historian agrees on is that history is contested.

The Corley Explorer is an example of the opposite approach: inviting a community to engage with a library collection. Interestingly, this was also a State Library of Queensland project. The Corley Explorer invites a multitude of overlapping experiences and histories into the library, contributing to the stories the library knows about its collections.

Frank and Eunice Corley drove around the suburban streets of 1960s and 70s Queensland in their pink Cadillac. They took photos of houses, developed them, and then went door to door to sell them to the homeowners.

When 61,000 photographs from the Corleys' basement were donated to SLQ, the library did something very interesting. The Library didn't have much meaningful metadata for this large collection, so it launched the Corley Explorer, inviting Queenslanders to browse the collection and fill in the missing details: geo-locating each house, providing information about what has changed since the photograph was taken, and sharing family histories about who had lived in the house and when.

The Corley Collection could have been a pretty tedious set of broadly similar Queenslander houses from half a century ago, and it wouldn't have said much other than perhaps something about architectural history. Instead of that, SLQ now shares thousands of individual stories, finds connections between them and maintains an ever-growing text for future generations to learn from.

So the Virtual Veteran and the Corley Explorer–two projects from the same library–give us an example of what James C Scott calls "seeing like a state", and also an example of how we might avoid it.

In his book, Scott describes something I think is really important for us to think about as librarians. He writes:

[High modernism's] simplifying fiction is that, for any activity or process that comes under its scrutiny, there is only one thing going on.

I'd love to see more discovery tools like the Corley Explorer. And sure, you could describe it as a "crowd sourcing" project, but I think that really underestimates the work the Corley Explorer is doing. With the Corley Explorer, SLQ has the courage to admit that the collection is full of unknown unknowns, and it uses the Library's ignorance to create connections and encourage storytelling.

The project explicitly encourages multiple perspectives, understandings and histories to layer across each other and connect with each other. It explores memory, community, and meaning. Because of this, every photo is also a mnemonic object like the ones I mentioned at the beginning, prompting memories of childhoods, families and neighbours, and then wrapping those memories into something the library can record.

The Corley Explorer encourages the diversity and human connections that Tyson Yunkaporta highlights as so important in Sand Talk. But it also does the other thing he highlights as important, that we always find so difficult in big institutions: the Corley Explorer changes the State Library of Queensland and the Corley Collection itself, every time someone interacts with it.

Spam

Finn Brunton's Spam: a shadow history of the Internet gives us some helpful ways to understand the weirder and more annoying things we encounter online. Brunton is now Professor of Science and Technology Studies at UC Davis, and studies histories of technology and hacking culture. He has a pretty capacious definition of spam:

Spam is the use of information technology infrastructure to exploit existing aggregations of human attention.

The book was published in 2013, but here's a site someone put together to show how we experience the web in the 2020s, which really helps us visualise what he was talking about:

https://how-i-experience-web-today.com

So every commercial website is mostly spam. But let's touch on some weirder examples before getting into things more specific to libraries.

For something slightly closer to libraries, we can observe the Amazon ebook store. This problem is so old, Finn Brunton flagged it as an interesting new development in his book eleven years ago. But since then, it's expanded into a much more serious problem.

Last August, author Jane Friedman published a blog post called I Would Rather See My Books Get Pirated Than This. What on earth could be so bad that a published author would prefer her books to be pirated?

AI-generated ebooks listing her as the author, being sold on Amazon.

Friedman found at least five titles attributed to her that were obviously AI-generated. Needless to say, she hasn't received any royalties for these books. That's pretty bad, but what happened next shows how optimised and automated metadata systems can really go wrong.

Amazon owns the Goodreads platform used by readers and authors to track reading and talk about books. Friedman found out about these fake Amazon titles because they were automatically linked to her Goodreads profile. Nobody at Amazon knew these titles weren't published by her, because the system was so "optimised" that there were no humans involved at all. And Amazon's response when she complained?

"Please provide us with trademark registration numbers".

Demanding that authors go through the bureaucratic work of trademarking their own names before they take any action against robo-impersonation might seem callous, greedy, and irresponsible. But that's only because it is callous, greedy, and irresponsible.

I quoted James C Scott earlier when we were talking about his book:

[High modernism's] simplifying fiction is that, for any activity or process that comes under its scrutiny, there is only one thing going on.

Well, Amazon insists there is only one thing going on in their ebook store, because for them there is only one thing going on: making gigantic profits. Anything else is just a price worth paying.

Especially if it's someone else who pays.

I reckon Tyson Yunkaporta would identify this as a bad energy flow.

August last year was a pretty busy time for stories about the AI-generated catastrophe that is the Amazon ebook store. Another story showed how high this price could be for those unfortunate enough to purchase a book from Amazon. Fake books by real authors is one thing, but fake books by fake authors can be deadly.

In late August a bunch of AI-generated mushroom-foraging books appeared in the store. One of these was The Ultimate Mushrooms Book Field Guide Of The Southwest: An essential field guide to foraging edible and non-edible mushrooms outdoors and indoors. Perhaps the idea of foraging "indoors" for "non-edible" mushrooms should have been a clear giveaway that something wasn't quite right here.

Some people seek out mushrooms for their hallucinogenic effect, but I think we can hopefully all agree that if you're foraging for mushrooms, you don't want to trust a generative AI that is hallucinating.

Samantha Cole from 404 Media wrote:

False Morels and Death Caps are two species found in the American Southwest that look a lot like their edible, non-poisonous counterparts and can kill you within hours. Foraging safely for mushrooms can require deep fact checking, curating multiple sources of information, and personal experience with the organism.

Of course, generative AIs in the cloud can't have personal experiences with anything. The right way to understand and share knowledge about wild mushrooms is to do it the way the Wurundjeri did. I mentioned embodied knowledge right at the top, and this is an example of that. Sometimes you have to be able to observe something in its context: to smell the air and touch the ground. Reading it in a book isn't enough. Especially when the book was written by a spicy auto-complete.

If we're going to call ourselves information professionals, we need to be thinking a few steps ahead about the consequences of short term convenience. We need to think like a system. We need to think like a spammer. In his conclusion, Finn Brunton thinks about what spam tells us:

Spammers push the properties of information technology to their extremes: the capacity for automation, algorithmic manipulation, and scripting; the leveraging of network effects and vast economies of scale; distributed connectivity and free or very low-cost participation. Indeed, from a certain perverse perspective spam can be presented as the Internet's infrastructure used maximally and most efficiently.

So what is this telling us? Let us imagine.

What happens when generative AI creates a deadly mushroom foraging book and then an automated recommendation algorithm adds it to a standing order for a collections librarian trying to fit two full time roles into one set of working hours? Who is checking whether all of the articles and all of the journals in our big deals even exist, let alone make any sense? If a misconfiguration or a malicious publisher silently deleted all the subject headings or references related to a particular topic, how long would it take your library to even notice?

When you optimise for efficiency, things can be taken so far that there ceases to be any meaning at all.

The people building systems that advise us to add glue to our pizza toppings, eat crushed glass, and drive into the desert aren't interested in context, they're not interested in localised or place-based knowledge, and they're not interested in building human connections.

The information systems I want to work on and with would do more than simply push increasing amounts of text at decreasing amounts of attention.

Among other things, it's this context that makes me worry about the huge firehoses of metadata like Ex Libris's Central Discovery Index or the EBSCO Discovery Service. The efficiency of having a single source of metadata directly from publishers is obviously convenient. But it is in direct contradiction to the push for us to have culturally relevant, decolonised collection metadata.

We can't have it both ways.

The Shock of the Old

David Edgerton is an English historian and educator. His The Shock of the Old is a history of technology in the twentieth century. Edgerton makes the case that the history of technology-in-use provides a much more realistic view of how invention and innovation work, and particularly highlights the importance of repair and maintenance.

The British Library is an interesting case study. Although it doesn't feature in Edgerton's book, we can think with some of his themes to find some lessons for our own libraries.

In October last year, the British Library was hit by a ransomware attack that knocked most of its systems offline for weeks and weeks on end. Some of these systems are still unusable.

When I speak of what Tyson Yunkaporta tells us about allowing interactions to change us, this isn't what I mean!

The Library's official incident review provides details of how the cyber attack came about, but the much more interesting lesson for us is what the consequences were. It reads in part:

The Library's vulnerability to this particular kind of attack has been exacerbated by our reliance on a significant number of ageing legacy applications which are now, in most cases, unable to be restored, due to a combination of factors including technical obsolescence, lack of vendor support, or the inability of the system to operate in a modern secure environment... A few key software systems, including the library management system, cannot be brought back in the form that they existed in before the attack, either because they are no longer supported by the vendor and the software is no longer available, or because they will not function on the Library's new secure infrastructure

  • "ageing legacy applications"
  • "technical obsolescence"
  • "lack of vendor support"
  • "no longer supported by the vendor"
  • "software is no longer available"
  • "will no longer function on the organisation's new infrastructure"

Friends, does any of this sound familiar?

Like all other public libraries in the UK, the British Library clearly has been underfunded for the task it has been given responsibility for. But something I think has been lost in all the commentary about this event is the extent to which this catastrophe was the result of a mismatch between the infrastructure needs of a national library and the commercial interests of software companies.

The average lifespan of a company on the S&P 500 is now only around 20 years. Unless they are tech startups hoping to be acqui-hired, corporations want to do everything they can to make a profit for their shareholders today. They might be gone tomorrow.

Libraries and those of us who work within them have fundamentally different priorities. For all the recent noise about libraries and progressive social values, librarianship is ultimately a very conservative profession. How will we retain our collections and their metadata into the future? Can the data be migrated to a new system, and how long will the current one last? How easy will it be when the inevitable change to preferred terminology occurs, and we want to update our vocabularies? And most fundamentally, how can we guarantee that we always know where everything in our collection is?

It's the job of corporate executives to think about what's needed in the next financial quarter. But librarians often need to think about what's needed in the next generation.

This is why reading the British Library report was both horrifying, and horrifyingly familiar. I could see exactly how they got into this predicament, and I could visualise exactly the systems I'm responsible for that might suffer the same fate.

The ultimate conclusion of the British Library's report was that they are going to protect themselves in the future by moving as many systems as possible to "the cloud". But even they acknowledge that this doesn't really solve the problem, saying:

Moving to the cloud does not remove our cyber-risks, it simply transforms them to a new set of risks.

UniSuper recently learned about this new set of risks, when Google engineers accidentally deleted UniSuper's entire Google Cloud account and all its data.

I predict that Google, Amazon, and even Clarivate will all be long gone before the British Library closes its doors for the last time.

David Edgerton's book helps us to ask questions about the maintainability of systems that have long surpassed their expected lifetimes and the contexts in which they were created.

Will we still be able to access anything after our software as a service vendors declare bankruptcy? What internal processes have we optimised on the assumption that a vendor will take care of it? What happens when they don't? Do we even know how to export and import backups? Does anyone in our team still know how to create a MARC record from scratch? Are we training the next generation of librarians who will come after us, and passing on the old knowledge that informs how and why we do things?

We need to ask these questions, because nobody else will.

Technical services and systems work loops and folds back on itself like the Birrarung. Sometimes we're moving forward and backwards at the same time. We use ideas from the nineteenth century or earlier, to build new systems and frameworks for the future. We always have to think about how things from the past will be embedded in the future, and how that affects our present.

So, we need hope.

Hope in the dark

Rebecca Solnit is one of the most skilled essayists of the last century, able to see both the most exquisite details and the sweeping vistas of meaning they are part of. In her collection of essays, Hope in the dark, she writes:

This is an extraordinary time full of vital, transformative movements that could not be foreseen. It's also a nightmarish time. Full engagement requires the ability to perceive both.

I've been talking about some of the frustrating, depressing and exhausting things about librarianship. It is, indeed, both an extraordinary and a nightmarish time. Rebecca Solnit helps us to find a way forward.

One way we can move forward is–as Tyson Yunkaporta teaches us–to reconsider the very idea of moving forward.

The British Library's systems were fragile, and would be difficult or impossible to recover in the case of an attack or other misfortune. The Library's past was always present, in the form of software abandoned by its creators, data formats whose documentation was lost, and security updates that hadn't been performed.

But it's not as if the Library staff didn't know about these things. Quite the opposite: the present they lived in was no doubt full of frustration that they were – for a variety of reasons – unable to remedy these problems. Various futures lived alongside them – from the eventual possibility of opening up software after patents and copyrights expired, to better funding or some kind of artificial intelligence breakthrough. The futures imagined by library staff, by British politicians, and by international hackers interacted with the present of an under-resourced tech team and a couple of basic information security mistakes, with the past of closed-source systems and out-of-business vendors.

If we think in terms of spirals rather than straight lines, if we think of everywhen, then we can more easily understand that pasts, presents and futures exist simultaneously. We can open our imaginations to new possibilities. David Edgerton tells us:

The history of invention is not the history of a necessary future to which we must adapt or die, but rather of failed futures, and of futures firmly fixed in the past. We should feel free to research, develop, innovate, even in areas which are considered out of date by those stuck in passé futuristic ways of thinking.

David Edgerton encourages us to ignore what he calls "passé futuristic ways of thinking". This is a beautiful phrase, but it also captures an approach I think librarians need to embrace more comfortably.

I've lived through a profession-wide panic based on fear of obsolescence. And after a decade or more chasing the latest trends, libraries are now struggling to find people with the kind of deep knowledge about metadata and technical systems that you all know is crucial to running a large library, and to any claim we have to being a profession. This is not a problem that can be solved through individual hiring decisions, because it's a systemic problem born of a failed future.

We need to fix it together.

I would go further than Edgerton. I strongly encourage you to research, develop, and innovate, especially in areas which are considered out of date by those stuck in passé futuristic ways of thinking.

I've been as guilty as anyone else of venting my frustration by suggesting we should just burn everything down and start again.

But we don't live in a perfect world, and frankly we don't have the resources to start from scratch. We have to make the future whilst operating in the present with the tools we've inherited from the past. As Rebecca Solnit tells us,

Waiting until everything looks feasible is too long to wait.

So what are we going to do? At the beginning of this talk I said I had four things I'd like us to think about:

Lots of things can be true at the same time

Traditional library practice sees like a state, assuming there is "only one thing going on". We need to continue to apply multiple vocabularies to our collections, expressing multiple worldviews. Linked data hasn't yet lived up to its promise, but it might help. However we do it, we'll need to be comfortable with the idea that these different ways of organising concepts and connections between things don't necessarily map neatly onto each other.

Lots of times can be true at the same time

Everything we do has a past and a future that are active in the present. Maybe multiple futures. Make time to write good documentation for your future self or the person who replaces you. They'll be incredibly grateful. Try to understand why things are set up the way they are, but remember that you don't have to keep doing things that way. Things that seemed like a good idea at the time ...maybe weren't. And remember that the time that things erroneously seemed like a good idea might be our time, right now. Try to think about the consequences of your decisions in 5, 10, or 25 years. Maybe saving a few dollars this year could end up being very expensive.

Know where you are

Central indexes can be convenient, but we need to be sophisticated in how we use them. We live and work in Australian and New Zealand contexts. Our library collections and their descriptions should reflect that.

Most of our institutions at least pay lip service to the idea that they need to be more culturally aware, and to "decolonise" or "Indigenise". If that's going to happen, we need to be describing our collections in locally contextual, culturally appropriate terms. This isn't something where we can just press a button or tick off a project in a single annual plan. And it's only partially something we can solve collectively with agreed standards. It's a mindset and an ongoing, local responsibility. The people who allocate money and other resources in our organisations don't want to hear this, but that's the reality if we're serious about it.

I don't have any magical solutions to this for you today, but I'd really like for us to keep talking about it and share ideas about what is working in your context.

Share what you know

ANZREG is a wonderful community that shares freely. But it seems to me that we could be sharing more ideas, tools, and techniques in librarianship generally. I encourage you all to be a little more brave. Send that "dumb question" to the email list. Publish that blog post you're not sure about. Post some code to GitHub. Say yes when someone invites you to give a conference talk. You're good enough to write a journal article. You know enough to peer review a conference talk or a paper.

Tyson Yunkaporta quotes his friend Katherine Collins in his latest book Right Story, Wrong Story. She says:

When learning new things, we are trained to think Is this true or false? But it is so much better to think When will this be useful? Also When should I not rely on this? When will it fall apart?

These are good questions to think about as we attend the sessions this week. Let's make connections, and let them change us, and think about how in turn we're going to change the systems we work in and with and on.


Cloudfront in front of S3 using response-content-disposition / Jonathan Rochkind

At the Science History Institute Digital Collections, we have a public collection of digitized historical materials (mostly photographic images of pages). We store these digitized assets — originals as well as various resizes and thumbnails used on our web pages — in AWS S3.

Currently, we provide access to these assets directly from S3. For some of our deliveries, we also use the S3 feature of a response-content-disposition query parameter in a signed expiring S3 url, to have the response include an HTTP Content-Disposition header with a filename and often attachment disposition, so when the end-user saves the file they get a nice humanized filename (instead of our UUID filename on S3), supplied dynamically at download time — while still sending the user directly to S3, avoiding the need for a custom app proxy layer.
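
For concreteness, here’s a minimal sketch of generating such a URL with the ruby AWS SDK (bucket and key names are hypothetical placeholders, not our actual ones):

require "aws-sdk-s3"

# Hypothetical bucket and key, for illustration only
object = Aws::S3::Resource.new(region: "us-east-1")
  .bucket("my-digital-collections")
  .object("0a1b2c3d-uuid.jpg")

# Signed, expiring URL asking S3 to respond with a humanized filename
url = object.presigned_url(
  :get,
  expires_in: 15 * 60, # seconds
  response_content_disposition: 'attachment; filename="page-001.jpg"'
)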

While currently we’re sending the user directly to urls in S3 buckets set with public non-authenticated access, we understand a better practice is putting a CDN in front like AWS’s own CloudFront. In addition to the geographic distribution of a CDN, we believe this will give us: better, more consistent performance even in the same AWS region; possibly some cost savings (although it’s difficult for me to compare the various different charges over our possibly unusual access patterns); and additionally access to using AWS WAF in front of traffic, which was actually our most immediate motivation.

But can we keep using the response-content-disposition query param feature to dynamically specify a content-disposition header via the URL? It turns out you certainly can keep using response-content-disposition through CloudFront. But we found it a bit confusing to set up, and to think through the right combination of features and their implications, with not a lot of clear material online.

So I try to document here the basic recipe we have used, as well as discuss considerations and details!

Recipe for CloudFront distribution forwarding response-content-disposition to S3

  • We need CloudFront to forward the response-content-disposition query param to S3 — by default it leaves off the query string (after ? in a URL) when forwarding to the origin. You might reach for a custom Origin Request Policy, but it turns out we’re not going to need it, because a Cache Policy will take care of it for us.
  • If we’re returning varying content-disposition headers, we need a non-default Cache Policy such that the cache key varies based on response-content-disposition too — otherwise changing the content-disposition in query param might get you a cached response with old stale content-disposition.
    • We can create a Cache Policy based on the managed CachingOptimized policy, but adding the query params we are interested in.
    • It turns out including URL query params in a Cache Policy automatically leads to them being included in origin requests, so we do NOT need a custom Origin Request Policy, only a custom Cache Policy that includes response-content-disposition.
  • OK, but for the S3 origin to actually pay attention to the response-content-disposition query param, you need to set up a CloudFront Origin Access Control (OAC) given access to the S3 bucket, and set to “sign requests”, since S3 only respects this parameter for signed requests.
    • You don’t actually need to restrict the bucket to only allow requests from CloudFront, but you probably want to make sure all your bucket’s requests are going through CloudFront?
    • You don’t need to restrict the CloudFront distro to Restrict viewer access, but there may be security implications of setting up response-content-disposition forwarding with a non-restricted distro? More discussion below.
    • Some older tutorials you may find use AWS “Origin Access Identity (OAI)” for this, but OAC is the new non-deprecated way, don’t follow those tutorials.
    • Setting this all up has a few steps, but this CloudFront documentation page leads you through it.

At this point your Cloudfront distribution is working to forward response-content-disposition query params, and return the resultant content-disposition headers in the response — Cloudfront forwards on all response headers from origin by default, if you haven’t set a distribution behavior “Response headers policy”. Even setting a response headers policy like Managed-CORS-with-preflight-and-SecurityHeadersPolicy (which is what I often need), it seems it forwards on other response headers like content-disposition no problem.
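
If you want to sanity-check that, a quick script like the following (the distribution domain is a placeholder, and it assumes viewer access is not restricted) should show the content-disposition header coming back through CloudFront:

require "net/http"
require "cgi"

# Placeholder distribution domain; assumes the distro does not restrict viewer access
disposition = CGI.escape('attachment; filename="nice.jpg"')
uri = URI("https://mydistro.cloudfront.net/content.jpg?response-content-disposition=#{disposition}")

response = Net::HTTP.get_response(uri)
puts response["content-disposition"] # expect: attachment; filename="nice.jpg"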

Security Implications of Public Cloudfront with response-content-disposition

An S3 bucket can be set to allow public access, as I’ve done with some buckets with public content. But to use the response-content-disposition or response-content-type query param to construct a URL that dynamically chooses a content-disposition or content-type — you need to use an S3 presigned url (or some other form of auth I guess), even on a public bucket! “These parameters cannot be used with an unsigned (anonymous) request.”

Is this design intentional? If this wasn’t true, anyone could construct a URL to your content that would return a response with their chosen content-type or content-disposition headers. I can think of some general vague hypothetical ways this could be used maliciously, maybe?

But by setting up a CloudFront distribution as above, it is possible to set things up so an unsigned request can do exactly that. http://mydistro.cloudfront.net/content.jpg?response-content-type=application%2Fx-malicious, and it’ll just work without being signed. Is that a potential security vulnerability? I’m not sure, but if so you should not set this up without also setting the distribution to have restricted viewer access and require (eg) signed urls. That will require all urls to the distribution to be signed though, not just the ones with the potentially sensitive params.

What if you want to use public un-signed URLs when they don’t have these sensitive params; but require signed URLs when they do have these params? (As we want the default no-param URLs to be long-cacheable, we don’t want them all to be unique time-limited!)

Since CloudFront “restricted access” is set for the entire distribution/behavior, you’d maybe need to use different distributions both pointed at the same origin (but with different config). Or perhaps different “behaviors” at different prefix paths within the same distribution. Or maybe there is a way to use custom Cloudfront functions or lambdas to implement this, or restrict it in some other way? I don’t know much about that. It is certainly more convoluted to try to set up something like how S3 alone works, where straight URLs are public and persistent, but URLs specifying response headers are signed and expiring.

Other Considerations

You may want to turn on logging for your CloudFront distro. You may want to add tags to make cost analysis easier.

In my buckets, all keys have unique names using UUID or content digests, such that all URLs should be immutable and cacheable forever. I want the actual user-agents making the request to get far-future cache-control headers. I try to set S3 cache-control metadata with far-future expiration. But if some got missed or I change my mind about what these should look like, it is cumbersome (and has some costs) to try to check/reset metadata on many keys. Perhaps I want the CloudFront distro/behavior to force add/overwrite a far-future cache-control header itself? I could do that either with a custom response headers policy (might want to start with one of the managed policies, and copy/paste it, modifying to add the cache-control header), or perhaps a custom origin request policy that added on an S3 response-cache-control query param to ask S3 to return a far-future cache-control header. (You might want to make sure you aren’t telling the user-agent to cache error messages from origin though!)
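
For instance, the custom response headers policy route might look something like the following in terraform (a sketch only, with a made-up policy name, not something we have necessarily deployed):

resource "aws_cloudfront_response_headers_policy" "far-future-cache" {
  name    = "far-future-cache"
  comment = "Force a far-future cache-control header on responses"

  custom_headers_config {
    items {
      header   = "Cache-Control"
      value    = "public, max-age=31536000, immutable"
      override = true # overwrite whatever cache-control came from S3, or add it if missing
    }
  }
}

You’d then attach it to the distribution behavior via its response_headers_policy_id argument, alongside the cache policy.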

You may be interested in limiting to a CloudFront price class to control costs.

Terraform example

Terraform files demonstrating what is described here can be found: https://gist.github.com/jrochkind/4edcc8a4a1abf090a771a3e0324f6187

More detailed explanation below.

Detailed Implementation Notes and Examples

Custom Cache Policy

Creating cache policies is discussed in the AWS docs.

That a Cache Policy results in query params being included in origin requests is documented in Control origin requests with a policy:

Although the two kinds of policies are separate, they are related. All URL query strings, HTTP headers, and cookies that you include in the cache key (using a cache policy) are automatically included in origin requests. Use the origin request policy to specify the information that you want to include in origin requests, but not include in the cache key. Just like a cache policy, you attach an origin request policy to one or more cache behaviors in a CloudFront distribution.

You set a cache policy for your distribution (or a specific behavior) by editing that Behavior in the CloudFront console. I created the Cache Policy with the TTL values from the “CachingOptimized” managed policy, and added the query params I was interested in. Which looks like this in terraform:

 resource "aws_cloudfront_distribution" "example-test2" {
      # etc
      default_cache_behavior {
          cache_policy_id        = "658327ea-f89d-4fab-a63d-7e88639e58f6"
      }
}

resource "aws_cloudfront_cache_policy"  "jrochkind-test-caching-optimized-plus-s3-params" {
  name        = "jrochkind-test-caching-optimized-plus-s3-params"
  comment     = "Based on Managed-CachingOptimized, but also forwarding select S3 query params"
  default_ttl = 86400
  max_ttl     = 31536000
  min_ttl     = 1
  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_brotli = true
    enable_accept_encoding_gzip   = true

    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "whitelist"
      query_strings {
        items = [
          "response-content-disposition",
          "response-content-type"
        ]
      }
    }
  }
}

CloudFront Origin Access Control (OAC) to sign requests to S3

Covered in CloudFront docs Restrict access to an Amazon Simple Storage Service origin, which leads you through it pretty nicely.

While you could leave off the parts that actually restrict access (say allowing public access), and just follow the parts for setting up an OAC to sign requests… you probably also want to restrict access to s3 so only CloudFront has it, not the public?

Relevant terraform follows. (You may want to use the templating feature for the JSON policy, as shown in the complete example above.)

resource "aws_cloudfront_distribution" "example-test2" {
    # etc
    origin {
        connection_attempts = 3
        connection_timeout  = 1
        domain_name         = aws_s3_bucket.example-test2.bucket_regional_domain_name
        origin_id           = aws_s3_bucket.example-test2.bucket_regional_domain_name
        origin_access_control_id = aws_cloudfront_origin_access_control.example-test2.id
    }
}

resource "aws_s3_bucket_policy" "example-test2" {
    bucket = "example-test2"
    
    policy = jsonencode(
        {
            Id        = "PolicyForCloudFrontPrivateContent"
            Statement = [
                {
                    Action    = "s3:GetObject"
                    Condition = {
                        StringEquals = {
                            "AWS:SourceArn" = aws_cloudfront_distribution.example-test2.arn
                        }
                    }
                    Effect    = "Allow"
                    Principal = {
                        Service = "cloudfront.amazonaws.com"
                    }
                    Resource  = "arn:aws:s3:::example-test2/*"
                    Sid       = "AllowCloudFrontServicePrincipal"
                  },
            ]
            Version   = "2008-10-17"
        }
    )
}

resource "aws_cloudfront_origin_access_control" "example-test2" {
  description                       = "Cloudfront signed s3"
  name                              = "example-test2"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

Restrict public access to CloudFront

We want to require signed urls with our CloudFront distro, similar to what would be required with a non-public S3 bucket directly. Be aware that CloudFront uses a different signature algorithm and type of key than s3 and expirations can be further out.

See AWS doc at Serve private content with signed URLs and signed cookies.

  • Create a public/private RSA key pair
    • openssl genrsa -out private_key.pem 2048
    • extract just the public key with openssl rsa -pubout -in private_key.pem -out public_key.pem
    • Upload the public_key.pem to CloudFront “Public Keys”, and keep the private key in a secure place yourself.
  • Create a CloudFront “Key Group”, and select that public key from select menu
  • In the Distribution “Behavior”, select “Restrict Viewer Access”, to a “Trusted Key Group”, and choose the Trusted Key Group you just created.

Now all CloudFront URLs for this distribution/behavior will need to be signed to work, or else you’ll get an error Missing Key-Pair-Id query parameter or cookie value. See Use signed URLs. (you could also use a signed cookie, but that’s not useful to me right now).

You’ll need the private key to sign a URL. Note that CloudFront uses an entirely different key signing algorithm, protocol, and key than s3 signed urls! Shrine’s S3 docs have a good ruby example of using the ruby AWS SDK Aws::CloudFront::UrlSigner, which will by default use a “canned” policy. (I’m not sure what default expiration you’ll get without specifying it in the call, as in that example.)

Using a “canned” policy that just has a simple expiration, passing a custom expiration for 7 days in the future might look something like this (the key_pair_id below is a placeholder for your CloudFront public key ID):

require "aws-sdk-cloudfront"
# key_pair_id: the ID of the CloudFront Public Key uploaded above (placeholder value)
signer = Aws::CloudFront::UrlSigner.new(key_pair_id: "KEXAMPLEKEYID", private_key_path: "private_key.pem")
signed_url = signer.signed_url(
  "https://mydistro.cloudfront.net/content.jpg?response-content-disposition=etc",
  expires: Time.now.utc.to_i + 7 * 24 * 60 * 60,
)

Terraform for creating restricted cloudfront access as above:

resource "aws_cloudfront_public_key" "example-test2" {
  comment     = "public key used by our app for signing urls"
  encoded_key = file("public_key-example-test2.pem")
  name        = "example-test2"
}

resource "aws_cloudfront_key_group" "example-test2" {
  comment = "key group used by our app for signing urls"
  items   = [aws_cloudfront_public_key.example-test2.id]
  name    = "example-test2"
}

resource "aws_cloudfront_distribution" "example-test2" {
  # etc
  trusted_key_groups = [aws_cloudfront_key_group.example-test2.id]
}

(Warning: with terraform aws provider v5.53.0, to have terraform remove the trusted_key_groups and make the distro public again, you have to leave in trusted_key_groups = [], rather than removing the key entirely. Perhaps that’s just part of how terraform works.)

On Donella Meadows and Systems Thinking / Jez Cope

This weekend I started reading Donella Meadows' Thinking in Systems: A Primer and I cannot overstate how profoundly glad I am to have come across Systems Thinking as a whole field of study. It pulls together so many things that have interested me over the years, and makes sense of a whole load of things I’ve observed about the world around me.

I used to tell people that my abiding interest over time was the study of systems, but stopped because most people assumed that meant computer systems. I meant complex systems in the broadest sense, and I found it terribly frustrating that people thought I would focus only on the stereotypical white male nerd interest of “computers”.

Obviously, if you know me at all, then assuming the unsaid “computer” in systems is pretty reasonable, as it’s where I spend most of my time. Computers were the first systems I found that I could have a significant level of control over. They gave me a marketable skillset and I fitted into the stereotype so my interest in them was encouraged.

But I’ve always been just as interested in computers and software for the part they play in wider systems, especially those involving people. Maybe that’s partly why I’ve never been tempted to go for higher paid work in the software industry where I would be expected to have that narrow focus and only care about my team’s sub-sub-subsystem. I guess as an undiagnosed autistic kid, developing a special interest in the interactions and relationships between people was something that enabled me to go through the motions and fit in, which undoubtedly kept me safe, though probably at the expense of my mental health.

In any case, I am where I am now and thankfully have enough time and energy to explore this. Meadows’ work resonates with me especially because of the role she played in bringing a systems perspective to environmental and societal issues. She was the main author of the 1972 report, The Limits to Growth, which predicted that a business-as-usual approach to economic and population growth on a finite planet would inevitably lead to a collapse. Largely ignored by mainstream policymakers at the time, those predictions are looking increasingly prescient.

N.B. This post is an extended version of this Mastodon thread.

NDSA Welcomes 1 New Member in Quarter 2 of 2024 / Digital Library Federation

As of June 2024, the NDSA Coordinating Committee voted to welcome one new applicant into the membership. This member brings a host of skills and experience to our group. Keep an eye out for them on your working and interest group calls and be sure to give them a shout out. Please join me in welcoming our new member! To review our list of members, you can see them here.

The Gates Preserve

On their website, The Gates Preserve “believes that archiving is a statement of value. Taking the time to gather, contextualize, catalog, and articulate the moments of an individual, a culture, or a subculture is critical to its legacy persisting into the future. We believe in thoughtful, carefully constructed legacies that are presented and shared in a way that honors its subjects. This is what we’ve attempted to do here.” In their application, The Gates Preserve states that they would “like to be in alignment with associations that are doing work in alignment with the services they offer their clients” and the activities and commitments in digital preservation.


The post NDSA Welcomes 1 New Member in Quarter 2 of 2024 appeared first on DLF.

‘Digital literacy, not just access, is Africa’s biggest problem’ / Open Knowledge Foundation

This is the tenth conversation of the 100+ Conversations to Inspire Our New Direction (#OKFN100) project.

Since 2023, we have been meeting with more than 100 people to discuss the future of open knowledge, shaped by a diverse set of visions from artists, activists, scholars, archivists, thinkers, policymakers, data scientists, educators, and community leaders from everywhere.

The Open Knowledge Foundation team wants to identify and discuss issues sensitive to our movement and use this effort to constantly shape our actions and business strategies, so that we can best deliver what the community expects of us and our network, a pioneering organisation that has been defining the standards of the open movement for two decades.

Another goal is to include the perspectives of people of diverse backgrounds, especially those from marginalised communities, dissident identities, and whose geographic location is outside of the world’s major financial powers.

How can openness accelerate and strengthen the struggles against the complex challenges of our time? This is the key question behind conversations like the one you can read below.

*

This time we did something different, a collective conversation. Last week we had the chance to bring together several members of the Open Knowledge Network to talk about the current context, opportunities and challenges for open knowledge in Africa.

The conversation took place online in English on 4 June 2024, with the participation of Romeo Ronald Lomora (South Sudan), Justine Msechu (Tanzania), Oluseun Onigbinde (Nigeria) and Maxwell Beganim (Ghana), moderated by Lucas Pretti, OKFN’s Communications & Advocacy Director.

One of the important contexts of this conversation is precisely Maxwell’s incorporation as regional coordinator of the Network’s Anglophone Africa Hub. With this piece of content, we also aim to facilitate regional integration and find common points of collaboration for shared work within the Network. That’s why we started by asking him to introduce himself in his own words. 

We hope you enjoy reading it.

*

Maxwell Beganim: My name is Max, and I currently serve as the Anglophone Africa Coordinator for the Open Knowledge Network. My primary focus is to consolidate efforts and collaborate with other Anglophone countries to advance the conversation around open knowledge in our region. I’ve been deeply involved in the open ecosystem for the past six years, starting initially with Wikimedia projects like Wikipedia and later expanding into OpenStreetMap and Creative Commons activities through the recently established Ghana chapter.

One of my significant projects is the Kiwix project in Ghana, where we address the digital divide by providing offline educational resources to senior high schools. This initiative is crucial as it bridges the gap between basic and tertiary education, ensuring access to essential information even in areas with limited connectivity.

As a language activist, I co-founded the Ghanaian Language Wikimedia Community and the Ghanaian Pidgin Wikimedia Community to promote linguistic diversity and knowledge sharing in local languages. 

Recently, I’ve been involved in launching the Open Goes COP coalition, stemming from my experience at international conferences like COP, where I observed a gap in integrating open technologies with climate discussions. This project brings together climate activists and open knowledge enthusiasts to amplify our impact during global environmental conversations.

In summary, my journey in the open knowledge and tech space revolves around promoting educational equity, linguistic diversity, and environmental sustainability across Anglophone Africa. I’m passionate about leveraging open licensing and collaborative efforts to empower communities and drive positive change on multiple fronts.

Romeo Ronald Lomora: First, congratulations on establishing the Ghana chapter. 

In my country, South Sudan, we face significant challenges, particularly in terms of digital literacy. For example, only about 10% of our population is digitally literate. When I tried to implement a Kiwix project here, I realised that it couldn’t be as effective as in Ghana because most students don’t even know how to use computers.

I’ve started with basic ICT literacy for many young people. I saw a great project on Open Knowledge about storing government archives, but I wonder if these solutions are viable if citizens don’t understand the structures to access this information. My biggest question is about the new African Hub. Is there a space to share knowledge, exchange experiences, discuss resources and build on these initiatives to solve our challenges?

Maxwell Beganim: Thank you Romeo. I remember working with you and appreciate the incredible work you’re doing in the open ecosystem. I agree that while we share some common issues, our challenges can be quite different. Let me respond to your questions with an example.

In Ghana, during our Kiwix implementation, we had to visit schools and install Kiwix on their computers. However, many schools only had two working computers. So we shifted our focus to digital citizenship, teaching media and information literacy and digital etiquette. This prepares students to understand the digital ecosystem in theory.

Once they’re comfortable with digital citizenship, we train them to use Kiwix, which mimics the internet without the need for a connection. We found that students didn’t know about YouTube or Wikipedia, but they were using their phones to place bets. So we taught them about valuable resources like Wikipedia and showed them how to translate and write articles.

We also identified a digital literacy gap between teachers and students. We trained teachers and provided them with offline content so that they could use it for learning and teaching.

In terms of knowledge sharing, we’ve started mapping the ecosystem and planning information sessions to explain the structures and the Open Knowledge Global Directory. We’ll continue to bring in experts to share best practice and initiatives.

In the climate field, we’ve identified people working in the open and climate fields and will host webinars to share their work. My aim as Regional Coordinator is to support grassroots projects in an inclusive way. I welcome your ideas to co-curate this process and make us proud of our collective achievements. Let’s use our experience with Wikipedia and grassroots mobilisation to adapt and improve our initiatives.

Justine Msechu: You mentioned that while spaces for free access to knowledge are being created, the availability of resources remains a challenge. We can’t provide resources to everyone who needs access to information immediately; it’s a long way to go. My question is, how does the Open Knowledge Network support organisations in creating spaces for access to knowledge? What resources do you provide to help them share information with those who need it?

Maxwell Beganim: One critical point to note is that when I started with Kiwix, it quickly gained traction because it was a low-cost solution. Initially I just had Kiwix as a platform and used it as a starting point. Now we’re working on a toolbox model in Ghana to pool resources.

The goal is to scale this model within the open ecosystem and train representatives to use this information in the classroom and beyond. As regional coordinator, I’m co-curating this effort. If there’s something interesting, we’re open to incorporating it. For example, I recently received an email from the Reading Wikipedia in the Classroom project, where I’m a certified trainer. There’s also an Open Learning Collective that wants to help, and I’m planning to get in touch with them.

We’re doing a mapping exercise to find out who, where and what resources are available. This collaborative approach involves everyone, like you, Justine, to understand and gather resources for the grassroots.

The two main initiatives are offline learning with open and learning collectives, and the Ghana Model Box project. I’m doing a six-month training with Sabier to develop the Ghana Moodle toolbox. Once equipped, I will be able to train others in our community to implement these projects locally. For now, these are the resources and strategies we’re focusing on.

Oluseun Onigbinde: Congratulations, Max, on this new role and the chapter in Ghana. I hope at some point Nigeria will be up to speed. 

I would just like to say that I think we might need an Anglophone Africa strategy to clarify our objectives. For example, there are projects like electoral initiatives and data commons systems, but we need to think about what is important to us in Africa. Is it bridging the digital divide, accountability issues or something else? The Open Knowledge Foundation has done a lot of work on open spending in the past, working on budgets and more.

It’s important to identify what we’re going to focus on so that in a year’s time we can look back and see our achievements in countries like Ghana, Nigeria and Kenya. We need joint strategies and projects. Perhaps we could have an Anglophone Africa Day to host events and discuss what data we want to liberate and what ecosystems we want to stimulate. We should explore joint programming, peer learning and sharing. What do you think?

Maxwell Beganim: Thanks, I completely agree with the points you’ve raised about having a common strategy. Going forward, we should have regular meetings to update each other and learn from best practices. A unified strategy is crucial and I’ll discuss this idea with the main team. We’ve started with a Telegram platform for Anglophone Africa where we can share information and brainstorm together.

Co-curation is essential because shared ownership makes everyone feel involved. We could have a meeting for Anglophone Africa where you share your experiences and we define our goals together. For example, Open Goes COP, an initiative of Open Knowledge Ghana, can be a starting point. After the launch, we can identify thematic areas such as elections and leverage knowledge sharing and capacity building.

Lucas Pretti: What about resources? How do you mobilise them? Is there support from the government, public authorities, CSOs and foundations? How do you manage funding for initiatives in Africa, especially in your countries?

Maxwell Beganim: In my experience, openness initiatives are often supported by organisations such as Creative Commons, the Open Government Partnership (OGP) and the Open Society Foundation (OSF). Government support is rare, but there are working partnerships. For example, we’re working with the National Folklore Board in Ghana on the Wiki Loves Living Heritage project.

Another big problem is that people don’t even know what openness is. When you start talking about openness, people just sit down and look at you and say “what are you talking about?” So the onus is on us as civil society organisations to start looking at ways in which people can build their skills when it comes to the whole open ecosystem.

In Ghana, government funding for openness projects is minimal, but we’re trying to integrate these initiatives gradually. Most of the funding comes from external sources like the Wikimedia Foundation, as local government systems are not always transparent or supportive.

What about you, Romeo? How do you mobilise resources in South Sudan?

Romeo Ronald Lomora: In my country, direct government support is rare, but there are funds available for certain ICT initiatives. However, accessing these funds is complex and often requires connections, who you know in the government. It’s not very systematic. For example, the National Communications Authority supports some initiatives, but the process isn’t yet clear to the public.

External sources that understand the role of open knowledge, such as the Wikimedia Foundation or Open Society, are more reliable. We often work in silos, disconnected from government and the public, which makes it difficult to explain the importance of our work and secure local support.

So while government funding is available, it is difficult to access and we rely primarily on external organisations that understand the importance of open knowledge and society.

I am curious to hear from you, Justine, about the reality of funding in Tanzania.

Justine Msechu: It’s very similar to what Romeo described. Often the funding comes from external donors and not from the government. Even when we receive funding from donors outside the country, government officials sometimes expect a share of that funding.

For example, there was a case where an organisation provided computers to local government schools. But when these computers arrived at the airport, it became a big problem. Customs officials wanted to inspect everything and insisted that someone pay for the transport, even though the computers had been donated to government schools. This highlights the lack of support from the government, as they create additional hurdles rather than facilitating the process.

As in other countries, we rely heavily on external organisations such as the Wikimedia Foundation and others for support. So we need to focus on how we can support each other within our network. Perhaps we can create a platform to present our needs to the government and seek their support more systematically.

Oluseun Onigbinde: Let me move on to research. Much of our research still relies heavily on academic sources, often from foreign journals, which are seen as a form of validation. However, these journals tend to be very expensive to access and publish in, especially with the recent exchange rate devaluations making it even more expensive. This is a significant barrier.

If we want to be a respected hub for open knowledge resources, making research more accessible would be highly beneficial. We often underestimate the value of the research repositories we already have. For example, in our work at BudgIT, researchers are the primary users of our content, even more so than the general public. This suggests a strong demand for accessible, high-quality research.

Perhaps we should think about creating an open knowledge space dedicated to research. Current platforms often have paywalls that restrict access.

Lucas Pretti: Do you all think that there’s at least a willingness on the part of governments or organisations to become more transparent and open, especially in terms of public policy, digital public infrastructure (DPI), free software and so on? Is there a public discourse about it?

Oluseun Onigbinde: There seems to be some discursive adoption of these ideas by governments and organisations, but the practical implementation is still lacking. First of all, we need to understand the specific ecosystem we’re dealing with. Are we focusing on universities, the general public, development agencies or development institutions? Once we’ve identified our audience, we need to clearly communicate what ‘open’ means and what the benefits are for them.

It’s important to frame this in terms of incentives. We should ask ourselves: what do these stakeholders gain by being open? For example, are they contributing to a larger ecosystem? Do they increase the visibility of their work? Do they facilitate the exploitation of knowledge? Early on, we developed a thesis around these incentives to encourage participation in an open knowledge environment.

To make openness a more prominent public statement, we must first identify our target audience and the major sources of knowledge within the continent. From there, we can define specific incentives that resonate with them and clearly communicate the benefits of being part of an open, transparent ecosystem.

Romeo Ronald Lomora: Recently a university from the UK came to South Sudan to do research on the heritage and tribes of the country. They documented a lot, including songs and photographs, and did a lot of work. But the concern is that all the products of this research are being taken back to London and are not available to the local people.

In our discussions, we noted a duplication of existing research content on certain issues. Researchers come with resources and carry out studies, but the results are not accessible to the local community. This raises the question: what does access to these resources mean for us? Should there be laws to ensure that research results are available locally?

In South Sudan, our academia does not focus much on research or policy development. For example, at the National Bureau of Statistics, as we approach an election, we’re relying on outdated statistics from over 30 years ago due to budget constraints. This highlights the complexity of the problem, which goes beyond a lack of access to resources or academic focus.

Oluseun mentioned that as part of our strategy we need to define what open access to information means for us. This could include creating shared resources and doing specific research in our country that can be made openly accessible to help people make better policy decisions.

Maxwell suggested that we map our resources in more detail. We need to be clear about what we are mapping and what we want to discuss in our meetings. These discussions should lead to the development of specific services or products that can help us build our structure.

Finally, we need to consider the broader framework of open knowledge in Africa. What does open knowledge mean to us as the Open Knowledge Network? How can we adapt these principles to our context? By redefining existing factors and finding meaningful solutions, we can develop a robust open knowledge structure. I’m glad that we are addressing these questions together. 

Lucas Pretti: Does everyone here agree with Max’s statement that access is the main issue in Africa as a whole? 

Oluseun Onigbinde: That’s an important point. But we might also need to redefine access in some way. When you say access, it might just be about what platform is there for us to publish. That’s a big point for us to diagnose. When we publish, what is the distribution of these works? Are people able to find these documents easily? And once they find them, can they reuse them properly? 

In this age of AI and fluid governance of digital publishing, we need to bring that context into the discussion of the rules that govern access to those publishing platforms.

Justine Msechu: Access is still a big challenge here in Tanzania. For example, we don’t have a common platform where researchers or academics can put things up and everyone can read them. I know of some universities that even tell students not to use Wikipedia because it’s not a good source of information. A lot of people are trying to put content on Wikipedia in the Swahili language to prove that it’s a free and safe space for people to share their knowledge and access it for free.

Romeo Ronald Lomora: Let me say this. The government of South Sudan has published a budget for this year and it’s open. It is there. People will get it. But then the big question is: do people even understand this budget? Do they understand how it relates to them? Do they understand how that information is there? 

The challenge for me is to understand the barriers that are really associated with access, rather than just looking at access itself. To redefine access, we need to go back to the basics and identify digital literacy as the biggest problem. Most people don’t really understand how to use a computer, so imagine the whole digital ecosystem. The information is there, it’s open, but they’re not going to access it. So if you ask me, is access still a challenge? I would say yes, but to what extent can we go back and redefine it as a broader educational challenge?

In South Sudan, there’s a huge information gap between the people and the government. People don’t really seem to trust or care about what’s happening and what the government is doing. So my question is, what are the small steps to make people understand? I like what your organisation is doing, Oluseun, to simplify very complex ideas into simple visualisations. Like with budgets, breaking it down into infographics, getting people to understand and really relate to these huge numbers. That’s the first step before you can think about taking action or reacting to it. 

Justine Msechu: I think this meeting is very important because I think it’s time to start within ourselves and our organisations and the small community that we are forming. We are building links and a platform to come together and share resources. For example, maybe I’ll start working on an education project where Romeo could come and teach my community what to do with computers and so on. We can definitely start our own community.

Oluseun Onigbinde: Yes, as well as the feedback about the need to help with data visualisation, breaking down information, simplification and so on. That would be a real resource that we could help the community with. Just ping me when we do the first info sessions and I will get myself and my team to show a little bit of what we have done over the years and tell you about our guiding principles.

Maxwell Beganim: Thank you, everyone. I think we have a common understanding emerging here. We are identifying where we can start in terms of education, media information literacy and digital citizenship.

I’ve really enjoyed the conversation and I don’t want this to be a one-off. We should find ways to reconnect and then start having some of these conversations to help the Network.

My vision is to strengthen our Anglophone African Hub, build a sustainability plan and really start to have the impact that we want. Those are just my closing remarks. Thank you very much for your respect and for helping us to work together and co-curate this together.

Making Magic Happen / Information Technology and Libraries

For many patrons, libraries are synonymous with books and reading. However, people don’t always take advantage of reader’s advisory services offered by libraries. Rather than approaching librarians for suggestions of what to read, most people instead turn to their personal networks or express a preference for more passive approaches to recommendations. As a halfway point between in-person reader’s advisory interactions and algorithmic recommendations, Worthington Libraries staff leveraged the NoveList and Polaris APIs to create custom book recommendation kiosks. Recommendation Stations, as we call them, allow people to scan a book barcode, browse read-alikes, check local availability and print shelf locations, all in the guise of an interactive fortune teller.
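
Based only on the workflow the abstract describes, a kiosk loop might look like the Python sketch below. Every function body here is a hypothetical placeholder returning canned data; the real NoveList and Polaris API endpoints are not reproduced in the article, so none of these calls should be read as the vendors’ actual interfaces.

```python
# Hypothetical sketch of the Recommendation Station flow described above.
# The placeholder functions stand in for NoveList (read-alikes) and
# Polaris (availability) API calls; canned data keeps the sketch runnable.

def isbn_for_barcode(barcode: str) -> str:
    """Placeholder: resolve a scanned item barcode to an ISBN via the ILS."""
    return "9780316769488"

def readalikes(isbn: str) -> list[dict]:
    """Placeholder for a NoveList read-alike lookup."""
    return [{"isbn": "9780140177398", "title": "A Read-Alike Title"}]

def availability(isbn: str) -> dict:
    """Placeholder for a Polaris local holdings/availability check."""
    return {"available": True, "shelf": "Adult Fiction, 2nd floor"}

def recommend(barcode: str) -> list[dict]:
    """Scan -> read-alikes -> local availability, as the kiosk does."""
    suggestions = []
    for book in readalikes(isbn_for_barcode(barcode)):
        holding = availability(book["isbn"])
        if holding["available"]:
            suggestions.append({**book, "shelf": holding["shelf"]})
    return suggestions  # the kiosk renders these, fortune-teller style

print(recommend("31234000123456"))
```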

On-Demand Circulation of Software Licenses / Information Technology and Libraries

The Miami University Libraries (MUL) developed an open-source Software Checkout system to allow patrons to make use of software licenses owned by the library. The system takes advantage of user-based licensing under the Software as a Service (SaaS) license model and vendor-created APIs to easily and legally assign access to users. The service currently supports Adobe Creative Cloud, Final Cut Pro, and Logic Pro software. MUL has successfully used this software for three years. This article describes the expansion of offerings and the increasing use of the service over that time. Built on a model developed by Pixar for managing employee software licenses, the Software Checkout system is believed to be the first of its kind for circulating licenses to library patrons. Both this lending model and the open-source software developed by MUL are available to other libraries. This paper is intended to prompt libraries to take advantage of the legal and technical environment to expand software license sharing to other libraries.
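
The lending model itself is simple to picture. Below is a minimal Python sketch of user-based license circulation under stated assumptions: the vendor_assign/vendor_revoke functions are hypothetical stand-ins for vendor provisioning APIs, and the seven-day loan period is illustrative. MUL’s actual open-source system is the place to look for real integration code.

```python
# Minimal sketch of user-based SaaS license circulation. The two
# vendor_* functions are hypothetical stand-ins for real provisioning
# APIs (e.g., assigning an Adobe Creative Cloud seat to a patron).
import datetime as dt
from dataclasses import dataclass

@dataclass
class Loan:
    user: str
    product: str
    due: dt.datetime

def vendor_assign(user: str, product: str) -> None:
    print(f"provisioned {product} for {user}")    # placeholder API call

def vendor_revoke(user: str, product: str) -> None:
    print(f"deprovisioned {product} for {user}")  # placeholder API call

def checkout(loans: list[Loan], user: str, product: str, days: int = 7) -> Loan:
    """Assign a seat to the patron and record a due date."""
    vendor_assign(user, product)
    loan = Loan(user, product, dt.datetime.now() + dt.timedelta(days=days))
    loans.append(loan)
    return loan

def expire_overdue(loans: list[Loan]) -> None:
    """Run periodically (e.g., from cron) to return seats to the pool."""
    now = dt.datetime.now()
    for loan in [l for l in loans if l.due <= now]:
        vendor_revoke(loan.user, loan.product)
        loans.remove(loan)

loans: list[Loan] = []
checkout(loans, "patron@example.edu", "Adobe Creative Cloud")
expire_overdue(loans)  # nothing to revoke yet; the loan runs 7 days
```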

A Framework for Measuring Relevancy in Discovery Environments / Information Technology and Libraries

Institutional discovery environments now serve as central resource databases for researchers in the academic environment. Over the last several decades, there have been numerous discovery layer research inquiries centering primarily on user satisfaction measures of discovery system effectiveness. This study focuses on the creation of a largely automated method for evaluating discovery layer quality, utilizing the bibliographic sources from student research projects. Building on past research, the current study replaces a semiautomated Excel Fuzzy Lookup Add-In process with a fully scripted R-based approach, which employs the stringdist R package and applies the Jaro-Winkler distance metric as the matching evaluator. The researchers consider the error rate incurred by relying solely on an automated matching metric. They also use OpenRefine for normalization processes and package the tools together on an OSF site for other institutions to use. Since the R-based approach does not require special processing or time and can be reproduced with minimal effort, it will allow future studies and users of our method to capture larger sample sizes, boosting validity. While the assessment process has been streamlined and shows promise, there remain issues in establishing solid connections between research paper bibliographies and discovery layer use. Subsequent research will focus on creating alternatives to paper titles as search proxies that better resemble genuine information-seeking behavior and comparing undergraduate and graduate student interactions within discovery environments.
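
For readers unfamiliar with the matching metric, here is a small pure-Python sketch of Jaro-Winkler matching between bibliography titles and discovery-layer queries. The study itself uses R’s stringdist package; this implementation and the 0.9 threshold are illustrative assumptions, not the authors’ code.

```python
# Pure-Python Jaro-Winkler, sketched for illustration (the study used
# R's stringdist). Scores range from 0 to 1; higher means more similar.

def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(0, max(len(s1), len(s2)) // 2 - 1)
    hit1, hit2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):                 # greedy character matching
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == c:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    t, k = 0, 0                                # count transpositions
    for i in range(len(s1)):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    m = matches
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):                   # common prefix, capped at 4
        if a != b or prefix == 4:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)

def matched(bib_title: str, query: str, threshold: float = 0.9) -> bool:
    """Treat a bibliography title and a search query as the same work."""
    return jaro_winkler(bib_title.lower(), query.lower()) >= threshold

print(matched("The Name of the Rose", "the name of the rose"))  # True
```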

Implementing Library Maker Projects Outside of the Makerspace / Information Technology and Libraries

The popularity and relevance of the library makerspace has been well-established and documented in the previous decade of researcher and practitioner work, including numerous hands-on guides covering a variety of dimensions relevant to starting and operating a makerspace. Less studied, however, and the focus of this work, are the applications of maker technologies within wider library work. Prior qualitative research conducted by the author included interviews with librarians to understand and document their use of maker technologies, such as the Raspberry Pi single-board computer, to support broader library work outside of the makerspace. The findings indicated that common use cases included running library display screens and collecting patron traffic numbers and environmental data. The objective of this subsequent case study is to examine the potential for wider use of such projects by librarians in an academic library setting, by introducing these projects into a new library setting and assessing the related code and educational materials developed by the researcher. This work reports on the findings of the case study, in which the projects were successfully operated in several usage contexts, as well as the challenges and broader implications for adoption within libraries of all types.
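
To give a flavor of the project type, here is a minimal sketch of one use case named above, a Raspberry Pi patron counter, written in Python with the gpiozero library. The GPIO pin, the sensor wiring, and the log file name are assumptions for illustration, not details from the study.

```python
# Sketch of a Raspberry Pi gate counter: a break-beam sensor wired so
# that each patron crossing registers as a "press" on GPIO pin 4 (the
# pin choice and log path are illustrative assumptions).
import csv
import datetime
from signal import pause

from gpiozero import Button

sensor = Button(4)              # hypothetical wiring on GPIO pin 4
LOGFILE = "gate_counts.csv"     # hypothetical output file

def record_crossing() -> None:
    """Append a timestamp for each crossing; tally by hour or day later."""
    with open(LOGFILE, "a", newline="") as f:
        csv.writer(f).writerow([datetime.datetime.now().isoformat()])

sensor.when_pressed = record_crossing
pause()  # keep the script alive, counting crossings until interrupted
```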

#ODDStories 2024 @ Zaria, Nigeria 🇳🇬 / Open Knowledge Foundation

At Digital Grassroots, we believe in the transformative power of open data. On March 2nd, 2024, we had the privilege of hosting the ‘Open Data as a Human Right’ Workshop at the Faculty of Law, Ahmadu Bello University Zaria – Nigeria to celebrate Open Data Day 2024. This workshop, organised with support from the Open Knowledge Foundation (OKFN), was led by me, our Administrative and Advocacy Lead, and marked a significant milestone in our ongoing efforts to empower youth and advocate for digital rights and sustainable development.

The workshop commenced with a warm welcome, setting the tone for an engaging and insightful day ahead. With 50 participants in attendance, including law students, legal professionals, and experts in open data and human rights, the event underscored our commitment to fostering inclusive community-building and gender balance across all our programs.

Insights and Key Highlights

Throughout the workshop, participants engaged in insightful presentations, interactive sessions, and group practical exercises. Highlights included:

  1. Insightful presentations on open data, access to justice, and digital rights by industry experts.
  2. Group practical exercises focused on leveraging open data for Sustainable Development Goals (SDGs), fostering collaboration, and driving meaningful change.
  3. Thought-provoking discussions on the importance of open data in promoting transparency, accountability, and social equity.
  4. Participant feedback reflecting on the value and impact of the workshop, emphasizing newfound knowledge and inspiration for future advocacy efforts.

Sessions:

  1. Introduction to Open Data: Zainab Idris and Mahmud Muhammad Ibrahim introduced participants to Open Data.
  2. Access to Justice: Musa Suleiman led an engaging session on understanding open data for access to justice, exploring topics such as the Open Government Partnership (OGP) and the Freedom of Information (FOI) Act.
  3. Human Rights Advocacy: Joy Gadani facilitated a thought-provoking discussion on open data as a human right and its intersection with digital rights, highlighting its potential to drive positive social change.
  4. SDGs and Sustainable Development: Yazid Salahudeen Mikail concluded the workshop with a presentation on harnessing open data to achieve Sustainable Development Goals (SDGs), inspiring participants to take proactive steps towards leveraging data for sustainable development.

Practical Exercises and Group Discussions

One of the workshop’s most impactful components was the group practical exercise focused on exploring how open data can aid in the attainment of the SDGs. Participants were divided into groups and tasked with analyzing specific SDGs, proposing strategies for leveraging open data to advance progress towards these goals. The diversity of perspectives and insights generated during these discussions was truly remarkable.

Participant Feedback and Reflections

Participants hailed the workshop as a resounding success, citing its interactive nature, practical relevance, and engaging content as particularly valuable. From gaining insights into digital rights to exploring the potential of open data for sustainable development, attendees left the workshop feeling informed, empowered, and inspired to effect positive change in their communities.

Moving Forward

As we reflect on the success of the “Open Data as a Human Right” Workshop, we extend our heartfelt thanks to all our speakers, participants, sponsors, and volunteers who contributed to its success. We remain committed to advancing the cause of open data and look forward to organizing similar initiatives in the future.

We invite you to read and download the full workshop report for a comprehensive overview of the discussions and insights shared during the event. Here’s also the link to our blog post about the workshop on our website.


About Open Data Day

Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities.

As a way to increase the representation of different cultures, since 2023 we have offered the opportunity for organisations to host an Open Data Day event on the best date within a one-week period. In 2024, a total of 287 events happened all over the world between March 2nd and 8th, in more than 60 countries, using 15 different languages.

All outputs are open for everyone to use and re-use.

In 2024, Open Data Day was also a part of the HOT OpenSummit ’23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement.

For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.

The 2021 NDSA Staffing Survey is a 2024 Digital Preservation Award Finalist / Digital Library Federation

The Digital Preservation Coalition (DPC) has recently announced the finalists for the 2024 Digital Preservation Awards. We are very pleased to announce that the Working Group team behind the revision and reimagining of the 2021 NDSA Staffing Survey are finalists for the International Council on Archives Award for Collaboration & Cooperation. You can read our full award application summary here.

The redesign of the 2021 NDSA Staffing Survey was a significant international effort to build and refine one of the only longitudinal open datasets of its kind by reconfiguring the survey from an organizational focus (as seen in the 2012 and 2017 versions), to allow for individual participation. This shift allowed for a more detailed picture of the current state of digital preservation staffing to emerge from the data, and was the product of intensive collaboration between the members of the Working Group, as well as the digital preservation community.

DPC members will be selecting their first and second choices for each category as well as providing feedback on the finalists to the judges before winners are selected. Voting opens this Friday, June 14, and closes on Friday, July 12, 2024. (If you are a DPC member, we would really appreciate your vote!)

Winners will be announced at iPres 2024 in Ghent, Belgium on Monday, September 16.


The PII Figleaf / Eric Hellman

The Internet's big lie is "we respect your privacy". Thanks to cookie banners and such things, the Internet tells us this so many times a day that we ignore all the evidence to the contrary. Sure, there are a lot of people who care about our privacy, but they're often letting others violate our privacy without even knowing it. Sometimes this just means that they are trying to be careful with our "PII". And guess what? You know those cookies you're constantly blocking or accepting? Advertisers like Google have mostly stopped using cookies!!!

fig leaf covering id cards

"PII" is "Personally Identifiable Information" and privacy lawyers seem to be obsessed with it. Lawyers, and the laws they care about, generally equate good PII hygiene with privacy. Good PII hygiene is not at all a bad thing, but it protects privacy the same way that washing your hands protects you from influenza. Websites that claim to protect your privacy are often washing the PII off their hands while at the same time coughing data all over you. They can and do violate your privacy while at the same time meticulously protecting your PII.

Examples of PII include your name, address, social security number, your telephone number and your email address. The IP address that you use can often be traced to you, so it's sometimes treated as PII, but often isn't. The fact that you love paranormal cozy romance novels is not PII, nor is the fact that you voted for Mitt Romney. That you have an 18 year old son and an infant daughter is also not PII. But if you've checked out a paranormal cozy romance from your local library, and then start getting ads all over the internet for paranormal cozy romances set in an alternate reality where Mitt is President and the heroine has an infant and a teenager, you might easily conclude that your public library has sold your checkout list and your identity to an evil advertising company.

That's a good description of a recent situation involving San Francisco Public Library (SFPL). As reported by The Register:

In April, attorney Christine Dudley was listening to a book on her iPhone while playing a game on her Android tablet when she started to see in-game ads that reflected the audiobooks she recently checked out of the San Francisco Public Library.

Let me be clear. There's no chance that SFPL has sold the check-out list to anybody, much less evil advertisers. However, it DOES appear to be the case that SFPL and their online ebook vendors, OverDrive and Baker and Taylor, could have allowed Google to track Ms. Dudley, perhaps because they didn't fully understand the configuration options in Google Analytics. SFPL offers ebooks and audiobooks from OverDrive, "Kindle Books from Libby by OverDrive", and ebooks and audiobooks from Baker and Taylor's "Boundless" platform. There's no leakage of PII or check-out lists, but Google is able to collect demographics and interests from the browsing patterns of users with Google accounts.

A few years ago, I wrote an explainer about how to configure Google Analytics to protect user privacy. That explainer is obsolete, as Google is scrapping the system I explained in favor of a new system, "Google Analytics 4" (GA-4), that works better in the modern, more privacy-conscious browser environment. To their credit, Google has made some of the privacy-preserving settings the default - for example, they will no longer store IP addresses. But reading the documentation, you can tell that they're not much interested in Privacy with a capital P, as they want to be able to serve relevant (and thus lucrative) ads, even if they're for paranormal cozy romances. And Google REALLY doesn't want any "PII"! PII doesn't much help ad targeting, and there are places that regulate what they can do with PII.

We can start connecting the dots from the audiobook to the ads in the Register's reporting by understanding a bit about Google Analytics. Google Analytics helps websites measure their usage. When you visit a webpage with Google Analytics, a JavaScript snippet sends information back to one or more Google trackers about the address of the webpage, your browser environment, and maybe more data that the webpage publisher is interested in. Just about the only cookie being set these days is one that tells the website not to show the cookie banner!

From the Register:

The subdomain SFPL uses for library member login and ebook checkout, sfpl.bibliocommons.com, has only a single tracker, from Alphabet, that communicates with the domains google-analytics.com and googletagmanager.com.

The page is operated by BiblioCommons, which was acquired in 2020 by Canada-based Constellation Software. BiblioCommons has its own privacy policy that exists in conjunction with the SFPL privacy policy.

In response to questions about ad trackers on its main website, Wong acknowledged that SFPL does use third-party cookies and provides a popup that allows visitors to opt-out if they prefer.

With regard to Google Analytics, she said that it only helps the library understand broad demographic data, such as the gender and age range of visitors.

"We are also able to understand broad interests of our users, such as movie, travel, sports and fitness based on webpage clicks, but this information is not at all tied to individual users, only as aggregated information," said Wong.

The statement from Jaime Wong, deputy director of communications for the SFPL, is revealing. The Google Analytics tracker only works within a website, and neither SFPL nor its vendors are collecting demographic information to share with Google. But Google Analytics has options to turn on the demographic information that libraries think they really need. (It helps to get funding, for example.) It used to be called "Advertising Reporting Features" and "Remarketing" (I called these the "turn off privacy" switches) but now it's called "Google Signals". It works by adding the Google advertising tracker, DoubleClick, alongside the regular Analytics tracker. This allows Google to connect the usage data from a website to its advertising database, the one that stores demographic and interest information. This gives the website owners access to their user demographics, and it gives the Google advertising machine access to the users' web browsing behavior.

I have examined the relevant webpages from SFPL, as well as the customized pages that BiblioCommons, OverDrive, and Baker and Taylor provide for SFPL, looking for trackers. Here's what I found:

  • The SFPL website, SFPL.org, has Analytics and DoubleClick ad trackers enabled.
  • The BiblioCommons website, sfpl.bibliocommons.com, has two analytics trackers enabled, but no advertising tracker. Probably one tracker "belongs" to SFPL while the other "belongs" to BiblioCommons.
  • The OverDrive website, sfpl.overdrive.com, has Analytics and DoubleClick ad trackers enabled.
  • The Baker and Taylor website, sfpl.boundless.baker-taylor.com, has Analytics and DoubleClick ad trackers enabled.
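
If you want to run a similar first-pass check on your own library's pages, a small Python sketch like the one below will flag well-known Google tracker hostnames in a page's HTML. It is only a rough check under stated assumptions: trackers injected at runtime by JavaScript won't appear in a static fetch, and the host list and URLs are illustrative.

```python
# Rough first-pass tracker check: fetch a page and search its HTML for
# well-known Google tracker hostnames. A static fetch misses trackers
# injected at runtime, so absence here is not proof of absence.
import urllib.request

TRACKER_HOSTS = (
    "google-analytics.com",   # Google Analytics
    "googletagmanager.com",   # Google Tag Manager
    "doubleclick.net",        # Google's advertising tracker
)

def trackers_on(url: str) -> list[str]:
    req = urllib.request.Request(url, headers={"User-Agent": "tracker-check"})
    html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", "replace")
    return [host for host in TRACKER_HOSTS if host in html]

for site in ("https://sfpl.org", "https://sfpl.overdrive.com"):
    print(site, trackers_on(site))
```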

So it shouldn't be surprising that Ms. Dudley experienced targeted ads based on the books she was looking at on the San Francisco Public Library website. Libraries and librarians everywhere need to understand that reader privacy is not just about PII, and that the sort of privacy that libraries have a tradition of protecting is very different from the privacy that Google talks about when it says "Google Analytics 4 was designed to be able to evolve for the future and built with privacy at its core." At the end of this month, earlier versions of Google Analytics will stop "processing" data. (I'm betting the trackers will still fire!)

What Google means by that is that in GA-4, trackers continue to work despite browser restrictions on third-party cookies, and the tracking process is no longer reliant on data like IP addresses that could be considered PII. To address those troublesome regulators in Europe, they only distribute demographic data and interest profiles for people who've given Google permission to do so. Do you really think you haven't somewhere given Google permission to collect your demographic data and interest profiles? You can check here.

Here's what Google tells Analytics users about the ad trackers:

When you turn on Google signals, Google Analytics will associate the session data it collects from your site and apps with Google's information from accounts of signed-in, consented users. By turning on Google signals, you acknowledge you adhere to the Google Advertising Features Policy, including rules around sensitive categories, have the necessary privacy disclosures and rights from your end users for such association, and that such data may be accessed and deleted by end users via My Activity.

In plain English, that means that if a website owner flips the switch, it's the website's problem if the trackers accidentally capture PII or otherwise violate privacy, because it's responsible for asking for permission.

Yep. GA-4 is engineered with what I would call "figleaf privacy" at its core. Google doesn't have fig leaves for paranormal cozy romance novels!


Time: It doesn’t have to be this way / Meredith Farkas

Three pocket watches

“What we think time is, how we think it is shaped, affects how we are able to move through it.”

-Jenny Odell, Saving Time, p. 270

This is the first of a series of essays I’ve written on time. Here are the others (they will be linked as they become available on Information Wants to be Free):

What I love about reading Jenny Odell’s work is that I often end up with a list of about a dozen other authors I want to look into after I finish her book. She brings such diverse thinkers beautifully into conversation in her work along with her own keen insights and observations. One mention that particularly interested me in Odell’s book Saving Time (2023) was What Can a Body Do (2020) by Sara Hendren. Her book is about how the design of the world around us impacts us, particularly those of us who don’t fit into the narrow band of what is considered “normal,” and how we can build a better world that goes beyond accommodation. Her book begins with the question “Who is the built world built for?” and with a quote from Albert Camus: “But one day the ‘why’ arises, and everything begins in that weariness tinged with amazement” (1).

“Why” is such a simple word, but asking it can completely alter the way we see the world. There’s so much in our world that we simply take for granted or assume is the only way because some ideology (like neoliberalism) has so deeply limited the scope of our imagination. Most of what exists in our world is based on some sort of ideological bias, and when we ask “why” we crack the world open and allow in other possibilities. Before I read the book Invisible Women (2021) by Caroline Criado Perez, I already knew that there was a bias towards men in research and data collection as in most things, but I didn’t realize the extent to which the world was designed as if men were the only people who inhabited it and how dangerous and harmful it makes the world for women. What Can a Body Do similarly begins with an exploration of the construction of “normal” and how design based on that imagined normal person can exclude and harm people who aren’t considered normal, particularly those with disabilities. The book is a wonderful companion to Invisible Women in looking at why the world is designed the way it is and how it impacts those who it clearly was not built for. I’ll explore that more in a later essay in this series.

One thing I took for granted for a very long time was time itself. I thought of time in terms of clocks and calendars, not the rhythms of my body nor the seasons (unless you count the start and end of each academic term as a season). I believed that time was scarce, that we were meant to use it to do valuable things, and that anything less was a waste of our precious time. I would beat myself up when, over Spring Break, I didn’t get enough practical home or scholarship projects done or if I didn’t knock everything off my to-do list at the end of a work week. I would feel angry and frustrated with myself when my bodily needs got in the way of getting things done (I’m writing this with ice on both knees due to a totally random flare of tendinitis when I’d planned to do a major house cleaning today so I’m really glad I don’t fall into that shooting myself with the second arrow trap as much as I used to). I looked for ways to use my time more efficiently. I am embarrassed to admit that I owned a copy of David Allen’s Getting Things Done and tried a variety of different time management methods over the years that colleagues and friends recommended (though nothing ever stuck besides a boring, traditional running to-do list). I’d often let work bleed into home time so I could wrap up a project because not finishing it would weigh on my mind. I was always dogged by the idea that I wasn’t getting enough done and that I could be doing things more efficiently. It felt like there was never enough time all the time. 

Black and white photo of a man hanging from a clock atop a building. From Harold Lloyd’s Safety Last (1923)

I didn’t start asking questions about time until I was 40, and the first one I asked was a big one: “what is the point of our lives?” Thinking about that opened a whole world of other questions about how we conceive of time, what kinds of time we value, to what end are we constantly trying to optimize ourselves, what is considered productive vs. unproductive time, why we often value work time over personal time (if not in word then in deed), why time often requires disembodiment, etc. The questions tumbled out of me like dominoes falling. And with each question, I could see more and more that the possibility exists to have a different, a better, relationship with time. I feel Camus’ “weariness, tinged with amazement.”

This is an introduction to a series of essays about time: how we conceive of it, how it drives our actions, perceptions, and feelings, and how we might approach time differently. I’ll be pulling ideas for alternative views of time from a few different areas, particularly queer theory, disability studies, and the slow movement. I’m not an expert in all these areas, but I’ll be sure to point you to people more knowledgeable than me if you want to explore these ideas in more depth.

How many of you feel overloaded with work? Like you’re not getting enough done? How many of you are experiencing time poverty: where your to-do list is longer than the time you have to do your work? How many of you feel constantly distracted and/or forced to frequently task-switch in order to be seen as a good employee? How many of you feel like you’re expected to do or be expert in more than ever in your role? How many of you feel like it’s your fault when you struggle to keep up? More of us are experiencing burnout than ever before and yet we keep going down this road of time acceleration, constant growth, and continuous availability that is causing us real harm. People on the whole are not working that many more hours than they used to, but we are experiencing time poverty and time compression like never before, and that feeling bleeds into every other area of our lives. If you want to read more about how this is impacting library workers, I’ll have a few article recommendations at the end of this essay.

My exploration is driven largely by this statement from sociologist Judy Wajcman’s (2014) excellent book Pressed for Time: “How we use our time is fundamentally affected by the temporal parameters of work. Yet there is nothing natural or inevitable about the way we work” (166). We have fallen into the trap of believing that the way we work now is the only way we can work. We have fallen into the trap of centering work temporality in our lives. And we help cement this as the only possible reality every time we choose to go along with temporal norms that are causing us harm. In my next essay, I’m going to explore how time became centered around work and how problematic it is that we never have a definition of what it would look like to be doing enough. From there, I’m going to look at alternative views of time that might open up possibilities for changing what time is centered around and seeing our time as more embodied and more interdependent. My ideas are not the be-all end-all and I’m sure there are thinkers and theories I’ve not yet encountered that would open up even more the possibilities for new relationships with time. To that end, I’d love to get your thoughts on these topics, your reading recommendations, and your ideas for possible alternative futures in how we conceive of and use time. 

Works on Time in Libraries

Bossaller, Jenny, Christopher Sean Burns, and Amy VanScoy. “Re-conceiving time in reference and information services work: a qualitative secondary analysis.” Journal of Documentation 73, no. 1 (2017): 2-17.

Brons, Adena, Chloe Riley, Ean Henninger, and Crystal Yin. “Precarity Doesn’t Care: Precarious Employment as a Dysfunctional Practice in Libraries.” (2022).

Drabinski, Emily. “A kairos of the critical: Teaching critically in a time of compliance.” Communications in Information Literacy 11, no. 1 (2017): 2.

Kendrick, Kaetrena Davis. “The public librarian low-morale experience: A qualitative study.” Partnership 15, no. 2 (2020): 1-32.

Kendrick, Kaetrena Davis and Ione T. Damasco. “Low morale in ethnic and racial minority academic librarians: An experiential study.” Library Trends 68, no. 2 (2019): 174-212.

Lennertz, Lora L. and Phillip J. Jones. “A question of time: Sociotemporality in academic libraries.” College & Research Libraries 81, no. 4 (2020): 701.

McKenzie, Pamela J., and Elisabeth Davies. “Documenting multiple temporalities.” Journal of Documentation 78, no. 1 (2022): 38-59.

Mitchell, Carmen, Lauren Magnuson, and Holly Hampton. “Please Scream Inside Your Heart: How a Global Pandemic Affected Burnout in an Academic Library.” Journal of Radical Librarianship 9 (2023): 159-179.

Nicholson, Karen P. “‘Being in Time’: New Public Management, Academic Librarians, and the Temporal Labor of Pink-Collar Public Service Work.” Library Trends 68, no. 2 (2019): 130-152.

Nicholson, Karen. “On the space/time of information literacy, higher education, and the global knowledge economy.” Journal of Critical Library and Information Studies 2, no. 1 (2019).

Nicholson, Karen P. “‘Taking back’ information literacy: Time and the one-shot in the neoliberal university.” In Critical library pedagogy handbook (vol. 1), ed. Nicole Pagowsky and Kelly McElroy (Chicago: ACRL, 2016), 25-39.

Awesome Works on Time Cited Here

Hendren, Sara. What Can a Body Do?: How We Meet the Built World. Penguin, 2020.

Odell, Jenny. Saving Time: Discovering a Life Beyond Productivity Culture. Random House, 2023.

Wajcman, Judy. Pressed for time: The acceleration of life in digital capitalism. University of Chicago Press, 2020.

Slow productivity is a team sport: A critique of Cal Newport’s Slow Productivity / Meredith Farkas

Impressionist painting of four people in flowing clothes resting on the bank of a river

Image credit: Dolce far Niente by John Singer Sargent 

This is the fourth in a series of essays I’ve written on time. You can view a list of all of them on the first essay.

This was going to be a somewhat different essay before I read Cal Newport’s Slow Productivity. I read the book the day it came out, interested in seeing how he incorporated the ideas from slow movements into the world of productivity, since in so many ways, productivity is the enemy of slowness. Given what I’d read of his work in the New Yorker, I was skeptical that he would really embrace slowness in his book and I discovered my skepticism was more than justified. I’m going to start by critiquing Newport’s book, but then get into my own vision for what it might take to achieve slow productivity.

In late 2021, Cal Newport began writing about “slow productivity,” largely in response to a tidal wave of published books that questioned our society’s focus on productivity (for productivity pundits, the answer is always productivity). He saw the goal of slow productivity as “keep[ing] an individual worker’s volume at a sustainable level” and argued that this would not have a negative impact on organizational productivity because less overloaded workers would be less focused on managing a glut of information. He envisioned systems that would track people’s work and assign new tasks based on when the people with the needed skills have time available. In a world full of unique individuals whose capacities vary day by day and where most tasks are far from mechanistic, I question whether this is possible. Tack on the fact that we have people working at varying levels of precarity plus the fact that our reward systems incentivize overwork, and we’re always going to have some people who feel the need to do significantly more to prove themselves. Creating systems that don’t change the underlying realities and inequities in the world of work will not adequately address the issue of overwork and overwhelm.

Strangely, though, his book has no suggestions for how slow productivity could be achieved at the systems level. It’s so individual-focused that he suggests only taking on projects that don’t require meetings with others (the “overhead tax” on projects, he calls it). The idea that meetings with others could make us better at our jobs doesn’t seem to occur to him. His understanding of slow proves to be surface-level at best. The slow movement isn’t just about individuals choosing to step away from fast culture; it’s about changing the culture so that everyone can slow down. Otherwise it just becomes an elitist enterprise where only those with the most privilege can actually access the benefits of slow living.

Mountz, et al. (2015) wrote about slow scholarship, arguing that it “is not just about time, but about structures of power and inequality. This means that slow scholarship cannot just be about making individual lives better, but must also be about re-making the university” (1238). Slow Food advocate Folco Portinari (the author of the Slow Food manifesto, though I rarely see him credited) wrote that “there can be no slow-food without slow-life, meaning that we cannot influence food culture without changing our culture as a whole.” Slow Food isn’t just about buying local, and slow scholarship isn’t just about not buying into the productivity expectations of the academy. It’s about collectively working to change the systems themselves.

But, really, Cal Newport is not writing this book for most of us. He’s writing it for white, male (there are plenty of critiques of his previous work on the basis of sexism), affluent, lone geniuses who aren’t accountable to a boss. He waits until the end of the book to explicitly state that his advice is for academics and people who work for themselves, but when he offers advice like go see a movie matinee on a weekday once a month, take month+ long vacations to gain perspective, cut your salary, and only take on projects that require no collaboration with others, we see how unrelatable this is to most knowledge workers. 

I’ll bet he pulled himself up by his bootstraps!

All you need to know about Newport’s philosophy you can get from page 7 of the book:

Slow productivity [is] a philosophy for organizing knowledge work efforts in a sustainable and meaningful manner, based on the following three principles:

1. Do fewer things

2. Work at a natural pace

3. Obsess over quality

I agree that these are good goals, but his book won’t help you get there. The rest of the book is recycled productivity tips from his previous work (many of which won’t work unless you have total control over your work) punctuated by completely unrelatable stories of famous figures throughout history that don’t connect well to any sort of usable takeaway. I read his story of Jane Austen and how she was only able to really be productive in her writing when her brother inherited an estate, she went to live there, and the family decided not to participate in society anymore. So is the takeaway that I need no children, plenty of servants, and no social engagements to be productive? Cool cool cool.

I will never understand why we trust advice from people who have zero experience working the sort of jobs we have. It would be one thing if his work was research-based, but it isn’t. Early in the book, he writes about how people don’t really understand why people are suddenly so exhausted and burned out by work, but there’s ample research in the sociology, anthropology, business, and psychology literature that addresses this. I know because I’ve read a lot of it! And if we’re trusting his experience, what does a person who went from Ivy League undergraduate work, to graduate work at MIT, to a post-doc, to a tenure-line position at Georgetown in computer science really know about what it’s like to work in a typical knowledge organization with a manager and peers who rely on them? I am in a massively privileged position where I have tenure and summers off and even I found very little that I could apply to my own work. As an instruction librarian, I teach students to look into the author of something they are going to rely on and determine if/why they would trust that particular author’s expertise on that subject. Maybe we should do the same?

If you’re looking for really brilliant and well-researched work relevant to slow productivity, check out Melissa Gregg’s Counterproductive, both of Jenny Odell’s books, Oliver Burkeman’s Four Thousand Weeks, Carl Honoré’s book on the slow movement, and Wendy Parkins and Geoffrey Craig’s Slow Living. They will not offer you concrete tips for being more productive, but, really, there’s no magical list of tips that will work for everyone. They will open your mind to what’s wrong with how we’ve been working and what is possible if we came together to collectively fight for change.

In my next post, I’ll share my own vision of what slow productivity looks like (I decided to break this up into two posts because it was getting a bit long). My tips for slow productivity are quite different from Newport’s in that they’re much more focused on our collectivity. He was right in his piece on “The Rise and Fall of Getting Things Done” that productivity advice is broken because it is not changing things at the level of the system (though he then produced another book focused on individual productivity, go figure). In organizations, we are often dependent on one another to complete our work. We are also held to the collective norms of the organization around productivity and performing busyness. Therefore, slow productivity must be a team sport. 

See you again in a couple of weeks!!!

Burkeman, Oliver. 2023. Four Thousand Weeks: Time Management for Mortals. First paperback edition. New York: Picador.

Gregg, Melissa. Counterproductive: Time Management in the Knowledge Economy. Durham, NC: Duke University Press, 2018.

Honoré, Carl. In praise of slow: How a worldwide movement is challenging the cult of speed. Vintage Canada, 2009.

Mountz, Alison, Anne Bonds, Becky Mansfield, Jenna Loyd, Jennifer Hyndman, Margaret Walton-Roberts, Ranu Basu et al. “For slow scholarship: A feminist politics of resistance through collective action in the neoliberal university.” ACME: An International Journal for Critical Geographies 14, no. 4 (2015): 1235-1259.

Newport, Cal. “The Rise and Fall of Getting Things Done.” The New Yorker, 17 Nov. 2020, www.newyorker.com/tech/annals-of-technology/the-rise-and-fall-of-getting-things-done.

Newport, Cal. “It’s Time to Embrace Slow Productivity.” The New Yorker, 3 Jan. 2022, www.newyorker.com/culture/office-space/its-time-to-embrace-slow-productivity

Newport, Cal. 2024. Slow Productivity: The Lost Art of Accomplishment without Burnout. New York: Portfolio/Penguin.

Odell, Jenny. 2019. How to Do Nothing: Resisting the Attention Economy. Brooklyn, NY: Melville House.

Odell, Jenny. Saving Time: Discovering a Life Beyond Productivity Culture. Random House, 2023.

Parkins, Wendy and Geoffrey Craig. 2006. Slow Living. Oxford: Berg.

Petrini, Carlo. Slow food: The case for taste. Columbia University Press, 2003.

Quilting together at OCLC / HangingTogether

If you’re attending the American Library Association annual conference in San Diego later this month, watch for the colorful display of quilts. Each year the ALA Biblioquilters host a silent auction of quilts as a fundraiser for the Christopher Hoy/ERT Scholarship Fund, which awards a $5,000 scholarship each year to an MLIS student.

There you will find a colorful color-wash-style quilt entitled “Quilting Together,” composed of 480 blocks and more than 2,500 pieces and donated by the OCLC Quilters. The quilt was designed, pieced, assembled, and financially supported by a team of twelve current and retired OCLC employees from across the organization. Each quilter dug into their own fabric stash to make the 3” blocks, which were then assembled into this colorful and unique creation.

The “Quilting Together” quilt, displayed by four of the twelve OCLC quilters

This isn’t the first offering by the OCLC Quilters. Last year we created a scrappy cat-themed quilt called “A World of Cats,” obviously inspired by WorldCat. It raised $775 to support the scholarship.

The data-inspired quilt backing

This year’s quilt is also inspired by WorldCat. We’ve borrowed the title of this quilt, “Quilting Together,” from a title in WorldCat, Quilting together : how to organize, design, and make group quilts, which is held by more than 200 libraries worldwide. And, like the record for this book, and all of WorldCat, this quilt is backed by data. Take a look at the numerically themed backing fabric!

Making a group quilt requires special considerations. For example, to be inclusive, the block chosen should be simple enough to accommodate sewists with a broad range of skills. Furthermore, a scrappy style allows participants to use leftover quilt scraps from their own fabric “stash” without having to purchase materials. An ample supply of scraps was donated by experienced quilters for anyone in need of supplies. Finally, as with other collaborations, it’s critical to recognize that not everyone has to contribute in the same way. While some employees were engaged at every stage of the quilt-making process, others contributed by making blocks, while still others donated money for the professional longarm quilting.

An OCLC logo is incorporated into the quilt

A cataloging colleague pointed out to me that group quilting seems to have parallels to cataloging in WorldCat, as each contributor is part of a larger community that collectively enhances the object. And quilts, just like bibliographic records, are made up of a lot of components, with their own terms, like blocks, backing, binding, and much more. I like that. 

I hope you’ll not only stop by the quilt auction in San Diego, but that you’ll also get out your wallet to bid on it. You’ll get a one-of-a-kind item made by a group committed to the vision of collaboration in libraries. And quilting.

Sincere thanks to Kate James, Program Coordinator for Metadata Engagement, for her input on this post.


Not Business as Usual: Incorporating LIS Student Perspectives in the Apprenticeship Hiring Process / In the Library, With the Lead Pipe

In Brief

While a Master’s in Library and Information Science (MLIS) degree is typically necessary to become an academic librarian, practical experiences such as internships, practicums, and apprenticeships are essential in gaining employment post-graduation. Providing paid opportunities where LIS students participate in and contribute to meaningful mentorship, training, and work experience is critical to improving inclusion in academic libraries. This article reflects on experiences of student employees of the University of Colorado (CU) Boulder University Libraries’ Ask a Librarian Apprenticeship, who collaborated with the apprenticeship supervisor to purposefully reassess the hiring process for incoming apprentices. This article demonstrates how including student employees as active participants in the hiring process not only provides a valuable experiential learning opportunity, but also shifts power dynamics from a sole hiring manager to a team that includes student employees, creating a better hiring process for student applicants.

By: Estefania Eiquihua, Karen Adjei, Janelle Lyons, and Megan E. Welsh

Introduction

Although the Master of Library and Information Science (MLIS) degree is required (in many cases) in order to be a professional librarian, a degree alone is not sufficient for library school (LIS) graduates when they enter the job market. Hands-on experience through internships, practicums, and apprenticeships allows students to put coursework into practice and prepare for the post-graduation job search by gaining a sense of what librarianship looks like. As these work experiences have historically been unpaid, it is crucial that libraries begin and continue to offer paid opportunities so that LIS students are not forced to pay for credits toward their degree or contribute free labor to an organization in exchange for practical experience. The challenge of receiving worthwhile professional experience, which may or may not be paid, is especially poignant for emerging library professionals who identify with a historically marginalized group that has traditionally been excluded from librarianship. 

Providing paid opportunities for emerging library professionals is one way to promote inclusion. However, libraries can further facilitate an environment of inclusion by actively involving their current student employees in the hiring process of such paid opportunities. When student employees are actively and purposefully involved in the hiring process – through crafting the job ad, developing evaluation criteria, and interviewing candidates – it benefits the library, the current employee, and future applicants. By intentionally including student employee experiences in hiring practices, professional development opportunities aimed to support emerging library professionals become more accessible. At the University of Colorado (CU) Boulder University Libraries, we experienced the power of involving student employees in the hiring process firsthand by embedding current graduate student apprentices throughout all stages of the hiring process as we recruited a new apprentice. Current student employees were able to gain valuable experience in hiring, candidates experienced a more transparent application and interview process, and the hiring supervisor received valuable insights into how best to implement more inclusive student employee hiring practices to benefit future iterations of the apprenticeship program. 

This article demonstrates how including student employees as active participants in the hiring process not only provides a meaningful experiential learning opportunity for apprentices, but also shifts power dynamics from a sole hiring manager to a team that includes student employees. This article contextualizes these experiences by reviewing the literature on meaningful professional development opportunities for LIS students as well as literature about hiring processes in academic libraries. Our overall intention is to highlight how including current apprentices in iterations of the hiring process creates a better experience for applicants. The practices laid out in this article would be of particular interest to any library hiring supervisors interested in challenging the status quo, providing a rewarding professional development opportunity for student employees, and recruiting a more diverse population of student employees through thoughtful hiring practices.

Literature Review 

Much has been published on the value of providing LIS students with practical experiences through mentorship programs, internships, and practicums. Most literature in support of practical experiences for LIS students argues that an LIS curriculum alone does not provide students the on-the-job training that seems to be expected in the field. Lacy & Copeland (2013) note that while all LIS programs place value on practical experiences, in many cases students are not required to participate in internships or practicums in order to graduate (unless they are concentrating on school librarianship, for example). The authors emphasize the importance of mentorship programs that offer opportunities for LIS students to network, experience day-to-day work life and job expectations, and to enhance job seeking skills. A study by Goodsett & Koziura (2016) questions what can be done to improve LIS education for new librarians. They surveyed over 575 LIS graduates in order to gain insight into the perceived effectiveness of their LIS education. While respondents undoubtedly found value in their LIS education, most reported that their LIS curriculum emphasized theoretical knowledge. An overwhelming number of respondents reported that practical experiences such as work experience, internships, and practicums were essential in gaining employment post-graduation.

The need for LIS students to supplement their graduate curriculum underscores the importance of libraries providing meaningful practical experiences so that the next generation of information professionals is well prepared to intentionally maintain and improve the field of librarianship. Lewey & Moody-Goo (2018) suggest that the ideal internship is mindfully designed and should be “transformative and empowering” for the LIS student. The authors emphasize that internships which are mindfully designed can “benefit all parties involved—intern, institution, library, librarians, and the LIS field as a whole” (p. 238). The authors advocate that “meaningful internships should have four key features: supportive mentorship, purposeful planning and training, simulation of an authentic professional position, and reflection and assessment” (p. 238). Wang et al. (2022) agree that access to meaningful internships is essential for post-graduate success. However, the authors argue that internships should also strive to become more equitable. The authors cite various barriers that hinder LIS students from being able to participate in practical experiences, such as availability of opportunities, location, lack of time, and finances. Another barrier mentioned was the expectation for students to “volunteer” for experiences or to complete credit-bearing practicums for which students have to pay tuition. The authors are critical of the “superficial professionalization” of librarianship and recommend that libraries work toward supporting LIS students and recent graduates by funding internships and practicums. They also recommend offering interns competitive pay and offering remote or hybrid work to help alleviate the financial or geographic burden of trying to gain practical experience. 

Wildenhaus (2019) emphasizes the critical importance of denormalizing unpaid positions in LIS. She notes that the message presented to many LIS students and new librarians is “the cost of entry to a career in libraries and archives is a willingness—and ability—to work for free” (p. 2). Wildenhaus states, “the prevalence of unpaid internships may negatively impact efforts for diversity and inclusion among information workers while contributing to greater precarity of labor throughout the workforce” (p. 1). Unpaid labor is an additional barrier to Black, Indigenous, and persons of color (BIPOC) seeking practical experience, as Galvan (2015) points out: “only students with access to money can afford to take an unpaid internship… insuring [sic] the pool of well-qualified academic librarians skews white and middle class” (para. 31). Holler (2020) furthers this notion by highlighting, “only certain sorts of people can afford to work for free: people who are wealthy; people with spouses or partners who can provide for them; people who have the luxury of living with families or guardians; people who are unburdened by care work and its economies; people without outstanding medical bills or student debt; and, overwhelmingly: people who are white” (para. 40). Holler (2020) rejects the notion that unpaid or underpaid labor should be normalized and advocates for an “equity budgeting model” in which the culture of paying dues is denounced and institutions commit to paying all workers, especially students who are trying to gain practical experience in community-based cultural work sectors. Holler (2020) explains that the equity budgeting model is rooted in the desire to “[repair] the damage of a fundamentally extractive nonprofit-industrial complex and cultural work sector, which has survived on the systemic underpayment (or non-payment) of community members of color and freelance cultural workers alike — resulting in a cultural work economy in which independently wealthy, white, or salaried practitioners hold unfair and unequal sway” (para. 3).

There is a significant gap in the literature detailing the perspectives of BIPOC LIS students and new librarians’ experiences with unpaid labor. The lack of literature on the topic may be due to the vulnerable position in which BIPOC LIS students and new librarians find themselves: trying to break into the profession while entrenched in a culture that insists on “paying your dues” in order to gain professional experience. Insight into their experiences would provide essential knowledge to challenge the status quo in hopes of denormalizing the prevalence of unpaid labor in LIS. 

Furthermore, while we were not able to find literature that specifically discusses the experiences of LIS students involved in the hiring process, a growing body of literature has emphasized the importance of inclusive hiring practices as a way to reduce barriers that hinder recruitment efforts (Cunningham et al., 2019; Galvan, 2015; Harper, 2020; Houk & Nielsen, 2023; Shah & Fife, 2023). Shah & Fife (2023) state, “the recruitment/hiring/retention life cycle for BIPOC job candidates for academic and research libraries is fraught with bureaucracy and layers of communication that deter the very DEAI concepts that they aim to practice” (para. 2). The authors emphasize that complex job descriptions and complicated application processes hinder recruitment efforts; instead, libraries should “focus on the humanity of the candidates” (para. 16) and work toward dismantling barriers by providing honest and concise job descriptions. 

Houk & Nielsen (2023) further this argument for person-centered hiring practices by advocating that every aspect of the recruitment process be critically examined. Specifically, the authors critically examine interviews and emphasize “the need for intentionality in creating environments where candidates, particularly candidates from marginalized communities, feel welcome and set up for success during their interviews” (Discussion section, para. 1). In their research, the authors found that the idea of “the interview as a test” was common. This manifested in explicit testing of skills through presentations or interview questions, in hidden testing through observations of a candidate’s behavior, or in perceived “fit.” The idea of hiring based on the “interview as a test” and “fit” is problematic in the context of a profession that has been historically predominantly white. According to the American Library Association (ALA) 2012 Diversity Counts survey, nearly 88% of professional librarians identified as white. Cunningham et al. (2019) emphasize that “fit” is often “undefinable, intangible, and thus allows for libraries to stay within their comfort zones and replicate the status quo” (p. 17). 

Furthermore, while interviews are an integral part of determining whether a candidate is a good match for a position, Houk & Nielsen (2023) argue that libraries should reexamine how they evaluate candidates and ensure they are making intentional efforts to reduce bias in their hiring criteria. They suggest intentional actions such as providing candidates with interview questions in advance and giving candidates accommodations to ensure that they are comfortable and more confident during the interview process. Establishing well-defined hiring criteria and qualifications also helps reduce bias. The work to improve the hiring practices for CU Boulder Libraries’ Ask a Librarian Apprenticeship through the inclusion of student apprentices directly addresses these suggestions from the literature and furthers the conversation by contributing a successful model of reducing professional development barriers in the LIS field.

Apprenticeship Context

University of Colorado (CU) Boulder is a large, R1, public university enrolling over 30,000 students. Five libraries on campus comprise the University Libraries system and support undergraduate and graduate students, faculty, staff, and the broader Boulder, Colorado community. The largest library on campus currently has a distinct reference desk (the Ask a Librarian Desk) and the University Libraries maintain a virtual chat service which we call “Ask A Librarian.” On most evenings and weekends during the academic year, our virtual chat service is staffed exclusively by LIS student employees. Since 2018, we have hired ten graduate students in library and information science as Ask A Librarian Apprentices at CU Boulder. The apprenticeship is a paid, practical experience which aims to build library school students’ skills in reference work by staffing evening and weekend chat shifts, while also supporting their interests as they engage in professional development, networking, and special projects ranging from building research guides to collection development to publishing and presenting. Unlike internships and practica, the apprenticeship is an intentionally scaffolded experience which provides LIS students with a holistic view of academic librarian responsibilities. It lasts longer than a typical semester-long internship or practicum, usually for the duration of the apprentice’s LIS education (due to campus funding parameters, LIS students are no longer eligible to be an apprentice after they graduate). 

In 2020, as the COVID-19 pandemic shifted the apprenticeship to a remote work opportunity, CU Boulder Libraries also intentionally viewed the apprenticeship as an opportunity to recruit LIS students of color to academic librarianship. Contextualized by the Black Lives Matter movement, the murders of Breonna Taylor and George Floyd, growing awareness of the historical injustices and predominance of whiteness in academic library settings, and training dedicated to recruiting and retaining librarians of color (see the excellent Library Juice Academy course “Recruiting and Retaining Librarians from Underrepresented Minoritized Groups”), CU Boulder Libraries accepted a proposal in summer 2021 to continue the remote modality of the apprenticeship and to explicitly welcome BIPOC students to apply. The apprenticeship is a valuable opportunity for students to gain practical skills as they look toward graduation and enter the job market. It has evolved over the years, especially given the pandemic, when apprentices transitioned from staffing our physical reference desk in person to staffing our virtual chat service. Apprentice project work over the past four years has included increased participation in the hiring process for incoming apprentices. 

Initially, in 2018 and 2019, the hiring process involved Megan as the apprenticeship supervisor and hiring manager developing and posting a job ad, reviewing applications, scheduling interviews, and making the final hiring decision; sometimes her colleague who managed the reference desk joined the interviews. The hiring process has evolved to be entirely virtual, matching the modality in which the apprenticeship is currently offered, and now includes current apprentices. The extent of apprentice participation in the hiring process has grown over the past four years. In 2020, apprentices began to sit in on interviews. We moved from incorporating a staff colleague as a companion interviewer to involving current apprentices, both because that colleague’s responsibilities had changed and the role experienced turnover, and as a way for graduate student applicants to hear directly about the experience of current apprentices. This opportunity for current apprentices to articulate their unique perspectives and to be transparent about what the job actually looks like is valuable for them and for applicants. Apprentices are able to describe everything from the questions they receive over chat, to the project work they engage in, to what it’s like to work with Megan as a mentor and supervisor. These are questions that Megan cannot answer in the same way, or in nearly as meaningful a way, as our current apprentices.

Currently, CU Boulder Ask a Librarian Apprentices participate in the hiring process by:

  • Reviewing and revising the job ad in collaboration with Megan. This helps to capture, in real-time, what apprentices have experienced throughout the entire hiring and employment process. They are able to bring their experiences into all stages of the hiring process to ensure that it benefits future apprentices. CU Boulder apprentice involvement in hiring creates continuity of feedback, revision, learning, and application of inclusive practices for everyone throughout the hiring process so that apprentices and the supervisor can learn from each other and improve approaches to hiring and onboarding,
  • Helping to recruit by advertising through listservs, library school forums, on social media (e.g., the We Here Facebook group, a space exclusively for BIPOC library school students and library professionals), and through word of mouth with peers at conferences and individually. These recruiting efforts highlight how apprentices create and leverage their networks within the LIS field to positively contribute to the hiring process. Advertising through these networks expands the reach of the job posting and knowledge of CU Boulder as a site that supports LIS student labor. It also represents the various social networks that current LIS students are a part of, especially ones which the hiring manager may not be aware of, have access to, or be welcome to participate in,
  • Reviewing, discussing, and suggesting revisions to hiring documentation. This documentation includes a rubric used to rank application materials, a list of interview questions, and a rubric used to rank interviewees,
  • Reviewing applications and ranking them to help prioritize who we should invite to the interview stage, 
  • Participating in the interview process by asking interview questions and answering candidates’ questions about their experience in the apprenticeship, and  
  • Ranking interviewees to help inform a final hiring decision.

Including apprentices in the interview stage of the hiring process can provide clarity for potential apprentices about the day-to-day work of the apprenticeship and tasks listed in the job ad, addressing questions and alleviating confusion that applicants may have. In this way, current apprentices help to reduce barriers for student applicants throughout the hiring process. Yet, beyond including current apprentices as key participants in such a visible aspect of the hiring process as the interview, much of the evolution of our hiring has involved apprentices helping to create and refine hiring documentation. This documentation helps to standardize the hiring process, enhance clarity of the job and applicant requirements, and decrease bias in the application and interview evaluation by ensuring that multiple perspectives are represented throughout. Increasing apprentice engagement in all elements of hiring helps Megan to evaluate applicants with perspectives other than her own, and it gives current apprentices the opportunity to learn about the hiring process more as a hiring authority rather than as the applicant they once were. 

Apprentice perspectives

Job ad development

Estefania and Karen were both excited to participate in reviewing the hiring material and criteria for the incoming apprentice. They were eager to participate because they wanted to gain experience on the other side of the hiring process while also improving it for the next round of applicants. To revise the hiring materials, Megan and the apprentices reviewed the materials that were used when Karen and Estefania applied to the apprenticeship. Both apprentices relied on their memories and past experience as interviewees to inform the changes they would like to see made to the hiring material. Each considered what incoming applicants could have perceived as a barrier, with the intention of making the hiring process more inclusive for the next round of applicants. Both also reflected on their prior experiences while originally applying for the apprenticeship and considered what specific wording from the job ad had appealed to them, what made the apprenticeship an attractive opportunity, and what revisions should be made to ensure the hiring materials were concise, transparent, and reduced bias. 

While reflecting on the job ad (see Appendix A), both apprentices had helpful suggestions for tweaking the original language to more accurately reflect the apprenticeship. For example, Karen suggested changing the language in which the apprenticeship was originally described as a “fast-paced” environment. Karen admits that she initially shied away from applying to the CU Boulder apprenticeship because of this description: she had previous work experience in a “fast-paced” environment and had mixed feelings about entering a similar workplace. She encouraged changing the language because “fast-paced” can often be code for work environments that require a lot of responsibility with tight deadlines and no support. Karen also recalled that during Atla Annual 2022, she had attended a workshop co-hosted by Megan entitled “Navigating in the Fog: Shining a Light on the Library Job Search Process” (Welsh & Knievel, 2022). From the workshop, she learned about common wording that deters applicants, women of color in particular, which helped her identify specifically why she had initially hesitated to apply to the apprenticeship. The group decided to omit the “fast-paced” language and instead highlighted that the apprenticeship values practical professional experience alongside mentorship from faculty librarians. We specifically changed the verbiage in the job ad to emphasize the exploratory nature of the apprenticeship experience, allowing emerging library professionals to contribute to and build their interests in the field of academic librarianship.

Estefania also reflected on the specific wording that made her excited about the opportunity and considered how the language in the job ad could enhance the transparency of the responsibilities and make the position more appealing, especially for BIPOC LIS students. For example, the original job ad stated, “A core goal of the apprenticeship program is to invite and encourage involvement of MLIS students from traditionally underrepresented groups in academic librarianship.” When Estefania originally applied for the position, she appreciated that this statement was included and recommended that for Fall 2023 the job ad include the addition of “BIPOC (Black, Indigenous, and People of Color) MLIS students are highly encouraged to apply.” While a small gesture, the additional language is important to advertise that this program is intentionally recruiting BIPOC and people from underrepresented groups. Estefania shared that when institutions add this verbiage, she feels more empowered to apply. 

Recruitment strategies

While Megan maintains a list of library schools to share the job ad with, and colleagues in CU Boulder Libraries’ HR share the posting to the Libraries’ website, a job board, and a platform called Handshake, apprentice involvement in promoting the apprenticeship was crucial during the recruitment phase. Karen intentionally shared the job ad with as many of the groups and networks she was connected to as possible in order to cast the net far and wide. This strategy ensured that LIS students would see the ad across many platforms and have a better chance of being exposed to this opportunity. Leveraging and contributing to social networks is especially important in virtual modalities of professional and academic spaces, where in-person connection and the subsequent exchange of information need to be deliberate and intentional in order to be effective at all.

Places where we shared the job ad include the following: 

  • University of Maryland (UMD) MLIS Student listserv,
  • UMD MLIS Student discord channel, 
  • Association of Research Libraries (ARL) Diversity Programs Alumni,
  • Asian Pacific American Librarians Association (APALA),
  • National Association to Promote Library and Information Services to Latinos and the Spanish Speaking (REFORMA),
  • We Here Facebook Group,
  • Atla Listserv, a listserv for Theological and Religious Studies librarians,
  • Karen shared the job ad with a former supervisor and a mentor so they could forward this opportunity on to others who may be interested or who might have other networks they could spread the word through. Karen also shared with a peer whom she met at the California Library Association (CLA) conference, and
  • Estefania shared the job ad with iSchool Students of Color, a group which she was part of at the University of Illinois Urbana-Champaign.

In total, we received over sixty applications in the Fall 2023 hiring cycle, similar to previous hiring cycles since moving the apprenticeship to a remote opportunity in 2020.

Reviewing and ranking candidates’ application materials

The rubric for evaluating applications was another aspect of the hiring materials that we collaboratively decided to change (see Appendix B). We deconstructed the job ad and applied a numbered scale to help us determine which candidates addressed the qualifications highlighted in the ad. The scale ranged from 0, representing that the criterion was not addressed in the applicant’s CV/résumé or cover letter or indicating ineligibility for the apprenticeship, to 2, representing that the applicant fully addressed the criterion and met eligibility requirements for the apprenticeship. While this numbering system would help us keep track of who excelled in crafting their application materials, we decided to allow space for evaluator comments in order to balance the quantitative and the qualitative in holistically considering which applicants should progress to the final interview stage. We also decided to change the language of the following criterion: “[Applicant] discussed interest in pursuing a career in academic librarianship.” Instead of using the word pursuing, we decided to use exploring. Karen advocated for this subtle change because she emphasized that LIS students may still be unsure about committing to academic librarianship. Rather, they would benefit from the opportunity to explore what it is like to work in an academic library without the added pressure of being sure about the career path as a qualifier for being chosen to interview for the apprenticeship. 

With over sixty applications to sort through, the updated application rubric helped standardize the review and ranking of candidates. The numbered rating system helped us efficiently generate a list of finalists to invite for interviews so that we did not prolong the hiring process. This efficiency and our concern for “closing the communication loop” in a timely manner meant that we could respond to applicants and provide constructive feedback and resources if they were not progressing through the hiring process. We thought that this clear, thoughtful, and quick communication with all applicants, regardless of acceptance or rejection throughout each step, would be another way for us to respect their time, energy, and effort while also providing guidance and resources that would help to further their careers.
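
For readers curious about the mechanics, the tallying described above amounts to summing each applicant’s criterion scores and sorting, while keeping evaluator comments attached so that qualitative review can temper the numbers. The following minimal Python sketch is purely an illustration: our rubric was a shared document (see Appendix B), not software, and the criterion names, scores, and cutoff below are invented.

    from dataclasses import dataclass, field

    @dataclass
    class Application:
        name: str
        scores: dict          # criterion -> 0, 1, or 2, per the rubric scale
        comments: list = field(default_factory=list)

        @property
        def total(self) -> int:
            # Sum the 0-2 scores across all rubric criteria.
            return sum(self.scores.values())

    def shortlist(apps, cutoff):
        """Rank applications by rubric total, highest first; comments stay
        attached so qualitative review can override the numbers."""
        ranked = sorted(apps, key=lambda a: a.total, reverse=True)
        return [a for a in ranked if a.total >= cutoff]

    # Invented example data (hypothetical criteria and cutoff):
    apps = [
        Application("Applicant A", {"enrolled": 2, "reference": 2, "deia": 2}),
        Application("Applicant B", {"enrolled": 2, "reference": 0, "deia": 2},
                    comments=["No reference experience; a gap the apprenticeship could fill."]),
    ]
    for a in shortlist(apps, cutoff=4):
        print(a.name, a.total, a.comments)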

Reviewing and updating interview materials

While we had used an application rubric in the past, the Fall 2023 hiring cycle was the first time we used a rubric to help evaluate interviews (see Appendix C). The interview rubric was structured similarly to the application rubric, with each interview question corresponding to an item on our scoring rubric. For each question that an interviewee answered, we ranked responses on a numbered scale from 1 to 5, where 1 meant that the question was not answered and 5 indicated that it was answered very well. Since the interview rubric focused on how well the interviewee answered the questions, we felt that this newly developed tool helped to mitigate any bias we may have had in this decision-making process. As with the application rubric, we decided to keep space for interviewer comments and added a field to record a suggested ranking in order to balance quantitative and qualitative evaluation. This extra space, not tied to specific interview questions, afforded an opportunity to holistically consider the interviewees and helped us determine the finalist to whom we would offer a place in the apprenticeship program.

We also updated the wording of the Fall 2023 hiring cycle interview questions (see Appendix D). For example, in a three-part question, we asked candidates to reflect on how they would describe themselves, how fellow students or classmates would describe them, and how a teacher, professor, or supervisor would describe them. We decided to remove the second part of this question (about how peers would describe the interviewee) because we wanted to avoid overwhelming the interviewee, and we realized that it did not provide additional substantive information compared to the other parts of the question (see Appendix D, Question #4). Self-reflection and the perspective of an evaluative figure were more important to us than how a peer might view the interviewee. In addition, we felt that some students might not have had enough experience in their studies to have received any feedback from their peers. As previous interviewees, we felt this part of the question might subject interviewees to unnecessary added pressure to prove their worth in a superficially professionalized manner. 

Similarly, we changed the following question, “Please share with us what diversity, equity, and inclusion mean to you, and how these values relate to academic librarianship,” to “Please share how you engage with diversity, equity, and inclusion in your current work or studies, and how you hope to bring DEI into this position and academic librarianship” (see Appendix D, Question #5). First, we felt that the question’s initial wording referred too vaguely to DEI and academic librarianship, and this would limit the opportunity to have a productive conversation with the interviewee. Karen and Estefania felt that it was too impersonal for us to get a sense of who the candidate was and how they could uniquely contribute to and benefit from the apprenticeship. We also felt that it pushed the candidate to espouse broad, generalized statements about DEI, and we pointed out that this would end up reinforcing the surface-level commitment to DEI that we have witnessed and experienced at other institutions and in the field. We knew that this was not Megan’s intention or goal in valuing DEI in the Ask A Librarian Apprenticeship, so we reworded the question in a way that would invite authentic reflections on DEI. 

Additionally, when reviewing the interview questions, Estefania reflected on her experience of feeling very nervous going into the interview. She shared that she was filled with doubt and anxiety and tried to combat this by endlessly researching CU Boulder Libraries and potential interview questions. While she agreed that this research was necessary to some extent and strengthened her responses overall, when given the opportunity to participate in revising the interview questions, she advocated for sharing them with candidates before the interview. We agreed that this would give candidates an opportunity to ease their interview anxiety and help them prepare their responses in a more constructive way. In the Fall 2023 hiring cycle, Megan shared the interview questions with each interviewee the day before their interview. 

Engaging in the interview process

Everyone agreed that it would be important to specify that having cameras on during the video interview was optional. We believed this option would minimize barriers to applying, such as nervousness or the inability to find an appropriate space due to other commitments. However, we kept in mind that, if hired, apprentices would need to make full use of technology to engage fully with the apprenticeship. The option to have cameras on or off was communicated in the email to applicants confirming their interview time.

Including a current apprentice in the interview process as an interviewer and as someone crafting documentation was incredibly beneficial. The apprentice reflected on their own experience and improved the interview questions, clarifying and adjusting them when needed to help interviewees further express themselves and showcase their candidacy. This robust and organic apprentice involvement in the interview process allowed us to gain a deeper sense of the person interviewing for the apprenticeship, rather than reducing them to numbers and rankings. In particular, for the updated question “Please share how you engage with diversity, equity, and inclusion in your current work or studies, and how you hope to bring DEI into this position and academic librarianship,” Karen added the phrase “You are welcome to share any lived experiences” for the first person we interviewed. Megan really appreciated how that question was phrased and encouraged Karen to keep this modified phrasing. Pivoting for the rest of the interviews seemed to have a positive effect, as applicants were keen to share their lived experiences of DEI as well, especially if they did not have much experience working with DEI in the workplace. This change also reinforced our strategy of modifying the initial interview question in order to elicit more authentic reflections on DEI within the apprenticeship. Both Megan and Karen hoped that this change set the stage for a better interview experience overall for this round of recruitment.

Having a current student apprentice as part of the interview process also provided a mentorship opportunity on how to reduce bias in interviewing. For one candidate in particular, Karen had asked about their potential fit in the organization and apprenticeship. Megan took the time to respond by giving Karen an institutional resource on the importance of interrogating what someone means by “fit” and of having actual criteria for it in order to mitigate personal bias in the hiring process as much as possible. Megan reinforced the importance of the holistic and equity-informed application rubric that both apprentices worked to improve so that fit bias would not be an issue. She also shared with Karen a resource from CU Boulder’s website on the different types of biases that may appear in the hiring process (e.g., beauty bias, institutional bias, etc.) and how to develop a plan to recognize them (Department of Environmental Studies, n.d.). This example highlights the mentorship opportunities afforded by including apprentices in the hiring process, along with the potential to ultimately create a more supportive and equitable academic librarianship landscape. 

Reviewing and ranking interviewees to choose a finalist

After the interviews, a few candidates were highly ranked by both Karen and Megan, necessitating further discussion to prioritize whom we would extend an offer to. Reviewing both of our numbered rankings and qualitative observations helped us check our assumptions and reevaluate our assessment of each whole application. Even with the standardization of the application and interview process to efficiently and fairly narrow the list of candidates to one finalist, we had to review the qualitative measures within our ranking system to make sure we were taking the whole person into full consideration after all interviews had taken place. Specifically, the comments sections of both the application and interview rubrics helped us appropriately and fairly incorporate the human aspects into this decision-making process to choose the finalist. 

Given that the apprenticeship seeks to fill gaps in LIS students’ experiences and education, Megan was initially unsure if a particular applicant would truly benefit from the apprenticeship because they already had some experience in an academic library setting. However, Karen noted that, although this candidate had academic library experience, they did not specifically have reference experience and would benefit from filling that gap through this apprenticeship. In making this case, Karen thought about the pressures students face as they prepare to apply to jobs, and was keenly aware that specific experience with a skill or role weighs heavily in being considered for and obtaining future employment. Pointing this out influenced Megan’s perspective about the value of the apprenticeship for the candidate, and this candidate was ultimately hired. Including the perspective of a student throughout the hiring process highlights how one fellow student in a position of power can advocate for another and helps to deconstruct assumptions about student needs, goals, and readiness for a position. Ultimately, by taking into account current student experiences and embracing a whole-person approach, we created a more positive hiring process for all and made an informed decision on the final candidate. 

Reflections from the other side of the hiring process

From the early stages of the application process, Janelle felt optimistic that the values and climate of CU Boulder Libraries would align with what she hoped for in an employer. Janelle heard about the apprenticeship from Estefania, whom she knew through a student group at their institution for aspiring librarians of color. Based on Estefania’s comments about her experience, Janelle sensed that the apprenticeship would be a great work environment and an ideal opportunity to learn more about academic librarianship. 

When Janelle went to apply in Summer 2023, she was struck by how approachable the job posting was. Unlike a number of position descriptions she had encountered, when reading the Ask a Librarian posting she thought to herself, “Wow, I definitely meet all of those requirements! I feel very confident about applying.” Particularly for internships and apprenticeships where training and learning are an integral part of a student’s experience, it is helpful when postings are transparent about the skills and mindset required for a position, while framing these requirements in a way that encourages students to apply.

Janelle also remembers the interview process as a positive experience. In her professional career, she recalls only one other interview where she received the questions in advance. In both cases, receiving the questions beforehand allowed her to enter the interview feeling more at ease, having ideas of what she could discuss for each question. She appreciated how welcoming Megan and Karen were, which helped create a supportive environment during the interview. Although she had initial nerves (as with most interviews), as the interview progressed she became more comfortable due to how Megan and Karen facilitated the interview. She was unable to ask all of her questions during the 30-minute interview, and so at Megan and Karen’s encouragement, she emailed her questions to them afterward. She appreciated the depth of their responses, and found it very helpful to be able to ask Karen directly about her experience with the apprenticeship.

During the interview process, it was clear that Karen was an active participant, and not just an observer. Beyond simply asking questions, Karen was very engaged and present in the interview process, which was a role Janelle had not seen a student occupy before. To Janelle’s knowledge, students are not typically embedded in the hiring process to this extent, although as mentioned above, thoughtfully involving students in the hiring process brings benefits to everyone involved. Seeing Karen’s significant involvement in the hiring process indicated to Janelle that her input and perspectives were valued, and showed her the potential that CU Boulder Libraries apprentices have to be active and respected participants in projects and tasks as important as hiring a new student employee.

As an apprentice who was hired through a process that included active involvement from a current apprentice, Janelle experienced firsthand the benefits of this approach to hiring. From learning about the apprenticeship from Estefania, to asking Megan and Karen questions, to actually working in the position, the apprenticeship experience has met her original expectations. Throughout the hiring process, Janelle gained a good sense of the culture at CU Boulder Libraries, which made her feel confident and excited when starting the position. As a current Ask A Librarian apprentice, her opinions and experiences are valued, and she has had opportunities to challenge herself while receiving guidance and support. This speaks to the apprenticeship’s strength in empowering emerging librarians so that they have increased confidence when starting out in full-time positions. 

Recommendations

For professional librarian positions, we often hear the phrase that “interviewing is a two-way street”: the institution is interviewing the applicant and the applicant is interviewing the institution. By interrogating our hiring processes for graduate student positions, we can help foster that “two-way street” mentality at the student employment level as well. Understanding how individual academic institutions can differ, we anticipate that libraries can customize how they incorporate LIS students in the hiring process based on their needs. Through the course of writing this article, we have also recognized how our hiring process can improve in future hiring cycles. We would like to offer some recommendations as we consider how we may continue to iterate on the hiring processes outlined above: 

  • Introduce students to what you have to offer. Host a drop-in information session for potential applicants to learn about the apprenticeship before applying. At CU Boulder, we envision Megan sharing some information about the apprenticeship during the first part of an information session and then leaving so that applicants may openly ask past and current apprentices about their experiences, Megan’s supervisory style and level of support, and how any institutional issues have impacted them. Apprentices can also share what projects they have worked on, specific accomplishments they achieved, and what they learned through the apprenticeship. Such a session is also a great time to introduce potential applicants to the values of the institution and share how the apprenticeship aligns with and supports the mission, vision, and values of the library. The goal of this session is transparency, and we encourage readers to consider ways that their hiring processes may be more transparent. 
  • Offer alternative opportunities as a source of continued support. Include links to similar apprenticeship opportunities or other professional development opportunities in emails to candidates who are not chosen for the position. As artificial intelligence is already changing the ways in which candidates draft their documentation and apply for jobs, this is an important time for the field of library science to consider how such tools may be used effectively by LIS student applicants. An applicant rejection email may include links to AI tools which could support crafting stronger application documentation for future job opportunities. Offer to connect applicants to colleagues you know whose geographic region or work aligns with the LIS student’s career goals. Leveraging your networks and making connections to others in the LIS field can be a helpful source of support for LIS student applicants as they pursue other experiences in the field. 
  • Be open, invite critique, make changes, and repeat. We regularly reflect on our hiring processes, and we suggest that, immediately after hire, the incoming apprentice be invited to reflect on the hiring process they just experienced, provide feedback on it, and suggest changes. Student applicant perspectives are invaluable and need to be honored in order to improve processes for future applicants. 
  • Build community among apprentices and highlight their value to the institution. Host debrief sessions where apprentices can share updates on project work, collectively explore successes and challenges, and socialize. An aspirational improvement to CU Boulder’s apprenticeship is inviting the cohort of apprentices for a site visit to explore the physical library and campus spaces that they will answer questions about through chat reference, and to build community. Administrative support and funding for such a site visit or for other professional development opportunities (e.g., attending conferences, funding book purchases to build a student’s professional library) signal that the library values LIS student labor and sees the apprenticeship as an important component of the professional journey to invest in. While requesting funds to support these opportunities may seem intimidating, we encourage you to ask, even if you think the answer will be “no” or “not yet.” We view such funding requests as acts of advocacy and we believe that advocating for ourselves is inherently advocating for others.
  • Think critically and reflect often about the ways the traditional power structures inherent to hiring practices may be disrupted. We appreciate the suggestions of Eamon Tewell, our external reviewer, in considering the possibility of apprentices exclusively leading the hiring process and offering a final hiring recommendation to HR, rather than offering Megan input which leads to her making the final hiring decision as the apprenticeship coordinator. The hiring process could also afford an opportunity for LIS student applicants to interview the highest levels of the library hierarchy before they are even hired. We currently provide an opportunity for apprentices to “pick a Dean” to meet with within their first few months of the CU Boulder apprenticeship, as a way to challenge feelings of intimidation prior to the high-stakes meeting with library leadership during their first post-graduate job interview. We could instead intentionally place a meeting with library leadership before the apprenticeship hire so that applicants can learn about leadership’s priorities as they consider whether to accept an offer to join the institution. 
  • Foster student support networks. Many LIS students were encouraged to apply to the CU Boulder apprenticeship based on the encouragement of peers. Such informal, word of mouth networks are crucial supports for students as they navigate library school and the job search process. Building upon these informal networks while also acknowledging the competing priorities faced by many students, we would ideally like to see student-run listservs, job boards, a dedicated group (similar to the “We Here” Facebook group) for students, and a library Green Book for LIS students which provides information on the quality of mentorship, culture, and institutional support at libraries that employ LIS students.  
  • Expand networks and community among LIS mentors. We would also like to see the development of a community of practice which focuses on LIS student mentorship. Some support may be found for mentors affiliated with specific programs (e.g., the ARL Kaleidoscope Program), in informal networks, and at related gatherings such as the relatively new Conference on Academic Library Management which is hosting its fourth conference in 2024. However, currently, there is not a distinct source of community and support for mentors of LIS students more broadly. 

Conclusion

We hope that the documented hiring practices of CU Boulder’s Ask a Librarian Apprenticeship can serve as a testament to how practical learning experiences for LIS students can be improved. We encourage academic libraries to advocate for and invest in paid employment opportunities such as apprenticeships, and when possible, to invite students to participate in the hiring process to provide a realistic work experience that will be valuable when students enter the job market. The benefits of including apprentices in the hiring process are apparent and abundant. Their input can foster inclusion by prompting reflection on and reassessment of job ads, recruiting, and the interview process. In turn, current apprentices help to reduce barriers for student applicants throughout the hiring process. Also, when applicants see apprentices deeply embedded in the hiring process, it can reflect positively on the institution’s culture and help applicants feel at ease, knowing they can speak directly with a fellow student about the position and see whether it would benefit their professional goals. 

One of the most meaningful aspects of CU Boulder’s apprenticeship program is its iterative nature. The evolution of our hiring practices embodies this iterative approach and highlights the value of LIS student perspectives and experiences in academic library settings. We hope the curiosity and growth embodied in our own apprenticeship will be mirrored across the profession as more institutions and librarians think deeply about the opportunities they can provide to LIS students.


Acknowledgments

The authors would like to thank the colleagues who helped to make this article into the piece you are reading today, especially our ITLWTLP editor Jess Schomberg, ITLWTLP peer-reviewer Jaena Rae Cabrera, and our external reviewer, Eamon Tewell, whose invaluable feedback challenged us to interrogate our practices more deeply. This work is the culmination of various rounds of hiring and input from past Ask a Librarian Apprentices; we would like to honor their contributions to improving our hiring practices over the years. In an article so strongly focused on the power of mentorship and succeeding in the academic library job search, we also want to thank all of the mentors who have helped to shape our library journeys: Dawn Harris, Lisa Hopkins, Jamie Lin, Victoria Adjei, Nicole Finzer, Laura Alagna, Kana Jenkins, Motoko Lezec, Kirsten Gaffke, Kimberly Go, Ann Ku, Elise Wu, Noriko Asato, Renee Hill, Carisse Berryhill, Craig Chapin, Arianna Alcaraz, Ray Pun, Tsione Wolde-Michael, Steve Adams, Katrina Fenlon, Alison Oswald, Irene Lewis, Noriko Sanefuji, Steve Hoke, Bill and Nancy Stragand, Farah Nageer-Kanthor, Sharon Friedman, Meredith Bowers, Patrice Folke, Sheila and George Madison, Rose Tabbs, Twanna Hodge, Xiaoli Ma, Gama Viesca, Jennifer Knievel, and Karen Sobel.


References

American Library Association. (2012). Diversity Counts. http://www.ala.org/aboutala/offices/diversity/diversitycounts/divcounts

Cunningham, S., Guss, S., & Stout, J. (2019). Challenging the ‘good fit’ narrative: Creating inclusive recruitment practices in academic libraries. Recasting the Narrative: The Proceedings of the ACRL 2019 Conference, April 10–13, 2019, Cleveland, Ohio, 12–21. https://scholarship.richmond.edu/university-libraries-publications/42

Department of Environmental Studies. (n.d.). Develop a plan to recognize and mitigate bias. https://www.colorado.edu/envs/develop-plan-recognize-and-mitigate-bias

Galvan, A. (2015). Soliciting performance, hiding bias: Whiteness and librarianship. In the Library with the Lead Pipe. https://www.inthelibrarywiththeleadpipe.org/2015/soliciting-performance-hiding-bias-whiteness-and-librarianship/

Goodsett, M., & Koziura, A. (2016). Are library science programs preparing new librarians? Creating a sustainable and vibrant librarian community. Journal of Library Administration, 56(6), 697–721. https://doi.org/10.1080/01930826.2015.1134246

Harper, L. M. (2020). Recruitment and retention strategies of LIS students and professionals from underrepresented groups in the United States. Library Management, 41(2/3), 67–77. https://doi.org/10.1108/LM-07-2019-0044

Holler, J. L. R. (2020). Equity budgeting: A manifesto. Marion Voices Folklife + Oral History. https://marionvoices.org/equity-budgeting/

Houk, K., & Nielsen, J. (2023). Inclusive hiring in academic libraries: A qualitative analysis of attitudes and reflections of search committee members. College & Research Libraries, 84(4). https://doi.org/10.5860/crl.84.4.568

Lacy, M., & Copeland, A. J. (2013). The role of mentorship programs in LIS education and in professional development. Journal of Education for Library & Information Science, 54(1), 135–146. https://www.jstor.org/stable/43686941

Lewey, T. D., & Moody-Goo, H. (2018). Designing a meaningful reference and instruction internship: The MLIS student perspective. Reference & User Services Quarterly, 57(4), 238–241. https://www.jstor.org/stable/90022642

Shah, M., & Fife, D. (2023). Obstacles and barriers in hiring: Rethinking the process to open doors. College & Research Libraries News, 84(2). https://doi.org/10.5860/crln.84.2.55

Wang, K., Kratcha, K. B., Yin, W., & Tewell, E. (2022). Redesigning an academic library internship program with equity in mind: Reflections and takeaways. College & Research Libraries News, 83(9). https://doi.org/10.5860/crln.83.9.385

Welsh, M. E., & Knievel, J. (2022). Navigating in the fog: Shining a light on the library job search process. Atla Summary of Proceedings, 9–14. https://doi.org/10.31046/proceedings.2022.3178 

Wildenhaus, K. (2019). Wages for intern work: Denormalizing unpaid positions in archives and libraries. Journal of Critical Library and Information Studies, 2(1), Article 1. https://doi.org/10.24242/jclis.v2i1.88

Appendix A: 

Job Ad Used in the Fall 2023 Hiring Cycle

Apprenticeship Announcement

University of Colorado (CU) Boulder Libraries

Ask A Librarian Apprenticeship (Virtual)

Approximately 12 hrs/week throughout Fall 2023 – Spring 2024 academic year, $19-$20/hr 

Description

Gain practical professional experience in a robust academic library. The CU Boulder University Libraries is looking to hire an Ask A Librarian Apprentice who will receive training in research competencies, staff the Ask A Librarian virtual chat service two evenings from 5-8pm MT (Mondays & Wednesdays) and one weekend day from 1-5pm MT each week (Sundays), participate in special projects based on professional interests under the mentorship of a faculty librarian, and explore issues relevant to new academic librarians through professional development opportunities. The successful candidate will provide virtual research assistance in a major academic library that serves a world-class research university. This position is a great opportunity to supplement your graduate studies with experiential learning and explore the field of academic librarianship. 

Responsibilities:

  • Provide virtual research assistance through our Ask Us! chat service
  • Attend trainings, workshops, and meetings on a virtual meeting platform  
  • Participate in special projects based on professional interests and availability, under the mentorship of the Ask A Librarian Apprenticeship supervisor
  • Explore other internal and external opportunities for professional development, including research, writing, publishing, and presentations, based on interest and availability

Qualifications:

  • Currently enrolled as a library & information science graduate student for the duration of the apprenticeship. 
  • Candidates must be eligible to work in the United States at time of hire.
  • Maintain a strong customer service orientation and a desire to provide high quality research assistance.
  • Demonstrate interest in the principles of diversity, equity, inclusion, accessibility, and social justice, and how these relate to the mission and values of CU Boulder’s University Libraries.
  • Interest in exploring a career in academic librarianship.

Additional Information

A core goal of the apprenticeship program is to invite and encourage involvement of MLIS students from traditionally underrepresented groups in academic librarianship. BIPOC (Black, Indigenous, and People of Color) MLIS students are highly encouraged to apply. 

This program begins with trainings which can occur around your schedule in August 2023 and staffing virtual reference shifts with experienced colleagues from mid-August through mid-September 2023. The Ask a Librarian Apprentice is expected to complete approximately 12 hours of work per week, including virtual reference shifts on nights and weekends (schedule to be finalized at point of hire), project work, and professional development. The Apprentice will be paid $19-$20/hr and will work through the Fall 2023 – Spring 2024 academic year. For full consideration, please apply by Monday, June 26, 2023. A course schedule providing proof of enrollment in a library science graduate program is required at the time of hire.

To apply, please submit the following documents:

  1. Cover Letter
  2. Resume or CV

Send application materials with “Ask A Librarian Apprenticeship Application” in the subject line to Megan.Welsh@colorado.edu

Appendix B: 

Rubric Used to Evaluate Application Materials in the Fall 2023 Hiring Cycle

Apprentice Application Rubric
 
This rubric will help to quantify the credentials we seek and determine which applicants we should invite to interview.
 
Evaluator: _____________________________________
 
Candidate name: _______________________________
 
The applicant is currently enrolled as a Masters of Library & Information Science graduate student.
0 = not currently enrolled
1 = enrolled for part of the apprenticeship
2 = enrolled for the duration of the apprenticeship
 
Has the applicant completed at least one semester/quarter?
0 = No, they are entering their first semester/quarter
1 = They have completed one semester/quarter
2 = Yes, they have completed two semesters/quarters or more
 
Does the applicant have reference or customer service experience, or did they discuss customer service mindset/philosophy?
0 = No customer service/reference experience; didn’t discuss
2 = Discussed reference/customer service experience/philosophy
 
Does the applicant already have a position similar to the apprenticeship?
0 = Yes, either formerly or currently employed in a role similar to CU’s apprenticeship
2 = No, the applicant has not had nor is currently employed in a role similar to CU’s apprenticeship
 
Did the applicant demonstrate or discuss interest in the principles of diversity, equity, inclusion, accessibility, and social justice, and how these relate to the mission and values of CU Boulder’s University Libraries?
0 = No discussion or evidence of DEIA principles
2 = Demonstrated interest around DEIA principles
 
Discussed interest in exploring a career in academic librarianship.
0 = No, didn’t discuss
1 = Yes, did discuss
 
Does the apprenticeship fill gaps in the applicant’s training and experience (e.g., would the apprenticeship provide reference experience that they desire but currently don’t have?)?
0 = No, the applicant has a wealth of academic library experience already
2 = Yes, the apprenticeship would fill an important gap
 
Comments: _______________________________________________________
 

Appendix C: 

Rubric Used to Evaluate Interviews During the Fall 2023 Hiring Cycle

Apprentice Interview Rubric
This rubric will help us think about candidate responses to questions and rank them to determine a finalist.

Evaluator: _____________________________________

Candidate name: _______________________________

On a scale of 1 to 5, how well did interviewees address each question? 

What motivates you to explore the field of academic librarianship? 
1 = Did not answer 5 = Answer far exceeded expectations!

Tell us about yourself and any past experiences, such as course work or work experience, that would help you in this position. 
1 = Did not answer 5 = Answer far exceeded expectations!

What is your approach to reference/research assistance services? 
1 = Did not answer 5 = Answer far exceeded expectations!

Think of a time where you facilitated a particularly positive customer service interaction. What about that situation went well? What qualities contributed to a positive interaction?
1 = Did not answer 5 = Answer far exceeded expectations!

Please share how you engage with diversity, equity, and inclusion in your current work or studies, and how you hope to bring DEI into this position and academic librarianship.
1 = Did not answer 5 = Answer far exceeded expectations!

If you were to describe yourself in three adjectives or short descriptive phrases, what would they be? If a past teacher/professor/supervisor were to describe you in three adjectives or short descriptive phrases, what would they be?
1 = Did not answer; 5 = Answer far exceeded expectations!

What makes the apprenticeship appealing to you?
1 = Did not answer; 5 = Answer far exceeded expectations!

Through December 2023, this position will require 2 evening shifts from 5-8pm MT, including Monday and Wednesday evenings, and one weekend shift on Sundays, each week. Based on your schedule, does this work for you? 
Yes No Other: ___________________

Did the interviewee ask questions? (use “other” to describe if there was no time left to ask questions)
Yes No Other: ___________________

Overall reactions/Comments: _______________________________________________________

Suggested ranking: ______

Appendix D: 

Interview Questions Used in Fall 2023 Hiring Cycle

  1. What motivates you to explore the field of academic librarianship? 
  2. Tell us about yourself and any past experiences, such as course work or work experience, that would help you in this position. 
  3. What is your approach to reference/research assistance services? (If they don’t have reference experience: Could you describe any other experience you have providing customer service in a virtual environment, or how your in-person customer service experience might transfer to a virtual environment?) 
  4. This question has two parts: Think of a time where you facilitated a particularly positive customer service interaction (If they are struggling: maybe you were the customer, maybe you were the one providing the service).
    • What about that situation went well?
    • What qualities contributed to a positive interaction?
  5. Please share how you engage with diversity, equity, and inclusion in your current work or studies, and how you hope to bring DEI into this position and academic librarianship. 
  6. This question has two parts:
    • If you were to describe yourself in three adjectives or short descriptive phrases, what would they be?
    • If a past teacher/professor/supervisor were to describe you in three adjectives or short descriptive phrases, what would they be?
  7. What makes the apprenticeship appealing to you?
  8. Through December 2023, this position will require 2 evening shifts from 5-8pm MT, including Monday and Wednesday evenings, and one weekend shift on Sundays, each week. Based on your schedule, does this work for you?
  9. What questions do you have for us?


Reflection: The first half of my sixth year at GitLab: helping others (Support Engineers) and leaving Support / Cynthia Ng

It’s a bit mind boggling to me that I’m talking about my sixth year at GitLab. It’s not quite been half (5 months), but as I’m internally transferring out of Support, I thought this was a good place to break up “the year”. First time readers may want to check out my previous reflection posts. … Continue reading "Reflection: The first half of my sixth year at GitLab: helping others (Support Engineers) and leaving Support"

Video Game Preservation / David Rosenthal

I have written fairly often about the problems of preserving video games, most recently last year in Video Game History. It was based upon Phil Salvador's Survey of the Video Game Reissue Market in the United States. Salvador's main focus was on classic console games, but he noted a specific problem with more recent releases:
The largest major platform shutdown in recent memory is the closure of the digital stores for the Nintendo 3DS and Wii U platforms. Nintendo shut down the 3DS and Wii U eShops on March 27, 2023, resulting in the removal of 2,413 digital titles. Although many of these are likely available on other platforms, Video Games Chronicle estimates that over 1,000 of those games were exclusive to those platforms’ digital stores and are no longer available in any form, including first-party Nintendo titles like Dr. Luigi, Dillon’s Rolling Western, Mario & Donkey Kong: Minis on the Move, and Pokémon Rumble U. The closures also affected around 500 historical games reissued by Nintendo through their Virtual Console storefronts, over 300 of which are believed not available on any other platform or service.
Below the fold I discuss recent developments in this area.

Salvador writes:
Games released during the digital game distribution era may have content or features tied to online services, which may be (and regularly are) deactivated. According to researcher James Newman, this is sometimes employed by game publishers as a deliberate strategy to devalue used games, shorten their distribution window, and encourage sales of new titles, which has ominous implications for preservation.
This post started with Timothy Geigner's One YouTuber’s Quest For Political Action To Preserve Old Video Games. He lays out the problem:
it’s probably long past time that there be some sort of political action to address the real or potential disappearance of cultural output that is occurring. The way this works far too often is that a publisher releases a game that is either an entirely online game, or an offline game that requires backend server calls or connections to make it work. People buy those games. Then, some time down the road, the publisher decides supporting the game is no longer profitable and shuts the servers down on its end, disappearing the purchased game either completely, or else limiting what was previously available. Those that bought or subscribed to the game are left with no options.
The trigger for renewed attention to this problem was Ubisoft's April 1st(!) shutdown of The Crew:
With The Crew, millions of copies of the game were played around the world. When Ubisoft delisted the game late last year, the game became unplayable. On top of that, because of copyright law, it would be illegal for fans to keep the game alive themselves by running their own servers, even assuming they had the source code necessary to do so. So fans of the game who still want to play it are stuck.
Kenneth Shephard reported in A Ubisoft Game Is At The Center Of A Fight To Stop Online Game Shutdowns that this triggered an effort to respond:
Ross Scott, who runs Accursed Farms, posted a 31-minute video on the channel, which outlines the problem and how he believes drawing attention to The Crew’s April 1 shutdown could cause governments to enact greater consumer protections for people who purchase online games. As laid out in the video, consumer rights for these situations vary in different countries. France, however, has some pretty robust consumer laws, and Ubisoft is based there.
Scott surveyed countries around the world looking for the prospects of two possible actions:
  • Lobbying the country's consumer protection agency to take action on the grounds that the company purported to sell the game but failed to make it clear that the game could be rendered useless at any time without warning or compensation.
  • Petitioning the government to take legal or legislative action against this practice.
These actions would be aimed at ensuring that, after a vendor ended support for a game it had sold:
  • Games sold must be left in a functional state.
  • Games sold must require no further connection to the publisher or affiliated parties to function.
  • The above also applies to games that have sold microtransactions to customers.
  • The above cannot be superseded by end user license agreements.
These appear to match Scott's goal of a minimal set of consumer rights that is hard to argue against. If the game publishers can't live with them they can always explicitly rent the game instead of selling it. Renting makes it clear that the purchaser's rights are time-limited.

More details of the campaign can be found at Scott's Stop Killing Games website. Scott has so far posted three update videos to his Accursed Farms YouTube channel.
The idea that purchasers are entitled to be told at least the minimum lifetime of the game is interesting. Ubisoft's excuse for delisting The Crew was that their licenses to music and other content featured in the game had expired. But that implies that, at launch, they knew that they would delist the game at the date when their licenses were due to expire. To make an informed purchase decision, customers needed to know that date. Not providing it on the box was clearly deceptive.

Publishers would hate a requirement to put a "playable until" date on the box; it would clearly reduce purchases. They might find Scott's requirements less onerous.

Advancing IDEAs: Inclusion, Diversity, Equity, Accessibility, 10 June 2024 / HangingTogether

The following post is one in a regular series on issues of Inclusion, Diversity, Equity, and Accessibility, compiled by a team of OCLC contributors.

Knowledge equity, and the role of ontologies 

A purple petunia growing between the cracks of a sidewalk. Photo by Ted Balmer on Unsplash

Wikimedia Deutschland (WMDE) shared key findings from their Knowledge Equity in Linked Open Data project. They discovered that while Wikidata has immense potential for sharing knowledge, it still carries over structural and historical inequities from Wikipedia. The project involved community members working with marginalized knowledge, who faced challenges fitting their knowledge into Wikidata’s Western, academic perspectives. As a result, these communities have started building their own knowledge graphs, finding a sense of freedom and safety in expressing knowledge that reflects their needs. However, the report highlights high barriers to developing the necessary expertise due to scattered documentation and limited technical support. Additionally, the lack of mobile-friendly interfaces further hinders access for marginalized communities who heavily rely on mobile internet. 

Last month we wrote about OCLC Research’s engagement with the community around the WorldCat ontology. Findings in the WMDE report align well with what we learned, which is that library-based ontologies can exclude other worldviews. In Wikidata, this has led communities to create focused ontologies that represent marginalized knowledge in ways that reflect community epistemologies. It would be useful for those of us working to reimagine descriptive workflows to consider the barriers identified in this report. Contributed by Richard J. Urban. 

Houston’s LGBTQ history in radio archives 

A piece from NPR’s Morning Edition, Saving Houston’s LGBTQ history through thousands of hours of radio archives, highlights the important role of audio-visual collections in documenting culture and history. As is so often the case, a group of dedicated community members kept and safeguarded the fragile cassette recordings for over thirty years, until they were painstakingly digitized by University of Houston archivists Emily Vinson and Bethany Scott.  

As we kick off Pride month in the US, this story helps to illuminate the important role of radio, which is not only a vital medium for communities but also plays an important role in reflecting history and experiences. The piece also gives some insight into how difficult it is to migrate delicate audiovisual formats to digital before it is too late. (As a sidenote, Vinson also presented on her work with A/V backlogs in a 2020 Works in Progress Webinar: Approaches to Processing Audiovisual Archives for Improved Access and Preservation Planning.) Contributed by Merrilee Proffitt

Patterns in library censorship 

In recent weeks, former librarian Kelly Jensen of Book Riot, someone who always tracks the pulse of censorship in the United States, has written a series of pieces doing exactly that.  Each one deserves to be read and absorbed.  In “Are Librarians Criminals?  These Bills Would Make Them So: Book Censorship News, May 3, 2024,” Jensen looks at some of the anti-library — and anti-librarian — legislation under consideration or enacted in eighteen states. “Here’s Where Library Workers are Prohibited From Their Own Professional Organization: Book Censorship News, May 24, 2024,” highlights the bills that seek to keep library workers from becoming part of the American Library Association.  Because ALA is the organization that accredits library and information studies programs across the U.S., Canada, and Puerto Rico, among many other things, these anti-ALA efforts threaten to deprofessionalize library work.  Lest you fear that it’s all bad news, Jensen also tells “How Alabama Library Supporters Took Action and You Can, Too: Book Censorship News, June 7, 2024,” and the story that “Colorado Passes Anti-Book Ban Bill for Public Libraries.” 

The insightful and vital work of Kelly Jensen has been noted in “Advancing IDEAs” on 16 April 2024 (“Book censorship in academic, public, and school libraries”) and on 7 March 2023 (“During comic book challenges”). Beyond her invaluable “Book Censorship News” series, she also shares happier themes, especially regarding Young Adult literature, gifts for the bookish, leisure reading suggestions, and other stuff for those who love books and libraries. Contributed by Jay Weitz. 

FAIR + CARE survey: establishing current data practices  

The FAIR + CARE Cultural Heritage Network is a new project that aims to develop, disseminate, and promote ethical good practice guidance and digital data governance models integrating FAIR (Findable, Accessible, Interoperable, and Reusable) Data with CARE (Collective benefit, Authority to control, Responsibility, and Ethics) Principles for Indigenous Data Governance. The FAIR+CARE network aims to reconcile the principles of both standards for future incorporation into data governance models that are both socially and technically compliant and compassionate, with a focus on data related to Indigenous and other descendant communities. In 2021, Hanging Together reported on an event hosted by the OCLC RLP and the National and State Libraries Australia (NSLA) on the CARE Principles.   

A project survey is open until 30 June 2024 and invites respondents to share their practices for collecting, managing, preserving, curating, sharing, and storing cultural information, to gather a landscape view of what the field is currently doing.  

The FAIR+CARE principles are valuable for cultural heritage and cultural resources managers. They are equally valuable for libraries, archives, and museums that hold cultural heritage objects and information about them, as we navigate a world in which expectations around data reuse and social and cultural responsibility are growing, and sometimes conflict. Cultural information professionals sorely need better guidance to be respectful and transparent in their practice now and into the future. Contributed by Lesley A. Langa 

The post Advancing IDEAs: Inclusion, Diversity, Equity, Accessibility, 10 June 2024 appeared first on Hanging Together.

#ODDStories 2024 @ Goma, DRCongo 🇨🇩 / Open Knowledge Foundation

On 5 March 2024, in the Lake Tanganyika coastal city of Baraka in the eastern Democratic Republic of the Congo, Disaster Risk Management in Africa (DRM Africa), an initiative that has been strengthening community resilience to natural and anthropogenic hazards in the African Great Lakes region since 2012, held an Open Data Day event entitled “Open Data for Risk-informed Societies” with financial support from the Open Knowledge Foundation, in response to the ongoing rise of Lake Tanganyika’s waters.

Since 2017, adverse effects of climate change have dramatically increased in Lake Tanganyika’s coastal areas, disrupting the social and economic fabric of all riparian countries in the African Great Lakes region (Tanzania, Burundi, Zambia and the Democratic Republic of the Congo).

While the disastrous rapid rise of Lake Tanganyika in 2021 affected over 50,000 basic infrastructures in the Congolese coastal city of Baraka and left up to 50,000 people homeless on its shorelines, the current water rise has already surpassed last year’s level, which itself affected thousands of additional basic infrastructures across coastal cities, according to reports from the Congolese local disaster management agency.

The event’s overall goal was to strengthen the community’s resilience to the adverse impact of Lake Tanganyika’s rapid rise by harnessing the power of open data to increase the community’s level of preparedness and ensure a sustainable and resilient future for coastal communities. By leveraging collected climate data and available public data, the project aimed at raising awareness, facilitating informed decision-making, and implementing practical solutions to protect vulnerable coastal communities in the riparian countries, especially the Congolese coastal city of Baraka and its neighborhoods.

Milestones

In order to reach the proposed project’s goal, the following activities were carried out:

  • Identification of already flooded and flood-prone areas in the city of Baraka: the activity aimed at collecting data in the field to identify and map areas threatened by coastal floods and those already flooded. We collected data such as basic infrastructures already flooded in flood-prone areas in counties of Mwemezi, AEBAZ, Matata and Moma in the city of Baraka.
  • Flood Risk Management capacity building: the activity aimed at using data collected in the field to build the capacity of vulnerable communities in flood risk management. Up to 50 participants, including local authorities and representatives of flood-prone quarters, participated in the risk management capacity-building workshop. The activity was facilitated and moderated by me, Kashindi Pierre, Founding CEO of DRM Africa.

Outcomes

After the project’s completion, the following outcomes were achieved:

  • Increased knowledge in flood risk management for up to 50 participants, including vulnerable communities exposed to river and coastal floods from the city of Baraka and neighborhoods.
  • Up to 5,000 basic infrastructures were identified and mapped in the city of Baraka and neighborhoods.
  • Active involvement of local communities in addressing the adverse effects of Lake Tanganyika’s rapid rise.

About Open Data Day

Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities.

As a way to increase the representation of different cultures, since 2023 we offer the opportunity for organisations to host an Open Data Day event on the best date within a one-week period. In 2024, a total of 287 events happened all over the world between March 2nd-8th, in 60+ countries using 15 different languages.

All outputs are open for everyone to use and re-use.

In 2024, Open Data Day was also a part of the HOT OpenSummit ’23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement.

For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.

Curves on a Coordinate Axis / Ed Summers

The narrator, in Tokarczuk’s Flights, explains their difficulty in studying psychology, which I think is also a good commentary on the difficulty of layering quantitative methods over qualitative ones, and the tyranny of categories more generally:

How was I supposed to analyze others when it was hard enough for me to get through all those tests? Personality diagnostics, surveys, multiple columns on multiple-choice questions all struck me as too hard. I noticed this handicap of mine right away, which is why at university, whenever we were analyzing each other for practice, I would give all of my answers at random, whatever happened to occur to me. I’d wind up with the strangest personality profiles–curves on a coordinate axis. “Do you believe that the best decision is also the decision that is easiest to change?” Do I believe? What kind of decision? Change? When? Easiest how? “When you walk into a room, do you tend to head for the middle or the edges?” What room? And when? Is the room empty, or are there plush red couches in it? What about the windows? What kind of view do they have? The book question: Would I rather read one than go to a party, or does it also depend on what kind of book it is and what kind of party?

What a methodology! It is tacitly assumed that people don’t know themselves, but that if you furnish them with questions that are smart enough, they’ll be able to figure themselves out. They pose themselves a question, and they give themselves an answer. And they’ll inadvertently reveal to themselves that secret they knew nothing of till now.

And there is that other assumption, which is terribly dangerous–that we are constant, and that our reactions can be predicted. (Tokarczuk, 2019, pp. 14–15)

It reminds me of a poem by another Polish Nobel Prize winner, Wisława Szymborska’s A Word on Statistics. We can unlock new understanding with words, but we need to enter into them first.

References

Tokarczuk, O. (2019). Flights. (J. Croft, Trans.) (First Riverhead trade paperback edition). New York: Riverhead Books.

Graduate hourly-paid job: chemistry expert for a computer information system design project (summer 2024) / Jodi Schneider

Prof. Jodi Schneider’s Information Quality Lab <https://infoqualitylab.org> seeks a paid graduate hourly researcher ($25/hour) to be a chemistry expert for a computer information system design project. Your work will help us understand a computational chemistry protocol by Willoughby, Jansma, and Hoye (2014 Nature Protocols), and the papers citing this protocol. A code glitch impacted part of the Python script for the protocol; our computer information system aims to determine which citing papers might have been impacted by the code glitch, based on reading the papers.

The project can start as soon as possible and needs to be completed in July or early August 2024. We expect your work to take 15 to 20 hours, paid at $25/hour for University of Illinois Urbana-Champaign graduate students. 

Tasks

  • Read and understand a computational chemistry protocol (Willoughby et al. 2014)
  • Read Bhandari Neupane et al. (2019) to understand the nature of the code glitch
  • Make decisions about whether the main findings are at risk for citing publications. You’ll read sentences around citations to ~80 citing publications.
  • Work with an information scientist to design a decision tree to capture the decision-making process.

Required Qualifications

  • Enrolled in a graduate program (Master’s or PhD) in chemistry at University of Illinois Urbana-Champaign and/or background in chemistry sufficient to understand Willoughby et al. (2014) and Bhandari Neupane et al. (2019)
  • Good verbal and written communication skills
  • Interest and/or experience in collaboration

Preferred Qualifications

  • Experience in computational chemistry (quantum chemistry or molecular dynamics) preferred
  • Interest in informatics or computer systems preferred

How to apply

Please email your CV and a few sentences about your interest in the project to Prof. Jodi Schneider (jodi@illinois.edu). Application review will start June 10, 2024 and continue until the position is filled.

Sample citation sentence for Willoughby et al. 2014

“Perhaps one of the most well-known and almost mandatory “to-read” papers for those initial practitioners of the discipline is a 2014 Nature Protocols report by Willoughby, Jansma, and Hoye (WJH).10 In this magnificent piece of work, a detailed 26-step protocol was described, showing how to make the overall NMR calculation procedure up to the final decision on the structure elucidation.”

from: Marcarino, M. O., Zanardi, M. M., & Sarotti, A. M. (2020). The risks of automation: A study on DFT energy miscalculations and its consequences in NMR-based structural elucidation. Organic Letters, 22(9), 3561–3565. https://doi.org/10.1021/acs.orglett.0c01001

Bibliography

Bhandari Neupane, J., Neupane, R. P., Luo, Y., Yoshida, W. Y., Sun, R., & Williams, P. G. (2019). Characterization of Leptazolines A–D, polar oxazolines from the cyanobacterium Leptolyngbya sp., reveals a glitch with the “Willoughby–Hoye” scripts for calculating NMR chemical shifts. Organic Letters, 21(20), 8449–8453. https://doi.org/10.1021/acs.orglett.9b03216

Willoughby, P. H., Jansma, M. J., & Hoye, T. R. (2014). A guide to small-molecule structure assignment through computation of (1H and 13C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042

Distant Reader Catalog / Distant Reader Blog

Abstract

About a year ago I implemented a traditional library catalog against content of the Distant Reader. I used Koha to do this work, and the process was almost trivial. Moreover, the implementation suits all of my needs. Kudos to the Koha community!

Introduction

About a year ago I got an automated email message from OCLC, and to paraphrase, it said, "Your collection has been successfully updated and added to WorldCat." I asked myself, "What collection?" After a bit of digging around, I discovered a few OAI-PMH data repositories I had submitted to OCLC many years ago, and these repositories contained the content being updated. Through the process of this discovery, I learned I have an OCLC symbol, ININI, and after wrestling with authentication procedures, I was able to edit my profile. Fun!

I then got to thinking, "I am able to programmatically create and edit MARC records. I am able to bring up an online catalog. Koha supports OAI-PMH. I could create MARC records describing the content of the Distant Reader, import them into Koha, and ultimately have them become a part of WorldCat. Hmmm..." So I did.

Implementation

My first step was to create a virtual computer running Ubuntu, because Ubuntu is the preferred flavor of Linux supported by Koha. I spun up a virtual computer at Digital Ocean. It has 2 cores, 4 GB of RAM, and 60 GB of disk space. Tiny, by my standards. This generates an ongoing personal monthly expense of something like $25.

The next step was to install Koha. This took practice; I had to destroy my virtual machine a few times, and I had to re-install Koha a few times, but all-in-all the process worked as advertised. Again, it was not difficult, it just took practice. I was able to get Koha installed in less than a few days. I could probably do it now in less than eight hours.

The third step was to add records to the catalog. This required me to first use Koha's administrative interface to create authorized terms for both local collections and data types. I then wrote a set of scripts to create MARC records from my cache of content. These scripts were written against curated databases describing: 1) etexts from Project Gutenberg, 2) PDF files from DOAJ journals, 3) articles on the topic of COVID from a data set called CORD-19, and 4) TEI files from a project called Early Print. In each case, I looped through the given database, read the desired metadata, and output MARC records amenable to Koha. At the end of this process, I had created about 0.3 million records. The small sample of linked records exemplifies the data I created. Simple. Rudimentary. Functional.
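The scripts themselves aren't reproduced here, but the pattern is simple enough to sketch. Below is a minimal illustration in Python using the pymarc library (version 5 syntax); the SQLite database, its table, and its columns are hypothetical stand-ins for the curated databases described above.

```python
# A minimal sketch: loop through a curated database, read the desired
# metadata, and output MARC records amenable to Koha. The "etexts.db"
# database and its columns are hypothetical.
import sqlite3
from pymarc import Record, Field, Subfield

connection = sqlite3.connect("etexts.db")
rows = connection.execute("SELECT author, title, year, url FROM etexts")

with open("records.mrc", "wb") as handle:
    for author, title, year, url in rows:
        record = Record()
        record.add_field(
            Field(tag="100", indicators=["1", " "],
                  subfields=[Subfield(code="a", value=author)]),
            Field(tag="245", indicators=["1", "0"],
                  subfields=[Subfield(code="a", value=title)]),
            Field(tag="264", indicators=[" ", "1"],
                  subfields=[Subfield(code="c", value=str(year))]),
            Field(tag="856", indicators=["4", "0"],
                  subfields=[Subfield(code="u", value=url)]),
        )
        handle.write(record.as_marc())  # append binary MARC to the batch
```

A batch file like records.mrc is exactly the sort of thing the bulkmarcimport.pl front-ends described below would ingest.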

To actually load the records I wrote two tiny shell scripts -- both front-ends to Koha's bulkmarcimport.pl routine. The first front-end simply deletes records. Given a set of MARC records, the second front-end imports them. This importing process is very efficient. Read a record. Parse it. Add the parsed data to the database. After a configured number of records have been added, add them to the index. Repeat for all records. Somebody really knew what they were doing when they wrote bulkmarcimport.pl.

Usage

Now that records have been loaded and indexed, the catalog can be queried. For the most part, I use the advanced search interface for this purpose because I'm usually interested in searching within my set of collections. Search results are easily limited by facets. Detailed results point to the original/canonical items as well as the local cache. See the screenshots below:

Screenshots: the advanced search interface, a results page, and a details page.

What's even better is Koha's support for OAI-PMH. Just use Koha's administrative interface to turn OAI-PMH on, and the entire collection becomes available. My catalog's OAI-PMH data root is located at http://catalog.distantreader.org/cgi-bin/koha/oai.pl. Returning to OCLC, I updated my collection of repositories to include a pointer to the catalog's OAI-PMH root URL, and by the time you read this I believe I will have added my 0.3 million records to WorldCat.
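Any OAI-PMH harvester can then walk that endpoint. As a minimal sketch, here is how one might pull the records with the Python Sickle library; the oai_dc prefix is the spec-mandated default, and the record handling is purely illustrative.

```python
# A minimal sketch of harvesting the catalog over OAI-PMH with Sickle.
# ListRecords transparently follows resumption tokens, so this loop will
# eventually visit the whole collection.
from sickle import Sickle

harvester = Sickle("http://catalog.distantreader.org/cgi-bin/koha/oai.pl")
for record in harvester.ListRecords(metadataPrefix="oai_dc"):
    # record.header carries the OAI identifier and datestamp;
    # record.raw carries the record's XML as a string
    print(record.header.identifier)
```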

Summary

The process of creating a traditional library catalog of Distant Reader content was easy: 1) spin up a virtual machine, 2) install Koha, 3) create/edit MARC records, 4) add them to Koha, 5) go to step #3. The process is never done. Finally, you can use the catalog at http://catalog.distantreader.org. It is not fast, but it is functional, very. Again, "Kudos to the Koha community!"

Student Note: ChatGPT Ate My Homework. Can LLMs Generate Compelling Case Briefs? / Harvard Library Innovation Lab

The Library Innovation Lab welcomes a range of research assistants and fellows to our team to conduct independently-driven research which intersects in some way with our core work.
The following is a reflection written by Chris Shen, a Research Assistant who collaborated with members of LIL in the spring semester of 2024. Chris is a sophomore at Harvard College studying Statistics and Government.

From poetry to Python, LLMs have the potential to drastically influence human productivity. Could AI also revolutionize legal education and streamline case understanding?

I think so.

A New Frontier

The advent of Large Language Models (LLMs), spearheaded by the release of OpenAI’s ChatGPT in late 2022, has prompted universities to adapt in order to responsibly harness their potential. Harvard instituted guidelines requiring professors to include a “GPT policy” in their syllabi.

As students, we read a ton. A quick look at the course catalog published by Harvard Law School (HLS) reveals that many classes require readings of up to 200 pages per week. This sometimes prompts students to turn to summarization tools as a way to help quickly digest content and expedite that process.

LLMs show promising summarization capabilities, and are increasingly used in that context.

Yet, while these models have shown general flexibility with handling various inputs, “hallucination” issues continue to arise, in which outputs generate or reference information that doesn’t exist. Researchers also debate the accuracy of LLMs as context windows continue to grow, highlighting potential mishaps in identifying and retaining important information in increasingly long prompts.

When it comes to legal writing, which is often extensive and detail-oriented, how do we go about understanding a legal case? How do we avoid hallucination and accuracy issues? What are the most important aspects to consider?

Most importantly, how can LLMs play a role in simplifying the process for students?


Initial Inspirations

In high school, I had the opportunity to intern at the law firm Hilton Parker LLC, where I drafted declarations, briefs, demand letters, and more. Cases ranged from personal injury, discrimination, wills and affairs, medical complications, and more. I sat in on depositions, met with clients, and saw the law first-hand, something few high schoolers experience.

Yet, no matter the case I got, one thing remained the same –– the ability to write well in a style I had never been exposed to before. But, before one can write, one must first read and understand.

Back when I was an intern, there was no ChatGPT, and I skimmed hundreds of cases by hand.

Therefore, when I found out that the Harvard Library Innovation Lab (LIL) was conducting research into harnessing LLMs to understand and summarize fundamental legal cases, I was deeply intrigued.

During my time at LIL, I have been researching a method to simplify that task, allowing students to streamline their understanding in a new and efficient way. Let’s dive in.


Optimal Outputs

I chose case briefs as the final product over other forms of summarization, like headnotes or legal blurbs, due to the standardized nature of case briefs. Writing case briefs is not explicitly taught to many, if not most, law students, yet it is implicitly expected by law professors to keep up with the pace of courses during 1L.

While these briefs typically are not turned in, they are heavily relied upon during class to answer questions, engage in discussion, and offer analytical reflections. Even so, many students no longer write their own briefs, using cookie-cutter resources behind paywalled services like Quimbee, LexisNexis, and Westlaw, or even student-run repositories such as TooDope.

This experiment dives into creating succinct original case briefs that contain the most important details of each case, and go beyond the scope of so-called “canned briefs”. But what does it take to write one in the first place?

A standard case brief, as described by LexisNexis, typically has 7 dimensions:

  • Facts (name of the case and its parties, what happened factually and procedurally, and the judgment)
  • Procedural History (what events within the court system led to the present case)
  • Issues (what is in dispute)
  • Holding (the applied rule of law)
  • Rationale (reasons for the holding)
  • Disposition (the current status or final outcome of the case)
  • Analysis (influence)

I used OpenAI’s GPT-4 Turbo preview model (gpt-4-0125-preview) to experiment with a two-pronged approach to generate case briefs matching the above criteria. The first prompt was designed both as a vehicle for the full transcript of the court opinion to summarize and as a way of giving the model precise instructions on generating a case brief that reflects the 7 dimensions. The second prompt served as an evaluation prompt, asking the model to evaluate its work and apply corrections as needed. These instructions were based on guidelines from Rutgers Law School and other sources.

When considering legal LLM summarization, another critical element is reproducibility. I don’t want a slight change in prompt vocabulary to alter the resulting output completely. I have observed that, before applying the evaluative prompt, case briefs would be disorganized or often random in the elements the LLM would produce. For example, information related to specific concurring or dissenting judges would be missed, analyses would be shortened, and inconsistent formatting would be prevalent. Sometimes even the most generic “Summarize this case” prompts would produce slightly better briefs!

However, an additional evaluative prompt now standardizes outputs and ensures critical details are captured. Below is a brief illustration of this process along with the prompts used.

Diagram: Two-prompt system for generating case briefs using an LLM.

See: Initial and Evaluative prompts

Finally, after testing various temperature and max_tokens values, I settled on 0.1 and 1500, respectively. I discovered that lower temperatures best suit the professional nature of legal writing, and a 1500-token maximum output window allowed the LLM to produce all necessary elements of a case brief without including additional “fluff”.
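To make the two-pass flow concrete, here is a minimal sketch using the OpenAI Python client with those settings; BRIEF_PROMPT and EVAL_PROMPT are hypothetical placeholders standing in for the actual prompts linked above, which are not reproduced here.

```python
# A minimal sketch of the two-prompt pattern described above, using the
# OpenAI Python client. BRIEF_PROMPT and EVAL_PROMPT are placeholders for
# the actual initial and evaluative prompts linked above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BRIEF_PROMPT = "Write a case brief with these 7 sections: ..."  # placeholder
EVAL_PROMPT = "Evaluate the brief below against the 7 sections and revise it: ..."  # placeholder

def generate_brief(opinion_text: str) -> str:
    # Pass 1: draft a structured case brief from the full court opinion.
    draft = client.chat.completions.create(
        model="gpt-4-0125-preview",
        temperature=0.1,
        max_tokens=1500,
        messages=[{"role": "user", "content": f"{BRIEF_PROMPT}\n\n{opinion_text}"}],
    ).choices[0].message.content

    # Pass 2: ask the model to evaluate its own draft and apply corrections.
    revised = client.chat.completions.create(
        model="gpt-4-0125-preview",
        temperature=0.1,
        max_tokens=1500,
        messages=[{"role": "user", "content": f"{EVAL_PROMPT}\n\n{draft}"}],
    ).choices[0].message.content
    return revised
```

The second call is what standardizes the output: the model grades its own draft against the seven dimensions before the brief is returned.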

Old vs. New

To test this apparatus, I picked five fundamental constitutional law cases from the SCOTUS that most 1L students are expected to analyze and understand. These include Marbury v. Madison (1803), Dred Scott v. Sandford (1857), Plessy v. Ferguson (1896), Brown v. Board of Education (1954), and Miranda v. Arizona (1966).

Results of each case brief are below.

Of course, I also tested the model on cases no LLM had ever seen before. This would ensure that our approach could still produce quality briefs past the knowledge cut-off for our model, which was December 2023 in this case. These include Trump v. Anderson (2024) and Lindke v. Freed (2024).

Results of each case brief are below, with parameters temperature = 0.1 and max_tokens = 1500.

Applying a critical eye to the case briefs, I see successful adherence to structure and consistent output of case details. There is also a clearly succinct tone that allows students to grasp core rulings and their significance without getting overrun with excessive details. This is particularly useful for discussion review and exam preparation. Further, I find the contextual knowledge presented, such as in Dred Scott v. Sandford, allows students to understand cases beyond mere fact and holding, extending to broader implications.

However, I also see limitations in the outputs. For starters, there is a lack of in-depth analysis, particularly for the concurring or dissenting opinions. Information on precedents used is skimmed over and there is a scarcity of substantive arguments presented. In the example of Marbury v. Madison, jurisdictional insights are also left out, which are vital for understanding the procedural and strategic decisions made in the case. Particularly for cases unknown to the model, there is evidence of speculative language that can occur due to incomplete information, prompt ambiguity, or other biases.

So, what’s next?

Moving forward, I’m excited to submit sample case briefs to law students and professors to receive comments and recommendations. Further, I plan to compare our briefs against “canned” ones from resources like Quimbee and gather external feedback on what makes them better or worse, where our advantage lies, and ultimately equip law students in effective and equitable ways.

Based on initial observations, I also see potential for users to interact with the model in more depth. Thought-provoking questions such as “How has this precedent changed over time?”, “What other cases are relevant to this one?”, “Would the resulting decision change in today’s climate?”, and more, will hopefully allow students to dive deeper into cases instead of just scratching the surface.

While I may still be early in the process, I firmly believe a future version of this tool could become a streamlined method of understanding cases, old and new.

I’d like to extend a special thank you to Matteo Cargnelutti, Sankalp Bhatnagar, George He, and the rest of the Harvard LIL for their contributions, support, and continued feedback throughout this journey.

#ODDStories 2024 @ Bouaké, Côte d’Ivoire 🇨🇮 / Open Knowledge Foundation

As part of Open Data Day 2024, the YouthMappersUAO chapter (Côte d’Ivoire), in collaboration with the Open Knowledge Foundation (OKFN), organized a water point mapping activity in the city of Bouaké called “Water Point Mapping Day”. The event aimed to train and build the capacity of participants to collect data using Open Source tools, and to map water points in the city of Bouaké. The activity took place over two (2) days, Friday 08 and Saturday 09 March 2024, in Bouaké. On the first day, we were in the American Corner room of the University Alassane Ouattara de Bouaké (UAO), and on the second day, we were in the “cité forestière” in the town of Bouaké. The twenty (20) participants were reminded of the context of the day: Open Data Day (ODD), the Open Knowledge Foundation, the prerogatives of the YouthMappers community in general, but also the vision of the YouthMappersUAO chapter. They were amazed by the ideology of Open Data, especially for our developing countries, but especially for students enrolled in operational research frameworks in various university courses.

Presentation of the YouthMappersUAO Chapter and Open Data Day by Victorien Kouadio, Interim President of the YouthMappersUAO Chapter

After the introductions, the day’s data collection tools were presented. Mardoché Azongnibo presented the two (2) applications to be used to carry out the activity. These were #Osmtracker and #KoboCollect. A practical session was organized for the two (2) applications. We were able to observe the applications’ settings, including GPS accuracy for good-quality coordinates. After this session, we went out into the field to collect data in groups of four (4) teams, with one team of 5 members.

Final instructions from Dr. Mardoché Azongnibo and departure for collection in the various zones

The day continued with the essential part of the activity, which focused on data collection. In practice, the town of Bouaké, with a surface area of 177,000 hectares, has been divided into four (4) sub-areas. Bouaké South-East, Bouaké South-West, Bouaké North-East and Bouaké North-West. Given the size of the city, we collected data in two (2) parts of the city: Bouaké South-East and Bouaké South-West.

The applications examined were put to good use when collecting data in the field. Participants were able to collect different types of water points throughout the day in the respective areas. From wells to streams, all were collected to update the map of water points in the city of Bouaké. A form edited in Kobotoolbox was used to categorize the different types of water points during the collection with KoboCollect.

A total of 163 water points were collected in Bouaké south. Each participant played an essential role in this collaborative activity.

Breakdown of water points collected and updated during the WaterPointMapping Day organized as part of Open Data Day

Most of the 163 water points collected were built by the community. 88 water points were built by the community, 27 by religious leaders, and 19 by religious leaders. The rest were built by the government and natural water points. Thanks to the commitment and willingness of all concerned, a significant amount of data has been collected, enabling a more accurate and comprehensive map of the water points in the southern part of the city to be created.

The balanced involvement of men and women demonstrates the importance of diversity and inclusion in such community initiatives.

Spatial distribution of water points by category

In conclusion, the event was a great success and provided valuable information in the field of water points. This accurate information will be used to create a participatory map of farmers’ access to water during periods of drought in Bouaké, to improve both the quantity and quality of the OpenStreetMap database. The event was greatly appreciated by the participants, who shared their experiences and knowledge, enabling the public to get involved and learn.



The library beyond the library / HangingTogether

This post was co-authored with Rebecca Bryant and Richard Urban.

Image of a yellow traffic sign indicating merging from two directions into one. “Center lanes merge” from Wikimedia Commons

Research libraries have changed radically over the past thirty years. The library of the past was primarily focused on managing an “outside-in” collection of externally purchased materials made available to local users. This was a well-understood role for the research library, and one that was recognized and valued by the library’s stakeholders, including university administrators, other campus units, and faculty and students. In carrying out this collections-focused mission, the library functioned more or less autonomously on campus as the primary provider of collections-related services. Of course, research libraries did act in collaboration with other libraries in supporting certain aspects of collection management, particularly resource sharing and cooperative cataloging. 

Today, libraries still manage important local collections for use chiefly by local users, but with less insularity and more connection to the network: think of shared print programs, collective collections, and the “inside-out” collection (e.g., digitized special collections, electronic theses and dissertations (ETDs), and research datasets). At the same time, the library has become increasingly engaged in the university research enterprise through an expanding array of research support services, assuming new responsibilities in areas such as institutional repositories, research data management (RDM), institutional reputation management through researcher profiles and research information management, and bibliometrics and research impact services. Activities in these areas are often closely aligned with and directly advance institutional needs and priorities, such as decision support.  

OCLC Research has documented these shifts through its research on collective collections, the evolving scholarly record, research support services, and more. This work has led us to two observations: 

  1. Libraries are increasingly engaged in partnerships with other units across campus in order to address new responsibilities in emerging areas of research support. 
  2. For many of these new responsibilities carried out in the context of cross-campus partnerships, the library role, contribution, and value proposition is not clearly defined or recognized by other campus stakeholders. 

In many instances, the partnerships libraries are forming with other units on campus are new, ad hoc, and sometimes experimental, and the roles, responsibilities, administrative organization, and even the partners involved are often in flux and vary from institution to institution. But we also observe examples of more formalized arrangements emerging (more on this below). Looking ahead, we expect that library engagement in these cross-campus partnerships will need to be accompanied by: 

  1. New operational structures that formalize and facilitate library engagement with other campus units to support the university research enterprise. 
  2. Clear articulations of the library value proposition as it is manifested within the context of these new operational structures. 

The emergence of these new operational structures and value propositions are the foundation of what we call the Library Beyond the Library. Research libraries are engaging in new operational structures that extend beyond the confines of library hierarchies. Through these new structures, libraries are projecting their skills, expertise, services, and roles beyond the library into the broader campus environment, in partnership with other parts of the institution. As libraries support institutional priorities through these new channels, they will need to find ways to communicate an increasingly complex value proposition to campus stakeholders who may be unfamiliar with the library’s new roles and responsibilities. 

The Library Beyond the Library conceptual model is closely aligned with our previous OCLC research on social interoperability. We define social interoperability as the creation and maintenance of working relationships across individuals and organizational units that promote collaboration, communication, and mutual understanding. In many ways, social interoperability is about strengthening the “people skills” needed to support robust cross-unit partnerships that increasingly involve the library. Our work on this topic highlighted the need for improved social interoperability between the library and other campus units in the context of deploying and sustaining research support services.  

But ad hoc cross-campus partnerships are maturing into new operational structures. In this sense, the Library Beyond the Library is an amplification of social interoperability, moving beyond personal relationships to more formal connections that can outlast the tenure of specific individuals, and moving beyond partnerships built on temporary, project-focused goals to more permanent arrangements that become part of the institution’s operational structure. 

The Library Beyond the Library is not about changes in internal library organizational structures. These have been evolving, too (see, for example, Ithaka S+R’s report on library organizational structures, which provides strong evidence of the expansion of library capacities and positions in research support). But there seems to be less recognition and documentation of evolving operational structures where library services and expertise extend beyond the library and across the campus enterprise, in collaboration with non-library units. Many research libraries will find these structures increasingly germane to carrying out their mission in a landscape of new roles, responsibilities, and institutional priorities.  

Navigating these changes effectively is an important strategic and risk management consideration for libraries: failure to do so may result in diminished resources, impact, and influence, with a value proposition that becomes increasingly opaque to the rest of the institution. In light of this, extending the library beyond the library is something that we not only observe, but also advise as a strategy for ensuring ongoing library visibility and impact.  

Although it is still early days, there are some examples where new operational structures involving the library and other campus units have emerged: 

  • The University of Waterloo Library has invested in a Bibliometrics and Research (BRI) Librarian who not only monitors institutional performance and provides analysis for institutional leaders, but also serves as the leader of a campus-wide community of practice around research analytics. Through this leadership role, the BRI librarian provides consultation and expert guidance to other campus units using research analytics tools, leveraging a new operational structure that engages other parts of the institution and extends library expertise and influence. 
  • Saskia Scheltjens, head of the Research Services department at the Rijksmuseum and chief librarian of the Rijksmuseum Research Library, joined that institution in 2016 to establish a new research services unit and combine several existing departments. The resulting research services unit is built around the research library, where digitized collections, digital scholarship, digital knowledge production and sharing, as well as digital learning and communications, act in unison with a world-famous physical collection. Saskia has described how “the library needed to be more than a library,” and it now sits at the center of a new “fundamental hybrid reality,” where the library extends its services and expertise beyond the traditional library collection.  
  • At the University of Manchester, the library is extending its role and leadership for research support with the establishment of a new Office of Open Research (OOR). This new unit supports institutional strategic goals to create a more open and responsible research environment, and the OOR website provides a single point of contact for researchers to connect with services provided not only by the library but by other units on campus. The library is positioned at the center—and as a leader—of campus open research activities. While Manchester seems to be the first UK institution with this type of Open Research unit, other institutions are moving in a similar direction: for example, Sheffield University has also recently been recruiting for a director to lead a new Office of Open Research and Scholarship. 
  • At Montana State University, a new Research Alliance, composed of both library and non-library research support units, is collocated in the library. This partnership includes non-library units in research development, research cyberinfrastructure, and undergraduate research, in addition to library scholarly communications and data management offerings. Each unit retains its place in the existing campus hierarchy, but the library is operationally positioned as the hub of research support for the institution. 
  • At the University of Illinois Urbana-Champaign, the library manages a research information management system (RIMS) that is financially supported by the Office of the Vice Chancellor for Research. By managing a registry of the institutional scholarly record, the library extends its expertise with bibliographic metadata to manage not just library collections, but also faculty profiles, patents, honors, research facilities and equipment, and more, combining data maintained by other campus stakeholders to create a knowledge graph that can inform enterprise-level strategic directions, make expertise discoverable, and support institutional reputation management.

These examples reflect the two key characteristics of the Library Beyond the Library conceptual model:  

  • The partnerships in which the library engages to provide research support services have been formalized into new operational structures that combine the capacities of library and non-library units. A novel operational configuration was created that transcends traditional administrative boundaries, and reflects the array of units around campus contributing toward provision of the services – including the library.  
  • The new units closely connect library value propositions with institutional priorities. For example, Manchester’s Office of Open Research emphasizes that “[t]he University supports the principles of Open Research and researchers are encouraged to apply these throughout the research lifecycle. While engagement with the principles is voluntary, the University expects researchers to act in accordance with funder mandates.” Similarly, Montana State’s Research Alliance makes clear that it brings together units around campus for the purpose of “working together to support and increase the excellence of the university’s research enterprise.” The library’s contribution to these units is surfaced in light of key institutional priorities. 

The Library Beyond the Library is the focus of a new research project at OCLC. Our goal is to describe and illustrate these key changes in library operational structures and value proposition through models and examples. We will also provide an assessment of future directions for libraries regarding these changes, and where possible, suggest gaps and opportunities for data, tools, and other types of operational infrastructure. 

This work builds upon past research at OCLC related to research support (especially research data management) where we have observed the trends underpinning the Library Beyond the Library. But we believe that the main ideas – cross-campus partnerships formalized into new operational structures, along with new articulations of the library’s value proposition – can be extended to other areas of strategic interest to libraries as well.  

To inform our work, we are convening an invitational discussion as part of the OCLC Research Library Partnership (RLP) Leadership Roundtable on Research Support during the week of 17 June, where RLP affiliates will discuss how their libraries are collaborating with other campus stakeholders to provide research support services. Participants have been asked to consider:   

  • How are the library’s research support services evolving in response to university priorities? 
  • How is your library partnering with other campus stakeholders to achieve institutional and library goals in the research support space? 
  • Have cross-campus partnerships in research support led to, or will they lead to, new operational structures? 

RLP Leadership Roundtables provide an opportunity for partner institutions to share information and benchmark services and goals while providing OCLC Research with information to synthesize and share with the broader library community. Participants must be nominated by the RLP institutional partner representative. The RLP Leadership Roundtable on Research Support first convened in March, to discuss current practices and challenges in the provision of bibliometric and research impact services. This gathering was attended by 51 individuals from 33 RLP member institutions in four countries, and highlights from the discussion were synthesized in a recent post.  

We encourage participation from all RLP partner institutions in this upcoming discussion, which will help us refine and expand the ideas in this post as we continue to explore what they mean for libraries and their futures. As with the previous roundtable, we will synthesize the conversation in a blog post for the broader library community. If you have questions about nominations or participation, please contact Rebecca Bryant. We hope to see you there!  


Running Song of the Day / Eric Hellman

(I'm blogging my journey to the 2024 New York Marathon. You can help me get there.)

Steve Jobs gave me back my music. Thanks Steve!

I got my first iPod a bit more than 20 years ago. It was a 3rd generation iPod, the first version with an all-touch control. I loved that I could play my Bruce, my Courtney, my Heads and my Alanis at an appropriate volume without bothering any of my classical-music-only family. Looking back on it, there was a period of about five years when I didn't regularly listen to music. I had stopped commuting to work by car, and though commuting was no fun, it had kept me in touch with my music. No wonder those 5 years were such a difficult period of my life!

Today, my running and my music are entwined. My latest (and last 😢) iPod already has some retro cred. It's a 6th generation iPod Nano. I listen to my music on 90% of my runs, and 90% of my listening is on my runs. I use shuffle mode so that over the course of a year of running, I'll listen to 2/3 of my ~2500 song library. In 2023, I listened to 1,723 songs. That's a lot of running!

Yes, I keep track. I have a system to maintain a 150-song running playlist: I periodically replace all the songs I've heard in the most recent 2 months (unless I've listened to a song fewer than 5 times - you need at least that many plays to become acquainted with a song!). This is one of the ways I channel certain of my quirkier programmerish tendencies so that I project as a relatively normal person. Or at least I try.

Last November, I decided to do something new (for me). I made a running playlist! Carefully selected to have the right cadence and to inspire the run! It was ordered so that particular songs would play at appropriate points of the Ashenfelter 8K on Thanksgiving morning. It started with "Born to Run" and ended with either "Save it for Later", "Breathless" or "It's The End Of The World As We Know It", depending on my finishing time. It worked OK. I finished with Exene. I had never run with a playlist before.

1. "Born to Run". 2. "American Land". The first part of the race is uphill, so an immigrant song seemed appropriate. 3. "Wake Up" - Arcade Fire. Can't get complacent. 4. "Twist & Crawl - The Beat. The up-tempo pushed me to the fastest part of the race. 5. "Night". Up and over the hill. "you run sad and free until all you can see is the night".  6. "Rock Lobster" - B-52s. The perfect beats per minute.  7. "Shake It Up" - Taylor Swift. A bit of focused anger helps my energy level. 8. "Roulette". Recommended by the Nuts, and yes it was good. Shouting a short lyric helps me run faster. 9. "Workin' on the Highway". The 4th mile of 5 is the hardest, so "all day long I don't stop". 10. "Your Sister Can't Twist" - Elton John. A short nasty hill. 11. "Save it for Later" - The Beat. I could run all day to this, but "sooner or later your legs give way, you hit the ground." 12. "Breathless" - X. If I had hit my goal of 45 minutes, I would have crossed the finish as this started, but I was very happy with 46:12. and a 9:14 pace. 13. "It's The End Of The World As We Know It" - R.E.M. 48 minutes would not have been the end of the world, but I'd feel fine.

Last year, I started to extract a line from the music I had listened to during my run to use as the Strava title for the run. Through September 3, I would choose a line from a Springsteen song (he had to take a health timeout after that). For my New Year's resolution, I promised to credit the song and the artist in my run descriptions as well.

I find now that with many songs, they remind me of the place where I was running when I listened to them. And running in certain places now reminds me of particular songs. I'm training the neural network in my head. I prefer to think of it as creating a web of connections, invisible strings, you might say, that enrich my experience of life. In other words, I'm creating art. And if you follow my Strava, the connections you make to my runs and my songs become part of this little collective art project. Thanks!


Reminder: I'm earning my way into the NYC Marathon by raising money for Amref. 

The Great MEV Heist / David Rosenthal

The Department of Justice indicted two brothers for exploiting mechanisms supporting Ethereum's "Maximal Extractable Value" (MEV). Ashley Belanger's MIT students stole $25M in seconds by exploiting ETH blockchain bug, DOJ says explains:
Anton, 24, and James Peraire-Bueno, 28, were arrested Tuesday, charged with conspiracy to commit wire fraud, wire fraud, and conspiracy to commit money laundering. Each brother faces "a maximum penalty of 20 years in prison for each count," the DOJ said.

The alleged scheme was launched in December 2022 by the brothers, who studied at MIT, after months of planning, the indictment said. The pair seemingly relied on their "specialized skills" and expertise in crypto trading to fraudulently gain access to "pending private transactions" on the blockchain, then "used that access to alter certain transactions and obtain their victims’ cryptocurrency," the DOJ said.
Below the fold I look into the details of the exploit as alleged in the indictment, and what it suggests about the evolution of Ethereum.

Background

Let's start with some history. The key issue with MEV is that the architecture of decentralized cryptocurrencies enables a form of front-running, which Wikipedia defines thus:
Front running, also known as tailgating, is the prohibited practice of entering into an equity (stock) trade, option, futures contract, derivative, or security-based swap to capitalize on advance, nonpublic knowledge of a large ("block") pending transaction that will influence the price of the underlying security. ... A front running firm either buys for its own account before filling customer buy orders that drive up the price, or sells for its own account before filling customer sell orders that drive down the price. Front running is prohibited since the front-runner profits from nonpublic information, at the expense of its own customers, the block trade, or the public market.
Note that the reason it is illegal in these markets is that, at the time the front-runner enters their order, the customer's order is known only to them. It is thus "material non-public information". Arguably, high-frequency traders front-run by placing their computers so close to the market's computers that the information about orders on which they trade has not in practice had time to "become public".

I wrote about front-running in cryptocurrencies, describing how it was different, in 2020's The Order Flow:
In order to be truly decentralized, each miner must choose for itself which transactions to include in the next block. So there has to be a pool of pending transactions visible to all miners, and thus to the public. It is called the mempool. How do miners choose transactions to include? Each transaction in the pool contains a fee, payable to the miner who includes it. Miners are coin-operated, they choose the transactions with the highest fees. The mempool concept is essential to the goal of a decentralized, trustless cryptocurrency.
The pool of pending transactions is public, thus front-running is arguably legal and anyone can do it by offering a larger fee. Ethereum's block time is 12 seconds, plenty of time for bots to find suitable transactions in the mempool. It normally contains a lot of pending transactions. Ethereum is currently processing about 1.12M transactions/day (46.7K/hr) and there are around 166K pending transactions, or about 3.6 hours' worth. Bitcoin is processing about 700K transactions/day and normally has around 100K transactions in the mempool, or about 3.5 hours' worth.
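The fee-priority mechanism is simple enough to sketch. Below is a minimal, hypothetical Python illustration (the names and fee numbers are invented, not taken from any client implementation) of why a front-runner only needs to attach a larger fee to be ordered ahead of the transaction it targets:

    from dataclasses import dataclass

    @dataclass
    class Tx:
        txid: str
        fee: float  # fee offered to whoever produces the block

    # A toy mempool: every pending transaction is visible to all producers.
    mempool = [Tx("victim-trade", 0.002), Tx("ordinary", 0.001)]

    def build_block(pool, max_txs):
        # Greedy fee-priority selection: highest-fee transactions first.
        return sorted(pool, key=lambda tx: tx.fee, reverse=True)[:max_txs]

    # A front-running bot spots "victim-trade" and simply outbids its fee.
    mempool.append(Tx("frontrun-bot", 0.003))
    print([tx.txid for tx in build_block(mempool, 3)])
    # ['frontrun-bot', 'victim-trade', 'ordinary']

The bot's transaction lands first purely because it pays more; no protocol rule is broken.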

Arguably, this is analogous to high-frequency trading, not front-running by brokers. In The Order Flow I recount how the prevalence of high-frequency trading led institutions to set up dark pools:
When conventional “lit” markets became overrun with HFT bots, investment banks offered large investors “dark pools” where they could trade with each other without the risk of being front-run by algos. But Barclays allowed HFT bots into its dark pool, where they happily front-run unsuspecting investors who thought they were safe. Eventually Barclays was caught and forced to drain its dark pool. In 2016, it was fined $70 million for fraud. It was not the only large bank that accepted money from large investors to protect them from HFT bots and money from HFT traders to allow them access to the investors it was supposed to be protecting.
The Order Flow was in large part sparked by two accounts of attempts to avoid being front-run:
  • Ethereum is a Dark Forest by Dan Robinson and Georgios Konstantopoulos:
    In the Ethereum mempool, these apex predators take the form of “arbitrage bots.” Arbitrage bots monitor pending transactions and attempt to exploit profitable opportunities created by them. No white hat knows more about these bots than Phil Daian, the smart contract researcher who, along with his colleagues, wrote the Flash Boys 2.0 paper and coined the term “miner extractable value” (MEV).

    Phil once told me about a cosmic horror that he called a “generalized frontrunner.” Arbitrage bots typically look for specific types of transactions in the mempool (such as a DEX trade or an oracle update) and try to frontrun them according to a predetermined algorithm. Generalized frontrunners look for any transaction that they could profitably frontrun by copying it and replacing addresses with their own.
    Their attempt to rescue about $12K failed because they didn't know a miner and thus couldn't avoid the dark forest in the mempool, where a front-runner bot found it.
  • And Escaping the Dark Forest, Samczsun's account of how:
    On September 15, 2020, a small group of people worked through the night to rescue over 9.6MM USD from a vulnerable smart contract.
    The key point of Samczsun's story is that, after the group spotted the vulnerability and built a transaction to rescue the funds, they could not put the rescue transaction in the mempool because it would have been front-run by a bot. They had to find a miner who would put the transaction in a block without it appearing in the mempool. In other words, their transaction needed a dark pool. And they had to trust the cooperative miner not to front-run it.

    This attempt succeeded because they did know a miner.
Reading both is essential to understand how adversarial the Ethereum environment is.

The 2019 paper that introduced the MEV concept was Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges by Philip Daian and seven co-authors:
In this work, we explain that DEX [decentralized exchanges] design flaws threaten underlying blockchain security. We study a community of arbitrage bots that has arisen to exploit DEX flaws. We show that these bots exhibit many similar market-exploiting behaviors—frontrunning, aggressive latency optimization, etc.—common on Wall Street, as revealed in the popular Michael Lewis exposé Flash Boys. We explore the DEX design flaws that spawned arbitrage bots, measure and model these bots’ behavior, and illuminate systemic smart-contract ecosystem risks implied by our observations.
Daian and co-authors describe five pathologies: Pure revenue opportunities, Priority gas auctions (PGAs), Miner-extractable value (MEV), Fee-based forking attacks, and Time-bandit attacks. Their results contain two surprises:
First, they identify a concrete difference between the consensus-layer security model required for blockchain protocols securing simple payments and those securing smart contracts. In a payment system such as Bitcoin, all independent transactions in a block can be seen as executing atomically, making ordering generally unprofitable to manipulate. Our work shows that analyses of Bitcoin miner economics fail to extend to smart contract systems like Ethereum, and may even require modification once second-layer smart contract systems that depend on Bitcoin miners go live.

Second, our analysis of PGA games underscores that protocol details (such as miner selection criteria, P2P network composition, and more) can directly impact application-layer security and the fairness properties that smart contracts offer users. Smart contract security is often studied purely at the application layer, abstracting away low-level details like miner selection and P2P relayers’ behavior in order to make analysis tractable ... Our work shows that serious blind spots result. Low-level protocol behaviors pose fundamental challenges to developing robust smart contracts that protect users against exploitation by profit-maximizing miners and P2P relayers that may game contracts to subsidize attacks.
Because it promised profits, MEV became the topic of a lot of research. By 2022, in Miners' Extractable Value I was able to review 10 papers about it.

Then came Ethereum's transition to Proof-of-Stake. As usual, Matt Levine provides a lucid explanation of the basics:
How does the blockchain decide which transactions to record, and in what order? In Ethereum, the answer is: with money. People who want to do transactions on the Ethereum network pay fees to execute the transactions; there is a flat base fee, but people can also bid more — a “priority fee” or “tip” — to get their transactions executed quickly. Every 12 seconds, some computer on the Ethereum network is selected to record the transactions in a block. This computer used to be called a “miner,” but in current proof-of-stake Ethereum blocks are recorded by computers called “validators.” Each block is compiled by one validator, selected more or less at random, called a “proposer”; the other validators vote to accept the block. The validators share the transaction fees, with the block proposer getting more than the other validators.

The block proposer will naturally prioritize the transactions that pay more fees, because then it will get more money. And, again, the validators are all computers; they will be programmed to select the transactions that pay them the most money. And in fact there is a division of labor in modern Ethereum, where a computer called a “block builder” puts together a list of transactions that will pay the most money to the validators, and then the block proposer proposes a block with that list so it can get paid.
Levine then gets into the details:
I am giving a simplistic and somewhat old-fashioned description of MEV, and modern Ethereum has a whole, like, institutional structure around it. There are private mempools, where you can hide transactions from bots. There is Flashbots, “a research and development organization formed to mitigate the negative externalities posed by Maximal Extractable Value (MEV) to stateful blockchains, starting with Ethereum,” which has things like MEV-Boost, which creates “a competitive block-building market” where validators can “maximize their staking reward by selling their blockspace to an open market,” and MEV-Share, “an open-source protocol for users, wallets, and applications to internalize the MEV that their transactions create,” letting them “selectively share data about their transactions with searchers who bid to include the transactions in bundles” and get paid.

What Is Alleged?

We have two explanations of what the brothers are alleged to have done, one from the DoJ's indictment and one from Flashbots, whose MEV-Boost software was exploited.

Dept. of Justice

The DoJ's indictment explains MEV-Boost:
  1. “MEV-Boost” is an open-source software designed to optimize the block-building process for Ethereum validators by establishing protocols for how transactions are organized into blocks. Approximately 90% of Ethereum validators use MEV-Boost.
  2. Using MEV-Boost, Ethereum validators outsource the block-building process to a network of “searchers,” “builders,” and “relays.” These participants operate pursuant to privacy and commitment protocols designed to ensure that each network participant—the searcher, the builder, and the validator—interacts in an ordered manner that maximizes value and network efficiency.
  3. A searcher is effectively a trader who scans the public mempool for profitable arbitrage opportunities using automated bots (“MEV Bots”). After identifying a profitable opportunity (that would, for example, increase the price of a given cryptocurrency), the searcher sends the builder a proposed “bundle” of transactions. The bundle typically consists of the following transactions in a precise order: (a) the searcher’s “frontrun” transaction, in which the searcher purchases some amount of cryptocurrency whose value the searcher expects to increase; (b) the pending transaction in the mempool that the MEV Bot identified would increase the price of that cryptocurrency; and (c) the searcher’s sell transaction, in which the searcher sells the cryptocurrency at a higher price than what the searcher initially paid in order to extract a trading profit. A builder receives bundles from various searchers and compiles them into a proposed block that maximizes MEV for the validator. The builder then sends the proposed block to a “relay.” A relay receives the proposed block from the builder and initially only submits the “blockheader” to the validator, which contains information about, among other things, the payment the validator will receive for validating the proposed block as structured by the builder. It is only after the validator makes this commitment through a digital signature that the relay releases the full content of the proposed block (i.e., the complete ordered transaction list) to the validator.
  4. In this process, a relay acts in a manner similar to an escrow account, which temporarily maintains the otherwise private transaction data of the proposed block until the validator commits to publishing the block to the blockchain exactly as ordered. The relay will not release the transactions within the proposed block to the validator until the validator has confirmed through a digital signature that it will publish the proposed block as structured by the builder to the blockchain. Until the transactions within the proposed block are released to the validator, they remain private and are not publicly visible.
Note the importance of the relay maintaining the privacy of the transactions in the proposed block.
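The (a)/(b)/(c) ordering the indictment describes is the classic "sandwich". A toy Python sketch (hypothetical names, no real trading logic) makes the structure concrete:

    from dataclasses import dataclass

    @dataclass
    class Tx:
        sender: str
        action: str

    def sandwich_bundle(victim_tx):
        # The three-transaction bundle described in the indictment:
        return [
            Tx("searcher", "buy TOKEN before the price moves"),  # (a) frontrun
            victim_tx,                                           # (b) victim's pending trade
            Tx("searcher", "sell TOKEN at the higher price"),    # (c) take the profit
        ]

    victim = Tx("victim", "large TOKEN buy seen in the mempool")
    for label, tx in zip("abc", sandwich_bundle(victim)):
        print(f"({label}) {tx.sender}: {tx.action}")

The searcher profits only if all three transactions execute in exactly this order, which is why the builder/relay machinery exists to guarantee ordering.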

The indictment summarizes how the brothers are alleged to have stolen $25M:
  1. ANTON PERAIRE-BUENO and JAMES PERAIRE-BUENO took the following steps, among others, to plan and execute the Exploit: (a) establishing a series of Ethereum validators in a manner that concealed their identities through the use of shell companies, intermediary cryptocurrency addresses, foreign exchanges, and a privacy layer network; (b) deploying a series of test transactions or “bait transactions” designed to identify particular variables most likely to attract MEV Bots that would become the victims of the Exploit (collectively the “Victim Traders”); (c) identifying and exploiting a vulnerability in the MEV-Boost relay code that caused the relay to prematurely release the full content of a proposed block; (d) re-ordering the proposed block to the defendants’ advantage; and (e) publishing the re-ordered block to the Ethereum blockchain, which resulted in the theft of approximately $25 million in cryptocurrency from the Victim Traders.
The indictment adds:
  1. Tampering with these established MEV-Boost protocols, which are relied upon by the vast majority of Ethereum users, threatens the stability and integrity of the Ethereum blockchain for all network participants.
This statement has attracted attention. Why should the DoJ care about "the stability and integrity of the Ethereum blockchain"? Note that the brothers are not charged with this; the indictment has three counts:
  1. Conspiracy to commit wire fraud, Title 18, United States Code, Section 1349.
  2. Wire fraud, Title 18, United States Code, Sections 1343 and 2.
  3. Conspiracy to Commit Money Laundering, Title 18, United States Code, Section 1956(a)(1)(B)(i).
The steps in paragraphs 11-14 of the indictment are charged as wire fraud. The indictment then goes on to detail the steps the brothers are alleged to have taken to launder the loot, leading to the money laundering charge.

Flashbots

Flashbots’ post starts by explaining the role of a relay:
mev-boost works through a commit and reveal scheme where proposers commit to blocks created by builders without seeing their contents, by signing block headers. Only after a block header is signed are the block body and corresponding transactions revealed. A trusted third party called a relay facilitates this process. mev-boost is designed to allow block builders to send blocks that contain valuable MEV to validators without having to trust them. Removing the need for builders to trust validators ensures that every validator has equal access to MEV regardless of their size and is critical for ensuring the validator set of Ethereum remains decentralized.
Notice the traditional cryptocurrency gaslighting about "trustlessness" and "decentralization" in that paragraph:
  • It is true that by introducing a relay they have eliminated the need to trust the validators, but they have done so by introducing "a trusted third party called a relay". The exploit worked because the third party violated its trust. They would likely argue that, unlike the validators, the relay lacks financial incentives to cheat. But a malign relay could presumably also play the role of the malign proposer in the exploit.
  • Saying "the validator set of Ethereum remains decentralized" implies that it is decentralized. It is certainly good that the switch to Proof-of-Stake has increased Ethereum's Nakamoto coefficient from 2-3 to 5-6, as I pointed out last month in "Sufficiently Decentralized":
    A year ago the top 5 staking pools controlled 58.4%, now they control 44.7% of the stakes. But it is still true that block production is heavily centralized, with one producer claiming 57.9% of the rewards.
    But a Nakamoto coefficient of 6 isn't very decentralized. Further, this misses the point revealed by the brothers' exploit. With about 55% of execution clients running Geth and around 90% of validators trusting MEV-Boost's relaying, just to take two examples, the software stack is extremely vulnerable to bugs and supply chain attacks.
Flashbots then explain the bug the brothers exploited:
The attack on April 3rd, 2023 was possible because the exploited relay revealed block bodies to the proposer so long as the proposer correctly signed a block header. However, the relay did not check if the block header that was signed was valid. In the case that the block header was signed but invalid, the relay would attempt to publish the block to the beacon chain, where beacon nodes would reject it. Crucially, regardless of whether the block was rejected by beacon nodes or not, the relay would still reveal the body to the proposer.

Having access to the block body allowed the malicious proposer to extract transactions from the stolen block and use them in their own block where it could exploit those transactions. In particular, the malicious proposer constructed their own block that broke the sandwich bots’ sandwiches up and effectively stole their money.
Then they explain the mitigation:
Usually, proposers publishing a modified block would not only equivocate but their new block would have to race the relay block - which has a head start - to acquire attestations for the fork choice rule. However, in this case, the relay was not able to publish a block because the proposer returned an invalid block header. Therefore, the malicious proposer’s new block was uncontested and they won the race automatically. This has been addressed by requiring the relay to successfully publish a block, thereby not sharing invalid blocks with proposers. The mitigations section covers this and future looking details at more length.
By "equivocate" they mean proposing more than one block in a time slot. Validators responsibilities are:
The validator is expected to maintain sufficient hardware and connectivity to participate in block validation and proposal. In return, the validator is paid in ETH (their staked balance increases). On the other hand, participating as a validator also opens new avenues for users to attack the network for personal gain or sabotage. To prevent this, validators miss out on ETH rewards if they fail to participate when called upon, and their existing stake can be destroyed if they behave dishonestly. Two primary behaviors can be considered dishonest: proposing multiple blocks in a single slot (equivocating) and submitting contradictory attestations.
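Putting the bug and the fix side by side: here is a minimal Python sketch of the relay's reveal logic, with stand-in helpers of my own devising (the real relay verifies BLS signatures and submits blocks to beacon nodes):

    # Simplified stand-ins: a header is a dict recording whether it is
    # correctly signed and whether beacon nodes would accept the block.
    def verify_signature(header):
        return header.get("signed", False)

    def publish_to_beacon_chain(header):
        return header.get("valid", False)  # beacon nodes reject invalid blocks

    def reveal_buggy(header, body):
        # Pre-fix: body revealed whenever the header is signed, even if
        # beacon nodes reject the block -- the hole the brothers exploited.
        if not verify_signature(header):
            return None
        publish_to_beacon_chain(header)  # result ignored!
        return body

    def reveal_fixed(header, body):
        # Post-fix: the relay must publish successfully before revealing.
        if verify_signature(header) and publish_to_beacon_chain(header):
            return body
        return None

    bad_header = {"signed": True, "valid": False}  # signed but invalid
    print(reveal_buggy(bad_header, "block body"))  # the body leaks
    print(reveal_fixed(bad_header, "block body"))  # None

The fix makes revealing the body conditional on successful publication, so a deliberately invalid header no longer leaks the block's contents.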

Discussion

Matt Levine covered this case in Crypto Brothers Front-Ran the Front-Runners by focusing on front-running:
There is a sort of cool purity to this. In stock markets, some people are faster than others, and can make money by trading ahead of a big order, and people get mad about this and think it is unfair and propose solutions. And when money changes hands for speed advantages — “payment for order flow,” “colocation” — people complain about corruption. In crypto it’s like “let’s create an efficient market in trading ahead of big orders.” I once wrote: “Rather than solve this concern about traditional markets, crypto made it explicit.” That feels almost like a general philosophy of crypto: Take the problems of traditional finance and make them worse, sure, but more transparent and visible and explicit and subject to unbridled free markets.
And then casting the brothers' actions as front-running:
Ethereum and its decentralized exchanges have a market structure that is like “bots can look at your transactions and front-run them if that’s profitable.” And these guys, allegedly, front-ran the front-runners; they turned the market structure around so that they could get an early look at the front-running bots’ front-running transactions and front-run them instead. By hacking, sure, sure, it’s bad. But it leaves the Justice Department in the odd position of saying that the integrity of crypto front-running is important and must be defended.
I think Levine is wrong here. Just as with high-frequency trading, "crypto front-running" is legal because it uses public information. The brothers were not indicted for front-running. What is illegal, and what the DoJ is alleging, is trading on "material non-public information", which they obtained by wire fraud (a fraudulent signature). The indictment says:
this False Signature was designed to, and did, trick the Relay to prematurely release the full content of the proposed block to the defendants, including the private transaction information.
The DoJ is not defending the "integrity of crypto front-running", it is prosecuting activity that is illegal in all markets.

The next day Levine made the first of two clarifications:
First, though I described the exploit as “front-running the front-runners,” I do want to be clear that it was not just that. This is not a pure case of (1) submitting spoofy orders to bait front-running bots, (2) having them take the bait and (3) finding some trade to make them lose money. (There are prior examples of that, using oddly structured tokens to make the front-runners lose money.) Here, though, the brothers are accused of something closer to hacking, exploiting a weakness in software code to be able to see (and reorder) a series of transactions that was supposed to be kept confidential from them. That is worse; it’s sort of like the difference between (1) putting in spoof orders on the stock exchange to try to trick a high-frequency trading firm and (2) hacking into the stock exchange’s computer system to reverse the HFT firm’s trades. Even if you think that the front-running bots are bad, you might — as the Justice Department does — object to this approach to punishing them.
Exactly. Levine's second clarification was:
Second, I said that “they exploited a bug in Ethereum” to do this, but that’s not quite right. They exploited a bug in Flashbots’ MEV-Boost, open-source block-building software that “approximately 90% of Ethereum validators use” but that is not part of the core Ethereum system itself. (Here is Flashbots’ explanation.) They exploited a bug in how blocks are normally built and proposed on Ethereum. From the names “Flashbots” and “MEV-Boost,” though, you might get some sense of why the case is controversial. The way that blocks are normally built and proposed on Ethereum involves “maximal extractable value” (MEV), where arbitrage traders bid to pay validators for priority to make the most profitable trades. These brothers hacked that system, but not everyone likes that system, because it involves predatory traders front-running more naive traders.

This is also important because, as one reader commented: “A crucial distinguishing factor here is that James and Anton did not re-order committed transactions; they instead picked an ordering of pending transactions that were favorable to them. Under this lens, the integrity of the blockchain is not compromised; the network explicitly ‘allows’ validators to pick whatever arbitrary ordering of transactions they like; it's just that generally it’s economically favorable for validators to prioritize transactions which pay them the most first.”
Part of Satoshi Nakamoto's genius in designing Bitcoin was that he observed KISS, the important software mantra Keep It Simple, Stupid. The Bitcoin blockchain does only one thing: maintain a ledger of transactions. So the Bitcoin ecosystem has evolved very slowly, and has been remarkably free of vulnerabilities over the last decade and a half. Ethereum, on the other hand, is a Turing-complete environment that does whatever its users want it to. So in less than a decade the Ethereum ecosystem has evolved much faster, accreting complexity and thus vulnerabilities.

Look at Molly White's Web3 is Going Just Great. It is full of exploits of "smart contracts" such as "decentralized exchanges" and "bridges". Try searching for "bitcoin". You only find it in the context of the amounts raided. It is precisely the fecundity of Ethereum's programmability that leads to an ecosystem full of buggy code vulnerable to exploits such as the MEV-Boost one.

Daniel Kuhn's What the DOJ’s First MEV Lawsuit Means for Ethereum also discusses the details of the case:
“They used a flaw in MEV boost to push invalid signatures to preview bundles. That gives an unfair advantage via an exploit,” former employee of the Ethereum Foundation and Flashbots Hudson Jameson told CoinDesk in an interview. Jameson added that the Peraire-Bueno brothers were also running their own validator while extracting MEV, which violates something of a gentleman’s agreement in MEV circles.

“No one else in the MEV ecosystem was doing both of those things at once that we know of,” he added. “They did more than just play by both the codified and pinky promise rules of MEV extraction.”
The "gentleman's agreement" is important, because what the brothers were doing creates a conflict of interest, the kind that the SEC frowns upon.

Kuhn quotes Consensys General Counsel Bill Hughes:
“All of the defendants' preparation for the attack and their completely ham-fisted attempts to cover their tracks afterwards, including extensive incriminating google searches, just helps the government prove they intended to steal. All that evidence will look very bad to a jury. I suspect they plead guilty at some point,”
He also discusses a different reaction in the cryptosphere:
MEV, which itself is controversial, can be a highly lucrative game dominated by automated bots that often comes at blockchain users’ expense, which is partially why so many in the crypto community have rushed to denounce the DOJ’s complaint.
...
Still, others remain convinced that exploiting MEV bots designed to reorder transactions is fair game. “It's a little hard to sympathize with MEV bots and block builders getting f*cked over by block proposers, in the exact same way they are f*cking over end users,” the anonymous researcher said.
Kuhn quotes Hudson Jameson:
Jameson, for his part, said the MEV is something the Ethereum community should work to minimize on Ethereum, but that it’s a difficult problem to solve. For now, the process is “inevitable.”

“Until it can be eliminated, let's study it. Let's illuminate it. Let's minimize it. And since it does exist, let's make it as open as possible for anyone to participate with the same rules,” he said.
Jameson is wrong in suggesting that MEV could be eliminated. It is a consequence of the goal of decentralizing the system. Even the mechanism in place for "anyone to participate with the same rules" requires a trusted third party.

Using a Proposed Library Guide Assessment Standards Rubric and a Peer Review Process to Pedagogically Improve Library Guides: A Case Study / In the Library, With the Lead Pipe

In Brief

Library guides can help librarians provide information to their patrons regarding library resources, services, and tools. Despite their perceived usefulness, there is little discussion of designing library guides pedagogically by following a set of assessment standards for a quality-checked review. Instructional designers regularly use vetted assessment standards and a peer review process for building high-quality courses, yet librarians typically do not when designing library guides. This article explores using a set of standards remixed from SUNY's Online Course Quality Review Rubric (OSCQR) and a peer review process. The authors used a case study approach to test the effectiveness of building library guides with the proposed standards by tasking college students to assess two Fake News guides (one revised to meet the proposed standards). Results indicated most students preferred the revised library guide to the original guide for personal use. The majority valued the revised guide for integration into a learning management system and perceived it to be more beneficial for professors to teach from. Future studies should replicate this study and include additional perspectives from faculty regarding how they perceive the pedagogical values of a library guide designed following the proposed rubric.

Image: “Helpful”. A smiling librarian assists a student sitting at a computer in the library. Digital image created with Midjourney AI. By Trina McCowan, CC BY-NC-SA 4.0

Introduction

Library guides or LibGuides are a proprietary publishing tool for libraries and museums created by the company Springshare; librarians can use LibGuides to publish on a variety of topics centered around research (Dotson, 2021; Springshare, n.d.). For consistency, the authors will use the term library guides moving forward. Librarians can use Springshare's tool to publish web pages to educate users on library subjects, topics, procedures, or processes (Coombs, 2015). Additionally, librarians can work with teaching faculty to create course guides that compile resources for specific classes (Berić-Stojšić & Dubicki, 2016; Clever, 2020). According to Springshare (n.d.), library guides are widely used by academic, museum, school, and public libraries; approximately 130,000 libraries worldwide use the tool. The popularity and continued use of library guides may stem from their ease of use, as they eliminate the need to know a coding language to develop online content (Bergstrom-Lynch, 2019).

Baker (2014) described library guides as the “evolutionary descendants of library pathfinders” (p. 108). The first pathfinders were paper brochures that provided suggested resources for advanced research. Often, librarians created these tools for the advanced practitioner as patrons granted access to the library were researchers and seasoned scholars. As the end users were already experts, there was little need for librarians to provide instruction for using the resources (Emanuel, 2013). Later, programs such as MIT’s 1970s Project Intrex developed pathfinders that presented students with library resources in their fields of interest (Conrad & Stevens, 2019). As technology advanced, librarians created and curated pathfinders for online access (Emanuel, 2013). 

Today, due to the modernization of pathfinders as library guides and their ease of discoverability, students and unaffiliated online users often find these guides without the assistance of a librarian (Emanuel, 2013). Search engines such as Google can extend a library guide’s reach far beyond a single institutional website, drawing the attention of information experts and novice internet users alike (Brewer et al., 2017; Emanuel, 2013; Lauseng et al., 2021). This expanded access means a librarian will not always be present to help interpret and explain the library guide’s learning objectives. Stone et al. (2018) state that library guides should be built using pedagogical principles “where the guide walks the student through the research process” (p. 280). Bergstrom-Lynch (2019) argues that there has been an abundant focus on user-centered library design studies but little focus on learning-centered design. Bergstrom-Lynch (2019) advocates for more attention directed to learning-centered design principles as library guides are integrated into Learning Management Systems (LMS) such as Canvas and Blackboard (Berić-Stojšić & Dubicki, 2016; Bielat et al., 2013; Lauseng et al., 2021) and can be presented as a learning module for the library (Emanuel, 2013; Mann et al., 2013). The use of library guides as online learning and teaching tools is not novel; however, their creation and evaluation using instructional design principles are a recent development (Bergstrom-Lynch, 2019). 

A core component of an instructional designer's job is to ensure that online course development meets the institution's standards for quality assurance (Halupa, 2019). Instructional designers can aid with writing appropriate course and learning objectives and with selecting learning activities and assessments that align back to the module's objectives. They can also provide feedback on designing a course that is student-friendly, being mindful of cognitive overload, course layout, font options, and color selection. Finally, instructional designers are trained in designing learning content that meets accessibility standards (Halupa, 2019).

Instructional design teams and teaching faculty can choose from a variety of quality assurance rubrics to ensure that key elements for online learning are present in the online course environment. Examples include the Quality Matters (QM) Higher Education Rubric and SUNY's Online Course Quality Review Rubric (OSCQR), a professional development course-refresh process with a rubric (Kathuria & Becker, 2021; OSCQR-SUNY, n.d.). QM is a not-for-profit subscription service that provides education on assessing online courses through the organization's assessment rubric of general and specific standards (Unal & Unal, 2016). The assessment process is a "collegial, faculty-driven, research-based peer review process…" (Unal & Unal, 2016, p. 464). For a national review, QM suggests three reviewers certified and trained with QM; the process should involve a content specialist and one external reviewer from outside the university (Pickens & Witte, 2015). Some universities, such as the University of North Florida, submit online courses for a QM certificate with High-Quality recognition or conduct an in-house review based on the standards, earning a High-Quality designation. For an in-house review at UNF, a subject matter expert, an instructional designer, and a trained faculty reviewer assess the course and provide feedback based on the standards (CIRT, "Online Course Design Quality Review", n.d.; Hulen, 2022). Instructional designers at some institutions may use other pedagogical rubrics that are freely available and not proprietary; OSCQR is an openly licensed online course review rubric that allows use and/or adaptation and can serve as a professional development exercise when building and/or refreshing online courses (OSCQR-SUNY, n.d.).

Typically, library guides do not receive the kind of vetted, rigorous pedagogical peer review that online courses do. Because library guides are more accessible and are used as teaching tools, they should be crafted for a diverse audience and be easy for first-time library guide users to understand and navigate (Bergstrom-Lynch, 2019; Smith et al., 2023). However, Conrad & Stevens (2019) state: "Inexperienced content creators can inadvertently develop guides that are difficult to use, lacking consistent templates and containing overwhelming amounts of information" (p. 49). Lee et al. (2021) reviewed library guides about the systematic review process. Although this topic is complex, Lee et al. (2021) noted a lack of instruction about the systematic review process in the guides. If instructional opportunities are missing from the most complex topics, one may need to review all types of library guides with fresh eyes.

Moukhliss describes a set of quality review standards, the Library Guide Assessment Standards (LGAS) rubric with annotations, which she created by remixing the SUNY-OSCQR rubric to fit the nature of library guides. Two trained reviewers are recommended to work with their peer review coordinator to individually engage in the review process and then convene to discuss the results. A standard is marked Met when both reviewers mark it as Met, noting the evidence to support the designation; to be marked Met, the library guide author should show evidence of 85% accuracy or higher on that standard. To pass the quality-checked review and receive a quality-checked badge, the peer review team should find that 85% of the standards are marked Met. If the review fails, the library guide author may continue to edit the guide or publish the guide without the quality-checked badge. Details regarding the peer review process are shared in the Library Guide Assessment Standards for Quality-Checked Review library guide; select the Peer Review Training Materials tab for the training workbook and tutorial.
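For illustration only (this sketch is not part of the published LGAS materials), the pass/fail arithmetic just described can be expressed in a few lines of Python:

    # A standard counts as Met only when both reviewers mark it Met; the
    # guide earns the quality-checked badge when at least 85% of the
    # standards are Met.
    def review_outcome(reviewer_a, reviewer_b, threshold=0.85):
        met = [a and b for a, b in zip(reviewer_a, reviewer_b)]
        share = sum(met) / len(met)
        return share, share >= threshold

    # 32 standards, True = Met; the reviewers agree that 28 are Met.
    a = [True] * 30 + [False] * 2
    b = [True] * 28 + [False] * 4
    share, passed = review_outcome(a, b)
    print(f"{share:.1%} of standards Met; badge awarded: {passed}")
    # 87.5% of standards Met; badge awarded: True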

Situational Context

The University of North Florida (UNF) Thomas G. Carpenter Library serves an R2 university of approximately 16,500 students. The UNF Center for Instruction and Research Technology (CIRT) supports two online learning librarians, whose roles are to provide online instruction services to UNF faculty. CIRT staff advocate for online teaching faculty to submit their online courses to a rigorous quality review process. Faculty can obtain a High-Quality designation for course design by working with an instructional designer and an appointed peer reviewer from UNF, or they may opt to aim for a High-Quality review after three years of course implementation by submitting for national review with Quality Matters (Hulen, 2022). Currently, Moukhliss serves as a peer reviewer for online High-Quality course reviews.

After several High-Quality course reviews, Moukhliss questioned why there are no vetted review standards, applied by trained librarians, for the various types of library guides, as there are for online courses. She therefore borrowed from the SUNY Online Course Quality Review Rubric (OSCQR) and remixed it as the Library Guide Assessment Standards Rubric with annotations.

Literature Review

The amount of peer-reviewed literature on library guide design is shockingly small considering how many library guides have been created. The current research focus has been on usability and user experience studies, although some researchers have begun to focus on instructional design principles. As Bergstrom-Lynch (2019) states, peer-reviewed literature addressing library guide design through the lens of instructional design principles is in its infancy. Researchers have primarily focused on collecting data on usage and usability (Conrad & Stevens, 2019; Oullette, 2011; Quintel, 2016). German (2017), an instructional design librarian, argues that when the library guide is created and maintained through a learner-centered point of view, librarians will see library guides as "e-learning tools" (p. 163). Lee et al. (2021) noted the value of integrating active learning activities into library guides. Stone et al. (2018) conducted a comparison study between two library guides, one as-is and the other re-designed with pedagogical insight. Stone et al. (2018) concluded that "a pedagogical guide design, organizing resources around the information literacy research process and explaining the 'why' and 'how' of the process, leads to better student learning than the pathfinder design" (p. 290). A library guide representative of a pathfinder design lists resources rather than explaining them. Lee and Lowe (2018) conducted a similar study and noted more user interaction with the pedagogically designed guide than with the guide not designed with pedagogical principles. Stone et al. (2018) and Lee and Lowe (2018) thus reached similar findings.

Authors like German (2017) and Lee et al. (2021) have touched upon instructional design topics. Adebonojo (2010), for instance, described aligning the content of a subject library guide to library sources shared in course syllabi, but did not expand to discuss any other instructional design principles. Bergstrom-Lynch (2019) thinks more comprehensively, advocating for the use of the ADDIE instructional design model (an acronym for Analysis, Design, Development, Implementation, and Evaluation) when building library guides. The analysis phase encourages the designer to note problems with current instruction. The design phase entails how the designer will rectify the learning gap from the analysis phase. The development phase entails adding instructional materials, activities, and assessments. The implementation phase involves introducing the materials to learners. The evaluation phase enables the designer to collect feedback and improve content based on suggestions. ADDIE is cyclical and iterative (Bergstrom-Lynch, 2019). Allen (2017) introduces librarians to instructional design theories in the context of building an online asynchronous information literacy course but does not tie these theories to building library guides.

While Bergstrom-Lynch (2019) focused on best practices for library guide design based on ADDIE, German et al. (2017) used service design thinking constructs to build effective instruction guides. The five core principles of service design thinking are "user-centered, co-creative, sequencing, evidencing, and holistic" (German et al., 2017, p. 163). Focusing on the user encourages the designer to think like a student and ask: What do I need to know to successfully master this content? The co-creative stage invites other stakeholders to add their perspectives and/or expertise to the guide. The sequencing component invites the librarian to think through the role of the librarian and library services before, during, and after instruction; German et al. (2017) advocate for information from each stage to be communicated in the library guide. Evidencing involves the librarian reviewing the library guide to ensure that the content aligns with the learning objective (German et al., 2017). Both sets of authors advocate for instructional design methods but fall short of suggesting an assessment rubric for designing and peer-reviewing guides.

Smith et al. (2023) developed a library guide rubric for their library guide redesign project at the Kelvin Smith Library at Case Western Reserve University. This rubric focused heavily on accessibility standards using the Web Accessibility Evaluation Tool or WAVE. Although Smith et al. (2023) discuss a rubric, the rubric was crafted as an evaluation tool for the author of the guide rather than for a peer review process. 

Although Bergstrom-Lynch (2019), German et al. (2017), and Smith et al. (2023) are pioneering best practices for library guides, they take different approaches. For example, Bergstrom-Lynch (2019) presents best practices for cyclical re-evaluation of the guide based on instructional design principles, derived from usability studies. The Smith et al. (2023) rubric emphasizes accessibility standards for ADA compliance, which is essential for course designers but only one component of a more comprehensive rubric. German et al. (2017) emphasize a user-centered design through the design thinking method. Moukhliss intends to add to the literature by suggesting a remix of a vetted tool that course developers use as a professional development exercise with faculty. This OSCQR-SUNY tool encompasses the varying perspectives of Bergstrom-Lynch (2019), Smith et al. (2023), and German et al. (2017).

Strengths & Weaknesses of the Library Guide

As with any tool, library guides have their strengths and weaknesses. On the strengths side, there are indications that library guides can play a positive role in improving students' grades, retention, and overall research skills (Brewer et al., 2017; May & Leighton, 2013; Wakeham et al., 2012). Additionally, library guides are easy to build and update (Baker, 2014; Conrad & Stevens, 2019). They can accommodate RSS feeds, videos, blogs, and chat (Baker, 2014), are accessible to the world, and cover a vast range of library research topics. According to Lauseng et al. (2021), library guides are discoverable through Google searches and can be integrated into online Learning Management Systems (LMS). These factors support the view that library guides hold educational value and should be reconsidered for use as Open Educational Resources (Lauseng et al., 2021).

However, there are no perfect educational tools. Library guide weaknesses include their underutilization, largely due to students not knowing what they are or how to find them (Bagshaw & Yorke-Barber, 2018; Conrad & Stevens, 2019; Ouellette, 2011). Additionally, library guides can be difficult for students to navigate, contain unnecessary content, and overuse library jargon (Sonsteby & DeJonghe, 2013). Conrad & Stevens (2019) described a usability study in which students were disoriented when using library guides and reported that they did not understand their purpose, function, or how to return to the library homepage. Lee et al. (2021) and Baker (2014) suggest that librarians tend to employ the "kitchen sink" (Baker, 2014, p. 110) approach when building library guides, thus overloading the guide with inapplicable content.

Critical Pedagogy and Library Guides

In his publication titled "Pedagogy of the Oppressed," Paulo Freire introduced the theory of critical pedagogy and asserted that most educational models have the effect of reinforcing systems of societal injustice through the assumption that students are empty vessels who need to be filled with knowledge and skills curated by the intellectual elite (Kincheloe, 2012; Downey, 2016). Early in the 21st century, information professionals built upon the belief that "Critical pedagogy is, in essence, a project that positions education as a catalyst for social justice" (Tewell, 2015, p. 26) by developing "critical information literacy" to address what some saw as the Association of College and Research Libraries' technically sound but socially unaware "Information Literacy Competency Standards for Higher Education" (Cuevas-Cerveró et al., 2023). In subsequent years, numerous librarians and educators have written about the role of information literacy in dismantling systems of oppression, citing the need to promote "critical engagement with information sources" while recognizing that knowledge creation is a collaborative process in which everyone engages (Downey, 2016, p. 41).

The majority of scholarly output on library guides focuses on user-centered design rather than specifically advocating for critical pedagogical methods. Yet a few scholars, such as Lechtenberg & Gold (2022), emphasize how the lack of pedagogical training within LIS programs often results in information-centric library guides rather than learner-centric ones. Their presentation at LOEX 2022 reiterates the importance of user-centered design in all steps of guide creation, including deciding whether a library guide is needed at all.

Additionally, the literature demonstrates that library guides are useful tools in delivering critical information literacy instruction and interventions. For instance, Hare and Evanson used a library guide to list open-access sources as part of their Information Privilege Outreach programming for undergraduate students approaching graduation (Hare & Evanson, 2018). Likewise, Buck and Valentino required students in their "OER and Social Justice" course to create a library guide designed to educate faculty about the benefits of open educational resources, partly due to students' familiarity with the design and functionality of similar research guides (Buck & Valentino, 2018). Because library guides have been used to communicate the principles of critical pedagogy, the evaluation of institutional library guides should consider how effectively critical pedagogy is incorporated into their design.

The Library Guide Assessment Standards (LGAS) Rubric 

For the remixed rubric, Moukhliss changed the term "course" in OSCQR's original verbiage to "library guide" and dropped some original standards based on the differences between the expectations for an online course (i.e., rubrics, syllabus, etc.) and a library guide. Likewise, several standards were added in response to the pros and cons of library guides as found in the literature. Additionally, Moukhliss wrote annotations to add clarity to each standard for the peer review process. For example, Standard 2 in the remixed LGAS rubric prompts the reviewer to see if the author defines the term library guide, since research has indicated that students do not know what library guides are or how to find them (Bagshaw & Yorke-Barber, 2018; Conrad & Stevens, 2019; Ouellette, 2011). Standard 7 suggests that the librarian provide links to the profiles of other librarian liaisons who may serve the audience using the library guide. Standard 9 prompts the reviewer to see if the library guide links to the university library's homepage, addressing Conrad & Stevens's (2019) conundrum that the library guide is not the library homepage. These standards were added to ensure that users are provided with adequate information about the nature of library guides, who publishes them, and how to locate additional guides, countering the confusion that Conrad & Stevens (2019) noted in their library guide usability study. They may also be helpful for those who discover library guides through a Google search.

Moukhliss intends the additional standards to provide context about the library guide to novice users, thus addressing the issue of information privilege, or the assumption that everyone starts with the same background knowledge. Standard 22 was added to discourage adding unnecessary resources to the guide, which Baker (2014) and Conrad & Stevens (2019) cited as a common problem. Standard 27 encourages the use of Creative Commons attribution, as suggested by Lauseng et al. (2021), who found that their Evidence Based Medicine library guide was used not only by faculty, staff, and students at the University of Illinois Chicago but also by a wider audience. Recognizing its strong visibility and significant external usage, they considered it a potential candidate for an Open Educational Resource (OER). As library guides are often found without the help of a librarian, Standard 28 suggests that reviewers check that library guide authors provide steps for accessing the research tools and databases suggested in the library guide outside of the context of the guide. Providing such information may help counter Conrad & Stevens's (2019) findings regarding students' feelings of disorientation while using a library guide and difficulty navigating to the library homepage from the guide.

Standard 30 was added so that students have a dedicated Get Help tab explaining the variety of ways the user can contact their library and/or librarians for additional assistance. Standard 31 was re-written so that the user could check their understanding in a way appropriate for the guide (Lee et al., 2021), such as a low-stakes quiz or poll. Finally, Standard 32 encourages the user to provide feedback regarding the guide’s usefulness, content, design, etc., with the understanding that learning objectives follow an iterative cycle and are not static. Student feedback can help the authoring librarian update and maintain the guide’s relevancy to users and will give students the opportunity to become co-creators of the knowledge they consume.

UNF’s LGAS Rubric for Quality-Checked Review library guide includes an additional tab for a Quality-Checked badge and a suggested maintenance checklist (both available on the Maintenance Checklist/Test Your Knowledge tab) for monthly, twice-yearly, and yearly reviews. Moukhliss borrowed and remixed the checklist from the Vanderbilt University Libraries (current as of 8/21/2023). The Peer Review Training Materials tab includes a training workbook and training video on the LGAS rubric, the annotations, and the peer review process. Moukhliss provides a Creative Commons license for the library guide to encourage other institutions to reuse and/or remix it; see the LGAS’s Start Here page.

Methodology, Theoretical Model, and Framework

Moukhliss and McCowan used the qualitative case study methodology. Gephart (2004) stated, “Qualitative research is multimethod research that uses an interpretive, naturalistic approach to its subject matter. . . . Qualitative research emphasizes qualities of entities —the processes and meanings that occur naturally” (pp. 454-455). Moukhliss and McCowan selected the exploratory multi-case study so that they could assess multiple student user/learning perspectives when accessing, navigating, and digesting the two library guides. 

The theoretical model used for this study is the Plan-Do-Check-Act cycle, a quality improvement model that evolved with input from Walter Shewhart and W. Edwards Deming (Koehler & Pankowski, 1996). The cycle walks a team through four steps: Plan, Do, Check, and Act. The Plan phase allows time to think through problems such as the lack of design standards for library guides. During the “Do” phase, Moukhliss selected and remixed the quality review tool SUNY OSCQR. Additionally, she selected a “kitchen sink” (Baker, 2014, p. 10) library guide and redesigned it with the proposed rubric. Moukhliss’s aim was only to remove dead links and/or outdated information when restructuring the guide. The only items deemed outdated were the CRAAP test learning object and selected books from the curated e-book list. The CRAAP test was removed, and no substitution of similar materials was made. The list of selected books was updated in the revised guide to reflect current publications. As Moukhliss restructured the guide, she decided to use tabbed boxes to chunk and sequence information to satisfy Standards 11, 12, 13, and 15. You may view this restructuring by comparing the original Fake News guide and the revised Fake News guide. The “Do” phase also included recruiting participants to evaluate the two library guides: the original Fake News guide and the Fake News 2 guide, revised to follow the suggested standards and peer review process. Moukhliss and McCowan submitted the library guide study proposal to the Institutional Review Board in November 2023, and the study was marked Exempt. In December 2023, Moukhliss recruited participants by emailing faculty, distributing flyers in the library, posting flyers on display boards, and adding a digital flyer to each library floor’s lightboard display. The librarians added the incentive of 10-dollar Starbucks gift cards for the first 15 students who signed consent forms and successfully completed the 30-minute recorded Zoom session (or until saturation was reached).

Moukhliss interviewed one test pilot (P1) and ten students (P2-P11) for this study; she noted saturation after her seventh interview but continued to ten participants to increase certainty. Although some may view this as a low sample population, the approach aligns with the peer-reviewed literature. Hennink & Kaiser (2019) discuss saturation in in-depth interviews and point to Guest et al.’s (2006) study, which determined, after reviewing data from 60 in-depth interviews, that saturation presented itself between Interviews 7-12, “at which point 88% of all themes were developed and 97% of important (high frequency) themes were identified” (Hennink & Kaiser, 2019, para. 5).

The questionnaire framework for this study is centered around Bloom’s Taxonomy, which provides a framework of action verbs that align with hierarchical levels of learning: remember, understand, apply, analyze, evaluate, and create. McCowan incorporated various levels of Bloom’s Taxonomy as she built the UX script used for this study. Moukhliss presented Fake News and Fake News 2 interchangeably as Guide A and Guide B throughout the interview sessions.

After each recorded Zoom session, Moukhliss reviewed the session, recorded the task completion times on the script, recorded the data to the scorecard, and entered the data into the qualitative software NVivo. Both script and scorecard are available on the Library Guide Study page. Moukhliss created a codebook with participant information, assigned code names (Participant 1, Participant 2, Participant 3, etc.), removed all personal identifiers, and stored the codebook in a password-protected file on her work computer to keep identifiable information secure. For coding, the authors chose NVivo, a qualitative assessment tool that can organize data by type (correspondence, observation, and interviews), enable the researcher(s) to easily insert notes in each file, and develop codes to discover themes. Moukhliss coded the interviews based on the LGAS (i.e., Standard 1, 2, 3, etc.), and additional codes were added regarding navigation and content. Moukhliss & McCowan reviewed the codes for themes and preferences regarding library guide design.

The “Check” phase guided Moukhliss and McCowan in considering the implementation of the LGAS rubric and peer review process for library guides at UNF. During this phase, they reviewed participants’ qualitative responses to the Fake News library guide and the Fake News 2 library guide. Data from the “Check” phase will inform the recommendations Moukhliss & McCowan make in the “Act” phase (Koehler & Pankowski, 1996), which are discussed in the Conclusion.

Interviewees

Moukhliss worked with one test pilot and interviewed ten students for this study. The ten students represented the following majors: Nursing, Computer Science, Communications, Public Administration, Electrical Engineering, Information Technology, Health Sciences, Philosophy, and Criminal Justice. Participants included two first-year students, two sophomores, three juniors, two seniors, and one graduate student. Eight participants used their desktops, whereas two completed the study on their phones. When asked about their familiarity with library guides, one participant noted they had never used a library guide before, two stated they rarely used them, and another two stated that they occasionally used them. Finally, five students stated they did not know whether they had ever used one or not.

Findings & Discussion

Overall, students navigated the revised Fake News 2 guide faster than the original guide, except when listing the 5 Types of Fake News. This may be because the 5 Types of Fake News were listed on the original guide’s first page. The overall mean time for successful task completion was 39 seconds on the original guide versus 22.2 seconds on the revised guide. Moukhliss noted that failed task completions often traced back to poorly labeled sections in both the original and revised guides.

Although the content was identical in both guides except for the removal of outdated information and dead website links from the original guide and the updated list of e-books in the revised guide, the students’ overall mean confidence level was 4.2 for the original guide’s information versus 4.4 for the revised guide. The mean recommendation likelihood was 6.4 for the original guide, whereas it increased to 7.9 for the revised guide.

Regarding personal preferences for which guide to use for a course reading, one student indicated they would want to work from the old guide, and nine others indicated wanting to work from the revised guide for the following reasons:

  • More effective organization and layout
  • Clearer presentation of information
  • A dedicated tab for UNF resources
  • Easier navigation
  • Less jumbled layout
  • Easier to follow when working with peers

Regarding perceptions of which guide a professor might choose to teach with, three chose the original guide, whereas the other seven indicated the revised guide. One student stated that the old guide was more straightforward and that the instructor could explain the guide if using it during direct instruction. Reasons for preferring the revised guide include:

  • More “interactive-ness” and quizzes
  • Presence of summaries
  • Better presentation of content
  • Easier to locate information
  • The guide doesn’t feel like “a massive run-on sentence”
  • Ease of “flipping through the topics”
  • Presence of library consult and contact information

Although not part of the interview questions, Moukhliss was able to document that eight participants were not aware that a library guide could be embedded into a Canvas course, and one participant was aware; Moukhliss is unaware of the remaining participant’s experience with embedded library guides. Regarding preferences for embedding the library guide in Canvas, one student voted for the old guide whereas nine preferred the revised guide. Remarks favoring the revised guide cited the inclusion of necessary Get Help information for struggling students and the guide’s ease of navigation.

Although not every standard from the LGAS rubric came up in conversation throughout the student interviews, the standards seen as positive and appreciated by students in a guide’s design include Standards 4, 7, 11, 12, 15, 21, 22, 28, 30, and 31. Observation showed that two students navigated the revised guide by the hyperlinked learning objectives rather than by side navigation, indicating that Standard 5 may hold value for those who maneuver the guide through the stated objectives. Moukhliss noted during her observations one limitation of hyperlinking an objective to a specific library guide page: when that page includes a tabbed box, the library guide author is unable to link directly to a specific sub-tab within the box. Instead, the link defaults to the first page of the box’s information. Thus, students maneuvering the guide expected to find the listed objective on the first tab of the tabbed box, and they did not innately click through the sub-tabs to discover the listed objective.

Through observation, Moukhliss noted that six students initially struggled to navigate the library guides using the side navigation functionality, but after time with the guide and/or instruction from Moukhliss on side navigation, they were successful. For students who were comfortable navigating a guide, or once Moukhliss had shown them how, students preferred the sub-tabbed boxes of the revised guide to the organization of the original guide. The students found neither library guide perfect, but Moukhliss & McCowan noted an overall theme that organization, proper sequencing, and chunking of information were perceived as important by the students. Three students commented on appreciating clarification for each part of the guide, which supports proposed Standard 28.

Additionally, two students appreciated the library guide author profile picture and contact information on each page, and three students remarked positively on the presence of a Get Help tab on the revised guide. One participant stated that professors want students to have a point of contact with their library liaisons and do not like “anonymous pages” (referring to the original guide lacking an author profile). Another participant wanted to see more consult widgets listed throughout the library guide. Regarding the Fake News 2 guide, two students preferred that the first page contain more content information and less information about getting started. Furthermore, images and design mattered: one student remarked that they did not like the Fake News 2 banner, and several others disliked the lack of imagery on the first page of the Fake News 2 guide. For both guides, students consistently remarked on liking the Fake News infographics.

Among those supporting the original guide or parts of it, three students liked the CRAAP Test worksheet and wanted to see it in the revised guide, not knowing that the worksheet had been deemed dated by members of the instruction team and removed by Moukhliss for that reason. One student wanted to see the CRAAP test worksheet repurposed as a flowchart regarding fake news. Moukhliss noted that most of the students perceived objects listed on the original and revised guides to be current, relevant, and vetted. Eight participants did not question their usefulness or relevancy or whether the library guide author was maintaining the guide. Only one student pointed out that the old guide had a list of outdated e-books and that the list was refreshed for the new guide. Thus, Moukhliss’s observations may reinforce to library guide authors that library guides should be reviewed and refreshed regularly, as proposed by Standard 22, as most students in this study appeared to take at face value that what is presented on the guide is not only vetted but continuously maintained.

Initial data from this study indicate that using the LGAS rubric with annotations and a peer review process may improve the learning experience for students, especially in relation to being mindful of what information to include in a library guide and how to sequence and chunk that information. Early data also indicate that students appreciate a Get Help section and want to see Contact Us and library liaison/author information throughout the guide’s pages.

Because six students initially struggled with maneuvering through a guide, Moukhliss & McCowan suggest including navigation instructions in the library guide banner, in a brief introductory video on the Start Here page, or in both locations. Here is a screenshot of sample banner instructions:

A sample Fake News library guide banner being used to point students to how to maneuver the guide. The banner states: "Navigate this guide by selecting the tabs." And "Some pages of this guide include sub-tabs to click into."

As stated, Moukhliss noted that most students were not aware of the presence of library guides in their Canvas courses. This may indicate that librarians should provide direct instruction during one-shots not only on what library guides are and how to maneuver them, but also directly model how to access an embedded guide within Canvas.

Conclusion

Library guides have considerable pedagogical potential. However, there are no widely-used rubrics for evaluating whether a particular library guide has design features that support its intended learning outcomes. Based on this study, librarians who adopt or adapt the LGAS rubric will be more likely to design library guides that support students’ ability to complete relevant tasks. At UNF, Moukhliss and McCowan plan to recommend that administration adopt the LGAS rubric and annotations with a peer review process and consider templatizing the institution’s (UNF) library guides to honor the proposed standards deemed most impactful by the student participants. This includes recommending that library administration include a Get Started tab in guide template(s), with placeholders for introductory text, a library guide navigation video tutorial, visual navigational instructions embedded in the guide’s banner, and the guide’s learning objectives. Furthermore, they propose an institutionally vetted Get Help tab that can be mapped to each guide. Other proposals include templatizing each page to include the following: a page synopsis, applicable explanations for accessing library-specific resources and tools from the library’s homepage, placeholders for general contact information, a link to the library liaison directory, a placeholder for the author bio picture, feedback and assessment mechanisms, and a research consultation link or widget, as well as instructions for accessing the library’s homepage.

Following the creation of a standardized template, Moukhliss plans to propose recruiting a team of volunteer peer reviewers (library staff, librarians, library administration) and providing training on the LGAS rubric, the annotations, and the peer review process. She will recommend that all library guide authors train on the proposed LGAS rubric and the new library guide template for future library guide authorship projects and for updating and improving existing guides based on the standards. The training will cover the rubric, the annotations, and the maintenance calendar checklists for monthly, twice-yearly, and yearly review. All proposed training materials are available at the LGAS’s Start Here page.

Moukhliss and McCowan encourage other college and university librarians to consider using or remixing the proposed LGAS rubric for a quality-checked review and to conduct studies on students’ perceptions of the rubric to add data to this research. The authors suggest that future studies survey both students and faculty on their perspectives on using a quality assurance rubric and peer review process to increase the pedagogical value of a library guide. Moukhliss & McCowan encourage authors of future studies to report on their successes and struggles in forming and training library colleagues on using a quality-checked rubric for library guide design and the peer review process.


Acknowledgments

We would like to express our gratitude to Kelly Lindberg and Ryan Randall, our peer reviewers. We would also like to thank the staff at In The Library with the Lead Pipe, including our publishing editor, Jaena Rae Cabrera.


References

Adebonojo, L. G. (2010). LibGuides: customizing subject guides for individual courses. College & Undergraduate Libraries, 17(4), 398–412. https://doi.org/10.1080/10691316.2010.525426  

Allen, M. (2017). Designing online asynchronous information literacy instruction using the ADDIE model. In T. Maddison & M. Kumaran (Eds.), Distributed learning pedagogy and technology in online information literacy instruction (pp.69-90). Chandos Publishing.

Bagshaw, A. & Yorke-Barber, P. (2018). Guiding librarians: Rethinking library guides as a staff development tool. Journal of the Australian Library and Information Association, 67(1), 31–41. https://doi.org/10.1080/24750158.2017.1410629

Baker, R. L. (2014). Designing LibGuides as instructional tools for critical thinking and effective online learning. Journal of Library and Information Services in Distance Learning, 8(3–4), 107–117. https://doi.org/10.1080/1533290X.2014.944423 

Bergstrom-Lynch. (2019). LibGuides by design: Using instructional design principles and user-centered studies to develop best practices. Public Services Quarterly, 15(3), 205–223. https://doi.org/10.1080/15228959.2019.1632245

Berić-Stojšić, & Dubicki, E. (2016). Guiding students’ learning with LibGuides as an interactive teaching tool in health promotion. Pedagogy in Health Promotion, 2(2), 144–148. https://doi.org/10.1177/2373379915625324

Bielat, V., Befus, R., & Arnold, J. (2013). Integrating LibGuides into the teaching-learning process. In A. Dobbs, R. L. Sittler, & D. Cook (Eds.). Using LibGuides to enhance library services: A LITA guide (pp. 121-142). ALA TechSource.

Brewer, L., Rick, H., & Grondin, K. A. (2017). Improving digital libraries and support with online research guides. Online Learning Journal, 21(3), 135-150. http://dx.doi.org/10.24059/olj.v21i3.1237

Buck, S., & Valentino, M. L. (2018). OER and social justice: A colloquium at Oregon State University. Journal of Librarianship and Scholarly Communication, 6(2). https://doi.org/10.7710/2162-3309.2231

CIRT. (n.d.). Online Course Design Quality Review. https://www.unf.edu/cirt/id-Quality-Review.html

Clever, K. A. (2020). Connecting with faculty and students through course-related LibGuides. Pennsylvania Libraries, 8(1), 49–57. https://doi.org/10.5195/palrap.2020.215

Conrad, S. & Stevens, C. (2019). “Am I on the library website?”: A LibGuides usability study. Information Technology and Libraries, 38(3), 49-81. https://doi.org/10.6017/ital.v38i3.10977

Coombs, B. (2015). LibGuides 2. Journal of the Medical Library Association, 103(1), 64–65. https://doi.org/10.3163/1536-5050.103.1.020

Cuevas-Cerveró, A., Colmenero-Ruiz, M.-J., & Martínez-Ávila, D. (2023). Critical information literacy as a form of information activism. The Journal of Academic Librarianship, 49(6), 102786. https://doi.org/10.1016/j.acalib.2023.102786

Dotson, D. S. (2021). LibGuides gone viral: A giant LibGuides project during remote working. Science & Technology Libraries, 40(3), 243–259. https://doi.org/10.1080/0194262X.2021.1884169

Downey, A. (2016). Critical information literacy: Foundations, inspiration, and ideas. Library Juice Press.

Emanuel, J. (2013). A short history of LibraryGuides and their usefulness to librarians and patrons. In A. Dobbs, R. L. Sittler, & D. Cook (Eds.). Using LibGuides to enhance library services: A LITA guide (pp. 3-20). ALA TechSource.

Gephart, R. P., Jr. (2004). Qualitative research and the Academy of Management Journal. Academy of Management Journal, 47(4), 452–462. https://doi.org/10.5465/amj.2004.14438580

German, E. (2017). Information literacy and instruction: LibGuides for instruction: A service design point of view from an academic library. Reference & User Services Quarterly, 56(3), 162-167. https://doi.org/10.5860/rusq.56n3.162

Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An experiment with data saturation and variability. Field Methods, 18(1), 59–82. https://doi.org/10.1177/1525822X05279903

Halupa, C. (2019). Differentiation of roles: Instructional designers and faculty in the creation of online courses. International Journal of Higher Education, 8(1), 55–68. https://doi.org/10.5430/ijhe.v8n1p55

Hare, S., & Evanson, C. (2018). Information privilege outreach for undergraduate students. College & Research Libraries, 79(6), 726–736. https://doi.org/10.5860/crl.79.6.726

Hennink, M., & Kaiser, B. (2019). Saturation in qualitative research. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. https://doi.org/10.4135/9781526421036822322

Hulen, K. (2022). Quality assurance drives continuous improvements to online programs. In S. Kumar & P. Arnold (Eds.), Quality in online programs: Approaches and practices in higher education. (pp. 3-22). The Netherlands: Brill. https://doi.org/10.1163/9789004510852_001 

Kathuria, H., & Becker, D. W. (2021). Leveraging course quality checklist to improve online courses. Journal of Teaching and Learning with Technology, 10(1). https://doi.org/10.14434/jotlt.v10i1.31253

Kincheloe, J. (2012). Critical pedagogy in the twenty-first century: Evolution for survival. Counterpoints, 422, 147–183.

Koehler, J. W. & Pankowski, J. M. (1996). Continual improvement in government tools & methods. St. Lucie Press.

Lauseng, D. L., Howard, C., Scoulas, J. M., & Berry, A. (2021). Assessing online library guide use and Open Educational Resource (OER) potential: An evidence-based decision-making approach. Journal of Web Librarianship, 15(3), 128–153. https://doi.org/10.1080/19322909.2021.1935396

Lechtenberg, U. & Gold, H. (2022). When all you have is a hammer, everything looks like a LibGuide: Strengths, limitations, and opportunities of the teaching tool [Conference presentation]. LOEX 2022 Conference, Ypsilanti, MI, United States.  https://vimeo.com/721358576 

Lee, Hayden, K. A., Ganshorn, H., & Pethrick, H. (2021). A content analysis of systematic review online library guides. Evidence Based Library and Information Practice, 16(1), 60–77. https://doi.org/10.18438/eblip29819

Lee, Y. Y., & Lowe, M. S. (2018). Building positive learning experiences through pedagogical research guide design. Journal of Web Librarianship, 12(4), 205-231. https://doi.org/10.1080/19322909.2018.1499453

Mann, B. J., Arnold, J. L., and Rawson, J. (2013). Using LibGuides to promote information literacy in a distance education environment. In A. Dobbs, R. L. Sittler, & D. Cook (Eds.). Using LibGuides to enhance library services: A LITA guide (pp. 221-238). ALA TechSource. 

May, D. & Leighton, H. V. (2013). Using a library-based course page to improve research skills in an undergraduate international business law course. Journal of Legal Studies Education, 30(2), 295-319. https://doi.org/10.1111/jlse.12003

OSCQR – SUNY Online Course Quality Review Rubric. (n.d.). About OSCQR. https://oscqr.suny.edu/

Ouellette, D. (2011). Subject guides in academic libraries: A user-centred study of uses and perceptions. Canadian Journal of Information and Library Science, 35(4), 436–451. https://doi.org/10.1353/ils.2011.0024

Pickens, & Witte, G. (2015). Circle the wagons & bust out the big guns! Tame the “Wild West” of distance librarianship using Quality Matters™ benchmarks. Journal of Library & Information Services in Distance Learning, 9(1-2), 119–132. https://doi.org/10.1080/1533290X.2014.946352

Quintel, D. F. (2016, January/February). LibGuides and usability: What our users want. Computers in Libraries Magazine, 36(1), 4-8.

Smith, E. S., Koziura, A., Meinke, E., & Meszaros, E. (2023). Designing and implementing an instructional triptych for a digital future. The Journal of Academic Librarianship, 49(2), 102672. https://doi.org/10.1016/j.acalib.2023.102672

Sonsteby, A. & DeJonghe, J. (2013). Usability testing, user-centered design, and LibGuides subject guides: A case study. Journal of Web Librarianship, 7(1), 83–94. https://doi.org/10.1080/19322909.2013.747366

SpringShare. (n.d.). LibGuides. https://springshare.com/libguides/

Stone, S. M., Lowe, M. S., & Maxson, B. K. (2018). Does course guide design impact student learning? College & Undergraduate Libraries, 25(3), 280-296. https://doi.org/10.1080/10691316.2018.1482808

Tewell, E. (2015). A decade of critical information literacy: A review of the literature. Comminfolit, 9(1), 24-43. https://doi.org/10.15760/comminfolit.2015.9.1.174

Unal, Z. & Unal, A. (2016). Students matter: Quality measurements in online courses. International Journal on E-Learning, 15(4), 463-481. Association for the Advancement of Computing in Education (AACE). https://www.learntechlib.org/primary/p/147317/

Wakeham, M., Roberts, A., Shelley, J. & Wells, P. (2012). Library subject guides: A case study of evidence-informed library development. Journal of Librarianship and Information Science, 44(3), 199-207. https://doi.org/10.1177/0961000611434757 

Building a simple IIIF digital library with Tropy, Tropiiify and Canopy / Raffaele Messuti

Creating and maintaining an online digital collection can be a complex process involving multiple components, from organizational procedures to software solutions. With many moving parts, it's no surprise that building and curating a digital collection can be costly, time-consuming, and demanding to maintain. When dealing with cultural heritage, maintenance and long-term preservation should be our primary concerns. The approach we should always consider is minimal computing.

In this tutorial, I'll show you how to create and maintain a simple IIIF collection using Tropy (with the Tropiiify plugin) and Canopy, two powerful tools that can help you build static sites requiring zero maintenance.

There are many other libraries and applications, including free software, that can achieve the same result. However, they often require some programming knowledge or the maintenance of server-side applications.

Tropy

Tropy is a desktop application designed to organize and manage archival research photos, though it's also great for managing almost any kind of image, including invoices or handwritten notes. It doesn't require any online service; you can work offline on your desktop without needing to upload anything.

Although it's yet another Electron application, the UI is very pleasant, minimal, and fast to use. You will quickly notice a significant improvement in your offline workflow compared to using online applications in a browser.

There's an extensive user guide to learn Tropy, so I won't cover all the details here. Instead, I want to highlight some features I consider important:

  • A Tropy project is saved into an SQLite database. This is a huge advantage because your data won't be locked inside the application. If you have programming knowledge, you can build a workflow to manage the data of a Tropy project and integrate it into any external application (a small sketch follows this list).
  • Tropy can import many image formats, including PDFs and multi-page TIFFs.
  • You can describe images with standard templates (a default Tropy template and a Dublin Core one) or create your own.
  • Tropy can be extended with plugins.
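
As a small sketch of the SQLite point above: Tropy saves projects as .tpy files, which are plain SQLite databases. The table names Tropy uses internally are not guaranteed, so list them before scripting against them:

sqlite3 MyProject.tpy ".tables"              # discover the tables Tropy actually uses
sqlite3 MyProject.tpy ".dump" > project.sql  # full logical backup you can inspect or script against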

IIIF Plugin: tropiiify

One plugin that stands out is tropiiify. With this plugin you can export a Tropy collection to a static IIIF collection: images will be saved as tiles (no IIIF image server required), and manifests and collection files will be generated. You then simply move the exported output to a static HTTP server (remember to configure CORS).
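
For a quick test of the exported output, any static server that can send CORS headers will do; one option is the http-server npm package, which has a --cors flag (a production setup would configure CORS in nginx, Apache, or your hosting platform instead):

npx http-server ./export -p 8080 --cors   # ./export is a placeholder for the tropiiify output folder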

Notes:

  • Every document needs to have an identifier. Use whatever you want; for small collections, sequential numbers are sufficient. Alternatively, use UUIDs or any other unique identifiers, like Nanoid (if you don't want to script a Nanoid generator, point your browser to UUID Nanoid Generator and get a new identifier with each reload, or see the one-liner after these notes).

  • You can create multiple export configurations. Set the IIIF base id to the full public URL where you are going to publish the export.
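
If you have Node.js available, one way to mint an identifier without writing a script is the nanoid package's command-line interface (assuming a recent nanoid release, which ships one):

npx nanoid   # prints a fresh unique identifier each run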

Here is an example collection, https://docuver.se/statiiify/index.json (just some book covers shot with a smartphone), that can be opened with any IIIF viewer (such as TIFY or Mirador).

There are many other libraries or applications that can help you achieve the same result (vips, iiif-tiler, iiif-prezi), but they require knowledge of the shell and some scripting/programming to put everything together.

Canopy

An IIIF export from Tropy is ready to be used with any IIIF viewer out there. But there is another interesting application: Canopy. It's a static site generator for IIIF collections that includes a browsing interface (with facets), a search engine, and an IIIF image viewer (with annotations), all bundled in a static site that doesn't need any server-side technology to be served.

Here is a short guide to using Canopy (see also their documentation).

Clone the repository

git clone https://github.com/canopy-iiif/canopy-iiif

Install dependencies

npm i

Configure

Edit .env with the full public URL where you will publish the static exported collection

NEXT_PUBLIC_URL="https://docuver.se"
NEXT_PUBLIC_BASE_PATH="/statiiify/browse"

Edit config/canopy.json with the IIIF collection manifest

{
  "collection": "https://docuver.se/statiiify/index.json",
  "devCollection": "https://docuver.se/statiiify/index.json",
  "featured": [
    "https://docuver.se/statiiify/yosyij-w8eonh4whu7wcv/manifest.json"
  ],
  "metadata": [
    "Title",
    "Creator",
    "Date",
    "Publisher"
  ],
...

Build

npm run build:static

Deploy online: copy the content of the out directory to your HTTP server.
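
For example, with rsync (the destination host and path below are placeholders):

rsync -av out/ user@example.org:/var/www/statiiify/browse/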

Here is a complete demo https://docuver.se/statiiify/browse.

Five ways RO-Crate data packages are important for repositories / Peter Sefton

This post is also available at the Language Data Commons of Australia site.

PDF version


Five ways RO-Crate data packages are important for repositories

Presented at: The 19th International Conference on Open Repositories, June 3-6th 2024, Göteborg, Sweden

Session: Presentations: Integrations for Research Data Management

Time: 05/June/2024: 11:00 - 12:30 · Location: Drottningporten 1

Peter Sefton*, Stian Soiland-Reyes**

*University of Queensland, Australia; **The University of Manchester, UK

Research Object Crate is a linked data metadata packaging standard which has been widely adopted in research contexts. In this presentation, we will briefly explain what RO-Crate is, how it is being adopted worldwide, then go on to list ways that RO-Crate is growing in importance in the repository world:

  • Uploading of complex multi-file objects means RO-Crate is compatible with any general-purpose repository that can accept a ZIP file (with some coding, repository services can do more with RO-Crates).

  • Download for well-described data objects complete with metadata from a repository rather than just a ZIP or file with no metadata.

  • Using RO-Crate metadata reduces the amount of customisation that is required in repository software, as ALL the metadata is described using the same simple, self-documenting linked-data structures, so generic display templates.

  • Sufficiently well-described RO-Crates can be used to make data FAIR compliant, aiding in Findability, Accessibility, Interoperability and Reusability thanks to standardised metadata and mature tooling.

  • And if you’re looking for a sustainable repository solution, there are tools which can run a repository from a set of static files on a storage service, in line with the ideas put forward by Suleman in the closing keynote for OR2023.



Uploading of complex multi-file objects

RO-Crate [1], [2] is a data packaging format and can be used to put multiple data files together with their metadata into a package such as a ZIP, tar or disk image file. This means that as long as your repository can handle a ZIP file it can take RO-Crates.

RO-Crates enable data to travel with metadata.
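
Packaging is as simple as zipping the crate folder so that ro-crate-metadata.json ends up at the root of the archive; a minimal sketch:

cd my-crate/              # the folder containing ro-crate-metadata.json
zip -r ../my-crate.zip .  # archive the folder contents, keeping the metadata file at the root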

Beyond simply allowing the upload of opaque RO-Crates, there are opportunities for repository software to recognise metadata in an uploaded package and to pre-populate built-in metadata forms and/or datastores. This is not a pattern the authors have seen widely implemented in comprehensive institutionally focussed repositories, although at the time of writing it is being explored in Dataverse and InvenioRDM/Zenodo. We would encourage repository developers to explore this further, particularly those working with research data. RO-Crate support is increasing in research-domain repositories; e.g. RO-Crate upload with metadata extraction is supported by WorkflowHub and ROHub.


… is already supported even if the repo does not speak RO-Crate

One of the design features of an RO-Crate is that, as it can be “just a ZIP file”, it can be used with any old repository that can handle ZIP files. The JSON file can also be uploaded separately. As RO-Crate adoption increases, the repository may be able to start to use the RO-Crate metadata it already has. You may have heard from Dieuwertje Bloemen here at OR2024 that Dataverse has support for RO-Crate metadata preview and is building import/export mechanisms.


📂
|-- ro-crate-metadata.json
|-- Folder1/
|   |-- file1.this
|   |-- file2.that
|-- Folder2/
|   |-- file1.this
|   |-- file2.that
|-- 2021-04-08 07.58.17.jpg

An excerpt from the RO-Crate Metadata Document:

{
  "@id": "2021-04-08 07.58.17.jpg",
  "@type": "File",
  "contentSize": 3271409,
  "dateModified": "2021-04-08T07:58:17+10:00",
  "description": "",
  "encodingFormat": [
    {
      "@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/391"
    },
    "image/jpeg"
  ],
  "name": "Cute puppy"
}

The RO-Crate specification describes a method of packaging data in a folder, which can be zipped, with any kind of file.

This slide shows a folder of data, including a file with a typical obscure file name based on a timestamp (in this case created by a Dropbox upload from a digital camera). The RO-Crate Metadata file, in JSON Linked Data Format, can describe files. This one has a name (i.e. a title) for the file; “Cute puppy”.


📂
|-- Folder1/
|   |-- file1.this
|   |-- file2.that
|-- Folder2/
|   |-- file1.this
|   |-- file2.that
|-- 2021-04-08 07.58.17.jpg

A human-readable description and preview (ro-crate-preview.html) can be in an HTML file that lives alongside the metadata. This slide shows an HTML view of the data that shows the image with its metadata, including the Schema.org name (equivalent to a Dublin Core title) for the file.


Increasingly, research repository infrastructure is accepting RO-Crate input - this screenshot from WorkflowHub documents the upload API (https://about.workflowhub.eu/developer/ro-crate-api/) for submitting RO-Crate packaged descriptions of scientific workflows to the system. These can then be downloaded by others for reuse. Here, RO-Crate allows bypassing of the traditional “title, author, license, description” fields (rendered from the crate), as well as permitting user extensions on metadata to be kept in the repository.



RO-Crate is a packaging format suitable for downloads

One of the perennial problems with downloads is that once a user has the data, it often does not come with metadata as shown on the landing page, or if present it is in an ad hoc or specialised format. RO-Crate solves this by specifying an extensible way to put linked-data metadata with data assets and to provide an HTML page or small website with the data to explain it. Thus data travels with its metadata and can be made human-readable.
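
As a concrete sketch of what lands on the user's machine (the two reserved RO-Crate file names are per the specification; the data file is hypothetical):

unzip -l dataset.zip
# (simplified listing)
#   ro-crate-metadata.json   machine-readable JSON-LD metadata
#   ro-crate-preview.html    human-readable summary of the crate
#   data/interviews.csv      the data files themselves (hypothetical)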

RO-Crate download is already available in many data repositories. Examples include:

  • WorkflowHub: A registry for describing, sharing and publishing scientific computational workflows.

  • ROHub: A repository of Earth Science datasets and computational methods.

  • TLCMap: The Time Layered Cultural map is a set of tools that work together for mapping Australian history and culture.

  • The Language Data Commons of Australia data portal: entirely built on RO-Crates, the underlying data consists of crates-on-disk and the API is based on RO-Crate metadata.

  • Senckenberg Wildlive portal: exposes metadata about automatic photo captures of endangered animals using RO-Crate.

  • Dataverse: at the time of writing, RO-Crate downloads are in development.

We encourage developers from other repository platforms to follow the Dataverse project’s lead and add RO-Crate support.



This is a screenshot of the Gazetteer of Historical Australian Places – not exactly a repository but an example of a place where people can download datasets.


HTML preview describes the files

This kind of download could be added to any repository system where there is at least one file that has metadata; offer a download ZIP option that has machine-readable JSON metadata (linked data in JSON-LD) and a human-readable summary of the metadata – this one has descriptions of a few files in it.


Enable programmatic downloads – include the metadata and its extensions (signposting.org)

Every repository implementer should add FAIR Signposting – just a couple of HTTP headers – which means machines can go from an HTML landing page to the actual download without guesswork, or even better, to an RO-Crate! I’m sure you’ll hear this mentioned in one of the talks by Herbert van de Sompel and colleagues, such as the one on FAIRiCat. Again, Dataverse is ahead of the curve and has already implemented this.
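
As a sketch, a signposted landing page can be checked from the command line; the URL below is hypothetical, and rel="describedby" is one of the link relations recommended at signposting.org:

curl -sI https://repo.example.org/dataset/42 | grep -i '^link:'
# a signposted page might answer with something like:
# link: <https://repo.example.org/dataset/42/ro-crate-metadata.json>; rel="describedby"; type="application/ld+json"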

As we mentioned above, data should travel with the metadata. One example of this from WorkflowHub is how other services like LifeMonitor retrieve the RO-Crate and then look for custom annotations in the metadata to pick up and connect to the testing infrastructure. This is only possible because RO-Crate is extensible: you are not trapped with whatever 25 properties we’ve selected. The vessel is still RO-Crate; the repository didn’t need to add anything to support LifeMonitor.


3. Less user interface customisation will be needed for different types of metadata

One of the key benefits of linked-data metadata over previous ‘legacy’ approaches is that multiple vocabularies can be combined into a single metadata document in a way that is not possible with, say, MARC or MODS XML, and that all these vocabularies can use the same syntax and approach to describing data. This means that a simple generic RO-Crate viewer can be used to visualise any metadata, whether it is basic “Who, What, Where” metadata (like Dublin Core) or domain-specific metadata like the RO-Crate metadata profile (https://w3id.org/ldac/profile) used by the Language Data Commons of Australia. This can be displayed alongside the core RO-Crate metadata without any expensive configuration or coding. If the recommendations are followed, the RO-Crate metadata terms are self-documenting; e.g. all the Language Data Commons terms, which use a Schema.org-style approach, are defined here: https://w3id.org/ldac/terms.
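
For instance, the self-documenting terms resolve over plain HTTP (what exactly comes back depends on the server's content negotiation):

curl -L https://w3id.org/ldac/terms   # w3id.org issues a redirect; -L follows it to the term documentation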


Here the mechanism is to use the ‘magic’ name METS.xml to store some extra metadata – with a fully linked-data system this kind of thing is not needed.

This slide is a repeat of one we used last year – this screenshot is a bit of (undated) DSpace documentation found following a tip from Kim Sheppard. We have included it here to illustrate that storing additional metadata (in this case METS) for an object was done by convention: it had to be stored in a special file called METS.xml. Using a linked-data system means that we no longer have to do this kind of thing. There are still magic file names in RO-Crate, but only two: one for the metadata and one for the HTML preview. Everything else is labelled and extensible.



This is an example of a data object in RO-Hub, which has a variety of files described in the object.

This site does not need to have multiple plugins for different kinds of data, as linked data has a generic structure. We saw another example of how the generic structure can be rendered in the opening example with a picture of Sefton's dog.


No need to invent (completely) new file formats anymore!

So now we can tell all these research software developers: I know you like making new file formats, and I would love to support that in my repository, so could you perhaps use RO-Crate as the basis for making that format? Then we can pull out all the boring stuff; even for new file extensions, we just detect that it’s a ZIP file and look for that magic file. If it says it’s an RO-Crate, we’ll believe you.
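
A minimal sketch of that detection step, using only standard shell tools (the file name is the one reserved by the RO-Crate convention):

unzip -l upload.zip | grep -q 'ro-crate-metadata.json' && echo "Looks like an RO-Crate"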


4. The availability of RO-Crate editing tools opens the way for repository software to focus on access and discoverability

The availability of RO-Crate editing tools opens the way for repository software to focus on access and discoverability. We argue that the core functionality of a repository is keeping data safe and making it available with appropriate access controls (remember, not all data can be made Open Access - the A for accessibility in FAIR is about giving the right people (or other agents) access to the right data). RO-Crates require clear licensing statements to travel with data, and we will demonstrate how these have been integrated into access-control systems.

There is an opportunity, if RO-Crate is adopted as an interchange format, for the metadata editing (and authorisation) functions of a repository to be decoupled from it, so that the editor components for a particular metadata profile can be shared between repository instances, or handled in a more distributed architecture than in typical current repositories.



The Describo website mentions an integration with the Dataverse repository where the Describo RO-Crate editor can be used to enter metadata. The pattern is potentially very powerful: separate the creation of metadata from the repository so that repositories can focus on data retention, and multiple other applications can be built for use by researchers or other users, closer to or embedded in systems they are already using.


RO-Crate editing / Data Portal

The Language Data Commons of Australia team has produced an alternative to Describo known as Crate-O. This is available as a web component ready to drop into any web app, or it can function stand-alone; we use it as part of the workflow in maintaining the data portal for the Language Data Commons of Australia.


 Portal shows that accessing this data requires authorisation. First step: Log in.

This slide shows the beginning of an access-control process. The data was prepared in RO-Crate format using batch-processing scripts and has a license attached. The repository portal is shown here - with an indication that the user needs to log in to access the data.


 logging in

The user logs in using CILogon (or another federated authentication service).


 REMS

The user is then directed to an instance of the Resource Entitlement Management System (REMS) to request that a licence be granted to access the data. REMS is open-source software.


 Applying for a license

After an approval process which may be automated, or may involve humans checking credentials, the user is directed back to the repository.


5. With a repository to keep data safe and serve it using persistent identifiers, RO-Crates help make data FAIR

With a repository to keep data safe and serve it using persistent identifiers, RO-Crates help make data FAIR.

RO-Crate is increasingly being used to describe the provenance [3] of derived data in such a way that the workflows/computation that produced it can be re-run automatically to validate it, or as a basis for new research. This might be a button on a repository to run a bioinformatics workflow, or re-run a Jupyter notebook that produces a set of plots.


RO-Crate helps to enable FAIR research practice; RO-Crates can describe inputs, outputs and code in any combination to record research processes, and can be used to provision services.



This slide is a collage, put together by lead author Alex Ip, from a presentation at eResearch Australasia showing some examples of code notebooks for text analytics and geophysics. Alex is working with the Language Data Commons of Australia team to make RO-Crate Profiles that can describe code, not just in terms of its authorship, language, inputs and outputs, but also the (usually virtual) execution environment and hardware requirements needed to run it. This is a key step forward in the Interoperability and Reuse of data called for by the FAIR principles.


6. (bonus point) There are tools which can run a repository from a set of static files on a storage service, in line with the ideas put forward by Prof Suleman at OR 2023

There are tools which can run a repository from a set of static files on a storage service, in line with the ideas put forward by Prof Suleman at OR 2023. The team at the Language Data Commons of Australia, with partner institutions and colleagues, has been working to produce a set of tools for building Archival Repository software stacks, based on a principled approach to keeping data safe as presented on the Arkisto website [4] and more recently at https://w3id.org/ldac/pilars. The core idea is that a collection of RO-Crates in a storage service can be the basis of a repository, either using a simple on-disk directory layout or something more structured such as the Oxford Common File Layout (OCFL) specification.


The UTS Research Data Portal is an example of a very minimal data repository system which uses a standard RO-Crate viewer to show RO-Crates that are sitting on file storage. This example is of an engineering dataset.



The Language Data Commons of Australia Data Partnerships (LDaCA-DP), Language Data Commons of Australia Research Data Commons (LDaCA-RDC), and Australian Text Analytics Platform (ATAP) projects received investment (https://doi.org/10.47486/DP768, https://doi.org/10.47486/HIR001, & https://doi.org/10.47486/PL074) from the Australian Research Data Commons (ARDC).

The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

Crate-O - a drop-in linked data metadata editor for RO-Crate (and other) linked data in repositories and beyond / Peter Sefton

This post is also available at the Language Data Commons of Australia site.

PDF version


Developer Track Session 2

Time: 05 June 2024, 09:00 - 10:30 · Location: Drottningporten 2

Crate-O - A drop-in linked data metadata editor for RO-Crate (and other) linked data in repositories and beyond

Peter Sefton, Alvin Sebastian, Moises Sacal Bonequi, Rosanna Smith

University of Queensland, Australia

Research Object Crate is a metadata packaging standard which has been widely adopted over the last few years in research contexts and which debuted at Open Repositories with a workshop in 2019.


Crate-O is an editor for the RO-Crate Metadata Specification.

RO-Crate has been presented here at Open Repositories for the last few years, and is now starting to be incorporated into many research repository solutions (though they are not always called repositories).


I am presenting in the next session on why RO-Crate is important for repositories.

Five ways RO-Crate data packages are important for repositories

Time: 05 June 2024, 11:00 - 12:30 · Location: Drottningporten 1

Peter Sefton*, Stian Soiland-Reyes**

*University of Queensland, Australia; **The University of Manchester, UK

Research Object Crate is a linked data metadata packaging standard which has been widely adopted in research contexts. In this presentation, we will briefly explain what RO-Crate is, how it is being adopted worldwide, then go on to list ways that RO-Crate is growing in importance in the repository world:

  • Uploading of complex multi-file objects means RO-Crate is compatible with any general-purpose repository that can accept a zip file (with some coding, repository services can do more with RO-Crates)

  • Download for well-described data objects complete with metadata from a repository rather than just a zip or file with no metadata

  • Using RO-Crate metadata reduces the amount of customisation that is required in repository software, as ALL the metadata is described using the same simple, self-documenting linked-data structures, so generic display templates

  • Sufficiently well-described RO-Crates can be used to make data FAIR compliant, aiding in Findability, Accessibility, Interoperability and Reusability thanks to standardised metadata and mature tooling

  • And if you’re looking for a sustainable repository solution, there are tools which can run a repository from a set of static files on a storage service, in line with the ideas put forward by Suleman in the closing keynote for OR2023.


The Language Data Commons of Australia Data Partnerships (LDaCA-DP), Language Data Commons of Australia Research Data Commons (LDaCA-RDC), and Australian Text Analytics Platform (ATAP) projects received investment (https://doi.org/10.47486/DP768, https://doi.org/10.47486/HIR001, & https://doi.org/10.47486/PL074) from the Australian Research Data Commons (ARDC).

The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

Start describing your research TODAY!

A version of the Crate-O component is available as a playground for Chrome browsers only. This allows you to describe local folders on your computer and generate an RO-Crate.


https://github.com/Language-Research-Technology/ro-crate-modes/blob/main/docs/soss-profiles.md

A Schema.org style Schema (SOSS) specifies a metadata vocabulary of Classes and Properties, based on the RO-Crate specification's use of Schema.org classes. An RO-Crate Profile has (at least) a document that explains how metadata entities from the Schema are used for a particular purpose. An RO-Crate Mode is a set of lightweight syntactic rules for combining SOSS Classes, Properties and DefinedTerms, expressed in a JSON file.

Modes

"classes": {
  "Dataset": {
    "id": "http://schema.org/Dataset",
    "subClassOf": [],
    "inputs": [
      {
        "id": "http://schema.org/isAccessibleForFree",
        "name": "isAccessibleForFree",
        "help": "This is available under an Open Access license.",
        "required": false,
        "multiple": false,
        "type": [
          "Boolean"
        ]
      },
      …

{
  "metadata": {
    "name": "Language Data Commons top-level Collection (corpus)",
    "description": "Implements the language data commons RO-Crate Metadata Profile for top-level collection.",
    "version": 0.3,
    "author": "University of Queensland",
    "license": "GPL-3.0"
  },
  "conformsToUri": [
    "https://w3id.org/ldac/profile#Collection"
  ],
  …

Crate-O RO-Crate Editor Mode Files are editor configurations that implement RO-Crate Metadata Profiles.

The configuration files are intended to form the basis of an approach for describing RO-Crate editor behaviour and can be used for RO-Crate validation.

Initial versions of this work were based on the Describo Profiles (which vary between versions of Describo) used to configure the Describo family of RO-Crate editing tools, currently maintained by Marco La Rosa.

Embed Crate-O in your own Vue app

  • Functionalities include everything Crate-O can do except anything to do with file handling (load/save crate, read files, manage profiles, etc.)
  • Published in NPM: just npm install and import the component in your Vue app
  • As a component, it should run in any modern web browser

Crate-O is developed with the Vue 3 framework and exposes the Vue component CrateEditor, which can be imported as a library by any Vue app. (The slide highlights the CrateEditor UI part.)

Crate-O is a single-page front-end web app developed using the Vue JavaScript framework. A Vue app is built by composing and nesting modular structures called components.

The main part of Crate-O is bundled into the CrateEditor component, which allows a user to view, add, edit, and delete properties of an entity; add and delete entities; and navigate and browse all entities in the crate.

The CrateEditor component is exported as an ECMAScript module and can be imported into any Vue web app by adding Crate-O as a dependency.

The default main app (for showcasing) only runs in Chromium-based browsers because it requires the showDirectoryPicker method from the File System API to access (read and write) a directory on the local machine; the feature is still experimental and not available elsewhere. The CrateEditor component itself, however, does not require it.
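For illustration, here is a minimal sketch of the kind of call involved. showDirectoryPicker is a standard (if experimental) browser API, but the directory-scanning logic below is an assumption for the example, not Crate-O's actual code:

    // showDirectoryPicker() is currently implemented only in Chromium-based
    // browsers, which is why the default Crate-O app is Chrome-only.
    // The scanning logic below is illustrative, not Crate-O's actual code.
    async function openCrateDirectory() {
      // Prompt the user to choose a local directory, with read/write access.
      const dirHandle = await window.showDirectoryPicker({ mode: 'readwrite' });

      // Walk the directory's entries looking for the crate metadata file.
      for await (const [name, handle] of dirHandle.entries()) {
        if (name === 'ro-crate-metadata.json' && handle.kind === 'file') {
          const file = await handle.getFile();
          return JSON.parse(await file.text());
        }
      }
      return null; // no RO-Crate metadata found in the chosen directory
    }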


A very basic example of embedding the Crate-O CrateEditor in a Vue app:

The example is a Vue Single-File Component (SFC, a *.vue file); a full sketch follows the list below.

  • Inside the <script> block, import the CrateEditor component from the Crate-O package

  • Initialise all required variables

  • Add the <crate-editor> tag inside the <template> and pass in the data via the attributes:

  • crate: a plain JS object in JSON-LD format, usually the result of JSON.parse() on the string content of an ro-crate-metadata.json file

  • mode: a plain js object conforming to the ro-crate-mode syntax
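For concreteness, here is a minimal sketch of such an SFC. The package name ("crate-o"), the named export ("CrateEditor"), and the prop names follow the slide text above, but the exact names are assumptions to check against the Crate-O README:

    <script setup>
    // Assumption: the NPM package is "crate-o" and exposes a named
    // export "CrateEditor"; check the Crate-O README for the actual names.
    import { CrateEditor } from 'crate-o'

    // crate: a plain JS object in JSON-LD format, usually the result of
    // JSON.parse() on the contents of an ro-crate-metadata.json file.
    const crate = {
      '@context': 'https://w3id.org/ro/crate/1.1/context',
      '@graph': [
        {
          '@id': 'ro-crate-metadata.json',
          '@type': 'CreativeWork',
          about: { '@id': './' }
        },
        { '@id': './', '@type': 'Dataset', name: 'Example crate' }
      ]
    }

    // mode: a plain JS object conforming to the ro-crate-mode syntax
    // (see the soss-profiles document linked above).
    const mode = {
      metadata: { name: 'Minimal example mode', version: 0.1 },
      classes: {
        Dataset: {
          id: 'http://schema.org/Dataset',
          subClassOf: [],
          inputs: []
        }
      }
    }
    </script>

    <template>
      <!-- Pass the crate and mode in via attributes -->
      <crate-editor :crate="crate" :mode="mode" />
    </template>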

Data can be imported into Crate-O using spreadsheets, which is an efficient way to create metadata for collections of objects. Spreadsheet skills are common, and many projects have already used spreadsheets to describe and manage files; we work with data custodians to standardize this approach so that they can create rich linked-data metadata.


Crate-O can be found at https://github.com/Language-Research-Technology/crate-o

#ODDStories 2024 @ Ningi, Nigeria 🇳🇬 / Open Knowledge Foundation

The MUMSA Initiative, a youth-driven non-profit organization, successfully held a two-day hackathon titled “Hacking for Healthy Food & Green Futures” at Ningi, Nigeria on March 6th and 7th, 2024. This event aligned perfectly with Open Data Day 2024 and empowered young people in Ningi to address critical local challenges through the power of open data. 

Thematic Focus: open data for advancing sustainable development goals (SDGs) – specifically, SDG 2 (Zero Hunger), SDG 3 (Good Health and Well-being), and SDG 13 (Climate Action)

The hackathon brought together passionate young minds from different schools and from within the community to tackle the interconnected issues of food security, mental health, and climate change. Participants leveraged local and national open datasets on agriculture, nutrition, weather, mental health resources, and environmental indicators.

Over the two days, teams developed groundbreaking solutions that directly impact their community. These solutions included:

  • Data-driven strategies for identifying areas with food insecurity and optimizing crop selection based on climate data. This aims to empower local farmers to make informed decisions and improve food production.
  • Interventions to address local mental health needs and awareness campaigns based on real-time data, which increase access to resources and promote mental well-being.
  • Promotion of climate-smart agricultural practices through data analysis. This approach facilitates the reduction of food waste and fosters progress towards environmental goals.

MUMSA Initiative ensured a well-rounded experience by:

  • Equipping participants with the skills to access, analyze, and use open data effectively.
  • Organizing participants into teams, with facilitators providing guidance and support throughout the hackathon.
  • Encouraging participants to share their ideas for wider impact, maximizing the reach and potential of their solutions.

The “Hacking for Healthy Food & Green Futures” hackathon is a testament to the power of engaged youth. This event serves as a model for other organizations and communities seeking to empower young people to use open data and tackle real-world challenges.


About Open Data Day

Open Data Day (ODD) is an annual celebration of open data all over the world. Groups from many countries create local events on the day where they will use open data in their communities.

As a way to increase the representation of different cultures, since 2023 we have offered organisations the opportunity to host an Open Data Day event on the best date within a one-week period. In 2024, a total of 287 events happened all over the world between March 2nd and 8th, in more than 60 countries, using 15 different languages.

All outputs are open for everyone to use and re-use.

In 2024, Open Data Day was also a part of the HOT OpenSummit ’23-24 initiative, a creative programme of global event collaborations that leverages experience, passion and connection to drive strong networks and collective action across the humanitarian open mapping movement.

For more information, you can reach out to the Open Knowledge Foundation team by emailing opendataday@okfn.org. You can also join the Open Data Day Google Group to ask for advice or share tips and get connected with others.

We Are Makers – how we work at Artefacto / Artefacto

While we wear many hats, building things has always been central to our work at Artefacto. We’ve always described ourselves as both makers and librarians and our consultancy, training and design work fits within these identities.  We are especially excited when we can build and share tools, resources and platforms that deliver a user-friendly experience.  [...]


Cross-Searching Simplified & Traditional Chinese / Library Tech Talk (U of Michigan)

[Image: Catalog Search showing Chinese-language search results. Caption: A search for the traditional-character Chinese phrase "戶籍" shows the same results as the equivalent search for the simplified characters "户籍".]

The U-M Library recently added the capability to search across Chinese-language materials in our catalog, regardless of which Chinese character set was used in the query or the record. This improvement expands access to our large collection of Chinese-language materials and improves the user experience.
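The post does not describe the mechanism, but one common approach, sketched below purely as an illustration and not as the U-M Library's actual implementation, is to normalize both the query and the indexed text to a single character set using a traditional-to-simplified mapping table:

    // Illustration only; not the U-M Library's actual implementation.
    // Mapping traditional characters to their simplified equivalents at
    // both index time and query time makes "戶籍" and "户籍" interchangeable.
    const TRAD_TO_SIMP = new Map([
      ['戶', '户'], // traditional 戶 maps to simplified 户
      // ... a full table covers thousands of characters; many (like 籍)
      // are identical in both sets and need no entry.
    ]);

    function toSimplified(text) {
      return Array.from(text)
        .map((ch) => TRAD_TO_SIMP.get(ch) ?? ch)
        .join('');
    }

    console.log(toSimplified('戶籍') === toSimplified('户籍')); // true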