Planet Code4Lib

Free LITA Webinar: Protect Library Data While Working From Home / LITA

A Crash Course in Protecting Library Data While Working From Home

Presenter: Becky Yoose, Founder / Library Data Privacy Consultant, LDH Consulting Services

Thursday, April 9, 2020

1:00 – 2:00 pm Central Time

There’s a seat waiting for you… Register for this free LITA webinar today!

Libraries across the U.S. rapidly closed their doors to both public and staff in the last two weeks, leaving many staff to work from home. Several library workers might be working from home for the first time in their current positions, while many others were not fully prepared to switch over to remote work in a matter of days, or even hours, before the library closed. In the rush to migrate library workers to remote work and to migrate physical library programs and services to online, data privacy and security sometimes get lost in the mix. Unfamiliar settings, new routines, and increased reliance on vendor technology all put library data privacy at risk.

This 60-minute interactive webinar will guide library workers in protecting patron data as they adjust to the new normal of working from home. Participants will also have the chance to ask questions of their fellow attendees or share how they are addressing library data privacy at their library in the webinar.

Learning objectives for this program include:

  •  Identify and understand key risks to library data privacy and security from a rapid shift to working from home
  •  Learn and implement strategies and tools for mitigating common privacy and security risks to library data when working from home
  •  Assess and plan how to protect library data while working remotely for the long term

Who Should Attend:

  •  Library workers working from home for the first time in their current positions or careers
  •  Library technology workers responsible for guiding library workers in securing their home offices while working remotely
  •  Library administrators who are concerned about how library user data privacy and security can be affected by shifting to remote work, and what their staff can do to mitigate additional risks

Register online:

Special thanks to LDH Consulting Services for sponsoring this free webinar.

All of a sudden, I’m working from home. Now what do I do? / HangingTogether

Working from home is a new experience for many of us in the library community, and we are collectively facing challenges while not only “working from home” but also “homing from work.” We hear from library colleagues across the globe who have abruptly transitioned to work-from-home. You are converting your home to a workplace (frequently with new “colleagues” such as roommates, spouses, children and pets). You may also be re-thinking what work looks like, when done outside the library. You may be navigating software and systems challenges (VPN, anyone?).

Many of us in OCLC’s Membership and Research group have experience working from home, and we offer some tips that will hopefully help. We recognize that these are not ideal circumstances and that this is an evolving situation, so we are not advising a home office makeover, or investment in special equipment. We hope you will offer your own tips and wisdom in the comments.

Lynn Silipigni Connaway has worked remotely for 18 years

Set boundaries. You can learn from my mistakes! I did not want my colleagues to think that I was not working, so I made myself available – too available – 24/7. Develop a schedule and routine and set parameters. A friend recommends the Pomodoro Technique, developed by Francesco Cirillo, who named the system after the tomato-shaped timer he used to track his work as a student. Organize your day into 25-minute chunks, with 5-minute breaks. Twice a day, take a longer break – 15 – 20 minutes. When the day is over, shut down the computer, close the door (if you have a separate office), and walk away until the next workday begins. This is difficult for many of us. However, try not to be “on” constantly.

Mentally and physically prepare for work. Get up, dress, and have breakfast. You can be casual, but plan as though you are going to an office or meeting outside of the home. This helps to put you in the work mindset.

Hydrate. Have water at your work area since you may be on multiple, consecutive phone calls or video conferences. This also will force you to stand up, take a bio break, and replenish your water.

Merrilee Proffitt has been working from home for 15 months and started practicing “shelter in place” with her family on March 13. She has some thoughts on working from home with children

Calendaring. Even for those of us who have been working from home for a long period of time, having the whole family home with us at the same time presents some new challenges. Colleagues in the library world with young children report that back-to-back meetings are not realistic; they need to space out their workday to spend time with those younger family members. If this is the case for you, do “defensive calendaring” and communicate your needs to team members. And if you work with people with young children, recognize that your colleagues may need additional support in this regard.

Leave Space. When you’re adding meetings to calendars, make sure to leave some space for people. No one who is telecommuting needs back-to-back-to-back-to-back meetings. Give at least 15-minute breathers in people’s schedules, to allow for stretching, resting eyes, refilling water, answering email, and recharging introvert batteries.

Routine. I am working at home with my 13-year-old, and we’ve created a schedule with 2-3 hours for focused academic time, and the remainder of the time is devoted to creative time, outdoor time, music practice and chores. Of her own accord, my daughter has volunteered to offer a free-of-charge virtual babysitting service for parents with younger children who need focus time and has also offered to be a “pen pal” for kids who are closer to her age. It has been great to see her and other children see themselves as “helpers.” This time has also given her a firsthand glimpse into the work that I do; she’s always been curious about my work, but overhearing my work conversations on a regular basis has given her a front row seat.

Sharon Streams has regularly telecommuted between home and an office that’s a 90-mile commute away

The big switch for me is that I am now connecting to a global network of colleagues sitting in their homes! These tips are written to help you and others adjust to working together while not in the same building.

Accommodating styles. In a typical office, there is a mix of work styles: those who drop by your desk unannounced, others who want long uninterrupted focus time, the ones who always remember birthdays or plan happy hour, “the scheduler” who likes meetings, early birds, night owls, and so on. Our work personalities follow us home, so we need to accommodate styles in the online environment. Think about how you can use email, IM, web conferencing, telephone, and text to get the level and type of communication that suits you best. But also remember that preferences vary; meet your colleagues halfway to help them feel connected and supported. If you used to bring in a cake to the workplace on someone’s birthday, send cake GIFs over your messaging platform. If you work with a person who would drop by to “just throw some ideas” at you, invite them to continue sharing their practice on a preferred platform.

About face. If you are unable to have video meetings where you can see other people’s expressions and read their body language, be extra mindful of how you express yourself verbally or in writing so that you do not create misunderstanding or hurt feelings. When you are on video, keep your self-view turned on so that you can see if you are unintentionally scowling at the speaker (or have food on your face).

Image: a note reading "Dude, I'm on a video call" (caption: Not right now)

Order of operations. I have typically approached meetings like a mullet: business in the front and party at the end. Now, when we are navigating very unsettling circumstances, I recommend switching it around. Start by checking in with everyone to see how they are feeling, then proceed with your agenda.

Dude! Post a sign that says “ON VIDEO CALL” or something, so that family members* (other than toddlers or pets) do not walk into camera view or ask if you would like lunch. Here’s the post-it I stick on the basement door for my spouse to see.

Prior to joining OCLC in 2016, Rebecca Bryant worked from home during her time as Community Director for ORCID, and shares her experience working from home with pets

My cat, Buddha, was so glad to have me home during the day, and he wanted to always be on my desk, on top of whatever I was working on. He grew jealous if I was on a videoconference, and he would act out during those times, gnawing on the power and internet cables or offering some full-throated vocalizations. He was a huge distraction, and I needed to get him off my desk.

Image: orange cat on a desk (caption: Buddha, “supervising”)

I finally realized I didn’t have to fight with him—I just needed to give him a better option. I put a basket with a fleece pillow on top of the cabinet behind my desk which quickly became his new spot. He was still near me but in a safe, confined space where he could watch everything. In fact, I’m quite sure he thought he could better “supervise” me, as he watched over my shoulder all day.  He could also more effectively participate in video calls, as he had full access to my screen. Later my woodworker husband made a small shelf for him, offering him corner office views.

Our pets love us and want to be near us, and we can offer them spaces that help them be near, but out of our way.

Image: orange cat in a basket (caption: Buddha lends a hand)

Annette Dortmund has been working from home since 2005, initially from a desk in her living room

Create a work / home division. Working and living in the same space can present challenges. Rituals can help keep a clear divide and ease transition between the two. For example, if you only have one work desk (very likely!) use different parts of the desk for your work and personal lives. Box your home materials away in the evening, and box work stuff away when you are done for the day. I use different operating systems and applications for work and private life. Even on one laptop, just using different browsers, with a different set of bookmarks, can make all the difference. Failing all that, one can use different cups for coffee, work ones and after-work ones. One can dress for work and change dress for private life; this need not be head to toe, it can be jacket on/off, whatever. Small changes in the environment can help support a conscious change of mindset.

On the other hand, if washing the dishes or doing the laundry helps you sort out a tricky work-related problem, by all means, take advantage of being at home.

Rethink your calendar. In addition to the calendar mindfulness mentioned above, you can also take control of your calendar by blocking out the time you need for family or exercise, and by pre-scheduling a regular series of breaks, including comfort breaks, during the work part of your day. (If this sounds crazy to you, I have been in situations where I found this necessary.) A five-minute buffer can make all the difference. And now that we are all meeting virtually, consider switching to 45- or 50-minute meetings as a rule. As agendas fill up, this could greatly help.

Meryl Cinnamon, a team member who has worked from home since 2005, offers this final bit of advice for avoiding temptation: “Do not put out candy dishes when working from home. Those little pieces of sugar can become way too inviting!”

Want more? Our colleague Andy Bush shared his tips for working from home after twenty years of experience – they are highlighted on Skip Prichard’s blog.

The post All of a sudden, I’m working from home. Now what do I do? appeared first on Hanging Together.

Connecting environmental politics and data in Brazil: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Marília Gehrke from Afonte Jornalismo de Dados (Afonte Data Journalism) in Brazil, who received funding from Resource Watch to raise awareness about environmental politics and empower the community to use public and open data.

Open Data Day Porto Alegre panelists and organising team (photo: Juliana Spilimbergo)

People who attended Open Data Day Porto Alegre learned about the ecosystem where they live. Through graphics, figures, maps, and even a new database released during the event, three panelists explained the impact of the Mina Guaíba installation. The project involves a coal mine exploration, which will affect Porto Alegre, in the Brazilian South, and its metropolitan region. If it occurs, about 166 million tonnes of coal might be extracted in 23 years. One of the main problems is pollution: the Jacuí River, and consequently Guaíba Lake, which supplies water for the city, will be at risk of contamination. Approximately 4.6 million people would be affected.

“How can society feel safe about the coal mine?”, asked Dr. Rualdo Menegat, professor at the Geosciences Institute of the Federal University of Rio Grande do Sul (UFRGS) and one of the panelists. According to him, the project that aims to start coal exploration does not account for potential risks and environmental emergencies: natural disasters, explosions, fires, storms, and inundations might happen. He also presented a periodic table of substances that are part of the chemical composition of coal. “It is chemical garbage,” he summarised.

Dr. Marilene Maia, the coordinator of the Observatory of Realities and Public Policies of Vale do Rio dos Sinos (Observasinos), believes science and data, as well as public transparency, are essential for people to be aware of the mine’s risks. She said that sometimes data is available, but is not accessible because citizens do not comprehend it.

Presentation by Iporã Possantti (photo: Marília Gehrke)

Iporã Possantti, who is an environmental engineer and also a member of the group Critical Environment (Coletivo Ambiente Crítico), organised and released a new database to empower the community and inspire people to investigate. Territorial and georeferenced information will allow the creation of maps and provide inputs for data analysis. The main goal, he said, is to offer structured data that is already public in different places, but can disappear depending on the governors’ decisions.

Open Data Day in Porto Alegre also had a workshop to stimulate the use of the Access to Information Law (a Brazilian version of the Freedom of Information Act) to obtain public data that are not made available unless someone requests them. Bruno Morassutti, LL.M., who is a lawyer and specialist in this topic, showed several examples of how to access websites and protocols to ask for information. He also presented the environmental legislation in Brazil to support the arguments for the requests.

The audience was able to ask questions during the event. In the first panel, invited journalists – freelancers and professionals from different news media – started the debate. Overall, the community expressed concern about the future of environmental events, and people who did not know the data presented were surprised. “It is not an event that starts tomorrow and ends at the end of the year. It will affect future generations”, said Dr. Menegat about the coal mine.

About 60 people attended Open Data Day in Porto Alegre. For the second year in a row, journalists Marília Gehrke and Taís Seibt, from Afonte Jornalismo de Dados (Afonte Data Journalism), organised the event with support from Unisinos University. All the presentations (in Portuguese) are available online. The event was covered by regional media and in posts on Twitter.

Communicating with Information: Creating Inclusive Learning Environments for Students with ASD / In the Library, With the Lead Pipe

In Brief
The focus of this article is twofold: it 1) considers how digital humanities techniques and methodologies increase accessibility and scholarship opportunities for students with Autism Spectrum Disorder; and 2) outlines how libraries can collaborate with existing services to provide subsequently appropriate supports for students. Autism Spectrum Disorder (ASD), one of the increasingly prevalent manifestations of neurodiversity within higher education, presents significant challenges to students interacting with academic resources and producing traditional scholarly outputs. Traditional scholarship presupposes that students possess a certain level of ability to interact with their course materials, analyze that interaction, and then write both about their interaction and analyses. However, limitations in working memory and Theory of Mind create additional barriers for students with ASD in meeting these presumptions. Fortunately, emerging scholarly practices within the digital humanities now provide more equitable mediums for scholastic output as well as new opportunities for students to access and interact with course content and materials. While current structures within academia presuppose that students are able to interact with materials in a specific way, libraries are uniquely positioned to collaborate with constituent departments and services across campuses of higher education to teach students emerging strategies to more effectively interact with scholarly materials.

By Frederick C. Carey


Institutions of higher education not only offer students the academic freedom to cultivate intellectual interests and develop skills that they can hone into lifelong careers, but they also establish social and professional expectations that provide the foundations for sustained success. As such, students are expected to interact with social and professional networks both in person and virtually. However, studies show that perpetual connectivity through social media and other technological platforms contributes to increased cases of stress, anxiety, and depression.1 Therefore, institutions of higher education support students’ needs in these areas by offering mentoring, mental health, and transitional services to better equip students to successfully adapt and thrive within their new environments. The effectiveness of these services, however, is explicitly connected to the makeup of the student population they serve.

Currently, student populations across higher education continue to grow increasingly neurodiverse,2 and as such, both social and academic services have been institutionalized to meet student needs. Institutions provide supports for transitioning into new routines; navigating new social structures both in and outside of classroom settings; managing fatigue and sensory overload; treating anxiety, depression, and stress; as well as developing executive function (EF) skills related to planning, organizing, and prioritizing information; self-monitoring; self-regulating; and creating time management plans. These services are essential for acclimating to the social and professional structures of higher education and post-collegiate life, but do not provide all the tools neurodivergent students need to succeed in academia.

Autism Spectrum Disorder (ASD), one of the increasingly prevalent manifestations of neurodiversity within higher education, presents significant challenges to students interacting with academic resources and producing traditional scholarly outputs. Traditional scholarship presupposes that students possess a certain level of ability to interact with their course materials, analyze that interaction, and then write both about their interaction and analyses. However, limitations in working memory and Theory of Mind (ToM) create additional barriers for students with ASD in meeting these presumptions. Fortunately, emerging scholarly practices within the digital humanities (DH) now provide more equitable mediums for scholastic output as well as new opportunities for students to access and interact with course content and materials. While current structures within academia presuppose that students are able to interact with materials in a specific way, libraries are uniquely positioned to collaborate with constituent departments and services across campuses of higher education to teach students emerging strategies to more effectively interact with scholarly materials. Therefore, the focus of this article is twofold: it 1) considers how digital humanities techniques and methodologies increase accessibility to course materials and scholarship opportunities for students with Autism Spectrum Disorder; and 2) outlines how libraries can collaborate with existing services to provide subsequently appropriate supports for students to more effectively interact with their course materials.

Autism Spectrum Disorder

The 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) characterizes ASD as a range of neurodevelopmental conditions that manifest through either deficiency in social interaction and communication across multiple contexts, or restricted, repetitive patterns of behavior, interests, or activities.3 Since the publication of DSM-5 in 2013, ASD now “encompasses disorders previously referred to as early infantile autism, childhood autism, Kanner’s autism, high-functioning autism, atypical autism, pervasive developmental disorder not otherwise specified, childhood disintegrative disorder, and Asperger’s disorder.”4 Subsequently, the challenges that those with ASD face can vary depending on the manifestation in each individual.

ASD continues to grow as one of the most common manifestations of neurodivergence both inside and outside of higher education. The Centers for Disease Control and Prevention’s most recent statistics indicate that the overall prevalence of ASD is approximately 1 in 59 children over the age of 8 years old, or approximately 1.7% of the overall student population.5 Despite the increased prevalence and understanding of ASD, graduation rates within higher education for students with ASD remain low. According to a 2011 report commissioned by the US Department of Education, 52% of students without any registered disability graduate from their respective programs, while only 39% of students with ASD graduate.6 These statistics reveal a gap in equitable higher education opportunities for students with ASD.

This gap becomes even more apparent when considering the number of students who enter higher education without a formal ASD diagnosis or who choose not to disclose their diagnosis. In a study conducted by White et al. evaluating the prevalence of students with ASD on college campuses, none of the 5 participants who met ASD criteria in the sample of 667 had previously been diagnosed.7 Furthermore, Underhill et al. discovered that many students elect not to disclose their diagnosis out of fear of either becoming stigmatized by their instructors and peers or creating new social barriers for themselves.8 Subsequently, these students do not receive many of the supports to which they are entitled, and it is likely that the true gap in graduation rates is larger than the statistics indicate.

Supports prioritizing immediate social, environmental, and executive function challenges are increasingly becoming routine procedure across institutions of higher education. In order to effectively establish equitable learning environments for students with ASD, however, it is imperative that support be given to students in navigating the inherent social and communicative components of scholarship, especially within disciplines that emphasize expository and persuasive writing. Acknowledging these fundamental characteristics of traditional scholarship and the added challenges that they create for students with ASD will positively contribute to establishing more inclusive, equitable learning environments.

Social and Communicative Characteristics of Traditional Scholarship

The social and communicative interactions inherent within traditional modes of scholarship create barriers for students with ASD. Despite oral communication barriers appearing more immediate than those created by written language due to observable extrinsic manifestations, the skills required to understand and interpret both modes of communication remain similar. In fact, the syntactical structure and language of the written word are often more complex than oral speech.9 This can be especially true of the materials that students work with in higher education that, depending on the discipline, may incorporate high amounts of technical writing, figurative language, or older systems of speech that are no longer used in contemporary language.

Language comprehension is established by forming inferences and hypotheses from the language used, the schemata in which it exists, and the context in which it was delivered. It presupposes an inherent understanding of the social constructs of language.10 In order to accurately and effectively make inferences based on the schemata and structure of the communicated information, one must have mastered the social context in which the information exists and is delivered. Current social skills interventions offered through therapy treatments can assist those with ASD to interpret facial expressions, body language, and other markers to better navigate social interactions.11 These strategies can be used to indicate when sarcasm, metaphor, or other nonliteral expressions of language may be changing the meaning of what is spoken.12 However, students do not have the same markers to help them recognize such constructs when reading. In speaking of figurative language, Vulchanova et al. state that “such expressions are characterized by interpretations which cannot be retrieved by simply knowing basic senses of constituent lexical item, and where the addressee needs to arrive at the intended meaning rather than what is being said.”13 Therefore, while the skills required to understand oral and written language are similar, interpreting written language relies solely on the intrinsic social and communicative literacy of the reader, while oral language interpretation can benefit from extrinsic interventions.

Producing written language, however, proves even more challenging than interpreting it. In a study on effective writing interventions for students with ASD, Accardo et al. state that “writing has a social context, follows rules and conventions, and makes use of inferences and ambiguous meaning to convey humor and metaphor, all of which can be challenging to individuals with ASD.”14 When reading, students only need to recognize the social context of what is presented, but when writing, they are expected to recreate that social context and use it to deliver their thoughts and findings. The skills needed to recognize social structures differ drastically from those needed to replicate these structures, and as such students with ASD face significant barriers in producing traditional scholarly outputs. Furthermore, the rules and conventions of writing differ depending on genre. In a 2020 study, Price et al. demonstrate that expository and persuasive writing prove more challenging than narrative writing for students with ASD.15 Additionally, Walters’ 2015 case study into the experiences of two first year writing students with ASD states that one student “struggled to translate her passion for writing into the classroom because her ways of writing – particularly in her fan fiction communities – were not valued as social or socially meaningful in her course.”16 Students in higher education are not only expected to write across genres, but also are often writing across academic disciplines that incorporate their own specific conventions. All of these challenges can be further understood by considering the roles of working memory and Theory of Mind in these processes.

Working Memory

Working memory proves essential for communicating any thoughts, ideas, or connections as it dictates the amount of information an individual can efficiently process at any given time. Camos and Barrouillet describe it “as a kind of mental space, located in frontal lobes of the brain, corresponding to a quick-access memory able to hold temporary, transient plans for guiding behavior.”17 It enables the multitasking functionality required when making connections, taking notes, and presenting information. Subsequently, students with ASD experience numerous challenges when interacting with their course materials due to limitations in their working memory. Thoughts easily get lost while considering the syntactical components and structure of language when performing tasks such as reading and writing. Graham et al. point to spelling as one such challenge. They state that “students may forget plans and ideas they are trying to hold in working memory as they stop to think about how to spell a word.”18 Similarly, thoughts and connections can be lost when attempting to parse the syntax and structure of complex writing, metaphors, figurative language, or other nonliteral structures. While executive function strategies, such as immediately writing down thoughts when you have them, are helpful techniques for overcoming such challenges, limitations in working memory present persistent obstacles for students with ASD.

Theory of Mind

ToM directly impacts how individuals recognize, empathize, and interact both with thoughts and emotions, and subsequently highlights many of the challenges that students with ASD face when interacting with their course materials. The role of ToM can be better understood by distinguishing between cognitive ToM and affective ToM. Pino et al. state that “cognitive ToM refers to the ability to make inferences about beliefs, intentions, motivations and thinking, whereas affective ToM is the ability to understand what people feel in specific emotional contexts such as their own emotional states.”19 In order to effectively make inferences and connections through cognitive ToM, it is necessary to recognize and understand emotional states and undertones through affective ToM. Scholarship, especially in the humanities, expects a high-level cognitive ToM, and subsequently, a strong foundation in affective ToM. However, the inherent social and communicative components of language and traditional scholarship create major barriers for students with ASD in establishing an affective ToM foundation. Limitations in working memory further exacerbate this loose foundation as students attempt to build upon it using the skills involved in cognitive ToM. Furthermore, studies demonstrate that students with ASD do not develop ToM skills at the same rate as their peers. ToM development progresses in a specific sequence, and Broekhof et al. demonstrate that while students with ASD follow the same sequence as their peers, their developmental timeline is comparatively delayed.20 In order to create equitable and inclusive learning environments in institutions of higher education, it is therefore essential that supports be implemented to assist students with ASD in overcoming these barriers and accessing course materials more effectively.

Emerging Opportunities

Over the last few decades, research and the way it is conducted have developed just as rapidly as the technology available to researchers. In reflecting upon research developments during this era of technological growth, it is easy to think about the way that new (and not so new) tools have been adopted into the research process. The digital humanities, however, encapsulates much more than just tools and how they can be integrated into humanities research. DH represents the discovery of new methodologies for doing research, new ways of interacting with materials, and new manners for telling stories and disseminating knowledge. DH is not a replacement for the humanities; it enlarges the scope of what is possible within the humanities and how humanities research can be done. It increases accessibility not only to how materials can be analyzed and interrogated, but also to how information can be shared and communicated. It allows for a much more inclusive environment that invites new perspectives and collaborations across disciplines.

Not only do DH methodologies, techniques, and outputs grow the humanities, but they can also provide respite to many of the scholastic challenges that students with ASD face. These emerging scholarly practices create opportunities for people to access and interact with materials in ways that were previously not possible. Textual analysis techniques such as sentiment analyses and topic modeling can provide students with ASD opportunities to move beyond some of the challenges they face when interacting with course materials. Various forms of visualizations can provide alternative scholastic outputs for students instead of the more limiting traditional forms. DH practices can not only provide students with ASD the opportunity to interact with scholarly materials in a more unrestrained way, but they can also empower students to communicate their work and tell the stories they are interested in telling through a more unrestricted outlet.

Furthermore, libraries have emerged as the center of DH support in institutions of higher education. This is due in part to libraries serving the needs of all constituent departments as a neutral entity. More importantly, however, libraries are devoted to helping students develop information literacy skills. The Association of College & Research Libraries’ (ACRL) Framework for Information Literacy for Higher Education (Framework) defines information literacy as “the set of integrated abilities encompassing the reflective discovery of information, the understanding of how information is produced and valued, and the use of information in creating new knowledge and participating ethically in communities of learning.”21 The values of information literacy and DH methodologies and practices ideally dovetail to make libraries the natural support structure for DH projects.

Textual Analysis Strategies

McKee describes a textual analysis as “a methodology – a data-gathering process – for those researchers who want to understand the ways in which members of various cultures and subcultures make sense of who they are, and of how they fit into the world in which they live.”22 Traditionally, researchers conduct such analyses by interrogating, interacting with, and interpreting texts through close readings that combine their individual perspectives, contextual awareness, and the structures of the texts undergoing analysis. However, through DH practices the scope of what can be analyzed and how things are analyzed continues to grow larger. Individual words and the subsequent grammatical and syntactical structures in which they exist can now be analyzed as individual data points that allow increased accessibility to texts. Information hidden in the structure of the texts now can be mined, visualized, and interpreted. These practices do not replace traditional processes for gathering data from texts; instead, they provide alternate access points for individuals to interact with the data, identify patterns and trends, and interpret the information presented. These alternative access points present students with ASD increased opportunity to interact with texts and bypass some of the social and communicative structures inherent within them.

Idioms, similes, metaphors, and other representations of figurative language all base their comparisons on an intuited set of shared characteristics. Glucksberg claims that one technique for grasping the abstract meaning of figurative language is categorization, which “involves finding the nearest available category that subsumes both X and Y.”23 As previously discussed, connecting abstract concepts presents a barrier for students with ASD and consumes a large amount of their working memory. Topic modeling is a textual analysis strategy that simplifies this process by clustering similarly used words together to help illuminate the syntactical structure and schemata of the text. This allows students to more easily recognize patterns based on how the words are used within the local context, and focus on the meaning of those patterns instead of struggling to establish the syntactical structure of the text. Students are able to establish labels for these word clusters based on those patterns and assign their own meanings and interpretations to the groupings. The structures created by topic modeling allow students to move beyond the social and communicative schemata used to deliver the meaning, create a more solid affective ToM foundation, and maximize the amount of working memory available to interact with the meaning of a text through cognitive ToM skills.
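To make the idea of clustering similarly used words more concrete, the following is a minimal sketch in Python. It assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, which is one common topic-modeling approach; the discussion above does not prescribe a particular algorithm. The sample sentences, the stopword list, the function names (tokenize, lda_topics), and the parameter values are all hypothetical and chosen only for illustration. A student working with real course texts would more likely rely on an established topic-modeling tool than on a hand-written sampler.

```python
import random
from collections import Counter

# Hypothetical stopword list and sample sentences, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "until", "after", "over"}
DOCS = [
    "The river flooded the valley and the farms.",
    "Rain swelled the river until the valley flooded.",
    "The senate passed the tax bill after debate.",
    "Debate over the tax bill divided the senate.",
]

def tokenize(text):
    """Lowercase each word, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def lda_topics(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01,
               top_n=4, seed=0):
    """Fit a tiny LDA model by collapsed Gibbs sampling.

    Returns one list per topic holding that topic's most frequent words.
    """
    rng = random.Random(seed)
    tokens = [tokenize(d) for d in docs]
    vocab_size = len({w for doc in tokens for w in doc})

    doc_topic = [[0] * num_topics for _ in tokens]      # words per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assign = []                                          # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(num_topics)
    ]

if __name__ == "__main__":
    for number, words in enumerate(lda_topics(DOCS), start=1):
        print(f"Topic {number}: {', '.join(words)}")
```

The output is a short list of words per cluster. The model supplies no labels of its own, which matches the point made above: it is the student who names each grouping and assigns it meaning.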

Students can also perform a sentiment analysis on a text as a strategy for moving beyond literal language. Sentiment analyses, or opinion mining, allow students to perform emotion recognitions and polarity detections to establish words or phrases in a text that represent emotional meanings. Emotion recognitions can not only help solidify an affective ToM foundation within the context of any given text, but they can also alleviate some challenges posed by limitations in working memory by providing a non-abstract structure for students to recognize and assign more figurative and abstract concepts. Similarly, polarity detection creates a structure in which abstract ideas can be categorized by emotional relation and be used comparatively. Cambria states that polarity detection is “usually a binary classification task with outputs such as ‘positive’ versus ‘negative,’ ‘thumbs up’ versus ‘thumbs down,’ or ‘like’ versus ‘dislike’.”24 Such identification can be especially useful in comparing voices within a single text or comparing tone within larger corpora. Similarly to topic modeling, sentiment analyses maximize students’ functional ability to employ cognitive ToM skills to interact with course materials beyond the meaning of the language within that set schemata.
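As a rough illustration of polarity detection, the sketch below uses a simple word-list (lexicon) approach in Python. The two word lists, the function name polarity, and the "neutral" label for ties are assumptions made for this example; they do not come from Cambria or from any published sentiment lexicon, and real sentiment analysis tools use far larger lexicons or trained models.

```python
# Hypothetical miniature emotion lexicons, for illustration only.
POSITIVE = {"joy", "love", "bright", "hope", "delight", "warm"}
NEGATIVE = {"grief", "fear", "dark", "cold", "lonely", "dread"}

def polarity(sentence):
    """Label a sentence and report which words carried the emotion.

    Returns a tuple: (label, positive_words, negative_words).
    """
    words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
    positive_words = [w for w in words if w in POSITIVE]
    negative_words = [w for w in words if w in NEGATIVE]
    if len(positive_words) > len(negative_words):
        label = "positive"
    elif len(negative_words) > len(positive_words):
        label = "negative"
    else:
        # Ties, including sentences with no lexicon words, are left neutral.
        label = "neutral"
    return label, positive_words, negative_words

if __name__ == "__main__":
    for line in [
        "Her heart was full of joy and hope.",
        "A cold dread filled the dark house.",
        "The train left at noon.",
    ]:
        print(polarity(line), "<-", line)
```

Returning the matched words alongside the label is a deliberate choice here: it gives the student a concrete, non-abstract list of the words that signal emotional meaning, rather than only a verdict.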

Strategies such as these relate directly to two of the threshold concepts in ACRL’s Framework: “Information Has Value” and “Research as Inquiry.” First, ACRL states, “Information possesses several dimensions of value, including as a commodity, as a means of education, as a means to influence, and as a means of negotiating and understanding the world.”25 When interrogating texts, there are several layers of information and dimensions of value. Researchers can extract a plethora of information and insight by conducting a close reading of a text. However, incorporating textual analysis strategies allows for different information and insight to be drawn from different layers of resources. These strategies increase the scope of what is possible when working with texts. Furthermore, ACRL adds that “research is iterative and depends upon asking increasingly complex or new questions whose answers in turn develop additional questions or lines of inquiry in any field.”26 These strategies allow researchers to ask questions and embark down roads of inquiry that were not possible in the past. In helping students develop information literacy skills, librarians encourage the use of new research strategies to find new ways of interacting with and interpreting information encased within materials.

Alternative Outputs

Emerging DH methodologies not only allow for outputs, such as story mapping, geographic information system mapping, and social network analyses, to be considered as alternatives to traditional forms of scholarship, but in some cases they necessitate it. As technological advancements grow and new methods of conducting research emerge, traditional forms of scholarship grow increasingly restrictive. Unilaterally relying on traditional scholarly outputs undermines the research process and places greater emphasis on individual outputs than on the research itself. Scholarly outputs are simply instruments used to communicate knowledge derived from the research process. To adhere to a singular, prescriptive output while more appropriate outputs exist for communicating specific information is not only counterintuitive, but also jeopardizes the impact of the research itself.

In leading the charge to develop students’ information literacy skills, libraries emerge as ideal advocates for promoting the implementation of increased scholarly outputs. ACRL’s Framework cites “Information Creation as a Process” as one of the threshold concepts of information literacy. In defining this frame, ACRL states, “Information in any format is produced to convey a message and is shared via a selected delivery method.”27 It adds that “the iterative processes of researching, creating, revising, and disseminating information vary, and the resulting product reflects these differences.”28 In order to properly assist students in developing information literacy skills, it is therefore essential that librarians not only make students aware of alternative outputs, but that they also advocate to constituent departments on campuses of higher education to do the same. In order to create inclusive learning environments for students with ASD, the emphasis needs to be placed on the research process itself, not the output. Emphasizing traditional outputs highlights limitations beyond students’ control. In focusing emphasis on the research process, students will be empowered to direct their efforts to conducting research and developing strong foundational research strategies. It is imperative to encourage students to communicate their research through the medium that they perceive to be the appropriate output for their project or individual communication style.

Opportunities for Library Supports

Libraries provide an ideal infrastructure for supporting neurodivergent students to more effectively interact with scholarly materials. These supports need to take a more prominent role in conversations regarding the future of information literacy. As emerging scholarly practices continue to become an increasingly prominent part of research, it is important to consider the challenges that neurodivergent students face when interacting with materials, and consider new research techniques and methodologies as opportunities to create more accessible, inclusive learning environments. This endeavor is not only a cornerstone of information literacy, but a principal value of librarianship. In discussing the differences between data and information, Lanning asserts that information needs “some kind of context for their meaning to be discerned.”29 As discussed, there are numerous layers to this context that create barriers for neurodivergent students to effectively interact with information due to the social and communicative aspects of the syntactical schemata in which it exists, limitations in working memory, and comparatively delayed ToM development. However, the unique role of libraries within institutions of higher education creates opportunities to teach emerging research techniques and strategies to students directly, collaborate with services across campus to create more holistic support networks, and work directly with constituent campus departments to establish inclusive learning environments.

Campus-wide Collaborations

In a study into establishing strategies for more effectively integrating student supports into their academics, Dadgar et al. found that all strategies have the same two aims: “(a) to make student services and supports a natural part of students’ college experience and (b) to increase the quality of both support services and instruction.”30 In order to effectively meet these goals with relation to supporting neurodivergent students and establishing a strong network of services, increased collaborations between librarians and disability services, academic mentors and coaches, and advising personnel are crucial. The challenges that neurodivergent students face are multifaceted and require a widespread system of supports that work harmoniously together. Dadgar et al. found that the first step to creating such a network is to connect preexisting services.31 Many established library services, especially one-on-one consultations with librarians, can prove beneficial to neurodivergent students, but students may not be aware that these services exist. Students who disclose their diagnoses and seek supports from campus are involved in at least some, if not all, of the aforementioned programs, so increased collaborations can increase visibility of preexisting library services.

Such collaborations would also invite the establishment of new supports. In a 2018 survey assessing which supports students with ASD found most helpful, Accardo et al. discovered that 91% of participants identified academic coaching as a preferred service, with one participant adding that coaching is a support that “isn’t contingent on somebody’s agenda for me.”32 Academic coaching and mentoring provide increased agency to students, and librarians can positively contribute to furthering that development by providing services around interacting with course materials. If greater collaboration exists between librarians and mentors, then mentors will be able both to suggest to their students specific library services that benefit their individual goals and plans, and to suggest to librarians new services that they think would benefit their students. All of these collaborations can help students interact with their course materials by making library services more visible and encouraging increased communication between students and their full network of supports.

Liaising with Constituent Departments

As previously discussed, many neurodivergent students elect not to disclose their diagnoses and subsequently do not receive any of the services to which they are entitled. This makes it all the more important for liaison librarians to work closely with their constituent departments to establish inclusive environments and practices. Much of the outreach that liaison librarians do is already geared towards creating inclusive learning environments, but it is imperative that liaison librarians bring new research strategies both to their students and faculty to ensure continued growth in developing such practices and spaces. As conversations focused on neurodivergent inclusivity within information literacy continue, many new practices will emerge and liaison librarians will be the primary drivers of delivering these practices across campuses. For now, many of these practices within the humanities are emerging through DH engagement, so it is imperative that liaison librarians focus on cultivating DH understanding and acceptance within the culture of their constituent departments. Organizing workshops and presentations that incorporate DH practices relevant to departmental research interests, inviting constituent faculty to collaborate on a project incorporating emerging scholarly practices, and sharing digital projects are a few examples of efforts that may lead to increased opportunities to grow emerging practices in constituent departments. Many disciplines are still in the midst of establishing best practices for considering scholarship and outputs that fall outside the traditional scope, and as such, may be unsure as to how to appropriately encourage students to engage with such practices. Moving forward, libraries will continue to play an integral role not only in supporting the creation of new information and scholarship, but also ensuring that best practices are created for using research innovation to create inclusive learning environments.

Teaching Emerging Research Techniques

The majority of students will engage with library-led information literacy opportunities through supplementary sessions within courses taught through constituent campus departments. While some courses may integrate these sessions at numerous points during a semester, it is common that students either only have the opportunity to participate in one session or are not presented with the opportunity at all. The focus of these content-oriented courses is not to develop information literacy skills for interacting with course materials, but instead is on extrapolating knowledge or ideas by interacting with the course material and then presenting this knowledge through a largely prescriptive medium. Their structures presuppose that students are able to interact with the materials in a specific way, and are not designed to teach students how to interact with the materials themselves. They may introduce new forms of materials and teach students how to use or incorporate those materials, but even within these situations the ability to interact with the information is assumed. While these courses may not be the appropriate place to teach students techniques or research methods that enable a deeper interaction with their texts, such a course is necessary.

The mission and values of librarianship make libraries the ideal home for such courses. Libraries are becoming the central support for emerging scholarly practices and DH, and the devotion that librarians demonstrate to information literacy makes them ideally suited not only to teach students how to interact with materials, but also how to present their work in nontraditional ways. Such courses can empower all students, but especially neurodivergent students, to not only take control of their own research endeavors but also to increase agency when participating in other courses. Despite most academic disciplines requiring some variation of a discipline-specific research and writing course, these courses are structured around traditional academic norms that do not provide neurodivergent students with the supports they need for effectively interacting with materials. If libraries begin offering courses that teach these supports, then neurodivergent students may face reduced barriers in their discipline-specific courses. More research into the effectiveness of such courses needs to be conducted, but indicators discussed in this article suggest that they have the potential to positively contribute to more inclusive learning environments.

Conclusion

Institutions of higher education are currently navigating shifts both in the neurological makeup of student populations and in the composition of scholarship itself. As student populations continue to grow more neurodiverse, and DH practices establish themselves as research norms, libraries will play an important role in establishing more inclusive learning environments for students and faculty. Neurodivergent students face a plethora of challenges beyond those faced by their peers. While many of those challenges are already being supported through various services, there are no institutionalized supports that help students approach the social and communicative aspects of interacting with information and their course materials. Limitations in working memory and ToM development, combined with the social and communicative components inherent within the engagement with and production of traditional modes of scholarship, significantly impact neurodivergent students’ abilities to successfully navigate collegiate expectations. However, libraries can play a decisive role in supporting these students and creating more inclusive learning environments. DH methodologies and practices challenge the limitations of traditional modes of scholarship and provide neurodivergent students an opportunity both to interact with and present information in ways that they were unable to in the past. Libraries can currently teach strategies for interacting with information by integrating into the ever-growing system of services campuses offer students. They can implement research strategy courses that specifically target the research needs of neurodivergent students and advocate for more inclusive practices to be implemented within constituent departments. Moving forward, there is an increasing need for greater emphasis to be placed on supporting the information literacy needs of neurodivergent students. As institutions of higher education continue to grow more neurodiverse, it is the responsibility of libraries to create accessible means and strategies for students to effectively interact with and present information.

Acknowledgements

I would like to extend my sincerest gratitude to peer reviewers Jessica Schomberg and Bethany Redcliffe, as well as publishing editor Ian Beilin for their insight, enthusiasm, and encouragement throughout the review process. Their thoughtful feedback and probing questions contributed immensely to the formation of this article. I would also like to thank Merinda McLure, whose continued support and guidance during the early stages of developing these ideas was irreplaceable. I am very thankful for all of your efforts and contributions to making this the piece that it is. Thank you all!

References

Accardo, Amy L., Elizabeth G. Finnegan, S. Jay Kuder, and Estyr M. Bomgardner. “Writing Interventions for Individuals with Autism Spectrum Disorder: A Research Synthesis.” Journal of Autism and Developmental Disorders, March 5, 2019.

Accardo, Amy L., S. Jay Kuder, and John Woodruff. “Accommodations and Support Services Preferred by College Students with Autism Spectrum Disorder.” Autism, February 23, 2018.

American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (Arlington, VA: American Psychiatric Association, 2013).

Anderson, Anastasia H., Jennifer Stephenson, and Mark Carter. “A Systematic Literature Review of the Experiences and Supports of Students with Autism Spectrum Disorder in Post-Secondary Education.” Research in Autism Spectrum Disorders 39 (July 2017): 33–53.

Association of College and Research Libraries (ACRL). “Framework for Information Literacy for Higher Education.” (2015)

Barnhill, Gena P. “Supporting Students With Asperger Syndrome on College Campuses: Current Practices.” Focus on Autism and Other Developmental Disabilities 31, no. 1 (March 2016): 3–15.

Broekhof, Evelien, Lizet Ketelaar, Lex Stockmann, Annette van Zijp, Marieke G. N. Bos, and Carolien Rieffe. “The Understanding of Intentions, Desires and Beliefs in Young Children with Autism Spectrum Disorder.” Journal of Autism and Developmental Disorders 45, no. 7 (July 2015): 2035-2045.

Cambria, E. “Affective Computing and Sentiment Analysis.” IEEE Intelligent Systems 31, no.2 (March 2016):102-107.

Camos, Valerie and Pierre Barrouillet. Working Memory in Development (Abingdon, Oxon: Routledge, 2018).

Centers for Disease Control and Prevention (2018) Prevalence of Autism Spectrum Disorder among Children Aged 8 Years. Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. Morbidity and Mortality Weekly Report (MMWR) Surveillance Summary, 67(6), 1–28.

Dadgar, Mina, Thad Nodine, Kathy Reeves Bracco, and Andrea Venezia. “Strategies for Integrating Student Supports and Academics: Strategies for Integrating Student Supports and Academics,” New Directions for Community Colleges no. 167 (September 2014): 41–51.

Fox, Jesse, and Jennifer J. Moreland. “The Dark Side of Social Networking Sites: An Exploration of the Relational and Psychological Stressors Associated with Facebook Use and Affordances.” Computers in Human Behavior 45 (April 2015): 168–76.

Glucksberg, S. “Understanding Metaphors: the Paradox of Unlike Things Compared,” in Ahmad K. (eds) Affective Computing and Sentiment Analysis: Emotion, Metaphor, and Terminology. Springer, Dordrecht, 2011.

Graham, Steve, Alyson A. Collins, and Hope Rigby-Wills. “Writing Characteristics of Students with Learning Disabilities and Typically Achieving Peers: A Meta-Analysis.” Exceptional Children 83, no. 2 (January 2017): 199–218.

Kamhi, Alan G. and Hugh W. Catts. “Language and Reading: Convergences, Divergences, and Development,” in Reading Disabilities. Boston, Massachusetts: College-Hill Press, 1989.

Kross, Ethan, Philippe Verduyn, Emre Demiralp, Jiyoung Park, David Seungjae Lee, Natalie Lin, Holly Shablack, John Jonides, and Oscar Ybarra. “Facebook Use Predicts Declines in Subjective Well-Being in Young Adults.” PLOS ONE 8, no. 8 (August 14, 2013): e69841.

Lanning, Scott. Concise Guide to Information Literacy, 2nd Edition. Santa Barbara, California, 2017.

Lerner, Matthew D., and Amori Y. Mikami. “A Preliminary Randomized Controlled Trial of Two Social Skills Interventions for Youth With High-Functioning Autism Spectrum Disorders.” Focus on Autism and Other Developmental Disabilities 27, no. 3 (September 2012): 147–57.

Lin, Liu yi, Jaime E. Sidani, Ariel Shensa, Ana Radovic, Elizabeth Miller, Jason B. Colditz, Beth L. Hoffman, Leila M. Giles, and Brian A. Primack. “Association Between Social Media Use and Depression Among U.S. Young Adults: Research Article: Social Media and Depression.” Depression and Anxiety 33, no. 4 (April 2016): 323–31.

McKee, Alan. Textual Analysis : A Beginner’s Guide. London: SAGE Publications, 2003. ProQuest Ebook Central.

Newman, L., Wagner, M., Knokey, A.-M., Marder, C., Nagle, K., Shaver, D., … Schwarting, M. (2011). The Post-high School Outcomes of Young Adults with Disabilities up to 8 Years after High School. A Report from the National Longitudinal Transition Study-2 (NLTS2) (NCSER 2011–3005) Menlo Park, CA: SRI International.

Peterson, Candida C. and Henry M. Wellman. “Longitudinal Theory of Mind (ToM) Development from Preschool to Adolescence with and without ToM Delay.” Child Development 00, no.0 (April 2018): 1-18.

Pino, Maria Chiara, Monica Mazza, Melania Mariano, Sara Peretti, Dagmara Dimitriou, Francesco Masedu, Marco Valenti, and Fabia Franco. “Simple Mindreading Abilities Predict Complex Theory of Mind: Developmental Delay in Autism Spectrum Disorders.” Journal of Autism and Developmental Disorders 47, no. 9 (September 2017): 2743-2756.

Price, Johanna R., Gary E. Martin, Kong Chen, and Jennifer R. Jones. “A Preliminary Study of Writing Skills in Adolescents with Autism Across Persuasive, Expository, and Narrative Genres.” Journal of Autism and Developmental Disorders 50, no. 1 (January 2020): 319–32.

Twenge, Jean M., Thomas E. Joiner, Megan L. Rogers, and Gabrielle N. Martin. “Increases in Depressive Symptoms, Suicide-Related Outcomes, and Suicide Rates Among U.S. Adolescents After 2010 and Links to Increased New Media Screen Time.” Clinical Psychological Science 6, no. 1 (January 2018): 3–17.

Underhill, Jill Cornelius, Victoria Ledford, and Hillary Adams. “Autism Stigma in Communication Classrooms: Exploring Peer Attitudes and Motivations toward Interacting with Atypical Students.” Communication Education 68, no. 2 (April 3, 2019): 175–92.

Van Hees, Valérie, Tinneke Moyson, and Herbert Roeyers. “Higher Education Experiences of Students with Autism Spectrum Disorder: Challenges, Benefits and Support Needs.” Journal of Autism and Developmental Disorders 45, no. 6 (June 2015): 1673–88.

Vannucci, Anna, Kaitlin M. Flannery, and Christine McCauley Ohannessian. “Social Media Use and Anxiety in Emerging Adults.” Journal of Affective Disorders 207 (January 1, 2017): 163-166.

Vulchanova, Mila, David Saldana, Sobh Chahboun, and Valentin Vulchanov. “Figurative Language Processing in Atypical Populations: The ASD Perspective.” Frontiers in Human Neuroscience 9 (February 17, 2015).

Walters, Shannon. “Toward a Critical ASD Pedagogy of Insight: Teaching, Researching, and Valuing the Social Literacies of Neurodiverse Students,” Research in the Teaching of English Vol. 49, No. 4 (May 2015): 340-360.

White, Susan W., Thomas H. Ollendick, and Bethany C. Bray. “College Students on the Autism Spectrum: Prevalence and Associated Problems.” Autism 15, no. 6 (November 2011): 683–701.

Wei, Xin, Mary Wagner, Laura Hudson, Jennifer W. Yu, and Harold Javitz. “The Effect of Transition Planning Participation and Goal-Setting on College Enrollment Among Youth with Autism Spectrum Disorders.” Remedial and Special Education 37, no. 1 (January 2016): 3–14.

  1. Fox et al., “The Dark Side of Social Networking Sites: An Exploration of the Relational and Psychological Stressors Associated with Facebook Use and Affordances.” Computers in Human Behavior 45 (2015): 168–76.; Kross et al., “Facebook Use Predicts Declines in Subjective Well-Being in Young Adults.” PLoS ONE 8, no. 8 (2013): 1-6.; Lin et al., “Association Between Social Media Use and Depression Among U.S. Young Adults: Research Article: Social Media and Depression.” Depression and Anxiety 33, no. 4 (2016): 323–31.; Twenge et al., “Increases in Depressive Symptoms, Suicide-Related Outcomes, and Suicide Rates Among U.S. Adolescents After 2010 and Links to Increased New Media Screen Time.” Clinical Psychological Science 6, no. 1 (2018): 3–17.; Vannucci et al., “Social Media Use and Anxiety in Emerging Adults.” Journal of Affective Disorders 207 (2017): 163-166.
  2. Accardo et al., “Accommodations and Support Services Preferred by College Students with Autism Spectrum Disorder.” Autism 23, no.3 (April 2019): 574.; Anderson et al., “A Systematic Literature Review of the Experiences and Supports of Students with Autism Spectrum Disorder in Post-Secondary Education.” Research in Autism Spectrum Disorders 39 (2017): 33–34.; Gena P. Barnhill, “Supporting Students with Asperger Syndrome on College Campuses: Current Practices.” Focus on Autism and Other Developmental Disabilities 31, no. 1 (2016): 3.; Underhill et al., “Autism Stigma in Communication Classrooms: Exploring Peer Attitudes and Motivations toward Interacting with Atypical Students.” Communication Education 68, no. 2 (2019): 175.; Van Hees et al., “Higher Education Experiences of Students with Autism Spectrum Disorder: Challenges, Benefits and Support Needs.” Journal of Autism and Developmental Disorders 45, no. 6 (2015): 1673.; Wei et al., “The Effect of Transition Planning Participation and Goal-Setting on College Enrollment Among Youth with Autism Spectrum Disorders.” Remedial and Special Education 37, no. 1 (2016): 3–14.
  3. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (2013).
  4. Ibid.
  5. Centers for Disease Control and Prevention, Prevalence of autism spectrum disorder among children aged 8 years. Autism and developmental disabilities monitoring network, 11 Sites, United States, 2014 (2018): 2.
  6. Newman et al., The post-high school outcomes of young adults with disabilities up to 8 years after high school. A report from the National Longitudinal Transition Study-2 (NLTS2) (2011): 16-21.
  7. White et al., “College Students on the Autism Spectrum: Prevalence and Associated Problems,” Autism 15, no. 6 (2011): 695.
  8. Underhill et al., “Autism Stigma in Communication Classrooms: Exploring Peer Attitudes and Motivations toward Interacting with Atypical Students,” Communication Education 68, no. 2 (2019): 189-190.
  9. Alan G. Kamhi and Hugh W. Catts, “Language and Reading: Convergences, Divergences, and Development,” in Reading Disabilities (1989): 1-34.
  10. Ibid.
  11. Matthew D. Lerner and Amori Y. Mikami, “A Preliminary Randomized Controlled Trial of Two Social Skills Interventions for Youth With High-Functioning Autism Spectrum Disorders.” Focus on Autism and Other Developmental Disabilities 27, no. 3 (2012): 147–57.
  12. It is important to remember here that there are multiple manifestations of ASD, and while these techniques can be modified to support individuals with ASD, they are not applicable universally and do not prove effective for all people. Furthermore, these techniques help recognize that something more may be contributing to what is being communicated beyond the literal meaning of the words, but they do not always help decipher the full meaning of what is being communicated.
  13. Vulchanova et al., “Figurative Language Processing in Atypical Populations: The ASD Perspective,” Frontiers in Human Neuroscience 9 (2015): 1.
  14. Accardo et al., “Writing Interventions for Individuals with Autism Spectrum Disorder: A Research Synthesis,” Journal of Autism and Developmental Disorders (2019): 2.
  15. Price et al., “A Preliminary Study of Writing Skills in Adolescents with Autism Across Persuasive, Expository, and Narrative Genres.” Journal of Autism and Developmental Disorders 50, no. 1 (2020): 319–32.
  16. Shannon Walters, “Toward a Critical ASD Pedagogy of Insight: Teaching, Researching, and Valuing the Social Literacies of Neurodiverse Students,” Research in the Teaching of English Vol. 49, No. 4 (May 2015): 353-354.
  17. Valerie Camos and Pierre Barrouillet, Working Memory in Development (2018): 3.
  18. Graham et al., “Writing Characteristics of Students with Learning Disabilities and Typically Achieving Peers: A Meta-Analysis,” Exceptional Children 83, no. 2 (2017): 200.
  19. Maria Chiara Pino et al., “Simple Mindreading Abilities Predict Complex Theory of Mind: Developmental Delay in Autism Spectrum Disorders.” Journal of Autism and Developmental Disorders 47, no. 9 (2017): 2744.
  20. Evelien Broekhof et al., “The Understanding of Intentions, Desires and Beliefs in Young Children with Autism Spectrum Disorder,” Journal of Autism and Developmental Disorders 45, no. 7 (2015): 2035-2045.
  21. Association of College and Research Libraries (ACRL). “Framework for Information Literacy for Higher Education.” (2015).
  22. Alan McKee, Textual Analysis : A Beginner’s Guide. (London: SAGE Publications, 2003), ProQuest Ebook Central: 8.
  23. S. Glucksberg, “Understanding Metaphors: the Paradox of Unlike Things Compared,” in Ahmad K. (eds), Affective computing and sentiment analysis: emotion, metaphor, and terminology, Springer, Dordrecht (2011): 4.
  24. E. Cambria, “Affective Computing and Sentiment Analysis,” IEEE Intelligent Systems 31, no.2 (March 2016):103.
  25. ACRL (2015).
  26. Ibid.
  27. Ibid.
  28. Ibid.
  29. Scott Lanning, Concise Guide to Information Literacy (2017): 3.
  30. Mina Dadgar et al., “Strategies for Integrating Student Supports and Academics: Strategies for Integrating Student Supports and Academics,” New Directions for Community Colleges no. 167 (2014): 50–51.
  31. Ibid, 48-49.
  32. Accardo et al. (2018), 574-83.

MarcEdit Webinar: Working with Non-MARC Data (12 PM EST, Apr. 3, 2020) / Terry Reese

Topic: MarcEdit Webinar: Working with Non-MARC Data
Time: Apr 3, 2020 12:00 PM Eastern Time (US and Canada)

Join Zoom Meeting

One tap mobile
+13126266799,,475049028# US (Chicago)
+16468769923,,475049028# US (New York)

Dial by your location
         +1 312 626 6799 US (Chicago)
         +1 646 876 9923 US (New York)
         +1 253 215 8782 US
         +1 301 715 8592 US
         +1 346 248 7799 US (Houston)
         +1 408 638 0968 US (San Jose)
         +1 651 372 8299 US
         +1 669 900 6833 US (San Jose)
Meeting ID: 475 049 028
Find your local number:

Join by SIP

Join by H.323 (US West) (US East) (China) (India Mumbai) (India Hyderabad) (EMEA) (Australia) (Hong Kong) (Brazil) (Canada) (Japan)
Meeting ID: 475 049 028

The Ohio State University

CarmenZoom is supported by the Office of Distance Education and eLearning:

CarmenZoom resources
Phone: 614-688-HELP (4357)

If you have a disability and experience difficulty accessing this content, contact the Accessibility Help Line at 614-292-5000 or Text Telephone for the Deaf at 614-688-8743.

MarcEdit Webinar: Working with Non-MARC data (11 pm EST; Apr. 2) / Terry Reese

Topic: MarcEdit Webinar: Working with Non-MARC data
Time: Apr 2, 2020 11:00 PM Eastern Time (US and Canada)

Join Zoom Meeting

One tap mobile
+13126266799,,300773968# US (Chicago)
+16468769923,,300773968# US (New York)

Dial by your location
         +1 312 626 6799 US (Chicago)
         +1 646 876 9923 US (New York)
         +1 346 248 7799 US (Houston)
         +1 408 638 0968 US (San Jose)
         +1 651 372 8299 US
         +1 669 900 6833 US (San Jose)
         +1 253 215 8782 US
         +1 301 715 8592 US
Meeting ID: 300 773 968
Find your local number:

Join by SIP

Join by H.323 (US West) (US East) (China) (India Mumbai) (India Hyderabad) (EMEA) (Australia) (Hong Kong) (Brazil) (Canada) (Japan)
Meeting ID: 300 773 968

Archival Cloud Storage Pricing / David Rosenthal

Although there are significant technological risks to data stored for the long term, its most important vulnerability is to interruptions in the money supply. The current pandemic is likely to cause archives to suffer significant interruptions in the money supply.

In Cloud For Preservation I described how much of the motivation for using cloud services was their month-by-month pay-for-what-you-use billing, which transforms capital expenditures (CapEx) into operational expenditures (OpEx). Organizations typically find OpEx much easier to justify than CapEx because:
  • The numbers they look at are smaller, even if what they add up to over time is greater.
  • OpEx is less of a commitment, since it can be decreased if circumstances change.
Unfortunately, the lower the commitment the higher the risk to long-term preservation. Since it doesn't deliver immediate returns, it is likely to be first on the chopping block. Thus both reducing storage cost and increasing its predictability are important for sustainable digital preservation. Below the fold I revisit this issue.

For more than 6 years I've been pointing out that Amazon's margins on its S3 storage service are extortionate, using first local storage and later Backblaze as example competitors. Another issue I raised in Cloud For Preservation was the effect of the lock-in period. The cost and time involved in getting the data out make the customer vulnerable to price hikes. Since cloud storage pricing is normally on a month-by-month basis these can happen with a month's notice.

Another of the risks month-by-month billing poses was detailed by Backblaze CEO in Backblaze Durability is 99.999999999% — And Why It Doesn’t Matter:
Some customers pay by credit card. We don’t have the math behind it, but we believe there’s a greater than 1 in a million chance that the following events could occur:
  • You change your credit card provider. The credit card on file is invalid when the vendor tries to bill it.
  • Your email service provider thinks billing emails are SPAM. You don’t see the emails coming from your vendor saying there is a problem.
  • You do not answer phone calls from numbers you do not recognize; Customer Support is trying to call you from a blocked number; they are trying to leave voicemails but the mailbox is full.
If all those things are true, it’s possible that your data gets deleted simply because the system is operating as designed.
I commented in What Does Data "Durability" Mean?:
Thus Backblaze believes that the probability of losing all your objects due to billing problems is more than 10^-6, which makes the difference between 8 and 11 nines of durability of a single object irrelevant.
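To see why the extra nines stop mattering, here is a rough sketch that combines a durability failure with a billing failure. It assumes, purely for illustration, that both figures can be treated as independent loss probabilities for the same object over the same period; neither quoted post states this, and the function name is made up.

```python
# Illustrative only: both figures are treated as independent loss
# probabilities for one object over the same period (an assumption made
# for this sketch, not a claim from the quoted posts).
P_BILLING = 1e-6  # "greater than 1 in a million" chance of a billing mishap


def loss_probability(nines: int, p_billing: float = P_BILLING) -> float:
    """Chance of loss from either a durability failure or a billing failure."""
    p_durability = 10.0 ** -nines
    # P(A or B) for independent events A and B
    return p_durability + p_billing - p_durability * p_billing


eight = loss_probability(8)
eleven = loss_probability(11)
print(f"8 nines:  {eight:.6e}")
print(f"11 nines: {eleven:.6e}")
print(f"ratio:    {eight / eleven:.4f}")
```

Under these assumptions the two results differ by only about 1%, because the billing term dominates both.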
For the last few years, Wasabi's entire pitch has been that it is 5-6 times cheaper than S3. I used Wasabi among the examples in my report on Cloud For Preservation, in part because their lack of egress charges means a remarkably short lock-in period:

Storage-only Services
Service        In        Store      Out        Total      Lock-in
Wasabi         $2,495    $59,880    $2,495     $64,870    0.5
Backblaze B2   $2,504    $60,000    $12,504    $75,008    2.5
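The arithmetic in the table can be checked with a short sketch. Two things here are assumptions rather than statements from this post: that the Store column covers twelve months of storage, and that Lock-in is the cost of getting the data out expressed in months of storage charges. With those assumptions the Total and Lock-in columns are reproduced.

```python
# Dollar figures are copied from the table. MONTHS_STORED is an assumption
# (the storage period is not stated here), as is the lock-in formula.
MONTHS_STORED = 12

services = {
    "Wasabi": {"in": 2495, "store": 59880, "out": 2495},
    "Backblaze B2": {"in": 2504, "store": 60000, "out": 12504},
}


def summarize(costs, months=MONTHS_STORED):
    """Return (total cost, lock-in in months of storage charges)."""
    total = costs["in"] + costs["store"] + costs["out"]
    monthly_storage = costs["store"] / months
    lock_in = costs["out"] / monthly_storage
    return total, round(lock_in, 1)


for name, costs in services.items():
    total, lock_in = summarize(costs)
    print(f"{name}: total ${total:,}, lock-in {lock_in} months")
```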
Now, Chris Mellor's "Wasabi intros price-fixing – in a good way – for cost-conscious cloud storage customers" reports that Wasabi is addressing not just the total cost but also its predictability:
Wasabi, a cloud storage startup, has devised a pricing model whereby customers buy capacity up-front in return for lower charges.

Wasabi’s Reserved Capacity Storage (RCS) provides enterprises with price predictability and one-time billing. The deal sees storage costs cut by up to 27 per cent compared with pay-as-you-go when customers commit to fixed price terms of up to five years. Customers pay only for storage that is reserved; there are no fees for data egress, API requests or data retrieval.
Wasabi said its research shows many customers prefer the predictability of a fixed price purchase order. The company thinks RCS will be useful for backups, second copies of data, and archives.
Because archives come in fixed-size chunks, a fixed-price term contract appears to make sense. You are going to use exactly a chunk's worth of data every month, so the use-it-or-lose-it economics of RCS don't matter.

If you have a 100TB archive at Wasabi on their pay-as-you-go plan with premium support it will cost $38,456 over 5 years. If you pay up front for RCS, assuming you get the 27% discount at this level (the discount details aren't clear from this page) it would cost $28,072, a difference of $10,383. So if the organization's internal cost of capital is more than 7.4%, which it is likely to be, RCS is going to be adjudged more expensive than pay-as-you-go Wasabi, even ignoring the decreased flexibility.
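One calculation that reproduces these figures is a simple, non-compounded annualization of the saving relative to the up-front payment. This is a reconstruction under that assumption, not necessarily the exact method used, and the variable names are illustrative.

```python
# Inputs from the text; the simple (non-compounded) annualization below is
# an assumption about how the 7.4% threshold could be derived.
PAYG_5YR = 38_456   # pay-as-you-go cost over 5 years, in dollars
DISCOUNT = 0.27     # assumed RCS discount
YEARS = 5

rcs_upfront = PAYG_5YR * (1 - DISCOUNT)   # paid on day one
savings = PAYG_5YR - rcs_upfront          # nominal saving over the term
simple_annual_rate = savings / rcs_upfront / YEARS

print(f"RCS up-front cost: ${rcs_upfront:,.2f}")
print(f"Nominal saving:    ${savings:,.2f}")
print(f"Break-even rate:   {simple_annual_rate:.1%} per year")
```

If the money paid up front could otherwise earn more than this rate inside the organization, the nominal saving does not cover the cost of tying it up.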

The way organizations account for OpEx and CapEx continues to be a barrier to sustainable digital preservation.

Mapping waste dumping locations in Malawi: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Patrick Ken Kalonde from Youth for Environmental Development in Malawi who received funding from the Foreign and Commonwealth Office to inspire university students to take action and contribute to environmental protection through mapping.

University students look at a map of the location of dumpsites around their campus

The university students had just finished a data collection exercise. Everyone was curious to see the captured photos being turned into a waste disposal map. One after another, groups synchronised the photos they had captured to an open access platform. A few minutes later, a map showing waste dumping locations across the university campus was ready.

You might wonder what is really happening. This is in Malawi, and the students are from the Lilongwe University of Agriculture and Natural Resources. The students were participants in an event organised by their peers from Youth for Environmental Development (YED), a youth-led community-based organisation operating in a small area of 310 households in the capital of Malawi, Lilongwe.

The event was organised to celebrate Open Data Day with the idea of sharing knowledge about how open data on waste disposal can help to trigger innovations and solutions to make our communities clean. It was also partly meant to inspire the university students to think about innovative data solutions.

Nearly 4 hours prior to this particular moment, my colleague Alick Chisale Austin presented YED, its activities and why the group is celebrating Open Data Day.

The university students were amazed to learn how YED, a group of volunteers made up of secondary school students, university students, graduates and school leavers from the community, is unified by a common desire to make their community free from careless waste disposal practices.

It did not take long before I took the stage, sharing why our volunteer youth organisation has focused on citizen science in confronting the problem of poor sanitation/waste disposal in the community. Several maps were presented with the first one illustrating the location of my community, my home and its key features. The second one presented the same community with waste dumping locations mapped in 2017. According to the second map, 67 percent of the waste dumping locations are right in our local rivers. 

With the presentation of the status quo, I quizzed the audience on whether they think it is a problem or not to have trash accumulating in our river networks. Unanimously, the audience responded it is very undesirable to have them. I also shared information about the clean-up campaign which our group organised in our community back in 2019. We joined what was trending on social media through the #TrashChallenge. 

YED members cleaning up a dumpsite in March 2019

I presented how information and knowledge are generated from data. It was clear to us that spatial data can be used as strong evidence to inform our decisions about solving the problem of poor waste disposal. Beyond that, since waste disposal is a matter of public concern, I presented the need for open data platforms for environmental protection.

At this point, I introduced the tools students can use to help make our communities clean by gathering evidence to inform our decisions, such as Open Litter Map, an online platform that allows the locations of litter sites to be captured using a smartphone. This data is uploaded online where it can be accessed by anyone.

I then introduced a practical, hands-on exercise of going around the university campus mapping all trash dumping locations. When the students finished collecting data, they came back to upload it online. Moments later, once the uploaded data had been verified, all of the waste dumping locations on the university campus were displayed. I led the participants in downloading the data they had just synchronised, which was later exported to QGIS to prepare a map.

Towards the end of the Open Data Day celebrations, some participants explained that they were interested in extending the environmental data mapping exercises they had just learnt when they move back to their various communities. Others were motivated to carry on our work by spreading the message about the importance of data to their personal circles.

As the first team to involve university students in mapping their own campus, our team hopes this motivates others to generate relevant evidence, vital in making informed decisions for protecting our environment.

Metadata Events System: Accounting for Time / Mark E. Phillips

In the last post, I mentioned that there are four primary things that we have needed in order to move a large portion of our student and staff workers to full remote work during this quarantine.

In this post, I wanted to jump ahead a bit and talk about how we are accounting for remote workers’ time. This is one of the things that the university wrestled with when thinking about moving students offline: how would we be able to account for time? Well, one of the ways we are trying to track time is by using the activity in the metadata editing system to help managers understand what their workers are working on.

Over the past two weeks, the Software Development Unit at the UNT Libraries has been working hard to push out quite a few changes to a system we call Events. This system has been sitting in the background of our metadata editing infrastructure for a number of years collecting what we call “Edit Events”. An Edit Event is logged in the Event system and contains information about the username, which record they edited, the timestamp for when it was edited, and how long the metadata edit window was open while they were doing metadata. This is then aggregated for users in a dashboard that they can view.
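To make the shape of an Edit Event concrete, here is a minimal sketch of the four pieces of information described above and the kind of daily roll-up a dashboard could show. The class, field, and function names are hypothetical and the sample data is made up; this is not the actual Events code.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class EditEvent:
    """Hypothetical model of one logged edit (field names are illustrative)."""
    username: str
    record_id: str
    timestamp: datetime   # when the edit was saved
    duration: timedelta   # how long the edit window was open


def daily_overview(events, day):
    """Count edits, active users, unique records and total time for one day."""
    todays = [e for e in events if e.timestamp.date() == day]
    return {
        "edits": len(todays),
        "users": len({e.username for e in todays}),
        "records": len({e.record_id for e in todays}),
        "total_duration": sum((e.duration for e in todays), timedelta()),
    }


sample_events = [
    EditEvent("alice", "rec1", datetime(2020, 3, 30, 9, 15), timedelta(minutes=4)),
    EditEvent("alice", "rec2", datetime(2020, 3, 30, 9, 40), timedelta(minutes=6)),
    EditEvent("bob", "rec1", datetime(2020, 3, 30, 23, 5), timedelta(minutes=2)),
    EditEvent("bob", "rec3", datetime(2020, 3, 29, 14, 0), timedelta(minutes=5)),
]

overview = daily_overview(sample_events, date(2020, 3, 30))
print(overview)
```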

For a few years now this system has been unusable because of some code that needed to be refactored now that we have almost 2.5 million edit events. That’s what we have been working on for the past few weeks.

The first thing to note for users of our Edit system is “how do you get to the Events pages”. Well in the upper left corner of the screen, if you click on the “home” icon you will see an Events option in the dropdown.

Getting to Edit Events from Edit Search Dashboard

This will take you to the Events landing page, where you are greeted with an overview of what is going on with the events system. All data is divided into Today, This Month, and All Time and gives you statistics for the number of edits, the number of active users, and the number of unique items that have been edited. Clicking on the different buttons will take you to different places. I am going to walk through by clicking on any of the blue buttons for today.

Edit Events Overall Stats

By clicking on any of the Today buttons, you are presented with statistics for the Edit Events that have happened in the system today. You can get an overview of everything that has happened in the system. You can also navigate to different days by clicking on the Previous Day button.

Daily Stats View for March 30, 2020

At the top of the page, you will see an overview of stats for the day. This includes the number of edits, the number of unique records edited, and how much aggregate time has been spent during the day editing. We also show the first and most recent edit, the number of users that have edited during the day, and then we list the user who has the most edits along with their edit count.

Events Daily Stats Detail

Next up is a view of the Activity By Hour section. I have been amazed to see the time of day when users are editing records. For example, on Saturday the 28th there were 35 users who edited records in 22 of the 24 hours during the day.

Events Hourly Stats Detail

Below that block is the Activity By User for the day. You can see a listing of all of the users who have edited during the day as well as some statistics related to their activity including the number of edits, number of records, total editing duration and the average time per edit, and the average time per record. Clicking on any of the names will take you to an overview of that user’s activity.

Daily Activity by User
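The per-user figures in that view can be sketched as a simple grouping step. The row layout, names, and sample values below are invented for illustration; the real system may compute these differently.

```python
from collections import defaultdict

# (username, record_id, seconds the edit window was open): made-up rows
sample_rows = [
    ("alice", "rec1", 120),
    ("alice", "rec1", 60),
    ("alice", "rec2", 180),
    ("bob", "rec3", 300),
]


def activity_by_user(rows):
    """Group edits by user and derive counts, totals and averages."""
    grouped = defaultdict(list)
    for username, record_id, seconds in rows:
        grouped[username].append((record_id, seconds))

    stats = {}
    for username, edits in grouped.items():
        total = sum(seconds for _, seconds in edits)
        records = {record_id for record_id, _ in edits}
        stats[username] = {
            "edits": len(edits),
            "records": len(records),
            "total_seconds": total,
            "avg_per_edit": total / len(edits),
            "avg_per_record": total / len(records),
        }
    return stats


user_stats = activity_by_user(sample_rows)
for name, row in user_stats.items():
    print(name, row)
```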

A user’s page gives information going back a little over a month so that managers can easily verify the time for pay periods that are either every two weeks or one month. In addition to an overview of all of the user’s activity in the Events system, you can see a breakdown of what days they have edited records as well as statistics about what activity they completed during that day. By clicking on the day link you can see information specific to that day for that user.

User Daily Activity View

The User Hourly Detail View presents statistics for a user on a specific day. It includes similar information as the overview pages mentioned previously. There is also an hourly table that shows when the user was editing records, including how many edits, hours, duration and average time per edit and record for a given hour.

User Hourly Detail View

Below the hourly breakdown of activity, you see all of the edits performed by the user on that day. You can link directly to the edit event or you can view information about the record that they have edited.

Below you will see the detail for a record in the Events system. You see how many total edits have taken place with a record including when, and which user performed the edits. There is a link on the page to view the records summary in the edit system to see more information about the record that was edited.

Record Activity View

When you click on that link you are taken to the record summary page in the Edit interface.

Record in Edit System

If you want to dig deeper into what happened with the record, you can click on the View History link and view different versions of the record to see what changes were made.

Metadata History Page

There are two other views in the Edit Events interface that can be useful. If you had clicked on the orange Users button on the Events landing page you would see a list of all of the Users who have been active during the current week (starting Sunday).

User Activity for this Week

If you click on the orange items button on the Events landing page you get to a view that shows a listing of the records that were edited this week. It also includes the number of edits and the duration of editing for that week.

Record Activity for This Week

We are hoping that the improved Events pages will be useful for managers as they begin to review timesheets for students that they supervise. I know that I have been pleasantly surprised by the data when I can view the number of edits we are getting at all hours of the day. I think it shows the opportunities that we have with our digital library systems for providing engaging, meaningful work to a wide range of users during this quarantine.

I skipped forward in my list of components we have needed for this process. Over the next blog posts, I will go back and pick up where I left off describing the infrastructure we have in place to communicate instructions and documentation, and finally how we are identifying work that needs to be done in the system.

If you have questions or comments about this post, please let me know via Twitter.

Raising visibility of women and the LGBT community in Mexico: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Ricardo Mirón from Future Lab in Mexico who received funding from the Foreign and Commonwealth Office to give visibility to women and the LGBT community in local decision making within government, business and civil society using open data.

At Future Lab’s Open Data Day, we aimed to give visibility to women and the LGBT community in local decision making within government, business and civil society using open data. This day served to unite the efforts of different sectors of society, in which we wanted to focus on a topic that is of special relevance in the context of our country, Mexico; where gender violence, discrimination and unequal opportunities are still very present. 

For this we decided to partner with different actors: 

  • LAB León, the city’s public innovation laboratory, which provided the link with the government and opened databases for use during this day.
  • Codeando Mexico, one of the strongest communities and movements in the country in terms of civic technology and citizen participation.
  • HERE Technologies, an international company that creates mapping solutions, which put at our disposal the infrastructure needed to visualize the results of this day.
  • CANACINTRA, the National Chamber of the Transformation Industry, the body that represents the industrial sector of León, which supported the event with its facilities and its willingness to host it.

During the event, around 80 people participated, mostly young enthusiasts with different profiles such as data scientists, journalists, political scientists, architects and designers as well as several members of public agencies and private initiatives. There was clear interest from people who had no previous experience in the topic: about half of the attendees had never worked with open data.

In order to contribute to the movement and culture that we want to promote in society and government, we held three workshops to add to the narrative of how we can create from different methodologies and tools:

  • Open data workshop (gender equality): Participants analysed public data on femicides in Mexico and, using Tableau and different visualisations, created an infographic.
  • Open mapping workshop (LGBT community): Attendees built a collaborative map, starting by geo-referencing points such as cafes, parks, offices and other safe spaces for the community. The data was then loaded onto a map, which was customised to show these points within a specific polygon in the city of León.
  • Open source workshop (citizen participation): Participants contributed to a code repository on GitHub, learning how to participate in an open source project, with a focus on creating citizen collaboration projects.

In conclusion, we know that constant and gradual work is needed to continue promoting the use of open data and open source, and that real value is generated when we contribute to solving the different problems that we face together as a society. We hope to help raise awareness of how discrimination against the LGBT community and gender inequality affect us all, and we hope that this movement and this community will grow exponentially until they materialise in actions such as public policies, platforms and projects.

Under renovation / LITA

The site is currently under renovation as we transfer themes. Please bear with us, as things may move or look different during this transition.

Everybody’s Library Questions: Finding films in the public domain / John Mark Ockerbloom

Welcome to another installment of Everybody’s Library Questions, where I give answers to questions people ask me (in comments or email) that seem to be useful for general consumption.

Before I start, though, I want to put in a plug for your local librarians.  Even though many library buildings are closed now (as they should be) while we’re trying to get propagation and treatment for COVID-19 under control, many of those libraries offer online services, including interactive online help from librarians. (Many of our libraries are also expanding the scope and hours of these services during this health crisis.)   Your local librarians will have the best knowledge of what’s available to you, can find out more about your needs when they talk to you, and will usually be able to respond to questions faster than I or other specific folks on the Internet can. Check out your favorite library’s website, and look for links like “get help” or “online chat” and see what they offer.

OK, now here’s the question, extracted from a comment made by Nicholas Escobar to a recent post:

I am currently studying at the University of Edinburgh getting masters degree in film composition. For my final project I am required to score a 15 minute film. I was thinking of picking a short silent film (any genre) in the public domain that is 15 minutes (or very close to that length) and was wondering if you had any suggestions?

There are three questions implied by this one: First, how do you find out what films exist that meet your content criteria?  Second, how do you find out whether films in that set are in the public domain?  Finally, how can you get access to a film so you can do things with it (such as write a score for it)?

There are a few ways you can come up with films to consider.  One is to ask your local librarian (see above) or professor to recommend reference works or data sources that feature short films.  (Information about feature films, which run longer, is often easier to find, but there’s a fair bit out there as well on short films.)  Another is to search some of the reference works and online data sources I’ll mention in the other answers below.

The answer to the copyright question depends on where you are.  In the United States, there are basically three categories of public domain films:

That’s the situation in the United States, at least.  However, if you’re not in the United States, different rules may apply.  In Edinburgh and elsewhere in the United Kingdom (and in most of the rest of Europe), works are generally copyrighted until the end of the 70th year after the death of the last author.  In the UK, the authors of a film are considered to be the principal director, the screenwriter(s), and the composer(s).  (For more specifics, see the relevant portion of UK law.)  However, some countries will also let the copyrights of foreign works expire when they do in their country of origin, and in those a US film that’s in the public domain in the US would also be public domain in those countries.  As you can see in the UK law section I link to, the UK does apply such a “rule of the shorter term” to films from outside the European Economic Area (EEA), if none of the authors are EEA nationals.  So you might be good to go in the UK with many, but not all, US films that are public domain in the US.  (I’m not a UK copyright expert, though; you might want to talk to one to be sure.)

Let’s suppose you’ve come up with some suitable possible films, either ones that are in the public domain, ones that have suitable Creative Commons licenses or you can otherwise get permission to score, or ones that are in-copyright but that you could score in the context of a study project, even if you couldn’t publish the resulting audiovisual work.  (Educational fair use is a thing, though its scope also varies from country to country.  Here’s a guide from the British Library on how it works in the UK.)  We then move on to the last question: How do you get hold of a copy so you can write a score for it?

The answer to that question depends on your situation.  Right now, the situation for many of us is that we’re stuck at home, and can’t visit libraries or archives in person.  (And our ability to get physical items like DVDs or videotapes may be limited too.)  So for now, you may be limited to films you can obtain online.  There are various free sources of public domain films: I’ve already mentioned the Internet Archive, whose moving image archive includes many films that are in the public domain (and many that are not, so check rights before choosing one to score).  The Library of Congress also offers more than 2,000 compilations and individual films free to all online.  And your local library may well offer more, as digital video, or as physical recordings (if you can still obtain those).  A number of streaming services that libraries or individuals can subscribe to offer films in the public domain that you can feel free to set to music.  Check with your librarian or browse the collection of your favorite streaming service.

I’m not an expert in films myself.  Folks reading this who know more, or have more suggestions, should feel free to add comments to this post while comments are open.  In general, the first librarians you talk to won’t usually be experts about the questions you ask.  But even when we can’t give definitive answers on our own, we’re good at sending researchers in productive directions, whether that’s to useful research and reference sources, or to more knowledgeable people.  I hope you’ll take advantage of your librarians’ help, especially during this health crisis.  And, for my questioner and other folks who are interested in scoring or otherwise building on public domain films, I’ll be very interested in hearing about the new works you produce from them.


Introducing Rockpool - a web app for communities of practice / Hugh Rundle

I've spent roughly a year completely rewriting the code running Aus GLAM Blogs, and it's finally ready for use. The old codebase was built with MeteorJS. This was a really useful framework when I was first learning JavaScript, because it allowed me to build a usable webapp with logins and routing, without really being that good at coding. As my JavaScript and general coding knowledge improved, Meteor became increasingly frustrating. I was spending more time fiddling with versions and dependencies than I was writing code. Integration with the npm module world was imperfect. I also realised there were some larger structural problems with the app. So I foolishly decided to build a new version from scratch.

After fiddling with a few options, including writing it in Python instead of JavaScript, I ended up using express, with the existing MongoDB backend and - a key component - passwordless for user logins. I'm calling the new software Rockpool - because it allows people to form a small conversational community without having to constantly swim in the bigger ocean of information. It's a crappy metaphor but I had to call it something.

What I learned

You might be wondering how on earth it took me (more than) a whole year to rebuild something that is really not that complicated. The answer is partially that I was doing this in my spare time, partially that it got more complex as I completed each part and became more ambitious, and partially that I used this as a learning opportunity for a number of new things I hadn't done before. I spent quite a lot of time thinking about how to implement new features I really wanted: several would require blog owners to sign in to the app, but I didn't want to be responsible for keeping passwords secure. I initially considered passport, but then I found Florian Heinemann's passwordless which sends a one-use login link every time a user wants to log in. Given users may only ever log in one or two times, this was a perfect solution. Unfortunately, Florian hasn't updated the code for a while, and there was a problem with one of the dependencies. So I had to fork the code and make a minor adjustment, and then publish that fork as a new npm module.
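To make the one-use login link idea concrete, here is a minimal sketch in Python of how such a flow can work in general. It is not the API of the passwordless module or of Rockpool: the class name, the 15-minute lifetime, the in-memory store and the URL shape are all assumptions made for illustration.

```python
import hashlib
import secrets
import time

# Assumed lifetime for a login link; a real deployment would choose its own.
TOKEN_TTL_SECONDS = 15 * 60


class LoginTokenStore:
    """Hypothetical in-memory store for single-use login tokens."""

    def __init__(self):
        # Maps a hash of the token to (email, expiry time). Only the hash is
        # kept, so the raw token exists solely in the emailed link.
        self._pending = {}

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, email, now=None):
        """Create a fresh random token for this email address."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[self._digest(token)] = (email, now + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token, now=None):
        """Return the email for a valid token, or None. A token works once."""
        now = time.time() if now is None else now
        # pop() removes the record, so a second attempt with the same token fails.
        record = self._pending.pop(self._digest(token), None)
        if record is None:
            return None
        email, expires_at = record
        if now > expires_at:
            return None
        return email


def login_link(base_url, token):
    """Build the link to email to the user (URL shape is an assumption)."""
    return f"{base_url}/login?token={token}"
```

The point of the design is that the app never stores a password: it only needs to be able to send email, and a leaked link is useless once it has been used or has expired.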

I got some experience publishing my own scoped npm module from scratch when I realised that obtaining RSS feed information from a website URL and vice-versa was something I might want to do in other coding projects. That module - feedfinder - also gave me a small project to use for finally learning unit testing with mocha, which I went on to use in a much bigger way in Rockpool. I've written before about using mocha, and how whilst it sometimes felt like it was slowing me down, it almost certainly made the project go faster with better code in the long run.

I played around with some roll-my-own AJAX, learned the basics of React, and then rejected it as far too fiddly and decided to learn and use VueJS instead. It turned out that using mocha to do unit testing really helped me when structuring Vue components (even though I didn't actually test them with mocha), because it directed me to a kind of API-driven structure for the app's routes and function triggers.

I learned a bit about MongoDB indexes (though it's quite likely I haven't got this right), and finally got my head around Docker Compose. I've been using Docker Compose to run my Mastodon server for a while, but putting Rockpool into Docker containers helped me to understand what's actually going on. In some ways, Rockpool was just a vehicle for learning a bunch of web technologies and frameworks.

What it means for Aus GLAM Blogs

That's a lot of tech jargon for people uninterested in software development! Here's what it means if you follow or have registered your blog with Aus GLAM Blogs:

Improved performance

The old app connected to a remotely hosted database, and I hadn't set up any indexes. There were also problems with the way my database queries were set up. As a result, it was really slow - especially on first loading a page. The new app is much faster.

Update your own blog details

The new software allows blog owners to log in and update the Twitter and Mastodon accounts associated with their blog(s), change the category, or delete the blog from the app entirely (though not old posts). If you already have a blog on Aus GLAM Blogs you'll need to 'claim' your blog. If you ever change the title of your blog, you can use the update feature without actually changing your blog's category, and it will grab the new title.

Integrates with Mastodon

Aus GLAM Blogs will now post to Mastodon as well as to Twitter. Mastodon integration was actually the main thing I wanted to add, originally, but I got a bit carried away. If you want to add your Mastodon account to your blog listing (so you are mentioned on Mastodon when you publish a new blog post) you need to log in, add your Mastodon details, and claim your blog.

Hashtags and content warnings

If your blog is tagged glam blog club then when the bot tweets or toots it will now use the #glamblogclub hashtag. The Mastodon bot will also use content warnings for key words I think might benefit from it.

Improved Search

You can now search by keyword or phrase across tags, titles and authors, or browse by tag. I've also added some filters for 'this month' and 'last month'[1].

Tags are also lightly normalised, as I described last year.
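As a rough illustration of what light tag normalisation can look like, here is a short Python sketch. The specific rules (lower-casing, dropping a leading hash, treating hyphens, underscores and runs of whitespace as a single space, and removing duplicates) are assumptions chosen for the example and are not necessarily the rules Rockpool applies.

```python
import re


def normalise_tag(tag):
    """Return a lightly normalised form of one tag (assumed rules)."""
    # Trim whitespace, drop any leading '#', and lower-case.
    cleaned = tag.strip().lstrip("#").lower()
    # Treat hyphens, underscores and repeated whitespace as one space.
    cleaned = re.sub(r"[\s_-]+", " ", cleaned)
    return cleaned.strip()


def normalise_tags(tags):
    """Normalise a list of tags, dropping empties and duplicates in order."""
    seen = []
    for tag in tags:
        normalised = normalise_tag(tag)
        if normalised and normalised not in seen:
            seen.append(normalised)
    return seen
```

With these rules, "Libraries", "libraries " and "#Libraries" all collapse to the single tag "libraries", which keeps browsing by tag from fragmenting across trivial variants.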

Better options for Pocket

For some time readers have been able to subscribe to the main feed to have posts added directly to their Pocket list. Unfortunately I neglected to create a way to unsubscribe. I've now added that and also an option to exclude particular blogs from the feed if there are some that you find irrelevant or uninteresting. I also removed the function that adds all the tags in the original blog post as tags in Pocket, because it turned out to be really annoying. You can always add Pocket tags manually if you want.

If you have a Pocket account registered with Aus GLAM Blogs, you will already have an account to log in to the new app, using the same email address you use for Pocket. If your Pocket username is not an email address, there was no way to migrate your account over to the new app, but I think that only affects me.

Improved admin features

You won't notice this directly (probably), but the new software gives administrators more control over what's going on. I can now suspend a blog if, say, someone gets temporarily hacked, or begins posting things not in line with 'community standards'. This is a gentler option than simply removing the blog completely. It's also now much easier to delete a blog if it goes offline or the feed starts failing for some other reason.

Running your own Rockpool instance

The last reason why this took a bit longer than I anticipated is that I wanted other people to be able to use the code in a meaningful way. It's one thing to put your code up on GitHub. It's quite another to build it in a way that other people can actually understand how to install and use it. I had to think about documentation, and what needed to be adjustable settings rather than hard-coded values. I've probably got some things wrong, but I hope that it's reasonably straightforward to set up your own instance of Rockpool for your own community, should you wish to do so. I'd be really interested to talk to anyone who wants to give it a go.

The whole thing runs in two Docker containers using Docker Compose, so the good news is that you shouldn't have to worry much about getting your environment set up properly or maintaining dependencies. I anticipate that the most difficult thing will be getting past Twitter's Bot Police to get your Twitter API keys.

If you want to try it for your own community, check out the installation instructions on GitHub with the code. I'm more than happy to provide any help or advice.

What now?

Aus GLAM Blogs is now running on Rockpool. You can find out more from the help page, or the Rockpool docs. If you have a blog on Aus GLAM Blogs it would be great if you could log in and claim it, and check out the app.

If you have any feedback I'm keen to hear it. If you spot a bug, please report it. A good thing about having rewritten everything is that it is now much easier for me to make updates and run tests, so if they're needed, updates and bug fixes can be done relatively quickly. Special thanks to Alissa who helped me with some great testing, and even submitted a pull request to fix some bugs.

Having said that, I'm hoping to take a bit of a rest from Rockpool for a while. I've got a shelf full of books to read, and a couple of new features to add to ephemetoot.

  1. To be perfectly honest these are really there to make it easier for newCardigan to quickly find the latest GLAM Blog Club posts. ↩︎

MarcEdit Webinar 2.5: Getting started with MarcEdit in MacOS / Terry Reese

Terry Reese is inviting you to a scheduled CarmenZoom meeting.

Topic: MarcEdit Webinar 2.5: Getting started with MarcEdit in MacOS
Time: Mar 30, 2020 01:00 PM Eastern Time (US and Canada)

Join Zoom Meeting

One tap mobile
+16468769923,,292238303# US (New York)
+13126266799,,292238303# US (Chicago)

Dial by your location
+1 646 876 9923 US (New York)
+1 312 626 6799 US (Chicago)
+1 301 715 8592 US
+1 346 248 7799 US (Houston)
+1 408 638 0968 US (San Jose)
+1 651 372 8299 US
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US
Meeting ID: 292 238 303
Find your local number:

Join by SIP

Join by H.323 (US West) (US East) (China) (India Mumbai) (India Hyderabad) (EMEA) (Australia) (Hong Kong) (Brazil) (Canada) (Japan)
Meeting ID: 292 238 303

The Ohio State University

CarmenZoom is supported by the Office of Distance Education and eLearning:

CarmenZoom resources
Phone: 614-688-HELP (4357)

If you have a disability and experience difficulty accessing this content, contact the Accessibility Help Line at 614-292-5000 or Text Telephone for the Deaf at 614-688-8743.

In Media Res / Ed Summers

I’ve been re-reading Bruno Latour’s Reassembling the Social, which is a book that landed me in graduate school in the first place, and has been a guiding text for me since. One of the reasons why I really like the book (other than his argument about what constitutes the social) is that Latour is an excellent writer, with a sense of humor. It is remarkable because Latour’s native tongue is French, but (I believe) he wrote Reassembling in English. I wanted to copy this page because it so aptly characterizes my experience doing field work and the analysis/writing that I’m in the middle of. Latour humanizes the process of research, and makes you feel like you are doing ok, even when it feels like you are not.

What is an account? It is typically a text, a small ream of paper a few millimeters thick that is darkened by a laser beam. It may contain 10,000 words and be read by very few people, often only a dozen or a few hundred if we are really fortunate. A 50,000 word thesis might be read by half a dozen people (if you are lucky, even your PhD advisor would have read parts of it!) and when I say ‘read’, it does not mean ‘understood’, ‘put to use’, ‘acknowledged’, but rather ‘perused’, ‘glanced at’, ‘alluded to’, ‘quoted’, ‘shelved somewhere in a pile’. At best, we add an account to all those which are simultaneously launched in the domain we have been studying. Of course, this study is never complete. We start in the middle of things, in media res, pressed by our colleagues, pushed by fellowships, starved for money, strangled by deadlines. And most of the things we have been studying, we have ignored or misunderstood. Action had already started; it will continue when we will no longer be around. What we are doing in the field–conducting interviews, passing out questionnaires, taking notes and pictures, shooting films, leafing through the documentation, clumsily loafing around–is unclear to the people with whom we have shared no more than a fleeting moment. What the clients (research centers, state agencies, company boards, NGOs) who have sent us there expect from us remains cloaked in mystery, so circuitous was the road that led to the choice of this investigator, this topic, this method, this site. Even when we are in the midst of things, with our eyes and ears on the lookout, we miss most of what has happened. We are told the day after that crucial events have taken place, just next door, just a minute before, just when we had left exhausted with our tape recorder mute because of some battery failure. Even if we work diligently, things don’t get better because, after a few months, we are sunk in a flood of data, reports, transcripts, tables, statistics, and articles.
How does one make sense of this mess as it piles up on our desks and fills countless disks with data? Sadly, it often remains to be written and is usually delayed. It rots there as advisors, sponsors, and clients are shouting at you and lovers, spouses, and kids are angry at you while you rummage about in the dark sludge of data to bring light to the world. And when you begin to write in earnest, finally pleased with yourself, you have to sacrifice vast amounts of data that cannot fit in the small number of pages allotted to you. How frustrating this whole business of studying is.

And yet, is this not the way of all flesh? No matter how grandiose the perspective, no matter how scientific the outlook, no matter how tough the requirements, no matter how astute the advisor, the result of the inquiry–in 99% of the cases–will be a report prepared under immense duress on a topic requested by some colleagues for reasons that will remain for the most part unexplained. And this is excellent because there is no better way.

(pp. 123-4)

Crowdsourcing streetlights data for Kathmandu: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Ankita Shah from Youth Innovation Lab in Nepal who received funding from Mapbox to showcase crowdsourced streetlights data for Kathmandu to influence policy for their maintenance.

Oftentimes we notice people around us complaining about problems they face once they step outside of their houses every day. For instance, in Nepal people complain that the roads are not properly constructed, the air is polluted, there is not enough public transportation, there are not enough toilets, traffic jams are getting worse, and the list goes on.

It has become a norm for people to complain about one thing or another, and to criticise and blame the institutions and the people holding authority. But most of them fail to do something about it, to find a solution and actually work on it, instead of just complaining. They fail to realise that as citizens of the country, we also have the responsibility to act. Even though we might not have the authority or resources to do the job by ourselves, we do hold the power to raise our voices, show evidence and hold the people with authority accountable.

Realising this growing gap and the urgency to address it, Youth Innovation Lab (YI-Lab) launched LightsON, a digital advocacy campaign that aims to bring open data and awareness together for informed decision making.

A year ago on Open Data Day 2019, the LightsON campaign commenced with the aim of addressing one of the many problems our communities face on a daily basis, i.e. the lack of proper maintenance of streetlights. Streetlights are one of the many public utilities that are important to people for so many reasons. They are the basic infrastructure that ensures safer mobility after sundown. Many security and safety issues, such as road accidents, theft, burglary, drug abuse, and rape, often occur in dark places and could be reduced simply by proper lighting and visibility. Unfortunately, most of the streetlights inside the valley do not work; the old ones are not replaced and the new ones are not maintained. The institutions responsible for maintaining these streetlights are failing to address this issue, one of the many reasons being the lack of data and spatial information on streetlights. Therefore, we decided to collect concrete data on streetlights and make it open and accessible to all so that we can urge the responsible institutions to draft policy for their periodic maintenance.

During the launch of LightsON, YI-Lab hosted a one-day session that brought together elected government representatives, officials from the Survey Department, a Nepal Electricity Authority (NEA) representative, Nepal police, open data enthusiasts, local citizens, digital volunteers and youths in an interactive discussion. The session sought to examine the issue in depth. It covered the role of the responsible institutions, accident and crime rates in dark places, the availability of data, the role of technology and, most importantly, the importance of making data open to the public and giving them the power of interrogation with evidence. (Blog of launch event:

A low-cost mobile app and interactive web portal were developed in coordination with a tech company called NAXA to collect the data on streetlights.  The collected data are fed into the open web platform, which visualises functional and non-functional electric and solar streetlights.  Based on the data, we can identify the type, condition, and functionality of the streetlights along with their exact location and a picture. YI-Lab strongly believes in the spirit of volunteerism as one of the best mediums to generate a sense of civic responsibility among youths, and so we started the campaign by reaching out to youths from different colleges, sensitising them about the issues and encouraging them to be part of our campaign.

For Open Data Day 2020, we aimed to shed light on what we had started a year back with the event ‘LightsON: Open Dialogue for Policy’. Supported by the Open Knowledge Foundation, this event aimed to present the streetlights data collected so far as evidence to initiate an open dialogue on how the issues of poor maintenance of public utilities can be addressed by the responsible institutions using the right data and evidence-based policy making.

The Deputy Mayor of Kathmandu Metropolitan City, Respected Ms. Hariprabha Khadgi (Shrestha), gave a keynote speech on how this issue can be addressed by municipal governments and what initiatives can be taken in future to periodically maintain streetlights. She was delighted with the initiative and extended her support to take this initiative further.

After the speech by Respected Ms. Khadgi, an hour-long open discussion began in the presence of Hon. Biraj Bhakta Shrestha, Member of Parliament of Bagmati Province. The open discussion aimed to bring multidisciplinary perspectives to the issue of streetlight maintenance that could be useful in advising municipal governments on drafting suitable policy. Several interesting and insightful points were brought up during the discussion that not only gave everybody an opportunity to learn but also opened up exciting avenues for the LightsON team to take the campaign further.

With such amazing and insightful discourse, the session ended with special remarks by Hon. Biraj Bhakta Shrestha. He has been a supporter of the LightsON campaign since its inception. During his remarks, he highlighted the importance and potential of technology and the global paradigm shift towards being technology driven. He emphasised that the era is shifting from being capital intensive to being driven by ideas and innovation, and so the next generation is all about innovative ideas. Referring to LightsON as the tip of the iceberg, he encouraged the team to develop other similar technologies in future to solve other problems. According to him, data is the most important element in development, policy and good governance. In order to be able to advocate on policy, Hon. Shrestha urged the team to understand the government’s structural functioning and underlined that ownership, economy, and security are the three motivational factors to engage communities. Finally, the session ended with Hon. Shrestha extending his support to take the campaign forward.

Evergreen Community Spotlight: Terran McCanna / Evergreen ILS

The Evergreen Outreach Committee is pleased to announce that March’s Community Spotlight is Terran McCanna of the Georgia Public Library Service (GPLS). Terran is the PINES Program Manager at GPLS.

Terran has been involved with the Evergreen community since 2013. In that time she has been very involved with bug squashing – whether reporting, testing, or patching, Terran has tackled almost every aspect of bug management in Evergreen. For many years she has served as the Coordinator for the community Bug Squashing Week, where she organizes sandbox servers and tracks all bug activity.

Bug Squashing Week was a mere Bug Squashing Day until 2017, and Terran was instrumental in making the larger event happen. “I’m proud of my work in encouraging and enabling more community members to participate in testing through the expansion of Bug Squashing events,” she says.

Terran’s work with bug squashing has generated some impressive Launchpad and git statistics. She is responsible for 218 reported bugs (along with a total of 808 bugs she’s commented on), and has authored 30 patches accepted into Evergreen and signed off on 83 more. 

In addition to working with bugs, Terran also helped found the recently-created New Developers Working Group, which gives new developers a forum to exchange ideas and help each other learn.  Terran was also the local chair of this year’s Evergreen Conference (sadly cancelled), in addition to being a member of the standing Conference Committee for many years.

Prior to joining the PINES team, Terran worked at a PINES member library in several front-line roles. At PINES, she handles tasks such as training, policy discussion, documentation, and helpdesk support. “I have to take both a larger state-wide view of how our consortium works, as well as taking a deeper look at how the software works,” Terran tells us. “I feel that this has made me a strong advocate for making the software more user-friendly, and for improvements to the software that will provide benefit to both patrons and library staff.”

Terran recommends that new community members get involved by joining a community working group, contributing to documentation, or participating in testing.

“It’s been incredibly rewarding and exciting to be a part of the Evergreen community as the software has evolved from the desktop client to the web client, and how it continues to evolve to meet library needs. I look forward to participating in many more developments in whatever way I can.”

Do you know someone in the community who deserves a bit of extra recognition? Please use this form to submit your nominations. We ask for your email in case we have any questions, but all nominations will be kept confidential.

Any questions can be directed to Andrea Buntz Neiman via or abneiman in IRC.

DLF Forum + COVID-19 / Digital Library Federation

We hope you and your families are staying healthy and safe during these unusual times. We write with some updates regarding planning for the 2020 DLF Forum and affiliated events.

Our current intention is to hold the DLF Forum and affiliated events as scheduled in Baltimore this November. CLIR is monitoring the science and current CDC guidelines about COVID-19 and, at this point, it appears that restrictions on large gatherings will be in effect at least through late spring. If these measures have the intended effect of containing the virus and “flattening the curve” so that the situation improves over the summer, we are optimistic that we’ll be able to proceed with the Forum as planned this fall.

We’ll be in ongoing communication with you over the coming months, and we hope you’ll be in communication with us, too. Your thoughts and comments are always welcome, on the DLF-Announce listserv, on Twitter @CLIRDLF, or directly to Aliya and the Forum planning team at Please get in touch if you have questions, concerns, or ideas for new ways we could energize our community for mutual support.

We know that many of you may be adjusting to working from home, caring for family members, or concerned about changes in your income, healthcare, or other personal assets, not to mention feeling the weight of the uncertainty each day brings. We know it may be challenging to think ahead to a conference taking place in November. On our end, the planning must continue to be sure we are ready for a great event later this year, but in watching this unfolding situation, we are committed to being as flexible with our deadlines as possible. To that end, we are extending the Call for Proposals deadline for all of our events by two weeks, to Monday, May 11.

We also want to reassure all of you that, if circumstances change so that the Forum cannot be safely held in person as planned, we will be in touch as early as we can. Additionally, going forward we’ll share with you monthly updates on conference planning milestones, as well as any Covid-19-related changes on the DLF Forum web site.

Again, if you have any questions or if you just want to talk, please reach out to us. We wish you and your families safety and health, and, as always and especially in times like these, we are grateful for this wonderful DLF community.

Aliya and the CLIR/DLF Team

The post DLF Forum + COVID-19 appeared first on DLF.

Tracking spending by Kenya’s county governments: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Chepkemoi Magdaline from Eldohub in Kenya who received funding from Hivos to run a hackathon to develop tools and systems which can facilitate county governments’ involvement in Kenya’s transparency, accountability and public participation.

Open Data Day is an annual celebration of open data all over the world. To celebrate Open Data Day 2020, EldoHub, a technology innovation hub located in Uasin Gishu County in the western region of Kenya, hosted a hackathon event to brainstorm and come up with collaborative solutions to transparency and accountability.

The hackathon/pitching competition brought together technologists, students, local government officials and other stakeholders who came up with tools to track county governments’ use of finances provided by the national government, or which could give the public access to information regarding the use of finances in public offices and allow them to track development projects.

The 2020 celebrations in Eldoret raised our voice and triggered conversations for action on youth inclusive participation and transparency showing how tech savvy youth can help the local government of Uasin Gishu to develop tools for tracking the use of finances and running of projects in a more transparent and inclusive way.

The meeting started on time. Purity from EldoHub opened the event by welcoming the participants and giving a brief introduction of EldoHub. She also gave a lightning talk on open data and open government. Timz Owen was the MC for the day. He ensured that the event was celebrated with energy and fun-filled activities.

While Owen was entertaining the participants, Sarah, a software developer at EldoHub, and Zipeta, EldoHub’s hub manager, briefed the judges on the expectations for the event and the judging criteria. Our able judges were Stephen Mwongela, the founder of Plusfarm Kenya; Laryx Ochieng Kosgei from the Eldoret chapter of Start-Up Grind; Beverly Nicole Adhiambo, founder of Initiative For Her; and Gerald Makori from the Tumaini Innovation Centre.

The hackathon then started officially, led by Zipeta, who guided the audience on the design thinking process and how to develop human-centered designs. The participants were able to come up with ideas and structure them using Business Model Canvas (BMC) in an effort to get a product market fit. Lastly, they were guided on how to come up with a three-minute elevator pitch, demonstrated before the final pitch with the assistance of Sarah Chepwogen and the Eldohub team. The pitching competition then commenced with four groups participating.

The winning idea was My Health Advisor led by Jacinta Gichuhi who works at Ampath. It is a web-based app aimed at saving more lives and resources used in treatment by providing health awareness to the general population.

The winning team present their health awareness app My Health Advisor

The first runner-up was Sambaza Farm, led by Ester Mwaniki of the University of Eldoret, who used open data to develop a project that helps the Uasin Gishu County government to track and monitor projects that bridge the gap between places with excess food and places without adequate food. This was in an effort to solve food insecurity in Kenya and all of Africa with transparency and openness.

EldoHub awarded the pitching competition winner $50 to enable them to register a social enterprise, and also offered them free co-working space at EldoHub for one month with business training and coaching.

What really matters / Hugh Rundle

About a week ago - or maybe it was three days ago, I'm having trouble keeping track these days - the leader of the Australian Labor Party said something utterly predictable and yet highly revealing. It was around the time Qantas announced it was standing down 20,000 workers - two thirds of their workforce - and forcing them to use their accrued leave. If they didn't have any, CEO Alan Joyce helpfully suggested they might be able to compete with the tens of thousands of casually employed hospitality workers whose shifts had disappeared, for a job stacking chronically empty shelves at supermarkets.

"We can't let this crisis change the structure of our economy", said Anthony Albanese. The so called leader of the so called party of "labor" (sic) was at pains to emphasise his satisfaction with the very economic structure that has left so many workers bereft in a time of public health crisis. COVID-19 is not the only disease endangering people right now. The fact that most of the restaurant and cafe workforce was on 'casual' contracts, able to be dismissed or denied shifts at a whim, or that one of the largest and most profitable companies in Australia can simply 'stand down' two thirds of its workforce without pay, or that Australian fruit production can't function without thousands of Pacific Islanders on 'special' visas working under semi-legal pay and conditions, reflects a deep structural disease in Australia's economy.

On Tuesday night Prime Minister Scott Morrison declared - fewer than 60 seconds after running through a list of expanded bans on certain types of work - that "all workers are essential workers", and "if you have a job, it's essential". This sort of brain-dead ideological posturing was just as unsurprising as the Labor Party declaring undying love for neo-liberal capitalism, but did come off as somewhat more unhinged. What we're getting around the world - particularly in countries with "highly developed" economies - is a quick lesson in the limits of capitalism, and more importantly, the possibilities of collective government. Contra Anthony Albanese, we can't let this crisis fail to change the structure of our economy.

Australian news websites have slowly pivoted over the last fortnight from stories about how property prices might "hold up" to stories about newly jobless people having to choose between buying food or paying rent. After decades of declaring that unemployment payments were perfectly adequate - in the face of overwhelming evidence they were contributing to entrenched poverty - the Liberal Party has suddenly discovered that "Jobseeker allowance" is completely inadequate for "ordinary" people, and effectively doubled it. Additional $750 payments to low-income people that were originally conceived as economic stimulus are now being described as additional support for subsistence. Amid this sudden turnaround in government rhetoric, I've seen people both here and in other countries calling for a strong campaign to bring in a permanent Universal Basic Income (UBI). I think this is misplaced.

A cash income is certainly nice - and the hundreds of thousands of Australians who applied for Centrelink payments this month have suddenly discovered what the Australian Unemployed Workers Union[1] has been saying ever since they formed: unemployment payments are insufficient to keep people alive and healthy, the system is too complicated, and the government has deliberately made the entire process demeaning and lengthy. But there's another problem with conceiving a universal basic "income" as the solution to a precarious existence: it misplaces the cause of the precarity. Cash incomes are easily rendered inadequate by the very market forces that a UBI is expected to overcome. Fixed incomes and rising prices have triggered plenty of riots over the last several thousand years.

What we've seen in the last month is a sudden clarity about what people really need. Combining the things the suddenly-unemployed are most concerned about, and the things the suddenly-anxious are stripping from supermarket shelves, a reasonable list of basic needs might look something like:

  • safe, secure housing
  • well-resourced universal health care
  • reliable access to medicines
  • basic hygiene products (soap, toilet paper, tampons, etc)
  • fresh food

I'm cautious about the last two items, in terms of the role of governments. But it's quite clear to me that government guarantees on the first three would deal with most of the reasons people are calling for a UBI in the current crisis. It's housing in particular that causes so much stress: and not just because of COVID-19 induced job losses. Insecure or unsafe housing is a major contributor to homelessness, family violence, drug dependency, petty theft, and insecure employment. Little wonder there are calls in Australia and elsewhere for government-mandated rental relief and bans on eviction.

Free and secure housing. Universal, well resourced healthcare including dental and mental health. Cheap and accessible medicine and basic hygiene products. Guarantee those, and a whole suite of problems disappear. Let's not forget what really mattered in the crisis, when it's over.

  1. Sling them a donation if you can. ↩︎

Managing Metadata Editing for Telecommuting. / Mark E. Phillips

Many of us in the US, and around the world for that matter, are now sitting at home working remotely trying to maintain some semblance of normalcy during quarantine for COVID-19.

I’m going to write up a few of the things that we have been working on here at the UNT Libraries to provide as many library employees as possible, including student employees, with activities that they can do remotely.

On March 13th it became quite clear that a large number of folks from the library would need to work remotely. Many of us have activities that we can do remotely, and some of us even prefer to work remotely when we have the opportunity. There is, however, a large group of people at the library whose jobs don’t directly transfer to working remotely, or whose usual work isn’t available when there aren’t students on campus to serve.

We wanted to provide an option for these individuals to create metadata for the UNT Libraries Digital Collections if they were interested in doing so. Additionally, this would give supervisors some meaningful activity that could be verified in the event that we had to move the workforce remotely because of this pandemic.

What we needed

There were a few things that we needed to have in place in order to move people online to edit metadata records.

  • Web-based system for editing metadata
  • Clear instructions and guides
  • A way of identifying work to be done
  • A way of coordinating activity
  • A way of verifying/tracking activity for timekeeping

Web-based editing system.

There were a number of things that we have in place that allow us to move a large number of people into the metadata creation workflow. First, we have been using a web-based metadata editing system for the entire time that we have had our digital collections. Here at UNT Libraries we creatively call this system Edit. This system is built around the metadata format that we have been using locally called UNTL. We can add a user in the system, assign them to a subset of the collection, usually based on collections, and give them permission to begin editing metadata. We have a system in place to register new users in batches and then invite them to join the editing system.

In addition to our own work here in the libraries, we have made use of this metadata editing system for a number of metadata classes of library school students to give them real-world experience writing metadata records in a production system. Because of these previous experiences, the task of creating 60-70 new accounts for users in the library wasn’t too daunting.

For setting up user accounts we make use of a Django app that we wrote a few years ago called django_invite. Generally, the workflow we follow for new accounts is that we are given one or more new users who need accounts. The information we need is just a name, an email address, and the scope of the collection they need access to. We can enter multiple names at once if they have the same permissions. The permissions for this app are based on the standard Django permission and group concepts.

Django Invite

Once you submit users, the system sends out an invitation email for the user to complete the registration process by picking a name. We are able to see who has established their account and if needed, resend the invitation email.

This process makes it fairly straightforward to get people set up in the system.

Editing Records

As I said, we have been using a web-based editing system for the UNT Libraries Digital Collections for over a decade now. The editing system (Edit) starts a user with a view of all of the records that they can access based on their permissions. We call this view the “Search Dashboard”.

Edit System: Search Dashboard

From here a user can search the metadata for a record, sort results, and limit their result sets based on any of the facets on the left-hand side of the screen. For those interested, the facets include:

  • System
  • Collections
  • Partners
  • Resource Type
  • Visibility (hidden or not)
  • Date Validity (valid EDTF dates)
  • My Edits (records you have edited)
  • Record Completeness
  • Location data (with/without placenames, geocodes, or bounding boxes)
  • Recently Edited Records (Last 24h, 48h, 7d, 30d, 90d, 180d, 365d)
Edit System: Limiting to Collection

From there a user gets back their results where they can be further refined by sorting.

Edit System: Limited Collection

The current sort options include:

  • Title (default)
  • Date Added (newest/oldest)
  • Creation Date (newest/oldest)
  • Date Last Modified (newest/oldest)
  • ARK Identifier (lowest/highest)
  • Completeness (lowest/highest)

They are also able to see basic information about the object including a thumbnail, the system, collections, and partner in which the item belongs. They can see the accession date and the last date that the item was edited. Finally, they can see a visibility flag, green for visible to the public and a red check for not visible.

They can choose to go to one of two places for a record, the edit view or the summary view. I will start with an overview of the summary view.

Edit System: Item Summary

The Summary view provides an overview of the record including a compact version of the record itself. We provide an Edit Timeline to get a better sense of when the item has been edited, and when it became available online. Additionally, it has links and other information that are helpful for metadata creators. I will walk through a few of those links now. First up is the View Item screen.

Edit System: View Item

This view is for the metadata creator to interact with the object itself. They are able to see all of the pages of the item and zoom in to look at the details. For audio and video, they have the ability to view the item in a player as well as download the media files as needed.

We also have the ability to see the history of edits that have occurred for a record. This history page presents information about who, when, and a high-level overview of what has happened to the item over time. You might notice a number of edits for this record. One of the things we have noticed in our metadata editing practice is that we tend to take a “column” approach to the editing of records instead of a “row” approach. We will find an issue, maybe an incorrectly formatted name, and fix all of those instances in the system. This results in many edits per record but allows editors to focus on a single task. As you can see, all of the record edits are versioned so it is possible to go back and view what was changed and by whom.

Edit System: Item History

The final view is the metadata editor itself. I’ve mentioned this in a number of other posts over time so I won’t go into too much detail here. Basically all of the work of editing gets done here. Users can add, subtract, reorder, and edit elements. Most elements have qualifiers to designate the type of element being used, such as the Main Title, Added Title, Serial Title, or Series Title for the title element. Some elements, such as creator, contributor, and publisher, have a type-ahead that pulls from our name authority system (UNT Names) and include information about the type of name (personal/organization), the role (author, photographer, editor), and an info field for other bits of info about the agent. All dropdown values are pulled from a centralized vocabulary management system.

Some of the fields have popup modals for controlled vocabularies, picking locations from a map, or assigning bounding box information to an object. From here users can mark an object as hidden or visible, and publish the record in order to save it back into the system.

Edit System: Edit Record

As I mentioned above there are a number of other components that are proving to be important as we move a large number of works into our metadata system. In the past week, we have created over 70 new accounts for students and staff in the library so that they can begin to incorporate metadata editing into their work. In the next few posts, I will go over how we are attempting to manage who is doing what and how we are providing social and technical infrastructure to help managers keep track of what is going on with the folks they are responsible for.

If you have questions or comments about this post, please let me know via Twitter.

CAP Code Share: Caselaw Access Project API to CSV / Harvard Library Innovation Lab

Today we’re going to learn how to write case data from the Caselaw Access Project API to CSV. This post shows work from Jack Cushman, Senior Developer at the Harvard Library Innovation Lab.

The Caselaw Access Project makes 6.7 million individual cases freely available from Harvard Law School Library. With this code, we can create a script to get case data from the Caselaw Access Project API, and write that data to a spreadsheet with Python. This demo is made available as part of the CAP Examples repository on Github. Let’s get started!

How does this script find the data it’s looking for? This happens with an API call using the CAP API, and retrieves all cases that include the words “first amendment”: Want to create your own CAP API call? Here’s how.

The Caselaw Access Project has structured, case-level metadata. You can query parts of that data using the CAP API with endpoints, like “court” or “jurisdiction”. Here’s a rundown of the endpoints we have. This demo gets data using these endpoints to write case data to a CSV file: 'id', 'frontend_url', 'name', 'name_abbreviation', 'citation', 'decision_date', 'jurisdiction'. You can adapt this code, and choose your own endpoints.

To run this script, find your CAP API key by creating an account or logging in, and viewing your user details.
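As a rough illustration of the CSV-writing step described above, here is a minimal sketch. It is not the script from the CAP Examples repository. It assumes the API response has already been fetched and parsed, that each case record carries a `citations` list of objects with a `cite` key, and that `jurisdiction` is an object with a `name` key; those shapes, the helper names, and the sample record are all assumptions made for the example.

```python
import csv
import io

# Column names taken from the post; "citation" and "jurisdiction" are
# flattened from nested values (an assumption about the response shape).
FIELDS = ["id", "frontend_url", "name", "name_abbreviation",
          "citation", "decision_date", "jurisdiction"]


def case_to_row(case):
    """Flatten one case record into a flat dict suitable for a CSV row."""
    citations = case.get("citations") or []
    jurisdiction = case.get("jurisdiction") or {}
    return {
        "id": case.get("id"),
        "frontend_url": case.get("frontend_url"),
        "name": case.get("name"),
        "name_abbreviation": case.get("name_abbreviation"),
        "citation": citations[0].get("cite", "") if citations else "",
        "decision_date": case.get("decision_date"),
        "jurisdiction": jurisdiction.get("name", ""),
    }


def write_cases_csv(cases, out):
    """Write a header row followed by one row per case to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for case in cases:
        writer.writerow(case_to_row(case))


# Hypothetical record standing in for one entry of a parsed API response.
sample_results = [{
    "id": 1,
    "frontend_url": "https://example.org/case/1",
    "name": "Doe v. Roe",
    "name_abbreviation": "Doe v. Roe",
    "citations": [{"cite": "1 Ex. 1"}],
    "decision_date": "1900-01-01",
    "jurisdiction": {"name": "Example"},
}]

buffer = io.StringIO()
write_cases_csv(sample_results, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of memory, pass a file opened with `open("cases.csv", "w", newline="")` in place of the `StringIO` buffer.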

This code is part of the CAP Examples repository on Github, a place to find and share code for working with data from the Caselaw Access Project. Do you have code to share? We want to see this resource grow.

Are you creating new things with code or data made available by the Caselaw Access Project? Send it our way. Our inbox is always open.

Hacks with Friends 2020 Retrospective: A pitch to hitch in 2021 / Library Tech Talk (U of Michigan)

Image of event details: March 5-6, 2020. Hack with Friends. 4 - A Pangeo State of Mind. Developing a Python stack and repository for access, analysis, and management of big data

When the students go on winter break I go to Hacks with Friends (HWF) and highly recommend and encourage everyone who can to participate in HWF 2021. Not only is it two days of free breakfast, lunch, and snacks at the Ross School of Business, but it’s a chance to work with a diverse cross section of faculty, staff, and students on innovative solutions to complex problems.

Tracking deforestation in West Africa with satellites and drones: Open Data Day 2020 report / Open Knowledge Foundation

On Saturday 7th March 2020, the tenth Open Data Day took place with people around the world organising over 300 events to celebrate, promote and spread the use of open data. Thanks to generous support from key funders, the Open Knowledge Foundation was able to support the running of more than 60 of these events via our mini-grants scheme.

This blog post is a report by Gideon Sarpong from iWatch Africa in Ghana, who received funding from Datopian to host a forum to leverage the power of public domain satellite and drone imagery to track deforestation and water pollution in West Africa.

iWatch Africa marked the 2020 Open Data Day in Accra in collaboration with GOIF and with support from the Open Knowledge Foundation.

The 2020 Open Data Day forum focused on the theme: ‘Leveraging public domain satellite and drone imagery to track deforestation and water pollution in West Africa’.

Over thirty people from various organisations in Accra participated in the event which took place at the Kofi Annan ICT Center in Accra.

Mr. Asante Sabrah of Linux Accra, a lecturer at GIMPA, was a guest speaker at the forum and emphasised the importance of public domain satellite data in driving the conversation about climate change in West Africa.

Policy expert and co-founder of iWatch Africa, Mr. Henry Kyeremeh also delivered a presentation highlighting the negative effects of Climate Change on West Africa.

Mr. Kyeremeh stressed that, “the empirical evidence is quite strong on the impact climate change is having on developing countries especially in West Africa. Noticeable among are droughts, reduction in crop yields, unpredictable rainfall pattern which in many instances causes havoc etc.”

Using Lake Chad as a case study, Mr. Kyeremeh said: “The Chad River which has shrank by circa 90 percent since 1973 is a typical example of what climate change and water mismanagement is doing in West Africa.”

He called for innovative solutions and policy making to address this global threat.

“Without doing anything, the situation could be much dire and therefore, domestic and international policy making must respond to mitigating climate change impact by encouraging new technologies for adaption, increase resource allocation and increase research and innovation.”

Gideon Sarpong, policy and news director of iWatch Africa, also led a session on the day with a focus on how the interactive satellite data platform managed by the Global Forest Watch could be used to track deforestation and afforestation in West Africa and aid in innovative media reportage.

iWatch Africa also used the forum to officially join the World Economic Forum (WEF) One Trillion Trees Initiative to help nature and fight climate change.

As part of our climate action, iWatch Africa will collaborate with several organisations including the WEF to plant 5,000 trees in Ghana in the next five years.

• This post was originally published on

More On Failures From FAST 2020 / David Rosenthal

A Study of SSD Reliability in Large Scale Enterprise Storage Deployments by Stathis Maneas et al, which I discussed in Enterprise SSD Reliability, wasn't the only paper at this year's Usenix FAST conference about storage failures. Below the fold I comment on one specifically about hard drives rather than SSDs, making it more relevant to archival storage.

Because HDDs and SSDs are in fact computers, not just media, their software is capable of reporting a good deal of diagnostic information via the SMART API. It has always been an attractive idea to use this information to predict device failures and enable proactive replacement. But past efforts to do so haven't been as effective as one might have hoped.

Now, Making Disk Failure Predictions SMARTer! by Sidi Lu et al applies machine learning to a very comprehensive dataset. Their abstract reads:
Disk drives are one of the most commonly replaced hardware components and continue to pose challenges for accurate failure prediction. In this work, we present analysis and findings from one of the largest disk failure prediction studies covering a total of 380,000 hard drives over a period of two months across 64 sites of a large leading data center operator. Our proposed machine learning based models predict disk failures with 0.95 F-measure and 0.95 Matthews correlation coefficient (MCC) for 10-days prediction horizon on average.
Lu: Figures 1 & 2
Previous work showed that SMART attributes did predict failure, but only so close to actual failure as to prevent proactive replacement. Their Figure 1 shows this - the red line for failed drives only becomes distinct in the last day or so of its life.

On the other hand, Figure 2 shows that the performance metrics of failed disks are distinguishable much earlier. This matches the observation in FAST 2018's Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems by Haryadi Gunawi and 20 other authors from 12 companies, national labs and University data centers that slow performance attributable to various components is common:
Fail-slow hardware is an under-studied failure mode. We present a study of 101 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 12 institutions. We show that all hardware types such as disk, SSD, CPU, memory and network components can exhibit performance faults.
In the left half of Figure 2, note that the failed drives are the slowest for the entire period of the graph.

To observe cases where the performance of a disk leading up to failure was distinguishable from normal behavior, they used the difference between the actual parameter and the average of healthy disks on the same server:
If there is only one failed disk on a specific failed server, we keep the raw value of the failed disk (RFD) and calculate the average value of all healthy disks (AHD) for every time point. Then, we get the difference between RFD and AHD, which indicates the real-time difference between the signatures of failed disks and healthy disks on the same server. If there are N ( N ≥ 2 ) failed disks, then for each failed disk, we calculate the difference between RFD and AHD for every time point.
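The RFD minus AHD calculation described in this passage can be sketched in a few lines. This is an illustration only, not the authors' code: it assumes every disk on a server reports the same metric at the same time points, and the function name and sample values are made up for the example.

```python
def rfd_minus_ahd(failed_series, healthy_series_list):
    """Return RFD - AHD for every time point.

    failed_series: raw values of one failed disk (RFD), one per time point.
    healthy_series_list: one series per healthy disk on the same server.
    """
    if not healthy_series_list:
        raise ValueError("need at least one healthy disk on the server")
    differences = []
    for t, rfd in enumerate(failed_series):
        # AHD: average value of all healthy disks at this time point.
        ahd = sum(series[t] for series in healthy_series_list) / len(healthy_series_list)
        differences.append(rfd - ahd)
    return differences


# Hypothetical metric values for one failed disk and two healthy disks.
failed = [10.0, 12.0, 30.0]
healthy = [[10.0, 11.0, 10.0], [12.0, 13.0, 12.0]]
differences = rfd_minus_ahd(failed, healthy)
```

With several failed disks on one server, the same function would be called once per failed disk against the same healthy set, as the quoted passage describes.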
Lu Figure 5
Using the difference between RFD and AHD, the authors observed a range of behavior patterns leading up to failures, as shown in Figure 5:
The top two graphs of Figure 5 illustrate that some failed disks have a similar value to healthy disks at first, but then their behavior becomes unstable as the disk nears the impending failure. The bottom two graphs of Figure 5 show that some failed disks report a sharp impulse before they fail, as opposed to a longer erratic behavior. These sharp impulses may even repeat multiple times. We did not find such patterns for SMART attributes so far before the failure of this selected example. The diversity of patterns demonstrates that disk failure prediction using performance metrics is non-trivial.
Using machine learning techniques focused on performance metrics, with the interesting addition of location data, they were able to predict disk failures 10 days ahead with high confidence:
We discover that performance metrics are good indicators of disk failures. We also found that location markers can improve the accuracy of disk failure prediction. Lastly, we trained machine learning models including neural network models to predict disk failures with 0.95 F-measure and 0.95 MCC for 10 days prediction horizon.
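For readers unfamiliar with the two scores quoted here, the following sketch computes F-measure (taken here to be the balanced F1 form, which is an assumption about the paper's usage) and the Matthews correlation coefficient from a confusion matrix. The counts are invented for illustration and are not taken from the paper.

```python
import math


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical counts: failed disks are the positive class.
tp, fp, fn, tn = 90, 10, 10, 890
f1 = f_measure(tp, fp, fn)
matthews = mcc(tp, fp, fn, tn)
```

MCC uses all four cells of the confusion matrix, including true negatives, which matters when failed disks are a small minority of the population.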
Disks in the same location in a rack are subject to the same vibration and thermal environment. The relevance of this to failure was noted in Nisha Talagala's 1999 Ph.D. thesis Characterizing Large Storage Systems: Error Behavior and Performance Benchmarks as a prime cause of correlated failures:
The time correlation data in Section 3.5.4 showed that several machines showed bursts of SCSI errors, escalating over time. This data suggests that a sequence of error messages from the same device will suggest imminent failure. A single message is not enough; as Section 3.5 showed, components report hardware errors from time to time without failing entirely. ... the SCSI parity errors were relatively localized, appearing in only three of the 16 machines.
Back in 2015, RAIDShield: Characterizing, Monitoring, and Proactively Protecting Against Disk Failures by Ao Ma et al showed that even simpler methods can work quite well. In EMC's environment they could effectively predict SATA disk failures by observing the reallocated sector count and greatly reduce RAID failures by proactively replacing drives whose counts exceed a threshold.

Ma Figure 6
From their abstract:
We empirically investigate disk failure data from a large number of production systems, specifically focusing on the impact of disk failures on RAID storage systems. Our data covers about one million SATA disks from 6 disk models for periods up to 5 years. We show how observed disk failures weaken the protection provided by RAID. The count of reallocated sectors correlates strongly with impending failures.
we have built and evaluated an active defense mechanism that monitors the health of each disk and replaces those that are predicted to fail imminently. This proactive protection has been incorporated into our product and is observed to eliminate 88% of triple disk errors, which are 80% of all RAID failures.
The graphs in Figure 6 show clearly the strong signal of impending failure that reallocated sector count provides.
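The threshold approach described above amounts to a very small rule. The sketch below is an illustration of that idea, not the RAIDShield implementation; the disk names, counts, and the threshold of 100 are hypothetical and are not values from the paper.

```python
def disks_to_replace(reallocated_counts, threshold):
    """Return the IDs of disks whose reallocated sector count exceeds threshold."""
    return sorted(
        disk for disk, count in reallocated_counts.items() if count > threshold
    )


# Hypothetical reallocated sector counts, keyed by disk ID.
counts = {"disk-a": 0, "disk-b": 250, "disk-c": 12}
flagged = disks_to_replace(counts, threshold=100)
```

In practice the threshold would have to be chosen per disk model from observed failure data, since the quoted abstract covers six different models.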

You won't believe this shocking semantic web trick I use to avoid publishing my own ontologies! Will I end up going to hell for this? / Peter Sefton

[Update - as soon as this went live I spotted an error in the final example and fixed it].

In this post I describe a disgusting, filthy, but possibly beautiful hack* I devised to get around a common problem in data description using semantic web techniques, specifically JSON-LD and . How can we allow people who don't happen to be Semantic Web über-geeks to be able to define their own vocabularies when they need to go beyond common vocabularies like

* You tell me - beautiful or evil?

Jump to the spoiler at end - actually there are two hacks

For the last few years most of the posts on this blog have been presentations I've given at conferences, for example there's a series of posts on RO-Crate, the most recent of which was from eResearch Australasia. RO-Crate is a specification for describing and packaging research data (could be any data, really, but the main use cases that drove development come from research).

RO-Crate uses JSON-LD as its main metadata format, with vocabulary terms which mostly come from this makes it reasonably easy for developers to write tools to generate good-quality low-ambiguity metadata in an extensible way. I'm not going to do a full JSON-LD tutorial, but to give you an idea RO-Crate JSON-LD looks like this:

    {
      "@id": "./",
      "@type": "Dataset",
      "name": "My dataset"
    }

This is easy to work with because it's just JSON, with a trick up its sleeve - the keys in the JSON object, such as name, are defined in a 'context'. At its simplest, the context is just a lookup between a key and a URI - in this case the context is defined like this:

    "@context": "",

And in the JSON document you get from, among a few hundred other properties is:

     "name": "",

If you go to you can read a definition of the name.

Having definitions is important. Let's take the metadata term "title". In Dublin Core, title's the name of a resource; in FOAF it's a title as in Mr, Mrs, Dr, Reverend etc; in title's a job title.

With all these terms I can use a URI to get to a human readable page to read about the term but here's our problem: not everyone has the resources to define metadata terms by making an ontology and hosting it somewhere online.

So, what to do when you want to describe something, and provide some definitions but there's no obvious ontology to hand?

Let's look at an example. Dr Alana Piper at UTS has some criminal history data, which includes transcriptions of prison records - she sent me a spreadsheet with data on about 2600 prison records in PDF format.

Some of the variables in Alana's data were easy to map to vocabulary, like name and birthDate, but some others are not defined in a handy online ontology, like sentence and offence. You can see some sample data of Alana's here in an RO-Crate. RO-Crates come with web-previews - this is a bit of data that refers to a sentence for one Nora Abbot for the offence of Vagrancy.

I like JSON-LD in general but if you don't define mappings for the keys you're using then when you use JSON-LD software to process the files the undefined keys and their values disappear from your document, which is not user or developer friendly. I don't like that at all - nobody expects their data to be discarded when a self-important, opinionated software library feels like it.

And more annoyingly, if you have an ad-hoc vocabulary there's no way to define that in your JSON-LD file or even the data that ships with it. Context keys MUST map to complete URLs.

There's a workaround. You can use a catch-all @vocab key in the @context for your JSON-LD which points to a URI so that any undefined terms get forced into a particular vocabulary. does this - so if you use that context then you can use whatever terms you like and JSON-LD processors won't swallow your data BUT that's cheating and it's not useful as you don't get real URIs that can be resolved to read a definition of the term. You get FAKE URIs.
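To make the disappearing-keys behaviour and the @vocab workaround concrete, here is a toy model in plain Python. It is deliberately not a real JSON-LD processor and ignores almost everything in the spec; it only mimics how keys are mapped. The example.org URIs and the function name are placeholders invented for this sketch.

```python
def map_keys(document, context):
    """Toy model of JSON-LD key mapping (NOT a real JSON-LD implementation)."""
    vocab = context.get("@vocab")
    mapped = {}
    for key, value in document.items():
        if key.startswith("@"):
            mapped[key] = value            # keywords such as @id pass through
        elif key in context:
            mapped[context[key]] = value   # defined term: replaced by its URI
        elif vocab is not None:
            mapped[vocab + key] = value    # catch-all: prefixed with @vocab
        # otherwise the key and its value are silently dropped
    return mapped


record = {"@id": "#record1", "name": "Nora Abbot", "offence": "Vagrancy"}

# Placeholder URIs: only "name" is defined in the strict context.
strict_context = {"name": "https://example.org/terms/name"}
vocab_context = {**strict_context, "@vocab": "https://example.org/vocab/"}

without_vocab = map_keys(record, strict_context)
with_vocab = map_keys(record, vocab_context)
```

In the first result the offence value is gone; in the second it survives, but under a URI that nobody has promised will resolve to a definition.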

Here's a screenshot of what the initial RO-Crate I made with Alana's data looked like - it shows that some of the terms (like name, startTime and location) are defined - these have a question-mark link beside them you can follow to read the definition. But a couple of others (sentence and offence) didn't have definitions - cos while they do map to a URI, the URI is a FAKE and it's not listed in the official RO-Crate @context.

Screen capture of metadata in RO-CRATE, described above

I've been thinking about this a lot, and looking for approaches to publishing light-weight semantic web vocabs (about which I found pretty much nothing) and eventually I came across some very interesting work from ten years ago, which looked at how to encode semantic statements into a URL for use in content authoring systems that don't allow entry of linked data directly. The solution was to encode semantic web stuff like this-document has-author into a URL. Hey, that idea could be adapted to this situation.

Anyway, what I came up with this decade was the idea of coding the entire definition for a property into a URL so we can put that URL into a local context. Stupid? Probably. Naughty Fun that's likely to get disapproving looks from computer scientists and semantic-web purists? Certainly.

Ok, so what does a definition for a property look like? We can ask the server about that from the command line:

curl  -L -H "Accept: application/ld+json"

If we do that, we get some JSON-LD, which I've pruned a bit here:

{
  "@context": {
    "schema": "",
    "rdf": "",
    "rdfa": "",
    "rdfs": "",
    "schema": ""
  },
  "@id": "schema:name",
  "@type": "rdf:Property",
  "rdfs:comment": "The name of the item.",
  "rdfs:label": "name"
}

Seems to me the MVD (that's "Minimum Viable Document") for defining a property is an @id, an rdfs:label and an rdfs:comment, so I threw together a simple single-page web thing on the examples bit of our repository server at work that would decode those out of a URL and, well, just show them to you (and I linked to it via the venerable PURL service so it doesn't need a domain name).

Hack 1

So, my first filthy web hack was to change the code I used to generate the RO-Crate of Alana's data - fed it some extra config with definitions of her metadata terms, then set up a super simple one-page web app which ACTs like it's the documentation for an ontology, but actually, you supply your own documentation, in the form of a link, like this: imposed by court for criminal conviction. As the data is drawn from prison records, this will usually consist of a specified term of imprisonment and the type of imprisonment conditions, e.g. 6 months hard labour. However, during this historical period it was common for persons convicted of a minor offence to be sentenced to a fine 'with the option' of a prison sentence if they were unable to pay it. After the introduction in Victoria of the Indeterminate Sentences Act in 1907, prisoners who had been declared 'habitual criminals' could also receive an indefinite sentence that meant they were imprisoned until the government authorities determined that they had sufficiently reformed. Some prisoners also faced additional penalties in addition to their prison sentence, such as periods of solitary confinement, in irons or whippings.&

Follow that and you get a page something like this (it will change and may be removed by the internet police):

Property: sentence


Label: sentence


Penalty imposed by court for criminal conviction. As the data is drawn from prison records, this will usually consist of a specified term of imprisonment and the type of imprisonment conditions, e.g. 6 months hard labour. However, during this historical period it was common for persons convicted of a minor offence to be sentenced to a fine 'with the option' of a prison sentence if they were unable to pay it. After the introduction in Victoria of the Indeterminate Sentences Act in 1907, prisoners who had been declared 'habitual criminals' could also receive an indefinite sentence that meant they were imprisoned until the government authorities determined that they had sufficiently reformed. Some prisoners also faced additional penalties in addition to their prison sentence, such as periods of solitary confinement, in irons or whippings.

To use this property use this text Copy to clipboard <...>

See what I did there? Got around the limitations of JSON-LD and its (I think harmful) insistence that context terms must resolve to URLs and supplied a self-documenting URL which, at least in the context of RO-Crate will allow a user to see something useful when they view the data.
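For anyone wanting to try the same trick, here is a minimal sketch of packing a definition into a URL and unpacking it again. The base URL and the query parameter names (id, label, comment) are assumptions made for this example and are not necessarily what the page described above uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of a one-page "definition viewer".
BASE_URL = "https://example.org/define"


def definition_url(term_id, label, comment):
    """Pack a property definition into the query string of a URL."""
    query = urlencode({"id": term_id, "label": label, "comment": comment})
    return BASE_URL + "?" + query


def decode_definition(url):
    """Recover the definition from a URL built by definition_url."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}


url = definition_url(
    "sentence",
    "sentence",
    "Penalty imposed by court for criminal conviction.",
)
decoded = decode_definition(url)
```

The resulting URL can be used as the value for a term in a local @context, and the viewing page only needs the decoding half.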

(If JSON-LD is Linked Data encoded in JSON then this must be URL-LD, linked data encoded in URLs - or is it URI-LD?)

Hack 2

Having done this work, and actually put up that web page to illustrate it, I then came up with what might be a more elegant solution to the actual problem at hand, which is shipping usable definitions of ad-hoc terms around in an RO-Crate Metadata File. The trick is similar to something we already do in RO-Crate to make metadata as useful as possible. The thing is, some URIs in the semantic web world don't actually resolve to anything usable by most humans, which means that in the RO-Crate Website that can accompany a crate the explanatory links are not helpful, so we came up with a way to provide links that are useful by adding an item to the RO-Crate metadata that specifies a more useful link using the sameAs property.

The example in the RO-Crate spec uses the BIBO interviewee property. Its URL does not resolve to a useful page (that used to be because it went to an RDF file not a web page, but is doubly so at time of writing because it resolves to an error page at

{
  "@context": [
    {"interviewee": ""}
  ],
  "@graph": [
    {
      "@id": "",
      "sameAs": "",
      "@type": "Thing"
    }
  ]
}

The above offers a more useful alternative URL and the code that generates the HTML summary of the RO-Crate can use that to provide a gloss [?]. But what if we actually also include the definition?

Instead of the above outrageously big URL with all the info to define sentence in it, we could add this to our metadata with a made-up URL for the term and an on-board rdf:Property to define it:

{
  "@context": [
    {"sentence": ""}
  ],
  "@graph": [
    {
      "@id": "",
      "@type": "rdf:Property",
      "rdfs:label": "sentence",
      "rdfs:comment": "Penalty imposed by court for criminal conviction. As the data is drawn from prison records, this will usually consist of a specified term of imprisonment and the type of imprisonment conditions, e.g. 6 months hard labour. However, during this historical period it was common for persons convicted of a minor offence to be sentenced to a fine 'with the option' of a prison sentence if they were unable to pay it. After the introduction in Victoria of the Indeterminate Sentences Act in 1907, prisoners who had been declared 'habitual criminals' could also receive an indefinite sentence that meant they were imprisoned until the government authorities determined that they had sufficiently reformed. Some prisoners also faced additional penalties in addition to their prison sentence, such as periods of solitary confinement, in irons or whippings."
    }
  ]
}

I could then hack the RO-Crate web viewer to use the label and comment supplied here.
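To make Hack 2 concrete, here is a minimal sketch of the lookup a viewer would need. It is not the RO-Crate web viewer's actual code. The function name find_definition and the example.org term URL are made up for illustration, and the sketch assumes the definition is stored under the keys rdfs:label and rdfs:comment.

```python
# Hypothetical crate metadata. The term URL is invented for this example.
crate = {
    "@context": [{"sentence": "https://example.org/terms#sentence"}],
    "@graph": [
        {
            "@id": "https://example.org/terms#sentence",
            "@type": "rdf:Property",
            "rdfs:label": "sentence",
            "rdfs:comment": "Penalty imposed by court for criminal conviction.",
        }
    ],
}


def find_definition(crate, term):
    """Return (label, comment) for a context term defined inside the crate, else None."""
    context = crate.get("@context", [])
    if not isinstance(context, list):
        context = [context]

    # Find the URL that the ad-hoc term maps to in the context.
    term_url = None
    for entry in context:
        if isinstance(entry, dict) and term in entry:
            term_url = entry[term]
    if term_url is None:
        return None

    # Look for an on-board item with that URL as its @id.
    for item in crate.get("@graph", []):
        if item.get("@id") == term_url:
            comment = item.get("rdfs:comment")
            if comment:
                return item.get("rdfs:label", term), comment
    return None


definition = find_definition(crate, "sentence")
```

A viewer could show the returned label and comment as the gloss for the term, and fall back to a plain link when the function returns None.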

So - what do you think?

a. Hack 1? b. Hack 2? c. Both? d. Neither?

Will I go blind?

I think Hack 2 will work but Hack 1 is funnier.

MarcEdit Webinar: Editing Records within the MarcEditor / Terry Reese

Webinar #2

Terry Reese is inviting you to a scheduled CarmenZoom meeting.

Topic: MarcEdit: Editing Records within the MarcEditor
Time: Mar 27, 2020 01:00 PM Eastern Time (US and Canada)

Join Zoom Meeting

One tap mobile
+16468769923,,518973889# US (New York)
+13126266799,,518973889# US (Chicago)

Dial by your location
         +1 646 876 9923 US (New York)
         +1 312 626 6799 US (Chicago)
         +1 301 715 8592 US
         +1 346 248 7799 US (Houston)
         +1 408 638 0968 US (San Jose)
         +1 669 900 6833 US (San Jose)
         +1 253 215 8782 US
Meeting ID: 518 973 889

The Ohio State University

CarmenZoom is supported by the Office of Distance Education and eLearning:

CarmenZoom resources
Phone: 614-688-HELP (4357)

If you have a disability and experience difficulty accessing this content, contact the Accessibility Help Line at 614-292-5000 or Text Telephone for the Deaf at 614-688-8743.

MarcEdit Webinar: Editing Records within the MarcEditor / Terry Reese

Webinar #2 – Evening Session.

Terry Reese is inviting you to a scheduled CarmenZoom meeting.

Topic: MarcEdit: Editing Records within the MarcEditor
Time: Mar 26, 2020 11:00 PM Eastern Time (US and Canada)

Join Zoom Meeting

One tap mobile
+13126266799,,966408364# US (Chicago)
+16468769923,,966408364# US (New York)

Dial by your location
         +1 312 626 6799 US (Chicago)
         +1 646 876 9923 US (New York)
         +1 301 715 8592 US
         +1 346 248 7799 US (Houston)
         +1 408 638 0968 US (San Jose)
         +1 669 900 6833 US (San Jose)
         +1 253 215 8782 US
Meeting ID: 966 408 364

Appraising COVID-19 / Ed Summers

The day-to-day reality of the COVID-19 crisis hit home three weeks ago. I’ve been trying to stay focused on the work I had going on before the pandemic…but it has been difficult. It’s really hard not to attend to things that seem relevant to this time we are living through.

In many ways COVID-19 feels like a moment of clarity, for re-examining how we work, and how we live. It is a singular opportunity for collectively recognizing that our lives can, and must, change–especially in light of other crises that demand global attention and cooperation like climate change. Our everyday activities are always being refashioned and rearranged by events. But, unfortunately, COVID-19 is also an opportunity for powerful interests to capitalize on this moment. As our infrastructures show their seams, break down, and transmute, our political and economic worlds are being remade.

If you’ve been watching this blog for the past few years you may have noticed that I’ve been interested in how practices of web archiving and archival appraisal meet, and where they don’t meet at all. The web is a big place, but the universe of documentation has always been a pretty big place. So how do we decide what to collect when the web makes everything instantly available, but collecting everything just isn’t realistic? How do our decisions about what to collect reflect (and create) technologies for appraisal that become expressions of our values?

A few weeks ago I learned about an ongoing effort by the International Internet Preservation Consortium (IIPC) to create a collection of web materials in their Novel Coronavirus (COVID-19) Archive-It collection. The IIPC made a nomination form using Google Forms for anyone to submit web content to be archived. These nomination forms have become a kind of standard practice for collaborative web archiving, and in this case are being used to drive Archive-It’s web crawling activities.

You can see previous collaborative web archiving efforts by the IIPC in a list of collections. As of this writing, the IIPC homepage reports that 2573 “sites” have been archived, from 30 languages. It seems fitting that an international organization like the IIPC would focus on topics like the Olympics, climate change, and the refugee crisis. But how are sites being selected within those topic areas, and who is doing the selecting?

This IIPC collecting effort really spoke to me because it is the type of work we’ve been doing in the Documenting the Now project, since its beginnings in 2014 after the murder of Michael Brown in Ferguson, Missouri. One of the things that Ferguson made clear to us was that a large amount of documentation work was happening in social media, and that it was important to meaningfully engage with social media as a tool for appraisal when documenting an event like Ferguson.

And so thinking about this IIPC COVID-19 collection I became curious about how it intersects with other collecting efforts that are happening. For example a few weeks ago I learned in an email from Amelia Acker that a group of people were using GitHub to collect URLs for Chinese language stories about the COVID-19 crisis in the nCovMemory GitHub repository. While it’s unclear why GitHub was specifically chosen, media scholar Christina Xu points out that GitHub is difficult to censor in China, not for technical reasons, but for social ones.

Escaping censorship could have been a crucial factor in curating this collection on GitHub. Being able to easily copy the data within GitHub as forks (there are currently over a thousand) and externally as full clones are also affordances that GitHub (the platform) and Git (the technology) both provide. Amelia was rightly concerned about sharing the repository URL publicly on social media, not wanting to draw unwanted attention to its authors. But now that information about the repository has spread widely on Twitter, and there has even been a New York Times video documentary that mentions it, I think those concerns are less pressing.

The repository can be edited directly on GitHub, or contributors can submit content via their issue tracker and another project member will integrate it in. The repository also is used to make a static website available. Many of these stories document the Chinese government’s response to the crisis, and contributors are actively archiving content using as part of their process.

nCovMemory GitHub Issue Tracker

In addition people are finding and sharing web content using platforms like Pinboard and Reddit. Milligan, Ruest, & Lin (2016) have described some of the benefits of looking at social media streams like Twitter as a source of URLs–of leveraging the “masses” in addition to the specialized knowledge of content specialists like archivists. With over 1.7 million members, I think the Coronavirus subreddit definitely qualifies as “the masses”. But nCovMemory and Pinboard have much smaller numbers of contributors, and it appears that they may be specialists in their own right.

So, I thought it would be interesting to look at these sites as potential sources of content that might be in need of archiving. But it’s important to be a bit more precise with language here, because “archiving” really encompasses preservation, description and access. Making sure there is a copy of the content at a URL is fundamental, but it’s only part of the work. As Maemura, Worby, Milligan, & Becker (2018) highlight, it’s extremely important to document how the content came to be archived if it is going to be used later. Researchers need to know why content was selected for the collection. So in the IIPC’s case, the spreadsheet that sits behind their Google Form is an essential part of the archive.

But let’s return to the case of nCovMemory, Pinboard and Reddit. What would it mean to use these sites to help us document COVID-19? Part of the problem is knowing what URLs are found in these web platforms. Also the platforms are in constant motion so the URLs they make available are constantly changing. While it might be possible to build a generic tool that interfaces with APIs from Twitter, Reddit and Pinboard, I think there will always be ad hoc work to do, as is the case with nCovMemory. Another problem with dedicated applications or tools is that they tend to obscure their inner workings, and present results in shiny surfaces that don’t reflect the many decisions that went into deciding what is (and is not) there.

Out of habit I started working locally on my laptop in a Jupyter Notebook to see what the IIPC Collection seeds (URLs for archiving) looked like using Archive-It’s Partner API (some of which is public). You can see that notebook here. I then moved on to a nCovMemory notebook, a Reddit notebook, and then a Pinboard notebook. You will see that each notebook fits the kinds of data each platform provides, as well as their authentication methods, and API surfaces.
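To give a sense of what combining these sources involves, here is a minimal sketch that merges candidate URLs from several platforms while recording which platform surfaced each one. It is not taken from the notebooks. The function names and the sample URLs are made up for illustration, and a real notebook would fill the lists from each platform's API.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase the scheme and host and drop the fragment so near-identical URLs match."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def merge_candidates(sources):
    """Map each normalized URL to the sorted list of sources that surfaced it."""
    seen = {}
    for source_name, urls in sources.items():
        for url in urls:
            seen.setdefault(normalize(url), set()).add(source_name)
    return {url: sorted(names) for url, names in seen.items()}


# Hypothetical sample data standing in for what each notebook collects.
sources = {
    "iipc": ["https://example.org/covid/", "https://example.com/news/1"],
    "reddit": ["HTTPS://Example.org/covid/#comments"],
    "pinboard": ["https://example.net/essay"],
}

candidates = merge_candidates(sources)
```

Keeping the list of sources next to each URL is a small piece of provenance: it records why a URL was considered for the collection, not only that it was.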

But one nice thing about working in Jupyter, rather than creating an “application” to perform the collection and present the results, is that notebooks center writing prose about what is being done and why. You can share the notebooks with others as static documents that become executable again in the right environment. For example you can view them all as documents on GitHub, or launch my notebooks over in MyBinder, where you can rerun them to get the latest results, or run additional analysis and visualization.

It only occurred to me as I was in the middle of putting these notebooks together that just as the IIPC seed spreadsheets are part of the provenance of their collection, these notebooks could serve as documentation of the provenance of how a COVID-19 web archive (or collection if you prefer) came into being.

We’ve seen significant effort by the Library of Congress to explore their web archives, sometimes using Jupyter notebooks. Recently Andy Jackson and Tim Sherratt announced that they are going to be working on building out practices for exploring history using Jupyter and web archives (see the IIPC Slack for a window into this work). But perhaps Jupyter also has a place when documenting how a collection came into being in the first place? What would be some useful practices for how to do that? Netflix has written about how they use Jupyter notebooks not only for doing data visualization, but as a unit of work in their data analysis pipelines. I think we must consider adding to our toolbox of web appraisal methods, to do more than simply ask people what they think should be archived, and to factor in what they are talking about, and sharing. Using Jupyter notebooks could be a viable way of both doing that work and providing documentation about it.


Maemura, E., Worby, N., Milligan, I., & Becker, C. (2018). If these crawls could talk: Studying and documenting web archives provenance. Journal of the Association for Information Science and Technology, 69(10), 1223–1233. Retrieved from

Milligan, I., Ruest, N., & Lin, J. (2016). Content selection and curation for web archiving: The gatekeepers vs. The masses. In Proceedings of the joint conference on digital libraries. Retrieved from

Build a better registry: My intended comments to the Library of Congress on the next Register of Copyrights / John Mark Ockerbloom

The Library of Congress is seeking public input on abilities and priorities desired for the next Register of Copyrights, who heads the Copyright Office, a department within the Library of Congress.  The deadline for comments as I write this is March 20, though I’m currently having trouble getting the form to accept my input, and operations at the Library, like many other places, are in flux due to the COVID-19 pandemic.  Below I reproduce the main portion of the comments I’m hoping to get in before the deadline, in the hope that they will be useful for both them and others interested in copyright.  I’ve added a few hyperlinks for context.

At root, the Register of Copyrights needs to do the job the position title implies: Build and maintain an effective copyright registry.

A well designed, up-to-date digital registry should make it easy for rightsholders to register, and for the public to use registration information. Using today’s copyright registry involves outdated, cumbersome, and costly technologies and practices. Much copyright data is not online, and the usability of what is online is limited.

The Library of Congress is now redesigning its catalogs for linked data and modern interfaces. Its Copyright Office thus also has an opportunity to build a modern copyright registry linked to Library databases and to the world, with compatible linked data technologies, robust APIs, and free open bulk downloads. The Copyright Office’s registry and the Library of Congress’s bibliographic and authority knowledge bases could share data, using global identifiers to name and describe entities they both cover, including publications, works, creators, rightsholders, publishers, serials and other aggregations, registrations, relationships, and transactions.

The Copyright Office need not convert wholesale to BIBFRAME, or to other Library-specific systems. It simply needs to create and support identifiers for semantic entities described in the registry (“things, not strings”), associate data with them, and exchange data in standard formats with the Library of Congress catalog and other knowledge bases. As a comprehensive US registry for creative works of all types, the Copyright Office is uniquely positioned to manage such data.

The Deep Backfile project at the University of Pennsylvania (which I maintain) provides one example of uses that can be made of linked copyright data. At


is a page showing selected copyrights associated with Collier’s Magazine (1888-1957). It links to online copies of public domain issues, contents and descriptive information from external sources like FictionMags, Wikidata, and Wikipedia, and rights contact information for some of its authors. The information shown has no rights restrictions, and can be used by humans and machines. JSON files, and the entire Deep Backfile knowledge base, are available from this page and from GitHub.

It is not the Copyright Office’s job to produce applications like these. But it can provide data that powers them. Much of our Deep Backfile data was copied manually from scanned Catalog of Copyright Entries pages, and from online catalogs lacking easily exported or linked data. The Copyright Office and the Library of Congress could instead produce such data natively (first prospectively, eventually retrospectively). In the process, they could also cross-pollinate each other’s knowledge bases.

To implement this vision, the Register needs to understand library standards and linked open data technologies, gather and manage a skilled implementation team, and be sufficiently persuasive, trusted, and organized to bring stakeholders together inside and outside the Copyright Office and the Library of Congress to support and fund a new system’s development. If explained and implemented well, a registry of the sort described here could greatly benefit copyright holders and copyright users alike.

The Register of Copyrights should also know copyright law thoroughly, implement sensible regulations required by copyright law and policy, and be a trusted and inclusive expert that rightsholders, users, and policymakers can consult. I expect other commenters to go into more detail about these skills, which are also useful in building a trustworthy registry of the sort I describe. But the Copyright Office is long overdue to be led by a Register who can revitalize its defining purpose: Register copyrights, in up-to-date, scalable, and flexible ways that encourage wide use of the creations they cover, and thus promote the progress of science and useful arts.

Update, March 20: As of the late afternoon on the day of the deadline, the form appears to be still rejecting my submission, without a clear error message.  It did, however, accept a very short submission without any attachment, and with a URL pointing here.  So below I include the rest of my intended comment, listing 3 top priorities. (The essay above was for the longer comment asked for about knowledge, skills, and abilities.) These priorities largely restate in summary form what I wrote above.  

If anyone else reading this was unable to post their full comment by the deadline due to technical difficulties, you can try emailing something to me (or leaving a comment to this post) and posting a simple comment to that effect on the LC site, and I’ll do my best to get your full comment posted on this blog.

  • Priority #1: Make copyright registration data easy to use: Data should be easy to search, consult, and analyze, individually and in bulk, by people and machines, linked with the Library of Congress’s rich bibliographic data, facilitating verification of copyright ownership, licensing from rightsholders, and cataloging and analysis by libraries, publishers, vendors, and researchers.
  • Priority #2: Make effective copyright registration easy to do: Ensure copyright registration is simple, inexpensive, supports a variety of electronic and physical deposits, and where possible supports persistent, addressable identifiers and accompanying data for semantic entities described in registrations, and their relationships.

  • Priority #3: Be a trusted, inclusive resource for understanding copyright and its uses: Creators, publishers, consumers, and policymakers all are concerned with copyright, and with possible reforms. The Register should help all understand their rights, and provide expert and impartial advice and mediation for diverse copyright stakeholders and policymaking priorities.

  • Other factors: The Register of Copyrights should also be capable of creating, implementing, and keeping up to date appropriate regulations and practices required or implied by Congressional statutes. 

(For the “additional comments” attachment, I had a static PDF attachment showing the Collier’s web page linked from my main essay, as it was on March 19.)


Staying open: how we will continue our work despite COVID-19 / Open Knowledge Foundation

We know that you will be concerned about the impact of COVID-19 on you and your loved ones.

At the Open Knowledge Foundation, our thoughts are with all those around the world who have been affected by the outbreak, and we would like to thank everyone working on the frontline to tackle the virus – health workers, researchers, public servants, cleaners, scientists, shopworkers and many, many others. We urge everyone to follow the official advice issued in their own country.

Despite the challenging circumstances, the Open Knowledge Foundation will continue to campaign for a fair, free and open future.

We recognise that data can play a significant role in obtaining positive solutions to the pandemic when it is open, accessible and disseminated in ways that are useful.

Emergency situations inevitably require emergency governmental powers, so we will be looking to apply our knowledge and skills to ensure technologies are developed and deployed in a manner that is equitable for everyone.

The Open Knowledge Foundation will maintain our international links which will be more critical than ever in the months and years ahead. We will continue to use our best endeavours to support all of our stakeholders and our team members.

We want to reassure all our partners that we expect to be working and delivering on our commitments as normal during this time. We have been a remote organisation for many years and on a practical level, we want to share our individual experiences in the hope that they may be of benefit and comfort to others as people recalibrate. You can read our recently published article on my experiences of remote working here. Remote working has many challenges and opportunities, and being open about our experiences will help others as this practice becomes the new normal.

Making remote working work for you and your organisation / Open Knowledge Foundation

The coronavirus outbreak means that up to 20 per cent of the UK workforce could be off sick or self-isolating during the peak of an epidemic.

Millions of people may not be ill, but they will be following expert advice to stay away from their workplace to help prevent the spread of the virus.

There are clearly hundreds of roles where working from home simply isn’t possible, and questions are rightly being asked about ensuring people’s entitlement to sick pay.

But for a huge number of workers who are usually based in an office environment, remote working is a possibility – and is therefore likely to become the norm for millions.

With the economy in major trouble as evidenced by yesterday’s stock market falls, ensuring those who are fit and able can continue to work is important.

So employers should start today to prepare for efficient remote working as part of their coronavirus contingency planning.

Giant companies such as Twitter are already prepared. But this may be an entirely new concept for some firms.

The Open Knowledge Foundation, which I lead, has been successfully working remotely for several years.

Our staff are based in their homes in countries across the world, including the UK, Portugal, Zimbabwe and Australia.

Remote working was new to me a year ago when I joined the organisation.

I had been based in the European Parliament for 20 years as an MEP for Scotland. I had a large office on the 13th floor of the Parliament in Brussels, with space for my staff, as well as an office in Strasbourg when we were based there. For most of my time as a politician, I also had an office in Fife where my team would deal with constituents’ queries.

Things couldn’t be more different today. I work from my home in Dunfermline, in front of my desktop computer, with two screens so that I can type on one and keep an eye on real-time alerts on another.

The most obvious advantage is being able to see more of my family. Being a politician meant a lot of time away from my husband and children, and I very much sympathise with MSPs such as Gail Ross and Aileen Campbell who have decided to stand down from Holyrood to see more of their loved ones. If we want our parliaments to reflect society, we need to address the existing barriers to public office.

Now in charge of a team spread around the world, using a number of technology tools to communicate with them, remote working has been a revelation for me.

Why couldn’t I have used those tools in the European Parliament and even voted remotely?

In the same way that Gail Ross has questioned why there wasn’t a way for her to vote remotely from Wick, hundreds of miles from Edinburgh, the same question must be asked of the European Parliament.

But for companies now planning remote working, it is vital to adopt effective methods.

Access to reliable Wi-Fi is key, but effective communication is critical. Without physical interaction, a virtual space with video calling is essential.

It is important to see the person when remote working and be able to interact as closely as you would face-to-face. This also avoids distraction and allows people to check in with each other.

We tend to do staff calls through our Slack channel and our weekly all-staff call is through Google Hangout.

All-staff calls – or all-hands calls as we call them – are important if people are forced to work remotely. We do this once a week, but for some organisations morning calls will also become an essential part of the day.

Our monthly global network call is on an open source tool called Jitsi and I use Zoom for diary meetings.

If all else fails, we resort to Skype and WhatsApp.

In terms of how we share documents between the team, we use Google Drive. That means participants in conference calls can see and update an agenda and add action points in real-time, and make alterations or comments on documents such as letters which need to be checked by multiple people.

In the same way that our staff work and collaborate remotely, using technology to co-operate on a wider scale also goes to the heart of our vision for a future that is fair, free and open.

We live in a time when technological advances offer incredible opportunities for us all.

Open knowledge will lead to enlightened societies around the world, where everyone has access to key information and the ability to use it to understand and shape their lives; where powerful institutions are comprehensible and accountable; and where vital research information that can help us tackle challenges such as poverty and climate change is available to all.

Campaigning for this openness in society is what our day job entails.

But to achieve that we have first worked hard to bring our own people together using various technological options.

Different organisations will find different ways of making it work.

But what is important is to have a plan in place today.

This post was originally published by the Herald newspaper

Frictionless Public Utility Data: A Pilot Study / Open Knowledge Foundation

This blog post describes a Frictionless Data Pilot with the Public Utility Data Liberation project. Pilot projects are part of the Frictionless Data for Reproducible Research project. Written by Zane Selvans, Christina Gosnell, and Lilly Winfree.

The Public Utility Data Liberation project, PUDL, aims to make US energy data easier to access and use. Much of this data, including information about the cost of electricity, how much fuel is being burned, powerplant usage, and emissions, is not well documented or is in difficult-to-use formats. Last year, PUDL joined forces with the Frictionless Data for Reproducible Research team as a Pilot project to release this public utility data. PUDL takes the original spreadsheets, CSV files, and databases and turns them into unified Frictionless tabular data packages that can be used to populate a database, or read in directly with Python, R, Microsoft Access, and many other tools.

What is PUDL?

The PUDL project, which is coordinated by Catalyst Cooperative, is focused on creating an energy utility data product that can serve a wide range of users. PUDL was inspired to make this data more accessible because the current US utility data ecosystem is fragmented, and commercial products are expensive. There are hundreds of gigabytes of information available from government agencies, but they are often difficult to work with, and different sources can be hard to combine.

PUDL users include researchers, activists, journalists, and policy makers. They have a wide range of technical backgrounds, from grassroots organizers who might only feel comfortable with spreadsheets, to PhDs with cloud computing resources, so it was important to provide data that would work for all users. 

Before PUDL, much of this data was freely available to download from various sources, but it was typically messy and not well documented. This led to a lack of uniformity and reproducibility amongst projects that were using this data. The users were scraping the data together in their own way, making it hard to compare analyses or understand outcomes. Therefore, one of the goals for PUDL was to minimize these duplicated efforts, and enable the creation of lasting, cumulative outputs.

What were the main Pilot goals?

The main focus of this Pilot was to create a way to openly share the utility data in a reproducible way that would be understandable to PUDL’s many potential users. The first change Catalyst identified they wanted to make during the Pilot was with their data storage medium. PUDL was previously creating a PostgreSQL database as the main data output. However, many users, even those with technical experience, found setting up the separate database software a major hurdle that prevented them from accessing and using the processed data. They also desired a static, archivable, platform-independent format. Therefore, Catalyst decided to transition PUDL away from PostgreSQL, and instead try Frictionless Tabular Data Packages. They also wanted a way to share the processed data without needing to commit to long-term maintenance and curation, meaning they needed the outputs to continue being useful to users even if they only had minimal resources to dedicate to the maintenance and updates. The team decided to package their data into Tabular Data Packages and identified Zenodo as a good option for openly hosting that packaged data.
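For readers who have not seen one, a Tabular Data Package is a set of CSV files plus a datapackage.json descriptor that records each table's fields, types and keys. The following is a minimal sketch using only the Python standard library. It is not PUDL's pipeline code, and the table name, column names and values are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_package(directory):
    """Write a tiny CSV resource and a datapackage.json descriptor next to it."""
    directory = Path(directory)
    rows = [
        {"plant_id": "1", "report_date": "2018-01-01", "net_generation_mwh": "120.5"},
        {"plant_id": "1", "report_date": "2018-02-01", "net_generation_mwh": "98.0"},
    ]
    with open(directory / "generation.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    descriptor = {
        "profile": "tabular-data-package",
        "name": "example-generation",  # hypothetical package name
        "resources": [
            {
                "name": "generation",
                "path": "generation.csv",
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [
                        {"name": "plant_id", "type": "integer"},
                        {"name": "report_date", "type": "date"},
                        {"name": "net_generation_mwh", "type": "number"},
                    ],
                    # A composite primary key: one row per plant per report date.
                    "primaryKey": ["plant_id", "report_date"],
                },
            }
        ],
    }
    (directory / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
    return descriptor


def read_resource(directory, resource_name):
    """Return (schema, rows) for one resource, using only the descriptor to find it."""
    directory = Path(directory)
    descriptor = json.loads((directory / "datapackage.json").read_text())
    resource = next(r for r in descriptor["resources"] if r["name"] == resource_name)
    with open(directory / resource["path"], newline="") as f:
        return resource["schema"], list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    write_package(tmp)
    schema, rows = read_resource(tmp, "generation")
```

Because the package is just files, it can be archived as-is and read later without installing database software, which is the property the Pilot was after.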

Catalyst also recognized that most users only want to download the outputs and use them directly, and did not care about reproducing the data processing pipeline themselves, but it was still important to provide the processing pipeline code publicly to support transparency and reproducibility. Therefore, in this Pilot, they focused on transitioning their existing ETL pipeline from outputting a PostgreSQL database, that was defined using SQLAlchemy, to outputting datapackages which could then be archived publicly on Zenodo. Importantly, they needed this pipeline to maintain the metadata, information about data type, and database structural information that had already been accumulated. This rich metadata needed to be stored alongside the data itself, so future users could understand where the data came from and understand its meaning. The Catalyst team used Tabular Data Packages to record and store this metadata (see the code here:

Another complicating factor is that many of the PUDL datasets are fairly entangled with each other. The PUDL team ideally wanted users to be able to pick and choose which datasets they actually wanted to download and use without requiring them to download it all (currently about 100GB of data when uncompressed). However, they were worried that if single datasets were downloaded, the users might miss that some of the datasets were meant to be used together. So, the PUDL team created information, which they call “glue”,  that shows which datasets are linked together and that should ideally be used in tandem. 

The culmination of this Pilot was a release of the PUDL data (access it here – and read the corresponding documentation here –), which includes integrated data from the EIA Form 860, EIA Form 923, the EPA Continuous Emissions Monitoring System (CEMS), the EPA Integrated Planning Model (IPM), and FERC Form 1.

What problems were encountered during this Pilot?

One issue that the group encountered during the Pilot was that the data types available in Postgres are substantially richer than those natively in the Tabular Data Package standard. However, this issue is an endemic problem of wanting to work with several different platforms, and so the team compromised and worked with the least common denominator.  In the future, PUDL might store several different sets of data types for use in different contexts, for example, one for freezing the data out into data packages, one for SQLite, and one for Pandas. 
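
To make the "least common denominator" compromise concrete, here is a minimal sketch of how several PostgreSQL column types might collapse onto the smaller set of Table Schema field types. The mapping is an illustrative assumption, not the one PUDL uses, and the fallback to `string` is likewise a choice made for this example.

```python
# Hypothetical mapping from PostgreSQL column types to Table Schema field types.
# Several distinct database types collapse onto one Table Schema type.
PG_TO_TABLE_SCHEMA = {
    "smallint": "integer",
    "integer": "integer",
    "bigint": "integer",
    "numeric": "number",
    "double precision": "number",
    "boolean": "boolean",
    "text": "string",
    "varchar": "string",
    "date": "date",
    "timestamp": "datetime",
}


def to_table_schema_type(pg_type):
    """Return a Table Schema type name, falling back to 'string' if unmapped."""
    return PG_TO_TABLE_SCHEMA.get(pg_type.lower(), "string")
```

Note that `smallint` and `bigint` both become `integer`: the width information that PostgreSQL tracks is exactly the kind of detail lost in the compromise.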

Another problem encountered during the Pilot resulted from testing the limits of the draft Tabular Data Package specifications. There were aspects of the specifications that the Catalyst team assumed were fully implemented in the reference (Python) implementation of the Frictionless toolset, but were in fact still works in progress. This work led the Frictionless team to start a documentation improvement project, including a revision of the specifications website to incorporate this feedback. 

Through the Pilot, the teams worked to implement new Frictionless features, including the specification of composite primary keys and foreign key references that point to external data packages. Other new Frictionless functionality that was created with this Pilot included partitioning of large resources into resource groups in which all resources use identical table schemas, and adding gzip compression of resources. The Pilot also focused on implementing more complete validation through goodtables, including bytes/hash checks, foreign key checks, and primary key checks, though there is still more work to be done here.
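
The features named above can be sketched with the Python standard library alone. The example below builds a resource descriptor with a composite primary key and a foreign key to another resource in the same package, gzips the data, and performs bytes/hash and primary key checks by hand. The table and field names are hypothetical, the descriptor keys follow my reading of the Frictionless specifications, and the hand-written checks only stand in for what goodtables does; this is not the PUDL or goodtables implementation. The cross-package foreign key reference is not shown.

```python
import gzip
import hashlib
import json

# Hypothetical schema: composite primary key plus a foreign key to a
# "plants" resource assumed to live in the same data package.
GENERATION_SCHEMA = {
    "fields": [
        {"name": "plant_id", "type": "integer"},
        {"name": "report_year", "type": "year"},
        {"name": "net_generation_mwh", "type": "number"},
    ],
    "primaryKey": ["plant_id", "report_year"],
    "foreignKeys": [
        {
            "fields": ["plant_id"],
            "reference": {"resource": "plants", "fields": ["plant_id"]},
        }
    ],
}


def describe_resource(name, csv_bytes, schema):
    """Gzip the CSV bytes and return (descriptor, compressed_bytes)."""
    # mtime=0 keeps the gzip header, and therefore the hash, stable across runs.
    compressed = gzip.compress(csv_bytes, mtime=0)
    descriptor = {
        "name": name,
        "path": f"data/{name}.csv.gz",
        "profile": "tabular-data-resource",
        "bytes": len(compressed),
        "hash": "sha256:" + hashlib.sha256(compressed).hexdigest(),
        "schema": schema,
    }
    return descriptor, compressed


def matches_descriptor(descriptor, payload):
    """Bytes/hash check: does the payload match the recorded size and digest?"""
    algorithm, _, expected = descriptor["hash"].partition(":")
    actual = hashlib.new(algorithm, payload).hexdigest()
    return len(payload) == descriptor["bytes"] and actual == expected


def primary_key_is_unique(rows, key_fields):
    """Primary key check: no two rows share the same composite key."""
    seen = set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            return False
        seen.add(key)
    return True


if __name__ == "__main__":
    data = b"plant_id,report_year,net_generation_mwh\n1,2018,1200.5\n1,2019,1100.0\n"
    descriptor, payload = describe_resource("generation", data, GENERATION_SCHEMA)
    print(json.dumps(descriptor, indent=2))
```

Keeping the size and digest of the compressed file in the descriptor means a downloaded archive can be verified without decompressing it.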

Future Directions

A common problem with using publicly available energy data is that the federal agencies creating the data do not use version control or maintain change logs for the data they publish, but they do frequently go back years after the fact to revise or alter previously published data — with no notification. To combat this problem, Catalyst is using data packages to encapsulate the raw inputs to the ETL process. They are setting up a process which will periodically check to see if the federal agencies’ posted data has been updated or changed, create an archive, and upload it to Zenodo. They will also store metadata in non-tabular data packages, indicating which information is stored in each file (year, state, month, etc.) so that there can be a uniform process of querying those raw input data packages. This will mean the raw inputs won’t have to be archived alongside every data release. Instead one can simply refer to these other versioned archives of the inputs. Catalyst hopes these version controlled raw archives will also be useful to other researchers.
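
One simple way to notice silent revisions is to compare checksums of freshly fetched files against those recorded at the last archive. The sketch below assumes the files have already been downloaded into memory; the function names and the choice of SHA-256 are illustrative assumptions, not a description of Catalyst's actual process.

```python
import hashlib


def sha256_of(data):
    """Hex digest of a bytes payload."""
    return hashlib.sha256(data).hexdigest()


def changed_files(previous_digests, current_payloads):
    """Return names of files that are new or whose contents have changed.

    previous_digests: file name -> digest recorded at the last archive
    current_payloads: file name -> bytes fetched from the agency now
    """
    changed = []
    for name, data in sorted(current_payloads.items()):
        if previous_digests.get(name) != sha256_of(data):
            changed.append(name)
    return changed
```

Any non-empty result would be the trigger to create a new versioned archive of the raw inputs.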

Another next step for Catalyst will be to make the ETL and new dataset integration more modular to hopefully make it easier for others to integrate new datasets. For instance, they are planning on integrating the EIA 861 and the ISO/RTO LMP data next. Other future plans include simplifying metadata storage, using Docker to containerize the ETL process for better reproducibility, and setting up a Pangeo  instance for live interactive data access without requiring anyone to download any data at all. The team would also like to build visualizations that sit on top of the database, making an interactive, regularly updated map of US coal plants and their operating costs, compared to new renewable energy in the same area. They would also like to visualize power plant operational attributes from EPA CEMS (e.g., ramp rates, min/max operating loads, relationship between load factor and heat rate, marginal additional fuel required for a startup event…). 

Have you used PUDL? The team would love to hear feedback from users of the published data so that they can understand how to improve it, based on real user experiences. If you are integrating other US energy/electricity data of interest, please talk to the PUDL team about whether they might want to integrate it into PUDL to help ensure that it’s all more standardized and can be maintained long term. Also let them know what other datasets you would find useful (e.g., FERC EQR, FERC 714, PHMSA Pipelines, MSHA mines…). If you have questions, please ask them on GitHub so that the answers will be public for others to find as well.

Proof-of-Stake In Practice / David Rosenthal

At the most abstract level, the work of Eric Budish, Raphael Auer, Joshua Gans and Neil Gandal is obvious. A blockchain is secure only if the value to be gained by an attack is less than the cost of mounting it. These papers all assume that actors are "economically rational", driven by the immediate monetary bottom line, but this isn't always true in the real world. As I wrote when commenting on Gans and Gandal:
As we see with Bitcoin's Lightning Network, true members of the cryptocurrency cult are not concerned that the foregone interest on capital they devote to making the system work is vastly greater than the fees they receive for doing so. The reason is that, as David Gerard writes, they believe that "number go up". In other words, they are convinced that the finite supply of their favorite coin guarantees that its value will in the future "go to the moon", providing capital gains that vastly outweigh the foregone interest.
Follow me below the fold for a discussion of a recent attack on a Proof-of-Stake blockchain that wasn't motivated by the immediate monetary bottom line.

Steem was one of the efforts to decentralize the Web discussed in the MIT report. Its authors pointed out that:
Right now, the distribution of SP across users in the system is very unequal -- more than 90% of SP tokens are held by less than 2% of account holders in the system. This immense disparity in voting power complicates Steemit’s narrative around democratized content curation -- it means that a very small number of users are extremely influential and that the vast majority of users’ votes are virtually inconsequential.
Now this has proven true. David Gerard reports that:
Distributed Proof-of-Stake leaves your blockchain open to takeover bids — such as when Justin Sun of TRON tried to take over the Steem blockchain, by enlisting exchanges such as Binance to pledge their holdings to his efforts.
Gerard links to Yulin Cheng's "Tron takeover? Steem community in uproar as crypto exchanges back reversal of blockchain governance soft fork", a detailed account of events. First:
On Feb. 14, Steemit entered into a "strategic partnership" with Tron that saw Steemit's chairman declare on social media that he had "sold Steemit to [Justin Sun]," referring to Tron's founder.
The result was that:
Concerns that Tron might possess too much power over the network resulted in a move by the Steem community on Feb. 24 to implement a soft fork. The soft fork deactivated the voting power of a large number of tokens owned by TRON and Steemit.
That was soft fork 2.22. One week later, on March 2nd, Tron arranged for exchanges, including Huobi, Binance and Poloniex, to stake tokens they held on behalf of their customers in a 51% attack:
According to the list of accounts powered up on March 2, the three exchanges collectively put in over 42 million STEEM Power (SP).

With an overwhelming amount of stake, the Steemit team was then able to unilaterally implement hard fork 22.5 to regain their stake and vote out all top 20 community witnesses – server operators responsible for block production – using account @dev365 as a proxy. In the current list of Steem witnesses, Steemit and TRON’s own witnesses took up the first 20 slots.
Although this attack didn't provide Tron with an immediate monetary reward, the long-term value of retaining effective control of the blockchain was vastly greater than the cost of staking the tokens. Since 2017 I've been pointing out that the high Gini coefficients of cryptocurrencies mean Proof-of-Stake centralizes control of the blockchain in the hands of the whales. That year's Why Decentralize? quoted Vitalik Buterin pointing out that a realistic scenario was:
In a proof of stake blockchain, 70% of the coins at stake are held at one exchange.
Or in this case three exchanges cooperating.
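
To make the concentration argument concrete, here is a minimal sketch of the two quantities involved: the Gini coefficient of a set of balances, and whether a coalition of holders controls a majority of the stake. The holder names and numbers in the usage note are hypothetical and are not Steem's actual distribution.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 means perfectly equal)."""
    xs = sorted(balances)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending-sorted values, ranks starting at 1.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def controls_majority(stakes, coalition):
    """True if the named holders together hold more than half of all stake."""
    total = sum(stakes.values())
    pooled = sum(stakes[name] for name in coalition)
    return pooled * 2 > total
```

With hypothetical stakes of `{"exchange_a": 40, "exchange_b": 15, "community": 45}`, neither exchange alone has a majority, but `controls_majority(stakes, ["exchange_a", "exchange_b"])` is true, which is the cooperating-exchanges scenario in miniature.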

Apparently, the tokens that soft fork 2.22  blocked from voting were mined before the blockchain went live and retained by Steemit:
"The stake was essentially premined and was always said to be for on-boarding and community building. The witnesses decided to freeze it in an attempt to prevent a hostile takeover of the network,” [@jeffjagoe] told The Block. "But they forgot Justin has a lot of money, and money buys buddies at the exchanges."
Vitalik Buterin commented:
"Apparently Steem DPOS got taken over by big exchanges voting with depositors' funds," he tweeted. "Seems like the first big instance of a 'de facto bribe attack' on coin voting (the bribe being exchs giving holders convenience and taking their votes)."
As Buterin wrote in 2014, Proof-of-Stake turned out to be non-trivial.

NOW AVAILABLE: VIVO 1.11.1 / DuraSpace News

VIVO 1.11.1 is now available!

VIVO 1.11.1 is a point release containing two patches to the previous 1.11.0 release:
– Security patch that now prevents users with self-edit privileges from editing other user profiles [1]
– Minor security patch to underlying puppycrawl dependency (CVE-2019-9658) [2]

Upgrading from 1.11.0 to 1.11.1 should be a trivial drop-in replacement. If you are upgrading from 1.10 or before, see the VIVO 1.11.0 Release Announcement [3].

VIVO 1.11.1 can be downloaded now from

Release Manager
– Ralph O’Flinn

Developers / Testers
– Huda Khan
– Benjamin Gross
– Brian Lowe
– Ralph O’Flinn
– Andrew Woods



Welcome to everybody’s online libraries / John Mark Ockerbloom

As coronavirus infections spread throughout the world, lots of people are staying home to slow down the spread and save lives.  In the US, many universities, schools, and libraries have closed their doors.  (Here’s what’s happening at the library where I work, which as I write this has closed all its buildings.)  But lots of people are still looking for information, to continue studies online, or just to find something good to read.

Libraries are stepping up to provide these things online.  Many libraries have provided online information for years, through our own websites, electronic resources that we license, create, or link to, and other online services.  During this crisis, as our primary forms of interaction move online, many of us will be working hard to meet increased demand for digital materials and services (even as many library workers also have to cope with increased demands and stresses on their personal lives). Services are likely to be in flux for a while.  I have a few suggestions for the near term:

Check your libraries’ web sites regularly. They should tell you whether the libraries are now physically open or closed (many are closed now, for good reason), and what services the library is currently offering.  Those might change over time, sometimes quickly.  Our main library location at Penn, for instance, was declared closed indefinitely last night, less than 12 hours before it was next due to reopen.   On the other hand, some digitally mediated library services and resources might not be available initially, but then become available after we have safe and workable procedures set up for them and sufficient staffing.   

Many library web sites also prominently feature their most useful electronic resources and services, and have extensive collections of electronic resources in their catalogs or online directories.  They may be acquiring more electronic resources to meet increased user demand for online content. Some providers are also increasing what they offer to their library customers during the crisis, and sometimes making some of their material free for all to access.

If  you need particular things from your library during this crisis, reach out to them using the contact information given on their website.  When libraries know what their users need, they can often make those needs a priority, and can let you know if and when they can provide them.

Check out other free online library services.  I run one of them, The Online Books Page, which now lists over 3 million books and serials freely readable online due to their public domain status or the generosity of their rightsholders.  We’ll be adding more material there over the next few weeks as we incorporate the listings of more collections, and respond to your requests.  There are many other services online as well.  Wikipedia serves not only as a crowd-sourced collection of articles on millions of topics, but also as a directory of further online resources related to those topics.  And the Internet Archive also offers access to millions of books and other information resources no longer readily commercially available, many through controlled digital lending and other manifestations of fair use.  (While the limits of fair use are often subject to debate, library copyright specialists make a good case that its bounds tend to increase during emergencies like this one.  See also Kyle Courtney’s blog for more discussion of useful things libraries can do in a health crisis with their copyright powers.)

Support the people who provide the informative and creative resources you value.  The current health crisis has also triggered an economic crisis that will make life more precarious for many creators.  If you have funds you can spare, send some of them their way so they can keep making and publishing the content you value.  Humble Bundles, for instance, offer affordable packages of ebooks, games, and other online content you can enjoy while you’re staying home, and pay for to support their authors, publishers, and associated charities.  (I recently bought their Tachyon SF bundle with that in mind; it’s on offer for two more weeks as I write this.)  Check the websites of your favorite authors and artists to see if they offer ways to sponsor their work, or specific projects they’re planning.  Buy books from your favorite independent booksellers (and if they’re closed now, check their website or call them to see if you can buy gift cards to keep them afloat now and redeem them for books later on).  Pay for journalism you value.  Support funding robust libraries in your community.

Consider ways you can help build up online libraries.  Many research papers on COVID-19 and related topics have been opened to free access by their authors or publishers since the crisis began.  Increasing numbers of scholarly and other works are also being made open access, especially by those who have already been paid for creating them.   If you’re interested in sharing your work more broadly, and want to learn more about how you can secure rights to do so, the Authors’ Alliance has some useful resources.

As libraries shift focus from in-person to online service, some librarians may be busy with new tasks, while others may be left hanging until new plans and procedures get put into motion.  If you’re in the latter category, and want something to do, there are various library-related projects you can work on or learn about.  One that I’m running is the deep backfile project to identify serial issues that are in the public domain in less-than-obvious ways, and to find or create free digital copies of these serials (so that, among other things, people who are stuck at home can read them online).  I’ve recently augmented my list of serial backfiles to research to include serials held by the library in which I work, in the hopes that we could eventually find or produce digital surrogates for some of them that our readers (and anyone else interested) could access from afar.  I can also add sets for other libraries; if you’re interested in one for yours, let me know and I can go into more detail about the data I’m looking for.  (I’m not too worried about creating too many serial sets to research, especially since once information about a serial is added into one of the serial sets, it also gets automatically added into any other sets that include that serial.)

Take care of yourself, and your loved ones.  Whether you work in libraries or just use them, this is a stressful time.  Give yourself and those around you room and resources to cope, as we disengage from much of our previous activities, and deal with new responsibilities and concerns.  I’m gratified to see the response of the Wikimedia Foundation, for instance, which is committed both to keeping the world well-informed and up-to-date through Wikipedia and related projects, and also to letting its staff and contractors work half-time for the same pay during the crisis, and waiving sick-day limits. Among new online community support initiatives, I’m also pleased to see librarian-created resources like the Ontario Library Association’s pandemic information brief, with useful information for library users and workers, and the COVID4GLAM Discord community, a discussion space to support the professional and personal needs of people working in libraries, archives, galleries and museums.

These will be difficult times ahead.  Our libraries can make a difference online, even as our doors are closed.  I hope you’ll be able to put them to good use.


Cancellation of Evergreen Conference 2020 / Evergreen ILS

Due to health and safety concerns for the Evergreen community during the COVID-19 pandemic, the Evergreen Conference Committee and The Evergreen Project Board have concluded that the 2020 Evergreen International Conference in Atlanta must be canceled, and have canceled it. Conference registration fees, exhibitor fees, and sponsorships will be refunded. (We will be reaching out to exhibitors directly as well.)

Please be sure to cancel your hotel and travel reservations. While we will miss seeing everyone this April we are looking forward to the 2021 Evergreen International Conference in Missouri!

All of our best wishes for your health and safety,
The Evergreen Conference Committee & The Evergreen Project

March 2020 ITAL Issue Now Available / LITA

The March 2020 issue of Information Technology and Libraries (ITAL) is available now. In this issue, ITAL Editor Ken Varnum shares his support of LITA, ALCTS, and LLAMA merging to form a new ALA division, Core. Our content includes a message from LITA President Emily Morton-Owens. In “A Framework for Member Success,” Morton-Owens discusses the current challenges of LITA as a membership organization and reinvention being the key to survival. Also in this edition, Laurie Willis discusses the pros and cons of handling major projects in-house versus hiring a vendor in “Tackling the Big Projects.” Sheryl Cormicle Knox and Trenton Smiley discuss using digital tactics as a cost-effective way to increase marketing reach in “Google Us!”

Featured Articles:

“User Experience Methods and Maturity in Academic Libraries,” by Scott W. H. Young, Zoe Chao, and Adam Chandler

This article presents a mixed-methods study of the methods and maturity of user experience (UX) practice in academic libraries. The authors apply qualitative content analysis and quantitative statistical analysis to a research dataset derived from a survey of UX practitioners. Results reveal the type and extent of UX methods currently in use by practitioners in academic libraries. Read more.

“Virtual Reality,” by Megan Frost, Michael Goates, Sarah Cheng, and Jed Johnston

We conducted a survey to inform the expansion of a Virtual Reality (VR) service in our library. The survey assessed user experience, demographics, academic interests in VR, and methods of discovery. Currently our institution offers one HTC Vive VR system that can be reserved and used by patrons within the library, but we would like to expand the service to meet the interests and needs of our patrons. We found use among all measured demographics and sufficient patron interest for us to justify expansion of our current services. Read more.

“Using Augmented and Virtual Reality in Information Literacy Instruction to Reduce Library Anxiety in Nontraditional and International Students,” by Angela Sample

Throughout its early years, the Oral Roberts University (ORU) Library held a place of pre-eminence on campus. ORU’s founder envisioned the Library as central to all academic function and scholarship. Under the direction of the founding dean of learning resources, the Library was an early pioneer in innovative technologies and methods. However, over time, as the case with many academic libraries, the Library’s reputation as an institution crucial to the academic work on campus had diminished. Read more.

“Bento Box User Experience Study at Franklin University,” by Marc Jaffy

This article discusses the benefits of the bento-box method of searching library resources, including a comparison of the method with a tabbed search interface. It then describes a usability study conducted by the Franklin University Library in which 27 students searched for an article, an ebook, and a journal on two websites: one using a bento box and one using the EBSCO Discovery Service (EDS). Screen recordings of the searches were reviewed to see what actions users took while looking for information on each site, as well as how long the searches took. Read more.

“User Experience with a New Public Interface for an Integrated Library System,” by Kelly Blessinger and David Comeaux

The purpose of this study was to understand the viewpoints and attitudes of researchers at Louisiana State University toward the new public search interface from SirsiDynix, Enterprise. Fifteen university constituents participated in user studies to provide feedback while completing common research tasks. Particularly of interest to the librarian observers were identifying and characterizing where problems were expressed by the participants as they utilized the new interface. Read more.

“Creating and Managing a Repository of Past Exam Papers,” by Mariya Maistrovskaya and Rachel Wang

Exam period can be a stressful time for students, and having examples of past papers to help prepare for the tests can be extremely helpful. It is possible that past exams are already shared on your campus—by professors in their specific courses, via student unions or groups, or between individual students. In this article, we will go over the workflows and infrastructure to support systematically collecting, providing access to, and managing a repository of past exam papers. Read more.

“Meeting Users Where They Are,” by Graham Sherriff, Dan DeSanto, Daisy Benson, and Gary S. Atwood

Campus portals are one of the most visible and frequently used online spaces for students, offering one-stop access to key services for learning and academic self-management. This case study reports how instruction librarians at the University of Vermont collaborated with portal developers in the registrar’s office to develop high-impact, point-of-need content for a dedicated “Library” page. This content was then created in LibGuides and published using the Application Programming Interfaces (APIs) for LibGuides boxes. Read more.

Submit Your Ideas

Contact ITAL Editor Ken Varnum with your proposal. Current formats are generally:

  • Articles – original research or comprehensive and in-depth analyses, in the 3000-5000 word range.
  • Communications – brief research reports, technical findings, and case studies, in the 1000-3000 word range.

Questions or Comments?

For all other questions or comments related to LITA publications, contact us at (312) 280-4268.

QOTD: Storytelling in protest and politics / Jodi Schneider

I recently read Francesca Polletta‘s book It was like a fever: Storytelling in protest and politics (2006, University of Chicago Press).

I recommend it! It will appeal to researchers interested in topics such as narrative, strategic communication, (narrative) argumentation, or epistemology (here, of narrative). Parts may also interest activists.

The book’s case studies are drawn from the Student Nonviolent Coordinating Committee (SNCC) (Chapters 2 & 3); online deliberation about the 9/11 memorial (Listening to the City, summer 2002) (Chapter 4); women’s stories in law (including, powerfully, battered women who had killed their abusers, and the challenges in making their stories understandable) (Chapter 5); references to Martin Luther King by African American Congressmen (in the Congressional Record) and by “leading black political figures who were not serving as elected or appointed officials” (Chapter 6). Several are extended from work Polletta previously published from 1998 through 2005 (see page xiii for citations).

The conclusion—”Conclusion: Folk Wisdom and Scholarly Tales” (pages 166-187)—takes up several topics, starting with canonicity, interpretability, ambivalence. I especially plan to go back to the last two sections: “Scholars Telling Stories” (pages 179-184)—about narrative and storytelling in analysts’ telling of events—and “Towards a Sociology of Discursive Forms” (pages 185-187)—about investigating the beliefs and conventions of narrative and its institutional conventions (and relating those to conventions of other “discursive forms” such as interviews). These set forward a research agenda likely useful to other scholars interested in digging in further. These are foreshadowed a bit in the introduction (“Why Stories Matter”) which, among other things, sets out the goal of developing “A Sociology of Storytelling”.

A few quotes I noted may give you the flavor of the book:

page 141: “But telling stories also carries risks. People with unfamiliar experiences have found those experiences assimilated to canonical plot lines and misheard as a result. Conventional expectations about how stories work, when they are true, and when they are appropriate have also operated to diminish the impact of otherwise potent political stories. For the abused women whom juries disbelieved because their stories had changed in small details since their first traumatized [p142] call to police, storytelling has not been especially effective. Nor was it effective for the citizen forum participants who did not say what it was like to search fruitlessly for affordable housing because discussions of housing were seen as the wrong place in which to tell stories.”

pages 166-167: “So which is it? Is narrative fundamentally subversive or hegemonic? Both. As a rhetorical form, narrative is equipped to puncture reigning verities and to uphold them. At times, it seems as if most of the stories in circulation are subtly or not so subtly defying authorities; at others as if the most effective storytelling is done by authorities. To make it more complicated, sometimes authorities unintentionally undercut their own authority when they tell stories. And even more paradoxically, undercutting their authority by way of a titillating but politically inconsequential story may actually strengthen it. Dissenters, for their part, may find their stories misread in ways that support the very institutions that they are challenging….” “For those interested in the relations between storytelling, protest, and politics, this all suggests two analytical tasks. One is to identify the features of narrative that allow it to [p167] achieve certain rhetorical effects. The other is to identify the social conditions in which those rhetorical effects are likely to be politically consequential. The surprise is that scholars of political processes have devoted so little attention to either task.”

pages 177-8 – “So institutional conventions of storytelling influence what people can do strategically with stories. In the previous pages, I have described the narrative conventions that operate in legal adjudication, media reporting, television talk shows, congressional debate, and public deliberation. Sociolinguists have documented such conventions in other settings: in medical intake interviews, for example, parole hearings, and jury deliberations. One could certainly generate a catalogue of the institutional conventions of storytelling. To some extent, those conventions reflect the peculiarities of the institution as it has developed historically. They also serve practical functions; some explicit, others less so. I have argued that the lines institutions draw between suitable and unsuitable occasions for storytelling or for certain kinds of stories serve to legitimate the institution.” [specific examples follow] … “As these examples suggest, while institutions have different conventions of storytelling, storytelling does some of the same work in many institutions. It does so because of broadly shared assumptions about narrative’s epistemological status. Stories are generally thought to be more affecting but less authoritative than analysis, in part because narrative is associated with women rather than men, the private sphere rather than the public one, and custom rather than law. Of course, conventions of storytelling and the symbolic associations behind them are neither unitary nor fixed. Nor are they likely to be uniformly advantageous for those in power and disadvantageous for those without it. Narrative’s alignment [p179] along the oppositions I noted is complex. For example, as I showed in chapter 5, Americans’ skepticism of expert authority gives those telling stories clout. In other words, we may contrast science with folklore (with science seen as much more credible), but we may also contrast it with common sense (with science seen as less credible).
Contrary to the lamentation of some media critics and activists, when disadvantaged groups have told personal stories to the press and on television talk shows, they have been able to draw attention not only to their own victimization but to the social forces responsible for it.”

Letter from the Editor / Information Technology and Libraries

Statement of support in favor of LITA, ALCTS, and LLAMA merging to form a new ALA division, Core.

A Framework for Member Success / Information Technology and Libraries

Our organization and governance (our committees, offices, processes, etc.) play a major role in what it is like to be a member. For those of us who are most involved in ALA and LITA, the organization may be familiar and supportive. But for new members looking for a foothold, or library workers who don’t see themselves in our association, our organization may look like a barrier. Moreover, many of our financial challenges are connected to our organization. The organization must evolve, but we must achieve this without losing what makes us loyal members.


Google Us! / Information Technology and Libraries

Capital Area District Libraries enrolled in Google Ad Grants in 2018 and receives up to $10,000 of in-kind Google Ads each month. This article describes how we obtained the grant, the campaigns we've developed, and the impact it has made on visits to our online branch.

Tackling the Big Projects / Information Technology and Libraries

Everyone who works with library technology sooner or later finds they are faced with a major project to tackle. Sometimes we contract with a vendor to do the bulk of the work; sometimes we do the project ourselves. This article reviews the advantages and disadvantages of both methods.